The name 'feature' has been around in Linguistics for a long time
The basic idea is to capture generalisations by decomposing monolithic categories into collections of simpler features
Originally developed for phonology, where we might describe vowels with features such as [±high] and [±back]
We can now 'explain' why /i/ and /u/ (both [+high]) behave similarly in certain cases, while /i/ and /e/ (both [−back]) go together in other cases.
Those are all binary features
+/- singular; +/- finite
But more often we find features whose values are some enumerated type
person: {1st,2nd,3rd}; number: {sg, pl}; ntype: {count, mass}
We'll follow J&M and write collections of features like this:
It will be convenient to generalise and allow features to take feature bundles as values:
We can now add feature bundles to categories in our grammars
In practice we allow some further notational conveniences:
For example
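To make the idea of bundle-valued features concrete, here is a rough, hypothetical rendering in code (not J&M's bracket notation): a feature bundle represented as a nested Python dict, where the agreement features are themselves grouped into a bundle.

```python
# A feature bundle for a 3rd-person-singular NP, with the agreement
# features grouped into a bundle-valued 'agr' feature.
np_features = {
    "cat": "NP",
    "agr": {                 # the value of 'agr' is itself a feature bundle
        "person": "3rd",
        "number": "sg",
    },
}
```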
At one level, features are just a convenience
They allow us to write lexicon entries and rules more transparently
But they also "capture generalisations"
If we write a pair of rules using some (in principle opaque) complex category labels, they are not obviously related in any way:
It appears as if we have to justify each of these independently, and as if other, unrelated rules would serve just as well
Whereas when we write
we are making a stronger claim, even though 'behind the scenes' this single line corresponds to a collection of simple atomic-category rules
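To see the 'behind the scenes' expansion concretely, here is a small hypothetical sketch (the rule and feature names are made up for illustration) that spells out the atomic-category rules a single feature-annotated rule stands for:

```python
# Hypothetical expansion of a feature-annotated rule such as
#   NP[num=?n] -> Det[num=?n] N[num=?n]
# into the collection of atomic-category rules it abbreviates.
for num in ["sg", "pl"]:
    print(f"NP_{num} -> Det_{num} N_{num}")
# NP_sg -> Det_sg N_sg
# NP_pl -> Det_pl N_pl
```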
Once you move to feature bundles as the values of features, considerably more expressive power becomes available
One strand of modern grammatical theory puts essentially all the expressive power of the grammar into feature structures
When we write '=' between two feature paths or feature variables, we mean more than an equality test
Consider the noun phrase "a sheep", and the following rules
The resulting parse tree reveals that we have not only tested for compatibility between the various feature structures, we've actually merged them:
where by the ① we mean that all three agreement values are the same feature structure
The implications of unification run deep
The three occurrences of ① don't just appear the same: they are one and the same structure
J&M give a detailed introduction to unification, which is what this is called, in section 15.2 (J&M 2nd ed.), and a formal definition in section 15.4.
The directed acyclic graph (DAG) way of drawing feature structures used in J&M 15.4 makes it clearer when we have genuine structure identity, as opposed to merely contingent equality of values
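As a rough sketch of what unification does, here is a small Python function over feature bundles represented as nested dicts: compatible structures are merged, incompatible ones fail. The lexical entries are made up for illustration, and reentrancy (shared structure, as in the DAG picture) is deliberately not modelled.

```python
def unify(f1, f2):
    """Unify two feature structures represented as nested dicts.
    Atomic values must be identical; dict values are unified recursively.
    Returns a merged structure, or None if unification fails.
    (Reentrancy / shared structure is not modelled in this sketch.)"""
    if isinstance(f1, dict) and isinstance(f2, dict):
        merged = dict(f1)                       # start from a copy of f1
        for feat, val in f2.items():
            if feat in merged:
                sub = unify(merged[feat], val)
                if sub is None:                 # clash on a shared feature
                    return None
                merged[feat] = sub
            else:                               # feature only present in f2
                merged[feat] = val
        return merged
    return f1 if f1 == f2 else None             # atomic values must match

# Hypothetical lexical entries: 'a' is singular, 'sheep' is unmarked for number.
a_agr     = {"number": "sg"}
sheep_agr = {"person": "3rd"}
print(unify(a_agr, sheep_agr))                    # {'number': 'sg', 'person': '3rd'}
print(unify({"number": "sg"}, {"number": "pl"}))  # None -- the values clash
```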
A parser is an algorithm that computes a structure for an input string given a grammar.
All parsers have two fundamental properties: a directionality (top-down or bottom-up) and a search strategy (e.g. depth-first or breadth-first)
A recursive descent parser treats a grammar as a specification of how to break down a top-level goal into subgoals
This means that it works very similarly to a particular blind approach to constructing a derivation under the rewriting interpretation of the grammar:
We're trying to build a parse tree, given a grammar and an input string
As for any other depth-first search, we may have to backtrack
Note that, to make the search go depth-first, we pursue each choice of expansion all the way to success or failure before considering any alternative
Finally, we'll need a notion of where the focus of attention is in the tree we're building
We start with the start symbol S as the sole goal, and the focus of attention at the root of the tree
Repeatedly, we either expand the focussed goal using a grammar rule (e.g. expanding S via S → NP VP introduces the subgoals NP and VP), match the focussed goal against the next word of the input, or backtrack
The three imperative actions in the preceding algorithm are defined as follows:
We'll see the operation of this algorithm in detail in this week's lab
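As a concrete illustration, here is a minimal recursive descent recogniser in Python. The grammar and sentences are made up for this sketch; the lab uses its own grammar and a fuller implementation.

```python
# A toy grammar: each non-terminal maps to a list of alternative expansions.
GRAMMAR = {
    "S":   [["NP", "VP"]],
    "NP":  [["Det", "N"]],
    "VP":  [["V", "NP"], ["V"]],
    "Det": [["a"], ["the"]],
    "N":   [["fish"], ["frogs"]],
    "V":   [["saw"], ["swim"]],
}

def parse(goals, words):
    """Try to rewrite the list of goals so that it matches the list of words.
    Returns True iff some sequence of rule choices succeeds; backtracking over
    alternative expansions is implicit in the recursion."""
    if not goals:
        return not words                      # success iff the input is also used up
    first, rest = goals[0], goals[1:]
    if first in GRAMMAR:                      # non-terminal: expand it top-down
        return any(parse(expansion + rest, words)
                   for expansion in GRAMMAR[first])
    # terminal: match it against the next word of the input
    return bool(words) and words[0] == first and parse(rest, words[1:])

print(parse(["S"], "the frogs saw a fish".split()))   # True
print(parse(["S"], "fish the frogs".split()))          # False
```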
Schematic view of the top-down search space:
In depth-first search the parser explores one alternative completely (backtracking if it fails) before trying the next
In breadth-first search the parser explores all the alternatives in parallel, one level at a time
The bottom-up search space works, as the name implies, from the leaves upwards
Search strategy does not imply a particular directionality in which structures are built.
Recursive descent parsing searches depth-first and builds top-down
Shift-reduce parsing also searches depth-first, but in contrast it builds structures bottom-up.
It does this by repeatedly either shifting the next input word onto a stack, or reducing the top of the stack to a single constituent using a grammar rule matched against it
As described, this is just a recogniser
Actual parsing requires more bookkeeping
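Here is a minimal sketch of such a recogniser in Python, with naive backtracking over the choice between reducing and shifting; the grammar is again made up for illustration.

```python
GRAMMAR = {
    "S":   [["NP", "VP"]],
    "NP":  [["Det", "N"]],
    "VP":  [["V", "NP"], ["V"]],
    "Det": [["the"]],
    "N":   [["frogs"], ["fish"]],
    "V":   [["saw"], ["swim"]],
}

def sr_recognise(stack, words):
    """Shift-reduce recognition with naive backtracking.
    stack: categories/words built so far; words: remaining input."""
    if stack == ["S"] and not words:
        return True                                   # everything reduced to S
    # Reduce: if the top of the stack matches some rule's right-hand side,
    # replace it by that rule's left-hand side and carry on.
    for lhs, rhss in GRAMMAR.items():
        for rhs in rhss:
            n = len(rhs)
            if n <= len(stack) and stack[-n:] == rhs:
                if sr_recognise(stack[:-n] + [lhs], words):
                    return True
    # Shift: move the next input word onto the stack.
    return bool(words) and sr_recognise(stack + [words[0]], words[1:])

print(sr_recognise([], "the frogs saw the fish".split()))   # True
print(sr_recognise([], "frogs the swim".split()))            # False
```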
Given certain constraints, it is possible to pre-compute auxiliary information about the grammar and exploit it during parsing so that no backtracking is required.
Modern computer languages are often parsed this way
A string can have more than one structural analysis (called global ambiguity) for one or both of two reasons: a word may belong to more than one lexical category, and/or the rules may allow the same sequence of constituents to be combined in more than one way
Within a single analysis, some sub-strings can be analysed in more than one way (local ambiguity)
Local ambiguity is very common in natural languages as described by formal grammars
All depth-first parsing is inherently serial, and serial parsers can be massively inefficient when faced with local ambiguity.
Depth-first parsing strategies demonstrate other problems with "parsing as search":
The complexity of this blind backtracking is exponential in the worst case because of repeated re-analysis of the same sub-string.
Chart parsing is the name given to a family of solutions to this problem
It seems like we should be able to avoid the kind of repeated reparsing a simple recursive descent parser must often do
A CFG parser, that is, a context-free parser, should be able to avoid re-analysing sub-strings, because in a context-free grammar the analysis of a sub-string does not depend on its context
The parser's exploration of its search space can exploit this independence
Dynamic programming is the basis for all chart parsing algorithms.
Given a problem, dynamic programming systematically fills a table of solutions to sub-problems
Once solutions to all sub-problems have been accumulated, the solution to the whole problem can be assembled from them
For parsing, sub-problems are analyses of sub-strings
Each entry in the chart or WFST corresponds to a complete constituent (sub-tree), indexed by the start and end of the sub-string that it covers
A well-formed substring table (aka chart) can be depicted as either a matrix or a graph
When a WFST (aka chart) is depicted as a matrix:
Here's a sample matrix, part-way through a parse
0 See 1 with 2 a 3 telescope 4 in 5 hand 6
We can read this as saying:
A sample graph, for the same situation mid-parse
Important examples of parser types which use a WFST include:
CKY (Cocke, Kasami, Younger) is an algorithm for recognising constituents and recording them in the chart (WFST).
CKY was originally defined for Chomsky Normal Form grammars, in which every rule has the form A → B C or A → b
We can enter constituent A in cell (i,j) iff either
there is a rule A → b, and b is found in cell (i,j), or
there is a rule A → B C, and there is at least one k between i and j such that B is found in cell (i,k) and C is found in cell (k,j)
Proceeding systematically bottom-up, CKY guarantees that the parser only looks for rules which might yield a constituent from i to j after it has found all the constituents that might contribute to it, that is, all the shorter constituents lying between i and j
Note that this process manifests the fundamental weakness of blind bottom-up parsing: it builds every constituent it can, including many that cannot contribute to any complete analysis of the input
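As a sketch, here is a CKY recogniser in Python over a tiny, made-up CNF grammar (note that the course grammar below is not in CNF, so it would need converting before CKY could be applied to it directly):

```python
from collections import defaultdict

# A toy grammar in Chomsky Normal Form, made up for illustration.
LEXICAL = {"the": {"Det"}, "frogs": {"N"}, "fish": {"N"}, "ate": {"TV"}}
BINARY  = {("Det", "N"): {"NP"}, ("TV", "NP"): {"VP"}, ("NP", "VP"): {"S"}}

def cky_recognise(words):
    n = len(words)
    chart = defaultdict(set)                  # chart[(i, j)] = categories spanning i..j
    for i, w in enumerate(words):             # A -> b entries: fill width-1 cells first
        chart[(i, i + 1)] |= LEXICAL.get(w, set())
    for width in range(2, n + 1):             # then ever wider spans, bottom-up
        for i in range(n - width + 1):
            j = i + width
            for k in range(i + 1, j):         # every split point i < k < j
                for B in chart[(i, k)]:
                    for C in chart[(k, j)]:
                        chart[(i, j)] |= BINARY.get((B, C), set())
    return "S" in chart[(0, n)]

print(cky_recognise("the frogs ate the fish".split()))   # True
print(cky_recognise("frogs the ate fish".split()))        # False
```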
Grammatical rules:
S → NP VP
NP → Det Nom
NP → Nom
Nom → N SRel
Nom → N
VP → TV NP
VP → IV PP
VP → IV
PP → Prep NP
SRel → Relpro VP

Lexical rules:
Det → a | the (determiner)
N → fish | frogs | soup (noun)
Prep → in | for (preposition)
TV → saw | ate (transitive verb)
IV → fish | swim (intransitive verb)
Relpro → that (relative pronoun)
Nom: nominal (the part of the NP after the determiner, if any)
SRel: subject relative clause, as in the frogs that ate fish.
Non-terminals occurring (only) on the LHS of lexical rules are sometimes called pre-terminals
Sometimes, instead of sequences of words, we take sequences of pre-terminals (e.g. part-of-speech tags) as the input to be parsed