If trees are useful, how do we get them?
Parsing is the process of taking a string and a grammar and returning one or more parse trees for that string
Analogous to running a finite-state transducer over a tape
By grammar, or syntax, we have in mind the kind of implicit knowledge of your native language that you had mastered by the time you were 3 years old without explicit instruction
Not the kind of stuff you were later taught in “grammar” school
Refers to the way words can be arranged in a given language
Grammars (and parsing) are key components in many applications
There's a useful (traditional) contrast between two perspectives on this topic:
Key notions that we’ll cover
Key formalism
Groups of words can be shown to act as single units, called constituents
In a given language, these units form coherent classes that behave in similar ways, with respect to
We can observe some commonality over the behaviour of the following English phrases:
- they
- Cambodia
- my aunt's pen
- the reason I can't stay
- taking another look at Moby Dick
- three French hens
One piece of evidence is that they can all precede verbs
______________ surprised him.
Internal, syntagmatic, evidence would be, for example, to observe that combining
usually results in a coherent phrase, which can fit in the above frame
The internal structure of NPs varies from language to language:
There’s nothing easy or obvious about how we come up with
That’s one of the reasons there are so many different theories of grammar and competing analyses of the same data
The approach we'll explore isn't exactly "cutting-edge"
Also known as phrase structure grammars
Making explicit the restrictions on rewriting we started with earlier, a CFG consists of
VP → V NP
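In the standard textbook definition, a CFG is a 4-tuple: a set of non-terminal symbols, a set of terminal symbols, a set of rules each rewriting a single non-terminal as a sequence of symbols, and a designated start symbol. A minimal sketch of that as a data structure follows; the class name `CFG`, the `check` method and the two lexical rules in the toy grammar are illustrative assumptions, not part of the formalism.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CFG:
    nonterminals: frozenset
    terminals: frozenset
    rules: tuple   # each rule is (lhs, rhs), with rhs a tuple of symbols
    start: str

    def check(self):
        """Return True if the four components fit together as a CFG."""
        if self.start not in self.nonterminals:
            return False
        if self.nonterminals & self.terminals:
            return False   # a symbol cannot be both terminal and non-terminal
        symbols = self.nonterminals | self.terminals
        for lhs, rhs in self.rules:
            # context-free: the left-hand side is exactly one non-terminal
            if lhs not in self.nonterminals:
                return False
            if any(sym not in symbols for sym in rhs):
                return False
        return True


# Toy grammar built around the rule VP → V NP (lexical rules are made up)
toy = CFG(
    nonterminals=frozenset({"VP", "V", "NP"}),
    terminals=frozenset({"saw", "Cambodia"}),
    rules=(
        ("VP", ("V", "NP")),
        ("V", ("saw",)),
        ("NP", ("Cambodia",)),
    ),
    start="VP",
)
```

The "context-free" part is the check on `lhs`: a rule may rewrite a non-terminal wherever it occurs, regardless of its neighbours.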
Some overly-simple rules for noun phrases:
NP → Det Nominal
NP → ProperNoun
Nominal → Noun | Nominal Noun
These rules describe two kinds of NPs:
The third rule illustrates:
a delivery truck repair manual
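To see what the recursive Nominal rule does with a noun-noun compound, here is a small sketch that enumerates every tree the three NP rules assign to a word sequence. The lexicon, the function name `parse` and the bracketed output format are assumptions made for illustration; the search simply tries every split point for each rule and memoises the results.

```python
from functools import lru_cache

RULES = {
    "NP": [("Det", "Nominal"), ("ProperNoun",)],
    "Nominal": [("Noun",), ("Nominal", "Noun")],
}
# Hypothetical lexicon: one category per word
LEXICON = {
    "a": "Det",
    "delivery": "Noun", "truck": "Noun", "repair": "Noun", "manual": "Noun",
    "Cambodia": "ProperNoun",
}


def parse(symbol, words):
    """Return every bracketed tree for `words` rooted in `symbol`."""
    words = tuple(words)

    @lru_cache(maxsize=None)
    def trees(sym, i, j):
        found = []
        # A single word whose lexical category is `sym`
        if j - i == 1 and LEXICON.get(words[i]) == sym:
            found.append(f"[{sym} {words[i]}]")
        for rhs in RULES.get(sym, []):
            if len(rhs) == 1:
                # Unary rule: the child covers the same span
                for t in trees(rhs[0], i, j):
                    found.append(f"[{sym} {t}]")
            else:
                # Binary rule: try every split point strictly inside the span
                left, right = rhs
                for k in range(i + 1, j):
                    for lt in trees(left, i, k):
                        for rt in trees(right, k, j):
                            found.append(f"[{sym} {lt} {rt}]")
        return tuple(found)

    return list(trees(symbol, 0, len(words)))


result = parse("NP", "a delivery truck repair manual".split())
```

With these rules the example gets exactly one tree, and it is left-branching: each use of Nominal → Nominal Noun wraps everything built so far and adds one more noun on the right.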
Huddleston and Pullum's The Cambridge Grammar of the English Language is 1860 pages long
So we won't come anywhere near covering all of English
We'll look briefly at
A plane left
S → NP VP
Leave!
S → VP
Did the plane leave?
S → Aux NP VP
When did the plane leave?
S → WH-NP Aux NP VP
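The four S rules can be read as a lookup from a sequence of top-level constituents to a sentence type. The sketch below assumes the sentence has already been chunked by hand into those constituents; the `PHRASES` table and the function name are hypothetical, and the type labels are the conventional names for these constructions rather than anything stated above.

```python
# The four S rules, keyed by their right-hand sides
S_RULES = {
    ("NP", "VP"): "declarative",
    ("VP",): "imperative",
    ("Aux", "NP", "VP"): "yes-no question",
    ("WH-NP", "Aux", "NP", "VP"): "wh-question",
}

# Hand-assigned categories for the chunks in the four examples (assumption)
PHRASES = {
    "a plane": "NP",
    "the plane": "NP",
    "left": "VP",
    "leave": "VP",
    "did": "Aux",
    "when": "WH-NP",
}


def sentence_type(chunks):
    """Return the label of the S rule matching the chunk categories, or None."""
    categories = tuple(PHRASES.get(chunk.lower()) for chunk in chunks)
    return S_RULES.get(categories)


examples = [
    ["A plane", "left"],
    ["Leave"],
    ["Did", "the plane", "leave"],
    ["When", "did", "the plane", "leave"],
]
types = [sentence_type(chunks) for chunks in examples]
```

A real parser would of course have to find the NP and VP boundaries itself; the point here is only that each sentence type corresponds to a different expansion of S.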
We can identify three quite distinct types of noun phrases:
she, he, we, ...
NP → Pro
Edinburgh, Star Wars, the Eiffel Tower, ...
NP → PropN
the next prime minister after Thatcher
NP → CNP
Consider the following moderately complicated noun phrase:
the first three morning flights from Denver to Tampa leaving before 10
We'll need something along the lines of the following tree
That big NP is really about flights
We can dissect this kind of NP into:
Complex noun phrases can start with determiners
CNP → Det CNP
Determiners can be
the, this, a, her
Det → Art
- simple: Robin’s car
- complex: Robin’s youngest child’s toy
Det → PropN 's
Det → CNP 's
Other premodifiers include
every flight
CNP → Quant CNP
three flights
CNP → Card CNP
first flight
CNP → Ord CNP
- large cars
- extremely sleepy baby
CNP → AP CNP
AP → Adv AP
AP → Adj
There are constraints we haven't captured on the order of pre-modifiers:
- every eligible candidate
- *eligible every candidate
- big red bus
- ?red big bus
Eventually (or even right away), we get to the Nominal
CNP → Nominal
Nominal → Noun
Nominal → Nominal Noun
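The missing ordering constraints can be checked mechanically: a recogniser built from the premodifier rules accepts the starred order as readily as the good one. This is a sketch under assumptions: the lexicon and the function name `recognize` are made up, and the two possessive Det rules are left out for brevity.

```python
from functools import lru_cache

RULES = {
    "CNP": [("Det", "CNP"), ("Quant", "CNP"), ("Card", "CNP"),
            ("Ord", "CNP"), ("AP", "CNP"), ("Nominal",)],
    "Det": [("Art",)],
    "AP": [("Adv", "AP"), ("Adj",)],
    "Nominal": [("Noun",), ("Nominal", "Noun")],
}
# Hypothetical lexicon: one category per word
LEXICON = {
    "the": "Art", "every": "Quant", "three": "Card", "first": "Ord",
    "extremely": "Adv", "eligible": "Adj", "big": "Adj", "red": "Adj",
    "sleepy": "Adj", "candidate": "Noun", "bus": "Noun", "baby": "Noun",
    "morning": "Noun", "flights": "Noun",
}


def recognize(symbol, words):
    """Return True if `words` can be derived from `symbol`."""
    words = tuple(w.lower() for w in words)

    @lru_cache(maxsize=None)
    def spans(sym, i, j):
        # A single word whose lexical category is `sym`
        if j - i == 1 and LEXICON.get(words[i]) == sym:
            return True
        for rhs in RULES.get(sym, []):
            if len(rhs) == 1:
                # Unary rule: the child covers the same span
                if spans(rhs[0], i, j):
                    return True
            elif any(spans(rhs[0], i, k) and spans(rhs[1], k, j)
                     for k in range(i + 1, j)):
                # Binary rule: some split point works for both children
                return True
        return False

    return spans(symbol, 0, len(words))


good = recognize("CNP", "every eligible candidate".split())
starred = recognize("CNP", "eligible every candidate".split())
```

Both `good` and `starred` come out `True`: because every premodifier rule rewrites CNP as "something followed by CNP", the premodifiers can be stacked in any order, so the grammar overgenerates.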
The postmodifiers which stack up behind the head may include
flight from Seattle
- flights arriving before noon
- first flight to depart
- flights that serve breakfast
- people whom the pilot knows
Similar general (recursive) rules handle these
Nominal → Nominal PP
Nominal → Nominal GerundVP
Nominal → Nominal InfVP
Nominal → Nominal RelClause
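Because these rules are left-recursive, applying them repeatedly amounts to a loop: start from the head noun and wrap one more postmodifier around the Nominal built so far. The sketch below assumes the postmodifiers have already been chunked and labelled by hand; the function name and the tuple representation of trees are illustrative choices.

```python
POSTMODIFIERS = {"PP", "GerundVP", "InfVP", "RelClause"}


def build_nominal(head_noun, postmodifiers):
    """Build a Nominal tree from a head noun and (category, text) pairs."""
    tree = ("Nominal", ("Noun", head_noun))          # Nominal → Noun
    for category, text in postmodifiers:
        if category not in POSTMODIFIERS:
            raise ValueError(f"not a postmodifier category: {category}")
        # Nominal → Nominal X, where X is one of the four postmodifier types
        tree = ("Nominal", tree, (category, text))
    return tree


tree = build_nominal(
    "flights",
    [("PP", "from Denver"), ("PP", "to Tampa"), ("GerundVP", "leaving before 10")],
)
```

The result nests to the left, with the head noun innermost and the last postmodifier attached highest. Note that a naive top-down recursive-descent parser cannot use left-recursive rules like these directly, since expanding Nominal would immediately ask for another Nominal at the same position.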