1. Parsing: What and why
Parsing means determining how a grammar accepts
a sentence
- The details will depend on the nature of the grammar
- For many formalisms, a parse tree is a good way of
demonstrating a parse
What practical reasons are there for parsing?
At least three:
- To support a compositional semantics
- Not just in NLP: consider expressions in a programming language
- By parsing
x = 3+2/4
- We get an obvious scaffold on which to erect the evaluation
- one which embodies the precedence rules of the language, as the sketch below illustrates
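As an illustration (ours, not from the lecture), here is that scaffold in miniature: the parse of "3+2/4" as a tree, with the division attached lower than the addition, so that bottom-up evaluation applies the precedence rules automatically.

# Toy expression tree for "3+2/4" (our illustration, not the lecture's).
# "/" sits lower in the tree than "+", so evaluating bottom-up
# embodies the precedence rules of the language.

def evaluate(node):
    """Evaluate a tree given as ('op', left, right), or a bare number."""
    if isinstance(node, tuple):
        op, left, right = node
        a, b = evaluate(left), evaluate(right)
        return a + b if op == '+' else a / b
    return node

# "3+2/4" parses as 3 + (2/4), not (3+2)/4
tree = ('+', 3, ('/', 2, 4))
print(evaluate(tree))  # 3.5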
2. Why parse, cont'd
More reasons to parse:
- To eliminate (or at least reduce) ambiguity
- For example, in speech recognition, consider "we just [eh ii t]"
- Bigrams won't do much good at distinguishing "just ate" from "just eight"
- But a parser will rule out "We just eight"
- To identify phrase boundaries for text-to-speech
- To get the intonation (timing, pitch) right
3. Why is parsing hard?
Ambiguity makes parsing (potentially) hard (or, at least, expensive)
Some (local) ambiguity is eliminated as the parse proceeds
- But other (global) ambiguity remains in the finished parses
What makes a parsing algorithm good is how well it contains
the cost of dealing with ambiguity
We will consider two main types of ambiguity
- Part-of-speech ambiguity
- "break/vb the bank" vs. "take a break/nn"
- Structural ambiguity
- "young (men and women)" vs. "(empty cups) and (fag ends)"
These can of course combine, as in the famous "He saw her duck" or "I'm
interested in growing plants"
4. The impact of ambiguity
Examples such as "I'm interested in growing plants" are typical, in that humans often fail to spot the
ambiguity at first
- Our context-driven expectations and/or common sense hide one meaning or another
But machines typically have no way to avoid enumerating all the possible readings
The early so-called "broad coverage" grammars found
thousands of parses for many sentences in the original Wall Street
Journal corpus from the ACL/DCI.
Tagging doesn't help:
- The best machine taggers are somewhere around 95% accurate
- The average sentence length in the WSJ corpus is 25 words
- .95 raised to the 25th power is .28
5. Tagging doesn't help enough, cont'd
Taggers make errors, say 1 time in 20
- In other words, the probability of having a completely accurate tag
sequence for the average WSJ sentence of 25 words is less than 30%
- Even if we could get 99% word tagging accuracy, we'd still be looking
at 78% accuracy for the average sentence (both figures are checked in the sketch below)
- And working in the other direction, Mary Dalrymple has shown that even with perfect tagging, 30% of sentences would see no reduction in ambiguity, because all of their parses share the same tag sequence
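The figures above are easy to check, assuming (as these back-of-envelope numbers do) that per-word tagging errors are independent:

# Probability of a completely correct tag sequence for a 25-word sentence,
# assuming each word is tagged independently with the given accuracy
for accuracy in (0.95, 0.99):
    print(accuracy, round(accuracy ** 25, 2))
# 0.95 -> 0.28, 0.99 -> 0.78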
We'll come back to this when we talk about probabilistic parsing
6. The Chomsky Hierarchy
A reminder of something you looked hard at in INF 2A
- Regular languages
- Regular expressions; Finite-state Automata
- Context-free languages
- (CF) Phrase structure grammars; Pushdown Automata (aⁿbⁿ is not regular)
- Context-sensitive languages
- (CS) PSG; Linear-bounded automata (aⁿbⁿcⁿ is not context-free)
- Recursively enumerable languages
- General rewriting rules; Turing machines
The current consensus is that natural languages are just a
bit more complex than context-free.
- There's a point between CF and CS, sometimes referred to as the indexed languages, which may be the sweet spot
- ww, a reduplication language (i.e. every sentence is a pair of identical
strings over some alphabet), is an indexed language, as the sketch below makes concrete
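To make the example languages tangible, here are membership checks for aⁿbⁿ and ww (our sketch; note that checking membership says nothing about which grammar class is needed to generate these languages):

# Membership checks for the two example languages:
# aⁿbⁿ requires counting; ww requires comparing two halves.

def is_anbn(s):
    """True iff s consists of n a's followed by n b's, for some n >= 0."""
    n = len(s) // 2
    return len(s) % 2 == 0 and s[:n] == 'a' * n and s[n:] == 'b' * n

def is_ww(s):
    """True iff s is some string w repeated twice (reduplication)."""
    n = len(s) // 2
    return len(s) % 2 == 0 and s[:n] == s[n:]

assert is_anbn('aaabbb') and not is_anbn('aabbb')
assert is_ww('abcabc') and not is_ww('abcabd')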
We'll start with context-free grammars, and their parsers, as they cover
almost all the grammatical phenomena of natural languages
7. Context-free phrase structure grammars
We'll skip the formalities
- And just use the standard rule notation
- J&M Chapter 12 section 12.2.1 has the full story
We'll always use S for the start symbol
Capitalised or camel-case for non-terminals and pre-terminals
- pre-terminals are a common convention for natural
language CF-PSGs: they are the lexical categories
- That is, their expansions are always a single terminal
And lower-case for terminals
And we'll use some obvious abbreviations using vertical bar:
- NT → ... | ... | ...
- PT → term1 | term2 | term3
8. CF-PSG example
Here's enough to get both readings of "he saw her duck"
- S → NP VP
- NP → D N | Pro | PropN
- D → PosPro | Art | NP 's
- VP → Vi | Vt NP | Vp NP VP
- Pro → i | we | you | he | she | him | her
- PosPro → my | our | your | his | her
- PropN → Robin | Jo
- Art → a | an | the
- N → cat | dog | duck | park | telescope | bench
- Vi → sleep | run | duck
- Vt → eat | break | see | saw
- Vp → see | saw | heard
Parse 1 (saw as Vt, "her duck" as an NP):
[S [NP [Pro he]] [VP [Vt saw] [NP [D [PosPro her]] [N duck]]]]
Parse 2 (saw as Vp, "her" as a pronoun NP, "duck" as a VP):
[S [NP [Pro he]] [VP [Vp saw] [NP [Pro her]] [VP [Vi duck]]]]
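For later reference, here is one way this grammar might be encoded in Python; the dictionary representation is our own choice, not anything from the lecture.

# The example grammar as a Python dict. Keys are non-terminals; each
# value lists the possible right-hand sides. Strings that are not keys
# of the dict are terminals (words, plus the possessive clitic 's).
GRAMMAR = {
    'S':      [['NP', 'VP']],
    'NP':     [['D', 'N'], ['Pro'], ['PropN']],
    'D':      [['PosPro'], ['Art'], ['NP', "'s"]],
    'VP':     [['Vi'], ['Vt', 'NP'], ['Vp', 'NP', 'VP']],
    'Pro':    [['i'], ['we'], ['you'], ['he'], ['she'], ['him'], ['her']],
    'PosPro': [['my'], ['our'], ['your'], ['his'], ['her']],
    'PropN':  [['Robin'], ['Jo']],
    'Art':    [['a'], ['an'], ['the']],
    'N':      [['cat'], ['dog'], ['duck'], ['park'], ['telescope'], ['bench']],
    'Vi':     [['sleep'], ['run'], ['duck']],
    'Vt':     [['eat'], ['break'], ['see'], ['saw']],
    'Vp':     [['see'], ['saw'], ['heard']],
}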

9. Parsing as search: top-down and bottom-up
We can think of parsing as a search problem:
- Search the space of possible parse trees in an orderly fashion
- Until the right answer is found
- Or the first answer
- Or all answers
- Or the best answer
The top-down search space for our grammar starts at S and branches on every rule that could expand the current non-terminal
We can search the top-down space breadth-first or depth-first:
- Breadth-first, we can abandon a hypothesis once it grows longer than the input
- Depth-first, we abandon a hypothesis as soon as a predicted word fails to match the input
Loops in the grammar cause problems! Here NP → D N and D → NP 's make NP indirectly left-recursive, so a naive top-down parser can keep expanding without ever consuming input
The bottom-up search space can also be searched either breadth-first or 'height'-first
10. Brute-force parsing: recursive descent
Recursive descent parsing explores the search space top-down and,
usually, depth-first
It's trivial to implement, as the sketch below shows
- But very slow to find the answer, since the same constituents get re-derived again and again
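A minimal recursive descent parser in Python (our sketch, not the lecture's code): it searches top-down and depth-first, and yields every complete parse. To keep it terminating, the grammar below is a subset of the one on slide 8 that omits the left-recursive rule D → NP 's; with that rule included, the depth-first search would recurse forever.

# Recursive descent: top-down, depth-first search over the rules.
# The left-recursive rule D -> NP 's is omitted, or the parser would loop.
GRAMMAR = {
    'S':      [['NP', 'VP']],
    'NP':     [['D', 'N'], ['Pro']],
    'D':      [['PosPro']],
    'VP':     [['Vi'], ['Vt', 'NP'], ['Vp', 'NP', 'VP']],
    'Pro':    [['he'], ['she'], ['him'], ['her']],
    'PosPro': [['his'], ['her']],
    'N':      [['duck'], ['cat']],
    'Vi':     [['duck'], ['run']],
    'Vt':     [['saw'], ['see']],
    'Vp':     [['saw'], ['heard']],
}

def parse(cat, words, i):
    """Yield (tree, next_position) for every way cat can cover words[i:]."""
    for rhs in GRAMMAR.get(cat, []):
        for children, j in match(rhs, words, i):
            yield [cat] + children, j

def match(symbols, words, i):
    """Yield (child_trees, next_position) for every way symbols match words[i:]."""
    if not symbols:
        yield [], i
        return
    first, rest = symbols[0], symbols[1:]
    if first in GRAMMAR:                        # non-terminal: recurse
        for tree, j in parse(first, words, i):
            for trees, k in match(rest, words, j):
                yield [tree] + trees, k
    elif i < len(words) and words[i] == first:  # terminal: must match input
        for trees, j in match(rest, words, i + 1):
            yield [first] + trees, j

sentence = 'he saw her duck'.split()
for tree, end in parse('S', sentence, 0):
    if end == len(sentence):                    # keep complete parses only
        print(tree)
# Prints both readings: saw as Vt with NP "her duck",
# and saw as Vp with NP "her" and VP "duck"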
11. Required reading
Jurafsky & Martin, second edition, Chapter 12 sections
12.1–12.3, 12.6; Chapter 13 sections 13.1–13.2