This week’s tutorial exercises focus on probabilistic parsing, the CYK algorithm and estimation of PCFGs. After working through the exercises, you should be able to:
Simulate the CYK algorithm on a PCFG, where rules are associated with probabilities. You will understand the main inductive step in CYK in the context of non-binary rules.
Estimate the probabilities of a PCFG from a small corpus.
Understand the main idea behind the asymptotic complexity of an algorithm (extra challenge, marked by (*)).
(Some of the questions in this tutorial are based on questions given in INF2A in 2018.)
Consider the following probabilistic context free grammar:
S \(\rightarrow\) | Subj VP (1.0) |
VP \(\rightarrow\) | V Obj (0.5) \(\mid\) V Obj Obj (0.3) \(\mid\) V Small (0.2) |
Small \(\rightarrow\) | Obj V (1.0) |
Subj \(\rightarrow\) | I (0.3) \(\mid\) NP (0.7) |
Obj \(\rightarrow\) | her (0.2) \(\mid\) NP (0.8) |
NP \(\rightarrow\) | N (0.5) \(\mid\) Det N (0.5) |
V \(\rightarrow\) | make (0.6) \(\mid\) duck (0.4) |
N \(\rightarrow\) | duck (0.5) \(\mid\) goose (0.5) |
Det \(\rightarrow\) | her (1.0) |
Use a probabilistic CYK-style algorithm to find the most probable parse for the sentence belows and determine its probability.
I make her duck
N.B. Do not convert the grammar to Chomsky Normal Form — instead, you should draw up a CYK-style parse chart for the grammar by combining cells from more than two other cells. If you are not sure how to do that, consider the discussion question (a) below. For small examples this is perfectly feasible, but you should check that you understand why a CYK-style algorithm has difficulties with non-CNF grammars in general. (Hint: think about what additional inferences you might need to make in order to work with grammars in this form.)
Derive a PCFG grammar from the corpus below. Write down the rules of the grammar and calculate their probabilities. Assume for the purpose of the grammar that the corpus is lowercased (e.g. The has been replaced by the).
Draw all possible parse trees for the following sentence, and compute their probabilities using the probabilistic grammar you have created.
He saw the man with the telescope
The following questions are challenge questions, and you don’t need to be concerned if you did not manage to find time to think about or solve all of the questions below. Depending on time, some of this will be discussed during the live session.
The ``time complexity’’ of an algorithm (such as CYK) is a measure of how fast an algorithm runs (or how efficiently) as a function of its input size. If you are not fully familiar with the notion of time complexity or do not remember it in full – now would be the time to brush up on it – this is a very important concept when designing an algorithm, and is likely to prove to be useful in your future career; the more efficient an algorithm is, the more usable it is.
How would you change the CYK algorithm so that it handles non-binary rules (assume there are no unary rules in the grammar of the form of one nonterminal rewriting to a single nonterminal)? Meaning, you would allow in the grammar lexical rules, rewriting a nonterminal to a word, and you would allow in non-lexical rules, rewriting a nonterminal to a sequence of nonterminals, possibly more than two. Think about the chart as a data structure that contains all possible nonterminals which can span a specific substring in the sentence. How would you process each cell if you include non-binary rules in the grammar?
What is the asymptotic time complexity of the CYK algorithm when it includes grammars in Chomsky normal form? Asymptotic time complexity refers to the number of computer operations that must be taken in order to run an algorithm. The complexity is calculated as a function of the input size, and the sentence length in this case denoted by \(n\). If you are not fully familiar with this concept of complexity because of your background, you can consult with pages like this.
How would your answer to the previous question change if we now allow non-binary rules? To account for this, your time complexity calculation needs to take into account both \(n\), the sentence length, and \(d\), the maximal length of the righthand size of a rule in the grammar.
In the lectures, we have seen a construction of an excerpt of a grammar for English. If you speak any other language than English, can you give a few examples for the way you would model some parts of the grammar of that language using a CFG? You can choose any linguistic phenomenon that was described in class.