Author: | Henry S. Thompson |
---|---|
Date: | 2015-10-20, updated 2016-10-17 |
Copyright: | This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License: You may re-use, redistribute, or modify this work for non-commercial purposes provided you retain attribution to any previous author(s). |
What production should be added to handle the sentence "He ate salad" ?
Add the production Pro -> 'He'
Is there a problem with either of the parse trees ?
The parse tree for "he ate salad with a fork" is correct: the preposition phrase "with a fork" is correctly attached to the verb ('high' attachment). For the sentence "he ate salad with mushrooms", the preposition phrase "with mushrooms" is wrongly attached to the verb, whereas it should be attached to the noun phrase "salad" ('low' attachment).
Change the order of the rules "NP -> N" and "NP -> NP PP"
This leads into a standard problem of left recursion as "NP -> NP PP" production is applied infinitely.
How do you think this behaviour depends on the particular way this recursive descent parser chooses which rule to expand when there are multiple options?
Clearly it does: the infinite recursion depends on the left-recursive rule being chosen before other options (although note that in the case of some unparseable strings the recursion would happen no matter what the order was.
Though the second sentence is ungrammatical, it parsed by our grammar. Modify the grammar to handle such cases
Change VP -> V | V NP | V NP PP to VP -> Vi | Vt NP | Vp NP PP. This will exploit the subcategorization information in the alternative set of productions for the verbs.
Change the grammar to handle number agreement and parse the following sentences:
Use grammar2 in lab5-sol.py
What is the type of the parsed sentence object (Hint: type command) ?
type(psents[0]) gives the object type which is nltk.tree.Tree
Extract the list of words and the list of word,pos-tag tuples from psents[0] using some of the other available methods.
psents[0].leaves() and psents[0].pos()` will give, respectively, the list of words and word, pos-tag tuples
What are the 10 most frequent and least frequent lexical and grammatical productions ?
First download the answer code and look at the definition of production_distribution
- Then do
%run lab5-sol.py
lex_prods, nonlex_prods = production_distribution(psents)
For the 10 most frequent productions
sorted(lex_prods.items(), key=lambda x : x[1], reverse=True)[:10]
sorted(nonlex_prods.items(), key=lambda x : x[1], reverse=True)[:10]
10 least frequent productions
sorted(lex_prods.items(), key=lambda x : x[1])[:10]
sorted(nonlex_prods.items(), key=lambda x : x[1])[:10]