ANLP: Competitive Grammar Writing Exercise
The exercise will take place in Week 6, during the lecture on October 25, 2019. Class that day may run for two hours (it is your decision whether to stay for the second hour).
There will be a special hands-on lecture, in which you will design and develop a grammar for English as part of a Competitive Grammar Writing task. This takes the lab in Week 5 much further and lets you attempt to write a grammar for "language in the wild". The lab in Week 5 should inspire you to take some of its techniques and make them more robust. The goal of this exercise is to experiment with the ambiguity that is present in natural language, and to understand how complicated it is to model natural language syntax.
Preparation before the lecture:
- You may team up or work alone. A team can have no more than 3 people.
- Members of the team should download the file cgw_anlp.tgz to a machine from which they will work during the lecture. (The tarball can be extracted by typing: tar xvfz cgw_anlp.tgz. A directory called cgw should be created.)
- Note that you need NLTK to be able to run this code in Python. DICE machines have it, but if you want to run this on your own laptop, you should install it.
- Please carefully read the ANLP-README.txt (first) and README.md (second) in the .tgz file and make sure the code runs properly on a machine of your choice. The better you understand the tools available to you, the more efficiently you will be able to create your grammar during the lecture. You are more than welcome to start thinking about how to create your grammar prior to the lecture (either alone or with the rest of your team).
- Please remember that at least ONE team member should bring their laptop that day to lecture!
During the lecture:
Come prepared to the lecture! If you feel comfortable with the exercise, you are more than welcome to work on your grammar at any point before the lecture.
- Each team will sit together and develop a grammar for English. The goal is to have a grammar that has (1) high coverage, meaning it can generate many sentences in English; and (2) high precision, meaning it should avoid generating ungrammatical sentences as much as possible.
- The .tgz file gives different tools to test and play with your grammar while developing it. You should use these tools.
- More instructions will be given in the beginning of the lecture.
- The lecturer will try to answer any questions team members might have during the class. However, each team is expected to work independently as much as possible to make the most of this exercise.
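The trade-off between coverage and precision can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the toy grammar, the reference sentences, and the `judged_grammatical` function (a crude stand-in for a human judge) are hypothetical, and the code does not use the tools or file formats in cgw_anlp.tgz.

```python
from itertools import product

# Hypothetical toy grammar: each nonterminal maps to its right-hand sides.
# Any symbol that is not a key of GRAMMAR is treated as a terminal word.
GRAMMAR = {
    "S": [["NP", "VP"]],
    "NP": [["Det", "N"]],
    "VP": [["V", "NP"], ["V"]],
    "Det": [["the"], ["a"]],
    "N": [["dog"], ["cats"]],
    "V": [["sleeps"], ["chase"]],
}


def expand(symbol):
    """Return every word sequence derivable from symbol.

    Only terminates for non-recursive grammars such as the toy one above.
    """
    if symbol not in GRAMMAR:
        return [(symbol,)]
    results = []
    for rhs in GRAMMAR[symbol]:
        # Combine every expansion of each child, left to right.
        for parts in product(*(expand(child) for child in rhs)):
            results.append(tuple(word for part in parts for word in part))
    return results


def judged_grammatical(words):
    """Stand-in for a human judge: checks only two agreement constraints."""
    if "a cats" in " ".join(words):
        return False  # singular determiner with plural noun
    subject_noun, verb = words[1], words[2]
    return (subject_noun, verb) in {("dog", "sleeps"), ("cats", "chase")}


generated = set(expand("S"))

# Hypothetical reference sentences that a human would accept as grammatical.
reference = [
    "the dog sleeps",
    "the cats chase a dog",
    "a dog chases the cats",
    "the dog sleeps quietly",
]

# Coverage: share of grammatical reference sentences the grammar can produce.
coverage = sum(tuple(s.split()) in generated for s in reference) / len(reference)

# Precision: share of generated sentences that the judge accepts.
precision = sum(judged_grammatical(s) for s in generated) / len(generated)

print(f"generated={len(generated)} coverage={coverage:.2f} precision={precision:.2f}")
```

In this sketch, adding more words and rules would raise coverage, but every rule that ignores agreement (as `NP -> Det N` does here) also lets through sentences like "a cats sleeps", which lowers precision.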
After the lecture:
- A team may continue to develop its grammar following the lecture.
- A single member of each team should use the submit command to submit two files:
- The deadline for submission is 26 Oct 2019 at 4:00pm. The files should be submitted using Learn in a ZIP file containing the files mentioned above (grammar files and team file).
Only one member of the team should submit the file.
Evaluation of grammars:
We will evaluate your grammars by doing the following:
- We will ask humans to evaluate whether the sentences that your grammars generated are grammatical or not.
- We will try to parse the pool of grammatical sentences (perhaps coupled with other grammatical sentences) using your grammar, and measure the average cross-entropy that your grammar assigns to these sentences.
This means that you want to generate as many grammatical sentences as possible, while making them complex enough that other grammars cannot parse them.
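As a rough illustration of the second evaluation step, the sketch below computes an average cross-entropy from sentence probabilities. It rests on assumptions not stated above: the score is normalised per word and measured in bits, a sentence the grammar cannot parse has probability zero, and the probabilities in `scored` are invented for the example. The actual evaluation script may differ on each of these points.

```python
import math


def average_cross_entropy(scored_sentences):
    """Return bits per word: -(sum of log2 p(sentence)) / (total word count).

    scored_sentences is a list of (words, probability) pairs, where the
    probability is what the grammar assigns to the whole sentence.
    """
    total_log_prob = 0.0
    total_words = 0
    for words, prob in scored_sentences:
        if prob <= 0.0:
            # Assumption: an unparseable sentence makes the score infinite.
            return math.inf
        total_log_prob += math.log2(prob)
        total_words += len(words)
    return -total_log_prob / total_words


# Invented probabilities, chosen as powers of two to keep the arithmetic simple.
scored = [
    ("the dog sleeps".split(), 0.125),          # log2 = -3
    ("the cats chase a dog".split(), 0.03125),  # log2 = -5
]

entropy = average_cross_entropy(scored)
print(f"{entropy:.3f} bits per word")
```

Lower is better: a grammar that assigns higher probability to grammatical sentences gets a lower cross-entropy, while under the zero-probability assumption a single sentence it cannot parse at all dominates the score.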
The team with the best grammar will get a small prize in a lecture following the exercise. Depending on the financial situation of the lecturers at the time, there might be second- and third-place prizes. No grade-affecting marking is involved in this assignment.
Acknowledgements
The code used in this task is based on the competitive grammar writing task code written by Anoop Sarkar (here), based on the following paper:
Jason Eisner and Noah A. Smith. Competitive Grammar Writing. In Proceedings of the ACL Workshop on Issues in Teaching Computational Linguistics, pages 97-105, Columbus, OH, June 2008.