Informatics 2A: Competitive Grammar Writing Exercise
The exercise will take place in Lecture 24, on November 12, 2018. Class that day could potentially take two hours (it is your decision whether to stay for a second hour).
There will be a special hands-on lecture in which you will design and develop a grammar for English as part of a Competitive Grammar Writing task. The goal of this exercise is to experiment with the ambiguity that is present in natural language, and to understand how complicated it is to model natural language syntax compared with the programming language syntax you learned about earlier in the class.
Preparation before the lecture:
- You may team up or work alone. A team can have no more than 3 people.
- Members of the team should download the file cgw_inf2a.tgz to a machine from which they will work during the lecture. (The tarball can be extracted by typing: tar xvfz cgw_inf2a.tgz. A directory called cgw should be created.)
- Note that you need NLTK to be able to run this code in Python. DICE machines have it, but if you want to run this on your own laptop, you should install it.
- Please carefully read the INF2A-README.txt (first) and README.md (second) in the .tgz file and make sure the code runs properly on a machine of your choice. The better you understand the tools available to you, the more efficient you will be during the lecture in creating your grammar. You are more than welcome to start thinking about how to create your grammar prior to the lecture (either alone or with the rest of your team).
- Please remember that at least ONE team member should bring their laptop that day to lecture!
During the lecture:
Come prepared to the lecture! If you feel comfortable with the exercise, you are more than welcome to work on your grammar at any point before the lecture.
- Each team will sit together and develop a grammar for English. The goal is to have a grammar that has (1) high coverage, meaning it can generate many sentences in English; and (2) high precision, meaning it should avoid generating ungrammatical sentences as much as possible.
- The .tgz file gives different tools to test and play with your grammar while developing it. You should use these tools.
- More instructions will be given in the beginning of the lecture.
- The lecturer will try to answer any questions team members might have during the class. However, each team is expected to work independently as much as possible to make the most of this exercise.
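To make the tension between coverage and precision concrete, here is a minimal sketch of a weighted context-free grammar and a random sentence generator, written in plain Python. The rules, the weights, the vocabulary and the `generate` function are all made up for illustration; this is not the grammar file format or the generator shipped in the .tgz file, so consult the READMEs for the real tools.

```python
import random

# Hypothetical toy grammar: each nonterminal maps to a list of
# (weight, right-hand side) pairs. Symbols without an entry are words.
# This is NOT the grammar format used by the cgw tools.
TOY_GRAMMAR = {
    "S": [(1.0, ["NP", "VP"])],
    "NP": [(0.7, ["Det", "N"]), (0.3, ["Det", "N", "PP"])],
    "VP": [(0.6, ["V", "NP"]), (0.4, ["V"])],
    "PP": [(1.0, ["P", "NP"])],
    "Det": [(1.0, ["the"])],
    "N": [(0.5, ["knight"]), (0.5, ["castle"])],
    "V": [(0.5, ["rides"]), (0.5, ["sleeps"])],
    "P": [(1.0, ["near"])],
}


def generate(symbol, rng, grammar=TOY_GRAMMAR):
    """Expand `symbol` top-down, choosing each rule in proportion to its weight."""
    if symbol not in grammar:
        return [symbol]  # terminal: an actual word
    weights = [weight for weight, _ in grammar[symbol]]
    expansions = [rhs for _, rhs in grammar[symbol]]
    rhs = rng.choices(expansions, weights=weights, k=1)[0]
    words = []
    for child in rhs:
        words.extend(generate(child, rng, grammar))
    return words


rng = random.Random(0)  # fixed seed so that runs are repeatable
samples = [" ".join(generate("S", rng)) for _ in range(5)]
for sentence in samples:
    print(sentence)
```

This toy grammar can produce "the knight sleeps the castle", because the rule VP -> V NP does not distinguish transitive from intransitive verbs. Splitting V into two categories would raise precision; adding more rules and words would raise coverage, usually at the risk of new ungrammatical output.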
After the lecture:
Evaluation of grammars:
We will evaluate your grammars by doing the following:
- We will ask humans to evaluate whether the sentences that your grammars generated are grammatical or not.
- We will try to parse the pool of grammatical sentences (perhaps coupled with other grammatical sentences) using your grammar and measure the average cross-entropy that your grammar assigns to these grammatical sentences.
This means that you want to generate grammatical sentences as much as possible, while making them complex so that other grammars cannot parse them.
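As a rough illustration of the cross-entropy score, the sketch below assumes it is measured in bits per word: the negative base-2 logarithm of the probability the grammar assigns to each sentence, summed over the pool and divided by the total number of words. The function name and the treatment of unparseable sentences are assumptions made for this example; the actual evaluation scripts may compute the score differently.

```python
import math


def cross_entropy_bits_per_word(sentence_probs, sentence_lengths):
    """Average negative log2 probability per word over a pool of sentences.

    sentence_probs[i] is the probability a grammar assigns to sentence i;
    sentence_lengths[i] is its length in words. Lower scores are better.
    Assumption of this sketch: a sentence the grammar cannot parse has
    probability 0, which makes the score infinite.
    """
    total_log_prob = 0.0
    for prob in sentence_probs:
        if prob <= 0.0:
            return math.inf
        total_log_prob += math.log2(prob)
    return -total_log_prob / sum(sentence_lengths)


# Two sentences of 3 and 5 words with probabilities 1/8 and 1/32:
# -(log2(1/8) + log2(1/32)) / (3 + 5) = (3 + 5) / 8 = 1.0 bit per word.
print(cross_entropy_bits_per_word([1 / 8, 1 / 32], [3, 5]))
```

Under this definition, a grammar does well when it gives high probability to the grammatical sentences in the pool, including those generated by other teams.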
The team with the best grammar will get a small prize in the lecture on XX November 2017. Depending on the financial situation of the lecturers at the time, there might be second- and third-place prizes. No grade-affecting marking is involved in this assignment.
Acknowledgements
The code used in this task is based on the competitive grammar writing task code written by Anoop Sarkar (here), based on the following paper:
Jason Eisner and Noah A. Smith. Competitive Grammar Writing. In Proceedings of the ACL Workshop on Issues in Teaching Computational Linguistics, pages 97-105, Columbus, OH, June 2008.