Running and editing OpenCCG grammars for generation

1. Downloading an OpenCCG grammar

Create a directory in your home directory to keep all your code for the NLG course. For example, you could call it nlg. From inside this directory, type the following command into a terminal window:

  
      svn checkout http://opennlg13.googlecode.com/svn/trunk/lab lab
    

This should create a subdirectory called lab containing five XML files. Move into this subdirectory and list them to make sure.
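If you would rather script that check than read the listing, here is a minimal sketch. The function name count_xml is hypothetical. It only counts regular files ending in .xml that sit directly inside the given directory.

```shell
# Print how many regular *.xml files sit directly inside a directory
# (the current directory if none is given).
count_xml() {
  dir=${1:-.}
  # -maxdepth 1 stops find from descending into subdirectories.
  find "$dir" -maxdepth 1 -type f -name '*.xml' | wc -l | tr -d ' '
}
```

For example, running `count_xml lab` from inside your nlg directory should print 5 if the checkout worked.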

2. Setting the relevant environment variables

Before you start, you will need to set three environment variables, so that your machine knows where to find the OpenCCG code. Type the following commands (carefully!) into your shell:

      export JAVA_HOME=/usr

      export OPENCCG_HOME=/group/ltg/projects/lcontrib/lib/openccg

      export PATH="$PATH:$OPENCCG_HOME/bin"
    

Note that these environment variables are local to the shell you are currently working in, and will be lost when you close it. If you want, and know how, you can add them to your .bashrc file so that every new shell sets them.
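One way to make the settings persistent, and then confirm that the shell can find tccg, is sketched below. It assumes you use bash (so that ~/.bashrc is read by new interactive shells) and that the paths above are right for your machine. Both function names, persist_openccg_env and check_openccg_env, are hypothetical. The marker comment is there so that running the function twice does not append the lines twice.

```shell
# Append the OpenCCG settings to a startup file (default: ~/.bashrc), once only.
persist_openccg_env() {
  rc=${1:-$HOME/.bashrc}
  marker='# OpenCCG settings for the NLG lab'
  # Skip if a previous run already added the block.
  if [ -f "$rc" ] && grep -qF "$marker" "$rc"; then
    echo "OpenCCG settings already present in $rc"
    return 0
  fi
  # The quoted 'EOF' keeps $PATH and $OPENCCG_HOME literal, so they are
  # expanded when the startup file is read, not now.
  cat >> "$rc" <<'EOF'
# OpenCCG settings for the NLG lab
export JAVA_HOME=/usr
export OPENCCG_HOME=/group/ltg/projects/lcontrib/lib/openccg
export PATH="$PATH:$OPENCCG_HOME/bin"
EOF
  echo "Added OpenCCG settings to $rc"
}

# Report whether the current shell sees both variables and the tccg script.
check_openccg_env() {
  for var in JAVA_HOME OPENCCG_HOME; do
    # Indirect lookup: sets val to the value of the variable named in $var.
    eval "val=\${$var:-}"
    if [ -z "$val" ]; then
      echo "$var is not set" >&2
      return 1
    fi
  done
  if ! command -v tccg >/dev/null 2>&1; then
    echo "tccg is not on your PATH" >&2
    return 1
  fi
  echo "OpenCCG environment looks OK"
}
```

After `persist_openccg_env`, either open a new shell or run `. ~/.bashrc`, then `check_openccg_env`.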

3. Running the grammar

You are now ready to start using the grammar to parse and generate sentences. Make sure you are inside the lab directory, and start up the OpenCCG interactive interpreter as follows:

      tccg
    

You should see a statement telling you that grammar 'lab' has successfully loaded, and you should also see the prompt tccg>, telling you that the interpreter is awaiting input. If you type in a string of words, the grammar will attempt to parse it. Type in the following strings and see what happens:

      tccg> Giovanni's rocks

      tccg> Giovanni's serves some cheap Italian food

      tccg> Giovanni's rocks some cheap Italian food  
    

In each case, do you understand what the tccg output is telling you?

We can get tccg to show us CCG derivations for grammatical strings. Type the following into the tccg interpreter:

      tccg> :derivs

      tccg> Giovanni's rocks

      tccg> Giovanni's serves some cheap Italian food
    

Exercise - On a sheet of paper, write down the second derivation that tccg shows you (i.e. for the input string Giovanni's serves some cheap Italian food), in the standard format we saw in the lectures, with the words along the top, and the derivation building from top to bottom.

We can also get tccg to show us semantic representations for grammatical strings, in the format of hierarchical formulas of hybrid logic. Type the following into the tccg interpreter:

      tccg> :sem

      tccg> Giovanni's rocks

      tccg> Giovanni's serves some cheap Italian food
    

Exercise - Draw the graph which is described by the hybrid logic representation of the second sentence.

Finally, we can also use the tccg interpreter to "regenerate" from the semantic representation of the last sentence parsed: the program applies the chart realisation algorithm to find every string which realises that input. Type the following at the tccg> prompt to see regeneration in action:

      tccg> Giovanni's rocks
      
      tccg> :r

      tccg> Giovanni's serves some cheap Italian food

      tccg> :r
    

How many realisations did you get in each case? Was this what you expected?

4. Examining the grammar source files

As we said earlier, the OpenCCG grammar we are playing with here consists of five XML files in the lab directory. Of these files, the following two are particularly important: morph.xml and lexicon.xml.

Open these two files in your favourite text editor, for example in emacs. For the moment, just examine the files closely, especially the way the lexical information is distributed across the two files. Ignore any parts where the XML has been commented out, i.e. bits between <!-- and -->.
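To get a quick overview of morph.xml before writing out the lexicon, you could list the word form and part of speech of every entry that has not been commented out. This is a sketch under stated assumptions: each entry is a single-line element of the form <entry word="..." pos="..." .../>, and comments are delimited by <!-- and -->. Check both against the actual file. The function names strip_xml_comments and list_morph_entries are hypothetical.

```shell
# Remove XML comments (<!-- ... -->), including ones spanning several lines.
# Line structure is kept, so line numbers still match the original file.
strip_xml_comments() {
  awk '
    {
      rest = $0; out = ""
      while (length(rest) > 0) {
        if (incomment) {
          i = index(rest, "-->")
          if (i == 0) { rest = "" }
          else { rest = substr(rest, i + 3); incomment = 0 }
        } else {
          i = index(rest, "<!--")
          if (i == 0) { out = out rest; rest = "" }
          else { out = out substr(rest, 1, i - 1); rest = substr(rest, i + 4); incomment = 1 }
        }
      }
      print out
    }
  ' "$@"
}

# Print "word<TAB>pos" for each live <entry> in a morph file.
# Assumes one <entry .../> element per line.
list_morph_entries() {
  strip_xml_comments "${1:-morph.xml}" | grep '<entry' |
    while IFS= read -r line; do
      word=$(printf '%s\n' "$line" | sed -n 's/.*[[:space:]]word="\([^"]*\)".*/\1/p')
      pos=$(printf '%s\n' "$line" | sed -n 's/.*[[:space:]]pos="\([^"]*\)".*/\1/p')
      printf '%s\t%s\n' "$word" "$pos"
    done
}
```

Run it from inside the lab directory as `list_morph_entries morph.xml`. The category information still has to be read from lexicon.xml by hand.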

On a sheet of paper, write down the lab lexicon in the traditional CCG format, i.e. as a list of words paired with categories (including their semantics, as sets of elementary predications, or EPs). As an example, your entry for the verb "rocks" should look like this:

      rocks :- Se\NPx : @e rock, @e <THEME> x
    

Pay particular attention to the family containing the category information for the determiner "some". What is different about this family from the others? Why do you think this is?

5. Testbeds

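Now, open the testbed.xml file in your text editor and have a quick look at the contents. This file contains a test suite of two kinds of sentence: grammatical sentences, which the grammar should be able to parse and regenerate, and ungrammatical ones, which it should fail to parse.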

Exit from the tccg interpreter (using :q) and type the following into the shell:

      ccg-test
    

When you do this, OpenCCG will attempt to parse every sentence listed and then regenerate from the resulting semantic representation. If the result is "OK" in every case, the grammar parses and regenerates all the sentences we want it to, and none of the ones we don't.
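Before running ccg-test, it can help to know how many items the testbed expects to succeed and how many it expects to fail. The sketch below assumes that each test item sits on one line as <item numOfParses="N" string="..."/>, with numOfParses="0" marking a sentence that should not parse. Treat that as an assumption to check against your testbed.xml. The function name summarise_testbed is hypothetical, and it simply counts lines, so commented-out items are counted too.

```shell
# Count testbed items, treating numOfParses="0" as "should not parse".
# Assumes one <item .../> element per line; comments are not skipped.
summarise_testbed() {
  file=${1:-testbed.xml}
  total=$(grep -c '<item[[:space:]]' "$file")
  bad=$(grep '<item[[:space:]]' "$file" | grep -c 'numOfParses="0"')
  echo "items: $total, expected to parse: $((total - bad)), expected to fail: $bad"
}
```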

6. Adding lexical entries

As it stands, our grammar is rather limited. It can only talk about one restaurant, Giovanni's, and can only make statements about the overall quality of the restaurant itself and the price and nationality of its food. Your task in this exercise is to extend the lexicon by adding the following to the morph.xml file (you shouldn't need to touch the lexicon.xml file yet):

  1. the names of two restaurants of your choice
  2. two adjectives denoting nationalities
  3. two adjectives denoting properties that can be predicated of the food served in restaurants
  4. an adjective which means the same thing as "cheap" - make sure you set the stem attribute to "cheap" so that the realiser knows they are synonyms
  5. a synonym for "food" of your choice, at your preferred level of formality (e.g. "fare", "nosh", "grub", ...)
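For item 4, a quick way to confirm that your new adjective shares a stem with "cheap" is to list every word form whose stem matches. This is a sketch that assumes one entry per line, with a word attribute and an optional stem attribute. It also assumes that an entry without a stem attribute uses its word form as its stem. It does not skip commented-out entries. The function name words_for_stem is hypothetical.

```shell
# List the word forms in a morph file whose stem is the given one.
# Assumes one <entry .../> per line; an entry with no stem attribute is
# treated as having its word form as its stem. Comments are not skipped.
words_for_stem() {
  stem=$1
  file=${2:-morph.xml}
  grep '<entry' "$file" | while IFS= read -r line; do
    word=$(printf '%s\n' "$line" | sed -n 's/.*[[:space:]]word="\([^"]*\)".*/\1/p')
    s=$(printf '%s\n' "$line" | sed -n 's/.*[[:space:]]stem="\([^"]*\)".*/\1/p')
    if [ -z "$s" ]; then
      s=$word
    fi
    if [ "$s" = "$stem" ]; then
      printf '%s\n' "$word"
    fi
  done
}
```

For example, `words_for_stem cheap morph.xml` should print "cheap" together with your new synonym.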

In each case, make sure the interpreter can parse and regenerate from example sentences in the way you want and expect. To do this, you will need to reload the grammar after every edit by exiting tccg with :q and then restarting it. Add appropriate sentences to the testbed file, including ungrammatical ones, and test the grammar using ccg-test.
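A stray or unbalanced comment marker produces malformed XML, which is likely to stop the grammar from loading. It may therefore save time to check that the files are well-formed before restarting tccg. The sketch below relies on xmllint from libxml2, which may not be installed on your machine. If xmllint is missing, the function skips the check. The function name check_grammar_xml is hypothetical.

```shell
# Check that grammar files are well-formed XML (default: every *.xml here).
# Returns non-zero if any file is malformed, so it can gate a restart of tccg.
check_grammar_xml() {
  if ! command -v xmllint >/dev/null 2>&1; then
    echo "xmllint not found; skipping the well-formedness check" >&2
    return 0
  fi
  if [ "$#" -eq 0 ]; then
    set -- ./*.xml
  fi
  status=0
  for f in "$@"; do
    # --noout: parse only, print nothing unless there is an error.
    if xmllint --noout "$f"; then
      echo "ok: $f"
    else
      echo "MALFORMED: $f" >&2
      status=1
    fi
  done
  return "$status"
}
```

From inside the lab directory, `check_grammar_xml && tccg` restarts the interpreter only if every file parses.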

Finally, add entries to the morph.xml file, so that you can parse and regenerate from "Giovanni's plays some funky music". Again, add appropriate sentences to the testbed file, and run ccg-test.

7. Unary rules

One of the ways in which OpenCCG extends the CCG formalism proper is that OpenCCG allows us to define arbitrary unary rules, in addition to the CCG rules of application, type raising and composition.

Open up the file rules.xml in your text editor, and uncomment the unary rule specified there. Take a close look at this rule, and try to work out what it does and how it will allow us to generate a wider range of sentences.
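Several steps in this lab ask you to uncomment material in rules.xml, morph.xml or lexicon.xml. To find the commented-out regions quickly, you could print their line ranges. This is a line-based sketch. It assumes that no single line closes one comment and opens another. The function name commented_ranges is hypothetical.

```shell
# Print the line range of every XML comment in the given files.
# Line-based: assumes no line closes one comment and opens another.
commented_ranges() {
  for f in "$@"; do
    awk -v name="$f" '
      !inside && index($0, "<!--") { inside = 1; start = NR }
      inside && index($0, "-->") { print name ": lines " start "-" NR; inside = 0 }
    ' "$f"
  done
}
```

For example, `commented_ranges rules.xml morph.xml lexicon.xml` gives the line numbers to jump to in your editor.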

Reload the grammar. Type in the following at the prompt:

      tccg> Giovanni's serves some cheap Italian food

      tccg> :r
    

Do you notice anything different from before? Make the relevant changes to the testbed file.

Can you think of any future problems this unary rule might give us, if we keep extending our little grammar? How might we go about solving these problems?

8. NP conjunction

In this part of the lab, we are going to see how we can get the grammar to generate sentences with conjoined subject NPs, i.e. sentences of the form "Giovanni's and Dario's rock" or "Giovanni's and Dario's serve cheap food".

Add the restaurant name "Dario's" to morph.xml. Uncomment both the lexical entry for the conjunction "and" in morph.xml and the lexical family for conjunctions in lexicon.xml. On a sheet of paper, write out the lexical entry for "and" in the traditional format, including its semantic representation.

Reload the grammar, and try out the following in tccg:

      tccg> Dario's and Giovanni's

      tccg> :r
    

Is the result of regeneration what you would expect?

Now, uncomment the plural verb forms "rock" and "serve" in morph.xml. Reload the grammar, and try out the following:

      tccg> Giovanni's and Dario's rock

      tccg> Giovanni's and Dario's serve cheap Italian food
    

In each case, draw out the semantic graph described by the hybrid logic formula.

Now let's try regeneration:

      tccg> Giovanni's rocks

      tccg> :r

      tccg> Giovanni's and Dario's rock

      tccg> :r

      tccg> Dario's serves cheap Italian food

      tccg> :r

      tccg> Giovanni's and Dario's serve cheap Italian food

      tccg> :r
    

Are you happy with the results of regeneration here? If not, why not?

9. Subject/verb agreement (1)

Uncomment the two "macros" listed in the morph.xml file. Add the appropriate one of the following attributes to the lexical entries for the intransitive and transitive verb forms "rock", "rocks", "serve" and "serves" (and why not "play" and "plays" while you are at it):

      macros="@singular"

      macros="@plural"
    

Now reload the grammar and try out the following again to see if this has solved our problem:

      tccg> Giovanni's rocks

      tccg> :r

      tccg> Giovanni's and Dario's rock

      tccg> :r

      tccg> Dario's serves cheap Italian food

      tccg> :r

      tccg> Giovanni's and Dario's serve cheap Italian food

      tccg> :r
    

Is this better?
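If the results are still not what you expect, one thing worth checking is that every verb form actually received a macros attribute. This is a sketch that assumes one entry per line, with the word form in a word attribute. It does not skip commented-out lines. The function name missing_macros is hypothetical.

```shell
# For each word form given, report whether the morph file has an entry for it
# and whether every such entry carries a macros attribute.
# Assumes one <entry .../> per line; commented-out entries are not skipped.
missing_macros() {
  file=$1
  shift
  for w in "$@"; do
    # -F: match the attribute literally, so word="rock" does not match "rocks".
    entries=$(grep '<entry' "$file" | grep -F "word=\"$w\"")
    if [ -z "$entries" ]; then
      echo "$w: no entry found"
    elif printf '%s\n' "$entries" | grep -qv 'macros='; then
      echo "$w: entry without macros"
    fi
  done
}
```

For example, `missing_macros morph.xml rock rocks serve serves play plays` prints nothing when every listed form has an entry and all of those entries carry a macros attribute.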

To understand how the feature macros interact with the lexical entries and families to create fully fleshed out lexical entries, try the following:

      tccg> :feats

      tccg> rocks

      tccg> rock

      tccg> serves

      tccg> serve
    

10. Subject/verb agreement (2)

Uncomment the feat element in the proper noun family in lexicon.xml. What effect do you think that this will have on the sentences generated by our grammar?

Reload the grammar and try out the following again to see if this has solved our problem:

      tccg> Giovanni's rocks

      tccg> :r

      tccg> Giovanni's and Dario's rock

      tccg> :r

      tccg> Dario's serves cheap Italian food

      tccg> :r

      tccg> Giovanni's and Dario's serve cheap Italian food

      tccg> :r
    

Does this solve the problem? Is it at least better than before?

11. Subject/verb agreement (3)

Edit the lexical family for conjunction so as to get the right generation results for the following tests:

      tccg> Giovanni's and Dario's rock

      tccg> :r

      tccg> Giovanni's and Dario's serve cheap Italian food

      tccg> :r
    

Finally, update the testbed file to reflect the final state of your grammar, remembering to include ungrammatical examples too.
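If you would rather append test items from the shell than edit testbed.xml by hand, here is a sketch. It assumes that the root element closes with </regression> and that items take the form <item numOfParses="N" string="..."/>. Both are assumptions to check against your file. The function name add_testbed_item is hypothetical. It does not escape XML special characters such as & or " in the sentence.

```shell
# Insert <item numOfParses="N" string="SENTENCE"/> just before </regression>.
# Usage: add_testbed_item FILE SENTENCE [NUM_PARSES]  (default 1)
# Does not escape XML special characters in SENTENCE.
add_testbed_item() {
  file=$1
  sentence=$2
  parses=${3:-1}
  scratch=$(mktemp) || return 1
  awk -v s="$sentence" -v n="$parses" '
    /<\/regression>/ { printf "  <item numOfParses=\"%s\" string=\"%s\"/>\n", n, s }
    { print }
  ' "$file" > "$scratch" && cat "$scratch" > "$file"
  # Writing back with cat (rather than mv) keeps the original file permissions.
  rm -f "$scratch"
}
```

For example, `add_testbed_item testbed.xml "Giovanni's and Dario's rocks" 0` records an ungrammatical sentence that the grammar should reject.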

12. Coda

If you finish early, play around with the grammar to try to make it do something interesting. If you get stuck, ask one of the OpenCCG experts on hand.