This lab is based on the NLTK-Lite chunking tutorial.
Start an interactive Python session from the command line, and enter the following statements (or put them in a file and execute them):
>>> from nltk_lite.corpora import conll2000
>>> from itertools import islice
>>> from nltk_lite import parse
>>>
>>> tagged = conll2000.tagged('train')         # get a tagged version of training corpus
>>> taggedsample = list(islice(tagged,10,13))  # make a list of 3 sentences
>>>
>>> rule = parse.ChunkRule('<DT>*<JJ>*<NN>+', "Chunk a sequence of DT, JJ and NN")
>>> chp = parse.RegexpChunk([rule], chunk_node = 'NP', top_node='S')
>>>
>>> chunk_tree = chp.parse(taggedsample[0], trace=1)
>>> print chunk_tree
Now have a look at the chunked version of the data, and compare it with the output of your rule:
>>> chunked = conll2000.chunked('train')  # get a chunked version of training corpus
>>> chunkedsample = list(islice(chunked,10,13))
>>> print chunkedsample
Try to refine the rule above, or add further rules, so as to improve your coverage of NP chunks.
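To get a feel for what a tag pattern such as '<DT>*<JJ>*<NN>+' will and will not match, it can help to emulate it outside the toolkit. The following is only a rough sketch using Python's standard re module, not the NLTK-Lite implementation: the function names tag_pattern_to_regex and find_chunks and the example sentences are made up for illustration, and the sketch assumes that the pattern contains only literal tag names (no wildcards inside the angle brackets).

```python
import re

def tag_pattern_to_regex(tag_pattern):
    """Turn a tag pattern like '<DT>*<JJ>*<NN>+' into a regular expression
    over a string of bracketed tags such as '<DT><JJ><NN>'.
    Hypothetical helper; assumes tags are literal names."""
    # Wrap each <TAG> in a non-capturing group so that *, + and ? apply to
    # the whole tag, and escape the tag name (e.g. the $ in PRP$).
    return re.compile(re.sub(r'<([^<>]+)>',
                             lambda m: '(?:<%s>)' % re.escape(m.group(1)),
                             tag_pattern))

def find_chunks(tagged_sentence, tag_pattern):
    """Return the (start, end) token spans matched by the tag pattern."""
    regex = tag_pattern_to_regex(tag_pattern)
    tags = ['<%s>' % tag for (_word, tag) in tagged_sentence]
    tag_string = ''.join(tags)
    # Map character offsets in tag_string back to token positions.
    offsets = {}
    position = 0
    for index, tag in enumerate(tags):
        offsets[position] = index
        position += len(tag)
    offsets[position] = len(tags)
    spans = []
    for match in regex.finditer(tag_string):
        if match.end() > match.start():   # ignore empty matches
            spans.append((offsets[match.start()], offsets[match.end()]))
    return spans

# Made-up example sentences, not taken from the CoNLL-2000 corpus.
sentence = [('the', 'DT'), ('little', 'JJ'), ('cat', 'NN'), ('sat', 'VBD'),
            ('on', 'IN'), ('the', 'DT'), ('mat', 'NN')]
plural = [('the', 'DT'), ('cats', 'NNS')]

print(find_chunks(sentence, '<DT>*<JJ>*<NN>+'))   # [(0, 3), (5, 7)]
print(find_chunks(plural, '<DT>*<JJ>*<NN>+'))     # []
```

The second call returns no chunks because <NN> does not match the tag NNS in this sketch; gaps of this kind are worth looking for when you compare your output with the corpus chunks.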
You can try measuring how well your chunker does on the first sentence of the sample by using the NLTK-Lite chunk scorer.
>>> chunkscore = parse.ChunkScore()
>>> correct = chunkedsample[0]
>>> guess = chunk_tree
>>> chunkscore.score(correct, guess)
>>> print chunkscore
Your result should look something like this:
ChunkParse score:
    Precision:  33.3%
    Recall:     14.3%
    F-Measure:  20.0%
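To see how these three figures relate to one another, here is a small worked sketch. It assumes the usual definitions (precision = hits / guessed chunks, recall = hits / correct chunks, and a balanced F-measure equal to their harmonic mean), and it assumes counts of 1 hit, 3 guessed chunks and 7 correct chunks, which is the smallest combination consistent with the percentages above; the actual counts for your sentence may differ. The function name chunk_scores is made up for this illustration.

```python
def chunk_scores(num_correct, num_guessed, num_hits):
    """Return (precision, recall, f_measure) as fractions between 0 and 1.
    Hypothetical helper, not part of NLTK-Lite."""
    precision = num_hits / float(num_guessed) if num_guessed else 0.0
    recall = num_hits / float(num_correct) if num_correct else 0.0
    if precision + recall == 0:
        f_measure = 0.0
    else:
        # Balanced F-measure: harmonic mean of precision and recall.
        f_measure = 2 * precision * recall / (precision + recall)
    return precision, recall, f_measure

# Assumed counts: 7 chunks in the corpus sentence, 3 proposed by the
# chunker, of which 1 matches a corpus chunk exactly.
precision, recall, f_measure = chunk_scores(7, 3, 1)
print('Precision: %.1f%%' % (precision * 100))
print('Recall:    %.1f%%' % (recall * 100))
print('F-Measure: %.1f%%' % (f_measure * 100))
```

Because the F-measure is a harmonic mean, it is pulled towards the lower of the two values, so raising recall is the quickest way to improve this score.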
To compare the results of your chunker against the training data chunks in a more systematic manner, we should look at more of the data. We can also make things a bit simpler by using the leaves method to strip out the tree structure (i.e., the chunks) from the chunked training data:
>>> for correct in chunked:
...     guess = chp.parse(correct.leaves())
...     chunkscore.score(correct, guess)
...
>>> print chunkscore
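The loop above does two things per sentence: it flattens the corpus tree back into plain tagged tokens, and it adds that sentence's chunk counts to a running total. The sketch below imitates both steps on a toy representation, purely as an illustration. It does not use the NLTK-Lite tree class: here a chunked sentence is assumed to be a list whose items are either a (word, tag) pair or a list of such pairs (a chunk), a guessed chunk is assumed to count as a hit only if its span matches a corpus chunk exactly, and the names leaves, chunk_spans and accumulate are hypothetical.

```python
def leaves(chunked_sentence):
    """Flatten a chunked sentence into a plain list of (word, tag) pairs."""
    flat = []
    for item in chunked_sentence:
        if isinstance(item, list):   # a chunk: a list of (word, tag) pairs
            flat.extend(item)
        else:                        # a token outside any chunk
            flat.append(item)
    return flat

def chunk_spans(chunked_sentence):
    """Return the set of (start, end) token spans covered by chunks."""
    spans = set()
    position = 0
    for item in chunked_sentence:
        if isinstance(item, list):
            spans.add((position, position + len(item)))
            position += len(item)
        else:
            position += 1
    return spans

def accumulate(pairs):
    """Sum hits, guessed chunks and correct chunks over (correct, guess) pairs."""
    hits = guessed = correct = 0
    for correct_sentence, guess_sentence in pairs:
        correct_spans = chunk_spans(correct_sentence)
        guess_spans = chunk_spans(guess_sentence)
        hits += len(correct_spans & guess_spans)
        guessed += len(guess_spans)
        correct += len(correct_spans)
    return hits, guessed, correct

# Made-up data: two corpus sentences and two imperfect guesses.
correct1 = [[('the', 'DT'), ('cat', 'NN')], ('sat', 'VBD'), ('on', 'IN'),
            [('the', 'DT'), ('mat', 'NN')]]
guess1 = [[('the', 'DT'), ('cat', 'NN')], ('sat', 'VBD'), ('on', 'IN'),
          ('the', 'DT'), [('mat', 'NN')]]
correct2 = [[('dogs', 'NNS')], ('bark', 'VBP')]
guess2 = [('dogs', 'NNS'), ('bark', 'VBP')]

hits, guessed, correct = accumulate([(correct1, guess1), (correct2, guess2)])
print('hits=%d guessed=%d correct=%d' % (hits, guessed, correct))
print('Precision: %.1f%%' % (100.0 * hits / guessed))
print('Recall:    %.1f%%' % (100.0 * hits / correct))
```

Summing the counts before dividing means that longer sentences, which contain more chunks, weigh more heavily in the final figures than they would if per-sentence percentages were averaged.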