Lab 4
This lab is based on the NLTK-Lite tagging tutorial and the first lecture on PoS tagging.
Import the POS-tagged treebank text into NLTK-lite and answer the following questions:
flies
)
IN + DET + NN (e.g. in the lab).
The regular expression tagger (NN_CD_tagger) defined in the notes aims to identify cardinal numbers and tags everything else as NN. The performance of this tagger could be improved by extending the regular expressions used for tagging, e.g. by tagging any word that ends with s as a plural noun. Propose three new rules, plus the plural noun rule just mentioned, which could be used to tag unknown words based on the shape (e.g., suffixes or other formal properties) of the word.
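As a starting point, the idea of an ordered rule list can be sketched in plain Python without NLTK-lite. This is only an illustration, not the NN_CD_tagger from the notes: the function name regexp_tag, the PATTERNS list, and the exact number pattern are assumptions made for this sketch. It contains the cardinal number rule, the plural noun rule mentioned above (NNS is the Penn Treebank tag for plural nouns), and the NN default. The three new rules are left for you to add.

```python
import re

# Ordered (pattern, tag) pairs: the first pattern that matches a token wins,
# so more specific rules must come before the catch-all default.
# The number pattern below is an assumption for this sketch.
PATTERNS = [
    (re.compile(r"^-?[0-9]+(\.[0-9]+)?$"), "CD"),  # cardinal numbers, e.g. 3 or 1.5
    (re.compile(r".*s$"), "NNS"),                  # plural-noun rule: word ends with s
    (re.compile(r".*"), "NN"),                     # default: tag everything else as NN
]


def regexp_tag(tokens):
    """Return (token, tag) pairs using the first matching pattern for each token."""
    tagged = []
    for token in tokens:
        for pattern, tag in PATTERNS:
            if pattern.match(token):
                tagged.append((token, tag))
                break
    return tagged


print(regexp_tag(["3", "dogs", "chase", "1.5", "cat"]))
```

Because the rules are tried in order, a new suffix rule has to be inserted above the NN default or it will never fire. Note also that the plural rule overgenerates: it tags words such as "is" or "bus" as NNS, which is worth bearing in mind when you evaluate your own rules.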