Introduction to Computational Linguistics

Reading and Resources


The course will use the Natural Language Toolkit (NLTK-Lite), developed at the University of Pennsylvania by Steven Bird and Edward Loper as an open source project hosted on SourceForge. This year, we will be using version 0.6 of the toolkit. NLTK-Lite is provided as a Python package, so its modules can be imported into Python programs. For more details, including documentation, see the

Linguistic Corpus Resources on DICE

For general information about language and speech data on DICE, have a look at the corpora web page.

A variety of corpora are also included as part of the NLTK-Lite distribution, and can be found in /usr/share/nltk-data.
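
To see which corpora are installed, you can list the contents of that directory from Python. The following is a minimal sketch using only the standard library; the function name list_corpora is hypothetical and is not part of NLTK-Lite, and the sketch assumes that each entry directly under /usr/share/nltk-data corresponds to one corpus or data set, which may not hold on every installation.

```python
from pathlib import Path

# Directory named in the text above; adjust it if your installation differs.
NLTK_DATA_DIR = Path("/usr/share/nltk-data")


def list_corpora(data_dir=NLTK_DATA_DIR):
    """Return the sorted names of the entries directly under data_dir.

    Hypothetical helper for illustration only (not an NLTK-Lite function).
    Returns an empty list if the directory does not exist.
    """
    data_dir = Path(data_dir)
    if not data_dir.is_dir():
        return []
    return sorted(entry.name for entry in data_dir.iterdir())


if __name__ == "__main__":
    # Print one entry name per line.
    for name in list_corpora():
        print(name)
```

On a machine where the directory is missing, the function returns an empty list rather than raising an error, so the script simply prints nothing.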

Recommended textbook

Daniel Jurafsky and James H. Martin. Speech and Language Processing. Prentice-Hall, 2000. (Errata)

New and revised chapters are available online: Speech and Language Processing, 2nd Ed.

Python Resources

The following books and other resources are not required, but may prove useful as references for the programming portions of the course.

Other reading

