The main textbook for this course is Speech and Language Processing by Jurafsky and Martin. You may wish to purchase a copy of the 2nd (International) edition, which is relatively inexpensive, since we will need to refer to some chapters from that edition. However, where possible we will use chapters from the online draft of the 3rd edition, which contains more up-to-date content.
You are responsible for all material covered by the assigned reading, and many students find it useful to do the reading before the lecture.
In the schedule below, we use the following key to required readings:
In section references, section 0 refers to the introductory material that appears before section 1 of a chapter (so, for example, 4.0-4.3 means the chapter introduction plus sections 4.1 through 4.3).
In previous years some Informatics students have asked for more background reading on linguistics. A good place to start might be this text, which is available online through the University library:
Linguistic Fundamentals for Natural Language Processing: 100 Essentials from Morphology and Syntax by Emily M. Bender. Synthesis Lectures on Human Language Technologies, Vol. 6, No. 3, pages 1-184, June 2013.
Other optional readings are provided below for students who wish to learn more about a topic or about recent research related to it. Some of these papers may also give you ideas for your IRR review. Many of the optional readings assume mathematical or machine learning background beyond what is covered in this course, but you may still be able to grasp the general idea of a paper by reading the introduction and skimming the rest, even if you cannot follow all of the details.
In the schedule below, we use the following key to optional readings:
|Lecturer||Lecture||Materials||Required reading|
|SG||1. Course overview||slides, 2x2, video||JM2 Ch 1|
|SG||2. Morphology: Inflection, derivation, FSAs||slides, 2x2, video||JM2 Ch 2, 3.0-3.1, 3.9|
|SG||3. Morphology: Finite State Transducers, edit distance||slides, 2x2, video||JM2 3.2-3.8, 3.10-3.11, worked edit distance example|
|SG||4. Probability estimation and probabilistic models||slides, 2x2, video||Basic Probability Theory tutorial|
|SG||5. Language modelling: N-gram models, entropy||slides, 2x2, video||JM3 4.0-4.3|
|SG||6. Language modelling: smoothing||slides, 2x2, video||JM3 4.4-4.5|
|SG||7. Text Categorization: Naive Bayes models and evaluation||slides, 2x2, video||JM3 6.0-6.7|
|SG||8. Part-of-speech Tagging and HMMs||slides, 2x2, video||JM3 10.0-10.3 (high priority), 10.4, 10.7|
|SG||9. Algorithms for HMMs||slides, 2x2, video||JM3 Ch 9.0-9.2, 9.4 (high priority), rest of Ch 9|
Assignment: Language modelling. Issued 2 Oct, due 18 Oct.
|DG||10. Context-free grammar||slides||JM3 11.0-11.2, 11.5, 12.0-12.1|
|DG||11. English syntax||slides||JM3 11.3|
|AL||12. Parsing algorithms||slides, 2x2, video||JM2 13.0-13.1, JM3 12.0-12.2|
Tutorial: HMMs and tagging. Exercises. Solutions
|SG||13. Probabilistic grammars and parsing||slides, 2x2, video||JM3 11.4, 13.0-13.2 (high priority), 13.6.0, 13.8|
|SG||14. Exam preparation lecture||slides, 2x2, video|
|SG||15. Dependency grammar and parsing||slides, 2x2, video||JM3 14.0-14.4.0 (high priority)|
Lab: Recursive Descent Parser. html, pdf, Solutions
|SG||16. Dependency parsing (2), logistic regression and discriminative models (1)||slides, 2x2, video||JM3 14.4.1-2, 14.6, 7.0-7.2 (high priority), 7.6|
|SG||18. Logistic regression (2)||slides, 2x2, video||JM3 7.3, 7.4 (but not high priority)|
Tutorial: Syntax and parsing. Exercises. Solutions
Assignment: Grammars and parsing. Issued 23 Oct, due 8 Nov.
|AL||19. Lexical semantics: Word senses, relations, and semantic roles||slides, 2x2, video||JM3 17.0-17.4 (high priority), 17.5, 22.0-22.4|
|SG||20. Distributional semantics 1: co-occurrence and PMI, response to feedback||slides, 2x2, video, response slides||JM3 15.0-15.2.0, 15.3.0 (high priority), 15.5|
|FF||21. Guest research lecture: Federico Fancellu on The Alexa Challenge||slides, video|
Lab: Text classification and feature selection. html, pdf, Solutions
|DG||22. Distributional semantics 2: LSA and word2vec||slides||JM3 16.0-16.5|
|DG||23. Distributional semantics 3: Topic models||slides||Probabilistic topic models by David Blei|
|DG||24. Data, evaluation, and ethics 1||slides||SS2 1.1-1.4|
Tutorial: Logistic regression and lexical semantics. Exercises. Solutions
Assignment: Distributional similarity. Issued 10 Nov, due 27 Nov.
|DG||25. Data, evaluation, and ethics 2||slides||SS2 1.5,1.6,3.5|
|SG||26. Sentence semantics: Meaning Representation||slides, 2x2, video||JM2 17.0-17.3.4 (high priority), 17.3.5, 17.4.0|
|SG||27. Sentence semantics: Syntax/semantics interface||slides, 2x2, video||JM2 18.0-18.3.0 (high priority)|
Lab: Sentiment analysis on Twitter. html, pdf, Solutions
|DG||28. Data, evaluation, and ethics 3||slides||SS2 2.1-2.3|
|EvM||29. Guest research lecture: Image description||(Optional: see below)|
|WM||30. Arabic language challenges for language technologies|
Tutorial: Semantics. Exercises. Solutions