The main textbook for this course is Speech and Language Processing by Jurafsky and Martin. You can purchase a copy of the 2nd International edition, which is relatively inexpensive, and we will need to refer to some chapters from that edition. A few copies are also on reserve in the library. However, where possible we will be using the chapters from the online draft 3rd edition, which contains more up-to-date content.
You are responsible for all material covered by the assigned reading, and many students find it useful to do the reading before lecture. Past students have requested that we prioritize readings, so high priority readings are marked with a (*). If you are really short on time you should focus on these and return to the others later. But you are still expected to read everything eventually, and keeping up is still the best strategy!
In the schedule below, we use the following key to required readings:
In section references, section 0 refers to whatever introductory material comes before section 1.
In previous years some Informatics students have asked for more background reading on linguistics. A good place to start might be this text, which is available online through the University library:
Linguistic Fundamentals for Natural Language Processing: 100 Essentials from Morphology and Syntax by Emily M. Bender. Synthesis Lectures on Human Language Technologies, June 2013, Vol. 6, No. 3, Pages 1-184.
Other optional readings are provided below for students who wish to learn about each topic in more detail or about recent research related to it. Some of these papers may also give you ideas for your IRR review. Many of the optional readings assume additional mathematical or machine learning background beyond what is covered in this course, but you may be able to understand the general idea of these papers by reading the introduction and skimming the rest, even if you cannot understand all of the details.
In the schedule below, we use the following key to optional readings:
This schedule is subject to change! Check back frequently.
Videos of the lectures will be available through Learn, normally shortly after each lecture finishes.
Slides from last year are provided in advance where available. This year's slides will normally be posted the day before the lecture.
|SG||1. Course overview||slides, 2x2||JM2 Ch 1|
|SG||2. Morphology: Inflection, derivation, FSAs||slides, 2x2||JM2 2.0-2.1, 2.2 (*), 2.3-2.4, 3.0-3.1 (*)|
|SG||3. Morphology: Finite State Transducers, edit distance||slides, 2x2||JM2 3.2-3.7 (*), JM3 2.2-2.5 (except 2.4.3), worked edit distance example|
|SG||4. Probability estimation and probabilistic models||slides, 2x2||Basic Probability Theory tutorial|
|SG||5. Language modelling: N-gram models, entropy||slides, 2x2||JM3 3.0-3.3 (*)|
|SG||6. Language modelling: smoothing||slides, 2x2||JM3 3.4 (*), 3.5|
|SG||7. Text Categorization: Naive Bayes models and evaluation||slides, 2x2||JM3 4.0-4.3 (*), 4.4-4.6, 4.7 (*)|
|SG||8. Part-of-speech Tagging and HMMs||slides, 2x2||JM3 8.0-8.4.4 (*), 8.7|
|SG||9. Algorithms for HMMs||slides, 2x2||JM3 8.4.5-8.4.6 (*), JM2 6.3-6.5|
|HT||10. Context-free grammar||slides, 2x2||JM3 10.0-10.2 (*), 10.5|
|HT||11. English syntax||slides, 2x2||JM3 10.3.1-10.3.3 (*), glossary of categories|
|HT||12. More English syntax, agreement||slides, 2x2||JM3 10.3.4 (*)|
Tutorial: HMMs and tagging. Exercises. Solutions
|HT||13. Features, recursive descent and parsing as search||slides, 2x2||JM3 11.1, 11.2.2 (*), (Features, optional: JM2 15.1, 15.3)|
|HT||14. CKY parsing, probabilistic grammars, probabilistic parsing||slides, 2x2||JM3 10.4 (*), 11.2.3 (*), 12.1.0 (*), 12.8|
|SG||15. Probabilistic parsing, dependency syntax||slides, 2x2||JM3 12.1.-12.1.2, 12.2-12.6.0, 13.0-13.3 (*)|
Lab: Recursive Descent Parser. html, pdf, Solutions
|SG||16. Dependency parsing||slides, 2x2||JM3 13.4 (*), 13.6, 5.0-5.1 (*)|
|SG||17. Exam preparation, mid-semester feedback||slides, 2x2|
|SG||18. Logistic regression (1)||slides, 2x2||JM3 5.2-5.5, JM2 6.7|
Tutorial: Syntax and parsing. Exercises. Solutions
Assignment 2: CKY Parsing. Issued 20 Oct, due 5 Nov.
|SG||19. Logistic regression (2)||slides, 2x2||(Same as last time)|
|HT||20. Lexical semantics 1: Word senses, relations, disambiguation||slides, 2x2||JM3 6.0-1 (*), Appendix C.1-2 (*). JM3 C.4,5,8,9. JM2 19.1-5 and 20.1-2 contain similar material|
|HT||21. Lexical semantics 2: vector models, co-occurrence and PMI||slides, 2x2||JM3 6.3-4,7 (*), JM2 20.7.2 (*)|
Lab: Text classification and feature selection. html, pdf, Solutions
|HT||22. Lexical semantics 3: tf-idf, dense vectors for word embeddings||slides, 2x2||JM3 6.5, 6.8, 6.10 (*), 6.6|
|SG||23. Assignment 3 info||slides, 2x2|
|JW||24. Data, evaluation, and ethics 1||slides, 2x2|
Tutorial: Logistic regression and lexical semantics. Exercises. Solutions
|WM||Guest lecture: Arabic language processing||slides, 2x2|
|SG||26. Sentence semantics: Meaning Representation||slides, 2x2||JM3 14.0-14.3.4 (*), 14.3.5, 14.4.0|
|SG||27. Sentence semantics: Syntax/semantics interface||slides, 2x2||JM2 18.0-18.3.0 (*)|
Lab: Sentiment analysis on Twitter. html, pdf, Solutions
|HT||28. Data, evaluation, and ethics 2||slides, 2x2||JM3 4.8-4.9 (*)|
|HT||29. Data, evaluation, and ethics 3||slides, 2x2||(Optional: See below)|
|HT||30. Data, evaluation, and ethics 4||slides, 2x2||School research ethics procedure (*), Optional readings (see below)|
Tutorial: Semantics. Exercises. Solutions
Unless explicitly stated otherwise, all material is copyright © The University of Edinburgh