The main textbook for this course is Speech and Language Processing by Jurafsky and Martin. You may wish to purchase a copy of the 2nd (International) edition, which is relatively inexpensive, since we will need to refer to some chapters from that edition. However, where possible we will use chapters from the online draft of the 3rd edition, which contains more up-to-date content.
You are responsible for all material covered by the assigned reading, and many students find it useful to do the reading before the lecture.
In the schedule below, we use the following key to required readings:
In section references, section 0 refers to the introductory material that appears before section 1 of a chapter (so, for example, 4.0-4.3 means the chapter introduction plus sections 4.1 through 4.3).
In previous years some Informatics students have asked for more background reading on linguistics. A good place to start might be this text, which is available online through the University library:
Linguistic Fundamentals for Natural Language Processing: 100 Essentials from Morphology and Syntax by Emily M. Bender. Synthesis Lectures on Human Language Technologies, Vol. 6, No. 3, pages 1-184, June 2013.
Other optional readings are provided below for students who wish to learn more about a topic or about recent research related to it. Some of these papers may also give you ideas for your IRR review. Many of the optional readings assume mathematical or machine learning background beyond what is covered in this course, but you may still be able to grasp the general idea of a paper by reading the introduction and skimming the rest, even if you cannot follow all of the details.
In the schedule below, we use the following key to optional readings:
|Lecturer||Lecture||Materials||Required reading|
|SG||1. Course overview||slides, 2x2, video||JM2 Ch 1|
|SG||2. Morphology: Inflection, derivation, FSAs||slides, 2x2, video||JM2 Ch 2, 3.0-3.1, 3.9|
|SG||3. Morphology: Finite State Transducers, edit distance||slides, 2x2, video||JM2 3.2-3.8, 3.10-3.11, worked edit distance example|
|SG||4. Probability estimation and probabilistic models||slides, 2x2, video||Basic Probability Theory tutorial|
|SG||5. Language modelling: N-gram models, entropy||slides, 2x2, video||JM3 4.0-4.3|
|SG||6. Language modelling: smoothing||slides, 2x2, video||JM3 4.4-4.5|
|SG||7. Text Categorization: Naive Bayes models and evaluation||slides, 2x2, video||JM3 6.0-6.7|
|SG||8. Part-of-speech Tagging and HMMs||slides, 2x2, video||JM3 10.0-10.3 (high priority), 10.4, 10.7|
|SG||9. Algorithms for HMMs||slides, 2x2, video||JM3 Ch 9.0-9.2, 9.4 (high priority), rest of Ch 9|
Assignment: Language modelling. Issued 2 Oct, due 18 Oct.
|DG||10. Context-free grammar||slides||JM3 11.0-11.2, 11.5, 12.0-12.1|
|DG||11. English syntax||slides||JM3 11.3|
|AL||12. Parsing algorithms||slides, 2x2, video||JM2 13.0-13.1, JM3 12.0-12.2|
Tutorial: HMMs and tagging. Exercises. Solutions
|SG||13. Probabilistic grammars and parsing||slides, 2x2, video||JM3 11.4, 13.0-13.2 (high priority), 13.6.0, 13.8|
|SG||14. Exam preparation lecture||slides, 2x2, video|
|SG||15. Dependency grammar and parsing||slides, 2x2, video||JM3 14.0-14.4.0 (high priority)|
Lab: Recursive Descent Parser. html, pdf, Solutions
|SG||16. Dependency parsing (2), logistic regression and discriminative models (1)||slides, 2x2, video||JM3 14.4.1-2, 14.6, 7.0-7.2 (high priority), 7.6|
|SG||18. Logistic regression (2)||slides, 2x2, video||JM3 7.3, 7.4 (but not high priority)|
Tutorial: Syntax and parsing. Exercises. Solutions
Assignment: Grammars and parsing. Issued 23 Oct, due 8 Nov.
|AL||19. Lexical semantics: Word senses, relations, and semantic roles||slides, 2x2, video||JM3 17.0-17.4 (high priority), 17.5, 22.0-22.4|
|SG||20. Distributional semantics 1: co-occurrence and PMI, response to feedback||slides, 2x2, video, response slides||JM3 15.0-15.2.0, 15.3.0 (high priority), 15.5|
|FF||21. Guest research lecture: Federico Fancellu on The Alexa Challenge||slides, video|
Lab: Text classification and feature selection. html, pdf, Solutions
|DG||22. Distributional semantics 2: LSA and word2vec||slides||JM3 16.0-16.5|
|DG||23. Distributional semantics 3: Topic models||slides||Probabilistic topic models by David Blei|
|DG||24. Data, evaluation, and ethics 1||slides||SS2 1.1-1.4|
Tutorial: Logistic regression and lexical semantics. Exercises. Solutions
Assignment: Distributional similarity. Issued 10 Nov, due 27 Nov.
|DG||25. Data, evaluation, and ethics 2||slides||SS2 1.5,1.6,3.5|
|SG||26. Sentence semantics: Meaning Representation||slides, 2x2, video||JM2 17.0-17.3.4 (high priority), 17.3.5, 17.4.0|
|SG||27. Sentence semantics: Syntax/semantics interface||slides, 2x2, video||JM2 18.0-18.3.0 (high priority)|
Lab: Sentiment analysis on Twitter. html, pdf, Solutions
|DG||28. Data, evaluation, and ethics 3||slides||SS2 2.1-2.3|
|EvM||29. Guest research lecture: Image description||(Optional: see below)|
|WM||30. Arabic language challenges for language technologies|
Tutorial: Semantics. Exercises. Solutions