ANLP 2019 Schedule and materials

This year's schedule and materials will be available on Learn. Materials from the 2019 course are below.

Required reading

The main textbook for this course is Speech and Language Processing by Jurafsky and Martin. You can purchase a copy of the 2nd International edition, which is relatively inexpensive, and we will need to refer to some chapters from that edition. A few copies are also on reserve in the library. However, where possible we will be using the chapters from the online draft 3rd edition, which contains more up-to-date content.

You are responsible for all material covered by the assigned reading, and many students find that it is useful to do the reading before the lecture. Past students have requested that we prioritize readings, so high-priority readings are marked with a (*). If you are really short on time you should focus on these and return to others later. But you are still expected to read everything eventually, and keeping up is still the best strategy!

In the schedule below, we use the following key to required readings:

JM2 = Daniel Jurafsky and James H. Martin (2009). Speech and Language Processing (2nd Edition). Prentice Hall.
There are a few copies on reserve in the Library.
JM3 = Daniel Jurafsky and James H. Martin (2019). Speech and Language Processing (3rd Edition draft).

In section references, when I say section 0 it refers to whatever introductory material comes before section 1.

Optional reading

Linguistics background

In previous years some Informatics students have asked for more background reading on linguistics. A good place to start might be this text, which is available online through the University library:

Linguistic Fundamentals for Natural Language Processing: 100 Essentials from Morphology and Syntax by Emily M Bender. Synthesis Lectures on Human Language Technologies, June 2013, Vol. 6, No. 3, Pages 1-184.

Further mathematical details

Some students may want a more rigorous treatment of the models and machine learning methods we discuss. In that case I suggest the following textbook. It covers many of the same topics we do, but assumes somewhat more background and comfort with formal methods.

Introduction to Natural Language Processing by Jacob Eisenstein. MIT Press, 2019. (A draft version is available for free from the author's GitHub page here.)

Weekly optional readings

Other optional readings related to each week's topics are provided below for students who wish to learn more details, especially about recent research in the area. Some of these papers may also give you ideas for your IRR review. Many of the optional readings assume additional mathematical or machine learning background beyond what is covered in this course, but you may be able to understand the general idea of these papers by reading the introduction and skimming the rest, even if you cannot understand all of the details.

In the schedule below, we use the following key to optional readings:

(A): These readings provide more detail about the week's topics, without requiring much additional knowledge of machine learning or later parts of this course.
(B): The main concepts required to understand these readings are covered by the end of this course (though perhaps not all details).
(M): These readings assume significant mathematical or machine learning background beyond what is covered in this course.

Full Schedule

Videos of the lectures will be available through Learn, normally shortly after each lecture finishes.

Week 1, starting 16 Sep

Who?	Lecture topics	Slides	Reading
SG	1. Course overview	slides, 2x2	JM2 Ch 1
SG	2. Morphology: Inflection, derivation, FSAs	slides, 2x2	JM2/3 2.0-2.1 (*), JM2 2.2 (*); JM2 2.3-2.4, 3.0-3.1 (*)
SG	3. Morphology: Finite State Transducers, edit distance	slides, 2x2	JM2 3.2-3.7 (*); JM3 2.2-2.5 (except 2.4.3, 2.4.5); worked edit distance example

Lab:

UNIX tools for text processing.

Optional reading:

(M) Cotterell et al. (2016). A Joint Model of Orthography and Morphological Segmentation. An example of how the transducer concept is used in current NLP research, with a probabilistic model that learns both the spelling component and the morpheme segmentation component.
(B) Schone and Jurafsky (2001). Knowledge-Free Induction of Inflectional Morphologies. Uses relatively simple methods to combine multiple sources of information, aiming to learn morphological relationships from a corpus without annotation.
(B) Kirov et al. (2017). A Rich Morphological Tagger for English: Exploring the Cross-Linguistic Tradeoff Between Morphology and Syntax. Uses dependency syntax and neural models, which we will discuss later in this course.
(B) Faruqui et al. (2016). Morpho-syntactic Lexicon Generation Using Graph-based Semi-supervised Learning. Uses a graph-based method to predict morpho-syntactic information for large lexicons from small seed lexicons, for a variety of languages.

Week 2, starting 23 Sep

Who?	Lecture topics	Slides	Reading
SG	4. Probability estimation and probabilistic models	slides, 2x2	Basic Probability Theory tutorial
SG	5. Language modelling: N-gram models, entropy	slides, 2x2	JM3 3.0-3.3 (*)
SG	6. Language modelling: smoothing	slides, 2x2	JM3 3.4 (*), 3.5

Tutorial:

Probability and FSMs.

Optional reading:

(A, B) Neubig (2017). Neural Machine Translation and Sequence-to-sequence Models: A Tutorial. The beginning parts of this tutorial focus on things that are relevant for many parts of NLP, not just machine translation. In particular, Section 3 provides an alternative introduction to n-gram language models. Sections 4-5 present log-linear and neural network language models. We'll talk about log-linear models (but not for LMs) later in this course; neural net LMs are covered in depth in NLU+.
(A) Chen and Goodman (1999). An empirical study of smoothing techniques for language modeling. In-depth explanations of many different smoothing methods for n-gram models, with empirical comparisons.
(M) Teh (2006). A Hierarchical Bayesian Language Model based on Pitman-Yor Processes. Demonstrates the mathematical connection between Kneser-Ney smoothing and a hierarchical non-parametric Bayesian model. (Assumes significant mathematical background.)

Week 3, starting 30 Sep

Who?	Lecture topics	Slides	Reading
SG	7. Text Categorization: Naive Bayes models and evaluation	slides, 2x2	JM3 4.0-4.3 (*), 4.4-4.6, 4.7 (*)
SG	8. Part-of-speech Tagging and HMMs	slides, 2x2	JM3 8.0-8.4.4 (*), 8.7
SG	9. Algorithms for HMMs	slides, 2x2	JM3 8.4.5-8.4.6 (*), JM3 Appendix A.2-A.5

Lab:

Working with probability distributions.

Homework assignment:

Assignment 1 issued this week.

Optional reading:

(A) Garrette et al. (2017). Real-World Semi-Supervised Learning of POS-Taggers for Low-Resource Languages. This paper discusses some of the challenges of developing a POS tagger with as little data annotation as possible and studies what the best use of annotation time is. (It also demonstrates the use of an FST for morphology.)

Week 4, starting 7 Oct

Who?	Lecture topics	Slides	Reading
SG	10. Data, Evaluation, Implications (1): dialect and discrimination	slides, 2x2	Blodgett and O'Connor (2017).
SC	11. Syntax and Context-free grammar, ambiguity	slides, 2x2	JM3 12.0-12.2 (*), 12.5
SC	12. English syntax, agreement, parsing	slides, 2x2	JM3 12.3 (*), glossary of categories

Tutorial:

HMMs and tagging.

Optional reading (dialect and discrimination):

(A) Sap et al. (2019). The Risk of Racial Bias in Hate Speech Detection. Looks at how annotator bias can create algorithmic bias, and ways to mitigate the issue by making annotators aware of dialect differences.
(A/B) Nguyen et al. (2016). Computational Sociolinguistics: A Survey. Provides background and recent work in many aspects of computational sociolinguistics, for students who may wish to know more about the field or some aspect of it.

Optional reading (syntax):

(A) Linguistic Fundamentals for Natural Language Processing: 100 Essentials from Morphology and Syntax by Emily M Bender. Synthesis Lectures on Human Language Technologies, June 2013, Vol. 6, No. 3, Pages 1-184. [Available online from the University Library.]
(A) JM2 15.1, 15.3. These sections cover feature structures and unification, a way of dealing with agreement without grammar blowup. These ideas are at the heart of grammar formalisms such as Lexical-Functional Grammar and Head-driven Phrase Structure Grammar, which have a long history in computational linguistics and NLP.

Week 5, starting 14 Oct

Who?	Lecture topics	Slides	Reading
SC	13. Parsing as search: recursive descent and CKY	slides, 2x2	JM3 13.0-13.2 (*)
SC	14. Treebanks and statistical parsing	slides, 2x2	JM3 12.4, 14.0-14.3 (*), 14.4-14.6.0, 14.8
SC	15. Dependency syntax and parsing	slides, 2x2	JM3 15.0-15.3 (*)

Lab:

Recursive Descent Parser.

Optional reading:

(B) Chen and Manning (2014). A Fast and Accurate Dependency Parser using Neural Networks. An early paper from the recent rise of neural networks for NLP. We'll learn about neural networks next week.
(A) JM3 12.6, 14.7. These sections of the textbook describe Combinatory Categorial Grammar (CCG) and how to parse with it. CCG is another lexicalized grammar formalism, and is popular in NLP because it has very explicit ties between the syntax and semantics, making it convenient for semantic parsing. We will discuss semantic parsing later in the course, although sadly the JM3 chapter, which I believe will add discussion of semantic parsing with CCG, isn't ready yet! CCG is particularly popular in Edinburgh because it was invented by Edinburgh Prof Mark Steedman.

Week 6, starting 21 Oct

Who?	Lecture topics	Slides	Reading
SC	16. Dependency parsing and logistic regression	slides, 2x2	JM3 15.4 (*), 15.6, 5.0-5.1 (*)
SC	17. Exam preparation, mid-semester feedback	slides, 2x2
SC	18. Grammar writing exercise		Instructions on how to prepare for class

Tutorial:

Syntax and parsing.

Optional reading:

(A) Charniak and Johnson (2005). Coarse-to-Fine n-Best Parsing and MaxEnt Discriminative Reranking. This paper shows how to use a logistic regression model (also called a Maximum Entropy model, as in this paper) to rerank parses produced by a generative statistical parser, and discusses what features are helpful. Its results on the WSJ corpus were state of the art for many years.
(M) Choe and Charniak (2016). Parsing as Language Modeling. This paper presents a recent best performance on parsing WSJ. It uses ideas from neural network language models, which are covered in semester 2 NLP courses.

Week 7, starting 28 Oct

Who?	Lecture topics	Slides	Reading
SC	19. Logistic regression (cont)	slides, 2x2	JM3 5.2-5.5, JM2 6.7
SC	20. Lexical semantics 1: Word senses, relations, disambiguation	slides, 2x2	JM3 6.0-1 (*), Appendix C.0-2 (*). JM3 C.4-5, C.8-9.
SC	21. Lexical semantics 2: vector models, co-occurrence and PMI	slides, 2x2	JM3 6.2-4 (*), JM3 6.7 (*), JM2 20.7.2

Lab:

Text classification and feature selection.

Optional reading:

(A) JM3, Chapter 7 on Neural Networks.
(A) Haspelmath (2000). Semantic maps and cross-linguistic comparison. This chapter discusses polysemy from a cross-linguistic perspective. In particular, it looks at how different languages carve up the space of meanings, and whether this might tell us something about universal properties of language. It may also give you a better idea of why machine translation is difficult!

Week 8, starting 4 Nov

Who?	Lecture topics	Slides	Reading
SC	22. Lexical semantics 3: tf-idf, dense vectors for word embeddings	slides, 2x2	JM3 6.5, 6.8, 6.10 (*), 6.6
SG	23. Data, evaluation, implications (2): use and collection of human data, including social media and assignment 2	slides, 2x2	School research ethics procedure
SG	24. Data, evaluation, implications (3): evaluation, claims, and evidence	slides, 2x2	correlation, 2x2

Tutorial:

Homework assignment:

Assignment 3 issued this week. See Learn (Assessment and Exams) for materials and due date.

Optional reading:

(A) Steyvers and Griffiths (2007). Probabilistic topic models. Distributional semantic models and topic models have been extensively investigated not just in NLP, but also as models of human cognition. This paper provides a brief introduction to topic models as cognitive models. A much more thorough investigation can be found in Griffiths, Steyvers, and Tenenbaum (2007).
(A) Shoemark, Kirby, and Goldwater (2018). Inducing a lexicon of sociolinguistic variables from code-mixed text. This paper is just one example of other fun things you can do with word embeddings. We used the structure of the vector space to identify pairs of words on Twitter that have similar meanings in different dialects of English, such as 'revising' (Standard British) vs 'studying' (General American) or 'out' (Standard British) vs 'oot' (Scottish).

Week 9, starting 11 Nov

Who?	Lecture topics	Slides	Reading
JW	Guest lecture on ethical issues in NLP
SG	26. Sentence semantics: Meaning Representation	slides, 2x2	JM3 16.0-16.3.4 (*), 16.3.5, 16.4.0
SG	27. Sentence semantics: Syntax/semantics interface	slides, 2x2	JM2 18.0-18.3.0 (*)

Lab:

Sentiment analysis on Twitter.

Optional reading:

(A) Zettlemoyer and Collins (2005). Learning to Map Sentences to Logical Form: Structured Classification with Probabilistic Categorial Grammars. This is a seminal paper in the field of semantic parsing, showing how to learn meanings for individual words and the parameters of a probabilistic parsing model, given as input sentences paired with their full sentential meaning representations. It does this using a grammar formalism known as Combinatory Categorial Grammar (CCG), which is now widely used for scaled-up versions of the knowledge base query task addressed here. There is a good brief introduction to CCG in the paper.
(M) Reddy et al. (2014). Large-scale semantic parsing without question-answer pairs. One of several more recent papers that aim to scale up semantic parsing to large knowledge bases (here, Freebase) and reduce the need for supervision.

Week 10, starting 18 Nov

Who?	Lecture topics	Slides	Reading
SG	28. Coreference resolution	slides, 2x2	JM3 22.0-22.2 (*), 22.9
SG	29. Gender bias (esp in coreference)	slides, 2x2	JM3 22.10 (*), Zhao et al. (2019)
SG	no lecture

Tutorial:

Semantics.

Optional reading:

Sun, Tony, et al. (2019). Mitigating Gender Bias in Natural Language Processing: Literature Review.
Lots of papers on other ethics-related topics are now being published at the Ethics in NLP workshop, begun in 2017. Follow the links for a list of downloadable papers from 2017 or 2018.
