ANLP 2019 Course Revision Guide
Disclaimer:
This page provides a list of concepts you should be familiar with and questions you should be able to answer if you are thoroughly familiar with the material in the course. It is safe to assume that if you have a good grasp of everything listed here, you will do well on the exam. However, we do not guarantee that the exam will cover only the topics mentioned here. In a few cases (mainly formulas) we do specify what *will not* be required for the exam, but otherwise we make no guarantees.
Exam rubric
The exam rubric is as follows:
"All questions are compulsory. Different questions may have different numbers of total marks. Take note of this in allocating time to questions. Calculators may not be used in this examination."
Past papers
Previous versions of this course were called Advanced Natural Language Processing through the 2014-15 academic year. You can look up past exam papers if you wish.
However, there was a major revision of the course in 2014-15, and several additional updates since then (in particular, about 25% of the material changed in 2017, and around 10% in 2018). Exams prior to 2014-15 therefore include a lot of material that we no longer cover (and vice versa), and use a somewhat different question style. These are not recommended as study guides.
In comparison to delivery in 2014-2016, the main topics we have eliminated are active chart parsing, details of advanced smoothing methods, and nearly all topics related to discourse. The main topics we have added (some in 2017 and some in 2018) are dependency parsing and more material on distributional semantics, data analysis, experiments, evaluation, and ethical issues.
In working through papers from 2014-2018, you should not expect to be able to answer the following questions:
- 2014-15: 2(b)ii, 3(d)
- 2015-16: 2(b)ii-iii, 2(d)ii, 3(c)
- 2016-17: 2(b), 2(c)ii, 3(d)
- 2017-18: 2(f)
In addition, the rubric of the exam is different this year, as noted above.
Generative probabilistic models
We have discussed the following generative probabilistic models:
- N-Gram Language Model
- Hidden Markov Model
- Probabilistic Context-Free Grammar
- Naive Bayes classifier
For each of these, you should be able to
- describe the generative process and write down the associated formula for the joint probability of latent and observed variables.
- compute the probability of (say) a tag-word sequence, parse tree, or whatever the model describes (assuming you know the model parameters); see the worked sketch after this list.
- for the models with latent variables, compute the most probable [tag sequence/parse tree/class] for a particular input, hand-simulating any algorithms that might be needed (again assuming you know the model parameters).
- explain how the model is trained.
- give examples of tasks the model could be applied to, and how it would be applied.
- say what the model can and cannot capture about natural language, ideally giving examples of its failure modes.
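To make the probability-computation point concrete, here is a minimal hand-simulation sketch for a bigram HMM tagger. The tag set, sentence, and all probability values below are invented toy numbers for illustration, not taken from the course materials.

```python
# Minimal sketch: joint probability of a tag-word sequence under a bigram HMM.
# All probabilities are invented toy numbers, purely for illustration.

transition = {  # P(tag_i | tag_{i-1}), with <s> as the start-of-sentence symbol
    ("<s>", "DT"): 0.6, ("DT", "NN"): 0.7, ("NN", "VB"): 0.3,
}
emission = {  # P(word_i | tag_i)
    ("DT", "the"): 0.4, ("NN", "dog"): 0.05, ("VB", "runs"): 0.02,
}

def joint_prob(tags, words):
    """P(tags, words) = prod_i P(tag_i | tag_{i-1}) * P(word_i | tag_i)."""
    p, prev = 1.0, "<s>"
    for tag, word in zip(tags, words):
        p *= transition[(prev, tag)] * emission[(tag, word)]
        prev = tag
    return p

print(joint_prob(["DT", "NN", "VB"], ["the", "dog", "runs"]))
# (0.6 * 0.4) * (0.7 * 0.05) * (0.3 * 0.02) = 5.04e-05
```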
Discriminative probabilistic models
We have discussed the following discriminative probabilistic model:
- Logistic Regression/MaxEnt model
For this model, you should be able to
- understand the formula for computing the conditional probability of the hidden class given the observations/features, and be able to apply that formula if you are given an example problem with features and weights. We are not likely to ask you to write down the full formula yourself, but you must know it well enough to be able to answer questions like "which class is the most probable" (without us giving you the formula); see the sketch after this list.
- give examples of tasks the model could be applied to, and how it would be applied (e.g., what features might be useful).
- explain at a high level what training the model aims to achieve, and how it differs from training a generative model.
- explain the role of regularization, and identify situations in which it is most important.
- discuss the pros and cons of discriminative vs generative models.
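Here is a hedged sketch of the kind of calculation the first point describes: applying the MaxEnt formula to find the most probable class. The feature values and weights are assumptions invented for this example.

```python
import math

# Minimal MaxEnt/logistic regression sketch with invented features and weights:
# P(c | x) = exp(w . f(x, c)) / sum_c' exp(w . f(x, c')).

features = {"POS": [1.0, 0.0, 2.0],   # f(x, c) for each class c (toy values)
            "NEG": [0.0, 1.0, 2.0]}
weights = [0.5, -0.3, 0.1]            # one shared weight vector

def cond_prob(cls):
    scores = {c: math.exp(sum(w * f for w, f in zip(weights, fs)))
              for c, fs in features.items()}
    return scores[cls] / sum(scores.values())

print(max(features, key=cond_prob))   # "POS", since exp(0.7) > exp(-0.1)
```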
Other formulas
In addition to the equations for the models listed above, you should know the formulas for the following concepts and what they may be used for, and be able to apply them appropriately. Where relevant you should be able to discuss strengths and weaknesses of the associated method, and alternatives. A small worked sketch follows the list.
- Bayes' Rule (also: the definition of Conditional Probability, the Law of Total Probability a.k.a. the Sum Rule, and all other relevant formulas in the Basic Probability Theory reading)
- Noisy channel model
- Maximum Likelihood Estimation
- Add-One / Add-Alpha Smoothing
- Interpolation (for smoothing)
- Dot product, cosine similarity
- Pointwise mutual information
- Precision and recall
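As a worked illustration, here is a small sketch of three of these formulas applied to invented numbers (the vectors, probabilities, and counts are made up for the example):

```python
import math

def cosine(u, v):
    """Cosine similarity: dot(u, v) / (|u| * |v|)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = lambda x: math.sqrt(sum(a * a for a in x))
    return dot / (norm(u) * norm(v))

def pmi(p_xy, p_x, p_y):
    """Pointwise mutual information: log2(P(x, y) / (P(x) * P(y)))."""
    return math.log2(p_xy / (p_x * p_y))

def precision_recall(tp, fp, fn):
    """Precision = TP / (TP + FP); recall = TP / (TP + FN)."""
    return tp / (tp + fp), tp / (tp + fn)

print(cosine([1, 2, 0], [2, 1, 1]))  # 4 / sqrt(5 * 6) ~ 0.73
print(pmi(0.01, 0.05, 0.04))         # log2(5) ~ 2.32
print(precision_recall(8, 2, 4))     # (0.8, 0.667)
```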
Algorithms and computational methods
For each of the following algorithms, you should be able to explain what it computes (its input and output) and what it is used for, and be able to hand-simulate it. Some of these algorithms are naive; others solve problems that the naive ones face. You should be able to explain what those problems are and how the better algorithms solve them. A worked example of one of them follows the list.
- Minimum string edit distance algorithm
- Viterbi Algorithm for Hidden Markov Models
- Forward Algorithm for Hidden Markov Models (to compute likelihood)
- Recursive descent parsing
- CKY parsing
- arc-standard transition-based dependency parsing
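As a worked example of the hand simulation the list asks for, here is a minimal dynamic-programming sketch of minimum string edit distance. The substitution cost is a parameter because some presentations (e.g., Jurafsky & Martin) charge 2 for a substitution rather than 1; it can be useful to hand-check both variants.

```python
# Minimal sketch of minimum string edit distance by dynamic programming.
# d[i][j] holds the cheapest way to turn s[:i] into t[:j].

def edit_distance(s, t, sub_cost=1):
    m, n = len(s), len(t)
    d = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m + 1):
        d[i][0] = i                                  # i deletions
    for j in range(n + 1):
        d[0][j] = j                                  # j insertions
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            d[i][j] = min(
                d[i - 1][j] + 1,                     # delete s[i-1]
                d[i][j - 1] + 1,                     # insert t[j-1]
                d[i - 1][j - 1]                      # copy or substitute
                + (0 if s[i - 1] == t[j - 1] else sub_cost),
            )
    return d[m][n]

print(edit_distance("intention", "execution"))              # 5
print(edit_distance("intention", "execution", sub_cost=2))  # 8
```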
For the following methods, we have not discussed algorithms at the level of data structures or implementation, but you should still be able to explain what each one computes (its input and output) and what it is used for, and be able to hand-simulate it.
- Finite State Automata and Transducers
- Parsing with semantic attachments
For the following method, you should be able to explain what it computes (its input and output) and what it is used for, and be able to describe how it works in some detail.
- Expectation-Maximization (forward-backward algorithm) for HMMs
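For orientation, here is a minimal sketch of the forward pass that the E-step of forward-backward builds on; the backward pass is analogous. The toy tables below are invented numbers in the style of the HMM sketch above.

```python
# Minimal sketch of the HMM forward algorithm: sums over all tag sequences
# to get the likelihood P(words). EM's E-step combines these forward
# probabilities with an analogous backward pass. Toy numbers only.

def forward(words, tags, transition, emission, start="<s>"):
    # alpha[t] = P(words so far, current tag = t)
    alpha = {t: transition.get((start, t), 0.0) * emission.get((t, words[0]), 0.0)
             for t in tags}
    for word in words[1:]:
        alpha = {t: sum(alpha[s] * transition.get((s, t), 0.0) for s in tags)
                    * emission.get((t, word), 0.0)
                 for t in tags}
    return sum(alpha.values())

tags = ["DT", "NN"]
transition = {("<s>", "DT"): 0.9, ("<s>", "NN"): 0.1,
              ("DT", "NN"): 0.8, ("DT", "DT"): 0.1,
              ("NN", "DT"): 0.2, ("NN", "NN"): 0.1}
emission = {("DT", "the"): 0.5, ("NN", "dog"): 0.1,
            ("DT", "dog"): 0.01, ("NN", "the"): 0.02}
print(forward(["the", "dog"], tags, transition, emission))  # ~0.0365
```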
Additional Mathematical and Computational Concepts
Overarching concepts:
- Dynamic programming: What characterizes the tasks this is applied to, and the way that DP solves them? What are examples of DP algorithms? In parsing, be able to contrast DP with other search strategies (e.g. breadth-first or depth-first search).
- Zipf's Law and sparse data: What is Zipf's law and what are its implications? What does "sparse data" refer to? Be able to discuss these with respect to specific tasks.
- Probability estimation and smoothing: What are the different methods for estimating probabilities from corpus data, and what are the pros and cons and characteristic errors of each? Under what circumstances might you find simpler methods acceptable, or unacceptable? (A small smoothing sketch follows this list.) You should be familiar at a high level at least with:
- Maximum Likelihood Estimation
- Add-One / Add-Alpha Smoothing
- Interpolation
- Backoff
- Kneser-Ney Smoothing
Except as noted under "Other formulas" above, you do not need to memorize the formulas, but you should understand the conceptual differences and motivation behind each method, and should be able to *use* the formulas if they are given to you.
- Prior and likelihood: What do these refer to (in general, and in specific models)? What is their role in a probabilistic model?
- Training, development, and test sets: How are these used and for what reason? Be able to explain their application to particular problems.
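To make the contrast between the first three estimation methods concrete, here is a hedged toy sketch; the corpus, alpha, and lambda values are invented for illustration.

```python
from collections import Counter

# Toy sketch: MLE vs. add-alpha smoothing vs. interpolation for a bigram model.
tokens = "the cat sat on the mat".split()
unigrams = Counter(tokens)
bigrams = Counter(zip(tokens, tokens[1:]))
V = len(unigrams)   # vocabulary size
N = len(tokens)     # corpus size

def p_mle(w2, w1):
    return bigrams[(w1, w2)] / unigrams[w1]

def p_add_alpha(w2, w1, alpha=1.0):
    # alpha = 1 gives add-one smoothing
    return (bigrams[(w1, w2)] + alpha) / (unigrams[w1] + alpha * V)

def p_interp(w2, w1, lam=0.7):
    # mix the bigram estimate with the unigram estimate
    return lam * p_mle(w2, w1) + (1 - lam) * unigrams[w2] / N

print(p_mle("cat", "the"))        # 0.5   (1 of 2 bigrams starting with "the")
print(p_add_alpha("cat", "the"))  # 2/7   (mass shifted to unseen bigrams)
print(p_interp("cat", "the"))     # 0.4
```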
In addition, for the following concepts you should be able to explain each one, give one or two examples where appropriate, and be able to identify examples if given to you. You should be able to say what NLP tasks these are relevant to and why.
- Regular Language and Regular Expression
- Noisy channel model
- Search space in parsing: What is one searching for and what is one searching through?
- Breadth-first search, depth-first search, and differences between them
- Top-Down parsing vs. Bottom-Up parsing
- Best-first probabilistic parsing
- Inside cost
- Well-formed Substring Tables
- Difference between recognition and parsing
- Pointwise mutual information
- Context vector
- Vector representation of words/word embedding
- Dimensionality reduction
- Vector-based similarity measures
Linguistic and Representational Concepts
You should be able to explain each of these concepts, give one or two examples where appropriate, and be able to identify examples if given to you. You should be able to say what NLP tasks these are relevant to and why.
- Ambiguity (of many varieties, wrt all tasks we've discussed)
- Stems, Affixes, Root, Lemma
- Inflectional and derivational Morphology
- Lexical compound
- Part-of-Speech
- Open-class Words, Closed-class Words
- Context-Free Grammar
- Terminal and non-terminal (phrasal) categories
- Bounded and unbounded dependencies
- Dependency syntax
- Head Words (in Syntax)
- Word Senses and relations between them (synonym, hypernym, hyponym, similarity)
- Distributional hypothesis
- Collocations
- Meaning representations (MR)
- First Order Logic
- Canonical Form
- Verifiability
- Compositionality
- Expressivity
- Quantifiers and quantifier scoping
- Lambda expressions
- Reification of events
- Reference, coreference, and anaphora
- Constraints and preferences on coreference resolution
- Winograd schema
Tasks
You should be able to explain each of these tasks, give one or two examples where appropriate, and discuss cases of ambiguity or what makes the task difficult. In most cases you should be able to say what algorithm(s) or general method(s) can be used to solve the task, and what evaluation method(s) are typically used.
- Tokenization
- Spelling correction
- Morphological analysis
- Language modelling
- PoS-tagging
- Text categorization
- Syntactic parsing
- Word sense disambiguation
- Question answering
- Sentiment analysis
- Semantic analysis (semantic parsing)
- Coreference resolution
Resources
You should be able to describe what linguistic information is captured in each of these resources, and how it might be used in an NLP system.
You should also be able to identify legal and ethical issues in the creation and collection of linguistic resources.
Evaluation concepts and methods
For each of the following, you should be able to explain what it measures, what tasks it would be appropriate for, and why. A small worked sketch follows the list.
- Perplexity
- Accuracy
- Precision, recall, and F-measure (for parsing and for other tasks)
- Correlation
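As a worked sketch of two of these measures (the probabilities and precision/recall values below are invented):

```python
import math

def perplexity(word_probs):
    """PP = 2^(-(1/N) * sum_i log2 P(w_i | history))."""
    n = len(word_probs)
    return 2 ** (-sum(math.log2(p) for p in word_probs) / n)

def f_measure(precision, recall, beta=1.0):
    """F_beta, the weighted harmonic mean of precision and recall."""
    b2 = beta ** 2
    return (1 + b2) * precision * recall / (b2 * precision + recall)

print(perplexity([0.1, 0.2, 0.05]))  # 10.0: the geometric mean prob is 0.1
print(f_measure(0.8, 2 / 3))         # ~0.73
```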
In addition:
- Intrinsic vs. extrinsic evaluation: be able to explain the difference and give examples of each for particular tasks.
- Human evaluation: be able to give examples of tasks where it might be used and in what way, and be able to discuss any ethical issues that might arise.
- Statistical significance: be able to explain what it is and why it is relevant to evaluation.
- Corpora: issues involved in collection, annotation, and distribution.
- Using social machines/crowd-sourcing for annotation/evaluation: benefits and weaknesses.
Additional ethical issues
In addition to the topics listed under Resources and Evaluation, you should be able to identify and briefly discuss other potential ethical issues arising from developing or deploying NLP tools (e.g., fairness/bias, licensing and privacy concerns), and say how they might be relevant when presented with an example task. You should be able to describe methods for measuring and/or mitigating bias that were discussed in lectures.