Data Intensive Linguistics

This course is an introduction to data-driven methods applied to natural language processing. The emphasis is on methods, but we will also survey applications such as syntactic parsing, text classification, information extraction, tagging, and summarization. The final lectures will deal with statistical machine translation.

Lecturer: Philipp Koehn

TA: Sebastian Riedel

Lectures: Mondays and Thursdays, 14:00, FH Room A9/11

Tutorials: Tuesday or Wednesday 13:00, AT 5.01
Tutorial group assignments.

Tutorials

Assessment

A single assessment, worth 30% of the course mark, was given out on January 30. Your paper and code are due in class on March 23.

The remaining 70% of the marks will come from the exam. Past exam, solutions.

Syllabus

No | Date | Topic | Slides | Reference
1 | 9 Jan | Introduction (I): Words and probability | display, print | MS chapter 1
2 | 12 Jan | Introduction (II): Estimation and information theory | display, print | MS chapter 2
3 | 16 Jan | Language modeling (I): From counts to smoothing | display, print | MS chapter 6; JM chapter 6
4 | 19 Jan | Language modeling (II): Smoothing and back-off | display, print | MS chapter 6; JM chapter 6
5 | 23 Jan | Tagging (I): Part-of-speech tagging with HMM | display, print | MS chapter 9/10; JM chapter 8
6 | 26 Jan | Tagging (II): Transformation-Based Learning | display, print | MS chapter 10
7 | 30 Jan | Project | display | -
8 | 2 Feb | Tagging (III): Maximum Entropy Models | display, print | Ratnaparkhi [1996]; Berger et al. [1996]
9 | 6 Feb | Parsing (I): Context-free grammars and chart parsing | display, print | JM chapter 9/12
10 | 9 Feb | Parsing (II): Lexicalised and probabilistic parsing | display, print | JM chapter 12
11 | 13 Feb | Word sense disambiguation | display, print | JM section 17.2; MS chapter 7; Yarowsky [1995]
12 | 16 Feb | Text categorization and clustering | display, print | MS chapter 14/16
13 | 20 Feb | Semantics and discourse | display, print | Carlson et al. [2001]; Pang and Lee [2005]
- | 23 Feb | NO LECTURE | - | -
14 | 27 Feb | Machine translation (I): Introduction | display, print | -
15 | 2 Mar | Machine translation (II): Word-based models and the EM algorithm | display, print | Brown et al. [1993]
16 | 6 Mar | Machine translation (III): Decoding | display, print | Koehn [2004]
17 | 9 Mar | Machine translation (IV): Phrase-based models | display, print | Koehn et al. [2003]; Och and Ney [2002]
18 | 13 Mar | Machine translation (V): Syntax-based models | display, print | Yamada and Knight [2002]; Chiang [2005]; Collins et al. [2005]
19 | 16 Mar | Machine translation (VI): Advanced topics | display part2, print part2 | -
- | 20 Mar | NO LECTURE | - | -
20 | 23 Mar | Review | - | -

MS refers to "Manning and Schütze" and JM to "Jurafsky and Martin", the two textbooks listed below.

Topics may shift and change during flight.

References

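When possible, online papers will be made available. As for books, the key references are:

Christopher D. Manning and Hinrich Schütze, Foundations of Statistical Natural Language Processing, MIT Press, 1999.
Daniel Jurafsky and James H. Martin, Speech and Language Processing, Prentice Hall, 2000.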
