Text Technologies for Data Science
The course deals with retrieval technologies behind search engines,
such as Google.
Lecturer: Dr. Victor Lavrenko
TA: Dominik Wurzer
Lectures: 12:10 Mondays and Thursdays in
Hume Tower, Faculty Room South
Python tutorial: Appleton Tower room 4.12, 4-5:30pm Thursday 18/09 or 4-5:30pm Friday 19/09
Lab sessions: Appleton Tower room 4.12, please sign up
Assessment: final exam 70% of the mark, coursework 30%.
Policy on late submissions and plagiarism.
- Ranking algorithms, due 4pm Monday 6th October [questions]
Discussion forum: use this to ask questions about lectures;
sign up here.
Introduction: documents, queries, bag-of-words trick
Readings: SE Ch. 1 and 2
Laws of text: Zipf, Heaps, clumping, index size
Readings: SE Ch. 4
Vector space: term weighting, similarity functions.
Readings: SE Ch. 7.1
Vocabulary mismatch 1: tokenization
Readings: SE Ch. 5 (except 5.7)
Vocabulary mismatch 2: stemming, synonyms
[slides 7-14, 22-27]
Vocabulary mismatch 3: relevance feedback, LSI
Readings: SE Ch. 6 and 7.3.2
This page is maintained by Victor Lavrenko
Unless explicitly stated otherwise, all material is copyright ©
The University of Edinburgh