Data Mining and Exploration

DME -- Spring 2012

Instructor: Charles Sutton, <csutton@inf.ed.ac.uk>
TA: Victor Hernandez-Urbina, <j.v.hernandez-urbina@sms.ed.ac.uk>
Office Hours: Friday 3:50 pm - 4:30 pm, or by appointment
Course Text (recommended): The Elements of Statistical Learning: Data Mining, Inference, and Prediction, by Hastie, Tibshirani, and Friedman (HTF). (PDF available online.)
Additional Reference (optional, also online): Principles of Data Mining, by Hand, Mannila, and Smyth (HMS)

IMPORTANT: Prerequisites: PMR is a prerequisite for this course; MLPR is a co-requisite.

Welcome! The idea behind this course is to learn to apply modern machine learning techniques, such as those discussed in IAML, MLPR, and PMR, to real-world problems. We will discuss data mining, visualization, how to evaluate your results, and a few practical algorithms for clustering and classification. A primary component of the course will be case studies: you will read (and present!) recent research papers from the machine learning and data mining literature. See the course descriptor for more information.

News

Times

Paper Presentation

Miniprojects

Labs

Important Dates

26 Jan, 4pm: Choose paper and group for presentation. Submit to TA via email: Victor Hernandez-Urbina <j.v.hernandez-urbina@sms.ed.ac.uk>
1 Feb, 12-14hrs: Lab 1: Data visualisation in R.
8 Feb: Choose data set and group for miniproject. Submit to TA via email: Victor Hernandez-Urbina <j.v.hernandez-urbina@sms.ed.ac.uk>
8 Feb, 12-14hrs: Lab 2: LSA in R. LDA in MALLET.
27 Feb (in class): Paper presentations begin.
29 Feb, 12-14hrs: Lab 3: Classification and evaluation in R.
1 Mar, 4pm: Submit progress report for miniproject.
29 Mar, 4pm: MINIPROJECT DUE. Submit to ITO. No extensions.

Lecture Schedule

N.B. Readings in the list below are examinable.

No.  Date    Topic
1    20 Jan  Introduction to the Course, Overview of Data Mining [slides]; Visualizing Data [slides 4up] [slides 1up]
Reading: HMS Chapters 1-3
2    27 Jan  Decision trees [slides 4up] [slides 1up]; Ensemble methods [slides 4up] [slides 1up]
Reading: Rob Schapire, boosting tutorial (Sections 4-8 not examinable)
Reading: Leo Breiman, Bagging predictors, Machine Learning, 1996
Reading: Section 9.2 of Hastie, Tibshirani, and Friedman
Reading: HMS Section 10.5
3    3 Feb   Applications: Topic Models [slides] [slides 1up]
Reading: HTF, Section 14.5.1
Reading: Thomas Hofmann, Probabilistic Latent Semantic Analysis, Proceedings of the Fifteenth Conference on Uncertainty in Artificial Intelligence. (1999)
Reading: Blei, Ng, and Jordan. Latent Dirichlet Allocation. Journal of Machine Learning Research 3 (2003) 993--1022.
4    10 Feb  Topic Models (cont)
5    17 Feb  Evaluation of Learning Algorithms [slides 4up] [slides 1up]
Reading: Fawcett, T. (2003). ROC Graphs: Notes and Practical Considerations for Researchers. HP Labs Tech Report HPL-2003-4
Reading: Dietterich, T. G. (1998). Approximate Statistical Tests for Comparing Supervised Classification Learning Algorithms. Neural Computation, 10 (7) 1895-1924
Reading: Christopher D. Manning, Prabhakar Raghavan and Hinrich Schütze (2008). Introduction to Information Retrieval, Cambridge University Press. Section 16.3 (only)
     24 Feb  NO CLASS: Innovative Learning Week
6    2 Mar   Collaborative Filtering
Paper presentations (x2, see schedule)
7    9 Mar   Association Rules [slides] [slides 1up]
Reading: HMS Chapter 13
Paper presentations (x2, see schedule)
8    16 Mar  Paper presentations (x3, see schedule)
9    23 Mar  Paper presentations (x2, see schedule)

This page is maintained by Charles Sutton and Victor Hernandez-Urbina.




Unless explicitly stated otherwise, all material is copyright © The University of Edinburgh