Data Mining and Exploration

DME -- Spring 2013

Instructor: Charles Sutton, <csutton@inf.ed.ac.uk>
TA: Andreea Radulescu, <a.radulescu@sms.ed.ac.uk>
Office Hours: Tuesday and Friday, 10:00 - 11:00, IF 3.26. Also by appointment. Coffee, tea, and biscuits will be provided.
Course Text (recommended): The Elements of Statistical Learning: Data Mining, Inference, and Prediction, Hastie, Tibshirani, and Friedman. (PDF available online.)
Additional Reference (optional, also online): Principles of Data Mining by Hand, Mannila and Smyth

IMPORTANT: Prerequisites: PMR is a prerequisite for this course, and MLPR is a co-requisite.

Welcome! The idea behind this course is to learn to apply modern machine learning techniques, such as those discussed in IAML, MLPR, and PMR, to real-world problems. We will discuss data mining, visualization, how to evaluate your results, and a few practical algorithms for clustering and classification. A primary component of the course will be case studies: you will read (and present!) recent research papers from the machine learning and data mining literature. See the course descriptor for more information.

News

Times

Paper Presentation

Miniprojects

Labs

There will be three lab classes. These are intended to give a brief intro to some software packages that you might decide to use in the mini-project. They are not marked, but I encourage you to attend. Hopefully they should not be too difficult. There is no formal allocation to the two lab sessions. Simply show up to whichever of the sessions suits you.

The lab sheets will be posted below prior to the lab sessions.

Deadlines

29 Jan, 4pm: Choose paper and group for presentation. Submit to TA via email (see address at top of page).
11 Feb: Choose data set and group for miniproject. Submit to TA via email (see address at top of page).
1 March (in class): Paper presentations begin.
28 Feb, 4pm: Submit progress report for miniproject.
21 Mar, 4pm: MINIPROJECT DUE. Submit to ITO. No extensions.

Lecture Schedule

N.B. Readings in the list below are examinable. The lecture slides are available from the NB discussion site.

    Date    Topic
1   18 Jan  Introduction, Overview of Data Mining, Visualizing Data
2   25 Jan  Decision trees, Ensemble methods
            Reading: Rob Schapire boosting tutorial (Sections 4-8 not examinable)
            Reading: Leo Breiman, Bagging predictors, Machine Learning, 1996
            Reading: Section 9.2 of Hastie, Tibshirani, and Friedman
            Additional reading: Murphy, Sections 16.1, 16.2 (skip 16.2.5 and 16.2.6), 16.4.1, 16.4.3
3   1 Feb   Evaluation of Learning Algorithms
            Lab 1 this week (29 Jan).
            Readings: Fawcett, T. (2003). ROC Graphs: Notes and Practical Considerations for Researchers. HP Labs Tech Report HPL-2003-4
            Dietterich, T. G. (1998). Approximate Statistical Tests for Comparing Supervised Classification Learning Algorithms. Neural Computation, 10 (7) 1895-1924
4   8 Feb   Applications: Topic Models
            Reading: HTF, Section 14.5.1
            Reading: Thomas Hofmann, Probabilistic Latent Semantic Analysis, Proceedings of the Fifteenth Conference on Uncertainty in Artificial Intelligence. (1999)
            Reading: Blei, Ng, and Jordan. Latent Dirichlet Allocation. Journal of Machine Learning Research 3 (2003) 993-1022.
            Reading: Murphy, 27.3.1, 27.3.2, 27.3.3
            Lab 2 this week (5 Feb).
5   15 Feb  Finish topic models.
            Association Rules
            Reading: HTF 14.2
            Lab 3 this week (12 Feb).
    22 Feb  NO CLASS: Innovative Learning Week
6   1 Mar   Paper presentations
7   8 Mar   Paper presentations
8   15 Mar  Paper presentations
9   22 Mar  Paper presentations

This page is maintained by Charles Sutton.




Unless explicitly stated otherwise, all material is copyright © The University of Edinburgh