Data Mining and Exploration, Spring 2017


In this course, we discuss modern techniques for analysing, interpreting, visualising and exploiting the data that are captured in scientific and commercial environments. The course builds on the ideas taught in other machine learning courses and discusses the issues that arise when applying them to real-world datasets.

The course consists of lectures, supporting computer labs, student presentations on research papers, and a practical miniproject on a real-world dataset.

Your total course grade breaks down as follows: exam 50%; miniproject 35%; your presentation 10%; the presentation summaries 5% (see here).

Lecturer: Michael Gutmann
Teaching Assistant: Agamemnon Krasoulis
DME catalogue pages: DRPS | Informatics | Timetable


Semester week   Date        Activity   Date        Activity
wk1                                    Thu 19/01   lecture 1
wk2             Tue 24/01   lab 1      Thu 26/01   lecture 2
wk3             Tue 31/01   lab 2      Thu 02/02   lecture 3
wk4             Tue 07/02   lab 3      Thu 09/02   lecture 4
wk5             Tue 14/02   lab 4      Thu 16/02   lecture 5
wk6                                    Thu 02/03   student presentations
wk7                                    Thu 09/03   student presentations
wk8                                    Thu 16/03   student presentations
wk9                                    Thu 23/03   student presentations
wk10                                   Thu 30/03   student presentations
wk11                                   Thu 06/04   Recap, Q&A

15:10 - 17:00
Forrest Hill, room 3.D01

15:10 - 17:00
Medical School, BLT (Basement Lecture Theatre) - Doorway 6

Important dates

Deadline for your paper preference     Fri 10 Feb 2017, 4pm
Deadline for your project info         Fri 17 Feb 2017, 4pm
Miniproject interim report deadline    Tue 14 March 2017, 4pm
Miniproject final report deadline      Fri 7 April 2017, 4pm
Exam                                   see here


Lecture notes are here (they will be updated as we progress).

  • Lecture 1
    Introduction to the data analysis process, simple descriptions and preprocessing of data
    opening slides
    Chapter 1 in the lecture notes
  • Lecture 2
    Principal component analysis by variance maximisation, by minimisation of approximation error
    Chapter 2 in the lecture notes
  • Lecture 3
    PCA by matrix approximation, dimensionality reduction by PCA
    Chapter 3 in the lecture notes
  • Lecture 4
    Dimensionality reduction by kernel PCA, multidimensional scaling, isomap
    Chapter 3 in the lecture notes
  • Lecture 5
    Evaluating the performance in predictive modelling (e.g. classification and regression), techniques for choosing hyper-parameters
    Chapter 4 in the lecture notes
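
As a rough illustration of the two views of PCA named under Lecture 2 (variance maximisation and minimisation of approximation error), the following NumPy sketch may help. It is not taken from the lecture notes or the labs: the function name `pca`, the synthetic data and the mixing matrix are assumptions made purely for illustration.

```python
import numpy as np


def pca(X, k):
    """Return principal directions, all eigenvalues, and the top-k scores.

    X is an (n_samples, n_features) array; k is the number of components kept.
    Illustrative helper only; the name and interface are assumptions.
    """
    Xc = X - X.mean(axis=0)                   # centre each feature
    cov = Xc.T @ Xc / (X.shape[0] - 1)        # sample covariance matrix
    eigvals, eigvecs = np.linalg.eigh(cov)    # eigh returns ascending eigenvalues
    order = np.argsort(eigvals)[::-1]         # reorder by decreasing variance
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    W = eigvecs[:, :k]                        # top-k principal directions (columns)
    Z = Xc @ W                                # principal component scores
    return W, eigvals, Z


# Synthetic, correlated 3-dimensional data (made up for this sketch).
rng = np.random.default_rng(0)
mixing = np.array([[3.0, 0.0, 0.0],
                   [1.0, 1.0, 0.0],
                   [0.0, 0.5, 0.2]])
X = rng.normal(size=(500, 3)) @ mixing

W, eigvals, Z = pca(X, k=2)

# Variance view: the sample variance of each score equals the matching eigenvalue.
score_var = Z.var(axis=0, ddof=1)

# Approximation view: the squared reconstruction error, normalised like the
# covariance, equals the sum of the discarded eigenvalues.
Xc = X - X.mean(axis=0)
X_hat = Z @ W.T
recon_err = np.sum((Xc - X_hat) ** 2) / (X.shape[0] - 1)

print("score variances:      ", score_var)
print("leading eigenvalues:  ", eigvals[:2])
print("reconstruction error: ", recon_err)
print("discarded eigenvalues:", eigvals[2:].sum())
```

The printed pairs should agree up to floating-point error, which is one way to see that the two formulations select the same subspace.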

Computer labs

The course has four computer labs on topics introduced in the lectures. The labs let you experiment with different methods to gain an intuitive understanding of them, and provide you with practical tools for the miniproject. The GitHub repository for the labs is here.

Student presentations

In the second half of the course, we will have presentations on some of the papers listed here. Feel free to propose papers yourself but please check with the lecturer about suitability.

Detailed instructions and information on the format of the presentations are here.


The goal of the project is to apply machine learning methods to a real dataset. A list of potential datasets is available here (the same list as for the IRDS course). For each dataset, the web page describes the task to be undertaken. You will produce a project report, which will be assessed.

Detailed instructions and information on the format of the report are here.

Author: Michael Gutmann

Created: 2017-03-13 Mon 19:51