DME – Instructions for the Miniproject

Based on a previous version by Charles Sutton and Stefanos Angelidis

In the project, you will use data science methods in a realistic setting. A list of potential projects and corresponding data sets is here (same as for the IRDS course). For each dataset, the web page gives a description of the task to be undertaken. If you wish to propose your own project, feel free to contact the TA. You will produce a project report that will be assessed.


You will have considerable freedom in the project, but it should involve most parts of the data analysis process described in the first lecture. An example project involves

  • reading up on relevant background to understand the task well and to learn what has been done previously (via Google Scholar or an internet search; in some cases references are provided)
  • some exploratory data analysis
  • if classification is the goal, choosing some methods that might work well on the task, based on the first two steps
  • evaluating the results of the different methods on the task (e.g. by assessing the generalisation performance).
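
As an illustration of the last step, generalisation performance is commonly estimated on held-out data, for example with k-fold cross-validation. The following is a minimal sketch using only the Python standard library. The toy data, the two example methods (a majority-class baseline and a 1-nearest-neighbour classifier) and all function names are assumptions made for this illustration; they are not prescribed by the course, and a real project would typically rely on an established library instead.

```python
import random
from collections import Counter


def k_fold_indices(n_samples, k, seed=0):
    """Split range(n_samples) into k shuffled, nearly equal folds."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    return [indices[i::k] for i in range(k)]


def majority_baseline(train_x, train_y):
    """Baseline: always predict the most frequent training label."""
    most_common = Counter(train_y).most_common(1)[0][0]
    return lambda x: most_common


def nearest_neighbour(train_x, train_y):
    """1-nearest-neighbour classifier using squared Euclidean distance."""
    def predict(x):
        dists = [sum((a - b) ** 2 for a, b in zip(x, t)) for t in train_x]
        return train_y[dists.index(min(dists))]
    return predict


def cross_validated_accuracy(fit, xs, ys, k=5, seed=0):
    """Mean held-out accuracy of the method `fit` over k folds."""
    scores = []
    for held_out in k_fold_indices(len(xs), k, seed):
        held = set(held_out)
        train_idx = [i for i in range(len(xs)) if i not in held]
        # Fit on the training folds only, then score on the held-out fold.
        predict = fit([xs[i] for i in train_idx], [ys[i] for i in train_idx])
        correct = sum(predict(xs[i]) == ys[i] for i in held_out)
        scores.append(correct / len(held_out))
    return sum(scores) / len(scores)


# Toy data, made up for illustration: two well-separated 2-D clusters.
rng = random.Random(1)
xs = [(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(50)] + \
     [(rng.gauss(6, 1), rng.gauss(6, 1)) for _ in range(50)]
ys = [0] * 50 + [1] * 50

baseline_acc = cross_validated_accuracy(majority_baseline, xs, ys)
nn_acc = cross_validated_accuracy(nearest_neighbour, xs, ys)
print(f"majority baseline: {baseline_acc:.2f}, 1-NN: {nn_acc:.2f}")
```

Both methods are scored on exactly the same folds (same seed), so their accuracies are directly comparable. Including a trivial baseline makes it easier to judge whether a more complex method has actually learned something.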

You do not need to outperform previous methods. What is important is that the path taken is reasonable, methodologically correct, and clearly described in the report. Good projects would nonetheless discuss possible differences in performance.

Some of the data sets may be too large to be used directly in the software that you have. In such cases, you are allowed to appropriately sub-sample the data set.
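
One possible way to sub-sample appropriately, when the task is classification, is to sample within each class so that the class proportions of the full data set are roughly preserved. The sketch below assumes the data fits in memory as a list of rows. The function name `stratified_subsample`, the `label_of` callback and the toy data are hypothetical and used only for illustration.

```python
import random
from collections import defaultdict


def stratified_subsample(rows, label_of, fraction, seed=0):
    """Return a random subset keeping roughly `fraction` of each class."""
    rng = random.Random(seed)  # fixed seed so the subset is reproducible
    by_label = defaultdict(list)
    for row in rows:
        by_label[label_of(row)].append(row)
    sample = []
    for group in by_label.values():
        # Keep at least one row per class so rare classes do not vanish.
        n_keep = max(1, round(fraction * len(group)))
        sample.extend(rng.sample(group, n_keep))
    rng.shuffle(sample)
    return sample


# Hypothetical imbalanced data set: 900 negatives and 100 positives.
rows = [(i, "neg") for i in range(900)] + [(i, "pos") for i in range(900, 1000)]
subset = stratified_subsample(rows, label_of=lambda r: r[1], fraction=0.1)
print(len(subset), sum(1 for r in subset if r[1] == "pos"))
```

Whatever scheme is used, it is worth stating the sampling fraction and the random seed in the report so that the results can be reproduced.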


You will work in groups of at most 4 people. You can use Piazza to find team-mates.

By Friday 17 February 2017, 4pm, each group should send an email to the TA with a ranked list of 3 datasets you would like to work on. Please also indicate the names and student numbers of all group members. We'll try to keep everyone happy, but there's a chance you won't be allocated your 1st choice.

By Friday 10 March 2017, 4pm, each group must email the instructor an interim report. It should describe what you have done so far and your plans for completing the mini-project by the final deadline. While not necessary, you may want to treat it as a draft of the final report (and you can use the same template if you like). This report will not form part of your numerical mark for the course. The goal of the interim report is to make sure that your project has the right scope and that you are on track.

The work on the mini-project will be evaluated by means of a written final report. Each group need only submit one report. Except in special circumstances, all members of the group will receive the same grade.

By Friday 7 April 2017, 4pm, the final report is due. You will need to submit the report (as a PDF), its LaTeX source (including figures), and all source code needed to reproduce the results in the report. The LaTeX source should compile on DICE with pdflatex. Please include any style files that are not typically installed on DICE. Submission should be done via the submit command, e.g. as
submit dme 2 <name-of-single-compressed-archive-file>
Please use tar.gz or .zip as archive formats. The grade will be based on the final report only.
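
The single archive can be created with any standard tool. As one possibility, the sketch below uses Python's standard `tarfile` module to bundle the items into a tar.gz file and then lists the archive contents as a quick check. The function name `build_submission` and any file names passed to it are hypothetical; use whatever layout your group has chosen.

```python
import tarfile
from pathlib import Path


def build_submission(archive_path, paths):
    """Bundle the given files and directories into one tar.gz archive.

    Returns the list of member names stored in the archive.
    """
    missing = [str(p) for p in paths if not Path(p).exists()]
    if missing:
        raise FileNotFoundError(f"missing submission items: {missing}")
    with tarfile.open(archive_path, "w:gz") as archive:
        for p in paths:
            # Directories are added recursively; arcname drops parent folders.
            archive.add(p, arcname=Path(p).name)
    # Re-open the archive to verify what was actually written.
    with tarfile.open(archive_path, "r:gz") as archive:
        return archive.getnames()
```

For example, assuming the report, a LaTeX source directory and a code directory sit in the current directory, `build_submission("group_submission.tar.gz", ["report.pdf", "latex", "code"])` would return the stored member names, which can be compared against the list of required items before running the submit command.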

The report

The report, including figures, should be at most 8 pages long, using this template (adapted from the NIPS conference). It should contain the following in some manner:

  • description of the task
  • relevant background and related previous work
  • explanation of the significance/relevance of the objective/task
  • information on the data preparation
  • exploratory data analysis
  • description of the learning (e.g. classification) methods used
  • results and evaluation
  • conclusions

At the end of the report, you must include a short description of how each member of the group contributed to the project, which can be on an additional ninth page. References can also be on the ninth page.

Marking Breakdown

The marking criteria include the appropriateness of the machine learning methods chosen, the quality of the analysis, the quality of the evaluation, the amount of work, and the quality of the explanation in the report (both text and graphics). A guide to the letter marks is:

  • A: Well explained description of points above plus extra achievement at understanding or analysis of results. Clear explanations, evidence of creative or deeper thought will contribute to a higher grade.
  • B: Well explained description of points above.
  • C: Good description of points above but significant deficiencies.
  • D: Evidence that the student has gained some understanding, but not addressed the specified task properly.
  • E/F/G: Serious error or slack work.


Late penalties: The policy of the School of Informatics is that no late submissions are allowed except on valid grounds agreed in advance with the year organiser.

Plagiarism Policy: The projects are (usually) group projects. Hence you are expected to discuss the work within your group, and to work on your report together. You should write up the project as a whole, including the work of the others in your project. At the end of the report, there should be a short description of how each member of the group contributed to the project. For information about the School Plagiarism policy, see here.

Author: Michael Gutmann

Created: 2017-04-06 Thu 14:44