Datasets for Data Mining

This page lists suggested datasets for the DME mini-projects. Students can choose one of these datasets to work on, or propose a dataset of their own choice. At the bottom of this page, you will find some examples of datasets that we judged inappropriate for the projects. Please consult the lecturer or the TA before proposing your own dataset.


Contents


Object Classification from Images (Caltech101)


Sentiment Classification from Movie Reviews


Identifying Malaria Parasites from Images


Predicting Cuisines of Recipes


Human Activity Recognition Using Smartphones Data Set


Web quality assessment


Short term movements in stock prices


Energy Usage in the Informatics Forum


Student performance on mathematical problems


Marketing: Predicting Customer Churn


Particle physics data set


Physiological data set


Brain-Computer Interface data set


Prediction of Gene/Protein Localization data set


Prediction of Molecular Bioactivity for Drug Design: Binding to Thrombin dataset


The 4 Universities dataset


Internet advertisements dataset


The Reuters-21578 text dataset


The charitable donations dataset


The caravan insurance data


The yeast S. cerevisiae gene expression vectors


The colon cancer data


The leukemia data set


The human splice site data


Volcanoes on Venus


Network intrusion data


The SuperCOSMOS Sky Survey objects catalogue


Less interesting datasets

You are allowed to propose your own dataset for this project. To guide your search, we present here some examples of datasets that we considered less interesting.

The Landsat image data from Statlog


The OHSUMED document collection


The predictive toxicology dataset


The Syskill and Webert Web Page Ratings


20 News Groups dataset


Yeast Gene Regulation Prediction dataset


CATS benchmark


This page was originally written by Frederick Ducatelle, and has been updated and maintained by Charles Sutton and Stefanos Angelidis.


