Datasets for Data Mining

This page lists the datasets selected for the projects in Data Mining and Exploration. Students can choose one of these datasets to work on, or propose data of their own choice. At the bottom of this page are some examples of datasets that we judged inappropriate for the projects.


Particle physics data set


Physiological data set


Brain-Computer Interface data set


Prediction of Gene/Protein Localization data set


Prediction of Molecular Bioactivity for Drug Design: Binding to Thrombin dataset


The 4 Universities dataset


Internet advertisements dataset


The Reuters-21578 text dataset


The charitable donations dataset


The caravan insurance data


The yeast S. cerevisiae gene expression vectors


The colon cancer data


The leukemia data set


The human splice site data


Volcanoes on Venus


Network intrusion data


The SuperCOSMOS Sky Survey objects catalogue


Less interesting datasets

You may also propose your own dataset for this project. To guide your search, here are some examples of datasets that we considered less interesting.

The Landsat image data from Statlog


The OHSUMED document collection


The predictive toxicology dataset


The Syskill and Webert Web Page Ratings


20 News Groups dataset


Yeast Gene Regulation Prediction dataset


CATS benchmark


This page was originally written by Frederick Ducatelle and is maintained by Charles Sutton.


