Datasets for Data Mining

This page lists the datasets selected for the projects in Data Mining and Exploration. Students can choose one of these datasets to work on, or can propose data of their own choice. At the bottom of this page, you will find some examples of datasets that we judged inappropriate for the projects.

Particle physics data set

Physiological data set

Brain-Computer Interface data set

Prediction of Gene/Protein Localization data set

Prediction of Molecular Bioactivity for Drug Design: Binding to Thrombin dataset

The 4 Universities dataset

Internet advertisements dataset

The Reuters-21578 text dataset

The charitable donations dataset

The caravan insurance data

The yeast S. cerevisiae gene expression vectors

The colon cancer data

The leukemia data set

The human splice site data

Volcanoes on Venus

Network intrusion data

The SuperCOSMOS Sky Survey objects catalogue

Less interesting datasets

You are allowed to come up with your own dataset for this project. To guide your search, here are some examples of datasets that we considered less interesting.

The Landsat image data from Statlog

The OHSUMED document collection

The predictive toxicology dataset

The Syskill and Webert Web Page Ratings

20 News Groups dataset

Yeast Gene Regulation Prediction dataset

CATS benchmark

This page was originally written by Frederick Ducatelle and is maintained by Charles Sutton.

Unless explicitly stated otherwise, all material is copyright © The University of Edinburgh