Informatics Report Series


Report EDI-INF-RR-0874


Title: Feature selection and classification for microarray data analysis: Evolutionary methods for identifying predictive genes
Authors: Thanyaluk Jirapech-Umpai; Stuart Aitken
Date: Dec 2004
Publication Title: BMC Bioinformatics
Publication Type: Journal Article
Publication Status: Published
DOI: 10.1186/1471-2105-6-148
Abstract:
Background: In the clinical context, samples assayed by microarray are often classified by cell line or tumour type, and it is of interest to discover a set of genes that can be used as class predictors. The leukemia dataset of Golub et al. [1] and the NCI60 dataset of Ross et al. [2] present multiclass classification problems in which three tumour types and nine cell lines, respectively, must be identified. We apply an evolutionary algorithm to identify a near-optimal set of predictive genes that classify the data. We also examine the initial gene selection step, whereby the most informative genes are selected from the genes assayed.

Results: In the absence of feature selection, classification accuracy on the training data is typically good but is not replicated on the testing data. Gene selection using the RankGene software [3] is shown to significantly improve performance on the testing data. Further, we show that the choice of feature selection criteria can have a significant effect on accuracy. The evolutionary algorithm is shown to perform stably across the space of possible parameter settings, indicating the robustness of the approach. We assess performance using a low-variance estimation technique, and present an analysis of the genes most often selected as predictors.

Conclusion: The computational methods we have developed perform robustly and accurately, and yield results in accord with clinical knowledge: a Z-score analysis of the genes most frequently selected identifies genes known to discriminate AML and Pre-T ALL leukemia. This study also confirms that significantly different sets of genes are found to be most discriminatory as the sample classes are refined.
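The abstract describes a two-stage pipeline: a filter step that ranks the assayed genes and keeps the most informative, followed by an evolutionary search for a small predictive subset, with a Z-score analysis of how often each gene is chosen across runs. The Python sketch that follows illustrates that pipeline under stated assumptions; it is not the authors' implementation. The BSS/WSS ratio (a common multiclass filter score) stands in for the RankGene criteria; the fitness function (leave-one-out k-nearest-neighbour accuracy), the genetic operators, and every parameter value are hypothetical choices; and the Z-score assumes a binomial null in which each run picks d of the candidate genes uniformly at random.

```python
import math
import random
from collections import Counter


def bss_wss(values, labels):
    """Between- to within-class sum-of-squares ratio for one gene (higher = more discriminative)."""
    overall = sum(values) / len(values)
    groups = {}
    for v, label in zip(values, labels):
        groups.setdefault(label, []).append(v)
    bss = wss = 0.0
    for vs in groups.values():
        mean = sum(vs) / len(vs)
        bss += len(vs) * (mean - overall) ** 2
        wss += sum((v - mean) ** 2 for v in vs)
    return bss / wss if wss > 0 else math.inf


def rank_genes(X, y, top):
    """Filter step (stand-in for RankGene): indices of the `top` genes with the largest BSS/WSS."""
    n_genes = len(X[0])
    scores = [bss_wss([row[g] for row in X], y) for g in range(n_genes)]
    return sorted(range(n_genes), key=lambda g: -scores[g])[:top]


def loo_knn_accuracy(X, y, genes, k=3):
    """Hypothetical fitness: leave-one-out accuracy of a k-nearest-neighbour vote on `genes`."""
    correct = 0
    for i, xi in enumerate(X):
        neighbours = sorted(
            (sum((xi[g] - xj[g]) ** 2 for g in genes), y[j])
            for j, xj in enumerate(X)
            if j != i
        )
        votes = Counter(label for _, label in neighbours[:k])
        if votes.most_common(1)[0][0] == y[i]:
            correct += 1
    return correct / len(X)


def evolve(X, y, pool, d, pop_size=20, generations=10, rng=None):
    """Minimal genetic algorithm over d-gene subsets of `pool`; all settings are hypothetical."""
    rng = rng or random.Random(0)

    def fitness(subset):
        return loo_knn_accuracy(X, y, subset)

    population = [rng.sample(pool, d) for _ in range(pop_size)]
    for _ in range(generations):
        population.sort(key=fitness, reverse=True)
        survivors = population[: pop_size // 2]  # truncation selection keeps the fitter half
        children = []
        while len(survivors) + len(children) < pop_size:
            a, b = rng.sample(survivors, 2)
            # Crossover: first d // 2 genes of parent a, topped up from parent b without duplicates.
            child = list(dict.fromkeys(a[: d // 2] + b))[:d]
            if rng.random() < 0.3:  # mutation: replace one gene with an unused pool gene
                unused = [g for g in pool if g not in child]
                child[rng.randrange(d)] = rng.choice(unused)
            children.append(child)
        population = survivors + children
    return max(population, key=fitness)


def selection_z_scores(subsets, n_candidates, d):
    """Z-score of each gene's selection count under an assumed binomial null of uniform picks."""
    runs = len(subsets)
    p = d / n_candidates
    mean, sd = runs * p, math.sqrt(runs * p * (1 - p))
    counts = Counter(g for subset in subsets for g in subset)
    return {g: (c - mean) / sd for g, c in counts.items()}


if __name__ == "__main__":
    rng = random.Random(42)
    # Hypothetical synthetic data: 3 classes x 8 samples, 40 genes; only genes 0 and 1 carry signal.
    y = [c for c in range(3) for _ in range(8)]
    X = [[rng.gauss(2.0 * c if g < 2 else 0.0, 1.0) for g in range(40)] for c in y]
    pool = rank_genes(X, y, top=10)
    runs = [evolve(X, y, pool, d=3, rng=random.Random(seed)) for seed in range(6)]
    z = selection_z_scores(runs, len(pool), d=3)
    for g, score in sorted(z.items(), key=lambda kv: -kv[1])[:5]:
        hits = sum(g in s for s in runs)
        print(f"gene {g}: chosen in {hits}/{len(runs)} runs, Z = {score:.2f}")
```

Each call to `evolve` with a different seed acts as one independent run. Genes with large positive Z-scores are chosen far more often than the d/n_candidates rate expected by chance, which is a simplified analogue of the selection-frequency analysis the abstract describes.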

BibTeX format
@Article{EDI-INF-RR-0874,
author = { Thanyaluk Jirapech-Umpai and Stuart Aitken },
title = {Feature selection and classification for microarray data analysis: Evolutionary methods for identifying predictive genes},
journal = {BMC Bioinformatics},
year = 2004,
month = {Dec},
doi = {10.1186/1471-2105-6-148},
url = {http://www.biomedcentral.com/1471-2105/6/148},
}



Please mail <reports@inf.ed.ac.uk> with any changes or corrections.
Unless explicitly stated otherwise, all material is copyright The University of Edinburgh