Article (Scientific journals)
Using rule-based machine learning for candidate disease gene prioritization and sample classification of cancer gene expression data
GLAAB, Enrico; Bacardit, Jaume; Garibaldi, Jonathan M. et al.
2012. In PLoS ONE, 7 (7), e39932
Peer Reviewed verified by ORBi

Keywords :
microarray; cancer; machine learning; gene expression; disease; prediction; classification; feature selection; rule learning; evolutionary learning
Abstract :
Microarray data analysis has been shown to provide an effective tool for studying cancer and genetic diseases. Although classical machine learning techniques have successfully been applied to find informative genes and to predict class labels for new samples, common restrictions of microarray analysis such as small sample sizes, a large attribute space and high noise levels still limit its scientific and clinical applications. Increasing the interpretability of prediction models while retaining a high accuracy would help to exploit the information content in microarray data more effectively. For this purpose, we evaluate our rule-based evolutionary machine learning systems, BioHEL and GAssist, on three public microarray cancer datasets, obtaining simple rule-based models for sample classification. A comparison with other benchmark microarray sample classifiers based on three diverse feature selection algorithms suggests that these evolutionary learning techniques can compete with state-of-the-art methods like support vector machines. The obtained models reach accuracies above 90% in two-level external cross-validation, with the added value of facilitating interpretation by using only combinations of simple if-then-else rules. As a further benefit, a literature mining analysis reveals that prioritizations of informative genes extracted from BioHEL’s classification rule sets can outperform gene rankings obtained from a conventional ensemble feature selection in terms of the pointwise mutual information between relevant disease terms and the standardized names of top-ranked genes.
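The abstract's main technical ingredients can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the BioHEL or GAssist implementation: `classify` evaluates an ordered if-then-else rule list of the kind the abstract describes, `rank_genes` ranks genes by how many rule sets use them (an assumed, simplified prioritization scheme, not necessarily the one used in the paper), and `pmi` computes pointwise mutual information from document co-occurrence counts. All gene names, thresholds and counts are hypothetical.

```python
import math
from collections import Counter
from typing import Callable, Dict, List, Tuple

# A sample maps (hypothetical) gene names to expression values.
Sample = Dict[str, float]
# A rule pairs a condition on a sample with the class label it predicts.
Rule = Tuple[Callable[[Sample], bool], str]


def classify(rules: List[Rule], default: str, sample: Sample) -> str:
    """Apply an ordered if-then-else rule list: the first rule whose
    condition holds decides the class; otherwise return the default class."""
    for condition, label in rules:
        if condition(sample):
            return label
    return default


def rank_genes(rule_sets: List[List[str]]) -> List[Tuple[str, int]]:
    """ASSUMED scheme: rank genes by the number of rule sets (e.g. from
    repeated learning runs) whose rules test them; ties sorted by name."""
    counts = Counter(gene for genes in rule_sets for gene in set(genes))
    return sorted(counts.items(), key=lambda item: (-item[1], item[0]))


def pmi(n_xy: int, n_x: int, n_y: int, n_total: int) -> float:
    """Pointwise mutual information log2(p(x,y) / (p(x) * p(y))), estimated
    from document counts: n_xy documents mention both terms, n_x and n_y
    mention each term, out of n_total documents."""
    p_xy = n_xy / n_total
    p_x = n_x / n_total
    p_y = n_y / n_total
    return math.log2(p_xy / (p_x * p_y))


# Hypothetical two-rule model; gene names and thresholds are illustrative only.
rules: List[Rule] = [
    (lambda s: s["GENE_A"] > 2.0 and s["GENE_B"] < 0.5, "cancer"),
    (lambda s: s["GENE_C"] > 1.5, "cancer"),
]

print(classify(rules, "normal", {"GENE_A": 2.4, "GENE_B": 0.1, "GENE_C": 0.0}))  # cancer
print(classify(rules, "normal", {"GENE_A": 0.3, "GENE_B": 0.9, "GENE_C": 0.2}))  # normal
print(rank_genes([["GENE_A", "GENE_B"], ["GENE_A", "GENE_C"], ["GENE_A", "GENE_B"]]))
# [('GENE_A', 3), ('GENE_B', 2), ('GENE_C', 1)]
print(round(pmi(n_xy=50, n_x=200, n_y=1000, n_total=100000), 2))  # 4.64
```

Because the first matching rule wins, the order of the rules is part of the model; a positive PMI means the disease term and gene name co-occur more often than independence would predict.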
Disciplines :
Biochemistry, biophysics & molecular biology
Identifiers :
UNILU:UL-ARTICLE-2012-902
Author, co-author :
GLAAB, Enrico;  University of Nottingham
Bacardit, Jaume;  University of Nottingham
Garibaldi, Jonathan M.;  University of Nottingham
Krasnogor, Natalio;  University of Nottingham
External co-authors :
yes
Language :
English
Publication date :
2012
Journal title :
PLoS ONE
eISSN :
1932-6203
Publisher :
Public Library of Science
Volume :
7
Issue :
7
Pages :
e39932
Available on ORBilu :
since 03 July 2013

Statistics
Number of views
139 (6 by Unilu)
Number of downloads
147 (2 by Unilu)
Scopus citations®
104
Scopus citations® without self-citations
91
OpenCitations
84
OpenAlex citations
124
WoS citations
87