Detecting disease genes based on semi-supervised learning and protein–protein interaction networks

Nguyen, Thanh-Phuong; Ho, Tu-Bao

doi:10.1016/j.artmed.2011.09.003

Article (Scientific journals)

2012 • In Artificial Intelligence in Medicine, 54 (1), p. 63–71

Peer Reviewed verified by ORBi

Permalink
https://hdl.handle.net/10993/21697

DOI
10.1016/j.artmed.2011.09.003

PubMed
22000346

Details

Abstract :

Objective: Predicting or prioritizing the human genes that cause disease ("disease genes") is one of the emerging tasks in biomedical informatics. Network-based approaches to this problem rest on the key assumption that "the network-neighbour of a disease gene is likely to cause the same or a similar disease", and they mostly apply supervised learning methods to data on well-known disease genes. This work aims to find an effective method that exploits the disease gene neighbourhood and integrates several useful omics data sources, which could enhance disease gene prediction.

Methods: We present a novel method that predicts disease genes by exploiting, within a semi-supervised learning (SSL) scheme, data on both disease genes and their neighbours in the protein–protein interaction network. Multiple proteomic and genomic data were integrated from six biological databases (Universal Protein Resource, Interologous Interaction Database, Reactome, Gene Ontology, Pfam, and InterDom) and a gene expression dataset.

Results: Under 10 repetitions of stratified 10-fold cross-validation, the SSL method performed better than the k-nearest neighbour and support vector machine methods, achieving a sensitivity of 85%, specificity of 79%, precision of 81%, accuracy of 82%, and a balanced F-function of 83%. Further comparative experiments demonstrate the advantages of the proposed method when only a small amount of labeled data is available, with an accuracy of 78%. We applied the proposed method to detect 572 putative disease genes, which were biologically validated in several indirect ways.

Conclusion: Semi-supervised learning improved the ability to study disease genes, especially for a specific disease, where the known disease genes (the labeled data) are very often limited. In addition to the computational improvement, the analysis of the predicted disease proteins indicates that the findings are beneficial in deciphering pathogenic mechanisms.
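As a reading aid for the evaluation figures quoted in the abstract, the following minimal Python sketch shows how such metrics are conventionally derived from confusion-matrix counts, and checks that the reported balanced F-function is consistent with the reported precision and sensitivity. It assumes the "balanced F-function" is the usual harmonic mean of precision and sensitivity (F1). The function names and the example counts are hypothetical and do not come from the paper.

```python
def confusion_metrics(tp, fp, tn, fn):
    """Standard binary-classification metrics from confusion-matrix counts."""
    sensitivity = tp / (tp + fn)            # recall on true disease genes
    specificity = tn / (tn + fp)            # recall on non-disease genes
    precision = tp / (tp + fp)              # fraction of predictions that are correct
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "precision": precision,
        "accuracy": accuracy,
        "f_measure": balanced_f(precision, sensitivity),
    }


def balanced_f(precision, sensitivity):
    """Balanced F-measure (F1): harmonic mean of precision and sensitivity."""
    return 2 * precision * sensitivity / (precision + sensitivity)


# Consistency check using the precision and sensitivity quoted in the abstract.
reported_f = balanced_f(0.81, 0.85)
print(f"F from reported precision/sensitivity: {reported_f:.2f}")

# Hypothetical counts for illustration only (NOT the paper's data).
example = confusion_metrics(tp=85, fp=21, tn=79, fn=15)
print({name: round(value, 2) for name, value in example.items()})
```

The hypothetical counts describe a single confusion matrix, whereas figures averaged over repeated cross-validation folds need not correspond exactly to any one matrix.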

Disciplines :

Human health sciences: Multidisciplinary, general & others

Author, co-author :

Nguyen, Thanh-Phuong; The Microsoft Research, University of Trento Centre for Computational Systems Biology (COSBI)

Ho, Tu-Bao; Japan Advanced Institute of Science and Technology

External co-authors :

yes

Language :

English

Title :

Detecting disease genes based on semi-supervised learning and protein–protein interaction networks

Publication date :

2012

Journal title :

Artificial Intelligence in Medicine

ISSN :

0933-3657

eISSN :

1873-2860

Publisher :

Elsevier Science, Amsterdam, Netherlands

Volume :

54

Issue :

1

Pages :

63–71

Peer reviewed :

Peer Reviewed verified by ORBi

Available on ORBilu :

since 03 August 2015
