Machine learning; Deep learning; Deep neural network; Network-based methods; Graph topology; Disease prediction; Clinical outcome prediction
Abstract:
Background
The availability of high-throughput omics datasets from large patient cohorts has allowed the development of methods that aim to predict patient clinical outcomes, such as survival and disease recurrence. Such methods are also important for better understanding the biological mechanisms underlying disease etiology and development, as well as treatment responses. Recently, different predictive models relying on distinct algorithms (including Support Vector Machines and Random Forests) have been investigated. In this context, deep learning strategies are of special interest due to their demonstrated superior performance over a wide range of problems and datasets. One of the main challenges of such strategies is the “small n, large p” problem: omics datasets typically consist of small numbers of samples and large numbers of features relative to typical deep learning datasets. Neural network approaches usually tackle this problem through feature selection or by including additional constraints during the learning process.
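As a minimal illustration of the feature-selection route mentioned above, the sketch below keeps only the k highest-variance features, shrinking p while leaving n unchanged. Unsupervised variance filtering is an assumption chosen for brevity; it is not the strategy proposed in this work, and the helper names `top_variance_features` and `select_columns` are hypothetical.

```python
from statistics import pvariance


def top_variance_features(X, k):
    """Indices (ascending) of the k columns of X with the largest variance.

    X is a list of samples, each a list of p feature values.
    Hypothetical helper: plain variance filtering, one of many possible schemes.
    """
    p = len(X[0])
    variances = [pvariance([row[j] for row in X]) for j in range(p)]
    ranked = sorted(range(p), key=lambda j: variances[j], reverse=True)
    return sorted(ranked[:k])


def select_columns(X, cols):
    """Keep only the given feature columns for every sample."""
    return [[row[j] for j in cols] for row in X]


if __name__ == "__main__":
    # Toy "small n, large p" matrix: 3 samples x 5 features
    X = [[1.0, 5.0, 0.0, 2.0, 9.0],
         [1.0, 1.0, 0.1, 2.0, 0.0],
         [1.0, 3.0, 0.2, 2.5, 4.5]]
    cols = top_variance_features(X, 2)
    print(cols, select_columns(X, cols))
```

Because the filter ignores the outcome labels it cannot overfit them directly, but it may also discard informative low-variance features.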
Methods
We propose to tackle this problem with a novel strategy that relies on a graph-based method for feature extraction, coupled with a deep neural network for clinical outcome prediction. The omics data are first represented as graphs whose nodes represent patients and whose edges represent correlations between the patients’ omics profiles. Topological features, such as centralities, are then extracted from these graphs for every node. Lastly, these features are used as input to train and test various classifiers.
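The graph construction and feature extraction steps can be sketched as follows, under stated assumptions: patients are linked when the Pearson correlation of their profiles reaches a hard threshold (the exact edge definition and threshold used in this work are not given here), and only two centralities, degree and closeness, are computed. All function names are hypothetical.

```python
import math
from collections import deque


def pearson(a, b):
    """Pearson correlation of two equal-length profiles (0.0 if either is constant)."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    sa = math.sqrt(sum((x - ma) ** 2 for x in a))
    sb = math.sqrt(sum((y - mb) ** 2 for y in b))
    return cov / (sa * sb) if sa > 0 and sb > 0 else 0.0


def patient_graph(profiles, threshold):
    """Adjacency sets: one node per patient, an edge i--j when
    corr(profile_i, profile_j) >= threshold (assumed hard threshold)."""
    n = len(profiles)
    adj = {i: set() for i in range(n)}
    for i in range(n):
        for j in range(i + 1, n):
            if pearson(profiles[i], profiles[j]) >= threshold:
                adj[i].add(j)
                adj[j].add(i)
    return adj


def degree_centrality(adj):
    """Number of neighbours divided by the maximum possible, n - 1."""
    n = len(adj)
    return {v: len(nbrs) / (n - 1) for v, nbrs in adj.items()}


def closeness_centrality(adj):
    """(r - 1) / (sum of BFS distances to the r - 1 other nodes reachable
    from v); 0.0 for isolated nodes."""
    result = {}
    for s in adj:
        dist = {s: 0}
        queue = deque([s])
        while queue:
            u = queue.popleft()
            for w in adj[u]:
                if w not in dist:
                    dist[w] = dist[u] + 1
                    queue.append(w)
        total = sum(dist.values())
        result[s] = (len(dist) - 1) / total if total > 0 else 0.0
    return result


def node_features(adj):
    """One feature row [degree, closeness] per patient, ordered by patient index."""
    deg = degree_centrality(adj)
    clo = closeness_centrality(adj)
    return [[deg[v], clo[v]] for v in sorted(adj)]
```

The rows returned by `node_features` form an n × 2 matrix, so in this sketch the classifier input size depends on the number of topological measures rather than on the number of omics features.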
Results
We apply this strategy to four neuroblastoma datasets and observe that models based on neural networks are more accurate than state-of-the-art models (DNN: 85%-87%, SVM/RF: 75%-82%). We also explore how different parameters and configurations are selected in order to overcome the effects of the small data problem as well as the curse of dimensionality.
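Parameter selection on small cohorts is commonly done by cross-validation, since a single train/test split is noisy when n is small. The sketch below picks a hyperparameter from a grid by cross-validated accuracy. The classifier is a k-nearest-neighbour stand-in chosen only to keep the example short (it is not one of the DNN, SVM or RF models reported), and the round-robin fold assignment, the grid and all function names are assumptions.

```python
import math


def knn_predict(train_X, train_y, x, k):
    """Majority vote among the k training points closest to x (Euclidean).

    Ties are broken in favour of the smallest label.
    """
    order = sorted(range(len(train_X)), key=lambda i: math.dist(train_X[i], x))
    votes = [train_y[i] for i in order[:k]]
    return max(sorted(set(votes)), key=votes.count)


def fold_indices(n, n_folds):
    """Round-robin assignment: sample i goes to test fold i % n_folds."""
    return [list(range(f, n, n_folds)) for f in range(n_folds)]


def cv_accuracy(X, y, k, n_folds):
    """Fraction of samples predicted correctly when each fold is held out once."""
    correct = 0
    for test in fold_indices(len(X), n_folds):
        held_out = set(test)
        train = [i for i in range(len(X)) if i not in held_out]
        train_X = [X[i] for i in train]
        train_y = [y[i] for i in train]
        correct += sum(knn_predict(train_X, train_y, X[i], k) == y[i] for i in test)
    return correct / len(X)


def select_k(X, y, grid, n_folds=3):
    """Return the grid value with the best CV accuracy (first one on ties)."""
    scores = {k: cv_accuracy(X, y, k, n_folds) for k in grid}
    best = max(grid, key=lambda k: scores[k])
    return best, scores


if __name__ == "__main__":
    X = [[0, 0], [0, 1], [1, 0], [5, 5], [5, 6], [6, 5]]
    y = [0, 0, 0, 1, 1, 1]
    print(select_k(X, y, grid=[1, 3, 5]))
```

On this toy data, k = 5 uses all four training points of each fold, so the vote is always tied and accuracy drops, which is the kind of configuration the cross-validation loop is meant to rule out.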
Conclusions
Our results indicate that the deep neural networks capture complex features in the data that help predict patient clinical outcomes.
Research center:
- Luxembourg Centre for Systems Biomedicine (LCSB): Biomedical Data Science (Glaab Group)
Disciplines:
Life sciences: Multidisciplinary, general & others
Author, co-author:
TRANCHEVENT, Leon-Charles ; Luxembourg Institute of Health - LIH > Department of Oncology > Proteome and Genome Research Unit ; University of Luxembourg > Luxembourg Centre for Systems Biomedicine (LCSB) > Biomedical Data Science (Glaab Group)
Azuaje, Francisco; Luxembourg Institute of Health - LIH > Department of Oncology > Proteome and Genome Research Unit ; UCB Celltech > Data and Translational Sciences
Rajapakse, Jagath; Nanyang Technological University > School of Computer Science and Engineering > Bioinformatics Research Center
External co-authors:
yes
Document language:
English
Title:
A deep neural network approach to predicting clinical outcomes of neuroblastoma patients