Doctoral thesis (Dissertations and theses)
Interpreting Omics Data in Parkinson’s Disease: A Statistical, Machine Learning, and Graph Representation Learning Approach



Keywords :
Parkinson's disease; Machine learning; graph representation learning; graph neural networks; omics; transcriptomics; metabolomics; pathway; networks
Abstract :
[en] Parkinson’s disease (PD) is characterized by the heterogeneity and complexity of both its clinical symptoms and molecular mechanisms, which hinders the development of reliable diagnostic and prognostic biomarkers. This thesis presents an integrated approach to identify cross-sectional and longitudinal molecular signatures associated with PD diagnosis and motor symptoms by incorporating domain-specific knowledge into the analysis and modeling of blood transcriptomics and metabolomics. Statistical analyses and machine learning algorithms were applied to identify, compare, and interpret relevant factors for predicting PD diagnosis and motor dysfunction severity using molecular measurements at baseline and over time. Both individual molecules and aggregated, higher-level functional representations of global activity changes in cellular pathways, compartments, and protein complex signatures were examined. In addition, two modelling pipelines exploiting graph representation learning on sample similarity networks and molecular interaction networks were implemented for PD case-control classification. Although the resulting machine learning models still have limitations in terms of predictive performance, they highlight a number of robust and pronounced PD-specific changes at baseline and over time, including changes in mitochondrial β-oxidation of fatty acids and purine/xanthine metabolism. These changes remain significant when the analyses are adjusted for relevant confounders, such as the effects of dopaminergic medications on plasma metabolomics. In addition to different machine learning methods, different feature selection approaches were evaluated, highlighting the Lasso approach with unsupervised filters as a favorable strategy. 
Furthermore, the investigation of longitudinal data showed that even with a limited number of available time points, identified candidate dynamic biomarkers hold promise for further validation studies in larger cohorts with multiple follow-up examinations. Finally, the study of omics data using graph representation learning on molecular interaction networks provided mechanistic insights, confirming changes in known PD-associated genes and metabolites, and uncovering promising new candidate markers. While the use of molecular interaction networks is limited by experimental biases and the incompleteness of known interactions, networks built upon sample similarity among omics profiles can provide an unbiased graph structure, although interpretation of the results may be more challenging. Overall, the comprehensive study of statistical, machine learning, and graph representation learning models presented in this thesis highlights the benefits of using prior domain knowledge for omics data analysis and reveals robust disease associations at the level of single molecules and higher-level representations. The work illustrates the potential of higher-level functional and network representations, together with dynamic biomarker analysis of longitudinal data, for building predictive models to study a complex and heterogeneous disease such as PD. In addition to these methodological findings, the biological results provide new insights into relevant disease mechanisms in PD and lay the groundwork for validation studies in larger, independent cohorts.
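Illustrative sketch (not taken from the thesis): the abstract mentions networks built upon sample similarity among omics profiles. One common way to build such a graph, assumed here purely for illustration, is to link each sample to its k most correlated samples using Pearson correlation. The function names `pearson` and `knn_similarity_graph`, the choice of k, and the toy profiles are all hypothetical; the thesis may use a different similarity measure or graph construction.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation between two equal-length numeric profiles."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    if sd_x == 0 or sd_y == 0:
        return 0.0  # constant profile: correlation undefined, treat as 0
    return cov / (sd_x * sd_y)


def knn_similarity_graph(profiles, k):
    """Return undirected edges linking each sample to its k most correlated samples."""
    n = len(profiles)
    edges = set()
    for i in range(n):
        # Rank all other samples by decreasing correlation with sample i
        # (ties broken by sample index for determinism).
        sims = [(pearson(profiles[i], profiles[j]), j) for j in range(n) if j != i]
        sims.sort(key=lambda t: (-t[0], t[1]))
        for _, j in sims[:k]:
            edges.add((min(i, j), max(i, j)))
    return edges


# Toy data (hypothetical): rows are samples, columns are molecular features.
profiles = [
    [1.0, 2.0, 3.0, 4.0],
    [1.1, 2.1, 2.9, 4.2],
    [4.0, 3.0, 2.0, 1.0],
    [4.2, 2.9, 2.1, 0.9],
]
edges = knn_similarity_graph(profiles, k=1)
print(sorted(edges))
```

In this toy example, samples 0 and 1 share an increasing profile and samples 2 and 3 a decreasing one, so each pair ends up connected. A graph of this kind could then serve as input to a graph neural network for case-control classification.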
Disciplines :
Engineering, computing & technology: Multidisciplinary, general & others
Biochemistry, biophysics & molecular biology
Computer science
Author, co-author :
GOMEZ DE LOPE, Elisa; University of Luxembourg > Faculty of Science, Technology and Medicine (FSTM) > Department of Engineering (DoE); Unilu - University of Luxembourg [LU] > Luxembourg Centre for Systems Biomedicine (LCSB), Biomedical Data Science (BDS)
Language :
Title :
Interpreting Omics Data in Parkinson’s Disease: A Statistical, Machine Learning, and Graph Representation Learning Approach
Defense date :
22 February 2024
Institution :
Unilu - Université du Luxembourg [Science, Technology and Medicine], Luxembourg
Degree :
Promotor :
GLAAB, Enrico; University of Luxembourg > Luxembourg Centre for Systems Biomedicine (LCSB) > Biomedical Data Science
LIO, Pietro; University of Cambridge [GB] > Computer Science, Artificial Intelligence
SCHNEIDER, Reinhard; University of Luxembourg > Luxembourg Centre for Systems Biomedicine (LCSB) > Bioinformatics Core
DEL SOL MESA, Antonio; University of Luxembourg > Luxembourg Centre for Systems Biomedicine (LCSB) > Computational Biology
NAZAROV, Petr; Luxembourg Institute of Health > Multiomics Data Science, Platform Bioinformatics
Focus Area :
Computational Sciences
Systems Biomedicine
Available on ORBilu :
since 25 March 2024

