Doctoral thesis (Dissertations and theses)
Machine learning-based Identification of Biomarkers in Clinical Cohort and Cancer Cell Line Data
DIDIER, Jeff
2025
 

Files


Full Text
PhD_Thesis_Biology_JD_v7_final_electronic_numbered.pdf
Author postprint (39.01 MB)

All documents in ORBilu are protected by a user license.


Details



Keywords :
Machine learning, Biomarkers, Clinical cohort, Cancer, Geriatrics, Frailty, Chronobiology
Abstract :
[en] This dissertation illustrates how computational biology can be applied in interdisciplinary settings to identify prognostic biomarkers in clinical cohort and cancer cell line data, and highlights the potential for integrating these methodologies into modern systems biology curricula. While the first two chapters focus on applied computational biology, the third chapter explores the integration of these machine learning approaches into current systems biology education. The first chapter, entitled ‘Biomarker detection in clinical cohort data using machine learning’, showcases how data-driven computational biology can be applied for exploratory and hypothesis-generating research in biomedical clinical cohort data. In the context of the geriatric condition of frailty, post-hoc interpretable machine learning applications reveal that men and women show distinct frailty phenotype profiles, linked to body composition in men and physiological anomalies in women. In fact, (pre-)frailty prediction performance improved with sex-specific machine learning models. These revealed that the physical frailty profile in men is characterised by high fat and low lean body mass, whereas physical frailty in women is more closely linked to vitamin D deficiency and increased blood concentrations of monocytes, leukocytes and eosinophils. Furthermore, post-hoc analysis indicates that combinations of such features, rather than single markers, best capture these sex-specific pre-frailty patterns. These findings subsequently led to follow-up research validating and further investigating these intriguing physical pre-frailty patterns in a Luxembourgish Parkinson’s Disease study. The second chapter, ‘Drug sensitivity prediction for time-of-day cancer treatment profiling’, concentrates on hypothesis-driven approaches to predict time-dependent drug sensitivity in cancer cell line expression data. 
The projects in this chapter underscore circadian dynamics as a critical factor influencing overall cancer drug responsiveness. Our approaches contributed significantly to the development and validation of a robust quantitative phenotyping platform to evaluate drug timing effects and predict drug sensitivity, resulting in the introduction of the chronotherapeutic index and the chronosensitivity index to assess the timing effect and sensitivity of cancer drugs. Additionally, these applications help leverage circadian characteristics to stratify cancer cell lines into new subtypes with high predictive value, in the context of triple negative breast cancer and neuroblastoma. For example, new circadian-related subtypes were identified in triple negative breast cancer, separating them into unstable, weak, dysfunctional, and functional circadian states. Overall, these contributions helped build an interdisciplinary and translational framework in which cellular clock phenotypes could effectively shape chronotherapy design in oncological treatments. The projects presented in this chapter were initiated and led by the Granada Lab of the Charité Comprehensive Cancer Center of the Medical University of Berlin, in close collaboration with the Systems Biology and Epigenetics Team of the Department of Life Sciences and Medicine at the University of Luxembourg. Finally, the third chapter focuses on ‘Machine learning integration in systems biology education’. This chapter attempts to lay out the status quo of machine learning in current systems biology study programmes. It highlights the importance of interdisciplinary collaborations and the integration of computational biology in responding to the current opportunities and challenges in this field of study. 
In a recently published review, we found that systems biology education must combine deep biological knowledge with computational and technological methods, yet current graduate programmes still struggle to deliver this integration effectively. Insufficient exposure to multimodal data integration (e.g., clinical cohort data and cell line data coupled with machine learning approaches) compounds this shortcoming. We therefore concluded that, without early and sustained institutional commitment, the field risks producing graduates underprepared for the translational bioinformatics and precision systems applications anticipated to shape its future. One way to mitigate this risk is the careful design of adaptive and interdisciplinary educational material that can be used in classrooms to, for example, predict drug targets and candidate drugs for repurposed cancer therapies in the context of metabolic modelling, machine learning, and expression data. In conclusion, this dissertation shows how computational biology can drive discovery in both research and education. From identifying prognostic biomarkers in geriatric conditions to shaping cancer treatment strategies, and from data integration to curriculum design, it underscores the power and necessity of bridging biology and machine learning in today’s scientific landscape.
Disciplines :
Life sciences: Multidisciplinary, general & others
Author, co-author :
DIDIER, Jeff  ;  University of Luxembourg > Faculty of Science, Technology and Medicine > Department of Health, Medicine and Life Sciences > Team Thomas SAUTER
Language :
English
Title :
Machine learning-based Identification of Biomarkers in Clinical Cohort and Cancer Cell Line Data
Defense date :
10 November 2025
Number of pages :
226
Institution :
Unilu - University of Luxembourg [Faculty of Science, Technology, and Medicine (FSTM)], Belval, Luxembourg
Degree :
Docteur en Biologie (DIP_DOC_0002_B)
Promotor :
SAUTER, Thomas ;  University of Luxembourg > Faculty of Science, Technology and Medicine (FSTM) > Department of Health, Medicine and Life Sciences (DHML)
President :
THEOBALD, Martin ;  University of Luxembourg > Faculty of Science, Technology and Medicine (FSTM) > Department of Computer Science (DCS)
Jury member :
KLUCKEN, Jochen  ;  University of Luxembourg > Luxembourg Centre for Systems Biomedicine (LCSB) > Digital Medicine
MÜLLER, Fabian ;  Universität des Saarlandes > Integrative Cellular Biology and Bioinformatics
SONG, Giltae ;  Pusan National University > School of Computer Science and Engineering
Funders :
FNR - Luxembourg National Research Fund
Funding number :
PRIDE17/12252781
Funding text :
I gratefully acknowledge the financial support of the doctoral training unit data-driven computational modelling and applications (DRIVEN) by the Luxembourg National Research Fund under the PRIDE program (PRIDE17/12252781).
Available on ORBilu :
since 12 February 2026
