Doctoral thesis (Dissertations and theses)
Towards quantum-accurate description of molecular interactions in biochemical systems
PULEVA, Mirela
2025
Dataset
 

Files


Full Text
_PhDThesis_MirelaPuleva.pdf
(31.45 MB)

All documents in ORBilu are protected by a user license.

Details



Keywords :
Molecular dimers; Protein-ligand interactions; Machine Learning; Method benchmarking
Abstract :
[en] The accurate prediction of molecular properties is essential for understanding physicochemical phenomena and for enabling impactful applications such as accelerating the drug discovery pipeline. To this end, an accurate description of quantum mechanical (QM) effects is crucial, especially in non-covalently bound systems such as protein-ligand complexes, where collective long-range van der Waals (vdW) interactions play a major role. While high-level ab initio techniques such as Coupled Cluster (CC) and diffusion Monte Carlo (DMC) can provide accurate estimates of relevant molecular properties, such as interaction energies, they are computationally feasible only for systems of up to hundreds, and in rare cases thousands, of atoms. As system size increases, approximate methods such as density functional theory (DFT) with vdW corrections, semi-empirical models, and classical force fields (FF) must be employed, chosen according to the achievable balance between computational cost and accuracy. In the last decade, a new family of alternative approaches, Machine Learning (ML) methods, has gained growing attention due to its promise of delivering ab initio-level estimates at the cost of coarse-grained approaches such as classical FFs. ML methods 'learn' complex relationships between atomic configurations and molecular properties from data, ideally producing scalable and transferable models that enable accurate predictions at reduced computational cost, even for large-scale systems. Yet, to date, ML methods have not been sufficiently generalized to reliably tackle protein-ligand interactions. Obtaining reliable models for such large conformers requires two elements. First, accurate data are needed on large molecular compounds representative of amino acids, the main components of proteins, and of their mutual interactions. 
Second, well-defined test cases are needed to verify the stability of the models in dynamical simulations at different temperatures and to test their portability. This thesis therefore aims to address both requirements: by defining a protocol to construct high-quality reference data (which can be used to train data-driven ML potentials and, eventually, to validate approximate QM methods), and by proposing a set of benchmark tests for validating the stability and accuracy of the models. To address the first requirement, a novel dataset is introduced, the Quantum Interacting Dimer (QUID) dataset, built specifically with state-of-the-art PBE0+MBD calculations on large, complex, and chemically diverse pocket-like molecular dimers, in and out of equilibrium, which serve as prototype reference data for protein-ligand systems bound by vdW interactions. The non-covalent interactions within the dataset are investigated in depth and compared with gold-standard ab initio calculations obtained via CC and QMC. These accurate references were subsequently used to test existing DFT exchange-correlation functionals, semi-empirical methods, and classical FFs, and to perform ablation studies of an ML model. The study shows that several dispersion-inclusive density functionals provide accurate interaction energy predictions, while the investigation of semi-empirical methods and FFs points to a need for further improvement. To address the second requirement, the stability of ML models, the TEA Challenge 2023 is presented as a reliable test of the performance and stability of a representative sample of ML approaches, both kernel-based and neural network (NN) models, in molecular dynamics (MD) simulations of large organic molecules. 
As a result, the regime of reliable performance of the models was identified, avenues for improvement were proposed, and a set of guidelines for the field was suggested. Based on the outcome of the TEA Challenge 2023, an ML model was selected for a protein-protein interaction test on the SARS-CoV-2 protein. Furthermore, as an alternative strategy towards accurate calculations of large (bio)molecular systems, an ML-corrected density functional tight-binding (DFTB) method was proposed within the EquiDTB framework, which aims to improve the accuracy of this semi-empirical method by replacing its standard parametrised repulsive potential with a physics-inspired equivariant NN. In summary, the systematic investigations presented in this thesis lay a rigorous foundation for the development of next-generation models capable of predicting binding energies and performing molecular dynamics simulations of protein-ligand interactions with quantum chemical accuracy. By integrating high-fidelity reference data, physically grounded approximations, and machine learning techniques, this work contributes essential methodological insights and benchmarks that will support future efforts to model complex biomolecular systems with both accuracy and scalability.
Disciplines :
Physics
Author, co-author :
PULEVA, Mirela ;  University of Luxembourg > Faculty of Science, Technology and Medicine (FSTM) > Department of Physics and Materials Science (DPHYMS)
Language :
English
Title :
Towards quantum-accurate description of molecular interactions in biochemical systems
Defense date :
24 October 2025
Institution :
University of Luxembourg [Faculty of Science, Technology and Medicine], Luxembourg, Luxembourg
Degree :
Docteur en Physique (DIP_DOC_0003_B)
President :
FODOR, Etienne ;  University of Luxembourg > Faculty of Science, Technology and Medicine (FSTM) > Department of Physics and Materials Science (DPHYMS)
Jury member :
SKUPIN, Alexander  ;  University of Luxembourg > Luxembourg Centre for Systems Biomedicine (LCSB) > Integrative Cell Signalling
Maurer, Reinhard;  University of Warwick > Department of Physics ; University of Warwick > Department of Chemistry
Řezáč, Jan;  CAS - Czech Academy of Sciences > Institute of Organic Chemistry and Biochemistry
TKATCHENKO, Alexandre ;  University of Luxembourg > Faculty of Science, Technology and Medicine (FSTM) > Department of Physics and Materials Science (DPHYMS)
Funders :
Institute for Advanced Studies, University of Luxembourg, Campus Belval, L-4365 Esch-sur-Alzette, Luxembourg
Funding text :
PhD "Young Academics" program, "AQMA" project
Data Set :
QUantum Interacting Dimer (QUID) dataset

Related publication: Nat Commun 16, 8583 (2025). https://doi.org/10.1038/s41467-025-63587-9


TEA Challenge 2023 data

Related publications: https://doi.org/10.1039/D4SC06530A and https://doi.org/10.1039/D4SC06529H

Available on ORBilu :
since 19 November 2025
