Abstract:
Accurate modelling of chemical and physical interactions is crucial for obtaining the thermodynamic and dynamical properties of any chemical system, and it enables a myriad of possible applications. Many of these applications are computationally prohibitive with advanced Computational Chemistry (CompChem) methods, even on modern supercomputers. For this reason, machine learning (ML) force fields (FFs), which combine the accuracy of state-of-the-art ab initio methods with the efficiency of classical FFs, are increasingly used to reconstruct the potential-energy surfaces (PESs) of molecules and solids. It is precisely the synergy of ML and CompChem that has revolutionized the field in the last decade, raising its applications to a qualitatively new level. Despite this great success, many challenges remain unsolved. In this context, my thesis investigates the capability of existing MLFFs to provide models that are simultaneously accurate and efficient, offering unprecedented insights into the (thermo)dynamics of realistic molecular systems.
Using examples of molecular interactions that are pervasive in (bio)chemical systems, we show a counterintuitive strengthening of such interactions, as well as an unexpected predominance of quantum nuclear fluctuations over thermal contributions at room temperature. We reveal that, when dealing with complex PESs, the predictions of state-of-the-art ML models (BPNN, SchNet, GAP, and sGDML) depend strongly on the descriptor used and on the region of the PES being sampled. Given the varying performance of MLFFs, we present a descriptor optimization scheme that improves the accuracy and the efficiency of ML models simultaneously. Our results show that the strategies commonly followed to construct both local and global descriptors need to be improved, because the optimal descriptors are a non-trivial combination of local and global features. Therefore, the work presented in this thesis highlights the potential of MLFFs to provide insights into chemical systems while, at the same time, revealing the current limitations that prevent the construction of accurate MLFFs for more realistic systems. Finally, I propose optimizing the description of interactions within an ML model as a valuable step towards efficient and accurate MLFFs for large and flexible molecules.
Overall, this thesis suggests that the full workflow for building ML models still requires substantial refinement. Despite this finding, the combination of CompChem and ML methods in atomistic modelling promises to help solve multiple problems in areas such as medicine, materials design, pharmacology, energy production, and environmental sciences.