Keywords :
machine learning, force fields, molecular dynamics, machine learning force fields, interatomic potential, molecular simulations
Abstract :
Machine Learning Force Fields (MLFFs) are a crucial tool for bringing the accuracy of computationally expensive quantum mechanical calculations to practically feasible applications on molecules and materials. Over time, MLFF architectures have grown more sophisticated to match the complexity of ever-larger systems. This increased complexity creates a greater need for in-depth analytical tools to properly assess the quality of a Machine Learning (ML) model. Even the most advanced models, which achieve remarkably low overall prediction errors, show highly heterogeneous predictive capabilities across the Configurational Space (CS) of a single system. In practice, such heterogeneity can significantly impact a model's reliability, since a high prediction error on a few key geometries can easily lead to, e.g., the destabilisation of a molecular dynamics simulation. In this work, we provide a cross-platform software package, complete with an easy-to-use graphical user interface, designed to give a detailed view of the performance and shortcomings of an MLFF model. Named FFAST (Force Field Analysis Software and Tools), this actively developed software enables any user to gauge the quality of many state-of-the-art ML architectures and to anticipate potential pitfalls in practical applications. Analytical tools are provided at any desired level of resolution, from average error metrics over entire datasets to assessments of prediction accuracy on an atom-by-atom basis. To strike an optimal compromise between detailed analysis and simplicity, a novel approach is developed to determine a model's predictive capabilities across different regions of CS. This is achieved by using unsupervised learning methods to group the data into clusters of qualitatively different configurations and computing the prediction accuracy of each cluster separately.
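The cluster-resolved error analysis described above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not FFAST's implementation: the inverse-distance descriptor, the k-means clustering with farthest-point initialisation, and all function names (`inverse_distance_descriptor`, `kmeans`, `cluster_force_rmse`, `per_atom_force_rmse`) are hypothetical choices, and the synthetic geometries and errors are invented purely to exercise the code.

```python
import numpy as np

# Illustrative sketch only: descriptor, clustering method and function names
# are hypothetical choices, not FFAST's API.

def inverse_distance_descriptor(R):
    """Map coordinates R of shape (n_configs, n_atoms, 3) to inverse pairwise distances."""
    i, j = np.triu_indices(R.shape[1], k=1)
    d = np.linalg.norm(R[:, i, :] - R[:, j, :], axis=-1)
    return 1.0 / d

def kmeans(X, k, n_iter=100):
    """Plain Lloyd's k-means with deterministic farthest-point initialisation."""
    centers = [X[0]]
    for _ in range(1, k):
        # Next center: the point farthest from all centers chosen so far.
        d = np.min([np.linalg.norm(X - c, axis=1) for c in centers], axis=0)
        centers.append(X[d.argmax()])
    centers = np.array(centers)
    for _ in range(n_iter):
        # Assign each configuration to its nearest center.
        dists = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=-1)
        labels = dists.argmin(axis=1)
        # Recompute centers; keep the old one if a cluster becomes empty.
        new_centers = np.array([X[labels == c].mean(axis=0) if np.any(labels == c)
                                else centers[c] for c in range(k)])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return labels

def cluster_force_rmse(F_pred, F_ref, labels):
    """Force RMSE for each cluster; forces have shape (n_configs, n_atoms, 3)."""
    sq = (F_pred - F_ref) ** 2
    return {int(c): float(np.sqrt(sq[labels == c].mean())) for c in np.unique(labels)}

def per_atom_force_rmse(F_pred, F_ref):
    """Force RMSE for each atom index, averaged over configurations and components."""
    return np.sqrt(((F_pred - F_ref) ** 2).mean(axis=(0, 2)))

# Synthetic demo: 20 compact and 20 stretched geometries of a 3-atom toy system,
# with a deliberately larger force error on the stretched ones.
rng = np.random.default_rng(0)
base = np.array([[0.0, 0.0, 0.0], [1.0, 0.0, 0.0], [0.0, 1.0, 0.0]])
R = np.concatenate([base + 0.01 * rng.standard_normal((20, 3, 3)),
                    1.5 * base + 0.01 * rng.standard_normal((20, 3, 3))])
F_ref = np.zeros_like(R)
F_pred = np.concatenate([np.full((20, 3, 3), 0.1), np.full((20, 3, 3), 0.5)])

labels = kmeans(inverse_distance_descriptor(R), k=2)
rmse_by_cluster = cluster_force_rmse(F_pred, F_ref, labels)
rmse_by_atom = per_atom_force_rmse(F_pred, F_ref)
```

Here the overall force error hides the fact that one cluster is five times worse than the other. In a real analysis the descriptor, the number of clusters and the clustering algorithm would have to be chosen for the system at hand; k-means is used only because it is short to write out.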
This cluster-resolved analysis provides much-needed context to otherwise general error metrics and captures insightful details of a model's capacity to reproduce, e.g., important out-of-equilibrium mechanisms that are rarely visited in the reference dataset. Furthermore, inhomogeneous error curves across clusters indicate which regions are likely poorly represented in a model's training set. The potential of these methods and of FFAST is showcased on example datasets of stachyose and docosahexaenoic acid (DHA), as well as a handful of smaller organic molecules. A typical FFAST workflow with two state-of-the-art ML models (NequIP and MACE) quickly revealed that carbon and oxygen atoms near the glycosidic bonds of stachyose have increased prediction errors. Moreover, prediction errors on DHA rise as the molecule folds, with notably low accuracy on the carboxylic group at the end of the molecule. Finally, cluster prediction errors for a handful of small organic molecules were computed for three different ML models (sGDML, SchNet and GAP/SOAP). These showed that, in all cases, prediction accuracies are highly heterogeneous across CS, hinting at a host of potential problems in real-world applications where model stability is paramount. Motivated by these newly found heterogeneities, an iterative training process is proposed that actively seeks out configuration clusters poorly represented in the training set. Assisted by clustering techniques, the training set is gradually extended until it equally reflects all relevant parts of the CS of the reference dataset. Models trained this way are found to have enhanced stability in molecular dynamics, performing comparably to models trained on a significantly larger number of training points. They also show up to a two-fold decrease in the root mean squared error (RMSE) of force predictions in the problematic regions of CS.
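The iterative, cluster-guided extension of the training set can likewise be sketched as a loop. This is an assumption-laden illustration rather than the procedure used in this work: `errors_fn` is a hypothetical callable standing in for "train a model on these indices and return per-configuration squared force errors on the whole reference dataset", while the batch size `n_add`, the stopping ratio `ratio_tol` and the policy of growing only the single worst cluster per iteration are illustrative parameters.

```python
import numpy as np

def extend_training_set(labels, train_idx, errors_fn, n_add=10, n_iter=5,
                        ratio_tol=1.5, seed=0):
    """Grow the training set by sampling from the cluster with the highest force RMSE.

    labels    : cluster label of every configuration in the reference dataset
    train_idx : initial training-set indices
    errors_fn : hypothetical callable(train_idx) -> per-configuration squared
                force error on the full dataset (i.e. train a model, then evaluate it)
    """
    rng = np.random.default_rng(seed)
    train = set(int(i) for i in train_idx)
    clusters = np.unique(labels)
    for _ in range(n_iter):
        sq_err = errors_fn(np.array(sorted(train)))
        rmse = np.array([np.sqrt(sq_err[labels == c].mean()) for c in clusters])
        # Stop once the per-cluster errors are roughly homogeneous.
        if rmse.max() <= ratio_tol * rmse.min():
            break
        worst = clusters[rmse.argmax()]
        # Only configurations of the worst cluster not yet in the training set.
        candidates = [int(i) for i in np.flatnonzero(labels == worst) if int(i) not in train]
        if not candidates:
            break
        picks = rng.choice(candidates, size=min(n_add, len(candidates)), replace=False)
        train.update(int(i) for i in picks)
    return np.array(sorted(train))

# Toy stand-in for "train and evaluate" (hypothetical): a cluster is learned
# well once it contributes at least 10 training points.
labels_demo = np.repeat([0, 1, 2], 30)

def toy_errors(train_idx):
    counts = np.bincount(labels_demo[train_idx], minlength=3)
    return np.where(counts[labels_demo] < 10, 1.0, 0.01)

# Start from 20 points that all belong to cluster 0.
final_train = extend_training_set(labels_demo, np.arange(20), toy_errors)
```

Growing only the worst cluster per step is one simple policy; sampling from several under-performing clusters at once would be an equally plausible variant.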
Finally, future avenues of research are briefly discussed in light of this work's findings. These include a "divide and conquer" approach that subdivides the task of learning CS into smaller building blocks, offering a potential route to ensuring reliability across all possible states of the system. As MLFFs have tackled ever larger and more complex systems in recent years, subdividing the learning process into more manageable building blocks is of broad appeal.
Institution :
Unilu - University of Luxembourg [Faculty of Science, Technology and Medicine], Luxembourg, Luxembourg