Abstract :
[en] In this thesis, we present a unified framework for systematically developing and validating machine-learned force fields (MLFFs) for complex semiconducting materials and interfaces. First, we conduct large-scale MLFF simulations of a perovskite slab system to provide detailed insights into its geometric and electronic structure. Second, we perform a comprehensive analysis of state-of-the-art MLFFs to identify the most suitable architectures and models, pinpoint potential avenues for improvement, and establish a rigorous benchmarking protocol for atomistic simulation of materials and interfaces. Finally, we introduce a novel data-driven strategy that systematically incorporates molecular and crystalline symmetries, thereby enhancing the predictive accuracy and efficiency of MLFFs.
Specifically, we conducted large-scale simulations of a cesium lead iodide (CsPbI$_3$) perovskite slab. We demonstrate that the nonlocal many-body dispersion (MBD-NL) method accurately reproduces the experimentally observed phase diagram, unlike the widely used D3 scheme, which incorrectly predicts the cubic phase at room temperature instead of the orthorhombic phase. Furthermore, by extending our study to finite temperatures for a slab structure, we demonstrate the influence of the van der Waals interaction on the electronic structure near the surface and within the slab. In combination with state-of-the-art MLFF models, these findings establish a foundation for large-scale and predictive modeling of the geometric and electronic properties of two-dimensional perovskite interfaces, thereby advancing their mechanistic understanding and facilitating the design of next-generation materials.
By conducting a comprehensive assessment of state-of-the-art MLFF architectures within the TEA Challenge 2023, we benchmarked five representative models—sGDML, SOAP/GAP, FCHL19, MACE, and SO3krates—across periodic materials and interfaces. While modern equivariant message-passing neural networks deliver the highest overall accuracy, all architectures display considerable maximum force errors and heterogeneous per-atom performance. In molecular dynamics simulations, kernel-based methods often fail outside well-sampled regions, whereas E(3)-equivariant neural networks generally preserve stability but remain limited in capturing long-range interactions critical for adsorption and desorption processes. These findings provide practical guidelines for MLFF development, establish a streamlined evaluation workflow, and highlight key directions for improving model robustness and transferability in the study of materials and molecular systems.
Heterogeneity in force predictions for different atomic environments underscores the need for a data-driven symmetry search to automatically identify atomic "orbits"—chemically distinct local environments that conventional MLFFs often conflate under their global permutation-invariance assumptions. We demonstrate significant improvements in force prediction accuracy when integrating these orbits into both kernel-based (sGDML) and E(3)-equivariant neural message-passing (MACE) architectures for organic interfaces and slab perovskite systems.
For sGDML, trained on ethanol, 1,8-naphthyridine, D-alanine, and D-histidine molecules adsorbed on graphene, we establish a strong correlation between force prediction accuracy and chemical diversity, quantified by orbit count. Using an orbit-enhanced sGDML model, we constructed the average potential energy surface and potential surface of mean force for 1,8-naphthyridine on graphene, finding energy barriers exceeding 300 K between unit-cell minima, preferential diffusion along narrow channels that avoid atop-carbon sites, and substantial rotational barriers about the surface-normal axis, suggesting that, at room temperature, transport proceeds via coupled in-plane translation and rotation. Incorporating orbits into MACE enables us to reduce the model size by an order of magnitude while preserving predictive accuracy, as demonstrated for the CsPbI$_3$ perovskite slab and graphene with a pyridinic-N defect.
Overall, this work advances a robust pipeline for reliable and efficient atomistic simulations by integrating high-quality datasets, physically informed ML model architectures, and community-wide efforts toward rigorous validation protocols that extend beyond static error metrics to include dynamic stability and chemical transferability.
Disciplines :
Physical, chemical, mathematical & earth Sciences: Multidisciplinary, general & others
Institution :
Unilu - University of Luxembourg [Faculty of Science, Technology and Medicine], Esch-sur-Alzette, Luxembourg
UMONS - Université de Mons [The Faculty of Science], Mons, Belgium