Scientific journals : Article
Physical, chemical, mathematical & earth Sciences : Physics
Physics and Materials Science
http://hdl.handle.net/10993/37899
Many-Body Descriptors for Predicting Molecular Properties with Machine Learning: Analysis of Pairwise and Three-Body Interactions in Molecules
English
Pronobis, Wiktor* [Machine Learning Group, Technische Universität Berlin, 10587 Berlin, Germany]
Tkatchenko, Alexandre* [University of Luxembourg > Faculty of Science, Technology and Communication (FSTC) > Physics and Materials Science Research Unit]
Müller, Klaus-Robert
* These authors have contributed equally to this work.
11-May-2018
Journal of Chemical Theory and Computation
American Chemical Society
14
13
Yes
International
1549-9618
1549-9626
DC
Machine learning (ML) based prediction of molecular properties across chemical compound space is an
important alternative approach for efficiently estimating the solutions of highly complex many-electron
problems in chemistry and physics. Statistical methods represent molecules as descriptors that should
encode molecular symmetries and interactions between atoms. Many such descriptors have been proposed;
all of them have advantages and limitations. Here, we propose a set of general two-body and three-body
interaction descriptors which are invariant to translation, rotation, and atomic indexing. By adapting
the widely used kernel ridge regression methods of machine learning, we evaluate our descriptors on
predicting several properties of small organic molecules calculated using density-functional theory.
We use two data sets. The GDB-7 set contains 6868 molecules with up to 7 heavy atoms of type CNO.
The GDB-9 set is composed of 131722 molecules with up to 9 heavy atoms containing CNO. When trained on
5000 random molecules, our best model achieves an accuracy of 0.8 kcal/mol on the remaining 1868
molecules of GDB-7 and 1.5 kcal/mol on the remaining 126722 molecules of GDB-9. A linear regression
model applied to our novel many-body descriptors performs almost as well as a nonlinear kernelized
model. Linear models are readily interpretable: a feature importance ranking measure helps to obtain
qualitative and quantitative insights into the importance of two- and three-body molecular interactions
for predicting molecular properties computed with quantum-mechanical methods.
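The invariance claim in the abstract can be illustrated with a minimal sketch. The abstract does not give the functional form of the descriptors, so the code below makes an assumption: a two-body feature is taken to be the sum of inverse interatomic distances raised to a few integer powers, grouped by unordered element pair. The function name `two_body_descriptor`, the `elements` and `powers` defaults, and the water geometry are all hypothetical choices made for illustration, not the authors' implementation.

```python
import itertools
import numpy as np

def two_body_descriptor(charges, coords, elements=(1, 6, 7, 8), powers=(1, 2, 3)):
    """Assumed form: for each unordered element pair and each power m,
    sum r_ij**(-m) over all atom pairs of that element type."""
    charges = np.asarray(charges)
    coords = np.asarray(coords, dtype=float)
    pair_types = list(itertools.combinations_with_replacement(sorted(elements), 2))
    index = {pair: k for k, pair in enumerate(pair_types)}
    feats = np.zeros((len(pair_types), len(powers)))
    for i, j in itertools.combinations(range(len(charges)), 2):
        key = tuple(sorted((int(charges[i]), int(charges[j]))))
        r = np.linalg.norm(coords[i] - coords[j])
        feats[index[key]] += [r ** (-m) for m in powers]
    return feats.ravel()

# Illustrative water geometry in angstrom (hypothetical input, not from the data sets).
charges = np.array([8, 1, 1])
coords = np.array([[0.0, 0.0, 0.1173],
                   [0.0, 0.7572, -0.4692],
                   [0.0, -0.7572, -0.4692]])

rng = np.random.default_rng(0)
q, _ = np.linalg.qr(rng.normal(size=(3, 3)))   # random orthogonal matrix
if np.linalg.det(q) < 0:
    q[:, 0] *= -1.0                            # make it a proper rotation
perm = rng.permutation(len(charges))           # random atom reindexing
shift = rng.normal(size=3)                     # random translation

base = two_body_descriptor(charges, coords)
moved = two_body_descriptor(charges[perm], (coords @ q.T + shift)[perm])
print(np.allclose(base, moved))
```

Because the features depend only on interatomic distances and are summed per element pair, rotating, translating, or reindexing the atoms leaves the vector unchanged. The vector also has the same length for every molecule, which is what allows a linear or kernel ridge model to be fitted on it directly.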
Researchers ; Professionals ; Students ; General public ; Others
10.1021/acs.jctc.8b00110
H2020 ; 725291 - BeStMo - Beyond Static Molecules: Modeling Quantum Fluctuations in Complex Molecular Environments

File(s) associated to this reference

Fulltext file(s):

File | Commentary | Version | Size | Access
126-many-body-descriptors-ML-JCTC-2018.pdf | | Publisher postprint | 4.51 MB | Open access


All documents in ORBilu are protected by a user license.