Project Details

Quantum Chemical Molecular Representations for Machine Learning

Subject Area Theoretical Chemistry: Electronic Structure, Dynamics, Simulation
Term since 2022
Project identifier Deutsche Forschungsgemeinschaft (DFG) - Project number 497190956
 
This project aims to develop new molecular representations for machine learning (ML) based on efficient tight-binding (TB) quantum chemistry ('quantum features') and to connect those representations to various new network architectures. The models will be applied to predict chemically relevant properties of pharmaceutical-type molecules, such as conformational and tautomerization energies, pKa values, solubility, and partition coefficients. The project is carried out by a world-leading theoretical chemistry group in the development and application of simplified quantum chemical (QC) methods, with strong support from the science and technology company Merck, which has established competence in leveraging extensive chemical data.

For computing the quantum features, a new model Hamiltonian (ShellQ) in an extended AO basis set (vDZP) will be developed. It is designed to accurately reproduce various properties (atomic charges, shell populations, bond orders, dipole moments, polarizabilities) from a reference DFT calculation while remaining generally applicable to the whole periodic table, including organometallic systems. For the first time in a semiempirical context, it accounts for fundamental physical effects such as orbital contraction and electronic polarization. It is combined with established continuum solvation theories to model solvated molecules.

Further main aspects of the proposal are the optimization of the neural network architecture based on ShellQ features, the development of the feature representation, the automated generation of molecular training data sets, and state-of-the-art multitask learning inspired by image recognition algorithms. In general, we follow a Delta-ML strategy in which the network computes, from the available features, a correction term to a fast QC calculation (typically the established GFN-xTB or GFN-FF methods). The overall approach is intended to provide efficiency and accuracy for a potentially wide range of chemical properties.
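
The Delta-ML strategy described above can be illustrated with a minimal sketch. It is not the project's implementation: a linear ridge regression stands in for the neural network, the "features" and energies are synthetic random numbers rather than ShellQ features or GFN-xTB results, and all function and variable names are hypothetical. The only point is the structure: the model is trained on the difference between reference and fast-method values, and the prediction is the fast value plus the learned correction.

```python
import numpy as np


def _with_bias(features):
    """Append a constant column so the model can learn an offset."""
    return np.hstack([features, np.ones((features.shape[0], 1))])


def fit_delta_model(features, e_fast, e_ref, ridge=1e-6):
    """Fit weights so that features @ w approximates (e_ref - e_fast).

    A ridge-regularized linear model is an assumption made for brevity;
    it stands in for the neural network mentioned in the abstract.
    """
    X = _with_bias(features)
    delta = e_ref - e_fast  # the correction term to be learned
    A = X.T @ X + ridge * np.eye(X.shape[1])
    return np.linalg.solve(A, X.T @ delta)


def predict_corrected(features, e_fast, weights):
    """Delta-ML prediction: fast-method value plus learned correction."""
    return e_fast + _with_bias(features) @ weights


# Synthetic stand-in data (NOT real quantum features or energies).
rng = np.random.default_rng(0)
features = rng.normal(size=(200, 4))
e_fast = rng.normal(size=200)
# Assumed: the reference differs from the fast method by a linear
# function of the features plus a constant offset.
e_ref = e_fast + features @ np.array([0.5, -0.2, 0.1, 0.3]) + 0.05

train, test = slice(0, 150), slice(150, 200)
weights = fit_delta_model(features[train], e_fast[train], e_ref[train])
e_pred = predict_corrected(features[test], e_fast[test], weights)

mae_fast = np.mean(np.abs(e_fast[test] - e_ref[test]))
mae_corrected = np.mean(np.abs(e_pred - e_ref[test]))
print(f"MAE fast method: {mae_fast:.4f}")
print(f"MAE with learned correction: {mae_corrected:.4f}")
```

Because the synthetic correction is exactly linear in the features, the toy model recovers it almost perfectly; real corrections would be nonlinear and would call for a more expressive model.
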
DFG Programme Priority Programmes
 
 

