Project Details
Molecular Descriptors in Matrix Completion Methods
Applicants
Professor Dr.-Ing. Hans Hasse; Professor Dr.-Ing. Fabian Jirasek; Professor Dr. Heike Leitte
Subject Area
Theoretical Chemistry: Electronic Structure, Dynamics, Simulation
Term
since 2022
Project identifier
Deutsche Forschungsgemeinschaft (DFG) - Project number 497201843
Knowledge of the properties of mixtures is of paramount importance in chemistry and chemical engineering. We have recently introduced matrix completion methods (MCMs) from machine learning as a novel technique for predicting properties of mixtures. MCMs also open up new ways of characterizing the molecular components based on their behavior in mixtures. The idea behind an MCM is to store the available data on a certain property of binary mixtures in a matrix whose rows and columns correspond, e.g., to solutes and solvents. Because experimental data are scarce, this matrix is usually only sparsely populated; the MCM then predicts the missing entries, which is of high practical interest. In doing so, the MCM often yields better results than established physical benchmark methods. To solve the prediction task, the MCM determines descriptors of the pure components (called latent component features) based solely on the mixture data. In the present work, we will explore these latent component features in detail. In particular, we want to find out how they are related to established molecular descriptors of the components. On the one hand, this will enable us to substantially improve the predictions of mixture properties by developing hybrid MCMs. On the other hand, we aim to establish the latent component features as a new class of pure-component descriptors, which can be determined flexibly from mixture properties and used in many ways, including applications beyond the MCM. Our research will contribute to the research area “design and evaluation of molecular representations for machine learning” of SPP 2363, to which we will add expertise from engineering, machine learning, and visual data analytics, while benefiting from the inspiring input from chemistry.
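The general principle of matrix completion, a sparse solute-by-solvent matrix whose missing entries are predicted from latent features of the rows and columns, can be illustrated with a minimal sketch. This is an assumption for illustration only and not the applicants' method. It swaps in a plain regularized low-rank factorization fitted by alternating least squares, whereas the MCMs referred to above may use a different (e.g., probabilistic) formulation. The function `complete_matrix`, its parameters `rank`, `lam`, and `n_iter`, and the synthetic data are all hypothetical.

```python
import numpy as np


def complete_matrix(M, mask, rank=2, lam=1e-3, n_iter=200, seed=0):
    """Hypothetical sketch: fill missing entries of M via M ≈ U @ V.T.

    M    : (n_solutes, n_solvents) array; entries where mask is False are ignored
    mask : boolean array of the same shape, True where a value was measured
    Returns the completed matrix and the latent feature matrices U and V.
    """
    rng = np.random.default_rng(seed)
    n_rows, n_cols = M.shape
    U = rng.normal(scale=0.1, size=(n_rows, rank))  # latent features of the solutes (rows)
    V = rng.normal(scale=0.1, size=(n_cols, rank))  # latent features of the solvents (columns)
    reg = lam * np.eye(rank)  # ridge term keeps each small linear system well-posed
    for _ in range(n_iter):
        # Fix V and solve a ridge regression for each solute using only its observed entries
        for i in range(n_rows):
            obs = mask[i]
            Vo = V[obs]
            U[i] = np.linalg.solve(Vo.T @ Vo + reg, Vo.T @ M[i, obs])
        # Fix U and solve the analogous ridge regression for each solvent
        for j in range(n_cols):
            obs = mask[:, j]
            Uo = U[obs]
            V[j] = np.linalg.solve(Uo.T @ Uo + reg, Uo.T @ M[obs, j])
    return U @ V.T, U, V


# Usage on synthetic data (hypothetical): a rank-2 "property" matrix with ~50 % of entries observed
rng = np.random.default_rng(42)
M_true = rng.normal(size=(30, 2)) @ rng.normal(size=(20, 2)).T
mask = rng.random(M_true.shape) < 0.5
M_obs = np.where(mask, M_true, np.nan)  # unmeasured entries are unknown
M_hat, U, V = complete_matrix(M_obs, mask, rank=2)

# Error on the entries that were hidden from the fit
rmse = np.sqrt(np.mean((M_hat[~mask] - M_true[~mask]) ** 2))
print(f"RMSE on missing entries: {rmse:.4f}")
```

In this sketch, each row of `U` (one per solute) and each row of `V` (one per solvent) plays the role of the latent component features: they are learned only from the observed mixture entries. Such learned vectors could then, for example, be compared with established molecular descriptors.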
DFG Programme
Priority Programmes
Subproject of
SPP 2363:
Utilization and Development of Machine Learning for Molecular Applications – Molecular Machine Learning
International Connection
USA
Cooperation Partner
Professor Dr. Stephan Mandt