Quantum Mechanics / Machine Learning Models
Matthias Rupp
University of Basel, Department of [email protected]
IPAM Summer School on Electronic Structure Theory, Los Angeles, California, July 31, 2014
Outline
Introduction What are QM/ML models?
Machine learning How does ML work?
Applications What can be done with them?
Pitfalls What can go wrong?
Demonstration Worked example
Matthias Rupp: QM/ML Models 2
Approximations
[Figure: accuracy and generality decrease while speed increases along the series: full configuration interaction → quantum Monte Carlo → coupled cluster → density functional theory → MNDO, tight binding → force fields]
QM/ML models:
The accuracy of quantum chemistry, at the speed of machine learning
QM/ML models
Exploit redundancy in a series of QM calculations
• QM/ML = quantum mechanics + machine learning
• Interpolate between QM calculations using ML
• Smoothness assumption (regularization)
[Figure: property vs. molecular structure; • reference calculations, solid line: QM, dashed line: ML interpolation]
Relationship to other models
Quantum chemistry: generally applicable; no or little fitting; form from physics; deductive; few or no parameters; slow; small systems

Force fields: limited domain; fitting to one class; form from physics; mostly deductive; some parameters; fast; large systems

Machine learning: generally applicable; refitted to any dataset; form from statistics; inductive; many parameters; speed in between; large systems
What is machine learning?
• Interpolation
• Algorithmic search for patterns in data
• Inference from known samples to new ones
• Regularity, information content
• Data-driven approach
• Empirical but principled
Hastie, Tibshirani, Friedman: The Elements of Statistical Learning, Springer, 2nd ed., 2009. Bishop: Pattern Recognition and Machine Learning, Springer, 2006.
Machine learning algorithms
• Artificial neural networks (Haykin, 2008; Montavon et al (ed.), 2012)
• Kernel ridge regression (Hastie, Tibshirani, Friedman, 2009)
• Gaussian process regression (Rasmussen & Williams, 2006)
• Support vector machines (Cristianini & Shawe-Taylor, 2000)
• Principal component analysis (Jolliffe, 2004)
• Symbolic regression (Schmidt, Lipson, Science, 2009)
• Many others. . .
Kernel learning
Idea:
• Transform samples into higher-dimensional space
• Implicitly compute inner products there
• Rewrite linear algorithm to use only inner products
[Figure: points x ∈ [−2π, 2π] in input space X are mapped by φ into feature space H, illustrated with sin x]

Input space X —φ→ Feature space H

k : X × X → R,  k(x, z) = ⟨φ(x), φ(z)⟩
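The kernel trick can be checked numerically. A small sketch (using a degree-2 polynomial kernel rather than the sin example from the figure): the kernel evaluates exactly the inner product that the explicit feature map φ would, without ever constructing φ(x).

```python
import numpy as np

# Explicit degree-2 polynomial feature map for 2D inputs:
# phi(x) = (x1^2, x2^2, sqrt(2) x1 x2, sqrt(2) x1, sqrt(2) x2, 1)
def phi(x):
    x1, x2 = x
    return np.array([x1**2, x2**2,
                     np.sqrt(2) * x1 * x2,
                     np.sqrt(2) * x1,
                     np.sqrt(2) * x2,
                     1.0])

# The corresponding kernel computes the same inner product implicitly
def k(x, z):
    return (np.dot(x, z) + 1.0) ** 2

x = np.array([1.0, 2.0])
z = np.array([3.0, -1.0])
print(np.dot(phi(x), phi(z)))  # 4.0
print(k(x, z))                 # 4.0
```

The two numbers agree: the kernel sidesteps the 6-dimensional feature space entirely, which is what makes very high-dimensional (even infinite-dimensional) feature spaces tractable.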
Schölkopf, Smola: Learning with Kernels, 2002; Hofmann et al.: Ann. Stat. 36, 1171, 2008.
Kernels
Kernels correspond to inner products.
If k : X × X → R is symmetric positive semi-definite,then k(x , z) = 〈φ(x), φ(z)〉 for some φ : X → H.
Inner products encode information about lengths and angles:
||x − z||² = ⟨x, x⟩ − 2⟨x, z⟩ + ⟨z, z⟩,   cos θ = ⟨x, z⟩ / (||x|| ||z||)
[Figure: triangle with vertices 0, x, z and angle θ at the origin, relating ||x − z||, ||x||, ||z||, and the projection ||z|| cos θ]
• Well characterized function class
• Closure properties
• Access data only by K_ij = k(x_i, x_j)
• X can be any non-empty set
• Examples:
  Linear kernel: k(x, z) = ⟨x, z⟩
  Gaussian kernel: k(x, z) = exp(−||x − z||² / (2σ²))
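These properties can be verified numerically. A quick sanity check (our own illustration, not from the slides): a Gaussian kernel matrix built from arbitrary points is symmetric positive semi-definite, as the definition requires.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(50, 3))  # 50 arbitrary points in R^3

# Gaussian kernel matrix K_ij = exp(-||x_i - x_j||^2 / (2 sigma^2))
sigma = 1.5
d2 = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
K = np.exp(-d2 / (2.0 * sigma**2))

# A valid kernel matrix is symmetric positive semi-definite:
eigvals = np.linalg.eigvalsh(K)
print(eigvals.min() >= -1e-10)  # True: no significantly negative eigenvalues
```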
Kernel ridge regression
• Regularized form of ordinary regression
• Regularization prevents over-fitting by penalizing large coefficients
• Use of kernels for non-linearity
The solution has the form

f(x) = ∑_{i=1}^n α_i k(x_i, x).

The coefficients α minimize

∑_{i=1}^n (f(x_i) − y_i)² + λ αᵀKα,

which has the closed-form solution

α = (K + λI)⁻¹ y.
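The closed-form solution above translates directly into a few lines of code. A minimal sketch with a Gaussian kernel on toy 1D data (function names like `krr_train` are our own):

```python
import numpy as np

def gaussian_kernel(X, Z, sigma):
    """Kernel matrix K_ij = exp(-||x_i - z_j||^2 / (2 sigma^2))."""
    d2 = (np.sum(X**2, axis=1)[:, None]
          + np.sum(Z**2, axis=1)[None, :]
          - 2.0 * X @ Z.T)
    return np.exp(-d2 / (2.0 * sigma**2))

def krr_train(X, y, sigma, lam):
    """Solve (K + lambda I) alpha = y for the coefficients."""
    K = gaussian_kernel(X, X, sigma)
    return np.linalg.solve(K + lam * np.eye(len(y)), y)

def krr_predict(Xnew, X, alpha, sigma):
    """f(x) = sum_i alpha_i k(x_i, x)."""
    return gaussian_kernel(Xnew, X, sigma) @ alpha

# Toy data: learn sin(x) from 25 samples on [0, 2 pi]
X = np.linspace(0.0, 2.0 * np.pi, 25)[:, None]
y = np.sin(X).ravel()
alpha = krr_train(X, y, sigma=0.5, lam=1e-6)

Xq = np.linspace(0.0, 2.0 * np.pi, 100)[:, None]
err = np.max(np.abs(krr_predict(Xq, X, alpha, sigma=0.5) - np.sin(Xq).ravel()))
```

Note that training reduces to one linear solve; there is no iterative optimization.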
Gaussian process regression
• Generalization of multivariate normal distribution to functions
• Determined by mean function and covariance function = kernel
• Conditioning of prior on training data yields posterior distribution
• Variance as confidence estimates for predictions
[Figure: functions drawn from the GP prior (left) and from the posterior after conditioning on a few observations (right); input on the horizontal axis, target on the vertical axis]
• Intuitively: place a basis function on each training datum x_i
• The solution has the form f(x) = ∑_{i=1}^n α_i k(x_i, x)
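The posterior mean and variance can be computed in a few lines. A minimal sketch with a Gaussian covariance function and a handful of 1D observations (all names and data here are illustrative):

```python
import numpy as np

def k(a, b, sigma=1.0):
    """Gaussian covariance function on 1D inputs."""
    return np.exp(-(a[:, None] - b[None, :])**2 / (2.0 * sigma**2))

# Five observations of a 1D function, with a small noise/jitter term
X = np.array([-4.0, -2.0, 0.0, 1.0, 3.0])
y = np.sin(X)
K = k(X, X) + 1e-6 * np.eye(len(X))

# Posterior mean and variance on a grid of test inputs
Xs = np.linspace(-5.0, 5.0, 200)
Ks = k(Xs, X)                                   # cross-covariances
mean = Ks @ np.linalg.solve(K, y)               # posterior mean
var = 1.0 - np.sum(Ks * np.linalg.solve(K, Ks.T).T, axis=1)  # posterior variance
```

The variance collapses near the observations and returns to the prior variance (here 1) far away from them, which is what makes it usable as a confidence estimate.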
Rasmussen, Williams: Gaussian Processes for Machine Learning, MIT Press, 2006.
Applications
• Potential energy surfaces (Handley & Behler, Eur. Phys. J. B 152, 2014)
• Molecular and materials properties (Rupp et al, PRL 058301, 2012)
• Polarizabilities (Kandathil et al, JCC 1850, 2013)
• Density functional theory (Snyder et al, JCP 224104, 2013)
• Transition state theory dividing surfaces (Pozun et al, JCP 174101, 2012)
• Materials properties (Pilania et al, Sci. Rep. 2810, 2013; Ghiringhelli et al, 2014)
• Transmission coefficients (Lopez-Bezanilla & von Lilienfeld, PRB 235411, 2014)
• Collective variables (Rohrdanz et al, JCP 124116, 2011)
• Others (e.g., nuclear physics, cheminformatics)
Gaussian approximation potentials
• Gaussian process regression
• Molecular dynamics
• Partitioned energies
• Representation:
Local density
Projection to 4d sphere
Hyperspherical harmonics
Bispectrum
[Figure: transition path energies vs. reaction coordinate, and errors of the GAP, BOP, MEAM, and FS potentials on tungsten properties: elastic constants C11, C12, C44, vacancy energy, and (100), (110), (111), (112) surface energies]
Bartók, Csányi et al, Phys Rev Lett 104: 136403, 2010. Szlachta et al, arXiv:1405.4370, 2014.
Density functional theory
Learning the map from electron density to kinetic energy
• Orbital-free DFT
• 1D toy system
• DFT/LDA as reference
• Error decays to zero
• Self-consistent densities
• Bond breaking and formation
[Figure panels: H2 potential, H2 binding curve, H2 forces]
Snyder et al, Phys Rev Lett 108: 253002, 2012. Snyder et al, J Chem Phys 139: 224104, 2013.
Transition state theory
• Characterization of dividing surfaces
• Support vector machines
• No prior information required
• Iteratively refined by biased sampling along dividing surface
[Figure: dividing surfaces in the (x, y) plane separating reactant R from products P1 and P2, with transition states TS1 and TS2; panels (a)–(d) show iterative refinement near the saddle points]
Pozun et al, J. Chem. Phys. 136: 174101, 2012.
∆-learning: Setup
Learning the error between different levels of theory
• Learn corrections to a baseline method(∆ = reference - baseline)
• Augmenting legacy QM methods
• Puts physics into QM/ML model
• Examples: ∆^B3LYP_PM7, ∆^G4MP2_PM7, ∆^CCSD(T)_HF (target level as superscript, baseline as subscript)
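The ∆-learning idea can be sketched on toy data (simple functions stand in for the baseline and reference methods; this is not the actual PM7/B3LYP data): the ML model is trained on the difference only, and predictions add the learned correction back onto the cheap baseline.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical toy setting: the "baseline" captures the dominant physics,
# the "reference" adds a smooth correction the ML model must learn.
x = rng.uniform(-2.0, 2.0, size=40)
baseline = x**2                               # cheap method
reference = x**2 + 0.3 * np.sin(3.0 * x)      # expensive method

def K(a, b, s=0.5):
    """Gaussian kernel matrix for 1D inputs."""
    return np.exp(-(a[:, None] - b[None, :])**2 / (2.0 * s**2))

# Train kernel ridge regression on the difference only
delta = reference - baseline
alpha = np.linalg.solve(K(x, x) + 1e-6 * np.eye(len(x)), delta)

# Predict: cheap baseline plus learned correction
xq = np.linspace(-1.9, 1.9, 100)
pred = xq**2 + K(xq, x) @ alpha
err = np.max(np.abs(pred - (xq**2 + 0.3 * np.sin(3.0 * xq))))
```

Because the correction ∆ is smaller and smoother than the property itself, it is easier to learn than the reference values directly, which is the point of the approach.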
Ramakrishnan, Dral, Rupp, von Lilienfeld, submitted, 2014.
∆-learning: Data
Learning the error between different levels of theory
134 k small organic molecules
PM7, DFT B3LYP
!""""""
""!" """"""
!"""""""""""!"""""""
#""""$"""%""""""
%#"$"""
""""!""""""
!""""""
%#"$"""!""""""
#""""""$"""""%""""""" """!""""""
!
"!
#!
$%!
$&!
%'! %'( )'! )'(
&'()*+,-*.
&/.R*E
6 k constitutional isomers of C7H10O2
PM7, G4MP2; HF, MP2, CCSD(T)
Ramakrishnan et al, submitted, 2014. Ramakrishnan et al, Nature Scientific Data, accepted, 2014.
Overfitting: Model complexity and generalization error
[Figure: three regularized fits of the same data; each panel reports training error / test error]
• Underfitting (λ too large): 0.123 / 0.443
• Fitting (λ right): 0.044 / 0.068
• Overfitting (λ too small): 0.036 / 0.939
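The effect of λ can be reproduced on toy data (a sketch with synthetic data, not the data behind the figure): shrinking λ drives the training error down while the test error eventually grows again.

```python
import numpy as np

rng = np.random.default_rng(2)
f = lambda t: np.sin(3.0 * t)

# 40 noisy training samples and a clean test grid on [0, 2]
x_tr = rng.uniform(0.0, 2.0, 40)
y_tr = f(x_tr) + 0.5 * rng.normal(size=40)
x_te = np.linspace(0.0, 2.0, 200)

def K(a, b, s=0.25):
    """Gaussian kernel matrix for 1D inputs."""
    return np.exp(-(a[:, None] - b[None, :])**2 / (2.0 * s**2))

def errors(lam):
    """Fit kernel ridge regression with strength lam; return train/test RMSE."""
    alpha = np.linalg.solve(K(x_tr, x_tr) + lam * np.eye(40), y_tr)
    tr = np.sqrt(np.mean((K(x_tr, x_tr) @ alpha - y_tr)**2))
    te = np.sqrt(np.mean((K(x_te, x_tr) @ alpha - f(x_te))**2))
    return tr, te

tr_small, te_small = errors(1e-8)   # lambda too small: fits the noise
tr_mid, te_mid = errors(1e-1)       # lambda about right
tr_large, te_large = errors(1e3)    # lambda too large: underfits
```

Training error alone is therefore useless for choosing λ; it must be selected on held-out data (cross-validation).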
Rupp, PhD thesis, 2009. Li Li et al, submitted.
Overfitting: Another example
[Figure: scatter plot of (x, y) data points]
Overfitting: Early stopping rule
[Figure: error vs. training complexity; the training-set error decreases monotonically, while the test-set error passes through a minimum, which marks the stopping point]
Validation
Golden rule
Training must never use validation data

Example 1: overfitting
✗ train on all data, predict all data
✓ split data, train, predict

Example 2: centering
✗ center data, split data, train & predict
✓ split data, center training set, train, center test set, predict

Example 3: cross-validation with feature selection
✗ feature selection, then cross-validation
✓ feature selection within each split of the cross-validation
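The centering example can be made concrete (a synthetic sketch with made-up data): the centering statistics must come from the training split alone, and the same training-set mean is then applied to the test split.

```python
import numpy as np

rng = np.random.default_rng(3)
X = rng.normal(loc=5.0, size=(100, 4))   # synthetic descriptor matrix

# Correct: split FIRST, then compute statistics on the training part only
train, test = X[:80], X[80:]
mu = train.mean(axis=0)      # training-set mean
train_c = train - mu
test_c = test - mu           # the SAME mean is applied to the test set

# Wrong: a mean computed on all data leaks test-set information
mu_leaky = X.mean(axis=0)
```

The two means differ, and that difference is exactly the information that would leak from the test set into training if the data were centered before splitting.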
Reliability of predictions
Predictive variance of Gaussian process regression model
Snyder et al, Phys. Rev. Lett. 108: 253002, 2012.
Gradients
Functional derivative of model as-is and projected on training data
Snyder et al, J. Chem. Phys. 139: 224104, 2013.
Summary
• QM/ML models combine quantum chemistry with machine learning by interpolating between reference QM calculations
• The concept is broadly applicable
Live demonstration
Acknowledgements
The Basel team
von Lilienfeld Ramakrishnan Chang
Collaborators
M.R. Bauer, F. Biegler, L. Blooston, F.M. Boeckler, F. Brockherde, K. Burke, P. Dral, S. Fazli, G. Folkers, V. Gobre, K. Hansen, G. Henkelman, J. Huang, A. Knoll, A. Lange, L. Li, A. Lopez-Bezanilla, G. Montavon, K.-R. Müller, I.M. Pelaschier, Z. Pozun, M. Reutlinger, M. Scheffler, G. Schneider, D. Sheppard, J.C. Snyder, A. Tkatchenko, S. Varma, A. Vazquez-Mayagoitia, R. Wilcken, A. Ziehe
Institutions IPAM ∗ EU FP7 ∗ DFG ∗ SNSF