MATERIALS PROPERTY PREDICTIONS USING MACHINE LEARNING:
RECENT EXAMPLES AND FUTURE OUTLOOK
NOMAD SUMMER: A HANDS-ON COURSE ON TOOLS FOR NOVEL-MATERIALS DISCOVERY
28th September 2017, 9:00 AM Physics Department (Lise-Meitner-Haus), Humboldt-University, Berlin
Ghanshyam Pilania
Theory Department, Fritz Haber Institute of the Max Planck Society &
Materials Science and Technology (MST) Division, Los Alamos National Laboratory
AGE OF DATA, ML & AI
Medical applications
Transportation
Robotics
Speech recognition
Finance and marketing
Recommender systems
Facial recognition
Board games
…
INFORMATICS: IN THE PAST
6th Century BC: Steel production (India & Sri Lanka)
17th Century: Kepler’s Laws (Johannes Kepler)
19th Century: Data-driven Nursing (Florence Nightingale)
19th Century: Periodic Table (Dimitri Mendeleev)
Mid 20th Century: Hume-Rothery Rules, Hall-Petch Relationship
Late 20th Century: Chem/Bio Informatics, Polymer Informatics
Early 21st Century: Materials Informatics
MATERIALS INFORMATICS: TODAY
Moore’s Law
Computational Power
Algorithms & Methods Development
High-throughput Experiments
Materials Genome Initiative
Open-Source Materials Databases
and many more…
DATA TO INSIGHTS AND PREDICTIONS
Design
• Efficient enumeration
• Targeted search
• Adaptive optimization
[Figure: Materials → Property via laborious computations and/or experiments, versus Materials → Fingerprint → Property via machine learning]
A Mannodi-Kanakkithodi et al. Sci. Rep. 6, 20952 (2016)
RECENT HIGHLIGHTS
Understanding Radiation Damage Resistance: Chem. Mater. 29, 2574 (2017)
Learning Models for Dielectric Breakdown Strength: Chem. Mater. 28, 1304 (2016) & J. Phys. Chem. C 120, 14575 (2016)
Learning Bandgaps of Solids: Sci. Rep. 6, 19375 (2016) & Comput. Mater. Sci. 129, 156 (2017)
Designing Polymer Dielectrics for Energy Storage: Sci. Rep. 6, 20952 (2016) & Comput. Mater. Sci. 125, 123 (2016)
• Phenomenological model discovery for intrinsic dielectric breakdown strength of insulators using machine learning
• Multi-fidelity machine learning models for bandgap prediction
EXAMPLE 1
DIELECTRIC BREAKDOWN
Rapid reduction in the resistance of an electrical insulator in the presence of an extreme electric field
[Figure: insulator between a positively and a negatively charged plate, with an electron (e-) in the applied E-field]
PREDICTING BREAKDOWN FIELD
Predicting intrinsic electrical breakdown field of an insulator from first principles is difficult…
It is determined by the balance between the electron's energy gain (from the E-field) and its energy loss (to phonons)
Can the breakdown field be estimated rapidly using a simple heuristic model? Consider 82 binary octets (e.g. ZnO, NaCl, …)
Fröhlich, Nature 151, 339 (1943)
Sun et al., Appl. Phys. Lett. 101, 132906 (2012)
C. Kim, G. Pilania, R. Ramprasad, Chem. Mat. 28, 1304 (2016).
Dependence on chemistry?
FIRST PRINCIPLES CALCULATIONS
Alkali metal halides
Transition metal halides
Post-transition metal halides
Alkaline earth metal chalcogenides
Transition metal oxides
Group IV
Group III-V
Group II-VI
C. Kim, G. Pilania, R. Ramprasad, “From organized high-throughput data to phenomenological theory using machine learning: the example of dielectric breakdown”, Chem. Mat. 28, 1304 (2016).
LEARNING FROM DATA
Easily accessible material properties
• Band gap (Eg)
• Phonon cutoff frequency (ωmax)
• Mean phonon frequency (ωmean)
• Bulk modulus (M)
• Dielectric constant, electron (εe)
• Dielectric constant, total (εtot)
• Nearest neighbor distance (dNN)
• Density (ρ)
Intrinsic breakdown field of 82 binary octets (by DFT)
Correlation analysis & Machine learning
Fb = f(A, B, …)
FEATURE CREATION
8 primary features
• Band gap (Eg)
• Phonon cutoff frequency (ωmax)
• Mean phonon frequency (ωmean)
• Bulk modulus (M)
• Dielectric constant, electron (εe)
• Dielectric constant, total (εtot)
• Nearest neighbor distance (dNN)
• Density (ρ)
12 prototypical functions
x, x^-1, x^2, x^-2, x^3, x^-3, x^(1/2), x^(-1/2), ln(x), 1/ln(x), e^x, e^-x
187,952 compound features
• 96 features with 1 function (8 primary features × 12 functions), e.g. ln(Eg)
• 4,480 unique features with 2 functions, e.g. εtot^2 / exp(dNN)
• 183,368 unique features with 3 functions, e.g. ln(ωmax) exp(M) / Eg^2
Ghiringhelli, et al., Phys. Rev. Lett. 114 105503 (2015)
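The feature-construction step can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' code: the primary-feature values are synthetic random numbers, the short names (`wmax`, `eps_tot`, …) are hypothetical, and the naive pair enumeration below does not apply whatever de-duplication produced the 4,480 unique two-function features quoted above.

```python
import itertools
import numpy as np

# The 12 prototypical functions listed on the slide.
FUNCTIONS = {
    "x": lambda x: x,
    "x^-1": lambda x: 1.0 / x,
    "x^2": lambda x: x**2,
    "x^-2": lambda x: 1.0 / x**2,
    "x^3": lambda x: x**3,
    "x^-3": lambda x: 1.0 / x**3,
    "x^(1/2)": lambda x: np.sqrt(x),
    "x^(-1/2)": lambda x: 1.0 / np.sqrt(x),
    "ln(x)": lambda x: np.log(x),
    "1/ln(x)": lambda x: 1.0 / np.log(x),
    "e^x": lambda x: np.exp(x),
    "e^-x": lambda x: np.exp(-x),
}

# Hypothetical short names for the 8 primary features.
PRIMARY_NAMES = ["Eg", "wmax", "wmean", "M", "eps_e", "eps_tot", "dNN", "rho"]


def one_function_features(primary):
    """Apply every prototypical function to every primary feature."""
    feats = {}
    for pname, values in primary.items():
        values = np.asarray(values, dtype=float)
        for fname, fn in FUNCTIONS.items():
            feats[f"{fname}[{pname}]"] = fn(values)
    return feats


def product_features(feats, order):
    """Multiply together every combination of `order` distinct features."""
    out = {}
    for combo in itertools.combinations(feats, order):
        out[" * ".join(combo)] = np.prod([feats[c] for c in combo], axis=0)
    return out


# Synthetic data: 5 fictitious materials, values kept in (1.5, 3.0) so that
# ln(x) is non-zero and exp(x) stays small.
rng = np.random.default_rng(0)
primary = {name: rng.uniform(1.5, 3.0, size=5) for name in PRIMARY_NAMES}

single = one_function_features(primary)
pairs = product_features(single, 2)
print(len(single), "one-function features;", len(pairs), "naive pair products")
```

Three-function features follow from `product_features(single, 3)`, at a correspondingly larger memory cost.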
FEATURE SELECTION USING LASSO
[Flowchart: LASSO-based down-selection. Each of the 187,952 compound features (feature n, n = 1 to 187,952) is tested with "Highly correlated?" (based on its LASSO* coefficient) and then either survives or is discarded; 36 features remain.]
Ranking   Feature                          Absolute Pearson correlation with ln(Fb)
1         ln(Eg) ln(ωmax) / dNN^(1/2)      0.899
2         (Eg ωmax)^(1/2)                  0.890
3         ln(Eg) ωmax^(1/2)                0.890
4         Eg^(1/2) ln(ωmax)                0.889
5         Eg^(1/2) / dNN                   0.885
6         ln(Eg) / dNN^2                   0.883
7         ln(Eg) / exp(dNN)                0.880
8         Eg^(1/2) / ln(dNN)               0.879
9         ωmax Eg^(1/2) / ln(ωmean)        0.871
10        (ωmax / Eg)^(1/2)                0.869
…         …                                …
36        (εtot Eg)^(1/2)                  0.480
*LASSO: Least absolute shrinkage and selection operator
R. Tibshirani, “Regression shrinkage and selection via the Lasso”, J. R. Stat. Soc. Ser. B 58, 267 (1996).
Ghiringhelli, et al., Phys. Rev. Lett. 114 105503 (2015)
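The two stages above (LASSO down-selection, then ranking the survivors by absolute Pearson correlation with ln Fb) can be sketched as follows. This is a minimal illustration, not the authors' pipeline: the data are synthetic, the coordinate-descent solver is a textbook one written out so the block needs only NumPy, and the penalty `lam = 0.3` is an arbitrary assumption.

```python
import numpy as np


def lasso_cd(X, y, lam, n_iter=200):
    """Coordinate descent for (1/2n)||y - Xw||^2 + lam * ||w||_1."""
    n, p = X.shape
    w = np.zeros(p)
    col_sq = (X**2).sum(axis=0) / n
    r = y.astype(float).copy()          # running residual y - Xw
    for _ in range(n_iter):
        for j in range(p):
            r += X[:, j] * w[j]         # remove feature j's contribution
            rho = X[:, j] @ r / n
            # Soft-thresholding sets weakly contributing coefficients to zero.
            w[j] = np.sign(rho) * max(abs(rho) - lam, 0.0) / col_sq[j]
            r -= X[:, j] * w[j]         # add the updated contribution back
    return w


# Synthetic stand-in: 82 "materials", 50 candidate features, of which only
# columns 3 and 17 actually drive the target.
rng = np.random.default_rng(0)
X = rng.standard_normal((82, 50))
X = (X - X.mean(axis=0)) / X.std(axis=0)
ln_fb = 2.0 * X[:, 3] - 1.5 * X[:, 17] + 0.1 * rng.standard_normal(82)
ln_fb = ln_fb - ln_fb.mean()

# Stage 1: keep the features whose LASSO coefficient is non-zero.
w = lasso_cd(X, ln_fb, lam=0.3)
survivors = np.flatnonzero(w)

# Stage 2: rank the survivors by |Pearson correlation| with ln(Fb).
ranked = sorted(
    ((int(j), abs(np.corrcoef(X[:, j], ln_fb)[0, 1])) for j in survivors),
    key=lambda item: item[1],
    reverse=True,
)
for j, corr in ranked:
    print(f"feature {j}: |r| = {corr:.3f}")
```

On the real problem the feature matrix would hold the 187,952 compound features, and the penalty would be tuned rather than fixed.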
PREDICTION MODEL
Model formula: Fb = 24.442 exp{0.315 (Eg ωmax)^(1/2)}
Fb: predicted intrinsic breakdown field in MV/m
Eg: band gap in eV
ωmax: phonon cutoff frequency in THz
Entirely heuristic, not a law! But practically useful for estimating the electrical breakdown field strength.
[Figure: correlation plot of machine-learning-predicted vs. DFT-computed breakdown field]
C. Kim, G. Pilania, R. Ramprasad, Chem. Mat. 28, 1304 (2016).
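The model formula is simple enough to evaluate directly. The sketch below transcribes it as stated on the slide (Eg in eV, ωmax in THz, Fb in MV/m); the function name and the input values in the usage line are hypothetical and do not correspond to any specific compound.

```python
import math


def breakdown_field(e_gap_ev, omega_max_thz):
    """Heuristic intrinsic breakdown field Fb in MV/m.

    Fb = 24.442 * exp(0.315 * sqrt(Eg * omega_max)),
    with Eg in eV and omega_max in THz.
    """
    if e_gap_ev < 0 or omega_max_thz < 0:
        raise ValueError("band gap and phonon cutoff frequency must be non-negative")
    return 24.442 * math.exp(0.315 * math.sqrt(e_gap_ev * omega_max_thz))


# Illustrative inputs only (not data for a real material).
print(f"{breakdown_field(5.0, 20.0):.1f} MV/m")
```

Because the model is a fit to the 82 binary octets, values far outside that training range should be read as extrapolations.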
APPLICATION TO PEROVSKITES
C. Kim, G. Pilania, R. Ramprasad, J. of Phys. Chem. C 120, 14575-14580 (2016).
209 perovskites → Prediction of breakdown field → Compounds with highest breakdown field
Contours: Breakdown field (MV/m)
Boron-containing compounds appear highly promising
Fb = f(Eg, ωmax)
SUMMARY: EXAMPLE 1
• Intrinsic dielectric breakdown fields of 82 binary octets were obtained from quantum mechanical calculations.
• Phenomenological models as a function of Eg and ωmax were developed using machine learning.
• Application to perovskites predicts boron-containing compounds to be promising high-breakdown-strength materials.
• Multi-fidelity machine learning models for bandgap prediction
Sci. Rep. 6 19375 (2016) & Comput. Mater. Sci. 129 156 (2017).
EXAMPLE 2
MULTI-FIDELITY INFORMATION FUSION
Given limited computational resources, how can the cost-accuracy trade-off be tuned for optimal predictions?
            Method A   Method B   Method C
Accuracy:   High       Medium     Low
Cost:       High       Medium     Low
Implications for materials genomics:
➢ High-throughput chemical space explorations
➢ Rational materials design and discovery
MULTI-FIDELITY LEARNING FOR BANDGAPS OF SOLIDS
• A property of interest for many applications, including energy harvesting, energy storage, catalysis, scintillation and device physics
• A natural hierarchy of “DFT and beyond” approaches provides different options for the “cost-accuracy” trade-offs
➢ Standard local and semi-local XC functionals do not provide a good description of the bandgaps
➢ Hybrid functionals and beyond-DFT techniques are extremely expensive
[Figure: Jacob's ladder of DFT exchange-correlation (XC) functionals, by John P. Perdew, with PBE and HSE labeled]
Perdew et al. J. Chem. Phys. 123, 062201 (2005).
AN EXAMPLE OF ELPASOLITES
• A class of materials with potential applications in energy harvesting and scintillation
• Exhibits flexible chemistry, amenable to combinatorial synthesis
• Variable chemistry on a fixed cubic lattice makes this class an ideal test case for machine learning
• A dataset of 600 elpasolites
• DFT-computed PBE bandgaps were used as low-fidelity data and HSE06 bandgaps as high-fidelity data
Computed: 600 PBE, 250 HSE
Training: 200 PBE, 200 HSE
Validation: 50 HSE
Prediction: 350 HSE
F. P. Doty, P. Yang, and M. A. Rodriguez, Elpasolite scintillators, Sandia Natl. Lab 2012 (2012).
DETAILS OF THE FEATURE SET
Features (for each of the species occurring at the A, B, B' and X sites):
• Elemental electronegativity
• First ionization potential
• Empirical radius
• Pettifor's Mendeleev number
[Figure: the feature space, showing low-fidelity training data, high-fidelity training data and high-fidelity predictions]
Dey et al. Comput. Mater. Sci. 83, 185 (2014).
T. Gu, W. Lu, X. Bao, and N. Chen, Solid State Sci. 8, 129 (2006).
G. Pilania, A. Mannodi-Kanakkithodi, B. P. Uberuaga, R. Ramprasad, J. E. Gubernatis, T. Lookman, Sci. Rep. 6, 19375 (2016).
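As a rough sketch of how such a feature set could be assembled, the snippet below concatenates the four elemental properties for the species on each of the four sites into a 16-component vector. The element labels `E1`…`E4` and all numerical values are placeholders invented for illustration; they are not reference data, and the ordering of the vector is an assumption.

```python
SITES = ("A", "B", "B'", "X")
PROPERTIES = (
    "electronegativity",
    "ionization_potential",
    "radius",
    "mendeleev_number",
)

# Placeholder elemental table: hypothetical elements with dummy values.
ELEMENT_TABLE = {
    "E1": {"electronegativity": 1.0, "ionization_potential": 4.0, "radius": 2.0, "mendeleev_number": 10},
    "E2": {"electronegativity": 1.1, "ionization_potential": 5.0, "radius": 1.8, "mendeleev_number": 11},
    "E3": {"electronegativity": 1.2, "ionization_potential": 6.0, "radius": 1.6, "mendeleev_number": 25},
    "E4": {"electronegativity": 3.0, "ionization_potential": 13.0, "radius": 1.0, "mendeleev_number": 99},
}


def fingerprint(composition, table=ELEMENT_TABLE):
    """Return the 16-component feature vector, ordered site by site."""
    return [table[composition[site]][prop] for site in SITES for prop in PROPERTIES]


# One fictitious compound: which element sits on which site.
compound = {"A": "E1", "B": "E2", "B'": "E3", "X": "E4"}
vector = fingerprint(compound)
print(len(vector), vector)
```

Because the lattice is fixed across the class, a fingerprint built only from site-resolved elemental properties is enough to distinguish the compounds from one another.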
DETAILS OF THE LEARNING MODEL
• We employ a co-kriging model within a Bayesian framework
[Figure: co-kriging model built from independent Gaussian processes, with covariance matrix K]
L. Le Gratiet and J. Garnier, Int. J. Uncertain. Quantif. 4, 365 (2014).
G. Pilania, A. Mannodi-Kanakkithodi, B. P. Uberuaga, R. Ramprasad, J. E. Gubernatis, T. Lookman, Sci. Rep. 6, 19375 (2016).
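To make the idea of fusing two fidelities concrete, here is a deliberately simplified two-level sketch: one Gaussian process is fitted to plentiful low-fidelity data, and a second one to the discrepancy between the scarce high-fidelity data and a scaled low-fidelity prediction. It is not the Bayesian co-kriging implementation used in the cited work. The kernel, the fixed length scale, the least-squares estimate of the scaling factor `rho` and the one-dimensional toy functions are all assumptions, and predictive uncertainties are omitted.

```python
import numpy as np


def rbf(a, b, length):
    """Squared-exponential kernel between two sets of row vectors."""
    d2 = ((a[:, None, :] - b[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-0.5 * d2 / length**2)


class GP:
    """Zero-mean GP regressor with fixed hyperparameters (mean prediction only)."""

    def __init__(self, length=0.2, jitter=1e-6):
        self.length, self.jitter = length, jitter

    def fit(self, X, y):
        self.X = X
        K = rbf(X, X, self.length) + self.jitter * np.eye(len(X))
        self.alpha = np.linalg.solve(K, y)
        return self

    def predict(self, Xs):
        return rbf(Xs, self.X, self.length) @ self.alpha


def fit_two_fidelity(X_lo, y_lo, X_hi, y_hi, length=0.2):
    """Model y_hi(x) ~ rho * y_lo(x) + delta(x) with two independent GPs."""
    gp_lo = GP(length).fit(X_lo, y_lo)
    lo_at_hi = gp_lo.predict(X_hi)
    rho = (lo_at_hi @ y_hi) / (lo_at_hi @ lo_at_hi)   # least-squares scaling
    gp_delta = GP(length).fit(X_hi, y_hi - rho * lo_at_hi)
    return lambda Xs: rho * gp_lo.predict(Xs) + gp_delta.predict(Xs)


# Toy 1-D functions standing in for a cheap and an expensive method.
def low(x):
    return np.sin(2 * np.pi * x)


def high(x):
    return 2.0 * low(x) + (x - 0.5)


X_lo = np.linspace(0.0, 1.0, 25)[:, None]           # many cheap points
X_hi = np.array([[0.0], [0.4], [0.6], [1.0]])       # few expensive points
X_test = np.linspace(0.0, 1.0, 101)[:, None]
y_true = high(X_test[:, 0])

predict_mf = fit_two_fidelity(X_lo, low(X_lo[:, 0]), X_hi, high(X_hi[:, 0]))
gp_hi_only = GP().fit(X_hi, high(X_hi[:, 0]))

rmse_mf = np.sqrt(np.mean((predict_mf(X_test) - y_true) ** 2))
rmse_hi = np.sqrt(np.mean((gp_hi_only.predict(X_test) - y_true) ** 2))
print(f"two-fidelity RMSE: {rmse_mf:.3f}, high-fidelity-only RMSE: {rmse_hi:.3f}")
```

In this toy setting the cheap data carry most of the shape of the expensive function, which is the situation in which adding low-fidelity points pays off.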
THE COST-ACCURACY TRADE-OFF
G. Pilania, J. E. Gubernatis, T. Lookman Comput. Mater. Sci. 129 156-163 (2017).
[Figure: RMS error (eV) on the validation set (unseen data) as a function of the training-set sizes nc and ne (nc > ne), drawn from the training pool of 200 PBE and 200 HSE bandgaps. Four parity plots of predicted HSE gap (eV) vs. computed HSE gap (eV), for training sets of 40 HSE + 130 PBE, 110 HSE + 130 PBE, 110 HSE + 200 PBE and 180 HSE + 200 PBE.]
CURRENT STATE-OF-THE-ART
Chan and Ceder, Phys. Rev. Lett. 105, 196403 (2010).
R. Ramakrishnan et al. J. Chem. Theory Comput. 11, 2087 (2015).
Use PBE bandgap as a feature in ML
Lee et al. Phys. Rev. B 93, 115104 (2016).
Linear fit: DFT vs. experiments
Setyawan et al. ACS Comb. Sci. 13, 382-390 (2011).
L. Ward et al. NPJ Comp. Mater. 2, 16028 (2016).
L. Weston and C. Stampfl, arXiv preprint arXiv:1708.08530 (2017).
PRACTICALLY USEFUL FOR CHEMICAL SPACE EXPLORATIONS
[Figure: results by anion, averaged over the B-site, with application labels: catalysis, scintillation, energy harvesting]
A reliable intermediate filter for application-specific screening
G. Pilania, J. E. Gubernatis, T. Lookman Comput. Mater. Sci. 129, 156-163 (2017).
NEXT CRITICAL STEPS…
• Use uncertainties for adaptive design (active learning)
• Tune the trade-offs between exploration and exploitation
[Figure (a): Multi-fidelity learning. Training data are tabulated for N materials (Material 1, Material 2, …, Material N), each with a fingerprint F_i1, F_i2, …, F_iM and property values at low, medium and high fidelity (P^l_i, P^m_i, P^h_i). From low to high fidelity, the number of training data points decreases while computational cost and accuracy increase.
The learning problem: f(F_i1, F_i2, …, F_iM, {P^l_i}, {P^m_i}) = P^h_i
Prediction model for a new Material X (P^h_X = ?): the model input is the fingerprint vector F_X, the optional input is P^l_X and/or P^m_X, and the model output is P^h_X.]
[Figure (b): Adaptive learning & design. Components of the learning framework: new data from computations and/or experiments; feature extraction or fingerprinting; feature/descriptor database; ML model training, validation and prediction; uncertainty quantification; exploration vs. exploitation trade-off balancing; next candidate selection.]
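One common way to turn predictive uncertainties into a candidate-selection rule is the expected-improvement criterion, sketched below with the standard library only. It is offered as an example of balancing exploration and exploitation, not as the method used in this talk; the candidate names, predicted means and standard deviations are made up.

```python
import math


def expected_improvement(mu, sigma, best, xi=0.0):
    """Expected improvement over `best` for a maximisation problem.

    mu, sigma: predictive mean and standard deviation of a candidate.
    xi: optional margin; larger values favour exploration.
    """
    gain = mu - best - xi
    if sigma <= 0.0:
        return max(gain, 0.0)
    z = gain / sigma
    pdf = math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)
    cdf = 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))
    return gain * cdf + sigma * pdf


def select_next(candidates, best):
    """Pick the candidate (name, mu, sigma) with the largest expected improvement."""
    return max(candidates, key=lambda c: expected_improvement(c[1], c[2], best))[0]


# Hypothetical candidates: A is a safe bet, B is uncertain, C is clearly poor.
candidates = [("A", 1.0, 0.05), ("B", 0.9, 0.5), ("C", 0.5, 0.1)]
print(select_next(candidates, best=1.0))
```

Here the uncertain candidate B wins even though its predicted mean is below the current best, which is the exploration side of the trade-off.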
• Multi-objective optimization to systematically explore Pareto-optimal set
• Improved ways of incorporating domain knowledge in machine learning models
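For the multi-objective bullet, the Pareto-optimal set is simply the set of candidates that no other candidate beats on every objective at once. A minimal sketch, assuming that all objectives are to be maximised and using made-up two-property data:

```python
def dominates(a, b):
    """True if a is at least as good as b everywhere and strictly better somewhere."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))


def pareto_front(points):
    """Return the points that are not dominated by any other point."""
    return [p for p in points if not any(dominates(q, p) for q in points)]


# Hypothetical (property 1, property 2) pairs for five candidate materials.
points = [(1, 5), (2, 4), (3, 3), (2, 2), (1, 1)]
print(pareto_front(points))
```

This brute-force check costs O(n^2) comparisons, which is fine for screening lists of modest size.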
LEARNING FROM OTHER COMMUNITIES
• Surrogate-Based Modeling and Optimization: Applications in Engineering, Slawomir Koziel and Leifur Leifsson (Editors)
• Application of Surrogate-based Global Optimization to Aerodynamic Design, Emiliano Iuliano and Esther Andrés Pérez (Editors), Springer Tracts in Mechanical Engineering
• Solving Computationally Expensive Engineering Problems: Methods and Applications, Slawomir Koziel, Leifur Leifsson and Xin-She Yang (Editors), Springer Proceedings in Mathematics & Statistics
• Simulation-Driven Design by Knowledge-Based Response Correction Techniques, Slawomir Koziel and Leifur Leifsson
• Evolutionary Multi-Criterion Optimization: 9th International Conference, EMO 2017, Münster, Germany, March 19–22, 2017, Proceedings, Heike Trautmann, Günter Rudolph, Kathrin Klamroth, Oliver Schütze, Margaret Wiecek, Yaochu Jin and Christian Grimme (Eds.), LNCS 10173
NEXT CRITICAL STEPS…
Materials problems are different! Still, many existing methods can help solve materials challenges.
Acknowledgements
Co-workers & Collaborators: J. Gubernatis, T. Lookman, J. Theiler, A. K. M.-Kanakkithodi, C. Kim, R. Ramprasad
Hosts at the Fritz Haber Institute: M. Scheffler, L. Ghiringhelli
Funding & Computational Resources