Compressed sensing meets symbolic regression: SISSO
- Part 2 -
Luca M. Ghiringhelli
On-line course on Big Data and Artificial Intelligence in Materials Sciences
P = c₁d₁ + c₂d₂ + … + cₙdₙ
Compressed sensing, not only LASSO

P = c₁d₁ + c₂d₂ + … + cₙdₙ

[Figure: the property P is fitted with the single best feature d₁; the residual, Residual₁, is then fitted with the next feature d₂ (the starred features d₁*, d₂* denote orthogonalized versions), and so on.]
Greedy method: Orthogonal Matching Pursuit
Limitation of greedy methods: a feature, once selected, is never removed, so early choices condition all later ones, and strongly correlated candidate features can lead to suboptimal models.
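A minimal numpy sketch of the OMP iteration just described (the data matrix, property, and sparsity level below are illustrative placeholders, not the course's materials data):

```python
import numpy as np

def omp(D, P, n_nonzero):
    """Greedy Orthogonal Matching Pursuit: at each step, pick the feature
    (column of D) most correlated with the current residual, then refit
    the coefficients of all selected features by least squares."""
    residual = P.copy()
    selected = []
    for _ in range(n_nonzero):
        # correlation of each column with the residual, norm-adjusted
        scores = np.abs(D.T @ residual) / np.linalg.norm(D, axis=0)
        scores[selected] = -np.inf           # never re-pick a feature
        selected.append(int(np.argmax(scores)))
        # refit on the selected support; the new residual is orthogonal to it
        coef, *_ = np.linalg.lstsq(D[:, selected], P, rcond=None)
        residual = P - D[:, selected] @ coef
    return selected, coef

# illustrative data: 50 samples, 200 candidate features
rng = np.random.default_rng(0)
D = rng.normal(size=(50, 200))
P = 3.0 * D[:, 7] - 2.0 * D[:, 42] + 0.01 * rng.normal(size=50)
print(omp(D, P, n_nonzero=2))   # expected support: {7, 42}
```

Note that once an index enters `selected` it never leaves, which is exactly the limitation noted above.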
Compressed sensing: SISSO
SIS: Sure-Independence Screening
[Figure: SISSO schematic: the property P is screened against the candidate features to select the top 1D features (S1D); the residual, Residual1D, is then screened to select the 2D features (S2D), and so on.]
Similarity criterion in the SIS step:
● Scalar product (Pearson correlation)
● Spearman correlation (captures nonlinear monotonicity)
● Mutual information, …
● However: the computational cost must be factored in
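For illustration, a sketch of the SIS ranking step with two of the criteria above, assuming the property (or the current residual) is passed as `target`:

```python
import numpy as np
from scipy.stats import spearmanr

def sis_rank(D, target, k, criterion="pearson"):
    """Rank candidate features (columns of D) by similarity to the target
    (the property, or the current residual) and return the top-k indices."""
    if criterion == "pearson":
        Dc = (D - D.mean(0)) / D.std(0)             # standardize columns
        t = (target - target.mean()) / target.std()
        scores = np.abs(Dc.T @ t) / len(t)          # |Pearson correlation|
    elif criterion == "spearman":                   # rank-based: captures
        scores = np.abs(np.array(                   # nonlinear monotonicity
            [spearmanr(col, target)[0] for col in D.T]))
    else:
        raise ValueError(f"unknown criterion: {criterion}")
    return np.argsort(scores)[::-1][:k]
```

The Spearman variant is noticeably more expensive than the single matrix-vector product of the Pearson case, which is the cost trade-off flagged in the last bullet.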
SO: Sparsifying Operator
Exact solution of the ℓ₀-regularized problem min_c ‖P − Dc‖₂² + λ‖c‖₀, by enumeration over the SIS-selected features.
Ouyang et al., PRM 2018, DOI: 10.1103/PhysRevMaterials.2.083802
In practice:
0. i = 1, S = Ø
1. Rank features according to similarity to Residualᵢ₋₁ (Property = Residual₀).
2. Add the first k features to S.
3. Perform least-squares regression over all i-tuples in S.
4. The lowest-error model is the i-dimensional SISSO model.
5. i ← i+1; go to 1.
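A compact Python sketch of this loop (SIS with Pearson correlation, SO by exhaustive least-squares enumeration; `k` and the inputs are placeholders):

```python
import numpy as np
from itertools import combinations

def sisso(D, P, max_dim, k):
    """Sketch of the loop above: SIS keeps the k features most correlated
    with the current residual; SO then solves the l0 problem exactly by
    least-squares enumeration over all i-tuples of the retained features."""
    S = []                        # feature indices retained by SIS so far
    residual = P.copy()
    models = []
    for i in range(1, max_dim + 1):
        # SIS: rank all features by |Pearson correlation| with the residual
        scores = np.abs((D - D.mean(0)).T @ (residual - residual.mean()))
        scores /= D.std(0) * residual.std() * len(P)
        ranked = [j for j in np.argsort(scores)[::-1] if j not in S]
        S += ranked[:k]
        # SO: exact l0 solution by enumerating all i-tuples in S
        best = None
        for subset in combinations(S, i):
            A = D[:, list(subset)]
            c, *_ = np.linalg.lstsq(A, P, rcond=None)
            err = np.linalg.norm(P - A @ c)
            if best is None or err < best[0]:
                best = (err, subset, c)
        err, subset, c = best
        residual = P - D[:, list(subset)] @ c
        models.append((subset, c))      # the i-dimensional SISSO model
    return models
```

The enumeration stays tractable because SO only ever sees the i·k features retained by SIS, not the full feature space.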
P = c₁d₁ + c₂d₂ + … + cₙdₙ
Predicting crystal structures from the composition
Octet binaries (NaCl, ZnS, BN, KF, GaAs, CaO, …): rock-salt or zinc-blende structure?
Learning the relative stability from the properties of the isolated atomic species
Rock salt: 6-fold coordination, ionic bonding
Zinc blende: 4-fold coordination, covalent bonding
Atomic features

[Figure: Kohn–Sham levels [eV] of the isolated atom (example: Sn, tin): valence s and valence p (HOMO) levels, the LUMO, and the radius at the maximum of the valence s and p orbitals.]
Systematic construction of candidates

[Figure: operator trees that build candidate features from the atomic features: unary operators (exp(x), exp(−x), ln(x), xⁿ, arctan(x)) and binary operators (x + y, x·y, x / y, |x − y|) are applied recursively, only to dimensionally compatible arguments (e.g., Energy₁ with Energy₂, Length₁ with Length₂).]
P = c₁d₁ + c₂d₂ + … + cₙdₙ
Each feature (a column of the matrix) is a tree-represented candidate function evaluated on the training data. The selected descriptor has as its components the features picked by the sparse-recovery algorithm (here, SISSO).
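A minimal sketch of such a tree-based construction, carrying a unit label so that only dimensionally consistent combinations are formed; feature names and values are placeholders, not real atomic data:

```python
import numpy as np

rng = np.random.default_rng(0)
# primary features: name -> (unit, values over the training set)
primary = {
    "E_s": ("energy", rng.normal(size=8)),
    "E_p": ("energy", rng.normal(size=8)),
    "r_s": ("length", rng.uniform(1, 3, size=8)),
    "r_p": ("length", rng.uniform(1, 3, size=8)),
}

def grow(features):
    """One level of the tree: apply unary and binary operators, keeping
    only dimensionally consistent combinations."""
    new = {}
    for n1, (u1, v1) in features.items():
        if u1 == "dimensionless":            # unary ops act on pure numbers
            new[f"exp({n1})"] = ("dimensionless", np.exp(v1))
        for n2, (u2, v2) in features.items():
            if n1 >= n2:                     # each unordered pair once
                continue
            if u1 == u2:                     # same unit: |x - y| and x / y
                new[f"|{n1}-{n2}|"] = (u1, np.abs(v1 - v2))
                new[f"({n1}/{n2})"] = ("dimensionless", v1 / v2)
    return {**features, **new}

space = grow(grow(primary))                  # two levels of the tree
print(len(space), "candidate features")
```

Each entry of `space` is one column of the feature matrix above; the combinatorial growth with tree depth is what makes the SIS pre-screening necessary.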
Structure map from SISSO, starting from 7×2 atomic features
LMG et al., PRL 2015, DOI: 10.1103/PhysRevLett.114.105503
LMG et al., NJP 2017, DOI: 10.1088/1367-2630/aa57bf
Predicting crystal structures from the composition
P = c₁d₁ + c₂d₂ + …
In SISSO, the "hyperparameters" are:
● The level of sparsity, i.e., the number of "activated" features in P = c₁d₁ + c₂d₂ + …
● The size of the feature space, determined by the complexity of the tree
Both are tuned via cross-validation: iterated random selection of a subset of the data for training + test on the left-out set.
Data-driven model complexity
Ouyang et al., J. Phys. Mater. 2019, DOI: 10.1088/2515-7639/ab077b
Two levels of the tree, formulas like |x − y| / z.
Three levels of the tree, formulas like exp(−x·y) / (z + w).
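A sketch of that tuning loop: average test error over iterated random splits, scanned over the two hyperparameters. Here `make_features` and `sisso_fit` are hypothetical stand-ins for the routines sketched earlier:

```python
import numpy as np
from sklearn.model_selection import ShuffleSplit

def cv_error(D, P, fit, n_splits=30, test_size=0.2, seed=0):
    """Iterated random train/test splits, as described above: fit on the
    training subset, evaluate on the left-out set, average the test RMSE."""
    splitter = ShuffleSplit(n_splits=n_splits, test_size=test_size,
                            random_state=seed)
    errs = []
    for train, test in splitter.split(D):
        model = fit(D[train], P[train])      # returns a predict callable
        errs.append(np.sqrt(np.mean((model(D[test]) - P[test]) ** 2)))
    return np.mean(errs)

# Scan both hyperparameters: tree depth (feature-space size) and sparsity.
# for depth in (1, 2, 3):
#     D = make_features(primary_features, depth)
#     for dim in (1, 2, 3):
#         print(depth, dim, cv_error(D, P, lambda X, y: sisso_fit(X, y, dim)))
```

The (depth, dimension) pair with the lowest average test error defines the data-driven model complexity.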
A few bits of taxonomy for SISSO

Compressed-sensing-based model identification shares concepts with:
● Regularized regression. But: massive sparsification.
● Dimensionality reduction. But: supervised, and yielding sparse, "interpretable" descriptors.
● Feature (basis-set) selection. But: non-greedy solver.
● Symbolic regression. But: deterministic solver.
Open challenges of the symbolic regression + compressed sensing approach:
● Efficiently include constants and scaling factors in the symbolic tree
● Include known physical invariances in the symbolic-tree construction
● Include vectors (and tensors) as features. Contractions?
Model interpretability: related to sparse feature selection

[Figure: flexibility/complexity vs. interpretability trade-off, after James, Witten, Hastie, Tibshirani, An Introduction to Statistical Learning, Springer (2013). From most to least interpretable: sparsifying methods (LASSO, SISSO, symbolic regression) and linear regression; kernelized regression and trees; forests and support vector machines; neural networks.]
In general, with symbolic regression:
● If the exact equation is within reach of the searching/optimizing algorithm, it is found. A simple model does not necessarily mean a less accurate one. For other powerful ML methods (kernel regression, regression trees and forests, deep learning), this is not the case.
● The few fitting parameters yield stability with respect to noise (low complexity → no overfitting).
Interpretability: what it might endow us with

[Figure: SISSO model and its descriptor; legend: x = atomic fraction, IE = ionization energy, χ = electronegativity.]
[Figure: pressure-induced phase transitions located on the map: HgTe (std pressure, ZB) → (9 GPa, RS); CdTe (std pressure, ZB) → (4 GPa, RS); GaAs (std pressure, ZB) → (29 GPa, oI4).]
Multi-task learning
Application: multi-phase stability diagram. Properties: crystal-structure formation energies.
[Figure: stability diagram in the (d₁, d₂) descriptor plane, with regions for the RS, CsCl, … structures.]
Ouyang et al. J. Phys. Mater. 2019 DOI: 10.1088/2515-7639/ab077b
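The mechanism behind MT-SISSO, in a sketch: all tasks (here, the formation energies of the different structures) must share one descriptor, while each task keeps its own coefficients; the SO step minimizes the summed least-squares error across tasks. Data and names below are placeholders:

```python
import numpy as np
from itertools import combinations

def mt_so(D, tasks, dim, candidates):
    """Multi-task sparsifying operator: enumerate dim-tuples of candidate
    features; each task t keeps its own coefficients c_t, but all tasks
    must use the same descriptor (the same feature subset)."""
    best = None
    for subset in combinations(candidates, dim):
        A = D[:, list(subset)]
        total = 0.0
        for P_t in tasks:                          # one property per task
            c, *_ = np.linalg.lstsq(A, P_t, rcond=None)
            total += np.sum((P_t - A @ c) ** 2)    # summed over all tasks
        if best is None or total < best[0]:
            best = (total, subset)
    return best[1]                                 # the shared descriptor

# illustrative: 3 tasks built from the same two hidden features
rng = np.random.default_rng(1)
D = rng.normal(size=(40, 30))
tasks = [a * D[:, 3] + b * D[:, 11] + 0.01 * rng.normal(size=40)
         for a, b in [(1.0, -2.0), (0.5, 1.5), (-1.0, 0.3)]]
print(mt_so(D, tasks, dim=2, candidates=range(30)))   # expect (3, 11)
```

Because every task constrains the same descriptor, tasks with few training points can borrow statistical strength from the others (in the actual method the tasks may even have different training samples; the sketch assumes a shared sample set for brevity), which is one way to read the data-parsimony claim below.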
MT-SISSO is remarkably data-parsimonious.