Causality
Bernhard Schölkopf and Jonas Peters
MPI for Intelligent Systems, Tübingen
MLSS, Tübingen
21st July 2015
Charig et al.: “Comparison of treatment of renal calculi by open surgery, (...) ”, British Medical Journal, 1986
B. Scholkopf & J. Peters (MPI) Causality 21st July 2015
J. Mooij et al.: Distinguishing cause from effect using observational data: methods and benchmarks, submitted
Assume P(X1, ..., X4) has been induced by
X1 = f1(X3, N1)
X2 = N2
X3 = f3(X2, N3)
X4 = f4(X2, X3, N4)
• Ni jointly independent
• G0 has no cycles
Graph G0: X2 → X3, X2 → X4, X3 → X4, X3 → X1
Functional causal model. Can the DAG be recovered from P(X1, ..., X4)? No.
J. Peters, J. Mooij, D. Janzing and B. Schölkopf: Causal Discovery with Continuous Additive Noise Models, JMLR 2014
S. Shimizu, P. Hoyer, A. Hyvärinen, A. Kerminen: A linear non-Gaussian acyclic model for causal discovery, JMLR 2006
P. Bühlmann, J. Peters, J. Ernest: CAM: Causal additive models, high-dimensional order search and penalized regression, Annals of Statistics 2014
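A functional causal model of this form can be sampled by drawing the noise variables and then evaluating the assignments in a causal order (X2, X3, X1, X4). The sketch below is only an illustration: the concrete functions standing in for f1, f3, f4, the standard normal noise, and the name `sample_scm` are hypothetical choices, since the slide leaves them unspecified.

```python
import math
import random


def sample_scm(n, seed=0):
    """Draw n samples (x1, x2, x3, x4) from a toy model with the graph G0."""
    rng = random.Random(seed)
    samples = []
    for _ in range(n):
        # Jointly independent noise variables N1..N4 (assumed standard normal).
        n1, n2, n3, n4 = (rng.gauss(0.0, 1.0) for _ in range(4))
        # Evaluate the assignments in a causal order of G0.
        x2 = n2                                  # X2 = N2
        x3 = math.tanh(x2) * (1.0 + 0.5 * n3)    # hypothetical f3(X2, N3)
        x1 = x3 ** 2 + n1                        # hypothetical f1(X3, N1)
        x4 = math.sin(x2) + x3 * n4              # hypothetical f4(X2, X3, N4)
        samples.append((x1, x2, x3, x4))
    return samples


samples = sample_scm(1000)
print(samples[0])
```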
Assume P(X1, ..., X4) has been induced by
X1 = f1(X3) + N1
X2 = N2
X3 = f3(X2) + N3
X4 = f4(X2, X3) + N4
• Ni ∼ N(0, σ_i^2) jointly independent
• G0 has no cycles
Graph G0: X2 → X3, X2 → X4, X3 → X4, X3 → X1
Additive noise model with Gaussian noise. Can the DAG be recovered from P(X1, ..., X4)? Yes iff the fi are nonlinear.
J. Peters, J. Mooij, D. Janzing and B. Schölkopf: Causal Discovery with Continuous Additive Noise Models, JMLR 2014
P. Bühlmann, J. Peters, J. Ernest: CAM: Causal additive models, high-dimensional order search and penalized regression, Annals of Statistics 2014
S. Shimizu, P. Hoyer, A. Hyvärinen, A. Kerminen: A linear non-Gaussian acyclic model for causal discovery, JMLR 2006
Consider a distribution generated by
Y = f(X) + N_Y
with N_Y and X independent and Gaussian
X → Y
Then, if f is nonlinear, there is no
X = g(Y) + M_X
with M_X and Y independent and Gaussian
X ← Y
J. Peters, J. Mooij, D. Janzing and B. Schölkopf: Causal Discovery with Continuous Additive Noise Models, JMLR 2014
Consider a distribution corresponding to
Y = X^3 + N_Y
with N_Y and X independent and Gaussian
X → Y
with
X ∼ N(1, 0.5^2)
N_Y ∼ N(0, 0.4^2)
[Figure: scatter plot of Y against X]
[Figure: Y plotted against the residuals of the backward regression, gam(X ~ s(Y))$residuals]
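The example Y = X^3 + N_Y can be simulated to see the asymmetry between the two regression directions. The following is a rough sketch, not the procedure behind the slides. It swaps the spline fit `gam` for a cubic least-squares polynomial and replaces a proper independence test with a crude heteroscedasticity score, namely the relative spread of the mean squared residual across quantile bins of the predictor. The names `polyfit_residuals`, `dependence_score` and `simulate`, the sample size and the seed are all assumptions made for illustration.

```python
import random


def polyfit_residuals(x, y, degree=3):
    """Least-squares polynomial fit of y on x; returns the residuals."""
    n = len(x)
    mean = sum(x) / n
    scale = (sum((v - mean) ** 2 for v in x) / n) ** 0.5
    z = [(v - mean) / scale for v in x]  # standardise for numerical stability
    k = degree + 1
    # Augmented normal equations: A[i][j] = sum z^(i+j), last column = sum z^i * y.
    power_sums = [sum(v ** p for v in z) for p in range(2 * degree + 1)]
    a = [[power_sums[i + j] for j in range(k)]
         + [sum((v ** i) * w for v, w in zip(z, y))] for i in range(k)]
    # Gaussian elimination with partial pivoting.
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        for r in range(col + 1, k):
            factor = a[r][col] / a[col][col]
            for c in range(col, k + 1):
                a[r][c] -= factor * a[col][c]
    # Back substitution for the polynomial coefficients.
    coef = [0.0] * k
    for i in reversed(range(k)):
        coef[i] = (a[i][k] - sum(a[i][j] * coef[j] for j in range(i + 1, k))) / a[i][i]
    return [w - sum(c * v ** p for p, c in enumerate(coef)) for v, w in zip(z, y)]


def dependence_score(residuals, predictor, bins=5):
    """Relative spread of the mean squared residual across predictor quantile bins."""
    order = sorted(range(len(predictor)), key=lambda i: predictor[i])
    size = len(order) // bins
    means = []
    for b in range(bins):
        idx = order[b * size:(b + 1) * size]
        means.append(sum(residuals[i] ** 2 for i in idx) / len(idx))
    overall = sum(means) / bins
    return (max(means) - min(means)) / overall


def simulate(n=3000, seed=0):
    """Sample X ~ N(1, 0.5^2), N_Y ~ N(0, 0.4^2), Y = X^3 + N_Y."""
    rng = random.Random(seed)
    x = [rng.gauss(1.0, 0.5) for _ in range(n)]
    y = [v ** 3 + rng.gauss(0.0, 0.4) for v in x]
    return x, y


x, y = simulate()
forward = dependence_score(polyfit_residuals(x, y), x)    # fit Y = f(X) + N_Y
backward = dependence_score(polyfit_residuals(y, x), y)   # fit X = g(Y) + M_X
print(f"forward score:  {forward:.3f}")
print(f"backward score: {backward:.3f}")
```

In the forward direction the residuals are essentially the noise N_Y, so the score should stay near zero. In the backward direction the spread of the residuals changes with Y, so the score should be clearly larger.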
Surprise (under some assumptions):
2 variables ⇒ p variables
J. Peters, J. Mooij, D. Janzing and B. Schölkopf: Causal Discovery with Continuous Additive Noise Models, JMLR 2014
Let P(X1, ..., Xp) be induced by a ...
model                      equation                         conditions           identifiable
structural equation model  Xi = fi(X_PAi, Ni)               -                    ✗
additive noise model       Xi = fi(X_PAi) + Ni              nonlinear functions  ✓
causal additive model      Xi = ∑_{k ∈ PAi} fik(Xk) + Ni    nonlinear functions  ✓
linear Gaussian model      Xi = ∑_{k ∈ PAi} βik Xk + Ni     linear functions     ✗
(results hold for Gaussian noise)
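The non-identifiability of the linear Gaussian model can be checked directly for two variables. The derivation below is a sketch under the simplifying assumption of zero-mean variables; the symbols β, α, σ_X, σ_N and M_X are introduced here for illustration and do not appear on the slide.

```latex
\begin{align*}
Y &= \beta X + N_Y, \qquad
X \sim \mathcal{N}(0, \sigma_X^2),\;
N_Y \sim \mathcal{N}(0, \sigma_N^2),\;
X \perp\!\!\!\perp N_Y, \\
\alpha &:= \frac{\operatorname{Cov}(X, Y)}{\operatorname{Var}(Y)}
        = \frac{\beta \sigma_X^2}{\beta^2 \sigma_X^2 + \sigma_N^2}, \qquad
M_X := X - \alpha Y, \\
\operatorname{Cov}(M_X, Y) &= \operatorname{Cov}(X, Y) - \alpha \operatorname{Var}(Y) = 0 .
\end{align*}
```

Since (M_X, Y) is a linear transformation of the jointly Gaussian pair (X, N_Y), it is itself jointly Gaussian, so zero covariance implies independence. Hence X = αY + M_X is a valid linear Gaussian model in the backward direction, and the distribution alone cannot distinguish X → Y from X ← Y.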
[Figure with labels: "GAUL", "GAUSS", "the LINEAR"]
[Figure: accuracy (%) against decision rate (%) for IGCI, LiNGAM, Additive Noise and PNL, with regions marked "Significant" and "Not significant"]
see also
D. Lopez-Paz, K. Muandet, B. Schölkopf, I. Tolstikhin: Towards a Learning Theory of Cause-Effect Inference, ICML 2015
E. Sgouritsa, D. Janzing, P. Hennig, B. Schölkopf: Inference of Cause and Effect with Unsupervised Inverse Regression, AISTATS 2015
Real data: genetic perturbation experiments for yeast (Kemmeren et al., 2014)
p = 6170 genes
n_obs = 160 wild-types
n_int = 1479 gene deletions (targets known)
true hits: ≈ 0.1% of pairs
“Invariant prediction” method: E = {obs, int}
J. Peters, P. Bühlmann, N. Meinshausen: Causal inference using invariant prediction: identification and confidence intervals, arXiv 1501.01332
D. Rothenhaeusler, C. Heinze et al.: backShift: Learning causal cyclic graphs from unknown shift interventions, arXiv 1506.02494
M. Rojas-Carulla et al.: A Causal Perspective on Domain Adaptation, arXiv 1507.05333
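The idea behind invariant prediction with two environments E = {obs, int} can be illustrated on toy data: a set of predictors is kept if the residuals of regressing the target on it look the same in both environments, and the estimate is the intersection of all kept sets. The sketch below is a simplified illustration and not the procedure of the cited paper. The data-generating model (X1 → Y → X2 with a mean shift on X1), the z-score threshold of 4 in place of a calibrated test, and all function names are assumptions.

```python
import itertools
import math
import random


def ols_residuals(rows, y):
    """Residuals of a least-squares fit of y on the columns of rows plus an intercept."""
    design = [[1.0] + list(r) for r in rows]
    k = len(design[0])
    # Augmented normal equations [X'X | X'y].
    a = [[sum(d[i] * d[j] for d in design) for j in range(k)]
         + [sum(d[i] * t for d, t in zip(design, y))] for i in range(k)]
    for col in range(k):  # Gaussian elimination with partial pivoting
        pivot = max(range(col, k), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        for r in range(col + 1, k):
            factor = a[r][col] / a[col][col]
            for c in range(col, k + 1):
                a[r][c] -= factor * a[col][c]
    coef = [0.0] * k
    for i in reversed(range(k)):
        coef[i] = (a[i][k] - sum(a[i][j] * coef[j] for j in range(i + 1, k))) / a[i][i]
    return [t - sum(c * v for c, v in zip(coef, d)) for d, t in zip(design, y)]


def mean_var(values):
    m = sum(values) / len(values)
    return m, sum((v - m) ** 2 for v in values) / (len(values) - 1)


def looks_invariant(res_a, res_b, threshold=4.0):
    """Crude check: residual mean and variance agree across the two environments."""
    (ma, va), (mb, vb) = mean_var(res_a), mean_var(res_b)
    z_mean = (ma - mb) / math.sqrt(va / len(res_a) + vb / len(res_b))
    z_var = math.log(va / vb) / math.sqrt(2.0 / len(res_a) + 2.0 / len(res_b))
    return abs(z_mean) < threshold and abs(z_var) < threshold


def simulate(n, x1_mean, rng):
    data = []
    for _ in range(n):
        x1 = rng.gauss(x1_mean, 1.0)        # cause of Y; the intervention shifts its mean
        y = 2.0 * x1 + rng.gauss(0.0, 1.0)  # mechanism of Y is the same in every environment
        x2 = y + rng.gauss(0.0, 1.0)        # effect of Y
        data.append((x1, x2, y))
    return data


rng = random.Random(0)
obs = simulate(1000, 0.0, rng)    # observational environment
intv = simulate(1000, 2.0, rng)   # interventional environment
pooled = obs + intv
y_all = [row[2] for row in pooled]

accepted = []
for size in range(3):
    for subset in itertools.combinations(range(2), size):
        rows = [[row[j] for j in subset] for row in pooled]
        res = ols_residuals(rows, y_all)
        if looks_invariant(res[:len(obs)], res[len(obs):]):
            accepted.append(subset)

causal = set.intersection(*(set(s) for s in accepted)) if accepted else set()
print("accepted sets:", accepted)
print("estimated causal predictors (column indices):", sorted(causal))
```

In this toy setup the empty set and the set containing only the effect X2 should be rejected, because the residual means differ between environments. The sets containing the cause X1 should be accepted, so that the intersection is X1 alone.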
[Figure: activity of gene 4710 against activity of gene 5954, in three panels: observational training data; interventional training data (interventions on genes other than 5954 and 4710); interventional test data point (intervention on gene 5954)]
most significant pair
[Figure: activity of gene 3730 against activity of gene 3729, in three panels: observational training data; interventional training data (interventions on genes other than 3729 and 3730); interventional test data point (intervention on gene 3729)]
2nd most significant pair
[Figure: activity of gene 1475 against activity of gene 3672, in three panels: observational training data; interventional training data (interventions on genes other than 3672 and 1475); interventional test data point (intervention on gene 3672)]
3rd most significant pair
[Figure: number of strong intervention effects against number of intervention predictions, for the methods PERFECT, INVARIANT, HIDDEN-INVARIANT, PC, RFCI, REGRESSION (CV-Lasso), GES and GIES, and RANDOM (99% prediction interval)]
http://xkcdsw.com/3039
B. Watterson: It’s a magical world, Andrews McMeel Publishing, 1996