Shohei Shimizu
Shiga University / Osaka University, Japan
Launching a department of data science in 2017!
A non-Gaussian approach for causal
structure learning in the presence of
hidden common causes
CRM Workshop Statistical Causal Inference and its Applications to Genetics
Illustrating the problem
Strong correlation between chocolate
consumption and number of Nobel
laureates (Messerli12NEJM)
[Figure: scatter plot, 2002-2011; x-axis: chocolate consumption (kg/yr/capita); y-axis: num. Nobel laureates per 10 million pop.]
Corr. 0.791
P-value < 0.001
Eating more chocolate increases
num. Nobel laureates?
• Three candidate models (Messerli12NEJM; Maurage+13JNutrition)
[Figure: candidate causal graphs among chocolate consumption (Chclt), number of Nobel laureates (Nobel) and GDP, each possibly affected by hidden common causes; inset: the scatter plot above (Corr. 0.791, P-value < 0.001)]
Manage this gap!
1. Estimating the causal direction without using temporal information
2. Coping with hidden common causes
Divided into two parts
[Figure: x1 → x2 (coefficient b21) or x2 → x1 (coefficient b12)?, first without and then with a hidden common cause f1]
Once a direction has been estimated, the connection strength b21 or b12 can be computed
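Assuming the direction x1 → x2 is already known and there is no hidden common cause, the connection strength can be sketched as an ordinary least-squares slope. The simulated values (b21 = 0.8, uniform errors) and the helper name `connection_strength` are illustrative assumptions, not taken from the slides.

```python
import numpy as np

def connection_strength(cause: np.ndarray, effect: np.ndarray) -> float:
    """OLS slope b in effect = b * cause + error (intercept removed by centering).

    Assumes the direction cause -> effect is already known and that no hidden
    common cause is present; with a hidden common cause this slope is biased.
    """
    c = cause - cause.mean()
    e = effect - effect.mean()
    return float(np.dot(c, e) / np.dot(c, c))

rng = np.random.default_rng(0)
n = 10_000
x1 = rng.uniform(-1.0, 1.0, n)              # x1 = e1, non-Gaussian (uniform)
x2 = 0.8 * x1 + rng.uniform(-1.0, 1.0, n)   # x2 = b21 * x1 + e2 with b21 = 0.8
b21_hat = connection_strength(x1, x2)
print(f"estimated b21: {b21_hat:.3f}")
```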
Basic non-Gaussian model
with no hidden common
causes
S. Shimizu, P. O. Hoyer, A. Hyvärinen
and A. Kerminen
Journal of Machine Learning Research
2006
[Figure: x1 → x2 or x2 → x1?]
Linear Non-Gaussian Acyclic
Model (LiNGAM) (Shimizu+06JMLR)
• Identifiable: causal directions and coefficients
• Various extensions including nonlinear (Hoyer+08NIPS,
Zhang+09UAI) and cyclic (Lacerda+08UAI) models
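$x_i = \sum_{j \neq i} b_{ij} x_j + e_i$
[Figure: example graph with x3 → x1 (b13), x3 → x2 (b23), x1 → x2 (b21) and errors e1, e2, e3]
Assumptions:
• Linearity
• Acyclicity
• Non-Gaussian errors e_i
• Independence of errors e_i (no hidden common causes)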
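The model $x_i = \sum_{j \neq i} b_{ij} x_j + e_i$ can be written in matrix form as x = Bx + e, so data can be generated as x = (I - B)^{-1} e. The sketch below assumes the edge pattern of the figure (x3 → x1, x3 → x2, x1 → x2) with made-up coefficient values and uniform errors; `simulate_lingam` and `is_acyclic` are hypothetical helper names.

```python
import numpy as np

# B[i, j] = b_ij, the direct effect of x_j on x_i (rows/cols 0, 1, 2 = x1, x2, x3).
# The edge pattern follows the figure; the numeric values are illustrative only.
B = np.array([
    [0.0, 0.0, 0.5],    # x1 = b13 * x3 + e1
    [0.8, 0.0, -0.3],   # x2 = b21 * x1 + b23 * x3 + e2
    [0.0, 0.0, 0.0],    # x3 = e3
])

def is_acyclic(B: np.ndarray) -> bool:
    """A directed graph on p nodes is acyclic iff its 0/1 adjacency matrix A
    satisfies A^p = 0 (no walk of length p exists, hence no cycle)."""
    A = (B != 0).astype(float)
    return not np.any(np.linalg.matrix_power(A, B.shape[0]))

def simulate_lingam(B: np.ndarray, n: int, rng: np.random.Generator) -> np.ndarray:
    """Draw n samples of x = B x + e, i.e. x = (I - B)^{-1} e, with independent
    non-Gaussian (centered uniform) errors. Returns an array of shape (p, n)."""
    p = B.shape[0]
    e = rng.uniform(-1.0, 1.0, size=(p, n))
    return np.linalg.solve(np.eye(p) - B, e)

rng = np.random.default_rng(1)
X = simulate_lingam(B, 5_000, rng)
E = X - B @ X          # recovers the errors e_i = x_i - sum_j b_ij x_j
print(is_acyclic(B), X.shape)
```

Acyclicity is what makes I - B invertible here; a graph with a cycle would fail the `is_acyclic` check.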
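Different directions give different data distributions
[Figure: scatter plots of (x1, x2) under Model 1 and Model 2, with Gaussian errors and with non-Gaussian (e.g., uniform) errors]
Model 1: $x_1 = e_1$, $x_2 = 0.8 x_1 + e_2$
Model 2: $x_1 = 0.8 x_2 + e_1$, $x_2 = e_2$
where $E(e_1) = E(e_2) = 0$ and $\mathrm{var}(x_1) = \mathrm{var}(x_2) = 1$
With Gaussian errors both models imply the same distribution (a zero-mean bivariate normal with correlation 0.8); with non-Gaussian errors the distributions differ, which makes the direction identifiable.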
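The figure's point can be sketched numerically. Under Model 1 with uniform errors scaled so that var(x1) = var(x2) = 1, regressing in the true direction leaves a residual independent of the regressor, while the reverse regression does not. The dependence score below (correlations involving cubed variables) is a crude moment-based proxy chosen for brevity; it is an assumption of this sketch, not the estimator used in the LiNGAM papers.

```python
import numpy as np

def uniform_with_var(var: float, n: int, rng: np.random.Generator) -> np.ndarray:
    """Zero-mean uniform noise on [-a, a] with variance a^2 / 3 = var."""
    a = np.sqrt(3.0 * var)
    return rng.uniform(-a, a, n)

def residual(effect: np.ndarray, cause: np.ndarray) -> np.ndarray:
    """Residual of the least-squares regression of `effect` on `cause`."""
    b = np.cov(cause, effect)[0, 1] / np.var(cause, ddof=1)
    return effect - b * cause

def dependence(x: np.ndarray, r: np.ndarray) -> float:
    """Crude dependence score |corr(x^3, r)| + |corr(x, r^3)|.

    Both terms are zero in the population when x and r are independent; they
    generally are not when non-Gaussian data are regressed in the wrong direction.
    """
    return abs(np.corrcoef(x**3, r)[0, 1]) + abs(np.corrcoef(x, r**3)[0, 1])

def estimate_direction(x1: np.ndarray, x2: np.ndarray):
    """Pick the direction whose regression residual looks more independent."""
    s12 = dependence(x1, residual(x2, x1))   # hypothesis x1 -> x2
    s21 = dependence(x2, residual(x1, x2))   # hypothesis x2 -> x1
    return ("x1 -> x2" if s12 < s21 else "x2 -> x1"), s12, s21

rng = np.random.default_rng(2)
n = 50_000
# Model 1 with uniform errors: x1 = e1, x2 = 0.8 x1 + e2, var(x1) = var(x2) = 1
x1 = uniform_with_var(1.0, n, rng)
x2 = 0.8 * x1 + uniform_with_var(0.36, n, rng)
direction, s12, s21 = estimate_direction(x1, x2)
print(direction, round(s12, 3), round(s21, 3))
```

Replacing the uniform errors by Gaussian ones makes both scores close to zero, reflecting that the direction is not identifiable in the Gaussian case.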
LiNGAM with hidden
common causes
P. O. Hoyer, S. Shimizu, A. Kerminen,
and M. Palviainen
Int. J. Approximate Reasoning
2008
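[Figure: x1 → x2 or x2 → x1?, with a hidden common cause f1]
LiNGAM with hidden common causes (Hoyer+08IJAR)
• Extension to incorporate non-Gaussian hidden common causes
$x_1 = \sum_{q=1}^{Q} \lambda_{1q} f_q + e_1$
$x_2 = \sum_{q=1}^{Q} \lambda_{2q} f_q + b_{21} x_1 + e_2$
In general: $x_i = \sum_{q=1}^{Q} \lambda_{iq} f_q + \sum_{j \neq i} b_{ij} x_j + e_i$
where $f_q$ $(q = 1, \dots, Q)$ are independent (without loss of generality)
[Figure: x1 → x2 with errors e1, e2 and hidden common causes f1, f2]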
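A small simulation of the two-variable model above shows why hidden common causes are a problem: even with b21 = 0, x1 and x2 are correlated, and a naive regression slope of x2 on x1 is far from zero. The Laplace-distributed f_q and errors, the λ values and the helper `simulate_confounded` are all assumptions made for this sketch.

```python
import numpy as np

def simulate_confounded(n, lam1, lam2, b21, rng):
    """Sample x1, x2 from the two-variable model with Q hidden common causes:
        x1 = sum_q lam1[q] f_q + e1
        x2 = sum_q lam2[q] f_q + b21 * x1 + e2
    f_q, e1, e2 are independent non-Gaussian (Laplace, variance 2) variables.
    """
    Q = len(lam1)
    f = rng.laplace(size=(Q, n))           # hidden common causes f_1..f_Q (unobserved)
    e1 = rng.laplace(size=n)
    e2 = rng.laplace(size=n)
    x1 = np.asarray(lam1) @ f + e1
    x2 = np.asarray(lam2) @ f + b21 * x1 + e2
    return x1, x2

rng = np.random.default_rng(3)
# No direct effect (b21 = 0), yet two hidden common causes link x1 and x2.
x1, x2 = simulate_confounded(20_000, lam1=[1.0, 0.5], lam2=[1.0, -0.2], b21=0.0, rng=rng)
r = np.corrcoef(x1, x2)[0, 1]
slope = np.cov(x1, x2)[0, 1] / np.var(x1, ddof=1)   # naive OLS slope of x2 on x1
print(round(r, 3), round(slope, 3))
```

With these values the population correlation is 1.8 / sqrt(4.5 * 4.08) ≈ 0.42 and the naive slope is 1.8 / 4.5 = 0.4, although the true b21 is 0.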
Our proposal:
A Bayesian LiNGAM
approach
S. Shimizu and K. Bollen.
Journal of Machine Learning Research,
2014
and something extra
Key idea (1/2)
• Transform the model to a model with no hidden common causes
[Figure: "LiNGAM with hidden common causes": x1 → x2 (b21) with hidden common causes f1, …, fQ and errors e1, e2; transformed into "LiNGAM with no hidden common causes but with possibly different intercepts over obs.": for each observation m, x1^(m) → x2^(m) with the same b21, errors e1^(m), e2^(m), and intercepts μ1^(m), μ2^(m)]
Key idea (2/2)
• Include the sums of hidden common causes as model parameters, i.e., observation-specific intercepts:
m-th obs.: $x_2^{(m)} = \underbrace{\sum_{q=1}^{Q} \lambda_{2q} f_q^{(m)}}_{\mu_2^{(m)}} + b_{21} x_1^{(m)} + e_2^{(m)}$, where $\mu_2^{(m)}$ is the observation-specific intercept
• Hidden common causes are not modeled explicitly
– No need to specify the number of hidden common causes Q or to estimate the coefficients $\lambda_{2q}$
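The reparameterization can be sketched as follows: draw hidden common causes, collapse them into per-observation intercepts $\mu_i^{(m)} = \sum_q \lambda_{iq} f_q^{(m)}$, and generate x1, x2 from the intercept form. The values of Q, the λ's, b21 and the error distributions are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(4)
n, Q = 1_000, 3
lam1 = rng.normal(size=Q)            # lambda_{1q}, illustrative values
lam2 = rng.normal(size=Q)            # lambda_{2q}, illustrative values
b21 = 0.7

f = rng.laplace(size=(Q, n))         # f_q^(m): hidden common causes, one column per observation
e1 = rng.uniform(-1.0, 1.0, n)       # independent non-Gaussian errors
e2 = rng.uniform(-1.0, 1.0, n)

# Observation-specific intercepts: the sums of hidden common causes
mu1 = lam1 @ f                       # mu_1^(m) = sum_q lambda_{1q} f_q^(m)
mu2 = lam2 @ f                       # mu_2^(m) = sum_q lambda_{2q} f_q^(m)

# LiNGAM with no hidden common causes, but a different intercept per observation
x1 = mu1 + e1
x2 = mu2 + b21 * x1 + e2

# The same data written in the original hidden-common-cause form
x2_original = (lam2 @ f) + b21 * x1 + e2
print(np.allclose(x2, x2_original), mu1.shape, mu2.shape)
```

Once μ1^(m) and μ2^(m) are treated as parameters, Q and the individual λ's no longer appear, at the cost of 2n intercepts; this is why an informative prior on the intercepts is needed.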
Bayesian model selection
• Compare the marginal likelihoods of the models, with the data standardized
• Many observation-specific intercepts $\mu_i^{(m)}$ $(i = 1, 2;\ m = 1, \dots, n)$
– Similar to mixed models and multi-level models
– Informative prior
• Model $p(e_i)$ by a generalized Gaussian with a shape parameter (hyper-parameter selection: empirical Bayes)
Model 3 ($x_1 \to x_2$):
$x_1^{(m)} = \mu_1^{(m)} + e_1^{(m)}$
$x_2^{(m)} = \mu_2^{(m)} + b_{21} x_1^{(m)} + e_2^{(m)}$
Model 4 ($x_1 \leftarrow x_2$):
$x_1^{(m)} = \mu_1^{(m)} + b_{12} x_2^{(m)} + e_1^{(m)}$
$x_2^{(m)} = \mu_2^{(m)} + e_2^{(m)}$
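The building blocks of this comparison can be sketched as a generalized Gaussian log-density for the errors and the log-likelihoods of Model 3 and Model 4 for fixed parameter values. The marginal likelihood on the slide integrates these over the priors of the intercepts, b21 or b12 and the hyper-parameters; that integration (e.g., by MCMC) is not shown. The (shape, scale) parameterization and the function names are assumptions of this sketch.

```python
import math
import numpy as np

def gen_gauss_logpdf(e, shape, scale):
    """Log-density of a zero-mean generalized Gaussian:
    p(e) = shape / (2 * scale * Gamma(1/shape)) * exp(-(|e| / scale) ** shape).
    shape = 2 gives a Gaussian, shape = 1 a Laplace; large shapes approach a uniform.
    """
    return (math.log(shape) - math.log(2.0 * scale) - math.lgamma(1.0 / shape)
            - (np.abs(e) / scale) ** shape)

def loglik_model3(x1, x2, mu1, mu2, b21, shape, scale):
    """log p(x1, x2 | params) under Model 3 (x1 -> x2). The map (e1, e2) -> (x1, x2)
    is triangular with unit diagonal, so its Jacobian adds nothing."""
    e1 = x1 - mu1
    e2 = x2 - mu2 - b21 * x1
    return float(np.sum(gen_gauss_logpdf(e1, shape, scale) + gen_gauss_logpdf(e2, shape, scale)))

def loglik_model4(x1, x2, mu1, mu2, b12, shape, scale):
    """log p(x1, x2 | params) under Model 4 (x1 <- x2)."""
    e1 = x1 - mu1 - b12 * x2
    e2 = x2 - mu2
    return float(np.sum(gen_gauss_logpdf(e1, shape, scale) + gen_gauss_logpdf(e2, shape, scale)))

# Usage: data generated from Model 3 with illustrative intercepts and uniform errors
rng = np.random.default_rng(7)
n = 500
mu1, mu2 = rng.standard_t(8, size=(2, n))
e1, e2 = rng.uniform(-1.0, 1.0, size=(2, n))
x1 = mu1 + e1
x2 = mu2 + 0.8 * x1 + e2
print(loglik_model3(x1, x2, mu1, mu2, 0.8, shape=1.5, scale=1.0))
```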
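Prior for the observation-specific
intercepts
• Motivation: central limit theorem
– Sums of independent variables tend to be more Gaussian
• Approximate the density of the intercepts by a bell-shaped distribution
– $\mu_1^{(m)}$ and $\mu_2^{(m)}$ are dependent because they share hidden common causes
$\begin{bmatrix} \mu_1^{(m)} \\ \mu_2^{(m)} \end{bmatrix} = \begin{bmatrix} \sum_{q=1}^{Q} \lambda_{1q} f_q^{(m)} \\ \sum_{q=1}^{Q} \lambda_{2q} f_q^{(m)} \end{bmatrix} \sim$ t-distribution with standard deviations $\sigma_1, \sigma_2$, correlation $\rho_{12}$, and $\nu$ degrees of freedom (here, $\nu = 8$)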
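The intercept prior can be sketched with the standard construction of a multivariate t-distribution: divide a bivariate normal draw by sqrt(w / ν) with w ~ χ²_ν. The sketch assumes that the standard deviations and correlation on the slide parameterize the scale matrix; apart from ν = 8, the numeric values and the function name `sample_intercept_prior` are illustrative.

```python
import numpy as np

def sample_intercept_prior(n, sd1, sd2, rho12, dof=8, rng=None):
    """Draw n pairs (mu1^(m), mu2^(m)) from a bivariate t-distribution with
    location 0, scale matrix built from sd1, sd2, rho12, and `dof` degrees of freedom."""
    rng = np.random.default_rng() if rng is None else rng
    cross = rho12 * sd1 * sd2
    scale = np.array([[sd1**2, cross], [cross, sd2**2]])
    z = rng.multivariate_normal(np.zeros(2), scale, size=n)  # (n, 2) Gaussian draws
    w = rng.chisquare(dof, size=n)                           # (n,) chi-square draws
    return z / np.sqrt(w / dof)[:, None]                     # (n, 2) t draws

mu = sample_intercept_prior(200_000, sd1=1.0, sd2=2.0, rho12=0.6, rng=np.random.default_rng(5))
print(np.corrcoef(mu, rowvar=False)[0, 1], mu[:, 0].var())
```

The correlation of the draws matches rho12, while each variance is inflated by ν / (ν - 2) = 4/3 relative to the scale matrix; the heavier-than-Gaussian tails are what distinguish this choice from a normal prior.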
Experiment on artificial
datasets
[Figure: x1 → x2 or x2 → x1?, with hidden common causes f1, …, fQ and errors e1, e2]
Direction estimation
Total: 240 trials

Precision (correct decisions / decisions made)
N. obs   2logBF>0   2logBF>2   2logBF>6   2logBF>10
50       0.62       0.63       0.70       0.59
100      0.64       0.68       0.72       0.84
200      0.66       0.69       0.74       0.81

Number of decisions
N. obs   2logBF>0   2logBF>2   2logBF>6   2logBF>10
50       240        163        60         17
100      240        194        118        62
200      240        213        153        105

2logBF > 6: strong evidence (Kass & Raftery, 1995)
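Reading the table requires mapping 2 log BF to an evidence category and to a decision. The sketch below encodes the Kass & Raftery (1995) scale for 2 ln BF (0-2, 2-6, 6-10, above 10) and a thresholded decision rule of the kind the table columns suggest; the log marginal likelihood inputs are placeholders, and both function names are hypothetical.

```python
def kass_raftery_category(two_log_bf: float) -> str:
    """Evidence category for |2 ln BF| on the Kass & Raftery (1995) scale."""
    v = abs(two_log_bf)
    if v < 2:
        return "not worth more than a bare mention"
    if v < 6:
        return "positive"
    if v < 10:
        return "strong"
    return "very strong"

def decide_direction(log_ml_x1_to_x2: float, log_ml_x2_to_x1: float, threshold: float = 0.0):
    """Return the favoured direction if |2 ln BF| exceeds `threshold`, else None
    (no decision). Inputs are log marginal likelihoods of the two models."""
    two_log_bf = 2.0 * (log_ml_x1_to_x2 - log_ml_x2_to_x1)
    if abs(two_log_bf) <= threshold:
        return None
    return "x1 -> x2" if two_log_bf > 0 else "x2 -> x1"

# Placeholder log marginal likelihoods giving 2 ln BF = 8
print(decide_direction(-100.0, -104.0, threshold=6.0), kass_raftery_category(8.0))
```

Raising the threshold trades the number of decisions for their precision, which is the pattern the two tables show across columns.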
Connection strength estimation
[Figure: estimated vs. true connection strength, for trials where the direction was correctly estimated and where it was wrongly estimated]
What should come next?
Identifiable models for continuous
and discrete variables
(with hidden common causes)
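LiNGAM + logistic model?
• Continuous effect and discrete cause:
$x_i^{cntns} = \sum_{q=1}^{Q} \lambda_{iq} f_q^{cntns} + \sum_{r=1}^{R} \gamma_{ir} f_r^{dscrt} + \sum_{l} b_{il} x_l^{dscrt} + e_i$
• Discrete effect and continuous (discrete) cause:
$x_i^{dscrt} = f_i(\{f_q^{cntns}\}, \{f_r^{dscrt}\}, x_j^{cntns} \text{ or } x_j^{dscrt}, e_i)$
• $f_i$ satisfies:
$x_i^{dscrt} = \mathrm{logistic}\left(\sum_{q=1}^{Q} \lambda_{iq} f_q^{cntns} + \sum_{r=1}^{R} \gamma_{ir} f_r^{dscrt} + b_{ij} x_j^{cntns} \text{ or } b_{ij} x_j^{dscrt}\right)$
• Difficulty: not closed under marginalization? Prior?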
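One way to simulate from such a mixed model can be sketched as follows, under the assumption (not stated on the slide) that "logistic" means a binary variable drawn as Bernoulli with success probability logistic(·). The coefficients, Q = R = 1, the error distributions and all variable names are illustrative.

```python
import numpy as np

def logistic(z):
    """Logistic (sigmoid) link: 1 / (1 + exp(-z))."""
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(6)
n = 5_000
f_cntns = rng.laplace(size=n)              # one continuous hidden common cause (Q = 1)
f_dscrt = rng.integers(0, 2, size=n)       # one binary hidden common cause (R = 1)

# Continuous effect, discrete cause:
# x1_dscrt is an observed binary variable; x2_cntns depends on it linearly.
x1_dscrt = rng.binomial(1, logistic(0.8 * f_cntns + 1.0 * f_dscrt - 0.5))
x2_cntns = 0.6 * f_cntns - 0.4 * f_dscrt + 1.5 * x1_dscrt + rng.uniform(-1.0, 1.0, n)

# Discrete effect, continuous cause: the success probability passes through a logistic link.
x3_cntns = 0.5 * f_cntns + rng.uniform(-1.0, 1.0, n)
x4_dscrt = rng.binomial(1, logistic(0.7 * f_cntns + 0.3 * f_dscrt + 1.2 * x3_cntns))

print(x1_dscrt.mean(), x4_dscrt.mean(), np.corrcoef(x3_cntns, x4_dscrt)[0, 1])
```

Simulating is easy; the open difficulty noted above is that marginalizing out the hidden causes does not, in general, keep the logistic form, which complicates choosing priors and estimating the direction.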
Conclusion
• Estimation of causal direction in the
presence of hidden common causes is a
major challenge in causal discovery
• Proposed a semi-parametric approach
– LiNGAM + mixed-model
• Open problem: Identifiable models for
continuous and discrete variables (and
simple estimation algorithms for the
models)
References
• S. Shimizu, P. O. Hoyer, A. Hyvärinen and A. Kerminen. A linear
non-Gaussian acyclic model for causal discovery. Journal of
Machine Learning Research, 7(Oct): 2003-2030, 2006.
• P. O. Hoyer, S. Shimizu, A. Kerminen and M. Palviainen.
Estimation of causal effects using linear non-Gaussian causal
models with hidden variables. International Journal of
Approximate Reasoning, 49(2): 362-378, 2008.
• S. Shimizu and K. Bollen. Bayesian estimation of causal direction
in acyclic structural equation models with individual-specific
confounder variables and non-Gaussian distributions. Journal
of Machine Learning Research, 15(Aug): 2629-2652, 2014.
• A collection of related papers:
https://sites.google.com/site/sshimizu06/home/lingampapers