A non-Gaussian approach for causal structure learning in ... · Estimation of causal effects using...

Shohei Shimizu

Shiga University / Osaka University, Japan

Launching department of data science in 2017!

A non-Gaussian approach for causal structure learning in the presence of hidden common causes

CRM Workshop Statistical Causal Inference and its Applications to Genetics

Illustrating the problem

Strong correlation between chocolate consumption and number of Nobel laureates (Messerli12NEJM)

[Figure: 2002-2011. x-axis: Chocolate consumption (kg/yr/capita); y-axis: Num. Nobel laureates per 10 million pop.; Corr. 0.791, P-value < 0.001]

Does eating more chocolate increase the number of Nobel laureates?

• Three candidate models (Messerli12NEJM; Maurage+13JNutrition)

[Figure: three candidate causal graphs relating Chclt (chocolate) and Nobel, with GDP as a hidden common cause. Inset: the Chocolate vs. Nobel plot (Corr. 0.791, P-value < 0.001). Callout: Manage this gap!]
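A hidden common cause alone can produce a strong correlation between two variables that do not affect each other. The following is a minimal sketch on synthetic data; the names `gdp`, `chclt`, `nobel`, the coefficients, and the sample size are illustrative assumptions, not values from the cited studies.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 10_000

# Hypothetical hidden common cause and two observed variables.
# Neither observed variable has any effect on the other.
gdp = rng.uniform(-1.0, 1.0, size=n)
chclt = 2.0 * gdp + rng.uniform(-0.5, 0.5, size=n)
nobel = 2.0 * gdp + rng.uniform(-0.5, 0.5, size=n)

corr = np.corrcoef(chclt, nobel)[0, 1]


def residual(y, x):
    """Residual of y after a least-squares fit on x (with intercept)."""
    slope = np.cov(x, y)[0, 1] / np.var(x, ddof=1)
    return (y - y.mean()) - slope * (x - x.mean())


# Once the common cause is regressed out, the association disappears.
partial_corr = np.corrcoef(residual(chclt, gdp), residual(nobel, gdp))[0, 1]
print(f"corr = {corr:.2f}, partial corr given gdp = {partial_corr:.2f}")
```

In practice the common cause is hidden, so the regression step on `gdp` is not available; that is the gap the rest of the talk addresses.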

Divided into two parts

1. Estimation of causal direction with no temporal information being used

2. Coping with hidden common causes

[Figure: part 1 asks whether x1 causes x2 or x2 causes x1 (coefficient $b_{21}$ or $b_{12}$); part 2 asks the same question with a hidden common cause f1 added]

Once a direction has been estimated, the connection strength $b_{21}$ or $b_{12}$ can be computed
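When there is no hidden common cause and the direction x1 → x2 is already known, the connection strength can be estimated by ordinary least squares. A minimal sketch on synthetic data, assuming a true coefficient of 0.8 and uniform errors purely for illustration:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 5_000
b21_true = 0.8  # assumed for illustration

x1 = rng.uniform(-1.0, 1.0, size=n)
x2 = b21_true * x1 + rng.uniform(-0.5, 0.5, size=n)

# OLS slope of x2 on x1: cov(x1, x2) / var(x1)
b21_hat = np.cov(x1, x2)[0, 1] / np.var(x1, ddof=1)
print(f"estimated b21 = {b21_hat:.3f}")
```

This simple estimate is no longer valid once hidden common causes are present, which is the second part of the problem.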

Basic non-Gaussian model with no hidden common cause

S. Shimizu, P. O. Hoyer, A. Hyvärinen and A. Kerminen. Journal of Machine Learning Research, 2006

[Figure: x1 → x2 or x1 ← x2?]

Linear Non-Gaussian Acyclic Model (LiNGAM) (Shimizu+06JMLR)

• Identifiable: causal directions and coefficients

• Various extensions including nonlinear (Hoyer+08NIPS, Zhang+09UAI) and cyclic (Lacerda+08UAI) models

$$x_i = \sum_{j \neq i} b_{ij} x_j + e_i$$

[Figure: example graph over x1, x2, x3 with coefficients $b_{21}$, $b_{23}$, $b_{13}$ and errors $e_1$, $e_2$, $e_3$]

Linearity

Acyclicity

Non-Gaussian errors $e_i$

Independence of errors $e_i$ (no hidden common causes)
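To make the model concrete, data can be generated from a three-variable LiNGAM by solving x = Bx + e for x. This is a sketch only: the nonzero pattern of `B` follows the coefficients b21, b23, b13 named on the slide, while the numeric values and the uniform error distribution are assumptions.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 20_000

# B[i, j] is b_ij, the direct effect of x_j on x_i (0-based indices).
# Values are hypothetical; only the nonzero pattern comes from the slide.
B = np.zeros((3, 3))
B[1, 0] = 0.8   # b21: x1 -> x2
B[1, 2] = -0.5  # b23: x3 -> x2
B[0, 2] = 0.6   # b13: x3 -> x1

# Independent non-Gaussian (uniform) errors, one row per observation.
E = rng.uniform(-1.0, 1.0, size=(n, 3))

# x = Bx + e  =>  x = (I - B)^{-1} e
X = np.linalg.solve(np.eye(3) - B, E.T).T

# Acyclicity: B is nilpotent, so B^3 = 0 for three variables.
acyclic = np.allclose(np.linalg.matrix_power(B, 3), 0.0)
# The structural equations hold for every observation.
consistent = np.allclose(X, X @ B.T + E)
print(acyclic, consistent)
```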

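Different directions give different data distributions

[Figure: panels with axes x1 and x2 for Model 1 and Model 2, in two columns: Gaussian and Non-Gaussian (ex. uniform)]

Model 1: $x_1 = e_1$, $x_2 = 0.8 x_1 + e_2$

Model 2: $x_1 = 0.8 x_2 + e_1$, $x_2 = e_2$

$E(e_1) = E(e_2) = 0$, $\mathrm{var}(x_1) = \mathrm{var}(x_2) = 1$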
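The asymmetry between the two directions can be checked numerically. The sketch below simulates Model 1 and regresses in both directions, then measures dependence between the squared residual and the squared regressor. This dependence score is a simple illustrative choice, not the estimation method of the cited papers. With uniform errors the wrong direction leaves a clearly dependent residual; with Gaussian errors both directions look the same.

```python
import numpy as np


def sq_dependence(cause, effect):
    """|corr(residual^2, cause^2)| after regressing effect on cause."""
    c = cause - cause.mean()
    y = effect - effect.mean()
    slope = np.cov(c, y)[0, 1] / np.var(c, ddof=1)
    resid = y - slope * c
    return abs(np.corrcoef(resid**2, c**2)[0, 1])


def simulate_model1(draw, n):
    """Model 1: x1 = e1, x2 = 0.8 x1 + e2 with var(x1) = var(x2) = 1."""
    x1 = draw(n)                   # unit variance
    x2 = 0.8 * x1 + 0.6 * draw(n)  # var(e2) = 0.36
    return x1, x2


rng = np.random.default_rng(3)
n = 20_000
draws = {
    "uniform": lambda k: rng.uniform(-np.sqrt(3), np.sqrt(3), size=k),
    "gaussian": lambda k: rng.standard_normal(k),
}

results = {}
for name, draw in draws.items():
    x1, x2 = simulate_model1(draw, n)
    forward = sq_dependence(x1, x2)   # regress x2 on x1 (true direction)
    backward = sq_dependence(x2, x1)  # regress x1 on x2 (wrong direction)
    results[name] = (forward, backward)
    print(f"{name}: forward {forward:.3f}, backward {backward:.3f}")
```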

LiNGAM with hidden common causes

P. O. Hoyer, S. Shimizu, A. Kerminen, and M. Palviainen. Int. J. Approximate Reasoning, 2008

[Figure: x1 → x2 or x1 ← x2, with a hidden common cause f1?]

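LiNGAM with hidden common causes (Hoyer+08IJAR)

• Extension to incorporate non-Gaussian hidden common causes

$$x_i = \sum_{q=1}^{Q} \lambda_{iq} f_q + \sum_{j \neq i} b_{ij} x_j + e_i$$

where $f_q$ $(q = 1, \ldots, Q)$ are independent (WLG):

$$x_1 = \sum_{q=1}^{Q} \lambda_{1q} f_q + e_1$$

$$x_2 = \sum_{q=1}^{Q} \lambda_{2q} f_q + b_{21} x_1 + e_2$$

[Figure: x1 and x2 with errors $e_1$, $e_2$ and hidden common causes $f_1$, $f_2$]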
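A short simulation illustrates why hidden common causes matter for the connection strength: even with the direction known, regressing x2 on x1 no longer recovers the true coefficient. The loadings (written `lam` here), the true coefficient, and Q = 2 are hypothetical values chosen for this sketch.

```python
import numpy as np

rng = np.random.default_rng(4)
n, Q = 50_000, 2
b21 = 0.5                      # assumed true connection strength
lam = np.array([[1.0, 0.5],    # loadings of f_q on x1 (hypothetical)
                [0.8, -0.3]])  # loadings of f_q on x2 (hypothetical)

f = rng.uniform(-1.0, 1.0, size=(n, Q))  # independent non-Gaussian f_q
e = rng.uniform(-1.0, 1.0, size=(n, 2))  # independent errors

x1 = f @ lam[0] + e[:, 0]
x2 = f @ lam[1] + b21 * x1 + e[:, 1]

ols_slope = np.cov(x1, x2)[0, 1] / np.var(x1, ddof=1)

# Population value of the OLS slope: b21 plus a confounding term.
var_f = var_e = 1.0 / 3.0  # variance of uniform(-1, 1)
var_x1 = (lam[0] @ lam[0]) * var_f + var_e
expected = b21 + (lam[0] @ lam[1]) * var_f / var_x1
print(f"true b21 = {b21}, OLS slope = {ols_slope:.3f}, expected = {expected:.3f}")
```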

Our proposal: A Bayesian LiNGAM approach

S. Shimizu and K. Bollen. Journal of Machine Learning Research, 2014

and something extra

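Key idea (1/2)

• Transform the model to a model with no hidden common causes

[Figure: one panel shows LiNGAM with hidden common causes: x1 and x2 with coefficient $b_{21}$, hidden common causes $f_1, \ldots, f_Q$, and errors $e_1$, $e_2$. The other panel shows LiNGAM with no hidden common causes but with possibly different intercepts over obs.: one pair $x_1^{(m)}$, $x_2^{(m)}$ per observation, each with coefficient $b_{21}$, errors $e_1^{(m)}$, $e_2^{(m)}$, and intercepts $\mu_1^{(m)}$, $\mu_2^{(m)}$]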

Key idea (2/2)

• Include the sums of hidden common causes as the model parameters, i.e., observation-specific intercepts:

m-th obs.: $$x_2^{(m)} = \underbrace{\sum_{q=1}^{Q} \lambda_{2q} f_q^{(m)}}_{\mu_2^{(m)}:\ \text{observation-specific intercept}} + b_{21} x_1^{(m)} + e_2^{(m)}$$

• Do not explicitly model hidden common causes

– No need to specify the number of hidden common causes Q or to estimate the coefficients $\lambda_{2q}$
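The reparameterization can be verified directly: writing the sum over hidden common causes as one intercept per observation produces exactly the same data. A minimal sketch, with hypothetical loadings (`lam`), coefficient, and error distributions:

```python
import numpy as np

rng = np.random.default_rng(5)
n, Q = 1_000, 3
b21 = 0.5                                # hypothetical
lam = rng.normal(size=(2, Q))            # hypothetical loadings on f_q
f = rng.uniform(-1.0, 1.0, size=(n, Q))  # f_q^(m), one row per obs.
e = rng.laplace(size=(n, 2))             # non-Gaussian errors

# Form 1: explicit hidden common causes.
x1_a = f @ lam[0] + e[:, 0]
x2_a = f @ lam[1] + b21 * x1_a + e[:, 1]

# Form 2: observation-specific intercepts, one per variable and observation.
mu = f @ lam.T                           # shape (n, 2)
x1_b = mu[:, 0] + e[:, 0]
x2_b = mu[:, 1] + b21 * x1_b + e[:, 1]

same = np.allclose(x1_a, x1_b) and np.allclose(x2_a, x2_b)
print(same, mu.shape)
```

In the second form neither `Q` nor `lam` is needed once `mu` is treated as a parameter, at the cost of having 2n intercepts to handle.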

Bayesian model selection

• Compare the marginal likelihoods of the two models, with the data standardized

Model 3 ($x_1 \to x_2$):

$$x_1^{(m)} = \mu_1^{(m)} + e_1^{(m)}$$

$$x_2^{(m)} = \mu_2^{(m)} + b_{21} x_1^{(m)} + e_2^{(m)}$$

Model 4 ($x_1 \leftarrow x_2$):

$$x_1^{(m)} = \mu_1^{(m)} + b_{12} x_2^{(m)} + e_1^{(m)}$$

$$x_2^{(m)} = \mu_2^{(m)} + e_2^{(m)}$$

• Many obs.-specific intercepts $\mu_i^{(m)}$ $(i = 1, 2;\ m = 1, \ldots, n)$

– Similar to mixed models and multi-level models

– Informative prior

• Model $p(e_i)$ by a Generalized Gaussian with a shape parameter (hyper-parameter selection: Empirical Bayes)
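The generalized Gaussian error density has a closed form that is easy to evaluate. The sketch below is only the log-density, parameterized by a shape `beta` and a standard deviation `sd` (this parameterization is an assumption made here); it is not the marginal likelihood computation, which also integrates over the intercepts and other parameters.

```python
import math

import numpy as np


def gen_gauss_logpdf(x, beta, sd=1.0):
    """Log-density of a zero-mean generalized Gaussian with shape beta
    and standard deviation sd (beta=2: Gaussian, beta=1: Laplace)."""
    # Scale alpha chosen so that the variance equals sd**2.
    alpha = sd * math.sqrt(math.gamma(1.0 / beta) / math.gamma(3.0 / beta))
    log_norm = math.log(beta) - math.log(2.0 * alpha) - math.lgamma(1.0 / beta)
    return log_norm - (np.abs(x) / alpha) ** beta


x = np.linspace(-3.0, 3.0, 7)
# Reference log-densities with unit variance for the two special cases.
gauss_ref = -0.5 * x**2 - 0.5 * math.log(2.0 * math.pi)
laplace_ref = -math.sqrt(2.0) * np.abs(x) - 0.5 * math.log(2.0)
print(np.allclose(gen_gauss_logpdf(x, 2.0), gauss_ref))
print(np.allclose(gen_gauss_logpdf(x, 1.0), laplace_ref))
```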

Prior for the observation-specific intercepts

$$\mu_1^{(m)} = \sum_{q=1}^{Q} \lambda_{1q} f_q^{(m)}, \qquad \mu_2^{(m)} = \sum_{q=1}^{Q} \lambda_{2q} f_q^{(m)}$$

• Motivation: Central limit theorem

– Sums of independent variables $f_q^{(m)}$ tend to be more Gaussian

• Approximate the density by a bell-shaped curve dist.

– Dependent due to hidden common causes

$(\mu_1^{(m)}, \mu_2^{(m)})$ ~ t-distribution with an sd for each intercept, a correlation between them, and DOF $v$ (here, 8)
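A prior of this kind can be sampled as a scaled bivariate Gaussian divided by an independent chi-square factor. In the sketch below the standard deviations and correlation are hypothetical values, and interpreting "sd" as the standard deviation of the t-distributed intercepts (rather than a scale parameter) is an assumption.

```python
import numpy as np


def sample_intercepts(n, sd, corr, dof, rng):
    """Draw n pairs from a zero-mean bivariate t-distribution with the
    given standard deviations, correlation, and degrees of freedom (> 2)."""
    sd = np.asarray(sd, dtype=float)
    cov = np.array([[sd[0] ** 2, corr * sd[0] * sd[1]],
                    [corr * sd[0] * sd[1], sd[1] ** 2]])
    # A multivariate t with scale matrix S has covariance dof/(dof-2) * S,
    # so shrink the scale matrix to hit the requested covariance.
    scale = cov * (dof - 2.0) / dof
    z = rng.multivariate_normal(np.zeros(2), scale, size=n)
    w = rng.chisquare(dof, size=n) / dof
    return z / np.sqrt(w)[:, None]


rng = np.random.default_rng(6)
mu = sample_intercepts(200_000, sd=(1.0, 0.5), corr=0.6, dof=8, rng=rng)
sample_sd = mu.std(axis=0)
sample_corr = np.corrcoef(mu.T)[0, 1]
print(sample_sd, sample_corr)
```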

Experiment on artificial datasets

[Figure: x1 → x2 or x1 ← x2?, each with hidden common causes $f_1, \ldots, f_Q$ and errors $e_1$, $e_2$]

Direction estimation

Total: 240 trials

Precisions

N. obs   2logBF>0   2logBF>2   2logBF>6   2logBF>10
50       0.62       0.63       0.70       0.59
100      0.64       0.68       0.72       0.84
200      0.66       0.69       0.74       0.81

N. Decisions

N. obs   2logBF>0   2logBF>2   2logBF>6   2logBF>10
50       240        163        60         17
100      240        194        118        62
200      240        213        153        105

Strong evidence (Kass & Raftery, 1995)
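The column thresholds 2, 6, and 10 match the evidence scale for twice the log Bayes factor in Kass & Raftery (1995). A small sketch of how a decision rule based on that scale could look; the function names and the log marginal likelihood values are hypothetical.

```python
def two_log_bf(log_ml_a, log_ml_b):
    """2 log Bayes factor of model A over model B from log marginal likelihoods."""
    return 2.0 * (log_ml_a - log_ml_b)


def evidence_category(two_log_bf_value):
    """Kass & Raftery (1995) categories for evidence in favor of model A."""
    if two_log_bf_value < 0:
        return "favors the other model"
    if two_log_bf_value <= 2:
        return "not worth more than a bare mention"
    if two_log_bf_value <= 6:
        return "positive"
    if two_log_bf_value <= 10:
        return "strong"
    return "very strong"


# Hypothetical log marginal likelihoods for the two candidate directions.
value = two_log_bf(-140.2, -144.0)
label = evidence_category(value)
print(f"2logBF = {value:.1f}: {label}")
```

Raising the threshold trades the number of decisions made for precision, which is the pattern the two tables report.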

Connection strength estimation

[Figure: estimated vs. true connection strength (axes: Estimated, True); legend: Direction wrongly estimated, Direction correctly estimated]

What should be next?

Identifiable models for continuous and discrete variables (with hidden common causes)

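LiNGAM + Logistic model?

• Continuous effect and discrete cause:

$$x_i^{cntns} = \sum_{q=1}^{Q} \lambda_{iq} f_q^{cntns} + \sum_{r=1}^{R} \lambda_{ir} f_r^{dscrt} + b_{il} x_l^{dscrt} + e_i$$

• Discrete effect and continuous (discrete) cause:

$$x_i^{dscrt} = f_i(\{f_q^{cntns}\}, \{f_r^{dscrt}\}, x_j^{cntns}\ (\text{or } x_j^{dscrt}), e_i)$$

• $f_i$ satisfies: a logistic model for $x_i^{dscrt}$ with linear predictor

$$\sum_{q=1}^{Q} \lambda_{iq} f_q^{cntns} + \sum_{r=1}^{R} \lambda_{ir} f_r^{dscrt} + b_{ij} x_j^{cntns}\ (\text{or } b_{ij} x_j^{dscrt})$$

• Difficulty: Not closed under marginalization? Prior?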

Conclusion

• Estimation of causal direction in the presence of hidden common causes is a major challenge in causal discovery

• Proposed a semi-parametric approach

– LiNGAM + mixed-model

• Open problem: Identifiable models for continuous and discrete variables (and simple estimation algorithms for the models)

References

• S. Shimizu, P. O. Hoyer, A. Hyvärinen and A. Kerminen. A linear non-gaussian acyclic model for causal discovery. Journal of Machine Learning Research, 7(Oct): 2003--2030, 2006.

• P. O. Hoyer, S. Shimizu, A. Kerminen and M. Palviainen. Estimation of causal effects using linear non-gaussian causal models with hidden variables. International Journal of Approximate Reasoning, 49(2): 362--378, 2008.

• S. Shimizu and K. Bollen. Bayesian estimation of causal direction in acyclic structural equation models with individual-specific confounder variables and non-Gaussian distributions. Journal of Machine Learning Research, 15(Aug): 2629--2652, 2014.

• A collection of related papers: https://sites.google.com/site/sshimizu06/home/lingampapers

Date post:	22-Jul-2020