
Causal effects

A gentle(?) introduction

Morten Frydenberg & Stefan Hansen
Section for Biostatistics

Danish Society for Theoretical Statistics
Two-day meeting

October 31, 2017

Section for Biostatistics Causal effects October 31, 2017 1 / 56

Disclaimer

This is

An introduction to the concept of causality.

Defined by counterfactuals and only for a binary treatment/action/exposure.

Depicted by Directed Acyclic Graphs.

A limited view into the wide field of theories, methods and applications.

This is not

The complete history of causality.

A complete overview of the state of the art/science.

An overview of the many different definitions of cause, causal effect, confounding, exchangeability or causal DAGs.


Cause and effect

We are confronted every day with questions involving "cause and effect":

Does drug X cause a decrease in blood pressure in persons with hypertension?

Does alcohol intake during pregnancy cause a lower IQ of the child at age 15?

Will Brexit cause a lower exchange rate for the Pound?

...and if yes, then:

How large is the effect?


Bradford Hill's viewpoints

The Bradford Hill viewpoints (often called "criteria").

1 Strength: The larger the association, the more likely it is causal.

2 Consistency: Consistent findings across populations/geographical regions/etc.

3 Specificity: One cause, one effect.

4 Temporality: The effect has to occur after the cause.

5 Biological gradient: Greater exposure → greater effect.

6 Plausibility: Does a causal effect sound biologically plausible?

7 Coherence: Coherence between epidemiological and laboratory findings.

8 Experiment: Supported by experimental evidence?

9 Analogy: Effect of similar factors may be considered.

"Here then are nine different viewpoints from all of which we should study association before we cry causation"1

1 Hill, Austin Bradford (1965). The Environment and Disease: Association or Causation? Proc. of the Royal Soc. of Medicine.


The randomized controlled trial

Suppose we want to find out if a treatment has an effect on an outcome Y. The best way to do this would be to conduct a Randomized Controlled Trial:

Choose n persons from the population.

Randomly allocate each person to treatment, A = 1, or control, A = 0.

Wait and observe the outcome Y for each person.

We let

$$\bar{Y}_a = \frac{1}{N_a} \sum_{i=1}^{n} Y_i \, 1_{\{A_i = a\}}, \quad \text{with } N_a = \#\{i : A_i = a\},$$

denote the average of Y in each treatment group.2

2 Note that it is random which persons end up in the two groups, implying that $N_1$ and $N_0$ are random, but $N_1 + N_0 = n$.

The randomized controlled trial

The obvious estimator of the treatment effect is

$$TE = \bar{Y}_1 - \bar{Y}_0.$$

In this presentation we will try to explain why this is a good (unbiased) estimate of the true/causal effect of the treatment.3

Or, to be more precise: What are the conditions under which this is true?

3 We use the term "treatment" throughout this presentation. It might also be an exposure like drinking alcohol during pregnancy.
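
As a minimal sketch of the estimator above, the following Python computes the two group averages and their difference from paired lists of treatments and outcomes. The function name `treatment_effect` and the six data points are made up for illustration and do not come from the slides.

```python
def treatment_effect(a, y):
    """Difference in group means, TE = Ybar_1 - Ybar_0."""
    n1 = sum(1 for ai in a if ai == 1)  # N_1 = #{i : A_i = 1}
    n0 = sum(1 for ai in a if ai == 0)  # N_0 = #{i : A_i = 0}
    if n1 == 0 or n0 == 0:
        raise ValueError("both treatment groups must be non-empty")
    ybar1 = sum(yi for ai, yi in zip(a, y) if ai == 1) / n1
    ybar0 = sum(yi for ai, yi in zip(a, y) if ai == 0) / n0
    return ybar1 - ybar0


# Made-up data for six persons (illustration only).
a = [1, 0, 1, 0, 0, 1]
y = [5.0, 3.0, 7.0, 4.0, 2.0, 6.0]
te = treatment_effect(a, y)  # (5 + 7 + 6) / 3 - (3 + 4 + 2) / 3
```

The guard against an empty group reflects that $N_1$ and $N_0$ are random: in a small trial, one of them could in principle be zero.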


Counterfactuals - binary treatment

Question: What is the causal effect of a binary treatment (A) on an outcome (Y)?

Ideally, we would like to observe

$Y^1_i$: the outcome had the treatment been given,

$Y^0_i$: the outcome had the treatment not been given,

for all individuals in our population.

$Y^1_i$ and $Y^0_i$ are called counterfactuals because we only get to observe one of them for each individual.


Causal effects - definitions

The individual causal effect (for person i)

$$Y^1_i - Y^0_i$$

We say there is an individual causal effect if $Y^1_i \neq Y^0_i$.

Let $Y^1$ and $Y^0$ denote the counterfactuals for a randomly selected individual in our population.

The average causal effect

$$ACE = E[Y^1] - E[Y^0] = E[Y^1 - Y^0]$$

We say there is an average causal effect if $ACE \neq 0$.
We are usually interested in the ACE.


Causal effects - the problem

We do not observe both $Y^1$ and $Y^0$, but only one of them!

To be precise, we observe

the treatment: A

and

the observed outcome: $Y = Y^A = A \cdot Y^1 + (1 - A) \cdot Y^0$.

We hope to be able to estimate $ACE = E[Y^1] - E[Y^0]$ based on (A, Y).


Estimating causal effects

We are looking for an estimator of

$$ACE = E[Y^1] - E[Y^0]$$

based on a random sample of (A, Y) of, say, size n.

A perhaps naive attempt would be

$$\widehat{ACE} = TE = \bar{Y}_1 - \bar{Y}_0$$

where $\bar{Y}_a$ is the average Y among those who received treatment a.

However, in general, we will only have

$$E[\widehat{ACE}] = E[Y \mid A = 1] - E[Y \mid A = 0].$$


Exchangeability

So, estimating a causal effect boils down to whether

$$E[Y^1] = E[Y \mid A = 1] \quad \text{and} \quad E[Y^0] = E[Y \mid A = 0] \qquad (1)$$

in which case $E[\widehat{ACE}] = ACE$.

Equation (1) will be satisfied if we have4

Mean exchangeability

$$E[Y^a \mid A = 1] = E[Y^a \mid A = 0] \; (= E[Y^a]), \quad a = 0, 1.$$

This will be satisfied if the treatment is independent of the counterfactuals:

Exchangeability (weak ignorability)

$$Y^a \perp\!\!\!\perp A, \quad a = 0, 1.$$

4 Representativity assumption: the treated are representative of the untreated in terms of their counterfactual outcome $Y^1$, and vice versa.

Association is causation

To sum up: under exchangeability we have

$$E[Y \mid A = 1] - E[Y \mid A = 0] = E[Y^1] - E[Y^0].$$

In other words, association is causation.
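
The step from mean exchangeability to "association is causation" can be spelled out as follows. This is a sketch that uses only the relation $Y = Y^A$ for the observed outcome and mean exchangeability; the same two steps apply with $A = 0$ and $Y^0$.

```latex
\begin{align*}
E[Y \mid A = 1] &= E[Y^1 \mid A = 1] && \text{(since } Y = Y^A\text{)} \\
                &= E[Y^1]            && \text{(mean exchangeability)} \\
E[Y \mid A = 0] &= E[Y^0 \mid A = 0] = E[Y^0] && \text{(same argument)}
\end{align*}
```

Subtracting the second line from the first gives the identity between the associational and the causal contrast.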

Figure borrowed from Causal Inference by Hernán and Robins (2017)

When can we expect exchangeability?

In a randomized trial, treatment is determined by "a toss of a coin". In particular, treatment allocation does not depend on any subject characteristics; hence it cannot be associated with the counterfactuals $Y^a$.

In this case, exchangeability will hold:

The treated and untreated are interchangeable.

Association is causation.

For this reason, the randomized controlled trial is considered the gold standard of study designs.
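
A small simulation can illustrate the point. The sketch below is not from the slides: it assumes a hypothetical population in which a binary characteristic raises the outcome by 20 and the treatment lowers every outcome by exactly 5. It then compares coin-toss allocation with an allocation that depends on the characteristic. All names and numbers are invented for illustration.

```python
import random


def diff_in_means(a, y):
    """TE = mean of y among the treated minus mean of y among the untreated."""
    treated = [yi for ai, yi in zip(a, y) if ai == 1]
    control = [yi for ai, yi in zip(a, y) if ai == 0]
    return sum(treated) / len(treated) - sum(control) / len(control)


rng = random.Random(2017)  # fixed seed so the run is reproducible
n = 20000

# Hypothetical population: both counterfactuals exist for everyone.
trait = [1 if rng.random() < 0.5 else 0 for _ in range(n)]
y0 = [50 + 20 * t + rng.gauss(0, 5) for t in trait]
y1 = [v - 5 for v in y0]                # individual causal effect is -5
ace = sum(y1) / n - sum(y0) / n         # average causal effect, -5 here


def observed(a):
    """Observed outcome Y = A * Y^1 + (1 - A) * Y^0."""
    return [ai * u + (1 - ai) * v for ai, u, v in zip(a, y1, y0)]


# (a) Randomized trial: allocation by a toss of a coin.
a_rct = [1 if rng.random() < 0.5 else 0 for _ in range(n)]
# (b) Non-random allocation: treatment is more likely when trait = 1.
a_obs = [1 if rng.random() < (0.8 if t == 1 else 0.2) else 0 for t in trait]

te_rct = diff_in_means(a_rct, observed(a_rct))  # close to the ACE
te_obs = diff_in_means(a_obs, observed(a_obs))  # far from the ACE
```

Under these assumptions, the difference in means from the randomized allocation lands near the ACE. The non-random allocation gives a contrast with the wrong sign, because the treated group contains more persons with the outcome-raising characteristic.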


An observational (not an RCT) study

In many settings, an RCT is not possible or ethical, and we have to rely on observational studies, where we do not intervene, but observe.
Imagine that we have data from an observational study:

Drinking alcohol   N     IQ at age 15
Yes (1)            400   56
No (0)             600   54

TE = 56 − 54 = 2

The actual data look exactly as if they could have come from an RCT.

The crucial difference: the allocation of "treatment" is not random, but possibly associated with some characteristics of each person.

When is $TE = \bar{Y}_1 - \bar{Y}_0$ a valid estimate of the ACE?

When can we get a valid estimate of the ACE?


An observational study: Confounding

All mothers

Alcohol   N     IQ
Yes       400   56
No        600   54

TE = 56 − 54 = 2

Mother high education

Alcohol   N     IQ
Yes       300   60
No        120   70

TE = 60 − 70 = −10

Mother low education

Alcohol   N     IQ
Yes       100   45
No        480   50

TE = 45 − 50 = −5
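
The relation between the pooled table and the two education-specific tables can be checked numerically. The sketch below copies the counts and mean IQ values from the tables; the variable and function names are illustrative. The pooled mean among drinkers comes out as 56.25, which the slide rounds to 56.

```python
# (N, mean IQ) by education level and drinking status, copied from the tables.
data = {
    "High": {"Yes": (300, 60.0), "No": (120, 70.0)},
    "Low": {"Yes": (100, 45.0), "No": (480, 50.0)},
}


def pooled_mean(status):
    """Mean IQ over both education levels for one drinking status."""
    total_n = sum(data[level][status][0] for level in data)
    total_iq = sum(data[level][status][0] * data[level][status][1] for level in data)
    return total_iq / total_n


# Education-specific contrasts: both are negative.
te_by_level = {level: data[level]["Yes"][1] - data[level]["No"][1] for level in data}

# Pooled ("crude") contrast: positive, although both strata are negative.
te_crude = pooled_mean("Yes") - pooled_mean("No")
```

The sign reversal arises because drinkers are mostly mothers with high education, whose children have higher IQ levels whether or not the mother drinks.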



Assume that within education level it is random who chose to drink during pregnancy.

Mother high education (L = High), 42% of the mothers

Alcohol   N     IQ
Yes       300   60
No        120   70

TE_High = −10, Pr(A = 1 | L = High) = 71%

We have exchangeability among mothers with high education. That is,

$\bar{Y}_{Yes} = 60$ is an estimate of the mean of $Y^{Yes}$ in this group,

$\bar{Y}_{No} = 70$ is an estimate of the mean of $Y^{No}$ in this group.

So TE_High = −10 is a valid estimate of the ACE among mothers with high education.



Assume that within education level it is random who chose to drink during pregnancy.

Mother low education (L = Low), 58% of the mothers

Alcohol   N     IQ
Yes       100   45
No        480   50

TE_Low = −5, Pr(A = 1 | L = Low) = 17%

We have exchangeability among mothers with low education. That is,

$\bar{Y}_{Yes} = 45$ is an estimate of the mean of $Y^{Yes}$ in this group,

$\bar{Y}_{No} = 50$ is an estimate of the mean of $Y^{No}$ in this group.

So TE_Low = −5 is a valid estimate of the ACE among mothers with low education.



For the whole population: the probability of alcohol drinking (71% among mothers with high education, 17% among mothers with low education) is associated with the IQ levels, i.e. we do not have exchangeability.
TE = 2 is not a valid estimate of the ACE!


Conditional exchangeability

What if we have an observational study?
One could have collected information on a sufficiently large set of variables, say, L such that we have5

$$Y^a \perp\!\!\!\perp A \mid L, \quad a = 0, 1.$$

It implies

$$E[Y^a \mid A = 1, L = l] = E[Y^a \mid A = 0, L = l], \quad a = 0, 1,$$

i.e. the two treatment groups are interchangeable with respect to their mean counterfactual levels conditional on L.

Conditional exchangeability will be satisfied in a conditionally randomized trial.

5 Here L can be a collection of variables.


Under conditional exchangeability we have that association is causation conditional on L, that is,

$$E[Y \mid A = a, L = l] = E[Y^a \mid L = l].$$

Thus,

$$E[Y^a] = \sum_l E[Y \mid A = a, L = l] \Pr(L = l)$$

where the right-hand side is estimable from data. Moreover,

$$\widetilde{ACE} = \sum_l TE_l \cdot \Pr(L = l)$$

will be an unbiased estimate of ACE. This is called standardization.
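
The step marked "Thus" can be written out in three lines. This is a sketch under the stated conditional exchangeability, with the added assumption (not stated on the slide) that $\Pr(A = a \mid L = l) > 0$ for every level $l$, so that all conditional expectations are defined.

```latex
\begin{align*}
E[Y^a] &= \sum_l E[Y^a \mid L = l] \Pr(L = l)
        && \text{(law of total expectation)} \\
       &= \sum_l E[Y^a \mid A = a, L = l] \Pr(L = l)
        && \text{(conditional exchangeability)} \\
       &= \sum_l E[Y \mid A = a, L = l] \Pr(L = l)
        && \text{(since } Y = Y^A\text{)}
\end{align*}
```

Taking the difference between $a = 1$ and $a = 0$ gives the weighted sum of the level-specific contrasts $TE_l$.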


Confounding

No confounding

When we have mean exchangeability, i.e.

$$E[Y^a \mid A = 0] = E[Y^a \mid A = 1], \quad a = 0, 1,$$

we say that there is no confounding.

No unobserved confounding

When we have mean exchangeability conditional on L, i.e.

$$E[Y^a \mid A = 0, L = l] = E[Y^a \mid A = 1, L = l], \quad a = 0, 1,$$

we say that there is no unobserved confounding.


An observational study: Standardization

It was random within education level who chose to drink during pregnancy.
But the probability of alcohol drinking is different in the two sub-populations, which also have different (counterfactual) levels of IQ.
We have confounding!

TE = 2

is not a valid estimate of the average causal effect of alcohol.

But we have conditional exchangeability given level of education!
We can estimate the average causal effect of alcohol by standardization:

Pr(L = High) = 42%   Pr(A = 1 | L = High) = 71%   TE_High = −10
Pr(L = Low)  = 58%   Pr(A = 1 | L = Low)  = 17%   TE_Low  = −5

$$\widetilde{ACE} = (−10) \cdot 0.42 + (−5) \cdot 0.58 = −7.1$$
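
The standardization arithmetic can be reproduced in a few lines. The sketch below takes the education-specific contrasts and the numbers of mothers per education level from the tables (300 + 120 and 100 + 480). The function name `standardize` is illustrative.

```python
# Number of mothers per education level, from the tables: 300 + 120 and 100 + 480.
counts = {"High": 420, "Low": 580}
# Education-specific contrasts TE_l from the tables.
te = {"High": -10.0, "Low": -5.0}


def standardize(te_by_level, n_by_level):
    """Weighted sum of TE_l with weights Pr(L = l) estimated by n_l / n."""
    n = sum(n_by_level.values())
    return sum(te_by_level[level] * n_by_level[level] / n for level in te_by_level)


ace_std = standardize(te, counts)  # (-10) * 0.42 + (-5) * 0.58, about -7.1
```

The weights are the shares of each education level in the whole population, not the shares among drinkers or non-drinkers.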


Directed acyclic graphs (DAGs)

Growing interest in the use of DAGs to answer causal questions.

Compact and intuitive way of encoding subject matter knowledge.

Enables researchers to use graphical tools as a way of validating exchangeability (no unmeasured confounding).

May ease communication among researchers.


Directed acyclic graphs (DAGs)

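Let $V_1, \ldots, V_k$ denote k nodes.

A DAG D is a graph with nodes $V_1, \ldots, V_k$ and directed edges (arrows), containing no cycles (loops).

[Figure: example DAG with nodes V1, ..., V6 and arrows V1 → V3, V1 → V5, V2 → V4, V3 → V4, V4 → V6, V5 → V6, as implied by the factorization of this example given later]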
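
One simple way to hold such a graph in code is a mapping from each node to its set of parents. The sketch below is an illustration, not part of the slides: the edges of the example are an assumption read off the factorization shown later, and `topological_order` is a hypothetical helper that uses Kahn's algorithm to check that no directed cycle exists.

```python
from collections import deque

# Parent sets of the example graph (edges assumed from the later factorization).
parents = {
    "V1": set(), "V2": set(),
    "V3": {"V1"}, "V5": {"V1"},
    "V4": {"V2", "V3"}, "V6": {"V4", "V5"},
}


def topological_order(parents):
    """Return a topological order, or None if the graph has a directed cycle."""
    indegree = {v: len(ps) for v, ps in parents.items()}
    children = {v: [] for v in parents}
    for v, ps in parents.items():
        for p in ps:
            children[p].append(v)
    queue = deque(sorted(v for v, d in indegree.items() if d == 0))
    order = []
    while queue:
        v = queue.popleft()
        order.append(v)
        for c in sorted(children[v]):
            indegree[c] -= 1
            if indegree[c] == 0:
                queue.append(c)
    # Nodes on a cycle never reach indegree 0, so they are missing from order.
    return order if len(order) == len(parents) else None


order = topological_order(parents)
cyclic = dict(parents, V1={"V6"})  # adding V6 -> V1 closes a loop
```

A graph is a DAG exactly when such an order exists, so the second graph, with the extra arrow, is rejected.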


Probabilistic DAGs

When nodes represent random variables, DAGs can be used to encode conditional independence properties. This is done by assuming the Markov property. A probability measure P satisfies the (local) Markov property if

$$V_j \perp\!\!\!\perp nd(V_j) \mid pa(V_j)$$

under P.6

Here $nd(V_j)$ and $pa(V_j)$ are the non-descendants and parents of $V_j$, respectively.

6 Note that non-descendants and parents can be read off the DAG without considering the numbering of the variables.


Probabilistic DAGs

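[Figure: the example DAG with nodes V1, ..., V6]

$$V_1 \perp\!\!\!\perp V_2, \quad V_2 \perp\!\!\!\perp (V_1, V_3, V_5), \quad V_3 \perp\!\!\!\perp (V_2, V_5) \mid V_1,$$

$$V_4 \perp\!\!\!\perp (V_1, V_5) \mid (V_2, V_3), \quad V_5 \perp\!\!\!\perp (V_2, V_3, V_4) \mid V_1, \quad V_6 \perp\!\!\!\perp (V_1, V_2, V_3) \mid (V_4, V_5).$$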
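
The statements above can be derived mechanically from the parent sets. The following sketch assumes the example graph has the edges implied by its factorization; the helper names are illustrative. For each node it returns the non-descendants that are not parents, together with the parents to condition on.

```python
# Parent sets of the example graph (edges assumed from the factorization).
parents = {
    "V1": set(), "V2": set(),
    "V3": {"V1"}, "V5": {"V1"},
    "V4": {"V2", "V3"}, "V6": {"V4", "V5"},
}


def descendants(v, parents):
    """All nodes reachable from v by following arrows."""
    children = {u: {c for c, ps in parents.items() if u in ps} for u in parents}
    found, stack = set(), [v]
    while stack:
        for c in children[stack.pop()]:
            if c not in found:
                found.add(c)
                stack.append(c)
    return found


def local_markov(parents):
    """Map each node to (non-descendants other than its parents, its parents)."""
    out = {}
    for v, ps in parents.items():
        nd = set(parents) - {v} - descendants(v, parents)
        out[v] = (nd - ps, set(ps))
    return out


statements = local_markov(parents)
# statements["V4"] reads: V4 is independent of (V1, V5) given (V2, V3).
```

The parents are removed from the left-hand set because conditioning on them makes their inclusion redundant, which matches how the statements are written on the slide.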


Probabilistic DAGs

Equipping a DAG with the Markov property is equivalent to requiring that $V = (V_1, \ldots, V_k)$ satisfies the Markov factorization7:

$$f_V(v) = \prod_{j=1}^{k} f_{V_j \mid pa(V_j)}(v_j \mid pa_j), \quad v = (v_1, \ldots, v_k).$$

For our example this is

$$f_V(v) = f_{V_6 \mid (V_4, V_5)}(v_6 \mid v_4, v_5) \cdot f_{V_5 \mid V_1}(v_5 \mid v_1) \cdot f_{V_4 \mid (V_2, V_3)}(v_4 \mid v_2, v_3) \cdot f_{V_3 \mid V_1}(v_3 \mid v_1) \cdot f_{V_2}(v_2) \cdot f_{V_1}(v_1).$$

7 Here assuming that V has a density w.r.t. P.
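
To make the factorization concrete, the sketch below builds a joint distribution for six binary variables as a product of one conditional factor per node. The conditional probabilities in `p_one` are made-up numbers chosen only so that the example runs; nothing about them comes from the slides.

```python
from itertools import product

# Parents of each node in the example graph, in a fixed order.
parents = {
    "V1": (), "V2": (),
    "V3": ("V1",), "V4": ("V2", "V3"),
    "V5": ("V1",), "V6": ("V4", "V5"),
}


def p_one(parent_values):
    """Hypothetical Pr(V_j = 1 | parents): 0.2 plus 0.6 times the parent mean."""
    return 0.2 + 0.6 * sum(parent_values) / max(len(parent_values), 1)


def joint(v):
    """Markov factorization: product over nodes of f(v_j | pa_j)."""
    prob = 1.0
    for node, pa in parents.items():
        p1 = p_one([v[p] for p in pa])
        prob *= p1 if v[node] == 1 else 1.0 - p1
    return prob


nodes = list(parents)
# Summing the product over all 2^6 configurations gives total probability 1.
total = sum(joint(dict(zip(nodes, values)))
            for values in product([0, 1], repeat=len(nodes)))
```

Any choice of conditional factors yields a valid joint distribution in this way, and every such distribution satisfies the Markov property for the graph.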

d-separation

What about other conditional independencies between the variables?

Let $S_1, S_2, S_3$ denote subsets of variables. Whether $S_1 \perp\!\!\!\perp S_2 \mid S_3$ may be answered by checking whether $S_1$ and $S_2$ are d-separated by $S_3$.

The d-separation algorithm

Look at the DAG including only variables in $S_1, S_2, S_3$ and their ancestors (the ancestral graph).

Connect parents that share a child and remove arrowheads (the moralized graph).

Check whether $S_3$ blocks all paths between $S_1$ and $S_2$ in this graph.
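
The three steps above can be sketched in Python as follows. The graph is given as a mapping from nodes to parent sets; the function names are illustrative, and the example graph uses the edges implied by the factorization of the V1, ..., V6 example.

```python
def ancestors(nodes, parents):
    """The given nodes together with all of their ancestors."""
    found, stack = set(nodes), list(nodes)
    while stack:
        for p in parents[stack.pop()]:
            if p not in found:
                found.add(p)
                stack.append(p)
    return found


def d_separated(s1, s2, s3, parents):
    """True if s3 separates s1 from s2 in the moralized ancestral graph."""
    # Step 1: keep only s1, s2, s3 and their ancestors.
    keep = ancestors(set(s1) | set(s2) | set(s3), parents)
    # Step 2: drop arrowheads and connect parents that share a child.
    adj = {v: set() for v in keep}
    for v in keep:
        ps = sorted(parents[v])
        for p in ps:
            adj[v].add(p)
            adj[p].add(v)
        for i, p in enumerate(ps):
            for q in ps[i + 1:]:
                adj[p].add(q)
                adj[q].add(p)
    # Step 3: search from s1 without entering s3; s2 must be unreachable.
    blocked = set(s3)
    seen = set(s1) - blocked
    stack = list(seen)
    while stack:
        for w in adj[stack.pop()]:
            if w not in seen and w not in blocked:
                seen.add(w)
                stack.append(w)
    return not (seen & set(s2))


parents = {
    "V1": set(), "V2": set(),
    "V3": {"V1"}, "V5": {"V1"},
    "V4": {"V2", "V3"}, "V6": {"V4", "V5"},
}
answer = d_separated({"V4"}, {"V1", "V5"}, {"V3"}, parents)
```

Restricting to the ancestral graph first matters: V1 and V2 are separated by the empty set, but not once their common descendant V4 is conditioned on, because moralization then links the parents of V4.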


d-separation - Example

Assume P is Markov with respect to the DAG below.

[Figure: the example DAG with nodes V1, ..., V6]

Can we deduce that $V_4 \perp\!\!\!\perp (V_1, V_5) \mid V_3$?

[Figure: the graph used for the check, obtained from the DAG by the d-separation algorithm]

Yes, as $V_3$ blocks all paths from $V_4$ to $(V_1, V_5)$ in this graph.



d-separation

So d-separation provides us with a graphical tool for verifying some conditional independencies!

Property of d-separation

If D is a DAG and P is a probability measure satisfying the Markov property induced by D, then

$$S_3 \text{ d-separates } S_1 \text{ and } S_2 \text{ in } D \implies S_1 \perp\!\!\!\perp S_2 \mid S_3 \quad \text{(w.r.t. } P\text{)}.$$


Causal DAGs

So far we have not made any causal statements - we have only talked about probabilistic independencies.

A different use of DAGs is to have them incorporate causal relationships, thereby giving rise to causal DAGs. The assumptions behind causal DAGs

are many,

are not always rigorously stated in the literature,

are untestable from data as they involve counterfactual thinking,

are not explicitly visible from an ordinary DAG.

Today, we are going to give our interpretation of the assumptions behind a causal DAG.


Causal DAGs

A causal DAG is a DAG satisfying:

1 if two variables have a common cause, this common cause should itself be in the graph,

2 the lack of an arrow between two variables is interpreted as the absence of a direct causal effect,8

3 any variable is a potential cause of its descendants (temporality),

4 and for which the causal Markov assumption holds.

What does this mean?

8 ...for all individuals.

Causal DAGs - counterfactuals

A counterfactual depends only on the values of its parents, e.g.

$$F^{a,b,c,d,e} = F^{d,e} \quad \text{and} \quad D^{a,b,c} = D^{b,c}.$$

[Figure: causal DAG with nodes A, B, C, D, E, F]

Note that we are not talking about $D^{a,b,c,e,f}$ but only $D^{a,b,c}$ (temporality).


Causal DAGs - the arrows

The presence of an arrow between two variables thus means that either

there is a direct causal effect between them,

or we are not willing to assume there isn't any.

However, C can still have an indirect effect on F through D in that

$$F^{D^{c}, E} \neq F^{D^{c'}, E}, \quad \text{for some } c \neq c',$$

potentially.

[Figure: causal DAG with nodes A, B, C, D, E, F]


Causal DAGs - the causal Markov property

Let us intervene on A by setting A = a.

[Figure: causal DAG with nodes A, B, C, D, E, F]

The new random variables $B, C^a, D^{C^a}, E^a, F^{D^{C^a}, E^a}$ should satisfy the Markov property for the DAG without A (for all values of a).

[Figure: the DAG after the intervention, with A fixed at a and nodes $B, C^a, D^{C^a}, E^a, F^{D^{C^a}, E^a}$]



The backdoor criterion

Causal DAGs can be used to determine if the conditional exchangeability assumption, $Y^a \perp\!\!\!\perp A \mid L$, holds for some set of measured variables L.

The backdoor criterion

We have

$$Y^a \perp\!\!\!\perp A \mid L, \quad a = 0, 1,$$

if all backdoor paths between A and Y are blocked by L. In other words, if A and Y are d-separated by L in the graph obtained by removing arrows out of A.
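
The check described above can be sketched in Python by combining arrow removal with a d-separation test on the moralized ancestral graph. All function names are illustrative. The sketch tests only the condition as stated on the slide; common textbook formulations additionally require that L contains no descendants of A, which is not checked here.

```python
def ancestors(nodes, parents):
    """The given nodes together with all of their ancestors."""
    found, stack = set(nodes), list(nodes)
    while stack:
        for p in parents[stack.pop()]:
            if p not in found:
                found.add(p)
                stack.append(p)
    return found


def d_separated(s1, s2, s3, parents):
    """True if s3 separates s1 from s2 in the moralized ancestral graph."""
    keep = ancestors(set(s1) | set(s2) | set(s3), parents)
    adj = {v: set() for v in keep}
    for v in keep:
        ps = sorted(parents[v])
        for i, p in enumerate(ps):
            adj[v].add(p)
            adj[p].add(v)
            for q in ps[i + 1:]:  # connect parents that share the child v
                adj[p].add(q)
                adj[q].add(p)
    blocked = set(s3)
    seen = set(s1) - blocked
    stack = list(seen)
    while stack:
        for w in adj[stack.pop()]:
            if w not in seen and w not in blocked:
                seen.add(w)
                stack.append(w)
    return not (seen & set(s2))


def backdoor_ok(a, y, adjust, parents):
    """Remove arrows out of a, then test whether adjust d-separates a and y."""
    pruned = {v: set(ps) - {a} for v, ps in parents.items()}
    return d_separated({a}, {y}, set(adjust), pruned)


# Hypothetical graphs: a single measured confounder L, and a randomized trial.
confounded = {"L": set(), "A": {"L"}, "Y": {"A", "L"}}
randomized = {"A": set(), "U": set(), "Y": {"A", "U"}}

unadjusted = backdoor_ok("A", "Y", set(), confounded)   # backdoor path via L is open
adjusted = backdoor_ok("A", "Y", {"L"}, confounded)     # blocked by L
trial = backdoor_ok("A", "Y", set(), randomized)        # nothing points into A
```

Removing the arrows out of A leaves only the non-causal routes between A and Y, so any remaining connection must be a backdoor path.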


Backdoor criterion: The randomized trial

[Figure: causal DAG with nodes A, U, Y, and the graph to check]

$$Y^a \perp\!\!\!\perp A$$


Backdoor criterion: Observed confounder

[Figure: causal DAG with nodes A, L, Y, and the graph to check]

$$Y^a \not\perp\!\!\!\perp A, \qquad Y^a \perp\!\!\!\perp A \mid L$$


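Backdoor criterion: Observed confounders

[Figure: causal DAG with nodes A, L1, L2, Y, and the graph to check]

$$Y^a \not\perp\!\!\!\perp A, \qquad Y^a \perp\!\!\!\perp A \mid (L_1, L_2), \qquad Y^a \perp\!\!\!\perp A \mid L_1, \qquad Y^a \perp\!\!\!\perp A \mid L_2$$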



$Y^a \perp\!\!\!\perp A$? $Y^a \perp\!\!\!\perp A \mid L_1$? $Y^a \perp\!\!\!\perp A \mid L_2$?

[Figure: causal DAG with nodes A, L1, L2, L3, Y, the graph to check, and its ancestral and moralized graphs]



$Y^a \perp\!\!\!\perp A \mid (L_1, L_3)$? $Y^a \perp\!\!\!\perp A \mid (L_2, L_3)$? $Y^a \perp\!\!\!\perp A \mid (L_1, L_2, L_3)$? $Y^a \not\perp\!\!\!\perp A \mid L_3$?

[Figure: the same causal DAG with nodes A, L1, L2, L3, Y, with the ancestral and moralized graphs used for these checks]


Backdoor criterion: Unobserved confounders

[Figure: causal DAG with nodes A, U, Y, and the graph to check]

$$Y^a \not\perp\!\!\!\perp A, \qquad Y^a \perp\!\!\!\perp A \mid U,$$

...but U is unobserved. What to do, Vanessa?


Some references

P. Dawid. Beware of the DAG! Journal of Machine Learning Research: Workshop and Conference Proceedings, 2008.

M. A. Hernán and J. M. Robins. Causal Inference. Unpublished book, 2017.

J. Pearl. Causality (2nd Edition). Cambridge University Press, 2009.

