Causal effects
A gentle(?) introduction
Morten Frydenberg & Stefan Hansen
Section for Biostatistics
Danish Society for Theoretical Statistics
Two-day meeting
October 31, 2017
Section for Biostatistics Causal effects October 31, 2017 1 / 56
Disclaimer
This is
An introduction to the concept of causality.
Defined by counterfactuals and only for a binary treatment/action/exposure.
Depicted by Directed Acyclic Graphs.
A limited view into the wide field of theories, methods and applications.
This is not
The complete history of causality.
The complete overview of the state of the art/science.
An overview of the many different definitions of cause, causal effect, confounding, exchangeability or causal DAGs.
Cause and effect
Every day we are confronted with questions involving "cause and effect":
Does drug X cause a decrease in blood pressure in persons with hypertension?
Does alcohol intake during pregnancy cause a lower IQ of the child at age 15?
Will Brexit cause a lower exchange rate for the Pound?
...and if yes, then:
How large is the effect?
Bradford Hill's viewpoints
The Bradford Hill viewpoints (often called "criteria"):
1 Strength: The larger the association, the more likely it is causal.
2 Consistency: Consistent findings across populations/geographical regions/etc.
3 Specificity: One cause, one effect.
4 Temporality: The effect has to occur after the cause.
5 Biological gradient: Greater exposure → greater effect.
6 Plausibility: Does a causal effect sound biologically plausible?
7 Coherence: Coherence between epidemiological and laboratory findings.
8 Experiment: Supported by experimental evidence?
9 Analogy: Effect of similar factors may be considered.
"Here then are nine different viewpoints from all of which we should study association before we cry causation"¹
¹ Hill, Austin Bradford (1965). The Environment and Disease: Association or Causation? Proc. of the Royal Soc. of Medicine.
The randomized controlled trial
Suppose we want to find out if a treatment has an effect on an outcome Y.
The best way to do this would be to run a Randomized Controlled Trial:
Choose n persons from the population.
Randomly allocate each person to the treatment A = 1 or control A = 0.
Wait and observe the outcome Y for each person.
We let
$$\bar{Y}_a = \frac{1}{N_a} \sum_{i=1}^{n} Y_i \mathbf{1}_{\{A_i = a\}}, \quad \text{with } N_a = \#\{i : A_i = a\},$$
denote the average of Y in each treatment group.²
² Note that it is random which persons end up in the two groups, implying that $N_1$ and $N_0$ are random, but $N_1 + N_0 = n$.
The randomized controlled trial
The obvious estimator of the treatment effect is
$$\mathrm{TE} = \bar{Y}_1 - \bar{Y}_0.$$
In this presentation we will try to explain why this is a good (unbiased) estimate of the true/causal effect of the treatment.³
Or to be more precise: What are the conditions under which this is true?
³ We use the term "treatment" throughout this presentation. It might also be an exposure like drinking alcohol during pregnancy.
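As a minimal sketch of how the group averages and TE are computed from observed pairs (A, Y): the function name `treatment_effect` and the six data points are made up for illustration and are not from the talk.

```python
from statistics import mean

def treatment_effect(a, y):
    """Return (mean of y among treated, mean among controls, their difference)."""
    treated = [yi for ai, yi in zip(a, y) if ai == 1]
    control = [yi for ai, yi in zip(a, y) if ai == 0]
    if not treated or not control:
        raise ValueError("both treatment groups must be non-empty")
    y1_bar, y0_bar = mean(treated), mean(control)
    return y1_bar, y0_bar, y1_bar - y0_bar

# Hypothetical trial with n = 6 persons (made-up numbers).
A = [1, 0, 1, 0, 0, 1]
Y = [12.0, 9.0, 14.0, 10.0, 11.0, 13.0]
y1_bar, y0_bar, te = treatment_effect(A, Y)
print(y1_bar, y0_bar, te)  # 13.0 10.0 3.0
```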
Counterfactuals: binary treatment
Question: What is the causal effect of a binary treatment (A) on an outcome (Y)?
Ideally, we would like to observe
$Y_i^1$: the outcome had the treatment been given,
$Y_i^0$: the outcome had the treatment not been given,
for all individuals in our population.
$Y_i^1$ and $Y_i^0$ are called counterfactuals because we only get to observe one of them for each individual.
Causal effects: definitions
The individual causal effect (for person i)
$$Y_i^1 - Y_i^0$$
We say there is an individual causal effect if $Y_i^1 \neq Y_i^0$.
Let $Y^1$ and $Y^0$ denote the counterfactuals for a randomly selected individual in our population.
The average causal effect
$$\mathrm{ACE} = E[Y^1] - E[Y^0] = E[Y^1 - Y^0]$$
We say there is an average causal effect if $\mathrm{ACE} \neq 0$.
We are usually interested in the ACE.
Causal effects: the problem
We do not observe both $Y^1$ and $Y^0$ but only one of them!
To be precise we observe
the treatment: A
and
the observed outcome: $Y = Y^A = A \cdot Y^1 + (1 - A) \cdot Y^0$.
We hope to be able to estimate $\mathrm{ACE} = E[Y^1] - E[Y^0]$ based on (A, Y).
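The relation between counterfactuals and the observed outcome can be sketched with a tiny made-up table. All numbers below are hypothetical; in real data only `a` and `y_obs` would be available, so the individual effects and the ACE could not be computed this way.

```python
# Hypothetical counterfactuals for four persons (never fully observed in practice).
y1 = [5, 7, 6, 8]   # outcome had the treatment been given
y0 = [4, 7, 3, 8]   # outcome had the treatment not been given
a = [1, 0, 0, 1]    # treatment actually received

# Observed outcome: Y = A * Y^1 + (1 - A) * Y^0
y_obs = [ai * y1i + (1 - ai) * y0i for ai, y1i, y0i in zip(a, y1, y0)]

# Individual causal effects, computable here only because both columns are made up.
ice = [y1i - y0i for y1i, y0i in zip(y1, y0)]
ace = sum(ice) / len(ice)
print(y_obs, ice, ace)  # [5, 7, 3, 8] [1, 0, 3, 0] 1.0
```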
Estimating causal e�ects
We are looking for an estimator of
$$\mathrm{ACE} = E[Y^1] - E[Y^0]$$
based on a random sample of (A, Y) of, say, size n.
A perhaps naive attempt would be
$$\widehat{\mathrm{ACE}} = \mathrm{TE} = \bar{Y}_1 - \bar{Y}_0$$
where $\bar{Y}_a$ is the average Y among those who received treatment a.
However, in general, we will only have
$$E[\widehat{\mathrm{ACE}}] = E[Y \mid A = 1] - E[Y \mid A = 0].$$
Exchangeability
So, estimating a causal effect boils down to whether
$$E[Y^1] = E[Y \mid A = 1] \quad \text{and} \quad E[Y^0] = E[Y \mid A = 0] \tag{1}$$
in which case $E[\widehat{\mathrm{ACE}}] = \mathrm{ACE}$.
Equation (1) will be satisfied if we have⁴
Mean exchangeability
$$E[Y^a \mid A = 1] = E[Y^a \mid A = 0] \; (= E[Y^a]), \quad a = 0, 1.$$
This will be satisfied if the treatment is independent of the counterfactuals:
Exchangeability (weak ignorability)
$$Y^a \perp\!\!\!\perp A, \quad a = 0, 1.$$
⁴ Representativity assumption: the treated are representative of the untreated in terms of their counterfactual outcome $Y^1$, and vice versa.
Association is causation
To sum up: under exchangeability we have
$$E[Y \mid A = 1] - E[Y \mid A = 0] = E[Y^1] - E[Y^0].$$
In other words, association is causation.
[Figure borrowed from Causal Inference by Hernán and Robins (2017).]
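The identity can be spelled out in two steps. The first step uses the relation $Y = Y^A$ between observed and counterfactual outcomes, the second uses mean exchangeability; this is a sketch of the reasoning rather than a quotation from the slides.

```latex
\begin{align*}
E[Y \mid A = 1] - E[Y \mid A = 0]
  &= E[Y^1 \mid A = 1] - E[Y^0 \mid A = 0] && \text{since } Y = Y^A \\
  &= E[Y^1] - E[Y^0] && \text{by mean exchangeability} \\
  &= \mathrm{ACE}.
\end{align*}
```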
When can we expect exchangeability?
In a randomized trial, treatment is determined by "a toss of a coin".
In particular, treatment allocation does not depend on any subject characteristics, hence it cannot be associated with the counterfactuals $Y^a$.
In this case, exchangeability will hold:
The treated and untreated are interchangeable.
Association is causation.
For this reason, the randomized controlled trial is considered the gold standard of study designs.
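A small simulation can make the role of the coin toss concrete. Everything in this sketch is an assumption chosen for illustration: the characteristic `u`, the outcome model, the constant effect of 2 and the allocation probabilities. It compares the naive difference in means under a randomized allocation with the same quantity when allocation depends on `u`.

```python
import random

random.seed(2017)
n = 100_000

# Hypothetical population: a subject characteristic u drives both counterfactuals.
u = [random.gauss(0, 1) for _ in range(n)]
y0 = [50 + 10 * ui for ui in u]
y1 = [y + 2 for y in y0]  # assumed constant effect, so ACE = 2

def naive_te(a):
    """Difference in mean observed outcome between treated and untreated."""
    obs = [y1i if ai else y0i for ai, y1i, y0i in zip(a, y1, y0)]
    n1 = sum(a)
    mean1 = sum(o for ai, o in zip(a, obs) if ai) / n1
    mean0 = sum(o for ai, o in zip(a, obs) if not ai) / (n - n1)
    return mean1 - mean0

# Randomized allocation: a coin toss, independent of u.
a_rct = [random.random() < 0.5 for _ in range(n)]
# Non-random allocation: treatment is more likely when u is large.
a_obs = [random.random() < (0.8 if ui > 0 else 0.2) for ui in u]

te_rct, te_obs = naive_te(a_rct), naive_te(a_obs)
print(round(te_rct, 2), round(te_obs, 2))
```

Under the coin toss the naive difference should land close to the assumed ACE of 2, while the non-random allocation gives a much larger value because the treated differ systematically in `u`.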
An observational (not an RCT) study
In many settings, an RCT is not possible or ethical and we have to rely on observational studies, where we do not intervene, but observe.
Imagine that we have data from an observational study:
Drinking alcohol   N     IQ at age 15
Yes (1)            400   56
No (0)             600   54
TE = 56 − 54 = 2
The data look exactly as if they could have come from an RCT.
The crucial difference: the allocation of "treatment" is not random, but possibly associated with some characteristics of each person.
When is $\mathrm{TE} = \bar{Y}_1 - \bar{Y}_0$ a valid estimate of the ACE?
When can we get a valid estimate of the ACE?
An observational study: Confounding
All mothers
Drinking alcohol   N     IQ at age 15
Yes                400   56
No                 600   54
TE = 56 − 54 = 2
Mother high education
Drinking alcohol   N     IQ at age 15
Yes                300   60
No                 120   70
TE = 60 − 70 = −10
Mother low education
Drinking alcohol   N     IQ at age 15
Yes                100   45
No                 480   50
TE = 45 − 50 = −5
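The reversal between the pooled and the stratified comparisons can be checked directly from the counts and means in the tables. The sketch below only re-does that arithmetic; the dictionary layout and function name are illustrative choices. The pooled means come out as 56.25 and 54, which the "All mothers" table shows rounded.

```python
# (N, mean IQ) per education level and drinking status, copied from the tables.
strata = {
    "High": {"Yes": (300, 60), "No": (120, 70)},
    "Low": {"Yes": (100, 45), "No": (480, 50)},
}

def pooled_mean(status):
    """Weighted mean IQ over both education levels for one drinking status."""
    cells = [strata[level][status] for level in strata]
    total_n = sum(n for n, _ in cells)
    return sum(n * iq for n, iq in cells) / total_n

te_by_stratum = {level: s["Yes"][1] - s["No"][1] for level, s in strata.items()}
te_pooled = pooled_mean("Yes") - pooled_mean("No")
print(te_by_stratum, te_pooled)  # {'High': -10, 'Low': -5} 2.25
```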
An observational study: Confounding
Assume that within education level it is random who chose to drink during pregnancy.
Mother high education (L = High), Pr(L = High) = 42%
Alcohol   N     IQ
Yes       300   60
No        120   70
$\mathrm{TE}_{\mathrm{High}} = -10$, Pr(A = 1 | L = High) = 71%
We have exchangeability among mothers with high education. That is
$\bar{Y}_{\mathrm{Yes}} = 60$ is an estimate of $E[Y^{\mathrm{Yes}} \mid L = \mathrm{High}]$
$\bar{Y}_{\mathrm{No}} = 70$ is an estimate of $E[Y^{\mathrm{No}} \mid L = \mathrm{High}]$
So $\mathrm{TE}_{\mathrm{High}} = -10$ is a valid estimate of the ACE among mothers with high education.
An observational study: Confounding
Assume that within education level it is random who chose to drink during pregnancy.
Mother low education (L = Low), Pr(L = Low) = 58%
Alcohol   N     IQ
Yes       100   45
No        480   50
$\mathrm{TE}_{\mathrm{Low}} = -5$, Pr(A = 1 | L = Low) = 17%
We have exchangeability among mothers with low education. That is
$\bar{Y}_{\mathrm{Yes}} = 45$ is an estimate of $E[Y^{\mathrm{Yes}} \mid L = \mathrm{Low}]$
$\bar{Y}_{\mathrm{No}} = 50$ is an estimate of $E[Y^{\mathrm{No}} \mid L = \mathrm{Low}]$
So $\mathrm{TE}_{\mathrm{Low}} = -5$ is a valid estimate of the ACE among mothers with low education.
An observational study: Confounding
Assume that within education level it is random who chose to drink during pregnancy.
Mother high education (L = High), Pr(L = High) = 42%
Alcohol   N     IQ
Yes       300   60
No        120   70
$\mathrm{TE}_{\mathrm{High}} = -10$, Pr(A = 1 | L = High) = 71%
Mother low education (L = Low), Pr(L = Low) = 58%
Alcohol   N     IQ
Yes       100   45
No        480   50
$\mathrm{TE}_{\mathrm{Low}} = -5$, Pr(A = 1 | L = Low) = 17%
For the whole population: the probability of alcohol drinking is associated with the IQ levels, i.e. we do not have exchangeability.
TE = 2 is not a valid estimate of the ACE!
Conditional exchangeability
What if we have an observational study?
One could have collected information on a sufficiently large set of variables, say, L such that we have⁵
Conditional exchangeability
$$Y^a \perp\!\!\!\perp A \mid L, \quad a = 0, 1.$$
It implies
$$E[Y^a \mid A = 1, L = l] = E[Y^a \mid A = 0, L = l], \quad a = 0, 1,$$
i.e. the two treatment groups are interchangeable with respect to their mean counterfactual levels conditional on L.
Conditional exchangeability will be satisfied in a conditionally randomized trial.
⁵ Here L can be a collection of variables.
Conditional exchangeability
Under conditional exchangeability we have that association is causation conditional on L, that is,
$$E[Y \mid A = a, L = l] = E[Y^a \mid L = l].$$
Thus,
$$E[Y^a] = \sum_l E[Y \mid A = a, L = l] \Pr(L = l)$$
where the right-hand side is estimable from data. Moreover,
$$\widetilde{\mathrm{ACE}} = \sum_l \mathrm{TE}_l \cdot \Pr(L = l)$$
will be an unbiased estimate of ACE. This is called standardization.
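The step from conditional exchangeability to the standardization formula can be written out as follows. This is a sketch of the reasoning; it additionally assumes that $\Pr(A = a \mid L = l) > 0$ for every level l, so that all the conditional expectations are well defined.

```latex
\begin{align*}
E[Y^a]
  &= \sum_l E[Y^a \mid L = l] \Pr(L = l) && \text{law of total expectation} \\
  &= \sum_l E[Y^a \mid A = a, L = l] \Pr(L = l) && \text{conditional exchangeability} \\
  &= \sum_l E[Y \mid A = a, L = l] \Pr(L = l) && \text{since } Y = Y^a \text{ when } A = a .
\end{align*}
```

Taking the difference between a = 1 and a = 0 gives $\mathrm{ACE} = \sum_l \big( E[Y \mid A = 1, L = l] - E[Y \mid A = 0, L = l] \big) \Pr(L = l)$, which is the quantity estimated by weighting the stratum-specific effects $\mathrm{TE}_l$ by $\Pr(L = l)$.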
Confounding
No confounding
When we have mean exchangeability, i.e.
$$E[Y^a \mid A = 0] = E[Y^a \mid A = 1], \quad a = 0, 1,$$
we say that there is no confounding.
No unobserved confounding
When we have mean exchangeability conditional on L, i.e.
$$E[Y^a \mid A = 0, L = l] = E[Y^a \mid A = 1, L = l], \quad a = 0, 1,$$
we say that there is no unobserved confounding.
An observational study: Standardization
It was random, within education level, who chose to drink during pregnancy.
But the probability of alcohol drinking is different in the two sub-populations, which also have different (counterfactual) levels of IQ.
We have confounding!
TE = 2
is not a valid estimate of the average causal effect of alcohol.
But we have conditional exchangeability given level of education!
We can estimate the average causal effect of alcohol by standardization:
Level l   Pr(L = l)   Pr(A = 1 | L = l)   TE_l
High      42%         71%                 −10
Low       58%         17%                 −5
$$\widetilde{\mathrm{ACE}} = (-10) \cdot 0.42 + (-5) \cdot 0.58 = -7.1$$
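The standardized estimate is a weighted sum of the stratum-specific effects, which a few lines of code can reproduce. The function name `standardize` is an illustrative choice; the weights and effects are the ones given for the two education levels.

```python
# Stratum-specific inputs for the two education levels.
pr_l = {"High": 0.42, "Low": 0.58}  # Pr(L = l)
te_l = {"High": -10, "Low": -5}     # TE_l

def standardize(te_by_level, weights):
    """Weighted sum of stratum-specific effects; weights must sum to 1."""
    if abs(sum(weights.values()) - 1.0) > 1e-9:
        raise ValueError("weights must sum to 1")
    return sum(te_by_level[level] * weights[level] for level in weights)

ace_std = standardize(te_l, pr_l)
print(round(ace_std, 2))  # -7.1
```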
Directed acyclic graphs (DAGs)
Growing interest in the use of DAGs to answer causal questions.
Compact and intuitive way of encoding subject matter knowledge.
Enables researchers to use graphical tools as a way of validating exchangeability (no unmeasured confounding).
May ease communication among researchers.
Directed acyclic graphs (DAGs)
Let V1, ..., Vk denote k nodes.
A DAG D is a graph with nodes V1, ..., Vk and directed edges (arrows), containing no cycles (loops).
[Figure: example DAG on nodes V1, ..., V6 with edges V1 → V3, V1 → V5, V2 → V4, V3 → V4, V4 → V6, V5 → V6.]
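One way to represent such a graph in code is a mapping from each node to its parents, with acyclicity checked by attempting a topological sort. This sketch uses Python's standard `graphlib` module (Python 3.9+); the edge set is the six-node example of this talk, and the helper name `is_dag` is made up.

```python
from graphlib import TopologicalSorter, CycleError

# Each node maps to its set of parents
# (V1->V3, V1->V5, V2->V4, V3->V4, V4->V6, V5->V6).
parents = {
    "V1": set(), "V2": set(),
    "V3": {"V1"}, "V5": {"V1"},
    "V4": {"V2", "V3"}, "V6": {"V4", "V5"},
}

def is_dag(parent_map):
    """True if the directed graph described by parent_map has no cycles."""
    try:
        list(TopologicalSorter(parent_map).static_order())
        return True
    except CycleError:
        return False

print(is_dag(parents))                  # True
with_cycle = {**parents, "V1": {"V6"}}  # adding V6 -> V1 closes a loop
print(is_dag(with_cycle))               # False
```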
Probabilistic DAGs
When nodes represent random variables, DAGs can be used to encode conditional independence properties. This is done by assuming the Markov property.
A probability measure P satisfies the (local) Markov property if
$$V_j \perp\!\!\!\perp \mathrm{nd}(V_j) \mid \mathrm{pa}(V_j), \quad j = 1, \dots, k,$$
under P.⁶
Here $\mathrm{nd}(V_j)$ and $\mathrm{pa}(V_j)$ are the non-descendants and parents of $V_j$, respectively.
⁶ Note that non-descendants and parents can be read off the DAG without considering the numbering of the variables.
Probabilistic DAGs
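[Figure: the example DAG on nodes V1, ..., V6.]
$$V_1 \perp\!\!\!\perp V_2, \quad V_2 \perp\!\!\!\perp (V_1, V_3, V_5),$$
$$V_3 \perp\!\!\!\perp (V_2, V_5) \mid V_1, \quad V_4 \perp\!\!\!\perp (V_1, V_5) \mid (V_2, V_3),$$
$$V_5 \perp\!\!\!\perp (V_2, V_3, V_4) \mid V_1, \quad V_6 \perp\!\!\!\perp (V_1, V_2, V_3) \mid (V_4, V_5).$$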
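The six statements can be generated mechanically from the parent sets of the example DAG. The sketch below is one possible way of doing so; the function names are made up, and the printed `_||_` stands for the independence symbol.

```python
parents = {
    "V1": [], "V2": [], "V3": ["V1"],
    "V4": ["V2", "V3"], "V5": ["V1"], "V6": ["V4", "V5"],
}

def descendants(node):
    """All nodes reachable from `node` along directed edges."""
    found, stack = set(), [node]
    while stack:
        current = stack.pop()
        for child, pa in parents.items():
            if current in pa and child not in found:
                found.add(child)
                stack.append(child)
    return found

def local_markov(node):
    """Return (non-descendants excluding parents, parents) for `node`."""
    pa = set(parents[node])
    nd = set(parents) - {node} - descendants(node) - pa
    return sorted(nd), sorted(pa)

for v in parents:
    nd, pa = local_markov(v)
    given = f" | {', '.join(pa)}" if pa else ""
    print(f"{v} _||_ ({', '.join(nd)}){given}")
```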
Probabilistic DAGs
Equipping a DAG with the Markov property is equivalent to requiring that V = (V1, ..., Vk) satisfies the Markov factorization⁷:
$$f_V(v) = \prod_{j=1}^{k} f_{V_j \mid \mathrm{pa}(V_j)}(v_j \mid \mathrm{pa}_j), \quad v = (v_1, \dots, v_k).$$
For our example this is
$$f_V(v) = f_{V_6 \mid (V_4, V_5)}(v_6 \mid v_4, v_5) \cdot f_{V_5 \mid V_1}(v_5 \mid v_1) \cdot f_{V_4 \mid (V_2, V_3)}(v_4 \mid v_2, v_3) \cdot f_{V_3 \mid V_1}(v_3 \mid v_1) \cdot f_{V_2}(v_2) \cdot f_{V_1}(v_1).$$
⁷ Here assuming that V has a density w.r.t. P.
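To see the factorization at work, one can build a joint distribution for six binary variables as a product of conditional probabilities and check one of the implied independencies exactly. The conditional probabilities in `p_one` are arbitrary made-up numbers; only the parent structure comes from the example DAG.

```python
from itertools import product

parents = {
    "V1": [], "V2": [], "V3": ["V1"],
    "V4": ["V2", "V3"], "V5": ["V1"], "V6": ["V4", "V5"],
}
order = ["V1", "V2", "V3", "V4", "V5", "V6"]

def p_one(node, pa_values):
    """Hypothetical Pr(node = 1 | parents); any values in (0, 1) would do."""
    if not pa_values:
        return {"V1": 0.3, "V2": 0.6}[node]
    return 0.2 + 0.6 * sum(pa_values) / len(pa_values)

def joint(v):
    """f_V(v) as the product of f(v_j | pa_j) over all nodes."""
    prob = 1.0
    for node in order:
        p1 = p_one(node, [v[p] for p in parents[node]])
        prob *= p1 if v[node] == 1 else 1 - p1
    return prob

states = [dict(zip(order, bits)) for bits in product([0, 1], repeat=6)]
total = sum(joint(v) for v in states)

def pr(**fixed):
    """Probability that the named variables take the given values."""
    return sum(joint(v) for v in states
               if all(v[k] == x for k, x in fixed.items()))

# Check V3 _||_ V5 | V1 at V1 = 1, V3 = 1, V5 = 1.
lhs = pr(V1=1, V3=1, V5=1) / pr(V1=1)
rhs = (pr(V1=1, V3=1) / pr(V1=1)) * (pr(V1=1, V5=1) / pr(V1=1))
print(round(total, 10), round(lhs, 10), round(rhs, 10))
```

The two sides agree up to floating-point error, as the local Markov statement for V3 requires.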
d-separation
What about other conditional independencies between the variables?
Let S1, S2, S3 denote subsets of variables. Whether $S_1 \perp\!\!\!\perp S_2 \mid S_3$ may be answered by checking whether S1 and S2 are d-separated by S3.
The d-separation algorithm
Look at the DAG including only the variables in S1, S2, S3 and their ancestors (the ancestral graph).
Connect the parents of each common child and remove arrowheads (the moralized graph).
Check whether S3 blocks all paths between S1 and S2 in this graph.
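The three steps above can be sketched in code as follows. This is a minimal illustration on the six-node example DAG, not an optimized or library-grade implementation; the function names are made up.

```python
from itertools import combinations

parents = {
    "V1": [], "V2": [], "V3": ["V1"],
    "V4": ["V2", "V3"], "V5": ["V1"], "V6": ["V4", "V5"],
}

def ancestors_of(nodes):
    """`nodes` together with all of their ancestors."""
    seen, stack = set(nodes), list(nodes)
    while stack:
        for pa in parents[stack.pop()]:
            if pa not in seen:
                seen.add(pa)
                stack.append(pa)
    return seen

def d_separated(s1, s2, s3):
    # Step 1: ancestral graph of S1, S2 and S3.
    keep = ancestors_of(set(s1) | set(s2) | set(s3))
    # Step 2: moralize (link child-parent, marry parents, drop directions).
    nbrs = {v: set() for v in keep}
    for child in keep:
        for pa in parents[child]:
            nbrs[child].add(pa)
            nbrs[pa].add(child)
        for p, q in combinations(parents[child], 2):
            nbrs[p].add(q)
            nbrs[q].add(p)
    # Step 3: search from S1 without entering S3; separated if S2 is not reached.
    blocked = set(s3)
    seen = set(s1) - blocked
    stack = list(seen)
    while stack:
        for nb in nbrs[stack.pop()]:
            if nb not in seen and nb not in blocked:
                seen.add(nb)
                stack.append(nb)
    return not (seen & set(s2))

print(d_separated({"V4"}, {"V1", "V5"}, {"V3"}))  # True
print(d_separated({"V1"}, {"V2"}, set()))         # True
print(d_separated({"V1"}, {"V2"}, {"V4"}))        # False
```

The last call shows that conditioning on the common descendant V4 connects V1 and V2 in the moralized graph, so d-separation no longer holds.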
d-separation: Example
Assume P is Markov with respect to the DAG below.
[Figure: the example DAG on nodes V1, ..., V6.]
Can we deduce that $V_4 \perp\!\!\!\perp (V_1, V_5) \mid V_3$?
[Figure: the graph produced by the d-separation algorithm for this query.]
Yes, as V3 blocks all paths from V4 to (V1, V5) in this graph.
d-separation
So d-separation provides us with a graphical tool for verifying some conditional independencies!
Property of d-separation
If D is a DAG and P is a probability measure satisfying the Markov property induced by D, then
$$S_3 \text{ d-separates } S_1 \text{ and } S_2 \text{ in } D \implies S_1 \perp\!\!\!\perp S_2 \mid S_3 \quad (\text{w.r.t. } P).$$
Causal DAGs
So far we have not made any causal statements - we have only talkedabout probabilistic independencies.
A different use of DAGs is to have them incorporate causal relationships, thereby giving rise to causal DAGs. The assumptions behind causal DAGs
are many,
are not always rigorously stated in the literature,
are untestable from data as they involve counterfactual thinking,
are not explicitly visible from an ordinary DAG.
Today, we are going to give our interpretation of the assumptions behind a causal DAG.
Causal DAGs
A causal DAG is a DAG satisfying:
1 if two variables have a common cause, this common cause should itself be in the graph,
2 the lack of an arrow between two variables is interpreted as the absence of a direct causal effect,⁸
3 any variable is a potential cause of its descendants (temporality),
4 and for which the causal Markov assumption holds.
What does this mean?
⁸ ...for all individuals.
Causal DAGs: counterfactuals
A counterfactual depends only on the values of the parents of the variable, e.g.
$$F^{a,b,c,d,e} = F^{d,e} \quad \text{and} \quad D^{a,b,c} = D^{b,c}.$$
[Figure: causal DAG with nodes A, B, C, D, E and F, in which D has parents B and C, and F has parents D and E.]
Note that we are not talking about $D^{a,b,c,e,f}$ but only $D^{a,b,c}$ (temporality).
Causal DAGs: the arrows
The presence of an arrow between two variables thus means that either
there is a direct causal effect between them,
or we are not willing to assume there isn't any.
However, C can still have an indirect effect on F through D in that, potentially,
$$F^{D^c, E} \neq F^{D^{c'}, E} \quad \text{for some } c \neq c'.$$
[Figure: the causal DAG with nodes A, B, C, D, E and F.]
Causal DAGs: the causal Markov property
Let us intervene on A by setting A = a.
[Figure: the causal DAG with nodes A, B, C, D, E and F.]
The new random variables $B, C^a, D^{C^a}, E^a, F^{D^{C^a}, E^a}$ should satisfy the Markov property for the DAG without A (for all values of a).
[Figure: the same DAG after the intervention, with the nodes relabelled $a, B, C^a, D^{C^a}, E^a$ and $F^{D^{C^a}, E^a}$.]
The backdoor criterion
Causal DAGs can be used to determine if the conditional exchangeability assumption, $Y^a \perp\!\!\!\perp A \mid L$, holds for some set of measured variables L.
The backdoor criterion
We have
$$Y^a \perp\!\!\!\perp A \mid L, \quad a = 0, 1,$$
if all backdoor paths between A and Y are blocked by L.
In other words, if A and Y are d-separated by L in the graph obtained by removing arrows out of A.
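The check described above (remove the arrows out of A, then test d-separation of A and Y given L) can be sketched as follows. The DAG `dag` with one observed confounder (L → A, L → Y, A → Y) is an assumed example, and the function names are made up. The sketch follows the criterion exactly as worded here and checks nothing beyond it, for instance not whether L contains descendants of A.

```python
from itertools import combinations

def d_separated(parents, s1, s2, s3):
    """d-separation via the moralized ancestral graph."""
    keep, stack = set(s1 | s2 | s3), list(s1 | s2 | s3)
    while stack:  # collect ancestors
        for pa in parents[stack.pop()]:
            if pa not in keep:
                keep.add(pa)
                stack.append(pa)
    nbrs = {v: set() for v in keep}
    for child in keep:  # moralize and drop directions
        for pa in parents[child]:
            nbrs[child].add(pa)
            nbrs[pa].add(child)
        for p, q in combinations(parents[child], 2):
            nbrs[p].add(q)
            nbrs[q].add(p)
    seen = set(s1) - s3
    stack = list(seen)
    while stack:  # search from s1 while avoiding s3
        for nb in nbrs[stack.pop()]:
            if nb not in seen and nb not in s3:
                seen.add(nb)
                stack.append(nb)
    return not (seen & s2)

def backdoor_ok(parents, a, y, adj):
    """d-separation of a and y given adj after removing arrows out of a."""
    pruned = {v: [p for p in pa if p != a] for v, pa in parents.items()}
    return d_separated(pruned, {a}, {y}, set(adj))

# Assumed example: L -> A, L -> Y, A -> Y.
dag = {"L": [], "A": ["L"], "Y": ["A", "L"]}
print(backdoor_ok(dag, "A", "Y", []))     # False
print(backdoor_ok(dag, "A", "Y", ["L"]))  # True
```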
Backdoor criterion: The randomized trial
[Figure: causal DAG with nodes A, U and Y, and below it the graph to check.]
$Y^a \perp\!\!\!\perp A$
Backdoor criterion: Observed confounder
[Figure: causal DAG with nodes A, L and Y, and below it the graph to check.]
$Y^a \not\perp\!\!\!\perp A$
$Y^a \perp\!\!\!\perp A \mid L$
Backdoor criterion: Observed confounders
[Figure: causal DAG with nodes A, L1, L2 and Y, and below it the graph to check.]
$Y^a \not\perp\!\!\!\perp A$
$Y^a \perp\!\!\!\perp A \mid (L_1, L_2)$
$Y^a \perp\!\!\!\perp A \mid L_1$
$Y^a \perp\!\!\!\perp A \mid L_2$
Backdoor criterion: Observed confounders
$Y^a \perp\!\!\!\perp A$? $Y^a \perp\!\!\!\perp A \mid L_1$? $Y^a \perp\!\!\!\perp A \mid L_2$?
[Figure: causal DAG with nodes A, L1, L2, L3 and Y, and below it the graph to check.]
[Figures: the ancestral graph and the moralized graph for the DAG above.]
Backdoor criterion: Observed confounders
$Y^a \perp\!\!\!\perp A \mid (L_1, L_3)$? $Y^a \perp\!\!\!\perp A \mid (L_2, L_3)$?
$Y^a \perp\!\!\!\perp A \mid (L_1, L_2, L_3)$? $Y^a \perp\!\!\!\perp A \mid L_3$?
[Figure: causal DAG with nodes A, L1, L2, L3 and Y, with its ancestral graph and moralized graph.]
Backdoor criterion: Observed confounders
$Y^a \perp\!\!\!\perp A \mid (L_1, L_3)$? $Y^a \perp\!\!\!\perp A \mid (L_2, L_3)$?
$Y^a \perp\!\!\!\perp A \mid (L_1, L_2, L_3)$? $Y^a \not\perp\!\!\!\perp A \mid L_3$?
[Figure: causal DAG with nodes A, L1, L2, L3 and Y, with its ancestral graph and moralized graph.]
Backdoor criterion: Unobserved confounders
[Figure: causal DAG with nodes A, U and Y, and below it the graph to check.]
$Y^a \not\perp\!\!\!\perp A$
$Y^a \perp\!\!\!\perp A \mid U$, ...but U is unobserved. What to do, Vanessa?
Some references
P. Dawid. Beware of the DAG! Journal of Machine Learning Research: Workshop and Conference Proceedings, 2008.
M. A. Hernán and J. M. Robins. Causal Inference. Unpublished book, 2017.
J. Pearl. Causality (2nd Edition). Cambridge University Press, 2009.