Blanchenay CoP 2019
Teaching Causality in ECO372
P. Blanchenay
Teaching and Learning Community of Practice
2019/05/27
(Data) science without (data) conscience is but the ruin of the soul.
— (counterfactual) Rabelais.
Angrist & Pischke “Through our classes darkly” (JEP, 2017)
Undergrad econometrics does not address causality
Focus on Gauss-Markov assumptions & their failures
Implicit or explicit focus on estimator efficiency
Little on identification strategies (e.g. diff-in-diff, RDD)
Outline
Today: Approach to causal inference in econometrics
• Structure of ECO372
• Two frameworks to explain causality (not regressions)
• How I test students
• Causality beyond econometrics
Not today: settling debates
• reduced form vs. structural
• potential outcomes vs. causal graphs (DAGs)
ECO372 APPLIED REGRESSION ANALYSIS
ECO372 Applied Regression Analysis and Empirical Papers
Renamed in 2019H1: “Data Analysis and Applied Econometrics in Practice”
Objective: quant. methods → applied empirical research
• Causal inference and identification
• Empirical strategies
• Stata, replication
Motivation #1: Getting teaching closer to practice
Many questions in economics are causal
Identification central in applied empirical work
Variety of empirical strategies (diff-in-diff, RDD…)
Reliability of findings
Classic econometrics instruction does not focus on this
Angrist & Pischke approach
Potential Outcomes (Roy-Rubin) causal model
Start with RCTs as “perfect setting”
Regression to deal with selection on observables
Quasi-experimental approaches:
Instrumental variables
Diff-in-diff, segue into Panel Data
Regression Discontinuity
Gauss-Markov assumptions failure
GM assumption | “Classic” metrics | Angrist & Pischke
Linear model with mean-zero errors | Get functional form right | CEF, linear approximation
Errors are homoskedastic | GLS | “Just add ‘robust’”
Errors are serially uncorrelated | GLS, time series | “Just add ‘robust’”; clustered SE in clustered RCTs
(Errors are normally distributed) | Alternative estimators | Focus on large sample / asymptotics
Exogeneity | Extensive discussion of measurement error; technical discussion of IV | Small discussion of measurement error; extensive focus on empirical strategies that yield CIA
Motivation #2: Economics’ comparative advantage
Think hard about data!
Many disciplines do stats; not many do causal inference
Big data not the solution to all problems
Course structure
RCTs as “perfect setting”
Regression to deal with selection on observables
Instrumental Variables
RCTs with imperfect compliance
Difference-in-differences
Regression Discontinuity
Causal frameworks
Two causal frameworks as Ariadne’s threads
• Potential Outcomes
• Causal graphs (DAGs)
Emphasis on identification assumptions
A different take on regressions
$y_i = \alpha + \beta D_i + \gamma W_i + \varepsilon_i$
Start with binary treatment
Not all regressors equal
Correlation and causation
Two distinct questions:
1. “If there were a correlation between 𝐷 and 𝑌, would this represent the effect of 𝐷 on 𝑌?”
• Tools: assumptions, DAGs, sometimes regressions
2. “Is there a correlation between 𝐷 and 𝑌?”
• Tools: regressions, statistical inference
TWO CAUSAL FRAMEWORKS
Potential Outcomes, DAGs
Combining two approaches
• Potential Outcomes (PO)
• Used by the textbook
• Causal graphs (mostly Pearl, 2000)
Potential Outcomes (PO) / Roy-Rubin
• Binary treatment 𝐷
• Potential outcomes: $Y_{0i}$ if untreated; $Y_{1i}$ if treated
• Treatment effect: $Y_{1i} - Y_{0i}$
• But we only observe either $Y_{0i}$ or $Y_{1i}$
Potential Outcomes (PO) / Roy-Rubin
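Assume constant effect $\beta$: $Y_i = \alpha + \beta D_i + \varepsilon_i$
Baseline potential outcome: $Y_{0i} = \alpha + \varepsilon_i$
Then:
$\underbrace{E[Y_i \mid D_i = 1] - E[Y_i \mid D_i = 0]}_{\text{observed}} = \beta + \underbrace{E[\varepsilon_i \mid D_i = 1] - E[\varepsilon_i \mid D_i = 0]}_{\text{selection bias}}$
Conditional Independence Assumption (CIA): $E[\varepsilon_i \mid D_i = 0] = E[\varepsilon_i \mid D_i = 1]$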
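The decomposition above can be checked on simulated data. The sketch below is only an illustration: the values of alpha and beta, the sample size and the self-selection rule are all assumptions, not part of the course material. It compares the observed difference in means under self-selection with the same difference under coin-flip assignment.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 200_000
alpha, beta = 1.0, 2.0            # assumed values, for illustration only
eps = rng.normal(size=n)          # unobserved determinants of the outcome

# Self-selection (assumed rule): units with high eps tend to take the treatment.
d_selected = (eps + rng.normal(size=n)) > 0
# Randomization: treatment assigned by coin flip, independent of eps.
d_random = rng.random(n) < 0.5


def diff_in_means(v, d):
    """Mean of v among the treated minus mean of v among the untreated."""
    return v[d].mean() - v[~d].mean()


y_selected = alpha + beta * d_selected + eps
y_random = alpha + beta * d_random + eps

naive = diff_in_means(y_selected, d_selected)      # observed difference
selection_bias = diff_in_means(eps, d_selected)    # E[eps|D=1] - E[eps|D=0]
experimental = diff_in_means(y_random, d_random)   # CIA holds by design

print(f"observed difference (self-selection): {naive:.3f}")
print(f"beta + selection bias:                {beta + selection_bias:.3f}")
print(f"observed difference (randomized):     {experimental:.3f}")
```

Under self-selection the observed difference equals beta plus the selection-bias term; under randomization the bias term is close to zero and the difference is close to beta.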
All design assumptions deal with CIA
Diff-in-diff: common trend assumption
Absent treatment, treated units would have evolved the same way as untreated units
$y_{it} = \alpha + \beta \cdot TREAT_i + \gamma \, POST_t + \delta \, (TREAT \times POST)_{it} + \varepsilon_{it}$
CIA: $E[\varepsilon_{it} \mid TREAT, POST] = E[\varepsilon_{it}]$
Directed Acyclic Graphs (DAGs)
• X has a (possible) effect on D, and on Y
• D has an effect on Y
• ε is unobserved (and has an effect on Y)
  • Typically omitted from graph
• Directed: causal relationships have a direction (effect of X on Y)
• Acyclic: forbids cycles, such as a loop among D, X and Y
[Figure: DAG with edges X → D, X → Y, D → Y and ε → Y, next to a forbidden cycle among D, X and Y]
Terminology
𝑋 is a confounder (common cause): 𝐷 ← 𝑋 → 𝑌
𝑋 is a collider (common outcome): 𝐷 → 𝑋 ← 𝑌
Causal paths
2 causal paths from D to Y:
Direct path: 𝐷 → 𝑌
Backdoor path: 𝐷 ← 𝑋 → 𝑌
[Figure: DAG with edges X → D, X → Y, D → Y and ε → Y]
Open causal paths
• Open if either:
• There is no collider on the path
• There is a collider 𝑋, and we control for it (hold it constant)
[Figures: two example DAGs with open paths, one on nodes D, X, Y and one on nodes D, W, Y, X]
Closed causal paths
• Closed if either:
• There is a collider on the path
• We control for a non-collider on the path
[Figures: two example DAGs with closed paths, one on nodes D, W, Y, X and one on nodes D, W, Y]
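The open/closed rules can be written as a small checker. This is a minimal sketch under simplifying assumptions: the function names and the edge-set representation are hypothetical, and the rule is the simplified one stated on the slides (it ignores, for instance, controlling for a descendant of a collider).

```python
def is_collider(edges, left, node, right):
    """True if both neighbours on the path point into `node` (left -> node <- right)."""
    return (left, node) in edges and (right, node) in edges


def path_is_open(edges, path, controls=frozenset()):
    """Apply the slide rules to one path, given a set of controlled variables.

    edges    : set of (cause, effect) pairs describing the DAG
    path     : list of node names, e.g. ["D", "X", "Y"]
    controls : variables held constant
    """
    for left, node, right in zip(path, path[1:], path[2:]):
        collider = is_collider(edges, left, node, right)
        if collider and node not in controls:
            return False   # an uncontrolled collider closes the path
        if not collider and node in controls:
            return False   # controlling for a non-collider closes the path
    return True


# X as a confounder: D <- X -> Y, plus the direct effect D -> Y
confounder_dag = {("X", "D"), ("X", "Y"), ("D", "Y")}
# X as a collider: D -> X <- Y
collider_dag = {("D", "X"), ("Y", "X")}

print(path_is_open(confounder_dag, ["D", "X", "Y"]))         # backdoor path, no controls
print(path_is_open(confounder_dag, ["D", "X", "Y"], {"X"}))  # controlling for X
print(path_is_open(collider_dag, ["D", "X", "Y"]))           # collider, no controls
print(path_is_open(collider_dag, ["D", "X", "Y"], {"X"}))    # controlling for the collider
```

The same function applied to each backdoor path from D to Y gives a quick check of whether a set of controls closes them all.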
Open paths create correlations
A and B are correlated because of open paths:
• 𝐴 → 𝐵
• 𝐴 → 𝐷 ← 𝐵 (open only if the collider 𝐷 is controlled for)
The correlation between A and B does not represent only the direct effect of A on B
[Figure: DAG on nodes A, B, C, D, E, F]
DAGs and identification
Backdoor criterion (sufficient)
The covariance between 𝐷 and 𝑌 identifies the causal effect of 𝐷 on 𝑌 if all backdoor paths from 𝐷 to 𝑌 are closed.
Identification strategies try to rule out backdoor paths
Different benefits
• Potential Outcomes
• Easy to talk about counterfactuals
• Neat interpretable algebra, formula for bias
• Causal graphs
• Visual
• Connects the assumptions of each empirical strategy
• Offers immediate reasoning about control variables
Collider bias (~ “bad controls”)
• Controlling for a collider (common outcome) re-opens a causal path
Collider bias, “bad controls”, endog. selection bias, Simpson’s paradox
(Conditional) Correlation between D and Y does not reflect causal effect of 𝐷 on 𝑌
[Figure: DAG in which X is a collider, D → X ← Y]
Collider bias (~ “bad controls”)
                (1)          (2)
                SAT Maths    SAT Maths
SAT Verbal      0.029        -0.251***
                (0.0364)     (0.0350)
Accepted                     0.598***
                             (19.26)
Observations    800          800
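A pattern like the one in the table can be reproduced with simulated data. The sketch below is an illustration under assumed conditions, not the data behind the table: the two scores are drawn independently, and the admission rule (accepted when the sum of the scores plus noise is positive) is hypothetical. The coefficients will not match the table exactly.

```python
import numpy as np

rng = np.random.default_rng(42)
n = 800   # same sample size as the table; the data themselves are simulated

sat_maths = rng.normal(size=n)
sat_verbal = rng.normal(size=n)   # independent of maths by construction
# Assumed admission rule: acceptance depends on both scores (a common outcome).
accepted = (sat_maths + sat_verbal + rng.normal(size=n) > 0).astype(float)


def ols(y, *regressors):
    """OLS coefficients (intercept first) via least squares."""
    X = np.column_stack([np.ones(len(y)), *regressors])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef


# Column (1): maths on verbal only
b_no_control = ols(sat_maths, sat_verbal)[1]
# Column (2): maths on verbal, controlling for the collider "accepted"
b_with_collider = ols(sat_maths, sat_verbal, accepted)[1]

print(f"(1) SAT Verbal coefficient, no control:        {b_no_control:.3f}")
print(f"(2) SAT Verbal coefficient, with 'accepted':   {b_with_collider:.3f}")
```

Controlling for acceptance opens the path Verbal → Accepted ← Maths, so a negative conditional correlation appears between two scores that are unrelated by construction.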
2 frameworks = 2 Ariadne’s threads
Identification strategies rule out
• Violation of CIA
• Open backdoor paths
Examples:
• RCT
• Multivariate regression (control variables)
• Individual fixed effects
• Instrumental variables
TESTING STUDENTS
How I test students’ understanding
Questions on specific papers/studies
• If the researchers estimate Eq. (1), would $\hat{\beta}$ estimate the causal effect of 𝑋 on 𝑌? Why or why not?
Make students create data and then run estimations
True or False questions (h/t Karen Bernhardt-Walther)
Effect of Facezon HQ on wages
Q1: Generate wages according to:
$w_{it} = 10 + 1.3 \, HQ_{it} + 0.2 \, (year_t \times CityA_i) + 0.6 \, (year_t \times CityB_i) + \varepsilon_{it}$
Q2: You receive the data on wages in each city. How would you estimate the effect of Facezon HQ on wages? Estimate diff-in-diff:
$w_{it} = \alpha + \beta \, CityB_i + \gamma \, POST2016_t + \delta \, (CityB \times POST2016)_{it} + u_{it}$
Does your estimate $\hat{\delta}$ correspond to what you expected from Q1? Why or why not?
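One possible way to work through Q1 and Q2 is sketched below in Python (the course itself uses Stata). Several details are assumptions made for the illustration: the sample window (2011 to 2020), the number of workers per city and year, the noise variance, measuring year as years since 2010, and the HQ arriving in City B from 2016 onward.

```python
import numpy as np

rng = np.random.default_rng(0)

years = np.arange(2011, 2021)   # assumed sample window
n_workers = 2000                # assumed workers per city and year

# One row per (city, year, worker); city_b = 0 for City A, 1 for City B
city_b = np.repeat([0.0, 1.0], len(years) * n_workers)
year = np.tile(np.repeat(years, n_workers), 2)
post = (year >= 2016).astype(float)
hq = city_b * post              # assumed: HQ located in City B from 2016 onward
t = year - 2010                 # assumed: year measured as years since 2010

# Q1: generate wages; note the different trends (0.2 in City A, 0.6 in City B)
wage = (10 + 1.3 * hq + 0.2 * t * (1 - city_b) + 0.6 * t * city_b
        + rng.normal(size=year.size))

# Q2: diff-in-diff regression of wages on CityB, POST2016 and their interaction
X = np.column_stack([np.ones_like(post), city_b, post, city_b * post])
coef, *_ = np.linalg.lstsq(X, wage, rcond=None)
delta_hat = coef[3]

print(f"true effect of HQ:  1.3")
print(f"diff-in-diff delta: {delta_hat:.2f}")
```

Because the two cities follow different trends, the common trend assumption fails by construction and the estimate lands well above 1.3. With this particular window the gap is about (0.6 − 0.2) × 5 = 2, the trend difference times the five-year distance between the average pre and post year.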
True/false questions
For an RCT on the effect of receiving food stamps on the decision to work, participants were recruited at Whole Foods and No Frills supermarkets. True or false? In that RCT, one should not control for the recruitment location, as this is a ‘bad control’.
The Ontario government considers offering a subsidy for childcare to families that fall below $40,000 of yearly joint income. True or false? Families are likely to under-report their income in order to qualify, but a researcher could always use an instrumental variable approach to estimate the effect of the childcare subsidy.
Trade-offs
Few proofs & little maths (student selection)
Little discussion of heteroskedasticity
• Just add vce(robust) or vce(cluster …)
No time series
OLS only (IV as 2SLS)
Little discussion of heterogeneous effects
CAUSAL INFERENCE BEYOND ECONOMETRICS
Causal inference beyond econometrics
• An economic theory generates causal statements
• Empirics allow us to sort between theories
• “↗ Min wage ⇒ ↗ unemployment”
• How would you test that?
Example: Price elasticity
https://www.dropbox.com/s/8nujfq892ut5a37/Lecture%2016%20Estimating%20Elasticity.pptx?dl=
Example: Demand elasticity
• We only observe equilibrium values of P,Q
• How can we find demand elasticity?
https://www.dropbox.com/s/8nujfq892ut5a37/Lecture%2016%20Estimating%20Elasticity.pptx?dl=
[Figure: two observed equilibrium points, (Q1, P1) and (Q2, P2), plotted with price P against quantity Q]
Example: Demand elasticity
Which is it?
https://www.dropbox.com/s/8nujfq892ut5a37/Lecture%2016%20Estimating%20Elasticity.pptx?dl=
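Scenario 1: Less elastic demand, positive supply shock (supply shifts from S1 to S2 along demand curve D).
Scenario 2: More elastic demand, positive supply shock (S1 to S2) and negative demand shock (D1 to D2).
[Figure: two panels plotting price P against quantity Q, one per scenario, each passing through the observed points (Q1, P1) and (Q2, P2)]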
Example: Identifying demand elasticity
• Suppose you have information on average price and quantity of bread sold per month in Cleveland when there are 30 bakeries. Suppose three new bakeries open on April 1st, increasing supply for April.
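• $\varepsilon_D = -\dfrac{(Q_{APR} - Q_{MAR}) / Q_{MAR}}{(P_{APR} - P_{MAR}) / P_{MAR}}$
[Figure: supply shifts from S1 to S2 along demand curve D; quantity rises from Q1 to Q2]
• If we are sure that an elasticity is estimated from an exogenous shock to supply only, we say it is identified.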
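As a worked illustration of the formula, the short sketch below computes the elasticity from March and April values. The prices and quantities are made-up numbers and the function name is hypothetical; the calculation is only meaningful if the bakery openings shifted supply and nothing shifted demand.

```python
def demand_elasticity(p_before, p_after, q_before, q_after):
    """Minus the ratio of the percentage change in quantity to that in price."""
    pct_change_q = (q_after - q_before) / q_before
    pct_change_p = (p_after - p_before) / p_before
    return -pct_change_q / pct_change_p


# Hypothetical bread market: price falls 10%, quantity rises 5%
elasticity = demand_elasticity(p_before=4.00, p_after=3.60,
                               q_before=1000, q_after=1050)
print(f"estimated demand elasticity: {elasticity:.2f}")
```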
N. Huntington-Klein, ECO305: Economics, Causality, and Analytics
• Focus on causal inference and programming
• No regression!
• Controlling done through subsamples/matching
Some resources
Angrist-Pischke / Potential Outcomes
• Textbooks: Mostly Harmless Econometrics, Mastering ‘Metrics
• Angrist & Pischke (2017), Journal of Economic Perspectives, “Through our classes darkly”.
Directed Acyclic Graphs
• Scott Cunningham (regularly updated) “Causal Inference: The Mixtape”, particularly section 3: accessible intro to DAGs
• Nick Huntington-Klein ECO305: causal inference without regressions; causal graphs examples of common empirical strategies
• Morgan & Winship (2007, 2nd ed 2015) Counterfactuals and Causal Inference: combines Potential Outcomes & DAGs, focus on economics
• Judea Pearl, Causality (2000, 2nd ed 2009), particularly chapter 3: full formalism of DAGs & causal inference
Thank you!
EXAMPLES
RCT
• Ensures $E[\varepsilon_i \mid D_i = 1] = E[\varepsilon_i \mid D_i = 0]$
  • CIA satisfied
• Ensures no causal path between 𝐷 and other covariates
  • Backdoor criterion satisfied
[Figure: DAG on nodes D (randomized), Y, X1, X2 and U]
Control variables
• DGP: $Y_i = \alpha + \beta D_i + \gamma X_i + e_i$
• Run: $Y_i = \alpha + \beta D_i + u_i$
• Omitted Variable Bias if $E[u_i \mid D_i = 1] \neq E[u_i \mid D_i = 0]$
• Controlling for 𝑋 closes causal path 𝐷 ← 𝑋 → 𝑌
[Figure: DAG with confounder X, D ← X → Y and D → Y]
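The omitted variable bias can be illustrated on simulated data. In this sketch the parameter values, the sample size and the way X drives D are all assumptions. It runs the short regression (without X) and the long regression (with X), and checks the textbook relation: short coefficient = long coefficient + gamma × (coefficient from regressing X on D).

```python
import numpy as np

rng = np.random.default_rng(7)
n = 50_000
alpha, beta, gamma = 1.0, 2.0, 3.0   # assumed values, for illustration only

x = rng.normal(size=n)                               # confounder
d = (x + rng.normal(size=n) > 0).astype(float)       # X affects treatment D
y = alpha + beta * d + gamma * x + rng.normal(size=n)


def ols(y, *regressors):
    """OLS coefficients (intercept first) via least squares."""
    X = np.column_stack([np.ones(len(y)), *regressors])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef


b_short = ols(y, d)[1]               # X omitted: backdoor path D <- X -> Y open
_, b_long, g_long = ols(y, d, x)     # X controlled: backdoor path closed
delta = ols(x, d)[1]                 # how X differs between D = 1 and D = 0

print(f"short regression: {b_short:.3f}")
print(f"long regression:  {b_long:.3f}  (true beta = {beta})")
print(f"long + gamma_hat * delta: {b_long + g_long * delta:.3f}")
```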
Fixed effects (within) in panel data
• Controlling for individual closes backdoor path
[Figures: two DAGs on nodes X, Y, Individual and Time, the first also showing an unobserved U]
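A simulated panel can show how the within transformation closes the backdoor path through a time-invariant individual trait. This is a minimal sketch: the panel dimensions, the coefficient values and the way the unobserved trait U enters both X and Y are assumptions chosen for the illustration.

```python
import numpy as np

rng = np.random.default_rng(3)
n_individuals, n_periods = 2000, 5
beta = 1.0                                     # assumed true effect of X on Y

ids = np.repeat(np.arange(n_individuals), n_periods)
u = rng.normal(size=n_individuals)             # unobserved, time-invariant trait
x = u[ids] + rng.normal(size=ids.size)         # U affects X ...
y = beta * x + 2.0 * u[ids] + rng.normal(size=ids.size)   # ... and Y


def slope(x, y):
    """Slope of a bivariate OLS regression of y on x."""
    xc, yc = x - x.mean(), y - y.mean()
    return (xc @ yc) / (xc @ xc)


def within(v):
    """Subtract each individual's own mean (the within transformation)."""
    means = np.bincount(ids, weights=v) / n_periods
    return v - means[ids]


beta_pooled = slope(x, y)                      # backdoor path through U is open
beta_fe = slope(within(x), within(y))          # individual held constant

print(f"pooled OLS:    {beta_pooled:.3f}")
print(f"fixed effects: {beta_fe:.3f}  (true beta = {beta})")
```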
Instrumental Variables
Req1 (first stage): instrument 𝑍 has an effect on 𝐷
Req2 (exogeneity): 𝑍 is as good as randomly assigned
• No unobserved confounder between 𝑍 and 𝑌
Req3 (exclusion restriction):
• No other causal path between 𝑍 and 𝑌
[Figure: DAG on nodes Z, D, Y and unobserved U]
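The three requirements can be illustrated with a simulated instrument. The sketch below assumes a data-generating process in which all three hold by construction: Z is a coin flip, it shifts D, and it affects Y only through D. The parameter values are hypothetical. With a single instrument, the IV estimate is the reduced form divided by the first stage.

```python
import numpy as np

rng = np.random.default_rng(11)
n = 100_000
beta = 1.5                                   # assumed true effect of D on Y

u = rng.normal(size=n)                       # unobserved confounder of D and Y
z = (rng.random(n) < 0.5).astype(float)      # Req2: Z as good as randomly assigned
d = 0.8 * z + u + rng.normal(size=n)         # Req1: Z has an effect on D
y = beta * d + u + rng.normal(size=n)        # Req3: Z enters Y only through D


def cov(a, b):
    """Sample covariance of two arrays."""
    return np.mean((a - a.mean()) * (b - b.mean()))


beta_ols = cov(d, y) / cov(d, d)             # biased: backdoor path D <- U -> Y
first_stage = cov(z, d) / cov(z, z)          # effect of Z on D
reduced_form = cov(z, y) / cov(z, z)         # effect of Z on Y
beta_iv = reduced_form / first_stage

print(f"OLS: {beta_ols:.3f}")
print(f"IV:  {beta_iv:.3f}  (true beta = {beta})")
```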