Soc504: Causal Inference Topics
Brandon Stewart1
Princeton
April 10 - April 19, 2017
1 This lecture draws from slides by Matt Blackwell, Jens Hainmueller, Erin Hartman, and Gary King
Stewart (Princeton) Causal Inference Apr 10 - Apr 19, 2017 1 / 168
Readings

Monday
I King, Gary and Langche Zeng. “The Dangers of Extreme Counterfactuals,” Political Analysis, 14, 2 (2007): 131-159.
I King, Gary and Langche Zeng. “When Can History be Our Guide? The Pitfalls of Counterfactual Inference,” International Studies Quarterly, 51 (March 2007): 183–210.

Wednesday
I Daniel Ho, Kosuke Imai, Gary King, and Elizabeth Stuart. “Matching as Nonparametric Preprocessing for Reducing Model Dependence in Parametric Causal Inference,” Political Analysis, 15 (2007): 199-236.

Monday
I Review Morgan and Winship Potential Outcomes Chapter
I Kosuke Imai, Gary King, and Elizabeth Stuart. “Misunderstandings Among Experimentalists and Observationalists About Causal Inference,” Journal of the Royal Statistical Society, Series A, 171, part 2 (2008): 481-502.

Wednesday
I Optional: Imai, Keele, Tingley and Yamamoto. “Unpacking the Black Box of Causality: Learning About Causal Mechanisms from Experimental and Observational Studies,” American Political Science Review (2011).
I Optional: Acharya, Blackwell and Sen. “Explaining Causal Findings Without Bias: Detecting and Assessing Direct Effects,” American Political Science Review (2016).
1 Assessing Counterfactuals
2 A (Brief) Review of Selection on Observables
3 Matching as Non-parametric Preprocessing
4 Fundamentals of Matching
5 Three Approaches to Matching
6 The Propensity Score
7 Mechanisms: Estimands and Identification
8 Mechanisms: Estimation
9 Controlled Direct Effects
10 Appendix: The Case Against Propensity Score Matching
Counterfactuals
Three types:
1 Forecasts: Will Donald Trump win reelection?
2 What-if Questions: What would have happened if the U.S. had not invaded Iraq?
3 Causal Effects: What is the causal effect of the Iraq war on U.S. Supreme Court decision making? (a factual minus a counterfactual)

Counterfactuals play some part in most research and are absolutely essential in the context of quantities of interest
The model will always give an answer, so how do we identify reasonable counterfactuals?
Summary of Today: don’t ask your model unreasonable questions. (Remember the Momentous Sprint?)
Which model would you choose? (Both fit the data well.)
Compare prediction at x = 1.5 to prediction at x = 5
How do you choose a model?
R²? Some “test”? “Theory”?
The bottom line: answers to some questions don’t exist in the data.
Our estimates of certain quantities of interest are highly model dependent
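The divergence between two equally good fits can be made concrete. Below is a minimal sketch with simulated (hypothetical) data: a linear and a quadratic model both fit the observed points well, yet their predictions far from the data diverge much more than their predictions near it.

```python
# Hypothetical illustration of model dependence: two models that fit
# the observed data about equally well can disagree far from the data.
import numpy as np

rng = np.random.default_rng(0)
x = rng.uniform(0.5, 2.5, 50)                  # data live in [0.5, 2.5]
y = 1.0 + 2.0 * x + rng.normal(0, 0.3, 50)

# Model A: linear; Model B: quadratic (coefficients low -> high degree)
coefs_a = np.polynomial.polynomial.polyfit(x, y, 1)
coefs_b = np.polynomial.polynomial.polyfit(x, y, 2)

def predict(coefs, x0):
    return sum(c * x0**k for k, c in enumerate(coefs))

# Near the data (x = 1.5) the models nearly agree; at x = 5,
# outside the data, the gap between their answers is much larger.
gap_near = abs(predict(coefs_a, 1.5) - predict(coefs_b, 1.5))
gap_far = abs(predict(coefs_a, 5.0) - predict(coefs_b, 5.0))
print(gap_near, gap_far)
```

Any data-generating process with this shape would make the same point: the in-sample fits are indistinguishable, so the data cannot arbitrate between the models at x = 5.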
Model Dependence Proof
Model-Free Inference

To estimate E(Y | X = x) at x, average many observed Y with value x
Assumptions (Model-Based Inference)
1 Definition: model dependence at x is the difference between predicted outcomes for any two models that fit about equally well.
2 The functional form follows strong continuity (think smoothness, although it is less restrictive)

Result

The maximum degree of model dependence is solely a function of the distance from the counterfactual to the data
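The model-free idea above can be sketched directly: average observed Y among units with exactly the covariate value of interest, and refuse to answer where no such units exist. The data below are simulated purely for illustration (salaries with a $1,000-per-year return, echoing the example later in the lecture).

```python
# Model-free estimate of E(Y | X = x): average Y among observations
# with X exactly equal to x. No functional form is assumed, so the
# estimator simply has no answer where there are no data.
import numpy as np

rng = np.random.default_rng(1)
x = rng.choice([0, 6, 8, 10, 12, 16], size=2000)  # hypothetical design
y = 1000.0 * x + rng.normal(0, 500, size=2000)    # hypothetical salaries

def model_free_mean(x_obs, y_obs, x0):
    mask = x_obs == x0
    if not mask.any():
        # Any answer here would come from a model, not the data.
        raise ValueError(f"no observations at X = {x0}")
    return y_obs[mask].mean()

print(model_free_mean(x, y, 12))   # close to 12,000
```

Calling `model_free_mean(x, y, 14)` raises an error: with no data at X = 14, only a model can produce a number, which is exactly where model dependence enters.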
Detecting Model Dependence
A (Hypothetical) Research Design
Randomly select a large number of infants
Randomly assign them to 0, 6, 8, 10, 12, or 16 years of education
Assume 100% compliance, and no measurement error, omitted variables, or missing data
Regress cumulative salary in year 17 on education
We find a coefficient of β̂ = $1,000, big t-statistics, narrow confidence intervals, and pass every test for autocorrelation, fit, normality, linearity, homoskedasticity, etc.
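This design is easy to simulate. A hypothetical sketch: randomize years of education, generate salaries with a true return of $1,000 per year (all numbers invented for illustration), and run the regression; OLS recovers the coefficient on the slide.

```python
# Simulation of the hypothetical design: randomized education, salary
# generated with a true return of $1,000 per year, then OLS.
import numpy as np

rng = np.random.default_rng(42)
n = 10_000
educ = rng.choice([0, 6, 8, 10, 12, 16], size=n)        # randomly assigned
salary = 1000.0 * educ + rng.normal(0, 2000.0, size=n)  # cumulative salary

slope, intercept = np.polyfit(educ, salary, 1)          # OLS line
print(round(slope))   # roughly 1000
```

With randomized assignment and n = 10,000 the slope is tightly estimated, which is precisely why every diagnostic on the slide looks good; none of those diagnostics speaks to what happens outside the assigned education levels.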
What Inferences Would You Be Willing to Make?
A Factual Question: How much salary would someone receive with 12 years of education (a high school degree)?
The model-free estimate: mean(Y) among those with X = 12.
The model-based estimate: Ŷ = Xβ̂ = 12 × $1,000 = $12,000
Counterfactual Inferences with Interpolation
How much salary would someone receive with 14 years of education (an Associate's degree)?
Model-free estimate: impossible (no one was assigned 14 years)
Model-based estimate: Ŷ = X β̂ = 14 × $1,000 = $14,000
Stewart (Princeton) Causal Inference Apr 10 - Apr 19, 2017 9 / 168
Counterfactual Inference with Extrapolation
How much salary would someone receive with 24 years of education (a Ph.D.)?
Ŷ = X β̂ = 24 × $1,000 = $24,000
Stewart (Princeton) Causal Inference Apr 10 - Apr 19, 2017 10 / 168
Another Counterfactual Inference with Extrapolation
How much salary would someone receive with 53 years of education?
Ŷ = X β̂ = 53 × $1,000 = $53,000
Recall: the regression passed every test and met every assumption; identical calculations worked for the other questions.
What has changed? How would we recognize it when the example is less extreme or multidimensional?
Stewart (Princeton) Causal Inference Apr 10 - Apr 19, 2017 11 / 168
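The danger is easy to reproduce. Below is a small simulation (hypothetical data, not from the lecture) in which salary really is $1,000 per year of education over the assigned range 0–16; the fitted line answers the in-sample question well, and it answers the extrapolations just as confidently:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical experiment: education randomly assigned among
# 0, 6, 8, 10, 12, 16 years; salary is $1,000 per year plus noise.
educ = rng.choice([0, 6, 8, 10, 12, 16], size=600)
salary = 1_000 * educ + rng.normal(0, 500, size=600)

# OLS fit via least squares: salary = a + b * educ
X = np.column_stack([np.ones_like(educ, dtype=float), educ])
a, b = np.linalg.lstsq(X, salary, rcond=None)[0]

# The same formula answers every question, whether or not data support it.
for x in (12, 24, 53):
    print(f"educ = {x:2d}: predicted salary = ${a + b * x:,.0f}")

# Only educ = 12 is backed by observations; 24 and 53 extrapolate far
# outside the support [0, 16], so those predictions rest entirely on
# the assumed linear functional form.
```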
Model Dependence with One Explanatory Variable
Suppose Y is starting salary; X is education in 10 categories.
To estimate E(Y|X): we need 10 parameters, E(Y|X = xj), j = 1, ..., 10.
Model-free method: average 50 observations on Y for each value of X
Model-based method: regress Y on X, summarizing 10 parameters with 2 (intercept and slope).
The difference between the 10 we need and the 2 we estimate with regression is pure assumption.
(If X were continuous, we would be reducing ∞ to 2, also by assumption)
Stewart (Princeton) Causal Inference Apr 10 - Apr 19, 2017 12 / 168
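A quick sketch of the two methods on simulated data (the nonlinear salary function and all numbers are illustrative assumptions): the 10 group means recover E(Y|X) at every category, while the 2-parameter regression smooths over whatever the line cannot represent:

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical data: education in 10 categories, 50 observations each.
categories = np.repeat(np.arange(1, 11), 50)
# The true E(Y|X) here is mildly nonlinear, which a line cannot capture.
salary = 1_000 * categories + 30 * categories**2 + rng.normal(0, 300, categories.size)

# Model-free: one mean per category -- 10 parameters, no assumptions.
group_means = {x: salary[categories == x].mean() for x in range(1, 11)}

# Model-based: intercept + slope -- 2 parameters, so 8 parameters' worth
# of pure assumption (linearity).
X = np.column_stack([np.ones_like(categories, dtype=float), categories])
a, b = np.linalg.lstsq(X, salary, rcond=None)[0]

for x in (1, 5, 10):
    print(f"X = {x:2d}: model-free {group_means[x]:8.0f}   regression {a + b * x:8.0f}")
```

At the endpoints the two answers diverge because the linearity assumption, not the data, is doing the work.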
Model Dependence with Two Explanatory Variables
Variables: X (education) and Z (parent's income), both with 10 categories
How many parameters do we now need to estimate? 20? Nope. It's 10 × 10 = 100. This is the curse of dimensionality: the number of parameters goes up geometrically, not additively.
If we run a regression, we are summarizing 100 parameters with 3 (an intercept and two slopes).
But what about including an interaction? Right, so now we're summarizing 100 parameters with 4.
The difference: an enormous assumption based on convenience, not evidence or theory.
Stewart (Princeton) Causal Inference Apr 10 - Apr 19, 2017 13 / 168
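The arithmetic of the curse is worth seeing directly; the counts below assume 10 categories per variable, as in the slides:

```python
# Saturated (cell-by-cell) parameter counts vs. additive regression
# parameters, for k explanatory variables with 10 categories each.
for k in (1, 2, 15, 80):
    cells = 10 ** k        # one E(Y|X = x) parameter per covariate cell
    regression = k + 1     # intercept plus one slope per variable
    print(f"k = {k:2d}: saturated {cells:.3g} parameters, regression {regression}")
```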
Model Dependence with Many Explanatory Variables
Suppose: 15 explanatory variables, with 10 categories each.
I need to estimate 10^15 (a quadrillion) parameters with how many observations?
I Regression reduces this to 16 parameters; quite an assumption!
Suppose: 80 explanatory variables.
I 10^80 is more than the number of atoms in the universe.
I Yet, with a few simple assumptions, we can still run a regression and estimate only 81 parameters.
The curse of dimensionality introduces huge assumptions, often unrecognized.
Stewart (Princeton) Causal Inference Apr 10 - Apr 19, 2017 14 / 168
How Factual is your Counterfactual?
Is your counterfactual close enough to the data so that statistical methods provide empirical answers?
If not, the same calculations will be based on indefensible model assumptions. With the curse of dimensionality, it's too easy to fall into this trap.
A good existing approach: sensitivity testing. But this requires the user to specify a class of models, estimate them all, and check how much inferences change.
King/Zeng “Convex Hull” approach:
I Specify your explanatory variables, X.
I Assume E(Y|X) is (minimally) smooth in X.
I No need to specify models (or a class of models), estimators, or dependent variables.
I Results of one run apply to the class of all models, all estimators, and all dependent variables.
Stewart (Princeton) Causal Inference Apr 10 - Apr 19, 2017 15 / 168
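A sensitivity test of the kind described above can be sketched in a few lines (simulated data; the polynomial specifications are an arbitrary illustrative class of models, not a recommendation):

```python
import numpy as np

rng = np.random.default_rng(2)

# Hypothetical data supported on [0, 16] years of education.
educ = rng.uniform(0, 16, 400)
salary = 1_000 * educ + rng.normal(0, 500, 400)

def fit_predict(degree, x_new):
    """Fit a polynomial of the given degree and predict at x_new."""
    coefs = np.polyfit(educ, salary, degree)
    return np.polyval(coefs, x_new)

# Sensitivity test: how much does the answer move across specifications?
spreads = {}
for x_new in (12, 53):
    preds = [fit_predict(d, x_new) for d in (1, 2, 3)]
    spreads[x_new] = max(preds) - min(preds)
    print(f"x = {x_new}: predictions range over ${spreads[x_new]:,.0f}")

# Inside the data (x = 12) the specifications roughly agree; far outside
# the support (x = 53) they diverge -- the hallmark of model dependence.
```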
Interpolation vs Extrapolation in one Dimension
Stewart (Princeton) Causal Inference Apr 10 - Apr 19, 2017 16 / 168
Interpolation or Extrapolation in One and Two Dimensions
Interpolation: Inside the convex hull
Extrapolation: Outside the convex hull
Calculating the convex hull directly would take forever in high dimensions
The WhatIf package uses linear programming to check if a candidate point is inside the hull
The key idea is making sure your counterfactual is near the data!
Stewart (Princeton) Causal Inference Apr 10 - Apr 19, 2017 17 / 168
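The linear-programming membership check can be sketched as follows. This is a minimal illustration of the idea, not the WhatIf implementation itself (which is an R package): a point p is in the convex hull of the data if and only if it can be written as a convex combination of the data rows.

```python
import numpy as np
from scipy.optimize import linprog

def in_hull(points, p):
    """Check whether p lies in the convex hull of the rows of `points`
    by solving an LP feasibility problem: find weights w >= 0 with
    sum(w) = 1 and points.T @ w = p."""
    n = points.shape[0]
    A_eq = np.vstack([points.T, np.ones((1, n))])
    b_eq = np.append(p, 1.0)
    res = linprog(c=np.zeros(n), A_eq=A_eq, b_eq=b_eq,
                  bounds=[(0, None)] * n)
    return res.success  # feasible -> inside the hull

# Toy two-dimensional data: the corners of the unit square.
X = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])

print(in_hull(X, np.array([0.5, 0.5])))  # interior point: interpolation
print(in_hull(X, np.array([2.0, 2.0])))  # outside the hull: extrapolation
```

No objective is optimized (c is all zeros); only feasibility matters, which is why the check scales to high dimensions where computing the hull explicitly is hopeless.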
Replication: Doyle and Sambanis, APSR 2000
Data: 124 Post-World War II civil wars
Dependent variable: peacebuilding success
Treatment variable: multilateral UN peacekeeping intervention (0/1)
Control variables: war type, severity, and duration; development status; etc.
Counterfactuals: UN intervention switched (0/1 to 1/0) for each observation
Percent of counterfactuals in the convex hull: 0%
Thus, without estimating any models, we know inferences will be model dependent; for illustration, here is an example. . .
Stewart (Princeton) Causal Inference Apr 10 - Apr 19, 2017 18 / 168
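The mechanism behind the 0% result can be illustrated with simulated data (the covariate, the cutoff, and the sample are invented; this is not the Doyle–Sambanis dataset): when treatment assignment is sharply predicted by the covariates, every treatment-flipped counterfactual leaves the convex hull.

```python
import numpy as np
from scipy.optimize import linprog

rng = np.random.default_rng(3)

def in_hull(points, p):
    """LP feasibility check: is p a convex combination of rows of points?"""
    n = points.shape[0]
    res = linprog(np.zeros(n),
                  A_eq=np.vstack([points.T, np.ones((1, n))]),
                  b_eq=np.append(p, 1.0),
                  bounds=[(0, None)] * n)
    return res.success

# Simulated stand-in: a binary treatment that is perfectly predicted by
# a covariate (say, the UN intervenes only in less severe wars), with
# the treatment itself included as a column of X.
severity = rng.normal(0, 1, 124)
treat = (severity < -0.5).astype(float)
X = np.column_stack([treat, severity])

# Counterfactuals: flip each unit's treatment, keep its covariates.
X_cf = X.copy()
X_cf[:, 0] = 1 - X_cf[:, 0]

frac = np.mean([in_hull(X, row) for row in X_cf])
print(f"{frac:.0%} of counterfactuals fall inside the convex hull")
```

Because the treatment column only takes the values 0 and 1, a counterfactual with treatment flipped can only be a convex combination of units with the opposite treatment, none of which share its severity range, so every counterfactual is an extrapolation.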
Replication: Doyle and Sambanis, APSR 2000
Data: 124 Post-World War II civil wars
Dependent variable: peacebuilding success
Treatment variable: multilateral UN peacekeeping intervention (0/1)
Control variables: war type, severity, and duration; developmentstatus; etc...
Counterfactuals: UN intervention switched (0/1 to 1/0) for eachobservation
Percent of counterfactuals in the convex hull:
0%
Thus, without estimating any models, we know inferences will bemodel dependent; for illustration, here is an example. . . .
Stewart (Princeton) Causal Inference Apr 10 - Apr 19, 2017 18 / 168
Replication: Doyle and Sambanis, APSR 2000
Data: 124 Post-World War II civil wars
Dependent variable: peacebuilding success
Treatment variable: multilateral UN peacekeeping intervention (0/1)
Control variables: war type, severity, and duration; developmentstatus; etc...
Counterfactuals: UN intervention switched (0/1 to 1/0) for eachobservation
Percent of counterfactuals in the convex hull:
0%
Thus, without estimating any models, we know inferences will bemodel dependent; for illustration, here is an example. . . .
Stewart (Princeton) Causal Inference Apr 10 - Apr 19, 2017 18 / 168
Doyle and Sambanis, Logit Model

                     Original Model           Modified Model
Variable          Coeff     SE   P-val     Coeff     SE   P-val
Wartype          −1.742   .609    .004    −1.666   .606    .006
Logdead           −.445   .126    .000     −.437   .125    .000
Wardur             .006   .006    .258      .006   .006    .342
Factnum          −1.259   .703    .073    −1.045   .899    .245
Factnum2           .062   .065    .346      .032   .104    .756
Trnsfcap           .004   .002    .010      .004   .002    .017
Develop            .001   .000    .065      .001   .000    .068
Exp              −6.016  3.071    .050    −6.215  3.065    .043
Decade            −.299   .169    .077     −.284   .169    .093
Treaty            2.124   .821    .010     2.126   .802    .008
UNOP4             3.135  1.091    .004      .262  1.392    .851
Wardur*UNOP4          —      —       —      .037   .011    .001
Constant          8.609  2.157    .000     7.978  2.350    .000
N                   122                      122
Log-likelihood  −45.649                  −44.902
Pseudo R2          .423                     .433
Stewart (Princeton) Causal Inference Apr 10 - Apr 19, 2017 19 / 168
Doyle and Sambanis: Model Dependence
Stewart (Princeton) Causal Inference Apr 10 - Apr 19, 2017 20 / 168
UN Peacekeeping Operations
Stewart (Princeton) Causal Inference Apr 10 - Apr 19, 2017 21 / 168
Another Example
Remember our negative binomial model?
Coefficients (from summary(mod)):

             Estimate Std. Error z value Pr(>|z|)
(Intercept)    0.5943     0.1718   3.459 0.000541 ***
cathunemp      7.9323     0.9150   8.669  < 2e-16 ***
protunemp    -19.1683     2.3713  -8.084 6.29e-16 ***
Stewart (Princeton) Causal Inference Apr 10 - Apr 19, 2017 22 / 168
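Since the negative binomial uses a log link, the fitted mean at any covariate profile follows directly from the coefficients above. A quick Python sketch (the function name is mine; the coefficients are the ones reported in the output):

```python
import math

def fitted_mean(cathunemp, protunemp,
                b0=0.5943, b_cath=7.9323, b_prot=-19.1683):
    """Fitted mean count under the log link: mu = exp(Xb)."""
    return math.exp(b0 + b_cath * cathunemp + b_prot * protunemp)

# Per the coefficient signs: higher Catholic unemployment raises the
# fitted mean; higher Protestant unemployment lowers it.
print(fitted_mean(0.30, 0.10))
print(fitted_mean(0.10, 0.10))
```

This gives point predictions only; the simulation approach used for quantities of interest would also propagate estimation uncertainty.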
Proposed Counterfactuals
Let's consider two first differences we might plausibly estimate. At the baseline, both variables are assumed fixed at their sample means.
1. Counterfactual 1: Catholic unemployment increases by one standard deviation and Protestant unemployment increases by one standard deviation.
2. Counterfactual 2: Catholic unemployment decreases by one standard deviation and Protestant unemployment increases by one standard deviation.
Stewart (Princeton) Causal Inference Apr 10 - Apr 19, 2017 23 / 168
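Constructing the two counterfactual covariate profiles is mechanical. A Python sketch, with made-up unemployment series standing in for the actual data:

```python
def mean(xs):
    return sum(xs) / len(xs)

def sd(xs):
    """Sample standard deviation."""
    m = mean(xs)
    return (sum((x - m) ** 2 for x in xs) / (len(xs) - 1)) ** 0.5

# toy series (illustrative values only, not the actual data)
cathunemp = [0.10, 0.20, 0.30, 0.40]
protunemp = [0.05, 0.10, 0.15, 0.10]

# Counterfactual 1: both rates one SD above their sample means
cf1 = (mean(cathunemp) + sd(cathunemp), mean(protunemp) + sd(protunemp))
# Counterfactual 2: Catholic rate one SD below, Protestant one SD above
cf2 = (mean(cathunemp) - sd(cathunemp), mean(protunemp) + sd(protunemp))
print(cf1, cf2)
```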
Proposed Counterfactuals Plotted
[Figure: scatterplot of the data, with protunemp on the x-axis (roughly 0.05 to 0.15) and cathunemp on the y-axis (roughly 0.10 to 0.40); the two proposed counterfactual points are plotted against the observed data.]
Stewart (Princeton) Causal Inference Apr 10 - Apr 19, 2017 24 / 168
Checking the Convex Hull
library(WhatIf)
cf1 <- c(...)                        # covariate values for Counterfactual 1
cf.res1 <- whatif(data = ..., cfact = cf1)
cf.res1$in.hull
Stewart (Princeton) Causal Inference Apr 10 - Apr 19, 2017 25 / 168
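The hull check itself is not specific to the WhatIf package. With two covariates, as here, membership in the convex hull can be tested directly. A minimal pure-Python sketch, using toy points standing in for the (protunemp, cathunemp) data:

```python
def cross(o, a, b):
    """Cross product of vectors OA and OB; positive means a left turn."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

def convex_hull(points):
    """Andrew's monotone chain: hull vertices in counterclockwise order."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts
    lower, upper = [], []
    for p in pts:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    return lower[:-1] + upper[:-1]

def in_hull(point, points):
    """True if point lies inside or on the convex hull of points (2D)."""
    hull = convex_hull(points)
    return all(cross(hull[i], hull[(i + 1) % len(hull)], point) >= 0
               for i in range(len(hull)))

# toy (protunemp, cathunemp) pairs, for illustration only
data = [(0.05, 0.10), (0.08, 0.20), (0.12, 0.35), (0.15, 0.25), (0.10, 0.15)]
print(in_hull((0.10, 0.20), data))  # interior point -> True
print(in_hull((0.30, 0.50), data))  # far outside the data -> False
```

A counterfactual outside the hull, like the second point, is exactly the kind of question whose answer will be model dependent.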
A Measure of Distance
The whatif function also tells us the percentage of data points within 1 geometric variance of the counterfactual.
> cf.res1$sum.stat
1
0.2608696
> cf.res2$sum.stat
1
0.04603581
The geometric variance is a generalization of the usual variance which is more suitable to discrete and continuous variables; essentially, it is the average pairwise Gower distance in the data. The number of GVs away can be altered with the nearby argument.
Stewart (Princeton) Causal Inference Apr 10 - Apr 19, 2017 26 / 168
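Following the slide's description (geometric variance taken as the average pairwise Gower distance), the summary statistic can be sketched in Python; the data here are toy values, not the actual data:

```python
def gower(x, y, ranges):
    """Gower distance for numeric variables: the mean of
    range-normalized absolute differences across variables."""
    return sum(abs(a - b) / r for a, b, r in zip(x, y, ranges)) / len(x)

def frac_nearby(data, cfact, nearby=1.0):
    """Fraction of data points within `nearby` geometric variances (GVs)
    of the counterfactual, with the GV computed as the average pairwise
    Gower distance in the data."""
    k = len(data[0])
    ranges = [max(row[j] for row in data) - min(row[j] for row in data)
              for j in range(k)]
    n = len(data)
    pairs = [(i, j) for i in range(n) for j in range(i + 1, n)]
    gv = sum(gower(data[i], data[j], ranges) for i, j in pairs) / len(pairs)
    return sum(gower(row, cfact, ranges) <= nearby * gv for row in data) / n

# toy data: corners of the unit square; counterfactual at the center
# is close to everything, one far outside is close to nothing
data = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
print(frac_nearby(data, (0.5, 0.5)))  # 1.0
print(frac_nearby(data, (2.0, 2.0)))  # 0.0
```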
Biases in Regression: A Decomposition
d = mean(Y | D = 1) − mean(Y | D = 0)
bias ≡ E(d) − θ = ∆o + ∆p + ∆i + ∆e
∆o Omitted variable bias (ignorability)
∆p Post-treatment bias (check this with theory!)
∆i Interpolation bias (use models or matching)
∆e Extrapolation bias (check this with data!)
Stewart (Princeton) Causal Inference Apr 10 - Apr 19, 2017 27 / 168
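The quantity d above is just the naive difference in means; a minimal sketch, with toy outcome and treatment vectors used purely for illustration:

```python
def diff_in_means(y, d):
    """Naive estimator: mean(Y | D = 1) - mean(Y | D = 0)."""
    treated = [yi for yi, di in zip(y, d) if di == 1]
    control = [yi for yi, di in zip(y, d) if di == 0]
    return sum(treated) / len(treated) - sum(control) / len(control)

# toy data; the decomposition says the gap between E[d] and theta
# splits into the four bias terms listed above
y = [3.0, 5.0, 2.0, 4.0]
d = [1, 1, 0, 0]
print(diff_in_means(y, d))  # (3+5)/2 - (2+4)/2 = 1.0
```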
Counterfactuals Summary
When the model is true, we can extrapolate or interpolate as desired.
In practical settings we do not believe the model is true, even if it is locally accurate
Thus we may get wildly different counterfactuals from different models when we are far from the data; we call this model dependence
The convex hull provides a way to check for extrapolation
This is a great way of assessing the reasonableness of our simulated quantities of interest
Stewart (Princeton) Causal Inference Apr 10 - Apr 19, 2017 28 / 168
1 Assessing Counterfactuals
2 A (Brief) Review of Selection on Observables
3 Matching as Non-parametric Preprocessing
4 Fundamentals of Matching
5 Three Approaches to Matching
6 The Propensity Score
7 Mechanisms: Estimands and Identification
8 Mechanisms: Estimation
9 Controlled Direct Effects
10 Appendix: The Case Against Propensity Score Matching
Stewart (Princeton) Causal Inference Apr 10 - Apr 19, 2017 29 / 168