Gov 2002: 4. Observational Studies and Confounding
Matthew Blackwell
September 10, 2015
Where are we? Where are we going?
• Last two weeks: randomized experiments.
• From here on: observational studies.
  - What are they?
  - How do they admit the possibility of confounding?
  - How can we adjust for confounding?
1 Observational studies
Experiment review
• An experiment is a study where assignment to treatment is controlled by the researcher.
  - Let $p_i = \mathbb{P}[D_i = 1]$ be the probability of treatment assignment.
  - $p_i$ is controlled and known by the researcher in an experiment.
• A randomized experiment is an experiment with the following properties:
  1. Positivity: assignment is probabilistic, $0 < p_i < 1$.
     - No deterministic assignment.
  2. Unconfoundedness: $\mathbb{P}[D_i = 1 \mid \mathbf{Y}(1), \mathbf{Y}(0)] = \mathbb{P}[D_i = 1]$.
     - Treatment assignment does not depend on any potential outcomes.
     - Sometimes written as $D_i \perp\!\!\!\perp (\mathbf{Y}(1), \mathbf{Y}(0))$.
Observational studies
• Many different sets of identification assumptions that we'll cover.
• To start, focus on studies that are similar to experiments, just without a known and controlled treatment assignment.
  - No guarantee that the treatment and control groups are comparable.
  1. Positivity: assignment is probabilistic, $0 < \mathbb{P}[D_i = 1 \mid \mathbf{X}, \mathbf{Y}(1), \mathbf{Y}(0)] < 1$.
  2. No unmeasured confounding: $\mathbb{P}[D_i = 1 \mid \mathbf{X}, \mathbf{Y}(1), \mathbf{Y}(0)] = \mathbb{P}[D_i = 1 \mid \mathbf{X}]$.
     - For some observed $\mathbf{X}$.
     - Also called: unconfoundedness, ignorability, selection on observables, no omitted variables, exogenous, conditionally exchangeable, etc.
Designing observational studies
• Rubin (2008) argues that we should still "design" our observational studies:
  - Pick the ideal experiment corresponding to this observational study.
  - Hide the outcome data.
  - Try to estimate the randomization procedure.
  - Analyze this as an experiment with this estimated procedure.
• Tries to minimize "snooping" by picking the best modeling strategy before seeing the outcome.
Discrete covariates
• Suppose that we knew that $D_i$ was unconfounded within levels of a binary $X_i$.
• Then we could always estimate the causal effect using iterated expectations, as in a stratified randomized experiment:
$$\begin{aligned}
&\mathbb{E}_X\{\mathbb{E}[Y_i \mid D_i = 1, X_i] - \mathbb{E}[Y_i \mid D_i = 0, X_i]\} \\
&\quad = \underbrace{(\mathbb{E}[Y_i \mid D_i = 1, X_i = 1] - \mathbb{E}[Y_i \mid D_i = 0, X_i = 1])}_{\text{diff-in-means for } X_i = 1} \, \underbrace{\mathbb{P}[X_i = 1]}_{\text{share of } X_i = 1} \\
&\quad + \underbrace{(\mathbb{E}[Y_i \mid D_i = 1, X_i = 0] - \mathbb{E}[Y_i \mid D_i = 0, X_i = 0])}_{\text{diff-in-means for } X_i = 0} \, \underbrace{\mathbb{P}[X_i = 0]}_{\text{share of } X_i = 0}
\end{aligned}$$
• Never used our knowledge of the randomization for this quantity.
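The stratified formula can be tried out on simulated data. The following is a minimal sketch, not part of the original slides: the data-generating process, the sample size, and every variable name (`x`, `d`, `y`, `tau_strat`, `tau_naive`) are assumptions chosen for illustration, with a true effect of 2 and treatment that depends only on the binary covariate.

```r
# Simulated data: all names and numbers here are illustrative assumptions.
set.seed(1)
n <- 20000
x <- rbinom(n, 1, 0.4)                       # binary covariate
d <- rbinom(n, 1, ifelse(x == 1, 0.7, 0.2))  # treatment depends on x only
y <- 1 + 2 * d + 3 * x + rnorm(n)            # true effect of d is 2

# Difference in means within each level of x.
dim_x1 <- mean(y[d == 1 & x == 1]) - mean(y[d == 0 & x == 1])
dim_x0 <- mean(y[d == 1 & x == 0]) - mean(y[d == 0 & x == 0])

# Weight each stratum by its sample share (iterated expectations).
tau_strat <- dim_x1 * mean(x == 1) + dim_x0 * mean(x == 0)

# Unadjusted difference in means, which ignores x.
tau_naive <- mean(y[d == 1]) - mean(y[d == 0])

print(c(stratified = tau_strat, naive = tau_naive))
```

In this setup the stratified estimate should land near 2, while the unadjusted difference is pulled upward because treated units are more likely to have $X_i = 1$.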
Continuous covariates
• So, great, we can stratify. Why not do this all the time?
• What if $X_i$ = income for unit $i$?
  - Each unit has its own value of $X_i$: \$54,134, \$123,043, \$23,842.
  - If $X_i = 54134$ is unique, will only observe 1 of these:
$$\mathbb{E}[Y_i \mid D_i = 1, X_i = 54134] - \mathbb{E}[Y_i \mid D_i = 0, X_i = 54134]$$
  - So we cannot stratify to each unique value of $X_i$.
• Practically, this is massively important: almost always have data with unique values.
Going to a superpopulation
• From here on out, we'll focus less on the finite population model.
  - Harder with (functionally) continuous covariates.
• Assume that each unit $i$ is drawn from an infinite superpopulation.
  - Implies that $(Y_i(0), Y_i(1), D_i, X_i)$ are a draw from their population joint distribution.
• Potential outcomes are now typical random variables:
  - $\mu_c(x) = \mathbb{E}[Y_i(0) \mid X_i = x]$ and $\mu_t(x) = \mathbb{E}[Y_i(1) \mid X_i = x]$
  - $\sigma^2_c(x) = \mathbb{V}[Y_i(0) \mid X_i = x]$ and $\sigma^2_t(x) = \mathbb{V}[Y_i(1) \mid X_i = x]$
  - $\tau(x) = \mathbb{E}[\mu_t(x) - \mu_c(x) \mid X_i = x]$
Assumptions in the superpopulation
• With an infinite superpopulation, worry less about conditioning on the entire sample.
  - Units are now independent due to random sampling from an infinite population.
• No unmeasured confounding implies that
$$\mathbb{P}(D_i = 1 \mid Y_i(0), Y_i(1), X_i) = \mathbb{P}(D_i = 1 \mid X_i)$$
• Or, written using conditional independence:
$$D_i \perp\!\!\!\perp (Y_i(0), Y_i(1)) \mid X_i$$
• Positivity can be written: $0 < \mathbb{P}(D_i = 1 \mid X_i = x) < 1$ for all $x$ in the support of $X_i$.
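A short worked step shows why these two conditions are enough to express the conditional mean of a potential outcome in terms of observed data. This is a sketch that additionally assumes consistency, $Y_i = Y_i(D_i)$, which is not restated on this slide:

```latex
\begin{align*}
\mathbb{E}[Y_i(1) \mid X_i = x]
  &= \mathbb{E}[Y_i(1) \mid D_i = 1, X_i = x]
     && \text{(no unmeasured confounding; positivity ensures the event has positive probability)} \\
  &= \mathbb{E}[Y_i \mid D_i = 1, X_i = x]
     && \text{(consistency)} \\[4pt]
\mathbb{E}[Y_i(0) \mid X_i = x]
  &= \mathbb{E}[Y_i \mid D_i = 0, X_i = x]
     && \text{(same argument with } D_i = 0\text{)}
\end{align*}
```

Averaging the difference of the two right-hand sides over the distribution of $X_i$ then gives the average treatment effect $\mathbb{E}[Y_i(1) - Y_i(0)]$.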
2 Confounding
What is confounding?
• Confounding is the bias caused by common causes of the treatment and outcome.
  - Leads to "spurious correlation."
• In observational studies, the goal is to avoid confounding inherent in the data.
• Pervasive in the social sciences:
  - effect of income on voting (confounding: age)
  - effect of job training program on employment (confounding: motivation)
  - effect of political institutions on economic development (confounding: previous economic development)
• No unmeasured confounding assumes that we've measured all sources of confounding.
Big problem
• How can we determine if no unmeasured confounding holds if we didn't assign the treatment?
• Put differently:
  - What covariates do we need to condition on?
  - What covariates do we need to match on?
  - What covariates do we need to include in our regressions?
• One way, from the assumption itself:
  - $\mathbb{P}[D_i = 1 \mid \mathbf{X}, \mathbf{Y}(1), \mathbf{Y}(0)] = \mathbb{P}[D_i = 1 \mid \mathbf{X}]$
  - Include covariates such that, conditional on them, the treatment assignment does not depend on the potential outcomes.
• Another way: use DAGs and look at backdoor paths.
Backdoor paths and blocking paths
• A backdoor path is a non-causal path from $D$ to $Y$.
  - Would remain if we removed any arrows pointing out of $D$.
• Backdoor paths between $D$ and $Y$ signal common causes of $D$ and $Y$.
[Figure: DAG with nodes $D$, $X$, and $Y$]
• Here there is a backdoor path $D \leftarrow X \rightarrow Y$, where $X$ is a common cause for the treatment and the outcome.
Other types of confounding
[Figure: DAG with nodes $D$, $U$, $X$, and $Y$]
• $D$ is enrolling in a job training program.
• $Y$ is getting a job.
• $U$ is being motivated.
• $X$ is number of job applications sent out.
• Big assumption here: no arrow from $U$ to $Y$.
Other types of confounding
[Figure: DAG with nodes $D$, $U$, $X$, and $Y$]
• $D$ is exercise.
• $Y$ is having a disease.
• $U$ is lifestyle.
• $X$ is smoking.
• Big assumption here: no arrow from $U$ to $Y$.
What's the problem with backdoor paths?
[Figure: DAG with nodes $D$, $U$, $X$, and $Y$]
• A path is blocked if:
  1. we control for or stratify a non-collider on that path, OR
  2. we do not control for a collider.
• Unblocked backdoor paths imply confounding.
• In the DAG here, if we condition on $X$, then the backdoor path is blocked.
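A small simulation can make the blocking rule concrete. The sketch below is an illustration rather than anything from the slides: it assumes the structure $D \leftarrow U \rightarrow X \rightarrow Y$ plus a direct effect of $D$ on $Y$ equal to 1, linear relationships, and made-up variable names.

```r
# Illustrative simulation; the functional forms and names are assumptions.
set.seed(2)
n <- 20000
u <- rnorm(n)                  # unobserved common cause (e.g. motivation)
x <- u + rnorm(n)              # observed variable caused by u
d <- rbinom(n, 1, plogis(u))   # treatment caused by u
y <- 1 * d + 2 * x + rnorm(n)  # no arrow from u to y; true effect of d is 1

# Ignoring x leaves the backdoor path D <- U -> X -> Y open.
b_naive <- coef(lm(y ~ d))["d"]

# Conditioning on the non-collider x blocks that path.
b_adj <- coef(lm(y ~ d + x))["d"]

print(c(naive = b_naive, adjusted = b_adj))
```

Under these assumptions the adjusted coefficient should sit close to 1 even though $U$ is never observed, because every backdoor path from $D$ to $Y$ runs through $X$.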
Not all backdoor paths
[Figure: DAG with nodes $D$, $U_1$, $X$, and $Y$]
• Conditioning on the posttreatment covariates opens the non-causal path.
  - This is selection bias.
M-bias
[Figure: DAG with nodes $D$, $U_1$, $U_2$, $X$, and $Y$]
• Not all backdoor paths induce confounding.
• This backdoor path is blocked by the collider $X_i$ that we don't control for.
• If we control for $X_i$, we open the path and induce confounding.
  - Sometimes called M-bias.
• Controversial because of differing views on what to control for:
  - Rubin thinks that M-bias is a "mathematical curiosity" and we should control for all pretreatment variables.
  - Pearl and others think M-bias is a real threat.
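To see M-bias numerically, here is a hedged sketch. It assumes the standard M-shaped structure ($U_1 \rightarrow D$, $U_1 \rightarrow X \leftarrow U_2$, $U_2 \rightarrow Y$), a continuous treatment for simplicity, and linear effects; none of these choices come from the slides.

```r
# Illustrative M-structure; every coefficient and name is an assumption.
set.seed(3)
n <- 20000
u1 <- rnorm(n)
u2 <- rnorm(n)
x <- u1 + u2 + rnorm(n)     # collider: caused by both u1 and u2
d <- u1 + rnorm(n)          # treatment (continuous here for simplicity)
y <- 1 * d + u2 + rnorm(n)  # true effect of d is 1

# Without x, the path D <- U1 -> X <- U2 -> Y is blocked at the collider.
b_unadj <- coef(lm(y ~ d))["d"]

# Controlling for the collider x opens the path and biases the estimate.
b_collider <- coef(lm(y ~ d + x))["d"]

print(c(unadjusted = b_unadj, with_collider = b_collider))
```

With these particular variances the regression that includes `x` converges to 0.8 rather than 1, so the size of the bias depends entirely on the assumed strengths of the arrows into the collider.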
Backdoor criterion
• Can we use a DAG to evaluate no unmeasured confounders?
• Pearl answered yes, with the backdoor criterion, which states that the effect of $D$ on $Y$ is identified if:
  1. No backdoor paths from $D$ to $Y$, OR
  2. Measured covariates are sufficient to block all backdoor paths from $D$ to $Y$.
• First is really only valid for randomized experiments.
• The backdoor criterion is fairly powerful. Tells us:
  - if there is confounding given this DAG,
  - if it is possible to remove the confounding, and
  - what variables to condition on to eliminate the confounding.
SWIGs
[Figure: SWIG with the treatment node split into $D \mid d$, plus nodes $U$, $X$, and $Y(d)$]
• It's a little hard to see how the backdoor criterion implies no unmeasured confounders.
  - No potential outcomes on this graph.
• Richardson and Robins: Single World Intervention Graphs.
  - Split $D$ node into natural value ($D$) and intervention value $d$.
  - Let all effects of $D$ take their potential value under intervention: $Y(d)$.
• Now can see: are $D$ and $Y(d)$ related?
  - $D \leftarrow U \rightarrow X \rightarrow Y(d)$ implies not independent.
  - Conditioning on $X$ blocks that backdoor path, so $D \perp\!\!\!\perp Y(d) \mid X$.
No unmeasured confounders is not testable
• No unmeasured confounding places no restrictions on the observed data:
$$\underbrace{(Y_i(0) \mid D_i = 1, X_i)}_{\text{unobserved}} \overset{d}{=} \underbrace{(Y_i(0) \mid D_i = 0, X_i)}_{\text{observed}}$$
• Here $\overset{d}{=}$ means equal in distribution.
• No way to directly test this assumption without the counterfactual data, which is missing by definition.
• With backdoor criterion, you must have the correct DAG.
Assessing no unmeasured confounders
• Can do "placebo" tests, where $D_i$ cannot have an effect (lagged outcomes, etc.).
• DellaVigna and Kaplan (2007, QJE): effect of Fox News availability on Republican vote share.
  - Availability in 2000/2003 can't affect past vote shares.
• Unconfoundedness could still be violated even if you pass this test.
Alternatives to no unmeasured confounding
• Without explicit randomization, we need some way of identifying causal effects.
• No unmeasured confounders ≈ randomized experiment.
  - Identification results very similar to experiments.
• With unmeasured confounding, are we doomed? Maybe not.
• Other approaches rely on finding plausibly exogenous variation in assignment of $D_i$:
  - Instrumental variables (randomization + exclusion restriction)
  - Over-time variation (diff-in-diff, fixed effects)
  - Arbitrary thresholds for treatment assignment (RDD)
3 No unmeasured confounders and OLS
Justifying regression
• We know how randomized experiments imply that differences-in-means identify the ATE.
• In the next few weeks, we'll work through how no unmeasured confounding justifies a number of estimation strategies.
• Today, it's useful to walk through what no unmeasured confounding can buy us in a familiar setting: OLS.
  - We'll cover regression more formally later.
Constant effects set up
• Assume a constant effects setup:
$$Y_i(0) = \alpha + X_i'\beta + u_i$$
$$Y_i(1) = \alpha + \tau + X_i'\beta + u_i$$
• Constant effects because $Y_i(1) - Y_i(0) = \tau$ for all units.
• Use consistency to get the usual regression formula:
$$\begin{aligned}
Y_i &= Y_i(1) D_i + Y_i(0)(1 - D_i) \\
    &= Y_i(0) + (Y_i(1) - Y_i(0)) \cdot D_i \\
    &= \alpha + \tau \cdot D_i + X_i'\beta + u_i
\end{aligned}$$
• Does no unmeasured confounding help us identify the causal parameter $\tau$?
Regression on residuals
• First, estimate the residuals of regression of the treatment and outcome on the covariates:
$$\tilde{Y}_i = Y_i - \mathbb{E}[Y_i \mid X_i], \qquad \tilde{D}_i = D_i - \mathbb{E}[D_i \mid X_i]$$
• Running a regression of $\tilde{Y}_i$ on $\tilde{D}_i$ is equivalent to controlling for $X_i$:
$$Y_i = \alpha + \tau \cdot D_i + X_i'\beta + u_i$$
$$\tilde{Y}_i = \alpha + \tau \cdot \tilde{D}_i + \tilde{u}_i$$
• Here, $\tilde{u}_i = u_i - \mathbb{E}[u_i \mid X_i]$.
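The equivalence between the residual regression and the regression that controls for the covariates can be checked directly. This is a minimal sketch on simulated data with assumed variable names (`x1`, `x2`, `y_tilde`, `d_tilde`); it replaces the conditional expectations with linear projections on the covariates, which is the case in which the equivalence holds exactly in sample.

```r
# Illustrative data; names and coefficients are assumptions for this sketch.
set.seed(4)
n <- 5000
x1 <- rnorm(n)
x2 <- rnorm(n)
d <- rbinom(n, 1, plogis(0.5 * x1 - 0.5 * x2))
y <- 1 + 2 * d + x1 - x2 + rnorm(n)

# Residualize the outcome and the treatment on the covariates.
y_tilde <- resid(lm(y ~ x1 + x2))
d_tilde <- resid(lm(d ~ x1 + x2))

# Coefficient on d from the full regression ...
b_full <- coef(lm(y ~ d + x1 + x2))["d"]

# ... and from the regression of residuals on residuals.
b_resid <- coef(lm(y_tilde ~ d_tilde))["d_tilde"]

print(c(full = b_full, residualized = b_resid))
```

The two coefficients agree up to floating-point error, which is why the probability limit of the OLS estimator can be studied through the residualized variables alone.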
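What does OLS estimate?
• Using the usual OLS theory, we can show that the probability limit of the OLS estimator of $\tau$ is:
$$\begin{aligned}
\operatorname{plim} \hat{\tau}_{ols} &= \frac{\operatorname{Cov}(\tilde{Y}_i, \tilde{D}_i)}{\operatorname{Var}(\tilde{D}_i)} \\
&= \frac{\operatorname{Cov}(\tilde{D}_i, \alpha + \tau \cdot \tilde{D}_i + \tilde{u}_i)}{\operatorname{Var}(\tilde{D}_i)} \\
&= \frac{\tau \cdot \operatorname{Cov}(\tilde{D}_i, \tilde{D}_i) + \operatorname{Cov}(\tilde{D}_i, \tilde{u}_i)}{\operatorname{Var}(\tilde{D}_i)} \\
&= \tau + \frac{\operatorname{Cov}(\tilde{D}_i, \tilde{u}_i)}{\operatorname{Var}(\tilde{D}_i)}
\end{aligned}$$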
Key OLS assumption
$$\operatorname{plim} \hat{\tau}_{ols} = \tau + \frac{\operatorname{Cov}(\tilde{D}_i, \tilde{u}_i)}{\operatorname{Var}(\tilde{D}_i)}$$
• Key identification comes from $\operatorname{Cov}(\tilde{D}_i, \tilde{u}_i) = 0$.
  - Conditional on $X_i$, no relationship between $D_i$ and $u_i$.
• Note: $u_i$ is a function of $X_i$ and $Y_i(d)$:
  - $u_i = Y_i(0) - \alpha - X_i'\beta$ when $D_i = 0$
  - $u_i = Y_i(1) - \alpha - \tau - X_i'\beta$ when $D_i = 1$
  - So, conditional on $X_i$, the only variation in $u_i$ comes from $Y_i(d)$.
• No unmeasured confounding implies this assumption:
$$D_i \perp\!\!\!\perp (Y_i(1), Y_i(0)) \mid X_i \implies D_i \perp\!\!\!\perp u_i \mid X_i \implies \operatorname{Cov}(\tilde{D}_i, \tilde{u}_i) = 0$$
Omitted variable bias
• What happens when this is violated? Suppose that there is one omitted variable $\tilde{L}_i$ (residualized from $X_i$):
$$\tilde{u}_i = \lambda \tilde{L}_i + \omega_i$$
• We'll assume that if we could measure $L_i$, then no unmeasured confounding would hold.
• Leads to inconsistency in the OLS estimator:
$$\operatorname{plim} \hat{\tau}_{ols} = \tau + \lambda \frac{\operatorname{Cov}(\tilde{D}_i, \tilde{L}_i)}{\operatorname{Var}(\tilde{D}_i)}$$
• Bias here is two terms multiplied together:
  1. the coefficient on $L_i$ ($\lambda$), and
  2. the coefficient from a regression of $L_i$ on $D_i$, also controlling for $X_i$.
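The bias expression follows in one step from the probability limit of the OLS estimator. In the sketch below, $\tilde{D}_i$, $\tilde{L}_i$, and $\tilde{u}_i$ denote the treatment, the omitted variable, and the error after residualizing on $X_i$, and the remaining error $\omega_i$ is assumed to be uncorrelated with $\tilde{D}_i$ (this is what "no unmeasured confounding would hold if we could measure $L_i$" is taken to deliver here):

```latex
\begin{align*}
\operatorname{plim} \hat{\tau}_{ols}
  &= \tau + \frac{\operatorname{Cov}(\tilde{D}_i, \tilde{u}_i)}{\operatorname{Var}(\tilde{D}_i)} \\
  &= \tau + \frac{\operatorname{Cov}(\tilde{D}_i, \lambda \tilde{L}_i + \omega_i)}{\operatorname{Var}(\tilde{D}_i)} \\
  &= \tau + \lambda \frac{\operatorname{Cov}(\tilde{D}_i, \tilde{L}_i)}{\operatorname{Var}(\tilde{D}_i)}
       + \underbrace{\frac{\operatorname{Cov}(\tilde{D}_i, \omega_i)}{\operatorname{Var}(\tilde{D}_i)}}_{=\,0 \text{ by assumption}}
\end{align*}
```

As a purely hypothetical numerical illustration, if $\lambda = 2$ and the ratio $\operatorname{Cov}(\tilde{D}_i, \tilde{L}_i)/\operatorname{Var}(\tilde{D}_i)$ is $0.5$, the estimator converges to $\tau + 1$.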
4 Estimating causal effects under no unmeasured confounders
Basic approach to estimation
• Remember the usual approach to estimating the ATE with covariates.
• Stratification:
  - Stratify the units by the covariates.
  - Calculate CATE within these strata.
• Standardization/direct adjustment:
  - Average the CATEs across the strata to get ATE.
• How to create strata when $X$ has continuous components?
  - If $X$ is discrete with only a few levels, can use the exact values of $X$.
  - Otherwise, we may have to subclassify/coarsen the data.
Classic example: cigars/pipes versus cigarettes
• $D_i = 1$ for pipe/cigar smokers, $D_i = 0$ for cigarette smokers.
• $Y_i$ = death in the first year of follow-up.
• Naive positive effect: cigar/pipe smokers more likely to die.
  - What's the confounder here? Age.
  - Pipe/cigar smokers much older than cigarette smokers.
• Cochran's approach: stratify based on coarsened age.
  - Divide age into $k$ strata, $S_i \in \{s_1, s_2, \ldots, s_k\}$.
  - $s_1$ might be 18-25, $s_2$ might be 26-35, and so on.
  - Calculate effect within strata and aggregate.
• Key assumption: no unmeasured confounders using stratified version of age:
$$D_i \perp\!\!\!\perp (Y_i(0), Y_i(1)) \mid S_i$$
Stratification on the propensity score
• What about when $X$ has many dimensions?
• Curse of dimensionality: there will be very few, if any, units in a given stratum of $X_i$.
• Stratify on a low-dimensional summary, the propensity score:
$$e(x) = \mathbb{P}[D_i = 1 \mid X_i = x]$$
  - PS = unit's probability of being treated, conditional on $X_i$.
• For a particular unit, this is $e(X_i) = \mathbb{P}[D_i = 1 \mid X_i]$.
• Rosenbaum and Rubin (1983) showed that:
$$D_i \perp\!\!\!\perp (Y_i(0), Y_i(1)) \mid X_i \implies D_i \perp\!\!\!\perp (Y_i(0), Y_i(1)) \mid e(X_i)$$
  - So stratifying on $e(X_i)$ is the same as stratifying on the full $X_i$.
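The reasoning behind this implication can be sketched with iterated expectations. The steps below are a standard outline rather than a transcription of the slides; they use only the definition $e(X_i) = \mathbb{P}[D_i = 1 \mid X_i]$ and no unmeasured confounding given $X_i$:

```latex
\begin{align*}
\mathbb{P}[D_i = 1 \mid Y_i(0), Y_i(1), e(X_i)]
  &= \mathbb{E}\big[\, \mathbb{P}[D_i = 1 \mid Y_i(0), Y_i(1), X_i] \,\big|\, Y_i(0), Y_i(1), e(X_i) \big]
     && \text{(iterated expectations)} \\
  &= \mathbb{E}\big[\, \mathbb{P}[D_i = 1 \mid X_i] \,\big|\, Y_i(0), Y_i(1), e(X_i) \big]
     && \text{(no unmeasured confounding)} \\
  &= \mathbb{E}\big[\, e(X_i) \,\big|\, Y_i(0), Y_i(1), e(X_i) \big] = e(X_i)
\end{align*}
```

The same argument without the potential outcomes gives $\mathbb{P}[D_i = 1 \mid e(X_i)] = e(X_i)$, so the two conditional probabilities coincide, which is the conditional independence statement given $e(X_i)$.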
Propensity score as balancing score
• The propensity score is actually a balancing score, which means that:
$$D_i \perp\!\!\!\perp X_i \mid e(X_i)$$
• Conditional on the propensity score, treatment is independent of the covariates.
  - Treatment status is said to be balanced: $f(X_i \mid D_i = 1, e(X_i)) = f(X_i \mid D_i = 0, e(X_i))$
• Of course, we have to know the true PS to have all these results work.
Estimating the propensity score
• Of course, in observational studies, we don't know the propensity score.
• We would run a parametric model with parameters $\gamma$ to estimate the propensity scores:
  1. Estimate $\hat{\gamma}$.
  2. Create $\hat{e}_i = \Pr[D_i = 1 \mid X_i; \hat{\gamma}]$.
• For instance, in R, we could easily calculate the propensity scores using the glm function:
    # Logit model for treatment; the fitted values are the estimated propensity scores.
    pscores <- glm(treat ~ var1 + var2 + var3, data = mydata,
                   family = binomial())$fitted.values
Propensity score specifics
• What variables do we include in the propensity score model?
  - Any set of variables that blocks all the backdoor paths from $D_i$ to $Y_i$.
• Check balance within strata of $\hat{e}_i$. Covariates should be balanced:
$$f(X_i \mid D_i = 1, \hat{e}_i) = f(X_i \mid D_i = 0, \hat{e}_i)$$
• Can also use automated/nonparametric tools for estimating $\hat{e}_i$.
  - Covariate Balancing Propensity Scores (Imai and Ratkovic)
Stratifying by the propensity score
• How will we use the propensity score?
  - Matching (next week)
  - Weighting (two weeks)
  - Regression (three weeks)
• Today: coarsening the propensity score and stratifying.
• Choose boundary points: $0 = b_0 < b_1 < \ldots < b_{K-1} < b_K = 1$.
• Create block indicators:
$$B_i(k) = \begin{cases} 1 & \text{if } b_{k-1} < \hat{e}(X_i) < b_k \\ 0 & \text{otherwise} \end{cases}$$
• Calculate within-strata effect estimates:
$$\tau_k = \mathbb{E}[Y_i \mid D_i = 1, B_i(k) = 1] - \mathbb{E}[Y_i \mid D_i = 0, B_i(k) = 1]$$
Standardization/direct adjustment
• We calculated the CATEs for each stratum of the PS, $\tau_k$.
• We can use the law of iterated expectations to back out the ATE.
• Take the average of the CATEs over the distribution of $X$:
$$\tau = \sum_{k=1}^{K} \tau_k \, \mathbb{P}[B_i(k) = 1]$$
• Note that $\mathbb{P}[B_i(k) = 1]$ is just the proportion of units in block $k$:
$$\mathbb{P}[B_i(k) = 1] = \frac{\sum_{i=1}^{N} B_i(k)}{N}$$
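The stratification and standardization steps can be put together in a few lines of R. This is a rough sketch under stated assumptions: the data are simulated with a true effect of 2, the variable names are invented, the propensity score comes from a logit model, and the block boundaries are deciles of the estimated score rather than any particular boundary points.

```r
# Illustrative end-to-end sketch; data, names, and K are assumptions.
set.seed(5)
n <- 20000
x1 <- rnorm(n)
x2 <- rnorm(n)
d <- rbinom(n, 1, plogis(0.8 * x1 - 0.5 * x2))
y <- 2 * d + x1 - x2 + rnorm(n)  # true effect of d is 2
dat <- data.frame(y, d, x1, x2)

# 1. Estimate the propensity score with a logit model.
dat$ps <- glm(d ~ x1 + x2, data = dat, family = binomial())$fitted.values

# 2. Coarsen the score into K blocks (here: quantile-based boundaries).
K <- 10
breaks <- quantile(dat$ps, probs = seq(0, 1, length.out = K + 1))
dat$block <- cut(dat$ps, breaks = breaks, include.lowest = TRUE)

# 3. Difference in means within each block.
tau_k <- sapply(split(dat, dat$block), function(b) {
  mean(b$y[b$d == 1]) - mean(b$y[b$d == 0])
})

# 4. Average the block estimates, weighting by the share of units per block.
share_k <- as.numeric(table(dat$block)) / n
tau_hat <- sum(tau_k * share_k)

# Unadjusted difference in means, for comparison.
tau_naive <- mean(dat$y[dat$d == 1]) - mean(dat$y[dat$d == 0])

print(c(stratified = tau_hat, naive = tau_naive))
```

Because the score still varies within each block, some residual confounding remains; using more blocks shrinks it at the cost of noisier within-block estimates.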
5 Wrapping Up
Summary
• Defined observational studies.
• Defined confounding and assessed when no unmeasured confounding holds.
• Saw how no unmeasured confounding helps with OLS.
• Saw how to estimate causal effects under no unmeasured confounding using the propensity score.
Next few weeks
• Learn how to estimate causal effects under no unmeasured confounders via:
  - Matching
  - Weighting
  - Regression
• Then we move on to situations where no unmeasured confounders is violated.
Where are we Where are we going
bull Last two weeks randomized experimentsbull From here on observational studies
What are they How do they admit the possiblity of confounding How can we adjust for confounding
1 Observationalstudies
Experiment review
bull An experiment is a study where assignment to treatment iscontrolled by the researcher
119901119894 = ℙ[119863119894 = 1] be the probability of treatment assignmentprobability
119901119894 is controlled and known by researcher in an experimentbull A randomized experiment is an experiment with the following
properties
1 Positivity assignment is probabilistic 0 lt 119901119894 lt 1 No deterministic assignment
2 Unconfoundedness ℙ[119863119894 = 1|119832(1) 119832(0)] = ℙ[119863119894 = 1] Treatment assignment does not depend on any potential
outcomes Sometimes written as 119863119894 ⟂⟂ (119832(1) 119832(0))
Observational studies
bull Many different sets of identification assumptions that wersquollcover
bull To start focus on studies that are similar to experiments justwithout a known and controlled treatment assignment
No guarantee that the treatment and control groups arecomparable
1 Positivity assignment is probabilistic0 lt ℙ[119863119894 = 1|119831 119832(1) 119832(0)] lt 1
2 No unmeasured confoundingℙ[119863119894 = 1|119831 119832(1) 119832(0)] = ℙ[119863119894 = 1|119831]
For some observed 119831 Also called unconfoundedness ignorability selection on
observables no omitted variables exogenous conditionalexchangeable etc
Designing observational studies
bull Rubin (2008) argues that we should still ldquodesignrdquo ourobservational studies
Pick the ideal experiment to this observational study Hide the outcome data Try to estimate the randomization procedure Analyze this as an experiment with this estimated procedure
bull Tries to minimize ldquosnoopingrdquo by picking the best modelingstrategy before seeing the outcome
Discrete covariates
bull Suppose that we knew that 119863119894 was unconfounded within levelsof a binary 119883119894
bull Then we could always estimate the causal effect using iteratedexpectations as in a stratified randomized experiment
1201241198831114107120124[119884119894|119863119894 = 1119883119894] minus 120124[119884119894|119863119894 = 0119883119894]1114110
= 1114101120124[119884119894|119863119894 = 1119883119894 = 1] minus 120124[119884119894|119863119894 = 0119883119894 = 1]1114104111405911138401113840111384011138401113840111384011138401113840111384011138401113840111384011138401113840111384011138401113840111384011138401113840111384011138401113840111384011138401113840111384011138401113840111384011140601113840111384011138401113840111384011138401113840111384011138401113840111384011138401113840111384011138401113840111384011138401113840111384011138401113840111384011138401113840111384011138401113840111384011138401114061
diff-in-means for 119883119894=1113568
ℙ[119883119894 = 1]11140591113840111384011138401113840111406011138401113840111384011138401114061share of 119883119894=1113568
+ 1114101120124[119884119894|119863119894 = 1119883119894 = 0] minus 120124[119884119894|119863119894 = 0119883119894 = 0]1114104111405911138401113840111384011138401113840111384011138401113840111384011138401113840111384011138401113840111384011138401113840111384011138401113840111384011138401113840111384011138401113840111384011138401113840111384011140601113840111384011138401113840111384011138401113840111384011138401113840111384011138401113840111384011138401113840111384011138401113840111384011138401113840111384011138401113840111384011138401113840111384011138401114061
diff-in-means for 119883119894=1113567
ℙ[119883119894 = 0]11140591113840111384011138401113840111406011138401113840111384011138401114061share of 119883119894=1113567
bull Never used our knowledge of the randomization for thisquantity
Continuous covariates
bull So great we can stratify Why not do this all the timebull What if 119883119894 = income for unit 119894
Each unit has its own value of 119883119894 $54134 $123043 $23842 If 119883119894 = 54134 is unique will only observe 1 of these
120124[119884119894|119863119894 = 1119883119894 = 54134] minus 120124[119884119894|119863119894 = 0119883119894 = 54134]
cannot stratify to each unique value of 119883119894bull Practically this is massively important almost always have
data with unique values
Going to a superpopulation
bull From here on out wersquoll focus less on the finite populationmodel
Harder with (functionally) continuous covariatesbull Assume that each unit 119894 is drawn from an infinite
superpopulation implies that (119884119894(0) 119884119894(1) 119863119894 119883119894) are a draw from their
population joint distributionbull Potential outcomes are now typical random variables
120583119888(119909) = 120124[119884119894(0)|119883119894 = 119909] and 120583119905(119909) = 120124[119884119894(1)|119883119894 = 119909] 1205901113569119888 (119909) = 120141[119884119894(0)|119883119894 = 119909] and 1205901113569119905 (119909) = 120141[119884119894(1)|119883119894 = 119909] 120591 = 120124[120583119905(119909) minus 120583119888(119909)|119883119894 = 119909]
Assumptions in the superpopulation
bull With an infinite superpopulation worry less aboutconditioning on the entire sample
Units are now independent due to random sampling from aninfinite population
bull No unmeasured confoudning implies that
ℙ(119863119894 = 1|119884119894(0) 119884119894(1) 119883119894) = ℙ(119863119894 = 1|119883119894)
bull Or written using conditional independence
119863119894 ⟂⟂ 1114100119884119894(0) 119884119894(1)1114103|119883119894
bull Positivity can be written 0 lt ℙ(119863119894 = 1|119883119894 = 119909) lt 1 for all 119909 inthe support of 119883119894
2 Confounding
What is confounding
bull Confounding is the bias caused by common causes of thetreatment and outcome
Leads to ldquospurious correlationrdquobull In observational studies the goal is to avoid confounding
inherent in the databull Pervasive in the social sciences
effect of income on voting (confounding age) effect of job training program on employment (confounding
motivation) effect of political institutions on economic development
(confounding previous economic development)bull No unmeasured confounding assumes that wersquove measured all
sources of confounding
Big problem
bull How can we determine if no unmeasured confounding holds ifwe didnrsquot assign the treatment
bull Put differently What covariates do we need to condition on What covariates do we need to match on What covaraites do we need to include in our regressions
bull One way from the assumption itself ℙ[119863119894 = 1|119831 119832(1) 119832(0)] = ℙ[119863119894 = 1|119831] Include covariates such that conditional on them the
treatment assignment does not depend on the potentialoutcomes
bull Another way use DAGs and look at back-door paths
Backdoor paths and blocking paths
bull Backdoor path is a non-causal path from 119863 to 119884 Would remain if we removed any arrows pointing out of 119863
bull Backdoor paths between 119863 and 119884 common causes of 119863and 119884
119863
119883
119884
bull Here there is a backdoor path 119863 larr 119883 rarr 119884 where 119883 is acommon cause for the treatment and the outcome
Other types of confounding
119863
119880 119883
119884
bull 119863 is enrolling in a job training programbull 119884 is getting a jobbull 119880 is being motivatedbull 119883 is number of job applications sent outbull Big assumption here no arrow from 119880 to 119884
Other types of confounding
119863
119880 119883
119884
bull 119863 is exercisebull 119884 is having a diseasebull 119880 is lifestylebull 119883 is smokingbull Big assumption here no arrow from 119880 to 119884
Whatrsquos the problem with backdoorpaths
119863
119880 119883
119884
bull A path is blocked if1 we control for or stratify a non-collider on that path OR2 we do not control for a collider
bull Unblocked backdoor paths confoundingbull In the DAG here if we condition on 119883 then the backdoor
path is blocked
Not all backdoor paths
119863
1198801113568119883119883
119884
bull Conditioning on the posttreatment covariates opens thenon-causal path
selection bias
M-bias
119863
1198801113568 1198801113569119883119883
119884
bull Not all backdoor paths induce confoundingbull This backdoor path is blocked by the collider 119883119894 that we donrsquot
control forbull If we control for 119883119894 opens the path and induces
confounding Sometimes called M-bias
bull Controversial because of differing views on what to control for Rubin thinks that M-bias is a ldquomathematical curiosityrdquo and we
should control for all pretreatment variables Pearl and others think M-bias is a real threat
Backdoor criterion
bull Can we use a DAG to evaluate no unmeasured confoundersbull Pearl answered yes with the backdoor criterion which states
that the effect of 119863 on 119884 is identified if1 No backdoor paths from 119863 to 119884 OR2 Measured covariates are sufficient to block all backdoor paths
from 119863 to 119884
bull First is really only valid for randomized experimentsbull The backdoor criterion is fairly powerful Tells us
if there confounding given this DAG if it is possible to removing the confounding and what variables to condition on to eliminate the confounding
SWIGs
119863 | 119889 119884(119889)
119880 119883
119884
bull Itrsquos a little hard to see how the backdoor criterion implies nounmeasured confounders
No potential outcomes on this graphbull Richardson and Robins Single World Intervention Graphs
Split 119863 node into natural value (119863) and intervention value 119889 Let all effects of 119863 take their potential value under
intervention 119884(119889)bull Now can see are 119863 and 119884(119889) related
119863 larr 119880 rarr 119883 rarr 119884(119889) implies not independent Conditioning on 119883 blocks that backdoor path 119863 ⟂⟂ 119884(119889)|119883
No unmeasured confounders is nottestable
bull No unmeasured confounding places no restrictions on theobserved data
1114100119884119894(0)|119863119894 = 1119883119894111410311140591113840111384011138401113840111384011138401113840111384011138401113840111406011138401113840111384011138401113840111384011138401113840111384011138401114061unobserved
119889= 1114100119884119894(0)|119863119894 = 0119883119894111410311140591113840111384011138401113840111384011138401113840111384011138401113840111406011138401113840111384011138401113840111384011138401113840111384011138401114061observed
bull Here 119889= means equal in distributionbull No way to directly test this assumption without the
counterfactual data which is missing by definitionbull With backdoor criterion you must have the correct DAG
Assessing no unmeasured confounders
bull Can do ldquoplacebordquo tests where 119863119894 cannot have an effect(lagged outcomes etc)
bull Della Vigna and Kaplan (2007 QJE) effect of Fox Newsavailability on Republican vote share
Availability in 20002003 canrsquot affect past vote sharesbull Unconfoundedness could still be violated even if you pass this
test
Alternatives to no unmeasuredconfounding
bull Without explicit randomization we need some way ofidentifying causal effects
bull No unmeasured confounders asymp randomized experiment Indentification results very similar to experiments
bull With unmeasured confounding are we doomed Maybe notbull Other approaches rely on finding plausibly exogenous variation
in assignment of 119863119894 Instrumental variables (randomization + exclusion restriction) Over-time variation (diff-in-diff fixed effects) Arbitrary thresholds for treatment assignment (RDD)
3 No unmeasured confounders and OLS
Justifying regression
• We know how randomized experiments imply that differences-in-means identify the ATE
• In the next few weeks, we'll work through how no unmeasured confounding justifies a number of estimation strategies
• Today, it's useful to walk through what no unmeasured confounding can buy us in a familiar setting: OLS
  - We'll cover regression more formally later
Constant effects setup
• Assume a constant effects setup:
$$Y_i(0) = \alpha + X_i'\beta + u_i$$
$$Y_i(1) = \alpha + \tau + X_i'\beta + u_i$$
• Constant effects because $Y_i(1) - Y_i(0) = \tau$ for all units
• Use consistency to get the usual regression formula:
$$\begin{aligned}
Y_i &= Y_i(1) D_i + Y_i(0)(1 - D_i) \\
    &= Y_i(0) + \bigl(Y_i(1) - Y_i(0)\bigr) \cdot D_i \\
    &= \alpha + \tau \cdot D_i + X_i'\beta + u_i
\end{aligned}$$
• Does no unmeasured confounding help us identify the causal parameter $\tau$?
Regression on residuals
• First, compute the residuals from regressions of the outcome and the treatment on the covariates:
$$\tilde{Y}_i = Y_i - \mathbb{E}[Y_i \mid X_i], \qquad \tilde{D}_i = D_i - \mathbb{E}[D_i \mid X_i]$$
• Running a regression of $\tilde{Y}_i$ on $\tilde{D}_i$ is equivalent to controlling for $X_i$:
$$Y_i = \alpha + \tau \cdot D_i + X_i'\beta + u_i$$
$$\tilde{Y}_i = \tau \cdot \tilde{D}_i + \tilde{u}_i$$
• Here $\tilde{u}_i = u_i - \mathbb{E}[u_i \mid X_i]$; the terms $\alpha$ and $X_i'\beta$ are functions of $X_i$ only, so they cancel when we subtract $\mathbb{E}[Y_i \mid X_i]$
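The equivalence between residual regression and controlling for $X_i$ can be checked numerically. The sketch below uses simulated data with made-up coefficients, and it replaces the conditional expectations $\mathbb{E}[\cdot \mid X_i]$ with linear projections from `lm`, which is an assumption of this example rather than something the slides specify.

```r
# Simulated illustration; all names and numbers are hypothetical.
set.seed(1)
n  <- 2000
x1 <- rnorm(n)
x2 <- rnorm(n)
d  <- rbinom(n, 1, plogis(0.5 * x1 - 0.5 * x2))
y  <- 1 + 2 * d + x1 + 0.5 * x2 + rnorm(n)   # tau = 2 by construction

# Regression that controls for X directly
tau_full <- unname(coef(lm(y ~ d + x1 + x2))["d"])

# Residualize outcome and treatment on X, then regress residual on residual
y_tilde   <- resid(lm(y ~ x1 + x2))
d_tilde   <- resid(lm(d ~ x1 + x2))
tau_resid <- unname(coef(lm(y_tilde ~ d_tilde))["d_tilde"])
```

With linear projections the two coefficients agree up to floating-point error, so `tau_full` and `tau_resid` should be indistinguishable.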
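What does OLS estimate?
• Using the usual OLS theory, we can show that the probability limit of the OLS estimator of $\tau$ is (substituting $\tilde{Y}_i = \tau \cdot \tilde{D}_i + \tilde{u}_i$):
$$\begin{aligned}
\operatorname{plim} \hat{\tau} &= \frac{\operatorname{Cov}(\tilde{D}_i, \tilde{Y}_i)}{\operatorname{Var}(\tilde{D}_i)} \\
&= \frac{\operatorname{Cov}(\tilde{D}_i, \tau \cdot \tilde{D}_i + \tilde{u}_i)}{\operatorname{Var}(\tilde{D}_i)} \\
&= \frac{\tau \cdot \operatorname{Cov}(\tilde{D}_i, \tilde{D}_i) + \operatorname{Cov}(\tilde{D}_i, \tilde{u}_i)}{\operatorname{Var}(\tilde{D}_i)} \\
&= \tau + \frac{\operatorname{Cov}(\tilde{D}_i, \tilde{u}_i)}{\operatorname{Var}(\tilde{D}_i)}
\end{aligned}$$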
Key OLS assumption
$$\operatorname{plim} \hat{\tau} = \tau + \frac{\operatorname{Cov}(\tilde{D}_i, \tilde{u}_i)}{\operatorname{Var}(\tilde{D}_i)}$$
• Key identification comes from $\operatorname{Cov}(\tilde{D}_i, \tilde{u}_i) = 0$
  - Conditional on $X_i$, no relationship between $D_i$ and $u_i$
• Note: $u_i$ is a function of $X_i$ and $Y_i(d)$
  - $u_i = Y_i(0) - \alpha - X_i'\beta$ when $D_i = 0$
  - $u_i = Y_i(1) - \alpha - \tau - X_i'\beta$ when $D_i = 1$
  - Conditional on $X_i$, the only variation in $u_i$ comes from $Y_i(d)$
• No unmeasured confounding implies this assumption:
$$D_i \perp\!\!\!\perp \bigl(Y_i(1), Y_i(0)\bigr) \mid X_i \implies D_i \perp\!\!\!\perp u_i \mid X_i \implies \operatorname{Cov}(\tilde{D}_i, \tilde{u}_i) = 0$$
Omitted variable bias
• What happens when this is violated? Suppose that there is one omitted variable $L_i$ (residualized from $X_i$ as $\tilde{L}_i$):
$$\tilde{u}_i = \lambda \tilde{L}_i + \omega_i$$
• We'll assume that if we could measure $L_i$, then no unmeasured confounding would hold
• Leads to inconsistency in the OLS estimator:
$$\operatorname{plim} \hat{\tau} = \tau + \lambda \frac{\operatorname{Cov}(\tilde{D}_i, \tilde{L}_i)}{\operatorname{Var}(\tilde{D}_i)}$$
• Bias here is the product of two terms:
  1. the coefficient on $L_i$ ($\lambda$)
  2. the coefficient on $D_i$ from a regression of $L_i$ on $D_i$, also controlling for $X_i$
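The omitted variable bias formula has an exact in-sample analogue, which a short simulation can confirm. Everything here is assumed for illustration: the data-generating process, the coefficient values, and the names `long`, `short`, and `aux` for the three regressions.

```r
# Simulated illustration; all names and numbers are hypothetical.
set.seed(42)
n <- 5000
x <- rnorm(n)
l <- 0.5 * x + rnorm(n)                    # the omitted variable L
d <- rbinom(n, 1, plogis(x + l))           # treatment depends on X and L
y <- 1 + 2 * d + x + 1.5 * l + rnorm(n)    # tau = 2, coefficient on L is 1.5

long  <- lm(y ~ d + x + l)   # infeasible regression that includes L
short <- lm(y ~ d + x)       # feasible regression that omits L
aux   <- lm(l ~ d + x)       # regression of L on D, controlling for X

tau_long  <- unname(coef(long)["d"])
tau_short <- unname(coef(short)["d"])
lambda    <- unname(coef(long)["l"])   # coefficient on L
delta     <- unname(coef(aux)["d"])    # coefficient on D in the auxiliary regression

# Bias of the short regression as the product of the two terms
bias_formula <- lambda * delta
```

The gap `tau_short - tau_long` matches `bias_formula` up to floating-point error, which is the sample version of the two-term product.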
4 Estimating causal effects under no unmeasured confounders
Basic approach to estimation
• Remember the usual approach to estimating the ATE with covariates:
• Stratification:
  - Stratify the units by the covariates
  - Calculate the CATE within these strata
• Standardization/direct adjustment:
  - Average the CATEs across the strata to get the ATE
• How to create strata when $X$ has continuous components?
  - If $X$ is discrete with only a few levels, can use the exact values of $X$
  - Otherwise, we may have to subclassify/coarsen the data
Classic example: cigars/pipes versus cigarettes
• $D_i = 1$ for pipe/cigar smokers, $D_i = 0$ for cigarette smokers
• $Y_i$ = death in the first year of follow-up
• Naive positive effect: cigar/pipe smokers more likely to die
  - What's the confounder here? Age
  - Pipe/cigar smokers much older than cigarette smokers
• Cochran's approach: stratify based on coarsened age
  - Divide age into $k$ strata, $S_i \in \{s_1, s_2, \ldots, s_k\}$
  - $s_1$ might be 18-25, $s_2$ might be 26-35, and so on
  - Calculate effect within strata and aggregate
• Key assumption: no unmeasured confounders using the stratified version of age
$$D_i \perp\!\!\!\perp \bigl(Y_i(0), Y_i(1)\bigr) \mid S_i$$
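A rough sketch of stratifying on coarsened age follows. The data are simulated, not Cochran's: the age range, the strata boundaries, and the coefficients are all invented, and the true effect of pipe/cigar smoking is set to zero so that any naive difference is pure confounding by age.

```r
# Simulated illustration; this is NOT Cochran's data, and all numbers are hypothetical.
set.seed(7)
n   <- 20000
age <- runif(n, 18, 85)
d   <- rbinom(n, 1, plogis(-3 + 0.06 * age))   # older people more often smoke pipes/cigars
y   <- rbinom(n, 1, plogis(-6 + 0.06 * age))   # death risk rises with age; no effect of d

# Naive difference in means, confounded by age
naive <- mean(y[d == 1]) - mean(y[d == 0])

# Coarsen age into strata S_i
s <- cut(age, breaks = c(18, 25, 35, 45, 55, 65, 75, 85), include.lowest = TRUE)

# Within-stratum differences in means
diff_k <- sapply(split(data.frame(y, d), s), function(g) {
  mean(g$y[g$d == 1]) - mean(g$y[g$d == 0])
})

# Share of units in each stratum (same order as levels(s))
share_k <- as.numeric(table(s)) / n

# Aggregate the within-stratum effects
stratified <- sum(diff_k * share_k)
```

In this setup `naive` is positive while `stratified` sits near zero. Some residual confounding remains within each age band, which is the price of coarsening.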
Stratification on the propensity score
• What about when $X$ has many dimensions?
• Curse of dimensionality: there will be very few, if any, units in a given stratum of $X_i$
• Stratify on a low-dimensional summary, the propensity score:
$$e(x) = \mathbb{P}[D_i = 1 \mid X_i = x]$$
• PS = unit's probability of being treated, conditional on $X_i$
• For a particular unit, this is $e(X_i) = \mathbb{P}[D_i = 1 \mid X_i]$
• Rosenbaum and Rubin (1983) showed that:
$$D_i \perp\!\!\!\perp \bigl(Y_i(0), Y_i(1)\bigr) \mid X_i \implies D_i \perp\!\!\!\perp \bigl(Y_i(0), Y_i(1)\bigr) \mid e(X_i)$$
  - For removing confounding, stratifying on $e(X_i)$ is the same as stratifying on the full $X_i$
Propensity score as balancing score
• The propensity score is actually a balancing score, which means that:
$$D_i \perp\!\!\!\perp X_i \mid e(X_i)$$
• Conditional on the propensity score, treatment is independent of the covariates
  - Treatment status is said to be balanced: $f(X_i \mid D_i = 1, e(X_i)) = f(X_i \mid D_i = 0, e(X_i))$
• Of course, we have to know the true PS to have all these results work
Estimating the propensity score
• Of course, in observational studies we don't know the propensity score
• We would run a parametric model with parameters $\gamma$ to estimate the propensity scores:
  1. Estimate $\hat{\gamma}$
  2. Create $\hat{e}_i = \Pr[D_i = 1 \mid X_i; \hat{\gamma}]$
• For instance, in R, we could easily calculate the propensity scores using the glm function:
    # Logistic regression of treatment on covariates; fitted values are the estimated propensity scores
    pscores <- glm(treat ~ var1 + var2 + var3, data = mydata,
                   family = binomial())$fitted.values
Propensity score specifics
• What variables do we include in the propensity score model?
  - Any set of variables that blocks all the backdoor paths from $D_i$ to $Y_i$
• Check balance within strata of $\hat{e}_i$. Covariates should be balanced:
$$f(X_i \mid D_i = 1, \hat{e}_i) = f(X_i \mid D_i = 0, \hat{e}_i)$$
• Can also use automated/nonparametric tools for estimating $\hat{e}_i$
  - Covariate Balancing Propensity Scores (Imai and Ratkovic)
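One way to check balance within strata of the estimated propensity score is to compare standardized differences in covariate means before and after blocking. The sketch below is an assumption-laden example: the data are simulated, quintile blocks are just one possible choice, and `smd` is a hypothetical helper, not a function from any package.

```r
# Simulated illustration; all names and numbers are hypothetical.
set.seed(11)
n  <- 10000
x1 <- rnorm(n)
x2 <- rnorm(n)
d  <- rbinom(n, 1, plogis(0.8 * x1 - 0.6 * x2))

# Estimated propensity scores from a logistic regression
e_hat <- glm(d ~ x1 + x2, family = binomial())$fitted.values

# Blocks defined by quintiles of the estimated propensity score
block <- cut(e_hat, breaks = quantile(e_hat, probs = seq(0, 1, 0.2)),
             include.lowest = TRUE)

# Hypothetical helper: difference in covariate means, scaled by a common sd
smd <- function(x, d, scale) (mean(x[d == 1]) - mean(x[d == 0])) / scale

# Imbalance in x1 before blocking, and within each block
raw_x1    <- smd(x1, d, sd(x1))
within_x1 <- sapply(split(data.frame(x1, d), block), function(g) {
  smd(g$x1, g$d, sd(x1))
})
```

If the propensity score model is adequate, the entries of `within_x1` should be much closer to zero than `raw_x1`. Large within-block differences suggest respecifying the model or using finer blocks.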
Stratifying by the propensity score
• How will we use the propensity score?
  - Matching (next week)
  - Weighting (two weeks)
  - Regression (three weeks)
• Today: coarsening the propensity score and stratifying
• Choose boundary points: $0 = b_0 < b_1 < \ldots < b_{K-1} < b_K = 1$
• Create block indicators:
$$B_i(k) = \begin{cases} 1 & \text{if } b_{k-1} < \hat{e}(X_i) < b_k \\ 0 & \text{otherwise} \end{cases}$$
• Calculate within-strata effect estimates:
$$\tau_k = \mathbb{E}[Y_i \mid D_i = 1, B_i(k) = 1] - \mathbb{E}[Y_i \mid D_i = 0, B_i(k) = 1]$$
Standardization/direct adjustment
• We calculated the CATEs for each stratum of the PS, $\tau_k$
• We can use the law of iterated expectations to back out the ATE
• Take the average of the CATEs over the distribution of $X$:
$$\tau = \sum_{k=1}^{K} \tau_k \, \mathbb{P}[B_i(k) = 1]$$
• Note that $\mathbb{P}[B_i(k) = 1]$ is just the proportion of units in block $k$:
$$\mathbb{P}[B_i(k) = 1] = \frac{\sum_{i=1}^{N} B_i(k)}{N}$$
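Putting the pieces together, a minimal sketch of the propensity score stratification estimator might look like this. The simulated data, the constant effect of 2, the choice of $K = 5$, and the use of sample quantiles as boundary points are all assumptions of the example.

```r
# Simulated illustration; all names and numbers are hypothetical.
set.seed(2015)
n  <- 10000
x1 <- rnorm(n)
x2 <- rnorm(n)
d  <- rbinom(n, 1, plogis(0.8 * x1 - 0.6 * x2))
y  <- 1 + 2 * d + x1 - x2 + rnorm(n)   # constant effect tau = 2 (assumed)

# Naive difference in means, ignoring the covariates
naive <- mean(y[d == 1]) - mean(y[d == 0])

# Step 1: estimate the propensity score
e_hat <- glm(d ~ x1 + x2, family = binomial())$fitted.values

# Step 2: choose boundary points (here, quantiles of e_hat) and assign blocks
K     <- 5
b     <- quantile(e_hat, probs = seq(0, 1, length.out = K + 1))
block <- cut(e_hat, breaks = b, include.lowest = TRUE, labels = FALSE)

# Step 3: within-block differences in means (tau_k)
tau_k <- sapply(1:K, function(k) {
  in_k <- block == k
  mean(y[in_k & d == 1]) - mean(y[in_k & d == 0])
})

# Step 4: weight each tau_k by the share of units in block k
share_k <- sapply(1:K, function(k) mean(block == k))
tau_hat <- sum(tau_k * share_k)
```

Here `tau_hat` lands much closer to the assumed effect of 2 than `naive` does. A small bias remains because treated and control units still differ somewhat within each block; more blocks shrink it at the cost of noisier within-block estimates.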
5 Wrapping Up
Summary
• Defined observational studies
• Defined confounding and assessed when no unmeasured confounding holds
• Saw how no unmeasured confounding helps with OLS
• Saw how to estimate causal effects under no unmeasured confounding using the propensity score
Next few weeks
• Learn how to estimate causal effects under no unmeasured confounders via:
  - Matching
  - Weighting
  - Regression
• Then we move on to situations where no unmeasured confounders is violated
Experiment review
bull An experiment is a study where assignment to treatment iscontrolled by the researcher
119901119894 = ℙ[119863119894 = 1] be the probability of treatment assignmentprobability
119901119894 is controlled and known by researcher in an experimentbull A randomized experiment is an experiment with the following
properties
1 Positivity assignment is probabilistic 0 lt 119901119894 lt 1 No deterministic assignment
2 Unconfoundedness ℙ[119863119894 = 1|119832(1) 119832(0)] = ℙ[119863119894 = 1] Treatment assignment does not depend on any potential
outcomes Sometimes written as 119863119894 ⟂⟂ (119832(1) 119832(0))
Observational studies
bull Many different sets of identification assumptions that wersquollcover
bull To start focus on studies that are similar to experiments justwithout a known and controlled treatment assignment
No guarantee that the treatment and control groups arecomparable
1 Positivity assignment is probabilistic0 lt ℙ[119863119894 = 1|119831 119832(1) 119832(0)] lt 1
2 No unmeasured confoundingℙ[119863119894 = 1|119831 119832(1) 119832(0)] = ℙ[119863119894 = 1|119831]
For some observed 119831 Also called unconfoundedness ignorability selection on
observables no omitted variables exogenous conditionalexchangeable etc
Designing observational studies
bull Rubin (2008) argues that we should still ldquodesignrdquo ourobservational studies
Pick the ideal experiment to this observational study Hide the outcome data Try to estimate the randomization procedure Analyze this as an experiment with this estimated procedure
bull Tries to minimize ldquosnoopingrdquo by picking the best modelingstrategy before seeing the outcome
Discrete covariates
bull Suppose that we knew that 119863119894 was unconfounded within levelsof a binary 119883119894
bull Then we could always estimate the causal effect using iteratedexpectations as in a stratified randomized experiment
1201241198831114107120124[119884119894|119863119894 = 1119883119894] minus 120124[119884119894|119863119894 = 0119883119894]1114110
= 1114101120124[119884119894|119863119894 = 1119883119894 = 1] minus 120124[119884119894|119863119894 = 0119883119894 = 1]1114104111405911138401113840111384011138401113840111384011138401113840111384011138401113840111384011138401113840111384011138401113840111384011138401113840111384011138401113840111384011138401113840111384011138401113840111384011140601113840111384011138401113840111384011138401113840111384011138401113840111384011138401113840111384011138401113840111384011138401113840111384011138401113840111384011138401113840111384011138401113840111384011138401114061
diff-in-means for 119883119894=1113568
ℙ[119883119894 = 1]11140591113840111384011138401113840111406011138401113840111384011138401114061share of 119883119894=1113568
+ 1114101120124[119884119894|119863119894 = 1119883119894 = 0] minus 120124[119884119894|119863119894 = 0119883119894 = 0]1114104111405911138401113840111384011138401113840111384011138401113840111384011138401113840111384011138401113840111384011138401113840111384011138401113840111384011138401113840111384011138401113840111384011138401113840111384011140601113840111384011138401113840111384011138401113840111384011138401113840111384011138401113840111384011138401113840111384011138401113840111384011138401113840111384011138401113840111384011138401113840111384011138401114061
diff-in-means for 119883119894=1113567
ℙ[119883119894 = 0]11140591113840111384011138401113840111406011138401113840111384011138401114061share of 119883119894=1113567
bull Never used our knowledge of the randomization for thisquantity
Continuous covariates
• So, great, we can stratify. Why not do this all the time?
• What if \(X_i\) = income for unit \(i\)?
  - Each unit has its own value of \(X_i\): $54,134, $123,043, $23,842.
  - If \(X_i = 54134\) is unique, we will only observe one of these:
\[
\mathbb{E}[Y_i \mid D_i = 1, X_i = 54134] - \mathbb{E}[Y_i \mid D_i = 0, X_i = 54134]
\]
  - So we cannot stratify on each unique value of \(X_i\).
• Practically, this is massively important: we almost always have data with unique values.
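To see how quickly exact stratification breaks down, here is a small R sketch with a simulated, dollar-valued income variable. The lognormal income distribution and the 50/50 treatment assignment are assumptions chosen only for illustration.

```r
# Simulated data: the income distribution and sample size are hypothetical.
set.seed(2002)
n <- 1000
income <- round(rlnorm(n, meanlog = 11, sdlog = 0.7))   # incomes in whole dollars
d <- rbinom(n, 1, 0.5)

# Share of income values (strata) that contain exactly one unit
counts <- table(income)
share_unique <- mean(counts == 1)

# Share of strata containing both a treated and a control unit,
# which is what a within-stratum difference in means requires
has_both <- tapply(d, income, function(g) length(unique(g)) == 2)
share_usable <- mean(has_both)
```

With values this fine-grained, almost every stratum holds a single unit, so almost no within-stratum comparison can be computed.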
Going to a superpopulation
• From here on out, we'll focus less on the finite-population model.
  - Harder with (functionally) continuous covariates.
• Assume that each unit \(i\) is drawn from an infinite superpopulation.
  - Implies that \((Y_i(0), Y_i(1), D_i, X_i)\) are a draw from their population joint distribution.
• Potential outcomes are now typical random variables:
  - \(\mu_c(x) = \mathbb{E}[Y_i(0) \mid X_i = x]\) and \(\mu_t(x) = \mathbb{E}[Y_i(1) \mid X_i = x]\)
  - \(\sigma^2_c(x) = \mathbb{V}[Y_i(0) \mid X_i = x]\) and \(\sigma^2_t(x) = \mathbb{V}[Y_i(1) \mid X_i = x]\)
  - \(\tau(x) = \mathbb{E}[Y_i(1) - Y_i(0) \mid X_i = x] = \mu_t(x) - \mu_c(x)\)
Assumptions in the superpopulation
• With an infinite superpopulation, we worry less about conditioning on the entire sample.
  - Units are now independent due to random sampling from an infinite population.
• No unmeasured confounding implies that
\[
\mathbb{P}(D_i = 1 \mid Y_i(0), Y_i(1), X_i) = \mathbb{P}(D_i = 1 \mid X_i)
\]
• Or, written using conditional independence:
\[
D_i \perp\!\!\!\perp \big(Y_i(0), Y_i(1)\big) \mid X_i
\]
• Positivity can be written: \(0 < \mathbb{P}(D_i = 1 \mid X_i = x) < 1\) for all \(x\) in the support of \(X_i\).
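A short worked derivation of why these two assumptions are enough to identify the conditional means of the potential outcomes. This is the standard argument, written out here as a sketch rather than quoted from the slides; it also relies on consistency, \(Y_i = Y_i(1) D_i + Y_i(0)(1 - D_i)\).

```latex
\begin{aligned}
\mathbb{E}[Y_i(1) \mid X_i = x]
  &= \mathbb{E}[Y_i(1) \mid D_i = 1, X_i = x]
  && \text{(no unmeasured confounding)} \\
  &= \mathbb{E}[Y_i \mid D_i = 1, X_i = x]
  && \text{(consistency)} \\[4pt]
\mathbb{E}[Y_i(0) \mid X_i = x]
  &= \mathbb{E}[Y_i \mid D_i = 0, X_i = x]
  && \text{(same two steps)} \\[4pt]
\mathbb{E}[Y_i(1) - Y_i(0)]
  &= \mathbb{E}_X\big\{\mathbb{E}[Y_i \mid D_i = 1, X_i] - \mathbb{E}[Y_i \mid D_i = 0, X_i]\big\}
  && \text{(iterated expectations)}
\end{aligned}
```

Positivity is what makes both conditional expectations on the right-hand side well defined: at every \(x\) in the support of \(X_i\) there are treated and control units to condition on.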
2 Confounding
What is confounding?
• Confounding is the bias caused by common causes of the treatment and outcome.
  - Leads to "spurious correlation."
• In observational studies, the goal is to avoid confounding inherent in the data.
• Pervasive in the social sciences:
  - effect of income on voting (confounding: age)
  - effect of job training program on employment (confounding: motivation)
  - effect of political institutions on economic development (confounding: previous economic development)
• No unmeasured confounding assumes that we've measured all sources of confounding.
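A small simulated illustration of spurious correlation from a common cause, sketched in R. The setup is an assumption for illustration: the treatment has no effect at all on the outcome, and a single covariate `x` drives both.

```r
# Simulated data: names and coefficients are hypothetical.
set.seed(2002)
n <- 10000
x <- rnorm(n)                    # common cause of treatment and outcome
d <- rbinom(n, 1, plogis(x))     # treatment depends on x only
y <- 2 * x + rnorm(n)            # outcome depends on x only: true effect of d is 0

# Unadjusted difference in means picks up the x-driven association
naive <- mean(y[d == 1]) - mean(y[d == 0])

# Conditioning on the common cause removes it
adjusted <- unname(coef(lm(y ~ d + x))["d"])
```

Here the regression adjustment works because `x` is measured and enters the outcome linearly; neither is guaranteed in real data.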
Big problem
• How can we determine if no unmeasured confounding holds if we didn't assign the treatment?
• Put differently:
  - What covariates do we need to condition on?
  - What covariates do we need to match on?
  - What covariates do we need to include in our regressions?
• One way, from the assumption itself:
  - \(\mathbb{P}[D_i = 1 \mid \mathbf{X}, \mathbf{Y}(1), \mathbf{Y}(0)] = \mathbb{P}[D_i = 1 \mid \mathbf{X}]\)
  - Include covariates such that, conditional on them, the treatment assignment does not depend on the potential outcomes.
• Another way: use DAGs and look at backdoor paths.
Backdoor paths and blocking paths
• A backdoor path is a non-causal path from \(D\) to \(Y\).
  - Would remain if we removed any arrows pointing out of \(D\).
• Backdoor paths between \(D\) and \(Y\): common causes of \(D\) and \(Y\).
[Figure: DAG with nodes D, X, and Y]
• Here there is a backdoor path \(D \leftarrow X \rightarrow Y\), where \(X\) is a common cause for the treatment and the outcome.
Other types of confounding
[Figure: DAG with nodes D, U, X, and Y]
• \(D\) is enrolling in a job training program.
• \(Y\) is getting a job.
• \(U\) is being motivated.
• \(X\) is number of job applications sent out.
• Big assumption here: no arrow from \(U\) to \(Y\).
Other types of confounding
[Figure: DAG with nodes D, U, X, and Y]
• \(D\) is exercise.
• \(Y\) is having a disease.
• \(U\) is lifestyle.
• \(X\) is smoking.
• Big assumption here: no arrow from \(U\) to \(Y\).
What's the problem with backdoor paths?
[Figure: DAG with nodes D, U, X, and Y]
• A path is blocked if:
  1. we control for or stratify on a non-collider on that path, OR
  2. we do not control for a collider.
• Unblocked backdoor paths imply confounding.
• In the DAG here, if we condition on \(X\), then the backdoor path is blocked.
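A sketch in R of blocking the backdoor path \(D \leftarrow U \rightarrow X \rightarrow Y\) by conditioning on \(X\), even though \(U\) is never used in the analysis. The simulated structure (no arrow from \(U\) to \(Y\), linear outcome, true effect of 1) is an assumption for illustration.

```r
# Simulated data following D <- U -> X -> Y plus D -> Y; values are hypothetical.
set.seed(2002)
n <- 20000
u <- rnorm(n)                     # unobserved (e.g., motivation)
x <- u + rnorm(n)                 # X is caused by U
d <- rbinom(n, 1, plogis(u))      # D is caused by U
y <- 1 * d + 2 * x + rnorm(n)     # Y is caused by D and X, not directly by U

# Backdoor path left open: U-driven differences in X contaminate the estimate
unadjusted <- unname(coef(lm(y ~ d))["d"])

# Conditioning on the non-collider X blocks the path; u is not needed
adjusted <- unname(coef(lm(y ~ d + x))["d"])
```

If an arrow from `u` to `y` were added to the simulation, conditioning on `x` alone would no longer be enough, which is why the "no arrow from U to Y" assumption matters.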
Not all backdoor paths
[Figure: DAG with nodes D, U_1, X, and Y]
• Conditioning on posttreatment covariates opens the non-causal path.
  - This is selection bias.
M-bias
[Figure: DAG with nodes D, U_1, U_2, X, and Y]
• Not all backdoor paths induce confounding.
• This backdoor path is blocked by the collider \(X_i\) that we don't control for.
• If we control for \(X_i\), this opens the path and induces confounding.
  - Sometimes called M-bias.
• Controversial because of differing views on what to control for:
  - Rubin thinks that M-bias is a "mathematical curiosity" and we should control for all pretreatment variables.
  - Pearl and others think M-bias is a real threat.
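A simulated illustration of M-bias in R. The structure \(D \leftarrow U_1 \rightarrow X \leftarrow U_2 \rightarrow Y\) and all coefficients are assumptions for illustration; how large the bias is in practice depends entirely on such values, which is part of the disagreement noted above.

```r
# Simulated M-structure; names and coefficients are hypothetical.
set.seed(2002)
n <- 50000
u1 <- rnorm(n)                    # unobserved cause of D and X
u2 <- rnorm(n)                    # unobserved cause of Y and X
x <- u1 + u2 + rnorm(n)           # pretreatment collider
d <- rbinom(n, 1, plogis(u1))
y <- 1 * d + 2 * u2 + rnorm(n)    # true effect of d is 1

# Leaving the collider alone keeps the backdoor path blocked
without_x <- unname(coef(lm(y ~ d))["d"])

# Controlling for the collider opens the path through u1 and u2
with_x <- unname(coef(lm(y ~ d + x))["d"])
```

In this setup `without_x` stays close to the true effect while `with_x` is pulled away from it, even though `x` is measured before treatment.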
Backdoor criterion
• Can we use a DAG to evaluate no unmeasured confounders?
• Pearl answered yes, with the backdoor criterion, which states that the effect of \(D\) on \(Y\) is identified if:
  1. there are no backdoor paths from \(D\) to \(Y\), OR
  2. measured covariates are sufficient to block all backdoor paths from \(D\) to \(Y\).
• The first is really only valid for randomized experiments.
• The backdoor criterion is fairly powerful. It tells us:
  - if there is confounding given this DAG,
  - if it is possible to remove the confounding, and
  - what variables to condition on to eliminate the confounding.
SWIGs
[Figure: SWIG with D split into D | d, plus nodes U, X, and Y(d)]
• It's a little hard to see how the backdoor criterion implies no unmeasured confounders.
  - No potential outcomes on this graph!
• Richardson and Robins: Single World Intervention Graphs.
  - Split the \(D\) node into natural value (\(D\)) and intervention value \(d\).
  - Let all effects of \(D\) take their potential value under intervention: \(Y(d)\).
• Now can see: are \(D\) and \(Y(d)\) related?
  - \(D \leftarrow U \rightarrow X \rightarrow Y(d)\) implies not independent.
  - Conditioning on \(X\) blocks that backdoor path, so \(D \perp\!\!\!\perp Y(d) \mid X\).
No unmeasured confounders is not testable
• No unmeasured confounding places no restrictions on the observed data:
\[
\underbrace{\big(Y_i(0) \mid D_i = 1, X_i\big)}_{\text{unobserved}} \overset{d}{=} \underbrace{\big(Y_i(0) \mid D_i = 0, X_i\big)}_{\text{observed}}
\]
• Here \(\overset{d}{=}\) means equal in distribution.
• No way to directly test this assumption without the counterfactual data, which is missing by definition!
• With the backdoor criterion, you must have the correct DAG.
Assessing no unmeasured confounders
• Can do "placebo" tests, where \(D_i\) cannot have an effect (lagged outcomes, etc.).
• DellaVigna and Kaplan (2007, QJE): effect of Fox News availability on Republican vote share.
  - Availability in 2000/2003 can't affect past vote shares.
• Unconfoundedness could still be violated even if you pass this test!
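A rough R sketch of a placebo test on a lagged outcome. The data are simulated and every name is hypothetical; this is not the DellaVigna and Kaplan analysis. The pre-treatment outcome `y_lag` cannot be affected by `d`, so a clearly nonzero coefficient on `d` signals that the conditioning set is inadequate.

```r
# Simulated data with one measured (x) and one unmeasured (u) confounder.
set.seed(2002)
n <- 5000
x <- rnorm(n)
u <- rnorm(n)                          # not available to the analyst in practice
d <- rbinom(n, 1, plogis(x + u))
y_lag <- x + u + rnorm(n)              # measured before treatment

# Placebo regression conditioning only on x: u is still confounding
placebo_x_only <- summary(lm(y_lag ~ d + x))$coefficients["d", ]

# For comparison only: if u could be measured, the placebo effect vanishes
placebo_with_u <- summary(lm(y_lag ~ d + x + u))$coefficients["d", ]
```

Each result is a named vector holding the estimate, standard error, t value, and p-value for the coefficient on `d`. A placebo estimate near zero is reassuring but, as noted above, does not prove unconfoundedness for the actual outcome.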
Alternatives to no unmeasured confounding
• Without explicit randomization, we need some way of identifying causal effects.
• No unmeasured confounders ≈ randomized experiment.
  - Identification results very similar to experiments.
• With unmeasured confounding, are we doomed? Maybe not!
• Other approaches rely on finding plausibly exogenous variation in assignment of \(D_i\):
  - Instrumental variables (randomization + exclusion restriction)
  - Over-time variation (diff-in-diff, fixed effects)
  - Arbitrary thresholds for treatment assignment (RDD)
3 No unmeasured confounders and OLS
Justifying regression
• We know how randomized experiments imply that differences-in-means identify the ATE.
• In the next few weeks, we'll work through how no unmeasured confounding justifies a number of estimation strategies.
• Today, it's useful to walk through what no unmeasured confounding can buy us in a familiar setting: OLS.
  - We'll cover regression more formally later.
Constant effects set up
• Assume a constant effects setup:
\[
Y_i(0) = \alpha + X_i'\beta + u_i
\]
\[
Y_i(1) = \alpha + \tau + X_i'\beta + u_i
\]
• Constant effects because \(Y_i(1) - Y_i(0) = \tau\) for all units.
• Use consistency to get the usual regression formula:
\[
\begin{aligned}
Y_i &= Y_i(1) D_i + Y_i(0)(1 - D_i) \\
    &= Y_i(0) + \big(Y_i(1) - Y_i(0)\big) \cdot D_i \\
    &= \alpha + \tau \cdot D_i + X_i'\beta + u_i
\end{aligned}
\]
• Does no unmeasured confounding help us identify the causal parameter \(\tau\)?
Regression on residuals
• First, estimate the residuals of regression of the treatment and outcome on the covariates:
\[
\tilde{Y}_i = Y_i - \mathbb{E}[Y_i \mid X_i], \qquad \tilde{D}_i = D_i - \mathbb{E}[D_i \mid X_i]
\]
• Running a regression of \(\tilde{Y}_i\) on \(\tilde{D}_i\) is equivalent to controlling for \(X_i\):
\[
Y_i = \alpha + \tau \cdot D_i + X_i'\beta + u_i
\]
\[
\tilde{Y}_i = \alpha + \tau \cdot \tilde{D}_i + \tilde{u}_i
\]
• Here \(\tilde{u}_i = u_i - \mathbb{E}[u_i \mid X_i]\).
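The equivalence between the residual regression and controlling for the covariates can be checked numerically. A minimal R sketch on simulated data, assuming the conditional expectations are approximated by linear regressions on a single covariate `x` (all names and coefficients are hypothetical):

```r
# Simulated data; the constant effect tau = 2 is an assumption.
set.seed(2002)
n <- 2000
x <- rnorm(n)
d <- rbinom(n, 1, plogis(x))
y <- 1 + 2 * d + 3 * x + rnorm(n)

# Coefficient on d when controlling for x directly
full <- unname(coef(lm(y ~ d + x))["d"])

# Residualize outcome and treatment on x, then regress residual on residual
y_tilde <- resid(lm(y ~ x))
d_tilde <- resid(lm(d ~ x))
partial <- unname(coef(lm(y_tilde ~ d_tilde))["d_tilde"])

# The two coefficients agree up to floating-point error
same <- isTRUE(all.equal(full, partial))
```

This in-sample equality is the Frisch-Waugh-Lovell result for linear projections. With a nonlinear conditional expectation, the linear residuals used here would only approximate the residuals defined above.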
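What does OLS estimate?
• Using the usual OLS theory, we can show that the probability limit of the OLS estimator of \(\tau\) is:
\[
\begin{aligned}
\operatorname{plim} \hat{\tau}_{OLS} &= \frac{\operatorname{Cov}(\tilde{D}_i, \tilde{Y}_i)}{\operatorname{Var}(\tilde{D}_i)} \\
&= \frac{\operatorname{Cov}(\tilde{D}_i, \alpha + \tau \cdot \tilde{D}_i + \tilde{u}_i)}{\operatorname{Var}(\tilde{D}_i)} \\
&= \frac{\tau \cdot \operatorname{Cov}(\tilde{D}_i, \tilde{D}_i) + \operatorname{Cov}(\tilde{D}_i, \tilde{u}_i)}{\operatorname{Var}(\tilde{D}_i)} \\
&= \tau + \frac{\operatorname{Cov}(\tilde{D}_i, \tilde{u}_i)}{\operatorname{Var}(\tilde{D}_i)}
\end{aligned}
\]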
Key OLS assumption
\[
\operatorname{plim} \hat{\tau}_{OLS} = \tau + \frac{\operatorname{Cov}(\tilde{D}_i, \tilde{u}_i)}{\operatorname{Var}(\tilde{D}_i)}
\]
• Key identification comes from \(\operatorname{Cov}(\tilde{D}_i, \tilde{u}_i) = 0\).
  - Conditional on \(X_i\), no relationship between \(D_i\) and \(u_i\).
• Note: \(u_i\) is a function of \(X_i\) and \(Y_i(d)\):
  - \(u_i = Y_i(0) - \alpha - X_i'\beta\) when \(D_i = 0\)
  - \(u_i = Y_i(1) - \alpha - \tau - X_i'\beta\) when \(D_i = 1\)
  - Conditional on \(X_i\), the only variation in \(u_i\) comes from \(Y_i(d)\).
• No unmeasured confounding implies this assumption:
\[
D_i \perp\!\!\!\perp \big(Y_i(1), Y_i(0)\big) \mid X_i \implies D_i \perp\!\!\!\perp u_i \mid X_i \implies \operatorname{Cov}(\tilde{D}_i, \tilde{u}_i) = 0
\]
Omitted variable bias
• What happens when this is violated? Suppose that there is one omitted variable (residualized from \(X_i\)):
\[
\tilde{u}_i = \lambda \tilde{L}_i + \omega_i
\]
• We'll assume that if we could measure \(L_i\), then no unmeasured confounding would hold.
• Leads to inconsistency in the OLS estimator:
\[
\operatorname{plim} \hat{\tau}_{OLS} = \tau + \lambda \frac{\operatorname{Cov}(\tilde{D}_i, \tilde{L}_i)}{\operatorname{Var}(\tilde{D}_i)}
\]
• Bias here is two terms multiplied together:
  1. the coefficient on \(L_i\) in the outcome equation (\(\lambda\)), and
  2. the coefficient on \(D_i\) from a regression of \(L_i\) on \(D_i\), also controlling for \(X_i\).
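The omitted variable bias formula can be verified on simulated data. A minimal R sketch, where the omitted variable `l_omit`, the effect of 2, and the coefficient of 1.5 on the omitted variable are all assumptions for illustration:

```r
# Simulated data; all names and coefficients are hypothetical.
set.seed(2002)
n <- 5000
x <- rnorm(n)
l_omit <- 0.5 * x + rnorm(n)                      # the omitted variable L
d <- rbinom(n, 1, plogis(x + l_omit))
y <- 1 + 2 * d + 3 * x + 1.5 * l_omit + rnorm(n)

# Short regression omits L; long regression includes it
short <- unname(coef(lm(y ~ d + x))["d"])
long <- coef(lm(y ~ d + x + l_omit))

# Auxiliary regression: L on D, controlling for X
delta <- unname(coef(lm(l_omit ~ d + x))["d"])

# Bias = (coefficient on L) x (coefficient on D in the auxiliary regression)
bias_formula <- unname(long["l_omit"]) * delta
bias_actual <- short - unname(long["d"])
```

In sample, `bias_actual` and `bias_formula` coincide up to floating-point error, which is the finite-sample counterpart of the probability-limit expression above.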
4 Estimating causal effects under no unmeasured confounders
Basic approach to estimation
• Remember the usual approach to estimating the ATE with covariates.
• Stratification:
  - Stratify the units by the covariates.
  - Calculate the CATE within these strata.
• Standardization/direct adjustment:
  - Average the CATEs across the strata to get the ATE.
• How to create strata when \(X\) has continuous components?
  - If \(X\) is discrete with only a few levels, can use the exact values of \(X\).
  - Otherwise, we may have to subclassify/coarsen the data.
Classic example: cigars/pipes versus cigarettes
• \(D_i = 1\) for pipe/cigar smokers, \(D_i = 0\) for cigarette smokers.
• \(Y_i\) = death in the first year of follow-up.
• Naive positive effect: cigar/pipe smokers more likely to die.
  - What's the confounder here? Age!
  - Pipe/cigar smokers much older than cigarette smokers.
• Cochran's approach: stratify based on coarsened age.
  - Divide age into \(k\) strata: \(S_i \in \{s_1, s_2, \ldots, s_k\}\).
  - \(s_1\) might be 18-25, \(s_2\) might be 26-35, and so on.
  - Calculate effect within strata and aggregate.
• Key assumption: no unmeasured confounders using the stratified version of age:
\[
D_i \perp\!\!\!\perp \big(Y_i(0), Y_i(1)\big) \mid S_i
\]
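A sketch of this coarsen-then-stratify approach in R. The data are simulated and are not Cochran's; the age range, the cut points, and the assumption that treatment has no effect on the outcome are all hypothetical choices for illustration.

```r
# Simulated data: older units are more likely to be treated and more likely to die.
set.seed(2002)
n <- 20000
age <- runif(n, 18, 85)
d <- rbinom(n, 1, plogis((age - 50) / 10))
y <- rbinom(n, 1, plogis(-6 + 0.06 * age))     # true effect of d is 0

# Unadjusted comparison, confounded by age
naive <- mean(y[d == 1]) - mean(y[d == 0])

# Coarsen age into strata, then take a difference in means within each stratum
strata <- cut(age, breaks = c(18, 25, 35, 45, 55, 65, 75, 85),
              include.lowest = TRUE)
tau_s <- sapply(split(data.frame(y, d), strata),
                function(g) mean(g$y[g$d == 1]) - mean(g$y[g$d == 0]))

# Aggregate, weighting each stratum by its share of the sample
share_s <- as.numeric(table(strata)) / n
stratified <- sum(tau_s * share_s)
```

Some age imbalance remains inside each stratum, so `stratified` is only approximately free of confounding; narrower strata trade less residual bias for fewer units per stratum.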
Stratification on the propensity score
• What about when \(X\) has many dimensions?
• Curse of dimensionality: there will be very few, if any, units in a given stratum of \(X_i\).
• Stratify on a low-dimensional summary, the propensity score:
\[
e(x) = \mathbb{P}[D_i = 1 \mid X_i = x]
\]
  - PS = unit's probability of being treated, conditional on \(X_i\).
• For a particular unit, this is \(e(X_i) = \mathbb{P}[D_i = 1 \mid X_i]\).
• Rosenbaum and Rubin (1983) showed that:
\[
D_i \perp\!\!\!\perp \big(Y_i(0), Y_i(1)\big) \mid X_i \implies D_i \perp\!\!\!\perp \big(Y_i(0), Y_i(1)\big) \mid e(X_i)
\]
  - Stratifying on \(e(X_i)\) is the same as stratifying on the full \(X_i\).
Propensity score as balancing score
• The propensity score is actually a balancing score, which means that:
\[
D_i \perp\!\!\!\perp X_i \mid e(X_i)
\]
• Conditional on the propensity score, treatment is independent of the covariates.
  - Covariates are said to be balanced across treatment status: \(f(X_i \mid D_i = 1, e(X_i)) = f(X_i \mid D_i = 0, e(X_i))\)
• Of course, we have to know the true PS to have all these results work.
Estimating the propensity score
• Of course, in observational studies we don't know the propensity score.
• We would run a parametric model with parameters \(\gamma\) to estimate the propensity scores:
  1. Estimate \(\hat{\gamma}\).
  2. Create \(\hat{e}_i = \Pr[D_i = 1 \mid X_i; \hat{\gamma}]\).
• For instance, in R, we could easily calculate the propensity scores using the glm function:

    # Logistic regression of treatment on covariates; the fitted values are the
    # estimated propensity scores (treat, var1-var3, and mydata are placeholders).
    pscores <- glm(treat ~ var1 + var2 + var3, data = mydata,
                   family = binomial())$fitted.values
Propensity score specifics
• What variables do we include in the propensity score model?
  - Any set of variables that blocks all the backdoor paths from \(D_i\) to \(Y_i\).
• Check balance within strata of \(\hat{e}_i\). Covariates should be balanced:
\[
f(X_i \mid D_i = 1, \hat{e}_i) = f(X_i \mid D_i = 0, \hat{e}_i)
\]
• Can also use automated/nonparametric tools for estimating \(\hat{e}_i\):
  - Covariate Balancing Propensity Scores (Imai and Ratkovic)
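One simple way to check balance is to compare covariate means between treated and control units inside blocks of the estimated propensity score. A sketch in R on simulated data; the covariates, the logistic specification, and the use of quintile blocks are assumptions for illustration, and comparing means is only a partial check of the full distributional equality.

```r
# Simulated data; names and coefficients are hypothetical.
set.seed(2002)
n <- 5000
x1 <- rnorm(n)
x2 <- rnorm(n)
d <- rbinom(n, 1, plogis(0.8 * x1 - 0.5 * x2))
dat <- data.frame(d, x1, x2)

# Estimated propensity scores from a logistic regression
dat$pscore <- glm(d ~ x1 + x2, data = dat, family = binomial())$fitted.values

# Five blocks with boundaries at the quintiles of the estimated score
dat$block <- cut(dat$pscore,
                 breaks = quantile(dat$pscore, probs = seq(0, 1, by = 0.2)),
                 include.lowest = TRUE)

# Treated-minus-control difference in the mean of covariate v
mean_diff <- function(g, v) mean(g[[v]][g$d == 1]) - mean(g[[v]][g$d == 0])

raw_diff <- mean_diff(dat, "x1")                                # before blocking
block_diff <- sapply(split(dat, dat$block), mean_diff, v = "x1")  # within blocks
```

If the within-block differences in `block_diff` remain large relative to `raw_diff`, that suggests either more blocks or a richer propensity score model.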
Stratifying by the propensity score
• How will we use the propensity score? Matching (next week), weighting (two weeks), regression (three weeks).
• Today: coarsening the propensity score and stratifying.
• Choose boundary points: \(0 = b_0 < b_1 < \ldots < b_{K-1} < b_K = 1\).
• Create block indicators:
\[
B_i(k) =
\begin{cases}
1 & \text{if } b_{k-1} < \hat{e}(X_i) \le b_k \\
0 & \text{otherwise}
\end{cases}
\]
• Calculate within-strata effect estimates:
\[
\tau_k = \mathbb{E}[Y_i \mid D_i = 1, B_i(k) = 1] - \mathbb{E}[Y_i \mid D_i = 0, B_i(k) = 1]
\]
Standardization/direct adjustment
• We calculated the CATEs for each stratum of the PS, \(\tau_k\).
• We can use the law of iterated expectations to back out the ATE.
• Take the average of the CATEs over the distribution of \(X\):
\[
\tau = \sum_{k=1}^{K} \tau_k \, \mathbb{P}[B_i(k) = 1]
\]
• Note that \(\mathbb{P}[B_i(k) = 1]\) is just the proportion of units in block \(k\):
\[
\mathbb{P}[B_i(k) = 1] = \frac{\sum_{i=1}^{N} B_i(k)}{N}
\]
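Putting the pieces together, here is a sketch in R of propensity score stratification followed by standardization. The simulated data, the true effect of 2, and the choice of ten blocks with boundaries at the deciles of the estimated score (rather than fixed points between 0 and 1) are assumptions for illustration.

```r
# Simulated data; names and coefficients are hypothetical.
set.seed(2002)
n <- 10000
x1 <- rnorm(n)
x2 <- rnorm(n)
d <- rbinom(n, 1, plogis(0.8 * x1 - 0.5 * x2))
y <- 2 * d + 2 * x1 - x2 + rnorm(n)
dat <- data.frame(y, d, x1, x2)

# Step 1: estimate the propensity score
dat$pscore <- glm(d ~ x1 + x2, data = dat, family = binomial())$fitted.values

# Step 2: create K blocks from boundary points on the estimated score
K <- 10
b <- quantile(dat$pscore, probs = seq(0, 1, length.out = K + 1))
dat$block <- cut(dat$pscore, breaks = b, include.lowest = TRUE)

# Step 3: difference in means within each block (tau_k)
tau_k <- sapply(split(dat, dat$block),
                function(g) mean(g$y[g$d == 1]) - mean(g$y[g$d == 0]))

# Step 4: weight by the proportion of units in each block and sum
p_k <- as.numeric(table(dat$block)) / nrow(dat)
ate_hat <- sum(tau_k * p_k)

# Unadjusted difference in means, for comparison
naive <- mean(dat$y[dat$d == 1]) - mean(dat$y[dat$d == 0])
```

Because the score is coarsened, some imbalance remains within blocks, so `ate_hat` is typically closer to the true effect than `naive` without matching it exactly. Every block must contain both treated and control units for `tau_k` to be defined.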
5 Wrapping Up
Summary
• Defined observational studies.
• Defined confounding and assessed when no unmeasured confounding holds.
• Saw how no unmeasured confounding helps with OLS.
• Saw how to estimate causal effects under no unmeasured confounding using the propensity score.
Next few weeks
• Learn how to estimate causal effects under no unmeasured confounders via:
  - Matching
  - Weighting
  - Regression
• Then we move on to situations where no unmeasured confounders is violated.
120591 =1198701114012119896=1113568
120591119896ℙ[119861119894(119896) = 1]
bull Note that ℙ[119861119894(119896) = 1] is just the proportion of units in block 119896
ℙ[119861119894(119896) = 1] =sum119873
119894=1113568 119861119894(119896)119873
5 Wrapping Up
Summary
bull Defined observational studiesbull Defined confounding and assessed when no unmeasured
confounding holdsbull Saw how no unmeasured confounding helps with OLSbull Saw how to estimate causal effects under no unmeasured
confounding using the propensity score
Next few weeks
bull Learn how to estimate causal effects under no unmeasuredconfounders via
Matching Weighting Regression
bull Then we move onto situations where no unmeasuredconfounders is violated
Designing observational studies
• Rubin (2008) argues that we should still “design” our observational studies:
  - Pick the ideal experiment for this observational study.
  - Hide the outcome data.
  - Try to estimate the randomization procedure.
  - Analyze this as an experiment with this estimated procedure.
• Tries to minimize “snooping” by picking the best modeling strategy before seeing the outcome.
Discrete covariates
• Suppose that we knew that $D_i$ was unconfounded within levels of a binary $X_i$.
• Then we could always estimate the causal effect using iterated expectations, as in a stratified randomized experiment:
  $$\begin{aligned}
  &\mathbb{E}_X\big\{\mathbb{E}[Y_i \mid D_i = 1, X_i] - \mathbb{E}[Y_i \mid D_i = 0, X_i]\big\} \\
  &\quad= \underbrace{\big(\mathbb{E}[Y_i \mid D_i = 1, X_i = 1] - \mathbb{E}[Y_i \mid D_i = 0, X_i = 1]\big)}_{\text{diff-in-means for } X_i = 1} \, \underbrace{\mathbb{P}[X_i = 1]}_{\text{share of } X_i = 1} \\
  &\quad+ \underbrace{\big(\mathbb{E}[Y_i \mid D_i = 1, X_i = 0] - \mathbb{E}[Y_i \mid D_i = 0, X_i = 0]\big)}_{\text{diff-in-means for } X_i = 0} \, \underbrace{\mathbb{P}[X_i = 0]}_{\text{share of } X_i = 0}
  \end{aligned}$$
• Never used our knowledge of the randomization for this quantity.
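A purely hypothetical numerical illustration of the weighted average (the numbers are made up, not taken from any study): suppose the difference in means is 3 among units with $X_i = 1$, who make up 40% of the sample, and 2 among units with $X_i = 0$, who make up the remaining 60%. The stratified estimate would then be:

```latex
\underbrace{3}_{\text{diff-in-means for } X_i = 1} \times \underbrace{0.4}_{\text{share of } X_i = 1}
\; + \;
\underbrace{2}_{\text{diff-in-means for } X_i = 0} \times \underbrace{0.6}_{\text{share of } X_i = 0}
\; = \; 1.2 + 1.2 \; = \; 2.4
```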
Continuous covariates
• So great, we can stratify. Why not do this all the time?
• What if $X_i$ = income for unit $i$?
  - Each unit has its own value of $X_i$: \$54134, \$123043, \$23842.
  - If $X_i = 54134$ is unique, will only observe 1 of these:
    $$\mathbb{E}[Y_i \mid D_i = 1, X_i = 54134] - \mathbb{E}[Y_i \mid D_i = 0, X_i = 54134]$$
  - So we cannot stratify to each unique value of $X_i$.
• Practically, this is massively important: we almost always have data with unique values.
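A minimal R sketch of the problem. The first three incomes are the ones listed above; the other two incomes and the treatment vector `d` are hypothetical, added only so there is something to tabulate. When every unit has its own income, no stratum contains both a treated and a control unit.

```r
# Incomes: the first three come from the slide; the last two values and
# the treatment vector d are hypothetical, for illustration only.
income <- c(54134, 123043, 23842, 61250, 48900)
d      <- c(1, 0, 1, 0, 1)

# Cross-tabulate strata (unique income values) against treatment status.
counts <- table(income, d)

# A stratum supports a within-stratum comparison only if it contains
# at least one treated unit and at least one control unit.
usable <- counts[, "0"] > 0 & counts[, "1"] > 0

# Number of strata in which both conditional means could be estimated.
n_usable <- sum(usable)
n_usable
```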
Going to a superpopulation
• From here on out, we’ll focus less on the finite population model.
  - Harder with (functionally) continuous covariates.
• Assume that each unit $i$ is drawn from an infinite superpopulation.
  - Implies that $(Y_i(0), Y_i(1), D_i, X_i)$ are a draw from their population joint distribution.
• Potential outcomes are now typical random variables:
  - $\mu_c(x) = \mathbb{E}[Y_i(0) \mid X_i = x]$ and $\mu_t(x) = \mathbb{E}[Y_i(1) \mid X_i = x]$
  - $\sigma^2_c(x) = \mathbb{V}[Y_i(0) \mid X_i = x]$ and $\sigma^2_t(x) = \mathbb{V}[Y_i(1) \mid X_i = x]$
  - $\tau(x) = \mathbb{E}[\mu_t(x) - \mu_c(x) \mid X_i = x]$
Assumptions in the superpopulation
• With an infinite superpopulation, worry less about conditioning on the entire sample.
  - Units are now independent due to random sampling from an infinite population.
• No unmeasured confounding implies that
  $$\mathbb{P}(D_i = 1 \mid Y_i(0), Y_i(1), X_i) = \mathbb{P}(D_i = 1 \mid X_i)$$
• Or, written using conditional independence:
  $$D_i \perp\!\!\!\perp (Y_i(0), Y_i(1)) \mid X_i$$
• Positivity can be written: $0 < \mathbb{P}(D_i = 1 \mid X_i = x) < 1$ for all $x$ in the support of $X_i$.
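A short derivation sketch of why these two assumptions are enough to identify the ATE from observed data. It uses only the assumptions on this slide plus consistency ($Y_i = Y_i(d)$ when $D_i = d$); positivity is what guarantees that both conditional expectations on the right-hand side are defined at every $x$ in the support of $X_i$.

```latex
\begin{aligned}
\mathbb{E}[Y_i(1) \mid X_i = x]
  &= \mathbb{E}[Y_i(1) \mid D_i = 1, X_i = x] && \text{(no unmeasured confounding)} \\
  &= \mathbb{E}[Y_i \mid D_i = 1, X_i = x]    && \text{(consistency)} \\
\mathbb{E}[Y_i(0) \mid X_i = x]
  &= \mathbb{E}[Y_i \mid D_i = 0, X_i = x]    && \text{(same two steps)} \\
\mathbb{E}[Y_i(1) - Y_i(0)]
  &= \mathbb{E}_X\big\{\mathbb{E}[Y_i \mid D_i = 1, X_i] - \mathbb{E}[Y_i \mid D_i = 0, X_i]\big\}
  && \text{(iterated expectations)}
\end{aligned}
```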
2 Confounding
What is confounding?
• Confounding is the bias caused by common causes of the treatment and outcome.
  - Leads to “spurious correlation.”
• In observational studies, the goal is to avoid confounding inherent in the data.
• Pervasive in the social sciences:
  - effect of income on voting (confounding: age)
  - effect of job training program on employment (confounding: motivation)
  - effect of political institutions on economic development (confounding: previous economic development)
• No unmeasured confounding assumes that we’ve measured all sources of confounding.
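A small simulated illustration in R, under assumptions chosen only for this sketch (a single common cause `x`, a true treatment effect of exactly zero, and a linear outcome model). The naive comparison picks up a spurious association, while adjusting for the common cause removes it.

```r
set.seed(2002)
n <- 10000

x <- rnorm(n)                       # common cause of treatment and outcome
d <- as.numeric(x + rnorm(n) > 0)   # treatment is more likely when x is high
y <- x + rnorm(n)                   # outcome depends on x only: true effect of d is 0

# Naive comparison: difference in means between treated and control units.
naive <- mean(y[d == 1]) - mean(y[d == 0])

# Adjust for the common cause x. OLS is correctly specified here only
# because y is linear in x by construction.
adjusted <- unname(coef(lm(y ~ d + x))["d"])

c(naive = naive, adjusted = adjusted)
```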
Big problem
• How can we determine if no unmeasured confounding holds if we didn’t assign the treatment?
• Put differently:
  - What covariates do we need to condition on?
  - What covariates do we need to match on?
  - What covariates do we need to include in our regressions?
• One way, from the assumption itself:
  - $\mathbb{P}[D_i = 1 \mid \mathbf{X}, \mathbf{Y}(1), \mathbf{Y}(0)] = \mathbb{P}[D_i = 1 \mid \mathbf{X}]$
  - Include covariates such that, conditional on them, the treatment assignment does not depend on the potential outcomes.
• Another way: use DAGs and look at back-door paths.
Backdoor paths and blocking paths
• Backdoor path: a non-causal path from $D$ to $Y$.
  - Would remain if we removed any arrows pointing out of $D$.
• Backdoor paths between $D$ and $Y$: common causes of $D$ and $Y$.
[Figure: DAG with nodes $D$, $X$, and $Y$]
• Here there is a backdoor path $D \leftarrow X \rightarrow Y$, where $X$ is a common cause for the treatment and the outcome.
Other types of confounding
[Figure: DAG with nodes $D$, $U$, $X$, and $Y$]
• $D$ is enrolling in a job training program.
• $Y$ is getting a job.
• $U$ is being motivated.
• $X$ is number of job applications sent out.
• Big assumption here: no arrow from $U$ to $Y$.
Other types of confounding
[Figure: DAG with nodes $D$, $U$, $X$, and $Y$]
• $D$ is exercise.
• $Y$ is having a disease.
• $U$ is lifestyle.
• $X$ is smoking.
• Big assumption here: no arrow from $U$ to $Y$.
What’s the problem with backdoor paths?
[Figure: DAG with nodes $D$, $U$, $X$, and $Y$]
• A path is blocked if:
  1. we control for or stratify on a non-collider on that path, OR
  2. the path contains a collider that we do not control for.
• Unblocked backdoor paths lead to confounding.
• In the DAG here, if we condition on $X$, then the backdoor path is blocked.
Not all backdoor paths
[Figure: DAG with nodes $D$, $U_1$, $X$, and $Y$]
• Conditioning on the posttreatment covariates opens the non-causal path.
  - This is selection bias.
M-bias
[Figure: DAG with nodes $D$, $U_1$, $U_2$, $X$, and $Y$]
• Not all backdoor paths induce confounding.
• This backdoor path is blocked by the collider $X_i$ that we don’t control for.
• If we control for $X_i$, this opens the path and induces confounding.
  - Sometimes called M-bias.
• Controversial because of differing views on what to control for:
  - Rubin thinks that M-bias is a “mathematical curiosity” and we should control for all pretreatment variables.
  - Pearl and others think M-bias is a real threat.
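A simulated sketch of M-bias in R. The data-generating process is an assumption made for this illustration: `u1` and `u2` are unobserved, `x` is a pretreatment collider on the path `d <- u1 -> x <- u2 -> y`, the treatment is continuous for simplicity, and `d` has no effect on `y`. Under this particular linear design, the coefficient on `d` is near zero without `x` and converges to about -0.2 once `x` is controlled for.

```r
set.seed(2002)
n <- 20000

u1 <- rnorm(n)             # unobserved cause of d and x
u2 <- rnorm(n)             # unobserved cause of x and y
d  <- u1 + rnorm(n)        # treatment (continuous here for simplicity)
x  <- u1 + u2 + rnorm(n)   # pretreatment collider: d <- u1 -> x <- u2 -> y
y  <- u2 + rnorm(n)        # outcome: d has no effect on y

# Not controlling for x: the backdoor path stays blocked at the collider.
b_without <- unname(coef(lm(y ~ d))["d"])

# Controlling for x: opens the path and induces bias.
b_with <- unname(coef(lm(y ~ d + x))["d"])

c(without_x = b_without, with_x = b_with)
```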
Backdoor criterion
• Can we use a DAG to evaluate no unmeasured confounders?
• Pearl answered yes, with the backdoor criterion, which states that the effect of $D$ on $Y$ is identified if:
  1. No backdoor paths from $D$ to $Y$, OR
  2. Measured covariates are sufficient to block all backdoor paths from $D$ to $Y$.
• First is really only valid for randomized experiments.
• The backdoor criterion is fairly powerful. Tells us:
  - if there is confounding given this DAG,
  - if it is possible to remove the confounding, and
  - what variables to condition on to eliminate the confounding.
SWIGs
[Figure: SWIG with nodes $D \mid d$, $U$, $X$, and $Y(d)$]
• It’s a little hard to see how the backdoor criterion implies no unmeasured confounders.
  - No potential outcomes on this graph!
• Richardson and Robins: Single World Intervention Graphs.
  - Split $D$ node into natural value ($D$) and intervention value $d$.
  - Let all effects of $D$ take their potential value under intervention: $Y(d)$.
• Now can see: are $D$ and $Y(d)$ related?
  - $D \leftarrow U \rightarrow X \rightarrow Y(d)$ implies not independent.
  - Conditioning on $X$ blocks that backdoor path, so $D \perp\!\!\!\perp Y(d) \mid X$.
No unmeasured confounders is not testable
• No unmeasured confounding places no restrictions on the observed data.
  $$\underbrace{\big(Y_i(0) \mid D_i = 1, X_i\big)}_{\text{unobserved}} \overset{d}{=} \underbrace{\big(Y_i(0) \mid D_i = 0, X_i\big)}_{\text{observed}}$$
• Here $\overset{d}{=}$ means equal in distribution.
• No way to directly test this assumption without the counterfactual data, which is missing by definition!
• With backdoor criterion, you must have the correct DAG.
Assessing no unmeasured confounders
• Can do “placebo” tests, where $D_i$ cannot have an effect (lagged outcomes, etc.).
• Della Vigna and Kaplan (2007, QJE): effect of Fox News availability on Republican vote share.
  - Availability in 2000/2003 can’t affect past vote shares.
• Unconfoundedness could still be violated even if you pass this test!
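A sketch of what a placebo test on a lagged outcome might look like in R. The data are simulated under assumptions made for this illustration only (an unmeasured confounder `u` that drives both the lagged outcome and the treatment); none of the numbers relate to the Fox News study. Because `d` is assigned after `y_lag` is realized, a clearly nonzero placebo coefficient signals confounding.

```r
set.seed(2002)
n <- 10000

u     <- rnorm(n)                       # unmeasured confounder
y_lag <- u + rnorm(n)                   # outcome measured before treatment
d     <- as.numeric(u + rnorm(n) > 0)   # treatment assigned later, driven by u

# Placebo regression: d cannot causally affect y_lag, so a coefficient
# that is clearly different from zero points to confounding.
placebo_fit <- lm(y_lag ~ d)
placebo_est <- unname(coef(placebo_fit)["d"])
placebo_p   <- summary(placebo_fit)$coefficients["d", "Pr(>|t|)"]

c(estimate = placebo_est, p_value = placebo_p)
```

A placebo estimate near zero would be reassuring but, as the last bullet says, would not prove unconfoundedness.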
Alternatives to no unmeasured confounding
• Without explicit randomization, we need some way of identifying causal effects.
• No unmeasured confounders ≈ randomized experiment.
  - Identification results very similar to experiments.
• With unmeasured confounding, are we doomed? Maybe not!
• Other approaches rely on finding plausibly exogenous variation in assignment of $D_i$:
  - Instrumental variables (randomization + exclusion restriction)
  - Over-time variation (diff-in-diff, fixed effects)
  - Arbitrary thresholds for treatment assignment (RDD)
3 No unmeasured confounders and OLS
Justifying regression
• We know how randomized experiments imply that differences-in-means identify the ATE.
• In the next few weeks, we’ll work through how no unmeasured confounding justifies a number of estimation strategies.
• Today, it’s useful to walk through what no unmeasured confounding can buy us in a familiar setting: OLS.
  - We’ll cover regression more formally later.
Constant effects set up
• Assume a constant effects setup:
  $$Y_i(0) = \alpha + X_i'\beta + u_i$$
  $$Y_i(1) = \alpha + \tau + X_i'\beta + u_i$$
• Constant effects because $Y_i(1) - Y_i(0) = \tau$ for all units.
• Use consistency to get the usual regression formula:
  $$\begin{aligned}
  Y_i &= Y_i(1) D_i + Y_i(0)(1 - D_i) \\
      &= Y_i(0) + \big(Y_i(1) - Y_i(0)\big) \cdot D_i \\
      &= \alpha + \tau \cdot D_i + X_i'\beta + u_i
  \end{aligned}$$
• Does no unmeasured confounding help us identify the causal parameter $\tau$?
Regression on residuals
• First, estimate the residuals of regression of the treatment and outcome on the covariates:
  $$\tilde{Y}_i = Y_i - \mathbb{E}[Y_i \mid X_i]$$
  $$\tilde{D}_i = D_i - \mathbb{E}[D_i \mid X_i]$$
• Running a regression of $\tilde{Y}_i$ on $\tilde{D}_i$ is equivalent to controlling for $X_i$:
  $$Y_i = \alpha + \tau \cdot D_i + X_i'\beta + u_i$$
  $$\tilde{Y}_i = \alpha + \tau \cdot \tilde{D}_i + \tilde{u}_i$$
• Here, $\tilde{u}_i = u_i - \mathbb{E}[u_i \mid X_i]$.
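The equivalence between residualizing and controlling can be checked numerically. A minimal R sketch on simulated data (the data-generating values are arbitrary assumptions, and linear regressions stand in for the conditional expectations $\mathbb{E}[Y_i \mid X_i]$ and $\mathbb{E}[D_i \mid X_i]$):

```r
set.seed(2002)
n <- 1000

x <- rnorm(n)
d <- as.numeric(0.5 * x + rnorm(n) > 0)
y <- 1 + 2 * d + 1.5 * x + rnorm(n)

# Residualize outcome and treatment on the covariate. Linear regression
# stands in for the conditional expectations E[Y|X] and E[D|X] here.
y_tilde <- resid(lm(y ~ x))
d_tilde <- resid(lm(d ~ x))

# Coefficient from the regression of residuals on residuals ...
tau_resid <- unname(coef(lm(y_tilde ~ d_tilde))["d_tilde"])

# ... and the coefficient on d when controlling for x directly.
tau_full <- unname(coef(lm(y ~ d + x))["d"])

all.equal(tau_resid, tau_full)
```

The two coefficients agree up to floating-point error in any sample, not just in the limit.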
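What does OLS estimate?
• Using the usual OLS theory, we can show that the probability limit of the OLS estimator of $\tau$ is:
  $$\begin{aligned}
  \operatorname{plim} \hat{\tau}_{OLS} &= \frac{\operatorname{Cov}(\tilde{D}_i, \tilde{Y}_i)}{\operatorname{Var}(\tilde{D}_i)} \\
  &= \frac{\operatorname{Cov}(\tilde{D}_i, \alpha + \tau \cdot \tilde{D}_i + \tilde{u}_i)}{\operatorname{Var}(\tilde{D}_i)} \\
  &= \frac{\tau \cdot \operatorname{Cov}(\tilde{D}_i, \tilde{D}_i) + \operatorname{Cov}(\tilde{D}_i, \tilde{u}_i)}{\operatorname{Var}(\tilde{D}_i)} \\
  &= \tau + \frac{\operatorname{Cov}(\tilde{D}_i, \tilde{u}_i)}{\operatorname{Var}(\tilde{D}_i)}
  \end{aligned}$$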
Key OLS assumption
  $$\operatorname{plim} \hat{\tau}_{OLS} = \tau + \frac{\operatorname{Cov}(\tilde{D}_i, \tilde{u}_i)}{\operatorname{Var}(\tilde{D}_i)}$$
• Key identification comes from $\operatorname{Cov}(\tilde{D}_i, \tilde{u}_i) = 0$.
  - Conditional on $X_i$, no relationship between $D_i$ and $u_i$.
• Note: $u_i$ is a function of $X_i$ and $Y_i(d)$:
  - $u_i = Y_i(0) - \alpha - X_i'\beta$ when $D_i = 0$
  - $u_i = Y_i(1) - \alpha - \tau - X_i'\beta$ when $D_i = 1$
  - So, conditional on $X_i$, the only variation in $u_i$ comes from $Y_i(d)$.
• No unmeasured confounding implies this assumption:
  $$D_i \perp\!\!\!\perp (Y_i(1), Y_i(0)) \mid X_i \implies D_i \perp\!\!\!\perp u_i \mid X_i \implies \operatorname{Cov}(\tilde{D}_i, \tilde{u}_i) = 0$$
Omitted variable bias
• What happens when this is violated? Suppose that there is one omitted variable (residualized from $X_i$):
  $$\tilde{u}_i = \lambda \tilde{L}_i + \omega_i$$
• We’ll assume that if we could measure $L_i$, then no unmeasured confounding would hold.
• Leads to inconsistency in the OLS estimator:
  $$\operatorname{plim} \hat{\tau}_{OLS} = \tau + \lambda \frac{\operatorname{Cov}(\tilde{D}_i, \tilde{L}_i)}{\operatorname{Var}(\tilde{D}_i)}$$
• Bias here is two terms multiplied together:
  1. the coefficient on $L_i$ ($\lambda$)
  2. the coefficient on $D_i$ from a regression of $L_i$ on $D_i$, also controlling for $X_i$
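The bias formula has an exact in-sample counterpart, which can be checked in R. In this sketch the data-generating values (including $\lambda = 1.5$) are hypothetical, and `l` plays the role of the omitted variable $L_i$. The gap between the short and long regression coefficients on `d` equals the product of the two terms listed above, estimated in the sample.

```r
set.seed(2002)
n <- 2000

x <- rnorm(n)                                       # measured covariate X_i
l <- 0.5 * x + rnorm(n)                             # omitted variable L_i
d <- as.numeric(0.8 * l + 0.3 * x + rnorm(n) > 0)   # treatment depends on L_i and X_i
y <- 1 + 2 * d + x + 1.5 * l + rnorm(n)             # lambda = 1.5 (hypothetical)

long  <- lm(y ~ d + x + l)   # regression we would like to run
short <- lm(y ~ d + x)       # regression we can run without L_i
aux   <- lm(l ~ d + x)       # L_i on D_i, also controlling for X_i

# Bias as the product of two terms: the coefficient on L_i times the
# coefficient on D_i from the auxiliary regression.
bias_formula <- unname(coef(long)["l"] * coef(aux)["d"])

# Bias actually observed in the short regression, relative to the long one.
bias_actual <- unname(coef(short)["d"] - coef(long)["d"])

all.equal(bias_formula, bias_actual)
```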
4 Estimating causal effects under no unmeasured confounders
Basic approach to estimation
• Remember the usual approach to estimating the ATE with covariates:
• Stratification:
  - Stratify the units by the covariates.
  - Calculate CATE within these strata.
• Standardization/direct adjustment:
  - Average the CATEs across the strata to get ATE.
• How to create strata when $X$ has continuous components?
  - If $X$ is discrete with only a few levels, can use the exact values of $X$.
  - Otherwise, we may have to subclassify/coarsen the data.
Classic example: cigars/pipes versus cigarettes
• $D_i = 1$ for pipe/cigar smokers, $D_i = 0$ for cigarette smokers.
• $Y_i$ = death in the first year of follow-up.
• Naive positive effect: cigar/pipe smokers more likely to die.
  - What’s the confounder here? Age! Pipe/cigar smokers much older than cigarette smokers.
• Cochran’s approach: stratify based on coarsened age.
  - Divide age into $k$ strata: $S_i \in \{s_1, s_2, \ldots, s_k\}$.
  - $s_1$ might be 18-25, $s_2$ might be 26-35, and so on.
  - Calculate effect within strata and aggregate.
• Key assumption: no unmeasured confounders using stratified version of age:
  $$D_i \perp\!\!\!\perp (Y_i(0), Y_i(1)) \mid S_i$$
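A toy R sketch of the coarsen-then-stratify recipe. The twelve rows below are entirely hypothetical (they are not Cochran's data), and the third stratum "36+" is an assumption added to close off the age range. The data are built so that treated units are older and older units die more often, but within each age stratum there is no difference.

```r
# Hypothetical toy data (NOT Cochran's): age, pipe/cigar indicator d, death y.
dat <- data.frame(
  age = c(20, 23, 24, 22, 30, 33, 28, 35, 40, 45, 50, 42),
  d   = c( 0,  0,  0,  1,  0,  1,  0,  1,  1,  1,  0,  1),
  y   = c( 0,  0,  0,  0,  1,  1,  0,  0,  1,  1,  1,  1)
)

# Coarsen age into strata: 18-25, 26-35, and 36 or older.
dat$s <- cut(dat$age, breaks = c(17, 25, 35, Inf),
             labels = c("18-25", "26-35", "36+"))

# Difference in means between treated and control units in a data frame.
diff_in_means <- function(df) mean(df$y[df$d == 1]) - mean(df$y[df$d == 0])

# Naive comparison ignores age.
naive <- diff_in_means(dat)

# Effect within each stratum, then aggregate by stratum share.
tau_s   <- sapply(split(dat, dat$s), diff_in_means)
share_s <- as.numeric(table(dat$s)) / nrow(dat)
tau_hat <- sum(tau_s * share_s)

c(naive = naive, stratified = tau_hat)
```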
Stratification on the propensity score
• What about when $X$ has many dimensions?
• Curse of dimensionality: there will be very few, if any, units in a given stratum of $X_i$.
• Stratify on a low-dimensional summary, the propensity score:
  $$e(x) = \mathbb{P}[D_i = 1 \mid X_i = x]$$
  - PS = unit’s probability of being treated, conditional on $X_i$.
• For a particular unit, this is $e(X_i) = \mathbb{P}[D_i = 1 \mid X_i]$.
• Rosenbaum and Rubin (1983) showed that:
  $$D_i \perp\!\!\!\perp (Y_i(0), Y_i(1)) \mid X_i \implies D_i \perp\!\!\!\perp (Y_i(0), Y_i(1)) \mid e(X_i)$$
  - So stratifying on $e_i$ is the same as stratifying on the full $X_i$.
Propensity score as balancing score
• The propensity score is actually a balancing score, which means that:
  $$D_i \perp\!\!\!\perp X_i \mid e(X_i)$$
• Conditional on the propensity score, treatment is independent of the covariates.
  - Treatment status is said to be balanced: $f(X_i \mid D_i = 1, e(X_i)) = f(X_i \mid D_i = 0, e(X_i))$
• Of course, we have to know the true PS to have all these results work.
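A sketch of why the balancing property holds for the true propensity score. It relies only on the definition $e(X_i) = \mathbb{P}[D_i = 1 \mid X_i]$ and iterated expectations; because $D_i$ is binary, showing that the two conditional probabilities below coincide is enough for $D_i \perp\!\!\!\perp X_i \mid e(X_i)$.

```latex
\begin{aligned}
\mathbb{P}[D_i = 1 \mid X_i, e(X_i)]
  &= \mathbb{P}[D_i = 1 \mid X_i] = e(X_i)
  && \text{($e(X_i)$ is a function of $X_i$)} \\
\mathbb{P}[D_i = 1 \mid e(X_i)]
  &= \mathbb{E}\big[\, \mathbb{E}[D_i \mid X_i] \,\big|\, e(X_i) \big]
   = \mathbb{E}[e(X_i) \mid e(X_i)] = e(X_i)
  && \text{(iterated expectations)}
\end{aligned}
```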
Estimating the propensity score
• Of course, in observational studies we don’t know the propensity score.
• We would run a parametric model with parameters $\gamma$ to estimate the propensity scores:
  1. Estimate $\hat{\gamma}$.
  2. Create $\hat{e}_i = \Pr[D_i = 1 \mid X_i; \hat{\gamma}]$.
• For instance, in R, we could easily calculate the propensity scores using the glm function:
    # Logistic regression of treatment on covariates; fitted values are the estimated scores.
    pscores <- glm(treat ~ var1 + var2 + var3, data = mydata,
                   family = binomial())$fitted.values
Propensity score specifics
• What variables do we include in the propensity score model?
  - Any set of variables that blocks all the backdoor paths from $D_i$ to $Y_i$.
• Check balance within strata of $\hat{e}_i$. Covariates should be balanced:
  $$f(X_i \mid D_i = 1, \hat{e}_i) = f(X_i \mid D_i = 0, \hat{e}_i)$$
• Can also use automated/nonparametric tools for estimating $\hat{e}_i$:
  - Covariate Balancing Propensity Scores (Imai and Ratkovic)
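One way the balance check might look in R, sketched on simulated data. The single covariate `x`, the coefficient 0.8, and the choice of five blocks at the quintiles of the estimated score are all assumptions of this illustration. Comparing means is only a coarse stand-in for comparing the full distributions $f(X_i \mid \cdot)$.

```r
set.seed(2002)
n <- 5000

x <- rnorm(n)
d <- rbinom(n, size = 1, prob = plogis(0.8 * x))
dat <- data.frame(x = x, d = d)

# Estimated propensity scores from a logistic regression.
dat$ps <- glm(d ~ x, data = dat, family = binomial())$fitted.values

# Five blocks with roughly equal numbers of units.
breaks <- c(0, quantile(dat$ps, probs = c(0.2, 0.4, 0.6, 0.8)), 1)
dat$block <- cut(dat$ps, breaks = breaks, include.lowest = TRUE)

# Treated-minus-control difference in the mean of x.
mean_diff <- function(df) mean(df$x[df$d == 1]) - mean(df$x[df$d == 0])

overall_diff <- mean_diff(dat)                           # imbalance before stratifying
block_diffs  <- sapply(split(dat, dat$block), mean_diff) # imbalance within each block

overall_diff
block_diffs
```

Within-block differences that stay large would suggest finer blocks or a richer propensity score model.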
Stratifying by the propensity score
• How will we use the propensity score?
  - Matching (next week)
  - Weighting (two weeks)
  - Regression (three weeks)
• Today: coarsening the propensity score and stratifying.
• Choose boundary points $0 = b_0 < b_1 < \ldots < b_{K-1} < b_K = 1$.
• Create block indicators:
  $$B_i(k) = \begin{cases} 1 & \text{if } b_{k-1} < \hat{e}(X_i) < b_k \\ 0 & \text{otherwise} \end{cases}$$
• Calculate within-strata effect estimates:
  $$\tau_k = \mathbb{E}[Y_i \mid D_i = 1, B_i(k) = 1] - \mathbb{E}[Y_i \mid D_i = 0, B_i(k) = 1]$$
Standardization/direct adjustment
• We calculated the CATEs for each stratum of the PS: $\tau_k$.
• We can use the law of iterated expectations to back out the ATE.
• Take the average of the CATEs over the distribution of $X$:
  $$\tau = \sum_{k=1}^{K} \tau_k \, \mathbb{P}[B_i(k) = 1]$$
• Note that $\mathbb{P}[B_i(k) = 1]$ is just the proportion of units in block $k$:
  $$\mathbb{P}[B_i(k) = 1] = \frac{\sum_{i=1}^{N} B_i(k)}{N}$$
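Putting the pieces together, a minimal R sketch of stratifying on an estimated propensity score and standardizing. Everything about the data is simulated under assumptions made for this example: one covariate, a constant treatment effect of 1, and $K = 5$ blocks with interior boundaries at the quintiles of the estimated scores. Note that `cut()` uses half-open intervals $(b_{k-1}, b_k]$ rather than the strict inequalities in the block definition; with continuous scores the difference is immaterial.

```r
set.seed(2002)
n <- 5000

x <- rnorm(n)
d <- rbinom(n, size = 1, prob = plogis(0.8 * x))
y <- 1 * d + x + rnorm(n)     # constant treatment effect of 1, by construction
dat <- data.frame(x = x, d = d, y = y)

# Step 1: estimate the propensity score.
dat$ps <- glm(d ~ x, data = dat, family = binomial())$fitted.values

# Step 2: boundary points 0 = b_0 < b_1 < ... < b_K = 1 and block membership.
b <- c(0, quantile(dat$ps, probs = c(0.2, 0.4, 0.6, 0.8)), 1)
dat$block <- cut(dat$ps, breaks = b, include.lowest = TRUE)

# Step 3: within-block differences in means (tau_k).
diff_in_means <- function(df) mean(df$y[df$d == 1]) - mean(df$y[df$d == 0])
tau_k <- sapply(split(dat, dat$block), diff_in_means)

# Step 4: weight each tau_k by the share of units in block k.
share_k <- as.numeric(table(dat$block)) / nrow(dat)
tau_hat <- sum(tau_k * share_k)

# Unadjusted comparison, for reference.
naive <- diff_in_means(dat)

c(naive = naive, stratified = tau_hat)
```

With only five blocks some confounding remains inside each block, so the stratified estimate should land much closer to the true effect than the naive one without matching it exactly.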
5 Wrapping Up
Summary
• Defined observational studies.
• Defined confounding and assessed when no unmeasured confounding holds.
• Saw how no unmeasured confounding helps with OLS.
• Saw how to estimate causal effects under no unmeasured confounding using the propensity score.
Next few weeks
• Learn how to estimate causal effects under no unmeasured confounders via:
  - Matching
  - Weighting
  - Regression
• Then we move onto situations where no unmeasured confounders is violated.
Discrete covariates
bull Suppose that we knew that 119863119894 was unconfounded within levelsof a binary 119883119894
bull Then we could always estimate the causal effect using iteratedexpectations as in a stratified randomized experiment
1201241198831114107120124[119884119894|119863119894 = 1119883119894] minus 120124[119884119894|119863119894 = 0119883119894]1114110
= 1114101120124[119884119894|119863119894 = 1119883119894 = 1] minus 120124[119884119894|119863119894 = 0119883119894 = 1]1114104111405911138401113840111384011138401113840111384011138401113840111384011138401113840111384011138401113840111384011138401113840111384011138401113840111384011138401113840111384011138401113840111384011138401113840111384011140601113840111384011138401113840111384011138401113840111384011138401113840111384011138401113840111384011138401113840111384011138401113840111384011138401113840111384011138401113840111384011138401113840111384011138401114061
diff-in-means for 119883119894=1113568
ℙ[119883119894 = 1]11140591113840111384011138401113840111406011138401113840111384011138401114061share of 119883119894=1113568
+ 1114101120124[119884119894|119863119894 = 1119883119894 = 0] minus 120124[119884119894|119863119894 = 0119883119894 = 0]1114104111405911138401113840111384011138401113840111384011138401113840111384011138401113840111384011138401113840111384011138401113840111384011138401113840111384011138401113840111384011138401113840111384011138401113840111384011140601113840111384011138401113840111384011138401113840111384011138401113840111384011138401113840111384011138401113840111384011138401113840111384011138401113840111384011138401113840111384011138401113840111384011138401114061
diff-in-means for 119883119894=1113567
ℙ[119883119894 = 0]11140591113840111384011138401113840111406011138401113840111384011138401114061share of 119883119894=1113567
bull Never used our knowledge of the randomization for thisquantity
Continuous covariates
bull So great we can stratify Why not do this all the timebull What if 119883119894 = income for unit 119894
Each unit has its own value of 119883119894 $54134 $123043 $23842 If 119883119894 = 54134 is unique will only observe 1 of these
120124[119884119894|119863119894 = 1119883119894 = 54134] minus 120124[119884119894|119863119894 = 0119883119894 = 54134]
cannot stratify to each unique value of 119883119894bull Practically this is massively important almost always have
data with unique values
Going to a superpopulation
bull From here on out wersquoll focus less on the finite populationmodel
Harder with (functionally) continuous covariatesbull Assume that each unit 119894 is drawn from an infinite
superpopulation implies that (119884119894(0) 119884119894(1) 119863119894 119883119894) are a draw from their
population joint distributionbull Potential outcomes are now typical random variables
120583119888(119909) = 120124[119884119894(0)|119883119894 = 119909] and 120583119905(119909) = 120124[119884119894(1)|119883119894 = 119909] 1205901113569119888 (119909) = 120141[119884119894(0)|119883119894 = 119909] and 1205901113569119905 (119909) = 120141[119884119894(1)|119883119894 = 119909] 120591 = 120124[120583119905(119909) minus 120583119888(119909)|119883119894 = 119909]
Assumptions in the superpopulation
bull With an infinite superpopulation worry less aboutconditioning on the entire sample
Units are now independent due to random sampling from aninfinite population
bull No unmeasured confoudning implies that
ℙ(119863119894 = 1|119884119894(0) 119884119894(1) 119883119894) = ℙ(119863119894 = 1|119883119894)
bull Or written using conditional independence
119863119894 ⟂⟂ 1114100119884119894(0) 119884119894(1)1114103|119883119894
bull Positivity can be written 0 lt ℙ(119863119894 = 1|119883119894 = 119909) lt 1 for all 119909 inthe support of 119883119894
2 Confounding
What is confounding
bull Confounding is the bias caused by common causes of thetreatment and outcome
Leads to ldquospurious correlationrdquobull In observational studies the goal is to avoid confounding
inherent in the databull Pervasive in the social sciences
effect of income on voting (confounding age) effect of job training program on employment (confounding
motivation) effect of political institutions on economic development
(confounding previous economic development)bull No unmeasured confounding assumes that wersquove measured all
sources of confounding
Big problem
bull How can we determine if no unmeasured confounding holds ifwe didnrsquot assign the treatment
bull Put differently What covariates do we need to condition on What covariates do we need to match on What covaraites do we need to include in our regressions
bull One way from the assumption itself ℙ[119863119894 = 1|119831 119832(1) 119832(0)] = ℙ[119863119894 = 1|119831] Include covariates such that conditional on them the
treatment assignment does not depend on the potentialoutcomes
bull Another way use DAGs and look at back-door paths
Backdoor paths and blocking paths
bull Backdoor path is a non-causal path from 119863 to 119884 Would remain if we removed any arrows pointing out of 119863
bull Backdoor paths between 119863 and 119884 common causes of 119863and 119884
119863
119883
119884
bull Here there is a backdoor path 119863 larr 119883 rarr 119884 where 119883 is acommon cause for the treatment and the outcome
Other types of confounding
119863
119880 119883
119884
bull 119863 is enrolling in a job training programbull 119884 is getting a jobbull 119880 is being motivatedbull 119883 is number of job applications sent outbull Big assumption here no arrow from 119880 to 119884
Other types of confounding
119863
119880 119883
119884
bull 119863 is exercisebull 119884 is having a diseasebull 119880 is lifestylebull 119883 is smokingbull Big assumption here no arrow from 119880 to 119884
Whatrsquos the problem with backdoorpaths
119863
119880 119883
119884
bull A path is blocked if1 we control for or stratify a non-collider on that path OR2 we do not control for a collider
bull Unblocked backdoor paths confoundingbull In the DAG here if we condition on 119883 then the backdoor
path is blocked
Not all backdoor paths
119863
1198801113568119883119883
119884
bull Conditioning on the posttreatment covariates opens thenon-causal path
selection bias
M-bias
119863
1198801113568 1198801113569119883119883
119884
bull Not all backdoor paths induce confoundingbull This backdoor path is blocked by the collider 119883119894 that we donrsquot
control forbull If we control for 119883119894 opens the path and induces
confounding Sometimes called M-bias
bull Controversial because of differing views on what to control for Rubin thinks that M-bias is a ldquomathematical curiosityrdquo and we
should control for all pretreatment variables Pearl and others think M-bias is a real threat
Backdoor criterion
bull Can we use a DAG to evaluate no unmeasured confoundersbull Pearl answered yes with the backdoor criterion which states
that the effect of 119863 on 119884 is identified if1 No backdoor paths from 119863 to 119884 OR2 Measured covariates are sufficient to block all backdoor paths
from 119863 to 119884
bull First is really only valid for randomized experimentsbull The backdoor criterion is fairly powerful Tells us
if there confounding given this DAG if it is possible to removing the confounding and what variables to condition on to eliminate the confounding
SWIGs
119863 | 119889 119884(119889)
119880 119883
119884
bull Itrsquos a little hard to see how the backdoor criterion implies nounmeasured confounders
No potential outcomes on this graphbull Richardson and Robins Single World Intervention Graphs
Split 119863 node into natural value (119863) and intervention value 119889 Let all effects of 119863 take their potential value under
intervention 119884(119889)bull Now can see are 119863 and 119884(119889) related
119863 larr 119880 rarr 119883 rarr 119884(119889) implies not independent Conditioning on 119883 blocks that backdoor path 119863 ⟂⟂ 119884(119889)|119883
No unmeasured confounders is nottestable
bull No unmeasured confounding places no restrictions on theobserved data
1114100119884119894(0)|119863119894 = 1119883119894111410311140591113840111384011138401113840111384011138401113840111384011138401113840111406011138401113840111384011138401113840111384011138401113840111384011138401114061unobserved
119889= 1114100119884119894(0)|119863119894 = 0119883119894111410311140591113840111384011138401113840111384011138401113840111384011138401113840111406011138401113840111384011138401113840111384011138401113840111384011138401114061observed
bull Here 119889= means equal in distributionbull No way to directly test this assumption without the
counterfactual data which is missing by definitionbull With backdoor criterion you must have the correct DAG
Assessing no unmeasured confounders
bull Can do ldquoplacebordquo tests where 119863119894 cannot have an effect(lagged outcomes etc)
bull Della Vigna and Kaplan (2007 QJE) effect of Fox Newsavailability on Republican vote share
Availability in 20002003 canrsquot affect past vote sharesbull Unconfoundedness could still be violated even if you pass this
test
Alternatives to no unmeasuredconfounding
bull Without explicit randomization we need some way ofidentifying causal effects
bull No unmeasured confounders asymp randomized experiment Indentification results very similar to experiments
bull With unmeasured confounding are we doomed Maybe notbull Other approaches rely on finding plausibly exogenous variation
in assignment of 119863119894 Instrumental variables (randomization + exclusion restriction) Over-time variation (diff-in-diff fixed effects) Arbitrary thresholds for treatment assignment (RDD)
3 No unmeasuredconfounders and OLS
Justifying regression
bull We know how randomized experiments imply thatdifferences-in-means identify the ATE
bull In the next few weeks wersquoll work through how no unmeasuredconfounding justifies a number of estimation strategies
bull Today itrsquos useful to walk through what no unmeasuredconfounding can buy us in a familiar setting OLS
Wersquoll cover regression more formally later
Constant effects set up
• Assume a constant effects setup:

$$Y_i(0) = \alpha + X_i'\beta + u_i$$
$$Y_i(1) = \alpha + \tau + X_i'\beta + u_i$$

• Constant effects because $Y_i(1) - Y_i(0) = \tau$ for all units
• Use consistency to get the usual regression formula:

$$\begin{aligned}
Y_i &= Y_i(1) D_i + Y_i(0)(1 - D_i) \\
    &= Y_i(0) + \left(Y_i(1) - Y_i(0)\right) \cdot D_i \\
    &= \alpha + \tau \cdot D_i + X_i'\beta + u_i
\end{aligned}$$

• Does no unmeasured confounding help us identify the causal parameter $\tau$?
Regression on residuals
• First, take the residuals from regressions of the outcome and the treatment on the covariates:

$$\tilde{Y}_i = Y_i - \mathbb{E}[Y_i \mid X_i], \qquad \tilde{D}_i = D_i - \mathbb{E}[D_i \mid X_i]$$

• Running a regression of $\tilde{Y}_i$ on $\tilde{D}_i$ is equivalent to controlling for $X_i$:

$$Y_i = \alpha + \tau \cdot D_i + X_i'\beta + u_i$$
$$\tilde{Y}_i = \tau \cdot \tilde{D}_i + \tilde{u}_i$$

• Here $\tilde{u}_i = u_i - \mathbb{E}[u_i \mid X_i]$; the terms $\alpha$ and $X_i'\beta$ are functions of $X_i$ alone, so they drop out when residualizing
What does OLS estimate?
• Using the usual OLS theory, we can show that the probability limit of the OLS estimator of $\tau$ is:

$$\begin{aligned}
\operatorname{plim} \hat{\tau}_{ols} &= \frac{\operatorname{Cov}(\tilde{D}_i, \tilde{Y}_i)}{\operatorname{Var}(\tilde{D}_i)} \\
&= \frac{\operatorname{Cov}(\tilde{D}_i, \tau \cdot \tilde{D}_i + \tilde{u}_i)}{\operatorname{Var}(\tilde{D}_i)} \\
&= \frac{\tau \cdot \operatorname{Cov}(\tilde{D}_i, \tilde{D}_i) + \operatorname{Cov}(\tilde{D}_i, \tilde{u}_i)}{\operatorname{Var}(\tilde{D}_i)} \\
&= \tau + \frac{\operatorname{Cov}(\tilde{D}_i, \tilde{u}_i)}{\operatorname{Var}(\tilde{D}_i)}
\end{aligned}$$
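The residual-regression result can be checked numerically. The sketch below uses simulated data with hypothetical names, and it replaces the conditional expectations with linear regressions on $X_i$ (an assumption that is exact here because the simulated model is linear in `x`). It compares the coefficient on the treatment from the regression that controls for `x` with the residual-on-residual slope and with the covariance ratio from the derivation.

```r
# Residual regression on simulated data.
# The data-generating process and names are illustrative assumptions.
set.seed(2002)
n <- 2000
x <- rnorm(n)
d <- rbinom(n, 1, plogis(0.8 * x))      # treatment depends only on x
tau <- 2                                # constant treatment effect
y <- 1 + tau * d + 0.5 * x + rnorm(n)   # error is independent of d given x

# Regression that controls for x directly
tau_long <- coef(lm(y ~ d + x))["d"]

# Residualize outcome and treatment on x, then regress residual on residual
y_tilde <- resid(lm(y ~ x))
d_tilde <- resid(lm(d ~ x))
tau_resid <- coef(lm(y_tilde ~ d_tilde))["d_tilde"]

# Covariance ratio from the plim derivation, computed in the sample
tau_cov <- cov(d_tilde, y_tilde) / var(d_tilde)

print(c(long = unname(tau_long), resid = unname(tau_resid), cov = tau_cov))
```

The three numbers agree up to floating-point error, and all are close to the simulated `tau` because the error term is unrelated to the treatment once `x` is held fixed.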
Key OLS assumption
$$\operatorname{plim} \hat{\tau}_{ols} = \tau + \frac{\operatorname{Cov}(\tilde{D}_i, \tilde{u}_i)}{\operatorname{Var}(\tilde{D}_i)}$$

• Key identification comes from $\operatorname{Cov}(\tilde{D}_i, \tilde{u}_i) = 0$
  ◦ Conditional on $X_i$, no relationship between $D_i$ and $u_i$
• Note: $u_i$ is a function of $X_i$ and $Y_i(d)$:
  ◦ $u_i = Y_i(0) - \alpha - X_i'\beta$ when $D_i = 0$
  ◦ $u_i = Y_i(1) - \alpha - \tau - X_i'\beta$ when $D_i = 1$
  ◦ Conditional on $X_i$, the only variation in $u_i$ comes from $Y_i(d)$
• No unmeasured confounding implies this assumption:

$$D_i \perp\!\!\!\perp \left(Y_i(1), Y_i(0)\right) \mid X_i \implies D_i \perp\!\!\!\perp u_i \mid X_i \implies \operatorname{Cov}(\tilde{D}_i, \tilde{u}_i) = 0$$
Omitted variable bias
• What happens when this is violated? Suppose that there is one omitted variable $\tilde{L}_i$ (residualized from $X_i$):

$$\tilde{u}_i = \lambda \tilde{L}_i + \omega_i$$

• We'll assume that if we could measure $L_i$, then no unmeasured confounding would hold
• Leads to inconsistency in the OLS estimator:

$$\operatorname{plim} \hat{\tau}_{ols} = \tau + \lambda \frac{\operatorname{Cov}(\tilde{D}_i, \tilde{L}_i)}{\operatorname{Var}(\tilde{D}_i)}$$

• Bias here is two terms multiplied together:
  1. the coefficient on $L_i$ ($\lambda$)
  2. the coefficient on $D_i$ from a regression of $L_i$ on $D_i$, also controlling for $X_i$
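The omitted variable bias formula can be verified in a simulation. This is a minimal sketch with hypothetical names and an assumed data-generating process; `l` plays the role of the omitted variable $L_i$. In the sample, the gap between the short regression (omitting `l`) and the long regression (including it) equals the estimated coefficient on `l` times the coefficient on `d` from regressing `l` on `d` and `x`, which is the sample analogue of $\operatorname{Cov}(\tilde{D}_i, \tilde{L}_i) / \operatorname{Var}(\tilde{D}_i)$.

```r
# Omitted variable bias on simulated data.
# Names and the data-generating process are illustrative assumptions.
set.seed(2002)
n <- 20000
x <- rnorm(n)
l <- 0.5 * x + rnorm(n)                  # omitted variable
d <- rbinom(n, 1, plogis(0.5 * x + l))   # treatment depends on x and l
tau <- 2
lambda <- 1.5
y <- 1 + tau * d + 0.5 * x + lambda * l + rnorm(n)

# Short regression omits l; long regression includes it
tau_short <- coef(lm(y ~ d + x))["d"]
fit_long <- lm(y ~ d + x + l)
tau_long <- coef(fit_long)["d"]
lambda_hat <- coef(fit_long)["l"]

# Coefficient on d from regressing l on d, controlling for x
delta_hat <- coef(lm(l ~ d + x))["d"]

# The two pieces of the bias multiplied together
bias_formula <- lambda_hat * delta_hat

print(c(short = unname(tau_short), long = unname(tau_long),
        gap = unname(tau_short - tau_long), formula = unname(bias_formula)))
```

The `gap` and `formula` entries match exactly in the sample, not just in the limit, because the decomposition is an algebraic property of least squares.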
4 Estimating causal effects under no unmeasured confounders
Basic approach to estimation
• Remember the usual approach to estimating the ATE with covariates:
• Stratification:
  ◦ Stratify the units by the covariates
  ◦ Calculate the CATE within these strata
• Standardization/direct adjustment:
  ◦ Average the CATEs across the strata to get the ATE
• How to create strata when $X$ has continuous components?
  ◦ If $X$ is discrete with only a few levels, can use the exact values of $X$
  ◦ Otherwise, we may have to subclassify/coarsen the data
Classic example: cigars/pipes versus cigarettes
• $D_i = 1$ for pipe/cigar smokers, $D_i = 0$ for cigarette smokers
• $Y_i$ = death in the first year of follow-up
• Naive positive effect: cigar/pipe smokers more likely to die
  ◦ What's the confounder here? Age
  ◦ Pipe/cigar smokers much older than cigarette smokers
• Cochran's approach: stratify based on coarsened age
  ◦ Divide age into $k$ strata, $S_i \in \{s_1, s_2, \ldots, s_k\}$
  ◦ $s_1$ might be 18-25, $s_2$ might be 26-35, and so on
  ◦ Calculate effect within strata and aggregate
• Key assumption: no unmeasured confounders using the stratified version of age:

$$D_i \perp\!\!\!\perp \left(Y_i(0), Y_i(1)\right) \mid S_i$$
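The logic of stratifying on coarsened age can be sketched with simulated data. These are not Cochran's data: the age range, the strata boundaries, and the data-generating process (in which the treatment has no effect on death at all) are assumptions chosen to mimic the story above.

```r
# Stratification on coarsened age with simulated data.
# All numbers and names are illustrative assumptions, not Cochran's data.
set.seed(2002)
n <- 50000
age <- runif(n, 18, 85)
d <- rbinom(n, 1, plogis((age - 50) / 10))   # older units are treated more often
y <- rbinom(n, 1, plogis(-6 + 0.06 * age))   # death risk rises with age; d has no effect

# Naive difference in death rates
naive <- mean(y[d == 1]) - mean(y[d == 0])

# Coarsen age into strata
s <- cut(age, breaks = c(18, 25, 35, 45, 55, 65, 75, 85), include.lowest = TRUE)

# Difference in means within each stratum
diffs <- sapply(levels(s), function(k) {
  mean(y[d == 1 & s == k]) - mean(y[d == 0 & s == k])
})

# Aggregate using the share of units in each stratum
shares <- as.numeric(table(s)) / n
stratified <- sum(diffs * shares)

print(c(naive = naive, stratified = stratified))
```

The naive difference is clearly positive even though the simulated effect is zero, while the stratified estimate is much closer to zero. Some bias can remain because age still varies within each coarsened stratum.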
Stratification on the propensity score
• What about when $X$ has many dimensions?
• Curse of dimensionality: there will be very few, if any, units in a given stratum of $X_i$
• Stratify on a low-dimensional summary, the propensity score:

$$e(x) = \mathbb{P}[D_i = 1 \mid X_i = x]$$

  ◦ PS = unit's probability of being treated, conditional on $X_i$
• For a particular unit, this is $e(X_i) = \mathbb{P}[D_i = 1 \mid X_i]$
• Rosenbaum and Rubin (1983) showed that:

$$D_i \perp\!\!\!\perp \left(Y_i(0), Y_i(1)\right) \mid X_i \implies D_i \perp\!\!\!\perp \left(Y_i(0), Y_i(1)\right) \mid e(X_i)$$

  ◦ For removing confounding, stratifying on $e(X_i)$ is the same as stratifying on the full $X_i$
Propensity score as balancing score
• The propensity score is actually a balancing score, which means that:

$$D_i \perp\!\!\!\perp X_i \mid e(X_i)$$

• Conditional on the propensity score, treatment is independent of the covariates
  ◦ Treatment status is said to be balanced: $f(X_i \mid D_i = 1, e(X_i)) = f(X_i \mid D_i = 0, e(X_i))$
• Of course, we have to know the true PS to have all these results work
Estimating the propensity score
• Of course, in observational studies we don't know the propensity score
• We would run a parametric model with parameters $\gamma$ to estimate the propensity scores:
  1. Estimate $\hat{\gamma}$
  2. Create $\hat{e}_i = \Pr[D_i = 1 \mid X_i; \hat{\gamma}]$
• For instance, in R, we could easily calculate the propensity scores using the glm function:

    # logistic regression of treatment on covariates; fitted values are the estimated scores
    pscores <- glm(treat ~ var1 + var2 + var3, data = mydata,
                   family = binomial())$fitted.values
Propensity score specifics
• What variables do we include in the propensity score model?
  ◦ Any set of variables that blocks all the backdoor paths from $D_i$ to $Y_i$
• Check balance within strata of $\hat{e}_i$. Covariates should be balanced:

$$f(X_i \mid D_i = 1, \hat{e}_i) = f(X_i \mid D_i = 0, \hat{e}_i)$$

• Can also use automated/nonparametric tools for estimating $\hat{e}_i$
  ◦ Covariate Balancing Propensity Scores (Imai and Ratkovic)
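One simple way to check balance is to compare covariate means between treated and control units within blocks of the estimated propensity score. The sketch below is an assumption-laden illustration: the data frame `mydata` and its columns `treat`, `var1`, `var2` are simulated stand-ins that reuse the names from the glm call above, and the choice of five quintile blocks and the helper `mean_diff` are hypothetical.

```r
# Balance check within blocks of the estimated propensity score.
# mydata, its columns, and the number of blocks are illustrative assumptions.
set.seed(2002)
n <- 10000
mydata <- data.frame(var1 = rnorm(n), var2 = rnorm(n))
mydata$treat <- rbinom(n, 1, plogis(0.8 * mydata$var1 - 0.5 * mydata$var2))

# Estimate the propensity score with a logistic regression
ps_fit <- glm(treat ~ var1 + var2, data = mydata, family = binomial())
mydata$pscore <- fitted(ps_fit)

# Five blocks at the quintiles of the estimated score
breaks <- quantile(mydata$pscore, probs = seq(0, 1, by = 0.2))
mydata$block <- cut(mydata$pscore, breaks = breaks, include.lowest = TRUE)

# Difference in covariate means between treated and control units
mean_diff <- function(v, treat) mean(v[treat == 1]) - mean(v[treat == 0])

raw_diff <- mean_diff(mydata$var1, mydata$treat)
block_diffs <- sapply(split(mydata, mydata$block),
                      function(b) mean_diff(b$var1, b$treat))

print(raw_diff)
print(round(block_diffs, 3))
```

In this simulation the within-block differences in `var1` are much smaller than the raw difference. Comparing means is only a partial check, since the balancing property concerns the whole distribution of the covariates.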
Stratifying by the propensity score
• How will we use the propensity score?
  ◦ Matching (next week)
  ◦ Weighting (two weeks)
  ◦ Regression (three weeks)
• Today: coarsening the propensity score and stratifying
• Choose boundary points: $0 = b_0 < b_1 < \ldots < b_{K-1} < b_K = 1$
• Create block indicators:

$$B_i(k) = \begin{cases} 1 & \text{if } b_{k-1} < \hat{e}(X_i) < b_k \\ 0 & \text{otherwise} \end{cases}$$

• Calculate within-strata effect estimates:

$$\tau_k = \mathbb{E}[Y_i \mid D_i = 1, B_i(k) = 1] - \mathbb{E}[Y_i \mid D_i = 0, B_i(k) = 1]$$
Standardization/direct adjustment
• We calculated the CATEs for each stratum of the PS, $\tau_k$
• We can use the law of iterated expectations to back out the ATE
• Take the average of the CATEs over the distribution of $X$:

$$\tau = \sum_{k=1}^{K} \tau_k \, \mathbb{P}[B_i(k) = 1]$$

• Note that $\mathbb{P}[B_i(k) = 1]$ is just the proportion of units in block $k$:

$$\mathbb{P}[B_i(k) = 1] = \frac{\sum_{i=1}^{N} B_i(k)}{N}$$
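Putting the pieces together, propensity score stratification followed by standardization might look like the following. This is a sketch on simulated data with hypothetical names; the ten blocks at the deciles of the estimated score are an assumed choice of boundary points, and R's `cut` uses intervals closed on the right rather than the strict inequalities above, which makes no practical difference with a continuous score.

```r
# Propensity score stratification and standardization on simulated data.
# Names, the data-generating process, and K = 10 are illustrative assumptions.
set.seed(2002)
n <- 20000
x1 <- rnorm(n)
x2 <- rnorm(n)
d <- rbinom(n, 1, plogis(0.8 * x1 - 0.5 * x2))
tau <- 1
y <- tau * d + 2 * x1 - x2 + rnorm(n)

# Naive difference in means, confounded by x1 and x2
naive <- mean(y[d == 1]) - mean(y[d == 0])

# Estimated propensity score
pscore <- fitted(glm(d ~ x1 + x2, family = binomial()))

# Boundary points at the deciles of the estimated score
K <- 10
b <- quantile(pscore, probs = seq(0, 1, length.out = K + 1))
block <- cut(pscore, breaks = b, include.lowest = TRUE, labels = FALSE)

# Within-block differences in means
tau_k <- sapply(1:K, function(k) {
  mean(y[d == 1 & block == k]) - mean(y[d == 0 & block == k])
})

# Share of units in each block
p_k <- sapply(1:K, function(k) mean(block == k))

# Weighted average of the block estimates
tau_hat <- sum(tau_k * p_k)

print(c(naive = naive, stratified = tau_hat))
```

The stratified estimate lands near the simulated effect of 1, while the naive difference is far from it. The small remaining gap reflects variation of the score within blocks; more blocks shrink it at the cost of fewer units per block.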
5 Wrapping Up
Summary
• Defined observational studies
• Defined confounding and assessed when no unmeasured confounding holds
• Saw how no unmeasured confounding helps with OLS
• Saw how to estimate causal effects under no unmeasured confounding using the propensity score
Next few weeks
• Learn how to estimate causal effects under no unmeasured confounders via:
  ◦ Matching
  ◦ Weighting
  ◦ Regression
• Then we move on to situations where no unmeasured confounders is violated
Continuous covariates
• So great, we can stratify. Why not do this all the time?
• What if $X_i$ = income for unit $i$?
  ◦ Each unit has its own value of $X_i$: \$54,134, \$123,043, \$23,842
  ◦ If $X_i = 54134$ is unique, will only observe 1 of these:

$$\mathbb{E}[Y_i \mid D_i = 1, X_i = 54134] - \mathbb{E}[Y_i \mid D_i = 0, X_i = 54134]$$

  ◦ Cannot stratify to each unique value of $X_i$
• Practically, this is massively important: almost always have data with unique values
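A quick simulation shows how exact stratification breaks down with a functionally continuous covariate. The income distribution, sample size, and names below are assumptions made up for illustration.

```r
# Exact stratification on a (functionally) continuous covariate.
# The income distribution and names are illustrative assumptions.
set.seed(2002)
n <- 1000
income <- round(rlnorm(n, meanlog = 11, sdlog = 0.7))
d <- rbinom(n, 1, 0.5)

# Share of distinct income values that appear exactly once
share_unique <- mean(table(income) == 1)

# Number of exact income values with both a treated and a control unit
tab <- table(income, d)
usable_exact <- sum(tab[, "0"] > 0 & tab[, "1"] > 0)

# After coarsening income into quintiles, every stratum has both groups
income_bin <- cut(income, breaks = quantile(income, probs = seq(0, 1, by = 0.2)),
                  include.lowest = TRUE)
tab_bin <- table(income_bin, d)
usable_coarse <- sum(tab_bin[, "0"] > 0 & tab_bin[, "1"] > 0)

print(c(share_unique = share_unique, usable_exact = usable_exact,
        usable_coarse = usable_coarse))
```

Almost every income value is observed once, so almost no exact stratum contains both a treated and a control unit, while the coarsened strata all do.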
Going to a superpopulation
• From here on out, we'll focus less on the finite population model
  ◦ Harder with (functionally) continuous covariates
• Assume that each unit $i$ is drawn from an infinite superpopulation
  ◦ Implies that $(Y_i(0), Y_i(1), D_i, X_i)$ are a draw from their population joint distribution
• Potential outcomes are now typical random variables:
  ◦ $\mu_c(x) = \mathbb{E}[Y_i(0) \mid X_i = x]$ and $\mu_t(x) = \mathbb{E}[Y_i(1) \mid X_i = x]$
  ◦ $\sigma^2_c(x) = \mathbb{V}[Y_i(0) \mid X_i = x]$ and $\sigma^2_t(x) = \mathbb{V}[Y_i(1) \mid X_i = x]$
  ◦ $\tau(x) = \mu_t(x) - \mu_c(x)$ is the CATE at $x$, and the ATE averages it over the covariates: $\tau = \mathbb{E}[\mu_t(X_i) - \mu_c(X_i)]$
Assumptions in the superpopulation
• With an infinite superpopulation, we worry less about conditioning on the entire sample
  ◦ Units are now independent due to random sampling from an infinite population
• No unmeasured confounding implies that:

$$\mathbb{P}(D_i = 1 \mid Y_i(0), Y_i(1), X_i) = \mathbb{P}(D_i = 1 \mid X_i)$$

• Or, written using conditional independence:

$$D_i \perp\!\!\!\perp \left(Y_i(0), Y_i(1)\right) \mid X_i$$

• Positivity can be written: $0 < \mathbb{P}(D_i = 1 \mid X_i = x) < 1$ for all $x$ in the support of $X_i$
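To see what these two assumptions buy us, here is a short worked derivation. It is a sketch that uses only the definitions on these slides plus consistency, $Y_i = Y_i(1) D_i + Y_i(0)(1 - D_i)$; positivity guarantees that the conditioning events have positive probability for every $x$ in the support of $X_i$.

```latex
\begin{aligned}
\mu_t(x) &= \mathbb{E}[Y_i(1) \mid X_i = x] \\
         &= \mathbb{E}[Y_i(1) \mid D_i = 1, X_i = x]
            && \text{(no unmeasured confounding)} \\
         &= \mathbb{E}[Y_i \mid D_i = 1, X_i = x]
            && \text{(consistency)} \\[4pt]
\mu_c(x) &= \mathbb{E}[Y_i(0) \mid X_i = x]
          = \mathbb{E}[Y_i \mid D_i = 0, X_i = x]
            && \text{(same two steps)} \\[4pt]
\tau     &= \mathbb{E}\left[\mu_t(X_i) - \mu_c(X_i)\right] \\
         &= \mathbb{E}\left[\, \mathbb{E}[Y_i \mid D_i = 1, X_i]
            - \mathbb{E}[Y_i \mid D_i = 0, X_i] \,\right]
\end{aligned}
```

Every term on the last line involves only observed quantities, which is why stratifying on $X_i$ and averaging over its distribution recovers the ATE under these assumptions.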
2 Confounding
What is confounding?
• Confounding is the bias caused by common causes of the treatment and outcome
  ◦ Leads to "spurious correlation"
• In observational studies, the goal is to avoid confounding inherent in the data
• Pervasive in the social sciences:
  ◦ effect of income on voting (confounding: age)
  ◦ effect of job training program on employment (confounding: motivation)
  ◦ effect of political institutions on economic development (confounding: previous economic development)
• No unmeasured confounding assumes that we've measured all sources of confounding
Big problem
• How can we determine if no unmeasured confounding holds if we didn't assign the treatment?
• Put differently:
  ◦ What covariates do we need to condition on?
  ◦ What covariates do we need to match on?
  ◦ What covariates do we need to include in our regressions?
• One way, from the assumption itself:
  ◦ $\mathbb{P}[D_i = 1 \mid \mathbf{X}, \mathbf{Y}(1), \mathbf{Y}(0)] = \mathbb{P}[D_i = 1 \mid \mathbf{X}]$
  ◦ Include covariates such that, conditional on them, the treatment assignment does not depend on the potential outcomes
• Another way: use DAGs and look at back-door paths
Backdoor paths and blocking paths
• A backdoor path is a non-causal path from $D$ to $Y$
  ◦ Would remain if we removed any arrows pointing out of $D$
• Backdoor paths between $D$ and $Y$: common causes of $D$ and $Y$

[Figure: DAG with nodes $D$, $X$, $Y$]

• Here there is a backdoor path $D \leftarrow X \rightarrow Y$, where $X$ is a common cause for the treatment and the outcome
Other types of confounding
[Figure: DAG with nodes $D$, $U$, $X$, $Y$]

• $D$ is enrolling in a job training program
• $Y$ is getting a job
• $U$ is being motivated
• $X$ is number of job applications sent out
• Big assumption here: no arrow from $U$ to $Y$
Other types of confounding
[Figure: DAG with nodes $D$, $U$, $X$, $Y$]

• $D$ is exercise
• $Y$ is having a disease
• $U$ is lifestyle
• $X$ is smoking
• Big assumption here: no arrow from $U$ to $Y$
What's the problem with backdoor paths?
[Figure: DAG with nodes $D$, $U$, $X$, $Y$]

• A path is blocked if:
  1. we control for or stratify on a non-collider on that path, OR
  2. we do not control for a collider on that path
• Unblocked backdoor paths lead to confounding
• In the DAG here, if we condition on $X$, then the backdoor path is blocked
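Blocking a backdoor path by conditioning can be illustrated with a simulation. The sketch assumes the structure $D \leftarrow U \rightarrow X \rightarrow Y$ together with a direct effect $D \rightarrow Y$ and no arrow from $U$ to $Y$; the coefficients and names are hypothetical. The variable `u` is treated as unmeasured and never enters a regression.

```r
# Blocking the backdoor path D <- U -> X -> Y by conditioning on X.
# The structure, coefficients, and names are illustrative assumptions.
set.seed(2002)
n <- 20000
u <- rnorm(n)                        # unmeasured
d <- rbinom(n, 1, plogis(u))         # U -> D
x <- u + rnorm(n)                    # U -> X
tau <- 1
y <- tau * d + 2 * x + rnorm(n)      # D -> Y and X -> Y, no arrow U -> Y

# Backdoor path open: no adjustment
unadjusted <- coef(lm(y ~ d))["d"]

# Backdoor path blocked: control for the non-collider X
adjusted <- coef(lm(y ~ d + x))["d"]

print(c(unadjusted = unname(unadjusted), adjusted = unname(adjusted)))
```

The unadjusted coefficient is far above the simulated effect of 1, and the adjusted coefficient is close to it even though `u` itself is never observed. If `u` also affected `y` directly, conditioning on `x` would no longer be enough.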
Not all backdoor paths
[Figure: DAG with nodes $D$, $U_1$, $X$, $Y$]

• Conditioning on posttreatment covariates opens the non-causal path
  ◦ Selection bias
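The danger of conditioning on a posttreatment covariate can be shown in a simulation. The arrows are an assumption here: the sketch takes the structure to be $D \rightarrow X \leftarrow U_1 \rightarrow Y$ with a direct effect $D \rightarrow Y$, which is one common way to draw this problem, and the treatment is randomized so that there is no confounding to begin with. All names and coefficients are hypothetical.

```r
# Conditioning on a posttreatment collider opens a non-causal path.
# Assumed structure: D -> X <- U1 -> Y, plus D -> Y. Names are illustrative.
set.seed(2002)
n <- 20000
d <- rbinom(n, 1, 0.5)             # randomized treatment
u1 <- rnorm(n)                     # unmeasured cause of X and Y
x <- d + u1 + rnorm(n)             # posttreatment covariate (collider)
tau <- 1
y <- tau * d + 2 * u1 + rnorm(n)

# No adjustment: consistent because d is randomized
unadjusted <- coef(lm(y ~ d))["d"]

# Adjusting for the posttreatment covariate induces selection bias
adjusted <- coef(lm(y ~ d + x))["d"]

print(c(unadjusted = unname(unadjusted), adjusted = unname(adjusted)))
```

With these particular coefficients the adjusted estimate is pushed all the way to roughly zero, even though the simulated effect is 1 and the unadjusted estimate recovers it.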
M-bias
[Figure: DAG with nodes $D$, $U_1$, $U_2$, $X$, $Y$]

• Not all backdoor paths induce confounding
• This backdoor path is blocked by the collider $X_i$ that we don't control for
• If we control for $X_i$, this opens the path and induces confounding
  ◦ Sometimes called M-bias
• Controversial because of differing views on what to control for:
  ◦ Rubin thinks that M-bias is a "mathematical curiosity" and we should control for all pretreatment variables
  ◦ Pearl and others think M-bias is a real threat
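M-bias is easy to reproduce in a simulation. The sketch assumes the standard M-shaped structure, $D \leftarrow U_1 \rightarrow X \leftarrow U_2 \rightarrow Y$ with a direct effect $D \rightarrow Y$; the names and coefficients are hypothetical, and the size of the bias depends heavily on them.

```r
# M-bias: controlling for a pretreatment collider opens a backdoor path.
# Assumed structure: D <- U1 -> X <- U2 -> Y, plus D -> Y. Names are illustrative.
set.seed(2002)
n <- 20000
u1 <- rnorm(n)
u2 <- rnorm(n)
x <- u1 + u2 + rnorm(n)                # pretreatment collider
d <- rbinom(n, 1, plogis(1.5 * u1))    # U1 -> D
tau <- 1
y <- tau * d + 2 * u2 + rnorm(n)       # U2 -> Y and D -> Y

# The backdoor path is blocked by the uncontrolled collider
unadjusted <- coef(lm(y ~ d))["d"]

# Controlling for the collider opens the path
adjusted <- coef(lm(y ~ d + x))["d"]

print(c(unadjusted = unname(unadjusted), adjusted = unname(adjusted)))
```

Here the unadjusted estimate is close to the simulated effect and the adjusted one is biased downward. The strong effects of `u1` and `u2` on `x` in this setup are chosen to make the bias visible; weaker effects would produce a smaller bias, which is part of why its practical importance is debated.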
Backdoor criterion
• Can we use a DAG to evaluate no unmeasured confounders?
• Pearl answered yes, with the backdoor criterion, which states that the effect of $D$ on $Y$ is identified if:
  1. No backdoor paths from $D$ to $Y$, OR
  2. Measured covariates are sufficient to block all backdoor paths from $D$ to $Y$
• First is really only valid for randomized experiments
• The backdoor criterion is fairly powerful. Tells us:
  ◦ if there is confounding given this DAG,
  ◦ if it is possible to remove the confounding, and
  ◦ what variables to condition on to eliminate the confounding
SWIGs
[Figure: SWIG with nodes $D \mid d$, $U$, $X$, $Y(d)$]

• It's a little hard to see how the backdoor criterion implies no unmeasured confounders
  ◦ No potential outcomes on this graph
• Richardson and Robins: Single World Intervention Graphs
  ◦ Split the $D$ node into natural value ($D$) and intervention value $d$
  ◦ Let all effects of $D$ take their potential value under intervention, $Y(d)$
• Now can see: are $D$ and $Y(d)$ related?
  ◦ $D \leftarrow U \rightarrow X \rightarrow Y(d)$ implies not independent
  ◦ Conditioning on $X$ blocks that backdoor path: $D \perp\!\!\!\perp Y(d) \mid X$
No unmeasured confounders is not testable
• No unmeasured confounding places no restrictions on the observed data:

$$\underbrace{\left(Y_i(0) \mid D_i = 1, X_i\right)}_{\text{unobserved}} \overset{d}{=} \underbrace{\left(Y_i(0) \mid D_i = 0, X_i\right)}_{\text{observed}}$$

• Here $\overset{d}{=}$ means equal in distribution
• No way to directly test this assumption without the counterfactual data, which is missing by definition
• With the backdoor criterion, you must have the correct DAG
Going to a superpopulation
bull From here on out wersquoll focus less on the finite populationmodel
Harder with (functionally) continuous covariatesbull Assume that each unit 119894 is drawn from an infinite
superpopulation implies that (119884119894(0) 119884119894(1) 119863119894 119883119894) are a draw from their
population joint distributionbull Potential outcomes are now typical random variables
120583119888(119909) = 120124[119884119894(0)|119883119894 = 119909] and 120583119905(119909) = 120124[119884119894(1)|119883119894 = 119909] 1205901113569119888 (119909) = 120141[119884119894(0)|119883119894 = 119909] and 1205901113569119905 (119909) = 120141[119884119894(1)|119883119894 = 119909] 120591 = 120124[120583119905(119909) minus 120583119888(119909)|119883119894 = 119909]
Assumptions in the superpopulation
bull With an infinite superpopulation worry less aboutconditioning on the entire sample
Units are now independent due to random sampling from aninfinite population
bull No unmeasured confoudning implies that
ℙ(119863119894 = 1|119884119894(0) 119884119894(1) 119883119894) = ℙ(119863119894 = 1|119883119894)
bull Or written using conditional independence
119863119894 ⟂⟂ 1114100119884119894(0) 119884119894(1)1114103|119883119894
bull Positivity can be written 0 lt ℙ(119863119894 = 1|119883119894 = 119909) lt 1 for all 119909 inthe support of 119883119894
2 Confounding
What is confounding
bull Confounding is the bias caused by common causes of thetreatment and outcome
Leads to ldquospurious correlationrdquobull In observational studies the goal is to avoid confounding
inherent in the databull Pervasive in the social sciences
effect of income on voting (confounding age) effect of job training program on employment (confounding
motivation) effect of political institutions on economic development
(confounding previous economic development)bull No unmeasured confounding assumes that wersquove measured all
sources of confounding
Big problem
bull How can we determine if no unmeasured confounding holds ifwe didnrsquot assign the treatment
bull Put differently What covariates do we need to condition on What covariates do we need to match on What covaraites do we need to include in our regressions
bull One way from the assumption itself ℙ[119863119894 = 1|119831 119832(1) 119832(0)] = ℙ[119863119894 = 1|119831] Include covariates such that conditional on them the
treatment assignment does not depend on the potentialoutcomes
bull Another way use DAGs and look at back-door paths
Backdoor paths and blocking paths
bull Backdoor path is a non-causal path from 119863 to 119884 Would remain if we removed any arrows pointing out of 119863
bull Backdoor paths between 119863 and 119884 common causes of 119863and 119884
119863
119883
119884
bull Here there is a backdoor path 119863 larr 119883 rarr 119884 where 119883 is acommon cause for the treatment and the outcome
Other types of confounding
119863
119880 119883
119884
bull 119863 is enrolling in a job training programbull 119884 is getting a jobbull 119880 is being motivatedbull 119883 is number of job applications sent outbull Big assumption here no arrow from 119880 to 119884
Other types of confounding
119863
119880 119883
119884
bull 119863 is exercisebull 119884 is having a diseasebull 119880 is lifestylebull 119883 is smokingbull Big assumption here no arrow from 119880 to 119884
Whatrsquos the problem with backdoorpaths
119863
119880 119883
119884
bull A path is blocked if1 we control for or stratify a non-collider on that path OR2 we do not control for a collider
bull Unblocked backdoor paths confoundingbull In the DAG here if we condition on 119883 then the backdoor
path is blocked
Not all backdoor paths
119863
1198801113568119883119883
119884
bull Conditioning on the posttreatment covariates opens thenon-causal path
selection bias
M-bias
119863
1198801113568 1198801113569119883119883
119884
bull Not all backdoor paths induce confoundingbull This backdoor path is blocked by the collider 119883119894 that we donrsquot
control forbull If we control for 119883119894 opens the path and induces
confounding Sometimes called M-bias
bull Controversial because of differing views on what to control for Rubin thinks that M-bias is a ldquomathematical curiosityrdquo and we
should control for all pretreatment variables Pearl and others think M-bias is a real threat
Backdoor criterion
bull Can we use a DAG to evaluate no unmeasured confoundersbull Pearl answered yes with the backdoor criterion which states
that the effect of 119863 on 119884 is identified if1 No backdoor paths from 119863 to 119884 OR2 Measured covariates are sufficient to block all backdoor paths
from 119863 to 119884
bull First is really only valid for randomized experimentsbull The backdoor criterion is fairly powerful Tells us
if there confounding given this DAG if it is possible to removing the confounding and what variables to condition on to eliminate the confounding
SWIGs
119863 | 119889 119884(119889)
119880 119883
119884
bull Itrsquos a little hard to see how the backdoor criterion implies nounmeasured confounders
No potential outcomes on this graphbull Richardson and Robins Single World Intervention Graphs
Split 119863 node into natural value (119863) and intervention value 119889 Let all effects of 119863 take their potential value under
intervention 119884(119889)bull Now can see are 119863 and 119884(119889) related
119863 larr 119880 rarr 119883 rarr 119884(119889) implies not independent Conditioning on 119883 blocks that backdoor path 119863 ⟂⟂ 119884(119889)|119883
No unmeasured confounders is nottestable
bull No unmeasured confounding places no restrictions on theobserved data
1114100119884119894(0)|119863119894 = 1119883119894111410311140591113840111384011138401113840111384011138401113840111384011138401113840111406011138401113840111384011138401113840111384011138401113840111384011138401114061unobserved
119889= 1114100119884119894(0)|119863119894 = 0119883119894111410311140591113840111384011138401113840111384011138401113840111384011138401113840111406011138401113840111384011138401113840111384011138401113840111384011138401114061observed
bull Here 119889= means equal in distributionbull No way to directly test this assumption without the
counterfactual data which is missing by definitionbull With backdoor criterion you must have the correct DAG
Assessing no unmeasured confounders
bull Can do ldquoplacebordquo tests where 119863119894 cannot have an effect(lagged outcomes etc)
bull Della Vigna and Kaplan (2007 QJE) effect of Fox Newsavailability on Republican vote share
Availability in 20002003 canrsquot affect past vote sharesbull Unconfoundedness could still be violated even if you pass this
test
Alternatives to no unmeasuredconfounding
bull Without explicit randomization we need some way ofidentifying causal effects
bull No unmeasured confounders asymp randomized experiment Indentification results very similar to experiments
bull With unmeasured confounding are we doomed Maybe notbull Other approaches rely on finding plausibly exogenous variation
in assignment of 119863119894 Instrumental variables (randomization + exclusion restriction) Over-time variation (diff-in-diff fixed effects) Arbitrary thresholds for treatment assignment (RDD)
3 No unmeasuredconfounders and OLS
Justifying regression
bull We know how randomized experiments imply thatdifferences-in-means identify the ATE
bull In the next few weeks wersquoll work through how no unmeasuredconfounding justifies a number of estimation strategies
bull Today itrsquos useful to walk through what no unmeasuredconfounding can buy us in a familiar setting OLS
Wersquoll cover regression more formally later
Constant effects setup
• Assume a constant effects setup:
  $$Y_i(0) = \alpha + X_i'\beta + u_i$$
  $$Y_i(1) = \alpha + \tau + X_i'\beta + u_i$$
• Constant effects because $Y_i(1) - Y_i(0) = \tau$ for all units
• Use consistency to get the usual regression formula:
  $$Y_i = Y_i(1) D_i + Y_i(0)(1 - D_i) = Y_i(0) + \big(Y_i(1) - Y_i(0)\big) \cdot D_i = \alpha + \tau \cdot D_i + X_i'\beta + u_i$$
• Does no unmeasured confounding help us identify the causal parameter $\tau$?
Regression on residuals
• First, compute the residuals from regressions of the outcome and the treatment on the covariates:
  $$\tilde{Y}_i = Y_i - \mathbb{E}[Y_i \mid X_i] \qquad \tilde{D}_i = D_i - \mathbb{E}[D_i \mid X_i]$$
• Running a regression of $\tilde{Y}_i$ on $\tilde{D}_i$ is equivalent to controlling for $X_i$:
  $$Y_i = \alpha + \tau \cdot D_i + X_i'\beta + u_i$$
  $$\tilde{Y}_i = \alpha + \tau \cdot \tilde{D}_i + \tilde{u}_i$$
• Here $\tilde{u}_i = u_i - \mathbb{E}[u_i \mid X_i]$
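
The equivalence between the residual regression and controlling for $X_i$ can be checked numerically. The sketch below uses simulated data and replaces the conditional expectations with linear regressions on the covariates (an assumption of this example); the names `x1`, `x2`, `y_tilde`, and `d_tilde` are hypothetical.

```r
# Residual regression vs. controlling for covariates, on simulated data.
# Linear projections stand in for E[Y | X] and E[D | X] (an assumption here).
set.seed(1)
n <- 1000
x1 <- rnorm(n)
x2 <- rnorm(n)
d <- rbinom(n, 1, plogis(0.5 * x1 - 0.5 * x2))
y <- 1 + 2 * d + x1 + 0.5 * x2 + rnorm(n)

# Regression that controls for the covariates directly
full <- lm(y ~ d + x1 + x2)

# Residualize outcome and treatment on the covariates, then regress
y_tilde <- resid(lm(y ~ x1 + x2))
d_tilde <- resid(lm(d ~ x1 + x2))
partial <- lm(y_tilde ~ d_tilde)

tau_full    <- coef(full)[["d"]]
tau_partial <- coef(partial)[["d_tilde"]]
print(c(full = tau_full, residualized = tau_partial))
```

The two coefficients on the treatment agree up to numerical precision, which is why the derivation can work entirely with $\tilde{Y}_i$ and $\tilde{D}_i$.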
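What does OLS estimate?
• Using the usual OLS theory, we can show that the probability limit of the OLS estimator of $\tau$ is:
  $$\begin{aligned}
  \operatorname{plim} \hat{\tau}_{OLS} &= \frac{\operatorname{Cov}(\tilde{Y}_i, \tilde{D}_i)}{\operatorname{Var}(\tilde{D}_i)} \\
  &= \frac{\operatorname{Cov}(\tilde{D}_i, \alpha + \tau \cdot \tilde{D}_i + \tilde{u}_i)}{\operatorname{Var}(\tilde{D}_i)} \\
  &= \frac{\tau \cdot \operatorname{Cov}(\tilde{D}_i, \tilde{D}_i) + \operatorname{Cov}(\tilde{D}_i, \tilde{u}_i)}{\operatorname{Var}(\tilde{D}_i)} \\
  &= \tau + \frac{\operatorname{Cov}(\tilde{D}_i, \tilde{u}_i)}{\operatorname{Var}(\tilde{D}_i)}
  \end{aligned}$$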
Key OLS assumption
  $$\operatorname{plim} \hat{\tau}_{OLS} = \tau + \frac{\operatorname{Cov}(\tilde{D}_i, \tilde{u}_i)}{\operatorname{Var}(\tilde{D}_i)}$$
• Key identification comes from $\operatorname{Cov}(\tilde{D}_i, \tilde{u}_i) = 0$
  - Conditional on $X_i$, no relationship between $D_i$ and $u_i$
• Note: $u_i$ is a function of $X_i$ and $Y_i(d)$
  - $u_i = Y_i(0) - \alpha - X_i'\beta$ when $D_i = 0$
  - $u_i = Y_i(1) - \alpha - \tau - X_i'\beta$ when $D_i = 1$
  - Conditional on $X_i$, the only variation in $u_i$ comes from $Y_i(d)$
• No unmeasured confounding implies this assumption:
  $$D_i \perp\!\!\!\perp \big(Y_i(1), Y_i(0)\big) \mid X_i \implies D_i \perp\!\!\!\perp u_i \mid X_i \implies \operatorname{Cov}(\tilde{D}_i, \tilde{u}_i) = 0$$
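
The final implication can be spelled out with the law of iterated expectations. The following is a sketch of that step using only the residuals defined earlier, and it assumes the relevant second moments exist.

```latex
\begin{align*}
\operatorname{Cov}(\tilde{D}_i, \tilde{u}_i)
  &= \mathbb{E}[\tilde{D}_i \tilde{u}_i]
     && \text{since } \mathbb{E}[\tilde{D}_i] = 0 \\
  &= \mathbb{E}\big[\, \mathbb{E}[\tilde{D}_i \tilde{u}_i \mid X_i] \,\big]
     && \text{iterated expectations} \\
  &= \mathbb{E}\big[\, \mathbb{E}[\tilde{D}_i \mid X_i] \cdot \mathbb{E}[\tilde{u}_i \mid X_i] \,\big]
     && \text{since } D_i \perp\!\!\!\perp u_i \mid X_i \\
  &= 0
     && \text{since } \mathbb{E}[\tilde{D}_i \mid X_i] = 0
\end{align*}
```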
Omitted variable bias
• What happens when this is violated? Suppose that there is one omitted variable $L_i$ (residualized from $X_i$ as $\tilde{L}_i$):
  $$\tilde{u}_i = \lambda \tilde{L}_i + \omega_i$$
• We'll assume that if we could measure $L_i$, then no unmeasured confounding would hold
• Leads to inconsistency in the OLS estimator:
  $$\operatorname{plim} \hat{\tau}_{OLS} = \tau + \lambda \frac{\operatorname{Cov}(\tilde{D}_i, \tilde{L}_i)}{\operatorname{Var}(\tilde{D}_i)}$$
• The bias is the product of two terms:
  1. the coefficient on $L_i$ ($\lambda$)
  2. the coefficient on $D_i$ from a regression of $L_i$ on $D_i$, also controlling for $X_i$
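
The bias formula has an exact in-sample analogue that can be verified on simulated data. In this sketch the variable names (`x`, `l`, `d`, `y`) and parameter values are assumptions made for illustration only.

```r
# Omitted variable bias on simulated data; all values are illustrative assumptions.
set.seed(2002)
n <- 5000
x <- rnorm(n)                                   # measured covariate X_i
l <- 0.5 * x + rnorm(n)                         # omitted variable L_i
d <- rbinom(n, 1, plogis(0.8 * l + 0.5 * x))    # treatment depends on L_i and X_i
tau <- 2
lambda <- 1.5
y <- 1 + tau * d + 0.7 * x + lambda * l + rnorm(n)

long  <- lm(y ~ d + x + l)   # regression we would run if L_i were measured
short <- lm(y ~ d + x)       # regression we can run when L_i is unmeasured
aux   <- lm(l ~ d + x)       # regression of L_i on D_i, controlling for X_i

# Bias = (coefficient on L_i) x (coefficient on D_i in the auxiliary regression)
bias_formula <- coef(long)[["l"]] * coef(aux)[["d"]]
bias_actual  <- coef(short)[["d"]] - coef(long)[["d"]]
print(c(formula = bias_formula, actual = bias_actual))
```

The two quantities coincide because the omitted variable decomposition is an algebraic identity for OLS, not only a large-sample result.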
4 Estimating causal effects under no unmeasured confounders
Basic approach to estimation
• Remember the usual approach to estimating the ATE with covariates:
• Stratification:
  - Stratify the units by the covariates
  - Calculate CATE within these strata
• Standardization/direct adjustment:
  - Average the CATEs across the strata to get ATE
• How to create strata when $X$ has continuous components?
  - If $X$ is discrete with only a few levels, can use the exact values of $X$
  - Otherwise, we may have to subclassify/coarsen the data
Classic example: cigars/pipes versus cigarettes
• $D_i = 1$ for pipe/cigar smokers, $D_i = 0$ for cigarette smokers
• $Y_i$ = death in the first year of follow-up
• Naive positive effect: cigar/pipe smokers more likely to die
  - What's the confounder here? Age
  - Pipe/cigar smokers much older than cigarette smokers
• Cochran's approach: stratify based on coarsened age
  - Divide age into $k$ strata, $S_i \in \{s_1, s_2, \ldots, s_k\}$
  - $s_1$ might be 18-25, $s_2$ might be 26-35, and so on
  - Calculate effect within strata and aggregate
• Key assumption: no unmeasured confounders using stratified version of age:
  $$D_i \perp\!\!\!\perp \big(Y_i(0), Y_i(1)\big) \mid S_i$$
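
The stratify-then-aggregate idea can be sketched with simulated data. These are not Cochran's data: the age groups, treatment probabilities, and death rates below are made up, and the simulation assumes smoking type has no effect on death so that any naive difference is pure confounding by age.

```r
# Stratification on coarsened age; simulated data, NOT Cochran's actual figures.
set.seed(1968)
n <- 20000
age_group <- sample(c("18-25", "26-35", "36-50", "51+"), n, replace = TRUE)

# Assumed: older units are more likely to smoke pipes/cigars and more likely to die
p_treat <- c("18-25" = 0.1, "26-35" = 0.2, "36-50" = 0.5, "51+" = 0.8)[age_group]
p_death <- c("18-25" = 0.01, "26-35" = 0.02, "36-50" = 0.05, "51+" = 0.15)[age_group]
d <- rbinom(n, 1, p_treat)   # 1 = pipe/cigar, 0 = cigarette
y <- rbinom(n, 1, p_death)   # death; no effect of d by construction

# Naive difference in means ignores age
naive <- mean(y[d == 1]) - mean(y[d == 0])

# Within-stratum differences (CATEs), then a weighted average over strata
strata <- split(data.frame(y = y, d = d), age_group)
cate <- sapply(strata, function(s) mean(s$y[s$d == 1]) - mean(s$y[s$d == 0]))
weights <- sapply(strata, nrow) / n
stratified <- sum(cate * weights)

print(c(naive = naive, stratified = stratified))
```

In this setup the naive comparison shows a sizable positive "effect" while the stratified estimate is close to the true value of zero.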
Stratification on the propensity score
• What about when $X$ has many dimensions?
• Curse of dimensionality: there will be very few, if any, units in a given stratum of $X_i$
• Stratify on a low-dimensional summary, the propensity score:
  $$e(x) = \mathbb{P}[D_i = 1 \mid X_i = x]$$
  - PS = unit's probability of being treated, conditional on $X_i$
• For a particular unit, this is $e(X_i) = \mathbb{P}[D_i = 1 \mid X_i]$
• Rosenbaum and Rubin (1983) showed that:
  $$D_i \perp\!\!\!\perp \big(Y_i(0), Y_i(1)\big) \mid X_i \implies D_i \perp\!\!\!\perp \big(Y_i(0), Y_i(1)\big) \mid e(X_i)$$
  - For removing confounding, stratifying on $e(X_i)$ is the same as stratifying on the full $X_i$
Propensity score as balancing score
• The propensity score is actually a balancing score, which means that:
  $$D_i \perp\!\!\!\perp X_i \mid e(X_i)$$
• Conditional on the propensity score, treatment is independent of the covariates
  - The covariates are said to be balanced across treatment status: $f(X_i \mid D_i = 1, e(X_i)) = f(X_i \mid D_i = 0, e(X_i))$
• Of course, we have to know the true PS to have all these results work
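
A short sketch of why the balancing property holds for the true propensity score: the probability of treatment given both $X_i$ and $e(X_i)$ equals the probability given $e(X_i)$ alone, so $X_i$ carries no further information about $D_i$. This is an outline of the standard argument, written in the notation above.

```latex
\begin{align*}
\mathbb{P}[D_i = 1 \mid X_i, e(X_i)]
  &= \mathbb{P}[D_i = 1 \mid X_i] = e(X_i) \\
\mathbb{P}[D_i = 1 \mid e(X_i)]
  &= \mathbb{E}\big[\, \mathbb{P}[D_i = 1 \mid X_i, e(X_i)] \,\big|\, e(X_i) \big]
   = \mathbb{E}[\, e(X_i) \mid e(X_i) \,] = e(X_i) \\
\text{so}\quad
\mathbb{P}[D_i = 1 \mid X_i, e(X_i)]
  &= \mathbb{P}[D_i = 1 \mid e(X_i)]
  \quad\Longrightarrow\quad D_i \perp\!\!\!\perp X_i \mid e(X_i)
\end{align*}
```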
Estimating the propensity score
• Of course, in observational studies we don't know the propensity score
• We would run a parametric model with parameters $\gamma$ to estimate the propensity scores:
  1. Estimate $\hat{\gamma}$
  2. Create $\hat{e}_i = \Pr[D_i = 1 \mid X_i; \hat{\gamma}]$
• For instance, in R we could easily calculate the propensity scores using the glm function:
    # logistic regression of treatment on covariates; fitted values are the estimated PS
    pscores <- glm(treat ~ var1 + var2 + var3, data = mydata,
                   family = binomial())$fitted.values
Propensity score specifics
• What variables do we include in the propensity score model?
  - Any set of variables that blocks all the backdoor paths from $D_i$ to $Y_i$
• Check balance within strata of $\hat{e}_i$. Covariates should be balanced:
  $$f(X_i \mid D_i = 1, \hat{e}_i) = f(X_i \mid D_i = 0, \hat{e}_i)$$
• Can also use automated/nonparametric tools for estimating $\hat{e}_i$:
  - Covariate Balancing Propensity Scores (Imai and Ratkovic)
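
One way to check balance within strata of the estimated propensity score is to compare covariate means by treatment status inside each block. The sketch below is a minimal illustration on simulated data; the data frame `mydata`, the variables `var1` and `var2`, and the choice of quintile blocks are assumptions of this example.

```r
# Balance check within propensity score strata; simulated data, names hypothetical.
set.seed(83)
n <- 5000
mydata <- data.frame(var1 = rnorm(n), var2 = rnorm(n))
mydata$treat <- rbinom(n, 1, plogis(0.8 * mydata$var1 - 0.5 * mydata$var2))

# Estimate the propensity score with a logistic regression
ps_model <- glm(treat ~ var1 + var2, data = mydata, family = binomial())
mydata$pscore <- ps_model$fitted.values

# Coarsen the estimated score into quintile blocks (one possible choice)
breaks <- quantile(mydata$pscore, probs = seq(0, 1, by = 0.2))
mydata$block <- cut(mydata$pscore, breaks = breaks, include.lowest = TRUE)

# Treated-minus-control difference in means of var1, overall and within blocks
raw_diff <- with(mydata, mean(var1[treat == 1]) - mean(var1[treat == 0]))
within_diff <- sapply(split(mydata, mydata$block), function(b) {
  mean(b$var1[b$treat == 1]) - mean(b$var1[b$treat == 0])
})

print(raw_diff)
print(within_diff)
```

Within-block differences should be much smaller than the raw difference. Remaining imbalance inside a block suggests finer blocks or a richer propensity score model.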
Stratifying by the propensity score
• How will we use the propensity score?
  - Matching (next week)
  - Weighting (two weeks)
  - Regression (three weeks)
• Today: coarsening the propensity score and stratifying
• Choose boundary points: $0 = b_0 < b_1 < \ldots < b_{K-1} < b_K = 1$
• Create block indicators:
  $$B_i(k) = \begin{cases} 1 & \text{if } b_{k-1} < \hat{e}(X_i) < b_k \\ 0 & \text{otherwise} \end{cases}$$
• Calculate within-strata effect estimates:
  $$\tau_k = \mathbb{E}[Y_i \mid D_i = 1, B_i(k) = 1] - \mathbb{E}[Y_i \mid D_i = 0, B_i(k) = 1]$$
Standardization/direct adjustment
• We calculated the CATEs for each stratum of the PS, $\tau_k$
• We can use the law of iterated expectations to back out the ATE
• Take the average of the CATEs over the distribution of $X$:
  $$\tau = \sum_{k=1}^{K} \tau_k \, \mathbb{P}[B_i(k) = 1]$$
• Note that $\mathbb{P}[B_i(k) = 1]$ is just the proportion of units in block $k$:
  $$\mathbb{P}[B_i(k) = 1] = \frac{\sum_{i=1}^{N} B_i(k)}{N}$$
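
Putting the blocking and standardization steps together, a minimal sketch on simulated data might look as follows. The data-generating process, the choice of $K = 5$ equal-width blocks, and all variable names are assumptions of this example, and `cut()` uses half-open intervals $(b_{k-1}, b_k]$ rather than the strict inequalities in the block definition.

```r
# Propensity score stratification estimator of the ATE; simulated data.
set.seed(2015)
n <- 20000
x1 <- rnorm(n)
x2 <- rnorm(n)
d <- rbinom(n, 1, plogis(0.8 * x1 - 0.5 * x2))
tau <- 2                                  # true constant effect (assumed)
y <- 1 + tau * d + x1 + x2 + rnorm(n)

# Step 1: estimate the propensity score
e_hat <- glm(d ~ x1 + x2, family = binomial())$fitted.values

# Step 2: boundary points 0 = b_0 < ... < b_K = 1 and block membership
K <- 5
b <- seq(0, 1, length.out = K + 1)
block <- cut(e_hat, breaks = b, include.lowest = TRUE)

# Step 3: within-block differences in means (tau_k)
blocks <- split(data.frame(y = y, d = d), block)
tau_k <- sapply(blocks, function(s) mean(s$y[s$d == 1]) - mean(s$y[s$d == 0]))

# Step 4: weight each tau_k by the share of units in block k
p_k <- sapply(blocks, nrow) / n
tau_hat <- sum(tau_k * p_k)

naive <- mean(y[d == 1]) - mean(y[d == 0])
print(c(naive = naive, stratified = tau_hat))
```

With only a few coarse blocks some confounding remains inside each block, so the stratified estimate is closer to the truth than the naive difference but not exactly unbiased; more blocks typically reduce that residual bias.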
5 Wrapping Up
Summary
• Defined observational studies
• Defined confounding and assessed when no unmeasured confounding holds
• Saw how no unmeasured confounding helps with OLS
• Saw how to estimate causal effects under no unmeasured confounding using the propensity score
Next few weeks
• Learn how to estimate causal effects under no unmeasured confounders via:
  - Matching
  - Weighting
  - Regression
• Then we move on to situations where no unmeasured confounders is violated
Assumptions in the superpopulation
• With an infinite superpopulation, worry less about conditioning on the entire sample
  - Units are now independent due to random sampling from an infinite population
• No unmeasured confounding implies that:
  $$\mathbb{P}(D_i = 1 \mid Y_i(0), Y_i(1), X_i) = \mathbb{P}(D_i = 1 \mid X_i)$$
• Or, written using conditional independence:
  $$D_i \perp\!\!\!\perp \big(Y_i(0), Y_i(1)\big) \mid X_i$$
• Positivity can be written: $0 < \mathbb{P}(D_i = 1 \mid X_i = x) < 1$ for all $x$ in the support of $X_i$
2 Confounding
What is confounding?
• Confounding is the bias caused by common causes of the treatment and outcome
  - Leads to "spurious correlation"
• In observational studies, the goal is to avoid confounding inherent in the data
• Pervasive in the social sciences:
  - effect of income on voting (confounding: age)
  - effect of job training program on employment (confounding: motivation)
  - effect of political institutions on economic development (confounding: previous economic development)
• No unmeasured confounding assumes that we've measured all sources of confounding
Big problem
• How can we determine if no unmeasured confounding holds if we didn't assign the treatment?
• Put differently:
  - What covariates do we need to condition on?
  - What covariates do we need to match on?
  - What covariates do we need to include in our regressions?
• One way, from the assumption itself:
  - $\mathbb{P}[D_i = 1 \mid \mathbf{X}, \mathbf{Y}(1), \mathbf{Y}(0)] = \mathbb{P}[D_i = 1 \mid \mathbf{X}]$
  - Include covariates such that, conditional on them, the treatment assignment does not depend on the potential outcomes
• Another way: use DAGs and look at backdoor paths
Backdoor paths and blocking paths
• A backdoor path is a non-causal path from $D$ to $Y$
  - Would remain if we removed any arrows pointing out of $D$
• Backdoor paths between $D$ and $Y$ reflect common causes of $D$ and $Y$
[Figure: DAG with nodes $D$, $X$, and $Y$]
• Here there is a backdoor path $D \leftarrow X \rightarrow Y$, where $X$ is a common cause of the treatment and the outcome
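
The backdoor path $D \leftarrow X \rightarrow Y$ can be made concrete with a small simulation. The functional forms and coefficients here are assumptions chosen only to illustrate how a common cause biases the unadjusted comparison.

```r
# Confounding through a common cause X; simulated, all numbers are assumptions.
set.seed(4)
n <- 5000
x <- rnorm(n)                          # common cause of treatment and outcome
d <- rbinom(n, 1, plogis(1.5 * x))     # X -> D
tau <- 1                               # true effect of D on Y
y <- tau * d + 2 * x + rnorm(n)        # D -> Y and X -> Y

naive    <- coef(lm(y ~ d))[["d"]]      # backdoor path left open
adjusted <- coef(lm(y ~ d + x))[["d"]]  # backdoor path blocked by conditioning on X

print(c(truth = tau, naive = naive, adjusted = adjusted))
```

The unadjusted coefficient mixes the causal effect with the association that flows through $X$, while conditioning on $X$ recovers a value close to the true effect.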
Other types of confounding
[Figure: DAG with nodes $D$, $U$, $X$, and $Y$]
• $D$ is enrolling in a job training program
• $Y$ is getting a job
• $U$ is being motivated
• $X$ is number of job applications sent out
• Big assumption here: no arrow from $U$ to $Y$
Other types of confounding
[Figure: DAG with nodes $D$, $U$, $X$, and $Y$]
• $D$ is exercise
• $Y$ is having a disease
• $U$ is lifestyle
• $X$ is smoking
• Big assumption here: no arrow from $U$ to $Y$
What's the problem with backdoor paths?
[Figure: DAG with nodes $D$, $U$, $X$, and $Y$]
• A path is blocked if:
  1. we control for or stratify on a non-collider on that path, OR
  2. we do not control for a collider on that path
• Unblocked backdoor paths imply confounding
• In the DAG here, if we condition on $X$, then the backdoor path is blocked
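
Blocking a path by conditioning on a non-collider can be sketched in a simulation. The example assumes the backdoor path is $D \leftarrow U \rightarrow X \rightarrow Y$ with $U$ unmeasured and no arrow from $U$ to $Y$; the figure itself is not reproduced here, so this structure and all coefficients are assumptions.

```r
# Blocking D <- U -> X -> Y by conditioning on X; U is treated as unmeasured.
set.seed(5)
n <- 5000
u <- rnorm(n)                       # unmeasured (e.g. motivation)
x <- u + rnorm(n)                   # U -> X (e.g. job applications sent)
d <- rbinom(n, 1, plogis(u))        # U -> D
tau <- 1
y <- tau * d + 1.5 * x + rnorm(n)   # D -> Y and X -> Y, no U -> Y arrow

naive    <- coef(lm(y ~ d))[["d"]]      # path through U and X is open
adjusted <- coef(lm(y ~ d + x))[["d"]]  # conditioning on X blocks the path

print(c(truth = tau, naive = naive, adjusted = adjusted))
```

Even though $U$ is never observed, conditioning on $X$ is enough here because every backdoor path runs through $X$. An arrow from $U$ to $Y$ would break this.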
Not all backdoor paths
[Figure: DAG with nodes $D$, $U_1$, $X$, and $Y$]
• Conditioning on the posttreatment covariates opens the non-causal path
  - selection bias
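
A hedged illustration of this kind of selection bias: suppose treatment is randomized, $X$ is measured after treatment, and $X$ is a collider on $D \rightarrow X \leftarrow U_1 \rightarrow Y$. That graph is an assumption of this sketch (the original figure is not recoverable), as are all coefficients.

```r
# Conditioning on a posttreatment collider; assumed graph D -> X <- U1 -> Y.
set.seed(6)
n <- 10000
d  <- rbinom(n, 1, 0.5)             # randomized treatment
u1 <- rnorm(n)                      # unmeasured cause of X and Y
x  <- d + u1 + rnorm(n)             # posttreatment covariate (collider)
tau <- 1
y  <- tau * d + 2 * u1 + rnorm(n)

unadjusted <- coef(lm(y ~ d))[["d"]]      # consistent: D is randomized
adjusted   <- coef(lm(y ~ d + x))[["d"]]  # conditioning on X opens D -> X <- U1 -> Y

print(c(truth = tau, unadjusted = unadjusted, adjusted = adjusted))
```

Here the unadjusted estimate is fine and it is the "controlled" regression that is biased, which is the opposite of the usual confounding story.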
M-bias
[Figure: DAG with nodes $D$, $U_1$, $U_2$, $X$, and $Y$]
• Not all backdoor paths induce confounding
• This backdoor path is blocked by the collider $X_i$ that we don't control for
• If we control for $X_i$, it opens the path and induces confounding
  - Sometimes called M-bias
• Controversial because of differing views on what to control for:
  - Rubin thinks that M-bias is a "mathematical curiosity" and we should control for all pretreatment variables
  - Pearl and others think M-bias is a real threat
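
M-bias can be reproduced in a simulation. The sketch assumes the usual M-shaped graph, $D \leftarrow U_1 \rightarrow X \leftarrow U_2 \rightarrow Y$, with $X$ measured before treatment; the coefficients are arbitrary, and the size of the bias in real applications is exactly what the debate above is about.

```r
# M-bias: controlling for a pretreatment collider X; simulated, values assumed.
set.seed(7)
n <- 10000
u1 <- rnorm(n)                        # unmeasured cause of D and X
u2 <- rnorm(n)                        # unmeasured cause of X and Y
x  <- u1 + u2 + rnorm(n)              # pretreatment collider
d  <- rbinom(n, 1, plogis(1.5 * u1))  # U1 -> D
tau <- 1
y  <- tau * d + 2 * u2 + rnorm(n)     # D -> Y and U2 -> Y

unadjusted <- coef(lm(y ~ d))[["d"]]      # backdoor path blocked at the collider
adjusted   <- coef(lm(y ~ d + x))[["d"]]  # controlling for X opens the path

print(c(truth = tau, unadjusted = unadjusted, adjusted = adjusted))
```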
Backdoor criterion
• Can we use a DAG to evaluate no unmeasured confounders?
• Pearl answered yes, with the backdoor criterion, which states that the effect of $D$ on $Y$ is identified if:
  1. No backdoor paths from $D$ to $Y$, OR
  2. Measured covariates are sufficient to block all backdoor paths from $D$ to $Y$
• First is really only valid for randomized experiments
• The backdoor criterion is fairly powerful. Tells us:
  - if there is confounding given this DAG,
  - if it is possible to remove the confounding, and
  - what variables to condition on to eliminate the confounding
SWIGs
[Figure: SWIG with nodes $D \mid d$, $Y(d)$, $U$, and $X$, shown alongside the DAG with outcome node $Y$]
• It's a little hard to see how the backdoor criterion implies no unmeasured confounders
  - No potential outcomes on this graph
• Richardson and Robins: Single World Intervention Graphs
  - Split $D$ node into natural value ($D$) and intervention value $d$
  - Let all effects of $D$ take their potential value under intervention, $Y(d)$
• Now can see: are $D$ and $Y(d)$ related?
  - $D \leftarrow U \rightarrow X \rightarrow Y(d)$ implies not independent
  - Conditioning on $X$ blocks that backdoor path $\implies D \perp\!\!\!\perp Y(d) \mid X$
No unmeasured confounders is not testable
• No unmeasured confounding places no restrictions on the observed data:
  $$\underbrace{\big(Y_i(0) \mid D_i = 1, X_i\big)}_{\text{unobserved}} \overset{d}{=} \underbrace{\big(Y_i(0) \mid D_i = 0, X_i\big)}_{\text{observed}}$$
• Here $\overset{d}{=}$ means equal in distribution
• No way to directly test this assumption without the counterfactual data, which is missing by definition
• With the backdoor criterion, you must have the correct DAG
2 Confounding
What is confounding
bull Confounding is the bias caused by common causes of thetreatment and outcome
Leads to ldquospurious correlationrdquobull In observational studies the goal is to avoid confounding
inherent in the databull Pervasive in the social sciences
effect of income on voting (confounding age) effect of job training program on employment (confounding
motivation) effect of political institutions on economic development
(confounding previous economic development)bull No unmeasured confounding assumes that wersquove measured all
sources of confounding
Big problem
bull How can we determine if no unmeasured confounding holds ifwe didnrsquot assign the treatment
bull Put differently What covariates do we need to condition on What covariates do we need to match on What covaraites do we need to include in our regressions
bull One way from the assumption itself ℙ[119863119894 = 1|119831 119832(1) 119832(0)] = ℙ[119863119894 = 1|119831] Include covariates such that conditional on them the
treatment assignment does not depend on the potentialoutcomes
bull Another way use DAGs and look at back-door paths
Backdoor paths and blocking paths
bull Backdoor path is a non-causal path from 119863 to 119884 Would remain if we removed any arrows pointing out of 119863
bull Backdoor paths between 119863 and 119884 common causes of 119863and 119884
119863
119883
119884
bull Here there is a backdoor path 119863 larr 119883 rarr 119884 where 119883 is acommon cause for the treatment and the outcome
Other types of confounding
119863
119880 119883
119884
bull 119863 is enrolling in a job training programbull 119884 is getting a jobbull 119880 is being motivatedbull 119883 is number of job applications sent outbull Big assumption here no arrow from 119880 to 119884
Other types of confounding
119863
119880 119883
119884
bull 119863 is exercisebull 119884 is having a diseasebull 119880 is lifestylebull 119883 is smokingbull Big assumption here no arrow from 119880 to 119884
Whatrsquos the problem with backdoorpaths
119863
119880 119883
119884
bull A path is blocked if1 we control for or stratify a non-collider on that path OR2 we do not control for a collider
bull Unblocked backdoor paths confoundingbull In the DAG here if we condition on 119883 then the backdoor
path is blocked
Not all backdoor paths
119863
1198801113568119883119883
119884
bull Conditioning on the posttreatment covariates opens thenon-causal path
selection bias
M-bias
119863
1198801113568 1198801113569119883119883
119884
bull Not all backdoor paths induce confoundingbull This backdoor path is blocked by the collider 119883119894 that we donrsquot
control forbull If we control for 119883119894 opens the path and induces
confounding Sometimes called M-bias
bull Controversial because of differing views on what to control for Rubin thinks that M-bias is a ldquomathematical curiosityrdquo and we
should control for all pretreatment variables Pearl and others think M-bias is a real threat
Backdoor criterion
bull Can we use a DAG to evaluate no unmeasured confoundersbull Pearl answered yes with the backdoor criterion which states
that the effect of 119863 on 119884 is identified if1 No backdoor paths from 119863 to 119884 OR2 Measured covariates are sufficient to block all backdoor paths
from 119863 to 119884
bull First is really only valid for randomized experimentsbull The backdoor criterion is fairly powerful Tells us
if there confounding given this DAG if it is possible to removing the confounding and what variables to condition on to eliminate the confounding
SWIGs
119863 | 119889 119884(119889)
119880 119883
119884
bull Itrsquos a little hard to see how the backdoor criterion implies nounmeasured confounders
No potential outcomes on this graphbull Richardson and Robins Single World Intervention Graphs
Split 119863 node into natural value (119863) and intervention value 119889 Let all effects of 119863 take their potential value under
intervention 119884(119889)bull Now can see are 119863 and 119884(119889) related
119863 larr 119880 rarr 119883 rarr 119884(119889) implies not independent Conditioning on 119883 blocks that backdoor path 119863 ⟂⟂ 119884(119889)|119883
No unmeasured confounders is nottestable
bull No unmeasured confounding places no restrictions on theobserved data
1114100119884119894(0)|119863119894 = 1119883119894111410311140591113840111384011138401113840111384011138401113840111384011138401113840111406011138401113840111384011138401113840111384011138401113840111384011138401114061unobserved
119889= 1114100119884119894(0)|119863119894 = 0119883119894111410311140591113840111384011138401113840111384011138401113840111384011138401113840111406011138401113840111384011138401113840111384011138401113840111384011138401114061observed
bull Here 119889= means equal in distributionbull No way to directly test this assumption without the
counterfactual data which is missing by definitionbull With backdoor criterion you must have the correct DAG
Assessing no unmeasured confounders
bull Can do ldquoplacebordquo tests where 119863119894 cannot have an effect(lagged outcomes etc)
bull Della Vigna and Kaplan (2007 QJE) effect of Fox Newsavailability on Republican vote share
Availability in 20002003 canrsquot affect past vote sharesbull Unconfoundedness could still be violated even if you pass this
test
Alternatives to no unmeasuredconfounding
bull Without explicit randomization we need some way ofidentifying causal effects
bull No unmeasured confounders asymp randomized experiment Indentification results very similar to experiments
bull With unmeasured confounding are we doomed Maybe notbull Other approaches rely on finding plausibly exogenous variation
in assignment of 119863119894 Instrumental variables (randomization + exclusion restriction) Over-time variation (diff-in-diff fixed effects) Arbitrary thresholds for treatment assignment (RDD)
3 No unmeasuredconfounders and OLS
Justifying regression
bull We know how randomized experiments imply thatdifferences-in-means identify the ATE
bull In the next few weeks wersquoll work through how no unmeasuredconfounding justifies a number of estimation strategies
bull Today itrsquos useful to walk through what no unmeasuredconfounding can buy us in a familiar setting OLS
Wersquoll cover regression more formally later
Constant effects set up
bull Assume a constant effects setup
119884119894(0) = 120572 + 119883prime119894 120573 + 119906119894
119884119894(1) = 120572 + 120591 + 119883prime119894 120573 + 119906119894
bull Constant effects because 119884119894(1) minus 119884119894(0) = 120591 for all unitsbull Use consistency to get the usual regression formula
119884119894 = 119884119894(1)119863119894 + 119884119894(0)(1 minus 119863119894)= 119884119894(0) + 1114100119884119894(1) minus 119884119894(0)1114103 sdot 119863119894
= 120572 + 120591 sdot 119863119894 + 119883prime119894 120573 + 119906119894
bull Does no unmeasured confounding help us identify the causalparameter 120591
Regression on residuals
bull First estimate the residuals of regression of the treatment andoutcome on the covariates
119894 = 119884119894 minus 120124[119884119894|119883119894]119894 = 119863119894 minus 120124[119863119894|119883119894]
bull Running a regression of 119894 on 119894 is equivalent to controllingfor 119883119894
119884119894 = 120572 + 120591 sdot 119863119894 + 119883prime119894 120573 + 119906119894
119894 = 120572 + 120591 sdot 119894 + 119894bull Here 119894 = 119906119894 minus 120124[119906119894|119883119894]
What does OLS estimate
bull Using the usual OLS theory we can show that the probabilitylimit of the OLS estimator of 120591 is
plim 111369611136931113700 =Cov(119894 119894)Var(119894)
= Cov(119894 120572 + 120591 sdot 119894 + 119894)Var(119894)
= 120591 sdot Cov(119894 119894) + Cov(119894 119894)Var(119894)
= 120591 + Cov(119894 119894)Var(119894)
Key OLS assumption
plim 111369611136931113700 = 120591 + Cov(119894 119894)Var(119894)
bull Key identification comes from Cov(119894 119894) = 0 Conditional on 119883119894 no relationship between 119863119894 and 119906119894
bull Note 119906119894 is a function of 119883119894 and 119884119894(119889) 119906119894 = 119884119894(0) minus 120572 minus 119883prime
119894 120573 when 119863119894 = 0 119906119894 = 119884119894(1) minus 120572 minus 120591 minus 119883prime
119894 120573 when 119863119894 = 1 condition on 119883119894 only variation in 119906119894 comes from 119884119894(119889)
bull No unmeasured confounding implies this assumption
119863119894 ⟂⟂ 1114100119884119894(1) 119884119894(0)1114103|119883119894 ⟹ 119863119894 ⟂⟂ 119906119894|119883119894 ⟹ Cov(119894 119894) = 0
Omitted variable bias
bull What happens when this is violated Suppose that there isone omitted variable (residualized from 119883119894)
119894 = 120582119894 + 120596119894
bull Wersquoll assume that if we could measure 119871119894 then nounmeasured confounding would hold
bull Leads to inconsistency in the OLS estimator
plim 111369611136931113700 = 120591 + 120582Cov(119894 119894)Var(119894)
bull Bias here is terms multiplied together1 coefficient on 119871119894 (120582)2 the coefficient of regression of 119863119894 on 119871119894 also controlling for 119883119894
4 Estimating causal effects under no unmeasured confounders
Basic approach to estimation
• Remember the usual approach to estimating the ATE with covariates:
• Stratification:
  - Stratify the units by the covariates
  - Calculate the CATE within these strata
• Standardization/direct adjustment:
  - Average the CATEs across the strata to get the ATE
• How to create strata when $X$ has continuous components?
  - If $X$ is discrete with only a few levels, can use the exact values of $X$
  - Otherwise, we may have to subclassify/coarsen the data
Classic example: cigars/pipes versus cigarettes
• $D_i = 1$ for pipe/cigar smokers, $D_i = 0$ for cigarette smokers
• $Y_i$ = death in the first year of follow-up
• Naive positive effect: cigar/pipe smokers more likely to die
  - What's the confounder here? Age
  - Pipe/cigar smokers much older than cigarette smokers
• Cochran's approach: stratify based on coarsened age
  - Divide age into $k$ strata, $S_i \in \{s_1, s_2, \ldots, s_k\}$
  - $s_1$ might be 18-25, $s_2$ might be 26-35, and so on
  - Calculate effect within strata and aggregate
• Key assumption: no unmeasured confounders using the stratified version of age:
$D_i \perp\!\!\!\perp (Y_i(0), Y_i(1)) \mid S_i$
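A minimal sketch of stratifying on coarsened age, using simulated data rather than Cochran's actual data. The data-generating process, the age cut points, and all variable names are assumptions made for illustration; in this simulation the treatment has no effect on death, so any naive difference is due to age alone.

```r
set.seed(1)
n <- 4000
age <- runif(n, 18, 85)
# Hypothetical process: older people are more likely to smoke pipes/cigars
# (d = 1) and more likely to die, but d itself has no effect on death.
d <- rbinom(n, 1, plogis(-3 + 0.06 * age))
y <- rbinom(n, 1, plogis(-6 + 0.07 * age))

naive <- mean(y[d == 1]) - mean(y[d == 0])

# Coarsen age into strata s_1, s_2, ...
s <- cut(age, breaks = c(18, 25, 35, 45, 55, 65, 75, 85), include.lowest = TRUE)

# Difference in means within each stratum, then average by stratum share
tau_k <- sapply(split(data.frame(y, d), s),
                function(g) mean(g$y[g$d == 1]) - mean(g$y[g$d == 0]))
w_k <- as.numeric(table(s)) / n
stratified <- sum(tau_k * w_k)

c(naive = naive, stratified = stratified)
```

The naive difference is clearly positive, while the stratified estimate is much closer to zero. Some bias can remain because age still varies within each coarsened stratum.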
Stratification on the propensity score
• What about when $X$ has many dimensions?
• Curse of dimensionality: there will be very few, if any, units in a given stratum of $X_i$
• Stratify on a low-dimensional summary, the propensity score:
$e(x) = \mathbb{P}[D_i = 1 \mid X_i = x]$
  - PS = unit's probability of being treated, conditional on $X_i$
• For a particular unit, this is $e(X_i) = \mathbb{P}[D_i = 1 \mid X_i]$
• Rosenbaum and Rubin (1983) showed that:
$D_i \perp\!\!\!\perp (Y_i(0), Y_i(1)) \mid X_i \implies D_i \perp\!\!\!\perp (Y_i(0), Y_i(1)) \mid e(X_i)$
  - Stratifying on $e_i = e(X_i)$ is the same as stratifying on the full $X_i$
Propensity score as balancing score
• The propensity score is actually a balancing score, which means that:
$D_i \perp\!\!\!\perp X_i \mid e(X_i)$
• Conditional on the propensity score, treatment is independent of the covariates
  - Treatment status is said to be balanced: $f(X_i \mid D_i = 1, e(X_i)) = f(X_i \mid D_i = 0, e(X_i))$
• Of course, we have to know the true PS to have all these results work
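The balancing property follows in two lines from the definition of the propensity score. This is a sketch of the standard argument, using only $e(X_i) = \mathbb{P}[D_i = 1 \mid X_i]$ and the fact that $e(X_i)$ is a function of $X_i$:

```latex
\begin{align*}
\mathbb{P}[D_i = 1 \mid X_i, e(X_i)] &= \mathbb{P}[D_i = 1 \mid X_i] = e(X_i), \\
\mathbb{P}[D_i = 1 \mid e(X_i)]
  &= \mathbb{E}\big[\, \mathbb{E}[D_i \mid X_i] \;\big|\; e(X_i) \,\big]
   = \mathbb{E}[\, e(X_i) \mid e(X_i) \,] = e(X_i).
\end{align*}
```

Both conditional probabilities equal $e(X_i)$, so once we condition on the propensity score, $X_i$ carries no further information about $D_i$, which is the statement $D_i \perp\!\!\!\perp X_i \mid e(X_i)$.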
Estimating the propensity score
• Of course, in observational studies we don't know the propensity score
• We would run a parametric model with parameters $\gamma$ to estimate the propensity scores:
  1. Estimate $\hat{\gamma}$
  2. Create $\hat{e}_i = \Pr[D_i = 1 \mid X_i; \hat{\gamma}]$
• For instance, in R we could easily calculate the propensity scores using the glm function:
    # Logistic regression of treatment on covariates; keep the fitted probabilities
    pscores <- glm(treat ~ var1 + var2 + var3, data = mydata,
                   family = binomial())$fitted.values
Propensity score specifics
• What variables do we include in the propensity score model?
  - Any set of variables that blocks all the backdoor paths from $D_i$ to $Y_i$
• Check balance within strata of $\hat{e}_i$. Covariates should be balanced:
$f(X_i \mid D_i = 1, \hat{e}_i) = f(X_i \mid D_i = 0, \hat{e}_i)$
• Can also use automated/nonparametric tools for estimating $\hat{e}_i$
  - Covariate Balancing Propensity Scores (Imai and Ratkovic)
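One simple way to check balance within strata of the estimated propensity score is to compare covariate means between treated and control units inside each stratum. The sketch below reuses the names `mydata`, `treat`, `var1`, `var2`, and `pscores` from the glm example, but the data are simulated here, and the choice of five quintile blocks is an assumption made for illustration.

```r
set.seed(42)
n <- 5000
var1 <- rnorm(n)
var2 <- rnorm(n)
treat <- rbinom(n, 1, plogis(0.8 * var1 - 0.5 * var2))
mydata <- data.frame(treat, var1, var2)

# Estimated propensity scores from a logistic regression
pscores <- glm(treat ~ var1 + var2, data = mydata,
               family = binomial())$fitted.values

# Five blocks at the quintiles of the estimated propensity score
block <- cut(pscores, breaks = quantile(pscores, probs = seq(0, 1, 0.2)),
             include.lowest = TRUE, labels = FALSE)

# Difference in covariate means (treated minus control), overall and by block
mean_diff <- function(x, d) mean(x[d == 1]) - mean(x[d == 0])
overall <- mean_diff(mydata$var1, mydata$treat)
by_block <- sapply(split(mydata, block),
                   function(g) mean_diff(g$var1, g$treat))

overall
round(by_block, 3)
```

The within-block differences should be much smaller than the overall difference. Large remaining differences would suggest a finer set of blocks or a richer propensity score model.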
Stratifying by the propensity score
• How will we use the propensity score?
  - Matching (next week)
  - Weighting (two weeks)
  - Regression (three weeks)
• Today: coarsening the propensity score and stratifying
• Choose boundary points $0 = b_0 < b_1 < \ldots < b_{K-1} < b_K = 1$
• Create block indicators:
$B_i(k) = \begin{cases} 1 & \text{if } b_{k-1} < \hat{e}(X_i) < b_k \\ 0 & \text{otherwise} \end{cases}$
• Calculate within-strata effect estimates:
$\tau_k = \mathbb{E}[Y_i \mid D_i = 1, B_i(k) = 1] - \mathbb{E}[Y_i \mid D_i = 0, B_i(k) = 1]$
Standardization/direct adjustment
• We calculated the CATEs for each stratum of the PS, $\tau_k$
• We can use the law of iterated expectations to back out the ATE
• Take the average of the CATEs over the distribution of $X$:
$\tau = \sum_{k=1}^{K} \tau_k \, \mathbb{P}[B_i(k) = 1]$
• Note that $\mathbb{P}[B_i(k) = 1]$ is just the proportion of units in block $k$:
$\mathbb{P}[B_i(k) = 1] = \frac{\sum_{i=1}^{N} B_i(k)}{N}$
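The blocking and standardization steps can be sketched end to end in a few lines of R. The data are simulated, and the variable names, the ten equal-width blocks, and the constant effect of 2 are all assumptions made for illustration. Note that `cut` uses half-open intervals $(b_{k-1}, b_k]$ rather than the strict inequalities above, which makes no practical difference for a continuous score.

```r
set.seed(10)
n <- 10000
x1 <- rnorm(n)
x2 <- rnorm(n)
d <- rbinom(n, 1, plogis(0.7 * x1 + 0.7 * x2))
y <- 1 + 2 * d + 1.5 * x1 + 1.5 * x2 + rnorm(n)   # constant effect tau = 2

e_hat <- glm(d ~ x1 + x2, family = binomial())$fitted.values

# Boundary points 0 = b_0 < b_1 < ... < b_K = 1 (K = 10 equal-width blocks)
b <- seq(0, 1, by = 0.1)
block <- cut(e_hat, breaks = b, include.lowest = TRUE)

# tau_k: difference in means within each block
tau_k <- tapply(seq_len(n), block, function(i) {
  mean(y[i][d[i] == 1]) - mean(y[i][d[i] == 0])
})
# P[B_i(k) = 1]: share of units in each block
p_k <- as.numeric(table(block)) / n

# Blocks without both treated and control units give NA/NaN and are dropped
tau_hat <- sum(tau_k * p_k, na.rm = TRUE)
naive <- mean(y[d == 1]) - mean(y[d == 0])
c(naive = naive, stratified = tau_hat)
```

The stratified estimate lands far closer to the simulated effect than the naive difference in means, although coarse blocks leave a small amount of residual confounding.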
5 Wrapping Up
Summary
• Defined observational studies
• Defined confounding and assessed when no unmeasured confounding holds
• Saw how no unmeasured confounding helps with OLS
• Saw how to estimate causal effects under no unmeasured confounding using the propensity score
Next few weeks
• Learn how to estimate causal effects under no unmeasured confounders via:
  - Matching
  - Weighting
  - Regression
• Then we move on to situations where no unmeasured confounders is violated
What is confounding?
• Confounding is the bias caused by common causes of the treatment and outcome
  - Leads to "spurious correlation"
• In observational studies, the goal is to avoid confounding inherent in the data
• Pervasive in the social sciences:
  - effect of income on voting (confounding: age)
  - effect of job training program on employment (confounding: motivation)
  - effect of political institutions on economic development (confounding: previous economic development)
• No unmeasured confounding assumes that we've measured all sources of confounding
Big problem
• How can we determine if no unmeasured confounding holds if we didn't assign the treatment?
• Put differently:
  - What covariates do we need to condition on?
  - What covariates do we need to match on?
  - What covariates do we need to include in our regressions?
• One way, from the assumption itself:
  - $\mathbb{P}[D_i = 1 \mid \mathbf{X}, \mathbf{Y}(1), \mathbf{Y}(0)] = \mathbb{P}[D_i = 1 \mid \mathbf{X}]$
  - Include covariates such that, conditional on them, the treatment assignment does not depend on the potential outcomes
• Another way: use DAGs and look at back-door paths
Backdoor paths and blocking paths
• Backdoor path: a non-causal path from $D$ to $Y$
  - Would remain if we removed any arrows pointing out of $D$
• Backdoor paths between $D$ and $Y$ correspond to common causes of $D$ and $Y$
[Figure: DAG with nodes $D$, $X$, and $Y$]
• Here there is a backdoor path $D \leftarrow X \rightarrow Y$, where $X$ is a common cause for the treatment and the outcome
Other types of confounding
[Figure: DAG with nodes $D$, $U$, $X$, and $Y$]
• $D$ is enrolling in a job training program
• $Y$ is getting a job
• $U$ is being motivated
• $X$ is number of job applications sent out
• Big assumption here: no arrow from $U$ to $Y$
Other types of confounding
[Figure: DAG with nodes $D$, $U$, $X$, and $Y$]
• $D$ is exercise
• $Y$ is having a disease
• $U$ is lifestyle
• $X$ is smoking
• Big assumption here: no arrow from $U$ to $Y$
What's the problem with backdoor paths?
[Figure: DAG with nodes $D$, $U$, $X$, and $Y$]
• A path is blocked if:
  1. we control for or stratify a non-collider on that path, OR
  2. we do not control for a collider on that path
• Unblocked backdoor paths lead to confounding
• In the DAG here, if we condition on $X$, then the backdoor path is blocked
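A small simulation can make the blocking claim concrete. The sketch below assumes the structure $D \leftarrow U \rightarrow X \rightarrow Y$ with no arrow from $U$ to $Y$, matching the "big assumption" in the examples; the functional forms, coefficients, and the true effect of 0.5 are hypothetical.

```r
set.seed(3)
n <- 20000
u <- rnorm(n)                        # unmeasured (e.g., motivation)
x <- u + rnorm(n)                    # measured (e.g., applications sent out)
d <- rbinom(n, 1, plogis(u))         # treatment depends on u only
y <- 1 + 0.5 * d + 1 * x + rnorm(n)  # no arrow from u to y; true effect = 0.5

unadjusted <- coef(lm(y ~ d))[["d"]]      # backdoor path D <- U -> X -> Y is open
adjusted   <- coef(lm(y ~ d + x))[["d"]]  # conditioning on X blocks it
c(unadjusted = unadjusted, adjusted = adjusted)
```

Even though `u` is never observed, adjusting for `x` recovers an estimate close to 0.5, because `x` is a non-collider on the only backdoor path.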
Not all backdoor paths
[Figure: DAG with nodes $D$, $U_1$, $X$, and $Y$]
• Conditioning on the posttreatment covariates opens the non-causal path
  - selection bias
M-bias
[Figure: DAG with nodes $D$, $U_1$, $U_2$, $X$, and $Y$]
• Not all backdoor paths induce confounding
• This backdoor path is blocked by the collider $X_i$ that we don't control for
• If we control for $X_i$, this opens the path and induces confounding
  - Sometimes called M-bias
• Controversial because of differing views on what to control for:
  - Rubin thinks that M-bias is a "mathematical curiosity" and we should control for all pretreatment variables
  - Pearl and others think M-bias is a real threat
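The M-bias structure can be simulated directly. The sketch below assumes the M-shaped DAG $D \leftarrow U_1 \rightarrow X \leftarrow U_2 \rightarrow Y$ plus a direct effect of $D$ on $Y$; all coefficients, and the true effect of 0.5, are hypothetical choices for illustration.

```r
set.seed(4)
n <- 20000
u1 <- rnorm(n)                           # unmeasured cause of D and X
u2 <- rnorm(n)                           # unmeasured cause of Y and X
x  <- u1 + u2 + rnorm(n)                 # pretreatment collider
d  <- rbinom(n, 1, plogis(1.5 * u1))
y  <- 1 + 0.5 * d + 1.5 * u2 + rnorm(n)  # true effect = 0.5

no_control <- coef(lm(y ~ d))[["d"]]      # path is blocked by the collider X
control_x  <- coef(lm(y ~ d + x))[["d"]]  # conditioning on X opens the path
c(no_control = no_control, control_x = control_x)
```

Here the regression without `x` is close to 0.5, while adding the pretreatment covariate `x` pushes the estimate well away from it. How large this bias is in real applications is exactly what the Rubin and Pearl camps disagree about.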
Backdoor criterion
• Can we use a DAG to evaluate no unmeasured confounders?
• Pearl answered yes, with the backdoor criterion, which states that the effect of $D$ on $Y$ is identified if:
  1. No backdoor paths from $D$ to $Y$, OR
  2. Measured covariates are sufficient to block all backdoor paths from $D$ to $Y$
• First is really only valid for randomized experiments
• The backdoor criterion is fairly powerful. Tells us:
  - if there is confounding given this DAG,
  - if it is possible to remove the confounding, and
  - what variables to condition on to eliminate the confounding
SWIGs
[Figure: SWIG with nodes $D \mid d$, $U$, $X$, and $Y(d)$]
• It's a little hard to see how the backdoor criterion implies no unmeasured confounders
  - No potential outcomes on this graph!
• Richardson and Robins: Single World Intervention Graphs
  - Split the $D$ node into natural value ($D$) and intervention value $d$
  - Let all effects of $D$ take their potential value under intervention, $Y(d)$
• Now can see: are $D$ and $Y(d)$ related?
  - $D \leftarrow U \rightarrow X \rightarrow Y(d)$ implies not independent
  - Conditioning on $X$ blocks that backdoor path, so $D \perp\!\!\!\perp Y(d) \mid X$
No unmeasured confounders is not testable
• No unmeasured confounding places no restrictions on the observed data:
$\underbrace{(Y_i(0) \mid D_i = 1, X_i)}_{\text{unobserved}} \overset{d}{=} \underbrace{(Y_i(0) \mid D_i = 0, X_i)}_{\text{observed}}$
• Here $\overset{d}{=}$ means equal in distribution
• No way to directly test this assumption without the counterfactual data, which is missing by definition!
• With backdoor criterion, you must have the correct DAG
Assessing no unmeasured confounders
• Can do "placebo" tests, where $D_i$ cannot have an effect (lagged outcomes, etc.)
• Della Vigna and Kaplan (2007, QJE): effect of Fox News availability on Republican vote share
  - Availability in 2000/2003 can't affect past vote shares
• Unconfoundedness could still be violated even if you pass this test!
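A placebo test on a lagged outcome can be sketched as an ordinary regression of the pre-treatment outcome on the treatment and covariates. The data below are simulated with a deliberately unmeasured confounder `u`; the variable names and the data-generating process are hypothetical and are not taken from the Fox News study.

```r
set.seed(5)
n <- 5000
x <- rnorm(n)                       # measured covariate
u <- rnorm(n)                       # unmeasured confounder (hypothetical)
y_lag <- 1 + x + u + rnorm(n)       # outcome measured before treatment
d <- rbinom(n, 1, plogis(x + u))    # treatment assigned afterwards

# The treatment cannot cause the lagged outcome, so a nonzero coefficient
# on d points to confounding that x does not capture.
placebo <- lm(y_lag ~ d + x)
placebo_est <- coef(placebo)[["d"]]
placebo_p   <- summary(placebo)$coefficients["d", "Pr(>|t|)"]
c(estimate = placebo_est, p_value = placebo_p)
```

In this simulation the placebo coefficient is clearly nonzero, which flags the violation. The converse does not hold: a coefficient near zero would not establish that unconfoundedness is satisfied for the actual outcome.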
Alternatives to no unmeasured confounding
• Without explicit randomization, we need some way of identifying causal effects
• No unmeasured confounders ≈ randomized experiment
  - Identification results very similar to experiments
• With unmeasured confounding, are we doomed? Maybe not!
• Other approaches rely on finding plausibly exogenous variation in assignment of $D_i$:
  - Instrumental variables (randomization + exclusion restriction)
  - Over-time variation (diff-in-diff, fixed effects)
  - Arbitrary thresholds for treatment assignment (RDD)
3 No unmeasured confounders and OLS
Justifying regression
• We know how randomized experiments imply that differences-in-means identify the ATE
• In the next few weeks, we'll work through how no unmeasured confounding justifies a number of estimation strategies
• Today, it's useful to walk through what no unmeasured confounding can buy us in a familiar setting: OLS
  - We'll cover regression more formally later
Constant effects setup
• Assume a constant effects setup:
$Y_i(0) = \alpha + X_i'\beta + u_i$
$Y_i(1) = \alpha + \tau + X_i'\beta + u_i$
• Constant effects because $Y_i(1) - Y_i(0) = \tau$ for all units
• Use consistency to get the usual regression formula:
$Y_i = Y_i(1) D_i + Y_i(0)(1 - D_i)$
$= Y_i(0) + (Y_i(1) - Y_i(0)) \cdot D_i$
$= \alpha + \tau \cdot D_i + X_i'\beta + u_i$
• Does no unmeasured confounding help us identify the causal parameter $\tau$?
Regression on residuals
• First, estimate the residuals of regression of the treatment and outcome on the covariates:
$\tilde{Y}_i = Y_i - \mathbb{E}[Y_i \mid X_i]$
$\tilde{D}_i = D_i - \mathbb{E}[D_i \mid X_i]$
• Running a regression of $\tilde{Y}_i$ on $\tilde{D}_i$ is equivalent to controlling for $X_i$:
$Y_i = \alpha + \tau \cdot D_i + X_i'\beta + u_i$
$\tilde{Y}_i = \alpha + \tau \cdot \tilde{D}_i + \tilde{u}_i$
• Here, $\tilde{u}_i = u_i - \mathbb{E}[u_i \mid X_i]$
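The equivalence between the residual regression and controlling for $X_i$ can be verified numerically. In the sketch below, residuals from linear regressions on the covariates stand in for $Y_i - \mathbb{E}[Y_i \mid X_i]$ and $D_i - \mathbb{E}[D_i \mid X_i]$; that substitution is an assumption that matches the slides exactly only when the conditional expectations are linear. The simulated data and variable names are hypothetical.

```r
set.seed(6)
n <- 1000
x1 <- rnorm(n)
x2 <- rnorm(n)
d <- rbinom(n, 1, plogis(x1 - x2))
y <- 1 + 2 * d + x1 + 0.5 * x2 + rnorm(n)

# Coefficient on d when controlling for the covariates directly
full <- coef(lm(y ~ d + x1 + x2))[["d"]]

# Residualize outcome and treatment on the covariates, then regress
y_tilde <- resid(lm(y ~ x1 + x2))
d_tilde <- resid(lm(d ~ x1 + x2))
partial <- coef(lm(y_tilde ~ d_tilde))[["d_tilde"]]

# The same number also equals Cov(d_tilde, y_tilde) / Var(d_tilde)
ratio <- cov(d_tilde, y_tilde) / var(d_tilde)
c(full = full, partial = partial, ratio = ratio)
```

All three numbers coincide up to floating-point error, which is the sample version of the covariance-over-variance expression used for the probability limit of the OLS estimator.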
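What does OLS estimate?
• Using the usual OLS theory, we can show that the probability limit of the OLS estimator of $\tau$ is:
$\operatorname{plim} \hat{\tau} = \frac{\mathrm{Cov}(\tilde{D}_i, \tilde{Y}_i)}{\mathrm{Var}(\tilde{D}_i)}$
$= \frac{\mathrm{Cov}(\tilde{D}_i,\ \alpha + \tau \cdot \tilde{D}_i + \tilde{u}_i)}{\mathrm{Var}(\tilde{D}_i)}$
$= \frac{\tau \cdot \mathrm{Cov}(\tilde{D}_i, \tilde{D}_i) + \mathrm{Cov}(\tilde{D}_i, \tilde{u}_i)}{\mathrm{Var}(\tilde{D}_i)}$
$= \tau + \frac{\mathrm{Cov}(\tilde{D}_i, \tilde{u}_i)}{\mathrm{Var}(\tilde{D}_i)}$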
3 No unmeasuredconfounders and OLS
Justifying regression
bull We know how randomized experiments imply thatdifferences-in-means identify the ATE
bull In the next few weeks wersquoll work through how no unmeasuredconfounding justifies a number of estimation strategies
bull Today itrsquos useful to walk through what no unmeasuredconfounding can buy us in a familiar setting OLS
Wersquoll cover regression more formally later
Constant effects set up
bull Assume a constant effects setup
119884119894(0) = 120572 + 119883prime119894 120573 + 119906119894
119884119894(1) = 120572 + 120591 + 119883prime119894 120573 + 119906119894
bull Constant effects because 119884119894(1) minus 119884119894(0) = 120591 for all unitsbull Use consistency to get the usual regression formula
119884119894 = 119884119894(1)119863119894 + 119884119894(0)(1 minus 119863119894)= 119884119894(0) + 1114100119884119894(1) minus 119884119894(0)1114103 sdot 119863119894
= 120572 + 120591 sdot 119863119894 + 119883prime119894 120573 + 119906119894
bull Does no unmeasured confounding help us identify the causalparameter 120591
Regression on residuals
bull First estimate the residuals of regression of the treatment andoutcome on the covariates
119894 = 119884119894 minus 120124[119884119894|119883119894]119894 = 119863119894 minus 120124[119863119894|119883119894]
bull Running a regression of 119894 on 119894 is equivalent to controllingfor 119883119894
119884119894 = 120572 + 120591 sdot 119863119894 + 119883prime119894 120573 + 119906119894
119894 = 120572 + 120591 sdot 119894 + 119894bull Here 119894 = 119906119894 minus 120124[119906119894|119883119894]
What does OLS estimate
bull Using the usual OLS theory we can show that the probabilitylimit of the OLS estimator of 120591 is
plim 111369611136931113700 =Cov(119894 119894)Var(119894)
= Cov(119894 120572 + 120591 sdot 119894 + 119894)Var(119894)
= 120591 sdot Cov(119894 119894) + Cov(119894 119894)Var(119894)
= 120591 + Cov(119894 119894)Var(119894)
Key OLS assumption
plim 111369611136931113700 = 120591 + Cov(119894 119894)Var(119894)
bull Key identification comes from Cov(119894 119894) = 0 Conditional on 119883119894 no relationship between 119863119894 and 119906119894
bull Note 119906119894 is a function of 119883119894 and 119884119894(119889) 119906119894 = 119884119894(0) minus 120572 minus 119883prime
119894 120573 when 119863119894 = 0 119906119894 = 119884119894(1) minus 120572 minus 120591 minus 119883prime
119894 120573 when 119863119894 = 1 condition on 119883119894 only variation in 119906119894 comes from 119884119894(119889)
bull No unmeasured confounding implies this assumption
119863119894 ⟂⟂ 1114100119884119894(1) 119884119894(0)1114103|119883119894 ⟹ 119863119894 ⟂⟂ 119906119894|119883119894 ⟹ Cov(119894 119894) = 0
Omitted variable bias
bull What happens when this is violated Suppose that there isone omitted variable (residualized from 119883119894)
119894 = 120582119894 + 120596119894
bull Wersquoll assume that if we could measure 119871119894 then nounmeasured confounding would hold
bull Leads to inconsistency in the OLS estimator
plim 111369611136931113700 = 120591 + 120582Cov(119894 119894)Var(119894)
bull Bias here is terms multiplied together1 coefficient on 119871119894 (120582)2 the coefficient of regression of 119863119894 on 119871119894 also controlling for 119883119894
4 Estimating causaleffects under nounmeasuredconfounders
Basic approach to estimation
bull Remember the usual approach to estimating the ATE withcovariates
bull Stratification Stratify the units by the covariates Calculate CATE within these strata
bull Standardizationdirect adjustment Average the CATEs across the strata to get ATE
bull How to create strata when 119883 has continuous components If 119883 is discrete with only a few levels can use the exact values
of 119883 Otherwise we may have to subclassifycoarsen the data
Classic example cigarspipes versuscigarettes
bull 119863119894 = 1 for pipecigar smokers 119863119894 = 0 for cigarette smokersbull 119884119894 = death in the first year of follow-upbull Naive positive effect cigarpipe smokers more likely to die
Whatrsquos the confounder here Age Pipecigar smokers much older than cigarette smokers
bull Cochranrsquos approach stratify based on coarsened age Divide age into 119896 strata 119878119894 isin 1199041113568 1199041113569 hellip 119904119896 1199041113568 might be 18-25 1199041113569 might be 26-35 and so on Calculate effect within strata and aggregate
bull Key assumption no unmeasured confounders using stratifiedversion of age
119863119894 ⟂⟂ 1114100119884119894(0) 119884119894(1)1114103|119878119894
Stratification on the propensity score
bull What about when 119883 has has many dimensionsbull Curse of dimensionality there will be very few if any units in
a given stratum of 119883119894bull Stratify on a low-dimensional summary the propensity score
119890(119909) = ℙ[119863119894 = 1|119883119894 = 119909]
PS = unitrsquos probability of being treated conditional on 119883119894
bull For a particular unit this is 119890(119883119894) = ℙ[119863119894 = 1|119883119894]bull Rosenbaum and Rubin (1983) showed that
119863119894 ⟂⟂ 1114100119884119894(0) 119884119894(1)1114103 | 119883119894 ⟹ 119863119894 ⟂⟂ 1114100119884119894(0) 119884119894(1)1114103 | 119890(119883119894)
stratifying on 119890119894 is the same as stratifying on the full 119883119894
Propensity score as balancing score
bull The propensity score is actually a balancing score whichmeans that
119863119894 ⟂⟂ 119883119894 | 119890(119883119894)
bull Conditional on the propensity score treatment is independentof the covariates
Treatment status is said to be balanced 119891(119883119894|119863119894 = 1 119890(119883119894)) = 119891(119883119894|119863119894 = 0 119890(119883119894))
bull Of course we have to know the true PS to have all theseresults work
Estimating the propensity score
bull Of course in observational studies we donrsquot know thepropensity score
bull We would run a parametric model with parameters 120574 toestimate the propensity scores
1 Estimate 2 Create 119894 = Pr[119863119894 = 1|119883119894 ]
bull For instance in R we could easily calculate the propensityscores using the glm function
pscores lt- glm(treat ~ var1 + var2 + var3 data = mydata
family = binomial())$fittedvalues
Propensity score specifics
bull What variables do we include in the propensity score model Any set of variables that blocks all the backdoor paths from 119863119894
to 119884119894
bull Check balance within strata of 119894 Covariates should bebalanced
119891(119883119894|119863119894 = 1 119894) = 119891(119883119894|119863119894 = 0 119894)
bull Can also use automatednonparametric tools for estimating 119894 Covariate Balancing Propensity Scores (Imai and Ratkovic)
Stratifying by the propensity score
bull How will we use the propensity score Matching (next week) Weighting (two weeks) Regression
(three weeks)bull Today coarsening the propensity score and stratifyingbull Choose boundary points 0 = 1198871113567 lt 1198871113568 lt hellip lt 119887119870minus1113568 lt 119887119870 = 1bull Create block indicators
119861119894(119896) =
⎧⎪⎪⎨⎪⎪⎩1 if 119887119896minus1113568 lt (119883119894) lt 1198871198960 otherwise
bull Calculate within-strata effect estimates
120591119896 = 120124[119884119894|119863119894 = 1 119861119894(119896) = 1] minus 120124[119884119894|119863119894 = 0 119861119894(119896) = 1]
Standardization/direct adjustment
• We calculated the CATEs for each stratum of the PS, $\tau_k$
• We can use the law of iterated expectations to back out the ATE
• Take the average of the CATEs over the distribution of $X$:
  $\tau = \sum_{k=1}^{K} \tau_k \, \mathbb{P}[B_i(k) = 1]$
• Note that $\mathbb{P}[B_i(k) = 1]$ is just the proportion of units in block $k$:
  $\mathbb{P}[B_i(k) = 1] = \frac{\sum_{i=1}^{N} B_i(k)}{N}$
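The aggregation step is a weighted average, which a small worked example makes concrete. The three block-level effects and the ten block assignments are made-up numbers for illustration only.

```r
# Hypothetical within-block effect estimates and block membership for 10 units
tau_k <- c(1.0, 2.0, 4.0)
block <- c(1, 1, 1, 1, 1, 2, 2, 2, 3, 3)

# P[B_i(k) = 1]: the share of units falling in each block
p_k <- as.numeric(table(block)) / length(block)

# Standardized estimate: weighted average of the block-specific effects
tau <- sum(tau_k * p_k)
print(p_k)  # 0.5 0.3 0.2
print(tau)  # 1.9
```

Weighting by the overall block shares targets the ATE. Different weights (for example, the share of treated units in each block) would target a different estimand.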
5 Wrapping Up
Summary
• Defined observational studies
• Defined confounding and assessed when no unmeasured confounding holds
• Saw how no unmeasured confounding helps with OLS
• Saw how to estimate causal effects under no unmeasured confounding using the propensity score
Next few weeks
• Learn how to estimate causal effects under no unmeasured confounders via:
  - Matching
  - Weighting
  - Regression
• Then we move on to situations where no unmeasured confounders is violated
Backdoor paths and blocking paths
• A backdoor path is a non-causal path from $D$ to $Y$
  - Would remain if we removed any arrows pointing out of $D$
• Backdoor paths between $D$ and $Y$ typically arise from common causes of $D$ and $Y$
[Figure: DAG over D, X, and Y in which X points into both D and Y]
• Here there is a backdoor path $D \leftarrow X \rightarrow Y$, where $X$ is a common cause of the treatment and the outcome
Other types of confounding
[Figure: DAG with nodes D, U, X, and Y]
• $D$ is enrolling in a job training program
• $Y$ is getting a job
• $U$ is being motivated
• $X$ is number of job applications sent out
• Big assumption here: no arrow from $U$ to $Y$
Other types of confounding
[Figure: DAG with nodes D, U, X, and Y]
• $D$ is exercise
• $Y$ is having a disease
• $U$ is lifestyle
• $X$ is smoking
• Big assumption here: no arrow from $U$ to $Y$
What’s the problem with backdoor paths?
[Figure: DAG with nodes D, U, X, and Y]
• A path is blocked if:
  1. we control for or stratify on a non-collider on that path, OR
  2. the path contains a collider that we do not control for
• Unblocked backdoor paths lead to confounding
• In the DAG here, if we condition on $X$, then the backdoor path is blocked
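A small simulation can illustrate blocking. It assumes the DAG has arrows $U \rightarrow D$, $U \rightarrow X$, $X \rightarrow Y$, and $D \rightarrow Y$, with no arrow from $U$ to $Y$. The functional forms and the true effect of 1 are invented for the sketch.

```r
set.seed(2002)
n <- 20000
u <- rnorm(n)                       # unmeasured common cause
d <- rbinom(n, 1, plogis(u))        # U -> D
x <- u + rnorm(n)                   # U -> X
y <- 1 * d + 2 * x + rnorm(n)       # D -> Y (true effect 1) and X -> Y; no U -> Y arrow

unadjusted <- coef(lm(y ~ d))["d"]      # backdoor path D <- U -> X -> Y is open
adjusted   <- coef(lm(y ~ d + x))["d"]  # conditioning on X blocks it

print(c(unadjusted = unname(unadjusted), adjusted = unname(adjusted)))
```

Under these assumptions the unadjusted coefficient is far from 1, and the coefficient that conditions on `x` is close to 1, even though `u` is never measured.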
Not all backdoor paths
[Figure: DAG with nodes D, $U_1$, X, and Y]
• Conditioning on the posttreatment covariates opens the non-causal path
  - selection bias
M-bias
[Figure: DAG with nodes D, $U_1$, $U_2$, X, and Y]
• Not all backdoor paths induce confounding
• This backdoor path is blocked by the collider $X_i$ that we don’t control for
• If we control for $X_i$, this opens the path and induces confounding
  - Sometimes called M-bias
• Controversial because of differing views on what to control for
  - Rubin thinks that M-bias is a “mathematical curiosity” and we should control for all pretreatment variables
  - Pearl and others think M-bias is a real threat
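M-bias can be reproduced in a simulation. The sketch assumes the usual M-shaped structure ($U_1 \rightarrow D$, $U_1 \rightarrow X$, $U_2 \rightarrow X$, $U_2 \rightarrow Y$, plus $D \rightarrow Y$). The coefficients and the true effect of 1 are illustrative assumptions.

```r
set.seed(2002)
n <- 50000
u1 <- rnorm(n)                         # unmeasured cause of D and X
u2 <- rnorm(n)                         # unmeasured cause of X and Y
d <- rbinom(n, 1, plogis(2 * u1))      # U1 -> D
x <- u1 + u2 + rnorm(n, sd = 0.5)      # collider: U1 -> X <- U2
y <- 1 * d + 2 * u2 + rnorm(n)         # D -> Y (true effect 1) and U2 -> Y

unadjusted <- coef(lm(y ~ d))["d"]      # collider left alone: path stays blocked
adjusted   <- coef(lm(y ~ d + x))["d"]  # conditioning on the collider opens the path

print(c(unadjusted = unname(unadjusted), adjusted = unname(adjusted)))
```

Here the regression without `x` recovers an estimate near 1, while adding the pretreatment variable `x` pushes the estimate away from 1. How large this bias is in real applications is exactly what the Rubin/Pearl disagreement is about.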
Backdoor criterion
• Can we use a DAG to evaluate no unmeasured confounders?
• Pearl answered yes, with the backdoor criterion, which states that the effect of $D$ on $Y$ is identified if:
  1. There are no backdoor paths from $D$ to $Y$, OR
  2. Measured covariates are sufficient to block all backdoor paths from $D$ to $Y$
• The first condition really only holds in randomized experiments
• The backdoor criterion is fairly powerful. It tells us:
  - if there is confounding given this DAG,
  - if it is possible to remove the confounding, and
  - what variables to condition on to eliminate the confounding
SWIGs
[Figure: SWIG with the D node split into D | d, plus nodes U, X, and Y(d)]
• It’s a little hard to see how the backdoor criterion implies no unmeasured confounders
  - No potential outcomes on this graph
• Richardson and Robins: Single World Intervention Graphs
  - Split the $D$ node into natural value ($D$) and intervention value $d$
  - Let all effects of $D$ take their potential value under intervention: $Y(d)$
• Now we can see: are $D$ and $Y(d)$ related?
  - $D \leftarrow U \rightarrow X \rightarrow Y(d)$ implies not independent
  - Conditioning on $X$ blocks that backdoor path: $D \perp\!\!\!\perp Y(d) \mid X$
No unmeasured confounders is not testable
• No unmeasured confounding places no restrictions on the observed data:
  $\underbrace{\big(Y_i(0) \mid D_i = 1, X_i\big)}_{\text{unobserved}} \overset{d}{=} \underbrace{\big(Y_i(0) \mid D_i = 0, X_i\big)}_{\text{observed}}$
• Here $\overset{d}{=}$ means equal in distribution
• No way to directly test this assumption without the counterfactual data, which is missing by definition
• With the backdoor criterion, you must have the correct DAG
Assessing no unmeasured confounders
• Can do “placebo” tests, where $D_i$ cannot have an effect (lagged outcomes, etc.)
• Della Vigna and Kaplan (2007, QJE): effect of Fox News availability on Republican vote share
  - Availability in 2000/2003 can’t affect past vote shares
• Unconfoundedness could still be violated even if you pass this test
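A placebo test on a lagged outcome can be sketched with simulated data. This is not the Della Vigna and Kaplan analysis. The confounder `u`, the lagged outcome `y_lag`, and all coefficients are hypothetical.

```r
set.seed(2002)
n <- 20000
x <- rnorm(n)                          # measured covariate
u <- rnorm(n)                          # hypothetical unmeasured confounder
d <- rbinom(n, 1, plogis(x + u))       # treatment depends on both
y_lag <- 1 + x + u + rnorm(n)          # measured before treatment, so D cannot affect it

# Placebo regressions: the coefficient on d should be zero if confounding is handled
placebo_x_only  <- coef(lm(y_lag ~ d + x))["d"]      # u left out
placebo_x_and_u <- coef(lm(y_lag ~ d + x + u))["d"]  # u included (infeasible in practice)

print(c(x_only = unname(placebo_x_only), x_and_u = unname(placebo_x_and_u)))
```

A clearly nonzero placebo coefficient, as in the first regression, signals a problem. A coefficient near zero is reassuring but does not prove unconfoundedness for the actual outcome.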
Alternatives to no unmeasured confounding
• Without explicit randomization, we need some way of identifying causal effects
• No unmeasured confounders ≈ randomized experiment
  - Identification results very similar to experiments
• With unmeasured confounding, are we doomed? Maybe not!
• Other approaches rely on finding plausibly exogenous variation in assignment of $D_i$:
  - Instrumental variables (randomization + exclusion restriction)
  - Over-time variation (diff-in-diff, fixed effects)
  - Arbitrary thresholds for treatment assignment (RDD)
3 No unmeasured confounders and OLS
Justifying regression
• We know how randomized experiments imply that differences-in-means identify the ATE
• In the next few weeks, we’ll work through how no unmeasured confounding justifies a number of estimation strategies
• Today, it’s useful to walk through what no unmeasured confounding can buy us in a familiar setting: OLS
  - We’ll cover regression more formally later
Constant effects setup
• Assume a constant effects setup:
  $Y_i(0) = \alpha + X_i'\beta + u_i$
  $Y_i(1) = \alpha + \tau + X_i'\beta + u_i$
• Constant effects because $Y_i(1) - Y_i(0) = \tau$ for all units
• Use consistency to get the usual regression formula:
  $$\begin{aligned} Y_i &= Y_i(1) D_i + Y_i(0)(1 - D_i) \\ &= Y_i(0) + \big(Y_i(1) - Y_i(0)\big) \cdot D_i \\ &= \alpha + \tau \cdot D_i + X_i'\beta + u_i \end{aligned}$$
• Does no unmeasured confounding help us identify the causal parameter $\tau$?
Regression on residuals
• First, estimate the residuals from regressions of the treatment and the outcome on the covariates:
  $\tilde{Y}_i = Y_i - \mathbb{E}[Y_i \mid X_i]$
  $\tilde{D}_i = D_i - \mathbb{E}[D_i \mid X_i]$
• Running a regression of $\tilde{Y}_i$ on $\tilde{D}_i$ is equivalent to controlling for $X_i$:
  $Y_i = \alpha + \tau \cdot D_i + X_i'\beta + u_i$
  $\tilde{Y}_i = \alpha + \tau \cdot \tilde{D}_i + \tilde{u}_i$
• Here $\tilde{u}_i = u_i - \mathbb{E}[u_i \mid X_i]$
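The equivalence between the residual-on-residual regression and controlling for the covariates can be checked numerically. In this sketch the conditional expectations are approximated by linear regressions on `x1` and `x2`. The simulated data and the coefficient values are assumptions.

```r
set.seed(2002)
n <- 1000
x1 <- rnorm(n)
x2 <- rnorm(n)
d <- rbinom(n, 1, plogis(0.5 * x1 - 0.5 * x2))
y <- 1 + 2 * d + x1 - x2 + rnorm(n)

# Full regression: control for the covariates directly
tau_full <- coef(lm(y ~ d + x1 + x2))["d"]

# Residualize outcome and treatment on the covariates,
# then regress residual on residual
y_tilde <- resid(lm(y ~ x1 + x2))
d_tilde <- resid(lm(d ~ x1 + x2))
tau_resid <- coef(lm(y_tilde ~ d_tilde))["d_tilde"]

# Same number again, written as Cov / Var
tau_covvar <- cov(d_tilde, y_tilde) / var(d_tilde)

print(c(unname(tau_full), unname(tau_resid), tau_covvar))
```

All three numbers agree up to floating-point error, so the OLS coefficient can be analyzed through the simple ratio of a covariance to a variance.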
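What does OLS estimate?
• Using the usual OLS theory, we can show that the probability limit of the OLS estimator of $\tau$ is:
  $$\begin{aligned} \text{plim}\; \hat{\tau}_{OLS} &= \frac{\text{Cov}(\tilde{D}_i, \tilde{Y}_i)}{\text{Var}(\tilde{D}_i)} \\ &= \frac{\text{Cov}(\tilde{D}_i, \alpha + \tau \cdot \tilde{D}_i + \tilde{u}_i)}{\text{Var}(\tilde{D}_i)} \\ &= \frac{\tau \cdot \text{Cov}(\tilde{D}_i, \tilde{D}_i) + \text{Cov}(\tilde{D}_i, \tilde{u}_i)}{\text{Var}(\tilde{D}_i)} \\ &= \tau + \frac{\text{Cov}(\tilde{D}_i, \tilde{u}_i)}{\text{Var}(\tilde{D}_i)} \end{aligned}$$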
Key OLS assumption
  $\text{plim}\; \hat{\tau}_{OLS} = \tau + \frac{\text{Cov}(\tilde{D}_i, \tilde{u}_i)}{\text{Var}(\tilde{D}_i)}$
• Key identification comes from $\text{Cov}(\tilde{D}_i, \tilde{u}_i) = 0$
  - Conditional on $X_i$, no relationship between $D_i$ and $u_i$
• Note: $u_i$ is a function of $X_i$ and $Y_i(d)$
  - $u_i = Y_i(0) - \alpha - X_i'\beta$ when $D_i = 0$
  - $u_i = Y_i(1) - \alpha - \tau - X_i'\beta$ when $D_i = 1$
  - Conditional on $X_i$, the only variation in $u_i$ comes from $Y_i(d)$
• No unmeasured confounding implies this assumption:
  $D_i \perp\!\!\!\perp \big(Y_i(1), Y_i(0)\big) \mid X_i \implies D_i \perp\!\!\!\perp u_i \mid X_i \implies \text{Cov}(\tilde{D}_i, \tilde{u}_i) = 0$
Omitted variable bias
• What happens when this is violated? Suppose that there is one omitted variable (residualized from $X_i$):
  $\tilde{u}_i = \lambda \tilde{L}_i + \omega_i$
• We’ll assume that if we could measure $L_i$, then no unmeasured confounding would hold
• Leads to inconsistency in the OLS estimator:
  $\text{plim}\; \hat{\tau}_{OLS} = \tau + \lambda \frac{\text{Cov}(\tilde{D}_i, \tilde{L}_i)}{\text{Var}(\tilde{D}_i)}$
• Bias here is two terms multiplied together:
  1. the coefficient on $L_i$ ($\lambda$)
  2. the coefficient on $D_i$ from a regression of $L_i$ on $D_i$, also controlling for $X_i$
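The bias formula has an exact in-sample counterpart, which a short simulation can verify. The omitted variable `l`, the values $\tau = 2$ and $\lambda = 1.5$, and the rest of the data-generating process are assumptions chosen for illustration.

```r
set.seed(2002)
n <- 5000
x <- rnorm(n)
l <- 0.5 * x + rnorm(n)                       # hypothetical omitted variable L_i
d <- rbinom(n, 1, plogis(0.5 * x + l))        # L_i affects treatment
y <- 1 + 2 * d + x + 1.5 * l + rnorm(n)       # tau = 2, lambda = 1.5 (assumed)

long  <- coef(lm(y ~ d + x + l))   # regression that includes L_i
short <- coef(lm(y ~ d + x))       # regression that omits L_i
delta <- coef(lm(l ~ d + x))["d"]  # coefficient on D_i from regressing L_i on D_i and X_i

# In-sample version of the bias formula: short = long + lambda-hat * delta-hat
bias_formula <- long["d"] + long["l"] * delta

print(c(short = unname(short["d"]), formula = unname(bias_formula)))
```

The two printed numbers coincide. The gap between the short and long coefficients on `d` is the estimated $\lambda$ times the estimated coefficient from the auxiliary regression.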
4 Estimating causal effects under no unmeasured confounders
Basic approach to estimation
• Remember the usual approach to estimating the ATE with covariates:
• Stratification:
  - Stratify the units by the covariates
  - Calculate CATE within these strata
• Standardization/direct adjustment:
  - Average the CATEs across the strata to get ATE
• How to create strata when $X$ has continuous components?
  - If $X$ is discrete with only a few levels, can use the exact values of $X$
  - Otherwise, we may have to subclassify/coarsen the data
Classic example: cigars/pipes versus cigarettes
• $D_i = 1$ for pipe/cigar smokers, $D_i = 0$ for cigarette smokers
• $Y_i$ = death in the first year of follow-up
• Naive positive effect: cigar/pipe smokers more likely to die
  - What’s the confounder here? Age!
  - Pipe/cigar smokers much older than cigarette smokers
• Cochran’s approach: stratify based on coarsened age
  - Divide age into $k$ strata, $S_i \in \{s_1, s_2, \ldots, s_k\}$
  - $s_1$ might be 18-25, $s_2$ might be 26-35, and so on
  - Calculate effect within strata and aggregate
• Key assumption: no unmeasured confounders using the stratified version of age:
  $D_i \perp\!\!\!\perp \big(Y_i(0), Y_i(1)\big) \mid S_i$
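The stratify-then-aggregate logic can be sketched on simulated data. None of these numbers come from Cochran’s study. The age distribution, the smoking and mortality models, and the choice to build in no effect of pipe/cigar smoking are all assumptions made so that the confounding is easy to see.

```r
set.seed(2002)
n <- 20000
age <- sample(18:85, n, replace = TRUE)
pipe <- rbinom(n, 1, plogis((age - 55) / 10))   # older people more likely to smoke pipes/cigars
death <- rbinom(n, 1, plogis(-6 + 0.06 * age))  # mortality rises with age; no pipe effect by construction

naive <- mean(death[pipe == 1]) - mean(death[pipe == 0])

# Coarsen age into strata s_1 = 18-25, s_2 = 26-35, and so on
stratum <- cut(age, breaks = c(17, 25, 35, 45, 55, 65, 75, 85), labels = FALSE)

# Within-stratum difference in death rates
effect_k <- as.numeric(tapply(seq_len(n), stratum, function(idx) {
  mean(death[idx][pipe[idx] == 1]) - mean(death[idx][pipe[idx] == 0])
}))

# Aggregate using the share of units in each stratum
share_k <- as.numeric(table(stratum)) / n
adjusted <- sum(effect_k * share_k)

print(c(naive = naive, adjusted = adjusted))
```

In this simulation the naive difference is clearly positive while the age-adjusted estimate is close to zero, which mirrors the pattern described in the bullets above.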
Stratification on the propensity score
• What about when $X$ has many dimensions?
• Curse of dimensionality: there will be very few, if any, units in a given stratum of $X_i$
• Stratify on a low-dimensional summary, the propensity score:
  $e(x) = \mathbb{P}[D_i = 1 \mid X_i = x]$
  - PS = unit’s probability of being treated, conditional on $X_i$
• For a particular unit, this is $e(X_i) = \mathbb{P}[D_i = 1 \mid X_i]$
• Rosenbaum and Rubin (1983) showed that:
  $D_i \perp\!\!\!\perp \big(Y_i(0), Y_i(1)\big) \mid X_i \implies D_i \perp\!\!\!\perp \big(Y_i(0), Y_i(1)\big) \mid e(X_i)$
  - For removing confounding, stratifying on $e(X_i)$ is as good as stratifying on the full $X_i$
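A sketch of why this implication holds, using only the definition of $e(X_i)$ and the law of iterated expectations. Here $Y_i(\cdot)$ is shorthand (introduced for this sketch) for the pair $\big(Y_i(0), Y_i(1)\big)$.

```latex
\begin{align*}
\mathbb{P}[D_i = 1 \mid Y_i(\cdot), e(X_i)]
  &= \mathbb{E}\big[\, \mathbb{P}[D_i = 1 \mid Y_i(\cdot), X_i] \;\big|\; Y_i(\cdot), e(X_i) \big]
     && \text{($e(X_i)$ is a function of $X_i$)} \\
  &= \mathbb{E}\big[\, \mathbb{P}[D_i = 1 \mid X_i] \;\big|\; Y_i(\cdot), e(X_i) \big]
     && \text{(no unmeasured confounding)} \\
  &= \mathbb{E}\big[\, e(X_i) \;\big|\; Y_i(\cdot), e(X_i) \big] \\
  &= e(X_i)
\end{align*}
```

The same iterated-expectations step gives $\mathbb{P}[D_i = 1 \mid e(X_i)] = e(X_i)$. The two conditional probabilities agree, so the potential outcomes carry no extra information about treatment once $e(X_i)$ is known.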
Propensity score as balancing score
• The propensity score is actually a balancing score, which means that:
  $D_i \perp\!\!\!\perp X_i \mid e(X_i)$
• Conditional on the propensity score, treatment is independent of the covariates
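The balancing property follows from the definition of the propensity score alone, with no assumption about confounding. A short derivation sketch:

```latex
\begin{align*}
\mathbb{P}[D_i = 1 \mid X_i, e(X_i)] &= \mathbb{P}[D_i = 1 \mid X_i] = e(X_i) \\
\mathbb{P}[D_i = 1 \mid e(X_i)]
  &= \mathbb{E}\big[\, \mathbb{P}[D_i = 1 \mid X_i] \;\big|\; e(X_i) \big]
   = \mathbb{E}\big[\, e(X_i) \mid e(X_i) \big] = e(X_i)
\end{align*}
```

Both probabilities equal $e(X_i)$, so knowing $X_i$ in addition to $e(X_i)$ does not change the probability of treatment. That is the statement $D_i \perp\!\!\!\perp X_i \mid e(X_i)$.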
Other types of confounding
119863
119880 119883
119884
bull 119863 is exercisebull 119884 is having a diseasebull 119880 is lifestylebull 119883 is smokingbull Big assumption here no arrow from 119880 to 119884
Whatrsquos the problem with backdoorpaths
119863
119880 119883
119884
bull A path is blocked if1 we control for or stratify a non-collider on that path OR2 we do not control for a collider
bull Unblocked backdoor paths confoundingbull In the DAG here if we condition on 119883 then the backdoor
path is blocked
Not all backdoor paths
119863
1198801113568119883119883
119884
bull Conditioning on the posttreatment covariates opens thenon-causal path
selection bias
M-bias
119863
1198801113568 1198801113569119883119883
119884
bull Not all backdoor paths induce confoundingbull This backdoor path is blocked by the collider 119883119894 that we donrsquot
control forbull If we control for 119883119894 opens the path and induces
confounding Sometimes called M-bias
bull Controversial because of differing views on what to control for Rubin thinks that M-bias is a ldquomathematical curiosityrdquo and we
should control for all pretreatment variables Pearl and others think M-bias is a real threat
Backdoor criterion
bull Can we use a DAG to evaluate no unmeasured confoundersbull Pearl answered yes with the backdoor criterion which states
that the effect of 119863 on 119884 is identified if1 No backdoor paths from 119863 to 119884 OR2 Measured covariates are sufficient to block all backdoor paths
from 119863 to 119884
bull First is really only valid for randomized experimentsbull The backdoor criterion is fairly powerful Tells us
if there confounding given this DAG if it is possible to removing the confounding and what variables to condition on to eliminate the confounding
SWIGs
119863 | 119889 119884(119889)
119880 119883
119884
bull Itrsquos a little hard to see how the backdoor criterion implies nounmeasured confounders
No potential outcomes on this graphbull Richardson and Robins Single World Intervention Graphs
Split 119863 node into natural value (119863) and intervention value 119889 Let all effects of 119863 take their potential value under
intervention 119884(119889)bull Now can see are 119863 and 119884(119889) related
119863 larr 119880 rarr 119883 rarr 119884(119889) implies not independent Conditioning on 119883 blocks that backdoor path 119863 ⟂⟂ 119884(119889)|119883
No unmeasured confounders is nottestable
bull No unmeasured confounding places no restrictions on theobserved data
1114100119884119894(0)|119863119894 = 1119883119894111410311140591113840111384011138401113840111384011138401113840111384011138401113840111406011138401113840111384011138401113840111384011138401113840111384011138401114061unobserved
119889= 1114100119884119894(0)|119863119894 = 0119883119894111410311140591113840111384011138401113840111384011138401113840111384011138401113840111406011138401113840111384011138401113840111384011138401113840111384011138401114061observed
bull Here 119889= means equal in distributionbull No way to directly test this assumption without the
counterfactual data which is missing by definitionbull With backdoor criterion you must have the correct DAG
Assessing no unmeasured confounders
bull Can do ldquoplacebordquo tests where 119863119894 cannot have an effect(lagged outcomes etc)
bull Della Vigna and Kaplan (2007 QJE) effect of Fox Newsavailability on Republican vote share
Availability in 20002003 canrsquot affect past vote sharesbull Unconfoundedness could still be violated even if you pass this
test
Alternatives to no unmeasuredconfounding
bull Without explicit randomization we need some way ofidentifying causal effects
bull No unmeasured confounders asymp randomized experiment Indentification results very similar to experiments
bull With unmeasured confounding are we doomed Maybe notbull Other approaches rely on finding plausibly exogenous variation
in assignment of 119863119894 Instrumental variables (randomization + exclusion restriction) Over-time variation (diff-in-diff fixed effects) Arbitrary thresholds for treatment assignment (RDD)
3 No unmeasuredconfounders and OLS
Justifying regression
bull We know how randomized experiments imply thatdifferences-in-means identify the ATE
bull In the next few weeks wersquoll work through how no unmeasuredconfounding justifies a number of estimation strategies
bull Today itrsquos useful to walk through what no unmeasuredconfounding can buy us in a familiar setting OLS
Wersquoll cover regression more formally later
Constant effects set up
bull Assume a constant effects setup
119884119894(0) = 120572 + 119883prime119894 120573 + 119906119894
119884119894(1) = 120572 + 120591 + 119883prime119894 120573 + 119906119894
bull Constant effects because 119884119894(1) minus 119884119894(0) = 120591 for all unitsbull Use consistency to get the usual regression formula
119884119894 = 119884119894(1)119863119894 + 119884119894(0)(1 minus 119863119894)= 119884119894(0) + 1114100119884119894(1) minus 119884119894(0)1114103 sdot 119863119894
= 120572 + 120591 sdot 119863119894 + 119883prime119894 120573 + 119906119894
bull Does no unmeasured confounding help us identify the causalparameter 120591
Regression on residuals
• First, compute the residuals from regressing the outcome and the treatment on the covariates:
$$\tilde{Y}_i = Y_i - \mathbb{E}[Y_i \mid X_i], \qquad \tilde{D}_i = D_i - \mathbb{E}[D_i \mid X_i]$$
• Running a regression of $\tilde{Y}_i$ on $\tilde{D}_i$ is equivalent to controlling for $X_i$:
$$Y_i = \alpha + \tau \cdot D_i + X_i'\beta + u_i$$
$$\tilde{Y}_i = \tau \cdot \tilde{D}_i + \tilde{u}_i$$
• Here $\tilde{u}_i = u_i - \mathbb{E}[u_i \mid X_i]$
  - $\alpha$ and $X_i'\beta$ are fixed given $X_i$, so they drop out when we subtract the conditional expectation
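The equivalence above can be checked numerically. The following is a minimal sketch on simulated data, not part of the original slides: the variable names, sample size, and parameter values are all illustrative assumptions, and `lm` residuals (linear projections) stand in for the conditional expectations.

```r
# Simulated data; every name and parameter value here is an illustrative assumption.
set.seed(2002)
n <- 1000
x <- rnorm(n)
d <- rbinom(n, 1, plogis(0.5 * x))     # treatment depends on x only
y <- 1 + 2 * d + 1.5 * x + rnorm(n)    # constant effect tau = 2

# Residualize the outcome and the treatment on the covariate
y_res <- resid(lm(y ~ x))
d_res <- resid(lm(d ~ x))

# Coefficient from the residual-on-residual regression
tau_resid <- unname(coef(lm(y_res ~ d_res))["d_res"])

# Coefficient on d from the regression that controls for x directly
tau_full <- unname(coef(lm(y ~ d + x))["d"])

print(c(residualized = tau_resid, full = tau_full))
```

The two coefficients agree up to floating-point error, which is the sample version of the claim that regressing residuals on residuals is the same as controlling for $X_i$.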
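What does OLS estimate?
• Using the usual OLS theory, we can show that the probability limit of the OLS estimator of $\tau$ is
$$\begin{aligned}
\operatorname{plim} \hat{\tau}_{OLS} &= \frac{\operatorname{Cov}(\tilde{D}_i, \tilde{Y}_i)}{\operatorname{Var}(\tilde{D}_i)} \\
&= \frac{\operatorname{Cov}(\tilde{D}_i, \tau \cdot \tilde{D}_i + \tilde{u}_i)}{\operatorname{Var}(\tilde{D}_i)} \\
&= \frac{\tau \cdot \operatorname{Cov}(\tilde{D}_i, \tilde{D}_i) + \operatorname{Cov}(\tilde{D}_i, \tilde{u}_i)}{\operatorname{Var}(\tilde{D}_i)} \\
&= \tau + \frac{\operatorname{Cov}(\tilde{D}_i, \tilde{u}_i)}{\operatorname{Var}(\tilde{D}_i)}
\end{aligned}$$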
Key OLS assumption
$$\operatorname{plim} \hat{\tau}_{OLS} = \tau + \frac{\operatorname{Cov}(\tilde{D}_i, \tilde{u}_i)}{\operatorname{Var}(\tilde{D}_i)}$$
• Key identification comes from $\operatorname{Cov}(\tilde{D}_i, \tilde{u}_i) = 0$
  - Conditional on $X_i$, no relationship between $D_i$ and $u_i$
• Note: $u_i$ is a function of $X_i$ and $Y_i(d)$
  - $u_i = Y_i(0) - \alpha - X_i'\beta$ when $D_i = 0$
  - $u_i = Y_i(1) - \alpha - \tau - X_i'\beta$ when $D_i = 1$
  - Conditional on $X_i$, the only variation in $u_i$ comes from $Y_i(d)$
• No unmeasured confounding implies this assumption:
$$D_i \perp\!\!\!\perp \left(Y_i(1), Y_i(0)\right) \mid X_i \implies D_i \perp\!\!\!\perp u_i \mid X_i \implies \operatorname{Cov}(\tilde{D}_i, \tilde{u}_i) = 0$$
Omitted variable bias
• What happens when this is violated? Suppose that there is one omitted variable $L_i$ (residualized from $X_i$ as $\tilde{L}_i$):
$$\tilde{u}_i = \lambda \tilde{L}_i + \omega_i$$
• We'll assume that if we could measure $L_i$, then no unmeasured confounding would hold
• Leads to inconsistency in the OLS estimator:
$$\operatorname{plim} \hat{\tau}_{OLS} = \tau + \lambda \frac{\operatorname{Cov}(\tilde{D}_i, \tilde{L}_i)}{\operatorname{Var}(\tilde{D}_i)}$$
• Bias here is the product of two terms:
  1. the coefficient on $L_i$ ($\lambda$)
  2. the coefficient on $D_i$ from a regression of $L_i$ on $D_i$, also controlling for $X_i$
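The bias decomposition can be illustrated with a small simulation. This is a sketch under assumed values, not from the slides: the data-generating process, the names `l`, `short`, `long`, and `aux`, and all coefficients are hypothetical choices. In a finite sample the decomposition holds exactly for the fitted OLS coefficients.

```r
# Simulated data; every name and parameter value here is an illustrative assumption.
set.seed(2002)
n <- 5000
x <- rnorm(n)
l <- 0.5 * x + rnorm(n)                        # omitted confounder L_i
d <- rbinom(n, 1, plogis(0.5 * x + 0.8 * l))   # treatment depends on x and l
y <- 1 + 2 * d + 1.5 * x + 1 * l + rnorm(n)    # tau = 2, lambda = 1

short <- lm(y ~ d + x)       # omits l
long  <- lm(y ~ d + x + l)   # includes l
aux   <- lm(l ~ d + x)       # regression of l on d, controlling for x

# (coefficient on l) times (coefficient on d in the auxiliary regression)
bias_formula <- unname(coef(long)["l"] * coef(aux)["d"])

# Difference between the short and long estimates of tau
bias_actual <- unname(coef(short)["d"] - coef(long)["d"])

print(c(formula = bias_formula, actual = bias_actual))
```

Here the omitted variable raises both treatment uptake and the outcome, so the short regression overstates the effect.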
4 Estimating causal effects under no unmeasured confounders
Basic approach to estimation
• Remember the usual approach to estimating the ATE with covariates
• Stratification:
  - Stratify the units by the covariates
  - Calculate CATE within these strata
• Standardization/direct adjustment:
  - Average the CATEs across the strata to get ATE
• How to create strata when $X$ has continuous components?
  - If $X$ is discrete with only a few levels, can use the exact values of $X$
  - Otherwise, we may have to subclassify/coarsen the data
Classic example: cigars/pipes versus cigarettes
• $D_i = 1$ for pipe/cigar smokers, $D_i = 0$ for cigarette smokers
• $Y_i$ = death in the first year of follow-up
• Naive positive effect: cigar/pipe smokers more likely to die
  - What's the confounder here? Age
  - Pipe/cigar smokers much older than cigarette smokers
• Cochran's approach: stratify based on coarsened age
  - Divide age into $k$ strata, $S_i \in \{s_1, s_2, \ldots, s_k\}$
  - $s_1$ might be 18-25, $s_2$ might be 26-35, and so on
  - Calculate effect within strata and aggregate
• Key assumption: no unmeasured confounders using stratified version of age
$$D_i \perp\!\!\!\perp \left(Y_i(0), Y_i(1)\right) \mid S_i$$
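The stratify-then-aggregate recipe can be sketched on made-up data. Nothing below comes from Cochran's actual study: the sample, the age cutpoints, and the assumption that smoking type has no true effect once age is held fixed are all invented for illustration.

```r
# Simulated data; every name and parameter value here is an illustrative assumption.
set.seed(2002)
n <- 20000
age <- sample(18:85, n, replace = TRUE)
d <- rbinom(n, 1, plogis(-3 + 0.05 * age))   # older people more likely to smoke pipes/cigars
y <- rbinom(n, 1, plogis(-6 + 0.06 * age))   # death depends on age only (no true effect of d)

# Naive difference in means, confounded by age
naive <- mean(y[d == 1]) - mean(y[d == 0])

# Coarsen age into strata: (17,25], (25,35], ..., (75,85]
s <- cut(age, breaks = c(17, 25, 35, 45, 55, 65, 75, 85))

# Difference in means within each stratum
tau_s <- sapply(split(data.frame(y, d), s),
                function(g) mean(g$y[g$d == 1]) - mean(g$y[g$d == 0]))

# Aggregate using the share of units in each stratum
w <- as.numeric(table(s)) / n
adjusted <- sum(tau_s * w)

print(c(naive = naive, adjusted = adjusted))
```

In this setup the naive comparison shows a sizable positive difference, while the stratified estimate is much closer to zero; the small remainder reflects age differences that persist within the coarsened strata.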
Stratification on the propensity score
• What about when $X$ has many dimensions?
• Curse of dimensionality: there will be very few, if any, units in a given stratum of $X_i$
• Stratify on a low-dimensional summary, the propensity score:
$$e(x) = \mathbb{P}[D_i = 1 \mid X_i = x]$$
  - PS = unit's probability of being treated, conditional on $X_i$
• For a particular unit, this is $e(X_i) = \mathbb{P}[D_i = 1 \mid X_i]$
• Rosenbaum and Rubin (1983) showed that:
$$D_i \perp\!\!\!\perp \left(Y_i(0), Y_i(1)\right) \mid X_i \implies D_i \perp\!\!\!\perp \left(Y_i(0), Y_i(1)\right) \mid e(X_i)$$
  - For removing confounding, stratifying on $e(X_i)$ is the same as stratifying on the full $X_i$
Propensity score as balancing score
• The propensity score is actually a balancing score, which means that
$$D_i \perp\!\!\!\perp X_i \mid e(X_i)$$
• Conditional on the propensity score, treatment is independent of the covariates
  - The covariates are said to be balanced across treatment groups:
$$f(X_i \mid D_i = 1, e(X_i)) = f(X_i \mid D_i = 0, e(X_i))$$
• Of course, we have to know the true PS to have all these results work
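One way to see why the balancing property holds is the standard two-step argument below. It is a sketch added for illustration, not taken from the slides, and it uses only the definition of $e(X_i)$ and the law of iterated expectations.

```latex
\begin{aligned}
\mathbb{P}[D_i = 1 \mid X_i, e(X_i)] &= \mathbb{P}[D_i = 1 \mid X_i] = e(X_i)
  && \text{($e(X_i)$ is a function of $X_i$)} \\
\mathbb{P}[D_i = 1 \mid e(X_i)] &= \mathbb{E}\big[\, \mathbb{E}[D_i \mid X_i] \,\big|\, e(X_i) \big]
  = \mathbb{E}[e(X_i) \mid e(X_i)] = e(X_i)
  && \text{(iterated expectations)}
\end{aligned}
```

Both conditional probabilities equal $e(X_i)$, so once we know the propensity score, $X_i$ carries no further information about $D_i$, which is the statement $D_i \perp\!\!\!\perp X_i \mid e(X_i)$.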
Estimating the propensity score
• Of course, in observational studies we don't know the propensity score
• We would run a parametric model with parameters $\gamma$ to estimate the propensity scores:
  1. Estimate $\hat{\gamma}$
  2. Create $\hat{e}_i = \Pr[D_i = 1 \mid X_i; \hat{\gamma}]$
• For instance, in R, we could easily calculate the propensity scores using the glm function:

    # Logistic regression of treatment on covariates; keep the fitted probabilities
    pscores <- glm(treat ~ var1 + var2 + var3, data = mydata,
                   family = binomial())$fitted.values
Propensity score specifics
• What variables do we include in the propensity score model?
  - Any set of variables that blocks all the backdoor paths from $D_i$ to $Y_i$
• Check balance within strata of $\hat{e}_i$: covariates should be balanced
$$f(X_i \mid D_i = 1, \hat{e}_i) = f(X_i \mid D_i = 0, \hat{e}_i)$$
• Can also use automated/nonparametric tools for estimating $\hat{e}_i$
  - Covariate Balancing Propensity Scores (Imai and Ratkovic)
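A simple balance check compares covariate means between treated and control units, first in the raw data and then within blocks of the estimated propensity score. The sketch below uses simulated data; the data frame `mydata`, its columns, the helper `mean_diff`, and the choice of five quantile-based blocks are assumptions made for illustration.

```r
# Simulated data; every name and parameter value here is an illustrative assumption.
set.seed(2002)
n <- 5000
mydata <- data.frame(var1 = rnorm(n), var2 = rnorm(n))
mydata$treat <- rbinom(n, 1, plogis(0.8 * mydata$var1 - 0.5 * mydata$var2))

# Estimate the propensity score with a logistic regression
ps_fit <- glm(treat ~ var1 + var2, data = mydata, family = binomial())
mydata$pscore <- fitted(ps_fit)

# Five blocks defined by quintiles of the estimated propensity score
mydata$block <- cut(mydata$pscore,
                    breaks = quantile(mydata$pscore, 0:5 / 5),
                    include.lowest = TRUE)

# Treated-minus-control difference in the mean of covariate v
mean_diff <- function(df, v) {
  mean(df[[v]][df$treat == 1]) - mean(df[[v]][df$treat == 0])
}

raw_diff   <- mean_diff(mydata, "var1")
block_diff <- sapply(split(mydata, mydata$block), mean_diff, v = "var1")

print(raw_diff)
print(round(block_diff, 3))
```

Within-block differences that are much smaller than the raw difference suggest the blocks are balancing that covariate; large remaining differences would point to a misspecified propensity score model or blocks that are too coarse.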
Stratifying by the propensity score
• How will we use the propensity score?
  - Matching (next week)
  - Weighting (two weeks)
  - Regression (three weeks)
• Today: coarsening the propensity score and stratifying
• Choose boundary points $0 = b_0 < b_1 < \ldots < b_{K-1} < b_K = 1$
• Create block indicators:
$$B_i(k) = \begin{cases} 1 & \text{if } b_{k-1} < \hat{e}(X_i) < b_k \\ 0 & \text{otherwise} \end{cases}$$
• Calculate within-strata effect estimates:
$$\tau_k = \mathbb{E}[Y_i \mid D_i = 1, B_i(k) = 1] - \mathbb{E}[Y_i \mid D_i = 0, B_i(k) = 1]$$
Standardization/direct adjustment
• We calculated the CATEs for each stratum of the PS, $\tau_k$
• We can use the law of iterated expectations to back out the ATE
• Take the average of the CATEs over the distribution of the blocks:
$$\tau = \sum_{k=1}^{K} \tau_k \, \mathbb{P}[B_i(k) = 1]$$
• Note that $\mathbb{P}[B_i(k) = 1]$ is estimated by the proportion of units in block $k$:
$$\widehat{\mathbb{P}}[B_i(k) = 1] = \frac{\sum_{i=1}^{N} B_i(k)}{N}$$
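Putting the blocking and standardization steps together gives a simple stratification estimator. The sketch below runs on simulated data with a true ATE of 2; the variable names, the logistic propensity model, and the use of quintiles of the estimated score as interior boundary points are assumptions for illustration (the slides leave the choice of boundaries open).

```r
# Simulated data; every name and parameter value here is an illustrative assumption.
set.seed(2002)
n <- 10000
x1 <- rnorm(n)
x2 <- rnorm(n)
d <- rbinom(n, 1, plogis(0.8 * x1 - 0.5 * x2))
y <- 1 + 2 * d + 1.5 * x1 - 1 * x2 + rnorm(n)   # true ATE = 2

# Estimated propensity score
ehat <- fitted(glm(d ~ x1 + x2, family = binomial()))

# Boundary points 0 = b_0 < b_1 < ... < b_K = 1, interior points at quintiles of ehat
K <- 5
b <- c(0, quantile(ehat, 1:(K - 1) / K), 1)
block <- cut(ehat, breaks = b, include.lowest = TRUE)

# Within-block differences in means
tau_k <- sapply(split(data.frame(y, d), block),
                function(g) mean(g$y[g$d == 1]) - mean(g$y[g$d == 0]))

# Share of units in each block, then the weighted average
p_k <- as.numeric(table(block)) / n
ate_hat <- sum(tau_k * p_k)

naive <- mean(y[d == 1]) - mean(y[d == 0])
print(c(naive = naive, stratified = ate_hat))
```

With only five blocks some confounding remains inside each block, so the stratified estimate moves most of the way from the naive difference toward 2 rather than landing on it exactly; finer blocks trade less residual bias for noisier within-block estimates.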
5 Wrapping Up
Summary
• Defined observational studies
• Defined confounding and assessed when no unmeasured confounding holds
• Saw how no unmeasured confounding helps with OLS
• Saw how to estimate causal effects under no unmeasured confounding using the propensity score
Next few weeks
• Learn how to estimate causal effects under no unmeasured confounders via
  - Matching
  - Weighting
  - Regression
• Then we move on to situations where no unmeasured confounders is violated
Alternatives to no unmeasuredconfounding
bull Without explicit randomization we need some way ofidentifying causal effects
bull No unmeasured confounders asymp randomized experiment Indentification results very similar to experiments
bull With unmeasured confounding are we doomed Maybe notbull Other approaches rely on finding plausibly exogenous variation
in assignment of 119863119894 Instrumental variables (randomization + exclusion restriction) Over-time variation (diff-in-diff fixed effects) Arbitrary thresholds for treatment assignment (RDD)
3 No unmeasuredconfounders and OLS
Justifying regression
bull We know how randomized experiments imply thatdifferences-in-means identify the ATE
bull In the next few weeks wersquoll work through how no unmeasuredconfounding justifies a number of estimation strategies
bull Today itrsquos useful to walk through what no unmeasuredconfounding can buy us in a familiar setting OLS
Wersquoll cover regression more formally later
Constant effects set up
bull Assume a constant effects setup
119884119894(0) = 120572 + 119883prime119894 120573 + 119906119894
119884119894(1) = 120572 + 120591 + 119883prime119894 120573 + 119906119894
bull Constant effects because 119884119894(1) minus 119884119894(0) = 120591 for all unitsbull Use consistency to get the usual regression formula
119884119894 = 119884119894(1)119863119894 + 119884119894(0)(1 minus 119863119894)= 119884119894(0) + 1114100119884119894(1) minus 119884119894(0)1114103 sdot 119863119894
= 120572 + 120591 sdot 119863119894 + 119883prime119894 120573 + 119906119894
bull Does no unmeasured confounding help us identify the causalparameter 120591
Regression on residuals
bull First estimate the residuals of regression of the treatment andoutcome on the covariates
119894 = 119884119894 minus 120124[119884119894|119883119894]119894 = 119863119894 minus 120124[119863119894|119883119894]
bull Running a regression of 119894 on 119894 is equivalent to controllingfor 119883119894
119884119894 = 120572 + 120591 sdot 119863119894 + 119883prime119894 120573 + 119906119894
119894 = 120572 + 120591 sdot 119894 + 119894bull Here 119894 = 119906119894 minus 120124[119906119894|119883119894]
What does OLS estimate
bull Using the usual OLS theory we can show that the probabilitylimit of the OLS estimator of 120591 is
plim 111369611136931113700 =Cov(119894 119894)Var(119894)
= Cov(119894 120572 + 120591 sdot 119894 + 119894)Var(119894)
= 120591 sdot Cov(119894 119894) + Cov(119894 119894)Var(119894)
= 120591 + Cov(119894 119894)Var(119894)
Key OLS assumption
plim 111369611136931113700 = 120591 + Cov(119894 119894)Var(119894)
bull Key identification comes from Cov(119894 119894) = 0 Conditional on 119883119894 no relationship between 119863119894 and 119906119894
bull Note 119906119894 is a function of 119883119894 and 119884119894(119889) 119906119894 = 119884119894(0) minus 120572 minus 119883prime
119894 120573 when 119863119894 = 0 119906119894 = 119884119894(1) minus 120572 minus 120591 minus 119883prime
119894 120573 when 119863119894 = 1 condition on 119883119894 only variation in 119906119894 comes from 119884119894(119889)
bull No unmeasured confounding implies this assumption
119863119894 ⟂⟂ 1114100119884119894(1) 119884119894(0)1114103|119883119894 ⟹ 119863119894 ⟂⟂ 119906119894|119883119894 ⟹ Cov(119894 119894) = 0
Omitted variable bias
bull What happens when this is violated Suppose that there isone omitted variable (residualized from 119883119894)
119894 = 120582119894 + 120596119894
bull Wersquoll assume that if we could measure 119871119894 then nounmeasured confounding would hold
bull Leads to inconsistency in the OLS estimator
plim 111369611136931113700 = 120591 + 120582Cov(119894 119894)Var(119894)
bull Bias here is terms multiplied together1 coefficient on 119871119894 (120582)2 the coefficient of regression of 119863119894 on 119871119894 also controlling for 119883119894
4 Estimating causaleffects under nounmeasuredconfounders
Basic approach to estimation
bull Remember the usual approach to estimating the ATE withcovariates
bull Stratification Stratify the units by the covariates Calculate CATE within these strata
bull Standardizationdirect adjustment Average the CATEs across the strata to get ATE
bull How to create strata when 119883 has continuous components If 119883 is discrete with only a few levels can use the exact values
of 119883 Otherwise we may have to subclassifycoarsen the data
Classic example cigarspipes versuscigarettes
bull 119863119894 = 1 for pipecigar smokers 119863119894 = 0 for cigarette smokersbull 119884119894 = death in the first year of follow-upbull Naive positive effect cigarpipe smokers more likely to die
Whatrsquos the confounder here Age Pipecigar smokers much older than cigarette smokers
bull Cochranrsquos approach stratify based on coarsened age Divide age into 119896 strata 119878119894 isin 1199041113568 1199041113569 hellip 119904119896 1199041113568 might be 18-25 1199041113569 might be 26-35 and so on Calculate effect within strata and aggregate
bull Key assumption no unmeasured confounders using stratifiedversion of age
119863119894 ⟂⟂ 1114100119884119894(0) 119884119894(1)1114103|119878119894
Stratification on the propensity score
bull What about when 119883 has has many dimensionsbull Curse of dimensionality there will be very few if any units in
a given stratum of 119883119894bull Stratify on a low-dimensional summary the propensity score
119890(119909) = ℙ[119863119894 = 1|119883119894 = 119909]
PS = unitrsquos probability of being treated conditional on 119883119894
bull For a particular unit this is 119890(119883119894) = ℙ[119863119894 = 1|119883119894]bull Rosenbaum and Rubin (1983) showed that
119863119894 ⟂⟂ 1114100119884119894(0) 119884119894(1)1114103 | 119883119894 ⟹ 119863119894 ⟂⟂ 1114100119884119894(0) 119884119894(1)1114103 | 119890(119883119894)
stratifying on 119890119894 is the same as stratifying on the full 119883119894
Propensity score as balancing score
bull The propensity score is actually a balancing score whichmeans that
119863119894 ⟂⟂ 119883119894 | 119890(119883119894)
bull Conditional on the propensity score treatment is independentof the covariates
Treatment status is said to be balanced 119891(119883119894|119863119894 = 1 119890(119883119894)) = 119891(119883119894|119863119894 = 0 119890(119883119894))
bull Of course we have to know the true PS to have all theseresults work
Estimating the propensity score
bull Of course in observational studies we donrsquot know thepropensity score
bull We would run a parametric model with parameters 120574 toestimate the propensity scores
1 Estimate 2 Create 119894 = Pr[119863119894 = 1|119883119894 ]
bull For instance in R we could easily calculate the propensityscores using the glm function
pscores lt- glm(treat ~ var1 + var2 + var3 data = mydata
family = binomial())$fittedvalues
Propensity score specifics
bull What variables do we include in the propensity score model Any set of variables that blocks all the backdoor paths from 119863119894
to 119884119894
bull Check balance within strata of 119894 Covariates should bebalanced
119891(119883119894|119863119894 = 1 119894) = 119891(119883119894|119863119894 = 0 119894)
bull Can also use automatednonparametric tools for estimating 119894 Covariate Balancing Propensity Scores (Imai and Ratkovic)
Stratifying by the propensity score
bull How will we use the propensity score Matching (next week) Weighting (two weeks) Regression
(three weeks)bull Today coarsening the propensity score and stratifyingbull Choose boundary points 0 = 1198871113567 lt 1198871113568 lt hellip lt 119887119870minus1113568 lt 119887119870 = 1bull Create block indicators
119861119894(119896) =
⎧⎪⎪⎨⎪⎪⎩1 if 119887119896minus1113568 lt (119883119894) lt 1198871198960 otherwise
bull Calculate within-strata effect estimates
120591119896 = 120124[119884119894|119863119894 = 1 119861119894(119896) = 1] minus 120124[119884119894|119863119894 = 0 119861119894(119896) = 1]
Standardizationdirect adjustment
bull We calculated the CATEs for each strata of the PS 120591119896bull We can use law of iterated expectations to back out the ATEbull Take the average of the CATEs over the distribution of 119883
120591 =1198701114012119896=1113568
120591119896ℙ[119861119894(119896) = 1]
bull Note that ℙ[119861119894(119896) = 1] is just the proportion of units in block 119896
ℙ[119861119894(119896) = 1] =sum119873
119894=1113568 119861119894(119896)119873
5 Wrapping Up
Summary
bull Defined observational studiesbull Defined confounding and assessed when no unmeasured
confounding holdsbull Saw how no unmeasured confounding helps with OLSbull Saw how to estimate causal effects under no unmeasured
confounding using the propensity score
Next few weeks
bull Learn how to estimate causal effects under no unmeasuredconfounders via
Matching Weighting Regression
bull Then we move onto situations where no unmeasuredconfounders is violated
Not all backdoor paths
119863
1198801113568119883119883
119884
bull Conditioning on the posttreatment covariates opens thenon-causal path
selection bias
M-bias
119863
1198801113568 1198801113569119883119883
119884
bull Not all backdoor paths induce confoundingbull This backdoor path is blocked by the collider 119883119894 that we donrsquot
control forbull If we control for 119883119894 opens the path and induces
confounding Sometimes called M-bias
bull Controversial because of differing views on what to control for Rubin thinks that M-bias is a ldquomathematical curiosityrdquo and we
should control for all pretreatment variables Pearl and others think M-bias is a real threat
Backdoor criterion
bull Can we use a DAG to evaluate no unmeasured confoundersbull Pearl answered yes with the backdoor criterion which states
that the effect of 119863 on 119884 is identified if1 No backdoor paths from 119863 to 119884 OR2 Measured covariates are sufficient to block all backdoor paths
from 119863 to 119884
bull First is really only valid for randomized experimentsbull The backdoor criterion is fairly powerful Tells us
if there confounding given this DAG if it is possible to removing the confounding and what variables to condition on to eliminate the confounding
SWIGs
119863 | 119889 119884(119889)
119880 119883
119884
bull Itrsquos a little hard to see how the backdoor criterion implies nounmeasured confounders
No potential outcomes on this graphbull Richardson and Robins Single World Intervention Graphs
Split 119863 node into natural value (119863) and intervention value 119889 Let all effects of 119863 take their potential value under
intervention 119884(119889)bull Now can see are 119863 and 119884(119889) related
119863 larr 119880 rarr 119883 rarr 119884(119889) implies not independent Conditioning on 119883 blocks that backdoor path 119863 ⟂⟂ 119884(119889)|119883
No unmeasured confounders is nottestable
bull No unmeasured confounding places no restrictions on theobserved data
1114100119884119894(0)|119863119894 = 1119883119894111410311140591113840111384011138401113840111384011138401113840111384011138401113840111406011138401113840111384011138401113840111384011138401113840111384011138401114061unobserved
119889= 1114100119884119894(0)|119863119894 = 0119883119894111410311140591113840111384011138401113840111384011138401113840111384011138401113840111406011138401113840111384011138401113840111384011138401113840111384011138401114061observed
bull Here 119889= means equal in distributionbull No way to directly test this assumption without the
counterfactual data which is missing by definitionbull With backdoor criterion you must have the correct DAG
Assessing no unmeasured confounders
bull Can do ldquoplacebordquo tests where 119863119894 cannot have an effect(lagged outcomes etc)
bull Della Vigna and Kaplan (2007 QJE) effect of Fox Newsavailability on Republican vote share
Availability in 20002003 canrsquot affect past vote sharesbull Unconfoundedness could still be violated even if you pass this
test
Alternatives to no unmeasuredconfounding
bull Without explicit randomization we need some way ofidentifying causal effects
bull No unmeasured confounders asymp randomized experiment Indentification results very similar to experiments
bull With unmeasured confounding are we doomed Maybe notbull Other approaches rely on finding plausibly exogenous variation
in assignment of 119863119894 Instrumental variables (randomization + exclusion restriction) Over-time variation (diff-in-diff fixed effects) Arbitrary thresholds for treatment assignment (RDD)
3 No unmeasuredconfounders and OLS
Justifying regression
bull We know how randomized experiments imply thatdifferences-in-means identify the ATE
bull In the next few weeks wersquoll work through how no unmeasuredconfounding justifies a number of estimation strategies
bull Today itrsquos useful to walk through what no unmeasuredconfounding can buy us in a familiar setting OLS
Wersquoll cover regression more formally later
Constant effects set up
bull Assume a constant effects setup
119884119894(0) = 120572 + 119883prime119894 120573 + 119906119894
119884119894(1) = 120572 + 120591 + 119883prime119894 120573 + 119906119894
bull Constant effects because 119884119894(1) minus 119884119894(0) = 120591 for all unitsbull Use consistency to get the usual regression formula
119884119894 = 119884119894(1)119863119894 + 119884119894(0)(1 minus 119863119894)= 119884119894(0) + 1114100119884119894(1) minus 119884119894(0)1114103 sdot 119863119894
= 120572 + 120591 sdot 119863119894 + 119883prime119894 120573 + 119906119894
bull Does no unmeasured confounding help us identify the causalparameter 120591
Regression on residuals
bull First estimate the residuals of regression of the treatment andoutcome on the covariates
119894 = 119884119894 minus 120124[119884119894|119883119894]119894 = 119863119894 minus 120124[119863119894|119883119894]
bull Running a regression of 119894 on 119894 is equivalent to controllingfor 119883119894
119884119894 = 120572 + 120591 sdot 119863119894 + 119883prime119894 120573 + 119906119894
119894 = 120572 + 120591 sdot 119894 + 119894bull Here 119894 = 119906119894 minus 120124[119906119894|119883119894]
What does OLS estimate
bull Using the usual OLS theory we can show that the probabilitylimit of the OLS estimator of 120591 is
plim 111369611136931113700 =Cov(119894 119894)Var(119894)
= Cov(119894 120572 + 120591 sdot 119894 + 119894)Var(119894)
= 120591 sdot Cov(119894 119894) + Cov(119894 119894)Var(119894)
= 120591 + Cov(119894 119894)Var(119894)
Key OLS assumption
plim 111369611136931113700 = 120591 + Cov(119894 119894)Var(119894)
bull Key identification comes from Cov(119894 119894) = 0 Conditional on 119883119894 no relationship between 119863119894 and 119906119894
bull Note 119906119894 is a function of 119883119894 and 119884119894(119889) 119906119894 = 119884119894(0) minus 120572 minus 119883prime
119894 120573 when 119863119894 = 0 119906119894 = 119884119894(1) minus 120572 minus 120591 minus 119883prime
119894 120573 when 119863119894 = 1 condition on 119883119894 only variation in 119906119894 comes from 119884119894(119889)
bull No unmeasured confounding implies this assumption
119863119894 ⟂⟂ 1114100119884119894(1) 119884119894(0)1114103|119883119894 ⟹ 119863119894 ⟂⟂ 119906119894|119883119894 ⟹ Cov(119894 119894) = 0
Omitted variable bias
bull What happens when this is violated Suppose that there isone omitted variable (residualized from 119883119894)
119894 = 120582119894 + 120596119894
bull Wersquoll assume that if we could measure 119871119894 then nounmeasured confounding would hold
bull Leads to inconsistency in the OLS estimator
plim 111369611136931113700 = 120591 + 120582Cov(119894 119894)Var(119894)
bull Bias here is terms multiplied together1 coefficient on 119871119894 (120582)2 the coefficient of regression of 119863119894 on 119871119894 also controlling for 119883119894
4 Estimating causaleffects under nounmeasuredconfounders
Basic approach to estimation
bull Remember the usual approach to estimating the ATE withcovariates
bull Stratification Stratify the units by the covariates Calculate CATE within these strata
bull Standardizationdirect adjustment Average the CATEs across the strata to get ATE
bull How to create strata when 119883 has continuous components If 119883 is discrete with only a few levels can use the exact values
of 119883 Otherwise we may have to subclassifycoarsen the data
Classic example cigarspipes versuscigarettes
bull 119863119894 = 1 for pipecigar smokers 119863119894 = 0 for cigarette smokersbull 119884119894 = death in the first year of follow-upbull Naive positive effect cigarpipe smokers more likely to die
Whatrsquos the confounder here Age Pipecigar smokers much older than cigarette smokers
bull Cochranrsquos approach stratify based on coarsened age Divide age into 119896 strata 119878119894 isin 1199041113568 1199041113569 hellip 119904119896 1199041113568 might be 18-25 1199041113569 might be 26-35 and so on Calculate effect within strata and aggregate
bull Key assumption no unmeasured confounders using stratifiedversion of age
119863119894 ⟂⟂ 1114100119884119894(0) 119884119894(1)1114103|119878119894
Stratification on the propensity score
bull What about when 119883 has has many dimensionsbull Curse of dimensionality there will be very few if any units in
a given stratum of 119883119894bull Stratify on a low-dimensional summary the propensity score
119890(119909) = ℙ[119863119894 = 1|119883119894 = 119909]
PS = unitrsquos probability of being treated conditional on 119883119894
bull For a particular unit this is 119890(119883119894) = ℙ[119863119894 = 1|119883119894]bull Rosenbaum and Rubin (1983) showed that
119863119894 ⟂⟂ 1114100119884119894(0) 119884119894(1)1114103 | 119883119894 ⟹ 119863119894 ⟂⟂ 1114100119884119894(0) 119884119894(1)1114103 | 119890(119883119894)
stratifying on 119890119894 is the same as stratifying on the full 119883119894
Propensity score as balancing score
bull The propensity score is actually a balancing score whichmeans that
119863119894 ⟂⟂ 119883119894 | 119890(119883119894)
bull Conditional on the propensity score treatment is independentof the covariates
Treatment status is said to be balanced 119891(119883119894|119863119894 = 1 119890(119883119894)) = 119891(119883119894|119863119894 = 0 119890(119883119894))
bull Of course we have to know the true PS to have all theseresults work
Estimating the propensity score
bull Of course in observational studies we donrsquot know thepropensity score
bull We would run a parametric model with parameters 120574 toestimate the propensity scores
1 Estimate 2 Create 119894 = Pr[119863119894 = 1|119883119894 ]
bull For instance in R we could easily calculate the propensityscores using the glm function
pscores lt- glm(treat ~ var1 + var2 + var3 data = mydata
family = binomial())$fittedvalues
Propensity score specifics
bull What variables do we include in the propensity score model Any set of variables that blocks all the backdoor paths from 119863119894
to 119884119894
bull Check balance within strata of 119894 Covariates should bebalanced
119891(119883119894|119863119894 = 1 119894) = 119891(119883119894|119863119894 = 0 119894)
bull Can also use automatednonparametric tools for estimating 119894 Covariate Balancing Propensity Scores (Imai and Ratkovic)
Stratifying by the propensity score
bull How will we use the propensity score Matching (next week) Weighting (two weeks) Regression
(three weeks)bull Today coarsening the propensity score and stratifyingbull Choose boundary points 0 = 1198871113567 lt 1198871113568 lt hellip lt 119887119870minus1113568 lt 119887119870 = 1bull Create block indicators
119861119894(119896) =
⎧⎪⎪⎨⎪⎪⎩1 if 119887119896minus1113568 lt (119883119894) lt 1198871198960 otherwise
bull Calculate within-strata effect estimates
120591119896 = 120124[119884119894|119863119894 = 1 119861119894(119896) = 1] minus 120124[119884119894|119863119894 = 0 119861119894(119896) = 1]
Standardizationdirect adjustment
bull We calculated the CATEs for each strata of the PS 120591119896bull We can use law of iterated expectations to back out the ATEbull Take the average of the CATEs over the distribution of 119883
120591 =1198701114012119896=1113568
120591119896ℙ[119861119894(119896) = 1]
bull Note that ℙ[119861119894(119896) = 1] is just the proportion of units in block 119896
ℙ[119861119894(119896) = 1] =sum119873
119894=1113568 119861119894(119896)119873
5 Wrapping Up
Summary
bull Defined observational studiesbull Defined confounding and assessed when no unmeasured
confounding holdsbull Saw how no unmeasured confounding helps with OLSbull Saw how to estimate causal effects under no unmeasured
confounding using the propensity score
Next few weeks
bull Learn how to estimate causal effects under no unmeasuredconfounders via
Matching Weighting Regression
bull Then we move onto situations where no unmeasuredconfounders is violated
M-bias
[Figure: DAG with nodes $D$, $U_1$, $U_2$, $X$, $Y$; the backdoor path $D \leftarrow U_1 \rightarrow X \leftarrow U_2 \rightarrow Y$ passes through the collider $X$]
• Not all backdoor paths induce confounding.
• This backdoor path is blocked by the collider $X_i$ as long as we don't control for it.
• If we control for $X_i$, we open the path and induce confounding.
  - Sometimes called M-bias.
• Controversial because of differing views on what to control for:
  - Rubin thinks that M-bias is a "mathematical curiosity" and that we should control for all pretreatment variables.
  - Pearl and others think M-bias is a real threat.
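A small simulation can make the collider logic concrete. The sketch below is not from the lecture: the data-generating process, coefficients, and variable names are all assumptions chosen to mimic an M-structure in which $D$ has no effect on $Y$.

```r
# Simulated M-structure: D <- U1 -> X <- U2 -> Y, with no effect of D on Y.
# All variable names and coefficients are hypothetical.
set.seed(2002)
n <- 100000
u1 <- rnorm(n)                        # unobserved cause of D and X
u2 <- rnorm(n)                        # unobserved cause of X and Y
d <- as.numeric(u1 + rnorm(n) > 0)    # treatment depends on U1 only
x <- u1 + u2 + rnorm(n)               # pretreatment collider
y <- u2 + rnorm(n)                    # outcome depends on U2 only

# Leaving X out keeps the backdoor path blocked.
naive <- unname(coef(lm(y ~ d))["d"])
# Controlling for X opens the path through the collider.
adjusted <- unname(coef(lm(y ~ d + x))["d"])

round(c(naive = naive, adjusted = adjusted), 3)
```

Under these assumptions the unadjusted coefficient should sit close to the true effect of zero, while the coefficient that controls for $X$ should be pulled away from zero.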
Backdoor criterion
• Can we use a DAG to evaluate no unmeasured confounders?
• Pearl answered yes with the backdoor criterion, which states that the effect of $D$ on $Y$ is identified if either:
  1. There are no backdoor paths from $D$ to $Y$, or
  2. Measured covariates are sufficient to block all backdoor paths from $D$ to $Y$.
• The first condition is really only valid for randomized experiments.
• The backdoor criterion is fairly powerful. It tells us:
  - if there is confounding given this DAG,
  - if it is possible to remove the confounding, and
  - what variables to condition on to eliminate the confounding.
SWIGs
[Figure: SWIG with nodes $U$, $X$, and $Y(d)$, in which the treatment node is split into $D \mid d$]
• It's a little hard to see how the backdoor criterion implies no unmeasured confounders.
  - There are no potential outcomes on this graph.
• Richardson and Robins: Single World Intervention Graphs (SWIGs).
  - Split the $D$ node into its natural value ($D$) and its intervention value $d$.
  - Let all effects of $D$ take their potential value under intervention: $Y(d)$.
• Now we can see: are $D$ and $Y(d)$ related?
  - $D \leftarrow U \rightarrow X \rightarrow Y(d)$ implies they are not independent.
  - Conditioning on $X$ blocks that backdoor path: $D \perp\!\!\!\perp Y(d) \mid X$.
No unmeasured confounders is not testable
• No unmeasured confounding places no restrictions on the observed data:
  $$\underbrace{\left(Y_i(0) \mid D_i = 1, X_i\right)}_{\text{unobserved}} \overset{d}{=} \underbrace{\left(Y_i(0) \mid D_i = 0, X_i\right)}_{\text{observed}}$$
• Here $\overset{d}{=}$ means equal in distribution.
• No way to directly test this assumption without the counterfactual data, which is missing by definition!
• With the backdoor criterion, you must have the correct DAG.
Assessing no unmeasured confounders
• Can do "placebo" tests, where $D_i$ cannot have an effect (lagged outcomes, etc.).
• DellaVigna and Kaplan (2007, QJE): effect of Fox News availability on Republican vote share.
  - Availability in 2000/2003 can't affect past vote shares.
• Unconfoundedness could still be violated even if you pass this test.
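The placebo idea can be sketched with simulated data. Everything here is an illustrative assumption rather than the DellaVigna and Kaplan analysis: `y_lag` stands in for a pre-treatment outcome, `x` for an observed covariate, and `u` for an unobserved one.

```r
# Placebo test sketch: regress a pre-treatment outcome on treatment.
# All names and coefficients are hypothetical.
set.seed(2002)
n <- 100000
x <- rnorm(n)                               # observed covariate
u <- rnorm(n)                               # unobserved variable
y_lag <- x + u + rnorm(n)                   # outcome measured before treatment

d_ok <- as.numeric(x + rnorm(n) > 0)        # assignment depends on x only
d_bad <- as.numeric(x + u + rnorm(n) > 0)   # assignment also depends on u

# Treatment cannot cause a past outcome, so a nonzero coefficient
# signals confounding rather than a causal effect.
placebo_ok <- unname(coef(lm(y_lag ~ d_ok + x))["d_ok"])
placebo_bad <- unname(coef(lm(y_lag ~ d_bad + x))["d_bad"])

round(c(placebo_ok = placebo_ok, placebo_bad = placebo_bad), 3)
```

A placebo coefficient near zero is reassuring but not conclusive, since an unmeasured confounder could affect the current outcome without affecting the lagged one.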
Alternatives to no unmeasured confounding
• Without explicit randomization, we need some way of identifying causal effects.
• No unmeasured confounders ≈ randomized experiment.
  - Identification results are very similar to experiments.
• With unmeasured confounding, are we doomed? Maybe not!
• Other approaches rely on finding plausibly exogenous variation in the assignment of $D_i$:
  - Instrumental variables (randomization + exclusion restriction)
  - Over-time variation (diff-in-diff, fixed effects)
  - Arbitrary thresholds for treatment assignment (RDD)
3 No unmeasured confounders and OLS
Justifying regression
• We know how randomized experiments imply that differences-in-means identify the ATE.
• In the next few weeks, we'll work through how no unmeasured confounding justifies a number of estimation strategies.
• Today, it's useful to walk through what no unmeasured confounding can buy us in a familiar setting: OLS.
  - We'll cover regression more formally later.
Constant effects setup
• Assume a constant effects setup:
  $$Y_i(0) = \alpha + X_i'\beta + u_i$$
  $$Y_i(1) = \alpha + \tau + X_i'\beta + u_i$$
• Constant effects because $Y_i(1) - Y_i(0) = \tau$ for all units.
• Use consistency to get the usual regression formula:
  $$Y_i = Y_i(1) D_i + Y_i(0)(1 - D_i) = Y_i(0) + \left(Y_i(1) - Y_i(0)\right) \cdot D_i = \alpha + \tau \cdot D_i + X_i'\beta + u_i$$
• Does no unmeasured confounding help us identify the causal parameter $\tau$?
Regression on residuals
• First, compute the residuals from regressing the treatment and the outcome on the covariates:
  $$\tilde{Y}_i = Y_i - \mathbb{E}[Y_i \mid X_i], \qquad \tilde{D}_i = D_i - \mathbb{E}[D_i \mid X_i]$$
• Running a regression of $\tilde{Y}_i$ on $\tilde{D}_i$ is equivalent to controlling for $X_i$:
  $$Y_i = \alpha + \tau \cdot D_i + X_i'\beta + u_i$$
  $$\tilde{Y}_i = \alpha + \tau \cdot \tilde{D}_i + \tilde{u}_i$$
• Here $\tilde{u}_i = u_i - \mathbb{E}[u_i \mid X_i]$.
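The equivalence between residual regression and controlling for $X_i$ can be checked numerically. The sketch below uses simulated data with made-up coefficients, and it approximates the conditional expectations with linear regressions on a single covariate `x`, which is an assumption of this example.

```r
# Residual regression vs. controlling for x directly.
# Data and coefficients are hypothetical; tau is set to 2.
set.seed(2002)
n <- 5000
x <- rnorm(n)
d <- as.numeric(0.5 * x + rnorm(n) > 0)
y <- 1 + 2 * d + 1.5 * x + rnorm(n)

# Coefficient on d when x is included as a control.
full <- unname(coef(lm(y ~ d + x))["d"])

# Residualize outcome and treatment on x, then regress residual on residual.
y_res <- resid(lm(y ~ x))
d_res <- resid(lm(d ~ x))
fwl <- unname(coef(lm(y_res ~ d_res))["d_res"])

all.equal(full, fwl)
```

With linear residualization the two coefficients agree up to floating-point error, which is the Frisch-Waugh-Lovell result.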
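What does OLS estimate?
• Using the usual OLS theory, we can show that the probability limit of the OLS estimator of $\tau$ is:
  $$\operatorname{plim} \hat{\tau}_{OLS} = \frac{\operatorname{Cov}(\tilde{Y}_i, \tilde{D}_i)}{\operatorname{Var}(\tilde{D}_i)}$$
  $$= \frac{\operatorname{Cov}(\tilde{D}_i, \alpha + \tau \cdot \tilde{D}_i + \tilde{u}_i)}{\operatorname{Var}(\tilde{D}_i)}$$
  $$= \frac{\tau \cdot \operatorname{Cov}(\tilde{D}_i, \tilde{D}_i) + \operatorname{Cov}(\tilde{D}_i, \tilde{u}_i)}{\operatorname{Var}(\tilde{D}_i)}$$
  $$= \tau + \frac{\operatorname{Cov}(\tilde{D}_i, \tilde{u}_i)}{\operatorname{Var}(\tilde{D}_i)}$$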
Key OLS assumption
  $$\operatorname{plim} \hat{\tau}_{OLS} = \tau + \frac{\operatorname{Cov}(\tilde{D}_i, \tilde{u}_i)}{\operatorname{Var}(\tilde{D}_i)}$$
• Key identification comes from $\operatorname{Cov}(\tilde{D}_i, \tilde{u}_i) = 0$.
  - Conditional on $X_i$, there is no relationship between $D_i$ and $u_i$.
• Note: $u_i$ is a function of $X_i$ and $Y_i(d)$:
  - $u_i = Y_i(0) - \alpha - X_i'\beta$ when $D_i = 0$
  - $u_i = Y_i(1) - \alpha - \tau - X_i'\beta$ when $D_i = 1$
  - Conditional on $X_i$, the only variation in $u_i$ comes from $Y_i(d)$.
• No unmeasured confounding implies this assumption:
  $$D_i \perp\!\!\!\perp \left(Y_i(1), Y_i(0)\right) \mid X_i \implies D_i \perp\!\!\!\perp u_i \mid X_i \implies \operatorname{Cov}(\tilde{D}_i, \tilde{u}_i) = 0$$
Omitted variable bias
• What happens when this is violated? Suppose that there is one omitted variable $\tilde{L}_i$ (residualized from $X_i$):
  $$\tilde{u}_i = \lambda \tilde{L}_i + \omega_i$$
• We'll assume that if we could measure $L_i$, then no unmeasured confounding would hold.
• Leads to inconsistency in the OLS estimator:
  $$\operatorname{plim} \hat{\tau}_{OLS} = \tau + \lambda \frac{\operatorname{Cov}(\tilde{D}_i, \tilde{L}_i)}{\operatorname{Var}(\tilde{D}_i)}$$
• The bias is the product of two terms:
  1. the coefficient on $L_i$ in the outcome model ($\lambda$), and
  2. the coefficient on $D_i$ from a regression of $L_i$ on $D_i$, also controlling for $X_i$.
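The omitted variable bias formula has an exact in-sample counterpart that is easy to verify. In this sketch the data are simulated, `l` plays the role of the omitted variable $L_i$, and the values $\tau = 2$ and $\lambda = 1.5$ are arbitrary assumptions.

```r
# Omitted variable bias: short regression vs. long regression.
# All names and coefficients are hypothetical.
set.seed(2002)
n <- 5000
x <- rnorm(n)
l <- 0.5 * x + rnorm(n)                   # omitted variable
d <- as.numeric(x + l + rnorm(n) > 0)     # treatment depends on x and l
y <- 1 + 2 * d + x + 1.5 * l + rnorm(n)   # tau = 2, lambda = 1.5

short <- unname(coef(lm(y ~ d + x))["d"])  # omits l
long <- coef(lm(y ~ d + x + l))            # includes l
delta <- coef(lm(l ~ d + x))["d"]          # l on d, controlling for x

# Short coefficient = long coefficient + (coefficient on l) * delta.
implied <- unname(long["d"] + long["l"] * delta)

all.equal(short, implied)
```

The decomposition holds exactly in the sample, and the gap between `short` and `long["d"]` is the estimated bias from leaving out `l`.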
4 Estimating causal effects under no unmeasured confounders
Basic approach to estimation
• Remember the usual approach to estimating the ATE with covariates:
• Stratification:
  - Stratify the units by the covariates.
  - Calculate the CATE within these strata.
• Standardization/direct adjustment:
  - Average the CATEs across the strata to get the ATE.
• How to create strata when $X$ has continuous components?
  - If $X$ is discrete with only a few levels, we can use the exact values of $X$.
  - Otherwise, we may have to subclassify/coarsen the data.
Classic example: cigars/pipes versus cigarettes
• $D_i = 1$ for pipe/cigar smokers, $D_i = 0$ for cigarette smokers.
• $Y_i$ = death in the first year of follow-up.
• Naive positive effect: cigar/pipe smokers are more likely to die.
  - What's the confounder here? Age!
  - Pipe/cigar smokers are much older than cigarette smokers.
• Cochran's approach: stratify based on coarsened age.
  - Divide age into $k$ strata: $S_i \in \{s_1, s_2, \ldots, s_k\}$.
  - $s_1$ might be 18-25, $s_2$ might be 26-35, and so on.
  - Calculate the effect within strata and aggregate.
• Key assumption: no unmeasured confounders using the stratified version of age:
  $$D_i \perp\!\!\!\perp \left(Y_i(0), Y_i(1)\right) \mid S_i$$
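Stratifying on coarsened age can be sketched on simulated data. The numbers below are invented and are not Cochran's data: treatment probability and mortality both rise with age, and the true effect of treatment is set to zero.

```r
# Stratification on coarsened age with simulated (hypothetical) data.
set.seed(2002)
n <- 20000
age <- runif(n, 18, 85)
d <- rbinom(n, 1, plogis((age - 50) / 10))   # older units more likely treated
y <- rbinom(n, 1, plogis(-6 + 0.06 * age))   # death depends on age only

# Naive difference in means ignores age.
naive <- mean(y[d == 1]) - mean(y[d == 0])

# Coarsen age into strata, estimate the effect within each, then aggregate.
strata <- cut(age, breaks = c(18, 25, 35, 45, 55, 65, 75, 85),
              include.lowest = TRUE)
tau_k <- as.numeric(tapply(y[d == 1], strata[d == 1], mean) -
                    tapply(y[d == 0], strata[d == 0], mean))
w_k <- as.numeric(table(strata)) / n         # share of units in each stratum
stratified <- sum(tau_k * w_k)

round(c(naive = naive, stratified = stratified), 3)
```

Because age is only coarsened, some within-stratum age imbalance can remain, so the stratified estimate need not be exactly zero.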
Stratification on the propensity score
• What about when $X$ has many dimensions?
• Curse of dimensionality: there will be very few, if any, units in a given stratum of $X_i$.
• Stratify on a low-dimensional summary, the propensity score:
  $$e(x) = \mathbb{P}[D_i = 1 \mid X_i = x]$$
  - PS = unit's probability of being treated, conditional on $X_i$.
• For a particular unit, this is $e(X_i) = \mathbb{P}[D_i = 1 \mid X_i]$.
• Rosenbaum and Rubin (1983) showed that:
  $$D_i \perp\!\!\!\perp \left(Y_i(0), Y_i(1)\right) \mid X_i \implies D_i \perp\!\!\!\perp \left(Y_i(0), Y_i(1)\right) \mid e(X_i)$$
  - For removing confounding, stratifying on $e(X_i)$ is the same as stratifying on the full $X_i$.
Propensity score as balancing score
• The propensity score is actually a balancing score, which means that:
  $$D_i \perp\!\!\!\perp X_i \mid e(X_i)$$
• Conditional on the propensity score, treatment is independent of the covariates.
  - The covariates are said to be balanced across treatment groups: $f(X_i \mid D_i = 1, e(X_i)) = f(X_i \mid D_i = 0, e(X_i))$.
• Of course, we have to know the true PS to have all these results work.
Estimating the propensity score
• Of course, in observational studies we don't know the propensity score.
• We would run a parametric model with parameters $\gamma$ to estimate the propensity scores:
  1. Estimate $\hat{\gamma}$.
  2. Create $\hat{e}_i = \Pr[D_i = 1 \mid X_i; \hat{\gamma}]$.
• For instance, in R, we could easily calculate the propensity scores using the glm function:
  pscores <- glm(treat ~ var1 + var2 + var3, data = mydata,
                 family = binomial())$fitted.values
Propensity score specifics
• What variables do we include in the propensity score model?
  - Any set of variables that blocks all the backdoor paths from $D_i$ to $Y_i$.
• Check balance within strata of $\hat{e}_i$. Covariates should be balanced:
  $$f(X_i \mid D_i = 1, \hat{e}_i) = f(X_i \mid D_i = 0, \hat{e}_i)$$
• Can also use automated/nonparametric tools for estimating $\hat{e}_i$:
  - Covariate Balancing Propensity Scores (Imai and Ratkovic)
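One simple way to check balance within strata of the estimated propensity score is to compare covariate means by treatment status inside each block. The sketch below is an illustration on simulated data: the single covariate `x`, the ten equal-width blocks, and the use of mean differences as the balance measure are all assumptions of this example.

```r
# Balance check within strata of the estimated propensity score.
# Data and names are hypothetical.
set.seed(2002)
n <- 20000
x <- rnorm(n)
d <- rbinom(n, 1, plogis(x))

ps_hat <- glm(d ~ x, family = binomial())$fitted.values
block <- cut(ps_hat, breaks = seq(0, 1, by = 0.1), include.lowest = TRUE)

# Covariate imbalance in the full sample.
raw_gap <- mean(x[d == 1]) - mean(x[d == 0])

# Covariate imbalance within each block of the estimated score.
gap_k <- tapply(x[d == 1], block[d == 1], mean) -
         tapply(x[d == 0], block[d == 0], mean)

round(raw_gap, 3)
round(gap_k, 3)
```

If the within-block gaps remain large, that suggests either a finer set of blocks or a richer propensity score model.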
Stratifying by the propensity score
• How will we use the propensity score?
  - Matching (next week)
  - Weighting (two weeks)
  - Regression (three weeks)
• Today: coarsening the propensity score and stratifying.
• Choose boundary points: $0 = b_0 < b_1 < \ldots < b_{K-1} < b_K = 1$.
• Create block indicators:
  $$B_i(k) = \begin{cases} 1 & \text{if } b_{k-1} < \hat{e}(X_i) < b_k \\ 0 & \text{otherwise} \end{cases}$$
• Calculate within-strata effect estimates:
  $$\tau_k = \mathbb{E}[Y_i \mid D_i = 1, B_i(k) = 1] - \mathbb{E}[Y_i \mid D_i = 0, B_i(k) = 1]$$
Standardization/direct adjustment
• We calculated the CATEs for each stratum of the PS, $\tau_k$.
• We can use the law of iterated expectations to back out the ATE.
• Take the average of the CATEs over the distribution of $X$:
  $$\tau = \sum_{k=1}^{K} \tau_k \, \mathbb{P}[B_i(k) = 1]$$
• Note that $\mathbb{P}[B_i(k) = 1]$ is just the proportion of units in block $k$:
  $$\mathbb{P}[B_i(k) = 1] = \frac{\sum_{i=1}^{N} B_i(k)}{N}$$
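Putting the pieces together, propensity score stratification followed by standardization might look like the following sketch. The simulated data, the variable names, the constant effect of 2, and the choice of ten equal-width blocks are all assumptions made for illustration.

```r
# Propensity score stratification and standardization on simulated data.
# All names and coefficients are hypothetical; tau is set to 2.
set.seed(2002)
n <- 20000
x1 <- rnorm(n)
x2 <- rnorm(n)
d <- rbinom(n, 1, plogis(0.8 * x1 + 0.5 * x2))
y <- 2 * d + x1 + x2 + rnorm(n)
mydata <- data.frame(y, d, x1, x2)

naive <- mean(mydata$y[mydata$d == 1]) - mean(mydata$y[mydata$d == 0])

# Step 1: estimate the propensity score with a logit model.
pscores <- glm(d ~ x1 + x2, data = mydata, family = binomial())$fitted.values

# Step 2: choose boundary points b_0 < ... < b_K and assign blocks.
bounds <- seq(0, 1, by = 0.1)
block <- cut(pscores, breaks = bounds, include.lowest = TRUE)

# Step 3: difference in means within each block.
tau_k <- sapply(split(mydata, block), function(b) {
  mean(b$y[b$d == 1]) - mean(b$y[b$d == 0])
})

# Step 4: weight each block estimate by its share of units.
p_k <- as.numeric(table(block)) / n
tau_hat <- sum(tau_k * p_k)

round(c(naive = naive, stratified = tau_hat), 3)
```

Note that `cut` builds half-open intervals, so a unit whose score falls exactly on a boundary is still assigned to a block. Blocks with no treated or no control units would produce missing estimates, which is one practical reason to inspect block sizes before aggregating.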
5 Wrapping Up
Summary
• Defined observational studies.
• Defined confounding and assessed when no unmeasured confounding holds.
• Saw how no unmeasured confounding helps with OLS.
• Saw how to estimate causal effects under no unmeasured confounding using the propensity score.
Next few weeks
• Learn how to estimate causal effects under no unmeasured confounders via:
  - Matching
  - Weighting
  - Regression
• Then we move on to situations where no unmeasured confounders is violated.
Backdoor criterion
bull Can we use a DAG to evaluate no unmeasured confoundersbull Pearl answered yes with the backdoor criterion which states
that the effect of 119863 on 119884 is identified if1 No backdoor paths from 119863 to 119884 OR2 Measured covariates are sufficient to block all backdoor paths
from 119863 to 119884
bull First is really only valid for randomized experimentsbull The backdoor criterion is fairly powerful Tells us
if there confounding given this DAG if it is possible to removing the confounding and what variables to condition on to eliminate the confounding
SWIGs
119863 | 119889 119884(119889)
119880 119883
119884
bull Itrsquos a little hard to see how the backdoor criterion implies nounmeasured confounders
No potential outcomes on this graphbull Richardson and Robins Single World Intervention Graphs
Split 119863 node into natural value (119863) and intervention value 119889 Let all effects of 119863 take their potential value under
intervention 119884(119889)bull Now can see are 119863 and 119884(119889) related
119863 larr 119880 rarr 119883 rarr 119884(119889) implies not independent Conditioning on 119883 blocks that backdoor path 119863 ⟂⟂ 119884(119889)|119883
No unmeasured confounders is nottestable
bull No unmeasured confounding places no restrictions on theobserved data
1114100119884119894(0)|119863119894 = 1119883119894111410311140591113840111384011138401113840111384011138401113840111384011138401113840111406011138401113840111384011138401113840111384011138401113840111384011138401114061unobserved
119889= 1114100119884119894(0)|119863119894 = 0119883119894111410311140591113840111384011138401113840111384011138401113840111384011138401113840111406011138401113840111384011138401113840111384011138401113840111384011138401114061observed
bull Here 119889= means equal in distributionbull No way to directly test this assumption without the
counterfactual data which is missing by definitionbull With backdoor criterion you must have the correct DAG
Assessing no unmeasured confounders
bull Can do ldquoplacebordquo tests where 119863119894 cannot have an effect(lagged outcomes etc)
bull Della Vigna and Kaplan (2007 QJE) effect of Fox Newsavailability on Republican vote share
Availability in 20002003 canrsquot affect past vote sharesbull Unconfoundedness could still be violated even if you pass this
test
Alternatives to no unmeasuredconfounding
bull Without explicit randomization we need some way ofidentifying causal effects
bull No unmeasured confounders asymp randomized experiment Indentification results very similar to experiments
bull With unmeasured confounding are we doomed Maybe notbull Other approaches rely on finding plausibly exogenous variation
in assignment of 119863119894 Instrumental variables (randomization + exclusion restriction) Over-time variation (diff-in-diff fixed effects) Arbitrary thresholds for treatment assignment (RDD)
3 No unmeasuredconfounders and OLS
Justifying regression
bull We know how randomized experiments imply thatdifferences-in-means identify the ATE
bull In the next few weeks wersquoll work through how no unmeasuredconfounding justifies a number of estimation strategies
bull Today itrsquos useful to walk through what no unmeasuredconfounding can buy us in a familiar setting OLS
Wersquoll cover regression more formally later
Constant effects set up
bull Assume a constant effects setup
119884119894(0) = 120572 + 119883prime119894 120573 + 119906119894
119884119894(1) = 120572 + 120591 + 119883prime119894 120573 + 119906119894
bull Constant effects because 119884119894(1) minus 119884119894(0) = 120591 for all unitsbull Use consistency to get the usual regression formula
119884119894 = 119884119894(1)119863119894 + 119884119894(0)(1 minus 119863119894)= 119884119894(0) + 1114100119884119894(1) minus 119884119894(0)1114103 sdot 119863119894
= 120572 + 120591 sdot 119863119894 + 119883prime119894 120573 + 119906119894
bull Does no unmeasured confounding help us identify the causalparameter 120591
Regression on residuals
bull First estimate the residuals of regression of the treatment andoutcome on the covariates
119894 = 119884119894 minus 120124[119884119894|119883119894]119894 = 119863119894 minus 120124[119863119894|119883119894]
bull Running a regression of 119894 on 119894 is equivalent to controllingfor 119883119894
119884119894 = 120572 + 120591 sdot 119863119894 + 119883prime119894 120573 + 119906119894
119894 = 120572 + 120591 sdot 119894 + 119894bull Here 119894 = 119906119894 minus 120124[119906119894|119883119894]
What does OLS estimate
bull Using the usual OLS theory we can show that the probabilitylimit of the OLS estimator of 120591 is
plim 111369611136931113700 =Cov(119894 119894)Var(119894)
= Cov(119894 120572 + 120591 sdot 119894 + 119894)Var(119894)
= 120591 sdot Cov(119894 119894) + Cov(119894 119894)Var(119894)
= 120591 + Cov(119894 119894)Var(119894)
Key OLS assumption
plim 111369611136931113700 = 120591 + Cov(119894 119894)Var(119894)
bull Key identification comes from Cov(119894 119894) = 0 Conditional on 119883119894 no relationship between 119863119894 and 119906119894
bull Note 119906119894 is a function of 119883119894 and 119884119894(119889) 119906119894 = 119884119894(0) minus 120572 minus 119883prime
119894 120573 when 119863119894 = 0 119906119894 = 119884119894(1) minus 120572 minus 120591 minus 119883prime
119894 120573 when 119863119894 = 1 condition on 119883119894 only variation in 119906119894 comes from 119884119894(119889)
bull No unmeasured confounding implies this assumption
119863119894 ⟂⟂ 1114100119884119894(1) 119884119894(0)1114103|119883119894 ⟹ 119863119894 ⟂⟂ 119906119894|119883119894 ⟹ Cov(119894 119894) = 0
Omitted variable bias
bull What happens when this is violated Suppose that there isone omitted variable (residualized from 119883119894)
119894 = 120582119894 + 120596119894
bull Wersquoll assume that if we could measure 119871119894 then nounmeasured confounding would hold
bull Leads to inconsistency in the OLS estimator
plim 111369611136931113700 = 120591 + 120582Cov(119894 119894)Var(119894)
bull Bias here is terms multiplied together1 coefficient on 119871119894 (120582)2 the coefficient of regression of 119863119894 on 119871119894 also controlling for 119883119894
4 Estimating causaleffects under nounmeasuredconfounders
Basic approach to estimation
bull Remember the usual approach to estimating the ATE withcovariates
bull Stratification Stratify the units by the covariates Calculate CATE within these strata
bull Standardizationdirect adjustment Average the CATEs across the strata to get ATE
bull How to create strata when 119883 has continuous components If 119883 is discrete with only a few levels can use the exact values
of 119883 Otherwise we may have to subclassifycoarsen the data
Classic example cigarspipes versuscigarettes
bull 119863119894 = 1 for pipecigar smokers 119863119894 = 0 for cigarette smokersbull 119884119894 = death in the first year of follow-upbull Naive positive effect cigarpipe smokers more likely to die
Whatrsquos the confounder here Age Pipecigar smokers much older than cigarette smokers
bull Cochranrsquos approach stratify based on coarsened age Divide age into 119896 strata 119878119894 isin 1199041113568 1199041113569 hellip 119904119896 1199041113568 might be 18-25 1199041113569 might be 26-35 and so on Calculate effect within strata and aggregate
bull Key assumption no unmeasured confounders using stratifiedversion of age
119863119894 ⟂⟂ 1114100119884119894(0) 119884119894(1)1114103|119878119894
Stratification on the propensity score
bull What about when 119883 has has many dimensionsbull Curse of dimensionality there will be very few if any units in
a given stratum of 119883119894bull Stratify on a low-dimensional summary the propensity score
119890(119909) = ℙ[119863119894 = 1|119883119894 = 119909]
PS = unitrsquos probability of being treated conditional on 119883119894
bull For a particular unit this is 119890(119883119894) = ℙ[119863119894 = 1|119883119894]bull Rosenbaum and Rubin (1983) showed that
119863119894 ⟂⟂ 1114100119884119894(0) 119884119894(1)1114103 | 119883119894 ⟹ 119863119894 ⟂⟂ 1114100119884119894(0) 119884119894(1)1114103 | 119890(119883119894)
stratifying on 119890119894 is the same as stratifying on the full 119883119894
Propensity score as balancing score
bull The propensity score is actually a balancing score whichmeans that
119863119894 ⟂⟂ 119883119894 | 119890(119883119894)
bull Conditional on the propensity score treatment is independentof the covariates
Treatment status is said to be balanced 119891(119883119894|119863119894 = 1 119890(119883119894)) = 119891(119883119894|119863119894 = 0 119890(119883119894))
bull Of course we have to know the true PS to have all theseresults work
Estimating the propensity score
bull Of course in observational studies we donrsquot know thepropensity score
bull We would run a parametric model with parameters 120574 toestimate the propensity scores
1 Estimate 2 Create 119894 = Pr[119863119894 = 1|119883119894 ]
bull For instance in R we could easily calculate the propensityscores using the glm function
pscores lt- glm(treat ~ var1 + var2 + var3 data = mydata
family = binomial())$fittedvalues
Propensity score specifics
bull What variables do we include in the propensity score model Any set of variables that blocks all the backdoor paths from 119863119894
to 119884119894
bull Check balance within strata of 119894 Covariates should bebalanced
119891(119883119894|119863119894 = 1 119894) = 119891(119883119894|119863119894 = 0 119894)
bull Can also use automatednonparametric tools for estimating 119894 Covariate Balancing Propensity Scores (Imai and Ratkovic)
Stratifying by the propensity score
bull How will we use the propensity score Matching (next week) Weighting (two weeks) Regression
(three weeks)bull Today coarsening the propensity score and stratifyingbull Choose boundary points 0 = 1198871113567 lt 1198871113568 lt hellip lt 119887119870minus1113568 lt 119887119870 = 1bull Create block indicators
119861119894(119896) =
⎧⎪⎪⎨⎪⎪⎩1 if 119887119896minus1113568 lt (119883119894) lt 1198871198960 otherwise
bull Calculate within-strata effect estimates
120591119896 = 120124[119884119894|119863119894 = 1 119861119894(119896) = 1] minus 120124[119884119894|119863119894 = 0 119861119894(119896) = 1]
Standardizationdirect adjustment
bull We calculated the CATEs for each strata of the PS 120591119896bull We can use law of iterated expectations to back out the ATEbull Take the average of the CATEs over the distribution of 119883
120591 =1198701114012119896=1113568
120591119896ℙ[119861119894(119896) = 1]
bull Note that ℙ[119861119894(119896) = 1] is just the proportion of units in block 119896
ℙ[119861119894(119896) = 1] =sum119873
119894=1113568 119861119894(119896)119873
5 Wrapping Up
Summary
bull Defined observational studiesbull Defined confounding and assessed when no unmeasured
confounding holdsbull Saw how no unmeasured confounding helps with OLSbull Saw how to estimate causal effects under no unmeasured
confounding using the propensity score
Next few weeks
bull Learn how to estimate causal effects under no unmeasuredconfounders via
Matching Weighting Regression
bull Then we move onto situations where no unmeasuredconfounders is violated
SWIGs
119863 | 119889 119884(119889)
119880 119883
119884
bull Itrsquos a little hard to see how the backdoor criterion implies nounmeasured confounders
No potential outcomes on this graphbull Richardson and Robins Single World Intervention Graphs
Split 119863 node into natural value (119863) and intervention value 119889 Let all effects of 119863 take their potential value under
intervention 119884(119889)bull Now can see are 119863 and 119884(119889) related
119863 larr 119880 rarr 119883 rarr 119884(119889) implies not independent Conditioning on 119883 blocks that backdoor path 119863 ⟂⟂ 119884(119889)|119883
No unmeasured confounders is nottestable
bull No unmeasured confounding places no restrictions on theobserved data
1114100119884119894(0)|119863119894 = 1119883119894111410311140591113840111384011138401113840111384011138401113840111384011138401113840111406011138401113840111384011138401113840111384011138401113840111384011138401114061unobserved
119889= 1114100119884119894(0)|119863119894 = 0119883119894111410311140591113840111384011138401113840111384011138401113840111384011138401113840111406011138401113840111384011138401113840111384011138401113840111384011138401114061observed
bull Here 119889= means equal in distributionbull No way to directly test this assumption without the
counterfactual data which is missing by definitionbull With backdoor criterion you must have the correct DAG
Assessing no unmeasured confounders
bull Can do ldquoplacebordquo tests where 119863119894 cannot have an effect(lagged outcomes etc)
bull Della Vigna and Kaplan (2007 QJE) effect of Fox Newsavailability on Republican vote share
Availability in 20002003 canrsquot affect past vote sharesbull Unconfoundedness could still be violated even if you pass this
test
Alternatives to no unmeasuredconfounding
bull Without explicit randomization we need some way ofidentifying causal effects
bull No unmeasured confounders asymp randomized experiment Indentification results very similar to experiments
bull With unmeasured confounding are we doomed Maybe notbull Other approaches rely on finding plausibly exogenous variation
in assignment of 119863119894 Instrumental variables (randomization + exclusion restriction) Over-time variation (diff-in-diff fixed effects) Arbitrary thresholds for treatment assignment (RDD)
3 No unmeasuredconfounders and OLS
Justifying regression
bull We know how randomized experiments imply thatdifferences-in-means identify the ATE
bull In the next few weeks wersquoll work through how no unmeasuredconfounding justifies a number of estimation strategies
bull Today itrsquos useful to walk through what no unmeasuredconfounding can buy us in a familiar setting OLS
Wersquoll cover regression more formally later
Constant effects set up
bull Assume a constant effects setup
119884119894(0) = 120572 + 119883prime119894 120573 + 119906119894
119884119894(1) = 120572 + 120591 + 119883prime119894 120573 + 119906119894
bull Constant effects because 119884119894(1) minus 119884119894(0) = 120591 for all unitsbull Use consistency to get the usual regression formula
119884119894 = 119884119894(1)119863119894 + 119884119894(0)(1 minus 119863119894)= 119884119894(0) + 1114100119884119894(1) minus 119884119894(0)1114103 sdot 119863119894
= 120572 + 120591 sdot 119863119894 + 119883prime119894 120573 + 119906119894
bull Does no unmeasured confounding help us identify the causalparameter 120591
Regression on residuals
bull First estimate the residuals of regression of the treatment andoutcome on the covariates
119894 = 119884119894 minus 120124[119884119894|119883119894]119894 = 119863119894 minus 120124[119863119894|119883119894]
bull Running a regression of 119894 on 119894 is equivalent to controllingfor 119883119894
119884119894 = 120572 + 120591 sdot 119863119894 + 119883prime119894 120573 + 119906119894
119894 = 120572 + 120591 sdot 119894 + 119894bull Here 119894 = 119906119894 minus 120124[119906119894|119883119894]
What does OLS estimate
bull Using the usual OLS theory we can show that the probabilitylimit of the OLS estimator of 120591 is
plim 111369611136931113700 =Cov(119894 119894)Var(119894)
= Cov(119894 120572 + 120591 sdot 119894 + 119894)Var(119894)
= 120591 sdot Cov(119894 119894) + Cov(119894 119894)Var(119894)
= 120591 + Cov(119894 119894)Var(119894)
Key OLS assumption
plim 111369611136931113700 = 120591 + Cov(119894 119894)Var(119894)
bull Key identification comes from Cov(119894 119894) = 0 Conditional on 119883119894 no relationship between 119863119894 and 119906119894
bull Note 119906119894 is a function of 119883119894 and 119884119894(119889) 119906119894 = 119884119894(0) minus 120572 minus 119883prime
119894 120573 when 119863119894 = 0 119906119894 = 119884119894(1) minus 120572 minus 120591 minus 119883prime
119894 120573 when 119863119894 = 1 condition on 119883119894 only variation in 119906119894 comes from 119884119894(119889)
bull No unmeasured confounding implies this assumption
119863119894 ⟂⟂ 1114100119884119894(1) 119884119894(0)1114103|119883119894 ⟹ 119863119894 ⟂⟂ 119906119894|119883119894 ⟹ Cov(119894 119894) = 0
Omitted variable bias
bull What happens when this is violated Suppose that there isone omitted variable (residualized from 119883119894)
119894 = 120582119894 + 120596119894
bull Wersquoll assume that if we could measure 119871119894 then nounmeasured confounding would hold
bull Leads to inconsistency in the OLS estimator
plim 111369611136931113700 = 120591 + 120582Cov(119894 119894)Var(119894)
bull Bias here is terms multiplied together1 coefficient on 119871119894 (120582)2 the coefficient of regression of 119863119894 on 119871119894 also controlling for 119883119894
4 Estimating causaleffects under nounmeasuredconfounders
Basic approach to estimation
bull Remember the usual approach to estimating the ATE withcovariates
bull Stratification Stratify the units by the covariates Calculate CATE within these strata
bull Standardizationdirect adjustment Average the CATEs across the strata to get ATE
bull How to create strata when 119883 has continuous components If 119883 is discrete with only a few levels can use the exact values
of 119883 Otherwise we may have to subclassifycoarsen the data
Classic example cigarspipes versuscigarettes
bull 119863119894 = 1 for pipecigar smokers 119863119894 = 0 for cigarette smokersbull 119884119894 = death in the first year of follow-upbull Naive positive effect cigarpipe smokers more likely to die
Whatrsquos the confounder here Age Pipecigar smokers much older than cigarette smokers
bull Cochranrsquos approach stratify based on coarsened age Divide age into 119896 strata 119878119894 isin 1199041113568 1199041113569 hellip 119904119896 1199041113568 might be 18-25 1199041113569 might be 26-35 and so on Calculate effect within strata and aggregate
bull Key assumption no unmeasured confounders using stratifiedversion of age
119863119894 ⟂⟂ 1114100119884119894(0) 119884119894(1)1114103|119878119894
Stratification on the propensity score
bull What about when 119883 has has many dimensionsbull Curse of dimensionality there will be very few if any units in
a given stratum of 119883119894bull Stratify on a low-dimensional summary the propensity score
119890(119909) = ℙ[119863119894 = 1|119883119894 = 119909]
PS = unitrsquos probability of being treated conditional on 119883119894
bull For a particular unit this is 119890(119883119894) = ℙ[119863119894 = 1|119883119894]bull Rosenbaum and Rubin (1983) showed that
119863119894 ⟂⟂ 1114100119884119894(0) 119884119894(1)1114103 | 119883119894 ⟹ 119863119894 ⟂⟂ 1114100119884119894(0) 119884119894(1)1114103 | 119890(119883119894)
stratifying on 119890119894 is the same as stratifying on the full 119883119894
Propensity score as balancing score
bull The propensity score is actually a balancing score whichmeans that
119863119894 ⟂⟂ 119883119894 | 119890(119883119894)
bull Conditional on the propensity score treatment is independentof the covariates
Treatment status is said to be balanced 119891(119883119894|119863119894 = 1 119890(119883119894)) = 119891(119883119894|119863119894 = 0 119890(119883119894))
bull Of course we have to know the true PS to have all theseresults work
Estimating the propensity score
bull Of course in observational studies we donrsquot know thepropensity score
bull We would run a parametric model with parameters 120574 toestimate the propensity scores
1 Estimate 2 Create 119894 = Pr[119863119894 = 1|119883119894 ]
bull For instance in R we could easily calculate the propensityscores using the glm function
pscores lt- glm(treat ~ var1 + var2 + var3 data = mydata
family = binomial())$fittedvalues
Propensity score specifics
bull What variables do we include in the propensity score model Any set of variables that blocks all the backdoor paths from 119863119894
to 119884119894
bull Check balance within strata of 119894 Covariates should bebalanced
119891(119883119894|119863119894 = 1 119894) = 119891(119883119894|119863119894 = 0 119894)
bull Can also use automatednonparametric tools for estimating 119894 Covariate Balancing Propensity Scores (Imai and Ratkovic)
Stratifying by the propensity score
• How will we use the propensity score?
  - Matching (next week)
  - Weighting (two weeks)
  - Regression (three weeks)
• Today: coarsening the propensity score and stratifying
• Choose boundary points $0 = b_0 < b_1 < \ldots < b_{K-1} < b_K = 1$
• Create block indicators:
$$B_i(k) = \begin{cases} 1 & \text{if } b_{k-1} < \hat{e}(X_i) < b_k \\ 0 & \text{otherwise} \end{cases}$$
• Calculate within-strata effect estimates:
$$\tau_k = \mathbb{E}[Y_i \mid D_i = 1, B_i(k) = 1] - \mathbb{E}[Y_i \mid D_i = 0, B_i(k) = 1]$$
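The blocking and within-block estimation steps can be sketched on a tiny hand-made dataset. Everything here is hypothetical: the eight units, the values of pscore (standing in for an already-estimated propensity score), and the single interior boundary point at 0.5. Note that `cut` uses intervals of the form $(b_{k-1}, b_k]$, a slight departure from the strict inequalities in the block definition that only matters for units exactly on a boundary.

```r
# Hypothetical toy data; pscore stands in for an already-estimated propensity score.
pscore <- c(0.1, 0.2, 0.3, 0.4, 0.6, 0.7, 0.8, 0.9)
treat  <- c(0,   0,   1,   0,   1,   0,   1,   1)
y      <- c(1,   2,   4,   3,   7,   5,   8,   9)

# Boundary points 0 = b_0 < b_1 < b_2 = 1, so K = 2 blocks.
b <- c(0, 0.5, 1)
block <- cut(pscore, breaks = b, include.lowest = TRUE)

# Within-block difference in mean outcomes between treated and control units.
tau_k <- sapply(levels(block), function(k) {
  in_k <- block == k
  mean(y[in_k & treat == 1]) - mean(y[in_k & treat == 0])
})

# Block 1: 4 - mean(1, 2, 3) = 2; block 2: mean(7, 8, 9) - 5 = 3.
print(tau_k)
```

Each block needs at least one treated and one control unit for the difference to be defined, which is one practical reason to keep the number of blocks modest.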
Standardization/direct adjustment
• We calculated the CATEs for each stratum of the PS, $\tau_k$
• We can use the law of iterated expectations to back out the ATE
• Take the average of the CATEs over the distribution of $X$:
$$\tau = \sum_{k=1}^{K} \tau_k \, \mathbb{P}[B_i(k) = 1]$$
• Note that $\mathbb{P}[B_i(k) = 1]$ is just the proportion of units in block $k$:
$$\mathbb{P}[B_i(k) = 1] = \frac{\sum_{i=1}^{N} B_i(k)}{N}$$
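As a small worked example of the weighted average, suppose there are three blocks. The within-block effects and block sizes below are made-up numbers chosen so the arithmetic is easy to follow.

```r
# Hypothetical within-block effect estimates and block sizes.
tau_k <- c(1, 2, 3)
n_k   <- c(50, 30, 20)

# P[B_i(k) = 1] is estimated by the share of units in block k.
p_k <- n_k / sum(n_k)

# ATE as the block-share-weighted average of the within-block effects.
ate <- sum(tau_k * p_k)
print(ate)   # 0.5 * 1 + 0.3 * 2 + 0.2 * 3 = 1.7
```

The unweighted mean of the three effects would be 2, so weighting by block share matters whenever blocks differ in size and in effect.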
5 Wrapping Up
Summary
• Defined observational studies
• Defined confounding and assessed when no unmeasured confounding holds
• Saw how no unmeasured confounding helps with OLS
• Saw how to estimate causal effects under no unmeasured confounding using the propensity score
Next few weeks
• Learn how to estimate causal effects under no unmeasured confounders via:
  - Matching
  - Weighting
  - Regression
• Then we move on to situations where no unmeasured confounders is violated
No unmeasured confounders is not testable
• No unmeasured confounding places no restrictions on the observed data
$$\underbrace{\big(Y_i(0) \mid D_i = 1, X_i\big)}_{\text{unobserved}} \overset{d}{=} \underbrace{\big(Y_i(0) \mid D_i = 0, X_i\big)}_{\text{observed}}$$
• Here $\overset{d}{=}$ means equal in distribution
• No way to directly test this assumption without the counterfactual data, which is missing by definition
• With the backdoor criterion, you must have the correct DAG
Assessing no unmeasured confounders
• Can do "placebo" tests, where $D_i$ cannot have an effect (lagged outcomes, etc.)
• Della Vigna and Kaplan (2007, QJE): effect of Fox News availability on Republican vote share
  - Availability in 2000/2003 can't affect past vote shares
• Unconfoundedness could still be violated even if you pass this test
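A placebo test on a lagged outcome can be sketched as follows. This is a simulation under assumed values, not an analysis of the Fox News data: the names x, u, d, y_lag, and y are hypothetical, and u plays the role of a confounder the analyst does not observe. Because the treatment cannot affect an outcome measured before it, a clearly nonzero coefficient in the placebo regression suggests that conditioning on x alone does not make the groups comparable.

```r
# Simulated data; names and parameter values are illustrative assumptions.
set.seed(2002)
n <- 2000
x <- rnorm(n)                              # observed covariate
u <- rnorm(n)                              # unmeasured confounder
d <- rbinom(n, 1, plogis(0.5 * x + u))     # treatment depends on X and U
y_lag <- 1 + x + u + rnorm(n)              # pre-treatment outcome: D cannot affect it
y     <- 1 + 2 * d + x + u + rnorm(n)      # post-treatment outcome

# Placebo regression: "effect" of treatment on the lagged outcome, given X.
placebo_fit <- lm(y_lag ~ d + x)
placebo <- coef(summary(placebo_fit))["d", c("Estimate", "Pr(>|t|)")]
print(placebo)
```

The reverse conclusion does not follow: a placebo estimate near zero is reassuring but does not establish that no unmeasured confounding holds for the actual outcome.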
Alternatives to no unmeasured confounding
• Without explicit randomization, we need some way of identifying causal effects
• No unmeasured confounders ≈ randomized experiment
  - Identification results very similar to experiments
• With unmeasured confounding, are we doomed? Maybe not
• Other approaches rely on finding plausibly exogenous variation in assignment of $D_i$:
  - Instrumental variables (randomization + exclusion restriction)
  - Over-time variation (diff-in-diff, fixed effects)
  - Arbitrary thresholds for treatment assignment (RDD)
3 No unmeasured confounders and OLS
Justifying regression
• We know how randomized experiments imply that differences-in-means identify the ATE
• In the next few weeks, we'll work through how no unmeasured confounding justifies a number of estimation strategies
• Today, it's useful to walk through what no unmeasured confounding can buy us in a familiar setting: OLS
  - We'll cover regression more formally later
Constant effects set up
• Assume a constant effects setup:
$$Y_i(0) = \alpha + X_i'\beta + u_i$$
$$Y_i(1) = \alpha + \tau + X_i'\beta + u_i$$
• Constant effects because $Y_i(1) - Y_i(0) = \tau$ for all units
• Use consistency to get the usual regression formula:
$$\begin{aligned} Y_i &= Y_i(1) D_i + Y_i(0)(1 - D_i) \\ &= Y_i(0) + (Y_i(1) - Y_i(0)) \cdot D_i \\ &= \alpha + \tau \cdot D_i + X_i'\beta + u_i \end{aligned}$$
• Does no unmeasured confounding help us identify the causal parameter $\tau$?
Regression on residuals
• First, calculate the residuals from regressions of the outcome and the treatment on the covariates:
$$\tilde{Y}_i = Y_i - \mathbb{E}[Y_i \mid X_i] \qquad \tilde{D}_i = D_i - \mathbb{E}[D_i \mid X_i]$$
• Running a regression of $\tilde{Y}_i$ on $\tilde{D}_i$ is equivalent to controlling for $X_i$:
$$Y_i = \alpha + \tau \cdot D_i + X_i'\beta + u_i$$
$$\tilde{Y}_i = \tau \cdot \tilde{D}_i + \tilde{u}_i$$
  - The intercept and $X_i'\beta$ drop out when $\mathbb{E}[Y_i \mid X_i]$ is subtracted
• Here, $\tilde{u}_i = u_i - \mathbb{E}[u_i \mid X_i]$
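The equivalence between controlling for the covariates and regressing residuals on residuals can be checked numerically. The sketch below uses simulated data; the variable names (x1, x2, d, y), the true effect of 2, and the use of linear regressions in place of the conditional expectations $\mathbb{E}[\cdot \mid X_i]$ are assumptions made for illustration.

```r
# Simulated data; names and parameter values are illustrative assumptions.
set.seed(2002)
n  <- 500
x1 <- rnorm(n)
x2 <- rnorm(n)
d  <- rbinom(n, 1, plogis(0.5 * x1 - 0.5 * x2))  # treatment depends on X only
y  <- 1 + 2 * d + x1 + 0.5 * x2 + rnorm(n)       # constant effect tau = 2

# (a) Regression that controls for X directly.
full_fit <- lm(y ~ d + x1 + x2)

# (b) Residualize outcome and treatment on X, then regress residual on residual.
#     Linear projections stand in for E[Y | X] and E[D | X] here.
y_res <- resid(lm(y ~ x1 + x2))
d_res <- resid(lm(d ~ x1 + x2))
res_fit <- lm(y_res ~ d_res)

tau_full <- unname(coef(full_fit)["d"])
tau_res  <- unname(coef(res_fit)["d_res"])
print(c(full = tau_full, residualized = tau_res))
```

The two coefficients agree up to floating-point error, and the intercept of the residual regression is numerically zero because both sets of residuals have mean zero.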
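What does OLS estimate?
• Using the usual OLS theory, we can show that the probability limit of the OLS estimator of $\tau$ is
$$\begin{aligned} \operatorname{plim} \hat{\tau}_{OLS} &= \frac{\operatorname{Cov}(\tilde{D}_i, \tilde{Y}_i)}{\operatorname{Var}(\tilde{D}_i)} \\ &= \frac{\operatorname{Cov}(\tilde{D}_i, \tau \cdot \tilde{D}_i + \tilde{u}_i)}{\operatorname{Var}(\tilde{D}_i)} \\ &= \frac{\tau \cdot \operatorname{Cov}(\tilde{D}_i, \tilde{D}_i) + \operatorname{Cov}(\tilde{D}_i, \tilde{u}_i)}{\operatorname{Var}(\tilde{D}_i)} \\ &= \tau + \frac{\operatorname{Cov}(\tilde{D}_i, \tilde{u}_i)}{\operatorname{Var}(\tilde{D}_i)} \end{aligned}$$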
Key OLS assumption
$$\operatorname{plim} \hat{\tau}_{OLS} = \tau + \frac{\operatorname{Cov}(\tilde{D}_i, \tilde{u}_i)}{\operatorname{Var}(\tilde{D}_i)}$$
• Key identification comes from $\operatorname{Cov}(\tilde{D}_i, \tilde{u}_i) = 0$
  - Conditional on $X_i$, no relationship between $D_i$ and $u_i$
• Note: $u_i$ is a function of $X_i$ and $Y_i(d)$
  - $u_i = Y_i(0) - \alpha - X_i'\beta$ when $D_i = 0$
  - $u_i = Y_i(1) - \alpha - \tau - X_i'\beta$ when $D_i = 1$
  - Conditional on $X_i$, the only variation in $u_i$ comes from $Y_i(d)$
• No unmeasured confounding implies this assumption:
$$D_i \perp\!\!\!\perp (Y_i(1), Y_i(0)) \mid X_i \implies D_i \perp\!\!\!\perp u_i \mid X_i \implies \operatorname{Cov}(\tilde{D}_i, \tilde{u}_i) = 0$$
Omitted variable bias
• What happens when this is violated? Suppose that there is one omitted variable, $L_i$ (residualized from $X_i$):
$$\tilde{u}_i = \lambda \tilde{L}_i + \omega_i$$
• We'll assume that if we could measure $L_i$, then no unmeasured confounding would hold
• Leads to inconsistency in the OLS estimator:
$$\operatorname{plim} \hat{\tau}_{OLS} = \tau + \lambda \frac{\operatorname{Cov}(\tilde{D}_i, \tilde{L}_i)}{\operatorname{Var}(\tilde{D}_i)}$$
• Bias here is the product of two terms:
  1. the coefficient on $L_i$ ($\lambda$)
  2. the coefficient on $D_i$ from a regression of $L_i$ on $D_i$, also controlling for $X_i$
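The bias formula has an exact in-sample counterpart that is easy to verify: the coefficient on the treatment when $L_i$ is omitted equals the coefficient when $L_i$ is included, plus the coefficient on $L_i$ times the coefficient on $D_i$ from a regression of $L_i$ on $D_i$ and $X_i$. The sketch below checks this on simulated data; the names (x, l, d, y) and all parameter values are assumptions for illustration.

```r
# Simulated data; names and parameter values are illustrative assumptions.
set.seed(2002)
n <- 1000
x <- rnorm(n)
l <- 0.5 * x + rnorm(n)                   # confounder, treated as unmeasured
d <- rbinom(n, 1, plogis(0.5 * x + l))    # treatment depends on X and L
y <- 1 + 2 * d + x + 1.5 * l + rnorm(n)   # tau = 2, lambda = 1.5

long_fit  <- lm(y ~ d + x + l)   # infeasible regression that includes L
short_fit <- lm(y ~ d + x)       # feasible regression that omits L
aux_fit   <- lm(l ~ d + x)       # regression of L on D, controlling for X

tau_long  <- unname(coef(long_fit)["d"])
lambda    <- unname(coef(long_fit)["l"])
delta     <- unname(coef(aux_fit)["d"])
tau_short <- unname(coef(short_fit)["d"])

# Short coefficient = long coefficient + lambda * delta (exact in sample).
print(c(short = tau_short, long_plus_bias = tau_long + lambda * delta))
```

In practice $L_i$ is unobserved, so neither `long_fit` nor `aux_fit` can be run; the decomposition is mainly useful for reasoning about the likely sign and size of the bias.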
Assessing no unmeasured confounders
bull Can do ldquoplacebordquo tests where 119863119894 cannot have an effect(lagged outcomes etc)
bull Della Vigna and Kaplan (2007 QJE) effect of Fox Newsavailability on Republican vote share
Availability in 20002003 canrsquot affect past vote sharesbull Unconfoundedness could still be violated even if you pass this
test
Alternatives to no unmeasuredconfounding
bull Without explicit randomization we need some way ofidentifying causal effects
bull No unmeasured confounders asymp randomized experiment Indentification results very similar to experiments
bull With unmeasured confounding are we doomed Maybe notbull Other approaches rely on finding plausibly exogenous variation
in assignment of 119863119894 Instrumental variables (randomization + exclusion restriction) Over-time variation (diff-in-diff fixed effects) Arbitrary thresholds for treatment assignment (RDD)
3 No unmeasuredconfounders and OLS
Justifying regression
bull We know how randomized experiments imply thatdifferences-in-means identify the ATE
bull In the next few weeks wersquoll work through how no unmeasuredconfounding justifies a number of estimation strategies
bull Today itrsquos useful to walk through what no unmeasuredconfounding can buy us in a familiar setting OLS
Wersquoll cover regression more formally later
Constant effects set up
bull Assume a constant effects setup
119884119894(0) = 120572 + 119883prime119894 120573 + 119906119894
119884119894(1) = 120572 + 120591 + 119883prime119894 120573 + 119906119894
bull Constant effects because 119884119894(1) minus 119884119894(0) = 120591 for all unitsbull Use consistency to get the usual regression formula
119884119894 = 119884119894(1)119863119894 + 119884119894(0)(1 minus 119863119894)= 119884119894(0) + 1114100119884119894(1) minus 119884119894(0)1114103 sdot 119863119894
= 120572 + 120591 sdot 119863119894 + 119883prime119894 120573 + 119906119894
bull Does no unmeasured confounding help us identify the causalparameter 120591
Regression on residuals
bull First estimate the residuals of regression of the treatment andoutcome on the covariates
119894 = 119884119894 minus 120124[119884119894|119883119894]119894 = 119863119894 minus 120124[119863119894|119883119894]
bull Running a regression of 119894 on 119894 is equivalent to controllingfor 119883119894
119884119894 = 120572 + 120591 sdot 119863119894 + 119883prime119894 120573 + 119906119894
119894 = 120572 + 120591 sdot 119894 + 119894bull Here 119894 = 119906119894 minus 120124[119906119894|119883119894]
What does OLS estimate
bull Using the usual OLS theory we can show that the probabilitylimit of the OLS estimator of 120591 is
plim 111369611136931113700 =Cov(119894 119894)Var(119894)
= Cov(119894 120572 + 120591 sdot 119894 + 119894)Var(119894)
= 120591 sdot Cov(119894 119894) + Cov(119894 119894)Var(119894)
= 120591 + Cov(119894 119894)Var(119894)
Key OLS assumption
plim 111369611136931113700 = 120591 + Cov(119894 119894)Var(119894)
bull Key identification comes from Cov(119894 119894) = 0 Conditional on 119883119894 no relationship between 119863119894 and 119906119894
bull Note 119906119894 is a function of 119883119894 and 119884119894(119889) 119906119894 = 119884119894(0) minus 120572 minus 119883prime
119894 120573 when 119863119894 = 0 119906119894 = 119884119894(1) minus 120572 minus 120591 minus 119883prime
119894 120573 when 119863119894 = 1 condition on 119883119894 only variation in 119906119894 comes from 119884119894(119889)
bull No unmeasured confounding implies this assumption
119863119894 ⟂⟂ 1114100119884119894(1) 119884119894(0)1114103|119883119894 ⟹ 119863119894 ⟂⟂ 119906119894|119883119894 ⟹ Cov(119894 119894) = 0
Omitted variable bias
bull What happens when this is violated Suppose that there isone omitted variable (residualized from 119883119894)
119894 = 120582119894 + 120596119894
bull Wersquoll assume that if we could measure 119871119894 then nounmeasured confounding would hold
bull Leads to inconsistency in the OLS estimator
plim 111369611136931113700 = 120591 + 120582Cov(119894 119894)Var(119894)
bull Bias here is terms multiplied together1 coefficient on 119871119894 (120582)2 the coefficient of regression of 119863119894 on 119871119894 also controlling for 119883119894
4 Estimating causaleffects under nounmeasuredconfounders
Basic approach to estimation
bull Remember the usual approach to estimating the ATE withcovariates
bull Stratification Stratify the units by the covariates Calculate CATE within these strata
bull Standardizationdirect adjustment Average the CATEs across the strata to get ATE
bull How to create strata when 119883 has continuous components If 119883 is discrete with only a few levels can use the exact values
of 119883 Otherwise we may have to subclassifycoarsen the data
Classic example cigarspipes versuscigarettes
bull 119863119894 = 1 for pipecigar smokers 119863119894 = 0 for cigarette smokersbull 119884119894 = death in the first year of follow-upbull Naive positive effect cigarpipe smokers more likely to die
Whatrsquos the confounder here Age Pipecigar smokers much older than cigarette smokers
bull Cochranrsquos approach stratify based on coarsened age Divide age into 119896 strata 119878119894 isin 1199041113568 1199041113569 hellip 119904119896 1199041113568 might be 18-25 1199041113569 might be 26-35 and so on Calculate effect within strata and aggregate
bull Key assumption no unmeasured confounders using stratifiedversion of age
119863119894 ⟂⟂ 1114100119884119894(0) 119884119894(1)1114103|119878119894
Stratification on the propensity score
bull What about when 119883 has has many dimensionsbull Curse of dimensionality there will be very few if any units in
a given stratum of 119883119894bull Stratify on a low-dimensional summary the propensity score
119890(119909) = ℙ[119863119894 = 1|119883119894 = 119909]
PS = unitrsquos probability of being treated conditional on 119883119894
bull For a particular unit this is 119890(119883119894) = ℙ[119863119894 = 1|119883119894]bull Rosenbaum and Rubin (1983) showed that
119863119894 ⟂⟂ 1114100119884119894(0) 119884119894(1)1114103 | 119883119894 ⟹ 119863119894 ⟂⟂ 1114100119884119894(0) 119884119894(1)1114103 | 119890(119883119894)
stratifying on 119890119894 is the same as stratifying on the full 119883119894
Propensity score as balancing score
bull The propensity score is actually a balancing score whichmeans that
119863119894 ⟂⟂ 119883119894 | 119890(119883119894)
bull Conditional on the propensity score treatment is independentof the covariates
Treatment status is said to be balanced 119891(119883119894|119863119894 = 1 119890(119883119894)) = 119891(119883119894|119863119894 = 0 119890(119883119894))
bull Of course we have to know the true PS to have all theseresults work
Estimating the propensity score
bull Of course in observational studies we donrsquot know thepropensity score
bull We would run a parametric model with parameters 120574 toestimate the propensity scores
1 Estimate 2 Create 119894 = Pr[119863119894 = 1|119883119894 ]
bull For instance in R we could easily calculate the propensityscores using the glm function
pscores lt- glm(treat ~ var1 + var2 + var3 data = mydata
family = binomial())$fittedvalues
Propensity score specifics
bull What variables do we include in the propensity score model Any set of variables that blocks all the backdoor paths from 119863119894
to 119884119894
bull Check balance within strata of 119894 Covariates should bebalanced
119891(119883119894|119863119894 = 1 119894) = 119891(119883119894|119863119894 = 0 119894)
bull Can also use automatednonparametric tools for estimating 119894 Covariate Balancing Propensity Scores (Imai and Ratkovic)
Stratifying by the propensity score
bull How will we use the propensity score Matching (next week) Weighting (two weeks) Regression
(three weeks)bull Today coarsening the propensity score and stratifyingbull Choose boundary points 0 = 1198871113567 lt 1198871113568 lt hellip lt 119887119870minus1113568 lt 119887119870 = 1bull Create block indicators
119861119894(119896) =
⎧⎪⎪⎨⎪⎪⎩1 if 119887119896minus1113568 lt (119883119894) lt 1198871198960 otherwise
bull Calculate within-strata effect estimates
120591119896 = 120124[119884119894|119863119894 = 1 119861119894(119896) = 1] minus 120124[119884119894|119863119894 = 0 119861119894(119896) = 1]
Standardizationdirect adjustment
bull We calculated the CATEs for each strata of the PS 120591119896bull We can use law of iterated expectations to back out the ATEbull Take the average of the CATEs over the distribution of 119883
120591 =1198701114012119896=1113568
120591119896ℙ[119861119894(119896) = 1]
bull Note that ℙ[119861119894(119896) = 1] is just the proportion of units in block 119896
ℙ[119861119894(119896) = 1] =sum119873
119894=1113568 119861119894(119896)119873
5 Wrapping Up
Summary
bull Defined observational studiesbull Defined confounding and assessed when no unmeasured
confounding holdsbull Saw how no unmeasured confounding helps with OLSbull Saw how to estimate causal effects under no unmeasured
confounding using the propensity score
Next few weeks
bull Learn how to estimate causal effects under no unmeasuredconfounders via
Matching Weighting Regression
bull Then we move onto situations where no unmeasuredconfounders is violated
Alternatives to no unmeasuredconfounding
bull Without explicit randomization we need some way ofidentifying causal effects
bull No unmeasured confounders asymp randomized experiment Indentification results very similar to experiments
bull With unmeasured confounding are we doomed Maybe notbull Other approaches rely on finding plausibly exogenous variation
in assignment of 119863119894 Instrumental variables (randomization + exclusion restriction) Over-time variation (diff-in-diff fixed effects) Arbitrary thresholds for treatment assignment (RDD)
3 No unmeasuredconfounders and OLS
Justifying regression
bull We know how randomized experiments imply thatdifferences-in-means identify the ATE
bull In the next few weeks wersquoll work through how no unmeasuredconfounding justifies a number of estimation strategies
bull Today itrsquos useful to walk through what no unmeasuredconfounding can buy us in a familiar setting OLS
Wersquoll cover regression more formally later
Constant effects set up
bull Assume a constant effects setup
119884119894(0) = 120572 + 119883prime119894 120573 + 119906119894
119884119894(1) = 120572 + 120591 + 119883prime119894 120573 + 119906119894
bull Constant effects because 119884119894(1) minus 119884119894(0) = 120591 for all unitsbull Use consistency to get the usual regression formula
119884119894 = 119884119894(1)119863119894 + 119884119894(0)(1 minus 119863119894)= 119884119894(0) + 1114100119884119894(1) minus 119884119894(0)1114103 sdot 119863119894
= 120572 + 120591 sdot 119863119894 + 119883prime119894 120573 + 119906119894
bull Does no unmeasured confounding help us identify the causalparameter 120591
Regression on residuals
bull First estimate the residuals of regression of the treatment andoutcome on the covariates
119894 = 119884119894 minus 120124[119884119894|119883119894]119894 = 119863119894 minus 120124[119863119894|119883119894]
bull Running a regression of 119894 on 119894 is equivalent to controllingfor 119883119894
119884119894 = 120572 + 120591 sdot 119863119894 + 119883prime119894 120573 + 119906119894
119894 = 120572 + 120591 sdot 119894 + 119894bull Here 119894 = 119906119894 minus 120124[119906119894|119883119894]
What does OLS estimate
bull Using the usual OLS theory we can show that the probabilitylimit of the OLS estimator of 120591 is
plim 111369611136931113700 =Cov(119894 119894)Var(119894)
= Cov(119894 120572 + 120591 sdot 119894 + 119894)Var(119894)
= 120591 sdot Cov(119894 119894) + Cov(119894 119894)Var(119894)
= 120591 + Cov(119894 119894)Var(119894)
Key OLS assumption
plim 111369611136931113700 = 120591 + Cov(119894 119894)Var(119894)
bull Key identification comes from Cov(119894 119894) = 0 Conditional on 119883119894 no relationship between 119863119894 and 119906119894
bull Note 119906119894 is a function of 119883119894 and 119884119894(119889) 119906119894 = 119884119894(0) minus 120572 minus 119883prime
119894 120573 when 119863119894 = 0 119906119894 = 119884119894(1) minus 120572 minus 120591 minus 119883prime
119894 120573 when 119863119894 = 1 condition on 119883119894 only variation in 119906119894 comes from 119884119894(119889)
bull No unmeasured confounding implies this assumption
119863119894 ⟂⟂ 1114100119884119894(1) 119884119894(0)1114103|119883119894 ⟹ 119863119894 ⟂⟂ 119906119894|119883119894 ⟹ Cov(119894 119894) = 0
Omitted variable bias
bull What happens when this is violated Suppose that there isone omitted variable (residualized from 119883119894)
119894 = 120582119894 + 120596119894
bull Wersquoll assume that if we could measure 119871119894 then nounmeasured confounding would hold
bull Leads to inconsistency in the OLS estimator
plim 111369611136931113700 = 120591 + 120582Cov(119894 119894)Var(119894)
bull Bias here is terms multiplied together1 coefficient on 119871119894 (120582)2 the coefficient of regression of 119863119894 on 119871119894 also controlling for 119883119894
4 Estimating causaleffects under nounmeasuredconfounders
Basic approach to estimation
bull Remember the usual approach to estimating the ATE withcovariates
bull Stratification Stratify the units by the covariates Calculate CATE within these strata
bull Standardizationdirect adjustment Average the CATEs across the strata to get ATE
bull How to create strata when 119883 has continuous components If 119883 is discrete with only a few levels can use the exact values
of 119883 Otherwise we may have to subclassifycoarsen the data
Classic example cigarspipes versuscigarettes
bull 119863119894 = 1 for pipecigar smokers 119863119894 = 0 for cigarette smokersbull 119884119894 = death in the first year of follow-upbull Naive positive effect cigarpipe smokers more likely to die
Whatrsquos the confounder here Age Pipecigar smokers much older than cigarette smokers
bull Cochranrsquos approach stratify based on coarsened age Divide age into 119896 strata 119878119894 isin 1199041113568 1199041113569 hellip 119904119896 1199041113568 might be 18-25 1199041113569 might be 26-35 and so on Calculate effect within strata and aggregate
bull Key assumption no unmeasured confounders using stratifiedversion of age
119863119894 ⟂⟂ 1114100119884119894(0) 119884119894(1)1114103|119878119894
Stratification on the propensity score
bull What about when 119883 has has many dimensionsbull Curse of dimensionality there will be very few if any units in
a given stratum of 119883119894bull Stratify on a low-dimensional summary the propensity score
119890(119909) = ℙ[119863119894 = 1|119883119894 = 119909]
PS = unitrsquos probability of being treated conditional on 119883119894
bull For a particular unit this is 119890(119883119894) = ℙ[119863119894 = 1|119883119894]bull Rosenbaum and Rubin (1983) showed that
119863119894 ⟂⟂ 1114100119884119894(0) 119884119894(1)1114103 | 119883119894 ⟹ 119863119894 ⟂⟂ 1114100119884119894(0) 119884119894(1)1114103 | 119890(119883119894)
stratifying on 119890119894 is the same as stratifying on the full 119883119894
Propensity score as balancing score
bull The propensity score is actually a balancing score whichmeans that
119863119894 ⟂⟂ 119883119894 | 119890(119883119894)
bull Conditional on the propensity score treatment is independentof the covariates
Treatment status is said to be balanced 119891(119883119894|119863119894 = 1 119890(119883119894)) = 119891(119883119894|119863119894 = 0 119890(119883119894))
bull Of course we have to know the true PS to have all theseresults work
Estimating the propensity score
bull Of course in observational studies we donrsquot know thepropensity score
bull We would run a parametric model with parameters 120574 toestimate the propensity scores
1 Estimate 2 Create 119894 = Pr[119863119894 = 1|119883119894 ]
bull For instance in R we could easily calculate the propensityscores using the glm function
pscores lt- glm(treat ~ var1 + var2 + var3 data = mydata
family = binomial())$fittedvalues
Propensity score specifics
bull What variables do we include in the propensity score model Any set of variables that blocks all the backdoor paths from 119863119894
to 119884119894
bull Check balance within strata of 119894 Covariates should bebalanced
119891(119883119894|119863119894 = 1 119894) = 119891(119883119894|119863119894 = 0 119894)
bull Can also use automatednonparametric tools for estimating 119894 Covariate Balancing Propensity Scores (Imai and Ratkovic)
Stratifying by the propensity score
bull How will we use the propensity score Matching (next week) Weighting (two weeks) Regression
(three weeks)bull Today coarsening the propensity score and stratifyingbull Choose boundary points 0 = 1198871113567 lt 1198871113568 lt hellip lt 119887119870minus1113568 lt 119887119870 = 1bull Create block indicators
119861119894(119896) =
⎧⎪⎪⎨⎪⎪⎩1 if 119887119896minus1113568 lt (119883119894) lt 1198871198960 otherwise
bull Calculate within-strata effect estimates
120591119896 = 120124[119884119894|119863119894 = 1 119861119894(119896) = 1] minus 120124[119884119894|119863119894 = 0 119861119894(119896) = 1]
Standardizationdirect adjustment
bull We calculated the CATEs for each strata of the PS 120591119896bull We can use law of iterated expectations to back out the ATEbull Take the average of the CATEs over the distribution of 119883
120591 =1198701114012119896=1113568
120591119896ℙ[119861119894(119896) = 1]
bull Note that ℙ[119861119894(119896) = 1] is just the proportion of units in block 119896
ℙ[119861119894(119896) = 1] =sum119873
119894=1113568 119861119894(119896)119873
5 Wrapping Up
Summary
bull Defined observational studiesbull Defined confounding and assessed when no unmeasured
confounding holdsbull Saw how no unmeasured confounding helps with OLSbull Saw how to estimate causal effects under no unmeasured
confounding using the propensity score
Next few weeks
bull Learn how to estimate causal effects under no unmeasuredconfounders via
Matching Weighting Regression
bull Then we move onto situations where no unmeasuredconfounders is violated
3 No unmeasuredconfounders and OLS
Justifying regression
bull We know how randomized experiments imply thatdifferences-in-means identify the ATE
bull In the next few weeks wersquoll work through how no unmeasuredconfounding justifies a number of estimation strategies
bull Today itrsquos useful to walk through what no unmeasuredconfounding can buy us in a familiar setting OLS
Wersquoll cover regression more formally later
Constant effects set up
bull Assume a constant effects setup
119884119894(0) = 120572 + 119883prime119894 120573 + 119906119894
119884119894(1) = 120572 + 120591 + 119883prime119894 120573 + 119906119894
bull Constant effects because 119884119894(1) minus 119884119894(0) = 120591 for all unitsbull Use consistency to get the usual regression formula
119884119894 = 119884119894(1)119863119894 + 119884119894(0)(1 minus 119863119894)= 119884119894(0) + 1114100119884119894(1) minus 119884119894(0)1114103 sdot 119863119894
= 120572 + 120591 sdot 119863119894 + 119883prime119894 120573 + 119906119894
bull Does no unmeasured confounding help us identify the causalparameter 120591
Regression on residuals
bull First estimate the residuals of regression of the treatment andoutcome on the covariates
119894 = 119884119894 minus 120124[119884119894|119883119894]119894 = 119863119894 minus 120124[119863119894|119883119894]
bull Running a regression of 119894 on 119894 is equivalent to controllingfor 119883119894
119884119894 = 120572 + 120591 sdot 119863119894 + 119883prime119894 120573 + 119906119894
119894 = 120572 + 120591 sdot 119894 + 119894bull Here 119894 = 119906119894 minus 120124[119906119894|119883119894]
What does OLS estimate
bull Using the usual OLS theory we can show that the probabilitylimit of the OLS estimator of 120591 is
plim 111369611136931113700 =Cov(119894 119894)Var(119894)
= Cov(119894 120572 + 120591 sdot 119894 + 119894)Var(119894)
= 120591 sdot Cov(119894 119894) + Cov(119894 119894)Var(119894)
= 120591 + Cov(119894 119894)Var(119894)

Key OLS assumption
$$\operatorname{plim} \hat{\tau}_{OLS} = \tau + \frac{\operatorname{Cov}(\tilde{D}_i, \tilde{u}_i)}{\operatorname{Var}(\tilde{D}_i)}$$
• Key identification comes from $\operatorname{Cov}(\tilde{D}_i, \tilde{u}_i) = 0$.
  - Conditional on $X_i$, there is no relationship between $D_i$ and $u_i$.
• Note: $u_i$ is a function of $X_i$ and $Y_i(d)$:
  - $u_i = Y_i(0) - \alpha - X_i'\beta$ when $D_i = 0$
  - $u_i = Y_i(1) - \alpha - \tau - X_i'\beta$ when $D_i = 1$
  - Conditional on $X_i$, the only variation in $u_i$ comes from $Y_i(d)$.
• No unmeasured confounding implies this assumption:
$$D_i \perp\!\!\!\perp \big(Y_i(1), Y_i(0)\big) \mid X_i \implies D_i \perp\!\!\!\perp u_i \mid X_i \implies \operatorname{Cov}(\tilde{D}_i, \tilde{u}_i) = 0$$

Omitted variable bias
• What happens when this is violated? Suppose that there is one omitted variable $L_i$ (residualized from $X_i$):
$$\tilde{u}_i = \lambda \tilde{L}_i + \omega_i$$
• We'll assume that if we could measure $L_i$, then no unmeasured confounding would hold.
• Leads to inconsistency in the OLS estimator:
$$\operatorname{plim} \hat{\tau}_{OLS} = \tau + \lambda \frac{\operatorname{Cov}(\tilde{D}_i, \tilde{L}_i)}{\operatorname{Var}(\tilde{D}_i)}$$
• The bias is the product of two terms:
  1. the coefficient on $L_i$ in the outcome equation ($\lambda$)
  2. the coefficient on $D_i$ from a regression of $L_i$ on $D_i$, also controlling for $X_i$
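
The omitted variable bias formula can be checked numerically. The sketch below uses simulated data in which the omitted variable `l` is actually observed, so the infeasible "long" regression can be run for comparison; the data-generating values and all variable names are assumptions for illustration only.

```r
set.seed(2)
n <- 5000
x <- rnorm(n)
l <- 0.5 * x + rnorm(n)                       # omitted variable L_i (hypothetical)
d <- rbinom(n, 1, plogis(0.5 * x + l))        # treatment depends on x and l
y <- 1 + 2 * d + 1.5 * x + 1 * l + rnorm(n)   # tau = 2, lambda = 1

long  <- lm(y ~ d + x + l)   # infeasible regression that includes L_i
short <- lm(y ~ d + x)       # feasible regression that omits L_i
aux   <- lm(l ~ d + x)       # regression of L_i on D_i, controlling for X_i

# Product of the two terms versus the actual gap between the two estimates.
bias_formula <- unname(coef(long)["l"] * coef(aux)["d"])
bias_actual  <- unname(coef(short)["d"] - coef(long)["d"])
c(formula = bias_formula, actual = bias_actual)
```

In a given sample the two numbers coincide, because the OLS version of the formula is an algebraic identity; the probability-limit statement on the slide is its population analogue.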

4 Estimating causal effects under no unmeasured confounders

Basic approach to estimation
• Remember the usual approach to estimating the ATE with covariates.
• Stratification:
  - Stratify the units by the covariates.
  - Calculate the CATE within these strata.
• Standardization/direct adjustment:
  - Average the CATEs across the strata to get the ATE.
• How to create strata when $X$ has continuous components?
  - If $X$ is discrete with only a few levels, we can use the exact values of $X$.
  - Otherwise, we may have to subclassify/coarsen the data.

Classic example: cigars/pipes versus cigarettes
• $D_i = 1$ for pipe/cigar smokers, $D_i = 0$ for cigarette smokers
• $Y_i$ = death in the first year of follow-up
• Naive positive effect: cigar/pipe smokers more likely to die
  - What's the confounder here? Age.
  - Pipe/cigar smokers are much older than cigarette smokers.
• Cochran's approach: stratify based on coarsened age
  - Divide age into $k$ strata, $S_i \in \{s_1, s_2, \ldots, s_k\}$
  - $s_1$ might be 18-25, $s_2$ might be 26-35, and so on
  - Calculate the effect within strata and aggregate
• Key assumption: no unmeasured confounders using the stratified version of age:
$$D_i \perp\!\!\!\perp \big(Y_i(0), Y_i(1)\big) \mid S_i$$
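
A rough sketch of stratifying on coarsened age, in R. The data are simulated, not Cochran's: the age range, the treatment and mortality models, and the stratum boundaries are all assumptions chosen so that age confounds the comparison while the true effect of `d` is zero.

```r
set.seed(3)
n <- 20000
age <- sample(18:85, n, replace = TRUE)
# Older people are more likely to smoke pipes/cigars (d = 1) than cigarettes (d = 0).
d <- rbinom(n, 1, plogis((age - 50) / 10))
# Death risk rises with age; the true effect of d is set to zero in this simulation.
y <- rbinom(n, 1, plogis(-7 + 0.06 * age))

naive <- mean(y[d == 1]) - mean(y[d == 0])

# Coarsen age into strata and take the difference in means within each stratum.
s <- cut(age, breaks = c(17, 25, 35, 45, 55, 65, 75, 85))
tau_s <- sapply(split(data.frame(y, d), s),
                function(g) mean(g$y[g$d == 1]) - mean(g$y[g$d == 0]))

# Aggregate using the share of units in each stratum.
w <- as.numeric(table(s)) / n
adjusted <- sum(tau_s * w)
c(naive = naive, adjusted = adjusted)
```

The naive difference is pulled away from zero by age, while the stratified estimate should sit much closer to it. Some confounding can remain within strata because age is only coarsened, not held fixed.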

Stratification on the propensity score
• What about when $X$ has many dimensions?
• Curse of dimensionality: there will be very few, if any, units in a given stratum of $X_i$.
• Stratify on a low-dimensional summary, the propensity score:
$$e(x) = \mathbb{P}[D_i = 1 \mid X_i = x]$$
  - PS = unit's probability of being treated, conditional on $X_i$
• For a particular unit, this is $e(X_i) = \mathbb{P}[D_i = 1 \mid X_i]$.
• Rosenbaum and Rubin (1983) showed that:
$$D_i \perp\!\!\!\perp \big(Y_i(0), Y_i(1)\big) \mid X_i \implies D_i \perp\!\!\!\perp \big(Y_i(0), Y_i(1)\big) \mid e(X_i)$$
  - Stratifying on $e(X_i)$ is the same as stratifying on the full $X_i$.

Propensity score as balancing score
• The propensity score is actually a balancing score, which means that:
$$D_i \perp\!\!\!\perp X_i \mid e(X_i)$$
• Conditional on the propensity score, treatment is independent of the covariates.
  - Treatment status is said to be balanced: $f(X_i \mid D_i = 1, e(X_i)) = f(X_i \mid D_i = 0, e(X_i))$
• Of course, we have to know the true PS to have all these results work.

Estimating the propensity score
• Of course, in observational studies we don't know the propensity score.
• We would run a parametric model with parameters $\gamma$ to estimate the propensity scores:
  1. Estimate $\hat{\gamma}$
  2. Create $\hat{e}_i = \Pr[D_i = 1 \mid X_i; \hat{\gamma}]$
• For instance, in R, we could easily calculate the propensity scores using the glm function:

    # Logistic regression of treatment on covariates; the fitted values are
    # the estimated probabilities of treatment (the propensity scores).
    pscores <- glm(treat ~ var1 + var2 + var3, data = mydata,
                   family = binomial())$fitted.values

Propensity score specifics
• What variables do we include in the propensity score model?
  - Any set of variables that blocks all the backdoor paths from $D_i$ to $Y_i$
• Check balance within strata of $\hat{e}_i$. Covariates should be balanced:
$$f(X_i \mid D_i = 1, \hat{e}_i) = f(X_i \mid D_i = 0, \hat{e}_i)$$
• Can also use automated/nonparametric tools for estimating $\hat{e}_i$
  - Covariate Balancing Propensity Scores (Imai and Ratkovic)
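
One simple way to check balance within strata of the estimated propensity score is to compare covariate means between treated and control units, overall and within each stratum. The sketch below assumes simulated data, two covariates, and five strata cut at the quintiles of the estimated score; the helper `mean_diff` is hypothetical, and a difference in means is only a rough summary of the full distributional condition.

```r
set.seed(4)
n <- 5000
mydata <- data.frame(var1 = rnorm(n), var2 = rnorm(n))
mydata$treat <- rbinom(n, 1, plogis(0.8 * mydata$var1 - 0.5 * mydata$var2))

# Estimated propensity scores from a logistic regression.
mydata$pscore <- glm(treat ~ var1 + var2, data = mydata,
                     family = binomial())$fitted.values

# Five strata defined by quintiles of the estimated propensity score.
cuts <- quantile(mydata$pscore, probs = seq(0, 1, by = 0.2))
mydata$block <- cut(mydata$pscore, breaks = cuts, include.lowest = TRUE)

# Difference in covariate means (treated minus control), overall and by stratum.
mean_diff <- function(df, v) {
  mean(df[[v]][df$treat == 1]) - mean(df[[v]][df$treat == 0])
}
overall <- mean_diff(mydata, "var1")
within  <- sapply(split(mydata, mydata$block), mean_diff, v = "var1")
round(c(overall = overall, within), 3)
```

If the propensity score model is adequate, the within-stratum differences should be much smaller than the overall difference. Large remaining gaps suggest respecifying the model or using finer strata.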

Stratifying by the propensity score
• How will we use the propensity score?
  - Matching (next week)
  - Weighting (two weeks)
  - Regression (three weeks)
• Today: coarsening the propensity score and stratifying
• Choose boundary points $0 = b_0 < b_1 < \ldots < b_{K-1} < b_K = 1$
• Create block indicators:
$$
B_i(k) =
\begin{cases}
1 & \text{if } b_{k-1} < \hat{e}(X_i) < b_k \\
0 & \text{otherwise}
\end{cases}
$$
• Calculate within-strata effect estimates:
$$\tau_k = \mathbb{E}[Y_i \mid D_i = 1, B_i(k) = 1] - \mathbb{E}[Y_i \mid D_i = 0, B_i(k) = 1]$$

Standardization/direct adjustment
• We calculated the CATEs for each stratum of the PS, $\tau_k$.
• We can use the law of iterated expectations to back out the ATE.
• Take the average of the CATEs over the distribution of $X$:
$$\tau = \sum_{k=1}^{K} \tau_k \, \mathbb{P}[B_i(k) = 1]$$
• Note that $\mathbb{P}[B_i(k) = 1]$ is just the proportion of units in block $k$:
$$\mathbb{P}[B_i(k) = 1] = \frac{\sum_{i=1}^{N} B_i(k)}{N}$$
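
Putting the pieces together, here is a sketch of the stratification estimator in R on simulated data. The data-generating process, the choice of $K = 10$ blocks, and the use of deciles of the estimated score as boundary points (rather than fixed points between 0 and 1) are assumptions for illustration.

```r
set.seed(5)
n <- 10000
x1 <- rnorm(n)
x2 <- rnorm(n)
d <- rbinom(n, 1, plogis(0.8 * x1 - 0.5 * x2))
y <- 1 + 2 * d + 1.5 * x1 + x2 + rnorm(n)     # true tau = 2

# Estimated propensity scores.
e_hat <- glm(d ~ x1 + x2, family = binomial())$fitted.values

# Boundary points b_0 < ... < b_K, here the deciles of e_hat (an assumed choice).
K <- 10
b <- quantile(e_hat, probs = seq(0, 1, length.out = K + 1))
block <- cut(e_hat, breaks = b, include.lowest = TRUE)

# Within-block difference in means (tau_k) and block shares P[B_i(k) = 1].
tau_k <- sapply(split(data.frame(y, d), block),
                function(g) mean(g$y[g$d == 1]) - mean(g$y[g$d == 0]))
p_k <- as.numeric(table(block)) / n

# Standardization: weight each tau_k by the share of units in its block.
tau_hat <- sum(tau_k * p_k)
naive   <- mean(y[d == 1]) - mean(y[d == 0])
c(naive = naive, stratified = tau_hat)
```

The stratified estimate should land much closer to the true effect than the naive difference in means. Some bias can remain because units within a block still differ in their propensity scores; more blocks reduce it at the cost of noisier within-block estimates.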

5 Wrapping Up

Summary
• Defined observational studies
• Defined confounding and assessed when no unmeasured confounding holds
• Saw how no unmeasured confounding helps with OLS
• Saw how to estimate causal effects under no unmeasured confounding using the propensity score

Next few weeks
• Learn how to estimate causal effects under no unmeasured confounders via:
  - Matching
  - Weighting
  - Regression
• Then we move on to situations where no unmeasured confounders is violated
What does OLS estimate
bull Using the usual OLS theory we can show that the probabilitylimit of the OLS estimator of 120591 is
plim 111369611136931113700 =Cov(119894 119894)Var(119894)
= Cov(119894 120572 + 120591 sdot 119894 + 119894)Var(119894)
= 120591 sdot Cov(119894 119894) + Cov(119894 119894)Var(119894)
= 120591 + Cov(119894 119894)Var(119894)
Key OLS assumption
plim 111369611136931113700 = 120591 + Cov(119894 119894)Var(119894)
bull Key identification comes from Cov(119894 119894) = 0 Conditional on 119883119894 no relationship between 119863119894 and 119906119894
bull Note 119906119894 is a function of 119883119894 and 119884119894(119889) 119906119894 = 119884119894(0) minus 120572 minus 119883prime
119894 120573 when 119863119894 = 0 119906119894 = 119884119894(1) minus 120572 minus 120591 minus 119883prime
119894 120573 when 119863119894 = 1 condition on 119883119894 only variation in 119906119894 comes from 119884119894(119889)
bull No unmeasured confounding implies this assumption
119863119894 ⟂⟂ 1114100119884119894(1) 119884119894(0)1114103|119883119894 ⟹ 119863119894 ⟂⟂ 119906119894|119883119894 ⟹ Cov(119894 119894) = 0
Omitted variable bias
bull What happens when this is violated Suppose that there isone omitted variable (residualized from 119883119894)
119894 = 120582119894 + 120596119894
bull Wersquoll assume that if we could measure 119871119894 then nounmeasured confounding would hold
bull Leads to inconsistency in the OLS estimator
plim 111369611136931113700 = 120591 + 120582Cov(119894 119894)Var(119894)
bull Bias here is terms multiplied together1 coefficient on 119871119894 (120582)2 the coefficient of regression of 119863119894 on 119871119894 also controlling for 119883119894
4 Estimating causaleffects under nounmeasuredconfounders
Basic approach to estimation
• Remember the usual approach to estimating the ATE with covariates:
• Stratification:
  - Stratify the units by the covariates
  - Calculate the CATE within these strata
• Standardization/direct adjustment:
  - Average the CATEs across the strata to get the ATE
• How to create strata when $X$ has continuous components?
  - If $X$ is discrete with only a few levels, can use the exact values of $X$
  - Otherwise, we may have to subclassify/coarsen the data
Classic example: cigars/pipes versus cigarettes
• $D_i = 1$ for pipe/cigar smokers, $D_i = 0$ for cigarette smokers
• $Y_i$ = death in the first year of follow-up
• Naive positive effect: cigar/pipe smokers more likely to die
  - What's the confounder here? Age
  - Pipe/cigar smokers much older than cigarette smokers
• Cochran's approach: stratify based on coarsened age
  - Divide age into $k$ strata: $S_i \in \{s_1, s_2, \ldots, s_k\}$
  - $s_1$ might be 18-25, $s_2$ might be 26-35, and so on
  - Calculate the effect within strata and aggregate
• Key assumption: no unmeasured confounders using the stratified version of age

$$D_i \perp\!\!\!\perp \big(Y_i(0), Y_i(1)\big) \mid S_i$$
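The logic of stratifying on coarsened age can be sketched with simulated data. Nothing below comes from Cochran's actual data: the age range, the ten-year strata, and the probabilities are assumptions chosen so that death depends on age only and smoking type has no effect.

```r
# Simulated data; smoking type has no true effect on death here.
set.seed(1968)
n   <- 50000
age <- runif(n, 18, 78)
D   <- rbinom(n, 1, plogis((age - 48) / 10))       # 1 = pipe/cigar, older on average
Y   <- rbinom(n, 1, plogis(-4 + (age - 18) / 15))  # death risk rises with age

# Naive comparison ignores age
naive <- mean(Y[D == 1]) - mean(Y[D == 0])

# Coarsen age into ten-year strata and compare within each stratum
S <- cut(age, breaks = seq(18, 78, by = 10), include.lowest = TRUE)
effects <- sapply(levels(S), function(s) {
  mean(Y[D == 1 & S == s]) - mean(Y[D == 0 & S == s])
})

# Aggregate with weights equal to each stratum's share of the sample
weights    <- as.numeric(table(S)) / n
stratified <- sum(effects * weights)

print(c(naive = naive, stratified = stratified))
```

The naive difference is clearly positive even though the simulated effect is zero, while the stratified estimate is much closer to zero. Some bias remains because age still varies within each ten-year stratum.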
Stratification on the propensity score
• What about when $X$ has many dimensions?
• Curse of dimensionality: there will be very few, if any, units in a given stratum of $X_i$
• Stratify on a low-dimensional summary, the propensity score:

$$e(x) = \mathbb{P}[D_i = 1 \mid X_i = x]$$

  - PS = unit's probability of being treated, conditional on $X_i$
• For a particular unit, this is $e(X_i) = \mathbb{P}[D_i = 1 \mid X_i]$
• Rosenbaum and Rubin (1983) showed that:

$$D_i \perp\!\!\!\perp \big(Y_i(0), Y_i(1)\big) \mid X_i \implies D_i \perp\!\!\!\perp \big(Y_i(0), Y_i(1)\big) \mid e(X_i)$$

  - Stratifying on $e(X_i)$ is the same as stratifying on the full $X_i$
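A short sketch of why the Rosenbaum and Rubin result holds may help. Because $D_i$ is binary, it is enough to show that the probability of treatment given the potential outcomes and the propensity score does not depend on the potential outcomes. The steps below use only iterated expectations (the score is a function of $X_i$) and no unmeasured confounding; this is an outline of the standard argument, not a full proof.

```latex
\begin{aligned}
\mathbb{P}[D_i = 1 \mid Y_i(0), Y_i(1), e(X_i)]
  &= \mathbb{E}\big[\, \mathbb{P}[D_i = 1 \mid Y_i(0), Y_i(1), X_i] \,\big|\, Y_i(0), Y_i(1), e(X_i) \big] \\
  &= \mathbb{E}\big[\, \mathbb{P}[D_i = 1 \mid X_i] \,\big|\, Y_i(0), Y_i(1), e(X_i) \big] \\
  &= \mathbb{E}\big[\, e(X_i) \,\big|\, Y_i(0), Y_i(1), e(X_i) \big] \\
  &= e(X_i)
\end{aligned}
```

The same steps without the potential outcomes give $\mathbb{P}[D_i = 1 \mid e(X_i)] = e(X_i)$, so the two conditional probabilities coincide and treatment is independent of the potential outcomes given the score.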
Propensity score as balancing score
• The propensity score is actually a balancing score, which means that:

$$D_i \perp\!\!\!\perp X_i \mid e(X_i)$$

• Conditional on the propensity score, treatment is independent of the covariates
  - Treatment status is said to be balanced: $f(X_i \mid D_i = 1, e(X_i)) = f(X_i \mid D_i = 0, e(X_i))$
• Of course, we have to know the true PS to have all these results work
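The balancing property can be seen in a simulation where the true propensity score is known. This is a minimal sketch with assumed ingredients: two binary covariates and a score that takes the same value, 0.5, for two different covariate profiles, so that one stratum of the score mixes units with different values of `x1`.

```r
# Simulated data with a known propensity score (illustrative values).
set.seed(83)
n  <- 100000
x1 <- rbinom(n, 1, 0.5)
x2 <- rbinom(n, 1, 0.5)
e  <- 0.2 + 0.3 * (x1 + x2)   # true score: 0.2, 0.5, or 0.8
D  <- rbinom(n, 1, e)

# Unconditionally, treated units are more likely to have x1 = 1
overall_gap <- mean(x1[D == 1]) - mean(x1[D == 0])

# The stratum e = 0.5 mixes (x1, x2) = (1, 0) and (0, 1); within it,
# x1 should have the same distribution for treated and control units
mid <- abs(e - 0.5) < 1e-8
within_gap <- mean(x1[mid & D == 1]) - mean(x1[mid & D == 0])

print(c(overall = overall_gap, within = within_gap))
```

The overall gap is large, while the gap within the stratum is close to zero, as the balancing property predicts when the true score is used.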
Estimating the propensity score
• Of course, in observational studies we don't know the propensity score
• We would run a parametric model with parameters $\gamma$ to estimate the propensity scores:
  1. Estimate $\hat{\gamma}$
  2. Create $\hat{e}_i = \Pr[D_i = 1 \mid X_i; \hat{\gamma}]$
• For instance, in R we could easily calculate the propensity scores using the glm function:

    # Logit model for treatment; fitted.values holds the estimated probabilities
    pscores <- glm(treat ~ var1 + var2 + var3, data = mydata,
                   family = binomial())$fitted.values
Propensity score specifics
• What variables do we include in the propensity score model?
  - Any set of variables that blocks all the backdoor paths from $D_i$ to $Y_i$
• Check balance within strata of $\hat{e}_i$
  - Covariates should be balanced:

$$f(X_i \mid D_i = 1, \hat{e}_i) = f(X_i \mid D_i = 0, \hat{e}_i)$$

• Can also use automated/nonparametric tools for estimating $\hat{e}_i$
  - Covariate Balancing Propensity Scores (Imai and Ratkovic)
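A balance check within strata of the estimated score could look like the following sketch. The data are simulated, and the choices here (two covariates, quintile strata, a difference in means of `x1` as the balance measure, and the helper name `mean_diff`) are assumptions for illustration; in practice one would check every covariate and possibly more than the mean.

```r
# Simulated data; treatment depends on both covariates.
set.seed(1983)
n  <- 20000
x1 <- rnorm(n)
x2 <- rnorm(n)
treat  <- rbinom(n, 1, plogis(0.8 * x1 + 0.8 * x2))
mydata <- data.frame(treat, x1, x2)

# Fit the logit model and store the fitted probabilities
ps_model      <- glm(treat ~ x1 + x2, data = mydata, family = binomial())
mydata$pscore <- fitted(ps_model)

# Five strata defined by the quintiles of the estimated score
cuts <- quantile(mydata$pscore, probs = seq(0, 1, by = 0.2))
mydata$block <- cut(mydata$pscore, breaks = cuts,
                    include.lowest = TRUE, labels = FALSE)

# Hypothetical helper: treated-minus-control difference in the mean of x1
mean_diff <- function(d) mean(d$x1[d$treat == 1]) - mean(d$x1[d$treat == 0])

overall_diff <- mean_diff(mydata)
within_diff  <- sapply(split(mydata, mydata$block), mean_diff)

print(round(c(overall = overall_diff, within_diff), 3))
```

The within-stratum differences are much smaller than the overall difference. Differences that stay large within strata would suggest respecifying the propensity score model or using finer strata.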
Stratifying by the propensity score
• How will we use the propensity score?
  - Matching (next week)
  - Weighting (two weeks)
  - Regression (three weeks)
• Today: coarsening the propensity score and stratifying
• Choose boundary points: $0 = b_0 < b_1 < \ldots < b_{K-1} < b_K = 1$
• Create block indicators:

$$B_i(k) = \begin{cases} 1 & \text{if } b_{k-1} < \hat{e}(X_i) < b_k \\ 0 & \text{otherwise} \end{cases}$$

• Calculate within-strata effect estimates:

$$\tau_k = \mathbb{E}[Y_i \mid D_i = 1, B_i(k) = 1] - \mathbb{E}[Y_i \mid D_i = 0, B_i(k) = 1]$$
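The block indicators and within-block estimates can be sketched in a few lines of R. The simulated data, the five equally spaced boundary points, and the constant treatment effect of 2 are assumptions made for this illustration, and sample means stand in for the conditional expectations.

```r
# Simulated data; the true effect is 2 for every unit.
set.seed(10)
n <- 10000
x <- runif(n, -2, 2)
D <- rbinom(n, 1, plogis(x))
Y <- 1 + 2 * D + x + rnorm(n)

# Estimated propensity score from a logit model
e_hat <- fitted(glm(D ~ x, family = binomial()))

b <- c(0, 0.2, 0.4, 0.6, 0.8, 1)   # boundary points b_0 < ... < b_K
K <- length(b) - 1

# B[i, k] = 1 if b_{k-1} < e_hat_i < b_k, and 0 otherwise
B <- sapply(seq_len(K), function(k) as.integer(b[k] < e_hat & e_hat < b[k + 1]))

# Difference in mean outcomes between treated and control within each block
tau_k <- sapply(seq_len(K), function(k) {
  in_block <- B[, k] == 1
  mean(Y[in_block & D == 1]) - mean(Y[in_block & D == 0])
})

naive <- mean(Y[D == 1]) - mean(Y[D == 0])
print(round(c(naive = naive, tau_k), 2))
```

Each within-block estimate lands near 2, while the naive difference is well above it. The blocks here are fixed in advance; boundaries based on quantiles of the estimated score are a common alternative.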
Standardization/direct adjustment
• We calculated the CATEs for each stratum of the PS, $\tau_k$
• We can use the law of iterated expectations to back out the ATE
• Take the average of the CATEs over the distribution of $X$:

$$\tau = \sum_{k=1}^{K} \tau_k \, \mathbb{P}[B_i(k) = 1]$$

• Note that $\mathbb{P}[B_i(k) = 1]$ is just the proportion of units in block $k$:

$$\mathbb{P}[B_i(k) = 1] = \frac{\sum_{i=1}^{N} B_i(k)}{N}$$
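A small worked example shows how the weights enter. The numbers are hypothetical and chosen only for easy arithmetic: $N = 100$ units in $K = 3$ blocks of sizes 50, 30, and 20, with within-block estimates $\tau_1 = 1$, $\tau_2 = 2$, and $\tau_3 = 3$.

```latex
\begin{aligned}
\mathbb{P}[B_i(1) = 1] &= \tfrac{50}{100} = 0.5, \qquad
\mathbb{P}[B_i(2) = 1] = \tfrac{30}{100} = 0.3, \qquad
\mathbb{P}[B_i(3) = 1] = \tfrac{20}{100} = 0.2 \\
\tau &= \sum_{k=1}^{3} \tau_k \, \mathbb{P}[B_i(k) = 1]
      = (1)(0.5) + (2)(0.3) + (3)(0.2)
      = 0.5 + 0.6 + 0.6
      = 1.7
\end{aligned}
```

The unweighted average of the three estimates would be 2, so weighting by block size matters whenever effects differ across blocks of unequal size.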
5 Wrapping Up
Summary
• Defined observational studies
• Defined confounding and assessed when no unmeasured confounding holds
• Saw how no unmeasured confounding helps with OLS
• Saw how to estimate causal effects under no unmeasured confounding using the propensity score
Next few weeks
• Learn how to estimate causal effects under no unmeasured confounders via:
  - Matching
  - Weighting
  - Regression
• Then we move on to situations where no unmeasured confounders is violated
Stratification on the propensity score
• What about when $X_i$ has many dimensions?
• Curse of dimensionality: there will be very few, if any, units in a given stratum of $X_i$.
• Stratify on a low-dimensional summary, the propensity score:
\[ e(x) = \mathbb{P}[D_i = 1 \mid X_i = x] \]
  - PS = unit's probability of being treated, conditional on $X_i$.
• For a particular unit, this is $e(X_i) = \mathbb{P}[D_i = 1 \mid X_i]$.
• Rosenbaum and Rubin (1983) showed that
\[ D_i \perp\!\!\!\perp (Y_i(0), Y_i(1)) \mid X_i \implies D_i \perp\!\!\!\perp (Y_i(0), Y_i(1)) \mid e(X_i) \]
  - For identification, stratifying on $e(X_i)$ is the same as stratifying on the full $X_i$.
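A short sketch of why the Rosenbaum and Rubin implication holds may help here. It is a standard argument, written in the slide's notation, and uses only iterated expectations plus the no unmeasured confounding assumption; it is not taken from the lecture itself.

```latex
\begin{align*}
\mathbb{P}[D_i = 1 \mid Y_i(0), Y_i(1), e(X_i)]
  &= \mathbb{E}\big[\, \mathbb{P}[D_i = 1 \mid Y_i(0), Y_i(1), X_i] \;\big|\; Y_i(0), Y_i(1), e(X_i) \big]
     && \text{($e(X_i)$ is a function of $X_i$)} \\
  &= \mathbb{E}\big[\, \mathbb{P}[D_i = 1 \mid X_i] \;\big|\; Y_i(0), Y_i(1), e(X_i) \big]
     && \text{(no unmeasured confounding)} \\
  &= \mathbb{E}\big[\, e(X_i) \;\big|\; Y_i(0), Y_i(1), e(X_i) \big] \\
  &= e(X_i).
\end{align*}
```

The last line does not involve the potential outcomes, so treatment is independent of $(Y_i(0), Y_i(1))$ once we condition on $e(X_i)$.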
Propensity score as balancing score
• The propensity score is actually a balancing score, which means that
\[ D_i \perp\!\!\!\perp X_i \mid e(X_i) \]
• Conditional on the propensity score, treatment is independent of the covariates.
  - The covariates are said to be balanced across treatment status: $f(X_i \mid D_i = 1, e(X_i)) = f(X_i \mid D_i = 0, e(X_i))$
• Of course, we have to know the true PS for all of these results to hold.
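The balancing property can be checked in two lines. The sketch below is the usual textbook argument in the slide's notation; note that it does not need no unmeasured confounding, only the definition of $e(X_i)$.

```latex
\begin{align*}
\mathbb{P}[D_i = 1 \mid X_i, e(X_i)] &= \mathbb{P}[D_i = 1 \mid X_i] = e(X_i), \\
\mathbb{P}[D_i = 1 \mid e(X_i)] &= \mathbb{E}\big[\, \mathbb{P}[D_i = 1 \mid X_i] \;\big|\; e(X_i) \big]
  = \mathbb{E}\big[\, e(X_i) \mid e(X_i) \big] = e(X_i).
\end{align*}
```

Both conditional probabilities equal $e(X_i)$, so knowing $X_i$ adds nothing about $D_i$ once $e(X_i)$ is known, which is exactly $D_i \perp\!\!\!\perp X_i \mid e(X_i)$.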
Estimating the propensity score
• Of course, in observational studies we don't know the propensity score.
• We would run a parametric model with parameters $\gamma$ to estimate the propensity scores:
  1. Estimate $\hat{\gamma}$.
  2. Create $\hat{e}_i = \Pr[D_i = 1 \mid X_i; \hat{\gamma}]$.
• For instance, in R we could easily calculate the propensity scores using the glm function:

    # Logistic regression of treatment on the covariates; keep the fitted probabilities
    pscores <- glm(treat ~ var1 + var2 + var3, data = mydata,
                   family = binomial())$fitted.values
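To make the two steps concrete, here is a minimal sketch on simulated data. The data-generating process, the coefficients, and the object names `true_ps` and `ps_model` are assumptions made up for illustration; only the `glm` call mirrors the slide.

```r
# Simulated data: all names and coefficients below are illustrative assumptions
set.seed(2002)
n <- 2000
var1 <- rnorm(n)
var2 <- rnorm(n)
var3 <- rbinom(n, 1, 0.5)

# True propensity score (known here only because we simulate the data)
true_ps <- plogis(-0.5 + 0.8 * var1 - 0.5 * var2 + 0.3 * var3)
treat <- rbinom(n, 1, true_ps)
mydata <- data.frame(treat, var1, var2, var3)

# Step 1: estimate gamma with a logistic regression
ps_model <- glm(treat ~ var1 + var2 + var3, data = mydata,
                family = binomial())

# Step 2: fitted probabilities are the estimated propensity scores
pscores <- predict(ps_model, type = "response")

summary(pscores)        # estimated scores lie strictly between 0 and 1
cor(pscores, true_ps)   # compare against the truth, possible only in a simulation
```

Here the logistic model is correctly specified by construction, so the estimated scores track the true ones closely. With real data there is no such guarantee, which is why the balance checks discussed on the next slide matter.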
Propensity score specifics
• What variables do we include in the propensity score model?
  - Any set of variables that blocks all the backdoor paths from $D_i$ to $Y_i$.
• Check balance within strata of $\hat{e}_i$. Covariates should be balanced:
\[ f(X_i \mid D_i = 1, \hat{e}_i) = f(X_i \mid D_i = 0, \hat{e}_i) \]
• Can also use automated/nonparametric tools for estimating $\hat{e}_i$.
  - Covariate Balancing Propensity Scores (Imai and Ratkovic)
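One simple way to check balance is to compare covariate means between treated and control units, first overall and then within strata of the estimated propensity score. The sketch below does this on simulated data; the single covariate `x`, the quintile strata, and all object names are assumptions for illustration, and a difference in means is only one of many possible balance diagnostics.

```r
# Simulated data with one confounder x (illustrative assumption)
set.seed(2002)
n <- 5000
x <- rnorm(n)
d <- rbinom(n, 1, plogis(x))

# Estimated propensity score from a logistic regression
e_hat <- glm(d ~ x, family = binomial())$fitted.values

# Five strata defined by quintiles of the estimated propensity score
block <- cut(e_hat,
             breaks = quantile(e_hat, probs = seq(0, 1, 0.2)),
             include.lowest = TRUE)

# Imbalance in x before stratifying
raw_diff <- mean(x[d == 1]) - mean(x[d == 0])

# Imbalance in x within each stratum
within_diff <- sapply(split(data.frame(x, d), block),
                      function(s) mean(s$x[s$d == 1]) - mean(s$x[s$d == 0]))

raw_diff
round(within_diff, 3)
```

In this setup the within-stratum differences should be much smaller than the raw difference, though not exactly zero, because each stratum still covers a range of propensity scores.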
Stratifying by the propensity score
• How will we use the propensity score?
  - Matching (next week)
  - Weighting (two weeks)
  - Regression (three weeks)
• Today: coarsening the propensity score and stratifying.
• Choose boundary points $0 = b_0 < b_1 < \ldots < b_{K-1} < b_K = 1$.
• Create block indicators:
\[ B_i(k) = \begin{cases} 1 & \text{if } b_{k-1} < \hat{e}(X_i) < b_k \\ 0 & \text{otherwise} \end{cases} \]
• Calculate within-strata effect estimates:
\[ \tau_k = \mathbb{E}[Y_i \mid D_i = 1, B_i(k) = 1] - \mathbb{E}[Y_i \mid D_i = 0, B_i(k) = 1] \]
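The blocking and within-block estimates can be sketched on a toy data set of eight units. The numbers are invented for illustration. One assumption to flag: R's `cut` builds right-closed intervals $(b_{k-1}, b_k]$ by default, which differs from the strict inequalities above only for units that sit exactly on a boundary.

```r
# Toy data: estimated propensity scores, treatment, and outcome (made-up values)
e_hat <- c(0.10, 0.15, 0.30, 0.35, 0.55, 0.60, 0.80, 0.90)
d     <- c(0,    1,    0,    1,    1,    0,    1,    0)
y     <- c(1,    3,    2,    5,    4,    3,    6,    5)

# Boundary points 0 = b_0 < b_1 < ... < b_K = 1 with K = 4
b <- c(0, 0.25, 0.5, 0.75, 1)

# Block membership: a factor whose k-th level corresponds to B_i(k) = 1
block <- cut(e_hat, breaks = b)

# Within-block difference in means between treated and control units
tau_k <- tapply(seq_along(y), block, function(idx) {
  mean(y[idx][d[idx] == 1]) - mean(y[idx][d[idx] == 0])
})

tau_k
```

Each block here contains exactly one treated and one control unit, so the four estimates are simply 3 - 1, 5 - 2, 4 - 3, and 6 - 5. In practice a block with no treated or no control units would return `NaN`, a sign that the boundary points need to be rethought.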
Standardization/direct adjustment
• We calculated the CATEs for each stratum of the PS, $\tau_k$.
• We can use the law of iterated expectations to back out the ATE.
• Take the average of the CATEs over the distribution of $X$:
\[ \tau = \sum_{k=1}^{K} \tau_k \, \mathbb{P}[B_i(k) = 1] \]
• Note that $\mathbb{P}[B_i(k) = 1]$ is just the proportion of units in block $k$:
\[ \mathbb{P}[B_i(k) = 1] = \frac{\sum_{i=1}^{N} B_i(k)}{N} \]
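Putting the pieces together, the following sketch simulates confounded data with a known treatment effect of 2 and compares the naive difference in means to the standardized estimate. Everything about the simulation is an assumption for illustration, including the choice of $K = 10$ blocks and the use of quantiles of the estimated score as boundary points rather than fixed values.

```r
# Simulated confounded data; the true ATE is 2 by construction
set.seed(2002)
n <- 10000
x <- rnorm(n)
d <- rbinom(n, 1, plogis(x))
y <- 2 * d + 3 * x + rnorm(n)

# Estimated propensity score
e_hat <- glm(d ~ x, family = binomial())$fitted.values

# K blocks with boundaries at quantiles of the estimated score (an assumption)
K <- 10
block <- cut(e_hat,
             breaks = quantile(e_hat, probs = seq(0, 1, length.out = K + 1)),
             include.lowest = TRUE)

# Within-block effect estimates tau_k
tau_k <- sapply(split(data.frame(y, d), block),
                function(s) mean(s$y[s$d == 1]) - mean(s$y[s$d == 0]))

# Block shares P[B_i(k) = 1] = sum_i B_i(k) / N
p_k <- as.numeric(table(block)) / n

# Standardized ATE estimate versus the naive difference in means
ate_strat <- sum(tau_k * p_k)
naive     <- mean(y[d == 1]) - mean(y[d == 0])

c(naive = naive, stratified = ate_strat)
```

The stratified estimate should land much closer to 2 than the naive one, but some bias typically remains because units within a block still differ in their propensity scores. Using more blocks trades that residual bias against noisier within-block estimates.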
5 Wrapping Up
Summary
• Defined observational studies.
• Defined confounding and assessed when no unmeasured confounding holds.
• Saw how no unmeasured confounding helps with OLS.
• Saw how to estimate causal effects under no unmeasured confounding using the propensity score.
Next few weeks
• Learn how to estimate causal effects under no unmeasured confounders via:
  - Matching
  - Weighting
  - Regression
• Then we move on to situations where no unmeasured confounders is violated.