Gov 2002: 4. Observational Studies and Confounding Matthew Blackwell September 10, 2015

Where are we? Where are we going?

• Last two weeks: randomized experiments
• From here on: observational studies
  - What are they?
  - How do they admit the possibility of confounding?
  - How can we adjust for confounding?

1. Observational studies

Experiment review

• An experiment is a study where assignment to treatment is controlled by the researcher.
  - Let $p_i = \mathbb{P}[D_i = 1]$ be the probability of treatment assignment.
  - $p_i$ is controlled and known by the researcher in an experiment.
• A randomized experiment is an experiment with the following properties:
  1. Positivity: assignment is probabilistic, $0 < p_i < 1$.
     - No deterministic assignment.
  2. Unconfoundedness: $\mathbb{P}[D_i = 1 \mid \mathbf{Y}(1), \mathbf{Y}(0)] = \mathbb{P}[D_i = 1]$.
     - Treatment assignment does not depend on any potential outcomes.
     - Sometimes written as $D_i \perp\!\!\!\perp (\mathbf{Y}(1), \mathbf{Y}(0))$.

Observational studies

• Many different sets of identification assumptions that we'll cover.
• To start, focus on studies that are similar to experiments, just without a known and controlled treatment assignment.
  - No guarantee that the treatment and control groups are comparable.
  1. Positivity: assignment is probabilistic, $0 < \mathbb{P}[D_i = 1 \mid \mathbf{X}, \mathbf{Y}(1), \mathbf{Y}(0)] < 1$.
  2. No unmeasured confounding: $\mathbb{P}[D_i = 1 \mid \mathbf{X}, \mathbf{Y}(1), \mathbf{Y}(0)] = \mathbb{P}[D_i = 1 \mid \mathbf{X}]$.
     - For some observed $\mathbf{X}$.
     - Also called: unconfoundedness, ignorability, selection on observables, no omitted variables, exogenous, conditionally exchangeable, etc.

Designing observational studies

• Rubin (2008) argues that we should still "design" our observational studies:
  - Pick the ideal experiment corresponding to this observational study.
  - Hide the outcome data.
  - Try to estimate the randomization procedure.
  - Analyze this as an experiment with this estimated procedure.
• Tries to minimize "snooping" by picking the best modeling strategy before seeing the outcome.

Discrete covariates

• Suppose that we knew that $D_i$ was unconfounded within levels of a binary $X_i$.
• Then we could always estimate the causal effect using iterated expectations, as in a stratified randomized experiment:

$$
\begin{aligned}
\mathbb{E}_X\big\{\mathbb{E}[Y_i \mid D_i = 1, X_i] - \mathbb{E}[Y_i \mid D_i = 0, X_i]\big\}
&= \underbrace{\big(\mathbb{E}[Y_i \mid D_i = 1, X_i = 1] - \mathbb{E}[Y_i \mid D_i = 0, X_i = 1]\big)}_{\text{diff-in-means for } X_i = 1} \, \underbrace{\mathbb{P}[X_i = 1]}_{\text{share of } X_i = 1} \\
&\quad + \underbrace{\big(\mathbb{E}[Y_i \mid D_i = 1, X_i = 0] - \mathbb{E}[Y_i \mid D_i = 0, X_i = 0]\big)}_{\text{diff-in-means for } X_i = 0} \, \underbrace{\mathbb{P}[X_i = 0]}_{\text{share of } X_i = 0}
\end{aligned}
$$

• Never used our knowledge of the randomization for this quantity.
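The stratified estimator can be sketched in a few lines of R. This is only an illustration on simulated data: the data-generating process, the true effect of 2, and the helper `diff_in_means` are assumptions made for the example, not part of the slides.

```r
# Simulated data (hypothetical): binary confounder x, true effect of d is 2.
set.seed(2002)
n <- 10000
x <- rbinom(n, 1, 0.4)
d <- rbinom(n, 1, ifelse(x == 1, 0.7, 0.3))  # treatment more likely when x = 1
y <- 1 + 2 * d + 3 * x + rnorm(n)

# Difference in mean outcomes between treated and control units.
diff_in_means <- function(y, d) mean(y[d == 1]) - mean(y[d == 0])

# Naive comparison ignores x and is confounded.
naive <- diff_in_means(y, d)

# Diff-in-means within each level of x, weighted by the share of each level.
dim1 <- diff_in_means(y[x == 1], d[x == 1])
dim0 <- diff_in_means(y[x == 0], d[x == 0])
stratified <- dim1 * mean(x == 1) + dim0 * mean(x == 0)

c(naive = naive, stratified = stratified)
```

In this simulation the naive estimate should sit well above 2, while the stratified estimate should land close to it, because treatment is unconfounded within levels of `x` by construction.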

Continuous covariates

• So great, we can stratify. Why not do this all the time?
• What if $X_i$ = income for unit $i$?
  - Each unit has its own value of $X_i$: \$54,134, \$123,043, \$23,842.
  - If $X_i = 54134$ is unique, we will only observe one of these:

$$
\mathbb{E}[Y_i \mid D_i = 1, X_i = 54134] - \mathbb{E}[Y_i \mid D_i = 0, X_i = 54134]
$$

  - Cannot stratify on each unique value of $X_i$.
• Practically, this is massively important: we almost always have data with unique values.

Going to a superpopulation

• From here on out, we'll focus less on the finite population model.
  - Harder with (functionally) continuous covariates.
• Assume that each unit $i$ is drawn from an infinite superpopulation.
  - Implies that $(Y_i(0), Y_i(1), D_i, X_i)$ are a draw from their population joint distribution.
• Potential outcomes are now typical random variables:
  - $\mu_c(x) = \mathbb{E}[Y_i(0) \mid X_i = x]$ and $\mu_t(x) = \mathbb{E}[Y_i(1) \mid X_i = x]$
  - $\sigma^2_c(x) = \mathbb{V}[Y_i(0) \mid X_i = x]$ and $\sigma^2_t(x) = \mathbb{V}[Y_i(1) \mid X_i = x]$
  - $\tau = \mathbb{E}[\mu_t(x) - \mu_c(x) \mid X_i = x]$

Assumptions in the superpopulation

• With an infinite superpopulation, worry less about conditioning on the entire sample.
  - Units are now independent due to random sampling from an infinite population.
• No unmeasured confounding implies that:

$$
\mathbb{P}(D_i = 1 \mid Y_i(0), Y_i(1), X_i) = \mathbb{P}(D_i = 1 \mid X_i)
$$

• Or, written using conditional independence:

$$
D_i \perp\!\!\!\perp \big(Y_i(0), Y_i(1)\big) \mid X_i
$$

• Positivity can be written: $0 < \mathbb{P}(D_i = 1 \mid X_i = x) < 1$ for all $x$ in the support of $X_i$.

2. Confounding

What is confounding?

• Confounding is the bias caused by common causes of the treatment and outcome.
  - Leads to "spurious correlation."
• In observational studies, the goal is to avoid confounding inherent in the data.
• Pervasive in the social sciences:
  - effect of income on voting (confounding: age)
  - effect of job training program on employment (confounding: motivation)
  - effect of political institutions on economic development (confounding: previous economic development)
• No unmeasured confounding assumes that we've measured all sources of confounding.

Big problem

• How can we determine if no unmeasured confounding holds if we didn't assign the treatment?
• Put differently:
  - What covariates do we need to condition on?
  - What covariates do we need to match on?
  - What covariates do we need to include in our regressions?
• One way, from the assumption itself:
  - $\mathbb{P}[D_i = 1 \mid \mathbf{X}, \mathbf{Y}(1), \mathbf{Y}(0)] = \mathbb{P}[D_i = 1 \mid \mathbf{X}]$
  - Include covariates such that, conditional on them, the treatment assignment does not depend on the potential outcomes.
• Another way: use DAGs and look at backdoor paths.

Backdoor paths and blocking paths

• A backdoor path is a non-causal path from $D$ to $Y$.
  - Would remain if we removed any arrows pointing out of $D$.
• Backdoor paths between $D$ and $Y$ indicate common causes of $D$ and $Y$.

[Figure: DAG with nodes $D$, $X$, and $Y$]

• Here there is a backdoor path $D \leftarrow X \rightarrow Y$, where $X$ is a common cause for the treatment and the outcome.

Other types of confounding

[Figure: DAG with nodes $D$, $U$, $X$, and $Y$]

• $D$ is enrolling in a job training program.
• $Y$ is getting a job.
• $U$ is being motivated.
• $X$ is number of job applications sent out.
• Big assumption here: no arrow from $U$ to $Y$.

Other types of confounding

[Figure: DAG with nodes $D$, $U$, $X$, and $Y$]

• $D$ is exercise.
• $Y$ is having a disease.
• $U$ is lifestyle.
• $X$ is smoking.
• Big assumption here: no arrow from $U$ to $Y$.

What's the problem with backdoor paths?

[Figure: DAG with nodes $D$, $U$, $X$, and $Y$]

• A path is blocked if:
  1. we control for or stratify a non-collider on that path, OR
  2. we do not control for a collider.
• Unblocked backdoor paths lead to confounding.
• In the DAG here, if we condition on $X$, then the backdoor path is blocked.
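A small simulation can make the blocking rule concrete. The sketch assumes the structure $D \leftarrow U \rightarrow X \rightarrow Y$ with no arrow from $U$ to $Y$; the coefficients, the true effect of 1, and the use of linear regression to "control for" $X$ are all illustrative assumptions.

```r
# Hypothetical data following D <- U -> X -> Y, plus a true effect of D on Y of 1.
set.seed(4)
n <- 20000
u <- rnorm(n)                  # unmeasured common cause
x <- u + rnorm(n)              # U -> X
d <- rbinom(n, 1, plogis(u))   # U -> D
y <- 1 * d + 2 * x + rnorm(n)  # D -> Y and X -> Y, no direct U -> Y arrow

# Backdoor path left open: the coefficient on d absorbs the effect of x.
unadjusted <- coef(lm(y ~ d))[["d"]]

# Conditioning on the non-collider x blocks the backdoor path.
adjusted <- coef(lm(y ~ d + x))[["d"]]

c(unadjusted = unadjusted, adjusted = adjusted)
```

Under these assumptions the adjusted coefficient should be close to 1 even though `u` is never used in the regression, since `x` lies on the only backdoor path.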

Not all backdoor paths

[Figure: DAG with nodes $D$, $U_1$, $X$, and $Y$]

• Conditioning on the posttreatment covariates opens the non-causal path.
  - Selection bias.

M-bias

[Figure: DAG with nodes $D$, $U_1$, $U_2$, $X$, and $Y$, where $X$ is a collider on the backdoor path]

• Not all backdoor paths induce confounding.
• This backdoor path is blocked by the collider $X_i$ that we don't control for.
• If we control for $X_i$, this opens the path and induces confounding.
  - Sometimes called M-bias.
• Controversial because of differing views on what to control for:
  - Rubin thinks that M-bias is a "mathematical curiosity" and we should control for all pretreatment variables.
  - Pearl and others think M-bias is a real threat.
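M-bias is easy to reproduce in a simulation. The sketch assumes the usual M-structure $D \leftarrow U_1 \rightarrow X \leftarrow U_2 \rightarrow Y$ with a true effect of zero, and uses a continuous treatment so that the bias can be worked out by hand; both choices are simplifying assumptions, not something taken from the slides.

```r
# Hypothetical M-structure: D <- U1 -> X <- U2 -> Y, true effect of D on Y is 0.
set.seed(7)
n <- 50000
u1 <- rnorm(n)
u2 <- rnorm(n)
x <- u1 + u2 + rnorm(n)  # collider: caused by both u1 and u2
d <- u1 + rnorm(n)       # continuous treatment (simplifying assumption)
y <- u2 + rnorm(n)       # outcome does not depend on d

# The collider blocks the backdoor path, so no adjustment is needed.
unadjusted <- coef(lm(y ~ d))[["d"]]

# Controlling for the collider opens the path and induces bias.
adjusted <- coef(lm(y ~ d + x))[["d"]]

c(unadjusted = unadjusted, adjusted = adjusted)
```

With these unit variances the population coefficient on `d` after adjusting for `x` works out to $-1/5$, so the adjusted estimate should be near $-0.2$ while the unadjusted one stays near zero.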

Backdoor criterion

• Can we use a DAG to evaluate no unmeasured confounders?
• Pearl answered yes, with the backdoor criterion, which states that the effect of $D$ on $Y$ is identified if:
  1. No backdoor paths from $D$ to $Y$, OR
  2. Measured covariates are sufficient to block all backdoor paths from $D$ to $Y$.
• First is really only valid for randomized experiments.
• The backdoor criterion is fairly powerful. Tells us:
  - if there is confounding given this DAG,
  - if it is possible to remove the confounding, and
  - what variables to condition on to eliminate the confounding.

SWIGs

[Figure: SWIG with the $D$ node split into $D \mid d$, the potential outcome $Y(d)$, and nodes $U$ and $X$]

• It's a little hard to see how the backdoor criterion implies no unmeasured confounders.
  - No potential outcomes on this graph!
• Richardson and Robins: Single World Intervention Graphs (SWIGs)
  - Split the $D$ node into natural value ($D$) and intervention value $d$.
  - Let all effects of $D$ take their potential value under intervention: $Y(d)$.
• Now can see: are $D$ and $Y(d)$ related?
  - $D \leftarrow U \rightarrow X \rightarrow Y(d)$ implies not independent.
  - Conditioning on $X$ blocks that backdoor path: $D \perp\!\!\!\perp Y(d) \mid X$.

No unmeasured confounders is not testable

• No unmeasured confounding places no restrictions on the observed data:

$$
\underbrace{\big(Y_i(0) \mid D_i = 1, X_i\big)}_{\text{unobserved}} \overset{d}{=} \underbrace{\big(Y_i(0) \mid D_i = 0, X_i\big)}_{\text{observed}}
$$

• Here $\overset{d}{=}$ means equal in distribution.
• No way to directly test this assumption without the counterfactual data, which is missing by definition!
• With the backdoor criterion, you must have the correct DAG.

Assessing no unmeasured confounders

• Can do "placebo" tests, where $D_i$ cannot have an effect (lagged outcomes, etc.).
• DellaVigna and Kaplan (2007, QJE): effect of Fox News availability on Republican vote share.
  - Availability in 2000/2003 can't affect past vote shares.
• Unconfoundedness could still be violated even if you pass this test.
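A placebo test of this kind might look like the following sketch. The data are simulated so that treatment depends only on an observed covariate, and the variable names (`y_lag`, `x`, `d`) are hypothetical; with real data, `y_lag` would be an outcome measured before treatment.

```r
# Hypothetical data: y_lag is measured before treatment, so d cannot affect it.
set.seed(11)
n <- 5000
x <- rnorm(n)
y_lag <- x + rnorm(n)         # lagged outcome depends on x only
d <- rbinom(n, 1, plogis(x))  # treatment depends only on the observed x

# Regress the placebo outcome on treatment, controlling for x.
placebo <- summary(lm(y_lag ~ d + x))$coefficients["d", ]
placebo[c("Estimate", "Std. Error")]
```

A coefficient on `d` that is clearly different from zero would suggest remaining confounding. As the slide notes, an estimate near zero does not prove that unconfoundedness holds for the actual outcome.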

Alternatives to no unmeasured confounding

• Without explicit randomization, we need some way of identifying causal effects.
• No unmeasured confounders ≈ randomized experiment.
  - Identification results very similar to experiments.
• With unmeasured confounding, are we doomed? Maybe not!
• Other approaches rely on finding plausibly exogenous variation in assignment of $D_i$:
  - Instrumental variables (randomization + exclusion restriction)
  - Over-time variation (diff-in-diff, fixed effects)
  - Arbitrary thresholds for treatment assignment (RDD)

3. No unmeasured confounders and OLS

Justifying regression

• We know how randomized experiments imply that differences-in-means identify the ATE.
• In the next few weeks, we'll work through how no unmeasured confounding justifies a number of estimation strategies.
• Today, it's useful to walk through what no unmeasured confounding can buy us in a familiar setting: OLS.
  - We'll cover regression more formally later.

Constant effects set up

• Assume a constant effects setup:

$$
Y_i(0) = \alpha + X_i'\beta + u_i
$$

$$
Y_i(1) = \alpha + \tau + X_i'\beta + u_i
$$

• Constant effects because $Y_i(1) - Y_i(0) = \tau$ for all units.
• Use consistency to get the usual regression formula:

$$
\begin{aligned}
Y_i &= Y_i(1) D_i + Y_i(0)(1 - D_i) \\
&= Y_i(0) + \big(Y_i(1) - Y_i(0)\big) \cdot D_i \\
&= \alpha + \tau \cdot D_i + X_i'\beta + u_i
\end{aligned}
$$

• Does no unmeasured confounding help us identify the causal parameter $\tau$?

Regression on residuals

• First, estimate the residuals of the regression of the treatment and outcome on the covariates:

$$
\tilde{Y}_i = Y_i - \mathbb{E}[Y_i \mid X_i], \qquad \tilde{D}_i = D_i - \mathbb{E}[D_i \mid X_i]
$$

• Running a regression of $\tilde{Y}_i$ on $\tilde{D}_i$ is equivalent to controlling for $X_i$:

$$
Y_i = \alpha + \tau \cdot D_i + X_i'\beta + u_i
$$

$$
\tilde{Y}_i = \alpha + \tau \cdot \tilde{D}_i + \tilde{u}_i
$$

• Here $\tilde{u}_i = u_i - \mathbb{E}[u_i \mid X_i]$.
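The equivalence between regressing residuals on residuals and controlling for $X_i$ can be checked numerically. In this sketch, linear projections from `lm` stand in for the conditional expectations $\mathbb{E}[\cdot \mid X_i]$, which is an assumption that holds exactly only when those expectations are linear; the simulated data and coefficient values are hypothetical.

```r
# Hypothetical data with one covariate x and a true effect of d equal to 2.
set.seed(3)
n <- 1000
x <- rnorm(n)
d <- rbinom(n, 1, plogis(x))
y <- 1 + 2 * d + 0.5 * x + rnorm(n)

# Residualize the outcome and the treatment on the covariate
# (linear projections stand in for the conditional expectations).
y_res <- resid(lm(y ~ x))
d_res <- resid(lm(d ~ x))

# Coefficient from the residual-on-residual regression ...
tau_resid <- coef(lm(y_res ~ d_res))[["d_res"]]

# ... matches the coefficient on d when controlling for x directly.
tau_full <- coef(lm(y ~ d + x))[["d"]]

all.equal(tau_resid, tau_full)
```

The two coefficients agree up to floating-point error, which is the Frisch-Waugh-Lovell result for OLS.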

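What does OLS estimate?

• Using the usual OLS theory, we can show that the probability limit of the OLS estimator of $\tau$ is:

$$
\begin{aligned}
\operatorname{plim} \hat{\tau}_{\text{OLS}} &= \frac{\operatorname{Cov}(\tilde{D}_i, \tilde{Y}_i)}{\operatorname{Var}(\tilde{D}_i)} \\
&= \frac{\operatorname{Cov}(\tilde{D}_i, \alpha + \tau \cdot \tilde{D}_i + \tilde{u}_i)}{\operatorname{Var}(\tilde{D}_i)} \\
&= \frac{\tau \cdot \operatorname{Cov}(\tilde{D}_i, \tilde{D}_i) + \operatorname{Cov}(\tilde{D}_i, \tilde{u}_i)}{\operatorname{Var}(\tilde{D}_i)} \\
&= \tau + \frac{\operatorname{Cov}(\tilde{D}_i, \tilde{u}_i)}{\operatorname{Var}(\tilde{D}_i)}
\end{aligned}
$$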

Key OLS assumption

$$
\operatorname{plim} \hat{\tau}_{\text{OLS}} = \tau + \frac{\operatorname{Cov}(\tilde{D}_i, \tilde{u}_i)}{\operatorname{Var}(\tilde{D}_i)}
$$

• Key identification comes from $\operatorname{Cov}(\tilde{D}_i, \tilde{u}_i) = 0$.
  - Conditional on $X_i$, no relationship between $D_i$ and $u_i$.
• Note: $u_i$ is a function of $X_i$ and $Y_i(d)$:
  - $u_i = Y_i(0) - \alpha - X_i'\beta$ when $D_i = 0$
  - $u_i = Y_i(1) - \alpha - \tau - X_i'\beta$ when $D_i = 1$
  - Conditional on $X_i$, the only variation in $u_i$ comes from $Y_i(d)$.
• No unmeasured confounding implies this assumption:

$$
D_i \perp\!\!\!\perp \big(Y_i(1), Y_i(0)\big) \mid X_i \implies D_i \perp\!\!\!\perp u_i \mid X_i \implies \operatorname{Cov}(\tilde{D}_i, \tilde{u}_i) = 0
$$

Omitted variable bias

• What happens when this is violated? Suppose that there is one omitted variable $L_i$ (residualized from $X_i$ as $\tilde{L}_i$):

$$
\tilde{u}_i = \lambda \tilde{L}_i + \omega_i
$$

• We'll assume that if we could measure $L_i$, then no unmeasured confounding would hold.
• Leads to inconsistency in the OLS estimator:

$$
\operatorname{plim} \hat{\tau}_{\text{OLS}} = \tau + \lambda \frac{\operatorname{Cov}(\tilde{D}_i, \tilde{L}_i)}{\operatorname{Var}(\tilde{D}_i)}
$$

• Bias here is two terms multiplied together:
  1. the coefficient on $L_i$ ($\lambda$)
  2. the coefficient on $D_i$ from a regression of $L_i$ on $D_i$, also controlling for $X_i$
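The omitted variable bias decomposition also holds exactly in sample for OLS, which the sketch below checks. The simulated data, the names `l`, `short`, `long`, and `delta`, and the parameter values are hypothetical choices for illustration.

```r
# Hypothetical data: l is a confounder that the "short" regression omits.
set.seed(5)
n <- 5000
x <- rnorm(n)
l <- 0.5 * x + rnorm(n)
d <- rbinom(n, 1, plogis(x + l))
y <- 1 + 2 * d + 0.5 * x + 1.5 * l + rnorm(n)  # tau = 2, lambda = 1.5

# Short regression omits l; long regression includes it.
short <- coef(lm(y ~ d + x))[["d"]]
long <- coef(lm(y ~ d + x + l))

# Coefficient on d from regressing l on d, also controlling for x.
delta <- coef(lm(l ~ d + x))[["d"]]

# Short coefficient = long coefficient + (coefficient on l) * delta.
c(short = short, decomposition = long[["d"]] + long[["l"]] * delta)
```

The two printed numbers coincide, so the gap between the short and long estimates is the product of the coefficient on `l` and `delta`, mirroring the two bias terms listed above.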

4. Estimating causal effects under no unmeasured confounders

Basic approach to estimation

• Remember the usual approach to estimating the ATE with covariates.
• Stratification:
  - Stratify the units by the covariates.
  - Calculate the CATE within these strata.
• Standardization/direct adjustment:
  - Average the CATEs across the strata to get the ATE.
• How to create strata when $X$ has continuous components?
  - If $X$ is discrete with only a few levels, can use the exact values of $X$.
  - Otherwise, we may have to subclassify/coarsen the data.

Classic example: cigars/pipes versus cigarettes

• $D_i = 1$ for pipe/cigar smokers, $D_i = 0$ for cigarette smokers.
• $Y_i$ = death in the first year of follow-up.
• Naive positive effect: cigar/pipe smokers more likely to die.
  - What's the confounder here? Age!
  - Pipe/cigar smokers much older than cigarette smokers.
• Cochran's approach: stratify based on coarsened age.
  - Divide age into $k$ strata: $S_i \in \{s_1, s_2, \ldots, s_k\}$.
  - $s_1$ might be 18-25, $s_2$ might be 26-35, and so on.
  - Calculate effect within strata and aggregate.
• Key assumption: no unmeasured confounders using the stratified version of age:

$$
D_i \perp\!\!\!\perp \big(Y_i(0), Y_i(1)\big) \mid S_i
$$
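Coarsening and stratifying on age can be sketched with `cut()` in R. The data here are simulated rather than Cochran's, with no true effect of smoking type on death; the age range, break points, and coefficients are all assumptions made for the illustration.

```r
# Hypothetical data: older people are more likely to smoke pipes/cigars (d = 1),
# and death (y) depends on age only, so the true effect of d is zero.
set.seed(1968)
n <- 20000
age <- runif(n, 18, 80)
d <- rbinom(n, 1, plogis((age - 50) / 10))
y <- rbinom(n, 1, plogis(-4 + 0.05 * (age - 18)))

# Naive comparison is confounded by age.
naive <- mean(y[d == 1]) - mean(y[d == 0])

# Coarsen age into strata s_1, ..., s_k.
s <- cut(age, breaks = c(18, 25, 35, 45, 55, 65, 80), include.lowest = TRUE)

# Effect within each stratum, then aggregate using stratum shares.
tau_k <- sapply(split(data.frame(y, d), s),
                function(g) mean(g$y[g$d == 1]) - mean(g$y[g$d == 0]))
w_k <- as.numeric(table(s)) / n
stratified <- sum(tau_k * w_k)

c(naive = naive, stratified = stratified)
```

The stratified estimate should be much closer to zero than the naive one. Some confounding remains within strata because age still varies inside each bin, which is the cost of coarsening.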

Stratification on the propensity score

• What about when $X$ has many dimensions?
• Curse of dimensionality: there will be very few, if any, units in a given stratum of $X_i$.
• Stratify on a low-dimensional summary, the propensity score:

$$
e(x) = \mathbb{P}[D_i = 1 \mid X_i = x]
$$

  - PS = unit's probability of being treated, conditional on $X_i$.
• For a particular unit, this is $e(X_i) = \mathbb{P}[D_i = 1 \mid X_i]$.
• Rosenbaum and Rubin (1983) showed that:

$$
D_i \perp\!\!\!\perp \big(Y_i(0), Y_i(1)\big) \mid X_i \implies D_i \perp\!\!\!\perp \big(Y_i(0), Y_i(1)\big) \mid e(X_i)
$$

  - Stratifying on $e_i$ is the same as stratifying on the full $X_i$.

Propensity score as balancing score

• The propensity score is actually a balancing score, which means that:

$$
D_i \perp\!\!\!\perp X_i \mid e(X_i)
$$

• Conditional on the propensity score, treatment is independent of the covariates.
  - Treatment status is said to be balanced: $f(X_i \mid D_i = 1, e(X_i)) = f(X_i \mid D_i = 0, e(X_i))$
• Of course, we have to know the true PS to have all these results work.

Estimating the propensity score

• Of course, in observational studies we don't know the propensity score.
• We would run a parametric model with parameters $\gamma$ to estimate the propensity scores:
  1. Estimate $\hat{\gamma}$.
  2. Create $\hat{e}_i = \Pr[D_i = 1 \mid X_i; \hat{\gamma}]$.
• For instance, in R we could easily calculate the propensity scores using the glm function:

    # Logistic regression of treatment on covariates; fitted values are the estimated PS
    pscores <- glm(treat ~ var1 + var2 + var3, data = mydata,
                   family = binomial())$fitted.values

Propensity score specifics

• What variables do we include in the propensity score model?
  - Any set of variables that blocks all the backdoor paths from $D_i$ to $Y_i$.
• Check balance within strata of $\hat{e}_i$. Covariates should be balanced:

$$
f(X_i \mid D_i = 1, \hat{e}_i) = f(X_i \mid D_i = 0, \hat{e}_i)
$$

• Can also use automated/nonparametric tools for estimating $\hat{e}_i$.
  - Covariate Balancing Propensity Scores (Imai and Ratkovic)

Stratifying by the propensity score

• How will we use the propensity score?
  - Matching (next week)
  - Weighting (two weeks)
  - Regression (three weeks)
• Today: coarsening the propensity score and stratifying.
• Choose boundary points: $0 = b_0 < b_1 < \ldots < b_{K-1} < b_K = 1$.
• Create block indicators:

$$
B_i(k) = \begin{cases} 1 & \text{if } b_{k-1} < \hat{e}(X_i) < b_k \\ 0 & \text{otherwise} \end{cases}
$$

• Calculate within-strata effect estimates:

$$
\tau_k = \mathbb{E}[Y_i \mid D_i = 1, B_i(k) = 1] - \mathbb{E}[Y_i \mid D_i = 0, B_i(k) = 1]
$$

Standardization/direct adjustment

• We calculated the CATEs for each stratum of the PS, $\tau_k$.
• We can use the law of iterated expectations to back out the ATE.
• Take the average of the CATEs over the distribution of $X$:

$$
\tau = \sum_{k=1}^{K} \tau_k \, \mathbb{P}[B_i(k) = 1]
$$

• Note that $\mathbb{P}[B_i(k) = 1]$ is just the proportion of units in block $k$:

$$
\mathbb{P}[B_i(k) = 1] = \frac{\sum_{i=1}^{N} B_i(k)}{N}
$$
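Putting the pieces together, a rough end-to-end sketch of propensity score stratification might look as follows. The data frame `mydata` and its columns mirror the names in the `glm` example but are simulated here; the choice of $K = 10$ blocks and of sample quantiles as boundary points are assumptions for the illustration, not recommendations from the slides.

```r
# Hypothetical data: three covariates, true treatment effect of 2.
set.seed(1983)
n <- 20000
mydata <- data.frame(var1 = rnorm(n), var2 = rnorm(n), var3 = rbinom(n, 1, 0.5))
lin <- 0.8 * mydata$var1 - 0.5 * mydata$var2 + 0.4 * mydata$var3
mydata$treat <- rbinom(n, 1, plogis(lin))
mydata$y <- 1 + 2 * mydata$treat + 1.5 * lin + rnorm(n)

# Step 1: estimate the propensity score with a logit model.
pscores <- glm(treat ~ var1 + var2 + var3, data = mydata,
               family = binomial())$fitted.values

# Step 2: coarsen the estimated score into K blocks
# (boundaries at sample quantiles, an assumption for this sketch).
K <- 10
b <- quantile(pscores, probs = seq(0, 1, length.out = K + 1))
block <- cut(pscores, breaks = b, include.lowest = TRUE)

# Step 3: within-block difference in means (tau_k).
tau_k <- sapply(split(mydata, block),
                function(g) mean(g$y[g$treat == 1]) - mean(g$y[g$treat == 0]))

# Step 4: standardize using the share of units in each block.
share_k <- as.numeric(table(block)) / n
tau_hat <- sum(tau_k * share_k)

naive <- mean(mydata$y[mydata$treat == 1]) - mean(mydata$y[mydata$treat == 0])
c(naive = naive, stratified = tau_hat)
```

The stratified estimate should be far closer to 2 than the naive difference in means. A small bias can remain because units within a block still differ somewhat in their propensity scores.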

5. Wrapping Up

Summary

• Defined observational studies.
• Defined confounding and assessed when no unmeasured confounding holds.
• Saw how no unmeasured confounding helps with OLS.
• Saw how to estimate causal effects under no unmeasured confounding using the propensity score.

Next few weeks

• Learn how to estimate causal effects under no unmeasured confounders via:
  - Matching
  - Weighting
  - Regression
• Then we move on to situations where no unmeasured confounders is violated.

  • Observational studies
  • Confounding
  • No unmeasured confounders and OLS
  • Estimating causal effects under no unmeasured confounders
  • Wrapping Up
Page 2: Gov 2002: 4. Observational Studies and Confounding · 2019-07-23 · 2. the coefficient of regression of𝐷𝑖on 𝐿𝑖also controlling for ... Basicapproachtoestimation • Remember

Where are we Where are we going

bull Last two weeks randomized experimentsbull From here on observational studies

What are they How do they admit the possiblity of confounding How can we adjust for confounding

1 Observationalstudies

Experiment review

bull An experiment is a study where assignment to treatment iscontrolled by the researcher

119901119894 = ℙ[119863119894 = 1] be the probability of treatment assignmentprobability

119901119894 is controlled and known by researcher in an experimentbull A randomized experiment is an experiment with the following

properties

1 Positivity assignment is probabilistic 0 lt 119901119894 lt 1 No deterministic assignment

2 Unconfoundedness ℙ[119863119894 = 1|119832(1) 119832(0)] = ℙ[119863119894 = 1] Treatment assignment does not depend on any potential

outcomes Sometimes written as 119863119894 ⟂⟂ (119832(1) 119832(0))

Observational studies

bull Many different sets of identification assumptions that wersquollcover

bull To start focus on studies that are similar to experiments justwithout a known and controlled treatment assignment

No guarantee that the treatment and control groups arecomparable

1 Positivity assignment is probabilistic0 lt ℙ[119863119894 = 1|119831 119832(1) 119832(0)] lt 1

2 No unmeasured confoundingℙ[119863119894 = 1|119831 119832(1) 119832(0)] = ℙ[119863119894 = 1|119831]

For some observed 119831 Also called unconfoundedness ignorability selection on

observables no omitted variables exogenous conditionalexchangeable etc

Designing observational studies

bull Rubin (2008) argues that we should still ldquodesignrdquo ourobservational studies

Pick the ideal experiment to this observational study Hide the outcome data Try to estimate the randomization procedure Analyze this as an experiment with this estimated procedure

bull Tries to minimize ldquosnoopingrdquo by picking the best modelingstrategy before seeing the outcome

Discrete covariates

bull Suppose that we knew that 119863119894 was unconfounded within levelsof a binary 119883119894

bull Then we could always estimate the causal effect using iteratedexpectations as in a stratified randomized experiment

1201241198831114107120124[119884119894|119863119894 = 1119883119894] minus 120124[119884119894|119863119894 = 0119883119894]1114110

= 1114101120124[119884119894|119863119894 = 1119883119894 = 1] minus 120124[119884119894|119863119894 = 0119883119894 = 1]1114104111405911138401113840111384011138401113840111384011138401113840111384011138401113840111384011138401113840111384011138401113840111384011138401113840111384011138401113840111384011138401113840111384011138401113840111384011140601113840111384011138401113840111384011138401113840111384011138401113840111384011138401113840111384011138401113840111384011138401113840111384011138401113840111384011138401113840111384011138401113840111384011138401114061

diff-in-means for 119883119894=1113568

ℙ[119883119894 = 1]11140591113840111384011138401113840111406011138401113840111384011138401114061share of 119883119894=1113568

+ 1114101120124[119884119894|119863119894 = 1119883119894 = 0] minus 120124[119884119894|119863119894 = 0119883119894 = 0]1114104111405911138401113840111384011138401113840111384011138401113840111384011138401113840111384011138401113840111384011138401113840111384011138401113840111384011138401113840111384011138401113840111384011138401113840111384011140601113840111384011138401113840111384011138401113840111384011138401113840111384011138401113840111384011138401113840111384011138401113840111384011138401113840111384011138401113840111384011138401113840111384011138401114061

diff-in-means for 119883119894=1113567

ℙ[119883119894 = 0]11140591113840111384011138401113840111406011138401113840111384011138401114061share of 119883119894=1113567

bull Never used our knowledge of the randomization for thisquantity

Continuous covariates

bull So great we can stratify Why not do this all the timebull What if 119883119894 = income for unit 119894

Each unit has its own value of 119883119894 $54134 $123043 $23842 If 119883119894 = 54134 is unique will only observe 1 of these

120124[119884119894|119863119894 = 1119883119894 = 54134] minus 120124[119884119894|119863119894 = 0119883119894 = 54134]

cannot stratify to each unique value of 119883119894bull Practically this is massively important almost always have

data with unique values

Going to a superpopulation

bull From here on out wersquoll focus less on the finite populationmodel

Harder with (functionally) continuous covariatesbull Assume that each unit 119894 is drawn from an infinite

superpopulation implies that (119884119894(0) 119884119894(1) 119863119894 119883119894) are a draw from their

population joint distributionbull Potential outcomes are now typical random variables

120583119888(119909) = 120124[119884119894(0)|119883119894 = 119909] and 120583119905(119909) = 120124[119884119894(1)|119883119894 = 119909] 1205901113569119888 (119909) = 120141[119884119894(0)|119883119894 = 119909] and 1205901113569119905 (119909) = 120141[119884119894(1)|119883119894 = 119909] 120591 = 120124[120583119905(119909) minus 120583119888(119909)|119883119894 = 119909]

Assumptions in the superpopulation

bull With an infinite superpopulation worry less aboutconditioning on the entire sample

Units are now independent due to random sampling from aninfinite population

bull No unmeasured confoudning implies that

ℙ(119863119894 = 1|119884119894(0) 119884119894(1) 119883119894) = ℙ(119863119894 = 1|119883119894)

bull Or written using conditional independence

119863119894 ⟂⟂ 1114100119884119894(0) 119884119894(1)1114103|119883119894

bull Positivity can be written 0 lt ℙ(119863119894 = 1|119883119894 = 119909) lt 1 for all 119909 inthe support of 119883119894

2 Confounding

What is confounding

bull Confounding is the bias caused by common causes of thetreatment and outcome

Leads to ldquospurious correlationrdquobull In observational studies the goal is to avoid confounding

inherent in the databull Pervasive in the social sciences

effect of income on voting (confounding age) effect of job training program on employment (confounding

motivation) effect of political institutions on economic development

(confounding previous economic development)bull No unmeasured confounding assumes that wersquove measured all

sources of confounding

Big problem

bull How can we determine if no unmeasured confounding holds ifwe didnrsquot assign the treatment

bull Put differently What covariates do we need to condition on What covariates do we need to match on What covaraites do we need to include in our regressions

bull One way from the assumption itself ℙ[119863119894 = 1|119831 119832(1) 119832(0)] = ℙ[119863119894 = 1|119831] Include covariates such that conditional on them the

treatment assignment does not depend on the potentialoutcomes

bull Another way use DAGs and look at back-door paths

Backdoor paths and blocking paths

bull Backdoor path is a non-causal path from 119863 to 119884 Would remain if we removed any arrows pointing out of 119863

bull Backdoor paths between 119863 and 119884 common causes of 119863and 119884

119863

119883

119884

bull Here there is a backdoor path 119863 larr 119883 rarr 119884 where 119883 is acommon cause for the treatment and the outcome

Other types of confounding

119863

119880 119883

119884

bull 119863 is enrolling in a job training programbull 119884 is getting a jobbull 119880 is being motivatedbull 119883 is number of job applications sent outbull Big assumption here no arrow from 119880 to 119884

Other types of confounding

119863

119880 119883

119884

bull 119863 is exercisebull 119884 is having a diseasebull 119880 is lifestylebull 119883 is smokingbull Big assumption here no arrow from 119880 to 119884

Whatrsquos the problem with backdoorpaths

119863

119880 119883

119884

bull A path is blocked if1 we control for or stratify a non-collider on that path OR2 we do not control for a collider

bull Unblocked backdoor paths confoundingbull In the DAG here if we condition on 119883 then the backdoor

path is blocked

Not all backdoor paths

119863

1198801113568119883119883

119884

bull Conditioning on the posttreatment covariates opens thenon-causal path

selection bias

M-bias

119863

1198801113568 1198801113569119883119883

119884

bull Not all backdoor paths induce confoundingbull This backdoor path is blocked by the collider 119883119894 that we donrsquot

control forbull If we control for 119883119894 opens the path and induces

confounding Sometimes called M-bias

bull Controversial because of differing views on what to control for Rubin thinks that M-bias is a ldquomathematical curiosityrdquo and we

should control for all pretreatment variables Pearl and others think M-bias is a real threat

Backdoor criterion

bull Can we use a DAG to evaluate no unmeasured confoundersbull Pearl answered yes with the backdoor criterion which states

that the effect of 119863 on 119884 is identified if1 No backdoor paths from 119863 to 119884 OR2 Measured covariates are sufficient to block all backdoor paths

from 119863 to 119884

bull First is really only valid for randomized experimentsbull The backdoor criterion is fairly powerful Tells us

if there confounding given this DAG if it is possible to removing the confounding and what variables to condition on to eliminate the confounding

SWIGs

119863 | 119889 119884(119889)

119880 119883

119884

bull Itrsquos a little hard to see how the backdoor criterion implies nounmeasured confounders

No potential outcomes on this graphbull Richardson and Robins Single World Intervention Graphs

Split 119863 node into natural value (119863) and intervention value 119889 Let all effects of 119863 take their potential value under

intervention 119884(119889)bull Now can see are 119863 and 119884(119889) related

119863 larr 119880 rarr 119883 rarr 119884(119889) implies not independent Conditioning on 119883 blocks that backdoor path 119863 ⟂⟂ 119884(119889)|119883

No unmeasured confounders is not testable

• No unmeasured confounding places no restrictions on the observed data:

\[
\underbrace{\bigl(Y_i(0) \mid D_i = 1, X_i\bigr)}_{\text{unobserved}}
\;\overset{d}{=}\;
\underbrace{\bigl(Y_i(0) \mid D_i = 0, X_i\bigr)}_{\text{observed}}
\]

• Here \(\overset{d}{=}\) means equal in distribution.
• No way to directly test this assumption without the counterfactual data, which is missing by definition!
• With the backdoor criterion, you must have the correct DAG.

Assessing no unmeasured confounders

• Can do "placebo" tests, where \(D_i\) cannot have an effect (lagged outcomes, etc.).
• Della Vigna and Kaplan (2007, QJE): effect of Fox News availability on Republican vote share.
  - Availability in 2000/2003 can't affect past vote shares.
• Unconfoundedness could still be violated even if you pass this test!

Alternatives to no unmeasured confounding

• Without explicit randomization, we need some way of identifying causal effects.
• No unmeasured confounders ≈ randomized experiment.
  - Identification results are very similar to experiments.
• With unmeasured confounding, are we doomed? Maybe not!
• Other approaches rely on finding plausibly exogenous variation in the assignment of \(D_i\):
  - Instrumental variables (randomization + exclusion restriction)
  - Over-time variation (diff-in-diff, fixed effects)
  - Arbitrary thresholds for treatment assignment (RDD)

3. No unmeasured confounders and OLS

Justifying regression

• We know how randomized experiments imply that differences-in-means identify the ATE.
• In the next few weeks, we'll work through how no unmeasured confounding justifies a number of estimation strategies.
• Today, it's useful to walk through what no unmeasured confounding can buy us in a familiar setting: OLS.
  - We'll cover regression more formally later.

Constant effects set up

• Assume a constant effects setup:

\[
\begin{aligned}
Y_i(0) &= \alpha + X_i'\beta + u_i \\
Y_i(1) &= \alpha + \tau + X_i'\beta + u_i
\end{aligned}
\]

• Constant effects because \(Y_i(1) - Y_i(0) = \tau\) for all units.
• Use consistency to get the usual regression formula:

\[
\begin{aligned}
Y_i &= Y_i(1) D_i + Y_i(0)(1 - D_i) \\
    &= Y_i(0) + \bigl(Y_i(1) - Y_i(0)\bigr) \cdot D_i \\
    &= \alpha + \tau \cdot D_i + X_i'\beta + u_i
\end{aligned}
\]

• Does no unmeasured confounding help us identify the causal parameter \(\tau\)?

Regression on residuals

• First, take the residuals from regressions of the treatment and the outcome on the covariates:

\[
\tilde{Y}_i = Y_i - \mathbb{E}[Y_i \mid X_i], \qquad \tilde{D}_i = D_i - \mathbb{E}[D_i \mid X_i]
\]

• Running a regression of \(\tilde{Y}_i\) on \(\tilde{D}_i\) is equivalent to controlling for \(X_i\):

\[
\begin{aligned}
Y_i &= \alpha + \tau \cdot D_i + X_i'\beta + u_i \\
\tilde{Y}_i &= \tau \cdot \tilde{D}_i + \tilde{u}_i
\end{aligned}
\]

• Here \(\tilde{u}_i = u_i - \mathbb{E}[u_i \mid X_i]\). The terms \(\alpha\) and \(X_i'\beta\) are fixed given \(X_i\), so they drop out when we subtract \(\mathbb{E}[Y_i \mid X_i]\).
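The equivalence between residual-on-residual regression and controlling for \(X_i\) can be checked numerically. The sketch below uses simulated data with hypothetical names and coefficients, and it uses linear regressions on `x` as a sample stand-in for the conditional expectations \(\mathbb{E}[Y_i \mid X_i]\) and \(\mathbb{E}[D_i \mid X_i]\), which is an assumption of the example rather than something the slides specify.

```r
# Hypothetical data: residual-on-residual regression versus controlling for x.
set.seed(1)
n <- 500
x <- rnorm(n)
d <- rbinom(n, 1, plogis(x))
y <- 1 + 2 * d + 0.5 * x + rnorm(n)   # assumed constant effect tau = 2

# Coefficient on d when controlling for x directly
full <- unname(coef(lm(y ~ d + x))["d"])

# Residualize outcome and treatment on x (linear stand-in for E[. | X])
y_tilde <- resid(lm(y ~ x))
d_tilde <- resid(lm(d ~ x))

# Coefficient from regressing residualized outcome on residualized treatment
partial <- unname(coef(lm(y_tilde ~ d_tilde))["d_tilde"])

all.equal(full, partial)
```

With linear residualization the two coefficients agree up to floating-point error in any sample, which is the Frisch-Waugh-Lovell result.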

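What does OLS estimate?

• Using the usual OLS theory, we can show that the probability limit of the OLS estimator of \(\tau\) is:

\[
\begin{aligned}
\operatorname{plim} \hat{\tau}_{OLS}
  &= \frac{\operatorname{Cov}(\tilde{D}_i, \tilde{Y}_i)}{\operatorname{Var}(\tilde{D}_i)} \\
  &= \frac{\operatorname{Cov}(\tilde{D}_i, \tau \cdot \tilde{D}_i + \tilde{u}_i)}{\operatorname{Var}(\tilde{D}_i)} \\
  &= \frac{\tau \cdot \operatorname{Cov}(\tilde{D}_i, \tilde{D}_i) + \operatorname{Cov}(\tilde{D}_i, \tilde{u}_i)}{\operatorname{Var}(\tilde{D}_i)} \\
  &= \tau + \frac{\operatorname{Cov}(\tilde{D}_i, \tilde{u}_i)}{\operatorname{Var}(\tilde{D}_i)}
\end{aligned}
\]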

Key OLS assumption

\[
\operatorname{plim} \hat{\tau}_{OLS} = \tau + \frac{\operatorname{Cov}(\tilde{D}_i, \tilde{u}_i)}{\operatorname{Var}(\tilde{D}_i)}
\]

• Key identification comes from \(\operatorname{Cov}(\tilde{D}_i, \tilde{u}_i) = 0\).
  - Conditional on \(X_i\), no relationship between \(D_i\) and \(u_i\).
• Note: \(u_i\) is a function of \(X_i\) and \(Y_i(d)\):
  - \(u_i = Y_i(0) - \alpha - X_i'\beta\) when \(D_i = 0\)
  - \(u_i = Y_i(1) - \alpha - \tau - X_i'\beta\) when \(D_i = 1\)
  - Conditional on \(X_i\), the only variation in \(u_i\) comes from \(Y_i(d)\).
• No unmeasured confounding implies this assumption:

\[
D_i \perp\!\!\!\perp \bigl(Y_i(1), Y_i(0)\bigr) \mid X_i
\;\Longrightarrow\;
D_i \perp\!\!\!\perp u_i \mid X_i
\;\Longrightarrow\;
\operatorname{Cov}(\tilde{D}_i, \tilde{u}_i) = 0
\]

Omitted variable bias

• What happens when this is violated? Suppose that there is one omitted variable, \(L_i\) (residualized from \(X_i\) as \(\tilde{L}_i\)):

\[
\tilde{u}_i = \lambda \tilde{L}_i + \omega_i
\]

• We'll assume that if we could measure \(L_i\), then no unmeasured confounding would hold.
• This leads to inconsistency in the OLS estimator:

\[
\operatorname{plim} \hat{\tau}_{OLS} = \tau + \lambda \frac{\operatorname{Cov}(\tilde{D}_i, \tilde{L}_i)}{\operatorname{Var}(\tilde{D}_i)}
\]

• The bias here is the product of two terms:
  1. the coefficient on \(L_i\) (\(\lambda\))
  2. the coefficient on \(D_i\) from a regression of \(L_i\) on \(D_i\), also controlling for \(X_i\)
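The omitted variable bias formula has an exact in-sample counterpart, which the following R sketch checks on simulated data. The data-generating process, the variable names (`l` for the omitted \(L_i\)), and all coefficients are hypothetical choices for the illustration.

```r
# Hypothetical data with one omitted confounder l.
set.seed(42)
n <- 1000
x <- rnorm(n)
l <- 0.5 * x + rnorm(n)                       # omitted variable L
d <- rbinom(n, 1, plogis(x + l))              # treatment depends on X and L
y <- 1 + 2 * d + 0.5 * x + 1.5 * l + rnorm(n) # assumed tau = 2, lambda = 1.5

short <- unname(coef(lm(y ~ d + x))["d"])     # regression that omits L
long  <- coef(lm(y ~ d + x + l))              # regression that includes L
delta <- unname(coef(lm(l ~ d + x))["d"])     # L on D, controlling for X

# Sample analog of: plim tau_hat = tau + lambda * Cov(D~, L~) / Var(D~)
implied <- unname(long["d"] + long["l"] * delta)

all.equal(short, implied)
```

The coefficient from the short regression equals the long-regression coefficient plus the estimated \(\lambda\) times the auxiliary coefficient, so the size of the bias depends on both terms being nonzero.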

4. Estimating causal effects under no unmeasured confounders

Basic approach to estimation

• Remember the usual approach to estimating the ATE with covariates:
• Stratification:
  - Stratify the units by the covariates.
  - Calculate the CATE within these strata.
• Standardization/direct adjustment:
  - Average the CATEs across the strata to get the ATE.
• How to create strata when \(X\) has continuous components?
  - If \(X\) is discrete with only a few levels, can use the exact values of \(X\).
  - Otherwise, we may have to subclassify/coarsen the data.

Classic example: cigars/pipes versus cigarettes

• \(D_i = 1\) for pipe/cigar smokers, \(D_i = 0\) for cigarette smokers.
• \(Y_i\) = death in the first year of follow-up.
• Naive positive effect: cigar/pipe smokers more likely to die.
  - What's the confounder here? Age!
  - Pipe/cigar smokers are much older than cigarette smokers.
• Cochran's approach: stratify based on coarsened age.
  - Divide age into \(k\) strata: \(S_i \in \{s_1, s_2, \ldots, s_k\}\).
  - \(s_1\) might be 18-25, \(s_2\) might be 26-35, and so on.
  - Calculate the effect within strata and aggregate.
• Key assumption: no unmeasured confounders using the stratified version of age:

\[
D_i \perp\!\!\!\perp \bigl(Y_i(0), Y_i(1)\bigr) \mid S_i
\]
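A minimal sketch of this kind of age stratification in R follows. The data are simulated, not Cochran's, and the age cutpoints, coefficients, and the assumption that smoking type has no true effect on death are all hypothetical choices made so the confounding by age is easy to see.

```r
# Hypothetical data: older people are more likely to smoke pipes/cigars and to die.
set.seed(7)
n <- 5000
age <- sample(18:80, n, replace = TRUE)
d <- rbinom(n, 1, plogis((age - 50) / 10))   # 1 = pipe/cigar, 0 = cigarette
y <- rbinom(n, 1, plogis(-4 + 0.05 * age))   # death depends on age only

# Naive difference in death rates, confounded by age
naive <- mean(y[d == 1]) - mean(y[d == 0])

# Coarsen age into strata S_i
s <- cut(age, breaks = c(17, 25, 35, 45, 55, 65, 80))

# Difference in means within each stratum
cate <- sapply(levels(s), function(k) {
  mean(y[d == 1 & s == k]) - mean(y[d == 0 & s == k])
})

# Aggregate using the share of units in each stratum
share <- as.numeric(table(s)) / n
stratified <- sum(cate * share)

round(c(naive = naive, stratified = stratified), 3)
```

Coarser strata leave more residual confounding inside each stratum, while finer strata leave fewer treated or control units per stratum, so the cutpoints are a judgment call.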

Stratification on the propensity score

• What about when \(X\) has many dimensions?
• Curse of dimensionality: there will be very few, if any, units in a given stratum of \(X_i\).
• Stratify on a low-dimensional summary, the propensity score:

\[
e(x) = \mathbb{P}[D_i = 1 \mid X_i = x]
\]

  - PS = unit's probability of being treated, conditional on \(X_i\).
• For a particular unit, this is \(e(X_i) = \mathbb{P}[D_i = 1 \mid X_i]\).
• Rosenbaum and Rubin (1983) showed that:

\[
D_i \perp\!\!\!\perp \bigl(Y_i(0), Y_i(1)\bigr) \mid X_i
\;\Longrightarrow\;
D_i \perp\!\!\!\perp \bigl(Y_i(0), Y_i(1)\bigr) \mid e(X_i)
\]

  - Stratifying on \(e_i = e(X_i)\) is the same as stratifying on the full \(X_i\).

Propensity score as balancing score

• The propensity score is actually a balancing score, which means that:

\[
D_i \perp\!\!\!\perp X_i \mid e(X_i)
\]

• Conditional on the propensity score, treatment is independent of the covariates.
  - Treatment status is said to be balanced: \(f(X_i \mid D_i = 1, e(X_i)) = f(X_i \mid D_i = 0, e(X_i))\).
• Of course, we have to know the true PS to have all these results work.

Estimating the propensity score

• Of course, in observational studies we don't know the propensity score.
• We would run a parametric model with parameters \(\gamma\) to estimate the propensity scores:
  1. Estimate \(\hat{\gamma}\).
  2. Create \(\hat{e}_i = \Pr[D_i = 1 \mid X_i; \hat{\gamma}]\).
• For instance, in R, we could easily calculate the propensity scores using the glm function:

    # Logistic regression of treatment on covariates; fitted values are the estimated propensity scores
    pscores <- glm(treat ~ var1 + var2 + var3, data = mydata,
                   family = binomial())$fitted.values

Propensity score specifics

• What variables do we include in the propensity score model?
  - Any set of variables that blocks all the backdoor paths from \(D_i\) to \(Y_i\).
• Check balance within strata of \(\hat{e}_i\). Covariates should be balanced:

\[
f(X_i \mid D_i = 1, \hat{e}_i) = f(X_i \mid D_i = 0, \hat{e}_i)
\]

• Can also use automated/nonparametric tools for estimating \(\hat{e}_i\):
  - Covariate Balancing Propensity Scores (Imai and Ratkovic)
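One simple way to check balance within strata of \(\hat{e}_i\) is to compare covariate means between treated and control units inside each block. The R sketch below does this for a single covariate on simulated data; the names (`x1`, `x2`, `treat`), the coefficients, and the choice of five quantile-based blocks are assumptions for the illustration.

```r
# Hypothetical data: treatment depends on two covariates.
set.seed(11)
n <- 4000
x1 <- rnorm(n)
x2 <- rnorm(n)
treat <- rbinom(n, 1, plogis(0.8 * x1 - 0.5 * x2))

# Estimated propensity scores from a logistic regression
e_hat <- glm(treat ~ x1 + x2, family = binomial())$fitted.values

# Five strata at the quintiles of the estimated propensity score
block <- cut(e_hat, breaks = quantile(e_hat, probs = seq(0, 1, 0.2)),
             include.lowest = TRUE)

# Treated-minus-control gap in the mean of x1, overall and within each block
raw_gap <- mean(x1[treat == 1]) - mean(x1[treat == 0])
block_gap <- sapply(levels(block), function(k) {
  mean(x1[treat == 1 & block == k]) - mean(x1[treat == 0 & block == k])
})

round(c(raw = raw_gap, block_gap), 2)
```

If the within-block gaps stay large, that suggests either too few blocks or a misspecified propensity score model.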

Stratifying by the propensity score

• How will we use the propensity score?
  - Matching (next week)
  - Weighting (two weeks)
  - Regression (three weeks)
• Today: coarsening the propensity score and stratifying.
• Choose boundary points: \(0 = b_0 < b_1 < \ldots < b_{K-1} < b_K = 1\).
• Create block indicators:

\[
B_i(k) =
\begin{cases}
1 & \text{if } b_{k-1} < \hat{e}(X_i) < b_k \\
0 & \text{otherwise}
\end{cases}
\]

• Calculate within-strata effect estimates:

\[
\tau_k = \mathbb{E}[Y_i \mid D_i = 1, B_i(k) = 1] - \mathbb{E}[Y_i \mid D_i = 0, B_i(k) = 1]
\]

Standardization/direct adjustment

• We calculated the CATEs for each stratum of the PS, \(\tau_k\).
• We can use the law of iterated expectations to back out the ATE.
• Take the average of the CATEs over the distribution of \(X\):

\[
\tau = \sum_{k=1}^{K} \tau_k \, \mathbb{P}[B_i(k) = 1]
\]

• Note that \(\mathbb{P}[B_i(k) = 1]\) is just the proportion of units in block \(k\):

\[
\mathbb{P}[B_i(k) = 1] = \frac{\sum_{i=1}^{N} B_i(k)}{N}
\]
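Putting the blocking and standardization steps together, a minimal R sketch might look like the following. The simulated data, the variable names, the constant effect of 2, and the use of ten quantile-based blocks as boundary points are all assumptions for the example, not choices made in the lecture.

```r
# Hypothetical data with a constant treatment effect of 2.
set.seed(123)
n <- 5000
x1 <- rnorm(n)
x2 <- rnorm(n)
treat <- rbinom(n, 1, plogis(0.8 * x1 - 0.5 * x2))
y <- 1 + 2 * treat + x1 - x2 + rnorm(n)
mydata <- data.frame(y, treat, x1, x2)

# Step 1: estimate the propensity score
e_hat <- glm(treat ~ x1 + x2, data = mydata, family = binomial())$fitted.values

# Step 2: choose boundary points (here: deciles of e_hat) and create blocks
K <- 10
b <- quantile(e_hat, probs = seq(0, 1, length.out = K + 1))
block <- cut(e_hat, breaks = b, include.lowest = TRUE)

# Step 3: within-block difference in means, tau_k
tau_k <- sapply(levels(block), function(k) {
  in_k <- block == k
  mean(mydata$y[in_k & mydata$treat == 1]) - mean(mydata$y[in_k & mydata$treat == 0])
})

# Step 4: weight each tau_k by the share of units in block k
p_k <- as.numeric(table(block)) / n
tau_hat <- sum(tau_k * p_k)

naive <- mean(y[treat == 1]) - mean(y[treat == 0])
round(c(naive = naive, stratified = tau_hat), 2)
```

Some confounding remains within blocks because the propensity score still varies inside each one, so the stratified estimate is only approximately unbiased; more blocks reduce that residual bias at the cost of noisier block estimates.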

5. Wrapping Up

Summary

• Defined observational studies.
• Defined confounding and assessed when no unmeasured confounding holds.
• Saw how no unmeasured confounding helps with OLS.
• Saw how to estimate causal effects under no unmeasured confounding using the propensity score.

Next few weeks

• Learn how to estimate causal effects under no unmeasured confounders via:
  - Matching
  - Weighting
  - Regression
• Then we move on to situations where no unmeasured confounders is violated.

  • Observational studies
  • Confounding
  • No unmeasured confounders and OLS
  • Estimating causal effects under no unmeasured confounders
  • Wrapping Up

1. Observational studies

Experiment review

• An experiment is a study where assignment to treatment is controlled by the researcher.
  - Let \(p_i = \mathbb{P}[D_i = 1]\) be the probability of treatment assignment.
  - \(p_i\) is controlled and known by the researcher in an experiment.
• A randomized experiment is an experiment with the following properties:
  1. Positivity: assignment is probabilistic: \(0 < p_i < 1\).
     - No deterministic assignment.
  2. Unconfoundedness: \(\mathbb{P}[D_i = 1 \mid \mathbf{Y}(1), \mathbf{Y}(0)] = \mathbb{P}[D_i = 1]\).
     - Treatment assignment does not depend on any potential outcomes.
     - Sometimes written as \(D_i \perp\!\!\!\perp (\mathbf{Y}(1), \mathbf{Y}(0))\).

Observational studies

• Many different sets of identification assumptions that we'll cover.
• To start, focus on studies that are similar to experiments, just without a known and controlled treatment assignment.
  - No guarantee that the treatment and control groups are comparable.
  1. Positivity: assignment is probabilistic: \(0 < \mathbb{P}[D_i = 1 \mid \mathbf{X}, \mathbf{Y}(1), \mathbf{Y}(0)] < 1\).
  2. No unmeasured confounding: \(\mathbb{P}[D_i = 1 \mid \mathbf{X}, \mathbf{Y}(1), \mathbf{Y}(0)] = \mathbb{P}[D_i = 1 \mid \mathbf{X}]\).
     - For some observed \(\mathbf{X}\).
     - Also called: unconfoundedness, ignorability, selection on observables, no omitted variables, exogenous, conditionally exchangeable, etc.

Designing observational studies

• Rubin (2008) argues that we should still "design" our observational studies:
  - Pick the ideal experiment corresponding to this observational study.
  - Hide the outcome data.
  - Try to estimate the randomization procedure.
  - Analyze this as an experiment with this estimated procedure.
• Tries to minimize "snooping" by picking the best modeling strategy before seeing the outcome.

Discrete covariates

• Suppose that we knew that \(D_i\) was unconfounded within levels of a binary \(X_i\).
• Then we could always estimate the causal effect using iterated expectations, as in a stratified randomized experiment:

\[
\begin{aligned}
&\mathbb{E}_X\bigl\{\mathbb{E}[Y_i \mid D_i = 1, X_i] - \mathbb{E}[Y_i \mid D_i = 0, X_i]\bigr\} \\
&\quad = \underbrace{\bigl(\mathbb{E}[Y_i \mid D_i = 1, X_i = 1] - \mathbb{E}[Y_i \mid D_i = 0, X_i = 1]\bigr)}_{\text{diff-in-means for } X_i = 1}
   \; \underbrace{\mathbb{P}[X_i = 1]}_{\text{share of } X_i = 1} \\
&\qquad + \underbrace{\bigl(\mathbb{E}[Y_i \mid D_i = 1, X_i = 0] - \mathbb{E}[Y_i \mid D_i = 0, X_i = 0]\bigr)}_{\text{diff-in-means for } X_i = 0}
   \; \underbrace{\mathbb{P}[X_i = 0]}_{\text{share of } X_i = 0}
\end{aligned}
\]

• Never used our knowledge of the randomization for this quantity.
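To make the stratified formula concrete, here is a tiny worked example in R with ten made-up units and a binary covariate. The data frame `dat` and the helper `diff_in_means` are hypothetical and exist only for this illustration.

```r
# Ten hypothetical units with a binary covariate x, treatment d, and outcome y.
dat <- data.frame(
  x = c(1, 1, 1, 1, 0, 0, 0, 0, 0, 0),
  d = c(1, 1, 0, 0, 1, 0, 0, 0, 0, 0),
  y = c(5, 7, 2, 4, 2, 1, 1, 1, 1, 1)
)

# Difference in mean outcomes between treated and control units
diff_in_means <- function(df) mean(df$y[df$d == 1]) - mean(df$y[df$d == 0])

tau_x1 <- diff_in_means(dat[dat$x == 1, ])   # (5 + 7)/2 - (2 + 4)/2 = 3
tau_x0 <- diff_in_means(dat[dat$x == 0, ])   # 2 - 1 = 1
p_x1   <- mean(dat$x == 1)                   # 4/10 = 0.4

# Weight each stratum's difference by its share: 3 * 0.4 + 1 * 0.6 = 1.8
stratified <- tau_x1 * p_x1 + tau_x0 * (1 - p_x1)

# Pooled difference in means ignores x: 14/3 - 11/7
naive <- diff_in_means(dat)
```

The pooled difference is larger than the stratified one here because treated units are concentrated in the \(X_i = 1\) stratum, where outcomes are higher for everyone.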

Continuous covariates

• So, great, we can stratify. Why not do this all the time?
• What if \(X_i\) = income for unit \(i\)?
  - Each unit has its own value of \(X_i\): $54,134, $123,043, $23,842.
  - If \(X_i = 54134\) is unique, we will only observe one of these:

\[
\mathbb{E}[Y_i \mid D_i = 1, X_i = 54134] - \mathbb{E}[Y_i \mid D_i = 0, X_i = 54134]
\]

  - Cannot stratify to each unique value of \(X_i\).
• Practically, this is massively important: we almost always have data with unique values.

Going to a superpopulation

• From here on out, we'll focus less on the finite population model.
  - Harder with (functionally) continuous covariates.
• Assume that each unit \(i\) is drawn from an infinite superpopulation.
  - Implies that \((Y_i(0), Y_i(1), D_i, X_i)\) are a draw from their population joint distribution.
• Potential outcomes are now typical random variables:
  - \(\mu_c(x) = \mathbb{E}[Y_i(0) \mid X_i = x]\) and \(\mu_t(x) = \mathbb{E}[Y_i(1) \mid X_i = x]\)
  - \(\sigma^2_c(x) = \mathbb{V}[Y_i(0) \mid X_i = x]\) and \(\sigma^2_t(x) = \mathbb{V}[Y_i(1) \mid X_i = x]\)
  - \(\tau(x) = \mathbb{E}[\mu_t(x) - \mu_c(x) \mid X_i = x] = \mu_t(x) - \mu_c(x)\)

Assumptions in the superpopulation

• With an infinite superpopulation, worry less about conditioning on the entire sample.
  - Units are now independent due to random sampling from an infinite population.
• No unmeasured confounding implies that:

\[
\mathbb{P}(D_i = 1 \mid Y_i(0), Y_i(1), X_i) = \mathbb{P}(D_i = 1 \mid X_i)
\]

• Or, written using conditional independence:

\[
D_i \perp\!\!\!\perp \bigl(Y_i(0), Y_i(1)\bigr) \mid X_i
\]

• Positivity can be written: \(0 < \mathbb{P}(D_i = 1 \mid X_i = x) < 1\) for all \(x\) in the support of \(X_i\).

2. Confounding

What is confounding?

• Confounding is the bias caused by common causes of the treatment and outcome.
  - Leads to "spurious correlation."
• In observational studies, the goal is to avoid confounding inherent in the data.
• Pervasive in the social sciences:
  - effect of income on voting (confounding: age)
  - effect of job training program on employment (confounding: motivation)
  - effect of political institutions on economic development (confounding: previous economic development)
• No unmeasured confounding assumes that we've measured all sources of confounding.

Big problem

• How can we determine if no unmeasured confounding holds if we didn't assign the treatment?
• Put differently:
  - What covariates do we need to condition on?
  - What covariates do we need to match on?
  - What covariates do we need to include in our regressions?
• One way, from the assumption itself:
  - \(\mathbb{P}[D_i = 1 \mid \mathbf{X}, \mathbf{Y}(1), \mathbf{Y}(0)] = \mathbb{P}[D_i = 1 \mid \mathbf{X}]\)
  - Include covariates such that, conditional on them, the treatment assignment does not depend on the potential outcomes.
• Another way: use DAGs and look at back-door paths.

Backdoor paths and blocking paths

• Backdoor path: a non-causal path from \(D\) to \(Y\).
  - Would remain if we removed any arrows pointing out of \(D\).
• Backdoor paths between \(D\) and \(Y\) arise from common causes of \(D\) and \(Y\):

[Figure: DAG with nodes \(D\), \(X\), and \(Y\)]

• Here there is a backdoor path \(D \leftarrow X \rightarrow Y\), where \(X\) is a common cause of the treatment and the outcome.

Other types of confounding

[Figure: DAG with nodes \(D\), \(U\), \(X\), and \(Y\)]

• \(D\) is enrolling in a job training program.
• \(Y\) is getting a job.
• \(U\) is being motivated.
• \(X\) is the number of job applications sent out.
• Big assumption here: no arrow from \(U\) to \(Y\).

Other types of confounding

[Figure: DAG with nodes \(D\), \(U\), \(X\), and \(Y\)]

• \(D\) is exercise.
• \(Y\) is having a disease.
• \(U\) is lifestyle.
• \(X\) is smoking.
• Big assumption here: no arrow from \(U\) to \(Y\).

What's the problem with backdoor paths?

[Figure: DAG with nodes \(D\), \(U\), \(X\), and \(Y\)]

• A path is blocked if:
  1. we control for or stratify on a non-collider on that path, OR
  2. we do not control for a collider on that path.
• Unblocked backdoor paths imply confounding.
• In the DAG here, if we condition on \(X\), then the backdoor path is blocked.

Not all backdoor paths

[Figure: DAG with nodes \(D\), \(U_1\), \(X\), and \(Y\)]

• Conditioning on the posttreatment covariates opens the non-causal path.
  - This leads to selection bias.

M-bias

[Figure: DAG with nodes \(D\), \(U_1\), \(U_2\), \(X\), and \(Y\)]

• Not all backdoor paths induce confounding.
• This backdoor path is blocked by the collider \(X_i\) that we don't control for.
• If we control for \(X_i\), that opens the path and induces confounding.
  - Sometimes called M-bias.
• Controversial because of differing views on what to control for:
  - Rubin thinks that M-bias is a "mathematical curiosity" and we should control for all pretreatment variables.
  - Pearl and others think M-bias is a real threat.
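A small simulation can show what M-bias looks like in practice. The R sketch below assumes the M-shaped structure \(D \leftarrow U_1 \rightarrow X \leftarrow U_2 \rightarrow Y\) with no effect of \(D\) on \(Y\); the variable names, sample size, and coefficients are hypothetical.

```r
# Hypothetical M-structure: D <- U1 -> X <- U2 -> Y, with no effect of D on Y.
set.seed(99)
n <- 20000
u1 <- rnorm(n)                     # unmeasured cause of D and X
u2 <- rnorm(n)                     # unmeasured cause of X and Y
x <- u1 + u2 + rnorm(n)            # pretreatment collider
d <- rbinom(n, 1, plogis(u1))      # treatment depends on U1 only
y <- u2 + rnorm(n)                 # outcome depends on U2 only (true effect of D is 0)

# Collider X left alone: the backdoor path stays blocked
unadjusted <- unname(coef(lm(y ~ d))["d"])

# Controlling for the collider X opens the path D <- U1 -> X <- U2 -> Y
adjusted <- unname(coef(lm(y ~ d + x))["d"])

round(c(unadjusted = unadjusted, adjusted = adjusted), 2)
```

In this setup the unadjusted estimate should be close to zero and the adjusted one should be negative, which is the M-bias pattern; how large it is in real applications is exactly what the Rubin and Pearl camps disagree about.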

Page 4: Gov 2002: 4. Observational Studies and Confounding · 2019-07-23 · 2. the coefficient of regression of𝐷𝑖on 𝐿𝑖also controlling for ... Basicapproachtoestimation • Remember

Experiment review

bull An experiment is a study where assignment to treatment iscontrolled by the researcher

119901119894 = ℙ[119863119894 = 1] be the probability of treatment assignmentprobability

119901119894 is controlled and known by researcher in an experimentbull A randomized experiment is an experiment with the following

properties

1 Positivity assignment is probabilistic 0 lt 119901119894 lt 1 No deterministic assignment

2 Unconfoundedness ℙ[119863119894 = 1|119832(1) 119832(0)] = ℙ[119863119894 = 1] Treatment assignment does not depend on any potential

outcomes Sometimes written as 119863119894 ⟂⟂ (119832(1) 119832(0))

Observational studies

bull Many different sets of identification assumptions that wersquollcover

bull To start focus on studies that are similar to experiments justwithout a known and controlled treatment assignment

No guarantee that the treatment and control groups arecomparable

1 Positivity assignment is probabilistic0 lt ℙ[119863119894 = 1|119831 119832(1) 119832(0)] lt 1

2 No unmeasured confoundingℙ[119863119894 = 1|119831 119832(1) 119832(0)] = ℙ[119863119894 = 1|119831]

For some observed 119831 Also called unconfoundedness ignorability selection on

observables no omitted variables exogenous conditionalexchangeable etc

Designing observational studies

bull Rubin (2008) argues that we should still ldquodesignrdquo ourobservational studies

Pick the ideal experiment to this observational study Hide the outcome data Try to estimate the randomization procedure Analyze this as an experiment with this estimated procedure

bull Tries to minimize ldquosnoopingrdquo by picking the best modelingstrategy before seeing the outcome

Discrete covariates

• Suppose that we knew that $D_i$ was unconfounded within levels of a binary $X_i$.
• Then we could always estimate the causal effect using iterated expectations, as in a stratified randomized experiment:

$$
\begin{aligned}
\mathbb{E}_X\big\{\mathbb{E}[Y_i \mid D_i = 1, X_i] - \mathbb{E}[Y_i \mid D_i = 0, X_i]\big\}
&= \underbrace{\big(\mathbb{E}[Y_i \mid D_i = 1, X_i = 1] - \mathbb{E}[Y_i \mid D_i = 0, X_i = 1]\big)}_{\text{diff-in-means for } X_i = 1} \; \underbrace{\mathbb{P}[X_i = 1]}_{\text{share of } X_i = 1} \\
&\quad + \underbrace{\big(\mathbb{E}[Y_i \mid D_i = 1, X_i = 0] - \mathbb{E}[Y_i \mid D_i = 0, X_i = 0]\big)}_{\text{diff-in-means for } X_i = 0} \; \underbrace{\mathbb{P}[X_i = 0]}_{\text{share of } X_i = 0}
\end{aligned}
$$

• Never used our knowledge of the randomization for this quantity.
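The stratified estimator for a binary covariate can be sketched in R. This is only an illustration on simulated data: the variable names (x, d, y), the sample size, and the data-generating values (including the true effect of 2) are assumptions made for the example, not part of the lecture.

```r
# Simulated data (illustrative assumption): a binary confounder x raises
# both the chance of treatment and the outcome.
set.seed(2002)
n <- 20000
x <- rbinom(n, 1, 0.4)
d <- rbinom(n, 1, ifelse(x == 1, 0.7, 0.2))  # treatment depends only on x
y <- 1 + 2 * d + 3 * x + rnorm(n)            # true effect of d is 2

# Naive difference in means ignores x.
naive <- mean(y[d == 1]) - mean(y[d == 0])

# Difference in means within each level of x.
diff_x1 <- mean(y[d == 1 & x == 1]) - mean(y[d == 0 & x == 1])
diff_x0 <- mean(y[d == 1 & x == 0]) - mean(y[d == 0 & x == 0])

# Weight each within-stratum difference by the share of units in the stratum.
stratified <- diff_x1 * mean(x == 1) + diff_x0 * mean(x == 0)

c(naive = naive, stratified = stratified)
```

In this setup the naive contrast mixes the effect of d with the effect of x, while the weighted within-stratum contrast should land close to the simulated effect.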

Continuous covariates

• So great, we can stratify. Why not do this all the time?
• What if $X_i$ = income for unit $i$?
  - Each unit has its own value of $X_i$: \$54134, \$123043, \$23842.
  - If $X_i = 54134$ is unique, we will only observe one of these:

$$\mathbb{E}[Y_i \mid D_i = 1, X_i = 54134] - \mathbb{E}[Y_i \mid D_i = 0, X_i = 54134]$$

  - Cannot stratify on each unique value of $X_i$.
• Practically, this is massively important: we almost always have data with unique values.
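A quick way to see the problem is to count how many exact strata contain both treated and control units. The sketch below uses simulated incomes; the lognormal income distribution and the coin-flip treatment are assumptions chosen only for illustration.

```r
# Simulated incomes recorded to the cent (illustrative assumption).
set.seed(1)
n <- 1000
income <- round(rlnorm(n, meanlog = 11, sdlog = 0.7), 2)
d <- rbinom(n, 1, 0.5)

# Number of distinct income values: close to one per unit.
n_distinct <- length(unique(income))

# For each exact income value, check whether we see both a treated
# and a control unit; only those strata allow a within-stratum contrast.
has_both <- tapply(d, income, function(g) any(g == 1) && any(g == 0))
share_usable <- mean(has_both)

c(n_distinct = n_distinct, share_usable = share_usable)
```

With essentially every stratum holding a single unit, exact stratification leaves almost nothing to compare, which is what motivates coarsening or modeling.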

Going to a superpopulation

• From here on out, we'll focus less on the finite population model.
  - Harder with (functionally) continuous covariates.
• Assume that each unit $i$ is drawn from an infinite superpopulation.
  - Implies that $(Y_i(0), Y_i(1), D_i, X_i)$ are a draw from their population joint distribution.
• Potential outcomes are now typical random variables:
  - $\mu_c(x) = \mathbb{E}[Y_i(0) \mid X_i = x]$ and $\mu_t(x) = \mathbb{E}[Y_i(1) \mid X_i = x]$
  - $\sigma^2_c(x) = \mathbb{V}[Y_i(0) \mid X_i = x]$ and $\sigma^2_t(x) = \mathbb{V}[Y_i(1) \mid X_i = x]$
  - $\tau = \mathbb{E}[\mu_t(x) - \mu_c(x) \mid X_i = x]$

Assumptions in the superpopulation

• With an infinite superpopulation, worry less about conditioning on the entire sample.
  - Units are now independent due to random sampling from an infinite population.
• No unmeasured confounding implies that:

$$\mathbb{P}(D_i = 1 \mid Y_i(0), Y_i(1), X_i) = \mathbb{P}(D_i = 1 \mid X_i)$$

• Or, written using conditional independence:

$$D_i \perp\!\!\!\perp \big(Y_i(0), Y_i(1)\big) \mid X_i$$

• Positivity can be written: $0 < \mathbb{P}(D_i = 1 \mid X_i = x) < 1$ for all $x$ in the support of $X_i$.
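As a worked step, here is a sketch of how these two assumptions identify the average treatment effect. The chain uses only the law of iterated expectations, no unmeasured confounding, and consistency; positivity is what guarantees that both conditional expectations on the last line are defined for every value of the covariates.

```latex
\begin{aligned}
\mathbb{E}[Y_i(1) - Y_i(0)]
  &= \mathbb{E}_X\big\{\mathbb{E}[Y_i(1) \mid X_i] - \mathbb{E}[Y_i(0) \mid X_i]\big\}
     && \text{(iterated expectations)} \\
  &= \mathbb{E}_X\big\{\mathbb{E}[Y_i(1) \mid D_i = 1, X_i] - \mathbb{E}[Y_i(0) \mid D_i = 0, X_i]\big\}
     && \text{(no unmeasured confounding)} \\
  &= \mathbb{E}_X\big\{\mathbb{E}[Y_i \mid D_i = 1, X_i] - \mathbb{E}[Y_i \mid D_i = 0, X_i]\big\}
     && \text{(consistency)}
\end{aligned}
```

The final line involves only observed quantities, which is why the stratified difference in means recovers the causal effect under these assumptions.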

2. Confounding

What is confounding?

• Confounding is the bias caused by common causes of the treatment and outcome.
  - Leads to "spurious correlation."
• In observational studies, the goal is to avoid confounding inherent in the data.
• Pervasive in the social sciences:
  - effect of income on voting (confounding: age)
  - effect of job training program on employment (confounding: motivation)
  - effect of political institutions on economic development (confounding: previous economic development)
• No unmeasured confounding assumes that we've measured all sources of confounding.

Big problem

• How can we determine if no unmeasured confounding holds if we didn't assign the treatment?
• Put differently:
  - What covariates do we need to condition on?
  - What covariates do we need to match on?
  - What covariates do we need to include in our regressions?
• One way, from the assumption itself:
  - $\mathbb{P}[D_i = 1 \mid \mathbf{X}, \mathbf{Y}(1), \mathbf{Y}(0)] = \mathbb{P}[D_i = 1 \mid \mathbf{X}]$
  - Include covariates such that, conditional on them, the treatment assignment does not depend on the potential outcomes.
• Another way: use DAGs and look at back-door paths.

Backdoor paths and blocking paths

• A backdoor path is a non-causal path from $D$ to $Y$.
  - Would remain if we removed any arrows pointing out of $D$.
• Backdoor paths between $D$ and $Y$ correspond to common causes of $D$ and $Y$:

[Figure: DAG with nodes $D$, $X$, and $Y$]

• Here there is a backdoor path $D \leftarrow X \rightarrow Y$, where $X$ is a common cause for the treatment and the outcome.

Other types of confounding

[Figure: DAG with nodes $D$, $U$, $X$, and $Y$]

• $D$ is enrolling in a job training program.
• $Y$ is getting a job.
• $U$ is being motivated.
• $X$ is number of job applications sent out.
• Big assumption here: no arrow from $U$ to $Y$.

Other types of confounding

[Figure: DAG with nodes $D$, $U$, $X$, and $Y$]

• $D$ is exercise.
• $Y$ is having a disease.
• $U$ is lifestyle.
• $X$ is smoking.
• Big assumption here: no arrow from $U$ to $Y$.

What's the problem with backdoor paths?

[Figure: DAG with nodes $D$, $U$, $X$, and $Y$]

• A path is blocked if:
  1. we control for or stratify a non-collider on that path, OR
  2. we do not control for a collider.
• Unblocked backdoor paths imply confounding.
• In the DAG here, if we condition on $X$, then the backdoor path is blocked.
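Blocking a backdoor path by conditioning can be illustrated with a small simulation. The sketch below assumes the graph D ← U → X → Y plus a direct effect D → Y, with no arrow from U to Y; the coefficients, sample size, and the linear outcome model are illustrative assumptions.

```r
# Simulated data following D <- U -> X -> Y and D -> Y (assumed structure).
set.seed(42)
n <- 50000
u <- rnorm(n)                    # unobserved common cause
x <- u + rnorm(n)                # observed; U -> X
d <- rbinom(n, 1, plogis(u))     # U -> D
y <- 1 * d + 2 * x + rnorm(n)    # D -> Y (effect 1) and X -> Y; no U -> Y

# Without conditioning on x, the backdoor path through u and x is open.
unadjusted <- unname(coef(lm(y ~ d))["d"])

# Conditioning on x (a non-collider on the path) blocks it.
adjusted <- unname(coef(lm(y ~ d + x))["d"])

c(unadjusted = unadjusted, adjusted = adjusted)
```

Note that u never enters the regression: under the assumed graph, controlling for x alone is enough, which is the point of the "no arrow from U to Y" assumption.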

Not all backdoor paths

[Figure: DAG with nodes $D$, $U_1$, $X$, and $Y$]

• Conditioning on the posttreatment covariates opens the non-causal path.
  - selection bias

M-bias

[Figure: DAG with nodes $D$, $U_1$, $U_2$, $X$, and $Y$]

• Not all backdoor paths induce confounding.
• This backdoor path is blocked by the collider $X_i$ that we don't control for.
• If we control for $X_i$, this opens the path and induces confounding.
  - Sometimes called M-bias.
• Controversial because of differing views on what to control for:
  - Rubin thinks that M-bias is a "mathematical curiosity" and we should control for all pretreatment variables.
  - Pearl and others think M-bias is a real threat.
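A simulation can make M-bias concrete. The sketch below assumes the usual M-shaped graph (U1 → D, U1 → X, U2 → X, U2 → Y, plus D → Y); since the figure itself is not reproduced here, that arrow structure, the coefficients, and the variable names are all assumptions for illustration.

```r
# Simulated M-structure (assumed): x is a pretreatment collider.
set.seed(7)
n <- 50000
u1 <- rnorm(n)                      # unobserved cause of d and x
u2 <- rnorm(n)                      # unobserved cause of x and y
x  <- u1 + u2 + rnorm(n)            # collider on D <- U1 -> X <- U2 -> Y
d  <- rbinom(n, 1, plogis(u1))
y  <- 1 * d + 2 * u2 + rnorm(n)     # true effect of d is 1

# Leaving the collider alone keeps the backdoor path blocked.
no_control <- unname(coef(lm(y ~ d))["d"])

# Controlling for the collider opens the path and biases the estimate.
with_x <- unname(coef(lm(y ~ d + x))["d"])

c(no_control = no_control, with_x = with_x)
```

How large the bias is in practice depends on the strength of the unobserved arrows, which is part of why the two camps disagree about how much it matters.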

Backdoor criterion

• Can we use a DAG to evaluate no unmeasured confounders?
• Pearl answered yes, with the backdoor criterion, which states that the effect of $D$ on $Y$ is identified if:
  1. No backdoor paths from $D$ to $Y$, OR
  2. Measured covariates are sufficient to block all backdoor paths from $D$ to $Y$.
• First is really only valid for randomized experiments.
• The backdoor criterion is fairly powerful. Tells us:
  - if there is confounding given this DAG,
  - if it is possible to remove the confounding, and
  - what variables to condition on to eliminate the confounding.

SWIGs

[Figure: SWIG with nodes $D \mid d$, $U$, $X$, and $Y(d)$]

• It's a little hard to see how the backdoor criterion implies no unmeasured confounders.
  - No potential outcomes on this graph!
• Richardson and Robins: Single World Intervention Graphs
  - Split $D$ node into natural value ($D$) and intervention value $d$.
  - Let all effects of $D$ take their potential value under intervention: $Y(d)$.
• Now can see: are $D$ and $Y(d)$ related?
  - $D \leftarrow U \rightarrow X \rightarrow Y(d)$ implies not independent.
  - Conditioning on $X$ blocks that backdoor path, so $D \perp\!\!\!\perp Y(d) \mid X$.

No unmeasured confounders is not testable

• No unmeasured confounding places no restrictions on the observed data:

$$\underbrace{\big(Y_i(0) \mid D_i = 1, X_i\big)}_{\text{unobserved}} \overset{d}{=} \underbrace{\big(Y_i(0) \mid D_i = 0, X_i\big)}_{\text{observed}}$$

• Here $\overset{d}{=}$ means equal in distribution.
• No way to directly test this assumption without the counterfactual data, which is missing by definition!
• With backdoor criterion, you must have the correct DAG.

Assessing no unmeasured confounders

• Can do "placebo" tests, where $D_i$ cannot have an effect (lagged outcomes, etc.).
• Della Vigna and Kaplan (2007, QJE): effect of Fox News availability on Republican vote share.
  - Availability in 2000/2003 can't affect past vote shares.
• Unconfoundedness could still be violated even if you pass this test!
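A placebo test on a lagged outcome might look like the sketch below. The data are simulated with an unmeasured confounder u built in, so the placebo regression should flag a problem; the variable names (y_lag, x, u, d) and all numeric values are assumptions for illustration and have nothing to do with the Fox News data.

```r
# Simulated data (illustrative assumption): u is unmeasured and drives both
# treatment and the lagged outcome, which is recorded before treatment.
set.seed(11)
n <- 20000
x <- rnorm(n)                        # measured covariate
u <- rnorm(n)                        # unmeasured confounder
d <- rbinom(n, 1, plogis(x + u))
y_lag <- x + u + rnorm(n)            # treatment cannot affect this

# Placebo regression: the treatment "effect" on the lagged outcome,
# controlling for x, should be zero if conditioning on x were enough.
placebo <- summary(lm(y_lag ~ d + x))$coefficients["d", ]
placebo[c("Estimate", "Pr(>|t|)")]
```

A clearly nonzero placebo coefficient is evidence against the assumption; a coefficient near zero is reassuring but, as the slide notes, does not prove the assumption holds.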

Alternatives to no unmeasured confounding

• Without explicit randomization, we need some way of identifying causal effects.
• No unmeasured confounders ≈ randomized experiment.
  - Identification results very similar to experiments.
• With unmeasured confounding, are we doomed? Maybe not!
• Other approaches rely on finding plausibly exogenous variation in assignment of $D_i$:
  - Instrumental variables (randomization + exclusion restriction)
  - Over-time variation (diff-in-diff, fixed effects)
  - Arbitrary thresholds for treatment assignment (RDD)

3. No unmeasured confounders and OLS

Justifying regression

• We know how randomized experiments imply that differences-in-means identify the ATE.
• In the next few weeks, we'll work through how no unmeasured confounding justifies a number of estimation strategies.
• Today, it's useful to walk through what no unmeasured confounding can buy us in a familiar setting: OLS.
  - We'll cover regression more formally later.

Constant effects set up

• Assume a constant effects setup:

$$Y_i(0) = \alpha + X_i'\beta + u_i$$

$$Y_i(1) = \alpha + \tau + X_i'\beta + u_i$$

• Constant effects because $Y_i(1) - Y_i(0) = \tau$ for all units.
• Use consistency to get the usual regression formula:

$$
\begin{aligned}
Y_i &= Y_i(1) D_i + Y_i(0)(1 - D_i) \\
    &= Y_i(0) + \big(Y_i(1) - Y_i(0)\big) \cdot D_i \\
    &= \alpha + \tau \cdot D_i + X_i'\beta + u_i
\end{aligned}
$$

• Does no unmeasured confounding help us identify the causal parameter $\tau$?

Regression on residuals

• First, estimate the residuals of regression of the treatment and outcome on the covariates:

$$\tilde{Y}_i = Y_i - \mathbb{E}[Y_i \mid X_i], \qquad \tilde{D}_i = D_i - \mathbb{E}[D_i \mid X_i]$$

• Running a regression of $\tilde{Y}_i$ on $\tilde{D}_i$ is equivalent to controlling for $X_i$:

$$Y_i = \alpha + \tau \cdot D_i + X_i'\beta + u_i$$

$$\tilde{Y}_i = \alpha + \tau \cdot \tilde{D}_i + \tilde{u}_i$$

• Here, $\tilde{u}_i = u_i - \mathbb{E}[u_i \mid X_i]$.
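The equivalence between regressing residuals on residuals and controlling for the covariates can be checked numerically. In the sketch below the residuals come from linear regressions on the covariates (a linear stand-in for the conditional expectations, which is an assumption of the example), and the data, variable names, and coefficients are simulated purely for illustration.

```r
# Simulated data (illustrative assumption).
set.seed(3)
n <- 5000
x1 <- rnorm(n)
x2 <- rnorm(n)
d  <- rbinom(n, 1, plogis(0.5 * x1 - 0.5 * x2))
y  <- 0.5 + 1.5 * d + x1 - 2 * x2 + rnorm(n)

# Coefficient on d when controlling for the covariates directly.
full <- unname(coef(lm(y ~ d + x1 + x2))["d"])

# Residualize outcome and treatment on the covariates, then regress
# residual on residual.
y_tilde <- resid(lm(y ~ x1 + x2))
d_tilde <- resid(lm(d ~ x1 + x2))
partial <- unname(coef(lm(y_tilde ~ d_tilde))["d_tilde"])

all.equal(full, partial)
```

The two coefficients agree up to floating-point error, so everything said about the residual regression carries over to the usual regression with controls.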

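What does OLS estimate?

• Using the usual OLS theory, we can show that the probability limit of the OLS estimator of $\tau$ is:

$$
\begin{aligned}
\operatorname{plim} \hat{\tau}_{OLS}
  &= \frac{\operatorname{Cov}(\tilde{D}_i, \tilde{Y}_i)}{\operatorname{Var}(\tilde{D}_i)} \\
  &= \frac{\operatorname{Cov}(\tilde{D}_i, \alpha + \tau \cdot \tilde{D}_i + \tilde{u}_i)}{\operatorname{Var}(\tilde{D}_i)} \\
  &= \frac{\tau \cdot \operatorname{Cov}(\tilde{D}_i, \tilde{D}_i) + \operatorname{Cov}(\tilde{D}_i, \tilde{u}_i)}{\operatorname{Var}(\tilde{D}_i)} \\
  &= \tau + \frac{\operatorname{Cov}(\tilde{D}_i, \tilde{u}_i)}{\operatorname{Var}(\tilde{D}_i)}
\end{aligned}
$$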

Key OLS assumption

$$\operatorname{plim} \hat{\tau}_{OLS} = \tau + \frac{\operatorname{Cov}(\tilde{D}_i, \tilde{u}_i)}{\operatorname{Var}(\tilde{D}_i)}$$

• Key identification comes from $\operatorname{Cov}(\tilde{D}_i, \tilde{u}_i) = 0$.
  - Conditional on $X_i$, no relationship between $D_i$ and $u_i$.
• Note: $u_i$ is a function of $X_i$ and $Y_i(d)$:
  - $u_i = Y_i(0) - \alpha - X_i'\beta$ when $D_i = 0$
  - $u_i = Y_i(1) - \alpha - \tau - X_i'\beta$ when $D_i = 1$
  - Conditional on $X_i$, the only variation in $u_i$ comes from $Y_i(d)$.
• No unmeasured confounding implies this assumption:

$$D_i \perp\!\!\!\perp \big(Y_i(1), Y_i(0)\big) \mid X_i \;\Longrightarrow\; D_i \perp\!\!\!\perp u_i \mid X_i \;\Longrightarrow\; \operatorname{Cov}(\tilde{D}_i, \tilde{u}_i) = 0$$

Omitted variable bias

• What happens when this is violated? Suppose that there is one omitted variable $\tilde{L}_i$ (residualized from $X_i$):

$$\tilde{u}_i = \lambda \tilde{L}_i + \omega_i$$

• We'll assume that if we could measure $L_i$, then no unmeasured confounding would hold.
• Leads to inconsistency in the OLS estimator:

$$\operatorname{plim} \hat{\tau}_{OLS} = \tau + \lambda \frac{\operatorname{Cov}(\tilde{D}_i, \tilde{L}_i)}{\operatorname{Var}(\tilde{D}_i)}$$

• Bias here is two terms multiplied together:
  1. the coefficient on $L_i$ ($\lambda$)
  2. the coefficient on $D_i$ from a regression of $L_i$ on $D_i$, also controlling for $X_i$
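The omitted variable bias formula can be verified on simulated data. In the sketch below, l plays the role of the omitted variable; the variable names, coefficients, and sample size are assumptions for illustration. The check uses the in-sample version of the formula: the coefficient from the regression that omits l equals the coefficient from the regression that includes it, plus the coefficient on l times the coefficient on d from regressing l on d and x.

```r
# Simulated data (illustrative assumption): l confounds d and y.
set.seed(5)
n <- 50000
x <- rnorm(n)
l <- 0.5 * x + rnorm(n)                     # omitted variable
d <- rbinom(n, 1, plogis(x + l))
y <- 1 * d + x + 2 * l + rnorm(n)           # tau = 1, lambda = 2

# Short regression omits l; long regression includes it.
short <- unname(coef(lm(y ~ d + x))["d"])
long  <- coef(lm(y ~ d + x + l))

# Coefficient on d from regressing l on d and x.
delta <- unname(coef(lm(l ~ d + x))["d"])

# Short coefficient implied by the omitted variable bias formula.
implied <- unname(long["d"] + long["l"] * delta)

c(short = short, implied = implied, long_d = unname(long["d"]))
```

The short-regression coefficient matches the implied value exactly in sample, and it sits well above the simulated effect because both pieces of the bias term are positive here.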

4. Estimating causal effects under no unmeasured confounders

Basic approach to estimation

• Remember the usual approach to estimating the ATE with covariates:
• Stratification:
  - Stratify the units by the covariates.
  - Calculate CATE within these strata.
• Standardization/direct adjustment:
  - Average the CATEs across the strata to get ATE.
• How to create strata when $X$ has continuous components?
  - If $X$ is discrete with only a few levels, can use the exact values of $X$.
  - Otherwise, we may have to subclassify/coarsen the data.

Classic example: cigars/pipes versus cigarettes

• $D_i = 1$ for pipe/cigar smokers, $D_i = 0$ for cigarette smokers.
• $Y_i$ = death in the first year of follow-up.
• Naive positive effect: cigar/pipe smokers more likely to die.
  - What's the confounder here? Age!
  - Pipe/cigar smokers much older than cigarette smokers.
• Cochran's approach: stratify based on coarsened age.
  - Divide age into $k$ strata: $S_i \in \{s_1, s_2, \ldots, s_k\}$.
  - $s_1$ might be 18-25, $s_2$ might be 26-35, and so on.
  - Calculate effect within strata and aggregate.
• Key assumption: no unmeasured confounders using stratified version of age:

$$D_i \perp\!\!\!\perp \big(Y_i(0), Y_i(1)\big) \mid S_i$$
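Stratifying on coarsened age can be sketched in R with `cut()`. The data below are simulated, not Cochran's: the age range, the age cutpoints, and the assumption that smoking type has no effect at all once age is accounted for are made up for illustration.

```r
# Simulated data (illustrative assumption): age drives both smoking type
# and mortality; smoking type itself has no effect on death here.
set.seed(9)
n <- 40000
age <- runif(n, 18, 80)
d <- rbinom(n, 1, plogis((age - 50) / 10))   # older -> more likely pipe/cigar
y <- rbinom(n, 1, plogis(-6 + 0.07 * age))   # death depends on age only

# Naive difference in death rates.
naive <- mean(y[d == 1]) - mean(y[d == 0])

# Coarsen age into strata (cutpoints are hypothetical).
s <- cut(age, breaks = c(18, 25, 35, 45, 55, 65, 80), include.lowest = TRUE)

# Within-stratum differences, then a weighted average by stratum share.
diffs <- sapply(levels(s), function(k) {
  mean(y[d == 1 & s == k]) - mean(y[d == 0 & s == k])
})
shares <- as.numeric(table(s)) / n
stratified <- sum(diffs * shares)

c(naive = naive, stratified = stratified)
```

The stratified estimate is much closer to zero than the naive one, though not exactly zero: within a coarse age band the pipe/cigar smokers are still slightly older, so some confounding survives the coarsening.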

Stratification on the propensity score

• What about when $X$ has many dimensions?
• Curse of dimensionality: there will be very few, if any, units in a given stratum of $X_i$.
• Stratify on a low-dimensional summary, the propensity score:

$$e(x) = \mathbb{P}[D_i = 1 \mid X_i = x]$$

  - PS = unit's probability of being treated, conditional on $X_i$.
• For a particular unit, this is $e(X_i) = \mathbb{P}[D_i = 1 \mid X_i]$.
• Rosenbaum and Rubin (1983) showed that:

$$D_i \perp\!\!\!\perp \big(Y_i(0), Y_i(1)\big) \mid X_i \;\Longrightarrow\; D_i \perp\!\!\!\perp \big(Y_i(0), Y_i(1)\big) \mid e(X_i)$$

  - Stratifying on $e_i$ is the same as stratifying on the full $X_i$.

Propensity score as balancing score

• The propensity score is actually a balancing score, which means that:

$$D_i \perp\!\!\!\perp X_i \mid e(X_i)$$

• Conditional on the propensity score, treatment is independent of the covariates.
  - Treatment status is said to be balanced:

$$f(X_i \mid D_i = 1, e(X_i)) = f(X_i \mid D_i = 0, e(X_i))$$

• Of course, we have to know the true PS to have all these results work.

Estimating the propensity score

• Of course, in observational studies we don't know the propensity score.
• We would run a parametric model with parameters $\gamma$ to estimate the propensity scores:
  1. Estimate $\hat{\gamma}$.
  2. Create $\hat{e}_i = \Pr[D_i = 1 \mid X_i; \hat{\gamma}]$.
• For instance, in R, we could easily calculate the propensity scores using the glm function:

    pscores <- glm(treat ~ var1 + var2 + var3, data = mydata,
                   family = binomial())$fitted.values

Propensity score specifics

• What variables do we include in the propensity score model?
  - Any set of variables that blocks all the backdoor paths from $D_i$ to $Y_i$.
• Check balance within strata of $\hat{e}_i$. Covariates should be balanced:

$$f(X_i \mid D_i = 1, \hat{e}_i) = f(X_i \mid D_i = 0, \hat{e}_i)$$

• Can also use automated/nonparametric tools for estimating $\hat{e}_i$:
  - Covariate Balancing Propensity Scores (Imai and Ratkovic)
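A simple balance check compares covariate means between treated and control units within strata of the estimated propensity score. The sketch below uses one simulated covariate and deciles of the estimated score as strata; the data, the choice of ten strata, and the comparison of means (rather than full distributions) are assumptions made to keep the example short.

```r
# Simulated data (illustrative assumption): one covariate drives treatment.
set.seed(8)
n <- 20000
x <- rnorm(n)
d <- rbinom(n, 1, plogis(x))

# Estimated propensity score from a logistic regression.
e_hat <- glm(d ~ x, family = binomial())$fitted.values

# Raw imbalance: treated units have higher x on average.
raw_gap <- mean(x[d == 1]) - mean(x[d == 0])

# Strata: deciles of the estimated propensity score.
block <- cut(e_hat, breaks = quantile(e_hat, seq(0, 1, 0.1)),
             include.lowest = TRUE, labels = FALSE)

# Treated-minus-control gap in x within each stratum.
gap_k <- sapply(1:10, function(k) {
  mean(x[d == 1 & block == k]) - mean(x[d == 0 & block == k])
})

round(c(raw_gap = raw_gap, max_within = max(abs(gap_k))), 3)
```

The within-stratum gaps are much smaller than the raw gap, with the largest remaining imbalance typically in the extreme strata, where the score varies most.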

Stratifying by the propensity score

• How will we use the propensity score?
  - Matching (next week), Weighting (two weeks), Regression (three weeks)
• Today: coarsening the propensity score and stratifying.
• Choose boundary points: $0 = b_0 < b_1 < \ldots < b_{K-1} < b_K = 1$.
• Create block indicators:

$$
B_i(k) =
\begin{cases}
1 & \text{if } b_{k-1} < \hat{e}(X_i) < b_k \\
0 & \text{otherwise}
\end{cases}
$$

• Calculate within-strata effect estimates:

$$\tau_k = \mathbb{E}[Y_i \mid D_i = 1, B_i(k) = 1] - \mathbb{E}[Y_i \mid D_i = 0, B_i(k) = 1]$$

Standardization/direct adjustment

• We calculated the CATEs for each stratum of the PS, $\tau_k$.
• We can use the law of iterated expectations to back out the ATE.
• Take the average of the CATEs over the distribution of $X$:

$$\tau = \sum_{k=1}^{K} \tau_k \, \mathbb{P}[B_i(k) = 1]$$

• Note that $\mathbb{P}[B_i(k) = 1]$ is just the proportion of units in block $k$:

$$\mathbb{P}[B_i(k) = 1] = \frac{\sum_{i=1}^{N} B_i(k)}{N}$$
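Putting the pieces together, propensity score stratification followed by direct adjustment might be sketched as follows. The data are simulated, and the choices here (two covariates, a logistic model, ten blocks with boundaries at the deciles of the estimated score, a true effect of 2) are assumptions for illustration rather than recommendations.

```r
# Simulated data (illustrative assumption): x1 and x2 confound d and y.
set.seed(2015)
n <- 20000
x1 <- rnorm(n)
x2 <- rnorm(n)
d  <- rbinom(n, 1, plogis(x1 + x2))
y  <- 2 * d + x1 + x2 + rnorm(n)            # true ATE is 2
dat <- data.frame(y, d, x1, x2)

naive <- mean(dat$y[dat$d == 1]) - mean(dat$y[dat$d == 0])

# Step 1: estimate the propensity score.
pscores <- glm(d ~ x1 + x2, data = dat, family = binomial())$fitted.values

# Step 2: coarsen into K blocks (boundaries at quantiles of the score).
K <- 10
b <- quantile(pscores, probs = seq(0, 1, length.out = K + 1))
block <- cut(pscores, breaks = b, include.lowest = TRUE, labels = FALSE)

# Step 3: within-block differences in means.
tau_k <- sapply(1:K, function(k) {
  mean(dat$y[dat$d == 1 & block == k]) - mean(dat$y[dat$d == 0 & block == k])
})

# Step 4: weight by the proportion of units in each block.
share_k <- sapply(1:K, function(k) mean(block == k))
tau_hat <- sum(tau_k * share_k)

c(naive = naive, tau_hat = tau_hat)
```

The stratified estimate removes most of the bias in the naive contrast. A small amount remains because the propensity score still varies within each block, which is one motivation for the matching and weighting estimators covered later.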

5. Wrapping Up

Summary

• Defined observational studies.
• Defined confounding and assessed when no unmeasured confounding holds.
• Saw how no unmeasured confounding helps with OLS.
• Saw how to estimate causal effects under no unmeasured confounding using the propensity score.

Next few weeks

• Learn how to estimate causal effects under no unmeasured confounders via:
  - Matching
  - Weighting
  - Regression
• Then we move on to situations where no unmeasured confounders is violated.

  • Observational studies
  • Confounding
  • No unmeasured confounders and OLS
  • Estimating causal effects under no unmeasured confounders
  • Wrapping Up
Page 5: Gov 2002: 4. Observational Studies and Confounding · 2019-07-23 · 2. the coefficient of regression of𝐷𝑖on 𝐿𝑖also controlling for ... Basicapproachtoestimation • Remember

Observational studies

bull Many different sets of identification assumptions that wersquollcover

bull To start focus on studies that are similar to experiments justwithout a known and controlled treatment assignment

No guarantee that the treatment and control groups arecomparable

1 Positivity assignment is probabilistic0 lt ℙ[119863119894 = 1|119831 119832(1) 119832(0)] lt 1

2 No unmeasured confoundingℙ[119863119894 = 1|119831 119832(1) 119832(0)] = ℙ[119863119894 = 1|119831]

For some observed 119831 Also called unconfoundedness ignorability selection on

observables no omitted variables exogenous conditionalexchangeable etc

Designing observational studies

bull Rubin (2008) argues that we should still ldquodesignrdquo ourobservational studies

Pick the ideal experiment to this observational study Hide the outcome data Try to estimate the randomization procedure Analyze this as an experiment with this estimated procedure

bull Tries to minimize ldquosnoopingrdquo by picking the best modelingstrategy before seeing the outcome

Discrete covariates

bull Suppose that we knew that 119863119894 was unconfounded within levelsof a binary 119883119894

bull Then we could always estimate the causal effect using iteratedexpectations as in a stratified randomized experiment

1201241198831114107120124[119884119894|119863119894 = 1119883119894] minus 120124[119884119894|119863119894 = 0119883119894]1114110

= 1114101120124[119884119894|119863119894 = 1119883119894 = 1] minus 120124[119884119894|119863119894 = 0119883119894 = 1]1114104111405911138401113840111384011138401113840111384011138401113840111384011138401113840111384011138401113840111384011138401113840111384011138401113840111384011138401113840111384011138401113840111384011138401113840111384011140601113840111384011138401113840111384011138401113840111384011138401113840111384011138401113840111384011138401113840111384011138401113840111384011138401113840111384011138401113840111384011138401113840111384011138401114061

diff-in-means for 119883119894=1113568

ℙ[119883119894 = 1]11140591113840111384011138401113840111406011138401113840111384011138401114061share of 119883119894=1113568

+ 1114101120124[119884119894|119863119894 = 1119883119894 = 0] minus 120124[119884119894|119863119894 = 0119883119894 = 0]1114104111405911138401113840111384011138401113840111384011138401113840111384011138401113840111384011138401113840111384011138401113840111384011138401113840111384011138401113840111384011138401113840111384011138401113840111384011140601113840111384011138401113840111384011138401113840111384011138401113840111384011138401113840111384011138401113840111384011138401113840111384011138401113840111384011138401113840111384011138401113840111384011138401114061

diff-in-means for 119883119894=1113567

ℙ[119883119894 = 0]11140591113840111384011138401113840111406011138401113840111384011138401114061share of 119883119894=1113567

bull Never used our knowledge of the randomization for thisquantity

Continuous covariates

bull So great we can stratify Why not do this all the timebull What if 119883119894 = income for unit 119894

Each unit has its own value of 119883119894 $54134 $123043 $23842 If 119883119894 = 54134 is unique will only observe 1 of these

120124[119884119894|119863119894 = 1119883119894 = 54134] minus 120124[119884119894|119863119894 = 0119883119894 = 54134]

cannot stratify to each unique value of 119883119894bull Practically this is massively important almost always have

data with unique values

Going to a superpopulation

bull From here on out wersquoll focus less on the finite populationmodel

Harder with (functionally) continuous covariatesbull Assume that each unit 119894 is drawn from an infinite

superpopulation implies that (119884119894(0) 119884119894(1) 119863119894 119883119894) are a draw from their

population joint distributionbull Potential outcomes are now typical random variables

120583119888(119909) = 120124[119884119894(0)|119883119894 = 119909] and 120583119905(119909) = 120124[119884119894(1)|119883119894 = 119909] 1205901113569119888 (119909) = 120141[119884119894(0)|119883119894 = 119909] and 1205901113569119905 (119909) = 120141[119884119894(1)|119883119894 = 119909] 120591 = 120124[120583119905(119909) minus 120583119888(119909)|119883119894 = 119909]

Assumptions in the superpopulation

bull With an infinite superpopulation worry less aboutconditioning on the entire sample

Units are now independent due to random sampling from aninfinite population

bull No unmeasured confoudning implies that

ℙ(119863119894 = 1|119884119894(0) 119884119894(1) 119883119894) = ℙ(119863119894 = 1|119883119894)

bull Or written using conditional independence

119863119894 ⟂⟂ 1114100119884119894(0) 119884119894(1)1114103|119883119894

bull Positivity can be written 0 lt ℙ(119863119894 = 1|119883119894 = 119909) lt 1 for all 119909 inthe support of 119883119894

2 Confounding

What is confounding

bull Confounding is the bias caused by common causes of thetreatment and outcome

Leads to ldquospurious correlationrdquobull In observational studies the goal is to avoid confounding

inherent in the databull Pervasive in the social sciences

effect of income on voting (confounding age) effect of job training program on employment (confounding

motivation) effect of political institutions on economic development

(confounding previous economic development)bull No unmeasured confounding assumes that wersquove measured all

sources of confounding

Big problem

bull How can we determine if no unmeasured confounding holds ifwe didnrsquot assign the treatment

bull Put differently What covariates do we need to condition on What covariates do we need to match on What covaraites do we need to include in our regressions

bull One way from the assumption itself ℙ[119863119894 = 1|119831 119832(1) 119832(0)] = ℙ[119863119894 = 1|119831] Include covariates such that conditional on them the

treatment assignment does not depend on the potentialoutcomes

bull Another way use DAGs and look at back-door paths

Backdoor paths and blocking paths

bull Backdoor path is a non-causal path from 119863 to 119884 Would remain if we removed any arrows pointing out of 119863

bull Backdoor paths between 119863 and 119884 common causes of 119863and 119884

119863

119883

119884

bull Here there is a backdoor path 119863 larr 119883 rarr 119884 where 119883 is acommon cause for the treatment and the outcome

Other types of confounding

119863

119880 119883

119884

bull 119863 is enrolling in a job training programbull 119884 is getting a jobbull 119880 is being motivatedbull 119883 is number of job applications sent outbull Big assumption here no arrow from 119880 to 119884

Other types of confounding

119863

119880 119883

119884

bull 119863 is exercisebull 119884 is having a diseasebull 119880 is lifestylebull 119883 is smokingbull Big assumption here no arrow from 119880 to 119884

Whatrsquos the problem with backdoorpaths

119863

119880 119883

119884

bull A path is blocked if1 we control for or stratify a non-collider on that path OR2 we do not control for a collider

bull Unblocked backdoor paths confoundingbull In the DAG here if we condition on 119883 then the backdoor

path is blocked

Not all backdoor paths

119863

1198801113568119883119883

119884

bull Conditioning on the posttreatment covariates opens thenon-causal path

selection bias

M-bias

119863

1198801113568 1198801113569119883119883

119884

bull Not all backdoor paths induce confoundingbull This backdoor path is blocked by the collider 119883119894 that we donrsquot

control forbull If we control for 119883119894 opens the path and induces

confounding Sometimes called M-bias

bull Controversial because of differing views on what to control for Rubin thinks that M-bias is a ldquomathematical curiosityrdquo and we

should control for all pretreatment variables Pearl and others think M-bias is a real threat

Backdoor criterion

bull Can we use a DAG to evaluate no unmeasured confoundersbull Pearl answered yes with the backdoor criterion which states

that the effect of 119863 on 119884 is identified if1 No backdoor paths from 119863 to 119884 OR2 Measured covariates are sufficient to block all backdoor paths

from 119863 to 119884

bull First is really only valid for randomized experimentsbull The backdoor criterion is fairly powerful Tells us

if there confounding given this DAG if it is possible to removing the confounding and what variables to condition on to eliminate the confounding

SWIGs

119863 | 119889 119884(119889)

119880 119883

119884

bull Itrsquos a little hard to see how the backdoor criterion implies nounmeasured confounders

No potential outcomes on this graphbull Richardson and Robins Single World Intervention Graphs

Split 119863 node into natural value (119863) and intervention value 119889 Let all effects of 119863 take their potential value under

intervention 119884(119889)bull Now can see are 119863 and 119884(119889) related

119863 larr 119880 rarr 119883 rarr 119884(119889) implies not independent Conditioning on 119883 blocks that backdoor path 119863 ⟂⟂ 119884(119889)|119883

No unmeasured confounders is nottestable

bull No unmeasured confounding places no restrictions on theobserved data

1114100119884119894(0)|119863119894 = 1119883119894111410311140591113840111384011138401113840111384011138401113840111384011138401113840111406011138401113840111384011138401113840111384011138401113840111384011138401114061unobserved

119889= 1114100119884119894(0)|119863119894 = 0119883119894111410311140591113840111384011138401113840111384011138401113840111384011138401113840111406011138401113840111384011138401113840111384011138401113840111384011138401114061observed

bull Here 119889= means equal in distributionbull No way to directly test this assumption without the

counterfactual data which is missing by definitionbull With backdoor criterion you must have the correct DAG

Assessing no unmeasured confounders

bull Can do ldquoplacebordquo tests where 119863119894 cannot have an effect(lagged outcomes etc)

bull Della Vigna and Kaplan (2007 QJE) effect of Fox Newsavailability on Republican vote share

Availability in 20002003 canrsquot affect past vote sharesbull Unconfoundedness could still be violated even if you pass this

test

Alternatives to no unmeasuredconfounding

bull Without explicit randomization we need some way ofidentifying causal effects

bull No unmeasured confounders asymp randomized experiment Indentification results very similar to experiments

bull With unmeasured confounding are we doomed Maybe notbull Other approaches rely on finding plausibly exogenous variation

in assignment of 119863119894 Instrumental variables (randomization + exclusion restriction) Over-time variation (diff-in-diff fixed effects) Arbitrary thresholds for treatment assignment (RDD)

3 No unmeasuredconfounders and OLS

Justifying regression

bull We know how randomized experiments imply thatdifferences-in-means identify the ATE

bull In the next few weeks wersquoll work through how no unmeasuredconfounding justifies a number of estimation strategies

bull Today itrsquos useful to walk through what no unmeasuredconfounding can buy us in a familiar setting OLS

Wersquoll cover regression more formally later

Constant effects set up

bull Assume a constant effects setup

119884119894(0) = 120572 + 119883prime119894 120573 + 119906119894

119884119894(1) = 120572 + 120591 + 119883prime119894 120573 + 119906119894

bull Constant effects because 119884119894(1) minus 119884119894(0) = 120591 for all unitsbull Use consistency to get the usual regression formula

119884119894 = 119884119894(1)119863119894 + 119884119894(0)(1 minus 119863119894)= 119884119894(0) + 1114100119884119894(1) minus 119884119894(0)1114103 sdot 119863119894

= 120572 + 120591 sdot 119863119894 + 119883prime119894 120573 + 119906119894

bull Does no unmeasured confounding help us identify the causalparameter 120591

Regression on residuals

bull First estimate the residuals of regression of the treatment andoutcome on the covariates

119894 = 119884119894 minus 120124[119884119894|119883119894]119894 = 119863119894 minus 120124[119863119894|119883119894]

bull Running a regression of 119894 on 119894 is equivalent to controllingfor 119883119894

119884119894 = 120572 + 120591 sdot 119863119894 + 119883prime119894 120573 + 119906119894

119894 = 120572 + 120591 sdot 119894 + 119894bull Here 119894 = 119906119894 minus 120124[119906119894|119883119894]

What does OLS estimate

bull Using the usual OLS theory we can show that the probabilitylimit of the OLS estimator of 120591 is

plim 111369611136931113700 =Cov(119894 119894)Var(119894)

= Cov(119894 120572 + 120591 sdot 119894 + 119894)Var(119894)

= 120591 sdot Cov(119894 119894) + Cov(119894 119894)Var(119894)

= 120591 + Cov(119894 119894)Var(119894)

Key OLS assumption

$$\text{plim}\, \hat{\tau}_{OLS} = \tau + \frac{\text{Cov}(\tilde{D}_i, \tilde{u}_i)}{\text{Var}(\tilde{D}_i)}$$

• Key identification comes from $\text{Cov}(\tilde{D}_i, \tilde{u}_i) = 0$.
  - Conditional on $X_i$, there is no relationship between $D_i$ and $u_i$.
• Note that $u_i$ is a function of $X_i$ and $Y_i(d)$:
  - $u_i = Y_i(0) - \alpha - X_i'\beta$ when $D_i = 0$
  - $u_i = Y_i(1) - \alpha - \tau - X_i'\beta$ when $D_i = 1$
  - Conditional on $X_i$, the only variation in $u_i$ comes from $Y_i(d)$.
• No unmeasured confounding implies this assumption:

$$D_i \perp\!\!\!\perp \big(Y_i(1), Y_i(0)\big) \mid X_i \implies D_i \perp\!\!\!\perp u_i \mid X_i \implies \text{Cov}(\tilde{D}_i, \tilde{u}_i) = 0$$

Omitted variable bias

• What happens when this is violated? Suppose that there is one omitted variable $L_i$ (residualized from $X_i$ as $\tilde{L}_i$):

$$\tilde{u}_i = \lambda \tilde{L}_i + \omega_i$$

• We'll assume that if we could measure $L_i$, then no unmeasured confounding would hold.
• Leads to inconsistency in the OLS estimator:

$$\text{plim}\, \hat{\tau}_{OLS} = \tau + \lambda \frac{\text{Cov}(\tilde{D}_i, \tilde{L}_i)}{\text{Var}(\tilde{D}_i)}$$

• The bias here is two terms multiplied together:
  1. the coefficient on $L_i$ ($\lambda$), and
  2. the coefficient on $D_i$ from a regression of $L_i$ on $D_i$, also controlling for $X_i$.
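A small simulation can show the bias formula at work. This is a sketch under assumptions that are not part of the lecture: simulated data, one measured covariate, one omitted confounder (the variable named `omitted` plays the role of $L_i$), and linear projections standing in for the conditional expectations. All coefficient values are hypothetical.

```python
import random

def mean(v):
    return sum(v) / len(v)

def cov(a, b):
    ma, mb = mean(a), mean(b)
    return sum((ai - ma) * (bi - mb) for ai, bi in zip(a, b)) / len(a)

def residualize(v, x):
    """Residuals from a simple linear regression of v on x (with intercept)."""
    slope = cov(v, x) / cov(x, x)
    intercept = mean(v) - slope * mean(x)
    return [vi - intercept - slope * xi for vi, xi in zip(v, x)]

random.seed(2)
n = 10000
tau, lam = 2.0, 1.5  # true effect and coefficient on the omitted variable

x = [random.gauss(0, 1) for _ in range(n)]
omitted = [0.5 * xi + random.gauss(0, 1) for xi in x]  # L_i, never used as a control
d = [1.0 if 0.8 * xi + li + random.gauss(0, 1) > 0 else 0.0
     for xi, li in zip(x, omitted)]
y = [1.0 + tau * di + 0.7 * xi + lam * li + random.gauss(0, 1)
     for di, xi, li in zip(d, x, omitted)]

# Residualize everything on the measured covariate.
y_t = residualize(y, x)
d_t = residualize(d, x)
l_t = residualize(omitted, x)

# OLS of y on d controlling for x only (L is omitted).
tau_short = cov(y_t, d_t) / cov(d_t, d_t)

# Second bias term: Cov(D~, L~) / Var(D~).
delta = cov(d_t, l_t) / cov(d_t, d_t)
predicted = tau + lam * delta

print(tau_short, predicted)
```

The estimate that omits `omitted` should land near `tau + lam * delta` rather than near `tau`; the remaining gap between the two printed numbers is sampling noise.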

4 Estimating causal effects under no unmeasured confounders

Basic approach to estimation

• Remember the usual approach to estimating the ATE with covariates:
• Stratification:
  - Stratify the units by the covariates.
  - Calculate the CATE within these strata.
• Standardization/direct adjustment:
  - Average the CATEs across the strata to get the ATE.
• How to create strata when $X$ has continuous components?
  - If $X$ is discrete with only a few levels, we can use the exact values of $X$.
  - Otherwise, we may have to subclassify/coarsen the data.

Classic example: cigars/pipes versus cigarettes

• $D_i = 1$ for pipe/cigar smokers, $D_i = 0$ for cigarette smokers.
• $Y_i$ = death in the first year of follow-up.
• Naive positive effect: cigar/pipe smokers are more likely to die.
  - What's the confounder here? Age.
  - Pipe/cigar smokers are much older than cigarette smokers.
• Cochran's approach: stratify based on coarsened age.
  - Divide age into $k$ strata, $S_i \in \{s_1, s_2, \ldots, s_k\}$.
  - $s_1$ might be 18-25, $s_2$ might be 26-35, and so on.
  - Calculate the effect within strata and aggregate.
• Key assumption: no unmeasured confounders using the stratified version of age:

$$D_i \perp\!\!\!\perp \big(Y_i(0), Y_i(1)\big) \mid S_i$$
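The stratify-then-aggregate logic can be sketched on simulated data. Everything numeric below is hypothetical and is not Cochran's data: five age strata, treatment and death probabilities that both rise with age, and no true effect of pipe/cigar smoking, so the key assumption holds by construction.

```python
import random

random.seed(3)
n = 20000

# Hypothetical probabilities by age stratum (youngest to oldest).
p_treat = [0.10, 0.25, 0.40, 0.55, 0.70]  # P(D = 1 | stratum)
p_death = [0.02, 0.07, 0.12, 0.17, 0.22]  # P(Y = 1 | stratum), same for both groups

units = []
for _ in range(n):
    s = random.randrange(5)
    d = 1 if random.random() < p_treat[s] else 0
    y = 1 if random.random() < p_death[s] else 0
    units.append((s, d, y))

def diff_in_means(rows):
    """Mean outcome among treated minus mean outcome among controls."""
    treated = [y for _, d, y in rows if d == 1]
    control = [y for _, d, y in rows if d == 0]
    return sum(treated) / len(treated) - sum(control) / len(control)

# Naive comparison ignores age entirely.
naive = diff_in_means(units)

# Stratified comparison: effect within each stratum, weighted by stratum share.
stratified = 0.0
for s in range(5):
    rows = [u for u in units if u[0] == s]
    stratified += diff_in_means(rows) * len(rows) / n

print(naive, stratified)
```

The naive difference should be clearly positive even though the simulated effect is zero, while the stratified estimate should sit near zero.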

Stratification on the propensity score

• What about when $X$ has many dimensions?
• Curse of dimensionality: there will be very few, if any, units in a given stratum of $X_i$.
• Stratify on a low-dimensional summary, the propensity score:

$$e(x) = \mathbb{P}[D_i = 1|X_i = x]$$

  - PS = a unit's probability of being treated, conditional on $X_i$.
• For a particular unit, this is $e(X_i) = \mathbb{P}[D_i = 1|X_i]$.
• Rosenbaum and Rubin (1983) showed that:

$$D_i \perp\!\!\!\perp \big(Y_i(0), Y_i(1)\big) \mid X_i \implies D_i \perp\!\!\!\perp \big(Y_i(0), Y_i(1)\big) \mid e(X_i)$$

  - Stratifying on $e_i$ is the same as stratifying on the full $X_i$.

Propensity score as balancing score

• The propensity score is actually a balancing score, which means that:

$$D_i \perp\!\!\!\perp X_i \mid e(X_i)$$

• Conditional on the propensity score, treatment is independent of the covariates.
  - Treatment status is said to be balanced: $f(X_i|D_i = 1, e(X_i)) = f(X_i|D_i = 0, e(X_i))$
• Of course, we have to know the true PS to have all these results work.

Estimating the propensity score

• Of course, in observational studies we don't know the propensity score.
• We would run a parametric model with parameters $\gamma$ to estimate the propensity scores:
  1. Estimate $\hat{\gamma}$.
  2. Create $\hat{e}_i = \Pr[D_i = 1|X_i; \hat{\gamma}]$.
• For instance, in R we could easily calculate the propensity scores using the glm function:

    # Logistic regression of treatment on covariates; keep the fitted probabilities
    pscores <- glm(treat ~ var1 + var2 + var3, data = mydata,
                   family = binomial())$fitted.values

Propensity score specifics

• What variables do we include in the propensity score model?
  - Any set of variables that blocks all the backdoor paths from $D_i$ to $Y_i$.
• Check balance within strata of $\hat{e}_i$. Covariates should be balanced:

$$f(X_i|D_i = 1, \hat{e}_i) = f(X_i|D_i = 0, \hat{e}_i)$$

• Can also use automated/nonparametric tools for estimating $\hat{e}_i$:
  - Covariate Balancing Propensity Scores (Imai and Ratkovic)

Stratifying by the propensity score

• How will we use the propensity score?
  - Matching (next week)
  - Weighting (two weeks)
  - Regression (three weeks)
• Today: coarsening the propensity score and stratifying.
• Choose boundary points $0 = b_0 < b_1 < \ldots < b_{K-1} < b_K = 1$.
• Create block indicators:

$$B_i(k) = \begin{cases} 1 & \text{if } b_{k-1} < \hat{e}(X_i) < b_k \\ 0 & \text{otherwise} \end{cases}$$

• Calculate within-strata effect estimates:

$$\tau_k = \mathbb{E}[Y_i|D_i = 1, B_i(k) = 1] - \mathbb{E}[Y_i|D_i = 0, B_i(k) = 1]$$

Standardization/direct adjustment

• We calculated the CATEs for each stratum of the PS, $\tau_k$.
• We can use the law of iterated expectations to back out the ATE.
• Take the average of the CATEs over the distribution of $X$:

$$\tau = \sum_{k=1}^{K} \tau_k \, \mathbb{P}[B_i(k) = 1]$$

• Note that $\mathbb{P}[B_i(k) = 1]$ is just the proportion of units in block $k$:

$$\mathbb{P}[B_i(k) = 1] = \frac{\sum_{i=1}^{N} B_i(k)}{N}$$
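The blocking and standardization steps can be put together in a short sketch. Several things here are assumptions made for illustration rather than the lecture's procedure: the data are simulated, the true propensity score is used in place of an estimated one, the blocks are $K = 10$ equal-width intervals, each interval is closed on the right so that every unit falls in exactly one block, and blocks missing either treated or control units are dropped with the weights renormalized.

```python
import math
import random

random.seed(4)
n = 20000
tau = 1.0  # true constant effect in the simulation
K = 10     # number of propensity score blocks

def true_pscore(x1, x2):
    """Hypothetical logistic propensity score e(x)."""
    return 1.0 / (1.0 + math.exp(-(0.8 * x1 - 0.5 * x2)))

data = []
for _ in range(n):
    x1, x2 = random.gauss(0, 1), random.gauss(0, 1)
    e = true_pscore(x1, x2)
    d = 1 if random.random() < e else 0
    y = tau * d + 2.0 * x1 - 1.0 * x2 + random.gauss(0, 1)
    data.append((e, d, y))

def diff_in_means(rows):
    """Treated mean minus control mean, or None if either group is empty."""
    treated = [y for _, d, y in rows if d == 1]
    control = [y for _, d, y in rows if d == 0]
    if not treated or not control:
        return None
    return sum(treated) / len(treated) - sum(control) / len(control)

naive = diff_in_means(data)

# Within-block estimates tau_k for blocks (b_{k-1}, b_k].
estimates, sizes = [], []
for k in range(1, K + 1):
    lo, hi = (k - 1) / K, k / K
    block = [row for row in data if lo < row[0] <= hi]
    tau_k = diff_in_means(block)
    if tau_k is not None:
        estimates.append(tau_k)
        sizes.append(len(block))

# Standardization: weight each tau_k by its block's share of the usable units.
total = sum(sizes)
tau_hat = sum(t * m / total for t, m in zip(estimates, sizes))

print(naive, tau_hat)
```

With only a handful of blocks some confounding remains inside each block, so `tau_hat` should be much closer to `tau` than `naive` is without matching it exactly; finer blocks trade less residual bias for noisier within-block estimates.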

5 Wrapping Up

Summary

• Defined observational studies.
• Defined confounding and assessed when no unmeasured confounding holds.
• Saw how no unmeasured confounding helps with OLS.
• Saw how to estimate causal effects under no unmeasured confounding using the propensity score.

Next few weeks

• Learn how to estimate causal effects under no unmeasured confounders via:
  - Matching
  - Weighting
  - Regression
• Then we move on to situations where no unmeasured confounders is violated.

Designing observational studies

• Rubin (2008) argues that we should still "design" our observational studies:
  - Pick the ideal experiment corresponding to this observational study.
  - Hide the outcome data.
  - Try to estimate the randomization procedure.
  - Analyze this as an experiment with this estimated procedure.
• Tries to minimize "snooping" by picking the best modeling strategy before seeing the outcome.

Discrete covariates

• Suppose that we knew that $D_i$ was unconfounded within levels of a binary $X_i$.
• Then we could always estimate the causal effect using iterated expectations, as in a stratified randomized experiment:

$$\begin{aligned}
\mathbb{E}_X\big\{\mathbb{E}[Y_i|D_i = 1, X_i] - \mathbb{E}[Y_i|D_i = 0, X_i]\big\}
&= \underbrace{\big(\mathbb{E}[Y_i|D_i = 1, X_i = 1] - \mathbb{E}[Y_i|D_i = 0, X_i = 1]\big)}_{\text{diff-in-means for } X_i = 1} \underbrace{\mathbb{P}[X_i = 1]}_{\text{share of } X_i = 1} \\
&\quad + \underbrace{\big(\mathbb{E}[Y_i|D_i = 1, X_i = 0] - \mathbb{E}[Y_i|D_i = 0, X_i = 0]\big)}_{\text{diff-in-means for } X_i = 0} \underbrace{\mathbb{P}[X_i = 0]}_{\text{share of } X_i = 0}
\end{aligned}$$

• Never used our knowledge of the randomization for this quantity.
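To make the weighting concrete, here is a worked instance with purely hypothetical numbers (they are not from the lecture). Suppose the difference in means is 0.10 among units with $X_i = 1$, who make up 30% of the sample, and 0.02 among units with $X_i = 0$, who make up the remaining 70%. The stratified estimate would then be:

$$0.10 \times 0.30 + 0.02 \times 0.70 = 0.030 + 0.014 = 0.044$$

The result sits closer to the $X_i = 0$ difference because that stratum holds most of the units.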

Continuous covariates

• So, great, we can stratify. Why not do this all the time?
• What if $X_i$ = income for unit $i$?
  - Each unit has its own value of $X_i$: \$54,134, \$123,043, \$23,842.
  - If $X_i = 54134$ is unique, we will only observe one of these:

$$\mathbb{E}[Y_i|D_i = 1, X_i = 54134] - \mathbb{E}[Y_i|D_i = 0, X_i = 54134]$$

  - Cannot stratify to each unique value of $X_i$.
• Practically, this is massively important: we almost always have data with unique values.

Going to a superpopulation

• From here on out, we'll focus less on the finite population model.
  - Harder with (functionally) continuous covariates.
• Assume that each unit $i$ is drawn from an infinite superpopulation.
  - Implies that $(Y_i(0), Y_i(1), D_i, X_i)$ are a draw from their population joint distribution.
• Potential outcomes are now typical random variables:
  - $\mu_c(x) = \mathbb{E}[Y_i(0)|X_i = x]$ and $\mu_t(x) = \mathbb{E}[Y_i(1)|X_i = x]$
  - $\sigma^2_c(x) = \mathbb{V}[Y_i(0)|X_i = x]$ and $\sigma^2_t(x) = \mathbb{V}[Y_i(1)|X_i = x]$
  - $\tau = \mathbb{E}[\mu_t(x) - \mu_c(x)|X_i = x]$

Assumptions in the superpopulation

• With an infinite superpopulation, we worry less about conditioning on the entire sample.
  - Units are now independent due to random sampling from an infinite population.
• No unmeasured confounding implies that:

$$\mathbb{P}(D_i = 1|Y_i(0), Y_i(1), X_i) = \mathbb{P}(D_i = 1|X_i)$$

• Or, written using conditional independence:

$$D_i \perp\!\!\!\perp \big(Y_i(0), Y_i(1)\big) \mid X_i$$

• Positivity can be written: $0 < \mathbb{P}(D_i = 1|X_i = x) < 1$ for all $x$ in the support of $X_i$.

2 Confounding

What is confounding?

• Confounding is the bias caused by common causes of the treatment and outcome.
  - Leads to "spurious correlation."
• In observational studies, the goal is to avoid confounding inherent in the data.
• Pervasive in the social sciences:
  - effect of income on voting (confounding: age)
  - effect of job training program on employment (confounding: motivation)
  - effect of political institutions on economic development (confounding: previous economic development)
• No unmeasured confounding assumes that we've measured all sources of confounding.

Big problem

• How can we determine if no unmeasured confounding holds if we didn't assign the treatment?
• Put differently:
  - What covariates do we need to condition on?
  - What covariates do we need to match on?
  - What covariates do we need to include in our regressions?
• One way, from the assumption itself:
  - $\mathbb{P}[D_i = 1|\mathbf{X}, \mathbf{Y}(1), \mathbf{Y}(0)] = \mathbb{P}[D_i = 1|\mathbf{X}]$
  - Include covariates such that, conditional on them, the treatment assignment does not depend on the potential outcomes.
• Another way: use DAGs and look at backdoor paths.

Backdoor paths and blocking paths

• A backdoor path is a non-causal path from $D$ to $Y$.
  - Would remain if we removed any arrows pointing out of $D$.
• Backdoor paths between $D$ and $Y$ indicate common causes of $D$ and $Y$.

[Figure: DAG with nodes $D$, $X$, and $Y$]

• Here there is a backdoor path $D \leftarrow X \rightarrow Y$, where $X$ is a common cause for the treatment and the outcome.

Other types of confounding

[Figure: DAG with nodes $D$, $U$, $X$, and $Y$]

• $D$ is enrolling in a job training program.
• $Y$ is getting a job.
• $U$ is being motivated.
• $X$ is the number of job applications sent out.
• Big assumption here: no arrow from $U$ to $Y$.

Other types of confounding

[Figure: DAG with nodes $D$, $U$, $X$, and $Y$]

• $D$ is exercise.
• $Y$ is having a disease.
• $U$ is lifestyle.
• $X$ is smoking.
• Big assumption here: no arrow from $U$ to $Y$.

What's the problem with backdoor paths?

[Figure: DAG with nodes $D$, $U$, $X$, and $Y$]

• A path is blocked if:
  1. we control for or stratify on a non-collider on that path, OR
  2. we do not control for a collider.
• Unblocked backdoor paths imply confounding.
• In the DAG here, if we condition on $X$, then the backdoor path is blocked.

Not all backdoor paths

[Figure: DAG with nodes $D$, $U_1$, $X$, and $Y$]

• Conditioning on the posttreatment covariates opens the non-causal path.
  - Selection bias.

M-bias

[Figure: DAG with nodes $D$, $U_1$, $U_2$, $X$, and $Y$]

• Not all backdoor paths induce confounding.
• This backdoor path is blocked by the collider $X_i$ that we don't control for.
• If we control for $X_i$, this opens the path and induces confounding.
  - Sometimes called M-bias.
• Controversial because of differing views on what to control for:
  - Rubin thinks that M-bias is a "mathematical curiosity" and we should control for all pretreatment variables.
  - Pearl and others think M-bias is a real threat.
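The mechanics of M-bias can be seen in a simulation. This is a sketch under assumed structure, not an example from the lecture: two independent unobserved variables, `u1` (which affects treatment and the collider) and `u2` (which affects the outcome and the collider), a pretreatment collider `x`, and a constant effect `tau`. The coefficient values and helper names are hypothetical.

```python
import random

def mean(v):
    return sum(v) / len(v)

def cov(a, b):
    ma, mb = mean(a), mean(b)
    return sum((ai - ma) * (bi - mb) for ai, bi in zip(a, b)) / len(a)

def slope(y, d):
    """Coefficient on d from OLS of y on (1, d)."""
    return cov(y, d) / cov(d, d)

def slope_controlling(y, d, x):
    """Coefficient on d from OLS of y on (1, d, x), via the 2x2 normal equations."""
    sdd, sxx, sdx = cov(d, d), cov(x, x), cov(d, x)
    sdy, sxy = cov(d, y), cov(x, y)
    return (sdy * sxx - sxy * sdx) / (sdd * sxx - sdx ** 2)

random.seed(5)
n = 20000
tau = 1.0

u1 = [random.gauss(0, 1) for _ in range(n)]
u2 = [random.gauss(0, 1) for _ in range(n)]
# Collider: U1 -> X <- U2. X is pretreatment but is not a common cause of D and Y.
x = [a + b + 0.5 * random.gauss(0, 1) for a, b in zip(u1, u2)]
# Treatment depends on U1 only; outcome depends on D and U2 only.
d = [1.0 if a + random.gauss(0, 1) > 0 else 0.0 for a in u1]
y = [tau * di + b + random.gauss(0, 1) for di, b in zip(d, u2)]

unadjusted = slope(y, d)               # path D <- U1 -> X <- U2 -> Y stays blocked
adjusted = slope_controlling(y, d, x)  # conditioning on X opens that path

print(unadjusted, adjusted)
```

In this setup the unadjusted regression should recover `tau`, while adding the collider as a control pulls the estimate well below it. How large the distortion is depends entirely on the assumed coefficients, which is part of why its practical importance is debated.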

Backdoor criterion

• Can we use a DAG to evaluate no unmeasured confounders?
• Pearl answered yes, with the backdoor criterion, which states that the effect of $D$ on $Y$ is identified if:
  1. there are no backdoor paths from $D$ to $Y$, OR
  2. measured covariates are sufficient to block all backdoor paths from $D$ to $Y$.
• The first is really only valid for randomized experiments.
• The backdoor criterion is fairly powerful. It tells us:
  - if there is confounding given this DAG,
  - if it is possible to remove the confounding, and
  - what variables to condition on to eliminate the confounding.

SWIGs

[Figure: SWIG with nodes $D \mid d$, $U$, $X$, $Y$, and $Y(d)$]

• It's a little hard to see how the backdoor criterion implies no unmeasured confounders.
  - No potential outcomes on this graph!
• Richardson and Robins: Single World Intervention Graphs
  - Split the $D$ node into natural value ($D$) and intervention value $d$.
  - Let all effects of $D$ take their potential value under intervention, $Y(d)$.
• Now we can see: are $D$ and $Y(d)$ related?
  - $D \leftarrow U \rightarrow X \rightarrow Y(d)$ implies not independent.
  - Conditioning on $X$ blocks that backdoor path, so $D \perp\!\!\!\perp Y(d) \mid X$.

No unmeasured confounders is not testable

• No unmeasured confounding places no restrictions on the observed data:

$$\underbrace{\big(Y_i(0) \mid D_i = 1, X_i\big)}_{\text{unobserved}} \overset{d}{=} \underbrace{\big(Y_i(0) \mid D_i = 0, X_i\big)}_{\text{observed}}$$

• Here $\overset{d}{=}$ means equal in distribution.
• No way to directly test this assumption without the counterfactual data, which is missing by definition!
• With the backdoor criterion, you must have the correct DAG.

Assessing no unmeasured confounders

• Can do "placebo" tests, where $D_i$ cannot have an effect (lagged outcomes, etc.).
• DellaVigna and Kaplan (2007, QJE): effect of Fox News availability on Republican vote share.
  - Availability in 2000/2003 can't affect past vote shares.
• Unconfoundedness could still be violated even if you pass this test!

Alternatives to no unmeasured confounding

• Without explicit randomization, we need some way of identifying causal effects.
• No unmeasured confounders ≈ randomized experiment.
  - Identification results are very similar to experiments.
• With unmeasured confounding, are we doomed? Maybe not!
• Other approaches rely on finding plausibly exogenous variation in the assignment of $D_i$:
  - Instrumental variables (randomization + exclusion restriction)
  - Over-time variation (diff-in-diff, fixed effects)
  - Arbitrary thresholds for treatment assignment (RDD)

3 No unmeasured confounders and OLS

Justifying regression

• We know how randomized experiments imply that differences-in-means identify the ATE.
• In the next few weeks, we'll work through how no unmeasured confounding justifies a number of estimation strategies.
• Today, it's useful to walk through what no unmeasured confounding can buy us in a familiar setting: OLS.
  - We'll cover regression more formally later.

Discrete covariates

bull Suppose that we knew that 119863119894 was unconfounded within levelsof a binary 119883119894

bull Then we could always estimate the causal effect using iteratedexpectations as in a stratified randomized experiment

1201241198831114107120124[119884119894|119863119894 = 1119883119894] minus 120124[119884119894|119863119894 = 0119883119894]1114110

= 1114101120124[119884119894|119863119894 = 1119883119894 = 1] minus 120124[119884119894|119863119894 = 0119883119894 = 1]1114104111405911138401113840111384011138401113840111384011138401113840111384011138401113840111384011138401113840111384011138401113840111384011138401113840111384011138401113840111384011138401113840111384011138401113840111384011140601113840111384011138401113840111384011138401113840111384011138401113840111384011138401113840111384011138401113840111384011138401113840111384011138401113840111384011138401113840111384011138401113840111384011138401114061

diff-in-means for 119883119894=1113568

ℙ[119883119894 = 1]11140591113840111384011138401113840111406011138401113840111384011138401114061share of 119883119894=1113568

+ 1114101120124[119884119894|119863119894 = 1119883119894 = 0] minus 120124[119884119894|119863119894 = 0119883119894 = 0]1114104111405911138401113840111384011138401113840111384011138401113840111384011138401113840111384011138401113840111384011138401113840111384011138401113840111384011138401113840111384011138401113840111384011138401113840111384011140601113840111384011138401113840111384011138401113840111384011138401113840111384011138401113840111384011138401113840111384011138401113840111384011138401113840111384011138401113840111384011138401113840111384011138401114061

diff-in-means for 119883119894=1113567

ℙ[119883119894 = 0]11140591113840111384011138401113840111406011138401113840111384011138401114061share of 119883119894=1113567

bull Never used our knowledge of the randomization for thisquantity

Continuous covariates

bull So great we can stratify Why not do this all the timebull What if 119883119894 = income for unit 119894

Each unit has its own value of 119883119894 $54134 $123043 $23842 If 119883119894 = 54134 is unique will only observe 1 of these

120124[119884119894|119863119894 = 1119883119894 = 54134] minus 120124[119884119894|119863119894 = 0119883119894 = 54134]

cannot stratify to each unique value of 119883119894bull Practically this is massively important almost always have

data with unique values

Going to a superpopulation

bull From here on out wersquoll focus less on the finite populationmodel

Harder with (functionally) continuous covariatesbull Assume that each unit 119894 is drawn from an infinite

superpopulation implies that (119884119894(0) 119884119894(1) 119863119894 119883119894) are a draw from their

population joint distributionbull Potential outcomes are now typical random variables

120583119888(119909) = 120124[119884119894(0)|119883119894 = 119909] and 120583119905(119909) = 120124[119884119894(1)|119883119894 = 119909] 1205901113569119888 (119909) = 120141[119884119894(0)|119883119894 = 119909] and 1205901113569119905 (119909) = 120141[119884119894(1)|119883119894 = 119909] 120591 = 120124[120583119905(119909) minus 120583119888(119909)|119883119894 = 119909]

Assumptions in the superpopulation

bull With an infinite superpopulation worry less aboutconditioning on the entire sample

Units are now independent due to random sampling from aninfinite population

bull No unmeasured confoudning implies that

ℙ(119863119894 = 1|119884119894(0) 119884119894(1) 119883119894) = ℙ(119863119894 = 1|119883119894)

bull Or written using conditional independence

119863119894 ⟂⟂ 1114100119884119894(0) 119884119894(1)1114103|119883119894

bull Positivity can be written 0 lt ℙ(119863119894 = 1|119883119894 = 119909) lt 1 for all 119909 inthe support of 119883119894

2 Confounding

What is confounding

bull Confounding is the bias caused by common causes of thetreatment and outcome

Leads to ldquospurious correlationrdquobull In observational studies the goal is to avoid confounding

inherent in the databull Pervasive in the social sciences

effect of income on voting (confounding age) effect of job training program on employment (confounding

motivation) effect of political institutions on economic development

(confounding previous economic development)bull No unmeasured confounding assumes that wersquove measured all

sources of confounding

Big problem

bull How can we determine if no unmeasured confounding holds ifwe didnrsquot assign the treatment

bull Put differently What covariates do we need to condition on What covariates do we need to match on What covaraites do we need to include in our regressions

bull One way from the assumption itself ℙ[119863119894 = 1|119831 119832(1) 119832(0)] = ℙ[119863119894 = 1|119831] Include covariates such that conditional on them the

treatment assignment does not depend on the potentialoutcomes

bull Another way use DAGs and look at back-door paths

Backdoor paths and blocking paths

bull Backdoor path is a non-causal path from 119863 to 119884 Would remain if we removed any arrows pointing out of 119863

bull Backdoor paths between 119863 and 119884 common causes of 119863and 119884

119863

119883

119884

bull Here there is a backdoor path 119863 larr 119883 rarr 119884 where 119883 is acommon cause for the treatment and the outcome

Other types of confounding

119863

119880 119883

119884

bull 119863 is enrolling in a job training programbull 119884 is getting a jobbull 119880 is being motivatedbull 119883 is number of job applications sent outbull Big assumption here no arrow from 119880 to 119884

Other types of confounding

119863

119880 119883

119884

bull 119863 is exercisebull 119884 is having a diseasebull 119880 is lifestylebull 119883 is smokingbull Big assumption here no arrow from 119880 to 119884

Whatrsquos the problem with backdoorpaths

119863

119880 119883

119884

bull A path is blocked if1 we control for or stratify a non-collider on that path OR2 we do not control for a collider

bull Unblocked backdoor paths confoundingbull In the DAG here if we condition on 119883 then the backdoor

path is blocked

Not all backdoor paths

119863

1198801113568119883119883

119884

bull Conditioning on the posttreatment covariates opens thenon-causal path

selection bias

M-bias

[Figure: DAG with nodes $D$, $U_1$, $U_2$, $X$, $Y$]

• Not all backdoor paths induce confounding.
• This backdoor path is blocked by the collider $X_i$ that we don't control for.
• If we control for $X_i$, we open the path and induce confounding.
  - Sometimes called M-bias.
• Controversial because of differing views on what to control for:
  - Rubin thinks that M-bias is a "mathematical curiosity" and that we should control for all pretreatment variables.
  - Pearl and others think M-bias is a real threat.
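A small simulation can make M-bias concrete. This is a sketch under assumptions chosen here, not taken from the slides: the usual M-structure $U_1 \rightarrow D$, $U_1 \rightarrow X$, $U_2 \rightarrow X$, $U_2 \rightarrow Y$, linear equations with unit coefficients, standard normal noise, a continuous $D$, and no effect of $D$ on $Y$. Controlling for $X$ is done by residualizing both $D$ and $Y$ on $X$.

```python
import random


def slope(x, y):
    """OLS slope from regressing y on x with an intercept."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx


def residualize(v, x):
    """Residuals of v after a linear regression on x."""
    mx, mv = sum(x) / len(x), sum(v) / len(v)
    b = slope(x, v)
    return [(vi - mv) - b * (xi - mx) for vi, xi in zip(v, x)]


rng = random.Random(2002)
n = 20000
u1 = [rng.gauss(0, 1) for _ in range(n)]
u2 = [rng.gauss(0, 1) for _ in range(n)]
d = [a + rng.gauss(0, 1) for a in u1]                   # U1 -> D
x = [a + b + rng.gauss(0, 1) for a, b in zip(u1, u2)]   # U1 -> X <- U2 (collider)
y = [b + rng.gauss(0, 1) for b in u2]                   # U2 -> Y, and D has no effect on Y

naive = slope(d, y)                                      # no controls: path stays blocked
adjusted = slope(residualize(d, x), residualize(y, x))   # controlling for X opens the path
```

Under these assumed coefficients the uncontrolled slope is close to zero, while the coefficient on $D$ after controlling for $X$ converges to $-0.2$, even though the true effect is zero.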

Backdoor criterion

• Can we use a DAG to evaluate no unmeasured confounders?
• Pearl answered yes, with the backdoor criterion, which states that the effect of $D$ on $Y$ is identified if:
  1. there are no backdoor paths from $D$ to $Y$, OR
  2. measured covariates are sufficient to block all backdoor paths from $D$ to $Y$.
• The first is really only valid for randomized experiments.
• The backdoor criterion is fairly powerful. It tells us:
  - if there is confounding given this DAG,
  - if it is possible to remove the confounding, and
  - what variables to condition on to eliminate the confounding.

SWIGs

[Figure: SWIG with nodes $D \mid d$, $U$, $X$, $Y(d)$]

• It's a little hard to see how the backdoor criterion implies no unmeasured confounders.
  - No potential outcomes on this graph!
• Richardson and Robins: Single World Intervention Graphs (SWIGs).
  - Split the $D$ node into its natural value ($D$) and its intervention value $d$.
  - Let all effects of $D$ take their potential value under intervention, $Y(d)$.
• Now we can see: are $D$ and $Y(d)$ related?
  - $D \leftarrow U \rightarrow X \rightarrow Y(d)$ implies they are not independent.
  - Conditioning on $X$ blocks that backdoor path, so $D \perp\!\!\!\perp Y(d) \mid X$.

No unmeasured confounders is not testable

• No unmeasured confounding places no restrictions on the observed data:

$$\underbrace{\big(Y_i(0) \mid D_i = 1, X_i\big)}_{\text{unobserved}} \overset{d}{=} \underbrace{\big(Y_i(0) \mid D_i = 0, X_i\big)}_{\text{observed}}$$

• Here $\overset{d}{=}$ means equal in distribution.
• There is no way to directly test this assumption without the counterfactual data, which is missing by definition!
• With the backdoor criterion, you must have the correct DAG.

Assessing no unmeasured confounders

• Can do "placebo" tests, where $D_i$ cannot have an effect (lagged outcomes, etc.).
• DellaVigna and Kaplan (2007, QJE): effect of Fox News availability on Republican vote share.
  - Availability in 2000/2003 can't affect past vote shares.
• Unconfoundedness could still be violated even if you pass this test.
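The logic of a placebo test on a lagged outcome can be sketched with simulated data. Everything here is an assumption made for illustration: an unobserved $U$ that drives both the lagged outcome and treatment take-up, normal noise, and the helper name `diff_in_means`. Since treatment cannot affect a past outcome, a clearly nonzero placebo "effect" points to confounding.

```python
import random


def diff_in_means(d, y):
    """Mean of y among treated units minus mean of y among control units."""
    treated = [yi for di, yi in zip(d, y) if di == 1]
    control = [yi for di, yi in zip(d, y) if di == 0]
    return sum(treated) / len(treated) - sum(control) / len(control)


rng = random.Random(7)
n = 20000
u = [rng.gauss(0, 1) for _ in range(n)]          # unobserved confounder
y_lag = [ui + rng.gauss(0, 1) for ui in u]       # outcome measured before treatment

# Treatment that depends on U versus treatment assigned independently of U.
d_confounded = [1 if ui + rng.gauss(0, 1) > 0 else 0 for ui in u]
d_clean = [1 if rng.gauss(0, 1) > 0 else 0 for _ in range(n)]

placebo_confounded = diff_in_means(d_confounded, y_lag)   # far from zero
placebo_clean = diff_in_means(d_clean, y_lag)             # close to zero
```

A placebo estimate near zero is reassuring but not conclusive: a confounder that does not move the lagged outcome would go undetected.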

Alternatives to no unmeasured confounding

• Without explicit randomization, we need some way of identifying causal effects.
• No unmeasured confounders ≈ randomized experiment.
  - Identification results are very similar to experiments.
• With unmeasured confounding, are we doomed? Maybe not!
• Other approaches rely on finding plausibly exogenous variation in the assignment of $D_i$:
  - Instrumental variables (randomization + exclusion restriction)
  - Over-time variation (diff-in-diff, fixed effects)
  - Arbitrary thresholds for treatment assignment (RDD)

3 No unmeasured confounders and OLS

Justifying regression

• We know how randomized experiments imply that differences-in-means identify the ATE.
• In the next few weeks, we'll work through how no unmeasured confounding justifies a number of estimation strategies.
• Today, it's useful to walk through what no unmeasured confounding can buy us in a familiar setting: OLS.
  - We'll cover regression more formally later.

Constant effects set up

• Assume a constant effects setup:

$$Y_i(0) = \alpha + X_i'\beta + u_i$$
$$Y_i(1) = \alpha + \tau + X_i'\beta + u_i$$

• Constant effects because $Y_i(1) - Y_i(0) = \tau$ for all units.
• Use consistency to get the usual regression formula:

$$\begin{aligned}
Y_i &= Y_i(1) D_i + Y_i(0)(1 - D_i) \\
    &= Y_i(0) + \big(Y_i(1) - Y_i(0)\big) \cdot D_i \\
    &= \alpha + \tau \cdot D_i + X_i'\beta + u_i
\end{aligned}$$

• Does no unmeasured confounding help us identify the causal parameter $\tau$?

Regression on residuals

• First, take the residuals from regressions of the treatment and the outcome on the covariates:

$$\tilde{Y}_i = Y_i - \mathbb{E}[Y_i \mid X_i], \qquad \tilde{D}_i = D_i - \mathbb{E}[D_i \mid X_i]$$

• Running a regression of $\tilde{Y}_i$ on $\tilde{D}_i$ is equivalent to controlling for $X_i$:

$$Y_i = \alpha + \tau \cdot D_i + X_i'\beta + u_i$$
$$\tilde{Y}_i = \tau \cdot \tilde{D}_i + \tilde{u}_i$$

• Here $\tilde{u}_i = u_i - \mathbb{E}[u_i \mid X_i]$. The terms $\alpha$ and $X_i'\beta$ depend only on $X_i$, so they cancel when the conditional expectation is subtracted.

What does OLS estimate?

• Using the usual OLS theory, we can show that the probability limit of the OLS estimator of $\tau$ is:

$$\begin{aligned}
\operatorname{plim} \hat{\tau}_{OLS} &= \frac{\operatorname{Cov}(\tilde{D}_i, \tilde{Y}_i)}{\operatorname{Var}(\tilde{D}_i)} \\
&= \frac{\operatorname{Cov}(\tilde{D}_i, \tau \cdot \tilde{D}_i + \tilde{u}_i)}{\operatorname{Var}(\tilde{D}_i)} \\
&= \frac{\tau \cdot \operatorname{Cov}(\tilde{D}_i, \tilde{D}_i) + \operatorname{Cov}(\tilde{D}_i, \tilde{u}_i)}{\operatorname{Var}(\tilde{D}_i)} \\
&= \tau + \frac{\operatorname{Cov}(\tilde{D}_i, \tilde{u}_i)}{\operatorname{Var}(\tilde{D}_i)}
\end{aligned}$$

Key OLS assumption

$$\operatorname{plim} \hat{\tau}_{OLS} = \tau + \frac{\operatorname{Cov}(\tilde{D}_i, \tilde{u}_i)}{\operatorname{Var}(\tilde{D}_i)}$$

• Key identification comes from $\operatorname{Cov}(\tilde{D}_i, \tilde{u}_i) = 0$.
  - Conditional on $X_i$, there is no relationship between $D_i$ and $u_i$.
• Note that $u_i$ is a function of $X_i$ and $Y_i(d)$:
  - $u_i = Y_i(0) - \alpha - X_i'\beta$ when $D_i = 0$
  - $u_i = Y_i(1) - \alpha - \tau - X_i'\beta$ when $D_i = 1$
  - Conditional on $X_i$, the only variation in $u_i$ comes from $Y_i(d)$.
• No unmeasured confounding implies this assumption:

$$D_i \perp\!\!\!\perp \big(Y_i(1), Y_i(0)\big) \mid X_i \implies D_i \perp\!\!\!\perp u_i \mid X_i \implies \operatorname{Cov}(\tilde{D}_i, \tilde{u}_i) = 0$$
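A simulation can illustrate the claim that the residual-on-residual regression recovers $\tau$ when treatment depends only on $X_i$. This is a sketch under assumptions made here: a single covariate, the illustrative values $\alpha = 1$, $\tau = 2$, $\beta = 1.5$, normal errors, and linear regressions on $X_i$ standing in for the conditional expectations $\mathbb{E}[\cdot \mid X_i]$.

```python
import random


def slope(x, y):
    """OLS slope from regressing y on x with an intercept."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx


def residualize(v, x):
    """Residuals of v after a linear regression on x (stand-in for v - E[v | x])."""
    mx, mv = sum(x) / len(x), sum(v) / len(v)
    b = slope(x, v)
    return [(vi - mv) - b * (xi - mx) for vi, xi in zip(v, x)]


rng = random.Random(2015)
n, alpha, tau, beta = 20000, 1.0, 2.0, 1.5

x = [rng.gauss(0, 1) for _ in range(n)]
# Treatment depends on X and independent noise only, so D is independent of u given X.
d = [1.0 if xi + rng.gauss(0, 1) > 0 else 0.0 for xi in x]
u = [rng.gauss(0, 1) for _ in range(n)]
y = [alpha + tau * di + beta * xi + ui for di, xi, ui in zip(d, x, u)]

naive = slope(d, y)                                      # ignores X, so it is confounded
adjusted = slope(residualize(d, x), residualize(y, x))   # residual-on-residual regression
```

The unadjusted slope is well above the assumed $\tau = 2$ because treated units have higher $X_i$, while the residualized regression lands close to it.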

Omitted variable bias

• What happens when this is violated? Suppose that there is one omitted variable $L_i$ (residualized from $X_i$, written $\tilde{L}_i$):

$$\tilde{u}_i = \lambda \tilde{L}_i + \omega_i$$

• We'll assume that if we could measure $L_i$, then no unmeasured confounding would hold.
• This leads to inconsistency in the OLS estimator:

$$\operatorname{plim} \hat{\tau}_{OLS} = \tau + \lambda \frac{\operatorname{Cov}(\tilde{D}_i, \tilde{L}_i)}{\operatorname{Var}(\tilde{D}_i)}$$

• The bias here is two terms multiplied together:
  1. the coefficient on $L_i$ ($\lambda$)
  2. the coefficient on $D_i$ from a regression of $L_i$ on $D_i$, also controlling for $X_i$
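The omitted variable bias formula follows by substituting the decomposition of $\tilde{u}_i$ into the probability limit of the OLS estimator. The steps below are a sketch that reads the assumption "no unmeasured confounding would hold if $L_i$ were measured" as $\operatorname{Cov}(\tilde{D}_i, \omega_i) = 0$:

```latex
\begin{aligned}
\operatorname{plim}\hat{\tau}_{OLS}
  &= \tau + \frac{\operatorname{Cov}(\tilde{D}_i, \tilde{u}_i)}{\operatorname{Var}(\tilde{D}_i)} \\
  &= \tau + \frac{\operatorname{Cov}(\tilde{D}_i, \lambda\tilde{L}_i + \omega_i)}{\operatorname{Var}(\tilde{D}_i)} \\
  &= \tau + \lambda\frac{\operatorname{Cov}(\tilde{D}_i, \tilde{L}_i)}{\operatorname{Var}(\tilde{D}_i)}
     + \frac{\operatorname{Cov}(\tilde{D}_i, \omega_i)}{\operatorname{Var}(\tilde{D}_i)} \\
  &= \tau + \lambda\frac{\operatorname{Cov}(\tilde{D}_i, \tilde{L}_i)}{\operatorname{Var}(\tilde{D}_i)}
\end{aligned}
```

The bias vanishes if either $\lambda = 0$ (the omitted variable does not affect the outcome) or $\operatorname{Cov}(\tilde{D}_i, \tilde{L}_i) = 0$ (it is unrelated to treatment given $X_i$).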

4 Estimating causal effects under no unmeasured confounders

Basic approach to estimation

• Remember the usual approach to estimating the ATE with covariates.
• Stratification:
  - Stratify the units by the covariates.
  - Calculate the CATE within these strata.
• Standardization/direct adjustment:
  - Average the CATEs across the strata to get the ATE.
• How do we create strata when $X$ has continuous components?
  - If $X$ is discrete with only a few levels, we can use the exact values of $X$.
  - Otherwise, we may have to subclassify/coarsen the data.

Classic example: cigars/pipes versus cigarettes

• $D_i = 1$ for pipe/cigar smokers, $D_i = 0$ for cigarette smokers.
• $Y_i$ = death in the first year of follow-up.
• Naive positive effect: cigar/pipe smokers are more likely to die.
  - What's the confounder here? Age!
  - Pipe/cigar smokers are much older than cigarette smokers.
• Cochran's approach: stratify based on coarsened age.
  - Divide age into $k$ strata, $S_i \in \{s_1, s_2, \ldots, s_k\}$.
  - $s_1$ might be 18-25, $s_2$ might be 26-35, and so on.
  - Calculate the effect within strata and aggregate.
• Key assumption: no unmeasured confounders using the stratified version of age:

$$D_i \perp\!\!\!\perp \big(Y_i(0), Y_i(1)\big) \mid S_i$$

Stratification on the propensity score

• What about when $X$ has many dimensions?
• Curse of dimensionality: there will be very few, if any, units in a given stratum of $X_i$.
• Stratify on a low-dimensional summary, the propensity score:

$$e(x) = \mathbb{P}[D_i = 1 \mid X_i = x]$$

  - PS = unit's probability of being treated, conditional on $X_i$.
• For a particular unit, this is $e(X_i) = \mathbb{P}[D_i = 1 \mid X_i]$.
• Rosenbaum and Rubin (1983) showed that:

$$D_i \perp\!\!\!\perp \big(Y_i(0), Y_i(1)\big) \mid X_i \implies D_i \perp\!\!\!\perp \big(Y_i(0), Y_i(1)\big) \mid e(X_i)$$

  - Stratifying on $e(X_i)$ removes confounding just as stratifying on the full $X_i$ would.

Propensity score as balancing score

• The propensity score is actually a balancing score, which means that:

$$D_i \perp\!\!\!\perp X_i \mid e(X_i)$$

• Conditional on the propensity score, treatment is independent of the covariates.
  - Treatment status is said to be balanced: $f(X_i \mid D_i = 1, e(X_i)) = f(X_i \mid D_i = 0, e(X_i))$
• Of course, we have to know the true PS to have all these results work.

Estimating the propensity score

• Of course, in observational studies we don't know the propensity score.
• We would run a parametric model with parameters $\gamma$ to estimate the propensity scores:
  1. Estimate $\hat{\gamma}$.
  2. Create $\hat{e}_i = \Pr[D_i = 1 \mid X_i; \hat{\gamma}]$.
• For instance, in R, we could easily calculate the propensity scores using the glm function:

    pscores <- glm(treat ~ var1 + var2 + var3, data = mydata,
                   family = binomial())$fitted.values
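To show what the two steps (estimate $\hat{\gamma}$, then create $\hat{e}_i$) involve, here is a minimal sketch of a logistic propensity model fit by Newton's method. It is not what `glm` does internally in detail; the single covariate, the simulated data, the assumed true coefficients $(-0.5, 1.0)$, and the function names `sigmoid` and `fit_logit` are all illustrative assumptions.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def fit_logit(x, d, iters=25):
    """Newton's method for a logit model with an intercept and one covariate."""
    g0, g1 = 0.0, 0.0
    for _ in range(iters):
        s0 = s1 = h00 = h01 = h11 = 0.0
        for xi, di in zip(x, d):
            p = sigmoid(g0 + g1 * xi)
            r, w = di - p, p * (1.0 - p)
            s0 += r                # score for the intercept
            s1 += r * xi           # score for the slope
            h00 += w               # entries of the information matrix X'WX
            h01 += w * xi
            h11 += w * xi * xi
        det = h00 * h11 - h01 * h01
        g0 += (h11 * s0 - h01 * s1) / det   # Newton step: gamma += (X'WX)^{-1} score
        g1 += (h00 * s1 - h01 * s0) / det
    return g0, g1


rng = random.Random(1983)
n = 5000
x = [rng.gauss(0, 1) for _ in range(n)]
d = [1 if rng.random() < sigmoid(-0.5 + 1.0 * xi) else 0 for xi in x]

# Step 1: estimate gamma-hat. Step 2: create the fitted propensity scores.
gamma0_hat, gamma1_hat = fit_logit(x, d)
pscores = [sigmoid(gamma0_hat + gamma1_hat * xi) for xi in x]
```

Every fitted score lies strictly between 0 and 1, and with an intercept in the model the average fitted score matches the share of treated units.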

Propensity score specifics

• What variables do we include in the propensity score model?
  - Any set of variables that blocks all the backdoor paths from $D_i$ to $Y_i$.
• Check balance within strata of $\hat{e}_i$. Covariates should be balanced:

$$f(X_i \mid D_i = 1, \hat{e}_i) = f(X_i \mid D_i = 0, \hat{e}_i)$$

• Can also use automated/nonparametric tools for estimating $\hat{e}_i$:
  - Covariate Balancing Propensity Scores (Imai and Ratkovic)

Stratifying by the propensity score

• How will we use the propensity score?
  - Matching (next week), Weighting (two weeks), Regression (three weeks)
• Today: coarsening the propensity score and stratifying.
• Choose boundary points $0 = b_0 < b_1 < \ldots < b_{K-1} < b_K = 1$.
• Create block indicators:

$$B_i(k) = \begin{cases} 1 & \text{if } b_{k-1} < \hat{e}(X_i) < b_k \\ 0 & \text{otherwise} \end{cases}$$

• Calculate within-strata effect estimates:

$$\tau_k = \mathbb{E}[Y_i \mid D_i = 1, B_i(k) = 1] - \mathbb{E}[Y_i \mid D_i = 0, B_i(k) = 1]$$

Standardization/direct adjustment

• We calculated the CATEs for each stratum of the PS, $\tau_k$.
• We can use the law of iterated expectations to back out the ATE.
• Take the average of the CATEs over the distribution of $X$:

$$\tau = \sum_{k=1}^{K} \tau_k \, \mathbb{P}[B_i(k) = 1]$$

• Note that $\mathbb{P}[B_i(k) = 1]$ is just the proportion of units in block $k$:

$$\mathbb{P}[B_i(k) = 1] = \frac{\sum_{i=1}^{N} B_i(k)}{N}$$
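The block-and-average procedure can be sketched end to end on simulated data. The data-generating process is an assumption made for illustration: one uniform covariate, a known propensity score $e(x) = 0.2 + 0.6x$, a constant effect $\tau = 2$, and $K = 10$ equal-width blocks with half-open boundaries $b_{k-1} \le e < b_k$. Blocks lacking either treated or control units are skipped.

```python
import random

rng = random.Random(4)
n, K, tau = 20000, 10, 2.0

rows = []
for _ in range(n):
    x = rng.random()
    e = 0.2 + 0.6 * x                    # propensity score, assumed known here
    d = 1 if rng.random() < e else 0
    y = tau * d + 3.0 * x + rng.gauss(0, 1)
    rows.append((e, d, y))


def mean(values):
    return sum(values) / len(values)


# Naive difference in means, ignoring the covariate.
naive = mean([y for _, d, y in rows if d == 1]) - mean([y for _, d, y in rows if d == 0])

# Block k (0-based) holds units with k/K <= e < (k+1)/K.
blocks = {}
for e, d, y in rows:
    k = min(int(e * K), K - 1)
    blocks.setdefault(k, {0: [], 1: []})[d].append(y)

# Weight each within-block difference in means by the block's share of units.
weighted_sum, used = 0.0, 0
for groups in blocks.values():
    if groups[0] and groups[1]:
        n_k = len(groups[0]) + len(groups[1])
        tau_k = mean(groups[1]) - mean(groups[0])
        weighted_sum += tau_k * n_k
        used += n_k
ate_stratified = weighted_sum / used
```

The stratified estimate sits much closer to the assumed $\tau$ than the naive difference in means. Some bias remains because the covariate still varies within each block, and it shrinks as the blocks get narrower.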

5 Wrapping Up

Summary

• Defined observational studies.
• Defined confounding and assessed when no unmeasured confounding holds.
• Saw how no unmeasured confounding helps with OLS.
• Saw how to estimate causal effects under no unmeasured confounding using the propensity score.

Next few weeks

• Learn how to estimate causal effects under no unmeasured confounders via:
  - Matching
  - Weighting
  - Regression
• Then we move on to situations where no unmeasured confounders is violated.




Going to a superpopulation

bull From here on out wersquoll focus less on the finite populationmodel

Harder with (functionally) continuous covariatesbull Assume that each unit 119894 is drawn from an infinite

superpopulation implies that (119884119894(0) 119884119894(1) 119863119894 119883119894) are a draw from their

population joint distributionbull Potential outcomes are now typical random variables

120583119888(119909) = 120124[119884119894(0)|119883119894 = 119909] and 120583119905(119909) = 120124[119884119894(1)|119883119894 = 119909] 1205901113569119888 (119909) = 120141[119884119894(0)|119883119894 = 119909] and 1205901113569119905 (119909) = 120141[119884119894(1)|119883119894 = 119909] 120591 = 120124[120583119905(119909) minus 120583119888(119909)|119883119894 = 119909]

Assumptions in the superpopulation

bull With an infinite superpopulation worry less aboutconditioning on the entire sample

Units are now independent due to random sampling from aninfinite population

bull No unmeasured confoudning implies that

ℙ(119863119894 = 1|119884119894(0) 119884119894(1) 119883119894) = ℙ(119863119894 = 1|119883119894)

bull Or written using conditional independence

119863119894 ⟂⟂ 1114100119884119894(0) 119884119894(1)1114103|119883119894

bull Positivity can be written 0 lt ℙ(119863119894 = 1|119883119894 = 119909) lt 1 for all 119909 inthe support of 119883119894

2 Confounding

What is confounding

bull Confounding is the bias caused by common causes of thetreatment and outcome

Leads to ldquospurious correlationrdquobull In observational studies the goal is to avoid confounding

inherent in the databull Pervasive in the social sciences

effect of income on voting (confounding age) effect of job training program on employment (confounding

motivation) effect of political institutions on economic development

(confounding previous economic development)bull No unmeasured confounding assumes that wersquove measured all

sources of confounding

Big problem

bull How can we determine if no unmeasured confounding holds ifwe didnrsquot assign the treatment

bull Put differently What covariates do we need to condition on What covariates do we need to match on What covaraites do we need to include in our regressions

bull One way from the assumption itself ℙ[119863119894 = 1|119831 119832(1) 119832(0)] = ℙ[119863119894 = 1|119831] Include covariates such that conditional on them the

treatment assignment does not depend on the potentialoutcomes

bull Another way use DAGs and look at back-door paths

Backdoor paths and blocking paths

bull Backdoor path is a non-causal path from 119863 to 119884 Would remain if we removed any arrows pointing out of 119863

bull Backdoor paths between 119863 and 119884 common causes of 119863and 119884

119863

119883

119884

bull Here there is a backdoor path 119863 larr 119883 rarr 119884 where 119883 is acommon cause for the treatment and the outcome

Other types of confounding

[Figure: DAG with nodes $D$, $U$, $X$, and $Y$]

• $D$ is enrolling in a job training program.
• $Y$ is getting a job.
• $U$ is being motivated.
• $X$ is number of job applications sent out.
• Big assumption here: no arrow from $U$ to $Y$.

Other types of confounding

[Figure: DAG with nodes $D$, $U$, $X$, and $Y$]

• $D$ is exercise.
• $Y$ is having a disease.
• $U$ is lifestyle.
• $X$ is smoking.
• Big assumption here: no arrow from $U$ to $Y$.

What's the problem with backdoor paths?

[Figure: DAG with nodes $D$, $U$, $X$, and $Y$]

• A path is blocked if:
  1. we control for or stratify a non-collider on that path, OR
  2. we do not control for a collider.
• Unblocked backdoor paths lead to confounding.
• In the DAG here, if we condition on $X$, then the backdoor path is blocked.
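The blocking rule can be checked numerically. The sketch below is only an illustration under assumed values: it simulates a graph with $D \leftarrow U \rightarrow X \rightarrow Y$ and $D \rightarrow Y$, with no arrow from $U$ to $Y$, and compares a regression that leaves the backdoor path open with one that conditions on $X$. The coefficients, sample size, and variable names are all hypothetical.

```r
set.seed(2002)
n   <- 20000
tau <- 1                          # assumed true effect of D on Y

u <- rnorm(n)                     # unmeasured common cause (e.g., motivation)
x <- u + rnorm(n)                 # U -> X
d <- rbinom(n, 1, plogis(u))      # U -> D
y <- tau * d + 2 * x + rnorm(n)   # D -> Y and X -> Y; no U -> Y arrow

# Backdoor path D <- U -> X -> Y is open when X is ignored
naive <- coef(lm(y ~ d))["d"]

# Conditioning on the non-collider X blocks the path
adjusted <- coef(lm(y ~ d + x))["d"]

print(c(naive = unname(naive), adjusted = unname(adjusted)))
```

Under these assumptions the unadjusted coefficient should sit well above the assumed effect, while the adjusted one should be close to it, even though $U$ itself is never observed.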

Not all backdoor paths

[Figure: DAG with nodes $D$, $U_1$, $X$, and $Y$]

• Conditioning on the posttreatment covariates opens the non-causal path.
  - selection bias

M-bias

[Figure: DAG with nodes $D$, $U_1$, $U_2$, $X$, and $Y$]

• Not all backdoor paths induce confounding.
• This backdoor path is blocked by the collider $X_i$ that we don't control for.
• If we control for $X_i$, this opens the path and induces confounding.
  - Sometimes called M-bias.
• Controversial because of differing views on what to control for:
  - Rubin thinks that M-bias is a "mathematical curiosity" and we should control for all pretreatment variables.
  - Pearl and others think M-bias is a real threat.
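A small simulation can make the M-bias structure concrete. This is a sketch under assumed values, not a statement about how large M-bias is in practice: $U_1$ causes $D$ and $X$, $U_2$ causes $X$ and $Y$, and $X$ is a pretreatment collider with no effect of its own. All coefficients and names are hypothetical.

```r
set.seed(2002)
n   <- 20000
tau <- 1                              # assumed true effect of D on Y

u1 <- rnorm(n)                        # unmeasured cause of D and X
u2 <- rnorm(n)                        # unmeasured cause of X and Y
x  <- u1 + u2 + rnorm(n)              # pretreatment collider: U1 -> X <- U2
d  <- rbinom(n, 1, plogis(2 * u1))    # U1 -> D
y  <- tau * d + 2 * u2 + rnorm(n)     # D -> Y and U2 -> Y

# The path D <- U1 -> X <- U2 -> Y is blocked at the collider X
unadjusted <- coef(lm(y ~ d))["d"]

# Controlling for X opens that path
adjusted <- coef(lm(y ~ d + x))["d"]

print(c(unadjusted = unname(unadjusted), adjusted = unname(adjusted)))
```

With these deliberately strong (assumed) dependencies the adjusted estimate is badly off; with weaker ties between the $U$'s and $X$ the distortion shrinks, which is one way to read the disagreement between the two camps.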

Backdoor criterion

• Can we use a DAG to evaluate no unmeasured confounders?
• Pearl answered yes, with the backdoor criterion, which states that the effect of $D$ on $Y$ is identified if:
  1. No backdoor paths from $D$ to $Y$, OR
  2. Measured covariates are sufficient to block all backdoor paths from $D$ to $Y$.
• First is really only valid for randomized experiments.
• The backdoor criterion is fairly powerful. Tells us:
  - if there is confounding given this DAG,
  - if it is possible to remove the confounding, and
  - what variables to condition on to eliminate the confounding.

SWIGs

[Figure: SWIG with nodes $D \mid d$, $U$, $X$, and $Y(d)$]

• It's a little hard to see how the backdoor criterion implies no unmeasured confounders.
  - No potential outcomes on this graph!
• Richardson and Robins: Single World Intervention Graphs
  - Split $D$ node into natural value ($D$) and intervention value $d$.
  - Let all effects of $D$ take their potential value under intervention: $Y(d)$.
• Now can see: are $D$ and $Y(d)$ related?
  - $D \leftarrow U \rightarrow X \rightarrow Y(d)$ implies not independent.
  - Conditioning on $X$ blocks that backdoor path, so $D \perp\!\!\!\perp Y(d) \mid X$.

No unmeasured confounders is not testable

• No unmeasured confounding places no restrictions on the observed data.
  $$\underbrace{\big(Y_i(0) \mid D_i = 1, X_i\big)}_{\text{unobserved}} \overset{d}{=} \underbrace{\big(Y_i(0) \mid D_i = 0, X_i\big)}_{\text{observed}}$$
• Here $\overset{d}{=}$ means equal in distribution.
• No way to directly test this assumption without the counterfactual data, which is missing by definition!
• With backdoor criterion, you must have the correct DAG.

Assessing no unmeasured confounders

• Can do "placebo" tests, where $D_i$ cannot have an effect (lagged outcomes, etc.).
• DellaVigna and Kaplan (2007, QJE): effect of Fox News availability on Republican vote share.
  - Availability in 2000/2003 can't affect past vote shares.
• Unconfoundedness could still be violated even if you pass this test!

Alternatives to no unmeasured confounding

• Without explicit randomization, we need some way of identifying causal effects.
• No unmeasured confounders $\approx$ randomized experiment.
  - Identification results very similar to experiments.
• With unmeasured confounding, are we doomed? Maybe not!
• Other approaches rely on finding plausibly exogenous variation in assignment of $D_i$:
  - Instrumental variables (randomization + exclusion restriction)
  - Over-time variation (diff-in-diff, fixed effects)
  - Arbitrary thresholds for treatment assignment (RDD)

3. No unmeasured confounders and OLS

Justifying regression

• We know how randomized experiments imply that differences-in-means identify the ATE.
• In the next few weeks, we'll work through how no unmeasured confounding justifies a number of estimation strategies.
• Today, it's useful to walk through what no unmeasured confounding can buy us in a familiar setting: OLS.
  - We'll cover regression more formally later.

Constant effects set up

• Assume a constant effects setup:
  $$Y_i(0) = \alpha + X_i'\beta + u_i$$
  $$Y_i(1) = \alpha + \tau + X_i'\beta + u_i$$
• Constant effects because $Y_i(1) - Y_i(0) = \tau$ for all units.
• Use consistency to get the usual regression formula:
  $$Y_i = Y_i(1) D_i + Y_i(0)(1 - D_i)$$
  $$\phantom{Y_i} = Y_i(0) + \big(Y_i(1) - Y_i(0)\big) \cdot D_i$$
  $$\phantom{Y_i} = \alpha + \tau \cdot D_i + X_i'\beta + u_i$$
• Does no unmeasured confounding help us identify the causal parameter $\tau$?

Regression on residuals

• First, estimate the residuals of regression of the treatment and outcome on the covariates:
  $$\tilde{Y}_i = Y_i - \mathbb{E}[Y_i \mid X_i]$$
  $$\tilde{D}_i = D_i - \mathbb{E}[D_i \mid X_i]$$
• Running a regression of $\tilde{Y}_i$ on $\tilde{D}_i$ is equivalent to controlling for $X_i$:
  $$Y_i = \alpha + \tau \cdot D_i + X_i'\beta + u_i$$
  $$\tilde{Y}_i = \alpha + \tau \cdot \tilde{D}_i + \tilde{u}_i$$
• Here, $\tilde{u}_i = u_i - \mathbb{E}[u_i \mid X_i]$.

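What does OLS estimate?

• Using the usual OLS theory, we can show that the probability limit of the OLS estimator of $\tau$ is:
  $$\operatorname{plim} \hat{\tau}_{OLS} = \frac{\operatorname{Cov}(\tilde{D}_i, \tilde{Y}_i)}{\operatorname{Var}(\tilde{D}_i)}$$
  $$= \frac{\operatorname{Cov}(\tilde{D}_i, \alpha + \tau \cdot \tilde{D}_i + \tilde{u}_i)}{\operatorname{Var}(\tilde{D}_i)}$$
  $$= \frac{\tau \cdot \operatorname{Cov}(\tilde{D}_i, \tilde{D}_i) + \operatorname{Cov}(\tilde{D}_i, \tilde{u}_i)}{\operatorname{Var}(\tilde{D}_i)}$$
  $$= \tau + \frac{\operatorname{Cov}(\tilde{D}_i, \tilde{u}_i)}{\operatorname{Var}(\tilde{D}_i)}$$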

Key OLS assumption

$$\operatorname{plim} \hat{\tau}_{OLS} = \tau + \frac{\operatorname{Cov}(\tilde{D}_i, \tilde{u}_i)}{\operatorname{Var}(\tilde{D}_i)}$$

• Key identification comes from $\operatorname{Cov}(\tilde{D}_i, \tilde{u}_i) = 0$.
  - Conditional on $X_i$, no relationship between $D_i$ and $u_i$.
• Note: $u_i$ is a function of $X_i$ and $Y_i(d)$:
  - $u_i = Y_i(0) - \alpha - X_i'\beta$ when $D_i = 0$
  - $u_i = Y_i(1) - \alpha - \tau - X_i'\beta$ when $D_i = 1$
  - Conditional on $X_i$, the only variation in $u_i$ comes from $Y_i(d)$.
• No unmeasured confounding implies this assumption:
  $$D_i \perp\!\!\!\perp \big(Y_i(1), Y_i(0)\big) \mid X_i \implies D_i \perp\!\!\!\perp u_i \mid X_i \implies \operatorname{Cov}(\tilde{D}_i, \tilde{u}_i) = 0$$
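The residual-regression argument can be illustrated with simulated data. The sketch below assumes a constant-effects model with made-up parameter values and uses linear regressions on the covariates as a stand-in for the conditional expectations $\mathbb{E}[\cdot \mid X_i]$; that substitution is an assumption of the example, chosen because it makes the equivalence with the full regression exact in-sample.

```r
set.seed(2002)
n     <- 20000
alpha <- 0.5; tau <- 2; beta <- c(1, -1)    # assumed parameter values

x1 <- rnorm(n); x2 <- rnorm(n)
d  <- rbinom(n, 1, plogis(0.8 * x1 - 0.5 * x2))  # treatment depends only on X
u  <- rnorm(n)                                   # independent of D given X
y  <- alpha + tau * d + beta[1] * x1 + beta[2] * x2 + u

# Regression that controls for X directly
full <- coef(lm(y ~ d + x1 + x2))["d"]

# Residualize outcome and treatment on X, then regress residual on residual
y_tilde <- resid(lm(y ~ x1 + x2))
d_tilde <- resid(lm(d ~ x1 + x2))
fwl     <- coef(lm(y_tilde ~ d_tilde))["d_tilde"]

# Same quantity written as Cov(D~, Y~) / Var(D~)
ratio <- cov(d_tilde, y_tilde) / var(d_tilde)

print(c(full = unname(full), fwl = unname(fwl), ratio = ratio))
```

The three numbers agree up to floating-point error, and because treatment here depends only on the measured covariates, they should all be close to the assumed $\tau$.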

Omitted variable bias

• What happens when this is violated? Suppose that there is one omitted variable (residualized from $X_i$):
  $$\tilde{u}_i = \lambda \tilde{L}_i + \omega_i$$
• We'll assume that if we could measure $L_i$, then no unmeasured confounding would hold.
• Leads to inconsistency in the OLS estimator:
  $$\operatorname{plim} \hat{\tau}_{OLS} = \tau + \lambda \frac{\operatorname{Cov}(\tilde{D}_i, \tilde{L}_i)}{\operatorname{Var}(\tilde{D}_i)}$$
• Bias here is two terms multiplied together:
  1. the coefficient on $L_i$ ($\lambda$)
  2. the coefficient on $D_i$ from a regression of $L_i$ on $D_i$, also controlling for $X_i$
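The bias formula has an exact in-sample counterpart that is easy to verify. The sketch below simulates an omitted confounder `l` under assumed coefficients (all hypothetical) and checks that the gap between the "short" regression, which omits `l`, and the "long" regression, which includes it, equals the estimated coefficient on `l` times the coefficient on `d` from regressing `l` on `d` and `x`.

```r
set.seed(2002)
n      <- 50000
tau    <- 2       # assumed treatment effect
lambda <- 1.5     # assumed coefficient on the omitted variable L

x <- rnorm(n)
l <- 0.5 * x + rnorm(n)                  # confounder we pretend not to observe
d <- rbinom(n, 1, plogis(x + l))         # treatment depends on X and L
y <- 1 + tau * d + x + lambda * l + rnorm(n)

short <- coef(lm(y ~ d + x))["d"]        # omits L
long  <- coef(lm(y ~ d + x + l))         # includes L

# Coefficient on D from regressing L on D and X
delta <- coef(lm(l ~ d + x))["d"]

implied_bias <- long["l"] * delta
print(c(short = unname(short), long = unname(long["d"]),
        gap = unname(short - long["d"]), implied = unname(implied_bias)))
```

In this setup the gap and the implied bias match to numerical precision, and the short regression overstates the assumed $\tau$ because `l` raises both treatment uptake and the outcome.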

4. Estimating causal effects under no unmeasured confounders

Basic approach to estimation

• Remember the usual approach to estimating the ATE with covariates.
• Stratification:
  - Stratify the units by the covariates.
  - Calculate CATE within these strata.
• Standardization/direct adjustment:
  - Average the CATEs across the strata to get ATE.
• How to create strata when $X$ has continuous components?
  - If $X$ is discrete with only a few levels, can use the exact values of $X$.
  - Otherwise, we may have to subclassify/coarsen the data.

Classic example: cigars/pipes versus cigarettes

• $D_i = 1$ for pipe/cigar smokers, $D_i = 0$ for cigarette smokers.
• $Y_i$ = death in the first year of follow-up.
• Naive positive effect: cigar/pipe smokers more likely to die.
  - What's the confounder here? Age!
  - Pipe/cigar smokers much older than cigarette smokers.
• Cochran's approach: stratify based on coarsened age.
  - Divide age into $k$ strata, $S_i \in \{s_1, s_2, \ldots, s_k\}$.
  - $s_1$ might be 18-25, $s_2$ might be 26-35, and so on.
  - Calculate effect within strata and aggregate.
• Key assumption: no unmeasured confounders using stratified version of age:
  $$D_i \perp\!\!\!\perp \big(Y_i(0), Y_i(1)\big) \mid S_i$$

Stratification on the propensity score

• What about when $X$ has many dimensions?
• Curse of dimensionality: there will be very few, if any, units in a given stratum of $X_i$.
• Stratify on a low-dimensional summary, the propensity score:
  $$e(x) = \mathbb{P}[D_i = 1 \mid X_i = x]$$
  - PS = unit's probability of being treated, conditional on $X_i$.
• For a particular unit, this is $e(X_i) = \mathbb{P}[D_i = 1 \mid X_i]$.
• Rosenbaum and Rubin (1983) showed that:
  $$D_i \perp\!\!\!\perp \big(Y_i(0), Y_i(1)\big) \mid X_i \implies D_i \perp\!\!\!\perp \big(Y_i(0), Y_i(1)\big) \mid e(X_i)$$
  - So stratifying on $e_i$ is the same as stratifying on the full $X_i$.

Propensity score as balancing score

• The propensity score is actually a balancing score, which means that:
  $$D_i \perp\!\!\!\perp X_i \mid e(X_i)$$
• Conditional on the propensity score, treatment is independent of the covariates.
  - Treatment status is said to be balanced: $f(X_i \mid D_i = 1, e(X_i)) = f(X_i \mid D_i = 0, e(X_i))$
• Of course, we have to know the true PS to have all these results work.

Estimating the propensity score

• Of course, in observational studies we don't know the propensity score.
• We would run a parametric model with parameters $\gamma$ to estimate the propensity scores:
  1. Estimate $\hat{\gamma}$.
  2. Create $\hat{e}_i = \Pr[D_i = 1 \mid X_i; \hat{\gamma}]$.
• For instance, in R, we could easily calculate the propensity scores using the glm function:

    # Logistic regression of treatment on covariates; fitted values are the estimated propensity scores
    pscores <- glm(treat ~ var1 + var2 + var3, data = mydata,
                   family = binomial())$fitted.values

Propensity score specifics

• What variables do we include in the propensity score model?
  - Any set of variables that blocks all the backdoor paths from $D_i$ to $Y_i$.
• Check balance within strata of $\hat{e}_i$. Covariates should be balanced:
  $$f(X_i \mid D_i = 1, \hat{e}_i) = f(X_i \mid D_i = 0, \hat{e}_i)$$
• Can also use automated/nonparametric tools for estimating $\hat{e}_i$:
  - Covariate Balancing Propensity Scores (Imai and Ratkovic)
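One simple way to carry out the balance check is to compare covariate means between treated and control units inside each propensity score stratum. The sketch below does this for a single simulated covariate; the data-generating process, the use of quintile strata, and the choice of a raw difference in means as the balance measure are all assumptions of the example rather than recommendations from the lecture.

```r
set.seed(2002)
n     <- 20000
x1    <- rnorm(n)
treat <- rbinom(n, 1, plogis(x1))          # treatment more likely for large x1

# Estimated propensity score from a logistic regression
pscore <- glm(treat ~ x1, family = binomial())$fitted.values

# Five strata defined by quintiles of the estimated score
block <- cut(pscore, quantile(pscore, seq(0, 1, by = 0.2)),
             include.lowest = TRUE)

# Covariate imbalance in the full sample
raw_diff <- mean(x1[treat == 1]) - mean(x1[treat == 0])

# Covariate imbalance within each stratum
block_diff <- tapply(seq_len(n), block, function(i) {
  mean(x1[i][treat[i] == 1]) - mean(x1[i][treat[i] == 0])
})

print(raw_diff)
print(round(block_diff, 3))
```

Within-stratum differences should be much smaller than the raw difference; differences that remain large in some stratum suggest finer strata or a richer propensity score model.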

Stratifying by the propensity score

• How will we use the propensity score?
  - Matching (next week), Weighting (two weeks), Regression (three weeks)
• Today: coarsening the propensity score and stratifying.
• Choose boundary points: $0 = b_0 < b_1 < \ldots < b_{K-1} < b_K = 1$
• Create block indicators:
  $$B_i(k) = \begin{cases} 1 & \text{if } b_{k-1} < \hat{e}(X_i) < b_k \\ 0 & \text{otherwise} \end{cases}$$
• Calculate within-strata effect estimates:
  $$\tau_k = \mathbb{E}[Y_i \mid D_i = 1, B_i(k) = 1] - \mathbb{E}[Y_i \mid D_i = 0, B_i(k) = 1]$$

Standardization/direct adjustment

• We calculated the CATEs for each stratum of the PS: $\tau_k$.
• We can use the law of iterated expectations to back out the ATE.
• Take the average of the CATEs over the distribution of $X$:
  $$\tau = \sum_{k=1}^{K} \tau_k \, \mathbb{P}[B_i(k) = 1]$$
• Note that $\mathbb{P}[B_i(k) = 1]$ is just the proportion of units in block $k$:
  $$\mathbb{P}[B_i(k) = 1] = \frac{\sum_{i=1}^{N} B_i(k)}{N}$$
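Putting the pieces together, stratification on the estimated propensity score followed by standardization might look like the sketch below. It runs on simulated data with an assumed constant effect, and it picks block boundaries at the deciles of the estimated score; that boundary rule, the variable names, and the data-generating process are illustrative assumptions, not part of the lecture.

```r
set.seed(2002)
n   <- 20000
tau <- 2                                         # assumed true ATE

x1 <- rnorm(n); x2 <- rnorm(n)
d  <- rbinom(n, 1, plogis(0.8 * x1 - 0.5 * x2))  # no unmeasured confounding given X
y  <- 1 + tau * d + 2 * x1 - x2 + rnorm(n)
mydata <- data.frame(y = y, treat = d, x1 = x1, x2 = x2)

# Step 1: estimate the propensity score
mydata$pscore <- glm(treat ~ x1 + x2, data = mydata,
                     family = binomial())$fitted.values

# Step 2: coarsen into K = 10 blocks (boundaries at deciles, an arbitrary choice)
breaks <- quantile(mydata$pscore, probs = seq(0, 1, by = 0.1))
mydata$block <- cut(mydata$pscore, breaks = breaks, include.lowest = TRUE)

# Step 3: within-block differences in means (the tau_k)
tau_k <- sapply(split(mydata, mydata$block), function(b) {
  mean(b$y[b$treat == 1]) - mean(b$y[b$treat == 0])
})

# Step 4: weight each tau_k by the share of units in its block
w_k     <- as.numeric(table(mydata$block)) / n
ate_hat <- sum(tau_k * w_k)

naive <- mean(y[d == 1]) - mean(y[d == 0])
print(c(naive = naive, stratified = ate_hat))
```

Because the blocks are coarse, some confounding remains within each block, so the stratified estimate should land near, but not exactly on, the assumed effect, while the naive difference in means is far off.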

5. Wrapping Up

Summary

• Defined observational studies.
• Defined confounding and assessed when no unmeasured confounding holds.
• Saw how no unmeasured confounding helps with OLS.
• Saw how to estimate causal effects under no unmeasured confounding using the propensity score.

Next few weeks

• Learn how to estimate causal effects under no unmeasured confounders via:
  - Matching
  - Weighting
  - Regression
• Then we move on to situations where no unmeasured confounders is violated.

Page 11: Gov 2002: 4. Observational Studies and Confounding · 2019-07-23 · 2. the coefficient of regression of𝐷𝑖on 𝐿𝑖also controlling for ... Basicapproachtoestimation • Remember

2 Confounding

What is confounding

bull Confounding is the bias caused by common causes of thetreatment and outcome

Leads to ldquospurious correlationrdquobull In observational studies the goal is to avoid confounding

inherent in the databull Pervasive in the social sciences

effect of income on voting (confounding age) effect of job training program on employment (confounding

motivation) effect of political institutions on economic development

(confounding previous economic development)bull No unmeasured confounding assumes that wersquove measured all

sources of confounding

Big problem

bull How can we determine if no unmeasured confounding holds ifwe didnrsquot assign the treatment

bull Put differently What covariates do we need to condition on What covariates do we need to match on What covaraites do we need to include in our regressions

bull One way from the assumption itself ℙ[119863119894 = 1|119831 119832(1) 119832(0)] = ℙ[119863119894 = 1|119831] Include covariates such that conditional on them the

treatment assignment does not depend on the potentialoutcomes

bull Another way use DAGs and look at back-door paths

Backdoor paths and blocking paths

bull Backdoor path is a non-causal path from 119863 to 119884 Would remain if we removed any arrows pointing out of 119863

bull Backdoor paths between 119863 and 119884 common causes of 119863and 119884

119863

119883

119884

bull Here there is a backdoor path 119863 larr 119883 rarr 119884 where 119883 is acommon cause for the treatment and the outcome

Other types of confounding

119863

119880 119883

119884

bull 119863 is enrolling in a job training programbull 119884 is getting a jobbull 119880 is being motivatedbull 119883 is number of job applications sent outbull Big assumption here no arrow from 119880 to 119884

Other types of confounding

119863

119880 119883

119884

bull 119863 is exercisebull 119884 is having a diseasebull 119880 is lifestylebull 119883 is smokingbull Big assumption here no arrow from 119880 to 119884

Whatrsquos the problem with backdoorpaths

119863

119880 119883

119884

bull A path is blocked if1 we control for or stratify a non-collider on that path OR2 we do not control for a collider

bull Unblocked backdoor paths confoundingbull In the DAG here if we condition on 119883 then the backdoor

path is blocked

Not all backdoor paths

119863

1198801113568119883119883

119884

bull Conditioning on the posttreatment covariates opens thenon-causal path

selection bias

M-bias

119863

1198801113568 1198801113569119883119883

119884

bull Not all backdoor paths induce confoundingbull This backdoor path is blocked by the collider 119883119894 that we donrsquot

control forbull If we control for 119883119894 opens the path and induces

confounding Sometimes called M-bias

bull Controversial because of differing views on what to control for Rubin thinks that M-bias is a ldquomathematical curiosityrdquo and we

should control for all pretreatment variables Pearl and others think M-bias is a real threat

Backdoor criterion

• Can we use a DAG to evaluate no unmeasured confounders?
• Pearl answered yes, with the backdoor criterion, which states that the effect of $D$ on $Y$ is identified if:
  1. there are no backdoor paths from $D$ to $Y$, OR
  2. measured covariates are sufficient to block all backdoor paths from $D$ to $Y$.
• The first is really only valid for randomized experiments.
• The backdoor criterion is fairly powerful. It tells us:
  - if there is confounding given this DAG,
  - if it is possible to remove the confounding, and
  - what variables to condition on to eliminate the confounding.

SWIGs

[Figure: SWIG with nodes $D \mid d$, $U$, $X$, and $Y(d)$]

• It's a little hard to see how the backdoor criterion implies no unmeasured confounders.
  - No potential outcomes on this graph!
• Richardson and Robins: Single World Intervention Graphs
  - Split the $D$ node into natural value ($D$) and intervention value $d$.
  - Let all effects of $D$ take their potential value under intervention: $Y(d)$.
• Now can see: are $D$ and $Y(d)$ related?
  - $D \leftarrow U \rightarrow X \rightarrow Y(d)$ implies not independent.
  - Conditioning on $X$ blocks that backdoor path ⟹ $D \perp\!\!\!\perp Y(d) \mid X$

No unmeasured confounders is not testable

• No unmeasured confounding places no restrictions on the observed data.

$$\underbrace{\bigl(Y_i(0) \mid D_i = 1, X_i\bigr)}_{\text{unobserved}} \;\overset{d}{=}\; \underbrace{\bigl(Y_i(0) \mid D_i = 0, X_i\bigr)}_{\text{observed}}$$

• Here $\overset{d}{=}$ means equal in distribution.
• No way to directly test this assumption without the counterfactual data, which is missing by definition!
• With the backdoor criterion, you must have the correct DAG.

Assessing no unmeasured confounders

• Can do "placebo" tests, where $D_i$ cannot have an effect (lagged outcomes, etc.)
• Della Vigna and Kaplan (2007, QJE): effect of Fox News availability on Republican vote share
  - Availability in 2000/2003 can't affect past vote shares.
• Unconfoundedness could still be violated even if you pass this test!
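The logic of a placebo test on a lagged outcome can be sketched with simulated data. Everything here is an illustrative assumption (variable names, coefficients, and the presence of an unmeasured confounder `u`); it is not a reconstruction of the Fox News study.

```r
# Placebo test sketch: the treatment cannot cause an outcome measured before it.
set.seed(2002)
n     <- 5000
u     <- rnorm(n)                             # unmeasured confounder
x     <- rnorm(n)                             # measured covariate
y_lag <- 0.5 * x + u + rnorm(n)               # outcome measured before treatment
d     <- rbinom(n, 1, plogis(0.5 * x + u))    # treatment assigned afterwards

# Regress the lagged outcome on treatment, controlling for X.
# A coefficient far from zero cannot be a causal effect, so it flags confounding.
placebo <- summary(lm(y_lag ~ d + x))$coefficients["d", ]
placebo[c("Estimate", "Pr(>|t|)")]
```

Because `u` drives both `d` and `y_lag` here, the placebo coefficient should be clearly nonzero. The converse does not hold: a placebo estimate near zero is reassuring but does not establish unconfoundedness.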

Alternatives to no unmeasured confounding

• Without explicit randomization, we need some way of identifying causal effects.
• No unmeasured confounders ≈ randomized experiment.
  - Identification results very similar to experiments.
• With unmeasured confounding, are we doomed? Maybe not.
• Other approaches rely on finding plausibly exogenous variation in assignment of $D_i$:
  - Instrumental variables (randomization + exclusion restriction)
  - Over-time variation (diff-in-diff, fixed effects)
  - Arbitrary thresholds for treatment assignment (RDD)

3 No unmeasured confounders and OLS

Justifying regression

• We know how randomized experiments imply that differences-in-means identify the ATE.
• In the next few weeks, we'll work through how no unmeasured confounding justifies a number of estimation strategies.
• Today, it's useful to walk through what no unmeasured confounding can buy us in a familiar setting: OLS.
  - We'll cover regression more formally later.

Constant effects set up

• Assume a constant effects setup:

$$Y_i(0) = \alpha + X_i'\beta + u_i$$
$$Y_i(1) = \alpha + \tau + X_i'\beta + u_i$$

• Constant effects because $Y_i(1) - Y_i(0) = \tau$ for all units.
• Use consistency to get the usual regression formula:

$$\begin{aligned}
Y_i &= Y_i(1) D_i + Y_i(0)(1 - D_i) \\
    &= Y_i(0) + \bigl(Y_i(1) - Y_i(0)\bigr) \cdot D_i \\
    &= \alpha + \tau \cdot D_i + X_i'\beta + u_i
\end{aligned}$$

• Does no unmeasured confounding help us identify the causal parameter $\tau$?

Regression on residuals

• First, estimate the residuals of regression of the treatment and outcome on the covariates:

$$\tilde{Y}_i = Y_i - \mathbb{E}[Y_i \mid X_i], \qquad \tilde{D}_i = D_i - \mathbb{E}[D_i \mid X_i]$$

• Running a regression of $\tilde{Y}_i$ on $\tilde{D}_i$ is equivalent to controlling for $X_i$:

$$Y_i = \alpha + \tau \cdot D_i + X_i'\beta + u_i$$
$$\tilde{Y}_i = \alpha + \tau \cdot \tilde{D}_i + \tilde{u}_i$$

• Here $\tilde{u}_i = u_i - \mathbb{E}[u_i \mid X_i]$.
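The equivalence between the residual regression and the regression that controls for $X_i$ can be checked numerically. The sketch below uses linear regressions on $X_i$ as the sample analogue of the conditional expectations, which is an assumption of the illustration, as are the simulated coefficients ($\alpha = 2$, $\tau = 1$, $\beta = 0.5$).

```r
# Regression on residuals versus regression with controls (simulated data).
set.seed(2002)
n <- 2000
x <- rnorm(n)
d <- rbinom(n, 1, plogis(x))
y <- 2 + 1 * d + 0.5 * x + rnorm(n)

# Residualize outcome and treatment on the covariate (linear projection).
y_tilde <- resid(lm(y ~ x))
d_tilde <- resid(lm(d ~ x))

# Coefficient from the residual regression and from the full regression.
tau_resid <- unname(coef(lm(y_tilde ~ d_tilde))["d_tilde"])
tau_full  <- unname(coef(lm(y ~ d + x))["d"])

all.equal(tau_resid, tau_full)
```

The two coefficients agree up to numerical tolerance, which is why the rest of the derivation can work with the residualized variables.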

What does OLS estimate

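• Using the usual OLS theory, we can show that the probability limit of the OLS estimator of $\tau$ is:

$$\begin{aligned}
\operatorname{plim} \hat{\tau}_{OLS} &= \frac{\operatorname{Cov}(\tilde{D}_i, \tilde{Y}_i)}{\operatorname{Var}(\tilde{D}_i)} \\
&= \frac{\operatorname{Cov}(\tilde{D}_i, \alpha + \tau \cdot \tilde{D}_i + \tilde{u}_i)}{\operatorname{Var}(\tilde{D}_i)} \\
&= \frac{\tau \cdot \operatorname{Cov}(\tilde{D}_i, \tilde{D}_i) + \operatorname{Cov}(\tilde{D}_i, \tilde{u}_i)}{\operatorname{Var}(\tilde{D}_i)} \\
&= \tau + \frac{\operatorname{Cov}(\tilde{D}_i, \tilde{u}_i)}{\operatorname{Var}(\tilde{D}_i)}
\end{aligned}$$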

Key OLS assumption

$$\operatorname{plim} \hat{\tau}_{OLS} = \tau + \frac{\operatorname{Cov}(\tilde{D}_i, \tilde{u}_i)}{\operatorname{Var}(\tilde{D}_i)}$$

• Key identification comes from $\operatorname{Cov}(\tilde{D}_i, \tilde{u}_i) = 0$.
  - Conditional on $X_i$, no relationship between $D_i$ and $u_i$.
• Note: $u_i$ is a function of $X_i$ and $Y_i(d)$:
  - $u_i = Y_i(0) - \alpha - X_i'\beta$ when $D_i = 0$
  - $u_i = Y_i(1) - \alpha - \tau - X_i'\beta$ when $D_i = 1$
  - Conditional on $X_i$, the only variation in $u_i$ comes from $Y_i(d)$.
• No unmeasured confounding implies this assumption:

$$D_i \perp\!\!\!\perp \bigl(Y_i(1), Y_i(0)\bigr) \mid X_i \implies D_i \perp\!\!\!\perp u_i \mid X_i \implies \operatorname{Cov}(\tilde{D}_i, \tilde{u}_i) = 0$$

Omitted variable bias

• What happens when this is violated? Suppose that there is one omitted variable (residualized from $X_i$):

$$\tilde{u}_i = \lambda \tilde{L}_i + \omega_i$$

• We'll assume that if we could measure $L_i$, then no unmeasured confounding would hold.
• Leads to inconsistency in the OLS estimator:

$$\operatorname{plim} \hat{\tau}_{OLS} = \tau + \lambda \frac{\operatorname{Cov}(\tilde{D}_i, \tilde{L}_i)}{\operatorname{Var}(\tilde{D}_i)}$$

• The bias here is two terms multiplied together:
  1. the coefficient on $L_i$ ($\lambda$)
  2. the coefficient on $D_i$ from a regression of $L_i$ on $D_i$, also controlling for $X_i$
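The omitted variable bias formula can be verified on simulated data. In the sketch below the variable names and coefficients ($\tau = 1$, $\lambda = 2$) are illustrative assumptions. The second bias term is computed as the coefficient on $D_i$ from a regression of $L_i$ on $D_i$ and $X_i$, which is the sample analogue of $\operatorname{Cov}(\tilde{D}_i, \tilde{L}_i) / \operatorname{Var}(\tilde{D}_i)$.

```r
# Omitted variable bias: short regression versus long regression.
set.seed(2002)
n <- 5000
x <- rnorm(n)
L <- 0.5 * x + rnorm(n)                     # the variable we fail to measure
d <- rbinom(n, 1, plogis(0.5 * x + L))      # treatment depends on X and L
y <- 1 * d + 0.5 * x + 2 * L + rnorm(n)     # tau = 1, lambda = 2

short  <- unname(coef(lm(y ~ d + x))["d"])  # regression that omits L
long   <- coef(lm(y ~ d + x + L))           # regression that includes L
l_on_d <- unname(coef(lm(L ~ d + x))["d"])  # L on D, controlling for X

# Long-regression effect plus (coefficient on L) times (L-on-D coefficient).
implied <- unname(long["d"] + long["L"] * l_on_d)

round(c(short = short, implied = implied, long = unname(long["d"])), 3)
```

In-sample, `short` and `implied` coincide up to numerical tolerance, and both should sit well above the long-regression estimate, which is close to the true effect of 1.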

4 Estimating causal effects under no unmeasured confounders

Basic approach to estimation

• Remember the usual approach to estimating the ATE with covariates:
• Stratification:
  - Stratify the units by the covariates.
  - Calculate the CATE within these strata.
• Standardization/direct adjustment:
  - Average the CATEs across the strata to get the ATE.
• How to create strata when $X$ has continuous components?
  - If $X$ is discrete with only a few levels, can use the exact values of $X$.
  - Otherwise, we may have to subclassify/coarsen the data.

Classic example: cigars/pipes versus cigarettes

• $D_i = 1$ for pipe/cigar smokers, $D_i = 0$ for cigarette smokers
• $Y_i$ = death in the first year of follow-up
• Naive positive effect: cigar/pipe smokers more likely to die.
  - What's the confounder here? Age! Pipe/cigar smokers are much older than cigarette smokers.
• Cochran's approach: stratify based on coarsened age.
  - Divide age into $k$ strata, $S_i \in \{s_1, s_2, \ldots, s_k\}$.
  - $s_1$ might be 18-25, $s_2$ might be 26-35, and so on.
  - Calculate the effect within strata and aggregate.
• Key assumption: no unmeasured confounders using the stratified version of age:

$$D_i \perp\!\!\!\perp \bigl(Y_i(0), Y_i(1)\bigr) \mid S_i$$
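The age-stratification idea can be sketched on simulated data. The numbers below are invented for illustration and are not Cochran's data: treatment has no effect on death by construction, older units are more likely to be treated, and the age cut points are assumptions.

```r
# Stratifying on coarsened age (simulated data; true effect is zero).
set.seed(2002)
n   <- 20000
age <- runif(n, 18, 80)
d   <- rbinom(n, 1, plogis((age - 50) / 10))    # older units more often treated
y   <- rbinom(n, 1, plogis(-6 + 0.07 * age))    # death depends on age only
s   <- cut(age, breaks = c(18, 25, 35, 45, 55, 65, 80), include.lowest = TRUE)

# Naive comparison ignores age.
naive <- mean(y[d == 1]) - mean(y[d == 0])

# Difference in means within each age stratum.
diffs <- tapply(seq_len(n), s, function(idx) {
  mean(y[idx][d[idx] == 1]) - mean(y[idx][d[idx] == 0])
})

# Aggregate using each stratum's share of the sample.
shares     <- as.numeric(table(s)) / n
stratified <- sum(as.numeric(diffs) * shares)

round(c(naive = naive, stratified = stratified), 3)
```

The naive difference should be clearly positive while the stratified estimate should be much closer to zero. Some bias can remain because age still varies within each coarsened stratum, which is why the key assumption is stated in terms of $S_i$ rather than exact age.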

Stratification on the propensity score

• What about when $X$ has many dimensions?
• Curse of dimensionality: there will be very few, if any, units in a given stratum of $X_i$.
• Stratify on a low-dimensional summary, the propensity score:

$$e(x) = \mathbb{P}[D_i = 1 \mid X_i = x]$$

  - PS = unit's probability of being treated, conditional on $X_i$.
• For a particular unit, this is $e(X_i) = \mathbb{P}[D_i = 1 \mid X_i]$.
• Rosenbaum and Rubin (1983) showed that:

$$D_i \perp\!\!\!\perp \bigl(Y_i(0), Y_i(1)\bigr) \mid X_i \implies D_i \perp\!\!\!\perp \bigl(Y_i(0), Y_i(1)\bigr) \mid e(X_i)$$

  - Stratifying on $e_i$ is the same as stratifying on the full $X_i$.

Propensity score as balancing score

• The propensity score is actually a balancing score, which means that:

$$D_i \perp\!\!\!\perp X_i \mid e(X_i)$$

• Conditional on the propensity score, treatment is independent of the covariates.
  - Treatment status is said to be balanced: $f(X_i \mid D_i = 1, e(X_i)) = f(X_i \mid D_i = 0, e(X_i))$
• Of course, we have to know the true PS to have all these results work.

Estimating the propensity score

• Of course, in observational studies we don't know the propensity score.
• We would run a parametric model with parameters $\gamma$ to estimate the propensity scores:
  1. Estimate $\hat{\gamma}$.
  2. Create $\hat{e}_i = \Pr[D_i = 1 \mid X_i; \hat{\gamma}]$.
• For instance, in R we could easily calculate the propensity scores using the glm function:

    # Logit model for treatment; fitted.values holds the estimated propensity scores.
    pscores <- glm(treat ~ var1 + var2 + var3, data = mydata,
                   family = binomial())$fitted.values

Propensity score specifics

• What variables do we include in the propensity score model?
  - Any set of variables that blocks all the backdoor paths from $D_i$ to $Y_i$.
• Check balance within strata of $\hat{e}_i$. Covariates should be balanced:

$$f(X_i \mid D_i = 1, \hat{e}_i) = f(X_i \mid D_i = 0, \hat{e}_i)$$

• Can also use automated/nonparametric tools for estimating $\hat{e}_i$.
  - Covariate Balancing Propensity Scores (Imai and Ratkovic)
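A balance check within strata of the estimated propensity score might look like the following sketch. The data are simulated, and the single covariate, the logit specification, and the choice of ten quantile-based strata are all illustrative assumptions.

```r
# Balance check: covariate means by treatment status, overall and within PS strata.
set.seed(2002)
n <- 10000
x <- rnorm(n)
d <- rbinom(n, 1, plogis(x))

# Estimated propensity scores from a logit model.
e_hat <- fitted(glm(d ~ x, family = binomial()))

# Ten strata with boundaries at the deciles of the estimated score.
block <- cut(e_hat, breaks = quantile(e_hat, probs = seq(0, 1, by = 0.1)),
             include.lowest = TRUE, labels = FALSE)

# Treated-minus-control gap in the covariate mean, before stratifying.
raw_gap <- mean(x[d == 1]) - mean(x[d == 0])

# The same gap computed separately inside each stratum.
block_gaps <- sapply(1:10, function(k) {
  mean(x[block == k & d == 1]) - mean(x[block == k & d == 0])
})

round(c(raw = raw_gap, within = block_gaps), 2)
```

If the score model is adequate, the within-stratum gaps should be much smaller than the raw gap. Persistent imbalance in some strata would suggest finer strata or a richer propensity score model.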

Stratifying by the propensity score

• How will we use the propensity score?
  - Matching (next week)
  - Weighting (two weeks)
  - Regression (three weeks)
• Today: coarsening the propensity score and stratifying.
• Choose boundary points: $0 = b_0 < b_1 < \ldots < b_{K-1} < b_K = 1$
• Create block indicators:

$$B_i(k) = \begin{cases} 1 & \text{if } b_{k-1} < \hat{e}(X_i) < b_k \\ 0 & \text{otherwise} \end{cases}$$

• Calculate within-strata effect estimates:

$$\tau_k = \mathbb{E}[Y_i \mid D_i = 1, B_i(k) = 1] - \mathbb{E}[Y_i \mid D_i = 0, B_i(k) = 1]$$

Standardization/direct adjustment

• We calculated the CATEs for each stratum of the PS, $\tau_k$.
• We can use the law of iterated expectations to back out the ATE.
• Take the average of the CATEs over the distribution of $X$:

$$\tau = \sum_{k=1}^{K} \tau_k \, \mathbb{P}[B_i(k) = 1]$$

• Note that $\mathbb{P}[B_i(k) = 1]$ is just the proportion of units in block $k$:

$$\mathbb{P}[B_i(k) = 1] = \frac{\sum_{i=1}^{N} B_i(k)}{N}$$
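Putting the pieces together, propensity score stratification followed by standardization can be sketched as below. The simulated data, the true ATE of 1, the logit specification, and the use of sample deciles as boundary points are all assumptions made for illustration.

```r
# Propensity score stratification and standardization on simulated data.
set.seed(2002)
n  <- 10000
x1 <- rnorm(n)
x2 <- rnorm(n)
d  <- rbinom(n, 1, plogis(0.8 * x1 - 0.5 * x2))
y  <- 1 * d + 2 * x1 - x2 + rnorm(n)          # true ATE is 1

# Step 1: estimate the propensity score.
e_hat <- fitted(glm(d ~ x1 + x2, family = binomial()))

# Step 2: coarsen the score into K blocks (boundaries at sample quantiles).
K     <- 10
b     <- quantile(e_hat, probs = seq(0, 1, length.out = K + 1))
block <- cut(e_hat, breaks = b, include.lowest = TRUE, labels = FALSE)

# Step 3: difference in means within each block.
tau_k <- sapply(seq_len(K), function(k) {
  in_k <- block == k
  mean(y[in_k & d == 1]) - mean(y[in_k & d == 0])
})

# Step 4: weight each block estimate by the block's share of units.
p_k     <- as.numeric(table(factor(block, levels = seq_len(K)))) / n
tau_hat <- sum(tau_k * p_k)

naive <- mean(y[d == 1]) - mean(y[d == 0])
round(c(naive = naive, stratified = tau_hat), 2)
```

The stratified estimate should land far closer to 1 than the naive difference in means. A little bias can remain because the score still varies within each block, particularly in the lowest and highest blocks, so the number of blocks is a tuning choice.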

5 Wrapping Up

Summary

• Defined observational studies.
• Defined confounding and assessed when no unmeasured confounding holds.
• Saw how no unmeasured confounding helps with OLS.
• Saw how to estimate causal effects under no unmeasured confounding using the propensity score.

Next few weeks

• Learn how to estimate causal effects under no unmeasured confounders via:
  - Matching
  - Weighting
  - Regression
• Then we move on to situations where no unmeasured confounders is violated.


What is confounding

• Confounding is the bias caused by common causes of the treatment and outcome.
  - Leads to "spurious correlation."
• In observational studies, the goal is to avoid confounding inherent in the data.
• Pervasive in the social sciences:
  - effect of income on voting (confounding: age)
  - effect of job training program on employment (confounding: motivation)
  - effect of political institutions on economic development (confounding: previous economic development)
• No unmeasured confounding assumes that we've measured all sources of confounding.

Big problem

• How can we determine if no unmeasured confounding holds if we didn't assign the treatment?
• Put differently:
  - What covariates do we need to condition on?
  - What covariates do we need to match on?
  - What covariates do we need to include in our regressions?
• One way, from the assumption itself:
  - $\mathbb{P}[D_i = 1 \mid \mathbf{X}, \mathbf{Y}(1), \mathbf{Y}(0)] = \mathbb{P}[D_i = 1 \mid \mathbf{X}]$
  - Include covariates such that, conditional on them, the treatment assignment does not depend on the potential outcomes.
• Another way: use DAGs and look at backdoor paths.

Backdoor paths and blocking paths

• Backdoor path: a non-causal path from $D$ to $Y$.
  - Would remain if we removed any arrows pointing out of $D$.


Big problem

bull How can we determine if no unmeasured confounding holds ifwe didnrsquot assign the treatment

bull Put differently What covariates do we need to condition on What covariates do we need to match on What covaraites do we need to include in our regressions

bull One way from the assumption itself ℙ[119863119894 = 1|119831 119832(1) 119832(0)] = ℙ[119863119894 = 1|119831] Include covariates such that conditional on them the

treatment assignment does not depend on the potentialoutcomes

bull Another way use DAGs and look at back-door paths

Backdoor paths and blocking paths

bull Backdoor path is a non-causal path from 119863 to 119884 Would remain if we removed any arrows pointing out of 119863

bull Backdoor paths between 119863 and 119884 common causes of 119863and 119884

119863

119883

119884

bull Here there is a backdoor path 119863 larr 119883 rarr 119884 where 119883 is acommon cause for the treatment and the outcome

Other types of confounding

119863

119880 119883

119884

bull 119863 is enrolling in a job training programbull 119884 is getting a jobbull 119880 is being motivatedbull 119883 is number of job applications sent outbull Big assumption here no arrow from 119880 to 119884

Other types of confounding

119863

119880 119883

119884

bull 119863 is exercisebull 119884 is having a diseasebull 119880 is lifestylebull 119883 is smokingbull Big assumption here no arrow from 119880 to 119884

Whatrsquos the problem with backdoorpaths

119863

119880 119883

119884

bull A path is blocked if1 we control for or stratify a non-collider on that path OR2 we do not control for a collider

bull Unblocked backdoor paths confoundingbull In the DAG here if we condition on 119883 then the backdoor

path is blocked

Not all backdoor paths

119863

1198801113568119883119883

119884

bull Conditioning on the posttreatment covariates opens thenon-causal path

selection bias

M-bias

119863

1198801113568 1198801113569119883119883

119884

bull Not all backdoor paths induce confoundingbull This backdoor path is blocked by the collider 119883119894 that we donrsquot

control forbull If we control for 119883119894 opens the path and induces

confounding Sometimes called M-bias

bull Controversial because of differing views on what to control for Rubin thinks that M-bias is a ldquomathematical curiosityrdquo and we

should control for all pretreatment variables Pearl and others think M-bias is a real threat

Backdoor criterion

bull Can we use a DAG to evaluate no unmeasured confoundersbull Pearl answered yes with the backdoor criterion which states

that the effect of 119863 on 119884 is identified if1 No backdoor paths from 119863 to 119884 OR2 Measured covariates are sufficient to block all backdoor paths

from 119863 to 119884

bull First is really only valid for randomized experimentsbull The backdoor criterion is fairly powerful Tells us

if there confounding given this DAG if it is possible to removing the confounding and what variables to condition on to eliminate the confounding

SWIGs

119863 | 119889 119884(119889)

119880 119883

119884

bull Itrsquos a little hard to see how the backdoor criterion implies nounmeasured confounders

No potential outcomes on this graphbull Richardson and Robins Single World Intervention Graphs

Split 119863 node into natural value (119863) and intervention value 119889 Let all effects of 119863 take their potential value under

intervention 119884(119889)bull Now can see are 119863 and 119884(119889) related

119863 larr 119880 rarr 119883 rarr 119884(119889) implies not independent Conditioning on 119883 blocks that backdoor path 119863 ⟂⟂ 119884(119889)|119883

No unmeasured confounders is nottestable

bull No unmeasured confounding places no restrictions on theobserved data

1114100119884119894(0)|119863119894 = 1119883119894111410311140591113840111384011138401113840111384011138401113840111384011138401113840111406011138401113840111384011138401113840111384011138401113840111384011138401114061unobserved

119889= 1114100119884119894(0)|119863119894 = 0119883119894111410311140591113840111384011138401113840111384011138401113840111384011138401113840111406011138401113840111384011138401113840111384011138401113840111384011138401114061observed

bull Here 119889= means equal in distributionbull No way to directly test this assumption without the

counterfactual data which is missing by definitionbull With backdoor criterion you must have the correct DAG

Assessing no unmeasured confounders

bull Can do ldquoplacebordquo tests where 119863119894 cannot have an effect(lagged outcomes etc)

bull Della Vigna and Kaplan (2007 QJE) effect of Fox Newsavailability on Republican vote share

Availability in 20002003 canrsquot affect past vote sharesbull Unconfoundedness could still be violated even if you pass this

test

Alternatives to no unmeasuredconfounding

bull Without explicit randomization we need some way ofidentifying causal effects

bull No unmeasured confounders asymp randomized experiment Indentification results very similar to experiments

bull With unmeasured confounding are we doomed Maybe notbull Other approaches rely on finding plausibly exogenous variation

in assignment of 119863119894 Instrumental variables (randomization + exclusion restriction) Over-time variation (diff-in-diff fixed effects) Arbitrary thresholds for treatment assignment (RDD)

3 No unmeasured confounders and OLS

Justifying regression

• We know how randomized experiments imply that differences-in-means identify the ATE.
• In the next few weeks, we'll work through how no unmeasured confounding justifies a number of estimation strategies.
• Today it's useful to walk through what no unmeasured confounding can buy us in a familiar setting: OLS.
    • We'll cover regression more formally later.

Constant effects set up

• Assume a constant effects setup:

\[
\begin{aligned}
Y_i(0) &= \alpha + X_i'\beta + u_i \\
Y_i(1) &= \alpha + \tau + X_i'\beta + u_i
\end{aligned}
\]

• Constant effects because $Y_i(1) - Y_i(0) = \tau$ for all units.
• Use consistency to get the usual regression formula:

\[
\begin{aligned}
Y_i &= Y_i(1) D_i + Y_i(0)(1 - D_i) \\
    &= Y_i(0) + \bigl(Y_i(1) - Y_i(0)\bigr) \cdot D_i \\
    &= \alpha + \tau \cdot D_i + X_i'\beta + u_i
\end{aligned}
\]

• Does no unmeasured confounding help us identify the causal parameter $\tau$?

Regression on residuals

• First, estimate the residuals of regression of the treatment and outcome on the covariates:

\[
\tilde{Y}_i = Y_i - \mathbb{E}[Y_i \mid X_i], \qquad \tilde{D}_i = D_i - \mathbb{E}[D_i \mid X_i]
\]

• Running a regression of $\tilde{Y}_i$ on $\tilde{D}_i$ is equivalent to controlling for $X_i$:

\[
\begin{aligned}
Y_i &= \alpha + \tau \cdot D_i + X_i'\beta + u_i \\
\tilde{Y}_i &= \alpha + \tau \cdot \tilde{D}_i + \tilde{u}_i
\end{aligned}
\]

• Here $\tilde{u}_i = u_i - \mathbb{E}[u_i \mid X_i]$.
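The equivalence between the residual regression and the regression that controls for the covariates can be checked numerically. The sketch below uses simulated data with hypothetical names (`x1`, `x2`, `d`, `y`) and replaces the conditional expectations $\mathbb{E}[\cdot \mid X_i]$ with linear projections from `lm`, which is an assumption on top of the slide's setup.

```r
set.seed(1)
n <- 1000
x1 <- rnorm(n)
x2 <- rnorm(n)
d <- rbinom(n, 1, plogis(0.8 * x1 - 0.5 * x2))
y <- 1 + 2 * d + x1 + 0.5 * x2 + rnorm(n)   # constant effect, tau = 2

# Residualize the outcome and the treatment on the covariates
y_tilde <- resid(lm(y ~ x1 + x2))
d_tilde <- resid(lm(d ~ x1 + x2))

# Coefficient on treatment: full regression vs. regression on residuals
tau_full  <- unname(coef(lm(y ~ d + x1 + x2))["d"])
tau_resid <- unname(coef(lm(y_tilde ~ d_tilde))["d_tilde"])

c(full = tau_full, residualized = tau_resid)
```

The two coefficients agree up to floating-point error, which is the Frisch-Waugh-Lovell result for linear projections.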

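What does OLS estimate?

• Using the usual OLS theory, we can show that the probability limit of the OLS estimator of $\tau$ is

\[
\begin{aligned}
\operatorname{plim} \hat{\tau}_{OLS}
  &= \frac{\operatorname{Cov}(\tilde{D}_i, \tilde{Y}_i)}{\operatorname{Var}(\tilde{D}_i)} \\
  &= \frac{\operatorname{Cov}(\tilde{D}_i, \alpha + \tau \cdot \tilde{D}_i + \tilde{u}_i)}{\operatorname{Var}(\tilde{D}_i)} \\
  &= \frac{\tau \cdot \operatorname{Cov}(\tilde{D}_i, \tilde{D}_i) + \operatorname{Cov}(\tilde{D}_i, \tilde{u}_i)}{\operatorname{Var}(\tilde{D}_i)} \\
  &= \tau + \frac{\operatorname{Cov}(\tilde{D}_i, \tilde{u}_i)}{\operatorname{Var}(\tilde{D}_i)}
\end{aligned}
\]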

Key OLS assumption

\[
\operatorname{plim} \hat{\tau}_{OLS} = \tau + \frac{\operatorname{Cov}(\tilde{D}_i, \tilde{u}_i)}{\operatorname{Var}(\tilde{D}_i)}
\]

• Key identification comes from $\operatorname{Cov}(\tilde{D}_i, \tilde{u}_i) = 0$.
    • Conditional on $X_i$, no relationship between $D_i$ and $u_i$.
• Note: $u_i$ is a function of $X_i$ and $Y_i(d)$:
    • $u_i = Y_i(0) - \alpha - X_i'\beta$ when $D_i = 0$
    • $u_i = Y_i(1) - \alpha - \tau - X_i'\beta$ when $D_i = 1$
    • Conditional on $X_i$, the only variation in $u_i$ comes from $Y_i(d)$.
• No unmeasured confounding implies this assumption:

\[
D_i \perp\!\!\!\perp \bigl(Y_i(1), Y_i(0)\bigr) \mid X_i
\implies D_i \perp\!\!\!\perp u_i \mid X_i
\implies \operatorname{Cov}(\tilde{D}_i, \tilde{u}_i) = 0
\]

Omitted variable bias

• What happens when this is violated? Suppose that there is one omitted variable, $\tilde{L}_i$ (residualized from $X_i$):

\[
\tilde{u}_i = \lambda \tilde{L}_i + \omega_i
\]

• We'll assume that if we could measure $L_i$, then no unmeasured confounding would hold.
• Leads to inconsistency in the OLS estimator:

\[
\operatorname{plim} \hat{\tau}_{OLS} = \tau + \lambda \frac{\operatorname{Cov}(\tilde{D}_i, \tilde{L}_i)}{\operatorname{Var}(\tilde{D}_i)}
\]

• Bias here is two terms multiplied together:
    1. the coefficient on $L_i$ ($\lambda$)
    2. the coefficient on $D_i$ from a regression of $L_i$ on $D_i$, also controlling for $X_i$
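The omitted variable bias formula can be illustrated with a small simulation. This is a sketch under assumed values: the names (`x`, `l`, `d`, `y`) and the coefficients ($\tau = 2$, $\lambda = 1.5$) are hypothetical, and the conditional expectations are again replaced by linear regressions. With OLS, the decomposition holds exactly in the sample, not only in the limit.

```r
set.seed(42)
n <- 2000
x <- rnorm(n)
l <- 0.5 * x + rnorm(n)                    # omitted confounder L_i
d <- rbinom(n, 1, plogis(x + l))           # treatment depends on X and L
y <- 1 + 2 * d + x + 1.5 * l + rnorm(n)    # tau = 2, lambda = 1.5

long  <- coef(lm(y ~ d + x + l))   # infeasible regression: L observed
short <- coef(lm(y ~ d + x))       # feasible regression: L omitted

# Sample analogue of Cov(D~, L~) / Var(D~): coefficient on d when
# L is regressed on D, controlling for X
delta <- coef(lm(l ~ d + x))["d"]

bias_formula <- unname(long["l"] * delta)        # lambda-hat times delta-hat
bias_actual  <- unname(short["d"] - long["d"])   # what omitting L does

c(formula = bias_formula, actual = bias_actual)
```

The two quantities match, so the size of the bias depends on both how strongly $L_i$ predicts the outcome and how strongly it is related to the treatment given $X_i$.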

4 Estimating causal effects under no unmeasured confounders

Basic approach to estimation

• Remember the usual approach to estimating the ATE with covariates.
• Stratification:
    • Stratify the units by the covariates.
    • Calculate the CATE within these strata.
• Standardization/direct adjustment:
    • Average the CATEs across the strata to get the ATE.
• How to create strata when $X$ has continuous components?
    • If $X$ is discrete with only a few levels, can use the exact values of $X$.
    • Otherwise, we may have to subclassify/coarsen the data.

Classic example: cigars/pipes versus cigarettes

• $D_i = 1$ for pipe/cigar smokers, $D_i = 0$ for cigarette smokers.
• $Y_i$ = death in the first year of follow-up.
• Naive positive effect: cigar/pipe smokers more likely to die.
    • What's the confounder here? Age!
    • Pipe/cigar smokers much older than cigarette smokers.
• Cochran's approach: stratify based on coarsened age.
    • Divide age into $k$ strata, $S_i \in \{s_1, s_2, \ldots, s_k\}$.
    • $s_1$ might be 18-25, $s_2$ might be 26-35, and so on.
    • Calculate effect within strata and aggregate.
• Key assumption: no unmeasured confounders using the stratified version of age:

\[
D_i \perp\!\!\!\perp \bigl(Y_i(0), Y_i(1)\bigr) \mid S_i
\]
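The stratify-then-aggregate logic can be sketched on simulated data. The numbers below are invented for illustration and are not Cochran's data: treatment and death risk are assumed to depend only on a ten-year age stratum, so the key assumption holds by construction, and the true effect of `d` on `y` is zero.

```r
set.seed(7)
n <- 100000
age <- runif(n, 18, 78)

# Coarsen age into six ten-year strata S_i
s <- cut(age, breaks = seq(18, 78, by = 10), include.lowest = TRUE)
s_idx <- as.integer(s)

d <- rbinom(n, 1, 0.1 + 0.12 * s_idx)  # older strata more likely pipe/cigar
y <- rbinom(n, 1, 0.02 * s_idx)        # death risk rises with age; no effect of d

# Naive difference in means
naive <- mean(y[d == 1]) - mean(y[d == 0])

# Within-stratum differences, then a weighted average by stratum size
idx_by_stratum <- split(seq_len(n), s)
tau_k <- sapply(idx_by_stratum, function(i) {
  mean(y[i][d[i] == 1]) - mean(y[i][d[i] == 0])
})
w_k <- lengths(idx_by_stratum) / n
stratified <- sum(tau_k * w_k)

c(naive = naive, stratified = stratified)
```

In this sketch the naive comparison shows a positive "effect" driven entirely by age, while the stratified estimate is close to zero. If risk also varied with age inside each stratum, some confounding would remain after coarsening.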

Stratification on the propensity score

• What about when $X$ has many dimensions?
• Curse of dimensionality: there will be very few, if any, units in a given stratum of $X_i$.
• Stratify on a low-dimensional summary, the propensity score:

\[
e(x) = \mathbb{P}[D_i = 1 \mid X_i = x]
\]

    • PS = unit's probability of being treated, conditional on $X_i$.
• For a particular unit, this is $e(X_i) = \mathbb{P}[D_i = 1 \mid X_i]$.
• Rosenbaum and Rubin (1983) showed that

\[
D_i \perp\!\!\!\perp \bigl(Y_i(0), Y_i(1)\bigr) \mid X_i
\implies D_i \perp\!\!\!\perp \bigl(Y_i(0), Y_i(1)\bigr) \mid e(X_i)
\]

    • Stratifying on $e(X_i)$ is the same as stratifying on the full $X_i$.

Propensity score as balancing score

• The propensity score is actually a balancing score, which means that

\[
D_i \perp\!\!\!\perp X_i \mid e(X_i)
\]

• Conditional on the propensity score, treatment is independent of the covariates.
    • Treatment status is said to be balanced: $f(X_i \mid D_i = 1, e(X_i)) = f(X_i \mid D_i = 0, e(X_i))$
• Of course, we have to know the true PS to have all these results work.

Estimating the propensity score

• Of course, in observational studies we don't know the propensity score.
• We would run a parametric model with parameters $\gamma$ to estimate the propensity scores:
    1. Estimate $\hat{\gamma}$.
    2. Create $\hat{e}_i = \Pr[D_i = 1 \mid X_i; \hat{\gamma}]$.
• For instance, in R we could easily calculate the propensity scores using the glm function:

    # Logistic regression of treatment on covariates; fitted values are the estimated scores
    pscores <- glm(treat ~ var1 + var2 + var3, data = mydata,
                   family = binomial())$fitted.values

Propensity score specifics

• What variables do we include in the propensity score model?
    • Any set of variables that blocks all the backdoor paths from $D_i$ to $Y_i$.
• Check balance within strata of $\hat{e}_i$. Covariates should be balanced:

\[
f(X_i \mid D_i = 1, \hat{e}_i) = f(X_i \mid D_i = 0, \hat{e}_i)
\]

• Can also use automated/nonparametric tools for estimating $\hat{e}_i$.
    • Covariate Balancing Propensity Scores (Imai and Ratkovic)
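One simple way to check balance within strata of the estimated propensity score is to compare covariate means between treated and control units block by block. The sketch below is an illustration on simulated data: the names (`x1`, `x2`, `treat`, `ehat`), the logit specification, and the choice of quintile blocks are all assumptions, and comparing means is weaker than comparing full distributions.

```r
set.seed(11)
n <- 20000
x1 <- rnorm(n)
x2 <- rnorm(n)
treat <- rbinom(n, 1, plogis(0.8 * x1 + 0.5 * x2))

# Estimated propensity scores from a logit model
ehat <- glm(treat ~ x1 + x2, family = binomial())$fitted.values

# Five blocks defined by quintiles of the estimated score
block <- cut(ehat, breaks = quantile(ehat, probs = seq(0, 1, 0.2)),
             include.lowest = TRUE)

# Treated-minus-control difference in the mean of a covariate
diff_means <- function(v, d) mean(v[d == 1]) - mean(v[d == 0])

raw_diff   <- diff_means(x1, treat)
block_diff <- sapply(split(seq_len(n), block),
                     function(i) diff_means(x1[i], treat[i]))

raw_diff
round(block_diff, 3)
```

Within blocks the differences in `x1` are much smaller than the raw difference, though the extreme blocks typically retain some imbalance because the score still varies inside them. Finer blocks or a richer score model are the usual responses when balance looks poor.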

Stratifying by the propensity score

• How will we use the propensity score?
    • Matching (next week)
    • Weighting (two weeks)
    • Regression (three weeks)
• Today: coarsening the propensity score and stratifying.
• Choose boundary points $0 = b_0 < b_1 < \ldots < b_{K-1} < b_K = 1$.
• Create block indicators:

\[
B_i(k) = \begin{cases} 1 & \text{if } b_{k-1} < \hat{e}(X_i) < b_k \\ 0 & \text{otherwise} \end{cases}
\]

• Calculate within-strata effect estimates:

\[
\tau_k = \mathbb{E}[Y_i \mid D_i = 1, B_i(k) = 1] - \mathbb{E}[Y_i \mid D_i = 0, B_i(k) = 1]
\]

Standardization/direct adjustment

• We calculated the CATEs for each stratum of the PS, $\tau_k$.
• We can use the law of iterated expectations to back out the ATE.
• Take the average of the CATEs over the distribution of $X$:

\[
\tau = \sum_{k=1}^{K} \tau_k \, \mathbb{P}[B_i(k) = 1]
\]

• Note that $\mathbb{P}[B_i(k) = 1]$ is just the proportion of units in block $k$:

\[
\mathbb{P}[B_i(k) = 1] = \frac{\sum_{i=1}^{N} B_i(k)}{N}
\]
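Putting the pieces together, the blocking estimator can be sketched as follows. The data are simulated with a constant effect of 2, the variable names are hypothetical, and the boundary points are taken to be deciles of the estimated score, which is one possible choice and not something the slides prescribe.

```r
set.seed(2015)
n <- 50000
x1 <- rnorm(n)
x2 <- rnorm(n)
treat <- rbinom(n, 1, plogis(0.8 * x1 + 0.5 * x2))
y <- 1 + 2 * treat + x1 + x2 + rnorm(n)   # constant effect, tau = 2

# Step 1: estimate the propensity score
ehat <- glm(treat ~ x1 + x2, family = binomial())$fitted.values

# Step 2: boundary points and block membership (K = 10 blocks)
K <- 10
b <- quantile(ehat, probs = seq(0, 1, length.out = K + 1))
block <- cut(ehat, breaks = b, include.lowest = TRUE, labels = FALSE)

# Step 3: within-block differences in means (tau_k)
tau_k <- sapply(seq_len(K), function(k) {
  in_k <- block == k
  mean(y[in_k & treat == 1]) - mean(y[in_k & treat == 0])
})

# Step 4: weight by the share of units in each block
p_k <- tabulate(block, nbins = K) / n
tau_strat <- sum(tau_k * p_k)

tau_naive <- mean(y[treat == 1]) - mean(y[treat == 0])
c(naive = tau_naive, stratified = tau_strat)
```

In this sketch the stratified estimate lands near 2 while the naive difference in means is well above it. A small upward bias usually remains because the score still varies within the coarse blocks, which is part of the motivation for the matching, weighting, and regression approaches that follow.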

5 Wrapping Up

Summary

• Defined observational studies.
• Defined confounding and assessed when no unmeasured confounding holds.
• Saw how no unmeasured confounding helps with OLS.
• Saw how to estimate causal effects under no unmeasured confounding using the propensity score.

Next few weeks

• Learn how to estimate causal effects under no unmeasured confounders via:
    • Matching
    • Weighting
    • Regression
• Then we move on to situations where no unmeasured confounders is violated.


Backdoor paths and blocking paths

• A backdoor path is a non-causal path from $D$ to $Y$.
    • Would remain if we removed any arrows pointing out of $D$.
• Backdoor paths between $D$ and $Y$ arise from common causes of $D$ and $Y$:

[Figure: DAG over $D$, $X$, and $Y$]

• Here there is a backdoor path $D \leftarrow X \rightarrow Y$, where $X$ is a common cause for the treatment and the outcome.

Other types of confounding

[Figure: DAG over $D$, $U$, $X$, and $Y$]

• $D$ is enrolling in a job training program.
• $Y$ is getting a job.
• $U$ is being motivated.
• $X$ is number of job applications sent out.
• Big assumption here: no arrow from $U$ to $Y$.

Other types of confounding

[Figure: DAG over $D$, $U$, $X$, and $Y$]

• $D$ is exercise.
• $Y$ is having a disease.
• $U$ is lifestyle.
• $X$ is smoking.
• Big assumption here: no arrow from $U$ to $Y$.

What's the problem with backdoor paths?

[Figure: DAG over $D$, $U$, $X$, and $Y$]

• A path is blocked if:
    1. we control for or stratify on a non-collider on that path, OR
    2. we do not control for a collider on that path.
• Unblocked backdoor paths imply confounding.
• In the DAG here, if we condition on $X$, then the backdoor path is blocked.
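The blocking logic can be illustrated by simulating the structure $D \leftarrow U \rightarrow X \rightarrow Y$ together with a direct effect $D \rightarrow Y$. The coefficients and the linear outcome model below are assumptions made for the sketch; the point is only that adjusting for the measured $X$ is enough even though $U$ is never observed, because there is no arrow from $U$ to $Y$.

```r
set.seed(3)
n <- 20000
u <- rnorm(n)                    # unmeasured (e.g., motivation or lifestyle)
x <- u + rnorm(n)                # U -> X
d <- rbinom(n, 1, plogis(u))     # U -> D
y <- 1 * d + 2 * x + rnorm(n)    # D -> Y (effect = 1) and X -> Y; no U -> Y

# Backdoor path D <- U -> X -> Y is open without adjustment
unadjusted <- unname(coef(lm(y ~ d))["d"])

# Conditioning on the non-collider X blocks the path
adjusted <- unname(coef(lm(y ~ d + x))["d"])

c(unadjusted = unadjusted, adjusted = adjusted)
```

The unadjusted coefficient is far from 1, while the adjusted one is close to it. Adding an arrow from `u` to `y` in the simulation would reopen a backdoor path that `x` cannot block.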

Not all backdoor paths

[Figure: DAG over $D$, $U_1$, $X$, and $Y$]

• Conditioning on the posttreatment covariates opens the non-causal path.
    • Selection bias.

M-bias

[Figure: DAG over $D$, $U_1$, $U_2$, $X$, and $Y$]

• Not all backdoor paths induce confounding.
• This backdoor path is blocked by the collider $X_i$ that we don't control for.
• If we control for $X_i$, this opens the path and induces confounding.
    • Sometimes called M-bias.
• Controversial because of differing views on what to control for:
    • Rubin thinks that M-bias is a "mathematical curiosity" and we should control for all pretreatment variables.
    • Pearl and others think M-bias is a real threat.
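A simulation makes the M-bias mechanism concrete. This is a sketch under assumed linear relationships with unit coefficients, and it uses a continuous treatment for simplicity, which departs from the binary $D_i$ used elsewhere in these slides. The structure is $D \leftarrow U_1 \rightarrow X \leftarrow U_2 \rightarrow Y$, with a true effect of $D$ on $Y$ equal to 1.

```r
set.seed(5)
n <- 50000
u1 <- rnorm(n)
u2 <- rnorm(n)
x <- u1 + u2 + rnorm(n)       # collider: U1 -> X <- U2
d <- u1 + rnorm(n)            # U1 -> D (continuous treatment, an assumption)
y <- 1 * d + u2 + rnorm(n)    # D -> Y (effect = 1) and U2 -> Y

# The backdoor path is blocked by the collider when X is left alone
no_control <- unname(coef(lm(y ~ d))["d"])

# Controlling for X opens the path and biases the estimate
control_x <- unname(coef(lm(y ~ d + x))["d"])

c(no_control = no_control, control_x = control_x)
```

Under these particular coefficients the regression that omits `x` recovers an estimate near 1, while controlling for `x` pulls it toward 0.8 in large samples. The size of the bias depends on how strongly `u1` and `u2` drive `x`, which is the quantity at stake in the disagreement described above.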

Backdoor criterion

• Can we use a DAG to evaluate no unmeasured confounders?
• Pearl answered yes, with the backdoor criterion, which states that the effect of $D$ on $Y$ is identified if:
    1. No backdoor paths from $D$ to $Y$, OR
    2. Measured covariates are sufficient to block all backdoor paths from $D$ to $Y$.
• First is really only valid for randomized experiments.
• The backdoor criterion is fairly powerful. Tells us:
    • if there is confounding given this DAG,
    • if it is possible to remove the confounding, and
    • what variables to condition on to eliminate the confounding.

SWIGs

[Figure: SWIG with the $D$ node split into $D \mid d$, plus nodes $U$, $X$, and $Y(d)$]

• It's a little hard to see how the backdoor criterion implies no unmeasured confounders.
    • No potential outcomes on this graph!
• Richardson and Robins: Single World Intervention Graphs
    • Split the $D$ node into natural value ($D$) and intervention value $d$.
    • Let all effects of $D$ take their potential value under intervention, $Y(d)$.
• Now can see: are $D$ and $Y(d)$ related?
    • $D \leftarrow U \rightarrow X \rightarrow Y(d)$ implies not independent.
    • Conditioning on $X$ blocks that backdoor path: $D \perp\!\!\!\perp Y(d) \mid X$



Other types of confounding

119863

119880 119883

119884

bull 119863 is enrolling in a job training programbull 119884 is getting a jobbull 119880 is being motivatedbull 119883 is number of job applications sent outbull Big assumption here no arrow from 119880 to 119884

Other types of confounding

119863

119880 119883

119884

bull 119863 is exercisebull 119884 is having a diseasebull 119880 is lifestylebull 119883 is smokingbull Big assumption here no arrow from 119880 to 119884

Whatrsquos the problem with backdoorpaths

119863

119880 119883

119884

bull A path is blocked if1 we control for or stratify a non-collider on that path OR2 we do not control for a collider

bull Unblocked backdoor paths confoundingbull In the DAG here if we condition on 119883 then the backdoor

path is blocked

Not all backdoor paths

119863

1198801113568119883119883

119884

bull Conditioning on the posttreatment covariates opens thenon-causal path

selection bias

M-bias

119863

1198801113568 1198801113569119883119883

119884

bull Not all backdoor paths induce confoundingbull This backdoor path is blocked by the collider 119883119894 that we donrsquot

control forbull If we control for 119883119894 opens the path and induces

confounding Sometimes called M-bias

bull Controversial because of differing views on what to control for Rubin thinks that M-bias is a ldquomathematical curiosityrdquo and we

should control for all pretreatment variables Pearl and others think M-bias is a real threat

Backdoor criterion

bull Can we use a DAG to evaluate no unmeasured confoundersbull Pearl answered yes with the backdoor criterion which states

that the effect of 119863 on 119884 is identified if1 No backdoor paths from 119863 to 119884 OR2 Measured covariates are sufficient to block all backdoor paths

from 119863 to 119884

bull First is really only valid for randomized experimentsbull The backdoor criterion is fairly powerful Tells us

if there confounding given this DAG if it is possible to removing the confounding and what variables to condition on to eliminate the confounding

SWIGs

119863 | 119889 119884(119889)

119880 119883

119884

bull Itrsquos a little hard to see how the backdoor criterion implies nounmeasured confounders

No potential outcomes on this graphbull Richardson and Robins Single World Intervention Graphs

Split 119863 node into natural value (119863) and intervention value 119889 Let all effects of 119863 take their potential value under

intervention 119884(119889)bull Now can see are 119863 and 119884(119889) related

119863 larr 119880 rarr 119883 rarr 119884(119889) implies not independent Conditioning on 119883 blocks that backdoor path 119863 ⟂⟂ 119884(119889)|119883

No unmeasured confounders is nottestable

bull No unmeasured confounding places no restrictions on theobserved data

1114100119884119894(0)|119863119894 = 1119883119894111410311140591113840111384011138401113840111384011138401113840111384011138401113840111406011138401113840111384011138401113840111384011138401113840111384011138401114061unobserved

119889= 1114100119884119894(0)|119863119894 = 0119883119894111410311140591113840111384011138401113840111384011138401113840111384011138401113840111406011138401113840111384011138401113840111384011138401113840111384011138401114061observed

bull Here 119889= means equal in distributionbull No way to directly test this assumption without the

counterfactual data which is missing by definitionbull With backdoor criterion you must have the correct DAG

Assessing no unmeasured confounders

bull Can do ldquoplacebordquo tests where 119863119894 cannot have an effect(lagged outcomes etc)

bull Della Vigna and Kaplan (2007 QJE) effect of Fox Newsavailability on Republican vote share

Availability in 20002003 canrsquot affect past vote sharesbull Unconfoundedness could still be violated even if you pass this

test

Alternatives to no unmeasuredconfounding

bull Without explicit randomization we need some way ofidentifying causal effects

bull No unmeasured confounders asymp randomized experiment Indentification results very similar to experiments

bull With unmeasured confounding are we doomed Maybe notbull Other approaches rely on finding plausibly exogenous variation

in assignment of 119863119894 Instrumental variables (randomization + exclusion restriction) Over-time variation (diff-in-diff fixed effects) Arbitrary thresholds for treatment assignment (RDD)

3 No unmeasuredconfounders and OLS

Justifying regression

bull We know how randomized experiments imply thatdifferences-in-means identify the ATE

bull In the next few weeks wersquoll work through how no unmeasuredconfounding justifies a number of estimation strategies

bull Today itrsquos useful to walk through what no unmeasuredconfounding can buy us in a familiar setting OLS

Wersquoll cover regression more formally later

Constant effects set up

bull Assume a constant effects setup

119884119894(0) = 120572 + 119883prime119894 120573 + 119906119894

119884119894(1) = 120572 + 120591 + 119883prime119894 120573 + 119906119894

bull Constant effects because 119884119894(1) minus 119884119894(0) = 120591 for all unitsbull Use consistency to get the usual regression formula

119884119894 = 119884119894(1)119863119894 + 119884119894(0)(1 minus 119863119894)= 119884119894(0) + 1114100119884119894(1) minus 119884119894(0)1114103 sdot 119863119894

= 120572 + 120591 sdot 119863119894 + 119883prime119894 120573 + 119906119894

bull Does no unmeasured confounding help us identify the causalparameter 120591

Regression on residuals

bull First estimate the residuals of regression of the treatment andoutcome on the covariates

119894 = 119884119894 minus 120124[119884119894|119883119894]119894 = 119863119894 minus 120124[119863119894|119883119894]

bull Running a regression of 119894 on 119894 is equivalent to controllingfor 119883119894

119884119894 = 120572 + 120591 sdot 119863119894 + 119883prime119894 120573 + 119906119894

119894 = 120572 + 120591 sdot 119894 + 119894bull Here 119894 = 119906119894 minus 120124[119906119894|119883119894]

What does OLS estimate

bull Using the usual OLS theory we can show that the probabilitylimit of the OLS estimator of 120591 is

plim 111369611136931113700 =Cov(119894 119894)Var(119894)

= Cov(119894 120572 + 120591 sdot 119894 + 119894)Var(119894)

= 120591 sdot Cov(119894 119894) + Cov(119894 119894)Var(119894)

= 120591 + Cov(119894 119894)Var(119894)

Key OLS assumption

plim 111369611136931113700 = 120591 + Cov(119894 119894)Var(119894)

bull Key identification comes from Cov(119894 119894) = 0 Conditional on 119883119894 no relationship between 119863119894 and 119906119894

bull Note 119906119894 is a function of 119883119894 and 119884119894(119889) 119906119894 = 119884119894(0) minus 120572 minus 119883prime

119894 120573 when 119863119894 = 0 119906119894 = 119884119894(1) minus 120572 minus 120591 minus 119883prime

119894 120573 when 119863119894 = 1 condition on 119883119894 only variation in 119906119894 comes from 119884119894(119889)

bull No unmeasured confounding implies this assumption

119863119894 ⟂⟂ 1114100119884119894(1) 119884119894(0)1114103|119883119894 ⟹ 119863119894 ⟂⟂ 119906119894|119883119894 ⟹ Cov(119894 119894) = 0

Omitted variable bias

bull What happens when this is violated Suppose that there isone omitted variable (residualized from 119883119894)

119894 = 120582119894 + 120596119894

bull Wersquoll assume that if we could measure 119871119894 then nounmeasured confounding would hold

bull Leads to inconsistency in the OLS estimator

plim 111369611136931113700 = 120591 + 120582Cov(119894 119894)Var(119894)

bull Bias here is terms multiplied together1 coefficient on 119871119894 (120582)2 the coefficient of regression of 119863119894 on 119871119894 also controlling for 119883119894

4 Estimating causaleffects under nounmeasuredconfounders

Basic approach to estimation

bull Remember the usual approach to estimating the ATE withcovariates

bull Stratification Stratify the units by the covariates Calculate CATE within these strata

bull Standardizationdirect adjustment Average the CATEs across the strata to get ATE

bull How to create strata when 119883 has continuous components If 119883 is discrete with only a few levels can use the exact values

of 119883 Otherwise we may have to subclassifycoarsen the data

Classic example cigarspipes versuscigarettes

bull 119863119894 = 1 for pipecigar smokers 119863119894 = 0 for cigarette smokersbull 119884119894 = death in the first year of follow-upbull Naive positive effect cigarpipe smokers more likely to die

Whatrsquos the confounder here Age Pipecigar smokers much older than cigarette smokers

bull Cochranrsquos approach stratify based on coarsened age Divide age into 119896 strata 119878119894 isin 1199041113568 1199041113569 hellip 119904119896 1199041113568 might be 18-25 1199041113569 might be 26-35 and so on Calculate effect within strata and aggregate

bull Key assumption no unmeasured confounders using stratifiedversion of age

119863119894 ⟂⟂ 1114100119884119894(0) 119884119894(1)1114103|119878119894

Stratification on the propensity score

• What about when $X$ has many dimensions?
• Curse of dimensionality: there will be very few, if any, units in a given stratum of $X_i$.
• Stratify on a low-dimensional summary, the propensity score:

$$e(x) = \mathbb{P}[D_i = 1 \mid X_i = x]$$

  ◦ PS = unit's probability of being treated, conditional on $X_i$.
• For a particular unit, this is $e(X_i) = \mathbb{P}[D_i = 1 \mid X_i]$.
• Rosenbaum and Rubin (1983) showed that:

$$D_i \perp\!\!\!\perp \left(Y_i(0), Y_i(1)\right) \mid X_i \implies D_i \perp\!\!\!\perp \left(Y_i(0), Y_i(1)\right) \mid e(X_i)$$

  ◦ For removing confounding, stratifying on $e(X_i)$ is the same as stratifying on the full $X_i$.

Propensity score as balancing score

• The propensity score is actually a balancing score, which means that:

$$D_i \perp\!\!\!\perp X_i \mid e(X_i)$$

• Conditional on the propensity score, treatment is independent of the covariates.
  ◦ The covariates are said to be balanced across treatment status: $f(X_i \mid D_i = 1, e(X_i)) = f(X_i \mid D_i = 0, e(X_i))$
• Of course, we have to know the true PS to have all these results work.

Estimating the propensity score

• Of course, in observational studies we don't know the propensity score.
• We would run a parametric model with parameters $\gamma$ to estimate the propensity scores:
  1. Estimate $\hat{\gamma}$.
  2. Create $\hat{e}_i = \Pr[D_i = 1 \mid X_i; \hat{\gamma}]$.

• For instance, in R, we could easily calculate the propensity scores using the glm function:

    # Logistic regression of treatment on covariates; fitted values are the estimated PS
    pscores <- glm(treat ~ var1 + var2 + var3, data = mydata,
                   family = binomial())$fitted.values

Propensity score specifics

• What variables do we include in the propensity score model?
  ◦ Any set of variables that blocks all the backdoor paths from $D_i$ to $Y_i$.
• Check balance within strata of $\hat{e}_i$. Covariates should be balanced:

$$f(X_i \mid D_i = 1, \hat{e}_i) = f(X_i \mid D_i = 0, \hat{e}_i)$$

• Can also use automated/nonparametric tools for estimating $\hat{e}_i$.
  ◦ Covariate Balancing Propensity Scores (Imai and Ratkovic).
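
One rough way to check balance within strata of the estimated propensity score is sketched below in R. The data are simulated, and the covariate names (`x1`, `x2`), the logit specification, and the choice of five quantile-based strata are assumptions for illustration, not part of the original slides.

```r
set.seed(2)
n <- 4000
x1 <- rnorm(n)
x2 <- rnorm(n)

# Hypothetical treatment assignment that depends on both covariates
d <- rbinom(n, 1, plogis(0.8 * x1 - 0.5 * x2))
dat <- data.frame(d, x1, x2)

# Estimated propensity scores from a logit model
ps <- glm(d ~ x1 + x2, data = dat, family = binomial())$fitted.values

# Five strata with boundaries at the quintiles of the estimated PS
breaks <- quantile(ps, probs = seq(0, 1, length.out = 6))
block <- cut(ps, breaks = breaks, include.lowest = TRUE, labels = FALSE)

# Imbalance in x1 before stratifying: treated mean minus control mean
raw_gap <- mean(x1[d == 1]) - mean(x1[d == 0])

# Imbalance in x1 within each propensity score stratum
gap_k <- sapply(1:5, function(k) {
  in_k <- block == k
  mean(x1[in_k & d == 1]) - mean(x1[in_k & d == 0])
})

round(c(raw = raw_gap, gap_k), 3)
```

The within-stratum gaps should be noticeably smaller than the raw gap. Comparing means is only a partial check, since balance is a statement about the whole covariate distribution.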

Stratifying by the propensity score

• How will we use the propensity score?
  ◦ Matching (next week)
  ◦ Weighting (two weeks)
  ◦ Regression (three weeks)
• Today: coarsening the propensity score and stratifying.
• Choose boundary points $0 = b_0 < b_1 < \ldots < b_{K-1} < b_K = 1$.
• Create block indicators:

$$B_i(k) = \begin{cases} 1 & \text{if } b_{k-1} < \hat{e}(X_i) < b_k \\ 0 & \text{otherwise} \end{cases}$$

• Calculate within-strata effect estimates:

$$\tau_k = \mathbb{E}[Y_i \mid D_i = 1, B_i(k) = 1] - \mathbb{E}[Y_i \mid D_i = 0, B_i(k) = 1]$$

Standardization/direct adjustment

• We calculated the CATEs for each stratum of the PS, $\tau_k$.
• We can use the law of iterated expectations to back out the ATE.
• Take the average of the CATEs over the distribution of $X$:

$$\tau = \sum_{k=1}^{K} \tau_k \, \mathbb{P}[B_i(k) = 1]$$

• Note that $\mathbb{P}[B_i(k) = 1]$ is just the proportion of units in block $k$:

$$\mathbb{P}[B_i(k) = 1] = \frac{\sum_{i=1}^{N} B_i(k)}{N}$$
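
The blocking and standardization steps can be sketched in R as follows. This is an illustration on simulated data, not the course's own code: the variable names, the true effect of 2, and the use of $K = 5$ quantile-based blocks are all assumptions.

```r
set.seed(3)
n <- 5000
x1 <- rnorm(n)
x2 <- rnorm(n)

# Hypothetical confounded assignment and outcome; the true effect of d is 2
d <- rbinom(n, 1, plogis(0.8 * x1 + 0.5 * x2))
y <- 2 * d + 2 * x1 + x2 + rnorm(n)
dat <- data.frame(y, d, x1, x2)

# Step 1: estimate the propensity score
ps <- glm(d ~ x1 + x2, data = dat, family = binomial())$fitted.values

# Step 2: choose boundary points and assign each unit to a block
K <- 5
b <- quantile(ps, probs = seq(0, 1, length.out = K + 1))
block <- cut(ps, breaks = b, include.lowest = TRUE, labels = FALSE)

# Step 3: within-block differences in means (tau_k)
tau_k <- sapply(seq_len(K), function(k) {
  in_k <- block == k
  mean(y[in_k & d == 1]) - mean(y[in_k & d == 0])
})

# Step 4: weight each tau_k by the proportion of units in block k
p_k <- as.numeric(table(block)) / n
tau_hat <- sum(tau_k * p_k)

# Unadjusted difference in means for comparison
naive <- mean(y[d == 1]) - mean(y[d == 0])

c(naive = naive, stratified = tau_hat)
```

With only five blocks, some confounding remains within each block, so the stratified estimate should be closer to 2 than the naive one without matching it exactly. More blocks reduce that residual bias at the cost of fewer units per block.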

5. Wrapping Up

Summary

• Defined observational studies.
• Defined confounding and assessed when no unmeasured confounding holds.
• Saw how no unmeasured confounding helps with OLS.
• Saw how to estimate causal effects under no unmeasured confounding using the propensity score.

Next few weeks

• Learn how to estimate causal effects under no unmeasured confounders via:
  ◦ Matching
  ◦ Weighting
  ◦ Regression
• Then we move on to situations where no unmeasured confounders is violated.

  • Observational studies
  • Confounding
  • No unmeasured confounders and OLS
  • Estimating causal effects under no unmeasured confounders
  • Wrapping Up

Other types of confounding

[Figure: DAG with nodes $D$, $U$, $X$, $Y$]

• $D$ is exercise.
• $Y$ is having a disease.
• $U$ is lifestyle.
• $X$ is smoking.
• Big assumption here: no arrow from $U$ to $Y$.

What's the problem with backdoor paths?

[Figure: DAG with nodes $D$, $U$, $X$, $Y$]

• A path is blocked if:
  1. we control for or stratify on a non-collider on that path, OR
  2. we do not control for a collider.
• Unblocked backdoor paths imply confounding.
• In the DAG here, if we condition on $X$, then the backdoor path is blocked.
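
A small simulation can illustrate blocking a backdoor path. The sketch below assumes the DAG has the backdoor path $D \leftarrow U \rightarrow X \rightarrow Y$ with no arrow from $U$ to $Y$; the linear functional forms, coefficients, and true effect of 1 are assumptions chosen for illustration.

```r
set.seed(4)
n <- 10000

u <- rnorm(n)                   # unmeasured lifestyle
x <- u + rnorm(n)               # smoking, caused by U
d <- rbinom(n, 1, plogis(u))    # exercise, caused by U
y <- 1 * d + 2 * x + rnorm(n)   # disease score: true effect of D is 1, no U -> Y arrow

# Backdoor path left open: the coefficient on d is confounded
unadjusted <- unname(coef(lm(y ~ d))["d"])

# Conditioning on X blocks the path, even though U is never observed
adjusted <- unname(coef(lm(y ~ d + x))["d"])

c(unadjusted = unadjusted, adjusted = adjusted)
```

The adjusted coefficient should be close to 1 while the unadjusted one is not. If $U$ also affected $Y$ directly, conditioning on $X$ alone would no longer be enough.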

Not all backdoor paths

[Figure: DAG with nodes $D$, $U_1$, $X$, $Y$]

• Conditioning on posttreatment covariates opens the non-causal path, which leads to selection bias.

M-bias

[Figure: DAG with nodes $D$, $U_1$, $U_2$, $X$, $Y$, where $X$ is a collider on the backdoor path]

• Not all backdoor paths induce confounding.
• This backdoor path is blocked by the collider $X_i$ that we don't control for.
• If we control for $X_i$, we open the path and induce confounding.
  ◦ Sometimes called M-bias.
• Controversial because of differing views on what to control for:
  ◦ Rubin thinks that M-bias is a "mathematical curiosity" and we should control for all pretreatment variables.
  ◦ Pearl and others think M-bias is a real threat.
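
M-bias is easy to reproduce in a simulation. The sketch below assumes the standard M-structure $D \leftarrow U_1 \rightarrow X \leftarrow U_2 \rightarrow Y$, since the original figure did not survive extraction. The linear model and unit coefficients are illustrative assumptions, and $D$ is given no effect on $Y$ at all.

```r
set.seed(5)
n <- 10000

u1 <- rnorm(n)             # unmeasured cause of D and X
u2 <- rnorm(n)             # unmeasured cause of Y and X
x <- u1 + u2 + rnorm(n)    # pretreatment collider
d <- u1 + rnorm(n)         # treatment
y <- u2 + rnorm(n)         # outcome: D has no effect on Y

# Without X, the backdoor path is blocked at the collider
no_control <- unname(coef(lm(y ~ d))["d"])

# Controlling for X opens the path and creates a spurious association
with_x <- unname(coef(lm(y ~ d + x))["d"])

c(no_control = no_control, with_x = with_x)
```

Under these assumed coefficients the population coefficient on `d` after controlling for `x` is $-0.2$, even though the true effect is zero. How large this bias is in real applications is exactly what the Rubin and Pearl camps disagree about.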

Backdoor criterion

• Can we use a DAG to evaluate no unmeasured confounders?
• Pearl answered yes, with the backdoor criterion, which states that the effect of $D$ on $Y$ is identified if:
  1. there are no backdoor paths from $D$ to $Y$, OR
  2. measured covariates are sufficient to block all backdoor paths from $D$ to $Y$.
• The first is really only valid for randomized experiments.
• The backdoor criterion is fairly powerful. It tells us:
  ◦ if there is confounding given this DAG,
  ◦ if it is possible to remove the confounding, and
  ◦ what variables to condition on to eliminate the confounding.

SWIGs

[Figure: SWIG in which the $D$ node is split into $D \mid d$ and the outcome is $Y(d)$, with nodes $U$ and $X$]

• It's a little hard to see how the backdoor criterion implies no unmeasured confounders.
  ◦ No potential outcomes on this graph!
• Richardson and Robins: Single World Intervention Graphs.
  ◦ Split the $D$ node into a natural value ($D$) and an intervention value $d$.
  ◦ Let all effects of $D$ take their potential value under intervention, $Y(d)$.
• Now we can see: are $D$ and $Y(d)$ related?
  ◦ $D \leftarrow U \rightarrow X \rightarrow Y(d)$ implies not independent.
  ◦ Conditioning on $X$ blocks that backdoor path, so $D \perp\!\!\!\perp Y(d) \mid X$.

No unmeasured confounders is not testable

• No unmeasured confounding places no restrictions on the observed data:

$$\underbrace{\left(Y_i(0) \mid D_i = 1, X_i\right)}_{\text{unobserved}} \overset{d}{=} \underbrace{\left(Y_i(0) \mid D_i = 0, X_i\right)}_{\text{observed}}$$

• Here, $\overset{d}{=}$ means equal in distribution.
• No way to directly test this assumption without the counterfactual data, which is missing by definition!
• With the backdoor criterion, you must have the correct DAG.

Assessing no unmeasured confounders

• Can do "placebo" tests, where $D_i$ cannot have an effect (lagged outcomes, etc.).
• DellaVigna and Kaplan (2007, QJE): effect of Fox News availability on Republican vote share.
  ◦ Availability in 2000/2003 can't affect past vote shares.
• Unconfoundedness could still be violated even if you pass this test!

Alternatives to no unmeasured confounding

• Without explicit randomization, we need some way of identifying causal effects.
• No unmeasured confounders ≈ randomized experiment.
  ◦ Identification results are very similar to experiments.
• With unmeasured confounding, are we doomed? Maybe not!
• Other approaches rely on finding plausibly exogenous variation in assignment of $D_i$:
  ◦ Instrumental variables (randomization + exclusion restriction)
  ◦ Over-time variation (diff-in-diff, fixed effects)
  ◦ Arbitrary thresholds for treatment assignment (RDD)

3. No unmeasured confounders and OLS

Justifying regression

• We know how randomized experiments imply that differences-in-means identify the ATE.
• In the next few weeks, we'll work through how no unmeasured confounding justifies a number of estimation strategies.
• Today, it's useful to walk through what no unmeasured confounding can buy us in a familiar setting: OLS.
  ◦ We'll cover regression more formally later.

Constant effects setup

• Assume a constant effects setup:

$$Y_i(0) = \alpha + X_i'\beta + u_i$$

$$Y_i(1) = \alpha + \tau + X_i'\beta + u_i$$

• Constant effects because $Y_i(1) - Y_i(0) = \tau$ for all units.
• Use consistency to get the usual regression formula:

$$\begin{aligned}
Y_i &= Y_i(1) D_i + Y_i(0)(1 - D_i) \\
    &= Y_i(0) + \left(Y_i(1) - Y_i(0)\right) \cdot D_i \\
    &= \alpha + \tau \cdot D_i + X_i'\beta + u_i
\end{aligned}$$

• Does no unmeasured confounding help us identify the causal parameter $\tau$?

Regression on residuals

• First, estimate the residuals of regression of the treatment and outcome on the covariates:

$$\tilde{Y}_i = Y_i - \mathbb{E}[Y_i \mid X_i] \qquad \tilde{D}_i = D_i - \mathbb{E}[D_i \mid X_i]$$

• Running a regression of $\tilde{Y}_i$ on $\tilde{D}_i$ is equivalent to controlling for $X_i$:

$$Y_i = \alpha + \tau \cdot D_i + X_i'\beta + u_i$$

$$\tilde{Y}_i = \alpha + \tau \cdot \tilde{D}_i + \tilde{u}_i$$

• Here, $\tilde{u}_i = u_i - \mathbb{E}[u_i \mid X_i]$.
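
The equivalence between residual-on-residual regression and controlling for $X_i$ can be checked numerically. The R sketch below uses simulated data and approximates $\mathbb{E}[\cdot \mid X_i]$ with a linear regression on `x`, which is an assumption (it is exact here only because the simulated conditional expectations are linear). All names and coefficients are made up for the example.

```r
set.seed(6)
n <- 2000
x <- rnorm(n)
d <- 0.5 * x + rnorm(n)
y <- 1 + 2 * d + 3 * x + rnorm(n)

# Residualize the outcome and the treatment with respect to x
y_res <- residuals(lm(y ~ x))
d_res <- residuals(lm(d ~ x))

# Coefficient on d when controlling for x directly
tau_full <- unname(coef(lm(y ~ d + x))["d"])

# Coefficient from regressing residualized y on residualized d
tau_resid <- unname(coef(lm(y_res ~ d_res))["d_res"])

c(tau_full = tau_full, tau_resid = tau_resid)
```

The two coefficients agree up to floating-point error. This is the Frisch-Waugh-Lovell result for linear regression.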

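What does OLS estimate?

• Using the usual OLS theory, we can show that the probability limit of the OLS estimator of $\tau$ is:

$$\begin{aligned}
\text{plim } \hat{\tau}_{OLS} &= \frac{\text{Cov}(\tilde{D}_i, \tilde{Y}_i)}{\text{Var}(\tilde{D}_i)} \\
&= \frac{\text{Cov}(\tilde{D}_i, \alpha + \tau \cdot \tilde{D}_i + \tilde{u}_i)}{\text{Var}(\tilde{D}_i)} \\
&= \frac{\tau \cdot \text{Cov}(\tilde{D}_i, \tilde{D}_i) + \text{Cov}(\tilde{D}_i, \tilde{u}_i)}{\text{Var}(\tilde{D}_i)} \\
&= \tau + \frac{\text{Cov}(\tilde{D}_i, \tilde{u}_i)}{\text{Var}(\tilde{D}_i)}
\end{aligned}$$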

Key OLS assumption

$$\text{plim } \hat{\tau}_{OLS} = \tau + \frac{\text{Cov}(\tilde{D}_i, \tilde{u}_i)}{\text{Var}(\tilde{D}_i)}$$

• Key identification comes from $\text{Cov}(\tilde{D}_i, \tilde{u}_i) = 0$.
  ◦ Conditional on $X_i$, there is no relationship between $D_i$ and $u_i$.
• Note: $u_i$ is a function of $X_i$ and $Y_i(d)$:
  ◦ $u_i = Y_i(0) - \alpha - X_i'\beta$ when $D_i = 0$
  ◦ $u_i = Y_i(1) - \alpha - \tau - X_i'\beta$ when $D_i = 1$
  ◦ Condition on $X_i$, and the only variation in $u_i$ comes from $Y_i(d)$.
• No unmeasured confounding implies this assumption:

$$D_i \perp\!\!\!\perp \left(Y_i(1), Y_i(0)\right) \mid X_i \implies D_i \perp\!\!\!\perp u_i \mid X_i \implies \text{Cov}(\tilde{D}_i, \tilde{u}_i) = 0$$

Omitted variable bias

• What happens when this is violated? Suppose that there is one omitted variable (residualized from $X_i$):

$$\tilde{u}_i = \lambda \tilde{L}_i + \omega_i$$

• We'll assume that if we could measure $L_i$, then no unmeasured confounding would hold.
• Leads to inconsistency in the OLS estimator:

$$\text{plim } \hat{\tau}_{OLS} = \tau + \lambda \frac{\text{Cov}(\tilde{D}_i, \tilde{L}_i)}{\text{Var}(\tilde{D}_i)}$$

• Bias here is two terms multiplied together:
  1. the coefficient on $L_i$ ($\lambda$)
  2. the coefficient from a regression of $L_i$ on $D_i$, also controlling for $X_i$, that is, $\text{Cov}(\tilde{D}_i, \tilde{L}_i)/\text{Var}(\tilde{D}_i)$
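
The omitted variable bias formula can be checked against a simulation. In the R sketch below, the data-generating process, the values of `lambda` and `tau`, and the variable names are all assumptions for illustration. The omitted variable `l` is observed here only so that the formula can be evaluated.

```r
set.seed(7)
n <- 20000
tau <- 2         # true treatment effect (assumed)
lambda <- 1.5    # coefficient on the omitted variable (assumed)

x <- rnorm(n)
l <- 0.5 * x + rnorm(n)              # omitted confounder L
d <- 0.7 * l + 0.3 * x + rnorm(n)    # treatment depends on L and X
y <- 1 + tau * d + x + lambda * l + rnorm(n)

# OLS that controls for x but omits l
tau_short <- unname(coef(lm(y ~ d + x))["d"])

# Residualize d and l with respect to x, then evaluate the bias formula
d_res <- residuals(lm(d ~ x))
l_res <- residuals(lm(l ~ x))
bias_formula <- lambda * cov(d_res, l_res) / var(d_res)

c(estimated_bias = tau_short - tau, bias_formula = bias_formula)
```

The gap between the short-regression coefficient and the true `tau` should match the formula up to sampling noise.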



Not all backdoor paths

119863

1198801113568119883119883

119884

bull Conditioning on the posttreatment covariates opens thenon-causal path

selection bias

M-bias

119863

1198801113568 1198801113569119883119883

119884

bull Not all backdoor paths induce confoundingbull This backdoor path is blocked by the collider 119883119894 that we donrsquot

control forbull If we control for 119883119894 opens the path and induces

confounding Sometimes called M-bias

bull Controversial because of differing views on what to control for Rubin thinks that M-bias is a ldquomathematical curiosityrdquo and we

should control for all pretreatment variables Pearl and others think M-bias is a real threat

Backdoor criterion

bull Can we use a DAG to evaluate no unmeasured confoundersbull Pearl answered yes with the backdoor criterion which states

that the effect of 119863 on 119884 is identified if1 No backdoor paths from 119863 to 119884 OR2 Measured covariates are sufficient to block all backdoor paths

from 119863 to 119884

bull First is really only valid for randomized experimentsbull The backdoor criterion is fairly powerful Tells us

if there confounding given this DAG if it is possible to removing the confounding and what variables to condition on to eliminate the confounding

SWIGs

119863 | 119889 119884(119889)

119880 119883

119884

bull Itrsquos a little hard to see how the backdoor criterion implies nounmeasured confounders

No potential outcomes on this graphbull Richardson and Robins Single World Intervention Graphs

Split 119863 node into natural value (119863) and intervention value 119889 Let all effects of 119863 take their potential value under

intervention 119884(119889)bull Now can see are 119863 and 119884(119889) related

119863 larr 119880 rarr 119883 rarr 119884(119889) implies not independent Conditioning on 119883 blocks that backdoor path 119863 ⟂⟂ 119884(119889)|119883

No unmeasured confounders is not testable

• No unmeasured confounding places no restrictions on the observed data:

  $$\underbrace{\bigl(Y_i(0) \mid D_i = 1, X_i\bigr)}_{\text{unobserved}} \overset{d}{=} \underbrace{\bigl(Y_i(0) \mid D_i = 0, X_i\bigr)}_{\text{observed}}$$

• Here $\overset{d}{=}$ means equal in distribution.
• No way to directly test this assumption without the counterfactual data, which is missing by definition!
• With the backdoor criterion, you must have the correct DAG.

Assessing no unmeasured confounders

• Can do "placebo" tests, where $D_i$ cannot have an effect (lagged outcomes, etc.).
• DellaVigna and Kaplan (2007, QJE): effect of Fox News availability on Republican vote share.
  - Availability in 2000/2003 can't affect past vote shares.
• Unconfoundedness could still be violated even if you pass this test!
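A placebo test of this kind could be sketched as follows. The data are simulated and every name (`y_lag`, `y`, `d`, `x`) is hypothetical; the idea is only that the estimated "effect" of treatment on an outcome measured before treatment should be near zero if the adjustment set is adequate.

```r
set.seed(1)
n <- 5000
x     <- rnorm(n)                        # measured covariate
d     <- rbinom(n, 1, plogis(0.8 * x))   # treatment depends on x only
y_lag <- 1 + x + rnorm(n)                # lagged outcome, realized before treatment
y     <- 1 + 0.5 * d + x + rnorm(n)      # post-treatment outcome

# Placebo regression: treatment cannot causally affect the lagged outcome
placebo <- summary(lm(y_lag ~ d + x))$coefficients["d", ]
placebo[c("Estimate", "Pr(>|t|)")]
```

A placebo coefficient far from zero would suggest remaining confounding. A coefficient near zero is reassuring but, as the slide notes, does not prove that unconfoundedness holds for the real outcome.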

Alternatives to no unmeasured confounding

• Without explicit randomization, we need some way of identifying causal effects.
• No unmeasured confounders ≈ randomized experiment.
  - Identification results are very similar to experiments.
• With unmeasured confounding, are we doomed? Maybe not!
• Other approaches rely on finding plausibly exogenous variation in assignment of $D_i$:
  - Instrumental variables (randomization + exclusion restriction)
  - Over-time variation (diff-in-diff, fixed effects)
  - Arbitrary thresholds for treatment assignment (RDD)

3. No unmeasured confounders and OLS

Justifying regression

• We know how randomized experiments imply that differences-in-means identify the ATE.
• In the next few weeks, we'll work through how no unmeasured confounding justifies a number of estimation strategies.
• Today, it's useful to walk through what no unmeasured confounding can buy us in a familiar setting: OLS.
  - We'll cover regression more formally later.

Constant effects set up

• Assume a constant effects setup:

  $$Y_i(0) = \alpha + X_i'\beta + u_i$$
  $$Y_i(1) = \alpha + \tau + X_i'\beta + u_i$$

• Constant effects because $Y_i(1) - Y_i(0) = \tau$ for all units.
• Use consistency to get the usual regression formula:

  $$\begin{aligned}
  Y_i &= Y_i(1) D_i + Y_i(0)(1 - D_i) \\
      &= Y_i(0) + \bigl(Y_i(1) - Y_i(0)\bigr) \cdot D_i \\
      &= \alpha + \tau \cdot D_i + X_i'\beta + u_i
  \end{aligned}$$

• Does no unmeasured confounding help us identify the causal parameter $\tau$?

Regression on residuals

• First, estimate the residuals of regression of the treatment and outcome on the covariates:

  $$\tilde{Y}_i = Y_i - \mathbb{E}[Y_i \mid X_i], \qquad \tilde{D}_i = D_i - \mathbb{E}[D_i \mid X_i]$$

• Running a regression of $\tilde{Y}_i$ on $\tilde{D}_i$ is equivalent to controlling for $X_i$:

  $$Y_i = \alpha + \tau \cdot D_i + X_i'\beta + u_i$$
  $$\tilde{Y}_i = \alpha + \tau \cdot \tilde{D}_i + \tilde{u}_i$$

• Here $\tilde{u}_i = u_i - \mathbb{E}[u_i \mid X_i]$.
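This equivalence (the Frisch-Waugh-Lovell result for linear regression) can be verified numerically. In the sketch below the conditional expectations are replaced by linear regressions on the covariates, which is an assumption of the example; the simulated data and all variable names are illustrative.

```r
set.seed(42)
n  <- 1000
x1 <- rnorm(n)
x2 <- rnorm(n)
d  <- rbinom(n, 1, plogis(0.5 * x1 - 0.5 * x2))
y  <- 1 + 2 * d + x1 - x2 + rnorm(n)

# Coefficient on d when controlling for the covariates directly
tau_full <- coef(lm(y ~ d + x1 + x2))["d"]

# Residualize outcome and treatment on the covariates,
# then regress residual on residual
y_tilde   <- resid(lm(y ~ x1 + x2))
d_tilde   <- resid(lm(d ~ x1 + x2))
tau_resid <- coef(lm(y_tilde ~ d_tilde))["d_tilde"]

# The two coefficients agree up to floating-point error
all.equal(unname(tau_full), unname(tau_resid))
```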

What does OLS estimate

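• Using the usual OLS theory, we can show that the probability limit of the OLS estimator of $\tau$ is:

  $$\begin{aligned}
  \operatorname{plim} \hat{\tau}_{OLS} &= \frac{\operatorname{Cov}(\tilde{D}_i, \tilde{Y}_i)}{\operatorname{Var}(\tilde{D}_i)} \\
  &= \frac{\operatorname{Cov}(\tilde{D}_i, \alpha + \tau \cdot \tilde{D}_i + \tilde{u}_i)}{\operatorname{Var}(\tilde{D}_i)} \\
  &= \frac{\tau \cdot \operatorname{Cov}(\tilde{D}_i, \tilde{D}_i) + \operatorname{Cov}(\tilde{D}_i, \tilde{u}_i)}{\operatorname{Var}(\tilde{D}_i)} \\
  &= \tau + \frac{\operatorname{Cov}(\tilde{D}_i, \tilde{u}_i)}{\operatorname{Var}(\tilde{D}_i)}
  \end{aligned}$$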

Key OLS assumption

  $$\operatorname{plim} \hat{\tau}_{OLS} = \tau + \frac{\operatorname{Cov}(\tilde{D}_i, \tilde{u}_i)}{\operatorname{Var}(\tilde{D}_i)}$$

• Key identification comes from $\operatorname{Cov}(\tilde{D}_i, \tilde{u}_i) = 0$.
  - Conditional on $X_i$, no relationship between $D_i$ and $u_i$.
• Note: $u_i$ is a function of $X_i$ and $Y_i(d)$:
  - $u_i = Y_i(0) - \alpha - X_i'\beta$ when $D_i = 0$
  - $u_i = Y_i(1) - \alpha - \tau - X_i'\beta$ when $D_i = 1$
  - Conditional on $X_i$, the only variation in $u_i$ comes from $Y_i(d)$.
• No unmeasured confounding implies this assumption:

  $$D_i \perp\!\!\!\perp \bigl(Y_i(1), Y_i(0)\bigr) \mid X_i \implies D_i \perp\!\!\!\perp u_i \mid X_i \implies \operatorname{Cov}(\tilde{D}_i, \tilde{u}_i) = 0$$

Omitted variable bias

• What happens when this is violated? Suppose that there is one omitted variable (residualized from $X_i$):

  $$\tilde{u}_i = \lambda \tilde{L}_i + \omega_i$$

• We'll assume that if we could measure $L_i$, then no unmeasured confounding would hold.
• Leads to inconsistency in the OLS estimator:

  $$\operatorname{plim} \hat{\tau}_{OLS} = \tau + \lambda \frac{\operatorname{Cov}(\tilde{D}_i, \tilde{L}_i)}{\operatorname{Var}(\tilde{D}_i)}$$

• Bias here is two terms multiplied together:
  1. the coefficient on $L_i$ ($\lambda$)
  2. the coefficient on $D_i$ from a regression of $L_i$ on $D_i$, also controlling for $X_i$
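The bias formula has an exact in-sample counterpart that is easy to check. A minimal sketch with simulated data (the variable `L`, the coefficients, and the sample size are all hypothetical): the coefficient on `d` from the regression that omits `L` equals the coefficient from the regression that includes it, plus the coefficient on `L` times the coefficient on `d` from regressing `L` on `d` and `x`.

```r
set.seed(7)
n <- 2000
x <- rnorm(n)
L <- 0.5 * x + rnorm(n)                  # omitted variable
d <- rbinom(n, 1, plogis(x + L))         # treatment depends on x and L
y <- 1 + 2 * d + x + 1.5 * L + rnorm(n)  # true tau = 2, lambda = 1.5

short <- coef(lm(y ~ d + x))["d"]        # regression that omits L
long  <- coef(lm(y ~ d + x + L))         # regression that includes L
delta <- coef(lm(L ~ d + x))["d"]        # coefficient on d in the regression of L

# short-regression coefficient vs. long coefficient plus lambda-hat * delta-hat
implied <- long["d"] + long["L"] * delta
c(short = unname(short), implied = unname(implied))
```

The two numbers agree up to floating-point error, and both exceed the true effect because `L` raises both treatment uptake and the outcome in this simulation.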

4. Estimating causal effects under no unmeasured confounders

Basic approach to estimation

• Remember the usual approach to estimating the ATE with covariates.
• Stratification:
  - Stratify the units by the covariates.
  - Calculate the CATE within these strata.
• Standardization/direct adjustment:
  - Average the CATEs across the strata to get the ATE.
• How to create strata when $X$ has continuous components?
  - If $X$ is discrete with only a few levels, can use the exact values of $X$.
  - Otherwise, we may have to subclassify/coarsen the data.

Classic example: cigars/pipes versus cigarettes

• $D_i = 1$ for pipe/cigar smokers, $D_i = 0$ for cigarette smokers
• $Y_i$ = death in the first year of follow-up
• Naive positive effect: cigar/pipe smokers more likely to die.
  - What's the confounder here? Age!
  - Pipe/cigar smokers are much older than cigarette smokers.
• Cochran's approach: stratify based on coarsened age.
  - Divide age into $k$ strata: $S_i \in \{s_1, s_2, \ldots, s_k\}$
  - $s_1$ might be 18-25, $s_2$ might be 26-35, and so on.
  - Calculate the effect within strata and aggregate.
• Key assumption: no unmeasured confounders using the stratified version of age:

  $$D_i \perp\!\!\!\perp \bigl(Y_i(0), Y_i(1)\bigr) \mid S_i$$
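The stratify-then-aggregate logic can be sketched in R. The data below are simulated, not Cochran's: the age range, the logistic coefficients, and the stratum boundaries are all made up, and the simulation builds in a true effect of zero so that any naive difference is due to age alone.

```r
set.seed(10)
n   <- 20000
age <- runif(n, 18, 85)
# Hypothetical: older people are more likely to smoke pipes/cigars (d = 1)
d   <- rbinom(n, 1, plogis(-3 + 0.06 * age))
# Hypothetical: mortality rises with age; smoking type has no effect here
y   <- rbinom(n, 1, plogis(-6 + 0.07 * age))

naive <- mean(y[d == 1]) - mean(y[d == 0])

# Coarsen age into strata
s <- cut(age, breaks = c(18, 25, 35, 45, 55, 65, 75, 85), include.lowest = TRUE)

# Difference in means within each stratum, then weight by stratum share
tau_s <- sapply(levels(s), function(k) {
  mean(y[d == 1 & s == k]) - mean(y[d == 0 & s == k])
})
w <- as.numeric(table(s)) / n
stratified <- sum(tau_s * w)

round(c(naive = naive, stratified = stratified), 3)
```

The stratified estimate should be much closer to zero than the naive one. Some age variation remains inside each coarse stratum, so a small amount of residual confounding is expected.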

Stratification on the propensity score

• What about when $X$ has many dimensions?
• Curse of dimensionality: there will be very few, if any, units in a given stratum of $X_i$.
• Stratify on a low-dimensional summary, the propensity score:

  $$e(x) = \mathbb{P}[D_i = 1 \mid X_i = x]$$

  - PS = unit's probability of being treated, conditional on $X_i$.
• For a particular unit, this is $e(X_i) = \mathbb{P}[D_i = 1 \mid X_i]$.
• Rosenbaum and Rubin (1983) showed that:

  $$D_i \perp\!\!\!\perp \bigl(Y_i(0), Y_i(1)\bigr) \mid X_i \implies D_i \perp\!\!\!\perp \bigl(Y_i(0), Y_i(1)\bigr) \mid e(X_i)$$

  - Stratifying on $e_i$ is the same as stratifying on the full $X_i$.

Propensity score as balancing score

• The propensity score is actually a balancing score, which means that:

  $$D_i \perp\!\!\!\perp X_i \mid e(X_i)$$

• Conditional on the propensity score, treatment is independent of the covariates.
  - Treatment status is said to be balanced: $f(X_i \mid D_i = 1, e(X_i)) = f(X_i \mid D_i = 0, e(X_i))$
• Of course, we have to know the true PS to have all these results work.
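One way to see why the balancing property holds for binary $D_i$ (a short worked derivation added here, not taken from the slides): it is enough to show that the probability of treatment given both $X_i$ and $e(X_i)$ equals the probability of treatment given $e(X_i)$ alone.

```latex
\begin{aligned}
\mathbb{P}[D_i = 1 \mid X_i, e(X_i)]
  &= \mathbb{P}[D_i = 1 \mid X_i] = e(X_i), \\
\mathbb{P}[D_i = 1 \mid e(X_i)]
  &= \mathbb{E}\bigl[\, \mathbb{E}[D_i \mid X_i] \,\big|\, e(X_i) \bigr]
   = \mathbb{E}[\, e(X_i) \mid e(X_i) \,] = e(X_i).
\end{aligned}
```

The first line uses the fact that $e(X_i)$ is a function of $X_i$, and the second uses the law of iterated expectations. Since both probabilities equal $e(X_i)$, knowing $X_i$ adds nothing about $D_i$ once $e(X_i)$ is known, which is the statement $D_i \perp\!\!\!\perp X_i \mid e(X_i)$.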

Estimating the propensity score

• Of course, in observational studies we don't know the propensity score.
• We would run a parametric model with parameters $\gamma$ to estimate the propensity scores:
  1. Estimate $\hat{\gamma}$.
  2. Create $\hat{e}_i = \Pr[D_i = 1 \mid X_i; \hat{\gamma}]$.
• For instance, in R, we could easily calculate the propensity scores using the glm function:

    pscores <- glm(treat ~ var1 + var2 + var3, data = mydata,
                   family = binomial())$fitted.values

Propensity score specifics

• What variables do we include in the propensity score model?
  - Any set of variables that blocks all the backdoor paths from $D_i$ to $Y_i$.
• Check balance within strata of $\hat{e}_i$. Covariates should be balanced:

  $$f(X_i \mid D_i = 1, \hat{e}_i) = f(X_i \mid D_i = 0, \hat{e}_i)$$

• Can also use automated/nonparametric tools for estimating $\hat{e}_i$:
  - Covariate Balancing Propensity Scores (Imai and Ratkovic)
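A balance check within strata of the estimated score might look like the sketch below. It uses simulated data and compares only means of one covariate, a simplification of the full distributional statement above; the choice of five quantile-based strata and all variable names are assumptions of the example.

```r
set.seed(3)
n  <- 10000
x1 <- rnorm(n)
x2 <- rnorm(n)
d  <- rbinom(n, 1, plogis(0.8 * x1 + 0.4 * x2))

# Estimated propensity scores from a logistic regression
e_hat <- glm(d ~ x1 + x2, family = binomial())$fitted.values

# Five strata defined by quintiles of the estimated score
block <- cut(e_hat, breaks = quantile(e_hat, 0:5 / 5), include.lowest = TRUE)

# Treated-minus-control difference in the mean of x1,
# overall and within each stratum
diff_x1  <- function(idx) mean(x1[idx & d == 1]) - mean(x1[idx & d == 0])
overall  <- diff_x1(rep(TRUE, n))
by_block <- sapply(levels(block), function(k) diff_x1(block == k))

round(c(overall = overall, by_block), 2)
```

Within-stratum differences should be noticeably smaller than the overall difference. With only five coarse strata they will generally not be exactly zero, especially in the lowest and highest strata.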

Stratifying by the propensity score

• How will we use the propensity score?
  - Matching (next week)
  - Weighting (two weeks)
  - Regression (three weeks)
• Today: coarsening the propensity score and stratifying.
• Choose boundary points: $0 = b_0 < b_1 < \ldots < b_{K-1} < b_K = 1$
• Create block indicators:

  $$B_i(k) = \begin{cases} 1 & \text{if } b_{k-1} < \hat{e}(X_i) < b_k \\ 0 & \text{otherwise} \end{cases}$$

• Calculate within-strata effect estimates:

  $$\tau_k = \mathbb{E}[Y_i \mid D_i = 1, B_i(k) = 1] - \mathbb{E}[Y_i \mid D_i = 0, B_i(k) = 1]$$

Standardization/direct adjustment

• We calculated the CATEs for each stratum of the PS, $\tau_k$.
• We can use the law of iterated expectations to back out the ATE.
• Take the average of the CATEs over the distribution of $X$:

  $$\tau = \sum_{k=1}^{K} \tau_k \, \mathbb{P}[B_i(k) = 1]$$

• Note that $\mathbb{P}[B_i(k) = 1]$ is just the proportion of units in block $k$:

  $$\mathbb{P}[B_i(k) = 1] = \frac{\sum_{i=1}^{N} B_i(k)}{N}$$
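Putting the blocking and standardization steps together, a minimal R sketch on simulated data could look like this. The data-generating process, the choice of $K = 10$, and the use of score deciles as interior boundary points are assumptions of the example, not part of the lecture.

```r
set.seed(11)
n  <- 20000
x1 <- rnorm(n)
x2 <- rnorm(n)
d  <- rbinom(n, 1, plogis(0.8 * x1 + 0.4 * x2))
y  <- 1 + 2 * d + x1 + x2 + rnorm(n)    # true ATE is 2 in this simulation

e_hat <- glm(d ~ x1 + x2, family = binomial())$fitted.values

# Boundary points 0 = b_0 < b_1 < ... < b_K = 1 (interior points at deciles)
K <- 10
b <- c(0, unname(quantile(e_hat, (1:(K - 1)) / K)), 1)
block <- cut(e_hat, breaks = b)

# Within-block difference in means
tau_k <- sapply(levels(block), function(k) {
  mean(y[d == 1 & block == k]) - mean(y[d == 0 & block == k])
})

# Proportion of units in each block, then the weighted average
p_k     <- as.numeric(table(block)) / n
tau_hat <- sum(tau_k * p_k)

naive <- mean(y[d == 1]) - mean(y[d == 0])
round(c(naive = naive, stratified = tau_hat), 2)
```

The stratified estimate should land much closer to 2 than the naive difference in means. Using more blocks trades residual within-block confounding against noisier within-block estimates.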

5. Wrapping Up

Summary

• Defined observational studies.
• Defined confounding and assessed when no unmeasured confounding holds.
• Saw how no unmeasured confounding helps with OLS.
• Saw how to estimate causal effects under no unmeasured confounding using the propensity score.

Next few weeks

• Learn how to estimate causal effects under no unmeasured confounders via:
  - Matching
  - Weighting
  - Regression
• Then we move on to situations where no unmeasured confounders is violated.



bull Controversial because of differing views on what to control for Rubin thinks that M-bias is a ldquomathematical curiosityrdquo and we

should control for all pretreatment variables Pearl and others think M-bias is a real threat

Backdoor criterion

bull Can we use a DAG to evaluate no unmeasured confoundersbull Pearl answered yes with the backdoor criterion which states

that the effect of 119863 on 119884 is identified if1 No backdoor paths from 119863 to 119884 OR2 Measured covariates are sufficient to block all backdoor paths

from 119863 to 119884

bull First is really only valid for randomized experimentsbull The backdoor criterion is fairly powerful Tells us

if there confounding given this DAG if it is possible to removing the confounding and what variables to condition on to eliminate the confounding

SWIGs

119863 | 119889 119884(119889)

119880 119883

119884

bull Itrsquos a little hard to see how the backdoor criterion implies nounmeasured confounders

No potential outcomes on this graphbull Richardson and Robins Single World Intervention Graphs

Split 119863 node into natural value (119863) and intervention value 119889 Let all effects of 119863 take their potential value under

intervention 119884(119889)bull Now can see are 119863 and 119884(119889) related

119863 larr 119880 rarr 119883 rarr 119884(119889) implies not independent Conditioning on 119883 blocks that backdoor path 119863 ⟂⟂ 119884(119889)|119883

No unmeasured confounders is nottestable

bull No unmeasured confounding places no restrictions on theobserved data

1114100119884119894(0)|119863119894 = 1119883119894111410311140591113840111384011138401113840111384011138401113840111384011138401113840111406011138401113840111384011138401113840111384011138401113840111384011138401114061unobserved

119889= 1114100119884119894(0)|119863119894 = 0119883119894111410311140591113840111384011138401113840111384011138401113840111384011138401113840111406011138401113840111384011138401113840111384011138401113840111384011138401114061observed

bull Here 119889= means equal in distributionbull No way to directly test this assumption without the

counterfactual data which is missing by definitionbull With backdoor criterion you must have the correct DAG

Assessing no unmeasured confounders

bull Can do ldquoplacebordquo tests where 119863119894 cannot have an effect(lagged outcomes etc)

bull Della Vigna and Kaplan (2007 QJE) effect of Fox Newsavailability on Republican vote share

Availability in 20002003 canrsquot affect past vote sharesbull Unconfoundedness could still be violated even if you pass this

test

Alternatives to no unmeasuredconfounding

bull Without explicit randomization we need some way ofidentifying causal effects

bull No unmeasured confounders asymp randomized experiment Indentification results very similar to experiments

bull With unmeasured confounding are we doomed Maybe notbull Other approaches rely on finding plausibly exogenous variation

in assignment of 119863119894 Instrumental variables (randomization + exclusion restriction) Over-time variation (diff-in-diff fixed effects) Arbitrary thresholds for treatment assignment (RDD)

3 No unmeasuredconfounders and OLS

Justifying regression

bull We know how randomized experiments imply thatdifferences-in-means identify the ATE

bull In the next few weeks wersquoll work through how no unmeasuredconfounding justifies a number of estimation strategies

bull Today itrsquos useful to walk through what no unmeasuredconfounding can buy us in a familiar setting OLS

Wersquoll cover regression more formally later

Constant effects set up

bull Assume a constant effects setup

119884119894(0) = 120572 + 119883prime119894 120573 + 119906119894

119884119894(1) = 120572 + 120591 + 119883prime119894 120573 + 119906119894

bull Constant effects because 119884119894(1) minus 119884119894(0) = 120591 for all unitsbull Use consistency to get the usual regression formula

119884119894 = 119884119894(1)119863119894 + 119884119894(0)(1 minus 119863119894)= 119884119894(0) + 1114100119884119894(1) minus 119884119894(0)1114103 sdot 119863119894

= 120572 + 120591 sdot 119863119894 + 119883prime119894 120573 + 119906119894

bull Does no unmeasured confounding help us identify the causalparameter 120591

Regression on residuals

bull First estimate the residuals of regression of the treatment andoutcome on the covariates

119894 = 119884119894 minus 120124[119884119894|119883119894]119894 = 119863119894 minus 120124[119863119894|119883119894]

bull Running a regression of 119894 on 119894 is equivalent to controllingfor 119883119894

119884119894 = 120572 + 120591 sdot 119863119894 + 119883prime119894 120573 + 119906119894

119894 = 120572 + 120591 sdot 119894 + 119894bull Here 119894 = 119906119894 minus 120124[119906119894|119883119894]

What does OLS estimate

bull Using the usual OLS theory we can show that the probabilitylimit of the OLS estimator of 120591 is

plim 111369611136931113700 =Cov(119894 119894)Var(119894)

= Cov(119894 120572 + 120591 sdot 119894 + 119894)Var(119894)

= 120591 sdot Cov(119894 119894) + Cov(119894 119894)Var(119894)

= 120591 + Cov(119894 119894)Var(119894)

Key OLS assumption

plim 111369611136931113700 = 120591 + Cov(119894 119894)Var(119894)

bull Key identification comes from Cov(119894 119894) = 0 Conditional on 119883119894 no relationship between 119863119894 and 119906119894

bull Note 119906119894 is a function of 119883119894 and 119884119894(119889) 119906119894 = 119884119894(0) minus 120572 minus 119883prime

119894 120573 when 119863119894 = 0 119906119894 = 119884119894(1) minus 120572 minus 120591 minus 119883prime

119894 120573 when 119863119894 = 1 condition on 119883119894 only variation in 119906119894 comes from 119884119894(119889)

bull No unmeasured confounding implies this assumption

119863119894 ⟂⟂ 1114100119884119894(1) 119884119894(0)1114103|119883119894 ⟹ 119863119894 ⟂⟂ 119906119894|119883119894 ⟹ Cov(119894 119894) = 0

Omitted variable bias

bull What happens when this is violated Suppose that there isone omitted variable (residualized from 119883119894)

119894 = 120582119894 + 120596119894

bull Wersquoll assume that if we could measure 119871119894 then nounmeasured confounding would hold

bull Leads to inconsistency in the OLS estimator

plim 111369611136931113700 = 120591 + 120582Cov(119894 119894)Var(119894)

bull Bias here is terms multiplied together1 coefficient on 119871119894 (120582)2 the coefficient of regression of 119863119894 on 119871119894 also controlling for 119883119894

4 Estimating causaleffects under nounmeasuredconfounders

Basic approach to estimation

bull Remember the usual approach to estimating the ATE withcovariates

bull Stratification Stratify the units by the covariates Calculate CATE within these strata

bull Standardizationdirect adjustment Average the CATEs across the strata to get ATE

bull How to create strata when 119883 has continuous components If 119883 is discrete with only a few levels can use the exact values

of 119883 Otherwise we may have to subclassifycoarsen the data

Classic example cigarspipes versuscigarettes

bull 119863119894 = 1 for pipecigar smokers 119863119894 = 0 for cigarette smokersbull 119884119894 = death in the first year of follow-upbull Naive positive effect cigarpipe smokers more likely to die

Whatrsquos the confounder here Age Pipecigar smokers much older than cigarette smokers

bull Cochranrsquos approach stratify based on coarsened age Divide age into 119896 strata 119878119894 isin 1199041113568 1199041113569 hellip 119904119896 1199041113568 might be 18-25 1199041113569 might be 26-35 and so on Calculate effect within strata and aggregate

bull Key assumption no unmeasured confounders using stratifiedversion of age

119863119894 ⟂⟂ 1114100119884119894(0) 119884119894(1)1114103|119878119894

Stratification on the propensity score

bull What about when 119883 has has many dimensionsbull Curse of dimensionality there will be very few if any units in

a given stratum of 119883119894bull Stratify on a low-dimensional summary the propensity score

119890(119909) = ℙ[119863119894 = 1|119883119894 = 119909]

PS = unitrsquos probability of being treated conditional on 119883119894

bull For a particular unit this is 119890(119883119894) = ℙ[119863119894 = 1|119883119894]bull Rosenbaum and Rubin (1983) showed that

119863119894 ⟂⟂ 1114100119884119894(0) 119884119894(1)1114103 | 119883119894 ⟹ 119863119894 ⟂⟂ 1114100119884119894(0) 119884119894(1)1114103 | 119890(119883119894)

stratifying on 119890119894 is the same as stratifying on the full 119883119894

Propensity score as balancing score

bull The propensity score is actually a balancing score whichmeans that

119863119894 ⟂⟂ 119883119894 | 119890(119883119894)

bull Conditional on the propensity score treatment is independentof the covariates

Treatment status is said to be balanced 119891(119883119894|119863119894 = 1 119890(119883119894)) = 119891(119883119894|119863119894 = 0 119890(119883119894))

bull Of course we have to know the true PS to have all theseresults work

Estimating the propensity score

bull Of course in observational studies we donrsquot know thepropensity score

bull We would run a parametric model with parameters 120574 toestimate the propensity scores

1 Estimate 2 Create 119894 = Pr[119863119894 = 1|119883119894 ]

bull For instance in R we could easily calculate the propensityscores using the glm function

pscores lt- glm(treat ~ var1 + var2 + var3 data = mydata

family = binomial())$fittedvalues

Propensity score specifics

bull What variables do we include in the propensity score model Any set of variables that blocks all the backdoor paths from 119863119894

to 119884119894

bull Check balance within strata of 119894 Covariates should bebalanced

119891(119883119894|119863119894 = 1 119894) = 119891(119883119894|119863119894 = 0 119894)

bull Can also use automatednonparametric tools for estimating 119894 Covariate Balancing Propensity Scores (Imai and Ratkovic)

Stratifying by the propensity score

bull How will we use the propensity score Matching (next week) Weighting (two weeks) Regression

(three weeks)bull Today coarsening the propensity score and stratifyingbull Choose boundary points 0 = 1198871113567 lt 1198871113568 lt hellip lt 119887119870minus1113568 lt 119887119870 = 1bull Create block indicators

119861119894(119896) =

⎧⎪⎪⎨⎪⎪⎩1 if 119887119896minus1113568 lt (119883119894) lt 1198871198960 otherwise

bull Calculate within-strata effect estimates

120591119896 = 120124[119884119894|119863119894 = 1 119861119894(119896) = 1] minus 120124[119884119894|119863119894 = 0 119861119894(119896) = 1]

Standardizationdirect adjustment

bull We calculated the CATEs for each strata of the PS 120591119896bull We can use law of iterated expectations to back out the ATEbull Take the average of the CATEs over the distribution of 119883

120591 =1198701114012119896=1113568

120591119896ℙ[119861119894(119896) = 1]

bull Note that ℙ[119861119894(119896) = 1] is just the proportion of units in block 119896

ℙ[119861119894(119896) = 1] =sum119873

119894=1113568 119861119894(119896)119873

5 Wrapping Up

Summary

bull Defined observational studiesbull Defined confounding and assessed when no unmeasured

confounding holdsbull Saw how no unmeasured confounding helps with OLSbull Saw how to estimate causal effects under no unmeasured

confounding using the propensity score

Next few weeks

bull Learn how to estimate causal effects under no unmeasuredconfounders via

Matching Weighting Regression

bull Then we move onto situations where no unmeasuredconfounders is violated

  • Observational studies
  • Confounding
  • No unmeasured confounders and OLS
  • Estimating causal effects under no unmeasured confounders
  • Wrapping Up
Page 20: Gov 2002: 4. Observational Studies and Confounding · 2019-07-23 · 2. the coefficient of regression of𝐷𝑖on 𝐿𝑖also controlling for ... Basicapproachtoestimation • Remember

Backdoor criterion

bull Can we use a DAG to evaluate no unmeasured confoundersbull Pearl answered yes with the backdoor criterion which states

that the effect of 119863 on 119884 is identified if1 No backdoor paths from 119863 to 119884 OR2 Measured covariates are sufficient to block all backdoor paths

from 119863 to 119884

bull First is really only valid for randomized experimentsbull The backdoor criterion is fairly powerful Tells us

if there confounding given this DAG if it is possible to removing the confounding and what variables to condition on to eliminate the confounding

SWIGs

119863 | 119889 119884(119889)

119880 119883

119884

bull Itrsquos a little hard to see how the backdoor criterion implies nounmeasured confounders

No potential outcomes on this graphbull Richardson and Robins Single World Intervention Graphs

Split 119863 node into natural value (119863) and intervention value 119889 Let all effects of 119863 take their potential value under

intervention 119884(119889)bull Now can see are 119863 and 119884(119889) related

119863 larr 119880 rarr 119883 rarr 119884(119889) implies not independent Conditioning on 119883 blocks that backdoor path 119863 ⟂⟂ 119884(119889)|119883

No unmeasured confounders is nottestable

bull No unmeasured confounding places no restrictions on theobserved data

1114100119884119894(0)|119863119894 = 1119883119894111410311140591113840111384011138401113840111384011138401113840111384011138401113840111406011138401113840111384011138401113840111384011138401113840111384011138401114061unobserved

119889= 1114100119884119894(0)|119863119894 = 0119883119894111410311140591113840111384011138401113840111384011138401113840111384011138401113840111406011138401113840111384011138401113840111384011138401113840111384011138401114061observed

bull Here 119889= means equal in distributionbull No way to directly test this assumption without the

counterfactual data which is missing by definitionbull With backdoor criterion you must have the correct DAG

Assessing no unmeasured confounders

bull Can do ldquoplacebordquo tests where 119863119894 cannot have an effect(lagged outcomes etc)

bull Della Vigna and Kaplan (2007 QJE) effect of Fox Newsavailability on Republican vote share

Availability in 20002003 canrsquot affect past vote sharesbull Unconfoundedness could still be violated even if you pass this

test

Alternatives to no unmeasuredconfounding

bull Without explicit randomization we need some way ofidentifying causal effects

bull No unmeasured confounders asymp randomized experiment Indentification results very similar to experiments

bull With unmeasured confounding are we doomed Maybe notbull Other approaches rely on finding plausibly exogenous variation

in assignment of 119863119894 Instrumental variables (randomization + exclusion restriction) Over-time variation (diff-in-diff fixed effects) Arbitrary thresholds for treatment assignment (RDD)

3 No unmeasuredconfounders and OLS

Justifying regression

bull We know how randomized experiments imply thatdifferences-in-means identify the ATE

bull In the next few weeks wersquoll work through how no unmeasuredconfounding justifies a number of estimation strategies

bull Today itrsquos useful to walk through what no unmeasuredconfounding can buy us in a familiar setting OLS

Wersquoll cover regression more formally later

Constant effects set up

bull Assume a constant effects setup

119884119894(0) = 120572 + 119883prime119894 120573 + 119906119894

119884119894(1) = 120572 + 120591 + 119883prime119894 120573 + 119906119894

bull Constant effects because 119884119894(1) minus 119884119894(0) = 120591 for all unitsbull Use consistency to get the usual regression formula

119884119894 = 119884119894(1)119863119894 + 119884119894(0)(1 minus 119863119894)= 119884119894(0) + 1114100119884119894(1) minus 119884119894(0)1114103 sdot 119863119894

= 120572 + 120591 sdot 119863119894 + 119883prime119894 120573 + 119906119894

bull Does no unmeasured confounding help us identify the causalparameter 120591

Regression on residuals

bull First estimate the residuals of regression of the treatment andoutcome on the covariates

119894 = 119884119894 minus 120124[119884119894|119883119894]119894 = 119863119894 minus 120124[119863119894|119883119894]

bull Running a regression of 119894 on 119894 is equivalent to controllingfor 119883119894

119884119894 = 120572 + 120591 sdot 119863119894 + 119883prime119894 120573 + 119906119894

119894 = 120572 + 120591 sdot 119894 + 119894bull Here 119894 = 119906119894 minus 120124[119906119894|119883119894]

What does OLS estimate

bull Using the usual OLS theory we can show that the probabilitylimit of the OLS estimator of 120591 is

plim 111369611136931113700 =Cov(119894 119894)Var(119894)

= Cov(119894 120572 + 120591 sdot 119894 + 119894)Var(119894)

= 120591 sdot Cov(119894 119894) + Cov(119894 119894)Var(119894)

= 120591 + Cov(119894 119894)Var(119894)

Key OLS assumption

plim 111369611136931113700 = 120591 + Cov(119894 119894)Var(119894)

bull Key identification comes from Cov(119894 119894) = 0 Conditional on 119883119894 no relationship between 119863119894 and 119906119894

bull Note 119906119894 is a function of 119883119894 and 119884119894(119889) 119906119894 = 119884119894(0) minus 120572 minus 119883prime

119894 120573 when 119863119894 = 0 119906119894 = 119884119894(1) minus 120572 minus 120591 minus 119883prime

119894 120573 when 119863119894 = 1 condition on 119883119894 only variation in 119906119894 comes from 119884119894(119889)

bull No unmeasured confounding implies this assumption

119863119894 ⟂⟂ 1114100119884119894(1) 119884119894(0)1114103|119883119894 ⟹ 119863119894 ⟂⟂ 119906119894|119883119894 ⟹ Cov(119894 119894) = 0

Omitted variable bias

bull What happens when this is violated Suppose that there isone omitted variable (residualized from 119883119894)

119894 = 120582119894 + 120596119894

bull Wersquoll assume that if we could measure 119871119894 then nounmeasured confounding would hold

bull Leads to inconsistency in the OLS estimator

plim 111369611136931113700 = 120591 + 120582Cov(119894 119894)Var(119894)

bull Bias here is terms multiplied together1 coefficient on 119871119894 (120582)2 the coefficient of regression of 119863119894 on 119871119894 also controlling for 119883119894

4 Estimating causaleffects under nounmeasuredconfounders

Basic approach to estimation

bull Remember the usual approach to estimating the ATE withcovariates

bull Stratification Stratify the units by the covariates Calculate CATE within these strata

bull Standardizationdirect adjustment Average the CATEs across the strata to get ATE

bull How to create strata when 119883 has continuous components If 119883 is discrete with only a few levels can use the exact values

of 119883 Otherwise we may have to subclassifycoarsen the data

Classic example cigarspipes versuscigarettes

bull 119863119894 = 1 for pipecigar smokers 119863119894 = 0 for cigarette smokersbull 119884119894 = death in the first year of follow-upbull Naive positive effect cigarpipe smokers more likely to die

Whatrsquos the confounder here Age Pipecigar smokers much older than cigarette smokers

bull Cochranrsquos approach stratify based on coarsened age Divide age into 119896 strata 119878119894 isin 1199041113568 1199041113569 hellip 119904119896 1199041113568 might be 18-25 1199041113569 might be 26-35 and so on Calculate effect within strata and aggregate

bull Key assumption no unmeasured confounders using stratifiedversion of age

119863119894 ⟂⟂ 1114100119884119894(0) 119884119894(1)1114103|119878119894

Stratification on the propensity score



SWIGs

[Figure: single world intervention graph with nodes $U$, $X$, $D \mid d$, and $Y(d)$]

• It's a little hard to see how the backdoor criterion implies no unmeasured confounders.
  - No potential outcomes on this graph.
• Richardson and Robins: Single World Intervention Graphs (SWIGs)
  - Split the $D$ node into its natural value ($D$) and its intervention value $d$.
  - Let all effects of $D$ take their potential value under intervention: $Y(d)$.
• Now we can see: are $D$ and $Y(d)$ related?
  - $D \leftarrow U \rightarrow X \rightarrow Y(d)$ implies they are not independent.
  - Conditioning on $X$ blocks that backdoor path: $D \perp\!\!\!\perp Y(d) \mid X$.
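
The independence claims read off the SWIG can be checked in a small simulation. This is only a sketch: the data-generating process, the variable names (`u`, `x`, `d`, `y0`), and all parameter values are assumptions chosen to match the path $D \leftarrow U \rightarrow X \rightarrow Y(d)$, not anything from the lecture.

```r
# Simulate D <- U -> X -> Y(0); every number below is illustrative.
set.seed(2002)
n  <- 100000
u  <- rbinom(n, 1, 0.5)              # unmeasured common cause of D and X
x  <- rbinom(n, 1, 0.2 + 0.6 * u)    # measured covariate, depends on U
d  <- rbinom(n, 1, 0.2 + 0.6 * u)    # treatment depends on U only
y0 <- 2 * x + rnorm(n)               # potential outcome Y(0) depends on X only

# Marginally, D predicts Y(0): the backdoor path through U and X is open.
marginal_gap <- mean(y0[d == 1]) - mean(y0[d == 0])

# Within levels of X the path is blocked, so the gaps should be near zero.
gap_x0 <- mean(y0[d == 1 & x == 0]) - mean(y0[d == 0 & x == 0])
gap_x1 <- mean(y0[d == 1 & x == 1]) - mean(y0[d == 0 & x == 1])

round(c(marginal = marginal_gap, x0 = gap_x0, x1 = gap_x1), 3)
```

In a simulation we get to see $Y_i(0)$ for every unit, which is exactly what real data never provide.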

No unmeasured confounders is not testable

• No unmeasured confounding places no restrictions on the observed data:

$$\underbrace{\left(Y_i(0) \mid D_i = 1, X_i\right)}_{\text{unobserved}} \overset{d}{=} \underbrace{\left(Y_i(0) \mid D_i = 0, X_i\right)}_{\text{observed}}$$

• Here $\overset{d}{=}$ means equal in distribution.
• No way to directly test this assumption without the counterfactual data, which is missing by definition.
• With the backdoor criterion, you must have the correct DAG.

Assessing no unmeasured confounders

• Can do "placebo" tests, where $D_i$ cannot have an effect (lagged outcomes, etc.)
• DellaVigna and Kaplan (2007, QJE): effect of Fox News availability on Republican vote share
  - Availability in 2000/2003 can't affect past vote shares.
• Unconfoundedness could still be violated even if you pass this test.
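
A placebo test of this kind might look like the following sketch. The data are simulated, and the names `y_lag`, `d`, and `x` are hypothetical. The lagged outcome is constructed so that it depends only on the covariate, so the coefficient on treatment should be close to zero.

```r
# Placebo regression on a pre-treatment outcome; all values are illustrative.
set.seed(1)
n     <- 20000
x     <- rnorm(n)
d     <- rbinom(n, 1, plogis(x))     # treatment depends on X only
y_lag <- 1 + 0.5 * x + rnorm(n)      # measured before treatment, so D cannot affect it

placebo_fit <- lm(y_lag ~ d + x)

# Estimate and p-value for the "effect" of D on the lagged outcome.
placebo_est <- coef(summary(placebo_fit))["d", c("Estimate", "Pr(>|t|)")]
placebo_est
```

A small estimate here is reassuring but, as the slide notes, it does not establish unconfoundedness for the real outcome.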

Alternatives to no unmeasured confounding

• Without explicit randomization, we need some way of identifying causal effects.
• No unmeasured confounders ≈ randomized experiment.
  - Identification results very similar to experiments.
• With unmeasured confounding, are we doomed? Maybe not.
• Other approaches rely on finding plausibly exogenous variation in assignment of $D_i$:
  - Instrumental variables (randomization + exclusion restriction)
  - Over-time variation (diff-in-diff, fixed effects)
  - Arbitrary thresholds for treatment assignment (RDD)

3 No unmeasured confounders and OLS

Justifying regression

• We know how randomized experiments imply that differences-in-means identify the ATE.
• In the next few weeks, we'll work through how no unmeasured confounding justifies a number of estimation strategies.
• Today, it's useful to walk through what no unmeasured confounding can buy us in a familiar setting: OLS.
  - We'll cover regression more formally later.

Constant effects set up

• Assume a constant effects setup:

$$Y_i(0) = \alpha + X_i'\beta + u_i$$

$$Y_i(1) = \alpha + \tau + X_i'\beta + u_i$$

• Constant effects because $Y_i(1) - Y_i(0) = \tau$ for all units.
• Use consistency to get the usual regression formula:

$$\begin{aligned}
Y_i &= Y_i(1) D_i + Y_i(0)(1 - D_i) \\
    &= Y_i(0) + \left(Y_i(1) - Y_i(0)\right) \cdot D_i \\
    &= \alpha + \tau \cdot D_i + X_i'\beta + u_i
\end{aligned}$$

• Does no unmeasured confounding help us identify the causal parameter $\tau$?

Regression on residuals

• First, estimate the residuals from regressions of the treatment and the outcome on the covariates:

$$\tilde{Y}_i = Y_i - \mathbb{E}[Y_i \mid X_i] \qquad \tilde{D}_i = D_i - \mathbb{E}[D_i \mid X_i]$$

• Running a regression of $\tilde{Y}_i$ on $\tilde{D}_i$ is equivalent to controlling for $X_i$:

$$Y_i = \alpha + \tau \cdot D_i + X_i'\beta + u_i$$

$$\tilde{Y}_i = \alpha + \tau \cdot \tilde{D}_i + \tilde{u}_i$$

• Here $\tilde{u}_i = u_i - \mathbb{E}[u_i \mid X_i]$.
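
The equivalence between regressing on residuals and controlling for $X_i$ can be verified numerically. A minimal sketch on simulated data, assuming linear regressions on `x1` and `x2` stand in for the conditional expectations; the variable names and the true effect of 2 are made up for illustration.

```r
# Simulated data under a constant effects setup; all values are illustrative.
set.seed(42)
n  <- 2000
x1 <- rnorm(n)
x2 <- rnorm(n)
d  <- rbinom(n, 1, plogis(0.5 * x1 - 0.5 * x2))  # treatment depends on covariates
y  <- 1 + 2 * d + x1 + 0.5 * x2 + rnorm(n)       # true tau = 2

# (a) Regression that controls for the covariates directly.
tau_full <- coef(lm(y ~ d + x1 + x2))["d"]

# (b) Residualize outcome and treatment on the covariates, then regress.
y_tilde   <- resid(lm(y ~ x1 + x2))
d_tilde   <- resid(lm(d ~ x1 + x2))
tau_resid <- coef(lm(y_tilde ~ d_tilde))["d_tilde"]

# The two coefficients agree up to floating-point error.
all.equal(unname(tau_full), unname(tau_resid))
```

The intercept in regression (b) is numerically zero because both residual vectors have mean zero by construction.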

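What does OLS estimate?

• Using the usual OLS theory, we can show that the probability limit of the OLS estimator of $\tau$ is:

$$\begin{aligned}
\operatorname{plim}\, \hat{\tau}_{OLS} &= \frac{\operatorname{Cov}(\tilde{D}_i, \tilde{Y}_i)}{\operatorname{Var}(\tilde{D}_i)} \\
&= \frac{\operatorname{Cov}(\tilde{D}_i, \alpha + \tau \cdot \tilde{D}_i + \tilde{u}_i)}{\operatorname{Var}(\tilde{D}_i)} \\
&= \frac{\tau \cdot \operatorname{Cov}(\tilde{D}_i, \tilde{D}_i) + \operatorname{Cov}(\tilde{D}_i, \tilde{u}_i)}{\operatorname{Var}(\tilde{D}_i)} \\
&= \tau + \frac{\operatorname{Cov}(\tilde{D}_i, \tilde{u}_i)}{\operatorname{Var}(\tilde{D}_i)}
\end{aligned}$$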

Key OLS assumption

$$\operatorname{plim}\, \hat{\tau}_{OLS} = \tau + \frac{\operatorname{Cov}(\tilde{D}_i, \tilde{u}_i)}{\operatorname{Var}(\tilde{D}_i)}$$

• Key identification comes from $\operatorname{Cov}(\tilde{D}_i, \tilde{u}_i) = 0$.
  - Conditional on $X_i$, no relationship between $D_i$ and $u_i$.
• Note: $u_i$ is a function of $X_i$ and $Y_i(d)$:
  - $u_i = Y_i(0) - \alpha - X_i'\beta$ when $D_i = 0$
  - $u_i = Y_i(1) - \alpha - \tau - X_i'\beta$ when $D_i = 1$
  - Conditional on $X_i$, the only variation in $u_i$ comes from $Y_i(d)$.
• No unmeasured confounding implies this assumption:

$$D_i \perp\!\!\!\perp \left(Y_i(1), Y_i(0)\right) \mid X_i \implies D_i \perp\!\!\!\perp u_i \mid X_i \implies \operatorname{Cov}(\tilde{D}_i, \tilde{u}_i) = 0$$

Omitted variable bias

• What happens when this is violated? Suppose that there is one omitted variable (residualized from $X_i$):

$$\tilde{u}_i = \lambda \tilde{L}_i + \omega_i$$

• We'll assume that if we could measure $L_i$, then no unmeasured confounding would hold.
• Leads to inconsistency in the OLS estimator:

$$\operatorname{plim}\, \hat{\tau}_{OLS} = \tau + \lambda \frac{\operatorname{Cov}(\tilde{D}_i, \tilde{L}_i)}{\operatorname{Var}(\tilde{D}_i)}$$

• Bias here is two terms multiplied together:
  1. the coefficient on $L_i$ ($\lambda$)
  2. the coefficient on $D_i$ from a regression of $L_i$ on $D_i$, also controlling for $X_i$
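
The in-sample analogue of this bias formula holds exactly and is easy to check. The sketch below uses simulated data with a hypothetical omitted confounder `l`; all coefficients are illustrative assumptions.

```r
# Omitted variable bias on simulated data; all values are illustrative.
set.seed(7)
n <- 5000
x <- rnorm(n)
l <- 0.5 * x + rnorm(n)                       # omitted confounder L
d <- rbinom(n, 1, plogis(0.8 * l + 0.3 * x))  # treatment depends on L and X
y <- 1 + 2 * d + x + 1.5 * l + rnorm(n)       # true tau = 2, lambda = 1.5

short <- coef(lm(y ~ d + x))       # omits L
long  <- coef(lm(y ~ d + x + l))   # includes L
aux   <- coef(lm(l ~ d + x))       # regression of L on D, controlling for X

# Bias = (coefficient on L) x (coefficient on D in the auxiliary regression).
bias_formula <- unname(long["l"] * aux["d"])

c(short = unname(short["d"]),
  long = unname(long["d"]),
  long_plus_bias = unname(long["d"]) + bias_formula)
```

The `short` and `long_plus_bias` entries coincide, while `long` sits near the true effect because `l` is controlled for.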

4 Estimating causal effects under no unmeasured confounders

Basic approach to estimation

• Remember the usual approach to estimating the ATE with covariates:
• Stratification:
  - Stratify the units by the covariates.
  - Calculate the CATE within these strata.
• Standardization/direct adjustment:
  - Average the CATEs across the strata to get the ATE.
• How to create strata when $X$ has continuous components?
  - If $X$ is discrete with only a few levels, can use the exact values of $X$.
  - Otherwise, we may have to subclassify/coarsen the data.

Classic example: cigars/pipes versus cigarettes

• $D_i = 1$ for pipe/cigar smokers, $D_i = 0$ for cigarette smokers
• $Y_i$ = death in the first year of follow-up
• Naive: positive effect, cigar/pipe smokers more likely to die.
  - What's the confounder here? Age.
  - Pipe/cigar smokers are much older than cigarette smokers.
• Cochran's approach: stratify based on coarsened age.
  - Divide age into $k$ strata, $S_i \in \{s_1, s_2, \ldots, s_k\}$.
  - $s_1$ might be 18-25, $s_2$ might be 26-35, and so on.
  - Calculate the effect within strata and aggregate.
• Key assumption: no unmeasured confounders using the stratified version of age:

$$D_i \perp\!\!\!\perp \left(Y_i(0), Y_i(1)\right) \mid S_i$$
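
The stratify-then-aggregate recipe can be sketched on simulated data. To be clear, these are not Cochran's data: the age range, the strata cut points, and the risk models are assumptions, and the true effect of pipe/cigar smoking is set to zero so that any naive difference is pure confounding by age.

```r
# Stratification on coarsened age; simulated, illustrative values only.
set.seed(10)
n   <- 50000
age <- sample(18:85, n, replace = TRUE)
d   <- rbinom(n, 1, plogis((age - 50) / 10))   # older people more likely to have D = 1
y   <- rbinom(n, 1, plogis(-6 + 0.06 * age))   # death risk depends on age only

# Naive comparison ignores age.
naive <- mean(y[d == 1]) - mean(y[d == 0])

# Coarsen age into strata S_i.
s <- cut(age, breaks = c(17, 25, 35, 45, 55, 65, 75, 85))

# Within-stratum differences in means, then average using stratum shares.
tau_k    <- tapply(y[d == 1], s[d == 1], mean) - tapply(y[d == 0], s[d == 0], mean)
w_k      <- as.numeric(table(s)) / n
adjusted <- sum(tau_k * w_k)

round(c(naive = naive, adjusted = adjusted), 3)
```

Some imbalance in age remains inside each coarse stratum, so the adjusted estimate is close to zero but need not equal it.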

Stratification on the propensity score

• What about when $X$ has many dimensions?
• Curse of dimensionality: there will be very few, if any, units in a given stratum of $X_i$.
• Stratify on a low-dimensional summary, the propensity score:

$$e(x) = \mathbb{P}[D_i = 1 \mid X_i = x]$$

  - PS = unit's probability of being treated, conditional on $X_i$
• For a particular unit, this is $e(X_i) = \mathbb{P}[D_i = 1 \mid X_i]$.
• Rosenbaum and Rubin (1983) showed that:

$$D_i \perp\!\!\!\perp \left(Y_i(0), Y_i(1)\right) \mid X_i \implies D_i \perp\!\!\!\perp \left(Y_i(0), Y_i(1)\right) \mid e(X_i)$$

  - Stratifying on $e(X_i)$ is the same as stratifying on the full $X_i$.

Propensity score as balancing score

• The propensity score is actually a balancing score, which means that:

$$D_i \perp\!\!\!\perp X_i \mid e(X_i)$$

• Conditional on the propensity score, treatment is independent of the covariates.
  - Treatment status is said to be balanced: $f(X_i \mid D_i = 1, e(X_i)) = f(X_i \mid D_i = 0, e(X_i))$
• Of course, we have to know the true PS to have all these results work.

Estimating the propensity score

• Of course, in observational studies we don't know the propensity score.
• We would run a parametric model with parameters $\gamma$ to estimate the propensity scores:
  1. Estimate $\hat{\gamma}$
  2. Create $\hat{e}_i = \Pr[D_i = 1 \mid X_i; \hat{\gamma}]$
• For instance, in R, we could easily calculate the propensity scores using the glm function:

    # logistic regression of treatment on covariates; fitted values are the estimated scores
    pscores <- glm(treat ~ var1 + var2 + var3, data = mydata,
                   family = binomial())$fitted.values

Propensity score specifics

• What variables do we include in the propensity score model?
  - Any set of variables that blocks all the backdoor paths from $D_i$ to $Y_i$.
• Check balance within strata of $\hat{e}_i$. Covariates should be balanced:

$$f(X_i \mid D_i = 1, \hat{e}_i) = f(X_i \mid D_i = 0, \hat{e}_i)$$

• Can also use automated/nonparametric tools for estimating $\hat{e}_i$:
  - Covariate Balancing Propensity Scores (Imai and Ratkovic)
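
One simple way to check balance within strata of the estimated score is to compare covariate means by treatment status, overall and inside each block. A sketch on simulated data: the covariates `x1` and `x2`, the logistic specification, and the use of five quantile-based blocks are all assumptions made for illustration.

```r
# Balance check within propensity score strata; simulated, illustrative values.
set.seed(3)
n  <- 20000
x1 <- rnorm(n)
x2 <- rnorm(n)
d  <- rbinom(n, 1, plogis(x1 + 0.5 * x2))

ps_fit <- glm(d ~ x1 + x2, family = binomial())
e_hat  <- fitted(ps_fit)                       # estimated propensity scores

# Five strata at the quintiles of the estimated propensity score.
block <- cut(e_hat, breaks = quantile(e_hat, probs = seq(0, 1, 0.2)),
             include.lowest = TRUE)

# Treated-minus-control gap in the mean of x1, overall and by stratum.
raw_gap   <- mean(x1[d == 1]) - mean(x1[d == 0])
block_gap <- tapply(x1[d == 1], block[d == 1], mean) -
             tapply(x1[d == 0], block[d == 0], mean)

round(c(raw = raw_gap, block_gap), 3)
```

Exact balance holds only conditional on the score itself, so coarse blocks typically shrink the gaps rather than eliminate them. Comparing means is also weaker than comparing full distributions.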

Stratifying by the propensity score

• How will we use the propensity score?
  - Matching (next week)
  - Weighting (two weeks)
  - Regression (three weeks)
• Today: coarsening the propensity score and stratifying.
• Choose boundary points: $0 = b_0 < b_1 < \ldots < b_{K-1} < b_K = 1$
• Create block indicators:

$$B_i(k) = \begin{cases} 1 & \text{if } b_{k-1} < \hat{e}(X_i) < b_k \\ 0 & \text{otherwise} \end{cases}$$

• Calculate within-strata effect estimates:

$$\tau_k = \mathbb{E}[Y_i \mid D_i = 1, B_i(k) = 1] - \mathbb{E}[Y_i \mid D_i = 0, B_i(k) = 1]$$

Standardization/direct adjustment

• We calculated the CATEs for each stratum of the PS, $\tau_k$.
• We can use the law of iterated expectations to back out the ATE.
• Take the average of the CATEs over the distribution of $X$:

$$\tau = \sum_{k=1}^{K} \tau_k \, \mathbb{P}[B_i(k) = 1]$$

• Note that $\mathbb{P}[B_i(k) = 1]$ is just the proportion of units in block $k$:

$$\mathbb{P}[B_i(k) = 1] = \frac{\sum_{i=1}^{N} B_i(k)}{N}$$
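
Putting the pieces together, the block estimator can be sketched as follows. The simulated data, the true ATE of 2, and the choice of $K = 10$ blocks at the deciles of the estimated score (rather than fixed boundary points from 0 to 1) are assumptions for illustration. `cut` builds right-closed intervals, which differs from the strict inequalities above only when scores tie with a boundary.

```r
# Propensity score stratification estimator; simulated, illustrative values.
set.seed(11)
n  <- 20000
x1 <- rnorm(n)
x2 <- rnorm(n)
d  <- rbinom(n, 1, plogis(x1 + 0.5 * x2))
y  <- 1 + 2 * d + x1 + x2 + rnorm(n)           # true ATE = 2

e_hat <- fitted(glm(d ~ x1 + x2, family = binomial()))

# Boundary points b_0 < ... < b_K at quantiles of the estimated score.
K     <- 10
b     <- quantile(e_hat, probs = seq(0, 1, length.out = K + 1))
block <- cut(e_hat, breaks = b, include.lowest = TRUE)

# Within-block effect estimates tau_k and block shares P[B_i(k) = 1].
tau_k <- tapply(y[d == 1], block[d == 1], mean) -
         tapply(y[d == 0], block[d == 0], mean)
p_k   <- as.numeric(table(block)) / n

# Standardization: weighted average of the block estimates.
tau_hat <- sum(tau_k * p_k)
naive   <- mean(y[d == 1]) - mean(y[d == 0])

round(c(naive = naive, stratified = tau_hat), 3)
```

The stratified estimate should land much closer to 2 than the naive difference in means. Whatever gap remains comes from residual imbalance inside the blocks.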

5 Wrapping Up

Summary

• Defined observational studies.
• Defined confounding and assessed when no unmeasured confounding holds.
• Saw how no unmeasured confounding helps with OLS.
• Saw how to estimate causal effects under no unmeasured confounding using the propensity score.

Next few weeks

• Learn how to estimate causal effects under no unmeasured confounders via:
  - Matching
  - Weighting
  - Regression
• Then we move on to situations where no unmeasured confounders is violated.

  • Observational studies
  • Confounding
  • No unmeasured confounders and OLS
  • Estimating causal effects under no unmeasured confounders
  • Wrapping Up
Page 22: Gov 2002: 4. Observational Studies and Confounding · 2019-07-23 · 2. the coefficient of regression of𝐷𝑖on 𝐿𝑖also controlling for ... Basicapproachtoestimation • Remember

No unmeasured confounders is nottestable

bull No unmeasured confounding places no restrictions on theobserved data

1114100119884119894(0)|119863119894 = 1119883119894111410311140591113840111384011138401113840111384011138401113840111384011138401113840111406011138401113840111384011138401113840111384011138401113840111384011138401114061unobserved

119889= 1114100119884119894(0)|119863119894 = 0119883119894111410311140591113840111384011138401113840111384011138401113840111384011138401113840111406011138401113840111384011138401113840111384011138401113840111384011138401114061observed

bull Here 119889= means equal in distributionbull No way to directly test this assumption without the

counterfactual data which is missing by definitionbull With backdoor criterion you must have the correct DAG

Assessing no unmeasured confounders

bull Can do ldquoplacebordquo tests where 119863119894 cannot have an effect(lagged outcomes etc)

bull Della Vigna and Kaplan (2007 QJE) effect of Fox Newsavailability on Republican vote share

Availability in 20002003 canrsquot affect past vote sharesbull Unconfoundedness could still be violated even if you pass this

test

Alternatives to no unmeasuredconfounding

bull Without explicit randomization we need some way ofidentifying causal effects

bull No unmeasured confounders asymp randomized experiment Indentification results very similar to experiments

bull With unmeasured confounding are we doomed Maybe notbull Other approaches rely on finding plausibly exogenous variation

in assignment of 119863119894 Instrumental variables (randomization + exclusion restriction) Over-time variation (diff-in-diff fixed effects) Arbitrary thresholds for treatment assignment (RDD)

3 No unmeasuredconfounders and OLS

Justifying regression

bull We know how randomized experiments imply thatdifferences-in-means identify the ATE

bull In the next few weeks wersquoll work through how no unmeasuredconfounding justifies a number of estimation strategies

bull Today itrsquos useful to walk through what no unmeasuredconfounding can buy us in a familiar setting OLS

Wersquoll cover regression more formally later

Constant effects set up

bull Assume a constant effects setup

119884119894(0) = 120572 + 119883prime119894 120573 + 119906119894

119884119894(1) = 120572 + 120591 + 119883prime119894 120573 + 119906119894

bull Constant effects because 119884119894(1) minus 119884119894(0) = 120591 for all unitsbull Use consistency to get the usual regression formula

119884119894 = 119884119894(1)119863119894 + 119884119894(0)(1 minus 119863119894)= 119884119894(0) + 1114100119884119894(1) minus 119884119894(0)1114103 sdot 119863119894

= 120572 + 120591 sdot 119863119894 + 119883prime119894 120573 + 119906119894

bull Does no unmeasured confounding help us identify the causalparameter 120591

Regression on residuals

bull First estimate the residuals of regression of the treatment andoutcome on the covariates

119894 = 119884119894 minus 120124[119884119894|119883119894]119894 = 119863119894 minus 120124[119863119894|119883119894]

bull Running a regression of 119894 on 119894 is equivalent to controllingfor 119883119894

119884119894 = 120572 + 120591 sdot 119863119894 + 119883prime119894 120573 + 119906119894

119894 = 120572 + 120591 sdot 119894 + 119894bull Here 119894 = 119906119894 minus 120124[119906119894|119883119894]

What does OLS estimate

bull Using the usual OLS theory we can show that the probabilitylimit of the OLS estimator of 120591 is

plim 111369611136931113700 =Cov(119894 119894)Var(119894)

= Cov(119894 120572 + 120591 sdot 119894 + 119894)Var(119894)

= 120591 sdot Cov(119894 119894) + Cov(119894 119894)Var(119894)

= 120591 + Cov(119894 119894)Var(119894)

Key OLS assumption

plim 111369611136931113700 = 120591 + Cov(119894 119894)Var(119894)

bull Key identification comes from Cov(119894 119894) = 0 Conditional on 119883119894 no relationship between 119863119894 and 119906119894

bull Note 119906119894 is a function of 119883119894 and 119884119894(119889) 119906119894 = 119884119894(0) minus 120572 minus 119883prime

119894 120573 when 119863119894 = 0 119906119894 = 119884119894(1) minus 120572 minus 120591 minus 119883prime

119894 120573 when 119863119894 = 1 condition on 119883119894 only variation in 119906119894 comes from 119884119894(119889)

bull No unmeasured confounding implies this assumption

119863119894 ⟂⟂ 1114100119884119894(1) 119884119894(0)1114103|119883119894 ⟹ 119863119894 ⟂⟂ 119906119894|119883119894 ⟹ Cov(119894 119894) = 0

Omitted variable bias

bull What happens when this is violated Suppose that there isone omitted variable (residualized from 119883119894)

119894 = 120582119894 + 120596119894

bull Wersquoll assume that if we could measure 119871119894 then nounmeasured confounding would hold

bull Leads to inconsistency in the OLS estimator

plim 111369611136931113700 = 120591 + 120582Cov(119894 119894)Var(119894)

bull Bias here is terms multiplied together1 coefficient on 119871119894 (120582)2 the coefficient of regression of 119863119894 on 119871119894 also controlling for 119883119894

4 Estimating causaleffects under nounmeasuredconfounders

Basic approach to estimation

bull Remember the usual approach to estimating the ATE withcovariates

bull Stratification Stratify the units by the covariates Calculate CATE within these strata

bull Standardizationdirect adjustment Average the CATEs across the strata to get ATE

bull How to create strata when 119883 has continuous components If 119883 is discrete with only a few levels can use the exact values

of 119883 Otherwise we may have to subclassifycoarsen the data

Classic example cigarspipes versuscigarettes

bull 119863119894 = 1 for pipecigar smokers 119863119894 = 0 for cigarette smokersbull 119884119894 = death in the first year of follow-upbull Naive positive effect cigarpipe smokers more likely to die

Whatrsquos the confounder here Age Pipecigar smokers much older than cigarette smokers

bull Cochranrsquos approach stratify based on coarsened age Divide age into 119896 strata 119878119894 isin 1199041113568 1199041113569 hellip 119904119896 1199041113568 might be 18-25 1199041113569 might be 26-35 and so on Calculate effect within strata and aggregate

bull Key assumption no unmeasured confounders using stratifiedversion of age

119863119894 ⟂⟂ 1114100119884119894(0) 119884119894(1)1114103|119878119894

Stratification on the propensity score

bull What about when 119883 has has many dimensionsbull Curse of dimensionality there will be very few if any units in

a given stratum of 119883119894bull Stratify on a low-dimensional summary the propensity score

119890(119909) = ℙ[119863119894 = 1|119883119894 = 119909]

PS = unitrsquos probability of being treated conditional on 119883119894

bull For a particular unit this is 119890(119883119894) = ℙ[119863119894 = 1|119883119894]bull Rosenbaum and Rubin (1983) showed that

119863119894 ⟂⟂ 1114100119884119894(0) 119884119894(1)1114103 | 119883119894 ⟹ 119863119894 ⟂⟂ 1114100119884119894(0) 119884119894(1)1114103 | 119890(119883119894)

stratifying on 119890119894 is the same as stratifying on the full 119883119894

Propensity score as balancing score

bull The propensity score is actually a balancing score whichmeans that

119863119894 ⟂⟂ 119883119894 | 119890(119883119894)

bull Conditional on the propensity score treatment is independentof the covariates

Treatment status is said to be balanced 119891(119883119894|119863119894 = 1 119890(119883119894)) = 119891(119883119894|119863119894 = 0 119890(119883119894))

bull Of course we have to know the true PS to have all theseresults work

Estimating the propensity score

bull Of course in observational studies we donrsquot know thepropensity score

bull We would run a parametric model with parameters 120574 toestimate the propensity scores

1 Estimate 2 Create 119894 = Pr[119863119894 = 1|119883119894 ]

bull For instance in R we could easily calculate the propensityscores using the glm function

pscores lt- glm(treat ~ var1 + var2 + var3 data = mydata

family = binomial())$fittedvalues

Propensity score specifics

bull What variables do we include in the propensity score model Any set of variables that blocks all the backdoor paths from 119863119894

to 119884119894

bull Check balance within strata of 119894 Covariates should bebalanced

119891(119883119894|119863119894 = 1 119894) = 119891(119883119894|119863119894 = 0 119894)

bull Can also use automatednonparametric tools for estimating 119894 Covariate Balancing Propensity Scores (Imai and Ratkovic)

Stratifying by the propensity score

bull How will we use the propensity score Matching (next week) Weighting (two weeks) Regression

(three weeks)bull Today coarsening the propensity score and stratifyingbull Choose boundary points 0 = 1198871113567 lt 1198871113568 lt hellip lt 119887119870minus1113568 lt 119887119870 = 1bull Create block indicators

119861119894(119896) =

⎧⎪⎪⎨⎪⎪⎩1 if 119887119896minus1113568 lt (119883119894) lt 1198871198960 otherwise

bull Calculate within-strata effect estimates

120591119896 = 120124[119884119894|119863119894 = 1 119861119894(119896) = 1] minus 120124[119884119894|119863119894 = 0 119861119894(119896) = 1]

Standardizationdirect adjustment

bull We calculated the CATEs for each strata of the PS 120591119896bull We can use law of iterated expectations to back out the ATEbull Take the average of the CATEs over the distribution of 119883

120591 =1198701114012119896=1113568

120591119896ℙ[119861119894(119896) = 1]

bull Note that ℙ[119861119894(119896) = 1] is just the proportion of units in block 119896

ℙ[119861119894(119896) = 1] =sum119873

119894=1113568 119861119894(119896)119873

5 Wrapping Up

Summary

bull Defined observational studiesbull Defined confounding and assessed when no unmeasured

confounding holdsbull Saw how no unmeasured confounding helps with OLSbull Saw how to estimate causal effects under no unmeasured

confounding using the propensity score

Next few weeks

bull Learn how to estimate causal effects under no unmeasuredconfounders via

Matching Weighting Regression

bull Then we move onto situations where no unmeasuredconfounders is violated

  • Observational studies
  • Confounding
  • No unmeasured confounders and OLS
  • Estimating causal effects under no unmeasured confounders
  • Wrapping Up
Page 23: Gov 2002: 4. Observational Studies and Confounding · 2019-07-23 · 2. the coefficient of regression of𝐷𝑖on 𝐿𝑖also controlling for ... Basicapproachtoestimation • Remember

Assessing no unmeasured confounders

bull Can do ldquoplacebordquo tests where 119863119894 cannot have an effect(lagged outcomes etc)

bull Della Vigna and Kaplan (2007 QJE) effect of Fox Newsavailability on Republican vote share

Availability in 20002003 canrsquot affect past vote sharesbull Unconfoundedness could still be violated even if you pass this

test

Alternatives to no unmeasuredconfounding

bull Without explicit randomization we need some way ofidentifying causal effects

bull No unmeasured confounders asymp randomized experiment Indentification results very similar to experiments

bull With unmeasured confounding are we doomed Maybe notbull Other approaches rely on finding plausibly exogenous variation

in assignment of 119863119894 Instrumental variables (randomization + exclusion restriction) Over-time variation (diff-in-diff fixed effects) Arbitrary thresholds for treatment assignment (RDD)

3 No unmeasuredconfounders and OLS

Justifying regression

bull We know how randomized experiments imply thatdifferences-in-means identify the ATE

bull In the next few weeks wersquoll work through how no unmeasuredconfounding justifies a number of estimation strategies

bull Today itrsquos useful to walk through what no unmeasuredconfounding can buy us in a familiar setting OLS

Wersquoll cover regression more formally later

Constant effects set up

bull Assume a constant effects setup

119884119894(0) = 120572 + 119883prime119894 120573 + 119906119894

119884119894(1) = 120572 + 120591 + 119883prime119894 120573 + 119906119894

bull Constant effects because 119884119894(1) minus 119884119894(0) = 120591 for all unitsbull Use consistency to get the usual regression formula

119884119894 = 119884119894(1)119863119894 + 119884119894(0)(1 minus 119863119894)= 119884119894(0) + 1114100119884119894(1) minus 119884119894(0)1114103 sdot 119863119894

= 120572 + 120591 sdot 119863119894 + 119883prime119894 120573 + 119906119894

bull Does no unmeasured confounding help us identify the causalparameter 120591

Regression on residuals

bull First estimate the residuals of regression of the treatment andoutcome on the covariates

119894 = 119884119894 minus 120124[119884119894|119883119894]119894 = 119863119894 minus 120124[119863119894|119883119894]

bull Running a regression of 119894 on 119894 is equivalent to controllingfor 119883119894

119884119894 = 120572 + 120591 sdot 119863119894 + 119883prime119894 120573 + 119906119894

119894 = 120572 + 120591 sdot 119894 + 119894bull Here 119894 = 119906119894 minus 120124[119906119894|119883119894]

What does OLS estimate

bull Using the usual OLS theory we can show that the probabilitylimit of the OLS estimator of 120591 is

plim 111369611136931113700 =Cov(119894 119894)Var(119894)

= Cov(119894 120572 + 120591 sdot 119894 + 119894)Var(119894)

= 120591 sdot Cov(119894 119894) + Cov(119894 119894)Var(119894)

= 120591 + Cov(119894 119894)Var(119894)

Key OLS assumption

plim 111369611136931113700 = 120591 + Cov(119894 119894)Var(119894)

bull Key identification comes from Cov(119894 119894) = 0 Conditional on 119883119894 no relationship between 119863119894 and 119906119894

bull Note 119906119894 is a function of 119883119894 and 119884119894(119889) 119906119894 = 119884119894(0) minus 120572 minus 119883prime

119894 120573 when 119863119894 = 0 119906119894 = 119884119894(1) minus 120572 minus 120591 minus 119883prime

119894 120573 when 119863119894 = 1 condition on 119883119894 only variation in 119906119894 comes from 119884119894(119889)

bull No unmeasured confounding implies this assumption

119863119894 ⟂⟂ 1114100119884119894(1) 119884119894(0)1114103|119883119894 ⟹ 119863119894 ⟂⟂ 119906119894|119883119894 ⟹ Cov(119894 119894) = 0

Omitted variable bias

bull What happens when this is violated Suppose that there isone omitted variable (residualized from 119883119894)

119894 = 120582119894 + 120596119894

bull Wersquoll assume that if we could measure 119871119894 then nounmeasured confounding would hold

bull Leads to inconsistency in the OLS estimator

plim 111369611136931113700 = 120591 + 120582Cov(119894 119894)Var(119894)

bull Bias here is terms multiplied together1 coefficient on 119871119894 (120582)2 the coefficient of regression of 119863119894 on 119871119894 also controlling for 119883119894

4 Estimating causaleffects under nounmeasuredconfounders

Basic approach to estimation

bull Remember the usual approach to estimating the ATE withcovariates

bull Stratification Stratify the units by the covariates Calculate CATE within these strata

bull Standardizationdirect adjustment Average the CATEs across the strata to get ATE

bull How to create strata when 119883 has continuous components If 119883 is discrete with only a few levels can use the exact values

of 119883 Otherwise we may have to subclassifycoarsen the data

Classic example cigarspipes versuscigarettes

bull 119863119894 = 1 for pipecigar smokers 119863119894 = 0 for cigarette smokersbull 119884119894 = death in the first year of follow-upbull Naive positive effect cigarpipe smokers more likely to die

Whatrsquos the confounder here Age Pipecigar smokers much older than cigarette smokers

bull Cochranrsquos approach stratify based on coarsened age Divide age into 119896 strata 119878119894 isin 1199041113568 1199041113569 hellip 119904119896 1199041113568 might be 18-25 1199041113569 might be 26-35 and so on Calculate effect within strata and aggregate

bull Key assumption no unmeasured confounders using stratifiedversion of age

119863119894 ⟂⟂ 1114100119884119894(0) 119884119894(1)1114103|119878119894

Stratification on the propensity score

bull What about when 119883 has has many dimensionsbull Curse of dimensionality there will be very few if any units in

a given stratum of 119883119894bull Stratify on a low-dimensional summary the propensity score

119890(119909) = ℙ[119863119894 = 1|119883119894 = 119909]

PS = unitrsquos probability of being treated conditional on 119883119894

bull For a particular unit this is 119890(119883119894) = ℙ[119863119894 = 1|119883119894]bull Rosenbaum and Rubin (1983) showed that

119863119894 ⟂⟂ 1114100119884119894(0) 119884119894(1)1114103 | 119883119894 ⟹ 119863119894 ⟂⟂ 1114100119884119894(0) 119884119894(1)1114103 | 119890(119883119894)

stratifying on 119890119894 is the same as stratifying on the full 119883119894

Alternatives to no unmeasured confounding

• Without explicit randomization, we need some way of identifying causal effects.
• No unmeasured confounders ≈ randomized experiment
  - Identification results are very similar to those for experiments.
• With unmeasured confounding, are we doomed? Maybe not!
• Other approaches rely on finding plausibly exogenous variation in the assignment of $D_i$:
  - Instrumental variables (randomization + exclusion restriction)
  - Over-time variation (diff-in-diff, fixed effects)
  - Arbitrary thresholds for treatment assignment (RDD)

3. No unmeasured confounders and OLS

Justifying regression

• We know how randomized experiments imply that differences-in-means identify the ATE.
• In the next few weeks, we'll work through how no unmeasured confounding justifies a number of estimation strategies.
• Today, it's useful to walk through what no unmeasured confounding can buy us in a familiar setting: OLS.
  - We'll cover regression more formally later.

Constant effects setup

• Assume a constant effects setup:

  $Y_i(0) = \alpha + X_i'\beta + u_i$
  $Y_i(1) = \alpha + \tau + X_i'\beta + u_i$

• Constant effects because $Y_i(1) - Y_i(0) = \tau$ for all units.
• Use consistency to get the usual regression formula:

  $Y_i = Y_i(1) D_i + Y_i(0)(1 - D_i)$
  $= Y_i(0) + (Y_i(1) - Y_i(0)) \cdot D_i$
  $= \alpha + \tau \cdot D_i + X_i'\beta + u_i$

• Does no unmeasured confounding help us identify the causal parameter $\tau$?

Regression on residuals

• First, compute the residuals from regressions of the outcome and the treatment on the covariates:

  $\tilde{Y}_i = Y_i - \mathbb{E}[Y_i | X_i]$
  $\tilde{D}_i = D_i - \mathbb{E}[D_i | X_i]$

• Running a regression of $\tilde{Y}_i$ on $\tilde{D}_i$ is equivalent to controlling for $X_i$:

  $Y_i = \alpha + \tau \cdot D_i + X_i'\beta + u_i$
  $\tilde{Y}_i = \alpha + \tau \cdot \tilde{D}_i + \tilde{u}_i$

• Here, $\tilde{u}_i = u_i - \mathbb{E}[u_i | X_i]$.
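The equivalence between the residual-on-residual regression and controlling for $X_i$ can be checked numerically. The sketch below is only an illustration on simulated data: the variable names, sample size, and coefficients are assumptions, and linear regressions on the covariates stand in for the conditional expectations $\mathbb{E}[\cdot | X_i]$.

```r
# Simulated data; every name and parameter value here is an illustrative assumption.
set.seed(2002)
n <- 1000
x1 <- rnorm(n)
x2 <- rnorm(n)
d <- rbinom(n, size = 1, prob = plogis(0.5 * x1 - 0.5 * x2))  # treatment depends on X
y <- 1 + 2 * d + x1 + x2 + rnorm(n)                           # constant effect tau = 2

# Coefficient on d from the full regression that controls for X.
full <- lm(y ~ d + x1 + x2)

# Residualize outcome and treatment on X, then regress residual on residual.
y_tilde <- resid(lm(y ~ x1 + x2))
d_tilde <- resid(lm(d ~ x1 + x2))
partial <- lm(y_tilde ~ d_tilde)

tau_full <- unname(coef(full)["d"])
tau_partial <- unname(coef(partial)["d_tilde"])

# The two estimates should agree up to floating-point error.
print(c(full = tau_full, residualized = tau_partial))
```

With linear residualization the two coefficients coincide exactly in sample, which is why the residualized form is a convenient way to reason about what OLS estimates.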

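What does OLS estimate?

• Using the usual OLS theory, we can show that the probability limit of the OLS estimator of $\tau$ is:

  $\text{plim}\, \hat{\tau}_{OLS} = \frac{\text{Cov}(\tilde{D}_i, \tilde{Y}_i)}{\text{Var}(\tilde{D}_i)}$
  $= \frac{\text{Cov}(\tilde{D}_i, \alpha + \tau \cdot \tilde{D}_i + \tilde{u}_i)}{\text{Var}(\tilde{D}_i)}$
  $= \frac{\tau \cdot \text{Cov}(\tilde{D}_i, \tilde{D}_i) + \text{Cov}(\tilde{D}_i, \tilde{u}_i)}{\text{Var}(\tilde{D}_i)}$
  $= \tau + \frac{\text{Cov}(\tilde{D}_i, \tilde{u}_i)}{\text{Var}(\tilde{D}_i)}$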

Key OLS assumption

  $\text{plim}\, \hat{\tau}_{OLS} = \tau + \frac{\text{Cov}(\tilde{D}_i, \tilde{u}_i)}{\text{Var}(\tilde{D}_i)}$

• Key identification comes from $\text{Cov}(\tilde{D}_i, \tilde{u}_i) = 0$.
  - Conditional on $X_i$, there is no relationship between $D_i$ and $u_i$.
• Note: $u_i$ is a function of $X_i$ and $Y_i(d)$.
  - $u_i = Y_i(0) - \alpha - X_i'\beta$ when $D_i = 0$
  - $u_i = Y_i(1) - \alpha - \tau - X_i'\beta$ when $D_i = 1$
  - Conditional on $X_i$, the only variation in $u_i$ comes from $Y_i(d)$.
• No unmeasured confounding implies this assumption:

  $D_i \perp\!\!\!\perp (Y_i(1), Y_i(0)) | X_i \implies D_i \perp\!\!\!\perp u_i | X_i \implies \text{Cov}(\tilde{D}_i, \tilde{u}_i) = 0$
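A small simulation can make the identification argument concrete. The sketch below assumes a data-generating process (all names and numbers are hypothetical) in which treatment depends only on an observed covariate, so no unmeasured confounding holds given $X_i$; it then compares OLS with and without the covariate.

```r
# Hypothetical data-generating process; names and values are assumptions.
set.seed(2002)
n <- 50000
x <- rnorm(n)
d <- rbinom(n, size = 1, prob = plogis(x))  # D depends on X only
u <- rnorm(n)                               # independent of D given X
tau <- 2
y <- 1 + tau * d + 1.5 * x + u              # constant effects model

# Controlling for X: Cov(D-tilde, u-tilde) = 0, so OLS should be close to tau.
tau_adj <- unname(coef(lm(y ~ d + x))["d"])

# Omitting X: X is absorbed into the error and is correlated with D.
tau_naive <- unname(coef(lm(y ~ d))["d"])

print(c(adjusted = tau_adj, naive = tau_naive))
```

In this setup the adjusted coefficient should sit near the assumed $\tau = 2$, while the unadjusted one is pulled upward because $X_i$ raises both the probability of treatment and the outcome.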

Omitted variable bias

• What happens when this is violated? Suppose that there is one omitted variable, $L_i$ (residualized from $X_i$):

  $\tilde{u}_i = \lambda \tilde{L}_i + \omega_i$

• We'll assume that if we could measure $L_i$, then no unmeasured confounding would hold.
• Leads to inconsistency in the OLS estimator:

  $\text{plim}\, \hat{\tau}_{OLS} = \tau + \lambda \frac{\text{Cov}(\tilde{D}_i, \tilde{L}_i)}{\text{Var}(\tilde{D}_i)}$

• The bias here is the product of two terms:
  1. the coefficient on $L_i$ ($\lambda$)
  2. the coefficient on $D_i$ from a regression of $L_i$ on $D_i$, also controlling for $X_i$
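The bias formula has an exact in-sample analogue that is easy to verify. The following sketch uses simulated data in which the confounder `l` is observed only so that the formula can be checked; the names and parameter values are assumptions made for illustration.

```r
# Simulated data with a confounder L that would be unmeasured in practice.
set.seed(2002)
n <- 5000
x <- rnorm(n)
l <- 0.8 * x + rnorm(n)                          # hypothetical omitted variable
d <- rbinom(n, size = 1, prob = plogis(0.5 * x + l))
y <- 1 + 2 * d + x + 1.5 * l + rnorm(n)          # tau = 2, lambda = 1.5

short <- lm(y ~ d + x)      # feasible regression that omits L
long <- lm(y ~ d + x + l)   # infeasible regression that includes L
aux <- lm(l ~ d + x)        # L regressed on D, controlling for X

tau_short <- unname(coef(short)["d"])
tau_long <- unname(coef(long)["d"])
lambda_hat <- unname(coef(long)["l"])   # coefficient on L
delta_hat <- unname(coef(aux)["d"])     # coefficient on D in the auxiliary regression

# Short coefficient = long coefficient + lambda_hat * delta_hat (holds exactly in sample).
bias_formula <- lambda_hat * delta_hat
print(c(short = tau_short, long_plus_bias = tau_long + bias_formula))
```

Because both terms of the product depend on the unmeasured $L_i$, neither can be estimated from observed data; in practice the formula is mainly useful for reasoning about the likely sign and size of the bias.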

4. Estimating causal effects under no unmeasured confounders

Basic approach to estimation

• Remember the usual approach to estimating the ATE with covariates.
• Stratification:
  - Stratify the units by the covariates.
  - Calculate the CATE within these strata.
• Standardization/direct adjustment:
  - Average the CATEs across the strata to get the ATE.
• How to create strata when $X$ has continuous components?
  - If $X$ is discrete with only a few levels, we can use the exact values of $X$.
  - Otherwise, we may have to subclassify/coarsen the data.

Classic example: cigars/pipes versus cigarettes

• $D_i = 1$ for pipe/cigar smokers, $D_i = 0$ for cigarette smokers
• $Y_i$ = death in the first year of follow-up
• Naive positive effect: cigar/pipe smokers are more likely to die.
  - What's the confounder here? Age!
  - Pipe/cigar smokers are much older than cigarette smokers.
• Cochran's approach: stratify based on coarsened age.
  - Divide age into $k$ strata: $S_i \in \{s_1, s_2, \ldots, s_k\}$
  - $s_1$ might be 18-25, $s_2$ might be 26-35, and so on.
  - Calculate the effect within strata and aggregate.
• Key assumption: no unmeasured confounders using the stratified version of age:

  $D_i \perp\!\!\!\perp (Y_i(0), Y_i(1)) | S_i$
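The stratify-then-aggregate logic can be sketched on simulated data. Nothing below comes from Cochran's actual data: the age distribution, the treatment and mortality models, and the stratum boundaries are all assumptions, chosen so that age drives both smoking type and death while the true effect of $D_i$ is zero.

```r
# Purely simulated illustration; not Cochran's data.
set.seed(2002)
n <- 20000
age <- sample(18:75, n, replace = TRUE)
d <- rbinom(n, size = 1, prob = plogis((age - 50) / 10))         # older -> pipe/cigar
y <- rbinom(n, size = 1, prob = plogis(-4 + 0.05 * (age - 18)))  # death depends on age only

# Naive difference in death rates.
naive <- mean(y[d == 1]) - mean(y[d == 0])

# Coarsen age into strata: 18-25, 26-35, ..., 66-75.
s <- cut(age, breaks = c(17, 25, 35, 45, 55, 65, 75))

# Difference in means within each stratum.
tau_s <- sapply(levels(s), function(k) {
  mean(y[d == 1 & s == k]) - mean(y[d == 0 & s == k])
})

# Aggregate using the share of units in each stratum.
w_s <- as.numeric(table(s)) / n
adjusted <- sum(tau_s * w_s)

print(c(naive = naive, adjusted = adjusted))
```

The adjusted estimate should be much closer to zero than the naive one, though coarsening can leave some residual confounding within strata because age still varies inside each bin.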

Stratification on the propensity score

• What about when $X$ has many dimensions?
• Curse of dimensionality: there will be very few, if any, units in a given stratum of $X_i$.
• Stratify on a low-dimensional summary, the propensity score:

  $e(x) = \mathbb{P}[D_i = 1 | X_i = x]$

  - PS = unit's probability of being treated, conditional on $X_i$
• For a particular unit, this is $e(X_i) = \mathbb{P}[D_i = 1 | X_i]$.
• Rosenbaum and Rubin (1983) showed that:

  $D_i \perp\!\!\!\perp (Y_i(0), Y_i(1)) | X_i \implies D_i \perp\!\!\!\perp (Y_i(0), Y_i(1)) | e(X_i)$

  - Stratifying on $e(X_i)$ is the same as stratifying on the full $X_i$.

Propensity score as balancing score

• The propensity score is actually a balancing score, which means that:

  $D_i \perp\!\!\!\perp X_i | e(X_i)$

• Conditional on the propensity score, treatment is independent of the covariates.
  - Treatment status is said to be balanced: $f(X_i | D_i = 1, e(X_i)) = f(X_i | D_i = 0, e(X_i))$
• Of course, we have to know the true PS to have all these results work.

Estimating the propensity score

• Of course, in observational studies we don't know the propensity score.
• We would run a parametric model with parameters $\gamma$ to estimate the propensity scores:
  1. Estimate $\hat{\gamma}$
  2. Create $\hat{e}_i = \Pr[D_i = 1 | X_i; \hat{\gamma}]$
• For instance, in R, we could easily calculate the propensity scores using the glm function:

  # logistic regression of treatment on covariates; fitted values are the estimated scores
  pscores <- glm(treat ~ var1 + var2 + var3, data = mydata,
                 family = binomial())$fitted.values

Propensity score specifics

• What variables do we include in the propensity score model?
  - Any set of variables that blocks all the backdoor paths from $D_i$ to $Y_i$.
• Check balance within strata of $\hat{e}_i$. Covariates should be balanced:

  $f(X_i | D_i = 1, \hat{e}_i) = f(X_i | D_i = 0, \hat{e}_i)$

• Can also use automated/nonparametric tools for estimating $\hat{e}_i$.
  - Covariate Balancing Propensity Scores (Imai and Ratkovic)
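One simple way to check balance within strata of the estimated score is to compare covariate means by treatment status inside each block. The sketch below is an assumption-laden illustration: `mydata`, `treat`, and `var1` to `var3` reuse the placeholder names from the glm example but are filled with simulated values, and the five quintile blocks are an arbitrary choice.

```r
# Simulated stand-in for mydata; the data-generating values are assumptions.
set.seed(2002)
n <- 5000
mydata <- data.frame(var1 = rnorm(n), var2 = rnorm(n), var3 = rbinom(n, 1, 0.4))
mydata$treat <- rbinom(n, size = 1,
                       prob = plogis(-0.3 + 0.8 * mydata$var1 -
                                       0.5 * mydata$var2 + 0.4 * mydata$var3))

# Estimate the propensity score with a logistic regression.
ps_model <- glm(treat ~ var1 + var2 + var3, data = mydata, family = binomial())
mydata$pscore <- fitted(ps_model)

# Blocks defined by quintiles of the estimated score.
cuts <- quantile(mydata$pscore, probs = seq(0, 1, by = 0.2))
mydata$block <- cut(mydata$pscore, breaks = cuts,
                    include.lowest = TRUE, labels = FALSE)

# Raw treated-minus-control difference in the mean of var1.
raw_diff <- with(mydata, mean(var1[treat == 1]) - mean(var1[treat == 0]))

# The same difference computed separately within each block.
block_diff <- sapply(split(mydata, mydata$block), function(b) {
  mean(b$var1[b$treat == 1]) - mean(b$var1[b$treat == 0])
})

print(raw_diff)
print(round(block_diff, 3))
```

Differences in means are only one summary of the covariate distributions; a fuller check would repeat this for every covariate and possibly compare variances or quantiles as well.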

Stratifying by the propensity score

• How will we use the propensity score?
  - Matching (next week)
  - Weighting (two weeks)
  - Regression (three weeks)
• Today: coarsening the propensity score and stratifying.
• Choose boundary points: $0 = b_0 < b_1 < \ldots < b_{K-1} < b_K = 1$
• Create block indicators:

  $B_i(k) = \begin{cases} 1 & \text{if } b_{k-1} < \hat{e}(X_i) < b_k \\ 0 & \text{otherwise} \end{cases}$

• Calculate within-strata effect estimates:

  $\tau_k = \mathbb{E}[Y_i | D_i = 1, B_i(k) = 1] - \mathbb{E}[Y_i | D_i = 0, B_i(k) = 1]$

Standardization/direct adjustment

• We calculated the CATEs for each stratum of the PS, $\tau_k$.
• We can use the law of iterated expectations to back out the ATE.
• Take the average of the CATEs over the distribution of $X$:

  $\tau = \sum_{k=1}^{K} \tau_k \mathbb{P}[B_i(k) = 1]$

• Note that $\mathbb{P}[B_i(k) = 1]$ is just the proportion of units in block $k$:

  $\mathbb{P}[B_i(k) = 1] = \frac{\sum_{i=1}^{N} B_i(k)}{N}$
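Putting the pieces together, the block indicators, within-block estimates, and standardization step can be sketched in a few lines of R. This is a minimal illustration on simulated data, not a recommended implementation: the variable names, the true effect of 2, and the choice of $K = 5$ equal-width blocks are all assumptions, and sample means stand in for the conditional expectations.

```r
# Simulated data; names, coefficients, and K are illustrative assumptions.
set.seed(2002)
n <- 20000
x1 <- rnorm(n)
x2 <- rnorm(n)
d <- rbinom(n, size = 1, prob = plogis(0.6 * x1 - 0.6 * x2))
y <- 1 + 2 * d + x1 - x2 + rnorm(n)   # true ATE = 2
dat <- data.frame(y, d, x1, x2)

# Step 1: estimate the propensity score.
dat$e_hat <- fitted(glm(d ~ x1 + x2, data = dat, family = binomial()))

# Step 2: boundary points 0 = b_0 < ... < b_K = 1 and block membership.
# Note: cut() uses intervals (b_{k-1}, b_k], closed on the right.
K <- 5
b <- seq(0, 1, length.out = K + 1)
dat$block <- cut(dat$e_hat, breaks = b, labels = FALSE)

# Step 3: within-block differences in means (tau_k).
# A block with no treated or no control units would give NaN here.
tau_k <- sapply(1:K, function(k) {
  in_k <- dat$block == k
  mean(dat$y[in_k & dat$d == 1]) - mean(dat$y[in_k & dat$d == 0])
})

# Step 4: weight each tau_k by the share of units in block k.
p_k <- sapply(1:K, function(k) mean(dat$block == k))
tau_hat <- sum(tau_k * p_k)

naive <- mean(dat$y[dat$d == 1]) - mean(dat$y[dat$d == 0])
print(c(naive = naive, stratified = tau_hat))
```

With only a handful of blocks some confounding remains inside each block, so the stratified estimate should land closer to the assumed effect than the naive difference without matching it exactly; using more blocks trades that residual bias against noisier within-block estimates.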

5. Wrapping Up

Summary

• Defined observational studies.
• Defined confounding and assessed when no unmeasured confounding holds.
• Saw how no unmeasured confounding helps with OLS.
• Saw how to estimate causal effects under no unmeasured confounding using the propensity score.

Next few weeks

• Learn how to estimate causal effects under no unmeasured confounders via:
  - Matching
  - Weighting
  - Regression
• Then we move on to situations where no unmeasured confounders is violated.

Page 27: Gov 2002: 4. Observational Studies and Confounding · 2019-07-23 · 2. the coefficient of regression of𝐷𝑖on 𝐿𝑖also controlling for ... Basicapproachtoestimation • Remember

Constant effects set up

bull Assume a constant effects setup

119884119894(0) = 120572 + 119883prime119894 120573 + 119906119894

119884119894(1) = 120572 + 120591 + 119883prime119894 120573 + 119906119894

bull Constant effects because 119884119894(1) minus 119884119894(0) = 120591 for all unitsbull Use consistency to get the usual regression formula

119884119894 = 119884119894(1)119863119894 + 119884119894(0)(1 minus 119863119894)= 119884119894(0) + 1114100119884119894(1) minus 119884119894(0)1114103 sdot 119863119894

= 120572 + 120591 sdot 119863119894 + 119883prime119894 120573 + 119906119894

bull Does no unmeasured confounding help us identify the causalparameter 120591

Regression on residuals

bull First estimate the residuals of regression of the treatment andoutcome on the covariates

119894 = 119884119894 minus 120124[119884119894|119883119894]119894 = 119863119894 minus 120124[119863119894|119883119894]

bull Running a regression of 119894 on 119894 is equivalent to controllingfor 119883119894

119884119894 = 120572 + 120591 sdot 119863119894 + 119883prime119894 120573 + 119906119894

119894 = 120572 + 120591 sdot 119894 + 119894bull Here 119894 = 119906119894 minus 120124[119906119894|119883119894]

What does OLS estimate

bull Using the usual OLS theory we can show that the probabilitylimit of the OLS estimator of 120591 is

plim 111369611136931113700 =Cov(119894 119894)Var(119894)

= Cov(119894 120572 + 120591 sdot 119894 + 119894)Var(119894)

= 120591 sdot Cov(119894 119894) + Cov(119894 119894)Var(119894)

= 120591 + Cov(119894 119894)Var(119894)

Key OLS assumption

plim 111369611136931113700 = 120591 + Cov(119894 119894)Var(119894)

bull Key identification comes from Cov(119894 119894) = 0 Conditional on 119883119894 no relationship between 119863119894 and 119906119894

bull Note 119906119894 is a function of 119883119894 and 119884119894(119889) 119906119894 = 119884119894(0) minus 120572 minus 119883prime

119894 120573 when 119863119894 = 0 119906119894 = 119884119894(1) minus 120572 minus 120591 minus 119883prime

119894 120573 when 119863119894 = 1 condition on 119883119894 only variation in 119906119894 comes from 119884119894(119889)

bull No unmeasured confounding implies this assumption

119863119894 ⟂⟂ 1114100119884119894(1) 119884119894(0)1114103|119883119894 ⟹ 119863119894 ⟂⟂ 119906119894|119883119894 ⟹ Cov(119894 119894) = 0

Omitted variable bias

bull What happens when this is violated Suppose that there isone omitted variable (residualized from 119883119894)

119894 = 120582119894 + 120596119894

bull Wersquoll assume that if we could measure 119871119894 then nounmeasured confounding would hold

bull Leads to inconsistency in the OLS estimator

plim 111369611136931113700 = 120591 + 120582Cov(119894 119894)Var(119894)

bull Bias here is terms multiplied together1 coefficient on 119871119894 (120582)2 the coefficient of regression of 119863119894 on 119871119894 also controlling for 119883119894

4 Estimating causaleffects under nounmeasuredconfounders

Basic approach to estimation

• Remember the usual approach to estimating the ATE with covariates:

• Stratification:
  - Stratify the units by the covariates
  - Calculate the CATE within these strata

• Standardization/direct adjustment:
  - Average the CATEs across the strata to get the ATE

• How to create strata when $X$ has continuous components?
  - If $X$ is discrete with only a few levels, can use the exact values of $X$
  - Otherwise, we may have to subclassify/coarsen the data

Classic example: cigars/pipes versus cigarettes

• $D_i = 1$ for pipe/cigar smokers, $D_i = 0$ for cigarette smokers
• $Y_i$ = death in the first year of follow-up
• Naive positive effect: cigar/pipe smokers more likely to die
  - What's the confounder here? Age. Pipe/cigar smokers are much older than cigarette smokers

• Cochran's approach: stratify based on coarsened age
  - Divide age into $k$ strata, $S_i \in \{s_1, s_2, \ldots, s_k\}$
  - $s_1$ might be 18-25, $s_2$ might be 26-35, and so on
  - Calculate the effect within strata and aggregate

• Key assumption: no unmeasured confounders using the stratified version of age:

$$
D_i \perp\!\!\!\perp \left(Y_i(0), Y_i(1)\right) \mid S_i
$$
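A minimal sketch of this kind of age stratification follows. It uses simulated data, not Cochran's data: the age range, the cut points, and the logistic coefficients are assumptions, and the treatment is given no effect on death by construction so that any naive difference is pure confounding by age.

```r
# Simulated stand-in for the smoking example (all values are assumptions).
set.seed(2002)
n <- 20000
age <- runif(n, 18, 85)
d <- rbinom(n, 1, plogis(-3 + 0.05 * age))   # older people more likely pipe/cigar smokers
y <- rbinom(n, 1, plogis(-6 + 0.06 * age))   # death risk rises with age, no effect of d

# Naive difference in death rates
naive <- mean(y[d == 1]) - mean(y[d == 0])

# Coarsen age into strata S_i
s <- cut(age, breaks = c(18, 25, 35, 45, 55, 65, 75, 85), include.lowest = TRUE)

# Difference in means within each stratum
tau_k <- sapply(split(data.frame(y, d), s),
                function(g) mean(g$y[g$d == 1]) - mean(g$y[g$d == 0]))

# Aggregate with weights equal to each stratum's share of the sample
w_k <- as.numeric(table(s)) / n
adjusted <- sum(tau_k * w_k)

c(naive = naive, adjusted = adjusted)
```

The naive difference is clearly positive even though the simulated effect is zero, while the stratified estimate is close to zero. Some bias can remain because age still varies within each coarsened stratum.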

Stratification on the propensity score

• What about when $X$ has many dimensions?
• Curse of dimensionality: there will be very few, if any, units in a given stratum of $X_i$
• Stratify on a low-dimensional summary, the propensity score:

$$
e(x) = \mathbb{P}[D_i = 1 \mid X_i = x]
$$

  - PS = unit's probability of being treated, conditional on $X_i$

• For a particular unit, this is $e(X_i) = \mathbb{P}[D_i = 1 \mid X_i]$
• Rosenbaum and Rubin (1983) showed that:

$$
D_i \perp\!\!\!\perp \left(Y_i(0), Y_i(1)\right) \mid X_i \implies D_i \perp\!\!\!\perp \left(Y_i(0), Y_i(1)\right) \mid e(X_i)
$$

  - Stratifying on $e_i$ is the same as stratifying on the full $X_i$

Propensity score as balancing score

• The propensity score is actually a balancing score, which means that:

$$
D_i \perp\!\!\!\perp X_i \mid e(X_i)
$$

• Conditional on the propensity score, treatment is independent of the covariates
  - Treatment status is said to be balanced: $f(X_i \mid D_i = 1, e(X_i)) = f(X_i \mid D_i = 0, e(X_i))$

• Of course, we have to know the true PS to have all these results work

Estimating the propensity score

• Of course, in observational studies we don't know the propensity score

• We would run a parametric model with parameters $\gamma$ to estimate the propensity scores:
  1. Estimate $\hat{\gamma}$
  2. Create $\hat{e}_i = \Pr[D_i = 1 \mid X_i; \hat{\gamma}]$

• For instance, in R, we could easily calculate the propensity scores using the glm function:

    pscores <- glm(treat ~ var1 + var2 + var3, data = mydata,
                   family = binomial())$fitted.values

Propensity score specifics

• What variables do we include in the propensity score model?
  - Any set of variables that blocks all the backdoor paths from $D_i$ to $Y_i$

• Check balance within strata of $\hat{e}_i$. Covariates should be balanced:

$$
f(X_i \mid D_i = 1, \hat{e}_i) = f(X_i \mid D_i = 0, \hat{e}_i)
$$

• Can also use automated/nonparametric tools for estimating $\hat{e}_i$
  - Covariate Balancing Propensity Scores (Imai and Ratkovic)
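One way to check balance within strata of the estimated propensity score is sketched below. It reuses the placeholder names `treat`, `var1`, `var2`, `var3`, and `mydata` from the glm call, but the data are simulated and the quintile blocks and the standardized-difference summary are assumptions made for illustration, not a prescribed diagnostic.

```r
# Simulated data standing in for `mydata` (the data-generating values are assumptions).
set.seed(2002)
n <- 5000
mydata <- data.frame(var1 = rnorm(n), var2 = rnorm(n), var3 = rbinom(n, 1, 0.4))
mydata$treat <- rbinom(n, 1, plogis(-0.5 + 0.8 * mydata$var1 -
                                      0.5 * mydata$var2 + 0.6 * mydata$var3))

# Estimate the propensity score with a logit model
ps_fit <- glm(treat ~ var1 + var2 + var3, data = mydata, family = binomial())
mydata$pscore <- ps_fit$fitted.values

# Five blocks at the quintiles of the estimated propensity score
breaks <- quantile(mydata$pscore, probs = seq(0, 1, by = 0.2))
mydata$block <- cut(mydata$pscore, breaks = breaks,
                    include.lowest = TRUE, labels = FALSE)

# Treated-minus-control difference in means, scaled by a common standard deviation
std_diff <- function(x, d, scale) (mean(x[d == 1]) - mean(x[d == 0])) / scale
s1 <- sd(mydata$var1)

overall <- std_diff(mydata$var1, mydata$treat, s1)
within  <- sapply(split(mydata, mydata$block),
                  function(g) std_diff(g$var1, g$treat, s1))

round(c(overall = overall, within), 3)
```

If the propensity score model is adequate, the within-block differences should be much smaller in magnitude than the overall difference. The same comparison would be repeated for each covariate.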

Stratifying by the propensity score

• How will we use the propensity score?
  - Matching (next week)
  - Weighting (two weeks)
  - Regression (three weeks)
• Today: coarsening the propensity score and stratifying
• Choose boundary points: $0 = b_0 < b_1 < \ldots < b_{K-1} < b_K = 1$
• Create block indicators:

$$
B_i(k) =
\begin{cases}
1 & \text{if } b_{k-1} < \hat{e}(X_i) < b_k \\
0 & \text{otherwise}
\end{cases}
$$

• Calculate within-strata effect estimates:

$$
\tau_k = \mathbb{E}[Y_i \mid D_i = 1, B_i(k) = 1] - \mathbb{E}[Y_i \mid D_i = 0, B_i(k) = 1]
$$

Standardization/direct adjustment

• We calculated the CATEs for each stratum of the PS, $\tau_k$
• We can use the law of iterated expectations to back out the ATE
• Take the average of the CATEs over the distribution of $X$:

$$
\tau = \sum_{k=1}^{K} \tau_k \, \mathbb{P}[B_i(k) = 1]
$$

• Note that $\mathbb{P}[B_i(k) = 1]$ is just the proportion of units in block $k$:

$$
\mathbb{P}[B_i(k) = 1] = \frac{\sum_{i=1}^{N} B_i(k)}{N}
$$
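The block-and-average steps can be sketched end to end as follows. This is an illustration on simulated data under stated assumptions: the true ATE is set to 2, the number of blocks is $K = 10$, and the boundary points are taken at sample quantiles of the estimated propensity score rather than fixed in advance on $[0, 1]$.

```r
# Simulated data; every coefficient here is an assumption for the example.
set.seed(2002)
n  <- 10000
x1 <- rnorm(n)
x2 <- rnorm(n)
d  <- rbinom(n, 1, plogis(0.8 * x1 - 0.5 * x2))
y  <- 1 + 2 * d + 1.5 * x1 - 1 * x2 + rnorm(n)   # true ATE = 2

# Step 1: estimate the propensity score
pscore <- glm(d ~ x1 + x2, family = binomial())$fitted.values

# Step 2: boundary points and block membership
K <- 10
b <- quantile(pscore, probs = seq(0, 1, length.out = K + 1))
block <- cut(pscore, breaks = b, include.lowest = TRUE, labels = FALSE)

# Step 3: within-block differences in means (tau_k)
tau_k <- sapply(seq_len(K), function(k) {
  in_k <- block == k
  mean(y[in_k & d == 1]) - mean(y[in_k & d == 0])
})

# Step 4: proportion of units in each block, then the weighted average
p_k <- sapply(seq_len(K), function(k) mean(block == k))
tau_hat <- sum(tau_k * p_k)

naive <- mean(y[d == 1]) - mean(y[d == 0])
c(naive = naive, stratified = tau_hat)
```

With quantile-based boundaries every block holds roughly the same share of units, so the weights `p_k` are nearly equal. Other boundary choices would change the weights but not the form of the estimator.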

5 Wrapping Up

Summary

• Defined observational studies
• Defined confounding and assessed when no unmeasured confounding holds
• Saw how no unmeasured confounding helps with OLS
• Saw how to estimate causal effects under no unmeasured confounding using the propensity score

Next few weeks

• Learn how to estimate causal effects under no unmeasured confounders via:
  - Matching
  - Weighting
  - Regression

• Then we move on to situations where no unmeasured confounders is violated


Regression on residuals

• First, estimate the residuals of regression of the treatment and outcome on the covariates:

$$
\tilde{Y}_i = Y_i - \mathbb{E}[Y_i \mid X_i] \qquad \tilde{D}_i = D_i - \mathbb{E}[D_i \mid X_i]
$$

• Running a regression of $\tilde{Y}_i$ on $\tilde{D}_i$ is equivalent to controlling for $X_i$:

$$
Y_i = \alpha + \tau \cdot D_i + X_i'\beta + u_i
$$

$$
\tilde{Y}_i = \alpha + \tau \cdot \tilde{D}_i + \tilde{u}_i
$$

• Here, $\tilde{u}_i = u_i - \mathbb{E}[u_i \mid X_i]$
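The equivalence between regressing on residuals and controlling for $X_i$ can be illustrated numerically. The sketch below uses simulated data with a single covariate, and it uses linear projections from `lm` as a stand-in for the conditional expectations $\mathbb{E}[\cdot \mid X_i]$. Both of these, along with all coefficient values, are assumptions made for the example.

```r
# Simulated data; names and coefficients are illustrative assumptions.
set.seed(2002)
n <- 5000
x <- rnorm(n)                          # a single covariate X_i
d <- rbinom(n, 1, plogis(0.8 * x))     # treatment depends on X_i
y <- 1 + 2 * d + 1.5 * x + rnorm(n)    # tau = 2 in this simulation

# Regression of Y on D, controlling for X
tau_long <- coef(lm(y ~ d + x))[["d"]]

# Residualize the outcome and the treatment on X, then regress residual on residual
y_tilde <- resid(lm(y ~ x))
d_tilde <- resid(lm(d ~ x))
tau_resid <- coef(lm(y_tilde ~ d_tilde))[["d_tilde"]]

# The same slope written as Cov(D~, Y~) / Var(D~)
tau_ratio <- cov(d_tilde, y_tilde) / var(d_tilde)

c(long = tau_long, residualized = tau_resid, ratio = tau_ratio)
```

All three numbers agree up to floating-point error, which is why the probability limit of the OLS estimator can be written in terms of the residualized variables.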




Omitted variable bias

bull What happens when this is violated Suppose that there isone omitted variable (residualized from 119883119894)

119894 = 120582119894 + 120596119894

bull Wersquoll assume that if we could measure 119871119894 then nounmeasured confounding would hold

bull Leads to inconsistency in the OLS estimator

plim 111369611136931113700 = 120591 + 120582Cov(119894 119894)Var(119894)

bull Bias here is terms multiplied together1 coefficient on 119871119894 (120582)2 the coefficient of regression of 119863119894 on 119871119894 also controlling for 119883119894

4 Estimating causaleffects under nounmeasuredconfounders

Basic approach to estimation

bull Remember the usual approach to estimating the ATE withcovariates

bull Stratification Stratify the units by the covariates Calculate CATE within these strata

bull Standardizationdirect adjustment Average the CATEs across the strata to get ATE

bull How to create strata when 119883 has continuous components If 119883 is discrete with only a few levels can use the exact values

of 119883 Otherwise we may have to subclassifycoarsen the data

Classic example cigarspipes versuscigarettes

bull 119863119894 = 1 for pipecigar smokers 119863119894 = 0 for cigarette smokersbull 119884119894 = death in the first year of follow-upbull Naive positive effect cigarpipe smokers more likely to die

Whatrsquos the confounder here Age Pipecigar smokers much older than cigarette smokers

bull Cochranrsquos approach stratify based on coarsened age Divide age into 119896 strata 119878119894 isin 1199041113568 1199041113569 hellip 119904119896 1199041113568 might be 18-25 1199041113569 might be 26-35 and so on Calculate effect within strata and aggregate

bull Key assumption no unmeasured confounders using stratifiedversion of age

119863119894 ⟂⟂ 1114100119884119894(0) 119884119894(1)1114103|119878119894

Stratification on the propensity score

bull What about when 119883 has has many dimensionsbull Curse of dimensionality there will be very few if any units in

a given stratum of 119883119894bull Stratify on a low-dimensional summary the propensity score

119890(119909) = ℙ[119863119894 = 1|119883119894 = 119909]

PS = unitrsquos probability of being treated conditional on 119883119894

bull For a particular unit this is 119890(119883119894) = ℙ[119863119894 = 1|119883119894]bull Rosenbaum and Rubin (1983) showed that

119863119894 ⟂⟂ 1114100119884119894(0) 119884119894(1)1114103 | 119883119894 ⟹ 119863119894 ⟂⟂ 1114100119884119894(0) 119884119894(1)1114103 | 119890(119883119894)

stratifying on 119890119894 is the same as stratifying on the full 119883119894

Propensity score as balancing score

bull The propensity score is actually a balancing score whichmeans that

119863119894 ⟂⟂ 119883119894 | 119890(119883119894)

bull Conditional on the propensity score treatment is independentof the covariates


4 Estimating causal effects under no unmeasured confounders

Basic approach to estimation

• Remember the usual approach to estimating the ATE with covariates:
• Stratification:
  - Stratify the units by the covariates.
  - Calculate the CATE within these strata.
• Standardization/direct adjustment:
  - Average the CATEs across the strata to get the ATE.
• How do we create strata when $X$ has continuous components?
  - If $X$ is discrete with only a few levels, we can use the exact values of $X$.
  - Otherwise, we may have to subclassify/coarsen the data.
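The two steps above can be sketched in R for the simplest case, a single discrete covariate. This is only an illustration on simulated data: the variable names, the three-level covariate, the treatment probabilities, and the true effect of 1 are all assumptions made for the example, not part of the lecture.

```r
# Hypothetical simulated data: one discrete confounder x with three levels.
set.seed(2002)
n <- 5000
x <- sample(1:3, n, replace = TRUE)
# Treatment probability rises with x, so x confounds the naive comparison.
d <- rbinom(n, 1, prob = c(0.2, 0.5, 0.8)[x])
# Assumed true treatment effect of 1 in every stratum; x also shifts the outcome.
y <- 1 * d + 2 * x + rnorm(n)

# Naive difference in means ignores x.
naive <- mean(y[d == 1]) - mean(y[d == 0])

# Stratification: difference in means within each level of x (the CATEs).
cate <- sapply(1:3, function(k) {
  mean(y[d == 1 & x == k]) - mean(y[d == 0 & x == k])
})

# Standardization: weight each CATE by the share of units in its stratum.
shares <- as.numeric(table(factor(x, levels = 1:3))) / n
ate_hat <- sum(cate * shares)
```

In this setup the naive comparison mixes the treatment effect with the effect of `x`, while `ate_hat` should land near the assumed effect of 1.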

Classic example: cigars/pipes versus cigarettes

• $D_i = 1$ for pipe/cigar smokers, $D_i = 0$ for cigarette smokers
• $Y_i$ = death in the first year of follow-up
• Naive positive effect: cigar/pipe smokers are more likely to die.
  - What's the confounder here? Age.
  - Pipe/cigar smokers are much older than cigarette smokers.
• Cochran's approach: stratify based on coarsened age.
  - Divide age into $k$ strata, $S_i \in \{s_1, s_2, \ldots, s_k\}$.
  - $s_1$ might be 18-25, $s_2$ might be 26-35, and so on.
  - Calculate the effect within strata and aggregate.
• Key assumption: no unmeasured confounders using the stratified version of age:

$$D_i \perp\!\!\!\perp (Y_i(0), Y_i(1)) \mid S_i$$

Stratification on the propensity score

• What about when $X$ has many dimensions?
• Curse of dimensionality: there will be very few, if any, units in a given stratum of $X_i$.
• Stratify on a low-dimensional summary, the propensity score:

$$e(x) = \mathbb{P}[D_i = 1 \mid X_i = x]$$

  - PS = a unit's probability of being treated, conditional on $X_i$.

• For a particular unit, this is $e(X_i) = \mathbb{P}[D_i = 1 \mid X_i]$.
• Rosenbaum and Rubin (1983) showed that:

$$D_i \perp\!\!\!\perp (Y_i(0), Y_i(1)) \mid X_i \implies D_i \perp\!\!\!\perp (Y_i(0), Y_i(1)) \mid e(X_i)$$

  - Stratifying on $e_i = e(X_i)$ is the same as stratifying on the full $X_i$.
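A short sketch of why the Rosenbaum and Rubin result holds, written with the notation from the slides. The first equality uses the law of iterated expectations (valid because $e(X_i)$ is a function of $X_i$), and the second uses no unmeasured confounding given $X_i$.

```latex
\begin{align*}
\mathbb{P}[D_i = 1 \mid Y_i(0), Y_i(1), e(X_i)]
  &= \mathbb{E}\bigl[\, \mathbb{P}[D_i = 1 \mid Y_i(0), Y_i(1), X_i] \bigm| Y_i(0), Y_i(1), e(X_i) \bigr] \\
  &= \mathbb{E}\bigl[\, \mathbb{P}[D_i = 1 \mid X_i] \bigm| Y_i(0), Y_i(1), e(X_i) \bigr] \\
  &= \mathbb{E}\bigl[\, e(X_i) \bigm| Y_i(0), Y_i(1), e(X_i) \bigr] \\
  &= e(X_i)
\end{align*}
```

The same steps without the potential outcomes give $\mathbb{P}[D_i = 1 \mid e(X_i)] = e(X_i)$. The two conditional probabilities agree, so the potential outcomes carry no extra information about $D_i$ once $e(X_i)$ is known.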

Propensity score as balancing score

• The propensity score is actually a balancing score, which means that:

$$D_i \perp\!\!\!\perp X_i \mid e(X_i)$$

• Conditional on the propensity score, treatment is independent of the covariates.
  - Covariates are said to be balanced across treatment status: $f(X_i \mid D_i = 1, e(X_i)) = f(X_i \mid D_i = 0, e(X_i))$
• Of course, we have to know the true PS for all of these results to work.
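The balancing property can be checked in two lines. This is a sketch of the standard argument, and it does not rely on no unmeasured confounding:

```latex
\begin{align*}
\mathbb{P}[D_i = 1 \mid X_i, e(X_i)] &= \mathbb{P}[D_i = 1 \mid X_i] = e(X_i) \\
\mathbb{P}[D_i = 1 \mid e(X_i)] &= \mathbb{E}\bigl[\, \mathbb{P}[D_i = 1 \mid X_i] \bigm| e(X_i) \bigr]
  = \mathbb{E}\bigl[\, e(X_i) \bigm| e(X_i) \bigr] = e(X_i)
\end{align*}
```

Since both probabilities equal $e(X_i)$, knowing $X_i$ in addition to $e(X_i)$ does not change the probability of treatment, which is exactly $D_i \perp\!\!\!\perp X_i \mid e(X_i)$.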

Estimating the propensity score

• Of course, in observational studies we don't know the propensity score.
• We would run a parametric model with parameters $\gamma$ to estimate the propensity scores:
  1. Estimate $\hat{\gamma}$.
  2. Create $\hat{e}_i = \Pr[D_i = 1 \mid X_i; \hat{\gamma}]$.

• For instance, in R, we could easily calculate the propensity scores using the glm function:

    # Logistic regression of treatment on the covariates; keep the fitted probabilities
    pscores <- glm(treat ~ var1 + var2 + var3, data = mydata,
                   family = binomial())$fitted.values

Propensity score specifics

• What variables do we include in the propensity score model?
  - Any set of variables that blocks all the backdoor paths from $D_i$ to $Y_i$.
• Check balance within strata of $\hat{e}_i$.
  - Covariates should be balanced:

$$f(X_i \mid D_i = 1, \hat{e}_i) = f(X_i \mid D_i = 0, \hat{e}_i)$$

• Can also use automated/nonparametric tools for estimating $\hat{e}_i$.
  - Covariate Balancing Propensity Scores (Imai and Ratkovic)
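One way to carry out the balance check is to compare covariate means between treated and control units inside each stratum of the estimated score. The sketch below uses simulated data; the covariates `x1` and `x2`, the logistic assignment model, and the choice of five quantile-based strata are assumptions made for the example.

```r
# Hypothetical simulated data: two covariates drive treatment assignment.
set.seed(2002)
n <- 5000
x1 <- rnorm(n)
x2 <- rbinom(n, 1, 0.5)
d <- rbinom(n, 1, plogis(-0.5 + 0.8 * x1 + 0.6 * x2))
dat <- data.frame(d, x1, x2)

# Estimate the propensity score with a logistic regression.
ps_fit <- glm(d ~ x1 + x2, data = dat, family = binomial())
dat$pscore <- fitted(ps_fit)

# Coarsen the estimated score into five strata at its sample quintiles.
breaks <- quantile(dat$pscore, probs = seq(0, 1, by = 0.2))
dat$block <- cut(dat$pscore, breaks = breaks, include.lowest = TRUE,
                 labels = FALSE)

# Raw imbalance: difference in the mean of x1 between treated and control.
raw_diff <- mean(dat$x1[dat$d == 1]) - mean(dat$x1[dat$d == 0])

# Within-stratum imbalance: the same difference, computed block by block.
block_diff <- sapply(split(dat, dat$block), function(b) {
  mean(b$x1[b$d == 1]) - mean(b$x1[b$d == 0])
})
```

Comparing `block_diff` with `raw_diff` shows how much imbalance in `x1` remains after stratifying. Large within-block differences would suggest revisiting the propensity score model or using finer strata.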

Stratifying by the propensity score

• How will we use the propensity score?
  - Matching (next week)
  - Weighting (two weeks)
  - Regression (three weeks)
• Today: coarsening the propensity score and stratifying.
• Choose boundary points $0 = b_0 < b_1 < \ldots < b_{K-1} < b_K = 1$.
• Create block indicators:

$$B_i(k) = \begin{cases} 1 & \text{if } b_{k-1} < \hat{e}(X_i) < b_k \\ 0 & \text{otherwise} \end{cases}$$

• Calculate within-strata effect estimates:

$$\tau_k = \mathbb{E}[Y_i \mid D_i = 1, B_i(k) = 1] - \mathbb{E}[Y_i \mid D_i = 0, B_i(k) = 1]$$

Standardization/direct adjustment

• We calculated the CATEs for each stratum of the PS: $\tau_k$.
• We can use the law of iterated expectations to back out the ATE.
• Take the average of the CATEs over the distribution of $X$:

$$\tau = \sum_{k=1}^{K} \tau_k \, \mathbb{P}[B_i(k) = 1]$$

• Note that $\mathbb{P}[B_i(k) = 1]$ is just the proportion of units in block $k$:

$$\mathbb{P}[B_i(k) = 1] = \frac{\sum_{i=1}^{N} B_i(k)}{N}$$
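The full procedure (estimate the score, form blocks, compute $\tau_k$, then standardize) can be sketched in R as follows. Everything here is illustrative: the data are simulated with an assumed true effect of 2, and placing the boundary points at sample deciles of the estimated score is one common choice rather than something the slides prescribe.

```r
# Hypothetical simulated data with an assumed true treatment effect of 2.
set.seed(2002)
n <- 5000
x1 <- rnorm(n)
x2 <- rnorm(n)
d <- rbinom(n, 1, plogis(0.7 * x1 - 0.5 * x2))
y <- 2 * d + x1 - x2 + rnorm(n)
dat <- data.frame(y, d, x1, x2)

# Step 1: estimate the propensity score.
dat$pscore <- fitted(glm(d ~ x1 + x2, data = dat, family = binomial()))

# Step 2: choose boundary points and assign each unit to a block.
K <- 10
b <- quantile(dat$pscore, probs = seq(0, 1, length.out = K + 1))
dat$block <- cut(dat$pscore, breaks = b, include.lowest = TRUE, labels = FALSE)

# Step 3: within-block difference in means (tau_k) and block shares.
tau_k <- sapply(1:K, function(k) {
  in_k <- dat$block == k
  mean(dat$y[in_k & dat$d == 1]) - mean(dat$y[in_k & dat$d == 0])
})
share_k <- sapply(1:K, function(k) mean(dat$block == k))

# Step 4: standardize, weighting each tau_k by its block share.
ate_hat <- sum(tau_k * share_k)

# Naive difference in means, for comparison.
naive <- mean(dat$y[dat$d == 1]) - mean(dat$y[dat$d == 0])
```

Because the blocks are coarse, some confounding remains within each block, so `ate_hat` is only approximately unbiased. More blocks reduce that residual bias at the cost of fewer treated and control units per block.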

5 Wrapping Up

Summary

• Defined observational studies.
• Defined confounding and assessed when no unmeasured confounding holds.
• Saw how no unmeasured confounding helps with OLS.
• Saw how to estimate causal effects under no unmeasured confounding using the propensity score.

Next few weeks

• Learn how to estimate causal effects under no unmeasured confounders via:
  - Matching
  - Weighting
  - Regression
• Then we move on to situations where no unmeasured confounders is violated.

Page 41: Gov 2002: 4. Observational Studies and Confounding · 2019-07-23 · 2. the coefficient of regression of𝐷𝑖on 𝐿𝑖also controlling for ... Basicapproachtoestimation • Remember

5 Wrapping Up

Summary

bull Defined observational studiesbull Defined confounding and assessed when no unmeasured

confounding holdsbull Saw how no unmeasured confounding helps with OLSbull Saw how to estimate causal effects under no unmeasured

confounding using the propensity score

Next few weeks

bull Learn how to estimate causal effects under no unmeasuredconfounders via

Matching Weighting Regression

bull Then we move onto situations where no unmeasuredconfounders is violated

  • Observational studies
  • Confounding
  • No unmeasured confounders and OLS
  • Estimating causal effects under no unmeasured confounders
  • Wrapping Up
Page 42: Gov 2002: 4. Observational Studies and Confounding · 2019-07-23 · 2. the coefficient of regression of𝐷𝑖on 𝐿𝑖also controlling for ... Basicapproachtoestimation • Remember

Summary

bull Defined observational studiesbull Defined confounding and assessed when no unmeasured

confounding holdsbull Saw how no unmeasured confounding helps with OLSbull Saw how to estimate causal effects under no unmeasured

confounding using the propensity score

Next few weeks

bull Learn how to estimate causal effects under no unmeasuredconfounders via

Matching Weighting Regression

bull Then we move onto situations where no unmeasuredconfounders is violated

  • Observational studies
  • Confounding
  • No unmeasured confounders and OLS
  • Estimating causal effects under no unmeasured confounders
  • Wrapping Up
Page 43: Gov 2002: 4. Observational Studies and Confounding · 2019-07-23 · 2. the coefficient of regression of𝐷𝑖on 𝐿𝑖also controlling for ... Basicapproachtoestimation • Remember

Next few weeks

bull Learn how to estimate causal effects under no unmeasuredconfounders via

Matching Weighting Regression

bull Then we move onto situations where no unmeasuredconfounders is violated

  • Observational studies
  • Confounding
  • No unmeasured confounders and OLS
  • Estimating causal effects under no unmeasured confounders
  • Wrapping Up

Recommended