150C Causal Inference

Treatment Effect Heterogeneity and Multiplicative Interaction Models

Jonathan Mummolo

Stanford University


Motivation

Conditional Effects

Often we are interested not only in the Average Treatment Effect (ATE) but also in the Conditional Average Treatment Effect (CATE)

Effect of some treatment holding a covariate at a fixed value:

$E[Y_1 \mid X = x] - E[Y_0 \mid X = x] = E[Y_1 - Y_0 \mid X = x]$

We might further be interested in knowing whether two CATEs differ from one another:

$E[Y_{i1} - Y_{i0} \mid X = x_j] - E[Y_{i1} - Y_{i0} \mid X = x_k]$, where $j \neq k$

“Effect heterogeneity,” “heterogeneous treatment effects,” “subgroup effects,” “interaction effects”
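The definitions above can be illustrated with a small simulation. This is a minimal sketch, not part of the original slides: the population, the binary covariate, and the effect sizes (2.0 and 0.5) are all hypothetical, and both potential outcomes are visible only because the data are simulated.

```python
import random
from statistics import fmean

random.seed(0)

# Hypothetical simulated population: each unit has a binary covariate x and
# BOTH potential outcomes (y0, y1). Real data reveal only one of the two.
units = []
for _ in range(2000):
    x = random.randint(0, 1)
    y0 = random.gauss(0.0, 1.0)
    # Assumed unit-level effects: 2.0 when x == 1, 0.5 when x == 0.
    y1 = y0 + (2.0 if x == 1 else 0.5)
    units.append((x, y0, y1))


def cate(x_value):
    """E[Y1 - Y0 | X = x_value]: average unit-level effect in the subgroup."""
    return fmean([y1 - y0 for x, y0, y1 in units if x == x_value])


cate_x1 = cate(1)
cate_x0 = cate(0)

# Same quantity written as a difference of conditional means,
# E[Y1 | X = 1] - E[Y0 | X = 1].
diff_of_means_x1 = (
    fmean([y1 for x, _, y1 in units if x == 1])
    - fmean([y0 for x, y0, _ in units if x == 1])
)

# Unconditional ATE and the difference between the two CATEs.
ate = fmean([y1 - y0 for _, y0, y1 in units])
cate_difference = cate_x1 - cate_x0

print(cate_x1, cate_x0, cate_difference, ate)
```

In this setup the ATE lies between the two CATEs, and the difference between the CATEs is the "effect heterogeneity" the slide refers to.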


(Hypothetical) Examples

The magnitude, and sometimes the direction, of the effect of some treatment depends on an additional factor.

The effect of medicine X on health is positive for those below age 35, but negative for those above age 35

Seeing negative political ads causes old people to vote, young people to stay home

Police body cameras cause a decline in the use of force by officers in large police departments, but have no effect for officers in small police departments


Linear Interaction Model

Definition (Linear Interaction Model)

Workhorse model in social science for estimating the CATE: the linear interaction model

$Y_i = \alpha + \beta_1 D_i + \beta_2 X_i + \beta_3 D_i X_i + \varepsilon_i$

where $D_i$ is the treatment and $X_i$ is the conditioning variable (sometimes called a moderator).

How to interpret correctly?

Long way: set $D_i$ and $X_i$ to given values, recover parameters under different scenarios.
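The "long way" can be sketched in a few lines of Python: plug each combination of $D_i$ and $X_i$ into the model's conditional mean and read off the result. The function name `expected_y` and the parameter values are hypothetical, chosen only for illustration.

```python
def expected_y(alpha, b1, b2, b3, d, x):
    """Conditional mean E[Y | D = d, X = x] under the linear interaction model.

    The error term is omitted because it is assumed to have mean zero.
    """
    return alpha + b1 * d + b2 * x + b3 * d * x


# Hypothetical parameter values (not estimates from any real data).
params = {"alpha": 1.0, "b1": 0.5, "b2": 2.0, "b3": -1.5}

# Evaluate the conditional mean in each of the four (D, X) scenarios.
scenarios = {
    (d, x): expected_y(**params, d=d, x=x)
    for d in (0, 1)
    for x in (0, 1)
}

for (d, x), value in sorted(scenarios.items()):
    print(f"D={d}, X={x}: E[Y | D, X] = {value}")
```

Comparing any two of these cells shows which parameters separate them.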


Example: What is $E[Y_i \mid X_i = 1, D_i = 1] - E[Y_i \mid X_i = 1, D_i = 0]$?

$E[Y_i \mid X_i = 1, D_i = 1] = \alpha + \beta_1 \cdot 1 + \beta_2 \cdot 1 + \beta_3 \cdot 1 \cdot 1 = \alpha + \beta_1 + \beta_2 + \beta_3$ (the mean-zero error term drops out)

$E[Y_i \mid X_i = 1, D_i = 0] = \alpha + \beta_1 \cdot 0 + \beta_2 \cdot 1 + \beta_3 \cdot 0 \cdot 1 = \alpha + \beta_2$

So $E[Y_i \mid X_i = 1, D_i = 1] - E[Y_i \mid X_i = 1, D_i = 0] = (\alpha + \beta_1 + \beta_2 + \beta_3) - (\alpha + \beta_2) = \beta_1 + \beta_3$

= Treatment effect for those units with $X = 1$ (where $X$ could be a dummy for gender, party ID, old/young, etc.)
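The same subtraction can be written for a generic moderator value. The derivation below is a sketch that assumes the linear interaction form holds at the value $x$; the symbol $x$ is introduced here for illustration and is not in the original slide.

```latex
\begin{align*}
E[Y_i \mid X_i = x, D_i = 1] - E[Y_i \mid X_i = x, D_i = 0]
  &= (\alpha + \beta_1 + \beta_2 x + \beta_3 x) - (\alpha + \beta_2 x) \\
  &= \beta_1 + \beta_3 x
\end{align*}
```

Setting $x = 1$ reproduces the result $\beta_1 + \beta_3$ from the example.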


Here $D_i$ is the treatment and $X_i$ is a dichotomous conditioning variable (sometimes called a moderator).

Similarly: What is $E[Y_i \mid X_i = 0, D_i = 1] - E[Y_i \mid X_i = 0, D_i = 0]$?

$E[Y_i \mid X_i = 0, D_i = 1] = \alpha + \beta_1 \cdot 1 + \beta_2 \cdot 0 + \beta_3 \cdot 1 \cdot 0 = \alpha + \beta_1$
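The transcript stops before the second term of this comparison. Under the same model, $E[Y_i \mid X_i = 0, D_i = 0] = \alpha$, so the difference would be $\beta_1$. The sketch below checks this on simulated data. It relies on the fact that, with binary $D_i$ and $X_i$, the interaction model is saturated, so its OLS coefficients coincide with differences of the four cell means. The "true" parameter values, the sample size, and the random assignment of $D_i$ are assumptions made only for illustration.

```python
import random
from statistics import fmean

random.seed(1)

# Hypothetical "true" parameters, chosen only for illustration.
ALPHA, B1, B2, B3 = 1.0, 0.5, 2.0, -1.5

rows = []
for _ in range(4000):
    d = random.randint(0, 1)  # randomly assigned binary treatment
    x = random.randint(0, 1)  # dichotomous moderator
    y = ALPHA + B1 * d + B2 * x + B3 * d * x + random.gauss(0.0, 1.0)
    rows.append((d, x, y))


def cell_mean(d_value, x_value):
    """Sample analogue of E[Y | D = d_value, X = x_value]."""
    return fmean([y for d, x, y in rows if d == d_value and x == x_value])


m00, m10 = cell_mean(0, 0), cell_mean(1, 0)
m01, m11 = cell_mean(0, 1), cell_mean(1, 1)

# Coefficients of the saturated model expressed through cell means.
alpha_hat = m00
b1_hat = m10 - m00                  # treatment effect when X = 0
b2_hat = m01 - m00
b3_hat = (m11 - m01) - (m10 - m00)  # difference between the two CATEs

cate_x0 = b1_hat
cate_x1 = b1_hat + b3_hat           # treatment effect when X = 1

print(cate_x0, cate_x1, b3_hat)
```

Read this way, $\beta_1$ is the treatment effect only for units with $X_i = 0$, and $\beta_3$ measures how much the effect changes when $X_i$ moves from 0 to 1.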