Page 1: estimating individual causal effects - Patrick Lam (patricklam.org/talk/4_10_13.pdf)

estimating individual causal effects

patrick lam

april 10, 2013

Page 2:

roadmap

the what and why of ICEs

estimation

simulations

application

Page 3:

contributions

1. reorient our thinking away from estimating average treatment effects first

2. coherent framework to think about individual causal effects

3. model that combines existing methods

4. practical applications to discover treatment effect heterogeneity and recover any causal quantity

Page 4:

the typical empirical paper

I “the effect of W on Y is β”

I “the treatment effect is τ”

q: what is β or τ estimating?

a: β or τ is usually the average treatment effect (ATE)

sometimes, it’s an ATE on a subset of the population:

I average treatment effect on the treated (ATT)

I conditional average treatment effect (CATE)

I local average treatment effect (LATE)

but really, what exactly is an average treatment effect?

Page 5:

potential outcomes framework (again)

suppose a binary treatment variable W .

each individual i has a potential outcome associated with treatment, Yi(1), and control, Yi(0):

τi = Yi (1)− Yi (0)

τi is an individual causal effect (ICE).

fundamental problem of causal inference: at most one potential outcome is ever observed for each individual

Page 6:

the average treatment effect is...

the average of the individual treatment effects:

τATE = E [τi ] = E [Y (1)− Y (0)] = E [Y (1)]− E [Y (0)]

i   Y(1)    Y(0)    Y(1) − Y(0)
1   Y1(1)   Y1(0)   τ1
2   Y2(1)   Y2(0)   τ2
3   Y3(1)   Y3(0)   τ3
4   Y4(1)   Y4(0)   τ4
5   Y5(1)   Y5(0)   τ5
6   Y6(1)   Y6(0)   τ6

the ATE is the difference between the averages of the second and third columns OR equivalently the average of the fourth column

Page 7:

the average treatment effect is NOT...

I the treatment effect of any specific individual

I the treatment effect of the average individual

but we probably care more about these quantities!

q: given this, why do we use the average treatment effect?

a: probably because

I the ATE is probably the “best” general one-number summary

I the ATE is usually the easiest to estimate

I the ATE is equal to the treatment effect for everybody IFF one makes the constant treatment effects assumption:

τATE = τ1 = τ2 = · · · = τN

although often implicit in language, rarely assumed explicitly and almost never reasonable

Page 8:

estimating the average treatment effect

observed data:

i   W   Y(1)    Y(0)
1   1   Y1(1)   ?
2   0   ?       Y2(0)
3   0   ?       Y3(0)
4   1   Y4(1)   ?
5   1   Y5(1)   ?
6   0   ?       Y6(0)

assume ignorability of treatment assignment and SUTVA.

τ̂ATE = Ê[Y(1)|W = 1] − Ê[Y(0)|W = 0]

observed outcomes are a random sample from each column, so τ̂ATE is an unbiased estimate of τATE.

Page 9:

what does the ATE miss?

suppose we observe the following data.

i   W   Y
1   1   15
2   0   10
3   0   15
4   1   8
5   1   10
6   0   8

=

i   W   Y(1)  Y(0)
1   1   15    ?
2   0   ?     10
3   0   ?     15
4   1   8     ?
5   1   10    ?
6   0   ?     8

τ̂ATE = Ê[Y(1)|W = 1] − Ê[Y(0)|W = 0]
     = 11 − 11
     = 0
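The difference-in-means arithmetic above can be reproduced directly; a minimal Python sketch using the six observed rows from the table:

```python
# Observed data from the table above: treatment W and observed outcome Y
W = [1, 0, 0, 1, 1, 0]
Y = [15, 10, 15, 8, 10, 8]

# difference-in-means estimator of the ATE
treated_mean = sum(y for w, y in zip(W, Y) if w == 1) / sum(W)
control_mean = sum(y for w, y in zip(W, Y) if w == 0) / (len(W) - sum(W))
ate_hat = treated_mean - control_mean  # 11 - 11 = 0
```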

what does the true underlying world look like?

Page 10:

what does the ATE miss?

with τATE = 0, the true underlying world can be

i   W   Y(1)  Y(0)  τ
1   1   15    15    0
2   0   10    10    0
3   0   15    15    0
4   1   8     8     0
5   1   10    10    0
6   0   8     8     0

(a) world 1

OR

i   W   Y(1)  Y(0)  τ
1   1   15    10    5
2   0   15    10    5
3   0   8     15    -7
4   1   8     15    -7
5   1   10    8     2
6   0   10    8     2

(b) world 2

I we often naturally think (τ̂ATE = 0) ≡ world 1 (constant effects assumption); because of our priors? some type of cognitive bias?

I without additional information, P(world 1) = P(world 2)

I world 1 and world 2 have very different implications academically and policy-wise

Page 11:

what can be done?

this is a problem of treatment effect heterogeneity.

possible solutions:

1. estimate conditional ATEs, where we estimate the ATE for a subset of individuals defined by some covariate(s)

I need to define the covariate(s) ourselves based on what we think affects the heterogeneity

I possibly run lots of models to search for heterogeneity

2. estimate the individual causal effects (ICEs)

difference:

I estimating CATEs (top-down): define a subset and estimate a subsetted ATE

I estimating ICEs (bottom-up): estimate effects for every individual and aggregate/explore different subsets

Page 12:

individual causal effect: τi = Yi(1)− Yi(0)

why estimate ICEs?

I we usually care about the effect on specific individuals, the average individual, or groups of individuals, but not the ATE

I discover and explore treatment effect heterogeneity

I bridges the gap between quantitative and qualitative research

I every other causal quantity is a simple function of ICEs, so we can calculate other estimands directly

why not estimate ICEs?

I strictly speaking, not identified

I hard to estimate

I only an in-sample quantity, hard to generalize to the population of individuals not in the data without additional assumptions

Page 13:

if we had the ICEs (τi). . .

I ATE: (Σ_{i=1}^N τi) / N

I ATT: (Σ_{i∈{Wi=1}} τi) / Nt

I CATE{X=1}: (Σ_{i∈{Xi=1}} τi) / (Σ I(Xi = 1))

I P(τi > 0)

I relationship between X and τ: scatterplot of X and τi
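Given a vector of ICEs, each of these quantities is a one-line aggregation. A sketch using the world-2 values of τ from the earlier slide, plus a hypothetical binary covariate X for the conditional estimand:

```python
# world-2 ICEs from the earlier slide, with treatment W and a
# hypothetical binary covariate X
tau = [5, 5, -7, -7, 2, 2]
W   = [1, 0, 0, 1, 1, 0]
X   = [1, 1, 0, 0, 1, 0]

ate = sum(tau) / len(tau)                                   # average of all ICEs
att = sum(t for t, w in zip(tau, W) if w == 1) / sum(W)     # treated only
cate_x1 = sum(t for t, x in zip(tau, X) if x == 1) / sum(X) # X = 1 subset
p_positive = sum(t > 0 for t in tau) / len(tau)             # P(tau_i > 0)
```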

Page 14:

estimating ICEs

problem: ICEs are not identified in the data

I the data does not give any information to distinguish whether τi = -1000, 0, or 9999.8, since we do not observe both potential outcomes.

strategy: get a sense of the plausible values for the ICEs by

1. assuming that similar observations (on covariates) have similar potential outcomes (matching)

2. using a bayesian model to combine possible prior beliefs with information about potential outcomes from these observations to derive a posterior distribution for the ICEs

not really a new idea, but the focus has rarely ever been on ICEs before

Page 15:

the idea (more simply)

observed and unobserved (missing) data:

i   W   Y(1)     Y(0)     τi
1   1   Y1       Y1^mis   ?
2   0   Y2^mis   Y2       ?
3   0   Y3^mis   Y3       ?
4   1   Y4       Y4^mis   ?
5   1   Y5       Y5^mis   ?
6   0   Y6^mis   Y6       ?

I fill in missing potential outcomes (Ymis) by imputation

I τi and any other causal estimand can be calculated given Yi and Yi^mis

I builds on something Rubin has done in a number of papers: Rubin (2005), Rubin and Waterman (2006), Jin and Rubin (2008), Pattanayak, Rubin, and Zell (2012), Gutman and Rubin (2012)

Page 16:

spoiler

embed in a bayesian model {

1. match

2. impute Yi^mis

3. calculate τi

4. repeat for all i

}

none of these are new ideas!

Page 17:

how nature generated the data

1. draw and fix values of Xi^(p), Wi, and τi for i = 1, . . . , N

Xi^(p) = {Xi, Xi^(u)}

where Xi and Xi^(u) are our observed and unobserved prognostic covariates (that predict the outcomes)

2. generate outcomes by

Yi = h(Xi^(p)) and Yi^mis = h(Xi^(p), τi) for Wi = 0
Yi = h(Xi^(p), τi) and Yi^mis = h(Xi^(p)) for Wi = 1

where h(·) is an unknown function

Page 18:

assumptions

I data are a finite sample of size N drawn from the data generating process described, so only look at sample estimands

I ignorability of treatment assignment

(Y(1), Y(0)) ⊥ W | X
τ ⊥ W | X
X^(u) ⊥ W | X

I SUTVA: no interference & same version of treatment across i

Page 19:

estimation framework

model the missing potential outcomes as

Yi^mis ∼ f(·|θi^mis, Xi, Wi)

where the randomness derives from not observing Xi^(u), and θi^mis is the mean of the distribution f(·)

translation: assume that observations j which have the same values on X and the opposite treatment assignment as i have observed outcomes that follow the same distribution as Yi^mis:

Yj ∼ f(·|θj, Xj, Wj)

θj = θi^mis if Xi = Xj and Wi ≠ Wj

I find these “donor” observations via matching

I same process regardless of whether i is treated or control

Page 20:

estimation overview

1. think of estimating each τi as a separate “study” where we have data consisting of observation i and all observations j where Wj ≠ Wi

2. choose a matching procedure M

3. using M, construct a donor pool for i consisting of observations j that are “close” on the covariates X

4. model the mean of the donor pool

5. draw an imputation for Yi^mis from f(·) given the mean

6. calculate τi = Wi(Yi − Yi^mis) + (1 − Wi)(Yi^mis − Yi)

7. repeat for all i

I incorporate in a bayesian sampler and repeat to simulate from the entire posterior distribution of τi for uncertainty

I each observation can be used in multiple donor pools but only once within any particular donor pool
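One pass of the seven steps can be sketched as follows, with hypothetical data, nearest-neighbor matching on a single covariate standing in for M, and a normal imputation model with unit variance (an assumption made here purely for illustration):

```python
import random

random.seed(0)

# hypothetical data: one matching covariate X, treatment W, observed outcome Y
X = [5, 3, 2, 3, 2, 5]
W = [1, 0, 0, 1, 1, 0]
Y = [15, 10, 15, 8, 10, 8]
M = 1  # number of matches in each donor pool

tau_hat, donor_pools = [], []
for i in range(len(Y)):
    # steps 1-3: donor pool = the M closest opposite-treatment observations on X
    opposite = [j for j in range(len(Y)) if W[j] != W[i]]
    donors = sorted(opposite, key=lambda j: abs(X[j] - X[i]))[:M]
    donor_pools.append(donors)
    # step 4: model the mean of the donor pool
    mean = sum(Y[j] for j in donors) / len(donors)
    # step 5: draw one imputation of Y_i^mis (unit variance assumed)
    y_mis = random.gauss(mean, 1.0)
    # step 6: tau_i = W_i (Y_i - Y_i^mis) + (1 - W_i)(Y_i^mis - Y_i)
    tau_hat.append(Y[i] - y_mis if W[i] == 1 else y_mis - Y[i])
```

In the full model this loop sits inside a bayesian sampler and is repeated nsim times to build a posterior for each τi; a single pass like this captures only the imputation draw.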

Page 21:

bayesian model

p(θmis , θM,M|Y ,X ,W ) ∝ p(Y |X ,W , θmis , θM,M)p(θmis , θM,M)

I θmis is the vector of θi^mis, which is of interest

I θM is a vector of parameters within the matching method

I the data does not tell us anything about the choice of M, so it is purely prior driven

steps:

1. simulate from the posterior via MCMC

2. draw values of Yi^mis from the posterior predictive distribution f(·) given the marginal posterior for θmis

3. simulate the posterior for τ (the vector of τi) by calculating τi = Wi(Yi − Yi^mis) + (1 − Wi)(Yi^mis − Yi)

Page 22:

deriving the posterior

augment the data with N binary variables D^(i) for i = 1, . . . , N

D_j^(i) = 1 if Wj ≠ Wi and j is a match to i; 0 otherwise.

example: suppose we want to match 1-to-1 on a single variable X

i   W   X   Y    D^(1)  D^(2)  D^(3)  D^(4)  D^(5)  D^(6)
1   1   5   Y1   0      0      0      0      0      1
2   0   3   Y2   0      0      0      1      0      0
3   0   2   Y3   0      0      0      0      1      0
4   1   3   Y4   0      1      0      0      0      0
5   1   2   Y5   0      0      1      0      0      0
6   0   5   Y6   1      0      0      0      0      0

D^(i) is an indicator for whether an observation is a match to the ith observation when estimating τi
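A sketch of constructing these indicators for the 1-to-1 example, where row i of the matrix D holds D^(i) (the table above displays the same indicators as columns):

```python
# the slide's 1-to-1 matching example on a single covariate X
X = [5, 3, 2, 3, 2, 5]
W = [1, 0, 0, 1, 1, 0]
N = len(X)

# D[i][j] = 1 if j has the opposite treatment and is i's nearest match on X
D = [[0] * N for _ in range(N)]
for i in range(N):
    opposite = [j for j in range(N) if W[j] != W[i]]
    best = min(opposite, key=lambda j: abs(X[j] - X[i]))
    D[i][best] = 1
```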

Page 23:

how data augmentation helps

the data likelihood (conditional on X and W ):

L(θmis , θM,M|Y ) = p(Y |θmis , θM,M)

= intractable

augment with the variables D to get complete data likelihood:

Lcomp(θmis , θM,M|Y ,D) = p(Y ,D|θmis , θM,M)

= p(Y |D, θmis)p(D|θM,M)

actual (observed) likelihood averages over D:

L(θmis, θM, M|Y) = ∫ p(Y|D, θmis) p(D|θM, M) dD
                 = tractable

Page 24:

the complete data likelihood

I complete likelihood is likelihood if we observed D

I uncertainty in D comes only from matching uncertainty

Lcomp(θmis, θM, M|Y, D) = p(Y|D, θmis) p(D|θM, M)

= ∏_{i=1}^{N} ∏_{j=1}^{N} [ p(Yj|θi^mis)^(D_j^(i)) p(D_j^(i)|θM, M) ]

I for any specific τi, observed Yi is fixed, Yj is random for donor j because of unobserved X^(u), and non-donor observations don't matter

I any particular Yj can appear in the likelihood multiple times or not at all

I the outer product assumes independence of each “study” (each ICE is estimated independently)

Page 25:

priors

p(θmis, θM, M) = [ ∏_{i=1}^{N} p(θi^mis) ] p(θM) p(M)

I usually use improper uniform priors (a bayesian model that approximates non-bayesian results)

I can easily incorporate qualitative priors

I prior on M reflects uncertainty over the matching specification

I current use of matching almost always settles on one single specification ≡ a spike prior on M and θM

I possible to incorporate information from the data and let M enter into the likelihood via balance measures??

Page 26:

simulating from the posterior via MCMC

use a Gibbs sampler: algorithm: repeat the following nsim times

1. draw a matching procedure M from

p(M|Y ,X ,W , θM,D, θmis) = p(M)

2. draw a value θM from

p(θM|Y ,X ,W ,M,D, θmis) = p(θM|Y ,X ,W ,M)

captures the estimation uncertainty of matching and of the parameters of the matching procedures

Page 27:

simulating from the posterior via MCMC

for (i in 1:N) {

3. draw D(i) from

p(D^(i)|Y, X, W, θM, M, D^(−i), θmis) = m(θM, M)

D^(i) is a deterministic function of θM and M (matching)

4. draw θmisi from

p(θi^mis|Y, X, W, θM, M, D, θ^mis_(−i)) = p(θi^mis|Y_{D^(i)=1}, D^(i))

can use conjugacy here to model the mean of the donor pool

}

the end of the Gibbs sampler steps here gives us one draw from the joint posterior p(θmis, θM, M|Y, X, W)

Page 28:

simulating from the posterior via MCMC

given θmis , impute from the posterior predictive distribution:

for (i in 1:N) {

5. draw Yi^mis from f(·|θi^mis) (imputation); captures uncertainty from not observing X^(u)

6. calculate τi = Wi(Yi − Yi^mis) + (1 − Wi)(Yi^mis − Yi)

}

end up with nsim draws of τi (a matrix of size N × nsim) from the posterior distribution of τi

Page 29:

after simulation

theoretically should check

I MCMC convergence
  I lots of parameters (> N)
  I can check convergence on ATE, ATT, etc.:
    non-convergence on aggregations ⇒ overall non-convergence
    convergence on aggregations ⇏ overall convergence

I balance on matching
  I lots of balance to check (> N × nsim)
  I unlike usual matching, we’re not comparing distributions but rather one observation versus multiple observations

I number of times each observation is used as a donor
  I need to make sure results are not too reliant on very few unique observations as donors

need more research into these areas!

Page 30:

summary

I estimating ICEs is a good idea but very hard

I in the absence of identification, want to get at least some idea of the causal effects for each individual (τi)

I use a semi-parametric approach: matching + bayesian model

I relies on the typical causal inference assumptions plus some parametric model assumptions

I can be used to predict ICEs for unobserved or future individuals, but need assumptions about the similarity of future to current individuals or some more parametric assumptions

some questions:

I hidden assumptions about the smoothness and/or variance of the distribution of ICEs in the data?

I is matching in a bayesian framework logical?

Page 31:

simulation study

want to test:

I how well does the model recover ICEs under normal conditions?

I horse race to compare how well different matching procedures perform (holding other parametric model assumptions constant)

the idea:

1. generate fake data with ICEs known

2. consider different ways of generating outcomes and different ways of generating treatment assignments (unconfounded and various confounded)

3. evaluate the performance of the model and different matching procedures on various metrics

Page 32:

choosing the matching procedure M: method

I choice of method: distance metric and how to choose matches given distance
  I match on nearest neighbor mahalanobis distance
  I match on nearest neighbor predictive mean
  I match on nearest neighbor (linear) propensity score
  I subclassification on (linear) propensity score

I also compare to
  I (bayesian) linear regression imputation
  I no matching: use all observations with different treatment as donors

Page 33:

mahalanobis matching

mahalanobis distance for two observations i and j on X :

∆M(xi, xj) = (Xi − Xj)^T S^(−1) (Xi − Xj)

where S is the sample covariance matrix of X

I calculate ∆M(xi, xj) for every i and j pair

I D^(i) = 1 for the M observations of the opposite treatment that have the smallest mahalanobis distance to i

I no variation in the donor pool across iterations unless M or X varies

I most posterior variation likely coming from the imputation step
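A minimal sketch of the distance itself for two covariates, with hypothetical data and the 2×2 covariance matrix inverted by hand:

```python
# hypothetical data: five observations on two covariates
X = [[1.0, 2.0], [2.0, 4.1], [3.0, 5.9], [4.0, 8.2], [5.0, 9.8]]
n = len(X)

means = [sum(row[k] for row in X) / n for k in range(2)]
# sample covariance matrix S (denominator n - 1)
S = [[sum((row[a] - means[a]) * (row[b] - means[b]) for row in X) / (n - 1)
      for b in range(2)] for a in range(2)]
# invert the 2x2 matrix S by hand
det = S[0][0] * S[1][1] - S[0][1] * S[1][0]
S_inv = [[ S[1][1] / det, -S[0][1] / det],
         [-S[1][0] / det,  S[0][0] / det]]

def mahalanobis(xi, xj):
    """(x_i - x_j)^T S^{-1} (x_i - x_j)"""
    d = [xi[k] - xj[k] for k in range(2)]
    return sum(d[a] * S_inv[a][b] * d[b] for a in range(2) for b in range(2))
```

In practice a linear algebra library would handle the inversion; the hand-rolled 2×2 version just keeps the sketch self-contained.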

Page 34:

predictive mean matching

embed two linear regression steps within the algorithm

1. regression of Y on X for treated observations: Yt = Xtβt

2. regression of Y on X for control observations: Yc = Xcβc

I for treated i, calculate µi = Xi β̂c and µj = Xj β̂c for all control observations j

I β̂c is the estimated contribution of X on Y

I Yi^mis is the outcome with only contributions from X

I µi is the initial best guess of Yi^mis

I match to control observations with similar “guesses”

I D^(i) = 1 for the M control observations with µj closest to µi

I do the same for control i, using β̂t instead
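A sketch of this matching step for one treated unit with a single covariate, using the closed-form simple regression on controls (the data and the unit's covariate value are hypothetical):

```python
# hypothetical controls (Xc, Yc) and a treated unit with covariate x_i
Xc = [1.0, 2.0, 3.0, 4.0]
Yc = [2.1, 3.9, 6.2, 7.8]
x_i = 2.6

# closed-form simple linear regression of Y on X using controls only
n = len(Xc)
xbar, ybar = sum(Xc) / n, sum(Yc) / n
beta_c = sum((x - xbar) * (y - ybar) for x, y in zip(Xc, Yc)) / \
         sum((x - xbar) ** 2 for x in Xc)
alpha_c = ybar - beta_c * xbar

mu_i = alpha_c + beta_c * x_i              # initial best guess of Y_i^mis
mu_j = [alpha_c + beta_c * x for x in Xc]  # predicted means for the controls
donor = min(range(n), key=lambda j: abs(mu_j[j] - mu_i))  # M = 1 donor
```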

Page 35:

propensity score matching

propensity score: ei = P(Wi = 1|Xi )

embed logistic regression of W on X within the algorithm

I calculate linear propensity score for all observations

ln( ei / (1 − ei) ) = Xi β̂

I D^(i) = 1 for the M observations of the opposite treatment with the closest linear propensity scores to i

I variation in donor pools due to variation in θM

Page 36:

subclassification on propensity score

I calculate the linear propensity score for all observations with the same process as before

I sort the linear propensity scores and divide into M subclasses

I D^(i) = 1 for observations of the opposite treatment in the same subclass as i

I restrict each subclass to have at least two treated and two control observations

I if, within an iteration, a subclass does not meet the restriction, reduce M by one for that iteration only
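A sketch of forming subclasses and donor pools from hypothetical linear propensity scores (the two-treated/two-control restriction happens to hold by construction here; enforcing it, and reducing M when it fails, is omitted):

```python
# hypothetical (linear) propensity scores and treatments for 8 units
scores = [-1.2, -0.8, -0.5, 0.1, 0.3, 0.9, 1.1, 1.4]
W      = [0, 1, 0, 1, 0, 1, 0, 1]
M = 2  # number of subclasses

# sort by score and cut into M equal-sized subclasses
order = sorted(range(len(scores)), key=lambda i: scores[i])
size = len(scores) // M
subclass = {i: min(rank // size, M - 1) for rank, i in enumerate(order)}

def donors(i):
    """donor pool: opposite-treatment units in the same subclass as i"""
    return [j for j in range(len(scores))
            if W[j] != W[i] and subclass[j] == subclass[i]]
```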

Page 37:

choosing the matching procedure M: M and X

in addition to method, a specification of M also includes

I the set of X variables to match on
  I should match on all confounding variables to satisfy the ignorability assumption
  I possibly match on other prognostic variables (tradeoff between worse matches but more precise imputations)

I choice of the number of matches or subclasses M (can be fixed or random)

Page 38:

performance metrics

I traditional performance metrics (coverage, bias, mse, etc.) do not really exist for bayesian models

I bayesian posteriors characterize the probability of parameters

I results are distributions rather than point estimates and standard errors

I leverage bayesian calibration and decision theory: bayesian counterparts to traditional metrics

I no repeated sampling of data, since theoretically individuals always have the same ICE (bayesian rather than frequentist)

i use the following performance metrics:

1. posterior mean “bias” (“bias”)

2. expected error loss (“root mean squared error”)

3. proportion of ICE credible intervals not including 0 (“power”)

4. calibration of ICEs (“coverage”)

Page 39:

posterior mean “bias” (“bias”)

traditional bias: E[θ̂] − θ

versus

posterior mean “bias”: E[θ|X] − θ

I how far off from the truth is our “best” estimate?

I for ICEs, calculate the average posterior mean “bias”

Page 40:

expected error loss (“root mse”)

traditional root mean squared error: √(E[(θ̂ − θ)²]) = √(variance + bias²)

versus

expected error loss: √(∫ (θ̃ − θ)² p(θ̃|X) dθ̃) ≈ √( Σ_s (θ̃_s − θ)² / nsim )

where the θ̃_s are draws from the posterior

I akin to the average distance from the truth for each of our posterior draws θ̃

I for ICEs, calculate the average expected error loss
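Both of these metrics can be computed from a set of posterior draws; a sketch with hypothetical draws for one parameter, showing that the expected error loss picks up spread that the posterior mean “bias” misses:

```python
import math

# hypothetical posterior draws for one parameter with true value theta
theta = 2.0
draws = [1.5, 2.5, 2.0, 3.0, 1.0]
nsim = len(draws)

# posterior mean "bias": E[theta | X] - theta
bias = sum(draws) / nsim - theta
# expected error loss: sqrt of the average squared error of the draws
loss = math.sqrt(sum((d - theta) ** 2 for d in draws) / nsim)
```

Here the posterior mean sits exactly on the truth (zero “bias”), yet the loss is positive because the draws are spread around it.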

Page 41:

proportion of ICE credible intervals including 0 (“power”)

I ask what proportion of the N 95% credible intervals contain 0

I true τi in my simulations vary, but are never exactly equal to 0

I traditional definition of power: given that the null hypothesis is false, what is the probability of rejecting the null?

I here: given a non-zero τi, what is the probability of 0 being in the 95% credible interval?

I key differences:
  I calculate the probability across i rather than across repeated samples
  I τi is different across i

I nevertheless, gets at some notion of “power”

Page 42:

calibration of ICEs (“coverage”)

bayesian calibration:

I a 95% credible interval represents 0.95 posterior probability of the parameter being within the interval

I model calibration by testing whether future observations are within the 95% credible interval 95% of the time

calibration:

I ask what proportion of the N 95% credible intervals contain the true τi

I model is well calibrated if proportion is close to 0.95
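A sketch computing the calibration proportion, together with the “power”-style proportion of intervals excluding 0, from hypothetical 95% credible intervals:

```python
# hypothetical 95% credible intervals for four ICEs and the true values
intervals = [(0.5, 3.5), (-1.0, 2.0), (1.2, 4.0), (-2.5, 0.5)]
tau_true  = [2.0, 1.5, 0.8, -1.0]

# calibration ("coverage"): proportion of intervals containing the truth
covered = sum(lo <= t <= hi for (lo, hi), t in zip(intervals, tau_true))
calibration = covered / len(tau_true)

# the "power"-style summary: proportion of intervals excluding 0
exclude_zero = sum(not (lo <= 0 <= hi) for lo, hi in intervals) / len(intervals)
```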

Page 43:

fake data generation

I 10 prognostic covariates:
  I x1 ∼ N(0, 2²)
  I x2 ∼ N(0, 1)
  I x3 ∼ N(0, 1)
  I x4 ∼ U(−3, 3)
  I x5 ∼ χ²_1
  I x6 ∼ Bernoulli(.5)
  I x7 ∼ N(0, 1)
  I x8 ∼ N(0, 1)
  I x9 ∼ N(0, 1)
  I x10 ∼ N(0, 1)

I linear, moderately non-linear, and very non-linear outcome equations

I unconfounded treatment assignment and confounded treatment assignment with linear and non-linear equations

I sample sizes of 100, 1000, and 5000

I 27 different datasets

Page 44:

9 different fake data generating processes

I outcome equations:
  1. Y(0) = x1 + x2 + x3 − x4 + x5 + x6 + x7 − x8 + x9 − x10
  2. Y(0) = x1 + x2 + 0.2 x3 x4 − √x5 + x7 + x8 − x9 + x10
  3. Y(0) = (x1 + x2 + x5)² + x7 − x8 + x9 − x10

I treatment assignments:
  1. p(W = 1) = 0.5
  2. η = x1 + 2x2 − 2x3 − x4 − 0.5x5 + x6 + x7; W = 1 if η > 0, otherwise W = 0
  3. η = 0.5x1 + 2x1x2 + x3² − x4 − 0.5√x5 − x5x6 + x7; W = 1 if η > 0, otherwise W = 0

I generate true ICEs: τi ∼ N(2, (√3)²); also consider
  I τi ∼ N(20, (√3)²)
  I τi ∼ N(2, (√100)²)
  I τi ∼ N(20, (√100)²)
  I mixtures

I Yi(1) = Yi(0) + τi
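A minimal sketch of one of these processes, assuming DGP 1 (linear outcome, unconfounded assignment, τi ∼ N(2, (√3)²)):

```python
import random

random.seed(42)
N = 1000

# 10 prognostic covariates as listed on the previous slide
def draw_x():
    return [random.gauss(0, 2),            # x1 ~ N(0, 2^2)
            random.gauss(0, 1),            # x2
            random.gauss(0, 1),            # x3
            random.uniform(-3, 3),         # x4 ~ U(-3, 3)
            random.gauss(0, 1) ** 2,       # x5 ~ chi^2_1
            float(random.random() < 0.5),  # x6 ~ Bernoulli(.5)
            random.gauss(0, 1),            # x7
            random.gauss(0, 1),            # x8
            random.gauss(0, 1),            # x9
            random.gauss(0, 1)]            # x10

data = []
for _ in range(N):
    x = draw_x()
    # outcome equation 1 (linear)
    y0 = x[0] + x[1] + x[2] - x[3] + x[4] + x[5] + x[6] - x[7] + x[8] - x[9]
    tau = random.gauss(2, 3 ** 0.5)  # true ICE: tau_i ~ N(2, (sqrt 3)^2)
    y1 = y0 + tau                    # Y_i(1) = Y_i(0) + tau_i
    w = int(random.random() < 0.5)   # unconfounded treatment assignment
    data.append((w, y1 if w == 1 else y0, tau))

true_ate = sum(t for _, _, t in data) / N  # sample ATE of the known ICEs
```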

Page 45:

simulation comparison details

I 6 methods: regression imputation, no matching (all), 4 matching methods

I size of X: 5, 7, 10

I M: small, medium, large, random

I for matching, also consider M as a percentage of the size of the smaller treatment group

I compare performance metrics for recovering ICEs and the ATE

I 1816 different specifications

I MCMC chain length of nsim = 2000

Page 46:

“bias” of ICEs

[figure: average ICE posterior mean “bias” by method (regression, all, predictive, propensity, subclass) across 9 DGPs (linear / moderately non-linear / very non-linear in Y crossed with unconfounded / linear / non-linear in W), for N = 100, 1000, and 5000; y-axis from −10 to 10]

Page 47:

“bias” of ATE

[Figure: ATE Posterior Mean Bias (−10 to 10) by method (regression, all, predictive, propensity, subclass) across DGPs 1–9; points by number of observations (100, 1000, 5000).]


“root mse” of ICEs

[Figure: Average ICE Expected Error Loss (0 to 20) by method (regression, all, predictive, propensity, subclass) across DGPs 1–9; points by number of observations (100, 1000, 5000).]


“root mse” of ATE

[Figure: ATE Error Loss (0 to 10) by method (regression, all, predictive, propensity, subclass) across DGPs 1–9; points by number of observations (100, 1000, 5000).]


“power” of ICEs

[Figure: Proportion of 95% Credible Intervals Not Including 0 (0 to 1) by method (regression, all, predictive, propensity, subclass) across DGPs 1–9; points by number of observations (100, 1000, 5000).]


“coverage” of ICEs

[Figure: Calibration Coverage of ICEs with 95% Credible Intervals (0 to 1) by method (regression, all, predictive, propensity, subclass) across DGPs 1–9; points by number of observations (100, 1000, 5000).]


“bias” of ICEs: different τi distributions

[Figure: Average ICE Posterior Mean Bias (N=1000) by method (regression, all, mahalanobis, predictive, propensity, subclass) across DGPs 1–9; points by mean of the τ distribution (2, 20, mixture) and SD of the τ distribution (3, 100).]
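The "different τi distributions" simulations draw each unit's true effect from a spread-out or mixture distribution rather than fixing a single value. A minimal sketch of how such effects could be generated, using an assumed parameterization taken from the plot legends (single normals with mean 2 or 20 and SD 3 or 100, or an equal-weight mixture of the two means):

```python
import random

# Sketch of drawing heterogeneous true effects tau_i, following the
# legends of these plots (assumed parameterization, not the talk's
# exact simulation code): a single normal with mean 2 or 20 and SD
# 3 or 100, or an equal-weight mixture over the two means.

def draw_taus(n, mean=2.0, sd=3.0, mixture=False, rng=None):
    rng = rng or random.Random(0)  # seeded for reproducibility
    taus = []
    for _ in range(n):
        m = rng.choice([2.0, 20.0]) if mixture else mean
        taus.append(rng.gauss(m, sd))
    return taus

taus = draw_taus(1000, mixture=True, sd=3.0)
avg = sum(taus) / len(taus)
print(round(avg, 1))
```

With the mixture option, the average effect sits roughly midway between the two component means, while individual effects cluster in two well-separated groups — the kind of heterogeneity an ATE alone would mask.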


“root mse” of ICEs: different τi distributions

[Figure: Average ICE Expected Error Loss (N=1000) by method (including mahalanobis) across DGPs 1–9; points by mean (2, 20, mixture) and SD (3, 100) of the τ distribution.]


“power” of ICEs: different τi distributions

[Figure: Proportion of 95% Credible Intervals Not Including 0 (N=1000) by method (including mahalanobis) across DGPs 1–9; points by mean (2, 20, mixture) and SD (3, 100) of the τ distribution.]


“coverage” of ICEs: different τi distributions

[Figure: Calibration Coverage of ICEs with 95% Credible Intervals (N=1000) by method (including mahalanobis) across DGPs 1–9; points by mean (2, 20, mixture) and SD (3, 100) of the τ distribution.]


comparing methods summary

I of the matching methods, predictive mean matching usually performs as well or better than the others

I matching methods for ICEs have fairly low “power”

I regression has high power but very poor calibration “coverage”

conclusion: regression performs well if only interested in "averages"; for better performance at the individual level, use predictive mean matching
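The idea behind predictive mean matching can be sketched as follows. This is a minimal illustration, not the talk's exact procedure: I assume a simple linear outcome model supplies the "predictive mean" for every unit, units are matched on that single fitted value, and each unit's missing potential outcome is imputed by averaging the observed outcomes of its k closest matches in the opposite treatment group.

```python
# Minimal sketch of predictive mean matching for individual causal
# effects (ICEs).  Assumptions (not from the talk itself): the
# predictive mean is a fitted value from a one-covariate linear
# model of Y on X, and the missing potential outcome is imputed
# from the k nearest opposite-treatment units on that fitted value.

def fit_linear(xs, ys):
    """Least-squares intercept/slope for a single covariate."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx
    return my - b * mx, b

def predictive_mean_matching_ices(x, y, w, k=1):
    """Estimate tau_i = Y_i(1) - Y_i(0) for every unit i."""
    a, b = fit_linear(x, y)              # outcome model for matching
    pred = [a + b * xi for xi in x]      # predictive means
    ices = []
    for i in range(len(x)):
        # candidate matches: units with the opposite treatment
        opp = [j for j in range(len(x)) if w[j] != w[i]]
        opp.sort(key=lambda j: abs(pred[j] - pred[i]))
        imputed = sum(y[j] for j in opp[:k]) / k
        if w[i] == 1:
            ices.append(y[i] - imputed)  # Y_i(1) observed
        else:
            ices.append(imputed - y[i])  # Y_i(0) observed
    return ices

# toy data: treatment adds exactly 2 to everyone
x = [0.0, 1.0, 2.0, 3.0, 0.0, 1.0, 2.0, 3.0]
y = [1.0, 2.0, 3.0, 4.0, 3.0, 4.0, 5.0, 6.0]
w = [0, 0, 0, 0, 1, 1, 1, 1]
ices = predictive_mean_matching_ices(x, y, w, k=1)
print(ices)  # [2.0, 2.0, 2.0, 2.0, 2.0, 2.0, 2.0, 2.0]
```

On this constant-effect toy data every estimated ICE recovers the true effect of 2; with heterogeneous effects the same machinery produces a distribution of estimates rather than a single average.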


comparing X : “bias” of ICEs

[Figure: Average ICE Posterior Mean Bias (−10 to 10) by number of matching variables (0, 5, 7, 10) across DGPs 1–9; points by number of observations (100, 1000, 5000).]


comparing X : “root mse” of ICEs

[Figure: Average ICE Expected Error Loss (0 to 20) by number of matching variables (0, 5, 7, 10) across DGPs 1–9; points by number of observations (100, 1000, 5000).]


comparing X : “power” of ICEs

[Figure: Proportion of 95% Credible Intervals Not Including 0 (0 to 1) by number of matching variables (0, 5, 7, 10) across DGPs 1–9; points by number of observations (100, 1000, 5000).]


comparing X : “coverage” of ICEs

[Figure: Calibration Coverage of ICEs with 95% Credible Intervals (0.80 to 1.00) by number of matching variables (0, 5, 7, 10) across DGPs 1–9; points by number of observations (100, 1000, 5000).]


comparing X summary

I include all confounders

I no huge gain to including more
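The four quantities plotted throughout these simulation slides — posterior mean bias, expected error loss, "power", and calibration "coverage" — can all be computed from posterior draws of each τi. A rough sketch under assumed conventions (bias as posterior mean minus truth, error loss as root mean squared draw-level error, power as the share of 95% credible intervals excluding 0, coverage as the share containing the true τi; the talk's exact definitions may differ):

```python
# Sketch of the four ICE evaluation metrics from the simulations.
# Conventions assumed here: each row of `draws` holds posterior
# draws of one tau_i; intervals are equal-tailed 95% credible
# intervals with linear-interpolation quantiles.

def quantile(sorted_xs, q):
    """Linear-interpolation quantile on an already-sorted list."""
    pos = q * (len(sorted_xs) - 1)
    lo = int(pos)
    hi = min(lo + 1, len(sorted_xs) - 1)
    return sorted_xs[lo] + (pos - lo) * (sorted_xs[hi] - sorted_xs[lo])

def ice_metrics(draws, true_taus):
    n = len(true_taus)
    biases, losses, excl0, cover = [], [], [], []
    for ds, tau in zip(draws, true_taus):
        m = sum(ds) / len(ds)
        biases.append(m - tau)  # posterior mean bias
        losses.append((sum((d - tau) ** 2 for d in ds) / len(ds)) ** 0.5)
        s = sorted(ds)
        lo, hi = quantile(s, 0.025), quantile(s, 0.975)
        excl0.append(not (lo <= 0.0 <= hi))  # "power"
        cover.append(lo <= tau <= hi)        # "coverage"
    return {
        "avg_bias": sum(biases) / n,
        "avg_error_loss": sum(losses) / n,
        "power": sum(excl0) / n,
        "coverage": sum(cover) / n,
    }

# toy example: two units with true effects 2 and 0
draws = [[1.9, 2.0, 2.1, 2.0, 2.0], [-0.1, 0.0, 0.1, 0.0, 0.0]]
m = ice_metrics(draws, [2.0, 0.0])
print(m["power"], m["coverage"])  # 0.5 1.0
```

In the toy example only the unit with a true effect of 2 has a credible interval excluding 0, so "power" is 0.5, while both intervals contain their true τi, so "coverage" is 1.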


comparing M : “bias” of ICEs

[Figure: Average ICE Posterior Mean Bias (−10 to 10) by number of matches/subclasses (small, medium, large, random) across DGPs 1–9; points by number of observations (100, 1000, 5000).]


comparing M : “root mse” of ICEs

[Figure: Average ICE Expected Error Loss (0 to 20) by number of matches/subclasses (small, medium, large, random) across DGPs 1–9; points by number of observations (100, 1000, 5000).]

Page 64: estimating individual causal effects - Patrick Lampatricklam.org/talk/4_10_13.pdf · 1.think of estimating each ˝ i as a separate \study" where we have data consisting of observation

comparing M: “power” of ICEs

[Figure: Proportion of 95% Credible Intervals Not Including 0, by Number of Matches/Subclasses and Number of Observations, across the nine DGPs]

comparing M: “coverage” of ICEs

[Figure: Calibration Coverage of ICEs with 95% Credible Intervals, by Number of Matches/Subclasses and Number of Observations, across the nine DGPs]

comparing M percentages: “bias” of ICEs

[Figure: Average ICE Posterior Mean Bias (N=1000) by M Percentage (1 to 100), across the nine DGPs]

comparing M percentages: “root mse” of ICEs

[Figure: Average ICE Expected Error Loss (N=1000) by M Percentage (1 to 100), across the nine DGPs]

comparing M percentages: “power” of ICEs

[Figure: Proportion of 95% Credible Intervals Not Including 0 (N=1000) by M Percentage (1 to 100), across the nine DGPs]

comparing M percentages: “coverage” of ICEs

[Figure: Calibration Coverage of ICEs with 95% Credible Intervals (N=1000) by M Percentage (1 to 100), across the nine DGPs]

comparing M summary

I larger donor pools better up to a certain point

I no clear optimal size (depends on data and application)

I random M introduces more uncertainty for little “bias” gain

I possibly use a smaller range of random M


simulation results lessons

I decent calibration coverage, which improves with larger samples

I generally poor power

I performs well in recovering “unbiased” estimates of ICEs

I predictive mean matching generally performs as well as or better than the other methods

I larger M is better up to a certain point (around 10% of the smaller treatment group size), although there is no ideal M

I fairly robust to functional form misspecifications in the outcome or treatment assignment

choice: predictive mean matching with approximate M size of 10 percent of the smaller treatment group


application: monitoring corruption

Olken (2007) field experiment in Indonesia

question: can top-down or grassroots bottom-up monitoring reduce corruption?

the setting:

I over 600 Indonesian villages received funds for road projects

I villages were randomly assigned monitoring mechanisms

I all villages hold three public project-accountability meetings

I corruption was measured by taking the difference between reported spending and an independent assessment of costs

Olken’s main findings:

I top-down monitoring effective in reducing corruption

I grassroots participation in monitoring had little effect


three randomly assigned treatments

project audit from government agency (top down)

I baseline of 4% chance of audit; treated villages increased audit chance to 100%

I results of audit reported in village accountability meetings

I audit treatment cluster randomized at the subdistrict level

invitations to attend accountability meetings (bottom-up)

I invitations distributed through schools or neighborhood heads

I some villages randomly received additional treatment of anonymous comment forms in addition to the invitations

I comment forms summarized at accountability meetings

I classify both types into the “participation” treatment


measuring corruption

corruption can occur through overreporting of costs

Y = log(reported cost) − log(actual cost) ≈ percent missing

Y1: major items (sand, rock, gravel, labor) in road project

Y2: major items in roads and ancillary projects

Y3: materials in road project

Y4: unskilled labor in road project

actual costs estimated by

I estimating quantity of materials used by digging up road

I estimating hours worked and prices through worker and supplier surveys
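The corruption measure defined above (difference in logs, approximately the percent missing) can be sketched in a few lines; the cost figures here are hypothetical, not from Olken's data:

```python
import math

# hypothetical cost figures for one village (not from Olken's data)
reported_cost = 120.0
actual_cost = 100.0

# corruption measure: Y = log(reported cost) - log(actual cost),
# which approximates the percent of spending "missing"
y = math.log(reported_cost) - math.log(actual_cost)
print(round(y, 3))  # -> 0.182, i.e. roughly 18% missing
```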


treatment assignment issues

I audit treatment cluster randomized at subdistrict level while participation treatments randomized at village level

I missing data for various reasons (listwise deleted)

I overlapping treatments: 606 total villages, of which 264 received audit treatment, 185 received invites treatment, 189 received invites + comments treatment, and 106 received no treatment

less than ideal randomization...


...but interesting scenarios for causal inference and ICEs

1. binary treatment on continuous outcome

audit (W) → corruption (Y)
participation (W) → “outsider” meeting attendance (A)

2. continuous treatment on continuous outcome

attendance (A) → corruption (Y )

3. two stage design of “instrument” on outcome

participation (W ) → attendance (A) → corruption (Y )

we can look at all three in an ICE framework!


other variables

observations at the village level with covariates:

I distance to subdistrict

I education of village head

I age of village head

I salary of village head

I percent of households poor

I village population

I mosques per 1,000 population

I mountainous village dummy

I total budget

I number of subprojects

also measure average “outsider” meeting attendance and average “outsider” meeting attendance percent


ATE: W on Y

[Figure: Posterior Mean of ATE with 95% Credible Intervals, from ICEs and from regression, for each treatment (other treatment received in parentheses), across outcomes Y1–Y4]

audit treatment works; participation treatments don’t really


W on Y: different types of average treatment effects

[Figure: ATE Posterior Mean and 95% Credible Interval by data subset (all (ATE), treated (ATT), control (ATC), populous, poor, mountainous, and populous, poor, and mountainous), across outcomes Y1–Y4]

W on Y: quantiles of treatment effects

[Figure: 25th, 50th, and 75th Quantile Treatment Effect Posterior Means with 95% Credible Intervals, by treatment (audit, invites, comments), across outcomes Y1–Y4]

W on Y: audits have bigger effect on price corruption...

[Figure: Probability of at least a 20 percentage point decrease in percent missing (τ ≤ −0.2) against the difference in log of reported and actual prices, across outcomes Y1–Y4]

W on Y: ...than on quantities used corruption

[Figure: Probability of at least a 20 percentage point decrease in percent missing (τ ≤ −0.2) against the difference in log of reported and actual quantity used, across outcomes Y1–Y4]

W on A: comments substitute for attendance

[Figure: ATT Posterior Mean with 95% Credible Interval for invites, comments, and participation treatments, on outsider attendance (A1) and outsider attendance percentage (A2), by distribution method (all, schools, neighborhood heads)]

distribution through schools is slightly better


unique application: testing SUTVA

SUTVA usually violated through

1. interference across individuals OR

2. multiple versions of treatment (dosage issue)

Here: multiple versions of treatment (τ^a and τ^b)

I two participation treatments (invites and invites + comments)

I two distribution methods (schools and neighborhood heads)

SUTVA violated if

Yi(1a) ≠ Yi(1b)

τi^a ≠ τi^b

for every i, assuming Yi(0a) = Yi(0b)


testing SUTVA

SUTVA violation if any τi^a ≠ τi^b, so

P(SUTVA violated) ≈ P(τi^a ≠ τi^b)

define various violation criteria to estimate P(SUTVA violated):

I one-sided violation:

P(SUTVA violated) = max_i P(τi^a > τi^b)

I posterior range:

P(SUTVA violated) = max_i P(|τi^a − τi^b| > ε)

I others?
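Both criteria reduce to posterior probabilities computed per unit and then maximized. A minimal sketch, where the posterior draws of τi^a and τi^b are simulated stand-ins and ε = 1 is an arbitrary threshold:

```python
import random

random.seed(0)
n_units, n_draws, eps = 5, 2000, 1.0  # eps is an arbitrary threshold

# simulated stand-ins for posterior draws of tau_i^a and tau_i^b;
# in practice these come from the imputation model
tau_a = [[random.gauss(0.5, 1.0) for _ in range(n_draws)] for _ in range(n_units)]
tau_b = [[random.gauss(0.0, 1.0) for _ in range(n_draws)] for _ in range(n_units)]

# one-sided violation: max_i P(tau_i^a > tau_i^b)
p_one_sided = max(sum(a > b for a, b in zip(tau_a[i], tau_b[i])) / n_draws
                  for i in range(n_units))

# posterior range: max_i P(|tau_i^a - tau_i^b| > eps)
p_range = max(sum(abs(a - b) > eps for a, b in zip(tau_a[i], tau_b[i])) / n_draws
              for i in range(n_units))

print(p_one_sided, p_range)
```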


W on A: testing SUTVA

[Figure: Probability of SUTVA Violation by control observation number, under four violation criteria: Invites > Comments; School > Heads; |Invites − Comments| > 10; |School − Heads| > 10]

A on Y: continuous treatments

for continuous treatment variable A, assume linear effect:

1. calculate predictive means with one regression of Y on X

2. match with possible donor pool of all observations j where Ai ≠ Aj

3. run linear regression of Y on A with donor pool and i to get βD(i)

4. draw Yi^mis from f(·|θi^mis) where θi^mis = βD(i)(Ai − 1) (so assume i is always “treated” and calculate its outcome under “control” (Ai − 1))

5. calculate τi as Yi − Yi^mis

τi is a linear ICE
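A minimal numerical sketch of these steps, under several assumptions not in the slides: a made-up linear DGP with one covariate, donors chosen as the M nearest predictive-mean matches (M = 20 is arbitrary), and a posterior-mean imputation in place of a draw from f(·|θi^mis):

```python
import random

def slope(xs, ys):
    # OLS slope of ys on xs (with intercept), closed form
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    return (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
            / sum((x - mx) ** 2 for x in xs))

random.seed(1)
n, M = 200, 20  # M (donor-pool size) is an arbitrary choice here
X = [random.gauss(0, 1) for _ in range(n)]
A = [0.5 * x + random.gauss(0, 1) for x in X]                 # continuous treatment
Y = [2.0 * a + x + random.gauss(0, 1) for a, x in zip(A, X)]  # true slope = 2

# 1. predictive means from one regression of Y on X
b_x = slope(X, Y)
mu = [b_x * x for x in X]

ices = []
for i in range(n):
    # 2. donor pool: M nearest predictive-mean matches among j != i
    donors = sorted((j for j in range(n) if j != i),
                    key=lambda j: abs(mu[j] - mu[i]))[:M]
    # 3. linear regression of Y on A within donor pool plus i
    idx = donors + [i]
    b_a = slope([A[j] for j in idx], [Y[j] for j in idx])
    # 4.-5. impute the outcome under A_i - 1 at its posterior mean
    #       (noise draw omitted), so the linear ICE equals the local slope
    y_mis = Y[i] - b_a
    ices.append(Y[i] - y_mis)

print(round(sum(ices) / n, 2))  # should be close to the true slope of 2
```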


A on Y: no effect of attendance on corruption

[Figure: Linear ATE Posterior Mean with 95% Credible Interval for outsider attendance and outsider attendance percentage, by village type (all, audit treatment, no audit treatment), across outcomes Y1–Y4]

2-stage W on Y

W is an “instrument” for (continuous) A

I monotonicity assumption: Ai(1) ≥ Ai(0)

I exclusion restriction: if Ai(1) = Ai(0), then Yi(1, Ai(1)) = Yi(0, Ai(0))

two sets of ICEs:

1. first stage ICE: δi = Ai(1) − Ai(0)

2. second stage ICE:
I if δi > 0 (compliers), then τi^comp = Yi(1, Ai(1)) − Yi(0, Ai(0))
I if δi = 0 (non-compliers), then τi^ncomp = Yi(1) − Yi(0)

typical estimand: local (complier) average treatment effect

E[τi^comp | δi > 0] = Σ_{i: δi > 0} [Yi(1, Ai(1)) − Yi(0, Ai(0))] / Σ_{i=1}^N I(δi > 0)


2-stage W on Y: estimation

need to impute Y^mis and A^mis:

1. draw Ai^mis without matching: donor pool = all observations from opposite treatment (can match if desired)

2. calculate δi = Wi(Ai − Ai^mis) + (1 − Wi)(Ai^mis − Ai)

3. determine compliance status: Gi = 1 if δi > 0

4. draw Yi^mis (with or without covariate matching) as follows:
I always match on Gi for D(i)
I without monotonicity: same process as ICEs for continuous treatments with θi^mis = βD(i) δi
I with monotonicity: θi^mis = βD(i) δi for compliers and draw θi^mis from p(θi^mis | Y{D(i)=1}, D(i)) as normal for non-compliers

5. calculate τi = Wi(Yi − Yi^mis) + (1 − Wi)(Yi^mis − Yi)

τi = τi^comp if Gi = 1 and τi = τi^ncomp if Gi = 0
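A toy illustration of steps 2, 3, and 5 with six hypothetical units and made-up imputed values (the draws of Ai^mis and Yi^mis are assumed given), followed by the complier and non-complier averages:

```python
# hypothetical units (W: treatment, A/A_mis: observed and imputed attendance,
# Y/Y_mis: observed and imputed outcomes); all values made up for illustration
W     = [1,   1,   1,   0,   0,   0]
A     = [9.0, 4.0, 5.0, 3.0, 6.0, 2.0]
A_mis = [4.0, 4.0, 2.0, 8.0, 6.0, 7.0]
Y     = [1.0, 2.0, 0.5, 3.0, 2.5, 4.0]
Y_mis = [2.0, 2.1, 1.5, 1.0, 2.4, 2.0]

# first-stage ICE delta_i = A_i(1) - A_i(0), oriented by treatment status
delta = [w * (a - am) + (1 - w) * (am - a) for w, a, am in zip(W, A, A_mis)]
G = [int(d > 0) for d in delta]  # compliance status

# second-stage ICE tau_i, oriented the same way
tau = [w * (y - ym) + (1 - w) * (ym - y) for w, y, ym in zip(W, Y, Y_mis)]

late  = sum(t for t, g in zip(tau, G) if g) / sum(G)             # complier ATE
ncate = sum(t for t, g in zip(tau, G) if not g) / (len(G) - sum(G))
print([round(d, 1) for d in delta], G, round(late, 2), round(ncate, 2))
# -> [5.0, 0.0, 3.0, 5.0, 0.0, 5.0] [1, 0, 1, 1, 0, 1] -1.5 -0.1
```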


2-stage W on Y: similar results; exclusion restriction possibly okay

[Figure: LATE and NCATE posterior means by assumption (monotonicity or no monotonicity) and matching (no matching or 2nd stage matching), across outcomes Y1–Y4]

conclusion

I argument for estimating ICEs

I combining matching with a bayesian model

I enormous flexibility in discovering treatment effect heterogeneity and recovering any causal quantity

I adaptable to different data structures

I extensions:

1. relaxing monotonicity assumption in IV estimation
2. testing causal inference assumptions

