
Hrs Rdd Slides f

Date post: 02-Jun-2018
  • 8/10/2019 Hrs Rdd Slides f

    1/40

    Regression Discontinuity Design

    Marcelo Coca Perraillon

    University of Chicago

    April 24 & 29, 2013


    2/40

    Introduction

    Basics

    Method developed to estimate treatment effects in non-experimental settings

    Provides causal estimates of treatment effects

    Good internal validity; some assumptions can be empirically verified

    Treatment effects are local (LATE)

    Limits external validity

    Easy to estimate

    First application: Thistlethwaite and Campbell (1960)


    3/40

    Introduction Thistlethwaite and Campbell

    Thistlethwaite and Campbell

    They studied the impact of merit awards on future academic outcomes

    Awards allocated based on test scores

    If a person had a score greater than c, the cutoff point, then she received the award

    Simple way of analyzing: compare those who received the award to those who didn't. Why is this wrong?

    Confounding: factors that influence the test score are also related to future academic outcomes (income, parents' education, motivation)

    Thistlethwaite and Campbell noted that they could compare individuals just above and just below the cutoff point


    4/40

    Introduction Validity

    Validity

    Simple idea: the assignment mechanism is completely known

    We know that the probability of treatment jumps from 0 to 1 when the test score exceeds c

    The assumption is that individuals cannot precisely manipulate their assignment variable (think about the SAT)

    Key word: precision. Consequence: individuals near the cutoff point are comparable

    If treated and untreated individuals are similar near the cutoff point, then the data can be analyzed as if they came from a (conditionally) randomized experiment

    If this is true, then background characteristics should be similar near c (this can be checked empirically)

    The estimated treatment effect applies only to those near the cutoff point (a limit on external validity)


    5/40


    6/40

    Introduction Graphical Example


    7/40

    Introduction Graphical Example

    No effect


    8/40

    Estimation


    9/40

    Estimation

    Estimation: Parametric

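    The simplest case is a linear relationship between Y and X:

    $Y_i = \beta_0 + \beta_1 T_i + \beta_3 X_i + \epsilon_i$

    $T_i = 1$ if subject i received treatment and $T_i = 0$ otherwise. You can also write this as $T_i = \mathbb{1}(X_i > c)$ or $T_i = [X_i > c]$

    X is the assignment variable (sometimes called the forcing or running variable)

    It is usually centered at the cutoff point:

    $Y_i = \beta_0 + \beta_1 T_i + \beta_3 (X_i - c) + \epsilon_i$. The treatment effect is given by $\beta_1$.

    $E[Y \mid T = 1, X = c] = \beta_0 + \beta_1$ and $E[Y \mid T = 0, X = c] = \beta_0$.

    $E[Y \mid T = 1, X = c] - E[Y \mid T = 0, X = c] = \beta_1$.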
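    A minimal sketch of this estimator, assuming simulated data with a hypothetical cutoff c = 140 and made-up coefficients (not the lecture's data): regress Y on a constant, T and X - c by ordinary least squares. The normal equations are solved by a small hand-written routine (a hypothetical helper, `ols`) so that only the Python standard library is needed. The coefficient on T should land close to the true jump of 5.

```python
import random


def ols(rows, y):
    """Least squares via the normal equations (X'X) b = X'y, solved by Gauss-Jordan elimination."""
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(k)]
    a = [xtx[i] + [xty[i]] for i in range(k)]  # augmented matrix [X'X | X'y]
    for col in range(k):
        piv = max(range(col, k), key=lambda r: abs(a[r][col]))  # partial pivoting
        a[col], a[piv] = a[piv], a[col]
        for r in range(k):
            if r != col:
                f = a[r][col] / a[col][col]
                a[r] = [ar - f * ac for ar, ac in zip(a[r], a[col])]
    return [a[i][k] / a[i][i] for i in range(k)]


rng = random.Random(42)
c = 140.0                        # hypothetical cutoff
b0, b1, b3 = 10.0, 5.0, 0.3      # hypothetical true coefficients
xs = [rng.uniform(100.0, 180.0) for _ in range(2000)]
ts = [1.0 if x > c else 0.0 for x in xs]  # T_i = 1(X_i > c)
ys = [b0 + b1 * t + b3 * (x - c) + rng.gauss(0.0, 1.0) for x, t in zip(xs, ts)]

design = [[1.0, t, x - c] for x, t in zip(xs, ts)]  # columns: constant, T, X - c
est = ols(design, ys)
print("estimates of b0, b1, b3:", [round(v, 3) for v in est])
```

    Because X is centered, the intercept is the expected untreated outcome at the cutoff and `est[1]` is the jump there.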


    10/40

    Estimation Centering

    Reminder on centering

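    Centering changes the interpretation of the intercept:

    $Y = \beta_0 + \beta_1 (Age - 65) + \beta_2 Edu$

    $\quad = \beta_0 + \beta_1 Age - 65\beta_1 + \beta_2 Edu$

    $\quad = (\beta_0 - 65\beta_1) + \beta_1 Age + \beta_2 Edu$

    Compare to: $Y = \tilde{\beta}_0 + \tilde{\beta}_1 Age + \tilde{\beta}_2 Edu$

    $\tilde{\beta}_1 = \beta_1$, $\tilde{\beta}_2 = \beta_2$, but $\tilde{\beta}_0 = \beta_0 - 65\beta_1$

    Useful with interactions:

    $Y = \tilde{\beta}_0 + \tilde{\beta}_1 Age + \tilde{\beta}_2 Edu + \tilde{\beta}_3 Age \cdot Edu$

    Compare to:

    $Y = \beta_0 + \beta_1 (Age - 65) + \beta_2 (Edu - 12) + \beta_3 (Age - 65)(Edu - 12)$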
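    Expanding the centered interaction model shows what its coefficients mean. Write the uncentered model as $Y = \tilde{\beta}_0 + \tilde{\beta}_1 Age + \tilde{\beta}_2 Edu + \tilde{\beta}_3 Age \cdot Edu$ (the tilde notation is introduced here only for the comparison) and multiply out the centered one:

```latex
\begin{aligned}
Y &= \beta_0 + \beta_1 (Age - 65) + \beta_2 (Edu - 12) + \beta_3 (Age - 65)(Edu - 12) \\
  &= (\beta_0 - 65\beta_1 - 12\beta_2 + 780\beta_3) + (\beta_1 - 12\beta_3)\,Age
     + (\beta_2 - 65\beta_3)\,Edu + \beta_3\,Age \cdot Edu
\end{aligned}
```

    Matching terms gives $\tilde{\beta}_3 = \beta_3$, $\tilde{\beta}_1 = \beta_1 - 12\beta_3$, $\tilde{\beta}_2 = \beta_2 - 65\beta_3$ and $\tilde{\beta}_0 = \beta_0 - 65\beta_1 - 12\beta_2 + 780\beta_3$. In the centered model $\beta_1$ is the effect of Age at Edu = 12 and $\beta_2$ is the effect of Edu at Age = 65, whereas $\tilde{\beta}_1$ is the effect of Age at Edu = 0, which is rarely meaningful.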


    11/40

    Estimation Extrapolation

    Extrapolation

    Note that estimating the treatment effect in RDD depends on extrapolation

    To the left of the cutoff point there are only non-treated observations

    To the right of the cutoff point there are only treated observations

    What is the treatment effect at X = 130 (with the cutoff at c = 140)? Just plug in:

    $E[Y \mid T, X = 130] = \beta_0 + \beta_1 T + \beta_3 (130 - 140)$
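    The plug-in calculation can be sketched with hypothetical coefficients ($\beta_0 = 10$, $\beta_1 = 5$, $\beta_3 = 0.3$) and cutoff c = 140. Because this model has no T x X interaction, the predicted gap between treated and untreated is $\beta_1$ at every X, including X = 130, where the treated prediction is a pure extrapolation.

```python
def expected_y(t, x, b0=10.0, b1=5.0, b3=0.3, c=140.0):
    """E[Y | T=t, X=x] = b0 + b1*t + b3*(x - c); all coefficient values are hypothetical."""
    return b0 + b1 * t + b3 * (x - c)


treated = expected_y(1, 130)    # extrapolated: no treated units have X = 130
untreated = expected_y(0, 130)  # inside the observed (untreated) region X < c
effect_130 = treated - untreated
print(treated, untreated, effect_130)
```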


    12/40


    Extrapolation...

    Dashed lines are extrapolations


    13/40

    Counterfactuals

    The extrapolation is a counterfactual, or potential outcome. Each person i has two potential outcomes (Rubin's causal framework):

    $Y_i(1)$ denotes the outcome of person i if in the treated group

    $Y_i(0)$ denotes the outcome of person i if in the non-treated group

    The causal effect of treatment for person i is $Y_i(1) - Y_i(0)$

    The average treatment effect is $E[Y_i(1) - Y_i(0)]$

    Only one potential outcome is observed. In randomized experiments, one group provides the counterfactual for the other because the groups are comparable (exchangeable)

    Exchangeability is the epidemiology term; it is also called selection on observables or no unmeasured confounders


    14/40

    Counterfactuals, II

    In RDD the counterfactuals are conditional on X, as in a conditionally randomized trial (think severity)

    We are interested in the treatment effect at X = c: $E[Y_i(1) - Y_i(0) \mid X_i = c]$

    The treatment effect is $\lim_{x \downarrow c} E[Y_i \mid X_i = x] - \lim_{x \uparrow c} E[Y_i \mid X_i = x]$

    Estimation is possible because $E[Y_i(1) \mid X]$ and $E[Y_i(0) \mid X]$ are continuous in X at c

    Because treated and untreated observations do not overlap in X, the estimate of the treatment effect relies on extrapolation, so the functional relationship between X and Y must be correctly specified
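    To illustrate the limit definition, here is a sketch on noise-free simulated data with a hypothetical jump of 5 at c = 0 and a slope of 2. A raw difference in means between units just above and just below c is biased by the slope, but the bias shrinks as the window narrows, approaching the difference of the two one-sided limits.

```python
c = 0.0
jump = 5.0
xs = [i / 1000 for i in range(-1000, 1001) if i != 0]  # grid on [-1, 1], excluding c


def y(x):
    """Hypothetical E[Y | X = x]: linear in x, with a jump at c."""
    return 1.0 + 2.0 * x + (jump if x > c else 0.0)


def window_diff(h):
    """Mean of Y in (c, c + h] minus mean of Y in [c - h, c)."""
    right = [y(x) for x in xs if c < x <= c + h]
    left = [y(x) for x in xs if c - h <= x < c]
    return sum(right) / len(right) - sum(left) / len(left)


for h in (1.0, 0.5, 0.1, 0.01):
    print(h, round(window_diff(h), 4))
```

    The bias here equals the slope times the gap between the average X on each side, so it vanishes only in the limit; with real, noisy data, shrinking the window also shrinks the sample, which is why regression is used instead.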


    15/40

    We need to model the relationship between X and Y correctly

    What if the relationship is nonlinear? Assuming a linear model could then produce a biased treatment effect.
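    A small sketch of the problem, assuming a hypothetical smooth cubic $E[Y \mid X] = X^3$ with no discontinuity at c = 0 and no noise: straight lines fitted separately on each side disagree at the cutoff, so a linear specification reports a "treatment effect" where there is none.

```python
def linear_fit(xs, ys):
    """Closed-form simple OLS; returns (intercept, slope)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sxy / sxx
    return my - slope * mx, slope


xs = [i / 500 for i in range(-500, 501) if i != 0]
ys = [x ** 3 for x in xs]  # hypothetical smooth truth: no jump at c = 0

left = [(x, y) for x, y in zip(xs, ys) if x < 0]
right = [(x, y) for x, y in zip(xs, ys) if x > 0]
a_left, _ = linear_fit(*zip(*left))
a_right, _ = linear_fit(*zip(*right))
spurious_jump = a_right - a_left  # intercepts are the fitted values at X = c = 0
print(round(spurious_jump, 3))    # clearly nonzero although the true jump is 0
```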


    16/40

    Other specifications

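    More general: $Y_i = \beta_0 + \beta_1 T_i + \beta_3 f(X_i - c) + \epsilon_i$

    If $\tilde{X}_i = X_i - c$, then $Y_i = \beta_0 + \beta_1 T_i + \beta_3 f(\tilde{X}_i) + \epsilon_i$

    The most common form for $f(\tilde{X}_i)$ is a polynomial

    Polynomial of order p: $Y_i = \beta_0 + \beta_1 T_i + \beta_2 \tilde{X}_i + \beta_3 \tilde{X}_i^2 + \beta_4 \tilde{X}_i^3 + \dots + \beta_{p+1} \tilde{X}_i^p + \epsilon_i$

    More flexibility with interactions

    2nd degree with interactions:

    $Y_i = \beta_0 + \beta_1 T_i + \beta_3 \tilde{X}_i + \beta_4 \tilde{X}_i^2 + \beta_5 \tilde{X}_i T_i + \beta_6 \tilde{X}_i^2 T_i + \epsilon_i$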

    Question: Why not control for other covariates?
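    The second-degree specification with interactions can be sketched as an OLS fit on simulated data. The data-generating process, the cutoff c = 0.5 and the true jump of 4 are hypothetical, and the normal equations are solved by a small hand-written helper (`ols`, hypothetical) rather than a statistics package. Columns are ordered [1, T, X~, X~^2, X~T, X~^2 T], matching $\beta_0, \beta_1, \beta_3, \beta_4, \beta_5, \beta_6$ in the model above; because every centered-X term is zero at the cutoff, the coefficient on T is the estimated jump.

```python
import random


def ols(rows, y):
    """Least squares via the normal equations (X'X) b = X'y, solved by Gauss-Jordan elimination."""
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(k)]
    a = [xtx[i] + [xty[i]] for i in range(k)]  # augmented matrix [X'X | X'y]
    for col in range(k):
        piv = max(range(col, k), key=lambda r: abs(a[r][col]))  # partial pivoting
        a[col], a[piv] = a[piv], a[col]
        for r in range(k):
            if r != col:
                f = a[r][col] / a[col][col]
                a[r] = [ar - f * ac for ar, ac in zip(a[r], a[col])]
    return [a[i][k] / a[i][i] for i in range(k)]


rng = random.Random(1)
c = 0.5
design, ys = [], []
for _ in range(2000):
    x = rng.uniform(0.0, 1.0)
    t = 1.0 if x > c else 0.0
    xc = x - c  # centered forcing variable
    # Hypothetical truth: separate quadratics on each side, jump of 4 at c
    mean = 20 + 4 * t + 10 * xc + 30 * xc**2 + 5 * xc * t - 20 * xc**2 * t
    ys.append(mean + rng.gauss(0.0, 0.5))
    design.append([1.0, t, xc, xc**2, xc * t, xc**2 * t])

beta = ols(design, ys)
print("estimated jump at the cutoff:", round(beta[1], 3))
```

    Fully interacting the polynomial with T is equivalent to fitting a separate quadratic on each side of the cutoff.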


    17/40

    Third-degree polynomial fit. The actual model is a second-degree polynomial (see Stata do file).


    18/40

    Real dataset

    Data from Lee, Moretti, and Butler (2004)

    The forcing variable is the Democratic vote share. If the share is greater than 50%, the Democratic candidate is elected

    The outcome is a liberal voting score from the Americans for Democratic Action (ADA)

    Do candidates who are elected in close elections tend to moderate their congressional voting?

    Data: http://harrisschool.uchicago.edu/Blogs/EITM/wp-content/uploads/2011/06/EITM-RD-Examples.zip

    Some of the code there is old; its syntax no longer works


    19/40

    The graph is a bit messy

    scatter score demvoteshare, msize(tiny) xline(0.5) ///
        xtitle("Democrat vote share") ytitle("ADA score")


    20/40

    Always a good idea to add some jittering

    With the jitter option, it is easier to see where the mass of the data is

    scatter score demvoteshare, msize(tiny) xline(0.5) ///
        xtitle("Democrat vote share") ytitle("ADA score") jitter(5)


    21/40

    Could add a linear trend on each side of the cutoff

    scatter score demvoteshare, msize(tiny) xline(0.5) xtitle("Democrat vote share") ///
        ytitle("ADA score") || lfit score demvoteshare if democrat ==1, color(red) || ///
        lfit score demvoteshare if democrat ==0, color(red) legend(off)


    22/40

    Better: something more flexible

    What about a polynomial of degree 5?

    gen demvoteshare2 = demvoteshare^2
    gen demvoteshare3 = demvoteshare^3
    gen demvoteshare4 = demvoteshare^4
    gen demvoteshare5 = demvoteshare^5
    qui reg score demvoteshare demvoteshare2 demvoteshare3 demvoteshare4 ///
        demvoteshare5 democrat
    qui predict scorehat
    scatter score demvoteshare, msize(tiny) xline(0.5) ///
        xtitle("Democrat vote share") ///
        ytitle("ADA score") || ///
        line scorehat demvoteshare if democrat ==1, sort color(red) || ///
        line scorehat demvoteshare if democrat ==0, sort color(red) legend(off)
    graph export lee3.png, replace


    23/40


    24/40

    Regression results

    Center the forcing variable and use polynomials and interactions

    gen x_c = demvoteshare - 0.5
    gen x2_c = x_c^2
    gen x3_c = x_c^3
    gen x4_c = x_c^4
    gen x5_c = x_c^5

    reg score i.democrat##(c.x_c c.x2_c c.x3_c c.x4_c c.x5_c)

          Source |       SS       df       MS              Number of obs =   13577
    -------------+------------------------------           F( 11, 13565) = 1058.32
           Model |  6677033.81    11  607003.073           Prob > F      =  0.0000
        Residual |  7780253.69 13565  573.553534           R-squared     =  0.4618
    -------------+------------------------------           Adj R-squared =  0.4614
           Total |  14457287.5 13576  1064.91511           Root MSE      =  23.949

    ---------------------------------------------------------------------------------
              score |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
    ----------------+----------------------------------------------------------------
         1.democrat |   47.73325   2.042906    23.37   0.000     43.72887    51.73763
                x_c |  -28.79765   74.09167    -0.39   0.698    -174.0276    116.4323
               x2_c |  -1138.232   1144.497    -0.99   0.320    -3381.605    1105.141
               x3_c |  -10681.29   7137.315    -1.50   0.135    -24671.42    3308.839
               x4_c |  -33490.23   18844.92    -1.78   0.076    -70428.88    3448.424
               x5_c |  -32873.77   17212.08    -1.91   0.056    -66611.83     864.302
                    |
     democrat#c.x_c |
                  1 |  -5.793828   97.79088    -0.06   0.953    -197.4775    185.8899
                    |


    25/40

    Could restrict to a window

    Run a flexible regression, such as a polynomial with interactions (i.e., stratified by side of the cutoff), but don't use observations far from the cutoff. Choose a bandwidth around X = 0.5. Lee et al. (2004) used 0.4 to 0.6.

    reg score demvoteshare demvoteshare2 if democrat ==1 & ///
        (demvoteshare>.40 & demvoteshare<.60)
    reg score demvoteshare demvoteshare2 if democrat ==0 & ///
        (demvoteshare>.40 & demvoteshare<.60)


    26/40


    27/40

    Limit to window, 2nd degree polynomial

    reg score i.democrat##(c.x_c c.x2_c) if (demvoteshare>.40 & demvoteshare<.60)

      ...                                                  Prob > F      =  0.0000
        Residual |   2104043.2  4626  454.829918           R-squared     =  0.5549
    -------------+------------------------------           Adj R-squared =  0.5544
           Total |  4726805.22  4631   1020.6878           Root MSE      =  21.327

    ---------------------------------------------------------------------------------
              score |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
    ----------------+----------------------------------------------------------------
         1.democrat |    45.9283   1.892566    24.27   0.000     42.21797    49.63863
                x_c |   38.63988   60.77525     0.64   0.525     -80.5086    157.7884
               x2_c |   295.1723   594.3159     0.50   0.619    -869.9704    1460.315
                    |
     democrat#c.x_c |
                  1 |   6.507415   88.51418     0.07   0.941    -167.0226    180.0374
                    |
    democrat#c.x2_c |
                  1 |  -744.0247   862.0435    -0.86   0.388    -2434.041    945.9916
                    |
              _cons |   17.71198   1.310861    13.51   0.000     15.14207    20.28189
    ---------------------------------------------------------------------------------


    28/40

    Nonparametric methods

    The paper by Hahn, Todd, and Van der Klaauw (2001) clarified the assumptions behind RDD and framed estimation as a nonparametric problem

    It emphasized using local polynomial regression

    "Nonparametric methods" means a lot of things in statistics

    In the context of RDD, the idea is to estimate a model that does not assume a functional form for the relationship between Y and X. The model is something like $Y_i = f(X_i) + \epsilon_i$

    A very basic method: calculate E[Y] for each bin on X (think of a histogram)
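    The bin-means idea can be sketched in a few lines. Bin edges are aligned on the cutoff so that no bin mixes treated and untreated observations; the bin width, the helper name `binned_means` and the noise-free data (a jump of 1 at c = 0.5) are all hypothetical.

```python
import math
from collections import defaultdict


def binned_means(xs, ys, c, width):
    """Mean of Y in equal-width bins of X, with bin edges aligned on the cutoff c.

    Returns {bin midpoint: mean of Y}; bins with index k >= 0 start at or to the right of c.
    """
    sums, counts = defaultdict(float), defaultdict(int)
    for x, y in zip(xs, ys):
        k = math.floor((x - c) / width)  # index of the bin that x falls into
        sums[k] += y
        counts[k] += 1
    return {c + (k + 0.5) * width: sums[k] / counts[k] for k in sorted(sums)}


xs = [(i + 0.5) / 100 for i in range(100)]  # 0.005, 0.015, ..., 0.995
ys = [2.0 * x + (1.0 if x > 0.5 else 0.0) for x in xs]
means = binned_means(xs, ys, c=0.5, width=0.1)
for mid, m in means.items():
    print(f"{mid:.2f}  {m:.3f}")
```

    Plotting these bin means against the midpoints gives the familiar RDD scatter; the jump shows up as the gap between the two bins adjacent to c.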


    29/40

    Stata has a command to do just that: cmogram

    After installing the command (ssc install cmogram), type help cmogram. It has lots of useful options

    This is a common way to show RDD data. See, for example, Figure II of Almond et al. (2010). To recreate something like Figure 1 of Lee et al. (2004):

    cmogram score demvoteshare, cut(.5) scatter line(.5) qfit


    30/40

    Compare to linear and LOWESS fits

    cmogram score demvoteshare, cut(.5) scatter line(.5) lfit

    cmogram score demvoteshare, cut(.5) scatter line(.5) lowess


    31/40

    Local polynomial regression

    Hahn, Todd, and Van der Klaauw (2001) showed that one-sided kernel estimation (like LOWESS) may have poor properties because the point of interest is at a boundary

    They proposed using a local linear nonparametric regression instead

    Stata's lpoly command estimates kernel-weighted local polynomial regression

    Think of it as a weighted regression restricted to a window (hence "local"). The kernel provides the weights

    With a local constant (degree 0) fit, a rectangular kernel gives the same result as taking E[Y] in a given bin on X. A triangular kernel gives more weight to observations close to the center

    The method is sensitive to the choice of bandwidth (window)
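    A sketch of the local linear approach, assuming a triangular kernel $K(u) = \max(0, 1 - |u|)$ and noise-free simulated data with a hypothetical jump of 45 at c = 0.5. It fits a kernel-weighted straight line on each side within bandwidth h and takes the difference of the two fitted values at c. This is a plain re-implementation of the idea with hypothetical helper names, not Stata's lpoly or rdrobust.

```python
def weighted_line_at_c(xs, ys, c, h):
    """Triangular-kernel weighted least-squares line in (x - c); returns its value at x = c."""
    w = [max(0.0, 1.0 - abs((x - c) / h)) for x in xs]
    sw = sum(w)
    mx = sum(wi * (x - c) for wi, x in zip(w, xs)) / sw
    my = sum(wi * y for wi, y in zip(w, ys)) / sw
    sxy = sum(wi * ((x - c) - mx) * (y - my) for wi, x, y in zip(w, xs, ys))
    sxx = sum(wi * ((x - c) - mx) ** 2 for wi, x in zip(w, xs))
    return my - (sxy / sxx) * mx  # intercept = fitted value at x - c = 0


def rd_local_linear(xs, ys, c, h):
    """Right-side minus left-side fitted value at c, using only points within h of c."""
    left = [(x, y) for x, y in zip(xs, ys) if c - h < x < c]
    right = [(x, y) for x, y in zip(xs, ys) if c < x < c + h]
    return weighted_line_at_c(*zip(*right), c, h) - weighted_line_at_c(*zip(*left), c, h)


xs = [i / 1000 for i in range(1001) if i != 500]
ys = [20 + 30 * (x - 0.5) + 15 * (x - 0.5) ** 2 + (45 if x > 0.5 else 0) for x in xs]
est = rd_local_linear(xs, ys, c=0.5, h=0.1)
print(round(est, 2))
```

    Fitting a line rather than a constant on each side is what removes the first-order boundary bias: a local mean at c would be pulled toward the interior of each window by the slope.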


    32/40

    Local regression is a smoothing method

    Kernel-weighted local polynomial regression is a smoothing method

    lpoly score demvoteshare if democrat == 0, nograph kernel(triangle) gen(x0 sdem0) bwidth(0.1)
    lpoly score demvoteshare if democrat == 1, nograph kernel(triangle) gen(x1 sdem1) bwidth(0.1)


    33/40

    Treatment effect

    We're interested in estimating the treatment effect at X = 0.5

    gen forat = 0.5 in 1
    lpoly score demvoteshare if democrat == 0, nograph kernel(triangle) gen(sdem0) ///
        at(forat) bwidth(0.1)
    lpoly score demvoteshare if democrat == 1, nograph kernel(triangle) gen(sdem1) ///
        at(forat) bwidth(0.1)
    gen dif = sdem1 - sdem0
    list sdem1 sdem0 dif in 1/1

         +----------------------------------+
         |     sdem1       sdem0        dif |
         |----------------------------------|
      1. | 64.395204   16.908821   47.48639 |
         +----------------------------------+


    34/40

    Different windows

    What happens when we change the bandwidth?
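    How much the estimate moves with the bandwidth can be sketched by re-running a kernel-weighted local linear estimator with a triangular kernel (a hypothetical re-implementation, not Stata's lpoly) over several bandwidths. The simulated data are noise-free, with a hypothetical jump of 45 at c = 0.5 and more curvature to the right of the cutoff than to the left, so the sketch shows only the bias side of the trade-off: the error grows as the window widens. With noisy data, narrow windows would pay instead in variance.

```python
def weighted_line_at_c(xs, ys, c, h):
    """Triangular-kernel weighted least-squares line in (x - c); returns its value at x = c."""
    w = [max(0.0, 1.0 - abs((x - c) / h)) for x in xs]
    sw = sum(w)
    mx = sum(wi * (x - c) for wi, x in zip(w, xs)) / sw
    my = sum(wi * y for wi, y in zip(w, ys)) / sw
    sxy = sum(wi * ((x - c) - mx) * (y - my) for wi, x, y in zip(w, xs, ys))
    sxx = sum(wi * ((x - c) - mx) ** 2 for wi, x in zip(w, xs))
    return my - (sxy / sxx) * mx


def rd_local_linear(xs, ys, c, h):
    """Right-side minus left-side fitted value at c, using only points within h of c."""
    left = [(x, y) for x, y in zip(xs, ys) if c - h < x < c]
    right = [(x, y) for x, y in zip(xs, ys) if c < x < c + h]
    return weighted_line_at_c(*zip(*right), c, h) - weighted_line_at_c(*zip(*left), c, h)


c = 0.5
xs = [i / 2000 for i in range(2001) if i != 1000]
# Hypothetical truth: jump of 45, extra curvature only above the cutoff
ys = [20 + 30 * (x - c) + 15 * (x - c) ** 2
      + ((45 + 20 * (x - c) ** 2) if x > c else 0.0) for x in xs]

estimates = {h: rd_local_linear(xs, ys, c, h) for h in (0.05, 0.1, 0.2, 0.3, 0.5)}
for h, est in estimates.items():
    print(f"h = {h:.2f}: estimate = {est:.3f} (true jump 45)")
```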


    35/40

    Nonparametric

    There are several methods for choosing an optimal window; they trade off bias and variance

    In practical applications, you may want to check balance within that window

    The standard error of the treatment effect can be bootstrapped, but there are other alternatives

    Other variables can be added to nonparametric methods

    See the Stata do file for examples using the commands rd obs and rdrobust

    rd obs is old and buggy. rdrobust is new and promising but preliminary
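    The bootstrap idea for the standard error can be sketched as follows, assuming a triangular-kernel local linear estimator (a plain re-implementation with hypothetical helper names, not Stata's lpoly or rdrobust), simulated noisy data with a hypothetical jump of 45 at c = 0.5, bandwidth h = 0.1 and 200 replications. Observations are resampled with replacement, the estimate is recomputed each time, and the standard deviation of the replicates is the bootstrap SE. The data, bandwidth and number of replications are illustrative choices only.

```python
import random
import statistics


def weighted_line_at_c(xs, ys, c, h):
    """Triangular-kernel weighted least-squares line in (x - c); returns its value at x = c."""
    w = [max(0.0, 1.0 - abs((x - c) / h)) for x in xs]
    sw = sum(w)
    mx = sum(wi * (x - c) for wi, x in zip(w, xs)) / sw
    my = sum(wi * y for wi, y in zip(w, ys)) / sw
    sxy = sum(wi * ((x - c) - mx) * (y - my) for wi, x, y in zip(w, xs, ys))
    sxx = sum(wi * ((x - c) - mx) ** 2 for wi, x in zip(w, xs))
    return my - (sxy / sxx) * mx


def rd_local_linear(xs, ys, c, h):
    """Right-side minus left-side fitted value at c, using only points within h of c."""
    left = [(x, y) for x, y in zip(xs, ys) if c - h < x < c]
    right = [(x, y) for x, y in zip(xs, ys) if c < x < c + h]
    return weighted_line_at_c(*zip(*right), c, h) - weighted_line_at_c(*zip(*left), c, h)


rng = random.Random(2013)
n, c, h = 1000, 0.5, 0.1
xs = [rng.uniform(0.0, 1.0) for _ in range(n)]
ys = [20 + 30 * (x - c) + (45 if x > c else 0) + rng.gauss(0.0, 5.0) for x in xs]

estimate = rd_local_linear(xs, ys, c, h)

boot = []
for _ in range(200):
    idx = [rng.randrange(n) for _ in range(n)]  # resample observations with replacement
    boot.append(rd_local_linear([xs[i] for i in idx], [ys[i] for i in idx], c, h))
se = statistics.stdev(boot)
print(f"estimate = {estimate:.2f}, bootstrap SE = {se:.2f}")
```

    Here the bandwidth is held fixed across replications; a data-driven bandwidth would ideally be re-selected inside each replication.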


    36/40

    Using rdrobust

    . rdrobust score demvoteshare, c(0.5) all bwselect(IK)

    Sharp RD Estimates using Local Polynomial Regression

           Cutoff c = .5 | Left of c  Right of c        Number of obs  =      13577
    ---------------------+----------------------        Rho (h/b)      =      0.770
           Number of obs |      3535        3318        NN Matches     =          3
    Order Loc. Poly. (p) |         1           1        BW Type        =         IK
          Order Bias (q) |         2           2        Kernel Type    = Triangular
       BW Loc. Poly. (h) |     0.152       0.152
             BW Bias (b) |     0.197       0.197

    --------------------------------------------------------------------------------------
                         |    Loc. Poly.                          Robust       [Robust
                   score |      Coef.    Std. Err.       z      P>|z|     95% Conf. Interval]
    ---------------------+----------------------------------------------------------------
            demvoteshare |     47.171      1.262    36.9043    0.000      44.1      49.047108
    --------------------------------------------------------------------------------------

    All Estimates. Outcome: score. Running Variable: demvoteshare.
    --------------------------------------------------------------------------------------
                  Method |      Coef.    Std. Err.       z      P>|z|    [95% Conf. Interval]
    ---------------------+----------------------------------------------------------------
            Conventional |     47.171     .98131    48.0692    0.000     45.247     49.093991
          Bias-Corrected |     46.574     .98131    47.4608    0.000      44.65     48.496943
                  Robust |     46.574      1.262    36.9043    0.000       44.1     49.047108
    --------------------------------------------------------------------------------------


    37/40

    Parametric or non-parametric?

    When would the choice between parametric and non-parametric methods, or the window size, matter?

    Small effect

    Relationship between Y and X different away from the cutoff

    Functional form not well captured by polynomials (or another functional form); splines may work too

    Traditional (parametric) methods are easier to work with

    Could add random effects, robust standard errors, clustered SEs

    In practical applications, a regression with polynomials usually works well

    If the conclusions are different, worry. A lot.


    38/40

    Marginal returns to medical care

    Big picture: is spending more money on health care worth it (in terms of health gained)?

    Actual research: is spending more money on low-weight newborns worth it in terms of mortality reductions? Compare marginal costs (dollars) to marginal benefits (mortality transformed into dollars).

    On jargon: in economics, marginal = additional. So compare additional spending to additional benefit

    The RDD part is used to estimate marginal benefits

    The forcing variable is newborn weight. Cutoff point c = 1,500 grams (about 3.3 lbs)


    39/40

    Estimating equation

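    Their model is:

    $Y_i = \beta_0 + \beta_1 VLBW_i + \beta_2 VLBW_i (g_i - 1500) + \beta_3 (1 - VLBW_i)(g_i - 1500) + \alpha_t + \alpha_s + \mathbf{X}_i'\delta + \epsilon_i \qquad (1)$

    Change notation so that VLBW = T and $(g_i - 1500) = X$; after doing some algebra, the model is:

    $Y = \beta_0 + \beta_1 T + \beta_3 X + (\beta_2 - \beta_3) T X + (\alpha_t + \alpha_s + \mathbf{X}'\delta) + \epsilon$

    $(\alpha_t + \alpha_s + \mathbf{X}'\delta)$ are covariates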
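    The "some algebra" step involves only the running-variable terms of equation (1). Writing T for $VLBW_i$ and X for $g_i - 1500$, and leaving the time, state and covariate terms aside:

```latex
\begin{aligned}
\beta_1 T + \beta_2 T X + \beta_3 (1 - T) X
  &= \beta_1 T + \beta_2 T X + \beta_3 X - \beta_3 T X \\
  &= \beta_1 T + \beta_3 X + (\beta_2 - \beta_3)\, T X
\end{aligned}
```

    So $\beta_1$ is the jump at 1,500 g, $\beta_3$ is the slope in birth weight for non-VLBW newborns (T = 0), and the slope for VLBW newborns (T = 1) is $\beta_3 + (\beta_2 - \beta_3) = \beta_2$.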


    40/40

    Covariates

    They compared means of covariates above and below the cutoff point

    They found some differences (the sample is large), so they included covariates in the model

    They ran an RDD-type analysis on the covariates to see whether they were smooth (no jump at the cutoff)
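    The covariate check can be sketched as the same kind of discontinuity estimator applied to a pre-treatment covariate instead of the outcome: if the design is valid, the estimated "jump" in the covariate should be close to zero. This sketch assumes a hypothetical covariate (mother's age) that varies smoothly with birth weight plus noise, a cutoff of 1,500 g, a 200 g bandwidth and a triangular-kernel local linear estimator re-implemented by hand; none of these choices come from Almond et al. (2010).

```python
import random


def weighted_line_at_c(xs, ys, c, h):
    """Triangular-kernel weighted least-squares line in (x - c); returns its value at x = c."""
    w = [max(0.0, 1.0 - abs((x - c) / h)) for x in xs]
    sw = sum(w)
    mx = sum(wi * (x - c) for wi, x in zip(w, xs)) / sw
    my = sum(wi * y for wi, y in zip(w, ys)) / sw
    sxy = sum(wi * ((x - c) - mx) * (y - my) for wi, x, y in zip(w, xs, ys))
    sxx = sum(wi * ((x - c) - mx) ** 2 for wi, x in zip(w, xs))
    return my - (sxy / sxx) * mx


def rd_local_linear(xs, ys, c, h):
    """Right-side minus left-side fitted value at c (sign convention: right minus left)."""
    left = [(x, y) for x, y in zip(xs, ys) if c - h < x < c]
    right = [(x, y) for x, y in zip(xs, ys) if c < x < c + h]
    return weighted_line_at_c(*zip(*right), c, h) - weighted_line_at_c(*zip(*left), c, h)


rng = random.Random(7)
c, h = 1500.0, 200.0
grams = [rng.uniform(1000.0, 2000.0) for _ in range(2000)]
# Hypothetical smooth covariate: no discontinuity at the cutoff by construction
mother_age = [27.0 + 0.004 * (g - c) + rng.gauss(0.0, 5.0) for g in grams]

placebo_jump = rd_local_linear(grams, mother_age, c, h)
print(f"estimated jump in mother's age at {c:.0f} g: {placebo_jump:.2f} years")
```

    In practice one would compare the placebo estimate with its standard error; a jump that is large relative to sampling noise would suggest sorting around the cutoff.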
