Labour Economics 49 (2017) 42–54

Contents lists available at ScienceDirect

Labour Economics

journal homepage: www.elsevier.com/locate/labeco

Dynamic treatment assignment and evaluation of active labor market policies ☆

Johan Vikström a,b

a IFAU, Box 513, 751 20 Uppsala, Sweden
b Department of Economics, UCLS, Uppsala University, Sweden

☆ I am grateful for helpful suggestions from Gerard J. van den Berg, Gregory Jolivet, Per Johansson, Bas van der Klaauw, Martin Huber, Helena Holmlund, Oskar Nordström Skans, Ingeborg Waernbaum and seminar participants at IFAU Uppsala, the 4th IZA/IFAU conference, IFAU, SOFI and EALE 2016. Financial support from the Jan Wallander and Tom Hedelius Foundation, and FORTE is acknowledged.
E-mail address: [email protected]

http://dx.doi.org/10.1016/j.labeco.2017.09.003
Received 30 September 2016; Received in revised form 8 September 2017; Accepted 11 September 2017
Available online 12 September 2017
0927-5371/© 2017 Elsevier B.V. All rights reserved.

Article info

JEL classification: C21, J18

Keywords: Program evaluation; Treatment effects; Work practice; Training; Survival time

Abstract

This paper considers treatment evaluation in a discrete time setting in which treatment can start at any point in time. We consider evaluation under unconfoundedness and propose a dynamic inverse probability weighting estimator. A typical application is an active labor market program that can start after any elapsed unemployment duration. The identification and estimation results concern both cases with one single treatment as well as sequences of programs. The new estimator is applied to Swedish data on participants in a training program and a work practice program. The work practice program increases re-employment rates. Most sequences of the two programs are inefficient when compared to one single program episode.

1. Introduction

A common feature of many active labor market programs (ALMPs) is that they can start after any elapsed unemployment duration. This dynamic nature of the treatment assignment raises several methodological issues. The main issue is that individuals who are currently non-treated may become treated later on. The implication is that unconfoundedness-based methods that use static treatment status, defined as enrollment in treatment before exiting from unemployment, are no longer valid (see discussions in e.g., Crépon et al., 2009; Fredriksson and Johansson, 2008; Lechner et al., 2011; Sianesi, 2004). The reason is that static treatment status depends on survival time (i.e., the outcome), since the probability of treatment enrollment by construction increases with time in unemployment, and this confounds any analysis based solely on static treatment indicators.

As a response, several papers explicitly address this dynamic treatment assignment problem. Sianesi (2004) proposes to transform the dynamic problem into a static problem by focusing on the effect of treatment now versus waiting for treatment. Applications of this approach include Sianesi (2008), Fitzenberger et al. (2008) and Biewen et al. (2014). Several other papers focus on the average effect of treatment after some elapsed duration compared with never receiving treatment, and this is also the average effect considered in this paper. Both Fredriksson and Johansson (2008) and Crépon et al. (2009) utilize the outcomes of the not-yet treated to obtain the counterfactual outcome under never receiving treatment. In a related paper, Kastoryano and van der Klaauw (2011) compare different evaluation approaches in a dynamic setting. Other influential studies include Lechner (1999), Gerfin and Lechner (2002) and Lechner et al. (2011). In particular, Lechner et al. (2011) evaluate the long-run effects of a training program and discuss the static evaluation model in a case in which the sample size is too small to use the methods in Fredriksson and Johansson (2008) and Sianesi (2008), adding to the literature on evaluation under dynamic treatment assignment.

This paper contributes to the literature on evaluation in cases with dynamic treatment assignment. We consider a setting in which the population of interest is a cohort of units that are in an initial state and the outcome of interest is the survival time in the initial state. In evaluations of ALMPs, the initial state is typically unemployment. The key feature is that exits out of the initial state and the start of treatment are allowed to occur at any point in time. Besides ALMPs, an important example of this setting is a medical treatment implemented at various times after the onset of a disease. For this setting we consider identification and estimation under the assumption of sequential unconfoundedness (selection on observables). That is, conditional on covariates, treatment assignment among the non-treated survivors is unrelated to future potential outcomes. In some settings this is a restrictive assumption, but it is more likely to hold in cases with detailed administrative data and/or survey data. Below we discuss related approaches for cases where unconfoundedness does not hold.

This paper contributes in several ways. One contribution is that we consider identification and estimation with selection on time-variant covariates. This may be important for evaluations of active labor market programs and other policies, since it is likely that selection to the program not only depends on characteristics measured at the beginning of the unemployment spell, but also on characteristics like marital status that may change during the unemployment spell. Allowing for selection on time-varying covariates extends the identification results for selection on time-invariant covariates in Crépon et al. (2009).

Another contribution is to provide a dynamic inverse probability weighting (DIPW) estimator for the average treatment effect on the treated for treatment in a certain period against no treatment now or thereafter. One advantage of the DIPW approach is that once the scores forming the weights have been estimated no additional functional form assumptions are needed. Recent applications of the new methods proposed in this paper include van den Berg et al. (2015) and Albanese et al. (2015). The finite sample properties of the DIPW estimator are compared with those of the two-step matching estimator in Fredriksson and Johansson (2008) and the blocking estimator in Crépon et al. (2009) in a Monte Carlo simulation. The estimator allows for selection on time-variant covariates.

As an illustration of the DIPW estimator consider the average effect on the treated at $t$. The survival rate under treatment is obtained directly from those actually treated at $t$. The counterfactual exit rate under no treatment at $t$ is estimated by weighting the outcomes of the not-yet treated at $t$ in order to mimic the distribution of the confounders in the population of treated at $t$. In subsequent periods, some of the not-yet treated at $t$ are treated, and this creates selective censoring in the group of not-yet treated. However, under unconfoundedness the weights at $t+1$ correct for this selective censoring, so that the DIPW estimator gives the desired exit rate at $t+1$. Similar reweighting occurs in subsequent periods. An estimator for the average effect averaged over all pre-treatment durations is provided.
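The reweighting idea described above can be sketched with a toy example. This is only an illustration under stated assumptions: the cohort, the single binary confounder `x`, and the cell counts are invented, and the propensity score is simply the within-cell share starting treatment. It shows that weighting the not-yet treated by the odds p / (1 - p) reproduces the confounder distribution of the treated.

```python
import numpy as np

# Hypothetical cohort of non-treated survivors at period t, one binary confounder x.
# x = 1: 60 start treatment at t, 40 remain not-yet treated.
# x = 0: 20 start treatment at t, 80 remain not-yet treated.
x = np.array([1] * 100 + [0] * 100)
treated = np.array([1] * 60 + [0] * 40 + [1] * 20 + [0] * 80)

# Propensity score: share starting treatment within each covariate cell.
p1 = treated[x == 1].mean()
p0 = treated[x == 0].mean()
p = np.where(x == 1, p1, p0)

# Odds weights p / (1 - p) for the not-yet treated.
control = treated == 0
w = p[control] / (1.0 - p[control])

share_treated = x[treated == 1].mean()                        # share with x = 1 among treated
share_control_raw = x[control].mean()                         # unweighted share among controls
share_control_weighted = np.average(x[control], weights=w)    # reweighted share among controls
```

In this constructed example the unweighted controls under-represent `x = 1`, while the weighted controls match the treated group exactly, which is the property the estimator relies on in each period.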

We also extend the results to allow sequences of treatments. In an ALMP setting, this is important, since many unemployed workers enroll in several programs during a single unemployment spell.¹ As for the analysis of a single treatment we focus on a survival time setting in which the outcome of interest is a transition from an initial state to a destination state. We study both treatment sequences defined by sequences of the same treatment and sequences defined by multiple types of programs. Specifically, we examine identification and DIPW estimation of the difference in survival under two treatment sequences, and show that this parameter is identified under a generalization of the sequential unconfoundedness assumption for cases with a single treatment.

The analysis of sequences of treatments in a survival time setting is closely related to previous papers on sequences of treatments. In particular, Lechner (2008), Lechner (2009) and Lechner and Miquel (2010) develop a seminal framework for causal analysis of sequences of treatments, and propose and implement matching and inverse probability weighting estimators for various average effects.² This causal model assumes a setting with discrete periods for which treatments, confounders and outcomes are observed in all periods. The outcome of interest is the difference between two potential outcomes at a single point in time.

The main difference compared to our study is that we consider a survival time setting, which affects the causal effects that are possible to identify. One reason for this is that, in a duration setting, the outcome and the treatment at $t$ are only observed for individuals who have survived up to time $t-1$. This leads to the well-known dynamic selection problem, which, for instance, implies that the difference in hazard rates cannot be identified without model assumptions, since conditioning jointly on the outcome and a counterfactual treatment at time $t-1$ is not possible (van den Berg, 2001). Instead we show that it is possible to identify effects on the survival rate, and this separates our analysis of sequences of treatments from previous studies on sequences of treatments.

¹ For instance, in Sweden almost 24% of program participants participate in more than one program during one unemployment spell. This is conditional on participating at least once in a program during the time period 1999–2006.

² There is also a parallel epidemiological literature, see e.g., Robins (1986) and Hernan et al. (2001).

This paper also relates to several other strands of the literature. In particular, if sequential unconfoundedness is believed not to hold there are several alternative approaches. First, the Timing-of-Events (ToE) approach by Abbring and van den Berg (2003) also considers evaluation in a dynamic setting, in which exits and treatments are allowed to occur at any point in time. One difference compared with our approach is that the ToE approach allows the selection into treatment to be based on both observed and unobserved heterogeneity. This is achieved at the expense of imposing the mixed proportional hazard structure, whereas the DIPW approach in this paper requires no parametric assumptions. Another difference is that we consider evaluation in discrete time while the ToE approach is for continuous time.³ Second, Heckman and Navarro (2007) establish semi-parametric identification of a discrete time model of time to the treatment and its effects. Their model requires the presence of additional covariates, besides the treatment indicator, that are independent of unobservable errors and have large support, whereas this paper considers evaluation under unconfoundedness. Third, with multiple spell data one can attempt to adjust for fixed unobserved heterogeneity, using approaches similar to the difference-in-differences design.

The DIPW estimator is illustrated using data from a Swedish work practice program. Data for the period 2003–2006 are used and the result is that the program increases the employment rate 15 months after enrollment in the program. We also study the impact of sequences of work practice episodes and the effects of different combinations of the work practice program and a labor market training program. From this analysis we conclude that in most cases a second program episode does not reduce the total unemployment more than a single program episode. This holds for most timings of the first program and most spacings between the two program episodes.

Our results on the effects of sequences of programs are related to previous studies on the effects of sequences of ALMP programs using the Lechner (2009) framework: Lechner (2009), Lechner and Miquel (2010), Lechner and Wiehler (2013) and Dengler (2015). Our comparison of two different programs is related to studies on the relative efficiency of different programs: Osikominu (2013), Hotz et al. (2006), Dyke et al. (2006), Sianesi (2008) and Fitzenberger et al. (2008).

The rest of the paper is organized as follows. Section 2 considers identification and DIPW estimation for the case with one single treatment. In Section 3, we extend the results allowing for sequences of treatments. Section 4 gives the simulation results for the DIPW estimator and related estimators. Section 5 reports the results from the application, and Section 6 concludes.

³ In evaluations of ALMPs, unemployment durations are typically measured at the daily level (discrete time). However, if the treatment rate is low, some aggregation (for instance to 30-day intervals) is necessary in order to obtain a sufficient number of treated in each time period. The ToE approach does not suffer from this aggregation issue.

2. Identification and estimation

2.1. Evaluation framework

We consider the average effects on survival time when transitions as well as the start of treatment can occur at any point in discrete time. The time to the start of the treatment is denoted by $S$ and we let $Y_t(s)$ be an indicator of a transition in period $t$ if treated at $s$. The potential outcome if never treated is denoted by $Y_t(0)$ and the observed outcome in period $t$ is $Y_t$. The sequence of potential outcomes is denoted by $\overline{Y}_t(s) = \{Y_1(s), \ldots, Y_t(s)\}$, $\overline{Y}_t$ is a similar sequence of observed outcomes, and $\overline{Y}_t(s) = 0$ implies that all outcomes in the sequence are equal to zero. Throughout the paper we assume a sample of $N$ individuals $i = 1, \ldots, N$. Subsequently, the notation $Y_{t,i}$ is used to denote the observed outcome of a specific individual, and similar notation is used for other variables.

We consider the average effect of treatment at $s$ on the probability of surviving to time point $t$ compared with survival throughout the same interval if never treated:

\[
\text{ATET}_t(s) = \Pr(\overline{Y}_t(s) = 0 \mid S = s, \overline{Y}_{s-1}(s) = 0) - \Pr(\overline{Y}_t(0) = 0 \mid S = s, \overline{Y}_{s-1}(s) = 0). \tag{1}
\]

Let us illustrate this setting using the application in this paper, which evaluates the effects of a Swedish work practice program. The aim of the program is to provide unemployed workers with practical experience in a certain profession, and the program can start after any elapsed unemployment duration. Then, $S$ is the time between the entry into unemployment and the start of the work practice program, $Y_t(s)$ is an indicator of a transition from unemployment to employment in time period $t$ if treated at $s$, and $\overline{Y}_t = 0$ implies that the unemployed worker remains in unemployment for at least $t$ periods. The average effect of interest, ATET$_t(s)$, is the difference in the probability of remaining in unemployment up until period $t$ when comparing work practice after $s$ periods with no work practice.

2.2. Assumptions

We assume that we have data on the selection to treatment such that it is reasonable to assume that the potential durations are independent of assignment to treatment when conditioning on observed covariates. In particular, we allow for selection on time-variant covariates. In evaluations of active labor market programs (ALMPs), this allows treatment assignments to not only depend on characteristics measured at the beginning of the unemployment spell, but also to depend on characteristics that change during the unemployment spell. For instance, some unemployed workers divorce and local labor market conditions can change.

To rule out effects of the treatment in period $t$ on $X$, the covariates determining treatment assignment in $t$ should be measured before assignments are made. For that reason we use the notation $X_{t^-}$ for the observed covariates at $t$, where $t^-$ indicates that $X$ is measured at least slightly before $t$. Note that $X_{t^-}$ may include covariates from previous periods, even the entire vector of covariates from all previous periods. Also, let us introduce the notation $D_t$, which denotes treatment in period $t$.

Formally, the assumption is that unconfoundedness should hold sequentially conditional on the time-varying observed covariates among the non-treated survivors:

\[
\{Y_k(s);\ \forall k, s \geq t\} \perp D_t \mid X_{t^-},\ S > t-1,\ \overline{Y}_{t-1}(0) = 0. \tag{A1}
\]

In evaluations of ALMPs, the assumption is that for unemployed workers still unemployed and non-treated at $t$, treatment assignments in period $t$ are independent of the potential outcomes (re-employment rates) after conditioning on covariates measured shortly before $t$. Whether the sequential unconfoundedness assumption holds depends on how treatment is assigned and on the information available. The assumption is more likely to hold in cases with detailed administrative data and/or survey data. For instance, if unemployed workers mainly self-select into the ALMP of interest, individual motivation, ability and personality traits may affect treatment decisions. In such cases, sequential unconfoundedness is not likely to hold, unless the data include survey information on these factors and/or other variables that are close proxies for individual motivation and the other factors. On the other hand, if caseworkers determine assignment to ALMPs and if the data set used in the analysis contains most of the information available when caseworkers decide on treatment enrollments, sequential unconfoundedness is more likely to hold. Thus, the sequential unconfoundedness assumption has to be assessed on a case-by-case basis based on knowledge about the treatment assignment process and the data available.

Assumption (A1) is similar to assumptions made in related dynamic approaches. Biewen et al. (2014) make a dynamic conditional mean independence assumption (DCIA). They consider multiple treatments, but otherwise their DCIA assumption has similar implications as our sequential unconfoundedness assumption.⁴ Both Assumption (A1) and the DCIA assumption concern treatment assignments in a given period for individuals surviving up until this time period. For this population those starting treatment and those who remain non-treated should be comparable conditional on the covariates.

Lechner (2009) and Lechner and Miquel (2010) consider the identification of effects of sequences of treatments under a "weak conditional independence assumption" (W-DCIA), which implies that conditional on previous treatments and possibly time-varying covariates, treatment assignments in the next period are unrelated to the potential outcomes in that period. This assumption is in the same spirit as our sequential unconfoundedness assumption. In both cases treatment assignments should be unrelated to potential outcomes conditional on past events and current covariates. The main difference is that this paper considers a survival time setting, so that our sequential unconfoundedness assumption is for the population of survivors while the W-DCIA is for the full population.⁵

Next, Fredriksson and Johansson (2008) and Crépon et al. (2009) both consider identification in settings similar to the one considered in this paper. These papers consider unconfoundedness assumptions allowing for selection on time-invariant covariates. In this paper, we extend these previous identification results and allow for selection on time-varying covariates.

The next assumption is the familiar no-anticipation assumption (see e.g., Abbring and van den Berg, 2003):⁶

\[
\Pr(Y_t(s') = 1) = \Pr(Y_t(s'') = 1), \quad \forall t < \min(s', s''). \tag{A2}
\]

The implication of Assumption (A2) is that future treatments should not affect current outcomes. This holds if individuals are unaware of future treatments or if they do not alter their behaviour as a response to knowledge of future treatments. In practice, this holds if individuals are only informed about upcoming treatments shortly before the start of the treatment, or if there is a great deal of uncertainty around future treatment assignments. On the other hand, if individuals are informed about future treatments far in advance the assumption is not likely to hold, since this gives individuals time to react to future treatments. For instance, if unemployed workers are assigned well in advance to an ALMP that they would like to avoid, this is expected to affect their job search intensity and reservation wages already before the actual start of the program.⁷

Besides Assumptions (A1) and (A2), an overlap condition⁸ and SUTVA need to hold. The latter rules out general equilibrium effects and other types of interference between the individuals in the sample.

⁴ One difference is that Biewen et al. (2014) consider selection on time-invariant covariates while we allow for selection on time-varying covariates.

⁵ Another difference is that the W-DCIA assumption concerns sequences of treatments, so that the assumption should hold conditional on previous treatments. In Section 3, we generalize (A1) to a setting with sequences of treatments.

⁶ For estimation, Assumption (A2) needs to hold given $X_{t^-}$.

⁷ However, there is one special case when the assumption still holds. As pointed out by a referee, if both treated and non-treated unemployed workers with the same characteristics have the same beliefs about future treatments and respond to these beliefs in the same way, both groups are affected by any future treatments in the same way, so that any anticipatory behaviour cancels out when comparing treated and non-treated. Conceptually, this is not possible in an evaluation framework with one single treatment, but it may hold in many real world settings in which individuals may be treated several times.

⁸ We have: $\Pr(D_t = 1 \mid X_{t^-}, S \geq t, \overline{Y}_{t-1} = 0) < 1$ for all $t$. That is, as for static average treatment effects on the treated the treatment propensity needs to be below one.
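The overlap condition can be inspected informally once period-specific propensity scores have been estimated. The following is a minimal sketch, not part of the paper: the function name `overlap_report`, the array layout, and the 0.95 cut-off are all assumptions made for illustration. It reports, for each period, the largest estimated treatment propensity among units still at risk and flags periods where it comes close to one.

```python
import numpy as np

def overlap_report(p_hat, at_risk, threshold=0.95):
    """Per-period overlap diagnostic (hypothetical helper).

    p_hat   : (N, T) array of estimated treatment propensities per period
    at_risk : (N, T) boolean array, True if the unit is untreated and still
              in the initial state at the start of that period
    Returns a list of (period, max propensity among at-risk units, flagged).
    """
    p_hat = np.asarray(p_hat, dtype=float)
    at_risk = np.asarray(at_risk, dtype=bool)
    report = []
    for t in range(p_hat.shape[1]):
        scores = p_hat[at_risk[:, t], t]
        max_p = float(scores.max()) if scores.size else float("nan")
        # The threshold is an arbitrary illustrative cut-off, not a rule from the paper.
        report.append((t + 1, max_p, bool(max_p > threshold)))
    return report

# Invented example: three units, two periods; the third unit is no longer at risk in period 2.
example = overlap_report(
    p_hat=[[0.1, 0.2], [0.3, 0.99], [0.5, 0.4]],
    at_risk=[[True, True], [True, True], [True, False]],
)
```

A flagged period suggests that some not-yet treated units would receive very large weights, so estimates for that period should be read with caution.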


⁹ See, for instance, Hirano et al. (2003) for a discussion of the implications of using estimated scores instead of true scores.

¹⁰ In each period, we assume that censoring occurs before any treatments and we implicitly assume that the estimation sample consists of all observations that are not censored in the first period.

2.3. Identification

We show that ATET$_t(s)$ is identified under Assumptions (A1) and (A2). The survival function under treatment at $s$ is directly identified by the outcomes of those actually treated at $s$. The main issue is instead how to select a proper control group in order to identify the counterfactual outcome, $\Pr(\overline{Y}_t(0) = 0 \mid S = s, \overline{Y}_{s-1}(s) = 0)$. One key problem is that the start of treatment may occur at any point in time, so that individuals not treated at $t$ may become treated after $t$. Another problem is that the start of treatment is unobserved if an individual leaves the initial state before receiving treatment. This identification problem is discussed by, for example, Fredriksson and Johansson (2008) and Crépon et al. (2009) in a setting with selection on time-invariant covariates. The idea in both papers is to successively use all those not-yet treated at $t$ to estimate the exit rate under no-treatment at $t$ for those treated at $s$. We now show that a similar logic applies with selection on time-varying covariates.

For the effect of treatment in the first period, ATET$_2(1)$, our main identification result is summarized in Theorem 1. Similar expressions apply for ATET$_t(s)$ for other $t$ and $s$.

Theorem 1 (Identification of ATET). Suppose that (A1) and (A2) hold. Then

\[
\text{ATET}_2(1) = \Pr(Y_2 = 0, Y_1 = 0 \mid S = 1) - E_{X_{1^-} \mid S=1}\Big[ E_{X_{2^-} \mid X_{1^-}, Y_1 = 0, S = 1}\big\{ \Pr(Y_2 = 0 \mid X_{2^-}, Y_1 = 0, S > 2) \big\}\, \Pr(Y_1 = 0 \mid X_{1^-}, S > 1) \Big].
\]

Proof. See Appendix A. □

In each period, only not-yet treated individuals are used, so that the control group successively changes as some previously non-treated individuals start treatment. In the first period not-yet treated individuals with $S > 1$ are used and in the second period not-yet treated individuals with $S > 2$ are used. It is possible to use this successively changing comparison group, since, conditional on the covariates, treatment assignments among the non-treated survivors are assumed to be unrelated to potential outcomes.

Note that both Assumptions (A1) and (A2) are important for identification. Assumption (A1) relates to the allocation of treatment across individuals, and ensures that the treated and the not-yet treated have similar potential outcomes. Assumption (A2) concerns the relationship between different potential outcomes for a given individual, and ensures that the outcomes of the not-yet treated at $t$ can be used to mimic the outcomes under never being treated even if some of the not-yet treated at $t$ are treated after $t$.
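As a worked illustration of the successively changing comparison group, consider the special case in which unconfoundedness holds without conditioning on any covariates (an assumption made here only to simplify the expression; the numbers below are invented). The two expectations in Theorem 1 then drop out and the result reduces to a product of period-specific survival rates among the not-yet treated:

\[
\text{ATET}_2(1) = \Pr(Y_2 = 0, Y_1 = 0 \mid S = 1) - \Pr(Y_2 = 0 \mid Y_1 = 0, S > 2)\, \Pr(Y_1 = 0 \mid S > 1).
\]

Suppose 200 individuals start treatment in period 1 and 150 of them are still in the initial state after period 2, so that $\Pr(Y_2 = 0, Y_1 = 0 \mid S = 1) = 150/200 = 0.75$. Of 800 individuals not treated in period 1, 640 survive period 1, giving $\Pr(Y_1 = 0 \mid S > 1) = 0.80$. Of these 640 survivors, 100 start treatment in period 2 and are removed from the comparison group; of the remaining 540, 432 survive period 2, giving $\Pr(Y_2 = 0 \mid Y_1 = 0, S > 2) = 0.80$. Hence

\[
\text{ATET}_2(1) = 0.75 - 0.80 \times 0.80 = 0.11 .
\]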

2.4. Dynamic IPW estimation

We propose a dynamic inverse probability weighting (DIPW) estimator for the average treatment effect on the treated. Appendix A.1 shows that if Assumptions (A1) and (A2) hold, a consistent estimator of $\mathrm{ATET}_t(s)$ is:

$$
\widehat{\mathrm{ATET}}_t(s) = \prod_{k=s}^{t}\left[1 - \frac{\sum_i Y_{k,i}\,\mathbf{1}(Y_{k-1,i}=0)\,\mathbf{1}(S_i=s)}{\sum_i \mathbf{1}(Y_{k-1,i}=0)\,\mathbf{1}(S_i=s)}\right] - \prod_{k=s}^{t}\left[1 - \frac{\sum_i \hat{w}_i(s,k)\,Y_{k,i}\,\mathbf{1}(Y_{k-1,i}=0)\,\mathbf{1}(S_i>k)}{\sum_i \hat{w}_i(s,k)\,\mathbf{1}(Y_{k-1,i}=0)\,\mathbf{1}(S_i>k)}\right] \tag{2}
$$

with the estimated weights

$$
\hat{w}_i(s,k) = \frac{\hat{p}_s(X_{i,s^-})}{1-\hat{p}_s(X_{i,s^-})}\,\frac{1}{\prod_{m=s+1}^{k}\left[1-\hat{p}_m(X_{i,m^-})\right]}, \tag{3}
$$

where $\mathbf{1}(\cdot)$ is an indicator function, and $p_s(X_{i,s^-}) = \Pr(S = s \mid X_{i,s^-}, S \ge s, Y_{s-1} = 0)$ is a propensity score. Specifically, $\hat{p}_s(X_{i,s^-})$ is the estimated probability of obtaining treatment in period $s$ given survival until time period $s$ and covariates $X$. Note that the weights are normalized by construction, since each weighted exit rate divides by the sum of the weights in the risk set. One way to obtain standard errors is by bootstrapping. Since the selection probabilities and the weights are re-estimated in each bootstrap replication, this accounts for the variation in both the estimation of weights and the outcome equation.⁹
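The bootstrap procedure mentioned above can be sketched in a few lines. This is only an illustration under stated assumptions, not the paper's implementation: `bootstrap_se`, `estimator` and `data` are hypothetical names, individuals are resampled with replacement as whole rows, and `estimator` is assumed to re-estimate the propensity scores and rebuild the weights on every resampled data set before returning the point estimate.

```python
import numpy as np

def bootstrap_se(estimator, data, n_boot=99, seed=0):
    """Bootstrap standard error of estimator(data).

    data      : NumPy array with one row (or entry) per individual
    estimator : callable that re-fits all nuisance models (e.g. propensity
                scores) on the array it receives and returns a scalar
    n_boot    : number of bootstrap replications (99 mirrors the text)
    """
    rng = np.random.default_rng(seed)
    n = len(data)
    draws = []
    for _ in range(n_boot):
        idx = rng.integers(0, n, size=n)   # resample individuals with replacement
        draws.append(estimator(data[idx]))
    return float(np.std(draws, ddof=1))
```

Because the whole estimation pipeline sits inside `estimator`, the spread of the replicated estimates reflects both the first-step weight estimation and the outcome step.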

Let us consider the intuition behind the estimator. If the interest lies in the average effect on the treated at $s$, the actually observed outcomes of those treated at $s$ can be used to estimate the survival rate under treatment (first part of the estimator). The counterfactual outcome under no-treatment is obtained using non-treated survivors at $s$, i.e. those not-yet treated at $s$. Under Assumption (A1), the treated at $s$ and the not-yet treated at $s$ are comparable if we adjust for the fact that, due to the assignment process, the distribution of $X$ differs between these two populations. Thus, the counterfactual exit rate at $s$ is obtained by weighting the not-yet treated at risk at $s$ and the exits among this group, and the weights essentially follow from the IPW estimator of the average effect on the treated in the static evaluation literature (see e.g., Wooldridge, 2010).

At $s + 1$, i.e. in the second period after the start of treatment, a fraction of the not-yet treated at $s$ that survives up until $s + 1$ starts treatment. This creates selective censoring in the group of not-yet treated. However, under Assumption (A1), assignments at $s + 1$ only depend on observed covariates, so that the selective censoring can be taken into account by weighting the outcomes of the not-yet treated at $s + 1$, and this is the purpose of the second part of the weights. The implication is that individuals who are still not-yet treated with covariates such that they have a high probability of starting treatment are given larger weight. Again, the exit rate is obtained by dividing the weighted exits by the weighted risk set. Similar weighting occurs in subsequent periods.

Note that in each period only not-yet treated individuals are used, so that the control group successively changes as some previously non-treated individuals start treatment. For evaluations of ALMPs, this means that the control group in each period consists of unemployed workers who are still unemployed and who have not enrolled in the ALMP yet. Some of these comparison workers, however, enroll into the program in subsequent periods, at which time they are removed from the control group, and the weights account for this selective censoring of the not-yet treated who are treated.
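To make the two parts of the estimator concrete, the following is a minimal sketch of the single-treatment DIPW estimator in Eqs. (2) and (3). The data layout is an assumption made for illustration: `exit_period`, `start_period` and `pscore` are hypothetical array names, unobserved exit or treatment times are coded as `np.inf`, and the propensity scores are taken as already estimated. Empty risk sets are not handled.

```python
import numpy as np

def dipw_atet(exit_period, start_period, pscore, s, t):
    """Sketch of ATET_t(s): treated survival minus weighted not-yet-treated survival.

    exit_period[i]  : period in which i leaves the initial state (np.inf if never)
    start_period[i] : period in which i starts treatment (np.inf if never observed)
    pscore[i, m]    : estimated probability of starting treatment in period m
                      (column m holds period m, so column 0 is unused)
    """
    surv_treated, surv_control = 1.0, 1.0
    odds = pscore[:, s] / (1.0 - pscore[:, s])   # first part of the weights
    stay = np.ones(len(exit_period))             # product of (1 - p_m), m = s+1..k
    for k in range(s, t + 1):
        if k > s:
            stay = stay * (1.0 - pscore[:, k])   # second part: later assignment
        w = odds / stay
        at_risk = exit_period >= k               # survived through period k-1
        exits = exit_period == k                 # Y_k = 1
        treated = at_risk & (start_period == s)
        control = at_risk & (start_period > k)   # not-yet treated at k
        surv_treated *= 1.0 - exits[treated].mean()
        surv_control *= 1.0 - np.sum(w[control] * exits[control]) / np.sum(w[control])
    return surv_treated - surv_control
```

Dividing by the sum of the weights in each period is what makes the weights self-normalizing; with constant propensity scores the second term collapses to the unweighted survival rate of the not-yet treated.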

2.5. Right censoring

The DIPW estimator above ignores regular right-censoring, which is common in many applications due to, for instance, drop-out from the study or a limited follow-up period. A standard way of handling this type of right censoring is to include the individual up until the censoring point. Under completely random censoring we do not have to adjust for right censoring, while if the right-censoring depends on observed variables it will create another source of selective censoring since individuals with certain types of characteristics will be censored at a higher rate. We now consider identification and estimation under such selective censoring.¹⁰

Formally, let $C_t$ be an indicator for censoring in period $t$. We consider estimation if the censoring is sequentially independent conditional on time-varying covariates at $t$:

$$
\{Y_k(s);\ \forall k, s \ge t\} \perp C_t \mid X_{t^-},\ Y_{t-1} = 0. \tag{A3}
$$

11 The results of this paper concern the identification and estimation of this parameter of interest. Contrasting sequences with respect to a starting point, $t > 1$, can also be considered for a population that follows the same treatment sequence in the time points before the new starting point of interest. One can also consider the ATE for subpopulations, such as $\mathrm{ATET}^{\bar{s}_t, \bar{s}^*_t}(s') = \Pr(Y^{\bar{s}_t} = 0 \mid S = s') - \Pr(Y^{\bar{s}^*_t} = 0 \mid S = s')$.

This is similar to Assumption (A1), except that this assumption concerns right-censoring instead of treatment assignment. Importantly, the right-censoring is allowed to depend on time-varying covariates, such as local labor market conditions. As for Assumption (A1), the plausibility of Assumption (A3) depends on the exact censoring process and on the information available. In terms of identification we have:

Theorem 2 (Identification with right-censoring). Suppose that (A1), (A2) and (A3) hold. Then

$$
\begin{aligned}
\mathrm{ATET}_2(1) ={}& E_{X_{1^-} \mid S=1}\Big[ E_{X_{2^-} \mid X_{1^-},\, Y_1=0,\, S=1,\, C_1=0}\big\{ \Pr(Y_2 = 0 \mid Y_1 = 0, S = 1, C_2 = 0, X_{2^-}) \big\} \\
&\qquad \times \Pr(Y_1 = 0 \mid S = 1, C_1 = 0, X_{1^-}) \Big] \\
&- E_{X_{1^-} \mid S=1}\Big[ E_{X_{2^-} \mid X_{1^-},\, Y_1=0,\, S=1,\, C_1=0}\big\{ \Pr(Y_2 = 0 \mid Y_1 = 0, S > 2, C_2 = 0, X_{2^-}) \big\} \\
&\qquad \times \Pr(Y_1 = 0 \mid S > 1, C_1 = 0, X_{1^-}) \Big].
\end{aligned}
$$

Proof. See Appendix B. □

As before, only not-yet treated individuals are used to estimate the counterfactual outcome under no treatment. In cases with right censoring, the control group successively changes both because some previously non-treated individuals start treatment and because some non-treated individuals are right-censored. In both cases the process is assumed to be random conditional on the possibly time-varying covariates, and this allows us to adjust for the selection that occurs due to the successive treatments and right censoring.

The DIPW estimator can also be adjusted to accommodate right-censoring. Under Assumptions (A1)–(A3), the DIPW estimator is:

$$
\begin{aligned}
\widehat{\mathrm{ATET}}_t(s) ={}& \prod_{k=s}^{t}\left[1 - \frac{\sum_i \dfrac{1}{\prod_{m=s+1}^{k}[1-\hat{c}_m(X_{i,m^-})]}\, Y_{k,i}\,\mathbf{1}(Y_{k-1,i}=0)\,\mathbf{1}(S_i=s)\,\mathbf{1}(C_{k,i}=0)}{\sum_i \dfrac{1}{\prod_{m=s+1}^{k}[1-\hat{c}_m(X_{i,m^-})]}\, \mathbf{1}(Y_{k-1,i}=0)\,\mathbf{1}(S_i=s)\,\mathbf{1}(C_{k,i}=0)}\right] \\
&- \prod_{k=s}^{t}\left[1 - \frac{\sum_i \dfrac{\hat{p}_s(X_{i,s^-})}{1-\hat{p}_s(X_{i,s^-})}\,\dfrac{1}{\prod_{m=s+1}^{k}[1-\hat{p}_m(X_{i,m^-})][1-\hat{c}_m(X_{i,m^-})]}\, Y_{k,i}\,\mathbf{1}(Y_{k-1,i}=0)\,\mathbf{1}(S_i>k)\,\mathbf{1}(C_{k,i}=0)}{\sum_i \dfrac{\hat{p}_s(X_{i,s^-})}{1-\hat{p}_s(X_{i,s^-})}\,\dfrac{1}{\prod_{m=s+1}^{k}[1-\hat{p}_m(X_{i,m^-})][1-\hat{c}_m(X_{i,m^-})]}\, \mathbf{1}(Y_{k-1,i}=0)\,\mathbf{1}(S_i>k)\,\mathbf{1}(C_{k,i}=0)}\right]
\end{aligned}
$$

with $c_t(X_i) = \Pr(C_i = t \mid X_i, S_i \ge t, Y_{t-1,i} = 0)$, i.e. the censoring probability in period $t$ is similar to a propensity score. $\hat{c}_t(X_i)$ denotes estimated censoring probabilities. Note that any selective right censoring affects both the treated and the not-yet treated, so that the outcomes of both groups are re-weighted.
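The weights implied by the censoring-adjusted estimator can be computed as in the sketch below. It assumes that the treatment and censoring probabilities have already been estimated and stored in two hypothetical arrays, `pscore` and `cscore`, whose column `m` holds period `m`; the function name is likewise an illustration only.

```python
import numpy as np

def censoring_adjusted_weights(pscore, cscore, s, k):
    """Weights for period k when evaluating treatment start at s.

    pscore[i, m] : estimated probability of starting treatment in period m
    cscore[i, m] : estimated probability of being right-censored in period m
    Returns (w_treated, w_control), one weight per individual.
    """
    later = slice(s + 1, k + 1)                        # periods m = s+1, ..., k
    stay_uncensored = np.prod(1.0 - cscore[:, later], axis=1)
    stay_untreated = np.prod(1.0 - pscore[:, later], axis=1)
    odds = pscore[:, s] / (1.0 - pscore[:, s])
    w_treated = 1.0 / stay_uncensored                  # treated: censoring only
    w_control = odds / (stay_untreated * stay_uncensored)
    return w_treated, w_control
```

For `k == s` both products run over an empty range and equal one, so the treated are unweighted and the controls receive the usual odds weight.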

2.6. Trimming

It is well known that IPW estimation is sensitive to extreme values of the propensity scores (p-scores), since single observations receive too large weight (see e.g., Busso et al., 2014; Frölich, 2004; Huber et al., 2013). In this paper, the weights are a function of the product of several propensity scores, which exaggerates problems with extreme values if units with extreme values for one propensity score also have extreme values for other scores. It is, therefore, important to perform some kind of trimming. Here, we propose to use a variant of the three-step trimming approach in Huber et al. (2013). For average treatment effects, their approach implies setting the weights to zero for all treated (controls) whose share of the sum of all weights in the treatment (control) group is greater than $t$% (e.g., 4%). Then, the weights are normalized again. Finally, treated (control) observations whose propensity score is smaller than the smallest score in the control (treatment) group or whose score is larger than the largest score in the control (treatment) group are removed.

In this paper, the IPW weights are a function of several propensity scores and the weights given to a certain individual change with the survival time. We therefore consider a slightly modified version of the trimming approach in Huber et al. (2013). Firstly, using the $t$% rule, we obtain the cut-off values $w(s,k)$ for all $s \le k \le t$. Thereafter, we only use individuals whose weights are below $w(s,k)$ in all time periods (between $s$ and $t$). This ensures that extreme values are discarded and that the same types of individuals are discarded in both treatment arms. Secondly, we apply the minimum and maximum logic to the product of the scores. We exclude observations whose product of the estimated scores of non-treated is smaller (greater) than the maximum (minimum) of the minimum (maximum) product of non-treated among the treated and controls (for time periods between $s$ and $t$).
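The first trimming step can be illustrated as follows. This sketch reflects one reading of the rule, namely that an individual is dropped if the individual's share of the total weight in the risk set exceeds the threshold in any period between $s$ and $t$. The function and argument names are hypothetical, and the second step (the minimum and maximum logic on the product of the scores) is not implemented here.

```python
import numpy as np

def keep_after_share_trimming(weights, in_group, max_share=0.04):
    """Return a boolean mask of individuals retained by the share rule.

    weights[i, j]  : weight of individual i in the j-th period between s and t
    in_group[i, j] : True if i belongs to the weighted risk set in that period
    max_share      : largest admissible share of the period's total weight
    """
    w = np.where(in_group, weights, 0.0)           # ignore non-members
    totals = w.sum(axis=0, keepdims=True)          # total weight per period
    share = np.divide(w, totals, out=np.zeros_like(w), where=totals > 0)
    return (share <= max_share).all(axis=1)        # must pass in every period
```

After trimming, the weights are normalized again automatically when the estimator divides by the sum of the remaining weights.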

2.7. Aggregated effect

$\mathrm{ATET}_t(s)$ provides estimates for each separate pre-treatment duration. From a policy perspective, the overall effect is also of interest, that is the average effect on the treated averaged over all pre-treatment durations. Specifically, the overall effect on the probability of surviving $t'$ time periods after the start of the treatment can be estimated as:

$$
\mathrm{ATET}_{t'} = \sum_{s} P(s)\, \mathrm{ATET}_{s+t'}(s),
$$

i.e., as an average over relevant pre-treatment durations. Here, $P(s) = n_s / \sum_s n_s$, where $n_s$ is the number of treated at $s$, so that $P(s)$ is the fraction in the sample of treated starting treatment at $s$.
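As a small worked illustration of the aggregation formula, the sketch below averages duration-specific estimates using the share of treated starting at each duration. The function name and the numbers in the usage example are made up for illustration and are not results from the paper.

```python
def aggregate_atet(atet_by_start, n_treated_by_start):
    """Weighted average of ATET_{s+t'}(s) over start periods s.

    atet_by_start[s]      : estimate for those starting treatment at s
    n_treated_by_start[s] : number of treated starting treatment at s
    """
    total = sum(n_treated_by_start[s] for s in atet_by_start)
    return sum(n_treated_by_start[s] / total * atet_by_start[s]
               for s in atet_by_start)

# Hypothetical inputs: 30 treated start at s=1, 10 start at s=2.
overall = aggregate_atet({1: 0.10, 2: 0.20}, {1: 30, 2: 10})
```

With these hypothetical inputs the shares are 0.75 and 0.25, so the aggregate is 0.75 × 0.10 + 0.25 × 0.20 = 0.125.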

3. Sequential treatments with duration outcomes

So far we have considered the case with one single treatment. By focusing on one single treatment, we were able to focus on the key identification and estimation issues. However, in many cases we are interested in the effects of sequences of treatments. For instance, in Sweden many unemployed workers participate in more than one ALMP during one unemployment spell, including participation in different programs and participation in the same program more than once. We therefore consider an extension of the results for one single treatment into a setting with sequences of multiple treatments.

We continue to focus on a survival time setting. Formally, in each period there are $M$ mutually exclusive treatments and no treatment. Treatment in time period $t$ is denoted by $S_t$ and we use the notation $\bar{s}_t$ for a particular sequence of treatments. We have the binary potential outcomes $Y_t(\bar{s})$, as indicators of a transition in period $t$ if the treatment sequence had been $\bar{s}$. As before, $\bar{A}_t$ is used to denote a sequence $\bar{A}_t = \{A_1, \ldots, A_t\}$ and $X_{t^-}$ is the observed covariates at $t$.

In a survival time setting, there are several interesting average effects of sequences of treatments. One alternative is to study the average effect on conditional transition probabilities, usually referred to as the transition rate or the hazard rate. However, it is well known that the identification of such conditional transition rates is difficult (see e.g. Van den Berg, 2001). From the data we observe the outcomes for survivors under each treatment sequence, but due to dynamic selection it is difficult to learn about one population of survivors from another population of survivors.

For that reason we focus on average effects on the survival rate and the comparison of the probability to survive from a starting point to time $t$ under one treatment sequence compared with survival throughout the same time interval under a reference treatment sequence. Formally, we have the contrast between the two treatment sequences $\bar{s}_t$ and $\bar{s}^*_t$ that can be any sequences of the $M$ treatments¹¹:

$$
\mathrm{ATE}(\bar{s}_t, \bar{s}^*_t) = E(Y_t(\bar{s}_t) = 0) - E(Y_t(\bar{s}^*_t) = 0). \tag{4}
$$

Note that the ATE captures the full effect of the sequence of treatments. As before, let us use ALMPs to illustrate the main points. Consider the causal contrast between two periods of work practice and two periods with no treatment. The two periods of work practice may affect the two-period survival rate in several ways. For instance, work practice in the first period can affect the job finding rate in both the first and the second period, so that some treated unemployed workers exit from unemployment already in the first period. The framework proposed in this paper captures such first period effects as well as any effects in the second period. As a comparison, note that any analysis using only those who do not find employment before the end of an entire sequence by definition disregards any effect in the first period. By examining effects on the survival rate, we also avoid conditioning the analysis on a sample that completes an entire sequence, since this implies conditioning on survival under a specific series of treatments, that is, conditioning on future outcomes, which introduces additional selection problems.

3.1. Assumptions and identification

Concerning identification, we extend Assumption (A1) to a setting with sequences of treatments. For all $t$, $\bar{s}_{t-1}$ and all $\bar{s}^*_k$, $k \ge t$, with the first $t - 1$ components equal to $\bar{s}_{t-1}$, we assume that

$$
\{Y_k^{\bar{s}^*_k},\ k = t, t+1, \ldots\} \perp S_t \mid x_{t^-},\ \bar{S}_{t-1} = \bar{s}_{t-1},\ Y_{t-1} = 0. \tag{A4}
$$

In other words, for individuals surviving up until $t$ under sequence $\bar{s}_{t-1}$, treatment assignment in period $t$ is random conditional on the observed covariates. This holds in all situations in which decisions are made sequentially based on the survivor experience up to a certain point in time, for instance, if case workers assign unemployed individuals to ALMP programs based on time in unemployment, previous program participation and a set of observed covariates.

We also generalize the no-anticipation assumption into this setting with sequences of treatments. For all $t$, $\bar{s}$, and all $\bar{s}^*$, with the first $t$ components equal to $\bar{s}_t$, we assume that

$$
\Pr(Y_t(\bar{s}) = 1) = \Pr(Y_t(\bar{s}^*) = 1). \tag{A5}
$$

Again, future treatments should not affect current outcomes. In this case, for sequences that are identical up until $t$ the potential outcomes are the same in period $t$ no matter future differences between the two sequences.¹² Our main identification result is summarized in Theorem 3:

Theorem 3 (Identification of $\mathrm{ATE}(\bar{s}_t, \bar{s}^*_t)$). Suppose that (A4) and (A5) hold. Then

$$
\begin{aligned}
\mathrm{ATE}(\bar{s}_2, \bar{s}^*_2) ={}& E_{X_{1^-}}\Big[ E_{X_{2^-} \mid X_{1^-},\, Y_1=0,\, S_1=s_1}\big\{ \Pr(Y_2 = 0 \mid X_{2^-}, Y_1 = 0, \bar{S}_2 = \bar{s}_2) \big\}\, \Pr(Y_1 = 0 \mid X_{1^-}, S_1 = s_1) \Big] \\
&- E_{X_{1^-}}\Big[ E_{X_{2^-} \mid X_{1^-},\, Y_1=0,\, S_1=s^*_1}\big\{ \Pr(Y_2 = 0 \mid X_{2^-}, Y_1 = 0, \bar{S}_2 = \bar{s}^*_2) \big\}\, \Pr(Y_1 = 0 \mid X_{1^-}, S_1 = s^*_1) \Big].
\end{aligned}
$$

Proof. See Appendix B. □

Identification results for an arbitrary number of periods follow in the same way. The identification follows from the fact that the probability of the observed survival of the individuals, conditional on the covariates and the treatment sequence, can be used to estimate the corresponding probability for the potential outcomes under the same sequence. This is similar to the line of reasoning in the case with one single treatment.

3.2. Estimation

We also generalize the DIPW estimator. If Assumptions (A4) and (A5) hold, then we have

$$
\begin{aligned}
\widehat{\mathrm{ATE}}(\bar{s}_t, \bar{s}^*_t) ={}& \prod_{k=1}^{t}\left[1 - \frac{\sum_{i=1}^{N} \hat{w}^{\bar{s}}_{k,i}\, Y_{k,i}\,\mathbf{1}(Y_{k-1,i}=0)\,\mathbf{1}(\bar{S}_{k,i}=\bar{s}_k)}{\sum_{i=1}^{N} \hat{w}^{\bar{s}}_{k,i}\, \mathbf{1}(Y_{k-1,i}=0)\,\mathbf{1}(\bar{S}_{k,i}=\bar{s}_k)}\right] \\
&- \prod_{k=1}^{t}\left[1 - \frac{\sum_{i=1}^{N} \hat{w}^{\bar{s}^*}_{k,i}\, Y_{k,i}\,\mathbf{1}(Y_{k-1,i}=0)\,\mathbf{1}(\bar{S}_{k,i}=\bar{s}^*_k)}{\sum_{i=1}^{N} \hat{w}^{\bar{s}^*}_{k,i}\, \mathbf{1}(Y_{k-1,i}=0)\,\mathbf{1}(\bar{S}_{k,i}=\bar{s}^*_k)}\right]
\end{aligned} \tag{5}
$$

with the estimated weights

$$
\hat{w}^{\bar{s}}_{k,i} = \frac{1}{\hat{p}_{s_1}(X_{i,1^-}) \prod_{m=2}^{k} \hat{p}_{s_m}(X_{i,m^-}, \bar{s}_{i,m-1})},
\qquad
\hat{w}^{\bar{s}^*}_{k,i} = \frac{1}{\hat{p}_{s^*_1}(X_{i,1^-}) \prod_{m=2}^{k} \hat{p}_{s^*_m}(X_{i,m^-}, \bar{s}^*_{i,m-1})}.
$$

Here, $\hat{p}_{s_t}(X_{i,t^-}, \bar{s}_{i,t-1})$ is the probability of obtaining treatment $s_t$ in period $t$ given survival up until $t$ under treatment sequence $\bar{s}_{t-1}$ and covariates $X_{t^-}$. Thus, $p_{s_t}(X_{t^-}, \bar{s}_{t-1}) = \Pr(S_t = s_t \mid X_{t^-}, \bar{S}_{t-1} = \bar{s}_{t-1}, Y_{t-1} = 0)$. Again, one way of obtaining standard errors is to use bootstrapping.

12 Besides Assumptions (A4) and (A5) an overlap condition and SUTVA are also required. Here, the relevant overlap condition is: $0 < \Pr(S_t = s_t \mid X_{t^-}, \bar{S}_{t-1} = \bar{s}_{t-1}, Y_{t-1} = 0) < 1$, $\forall t, \bar{s}_{t-1}, s_t$.

The intuition behind the weights is very similar to the case with one single treatment. In the first period, the only purpose of the weights is to re-weight the outcomes of the individuals on $\bar{s}_t$ and $\bar{s}^*_t$ in order to mimic the distribution of the covariates in the full population. In the second period, the weights also correct for the selective censoring due to treatment assignment in period 2. Specifically, the weights depend on the inverse probability of remaining on the treatment sequence of interest conditional on the observed covariates, so that individuals with covariates such that they are more likely to diverge from the sequence of interest are given larger weight.
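The sequence weights and the weighted survival rate for one treatment sequence can be sketched as below. The array layout is an assumption made for illustration: `p_follow[i, m-1]` is taken to hold the estimated probability that individual `i` receives the period-`m` treatment of the sequence of interest (given covariates, survival and adherence so far), and `on_sequence[i, k-1]` indicates whether the observed treatments through period `k` match that sequence. All names are hypothetical.

```python
import numpy as np

def sequence_weights(p_follow, k):
    """Inverse probability of having followed the sequence through period k."""
    return 1.0 / np.prod(p_follow[:, :k], axis=1)

def sequence_survival(exit_period, on_sequence, p_follow, t):
    """Weighted probability of surviving t periods under the sequence."""
    surv = 1.0
    for k in range(1, t + 1):
        w = sequence_weights(p_follow, k)
        risk = (exit_period >= k) & on_sequence[:, k - 1]   # survivors still on sequence
        exits = exit_period == k
        surv *= 1.0 - np.sum(w[risk] * exits[risk]) / np.sum(w[risk])
    return surv
```

The estimated contrast between two sequences is then the difference between two calls to `sequence_survival`, each with its own `on_sequence` and `p_follow` arrays.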

4. Simulations

This section examines the finite sample properties of the DIPW estimator with a focus on the case with a single treatment.¹³ Data are generated using a logistic model for the hazard rate out of the initial state

$$
\Pr(Y_t = 1 \mid Y_{t-1} = 0, X, V_Y) = \left[1 + \exp\left(-(3.0 + b_Y X + c_Y V_Y)\right)\right]^{-1}
$$

and for the hazard rate into treatment

$$
\Pr(S = t \mid S \ge t, X, V_S) = \left[1 + \exp\left(-(a_S + b_S X + c_S V_S)\right)\right]^{-1},
$$

where $X$ is assumed to be observed by the econometrician, and $V_S$ and $V_Y$ are unobserved. All three are independent random variables drawn from a uniform distribution on the interval $[-1, 1]$. Thus, we allow for unobserved heterogeneity, but simulate under the restriction that the unobserved heterogeneity in the outcome and treatment equations are uncorrelated, so that the unconfoundedness assumption holds.

Three baseline settings are considered: no heterogeneity ($b_Y = b_S = c_Y = c_S = 0$), observed heterogeneity ($b_Y = b_S > 0$, $c_Y = c_S = 0$) and a full heterogeneity setting ($b_Y = b_S > 0$, $c_Y = c_S = 1$). We set $a_S$ equal to $-2.0$ or $-3.0$, i.e. either a high or a low treatment rate.¹⁴ Samples with a size of 10,000 are generated, and the number of replications is 10,000. We consider $\mathrm{ATET}_t(1)$, that is, the effect of treatment in the first period, but similar results are obtained for other enrollment times.

The propensity scores in the DIPW estimator weights are estimated with a correct logistic model specification. The standard errors are calculated using bootstrapping (99 replications).
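A data generating process of this kind can be sketched as follows. Several choices here are assumptions rather than details taken from the text: the intercepts are left as arguments (`a_y`, `a_s`), treatment has no effect on the exit hazard, and within a period treatment assignment is drawn before the exit. The function name and the return layout are hypothetical.

```python
import numpy as np

def simulate(n, periods, a_y, a_s, b_y, b_s, c_y, c_s, seed=0):
    """Draw exit and treatment-start periods from two logistic hazards.

    Returns (x, exit_period, start_period); unobserved times are np.inf.
    """
    rng = np.random.default_rng(seed)
    # X is observed, V_Y and V_S are unobserved; all uniform on [-1, 1].
    x, v_y, v_s = rng.uniform(-1.0, 1.0, size=(3, n))
    haz_exit = 1.0 / (1.0 + np.exp(-(a_y + b_y * x + c_y * v_y)))
    haz_treat = 1.0 / (1.0 + np.exp(-(a_s + b_s * x + c_s * v_s)))
    exit_period = np.full(n, np.inf)
    start_period = np.full(n, np.inf)
    for t in range(1, periods + 1):
        alive = np.isinf(exit_period)                 # still in the initial state
        starts = alive & np.isinf(start_period) & (rng.uniform(size=n) < haz_treat)
        start_period[starts] = t
        exits = alive & (rng.uniform(size=n) < haz_exit)
        exit_period[exits] = t
    return x, exit_period, start_period
```

Because `v_y` and `v_s` are drawn independently, selection into treatment depends on the outcome only through the observed `x`, which is the unconfoundedness restriction described above.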

4.1. Related estimators

In the simulations we compare the performance of the DIPW estimator with the two-step matching estimator in Fredriksson and Johansson (2008) (FJ henceforth) and the blocking estimator in Crépon et al. (2009) (CFJV henceforth). Both papers propose estimators of the same average effects that are considered in this paper.

13 The working paper version of this paper reports simulation results for sequences of treatments (Vikström, 2015).

14 In the case with $b_Y = b_S = c_Y = c_S = 1$ this implies that about 14% and 6% respectively start treatment in each period.


Fig. 1. Simulation results 1 – by observed and unobserved covariate impact. Note: Bias for $\mathrm{ATET}_{10}(1)$, i.e. the average effect of treatment in the first period on survival over 10 periods. $b_Y$ and $b_S$ measure the impact of the observed covariate in the exit rate and treatment rate equations, respectively. $c_Y = c_S = 0$ implies no unobserved heterogeneity and $c_Y = c_S = 1$ includes unobserved heterogeneity. The data generating processes for the logistic simulation models are described in the text. DIPW is the dynamic IPW estimator and FJ the Fredriksson and Johansson (2008) estimator. Samples are of size 10,000 and results are based on 10,000 replications.


16 The intuition is that the FJ estimator censors the not-yet treated when they become

The first step of the FJ estimator is one-to-one matching of treated at a specific duration, $s$, to non-treated survivors at $s$. Then, the samples of matched treated and matched controls are used to construct estimates of the hazard rates under treatment and no-treatment. In this step, any matched control starting treatment after $t$ is considered right-censored when (s)he starts treatment. As in FJ, 1-nearest neighbor propensity score matching is used for the FJ estimator, and the scores are estimated using logistic regression models.

In the CFJV blocking estimator, propensity scores are estimated for each time to treatment, $s$. Then, the treated at $s$ and the non-treated at $s$ are divided into blocks based on the predicted scores. For block $b$, the average effect on the survival function is

$$
\prod_{k=s}^{t}\left[1 - \frac{\sum_{i \in J_b} Y_{k,i}\,\mathbf{1}(Y_{k-1,i}=0)\,\mathbf{1}(S_i=s)}{\sum_{i \in J_b} \mathbf{1}(Y_{k-1,i}=0)\,\mathbf{1}(S_i=s)}\right] - \prod_{k=s}^{t}\left[1 - \frac{\sum_{i \in J_b} Y_{k,i}\,\mathbf{1}(Y_{k-1,i}=0)\,\mathbf{1}(S_i>k)}{\sum_{i \in J_b} \mathbf{1}(Y_{k-1,i}=0)\,\mathbf{1}(S_i>k)}\right],
$$

where $J_b$ denotes the set of indices for all individuals in block $b$. Note that the hazard rate in each period is the fraction leaving the initial state among the treatment group and the control group, of which the latter only consists of not-yet treated individuals. The overall effect is obtained by averaging over all the blocks.¹⁵

15 For the CFJV estimator the scores can be estimated in several different ways. CFJV uses a proportional hazard model with a piecewise constant baseline function and unobserved heterogeneity. Another way, utilized here, is to estimate separate logit models for each $s$, using only the survivors at each $s$. Note that data are generated using logistic models, so this implies using the correct functional form. We use ten blocks and standard errors are obtained using bootstrapping (99 replications).

4.2. Simulation results

Initially, Fig. 1 compares the bias of the DIPW estimator and the FJ estimator for a selection of values of $b_Y$ and $b_S$, with and without unobserved heterogeneity. We report results for survival over ten periods, and note that low (high) values of $b_Y$ and $b_S$ correspond to limited (extensive) observed heterogeneity. The results in the figure show that the bias of the DIPW estimator is small in all cases. In the no heterogeneity setting (with $b_Y = b_S = 0$) the bias of the FJ estimator is small, but with observed heterogeneity in the model the FJ estimator is biased. This is in line with the theoretical results in the working paper version of this paper, which shows that FJ ignores a selective censoring problem, leading to bias in many relevant settings (Vikström, 2014).¹⁶ The bias of the FJ estimator increases in $b_Y$ and $b_S$, so that more pronounced observed heterogeneity leads to greater bias, and the bias is reinforced by uncorrelated unobserved heterogeneity.

These properties are confirmed by the full simulation results reported in Table 1. The results in this table for tests with a nominal size of 5% show that the DIPW estimator also has the correct size. Table 1 also reports results for the CFJV estimator. We find that for our three baseline settings the bias of the CFJV estimator is small and the tests have the correct size. Additional exploration of the CFJV estimator reveals that this holds for both limited and extensive observed heterogeneity as well as for a low and high treatment rate.

Besides the three baseline models, a model with two covariates (both uniform on $[-1, 1]$) with time-varying selection is considered:

$$
\begin{aligned}
\Pr(Y_t = 1 \mid Y_{t-1} = 0, X_1, X_2, V_Y) &= \left[1 + \exp\left(-(3.0 + X_1 + X_2 + V_Y)\right)\right]^{-1} \\
\Pr(S = t \mid S \ge t, X_1, X_2, V_S) &= \left[1 + \exp\left(-(2.0 + X_1 + V_S)\right)\right]^{-1}, \quad t = 1 \\
\Pr(S = t \mid S \ge t, X_1, X_2, V_S) &= \left[1 + \exp\left(-(2.0 + X_2 + V_S)\right)\right]^{-1}, \quad t > 1,
\end{aligned}
$$

i.e. one covariate only affects selection into treatment in the first period and the other covariate affects selection in all subsequent periods. The results from this model, reported in the fourth panel of Table 1, show that the bias of the DIPW is small, but that both the FJ estimator and the CFJV estimator are severely biased. The intuition behind the results for the CFJV blocking estimator is that the blocking is only based on the selection mechanism at $t = 1$ and not on the subsequent selection into treatment, and this leads to bias in the current setting with a time-varying impact of the covariates.¹⁷ Another advantage of the DIPW estimator compared to the CFJV estimator is that for the CFJV the researcher has to decide upon the number of blocks and it is unclear how to select the optimal number of blocks.

5. Application on Swedish ALMPs

This section illustrates the DIPW estimator using data on a work practice program and a training program governed by the Swedish public employment service (PES).

5.1. The programs

The aim of the work practice program is to provide long-term unemployed individuals with practical experience and employer contacts in order to maintain and strengthen their productivity. The participants should perform regular tasks at regular firms, even though they are not employed by them. The duration of the program does not normally exceed six months, and is in fact usually much shorter. Participants receive a grant for their participation in this and the other programs. Those who are entitled to unemployment insurance (UI) benefits receive a grant equal to their UI benefits.

The main purpose of the training program is to improve the skills of the unemployed person and thereby enhance their chances of obtaining

treated, but the FJ estimator does not adjust for the fact that this censoring introduces a selection problem. The consequence of this is that individuals with $X$ characteristics that make them less likely to enter treatment will be overrepresented among the not-yet treated, and this confounds the analysis if these $X$ characteristics also affect the outcome.

17 A generalized blocking estimator is to block based on both the propensity score at $t = 1$ and the scores at $t > 1$, addressing the problem raised in our simulations. However, with small samples this is problematic, since then the blocking is based on several propensity scores.


Table 1
Simulation results for the DIPW estimator and related estimators.

| t | DIPW size [1] | DIPW bias [2] | DIPW se [3] | FJ size [4] | FJ bias [5] | FJ se [6] | CFJV size [7] | CFJV bias [8] | CFJV se [9] |
|---|---|---|---|---|---|---|---|---|---|
| No heterogeneity ($b_Y = b_S = c_Y = c_S = 0$) | | | | | | | | | |
| 1 | 0.053 | 0.0002 | 0.0065 | 0.063 | −0.0044 | 0.0092 | 0.060 | −0.0095 | 0.0094 |
| 2 | 0.053 | −0.0056 | 0.0089 | 0.063 | −0.0013 | 0.013 | 0.056 | −0.030 | 0.013 |
| 5 | 0.056 | 0.0063 | 0.013 | 0.066 | −0.0089 | 0.019 | 0.051 | −0.011 | 0.018 |
| 10 | 0.055 | 0.0014 | 0.016 | 0.066 | 0.023 | 0.026 | 0.050 | 0.019 | 0.022 |
| Observed heterogeneity ($b_Y = b_S > 0$, $c_Y = c_S = 0$) | | | | | | | | | |
| 1 | 0.055 | 0.0082 | 0.0066 | 0.063 | 0.0048 | 0.0095 | 0.053 | 0.0053 | 0.0068 |
| 2 | 0.051 | 0.0093 | 0.0089 | 0.062 | −0.05 | 0.013 | 0.056 | −0.0092 | 0.0094 |
| 5 | 0.057 | 0.015 | 0.013 | 0.068 | −0.38 | 0.020 | 0.053 | −0.021 | 0.013 |
| 10 | 0.056 | 0.017 | 0.016 | 0.10 | −1.3 | 0.026 | 0.048 | −0.036 | 0.016 |
| Unobserved and observed heterogeneity ($b_Y = b_S > 0$, $c_Y = c_S = 1$) | | | | | | | | | |
| 1 | 0.051 | −0.00008 | 0.0069 | 0.075 | −0.0013 | 0.010 | 0.052 | 0.0087 | 0.0072 |
| 2 | 0.055 | −0.0021 | 0.0093 | 0.076 | −0.0097 | 0.014 | 0.052 | −0.0041 | 0.0096 |
| 5 | 0.055 | −0.0078 | 0.013 | 0.095 | −0.72 | 0.019 | 0.054 | −0.0044 | 0.013 |
| 10 | 0.052 | −0.013 | 0.015 | 0.16 | −2.0 | 0.025 | 0.051 | −0.037 | 0.016 |
| Time-varying selection effect of covariates | | | | | | | | | |
| 1 | 0.054 | 0.0007 | 0.0070 | 0.071 | −0.0011 | 0.010 | 0.052 | −0.01 | 0.0073 |
| 2 | 0.062 | 0.017 | 0.0095 | 0.071 | −0.11 | 0.013 | 0.053 | −0.11 | 0.0096 |
| 5 | 0.055 | 0.012 | 0.013 | 0.093 | −0.71 | 0.019 | 0.091 | −0.75 | 0.013 |
| 10 | 0.052 | 0.028 | 0.015 | 0.16 | −1.8 | 0.025 | 0.24 | −1.8 | 0.015 |

Note: The data generating processes for the logistic simulation models are described in Section 4. DIPW estimates with bootstrapped standard errors (99 replications). FJ is the Fredriksson and Johansson (2008) matching estimator implemented using 1-nearest neighbor propensity score matching. CFJV is the Crépon et al. (2009) blocking estimator applied using 10 blocks and bootstrapped standard errors (99 replications). Bias has been multiplied by 100. Size is for 5% level tests. The results are based on 10,000 replications.


a job. The contents of the courses should be directed towards the upgrading of skills or the acquisition of skills that are in short supply or that are expected to be in short supply. These can be computer skills, technical skills, manufacturing skills, and skills in services and medical health care.

5.2. Data and estimation details

The population is taken from the register Händel administrated by the PES, which includes all job seekers in Sweden. The register contains daily information on the time when an individual (i) became unemployed, (ii) entered into a labor market program and (iii) exited from unemployment. It also includes information on the reason for the exit (employment, education, social assistance, disability or sickness insurance programs and lost contact), and personal characteristics recorded at the beginning of the unemployment spell. To these data we merge information on marital status, household characteristics (e.g., number of children), labor income and income from various insurance schemes (e.g., sickness and disability) from the population registers. We also use the unemployment records to construct detailed information on previous unemployment and short-term labor market history (e.g., the number and length of previous spells).

We sample all unemployed individuals in Händel between January 1, 1999 and December 31, 2006 who were aged between 25 and 55 at the time of entry into unemployment. The study ends on April 8, 2011. The spell ends when the unemployed individual finds employment for a minimum period of 30 days. Spells with exits for reasons other than employment, such as lost contact, sickness or end of study, are censored. We aggregate the daily spell data to 30-day intervals. In robustness analyses we also use 15-day and 60-day intervals. We focus on the effects of work practice programs when work practice is the first program during the unemployment spell. Individuals entering another program before work practice are right censored.
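The aggregation of daily spell data into discrete periods can be illustrated with the small sketch below. The conventions are assumptions made for this example and are not taken from the data description: elapsed days are counted from 1, days 1 to 30 form period 1, and any exit reason other than "employment" is flagged as censored. The function and field names are hypothetical.

```python
def to_period(day, width=30):
    """Map a 1-based elapsed day to a 1-based interval index."""
    return (day - 1) // width + 1

def discretize_spell(exit_day, exit_reason, program_day=None, width=30):
    """Convert one daily spell record into period-level quantities."""
    return {
        "exit_period": to_period(exit_day, width),
        "start_period": None if program_day is None else to_period(program_day, width),
        "censored": exit_reason != "employment",   # non-employment exits are censored
    }
```

Changing `width` to 15 or 60 gives the alternative interval lengths mentioned for the robustness analyses.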

We use logit regression models to estimate the propensity scores andpply the trimming rule described in Section 2.6 ( t is set to 1%).

49

.3. Identification

In order to apply the DIPW estimator, two main assumptions have to be fulfilled: sequential unconfoundedness and no-anticipation. In this ALMP program setting, the sequential unconfoundedness assumption implies that conditional on the covariates treatment assignments should be random among the non-treated survivors. For individuals still unemployed in a given period treatment assignments in that period should be unrelated to future outcomes once we condition on the observed covariates. The credibility of this assumption depends to a large extent on the covariates that are controlled for in the analysis. We control for baseline characteristics, including gender, age, age squared, an indicator for at least one child in the household, marital status, country of origin (3 categories), and level of education (5 categories). We also control for time of entry into unemployment (inflow year dummies), regional variation (22 regions), local unemployment rate, unemployment benefits entitlement, short-term labor market history (labor income, social assistance and unemployment insurance benefits 1 and 2 years before unemployment), and medium-run unemployment history (number of unemployment days in the last 5 years).

In the baseline analysis we only use time-invariant covariates. One reason for this is that most of the characteristics used in the analyses do not change over time. For instance, the level of education and pre-unemployment labor market history remain the same throughout the unemployment spell. Later we explore two sets of time-varying covariates. First, we allow the local unemployment rate to vary over time, since labor market conditions are important for treatment assignments and these conditions may vary over time. Second, we include time-varying covariates measuring program participation before entering the work practice program. We use a time-varying indicator for any previous program participation, but we have also explored more elaborate models. In this analysis, we study all work practice episodes, whereas in the main analysis we focus on cases where work practice is the first program during the unemployment spell.

The variables used in the analysis are selected based on the results from previous studies. For instance, Heckman et al. (1998) and Heckman and Smith (1999) stress that besides using basic socioeconomic variables, it is also very important to control for previous unemployment, lagged earnings, and local labor market characteristics. More recently, Lechner and Wunsch (2013) use German data and Empirical Monte Carlo methods to perform an innovative examination of which variables are important to control for in evaluations of ALMP programs. They conclude that it is important to control for baseline characteristics, information about the timing of entry into unemployment and the program, regional information, pre-treatment outcomes (e.g., employment and earnings four years before the program) and short-run labor market history. Here, we control for similar types of variables, but there are some differences compared to Lechner and Wunsch (2013). The main difference is that they include more detailed information on the short-run labor market history. In this paper we control for labor income, social assistance and unemployment insurance benefits one and two years before the start of the unemployment spell. We have also explored using more detailed controls in a similar way as Lechner and Wiehler (2013), but the results did not change in any significant way. 18

20 These patterns are explained by a dynamic selection problem within the intervals. With 60-day intervals the treated consist of those treated within a 60-day period. The non-treated include those who leave unemployment within the 60-day period without

In addition, we have some specific information on the program selection process in Sweden. The selection into the ALMP programs is a two-sided selection process, involving both the unemployed worker and the caseworker at the local employment office. However, there is some evidence that caseworkers have a large degree of discretionary power over enrollment in Swedish ALMP programs. In an experiment with caseworkers, Eriksson (1997) finds that caseworker heterogeneity is more important for program participation than individual heterogeneity. Using Swedish register data, Carling and Richardson (2001) show that program enrollment depends more on the local employment office affiliation than on the unemployed individuals' own characteristics. All this suggests that individual self-selection into the program is less of a problem, lending additional support to the unconfoundedness assumption. 19

However, caseworkers might also have and use additional information about the unemployed workers. For instance, since caseworkers interact with the unemployed workers they learn about their personality traits and individual motivation. This and other unobserved variables are likely to affect treatment assignments, which would invalidate the unconfoundedness assumption. Using German data, Caliendo et al. (2017) exploit survey information on personality traits, expectations, attitudes and job search behaviour, and conclude that such survey information makes little difference compared to only using the rich administrative data. We also argue that previous outcomes are good proxies for individual motivation, since motivation should affect both previous and current outcomes. All this suggests that the sequential unconfoundedness assumption is fulfilled, but we can of course never rule out that there is some other unobserved factor that affects both treatment assignments and the outcome of interest.

No-anticipation holds if the unemployed workers do not alter their behaviour as a response to their knowledge of future treatments, or if they are unaware of future treatments. In evaluations of ALMP programs this assumption is problematic since several recent studies have found that the unemployed react to information about future treatments (see e.g., Black et al., 2003). However, in many cases the individual is informed about the work practice program shortly before the start of the program, and this limits any substantial anticipation effects. If the no-anticipation assumption is nevertheless violated, the not-yet treated react to future treatments. If the non-treated dislike future treatments, search intensity should increase and reservation wages decrease, leading to higher transition rates among the not-yet treated in the control group. This will bias the estimated effect downwards.

18 Another difference compared with Lechner and Wiehler (2013) is that we control for age using age and age squared, whereas Lechner and Wunsch (2013) use a set of age dummies for different age groups.

19 The caseworker discretion is problematic if caseworkers who frequently assign unemployed workers to ALMPs are systematically better or worse at helping the unemployed workers to find jobs, since this will introduce a correlation between the probability of becoming treated and unemployment durations.

Besides sequential unconfoundedness and no-anticipation, the overlap condition, SUTVA and sequentially unconfounded censoring need to hold. The sequential censoring assumption implies that, conditional on covariates, any right censoring, for instance due to drop-out from the study, should be random among the group of survivors. Here, the main reason for right censoring is that the PES has lost contact with the unemployed worker. This makes the right censoring assumption somewhat difficult to assess. However, note that any selective right censoring affects both the treated and the non-treated, having an impact on the survival rate estimates under both treatment and no treatment.

.4. Effects of work practice

The estimation results for one work practice episode are presented in Fig. 2. Each figure gives the results for effects after a certain number of months. Note that the results are for the effect on the fraction re-employed instead of the effect on the survival rate. The figure shows that in all cases, there are substantial locking-in effects with lower employment rates during the first months after enrollment. In the first month after assignment the re-employment rate is lower in the treatment group. After this period participants catch up and about 450 days after enrollment the employment rate on average is about 6 percentage points higher among the participants. We also see that the size of the effect varies with enrollment time, but Fig. 2 indicates no clear pattern that early enrollment is better than late enrollment.

This analysis adjusts for a large set of time-invariant covariates. Next, Figs. C.1 and C.2 in Appendix C explore time-varying covariates. Fig. C.1 allows the local unemployment rate to vary over time and Fig. C.2 in addition adds time-varying covariates measuring previous participation in any other program besides work practice. In both cases, the results using time-varying covariates are very similar to our baseline results with only time-invariant covariates. This holds even though both local labor conditions and previous program participation affect treatment assignments.

In our main analysis, the daily unemployment data is aggregated to 30-day intervals. As a robustness analysis, Figure C.3 in Appendix C presents estimates using 15-day and 60-day intervals. The estimated effect of work practice on the re-employment rate is more negative with 60-day intervals and more positive with 15-day intervals compared with 30-day intervals. However, from around 200 days and onwards the estimated effects are very similar with 15-day intervals and 30-day intervals. 20

.5. Effects of sequences of programs

We now consider effects of different sequences of the two programs. Specifically, we explore treatment sequences defined by a specific combination of the first and the second entry time. 21 Table 2 reports the effects of different sequences of the two programs when these sequences are compared with not enrolling into a program. We present results for different times to the first episode and different times between the first

being treated and those who remain non-treated throughout the entire interval. Since the former group is positively selected this creates a dynamic selection, which is more severe with larger intervals.

21 Due to sample size restrictions the data in this analysis is aggregated to 45-day intervals. We also adjust the trimming rule and set t to 4% and for sequences with two treatments we set t to 20%. Moreover, for sequences defined by a single work practice episode the propensity scores are estimated separately in each time period for the censoring due to a first program episode and thereafter the propensity scores are estimated jointly. The latter joint logit model includes time period dummies. Similar joint estimations are performed when the sequence consists of two treatments.


Fig. 2. Effect of work practice on fraction re-employed. By pre-treatment duration. Note: DIPW estimates with bootstrapped standard errors (99 replications).

Table 2

Treatment sequences and cumulative employment rates. By type of programs, timing and spacing of the programs.

Columns: [1] no second program; [2]–[4] work practice as second program, with 45–135, 136–225 and 226–315 days between programs; [5]–[7] training as second program, with 45–135, 136–225 and 226–315 days between programs. Rows give the time to the first program. Entries are listed in column order, each followed by the value reported in parentheses; rows with fewer than seven entries have empty cells.

Panel A: Work practice first program

91–180 days: −0.93 (0.14), 0.26 (0.74), 0.09 (0.44), 0.17 (0.39), 0.16 (0.70), 3.17 (0.68), −0.53 (0.62)

181–270 days: −1.32 (0.20), −0.48 (0.85), −0.75 (0.54), −0.39 (0.70), 0.70 (1.12), 2.12 (0.72), 0.45 (1.19)

271–360 days: −0.78 (0.21), 0.44 (0.89), 0.27 (0.91), 2.36 (1.24), 1.08 (1.14)

Panel B: Training first program

91–180 days: −1.44 (0.13), −1.81 (1.04), −0.93 (0.75), 0.11 (0.81), 0.37 (0.74)

181–270 days: −1.86 (0.17), −3.34 (0.93), −2.38 (2.08), 0.56 (1.12), −0.40 (1.25)

271–360 days: −1.90 (0.21), −2.48 (2.37), −1.17 (1.95)

Note: The table reports cumulated re-employment rates for the first 36 months of unemployment. The comparison sequence is participating in no program. Swedish data for the period 1999–2006. The covariates used in the weighting are gender, age, number of unemployment days in the last 5 years, level of education (3 categories), indicator for UI entitlement, region of residence (6 regions), indicator for at least one child in the household, marital status, foreign born, labor income, social assistance and unemployment benefits one and two years before the start of the unemployment, and calendar year (for inflow). The threshold for the trimming rate is 4% for the single program estimates and 10% for combinations of two programs.


and the second program. 22 For presentation reasons, we average the effects over pre-treatment intervals and report effects on cumulated re-employment rates (first 36 months). The cumulated effects are scaled in such a way that they can be interpreted as the effect on the average truncated unemployment duration in months.

22 Note that the time between the two programs refers to the time between the start of the two programs, so that the actual time between the two programs is significantly shorter.
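One way to read this scaling is through the identity that, for a discrete duration T, the mean duration truncated at τ periods equals the sum of the survival probabilities Pr(T > t) for t = 0, ..., τ − 1. The Python sketch below illustrates that identity only; it is not taken from the paper, and the function names and the toy survival curves are hypothetical.

```python
def truncated_mean_duration(survival):
    """Return E[min(T, tau)] from survival[t] = Pr(T > t), t = 0..tau-1.

    survival[0] must equal 1: everyone is still unemployed at t = 0.
    """
    if not survival or abs(survival[0] - 1.0) > 1e-12:
        raise ValueError("survival[0] must equal 1")
    return sum(survival)


def effect_on_truncated_duration(surv_treated, surv_untreated):
    """Difference in truncated mean duration (treated minus untreated).

    A negative value means the treatment shortens time in unemployment.
    """
    if len(surv_treated) != len(surv_untreated):
        raise ValueError("both survival curves must cover the same periods")
    return (truncated_mean_duration(surv_treated)
            - truncated_mean_duration(surv_untreated))


# Toy curves over three periods (made-up numbers, for illustration only).
treated = [1.0, 0.5, 0.25]
untreated = [1.0, 0.75, 0.5]
print(effect_on_truncated_duration(treated, untreated))
```

Because the fraction re-employed is one minus the survival rate, summing differences in re-employment rates over the periods gives the same number with the opposite sign.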

Initially, consider the results in columns 2–4 of Panel A for different sequences of work practice episodes. Comparing these estimates with the estimates for one single work practice episode in column 1 we see that in all cases a second program episode leads to a smaller decrease of time in unemployment compared with one single work practice episode. This holds for all the timings of the first program and for all the spacings between the first and the second program episode, even though these differences are not always significant.

Columns 5–7 of Panel A present results for sequences with first work practice and then training. 23 From a comparison of the results in column 1 for only one single episode of work practice and the results in the other columns, we see that such combinations of work practice and training in all cases lead to a less favorable employment effect than one single work practice episode.

Panel B reports results when training is the first program in the sequence. From column 1 we see that a single training program leads to shorter unemployment durations. We also see that later enrolment in training is associated with greater employment effects. Next, the results for sequences of training episodes in columns 2–4 of Panel B show that repeated episodes of training seem to be a waste of resources. One possible explanation is that another training episode only gives rise to an additional locking-in period and this is not counteracted by greater positive post-program effects of the second training episode.

Finally, columns 5–7 of Panel B show results for sequences with first training and then work practice. We find some evidence that work practice after completed training can reduce unemployment more than just one single training episode. One reason for this can be that work practice in the occupation for which the individuals have been trained can serve as a quick way into a new type of occupation or into a new type of labor market.

. Conclusions

This paper has re-considered treatment evaluation under unconfoundedness in a dynamic treatment assignment setting in which treatment may start at any point in time. The outcome of interest is survival time and together with the dynamic treatment assignment this introduces well-known methodological issues. We have proposed a DIPW estimator to estimate average effects in this setting, focusing both on the effects of a single treatment and of sequences of treatments. The new estimator involves separate weights for each time period and the use of the not-yet treated in each period to estimate the counterfactual survival rate.

An analysis of data from two Swedish ALMP programs illustrates the DIPW estimator. Since the ALMP programs can start at any elapsed unemployment duration and the outcome of interest is the time in unemployment this offers a key application of the estimators introduced in this paper. The result is that participation in the work practice program leads to significantly increased employment rates compared with never receiving treatment.

We have also studied effects of sequences of work practice and labor market training. One key result is that enrolling an unemployed individual twice in the same program or in two different programs in most cases leads to longer unemployment spells than only participating in a single program once. This holds for most timings of the first program and most spacings between the two program episodes.

23 Since the training program normally lasts for several months we are unable to consider sequences where the second program starts less than 136 days after the start of the training program.

Appendix A. Proofs

1. Identification

We show that if Assumptions (A1) and (A2) hold then $\mathrm{ATET}_t(s)$ is identified. Consider identification of

\[
\mathrm{ATET}_t(s) = \Pr(Y_t(s) = 0 \mid S = s, Y_{s-1}(s) = 0) - \Pr(Y_t(0) = 0 \mid S = s, Y_{s-1}(s) = 0) \tag{A.1}
\]

for $t = 2$ and $s = 1$. First, for treatment we have

\[
\Pr(Y_2(1) = 0 \mid S = 1) = \Pr(Y_2 = 0, Y_1 = 0 \mid S = 1). \tag{A.2}
\]

Second, for the counterfactual outcome $\Pr(Y_2(0) = 0 \mid S = 1)$ under no treatment we have

\[
\Pr(Y_2(0) = 0 \mid S = 1) = E_{X_1^- \mid S=1}\left[\Pr(Y_2(0) = 0 \mid S = 1, X_1^-)\right] \tag{A.3}
\]

\[
\begin{aligned}
&= E_{X_1^- \mid S=1}\left[\Pr(Y_2(0) = 0 \mid Y_1(0) = 0, S = 1, X_1^-)\,\Pr(Y_1(0) = 0 \mid S = 1, X_1^-)\right] \\
&= E_{X_1^- \mid S=1}\left[\Pr(Y_2(0) = 0 \mid Y_1(0) = 0, S > 1, X_1^-)\,\Pr(Y_1(0) = 0 \mid S > 1, X_1^-)\right] \\
&= E_{X_1^- \mid S=1}\left[E_{X_2^- \mid X_1^-, Y_1(0)=0, S=1}\left\{\Pr(Y_2(0) = 0 \mid Y_1(0) = 0, S > 1, X_2^-)\right\}\Pr(Y_1(0) = 0 \mid S > 1, X_1^-)\right] \\
&= E_{X_1^- \mid S=1}\left[E_{X_2^- \mid X_1^-, Y_1(0)=0, S=1}\left\{\Pr(Y_2(0) = 0 \mid Y_1(0) = 0, S > 2, X_2^-)\right\}\Pr(Y_1(0) = 0 \mid S > 1, X_1^-)\right] \\
&= E_{X_1^- \mid S=1}\left[E_{X_2^- \mid X_1^-, Y_1=0, S=1}\left\{\Pr(Y_2 = 0 \mid Y_1 = 0, S > 2, X_2^-)\right\}\Pr(Y_1 = 0 \mid S > 1, X_1^-)\right].
\end{aligned} \tag{A.4}
\]

Note that the first and the second equality follow from the law of iterated expectations and Bayes' rule, the third equality by applying Assumption (A1) for the first period, the fourth equality by the law of iterated expectations and the fifth holds by applying Assumption (A1) in the second period. Finally, the sixth equality holds by going from potential outcomes to observed outcomes. In this step, Assumption (A2) plays an important role. This assumption assures that the observed outcomes of the not-yet treated correspond to the potential outcome under no treatment, even though some of the not-yet treated become treated later on.

2. Dynamic IPW estimation

We now show that if Assumptions (A1) and (A2) hold the DIPW estimator is a consistent estimator of

\[
\mathrm{ATET}_t(s) = \Pr(Y_t(s) = 0 \mid S = s, Y_{s-1}(s) = 0) - \Pr(Y_t(0) = 0 \mid S = s, Y_{s-1}(s) = 0). \tag{A.5}
\]

First, consider estimation of the outcome under treatment, $\Pr(Y_t(s) = 0 \mid S = s, Y_{s-1}(s) = 0)$. For this part of the $\mathrm{ATET}_t(s)$, the DIPW estimator is:

\[
\prod_{k=s}^{t}\left[1 - \frac{\sum_i Y_{k,i}\,\mathbf{1}(Y_{k-1,i} = 0)\,\mathbf{1}(S_i = s)}{\sum_i \mathbf{1}(Y_{k-1,i} = 0)\,\mathbf{1}(S_i = s)}\right] \tag{A.6}
\]
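The product in (A.6) is a sequence of period-specific exit rates among those treated in period s who are still unemployed. A minimal Python sketch of this computation follows, assuming no right censoring and a hypothetical data layout in which each spell is a pair (S, T), with S the period of treatment start and T the period of exit, so that Y_k = 1 exactly when T ≤ k. The function name `treated_survival` is an illustrative assumption.

```python
def treated_survival(spells, s, t):
    """Estimate Pr(Y_t(s) = 0 | S = s, Y_{s-1}(s) = 0) as in (A.6).

    spells -- iterable of (S, T) pairs: treatment period and exit period
    s, t   -- treatment period and evaluation period, with s <= t
    Censoring is ignored in this sketch.
    """
    survival = 1.0
    for k in range(s, t + 1):
        # Y_{k-1} = 0 means the spell has not ended before period k.
        at_risk = [T for (S, T) in spells if S == s and T >= k]
        if not at_risk:
            raise ValueError(f"no treated survivors at risk in period {k}")
        # Y_k = 1 together with Y_{k-1} = 0 means the exit occurs in period k.
        exits = sum(1 for T in at_risk if T == k)
        survival *= 1.0 - exits / len(at_risk)
    return survival


# Four spells treated in period 1, exiting in periods 1, 2, 3 and 5,
# plus one spell treated in period 2 that is ignored for s = 1.
spells = [(1, 1), (1, 2), (1, 3), (1, 5), (2, 4)]
print(treated_survival(spells, s=1, t=2))
```

Without censoring the product collapses to the share of the treated who are still unemployed after period t, which is a convenient sanity check.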

Taking the probability limit of (A.6) gives

\[
\begin{aligned}
\operatorname*{plim}_{N\to\infty}\prod_{k=s}^{t}\left[1 - \frac{\sum_i Y_{k,i}\,\mathbf{1}(Y_{k-1,i} = 0)\,\mathbf{1}(S_i = s)}{\sum_i \mathbf{1}(Y_{k-1,i} = 0)\,\mathbf{1}(S_i = s)}\right]
&= \prod_{k=s}^{t}\left[1 - \frac{\operatorname*{plim}_{N\to\infty}\sum_i Y_{k,i}\,\mathbf{1}(Y_{k-1,i} = 0)\,\mathbf{1}(S_i = s)}{\operatorname*{plim}_{N\to\infty}\sum_i \mathbf{1}(Y_{k-1,i} = 0)\,\mathbf{1}(S_i = s)}\right] \\
&= \prod_{k=s}^{t}\left[1 - \frac{\mathbb{E}\left[Y_k\,\mathbf{1}(Y_{k-1} = 0)\,\mathbf{1}(S = s)\right]}{\mathbb{E}\left[\mathbf{1}(Y_{k-1} = 0)\,\mathbf{1}(S = s)\right]}\right].
\end{aligned} \tag{A.7}
\]

Next, the observed outcome, $Y_k$, corresponds to the individuals' actual treatment regime, so that for individuals with $S = s$ we have that $Y_k = Y_k(s)$. Then,

\[
\prod_{k=s}^{t}\left[1 - \frac{\mathbb{E}\left[Y_k\,\mathbf{1}(Y_{k-1} = 0)\,\mathbf{1}(S = s)\right]}{\mathbb{E}\left[\mathbf{1}(Y_{k-1} = 0)\,\mathbf{1}(S = s)\right]}\right]
= \prod_{k=s}^{t}\left[1 - \frac{\mathbb{E}\left[Y_k(s)\,\mathbf{1}(Y_{k-1}(s) = 0)\,\mathbf{1}(S = s)\right]}{\mathbb{E}\left[\mathbf{1}(Y_{k-1}(s) = 0)\,\mathbf{1}(S = s)\right]}\right]. \tag{A.8}
\]

By simplifying we obtain

\[
\begin{aligned}
\prod_{k=s}^{t}\left[1 - \frac{\mathbb{E}\left[Y_k(s)\,\mathbf{1}(Y_{k-1}(s) = 0)\,\mathbf{1}(S = s)\right]}{\mathbb{E}\left[\mathbf{1}(Y_{k-1}(s) = 0)\,\mathbf{1}(S = s)\right]}\right]
&= \prod_{k=s}^{t}\left[1 - \frac{\Pr(Y_k(s) = 1, Y_{k-1}(s) = 0, S = s)}{\Pr(Y_{k-1}(s) = 0, S = s)}\right] \\
&= \prod_{k=s}^{t}\left[1 - \Pr(Y_k(s) = 1 \mid S = s, Y_{k-1}(s) = 0)\right] \\
&= \Pr(Y_t(s) = 0 \mid S = s, Y_{s-1}(s) = 0).
\end{aligned} \tag{A.9}
\]

Thus, from (A.7)–(A.9) we have that

\[
\operatorname*{plim}_{N\to\infty}\prod_{k=s}^{t}\left[1 - \frac{\sum_i Y_{k,i}\,\mathbf{1}(Y_{k-1,i} = 0)\,\mathbf{1}(S_i = s)}{\sum_i \mathbf{1}(Y_{k-1,i} = 0)\,\mathbf{1}(S_i = s)}\right]
= \Pr(Y_t(s) = 0 \mid S = s, Y_{s-1}(s) = 0). \tag{A.10}
\]

Second, consider estimation of the outcome under no treatment, $\Pr(Y_t(0) = 0 \mid S = s, Y_{s-1}(s) = 0)$. For this part of the $\mathrm{ATET}_t(s)$ the DIPW estimator is:

\[
\prod_{k=s}^{t}\left[1 - \frac{\sum_i \dfrac{p_s(X_{i,s}^-)}{\prod_{m=s}^{k}\left[1 - p_m(X_{i,m}^-)\right]}\,Y_{k,i}\,\mathbf{1}(Y_{k-1,i} = 0)\,\mathbf{1}(S_i > k)}{\sum_i \dfrac{p_s(X_{i,s}^-)}{\prod_{m=s}^{k}\left[1 - p_m(X_{i,m}^-)\right]}\,\mathbf{1}(Y_{k-1,i} = 0)\,\mathbf{1}(S_i > k)}\right]. \tag{A.11}
\]
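The weighted product in (A.11) can be sketched in a few lines of Python. The sketch assumes that the propensity scores have already been estimated and are supplied per spell, that there is no right censoring, and that no trimming is applied; the dictionary layout with keys `S`, `T` and `p` and the function name `dipw_untreated_survival` are hypothetical and chosen only for illustration.

```python
import math


def dipw_untreated_survival(spells, s, t):
    """Estimate Pr(Y_t(0) = 0 | S = s, Y_{s-1}(s) = 0) as in (A.11).

    Each spell is a dict with
      "S": treatment period (math.inf if never treated),
      "T": exit period (Y_k = 1 exactly when T <= k),
      "p": dict mapping period m to the estimated score p_m(X_m).
    Censoring and trimming are ignored in this sketch.
    """
    survival = 1.0
    for k in range(s, t + 1):
        numerator = 0.0
        denominator = 0.0
        for spell in spells:
            # Not-yet treated survivors: Y_{k-1} = 0 and S > k.
            if spell["T"] >= k and spell["S"] > k:
                weight = spell["p"][s]
                for m in range(s, k + 1):
                    weight /= 1.0 - spell["p"][m]
                denominator += weight
                if spell["T"] == k:  # exit in period k
                    numerator += weight
        if denominator == 0.0:
            raise ValueError(f"no not-yet treated survivors in period {k}")
        survival *= 1.0 - numerator / denominator
    return survival


# Two comparison spells with different scores in period 1 (made-up numbers).
spells = [
    {"S": math.inf, "T": 1, "p": {1: 0.5}},  # weight 0.5 / 0.5 = 1.0, exits
    {"S": math.inf, "T": 3, "p": {1: 0.2}},  # weight 0.2 / 0.8 = 0.25, stays
]
print(dipw_untreated_survival(spells, s=1, t=1))
```

When every spell has the same propensity score the weights cancel and the estimate reduces to the unweighted exit rates among the not-yet treated, which gives a simple check of an implementation.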

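Taking the probability limit of (A.11) gives

\[
\begin{aligned}
&\operatorname*{plim}_{N\to\infty}\prod_{k=s}^{t}\left[1 - \frac{\sum_i \dfrac{p_s(X_{i,s}^-)}{\prod_{m=s}^{k}\left[1 - p_m(X_{i,m}^-)\right]}\,Y_{k,i}\,\mathbf{1}(Y_{k-1,i} = 0)\,\mathbf{1}(S_i > k)}{\sum_i \dfrac{p_s(X_{i,s}^-)}{\prod_{m=s}^{k}\left[1 - p_m(X_{i,m}^-)\right]}\,\mathbf{1}(Y_{k-1,i} = 0)\,\mathbf{1}(S_i > k)}\right] \\
&\quad= \prod_{k=s}^{t}\left[1 - \frac{\operatorname*{plim}_{N\to\infty}\sum_i \dfrac{p_s(X_{i,s}^-)}{\prod_{m=s}^{k}\left[1 - p_m(X_{i,m}^-)\right]}\,Y_{k,i}\,\mathbf{1}(Y_{k-1,i} = 0)\,\mathbf{1}(S_i > k)}{\operatorname*{plim}_{N\to\infty}\sum_i \dfrac{p_s(X_{i,s}^-)}{\prod_{m=s}^{k}\left[1 - p_m(X_{i,m}^-)\right]}\,\mathbf{1}(Y_{k-1,i} = 0)\,\mathbf{1}(S_i > k)}\right] \\
&\quad= \prod_{k=s}^{t}\left[1 - \frac{\mathbb{E}\left[\dfrac{p_s(X_s^-)}{\prod_{m=s}^{k}\left[1 - p_m(X_m^-)\right]}\,Y_k\,\mathbf{1}(Y_{k-1} = 0)\,\mathbf{1}(S > k)\right]}{\mathbb{E}\left[\dfrac{p_s(X_s^-)}{\prod_{m=s}^{k}\left[1 - p_m(X_m^-)\right]}\,\mathbf{1}(Y_{k-1} = 0)\,\mathbf{1}(S > k)\right]}\right].
\end{aligned} \tag{A.12}
\]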

Next, consider $\mathbb{E}\left[\frac{p_s(X_s^-)}{\prod_{m=s}^{k}[1 - p_m(X_m^-)]}\,Y_k\,\mathbf{1}(Y_{k-1} = 0)\,\mathbf{1}(S > k)\right]$ for $k = 2$ and $s = 1$. Then, we need to consider

\[
\mathbb{E}\left[\frac{p_1(X_1^-)}{[1 - p_1(X_1^-)][1 - p_2(X_2^-)]}\,Y_2\,\mathbf{1}(Y_1 = 0)\,\mathbf{1}(S > 2)\right]. \tag{A.13}
\]

If no-anticipation as in Assumption (A2) holds, the observed outcome $Y_k$ for individuals treated after $k$ (individuals with $S > k$) corresponds to the potential outcome under no treatment, $Y_k(0)$, so that

\[
\mathbb{E}\left[\frac{p_1(X_1^-)}{[1 - p_1(X_1^-)][1 - p_2(X_2^-)]}\,Y_2\,\mathbf{1}(Y_1 = 0)\,\mathbf{1}(S > 2)\right]
= \mathbb{E}\left[\frac{p_1(X_1^-)}{[1 - p_1(X_1^-)][1 - p_2(X_2^-)]}\,Y_2(0)\,\mathbf{1}(Y_1(0) = 0)\,\mathbf{1}(S > 2)\right]. \tag{A.14}
\]

Next, by the law of iterated expectations

\[
\mathbb{E}\left[\frac{p_1(X_1^-)}{[1 - p_1(X_1^-)][1 - p_2(X_2^-)]}\,Y_2(0)\,\mathbf{1}(Y_1(0) = 0)\,\mathbf{1}(S > 2)\right]
= \mathbb{E}_{X_1^-}\left(\mathbb{E}\left[\frac{p_1(X_1^-)}{[1 - p_1(X_1^-)][1 - p_2(X_2^-)]}\,Y_2(0)\,\mathbf{1}(Y_1(0) = 0)\,\mathbf{1}(S > 2) \,\Big|\, X_1^-\right]\right). \tag{A.15}
\]

Simplifying gives

\[
\begin{aligned}
&\mathbb{E}_{X_1^-}\left(\mathbb{E}\left[\frac{p_1(X_1^-)}{[1 - p_1(X_1^-)][1 - p_2(X_2^-)]}\,Y_2(0)\,\mathbf{1}(Y_1(0) = 0)\,\mathbf{1}(S > 2) \,\Big|\, X_1^-\right]\right) \\
&\quad= \mathbb{E}_{X_1^-}\left(\frac{p_1(X_1^-)}{1 - p_1(X_1^-)}\,\mathbb{E}\left[\frac{Y_2(0)\,\mathbf{1}(Y_1(0) = 0)\,\mathbf{1}(S > 2)}{1 - p_2(X_2^-)} \,\Big|\, X_1^-\right]\right) \\
&\quad= \mathbb{E}_{X_1^-}\left(\frac{p_1(X_1^-)}{1 - p_1(X_1^-)}\,\mathbb{E}\left[\frac{Y_2(0)\,\mathbf{1}(Y_1(0) = 0)\,\mathbf{1}(S > 2)}{1 - p_2(X_2^-)} \,\Big|\, S > 1, X_1^-\right][1 - p_1(X_1^-)]\right) \\
&\quad= \mathbb{E}_{X_1^-}\left(p_1(X_1^-)\,\mathbb{E}\left[\frac{Y_2(0)\,\mathbf{1}(Y_1(0) = 0)\,\mathbf{1}(S > 2)}{1 - p_2(X_2^-)} \,\Big|\, S > 1, X_1^-\right]\right) \\
&\quad= \mathbb{E}_{X_1^-}\left(p_1(X_1^-)\,\Pr(Y_1(0) = 0 \mid S > 1, X_1^-)\,\mathbb{E}\left[\frac{Y_2(0)\,\mathbf{1}(S > 2)}{1 - p_2(X_2^-)} \,\Big|\, Y_1(0) = 0, S > 1, X_1^-\right]\right).
\end{aligned} \tag{A.16}
\]

Next, by the law of iterated expectations

\[
\begin{aligned}
&\mathbb{E}_{X_1^-}\left(p_1(X_1^-)\,\Pr(Y_1(0) = 0 \mid S > 1, X_1^-)\,\mathbb{E}\left[\frac{Y_2(0)\,\mathbf{1}(S > 2)}{1 - p_2(X_2^-)} \,\Big|\, Y_1(0) = 0, S > 1, X_1^-\right]\right) \\
&\quad= \mathbb{E}_{X_1^-}\left(p_1(X_1^-)\,\Pr(Y_1(0) = 0 \mid S > 1, X_1^-)\,\mathbb{E}_{X_2^- \mid X_1^-, Y_1(0)=0, S>1}\left(\mathbb{E}\left[\frac{Y_2(0)\,\mathbf{1}(S > 2)}{1 - p_2(X_2^-)} \,\Big|\, Y_1(0) = 0, S > 1, X_2^-\right]\right)\right).
\end{aligned} \tag{A.17}
\]

Simplifying gives

\[
\begin{aligned}
&\mathbb{E}_{X_1^-}\left(p_1(X_1^-)\,\Pr(Y_1(0) = 0 \mid S > 1, X_1^-)\,\mathbb{E}_{X_2^- \mid X_1^-, Y_1(0)=0, S>1}\left(\mathbb{E}\left[\frac{Y_2(0)\,\mathbf{1}(S > 2)}{1 - p_2(X_2^-)} \,\Big|\, Y_1(0) = 0, S > 1, X_2^-\right]\right)\right) \\
&\quad= \mathbb{E}_{X_1^-}\left(p_1(X_1^-)\,\Pr(Y_1(0) = 0 \mid S > 1, X_1^-)\,\mathbb{E}_{X_2^- \mid X_1^-, Y_1(0)=0, S>1}\left(\frac{\mathbb{E}\left[Y_2(0)\,\mathbf{1}(S > 2) \mid Y_1(0) = 0, S > 1, X_2^-\right]}{1 - p_2(X_2^-)}\right)\right) \\
&\quad= \mathbb{E}_{X_1^-}\left(p_1(X_1^-)\,\Pr(Y_1(0) = 0 \mid S > 1, X_1^-)\,\mathbb{E}_{X_2^- \mid X_1^-, Y_1(0)=0, S>1}\left(\frac{\mathbb{E}\left[Y_2(0) \mid Y_1(0) = 0, S > 2, X_2^-\right][1 - p_2(X_2^-)]}{1 - p_2(X_2^-)}\right)\right) \\
&\quad= \mathbb{E}_{X_1^-}\left(p_1(X_1^-)\,\Pr(Y_1(0) = 0 \mid S > 1, X_1^-)\,\mathbb{E}_{X_2^- \mid X_1^-, Y_1(0)=0, S>1}\left(\Pr(Y_2(0) = 1 \mid Y_1(0) = 0, S > 2, X_2^-)\right)\right).
\end{aligned} \tag{A.18}
\]

If unconfoundedness as in Assumption (A1) holds, then conditional on $X_2^-$ the potential outcomes for individuals with $S > 2$ on average equal the potential outcomes for those treated in period 2, i.e. $\Pr(Y_2(0) = 1 \mid S > 2, Y_1(0) = 0, X_2^-) = \Pr(Y_2(0) = 1 \mid S = 2, Y_1(0) = 0, X_2^-)$, which implies that

\[
\Pr(Y_2(0) = 1 \mid Y_1(0) = 0, S > 2, X_2^-) = \Pr(Y_2(0) = 1 \mid Y_1(0) = 0, S > 1, X_2^-),
\]

so that

\[
\begin{aligned}
&\mathbb{E}_{X_1^-}\left(p_1(X_1^-)\,\Pr(Y_1(0) = 0 \mid S > 1, X_1^-)\,\mathbb{E}_{X_2^- \mid X_1^-, Y_1(0)=0, S>1}\left(\Pr(Y_2(0) = 1 \mid Y_1(0) = 0, S > 2, X_2^-)\right)\right) \\
&\quad= \mathbb{E}_{X_1^-}\left(p_1(X_1^-)\,\Pr(Y_1(0) = 0 \mid S > 1, X_1^-)\,\mathbb{E}_{X_2^- \mid X_1^-, Y_1(0)=0, S>1}\left(\Pr(Y_2(0) = 1 \mid Y_1(0) = 0, S > 1, X_2^-)\right)\right) \\
&\quad= \mathbb{E}_{X_1^-}\left(p_1(X_1^-)\,\Pr(Y_1(0) = 0 \mid S > 1, X_1^-)\,\Pr(Y_2(0) = 1 \mid S > 1, Y_1(0) = 0, X_1^-)\right).
\end{aligned} \tag{A.19}
\]

Next, if unconfoundedness as in Assumption (A1) holds, then conditional on $X_1^-$ the potential outcomes for individuals with $S > 1$ on average equal the potential outcomes for those treated in period 1, i.e., $\Pr(Y_2(0) = 1 \mid S > 1, Y_1(0) = 0, X_1^-) = \Pr(Y_2(0) = 1 \mid S = 1, Y_1(0) = 0, X_1^-)$, so that

\[
\begin{aligned}
&\mathbb{E}_{X_1^-}\left(p_1(X_1^-)\,\Pr(Y_1(0) = 0 \mid S > 1, X_1^-)\,\Pr(Y_2(0) = 1 \mid S > 1, Y_1(0) = 0, X_1^-)\right) \\
&\quad= \mathbb{E}_{X_1^-}\left(p_1(X_1^-)\,\Pr(Y_1(0) = 0 \mid S = 1, X_1^-)\,\Pr(Y_2(0) = 1 \mid S = 1, Y_1(0) = 0, X_1^-)\right) \\
&\quad= \mathbb{E}_{X_1^-}\left(p_1(X_1^-)\,\Pr(Y_2(0) = 1, Y_1(0) = 0 \mid S = 1, X_1^-)\right) \\
&\quad= \mathbb{E}_{X_1^-}\left(\Pr(Y_2(0) = 1, Y_1(0) = 0, S = 1 \mid X_1^-)\right) \\
&\quad= \Pr(Y_2(0) = 1, Y_1(0) = 0, S = 1).
\end{aligned} \tag{A.20}
\]

Thus, from equations (A.12)–(A.20) we have that

\[
\mathbb{E}\left[\frac{p_1(X_1^-)}{[1 - p_1(X_1^-)][1 - p_2(X_2^-)]}\,Y_2\,\mathbf{1}(Y_1 = 0)\,\mathbf{1}(S > 2)\right] = \Pr(Y_2(0) = 1, Y_1(0) = 0, S = 1). \tag{A.21}
\]

By similar reasoning

\[
\mathbb{E}\left[\frac{p_s(X_s^-)}{\prod_{m=s}^{k}\left[1 - p_m(X_m^-)\right]}\,Y_k\,\mathbf{1}(Y_{k-1} = 0)\,\mathbf{1}(S > k)\right] = \Pr(S = s, Y_k(0) = 1, Y_{k-1}(0) = 0) \tag{A.22}
\]

and

\[
\mathbb{E}\left[\frac{p_s(X_s^-)}{\prod_{m=s}^{k}\left[1 - p_m(X_m^-)\right]}\,\mathbf{1}(Y_{k-1} = 0)\,\mathbf{1}(S > k)\right] = \Pr(S = s, Y_{k-1}(0) = 0). \tag{A.23}
\]

Substituting (A.22) and (A.23) into (A.12) and simplifying gives

\[
\begin{aligned}
&\operatorname*{plim}_{N\to\infty}\prod_{k=s}^{t}\left[1 - \frac{\sum_i \dfrac{p_s(X_{i,s}^-)}{\prod_{m=s}^{k}\left[1 - p_m(X_{i,m}^-)\right]}\,Y_{k,i}\,\mathbf{1}(Y_{k-1,i} = 0)\,\mathbf{1}(S_i > k)}{\sum_i \dfrac{p_s(X_{i,s}^-)}{\prod_{m=s}^{k}\left[1 - p_m(X_{i,m}^-)\right]}\,\mathbf{1}(Y_{k-1,i} = 0)\,\mathbf{1}(S_i > k)}\right] \\
&\quad= \prod_{k=s}^{t}\left[1 - \frac{\Pr(S = s, Y_k(0) = 1, Y_{k-1}(0) = 0)}{\Pr(S = s, Y_{k-1}(0) = 0)}\right] \\
&\quad= \prod_{k=s}^{t}\left[1 - \Pr(Y_k(0) = 1 \mid S = s, Y_{k-1}(0) = 0)\right] \\
&\quad= \prod_{k=s}^{t}\Pr(Y_k(0) = 0 \mid S = s, Y_{k-1}(0) = 0) \\
&\quad= \Pr(Y_t(0) = 0 \mid S = s, Y_{s-1}(s) = 0).
\end{aligned} \tag{A.24}
\]

Together (A.10) and (A.24) imply that $\operatorname*{plim}_{N\to\infty}\widehat{\mathrm{ATET}}_t(s) = \mathrm{ATET}_t(s)$.

Supplementary material

Supplementary material associated with this article can be found, in the online version, at 10.1016/j.labeco.2017.09.003.

eferences

bbring, J.H. , van den Berg, G.J. , 2003. The non-parametric identification of treatmenteffects in duration models. Econometrica 71, 1491–1517 .

lbanese, A., Thuy, Y., Cockx, B., 2015. Working time reductions at the end of the career.do they prolong the time spent in employment?, differential effects of active labourmarket programs for the unemployed. IZA Discussion paper 9619.

an den Berg, G.J. , Caliendo, M. , Schmidl, R. , Uhlendorff, A. , 2015. Matching or DurationModels? A Monte Carlo Study. Mimeo, University of Mannheim .

iewen, M. , Fitzenberger, B. , Osikominu, A. , Paul, M. , 2014. The effectiveness of publicsponsored training revisited: the importance of data and methodological choices. J.Labor Econ. 32 (4), 837–897 .

lack, D. , Smith, J. , Berger, M. , Noel, B. , 2003. Is the threat of reemployment servicesmore effective than the services themselves? evidence from random assignment inthe UI system. Am. Econ. Rev. 93 (4), 1313–1327 .

usso, M. , DiNardo, J. , McCrary, J. , 2014. New evidence on the finite sample propertiesof propensity score reweighting and matching estimators. Rev. Econ. Stat. 96 (5),885–897 .

aliendo, M. , Mahlstedt, R. , Mitnik, O. , 2017. Some practical guidance for the implemen-tation of propensity score matching. Labour Econ. 46, 14–25 .

54

arling, K. , Richardson, K. , 2001. The relative efficiency of labor market programs:Swedish experience from the 1990. Labour Econ. 26 (4), 335–354 .

répon, B. , Ferracci, M. , Jolivet, G. , van den Berg, G.J. , 2009. Active labor market policyeffects in a dynamic setting. J. Eur. Econ. Assoc. 7, 595–605 .

an den, B.G. , 2001. Duration models: Specification, identification and multiple durations.In: Heckman, J., Leamer, E.E. (Eds.), Handbook of Econometrics, Vol. 5. Elsevier,pp. 3381–3460 .

engler, K. , 2015. Effectiveness of sequences of one-euro-jobs for welfare recipients ingermany. Appl. Econ. 47 (57), 6170–6190 .

yke, A. , Heinrich, C.J. , Mueser, P.R. , Troske, K.R. , Jeon, K.-S. , 2006. The effects of wel-fare-to-work program activities on labor market outcomes. J. Labor Econ. 24 (3),567–607 .

riksson, M. , 1997. To Choose or not to Choose: Choice and Choice Set Models, UmeåEconomic Studies 443, Department of Economics. Umeåa University .

itzenberger, B. , Osikominu, A. , Völter, R. , 2008. Get training or wait? long-run em-ployment effects of training programs for theunemployed in west germany. Annalesd ’Économie et de Statistique, 91/92, 321–355 .

redriksson, P. , Johansson, P. , 2008. Dynamic treatment assignment: the consequencesfor evaluations using observational data. J. Busi. Econ. Stat. 26 (4), 435–445 .

rölich, M. , 2004. Finite sample properties of propensity-score matching and weightingestimators. Rev. Econ. Stat. 77–90 .

erfin, M. , Lechner, M. , 2002. A microeconometric evaluation of the active labour marketpolicy in switzerland. Econ. J. 112, 854–893 .

eckman, J. , Ichimura, H. , Todd, P. , 1998. Matching as an econometric evaluation esti-mator. Rev. Econ. Stud. 65, 261–294 .

eckman, J. , Navarro, S. , 2007. Dynamic discrete choice and dynamic treatment effects.J. Econom. 136, 341–396 .

eckman, J. , Smith, J. , 1999. The pre-programme earnings dip and the determinants ofparticipation in a social programme. implications for simple programme evaluationstrategies. Econ. J. 109, 313–348 .

Hernan, M., Brumback, B., Robins, J.M., 2001. Marginal structural models to estimate the joint causal effect of nonrandomized treatments. J. Am. Stat. Assoc. 96, 440–448.

Hirano, K., Imbens, G., Ridder, G., 2003. Efficient estimation of average treatment effects using the estimated propensity score. Econometrica 71, 1161–1189.

Hotz, V.J., Imbens, G.W., Klerman, J.A., 2006. Evaluating the differential effects of alternative welfare-to-work training components: a reanalysis of the California GAIN program. J. Labor Econ. 24 (3), 521–566.

Huber, M., Lechner, M., Wunsch, C., 2013. The performance of estimators based on the propensity score. J. Econom. 175, 1–21.

Kastoryano, S., van der Klaauw, B., 2011. Dynamic evaluation of job search assistance, IZA DP no. 5424.

Lechner, M., 1999. Earnings and employment effects of continuous off-the-job training in East Germany after unification. J. Bus. Econ. Stat. 17, 74–90.

Lechner, M., 2008. Matching estimation of dynamic treatment models: Some practical issues. In: Millimet, D., Smith, J., Vytlacil, E. (Eds.), Advances in Econometrics 21, Modelling and Evaluating Treatment Effects in Econometrics. Emerald Group Publishing Limited, pp. 289–333.

Lechner, M., 2009. Sequential causal models for the evaluation of labor market programs. J. Bus. Econ. Stat. 27 (1), 71–83.

Lechner, M., Miquel, R., 2010. Identification of the effects of dynamic treatments by sequential conditional independence assumptions. Empirical Econ. 39, 111–137.

Lechner, M., Miquel, R., Wunsch, C., 2011. Long-run effects of public sector sponsored training in West Germany. J. Eur. Econ. Assoc. 9 (4), 742–784.

Lechner, M., Wiehler, S., 2013. Does the order and timing of active labour market programmes matter? Oxford Bull. Econ. Stat. 75 (2), 180–212.

Lechner, M., Wunsch, C., 2013. Sensitivity of matching-based program evaluations to the availability of control variables. Labour Econ. 21, 111–121.

Osikominu, A., 2013. Quick job entry or long-term human capital development? The dynamic effects of alternative training schemes. Rev. Econ. Stud. 80 (1), 313–342.

Robins, J., 1986. A new approach to causal inference in mortality studies with sustained exposure periods: application to control of the healthy worker survivor effect. Math. Modell. 7, 1293–1512.

Sianesi, B., 2004. An evaluation of the Swedish system of active labour market programmes in the 1990s. Rev. Econ. Stat. 86, 133–155.

Sianesi, B., 2008. Differential effects of active labour market programs for the unemployed. Labour Econ. 15 (3), 370–399.

Vikström, J., 2015. Evaluation of sequences of treatments with application to active labor market policies. IFAU Working paper 2015, Vol. 25.

Wooldridge, J.M., 2010. Econometric Analysis of Cross Section and Panel Data, 2nd ed. MIT Press.

