
Dipartimento di Politiche Pubbliche e Scelte Collettive – POLIS Department of Public Policy and Public Choice – POLIS

Working paper n. 88

April 2007

The Propensity Score method in public policy evaluation: a survey

Michela Bia

UNIVERSITA’ DEL PIEMONTE ORIENTALE “Amedeo Avogadro” ALESSANDRIA

Periodico mensile on-line "POLIS Working Papers" - Iscrizione n.591 del 12/05/2006 - Tribunale di Alessandria

The Propensity Score Method in Public Policy

Evaluation: a Survey♣

Michela Bia*

♣University of Florence – G. Parenti Statistics Department – Florence, Italy. University of Eastern

Piedmont, Department of Public Policy and Public Choice - POLIS, Alessandria, Italy. E-mail: [email protected]; [email protected]

*

The author wishes to thank F. Mealli, A. Mattei, D. Bondonio, A. Martini, G. Imbens.


Abstract

Recently, in the field of causal inference, nonparametric techniques that use matching procedures, based for example on the propensity score (Rosenbaum and Rubin, 1983), have received growing attention. In this paper

we focus on propensity score methods, introduced by Rosenbaum and Rubin

(1983). The key result underlying this methodology is that, given the

ignorability assumption, treatment assignment and the potential outcomes

are independent given the propensity score. Much of the work on propensity

score analysis has focused on the case where the treatment is binary, but in

many cases of interest the treatment takes on more than two values. In this

article we examine an extension to the propensity score method, in a setting

with a continuous treatment.


Introduction

Recently, in the field of causal inference, nonparametric techniques that use matching procedures, based for example on the propensity score (Rosenbaum and Rubin, 1983), have received growing attention. When the data, the type of intervention and the assignment criterion allow it, a quasi-experimental design can be assumed, such as the regression discontinuity design (Thistlethwaite and Campbell, 1960; Battistin and Rettore, 2004). Another

assumption, that leads to another quasi-experimental design and may be

reasonable to assume in some observational studies, is that treatment

assignment is unconfounded with potential outcomes conditional on a

sufficient set of covariates or pretreatment variables. The unconfoundedness

assumption allows us to compare treated and control units with the same

value of the covariates. Given unconfoundedness, various methods have

been proposed for estimating causal effects. In this paper we focus on

propensity score methods, introduced by Rosenbaum and Rubin (1983). The

key result underlying this methodology is that, given the ignorability

assumption, treatment assignment and the potential outcomes are

independent given the propensity score. Thus, adjusting on the propensity

score removes the bias associated with differences in the observed

covariates in the treated and control groups. To estimate propensity scores,

which are the conditional probabilities of being treated given a vector of

observed covariates, we must specify the distribution of the treatment

indicator given pre-treatment variables.


Much of the work on propensity score analysis has focused on the case

where the treatment is binary, but in many cases of interest the treatment

takes on more than two values (for example, a drug applied in different doses or a treatment applied over different time periods). In

this paper we examine an extension to the propensity score method, in a

setting with a continuous treatment. The first section introduces the standard

propensity score analysis (Rosenbaum and Rubin, 1983) - that is when the

treatment is binary. The second section is a review of the propensity score

methodology with multiple treatments. The third section deals with the

propensity score method when the treatment is continuous.


1 The evaluation of public policies: some statistical methods

1.1 Introduction

The evaluation of policies carried out by using quantitative tools is a

tangible answer to the need to express “judgements empirically based on

achievement accomplished by a public policy when facing a particular

collective problem”. By “collective problem” we mean a situation that is

socially perceived as inadequate and, as such, worthy of change and

possibly worthy of public intervention. Think of pollution in city centres, assistance to the elderly or the lack of competitiveness among small and medium enterprises: these are all problems which require public involvement through the allocation of funds. When a problem is addressed by an intervention, we speak of a public policy (Martini et al. 2005). The statistical field of reference is that of causal inference, together with the development of appropriate quantitative methods for policy effect evaluation. The starting point of a policy effect evaluation is the

identification of the object of analysis, i.e., referring to the potential

outcome approach to causal inference (Neyman, 1923; Rubin, 1974), a

characteristic of the distribution of the difference between two potentially

observable outcomes: Y0 (a post-intervention variable observed on a unit -

individual or firm - in the absence of an intervention) and Y1 (a post-

intervention variable observed on a unit in the presence of an intervention).

Identification and estimation of such parameters present some relevant

problems: a) only one of the two potential outcomes is observed on a single


unit, the other representing the counterfactual situation; b) the assignment to

the treatment is usually not random, so estimation is based on observational

data; c) it is necessary to isolate the effect of the intervention from the

effects of other factors, which can influence access and results. Appropriate

estimation methods (parametric or nonparametric) should be based on

sensible hypotheses about the assignment rule, which allow us to identify (even

partially, Manski, 1995, 2003) the causal effects. In observational studies, a

usual starting point consists in constructing a control group (units not

receiving the treatment, but similar to units receiving it), under the

unconfoundedness assumption (Rosenbaum, Rubin, 1983). In this section

we intend to describe the basic principles of such an approach, continuing

with a more formal discussion.

1.2 Potential results and the Rubin Causal model

The basic idea of causal inference is that of an action (or a treatment)

applied to a unit, where unit means a person or a company, at a specific

point in time. As a result, in the binary treatment case, for each unit and

each treatment there are two potential results: one referring to the value of

the outcome variable in the event of treatment, and the other in the event of

non treatment. The causal effect is the result of a comparison between the

two potential results. The use of the adjective ‘potential’ is motivated by the

impossibility of observing the outcome both with and without treatment.

This is defined as the basic problem of causal inference (Holland, 1986). In

this sense it is very useful to have information about several units, analysing

the distribution of the treatment effect and concentrating on summary


measures of such distribution, for example the average treatment effects. In

order to obtain correct estimates of such quantities, it is crucial to define the

assignment mechanism, described below. We now introduce some notation:

consider a population of N units. Each unit i is characterised by a k-dimensional vector of covariates $X_i$, two potential results $Y_i(0)$ and $Y_i(1)$, and a variable $Z_i \in \{0, 1\}$, which denotes the assignment ($Z_i = 1$) or not ($Z_i = 0$) to the treatment. $X$ indicates the $N \times K$ matrix of the units' characteristics, $Y(0)$ and $Y(1)$ the vectors of the potential results and $Z$ the vector for assignment to treatment 1.

From the existing tie between the vectors of potential results ($Y(0)$, $Y(1)$) and treatments ($Z$), we have two distinct relationships for the observed and unobserved results, denoted by $Y_i^{obs}$ and $Y_i^{mis}$ respectively:

$$Y_i^{obs} = Z_i \cdot Y_i(1) + (1 - Z_i) \cdot Y_i(0)$$

$$Y_i^{mis} = (1 - Z_i) \cdot Y_i(1) + Z_i \cdot Y_i(0)$$

where $Y_i^{obs}$ and $Y_i^{mis}$ represent the i-th elements of the vectors $Y^{obs}$ and $Y^{mis}$. In order to identify and define causal effects, it is necessary to make some assumptions. An important assumption, which reduces the number of potential results, is the following (Rubin, 1978a, 1980):

Assumption 1.1 Stable Unit Treatment Value Assumption

(SUTVA), under which the potential outcomes $Y_i(Z_i)$ for the i-th unit depend only on the treatment that the i-th unit received. That is, there is “no interference between units” and there are “no versions of treatments”.


It is necessary to emphasize that the validity of such an assumption is not testable, that the assumption cannot be dispensed with, and that its plausibility rests entirely on the experience of the researcher. Identifying treatment effects relies on further assumptions on the assignment mechanism, that is, the mechanism that determines which units get which treatments, formally defined as follows:

Definition 1.1 Assignment mechanism

Given a population of N units, the assignment mechanism is a row-exchangeable function $p(Z; X, Y(0), Y(1))$, with $Z$ taking values in $\{0, 1\}^N$, such that $\sum_{Z} p(Z; X, Y(0), Y(1)) = 1$ for each $X, Y(0), Y(1)$.

The probability of unit assignment is defined as follows:

Definition 1.2 Units assignment probability

The probability of assignment to treatment for unit i is given by:

$$p_i(X, Y(0), Y(1)) = \sum_{Z: Z_i = 1} p(Z; X, Y(0), Y(1)).$$

Let $X_{(i)}$ denote the $(N-1) \times K$ matrix obtained by removing the i-th row of the matrix $X$, and analogously for $Y_{(i)}(0)$ and $Y_{(i)}(1)$. The exchangeability of the assignment mechanism allows us to rewrite the N functions $p_i(\cdot)$ in terms of a common function $q(\cdot)$, which depends on the covariates and potential results of unit i and on the covariates and potential results of all other units:

$$p_i(X, Y(0), Y(1)) = q(X_i, Y_i(0), Y_i(1), X_{(i)}, Y_{(i)}(0), Y_{(i)}(1))$$

for any i = 1, …, N.

Closely connected to the concept of assignment probability is the propensity score, defined as follows:


Definition 1.3 Propensity score.

Given a population of N units, the propensity score is defined as:

$$e(X, Y(0), Y(1)) = \frac{1}{N_x} \sum_{i: X_i = x} p_i(X, Y(0), Y(1))$$

with $N_x$ equal to the number of units with $X_i = x$. For each x resulting in $N_x = 0$, the propensity score is not defined (Imbens, 2002).

This definition of the propensity score, which will be examined in detail in

later sections, will be useful later on for analysing our case study, where the

treatment turns out to be a continuous variable and a generalized propensity score is defined to allow for treatment effect estimation with a non-binary treatment. The definition of probabilistic assignment follows:

Definition 1.4 Probabilistic assignment

An assignment mechanism is referred to as probabilistic if for every i

the assignment probability is between 0 and 1, that is:

$$0 < p_i(X, Y(0), Y(1)) < 1$$

This assumption requires that each unit has a non-zero probability of being

treated and, at the same time, there are no units with a probability equal to 1

of being treated.
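The relations between potential, observed and missing outcomes, and the averaging in Definition 1.3, can be illustrated with a minimal Python sketch. The toy population, the function names `split_outcomes` and `propensity_by_covariate`, and all numbers are assumptions made purely for illustration; in real data only the observed vector is available.

```python
# Hypothetical toy population; every number here is an illustrative assumption.
y0 = [3.0, 5.0, 4.0, 6.0]   # potential outcomes without treatment, Y_i(0)
y1 = [4.0, 7.0, 4.5, 9.0]   # potential outcomes with treatment, Y_i(1)
z = [1, 0, 0, 1]            # treatment indicators Z_i


def split_outcomes(y0, y1, z):
    """Return the observed and missing outcome vectors for a binary treatment."""
    observed = [zi * a1 + (1 - zi) * a0 for a0, a1, zi in zip(y0, y1, z)]
    missing = [(1 - zi) * a1 + zi * a0 for a0, a1, zi in zip(y0, y1, z)]
    return observed, missing


def propensity_by_covariate(x, p):
    """Average the unit assignment probabilities p_i over units sharing X_i = x."""
    groups = {}
    for xi, pi in zip(x, p):
        groups.setdefault(xi, []).append(pi)
    return {value: sum(ps) / len(ps) for value, ps in groups.items()}


y_obs, y_mis = split_outcomes(y0, y1, z)

# Assumed covariate values and unit assignment probabilities.
x = ["a", "a", "b", "b"]
p = [0.25, 0.5, 0.75, 0.75]
e = propensity_by_covariate(x, p)
```

For the first unit, which is treated, the observed outcome is Y(1) = 4.0 and the missing one is Y(0) = 3.0; covariate values that never occur simply have no entry in `e`, mirroring the case in which the propensity score is not defined.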


1.3 Causal effects and identifying assumptions

Inferences about the effects of the treatments involve speculations about the

effect that one treatment would have had on a unit which, actually, received

another treatment (Rosenbaum and Rubin, 1983a). If we consider the

binary treatment, according to the type of intervention assigned to the N

units under study, the i.th unit has both a response Yi(1), that would have

resulted if it had received treatment 1 and a response Yi(0) that would have

resulted if it had received treatment 0. As a result, causal effects are comparisons of $Y_i(1)$ and $Y_i(0)$ (for example, a difference $Y_i(1) - Y_i(0)$ or a ratio $Y_i(1)/Y_i(0)$). It is evident that estimating the causal effects of treatments is a

missing data problem, since either Yi(1) or Yi(0) is missing. In causal

inference in general – and in policy evaluation in particular – a quantity of

primary interest to be estimated is the average treatment effect ATE,

defined as follows:

$$ATE = E[\tau_i] = E[Y_i(1) - Y_i(0)] = E[Y_i(1)] - E[Y_i(0)]$$

The average treatment effect for the subpopulation (SATE) that received treatment level z, with z = 0, 1, is equal to:

$$SATE = \tau_T = E[\tau_i \mid Z_i = z]$$

and when z = 1 the SATE is usually known as the ATT. In particular, in the

field of policy evaluation, we are interested in the ATT estimation, because,

with such an estimate, it is possible to assess how much the intervention

may have produced a change in a given condition or behaviour of the policy


beneficiaries. Now, in the case of randomized experiments, it has been

shown (Neyman, 1923) that the ATT can be easily estimated; this means

that it is possible to obtain an unbiased estimate of the average causal effect

under the SUTVA assumption, by a direct comparison between the

average results of the two treatment groups, in which units are similar, with

respect to any possible characteristics, including potential results, thanks to

randomisation. In an observational study¹, however, such direct comparisons

may be misleading because the units exposed to one treatment generally

differ systematically from the units exposed to the other treatment.

Specifically, whereas in experimental situations one can obtain a control and

treatment group which are homogeneous with respect to the observable

characteristics, X, this is not possible in nonexperimental studies since it is

likely that the decision to be assigned to a treatment is, in this case, not

independent of the observable as well as unobservable characteristics².

This leads to a self-selection process which makes the two groups

potentially different even before the policy is carried out. A possible way to

address this complication in nonexperimental studies is to consider the

randomized experiment as a template for the analysis of an observational

(i.e., nonrandomized) study. Having the template of a randomized experiment means thinking about the underlying randomized experiment that could have been done. In the randomized experiment underlying an observational study, the probabilities of assignment to treatments are not equal, but are rather functions of the covariates, and so the template is actually an unconfounded assignment mechanism. To do this we make the strong ignorability or unconfoundedness assumption.

¹ A study is considered observational when the treatment assignment mechanism is not known.

² With random assignment, homogeneity of the control and treatment group with respect to the unobservable characteristics is also guaranteed if the size of the groups is sufficiently large.

Assumption 1.2 Strong Unconfoundedness assumption

(Rosenbaum and Rubin, 1983)

Generally, we shall say treatment assignment is strongly ignorable

given a vector of covariates X if

$$(Y(0), Y(1)) \perp Z \mid X \quad \text{and} \quad 0 < prob(Z = 1 \mid X) < 1 \quad \text{(common support)}$$

referring, from now on, to $Y(0), Y(1)$ instead of $Y_i(0), Y_i(1)$ for the potential

results corresponding to the i.th individual. For brevity, when treatment

assignment is strongly ignorable given the observed covariates X, we shall

say simply that treatment assignment is strongly ignorable. The strong

ignorability assumption asserts that the probability of assignment to a

treatment does not depend on the potential outcomes conditional on

observed covariates. In other words, within subpopulations defined by

values of the covariates, we have random assignment. This assumption rules

out the role of the unobservable variables. The issue of unobserved

covariates should be addressed using models for sensitivity analysis (e.g.,

Rosenbaum and Rubin, 1983b) or using non parametric bounds for

treatment effects (Manski, 1990; Manski et al., 1992).

Of course, if the goal is to identify only the Average Treatment Effect for

the Treated (ATT), a weaker assumption can be made:


Assumption 1.3 Weak Unconfoundedness assumption (Rosenbaum

and Rubin, 1983)

$$Y(0) \perp Z \mid X \quad \text{and} \quad prob(Z = 1 \mid X) < 1$$

That is, the unconfoundedness assumption can be relaxed, requiring

only that Y(0) is independent of Z given X. Also the overlap

condition can be relaxed so that the support of X for the treated units

is a subset of the support of X for the untreated units.

Both the unconfoundedness assumption and the overlap condition may be controversial in applications. The first assumption requires that all variables that affect both the outcome and the likelihood of receiving the treatment are observed, or that all the others are perfectly collinear with the observed ones. This assumption is not testable; it is a very strong assumption, and one that need not generally be applicable. Clearly, selection may also take place on the basis of unobservable characteristics. However, any alternative assumptions that do not rely on unconfoundedness, while allowing for consistent estimation of the causal effects of interest, must make alternative untestable assumptions. Whereas the unconfoundedness assumption implies

that the best matches are units that differ only in their treatment status, but

otherwise are identical, alternative assumptions implicitly match units that

differ in the pre-treatment characteristics. Often such assumptions are even

more difficult to justify. For instance, the technique of instrumental

variables is sometimes considered as an alternative to assuming

unconfoundedness (Heckman, 1979; Heckman and Hotz, 1989), but a

disadvantage of these methodologies is their high sensitivity to the distributional hypotheses. A possible solution to this is a non- or semi-parametric approach through the selection of instrumental variables (Angrist et al., 1996). However, since the identification of these variables is often extremely difficult, the unconfoundedness assumption may be a natural starting point: after comparing average outcomes for treated and control units, one adjusts for observable pretreatment differences.

The strong ignorability assumption validates the comparison of treated and

control units with the same value of covariates; in fact the average treatment

effect (ATE) can be written as:

$$\tau = E[Y(1) - Y(0)] = E[E(Y(1) - Y(0) \mid X)]$$

$$= E[E(Y(1) \mid Z = 1, X = x)] - E[E(Y(0) \mid Z = 0, X = x)]$$

$$= E[E(Y \mid Z = 1, X = x)] - E[E(Y \mid Z = 0, X = x)]$$

while the average treatment effect on the treated (ATT) formula may be

rewritten as follows:

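$$\tau_T = E_{x \mid Z=1}[E[Y(1) - Y(0) \mid Z = 1, X = x]]$$

$$= E_{x \mid Z=1}[E[Y(1) \mid Z = 1, X = x]] - E_{x \mid Z=1}[E[Y(0) \mid Z = 1, X = x]]$$

$$= E_{x \mid Z=1}[E[Y \mid Z = 1, X = x]] - E_{x \mid Z=1}[E[Y \mid Z = 0, X = x]]$$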

Note that in both $\tau$ and $\tau_T$, due to unconfoundedness, what is not known:

$E[E[Y(1) \mid Z = 0, X = x]]$ and $E[E[Y(0) \mid Z = 1, X = x]]$ for $\tau$,

$E_{x \mid Z=1}[E[Y(0) \mid Z = 1, X = x]]$ for $\tau_T$

can be substituted with what can be actually observed:

$E[E[Y \mid Z = 1, X = x]]$ and $E[E[Y \mid Z = 0, X = x]]$ for $\tau$,

$E_{x \mid Z=1}[E[Y \mid Z = 0, X = x]]$ for $\tau_T$
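The substitution can be spelled out in two steps for the control-outcome term of the ATT. The short derivation below uses only the unconfoundedness assumption and the definition of the observed outcome; the analogous argument applies to the other terms.

```latex
\begin{align*}
E[Y(0) \mid Z = 1, X = x]
  &= E[Y(0) \mid Z = 0, X = x]
  && \text{(unconfoundedness: } Y(0) \perp Z \mid X \text{)} \\
  &= E[Y \mid Z = 0, X = x]
  && \text{(for units with } Z = 0 \text{ the observed outcome is } Y = Y(0) \text{)}
\end{align*}
```

The overlap condition guarantees that the conditioning event Z = 0, X = x has positive probability for every x found among the treated, so the right-hand side is well defined.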

Typically, there are many background characteristics that need to be controlled for when estimating the average causal effect, and adjusting the estimation for all these covariates may be unfeasible in practice. Propensity

score technology, introduced by Rosenbaum and Rubin (1983a), addresses

this situation by reducing the entire collection of background characteristics

to a single “composite” characteristic that appropriately summarizes the

collection. In the following sections, we will focus on common variants of this method.
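With a single discrete covariate, the adjustment formulas for the ATE and the ATT given earlier in this section can be computed cell by cell. The following Python sketch is only an illustration under that simplifying assumption; the data set and the names `adjusted_effects` and `naive_difference` are hypothetical.

```python
from collections import defaultdict


def mean(values):
    return sum(values) / len(values)


def naive_difference(data):
    """Unadjusted difference between treated and control mean outcomes."""
    treated = [y for _, z, y in data if z == 1]
    controls = [y for _, z, y in data if z == 0]
    return mean(treated) - mean(controls)


def adjusted_effects(data):
    """Return (ATE, ATT) by averaging within-cell differences over X.

    data: list of (x, z, y) with a discrete covariate x, a binary
    treatment z and the observed outcome y.
    """
    cells = defaultdict(lambda: {0: [], 1: []})
    for x, z, y in data:
        cells[x][z].append(y)
    n = len(data)
    n_treated = sum(z for _, z, _ in data)
    ate = 0.0
    att = 0.0
    for x, groups in cells.items():
        # Common support: every cell must contain treated and control units.
        if not groups[0] or not groups[1]:
            raise ValueError(f"no overlap in cell x={x!r}")
        diff = mean(groups[1]) - mean(groups[0])
        ate += diff * (len(groups[0]) + len(groups[1])) / n  # weight P(X = x)
        att += diff * len(groups[1]) / n_treated             # weight P(X = x | Z = 1)
    return ate, att


# Hypothetical data: treatment is more likely, and outcomes higher, when x = 1.
data = [(0, 1, 2.0), (0, 0, 1.0), (0, 0, 1.0), (0, 0, 1.0),
        (1, 1, 5.0), (1, 1, 5.0), (1, 1, 5.0), (1, 0, 3.0)]

naive = naive_difference(data)
ate, att = adjusted_effects(data)
```

In this toy example the naive comparison (2.75) overstates both the adjusted ATE (1.5) and the adjusted ATT (1.75), because treated units are concentrated in the cell with higher outcomes. With many or continuous covariates the cells become empty, which is the practical difficulty the propensity score is meant to address.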

1.4 Propensity score: definition and properties

Theoretically if the unconfoundedness assumption is valid, the expression

for the propensity score can be rewritten as follows:

$$e(X, Y(0), Y(1)) = e(X)$$


Formally the unit propensity score is the conditional probability that a unit

be assigned to treatment given pre-treatment variables:

$$e(X) = p(Z = 1 \mid X = x)$$

The propensity score is a balancing score: among units with the same propensity score, the distribution of the covariates is the same for treated and control units. Formally, we can write (Rosenbaum and Rubin, 1983):

Lemma 1.1 Balancing of pre-treatment variables given the

propensity score

$$X \perp Z \mid e(X)$$

In particular, the propensity score is the coarsest balancing score, i.e., any

balancing score b(X) must satisfy the relation e(X) = f(b(X)), for some

function f (Rosenbaum and Rubin, 1983a). The key feature of propensity

score methodology is that, given the strong ignorability assumption, treatment assignment and the potential outcomes are independent conditional on the propensity score:

$$(Y(0), Y(1)) \perp Z \mid e(X)$$

and

$$0 < p(Z = 1 \mid e(X)) < 1$$

Thus, adjusting for the propensity score removes the bias associated with

differences in the observed covariates in the treated and control group. As a


result, given the strong ignorability assumption, if the propensity score e(X)

is known, it follows that:

$$\tau = E[Y(1) - Y(0)] = E[E(Y(1) - Y(0) \mid e(X))] = E[E(Y(1) \mid Z = 1, e(X))] - E[E(Y(0) \mid Z = 0, e(X))]$$

and

$$\tau_T = E_{x \mid Z=1}[E[Y(1) - Y(0) \mid Z = 1, e(X)]] = E_{x \mid Z=1}[E[Y(1) \mid Z = 1, e(X)]] - E_{x \mid Z=1}[E[Y(0) \mid Z = 0, e(X)]]$$

where the outer expectation is over the distribution of e(X).
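The balancing property of Lemma 1.1 can be checked numerically on a small example. In the Python sketch below the propensity score is taken to be the share of treated units within each covariate cell, which is only appropriate for a discrete covariate; the data and the function names are hypothetical. Exact fractions are used to avoid rounding issues in the comparison.

```python
from collections import Counter
from fractions import Fraction

# Hypothetical data: (covariate value, treatment indicator).
units = ([("a", 1)] * 1 + [("a", 0)] * 3 +
         [("b", 1)] * 2 + [("b", 0)] * 6 +
         [("c", 1)] * 3 + [("c", 0)] * 1)


def propensity_scores(units):
    """e(x): share of treated units among the units with X = x."""
    total = Counter(x for x, _ in units)
    treated = Counter(x for x, z in units if z == 1)
    return {x: Fraction(treated[x], total[x]) for x in total}


def covariate_shares(units, scores, score_value, z_value):
    """Distribution of X among units with e(X) = score_value and Z = z_value."""
    counts = Counter(x for x, z in units
                     if z == z_value and scores[x] == score_value)
    n = sum(counts.values())
    return {x: Fraction(c, n) for x, c in counts.items()}


scores = propensity_scores(units)
treated_shares = covariate_shares(units, scores, Fraction(1, 4), 1)
control_shares = covariate_shares(units, scores, Fraction(1, 4), 0)
```

Cells "a" and "b" share the propensity score 1/4, and within that score value the covariate is distributed identically among treated and control units (one third "a", two thirds "b"), even though the two cells differ in size. This is why units can be compared on e(X) alone rather than on X itself.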

1.5 Matching and propensity score based methods

In what follows we will concentrate on the estimation of ATTs, although

the techniques can be easily modified and used for the estimation of ATE.

As already mentioned, the quantity

$$E_{x \mid Z=1}[E[Y \mid Z = 0, X = x]]$$

which by the unconfoundedness assumption is used to estimate the unknown quantity

$$E_{x \mid Z=1}[E[Y(0) \mid Z = 1, X = x]]$$

may be computed using different procedures. The most appropriate way is to use the information about the untreated units, taking into account any differences in observable characteristics between the two sub-populations of treated and untreated individuals. The most common methodologies in use are regression and matching techniques. The former are based on the specification of a model for the outcome variable, for example simple linear regression or more complex models. However, it

is clear that correct specification of the model is crucial for correct causal

interpretation. By contrast, matching techniques do not need any a priori

functional form specification between the dependent and independent

variables and, in this sense, they are more robust (Rubin, 1973a). We will

now describe the most common variants of matching, that, as already

mentioned, can be used together with the propensity score.

1.5.1 Matching types

A wide range of literature about matching procedures is available (see for example Rubin, 1973a, b; Abadie and Imbens, 2004). These methods match each treated unit to one or more control units according to different procedures. In general, suppose we have a dataset on a population or sample of N units. For each of the N units we observe $(Y_i^{obs}, Z_i, X_i)$, which represent, respectively, the observed outcome, the treatment indicator and the vector of the k covariates. Because the ATT is our causal estimand, $Y_i(1)$ is observed for every treated unit, whereas $Y_i(0)$, the


counterfactual outcome, must be somehow estimated. Matching makes it possible to find, in the control group, a value for $Y_i(0)$ identified on the basis of the pre-treatment variables $X_i$. We denote by $T_0$ the group of untreated units, by $z_{ij}$ the weight given to untreated unit j when it is matched to treated unit i, and by $A_i = \{j \in T_0 : X_j \in C(X_i)\}$ the subgroup of untreated units which are used to estimate $Y_i(0)$, following criterion $C(X_i)$.

By defining every type of matching through $A_i$ and $z_{ij}$, we obtain the following definitions:

i) Exact matching:

Control units with the same observed characteristics as the treated unit are sought out:

$$A_i = \{j \in T_0 : X_j = X_i\}$$

The greatest problem with this type of matching is the possibility that the control group contains no unit with exactly these characteristics. The probability of such an event increases with the number of covariates, when covariates are continuous variables and when the sample is small.

ii) Caliper matching:

This type of matching is a generalization of exact matching. Instead

of requiring perfect equality of the covariates, the characteristics of the (treated and untreated) units are required to be “not too distant”. This may be formalized as follows:

$$A_i = \{j \in T_0 : |X_i^{(m)} - X_j^{(m)}| < c^{(m)}, \; m = 1, \dots, k\}$$


In this case the problem is choosing the threshold c(m) for each

covariate.

iii) Nearest neighbors:

This type of method allows us to overcome the multidimensionality problem. According to this procedure, we may use a suitable metric to summarise the distance between covariate vectors in a single number. An appropriate solution is to choose the unit or units which are nearest according to an appropriate distance function:

$$A_i = \{j \in T_0 : \min_j \|X_i - X_j\|\}$$

In this case the matching depends on the chosen metric, for example the Euclidean distance, the Mahalanobis distance (Rubin, 1980a), a metric based on the variance-covariance matrix (Abadie and Imbens, 2002), etc. Another solution could be to include in $A_i$ more than one unit, varying appropriately the weights $z_{ij}$ in the estimator.
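The three matching rules above can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation: the function names, the choice of the Euclidean metric for the nearest-neighbour rule and the two-covariate toy data are all assumptions.

```python
import math


def exact_match(x_i, controls):
    """A_i for exact matching: controls with identical covariates."""
    return [j for j, x_j in enumerate(controls) if x_j == x_i]


def caliper_match(x_i, controls, c):
    """A_i for caliper matching: every covariate m within tolerance c[m]."""
    return [j for j, x_j in enumerate(controls)
            if all(abs(a - b) < tol for a, b, tol in zip(x_i, x_j, c))]


def nearest_neighbor(x_i, controls):
    """A_i for nearest-neighbour matching under the Euclidean distance."""
    dists = [math.dist(x_i, x_j) for x_j in controls]
    best = min(dists)
    return [j for j, d in enumerate(dists) if d == best]


# Hypothetical covariates: (age, pre-treatment income).
controls = [(30, 1.0), (32, 1.5), (45, 3.0)]
x_treated = (31, 1.2)

exact = exact_match(x_treated, controls)
within_caliper = caliper_match(x_treated, controls, c=(2, 0.5))
nearest = nearest_neighbor(x_treated, controls)
```

Here exact matching finds no control for the treated unit, the caliper rule retains the first two controls, and the nearest-neighbour rule retains only the first. Because the covariates are on different scales, an unstandardised Euclidean distance is dominated by age in this example, which is one motivation for metrics such as the Mahalanobis distance.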

1.6 Use of propensity score in matching techniques and

matching estimators

Matching methods, applied in connection with propensity scores, remove the multidimensionality problem of the covariates. As previously mentioned, one of the most important properties of the propensity score is, in fact, that it is a one-dimensional summary of the multidimensional covariates X, such that when the propensity scores are balanced across the treatment and control groups, the distributions of all the covariates X are balanced in expectation across the

two groups. Rosenbaum and Rubin (1983) showed that for a specific value

of the propensity score, the difference between the treatment and control

means for all units with that value of the propensity score is an unbiased

estimate of the average treatment effect at that propensity score, if the

treatment assignment is strongly ignorable given the covariates. Thus

matching or regression (covariance) adjustment on propensity score tends to

produce unbiased estimates of the treatment effects when treatment

assignment is strongly ignorable. Here the basic matching techniques (Rosenbaum and Rubin, 1984; Dehejia and Wahba, 1999) and the estimators based on propensity score methods are presented.

i) Stratification matching

According to this method, the range of the propensity score is divided into blocks so that in each block the covariates are balanced and the assignment to treatment can be considered random. Once a stratification with these properties is obtained, treatment effect estimation is carried out in two steps. First, within each block, we compute the difference between the means of the observed outcomes for treated and untreated units (obtaining a conditional effect estimate for that block). Second, we estimate the ATT by weighting each difference according to the distribution of treated units across blocks (see the Stratification matching estimator formula in section 1.6.1).

ii) Nearest neighbor matching


This matching procedure matches to each treated unit the untreated unit that has the nearest propensity score:

$$A_i = \{j \in T_0 : \min_j |e(X_i) - e(X_j)|\} \quad \text{with} \quad \sum_{j \in A_i} z_{ij} = 1$$

The control group for each treated unit is represented by just one control unit, and the selection is usually made with replacement, so the same control unit may be matched several times to various treated units. As a result, the number of control units used for the estimation of the intervention effect may be lower than the number of treated units. With this method, some treated units may be matched to control units with a very different propensity score, simply because it is the nearest among those available. As a result, a maximum admissible distance between the two propensity scores needs to be set. The group $A_i$ may, however, be suitably redefined so as to include more than one neighbour for each treated unit (the number to be defined beforehand).

iii) Radius matching

Each treated unit is matched to the control units whose propensity score differs from its own by no more than a certain “radius” δ, and the number of controls to be used for the identification of $Y_i(0)$ is not fixed in advance:

$$A_i = \{j \in T_0 : e(X_i) - \delta \le e(X_j) \le e(X_i) + \delta\} \quad \text{with} \quad \sum_{j \in A_i} z_{ij} = 1$$

This procedure, compared to the previous one, has two basic differences: some treated units may be discarded because there is no untreated unit with a propensity score within the defined interval, and more than one untreated unit can be matched to a single treated unit when several untreated units have a propensity score included in the interval. The choice of the radius δ is a compromise between two competing requirements. If the radius is very small, some treated units will be lost, but comparisons will be made between “very similar units”; vice versa, a wide radius will mean a higher number of controls, but these will be “less similar” to the treated units.

iv) Kernel matching

Each treated unit is “matched” to all untreated units ($A_i = T_0$), with weights varying inversely to the distance of their propensity score from the treated unit's propensity score. We use this type of weighting system:

$$z_{ij} = \frac{\frac{1}{h} k\left(\frac{e(x_j) - e(x_i)}{h}\right)}{\sum_{j \in T_0} \frac{1}{h} k\left(\frac{e(x_j) - e(x_i)}{h}\right)}$$

where k(.)³ is a density function and h is the bandwidth parameter.

³ For example the Gaussian kernel density function:

$$k_j = \frac{1}{h} \frac{1}{\sqrt{2\pi}} \exp\left[-\frac{1}{2}\left(\frac{e(x_j) - e(x_i)}{h}\right)^2\right]$$
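The nearest-neighbour, radius and kernel rules differ only in how the weights $z_{ij}$ are spread over the control units. The Python sketch below computes these weights for one treated unit, taking the propensity scores as given. Equal weights within the matched set and the Gaussian kernel are assumptions of this illustration, as are the function names and the numbers.

```python
import math


def nn_weights(e_i, e_controls):
    """Single nearest neighbour on the propensity score; ties share the weight."""
    dists = [abs(e_i - e_j) for e_j in e_controls]
    best = min(dists)
    matched = [j for j, d in enumerate(dists) if d == best]
    return {j: 1.0 / len(matched) for j in matched}


def radius_weights(e_i, e_controls, delta):
    """Equal weights on all controls within delta; empty dict if none qualifies."""
    matched = [j for j, e_j in enumerate(e_controls) if abs(e_i - e_j) <= delta]
    return {j: 1.0 / len(matched) for j in matched}


def gaussian_kernel(u):
    return math.exp(-0.5 * u * u) / math.sqrt(2.0 * math.pi)


def kernel_weights(e_i, e_controls, h):
    """Weights on all controls, decreasing with the propensity score distance."""
    raw = [gaussian_kernel((e_j - e_i) / h) / h for e_j in e_controls]
    total = sum(raw)
    return {j: r / total for j, r in enumerate(raw)}


# Hypothetical estimated propensity scores.
e_controls = [0.20, 0.35, 0.40, 0.80]
e_treated = 0.36

w_nn = nn_weights(e_treated, e_controls)
w_radius = radius_weights(e_treated, e_controls, delta=0.05)
w_kernel = kernel_weights(e_treated, e_controls, h=0.1)
```

With a radius of 0.05 two controls share the weight equally, whereas the kernel rule gives every control a positive weight that sums to one. A treated unit for which `radius_weights` returns an empty dictionary has no admissible control and would be dropped from the estimation.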


1.6.1 Matching Estimators

We list the formulas for the matching estimators introduced in the previous

section and their variance:

Nearest Neighbor and radius matching

The average effect on the treated, applying the nearest neighbor or radius matching method, is given by the following formula (where n stands for either nearest neighbor or radius matching and the number of units in the treated group is denoted by $N^T$):

$$\tau^n = \frac{1}{N^T} \sum_{i \in T} \left( Y_i(1) - \sum_{j \in A_i} z_{ij} Y_j(0) \right) \quad \text{with} \quad \sum_{j \in A_i} z_{ij} = 1$$

The variance estimator assumes fixed weights and independent outcomes across units:

$$Var(\tau^n) = \frac{1}{(N^T)^2} \left[ \sum_{i \in T} Var(Y_i(1)) + \sum_{j \in T_0} (z_j)^2 Var(Y_j(0)) \right]$$

$$= \frac{1}{(N^T)^2} \left[ N^T Var(Y_i(1)) + \sum_{j \in T_0} (z_j)^2 Var(Y_j(0)) \right]$$

$$= \frac{1}{N^T} Var(Y_i(1)) + \frac{1}{(N^T)^2} \sum_{j \in T_0} (z_j)^2 Var(Y_j(0))$$

where $T_0$ denotes the control sub-group selected by the matching procedure and $z_j = \sum_i z_{ij}$. Standard errors are obtained analytically using the above formula, or using the bootstrap method, although the latter appears to be controversial for nearest neighbor matching, since bootstrap standard errors seem to be inconsistent for this procedure (Imbens, 2004).
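Given the weights $z_{ij}$, the estimator and its analytical variance reduce to a few sums. The Python sketch below assumes that the weights are supplied as one dictionary per treated unit and that the outcome variances are common within each group and already known or estimated; the function names and the numbers are hypothetical.

```python
def att_matching(y_treated, y_controls, weights):
    """ATT with weights[i] = {j: z_ij}, where the z_ij sum to one for each i."""
    total = 0.0
    for y_i, w_i in zip(y_treated, weights):
        y0_hat = sum(z_ij * y_controls[j] for j, z_ij in w_i.items())
        total += y_i - y0_hat
    return total / len(y_treated)


def att_variance(n_treated, weights, var_y1, var_y0):
    """Var(Y(1)) / N^T + sum_j z_j^2 Var(Y(0)) / (N^T)^2, with z_j = sum_i z_ij.

    Assumes a common variance within the treated and within the control group.
    """
    z = {}
    for w_i in weights:
        for j, z_ij in w_i.items():
            z[j] = z.get(j, 0.0) + z_ij
    return (var_y1 / n_treated
            + sum(z_j ** 2 for z_j in z.values()) * var_y0 / n_treated ** 2)


# Hypothetical outcomes and weights (control 0 is used by both treated units).
y_treated = [10.0, 12.0]
y_controls = [8.0, 9.0, 11.0]
weights = [{0: 1.0}, {0: 0.5, 1: 0.5}]

att = att_matching(y_treated, y_controls, weights)
variance = att_variance(len(y_treated), weights, var_y1=4.0, var_y0=4.0)
```

The term in $z_j^2$ shows the cost of reusing controls: control 0 accumulates a total weight of 1.5 and therefore contributes more to the variance than it would if each control were used once.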

Stratification matching:

By construction, the range of the propensity score is divided into blocks so that in each block the covariates are balanced and the assignment to treatment can be considered random. As a result, within each block q the treatment effect is estimated by the difference between the means of the observed outcomes for treated and untreated units:

$$\tau_q^s = \frac{\sum_{i \in q} Y_i(1)}{N_q^1} - \frac{\sum_{j \in q} Y_j(0)}{N_q^0}$$

where $N_q^1$ and $N_q^0$ denote the number of treated and control units inside each block q. The estimator of the ATT is computed by weighting each difference $\tau_q^s$ according to the distribution of treated units across blocks:

$$\tau^s = \sum_{q=1}^{Q} \tau_q^s \frac{N_q^1}{N^T}$$


where Q is the number of blocks and $N^T$ is the total number of treated units. Assuming independence of outcomes across units, the variance of $\tau^s$ is computed by:

$$Var(\tau^s) = \frac{1}{N^T} \left[ Var(Y_i(1)) + \sum_{q=1}^{Q} \frac{N_q^1}{N^T} \frac{N_q^1}{N_q^0} Var(Y_j(0)) \right]$$

Standard errors are obtained analytically using the above formula, or

using the bootstrap method.
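The stratification estimator can be sketched as follows in Python, assuming that the blocks have already been formed and checked for balance, and that each block contains at least one treated and one control unit. The function name and the data are hypothetical.

```python
def att_stratification(blocks):
    """ATT from propensity-score blocks.

    blocks: list of (treated outcomes, control outcomes), one pair per block.
    Each block difference is weighted by the block's share of treated units.
    """
    n_treated = sum(len(treated) for treated, _ in blocks)
    att = 0.0
    for treated, controls in blocks:
        tau_q = sum(treated) / len(treated) - sum(controls) / len(controls)
        att += tau_q * len(treated) / n_treated
    return att


# Hypothetical blocks: low-score block first, high-score block second.
blocks = [([4.0], [2.0, 4.0, 3.0]),
          ([9.0, 7.0, 8.0], [5.0])]

att = att_stratification(blocks)
```

The block differences are 1.0 and 3.0; since three of the four treated units fall in the second block, the ATT estimate (2.5) lies closer to the second block's difference.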

Kernel matching:

The kernel matching estimator is given by:

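$$\tau^{k} = \frac{1}{N^{T}} \sum_{i \in T} \Big[ Y_i(1) - \sum_{j \in I_0} z_{ij}\, Y_j(0) \Big]$$

where $z_{ij}$ is computed by the formula:

$$z_{ij} = \frac{\frac{1}{h}\, k\!\left( \frac{e_i(x) - e_j(x)}{h} \right)}{\sum_{j \in I_0} \frac{1}{h}\, k\!\left( \frac{e_i(x) - e_j(x)}{h} \right)}$$

with k(·) a kernel function and h a bandwidth. In this case standard errors are easily obtained using the bootstrap method.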
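The kernel weights can be computed directly from the estimated propensity scores, as in the minimal sketch below. The Gaussian kernel and the default bandwidth are assumptions chosen for the example; the text does not fix either of them.

```python
import numpy as np

def kernel_matching_att(y_treated, e_treated, y_control, e_control, h=0.06):
    """ATT by kernel matching on the propensity score (hypothetical helper).

    Uses a Gaussian kernel (an assumed choice). The 1/h factor and the kernel's
    normalizing constant cancel between numerator and denominator of z_ij.
    """
    y_treated = np.asarray(y_treated, dtype=float)
    y_control = np.asarray(y_control, dtype=float)
    e_treated = np.asarray(e_treated, dtype=float)
    e_control = np.asarray(e_control, dtype=float)

    u = (e_treated[:, None] - e_control[None, :]) / h
    k = np.exp(-0.5 * u ** 2)
    z = k / k.sum(axis=1, keepdims=True)  # each row of weights sums to 1

    att = np.mean(y_treated - z @ y_control)
    return att, z
```

Every control unit receives a positive weight for every treated unit, with weights decaying in the distance between propensity scores; the bandwidth h controls how fast they decay.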

1.7 Alternative estimation methods

This section outlines alternatives to matching methodologies. We focus on the difference-in-differences (DID) method and the Heckman selection model.

DID methods for estimating the causal effect of policy interventions are widely used in economics, in particular when outcomes are measured in both the treatment and the control group before and after the policy intervention. In the standard DID model we have N individuals (usually a random sample from the population), observed in time periods $T_i$ = (t-m), …, (t-1), (t), (t+1), …, (t+k), with (t-m), …, (t-1) and (t+1), …, (t+k) denoting the pre- and post-intervention periods, respectively, while the error terms $\varepsilon_i$ are assumed to be additive and constant over time. To account for time trends unrelated to the treatment, the change experienced by the group subject to the intervention (treatment group) is adjusted by the change experienced by the non-beneficiary group (control group). Meyer (1995), Angrist and Krueger (2000) and Blundell and MaCurdy (2000) describe many applications of this methodology. In the field of program evaluation, the difference-in-differences method (Moffit, 1991; Heckman and Robb, 1985) uses panel data to better define the control group and reduce selection bias. A large number of observations recorded before the programme starts at time t, $Y_{i,t-1}, Y_{i,t-2}, Y_{i,t-3}, \ldots, Y_{i,t-m}$, can (potentially) be included in the model. This makes it possible to highlight existing systematic differences between the treated and untreated groups. Taking these differences into account allows us to obtain an unbiased estimate of the treatment effect, since they could influence the outcome independently of the program. It is important to underline that a greater number of pre-treatment observations, which capture differences in temporal trends before the policy is implemented, improves the estimate of the unobservable counterfactual.
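The adjustment described above can be illustrated in its simplest form, with one pooled pre-intervention and one pooled post-intervention mean per group. The function name and the pooling of several periods into "pre" and "post" are assumptions of this sketch.

```python
import numpy as np

def did_estimate(y, treated, post):
    """Two-group, two-period difference-in-differences on group means (hypothetical helper)."""
    y = np.asarray(y, dtype=float)
    treated = np.asarray(treated, dtype=bool)
    post = np.asarray(post, dtype=bool)

    # Change experienced by the treatment group ...
    change_treated = y[treated & post].mean() - y[treated & ~post].mean()
    # ... adjusted by the change experienced by the control group (the time trend).
    change_control = y[~treated & post].mean() - y[~treated & ~post].mean()
    return change_treated - change_control
```

For example, if the treated mean moves from 11 to 16 while the control mean moves from 9 to 10, the estimate is (16 - 11) - (10 - 9) = 4.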

Note that the interpretation of the standard DID estimator depends on the assumptions made about how the treatment effect varies across individuals. The effect is often assumed to be constant across individuals; more generally, when the effect of the intervention differs across individuals, the standard DID estimator gives the average effect of the intervention on the treatment group.

Recently Imbens and Athey (2005) proposed an approach that departs from the standard DID method. They allow the effects of both time and the intervention to differ systematically across individuals (for example, a new medical technology that differentially benefits sicker patients). The setting considered in their research is that of repeated cross-sections4 of individuals observed in a treatment group and a control group, before and after the treatment. They propose an estimator for the entire counterfactual distribution of treatment effects on the treatment group, as well as for the distribution of treatment effects on the control group, where the two distributions may differ from each other in arbitrary ways. First, they propose a new model that relates outcomes to an individual’s group, time and unobservable characteristics. Groups can differ in arbitrary ways (in particular, the treatment group might include individuals who experience a high treatment benefit). In the DID method the mean of individual outcomes in the absence of the treatment can vary by group and by time. In contrast, in their model time periods and groups are treated asymmetrically. Second, they provide conditions under which the model is identified nonparametrically and propose an estimation strategy based on the identification result. They use the entire pre- and post-intervention outcome distributions of the control group to estimate nonparametrically the change that occurred in that group. Assuming that the outcome distribution of the treatment group would have undergone the same change, they estimate the counterfactual distribution for the treatment group in the second period. They compare this counterfactual distribution with the actual post-intervention distribution for the treatment group, which yields an estimate of the distribution of treatment effects for treated units. A similar strategy gives the treatment effect on the control units. In other words, to figure out what would have happened to a treated unit with first-period outcome Y, they look at units in the first-period control group with the same outcome Y. Under a weak monotonicity assumption, the distribution of the second-period outcomes of these control units can be derived and used to obtain the counterfactual distribution for the second-period treated units in the absence of the policy intervention.

4 They also apply their model to panel data.
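The construction described above can be sketched with empirical distribution functions. For a treated unit with first-period outcome y, the counterfactual second-period outcome is taken to be the quantile of the control group's second-period distribution at the rank that y has in the control group's first-period distribution. This rendering for continuous outcomes, the function names and the use of plain empirical CDFs are assumptions of the example, not a reproduction of the authors' estimator.

```python
import numpy as np

def cic_counterfactual(y_treated_pre, y_control_pre, y_control_post):
    """Counterfactual second-period outcomes for treated units (hypothetical helper)."""
    y00 = np.sort(np.asarray(y_control_pre, dtype=float))
    y01 = np.sort(np.asarray(y_control_post, dtype=float))
    y10 = np.asarray(y_treated_pre, dtype=float)

    # Empirical CDF of first-period control outcomes, evaluated at each treated outcome.
    ranks = np.searchsorted(y00, y10, side="right") / len(y00)
    # Empirical quantile function of second-period control outcomes at those ranks;
    # ranks of 0 (treated outcome below the control support) are clipped to the minimum.
    idx = np.clip(np.ceil(ranks * len(y01)).astype(int) - 1, 0, len(y01) - 1)
    return y01[idx]

def cic_mean_effect(y_treated_pre, y_treated_post, y_control_pre, y_control_post):
    """Mean effect on the treated: observed mean minus counterfactual mean."""
    counterfactual = cic_counterfactual(y_treated_pre, y_control_pre, y_control_post)
    return float(np.mean(y_treated_post) - counterfactual.mean())
```

When the control group simply shifts by a constant between periods, the counterfactual for each treated unit is its first-period outcome plus that constant, which is the case in which this construction agrees with the standard DID adjustment.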

In this way it is possible to address a range of economic questions raised by policy analysis, for example which part of the distribution benefits most from a policy intervention, while relying throughout on a consistent economic model of the outcomes. The proposed changes-in-changes (CIC) model has several advantages. It allows the distribution of unobservable characteristics to vary across groups in arbitrary ways. It allows the distribution of outcomes to change over time, in both mean and variance, even in the absence of a policy intervention. Moreover, it makes it possible to estimate the effects of a policy on the mean and variance of the treatment group’s outcome distribution relative to the underlying time trend in these moments. The standard DID model is a special case of the changes-in-changes model.

One common concern (Besley and Case, 2000) is that the effects identified by DID may not be correct if the policy was adopted in a “field” that derives atypical benefits from the intervention. This implies that the treatment group may differ from the control group not only in the distribution of outcomes in the absence of the treatment, but also in the effects of the treatment. Athey and Imbens’ model allows for both of these possibilities, because the effect of the treatment may vary with an individual’s unobservable characteristics and the distribution of unobservables may vary across groups.

Another model that is commonly used when the hypothesis of selection on observables (the unconfoundedness assumption) is dropped is the Heckman selection model (Heckman, 1974), which in its simplest form can be specified as follows:

$$Y_i = \beta_0 + \beta_1 X_i + \beta_2 Z_i + \varepsilon_{1i}$$

$$Z_i^{*} = \gamma_0 + \gamma_1 X_i + \varepsilon_{2i}$$

that is, the model includes a latent dependent variable. $Y_i$ is the outcome and $Z_i^{*}$ is the latent variable underlying the treatment indicator Z. X is a matrix of dimension $(N_1 + N_0) \times h$, where h is the number of time-invariant characteristics measured for the i-th unit before the policy intervention. The error components $\varepsilon_1$, $\varepsilon_2$ are assumed to be jointly normally distributed conditional on X, with zero mean vector and variance matrix $\Sigma$, so that:

$$(\varepsilon_1, \varepsilon_2) \sim N(0, \Sigma)$$

with $corr(\varepsilon_1, \varepsilon_2) = \rho$

The bivariate normality assumption on the errors can be relaxed in the following cases: by maintaining the monotonicity assumption when an instrumental variable is available (semiparametric Heckman selection models; Deaton (1989), Hausman and Newey (1995)) or, if no instrumental variable is available, by introducing nonparametric bounds (Lee, 2005). However, most recent studies aim to develop semiparametric versions of selection models (Newey and Vella, 2003) while keeping some of the previous assumptions: Powell (1987), Newey (1988), Ahn and Powell (1993) and Honore and Powell (2001).

2 Multivalued treatment

2.1 Introduction

The Rubin causal model is usually presented for binary treatments, although in many cases of interest the treatment takes on more than two values. Examples include a drug administered in different doses, a treatment applied over different time periods, and labour market programmes, which require a more complex framework that reflects the actual choice set of individuals, typically characterized by more than two options. In all these cases the standard propensity score methodology must be modified in a non-trivial way. Methods have therefore been developed to extend the conventional two-treatment framework and allow for the estimation of average causal effects with multiple mutually exclusive treatments. Imbens (1999) and Lechner (2000) made the major methodological contributions in this respect. They refine identification using strong and weak unconfoundedness assumptions for the case of more than two treatments. In the following sections we present and compare both approaches.

2.2 The basic framework.

In order to extend the application of the propensity score from the binary treatment case to arbitrary treatment regimes, we restate the basic assumptions of the binary case that can usefully be generalized to multiple treatments. Let us summarize the conventional Rubin Causal Model. We have a binary treatment, that is $Z_i \in \{0, 1\}$ (see footnote 1). Associated with each unit i = 1, 2, …, N and each value z of the treatment, there is a potential outcome $Y_i(z)$. We are interested in the average outcome E[Y(z)], and in particular in the average causal effect of exposing units to the treatment or not: $E[Y_i(1) - Y_i(0)]$. A key assumption for the identification of causal effects, which we now restate, is the unconfoundedness assumption in its strong and weak forms.

Assumption 2.1 Strong unconfoundedness assumption

Assignment to treatment Z is strongly ignorable, given pretreatment variable X, if

$$\{Y(z)\}_{z \in \{0,1\}} \perp Z \mid X$$

In order to state the weaker version of unconfoundedness, we define $D_i(z)$ to be the indicator, for unit i, of receiving treatment z:

$$D_i(z) = \begin{cases} 1 & \text{if } Z_i = z \\ 0 & \text{otherwise} \end{cases}$$

As a result, the weak unconfoundedness assumption is defined in the following way:

Assumption 2.2 Weak unconfoundedness assumption

Assignment to treatment Z is weakly ignorable, given pretreatment variable X, if

$$Y(z) \perp D(z) \mid X \quad \text{for all } z \in \{0, 1\}.$$

1 See the assumptions in the previous section on potential outcomes.

As we can see, Rosenbaum and Rubin show that strong unconfoundedness requires the treatment Z to be independent of the entire set of potential outcomes, while weak unconfoundedness requires only pairwise independence of the treatment indicator with each of the potential outcomes. Moreover, weak unconfoundedness requires only a local independence of the potential outcome Y(z) with respect to the treatment level considered, that is, independence of the level indicator D(z) rather than of the treatment variable Z itself. In the binary treatment case the two conditions obviously coincide. The importance of the two versions of the ignorability assumption depends on what we are interested in estimating. In particular, the weak unconfoundedness concept is linked to the missing data problem of causal inference. More often the concern is, in fact, with the average of $Y_i(z)$ in the sub-sample with $D_i(z) = 1$. Units with $D_i(z) = 0$ did not receive treatment level z, and the other potential outcomes $Y_i(0)$ are never observed for the units with $D_i(z) = 1$, so they can play no role in any procedure that adjusts for differences by defining subpopulations. This lack of relevance is well reflected by the weak ignorability assumption. In addition we report the following Lemma 2.1 and Lemma 2.2.

Let e(X) be the propensity score in the binary treatment case. We have:

Lemma 2.1 Balancing property of pre-treatment variables given the

propensity score (Rosenbaum and Rubin, 1983)

$$X \perp Z \mid e(X)$$

Lemma 2.2 Weak unconfoundedness given the propensity score with

binary treatments (Rosenbaum and Rubin, 1983)

$Y(z) \perp D(z) \mid e(X)$ for all $z \in \{0, 1\}$

According to this result, it is sufficient to condition on the propensity score instead of the entire set of covariates (Imbens, 1999). Formally, we also report the following theorem, which will be useful in section 2.4 to introduce the estimation of average treatment effects in the multi-valued treatment case.

Theorem 2.1 Adjustment for the propensity score given the weak unconfoundedness assumption:

i) $\mu(z, e) \equiv E[Y(z) \mid e(X) = e] = E[Y(z) \mid Z = z, e(X) = e]$

ii) $E[Y(z)] = E[\mu(z, e(X))]$ for all $z \in \{0, 1\}$.

2.3 Multiple treatment

From now on, we allow the treatment variable to take on integer values between 0 and k. Let T be the treatment variable in the multi-valued case, taking values in $\mathcal{T} = \{0, 1, \ldots, k\}$, and $X_i$ the set of covariates, with $X \in \chi$. It is assumed that each individual i = 1, 2, …, N is assigned to one specific treatment.

We are interested in the population average treatment effect and, in particular, in the average causal effect of exposing units to treatment t rather than to treatment s, that is:

$$ATE_{ts} = E[Y(t) - Y(s)]$$

which denotes the ATE of the treatment t relative to treatment s for a

participant drawn randomly from the population. The average effect of

treatment t relative to treatment s, for the sub-population having received

treatment level t only, can be defined as follows:

$$ATT_{ts} = E[Y(t) - Y(s) \mid T = t]$$

Imbens and Lechner refer to different versions of the unconfoundedness assumption, according to the type of treatment effect that has to be identified and estimated. The following ignorability assumptions can be introduced:

Assumption 2.3 Weak unconfoundedness given pre-treatment variables X (version 1) (Imbens 1999)

$$Y(t) \perp D(t) \mid X \quad \forall t \in \mathcal{T}$$

Assumption 2.4 Weak ignorable assumption (version 2)

$$Y(s) \perp D(t), D(s) \mid X$$

Assumption 2.5 Strong ignorability assumption (Lechner, 2000)

$$Y(t), Y(s) \perp T \mid X = x$$

Assumption 2.6 Weak ignorability assumption (Lechner, 2000)

$$Y(s) \perp T \mid X = x,\ T \in \{s, t\}$$

In summary, the average treatment effects that can be identified under each of the previous assumptions are:

$$ATE_{ts} = E[Y(t) - Y(s)]$$

according to assumption 2.3 and assumption 2.5, and

$$ATT_{ts} = E[Y(t) - Y(s) \mid D(t) = 1]$$

according to assumption 2.4 and assumption 2.6.

Again, there are many background characteristics that need to be controlled for when estimating the average causal effect, and adjusting the estimate for all these covariates can be infeasible. In this sense, a propensity score generalized to arbitrary treatment regimes is very useful, since it summarizes the information on the background characteristics in a single summary score. We therefore need to modify the standard definition of the propensity score to obtain a generalized propensity score (Imbens, 1999):

Definition 2.1 Generalized propensity score

The Generalized propensity score (GPS) is the conditional probability of receiving a particular level of the treatment given the pre-treatment variables:

$$r(t, x) \equiv \Pr(T = t \mid X = x) = E[D(t) \mid X = x]$$

According to this notation, the propensity score in the binary treatment case is equivalent to:

$$e(x) = r(1, x)$$

Hence, i) the GPS defines a single random variable as a transformation of the two random variables T and X: r(T,X); ii) it defines a family of random variables indexed by t as a transformation of X alone: r(t,X) for all $t \in \mathcal{T}$.
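The distinction between the single variable r(T,X) and the family r(t,X) can be made concrete with a small example. The sketch below estimates the GPS by cell frequencies for one discrete covariate; this estimator, the function name and the toy data are assumptions of the example, standing in for a parametric model of the treatment probabilities.

```python
import numpy as np

def gps_by_cell(t, x, levels):
    """Estimate r(t, x) = Pr(T = t | X = x) by cell frequencies (hypothetical helper).

    Returns an (n_units, n_levels) array: the column for level t holds r(t, X_i).
    """
    t = np.asarray(t)
    x = np.asarray(x)
    r = np.zeros((len(t), len(levels)))
    for value in np.unique(x):
        cell = x == value
        for col, level in enumerate(levels):
            r[cell, col] = np.mean(t[cell] == level)
    return r

# Toy data: three treatment levels 0, 1, 2 and one binary covariate.
t = np.array([0, 1, 2, 2, 0, 0, 1, 1])
x = np.array([0, 0, 0, 0, 1, 1, 1, 1])
levels = [0, 1, 2]

r_all = gps_by_cell(t, x, levels)        # family r(t, X): one column per level t
# Single variable r(T, X): the score at the level actually received.
# Indexing columns by t works here because the levels are exactly 0, 1, 2.
r_own = r_all[np.arange(len(t)), t]
```

Each row of `r_all` sums to one, while `r_own` picks one entry per unit, so the two objects carry different information.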

The GPS also satisfies the balancing property, like the conventional

propensity score:

Lemma 2.3 Balancing property of the Generalized Propensity

Score

$D(t) \perp X \mid r(t, X)$ for all $t \in \mathcal{T}$.

Proof (Imbens, 1999)

First we have

$$\Pr[D(t) = 1 \mid X, r(t, X)] = E[D(t) \mid X] = r(t, X)$$

since by definition

$$r(t, X) = E[D(t) \mid X]$$

Second

$$\Pr[D(t) = 1 \mid r(t, X)] = E[D(t) \mid r(t, X)] = E\big[ E[D(t) \mid X, r(t, X)] \mid r(t, X) \big] = E[r(t, X) \mid r(t, X)] = r(t, X)$$

Hence

$$\Pr[D(t) = 1 \mid X, r(t, X)] = \Pr[D(t) = 1 \mid r(t, X)],$$

that is, conditionally on r(t,X), the treatment indicator D(t) and the pre-treatment variables are independent. It is important to note that the conditioning argument changes with the level of treatment. As a result, to guarantee conditional independence of the multi-valued treatment T and the covariates X, we need to condition on the entire set $\{r(t, X)\}_{t \in \mathcal{T}}$. It is only in the binary treatment case that conditioning on $\{r(t, X)\}_{t \in \mathcal{T}}$ is identical to conditioning on a single score e(X). All the previous unconfoundedness assumptions can therefore be restated in terms of the generalized propensity score. In fact, if the strong or weak ignorability assumptions given the covariates hold, then:

Theorem 2.2 Weak unconfoundedness given GPS (Imbens, 1999)

Suppose assignment to treatment T is weakly unconfounded given pre-treatment variables X (version 1), then:

$$Y(t) \perp D(t) \mid r(t, X) \quad \forall t \in \mathcal{T}$$

Proof

$$\Pr[D(t) = 1 \mid Y(t), r(t, X)] = E[D(t) \mid Y(t), r(t, X)]$$

$$= E\big[ E[D(t) \mid Y(t), X, r(t, X)] \mid Y(t), r(t, X) \big]$$

$$= E[r(t, X) \mid Y(t), r(t, X)] = r(t, X)$$

where the third equality uses weak unconfoundedness given X. Moreover, as shown in the proof of Lemma 2.3,

$$\Pr[D(t) = 1 \mid r(t, X)] = r(t, X).$$

Hence,

$$\Pr[D(t) = 1 \mid Y(t), r(t, X)] = \Pr[D(t) = 1 \mid r(t, X)],$$

so, conditionally on r(t,X), D(t) and Y(t) are independent.

Assumption 2.7 Weak unconfoundedness given GPS (version 2)

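$$Y(s) \perp D(t), D(s) \mid r(t, X), r(s, X)$$

Following Lechner’s approach we can restate the previous assumptions 2.5 and 2.6, given the pre-treatment variables, in the following way:

Assumption 2.8 Strong unconfoundedness given GPS (Lechner, 2000)

If $Y(t), Y(s) \perp T \mid X = x$ and $0 < \Pr(T = j \mid X = x) < 1$ hold for $\forall x \in \chi$ and for $\forall j = 0, 1, \ldots, t, \ldots, s, \ldots, k$,

it follows that

$$Y(t), Y(s) \perp T \mid \big[ \Pr(T = t \mid X) = \Pr(T = t \mid X = x),\ \Pr(T = s \mid X) = \Pr(T = s \mid X = x),\ \ldots,\ \Pr(T = k \mid X) = \Pr(T = k \mid X = x) \big]$$

Assumption 2.9 Weak unconfoundedness given GPS (Lechner, 2000)

If $Y(s) \perp T \mid X = x,\ T \in \{s, t\}$ and $0 < \Pr(T = j \mid X = x) < 1$ hold for $\forall x \in \chi$ and $\forall j = t, s$,

it follows that

$$Y(t), Y(s) \perp T \mid \big[ \Pr^{s|ts}(X) = \Pr^{s|ts}(x),\ T \in \{t, s\} \big]$$

where

$$\Pr^{s|ts}(x) = \Pr(T = s \mid T \in \{s, t\}, X = x) = \frac{\Pr(T = s \mid X = x)}{\Pr(T = s \mid X = x) + \Pr(T = t \mid X = x)}.$$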

2.4 Implementation of the GPS in multi-valued treatments.

Since the GPS has properties analogous to those of the propensity score used in the binary treatment case, we now use it in place of the covariates to estimate $ATE_{ts}$ and $ATT_{ts}$. In the binary treatment case, the propensity score is computed using a logistic regression. In the multi-valued case, multinomial logit or nested logit models can be applied (with ordered levels of treatment in the second case, for example the dose of a drug or the time over which a treatment is applied). Given the generalized propensity score, we can estimate the average outcomes by conditioning solely on the GPS. According to Theorem 2.2, and imposing smoothness of the expectation function where appropriate, the conditional expectation of the outcome can be estimated (Imbens, 1999), given the treatment t and the probability of receiving the treatment actually received, by applying the following theorem:

Theorem 2.3 Estimation of Average Potential Outcomes given the generalized propensity score. Suppose assignment to treatment is weakly unconfounded given the pre-treatment variables. Then

i) $\beta(t, r) = E[Y(t) \mid r(t, X) = r] = E[Y \mid T = t, r(T, X) = r]$

ii) $E[Y(t)] = E[\beta(t, r(t, X))]$ by iterated expectations

for all $t \in \mathcal{T}$.

Proof

The proof concerns part i), since part ii) follows by applying iterated expectations:

$$E[Y \mid T = t, r(T, X) = r] = E[Y(t) \mid T = t, r(T, X) = r]$$

$$E[Y \mid T = t, r(t, X) = r] = E[Y(t) \mid D(t) = 1, r(t, X) = r]$$

which by the weak unconfoundedness assumption is equal to

$$E[Y(t) \mid r(t, X) = r]$$

Note that to obtain the population average value (which, as we will show in the continuous case, is what yields the causal effect estimate) we need to apply iterated expectations to β(t,r), i.e. $E[Y(t)] = E[\beta(t, r(t, X))]$. Consider the subpopulations obtained by stratifying the population on the GPS. The average outcome of units with treatment t and r(T,X) = r is an unbiased estimate of the average of Y(t) in the subpopulation with r(t,X) = r, because for units with T = t the condition r(T,X) = r is the same as r(t,X) = r. Likewise, the average outcome of units with T = s and r(T,X) = r is unbiased for the average of Y(s) in the subpopulation with r(s,X) = r, which in general is a different set of units from the subpopulation with r(t,X) = r. Hence no causal comparison is possible within the subpopulation defined by r(T,X) = r, and the regression of the observed Y on the treatment level T and the GPS r(T,X) has no causal interpretation.
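The two parts of Theorem 2.3 can be illustrated on a small constructed data set in which assignment depends on a binary covariate. The sketch first computes β(t,r) as a within-score mean among units that received level t, then averages it over r(t,X) for all units. The cell-frequency estimate of the GPS, the function name and the data are assumptions of the example.

```python
import numpy as np

def average_potential_outcome(y, t, x, level):
    """Estimate E[Y(level)] by conditioning on the GPS only (hypothetical helper)."""
    y = np.asarray(y, dtype=float)
    t = np.asarray(t)
    x = np.asarray(x)

    # r(level, X_i) by cell frequencies, a stand-in for a parametric model.
    r = np.zeros(len(t))
    for value in np.unique(x):
        cell = x == value
        r[cell] = np.mean(t[cell] == level)
    r = np.round(r, 12)  # so that equal scores compare equal

    total = 0.0
    for score in np.unique(r):
        same_score = r == score
        received = same_score & (t == level)
        if not received.any():
            raise ValueError("no unit with this score received the level (no overlap)")
        # Part i): beta(level, score) is the mean outcome of units with T = level.
        # Part ii): weight it by ALL units with r(level, X) = score.
        total += y[received].mean() * same_score.sum()
    return total / len(t)

# Constructed data: Y(t) = t + 2 X, so E[Y(t)] = t + 1 because E[X] = 0.5.
x = np.array([0] * 6 + [1] * 6)
t = np.array([0, 0, 0, 1, 1, 2, 0, 1, 1, 2, 2, 2])
y = t + 2.0 * x

naive_mean_level_2 = y[t == 2].mean()                      # confounded by X
adjusted_mean_level_2 = average_potential_outcome(y, t, x, 2)
```

In this example the naive mean among units with T = 2 overstates E[Y(2)] because those units mostly have X = 1, while averaging β over r(2,X) for the whole sample recovers the population value.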

Formally consider the difference

$$\beta(t, r) - \beta(s, r) = E[Y(t) \mid T = t, r(T, X) = r] - E[Y(s) \mid T = s, r(T, X) = r]$$

by the weak ignorability assumption (version 1) this is equal to

$$E[Y(t) \mid r(t, X) = r] - E[Y(s) \mid r(s, X) = r]$$

but there is no causal interpretation for the comparison conditional on the GPS value, because the conditioning sets differ:

$$\{x \mid r(t, x) = r\} \neq \{x \mid r(s, x) = r\}$$

In order to obtain a causal interpretation, we need to condition the difference on the intersection of the two conditioning sets, that is:

$$E[Y(t) \mid T = t, r(t, X), r(s, X)] - E[Y(s) \mid T = s, r(t, X), r(s, X)] = E[Y(t) - Y(s) \mid r(t, X), r(s, X)]$$

But, if the researcher is interested in the dose-response of a specific sub-population or in the average effect of a specific treatment versus another one, the average should be computed over the distribution of the pre-treatment variables in that particular sub-population. For example, we can estimate the expected (average) effect of treatment t relative to treatment s for the sub-population that received treatment level t. In particular, under weak unconfoundedness (version 2), $ATT_{ts}$ is equal to:

$$ATT_{ts} = E[Y(t) - Y(s) \mid D(t) = 1]$$

$$= E[Y(t) \mid D(t) = 1] - E[Y(s) \mid D(t) = 1]$$

$$= E\big[ E[Y(t) \mid D(t) = 1, X] \big] - E\big[ E[Y(s) \mid D(t) = 1, X] \big]$$

which by weak unconfoundedness is equal to:

$$E_{X \mid D(t) = 1}\big[ E[Y(t) \mid D(t) = 1, X] \big] - E_{X \mid D(t) = 1}\big[ E[Y(s) \mid D(s) = 1, X] \big]$$

and given the generalized propensity score, we can rewrite it as follows:

$$E_{r(t,x) \mid D(t) = 1}\big[ E[Y(t) \mid r(t, X) = r] \big] - E_{r(s,x) \mid D(t) = 1}\big[ E[Y(s) \mid r(s, X) = r] \big]$$

where the outer expectation is over the units that received treatment level t. Since the treatment can take on more than two values, it is important to make sure that there is sufficient overlap in the distribution of the pre-treatment variables across the treatments of interest. The procedure is to compare, for each value of t, the univariate distribution of r(t,X) conditional on T = t with the same distribution conditional on T ≠ t. If the two distributions are similar, all adjustment methods can be expected to perform well. Of course, other procedures can be applied to estimate the average treatment effects. For example, we can use matching techniques, assuming that each individual is assigned to one specific treatment and that, for any participant, only one component of $\{Y(t)\}_{t \in \mathcal{T}}$ can be observed, while the remaining outcomes are counterfactuals. We introduce a pairwise comparison of the treatments t and s according to the following equations (Lechner, 2000):

$$ATE_{ts} = E[Y(t) - Y(s)] = E[Y(t)] - E[Y(s)]$$

that denotes the ATE of the treatment t relative to treatment s for a

participant drawn randomly from the population.

Note that ATEts can be re-written in the following way:

$$ATE_{ts} = E[Y(t) - Y(s)] = \sum_{j=0}^{k} \big[ E(Y(t) \mid T = j) - E(Y(s) \mid T = j) \big]\, P(T = j)$$

The strong unconfoundedness condition identifies all counterfactuals $E(Y(t) \mid T = j)$ and $E(Y(s) \mid T = j)$, because it implies $E(Y(t) \mid X = x, T = j) = E(Y(t) \mid X = x, T = t)$ and $E(Y(s) \mid X = x, T = j) = E(Y(s) \mid X = x, T = s)$, $\forall j = 0, 1, \ldots, k$.

As a result, $ATE_{st}$, $ATE_{ts}$, $ATT_{ts}$ and $ATT_{st}$ are identified.

The expected effect for an individual randomly drawn from the population of participants in treatment t only is, instead, equal to (see footnote 2):

$$ATT_{ts} = E[Y(t) - Y(s) \mid T = t] = E(Y(t) \mid T = t) - E(Y(s) \mid T = t)$$

The weak unconfoundedness condition identifies only the counterfactual $E(Y(s) \mid T = t)$ that is needed to compute $ATT_{ts}$. Note that this last assumption follows from independence of assignment in the population, which implies independence in any subpopulation defined by treatment participation categories.

However, a stronger ignorability assumption on treatment assignment (with respect to assumption 2.5) can also be adopted for arbitrary treatment regimes, in order to model T without conditioning on potential outcomes (Van Dyk and Imai, 2003). We postpone the discussion of the generalization of the propensity score under the strong ignorability assumption to the continuous treatment case, where we also compare Van Dyk and Imai’s approach (2003) with Hirano and Imbens’ elaboration (2004) of the propensity score method for treatment effect estimation.

2 It is evident that, if the participants in treatments t and s differ in a non-random way, this can influence the outcome values: ATTts ≠ ATTst, that is to say they are not symmetric.

3 Continuous treatment

3.1 Introduction

We showed how, under specific assumptions such as strongly ignorable treatment assignment, multivariate adjustment methods based on the propensity score reduce the bias that arises in observational studies.

In this project we implement an extension of the propensity score method to a setting with a continuous treatment, that is, we refer to the generalized propensity score already introduced in the multiple treatment case. We make an unconfoundedness assumption (Rosenbaum and Rubin, 1983) and adjust for the Generalized Propensity Score (a function of the covariates) to remove all bias associated with differences in the covariates. The Generalized Propensity Score is a generalization of the binary treatment propensity score (Hirano and Imbens, 2004; Van Dyk and Imai, 2003) and shares many of its characteristics, including the balancing property, which is essential to assess the correct specification of the score. We carry out estimation and inference for the causal effects of interest parametrically (although a nonparametric version is possible). We apply this methodology to the public contributions (the treatment variable) granted to enterprises in Piedmont during the years 2001-2003. Because of the variety of funds provided by public policies, the treatment turns out to be a continuous variable. We are interested in the effect of the amount of the contribution on the level of employment.

We estimate the average effect of the contribution, adjusting for differences in background characteristics using the propensity score methodology, and compare the results with conventional regression-based methods. According to the empirical evidence (Dehejia and Wahba 1999; Imai 2004), the former methodology often leads to more robust results than the latter or than other estimation methods, such as the DID or selection models presented in section 1.7.

3.2 Framework

We consider a sample of units i = 1, 2, …, N and, for each unit, a set of potential unit-level outcomes Yi(t) for t ∈ τ. In the binary treatment case τ = {0,1}, while in the continuous case τ ⊂ [t0,t1]. We are interested in the average dose-response function µ(t) = E[Yi(t)]. For each unit we observe the vector of covariates Xi, the level of the treatment received Ti and the potential outcome corresponding to that level, Yi = Yi(Ti).

We assume that {Y(t)}t∈τ, T and X are defined on a common probability space, that T is continuously distributed with respect to Lebesgue measure on τ, and that Y = Y(T) is a well-defined random variable, i.e. that Y(·) is suitably measurable.

We are interested in the estimation of average causal effects, which can be computed through the dose-response function µ(t), and in particular in the ATE and ATT, such as:

$$ATE_{\Delta t, t} = E[Y(t + \Delta t) - Y(t)]$$

$$ATT_{\Delta t, t} = E[Y(t + \Delta t) - Y(t) \mid T = t]$$

that is, in the continuous treatment case we may be interested in estimating the marginal treatment effect, for example with respect to a specific treatment level t. Hirano and Imbens (2004) generalize the unconfoundedness assumption for binary treatments (Rosenbaum and Rubin 1983) to the continuous case; this assumption is crucial for the estimation of the above quantities.

Assumption 3.1 Strong ignorability assumption of treatment

assignment (Van Dyk and Imai, 2003)

$$\{Y(t)\}_{t \in \tau} \perp T \mid X$$

Assumption 3.2 Weak unconfoundedness assumption (Imbens and

Hirano, 2004)

Y(t) ⊥ T|X for all t ∈ τ.

Assumption 3.2 requires conditional independence for each value of the treatment t ∈ [t0,t1], not joint independence of all potential outcomes $\{Y(t)\}_{t \in \tau}$.

As already underlined, there are many characteristics that need to be controlled for when estimating the average treatment effect. The generalized propensity score reduces the entire collection of background characteristics to a single “composite” variable that appropriately summarizes them. The GPS is defined as follows:

Definition 3.1 Generalized Propensity Score

Let r(t,x) be the conditional density function of the treatment given the covariates

r(t,x) = fT|X (t|x)

such that R = r(T,X) and r(t,X), for every t ∈ τ, are well-defined random variables.

The conditional distribution fT|X(t|x) must be modeled and its unknown parameters estimated, for example by maximum likelihood (see footnote 1). Misspecification of the model for the propensity score is possible and generally leads to biased causal inference. Hence, care must be taken to identify as many covariates as possible, as well as to check for model misspecification (Drake, 1993). The generalized propensity score can also be defined through a propensity function:

$$f^{\psi}_{T|X}(t \mid x) = r^{\psi}(T, X)$$

whose distribution is assumed to be parameterized by ψ (Van Dyk and Imai, 2003). Under this analytical framework, it is possible to derive theoretical results which extend those in Rosenbaum and Rubin (1983b). Van Dyk and Imai (2003) show that the propensity score is a balancing score even with a non-binary treatment, so that it can be applied to arbitrary treatment regimes, while also reducing the dimensionality of X enough to allow the application of efficient estimation techniques. Formally we have:

Lemma 3.1 Balancing of pre-treatment variables given the generalized propensity score (Van Dyk and Imai, 2003)

Within strata with the same value of r(t,X), the probability that T = t does not depend on the value of X:

X ⊥ 1{T=t} | r(t,X)

This result does not require unconfoundedness.

1 One possibility is a normal distribution for the treatment given the covariates, $T_i \mid X_i \sim N(h(X_i; \beta), \sigma^2)$, where β is the parameter vector to be estimated, $h(X_i; \beta)$ is a known function of the covariates that depends on β, and σ2 is the unknown common variance of the errors.
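The normal model mentioned in footnote 1 can be estimated in a few lines. The sketch below assumes that h is linear in the covariates, so that the maximum likelihood estimate of β coincides with least squares; the function names and the linear form are assumptions of the example.

```python
import numpy as np

def fit_normal_gps(t, x):
    """Fit T | X ~ N(b0 + x'b, sigma^2); least squares is the ML estimate here."""
    t = np.asarray(t, dtype=float)
    x = np.asarray(x, dtype=float)
    design = np.column_stack([np.ones(len(t)), x])
    beta, *_ = np.linalg.lstsq(design, t, rcond=None)
    sigma2 = np.mean((t - design @ beta) ** 2)  # ML estimate of the error variance
    return beta, sigma2

def gps(t_value, x, beta, sigma2):
    """Evaluate r(t, x): the fitted normal density of the treatment at t_value given x.

    t_value may be a scalar (the family r(t, X)) or the observed treatments
    (the single variable R = r(T, X)).
    """
    x = np.asarray(x, dtype=float)
    design = np.column_stack([np.ones(len(x)), x])
    mean = design @ beta
    return np.exp(-0.5 * (t_value - mean) ** 2 / sigma2) / np.sqrt(2.0 * np.pi * sigma2)
```

Calling `gps(t, x, beta, sigma2)` with the observed treatments gives R = r(T,X), while calling it with a fixed scalar level gives r(t,X) for every unit.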

The following theorem establishes that the potential outcomes and the

treatment assignment are conditionally independent given the generalized

propensity score. Formally we write:

Theorem 3.1 Strong unconfoundedness given the Generalized

Propensity Score (Van Dyk and Imai, 2003)

$$f\big( \{Y(t)\}_{t \in \tau} \mid T, r(\cdot, X) \big) = f\big( \{Y(t)\}_{t \in \tau} \mid r(\cdot, X) \big)$$

Proof (Van Dyk and Imai, 2003)

Theorem 3.2 Weak unconfoundedness given the Generalized

Propensity Score (Van Dyk and Imai, 2003)

If assignment to the treatment is (weakly) unconfounded given the

pre-treatment variables X, then:

fT (t| r(t,X), Y(t)) = fT (t|r(t,X)) for each value of t.

Proof (Van Dyk and Imai, 2003)

In other words, if the balancing hypothesis of Lemma 3.1 is satisfied, observations with the same GPS must have the same distribution of observable characteristics, independently of the value of the treatment. So, just as for the standard propensity score, exposure to treatment is random and treated and control units should be on average identical. Since the generalized propensity score has properties equivalent to those of the propensity score for binary treatments, it can be used in place of the covariates as a one-dimensional score summarizing the information on the background characteristics, leading to more efficient estimates of average treatment effects. The difference here is that the conditional density of the treatment at level t corresponds to the generalized propensity score evaluated at the same t: this implies as many propensity scores as there are levels of treatment, each to be used one at a time. In particular, using the GPS in connection with smoothing techniques we have:

Theorem 3.3 Bias removal with Generalized Propensity Score

(Imbens and Hirano, 2004).

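Suppose that the assumptions of Theorem 3.2 are satisfied, that is, assignment to the treatment is weakly unconfounded given the pre-treatment variables X. Then:

(i) $\beta(t,r) = E[Y(t) \mid r(t,X) = r] = E[Y \mid T = t, R = r]$

(ii) $\mu(t) = E[\beta(t, r(t,X))] = E[E[Y(t) \mid r(t,X)]] = E[Y(t)]$ (by iterated expectations)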

Theorem 3.3 implies that the dose-response function µ(t) can be estimated in two steps. First, the conditional expectation of the outcome, β(t,r) = E[Y | T = t, R = r], is estimated as a function of a specific level of the treatment T = t and of a specific value of the GPS R = r. Second, the dose-response function, µ(t) = E[β(t, r(t,X))], is estimated by averaging this conditional expectation over the score r(t,X), evaluated at the treatment level t of interest. As already underlined, β(t,r) does not have a causal interpretation. We in fact need to average the conditional expectation over the marginal distribution of r(t,X), E[E[Y(t) | r(t,X)]], to estimate the causal effect.

Proof of Theorem 3.3 (Imbens and Hirano, 2004):

Let $f_{Y(t) \mid T, r(t,X)}(y \mid t, r)$ represent the conditional density of Y(t) given T = t and r(t,X) = r. Then, applying Bayes' rule and Theorem 3.2, we get:

$$f_{Y(t) \mid T, r(t,X)}(y \mid t, r) = \frac{f_T(t \mid Y(t) = y, r(t,X) = r)\, f_{Y(t) \mid r(t,X)}(y \mid r)}{f_T(t \mid r(t,X) = r)} = f_{Y(t) \mid r(t,X)}(y \mid r)$$

So we can write

$$E[Y(t) \mid T = t, r(t,X) = r] = E[Y(t) \mid r(t,X) = r]$$

but we also have

$$E[Y(t) \mid T = t, R = r] = E[Y(t) \mid T = t, r(T,X) = r] = E[Y(t) \mid T = t, r(t,X) = r] = E[Y(t) \mid r(t,X) = r] = \beta(t,r)$$

Hence $E[Y(t) \mid r(t,X) = r] = \beta(t,r)$, which is part (i) of Theorem 3.3. Then we have $E[\beta(t, r(t,X))] = E[E[Y(t) \mid r(t,X)]] = E[Y(t)] = \mu(t)$.

Moreover, if we are interested in estimating the marginal treatment effect with respect to treatment level t, we can write:

$$ATE_{\Delta t, t} = E[Y(t+\Delta t) - Y(t)] = E[Y(t+\Delta t)] - E[Y(t)]$$

which denotes the ATE of the treatment $(t+\Delta t)$ relative to treatment (t) for a participant drawn randomly from the population N. Another quantity of primary interest is the treatment effect ATT, defined on a specific sub-population:

$$ATT_{\Delta t, t} = E[Y(t+\Delta t) - Y(t) \mid T = t] = E[Y(t+\Delta t) \mid T = t] - E[Y(t) \mid T = t]$$

$$= E_{X \mid T=t}[E[Y(t+\Delta t) \mid T = t, X]] - E_{X \mid T=t}[E[Y(t) \mid T = t, X]]$$

that, by weak unconfoundedness, is equal to:

$$E_{r(t+\Delta t,X) \mid T=t}[E[Y(t+\Delta t) \mid r(t+\Delta t, X)]] - E_{r(t,X) \mid T=t}[E[Y(t) \mid r(t,X)]]$$

The $ATT_{\Delta t, t}$ denotes the expected effect for an individual randomly drawn from the population of participants having treatment level t, while r(t,X) is measurable with respect to the sigma-algebra generated by X.

Imbens' (2004) procedure for dose-response estimation, under the previous assumptions and theorems, is based on regression on the propensity score. We will apply an extension of it, since it is a valid strategy for empirical studies. One way of using the propensity score is to estimate the conditional expectation of Y given T and r(t,X). First, the GPS is estimated through the conditional distribution of the treatment variable given the covariates, assuming a specific functional form, for example a normal linear model²:

$$T_i \mid X_i \sim N(h(X_i;\beta), \sigma^2)$$

or

$$\log T_i \mid X_i \sim N(h(X_i;\beta), \sigma^2)$$

with the estimated GPS equal to:

$$\widehat{gps}_i = \phi(T_i; X_i)$$

where φ denotes the fitted normal density of the treatment given the covariates, evaluated at the observed treatment level $T_i$.

2 We may use more general models, such as mixtures of normals or heteroskedastic normal distributions, with the variance modelled as a parametric function of the covariates.

To verify whether this specification is suitable, we investigate how it affects the balance of the covariates. We first divide the range of the treatment into an arbitrary number of intervals; we then define further blocks of the GPS, $r(t; X_i)$, computed at a given treatment level. Next, we examine the balance of each covariate, testing, inside each GPS block, whether its mean in one treatment group differs from its mean in the other treatment groups combined. We do this for each treatment interval with respect to the other groups, computing t-tests for each covariate and treatment interval. The precise steps of the GPS implementation will be shown in the next chapter, in our empirical case study. After having specified and estimated the GPS, we need to model the conditional expectation of the outcome given the treatment variable and the score, E[Y | T = t, R = r], as a flexible function of its two arguments³. For example, we can use a linear regression or a quadratic approximation, such as:

$$E[Y_i \mid T_i, R_i] = \beta_0 + \beta_1 T_i + \beta_2 T_i^2 + \beta_3 R_i + \beta_4 R_i^2 + \beta_5 T_i R_i$$

We estimate the parameters of the model, e.g. by ordinary least squares, using the estimated GPS $\hat{R}_i$ among the regressors. We then estimate the average potential outcome at treatment level t, $\hat{E}[Y(t)]$, repeating this for each level of the treatment we are interested in, to obtain an estimate of the entire dose-response function⁴:

$$\hat{E}[Y(t)] = \frac{1}{N}\sum_{i=1}^{N}\left(\hat\beta_0 + \hat\beta_1 t + \hat\beta_2 t^2 + \hat\beta_3\, \hat r(t,X_i) + \hat\beta_4\, \hat r(t,X_i)^2 + \hat\beta_5\, t\, \hat r(t,X_i)\right)$$

In this last step we average the estimated regression function over the score function evaluated at the desired level of t. Rather than reporting the dose-response function itself, we can report its derivatives. In economics, this represents the marginal propensity (Imbens and Hirano, 2004) with respect to the quantity we are interested in estimating. As we will show in our observational study, this is a useful alternative strategy for estimating the dose-response function, since it allows us to compute estimates at a specific treatment level as well as to compare the marginal propensities at different levels of intervention.

3 Remember that there is no causal interpretation for the conditional expectation of the outcome.

4 It is convenient to use bootstrap methods to compute standard errors and form confidence intervals.
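The regression-on-the-score procedure described in this section can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the implementation used in the empirical study: it assumes the normal linear model with h(X_i; β) linear in the covariates, fits it by ordinary least squares, and uses the quadratic approximation for the outcome regression. The function names (`fit_gps_model`, `gps`, `outcome_design`, `dose_response`) and the simulated data are hypothetical, introduced only for the example.

```python
import numpy as np

def normal_pdf(x, mean, sd):
    """Density of N(mean, sd^2) evaluated at x."""
    return np.exp(-0.5 * ((x - mean) / sd) ** 2) / (sd * np.sqrt(2.0 * np.pi))

def fit_gps_model(T, X):
    """OLS fit of T on [1, X] (assumed linear h); returns (beta_hat, sigma_hat)."""
    Z = np.column_stack([np.ones(len(T)), X])
    beta, *_ = np.linalg.lstsq(Z, T, rcond=None)
    resid = T - Z @ beta
    sigma = np.sqrt(resid @ resid / (len(T) - Z.shape[1]))
    return beta, sigma

def gps(t, X, beta, sigma):
    """Estimated score r_hat(t, X_i): normal density of level t given X_i."""
    Z = np.column_stack([np.ones(X.shape[0]), X])
    return normal_pdf(t, Z @ beta, sigma)

def outcome_design(t, r):
    """Regressors of the quadratic approximation: 1, T, T^2, R, R^2, T*R."""
    t = np.broadcast_to(t, np.shape(r)).astype(float)
    return np.column_stack([np.ones_like(r), t, t ** 2, r, r ** 2, t * r])

def dose_response(Y, T, X, t_grid):
    """Estimate mu(t) = E[Y(t)] on t_grid by averaging the fitted regression."""
    beta, sigma = fit_gps_model(T, X)
    R = gps(T, X, beta, sigma)  # GPS evaluated at the observed treatment
    coef, *_ = np.linalg.lstsq(outcome_design(T, R), Y, rcond=None)
    # For each t, evaluate the score at (t, X_i) and average the predictions.
    return np.array([
        np.mean(outcome_design(t, gps(t, X, beta, sigma)) @ coef)
        for t in t_grid
    ])

# Hypothetical simulated data, for illustration only.
rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 2))
T = 1.0 + X @ np.array([0.5, -0.3]) + rng.normal(size=1000)
Y = 2.0 * T + X[:, 0] + rng.normal(size=1000)

t_grid = np.linspace(-1.0, 3.0, 9)
mu_hat = dose_response(Y, T, X, t_grid)
marginal = np.gradient(mu_hat, t_grid)  # finite-difference derivative of mu_hat
print(np.round(mu_hat, 2))
print(np.round(marginal, 2))
```

Because the outcome regression is only a quadratic approximation, the estimated curve need not coincide with the true dose-response function in finite samples. Standard errors could be obtained by re-running `dose_response` on bootstrap resamples, in line with footnote 4.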

3.3 The sub-classification procedure

The GPS can also be used with sub-classification and matching procedures, although these are usually more cumbersome than in the binary treatment case. Van Dyk and Imai (2003) implement analysis techniques mostly based on sub-classification, which balance a high-dimensional covariate by adjusting for a low-dimensional propensity score. In the sub-classification technique, they first model the conditional distribution of the treatment given the covariates, $f^{\psi}_{T \mid X}(t \mid x)$, where ψ parameterises the distribution. Second, they compute the estimate $\hat\psi$ of ψ. As a result, the parametric model defines the generalized propensity score as $\hat r_{\hat\psi}(t,X)$⁵. Third, they compute $\hat r_{\hat\psi}(t,X)$ for each observation and sub-classify observations with the same or similar values of $\hat r$ into a number of sub-classes of equal size. Within each sub-class they model the outcome distribution given the treatment, $f_{Y(t) \mid T, r(t,X)}(y \mid t, \hat r)$, e.g. by regressing Y(t) on both t and $\hat r$. To further reduce the bias, Robins and Rotnitzky (2001) have suggested including other covariates in the regression. The average causal effect can be computed as a weighted average of the within-sub-class effects, with weights equal to the relative size of the sub-classes. Formally, the average treatment effect can be approximated in the following way:

$$E[Y(t)] \approx \sum_{s=1}^{S} E[Y(t) \mid T = t, \hat r_s]\, W_s$$

where S is the number of sub-classes and $W_s$ is the relative size of sub-class s, estimated by the proportion of observations falling into sub-class s. Since results may be sensitive to the number and choice of sub-classes, Van Dyk and Imai (2003) suggest conducting a sensitivity analysis with different sub-classifications.

5 We can think of the Gaussian density function, $T_i \mid X_i \sim N(h(X_i;\beta), \sigma^2)$, where $\psi = (\beta, \sigma^2)$ and $\beta, \sigma^2$ can be estimated by maximum likelihood.
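The weighted average over sub-classes can be sketched as follows. This is only an illustration under assumptions that are not in the original text: the sub-classes are formed from the ordering of a scalar score supplied by the caller (for instance the estimated score, or the fitted conditional mean of the treatment in the Gaussian model), and the within-class outcome model is a linear regression of Y on t and the score, evaluated at the class mean of the score. The function name `subclass_dose_response` and the simulated data are hypothetical.

```python
import numpy as np

def subclass_dose_response(Y, T, score, t_grid, n_classes=5):
    """Approximate E[Y(t)] as a size-weighted average of within-class fits."""
    Y, T, score = (np.asarray(a, dtype=float) for a in (Y, T, score))
    t_grid = np.asarray(t_grid, dtype=float)
    # Sort units by score and cut them into sub-classes of (nearly) equal size.
    classes = np.array_split(np.argsort(score), n_classes)
    mu = np.zeros(len(t_grid))
    for idx in classes:
        # Within-class regression of Y on [1, T, score] (an assumed linear form).
        Z = np.column_stack([np.ones(len(idx)), T[idx], score[idx]])
        coef, *_ = np.linalg.lstsq(Z, Y[idx], rcond=None)
        r_s = score[idx].mean()       # representative score of sub-class s
        w_s = len(idx) / len(Y)       # relative size W_s of sub-class s
        mu += w_s * (coef[0] + coef[1] * t_grid + coef[2] * r_s)
    return mu

# Hypothetical simulated data, for illustration only.
rng = np.random.default_rng(0)
x = rng.normal(size=1000)
t_obs = 1.0 + 0.8 * x + rng.normal(size=1000)
y_obs = 2.0 * t_obs + x + rng.normal(size=1000)
fitted_mean = 1.0 + 0.8 * x  # stands in for the estimated h(X_i; beta)

for s in (3, 5, 10):  # simple sensitivity check on the number of sub-classes
    print(s, np.round(subclass_dose_response(y_obs, t_obs, fitted_mean,
                                              [0.0, 1.0, 2.0], n_classes=s), 2))
```

Re-running the estimator with several values of `n_classes`, as in the loop above, is one simple way to carry out the sensitivity analysis mentioned in the text.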

3.3.1 Lu’s matching technique applying the GPS

In contrast to the sub-classification method, Lu et al. (2001) suggest matching pairs of units on $\hat r$. To implement this procedure in the continuous treatment case, we need to divide the range of the treatment variable into blocks and apply matching inside each stratum, before proceeding to the estimation of the average treatment effect. However, in this context the matching procedure turns out to be more difficult to apply than in the binary treatment case. This is because the matched pairs should not only have similar $\hat r$, but should also have different treatment levels (this is not a problem in the binary treatment case, since each pair consists of a unit from the treatment group and a unit from the control group). Lu et al. (2001) propose a distance measure that decreases when the propensity scores become similar and the received treatments become dissimilar. The treatment effect can be evaluated by examining the difference in response between the “high” and “low” treatment and, in order to take into account the difference in treatment, they also suggest regressing the difference in response on the difference in treatment. These techniques are evidently difficult to generalize to continuous treatment variables. In this sense sub-classification and smoothing techniques represent more powerful strategies, since they allow a simpler implementation of more complex causal effect analyses.
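To make the matching idea concrete, the sketch below builds one distance with the property described above (small when the scores are close and the doses are far apart) and pairs units greedily. It is an illustration only: the specific ratio form of the distance, the constant `eps`, the greedy pairing (a simplification, not an optimal matching algorithm) and the regression through the origin are assumptions of this sketch, and the exact definitions used by Lu et al. (2001) should be taken from their paper. All function names are hypothetical.

```python
import numpy as np

def dose_distance(score, dose, eps=1e-6):
    """Pairwise distance: squared score gap divided by squared dose gap."""
    score = np.asarray(score, dtype=float)
    dose = np.asarray(dose, dtype=float)
    ds = (score[:, None] - score[None, :]) ** 2
    dd = (dose[:, None] - dose[None, :]) ** 2
    with np.errstate(divide="ignore"):
        d = (ds + eps) / dd          # equal doses give an infinite distance
    np.fill_diagonal(d, np.inf)      # a unit cannot be matched to itself
    return d

def greedy_pairs(d):
    """Repeatedly pair the two closest unmatched units (greedy heuristic)."""
    d = d.copy()
    pairs = []
    while True:
        i, j = np.unravel_index(np.argmin(d), d.shape)
        if not np.isfinite(d[i, j]):
            break
        pairs.append((int(i), int(j)))
        d[[i, j], :] = np.inf        # remove both units from further matching
        d[:, [i, j]] = np.inf
    return pairs

def pair_slope(Y, dose, pairs):
    """Regress response differences on dose differences (no intercept)."""
    dY, dT = [], []
    for i, j in pairs:
        hi, lo = (i, j) if dose[i] > dose[j] else (j, i)
        dY.append(Y[hi] - Y[lo])
        dT.append(dose[hi] - dose[lo])
    dY, dT = np.array(dY), np.array(dT)
    return float(dY @ dT / (dT @ dT))

# Tiny hypothetical example: four units with scores, doses and outcomes.
score = np.array([0.10, 0.11, 0.50, 0.52])
dose = np.array([1.0, 3.0, 2.0, 5.0])
Y = 2.0 * dose
pairs = greedy_pairs(dose_distance(score, dose))
print(pairs, pair_slope(Y, dose, pairs))
```

Greedy pairing is used here only because it is short to write; it can leave some units unmatched or produce worse pairs than an optimal algorithm would.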

4 Conclusion

Propensity score methods have become one of the most important tools for

analyzing causal effects in observational studies. Although the original work

of Rosenbaum and Rubin (1983) considered applications with binary

treatments, it can also be extended to multivalued and continuous

treatments. We have discussed some of the issues involved in handling

multiple and continuous treatments and emphasized how the propensity

score methodology can be applied to “arbitrary” cases.

Bibliography

[Abadie and Imbens, 2002] Abadie, A., Imbens, G. (2002) Simple and bias-

corrected matching estimators for average treatment effects. NBER technical

working paper series, 283. http://www.nber.org/papers/T0283.

[Abadie and Imbens, 2004] Abadie, A., Imbens, G. (2004) Large sample

properties of matching estimators for average treatment effects. Working

paper, 283. http://elsa.berkeley.edu/.

[Ahn and Powell, 1993] Ahn, H. and J.L. Powell (1993), Semiparametric

Estimation of Censored Selection Models with a Nonparametric Selection

Mechanism, Journal of Econometrics, 58, 3-29.

[Angrist and Krueger, 2000] Angrist, J., Krueger, A. (2000) Empirical

Strategies in Labor Economics, in O. Ashenfelter and D. Card (eds.),

Handbook of Labor Economics, vol. 3. New York: Elsevier Science.

[Angrist et al., 1996] Angrist, J., Imbens, G., Rubin, D. B. (1996)

Identification of causal effects using instrumental variables. Journal of the

American Statistical Association 91, 444-472.

[Athey and Imbens, 2002] Athey, S., and Imbens, G. (2002), “Identification

and Inference in Nonlinear Difference-In-Differences Models,” unpublished

manuscript, Department of Economics, Stanford University.

[Battistin et al., 2001] Battistin, E., Gavosto, A., Rettore, E. (2001) Why do

subsidised firms survive longer? An evaluation of a program promoting

youth entrepreneurship in Italy, in Lechner M., F. Pfeiffer (eds.),

Econometric evaluation of active labour market policies, Physica/Springer-

Verlag, Heidelberg

[Becker and Ichino, 2002] Becker, S. O. and Ichino, A. (2002). Estimation

of average treatment effects based on propensity scores. The Stata Journal,

4, 358-377.

[Blundell and MaCurdy, 2000] Blundell, Richard, and Thomas MaCurdy,

(2000): Labor Supply, Handbook of Labor Economics, O. Ashenfelter and

D. Card, eds., North Holland: Elsevier, 2000, 1559-1695.

[Bryson et al., 1999] Bryson, A., Dorsett, R., Purdon, S. (2002). The use of

propensity score matching in the evaluation of active labour market policies.

Policy Studies Institute, U.K. Department for Work and Pensions Working

Paper No. 4. http://www.dwp.gov.uk/asd/asd5/wp-index.html.

[Cox and Oakes, 1984] Cox D.R., Oakes D. (1984) Analysis of survival

data. Chapman and Hall, London.

[Dehejia and Wahba, 1999] Dehejia, R. H., Wahba, S. (1999) Causal effects

in nonexperimental studies: Re-evaluating the evaluation of training

programs. Journal of the American Statistical Association 94, 1053-62.

[DiPrete and Gangl, 2004] DiPrete, T., Gangl, M. (2004). Assessing bias in

the estimation of causal effects: Rosenbaum bounds on matching estimators

and instrumental variables estimation with imperfect instruments.

Sociological methodology.

[Drake, 1993] Drake, C. (1993) Effects of misspecification of the propensity

score on estimators of treatment effects. Biometrics 49, 1231-1236.

[Frolich, 2002] Frolich, M. (2002) What is the value of knowing the

propensity score for the estimation of the average treatment effects.

Department of economics, University of St. Gallen.

[Greevy et al. 2004] Greevy, R., Lu, B., Silber, J. H., and Rosenbaum,

P.(2004) Optimal multivariate matching before randomization. Biostatistics

5, 263-275.

[Hahn, 1998] Hahn, J. (1998) On the role of the propensity scores in efficient

semiparametric estimation of average treatment effects. Econometrica 66,

315-331.

[Hausman and Newey, 1995] Hausman, J., and Newey, W., (1995)

Nonparametric Estimation of Exact Consumer Surplus and Deadweight

Loss, Econometrica, 63, 1445-1476.

[Heckman, 1979b] Heckman, J. J. (1979) Sample selection bias as a

specification error. Econometrica 47(1), 153-161.

[Heckman and Hotz, 1989] Heckman, J. J., Hotz, V. J. (1989) Choosing

among alternative nonexperimental methods for estimating the impact of

social programs: the case of manpower training. Journal of the American

Statistical Association 84, 408, 862-874.

[Heckman et al., 1998b] Heckman, J. J., Ichimura, H., Todd, P. (1998b)

Matching as an econometric evaluation estimator. Review of Economic

Studies 65, 261-294.

[Heckman and Robb, 1985] Heckman, J. and R. Robb, (1985), Alternative

Methods for Evaluating the Impact of Interventions, in J. Heckman and B.

Singer, eds., Longitudinal Analysis of Labor Market Data, New York:

Cambridge University Press.

[Heckman and Todd, 1999] Heckman, J. J., Todd, P. (1999) Adopting

propensity score matching and selection models of choice-based samples.

University of Chicago.

[Hirano et al., 2003] Hirano, K., Imbens, G. W., Ridder, G. (2003) Efficient

estimation of average treatment effects using the estimated propensity score.

Econometrica 71(4).

[Holland 1986] Holland, P. (1986) Statistics and causal inference. Journal of

the American Statistical Association, 81.

[Honorè and Powell, 1994] Honorè, B. and Powell, J., (1994) Pairwise

Difference Estimators for Censored and Truncated Regression Models,

Journal of Econometrics, 64: 241-278.

[Imbens, 1999] Imbens, G. W., (1999) The Role of the Propensity Score in

Estimating Dose-Response Functions, NBER Working Paper No. T0237.

Available at SSRN: http://ssrn.com/abstract=226648.

[Imbens and Hirano, 2004] Imbens, G. and Hirano, K., (2004) The

Propensity score with continuous treatment, chapter for Missing data and

Bayesian Method in Practice: Contributions by Donald Rubin Statistical

Family.

[Imbens et al., 2001] Imbens, G. (2001) Implementing Matching Estimators

for Average Treatment Effects in Stata, The Stata Journal (2001).

[Joffe and Rosenbaum, 1999] Joffe, M. M. and Rosenbaum, P. R. (1999).

Propensity scores. American Journal of Epidemiology 150, 327-333.

[Imai and van Dyk, 2003] Imai, K. and van Dyk, D. A. (2003) Causal

inference with general treatment regimes: Generalizing the Propensity

Score, Revised for the Journal of the American Statistical Association.

[Lechner, 2001] Lechner, M., (2001), Identification and Estimation of

Causal Effects of Multiple Treatments under the Conditional Independence

Assumption, in Lechner and Pfeiffer (eds.), Econometric Evaluations of

Active Labor Market Policies in Europe, Heidelberg, Physica.

[Lee, 2005] Lee W-S, (2005) Propensity Score Matching and Variations on

the Balancing Test, Working Paper - 3rd Conference on Policy Evaluation,

Mannheim.

[Lu et al., 2001] Lu et al. (2001). Matching with doses in an observational

study of a media campaign against drug abuse. Journal of the American

Statistical Association 96, 1245-1253.

[Mealli e Pagni, 2001 ] Mealli, F., Pagni, R. (2001) Analisi e valutazione

delle politiche per le nuove imprese. Il caso della L.R. Toscana

n. 27/93. IRPET.

[Meyer, 1995] Meyer, B, (1995), Natural and Quasi-experiments in

Economics, Journal of Business and Economic Statistics, 13 (2), 151-161.

[Meini, 2001] Meini, M.C. (2001) Politiche per l'occupazione a scala locale.

Valutazione del ruolo degli interventi per lo start-up d'impresa. Provincia di

Massa, Osservatorio Provinciale Mercato del Lavoro, IRPET.

[Ming and Rosenbaum, 2000] Ming, K., Rosenbaum, P. R. (2000).

Substantial gains in bias reduction from matching with a variable number of

controls. Biometrics 56

[Moffit, 1991] Moffit R. (1991), Program evaluation with nonexperimental

data, Evaluation Review, 15: 291-314.

[Newey, 1988] Newey W. K. (1988) Two Step Series Estimation of Sample

Selection Models, Working Paper, MIT Department of Economics.

[Newey and Vella, 2003] Newey and Vella (2003), Non-parametric

Estimation of Sample Selection Models, Review of Economic Studies, 2003,

Vol 70, pp 33-58.

[Neyman, 1923] Neyman, J. (1923) On the application of probability theory

to agricultural experiments. Essay on principles. Section 9.

[Pellegrini, 2001] Pellegrini G. (2001). La struttura produttiva delle piccole

e medie imprese italiane: il modello dei distretti. Banca Impresa Società, 20

(2001), n. 2.

[Powell J. L., 1987] Powell J. L. (1987) Semiparametric Estimation of

Bivariate Latent Variable Models, Working Paper No. 8704, Social Systems

Research Institute, University of Wisconsin-Madison.

[Powell, 1987] Powell, J. L., (1987) Semiparametric Estimation of

Employment Duration Models, Econometric Reviews, 6: 65-78.

[Purdon, 2002] Purdon, S. (2002) The use of propensity score matching in

the evaluation of active labour market policies.

[Rettore and Gavosto, 2001] Rettore E. and Gavosto A. (2001) Why do

subsidised firms survive longer? An evaluation of a program promoting

youth entrepreneurship in Italy, Econometric Evaluation of Active Labour

Market Policies, Physica/Springer-Verlag, Heidelberg.

[Rosenbaum, 2002] Rosenbaum, P. R. (2002) Observational Studies, 2nd

Edition. Springer Verlag, New York, NY.

[Rosenbaum and Rubin, 1983] Rosenbaum, P.R., Rubin, D. B. (1983) The

central role of the propensity score in observational studies for causal

effects. Biometrika, 70.

[Rosenbaum and Rubin, 1983b] Rosenbaum, P.R., Rubin, D. B. (1983b)

Assessing sensitivity to an unobserved binary covariate in an observational

study with binary outcome. Journal of the Royal Statistical Society, 45, 212.

[Rosenbaum and Rubin, 1984] Rosenbaum, P. R., Rubin, D. B. (1984)

Reducing bias in observational studies using sub-classification on the

propensity score. Journal of the American Statistical Association 79, 516-524.

[Rubin, 1973a] Rubin, D. B. (1973a) Matching to remove bias in

observational studies. Biometrics 29, 159-184.

[Rubin, 1973b] Rubin, D. B. (1973b) The use of matched sampling and

regression adjustment to remove bias in observational studies. Biometrics

29, 159-184.

[Rubin, 1974] Rubin, D. B. (1974) Estimating causal effects of treatments in

randomized and nonrandomized studies. Journal of Educational Psychology

66, 688-701.

[Rubin, 1980a] Rubin, D. B. (1980a) Bias reduction using Mahalanobis's

metric matching. Biometrics, 36, 295-298.

[Rubin and Thomas, 1992b] Rubin, D. B., Thomas, N. (1992a)

Characterizing the effect of matching using linear propensity score methods

with normal distributions. Biometrika 79, 797-809.

[Rubin and Thomas, 1996] Rubin, D. B., Thomas, N. (1996) Matching using

estimated propensity scores, relating theory to practice. Biometrics 52, 249-

264.

[Zhao, 2004] Zhao, Z. (2004) Using matching to estimate treatment effects:

data requirements, matching metrics, and Monte Carlo evidence. Review of

Economics and Statistics 86(1), 91-107.

