Dipartimento di Politiche Pubbliche e Scelte Collettive – POLIS Department of Public Policy and Public Choice – POLIS
Working paper n. 88
April 2007
The Propensity Score method in public policy evaluation: a survey
Michela Bia
UNIVERSITA’ DEL PIEMONTE ORIENTALE “Amedeo Avogadro” ALESSANDRIA
Monthly online periodical "POLIS Working Papers" - Registration no. 591 of 12/05/2006 - Court of Alessandria (Tribunale di Alessandria)
The Propensity Score Method in Public Policy
Evaluation: a Survey♣
Michela Bia*
♣University of Florence – G. Parenti Statistics Department – Florence, Italy. University of Eastern
Piedmont, Department of Public Policy and Public Choice - POLIS, Alessandria, Italy. E-mail: [email protected]; [email protected]
* The author wishes to thank F. Mealli, A. Mattei, D. Bondonio, A. Martini, G. Imbens.
Abstract
Recently, in the field of causal inference, nonparametric techniques that use
matching procedures based, for example, on the propensity score
(Rosenbaum and Rubin, 1983) have received growing attention. In this paper
we focus on propensity score methods, introduced by Rosenbaum and Rubin
(1983). The key result underlying this methodology is that, given the
ignorability assumption, treatment assignment and the potential outcomes
are independent given the propensity score. Much of the work on propensity
score analysis has focused on the case where the treatment is binary, but in
many cases of interest the treatment takes on more than two values. In this
article we examine an extension to the propensity score method, in a setting
with a continuous treatment.
Introduction
Recently, in the field of causal inference, nonparametric techniques that use
matching procedures based, for example, on the propensity score
(Rosenbaum and Rubin, 1983) have received growing attention. When the
data, the type of intervention and the assignment criterion allow it, a
quasi-experimental design can be assumed, such as the regression discontinuity
design (Thistlethwaite and Campbell, 1960; Battistin and Rettore, 2004). Another
assumption, that leads to another quasi-experimental design and may be
reasonable to assume in some observational studies, is that treatment
assignment is unconfounded with potential outcomes conditional on a
sufficient set of covariates or pretreatment variables. The unconfoundedness
assumption allows us to compare treated and control units with the same
value of the covariates. Given unconfoundedness, various methods have
been proposed for estimating causal effects. In this paper we focus on
propensity score methods, introduced by Rosenbaum and Rubin (1983). The
key result underlying this methodology is that, given the ignorability
assumption, treatment assignment and the potential outcomes are
independent given the propensity score. Thus, adjusting on the propensity
score removes the bias associated with differences in the observed
covariates in the treated and control groups. To estimate propensity scores,
which are the conditional probabilities of being treated given a vector of
observed covariates, we must specify the distribution of the treatment
indicator given pre-treatment variables.
Much of the work on propensity score analysis has focused on the case
where the treatment is binary, but in many cases of interest the treatment
takes on more than two values (for example, a drug applied in different
doses or a treatment applied over different time periods). In
this paper we examine an extension to the propensity score method, in a
setting with a continuous treatment. The first section introduces the standard
propensity score analysis (Rosenbaum and Rubin, 1983), that is, the case in
which the treatment is binary. The second section reviews the propensity score
methodology with multiple treatments. The third section deals with the
propensity score method when the treatment is continuous.
1 The evaluation of public policies: some statistical methods
1.1 Introduction
The evaluation of policies carried out by using quantitative tools is a
tangible answer to the need to express “judgements empirically based on
achievement accomplished by a public policy when facing a particular
collective problem”. By “collective problem” we mean a situation that is
socially perceived as inadequate and, as such, worthy of change and
possibly of public contribution. Think about pollution in city
centres, assistance to old people or the lack of competitiveness among
small and medium enterprises: these are all problems which require public
involvement through the allocation of funds. When such a problem is addressed
by an intervention, we speak of a public policy (Martini et al. 2005). The
statistical field of reference is causal inference, which provides
and develops appropriate quantitative methods for policy effect
evaluation. The starting point of a policy effect evaluation is the
identification of the object of analysis, i.e., referring to the potential
outcome approach to causal inference (Neyman, 1923; Rubin, 1974), a
characteristic of the distribution of the difference between two potentially
observable outcomes: Y0 (a post-intervention variable observed on a unit -
individual or firm - in the absence of an intervention) and Y1 (a post-
intervention variable observed on a unit in the presence of an intervention).
Identification and estimation of such parameters present some relevant
problems: a) only one of the two potential outcomes is observed on a single
unit, the other representing the counterfactual situation; b) the assignment to
the treatment is usually not random, so estimation is based on observational
data; c) it is necessary to isolate the effect of the intervention from the
effects of other factors, which can influence access and results. Appropriate
estimation methods (parametric or nonparametric) should be based on
sensible hypotheses about the assignment rule, which allow us to identify (even
partially; Manski, 1995, 2003) the causal effects. In observational studies, a
usual starting point consists in constructing a control group (units not
receiving the treatment, but similar to units receiving it), under the
unconfoundedness assumption (Rosenbaum, Rubin, 1983). In this section
we intend to describe the basic principles of such an approach, continuing
with a more formal discussion.
1.2 Potential results and the Rubin Causal model
The basic idea of causal inference is that of an action (or a treatment)
applied to a unit, where unit means a person or a company, at a specific
point in time. As a result, in the binary treatment case, for each unit and
each treatment there are two potential results: one referring to the value of
the outcome variable in the event of treatment, and the other in the event of
non treatment. The causal effect is the result of a comparison between the
two potential results. The use of the adjective ‘potential’ is motivated by the
impossibility of observing the outcome both with and without treatment.
This is defined as the basic problem of causal inference (Holland, 1986). For
this reason it is very useful to have information about several units, analysing
the distribution of the treatment effect and concentrating on summary
measures of such distribution, for example the average treatment effects. In
order to obtain correct estimates of such quantities, it is crucial to define the
assignment mechanism, described below. We now introduce some notation:
consider a population of N units. Each unit i is characterised by a
k-dimensional vector of covariates $X_i$, two potential results $Y_i(0)$ and $Y_i(1)$, and
a variable $Z_i \in \{0,1\}$, which denotes the assignment ($Z_i = 1$) or not ($Z_i = 0$) to
the treatment. X indicates the (N × k) matrix of the units' characteristics,
Y(0) and Y(1) the vectors of the potential results and Z the vector of
assignments to treatment 1.
From the link between the vectors of potential results (Y(0), Y(1))
and treatments (Z), we obtain two distinct relationships defining the observed
and unobserved results, denoted by $Y_i^{obs}$ and $Y_i^{mis}$ respectively:
$$Y_i^{obs} = Z_i \cdot Y_i(1) + (1 - Z_i) \cdot Y_i(0)$$
$$Y_i^{mis} = (1 - Z_i) \cdot Y_i(1) + Z_i \cdot Y_i(0)$$
where $Y_i^{obs}$ and $Y_i^{mis}$ represent the i-th elements of the vectors $Y^{obs}$
and $Y^{mis}$. In order to identify and define causal effects, it is necessary to
make some assumptions. An important assumption, which reduces the number
of potential results, is the following (Rubin, 1978a, 1980):
Assumption 1.1 Stable Unit Treatment Value Assumption
(SUTVA), under which the potential outcomes $Y_i(Z_i)$ for the i-th unit
depend only on the treatment that the i-th unit received. That is, there
is “no interference between units” and there are “no versions of
treatments”.
It should be emphasized that this assumption can neither be tested nor
dispensed with, and that its plausibility rests entirely on the experience of
the researcher. Identifying treatment effects relies on further assumptions on
the assignment mechanism, that is, the mechanism that determines which
units get which treatments, formally defined as follows:
Definition 1.1 Assignment mechanism
Given a population of N units, the assignment mechanism is a row
exchangeable function $p(Z; X, Y(0), Y(1))$, with $Z$ taking values in
$\{0,1\}^N$, such that
$$\sum_{Z} p(Z; X, Y(0), Y(1)) = 1 \quad \text{for each } X, Y(0), Y(1).$$
The probability of unit assignment is defined as follows:
Definition 1.2 Unit assignment probability
The probability of assignment to treatment for unit i is given by:
$$p_i(X, Y(0), Y(1)) = \sum_{Z: Z_i = 1} p(Z; X, Y(0), Y(1)).$$
Let $X_{(i)}$ indicate the (N-1) × K matrix obtained by removing the i-th
row of the matrix X, and analogously for $Y_{(i)}(0)$ and $Y_{(i)}(1)$. The exchangeability of
the assignment mechanism allows us to rewrite the N functions $p_i(\cdot)$ in terms
of a common function $q(\cdot)$, which depends on the covariates and potential
results of unit i and on the covariates and potential results of all other units:
$$p_i(X, Y(0), Y(1)) = q(X_i, Y_i(0), Y_i(1), X_{(i)}, Y_{(i)}(0), Y_{(i)}(1))$$
for any i = 1,…,N.
Strictly connected to the assignment probability concept there is the
propensity score, defined as follows:
Definition 1.3 Propensity score.
Given a population of N units, the propensity score is defined as:
$$e(X, Y(0), Y(1)) = \frac{1}{N_x} \sum_{i: X_i = x} p_i(X, Y(0), Y(1))$$
with $N_x$ equal to the number of units with $X_i = x$. For each x
such that $N_x = 0$, the propensity score is not defined (Imbens,
2002).
This definition of the propensity score, which will be examined in detail in
later sections, will be useful later on for analysing our case study, where the
treatment is a continuous variable and a generalized propensity
score is defined to allow for treatment effect estimation with a non-binary
treatment. The definition of probabilistic assignment follows:
Definition 1.4 Probabilistic assignment
An assignment mechanism is referred to as probabilistic if for every i
the assignment probability is between 0 and 1, that is:
$$0 < p_i(X, Y(0), Y(1)) < 1$$
This assumption requires that each unit has a non-zero probability of being
treated and, at the same time, there are no units with a probability equal to 1
of being treated.
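As a minimal illustration of the notation introduced in this section, the following Python sketch builds the observed and missing outcome vectors from the two potential results and the assignment vector. The four units and their values are hypothetical and invented only for illustration; in real data the missing vector is by definition never available, so the unit-level effects computed here can be known only in a simulation.

```python
import numpy as np

def split_outcomes(y0, y1, z):
    """Return (y_obs, y_mis) given potential results and a 0/1 assignment vector."""
    y0, y1, z = (np.asarray(a, dtype=float) for a in (y0, y1, z))
    y_obs = z * y1 + (1 - z) * y0      # outcome under the treatment actually received
    y_mis = (1 - z) * y1 + z * y0      # counterfactual outcome, never observed
    return y_obs, y_mis

# Hypothetical population of N = 4 units (values invented for illustration).
y0 = [3.0, 5.0, 2.0, 4.0]   # Y_i(0)
y1 = [4.0, 7.0, 2.0, 6.0]   # Y_i(1)
z = [1, 0, 1, 0]            # Z_i

y_obs, y_mis = split_outcomes(y0, y1, z)
unit_effects = np.asarray(y1) - np.asarray(y0)   # Y_i(1) - Y_i(0)
print(y_obs, y_mis, unit_effects.mean())
```

Each unit contributes exactly one of its two potential results to the observed vector, which is why summary measures over several units, rather than unit-level effects, are the usual targets of estimation.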
1.3 Causal effects and identifying assumptions
Inferences about the effects of the treatments involve speculations about the
effect that one treatment would have had on a unit which, actually, received
another treatment (Rosenbaum and Rubin, 1983a). If we consider the
binary treatment, according to the type of intervention assigned to the N
units under study, the i-th unit has both a response $Y_i(1)$, that would have
resulted if it had received treatment 1, and a response $Y_i(0)$, that would have
resulted if it had received treatment 0. As a result, causal effects are
comparisons of $Y_i(1)$ and $Y_i(0)$ (for example, a difference $Y_i(1) - Y_i(0)$ or a ratio
$Y_i(1)/Y_i(0)$). Estimating the causal effects of treatments is therefore a
missing data problem, since either $Y_i(1)$ or $Y_i(0)$ is missing. In causal
inference in general, and in policy evaluation in particular, a quantity of
primary interest is the average treatment effect (ATE),
defined as follows:
$$\text{ATE} = E[\tau_i] = E[Y_i(1) - Y_i(0)] = E[Y_i(1)] - E[Y_i(0)]$$
The average treatment effect for the subpopulation (SATE)
that received treatment level z, with z = 0,1, is equal to:
$$\text{SATE} = \tau_T = E[\tau_i : Z_i = z]$$
and when z = 1 the SATE is usually known as the ATT. In the
field of policy evaluation we are particularly interested in the ATT, because
such an estimate makes it possible to assess how much the intervention
has changed a given condition or behaviour of the policy
beneficiaries. In the case of randomized experiments, it has been
shown (Neyman, 1923) that the ATT can be easily estimated: under
SUTVA, an unbiased estimate of the average causal effect is obtained
by a direct comparison between the
average results of the two treatment groups, in which units are similar with
respect to any possible characteristic, including potential results, thanks to
randomisation. In an observational setting1, however, such direct comparisons
may be misleading because the units exposed to one treatment generally
differ systematically from the units exposed to the other treatment.
Specifically, whereas in experimental situations one can obtain a control and
a treatment group which are homogeneous with respect to the observable
characteristics, X, this is not possible in nonexperimental studies, since
the assignment to a treatment is in this case likely to depend on
observable as well as unobservable characteristics2.
This leads to a self-selection process which makes the two groups
potentially different even before the policy is carried out. A possible way to
address this complication in nonexperimental studies is to consider the
randomized experiment as a template for the analysis of an observational
(i.e., nonrandomized) study. Using this template
means thinking about the underlying randomized
experiment that could have been done. In the randomized experiment
underlying an observational study, the probabilities of assignment to
treatments are not equal, but are rather functions of the covariates, and so
the template is actually an unconfounded assignment mechanism.
1 A study is considered observational when the treatment assignment mechanism is not known.
2 With random assignment, homogeneity of the control and treatment group with respect to the unobservable characteristics is also guaranteed if the size of the groups is sufficiently large.
To do this we make the strong ignorability or unconfoundedness
assumption.
Assumption 1.2 Strong Unconfoundedness assumption
(Rosenbaum and Rubin, 1983)
Generally, we shall say treatment assignment is strongly ignorable
given a vector of covariates X if
$$(Y(0), Y(1)) \perp Z \mid X \quad \text{and} \quad 0 < \Pr(Z = 1 \mid X) < 1 \quad \text{(common support)}$$
referring, from now on, to $Y(0), Y(1)$ instead of $Y_i(0), Y_i(1)$ for the potential
results corresponding to the i.th individual. For brevity, when treatment
assignment is strongly ignorable given the observed covariates X, we shall
say simply that treatment assignment is strongly ignorable. The strong
ignorability assumption asserts that the probability of assignment to a
treatment does not depend on the potential outcomes conditional on
observed covariates. In other words, within subpopulations defined by
values of the covariates, we have random assignment. This assumption rules
out the role of the unobservable variables. The issue of unobserved
covariates should be addressed using models for sensitivity analysis (e.g.,
Rosenbaum and Rubin, 1983b) or using non parametric bounds for
treatment effects (Manski, 1990; Manski et al., 1992).
Of course, if the goal is to identify only the Average Treatment Effect for
the Treated (ATT), a weaker assumption can be made:
Assumption 1.3 Weak Unconfoundedness assumption (Rosenbaum
and Rubin, 1983)
$$Y(0) \perp Z \mid X \quad \text{and} \quad \Pr(Z = 1 \mid X) < 1$$
That is, the unconfoundedness assumption can be relaxed, requiring
only that Y(0) is independent of Z given X. The overlap
condition can also be relaxed, so that the support of X for the treated units
is a subset of the support of X for the untreated units.
Both the unconfoundedness assumption and the overlap condition may be
controversial in applications. The first assumption requires that all variables
that affect both the outcome and the likelihood of receiving the treatment are
observed, or that all the others are perfectly collinear with the observed ones.
This assumption is not testable, and it is a very strong assumption,
one that need not generally be applicable. Clearly selection may also take
place on the basis of unobservable characteristics. However, any alternative
assumptions that do not rely on unconfoundedness, while allowing for
consistent estimation of the causal effects of interest, must themselves be
untestable. Whereas the unconfoundedness assumption implies
that the best matches are units that differ only in their treatment status, but
otherwise are identical, alternative assumptions implicitly match units that
differ in the pre-treatment characteristics. Often such assumptions are even
more difficult to justify. For instance, the technique of instrumental
variables is sometimes considered as an alternative to assuming
unconfoundedness (Heckman, 1979; Heckman and Hotz, 1989), but a
disadvantage of these methodologies is their high sensitivity
to distributional assumptions. A possible solution to this is a non- or
semi-parametric approach through the selection of instrumental variables (Angrist
et al., 1996). However, since the identification of such variables is often
extremely difficult, the unconfoundedness assumption may
be a natural starting point: after comparing average outcomes for treated and
control units, the next step is to adjust for observable pretreatment differences.
The strong ignorability assumption validates the comparison of treated and
control units with the same value of covariates; in fact the average treatment
effect (ATE) can be written as:
$$\tau = E[Y(1) - Y(0)] = E[E(Y(1) - Y(0) \mid X)]$$
$$= E[E(Y(1) \mid Z = 1, X = x)] - E[E(Y(0) \mid Z = 0, X = x)]$$
$$= E[E(Y \mid Z = 1, X = x)] - E[E(Y \mid Z = 0, X = x)]$$
while the average treatment effect on the treated (ATT) may be
rewritten as follows:
$$\tau_T = E_{x \mid Z=1}[E[Y(1) - Y(0) \mid Z = 1, X = x]]$$
$$= E_{x \mid Z=1}[E[Y(1) \mid Z = 1, X = x]] - E_{x \mid Z=1}[E[Y(0) \mid Z = 1, X = x]]$$
$$= E_{x \mid Z=1}[E[Y \mid Z = 1, X = x]] - E_{x \mid Z=1}[E[Y \mid Z = 0, X = x]]$$
Note that in both $\tau$ and $\tau_T$, due to unconfoundedness, what is not
known:
$E[E[Y(1) \mid Z = 0, X = x]]$ and $E[E[Y(0) \mid Z = 1, X = x]]$ for $\tau$,
$E_{x \mid Z=1}[E[Y(0) \mid Z = 1, X = x]]$ for $\tau_T$
can be substituted with what can actually be observed:
$E[E[Y \mid Z = 1, X = x]]$ and $E[E[Y \mid Z = 0, X = x]]$ for $\tau$,
$E_{x \mid Z=1}[E[Y \mid Z = 0, X = x]]$ for $\tau_T$
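The identification formulas above can be sketched numerically when the covariate is discrete: within each cell of X the treated and control means are compared, and the cell differences are then averaged over the distribution of X (for the ATE) or over the distribution of X among the treated (for the ATT). The function name `stratified_effects` and the eight observations below are hypothetical, chosen only so that the arithmetic can be checked by hand; the sketch assumes overlap, that is, every cell contains both treated and control units.

```python
import numpy as np

def stratified_effects(y, z, x):
    """ATE and ATT obtained by averaging within-cell mean differences.

    Assumes a discrete covariate x and overlap: every cell of x must
    contain at least one treated and one control unit.
    """
    y = np.asarray(y, dtype=float)
    z = np.asarray(z, dtype=int)
    x = np.asarray(x)
    n, n_treated = len(y), int(z.sum())
    ate = att = 0.0
    for value in np.unique(x):
        cell = x == value
        treated, control = cell & (z == 1), cell & (z == 0)
        diff = y[treated].mean() - y[control].mean()   # E[Y|Z=1,X=x] - E[Y|Z=0,X=x]
        ate += diff * cell.sum() / n                    # outer expectation over X
        att += diff * treated.sum() / n_treated         # outer expectation over X | Z=1
    return ate, att

# Hypothetical data: treatment is more frequent, and outcomes are higher, when x = 1.
x = [0, 0, 0, 0, 1, 1, 1, 1]
z = [1, 0, 0, 0, 1, 1, 1, 0]
y = [4.0, 3.0, 3.0, 3.0, 11.0, 11.0, 11.0, 8.0]

ate, att = stratified_effects(y, z, x)
y_arr, z_arr = np.array(y), np.array(z)
naive = y_arr[z_arr == 1].mean() - y_arr[z_arr == 0].mean()
print(naive, ate, att)   # 5.0 2.0 2.5
```

In this constructed example the unadjusted comparison of means (5.0) overstates both adjusted quantities, because treated units are concentrated in the cell with higher outcomes.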
Typically, there are many background characteristics that need to be
controlled for when estimating the average causal effect, and adjusting the
estimation for all these covariates can be unfeasible in practice. Propensity
score technology, introduced by Rosenbaum and Rubin (1983a), addresses
this situation by reducing the entire collection of background characteristics
to a single “composite” characteristic that appropriately summarizes the
collection. In the following sections, we will focus on common variants of
such method.
1.4 Propensity score: definition and properties
Theoretically if the unconfoundedness assumption is valid, the expression
for the propensity score can be rewritten as follows:
$$e(X, Y(0), Y(1)) = e(X)$$
Formally, the unit propensity score is the conditional probability that a unit
is assigned to treatment given the pre-treatment variables:
$$e(X) = p(Z = 1 \mid X = x)$$
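In applications the propensity score is rarely known and has to be estimated from a model for the treatment indicator given the covariates. The sketch below assumes a logit specification, which is only one possible choice and is not prescribed by the text, and fits it with a fixed number of Newton-Raphson iterations in NumPy. The function name `fit_logit_propensity` and the data are hypothetical; no convergence check is performed, and the method fails if the covariates perfectly separate treated and control units.

```python
import numpy as np

def fit_logit_propensity(x, z, n_iter=25):
    """Estimate e(x) = P(Z=1 | X=x) with a logit model fitted by Newton-Raphson."""
    x = np.asarray(x, dtype=float)
    z = np.asarray(z, dtype=float)
    if x.ndim == 1:
        x = x[:, None]
    design = np.column_stack([np.ones(len(z)), x])   # intercept + covariates
    beta = np.zeros(design.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-design @ beta))               # current fitted scores
        gradient = design.T @ (z - p)                           # score vector
        hessian = design.T @ (design * (p * (1 - p))[:, None])  # information matrix
        beta = beta + np.linalg.solve(hessian, gradient)
    return 1.0 / (1.0 + np.exp(-design @ beta))

# Hypothetical data with one binary covariate.
x = [0, 0, 0, 0, 1, 1, 1, 1]
z = [1, 0, 0, 0, 1, 1, 1, 0]
scores = fit_logit_propensity(x, z)
print(np.round(scores, 3))
```

With a single binary covariate the logit model is saturated, so the fitted scores coincide with the share of treated units in each cell (0.25 and 0.75 here); with continuous or many covariates the estimates depend on the chosen specification.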
The propensity score is a balancing score: among units with the same
propensity score, the distribution of the covariates is the same for treated
and control units. Formally (Rosenbaum and Rubin, 1983):
Lemma 1.1 Balancing of pre-treatment variables given the
propensity score
$$X \perp Z \mid e(X)$$
In particular, the propensity score is the coarsest balancing score, i.e., any
balancing score b(X) must satisfy the relation e(X) = f(b(X)), for some
function f (Rosenbaum and Rubin, 1983a). The key feature of propensity
score methodology is that, given the strong ignorability assumption,
treatment assignment and the potential outcomes are independent given the
propensity score:
$$(Y(0), Y(1)) \perp Z \mid e(X)$$
and
$$0 < p(Z = 1 \mid e(X)) < 1$$
Thus, adjusting for the propensity score removes the bias associated with
differences in the observed covariates in the treated and control group. As a
result, given the strong ignorability assumption, if the propensity score e(X)
is known, it follows that:
$$\tau = E[Y(1) - Y(0)] = E[E(Y(1) - Y(0) \mid e(X))]$$
$$= E[E(Y(1) \mid Z = 1, e(X))] - E[E(Y(0) \mid Z = 0, e(X))]$$
and
$$\tau_T = E_{x \mid Z=1}[E[Y(1) - Y(0) \mid Z = 1, e(X)]]$$
$$= E_{x \mid Z=1}[E[Y(1) \mid Z = 1, e(X)]] - E_{x \mid Z=1}[E[Y(0) \mid Z = 0, e(X)]]$$
where the outer expectation is over the distribution of e(X).
1.5 Matching and propensity score based methods
In what follows we will concentrate on the estimation of the ATT, although
the techniques can be easily modified and used for the estimation of the ATE.
As already mentioned, the quantity
$$E_{x \mid Z=1}[E[Y \mid Z = 0, X = x]]$$
which by the unconfoundedness assumption is used to estimate the
unknown quantity
$$E_{x \mid Z=1}[E[Y(0) \mid Z = 1, X = x]]$$
may be computed using different procedures. The most appropriate way
is to use the information about the untreated units, taking into account
possible differences in observable characteristics between the two
sub-populations of treated and untreated individuals. The most common
methodologies in use are regression and matching techniques. The former
are based on the specification of a model for the outcome variable, for
example a simple linear regression or more complex models. However,
correct specification of the model is crucial for a correct causal
interpretation. By contrast, matching techniques do not need any a priori
specification of the functional form linking the dependent and independent
variables and, in this sense, they are more robust (Rubin, 1973a). We will
now describe the most common variants of matching which, as already
mentioned, can be used together with the propensity score.
1.5.1 Matching types
A wide literature on matching procedures is available (see for
example Rubin, 1973a, b; Abadie and Imbens, 2004). These methods match
each treated unit to one or more control units according to different procedures. In
general, suppose we have a dataset concerning a population/sample
of N units. For each of the N units we observe ($Y_i^{obs}$, $Z_i$, $X_i$) which,
respectively, represent the observed outcome, the treatment
indicator and the vector of the k covariates. Because the ATT is our causal
estimand, $Y_i(1)$ is observed for every treated unit, whereas $Y_i(0)$, the
counterfactual outcome, must be somehow estimated. Matching allows us to
find, in the control group, a value for $Y_i(0)$ identified on the basis of the $X_i$
pre-treatment variables. We denote by $T_0$ the group of untreated units, by
$z_{ij}$ the weight given to the untreated unit j when it is matched to the treated
unit i, and by $A_i = \{j \in T_0 : X_j \in C(X_i)\}$ the
subgroup of the untreated units which are used to estimate $Y_i(0)$,
following a criterion $C(X_i)$.
By specifying $A_i$ and the weights $z_{ij}$ for every type of matching, we obtain
the following definitions:
i) Exact matching:
Control units with the same observed characteristics as the treated
unit are sought out:
$$A_i = \{j \in T_0 : X_i = X_j\}$$
The greatest problem with this type of matching is
the possibility that the control group contains no unit with these
characteristics. The probability of such an event
increases with the number of covariates, when covariates are continuous
variables and when the sample is not large.
ii) Caliper matching:
This type of matching is a generalization of exact matching. Instead
of requiring perfect equality of the covariates, the characteristics of the
(treated and untreated) units are required to be “not too distant”.
This may be formalized as follows:
$$A_i = \{j \in T_0 : |X_i^{(m)} - X_j^{(m)}| < c^{(m)}, \; m = 1, \dots, k\}$$
In this case the problem is choosing the threshold $c^{(m)}$ for each
covariate.
iii) Nearest neighbors:
This type of method allows us to overcome the multidimensionality
problem. According to this procedure, a suitable metric is used to
summarize the distance between covariate vectors, and the untreated
unit or units nearest to the treated unit, according to the chosen
distance function, are selected:
$$A_i = \{j \in T_0 : \min_j \|X_i - X_j\|\}$$
In this case the matching depends on the chosen metric, for
example the Euclidean distance, the Mahalanobis distance (Rubin, 1980a), or a
metric based on the variance-covariance matrix (Abadie and Imbens, 2002), etc.
Another solution could be to include in $A_i$ more than one unit,
varying appropriately the weights $z_{ij}$ in the estimator.
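The three definitions of the set $A_i$ given above can be sketched as follows for a single treated unit. The function names, the two covariates and the thresholds are hypothetical and serve only to illustrate how the sets differ; passing an inverse covariance matrix as `cov_inv` turns the Euclidean distance into a Mahalanobis distance.

```python
import numpy as np

def exact_match(x_i, x_controls):
    """Indices j of controls with X_j equal to X_i on every covariate."""
    return np.flatnonzero(np.all(x_controls == x_i, axis=1))

def caliper_match(x_i, x_controls, c):
    """Indices j with |X_i^(m) - X_j^(m)| < c^(m) for every covariate m."""
    return np.flatnonzero(np.all(np.abs(x_controls - x_i) < c, axis=1))

def nearest_neighbor_match(x_i, x_controls, cov_inv=None):
    """Indices of the closest controls under Euclidean (default) or Mahalanobis distance."""
    diff = x_controls - x_i
    if cov_inv is None:
        cov_inv = np.eye(x_controls.shape[1])
    dist2 = np.einsum("jk,kl,jl->j", diff, cov_inv, diff)   # squared distances
    return np.flatnonzero(dist2 == dist2.min())

# Hypothetical controls described by two covariates (say, age and a 0/1 indicator).
x_controls = np.array([[30.5, 1.0], [33.0, 1.0], [45.0, 0.0], [31.0, 0.0]])
x_i = np.array([31.0, 1.0])   # covariates of the treated unit

a_exact = exact_match(x_i, x_controls)                               # no exact twin
a_caliper = caliper_match(x_i, x_controls, c=np.array([3.0, 0.5]))   # controls 0 and 1
a_nearest = nearest_neighbor_match(x_i, x_controls)                  # control 0
print(a_exact, a_caliper, a_nearest)
```

The example shows the trade-off described in the text: exact matching finds no control, caliper matching depends on the chosen thresholds, and nearest neighbor matching always returns a match, however distant.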
1.6 Use of propensity score in matching techniques and
matching estimators
Matching methods, applied in connection with propensity scores, remove the
covariates' multidimensionality problem. As previously mentioned, one of
the most important properties of the propensity score is, in fact, that it is a
one-dimensional summary of the multidimensional covariates X, such that when the
propensity scores are balanced across the treatment and control groups, the
distributions of all the covariates X are balanced in expectation across the
two groups. Rosenbaum and Rubin (1983) showed that for a specific value
of the propensity score, the difference between the treatment and control
means for all units with that value of the propensity score is an unbiased
estimate of the average treatment effect at that propensity score, if the
treatment assignment is strongly ignorable given the covariates. Thus
matching or regression (covariance) adjustment on the propensity score tends to
produce unbiased estimates of the treatment effects when treatment
assignment is strongly ignorable. Here the basic matching techniques
(Rosenbaum and Rubin, 1984; Dehejia and Wahba, 1999) and estimation
methods based on the propensity score are presented.
i) Stratification matching
According to this method, the range of the propensity score is divided into
blocks so that in each block the covariates are balanced and the assignment
to treatment can be considered random. Once a stratification
with these properties is obtained, treatment effect estimation
is carried out in two steps. First, within each block, we
compute the difference between the means of the observed
outcomes for treated and untreated units (obtaining a conditional
effect estimate for that block). Second, we estimate the ATT by
weighting each difference according to the distribution of treated units
across blocks (see the stratification matching estimator formula in
Section 1.6.1).
ii) Nearest neighbor matching
This matching procedure matches to each treated unit the
untreated unit that has the nearest propensity score:
$$A_i = \{j \in T_0 : \min_j |e(X_i) - e(X_j)|\} \quad \text{with} \quad \sum_{j \in A_i} z_{ij} = 1$$
The comparison set for each treated unit consists of just one control unit and
the selection is usually made with replacement, so the same control can be
matched several times to various treated units. As a result, the number of
control units used for the intervention effect estimation may be
lower than the number of treated units. With this method, a treated unit
may be matched to a control unit with a very
different propensity score, simply because it is the nearest among those
available. For this reason, a maximum tolerated distance between the two
propensity scores needs to be set. The set $A_i$ may also be
redefined so as to include more than
one neighbour for each treated unit (the number to be defined beforehand).
iii) Radius matching
Each treated unit is matched to the control units whose propensity score
differs from its own by no more than a certain “radius” δ; the
number of controls used for the identification of $Y_i(0)$ is not
fixed in advance:
$$A_i = \{j \in T_0 : e(X_i) - \delta \le e(X_j) \le e(X_i) + \delta\} \quad \text{with} \quad \sum_{j \in A_i} z_{ij} = 1$$
This procedure differs from the previous one in two basic
respects: some treated units may be discarded because there is no
untreated unit with a propensity score within the defined interval, and
more than one untreated unit can be matched to a single treated unit
when several untreated units have a propensity score included in
the interval. The choice of the radius δ is a compromise
between two requirements. If the radius is very
small, some treated units will be lost, but comparisons
will be made between “very similar units”; vice versa, a wide
radius will mean a higher number of controls, but these will be “less
similar” to the treated units.
iv) Kernel matching
Each treated unit is “matched” to all untreated units ($A_i = T_0$), with
weights varying inversely with the distance of their propensity score
from the propensity score of the treated unit. We use this type of weighting
system:
$$z_{ij} = \frac{\frac{1}{h} k\left(\frac{e(x_i) - e(x_j)}{h}\right)}{\sum_{j \in T_0} \frac{1}{h} k\left(\frac{e(x_i) - e(x_j)}{h}\right)}$$
where k(.)3 is a density function and h is the bandwidth parameter.
3 For example the Gaussian kernel density function:
$$k_j = \frac{1}{h\sqrt{2\pi}} \exp\left\{-\frac{1}{2}\left(\frac{e(x_i) - e(x_j)}{h}\right)^2\right\}$$
1.6.1 Matching Estimators
We list the formulas for the matching estimators introduced in the previous
section and their variances:
Nearest Neighbor and radius matching
The average effect on the treated, applying the nearest neighbor or
radius matching method, is given by the following formula (where n
stands for either nearest neighbor or radius matching and the number
of units in the treated group is denoted by $N^T$):
$$\tau^n = \frac{1}{N^T} \sum_{i \in T} \left( Y_i(1) - \sum_{j \in A_i} z_{ij} Y_j(0) \right) \quad \text{with} \quad \sum_{j \in A_i} z_{ij} = 1$$
The variance is derived assuming fixed weights and
independent outcomes across units:
$$Var(\tau^n) = \frac{1}{(N^T)^2} \left[ \sum_{i \in T} Var(Y_i(1)) + \sum_{j \in T_0} (z_j)^2 Var(Y_j(0)) \right]$$
$$= \frac{1}{(N^T)^2} \left[ N^T \, Var(Y_i(1)) + \sum_{j \in T_0} (z_j)^2 Var(Y_j(0)) \right]$$
$$= \frac{1}{N^T} Var(Y_i(1)) + \frac{1}{(N^T)^2} \sum_{j \in T_0} (z_j)^2 Var(Y_j(0))$$
where $T_0$ denotes the control sub-group selected by the
matching procedure and $z_j = \sum_i z_{ij}$. Standard errors are obtained
analytically using the above formula, or using the bootstrap method,
although the latter appears to be controversial for nearest neighbor
matching, since bootstrap standard errors seem to be inconsistent in
this case (Imbens, 2004).
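The nearest neighbor and radius estimators, together with the analytical variance, can be sketched as follows. The sketch takes the propensity scores as given, uses equal weights within each matched set, and, as an additional assumption not stated in the text, estimates $Var(Y_i(1))$ and $Var(Y_j(0))$ by the sample variances of treated and control outcomes. Function names and data are hypothetical.

```python
import numpy as np

def matching_weights(e_treated, e_control, radius=None):
    """Weight matrix z[i, j]; rows sum to 1 for every matched treated unit.

    radius=None gives nearest neighbor matching with replacement; otherwise
    radius matching. Treated units with no control inside the radius get a
    row of zeros.
    """
    e_treated, e_control = np.asarray(e_treated, float), np.asarray(e_control, float)
    dist = np.abs(e_treated[:, None] - e_control[None, :])
    if radius is None:
        matched = dist == dist.min(axis=1, keepdims=True)
    else:
        matched = dist <= radius
    counts = matched.sum(axis=1, keepdims=True)
    return np.divide(matched, counts, out=np.zeros_like(dist), where=counts > 0)

def att_matching(y_treated, y_control, z):
    """ATT and its variance under fixed weights and independent outcomes."""
    y_treated, y_control = np.asarray(y_treated, float), np.asarray(y_control, float)
    used = z.sum(axis=1) > 0                  # drop treated units left unmatched
    y_t, z = y_treated[used], z[used]
    n_t = len(y_t)
    att = np.mean(y_t - z @ y_control)
    z_j = z.sum(axis=0)                       # z_j = sum_i z_ij
    var_t = y_t.var(ddof=1)                   # assumption: common variance, sample estimate
    var_c = y_control.var(ddof=1)
    variance = var_t / n_t + (z_j ** 2).sum() * var_c / n_t ** 2
    return att, variance

# Hypothetical propensity scores and outcomes.
e_treated, y_treated = [0.60, 0.80, 0.58], [10.0, 12.0, 9.0]
e_control, y_control = [0.55, 0.62, 0.30], [7.0, 8.0, 5.0]

att_nn, var_nn = att_matching(y_treated, y_control, matching_weights(e_treated, e_control))
att_r, var_r = att_matching(y_treated, y_control, matching_weights(e_treated, e_control, radius=0.1))
print(att_nn, var_nn, att_r, var_r)
```

With radius 0.1 the treated unit with score 0.80 finds no control and is discarded, which illustrates how the choice of radius changes both the estimate and the set of treated units it refers to.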
Stratification matching:
By construction, the range of the propensity score is divided into blocks so
that in each block the covariates are balanced and the assignment to
treatment can be considered random. As a result, within block q the difference
between the means of the observed outcomes for treated
and untreated units is equal to:
$$\tau_q^s = \frac{\sum_{i \in q} Y_i(1)}{N_q^1} - \frac{\sum_{j \in q} Y_j(0)}{N_q^0}$$
where the sums run over the treated units i and the control units j in block q,
and $N_q^1$ and $N_q^0$ denote the number of treated and control units
inside block q. The estimator of the ATT is computed by weighting
each difference $\tau_q^s$ according to the distribution of treated units across
blocks:
$$\tau^s = \sum_{q=1}^{Q} \tau_q^s \frac{N_q^1}{N^T}$$
where Q is the number of blocks and $N^T$ is the total number of treated units.
Assuming independence of outcomes across units, the variance of $\tau^s$ is
computed by:
$$Var(\tau^s) = \frac{1}{N^T} \left[ Var(Y_i(1)) + \sum_{q=1}^{Q} \frac{N_q^1}{N^T} \frac{N_q^1}{N_q^0} Var(Y_j(0)) \right]$$
Standard errors are obtained analytically using the above formula, or
using the bootstrap method.
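A minimal sketch of the stratification estimator of the ATT is given below. The block boundaries passed as `edges` are a hypothetical choice of the analyst; in practice they would be refined until the covariates are balanced within each block, a step not shown here. Blocks lacking either treated or control units are skipped, and the weights are renormalized over the blocks actually used, which is an assumption of this sketch.

```python
import numpy as np

def att_stratification(y, z, score, edges):
    """ATT obtained by weighting block-level mean differences by treated counts."""
    y = np.asarray(y, dtype=float)
    z = np.asarray(z, dtype=int)
    block = np.digitize(np.asarray(score, dtype=float), edges)   # block index of each unit
    effects, weights = [], []
    for q in np.unique(block):
        treated = (block == q) & (z == 1)
        control = (block == q) & (z == 0)
        if treated.sum() == 0 or control.sum() == 0:
            continue                                   # no comparison possible in this block
        effects.append(y[treated].mean() - y[control].mean())   # tau_q
        weights.append(treated.sum())                            # N_q^1
    return np.average(effects, weights=weights)

# Hypothetical scores and outcomes, with two blocks split at 0.5.
score = [0.1, 0.2, 0.3, 0.4, 0.6, 0.7, 0.8, 0.9]
z = [1, 0, 0, 0, 1, 1, 1, 0]
y = [4.0, 3.0, 3.0, 3.0, 11.0, 11.0, 11.0, 8.0]
att_s = att_stratification(y, z, score, edges=[0.5])
print(att_s)   # 2.5
```

Here the block differences are 1 and 3, and three of the four treated units fall in the second block, so the weighted average is 2.5.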
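Kernel matching:
The kernel matching estimator is given by:
$$\tau^k = \frac{1}{N^T} \sum_{i \in T} \left[ Y_i(1) - \sum_{j \in T_0} z_{ij} Y_j(0) \right]$$
where $z_{ij}$ is computed by the formula:
$$z_{ij} = \frac{\frac{1}{h} k\left(\frac{e(x_i) - e(x_j)}{h}\right)}{\sum_{j \in T_0} \frac{1}{h} k\left(\frac{e(x_i) - e(x_j)}{h}\right)}$$
In this case standard errors can be obtained using the bootstrap method.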
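The kernel matching estimator can be sketched as follows, assuming a Gaussian kernel and a bandwidth h chosen by the analyst; both are illustrative assumptions, as are the function names and the data. Bootstrap standard errors are not shown.

```python
import numpy as np

def kernel_weights(e_treated, e_control, h):
    """z[i, j] proportional to a Gaussian kernel of (e_i - e_j) / h; rows sum to 1."""
    e_treated, e_control = np.asarray(e_treated, float), np.asarray(e_control, float)
    u = (e_treated[:, None] - e_control[None, :]) / h
    k = np.exp(-0.5 * u ** 2) / (h * np.sqrt(2 * np.pi))
    return k / k.sum(axis=1, keepdims=True)

def att_kernel(y_treated, y_control, e_treated, e_control, h):
    """ATT: mean over treated units of Y_i(1) minus the kernel-weighted control mean."""
    z = kernel_weights(e_treated, e_control, h)
    return np.mean(np.asarray(y_treated, float) - z @ np.asarray(y_control, float))

# Hypothetical example: one treated unit, two controls at equal distance from it.
z_ij = kernel_weights([0.5], [0.4, 0.6], h=0.1)
att_k = att_kernel([10.0], [6.0, 8.0], [0.5], [0.4, 0.6], h=0.1)
print(z_ij, att_k)
```

Because the two controls are equally distant from the treated unit, they receive equal weight and the counterfactual is their simple average; a smaller bandwidth concentrates the weight on the closest controls.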
1.7 Alternative estimation methods
Alternatives to matching methodologies are outlined in this section. We
will focus on the Difference in Difference (DID) method and the Heckman
selection model.
DID methods for estimating causal effect of policy interventions are widely
used in economics, in particular when outcomes are measured in both the
treatment and control group before and after the policy intervention. In the
standard DID model we have N individuals (usually a random sample from
the population), observed in time periods
$T_i = (t-m), \dots, (t-1), (t), (t+1), \dots, (t+k)$, with
$(t-m), \dots, (t-1)$ and $(t+1), \dots, (t+k)$ denoting the pre- and
post-intervention periods, respectively, while the error terms
$\varepsilon_i$ are assumed to be additive and constant over time. To
account for time trends unrelated to the treatment, the change experienced
by the group subject to the intervention (treatment group) is adjusted by
the change experienced by the non-beneficiary group (control group). Meyer (1995), Angrist and Krueger
(2000), Blundell and MaCurdy (2000) describe many applications of this
methodology. In the field of Program Evaluation, the difference in
difference method (Moffitt, 1991; Heckman and Robb, 1985) involves the
use of panel data to better define the control group and reduce selection
bias. A large number of pre-intervention observations,
$Y_{i,t-1}, Y_{i,t-2}, Y_{i,t-3}, \dots, Y_{i,t-m}$, recorded before the
programme intervention at time (t), can (potentially) be considered in the
model. This means that we can highlight existing systematic differences
between the treated and untreated groups. Taking these differences into
account allows us to obtain unbiased treatment effect estimates, since they
could influence the outcome independently of the program. It is important
to underline that a greater number of pre-treatment observations, which
capture differences in temporal trends before the policy is implemented,
improves the estimate of the unobservable counterfactual.
Note that the interpretation of the standard DID estimator depends on the
assumptions made about how the treatment effect varies across individuals.
The effect is often assumed to be constant across individuals; more
generally, the effect of the intervention may differ across individuals, in
which case the standard DID estimator gives the average intervention effect
on the treatment group.
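The adjustment described above, in its simplest two-group, two-period form, can be sketched as follows. The function names and the toy numbers are hypothetical and only illustrate the arithmetic.

```python
def mean(values):
    return sum(values) / len(values)

def did_estimate(treat_pre, treat_post, ctrl_pre, ctrl_post):
    """Change in the treatment group minus change in the control group."""
    change_treated = mean(treat_post) - mean(treat_pre)
    change_control = mean(ctrl_post) - mean(ctrl_pre)
    return change_treated - change_control

effect = did_estimate(
    treat_pre=[10.0, 12.0], treat_post=[15.0, 17.0],
    ctrl_pre=[8.0, 10.0], ctrl_post=[10.0, 12.0],
)
print(effect)
```

Here the treated group moves from a mean of 11 to 16 and the control group from 9 to 11, so the time trend of 2 is subtracted from the raw change of 5.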
Recently Imbens and Athey (2005) proposed a different approach from the
standard DID method. They allow the effects of both time and intervention
to differ systematically across individuals (e.g., a new medical technology
that differentially benefits sicker patients). The setting
considered in their research is that of repeated cross-sections⁴ of individuals
observed in a treatment group and a control group, before and after the
treatment. They propose an estimator for the entire counterfactual treatment
effect distribution on the treatment group as well as the treatment effects
distribution on the control group, where the two distributions may differ
from each other in arbitrary ways. First, they propose a new model that
relates outcomes to an individual's group, time and unobservable
characteristics. Groups can differ in arbitrary ways (and, in particular, the
treatment group might include individuals who experience a high treatment
benefit). In the DID method the mean of individual outcomes in the absence
of the treatment can vary by group and by time. In contrast, in their model,
time periods and groups are treated asymmetrically. Second, they provide
conditions under which the model is nonparametrically identified, and
propose an estimation strategy based on the identification result. They use
the entire control group outcome distributions before and after the
intervention to estimate nonparametrically the change that occurred in that
group. Assuming that the outcome distribution in the treatment group would
have undergone the same change in the absence of the intervention, they
estimate the counterfactual distribution for the treatment group in the
second period. They compare this counterfactual distribution to the actual
post-intervention distribution for the treatment group, yielding an estimate
of the treatment effects distribution for treated units. Using a similar
strategy they define the treatment effect on the control units. In other
words, to figure out what would have happened to a treated unit in the first
period with outcome Y, they look at units in the first period control group
with the same outcome Y. Under a weak monotonicity assumption, the
distribution of their second period outcomes can be derived and used to
obtain the counterfactual distribution for the second period treated units
with no policy intervention.
⁴ But they also apply their model to panel data.
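The matching-by-outcome idea just described can be sketched in Python. This is a simplified illustration built on empirical distribution functions, not the authors' estimator: it assumes continuous outcomes, assumes that every first-period treated outcome lies within the range of first-period control outcomes, and all function names are hypothetical.

```python
import bisect

def cic_counterfactual(ctrl_pre, ctrl_post, y):
    """Map a first-period outcome y to its second-period counterfactual:
    find the rank of y among first-period controls, then return the
    second-period control outcome at the same rank (empirical quantile)."""
    pre, post = sorted(ctrl_pre), sorted(ctrl_post)
    rank = bisect.bisect_right(pre, y)        # n_pre * F_pre(y)
    k = -(-rank * len(post) // len(pre))      # ceil(rank * n_post / n_pre)
    k = min(max(k, 1), len(post))             # clamp to the observed support
    return post[k - 1]

def cic_effect_on_treated(ctrl_pre, ctrl_post, treat_pre, treat_post):
    """Mean post-intervention outcome of the treated minus the mean of
    their counterfactual outcomes; also returns the counterfactuals."""
    counterfactual = [cic_counterfactual(ctrl_pre, ctrl_post, y) for y in treat_pre]
    effect = sum(treat_post) / len(treat_post) - sum(counterfactual) / len(counterfactual)
    return effect, counterfactual

effect, counterfactual = cic_effect_on_treated(
    ctrl_pre=[1, 2, 3, 4], ctrl_post=[2, 4, 6, 8],
    treat_pre=[2, 3], treat_post=[7, 9],
)
print(effect, counterfactual)
```

In the toy data the control outcomes double between periods, so treated units starting at 2 and 3 are assigned counterfactuals 4 and 6. Comparing the whole list of counterfactuals with the observed post-intervention outcomes, rather than only their means, is what gives access to effects on the distribution.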
In this way it is possible to address a range of economic questions
suggested by policy analysis, such as which part of the distribution
benefits most from a policy intervention, while relying on a consistent
economic model of the outcomes. The proposed change in change (CIC) model
has many advantages. It allows the distribution of unobservable
characteristics to vary across groups in arbitrary ways. It allows the
distribution of outcomes to change over time, both in mean and variance,
even without a policy intervention. Moreover, it is possible to estimate
the effects of a policy on the mean and variance of the treatment group's
distribution relative to the original time trend in these moments. The
standard DID model can be seen as a special case of the change in change
model.
One common worry (Besley and Case, 2000) is that the effects identified by
DID may not be correct if the policy occurred in a “field” that derives
atypical benefits from the policy intervention. It implies that the treatment
group may differ from the control group not just in terms of the outcomes
distribution in the absence of the treatment, but also in the effects of the
treatment. Athey and Imbens’ model allows for both of these eventualities
across groups, because they allow the effect of the treatment to vary by
unobservable characteristics of an individual and the unobservable
distribution varies across groups.
Another model that is commonly used to relax the hypothesis of selection on
observables (the unconfoundedness assumption) is the Heckman selection
model (Heckman, 1974), which can be specified in its simplest form as
follows:
$$Y_i = \beta_0 + \beta_1 X_i + \beta_2 Z_i + \varepsilon_{1i}$$
$$Z_i^* = \gamma_0 + \gamma_1 X_i + \varepsilon_{2i}$$
that is, the model includes a latent dependent variable. $Y_i$ is the
outcome and $Z_i^*$ is the latent variable underlying the treatment
indicator Z. X is a $((N_1 + N_0) \times h)$ matrix, with h equal to the
number of time-constant characteristics observed for the i-th unit before
the policy intervention. The error components $\varepsilon_1$,
$\varepsilon_2$ are assumed to be jointly bivariate normally distributed
conditional on X, with zero mean vector and variance matrix $\Sigma$, so that:
$$(\varepsilon_1, \varepsilon_2)' \sim N(0, \Sigma)$$
with $corr(\varepsilon_1, \varepsilon_2) = \rho$
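To see why a non-zero correlation between the two error terms matters, the following Python sketch simulates the two equations and runs a naive OLS regression of Y on X and Z. It is only an illustration under stated assumptions: the treatment indicator is taken to be Z = 1 when the latent variable is positive (the usual convention, not spelled out above), both errors have unit variance, and all parameter values and function names are hypothetical.

```python
import random

def ols_treatment_coefficient(rho, n=20000, seed=0):
    """Simulate the two-equation model and return the OLS coefficient on Z."""
    rng = random.Random(seed)
    beta0, beta1, beta2 = 0.0, 1.0, 1.0   # outcome equation; true effect of Z is 1
    gamma0, gamma1 = 0.0, 1.0             # selection equation
    xs, zs, ys = [], [], []
    for _ in range(n):
        x = rng.gauss(0.0, 1.0)
        e2 = rng.gauss(0.0, 1.0)
        # e1 has unit variance and correlation rho with e2
        e1 = rho * e2 + (1.0 - rho ** 2) ** 0.5 * rng.gauss(0.0, 1.0)
        z = 1.0 if gamma0 + gamma1 * x + e2 > 0 else 0.0
        xs.append(x)
        zs.append(z)
        ys.append(beta0 + beta1 * x + beta2 * z + e1)

    # OLS of Y on (1, X, Z): demean, then solve the 2x2 normal equations
    mx, mz, my = sum(xs) / n, sum(zs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    szz = sum((z - mz) ** 2 for z in zs)
    sxz = sum((x - mx) * (z - mz) for x, z in zip(xs, zs))
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    szy = sum((z - mz) * (y - my) for z, y in zip(zs, ys))
    return (sxx * szy - sxz * sxy) / (sxx * szz - sxz ** 2)

print(ols_treatment_coefficient(rho=0.0))  # selection on observables only
print(ols_treatment_coefficient(rho=0.8))  # selection on unobservables
```

With rho equal to zero the OLS coefficient should be close to the true value of 1, while with a positive rho it is pushed well above 1, since units with a large unobserved component are both more likely to be treated and more likely to have a high outcome.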
It is possible to remove the bivariate normality assumption of the errors in
the following cases: maintaining the monotonicity assumption when an
instrumental variable is available (semiparametric Heckman selection
models; Deaton (1989), Hausman and Newey (1995)) or, if an
instrumental variable is not available, introducing nonparametric bounds
(Lee, 2005). However, most recent studies aim to develop
semiparametric versions of selection models (Newey and Vella, 2003)
while keeping some of the previous assumptions: Powell (1987), Newey
(1988), Ahn and Powell (1993) and Honore and Powell (2001).
2 Multivalued treatment
2.1 Introduction
The Rubin causal model is usually presented for binary treatments, although
in many cases of interest the treatment takes on more than two values.
There are many examples: a drug administered in different doses, a
treatment applied over different time periods, or labour market programmes
that require a more complex framework reflecting the actual choice set of
individuals, which typically contains more than two options. In all these
cases, the standard propensity score methodology must be modified in a
non-trivial way. As a consequence, methods have been developed to extend
the conventional two-treatment framework and allow for the estimation of
average causal effects with multiple mutually exclusive treatments. Imbens
(1999) and Lechner (2000) made the major methodological contributions in
this respect. They refine identification using strong and weak
unconfoundedness assumptions for the case of more than two treatments. In
the following sections we present and compare both approaches.
2.2 The basic framework.
In order to extend the application of the propensity score from binary
treatments to arbitrary treatment regimes, we restate the basic assumptions
of the binary case that can usefully be generalized to multiple treatments.
Let us summarize the conventional Rubin Causal Model. We have a binary
treatment, that is $Z_i \in \{0, 1\}$¹. Associated with each unit
i = 1,2…N and each value of the treatment z, there is a potential outcome
$Y_i(z)$. We are interested in the average outcome, $E[Y(z)]$, and
particularly in the average causal effect of exposing units to treatment or
not: $E[Y_i(1) - Y_i(0)]$. A key assumption for the identification of
causal effects, which we now restate, is the unconfoundedness assumption in
its strong and weak forms.
Assumption 2.1 Strong unconfoundedness assumption
Assignment to treatment Z is strongly ignorable, given pretreatment
variable X, if
$$\{Y(z)\}_{z \in \{0,1\}} \perp Z \mid X$$
In order to redefine
the weaker version of unconfoundedness, we define $D_i(z)$ to be the
indicator, for unit i, of receiving treatment z.
$$D_i(z) = \begin{cases} 1 & \text{if } Z_i = z \\ 0 & \text{otherwise} \end{cases}$$
As a result, weak unconfoundedness assumption is defined in the following
way:
Assumption 2.2 Weak unconfoundedness assumption
Assignment to treatment Z is weakly ignorable, given pretreatment
variable X, if
$$Y(z) \perp D(z) \mid X \quad \text{for all } z \in \{0,1\}.$$
¹ See assumptions in previous section: Potential outcomes.
As we can see, Rosenbaum and Rubin show how strong unconfoundedness
requires the treatment Z to be independent of the entire set of potential
outcomes, while weak unconfoundedness requires only pairwise
independence of the treatment indicator with each of the potential outcomes.
Moreover, weak unconfoundedness requires only a local independence of the
potential outcome $Y(z)$ with respect to the treatment level considered.
This means independence of the level indicator $D(z)$, rather than of the
entire vector of treatment values Z. In the binary treatment case, the two
conditions are obviously equivalent. The relative importance of the two
versions of the ignorability assumption depends on what we are interested
in estimating. In particular, the weak unconfoundedness concept is linked
to the missing data problem of causal inference. More often the concern is,
in fact, with the average of $Y_i(z)$ in the sub-sample with $D_i(z) = 1$.
Units with $D_i(z) = 0$ did not receive treatment level z, and the other
potential outcomes $Y_i(0)$ are never observed for the units with
$D_i(z) = 1$, so that they can play no role in any adjustment procedure
based on defining subpopulations. This lack of relevance is well reflected
by the weak ignorability assumption. In addition we report the following
Lemma 2.1 and Lemma 2.2.
Let $e(X)$ be the propensity score in the binary treatment case. We have:
Lemma 2.1 Balancing property of pre-treatment variables given the
propensity score (Rosenbaum and Rubin, 1983)
$$Z \perp X \mid e(X)$$
Lemma 2.2 Weak unconfoundedness given the propensity score with
binary treatments (Rosenbaum and Rubin, 1983)
$$Y(z) \perp D(z) \mid e(X) \quad \text{for all } z \in \{0,1\}$$
According to this result, it is sufficient to condition on the propensity score
instead of the entire set of covariates (Imbens,1999). Formally, we also
report the following Theorem that will be useful in section 2.4, in order to
introduce the average treatment effect estimation in multivalued treatment
case.
Theorem 2.1 Adjustment for propensity score given weak
unconfoundedness assumption:
i) $\mu(z, e) \equiv E[Y(z) \mid e(X) = e] = E[Y(z) \mid Z = z, e(X) = e]$
ii) $E[Y(z)] = E[\mu(z, e(X))]$ for all $z \in \{0,1\}$.
2.3 Multiple treatment
From now on, we allow the treatment variable to take on integer values
between 0 and k. Let T be the treatment variable in the multi-valued case,
taking values in $\mathcal{T} = \{0, 1, \dots, k\}$, and $X_i$ the set of
covariates, with $X \in \chi$. It is
assumed that each individual i = 1,2…N is assigned to one specific
treatment.
We are interested in the population average treatment effect and,
particularly, in the average causal effect of exposing units to treatment t or
to treatment s, that is:
$$ATE_{ts} = E[Y(t) - Y(s)]$$
which denotes the ATE of the treatment t relative to treatment s for a
participant drawn randomly from the population. The average effect of
treatment t relative to treatment s, for the sub-population having received
treatment level t only, can be defined as follows:
$$ATT_{ts} = E[Y(t) - Y(s) \mid T = t]$$
Imbens and Lechner refer to different versions of the unconfoundedness
assumption according to the type of treatment effect that one needs
to identify and estimate. The following weak ignorability
assumptions can be introduced:
Assumption 2.3 Weak unconfoundedness given pre-treatment
variables X (version 1) (Imbens 1999)
$$Y(t) \perp D(t) \mid X \quad \forall t \in \mathcal{T}$$
Assumption 2.4 Weak ignorable assumption (version 2)
$$Y(s) \perp D(t), D(s) \mid X$$
Assumption 2.5 Strong ignorability assumption (Lechner, 2000)
$$Y(t), Y(s) \perp T \mid X = x$$
Assumption 2.6 Weak ignorability assumption (Lechner, 2000)
$$Y(s) \perp T \mid X = x, \; T \in \{s, t\}$$
In summary, we report the average treatment effects that can be identified
under each of the previous assumptions:
$$ATE_{ts} = E[Y(t) - Y(s)]$$
according to Assumption 2.3 and Assumption 2.5;
$$ATT_{ts} = E[Y(t) - Y(s) \mid D(t) = 1]$$
according to Assumption 2.4 and Assumption 2.6.
Again, there are many background characteristics that need to be controlled
for when estimating the average causal effect, and adjusting the estimation
for all these covariates can be unfeasible. In this sense, a propensity
score generalized to arbitrary treatment regimes is very useful, since it
summarizes the information on the background characteristics in a single
summary score. As a consequence, we need to modify the standard definition
of propensity score, to allow for the implementation of a generalized
propensity score (Imbens, 1999):
Definition 2.1 Generalized propensity score
The Generalized propensity score (GPS) is the conditional
probability of receiving a particular level of the treatment given the
pre-treatment variables:
$$r(t, x) \equiv \Pr(T = t \mid X = x) = E[D(t) \mid X = x]$$
According to this notation, the propensity score in the binary treatment is
equivalent to:
$$e(x) = r(1, x)$$
Hence, i) the GPS defines a single random variable as a transformation of
the two random variables T and X: r(T,X); ii) it defines a family of random
variables indexed by t as a transformation of X alone: r(t,X) for all $t \in \mathcal{T}$.
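The distinction between r(T,X) and the family r(t,X) can be made concrete with a small Python sketch. It assumes a single discrete covariate and estimates the GPS by cell frequencies; this nonparametric estimator and the function names are assumptions chosen for illustration only.

```python
from collections import Counter

def estimate_gps(treatments, covariates):
    """Return r(t, x) = Pr(T = t | X = x) as a dict keyed by (t, x),
    estimated by relative frequencies within each covariate cell."""
    n_x = Counter(covariates)
    n_tx = Counter(zip(treatments, covariates))
    levels = sorted(set(treatments))
    return {(t, x): n_tx[(t, x)] / n_x[x] for x in n_x for t in levels}

def gps_at_observed(treatments, covariates, gps):
    """r(T_i, X_i): the score evaluated at the treatment actually received."""
    return [gps[(t, x)] for t, x in zip(treatments, covariates)]

def gps_at_level(t, covariates, gps):
    """r(t, X_i): the score evaluated at a fixed level t for every unit."""
    return [gps[(t, x)] for x in covariates]

treatments = [0, 1, 2, 0, 1, 1]
covariates = ["a", "a", "a", "b", "b", "b"]
gps = estimate_gps(treatments, covariates)
print(gps_at_observed(treatments, covariates, gps))
print(gps_at_level(1, covariates, gps))
```

With a binary treatment, `gps_at_level(1, ...)` would return the conventional propensity score e(x) = r(1,x) for each unit.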
The GPS also satisfies the balancing property, like the conventional
propensity score:
Lemma 2.3 Balancing property of the Generalized Propensity
Score
$$D(t) \perp X \mid r(t,X) \quad \text{for all } t \in \mathcal{T}.$$
Proof (Imbens, 1999)
First we have
$$\Pr[D(t) = 1 \mid X, r(t,X)] = E[D(t) \mid X] = r(t,X)$$
in fact by definition
$$r(t,X) = E[D(t) \mid X]$$
Second
$$\Pr[D(t) = 1 \mid r(t,X)] = E[D(t) \mid r(t,X)] = E[E[D(t) \mid X, r(t,X)] \mid r(t,X)] = r(t,X)$$
Hence
$$\Pr[D(t) = 1 \mid X, r(t,X)] = \Pr[D(t) = 1 \mid r(t,X)],$$
that is, conditionally on r(t,X), the treatment indicator D(t) and the pre-
treatment variables are independent. It is important to note that the
conditioning argument changes according to the level of treatment. As a
result, to guarantee conditional independence of the multi-valued treatment
T and covariates X, we need to condition on the entire set $\{r(t,X)\}_{t \in \mathcal{T}}$. It
is only in the binary treatment case that conditioning on $\{r(t,X)\}_{t \in \mathcal{T}}$ is
identical to conditioning on a single score e(X). As a result, all previous
unconfoundedness assumptions can be re-written in terms of the generalized
propensity score. In fact, if the strong or weak ignorability
assumptions given the covariates hold, then:
Theorem 2.2 Weak unconfoundedness given GPS (Imbens, 1999)
Suppose assignment to treatment T is weakly unconfounded given
pre-treatment variables X (version 1), then:
$$Y(t) \perp D(t) \mid r(t,X) \quad \forall t \in \mathcal{T}$$
Proof
$$\Pr[D(t) = 1 \mid Y(t), r(t,X)] = E[D(t) \mid Y(t), r(t,X)]$$
$$= E[E[D(t) \mid Y(t), X, r(t,X)] \mid Y(t), r(t,X)]$$
$$= E[r(t,X) \mid Y(t), r(t,X)] = r(t,X)$$
Moreover, as shown in the proof for Lemma 2.3,
$$\Pr[D(t) = 1 \mid r(t,X)] = r(t,X).$$
Hence,
$$\Pr[D(t) = 1 \mid Y(t), r(t,X)] = \Pr[D(t) = 1 \mid r(t,X)],$$
so, conditionally on r(t,X), D(t) and Y(t) are independent.
Assumption 2.7 Weak unconfoundedness given GPS (version 2)
$$Y(s) \perp D(t), D(s) \mid r(t,X), r(s,X)$$
According to Lechner’s approach we can re-write the previous Assumptions
2.5 and 2.6, given the pre-treatment variables, in the following way:
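Assumption 2.8 Strong unconfoundedness given GPS (Lechner,
2000)
If $Y(t), Y(s) \perp T \mid X = x$ and $0 < \Pr(T = j \mid X = x) < 1$ hold for
$\forall x \in \chi$ and for $\forall j = 0, 1, \dots, t, \dots, s, \dots, k$,
it follows that
$$Y(t), Y(s) \perp T \mid [\Pr(T = t \mid X) = \Pr(T = t \mid X = x), \Pr(T = s \mid X) = \Pr(T = s \mid X = x), \dots, \Pr(T = k \mid X) = \Pr(T = k \mid X = x)]$$
Assumption 2.9 Weak unconfoundedness given GPS (Lechner,
2000)
If $Y(s) \perp T \mid X = x, T \in \{s, t\}$ and $0 < \Pr(T = j \mid X = x) < 1$ hold for
$\forall x \in \chi$ and $\forall j = t, s$,
it follows that
$$Y(t), Y(s) \perp T \mid [\Pr\nolimits^{s|ts}(X) = \Pr\nolimits^{s|ts}(x), \; T \in \{s, t\}]$$
where
$$\Pr\nolimits^{s|ts}(x) = \Pr(T = s \mid T \in \{s, t\}, X = x) = \frac{\Pr(T = s \mid X = x)}{\Pr(T = s \mid X = x) + \Pr(T = t \mid X = x)}.$$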
2.4 Implementation of the GPS in multi-valued treatments.
Since the GPS has properties analogous to those of the propensity score
used in the binary treatment case, we now apply it in place of the
covariates, in order to estimate the ATEts and ATTts. In the binary
treatment case, the propensity score is computed using a logistic
regression. In the multi-valued case, multinomial logit or nested logit
models could be applied (with ordered levels of treatment in the second
case, for example the dose of a drug or the time over which a treatment is
applied). Given the generalized propensity score, we can estimate the
average outcomes by
conditioning solely on the GPS. As a result, according to Theorem 2.2 and
imposing smoothness of the expectation function if appropriate, the
conditional expectation of the outcome can be estimated (Imbens, 1999),
given the treatment t and the probability of receiving the treatment actually
received, applying the following Theorem:
Theorem 2.3 Estimation of Average Potential Outcomes given the
generalized propensity score, supposing assignment to treatment
weakly unconfounded given the pre-treatment variables.
Then
i) $\beta(t, r) = E[Y(t) \mid r(t,X) = r] = E[Y \mid T = t, r(T,X) = r]$
ii) $E[Y(t)] = E[\beta(t, r(t,X))]$ by iterated expectations
for all $t \in \mathcal{T}$.
Proof
The proof concerns part i), since part ii) follows by applying iterated
expectations
$$E[Y \mid T = t, r(T,X) = r] = E[Y(t) \mid T = t, r(T,X) = r]$$
$$E[Y \mid T = t, r(t,X) = r] = E[Y(t) \mid D(t) = 1, r(t,X) = r]$$
which by weak unconfoundedness assumption is equal to
$$E[Y(t) \mid r(t,X) = r]$$
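A small exact calculation may help to see what the theorem delivers. The Python sketch below works with a hypothetical population (one binary covariate, three treatment levels, a known GPS and a deterministic potential outcome); every number and name in it is an assumption chosen for illustration. It compares the naive mean of Y among units with T = t with the average of β(t, r(t,X)) over the whole population.

```python
from collections import defaultdict

P_X = {0: 0.5, 1: 0.5}                            # Pr(X = x)
GPS = {0: [0.5, 0.3, 0.2], 1: [0.2, 0.3, 0.5]}    # GPS[x][t] = r(t, x)

def potential_outcome(t, x):
    """Hypothetical Y(t) for a unit with covariate x."""
    return 10.0 * x + t

def beta(t):
    """beta(t, r) = E[Y | T = t, r(T, X) = r] for every attainable r."""
    num, den = defaultdict(float), defaultdict(float)
    for x, px in P_X.items():
        r = GPS[x][t]
        w = px * r                                # Pr(X = x, T = t)
        num[r] += w * potential_outcome(t, x)     # observed Y equals Y(t) when T = t
        den[r] += w
    return {r: num[r] / den[r] for r in den}

def adjusted_mean(t):
    """E[beta(t, r(t, X))], averaging over the whole population."""
    b = beta(t)
    return sum(px * b[GPS[x][t]] for x, px in P_X.items())

def naive_mean(t):
    """E[Y | T = t], which ignores the dependence of T on X."""
    den = sum(px * GPS[x][t] for x, px in P_X.items())
    num = sum(px * GPS[x][t] * potential_outcome(t, x) for x, px in P_X.items())
    return num / den

for t in (0, 1, 2):
    print(t, naive_mean(t), adjusted_mean(t))
```

In this population the true average potential outcome is 5 + t. The adjusted mean recovers it for every t, whereas the naive mean is too low for t = 0 and too high for t = 2, because units with X = 1 are over-represented at high treatment levels.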
Note that to obtain the population average value (which, as we will show in
the continuous case, is what is needed to estimate the causal effect) we
need to apply iterated expectations to $\beta(t, r)$, i.e.
$E[Y(t)] = E[\beta(t, r(t,X))]$. We can consider the subpopulations
obtained as strata of the population defined by the GPS. In particular, the
average outcome for units with treatment t and r(T,X) = r is an unbiased
estimate of the average Y(t) for the subpopulation with r(t,X) = r. The
reason is that, among units with T = t, the subpopulation with r(T,X) = r
is the same as the one with r(t,X) = r.
Likewise, the average of Y(s) for units with T = s, in the same
subpopulation with r(T,X) = r, is unbiased for the average of Y(s) in a
different subpopulation, the one with r(s,X) = r (that is, a different set
from the subpopulation with r(t,X) = r). Hence no causal comparison is
possible within the subpopulation defined by r(T,X) = r, and the regression
of the observed Y on the treatment level T and the GPS r(T,X) has no causal
interpretation.
Formally consider the difference
$$\beta(t, r) - \beta(s, r) = E[Y(t) \mid T = t, r(T,X) = r] - E[Y(s) \mid T = s, r(T,X) = r]$$
by weak ignorability assumption (version 1) this is equal to
$$E[Y(t) \mid r(t,X) = r] - E[Y(s) \mid r(s,X) = r]$$
but there is no causal interpretation for the comparison conditional on the
GPS value, because the conditioning sets differ:
$$\{x \mid r(t,x) = r\} \neq \{x \mid r(s,x) = r\}$$
In order to obtain a causal interpretation, we need to condition the difference
on the intersection of the two conditioning sets, that is:
$$E[Y(t) \mid T = t, r(t,X), r(s,X)] - E[Y(s) \mid T = s, r(t,X), r(s,X)]$$
$$= E[Y(t) - Y(s) \mid r(t,X), r(s,X)]$$
But, if the researcher is interested in the dose-response of a specific sub-
population or in the average effect of a specific treatment versus another
one, the average should be computed over the distribution of the pre-
treatment variables in that particular sub-population. For example, we can
estimate the expected (average) effect of treatment t relative to treatment s
for the sub-population having received treatment level t only. In particular,
according to weak unconfoundedness (version 2), the $ATT_{ts}$ is
equal to:
$$ATT_{ts} = E[Y(t) - Y(s) \mid D(t) = 1]$$
$$= E[Y(t) \mid D(t) = 1] - E[Y(s) \mid D(t) = 1]$$
$$= E[E[Y(t) \mid D(t) = 1, X]] - E[E[Y(s) \mid D(t) = 1, X]]$$
that by weak unconfoundedness is equal to:
$$E_{X \mid D(t)=1}[E[Y(t) \mid D(t) = 1, X]] - E_{X \mid D(t)=1}[E[Y(s) \mid D(s) = 1, X]]$$
and given the generalized propensity score, we can rewrite it as follows:
$$E_{r(t,x) \mid D(t)=1}[E[Y(t) \mid r(t,X) = r]] - E_{r(s,x) \mid D(t)=1}[E[Y(s) \mid r(s,X) = r]]$$
where the outer expectation is over the treated units having received
treatment level t. Remember that, since the treatment can take on more than
two values, it is important to be sure that there is sufficient overlap in the
distribution of pre-treatment variables across the treatments of interest. The
procedure is to compare, for each value of t, the univariate distribution of
$r(t,X)$ conditional on T = t with the same distribution conditional on
T ≠ t. If the two distributions are similar, then all adjustment methods
can be expected to perform well. Of course, other types of procedures can
be applied in order to estimate the average treatment effects. For example,
we can use matching techniques, assuming that each individual is assigned
to one specific treatment and that, for any participant, only one component
of $\{Y(t)\}_{t \in \mathcal{T}}$ can be observed, while the remaining
outcomes represent the counterfactuals. We introduce a pairwise comparison
of the treatments
t and s according to the following equations (Lechner, 2000):
$$ATE_{ts} = E[Y(t) - Y(s)] = E[Y(t)] - E[Y(s)]$$
that denotes the ATE of the treatment t relative to treatment s for a
participant drawn randomly from the population.
Note that ATEts can be re-written in the following way:
$$ATE_{ts} = E[Y(t) - Y(s)] = \sum_{j=0}^{k} [E(Y(t) \mid T = j) - E(Y(s) \mid T = j)] P(T = j)$$
The strong unconfoundedness condition identifies all counterfactuals
$E(Y(t) \mid T = j)$ and $E(Y(s) \mid T = j)$, because it implies
$E(Y(t) \mid X = x, T = j) = E(Y(t) \mid X = x, T = t)$ and
$E(Y(s) \mid X = x, T = j) = E(Y(s) \mid X = x, T = s)$, $\forall j = 0, 1, \dots, k$.
As a result, $ATE_{st}$, $ATE_{ts}$, $ATT_{ts}$, $ATT_{st}$ are identified.
The expected effect for an individual randomly drawn from the population
of participants in treatment t only is, instead, equal to:
$$ATT_{ts} = E[Y(t) - Y(s) \mid T = t] = E(Y(t) \mid T = t) - E(Y(s) \mid T = t)$$ ²
The weak unconfoundedness condition identifies only the counterfactual
$E(Y(s) \mid T = t)$ that is needed to compute the ATTts. Note that this
last assumption follows from the fact that independence of assignment in
the population implies independence in any subpopulation defined by
treatment participation categories.
However, a stronger ignorability assumption of treatment assignment (with
respect to assumption 2.5) can be also adopted for arbitrary treatment
regimes, in order to model T without conditioning on potential outcomes
(Van Dyk and Imai, 2003). We postpone to the continuous treatment case the
discussion of the generalization of the propensity score under the strong
ignorability assumption, where we also compare Van Dyk and Imai's approach
(2003) with Hirano and Imbens’ elaboration (2004) of the propensity score
method applied to treatment effect estimation.
2 It is evident that, if the participants in treatments t and s differ in a non-random way, this can influence the outcome values: ATTts ≠ ATTst, that is to say they are not symmetric.
3 Continuous treatment
3.1 Introduction
We showed how, under specific assumptions, like the strong ignorability
treatment assignment, multivariate adjustment methods based on the
propensity score have the property of reducing the bias that arises in
observational studies.
In this project we implement an extension of the propensity score method in
a setting with a continuous treatment, that is, we refer to the generalized
propensity score already introduced in the multiple treatment case. We make
an unconfoundedness assumption (Rosenbaum and Rubin, 1983) and adjust for
the Generalized Propensity Score (a function of the covariates) to remove all
bias associated with differences in the covariates. The Generalized
Propensity Score is a generalization of the binary treatment propensity
score (Imbens and Hirano, 2004; Van Dyk and Imai, 2003) and shares many of
its characteristics, including the balancing property, which is essential
to assess whether the score is correctly specified. We proceed to the
estimation of, and inference on, the causal effects of interest in a
parametric way (even if a nonparametric version is possible). We apply this
methodology to the public contributions (treatment variable) granted to
Piedmont enterprises during the years 2001 - 2003. Because of the variety
of funds set by public policies, the treatment turns out to be a continuous
variable. We are interested in the effect of the amount of contribution on
the occupational level.
We estimate the average effect of the contribution adjusting for the
difference in background characteristics using the propensity score
methodology and compare the results to conventional regression-based
methods. According to the empirical evidence (Dehejia and Wahba 1999;
Imai 2004) the former methodology often leads to more robust results than
the latter or other estimation methods, such as the DID or selection models
presented in section 1.7.
3.2 Framework
We consider a sample of units i=1,2,…,N and, for each unit, we have a set
of potential unit-level outcomes Yi(t) for t ∈ τ. In the binary treatment
case τ = {0,1}, but in the continuous case we have τ ⊂ [t0,t1]. We are
interested in the average dose-response function µ(t)=E[Yi(t)]. For each
unit we observe the vector of covariates Xi, the level of the treatment
received Ti and the corresponding potential outcome [i.e. Yi = Yi(Ti)].
We assume that {Y(t)}t∈τ , T, X are defined on a common probability space,
that T is continuously distributed with respect to Lebesgue measure on τ
and that Y = Y(T) is a well-defined random variable, i.e. Y(.) is suitably
measurable.
We are interested in the estimation of average causal effects, which can be
computed through the dose-response function µ(t) and in particular in the
ATE and ATT, such as:
$$ATE_{\Delta t, t} = E[Y(t + \Delta t) - Y(t)]$$
$$ATT_{\Delta t, t} = E[Y(t + \Delta t) - Y(t) \mid t]$$
that is, in the continuous treatment case we can be interested in marginal
treatment effect estimation, for example with respect to a specific treatment
level t. Imbens and Hirano (2004) generalize to the continuous case the
unconfoundedness assumption available for binary treatments (Rosenbaum and
Rubin 1983), which is crucial for the estimation of the above quantities.
Assumption 3.1 Strong ignorability assumption of treatment
assignment (Van Dyk and Imai, 2003)
$$\{Y(t)\}_{t \in \tau} \perp T \mid X$$
Assumption 3.2 Weak unconfoundedness assumption (Imbens and
Hirano, 2004)
Y(t) ⊥ T|X for all t ∈ τ.
Assumption 3.2 requires conditional independence for each value of
treatment t ∈ [t0,t1], and not joint independence of all potential outcomes
$\{Y(t)\}_{t \in \tau}$.
As already underlined, there are many characteristics that need to be
controlled for the average treatment effect estimation. The introduction of a
generalized propensity score reduces the entire collection of background
characteristics to a single “composite” variable that appropriately
summarizes them. The GPS is defined as follows:
Definition 3.1 Generalized Propensity Score
Let r(t,x) be the conditional density function of the treatment given
the covariates
r(t,x) = fT|X (t|x)
such that R(T,X) and r(t,X), for every t ∈ τ, are well-defined random
variables.
The conditional distribution fT|X(t|x) must be modeled and its unknown
parameters must be estimated using, for example, the maximum likelihood
method¹. Misspecification of the model for the propensity score is possible
and generally leads to biased causal inference. Hence, care must
be taken to identify as many covariates as possible, as well as to check for
model misspecification (Drake, 1993). The generalized propensity score can
be also defined through a propensity function:
$$r_\psi(t, x) = f_{T|X}(t \mid x; \psi)$$
where its distribution is assumed to be parameterized by ψ (Van Dyk and
Imai, 2003). Under this analytical framework, it is possible to derive
theoretical results which extend those in Rosenbaum and Rubin (1983b).
Van Dyk and Imai (2003) show that the propensity score is a balancing score
even with a non-binary treatment, so that it can be applied to arbitrary
treatment regimes, while also reducing the dimensionality of X enough to
allow for the application of efficient estimation techniques. Formally we
have:
Lemma 3.1 Balancing of pre-treatment variables given the
generalized propensity score (Van Dyk and Imai, 2003)
Within strata with the same value of r(t,X), the probability that T = t
does not depend on the value of X:
X ⊥ 1{T=t} | r(t,X)
This property does not require unconfoundedness.
¹ We can think of a normal distribution for the treatment given the covariates,
$T_i \mid X_i \sim N(h(X_i; \beta), \sigma^2)$, where β is the parameter vector, $h(X_i; \beta)$ is a known function of the
covariates which depends on the parameters β to be estimated and σ² is the unknown common variance of the errors.
The following theorem establishes that the potential outcomes and the
treatment assignment are conditionally independent given the generalized
propensity score. Formally we write:
Theorem 3.1 Strong unconfoundedness given the Generalized
Propensity Score (Van Dyk and Imai, 2003)
$$f(\{Y(t)\}_{t \in \tau} \mid T, r(\cdot, X)) = f(\{Y(t)\}_{t \in \tau} \mid r(\cdot, X))$$
Proof (Van Dyk and Imai, 2003)
Theorem 3.2 Weak unconfoundedness given the Generalized
Propensity Score (Van Dyk and Imai, 2003)
If assignment to the treatment is (weakly) unconfounded given the
pre-treatment variables X, then:
fT (t| r(t,X), Y(t)) = fT (t|r(t,X)) for each value of t.
Proof (Van Dyk and Imai, 2003)
In other words, if the balancing hypothesis of Lemma 3.1 is satisfied,
observations with the same GPS must have the same distribution of
observable characteristics, independently of the value of the treatment.
So, just like
for the standard propensity score, exposure to treatment is random and
treated and control units should be on average identical. Hence, since the
generalized propensity score has properties equivalent to those of the
propensity score for binary treatments, it can be used, instead of the
covariates, as a one-dimensional score summarizing the information on the
background characteristics, thus leading to more efficient estimates of
average treatment effects. The difference, here, is that the conditional
density of the treatment level at t corresponds to the evaluation of the
generalized propensity score at the same t: this implies as many propensity
scores as there are levels of treatment, each to be used one at a time. In
particular, using the GPS in connection with smoothing techniques we have:
Theorem 3.3 Bias removal with Generalized Propensity Score
(Imbens and Hirano, 2004).
Suppose that assignment to the treatment is weakly unconfounded given the
pre-treatment variables X (Assumption 3.2), then:
i) β(t,r) = E[Y(t) | r(t,X) = r] = E[Y | T = t, R = r]
ii) µ(t) = E[β(t,r(t,X))] = E[E[Y(t) | r(t,X)]] = E[Y(t)] (by iterated
expectations)
Theorem 3.3 implies that the dose-response function µ(t) can be estimated
in two steps. First, the conditional expectation of the outcome, E[Y
| T = t, R = r], is estimated as a function of a specific level of the treatment
T = t and of a specific value of the GPS R = r. Second, the dose-response
function, µ(t) = E[β(t,r(t,X))], is estimated by averaging the conditional
expectation over the score r(t,X), evaluated at the treatment level of
interest t. As already underlined, it should be clear that β(t,r) does not
have a causal interpretation. We need, in fact, to average the conditional
expectation over the marginal distribution of r(t,X), E[E[Y(t) | r(t,X)]],
to estimate the causal effect.
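One parametric way to carry out the two steps is shown here only as an illustrative sketch: the quadratic functional form and the coefficient names $\alpha_0, \dots, \alpha_5$ are assumptions made for the example, not something fixed by the text above. The conditional expectation is approximated by a flexible polynomial in T and R, and the fitted values are then averaged at the treatment level of interest:

$$E[Y_i \mid T_i, R_i] = \alpha_0 + \alpha_1 T_i + \alpha_2 T_i^2 + \alpha_3 R_i + \alpha_4 R_i^2 + \alpha_5 T_i R_i$$
$$\hat{\mu}(t) = \frac{1}{N} \sum_{i=1}^{N} \left[ \hat{\alpha}_0 + \hat{\alpha}_1 t + \hat{\alpha}_2 t^2 + \hat{\alpha}_3 \hat{r}(t, X_i) + \hat{\alpha}_4 \hat{r}(t, X_i)^2 + \hat{\alpha}_5 \, t \, \hat{r}(t, X_i) \right]$$

In the second step the score is evaluated at $\hat{r}(t, X_i)$, the density of treatment level t for unit i, and not at $\hat{r}(T_i, X_i)$. Consistently with the remark above, the estimated coefficients themselves have no direct causal interpretation.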
Proof of Theorem 3.3 (Imbens and Hirano, 2004):
Let $f_{Y(t) \mid T, r(t,X)}(y \mid t, r)$ represent the conditional density of Y(t) given T = t
and r(t,X) = r. Then applying Bayes' rule and Theorem 3.2 we get:
$$f_{Y(t) \mid T, r(t,X)}(y \mid t, r) = \frac{f_T(t \mid Y(t) = y, r(t,X) = r) \, f_{Y(t) \mid r(t,X)}(y \mid r)}{f_T(t \mid r(t,X) = r)} = f_{Y(t) \mid r(t,X)}(y \mid r)$$
So we can write
$$E[Y(t) \mid T = t, r(t,X) = r] = E[Y(t) \mid r(t,X) = r]$$
but also we have
$$E[Y(t) \mid T = t, R = r] = E[Y(t) \mid T = t, r(T,X) = r]$$
$$= E[Y(t) \mid T = t, r(t,X) = r]$$
$$= E[Y(t) \mid r(t,X) = r] = \beta(t, r)$$
Hence $E[Y_i(t) \mid r(t,X_i) = r] = \beta(t, r)$, that is part (i) of Theorem 3.3. Then we
have $E[\beta(t, r(t,X))] = E[E[Y(t) \mid r(t,X)]] = E[Y(t)] = \mu(t)$.
Moreover, if we are interested in estimating the marginal treatment effect
with respect to treatment level t, we can write:
$$
ATE_{\Delta t, t} = E[Y(t + \Delta t) - Y(t)] = E[Y(t + \Delta t)] - E[Y(t)]
$$
which denotes the ATE of the treatment (t + Δt) relative to treatment (t) for a
participant drawn randomly from the population N. Another quantity of
primary interest is the treatment effect on the treated, ATT, in a
specific sub-population:
$$
ATT_{\Delta t, t} = E[Y(t + \Delta t) - Y(t) \mid T = t] = E[Y(t + \Delta t) \mid T = t] - E[Y(t) \mid T = t]
$$
$$
= E_{X \mid T = t}\big[E[Y(t + \Delta t) \mid T = t, X]\big] - E_{X \mid T = t}\big[E[Y(t) \mid T = t, X]\big]
$$
which, by weak unconfoundedness, is equal to:
$$
E_{r(t + \Delta t, X) \mid T = t}\big[E[Y(t + \Delta t) \mid r(t + \Delta t, X)]\big] - E_{r(t,X) \mid T = t}\big[E[Y(t) \mid r(t,X)]\big]
$$
The $ATT_{\Delta t, t}$ denotes the expected effect for an individual randomly drawn
from the population of participants having treatment level t, while r(t,X) is
measurable with respect to the sigma-algebra generated by X.
Imbens’ procedure (2004) for the dose-response estimation, under the
previous assumptions and theorems, is based on regression on the
propensity score. We will apply an extension of it, since it represents a
valid strategy for empirical studies. One way of using the propensity score
is to estimate the conditional expectation of Y given T and r(t,X). First,
the GPS is estimated through the conditional distribution of the treatment
variable given the covariates, assuming a specific functional form, for
example a normal linear model²:
$$
T_i \mid X_i \sim N(h(X_i; \beta), \sigma^2)
$$
or
$$
\log T_i \mid X_i \sim N(h(X_i; \beta), \sigma^2)
$$
with the estimated GPS equal to:
$$
\widehat{gps} = \phi(T_i; X_i)
$$
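As a minimal sketch of this first step, the Python code below fits the normal linear model by least squares and evaluates the fitted normal density at each unit's observed treatment. It assumes a linear mean, h(X_i; β) = β_0 + X_i'β, and a constant variance; the function name estimate_gps, the simulated data and all variable names are hypothetical and serve only as an illustration.

```python
import numpy as np

def estimate_gps(T, X):
    """Fit T_i | X_i ~ N(b0 + X_i'b, sigma^2) and return (beta, sigma2, r_hat).

    r_hat(t, X_new) evaluates the fitted normal density of the treatment
    at level t for the covariate rows in X_new.
    """
    T = np.asarray(T, dtype=float)
    X = np.asarray(X, dtype=float)
    Z = np.column_stack([np.ones(len(T)), X])        # intercept + covariates
    beta, *_ = np.linalg.lstsq(Z, T, rcond=None)     # least squares = ML for the mean
    resid = T - Z @ beta
    sigma2 = resid @ resid / len(T)                  # ML estimate of the variance

    def r_hat(t, X_new):
        X_new = np.asarray(X_new, dtype=float)
        Z_new = np.column_stack([np.ones(len(X_new)), X_new])
        mean = Z_new @ beta
        return np.exp(-(t - mean) ** 2 / (2 * sigma2)) / np.sqrt(2 * np.pi * sigma2)

    return beta, sigma2, r_hat

# Simulated data, for illustration only.
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 2))
T = 1.0 + X @ np.array([0.5, -0.3]) + rng.normal(scale=0.8, size=500)

beta, sigma2, r_hat = estimate_gps(T, X)
gps = r_hat(T, X)   # estimated GPS at each unit's own treatment level
```

The returned function r_hat can also be evaluated at a fixed level t for all units, which is what the later averaging step requires.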
To verify whether this specification is suitable, we investigate how it affects
the balance of the covariates. We first divide the range of the treatment
into an arbitrary number of intervals; we then define blocks of the GPS
r(t; X_i), computed at a given treatment level. Then we examine the balance
of each covariate, testing, within each GPS block, whether its mean in one
treatment group differs from its mean in the other treatment groups
combined. We do this for each treatment interval against the other groups,
computing t-tests for each covariate and treatment interval. The precise
steps of the GPS implementation will be shown in the next chapter, in our
empirical case study. After having specified and estimated the GPS, we need
to model the conditional expectation of the outcome on the treatment
variable and the
² We may use more general models, such as mixtures of normals or heteroskedastic normal distributions, with the variance modeled as a parametric function of the covariates.
score, E[Y | T = t, R = r], as a flexible function of its two arguments³. For
example, we can use a linear regression or a quadratic approximation, such
as:
$$
E[Y_i \mid T_i, R_i] = \beta_0 + \beta_1 T_i + \beta_2 T_i^2 + \beta_3 R_i + \beta_4 R_i^2 + \beta_5 T_i R_i
$$
We estimate the parameters of the model, e.g. by ordinary least squares,
using the estimated GPS $\hat{R}_i$ among the regressors. We then estimate the
average potential outcome at treatment level t, $\widehat{E[Y(t)]}$, doing this for each
level of the treatment we are interested in, to get an estimate of the entire
dose-response function⁴:
$$
\widehat{E[Y(t)]} = \frac{1}{N} \sum_{i=1}^{N} \left( \hat\beta_0 + \hat\beta_1 t + \hat\beta_2 t^2 + \hat\beta_3 \hat r(t, X_i) + \hat\beta_4 \hat r(t, X_i)^2 + \hat\beta_5\, t\, \hat r(t, X_i) \right)
$$
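The estimation steps described so far can be sketched in Python as follows. This is an illustration under stated assumptions, not the author's implementation: the first stage is taken to be a normal linear model with a linear mean (its coefficients are called gamma in the code to keep them apart from the outcome coefficients β_0, ..., β_5), and the function names normal_gps, design and dose_response, as well as the simulated data, are hypothetical.

```python
import numpy as np

def normal_gps(t, mean, sigma2):
    """Normal density of the treatment at level t, given fitted means."""
    return np.exp(-(t - mean) ** 2 / (2 * sigma2)) / np.sqrt(2 * np.pi * sigma2)

def design(t, r):
    """Regressors of the quadratic approximation: 1, t, t^2, r, r^2, t*r."""
    r = np.asarray(r, dtype=float)
    t = np.broadcast_to(np.asarray(t, dtype=float), r.shape)
    return np.column_stack([np.ones_like(r), t, t ** 2, r, r ** 2, t * r])

def dose_response(Y, T, X, grid):
    # Step 1: estimate the GPS with a normal linear model (assumed form).
    Z = np.column_stack([np.ones(len(T)), X])
    gamma, *_ = np.linalg.lstsq(Z, T, rcond=None)
    mean = Z @ gamma
    sigma2 = np.mean((T - mean) ** 2)
    R = normal_gps(T, mean, sigma2)          # GPS at the observed treatment

    # Step 2: OLS of the outcome on the quadratic terms in (T, R).
    coef, *_ = np.linalg.lstsq(design(T, R), Y, rcond=None)

    # Step 3: for each t, average the fitted regression over r_hat(t, X_i).
    mu = np.array([np.mean(design(t, normal_gps(t, mean, sigma2)) @ coef)
                   for t in grid])
    return coef, mu

# Simulated data, for illustration only.
rng = np.random.default_rng(1)
X = rng.normal(size=(800, 2))
T = 2.0 + X @ np.array([0.6, -0.4]) + rng.normal(size=800)
Y = 1.0 + 0.5 * T + X[:, 0] + rng.normal(size=800)

grid = np.linspace(1.0, 3.0, 5)
coef, mu_hat = dose_response(Y, T, X, grid)
```

Note that in the third step the score is re-evaluated at the grid value t for every unit, rather than at the unit's observed treatment.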
In this last step we average the estimated regression function over the
score function evaluated at the desired level of t. Rather than reporting
the dose-response function itself, we can report its derivative. In
economics, this represents the marginal propensity (Imbens and Hirano,
2004) with respect to what we are interested in estimating. As we will show
in our observational study, this is a useful alternative way of presenting
the dose-response, since it allows computing estimates at a specific
treatment level as well as comparing the marginal propensities at different
levels of intervention.
³ Remember that there is no causal interpretation for the conditional expectation of the outcome.
⁴ It is convenient to use bootstrap methods to compute standard errors and form confidence intervals.
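A hedged sketch of these two ideas in Python: the derivative of the dose-response function is approximated by a finite difference over a small step Δt, and standard errors are obtained with a simple nonparametric bootstrap over units. The helper names marginal_effect and bootstrap_se are hypothetical, and the toy inputs below merely stand in for an estimated dose-response function and an estimator of interest.

```python
import numpy as np

def marginal_effect(mu, t, dt=1e-2):
    """Finite-difference approximation of the derivative of mu at t,
    i.e. (mu(t + dt) - mu(t)) / dt."""
    return (mu(t + dt) - mu(t)) / dt

def bootstrap_se(estimator, data, n_boot=200, seed=0):
    """Bootstrap standard error of estimator(data).

    data is an array whose first axis indexes units; units are resampled
    with replacement and the estimator is recomputed on each resample.
    """
    rng = np.random.default_rng(seed)
    n = len(data)
    draws = []
    for _ in range(n_boot):
        idx = rng.integers(0, n, size=n)      # resample units with replacement
        draws.append(estimator(data[idx]))
    return np.std(draws, axis=0, ddof=1)

# Toy usage: a known dose-response curve and the sample mean as estimator.
slope = marginal_effect(lambda t: t ** 2, 1.0, dt=1e-6)
sample = np.random.default_rng(3).normal(size=400)
se = bootstrap_se(np.mean, sample)
```

In an application, data could hold the outcome, the treatment and the covariates as columns, with estimator re-running the whole GPS procedure on each resample so that the uncertainty of the first stage is carried through.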
3.3 The sub-classification procedure
The GPS can also be used with sub-classification and matching procedures,
although these are usually more cumbersome than in the binary treatment
case. Van Dyk and Imai (2003) implement analysis techniques mostly based
on sub-classification, which balance a high-dimensional covariate by
adjusting for a low-dimensional propensity score. In the sub-classification
technique, they first model the conditional distribution of the treatment
given the covariates, $f^{\psi}_{T \mid X}(t \mid x)$, where ψ parameterises the distribution.
Second, they compute the estimate $\hat\psi$ of ψ. As a result, the parametric
model defines the generalized propensity score as $\hat r_{\hat\psi}(t, X)$⁵. Third, they
compute $\hat r_{\hat\psi}(t, X)$ for each observation and sub-classify observations with
the same or similar values of $\hat r$ into a number of sub-classes of equal size.
Within each sub-class they model the outcome distribution given the
treatment, $f_{Y(t) \mid T, r(t,X)}(y \mid t, \hat r)$, e.g. by regressing Y(t) on both t and $\hat r$.
To further reduce the bias, Robins and Rotnitzky (2001) have suggested
including other covariates in the regression. The average causal effect can
be computed as a weighted average of the within-sub-class effects, with
weights equal to the relative size of the sub-classes. Formally, the average
treatment effect can be approximated in the following way:
⁵ We can think of the Gaussian density function: $T_i \mid X_i \sim N(h(X_i; \beta), \sigma^2)$, where $\psi = (\beta, \sigma^2)$
and $\beta, \sigma^2$ can be estimated by maximum likelihood.
$$
E[Y(t)] \approx \sum_{s=1}^{S} E[Y(t) \mid T = t, \hat r_s]\, W_s
$$
where S is the number of sub-classes and $W_s$ is the relative size of
sub-class s, estimated by the proportion of observations falling into
sub-class s. Since results may be sensitive to the number and choice of
sub-classes, van Dyk and Imai (2003) suggest conducting a sensitivity
analysis with different sub-classifications.
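The weighted average above can be sketched in Python as follows. Several choices here are assumptions made for illustration only: each unit is summarized by one scalar score (for instance the fitted linear predictor of the treatment model), the within-sub-class model is a linear regression of the outcome on the treatment and the score, and the sub-class mean of the score is used as its representative value. The function name subclass_estimate and the simulated data are hypothetical.

```python
import numpy as np

def subclass_estimate(Y, T, score, t, n_classes=5):
    """Approximate E[Y(t)] by sub-classification on a scalar score."""
    Y, T, score = (np.asarray(a, dtype=float) for a in (Y, T, score))
    order = np.argsort(score)
    blocks = np.array_split(order, n_classes)     # (nearly) equal-size sub-classes
    total = 0.0
    for idx in blocks:
        # Within the sub-class, regress the outcome on treatment and score.
        Z = np.column_stack([np.ones(len(idx)), T[idx], score[idx]])
        coef, *_ = np.linalg.lstsq(Z, Y[idx], rcond=None)
        # Predict at treatment level t and the sub-class mean score.
        pred = coef @ np.array([1.0, t, score[idx].mean()])
        total += pred * len(idx) / len(Y)         # weight W_s = n_s / N
    return total

# Simulated data, for illustration only; x plays the role of the score.
rng = np.random.default_rng(2)
x = rng.normal(size=1000)
T = x + rng.normal(size=1000)
Y = 2.0 * T + x + rng.normal(size=1000)

est0 = subclass_estimate(Y, T, x, t=0.0)
est1 = subclass_estimate(Y, T, x, t=1.0)
```

Re-running the function with different values of n_classes gives a simple version of the sensitivity analysis mentioned above.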
3.3.1 Lu’s matching technique applying the GPS
In contrast to the sub-classification method, Lu et al. (2001) suggest
matching pairs of units on $\hat r$. To implement this procedure in the continuous
treatment case, we need to divide the range of the treatment variable into
blocks and apply matching inside each stratum, before proceeding to the
estimation of the average treatment effect. In this context, however,
matching turns out to be more difficult to apply than with a binary
treatment. This is because the matched pairs should not only have similar
$\hat r$, but should also have different treatment levels (this is not a problem
in the binary treatment case, since each pair consists of a unit from the
treatment group and a unit from the control group). Lu et al. (2001)
propose a distance measure that decreases as the propensity scores become
more similar and the received treatments become more dissimilar. The
treatment effect can be evaluated by examining the difference in response
between the “high” and “low” treatment and, in order to take into account
the difference in treatment, they also suggest regressing the difference in
response on the difference in treatment. These techniques are evidently
difficult to generalize to continuous treatment variables. In this sense
sub-classification and smoothing techniques represent more powerful
strategies, since they allow a simpler implementation of more complex
causal effect analyses.
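To make the matching idea concrete, the Python sketch below builds one distance with the stated property (small when scores are close and doses are far apart, infinite when two units received the same dose) and forms pairs from it. The exact functional form of the distance, the small constant eps, the greedy pairing rule (used here in place of an optimal matching algorithm) and the regression through the origin are all assumptions made for illustration; the function names are hypothetical.

```python
import numpy as np

def dose_distance(score, dose, eps=1e-6):
    """Pairwise distance: squared score difference over squared dose difference."""
    score = np.asarray(score, dtype=float)
    dose = np.asarray(dose, dtype=float)
    num = (score[:, None] - score[None, :]) ** 2 + eps
    den = (dose[:, None] - dose[None, :]) ** 2
    with np.errstate(divide="ignore"):
        d = num / den                 # infinite when two doses coincide
    np.fill_diagonal(d, np.inf)       # a unit cannot be matched to itself
    return d

def greedy_pairs(d):
    """Repeatedly pair the two closest unmatched units (a simple heuristic)."""
    d = d.copy()
    pairs = []
    while np.isfinite(d).any():
        i, j = np.unravel_index(np.argmin(d), d.shape)
        pairs.append((int(i), int(j)))
        d[[i, j], :] = np.inf         # remove both units from further matching
        d[:, [i, j]] = np.inf
    return pairs

def paired_dose_slope(Y, dose, pairs):
    """Regress within-pair outcome differences on dose differences (no intercept)."""
    dy = np.array([Y[j] - Y[i] for i, j in pairs])
    dt = np.array([dose[j] - dose[i] for i, j in pairs])
    return float(dt @ dy / (dt @ dt))

# Toy data: units 0-1 and 2-3 have similar scores but different doses.
score = np.array([0.0, 0.01, 1.0, 1.02])
dose = np.array([1.0, 3.0, 2.0, 5.0])
Y = 2.0 * dose

pairs = greedy_pairs(dose_distance(score, dose))
slope = paired_dose_slope(Y, dose, pairs)
```

Greedy pairing is used only to keep the example short; it does not in general minimize the total distance across pairs.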
4 Conclusion
Propensity score methods have become one of the most important tools for
analyzing causal effects in observational studies. Although the original work
of Rosenbaum and Rubin (1983) considered applications with binary
treatments, it can also be extended to multivalued and continuous
treatments. We have discussed some of the issues involved in handling
multiple and continuous treatments and emphasized how the propensity
score methodology can be applied to “arbitrary” cases.
Bibliography
[Abadie and Imbens, 2002] Abadie, A., Imbens, G. (2002) Simple and bias-
corrected matching estimators for average treatment effects. NBER technical
working paper series, 283.http://www.nber.org/papers/T0283.
[Abadie and Imbens, 2004] Abadie, A., Imbens, G. (2004) Large sample
properties of matching estimators for average treatment effects. Working
paper, 283. http://elsa.berkeley.edu/.
[Ahn and Powell, 1993] Ahn, H. and J.L. Powell (1993), Semiparametric
Estimation of Censored Selection Models with a Nonparametric Selection
Mechanism, Journal of Econometrics, 58, 3-29.
[Angrist and Krueger, 2000] Angrist, J., Krueger, A. (2000) Empirical
Strategies in Labor Economics, in O. Ashenfelter and D. Card eds.
Handbook of Labor Economics, vol. 3. New York: Elsevier Science.
[Angrist et al., 1996] Angrist, J., Imbens, G., Rubin, D. B. (1996)
Identification of causal effects using instrumental variables. Journal of the
American Statistical Association 91, 444-472.
[Athey and Imbens, 2002] Athey, S., and Imbens, G. (2002), “Identification
and Inference in Nonlinear Difference-In-Differences Models,” unpublished
manuscript, Department of Economics, Stanford University.
[Battistin et al., 2001] Battistin, E., Gavosto, A., Rettore, E. (2001) Why do
subsidised firms survive longer? An evaluation of a program promoting
youth entrepreneurship in Italy, in Lechner M., F. Pfeiffer (eds.),
Econometric evaluation of active labour market policies,
Physica/Springer-Verlag, Heidelberg.
[Becker and Ichino, 2002] Becker, S. O. and Ichino, A. (2002). Estimation
of average treatment effects based on propensity scores. The Stata Journal,
4, 358-377.
[Blundell and MaCurdy, 2000] Blundell, Richard, and Thomas MaCurdy,
(2000): Labor Supply, Handbook of Labor Economics, O. Ashenfelter and
D. Card, eds., North Holland: Elsevier, 2000, 1559-1695.
[Bryson et al., 1999] Bryson, A., Dorsett, R., Purdon, S. (2002). The use of
propensity score matching in the evaluation of active labour market policies.
Policy Studies Institute, U.K. Department for Work and Pensions Working
Paper No. 4. http://www.dwp.gov.uk/asd/asd5/wp-index.html.
[Cox and Oakes, 1984] Cox D.R., Oakes D. (1984) Analysis of survival
data. Chapman and Hall, London.
[Dehejia and Wahba, 1999] Dehejia, R. H., Wahba, S. (1999) Causal effects
in nonexperimental studies: Re-evaluating the evaluation of training
programs. Journal of the American Statistical Association 94, 1053-62.
[DiPrete and Gangl, 2004] DiPrete, T., Gangl, M. (2004). Assessing bias in
the estimation of causal effects: Rosenbaum bounds on matching estimators
and instrumental variables estimation with imperfect instruments.
Sociological methodology.
[Drake, 1993] Drake, C. (1993) Effects of misspecification of the propensity
score on estimators of treatment effects. Biometrics 49, 1231-1236.
[Frolich, 2002] Frolich, M. (2002) What is the value of knowing the
propensity score for the estimation of the average treatment effects.
Department of economics, University of St. Gallen.
[Greevy et al. 2004] Greevy, R., Lu, B., Silber, J. H., and Rosenbaum,
P.(2004) Optimal multivariate matching before randomization. Biostatistics
5, 263-275.
[Hahn, 1998] Hahn, J. (1998) On the role of the propensity scores in efficient
semiparametric estimation of average treatment effects. Econometrica 66,
315-331.
[Hausman and Newey, 1995] Hausman, J., and Newey, W., (1995)
Nonparametric Estimation of Exact Consumer Surplus and Deadweight
Loss, Econometrica, 63, 1445-1476.
[Heckman, 1979b] Heckman, J. J. (1979) Sample selection bias as a
specification error. Econometrica 47(1), 153-161.
[Heckman and Hotz, 1989] Heckman, J. J., Hotz, V. J. (1989) Choosing
among alternative nonexperimental methods for estimating the impact of
social programs: the case of manpower training. Journal of the American
Statistical Association 84, 408, 862-874.
[Heckman et al., 1998b] Heckman, J. J., Ichimura, H., Todd, P. (1998b)
Matching as an econometric evaluation estimator. Review of Economic
Studies 65, 261-294.
[Heckman and Robb, 1985] Heckman, J. and R. Robb, (1985), Alternative
Methods for Evaluating the Impact of Interventions, in J. Heckman and B.
Singer, eds., Longitudinal Analysis of Labor Market Data, New York:
Cambridge University Press.
[Heckman and Todd, 1999] Heckman, J. J., Todd, P. (1999) Adopting
propensity score matching and selection models of choice-based samples.
University of Chicago.
[Hirano et al., 2003] Hirano, K., Imbens, G. W., Ridder, G. (2003) Efficient
estimation of average treatment effects using the estimated propensity score.
Econometrica 71(4).
[Holland 1986] Holland, P. (1986) Statistics and causal inference. Journal of
the American Statistical Association, 81.
[Honorè and Powell, 1994] Honorè, B. and Powell, J., (1994) Pairwise
Difference Estimators for Censored and Truncated Regression Models,
Journal of Econometrics, 64: 241-278.
[Imbens, 1999] Imbens, G. W., (1999) The Role of the Propensity Score in
Estimating Dose-Response Functions, NBER Working Paper No. T0237.
Available at SSRN: http://ssrn.com/abstract=226648.
[Imbens and Hirano, 2004] Imbens, G. and Hirano, K., (2004) The
Propensity score with continuous treatment, chapter for Missing data and
Bayesian Method in Practice: Contributions by Donald Rubin Statistical
Family.
[Imbens et al., 2001] Imbens, G. (2001) Implementing Matching Estimators
for Average Treatment Effects in Stata, The Stata Journal (2001).
[Joffe and Rosenbaum, 1999] Joffe, M. M. and Rosenbaum, P. R. (1999).
Propensity scores. American Journal of Epidemiology 150, 327-333.
[Imai and van Dyk, 2003] Imai, K. and van Dyk, D. A., (2003) Causal
inference with general treatment regimes: Generalizing the Propensity
Score, Revised for the Journal of the American Statistical Association.
[Lechner, 2001] Lechner, M., (2001), Identification and Estimation of
Causal Effects of Multiple Treatments under the Conditional Independence
Assumption, in Lechner and Pfeiffer (eds.), Econometric Evaluations of
Active Labor Market Policies in Europe, Heidelberg, Physica.
[Lee, 2005] Lee W-S, (2005) Propensity Score Matching and Variations on
the Balancing Test, Working Paper - 3rd Conference on Policy Evaluation,
Mannheim.
[Lu et al., 2001] Lu et al. (2001). Matching with doses in an observational
study of a media campaign against drug abuse. Journal of the American
Statistical Association 96, 1245-1253.
[Mealli and Pagni, 2001] Mealli, F., Pagni, R. (2001) Analisi e valutazione
delle politiche per le nuove imprese. Il caso della L.R. Toscana
n. 27/93. IRPET.
[Meyer, 1995] Meyer, B, (1995), Natural and Quasi-experiments in
Economics, Journal of Business and Economic Statistics, 13 (2), 151-161.
[Meini, 2001] Meini, M.C. (2001) Politiche per l'occupazione a scala locale.
Valutazione del ruolo degli interventi per lo start-up d'impresa. Provincia di
Massa, Osservatorio Provinciale Mercato del Lavoro, IRPET.
[Ming and Rosenbaum, 2000] Ming, K., Rosenbaum, P. R. (2000).
Substantial gains in bias reduction from matching with a variable number of
controls. Biometrics 56.
[Moffitt, 1991] Moffitt R. (1991), Program evaluation with nonexperimental
data, Evaluation Review, 15: 291-314.
[Newey, 1988] Newey W. K. (1988) Two Step Series Estimation of Sample
Selection Models, Working Paper, MIT Department of Economics.
[Newey and Vella, 2003] Newey and Vella (2003), Non-parametric
Estimation of Sample Selection Models, Review of Economic Studies, 2003,
Vol 70, pp 33-58.
[Neyman, 1923] Neyman, J. (1923) On the application of probability theory
to agricultural experiments. Essay on principles. Section 9.
[Pellegrini, 2001] Pellegrini G. (2001). La struttura produttiva delle piccole
e medie imprese italiane: il modello dei distretti. Banca Impresa Società, 20
(2001), n. 2.
[Powell J. L., 1987] Powell J. L. (1987) Semiparametric Estimation of
Bivariate Latent Variable Models, Working Paper No. 8704, Social Systems
Research Institute, University of Wisconsin-Madison.
[Powell, 1987] Powell, J. L., (1987) Semiparametric Estimation of
Employment Duration Models, Econometric Reviews, 6: 65-78.
[Purdon, 2002] Purdon, S. (2002) The use of propensity score matching in
the evaluation of active labour market policies.
[Rettore and Gavosto, 2001] Rettore E. and Gavosto A. (2001) Why do
subsidised firms survive longer? An evaluation of a program promoting
youth entrepreneurship in Italy, Econometric Evaluation of Active Labour
Market Policies, Physica/Springer-Verlag, Heidelberg.
[Rosenbaum, 2002] Rosenbaum, P. R. (2002) Observational Studies, 2nd
Edition. Springer Verlag, New York, NY.
[Rosenbaum and Rubin, 1983] Rosenbaum, P.R., Rubin, D. B. (1983) The
central role of the propensity score in observational studies for causal
effects. Biometrika, 70.
[Rosenbaum and Rubin, 1983b] Rosenbaum, P.R., Rubin, D. B. (1983b)
Assessing sensitivity to an unobserved binary covariate in an observational
study with binary outcome. Journal of the Royal Statistical Society, 45, 212.
[Rosenbaum and Rubin, 1984] Rosenbaum, P. R., Rubin, D. B. (1984)
Reducing bias in observational studies using sub-classification on the
propensity score. Journal of the American Statistical Association 79,
516-524.
[Rubin, 1973a] Rubin, D. B. (1973a) Matching to remove bias in
observational studies. Biometrics 29, 159-184.
[Rubin, 1973b] Rubin, D. B. (1973b) The use of matched sampling and
regression adjustment to remove bias in observational studies. Biometrics
29, 185-203.
[Rubin, 1974] Rubin, D. B. (1974) Estimating causal effects of treatments in
randomized and nonrandomized studies. Journal of Educational Psychology
66, 688-701.
[Rubin, 1980a] Rubin, D. B. (1980a) Bias reduction using Mahalanobis's
metric matching. Biometrics, 36, 295-298.
[Rubin and Thomas, 1992b] Rubin, D. B., Thomas, N. (1992b)
Characterizing the effect of matching using linear propensity score methods
with normal distributions. Biometrika 79, 797-809.
[Rubin and Thomas, 1996] Rubin, D. B., Thomas, N. (1996) Matching using
estimated propensity scores, relating theory to practice. Biometrics 52, 249-
264.
[Zhao, 2004] Zhao, Z. (2004) Using matching to estimate treatment effects:
data requirements, matching metrics, and Monte Carlo evidence. Review of
Economics and Statistics 86(1), 91-107.