Accident Analysis and Prevention 50 (2013) 539–553
Journal homepage: www.elsevier.com/locate/aap
0001-4575/$ see front matter. http://dx.doi.org/10.1016/j.aap.2012.05.036

Application of propensity scores and potential outcomes to estimate effectiveness of traffic safety countermeasures: Exploratory analysis using intersection lighting data

Lekshmi Sasidharan ∗, Eric T. Donnell
Department of Civil and Environmental Engineering, The Pennsylvania State University, 212 Sackett Building, University Park, PA 16802, United States

∗ Corresponding author. Tel.: +1 814 206 4647; fax: +1 814 863 7304. E-mail addresses: [email protected] (L. Sasidharan), [email protected] (E.T. Donnell).

Article history: Received 26 September 2011; received in revised form 25 May 2012; accepted 29 May 2012.

Keywords: Propensity scores; Potential outcomes; Traffic safety countermeasure; Matching; Sub-classification; Roadway lighting

Abstract

More than 5.5 million police-reported traffic crashes occurred in the United States in 2009, resulting in 33,808 fatalities and more than 2.2 million injuries. Significant funds are expended annually by federal, state, and local transportation agencies in an effort to reduce traffic crashes. Effective safety management involves selecting highway and street locations with potential for safety improvements; correctly diagnosing safety problems; identifying appropriate countermeasures; prioritizing countermeasure implementation at selected sites; and evaluating the effectiveness of implemented countermeasures. Accurate estimation of countermeasure effectiveness is a critical component of the safety management process. In this study, a statistical modeling framework, based on propensity scores and potential outcomes, is described to estimate countermeasure effectiveness from non-randomized observational data. Average treatment effects are estimated using semi-parametric estimation methods. To demonstrate the framework, the average treatment effect of fixed roadway lighting at intersections in Minnesota is estimated. The results indicate that fixed roadway lighting reduces expected nighttime crashes by approximately 6%, which compares favorably to other, recent lighting-safety research findings.

© 2012 Elsevier Ltd. All rights reserved.

1. Introduction

The effectiveness of a traffic safety countermeasure is an important consideration when selecting one for implementation. Accurate estimation of countermeasure effectiveness is therefore critical in transportation safety programming. Traffic safety studies often rely on retrospective, non-randomized observational data to assess the safety performance of highways. Experimental data, which are commonly used to determine causal (or “treatment”) effects of interventions in other fields of research (e.g., medicine, economics, education, politics, etc.), are generally not available in traffic safety research because such studies require that countermeasures be randomly assigned to treatment locations. Legal, ethical, and safety concerns in using such an approach to assign countermeasures have led transportation agencies to assign treatments to locations experiencing high crash frequencies or severities (i.e., treatment implementation is not randomized).

Statistical causal inference methods may be used to estimate the average treatment effect (ATE) of a countermeasure from non-randomized observational data. Such an approach minimizes bias in the estimated treatment effect by mimicking randomization based on observed covariates (e.g., Rosenbaum and Rubin, 1983, 1984, 1985; Dehejia and Wahba, 1998; Schafer and Kang, 2008).

The objective of this paper is to describe and apply the propensity score-potential outcomes framework, which is not widely used in transportation research, to determine the effectiveness of safety countermeasures from non-randomized observational safety data. This framework is derived from the statistical causal inference literature. Our paper is based on the potential outcomes framework proposed by Rubin, which is also known as Rubin’s causal model (Holland, 1986). We demonstrate two different sampling schemes, propensity score matching and propensity-based sub-classification, based on crash frequency, to determine the ATE of traffic safety countermeasures. The proposed framework is then executed using intersection data in Minnesota to estimate the ATE of fixed roadway lighting.

2. Background

There are three commonly used methods to determine traffic safety countermeasure effectiveness: observational before–after studies, with–without comparisons using cross-sectional or longitudinal statistical models, and case–control epidemiological studies. This section of the paper summarizes these methods and also contains a brief summary of statistical
causal inference methods that have been published in the traffic safety literature.

2.1. Observational before–after studies

There are several methods to estimate traffic safety countermeasure effectiveness using observational before–after data; however, the state-of-the-practice is to use the empirical Bayes (EB) method (e.g., Hauer et al., 2002; Hauer and Persaud, 1983; Harwood et al., 2002). The EB approach is the only known before–after method in traffic safety research that mitigates regression-to-the-mean (rtm) bias, and can also appropriately evaluate sites with zero crash counts during the analysis period (Hauer et al., 2002; Harwood et al., 2002). When applied in the context of an observational before–after study, the EB method does produce a causal or treatment effect estimate for a traffic safety countermeasure provided that implementation of the treatment of interest was the only change made to the site during the analysis period. However, there are several limitations associated with the EB method, including:

1. It does not account for site selection bias, thereby making the estimator inconsistent (Davis, 2000);

2. Several years of before-period data are required to adequately handle the rtm problem (Hauer and Persaud, 1983);

3. The method works well only for overdispersed data (Aul and Davis, 2006);

4. It is often difficult to determine the treatment application date at select sites, thus differentiating between the before and after periods can be challenging; and

5. Application of the method is limited to conditions when the treatment status (installation of countermeasure) is the only change occurring at study sites between the before and after periods, which seldom occurs in practice (Hauer, 2010).

2.2. Cross-sectional statistical models

Cross-sectional statistical models are commonly used in traffic safety research to determine countermeasure effectiveness (e.g., Miaou and Lum, 1993; Persaud et al., 2009; Donnell et al., 2010; Donnell and Gross, 2011), and can overcome some of the limitations encountered in observational before–after studies. In these models the relationship between the outcome (e.g., crashes) and roadway and roadside features (e.g., traffic volume, horizontal and vertical alignment, cross-section elements), including the countermeasure, is generally determined using count regression models. These models determine the statistical association between the outcome and the countermeasure of interest. Cross-sectional models offer the benefit of including a large number of sites in the study sample, and do not require a time sequence (i.e., before–after periods) to evaluate a safety countermeasure. Causal conclusions are often drawn from these estimates; however, statistical associations do not imply causation. Tarko et al. (1998) note that cross-sectional data are not subject to rtm; however, there are limitations in using cross-sectional statistical models to determine traffic safety countermeasure effectiveness. These include the potential for site selection bias, lack of control for confounding variables, and unexplored potential interactions among explanatory variables included in the model (Hauer, 2004). Differing traits of entities with and without the treatment can also influence the treatment effect value (Hauer, 2010). Rubin (2001) indicates that treatment effect estimates based on a regression model are not trustworthy if the difference between the means of the logit propensity scores for the treated and untreated groups exceeds one half of the pooled within-group standard deviation of the propensity scores.
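The rule attributed to Rubin (2001) above can be checked numerically once propensity scores are available. The following is a minimal sketch, not the authors' procedure: the function names are hypothetical, both the means and the standard deviation are taken on the logit scale, and the pooled standard deviation is assumed to be the degrees-of-freedom-weighted form.

```python
import math
from statistics import mean, variance


def logit(p):
    """Log-odds of a probability strictly between 0 and 1."""
    return math.log(p / (1.0 - p))


def rubin_logit_gap(p_treated, p_untreated):
    """Return (absolute difference in mean logit scores, pooled within-group SD).

    Assumption: the pooled SD weights each group's sample variance by its
    degrees of freedom; the source does not spell out the pooling formula.
    """
    lt = [logit(p) for p in p_treated]
    lu = [logit(p) for p in p_untreated]
    n_t, n_u = len(lt), len(lu)
    pooled_var = ((n_t - 1) * variance(lt) + (n_u - 1) * variance(lu)) / (n_t + n_u - 2)
    return abs(mean(lt) - mean(lu)), math.sqrt(pooled_var)


def regression_untrustworthy(p_treated, p_untreated):
    """True when the gap exceeds one half of the pooled SD."""
    gap, pooled_sd = rubin_logit_gap(p_treated, p_untreated)
    return gap > 0.5 * pooled_sd


# Hypothetical propensity scores, for illustration only.
print(regression_untrustworthy([0.6, 0.7, 0.8], [0.2, 0.3, 0.4]))
```

A `True` result suggests that the two groups differ too much for a plain regression adjustment to be relied on, which is the situation that motivates matching or sub-classification.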


2.3. Epidemiological case–control studies

Epidemiological case–control studies have recently gained popularity in traffic safety research (e.g., Tsai et al., 1995; Gross and Jovanis, 2007; Majdzadeh et al., 2008; Gross et al., 2009; Bunn et al., 2009; Costello et al., 2009). These studies are used to separate the treatment effect of a particular countermeasure from the effect of other potential confounding variables. One of the major limitations of the matched case–control study is that it deals primarily with binary outcomes: occurrence or non-occurrence of a crash on an entity in a given time period. It is not clear if the current practice of treating multiple crashes occurring at one location as separate cases is an appropriate way to handle multi-crash sites, although recent research (Gross and Donnell, 2011) suggests that results from matched case–control studies compare favorably to results from cross-sectional statistical models. While case–control studies can produce unbiased treatment effects by matching cases and controls on all potential confounding variables, matching becomes difficult when the number of confounders is large.

2.4. Causal inference models

Causal inference models are widely accepted methods to estimate treatment effects using observational, non-randomized data. These models are common in medical, economic, political and educational research (e.g., D’Agostino, 1998; Gelman and Meng, 2004; Dehejia and Wahba, 2002; Yanovitzky et al., 2005). The purpose of causal inference models is to determine the ATE, or population causal effect of a treatment, as the difference in mean outcomes for homogenous treated and control (untreated) groups, or as the risk ratio of the two groups. There are different causal analysis methods in use, such as Rubin’s causal model (RCM) based on potential outcomes (Rubin, 1973, 1978; Rubin and Thomas, 1996; Dehejia and Wahba, 1997; Little and Rubin, 2000; Schafer and Kang, 2008), graphical models based on causal diagrams (Pearl, 1995; Karwa et al., 2011), and sufficient-component cause models (Greenland and Brumback, 2002). This paper employs the RCM.

Although statistical causal models are widely used and accepted in several disciplines, few examples exist in the traffic safety literature (Davis, 2000; Aul and Davis, 2006; Park and Saccomanno, 2007; Cao, 2009; Karwa et al., 2011). In most cases, the traffic safety studies estimate propensity scores based on statistically significant covariates (p-value ≤ 0.05) and the ATE is estimated as the difference in the mean number of crashes between the treated and untreated groups. However, published literature on statistical causal modeling recommends that covariates in a propensity score model be selected based on their relevance to treatment selection rather than statistical significance (Lunceford and Davidian, 2004; Rubin and Thomas, 1996; Schafer and Kang, 2008). Furthermore, the statistical causal modeling literature indicates that it is more meaningful to determine ATE as a multiplicative effect (odds ratios) using marginal structural models (Robins, 1999) when the response variable is discrete (Schafer and Kang, 2008), as in the case of crashes, rather than taking the difference in mean number of crashes between treated and untreated groups. This allows the ATE to vary among covariates. The propensity score-potential outcomes framework discussed in this paper addresses both limitations of causal inference methods, as applied in traffic safety research.

3. Methodology

This section of the paper begins with a discussion of Rubin’s potential outcomes framework, including important assumptions related to the method. Methods to estimate propensity scores and to determine ATE are then described.


3.1. Potential outcomes framework

The outcomes for any entity $i$ corresponding to two treatment conditions, treated ($T_i = 1$) and untreated ($T_i = 0$), observed simultaneously under the same period of time, are called potential outcomes. In traffic safety research, an entity can be an intersection, interchange, or road segment, and the outcome can be the number of crashes per unit time (e.g., per year). Each entity is associated with two potential outcomes: outcome with treatment $Y_i(T_i = 1)$ and outcome without treatment $Y_i(T_i = 0)$. If an entity is treated, only $Y_i(T_i = 1)$ can be observed and $Y_i(T_i = 0)$ cannot be observed (and vice versa), thereby making it impossible to determine entity-level treatment effects. This is the fundamental problem of causal inference (Holland, 1986). These missing outcomes have also been called counterfactuals (Greenland et al., 1999). The gold standard of the unbiased and efficient estimate of ATE, if all potential outcomes are observed, is written as shown in Eq. (1):

$$\mathrm{ATE} = \frac{\sum_{i=1}^{n} Y_i(1)}{n} - \frac{\sum_{i=1}^{n} Y_i(0)}{n} \tag{1}$$

where $n$ is the total number of entities, $Y_i(1)$ is the outcome when all entities are treated, and $Y_i(0)$ is the outcome when all entities are untreated.

Because of the missing potential outcomes or counterfactuals ($Y_i(1)$ and $Y_i(0)$ cannot be observed simultaneously), the ATE shown in Eq. (1) cannot be estimated in an observational study. In a completely randomized study, however, ATE can be determined as the difference in average outcomes for the treated and untreated entities. The unbiased estimate of ATE for a completely randomized study is written as shown in Eq. (2):

$$\widehat{\mathrm{ATE}} = \frac{\sum_{i} T_i Y_i(1)}{\sum_{i} T_i} - \frac{\sum_{i} (1 - T_i) Y_i(0)}{\sum_{i} (1 - T_i)} \tag{2}$$

where $\sum_{i} T_i$ is the total number of treated entities and $\sum_{i} (1 - T_i)$ is the total number of untreated entities.
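As an illustration of Eq. (2), the difference in group means can be computed directly from a treatment indicator and the observed outcomes. This is a minimal sketch; the function name and the crash counts are hypothetical and are not taken from the Minnesota data.

```python
def ate_randomized(treatment, outcome):
    """Difference in mean observed outcomes, treated minus untreated (Eq. 2).

    treatment: list of 0/1 indicators T_i
    outcome:   list of observed outcomes, Y_i(1) if T_i = 1, otherwise Y_i(0)
    """
    treated = [y for t, y in zip(treatment, outcome) if t == 1]
    untreated = [y for t, y in zip(treatment, outcome) if t == 0]
    if not treated or not untreated:
        raise ValueError("need at least one treated and one untreated entity")
    return sum(treated) / len(treated) - sum(untreated) / len(untreated)


# Hypothetical yearly crash counts at five intersections, two of them treated.
T = [1, 1, 0, 0, 0]
Y = [2, 4, 5, 3, 4]
print(ate_randomized(T, Y))  # mean of (2, 4) minus mean of (5, 3, 4)
```

The estimate is unbiased only under randomized assignment; with observational data the same arithmetic is applied after balancing the groups, as the following sections describe.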

In observational studies, traffic safety countermeasures may be installed based on outcomes. This dependence can be controlled by making use of important covariates that affect the outcome and treatment selection to reduce the bias for ATE from non-randomized observational data, provided it satisfies the following assumptions:

1. Stable unit treatment value assumption (SUTVA) (Rubin, 1990): This assumption states that the treatment applied to one entity does not affect the outcome of any other (i.e., no interference among the entities).

2. Positivity: This assumption requires that there be a non-zero probability of receiving every level of treatment for the combination of values of exposure and covariates that occur among entities in the population (Rubin, 1978; Hernan and Robins, 2006). The positivity assumption can be made when each homogenous entity can be assigned to the treatment or non-treatment group.

3. Unconfoundedness: The treatment assignment mechanism is said to be unconfounded if the treatment status ($T_i$) is conditionally independent of the potential outcomes, given a set of covariates $X_i$. This is represented as shown in Eq. (3):

$$T_i \perp\!\!\!\perp \{Y_i(0), Y_i(1)\} \mid X_i \tag{3}$$

The treatment status $T_i$ is unconditionally independent of the potential outcomes [$Y_i(1)$ and $Y_i(0)$] by design in the case of a randomized design. For non-randomized observational data, however, independence is achieved by balancing observed covariates using methods such as propensity score matching or stratification, both of which are described below. The confounders used to estimate propensity scores need to be selected based on expert judgment, in consideration of the factors that might influence the treatment assignment. It is assumed that all confounding covariates are measured and also available for the analysis.

3.2. Estimation of propensity scores

The first step in attempting to mimic randomization based on observed covariates (identification of homogenous treated and untreated entities) in the potential outcomes framework is the estimation of the propensity score, which is the conditional probability of an entity receiving treatment given the covariates ($X$) and outcomes ($Y$). When the treatment assignment mechanism is unconfounded, the propensity score $\hat{p}$ is represented as shown in Eq. (4):

$$\hat{p} = P(T = 1 \mid X) \tag{4}$$

The propensity scores are used to balance the covariates in the treated and untreated groups by reducing the bias due to differences in observed covariates. According to Rosenbaum and Rubin (1983), treatment assignment is strongly ignorable if unconfoundedness and common overlap hold; these two conditions are expressed in Eqs. (5) and (6), respectively.

$$\{Y_i(0), Y_i(1)\} \perp\!\!\!\perp T_i \mid X_i \tag{5}$$

$$0 < P(T = 1 \mid X) < 1 \tag{6}$$

Propensity scores model the relationship between treatment status (treated vs. untreated) and confounders. The most frequently used methods to estimate propensity scores include the logit model, robit model, and classification trees or neural networks. The logit model is considered in the present study.
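To make the logit model of Eq. (4) concrete, the sketch below fits $P(T = 1 \mid X)$ by plain gradient ascent on the log-likelihood using only the standard library. It is an illustration under stated assumptions, not the estimation routine used in the paper: the function names, the learning rate, the iteration count, and the single standardized covariate are all hypothetical, and a statistical package would normally be used instead.

```python
import math


def sigmoid(z):
    """Logistic function mapping a linear predictor to a probability."""
    return 1.0 / (1.0 + math.exp(-z))


def fit_logit(X, T, lr=0.1, n_iter=5000):
    """Fit a logit model by gradient ascent on the mean log-likelihood.

    X: list of covariate rows, T: list of 0/1 treatment indicators.
    Returns [intercept, coefficient_1, ..., coefficient_k].
    """
    k = len(X[0])
    n = len(X)
    beta = [0.0] * (k + 1)
    for _ in range(n_iter):
        grad = [0.0] * (k + 1)
        for x, t in zip(X, T):
            p = sigmoid(beta[0] + sum(b * v for b, v in zip(beta[1:], x)))
            err = t - p  # score contribution of this entity
            grad[0] += err
            for j, v in enumerate(x):
                grad[j + 1] += err * v
        beta = [b + lr * g / n for b, g in zip(beta, grad)]
    return beta


def propensity(beta, x):
    """Estimated propensity score for one covariate row."""
    return sigmoid(beta[0] + sum(b * v for b, v in zip(beta[1:], x)))


# Hypothetical data: one standardized covariate; lit sites tend to have larger values.
X = [[-1.5], [-1.0], [-0.5], [0.0], [0.5], [1.0], [1.5], [-0.2], [0.3], [0.8]]
T = [0, 0, 0, 1, 0, 1, 1, 0, 1, 1]
beta = fit_logit(X, T)
scores = [propensity(beta, x) for x in X]
```

The linear predictor `beta[0] + beta[1] * x` is the logit propensity score, which is the scale recommended for comparing the treated and untreated distributions.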

3.2.1. Comparison of the distributions of propensity scores for treated and untreated groups

True propensity scores balance covariate distributions, and this balance can be assessed by testing each covariate for a difference in means between the treated and untreated groups (Schafer and Kang, 2008). However, comparisons of one variable at a time are not sufficient to establish balance between treated and untreated groups as they do not take into account the correlations among the covariates (Cochran, 1968). Therefore, the distributions of propensity scores estimated for the treated and untreated groups must be compared and should overlap sufficiently. The comparison can be made by plotting histograms of propensity scores or logit-propensity scores (linear predictions) for both treated and untreated groups. The distributions will be different, but the ranges should be comparable.

According to Rubin (2001), it is more appropriate to compare the distributions of the linear predictors because, on the logit scale, the distributions tend to be less skewed and the variances tend to be more nearly equal. The data in the region with no overlap between the two distributions should be discarded, as statistical inference for this region will be based entirely on extrapolation. As such, comparisons need to be limited to subgroups whose propensities lie in the region of overlap.
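The trimming step described above can be sketched as follows. The text does not state how the region of overlap is delimited, so this example assumes one common convention: the overlap runs from the larger of the two group minima to the smaller of the two group maxima on the logit scale. Function names and scores are hypothetical.

```python
import math


def logit(p):
    """Log-odds of a probability strictly between 0 and 1."""
    return math.log(p / (1.0 - p))


def common_support(p_treated, p_untreated):
    """Bounds of the overlap region on the logit scale (min/max convention)."""
    lt = [logit(p) for p in p_treated]
    lu = [logit(p) for p in p_untreated]
    return max(min(lt), min(lu)), min(max(lt), max(lu))


def trim_to_overlap(p_treated, p_untreated):
    """Keep only the scores whose logit lies inside the overlap region."""
    low, high = common_support(p_treated, p_untreated)
    keep_t = [p for p in p_treated if low <= logit(p) <= high]
    keep_u = [p for p in p_untreated if low <= logit(p) <= high]
    return keep_t, keep_u


# Hypothetical propensity scores for three treated and three untreated sites.
kept_treated, kept_untreated = trim_to_overlap([0.2, 0.5, 0.9], [0.05, 0.3, 0.6])
print(kept_treated, kept_untreated)
```

Entities outside the returned bounds have no comparable counterpart in the other group, so any estimate for them would rest on extrapolation.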

3.3. Determining ATE

Once the propensity scores are estimated and the overlap of the propensity score distributions for the two groups is compared, homogenous groups of treated and untreated entities are identified and ATE is determined using methods such as matching, sub-classification or stratification, inverse propensity weighting, weighted residual bias corrections, weighted regression estimation, or regression estimation with propensity-related covariates (Schafer and Kang, 2008). These methods use propensity scores in different ways to mimic randomization based on observed covariates. The matching and stratification methods are the two most commonly used methods and are described below.

[Fig. 1. Matching methodology.]

3.3.1. Matching method

One of the most common sampling schemes to determine ATE in the potential outcomes literature is based on pair-wise matching of the propensity scores (Schafer and Kang, 2008; Park and Saccomanno, 2007; Dehejia and Wahba, 1997; Rubin, 1973; Rosenbaum and Rubin, 1985; Rubin and Thomas, 1992). The basic idea is to pair each treated entity to an untreated entity with the closest propensity score. The matched pairs have the property that the distribution of observed covariates for the treated and untreated groups is approximately the same. This method follows two primary steps: (a) achieving balance between treated and untreated entities using propensity scores via matching, and (b) estimation of ATE. ATE estimated using this method is the average effect of the treatment applied to some entities compared to entities that are not treated but have the same potential of receiving treatment in a randomized study. In this method, the dual modeling of propensity scores and potential outcomes to determine ATE ensures doubly robust treatment effect values (Bang and Robins, 2005; Schafer and Kang, 2008). Fig. 1 shows the matching method process.

Matching on propensity scores mimics the results of a randomized block experiment in which entities having the same propensity scores are randomly assigned to treated or untreated groups (Schafer and Kang, 2008). This process produces only covariate balance, not perfect matches. However, perfect matches are not necessary to balance observed covariates. Multivariate matching methods attempt to produce matched pairs or sets that balance observed covariates so that, in aggregate, the distributions of observed covariates are similar in the treated and untreated groups.

The unmatched treated and untreated entities are not considered for further analysis. Although this method may seem wasteful as further analysis is limited only to matched observations, the unmatched observations are actually noise in the data and will produce misleading conclusions if included in the analysis (Schafer and Kang, 2008). According to Cohen (1988), in a two sample comparison of means, the precision is largely driven by the sample size of the smaller group and therefore discarding unmatched observations typically produces only a slight reduction in power.

LaLonde (1986) conducted a study on econometric evaluations of training programs in which he compared the treatment effects estimated using non-experimental methods and a randomized study. The study reported that non-experimental methods failed to replicate the results of a randomized experiment. However, Dehejia and Wahba (2002) used the same data set as LaLonde (1986) and replicated the results of a randomized experiment using the propensity score matching method. The authors showed that non-experimental estimators determined using propensity score matching methods mimicked the results of a randomized experiment.

Nearest-neighbor matching (Dehejia and Wahba, 1995, 2002), K-nearest neighbor matching, radius matching (Dehejia and Wahba, 2002), Mahalanobis metric matching (Rubin, 1980; Schafer and Kang, 2008; Karwa et al., 2011), and kernel matching (Heckman et al., 1997a, 1998) are common matching methods. In this paper, matching is performed using nearest-neighbor (NN) propensity score matching and Mahalanobis metric matching to illustrate that the ATE being estimated is not by chance. NN matching is based solely on propensity score estimates. An entity in the treatment group with a specific propensity score is matched to an entity (1:1 matching) or group of entities (1:n matching) in the untreated group based on the proximity of their propensity scores. In this paper, we focus on 1:1 NN matching. The treated and untreated entities are randomly ordered before matching. The first treated entity is then selected and matched to an untreated entity with the closest propensity score. The matched entities are then removed from the sample and are not available for further matching.
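The 1:1 NN procedure described above (random ordering, closest score, removal of matched entities) can be sketched as a greedy loop. This is an illustrative sketch rather than the authors' implementation: the function name, the `seed` argument, and the optional `caliper` argument (a maximum allowed score difference) are assumptions, and only the treated entities are shuffled because the order of the untreated pool affects nothing but ties.

```python
import random


def nn_match(p_treated, p_untreated, caliper=None, seed=0):
    """Greedy 1:1 nearest-neighbor matching without replacement.

    Returns a dict mapping treated index -> matched untreated index.
    Treated entities with no untreated entity within the caliper stay unmatched.
    """
    rng = random.Random(seed)
    order = list(range(len(p_treated)))
    rng.shuffle(order)  # random processing order of treated entities
    available = set(range(len(p_untreated)))
    pairs = {}
    for i in order:
        if not available:
            break  # untreated pool exhausted
        j = min(available, key=lambda k: abs(p_treated[i] - p_untreated[k]))
        if caliper is not None and abs(p_treated[i] - p_untreated[j]) > caliper:
            continue  # closest candidate is too far away
        pairs[i] = j
        available.remove(j)  # matched entities leave the pool
    return pairs


# Hypothetical propensity scores.
treated_scores = [0.30, 0.60, 0.85]
untreated_scores = [0.10, 0.32, 0.58, 0.84, 0.05]
print(nn_match(treated_scores, untreated_scores))
```

Because matching is greedy, the result can depend on the processing order when several treated entities compete for the same untreated neighbor, which is why the entities are randomly ordered first.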

When there is substantial overlap in the propensity score distributions of treated and untreated entities, all matching algorithms will yield similar results. However, if there are not many untreated entities that are comparable to the treated entities, 1:n matching with replacement of untreated entities accompanied by weighted regression is the best option to determine ATE (Dehejia and Wahba, 2002). Ming and Rosenbaum (2000) also suggest that it may be worthwhile to consider matching a variable number of untreated entities with the entities in the treated group when the numbers of treated and untreated entities are very different. As such, we focus on 1:n Mahalanobis metric matching with replacement of untreated entities. Mahalanobis metric matching is based on the Mahalanobis metric distance estimated using other covariates that may be correlated with the treatment, including propensity scores. Mahalanobis distance is estimated using Eq. (7):

$$d(u, v) = (u - v)^{T} C_{UT}^{-1} (u - v) \tag{7}$$

where $u$ and $v$ are values of $\{x^{T}, \hat{p}\}^{T}$ ($x$ is the set of covariates) and $C_{UT}$ is the sample covariance matrix of $\{x^{T}, \hat{p}\}^{T}$ in the untreated group (Rosenbaum and Rubin, 1985). The Mahalanobis distances between a treated entity and all entities in the untreated group are determined after randomly ordering the entities in the untreated group. The first treated entity is matched with the untreated entities within the caliper distance (see below for explanation) defined by the Mahalanobis distance. The untreated entities are replaced in the control pool after matching and are available for further matching with other treated entities. ATE is determined using weighted regression in which untreated entities are weighted by the number of times they are matched to a treated entity.

For both matching methods, a caliper value is specified; otherwise, poor matches may result (i.e., matching entities with significantly different propensity scores). When a caliper value is specified, the matches are restricted to a maximum distance equal to the caliper specified. If $\hat{p}_i$ is the estimated propensity score, then for each entity in the treated group, a pool of potential matches in the untreated group is identified whose logit propensities are in the interval $\hat{p}_i \pm c$, where $c$ is the caliper distance specified for matching. This step is the same for the NN and Mahalanobis metric matching methods.

After matching the treated and untreated entities, the balance of covariates for the matched sample must be checked. The percentage standardized bias is used to compare the matched and unmatched samples because t-statistics are not directly comparable when the full and matched samples vary in size. The denominator used to estimate the percentage reduction in standardized bias (shown in Eq. (8)) is the same for both the matched and unmatched samples, but it is different for the t-statistic (Schafer and Kang, 2008). To eliminate the dependence on sample size, the balance of covariates was assessed using the standardized bias or standardized difference in means. The standardized bias is estimated as the difference of the sample means in the treated and untreated sub-samples (full or matched) as a percentage of the square root of the average of the sample variances in the treated and untreated groups (Rosenbaum and Rubin, 1985), as shown in Eq. (8).

$$\text{Standardized bias} = \frac{100 \times (\bar{x}_{T} - \bar{x}_{UT})}{\sqrt{(S_{T}^{2} + S_{UT}^{2})/2}} \tag{8}$$

where $\bar{x}_{T}$ is the sample mean in the treated group; $\bar{x}_{UT}$ is the sample mean in the untreated group; $S_{T}^{2}$ is the sample variance in the treated group; and $S_{UT}^{2}$ is the sample variance in the untreated group.

A properly balanced group of treated and untreated entities is shown mathematically in Eq. (9):

$$P(X_i \mid p_i = a, T_i = 1) = P(X_i \mid p_i = a, T_i = 0) \tag{9}$$

where $p_i$ is the propensity score ($0 < a < 1$); $X_i$ is the set of covariates; and $T_i$ is the treatment status (1 = with treatment, 0 = without treatment).

Let Y(T) be the effect of a countermeasure on matched treated entities and Y(C) be the effect on matched untreated entities. In general, ATE is determined as a percentage using Eq. (10):

$$\text{ATE} = \frac{[Y(C) - Y(T)] \times 100}{Y(T)} = \left[\frac{Y(C)}{Y(T)} - 1\right] \times 100 \tag{10}$$

A positive value resulting from Eq. (10) indicates that the ratio estimated is lower on entities with the treatment when compared to those without the treatment. In this study, negative binomial regression models are used to estimate Y(T) and Y(C).

In general, the matching method to be selected for an analysis depends on the data in question and the degree of overlap between the treated and untreated entities in terms of propensity scores. When there is substantial overlap in the distribution of the propensity scores between the treated and untreated entities, many of the matching methods will result in comparable results.
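The matching and balance-checking steps described in this subsection can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: each entity carries a single covariate plus its propensity score (so the covariance matrix in Eq. (7) is 2 x 2), the caliper is applied on the logit propensity scale, and all function names and data values are hypothetical.

```python
import math
from collections import Counter


def logit(p):
    """Log-odds of a propensity score strictly between 0 and 1."""
    return math.log(p / (1.0 - p))


def covariance_2x2(rows):
    """Sample covariance (s00, s01, s11) of two-column rows, n - 1 denominator."""
    n = len(rows)
    m0 = sum(r[0] for r in rows) / n
    m1 = sum(r[1] for r in rows) / n
    s00 = sum((r[0] - m0) ** 2 for r in rows) / (n - 1)
    s11 = sum((r[1] - m1) ** 2 for r in rows) / (n - 1)
    s01 = sum((r[0] - m0) * (r[1] - m1) for r in rows) / (n - 1)
    return s00, s01, s11


def mahalanobis(u, v, cov):
    """Eq. (7) for two-element vectors, using the closed-form 2x2 inverse."""
    s00, s01, s11 = cov
    det = s00 * s11 - s01 * s01
    d0, d1 = u[0] - v[0], u[1] - v[1]
    return (s11 * d0 * d0 - 2.0 * s01 * d0 * d1 + s00 * d1 * d1) / det


def match_with_replacement(treated, untreated, caliper, n_max):
    """Match each treated unit to up to n_max untreated units.

    Units are (x, p_hat) pairs. Candidates must lie within `caliper` on the
    logit propensity scale; they are then ranked by Mahalanobis distance.
    Untreated units stay in the pool, so they can be reused.
    """
    cov = covariance_2x2(untreated)  # covariance taken from the untreated group
    matches, weights = {}, Counter()
    for i, t in enumerate(treated):
        pool = [j for j, c in enumerate(untreated)
                if abs(logit(t[1]) - logit(c[1])) <= caliper]
        pool.sort(key=lambda j: mahalanobis(t, untreated[j], cov))
        matches[i] = pool[:n_max]
        weights.update(matches[i])  # number of times each untreated unit is used
    return matches, weights


def standardized_bias(x_treated, x_untreated):
    """Eq. (8): percentage standardized difference in means."""
    def mean(xs):
        return sum(xs) / len(xs)

    def var(xs):
        m = mean(xs)
        return sum((x - m) ** 2 for x in xs) / (len(xs) - 1)

    pooled_sd = math.sqrt((var(x_treated) + var(x_untreated)) / 2.0)
    return 100.0 * (mean(x_treated) - mean(x_untreated)) / pooled_sd


# Hypothetical (covariate, propensity score) pairs
treated = [(9.0, 0.80), (8.5, 0.60), (7.0, 0.30)]
untreated = [(8.8, 0.75), (8.4, 0.55), (7.1, 0.35), (6.0, 0.10), (5.5, 0.05)]
matches, weights = match_with_replacement(treated, untreated, caliper=0.5, n_max=2)
```

The `weights` counter plays the role of the regression weights mentioned in the text: an untreated entity matched to several treated entities would receive a proportionally larger weight. Calling `standardized_bias` on a covariate before and after matching gives the kind of comparison that Eq. (8) is used for.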

3.3.2. Propensity based stratification method

In this method, strata of treated and untreated entities are identified based on propensity scores. Unlike the matching methods described earlier, no observations are discarded in this method. The entities can be divided into “n” strata based on the specified percentile values of propensity scores. Rosenbaum and Rubin (1984) suggested using five groups of approximately equal size based on


the guidance by Cochran (1968) that five classes remove about 90% of the selection bias from an estimate of a population mean. However, there should be at least a few treated and untreated entities in each stratum (Schafer and Kang, 2008). If not, the strata may be further divided by splitting the strata that cover a wide range of propensities, which are identified based on the minimum and maximum values of propensity scores in each stratum. The stratification is continued until the mean propensity scores for treated and untreated entities in each stratum are comparable (difference in mean propensity scores < 0.01). After covariate balance is achieved (see Eq. (9)), the treated and untreated entities within each stratum with homogenous propensity scores are compared as if they came from a randomized experiment. Fig. 2 shows the steps in the stratification method.
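The first pass of this procedure (quintile strata followed by a within-stratum comparison of mean propensity scores) can be sketched as follows. This is only an illustration: the function names, the data, and the choice of Python's default percentile interpolation are assumptions, and the re-splitting of unbalanced strata is left out.

```python
import bisect
import statistics


def assign_strata(p_hat, n_strata=5):
    """Assign each propensity score to a stratum index 0..n_strata-1
    using equally spaced percentile cut points (quintiles when n_strata=5)."""
    cuts = statistics.quantiles(p_hat, n=n_strata)  # n_strata - 1 cut points
    return [bisect.bisect_right(cuts, p) for p in p_hat]


def balance_report(p_hat, treated, strata, tol=0.01):
    """Per stratum: (treated count, untreated count, |mean difference|, balanced?)."""
    report = {}
    for j in sorted(set(strata)):
        p_t = [p for p, t, s in zip(p_hat, treated, strata) if s == j and t == 1]
        p_u = [p for p, t, s in zip(p_hat, treated, strata) if s == j and t == 0]
        if not p_t or not p_u:
            # A stratum without both groups cannot be compared as is.
            report[j] = (len(p_t), len(p_u), None, False)
            continue
        diff = abs(statistics.mean(p_t) - statistics.mean(p_u))
        report[j] = (len(p_t), len(p_u), diff, diff < tol)
    return report


# Hypothetical propensity scores and treatment flags (1 = treated)
p_hat = [0.05, 0.15, 0.25, 0.35, 0.45, 0.55, 0.65, 0.75, 0.85, 0.95]
treated = [0, 1, 0, 1, 0, 1, 0, 1, 0, 1]
strata = assign_strata(p_hat)
report = balance_report(p_hat, treated, strata)
```

Any stratum whose last entry is `False` would be a candidate for further splitting under the rule described above.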

The treatment effect for each stratum, $\beta_1$, and the standard errors of $\beta_1$, need to be determined separately for each stratum using a simple marginal structural model as shown in Eq. (11).

$$\text{For each stratum } j:\quad \ln[Y_i(T_i)] = \beta_{0j} + \beta_{1j} T_i + \beta_j X \tag{11}$$

where $\beta_{0j} = \ln[Y(0)]$ is the constant term.

$$\beta_{1j} = \ln[Y(1)] - \ln[Y(0)]$$

$$\beta_{1j} = \ln\frac{Y(1)}{Y(0)} = \ln \theta_j \tag{12}$$

Using the estimated $\beta_1$ for each stratum from Eq. (12), the $\theta$'s for each stratum are estimated using Eq. (13):

$$\theta_j = e^{\beta_{1j}} \tag{13}$$

ATE is then determined as the mean of the strata-specific treatment effects, weighted by the number of treated and untreated entities in each subclass. The overall treatment effect is estimated using Eq. (14):

$$\text{Overall treatment effect} = \frac{1}{N} \sum_{j=1}^{n} m_j \hat{\theta}_j \tag{14}$$

where $n$ is the number of strata ($j = 1, 2, \ldots, n$); $m_j$ is the number of entities in stratum $j$; and $N$ is the total number of entities.

The ATE for treated and untreated entities is computed separately in the stratification method. ATE for untreated entities measures the effect of the treatment when it is applied to untreated entities, which is often more relevant than overall ATE when discussing implications of policy (Heckman et al., 1997b). Similarly, ATE among treated entities represents the effect of a countermeasure if it is removed from the treated group. ATET is the ATE among treated entities, and is determined as shown in Eq. (15):

$$\text{ATET} = \frac{\sum_{j=1}^{n} m_{j(1)} \hat{\theta}_j}{\sum_{j=1}^{n} m_{j(1)}} \tag{15}$$

where $m_{j(1)}$ is the number of treated entities in subclass $j$ and $n$ is the number of subclasses ($j = 1, 2, \ldots, n$).

The treatment effect for untreated entities is determined as shown in Eq. (16):

$$\text{ATEUT} = \frac{\sum_{j=1}^{n} m_{j(0)} \hat{\theta}_j}{\sum_{j=1}^{n} m_{j(0)}} \tag{16}$$

where $m_{j(0)}$ is the number of untreated entities in subclass $j$ and $n$ is the number of subclasses ($j = 1, 2, \ldots, n$).

Properly balanced strata with homogenous propensity scores for the treated and untreated groups mimic randomization with minimal loss of original data. The method also estimates ATE for treated and untreated entities, along with the overall ATE.
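As a rough illustration of Eqs. (13) to (16), the weighted averages can be computed from the stratum-specific lighting coefficients and the stratum counts. The sketch below assumes the coefficients have already been estimated for each stratum; the function name and the numbers in the example are hypothetical.

```python
import math


def stratified_effects(beta1, m_treated, m_untreated):
    """Combine stratum-specific coefficients into overall, treated-weighted
    and untreated-weighted effects (Eqs. (13)-(16))."""
    effect = [math.exp(b) for b in beta1]                   # Eq. (13)
    m_total = [t + u for t, u in zip(m_treated, m_untreated)]
    overall = sum(m * e for m, e in zip(m_total, effect)) / sum(m_total)        # Eq. (14)
    atet = sum(m * e for m, e in zip(m_treated, effect)) / sum(m_treated)       # Eq. (15)
    ateut = sum(m * e for m, e in zip(m_untreated, effect)) / sum(m_untreated)  # Eq. (16)
    return overall, atet, ateut


# Hypothetical two-stratum example: effects of 0.9 and 0.8 on the ratio scale
overall, atet, ateut = stratified_effects(
    beta1=[math.log(0.9), math.log(0.8)],
    m_treated=[10, 30],
    m_untreated=[30, 10],
)
```

Because the second stratum holds most of the treated entities, the treated-weighted effect in this example sits closer to 0.8, while the untreated-weighted effect sits closer to 0.9.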

Fig. 2. Flowchart showing different steps in stratification.

In the present study, once the balance between observed covariates for the treated (lighted) and untreated (unlighted) intersections is achieved, the effect of intersection lighting in reducing the frequency of daytime and nighttime crashes is determined using count models. Other predictors influencing crash occurrence (type of intersection, area type, access control, number of lanes, etc.) are also included in the model to control site-specific effects. Poisson regression and negative binomial regression models are used to estimate daytime and nighttime crash frequencies, depending on the presence of overdispersion in the crash data.

3.4. Summary of proposed methodological process

The propensity score-potential outcomes framework discussed in this paper is derived from the statistical causal inference literature. This framework is intended to reduce the bias associated with the estimation of ATE using observational non-randomized traffic safety data by applying different sampling schemes. Two sampling schemes, matching and sub-classification, are applied to identify homogenous treated and untreated groups. The treatment assignment mechanism is included in the sampling method in the form of propensity scores, thereby reducing site selection bias. Using propensity scores in the framework helps to replace the collection of confounding covariates, thereby allowing simultaneous adjustment of the same confounders. The propensity score model can produce comparable sets of covariates in the treated and untreated groups. ATE estimated using the matching and sub-classification sampling schemes is doubly robust since it is estimated using the dual modeling of propensity scores and potential outcomes. One of the limitations of the framework is that it cannot account for hidden bias due to the exclusion of unmeasured confounding variables, which may be present in the propensity scores model.

In the matching method, ATE is estimated after identifying homogenous pairs of treated and untreated groups based on the comparability of propensity scores. Matching is done using 1:1 nearest neighbor and 1:n Mahalanobis metric matching. The caliper value specified in the present study is 0.25 times the within group standard deviation of propensity scores. The unmatched entities are not considered when estimating ATE.

The sub-classification method is used to identify homogenous treated and untreated entities by subdividing the entire dataset into small subsets based on different ranges of propensity scores. One of the major advantages of the sub-classification method is that no observations are discarded in this method. In addition, it can be used to determine ATE of treated and untreated entities, which is very important for policy making. There should be a sufficient number of treated and untreated entities with similar propensity scores in each subclass; however, not many untreated entities will have propensity scores near one and very few treated entities will have propensity scores near zero.

We demonstrate the application of the propensity score-potential outcomes framework by estimating the safety effects of fixed roadway lighting at intersections in Minnesota. The data used in executing the proposed framework are non-randomized, observational data. The advantages of applying the proposed framework are that the cross-sectional data are not subjected to regression-to-mean bias; the data collection effort is less intensive when compared to before–after studies (i.e., no time-sequence is required); the method can be used to determine ATE even if changes are made to the treatment entity during the analysis period; and, the method does not require a date of application for the treatment.

4. Example modeling application

As described earlier, the countermeasure selected to execute the proposed propensity score-potential outcomes framework presented above is the presence of fixed roadway illumination at


intersections in Minnesota. The Minnesota Department of Transportation Traffic Safety Fundamentals Handbook (MnDOT, 2001) reports that intersection lighting has a B/C ratio of 21.0, the highest among all safety improvements included in the handbook. Different metrics have been used to quantify the safety effects of intersection lighting, such as comparing crash frequencies (i.e., naïve before–after or with–without comparisons between lighted and unlighted conditions), crash rates, night-to-day crash ratios, night-to-day crash rate ratios, etc. Many previous studies on the effect of lighting (Walker and Roberts, 1976; Lipinski and Wortman, 1978; Schwab et al., 1982; Isebrands et al., 2004, 2006; Green et al., 2003) were focused on the reduction in nighttime crashes. However, Donnell et al. (2010) suggest that the effects of lighting on both nighttime and daytime crash frequency should be considered. As such, this paper considers nighttime and daytime crashes in the analysis. Because roadway lighting is turned off during the daytime, it is hypothesized that the ATE during the daytime should be zero.

The treatment effect estimates of intersection lighting reported in published studies vary widely. Simple observational before–after studies conducted by comparing crash data from intersections with and without lighting show that nighttime crash rates decrease by 45–52% (Walker and Roberts, 1976; Lipinski and Wortman, 1978); nighttime crash frequencies decrease by 13–49% (Schwab et al., 1982; Isebrands et al., 2004, 2006; Walker and Roberts, 1976; Green et al., 2003); and night-to-day crash ratios decrease by 22–40% (Preston and Schoenecker, 1999; Isebrands et al., 2004, 2006; Lipinski and Wortman, 1978). Observational with–without comparisons conducted by Schwab et al. (1982) reported a 39% reduction in nighttime crashes with lighting at intersections, while Preston and Schoenecker (1999) reported 25% lower nighttime crash rates at intersections with lighting. Isebrands et al. (2004, 2006) reported that night-to-day crash ratios at intersections with lighting are 31% lower when compared to unlit intersections. Donnell et al. (2010) reported a 12% reduction in the expected night-to-day crash ratio. The Highway Safety Manual (AASHTO, 2010) reports a 4% reduction in the total number of crashes when intersections are treated with fixed roadway lighting.

To illustrate the propensity scores-potential outcomes framework, data from Minnesota intersections were used. The analysis database (same database as in Donnell et al., 2010) contained information from the Minnesota Highway Safety Information System (HSIS) roadway inventory and crash data files. Four years (1999–2002) of crash and corresponding roadway inventory data were used in the analysis. A total of 25,832 observations (from 6464 intersections) were available for analysis. More than 42% of the intersections contained some form of roadway lighting. Table 1 shows the definitions and descriptive statistics for continuous variables and proportions for categorical variables. 888 of the 6464 intersections were signalized while the remaining 5576 operated under stop-controlled conditions. There were three intersection forms coded in the database: cross, tee, and skew. Approximately 49% of the intersections were four-legged, “cross” intersections. Nearly 40% were three-legged, “tee” intersections. The remaining 11% of the intersections were skew. There were 38,437 reported crashes in the analysis database. All crashes that occurred within 250 ft. of where intersection roadway alignments cross were included in the database, except those involving crashes on icy roads, which totaled 827 crashes.

A four year summary of Minnesota intersection crashes shows that 8156 daytime crashes and 3341 nighttime crashes were reported at unlighted intersections, compared to 20,831 daytime crashes and 6109 nighttime crashes that were reported at lighted intersections. A direct comparison of the night-day crash ratios for lighted and unlighted intersections without considering confounding variables shows a safety benefit of 40%, which is in agreement with the findings from previous research cited above, with the exceptions being the work by Donnell et al. (2010) and the intersection lighting crash modification factor found in the Highway Safety Manual (AASHTO, 2010).
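The naive comparison can be retraced from the four crash counts quoted above. The sketch below assumes the comparison takes the percentage form of Eq. (10), with the unlighted night-to-day ratio in the role of Y(C) and the lighted ratio in the role of Y(T); that reading is an assumption, although it does land close to the reported 40%. Variable names are illustrative.

```python
# Four-year crash counts quoted in the text
unlighted_day, unlighted_night = 8156, 3341
lighted_day, lighted_night = 20831, 6109

# Night-to-day crash ratios for each group
ratio_unlighted = unlighted_night / unlighted_day
ratio_lighted = lighted_night / lighted_day

# Percentage form of Eq. (10): [Y(C) / Y(T) - 1] x 100
naive_benefit_pct = (ratio_unlighted / ratio_lighted - 1.0) * 100.0

# The four counts should add up to the 38,437 crashes in the database
total_crashes = unlighted_day + unlighted_night + lighted_day + lighted_night
```

No confounders are adjusted for here, which is exactly why the text treats this figure as a naive benchmark.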

The following describes how the modeling assumptions are met in the present study:

(a) SUTVA: Crashes were coded as intersection-related if they occurred within 250 ft. of an intersection. The distance between intersections was not measured and was unavailable in the database; however, it was assumed that the intersections are far enough apart so that lighting installed at one intersection does not influence the outcome (crashes) at another intersection, thereby satisfying the SUTVA assumption.

(b) Positivity: In the present study, none of the intersections had an estimated propensity score of zero or one, indicating that there is a positive chance that every intersection may be lighted or unlighted.

(c) Unconfoundedness: The warrants for intersection lighting in Minnesota (MnDOT, 2003) indicate that fixed roadway illumination systems may be installed based on traffic volumes; crash frequency; the presence of a traffic signal, flashing beacons, or school crossings; the presence of intersection channelization on high-speed (40 mph or higher) approaches; or, ambient light that adversely affects driver visibility. The covariates that influence the installation of lighting (based on lighting warrants) were included in the propensity score model. Pre-treatment outcome (crashes) information was not available because the data are cross-sectional and therefore, covariates that influence the occurrence of crashes were also included in the propensity score model to better predict the treatment assignment mechanism.

5. Results

The propensity score model developed for the treatment assignment (presence of intersection lighting) is shown in Table 2. The pre-treatment outcome (crash) data were not available and therefore, variables that influence the outcome (crashes) were included to form a rich propensity score model that specifies the treatment assignment mechanism. The propensity score model for the present study was estimated based on the guidelines described by Schafer and Kang (2008). Propensity scores were estimated using binary logistic regression as only two treatment levels (lighting vs. no lighting) are considered. Covariates that have signs in the expected direction and interaction terms that can influence the treatment selection were also included in the propensity score model, irrespective of statistical significance.
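To show how a fitted binary logit turns covariates into a propensity score, the sketch below applies the logistic function to a linear predictor. The coefficients are transcribed from Table 2 as best they can be read. The covariate names, the way interactions are formed as products of indicators, and the two example intersections are assumptions made for illustration only.

```python
import math

# Coefficients as read from Table 2; dictionary keys are illustrative names.
COEF = {
    "ln_major_aadt": 0.538, "ln_minor_aadt": 0.176, "pct_heavy": -0.005,
    "urban": 2.177, "minor_stop": -0.543, "signal": 2.071,
    "high_speed": -0.621, "no_access_control": 0.148,
    "depressed_median": -0.516, "no_left_shoulder": 1.318,
    "no_right_shoulder": 0.452, "skew": -0.122, "two_lane": -0.615,
    "divided": -0.345, "skew_x_minor_stop": 0.499,
    "urban_x_no_left_shoulder": -0.507, "urban_x_two_lane": 0.433,
    "high_speed_x_two_lane": 0.264, "high_speed_x_depressed_median": 0.658,
    "high_speed_x_no_left_shoulder": -0.894,
    "high_speed_x_no_right_shoulder": 1.387,
}
CONSTANT = -6.625
MAIN = ["pct_heavy", "urban", "minor_stop", "signal", "high_speed",
        "no_access_control", "depressed_median", "no_left_shoulder",
        "no_right_shoulder", "skew", "two_lane", "divided"]


def propensity(site):
    """Estimated probability of lighting for one intersection (binary logit)."""
    x = {name: float(site.get(name, 0)) for name in MAIN}
    x["ln_major_aadt"] = math.log(site["major_aadt"])
    x["ln_minor_aadt"] = math.log(site["minor_aadt"])
    # Interaction terms, assumed to be products of the indicator variables
    x["skew_x_minor_stop"] = x["skew"] * x["minor_stop"]
    x["urban_x_no_left_shoulder"] = x["urban"] * x["no_left_shoulder"]
    x["urban_x_two_lane"] = x["urban"] * x["two_lane"]
    x["high_speed_x_two_lane"] = x["high_speed"] * x["two_lane"]
    x["high_speed_x_depressed_median"] = x["high_speed"] * x["depressed_median"]
    x["high_speed_x_no_left_shoulder"] = x["high_speed"] * x["no_left_shoulder"]
    x["high_speed_x_no_right_shoulder"] = x["high_speed"] * x["no_right_shoulder"]
    linear = CONSTANT + sum(COEF[name] * value for name, value in x.items())
    return 1.0 / (1.0 + math.exp(-linear))


# Two hypothetical intersections
rural_stop = {"major_aadt": 5000, "minor_aadt": 500, "pct_heavy": 10,
              "minor_stop": 1, "high_speed": 1, "no_access_control": 1,
              "two_lane": 1}
urban_signal = {"major_aadt": 20000, "minor_aadt": 5000, "pct_heavy": 5,
                "urban": 1, "signal": 1, "no_access_control": 1}
p_rural = propensity(rural_stop)
p_urban = propensity(urban_signal)
```

Under these assumptions the rural, stop-controlled site receives a low propensity score and the urban signalized site a high one, which matches the direction of the signs discussed in the text.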

As shown in Table 2, the following variables are associated with a lower probability of lighting presence at an intersection: minor stop control indicator, high speed indicator, depressed median indicator, two lane major road indicator, skew intersection type indicator, divided major road indicator, and the urban–suburban and no left shoulder interaction. The following variables are associated with a higher probability of lighting presence at intersections: urban–suburban indicator, signal control indicator, no access control indicator, no left shoulder indicator, no right shoulder indicator, log (major AADT), log (minor AADT), skew-minor stop interaction, urban–suburban and two lane interaction, high speed and two lane interaction, high speed and depressed median interaction, and high speed and no right shoulder interaction. It is interesting to note that the majority of the covariates that are associated with an increase in the propensity of an intersection receiving lighting (coefficient > 0)
are warrants for lighting in Minnesota (e.g., urban-suburban area type, signal control, traffic volumes) (AASHTO, 2005). This suggests that the estimated propensity score model is in accordance with the actual treatment assignment for lighting.

Table 1
Definitions and proportions of Minnesota data.

Continuous variables

| Variable | Min. | Max. | Mean | Std. dev. |
|---|---|---|---|---|
| Night crash frequency (per year) | 0 | 28 | 0.366 | 0.969 |
| Day crash frequency (per year) | 0 | 55 | 1.121 | 2.457 |
| Major road average daily traffic | 40 | 77,430 | 8284 | 9381 |
| Percentage heavy vehicles on major road | 0 | 61.11 | 8.888 | 5.1092 |
| Minor road average daily traffic | 1 | 77,430 | 3164 | 5179 |

Categorical variables

| Variable | Proportion in sample |
|---|---|
| Area type indicator (1 = urban/suburban; 0 = rural) | 1: 45.6%; 0: 55.4% |
| Traffic control indicator (1 = signal; 2 = minor stop-controlled; 0 = all way stop-controlled) | 0: 85.5%; 1: 13.7%; 2: 0.80% |
| Lighting indicator (1 = present; 0 = not present) | 1: 42.1%; 0: 57.9% |
| Intersection type indicator (1 = skew; 0 = cross or tee) | 1: 10.1%; 0: 89.9% |
| High speed indicator^a (1 = 50 mph or greater; 0 otherwise) | 1: 67.3%; 0: 32.7% |
| No access control indicator^a (1 = no access; 0 = partial access control) | 1: 93.7%; 0: 6.3% |
| Depressed median indicator^a (1 = depressed median; 0 = barrier or no median) | 1: 11.5%; 0: 88.5% |
| Two lane^a (1 = two lane; 0 = otherwise) | 1: 76.5%; 0: 23.5% |
| Paved left-shoulder indicator^a (1 = paved shoulder; 0 = unpaved or no shoulder) | 1: 53.9%; 0: 46.1% |
| No left shoulder^a (1 = no left shoulder; 0 = otherwise) | 1: 29.0%; 0: 71.0% |
| Paved right-shoulder indicator^a (1 = paved shoulder; 0 = unpaved or no shoulder) | 1: 49.1%; 0: 50.9% |
| No right shoulder^a (1 = no right shoulder; 0 = otherwise) | 1: 24.2%; 0: 75.8% |
| Divided^a (1 = divided; 0 = undivided) | 1: 20.4%; 0: 79.6% |

^a Data were available for the major intersecting roadway only.

Table 2
Propensity score model.

| Variable | Coefficient | Std. err. | z | P > z | 95% conf. interval (lower) | 95% conf. interval (upper) |
|---|---|---|---|---|---|---|
| Natural logarithm of major road AADT | 0.538 | 0.028 | 18.91 | <0.001 | 0.482 | 0.594 |
| Natural logarithm of minor road AADT | 0.176 | 0.014 | 12.9 | <0.001 | 0.149 | 0.202 |
| Percentage heavy vehicles on major road | −0.005 | 0.005 | −1.02 | 0.307 | −0.015 | 0.005 |
| Area type indicator (1 = urban/suburban; 0 = rural) | 2.177 | 0.095 | 22.98 | <0.001 | 1.991 | 2.363 |
| Minor stop control indicator (1 = minor stop; 0 = otherwise) | −0.543 | 0.249 | −2.18 | 0.029 | −1.032 | −0.055 |
| Signal control indicator (1 = signalized; 0 = otherwise) | 2.071 | 0.269 | 7.7 | <0.001 | 1.544 | 2.598 |
| High speed indicator (1 = 50 mph or greater; 0 otherwise) | −0.621 | 0.158 | −3.93 | <0.001 | −0.93 | −0.312 |
| No access control indicator^a (1 = no access; 0 = partial access control) | 0.148 | 0.097 | 1.53 | 0.127 | −0.042 | 0.338 |
| Depressed median indicator^a (1 = depressed median; 0 = barrier or no median) | −0.516 | 0.171 | −3.02 | 0.003 | −0.85 | −0.181 |
| No left shoulder indicator^a (1 = no left shoulder; 0 = otherwise) | 1.318 | 0.212 | 6.21 | <0.001 | 0.902 | 1.733 |
| No right shoulder indicator^a (1 = no right shoulder; 0 = otherwise) | 0.452 | 0.199 | 2.27 | 0.023 | 0.062 | 0.842 |
| Skew intersection type indicator (1 = skew; 0 = cross or tee) | −0.122 | 0.244 | −0.5 | 0.617 | −0.601 | 0.357 |
| Two lane^a (1 = two lane; 0 = otherwise) | −0.615 | 0.146 | −4.22 | <0.001 | −0.900 | −0.329 |
| Divided road indicator^a (1 = divided; 0 = otherwise) | −0.345 | 0.131 | −2.64 | 0.008 | −0.601 | −0.089 |
| Skew-minor stop interaction | 0.499 | 0.254 | 1.97 | 0.049 | 0.002 | 0.996 |
| Urban–suburban and no left shoulder interaction | −0.507 | 0.136 | −3.72 | <0.001 | −0.774 | −0.24 |
| Urban–suburban and two lane interaction | 0.433 | 0.101 | 4.27 | <0.001 | 0.234 | 0.632 |
| High speed and two lane interaction | 0.264 | 0.152 | 1.74 | 0.082 | −0.034 | 0.561 |
| High speed and depressed median interaction | 0.658 | 0.19 | 3.47 | 0.001 | 0.287 | 1.03 |
| High speed and no left shoulder interaction | −0.894 | 0.214 | −4.18 | <0.001 | −1.313 | −0.475 |
| High speed and no right shoulder interaction | 1.387 | 0.224 | 6.18 | <0.001 | 0.947 | 1.826 |
| Constant | −6.625 | 0.411 | −16.11 | <0.001 | −7.431 | −5.819 |

Number of observations = 22,092; LR χ²(22) = 15582.57; Log likelihood = −7444.67; Pseudo R² = 0.5114.

^a Data were available for the major intersecting roadway only.

5.1. Matching

To execute the propensity scores-potential outcomes framework, there should be sufficient overlap between the propensity

scores for the treated and untreated groups. The overlap of the propensity scores for the lighted and unlighted intersections is shown in Fig. 3. Each data point in the figure represents the propensity score estimated for an intersection. The propensity scores ranging from 0 to 1 are shown along the horizontal axis.

Fig. 3. Distribution of propensity scores.

Fig. 3 shows that there is considerable overlap between the propensity scores for lighted and unlighted intersections using the matching method. In this paper, a caliper value of 0.25σ was used to match lighted and unlighted intersections, where ‘σ’ is the within group standard deviation of the estimated propensity scores. The histograms showing the distribution of the frequency of logit propensity score estimates for all lighted and unlighted intersections before matching and after 1:1 NN matching and 1:n Mahalanobis metric matching are shown in Fig. 4a–c, respectively.

Fig. 4. (a) Distributions of frequency of logit propensity scores before matching. (b) Distributions of frequency of logit propensity scores after NN matching. (c) Distributions of frequency of logit propensity scores after Mahalanobis metric matching.

Fig. 4a shows that there are observations outside the common overlap area. The logit propensity scores for the unlighted intersections range from −5 to 5; whereas, those for lighted intersections range from about −4 to 6. The mean propensity scores for lighted and unlighted intersections are 0.776 and 0.189, respectively. For unlighted intersections, the propensity scores range from 0.005 to 0.996, and from 0.028 to 0.998 for lighted intersections. Fig. 4b and c show that the ranges of logit propensity scores are comparable for lighted and unlighted intersections after NN matching (ranges from −4 to 5) and Mahalanobis matching (ranges from −4 to 6), respectively.

After 1:1 NN matching, the mean propensity scores for lighted and unlighted intersections are 0.63 and 0.41, respectively, and the propensity scores range from 0.028 to 0.996 for both lighted and unlighted intersections. The mean propensity scores for lighted and unlighted intersections after Mahalanobis metric matching are 0.77 and 0.48, respectively, and the range of propensity scores is the same as that of NN matching. These statistics clearly show that matching improved the overlap of propensity scores for lighted and unlighted intersections, when compared to the unmatched data. This indicates balance of covariates in the lighted and unlighted groups has been achieved.

Figs. 5 and 6 show the absolute standardized bias or absolute standardized difference in means for the covariates before and after 1:1 NN matching and 1:n Mahalanobis metric matching, respectively.

Figs. 5 and 6 show that the percent bias, or absolute standardized difference in means, is large for the unmatched lighted and unlighted intersection sample when compared to the 1:1 (Fig. 5) and 1:n (Fig. 6) matched intersections. Fig. 5 shows that the percent bias for the matched sample is less than 5% for all covariates except the major road AADT variable, for which the bias is 10%. Fig. 6 shows that the two-lane indicator variable for the major road had the highest percent bias after matching (12%). The very small bias for the covariates in the matched lighted and unlighted intersection samples, compared to the unmatched samples, indicates that the mean values for observed covariates in the lighted and unlighted groups are comparable after matching and the main

difference between them is in the treatment status (i.e., presence of intersection lighting). The effect of intersection lighting is then determined for the matched intersections by modeling the frequency of crashes against covariates, including an indicator for the presence of lighting.

Fig. 5. Absolute standardized difference in means before and after 1:1 NN matching (*indicates that data were available for the major intersecting roadway only).

Fig. 6. Absolute standardized difference in means before and after 1:n Mahalanobis metric matching (*indicates that data were available for the major intersecting roadway only).

Tables 3 and 4 show the results of the negative binomial regression models estimated for nighttime and daytime crashes, respectively, using the matched data. Table 3 shows that the presence of lighting at intersections decreases the frequency of nighttime crashes. ATE of lighting estimated by 1:1 NN matching is computed as follows:

$$\exp(\beta_{\text{lighting}}) - 1 = \exp(-0.068) - 1 = -6.6\%.$$

The weighted regression ATE estimate based on Mahalanobis metric matching indicates a 4.7% reduction in nighttime crash frequency. The explanatory variables that were positively associated with the expected nighttime crash frequency for both methods were the major and minor road nighttime traffic volumes, indicator for signal control, indicator for minor stop-control, skewed intersection form indicator, and paved right shoulder indicator. Explanatory variables that were negatively associated with expected nighttime crash frequency were the lighting presence indicator, percentage heavy vehicles on the major road, indicator for high speed major roadways, and the paved left shoulder indicator.

The ATE for lighting estimated using 1:1 NN matching and 1:n Mahalanobis metric matching are a 3% and 2.8% increase, respectively, for the frequency of daytime crashes. This was expected because lighting cannot improve visibility during the daytime and therefore the installation of lighting should have a nominal or no effect on daytime crash frequency. In this case, the magnitude of the lighting presence indicator was small, suggesting that the addition of lighting on an intersection approach may result in a slight increase in daytime crashes from the introduction of fixed objects to the roadside. There is considerable overlap between the confidence intervals of the ATE estimated using the 1:1 NN and 1:n Mahalanobis metric matching methods when considering daytime crashes. The advantage of the matching method is the bias reduction resulting from the identification and use of homogenous lighted and unlighted intersections to determine ATE. A major drawback of the 1:1 matching method is the requirement of a large sample of unlighted intersections for matching with the lighted intersections.
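The conversion from a log-link regression coefficient to a percentage change, exp(β) − 1, can be checked numerically. The sketch below uses the lighting coefficients as they can be read from Tables 3 and 4; the function and dictionary names are illustrative.

```python
import math


def percent_change(beta):
    """Percentage change in expected crash frequency implied by a
    log-link (negative binomial) coefficient: (exp(beta) - 1) x 100."""
    return (math.exp(beta) - 1.0) * 100.0


# Lighting indicator coefficients as read from Tables 3 and 4
lighting_coef = {
    ("night", "1:1 NN"): -0.068,
    ("night", "1:n Mahalanobis"): -0.048,
    ("day", "1:1 NN"): 0.031,
    ("day", "1:n Mahalanobis"): 0.028,
}
effects = {key: round(percent_change(b), 1) for key, b in lighting_coef.items()}
```

Negative values correspond to a reduction in expected crash frequency at lighted intersections, positive values to an increase.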

Table 3
Nighttime crash frequency models.

| Variable | 1:1 NN matching: parameter estimate | S.E. | p-Value | 1:n Mahalanobis matching: parameter estimate | S.E. | p-Value |
|---|---|---|---|---|---|---|
| Intersection lighting indicator (1 = present; 0 = not present) | −0.068 | 0.059 | 0.244 | −0.048 | 0.024 | 0.052 |
| Intersection type indicator (1 = skew; 0 = cross or tee) | 0.484 | 0.084 | <0.001 | 0.397 | 0.035 | <0.001 |
| Area type indicator (1 = urban/suburban; 0 = rural) | −0.478 | 0.061 | <0.001 | −0.393 | 0.036 | <0.001 |
| Minor-stop control indicator (1 = minor stop controlled; 0 = otherwise) | 0.377 | 0.322 | 0.241 | 0.685 | 0.127 | <0.001 |
| Signal control indicator (1 = signalized; 0 = unsignalized) | 0.664 | 0.116 | <0.001 | 0.723 | 0.032 | <0.001 |
| High speed indicator^a (1 = 50 mph or greater; 0 otherwise) | −0.187 | 0.061 | 0.002 | −0.214 | 0.026 | <0.001 |
| Percentage heavy vehicles on major road^a | 0.001 | 0.008 | 0.863 | −0.022 | 0.004 | <0.001 |
| No access control indicator^a (1 = no access; 0 = partial access control) | −0.075 | 0.1 | 0.456 | 0.059 | 0.041 | 0.143 |
| Depressed median indicator^a (1 = depressed median; 0 = barrier or no median) | 0.030 | 0.113 | 0.792 | −0.021 | 0.043 | 0.634 |
| Paved left shoulder indicator^a (1 = paved shoulder; 0 = unpaved or no shoulder) | −0.251 | 0.116 | 0.031 | −0.031 | 0.045 | 0.489 |
| Paved right shoulder indicator^a (1 = paved shoulder; 0 = unpaved or no shoulder) | 0.106 | 0.118 | 0.371 | −0.050 | 0.038 | 0.188 |
| Two lane indicator^a (1 = two lane; 0 = otherwise) | −0.153 | 0.089 | 0.086 | −0.289 | 0.030 | <0.001 |
| Natural logarithm of major road AADT at night | 0.585 | 0.044 | <0.001 | 0.572 | 0.021 | <0.001 |
| Natural logarithm of minor road AADT at night | 0.069 | 0.016 | <0.001 | 0.124 | 0.011 | <0.001 |
| Constant | −5.503 | 0.414 | <0.001 | −6.687 | 0.220 | <0.001 |
| Overdispersion (α) | 1.285 | 0.107 | | 0.434 | 0.023 | |
| N | 6188 | | | 12,317 | | |
| LL (constant only) | −4268.64 | | | −16880.075 | | |
| LL (full model) | −3930.19 | | | −16477.036 | | |

^a Data were available for major intersecting roadway only.

Table 4
Daytime crash frequency models.

| Variable | 1:1 NN matching: parameter estimate | S.E. | p-Value | 1:n Mahalanobis metric matching: parameter estimate | S.E. | p-Value |
|---|---|---|---|---|---|---|
| Intersection lighting indicator (1 = present; 0 = not present) | 0.031 | 0.043 | 0.464 | 0.028 | 0.017 | 0.054 |
| Intersection type indicator (1 = skew; 0 = cross or tee) | 0.676 | 0.062 | <0.001 | 0.492 | 0.025 | <0.001 |
| Area type indicator (1 = urban/suburban; 0 = rural) | −0.087 | 0.045 | 0.055 | −0.134 | 0.027 | <0.001 |
| Minor-stop control indicator (1 = minor stop controlled; 0 = otherwise) | −0.124 | 0.272 | 0.649 | 0.257 | 0.095 | 0.007 |
| Signal control indicator (1 = signalized; 0 = unsignalized) | 0.628 | 0.094 | <0.001 | 0.701 | 0.022 | <0.001 |
| High speed indicator^a (1 = 50 mph or greater; 0 otherwise) | −0.193 | 0.045 | <0.001 | −0.119 | 0.018 | <0.001 |
| Percentage heavy vehicles on major road^a | −0.003 | 0.006 | 0.658 | −0.024 | 0.003 | <0.001 |
| No access control indicator^a (1 = no access; 0 = partial access control) | 0.043 | 0.081 | 0.594 | −0.033 | 0.031 | 0.283 |
| Depressed median indicator^a (1 = depressed median; 0 = barrier or no median) | 0.021 | 0.089 | 0.803 | 0.011 | 0.032 | 0.734 |
| Paved left shoulder indicator^a (1 = paved shoulder; 0 = unpaved or no shoulder) | −0.110 | 0.100 | 0.023 | −0.080 | 0.032 | 0.011 |
| Paved right shoulder indicator^a (1 = paved shoulder; 0 = unpaved or no shoulder) | 0.118 | 0.102 | 0.003 | −0.029 | 0.028 | 0.301 |
| Two lane indicator^a (1 = two lane; 0 = otherwise) | −0.120 | 0.064 | 0.062 | −0.188 | 0.020 | <0.001 |
| Natural logarithm of major road AADT at night | 0.600 | 0.032 | <0.001 | 0.558 | 0.014 | <0.001 |
| Natural logarithm of minor road AADT at night | 0.119 | 0.012 | <0.001 | 0.146 | 0.007 | <0.001 |
| Constant | −6.140 | 0.338 | <0.001 | −5.802 | 0.150 | <0.001 |
| Overdispersion (α) | 1.288 | 0.057 | | 0.528 | 0.014 | |
| N | | | | | | |
| LL (constant only) | | | | | | |
| LL (full model) | | | | | | |

^a Data were available for major intersecting roadway only.

As noted in the literature review, night-to-day ratios are often computed to account for differences in crash frequency at lighted and unlighted intersections; the daytime serves as a control to evaluate the assumption that the presence of fixed roadway lighting is not associated with daytime crash frequencies (CIE, 1992; Griffith, 1994). If, however, there is an association between the presence of fixed poles and crashes during the daytime, it is assumed that this same association is present at night. In this case, night-to-day ratios are used as the metric to evaluate the effectiveness of lighting presence. Using the parameter estimates from the NN matching method, the night-to-day crash ratios are 9.45% lower at lighted intersections when compared to unlighted intersections. Using the Mahalanobis metric matching models, the night-to-day crash ratio is 6.0% lower at lighted intersections when compared to unlighted intersections. The estimates of night-to-day crash ratios from both methods are comparable to the safety effect estimates reported by Donnell et al. (2010) (12% lower night-to-day crash ratio for lighted intersections compared to unlighted intersections), Gross and Donnell (2011) (12% lower night-to-day crash ratio for lighted intersections) and the effect published in the Highway Safety Manual (4% reduction in the total number of crashes).

5.2. Sub-classification

For the sub-classification method, data were first divided into five strata based on 20th, 40th, 60th and 80th percentile values

6188constanty) = −7799.69fulldel) = −7211.29

N = 12,317LL(constantonly) = −33165.83LL(fullmodel) = −30563.688

Page 12: Application of propensity scores and potential outcomes to estimate effectiveness of traffic safety countermeasures: Exploratory analysis using intersection lighting data

550 L. Sasidharan, E.T. Donnell / Accident Analysis and Prevention 50 (2013) 539– 553

Table 5Different strata for analysis.

Strata N Range (p) Mean (p) Treatment effect

UL L Total Min Max UL L Night (N) Day (D)

1 4288 130 4418 0.005 0.066 0.042 0.051 0.228 −0.0802 4059 360 4419 0.066 0.148 0.101 0.109 0.130 0.1893a 878 282 1160 0.148 0.200 0.169 0.170 −0.069 −0.1243b 407 124 531 0.200 0.250 0.221 0.224 −0.102 0.0383c 199 111 310 0.250 0.300 0.273 0.272 −0.597 −0.1833d 254 205 459 0.300 0.399 0.350 0.357 −0.756 −0.0383e 268 284 552 0.400 0.500 0.448 0.449 0.020 0.0723f 300 365 665 0.500 0.589 0.549 0.549 −0.050 0.2423g 272 469 741 0.589 0.659 0.625 0.627 −0.038 0.0584a 218 330 548 0.659 0.700 0.681 0.680 −0.132 0.0894b 300 1035 1335 0.700 0.800 0.743 0.754 −0.131 −0.1584c 119 776 895 0.800 0.850 0.823 0.827 −0.183 −0.079

oepip

rtf

fTteftase

iiuc

naidnWllta

6

pApmuotec

4d 205 1436 1641 0.850

5 201 4217 4418 0.912

f propensity scores, and the balance of covariates were tested inach stratum. Strata 3 and 4 were further subdivided until the meanropensity scores for treated (lighted) and untreated (unlighted)

ntersections in each group were comparable (difference in meanropensity scores <0.01).

The sample size for lighted and unlighted intersections (N), theange of propensity scores (p), mean propensity score values, andreatment effect estimates based on nighttime and daytime crashrequencies for each stratum are shown in Table 5.

Table 5 shows that the difference between the mean p-valuesor lighted and unlighted groups is less than 0.01 for all strata.his indicates that lighted and unlighted intersections in each stra-um are comparable. The last two columns shows the treatmentffect estimates based on nighttime and daytime crash frequenciesor all strata. The overall treatment effect of lighting estimated byhe stratification method is a 13% reduction in nighttime crashesnd a 1% reduction in daytime crashes (based on Eq. (14)). Table 6hows the ATE resulting from the stratification method, which wasstimated using Eqs. (15) and (16).

ATE for lighted intersections (if lighting is turned off at all lightedntersections) is an 11% increase in nighttime crashes and 1.6%ncrease in daytime crashes. ATE for unlighted intersections (if allnlighted intersections are lighted) is a 9.4% decrease in nighttimerashes and a 3.6% decrease in daytime crashes.

The advantage of using the stratification method is thato observations are discarded from the database. This methodlso allows the determination of ATE for lighted and unlightedntersections which is important from a policy standpoint. Theisadvantage of the method is that there should be a minimumumber of lighted and unlighted intersections in each stratum.hen we stratify to meet the difference in mean propensity scores

ess than a threshold value, in some cases the number of lighted orighted intersections will be small in certain strata, especially forhe first and last strata where propensity scores tend toward zerond one, respectively.

. Discussion

This paper reviewed and demonstrated the application of aropensity scores-potential outcomes framework to estimate theTE of intersection lighting using data from Minnesota. The sam-ling schemes using propensity score matching and stratificationethods presented in this paper identified comparable lighted and

nlighted intersections. This method yielded ATE for the presence

f intersection lighting under non-experimental settings in whichhe treated group substantially differs from the pool of untreatedntities. The ATE for intersection lighting in the present study wasonsistent with recent estimates of the safety effect found in the

0.912 0.879 0.885 −0.120 −0.1670.999 0.950 0.971 −0.151 0.015

published literature (Donnell et al., 2010; Gross and Donnell, 2011),and is also consistent with the crash modification factor presentedin the Highway Safety Manual (AASHTO, 2010).
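
The sub-classification step described in Section 5.2 (quintile strata, followed by a check that mean propensity scores of lighted and unlighted sites differ by less than 0.01) can be sketched as follows. This is a minimal illustration on synthetic data, not the authors' implementation: the function names, the simulated scores, and the handling of ties at the cut points are all assumptions made for the example.

```python
import numpy as np

def quintile_strata(scores):
    """Assign each site to one of five strata using the 20th-80th percentile cut points."""
    cuts = np.percentile(scores, [20, 40, 60, 80])
    return np.searchsorted(cuts, scores, side="right")  # labels 0..4

def mean_score_gaps(scores, treated, strata):
    """Absolute difference in mean propensity score (treated vs. untreated) per stratum."""
    gaps = {}
    for s in np.unique(strata):
        in_s = strata == s
        t = scores[in_s & treated]
        u = scores[in_s & ~treated]
        # A stratum lacking either group cannot be compared; report NaN.
        gaps[int(s)] = abs(t.mean() - u.mean()) if len(t) and len(u) else float("nan")
    return gaps

# Synthetic (hypothetical) data: propensity scores and a lighting indicator.
rng = np.random.default_rng(0)
scores = rng.beta(0.5, 0.5, size=5000)
treated = rng.random(5000) < scores

strata = quintile_strata(scores)
gaps = mean_score_gaps(scores, treated, strata)

# Strata whose gap is not below the 0.01 threshold are candidates for subdivision.
needs_split = [s for s, g in gaps.items() if not g < 0.01]
print(gaps, needs_split)
```

Strata flagged in `needs_split` would be subdivided and rechecked, which mirrors how Strata 3 and 4 were split further in Table 5.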

The site selection mechanism was incorporated into the propensity scores-potential outcomes framework by modeling the treatment assignment mechanism in the form of propensity scores. These propensity scores are then used to identify homogeneous treated and untreated entities, thus mitigating sample site selection bias. The comparability of the results of this study to other recently published methods for the evaluation of intersection lighting, which employed cross-sectional and case–control studies, indicates the appropriateness of this method to analyze countermeasure effectiveness from retrospective, non-randomized traffic safety data. The propensity scores-potential outcomes framework presented in this paper addresses one of the major limitations of cross-sectional studies by minimizing the bias in ATE estimates that results from differences in observed covariates between treated and untreated entities. According to a rule-of-thumb provided by Rubin (2001), treatment effects estimated using cross-sectional models are not trustworthy when the difference between the means of the logit-propensities for treated and untreated groups exceeds half of the pooled within-group standard deviation of propensity scores. Under such circumstances, sampling schemes using propensity scores help to identify treated and untreated entities with the same propensity of receiving the treatment and, therefore, the ATE can be determined by comparing the comparable treated and untreated entities.
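
The rule-of-thumb attributed to Rubin (2001) can be checked numerically. The sketch below is illustrative only: it assumes the standard deviation is taken on the logit scale and pooled as the square root of the average of the two group variances, neither of which is spelled out in this section, and the function names and toy scores are hypothetical.

```python
import numpy as np

def logit(p):
    """Log-odds transform of a propensity score in (0, 1)."""
    return np.log(p / (1.0 - p))

def standardized_logit_gap(scores, treated):
    """Difference in mean logit-propensity divided by the pooled within-group SD."""
    lt, lu = logit(scores[treated]), logit(scores[~treated])
    # Assumed pooling: square root of the mean of the two sample variances.
    pooled_sd = np.sqrt((lt.var(ddof=1) + lu.var(ddof=1)) / 2.0)
    return abs(lt.mean() - lu.mean()) / pooled_sd

# Toy (hypothetical) scores: first three sites treated, last three untreated.
is_treated = np.array([True, True, True, False, False, False])
balanced = np.array([0.2, 0.3, 0.4, 0.2, 0.3, 0.4])
separated = np.array([0.7, 0.8, 0.9, 0.1, 0.2, 0.3])

gap_balanced = standardized_logit_gap(balanced, is_treated)
gap_separated = standardized_logit_gap(separated, is_treated)

# A gap above 0.5 signals that unadjusted cross-sectional estimates are suspect.
print(gap_balanced > 0.5, gap_separated > 0.5)
```

A gap above one half is the situation in which the paragraph above recommends matching or stratifying on the propensity score before comparing groups.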

As described earlier, a limitation of the matched case–control method to assess the effects of traffic safety treatments relates to the difficulty in matching outcomes when a large number of confounding variables are present in the analysis data. The sampling schemes employed in the present study overcome this limitation by matching on a single metric, the propensity score, which is itself estimated from the confounding variables.
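
Matching on a single score is simple to express in code. The following is a minimal sketch of greedy 1:1 nearest-neighbor matching without replacement and with a caliper; whether the authors matched greedily or without replacement is not stated here, so both choices are assumptions, as are the function name and the toy scores. The paper specifies its caliper as one-fourth of the within-group standard deviation of the estimated propensity score; the example passes a fixed value for clarity.

```python
import numpy as np

def nn_match(scores, treated, caliper):
    """Greedy 1:1 nearest-neighbor matching on the propensity score, without replacement."""
    treated_idx = np.flatnonzero(treated)
    control_idx = list(np.flatnonzero(~treated))
    pairs = []
    for t in treated_idx:
        if not control_idx:
            break  # no untreated sites left to match
        dists = np.abs(scores[control_idx] - scores[t])
        j = int(np.argmin(dists))
        # Accept the closest untreated site only if it lies within the caliper.
        if dists[j] <= caliper:
            pairs.append((int(t), int(control_idx.pop(j))))
    return pairs

# Toy (hypothetical) data: sites 0-2 are lighted, sites 3-5 are unlighted.
scores = np.array([0.30, 0.62, 0.90, 0.28, 0.60, 0.10])
treated = np.array([True, True, True, False, False, False])

pairs = nn_match(scores, treated, caliper=0.05)
print(pairs)  # site 2 has no untreated neighbor within the caliper and stays unmatched
```

Treated sites left unmatched are dropped from the matched sample, which is the main contrast with stratification, where no observations are discarded.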

With regard to observational before–after studies in traffic safety using the EB method (Hauer, 1997), the treatment effect is determined based on a safety performance function (SPF), which is developed using geometric design and traffic information from a reference group. As such, identifying a reference group that was not treated is an important consideration in the method. Statistical methods to identify reference group sites that are similar to treated sites provide an opportunity for future research. Propensity scores offer one possible opportunity to identify similar reference group and treatment sites. Propensity scores can be estimated using the confounders from a database with treated sites and a pool of potential reference sites. The reference sites used to estimate SPFs may be identified from the pool of potential reference sites based on the comparability of propensity scores to the treated sites.
Comparable propensity scores for the treated and reference entities ensure comparability of the two groups based on confounding factors (Schafer and Kang, 2008).

Table 6
Stratification treatment effect estimates.

| Treatment status | Nighttime crashes: Average treatment effect (ATE) | S.E. | Daytime crashes: Average treatment effect (ATE) | S.E. |
|---|---|---|---|---|
| Unlighted | 0.099 | 0.038 | 0.037 | 0.035 |
| Lighted | −0.117 | 0.025 | −0.017 | 0.017 |

Another advantage of using the propensity score-potential outcomes framework is that the ATE estimate is doubly robust (i.e., it is not biased even if one of the two models [propensity score or potential outcomes] is incorrect) as a result of using the dual modeling approach (separate models for treatment assignment mechanism and potential outcomes) (Elliott and Little, 2000; Little and An, 2004; Kang and Schafer, 2007; Schafer and Kang, 2008). The proposed framework also reduces the sensitivity of estimates to model misspecification (Rosenbaum, 2005) and can reduce overt biases to a great extent (Rubin, 1973). Both matching methods considered in this paper utilize a caliper value to find the best possible match for the treated entity from a pool of untreated entities. We specified one-fourth of the within-group standard deviation of the estimated propensity score for the treated and untreated groups as the caliper for matching. However, we suspect that the estimate will depend on the caliper value specified, and thus this is a topic recommended for further research. Another limitation of the method is that, unlike randomized experiments in which both observed and unobserved variables are randomly distributed across groups, the propensity score-potential outcomes framework only mimics randomization based on observed covariates, and therefore hidden bias may exist. Addressing such hidden bias is problematic. Rosenbaum (2002) showed a sensitivity analysis in which a hypothetical hidden bias of varying magnitudes was included in the analysis to examine how it influences the conclusions of the study. However, even in this analysis, there is no way to know the existence and magnitude of a hidden bias. Including as many variables as possible that are directly or indirectly related to the treatment assignment and the outcome in the propensity score model can mitigate this issue.

The propensity score estimation method discussed in this paper considers only two treatment conditions (i.e., with treatment [lighted intersections] and without treatment [unlighted intersections]). However, for conditions with more than two treatments (e.g., determining ATE for different types of intersection lighting such as point, partial, full and continuous lighting), a combination of several binary logistic regression models, or multinomial or nested logit models, should be considered based on whether the treatment is qualitatively distinct or without a logical order (Imbens, 2000). Future research is thus recommended to further explore the findings from a framework that considers more than two outcomes.

Finally, lighting design parameters were not considered in the present study due to data limitations. Future research is therefore recommended to assess the safety effects of luminaire mounting height and spacing, glare, luminance, uniformity, and contrast. A possible first step to perform this assessment would be to develop a lighting management system that is linkable to roadway inventory and crash databases.

7. Conclusions

This paper determined the ATE of intersection lighting using a propensity score-potential outcomes framework. Both the matching and sub-classification sampling schemes considered in the paper suggest a nighttime safety benefit for the installation of intersection lighting. NN matching estimated a 6.6% reduction in the frequency of nighttime crashes, while the Mahalanobis metric matching method estimated a nighttime crash reduction of 5%. ATE for lighted intersections estimated using stratification produced an 11.7% reduction in the nighttime crash frequency estimate. Each of the methods produced different results for daytime crashes; however, the daytime effect estimates were small in magnitude and not statistically significant. This finding was expected since roadway lighting is not turned on during the daytime.
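
The matching percentages quoted above follow from the lighting-indicator coefficients of the nighttime crash frequency models (−0.068 for NN matching and −0.048 for Mahalanobis metric matching). The sketch below assumes those models use a log link, as is standard for count models with an overdispersion parameter, so that a coefficient β implies a percent change of 100·(exp(β) − 1); the function name is hypothetical.

```python
import math

def percent_change(beta):
    """Percent change in expected crash frequency implied by a log-link coefficient."""
    return 100.0 * (math.exp(beta) - 1.0)

# Lighting-indicator coefficients from the nighttime models reported in the paper.
nn_effect = percent_change(-0.068)            # NN matching
mahalanobis_effect = percent_change(-0.048)   # Mahalanobis metric matching

print(round(nn_effect, 1), round(mahalanobis_effect, 1))
```

Under this assumption the NN coefficient corresponds to a reduction of about 6.6% and the Mahalanobis coefficient to about 4.7%, which the text rounds to 5%.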

Because cross-sectional data can be procured more easily than before–after data and are not subject to the regression-to-the-mean (RTM) problem, the proposed propensity scores-potential outcomes framework is a viable alternative to the EB method to determine ATE when before–after data are not available. Additionally, the use of propensity scores can ensure that a reference group and treatment sites are similar when employing the EB method in an observational before–after study by selecting sites in the reference group that have logit propensities similar to those of the treatment sites. Further research is recommended to determine and compare the treatment effect estimates produced by the propensity scores-potential outcomes framework and the EB method using a simulated before–after dataset with a control group, for which a true cause-effect relationship is known. With regard to the proposed propensity scores-potential outcomes framework, the caliper value may influence the ATE estimate; additional research using simulated data is therefore recommended to determine the effect of varying the caliper value on the ATE estimate for traffic safety data.

Acknowledgements

The authors would like to thank Dr. Joseph L. Schafer, Principal Researcher/Mathematical Statistician, Center for Statistical Research and Methodology, U.S. Census Bureau for his valuable suggestions concerning this research. The authors would also like to acknowledge Drs. Paul Jovanis and Martin Pietrucha, both members of the lead author’s Ph.D. thesis committee, for their insights and suggestions on this research. Finally, the authors would like to thank the anonymous reviewers whose suggestions have improved the manuscript.

References

American Association of State Highway Transportation Officials, 2005. Roadway Lighting Design Guide. Washington, DC.

American Association of State Highway Transportation Officials, 2010. Highway Safety Manual, Washington, DC.

Aul, N., Davis, G., 2006. Use of propensity score matching method and hybrid Bayesian method to estimate crash modification factors of signal installation. Transportation Research Record 1950, 17–23.

Bang, H., Robins, J.M., 2005. Doubly robust estimation in missing data and causal inference models. Biometrics 61, 962–972.

Bunn, T.L., Yu, L., Slavova, S., Bathke, A., 2009. The effects of semi truck driver age and gender and the presence of passengers on collisions with other vehicles. Traffic Injury and Prevention 20 (3), 266–272.

Cao, X., 2009. Exploring causal effects of neighborhood design on travel behavior using stratification on the propensity score. In: TRB 88th Annual Meeting Compendium of Papers DVD, Paper No. 09-0115.

Road Lighting as an Accident Countermeasure. CIE No. 93, 1992. Commission Internationale de l’Éclairage, Vienna, Austria. http://cie.kee.hu/newcie/framepublications.html.

Cochran, W.G., 1968. The effectiveness of adjustment by subclassification in removing bias in observational studies. Biometrics 24, 205–213.

Cohen, J., 1988. Statistical Power Analysis for the Behavioral Sciences, 2nd ed. Erlbaum, Hillsdale, NJ.

Costello, T.M., Schulman, M.D., Mitchell, R.E., 2009. Risk factors for a farm vehicle public road crash. Accident Analysis and Prevention 41 (1), 42–47.

D’Agostino Jr., R.B., 1998. Propensity score methods for bias reduction in the comparison of a treatment to a non-randomized control group. Statistics in Medicine 17, 2265–2281.

Davis, G.A., 2000. Accident reduction factors and causal inference in traffic safety studies: a review. Accident Analysis and Prevention 32, 95–109.

Dehejia, R., Wahba, S., 1995. An oversampling algorithm for causal inference in non-experimental studies with incomplete matching and missing outcome variables. Harvard University, unpublished.

Dehejia, R., Wahba, S., 1997. Causal effects in non-experimental studies: re-evaluating the evaluation of training programs. Econometric Methods for Program Evaluation, Ph.D. Dissertation, Harvard University (Chapter 1).

Dehejia, R., Wahba, S., 1998. Causal effects in non-experimental studies: re-evaluating the evaluation of training programs, NBER Working Papers 6586, National Bureau of Economic Research, Inc.

Dehejia, R., Wahba, S., 2002. Propensity score matching methods for non-experimental causal studies. The Review of Economics and Statistics 84 (1), 151–161.

Donnell, E.T., Gross, F., 2011. Case-control and cross-sectional methods for estimating crash modification factors: comparisons from roadway lighting and lane and shoulder width safety effect studies. Journal of Safety Research 42 (2), 117–129.

Donnell, E.T., Porter, R.J., Shankar, V.N., 2010. A framework for estimating the safety effects of roadway lighting at intersections. Safety Science 48 (10), 1436–1445.

Elliott, M.R., Little, R.J.A., 2000. Model-based alternatives to trimming survey weights. Journal of Official Statistics 16, 191–209.

Gelman, A., Meng, X.L., 2004. Applied Bayesian Modeling and Causal Inference from Incomplete-data Perspectives. Wiley, New York.

Green, E.R., Agent, K.R., Barrett, M.L., Pigman, J.G., 2003. Roadway lighting and driver safety. Report No. KTC-03-12/SPR247-02-1F. University of Kentucky, Lexington, KY.

Greenland, S., Brumback, B., 2002. Theory and methods an overview of relations among causal modeling methods. International Journal of Epidemiology 31, 1030–1037.

Greenland, S., Pearl, J., Robins, J., 1999. Causal diagrams for epidemiologic research. Epidemiology 10, 37–48.

Griffith, M.S., 1994. Comparison of the Safety of Lighting Options on Urban Freeways. Public Roads Online (September 1994).

Gross, F., Donnell, E.T., 2011. Case–control and cross-sectional methods for estimating crash modification factors: comparisons from roadway lighting and lane and shoulder width safety effect studies. Journal of Safety Research 42 (2), 117–130.

Gross, F., Jovanis, P.P., 2007. Estimation of the safety effectiveness of lane and shoulder width: case–control approach. Journal of Transportation Engineering 133 (6), 362–369.

Gross, F., Jovanis, P.P., Eccles, K.A., Ko-Yu, C., 2009. Safety evaluation of lane and shoulder width combinations on rural, two-lane, undivided roads. Final Report, FHWA-HRT 09-031.

Harwood, D.W., Bauer, E., Potts, I.B., Torbic, D.J., Richard, K.R., Kohlman Rabbani, E.R., Hauer, E., Elefteriadou, L., 2002. Safety Effectiveness of Intersection Left- and Right-turn Lanes. FHWA-RD-02-089. Federal Highway Administration, McLean, VA.

Hauer, E., 1997. Observational Before–After Studies in Road Safety. Pergamon Press, Oxford.

Hauer, E., 2004. Statistical road safety modeling. Transportation Research Record 1897, 81–87.

Hauer, E., 2010. Cause, effect and regression in road safety: a case study. Accident Analysis and Prevention 42 (4), 1128–1135.

Hauer, E., Persaud, B., 1983. A common bias in before and after comparison and its elimination. Transportation Research Record 905. Transportation Research Board, National Research Council, Washington, DC, pp. 164–174.

Hauer, E., Harwood, D.W., Council, F.M., Griffith, M., 2002. Estimating safety by the empirical Bayes method: a tutorial. Transportation Research Record 1784, 126–131.

Heckman, J.J., Ichimura, H., Todd, P.E., 1997a. Matching as an econometric evaluation estimator: evidence from evaluating a job training programme. Review of Economic Studies 64, 605–654.

Heckman, J.J., Smith, J., Clements, N., 1997b. Making the most out of programme evaluations and social experiments: accounting for heterogeneity in programme impacts. Review of Economic Studies 64, 487–535.

Heckman, J.J., Ichimura, H., Smith, J., Todd, P.E., 1998. Characterizing selection bias using experimental data. Econometrica 66 (5), 1017–1098.

Hernan, M.A., Robins, J.M., 2006. Estimating causal effects from epidemiological data. Journal of Epidemiological Community Health 60, 578–586.

Holland, P.W., 1986. Statistics and causal inference. Journal of the American Statistical Association 81, 945–970.

Imbens, G.W., 2000. The role of the propensity score in estimating dose-response functions. Biometrika 83, 706–710.

Isebrands, H., Hallmark, S., Hans, Z., McDonald, T., Preston, H., Storm, R., 2004. Safety impacts of street lighting at isolated rural intersections: Part II, year 1 report. Center for Transportation Research and Education, Iowa State University, Ames, IA.

Isebrands, H., Hallmark, S., Hans, Z., McDonald, T., Preston, H., Storm, R., 2006. Safety Impacts of Street Lighting at Isolated Rural Intersections – Part II. Minnesota Department of Transportation, St. Paul.

Kang, J.D.Y., Schafer, J.L., 2007. Demystifying double robustness: a comparison of alternative strategies for estimating population means from incomplete data. Statistical Science 26, 523–539.

Karwa, V., Slavkovic, A.B., Donnell, E.T., 2011. Causal inference in transportation safety studies: comparison of the potential outcomes and causal Bayesian networks. Annals of Applied Statistics 5 (2B), 1428–1455.

LaLonde, R., 1986. Evaluating the econometric evaluations of training programs. American Economic Review 76 (4), 604–620.

Lipinski, M.E., Wortman, R.H., 1978. Effect of illumination on rural at-grade intersection accidents. Transportation Research Record 611. TRB, National Research Council, Washington, DC, pp. 25–27.

Little, R.J.A., An, H., 2004. Robust likelihood-based analysis of multivariate data with missing values. Statistica Sinica 14, 949–968.

Little, R.J., Rubin, R.B., 2000. Causal effects in clinical and epidemiological studies via potential outcomes: concepts and analytical approaches. Annual Review of Public Health 21, 121–145.

Lunceford, J.K., Davidian, M., 2004. Stratification and weighting via the propensity score in estimation of causal treatment effects: a comparative study. Statistics in Medicine 23, 2937–2960.

Majdzadeh, R., Khalagi, K., Naraghi, K., Motevalian, A., Eshraghian, M.R., 2008. Determinants of traffic injuries in drivers and motorcyclists involved in an accident. Accident Analysis and Prevention 40 (1), 17–23.

Miaou, S.P., Lum, H., 1993. Modeling vehicle accidents and highway geometric design relationships. Accident Analysis and Prevention 25 (6), 689–709.

Ming, K., Rosenbaum, P.R., 2000. Substantial gains in bias reduction from matching with a variable number of controls. Biometrics 56, 118–124.

Minnesota Department of Transportation, 2001. Traffic Safety Fundamentals Handbook. Office of Traffic Safety, St. Paul, MN.

Minnesota Department of Transportation, 2003. Roadway Lighting Design Manual. Office of Traffic Safety, St. Paul, MN.

Park, Y.P., Saccomanno, F.F., 2007. Reducing treatment selection bias for estimating treatment effects using propensity score method. Journal of Transportation Engineering, American Society of Civil Engineers 133 (2), 112–118.

Pearl, J., 1995. Causal diagrams for experimental research. Biometrika 82, 669–710.

Persaud, B., Craig, L., Kimberly, E., Nancy, L., Frank, G., 2009. Safety evaluation of offset improvements for left-turn lanes. FHWA-HRT-09-035.

Preston, H., Schoenecker, T., 1999. Safety impacts of street lighting at rural intersections. Report No. 1999-17. Minnesota Department of Transportation, St. Paul, MN.

Robins, J., 1999. Marginal structural models versus structural nested models as tools for causal inference. In: Halloran, M.E., Berry, D. (Eds.), Statistical Models in Epidemiology: The Environment and Clinical Trials. Springer-Verlag, New York, pp. 95–134.

Rosenbaum, P.R., 2002. Observational Studies, 2nd ed. Springer, New York, NY.

Rosenbaum, P.R., 2005. Observational Study. In: Everitt, B.S., Howell, D.C. (Eds.), Encyclopedia of Statistics in Behavioral Science, vol. 3. John Wiley and Sons.

Rosenbaum, P.R., Rubin, D.B., 1983. The central role of the propensity score in observational studies for causal effects. Biometrika 70, 41–55.

Rosenbaum, P.R., Rubin, D.B., 1984. Reducing bias in observational studies using sub-classification on the propensity score. Journal of the American Statistical Association 79, 516–524.

Rosenbaum, P.R., Rubin, D.B., 1985. Constructing a control group using multivariate matched sampling methods that incorporate the propensity score. American Statistician 39, 33–38.

Rubin, D.B., 1973. The use of matched sampling and regression adjustment to remove bias in observational studies. Biometrics 29, 185–203.

Rubin, D.B., 1978. Bayesian inference for causal effects: the role of randomization. Annals of Statistics 6, 34–58.

Rubin, D.B., 1980. Bias reduction using Mahalanobis metric matching. Biometrics 36 (2), 293–298.

Rubin, D.B., 1990. Neyman (1923) Causal inference in experiments and observational studies. Statistical Science 5, 472–480.

Rubin, D.B., 2001. Using propensity scores to help design observational studies: application to the tobacco litigation. Health Services and Outcomes Research Methodology 2, 169–188.

Rubin, D.B., Thomas, N., 1992. Characterizing the effect of matching using linear propensity score methods with normal distributions. Biometrika 79, 797–809.

Rubin, D.B., Thomas, N., 1996. Matching using estimated propensity scores: relating theory to practice. Biometrics 52, 249–264.

Schafer, J.L., Kang, J., 2008. Average causal effects from nonrandomized studies: a practical guide and simulated example. Psychological Methods 13 (4), 279–313.

Schwab, R.N., Walton, N.E., Mounce, J.M., Rosenbaum, M.J., 1982. Synthesis of safety research related to traffic control and roadway elements – Volume 2, Chapter 12: Highway Lighting. Report No. FHWA-TS-82-233. Federal Highway Administration.

Tarko, A., Eranky, S., Sinha, K., 1998. Methodological considerations in the development and use of crash reduction factors. Presented at 77th Annual Meeting of the Transportation Research Board, Washington, DC.

Tsai, Y.J., Wang, J.D., Huang, W.F., 1995. Case–control study of the effectiveness of different types of helmets for the prevention of head injuries among motor cycle riders in Taipei, Taiwan. American Journal of Epidemiology 142 (9), 974–981.

Walker, F.W., Roberts, S.E., 1976. Influence of lighting on accident frequency at highway intersections. Transportation Research Record 562. TRB, National Research Council, Washington, DC, pp. 73–78.

Yanovitzky, I., Zanutto, E., Hornik, R., 2005. Estimating causal effects of public health education campaigns using propensity score methodology. Evaluation and Program Planning 28, 209–220.