Evaluating an employee wellness program

Int J Health Care Finance Econ (2013) 13:173–199
DOI 10.1007/s10754-013-9127-4

Sankar Mukhopadhyay · Jeanne Wendel

Received: 20 November 2012 / Accepted: 4 May 2013 / Published online: 9 June 2013
© Springer Science+Business Media New York 2013

Abstract What criteria should be used to evaluate the impact of a new employee wellness program when the initial vendor contract expires? Published academic literature focuses on return-on-investment as the gold standard for wellness program evaluation, and a recent meta-analysis concludes that wellness programs can generate net savings after one or two years. In contrast, surveys indicate that fewer than half of these programs report net savings, and actuarial analysts argue that return-on-investment is an unrealistic metric for evaluating new programs. These analysts argue that evaluation of new programs should focus on contract management issues, such as the vendor’s ability to: (i) recruit employees to participate and (ii) induce behavior change. We compute difference-in-difference propensity score matching estimates of the impact of a wellness program implemented by a mid-sized employer. The analysis includes one year of pre-implementation data and three years of post-implementation data. We find that the program successfully recruited a broad spectrum of employees to participate, and it successfully induced short-term behavior change, as manifested by increased preventive screening. However, the effects on health care expenditures are positive (but insignificant). If it is unrealistic to expect new programs to significantly reduce healthcare costs in a few years, then focusing on return-on-investment as the gold standard metric may lead to early termination of potentially useful wellness programs. Focusing short-term analysis of new programs on short-term measures may provide a more realistic evaluation strategy.
Keywords Wellness · Cost · Absenteeism · Screenings · Return-on-investment

JEL Classification I11 · I19

Karl Geisler and Tim Morgan provided excellent research assistance. This work was supported by a summer research grant from the College of Business at University of Nevada, and by the employer that provided the data.

S. Mukhopadhyay (B) · J. Wendel
Department of Economics (0030), University of Nevada, Reno, NV 89557-0030, USA
e-mail: [email protected]

Before investing in any type of wellness and population health management program, those responsible for demonstrating its success must be realistic about the outcomes that can be achieved and over what period of time (Weltz 2009).

Introduction

Return on investment (ROI) analysis is typically presented as the gold standard for evaluating employee wellness program (EWP) outcomes. While some long-established EWPs may earn positive ROIs, this goal may not be realistic for small and mid-size employers offering new programs. Actuarial analysts argue that wellness programs require investments during early years, prior to earning returns in subsequent years (Fitch 2008). Specification of the program goal and the evaluation metric is a salient issue for employers initiating new programs, because those employers may face contract renewal or termination decisions after two or three years of program operation. Instead of measuring ROI, these employers could rely on the United States Preventive Services Task Force (USPSTF) recommendations, which identify preventive measures for which there is “high certainty” that a net benefit will be generated over time (USPSTF 2013), and focus firm-level analysis on vendor management issues, such as the ability of the vendor to: (i) induce participation that aligns with program goals, and (ii) induce those participants to increase their investments in individual health production activities. The Disease Management Association of America (DMAA) advises employers to evaluate both ROI and vendor management measures; however, this does not address the question of whether ROI is a realistic measure for new programs. We use health claims and employment data to evaluate a new EWP offered by a mid-size employer, and we find that the two metrics yield conflicting results:

• The program successfully recruited a broad spectrum of employees to participate, and it successfully induced short-term behavior change, as manifested by increased preventive screening, and

• Health care expenditures and absenteeism did not decrease.

We conclude that unrealistic reliance on ROI as the gold standard could lead employers to terminate viable programs, simply because it is premature to attempt to measure ROI.

EWPs have become ubiquitous among large employers (Fitch and Pyenson 2008), but smaller employers have not followed suit (O’Donnell 2010). The Patient Protection and Affordable Care Act (Section 10408) attempts to close this gap, by authorizing $200 million for short-term grants to small employers that initiate new comprehensive wellness programs. This raises the question: how will these employers evaluate the new programs when it is time to make a decision to renew the initial vendor contract, solicit bids from additional vendors, or terminate the program? Specification of the evaluation criteria may play a key role in determining whether these programs continue after the initial vendor contract expires. The published literature offers three views on the evaluation strategy.

Published evaluations of EWP outcomes typically focus on ROI estimates. For example, Nicholson et al. (2005) stress the importance of ROI analysis to guide decisions to invest in employee health, and they provide pragmatic strategies to facilitate estimation of this ROI. Serxner et al. (2006) analyze methodological options for estimating ROI, and provide recommended guidelines. In addition, vendor organizations provide online wellness program ROI calculators that offer to compute expected program impacts, based on basic information such as numbers of benefits-eligible employees, and the level and growth-rate of healthcare expenditures. (See, for example, http://www.wellnessonline.com/about/what-we-offer/return-on-investment/). Finally, a recent meta-analysis concludes that some EWPs have successfully generated net savings after a few years of operation (Baicker et al. 2010). These results are supported by evidence indicating that almost one-fifth of healthcare expenditures are attributed to ten individual health risk factors that could potentially be modified through EWP efforts (Goetzel et al. 2012).

In contrast, Fitch (2008) argues that “Looking for a financial ROI from medical claims savings is the wrong approach”, because wellness programs require short-term expenditures to support healthy behaviors such as screenings for cancers or chronic conditions, while the potential benefits of these investments may accrue over years or decades. Similarly, the USPSTF concludes that there is “high certainty” that most¹ of the screenings included in this study will generate moderate or substantial net benefit over time; however, they do not indicate that positive net benefits can be expected in the short run. In addition, Pyenson and Zenner’s (2005) actuarial analysis of the outcomes of cancer screening programs concludes that cancer screening programs generate net costs during the initial years, followed by medium-term net savings. These authors also provide an example of a screening program that is expected to prevent five deaths for an employee group of 50,000. This poses two serious issues for firms that initiate new programs by signing two-year or three-year vendor contracts. First, contract renewal/termination decisions for new programs will be based on short-term results. Second, demonstrating a positive ROI is a tougher hurdle for EWPs implemented by small and mid-size employers, because the screening costs are clearly measurable, but the benefits cannot be measured with precision. Thus, the outcomes measurement question is particularly salient for small and mid-size firms initiating new programs.

Employer survey results are also substantially less positive than the results summarized by Baicker et al. (2010). Fitch and Pyenson (2008) report that only 14 % of employers that offered incentives for behavior change actually observed a positive return on that investment. Similarly, the 2010 Buck Survey reports that only a subset of firms estimate program impacts and only 45 % of these firms reported that the programs generated net savings (Buck Surveys 2010).² Serxner et al. (2009) analyze factors that may underlie this employer skepticism, and conclude that the factors that influence a program’s ROI are complex, and employers should therefore engage in ongoing program monitoring and vendor management.

The Disease Management Association of America (2007) addresses this issue by advocating measurement of both intermediate targets (such as behavior change) and the outcomes measures that support ROI estimation. Some analysts implement this two-pronged strategy and report that the evaluated EWP successfully met both criteria (Loeppke et al. 2008, 2010). However, neither the DMAA nor Loeppke et al. (2008, 2010) address the issues raised by Pyenson and Zenner (2005), who argue that achieving an ROI greater than one is an unrealistic expectation for new programs. If the published literature creates pressure for managers to demonstrate that a firm’s EWP is generating a positive ROI, before renewing a vendor contract or extending this employee benefit beyond the initial contract period, this unrealistic expectation could lead to termination of EWPs that are, in fact, useful. This raises the question of whether these employers should be advised to focus on evaluating the EWP success in recruiting participants and inducing behavior change, while relying on larger studies to provide evidence that the behavior changes are likely to generate long-term savings.
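The timing argument can be illustrated with a short sketch. The annual figures below are hypothetical and are not taken from this study or from the cited actuarial analyses. ROI is assumed here to mean cumulative savings divided by cumulative program cost, so that a value above one indicates that the program has paid for itself.

```python
from itertools import accumulate


def cumulative_roi(costs, savings):
    """Return cumulative savings / cumulative cost at the end of each year."""
    cum_costs = list(accumulate(costs))
    cum_savings = list(accumulate(savings))
    return [s / c for s, c in zip(cum_savings, cum_costs)]


# Hypothetical annual figures per employee (illustrative only, not study data).
program_costs = [100.0, 100.0, 100.0, 100.0, 100.0]
claims_savings = [20.0, 60.0, 110.0, 160.0, 200.0]

roi_by_year = cumulative_roi(program_costs, claims_savings)
for year, roi in enumerate(roi_by_year, start=1):
    print(f"year {year}: cumulative ROI = {roi:.2f}")
```

With these made-up flows, an evaluation at the end of a three-year contract would observe an ROI below one even though the same program crosses one by year five.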

Gross (2012) provides an overview of the tradeoffs between outcomes measures and process measures such as cancer screenings and chronic disease management. While his discussion focuses on measuring the quality of medical care, his analysis is relevant for evaluating wellness programs. He argues that final outcomes measures have generally-accepted “face validity”, but they present three weaknesses. Broad measures of overarching outcomes (such as mortality or healthcare cost) cannot be measured in the short-run, they are influenced by an array of observable and unobservable socioeconomic, genetic, and environmental factors, and these measures do not identify actionable quality-improvement issues. On the other hand, process improvement measures can be analyzed in the short-run and they can support improvement efforts, to the extent that the process measures are linked to the program goals. In this framework, the quality of the national-level evidence linking wellness and preventive activities to long-term health outcomes is the central issue.

¹ USPSTF does not recommend screening for prostate cancer.

² In addition, a 2012 Congressional Budget Office working paper concluded that a set of 34 Medicare Disease Management demonstration projects did not generate net savings. While Disease Management programs are distinct from EWPs, this result is relevant because both aim to generate savings by preventing hospitalizations, and the prevention strategies include strengthening individual prevention and self-care behaviors.

We analyze data provided by one mid-size employer with a new EWP, to examine the implications of the ambiguous advice that employers should consider both ROI and behavior change, as program evaluation criteria. We use four years of administrative data from a mid-size employer that was facing a decision to either continue (or terminate) its three-year-old EWP. We examine both types of program-outcome indicators, and we find that an evaluation focused on the impacts of the EWP on healthcare claims and absenteeism would support a decision to terminate the program, because participation in the EWP is not associated with decreases in either of these variables. In contrast, an evaluation focused on the impacts of the EWP on intermediate targets (such as employee engagement and participant behavior change) would support a decision to continue to offer the EWP, because EWP participation is associated with increased rates of recommended health screenings, and participation is spread broadly across demographic groups. The decision to continue or terminate the EWP hinges on the initial specification of the key evaluation metric.

Given the low current adoption rate of EWPs among small and mid-size employers, and federal efforts to induce these entities to initiate new programs, identification of useful metrics for evaluating new programs is a salient issue.

Data and variables

We analyze health claims and encounter data, EWP participation data, and employee data provided by a mid-size employer. This employer signed a three-year contract, to initiate an employee EWP in the third quarter of 2006, and the employer began preparing to make the contract-renewal decision midway through the third year. The employer requested a program evaluation to support the upcoming decision to: renew the three-year contract with the current vendor, solicit bids from new program vendors, or terminate the program. This evaluation was conducted midway through year 3, to allow time to solicit bids and complete the process of contracting with a new vendor in the event that the employer decided to change vendors. This contract renewal decision was, therefore, based on two full years of post-implementation data. (The dataset also includes the third year of post-implementation data. This data was added after the contract decision was made, to assess whether an additional year would have affected the results.)

This EWP, which was the first wellness program offered by this employer, included four components: employees were encouraged to complete an online Health Risk Assessment (HRA), attend the annual employee Health Fair, participate in a class (a one-time event), and/or participate in a campaign (a series of events). While the classes and campaigns addressed general issues of healthy behaviors (e.g. compliance with recommended screening, diet, exercise and stress management behaviors), they particularly emphasized diabetes prevention and management. The employer aimed to increase employee inputs into the health production function, by encouraging employees to (i) increase compliance with recommended screenings for chronic conditions and for cancer, (ii) strengthen individual efforts to prevent and manage chronic conditions, and (iii) adopt healthy behaviors.

During the year prior to the implementation of the EWP, employees who elected to enroll in the Health Maintenance Organization (HMO) plan incurred a $10 co-payment for all wellness activities, including screenings for chronic conditions and cancers. For employees who elected fee-for-service (FFS) coverage, the PPO paid for the first $250 of wellness expenditures for wellness visits. After the first $250 of wellness expenditures, the employee paid a $10 co-payment for wellness activities that occur in a primary care setting, and these employees incurred a 20 % copayment for wellness activities (such as the cancer screening tests) that do not occur in a primary care setting. Both types of employees were eligible for free blood sugar and cholesterol tests and $25 osteoporosis screenings at the annual employee Health Fair. The employer considered modifying the HMO and FFS plans, so that both sets of employees faced reduced payments for wellness healthcare visits; however, this was not feasible due to issues that arose during the collective bargaining process. Hence, the out-of-pocket expenditures incurred by each set of employees (HMO members and FFS enrollees), for wellness visits and the associated screening tests, remained constant before and after implementation of the EWP.

Monetary incentives for participation were minimal. During the first and second years, small prizes (e.g. Starbucks gift cards or raffle tickets for an iPod) were offered for completing the Health Risk Assessment and participating in EWP events. The employer specified that evaluation of the program should be conducted by independent analysts. Based on published literature, the employer anticipated that this analysis would conclude that the program would generate a positive ROI during the initial three-year contract period. The EWP was terminated after four years, in response to recession-induced financial pressures.

We use four years of data, from the third quarter of 2005 through the second quarter of 2009, which provides one full year of “before” data and three years of “after” data. (The final year of data did not become available until after the employer actually made the contracting decision; we provide year-by-year estimation results to assess whether the results are sensitive to the number of years included in the analysis. We include this data to assess whether the specification of the decision criteria would have been less critical, if the employer delayed the decision for a year.) The EWP roll-out started in the third quarter of 2006; and the roll-out process continued for several months. The dataset includes the 2,425 unique individuals who were age 65 or younger during the “before” period, employed during the year prior to implementation of the EWP, and represented in the data with observations for all study variables.
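The three inclusion criteria can be expressed as a simple record filter. This is a minimal sketch only: the field names and the toy records are hypothetical, and the actual data layout is not described in the text.

```python
def in_study_sample(person):
    """Apply the three inclusion criteria described above to one record."""
    # Completeness is checked first so the age comparison never sees None.
    complete = all(value is not None for value in person.values())
    return complete and person["age_before"] <= 65 and person["employed_prior_year"]


# Toy records with hypothetical field names (illustrative only).
records = [
    {"id": 1, "age_before": 44, "employed_prior_year": True, "weekly_pay": 1100.0},
    {"id": 2, "age_before": 67, "employed_prior_year": True, "weekly_pay": 950.0},
    {"id": 3, "age_before": 39, "employed_prior_year": True, "weekly_pay": None},
    {"id": 4, "age_before": 52, "employed_prior_year": False, "weekly_pay": 1200.0},
]
sample_ids = [r["id"] for r in records if in_study_sample(r)]
```

Only the first toy record passes: the others fail the age, completeness, and prior-year employment criteria respectively.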

We use data from three sources: employee data (wages and hours absent) was provided by the employer, health claims data was provided by the third-party entity that processes claims for the employees who elected FFS coverage, and encounter data was provided by the HMO that provides care to employees who elected to join the HMO. The HMO provided data that was formatted to be comparable to claims data, with costs included for each healthcare encounter (based on the reimbursement rates at which the HMO contracted with providers).

We use difference in difference (DD) with propensity score matching, to examine the impact of program participation on the overall goals of reducing both healthcare costs and absenteeism, and on the intermediate target of inducing individuals to obtain recommended health screenings. We define variables to measure: (i) program participation, (ii) individual demographic, health insurance and health characteristics, and (iii) program outcomes.
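The general logic of a DD matching estimator can be sketched as follows. This section does not state the matching algorithm, so the sketch assumes one-to-one nearest-neighbor matching with replacement, a simple min/max common-support rule, and propensity scores that have already been estimated. The dictionary keys and the toy data are hypothetical.

```python
def dd_psm_estimate(treated, controls):
    """Difference-in-difference matching estimate of the effect on participants.

    Each unit is a dict with keys "pscore", "pre" and "post" (hypothetical names).
    """
    control_scores = [c["pscore"] for c in controls]
    lo, hi = min(control_scores), max(control_scores)
    diffs = []
    for t in treated:
        # Common support: skip participants with no comparable non-participant.
        if not lo <= t["pscore"] <= hi:
            continue
        # Nearest-neighbor match (with replacement) on the propensity score.
        match = min(controls, key=lambda c: abs(c["pscore"] - t["pscore"]))
        change_treated = t["post"] - t["pre"]
        change_control = match["post"] - match["pre"]
        diffs.append(change_treated - change_control)
    if not diffs:
        raise ValueError("no participants on the common support")
    return sum(diffs) / len(diffs)


# Toy data (illustrative only); the outcome could be annual healthcare cost.
participants = [
    {"pscore": 0.60, "pre": 3000.0, "post": 3300.0},
    {"pscore": 0.40, "pre": 2800.0, "post": 3000.0},
    {"pscore": 0.95, "pre": 5000.0, "post": 5100.0},  # off support, dropped
]
non_participants = [
    {"pscore": 0.62, "pre": 3500.0, "post": 3700.0},
    {"pscore": 0.35, "pre": 3100.0, "post": 3250.0},
]
effect = dd_psm_estimate(participants, non_participants)
```

Differencing each unit's own pre-period value removes time-invariant differences between participants and their matches, which is why the pre-period level gap in cost does not by itself bias the estimate.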

Variable definitions

We define four types of variables, to measure program participation, individual demographic characteristics, individual diagnoses, and program outcomes.

Program participation

A zero-one indicator variable identifies whether each individual participated in any component of the EWP program during any of the three years of program operation: 993 individuals (41 %) participated, while 1,432 (59 %) did not. We use the most liberal definition of program participation: an individual who participated in any component of the program during any program year is categorized as a participant. Therefore, our estimates of program impacts may have a downward bias but the likelihood of upward bias is low.
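The "any component, any year" rule can be written as a short function. The component keys and the per-year log structure below are hypothetical; only the counts 993 and 1,432 come from the text.

```python
def participated(activity_log):
    """Return 1 if any component was used in any program year, else 0."""
    return int(any(any(year.values()) for year in activity_log))


# One dict per program year; keys name the four EWP components (hypothetical keys).
employee_a = [
    {"hra": False, "health_fair": False, "class": False, "campaign": False},
    {"hra": False, "health_fair": True, "class": False, "campaign": False},
    {"hra": False, "health_fair": False, "class": False, "campaign": False},
]
employee_b = [{"hra": False, "health_fair": False, "class": False, "campaign": False}] * 3
flags = [participated(employee_a), participated(employee_b)]

# Participation share implied by the counts reported in the text.
share = 993 / (993 + 1432)
```

A single Health Fair visit in year two is enough to classify the first employee as a participant, which is what makes this the most liberal definition.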

Demographic characteristics

The dataset includes observable demographic characteristics: age, marital status, weekly pay, and gender. As shown in Table 1, average age, marital status and weekly pay were similar for the participant and non-participant groups: the average age of the participants was 44, while the average age of non-participants was 45 years; 64 % of individuals in both groups were married; weekly pay was $1,101 for participants and $1,119 for the non-participants. In contrast, gender is associated with participation: 72 % of the participants were women, compared with only 50 % of the non-participants. We also include a health insurance variable, to indicate whether each individual elected to enroll in the HMO or the FFS plan. About 44 % of participants were enrolled in the HMO, compared to 46 % of non-participants. Table 1 provides t statistics to test whether the means of these variables are significantly different for the participant versus non-participant groups. Participants are significantly more likely to be female, below-median age, below-median wage, and enrolled in the FFS plan.

Diagnoses

We include two types of diagnosis variables, to indicate whether each individual had a condition, prior to implementation of the EWP, that was: (i) potentially impacted by the EWP, or (ii) associated with high healthcare expenditures that are not prevented by an EWP. The program vendor identified five diagnoses for which EWP participation can potentially help individuals prevent or manage the condition: diabetes, mental health conditions, bone and joint conditions, hypertension, and asthma. The incidence of diabetes (which was the key diagnosis targeted by the program) among participants was 29.6 %, compared with 31.1 % among non-participants. In addition, the incidence of diagnoses indicating mental health conditions was 17.3 % among participants and 16.6 % among non-participants; the incidence of diagnoses indicating bone/joint conditions was 41.4 % among participants and 43.4 % among non-participants; the incidence of hypertension was 25.2 % among participants and 29.7 % among non-participants; and the incidence of asthma was 5.9 % among participants and 6.3 % among non-participants (see Table 1).

The EWP vendor also identified high-cost conditions that are not expected to be impacted by the EWP: cancer, pregnancy, hepatitis and HIV/AIDS.³ We used ICD-9 codes included in the claims and encounter data, to indicate whether each individual had any of these diagnoses during the years prior to initiation of the wellness program. In the pre-EWP period, 4.5 % (4.1 %) of the participants (non-participants) had diagnoses indicating cancer, 3.0 % (3.0 %) of the participants (non-participants) were pregnant, and 0.5 % (1.1 %) of the participants (non-participants) had hepatitis.

³ The program vendor also identified HIV/AIDS as a high-cost condition, but our sample did not include any individuals with this diagnosis.

Table 1 Variable definitions and descriptive statistics for the year prior to EWP implementation

Variable           Definition                     Participants          Non-participants      t stat
                                                  Mean      Std. error  Mean      Std. error

Long-run outcomes
Healthcare cost    Ann. pd claims/encounters      3063.3    197.5       3689.3    265.7       −2.63*
Hours absent       Annual hours pd time absent    83.6      2.3         96.0      2.4         −5.26*

Demographic characteristics
Gender             1 if female; 0 if male         0.722     0.014       0.501     0.013       16.22*
Age                age                            44.260    0.307       45.317    0.273       −3.68*
married            1 if married, 0 otherwise      0.642     0.015       0.641     0.013       0.07
Wk_pay             weekly pay                     1101.03   10.984      1118.63   9.967       −1.69*
HMO                1 if member HMO; 0 if FFS      0.435     0.016       0.462     0.013       −1.90*

Diagnoses
Diabetic condition 1 if ICD-9 code; 0 otherwise   0.296     0.015       0.311     0.012       −1.14
Mental illness     1 if ICD-9 code; 0 otherwise   0.173     0.012       0.166     0.010       0.65
Bone condition     1 if ICD-9 code; 0 otherwise   0.414     0.016       0.434     0.013       −1.41
Hypertension       1 if ICD-9 code; 0 otherwise   0.252     0.014       0.297     0.012       −3.52*
Asthma             1 if ICD-9 code; 0 otherwise   0.059     0.008       0.063     0.006       −0.58
Cancer             1 if ICD-9 code; 0 otherwise   0.045     0.007       0.041     0.005       0.69
Pregnancy          1 if ICD-9 code; 0 otherwise   0.030     0.005       0.030     0.005       0.00
HIV/AIDS           1 if ICD-9 code; 0 otherwise   0.000     0.000       0.000     0.000
Hepatitis          1 if ICD-9 code; 0 otherwise   0.005     0.002       0.011     0.003       −2.34*

Screenings
Cancer
Prostate screen    1 if ICD-9 code; 0 otherwise   0.123     0.020       0.134     0.013       −0.69
Cervical screen    1 if ICD-9 code; 0 otherwise   0.481     0.019       0.423     0.018       3.15*
Colorectal screen  1 if ICD-9 code; 0 otherwise   0.061     0.008       0.057     0.006       0.59
Chronic cond.                                     0.020     0.004       0.017     0.003       0.88
Diabetes screen    1 if ICD-9 code; 0 otherwise   0.241     0.014       0.252     0.011       −0.90
Cholesterol screen 1 if ICD-9 code; 0 otherwise   0.123     0.020       0.134     0.013       −0.69

Sample size                                       993                   1,432

* Statistically significant difference; 5 % level; two-sided test

Program outcomes

We define two overall outcomes measures: healthcare costs and absenteeism. We denote the cost information provided by the HMO and the FFS plans as “healthcare costs”, and we state these costs in constant 2005 dollars, using the medical care component of the Consumer Price Index. The absenteeism variable measures the number of hours per year each individual was absent from work for either sick leave or unpaid time-off; paid vacation days were not included in this variable. The average health care cost for the participants was $3,063 in year 1, compared to $3,689 for the non-participants; participants averaged 83.6 absent hours in year 1, compared to 96 hours for non-participants. These differences are statistically significant, suggesting that the EWP attracted relatively low-cost and low-absenteeism individuals. This initial situation contrasts with four of the six studies of healthcare costs reported by Baicker et al. (2010) in their second category of studies (which reported evaluations of programs for which participation was voluntary). Average pre-program health care costs were higher in the participant group, compared with the non-participant group, in four of these studies; the costs were roughly comparable in the other two studies. Baicker et al. (2010) also report evaluations of absenteeism for 11 studies in which participation decisions were voluntary. In these 11 studies, pre-program absenteeism was higher among participants, compared with non-participants, in 3 studies, lower in 5, and comparable in 3.
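The conversion to constant 2005 dollars mentioned above amounts to scaling each nominal cost by the ratio of the base-year index to the index for the year in which the cost was incurred. The sketch below uses placeholder index levels, not published CPI values, and a made-up nominal cost.

```python
def to_constant_dollars(nominal_cost, index_year, index_base):
    """Deflate a nominal cost to base-year dollars using a price index."""
    return nominal_cost * index_base / index_year


# Hypothetical medical-care CPI levels (placeholders, not published values).
medical_cpi = {2005: 100.0, 2006: 104.0, 2007: 108.0, 2008: 112.0}

# A made-up nominal cost incurred in 2008, restated in 2005 dollars.
real_cost_2008 = to_constant_dollars(3360.0, medical_cpi[2008], medical_cpi[2005])
```

Deflating with the medical care component, rather than the overall index, keeps general medical price inflation from being counted as a change in utilization.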

We also examine the impact of the EWP on short-term “behavior change”. We focus on behaviors that can be observed in the health claim/encounter data, to avoid the problems intrinsic to HRA data (the data are self-reported, and they are only available for the subset of individuals who self-select to complete the HRA). The range of behaviors that can be measured in claims data is currently limited; however, the range of behaviors that will be visible in administrative data will increase as providers increasingly use electronic medical records systems. The claim/encounter data are unlikely to be completely accurate for all individuals, but there is no a priori reason to expect the inaccuracy to vary systematically with participation (vs. non-participation) in the EWP program (Sing 2004). For example, the incidence of breast cancer and osteoporosis screening reported in our data is zero for the year prior to implementation of the wellness program, but these screening rates are greater than zero for both participants and non-participants in the three subsequent years. There appears to have been a change in the system for coding and billing for these tests, which coincided with implementation of the EWP. While this specific coding issue is unique to this dataset, the issues posed by changes in coding practices are endemic to claims/encounter data.

We focus on the impact of program participation on the probability that an individual will obtain screenings for indications of chronic conditions (blood glucose, cholesterol and osteoporosis) and screenings for cancer [mammogram and pap test (females only), prostate screening (males only), and colonoscopy], because the HRA and the Health Fair encouraged employees to obtain recommended screenings. Because the vendor’s programming specifically emphasized diabetes prevention and management, we hypothesize that the program may be particularly likely to generate increased screenings for diabetes. The recommended schedules for these screenings may vary across the type of screening and individual’s gender, health risk and age; hence we estimate separate regressions for each screening, and we control for gender, pre-existing diagnoses, and age in the computation of DD with propensity score matching.

The probabilities of obtaining some of the screenings may not be independent (screenings for all three of the chronic conditions could be obtained at the Health Fair); hence we also construct three screening indices to indicate whether each individual obtained at least one screening for a chronic condition, cancer relevant for males (prostate or colon), or cancer relevant for females (breast, cervical or colon).

The pre-program screening rates, for most of the screening tests, were similar for participants and non-participants: 12.3 % of male participants and 13.4 % of male non-participants received prostate cancer screening, 48.1 % of female participants and 42.3 % of female non-participants received cervical cancer screening, 6.1 % of participants and 5.7 % of non-participants received colorectal cancer screening, 2.0 % of participants and 1.7 % of non-participants received diabetes screening, and 24.1 % of participants and 25.2 % of non-participants received cholesterol screening. The difference between the cervical cancer screening rate for participants (0.481) and for non-participants (0.423) is statistically significant; the other differences are not significant (see Table 1).

Normalized differences of the independent variables that will be used in the propensity score matching equation

To test whether the participant and non-participant groups have enough similarity in the pre-EWP period to support additional analysis, we compute the normalized difference for each variable (Imbens and Wooldridge 2010). The normalized differences, which are reported in Appendix Table 9 (column 5), are equal to the difference between the average for each variable for participants and the average among non-participants, divided by the square root of the sum of the two variances. We check whether the normalized difference for each variable exceeds the critical value of 0.25. This critical value was initially suggested by Imbens and Rubin (2007) for linear regression methods; however, it is also useful for non-linear analyses. Only one of the normalized differences (gender) exceeds the critical value of 0.25. In addition, most of the normalized differences for our data are lower than the normalized differences reported by Imbens and Wooldridge (2010) after they used the matching techniques proposed in Rubin (2006) and Imbens et al. (2001). In our implementation of propensity score matching, we will impose a common support restriction; however, these numbers already suggest that common support will not be an issue in our sample. In other words, our sample meets the criteria defined by Imbens and Wooldridge (2010) for producing “credible and robust estimates”, with the possible exception of the fact that women are more likely to participate in the EWP than men. We address this issue by reporting estimates for the full sample, and for separate subsamples of males and females.
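The normalized difference described above can be sketched in a few lines of Python. This is an illustration only: the function names and the 0/1 indicator data are hypothetical, and the use of the sample variance and of the absolute value when comparing against 0.25 are assumptions, since the text does not state them.

```python
from math import sqrt
from statistics import mean, variance


def normalized_difference(participants, non_participants):
    """Difference in group means divided by the square root of the
    sum of the two group variances (sample variances assumed)."""
    diff = mean(participants) - mean(non_participants)
    return diff / sqrt(variance(participants) + variance(non_participants))


def exceeds_threshold(participants, non_participants, threshold=0.25):
    """True when the absolute normalized difference is above the threshold."""
    return abs(normalized_difference(participants, non_participants)) > threshold


# Hypothetical 0/1 "female" indicators, not taken from the study data.
female_participants = [1, 1, 1, 0, 1, 1, 0, 1]
female_non_participants = [0, 1, 0, 0, 1, 0, 1, 0]

nd_female = normalized_difference(female_participants, female_non_participants)
print(round(nd_female, 2), exceeds_threshold(female_participants, female_non_participants))
```

Because the denominator adds the two variances rather than pooling them, the statistic does not shrink mechanically as the sample grows, which is why it is used here in place of a t test.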

Difference in difference computation—means

Table 2 presents the DD estimates (in means) for the outcome measures for the pre-EWP year (year 1 in the data) and for the first, second and third years of the EWP (years 2, 3 and 4 in the dataset).4 By year 3, EWP participation is associated with statistically significant increases (at the 5 % level) in the proportions of individuals screened for prostate cancer, diabetes and cholesterol. By year 4, EWP participation is also associated with a statistically significant increase in absenteeism. The increased number of screenings may account for some of the increase in hours absent from work. However, regression to the mean may also be occurring: participants logged fewer absent hours prior to EWP implementation, and the gap narrowed during the three EWP years.
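As a worked illustration of the DD (in means) computation, the sketch below takes the post-period gap between participants and non-participants and subtracts the pre-period gap. The function name is hypothetical; the four input values are the prostate screening rates read from Table 2 (pre-EWP year and second EWP year). Recomputing from the rounded means gives about 0.108, against the 0.109 printed in the table, a gap that is presumably due to rounding of the published means.

```python
def did_in_means(pre_treated, pre_control, post_treated, post_control):
    """Post-period gap between the two groups minus the pre-period gap."""
    return (post_treated - post_control) - (pre_treated - pre_control)


# Prostate screening rates as read from Table 2 (rounded published values).
dd_prostate = did_in_means(
    pre_treated=0.123,   # participants, pre-EWP year
    pre_control=0.134,   # non-participants, pre-EWP year
    post_treated=0.380,  # participants, second EWP year
    post_control=0.283,  # non-participants, second EWP year
)
print(round(dd_prostate, 3))
```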

These results contrast with the DD (in means) estimates reported by Baicker et al. (2010). While these authors do not report standard errors for the DD estimates, they report negative EWP impacts on cost for 5 of the 6 cost studies, and negative EWP impacts on absenteeism for all 11 of the studies that report absenteeism numbers.

4 We do not have pre-program data for breast cancer and osteoporosis screening; hence these results are simple differences in means, rather than difference in difference estimates.

Table 2 Difference in difference computation for means

                      Participants      Non-participants
                      Mean     SE       Mean     SE       Diff     SE-diff  DD       SE-DD    t stat DD

Panel A: year 1 in the dataset (pre-EWP year)
Healthcare cost       3063.3   197.5    3689.3   265.7    −626.0   331.0
Hours absent          83.6     2.3      96.0     2.4      −12.4    3.3
Prostate screen       0.123    0.020    0.134    0.013    −0.011   0.024
Cervical screen       0.481    0.019    0.423    0.018    0.059    0.026
Colorectal screen     0.061    0.008    0.057    0.006    0.005    0.010
Diabetes screen       0.020    0.004    0.017    0.003    0.003    0.006
Cholesterol screen    0.241    0.014    0.252    0.011    −0.012   0.018

Panel B: year 2 in the dataset (first EWP year)
Healthcare cost       3846.8   388.9    4057.4   329.     −210.7   509.5    415.3    607.6    0.68
Hours absent          88.4     2.5      97.3     2.5      −8.9     3.5      3.6      4.8      0.75
Prostate screen       0.203    0.024    0.204    0.015    −0.002   0.029    0.010    0.037    0.27
Cervical screen       0.642    0.018    0.565    0.019    0.077    0.026    0.018    0.037    0.49
Breast screen         0.234    0.016    0.172    0.014    0.063    0.021
Colorectal screen     0.104    0.010    0.096    0.008    0.007    0.012    0.002    0.016    0.13
Osteoporosis screen   0.026    0.005    0.011    0.003    0.015    0.006
Diabetes screen       0.037    0.006    0.030    0.005    0.007    0.008    0.005    0.009    0.56
Cholesterol screen    0.398    0.016    0.361    0.013    0.036    0.020    0.048    0.027    1.78

Panel C: year 3 in the dataset (second EWP year)
Healthcare cost       4515.3   325.6    4320.9   324.2    194.4    459.4    820.4    566.3    1.45
Hours absent          92.8     2.4      98.8     2.5      −6.1     3.5      6.4      4.8      1.33
Prostate screen       0.380    0.029    0.283    0.017    0.098    0.034    0.109*   0.041    2.66
Cervical screen       0.757    0.016    0.665    0.018    0.092    0.024    0.033    0.035    0.94
Breast screen         0.562    0.019    0.418    0.018    0.144    0.026
Colorectal screen     0.156    0.012    0.152    0.010    0.004    0.015    −0.001   0.018    −0.06
Osteoporosis screen   0.077    0.008    0.038    0.005    0.038    0.010
Diabetes screen       0.157    0.012    0.048    0.006    0.109    0.013    0.106*   0.014    7.57
Cholesterol screen    0.575    0.016    0.482    0.013    0.093    0.021    0.104*   0.027    3.85

Panel D: year 4 in the dataset (third EWP year)
Healthcare cost       4379.8   344.9    4181.6   354.8    198.1    494.8    824.1    595.4    1.38
Hours absent          52.4     1.9      54.2     1.8      −1.8     2.6      10.6*    4.3      2.47
Prostate screen       0.442    0.030    0.336    0.018    0.106    0.035    0.117*   0.042    2.79
Cervical screen       0.817    0.014    0.704    0.017    0.113    0.022    0.054    0.034    1.59
Breast screen         0.675    0.018    0.484    0.019    0.191    0.026
Colorectal screen     0.220    0.013    0.189    0.010    0.030    0.017    0.025    0.019    1.32
Osteoporosis screen   0.120    0.010    0.065    0.007    0.055    0.012
Diabetes screen       0.174    0.012    0.060    0.006    0.114    0.014    0.111*   0.015    7.40
Cholesterol screen    0.651    0.015    0.551    0.013    0.099    0.020    0.111*   0.027    4.11

* Statistically significant at the 5 % level (two-tailed test for comprehensive measures; one-tailed test for intermediate targets)

Methods

Even though our analysis of the descriptive statistics indicates that the participant and non-participant groups are roughly comparable, this analysis of DD (in means) does not address the key counterfactual question: Would the participant’s behavior have been the same, had he not participated? We use propensity score matching to construct this counterfactual, using the appropriate weights, and then report results for DD propensity score matching. Stata 10 was used to complete the analysis. We designate individuals who participated in the program as the treated group and those who did not participate as the control group.

Propensity score matching estimates

The first step is to estimate propensity scores. We use a Probit model to compute a propensity score for every individual in the sample, as a function of the pre-EWP demographic, health status and absenteeism characteristics of the 2,425 individuals included in our sample. We use two types of variables to measure pre-EWP health status. First, we include the pre-EWP healthcare expenditures as a measure of overall health status. However, Eichner et al. (1997) show that year-to-year variability in healthcare expenditures is substantial; hence we also include variables to measure the diagnoses that are targeted by the EWP and two diagnoses (cancer and hepatitis) with high costs that are not expected to be impacted by an EWP. We also included a variable to indicate pregnancy, because costs incurred for pregnancy and delivery are not expected to continue over multiple years.

We estimate propensity scores for the full sample and for four sets of split samples. The subsamples are split by gender (male vs. female), age (younger than the median age of 45 vs. older than 45), income (annual wage less than the median of $54,080 vs. wage above this level), and insurance type (managed care vs. FFS). The split sample results allow the employer to assess whether the vendor’s programming successfully appeals to the entire spectrum of employees. Some analysts argue that employers should also consider whether the EWP should be designed to target individuals with either high (or low) healthcare expenditures to maximize the productivity of the EWP (Edington 2009). We do not split the samples by baseline healthcare expenditures, however, because the impacts of such a targeting strategy cannot be observed during the initial years of a new EWP.

Table 3 reports the estimation results for the propensity score equation for the full sample. The likelihood ratio χ² statistic is 183.4. Women, older employees, FFS participants, individuals with a diagnosis of diabetes, and individuals with relatively low rates of absenteeism were significantly more likely to participate in the EWP. The impact of a diabetes diagnosis on participation is consistent with the fact that the program vendor emphasized diabetes prevention and management. While the univariate analysis presented in Table 1 indicated that older individuals were significantly less likely to participate in the EWP, the sign is reversed in the multivariate analysis presented in Table 3.
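To make the link between the Probit coefficients and an individual propensity score concrete, the following sketch evaluates the standard normal CDF at the linear index built from the (rounded) Table 3 coefficients. The individual is hypothetical, the coding of the insurance-type indicator (which plan equals 1) is an assumption, and the remaining diagnosis indicators are simply set to zero by omission, so the output is illustrative rather than a reproduction of the authors' fitted scores.

```python
from math import log
from statistics import NormalDist

# Coefficients as printed in Table 3 (rounded); diagnosis indicators other
# than diabetes are omitted here, which is equivalent to setting them to 0.
TABLE3_COEFS = {
    "constant": -2.0135,
    "female": 0.6889,
    "age": 0.0475,
    "age_sq": -0.0007,
    "married": 0.0909,
    "ln_annual_pay": 0.1084,
    "insurance_type": -0.1013,  # coding of this indicator is assumed
    "diabetic": 0.1538,
    "healthcare_cost": -1.32e-06,
    "hours_absent": -0.0014,
}


def propensity_score(x, coefs=TABLE3_COEFS):
    """Probit probability: standard normal CDF of the linear index."""
    index = coefs["constant"] + sum(coefs[name] * value for name, value in x.items())
    return NormalDist().cdf(index)


# Hypothetical employee: married woman aged 45 at the median wage.
person = {
    "female": 1, "age": 45, "age_sq": 45 ** 2, "married": 1,
    "ln_annual_pay": log(54080), "insurance_type": 0, "diabetic": 0,
    "healthcare_cost": 3000.0, "hours_absent": 80.0,
}
score = propensity_score(person)
score_diabetic = propensity_score({**person, "diabetic": 1})
print(round(score, 3), round(score_diabetic, 3))
```

The positive diabetes coefficient raises the predicted participation probability for an otherwise identical individual, which matches the direction reported in the text.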

Table 3 Propensity score matching equation: estimation results

                            Coefficient   t statistic
Demographic characteristics
Female                      0.6889        11.71
Age                         0.0475        2.10
Age squared                 −0.0007       −2.63
Married                     0.0909        1.61
Ln (annual pay)             0.1084        1.17
Insurance type              −0.1013       −1.85
Diagnoses
Diabetic condition          0.1538        2.42
Mental health issue         −0.0505       −0.71
Bone/joint condition        −0.01108      −0.20
Hypertension                −0.0714       −1.11
Asthma                      −0.0820       −0.74
Cancer                      0.0948        0.71
Pregnancy                   −0.1724       −1.06
Hepatitis                   −0.4918       −1.58
Long-term outcomes measures
Healthcare cost             −1.32e−06     −0.36
Hours absent                −0.0014       −3.97
Constant                    −2.0135       −2.72

The minimum value of predicted propensity scores is 0.07 and the maximum is 0.72, which implies that most of these scores satisfy the rule of thumb (propensity scores between 0.1 and 0.9) criterion for assessing overlap as suggested by Crump et al. (2008). The distributions of the estimated propensity scores for the participant and non-participant groups, shown in Fig. 1, indicate that common support is not an issue for the analysis reported here because the treatment and control groups are comparable.

[Figure: density of the estimated propensity score, plotted separately for the control group and the treated group (graphs by treatment)]

Fig. 1 Overlap of propensity scores for the non-participant (control) and participant (treated) samples

We re-estimate the propensity score equation for split samples using age, annual pay, type of health insurance, and gender (see Table 4). The significant impacts of gender and absenteeism on participation in the full sample are observed in all of the subsamples. However, the impact of a diagnosis of diabetes is concentrated among women, FFS participants, individuals who are relatively young and individuals with above-median income.

We used the Becker and Ichino (2002) pscore command to check that the balancing criterion is satisfied. (This criterion uses a t test to test whether the treatment and control group covariate means are equal, within each propensity score block. For our study, the participants are the treatment group, and the non-participants serve as the control group.) After verifying that the balancing criterion is satisfied, we use kernel matching to permit full use of the sample observations. For each participant, this procedure assigns weights to each non-participant based on the difference between the two propensity scores. We completed this procedure for the full sample, and for each of the sub-samples. We use the psmatch2 command written by Leuven and Sianesi (2003) to generate the matching estimates. The common support restriction was imposed in all cases. Also, we use bootstrapped standard errors based on 200 replications, because it is well known that analytical standard errors are not accurate.
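The kernel weighting step can be illustrated with a small pure-Python sketch. It is not the psmatch2 implementation: the Epanechnikov kernel, the bandwidth of 0.06, the function names and the toy data are all assumptions made for illustration, since the text does not report the kernel or bandwidth used. Each observation is a pair of (propensity score, post-minus-pre change in the outcome), so averaging the weighted contrasts yields a DD-style matching estimate.

```python
def epanechnikov(u):
    """Epanechnikov kernel; zero outside |u| < 1."""
    return 0.75 * (1.0 - u * u) if abs(u) < 1.0 else 0.0


def kernel_weights(p_treated, p_controls, bandwidth=0.06):
    """Weights on each control, based on the propensity score distance."""
    raw = [epanechnikov((p_treated - p) / bandwidth) for p in p_controls]
    total = sum(raw)
    return [w / total for w in raw] if total > 0 else None


def kernel_matched_dd(treated, controls, bandwidth=0.06):
    """treated, controls: lists of (propensity_score, outcome_change).
    Returns the mean over treated units of own change minus the
    kernel-weighted change among controls."""
    p_controls = [p for p, _ in controls]
    effects = []
    for p_i, change_i in treated:
        weights = kernel_weights(p_i, p_controls, bandwidth)
        if weights is None:
            continue  # no control within the bandwidth: unit is off support
        counterfactual = sum(w * c for w, (_, c) in zip(weights, controls))
        effects.append(change_i - counterfactual)
    if not effects:
        raise ValueError("no treated unit has a control within the bandwidth")
    return sum(effects) / len(effects)


# Hypothetical data: (propensity score, change in a screening indicator).
treated = [(0.30, 1.0), (0.50, 0.0)]
controls = [(0.30, 0.0), (0.50, 0.0), (0.90, 5.0)]
estimate = kernel_matched_dd(treated, controls)
print(estimate)
```

In this toy example the control with a score of 0.90 lies outside the bandwidth of both treated units and so receives zero weight, which is the sense in which kernel matching down-weights dissimilar non-participants. Bootstrapped standard errors would be obtained by resampling individuals and repeating the whole procedure.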

The estimated propensity scores were used to compute DD estimates of the impacts of EWP participation on overall outcomes and short-term screening behaviors. The DD method compares the before-vs-after differences in outcomes for participants with the comparable before-vs-after differences in outcomes for non-participants. Thus, the impacts of unobserved time-constant individual characteristics are differenced-out.

However, neither the matching nor the DD methodology addresses unobserved individual characteristics that vary over time. Hence, the matching estimates can only be interpreted as evidence of a causal effect under the assumption of unconfoundedness. This assumption holds when there is no selection (into the participant group) based on unobservable time-varying characteristics. For example, if some of the EWP participants were already planning to make behavioral changes (prior to program participation), these behavior changes would occur concurrently with EWP implementation, but the relationship would not be causal. In this case, the DD propensity score matching method may indicate a statistically significant association, but we could not conclude that the EWP generated the behavior change. Unfortunately, the presence (or absence) of this type of selection on time-varying unobservables (such as intention to change behavior) cannot be tested directly. However, a falsification test is available to provide an indirect test of the unconfoundedness assumption (Smith and Todd 2005; Imbens and Wooldridge 2010).

The falsification test estimates a pseudo treatment effect when the treatment effect is expected to be zero (because there was no known treatment at that time). We implement the falsification test using the first six quarters of data: we use the first three quarters as the “pseudo before” period (2005:Q3–2006:Q1) and the last three quarters of the before-program period as the “pseudo after” period (2006:Q2–2006:Q4). Thus, the “pseudo after” period includes one quarter of data prior to program implementation and the first two quarters of data observed after the official program-implementation date. This generates a conservative test, in the sense that the test may be biased toward finding a significant effect if the program generated measurable impacts during the first two quarters. This issue is minor in this case, however, because the program vendor indicated that the program was not fully operational until the end of these two quarters.

Table 4 Propensity score matching equation: probability of participation (dependent variable: participation indicator variable)

Subsample                 Age<44    Age>44    Pay<med.  Pay>med.  FFS       HMO       Women     Men

Demographic characteristics
Health cost     Coef.     0.0001    −0.0001   0         0.0001    −0.0001   0.0001    −0.0001   0
                t stat.   1.28      −1.19     −0.66     0.13      −0.91     0.99      −0.89     0.28
Hours absent    Coef.     −0.002    −0.001    −0.001    −0.002    −0.002    −0.001    −0.001    −0.001
                t stat.   −2.72*    −2.84*    −2.71*    −2.65*    −3.3*     −2.56*    −2.01*    −3.37*
Female          Coef.     0.713     0.64      0.689     0.694     0.717     0.636
                t stat.   8.27*     7.86*     7.82*     8.62*     9.13*     7.09*
Age             Coef.     −0.178    0.093     0.043     0.089     0.026     0.055     −0.018    0.085
                t stat.   −2.05*    0.7       1.55      2.15*     0.77      1.74*     −0.48     2.96*
Age sq          Coef.     0.003     −0.001    −0.001    −0.001    0         −0.001    0         −0.001
                t stat.   2.09*     −0.91     −2.1*     −2.29*    −1.27     −1.94*    0.21      −3.41*
Married         Coef.     0.128     0.073     0.095     0.101     0.146     0.039     −0.128    0.179
                t stat.   1.5       0.94      1.23      1.19      1.85*     0.47      −1.28     2.58*
Ln (ann. pay)   Coef.     −0.079    0.244     0.212     0.025     0.085     0.123     0.284     0.043
                t stat.   −0.53     2.03*     1.02      0.14      0.7       0.85      1.83*     0.36
Insurance       Coef.     −0.148    −0.055    −0.081    −0.116                        −0.09     −0.12
                t stat.   −1.83*    −0.73     −1.06     −1.45                         −0.99     −1.73*

Diagnosis
Diabetes        Coef.     0.286*    0.078     0.144     0.163*    0.169*    0.113     0.268*    0.085
                t stat.   2.64      0.99      1.59      1.81      1.97      1.18      2.68      1.02
Mental health   Coef.     −0.072    −0.035    −0.096    0.019     −0.08     0.013     0.098     −0.126
                t stat.   −0.68     −0.36     −0.97     0.18      −0.85     0.12      0.73      −1.5
Bone/joint      Coef.     −0.155*   0.112     −0.011    −0.015    0.074     −0.125    0.047     −0.045
                t stat.   −1.85     1.52      −0.14     −0.2      0.99      −1.52     0.53      −0.64
Hypertension    Coef.     −0.075    −0.08     −0.093    −0.065    −0.02     −0.136    −0.282    0.064
                t stat.   −0.69     −0.99     −1.03     −0.71     −0.23     −1.42     −2.7      0.77
Asthma          Coef.     −0.2      0.024     0.054     −0.172    −0.054    −0.166    0.13      −0.16
                t stat.   −1.11     0.17      0.33      −1.13     −0.36     −0.99     0.61      −1.23
Cancer          Coef.     0.041     0.126     0.158     0.021     0.142     −0.012    0.153     0.045
                t stat.   0.15      0.82      0.82      0.12      0.79      −0.06     0.6       0.29
Pregnancy       Coef.     −0.109    −0.679    −0.239    −0.084    −0.444    −0.006    −0.155    0
                t stat.   −0.59     −1.41     −1.12     −0.33     −1.69     −0.03     −0.93
Hepatitis       Coef.     −0.284    −0.626    −0.25     −1.006    −0.549    −0.333    −0.702    −0.394
                t stat.   −0.51     −1.65     −0.65     −1.76     −1.41     −0.64     −1.22     −1.04
Constant        Coef.     3.038     −4.019    −2.521    −2.528    −1.364    −2.362    −1.792    −1.669
                t stat.   1.8       −1.11     −1.69     −1.61     −1.33     −2.14     −1.48     −1.79

# Observ.                 1,094     1,330     1,205     1,217     1,331     1,093     1,434     990

* Statistically significant at the 5 % level (two-tailed test for comprehensive measures; one-tailed test for intermediate goals)

Table 5 Falsification test results

                      Restricted sample            Full sample
                      Coefficient   t statistic    With tail     t statistic
Healthcare cost       27.00         0.62           102.4         0.76
Hours absent          0.266         0.36           0.056         0.07
Prostate cancer       0.013         0.74           0.004         0.24
Cervical cancer       0.023         1.38           0.013         0.75
Breast cancer
Colorectal cancer     0.009         1.21           0.007         1.02
Osteoporosis
Diabetes              −0.000        −0.01          0.001         0.27
Cholesterol           0.002         0.17           0.001         0.09
Any test              −0.002        0.17
Male_cancer           −0.006        0.35
Female_cancer         0.003         0.42

We estimate whether a measurable “treatment effect” occurred between these two periods, and we hypothesize that the estimated treatment effect should be zero because there was no treatment in 2006:Q2. On the other hand, if some individuals were already making changes before the EWP was implemented (and if these individuals selected into the participant group), the estimated “treatment effect” would be positive. Thus, an insignificant estimated treatment effect will indicate that the data do not support the hypothesis that concurrent changes were occurring. We use DD propensity score matching to estimate the pseudo treatment effects. The results of the falsification test are presented in Table 5. None of the estimated treatment effects are statistically different from zero; hence the results of the falsification test are consistent with the unconfoundedness hypothesis.

However, we note that Imbens and Wooldridge (2010) caution that a zero treatment effect in this falsification test does not imply that the unconfoundedness assumption is true; instead, this result simply indicates that the assumption is more plausible. Without a valid instrument or a regression discontinuity design, it is impossible to argue that the estimated treatment effect is indeed causal (Sekhon 2009).
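The construction of the two pseudo periods for the falsification test can be sketched as follows. The quarter windows are the ones quoted in the text (2005:Q3–2006:Q1 and 2006:Q2–2006:Q4); the function names and the quarterly hours data are hypothetical. The resulting per-person change would then be fed into the same DD matching estimator that is used for the actual program years.

```python
def pseudo_period(year, quarter):
    """Label a calendar quarter using the windows quoted in the text."""
    key = (year, quarter)
    if (2005, 3) <= key <= (2006, 1):
        return "pseudo_before"
    if (2006, 2) <= key <= (2006, 4):
        return "pseudo_after"
    return None  # quarter is outside the falsification window


def pseudo_change(quarterly):
    """quarterly: dict mapping (year, quarter) to an outcome value.
    Returns the pseudo-after total minus the pseudo-before total."""
    totals = {"pseudo_before": 0.0, "pseudo_after": 0.0}
    for (year, quarter), value in quarterly.items():
        label = pseudo_period(year, quarter)
        if label is not None:
            totals[label] += value
    return totals["pseudo_after"] - totals["pseudo_before"]


# Hypothetical hours absent per quarter for one employee.
hours = {
    (2005, 3): 10, (2005, 4): 12, (2006, 1): 8,
    (2006, 2): 9, (2006, 3): 11, (2006, 4): 10,
    (2007, 1): 50,  # ignored: outside the six-quarter window
}
change = pseudo_change(hours)
print(change)
```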

Finally, the coefficients for the seven screening variables (presented for the full sample for each year, and for the full time period for each subsample) could potentially exhibit bias (toward rejecting a null hypothesis that is true) due to multiple comparisons. We use two strategies to address this issue. First, we use the Bonferroni method to adjust for multiple comparisons (Pfaffenberger and Patterson 1987). This method is appropriate, because decisions to obtain some of the screenings, particularly screenings available at the Health Fair (osteoporosis, blood sugar, blood pressure) may not be independent. However, this method provides a conservative result, because it focuses on the probability that any one of the seven results is a false positive. Therefore, we also construct three indexes of screening behavior. The first index counts the number of screenings obtained by each individual, among those available at the Health Fair (up to three). The second index counts the number of cancer screenings obtained by each male (colorectal and prostate) and the third counts the number of cancer screenings obtained by each female (colorectal, breast and cervical).
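A minimal sketch of the three screening indexes follows. The text describes the indexes both as counts and, earlier, as "at least one" indicators, so the helper returns both forms. The names are hypothetical, and the membership of the Health Fair group (diabetes, cholesterol, osteoporosis) is an assumption based on the chronic-condition screenings listed earlier in the paper.

```python
# Group membership is assumed from the screenings named in the text.
HEALTH_FAIR = ("diabetes", "cholesterol", "osteoporosis")
MALE_CANCER = ("prostate", "colorectal")
FEMALE_CANCER = ("breast", "cervical", "colorectal")


def screening_index(screenings, tests):
    """screenings: dict mapping test name to True/False for one person-year.
    Returns the count of tests obtained and an at-least-one indicator."""
    count = sum(1 for test in tests if screenings.get(test, False))
    return {"count": count, "any": int(count > 0)}


# Hypothetical male employee in one program year.
person = {"diabetes": True, "cholesterol": True, "prostate": False}
chronic = screening_index(person, HEALTH_FAIR)
male_cancer = screening_index(person, MALE_CANCER)
print(chronic, male_cancer)
```

Collapsing correlated screenings into one index per group reduces the number of hypotheses tested, which is the motivation given in the text for using the indexes alongside the Bonferroni adjustment.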

Results

Difference-in-difference estimates: full sample

Table 6 reports the DD matching estimates for the comprehensive measures of healthcare cost, absenteeism, the three indices of screening behavior (chronic conditions, male-relevant cancers and female-relevant cancers), and the individual screening tests for the full sample. We report separate estimates for the first, second and third year of program operation.

Table 6 Difference in difference estimate of the impact of program participation with propensity score matching

                      EWP year 1                 EWP year 2                 EWP year 3
                      Coefficient  t statistic   Coefficient  t statistic   Coefficient  t statistic
Comprehensive measures
Healthcare cost       454.4        0.77          738.9        1.37          734.0        1.27
Hours absent          −0.78        0.34          −1.26        0.35          0.79         0.22
Cancer male           0.032        1.40          0.090*       3.04          0.098*       2.90
Cancer female         0.049        2.13          0.111*       4.33          0.155*       5.73
Screen (d/c/o)        0.051        3.62          0.102*       5.28          0.098*       4.56
Specific screenings
Prostate cancer       0.011        0.59          0.110*       3.54          0.119*       3.49
Cervical cancer       0.009        0.44          0.017        0.70          0.038        1.51
Breast cancer         0.055        2.33          0.132*       4.96          0.177*       6.58
Colorectal cancer     0.007        0.91          0.008        0.65          0.032        2.03
Osteoporosis          0.011        1.33          0.029        1.91          0.033        1.78
Diabetes              0.004        0.86          0.104*       8.61          0.109*       8.33
Cholesterol           0.046*       3.01          0.098*       4.79          0.100*       5.29

* Statistically significant at the 5 % level (two-tailed test for comprehensive measures; one-tailed test for intermediate goals; Bonferroni multiple comparisons method; critical t = 2.45)
n = 2,425 for measures relevant to the entire sample (healthcare cost, % of hours absent, colorectal cancer, osteoporosis, diabetes, cholesterol); n = 993 for the male-only test (prostate cancer); n = 1,432 for the female-only tests (cervical and breast cancers)

In the full sample, participation is not associated with changes in healthcare cost or absenteeism in any of the post-treatment years. The coefficients of healthcare cost are positive for all three years; however, they are not statistically significant at conventional levels. Two factors could potentially contribute to the fact that these coefficients are not statistically significant. First, it is well-known that healthcare cost distributions are typically highly skewed; hence a small number of extreme observations may be generating relatively high variance. One way to circumvent this problem is to exclude individuals in the top 5 % of the expenditure distribution. Second, employee turnover could be adding noise to the sample. We estimated the effect of participation on a restricted sample that excluded: (i) employees with the top 5 % of healthcare costs in each year and (ii) individuals who were not employed for the full four-year observation period. The DD results with propensity score matching for the outcome measures are presented in Appendix Table 10. We see a small (statistically significant at 5 %) increase in healthcare expenditure in the third year of the program for the restricted sample; however, the overall results are roughly comparable to the full sample results.

We also assess whether the program successfully achieved the short-term goal of inducing healthy behavior changes, such as increased participation in healthcare screenings. The full sample results indicate that program participation is associated with increased screening for chronic conditions and cancers relevant for females, during each of the three years. It is also associated with increased screening for cancers relevant for males during the second and third years. During the second year, which was most salient for the employer’s contract decision, EWP participation is associated with a 9 percentage point increase in the rate of cancer screening for males, an 11 point increase in this rate for females, and a 10 point increase in the rate of screenings for chronic conditions (see Table 6).

The results for the specific screening tests are also presented in Table 6. To test the significance of the estimates for specific screenings, we apply the Bonferroni correction for multiple comparisons. This is a conservative criterion that focuses on the probability that a false positive result will occur on at least one test, in a set of tests. We apply this criterion for the set of individual screenings that might occur in a specific year; this implies that the critical t statistic for a test at the 5 % level is t = 2.45.
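The text does not spell out how the critical value of 2.45 is obtained. One reading that reproduces it, offered here as an assumption, is a one-tailed test under the normal approximation with the 5 % level divided across the seven individual screenings. The function name and its defaults are hypothetical.

```python
from statistics import NormalDist


def bonferroni_critical_value(alpha=0.05, n_tests=7, two_tailed=False):
    """Critical value under the normal approximation after dividing
    the overall significance level across n_tests comparisons."""
    per_test_alpha = alpha / n_tests
    tail = per_test_alpha / 2 if two_tailed else per_test_alpha
    return NormalDist().inv_cdf(1.0 - tail)


critical = bonferroni_critical_value()  # one-tailed, seven screenings
critical_two_tailed = bonferroni_critical_value(two_tailed=True)
print(round(critical, 2), round(critical_two_tailed, 2))
```

Under these assumptions the one-tailed value is close to the 2.45 quoted above; a two-tailed version of the same adjustment would give a stricter cutoff.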

Using the Bonferroni significance criterion, there is no statistically significant effect of participation on cancer screenings during the first year. The results also indicate that participation in the wellness program is associated with increased screenings for cholesterol during the first program year (this screening was available during the employee Health Fair). (We should note that, since breast cancer and osteoporosis screenings were not coded in the pre-treatment period, our results for these outcomes are essentially cross-sectional matching estimates.)

During the second year of program operation, EWP participation is significantly associated with increased likelihood of screenings for prostate cancer (by 11 percentage points) and breast cancer (by 13.2 percentage points). Program participation is also associated with a 10.4 percentage point increase in diabetes screening, and a 9.8 percentage point increase in cholesterol screening. The third year of program participation was associated with similar increases in screenings. The magnitudes of the estimated impacts increased from the first year to the second year as the program matured, then remained stable from year 2 to year 3.

Distribution of EWP program impacts across demographic subgroups

To assess whether the wellness program benefits are clustered among demographic subgroups, we report results for four sets of split samples. We use the first three sets of split samples to assess whether EWP impacts are distributed across demographic groups that are relevant for assessing whether the vendor is successfully inducing participation across the employee population. The fourth split, by insurance type, permits assessment of potential interactions between the HMO/FFS systems for providing and funding healthcare, and the EWP incentives.

The DD estimates are detailed in Table 7 for the four sets of split samples for the third year of program operation. A significant positive impact on healthcare cost is observed in the subsamples with older individuals and FFS health insurance. (In addition, the magnitudes of the cost impacts are significantly larger for individuals with FFS coverage than for individuals enrolled in the HMO, and for older individuals compared with younger individuals. This is consistent with the hypothesis that the EWP is conceptually more likely to impact individuals enrolled in the FFS plan, rather than the HMO, because HMOs may offer some overlapping disease prevention/management features. Alternately, this result may reflect pricing issues.) Statistically significant impacts of participation on testing for diabetes and cholesterol are observed in all of the split samples, even after the Bonferroni correction (i.e., critical t statistic is 2.45 for 5 % level of significance). The impacts of participation on some cancer screenings are concentrated among specific subgroups: the increased probability of prostate screening is concentrated in the subsamples with above-average age, above-median income and HMO enrollment, while the increased probability of colorectal cancer screening is concentrated among older employees and employees with FFS health insurance.

These results indicate that recruitment and program design are reasonably well-aligned for this program, in the sense that individuals with pre-existing diabetes are more likely to participate than individuals who did not have this diagnosis prior to participation, and EWP participation is associated with increased rates of blood sugar testing. However, the fact that the impact of EWP participation is significantly associated with insurance type for colorectal screening and blood sugar testing (in opposite directions) raises questions about interactions among EWP design and the design of the health insurance plans offered to employees. Similarly, the fact that the impact of participation on the most costly screening (for colorectal cancer) is concentrated in the subgroup with above-median income (after controlling for age in the propensity score equation) could indicate a potential program design issue. After reviewing the results, for example, the subject employer indicated that it would use the evaluation results to assess the convenience of program events (both timing and location) for employees who work on diverse shifts and at diverse locations. This strategy, to use firm-specific EWP program evaluation information to support efforts to strengthen the program design, is potentially useful: the Aon (2011) survey of 1,000 employers shows that approximately half (52 %) use firm-specific data to influence wellness program design “significantly” or “moderately”, and 19 % do not use firm-specific data for this purpose at all.

Impact of increased propensity to obtain screenings on healthcare expenditures

The results reported above indicate that EWP participation is associated with: (i) significant increases in the screening rates for cancers and chronic conditions and (ii) positive (insignificant) coefficients for the healthcare cost variable. This pair of results raises the question: what proportion of the positive coefficient on the cost variable can be attributed to the new screenings? To address this question, we regressed healthcare cost on the screening variables, controlling for demographic and health characteristics, to estimate the statistical impact of screenings on cost in our data. These estimated coefficients include the cost of both the screening and additional procedures that may be triggered by positive test results. We used ordinary least squares estimation, hence it is possible that these estimated coefficients could be statistically biased, due to omitted variable bias and/or correlation with unobserved (especially time varying) characteristics. The coefficients of the demographic and health status variables have reasonable signs and significance (see Appendix Table 11). However, none of the screening variable coefficients are significant. Nonetheless, the estimates provide a useful benchmark. The results, reported in Table 8, indicate that the cost of the increased screenings induced by participation in the EWP accounts for approximately 46 % of the estimated year-3 coefficient on the healthcare cost variable. This suggests that the initial cost of inducing increased screening compliance is an important consideration when choosing an evaluation metric.
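The arithmetic behind Table 8 can be checked directly: each screening's contribution is the product of its estimated cost coefficient (column 1) and the estimated effect of EWP participation on the screening rate (column 2), and the contributions are summed. The sketch below uses the rounded values printed in Table 8; the dictionary and function names are illustrative, not the authors' code.

```python
# (column 1: impact of screening on annual cost,
#  column 2: impact of EWP participation on screening), as printed in Table 8
TABLE_8 = {
    "Prostate cancer": (-473.57, 0.119),
    "Cervical cancer": (302.68, 0.038),
    "Breast cancer": (994.38, 0.177),
    "Colorectal cancer": (-80.49, 0.032),
    "Osteoporosis": (1766.49, 0.033),
    "Diabetes": (1363.64, 0.109),
    "Cholesterol": (13.36, 0.1),
}


def screening_cost_impact(rows):
    """Per-screening cost impact of participation = column 1 x column 2."""
    return {name: cost * uptake for name, (cost, uptake) in rows.items()}


impacts = screening_cost_impact(TABLE_8)
total = sum(impacts.values())

for name, value in impacts.items():
    print(f"{name:<18} {value:8.2f}")
print(f"{'Sum':<18} {total:8.2f}")
```

Because the inputs are rounded, individual products can differ from the printed table in the last digit, but the sum agrees with the reported annual cost impact of 336.84.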

Table 7 Difference in difference estimate of the impact of program participation: sub-sample results for year 3

Measure            Statistic  Age <44  Age ≥44  Income    Income    FFS      HMO      Male     Female
                                                <median   ≥median

Comprehensive measures
Healthcare cost    Coeff.     −257.8   1512.8   819.7     644.8     1752.4   −678.1   212.7    535.1
                   t stat.    0.36     1.67*    1.01      0.78      2.21*    0.92     0.25     0.71
Hours absent       Coeff.     4.83     −2.46    −4.60     5.94      3.48     −3.34    6.56     0.180
                   t stat.    1.02     0.43     0.90      1.30      0.76     0.57     1.42     0.04
Prostate cancer    Coeff.     0.062    0.163    0.080     0.138     0.091    0.152    0.081    n/a
                   t stat.    1.71     3.24*    1.49      3.40*     2.16     3.19*    3.65*    n/a
Cervical cancer    Coeff.     0.024    0.048    0.034     0.051     0.032    0.052    n/a      0.039
                   t stat.    0.56     1.44     1.12      1.49      0.96     1.33     n/a      1.63
Breast cancer      Coeff.     0.169    0.202    0.163     0.179     0.155    0.190    n/a      0.173
                   t stat.    3.94*    7.66*    4.87*     4.23*     4.44*    4.92*    n/a      6.36*
Colorectal cancer  Coeff.     0.001    0.063    0.023     0.037     0.061    −0.008   0.014    0.035
                   t stat.    0.11     2.51*    1.21      1.79      2.86*    0.39     0.53     1.92
Osteoporosis       Coeff.     0.009    0.064    0.014     0.054     0.014    0.059    †        0.031
                   t stat.    0.65     2.27     0.57      1.87      0.49     2.17     †        1.69
Diabetes           Coeff.     0.107    0.111    0.111     0.110     0.045    0.192    0.081    0.120
                   t stat.    6.06*    6.10*    6.23*     6.46*     2.81*    8.84*    3.65*    8.04*
Cholesterol        Coeff.     0.113    0.083    0.108     0.091     0.073    0.134    0.073    0.107
                   t stat.    3.84*    3.07*    3.70*     3.18*     2.83*    4.71*    2.10     4.32*

* Statistically significant at the 5 % level; two-tailed test for comprehensive measures; one-tailed test for intermediate goals; Bonferroni multiple comparisons method; critical t = 2.45
† The source gives only one gender-split value for osteoporosis (0.031, t = 1.69); it is shown under Female here, but the extracted text does not indicate which column it belongs to.

Table 8 Computation of the estimate of the cost impact of additional screenings

Condition targeted    Column 1: impact    Column 2: impact of    Impact of EWP
by the screening      of screening on     EWP participation      participation on cost
                      annual cost         on screening           = col. 1 × col. 2

Prostate cancer       −473.57             0.119                  −56.35
Cervical cancer       302.68              0.038                  11.50
Breast cancer         994.38              0.177                  176.00
Colorectal cancer     −80.49              0.032                  −2.58
Osteoporosis          1766.49             0.033                  58.29
Diabetes              1363.64             0.109                  148.64
Cholesterol           13.36               0.100                  1.34

Sum = annual cost impact of screenings                           336.84

Conclusion

These results indicate that selection of the evaluation criteria may determine the result of the program evaluation, and therefore the decision to continue or terminate the program. Much of the academic literature on wellness programs focuses on ROI as the key metric, and concludes that EWPs can generate net savings. In contrast, employer survey results indicate that many programs do not achieve this result, and some actuarial analysts argue that short-term employer-specific analyses should focus instead on the short-term targets of employee engagement and behavior change. We analyze data from one mid-size employer, for the first three years of the EWP, using propensity score matching to compute DD estimates of the impact of participation on employee screening decisions. We find that the estimated impact of the program on healthcare cost is insignificant, but positive: participation in the program is associated with an (insignificant) increase in healthcare expenditures. This result is consistent with the actuarial evidence, indicating that preventive screening programs require an initial investment, with the promise of generating savings in later years. However, we also find that the program vendor is managing the program well, in the sense that the program successfully induced 40 % of the employees to participate, and it successfully generated increased screening rates among the participants. Coupled with the USPSTF conclusion that most of these screening tests are likely to generate net benefits (possibly, over a period of years), the vendor’s success in inducing increased screening participation indicates that the EWP is likely to generate benefits over time. The individual’s current employer may enjoy a portion of those benefits, if the organization continues to offer health insurance, and employee turnover is low.

The employer’s decision to either terminate or continue the program, therefore, hinges on the specification of the evaluation criteria. If the employer’s goal is to achieve a positive ROI within a few years after implementing an EWP, these results suggest that the program should be terminated. In contrast, the employer will continue the program if it (i) relies on externally-generated evidence to demonstrate that compliance with wellness and prevention recommendations will generate long-term savings, and (ii) focuses firm-specific analysis on the impact of the program on employee wellness behaviors.

Appendix

See Tables 9, 10, and 11.


Table 9 Normalized differences for the year prior to EWP implementation*

                                                   Participants          Non-participants
Variable            Definition                     Mean       SD         Mean       SD          Norm. diff.**

Long-run outcomes
Healthcare exp.     Ann. pd claims/encounters      3063.25    6222.42    3732.69    10181.03    −0.056
Hours absent        Annual hours pd time absent    83.58      72.30      96.00      91.54       −0.106

Demographic characteristics
Gender              1 if female; 0 if male         0.722      0.448      0.501      0.500       0.329
Age                 age                            44.260     9.671      45.317     10.345      −0.075
Married             1 if married, 0 otherwise      0.642      0.475      0.641      0.476       0.002
Wk_pay              weekly pay                     1101.034   345.607    1118.625   377.575     −0.034
HMO                 1 if member HMO; 0 if FFS      0.435      0.496      0.462      0.499       −0.038

Diagnoses
Diabetic condition  1 if ICD-9 code; 0 otherwise   0.296      0.457      0.311      0.463       −0.024
Mental illness      1 if ICD-9 code; 0 otherwise   0.173      0.379      0.166      0.372       0.013
Bone condition      1 if ICD-9 code; 0 otherwise   0.414      0.493      0.434      0.496       −0.029
Hypertension        1 if ICD-9 code; 0 otherwise   0.252      0.434      0.297      0.457       −0.071
Asthma              1 if ICD-9 code; 0 otherwise   0.059      0.237      0.063      0.243       −0.010
Cancer              1 if ICD-9 code; 0 otherwise   0.045      0.208      0.041      0.199       0.014
Pregnancy           1 if ICD-9 code; 0 otherwise   0.030      0.171      0.030      0.171       0.001
HIV/AIDs            1 if ICD-9 code; 0 otherwise   0.000      0.000      0.000      0.000
Hepatitis           1 if ICD-9 code; 0 otherwise   0.005      0.071      0.011      0.105       −0.048

Sample size                                        993                   1,432

* Variables that will be included in the propensity score equation
** Normalized difference = (participant mean − non-participant mean)/sqrt(sum of the two variances), as defined by Imbens and Wooldridge (2010)
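The normalized difference defined in the footnote to Table 9 can be recomputed from the reported means and standard deviations. The following Python sketch applies that formula to two rows of the table; the function name is illustrative, and small discrepancies in the third decimal are possible because the inputs are rounded.

```python
import math


def normalized_difference(mean_p: float, sd_p: float,
                          mean_n: float, sd_n: float) -> float:
    """(participant mean - non-participant mean) / sqrt(sum of the two variances)."""
    return (mean_p - mean_n) / math.sqrt(sd_p ** 2 + sd_n ** 2)


# Inputs taken from Table 9 (participants first, then non-participants).
gender = normalized_difference(0.722, 0.448, 0.501, 0.500)
healthcare_exp = normalized_difference(3063.25, 6222.42, 3732.69, 10181.03)

print(round(gender, 3))
print(round(healthcare_exp, 3))
```

Gender is the only covariate in Table 9 with a large normalized difference, which is the kind of imbalance that matching on the propensity score is intended to address.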


Table 10 Difference in difference estimate of the impact of program participation

                   Year 1                      Year 2                      Year 3
                   Coefficient   t statistic   Coefficient   t statistic   Coefficient   t statistic

Comprehensive measures
Healthcare cost    −111.2        0.28          372.8         0.98          48.3          0.13
Hours absent       −2.67         0.91          −3.36         0.71          −0.24         −0.6
Cancer male        0.049         1.63          0.060         1.35          0.068         1.38
Cancer female      0.058         2.12*         0.096         2.72*         0.148         3.73*
Screen (d/c/o)     0.022         1.06          0.059         2.32*         0.057         1.89*

* Results are for a restricted sample that excludes individuals who were not employed for the full time period and individuals with the top 5 % of healthcare costs in each year
* Statistically significant at the 5 % level (two-tailed test for comprehensive measures; one-tailed test for intermediate goals)
n = 1,816 for measures relevant to the entire sample (healthcare cost, % of hours absent, colorectal cancer, osteoporosis, diabetes, cholesterol); n = 753 for the male-only test (prostate cancer); n = 1,063 for the female-only tests (cervical and breast cancers)

Table 11 Impact of individual characteristics and screening tests on cost

Dependent variable: healthcare cost

                           Coefficient   Standard error   t statistic

Female                     −535.6258     1035.184         −0.52
Age                        8.982805      145.5489         0.06
Age sq                     0.339821      1.661134         0.2
Married                    22.78439      441.671          0.05
Ln (wkly pay)              −0.167269     0.6664037        −0.25
HMO                        825.6703      486.3354         1.7

Diagnoses
Diabetes                   −1092.232     783.9723         −1.39
Mental health condition    1139.89       507.0803         2.25
Bone/joint condition       1022.834      523.2221         1.95
High cholesterol           1804.186      601.5565         3
Asthma                     3420.587      1809.308         1.89
Cancer                     8292.089      3003.397         2.76
Trauma                     3994.727      883.3016         4.52
Pregnancy                  4120.538      823.4665         5
HIV/AIDs                   26924.24      22282.56         1.21
Hepatitis                  10870.79      8471.199         1.28

Screenings
Cancer
Prostate cancer            −473.5683     921.5218         −0.51
Cervical                   302.6813      625.9315         0.48
Breast                     994.3753      1509.641         0.66
Colorectal                 −80.49063     658.7774         −0.12

Chronic conditions
Diabetes                   1363.636      1369.78          1
Cholesterol                13.35859      655.6383         0.02
Osteoporosis               1766.486      2357.003         0.75

_cons                      −569.0456     3210.315         −0.18
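The t statistics in Table 11 are the ratio of each coefficient to its standard error. The short Python sketch below recomputes them for the screening variables and compares them with a conventional two-sided 5 % critical value of 1.96. That threshold is an assumption made here for illustration, since the table does not state the degrees of freedom; the variable names are likewise illustrative.

```python
# (coefficient, standard error) for the screening variables in Table 11
SCREENING_ROWS = {
    "Prostate cancer": (-473.5683, 921.5218),
    "Cervical": (302.6813, 625.9315),
    "Breast": (994.3753, 1509.641),
    "Colorectal": (-80.49063, 658.7774),
    "Diabetes": (1363.636, 1369.78),
    "Cholesterol": (13.35859, 655.6383),
    "Osteoporosis": (1766.486, 2357.003),
}

CRITICAL_T = 1.96  # assumed two-sided 5 % threshold (normal approximation)

# t statistic = coefficient / standard error
t_stats = {name: coef / se for name, (coef, se) in SCREENING_ROWS.items()}
significant = [name for name, t in t_stats.items() if abs(t) > CRITICAL_T]

for name, t in t_stats.items():
    print(f"{name:<16} t = {t:5.2f}")
print("Significant screening variables:", significant)
```

Every recomputed ratio falls well below the assumed threshold, in line with the statement in the text that none of the screening variable coefficients are significant.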

References

Aon/Hewitt. (2011). 2011 Health Care Survey. http://img.en25.com/Web/AON/Aon%20Hewitt%20Health%20Care_Survey_2011_Final%5B1%5D.pdf. Accessed 14 May 2013.

Baicker, K., Cutler, D., & Song, Z. (2010). Workplace WPs can generate savings. Health Affairs, 29(2), 1–8.

Becker, S. O., & Ichino, A. (2002). Estimation of average treatment effects based on propensity scores. The Stata Journal, 2, 358–377.

Buck Surveys. (2010). Working well: A global survey of health promotion and workplace wellness strategies. https://www.bucksurveys.com/bucksurveys/product/tabid/139/p-51-working-well-a-global-survey-of-health-promotion-and-workplace-wellness-strategies.aspx. Accessed 14 May 2013.

Crump, R., Hotz, V. J., Imbens, G., & Mitnik, O. (2008). Dealing with limited overlap in estimation of average treatment effects. Biometrika, 96(1), 187–199.

Disease Management Association of America and National Manufacturers’ Association. (2007). Wellness, disease and care management: Background for developing a business strategy. An employer toolkit (pp. 72).

Edington, D. (2009). Zero trends: Health as a serious economic strategy. Ann Arbor, MI: University of Michigan Press.

Eichner, M. J., McClellan, M. B., & Wise, D. A. (1997). Health expenditure persistence and the feasibility of medical savings accounts. In J. M. Poterba (Ed.), Tax policy and the economy (Vol. 11, pp. 91–128). Cambridge, MA: MIT Press.

Fitch, K. (2008). Wellness lessons learned from the private sector. Insight: Expert thinking from Milliman. www.Milliman.com.

Fitch, K., & Pyenson, B. (2008). Taking stock of wellness. Benefits Quarterly, 24(2), 34–40.

Goetzel, R., Pei, X., Tabrizi, M., Henke, R., Kowlessar, N., Nelson, C., et al. (2012). Ten modifiable health risk factors are linked to more than one-fifth of employer-employee health care spending. Health Affairs, 31(11), 2474–2484.

Gross, P. (2012). Editorial: Process versus outcome measures: the end of the debate. Medical Care, 50, 200–202.

Imbens, G., & Rubin, D. (2007). Causal inference: Statistical methods for estimating causal effects in biomedical, social, and behavioral sciences. Cambridge: Cambridge University Press.

Imbens, G., Rubin, D., & Sacerdote, B. (2001). Estimating the effect of unearned income on labor supply, earnings, savings and consumption: Evidence from a survey of lottery players. American Economic Review, 91, 778–794.

Imbens, G. W., & Wooldridge, J. M. (2010). Recent developments in the econometrics of program development. Journal of Economic Literature, 47(1), 5–86.

Leuven, E., & Sianesi, B. (2003). PSMATCH2: Stata module to perform full Mahalanobis and propensity score matching, common support graphing, and covariate imbalance testing (version 3.0.0). Boston, MA: Department of Economics, Boston University. Retrieved from http://ideas.repec.org/c/boc/bocode/s432001.html. Accessed 14 May 2013.

Loeppke, R., Edington, D., & Beg, S. (2010). Impact of a prevention plan on employee health risk reduction. Population Health Management, 13(5), 275–284.

Loeppke, R., Nicholson, S., Taitel, M., Sweeney, M., Haufle, V., & Kessler, R. (2008). The impact of an integrated population health enhancement and disease management program on employee health risk, health conditions, and productivity. Population Health Management, 11(6), 287–296.

Nicholson, S., Pauly, M., Polsky, D., Baase, C., Billotti, B., Ozminkowski, R., et al. (2005). How to present the business case for healthcare quality to employers. Working paper. http://knowledge.wharton.upenn.edu/papers/1303.pdf. Accessed 14 May 2013.

O’Donnell, M. (2010). Federal grants to small employers for comprehensive wellness programs: Passed as part of health care reform. Editor’s Notes. American Journal of Health Promotion, 25(1), iv–v.

Pfaffenberger, J. H., & Patterson, R. C. (1987). Statistical methods for business and economics. Homewood, IL: Irwin.

Pyenson, B., & Zenner, P. (2005). Cancer screening: payer cost/benefit through employee benefits programs. Commissioned by C-Change and the American Cancer Society. Milliman Consultants and Actuaries.

Rubin, D. (2006). Matched sampling for causal effects. Cambridge: Cambridge University Press.

Sekhon, J. S. (2009). Opiates for the matches: Matching methods for causal inference. Annual Review of Political Science, 12(1), 487–508.

Serxner, S., Gold, D., Meraz, A., & Gray, A. (2009). Do employee health management programs work? American Journal of Health Promotion, 23(4), 1–8.

Serxner, S., Noeldner, S. P., & Gold, D. (2006). Best practices for an integrated population health management (PHM) program. American Journal of Health Promotion, 20(Suppl. 5), 1–10.

Sing, M. (2004). Using encounter data from Medicaid HMOs for research and monitoring. Inquiry, 41(3), 336–346.

Smith, J., & Todd, P. (2005). Does matching overcome LaLonde’s critique of non-experimental estimators? Journal of Econometrics, 125, 305–353.

USPSTF. (2013). United States Preventive Task Force A and B recommendations. Retrieved April 13, 2013 from http://www.uspreventiveservicestaskforce.org/uspstf/uspsabrecs.htm.

Weltz, S. (2009). Is your wellness program cost effective? Benefits Perspectives. Fall, 1–3. http://publications.milliman.com/periodicals/bp/pdfs/BP11-24-09.pdf.
