Craig’s Festschrift
Stability of Stated Preferences, Nursing Workforce in Australia
by Denise Doiron, UNSW Australia
Hong Il Yoo, Durham University UK
This research was partially supported by ARC grant DP0881205. We thank the research team at UTS: Jane Hall, Deborah Street and Patsy Kenny, and Agne Suziedelyte (City University London, UK).
Introduction
• Motivation for the study: to understand the job preferences of junior nurses.
• A dedicated and specialised longitudinal survey was conducted and analysed.
• Comparisons across waves and methods uncovered a puzzling result that may have wider repercussions in terms of the elicitation and estimation of preferences involving monetary values.
• We found:
  – Substantial stability in the weights on non-salary job attributes across elicitation methods (same wave, same people).
  – Salary weights differ across methods (same wave, same people).
  – Substantial stability in the weights on non-salary job attributes across time periods (same method, same people).
  – Salary weights differ across waves (same method, same people).
Introduction
• The stability in weights on non-salary attributes is reassuring as it supports the usefulness of stated preference methods generally and in particular for policy recommendations.
• But the instability in the MU of income is concerning, especially since it is used in the calculation of WTP measures. WTPs are the main measures used in comparisons across studies and in policy recommendations.
Introduction
• Concerns over nursing workforce shortages have led to research on nurses' job preferences and choices.
• Causal factors of projected shortages lie on both the demand and supply sides.
• The policy approach has been to increase the number of training places and salaries; however, research shows the high importance of non-salary factors.
• Attrition rates are very high, especially among junior nurses.
• A lack of appropriate data is a challenge for researchers: administrative data contain restricted information, while survey data are usually not specific to nurses, so samples are small and information is not specific to nursing jobs.
• Papers: Frijters et al 2011, Shields and Ward 2001, Doiron et al 2008, Cunich and Whelan 2010, Nooney et al 2010, Aiken et al 2002 (for surveys see Oulton 2006, Aiken et al 2012).
Survey
• Designed and conducted a longitudinal survey among nursing students and new graduates.
• 2 large nursing training programs in NSW: University of Technology Sydney (urban) and University of New England (rural).
• Intensive recruiting campaign; online surveys incentivised with donations to Médecins Sans Frontières.
• 2 components:
  – survey questions on personal and job characteristics and attitudes
  – discrete choice experiments to elicit job preferences
• Timeline:
  – Recruitment: 2008-2010
  – Wave 1 questionnaires: Sept 2009 - July 2011
  – Wave 2 questionnaires: April 2011 - August 2012
  – Later waves not analysed yet
Survey
• Wave 1 and wave 2 completion dates are at least 1 year apart for each respondent, and 15 months apart on average.
• Sample sizes: 628 persons answered wave 1; 241 persons answered waves 1 and 2.
• Just under 50% of the student body registered for the survey and around 20% completed the online survey (wave 1). Similar proportions for both universities.
• Representativeness relative to the BN national student body:
  – similar gender composition
  – slightly younger and more English speaking
Stated Preference Methods
• Historically used in marketing, transport, and environmental economics. Growing literature in health economics.
• Advantages: provides evaluation of
  – new products or policies
  – goods or services where markets do not exist or function well
  – goods or services with little observed variation
• In this case, nursing jobs are highly regulated and there is little variation. Often the only "revealed" choice open to workers is to leave nursing altogether.
• Challenges:
  – data collection and design (importance of incentivisation, use of pilots)
  – external validity (relation between revealed and stated preferences)
  – little is known about best methodology
• Applications in health economics: de Bekker-Grob et al 2010, Lagarde and Blaauw 2009 (surveys of DCEs), Sivey et al 2010 (application to doctors in Australia).
Stated Preference Instruments
• The older method of open-ended questions (contingent valuation) has been replaced by Discrete Choice Experiments (DCEs), where the stated valuations are not so directly elicited.
• In DCEs, the respondent is asked to choose between two or more options in repeated scenarios or choice tasks.
• We use a variant, the best-worst profile case (henceforth multi-profile or best-worst job, BWJ), where the respondent chooses the best and the worst of 3 job profiles.
• Each respondent sees 8 BWJ scenarios.
• The choice of attributes is based on literature identifying important aspects of nursing jobs (e.g. magnet hospital research in the US). Salary levels correspond to the range observed in entry-level nursing jobs in Australia.
• This application would be considered a complex task in the literature due to the number of attributes used.
Example of best-worst job task
Stated Preference Instruments
• We also use a recently proposed experiment: best-worst attribute, henceforth single-profile or BWA.
• In BWA, the respondent chooses the best and worst attribute in a single job profile.
• Each respondent sees 8 BWA scenarios (in addition to the 8 BWJ choice tasks).
• We did not include an opt-out choice, but we did include a yes-no "would you take this job?" question in the BWA scenarios.
• For both experiments, the variation in attribute levels across scenarios was chosen by D. Street, according to a design that is optimal under specific modelling assumptions.
• A pilot study led to some changes but also supported the instruments used.
Example of best-worst attribute-level task
BWJ vs BWA
• Cognitive demands are believed to be lessened under BWA.
• Identification and estimation results differ:
  – In BWJ, the model is identified from variation in levels within attributes; i.e. the utility function is normalised based on a profile or job. Estimated preference weights provide valuations of the variation in attribute levels.
  – In BWA, the model is identified from variation across attributes as well as across levels of the attributes; i.e. the objective function is normalised based on one attribute level. Estimated preference weights provide valuations of the attribute levels relative to a base attribute level. More preference information is identified in BWA.
• In both cases, we can analyse either the best choice only or the best and worst choices.
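As a concrete illustration of the probabilities behind the best-and-worst choices, here is a minimal sketch of the max-diff model (Marley and Louviere 2005), under which the probability of naming alternative i best and j worst is proportional to exp(v_i - v_j). The function name and utility values are illustrative, not the paper's estimates.

```python
import math
from itertools import permutations

def best_worst_probs(v):
    """Best-worst choice probabilities for one scenario under the
    max-diff model: the probability of naming alternative i best and
    j worst is proportional to exp(v[i] - v[j]).

    v holds the deterministic utilities: one per job profile in BWJ,
    one per attribute level in BWA.
    """
    pairs = list(permutations(range(len(v)), 2))
    weights = [math.exp(v[i] - v[j]) for i, j in pairs]
    total = sum(weights)
    return {pair: w / total for pair, w in zip(pairs, weights)}

probs = best_worst_probs([1.0, 0.5, -0.2])
best_pair = max(probs, key=probs.get)  # the pair with the largest utility gap
```

Note how the same kernel serves both instruments: only the interpretation of the utilities (whole profiles vs attribute levels) changes.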
Stability
• A comparison of preference weights across waves and across methods.
• To the best of our knowledge, only 3 papers have made comparisons across the two methods. Our earlier paper was the only one to include a monetary attribute.
• There is a very small literature on comparisons of stated preference weights over time.
• In our case, the time period between waves would be considered long (most published studies have time spans of a few weeks at most).
• The attributes and their levels are identical across waves except for salary. The range of salary levels was raised to reflect the actual salaries for entry-level nursing jobs. This complicates the comparison across time and has led to much sensitivity analysis.
Econometric specifications
• Likelihood models are built around the logit kernel with type 1 extreme value distributions for the errors.
• Models differ based on:
  – The use of best AND worst choices (vs best only)
    o BWJ: heteroskedastic rank-ordered logit (Hausman and Ruud 1987)
    o BWA: max-diff (Marley and Louviere 2005, Marley et al 2008)
  – The modelling of unobserved heterogeneity
    o Mixed logits (McFadden and Train 2000)
    o GMNL (Fiebig et al 2010)
    o Latent class logits (Train 2008)
• In previous papers, we have developed variants combining these models (e.g. the latent class heteroskedastic rank-ordered logit, the latent class max-diff).
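The rank-ordered ("exploded") logit underlying the BWJ models can be sketched as follows: with i.i.d. type 1 extreme value errors, the probability of a full ranking decomposes into a product of sequential conditional-logit choices over shrinking choice sets. The function and utilities below are illustrative, not the estimation code.

```python
import math
from itertools import permutations

def rank_ordered_logit_prob(v, ranking):
    """Probability of observing a full ranking under the rank-ordered
    ("exploded") logit (Hausman and Ruud 1987): the product of
    sequential conditional-logit choices over shrinking choice sets.

    v: deterministic utilities; ranking: indices from most to least
    preferred.
    """
    remaining = list(range(len(v)))
    prob = 1.0
    for chosen in ranking[:-1]:          # the final "choice" is forced
        denom = sum(math.exp(v[k]) for k in remaining)
        prob *= math.exp(v[chosen]) / denom
        remaining.remove(chosen)
    return prob

# ranking probabilities over all orderings form a proper distribution
v = [0.8, 0.1, -0.4]
total = sum(rank_ordered_logit_prob(v, list(r))
            for r in permutations(range(3)))
```

For a 3-profile BWJ task, choosing a best and a worst profile pins down the full ranking, which is why this kernel applies directly.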
Econometric specifications
• In the current paper, we estimate a model that combines both BWJ and BWA data as well as the first two waves of the survey. This allows us to
  – directly test hypotheses across methods and waves
  – link the unobserved heterogeneity directly across methods and waves.
• These form new applications and, in some cases, extensions of various latent class and heteroskedastic models.
• We also look at observed heterogeneity by interacting various characteristics with the job attributes.
Stability across methods, Wave 1
• Based on Yoo and Doiron, Journal of Health Economics, 2013.
• See Figure 3 for main findings (based on the latent class heteroskedastic rank-ordered logit and the latent class max-diff).
• Respondents place less weight on salary in BWA compared to BWJ.
• Non-salary attributes are similar in the two methods.
• Error variance is smaller in the BWA method (more certain choices).
• Other findings:
  – Main result not driven by specific latent classes.
  – Unlikely to be due to differences in cognitive demands (accept/reject results align with the BWJ rather than the BWA choices).
• Our conjecture is that the overt comparison between salary and other attributes leads to a dampening of the weight on salary.
• The previous 2 papers comparing methods (on quality of life) find the preference weights similar across methods, but they did not have a monetary attribute (Potoglou et al 2011, Flynn et al 2013).
BWT(BWA) vs BWL(BWJ): Average estimates
Stability across waves, BWJ
• Based on Doiron & Yoo, Health Economics, 2016.
• See Figure 1 (based on latent class logits using the best choice only).
• Preference weights are generally consistent across the two waves; the biggest shift is in salary, which has a lower valuation in wave 2.
• Sensitivity analysis with various salary specifications suggests this is not due to price vector effects (e.g. dummy variable specifications, comparisons of nominal and real salary levels).
• Keeping the weight on salary constant across waves, the average transfer error (average percentage change in MWTP) between waves is 11%, well within the range in studies that find temporal stability of utility weights.
• Papers on temporal stability: Schaafsma et al 2014, Liebe et al 2012, Skoldorg et al 2009, Czajkowski et al 2014. Generally, studies find that weights shift more with longer time spans.
• Ours was the first paper on the stability of job preferences.
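The transfer-error calculation referred to above can be sketched as follows, taking MWTP as the ratio of an attribute weight to the salary weight held fixed across waves. The attribute names and numbers are made up for illustration; they are not the paper's estimates.

```python
def mean_transfer_error(beta_w1, beta_w2, beta_salary):
    """Average absolute percentage change in MWTP between waves,
    holding the salary weight fixed across waves. MWTP for an
    attribute is its utility weight divided by the salary weight.
    """
    changes = []
    for attr, b1 in beta_w1.items():
        mwtp1 = b1 / beta_salary
        mwtp2 = beta_w2[attr] / beta_salary
        changes.append(abs(mwtp2 - mwtp1) / abs(mwtp1) * 100.0)
    return sum(changes) / len(changes)

# illustrative wave 1 and wave 2 weights for two hypothetical attributes
err = mean_transfer_error({"flex_rost": 2.0, "well_staff": 4.0},
                          {"flex_rost": 2.2, "well_staff": 3.6},
                          beta_salary=0.005)
# err is 10.0 (a 10% average transfer error)
```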
Wave 2 vs Wave 1, BWJ, Average estimates
Importance of preference weights on salary
• The instability of the estimates on salary is especially concerning since this coefficient is used to convert utility weights into monetary values or willingness-to-pay (WTP). These are used to interpret and compare results.
Figure 2 from Doiron & Yoo 2016.
New results: Stability of preferences using the BWA method
• BWA estimates had less error variance in the wave 1 comparison with BWJ. Does the method yield more stability over time? Is the volatility in the MU of income also found with BWA?
• To the best of our knowledge, only one other paper has looked at the stability of weights using this method:
  – Islam and Louviere 2015 look at 3 products (toothpaste, pizza, detergent) over 4 waves in 2 years, with a time span between waves of 3 to 6 months.
• The Islam & Louviere paper looks at aggregate preferences only (i.e. they perform tests on the proportions of choices as best) and finds very strong preference stability. Price was included and its rank was consistent across waves. No model using individual data was estimated.
Prelim results, BWA, waves 1 & 2 comparisons
• The following estimates are based on conditional logits and heteroskedastic conditional logits using the best attribute choice only. (We use the clogithet command written by Hole.)
• Several salary specifications are used; a linear specification could not be rejected when compared to the most general (dummy variable) specification. The log specification did marginally better than the linear model in terms of pseudo-likelihoods.
• Salary comparisons across waves were made under ordinal and cardinal assumptions.
• No. obs = 45204; No. indivs = 236 (balanced sample).
• All standard errors and test statistics are clustered by individual.
Prelim results, BWA, waves 1 & 2 comparisons
• Salary is the only attribute whose weight shifts significantly (at 5%) across the waves. In the linear (log linear) salary specification, only the intercept (constant) shift is significant.
• Joint Wald tests of shifts in attribute weights, by salary specification:

  Salary specification:   Dummy variables      Linear               Log linear
                          χ2(dof), p-value     χ2(dof), p-value     χ2(dof), p-value
  All attributes          45.77(25), 0.007     166.78(23), 0.000    145.01(23), 0.000
  All excl. salary        30.33(21), 0.086     31.76(21), 0.062     31.67(21), 0.063
  Salary                  14.46(4), 0.006      13.42(2), 0.001      14.23(2), 0.001
  Pseudo-ll               -6250.5032           -6253.6238           -6253.4682
  No. coeffs              50                   46                   46
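The joint statistic behind these tests is the usual Wald form d'V⁻¹d, where d collects the estimated wave shifts and V is their clustered covariance matrix, compared against a chi-squared with dim(d) degrees of freedom. A dependency-free sketch (not the actual estimation code; the inputs below are placeholders):

```python
def wald_stat(d, V):
    """Wald statistic d' V^{-1} d for H0: d = 0, where d is the vector
    of estimated shifts (wave 2 minus wave 1) and V their (clustered)
    covariance matrix. Solves V x = d by Gaussian elimination with
    partial pivoting to avoid an explicit inverse.
    """
    n = len(d)
    A = [row[:] + [d[i]] for i, row in enumerate(V)]  # augmented matrix
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(A[r][col]))
        A[col], A[piv] = A[piv], A[col]
        for r in range(col + 1, n):
            f = A[r][col] / A[col][col]
            for c in range(col, n + 1):
                A[r][c] -= f * A[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (A[r][n] - sum(A[r][c] * x[c]
                              for c in range(r + 1, n))) / A[r][r]
    return sum(d[i] * x[i] for i in range(n))

# toy example: two correlated shift estimates
stat = wald_stat([1.0, 1.0], [[2.0, 1.0], [1.0, 2.0]])
```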
[Figure: Salary weights, dummy variable and linear specifications. Dummy-variable weights at the four salary levels: wave 1 = 2.287, 3.798, 4.768, 5.503; wave 2 = 1.094, 1.969, 3.384, 4.688. Fitted linear slopes: 0.0066 (wave 1), 0.0062 (wave 2). x-axis: weekly salary ($800-$1500); y-axis: preference weights (0-7).]
[Figure: BWA preference weights, wave 1 vs wave 2 (linear salary specification). Scatter of wave 2 against wave 1 weights for the non-salary attribute levels (priv_hosp, publ_hosp, three_rot, ft_hours, flex_hours, inflex_rost, flex_rost, short_staff, well_staff, unsupp_mgmt, supp_mgmt, poor_equip, well_equip, no_encourage, encourage, limit_park, abund_park, excess_resp, approp_resp, poor_qual, excel_qual) and salary (at $1100); trendline without salary: slope = 0.9, intercept = 1.0.]
• In heteroskedastic models, after allowing for shifts in the salary variables, the hypothesis of homoskedasticity cannot be rejected (LM test in the linear specification has a p-value of 0.7624).
Overall findings, BWA, comparisons across waves
• For non-salary attributes: an amazing level of stability in preference weights across a relatively long time span and for a complex task.
• Relative weight on salary shifts down substantially over time (consistent with previous BWJ results). The shift is fairly constant across the range in salary.
• No indication of a shift in the error variance across waves; when allowing shifts in salary the hypothesis of homoskedasticity cannot be rejected. In this sense, there is no more or less uncertainty in preferences in the two waves.
Comparisons across methods
• We estimate models with both BWJ and BWA data.
• Three main specifications are used, allowing for shifts across experiments:
  – Additive shifts in coefficients
  – Heteroskedastic models with a shift in the error variance
  – Hybrid models with shifts in the error variance and additive shifts in the salary coefficients
• Three salary specifications are used:
  – Dummy variables
  – Linear salary
  – Log-linear salary
Comparisons across methods
• General models allowing additive shifts across methods and waves are estimated.
• In BWJ:
  – Salary weights are better fitted with log than linear functions.
  – A few significant shifts across time in non-salary attributes, although the shifts are generally small.
• Is the shift in salary as large in BWJ as it is in BWA?
• Estimated drops are 41% in BWJ and 64% in BWA. A non-linear Wald-type test suggests that the relative shifts are significantly different. (A test of the equality of the relative shifts across waves in salary, evaluated at a salary of 1100 and based on the model with additive shifts in all coefficients and dummy variables for salary, yields Χ2(1) = 6.9, p-value = 0.0086.)
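The quantities involved can be sketched directly: the relative drops follow from the salary coefficients and their wave shifts in the pooled clogit table later in the deck, and a first-order delta method supplies the variance needed for a non-linear Wald test. The gradient and covariance values below are placeholders, not the estimated ones.

```python
def relative_drop(beta_w1, shift):
    """Relative drop in the salary weight across waves: -shift / beta_w1,
    where shift is the (negative) wave 2 minus wave 1 coefficient change."""
    return -shift / beta_w1

def delta_method_var(grad, V):
    """First-order delta-method variance grad' V grad for a smooth
    function of the estimates, used to build a non-linear Wald test."""
    n = len(grad)
    return sum(grad[i] * V[i][j] * grad[j]
               for i in range(n) for j in range(n))

# coefficients from the pooled clogit table (salary at $1100):
drop_bwj = relative_drop(0.930, -0.382)   # ~0.41: the 41% BWJ drop
drop_bwa = relative_drop(2.481, -1.606)   # ~0.65: the ~64% BWA drop
```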
[Figure: Salary weights, BWJ data, dummy variable specification. Weights at the four salary levels (base level normalised to 0): wave 1 = 0, 0.521, 0.930, 1.059; wave 2 = 0, 0.549, 0.751, 0.987. x-axis: weekly salary ($800-$1600); y-axis: preference weights (0-1.2).]
Clogit with additive shifts in all coefficients across waves and methods, dummy variable specification for salary:

                   BWA                    BWJ                    BWA-BWJ
  choice           Coef. W1    W2-W1      Coef. W1    W2-W1      Coef. W1    W2-W1
  publ_hosp        1.212***    -0.113     0.332***    -0.019     0.881**     -0.094
  three_rot        4.795***    -0.885     0.208***    -0.009     4.587***    -0.876
  flex_hours       2.261***    -0.467     0.110**      0.069     2.151***    -0.536
  flex_rost        4.608***     1.707     0.666***    -0.141*    3.942***     1.848
  well_staff       4.448***     0.708     0.438***     0.171**   4.010***     0.537
  supp_mgmt        4.811***     0.507     1.207***    -0.153     3.604***     0.659
  well_equip       4.572***    -0.132     0.529***    -0.249***  4.043***     0.117
  encourage        5.131***    -0.548     0.550***     0.089     4.581***    -0.637
  abund_park       3.625***    -0.816     0.046        0.169**   3.579***    -0.985
  approp_resp      3.087***     1.048     0.517***    -0.087     2.570***     1.135
  excel_qual       4.947***     0.363     1.051***    -0.182*    3.896***     0.546
  salary 1100      4.768***    -2.800***
    normalized     2.481***    -1.606***  0.930***    -0.382***  1.551***    -1.225**
  loglikelihood    -6250.5                -3029.26               -9279.76
  No. obs.         45204                  11244                  56448
  No. inds.        236                    236                    236
  No. pars.        50                     28                     78

*** indicates significant at 1%, ** at 5% and * at 10%. All standard errors are clustered at the individual level. Coefficients for BWA are normalised to be comparable to BWJ coefficients.
Heteroskedastic models
• These models are parsimonious (useful for models with heterogeneity).
• Hybrid models with additive shifts in salary perform fairly well.
• The estimated shifts in error variances, with tests of H0: ratio = 1:
  – Var BWJ W1 / Var BWJ W2 = 0.906, Χ2(1) = 2.76, p-value = 0.097
  – Var BWJ W1 / Var BWA W1 = 6.415, Χ2(1) = 122.5, p-value = 0.000
  – Var BWA W1 / Var BWA W2 = 1.001, Χ2(1) = 0.00, p-value = 0.9878
• Some weak evidence of heteroskedasticity across waves in BWJ, none in BWA.
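The scale shifts above can be illustrated with a minimal sketch of heteroskedastic logit probabilities: utilities are multiplied by a scale factor, inversely related to the error standard deviation, before the logit transform, so a larger scale (smaller error variance) sharpens the implied choices. The scale and utility values here are illustrative, not the estimates.

```python
import math

def scaled_logit_probs(v, scale):
    """Logit choice probabilities when the error variance differs
    across data sources (methods or waves): utilities in v are
    multiplied by a scale factor before the logit transform."""
    e = [math.exp(scale * x) for x in v]
    total = sum(e)
    return [x / total for x in e]

v = [1.0, 0.0]
low_noise = scaled_logit_probs(v, 2.5)   # smaller error variance: decisive choices
high_noise = scaled_logit_probs(v, 1.0)  # larger error variance: flatter choices
```

Only the ratio of scales across data sources is identified, which is why the results above are reported as variance ratios with H0: ratio = 1.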
Comparisons across methods
• More stability across waves in BWA than BWJ (both with additive shifts and heteroskedastic models) except for salary
• Relative shift in salary across waves is larger in BWA than BWJ
To Do
• Estimate models with unobserved heterogeneity.
• Use all information (best and worst job/attribute choices).
• Compute and compare WTP measures.
• Look at observed heterogeneity w.r.t. interesting characteristics (graduates vs students).
Discussion
• How do we elicit preferences on MU of income? Different methods yield different estimates for the same people at the same time.
• The greater volatility of MU of income across waves for a balanced sample also raises questions of credibility and usefulness of stated preferences, especially WTP measures.
• Are these results specific to this context? External validity.
• Do the results hold for simpler goods?
• How could we look at this issue in the lab?
• Implications for revealed preference estimates?
Job attributes
4 weekly salary levels and 2 levels of 11 non-pecuniary attributes:

  Attribute                                  Levels
  Location                                   Private hospital; Public hospital
  Clinical rotations                         None; Three
  Work hours                                 Fulltime only; Part-time or fulltime
  Rostering                                  Inflexible, does not allow requests; Flexible, usually accommodating requests
  Staffing levels                            Frequently short of staff; Usually well-staffed
  Workplace culture                          Unsupportive management & staff; Supportive management & staff
  Physical environment                       Poorly equipped & maintained facility; Well equipped & maintained facility
  Professional development and progression   No encouragement for nurses; Nurses encouraged
  Parking                                    Limited; Abundant and safe
  Responsibility                             Too much responsibility; Appropriate responsibility
  Quality of care                            Poor; Excellent
  Salary                                     $800; $950; $1,100; $1,250