Craig’s Festschrift
Stability of Stated Preferences, Nursing Workforce in Australia
by Denise Doiron, UNSW Australia
Hong Il Yoo, Durham University UK
This research was partially supported by ARC grant DP0881205. We thank the research team at UTS: Jane Hall, Deborah Street and Patsy Kenny, and Agne Suziedelyte (City University London, UK).
Introduction
• Motivation for the study: to understand the job preferences of junior nurses.
• A dedicated and specialised longitudinal survey was conducted and analysed.
• Comparisons across waves and methods uncovered a puzzling result that may have wider repercussions in terms of the elicitation and estimation of preferences involving monetary values.
• We found:
  – Substantial stability in the weights on non-salary job attributes across elicitation methods (same wave, same people).
  – Salary weights differ across methods (same wave, same people).
  – Substantial stability in the weights on non-salary job attributes across time periods (same method, same people).
  – Salary weights differ across waves (same method, same people).
Introduction
• The stability in weights on non-salary attributes is reassuring as it supports the usefulness of stated preference methods generally and in particular for policy recommendations.
• But the instability in the MU of income is concerning, especially since it is used in the calculation of WTP measures. WTPs are the main measures used in comparisons across studies and in policy recommendations.
Introduction
• Concerns over nursing workforce shortages have led to research on nurses' job preferences and choices.
• Causal factors of projected shortages lie on both the demand and supply sides.
• The policy approach has been to increase the number of training places and salaries; however, research shows the high importance of non-salary factors.
• Attrition rates are very high, especially among junior nurses.
• A lack of appropriate data is a challenge for researchers: administrative data contain restricted information, while survey data are usually not specific to nurses, so samples are small and information is not specific to nursing jobs.
• Papers: Frijters et al 2011, Shields and Ward 2001, Doiron et al 2008, Cunich and Whelan 2010, Nooney et al 2010, Aiken et al 2002 (for surveys see Oulton 2006, Aiken et al 2012).
Survey
• Designed and conducted a longitudinal survey among nursing students and new graduates.
• 2 large nursing training programs in NSW: University of Technology Sydney (urban) and University of New England (rural).
• Intensive recruiting campaign; online surveys incentivised with donations to Médecins Sans Frontières.
• 2 components:
  – survey questions on personal and job characteristics and attitudes
  – discrete choice experiments to elicit job preferences
• Timeline:
  – Recruitment: 2008-2010
  – Wave 1 questionnaires: Sept 2009 - July 2011
  – Wave 2 questionnaires: April 2011 - August 2012
  – Later waves not analysed yet
Survey
• Wave 1 and wave 2 completion dates are at least 1 year apart for each respondent, and 15 months apart on average.
• Sample sizes: 628 persons answered wave 1; 241 persons answered waves 1 and 2.
• Just under 50% of the student body registered for the survey and around 20% completed the online survey (wave 1). Similar proportions for both universities.
• Representativeness relative to the BN national student body:
  – similar gender composition
  – slightly younger and more English speaking
Stated Preference Methods
• Historically used in marketing, transport, and environmental economics. Growing literature in health economics.
• Advantages: provides evaluation of
  – new products or policies
  – goods or services where markets do not exist or function well
  – goods or services with little observed variation
• In this case, nursing jobs are highly regulated and there is little variation. Often the only "revealed" choice open to workers is to leave nursing altogether.
• Challenges:
  – data collection and design (importance of incentivisation, use of pilots)
  – external validity (relation between revealed and stated preferences)
  – little is known about best methodology
• Applications in health economics: de Bekker-Grob et al 2010, Lagarde and Blaauw 2009 (surveys of DCEs), Sivey et al 2010 (application to doctors in Australia).
Stated Preference Instruments
• The older method of open-ended questions (contingent valuation) has been replaced by Discrete Choice Experiments (DCEs), where the stated valuations are not so directly elicited.
• In DCEs, the respondent is asked to choose between two or more options in repeated scenarios or choice tasks.
• We use a variant, the best-worst profile case (henceforth multi-profile or best-worst job, BWJ), where the respondent chooses the best and the worst of 3 job profiles.
• Each respondent sees 8 BWJ scenarios.
• The choice of attributes is based on literature identifying important aspects of nursing jobs (e.g. magnet hospital research in the US). Salary levels correspond to the range observed in entry-level nursing jobs in Australia.
• This application would be considered a complex task in the literature due to the number of attributes used.
Example of best-worst job task
Stated Preference Instruments
• We also use a recently proposed experiment: best-worst attribute, henceforth single-profile or BWA.
• In BWA, the respondent chooses the best and worst attribute in a single job profile.
• Each respondent sees 8 BWA scenarios (in addition to the 8 BWJ choice tasks).
• We did not include an opt-out choice, but we did include a yes-no "would you take this job?" question in the BWA scenarios.
• For both experiments, the variation in attribute levels across scenarios was chosen by D. Street, according to a design that is optimal under specific modelling assumptions.
• A pilot study led to some changes but also supported the instruments used.
Example of best-worst attribute-level task
BWJ vs BWA
• Cognitive demands are believed to be lessened under BWA.
• Identification and estimation results differ:
  – In BWJ, the model is identified from variation in levels within attributes; i.e. the utility function is normalised based on a profile or job. Estimated preference weights provide valuations of the variation in attribute levels.
  – In BWA, the model is identified from variation across attributes as well as across levels of the attributes; i.e. the objective function is normalised based on one attribute level. Estimated preference weights provide valuations of the attribute levels relative to a base attribute level. More preference information is identified in BWA.
• In both cases, we can analyse either the best choice only or the best and worst choices.
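As a concrete illustration of the probabilities behind the best-and-worst choices, here is a minimal sketch of the max-diff model (Marley and Louviere 2005), under which the probability of naming alternative i best and j worst is proportional to exp(v_i - v_j). The function name and utility values are illustrative, not the paper's estimates.

```python
import math
from itertools import permutations

def best_worst_probs(v):
    """Best-worst choice probabilities for one scenario under the
    max-diff model: the probability of naming alternative i best and
    j worst is proportional to exp(v[i] - v[j]).

    v holds the deterministic utilities: one per job profile in BWJ,
    one per attribute level in BWA.
    """
    pairs = list(permutations(range(len(v)), 2))
    weights = [math.exp(v[i] - v[j]) for i, j in pairs]
    total = sum(weights)
    return {pair: w / total for pair, w in zip(pairs, weights)}

probs = best_worst_probs([1.0, 0.5, -0.2])
best_pair = max(probs, key=probs.get)  # the pair with the largest utility gap
```

Note how the same kernel serves both instruments: only the interpretation of the utilities (whole profiles vs attribute levels) changes.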
Stability
• A comparison of preference weights across waves and across methods.
• To the best of our knowledge, only 3 papers have made comparisons across the two methods. Our earlier paper was the only one to include a monetary attribute.
• There is a very small literature on comparisons of stated preference weights over time.
• In our case, the time period between waves would be considered long (most published studies have time spans of a few weeks at most).
• The attributes and their levels are identical across waves except for salary. The range of salary levels was raised to reflect the actual salaries for entry-level nursing jobs. This complicates the comparison across time and has led to much sensitivity analysis.
Econometric specifications
• Likelihood models are built around the logit kernel with type 1 extreme value distributions for the errors.
• Models differ based on:
  – The use of best AND worst choices (vs best only)
    o BWJ: heteroskedastic rank-ordered logit (Hausman and Ruud 1987)
    o BWA: max-diff (Marley and Louviere 2005, Marley et al 2008)
  – The modelling of unobserved heterogeneity
    o Mixed logits (McFadden and Train 2000)
    o GMNL (Fiebig et al 2010)
    o Latent class logits (Train 2008)
• In previous papers, we have developed variants combining these models (e.g. the latent class heteroskedastic rank-ordered logit, the latent class max-diff).
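The rank-ordered ("exploded") logit underlying the BWJ models can be sketched as follows: with i.i.d. type 1 extreme value errors, the probability of a full ranking decomposes into a product of sequential conditional-logit choices over shrinking choice sets. The function and utilities below are illustrative, not the estimation code.

```python
import math
from itertools import permutations

def rank_ordered_logit_prob(v, ranking):
    """Probability of observing a full ranking under the rank-ordered
    ("exploded") logit (Hausman and Ruud 1987): the product of
    sequential conditional-logit choices over shrinking choice sets.

    v: deterministic utilities; ranking: indices from most to least
    preferred.
    """
    remaining = list(range(len(v)))
    prob = 1.0
    for chosen in ranking[:-1]:          # the final "choice" is forced
        denom = sum(math.exp(v[k]) for k in remaining)
        prob *= math.exp(v[chosen]) / denom
        remaining.remove(chosen)
    return prob

# ranking probabilities over all orderings form a proper distribution
v = [0.8, 0.1, -0.4]
total = sum(rank_ordered_logit_prob(v, list(r))
            for r in permutations(range(3)))
```

For a 3-profile BWJ task, choosing a best and a worst profile pins down the full ranking, which is why this kernel applies directly.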
Econometric specifications
• In the current paper, we estimate a model that combines both BWJ and BWA data as well as the first two waves of the survey. This allows us to
  – directly test hypotheses across methods and waves
  – link the unobserved heterogeneity directly across methods and waves.
• These form new applications and, in some cases, extensions of various latent class and heteroskedastic models.
• We also look at observed heterogeneity by interacting various characteristics with the job attributes.
Stability across methods, Wave 1
• Based on Yoo and Doiron, Journal of Health Economics, 2013.
• See Figure 3 for main findings (based on the latent class heteroskedastic rank-ordered logit and the latent class max-diff).
• Respondents place less weight on salary in BWA compared to BWJ.
• Non-salary attributes are similar in the two methods.
• Error variance is smaller in the BWA method (more certain choices).
• Other findings:
  – Main result not driven by specific latent classes.
  – Unlikely to be due to differences in cognitive demands (accept/reject results align with the BWJ rather than the BWA choices).
• Our conjecture is that the overt comparison between salary and other attributes leads to a dampening of the weight on salary.
• The previous 2 papers comparing methods (on quality of life) find the preference weights similar across methods, but they did not have a monetary attribute (Potoglou et al 2011, Flynn et al 2013).
BWT(BWA) vs BWL(BWJ): Average estimates
Stability across waves, BWJ
• Based on Doiron & Yoo, Health Economics, 2016.
• See Figure 1 (based on latent class logits using the best choice only).
• Preference weights are generally consistent across the two waves; the biggest shift is in salary, which has a lower valuation in wave 2.
• Sensitivity analysis with various salary specifications suggests this is not due to price vector effects (e.g. dummy variable specifications, comparisons of nominal and real salary levels).
• Keeping the weight on salary constant across waves, the average transfer error (average percentage change in MWTP) between waves is 11%, well within the range in studies that find temporal stability of utility weights.
• Papers on temporal stability: Schaafsma et al 2014, Liebe et al 2012, Skoldorg et al 2009, Czajkowski et al 2014. Generally, studies find that weights shift more with longer time spans.
• Ours was the first paper on the stability of job preferences.
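The transfer-error calculation referred to above can be sketched as follows, taking MWTP as the ratio of an attribute weight to the salary weight held fixed across waves. The attribute names and numbers are made up for illustration; they are not the paper's estimates.

```python
def mean_transfer_error(beta_w1, beta_w2, beta_salary):
    """Average absolute percentage change in MWTP between waves,
    holding the salary weight fixed across waves. MWTP for an
    attribute is its utility weight divided by the salary weight.
    """
    changes = []
    for attr, b1 in beta_w1.items():
        mwtp1 = b1 / beta_salary
        mwtp2 = beta_w2[attr] / beta_salary
        changes.append(abs(mwtp2 - mwtp1) / abs(mwtp1) * 100.0)
    return sum(changes) / len(changes)

# illustrative wave 1 and wave 2 weights for two hypothetical attributes
err = mean_transfer_error({"flex_rost": 2.0, "well_staff": 4.0},
                          {"flex_rost": 2.2, "well_staff": 3.6},
                          beta_salary=0.005)
# err is 10.0 (a 10% average transfer error)
```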
Wave 2 vs Wave 1, BWJ, Average estimates
Importance of preference weights on salary
• The instability of the estimates on salary is especially concerning since this coefficient is used to convert utility weights into monetary values or willingness-to-pay (WTP). These are used to interpret and compare results.
Figure 2 from Doiron & Yoo 2016.
New results: Stability of preferences using the BWA method
• BWA estimates had less error variance in the wave 1 comparison with BWJ. Does the method yield more stability over time? Is the volatility in the MU of income also found with BWA?
• To the best of our knowledge, only one other paper has looked at the stability of weights using this method:
  – Islam and Louviere 2015 look at 3 products (toothpaste, pizza, detergent) over 4 waves in 2 years, with a time span between waves of 3 to 6 months.
• The Islam & Louviere paper looks at aggregate preferences only (i.e. they perform tests on the proportions of choices as best) and finds very strong preference stability. Price was included and its rank was consistent across waves. No model using individual data was estimated.
Prelim results, BWA, waves 1 & 2 comparisons
• The following estimates are based on conditional logits and heteroskedastic conditional logits using the best attribute choice only. (We use the clogithet command written by Hole.)
• Several salary specifications are used; a linear specification could not be rejected when compared to the most general (dummy variable) specification. The log specification did marginally better than the linear model in terms of pseudo-likelihoods.
• Salary comparisons across waves were made under ordinal and cardinal assumptions.
• No. obs = 45204; No. indivs = 236 (balanced sample).
• All standard errors and test statistics are clustered by individual.
Prelim results, BWA, waves 1 & 2 comparisons
• Salary is the only attribute whose weight shifts significantly (at 5%) across the waves. In the linear (log linear) salary specification, only the intercept (constant) shift is significant.
• Joint Wald tests of shifts in attribute weights, by salary specification:

  Salary specification:   Dummy variables      Linear               Log linear
                          χ2(dof), p-value     χ2(dof), p-value     χ2(dof), p-value
  All attributes          45.77(25), 0.007     166.78(23), 0.000    145.01(23), 0.000
  All excl. salary        30.33(21), 0.086     31.76(21), 0.062     31.67(21), 0.063
  Salary                  14.46(4), 0.006      13.42(2), 0.001      14.23(2), 0.001
  Pseudo-ll               -6250.5032           -6253.6238           -6253.4682
  No. coeffs              50                   46                   46
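The joint statistic behind these tests is the usual Wald form d'V⁻¹d, where d collects the estimated wave shifts and V is their clustered covariance matrix, compared against a chi-squared with dim(d) degrees of freedom. A dependency-free sketch (not the actual estimation code; the inputs below are placeholders):

```python
def wald_stat(d, V):
    """Wald statistic d' V^{-1} d for H0: d = 0, where d is the vector
    of estimated shifts (wave 2 minus wave 1) and V their (clustered)
    covariance matrix. Solves V x = d by Gaussian elimination with
    partial pivoting to avoid an explicit inverse.
    """
    n = len(d)
    A = [row[:] + [d[i]] for i, row in enumerate(V)]  # augmented matrix
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(A[r][col]))
        A[col], A[piv] = A[piv], A[col]
        for r in range(col + 1, n):
            f = A[r][col] / A[col][col]
            for c in range(col, n + 1):
                A[r][c] -= f * A[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (A[r][n] - sum(A[r][c] * x[c]
                              for c in range(r + 1, n))) / A[r][r]
    return sum(d[i] * x[i] for i in range(n))

# toy example: two correlated shift estimates
stat = wald_stat([1.0, 1.0], [[2.0, 1.0], [1.0, 2.0]])
```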
[Figure: Salary weights, dummy variable and linear specifications. Dummy-variable weights at the four salary levels: wave 1 = 2.287, 3.798, 4.768, 5.503; wave 2 = 1.094, 1.969, 3.384, 4.688. Fitted linear slopes: 0.0066 (wave 1), 0.0062 (wave 2). x-axis: weekly salary ($800-$1500); y-axis: preference weights (0-7).]
[Figure: BWA preference weights, wave 1 vs wave 2 (linear salary specification). Scatter of wave 2 against wave 1 weights for the non-salary attribute levels (priv_hosp, publ_hosp, three_rot, ft_hours, flex_hours, inflex_rost, flex_rost, short_staff, well_staff, unsupp_mgmt, supp_mgmt, poor_equip, well_equip, no_encourage, encourage, limit_park, abund_park, excess_resp, approp_resp, poor_qual, excel_qual) and salary (at $1100); trendline without salary: slope = 0.9, intercept = 1.0.]
• In heteroskedastic models, after allowing for shifts in the salary variables, the hypothesis of homoskedasticity cannot be rejected (LM test in the linear specification has a p-value of 0.7624).
Overall findings, BWA, comparisons across waves
• For non-salary attributes: an amazing level of stability in preference weights across a relatively long time span and for a complex task.
• Relative weight on salary shifts down substantially over time (consistent with previous BWJ results). The shift is fairly constant across the range in salary.
• No indication of a shift in the error variance across waves; when allowing shifts in salary the hypothesis of homoskedasticity cannot be rejected. In this sense, there is no more or less uncertainty in preferences in the two waves.
Comparisons across methods
• We estimate models with both BWJ and BWA data.
• Three main specifications are used, allowing for shifts across experiments:
  – Additive shifts in coefficients
  – Heteroskedastic models with a shift in the error variance
  – Hybrid models with shifts in the error variance and additive shifts in the salary coefficients
• Three salary specifications are used:
  – Dummy variables
  – Linear salary
  – Log-linear salary
Comparisons across methods
• General models allowing additive shifts across methods and waves are estimated.
• In BWJ:
  – Salary weights are better fitted with log than linear functions.
  – A few significant shifts across time in non-salary attributes, although the shifts are generally small.
• Is the shift in salary as large in BWJ as it is in BWA?
• Estimated drops are 41% in BWJ and 64% in BWA. A non-linear Wald-type test suggests that the relative shifts are significantly different. (A test of the equality of the relative shifts across waves in salary, evaluated at a salary of 1100 and based on the model with additive shifts in all coefficients and dummy variables for salary, yields Χ2(1) = 6.9, p-value = 0.0086.)
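The quantities involved can be sketched directly: the relative drops follow from the salary coefficients and their wave shifts in the pooled clogit table later in the deck, and a first-order delta method supplies the variance needed for a non-linear Wald test. The gradient and covariance values below are placeholders, not the estimated ones.

```python
def relative_drop(beta_w1, shift):
    """Relative drop in the salary weight across waves: -shift / beta_w1,
    where shift is the (negative) wave 2 minus wave 1 coefficient change."""
    return -shift / beta_w1

def delta_method_var(grad, V):
    """First-order delta-method variance grad' V grad for a smooth
    function of the estimates, used to build a non-linear Wald test."""
    n = len(grad)
    return sum(grad[i] * V[i][j] * grad[j]
               for i in range(n) for j in range(n))

# coefficients from the pooled clogit table (salary at $1100):
drop_bwj = relative_drop(0.930, -0.382)   # ~0.41: the 41% BWJ drop
drop_bwa = relative_drop(2.481, -1.606)   # ~0.65: the ~64% BWA drop
```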
[Figure: Salary weights, BWJ data, dummy variable specification. Weights at the four salary levels (base level normalised to 0): wave 1 = 0, 0.521, 0.930, 1.059; wave 2 = 0, 0.549, 0.751, 0.987. x-axis: weekly salary ($800-$1600); y-axis: preference weights (0-1.2).]
Clogit with additive shifts in all coefficients across waves and methods, dummy variable specification for salary:

                   BWA                    BWJ                    BWA-BWJ
  choice           Coef. W1    W2-W1      Coef. W1    W2-W1      Coef. W1    W2-W1
  publ_hosp        1.212***    -0.113     0.332***    -0.019     0.881**     -0.094
  three_rot        4.795***    -0.885     0.208***    -0.009     4.587***    -0.876
  flex_hours       2.261***    -0.467     0.110**      0.069     2.151***    -0.536
  flex_rost        4.608***     1.707     0.666***    -0.141*    3.942***     1.848
  well_staff       4.448***     0.708     0.438***     0.171**   4.010***     0.537
  supp_mgmt        4.811***     0.507     1.207***    -0.153     3.604***     0.659
  well_equip       4.572***    -0.132     0.529***    -0.249***  4.043***     0.117
  encourage        5.131***    -0.548     0.550***     0.089     4.581***    -0.637
  abund_park       3.625***    -0.816     0.046        0.169**   3.579***    -0.985
  approp_resp      3.087***     1.048     0.517***    -0.087     2.570***     1.135
  excel_qual       4.947***     0.363     1.051***    -0.182*    3.896***     0.546
  salary 1100      4.768***    -2.800***
    normalized     2.481***    -1.606***  0.930***    -0.382***  1.551***    -1.225**
  loglikelihood    -6250.5                -3029.26               -9279.76
  No. obs.         45204                  11244                  56448
  No. inds.        236                    236                    236
  No. pars.        50                     28                     78

*** indicates significant at 1%, ** at 5% and * at 10%. All standard errors are clustered at the individual level. Coefficients for BWA are normalised to be comparable to BWJ coefficients.
Heteroskedastic models
• These models are parsimonious (useful for models with heterogeneity).
• Hybrid models with additive shifts in salary perform fairly well.
• The estimated shifts in error variances, with tests of H0: ratio = 1:
  – Var BWJ W1 / Var BWJ W2 = 0.906, Χ2(1) = 2.76, p-value = 0.097
  – Var BWJ W1 / Var BWA W1 = 6.415, Χ2(1) = 122.5, p-value = 0.000
  – Var BWA W1 / Var BWA W2 = 1.001, Χ2(1) = 0.00, p-value = 0.9878
• Some weak evidence of heteroskedasticity across waves in BWJ, none in BWA.
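The scale shifts above can be illustrated with a minimal sketch of heteroskedastic logit probabilities: utilities are multiplied by a scale factor, inversely related to the error standard deviation, before the logit transform, so a larger scale (smaller error variance) sharpens the implied choices. The scale and utility values here are illustrative, not the estimates.

```python
import math

def scaled_logit_probs(v, scale):
    """Logit choice probabilities when the error variance differs
    across data sources (methods or waves): utilities in v are
    multiplied by a scale factor before the logit transform."""
    e = [math.exp(scale * x) for x in v]
    total = sum(e)
    return [x / total for x in e]

v = [1.0, 0.0]
low_noise = scaled_logit_probs(v, 2.5)   # smaller error variance: decisive choices
high_noise = scaled_logit_probs(v, 1.0)  # larger error variance: flatter choices
```

Only the ratio of scales across data sources is identified, which is why the results above are reported as variance ratios with H0: ratio = 1.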
Comparisons across methods
• More stability across waves in BWA than BWJ (both with additive shifts and heteroskedastic models) except for salary
• Relative shift in salary across waves is larger in BWA than BWJ
To Do
• Estimate models with unobserved heterogeneity.
• Use all information (best and worst job/attribute choices).
• Compute and compare WTP measures.
• Look at observed heterogeneity w.r.t. interesting characteristics (graduates vs students).
Discussion
• How do we elicit preferences on MU of income? Different methods yield different estimates for the same people at the same time.
• The greater volatility of MU of income across waves for a balanced sample also raises questions of credibility and usefulness of stated preferences, especially WTP measures.
• Are these results specific to this context? External validity.
• Do the results hold for simpler goods?
• How could we look at this issue in the lab?
• Implications for revealed preference estimates?
Job attributes
4 weekly salary levels and 2 levels of 11 non-pecuniary attributes:

  Attribute                                  Levels
  Location                                   Private hospital; Public hospital
  Clinical rotations                         None; Three
  Work hours                                 Fulltime only; Part-time or fulltime
  Rostering                                  Inflexible, does not allow requests; Flexible, usually accommodating requests
  Staffing levels                            Frequently short of staff; Usually well-staffed
  Workplace culture                          Unsupportive management & staff; Supportive management & staff
  Physical environment                       Poorly equipped & maintained facility; Well equipped & maintained facility
  Professional development and progression   No encouragement for nurses; Nurses encouraged
  Parking                                    Limited; Abundant and safe
  Responsibility                             Too much responsibility; Appropriate responsibility
  Quality of care                            Poor; Excellent
  Salary                                     $800; $950; $1,100; $1,250