Comparison of the C-statistic with new model discriminators in the prediction of long versus short hospital stay
Richard J Woodman (1), Campbell H Thompson (2), Susan W Kim (1), Paul Hakendorf (3)
(1) Flinders Centre for Epidemiology and Biostatistics, Flinders University, Adelaide
(2) Discipline of General Medicine, Adelaide University, Adelaide
(3) Redesigning Care, Flinders Medical Centre, Adelaide
2011 Australia and New Zealand Stata Users Group meeting, 17th September 2011
Transcript
Slide 2
Usefulness of new predictors
Meaningful new risk predictors
  Traditionally we rely on the concordance statistic (C-statistic / area under the ROC curve) to assess the usefulness of new predictive measures.
C-statistic
  Measures overall test/model accuracy (sensitivity/specificity).
  A weighted average of sensitivity over all possible cut-points:
    weighted by the pdf of non-events;
    high sensitivities (low cut-points) have high weights.
  Probability interpretation: the probability of assigning a greater risk to a randomly selected patient with the event than to a randomly selected patient without the event, i.e. P(p̂_event > p̂_non-event) for a random pair.
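The probability interpretation can be checked directly by counting pairs. The following is a minimal Python sketch, not the Stata code used in the talk; the function name `c_statistic` and the toy data are hypothetical, and ties are assumed to count as one half.

```python
from itertools import product


def c_statistic(probs, events):
    """Share of (event, non-event) pairs in which the event subject
    has the higher predicted probability; ties count as one half."""
    p_event = [p for p, y in zip(probs, events) if y == 1]
    p_nonevent = [p for p, y in zip(probs, events) if y == 0]
    score = 0.0
    for pe, pn in product(p_event, p_nonevent):
        if pe > pn:
            score += 1.0
        elif pe == pn:
            score += 0.5  # assumption: a tied pair contributes one half
    return score / (len(p_event) * len(p_nonevent))


# Hypothetical toy data: three events followed by three non-events.
probs = [0.9, 0.7, 0.4, 0.6, 0.3, 0.2]
events = [1, 1, 1, 0, 0, 0]
print(c_statistic(probs, events))
```

Counting pairs this way is quadratic in sample size, which is fine for illustration but slower than rank-based formulas on large datasets.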
Slide 3
Receiver Operating Characteristic (ROC) curve
[Figure: ROC curve, true positive rate against false positive rate; distributions of predicted p]
Change in C-statistic
  Interpretation: the increase in the probability that a random event subject will have a higher predicted p than a random non-event subject.
  Usually small once a few good predictors are included in the model.
Slide 4
New risk reclassification measures
Clinicians want to know whether an added predictor will change risk enough that they should treat patients differently.
Can we better quantify the improvement in risk prediction from new biomarkers?
  Net Reclassification Improvement (NRI)
  Integrated Discrimination Improvement (IDI)
  Pencina, D'Agostino et al., Statist. Med. 2008; 27:157-172.
How do they differ from the C-statistic?
How and when should we be using them?
Slide 5
Net Reclassification Improvement
NRI can be calculated as the sum of two separate components: one for individuals with events and the other for individuals without events.
For events, assign 1 for upward reclassification, -1 for downward reclassification and 0 for people who do not change risk category.
The opposite is done for non-events.
Sum the individual scores within each group and divide by the number of people in that group.
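The scoring rule above can be sketched in a few lines of Python. This is an illustrative sketch only, not the `nri1` Stata command; the function name `categorical_nri`, the toy probabilities, and the convention that a probability exactly on a cut-point falls in the higher category are all assumptions. The 50% and 57% cut-points are the ones named in the talk.

```python
def categorical_nri(p_old, p_new, events, cuts):
    """Category-dependent NRI: returns (event NRI, non-event NRI, total)."""

    def category(p):
        # Assumption: p equal to a cut-point belongs to the higher category.
        return sum(p >= c for c in cuts)

    scores = {1: [], 0: []}
    for po, pn, y in zip(p_old, p_new, events):
        move = category(pn) - category(po)
        direction = (move > 0) - (move < 0)  # +1 up, -1 down, 0 unchanged
        # Events score +1 for moving up; non-events score +1 for moving down.
        scores[y].append(direction if y == 1 else -direction)
    event_nri = sum(scores[1]) / len(scores[1])
    nonevent_nri = sum(scores[0]) / len(scores[0])
    return event_nri, nonevent_nri, event_nri + nonevent_nri


# Hypothetical toy data: three events followed by three non-events.
p_old = [0.45, 0.55, 0.60, 0.40, 0.55, 0.30]
p_new = [0.60, 0.52, 0.70, 0.35, 0.45, 0.25]
events = [1, 1, 1, 0, 0, 0]
for cut in (0.50, 0.57):
    print(cut, categorical_nri(p_old, p_new, events, cuts=[cut]))
```

Running the same data through two cut-points makes it easy to see how much the result depends on where the category boundary is drawn.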
Slide 6
Category-free NRI
Calculate p1 and p2 (old model = p1, new model = p2).
Event NRI = P(up | event) - P(down | event)
Non-event NRI = P(down | non-event) - P(up | non-event)
NRI = Event NRI + Non-event NRI (Pencina 2008)
Or NRI (Pencina 2010)
Or wNRI (Pencina 2010)
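The event and non-event components of the 2008 form can be illustrated with a short Python sketch. The function name `category_free_nri` and the toy data are hypothetical; "up" is taken to mean p2 > p1 and "down" to mean p2 < p1, with unchanged probabilities counted in neither group.

```python
def category_free_nri(p_old, p_new, events):
    """Category-free NRI: returns (event NRI, non-event NRI, total)."""
    up = {1: 0, 0: 0}
    down = {1: 0, 0: 0}
    n = {1: 0, 0: 0}
    for po, pn, y in zip(p_old, p_new, events):
        n[y] += 1
        if pn > po:
            up[y] += 1
        elif pn < po:
            down[y] += 1
    event_nri = (up[1] - down[1]) / n[1]      # P(up|event) - P(down|event)
    nonevent_nri = (down[0] - up[0]) / n[0]   # P(down|non-event) - P(up|non-event)
    return event_nri, nonevent_nri, event_nri + nonevent_nri


# Hypothetical toy data: three events followed by three non-events.
p_old = [0.45, 0.55, 0.60, 0.40, 0.55, 0.30]
p_new = [0.60, 0.52, 0.70, 0.35, 0.45, 0.25]
events = [1, 1, 1, 0, 0, 0]
print(category_free_nri(p_old, p_new, events))
```

Because each component lies between -1 and 1, the summed NRI lies between -2 and 2, so it is a net proportion rather than a percentage of subjects.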
Slide 7
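Integrated Discrimination Improvement (IDI)
Absolute IDI: probability difference in discrimination slopes (the discrimination slope is the mean difference in p between events and non-events).
  Absolute IDI = (p̄_2E - p̄_2NE) - (p̄_1E - p̄_1NE)
               = (p̄_2E - p̄_1E) - (p̄_2NE - p̄_1NE)
  Relative IDI = (p̄_2E - p̄_2NE) / (p̄_1E - p̄_1NE)
where p̄ is a mean predicted probability, 1 = old model, 2 = new model, E = events, NE = non-events.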
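As a worked illustration of these formulas, the sketch below computes both quantities from two sets of predicted probabilities. It is a Python sketch under stated assumptions, not the `idi` Stata command: the function name `idi_from_probs` and the toy data are hypothetical, and the relative IDI is computed as the ratio of slopes exactly as written on the slide.

```python
from statistics import mean


def idi_from_probs(p_old, p_new, events):
    """Return (absolute IDI, relative IDI) from old and new predicted p."""
    old_e = mean([p for p, y in zip(p_old, events) if y == 1])
    old_ne = mean([p for p, y in zip(p_old, events) if y == 0])
    new_e = mean([p for p, y in zip(p_new, events) if y == 1])
    new_ne = mean([p for p, y in zip(p_new, events) if y == 0])
    slope_old = old_e - old_ne   # discrimination slope, old model
    slope_new = new_e - new_ne   # discrimination slope, new model
    absolute = slope_new - slope_old
    relative = slope_new / slope_old  # ratio form, as on the slide
    return absolute, relative


# Hypothetical toy data: three events followed by three non-events.
p_old = [0.45, 0.55, 0.60, 0.40, 0.55, 0.30]
p_new = [0.60, 0.52, 0.70, 0.35, 0.45, 0.25]
events = [1, 1, 1, 0, 0, 0]
print(idi_from_probs(p_old, p_new, events))
```

The ratio is undefined when the old model has a discrimination slope of zero, so a real implementation would need to guard against that case.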
Slide 8
Recent example
  JACC 2011; 58(10): 1025-33. August 2011
Predicting length of hospital stay
  Short-stay wards are necessary because of bed shortages in specialist wards.
  But incorrectly assigning patients to short-stay wards:
    would overfill short-stay units;
    would prevent correct treatment for long-stay patients.
  Clinicians are trained to diagnose and treat, not to predict length of stay.
  Few variables beyond age appear informative.
Slide 15
Dataset
3 major hospitals: FMC, RGH, Auckland
N = 1457 general medical patients
Complete data on: Age, SBP, HR, RR, Mobility, WBC count, Cardiac failure (CF), Need for supplementary oxygen (SuO2)
  All previously collected for predicting outcome
Modified Early Warning Score (MEWS)
  Used by Emergency Medical Services to quickly determine risk of death
  SBP, HR, RR, Temperature
Slide 16
Statistical Analysis
Logistic regression model for predicting p: P(long stay)
Scaling using 2 Stata commands:
  lintrend (Joanne Garrett, Univ North Carolina)
  fracpoly (Patrick Royston)
Calibration
  HL-deciles and LR tests
Measures of discrimination
  C-statistic
  IDI
  Category-dependent NRI: 50% cut-off, 57% cut-off
  Category-free NRI
Slide 17
Stata lintrend command
    lintrend longstay age, round(10) plot(log) xlab ylab
[Figure: log odds plotted against age]

Fracpoly WBC
    . fracpoly logistic longstay wbc, table compare
    ........
    -> gen double Iwbc__1 = X^.5-.9876731667 if e(sample)
    -> gen double Iwbc__2 = X^.5*ln(X)+.0245010876 if e(sample)
       (where: X = wbc/10)

    Logistic regression                    Number of obs =   1457
                                           LR chi2(2)    =  49.38
                                           Prob > chi2   = 0.0000
    Log likelihood = -971.8662             Pseudo R2     = 0.0248

    ------------------------------------------------------------------------------
        longstay | Odds Ratio   Std. Err.      z    P>|z|     [95% Conf. Interval]
    -------------+----------------------------------------------------------------
         Iwbc__1 |   .0040704   .0076682    -2.92   0.003     .0001014    .1633818
         Iwbc__2 |   34.78284   33.17947     3.72   0.000     5.362915    225.5948
    ------------------------------------------------------------------------------
    Deviance: 1943.73. Best powers of wbc among 44 models fit: .5 .5.

    Fractional polynomial model comparisons:
    ---------------------------------------------------------------
    wbc              df    Deviance    Dev. dif.   P (*)   Powers
    ---------------------------------------------------------------
    Not in model      0    1993.113      49.380    0.000
    Linear            1    1954.819      11.087    0.011   1
    m = 1             2    1949.234       5.502    0.064   2
    m = 2             4    1943.732          --       --   .5 .5
    ---------------------------------------------------------------
    (*) P-value from deviance difference comparing reported model with m = 2 model
Slide 20
                 Odds ratio    95% CI       P-value
    Age (yrs)    1.07          1.04-1.10

Calibration
    number of observations = 1457
    number of groups = 10
    Hosmer-Lemeshow chi2(8) = 14.66
    Prob > chi2 = 0.07

    number of observations = 1457
    number of groups = 5
    Hosmer-Lemeshow chi2(3) = 5.64
    Prob > chi2 = 0.13

    number of observations = 1457
    number of covariate patterns = 1457
    Pearson chi2(1445) = 1486.69
    Prob > chi2 = 0.22
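For readers unfamiliar with the grouped Hosmer-Lemeshow statistic reported above, the following Python sketch shows the basic calculation. It is an illustration only, not Stata's implementation: the function name `hosmer_lemeshow`, the equal-size grouping by sorted predicted probability, and the toy data are assumptions, and Stata's own grouping rule (for example its handling of tied probabilities) may give slightly different groups.

```python
def hosmer_lemeshow(probs, events, groups=10):
    """Return (chi-square statistic, degrees of freedom = groups - 2)."""
    data = sorted(zip(probs, events))  # order subjects by predicted p
    n = len(data)
    stat = 0.0
    for g in range(groups):
        # Assumption: groups are equal-size slices of the sorted data.
        chunk = data[g * n // groups:(g + 1) * n // groups]
        size = len(chunk)
        expected = sum(p for p, _ in chunk)   # expected number of events
        observed = sum(y for _, y in chunk)   # observed number of events
        stat += (observed - expected) ** 2 / (expected * (1 - expected / size))
    return stat, groups - 2


# Hypothetical toy data, split into 4 groups of 2 subjects each.
probs = [0.1, 0.2, 0.3, 0.4, 0.6, 0.7, 0.8, 0.9]
events = [0, 0, 1, 0, 1, 0, 1, 1]
print(hosmer_lemeshow(probs, events, groups=4))
```

The statistic is compared with a chi-square distribution on the returned degrees of freedom, which matches the chi2(8) for 10 groups and chi2(3) for 5 groups shown in the output above.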
Slide 22
C-statistic
    * Compare Age with Age + Heart rate using roccomp
    quietly logistic longstay age
    predict p1 if e(sample), p
    quietly logistic longstay c.age##c.hrby10
    predict p2 if e(sample), p
    roccomp longstay p1 p2

                           ROC                     -Asymptotic Normal--
               Obs       Area     Std. Err.      [95% Conf. Interval]
    -------------------------------------------------------------------------
    p1        1457     0.7167       0.0136        0.69000     0.74338
    p2        1457     0.7433       0.0131        0.71767     0.76897
    -------------------------------------------------------------------------
    Ho: area(p1) = area(p2)
        chi2(1) =    15.68       Prob>chi2 =   0.0001
Slide 23
ROC curves
Age: area under ROC = 0.717
Age + heart rate: area under ROC = 0.743
P(p̂_event > p̂_non-event) for a random pair: increase of ~2.5%
Slide 24
Sensitivity and Specificity
Sensitivity improved only at high cut-points.
The C-statistic weights large sensitivities more heavily.
This may be why improvements in sensitivity with later predictors don't translate into an increased C.
Slide 25
Predicted probabilities
[Figure annotations: distribution of probabilities shifts lower; distribution of probabilities flattens]
Slide 26
Stata NRI command
User written. Author: Liisa Byberg, Department of Surgical Sciences, Orthopedics unit, and Uppsala Clinical Research Center, Uppsala University, Sweden
To install, type:
    net from http://www.ucr.uu.se/sv/images/stories/downloads
Syntax
    nri1 depvar varlist1, prvars(varlist2) cut(#)
    nri2 depvar varlist1, prvars(varlist2) cut(# #)
    nri3 depvar varlist1, prvars(varlist2) cut(# # #)
Stata IDI command
Syntax
    idi depvar varlist1, prvars(varlist2)

    idi longstay age, prvars(hrby10 agehrby10)
    ----------------------------------------------------
          IDI |   Estimate    Std. Err.     P-value
    ----------+-----------------------------------------
              |    0.04195      0.00525     0.00000
    ----------------------------------------------------
Definition:
    IDI = (IS2 - IS1) - (IP2 - IP1)
    IDI = (p̄2 - p̄1)_events - (p̄2 - p̄1)_non-events
    IS = sensitivity, IP = (1 - specificity), each integrated over all cut-points
Slide 29
Predicted probabilities and the IDI
IDI interpretation: improvement in average sensitivity plus any potential decrease in average (1 - specificity).
Magnitude is hard to interpret.
Some studies also present the relative IDI (%).
Slide 30
[Figure panels: C-statistic; IDI. Variables shown: HR, Mobility, BP, WBC, RR, CCF, Supp_O2]
Slide 31
[Figure panels: NRI50; NRI57. Variables shown: HR, Mobility, BP, WBC, RR, CCF, Supp_O2]
The effect of each variable on re-classification depends on the classification cut-point.
Small changes in the chosen cut-point can have large influences.
Slide 32
Overall Category-free NRI
[Figure: variables shown are HR, Mobility, BP, WBC, RR, CCF, Supp_O2]
Interpretation: proportion of subjects with movement of p in the correct direction, averaged for event and non-event subjects.
Slide 33
Category-free Event NRI: Pr(p is higher) - Pr(p is lower); mostly poorer re-classification.
Category-free Non-Event NRI: Pr(p is lower) - Pr(p is higher); consistently improved re-classification.
[Figure: variables shown are HR, Mobility, BP, WBC, RR, CCF, Supp_O2]
Interpretation: net movement of p in the correct direction, for event and non-event subjects separately.
Slide 34
Proportion of long-stay patients whose p went up: mostly < 50% with each new variable.
Proportion of short-stay patients whose p went down: consistently > 50% with each new variable.
[Figure: variables shown are HR, Mobility, BP, WBC, RR, CCF, Supp_O2]
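These proportions relate directly to the sign of the category-free NRI components. As a short worked step, assuming every subject's p either rises or falls when the variable is added (no subject has identical old and new p):

```latex
\begin{align*}
\text{Event NRI}
  &= P(\text{up} \mid \text{event}) - P(\text{down} \mid \text{event}) \\
  &= P(\text{up} \mid \text{event}) - \bigl(1 - P(\text{up} \mid \text{event})\bigr) \\
  &= 2\,P(\text{up} \mid \text{event}) - 1, \\[4pt]
\text{Non-event NRI}
  &= 2\,P(\text{down} \mid \text{non-event}) - 1 .
\end{align*}
```

Under that assumption, a proportion below 50% among long-stay patients corresponds to a negative event NRI, and a proportion above 50% among short-stay patients corresponds to a positive non-event NRI.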
Slide 35
Summary
IDI
  Mirrored the C-statistic but was more sensitive.
  Weights sensitivity equally across cut-points, whereas the C-statistic weights large sensitivities more heavily.
Category-dependent NRI
  The variables selected were heavily dependent on the chosen cut-points.
  Fewer variables were identified as important discriminators than with the C-statistic, the IDI or the category-free NRI.
Category-free NRI
  Overall, quite similar results to the C-statistic and IDI.
  Very different performance among the short-stay and long-stay patients.
Slide 36
Conclusions
Discrimination statistics cannot be used interchangeably. It may be necessary to present all 4 for greatest insight.
C-statistic:
  Averaged sensitivity.
  Does not weight equally across cut-points.
  Does not assess risk re-classification.
IDI:
  Averaged sensitivity.
  Weights cut-points equally.
  Adjusts for specificity differently to the C-statistic.
  May better highlight potentially important predictors.
Category-free NRI:
  % subjects with correct movement in p.
  Event and non-event NRI may perform quite differently.
Category-dependent NRI:
  % correct movement across categories.
  Results may be heavily influenced by chosen cut-points.
  Be wary of studies using the category-dependent NRI with non-predefined cut-points.