
Risk scoring Allan Wardhaugh. Why bother? Comparison of performance between units Comparison of...

Date post: 17-Dec-2015
Upload: benedict-reynolds
Transcript
Page 1

Risk scoring

Allan Wardhaugh

Page 2

Why bother?

• Comparison of performance between units

• Used in RCTs to adjust for case-mix

Page 3

Standardised Mortality Ratio

• Measured mortality

• Predicted mortality – risk adjustment tool

• SMR = Measured / Predicted
– SMR > 1: more deaths than predicted (performing poorly)
– SMR < 1: fewer deaths than predicted (performing well)
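Here is a minimal sketch of the SMR calculation. It assumes each patient has a predicted probability of death from a risk adjustment tool, so expected deaths are the sum of those probabilities. The function name and the example figures are hypothetical.

```python
def standardised_mortality_ratio(died, predicted_risk):
    """SMR = observed deaths / expected deaths.

    died: sequence of 0/1 outcomes (1 = died)
    predicted_risk: matching sequence of predicted probabilities of death
    Expected deaths are taken as the sum of the predicted probabilities.
    """
    died = list(died)
    predicted_risk = list(predicted_risk)
    if len(died) != len(predicted_risk):
        raise ValueError("outcomes and risks must be the same length")
    observed = sum(died)
    expected = sum(predicted_risk)
    if expected <= 0:
        raise ValueError("expected deaths must be greater than zero")
    return observed / expected


# Hypothetical unit: 5 admissions, 1 death, 2.0 expected deaths
smr = standardised_mortality_ratio([0, 1, 0, 0, 0], [0.1, 0.9, 0.5, 0.3, 0.2])
print(round(smr, 2))  # 0.5 -> fewer deaths than predicted
```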

Page 4

Making a risk adjustment tool

Page 5

Regression statistics

• Target variable – ‘dependent variable’ (DV)

• Predictors – ‘independent variables’ (IVs)

• Regression statistics use the association between variables to predict one (the DV) from another (the IV).

• Simplest form: $y = b_0 + b_1 x$, where $y$ = predicted value, $b_0$ = regression constant, $b_1$ = regression coefficient

• Multiple regression

$y = b_0 + b_1 x_1 + b_2 x_2 + \dots + b_n x_n$
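The multiple regression equation can be evaluated as in the sketch below. It covers only the prediction step, assuming the coefficients have already been fitted. The function name and the coefficient values are hypothetical.

```python
def predict_linear(b0, coefficients, predictors):
    """Multiple linear regression prediction: y = b0 + b1*x1 + ... + bn*xn."""
    if len(coefficients) != len(predictors):
        raise ValueError("need one coefficient per predictor")
    return b0 + sum(b * x for b, x in zip(coefficients, predictors))


# Hypothetical fitted model: y = 2 + 0.5*x1 - 1.5*x2
y = predict_linear(2.0, [0.5, -1.5], [4.0, 1.0])
print(y)  # 2.5
```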

Page 6

Regression statistics - logistic

• Linear multiple regression
– DV and IV quantitative

• For a non-quantitative DV (e.g. dead/alive), logistic regression is used
– Relationship with IV may be non-linear

• For each IV, odds are calculated for the likelihood of having the DV

• Odds very asymmetrical
– very small number (0 to 1) if event unlikely
– very large if event likely (1 to ∞)

• Rectified by using the natural log of the odds, called the logit, which makes the relationship with the IVs linear
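The asymmetry of odds, and the way the logit removes it, shows up when a few probabilities are converted. This is an illustrative sketch and the function names are hypothetical. Below p = 0.5 the odds are squeezed into the range 0 to 1, while above it they grow without limit. The logit is symmetric about 0.

```python
import math


def odds(p):
    """Odds of an event with probability p (0 < p < 1)."""
    return p / (1 - p)


def logit(p):
    """Natural log of the odds."""
    return math.log(odds(p))


for p in (0.01, 0.1, 0.5, 0.9, 0.99):
    print(f"p={p:<5} odds={odds(p):9.3f} logit={logit(p):7.3f}")
```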

Page 7

Regression statistics - logit

• $\text{logit} = b_0 + b_1 x_1 + b_2 x_2 + \dots + b_n x_n$

• Probability = odds/(1 + odds)

• $\text{logit} = \ln(\text{odds})$

$p = e^{\text{logit}}/(1 + e^{\text{logit}})$
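Putting the two steps together gives the sketch below: build the logit from the coefficients, then convert it to a probability. The function name and example coefficients are hypothetical. The code uses the equivalent form 1/(1 + e^-logit) for non-negative logits so that large values do not overflow.

```python
import math


def probability_from_logit(b0, coefficients, predictors):
    """Logistic regression: logit = b0 + sum(bi*xi); p = e^logit / (1 + e^logit)."""
    if len(coefficients) != len(predictors):
        raise ValueError("need one coefficient per predictor")
    z = b0 + sum(b * x for b, x in zip(coefficients, predictors))
    # 1 / (1 + e^-z) equals e^z / (1 + e^z) but cannot overflow for large z
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


# Hypothetical model with logit = -2 + 1.0 * x; at x = 2 the logit is 0
print(probability_from_logit(-2.0, [1.0], [2.0]))  # 0.5
```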

Page 8

PRISM

• Pediatric Risk of Mortality Score

• 14 physiological variables
– Worst measurement in first 24 hours
– Now on PRISM III – relies on scores in first 12 or 24 hours

• Probability of PICU death $= e^R/(1 + e^R)$

where $R = 0.207 \times \text{PRISM} - 0.005 \times \text{age (months)} - 0.433 \times \text{operative status} - 4.782$
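The PRISM equation above can be written as a function, as in this sketch. It assumes operative status is coded 1 for postoperative patients and 0 otherwise, and the function name is hypothetical. Because the slides do not say how the example table that follows was generated, this sketch is not guaranteed to reproduce it exactly.

```python
import math


def prism_probability(prism_score, age_months, postoperative):
    """Probability of PICU death from the original PRISM equation:
        R = 0.207*PRISM - 0.005*age(months) - 0.433*operative status - 4.782
        p = e^R / (1 + e^R)
    Operative status coded 1 if postoperative, else 0 (assumed coding).
    """
    operative_status = 1 if postoperative else 0
    r = (0.207 * prism_score
         - 0.005 * age_months
         - 0.433 * operative_status
         - 4.782)
    return math.exp(r) / (1 + math.exp(r))


# Risk (%) for a 60-month-old non-surgical patient at a few PRISM scores
for score in (3, 12, 24):
    print(score, round(100 * prism_probability(score, 60, False), 1))
```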

Page 9

PRISM – example
60-month-old non-surgical patient

PRISM score    Mortality risk (%)
3              1.6
6              2.7
9              5
12             8.9
15             15.3
18             25.2
21             38.6
24             46.1
27             68.5
30             80.2

Page 10

PRISM - disadvantages

• Data collection cumbersome (14 variables over a 24 hour period)

• May diagnose death rather than predict it (40% of deaths occur in the first 24 hours)

• Score may not allow comparison between units – patients poorly managed in the first 24 hours will develop a high PRISM score, so disease severity will appear to be greater

Page 11

PRISM III

Page 12

PIM – Paediatric Index of Mortality – initial cohorts

• 678 consecutive admissions, PICU, RCHM, 1988

• 814 consecutive admissions, RCHM, 1990

• 1412 consecutive admissions, RCHM, 1994–5

Page 13

PIM – identifying variables

• Data collected at admission (for most) and over the first 24 hours

• 34 Physiological Stability Index measurements

• MAP, PIP, PEEP, and others

• Worst value in first 24 hours for all

Page 14

PIM – derivation of model

• All PRISM data collected plus additional information

• Univariate analysis carried out on all factors to test for association with mortality (chi-squared for dichotomous variables, Copas p-by-x plots for continuous variables)

• Factors not associated (p > 0.1) excluded from further analysis

• Logistic regression analysis used to derive preliminary model
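For a dichotomous factor, the univariate screening step could look like the sketch below: a Pearson chi-squared test on a 2×2 table (factor present/absent by died/survived), with factors dropped when p > 0.1. The slides do not say whether a continuity correction was used, so none is applied here. The factor names and counts are hypothetical, and the Copas plots for continuous variables are not shown. With 1 degree of freedom the upper-tail p-value equals erfc(sqrt(statistic/2)), so no statistics library is needed.

```python
import math


def chi_squared_2x2(a, b, c, d):
    """Pearson chi-squared test (no continuity correction) for the table

                    died   survived
        present      a        b
        absent       c        d

    Returns (statistic, p_value) with 1 degree of freedom.
    """
    n = a + b + c + d
    denominator = (a + b) * (c + d) * (a + c) * (b + d)
    if denominator == 0:
        return 0.0, 1.0  # an empty row or column: no association testable
    statistic = n * (a * d - b * c) ** 2 / denominator
    # Upper-tail probability of a chi-squared variable with 1 df
    p_value = math.erfc(math.sqrt(statistic / 2))
    return statistic, p_value


def screen_factors(tables, threshold=0.1):
    """Keep factors associated with mortality; drop those with p > threshold."""
    return [name for name, table in tables.items()
            if chi_squared_2x2(*table)[1] <= threshold]


# Hypothetical counts for two dichotomous factors
tables = {
    "factor_A": (30, 10, 10, 30),
    "factor_B": (10, 10, 10, 10),
}
print(screen_factors(tables))  # ['factor_A']
```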

Page 15

PIM – testing the model

• Learning and test cohorts
– 1994–96: 5695 patients in 8 PICUs (Australia, Birmingham)

• Enough patients in each unit to include 20 deaths

• Learning sample data analysed to calculate regression coefficients

• Model then tested on test sample and examined for goodness of fit

• Regression coefficients re-estimated using all 8 units for final model

• Risk of death assigned to 5 groups: <1%, 1–4%, 5–14%, 15–29% and ≥30%
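The sketch below shows one way to check goodness of fit by risk group: assign each patient's predicted risk to one of the five groups, then compare observed with expected deaths in each. It assumes the groups are half-open intervals (for example, 1–4% is taken as 1% up to but not including 5%). The function names are hypothetical.

```python
from bisect import bisect_right

# Lower bounds of groups 2-5; assumed half-open intervals
BAND_EDGES = [0.01, 0.05, 0.15, 0.30]
BAND_LABELS = ["<1%", "1–4%", "5–14%", "15–29%", "≥30%"]


def risk_band(p):
    """Return the risk group label for a predicted probability of death p."""
    return BAND_LABELS[bisect_right(BAND_EDGES, p)]


def observed_expected_by_band(died, predicted_risk):
    """Observed deaths, expected deaths (sum of risks) and O/E ratio per group."""
    totals = {label: [0, 0.0] for label in BAND_LABELS}
    for d, p in zip(died, predicted_risk):
        group = totals[risk_band(p)]
        group[0] += d
        group[1] += p
    return {
        label: (observed, expected, observed / expected if expected > 0 else None)
        for label, (observed, expected) in totals.items()
    }
```

Empty groups return None for the ratio rather than dividing by zero. As the summary notes, a group with few expected deaths gives an unstable ratio.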

Page 16

PIM - results

Page 19

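PIM – final equations

• Probability of death $= e^{\text{logit}}/(1 + e^{\text{logit}})$

• $\text{logit} = 2.357 \times \text{pupils} + 1.826 \times \text{specified diagnosis} - 1.552 \times \text{elective admission} + 1.342 \times \text{mechanical ventilation} + 0.021 \times |\text{SBP} - 120| + 0.071 \times |\text{base excess}| + 0.415 \times (100 \times \text{FiO}_2/\text{PaO}_2) - 4.873$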
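The final PIM equation can be sketched as a function. This sketch makes several assumptions that are not taken from the slides:
- the SBP and base excess terms use absolute values;
- PaO2 is in mmHg;
- the FiO2/PaO2 term is taken as 0 when no arterial blood gas is available;
- the binary inputs (including the hypothetical `pupils_fixed` flag) are True/False.

All parameter names are hypothetical.

```python
import math


def pim_probability(pupils_fixed, specified_diagnosis, elective_admission,
                    mechanical_ventilation, sbp_mmhg, base_excess,
                    fio2=None, pao2_mmhg=None):
    """Sketch of the original PIM equation.

    Assumptions: |SBP - 120| and |base excess| are used; PaO2 in mmHg;
    the oxygen term is 0 when no arterial gas (fio2/pao2) is supplied.
    Booleans act as 0/1 in the arithmetic.
    """
    if fio2 is not None and pao2_mmhg:
        oxygen_term = 100 * fio2 / pao2_mmhg
    else:
        oxygen_term = 0.0
    logit = (2.357 * pupils_fixed
             + 1.826 * specified_diagnosis
             - 1.552 * elective_admission
             + 1.342 * mechanical_ventilation
             + 0.021 * abs(sbp_mmhg - 120)
             + 0.071 * abs(base_excess)
             + 0.415 * oxygen_term
             - 4.873)
    return math.exp(logit) / (1 + math.exp(logit))


# Baseline patient: no risk factors, SBP 120, base excess 0, no blood gas
print(round(pim_probability(False, False, False, False, 120, 0), 4))
```

Like the models it illustrates, this gives a population-level estimate and is not an individual prediction.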

Page 20

UHW PICU PIM

[Figure: relative risk by probability of death (by percentage risk): <1%, 1–4%, 5–14%, 15–29%, >30%]

Page 21

PIM and PRISM compared

• Variables used by PIM that are not used by PRISM:
– presence of a specified diagnosis
– use of mechanical ventilation
– plasma base excess

• Variables used by PRISM that are not used by PIM:
– diastolic blood pressure, heart rate
– respiratory rate, pCO2
– the Glasgow Coma Score (three separate variables)
– prothrombin time, serum bilirubin, serum potassium, serum calcium, blood glucose and plasma bicarbonate

Page 22

PRISM vs PIM

• PRISM predicted 66% more deaths in this sample

• Score altered by treatment in the first 24 hours

• May diagnose rather than predict death

• PRISM III data requires 96 measured variables

• Licence required

• Note that neither is adequate for individual case prediction – apply to populations only

Page 23

PIM - recalibration

• PICU outcomes change with time

• Referral patterns change with time

• Attitudes to withdrawing and limiting care may change with time

Page 24

PIM 2

• 14 PICUs
– 8 Australia
– 4 UK
– 2 NZ

• 20 787 patients, 1997–1998

• Units randomly assigned to be learning sample or testing sample for new model
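The random assignment of whole units, rather than individual patients, can be sketched as follows. The slides do not give the split sizes, so the 7/7 split and the unit labels below are hypothetical. Splitting by unit means the testing sample checks whether the model holds up in units it was not fitted on.

```python
import random


def split_units(units, n_learning, seed=None):
    """Randomly assign whole units (not individual patients) to a learning
    sample or a testing sample."""
    rng = random.Random(seed)
    shuffled = list(units)
    rng.shuffle(shuffled)
    return sorted(shuffled[:n_learning]), sorted(shuffled[n_learning:])


units = [f"unit_{i:02d}" for i in range(1, 15)]  # 14 hypothetical PICU labels
learning, testing = split_units(units, 7, seed=1)
print("learning:", learning)
print("testing: ", testing)
```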

Page 25

PIM 2

• PIM applied to new population (all units)
– Observed to expected deaths compared

• Poorly performing variables altered to make prediction better

• Re-tested by forward and backward stepwise logistic regression to produce new model

• New model applied to learning sample – coefficients adjusted and applied to testing sample

Page 26

Calibration findings

• Specific diagnosis
– Respiratory illness O:E 160:212
– Non-cardiac post-op O:E 48:82

• 293 coded diagnostic categories examined
– In-hospital cardiac arrest associated with increased risk of death
– Asthma, bronchiolitis, croup, obstructive sleep apnoea and DKA associated with reduced risk

• New ‘high risk’ and ‘low risk’ categories introduced

• Post-op subdivided into with or without CPB

• IQ <35 omitted (difficult to code reliably)

Page 27

SMR

• Australia and New Zealand SMR 0.84 (0.76–0.92)

• UK 0.89 (0.77–1.00)

Page 28

New coefficients

Page 31

United Kingdom Paediatric Intensive Care Outcome Study

UK PICOS (phase I)

Page 32

Mortality ratio calculated using the UK PICOS calibration of PIM in the UK.

Upper and lower control limits represent a 99.9% confidence interval around a mortality ratio of 1, based on the UK PICOS overall mortality of 6.2%.

PIM mortality ratio (observed/expected unit deaths) by unit. Generated using UK PICOS recalibration.

[Figure: mortality ratio against number of admissions; series: your unit, other units, control limits]
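The slides do not say how the control limits were calculated, so the sketch below is only one common approach, not necessarily the one used by UK PICOS. It assumes observed deaths in a unit with n admissions follow a binomial distribution with the overall mortality of 6.2%, and applies a normal approximation around a ratio of 1 at 99.9% coverage. The function name is hypothetical.

```python
import math
from statistics import NormalDist


def control_limits(n_admissions, overall_mortality=0.062, coverage=0.999):
    """Approximate funnel-plot limits for an O/E mortality ratio of 1.

    Assumes deaths ~ Binomial(n, overall_mortality) with a normal
    approximation; expected deaths E = n * overall_mortality.
    """
    z = NormalDist().inv_cdf(0.5 + coverage / 2)  # about 3.29 for 99.9%
    expected = n_admissions * overall_mortality
    sd = math.sqrt(n_admissions * overall_mortality * (1 - overall_mortality))
    half_width = z * sd / expected
    return max(0.0, 1 - half_width), 1 + half_width


for n in (200, 400, 600, 800, 1000):
    lo, hi = control_limits(n)
    print(n, round(lo, 2), round(hi, 2))
```

The limits narrow as admissions increase, which gives the funnel shape. A small unit can therefore show a large mortality ratio and still lie within them.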

Page 33

Phase I outcome

• PRISM III 24 hour score re-calibrated for UK

• Performance of PIM-2 and PRISM III very similar

• PIM-2 recommended as model of choice as data easier to collect

Page 34

• DoH/WAG funded

• Run from Universities of Sheffield, Leicester and Leeds

• First annual report March 2003 – February 2004

Page 37

PELOD

• Death is a relatively infrequent outcome (6%) in PICU
– Trials therefore need large sample sizes to detect differences in outcome

• MODS (multiple organ dysfunction syndrome) more prevalent (11–27%)
– Correlates well with risk of death
– Good proxy outcome measure for risk of death

Page 38

PELOD

• Prospective study – 7 PICUs in France, Canada and Switzerland

• 18 months, 1998–2000

• 1806 patients (<18 yrs)

Page 42

Probability of death $= 1/(1 + \exp(7.64 - 0.30 \times \text{PELOD score}))$
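Here is a direct sketch of the PELOD equation; the function name is hypothetical. Setting 7.64 - 0.30 × score = 0 shows that the predicted risk passes 50% at a score of about 25.5.

```python
import math


def pelod_probability(pelod_score):
    """Probability of death = 1 / (1 + exp(7.64 - 0.30 * PELOD score))."""
    return 1 / (1 + math.exp(7.64 - 0.30 * pelod_score))


for score in (0, 10, 20, 30, 40):
    print(score, round(pelod_probability(score), 3))
```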

Page 44

League tables

• Governments like them

• Journalists like them

• Local politicians like them

• Patient groups like them

Do any of the above understand them?

Page 45

• 9 NICUs over 6 years

• Crude and risk-adjusted (CRIB score) mortality

• Hospitals ranked in league tables each year according to W score
– $W = 100 \times (\text{observed} - \text{expected deaths})/\text{number of admissions}$
– Mortality lower than expected if $W < 0$
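The sketch below computes the W score and ranks hospitals into a league table. The hospital labels and counts are hypothetical. The ranking uses only point estimates of W and ignores their confidence intervals, which is why the study's conclusions treat such tables with caution.

```python
def w_score(observed_deaths, expected_deaths, admissions):
    """W = 100 * (observed - expected deaths) / number of admissions.
    Negative W means fewer deaths than expected."""
    if admissions <= 0:
        raise ValueError("admissions must be positive")
    return 100 * (observed_deaths - expected_deaths) / admissions


# Hypothetical hospitals: (observed deaths, expected deaths, admissions)
hospitals = {
    "A": (10, 12.0, 200),
    "B": (15, 12.5, 250),
    "C": (8, 8.0, 160),
}
# League table: lowest (best) W first
league = sorted(hospitals, key=lambda h: w_score(*hospitals[h]))
print(league)  # ['A', 'C', 'B']
```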

Page 46

Results

Page 47

Conclusions

• Hospitals varied annually in their league position

• Confidence intervals for W scores overlapped for all hospitals every year except year 3

• ‘Overall, hospital 1 did perform significantly better than expected but it is debatable whether this makes it a model hospital since its performance was inconsistent’.

Page 48

Summary

• PIM/PIM 2 data easy to collect

• Useful in comparing unit performance

• Interpret with care if number of deaths low (especially <20)

• Not for use as an individual prediction test

• Important to complete as accurately as possible

• PICANET carries out random checks to ensure data quality

• League tables are unreliable

