Potential usefulness of a framework of 7 steps for prediction models

Ewout Steyerberg

Professor of Medical Decision Making

Dept of Public Health, Erasmus MC, Rotterdam, the Netherlands

Oberwolfach, Jan 28, 2010

Erasmus MC – University Medical Center Rotterdam

Overview

Background: “Oberwolfach in the mountains”

A framework to develop prediction models

Potential usefulness

Discussion: how to improve prediction research

Oberwolfach

Workshop «Statistical Issues in Prediction», personal aims

Meet and listen to other researchers

Go cross-country skiing

Sell book

Get back on track with work in ‘TO DO’ box

Presentation options

Theoretical challenges

Practical challenges

Problems in prediction models

1. Predictor selection: we all present something new

2. Methodological problems

Missing values

Optimal recoding and dichotomization

Stepwise selection, relatively small data sets

Presentation

Validation

Potential solutions

Awareness and education

Scientific progress required

Translation to practice

Epidemiologists/clinicians interested in prediction modeling

Statisticians not interested in prediction modeling

Reporting guidelines

Not yet available

Study protocol registration

Possible, rare

http://www.clinicalpredictionmodels.org

http://www.springer.com/978-0-387-77243-1

Proposed modeling framework

Aim: knowledge on predictors, or provide predictions?

Predictors (Prognostic factors)

Traditional (demographics, patient, disease related)

Modern (‘omics’, biomarkers, imaging)

Challenges:

Testing: independent effect, adjusted for confounders

Estimation: correct functional form

Predictions

Pragmatic combination of predictors

Essentially an estimation problem

Prognostic modelling checklist: intended to assist in developing a valid prediction model

Step | Specific issues
General considerations
  Research question | Aim: predictors / prediction?
  Intended application | Clinical practice / research?
  Outcome | Clinically relevant?
  Predictors | Reliable measurement? Comprehensiveness
  Study design | Retrospective/prospective? Cohort; case-control
  Statistical model | Appropriate for research question and type of outcome?
  Sample size | Sufficient for aim?
7 modeling steps
  1. Preliminary | Missing values
  2. Coding of predictors | Continuous predictors; combining categorical predictors; combining predictors with similar effects
  3. Model specification | Appropriate selection of main effects? Assessment of assumptions (distributional, linearity and additivity)?
  4. Model estimation | Shrinkage included? External information used?
  5. Model performance | Appropriate statistical measures used? Clinical usefulness considered?
  6. Model validation | Internal validation including model specification and estimation? External validation?
  7. Model presentation | Format appropriate for audience?
Validity
  Internal: overfitting | Sufficient attempts to limit and correct for overfitting?
  External: generalizability | Predictions valid for plausibly related populations?

Prognostic modeling checklist: general considerations












Prognostic modeling checklist: 7 steps












Prognostic modeling checklist: validity












Usefulness of framework

Checklist for model building

SMART data, survival after cardiovascular event, 2008

Critical assessment of model building

GUSTO-I model, Lee 1995

Example: prediction of myocardial infarction outcome

Aim: predictors or predictions?

Title: predictors vs text: prediction

Additional publication focuses on clinicians

Predictors

Categories | Examples
Demographics | Age, sex, weight, height, geographical site
Risk factors | Diabetes, hypertension, smoking status, hypercholesterolemia, family history of MI
Other history | Previous MI, angina, cerebrovascular disease (e.g. stroke), bypass surgery, angioplasty
Cardiac state | Location of infarction, electrocardiogram abnormalities
Presenting characteristics | Systolic and diastolic blood pressure, heart rate, left ventricular function (e.g. presence of shock, Killip class)

General considerations in GUSTO-I model

Step | Specific issues | GUSTO-I model
General considerations
  Research question | Aim: predictors / prediction? | Both
  Intended application | Clinical practice / research? | Clinical practice
  Outcome | Clinically relevant? | 30-day mortality
  Predictors | Reliable measurement? Comprehensiveness | Standard clinical work-up; extensive set of candidate predictors
  Study design | Retrospective/prospective? Cohort; case-control | RCT data: prospective cohort
  Statistical model | Appropriate for research question and type of outcome? | Logistic regression
  Sample size | Sufficient for aim? | >40,000 patients; 2851 events: excellent
7 modeling steps
  1. Preliminary | Inspection of data; missing values | Table 1 (here: Table 22.3); single imputation
  2. Coding of predictors | Continuous predictors; combining categorical predictors; combining predictors with similar effects | Extensive checks of transformations for continuous predictors; categories kept separate
  3. Model specification | Appropriate selection of main effects? Assessment of assumptions (distributional, linearity and additivity)? | Stepwise selection; additivity checked with interaction terms, one included
  4. Model estimation | Shrinkage included? External information used? | Not necessary; no
  5. Model performance | Appropriate measures used? | Calibration and discrimination
  6. Model validation | Internal validation including model specification and estimation? External validation? | Bootstrap and 10-fold cross-validation; no external validation
  7. Model presentation | Format appropriate for audience? | No; formula in appendix; later paper focused on clinical application
Validity
  Internal: overfitting | Sufficient attempts to limit and correct for overfitting? | Large sample size, predictors from literature
  External: generalizability | Predictions valid for plausibly related populations? | Large set of predictors, representing important domains; not assessed in this study

1. Data inspection, specifically: missing values

Among the array of clinical characteristics considered potential predictor variables in the modeling analyses were occasional patients with missing values. Although a full set of analyses was performed in patients with complete data for all the important predictor variables (92% of the study patients), the subset of patients with one or more missing predictor variables had a higher mortality rate than the other patients, and excluding those patients could lead to biased estimates of risk. To circumvent this, a method for simultaneous imputation and transformation of predictor variables based on the concepts of maximum generalized variance and canonical variables was used to estimate missing predictor variables and allow analysis of all patients [33, 34]. The iterative imputation technique conceptually involved estimating a given predictor variable on the basis of multiple regression on (possibly) transformed values of all the other predictor variables. End-point data were not explicitly used in the imputation process. The computations for these analyses were performed with S-PLUS statistical software (version 3.2 for UNIX) [32], using a modification of an existing algorithm [33, 34]. The imputation software is available electronically in the public domain [33].
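The iterative idea in the quoted passage can be illustrated with a minimal Python sketch. It is not the S-PLUS algorithm used for GUSTO-I: it assumes purely linear regressions without the transformation step, starts from column means, and the function name `impute_iterative` and the iteration count are hypothetical choices. As in the passage, the outcome is not used.

```python
import numpy as np

def impute_iterative(X, n_iter=10):
    """Single imputation: regress each incomplete predictor on all others.

    X is a 2-D array of predictors with np.nan marking missing values.
    Simplification (assumption): linear regression only, no transformations.
    """
    X = np.array(X, dtype=float)
    missing = np.isnan(X)
    # Start by filling every missing cell with its column mean.
    filled = np.where(missing, np.nanmean(X, axis=0), X)
    for _ in range(n_iter):
        for j in range(X.shape[1]):
            if not missing[:, j].any():
                continue  # nothing to impute in this column
            others = np.delete(filled, j, axis=1)
            A = np.column_stack([np.ones(len(others)), others])
            observed = ~missing[:, j]
            # Fit on rows where predictor j was actually observed.
            coef, *_ = np.linalg.lstsq(A[observed], X[observed, j], rcond=None)
            # Replace only the missing cells with the regression prediction.
            filled[missing[:, j], j] = A[missing[:, j]] @ coef
    return filled
```

Observed values are never overwritten; only the cells that were missing are updated in each pass.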

2. Coding of predictors

continuous predictors

linear and restricted cubic spline functions

truncation of values (for example for systolic blood pressure)

categorical variables

Detailed categorization for location of infarction:

anterior (39%), inferior (58%), or other (3%)

Ordinality ignored for Killip class (I – IV)

class III and class IV each contained only 1% of the patients

3. Model specification

Main effects: “.. which variables were most strongly related to short-term mortality”:

hypothesis testing rather than prediction question

Interactions: many tested, one included: Age*Killip

Linearity of predictors:

transformations chosen at univariate analysis were also used in multivariable analysis

4. Model estimation

Standard ML

No shrinkage / penalization

No external information

5. Model performance

Discrimination

AUC

Calibration: observed vs predicted

Graphically, including deciles

(links to Hosmer-Lemeshow goodness of fit test)

Specific subgroups of patients
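The two performance aspects listed above can be sketched in Python as follows. This is an illustrative sketch, not the code used for GUSTO-I: the function names `auc` and `calibration_by_deciles` are hypothetical, and the pairwise AUC computation is only practical for small to moderate data sets.

```python
import numpy as np

def auc(y, p):
    """Discrimination: probability that a random event receives a higher
    predicted risk than a random non-event (ties count one half)."""
    y = np.asarray(y)
    p = np.asarray(p, dtype=float)
    events, nonevents = p[y == 1], p[y == 0]
    diff = events[:, None] - nonevents[None, :]  # all event/non-event pairs
    wins = (diff > 0).sum() + 0.5 * (diff == 0).sum()
    return float(wins / (len(events) * len(nonevents)))

def calibration_by_deciles(y, p, groups=10):
    """Calibration: mean predicted vs observed risk in groups of
    increasing predicted risk (10 groups gives deciles)."""
    y = np.asarray(y, dtype=float)
    p = np.asarray(p, dtype=float)
    order = np.argsort(p)
    return [(float(p[idx].mean()), float(y[idx].mean()))
            for idx in np.array_split(order, groups)]
```

Plotting the pairs returned by `calibration_by_deciles` against the diagonal gives the graphical check of observed versus predicted risk.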

Calibration

6. Model validation

10-fold cross validation

100 bootstrap samples

model refitted, tested on the original sample
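A minimal Python sketch of the bootstrap procedure above: the model is refitted in each bootstrap sample and then tested on the original sample. The optimism correction of the apparent AUC is an assumption about how the results are combined, and the helper names (`fit_logistic`, `predict`, `auc`, `bootstrap_validate`) are hypothetical. For brevity the sketch refits a fixed model; a full internal validation would also repeat the selection steps in each sample.

```python
import numpy as np

def fit_logistic(X, y, iters=25, ridge=1e-8):
    """Standard maximum likelihood logistic regression (Newton-Raphson)."""
    Xd = np.column_stack([np.ones(len(X)), X])
    beta = np.zeros(Xd.shape[1])
    for _ in range(iters):
        p = 1.0 / (1.0 + np.exp(-Xd @ beta))
        W = p * (1.0 - p)
        # Tiny ridge term only for numerical stability of the solve.
        H = Xd.T @ (Xd * W[:, None]) + ridge * np.eye(Xd.shape[1])
        beta = beta + np.linalg.solve(H, Xd.T @ (y - p))
    return beta

def predict(beta, X):
    return 1.0 / (1.0 + np.exp(-(beta[0] + X @ beta[1:])))

def auc(y, p):
    events, nonevents = p[y == 1], p[y == 0]
    diff = events[:, None] - nonevents[None, :]
    return float(((diff > 0).sum() + 0.5 * (diff == 0).sum())
                 / (len(events) * len(nonevents)))

def bootstrap_validate(X, y, n_boot=100, seed=0):
    """Return (apparent AUC, optimism-corrected AUC)."""
    rng = np.random.default_rng(seed)
    apparent = auc(y, predict(fit_logistic(X, y), X))
    optimism = []
    for _ in range(n_boot):
        idx = rng.integers(0, len(y), len(y))   # sample with replacement
        beta = fit_logistic(X[idx], y[idx])     # model refitted
        in_sample = auc(y[idx], predict(beta, X[idx]))
        on_original = auc(y, predict(beta, X))  # tested on original sample
        optimism.append(in_sample - on_original)
    return apparent, apparent - float(np.mean(optimism))
```

With a sample as large as GUSTO-I the estimated optimism is expected to be small, which is the point raised later under "Debate possible on some choices".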

7. Model presentation

Predictor effects:

Relative importance: Chi-square statistics

Relative effects: Odds ratios graphically

Predictions

Formula

Risk Model for 30-Day Mortality

Probability of death within 30 days = 1/[1 + exp(-L)], where

L = 3.812 + 0.07624 age - 0.03976 minimum(SBP, 120)
  + 2.0796 [Killip class II] + 3.6232 [Killip class III] + 4.0392 [Killip class IV]
  - 0.02113 heart rate + 0.03936 (heart rate - 50)+
  - 0.5355 [inferior MI] - 0.2598 [other MI location] + 0.4115 [previous MI]
  - 0.03972 height + 0.0001835 (height - 154.9)+^3 - 0.0008975 (height - 165.1)+^3 + 0.001587 (height - 172.0)+^3 - 0.001068 (height - 177.3)+^3 + 0.0001943 (height - 185.4)+^3
  + 0.09299 time to treatment - 0.2190 [current smoker] - 0.2129 [former smoker] + 0.2497 [diabetes] - 0.007379 weight
  + 0.3524 [previous CABG] + 0.2142 [treatment with SK and intravenous heparin] + 0.1968 [treatment with SK and subcutaneous heparin] + 0.1399 [treatment with combination TPA and SK plus intravenous heparin]
  + 0.1645 [hx of hypertension] + 0.3412 [hx of cerebrovascular disease]
  - 0.02124 age · [Killip class II] - 0.03494 age · [Killip class III] - 0.03216 age · [Killip class IV].

Explanatory notes.

1. Brackets are interpreted as [c]=1 if the patient falls into category c, [c]=0 otherwise.

2. (x)+=x if x>0, (x)+=0 otherwise.

3. For systolic blood pressure (SBP), values >120 mm Hg are truncated at 120.

4. For time to treatment, values <2 hours are truncated at 2.

5. The measurement units for age are years; for blood pressure, millimeters of mercury; for heart rate, beats per minute; for height, centimeters; for time to treatment, hours; and for weight, kilograms.

6. "Other" MI location refers to posterior, lateral, or apical but not anterior or inferior.

7. CABG indicates coronary artery bypass grafting; SK, streptokinase; and hx, history.
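The formula and its explanatory notes translate directly into code. The Python sketch below uses the coefficients exactly as printed above; the function name `gusto_risk`, the argument names, and the string codes for MI location, smoking and treatment are hypothetical. It assumes that anterior MI, Killip class I, never smoking and any treatment not listed in the formula are the reference categories.

```python
import math

def pos(x):
    """(x)+ = x if x > 0, otherwise 0 (explanatory note 2)."""
    return x if x > 0 else 0.0

def gusto_risk(age, sbp, killip, heart_rate, mi_location, previous_mi,
               height, time_to_treatment, smoking, diabetes, weight,
               previous_cabg, treatment, hypertension, cerebrovascular):
    """Predicted probability of death within 30 days.

    killip: 1-4; mi_location: 'anterior', 'inferior' or 'other';
    smoking: 'current', 'former' or 'never'; treatment: 'SK+IV', 'SK+SC',
    'TPA+SK' or anything else for the reference (hypothetical codes).
    Binary history items are True/False.
    """
    sbp = min(sbp, 120)                # note 3: truncate SBP at 120 mm Hg
    ttt = max(time_to_treatment, 2)    # note 4: truncate time at 2 hours
    k2, k3, k4 = killip == 2, killip == 3, killip == 4
    L = (3.812 + 0.07624 * age - 0.03976 * sbp
         + 2.0796 * k2 + 3.6232 * k3 + 4.0392 * k4
         - 0.02113 * heart_rate + 0.03936 * pos(heart_rate - 50)
         - 0.5355 * (mi_location == "inferior")
         - 0.2598 * (mi_location == "other")
         + 0.4115 * previous_mi
         - 0.03972 * height
         + 0.0001835 * pos(height - 154.9) ** 3
         - 0.0008975 * pos(height - 165.1) ** 3
         + 0.001587 * pos(height - 172.0) ** 3
         - 0.001068 * pos(height - 177.3) ** 3
         + 0.0001943 * pos(height - 185.4) ** 3
         + 0.09299 * ttt
         - 0.2190 * (smoking == "current") - 0.2129 * (smoking == "former")
         + 0.2497 * diabetes - 0.007379 * weight
         + 0.3524 * previous_cabg
         + 0.2142 * (treatment == "SK+IV")
         + 0.1968 * (treatment == "SK+SC")
         + 0.1399 * (treatment == "TPA+SK")
         + 0.1645 * hypertension + 0.3412 * cerebrovascular
         - 0.02124 * age * k2 - 0.03494 * age * k3 - 0.03216 * age * k4)
    return 1.0 / (1.0 + math.exp(-L))
```

A function like this makes the point of step 7 concrete: the formula is complete and reproducible, but it is not a format a clinician can apply at the bedside, which is why a score chart or nomogram is suggested later.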


Conclusion on usefulness of framework

GUSTO-I makes for an interesting case-study on

General modeling considerations

Illustration of 7 modeling steps

Internal vs external validity (early 1990s → 2009?)

Debate possible on some choices

1. Missing values: multiple imputation, including the outcome

2. Coding: fractional polynomials? Lump categories?

3. Selection: stepwise works because of large N

4. Estimation: standard ML works because of large N; penalization?

5. Performance: usefulness measures

6. Validation: CV and bootstrap, not necessary because of large N?

7. Presentation: Predictor effects: nice! Predictions: score chart / nomogram

Discussion on usefulness of framework

Checklist for model building

SMART data, survival after cardiovascular event, 2009

Critical assessment of model building

GUSTO-I model, Lee 1995

Basis for reporting checklist

Link with REMARK / STROBE / …

Basis for protocol registration

Link with requirements in other protocols?

Challenges in developing a valid prognostic model

Theoretical: biostatistical research

New analysis techniques, e.g.

Neural networks / Support vector machines / …

Fractional polynomials / splines for continuous predictors

Performance measures

Simulations: what makes sense as a strategy?

Applications: epidemiological and decision-analytic research

Subject matter knowledge

Clinical experts

Literature: review / meta-analysis

Balance research questions vs effective sample size

Incremental value new markers

Transportability and external validity

Clinical impact of using a model

Which performance measure when?

1. Discrimination: if poor, usefulness unlikely, but usefulness still >= 0 (no harm)

2. Calibration: if poor in new setting:

Prediction model may harm rather than improve decision-making
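One way to make "usefulness" and "harm" measurable is net benefit at a decision threshold. The slides do not name a specific usefulness measure, so this choice is an assumption, as are the function names. The sketch weighs true-positive against false-positive classifications and compares the model with the default strategy of treating everyone; a model with lower net benefit than the best default strategy would harm decision-making.

```python
import numpy as np

def net_benefit(y, p, threshold):
    """Net benefit of treating patients with predicted risk >= threshold.

    True positives count fully; false positives are weighted by the odds
    of the threshold, threshold / (1 - threshold).
    """
    y = np.asarray(y)
    p = np.asarray(p, dtype=float)
    treat = p >= threshold
    n = len(y)
    tp = np.sum(treat & (y == 1))
    fp = np.sum(treat & (y == 0))
    return float(tp / n - fp / n * threshold / (1 - threshold))

def net_benefit_treat_all(y, threshold):
    """Net benefit of the default strategy 'treat everyone'.
    The strategy 'treat no one' has net benefit 0 by definition."""
    prevalence = float(np.mean(y))
    return prevalence - (1 - prevalence) * threshold / (1 - threshold)
```

Comparing `net_benefit` in a new setting with `max(net_benefit_treat_all(...), 0)` shows whether using the model improves on the default strategies at that threshold.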

Application area | Calibration | Discrimination | Clinical usefulness
Public health
  Targeting of preventive interventions
    Predict incident disease | x | X | x
Clinical practice
  Diagnostic work-up
    Test ordering | X | x | X
    Starting treatment | X | x | X
  Therapeutic decision making
    Surgical decision making | X | x | X
    Intensity of treatment | X | x | X
    Delaying treatment | X | x | X
Research
  Inclusion in a RCT | X | x | X
  Covariate adjustment in a RCT | X (single mark; column not recoverable)
  Confounder adjustment with a propensity score
  Case-mix adjustment
