Introduction to Survival Analysis October 19, 2004

Brian F. Gage, MD, MSc with thanks to Bing Ho, MD, MPH

Division of General Medical Sciences

Presentation goals

Survival analysis compared with other regression techniques
What is survival analysis
When to use survival analysis
Univariate method: Kaplan-Meier curves
Multivariate methods:
• Cox proportional hazards model
• Parametric models
Assessment of adequacy of analysis
Examples

Regression vs. Survival Analysis

Technique            Predictor variables                 Outcome variable                                   Censoring permitted?
Linear regression    Categorical or continuous           Normally distributed                               No
Logistic regression  Categorical or continuous           Binary (except in polytomous logistic regression)  No
Survival analyses    Time and categorical or continuous  Binary                                             Yes

Regression vs. Survival Analysis

Technique            Mathematical model                               Yields
Linear regression    Y = B1X + Bo (linear)                            Linear changes
Logistic regression  ln(P/(1-P)) = B1X + Bo (sigmoidal probability)   Odds ratios
Survival analyses    h(t) = ho(t)exp(B1X+Bo)                          Hazard rates

What is survival analysis?

Model time to failure or time to event
• Unlike linear regression, survival analysis has a dichotomous (binary) outcome
• Unlike logistic regression, survival analysis analyzes the time to an event
  – Why is that important?
Able to account for censoring
Can compare survival between two or more groups
Assess relationship between covariates and survival time

Importance of censored data

Why is censored data important?
What is the key assumption of censoring?

Types of censoring

Subject does not experience event of interest
Incomplete follow-up
• Lost to follow-up
• Withdraws from study
• Dies (if death is not the event being studied)
Left or right censored

When to use survival analysis

Examples
• Time to death or clinical endpoint
• Time in remission after treatment of disease
• Recidivism rate after addiction treatment
When one believes that one or more explanatory variables explain the differences in time to an event
Especially when follow-up is incomplete or variable

Relationship between survivor function and hazard function

Survivor function, S(t), defines the probability of surviving longer than time t
• This is what the Kaplan-Meier curves show.
Hazard function is the negative derivative of the survivor function over time, divided by the survivor function: h(t) = -[dS(t)/dt] / S(t)
  – Instantaneous risk of event at time t, given survival up to time t (conditional failure rate)
Survivor and hazard functions can be converted into each other
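
The conversion in both directions can be checked numerically. The sketch below is an illustration only, not part of the original slides: it assumes an exponential survivor function with an arbitrary rate of 0.2, and all function names are made up for this example. It uses the relations h(t) = -[dS(t)/dt] / S(t) and S(t) = exp(-integral of h from 0 to t).

```python
import math

def survivor(t, rate=0.2):
    """Exponential survivor function S(t) = exp(-rate * t) (assumed for illustration)."""
    return math.exp(-rate * t)

def hazard_from_survivor(S, t, dt=1e-6):
    """Hazard from survivor function: h(t) = -[dS(t)/dt] / S(t), by central difference."""
    dS_dt = (S(t + dt) - S(t - dt)) / (2 * dt)
    return -dS_dt / S(t)

def survivor_from_hazard(h, t, steps=10_000):
    """Survivor from hazard: S(t) = exp(-cumulative hazard), midpoint-rule integration."""
    width = t / steps
    cumulative_hazard = sum(h((i + 0.5) * width) for i in range(steps)) * width
    return math.exp(-cumulative_hazard)

# Hazard recovered from the exponential survivor function at t = 5
h_at_5 = hazard_from_survivor(survivor, 5.0)

# Survivor probability recovered from a constant hazard of 0.2 at t = 5
s_at_5 = survivor_from_hazard(lambda u: 0.2, 5.0)
```

For the exponential case the hazard is constant and equal to the rate, so the first call should recover approximately 0.2 and the second approximately exp(-1).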

Approach to survival analysis

Like other statistics we have studied, we can do any of the following with survival analysis:
• Descriptive statistics
• Univariate statistics
• Multivariate statistics

Descriptive statistics

Average survival
• When can this be calculated?
• What test would you use to compare average survival between 2 cohorts?
Average hazard rate
• Total # of failures divided by observed survival time (units are therefore 1/t or 1/pt-yrs)
• An incidence rate, with higher values indicating more events per time
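
The average hazard rate can be computed directly from follow-up records. This is a minimal sketch with a made-up six-patient cohort; the data and the function name are hypothetical.

```python
def average_hazard_rate(follow_up_years, event_observed):
    """Total number of failures divided by total observed time (events per patient-year)."""
    total_time = sum(follow_up_years)
    total_events = sum(event_observed)
    return total_events / total_time

# Hypothetical cohort: follow-up in years; 1 = event occurred, 0 = censored
follow_up_years = [2.0, 3.5, 1.0, 4.0, 0.5, 4.0]
event_observed = [1, 0, 1, 0, 1, 0]

rate = average_hazard_rate(follow_up_years, event_observed)
```

Here there are 3 events over 15 patient-years, so the rate is 0.2 events per patient-year. Censored patients contribute their observed time to the denominator but no event to the numerator.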

Univariate method: Kaplan-Meier survival curves

Also known as product-limit formula
Accounts for censoring
Generates the characteristic “stair step” survival curves
Does not account for confounding or effect modification by other covariates
• When is that a problem?
• When is that OK?
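
The product-limit formula can be written out in a few lines. The sketch below is an illustration with hypothetical data, not the software used for the slides. At each distinct event time it multiplies the running survival estimate by (1 - deaths / number at risk); censored subjects leave the risk set without producing a step.

```python
def kaplan_meier(times, events):
    """Product-limit estimate of S(t).

    times:  follow-up time for each subject
    events: 1 if the event was observed at that time, 0 if censored
    Returns a list of (event_time, survival_estimate) pairs.
    """
    event_times = sorted({t for t, e in zip(times, events) if e == 1})
    survival = 1.0
    curve = []
    for t in event_times:
        # Subjects still under observation just before time t
        at_risk = sum(1 for u in times if u >= t)
        deaths = sum(1 for u, e in zip(times, events) if u == t and e == 1)
        survival *= 1 - deaths / at_risk
        curve.append((t, survival))
    return curve

# Hypothetical data: six subjects, two of them censored (at times 2 and 4)
times = [1, 2, 2, 3, 4, 5]
events = [1, 1, 0, 1, 0, 1]
curve = kaplan_meier(times, events)
```

Each pair in the result marks one downward step of the stair-step curve.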

[Figure: Age 76 Years and Older (N = 394). Groups: Warf, ASA, No Rx. x-axis: Days Since Index Hospitalization, 0 to 900; y-axis: 0.0 to 1.0.]

[Figure: Time to Cardiovascular Adverse Event in VIGOR Trial]

Comparing Kaplan-Meier curves

Log-rank test can be used to compare survival curves
• Less commonly used test: Wilcoxon, which places greater weights on events near time 0.
Hypothesis test (test of significance)
• H0: the curves are statistically the same
• H1: the curves are statistically different
Compares observed to expected cell counts
Test statistic is compared to the χ² distribution
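
The observed-versus-expected comparison can be sketched for two groups as follows. This is a minimal illustration with hypothetical data and a made-up function name. It assumes the standard two-group log-rank statistic with the hypergeometric variance and 1 degree of freedom, and it does not guard against a zero variance.

```python
import math

def log_rank_test(times, events, groups):
    """Two-group log-rank test; groups holds 0 or 1 for each subject.

    Returns (chi_square, p_value) on 1 degree of freedom.
    """
    observed = expected = variance = 0.0
    for t in sorted({t for t, e in zip(times, events) if e == 1}):
        n = sum(1 for u in times if u >= t)                              # at risk, both groups
        n1 = sum(1 for u, g in zip(times, groups) if u >= t and g == 1)  # at risk, group 1
        d = sum(1 for u, e in zip(times, events) if u == t and e == 1)   # events at time t
        d1 = sum(1 for u, e, g in zip(times, events, groups)
                 if u == t and e == 1 and g == 1)                        # events in group 1
        observed += d1
        expected += d * n1 / n
        if n > 1:
            variance += d * (n1 / n) * (1 - n1 / n) * (n - d) / (n - 1)
    chi_square = (observed - expected) ** 2 / variance
    # Upper tail of a chi-square distribution with 1 degree of freedom
    p_value = math.erfc(math.sqrt(chi_square / 2))
    return chi_square, p_value

# Hypothetical data: group 1 fails early, group 0 fails late, no censoring
times = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
events = [1] * 10
groups = [1] * 5 + [0] * 5
chi_square, p_value = log_rank_test(times, events, groups)
```

A large statistic (small p value) leads to rejecting H0 that the curves are the same.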

Comparing multiple Kaplan-Meier curves

Multiple pair-wise comparisons produce cumulative Type I error: the multiple comparison problem
Instead, compare all curves at once
• Analogous to using ANOVA to compare > 2 cohorts
• Then use judicious pair-wise testing
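
The size of the multiple comparison problem is easy to compute. The sketch below assumes, as a simplification, that the pair-wise tests are independent; the function name and the choice of 4 groups at alpha = 0.05 are illustrative.

```python
def familywise_error(alpha, n_groups):
    """Number of pair-wise tests and the chance of at least one false positive,
    assuming independent tests (a simplification)."""
    n_pairs = n_groups * (n_groups - 1) // 2
    return n_pairs, 1 - (1 - alpha) ** n_pairs

pairs, error = familywise_error(0.05, 4)
```

With 4 curves there are 6 pair-wise tests, and the chance of at least one spurious "significant" difference is roughly 26% rather than 5%.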

Limit of Kaplan-Meier curves

What happens when you have several covariates that you believe contribute to survival?
Example
• Smoking, hyperlipidemia, diabetes, and hypertension contribute to time to myocardial infarction
Can use stratified K-M curves for 2 or maybe 3 covariates
Need another approach for many covariates: the multivariate Cox proportional hazards model is most common
• (Think multivariate regression or logistic regression rather than a Student’s t-test or the odds ratio from a 2 x 2 table)

Multivariate method: Cox proportional hazards

Needed to assess effect of multiple covariates on survival
Cox proportional hazards is the most commonly used multivariate survival method
• Easy to implement in SPSS, Stata, or SAS
• Parametric approaches are an alternative, but they require stronger assumptions about h(t).

Cox proportional hazard model

Works with hazard model
Conveniently separates baseline hazard function from covariates
• Baseline hazard function over time
  – h(t) = ho(t)exp(B1X+Bo)
• Covariates are time independent
• B1 is used to calculate the hazard ratio, which is similar to the relative risk
Semiparametric: the baseline hazard ho(t) is left unspecified
Coefficients are estimated with a partial likelihood function

Cox proportional hazards model, continued

Can handle both continuous and categorical predictor variables (think: logistic, linear regression)
Without knowing baseline hazard ho(t), can still calculate coefficients for each covariate, and therefore hazard ratio
Assumes multiplicative risk: this is the proportional hazard assumption
• Can be compensated in part with interaction terms
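
Why the coefficients can be estimated without ho(t): at each event time the model compares the subject who failed with everyone still at risk, and ho(t) cancels from that ratio. The sketch below illustrates this with Cox's partial likelihood for a single covariate. It is a teaching example with hypothetical data, assumes no tied event times, and uses a simple golden-section search in place of the Newton-type methods that statistical packages use.

```python
import math

def cox_partial_log_likelihood(beta, times, events, x):
    """Partial log-likelihood for one covariate (assumes no tied event times).

    Each event contributes beta*x_i - log(sum of exp(beta*x_j) over the risk set);
    the baseline hazard does not appear anywhere.
    """
    total = 0.0
    for t_i, e_i, x_i in zip(times, events, x):
        if e_i == 1:
            risk_set = [x_j for t_j, x_j in zip(times, x) if t_j >= t_i]
            total += beta * x_i - math.log(sum(math.exp(beta * x_j) for x_j in risk_set))
    return total

def fit_beta(times, events, x, lo=-5.0, hi=5.0, tol=1e-8):
    """Maximize the (concave) partial log-likelihood by golden-section search.
    The bracket [lo, hi] is an assumption of this sketch."""
    ratio = (math.sqrt(5) - 1) / 2
    a, b = lo, hi
    while b - a > tol:
        c = b - ratio * (b - a)
        d = a + ratio * (b - a)
        if (cox_partial_log_likelihood(c, times, events, x)
                < cox_partial_log_likelihood(d, times, events, x)):
            a = c  # maximum lies to the right of c
        else:
            b = d  # maximum lies to the left of d
    return (a + b) / 2

# Hypothetical data: x = 1 marks an exposed subject; two subjects are censored
times = [1, 2, 3, 4, 5, 6, 7, 8]
events = [1, 1, 1, 1, 0, 1, 1, 0]
x = [1, 1, 0, 1, 0, 1, 0, 0]

beta_hat = fit_beta(times, events, x)
hazard_ratio = math.exp(beta_hat)
```

In this made-up cohort the exposed subjects tend to fail earlier, so the fitted coefficient is positive and the hazard ratio exceeds 1.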

Limitations of Cox PH model

Does not accommodate variables that change over time
• Luckily most variables (e.g. gender, ethnicity, or congenital condition) are constant
  – If necessary, one can program time-dependent variables
  – When might you want this?
Baseline hazard function, ho(t), is never specified
• You can estimate ho(t) accurately if you need to estimate S(t).

Hazard ratio

What is the hazard ratio and how do you calculate it from your parameters, β?
How do we estimate the relative risk from the hazard ratio (HR)?
How do you determine significance of the hazard ratios (HRs)?
• Confidence intervals
• Chi square test
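
One way to work through these questions is with a single coefficient. The sketch below uses the standard relations HR = exp(β), 95% confidence limits = exp(β ± 1.96 × SE), and Wald chi-square = (β / SE)² on 1 degree of freedom. The input values are the age2 row (estimate 0.15690, standard error 0.05079) from the PHREG output in the Tumor Extent example of this deck; the function name is illustrative.

```python
import math

def hazard_ratio_summary(estimate, std_error, z=1.96):
    """Hazard ratio, 95% confidence limits, Wald chi-square, and p value
    for one Cox regression coefficient."""
    hr = math.exp(estimate)
    lower = math.exp(estimate - z * std_error)
    upper = math.exp(estimate + z * std_error)
    wald_chi_square = (estimate / std_error) ** 2
    # Upper tail of a chi-square distribution with 1 degree of freedom
    p_value = math.erfc(math.sqrt(wald_chi_square / 2))
    return hr, lower, upper, wald_chi_square, p_value

# age2 row of the PHREG output: estimate 0.15690, standard error 0.05079
hr, lower, upper, chi_sq, p = hazard_ratio_summary(0.15690, 0.05079)
```

The results agree with the printed row to rounding: hazard ratio 1.170, confidence limits 1.059 to 1.292, chi-square about 9.54, p about 0.002. A confidence interval that excludes 1 and a small chi-square p value are two views of the same significance test.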

Assessing model adequacy

Multiplicative assumption
Proportional assumption: covariates are independent with respect to time and their hazard ratios are constant over time
Three general ways to examine model adequacy
• Graphically
• Mathematically
• Computationally: Time-dependent variables (extended model)

Model adequacy: graphical approaches

Several graphical approaches
• Do the survival curves intersect?
• Log-minus-log plots
• Observed vs. expected plots
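
The idea behind the log-minus-log plot can be shown numerically. If two groups have proportional hazards, their ln(-ln S(t)) curves are parallel, separated by the log of the hazard ratio. The sketch below assumes exponential survival and a hazard ratio of 2 purely for illustration; the rate, time points, and names are hypothetical.

```python
import math

def log_minus_log(survival):
    """ln(-ln(S)) for a survival probability strictly between 0 and 1."""
    return math.log(-math.log(survival))

# Assumed for illustration: exponential survival in a reference group, and a
# comparison group whose hazard is twice the reference hazard at every time.
baseline_rate, hazard_ratio = 0.1, 2.0
time_points = [1, 2, 5, 10]

gaps = []
for t in time_points:
    s_reference = math.exp(-baseline_rate * t)
    s_comparison = math.exp(-baseline_rate * hazard_ratio * t)
    # Vertical distance between the two log-minus-log curves at time t
    gaps.append(log_minus_log(s_comparison) - log_minus_log(s_reference))
```

Every gap equals ln(2), so the two curves are parallel. On real data, curves that converge, diverge, or cross suggest that the proportional hazards assumption does not hold.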

Testing model adequacy mathematically with a goodness-of-fit test

Uses a test of significance (hypothesis test)
One-degree-of-freedom chi-square distribution p value for each coefficient
Does not discriminate how a coefficient might deviate from the PH assumption

Example: Tumor Extent

3000 patients derived from SEER cancer registry and Medicare billing information

Exploring the relationship between tumor extent and survival

Hypothesis is that more extensive tumor involvement is related to poorer survival

Log-rank χ² = 269.0973, p < .0001


Tumor extent may not be the only covariate that affects survival
• Multiple medical comorbidities may be associated with poorer outcome
• Ethnic and gender differences may contribute
Cox proportional hazards model can quantify these relationships

Test proportional hazards assumption with log-minus-log plot
Perform Cox PH regression
• Examine significant coefficients and corresponding hazard ratios

Example: Tumor Extent

The PHREG Procedure

Analysis of Maximum Likelihood Estimates

Variable  DF  Parameter  Standard  Chi-Square  Pr > ChiSq  Hazard  95% Hazard Ratio   Variable
              Estimate   Error                             Ratio   Confidence Limits  Label
age2       1  0.15690    0.05079     9.5430    0.0020      1.170   1.059    1.292     70<age<=80
age3       1  0.58385    0.06746    74.9127    <.0001      1.793   1.571    2.046     age>80
race2      1  0.16088    0.07953     4.0921    0.0431      1.175   1.005    1.373     black
race3      1  0.05060    0.09590     0.2784    0.5977      1.052   0.872    1.269     other
comorb1    1  0.27087    0.05678    22.7549    <.0001      1.311   1.173    1.465
comorb2    1  0.32271    0.06341    25.9046    <.0001      1.381   1.219    1.564
comorb3    1  0.61752    0.06768    83.2558    <.0001      1.854   1.624    2.117
DISTANT    1  0.86213    0.07300   139.4874    <.0001      2.368   2.052    2.732
REGIONAL   1  0.51143    0.05016   103.9513    <.0001      1.668   1.512    1.840
LIPORAL    1  0.28228    0.05575    25.6366    <.0001      1.326   1.189    1.479
PHARYNX    1  0.43196    0.05787    55.7206    <.0001      1.540   1.375    1.725
treat3     1  0.07890    0.06423     1.5090    0.2193      1.082   0.954    1.227     both
treat2     1  0.47215    0.06074    60.4215    <.0001      1.603   1.423    1.806     rad
treat0     1  1.52773    0.08031   361.8522    <.0001      4.608   3.937    5.393     none
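
Because the model is multiplicative, coefficients add on the log scale and hazard ratios multiply. The sketch below combines two rows of the output above (age3 and DISTANT) for a patient older than 80 with distant disease. The reference levels are not stated in the output, so the comparison is simply "against a patient at the reference level of both variables, all else equal"; the variable names in the code are illustrative.

```python
import math

# Parameter estimates copied from the PHREG output above
coefficients = {"age3": 0.58385, "DISTANT": 0.86213}

# Sum the coefficients, then exponentiate to get the combined hazard ratio
combined_log_hazard = sum(coefficients.values())
combined_hazard_ratio = math.exp(combined_log_hazard)

# Equivalent route: multiply the individual hazard ratios
product_of_ratios = math.exp(coefficients["age3"]) * math.exp(coefficients["DISTANT"])
```

Both routes give a combined hazard ratio of about 4.25, which matches the product of the two printed hazard ratios (1.793 × 2.368) to rounding. This holds only if there is no interaction between the two variables, which the model as shown assumes.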

Summary

Survival analysis quantifies time to a single, dichotomous event
Handles censored data well
Survival and hazard can be mathematically converted to each other
Kaplan-Meier survival curves can be compared statistically and graphically
Cox proportional hazards models help distinguish individual contributions of covariates on survival, provided certain assumptions are met.
