Introduction to Survival Analysis
October 19, 2004
Brian F. Gage, MD, MSc with thanks to Bing Ho, MD, MPH
Division of General Medical Sciences
Presentation goals
Survival analysis compared w/ other regression techniques
What is survival analysis
When to use survival analysis
Univariate method: Kaplan-Meier curves
Multivariate methods:
  • Cox proportional hazards model
  • Parametric models
Assessment of adequacy of analysis
Examples
Regression vs. Survival Analysis
Technique            Predictor variables                 Outcome variable                                    Censoring permitted?
Linear regression    Categorical or continuous           Normally distributed                                No
Logistic regression  Categorical or continuous           Binary (except in polytomous logistic regression)   No
Survival analyses    Time and categorical or continuous  Binary                                              Yes
Regression vs. Survival Analysis
Technique            Mathematical model                                Yields
Linear regression    Y = B1X + Bo (linear)                             Linear changes
Logistic regression  ln(P/(1-P)) = B1X + Bo (sigmoidal probability)    Odds ratios
Survival analyses    h(t) = ho(t)exp(B1X+Bo)                           Hazard rates
What is survival analysis?
Model time to failure or time to event
  • Unlike linear regression, survival analysis has a dichotomous (binary) outcome
  • Unlike logistic regression, survival analysis analyzes the time to an event
    – Why is that important?
Able to account for censoring
Can compare survival between 2+ groups
Assess relationship between covariates and survival time
Importance of censored data
Why is censored data important?
What is the key assumption of censoring?
Types of censoring
Subject does not experience event of interest
Incomplete follow-up
  • Lost to follow-up
  • Withdraws from study
  • Dies (if death is not the event being studied)
Left or right censored
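One common way to record right-censored data is a follow-up time plus an event flag for each subject. The sketch below illustrates that layout; the `Subject` class and all four records are hypothetical and invented for illustration only.

```python
from dataclasses import dataclass

@dataclass
class Subject:
    days: float   # follow-up time in days
    event: bool   # True = event observed, False = right-censored

# Hypothetical records, not taken from any study
cohort = [
    Subject(120, True),    # event observed at day 120
    Subject(300, False),   # lost to follow-up at day 300
    Subject(450, False),   # withdrew from study at day 450
    Subject(900, False),   # still event-free when the study ended
]

n_events = sum(s.event for s in cohort)
n_censored = len(cohort) - n_events
print(n_events, n_censored)
```

A censored subject still contributes the information that no event occurred up to the recorded time, which is why the time is kept even when the flag is False.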
When to use survival analysis
Examples
  • Time to death or clinical endpoint
  • Time in remission after treatment of disease
  • Recidivism rate after addiction treatment
When one believes that 1+ explanatory variable(s) explain the differences in time to an event
Especially when follow-up is incomplete or variable
Relationship between survivor function and hazard function
Survivor function, S(t), defines the probability of surviving longer than time t
  • This is what the Kaplan-Meier curves show.
Hazard function is the negative derivative of the survivor function over time, divided by the survivor function: h(t) = -[dS(t)/dt] / S(t)
  – Instantaneous risk of event at time t, given survival to time t (conditional failure rate)
Survivor and hazard functions can be converted into each other
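The conversion between the two functions can be checked numerically. The sketch below uses the standard identities h(t) = -d/dt ln S(t) and S(t) = exp(-∫ h(u) du from 0 to t), and assumes a constant hazard `LAMBDA` chosen only for illustration; the function names are hypothetical.

```python
import math

LAMBDA = 0.002  # assumed constant hazard per day (hypothetical value)

def survivor(t):
    """S(t) for a constant hazard: exp(-LAMBDA * t)."""
    return math.exp(-LAMBDA * t)

def hazard_from_survivor(S, t, dt=1e-4):
    """h(t) = -d/dt ln S(t), approximated by a central difference."""
    return -(math.log(S(t + dt)) - math.log(S(t - dt))) / (2 * dt)

def survivor_from_hazard(h, t, steps=10000):
    """S(t) = exp(-integral of h from 0 to t), using the midpoint rule."""
    du = t / steps
    cumulative_hazard = sum(h((i + 0.5) * du) for i in range(steps)) * du
    return math.exp(-cumulative_hazard)

h_300 = hazard_from_survivor(survivor, 300)                 # recovers LAMBDA
s_300 = survivor_from_hazard(lambda u: LAMBDA, 300)         # recovers survivor(300)
print(h_300, s_300)
```

With a constant hazard both round trips recover the starting function; for other hazard shapes the same two identities apply, only the arithmetic changes.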
Approach to survival analysis
Like other statistics we have studied, we can do any of the following with survival analysis:
  • Descriptive statistics
  • Univariate statistics
  • Multivariate statistics
Descriptive statistics
Average survival
  • When can this be calculated?
  • What test would you use to compare average survival between 2 cohorts?
Average hazard rate
  • Total # of failures divided by observed survival time (units are therefore 1/t or 1/pt-yrs)
  • An incidence rate, with higher values indicating more events per time
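The average hazard rate is simple arithmetic: failures divided by total observed time. A minimal sketch, using made-up follow-up times in years:

```python
# Hypothetical follow-up times (years) and indicators (1 = failed, 0 = censored)
follow_up_years = [2.0, 3.5, 1.0, 4.0, 2.5]
failed = [1, 0, 1, 0, 1]

total_failures = sum(failed)
person_years = sum(follow_up_years)   # censored subjects still add observed time
average_hazard = total_failures / person_years   # events per person-year

print(total_failures, person_years, round(average_hazard, 3))
```

Here 3 failures over 13 person-years give roughly 0.23 events per person-year.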
Univariate method: Kaplan-Meier survival curves
Also known as the product-limit formula
Accounts for censoring
Generates the characteristic “stair step” survival curves
Does not account for confounding or effect modification by other covariates
  • When is that a problem?
  • When is that OK?
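To make the product-limit idea concrete, here is a minimal sketch of the estimator in plain Python. At each event time the running survival is multiplied by (1 - deaths / number at risk). The function name and the seven data points are hypothetical; subjects censored at an event time are counted as still at risk at that time, which is the usual convention.

```python
def kaplan_meier(times, events):
    """Return [(t, S(t))] at each distinct event time.

    times  : follow-up time per subject
    events : 1 if the event was observed, 0 if censored
    """
    data = sorted(zip(times, events))
    n_at_risk = len(data)
    surv = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = 0
        removed = 0
        # Group all subjects who share this follow-up time
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            removed += 1
            i += 1
        if deaths > 0:
            surv *= 1 - deaths / n_at_risk   # one "stair step" down
            curve.append((t, surv))
        n_at_risk -= removed                 # events and censorings both leave the risk set
    return curve

# Hypothetical data
times = [2, 3, 3, 5, 8, 8, 9]
events = [1, 0, 1, 1, 1, 0, 0]
km = kaplan_meier(times, events)
for t, s in km:
    print(t, round(s, 3))
```

Censored subjects never produce a step; they only shrink the denominator for later steps, which is how the method accounts for censoring.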
[Figure: Kaplan-Meier curves, Age 76 Years and Older (N = 394). x-axis: Days Since Index Hospitalization (0 to 900); y-axis: 0.0 to 1.0; groups: Warf, ASA, No Rx.]
[Figure: Time to Cardiovascular Adverse Event in VIGOR Trial]
Comparing Kaplan-Meier curves
Log-rank test can be used to compare survival curves
  • Less commonly used test: Wilcoxon, which places greater weight on events near time 0.
Hypothesis test (test of significance)
  • H0: the curves are statistically the same
  • H1: the curves are statistically different
Compares observed to expected cell counts
Test statistic is compared to the χ² distribution
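The observed-versus-expected comparison can be sketched for two groups as follows. At each event time, the expected events in group A are the total events times group A's share of the risk set; the summed difference is squared and divided by its (hypergeometric) variance to give a 1-degree-of-freedom chi-square statistic. The function name and data are hypothetical, and this is an illustrative sketch rather than a substitute for a statistics package.

```python
import math

def logrank_test(times_a, events_a, times_b, events_b):
    """Two-group log-rank test; returns (chi-square, p value) on 1 df."""
    event_times = sorted(
        {t for t, e in zip(times_a, events_a) if e}
        | {t for t, e in zip(times_b, events_b) if e}
    )
    observed_a = expected_a = variance = 0.0
    for t in event_times:
        n_a = sum(1 for x in times_a if x >= t)   # at risk in group A
        n_b = sum(1 for x in times_b if x >= t)   # at risk in group B
        d_a = sum(1 for x, e in zip(times_a, events_a) if x == t and e)
        d_b = sum(1 for x, e in zip(times_b, events_b) if x == t and e)
        n, d = n_a + n_b, d_a + d_b
        observed_a += d_a
        expected_a += d * n_a / n
        if n > 1:
            variance += d * (n_a / n) * (n_b / n) * (n - d) / (n - 1)
    chi2 = (observed_a - expected_a) ** 2 / variance
    p_value = math.erfc(math.sqrt(chi2 / 2))      # upper tail of chi-square, 1 df
    return chi2, p_value

# Hypothetical data: every subject has an observed event
chi2, p_value = logrank_test([1, 3], [1, 1], [2, 4], [1, 1])
print(round(chi2, 4), round(p_value, 4))
```

With only four subjects the statistic is small and the p value large, as expected; the point is the bookkeeping, not the result.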
Comparing multiple Kaplan-Meier curves
Multiple pair-wise comparisons produce cumulative Type I error: the multiple comparison problem
Instead, compare all curves at once
  • Analogous to using ANOVA to compare > 2 cohorts
  • Then use judicious pair-wise testing
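A rough sense of how fast Type I error accumulates can be had from the sketch below. It assumes the pair-wise tests are independent, which is not strictly true when curves share data, so the figure is only indicative. The Bonferroni threshold shown is one common correction and is not something the slides prescribe.

```python
from math import comb

def familywise_error(alpha, n_tests):
    """Chance of at least one false positive, assuming independent tests."""
    return 1 - (1 - alpha) ** n_tests

n_groups = 4                          # hypothetical number of survival curves
n_pairs = comb(n_groups, 2)           # number of pair-wise comparisons
fwer = familywise_error(0.05, n_pairs)
bonferroni_alpha = 0.05 / n_pairs     # per-test threshold under Bonferroni

print(n_pairs, round(fwer, 3), round(bonferroni_alpha, 4))
```

Four curves already imply six pair-wise tests and a familywise error of about 26% at a nominal 5% level, which is the motivation for a single overall test first.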
Limit of Kaplan-Meier curves
What happens when you have several covariates that you believe contribute to survival?
Example
  • Smoking, hyperlipidemia, diabetes, and hypertension contribute to time to myocardial infarct
Can use stratified K-M curves for 2 or maybe 3 covariates
Need another approach for many covariates: the multivariate Cox proportional hazards model is most common
  • (think multivariate regression or logistic regression rather than a Student’s t-test or the odds ratio from a 2 x 2 table)
Multivariate method: Cox proportional hazards
Needed to assess effect of multiple covariates on survival
Cox proportional hazards is the most commonly used multivariate survival method
  • Easy to implement in SPSS, Stata, or SAS
  • Parametric approaches are an alternative, but they require stronger assumptions about h(t).
Cox proportional hazard model
Works with hazard model
Conveniently separates baseline hazard function from covariates
  • Baseline hazard function over time
    – h(t) = ho(t)exp(B1X+Bo)
  • Covariates are time independent
  • B1 is used to calculate the hazard ratio, which is similar to the relative risk
Semiparametric: the baseline hazard is left unspecified (nonparametric) while the covariate effects are parametric
Fit by maximizing a partial (quasi-) likelihood function
Cox proportional hazards model, continued
Can handle both continuous and categorical predictor variables (think: logistic, linear regression)
Without knowing baseline hazard ho(t), can still calculate coefficients for each covariate, and therefore hazard ratio
Assumes multiplicative risk: this is the proportional hazard assumption
  • Can be compensated in part with interaction terms
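The claim that coefficients can be estimated without knowing ho(t) can be illustrated with a small sketch. It fits a single coefficient by Newton-Raphson on the Cox partial likelihood, in which the baseline hazard cancels out of every term (so no intercept is estimated). The function names and the eight-subject data set are hypothetical; the toy data have no tied event times, and real analyses would use a statistics package.

```python
import math

def score_and_information(beta, times, events, x):
    """First derivative and negative second derivative of the log partial likelihood."""
    score = info = 0.0
    for t_i, e_i, x_i in zip(times, events, x):
        if not e_i:
            continue  # censored subjects contribute only through risk sets
        # Risk set: everyone still under observation at time t_i
        risk_x = [x_j for t_j, x_j in zip(times, x) if t_j >= t_i]
        weights = [math.exp(beta * x_j) for x_j in risk_x]
        s0 = sum(weights)
        s1 = sum(x_j * w for x_j, w in zip(risk_x, weights))
        s2 = sum(x_j * x_j * w for x_j, w in zip(risk_x, weights))
        score += x_i - s1 / s0
        info += s2 / s0 - (s1 / s0) ** 2
    return score, info

def fit_cox(times, events, x, n_iter=25):
    """Newton-Raphson for one coefficient; returns (beta, standard error)."""
    beta = 0.0
    for _ in range(n_iter):
        score, info = score_and_information(beta, times, events, x)
        beta += score / info
    _, info = score_and_information(beta, times, events, x)
    return beta, 1 / math.sqrt(info)

# Hypothetical data: x = 1 marks an exposed subject; the last subject is censored
times = [1, 2, 3, 4, 5, 6, 7, 8]
events = [1, 1, 1, 1, 1, 1, 1, 0]
x = [1, 1, 0, 1, 0, 1, 0, 0]

beta_hat, se = fit_cox(times, events, x)
hazard_ratio = math.exp(beta_hat)
print(round(beta_hat, 3), round(se, 3), round(hazard_ratio, 2))
```

Only the ordering of event times and the composition of each risk set enter the calculation, which is why the baseline hazard never has to be specified.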
Limitations of Cox PH model
Does not accommodate variables that change over time
  • Luckily most variables (e.g. gender, ethnicity, or congenital condition) are constant
    – If necessary, one can program time-dependent variables
    – When might you want this?
Baseline hazard function, ho(t), is never specified
  • You can estimate ho(t) accurately if you need to estimate S(t).
Hazard ratio
What is the hazard ratio and how do you calculate it from your parameters, β?
How do we estimate the relative risk from the hazard ratio (HR)?
How do you determine significance of the hazard ratios (HRs)?
  • Confidence intervals
  • Chi-square test
Assessing model adequacy
Multiplicative assumption
Proportional assumption: covariates are independent with respect to time and their hazard ratios are constant over time
Three general ways to examine model adequacy
  • Graphically
  • Mathematically
  • Computationally: time-dependent variables (extended model)
Model adequacy: graphical approaches
Several graphical approaches
  • Do the survival curves intersect?
  • Log-minus-log plots
  • Observed vs. expected plots
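The idea behind the log-minus-log plot can be sketched numerically. If two groups have proportional hazards, their curves of ln(-ln S(t)) are separated by a constant gap equal to the log hazard ratio, so the plotted lines look parallel. The example assumes two hypothetical groups with constant hazards, one twice the other; all names and values are illustrative.

```python
import math

def log_minus_log(s):
    """ln(-ln S) for a survival probability 0 < S < 1."""
    return math.log(-math.log(s))

# Hypothetical groups with proportional hazards: HAZARD_B = 2 * HAZARD_A
HAZARD_A = 0.001   # per day, assumed
HAZARD_B = 0.002   # per day, assumed
days = [100, 300, 500, 700, 900]

gaps = []
for t in days:
    s_a = math.exp(-HAZARD_A * t)
    s_b = math.exp(-HAZARD_B * t)
    gaps.append(log_minus_log(s_b) - log_minus_log(s_a))

print([round(g, 4) for g in gaps])   # each gap equals ln(2) when hazards are proportional
```

In practice S(t) would come from Kaplan-Meier estimates for each group, and gaps that widen, narrow, or cross over time would suggest the proportional hazards assumption is violated.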
Testing model adequacy mathematically with a goodness-of-fit test
Uses a test of significance (hypothesis test)
One-degree-of-freedom chi-square distribution p value for each coefficient
Does not discriminate how a coefficient might deviate from the PH assumption
Example: Tumor Extent
3000 patients derived from SEER cancer registry and Medicare billing information
Exploring the relationship between tumor extent and survival
Hypothesis is that more extensive tumor involvement is related to poorer survival
Log-rank χ² = 269.0973, p < .0001
Example: Tumor Extent
Tumor extent may not be the only covariate that affects survival
  • Multiple medical comorbidities may be associated with poorer outcome
  • Ethnic and gender differences may contribute
Cox proportional hazards model can quantify these relationships
Example: Tumor Extent
Test proportional hazards assumption with log-minus-log plot
Perform Cox PH regression
  • Examine significant coefficients and corresponding hazard ratios
Example: Tumor Extent
The PHREG Procedure
Analysis of Maximum Likelihood Estimates

             Parameter  Standard                         Hazard  95% Hazard Ratio  Variable
Variable  DF  Estimate     Error Chi-Square  Pr > ChiSq   Ratio Confidence Limits  Label
age2       1   0.15690   0.05079     9.5430      0.0020   1.170    1.059    1.292  70<age<=80
age3       1   0.58385   0.06746    74.9127      <.0001   1.793    1.571    2.046  age>80
race2      1   0.16088   0.07953     4.0921      0.0431   1.175    1.005    1.373  black
race3      1   0.05060   0.09590     0.2784      0.5977   1.052    0.872    1.269  other
comorb1    1   0.27087   0.05678    22.7549      <.0001   1.311    1.173    1.465
comorb2    1   0.32271   0.06341    25.9046      <.0001   1.381    1.219    1.564
comorb3    1   0.61752   0.06768    83.2558      <.0001   1.854    1.624    2.117
DISTANT    1   0.86213   0.07300   139.4874      <.0001   2.368    2.052    2.732
REGIONAL   1   0.51143   0.05016   103.9513      <.0001   1.668    1.512    1.840
LIPORAL    1   0.28228   0.05575    25.6366      <.0001   1.326    1.189    1.479
PHARYNX    1   0.43196   0.05787    55.7206      <.0001   1.540    1.375    1.725
treat3     1   0.07890   0.06423     1.5090      0.2193   1.082    0.954    1.227  both
treat2     1   0.47215   0.06074    60.4215      <.0001   1.603    1.423    1.806  rad
treat0     1   1.52773   0.08031   361.8522      <.0001   4.608    3.937    5.393  none
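The hazard ratio columns follow directly from each parameter estimate and its standard error: HR = exp(β), the 95% limits are exp(β ± 1.96 × SE), and the Wald chi-square is (β / SE)². The sketch below applies those formulas to the age3 row; the function name is hypothetical, and small differences from the printed output are expected because the tabulated inputs are rounded.

```python
import math

def hazard_ratio_summary(beta, se, z=1.96):
    """Hazard ratio, 95% confidence limits, Wald chi-square and p value (1 df)."""
    hr = math.exp(beta)
    lower = math.exp(beta - z * se)
    upper = math.exp(beta + z * se)
    wald_chi2 = (beta / se) ** 2
    p_value = math.erfc(math.sqrt(wald_chi2 / 2))   # upper tail of chi-square, 1 df
    return hr, lower, upper, wald_chi2, p_value

# age3 row (age > 80): parameter estimate and standard error from the output
hr, lower, upper, wald_chi2, p_value = hazard_ratio_summary(0.58385, 0.06746)
print(round(hr, 3), round(lower, 3), round(upper, 3), round(wald_chi2, 2))
```

A confidence interval that excludes 1.0 and a small chi-square p value tell the same story: here, age over 80 is associated with a higher hazard than the reference age group.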
Summary
Survival analysis quantifies time to a single, dichotomous event
Handles censored data well
Survival and hazard can be mathematically converted to each other
Kaplan-Meier survival curves can be compared statistically and graphically
Cox proportional hazards models help distinguish individual contributions of covariates on survival, provided certain assumptions are met.