Prognostic Model Building with Biomarkers in Pharmacogenomics Trials Li-an Xu & Douglas Robinson...


Prognostic Model Building with Biomarkers in Pharmacogenomics Trials

Li-an Xu & Douglas Robinson

Statistical Genetics & Biomarkers

Exploratory Development, Global Biometric Sciences

Bristol-Myers Squibb

2006 FDA/Industry Statistics Workshop

Theme - Statistics in the FDA and Industry: Past, Present, and Future

Washington, DC

September 27-29, 2006

Outline

Statistical Challenges in Prognostic Model Building

Data quantity and quality across multiple platforms

Dimension reduction in model building process

Model performance measures

Realistic assessment of model performance

Handling correlated predictors: when p >> n

Data Quantity and Quality Across Platforms

Tumor samples for mRNA

Trial A sample size: 161 subjects

134 usable (sufficient quality and quantity) mRNA samples (85%)

Trial B sample size: 110 subjects

83 usable mRNA samples (75%)

Plasma protein profiling (Liquid Chromatography / Mass Spectrometry)

Trial B sample size: 110 subjects

90 usable plasma samples (82%)

Even if sample collection is mandatory, usable sample size < subject sample size

Need to design studies based on expected usable sample size
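As a rough illustration of planning around the expected usable sample size, the sketch below inflates enrollment by an assumed usable-sample rate. The function name `required_enrollment` and the target of 100 usable samples are hypothetical; the two rates are the Trial B percentages quoted above.

```python
import math

def required_enrollment(target_usable, usable_rate):
    """Subjects to enroll so that the expected number of usable samples reaches the target."""
    if not 0 < usable_rate <= 1:
        raise ValueError("usable_rate must be in (0, 1]")
    return math.ceil(target_usable / usable_rate)

# Hypothetical target of 100 usable samples, using the Trial B usable rates from the slide.
for platform, rate in [("mRNA", 0.75), ("plasma protein", 0.82)]:
    print(f"{platform}: enroll {required_enrollment(100, rate)} subjects for 100 usable samples")
```

This only matches the expected count; a design that needs the usable count to exceed the target with high probability would need a larger margin.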

Dimension Reduction in Prognostic Model Building

Number of potential predictors is greater than number of subjects (p >> n) in high-throughput biomarker studies

No unique solutions in prognostic model fitting with traditional methods

Regularized methods can provide some possible solutions

Penalized logistic regression (PLR) + Recursive Feature Elimination (RFE)

Threshold gradient descent + RFE

Further dimension reduction may still be needed

Incorporate prior information (e.g. results from preclinical studies as the starting point for p)

Intersection of single-biomarker results from multiple statistical methods

Dimension Reduction Through Penalized Logistic Regression with Recursive Feature Elimination to Select Genes

Training set: patients by ~22,000 genes, reduced step by step down to 1 gene

Choose the model with the smallest cross-validation error and fewest genes

[Figure: x-axis is the number of predictors in model, from 0 to 20,000]
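A minimal sketch of the PLR + RFE idea, under several assumptions the slides do not state: an L2 penalty, plain gradient descent, and dropping half of the remaining features per round. The names `fit_plr` and `rfe_path` and the toy data are hypothetical. The sketch only produces the elimination path; each subset on the path would then be scored by cross-validation to pick the smallest error with the fewest genes.

```python
import math
import random

def sigmoid(z):
    """Logistic function, written to avoid overflow for large |z|."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def fit_plr(X, y, feats, lam=0.1, lr=0.1, n_iter=300):
    """L2-penalized logistic regression on the columns in `feats`, by gradient descent."""
    w = {j: 0.0 for j in feats}
    b = 0.0
    n = len(y)
    for _ in range(n_iter):
        grad_w = {j: lam * w[j] for j in feats}  # gradient of the penalty term
        grad_b = 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(b + sum(w[j] * xi[j] for j in feats)) - yi
            grad_b += err / n
            for j in feats:
                grad_w[j] += err * xi[j] / n
        b -= lr * grad_b
        for j in feats:
            w[j] -= lr * grad_w[j]
    return w, b

def rfe_path(X, y, drop_frac=0.5, **fit_kwargs):
    """Refit, keep the features with the largest |weight|, repeat until one remains."""
    feats = list(range(len(X[0])))
    path = [list(feats)]
    while len(feats) > 1:
        w, _ = fit_plr(X, y, feats, **fit_kwargs)
        n_keep = max(1, int(len(feats) * (1 - drop_frac)))
        feats = sorted(feats, key=lambda j: abs(w[j]), reverse=True)[:n_keep]
        path.append(list(feats))
    return path

# Toy data (hypothetical): 40 subjects, 20 features, only feature 0 carries signal.
rng = random.Random(1)
y = [i % 2 for i in range(40)]
X = [[rng.gauss(0, 1) for _ in range(20)] for _ in y]
for xi, yi in zip(X, y):
    xi[0] = (2.0 if yi else -2.0) + rng.gauss(0, 0.5)

path = rfe_path(X, y)
print([len(f) for f in path])  # subset sizes along the path: 20, 10, 5, 2, 1
```

Ranking by absolute weight assumes the features are on comparable scales, so real expression data would normally be standardized first.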

Dimension Reduction Through Preclinical Studies

Predicting cell line sensitivity to a compound

18 cancer cell lines (12 sensitive, 6 resistant)

Identified top 200 genes associated with in vitro sensitivity/resistance

[Figure: 18 cancer cell lines grouped as sensitive vs. resistant, with an example of one gene]

Predicting Response in Trial A

Model                                                          PPV (95% CI)  NPV (95% CI)  Sensitivity (95% CI)  Specificity (95% CI)
Starting with full gene list, resulting in 6-gene model        (0-0.30)      (0.69-0.89)   (0-0.26)              (0.72-0.91)
Starting with preclinical top 200, resulting in 10-gene model  (0.21-0.72)   (0.79-0.95)   (0.21-0.72)           (0.79-0.95)

            All treated patients   Patients included in the genomics analysis
Response    29 (18%)               23 (17%)

Dimension reduction by using prior preclinical results seemed to help in this trial

Dimension Reduction Through Intersection of Single-Biomarker Results from Multiple Statistical Methods

Method    Resp1  Resp2  Resp3  Resp4  TTP
Log Reg   X      X      X      X
t-Test    X      X      X      X

Cox Proportional Hazards: 446 probesets

Logistic Regression: 297 probesets

t-Test: 396 probesets

Intersection resulted in 51 potential candidates

It may be more beneficial to start model building with this set than the complete set of potential predictors (work currently in progress)
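The intersection step can be sketched with plain set operations. The probeset IDs and the helper name `intersect_hits` are hypothetical; each set stands for the probesets one single-biomarker method flagged.

```python
def intersect_hits(hits_by_method):
    """Return the probesets flagged by every method."""
    sets = [set(hits) for hits in hits_by_method.values()]
    return set.intersection(*sets) if sets else set()

# Hypothetical probeset IDs flagged by each single-biomarker analysis.
hits = {
    "cox_ph": {"ps_001", "ps_002", "ps_003", "ps_007"},
    "logistic": {"ps_002", "ps_003", "ps_005"},
    "t_test": {"ps_002", "ps_003", "ps_007", "ps_009"},
}

candidates = sorted(intersect_hits(hits))
print(candidates)  # probesets that all three methods agree on
```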

Model Performance Measures

Sensitivity, Specificity, Positive and Negative Predictive Value are common measures of model performance

Dependent on the threshold

Area under the ROC curve (AUC) may be a better measure for comparing models

All three models yield complete separation between responders and non-responders

Arbitrary threshold of 0.5 probability may lead one to believe that model 2 is superior

AUC correctly shows equivalence

          Sensitivity  Specificity  PPV   NPV   AUC
Model 1   0.73         1            1     0.79  1
Model 2   1            1            1     1     1
Model 3   1            0.77         0.81  1     1

[Figure: response probability by response status (non-responder vs. responder), one panel each for Model 1, Model 2, and Model 3]

• These figures are from simulated perfect predictors
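The threshold dependence can be reproduced with a small sketch. The probabilities below are hypothetical, not the simulated data behind the figures: all three score lists separate responders from non-responders perfectly, but they sit at different positions relative to the 0.5 cutoff. AUC is computed here as the fraction of responder/non-responder pairs ranked correctly.

```python
def threshold_metrics(probs, labels, threshold=0.5):
    """Sensitivity, specificity, PPV and NPV when calling 'responder' at probs >= threshold."""
    tp = sum(1 for p, y in zip(probs, labels) if p >= threshold and y == 1)
    fn = sum(1 for p, y in zip(probs, labels) if p < threshold and y == 1)
    fp = sum(1 for p, y in zip(probs, labels) if p >= threshold and y == 0)
    tn = sum(1 for p, y in zip(probs, labels) if p < threshold and y == 0)
    ratio = lambda a, b: a / b if b else float("nan")
    return {
        "sensitivity": ratio(tp, tp + fn),
        "specificity": ratio(tn, tn + fp),
        "ppv": ratio(tp, tp + fp),
        "npv": ratio(tn, tn + fn),
    }

def auc(scores, labels):
    """Share of (responder, non-responder) pairs ranked correctly; ties count one half."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0 for p in pos for q in neg)
    return wins / (len(pos) * len(neg))

labels = [0, 0, 0, 0, 1, 1, 1, 1]  # 0 = non-responder, 1 = responder
models = {
    "shifted low": [0.05, 0.10, 0.15, 0.20, 0.30, 0.40, 0.60, 0.70],
    "centered": [0.10, 0.20, 0.30, 0.40, 0.60, 0.70, 0.80, 0.90],
    "shifted high": [0.30, 0.40, 0.55, 0.60, 0.70, 0.80, 0.90, 0.95],
}

for name, probs in models.items():
    m = threshold_metrics(probs, labels)
    print(name, m["sensitivity"], m["specificity"], auc(probs, labels))
```

Only the centered scores reach sensitivity and specificity of 1 at the 0.5 cutoff, while the AUC is 1 for all three lists.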

Realistic Assessment of Model Performance

When sample size is reasonably large

Split sample into a training set and independent test set

Build the model on the training set and test the model performance on the test set

Pro: One independent test of model performance for the model picked in the training set

Cons:

When sample size is small, the estimate of performance may have a large variance

Reduced sample size for training may yield sub-optimal model

Realistic Assessment of Model Performance

When sample size is small, one cannot split data into training / test set

Cross-validation alone is a reasonable alternative

Warning: Initial performance estimate may be misleading

Entire model building procedure should be cross-validated (Christophe Ambroise & Geoffrey J. McLachlan, PNAS 99(10), 2002)

Cross-validation should be repeated multiple times

Allows one to observe effects of sampling variability

The average of replicate estimators gives a more accurate assessment of model performance

[Figure: AUC versus number of predictors, showing individual runs and the average AUC]
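A sketch of why the whole model-building procedure belongs inside the cross-validation loop, and why the loop is repeated. Everything here is an illustrative stand-in rather than the authors' pipeline: the data are pure noise, feature selection is a simple mean-difference filter, the classifier is nearest-centroid, and the names `cv_auc`, `select_top_k` and `centroid_score` are hypothetical. With no real signal, selection inside each fold should give an average AUC near 0.5, while selecting on all samples first tends to look much better than it should.

```python
import random

def avg(values):
    values = list(values)
    return sum(values) / len(values)

def auc(scores, labels):
    """Share of (responder, non-responder) pairs ranked correctly; ties count one half."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0 for p in pos for q in neg)
    return wins / (len(pos) * len(neg))

def select_top_k(X, y, rows, k):
    """Rank features by absolute difference in class means, using only `rows`."""
    pos = [i for i in rows if y[i] == 1]
    neg = [i for i in rows if y[i] == 0]
    gap = lambda j: abs(avg(X[i][j] for i in pos) - avg(X[i][j] for i in neg))
    return sorted(range(len(X[0])), key=gap, reverse=True)[:k]

def centroid_score(X, y, train, i, feats):
    """Positive when sample i is closer to the responder centroid of the training rows."""
    pos = [r for r in train if y[r] == 1]
    neg = [r for r in train if y[r] == 0]
    score = 0.0
    for j in feats:
        mp = avg(X[r][j] for r in pos)
        mn = avg(X[r][j] for r in neg)
        score += (X[i][j] - mn) ** 2 - (X[i][j] - mp) ** 2
    return score

def cv_auc(X, y, k_folds, k_feats, rng, select_inside=True):
    """One run of k-fold CV; AUC is computed on the pooled out-of-fold scores."""
    idx = list(range(len(y)))
    rng.shuffle(idx)
    folds = [idx[f::k_folds] for f in range(k_folds)]
    fixed = None if select_inside else select_top_k(X, y, idx, k_feats)
    scores = [0.0] * len(y)
    for test in folds:
        held_out = set(test)
        train = [i for i in idx if i not in held_out]
        feats = select_top_k(X, y, train, k_feats) if select_inside else fixed
        for i in test:
            scores[i] = centroid_score(X, y, train, i, feats)
    return auc(scores, y)

rng = random.Random(0)
y = [1] * 20 + [0] * 20
X = [[rng.gauss(0, 1) for _ in range(200)] for _ in y]  # pure noise, no real signal

inside = [cv_auc(X, y, 5, 5, rng, select_inside=True) for _ in range(10)]
outside = [cv_auc(X, y, 5, 5, rng, select_inside=False) for _ in range(10)]
print("selection inside CV, average AUC:", round(avg(inside), 2))
print("selection outside CV, average AUC:", round(avg(outside), 2))
```

The spread of the ten individual values in `inside` shows the sampling variability that a single cross-validation run would hide.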

Handling Correlated Predictors: When p >> n

Complex correlation structure (mRNA as example)

Multiple probe sets interrogate the same gene

Multiple genes function together in pathways

Not all pathways are known

Multiple response definitions that are interrelated

False positive genes may be correlated with true positives

Most prognostic modeling techniques do not handle this well

Recursive feature elimination may remove important predictors because of correlations

This is an open research problem

Summary

Need to design studies based on expected usable sample size

Dimension reduction in the model building process

Overfitting problem can be mitigated by regularized methods

To further reduce the candidate set of predictors

Preclinical information can be useful

Intersection of single-biomarker results by different statistical methods may also be useful

Model performance

Independent test set may be important for validation purposes. When sample size is small, cross-validation is a viable alternative.

Cross-validation should include biomarker selection procedures and needs to be performed appropriately

Cross-validation should be repeated multiple times

Performance measures should be carefully chosen when comparing multiple models. AUC often is a good choice.

Handling correlated predictors is still an open research problem

Acknowledgments

Can Cai

Scott Chasalow

Ed Clark

Mark Curran

Ashok Dongre

Matt Farmer

Alexander Florczyk

Shirin Ford

Susan Galbraith

Ji Gao

Nancy Gustafson

Ben Huang

Tom Kelleher

Christiane Langer

Hyerim Lee

Haolan Lu

David Mauro

Shelley Mayfield

Oksana Mokliatchouk

Relekar Padmavathibai

Barry Paul

Lynn Ploughman

Amy Ronczka

Katy Simonsen

Eric Strittmatter

Dana Wheeler

Shujian Wu

Shuang Wu

Kim Zerba

Renping Zhang
