Prognostic Model Building with Biomarkers in Pharmacogenomics Trials
Li-an Xu & Douglas Robinson
Statistical Genetics & Biomarkers
Exploratory Development, Global Biometric Sciences
Bristol-Myers Squibb
2006 FDA/Industry Statistics Workshop
Theme - Statistics in the FDA and Industry: Past, Present, and Future
Washington, DC
September 27-29, 2006
Outline
Statistical Challenges in Prognostic Model Building
Data quantity and quality across multiple platforms
Dimension reduction in model building process
Model performance measures
Realistic assessment of model performance
Handling correlated predictors: when p >> n
Data Quantity and Quality Across Platforms
Tumor samples for mRNA
  Trial A Sample Size: 161 Subjects
    134 usable (sufficient quality and quantity) mRNA samples (83%)
  Trial B Sample Size: 110 Subjects
    83 usable mRNA samples (75%)
Plasma protein profiling (Liquid Chromatography / Mass Spectrometry)
  Trial B Sample Size: 110 Subjects
    90 usable plasma samples (82%)
Even if sample collection is mandatory, usable sample size < subject sample size
Need to design studies based on expected usable sample size
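As a rough illustration of designing for the expected usable sample size, the sketch below inflates a target number of usable samples by an assumed usable-sample rate. The function name `required_enrollment` and the target of 100 usable samples are hypothetical; the rates are the usable/enrolled counts reported above, and treating them as fixed rates for a future study is an assumption.

```python
import math

def required_enrollment(target_usable: int, usable_rate: float) -> int:
    """Subjects to enroll so that the expected number of usable samples
    reaches target_usable, assuming a fixed usable-sample rate."""
    if not 0 < usable_rate <= 1:
        raise ValueError("usable_rate must be in (0, 1]")
    return math.ceil(target_usable / usable_rate)

# Usable rates observed in the trials above (usable / enrolled).
observed_rates = {
    "Trial A mRNA": 134 / 161,
    "Trial B mRNA": 83 / 110,
    "Trial B plasma": 90 / 110,
}

# Hypothetical target: 100 usable samples per platform.
for platform, rate in observed_rates.items():
    n = required_enrollment(100, rate)
    print(f"{platform}: usable rate {rate:.2f}, enroll at least {n}")
```

This only matches the expected count; a design that must reach the target with high probability would need a larger margin.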
Dimension Reduction in Prognostic Model Building
Number of potential predictors is greater than number of subjects (p >> n) in high-throughput biomarker studies
  No unique solutions in prognostic model fitting with traditional methods
  Regularized methods can provide some possible solutions
    Penalized logistic regression (PLR) + Recursive Feature Elimination (RFE)
    Threshold gradient descent + RFE
Further dimension reduction may still be needed
  Incorporate prior information (e.g., results from preclinical studies as the starting point for p)
  Intersection of single-biomarker results from multiple statistical methods
Dimension Reduction Through Penalized Logistic Regression with Recursive Feature Elimination to Select Genes
[Figure: training set matrix (genes by patients), reduced by recursive feature elimination from ~22,000 genes down to 1 gene; plot of average cross-validation error (y-axis) against number of predictors in model (x-axis).]
Choose the model with the smallest cross-validation error and fewest genes
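The elimination loop described on this slide can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: the function names `fit_plr` and `rfe_order`, the ridge penalty, the gradient-descent settings, and the three-gene toy data are all assumptions made for the example. Gene 0 carries the signal; genes 1 and 2 are uninformative.

```python
import math

def fit_plr(X, y, lam=1.0, lr=0.1, steps=300):
    """Fit ridge-penalized logistic regression by gradient descent.
    Returns (weights, intercept); the intercept is not penalized."""
    n, p = len(X), len(X[0])
    w, b = [0.0] * p, 0.0
    for _ in range(steps):
        grad_w, grad_b = [0.0] * p, 0.0
        for xi, yi in zip(X, y):
            z = b + sum(wj * xj for wj, xj in zip(w, xi))
            z = max(-30.0, min(30.0, z))  # guard against overflow in exp
            err = 1.0 / (1.0 + math.exp(-z)) - yi
            grad_b += err
            for j in range(p):
                grad_w[j] += err * xi[j]
        w = [wj - lr * (gj + lam * wj) / n for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b

def rfe_order(X, y, lam=1.0):
    """Recursive feature elimination: refit on the remaining features and
    drop the one with the smallest absolute weight, until none remain.
    Returns feature indices in elimination order (last survivor last)."""
    active = list(range(len(X[0])))
    dropped = []
    while active:
        X_active = [[row[j] for j in active] for row in X]
        w, _ = fit_plr(X_active, y, lam)
        weakest = min(range(len(active)), key=lambda k: abs(w[k]))
        dropped.append(active.pop(weakest))
    return dropped

# Toy data (invented): 8 patients, 3 genes; gene 0 tracks the response.
y = [0, 0, 0, 0, 1, 1, 1, 1]
X = [
    [-1.0, 1.0, 1.0], [-1.2, -1.0, 1.0], [-0.8, 1.0, -1.0], [-1.0, -1.0, -1.0],
    [1.0, 1.0, 1.0], [0.8, -1.0, 1.0], [1.2, 1.0, -1.0], [1.0, -1.0, -1.0],
]
order = rfe_order(X, y)
print("elimination order:", order, "| last surviving gene:", order[-1])
```

In a real analysis, each model size along the elimination path would be scored by cross-validation error, and the smallest model with the lowest error would be chosen, as the slide states.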
Dimension Reduction Through Preclinical Studies
Predicting cell line sensitivity to a compound
  18 cancer cell lines (12 sensitive, 6 resistant)
  Identified top 200 genes associated with in vitro sensitivity/resistance
[Figure: example of one gene. Expression level (y-axis, 0 to 20,000; high vs. low) across the 18 cancer cell lines, grouped as sensitive or resistant. Cell lines shown: AU565, HCC1806, HCC38, BT20, BT549, MDAMB435S, HCC1954, SkBr3, MDAMB157, HCC70, Hs578T, MDAMB436, HCC1428, BT474, Her2MCF7, MCF7, Zr-75-30, Zr-75-1.]
Predicting Response in Trial A
Model                                                       PPV (95% CI)      NPV (95% CI)      Sensitivity (95% CI)  Specificity (95% CI)  Error
Starting with full gene list, resulting in 6-gene model     0 (0-0.30)        0.81 (0.69-0.89)  0 (0-0.26)            0.84 (0.72-0.91)      0.580
Starting with preclinical top 200, resulting in 10-gene model  0.45 (0.21-0.72)  0.89 (0.79-0.95)  0.45 (0.21-0.72)   0.89 (0.79-0.95)      0.326

            All treated patients (N=161)   Patients included in the genomics analysis (N=134)
Response    29 (18%)                       23 (17%)

Dimension reduction by using prior preclinical results seemed to help in this trial
Dimension Reduction Through Intersection of Single-Biomarker Results from Multiple Statistical Methods
Method    Resp1  Resp2  Resp3  Resp4  TTP
Log Reg   X      X      X      X
t-Test    X      X      X      X
Cox                                   X
Intersection resulted in 51 potential candidates
It may be more beneficial to start model building with this set than the complete set of potential predictors (work currently in progress)
[Figure: Venn diagram of probesets selected by each method. Cox Proportional Hazards: 446 probesets; Logistic Regression: 297 probesets; t-Test: 396 probesets; intersection: 51. The diagram also shows the value 9746.]
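The intersection step itself is a plain set operation. The sketch below assumes each method has already produced its own list of significant probesets; the probeset identifiers and the helper name `intersect_hits` are hypothetical placeholders, not data from the trial.

```python
def intersect_hits(hits_by_method):
    """Return the probesets flagged by every method."""
    sets = [set(hits) for hits in hits_by_method.values()]
    return set.intersection(*sets) if sets else set()

# Hypothetical per-method results (identifiers invented for illustration).
hits = {
    "logistic_regression": {"ps_001", "ps_002", "ps_003", "ps_007"},
    "t_test": {"ps_002", "ps_003", "ps_005", "ps_007"},
    "cox_ph": {"ps_003", "ps_004", "ps_007", "ps_009"},
}

candidates = sorted(intersect_hits(hits))
print(candidates)  # ['ps_003', 'ps_007']
```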
Model Performance Measures
Sensitivity, Specificity, Positive and Negative Predictive Value are common measures of model performance
  Dependent on the threshold
Area under the ROC curve (AUC) may be a better measure for comparing models
  All three models yield complete separation between responders and non-responders
  Arbitrary threshold of 0.5 probability may lead one to believe that model 2 is superior
  AUC correctly shows equivalence
         Sensitivity  Specificity  PPV   NPV   AUC
Model 1  0.73         1            1     0.79  1
Model 2  1            1            1     1     1
Model 3  1            0.77         0.81  1     1
[Figure: three panels (Model 1, Model 2, Model 3), each plotting response probability (y-axis, 0 to 1) by response status (Non-Responder vs. Responder).]
• These figures are from simulated perfect predictors
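The threshold dependence can be reproduced with a small simulation. The predicted probabilities below are invented toy values (so the metrics differ from the table above), and the helper names are hypothetical; the point is only that three perfectly separating models look different at a 0.5 cutoff yet share an AUC of 1.

```python
def threshold_metrics(probs, labels, threshold=0.5):
    """Sensitivity, specificity, PPV and NPV at a fixed probability cutoff.
    Zero denominators are not handled in this sketch."""
    tp = sum(1 for p, y in zip(probs, labels) if p >= threshold and y == 1)
    fp = sum(1 for p, y in zip(probs, labels) if p >= threshold and y == 0)
    fn = sum(1 for p, y in zip(probs, labels) if p < threshold and y == 1)
    tn = sum(1 for p, y in zip(probs, labels) if p < threshold and y == 0)
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "ppv": tp / (tp + fp),
        "npv": tn / (tn + fn),
    }

def auc(probs, labels):
    """AUC as the fraction of (responder, non-responder) pairs ranked
    correctly, counting ties as one half."""
    pos = [p for p, y in zip(probs, labels) if y == 1]
    neg = [p for p, y in zip(probs, labels) if y == 0]
    wins = sum(1.0 if a > b else 0.5 if a == b else 0.0
               for a in pos for b in neg)
    return wins / (len(pos) * len(neg))

# 4 non-responders (0) followed by 4 responders (1); probabilities invented.
labels = [0, 0, 0, 0, 1, 1, 1, 1]
models = {
    "Model 1": [0.05, 0.10, 0.20, 0.30, 0.40, 0.45, 0.60, 0.70],  # shifted low
    "Model 2": [0.05, 0.10, 0.20, 0.30, 0.70, 0.80, 0.90, 0.95],  # centered
    "Model 3": [0.30, 0.40, 0.55, 0.60, 0.70, 0.80, 0.90, 0.95],  # shifted high
}

for name, probs in models.items():
    m = threshold_metrics(probs, labels)
    print(name, {k: round(v, 2) for k, v in m.items()},
          "AUC =", auc(probs, labels))
```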
Realistic Assessment of Model Performance
When sample size is reasonably large
  Split sample into a training set and an independent test set
  Build the model on the training set and test the model performance on the test set
  Pro: One independent test of model performance for the model picked in the training set
  Cons:
    When sample size is small, the estimate of performance may have a large variance
    Reduced sample size for training may yield sub-optimal model
Entire model building procedure should be cross-validated
  • Christophe Ambroise & Geoffrey J. McLachlan, PNAS 99(10): 2002
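What cross-validating the entire procedure means in code: feature selection must be redone inside every fold, using only that fold's training samples. The sketch below contrasts this with selecting features once on all samples. Everything here is an assumption for illustration: the data are pure noise (no gene is truly predictive), the selector is a simple difference in class means, the classifier is nearest-centroid, and all function names are hypothetical.

```python
import random

def top_k_features(X, y, k):
    """Rank features by absolute difference in class means; keep the top k."""
    def gap(j):
        a = [row[j] for row, lab in zip(X, y) if lab == 1]
        b = [row[j] for row, lab in zip(X, y) if lab == 0]
        return abs(sum(a) / len(a) - sum(b) / len(b))
    return sorted(range(len(X[0])), key=gap, reverse=True)[:k]

def centroid_predict(X_train, y_train, x, feats):
    """Nearest-centroid prediction restricted to the selected features."""
    def centroid(lab):
        rows = [r for r, l in zip(X_train, y_train) if l == lab]
        return [sum(r[j] for r in rows) / len(rows) for j in feats]
    def dist(c):
        return sum((x[j] - cj) ** 2 for j, cj in zip(feats, c))
    return 1 if dist(centroid(1)) < dist(centroid(0)) else 0

def cv_error(X, y, k_feats, n_folds, select_inside):
    """k-fold CV error. If select_inside is False, features are chosen once
    from ALL samples, so held-out samples leak into the selection step."""
    n = len(X)
    folds = [list(range(i, n, n_folds)) for i in range(n_folds)]
    leaked = top_k_features(X, y, k_feats)
    errors = 0
    for test_idx in folds:
        held = set(test_idx)
        X_tr = [X[i] for i in range(n) if i not in held]
        y_tr = [y[i] for i in range(n) if i not in held]
        feats = top_k_features(X_tr, y_tr, k_feats) if select_inside else leaked
        errors += sum(centroid_predict(X_tr, y_tr, X[i], feats) != y[i]
                      for i in test_idx)
    return errors / n

random.seed(1)
y = [i % 2 for i in range(40)]                        # 20 per class
X = [[random.gauss(0, 1) for _ in range(500)] for _ in y]  # noise only

print("selection outside CV:", cv_error(X, y, 10, 5, select_inside=False))
print("selection inside CV: ", cv_error(X, y, 10, 5, select_inside=True))
```

Because the data contain no signal, an honest error estimate should sit near 0.5; one would typically expect the outside-selection estimate to look much better than that, which is the selection bias the cited paper describes.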
Realistic Assessment of Model Performance
When sample size is small, one cannot split data into training / test set
  Cross-validation alone is a reasonable alternative
  Warning: Initial performance estimate may be misleading
Cross-validation should be repeated multiple times
  Allows one to observe effects of sampling variability
  The average of replicate estimators gives a more accurate assessment of model performance
[Figure: cross-validated AUC (y-axis) against number of predictors (x-axis), showing individual runs and the average AUC.]
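A minimal sketch of repeated cross-validation follows, assuming a simulated data set with a weak signal and a deliberately simple model (a difference-of-class-means score). The function names, the 5-fold design, the 20 repeats, and pooling held-out scores across folds before computing AUC are all choices made for the illustration.

```python
import random
import statistics

def auc(scores, labels):
    """Fraction of (positive, negative) pairs ranked correctly; ties = 0.5."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if a > b else 0.5 if a == b else 0.0
               for a in pos for b in neg)
    return wins / (len(pos) * len(neg))

def fit_direction(X, y):
    """Toy model: weight each feature by its difference in class means."""
    pos = [r for r, l in zip(X, y) if l == 1]
    neg = [r for r, l in zip(X, y) if l == 0]
    return [sum(r[j] for r in pos) / len(pos) - sum(r[j] for r in neg) / len(neg)
            for j in range(len(X[0]))]

def one_cv_auc(X, y, n_folds, rng):
    """One run of k-fold CV on a fresh random partition. Held-out scores
    from all folds are pooled before computing AUC (a simplification)."""
    idx = list(range(len(X)))
    rng.shuffle(idx)
    scores = [0.0] * len(X)
    for f in range(n_folds):
        test = idx[f::n_folds]
        held = set(test)
        X_tr = [X[i] for i in idx if i not in held]
        y_tr = [y[i] for i in idx if i not in held]
        w = fit_direction(X_tr, y_tr)
        for i in test:
            scores[i] = sum(wj * xj for wj, xj in zip(w, X[i]))
    return auc(scores, y)

rng = random.Random(0)
y = [i % 2 for i in range(30)]                                # 15 per class
X = [[rng.gauss(0.5 * yi, 1.0) for _ in range(5)] for yi in y]  # weak signal

aucs = [one_cv_auc(X, y, 5, rng) for _ in range(20)]  # 20 replicate runs
print("range of individual runs:", round(min(aucs), 3), "to", round(max(aucs), 3))
print("average AUC:", round(statistics.mean(aucs), 3),
      "SD:", round(statistics.stdev(aucs), 3))
```

The spread across individual runs reflects sampling variability in how the folds happen to be drawn; the average is the replicate estimator the slide recommends.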
Handling Correlated Predictors: When p >> n
Complex correlation structure (mRNA as example)
  Multiple probe sets interrogate the same gene
  Multiple genes function together in pathways
    Not all pathways are known
  Multiple response definitions that are interrelated
  False positive genes may be correlated with true positives
Most prognostic modeling techniques do not handle this well
  Recursive feature elimination may remove important predictors because of correlations
This is an open research problem
Summary
Need to design studies based on expected usable sample size
Dimension reduction in the model building process
  Overfitting problem can be mitigated by regularized methods
  To further reduce the candidate set of predictors
    Preclinical information can be useful
    Intersection of single-biomarker results by different statistical methods may also be useful
Model performance
  Independent test set may be important for validation purposes. When sample size is small, cross-validation is a viable alternative.
  Cross-validation should include biomarker selection procedures and needs to be performed appropriately
  Cross-validation should be repeated multiple times
  Performance measures should be carefully chosen when comparing multiple models. AUC often is a good choice.
Handling correlated predictors is still an open research problem
Acknowledgments
Can Cai
Scott Chasalow
Ed Clark
Mark Curran
Ashok Dongre
Matt Farmer
Alexander Florczyk
Shirin Ford
Susan Galbraith
Ji Gao
Nancy Gustafson
Ben Huang
Tom Kelleher
Christiane Langer
Hyerim Lee
Haolan Lu
David Mauro
Shelley Mayfield
Oksana Mokliatchouk
Relekar Padmavathibai
Barry Paul
Lynn Ploughman
Amy Ronczka
Katy Simonsen
Eric Strittmatter
Dana Wheeler
Shujian Wu
Shuang Wu
Kim Zerba
Renping Zhang