Challenges In Progressing Biomarkers To Clinical Use
Proteomic Experiences
Chris Harbron
Technical Lead For High Dimensional Data
AstraZeneca
FDA Industry Statistics Workshop
September 2006
2 Gap Between Published Biomarkers And Biomarkers Being Approved For Use
3 Why Might This Be? Challenges
• Pressures from the contextual environment
• High quality data is essential
  – These are new technologies - not simple to use or analyse
  – Robust study design including:
  – Consistent sample collection and processing
  – Need to understand reproducibility between & within labs & within subjects
• Failure leads to poor data quality, frequently dominated by nuisance factors
• Rigorous validation is also essential
  – Occurs at many levels
  – Avoid overfitting data
• Omics may not do it alone
  – Applications will require combining -omics with other data types
4 Example: Case-Control Study
• Interest in identifying a peptidomic profile that could predict an adverse event
  – Potential use as a personalised medicine predictive marker
• Blood samples taken from subjects at start of treatment
• Subjects monitored for adverse event using a rigorous definition
• Subjects entered in cohorts
• Samples processed in batches within cohorts
• Analysed on an LC-MS/MS platform
5 LC-MS/MS Proteomics
[Figure: LC-MS/MS workflow. Stages: Clinical Plasma Samples; Preparation & Digestion; Peptides; Liquid Chromatography (Separation By Retention Time); Mass Spectrometry (Separation By Mass/Charge, Measurement Of Intensity); MS/MS; Protein Identification. Plots: ion intensity against mass/charge ratio and retention time; MS/MS fragment spectrum of relative abundance against m/z.]
6 Distribution Of Average Intensities
[Figure: average intensity, from low to high, over retention time and mass-charge ratio.]
• ~5,500,000 RT / MZ / Intensity measurements per sample
• ~25,000 common peaks per sample
• Pre-processing
  – Alignment of retention times
  – Scaling
  – Binning
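The scaling and binning steps listed above can be illustrated with a minimal sketch. It assumes each sample arrives as three parallel arrays (retention time, m/z, intensity); the bin widths, the median-based scaling rule and the function names are hypothetical choices, not the ones used in the study. Retention-time alignment is not shown.

```python
import numpy as np

def bin_measurements(rt, mz, intensity, rt_edges, mz_edges):
    """Sum the intensities falling in each (retention time, m/z) bin."""
    grid, _, _ = np.histogram2d(rt, mz, bins=[rt_edges, mz_edges], weights=intensity)
    return grid

def scale_to_median(grid):
    """Assumed scaling rule: divide by the median of the non-empty bins."""
    return grid / np.median(grid[grid > 0])

# Synthetic stand-in for one sample (not the study data).
rng = np.random.default_rng(0)
rt = rng.uniform(0, 60, size=10_000)               # retention time, arbitrary units
mz = rng.uniform(200, 1400, size=10_000)           # mass/charge ratio
intensity = rng.lognormal(mean=5, sigma=1, size=10_000)

rt_edges = np.linspace(0, 60, 61)                  # 60 retention-time bins
mz_edges = np.linspace(200, 1400, 121)             # 120 m/z bins
raw = bin_measurements(rt, mz, intensity, rt_edges, mz_edges)
grid = scale_to_median(raw)
```

Applying the same bin edges to every sample gives a common grid, so that bins can be compared across samples as peaks.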
7 Proteomic Data: Exploratory Analysis - PCA
• Considerable batch to batch variation
[Figure: PCA plot. Labels: Cohort 1, Cohort 2, Cohort 3, Cohort 4; Control, Case, Non-Index Case.]
8 Proteomic Data: Exploratory Analysis - PCA
• Within all batches with both cases and controls, there is separation of cases and controls
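The pattern described on these two PCA slides, batch variation dominating overall while cases and controls separate within batches, can be reproduced on synthetic data. The sketch below is an illustration only: the data, effect sizes and helper names are all assumptions, and centring each batch on its own mean is one simple way to look within batches, not necessarily what was done in the study.

```python
import numpy as np

def pca_scores(X, n_components=2):
    """Return PCA scores and the fraction of variance explained per component."""
    Xc = X - X.mean(axis=0)                          # centre each peak
    U, s, _ = np.linalg.svd(Xc, full_matrices=False)
    explained = s ** 2 / np.sum(s ** 2)
    return U[:, :n_components] * s[:n_components], explained[:n_components]

def centre_within_batch(X, batch):
    """Subtract each batch's own mean from its samples, peak by peak."""
    Xb = X.astype(float).copy()
    for b in np.unique(batch):
        Xb[batch == b] -= Xb[batch == b].mean(axis=0)
    return Xb

def between_group_fraction(values, groups):
    """Fraction of the variance of `values` that lies between group means."""
    total = np.sum((values - values.mean()) ** 2)
    between = sum(np.sum(groups == g) * (values[groups == g].mean() - values.mean()) ** 2
                  for g in np.unique(groups))
    return between / total

# Synthetic stand-in data (not the study data): 4 batches x 10 samples, 200 peaks.
rng = np.random.default_rng(1)
batch = np.repeat(np.arange(4), 10)
is_case = np.tile(np.repeat([True, False], 5), 4)    # 5 cases, 5 controls per batch
X = rng.normal(size=(40, 200))
X += rng.normal(scale=3.0, size=(4, 200))[batch]     # large batch shifts
X[is_case, :20] += 3.0                               # case effect on 20 peaks

raw_scores, raw_explained = pca_scores(X)
within_scores, _ = pca_scores(centre_within_batch(X, batch))

batch_share_raw = between_group_fraction(raw_scores[:, 0], batch)
batch_share_within = between_group_fraction(within_scores[:, 0], batch)
case_share_within = between_group_fraction(within_scores[:, 0], is_case)
```

Here `batch_share_raw` measures how much of the first component is explained by batch membership, and `case_share_within` how much of the first within-batch component is explained by case status.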
9 Univariate Analyses Within Batches: Histograms Of t-Test p-Values
10 Global Test Of Agreement Between Batches Using A Permutation Test
[Figure: Observed and Permuted.]
• Identify peaks where direction of effect agrees in all 3 batches
• Summarise by maximum p-value
• Global test, by permutation, against the level expected from multiple testing
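One way to read the three steps above is sketched below. It is a hedged illustration on synthetic data, not the study's code: the two-sample t-test, the choice to permute case/control labels within each batch, the threshold `alpha`, and the count of peaks below that threshold as the global statistic are all assumptions.

```python
import numpy as np
from scipy import stats

def max_p_with_agreement(X, is_case, batch):
    """Per peak: the maximum within-batch p-value, or 1.0 when the
    direction of the effect does not agree in every batch."""
    t_all, p_all = [], []
    for b in np.unique(batch):
        in_b = batch == b
        t, p = stats.ttest_ind(X[in_b & is_case], X[in_b & ~is_case], axis=0)
        t_all.append(t)
        p_all.append(p)
    t_all, p_all = np.array(t_all), np.array(p_all)
    agree = np.all(t_all > 0, axis=0) | np.all(t_all < 0, axis=0)
    return np.where(agree, p_all.max(axis=0), 1.0)

def global_permutation_test(X, is_case, batch, alpha=0.05, n_perm=200, seed=0):
    """Compare the observed number of agreeing peaks with max p < alpha
    against the counts obtained after shuffling labels within each batch."""
    rng = np.random.default_rng(seed)
    observed = int(np.sum(max_p_with_agreement(X, is_case, batch) < alpha))
    null_counts = np.empty(n_perm, dtype=int)
    for i in range(n_perm):
        permuted = is_case.copy()
        for b in np.unique(batch):
            in_b = batch == b
            permuted[in_b] = rng.permutation(is_case[in_b])
        null_counts[i] = np.sum(max_p_with_agreement(X, permuted, batch) < alpha)
    p_global = (1 + np.sum(null_counts >= observed)) / (1 + n_perm)
    return observed, null_counts, p_global

# Synthetic stand-in data (not the study data): 3 batches x 12 samples, 300 peaks.
rng = np.random.default_rng(2)
batch = np.repeat(np.arange(3), 12)
is_case = np.tile(np.repeat([True, False], 6), 3)
X = rng.normal(size=(36, 300)) + rng.normal(scale=2.0, size=(3, 300))[batch]
X[is_case, :15] += 2.5                               # 15 peaks carry a true effect

observed, null_counts, p_global = global_permutation_test(X, is_case, batch)
```

Permuting within batches keeps the batch structure intact, so the permuted counts reflect only what multiple testing would produce by chance.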
11 Typical Highly Significant Peak
[Figure: intensity by batch for CASE, CONTROL and NIC samples.]
• Within each batch, cases are highly expressed compared to controls
• Not possible to define a global cut-off between cases and controls
12 Multivariate Analyses
• Identified consistent effect
• BUT, may be difficult to use as a predictive biomarker in a clinical setting due to batch variation
• Would a combination of markers, a peptidomic profile, work as a predictive biomarker?
• Use Random Forests to generate multivariate predictive models
• Assess predictive power using a nested cross-validation
  – Within and between batch prediction
13 Modelling Process
[Flowchart: Data, split into Training Set and Test Set. Training Set: Analyse Each Peak Within Each Batch; Take Maximum p-Value For Each Peak; Rank Peaks By p-Value; Build Model With Top n Peaks. Test Set: Test Model In Test Set.]
• Mixed Case-Control batches
  – Exclude Batches In Turn
  – Exclude Observations By LOO
• Control Only batches
[Figure labels: Batch excluded; Observation excluded; Number Of Peaks.]
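The modelling process can be sketched as follows, with peak selection repeated inside every training set so that the held-out data never influences which peaks are chosen. This is a minimal illustration on synthetic data under several assumptions: all batches contain both cases and controls, `n_peaks` is fixed rather than tuned, and the function names and Random Forest settings are hypothetical. It uses scikit-learn's `RandomForestClassifier`; the study's own software is not stated in the slides.

```python
import numpy as np
from scipy import stats
from sklearn.ensemble import RandomForestClassifier

def rank_peaks(X, y, batch):
    """Rank peaks by their maximum within-batch t-test p-value, smallest first."""
    p_by_batch = []
    for b in np.unique(batch):
        in_b = batch == b
        _, p = stats.ttest_ind(X[in_b & (y == 1)], X[in_b & (y == 0)], axis=0)
        p_by_batch.append(p)
    return np.argsort(np.max(p_by_batch, axis=0))

def predict_held_out(X, y, batch, test, n_peaks, seed=0):
    """Select peaks and fit on the training rows only, then predict the test rows."""
    train = ~test
    top = rank_peaks(X[train], y[train], batch[train])[:n_peaks]
    model = RandomForestClassifier(n_estimators=100, random_state=seed)
    model.fit(X[train][:, top], y[train])
    return model.predict(X[test][:, top])

def leave_one_batch_out(X, y, batch, n_peaks=10):
    """Between-batch prediction: exclude each batch in turn."""
    pred = np.empty_like(y)
    for b in np.unique(batch):
        test = batch == b
        pred[test] = predict_held_out(X, y, batch, test, n_peaks)
    return pred

def leave_one_observation_out(X, y, batch, n_peaks=10):
    """Within-batch prediction: exclude each observation in turn."""
    pred = np.empty_like(y)
    for i in range(len(y)):
        test = np.arange(len(y)) == i
        pred[test] = predict_held_out(X, y, batch, test, n_peaks)
    return pred

# Synthetic stand-in data (not the study data): 4 batches x 12 samples, 100 peaks.
rng = np.random.default_rng(3)
batch = np.repeat(np.arange(4), 12)
y = np.tile(np.repeat([1, 0], 6), 4)                 # 1 = case, 0 = control
X = rng.normal(size=(48, 100)) + rng.normal(scale=0.5, size=(4, 100))[batch]
X[y == 1, :10] += 3.0                                # 10 informative peaks

acc_between = np.mean(leave_one_batch_out(X, y, batch) == y)
acc_within = np.mean(leave_one_observation_out(X, y, batch) == y)
```

Selecting the top peaks on the full data before splitting would leak information from the test set and overstate the predictive power, which is the overfitting the slides warn against.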
14 Leave One Out Cross Validation: Proteomic Model Predictions
[Figure legend: Leave One Out Training Set Batches - Cases; Leave One Out Training Set Batches - Controls; Other Mixed Batch - Cases; Other Mixed Batch - Controls; Other Batches - Controls.]
15 Mask Data By Restricting To High Quality Regions Of Proteomic Space
[Figure: regions over retention time and mass-charge ratio.]
TECHNICALLY
• Region of focus for instrument
EMPIRICALLY
• Lowest residual variability
• Highest average intensity
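A mask of this kind might be built as in the sketch below. The slides do not define residual variability or how the criteria are combined, so everything here is an assumption: residual variability is taken as the variance left after removing batch means, the three criteria are combined with a logical AND, and the windows and quantile thresholds are hypothetical.

```python
import numpy as np

def quality_mask(X, batch, rt, mz, rt_window, mz_window,
                 var_quantile=0.5, intensity_quantile=0.5):
    """Boolean mask over peaks: True for peaks kept for analysis."""
    # Technical criterion: peak lies inside the instrument's region of focus.
    in_window = ((rt >= rt_window[0]) & (rt <= rt_window[1]) &
                 (mz >= mz_window[0]) & (mz <= mz_window[1]))
    # Empirical criterion 1 (assumed definition): low variance after
    # removing each batch's mean.
    resid = X.astype(float).copy()
    for b in np.unique(batch):
        resid[batch == b] -= resid[batch == b].mean(axis=0)
    resid_var = resid.var(axis=0)
    low_var = resid_var <= np.quantile(resid_var, var_quantile)
    # Empirical criterion 2: high average intensity.
    mean_intensity = X.mean(axis=0)
    high_int = mean_intensity >= np.quantile(mean_intensity, intensity_quantile)
    return in_window & low_var & high_int

# Synthetic stand-in data (not the study data): 30 samples, 500 peaks,
# with values treated as log intensities.
rng = np.random.default_rng(4)
n_peaks = 500
batch = np.repeat(np.arange(3), 10)
rt = rng.uniform(0, 60, n_peaks)
mz = rng.uniform(200, 1400, n_peaks)
level = rng.uniform(3, 8, n_peaks)                   # per-peak average level
noise_sd = rng.uniform(0.2, 1.5, n_peaks)            # per-peak noise
X = level + noise_sd * rng.normal(size=(30, n_peaks))
X += rng.normal(scale=1.0, size=(3, n_peaks))[batch]

mask = quality_mask(X, batch, rt, mz, rt_window=(10, 50), mz_window=(300, 1200))
X_kept = X[:, mask]
```

Because the mask uses no case/control labels, applying it before modelling does not bias the later comparison of cases and controls.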
16 Analysis Of Unmasked Peaks
• Batch Effects Still Dominate
• Consistent Case-Control Effect
Can Identify Peaks Separating Cases & Controls Across Batches
17 Cross-Validation Predictions: Unmasked Peaks
[Figure legend: Leave One Out Same Batch - Cases; Leave One Out Same Batch - Controls; Other Mixed Batch - Cases; Other Mixed Batch - Controls; Other Batches - Controls.]
• Good Predictions Within Same Batch
• Prediction Rate Falls When Extrapolated To Other Batches
• Need To Prospectively Test In Another Set Of Patients
18 How To Combine Other Non-omic Information Into A Biomarker?
• Combining different data types is challenging
• The “bigger” data type will dominate the modelling
• Greater signal in data, but doesn’t extrapolate as well
• Exploring options turning the random part of random forests to our advantage
[Figure labels: Known Clinical Prognostic; Proteomic Peaks.]
19 Proteomic Quality Control Consortium?
• MAQC recently reported a reproducibility study for microarrays
  – Wealth of valuable information
  – Mammoth effort
• Could we do the same for proteomics?
  – Less mature technology
  – Greater diversity of platforms
  – Diversity of pre-processing methodologies
  – Issues of identification making large scale comparisons challenging
20 Conclusions
• Complicated new technologies
• Many challenges
  – Technical, Data Quality, Data Analysis, Practical
• Essential role for statistics
• Need to integrate statistical approaches with understanding of technologies and biology
• Great potential
  – Better treatments for patients
  – Improved use of compounds
  – Greater biological understanding