Innovative Paths to Better Medicines
Design Considerations in Molecular Biomarker Discovery Studies
Doris Damian and Robert McBurney, June 6, 2007
Confidential Information – Do Not Reproduce or Distribute – page 2
Outline of Presentation
• Introduction:
– Mass Spectrometry Data
– Study objectives and questions
• Statistical Processing of MS Data
– Sample normalization
– Removal of peak-specific batch and other temporal trends
– Filtering of noisy peaks
• Design Considerations
– Power calculations for univariate biomarkers
– Power calculations for multivariate biomarkers (regression)
Mass Spectrometry Data
• Measurements: chemical compounds of different classes (proteins, lipids, polar and non-polar metabolites, amino acids, etc.).
• The variables constituting the data sets are peak intensities (peaks), identified by m/z and retention time. The peak intensities are proportional to the amount of analyte detected by the mass spectrometer. Note that p >> n: the number of peaks (variables, p) far exceeds the number of samples (n).
[Figure: total ion chromatogram, selected ion chromatogram, and MS of individual peaks; plot of peak intensity versus sample for biological samples and QC samples]
Figure modified from: http://www.asms.org/whatisms/p13.html
Structure of a Molecular Biomarker Discovery Study
[Figure: diagram of the study stages: Objectives, Questions, Design, Experiment, Statistical Processing, Data Analysis]
Study Objectives and Questions
[Figure: the study-structure diagram (Objectives, Questions, Design, Experiment, Processing, Analysis)]
Objectives and the questions they raise:
• Diagnosis: What is a minimal set of biomarkers?
• Elucidation of Mechanisms of Action (MoA): What are all the biomarkers? What are the molecular pathways?
Biomarker: A characteristic that is objectively measured and evaluated as an indicator of normal biological processes, pathogenic processes, or pharmacologic response(s) to a therapeutic intervention.
Statistical Processing
• Sample normalization
– correction of baseline differences between samples
• Removal of peak-specific batch and other temporal trends
– due to instrument and processing limitations, samples are acquired sequentially in batches, so peaks exhibit batch-to-batch variation;
– instrument performance may become unstable over time, and samples may degrade.
These are the main causes of the temporal variation observed in peak intensities.
• Filtering of noisy peaks
– replicate measurements are obtained for each biological sample;
– the estimated correlation between these replicates is used as a filter for noisy data.
Presented at IBC’s Biomarkers and Molecular Diagnostic conferences September 2006
Sample Normalization
• Correction of baseline differences between samples.
• Based on Internal Standards (IS).
• Internal Standards are known exogenous compounds, added to every biological sample in the same fixed amount at the beginning of the sample preparation stage.
• Used to account for sample variability (e.g., pipetting errors) during sample preparation and acquisition.
Typical Sample Profiles of IS Peaks – before Normalization
[Figure: "Before Normalization: Sample Profiles of 6 Internal Standard Peaks"; x-axis: IS peak (1 to 6); y-axis: log(intensity), 14.0 to 17.5]
Sample Normalization
• Normalization: the statistical procedure of multivariate scaling of samples based on (a subset of) IS peaks.
• Y = log(intensity); i = 1,…,I indexes the IS peaks; j = 1,…,J indexes the samples.
• ANOVA model: $Y_{ij} = \alpha_i + \beta_j + \varepsilon_{ij}$
• The sample-specific factors, $\beta_j$, are estimated in this ANOVA model and removed from all peaks.
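A minimal sketch of the normalization step, assuming the ANOVA model has the additive two-way form Y_ij = α_i + β_j + ε_ij (IS peak i, sample j), that the log-intensity matrix is complete, and that the sample factors are constrained to sum to zero. Under these assumptions the least-squares estimate of β_j is the mean of sample j over the IS peaks minus the grand mean. The function name `normalize_by_internal_standards` and the peaks-by-samples array layout are hypothetical choices for illustration, not the authors' implementation.

```python
import numpy as np

def normalize_by_internal_standards(log_intensity, is_rows):
    """Remove sample-specific factors estimated from internal-standard (IS) peaks.

    log_intensity : (n_peaks, n_samples) array of log intensities, no missing values
    is_rows       : row indices of the IS peaks

    Fits Y_ij = alpha_i + beta_j + eps_ij on the IS rows. For a complete table with
    the constraint sum_j beta_j = 0, the least-squares estimate is
    beta_hat_j = (mean of sample j over IS peaks) - (grand mean of the IS block).
    """
    is_block = log_intensity[is_rows, :]
    beta_hat = is_block.mean(axis=0) - is_block.mean()
    # Subtract each sample's estimated factor from every peak (IS and analyte) in that sample.
    return log_intensity - beta_hat[np.newaxis, :], beta_hat


# Usage on simulated data: 8 peaks (rows 0-5 are IS peaks), 4 samples.
rng = np.random.default_rng(0)
true_beta = np.array([0.3, -0.1, -0.2, 0.0])        # sample factors, summing to zero
peak_levels = rng.normal(15.0, 1.0, size=(8, 1))     # peak-specific levels alpha_i
Y = peak_levels + true_beta + rng.normal(0.0, 0.01, size=(8, 4))
Y_norm, beta_hat = normalize_by_internal_standards(Y, is_rows=[0, 1, 2, 3, 4, 5])
```

After normalization, every sample has the same mean over the IS peaks, which is the "baseline differences removed" property described above.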
Typical Sample Profiles of IS Peaks – after Normalization
[Figure: "After Normalization: Sample Profiles of 6 Internal Standard Peaks"; x-axis: IS peak (1 to 6); y-axis: log(intensity), 14.0 to 17.5]
Through normalization, temporal trends common to all peaks are removed.
Typical Temporal Profiles of IS Peaks – before Normalization
[Figure: "Before Normalization: Temporal Profiles of 6 Internal Standard Peaks"; x-axis: sample order (0 to 450); y-axis: log(intensity), 14.0 to 17.5]
Typical Temporal Profiles of IS Peaks – after Normalization
[Figure: "After Normalization: Temporal Profiles of 6 Internal Standard Peaks"; x-axis: sample order (0 to 450); y-axis: log(intensity), 14.0 to 17.5]
Peak-Specific Temporal Trends – after Normalization
[Figure: temporal profile of Peak 41 before and after normalization; x-axis: sample order (0 to 130); y-axis: log(intensity), 14.8 to 15.6; QC samples in black, biological samples in red]
The Need for Batch Corrections
• Within-batch and between-batch patterns cause visible separation of the batches.
• If these intrinsic experimental trends are not accounted for, important biological effects may be obscured.
[Figure: "PCA: Iris Plasma GC/MS Data Set After Normalization", colored by batch, samples numbered sequentially; scores on the first (t[1]) versus second (t[2]) principal component; ellipse: Hotelling's T² (0.95)]
Removal of Peak-Specific Temporal Trends
• Based (ideally) on QC samples
– QC samples: a pool of material from the biological samples in a study, aliquoted into a set of identical samples that are acquired at specific intervals in each batch of samples.
[Figure: temporal profile of Peak 41 before and after normalization; x-axis: sample order (0 to 130); y-axis: log(intensity), 14.8 to 15.6; QC samples in black, biological samples in red]
Removal of Peak-Specific Temporal Trends
[Figure: temporal profile of Peak 41 after normalization, before and after batch correction; x-axis: sample order (0 to 130); y-axis: log(intensity), 14.8 to 15.6; QC samples in black, biological samples in red]
• Temporal trend within batch b (b = 1,…,B batches), estimated from the QC samples within batch b:
$f_b(t) = \gamma_{0,b} + \gamma_{1,b}\,t + \gamma_{2,b}\,t^2$
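A minimal sketch of the within-batch correction for a single peak, assuming the trend f_b(t) is a quadratic in acquisition order t fitted by least squares to the QC samples of batch b, that each batch has at least three QC injections, and that the fitted trend is subtracted from every sample in the batch with the overall QC mean added back so the peak keeps its original scale. The re-centering choice, the function name `remove_batch_trends`, and the use of `numpy.polyfit` are assumptions for illustration; the slides state only that f_b(t) is estimated from the QC samples within batch b.

```python
import numpy as np

def remove_batch_trends(order, y, batch, is_qc):
    """Remove a quadratic within-batch temporal trend from one peak.

    order : acquisition order of each sample (the time axis t)
    y     : normalized log intensities of one peak
    batch : batch label of each sample
    is_qc : boolean mask, True for QC injections
    """
    order = np.asarray(order, dtype=float)
    y = np.asarray(y, dtype=float)
    batch = np.asarray(batch)
    is_qc = np.asarray(is_qc, dtype=bool)

    corrected = np.empty_like(y)
    qc_mean = y[is_qc].mean()  # common target level shared by all batches (assumption)
    for b in np.unique(batch):
        in_b = batch == b
        qc_b = in_b & is_qc
        if qc_b.sum() < 3:
            raise ValueError(f"batch {b!r} needs at least 3 QC samples for a quadratic fit")
        # Quadratic f_b(t) fitted on this batch's QC samples only.
        coeffs = np.polyfit(order[qc_b], y[qc_b], deg=2)
        trend = np.polyval(coeffs, order[in_b])
        corrected[in_b] = y[in_b] - trend + qc_mean
    return corrected


# Usage on simulated data: two batches of 20 injections, a QC sample every 5th position.
rng = np.random.default_rng(1)
order = np.arange(40)
batch = np.repeat([0, 1], 20)
is_qc = order % 5 == 0
drift = np.where(batch == 0, 0.002 * (order - 10) ** 2, 0.4 + 0.01 * (order - 20))
y = 15.0 + drift + rng.normal(0.0, 0.005, size=40)
y_corr = remove_batch_trends(order, y, batch, is_qc)
```

Fitting only on QC samples keeps the correction independent of the biological groups, which is the reason the slides prefer QC-based estimation when QC samples are available.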
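Correlations between Biological Replicates
• When the same sample is measured several times, we require the measurements to correlate well.
• The correlation between replicates can be expressed as a trade-off between the biological variance ($\sigma^2_{Bio}$) and the measurement error variance ($\sigma^2_{\varepsilon}$):
$\mathrm{Corr}(Y_1, Y_2) = \dfrac{\sigma^2_{Bio}}{\sigma^2_{Bio} + \sigma^2_{\varepsilon}}$
• Ideal case: no measurement error ($\sigma^2_{\varepsilon} = 0$), so $\mathrm{Corr}(Y_1, Y_2) = 1$.
• When the measurement error variance equals the biological variance ($\sigma^2_{\varepsilon} = \sigma^2_{Bio}$), $\mathrm{Corr}(Y_1, Y_2) = 0.5$.
• The estimated correlation, $\hat{\rho}$, can be used to filter noisy peaks.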
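A minimal sketch of the filter, assuming two replicate measurements per biological sample stored as two peaks-by-samples arrays, and using the Pearson correlation across samples as the per-peak estimate of the replicate correlation. The simulation follows the model behind the variance-ratio formula (a shared biological value plus independent measurement error per replicate). The cut-off of 0.5, the point where measurement-error variance equals biological variance, and the function names are hypothetical; the slides do not state the threshold the authors used.

```python
import numpy as np

def replicate_correlations(rep1, rep2):
    """Per-peak Pearson correlation between two replicate measurements.

    rep1, rep2 : (n_peaks, n_samples) arrays of log intensities; column j of
                 both arrays holds the two replicates of biological sample j.
    """
    a = rep1 - rep1.mean(axis=1, keepdims=True)
    b = rep2 - rep2.mean(axis=1, keepdims=True)
    return (a * b).sum(axis=1) / np.sqrt((a ** 2).sum(axis=1) * (b ** 2).sum(axis=1))

def filter_noisy_peaks(rep1, rep2, min_corr=0.5):
    """Boolean mask of peaks whose replicate correlation reaches min_corr (hypothetical cut-off)."""
    return replicate_correlations(rep1, rep2) >= min_corr


# Usage: biological SD = 1; peak 0 has measurement-error SD 0.2, peak 1 has SD 2.0.
rng = np.random.default_rng(2)
n_samples = 200
sigma_eps = np.array([0.2, 2.0])
bio = rng.normal(0.0, 1.0, size=(2, n_samples))          # shared biological value
rep1 = bio + rng.normal(0.0, 1.0, size=(2, n_samples)) * sigma_eps[:, None]
rep2 = bio + rng.normal(0.0, 1.0, size=(2, n_samples)) * sigma_eps[:, None]
rho_hat = replicate_correlations(rep1, rep2)  # theory: 1/(1+0.04) and 1/(1+4)
keep = filter_noisy_peaks(rep1, rep2)
```

With these settings the theoretical correlations are about 0.96 and 0.2, so the first peak is kept and the second is filtered out.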
Examples of Correlations (two extremes)
[Figure: replicate 1 versus replicate 2 (both axes 10.4 to 12.6) for Peak 25 (estimated correlation = 0.37) and Peak 101 (estimated correlation = 0.98); histogram "Distribution of Correlations between Replicates" (x-axis: correlation, 0.0 to 1.0; y-axis: frequency, 0 to 80)]
Power Calculations
• Statistical power = probability of detecting biomarkers
• The power in biomarker discovery studies is a function of:
– The sample size
– The separation between the groups (e.g., MFC)
– The proportion of biomarkers in the data set
– The false discovery rate (FDR) allowed
– The platform variability
– The within-group variability
– Other factors (e.g., other covariates in the model)?
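A minimal Monte Carlo sketch of how several of these factors combine, under simplifying assumptions that are ours rather than the authors': peaks are independent and normal on the natural-log scale, the within-group standard deviation is common and treated as known (so a two-sample z-test stands in for a t-test), every biomarker has the same fold change, and the Benjamini-Hochberg procedure controls the FDR. Power is the average proportion of true biomarkers detected. All names and default values (`n_peaks`, `sd`, `n_sim`) are hypothetical.

```python
import math
import random

def bh_reject(pvalues, fdr):
    """Benjamini-Hochberg step-up procedure: return the set of rejected indices."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    k = 0
    for rank, i in enumerate(order, start=1):
        if pvalues[i] <= fdr * rank / m:
            k = rank  # largest rank meeting the BH bound so far
    return set(order[:k])

def simulate_power(n_per_group, fold_change, prop_biomarkers, fdr,
                   n_peaks=200, sd=0.5, n_sim=50, seed=0):
    """Average proportion of true biomarkers declared significant."""
    rng = random.Random(seed)
    n_true = round(prop_biomarkers * n_peaks)
    if n_true == 0:
        raise ValueError("prop_biomarkers * n_peaks must give at least one biomarker")
    delta = math.log(fold_change)            # group shift on the natural-log scale
    se = sd * math.sqrt(2.0 / n_per_group)   # SE of a difference of two group means
    detected = 0.0
    for _ in range(n_sim):
        pvalues = []
        for peak in range(n_peaks):
            shift = delta if peak < n_true else 0.0  # first n_true peaks are biomarkers
            diff = (sum(rng.gauss(shift, sd) for _ in range(n_per_group))
                    - sum(rng.gauss(0.0, sd) for _ in range(n_per_group))) / n_per_group
            z = diff / se
            pvalues.append(math.erfc(abs(z) / math.sqrt(2.0)))  # two-sided p-value
        rejected = bh_reject(pvalues, fdr)
        detected += sum(1 for peak in rejected if peak < n_true) / n_true
    return detected / n_sim


# Usage: power for a fold change of 1.7, 90% biomarkers, FDR 0.1.
small = simulate_power(6, 1.7, 0.9, 0.1, n_sim=5)
large = simulate_power(30, 1.7, 0.9, 0.1, n_sim=5)
```

Sweeping `n_per_group`, `fold_change`, `fdr`, and `prop_biomarkers` traces out curves of the kind shown in the illustrations that follow; the values will not match the slides, whose variance settings are not given.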
Illustration I: Power Curves
[Figure: schematic panels showing the densities of x for healthy and diseased groups and the expected value y over time (days, 0 to 7) for both groups; power versus sample size per group (6 to 30) for MFC = 1.7, 2.0, and 3.0, with FDR 0.1 (solid) and FDR 0.2 (dashed); proportion of biomarkers = 90%]
Illustration I: Power Curves
[Figure: schematic panels showing the densities of x for healthy and diseased groups and the expected value y over time (days, 0 to 7) for both groups; power versus sample size per group (6 to 30) for MFC = 1.7, 2.0, and 3.0, with FDR 0.1 (solid) and FDR 0.2 (dashed), in two panels: proportion of biomarkers = 90% and proportion of biomarkers = 10%]
Power Curves Not Accounting for the FDR
[Figure: schematic panels showing the densities of x for healthy and diseased groups and the expected value y over time (days, 0 to 7) for both groups; power versus sample size per group (6 to 30) for MFC = 1.7, 2.0, and 3.0, with the estimated FDR shown dotted; proportion of biomarkers = 10%]
There is no loss in power (proportion of biomarkers discovered), BUT the FDR may be undesirable.
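A short worked calculation of why the FDR can become undesirable when the proportion of biomarkers is low. It assumes every peak is tested at the same unadjusted significance level α and every true biomarker is detected with the same power; with π the proportion of biomarkers, the expected share of false discoveries is then approximately $\mathrm{FDR} \approx \dfrac{(1-\pi)\,\alpha}{(1-\pi)\,\alpha + \pi \cdot \text{power}}$ (a ratio of expectations). This is an illustrative approximation, not necessarily how the dotted curve was computed, and the function name is hypothetical.

```python
def expected_fdr(prop_biomarkers, alpha, power):
    """Approximate FDR when every peak is tested at an unadjusted level alpha.

    Expected false discoveries per peak: (1 - pi) * alpha
    Expected true discoveries per peak:  pi * power
    """
    false_disc = (1.0 - prop_biomarkers) * alpha
    true_disc = prop_biomarkers * power
    return false_disc / (false_disc + true_disc)


# Same per-test alpha and power, two different proportions of biomarkers:
print(round(expected_fdr(0.9, 0.05, 0.8), 3))  # prints 0.007  (0.005 / 0.725)
print(round(expected_fdr(0.1, 0.05, 0.8), 3))  # prints 0.36   (0.045 / 0.125)
```

The power is identical in both cases, yet dropping the proportion of biomarkers from 90% to 10% raises the expected FDR from under 1% to about 36%.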
Power Calculation for Multivariate Biomarkers (Regression)
Classical Setting
• n > p
• Linear regression model
• Parametric (F) test of model significance
• Computationally inexpensive
Biomarker Discovery Setting
• n << p
• Regression with constraints on parameters (elastic net)
• Dimensionality reduction needed (through cross-validation)
• Non-parametric (label permutation) test of model significance
• Computationally very expensive
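For reference, one common parameterization of the elastic net (the one used by the glmnet family of software) estimates the coefficients by penalized least squares. Here λ ≥ 0 sets the overall amount of shrinkage and the mixing parameter α ∈ [0, 1] (not a significance level) moves between the ridge (α = 0) and lasso (α = 1) penalties; both are typically chosen by cross-validation. The slides do not say which parameterization the authors used.

```latex
\min_{\beta_0,\,\beta}\;
\frac{1}{2n}\sum_{k=1}^{n}\bigl(y_k - \beta_0 - x_k^{T}\beta\bigr)^2
+ \lambda\Bigl[\tfrac{1-\alpha}{2}\,\lVert\beta\rVert_2^2 + \alpha\,\lVert\beta\rVert_1\Bigr]
```

The L1 term sets many coefficients exactly to zero, which is what makes the method usable when n << p.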
Illustration: Power for Regression
• Model: $Y = \beta_1 X_1 + \dots + \beta_p X_p + \varepsilon$
• Multivariate biomarker: $f(X) = \beta^{T} X$
• Parameter of interest: $\rho = \mathrm{Corr}(Y, f(X))$
• Test: $H_0: \rho = 0$
• Power = proportion of times that this hypothesis is rejected
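A minimal sketch of such a power simulation for the simpler case in which the p components are known in advance. The assumptions are ours: the X_i are independent standard normal, all coefficients equal 1 (so Var(f(X)) = p), the noise variance is set so that Corr(Y, f(X)) equals the target ρ, the model is fitted by ordinary least squares rather than the elastic net, and significance comes from a label-permutation test on R². The numbers it produces are not expected to match the authors' power table, which comes from their own procedure; all function names are hypothetical.

```python
import numpy as np

def r_squared(X, y):
    """R^2 of an ordinary least-squares fit of y on X (with intercept)."""
    design = np.column_stack([np.ones(len(y)), X])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    resid = y - design @ coef
    centered = y - y.mean()
    return 1.0 - (resid @ resid) / (centered @ centered)

def permutation_pvalue(X, y, n_perm, rng):
    """Label-permutation p-value for H0: rho = 0, using R^2 as the test statistic."""
    observed = r_squared(X, y)
    exceed = sum(r_squared(X, rng.permutation(y)) >= observed for _ in range(n_perm))
    return (1 + exceed) / (1 + n_perm)

def regression_power(rho, n, p=10, alpha=0.05, n_sim=50, n_perm=99, seed=0):
    """Proportion of simulated data sets in which H0: rho = 0 is rejected."""
    rng = np.random.default_rng(seed)
    if rho == 0:
        beta, sigma = np.zeros(p), 1.0
    else:
        # Var(f) = p, so rho^2 = p / (p + sigma^2) gives sigma^2 = p (1 - rho^2) / rho^2.
        beta, sigma = np.ones(p), np.sqrt(p * (1 - rho ** 2) / rho ** 2)
    rejections = 0
    for _ in range(n_sim):
        X = rng.standard_normal((n, p))
        y = X @ beta + rng.normal(0.0, sigma, size=n)
        if permutation_pvalue(X, y, n_perm, rng) <= alpha:
            rejections += 1
    return rejections / n_sim
```

The cost is the product of simulated data sets and permutations; replacing the OLS fit by a cross-validated elastic net over 100 candidate analytes multiplies every inner step, which is consistent with the jump from minutes to days reported for the "buried among 90 other analytes" case.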
Power Calculation – Regression
[Figure: power versus number of samples (15 to 60) for rho = 0.58, 0.75, and 0.92]
rho    Number of samples    Power
0.92   30                   0.50
0.92   38                   0.79
0.92   45                   0.96
0.75   30                   0.31
0.75   38                   0.46
0.75   45                   0.50
0.75   60                   0.70
0.00   30                   0.02
Biomarker with 10 components (known in advance): 10 minutes to calculate.
Biomarker with 10 components (buried among 90 other analytes): days to calculate.
Thank you!