1
Early stopping for phase II cancer studies:
a likelihood approach
Elizabeth Garrett-Mayer, PhD
Associate Professor of Biostatistics
The Hollings Cancer Center
The Medical University of South Carolina
2
Motivation
  Oncology Phase II studies
    Single arm
    Evaluation of efficacy
  Historically, ‘clinical response’ is the outcome of interest
    Evaluated within several months (cycles) of enrollment
    Early stopping often incorporated for futility
  More recently, targeted agents
    halt growth
    may or may not shrink tumor
    ‘progression-free survival’ is outcome of interest
    extensions for survival evaluation, but not today
3
Early Stopping in Phase II studies: Binary outcome (clinical response)
  Attractive solutions exist for this setting
  Common design is Simon’s two-stage (Simon, 1989)
    Preserves type I and type II error
    Procedure: Enroll N1 patients (stage 1).
      If x or more respond, enroll N2 more (stage 2)
      If fewer than x respond, stop.
    Appropriate for binary responses
  Bayesian approaches also implemented
    binary likelihood, beta prior → beta-binomial model
    other forms possible
    requires prior
    Lee and Liu: predictive probability design (Clinical Trials, 2008)
4
Alternative approach for early stopping
  Use likelihood-based approach (Royall (1997), Blume (2002))
  Not that different than Bayesian
    Parametric model-based
    No “penalties” for early looks
  But it is different
    No prior information included
    Early evaluations are relatively simple
    “Probability of misleading evidence” controlled
    Can make statements about probability of misleading evidence
5
Today’s talk
  Likelihood approach
    principles
    multiple looks
  Focus on binary outcome situation
  Can be extended to the time-to-event outcome setting
    parametric survival distributions
    issues with length of follow-up
    issues with how often to “look”
6
Law of Likelihood
If hypothesis A implies that the probability of observing some data X is PA(X), and hypothesis B implies that the probability is PB(X), then the observation X=x is evidence supporting A over B if PA(x) > PB(x), and the likelihood ratio, PA(x)/PB(x), measures the strength of that evidence. (Hacking 1965, Royall 1997)
7
Likelihood approach
  Determine “what the data say” about the parameter of interest
  Likelihood function: gives a picture of the data
  Likelihood intervals (LI): gives range of reasonable values for parameter of interest
[Figure: likelihood function. Likelihood (y-axis, 0 to 1) against Lambda (x-axis, 0.01 to 0.05), with 1/8 and 1/32 likelihood intervals marked]
8
Likelihood approach
  Likelihood ratios (LR)
    Take ratio of heights of L for different values of λ
    L(λ=0.030)=0.78; L(λ=0.035)=0.03. LR = 26
[Figure: likelihood function. Likelihood (y-axis, 0 to 1) against Lambda (x-axis, 0.01 to 0.05)]
9
Likelihood-Based Approach
  Use likelihood ratio to determine if there is sufficient evidence in favor of one or the other hypothesis
  Error rates are bounded
  Implications: Can look at data frequently without concern over mounting errors
  “Evidence-based”
10
Key difference in likelihood versus frequentist paradigm
  Consideration of the alternative hypothesis
  Frequentist hypothesis testing:
    H0: null hypothesis
    H1: alternative hypothesis
  Frequentist p-values:
    calculated assuming the null is true
    have no regard for the alternative hypothesis
  Likelihood ratio:
    Compares evidence for two hypotheses
    Acceptance or rejection of null depends on the alternative
11
Example:
  Assume H0: λ = 0.12 vs. H1: λ = 0.08
  What if true λ = 0.10?
  Simulated data, N=300
  Frequentist:
    p = 0.01
    Reject the null
  Likelihood:
    LR = 1/4
    Weak evidence in favor of null
[Figure: likelihood function. Likelihood (y-axis, 0 to 1) against Lambda (x-axis, 0.08 to 0.12)]
12
Example: Why?
  P-value looks for evidence against null
  LR compares evidence for both hypotheses
  When the “truth” is in the middle, which makes more sense?
[Figure: likelihood function on a log scale. Likelihood (y-axis) against Lambda (x-axis, 0.08 to 0.12)]
13
Likelihood Inference
  Weak evidence: at the end of the study, there is not sufficiently strong evidence in favor of either hypothesis
    This can be controlled by choosing a large enough sample size
    But, if neither hypothesis is correct, can end up with weak evidence even if N is seemingly large (appropriate)
  Strong evidence
    Correct evidence: strong evidence in favor of correct hypothesis
    Misleading evidence: strong evidence in favor of the incorrect hypothesis.
  This is our interest today: what is the probability of misleading evidence?
  This is analogous to the alpha (type I) and beta (type II) errors that frequentists worry about
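The three kinds of evidence can be tabulated for a single look at a binomial outcome. The sketch below is only an illustration: the function name is hypothetical, and the values N = 37, p0 = 0.20, p1 = 0.40 and k = 10 are assumptions borrowed from the deck's later binary example. It classifies each possible response count by the likelihood ratio L1/L0 and sums the binomial probabilities.

```python
from math import comb, log


def evidence_probabilities(n, p0, p1, k, p_true):
    """Probability of strong evidence for H1, strong evidence for H0, or weak
    evidence after n patients, when the true response rate is p_true."""
    log_k = log(k)
    probs = {"favors_H1": 0.0, "favors_H0": 0.0, "weak": 0.0}
    for y in range(n + 1):
        # log of L1/L0 for y responses out of n
        log_lr = y * log(p1 / p0) + (n - y) * log((1 - p1) / (1 - p0))
        pr = comb(n, y) * p_true**y * (1 - p_true) ** (n - y)
        if log_lr >= log_k:
            probs["favors_H1"] += pr
        elif log_lr <= -log_k:
            probs["favors_H0"] += pr
        else:
            probs["weak"] += pr
    return probs


# Illustrative values (assumed): N = 37, H0: p = 0.20, H1: p = 0.40, k = 10
under_h0 = evidence_probabilities(37, 0.20, 0.40, 10, p_true=0.20)
under_h1 = evidence_probabilities(37, 0.20, 0.40, 10, p_true=0.40)
print(under_h0)
print(under_h1)
```

Under H0, the "favors_H1" entry is the probability of misleading evidence; under H1, it is the "favors_H0" entry.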
14
Operating Characteristics
[Figure: Simon two-stage. Probability (y-axis, 0 to 1) against True p (x-axis, 0.1 to 0.6). Curves: Accept H0; Reject H0]
15
Operating Characteristics
[Figure: Likelihood Approach. Probability (y-axis, 0 to 1) against True p (x-axis, 0.1 to 0.6). Curves: Accept H0; Accept HA; Weak Evidence]
16
Misleading Evidence in Likelihood Paradigm
  Universal bound: Under H0,
    $P_0\left(\frac{L_1}{L_0} \ge k\right) \le \frac{1}{k}$   (Birnbaum, 1962; Smith, 1953)
  In words, the probability that the likelihood ratio exceeds k in favor of the wrong hypothesis can be no larger than 1/k.
  In certain cases, an even lower bound applies (Royall, 2000)
    Difference between normal means
    Large sample size
  Common choices for k are 8 (strong), 10, 32 (very strong).
17
Implications
  Important result: For a sequence of independent observations, the universal bound still holds (Robbins, 1970)
  Implication: We can look at the data as often as desired and our probability of misleading evidence is bounded
  That is, if k=10, the probability of misleading strong evidence is ≤ 10%
  Reasonable bound: Considering β = 10-20% and α = 5-10% in most studies
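The bound under repeated looks can be checked numerically for a binomial outcome. The sketch below is an illustration rather than anything from the talk: the function name is hypothetical, and the values p0 = 0.20, p1 = 0.40 and k = 10 are assumed. It computes the exact probability that L1/L0 reaches k at any look when the data are examined after every patient, by tracking the probability of each path that has not yet crossed.

```python
from math import log


def prob_ever_misleading(p_true, p0, p1, k, n_max):
    """Exact probability that L1/L0 >= k at some look n = 1..n_max,
    looking after every patient, with Bernoulli(p_true) responses."""
    step_yes = log(p1 / p0)              # change in log LR for a response
    step_no = log((1 - p1) / (1 - p0))   # change in log LR for a non-response
    log_k = log(k)
    alive = {0: 1.0}  # responses so far -> probability of not having crossed yet
    crossed = 0.0
    for n in range(1, n_max + 1):
        nxt = {}
        for y, pr in alive.items():
            for dy, w in ((1, p_true), (0, 1 - p_true)):
                y2 = y + dy
                log_lr = y2 * step_yes + (n - y2) * step_no
                if log_lr >= log_k:
                    crossed += pr * w
                else:
                    nxt[y2] = nxt.get(y2, 0.0) + pr * w
        alive = nxt
    return crossed


# Under H0 (p = 0.20), strong evidence for H1 (p = 0.40) is misleading.
for n_max in (37, 100, 300):
    print(n_max, round(prob_ever_misleading(0.20, 0.20, 0.40, 10, n_max), 4))
```

The printed probabilities grow with the number of looks but stay below 1/k = 0.10, which is what the sequential version of the bound promises.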
18
Early stopping in phase II study: binary outcome
  Motivating Example
    Single arm cancer clinical trial
    outcome = clinical response
    Early stopping for futility
  Standard frequentist approach
    Simon two-stage design
    Only one look at the data
    Determine “optimality” criterion
      minimax
      minimum E(N) under H0 (Simon’s optimal)
  Likelihood approach
    Use binomial likelihood
    Can look at the data after each observation
19
Motivating Example
  New cancer treatment agent
  Anticipated response rate is 40%
  Null response rate is 20%
    the standard of care yields 20%
    not worth pursuing new treatment with same response rate as current treatment
  Using frequentist approach:
    Simon two-stage with alpha = beta = 10%
    Optimum criterion: smallest E(N)
    First stage: enroll 17. If 4 or more respond, continue
    Second stage: enroll 20. If 11 or more respond, conclude success.
20
Likelihood Approach
  Recall: we can look at the data after each patient
  Use the binomial likelihood to compare two hypotheses.
  Difference in the log-likelihoods provides the log likelihood ratio
  Simplifies to something simple:
    $\log L_1 - \log L_0 = \sum_i y_i \left[\log p_1 - \log p_0 - \log(1-p_1) + \log(1-p_0)\right] + N \left[\log(1-p_1) - \log(1-p_0)\right]$
21
Implementation
  Look at the data after each patient
  Estimate the difference in logL0 and logL1
  Rules:
    if logL0 - logL1 > log(k): stop for futility
    if logL0 - logL1 < log(k): continue
22
Likelihood Approach
  But, given discrete nature, only certain looks provide an opportunity to stop
  Current example: stop the study if…
    0 responses in 9 patients
    1 response in 12 patients
    2 responses in 15 patients
    3 responses in 19 patients
    4 responses in 22 patients
    5 responses in 26 patients
    6 responses in 29 patients
    7 responses in 32 patients
  Although total N can be as large as 37, there are only 8 thresholds for futility early stopping assessment
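The thresholds follow directly from the rule "stop when logL0 - logL1 > log(k)". The sketch below is a minimal illustration with a hypothetical function name; k = 10 is an assumption, chosen because it reproduces the eight boundaries listed for H0: p = 0.20 vs. H1: p = 0.40. For each response count it finds the first sample size at which the binomial log likelihood ratio crosses log(k).

```python
from math import log


def futility_boundaries(p0, p1, k, n_max):
    """For each response count y, the smallest n <= n_max at which
    log(L0/L1) > log(k), i.e. the first look where futility stopping applies."""
    boundaries = []
    for y in range(n_max + 1):
        for n in range(max(y, 1), n_max + 1):
            log_lr = y * log(p0 / p1) + (n - y) * log((1 - p0) / (1 - p1))
            if log_lr > log(k):
                boundaries.append((y, n))
                break
    return boundaries


# Assumed inputs: H0: p = 0.20, H1: p = 0.40, k = 10, maximum N = 37
bounds = futility_boundaries(0.20, 0.40, 10, 37)
for y, n in bounds:
    print(f"{y} responses in {n} patients")
```

Under these assumptions the same arithmetic also produces a boundary at 8 responses in 36 patients, one patient before the final analysis; the slide does not list it, and the reason is not stated.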
23
Design Performance Characteristics
  How does the proposed approach compare to the optimal Simon two-stage design?
  What are performance characteristics we would be interested in?
    small E(N) under the null hypothesis
    frequent stopping under null (similar to above)
    infrequent stopping under alternative
    acceptance of H1 under H1
    acceptance of H0 under H0
24
Example 1: Simon Designs
H0: p = 0.20 vs. H1: p = 0.40. Power ≥ 90% and alpha ≤ 0.10.
  Optimal Design:
    Stage 1: N1 = 17, r1 = 3
    Stage 2: N = 37, r = 10
    Enroll 17 in stage 1. Stop if 3 or fewer responses.
    If more than 3 responses, enroll to a total N of 37.
    Reject H0 if more than 10 responses observed in 37 patients
  Minimax Design:
    Stage 1: N1 = 22, r1 = 4
    Stage 2: N = 36, r = 10
    Enroll 22 in stage 1. Stop if 4 or fewer responses.
    If more than 4 responses, enroll to a total N of 36.
    Reject H0 if more than 10 responses observed in 36 patients
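The operating characteristics of a two-stage design like these can be computed exactly from binomial probabilities. The sketch below is an illustration with hypothetical function names, not code from the talk; it returns the probability of stopping after stage 1, the probability of rejecting H0, and the expected sample size at a given true response rate.

```python
from math import comb


def binom_pmf(x, n, p):
    return comb(n, x) * p**x * (1 - p) ** (n - x)


def simon_characteristics(n1, r1, n, r, p):
    """Two-stage design: stop after stage 1 if responses <= r1; otherwise enroll
    to n in total and reject H0 if total responses > r.
    Returns (P(early stop), P(reject H0), expected N) at true response rate p."""
    n2 = n - n1
    p_early_stop = sum(binom_pmf(x, n1, p) for x in range(r1 + 1))
    p_reject = 0.0
    for x1 in range(r1 + 1, n1 + 1):
        # Stage 2 must contribute at least r + 1 - x1 responses
        need = max(r + 1 - x1, 0)
        tail = sum(binom_pmf(x2, n2, p) for x2 in range(need, n2 + 1))
        p_reject += binom_pmf(x1, n1, p) * tail
    expected_n = n1 + (1 - p_early_stop) * n2
    return p_early_stop, p_reject, expected_n


# Optimal design from the slide: N1 = 17, r1 = 3, N = 37, r = 10
pet0, alpha, en0 = simon_characteristics(17, 3, 37, 10, p=0.20)
pet1, power, en1 = simon_characteristics(17, 3, 37, 10, p=0.40)
print(f"Under H0: P(early stop) = {pet0:.3f}, alpha = {alpha:.3f}, E(N) = {en0:.1f}")
print(f"Under H1: P(early stop) = {pet1:.3f}, power = {power:.3f}, E(N) = {en1:.1f}")
```

Evaluating the function over a grid of true p values gives the kind of curves used to compare the Simon and likelihood designs.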
25
Simon Optimal vs. Likelihood (N=37)
[Figure: Probability (y-axis, 0 to 1) against True p (x-axis, 0.1 to 0.6). Curves: Accept HA, Lik; Accept H0, Lik; Weak Evidence; Accept HA, Simon; Accept H0, Simon]
26
Simon Minimax vs. Likelihood (N=36)
[Figure: Probability (y-axis, 0 to 1) against True p (x-axis, 0.1 to 0.6). Curves: Accept HA, Lik; Accept H0, Lik; Weak Evidence; Accept HA, Simon; Accept H0, Simon]
27
Probability of Early Stopping
[Figure: Probability of Stopping Early (y-axis, 0 to 1) against True p (x-axis, 0.1 to 0.6). Curves: Likelihood (optimal N); Likelihood (minmax N); Simon Optimal; Simon MinMax]
28
Expected Sample Size
[Figure: Expected Sample Size (y-axis, 10 to 35) against True p (x-axis, 0.2 to 0.8). Curves: Likelihood (optimal N); Likelihood (minmax N); Simon Optimal; Simon MinMax]
29
Another scenario
  Lower chance of success
  H0: p = 0.05 vs. H1: p = 0.20
  Now, only 3 criteria for stopping:
    0 out of 14
    1 out of 23
    2 out of 32
30
Simon Designs
H0: p = 0.05 vs. H1: p = 0.20. Power ≥ 90% and alpha ≤ 0.10.
  Optimal Design:
    Stage 1: N1 = 12, r1 = 0
    Stage 2: N = 37, r = 3
    Enroll 12 in stage 1. Stop if 0 responses.
    If at least one response, enroll to a total N of 37.
    Reject H0 if more than 3 responses observed in 37 patients
  Minimax Design:
    Stage 1: N1 = 18, r1 = 0
    Stage 2: N = 32, r = 3
    Enroll 18 in stage 1. Stop if 0 responses.
    If at least one response, enroll to a total N of 32.
    Reject H0 if more than 3 responses observed in 32 patients
31
Simon Optimal vs. Likelihood (N=37)
[Figure: Probability (y-axis, 0 to 1) against True p (x-axis, 0.0 to 0.4). Curves: Accept HA, Lik; Accept H0, Lik; Weak Evidence; Accept HA, Simon; Accept H0, Simon]
32
Simon Minimax vs. Likelihood (N=32)
[Figure: Probability (y-axis, 0 to 1) against True p (x-axis, 0.0 to 0.4). Curves: Accept HA, Lik; Accept H0, Lik; Weak Evidence; Accept HA, Simon; Accept H0, Simon]
33
Probability of Early Stopping
[Figure: Probability of Stopping Early (y-axis, 0 to 1) against True p (x-axis, 0.00 to 0.30). Curves: Likelihood (optimal N); Likelihood (minmax N); Simon Optimal; Simon MinMax]
34
Expected Sample Size
[Figure: Expected Sample Size (y-axis, 15 to 35) against True p (x-axis, 0.0 to 0.6). Curves: Likelihood (optimal N); Likelihood (minmax N); Simon Optimal; Simon MinMax]
35
Last scenario
  Higher chance of success
  H0: p = 0.40 vs. H1: p = 0.60
  Now, 21 criteria for stopping:
    0 out of 6
    1 out of 8
    2 out of 10
    3 out of 12
    4 out of 14
    5 out of 16
    6 out of 18
    ...
    20 out of 46
36
Simon Designs
H0: p = 0.40 vs. H1: p = 0.60. Power ≥ 90% and alpha ≤ 0.10.
  Optimal Design:
    Stage 1: N1 = 18, r1 = 7
    Stage 2: N = 46, r = 22
    Enroll 18 in stage 1. Stop if 7 or fewer responses.
    If more than 7 responses, enroll to a total N of 46.
    Reject H0 if more than 22 responses observed in 46 patients
  Minimax Design:
    Stage 1: N1 = 28, r1 = 11
    Stage 2: N = 41, r = 20
    Enroll 28 in stage 1. Stop if 11 or fewer responses.
    If more than 11 responses, enroll to a total N of 41.
    Reject H0 if more than 20 responses observed in 41 patients
37
Simon Optimal vs. Likelihood (N=46)
[Figure: Probability (y-axis, 0 to 1) against True p (x-axis, 0.2 to 0.8). Curves: Accept HA, Lik; Accept H0, Lik; Weak Evidence; Accept HA, Simon; Accept H0, Simon]
38
Simon Minimax vs. Likelihood (N=41)
[Figure: Probability (y-axis, 0 to 1) against True p (x-axis, 0.2 to 0.8). Curves: Accept HA, Lik; Accept H0, Lik; Weak Evidence; Accept HA, Simon; Accept H0, Simon]
39
Probability of Early Stopping
[Figure: Probability of Stopping Early (y-axis, 0 to 1) against True p (x-axis, 0.2 to 0.8). Curves: Likelihood (optimal N); Likelihood (minmax N); Simon Optimal; Simon MinMax]
40
Expected Sample Size
[Figure: Expected Sample Size (y-axis, 10 to 40) against True p (x-axis, 0.2 to 0.8). Curves: Likelihood (optimal N); Likelihood (minmax N); Simon Optimal; Simon MinMax]
41
More on the predictive probability approach
  Lee and Liu, Clinical Trials, 2008.
  Bayesian but without loss function (no Bayes risk)
  Searches for design parameters to ensure size and power
  Prior is chosen so that mean of prior is the null hypothesis, but weak.
  Predictive probability (PP) = probability that the end result of the trial is positive given current data and data to be observed
    Based on the probability that the true response rate is greater than the null response rate.
    Again, ignores the alternative
  Stopping:
    if PP < θL: stop trial and reject alternative
    if PP > θU: stop trial and reject the null (but often θU = 1)
  At pre-specified times, the predictive probability is calculated
    Lee and Liu explore different frequency of stopping
    Comparisons here are for looking after every patient
  θT is defined as the threshold for determining efficacy at the trial’s end
    θT and θU do not have the same stringency
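A predictive probability of this kind can be sketched with a beta-binomial model. Everything specific below is an assumption for illustration: the function names, the Beta(0.2, 0.8) prior (weak, with mean equal to a null rate of 0.20), θT = 0.90, and the interim data. The tail probability is computed with a crude midpoint rule to keep the block free of dependencies; a statistics library would normally be used instead.

```python
from math import exp, lgamma, log


def log_beta_fn(a, b):
    return lgamma(a) + lgamma(b) - lgamma(a + b)


def beta_binom_pmf(i, m, a, b):
    """P(i responses among m future patients) when p ~ Beta(a, b)."""
    log_choose = lgamma(m + 1) - lgamma(i + 1) - lgamma(m - i + 1)
    return exp(log_choose + log_beta_fn(a + i, b + m - i) - log_beta_fn(a, b))


def beta_upper_tail(p0, a, b, grid=2000):
    """P(p > p0) for p ~ Beta(a, b), approximated by the midpoint rule."""
    width = (1.0 - p0) / grid
    norm = log_beta_fn(a, b)
    total = 0.0
    for j in range(grid):
        p = p0 + (j + 0.5) * width
        total += exp((a - 1) * log(p) + (b - 1) * log(1 - p) - norm)
    return total * width


def predictive_probability(x, n, n_max, p0, theta_t, a0, b0):
    """Probability that the final analysis is positive, given x responses in n
    patients, where 'positive' means posterior P(p > p0) > theta_t at n_max."""
    m = n_max - n                   # patients still to be enrolled
    a, b = a0 + x, b0 + n - x       # current posterior
    pp = 0.0
    for i in range(m + 1):
        if beta_upper_tail(p0, a + i, b + m - i) > theta_t:
            pp += beta_binom_pmf(i, m, a, b)
    return pp


# Assumed interim data: 17 patients enrolled out of a maximum of 37
pp_low = predictive_probability(2, 17, 37, p0=0.20, theta_t=0.90, a0=0.2, b0=0.8)
pp_high = predictive_probability(8, 17, 37, p0=0.20, theta_t=0.90, a0=0.2, b0=0.8)
print(pp_low, pp_high)
```

The trial would stop for futility when the returned value falls below θL. Note that the calculation uses only the null rate p0, consistent with the remark that the approach ignores the alternative.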
42
43
Example of Predictive Probability Design
44
Comparison with Predicted Probability
Minimax Sample Size
45
Comparison with Predicted Probability
Optimal Sample Size
46
Summary and Conclusions (1)
  Likelihood based stopping provides another option for trial design in phase II single arm studies
  We only considered 1 value of K
    chosen to be comparable to frequentist approach
    other values will lead to more/less conservative results
    extension: different K for early stopping versus final go/no go decision
  Overall, sample size is smaller
    especially marked when you want to stop for futility
    when early stopping is not expected, not much difference in sample size
  For ‘ambiguous’ cases:
    likelihood approach stops early more often than Simon
    In minimax designs, finds ‘weak’ evidence frequently
47
Summary and Conclusions (2)
  ‘r’ for final analysis is generally smaller.
    why? the notion of comparing hypotheses instead of conditioning only on the null.
  Comparison to the PP approach is favorable
    likelihood stopping is less time consuming and less computationally intensive
    LS does not require specification of a prior
    “search” for designs is relatively simple
49
Early Stopping in Phase II studies: time-to-event outcomes
  Disease stabilization
  More common with novel treatments, targeted therapies
  Example: targeting stem cells
    If treatment works, cancer does not progress
    But, “bulk” may still remain
  Time-to-progression is relevant outcome
  But, takes a long time to evaluate…
50
One suggested/common approach
  Apply Simon’s two-stage design
  Example:
    1 year PFS of 0.30 versus 0.50 (α = β = 0.10)
    Enroll 20 patients
    If 6 or more are PF at 1 year, enroll an additional 22 for a total of 42 patients.
  Study design
    Assume trial will take 2 years to accrue (21 patients per year)
    First 20 patients will be enrolled by end of year 1
    20th patient should be evaluable for 1 year PFS at end of year 2.
51
One suggested/common approach
So, what’s the problem?
  Problem 1: By the end of year 2, almost all of the additional 22 patients will have been enrolled, yet the stage 1 patients have just become evaluable.
  Problem 2: If the trial needs to be suspended after 20 patients (to wait for events), investigators may need to stop enrollment for 1 year.
52
Current approaches for early stopping with TTE outcomes
  Bayesian approaches (Thall et al., 2005)
  Frequentist approaches (Case and Morgan, 2003)
  Ad hoc approaches
    Use related outcome (e.g., clinical response)
    Spend a little alpha early and evaluate:
      At a prespecified time
      When a prespecified number of patients have reached a landmark time (e.g. 1 year)
      When a prespecified number of patients have been enrolled
53
Early stopping in phase II study: time to event outcome
  Motivating Example
    Single arm
    Time-to-event outcome
    Early stopping for futility
  Standard frequentist approach
    Non-parametric (i.e., no model)
    Kaplan-Meier estimate of 6 mo. PFS
    “Robust”, but not powerful!
  Likelihood approach
    Requires a parametric model (like the Bayesians!)
54
Model Choice Considerations
  Trade-off: One-parameter vs. >One-parameter model
    Parsimony versus fit
    Bias versus variance
    Small amount of data: cannot tolerate many parameters
  Exponential (one-parameter): obvious choice
  Some other options:
    Weibull
    Log-normal
    Cure-rate
55
Critical Issue
  Decision to stop must be robust to model misspecification
  “Robustifying” likelihood (Royall and Tsou, 2003)
    Not appropriate here: exponential with censoring does not meet criteria
  Further study needed to see early stopping behavior when model is misspecified
56
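Exponential Model and Likelihood
  probability density: $f(t \mid \lambda) = \lambda e^{-\lambda t}$
  survival function: $S(t) = e^{-\lambda t}$
  log-likelihood function: $\log L(\lambda; t, d) = \sum_{i=1}^{N} d_i \log \lambda - \lambda \sum_{i=1}^{N} t_i$
  maximum likelihood estimate: $\hat{\lambda} = \sum_{i=1}^{N} d_i \Big/ \sum_{i=1}^{N} t_i$
  (t_i is the follow-up time for patient i; d_i = 1 if the event was observed, 0 if censored)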
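The exponential log-likelihood with censoring needs only two sufficient statistics: the number of events and the total follow-up time. The sketch below is a minimal illustration; the function names and the five-patient data set are made up for the example.

```python
from math import log


def exp_loglik(lam, times, events):
    """Exponential log-likelihood with right censoring:
    sum(d_i) * log(lam) - lam * sum(t_i)."""
    return sum(events) * log(lam) - lam * sum(times)


def exp_mle(times, events):
    """Maximum likelihood estimate: events divided by total follow-up time."""
    return sum(events) / sum(times)


def exp_log_lr(lam1, lam0, times, events):
    """log(L1 / L0) comparing hazard lam1 with hazard lam0."""
    return exp_loglik(lam1, times, events) - exp_loglik(lam0, times, events)


# Hypothetical data: follow-up in months, event indicator (1 = progressed, 0 = censored)
times = [2.0, 5.0, 6.0, 6.0, 3.5]
events = [1, 0, 1, 0, 1]

lam_hat = exp_mle(times, events)
print("MLE of lambda:", lam_hat)
print("log LR, lambda = 0.05 vs 0.08:", exp_log_lr(0.05, 0.08, times, events))
```

A negative log likelihood ratio here means the data favor the larger hazard (0.08) over the smaller one (0.05).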
57
Simulations
  Need comparability across distributions of simulated data
  Chose underlying distributions with same 6 month survival
    Exponential
    Weibull: one with larger variance, one with smaller
    Log-normal: one with larger variance, one with smaller
    Cure-rate
  Working model: exponential distribution
  Simulations: data generated assuming treatment lacks efficacy
58
Comparison of underlying distributions
[Figure: six panels of S(t) against Time (months, 0 to 35). Panels: Exponential, 0.08; Weibull, 1/10, 1.43; LogNormal, 2, 0.69; Cure Rate, 0.13, 30%; Weibull, 1/17, 0.7; LogNormal, 2.4, 2.0. Black: true distribution. Red: best exponential]
59
Simulation study 1
  Null hypothesis is true: should stop early for futility in large fraction of trials
  Three ways to characterize hypotheses:
    H0: 6 mo PFS = 62% vs. H1: 6 mo PFS = 74%
    H0: E(t) = 12.5 mo vs. H1: E(t) = 20 mo
    H0: λ = 0.08 vs. H1: λ = 0.05
  N = 100
  Starting with 25th patient, analyze data every 5th enrollment
  Censoring is assumed to be administrative
  24 months of enrollment (assuming no early stopping)
  Total study time 36 months (24 month accrual, 12 month F.U.)
  Use likelihood intervals of 1/10
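The three characterizations of the hypotheses are equivalent under the exponential model, since S(6) = exp(-6λ) and E(t) = 1/λ. A short check, with hypothetical helper names:

```python
from math import exp


def six_month_pfs(lam):
    """Exponential survival probability at 6 months for monthly hazard lam."""
    return exp(-lam * 6.0)


def mean_time(lam):
    """Mean time to progression (months) under the exponential model."""
    return 1.0 / lam


for label, lam in (("H0", 0.08), ("H1", 0.05)):
    print(f"{label}: lambda = {lam}, 6 mo PFS = {six_month_pfs(lam):.2f}, "
          f"E(t) = {mean_time(lam):.1f} months")
```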
60
Stopping Rule
  Stop if the likelihood ratio < 1/10
    That is, if the ratio of the likelihood for the NULL to the ALTERNATIVE exceeds 10, then stop.
  Note 1: ONLY considering stopping for futility!
  Note 2: based on universal bound, we have at most a 10% chance of strong evidence in favor of the wrong hypothesis
  Note 3: based on Royall (2000), probably have even less than that.
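A stopping rule of this form can be simulated in a few lines. The sketch below is one possible reading of the setup, not the author's simulation code: it assumes evenly spaced enrollment of 100 patients over 24 months, exponential event times, looks at the 25th patient and every 5th enrollment after that, administrative censoring at each look, and counts only stops before the 100th patient as early stopping. The function name and the number of replicates are illustrative.

```python
import random
from math import log


def simulate_trial(true_rate, rng, lam0=0.08, lam1=0.05, k=10, n_max=100,
                   accrual_months=24.0, first_look=25, look_every=5):
    """Return the number enrolled when the trial stops for futility,
    or n_max if it never stops early."""
    gap = accrual_months / n_max
    enroll = [(i + 1) * gap for i in range(n_max)]   # evenly spaced accrual (assumption)
    event = [rng.expovariate(true_rate) for _ in range(n_max)]
    for n in range(first_look, n_max, look_every):
        now = enroll[n - 1]                          # look when the n-th patient enrolls
        d, total = 0, 0.0
        for i in range(n):
            follow_up = now - enroll[i]
            if event[i] <= follow_up:
                d += 1
                total += event[i]
            else:
                total += follow_up                   # administratively censored at the look
        # log(L1 / L0) under the exponential working model
        log_lr = d * log(lam1 / lam0) - (lam1 - lam0) * total
        if log_lr < -log(k):
            return n
    return n_max


rng = random.Random(2024)
n_sims = 300
stops_h0 = [simulate_trial(0.08, rng) for _ in range(n_sims)]  # treatment lacks efficacy
stops_h1 = [simulate_trial(0.05, rng) for _ in range(n_sims)]  # treatment works
frac_h0 = sum(n < 100 for n in stops_h0) / n_sims
frac_h1 = sum(n < 100 for n in stops_h1) / n_sims
print("Fraction stopped early when H0 is true:", frac_h0)
print("Fraction stopped early when H1 is true:", frac_h1)
```

Swapping `rng.expovariate` for a Weibull, log-normal, or cure-rate generator while keeping the exponential working model is one way to probe sensitivity to model misspecification.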
61
Simulated Data Examples
[Figure: two panels of Likelihood Ratio (y-axis, log scale) against Number of Patients Enrolled (x-axis, 25 to 95). Left panel: No stopping. Right panel: Stop at N=55]
62
Frequentist Properties of Simulation Study
  N=100, H0: λ = 0.08 vs. H1: λ = 0.05
  Using exponential test and assuming exponential data:
    Alpha = 5%
    Power = 98%
  Using non-parametric test, and assuming exponential data:
    Alpha = 5%
    Power = 78%
  No interim analyses included
63
Why not look before 25 patients?

  End of month:       1    2    3    4    5    6
  Total enrolled      4    8   12   16   21   25
  ≥ 1 month f.u.      0    4    8   12   16   21
  ≥ 2 month f.u.      0    0    4    8   12   16
  ≥ 3 month f.u.      0    0    0    4    8   12
  ≥ 4 month f.u.      0    0    0    0    4    8
  ≥ 5 month f.u.      0    0    0    0    0    4
  ≥ 6 month f.u.      0    0    0    0    0    0
64
Simulations
[Figure: six panels of S(t) against Time (months, 0 to 35). Panels: Exponential, 0.08; Weibull, 1/10, 1.43; LogNormal, 2, 0.69; Cure Rate, 0.13, 30%; Weibull, 1/17, 0.7; LogNormal, 2.4, 2.0. Blue: 12 month estimate. Red: 60 month estimate. Solid black: true distribution. Dashed: hypotheses]
65
Early Stopping
[Figure: six histograms of Frequency against Total sample size (25 to 100), one per data-generating distribution]

  Distribution     Median N    % stopped
  Exponential         60          87
  Weibull 1           85          64
  Log-Normal 1        60          99
  Cure Rate 1        100           4
  Weibull 2           35          97
  Log-Normal 2        55          62
66
Likelihood ratios:

                  <1/32   [1/32, 1/10)   [1/10, 1)   [1, 10)   [10, 32)   >32
  Exponential*     0.20       0.76          0.03       0.01      <0.01    <0.01
  Weibull 1        0.47       0.53         <0.01      <0.01      <0.01    <0.01
  Log-Normal 1     0.27       0.73         <0.01      <0.01      <0.01    <0.01
  Cure Rate       <0.01       0.04         <0.01      <0.01      <0.01     0.96
  Weibull 2        0.18       0.80          0.01       0.01      <0.01     0.01
  Log-Normal 2     0.06       0.55         <0.01      <0.01       0.01     0.37
67
Frequentist Approach: Exponential Data
  Based on observed data (stopped and completed trials)
    0.55% of trials showed significant p-value (versus 0.45% with LR>10)
    Agreement of 99.6% for hypothesis testing decision
    High agreement in inferences
68
Additional simulations
  Early stopping is critical when we have a rate that is even WORSE than the null
  Example:
    We are testing 62% vs. 74% 6 month PFS
    What if true 6 month PFS based on our regimen is only 55%? Or 49%?
    What is the chance of early stopping in these cases?
  Simple scenario: exponential data, exponential model
69
Early Stopping: H0: 6 mo PFS = 62% vs. H1: 6 mo PFS = 74%
[Figure: two histograms of Frequency against Total sample size (25 to 100)]

  True 6 mo PFS    Median N    % stopped
  55%                 40         99.8
  49%                 35         99.9
70
Likelihood Ratios

                  <1/32   [1/32, 1/10)   [1/10, 1)   [1, 10)   [10, 32)   >32
  55% 6 mo PFS     0.19       0.81         <0.01      <0.01      <0.01    <0.01
  49% 6 mo PFS     0.26       0.74         <0.01      <0.01      <0.01    <0.01
71
Summary and Conclusions (2)
  Properties are consistent with what we expected
  When we have exponential data and k=10:
    We stop early OFTEN when we should
    We RARELY stop early when we shouldn’t
  But, we need to be careful…
    We need a full understanding of the expected and observed survival distribution
    If we have model misspecification, we could run into trouble
    Not unrealistic: in breast cancer, a cure rate model might be best-fitting
    Quantifying simply by one point in time (e.g. 6 month PFS) could be dangerous
    Should elicit PFS at several times from clinical investigator
72
Summary and Conclusions (3)
  This is the perfect example of why we need to work in close collaboration with oncologists
    Need to get a good appreciation for the anticipated distribution
    Early stopping should be carefully considered based on observed data
  Implementation issues
    Probably will not be able to do this in an “off-the-shelf” way
    High-maintenance for the statistician
    Better for patients
    Better for Cancer Center (resources)
73
Future work in TTE
  Feasibility of 2-parameter models
    In practice, can we fix one parameter?
    Preliminary data should give us a sense of the shape
  Interval censoring
  Different censoring mechanisms
  Larger deviations from exponential (how common?)
  Looks: when to start and how often?
  Study design guidelines (e.g. sample size)
74
References
Case and Morgan (2003) Design of Phase II cancer trials evaluating survival probabilities. BMC Medical Research Methodology; v. 3.
Birnbaum (1962) On the Foundations of Statistical Inference (with discussion). JASA, 53, 259-326.
Blume (2002) Likelihood Methods for Measuring Statistical Evidence, Statistics in Medicine, 21, 2563-2599.
Hacking (1965) Logic of Statistical Inference, New York: Cambridge Univ Press.
Royall (1997) Statistical Evidence:A Likelihood Paradigm, London, Chapman & Hall.
Royall (2000) On the Probability of Misleading Statistical Evidence, JASA, 95; 760-768.
Royall and Tsou (2003) Interpreting statistical evidence by using imperfect models: robust adjusted likelihood functions. JRSS-B; 65(2), 391-404.
Simon (1989) Optimal Two-Stage Designs for Phase II Clinical Trials. Controlled Clinical Trials; 10,1-10.
Smith (1953) The Detection of Linkage in Human Genetics. JRSS-B, 15, 153-192.
Thall, Wooten and Tannir (2005) Monitoring event times in early phase clinical trials: some practical issues. Clinical Trials; v. 2, 467-478.