Posted on 01-Oct-2020
Data Monitoring of Clinical Trials Using Prediction
Scott Evans, Lingling Li, Hajime Uno, LJ Wei
East Campus, Reisman Lecture Hall, Feldberg/Reisman Complex
Beth Israel Deaconess Medical Center
April 14, 2010
Special Thanks
• Professor Jim Ware
• Kim Jackson
• Our hosts at BIDMC
• Colleagues
  – Sachiko Miyahara
  – Tzu-min Yeh
  – Huichao Chen
  – Jeanne Jiang
  – Lijuan Deng
  – Laura Smeaton
Outline
• Setting
• Motivation
• Predicted Intervals
• Predicted Interval Plots (PIPs)
• Extensions
Setting
• Interim data from clinical trials are monitored
• Treatment effects are estimated
• DSMBs (or DMCs) meet to review interim results and make recommendations regarding future trial conduct
Practical Questions
• Should the trial be stopped?
  – For efficacy?
  – For futility?
• Are there non-efficacious arms that should be dropped?
• Should the sample size be re-calculated?
  – Due to a lack of precision in estimating a parameter during trial design (e.g., variability, control group response)
• Should the duration of follow-up be modified due to unexpected event rates?
  – E.g., with a time-to-event endpoint
Motivation
• Answering these questions has:
  – Ethical attractiveness
    • Fewer participants are generally exposed to inefficacious and potentially harmful therapies
  – Economic advantages
    • Smaller expected sample sizes and shorter expected duration than designs without interim analyses
    • Saving time, money, and other resources
  – Public health advantages
    • Answers may reach the medical community more quickly
Interim Analyses Methods
• Group sequential methods
  – Control α (type I error) spending
  – E.g., Slud and Wei, O’Brien-Fleming, Lan-DeMets, Pocock
  – Generally boundary-driven, using test statistics
• Conditional power/futility index
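Conditional power is commonly computed with Lan and Wittes' B-value (Brownian motion) approximation. The sketch below is a minimal Python version under stated assumptions: only the final analysis remains (any later interim looks are ignored), rejection is two-sided at level α but counted only in the positive direction, and the drift (the expected final z-statistic) either continues the current trend or is supplied by the user. The function name and defaults are illustrative, not taken from the talk.

```python
from statistics import NormalDist

_STD_NORMAL = NormalDist()


def conditional_power(z_interim, info_frac, drift=None, alpha=0.05):
    """Probability that the final two-sided level-alpha test rejects in the
    positive direction, given the interim z-statistic.

    B-value approximation:
      B(t) = Z_t * sqrt(t),  B(1) - B(t) ~ N(drift * (1 - t), 1 - t),
    where drift is the expected final z-statistic.  If drift is None, the
    current trend is assumed to continue (drift = Z_t / sqrt(t)).
    Intermediate interim analyses are ignored.
    """
    if not 0 < info_frac < 1:
        raise ValueError("info_frac must be strictly between 0 and 1")
    b_t = z_interim * info_frac ** 0.5
    if drift is None:
        drift = z_interim / info_frac ** 0.5  # "current trend" assumption
    z_crit = _STD_NORMAL.inv_cdf(1 - alpha / 2)
    remaining = 1 - info_frac
    mean_increment = drift * remaining
    return 1 - _STD_NORMAL.cdf((z_crit - b_t - mean_increment) / remaining ** 0.5)


# Hypothetical interim result: z = 1.2 at half of the planned information.
print(conditional_power(1.2, 0.5))             # current trend continues
print(conditional_power(1.2, 0.5, drift=0.0))  # H0 holds for the future data
```

Like conditional power in general, this returns one number and does not say whether a low value reflects a small effect or too little information.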
Limitations of Many Traditional Methods
• They do not:
  – Provide estimates of effect or the associated precision (only test statistics, p-values, and decision rules)
  – Evaluate “clinical relevance”
    • Statistical significance is not the only consideration
  – Provide information regarding the reasons for:
    • High p-values (or small test statistics): negligible effect vs. insufficient data vs. too much variation
    • Low p-values (or large test statistics): clinical significance?
Limitations of Many Traditional Methods
• Inflexible, with binding decision rules usually based on a single (primary) endpoint
  – Desire to base decisions upon simultaneous assessment of many factors, such as:
    • Safety data
    • Secondary endpoints
    • Quality of life
    • Benefit:risk assessment
    • Results of other trials
    • Scientific relevance
    • Availability of new alternative therapies
    • Cost:benefit considerations
Repeated Confidence Intervals (RCIs)
• Sequential CIs
  – Simultaneous coverage control
• Use principles of group sequential methods
• Provide estimates of effect sizes
• Allow for flexibility in decision making
• Jennison & Turnbull, Controlled Clinical Trials, 1984.
• Mehta et al., Statistics in Medicine, 2000.
Repeated Confidence Intervals
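[Figure: repeated confidence intervals at successive analyses, with nominal levels 99.95%, 98.6%, and 95.5%; regions labeled “In favor of Mono” and “In favor of Comb”]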
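The three nominal levels on this slide appear consistent with O’Brien-Fleming-type RCIs at three equally spaced analyses with overall two-sided α = 0.05. This is our inference from the numbers, not something the slide states. The sketch below assumes the tabulated O’Brien-Fleming constant C = 2.004 for K = 3 looks, uses critical values c_k = C·sqrt(K/k), and forms each RCI as estimate ± c_k·SE. Rounded to the slide's precision, the nominal coverages come out as 99.95%, 98.6%, and 95.5%.

```python
from statistics import NormalDist

_STD_NORMAL = NormalDist()

# Assumed O'Brien-Fleming constant for K = 3 equally spaced analyses at
# two-sided alpha = 0.05 (a tabulated value, taken here as an input).
OBF_CONSTANT_K3 = 2.004


def obf_critical_values(constant, n_looks):
    """O'Brien-Fleming critical values c_k = C * sqrt(K / k), k = 1..K."""
    return [constant * (n_looks / k) ** 0.5 for k in range(1, n_looks + 1)]


def nominal_coverage(critical_value):
    """Nominal two-sided coverage of estimate +/- c * SE."""
    return 2 * _STD_NORMAL.cdf(critical_value) - 1


def repeated_ci(estimate, std_error, critical_value):
    """Repeated confidence interval at a single analysis."""
    half_width = critical_value * std_error
    return estimate - half_width, estimate + half_width


crit = obf_critical_values(OBF_CONSTANT_K3, 3)
for k, c in enumerate(crit, start=1):
    print(f"look {k}: c = {c:.3f}, nominal coverage = {100 * nominal_coverage(c):.2f}%")
```

Wide early intervals are the price paid for simultaneous coverage across all looks.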
Limitations of Repeated CIs
• At the interim, we wish to weigh the options of stopping vs. continuing
• Repeated CIs do not:
  – Provide a formal evaluation of the ramifications of continuing
    • What effect size estimates and associated precision will be observed at the end of the trial? At the next interim?
• Thus, how do we weigh the options?
Need for Methods that:
• Control error rates
• Are flexible enough to allow for expert DSMB judgment
  – Allow incorporation of other information into decisions
• Provide effect size estimates and associated precision
  – Assess clinical relevance and statistical significance
• Provide information about decision alternatives
Predicted Intervals
• Predict the CI at a future time point (e.g., end of trial or next interim analysis) conditional upon:
  1. Observed data
  2. Assumptions regarding future data (e.g., observed trend continues, HA is true, H0 is true, best/worst case scenarios, etc.)
• Evans SR, Li L, Wei LJ, Drug Information Journal, 41:733-742, 2007.
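To make the idea concrete, here is a minimal plug-in sketch for a difference in means. It assumes two equally allocated arms, a common and known SD, and future participants who show exactly the assumed mean difference. Under those assumptions, the final estimate is an information-weighted average of the observed and assumed differences. This is an illustration only. The function name and the numbers are hypothetical, and the cited paper may compute PIs differently.

```python
from statistics import NormalDist


def predicted_ci_mean_diff(obs_diff, n_obs_per_arm, assumed_future_diff,
                           n_future_per_arm, sd, level=0.95):
    """Predicted CI for a difference in means at the end of the trial.

    Plug-in sketch: the not-yet-observed participants are assumed to show
    exactly `assumed_future_diff`, both arms share standard deviation `sd`
    (treated as known), and allocation is equal.
    """
    n_total = n_obs_per_arm + n_future_per_arm
    # Final estimate = information-weighted mix of observed and assumed data.
    final_diff = (n_obs_per_arm * obs_diff
                  + n_future_per_arm * assumed_future_diff) / n_total
    std_error = sd * (2 / n_total) ** 0.5
    z = NormalDist().inv_cdf(0.5 + level / 2)
    return final_diff - z * std_error, final_diff + z * std_error


# Hypothetical numbers: observed difference 0.10 with 35 per arm,
# 45 per arm still to come, SD 0.30.
current_trend = predicted_ci_mean_diff(0.10, 35, 0.10, 45, 0.30)
under_null = predicted_ci_mean_diff(0.10, 35, 0.0, 45, 0.30)
print(current_trend, under_null)
```

Running the same function under several assumptions (current trend, H0, HA) is the kind of sensitivity analysis discussed later in the talk.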
NARC 009
• Randomized, double-blind, placebo-controlled, multicenter, dose-ranging study of prosaptide (PRO) for the treatment of HIV-associated neuropathic pain
• Participants were randomized to 2, 4, 8, or 16 mg/d PRO or placebo, administered via subcutaneous injection
• Primary endpoint:
  – 6-week change from baseline in the weekly average of random daily Gracely pain scale prompts, using an electronic diary
NARC 009
• Designed for 390 participants equally allocated between groups
• Interim analysis conducted after 167 participants completed the 6-week double-blind treatment period
• Computed PIs
Treatment | N | 95% CI for Mean Change | 95% CI for Diff (1) | 95% PI for Diff (2) | 95% PI for Diff (3) | Required Diff (4)
Placebo | 31 | (-0.35, -0.11) | | | |
2 mg | 34 | (-0.21, -0.04) | (-0.04, 0.25) | (-0.01, 0.21) | (-0.16, 0.06) | -0.54
4 mg | 34 | (-0.38, -0.12) | (-0.19, 0.16) | (-0.14, 0.10) | (-0.23, 0.01) | -0.45
8 mg | 32 | (-0.18, -0.02) | (-0.01, 0.28) | (0.03, 0.23) | (-0.15, 0.05) | -0.56
16 mg | 36 | (-0.34, -0.09) | (-0.16, 0.19) | (-0.11, 0.14) | (-0.21, 0.04) | -0.54
(1) 95% CI for the difference in mean changes vs. placebo
(2) 95% PI for the difference in mean changes vs. placebo, assuming full enrollment and that the current trend continues
(3) 95% PI for the difference in mean changes vs. placebo, assuming full enrollment and the protocol-specified means μplacebo = -0.17 and μdrug = -0.34
(4) Difference in mean changes needed in the remaining participants for the CI for the difference in mean changes to exclude zero (in favor of active treatment) at the end of the trial
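Footnote (4) asks how large a difference the remaining participants would need to show. Under simple plug-in assumptions, this can be solved in closed form. The assumptions are ours: equal allocation, a common known SD, a normal approximation, and lower pain scores favoring the drug, so the target is a final CI for (drug − placebo) lying below zero. The table's actual inputs (per-arm SDs, remaining sample sizes, handling of dropouts) are not given here. This hypothetical sketch is therefore not expected to reproduce column (4).

```python
from statistics import NormalDist


def required_future_diff(obs_diff, n_obs_per_arm, n_future_per_arm, sd,
                         level=0.95):
    """Mean difference (drug - placebo) needed among the remaining
    participants so that the upper limit of the final CI equals zero.

    Any future difference at or below the returned value gives a final CI
    entirely below zero (in favor of drug when lower is better).
    Assumes equal allocation, a common known SD, and a normal approximation.
    """
    n_total = n_obs_per_arm + n_future_per_arm
    z = NormalDist().inv_cdf(0.5 + level / 2)
    half_width = z * sd * (2 / n_total) ** 0.5
    # Solve (n_obs * obs + n_fut * d) / n_total + half_width = 0 for d.
    return (-half_width * n_total - n_obs_per_arm * obs_diff) / n_future_per_arm


# Hypothetical: observed difference +0.10 (favoring placebo), 35 per arm
# observed, 45 per arm remaining, SD 0.30.
print(required_future_diff(0.10, 35, 45, 0.30))
```

A required difference far larger than anything plausible (or than the protocol's assumed effect) is a strong signal of futility.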
Interim Analysis Results: NARC 009
NARC 009
• Sensitivity analyses show that the futility assessment is robust
• The trial was discontinued by the NARC DSMB for futility
  – Evans et al., PLoS ONE, 2007.
Predicted Intervals
• Intuitive
• Practical
  – Provide information on effect size and associated precision (clinical relevance)
  – Invariant to study design (superiority vs. noninferiority)
  – Can be used with RCI theory to control error rates and allow flexibility in the decision process
  – Can be used for:
    • Binary, continuous, or time-to-event endpoints
    • 1- and 2-sample problems
Issues
• Sensitivity analyses are necessary to assess the robustness of results to varying assumptions
  – Strategic assessment of the assumptions to be employed
• Need to assess the impact of sampling variability
• Need concise and intuitive summaries for DSMBs
  – Too much information is not digestible
  – Too little information is not informative
• Graphics are helpful
  – Remember that most DSMB members are not statisticians
Predicted Interval Plots (PIPs)
• Evaluates the sampling variability associated with the assumed model using simulation
• Plots the simulated PIs under the model assumption
• Conditional power is readily available
• Li L, Evans SR, Uno H, Wei LJ (Statistics in Biopharmaceutical Research)
PIP Construction
• Impose parametric assumptions for the unobserved data
  – Estimate or specify the values of the unknown parameters under reasonable and strategic assumptions
• Simulate future data
• Combine the observed data with the simulated data
• Construct PIs using standard methods
• Repeat to obtain many simulated PIs
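The construction steps above can be sketched for a two-arm continuous endpoint. The simplifications are assumptions of this sketch: a normal model for the unobserved data with a common SD, a normal-approximation CI for the difference in means, and hypothetical interim data. The share of simulated PIs that exclude zero in each direction is a simulation estimate of conditional power under the assumed model. The median PI width summarizes the precision to be expected at the end of the trial.

```python
import random
import statistics
from statistics import NormalDist


def mean_diff_ci(arm_a, arm_b, level=0.95):
    """Normal-approximation CI (unpooled SE) for mean(A) - mean(B)."""
    diff = statistics.fmean(arm_a) - statistics.fmean(arm_b)
    se = (statistics.variance(arm_a) / len(arm_a)
          + statistics.variance(arm_b) / len(arm_b)) ** 0.5
    z = NormalDist().inv_cdf(0.5 + level / 2)
    return diff - z * se, diff + z * se


def simulate_pis(obs_a, obs_b, n_future_per_arm, mean_a, mean_b, sd,
                 n_sims=1000, seed=2010):
    """Simulated PIs under an assumed model (step 1: the arguments mean_a,
    mean_b, and sd are the imposed parametric assumptions)."""
    rng = random.Random(seed)
    intervals = []
    for _ in range(n_sims):
        # Step 2: simulate future data under the assumed model.
        fut_a = [rng.gauss(mean_a, sd) for _ in range(n_future_per_arm)]
        fut_b = [rng.gauss(mean_b, sd) for _ in range(n_future_per_arm)]
        # Steps 3-4: combine with the observed data and compute a standard CI.
        intervals.append(mean_diff_ci(list(obs_a) + fut_a, list(obs_b) + fut_b))
    return intervals  # Step 5: many simulated PIs


def summarize(intervals):
    """Median PI width and the share of PIs excluding zero in each direction."""
    n = len(intervals)
    return {
        "median_width": statistics.median(hi - lo for lo, hi in intervals),
        "p_favor_a": sum(lo > 0 for lo, _ in intervals) / n,
        "p_favor_b": sum(hi < 0 for _, hi in intervals) / n,
    }


# Hypothetical interim data (not the trial's): 20 and 21 completers per arm.
obs_rng = random.Random(1)
obs_a = [obs_rng.gauss(0.3, 0.8) for _ in range(20)]
obs_b = [obs_rng.gauss(0.5, 0.8) for _ in range(21)]

# "Observed trend" assumption: future means equal the interim means.
trend = simulate_pis(obs_a, obs_b, 30, statistics.fmean(obs_a),
                     statistics.fmean(obs_b), 0.8, n_sims=500)
print(summarize(trend))
```

Calling `simulate_pis` again with the means set to the H0 or HA values gives the sensitivity analyses recommended in the talk.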
PIPs: Construction Schema
[Figure: observed dataset at interim point + assumption about future data → simulate future outcome → a simulated complete dataset at the end of the trial or later interim → calculate the “final” result (adjusted PI) → DSMB; repeat: simulate many times and get many adjusted PIs]
Minocycline for Cognitive Impairment in HIV
• Design
  – Randomized, blinded, placebo-controlled, single-site study
  – Primary endpoint: 24-week change in a composite of a standardized neuropsychological testing battery
  – N = 100
• DSMB reviewed results after ~40% information
24-Week Change in NP Summary
U NP Sum Change | Total | Treatment Arm A | Treatment Arm B | p-value
N | 41 | 20 | 21 | 0.593
# missing | 24 | 13 | 11 |
Mean (SD) | 0.41 (0.80) | 0.34 (0.84) | 0.49 (0.78) |
Min, Max | -1.43, 2.06 | -1.43, 2.06 | -1.00, 1.66 |
Median | 0.58 | 0.60 | 0.52 |
Q1, Q3 | -0.17, 0.84 | -0.30, 0.81 | 0.17, 0.93 |
Predicted Intervals Summary
Assumption | Current 95% CI | Width of current 95% CI | Width of predicted 95% CI, median (Q1, Q3) | Probability to reject H0 in favor of A | Probability to reject H0 in favor of B
Observed trend | (-0.67, 0.38) | 1.05 | 0.63 (0.61, 0.66) | 0% | 8%
Null hypothesis | (-0.67, 0.38) | 1.05 | 0.68 (0.65, 0.71) | 0% | 2%
Alternative hypothesis | (-0.67, 0.38) | 1.05 | 0.67 (0.64, 0.70) | 19% | 0%
[Figure (three panels): predicted intervals classified into scenarios 1 to 10 by their position relative to 0 and the clinical margins -0.5 and 0.5; regions labeled Clinically Inferior, Clinically Equivalent, and Clinically Superior, running from “Treatment B is preferred” to “Treatment A is preferred”]
Scenario | Under Observed | Under H0 | Under Ha
1 | 0% | 0% | 0%
2 | 31.6% | 23% | 0.2%
3 | 7.8% | 2% | 0%
4 | 0% | 0% | 0%
5 | 0.6% | 4.6% | 56.2%
6 | 60% | 70.4% | 24.8%
7 | 0% | 0% | 0%
8 | 0% | 0% | 18.8%
9 | 0% | 0% | 0%
10 | 0% | 0% | 0%
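With cut points at -0.5, 0, and 0.5, each limit of a CI falls in one of four regions, so there are 4 + 3 + 2 + 1 = 10 possible placements, matching the ten scenarios. The sketch below tallies intervals by scenario. Its numbering is our reconstruction and should be treated as an assumption: lower-limit region from left to right, then upper-limit region from right to left. We chose it because it agrees with the summary slide. For example, under the observed trend, scenario 3 (a CI entirely below zero) holds 7.8% of PIs, close to the 8% probability of rejecting H0 in favor of B. Under Ha, scenario 8 (a CI entirely above zero) holds 18.8%, close to 19%.

```python
from bisect import bisect_right
from collections import Counter

# Cut points read from the figure: clinical margins -0.5 and +0.5, plus 0.
CUTS = (-0.5, 0.0, 0.5)


def region(x, cuts=CUTS):
    """Index (0 = leftmost) of the region between cut points containing x."""
    return bisect_right(cuts, x)


def scenario(ci, cuts=CUTS):
    """Scenario number (1-based) of a CI, from the regions of its limits.

    Ordering (our reconstruction): lower-limit region from left to right,
    and within it the upper-limit region from right to left.
    """
    lo, hi = ci
    if lo > hi:
        raise ValueError("lower limit exceeds upper limit")
    target = (region(lo, cuts), region(hi, cuts))
    n_regions = len(cuts) + 1
    number = 0
    for i in range(n_regions):
        for j in range(n_regions - 1, i - 1, -1):
            number += 1
            if (i, j) == target:
                return number
    raise AssertionError("unreachable: every (lo, hi) pair is enumerated")


def scenario_percentages(intervals, cuts=CUTS):
    """Percentage of intervals in each scenario (all scenarios listed)."""
    counts = Counter(scenario(ci, cuts) for ci in intervals)
    n_regions = len(cuts) + 1
    n_scenarios = n_regions * (n_regions + 1) // 2
    return {k: 100 * counts[k] / len(intervals) for k in range(1, n_scenarios + 1)}


# Hypothetical intervals for A - B.
print(scenario_percentages([(-0.3, 0.3), (-0.6, 0.2), (0.1, 0.6)]))
```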
Summary
• Futility analyses suggest a low probability of a positive trial/superiority result with respect to the primary endpoint
• DSMB evaluated whether relevant effects could reasonably be ruled out
• DSMB evaluated whether there were other reasons to continue (e.g., secondary endpoints)
• DSMB recommended early termination of the study
Primary Comparisons and Power Considerations
Superiority
A total of 1700 primary events in the two treatment arms attains 85.9% power for detecting HR = 0.85.
Non-inferiority
1850 primary events in these two treatment arms will provide 88.1% power if valsartan is actually 2.5% better than captopril.
Total events = ½ (1700 + 1850 + 1850) = 2700
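Event-driven power can be checked with Schoenfeld's approximation for the log-rank test. With D events and 1:1 allocation, the expected z-statistic is about sqrt(D/4)·|log HR|. The sketch assumes a two-sided α of 0.025. That is our assumption, suggested by the 97.5% CIs reported for the final result. It gives about 86.6% power for 1700 events and HR = 0.85, close to but not the same as the quoted 85.9%, which may have used a different formula or allowed for interim analyses. The total-events helper reflects our reading of the ½ formula: summing three pairwise event totals counts each arm's events twice.

```python
from math import log
from statistics import NormalDist

_STD_NORMAL = NormalDist()


def schoenfeld_power(n_events, hazard_ratio, alpha_two_sided, alloc_frac=0.5):
    """Approximate log-rank power given the number of events.

    Schoenfeld approximation: the log-rank z-statistic is roughly normal
    with mean sqrt(D * p * (1 - p)) * |log HR| and unit variance, where D
    is the number of events and p the fraction allocated to one arm.
    """
    z_alpha = _STD_NORMAL.inv_cdf(1 - alpha_two_sided / 2)
    drift = (n_events * alloc_frac * (1 - alloc_frac)) ** 0.5 * abs(log(hazard_ratio))
    return _STD_NORMAL.cdf(drift - z_alpha)


def total_events(pairwise_events):
    """Each arm appears in two of the three pairwise comparisons, so the
    sum of the pairwise event totals counts every event twice."""
    return sum(pairwise_events) / 2


print(round(schoenfeld_power(1700, 0.85, 0.025), 3))
print(total_events([1700, 1850, 1850]))
```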
[Figure: trial arms: Valsartan 160 mg bid; Captopril 50 mg tid + Valsartan 80 mg bid; Captopril 50 mg tid]
Example: Large CV Trial
• Design
  – Superiority trial comparing combination therapy to monotherapy
  – Primary endpoint is time to CV event
  – Sample size based on observing 2700 events
  – Interim analyses planned after 900 and 1800 observed events
Final Result: VALIANT Study
Comparison | Combo vs Captopril | Valsartan vs Captopril
Hazard Ratio | 0.98 | 1.0
97.5% CI | 0.89 to 1.09 | 0.90 to 1.11
Superiority Test | P = 0.73 | P = 0.98
Non-inferiority Test | NA | P = 0.004
Example
Could these results have been obtained more efficiently? We focus on the comparison of the combination group vs. the monotherapy groups:
• Combination: Captopril 50 mg tid + Valsartan 80 mg bid
• Monotherapy: Captopril 50 mg tid; Valsartan 160 mg bid
Final Analysis Result (Comb vs Mono)
[Figure: survival curves for Comb and Mono; x-axis: time after randomization (6M to 24M); y-axis: survival (0.3 to 1.0)]
Survival, % [95% CI] | 6M | 12M | 18M
Comb | 90.6 [89.8, 91.4] | 87.7 [86.7, 88.6] | 84.4 [83.4, 85.4]
Mono | 90.8 [90.2, 91.4] | 87.1 [86.4, 87.8] | 84.2 [83.5, 84.9]
Comb - Mono | -0.2 [-1.3, 0.9] | 0.5 [-0.8, 1.8] | 0.2 [-1.3, 1.7]
Observed events | 6M | 12M | 18M | 24M
Comb | 1311 | 1708 | 2141 | 2508
Mono | 908 | 1396 | 1694 | 1910
No. at risk | 6M | 12M | 18M | 24M
Comb | 4415 | 4265 | 4000 | 2648
Mono | 8895 | 8513 | 8036 | 5283
The final HR [95% CI]: 0.975 [0.902, 1.054]
Group Sequential Testing: O’Brien-Fleming
[Figure: observed test statistics and observed HR plotted against the O’Brien-Fleming boundaries; annotation “P < 0.05?”]
Conditional Power
• There is an 8% chance that a significant result will be observed at the final analysis.
Limitation of Conditional Power
• Why is there a low probability (8%) of significance?
  – No effect?
  – Not enough events?
  – Both?
• Conditional power does not distinguish among these
• Need more information
Each Patient’s Entry and Follow-up with Calendar Time
[Figure: entry and follow-up of each patient over calendar time; 900 events reached on 11/6/2000 and 1800 events on 7/19/2001]
Total events: 2,878; total survivors: 11,825; total: 14,703
Predicted Interval Plot
18M after the 2nd interim analysis (4Y from the start); assumed HR = 0.975
[Figure: simulated predicted intervals; x-axis: hazard ratio (0.9 to 1.1), with regions labeled “In favor of Mono therapy” and “In favor of Comb”; y-axis: percentile of the point estimate distribution (5, 25, 50, 75, 95)]
Current interval: [0.88, 1.08], length 0.20
Predicted Interval Plot
18M after the 2nd interim analysis (4Y from the start); assumed HR = 0.975
[Figure: the same predicted interval plot, with the final interval overlaid]
The final interval: 0.98 [0.90, 1.05], length 0.15
Current interval: [0.88, 1.08], length 0.20
Sensitivity Analyses
• Perform sensitivity analyses to evaluate the effect of uncertainty in the model assumption by creating PIPs for various assumed models
  – E.g., observed trend is true, HA is true, H0 is true, optimistic/pessimistic case scenarios, etc.
  – Strategically chosen
Predicted Interval Plot: HR = 0.85 (original alternative hypothesized value)
[Figure: simulated predicted intervals; x-axis: hazard ratio (0.9 to 1.0), with regions labeled “In favor of Mono therapy” and “In favor of Comb”; y-axis: percentile of the point estimate distribution]
6M or 12M after the DSMB (Assumed HR = 0.975)
From DSMB | From Start | Interval | 95% CI | Length
As of DSMB | 2.5Y | Observed | [0.88, 1.08] | 0.20
6M later | 3.0Y | PI (median) | [0.89, 1.07] | 0.18
12M later | 3.5Y | PI (median) | [0.90, 1.06] | 0.16
18M later | 4.0Y | PI (median) | [0.91, 1.06] | 0.15
18M later | 4.0Y (END) | Observed | [0.90, 1.05] | 0.15
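The shrinking CI lengths in this table reflect the growing number of events. A common approximation, Var(log HR) ≈ 1/(D·p·(1 − p)) with D events and allocation fraction p, can be used to check this. We assume p = 1/3 (one combination arm vs. two monotherapy arms) and HR = 0.975. On those assumptions the approximation gives about [0.902, 1.054] at the 2,878 total events and about [0.88, 1.08] at the 1,800 events of the second interim analysis, in line with the final and “As of DSMB” intervals reported. It is a back-of-the-envelope check, not the simulation method behind the PIPs.

```python
from math import exp, log
from statistics import NormalDist


def approx_hr_ci(hazard_ratio, n_events, alloc_frac, level=0.95):
    """Approximate CI for a hazard ratio from the total number of events.

    Uses Var(log HR) ~= 1 / (D * p * (1 - p)), where D is the number of
    events and p the fraction of patients in one arm.
    """
    z = NormalDist().inv_cdf(0.5 + level / 2)
    se_log_hr = (1 / (n_events * alloc_frac * (1 - alloc_frac))) ** 0.5
    centre = log(hazard_ratio)
    return exp(centre - z * se_log_hr), exp(centre + z * se_log_hr)


# Comb (1 arm) vs Mono (2 arms): allocation fraction 1/3.
print(approx_hr_ci(0.975, 1800, 1 / 3))  # around the 2nd interim analysis
print(approx_hr_ci(0.975, 2878, 1 / 3))  # at the final analysis
```

Because the length on the log scale shrinks only like 1/sqrt(D), roughly 60% more events narrow the interval by only about a quarter, which is the point the PIPs make visually.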
ACTG A5175
• Phase IV, randomized, open-label, three-arm trial designed to evaluate three ARV regimens for treatment-naïve HIV+ participants
  – One regimen contained two nucleoside reverse transcriptase inhibitors (NRTIs) + an HIV-1 protease inhibitor (PI), and two regimens each contained two NRTIs + a non-nucleoside reverse transcriptase inhibitor (NNRTI)
  – Primary endpoint = time to the first of virologic failure, new AIDS-defining OI, or death
• PIPs were used to show that, under reasonable assumptions, precision would not be appreciably increased with reasonable additional follow-up
  – Additional follow-up would not alter the qualitative interpretation of the trial
Extensions
• Bayesian analog (pending)
  – Posterior probability of hypotheses
  – Side-by-side analyses to aid DSMBs in decision-making
  – R2WinBUGS
• Development program decisions
  – Using Phase II data to predict Phase III and make go/no-go decisions
• Use of PIPs in design
Example: ACTG 5263
• A Randomized Comparison of Three Chemotherapy Regimens as an Adjunct to Antiretroviral Therapy for Treatment of Advanced AIDS-KS in Resource-Limited Settings
• Multinational
ACTG 5263
• Liposomal doxorubicin = active control
  – Expensive
• 2 primary comparisons vs. cheaper therapies
  – Etoposide + ART vs. liposomal doxorubicin + ART
  – BV + ART vs. liposomal doxorubicin + ART
• Primary endpoint
  – KS progression or death by 48 weeks (binary)
ACTG 5263
• Desire to make noninferiority claims
• Principles of defining the NI margin
  – Statistical
    • Retain some effect of the active control over placebo
  – Conceptual
    • Maximum treatment difference that is clinically irrelevant
    • Largest treatment difference that is acceptable in order to gain other advantages of the experimental intervention
• Difficult to define a single NI margin for settings with diverse resources
• Study sizes based on estimation and precision of the estimate (i.e., width of a CI) rather than hypothesis testing
  – But the desire to claim superiority or noninferiority will likely remain
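Sizing a study by precision rather than by a hypothesis test can be sketched for a binary endpoint such as KS progression or death by 48 weeks. The idea is to choose the per-arm n so that the Wald 95% CI for the difference in proportions has a target total width. The event rates and target width below are hypothetical, not ACTG 5263's design values, and the Wald interval is our choice of method.

```python
from math import ceil
from statistics import NormalDist


def wald_width(p1, p2, n_per_arm, level=0.95):
    """Expected total width of the Wald CI for p1 - p2 with n per arm."""
    z = NormalDist().inv_cdf(0.5 + level / 2)
    return 2 * z * ((p1 * (1 - p1) + p2 * (1 - p2)) / n_per_arm) ** 0.5


def n_per_arm_for_width(p1, p2, target_width, level=0.95):
    """Smallest n per arm with wald_width(p1, p2, n) <= target_width."""
    z = NormalDist().inv_cdf(0.5 + level / 2)
    variance_sum = p1 * (1 - p1) + p2 * (1 - p2)
    return ceil(variance_sum * (2 * z / target_width) ** 2)


n = n_per_arm_for_width(0.30, 0.30, 0.20)  # hypothetical rates and width
print(n, wald_width(0.30, 0.30, n))
```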
Type I error: 0.048
Median of CI width: 0.199
Non-inferiority Margin | Power
0.1 | 0.524
0.12 | 0.681
0.15 | 0.866
Power: 0.508
Median of CI width: 0.197
Non-inferiority Margin | Prop. of claiming non-inferiority
0.1 | 0.019
0.12 | 0.055
0.15 | 0.166
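Operating characteristics like those above (type I error, power, median CI width, and the proportion of trials claiming non-inferiority at several margins) can be estimated by simulation. A minimal sketch follows, under these assumptions: a Wald CI for the difference in failure proportions (experimental minus control, lower is better), a two-sided "significant" result when the CI excludes zero, non-inferiority claimed when the upper limit is below the margin, and hypothetical rates and sample sizes. The scenarios behind the two tables above are not stated in the transcript, so this is not expected to reproduce them.

```python
import random
import statistics
from statistics import NormalDist


def wald_ci_diff(x_exp, n_exp, x_ctl, n_ctl, level=0.95):
    """Wald CI for p_exp - p_ctl (proportions of failures)."""
    p_e, p_c = x_exp / n_exp, x_ctl / n_ctl
    se = (p_e * (1 - p_e) / n_exp + p_c * (1 - p_c) / n_ctl) ** 0.5
    z = NormalDist().inv_cdf(0.5 + level / 2)
    d = p_e - p_c
    return d - z * se, d + z * se


def simulate_design(p_exp, p_ctl, n_per_arm, margins, n_sims=2000, seed=5263):
    """Estimate operating characteristics of a two-arm binary-endpoint trial."""
    rng = random.Random(seed)
    widths, excludes_zero = [], 0
    ni_claims = {m: 0 for m in margins}
    for _ in range(n_sims):
        x_e = sum(rng.random() < p_exp for _ in range(n_per_arm))
        x_c = sum(rng.random() < p_ctl for _ in range(n_per_arm))
        lo, hi = wald_ci_diff(x_e, n_per_arm, x_c, n_per_arm)
        widths.append(hi - lo)
        excludes_zero += (lo > 0) or (hi < 0)  # two-sided "significant" result
        for m in margins:
            ni_claims[m] += hi < m  # upper limit below the NI margin
    return {
        "prop_ci_excludes_zero": excludes_zero / n_sims,
        "median_ci_width": statistics.median(widths),
        "prop_claim_ni": {m: c / n_sims for m, c in ni_claims.items()},
    }


# Hypothetical: equal 30% failure rates, 150 per arm.
result = simulate_design(0.30, 0.30, 150, margins=(0.10, 0.12, 0.15), n_sims=500)
print(result)
```

Reporting the median CI width next to the NI claim rates keeps the emphasis on precision, as the ACTG 5263 slide recommends.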
Software
• Frontier Science (Beta version 0.01)
  – R interface
  – Menu-driven, with the user setting parameters and selecting options
• CBAR
• Grants
Predicted Interval Plot
Kernel-smoothing is used to obtain the estimated density. The mode is identified as the value with the highest estimated density.
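The slide does not specify the kernel or the bandwidth. A minimal sketch follows, assuming a Gaussian kernel with Silverman's rule-of-thumb bandwidth and a grid search for the mode. It is applied to hypothetical simulated HR point estimates, not to the trial's PIP output.

```python
import math
import random
import statistics


def silverman_bandwidth(data):
    """Silverman's rule of thumb for a Gaussian kernel."""
    sd = statistics.stdev(data)
    q1, _, q3 = statistics.quantiles(data, n=4)
    iqr = q3 - q1
    spread = min(sd, iqr / 1.34) if iqr > 0 else sd
    if spread <= 0:
        raise ValueError("data have no spread")
    return 0.9 * spread * len(data) ** (-1 / 5)


def kde(x, data, bandwidth):
    """Gaussian kernel density estimate at x."""
    norm = 1 / (len(data) * bandwidth * math.sqrt(2 * math.pi))
    return norm * sum(math.exp(-0.5 * ((x - xi) / bandwidth) ** 2) for xi in data)


def kde_mode(data, grid_size=256):
    """Grid point with the highest estimated density."""
    h = silverman_bandwidth(data)
    lo, hi = min(data) - 3 * h, max(data) + 3 * h
    step = (hi - lo) / (grid_size - 1)
    grid = [lo + i * step for i in range(grid_size)]
    return max(grid, key=lambda x: kde(x, data, h))


# Hypothetical simulated HR point estimates centred near 0.975.
rng = random.Random(0)
estimates = [rng.gauss(0.975, 0.03) for _ in range(1000)]
print(round(kde_mode(estimates), 3))
```

The grid resolution limits the precision of the reported mode; a finer grid (or a local optimizer started from the best grid point) refines it.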