Catalyst Data Monitoring Prediction · 2010. 9. 13. · – Remember most DSMB members are not...


Data Monitoring of Clinical Trials Using Prediction

Scott Evans, Lingling Li, Hajime Uno, LJ Wei

East Campus, Reisman Lecture Hall, Feldberg/Reisman Complex

Beth Israel Deaconess Medical Center

April 14, 2010

Special Thanks

• Professor Jim Ware

• Kim Jackson

• Our hosts at BIDMC

• Colleagues
  – Sachiko Miyahara
  – Tzu-min Yeh
  – Huichao Chen
  – Jeanne Jiang
  – Lijuan Deng
  – Laura Smeaton

Outline

• Setting

• Motivation

• Predicted Intervals

• Predicted Interval Plots (PIPs)

• Extensions

Setting

• Interim data from clinical trials are monitored

• Treatment effects are estimated

• DSMBs (or DMCs) meet to review interim results and make recommendations regarding future trial conduct

Practical Questions

• Should the trial be stopped?
  – For efficacy?
  – For futility?

• Are there non-efficacious arms that should be dropped?

• Should sample size be re-calculated?
  – Due to a lack of precision in estimating a parameter during trial design (e.g., variability, control group response)

• Should the duration of follow-up be modified due to unexpected event rates?
  – E.g., with an event-time endpoint?

Motivation

• Answering these questions has:
  – Ethical attractiveness
    • Fewer participants generally exposed to inefficacious and potentially harmful therapies
  – Economical advantages
    • Smaller expected sample sizes and shorter expected duration than designs without interim analyses
      – Saving time, money, and other resources
  – Public health advantages
    • Answers may get to the medical community more quickly

Interim Analyses Methods

• Group sequential methods
  – Control α spending
  – E.g., Slud and Wei, O’Brien-Fleming, Lan-DeMets, Pocock
  – Generally boundary driven with test statistics

• Conditional power/futility index

Limitations of Many Traditional Methods

• Do not

– Provide estimates of effect or associated precision (only test statistics, p-values, and decision rules)

– Evaluate “clinical relevance”
  • Statistical significance is not the only consideration

– Provide information regarding the reasons for:
  • High p-values (or test statistics):
    – Negligible effect vs. insufficient data vs. too much variation
  • Low p-values (or test statistics):
    – Clinical significance?

Limitations of Many Traditional Methods

• Inflexible with binding decision rules based usually on a single (primary) endpoint

– Desire to base decisions upon simultaneous assessment of many factors, such as:

  • Safety data
  • Secondary endpoints
  • Quality of life
  • Benefit:risk assessment
  • Results of other trials
  • Scientific relevance
  • Availability of new alternative therapies
  • Cost:benefit considerations

Repeated Confidence Intervals (RCIs)

• Sequential CIs
  – Simultaneous coverage control

• Uses principles of group sequential methods

• Provides estimates of effect sizes

• Allows for flexibility in decision making

• Jennison & Turnbull, Controlled Clinical Trials, 1984.

• Mehta et al., Statistics in Medicine, 2000.

Repeated Confidence Intervals

[Figure: repeated confidence intervals at successive analyses, labeled 99.95% CI, 98.6% CI, and 95.5% CI]
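As a rough illustration of how the coverage levels labeled in the figure (99.95%, 98.6%, 95.5%) translate into interval widths, the sketch below converts each level into its normal critical value. It assumes a normal approximation throughout; the estimate and standard errors in the usage lines are hypothetical and are not taken from any trial in this talk.

```python
from statistics import NormalDist

def critical_value(coverage):
    """Two-sided normal critical value for a CI with the given coverage."""
    return NormalDist().inv_cdf(0.5 + coverage / 2)

def repeated_ci(estimate, std_err, coverage):
    """CI of the requested coverage around an estimate (normal approximation)."""
    z = critical_value(coverage)
    return (estimate - z * std_err, estimate + z * std_err)

# Coverage levels as labeled in the figure, one per analysis.
coverages = [0.9995, 0.986, 0.955]
z_values = [critical_value(c) for c in coverages]

# Hypothetical estimate (0.25) and shrinking standard errors at three looks.
for look, (c, se) in enumerate(zip(coverages, [0.30, 0.21, 0.17]), start=1):
    lo, hi = repeated_ci(0.25, se, c)
    print(f"look {look}: {c:.2%} CI = ({lo:.2f}, {hi:.2f})")
```

The early looks use much larger multipliers (about 3.5 at the first look versus about 2.0 at the last), which is how simultaneous coverage is kept under control.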

Limitations of Repeated CIs

• At the interim, we wish to weigh the options of stopping vs. continuing

• Repeated CIs do not:
  – Provide formal evaluation of the ramifications of continuing
    • What effect size estimates and associated precision will be observed at the end of the trial? At the next interim?

• Thus how do we weigh the options?

Need for Methods that:

• Control error rates

• Are flexible to allow for expert DSMB judgment
  – Allow incorporation of other information into decision

• Provide effect size estimates and associated precision
  – Assess clinical relevance and statistical significance

• Provide information about decision alternatives

Predicted Intervals

• Predict CI at future timepoint (e.g., end of trial or next interim analysis time) conditional upon:
  1. Observed data
  2. Assumptions regarding future data (e.g., observed trend continues, HA is true, H0 is true, best/worst case scenarios, etc.)

• Evans SR, Li L, Wei LJ, Drug Information Journal, 41:733-742, 2007.

NARC 009

• Randomized, double-blind, placebo-controlled, multicenter, dose-ranging study of prosaptide (PRO) for the treatment of HIV-associated neuropathic pain

• Participants were randomized to 2, 4, 8, 16 mg/d PRO or placebo administered via subcutaneous injection

• Primary endpoint:
  – 6 week change from baseline in weekly average of random daily Gracely pain scale prompts using an electronic diary

NARC 009

• Designed for 390 participants equally allocated between groups

• Interim analysis conducted after 167 participants completed the 6-week double-blind treatment period

• Computed PIs

Interim Analysis Results: NARC 009

Treatment  N   95% CI for Mean Change  95% CI for Diff (1)  95% PI for Diff (2)  95% PI for Diff (3)  Required Diff (4)
Placebo    31  (-0.35, -0.11)
2 mg       34  (-0.21, -0.04)          (-0.04, 0.25)        (-0.01, 0.21)        (-0.16, 0.06)        -0.54
4 mg       34  (-0.38, -0.12)          (-0.19, 0.16)        (-0.14, 0.10)        (-0.23, 0.01)        -0.45
8 mg       32  (-0.18, -0.02)          (-0.01, 0.28)        (0.03, 0.23)         (-0.15, 0.05)        -0.56
16 mg      36  (-0.34, -0.09)          (-0.16, 0.19)        (-0.11, 0.14)        (-0.21, 0.04)        -0.54

(1) 95% CI for the difference in mean changes vs. placebo
(2) 95% PI for the difference in mean changes vs. placebo assuming full enrollment, assuming current trend
(3) 95% PI for the difference in mean changes vs. placebo assuming full enrollment, assuming per protocol, μplacebo = -0.17 and μdrug = -0.34
(4) Difference in mean changes needed in the remaining participants for the CI for the difference in mean changes to exclude zero (in favor of active treatment) at the end of the trial

NARC 009

• Sensitivity analyses show that the futility assessment is robust

• Trial was discontinued by NARC DSMB for futility
  – Evans et al., PLoS ONE, 2007.

Predicted Intervals

• Intuitive

• Practical
  – Provides information on effect size and associated precision (clinical relevance)
  – Invariant to study design (superiority vs. noninferiority)
  – Use with RCI theory to control error rates and allow flexibility in decision process
  – Can use for:
    • Binary, continuous, or time-to-event endpoints
    • 1 and 2 sample problems

Issues

• Sensitivity analyses necessary to assess the robustness of results to varying assumptions
  – Strategic assessment of assumptions to be employed

• Need to assess impact of sampling variability

• Need concise and intuitive summaries for DSMBs
  – Too much information is not digestible
    • Graphics are helpful
  – Too little information is not informative
  – Remember most DSMB members are not statisticians

Predicted Interval Plots (PIPs)

• Evaluates the sampling variability associated with the assumed model using simulation

• Plots the simulated PIs under the model assumption

• Conditional power is readily available

• Li L, Evans SR, Uno H, Wei LJ (Statistics in Biopharmaceutical Research)

PIP Construction

• Impose parametric assumptions for the unobserved data
  – Estimate or specify the values of the unknown parameters under reasonable and strategic assumptions

• Simulate future data

• Combine the observed data with the simulated data

• Construct PIs using standard methods

• Repeat to obtain many simulated PIs
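The construction steps above can be sketched in a few lines of Python. This is a minimal illustration rather than the authors' software: it assumes a two-arm trial with a continuous endpoint, a normal model for the unobserved outcomes, and a normal-approximation CI for the difference in means. The interim data, sample sizes, and function names (`ci_for_diff`, `simulate_pis`) are all hypothetical.

```python
import random
from statistics import NormalDist, mean, median, variance

def ci_for_diff(a, b, level=0.95):
    """Normal-approximation CI for mean(a) - mean(b)."""
    z = NormalDist().inv_cdf(0.5 + level / 2)
    diff = mean(a) - mean(b)
    se = (variance(a) / len(a) + variance(b) / len(b)) ** 0.5
    return (diff - z * se, diff + z * se)

def simulate_pis(obs_a, obs_b, n_final, mu_a, mu_b, sd, n_sims=500, seed=1):
    """Simulate predicted intervals (PIs) at the final analysis.

    obs_a, obs_b:   outcomes observed so far in each arm
    n_final:        planned final sample size per arm
    mu_a, mu_b, sd: assumed normal model for the unobserved outcomes
    """
    rng = random.Random(seed)
    pis = []
    for _ in range(n_sims):
        # Simulate future data and combine it with the observed data.
        full_a = obs_a + [rng.gauss(mu_a, sd) for _ in range(n_final - len(obs_a))]
        full_b = obs_b + [rng.gauss(mu_b, sd) for _ in range(n_final - len(obs_b))]
        # Construct the PI with a standard method.
        pis.append(ci_for_diff(full_a, full_b))
    return pis

# Hypothetical interim data: 20 of a planned 50 participants per arm.
rng = random.Random(0)
obs_a = [rng.gauss(0.3, 1.0) for _ in range(20)]
obs_b = [rng.gauss(0.0, 1.0) for _ in range(20)]

# "Observed trend continues": future data follow the interim estimates.
pooled_sd = ((variance(obs_a) + variance(obs_b)) / 2) ** 0.5
pis = simulate_pis(obs_a, obs_b, 50, mean(obs_a), mean(obs_b), pooled_sd)

current_lo, current_hi = ci_for_diff(obs_a, obs_b)
widths = [hi - lo for lo, hi in pis]
# Share of simulated PIs that exclude zero: a simulation-based conditional power.
cond_power = sum(lo > 0 or hi < 0 for lo, hi in pis) / len(pis)

print(f"current CI width:  {current_hi - current_lo:.2f}")
print(f"median PI width:   {median(widths):.2f}")
print(f"conditional power: {cond_power:.1%}")
```

Rerunning `simulate_pis` with other values of `mu_a`, `mu_b`, and `sd` (for example, the null or the design alternative) gives the sensitivity analyses; plotting the simulated intervals as stacked horizontal segments gives the plot itself.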

PIPs: Construction Schema

Observed dataset at interim point + Assumption about future data
  → Simulate future outcome
  → A simulated complete dataset at the end of the trial or later interim
  → Calculate the “final” result
  → Adjusted PI
  → Repeat: simulate many times and get many adjusted PIs
  → DSMB

Minocycline for Cognitive Impairment in HIV

• Design
  – Randomized, blinded, placebo-controlled single site study
  – Primary endpoint: 24 week change in composite of standardized neuropsychological testing battery
  – N=100

• DSMB reviewed results after ~40% information

24-Week Change in NP Summary

                 Total        Treatment Arm A  Treatment Arm B  p-value
U NP Sum Change                                                 0.593
  N              41           20               21
  # missing      24           13               11
  Mean (SD)      0.41 (0.80)  0.34 (0.84)      0.49 (0.78)
  Min, Max       -1.43, 2.06  -1.43, 2.06      -1.00, 1.66
  Median         0.58         0.60             0.52
  Q1, Q3         -0.17, 0.84  -0.30, 0.81      0.17, 0.93

Predicted Intervals Summary

Assumption              Current 95% CI  Width of Current 95% CI  Median (Q1, Q3) Width of Predicted 95% CI  Probability to reject H0 in favor of A  Probability to reject H0 in favor of B
Observed Trend          -0.67, 0.38     1.05                     0.63 (0.61, 0.66)                          0%                                      8%
Null Hypothesis         -0.67, 0.38     1.05                     0.68 (0.65, 0.71)                          0%                                      2%
Alternative Hypothesis  -0.67, 0.38     1.05                     0.67 (0.64, 0.70)                          19%                                     0%

[Figure: three predicted interval plot panels on a treatment-difference axis from -0.5 to 0.5, with regions labeled Clinically Inferior, Clinically Equivalent, and Clinically Superior; one direction is labeled "Treatment B is preferred" and the other "Treatment A is preferred"]

Scenario  Under Observed  Under H0  Under Ha
1         0%              0%        0%
2         31.6%           23%       0.2%
3         7.8%            2%        0%
4         0%              0%        0%
5         0.6%            4.6%      56.2%
6         60%             70.4%     24.8%
7         0%              0%        0%
8         0%              0%        18.8%
9         0%              0%        0%
10        0%              0%        0%

Summary

• Futility analyses suggest low probability of a positive trial/superiority result with respect to primary endpoint

• DSMB evaluated whether relevant effects could reasonably be ruled out

• DSMB evaluated whether there were other reasons to continue (e.g., secondary endpoints)

• DSMB recommended early termination of the study

Primary Comparisons and Power Considerations

Superiority

A total of 1700 primary events in the two treatment arms attains 85.9% power for detecting HR=0.85.

Non-inferiority

1850 primary events in these two treatment arms will provide 88.1% power if valsartan is actually 2.5% better than captopril

The total events = ½ (1700 + 1850 +1850) = 2700
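The arithmetic on this slide can be checked with a short sketch. The event total is computed exactly as written. The power figure uses Schoenfeld's approximation for a log-rank test with equal allocation, which is a swapped-in textbook formula and not necessarily the calculation behind the slide; the significance level of 0.025 is an assumption, chosen only because the final results are reported with 97.5% CIs.

```python
from math import log, sqrt
from statistics import NormalDist

def schoenfeld_power(events, hazard_ratio, alpha=0.05):
    """Approximate power of a two-sided log-rank test with 1:1 allocation."""
    nd = NormalDist()
    z_crit = nd.inv_cdf(1 - alpha / 2)
    # Under Schoenfeld's approximation, log(HR-hat) has variance of about 4 / events.
    z_effect = abs(log(hazard_ratio)) * sqrt(events) / 2
    return nd.cdf(z_effect - z_crit)

# Event target as written on the slide.
total_events = (1700 + 1850 + 1850) / 2

# alpha = 0.025 is an assumption (see the lead-in), not a value stated on the slide.
power_superiority = schoenfeld_power(1700, 0.85, alpha=0.025)

print(total_events)
print(round(power_superiority, 3))
```

Under these assumptions the approximation lands near 0.87, in the same range as the 85.9% quoted for the superiority comparison; the slide's exact figure presumably reflects design details not shown here.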

[Diagram: treatment arms being compared. Valsartan 160 mg bid vs. Captopril 50 mg tid; Captopril 50 mg tid + Valsartan 80 mg bid vs. Captopril 50 mg tid]

Example: Large CV Trial

• Design
  – Superiority trial comparing combination therapy to monotherapy
  – Primary endpoint is time-to-CV-event
  – Sample size based on observing 2700 events
  – Interim planned after 900 and 1800 observed events

Final Result: VALIANT Study

Comparison            Combo vs Captopril  Valsartan vs Captopril
Hazard Ratio          0.98                1.0
97.5% CI              0.89 to 1.09        0.90 to 1.11
Superiority Test      P=0.73              P=0.98
Non-inferiority Test  NA                  P=0.004

Example

Could these results have been obtained more efficiently?

We focus on the comparison of the Combo group vs. the Mono-therapy groups

• Combo: Captopril 50 mg tid + Valsartan 80 mg bid
• Mono-therapy: Captopril 50 mg tid; Valsartan 160 mg bid

Final Analysis Result (Comb vs Mono)

[Figure: survival curves for Comb and Mono by time after randomization (6M, 12M, 18M, 24M)]

Survival [95% CI]  Comb               Mono               Comb - Mono
6M                 90.6 [89.8, 91.4]  90.8 [90.2, 91.4]  -0.2 [-1.3, 0.9]
12M                87.7 [86.7, 88.6]  87.1 [86.4, 87.8]  0.5 [-0.8, 1.8]
18M                84.4 [83.4, 85.4]  84.2 [83.5, 84.9]  0.2 [-1.3, 1.7]

Observed Events  6M    12M   18M   24M
Comb             1311  1708  2141  2508
Mono             908   1396  1694  1910

No. at risk      6M    12M   18M   24M
Comb             4415  4265  4000  2648
Mono             8895  8513  8036  5283

The final HR [95% CI]: 0.975 [0.902, 1.054]

Group Sequential Testing: O’Brien-Fleming

Observed test statistics

Conditional Power

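There is an 8% chance that a significant result (P<0.05) will be observed at the final analysis.

[Figure: observed HR at the interim, with the question of whether P<0.05 will be reached at the final analysis]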
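For readers who want to see how a conditional power figure of this kind is typically obtained, here is a minimal sketch using the B-value (Brownian motion) formulation. It is not meant to reproduce the 8% on the slide: the interim z-statistic and information fraction below are hypothetical, and the calculation uses a fixed final critical value rather than a group sequential boundary.

```python
from math import sqrt
from statistics import NormalDist

def conditional_power(z_interim, info_frac, drift=None, alpha=0.05):
    """Probability that the final z-statistic exceeds the two-sided critical
    value in the beneficial direction, given the interim z-statistic.

    z_interim: interim z-statistic (positive = benefit)
    info_frac: fraction of total information (e.g., events) observed so far
    drift:     assumed expected final z; defaults to the current trend
    """
    nd = NormalDist()
    z_crit = nd.inv_cdf(1 - alpha / 2)
    b_value = z_interim * sqrt(info_frac)
    if drift is None:
        drift = z_interim / sqrt(info_frac)  # current-trend assumption
    expected_final = b_value + drift * (1 - info_frac)
    return 1 - nd.cdf((z_crit - expected_final) / sqrt(1 - info_frac))

# Hypothetical interim: z = 1.0 with two thirds of the events observed.
cp_trend = conditional_power(1.0, 2 / 3)
cp_null = conditional_power(1.0, 2 / 3, drift=0.0)
cp_strong = conditional_power(1.0, 2 / 3, drift=3.0)

print(f"current trend: {cp_trend:.3f}")
print(f"null drift:    {cp_null:.3f}")
print(f"strong drift:  {cp_strong:.3f}")
```

The function returns a single probability, so a low value does not by itself say whether the effect is small or whether too little information remains.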

Limitation of Conditional Power

• Why is there a low probability (8%) of significance?
  – No effect?
  – Not enough events?
  – Both?

• Conditional Power does not distinguish these

• Need more information

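Each Patient’s Entry and Follow-up with Calendar Time

[Figure: each patient's entry and follow-up plotted against calendar time, with the interim analyses marked]

• 900 events: 11/6/2000
• 1800 events: 7/19/2001
• Total events: 2,878
• Total survivors: 11,825
• Total: 14,703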

95% Predicted Interval Plot

18M after the 2nd interim analysis (4Y from the start). Assumed HR = 0.975.

[Figure: simulated 95% predicted intervals on a hazard ratio axis (0.9 to 1.1), with regions labeled "In favor of Comb" and "In favor of Mono therapy"]

Current interval: [0.88, 1.08], length 0.20

The final interval: 0.98 [0.90, 1.05], length 0.15

Sensitivity Analyses

• Perform sensitivity analyses to evaluate the effect of the uncertainty of the model assumption by creating PIPs for various assumed models
  – E.g., observed trend is true, HA is true, H0 is true, optimistic/pessimistic case scenarios, etc.
  – Strategically chosen

Predicted Interval Plot: HR = 0.85 (original alternative hypothesized value)

[Figure: predicted intervals on a hazard ratio axis (0.9 to 1.0)]

6M or 12M after the DSMB (Assumed HR = 0.975)

From DSMB   From Start  95% CI                    Length
As of DSMB  2.5Y        Observed [0.88, 1.08]     0.20
6M later    3.0Y        PI (median) [0.89, 1.07]  0.18
18M later   4.0Y (END)  Observed [0.90, 1.05]     0.15
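The modest gain in precision shown in the table is roughly what a back-of-the-envelope calculation predicts, since the standard error of a log hazard ratio shrinks approximately with the square root of the number of events. The sketch below is only illustrative: the event counts (1800 at the DSMB review, 2,878 at the end) appear elsewhere in the talk, but pairing them with these intervals is an assumption, and `projected_log_width` is a hypothetical helper.

```python
from math import log, sqrt

def projected_log_width(lower, upper, events_now, events_future):
    """Project the log-scale width of a hazard ratio CI to a later analysis,
    assuming the standard error of log(HR) scales as 1 / sqrt(events)."""
    return log(upper / lower) * sqrt(events_now / events_future)

# Interval at the DSMB review, from the table above.
# Assumption: 1800 events at the review and 2878 at the end of the trial.
projected = projected_log_width(0.88, 1.08, 1800, 2878)

# Log-scale width of the interval actually observed at the end of the trial.
final_observed = log(1.05 / 0.90)

print(round(projected, 2), round(final_observed, 2))
```

Because the hazard ratios here are close to 1, log-scale widths are nearly the same as the raw lengths reported in the table.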

ACTG A5175

• Phase IV, randomized, open-label, three-arm trial designed to evaluate three ARV regimens for treatment-naïve HIV+ participants
  – One regimen contained two nucleoside reverse transcriptase inhibitors (NRTIs) + an HIV-1 protease inhibitor (PI) and two regimens each contained two NRTIs + a non-nucleoside reverse transcriptase inhibitor (NNRTI)
  – Primary endpoint = time to first of (virologic failure, new AIDS-defining OI, death)

• PIPs used to show that under reasonable assumptions, precision would not be appreciably increased with reasonable additional follow-up
  – Additional follow-up would not alter qualitative interpretation of the trial

Extensions

• Bayesian analog (pending)
  – Posterior probability of hypotheses
  – Side-by-side analyses to aid DSMBs in decision-making
  – R2WinBUGS

• Development Program Decisions
  – Using Phase II data to predict Phase III and make go/no-go decisions

• Use of PIPs in Design

Example: ACTG 5263

• A Randomized Comparison of Three Chemotherapy Regimens as an Adjunct to Antiretroviral Therapy for Treatment of Advanced AIDS-KS in Resource-Limited Settings

• Multinational

ACTG 5263

• Liposomal doxorubicin = active control
  – Expensive

• 2 primary comparisons vs. cheaper therapies
  – Etoposide + ART vs. liposomal doxorubicin + ART
  – BV + ART vs. liposomal doxorubicin + ART

• Primary endpoint
  – KS progression or death by 48 weeks (binary)

ACTG 5263

• Desire to make noninferiority claims

• Principles of defining the NI margin
  – Statistical
    • Retain some effect of active control over placebo
  – Conceptual
    • Maximum treatment difference that is clinically irrelevant
    • Largest treatment difference that is acceptable in order to gain other advantages of the experimental intervention

• Difficult to define a single NI margin for settings with diverse resources

• Study sizes based on estimation and precision of the estimate (i.e., width of a CI) rather than hypothesis testing
  – But desire to claim superiority or noninferiority will likely remain

Type I error: 0.048

Median of CI width: 0.199

Non-inferiority Margin  Power
0.1                     0.524
0.12                    0.681
0.15                    0.866

Power: 0.508

Median of CI width: 0.197

Non-inferiority Margin  Prop. of claiming non-inferiority
0.1                     0.019
0.12                    0.055
0.15                    0.166
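The link between CI width and the chance of claiming noninferiority can be sketched with a normal approximation. This is an illustration only, not the calculation behind the tables, whose sample sizes and event rates are not shown: the sketch backs a standard error out of the median CI width of 0.199 and assumes no true difference between arms. The function `ni_power` and its `true_diff` parameter are hypothetical.

```python
from statistics import NormalDist

def ni_power(ci_width, margin, true_diff=0.0, level=0.95):
    """Approximate probability that the CI for a difference excludes the NI margin.

    ci_width:  full width of the two-sided CI at the final analysis
    margin:    noninferiority margin (positive)
    true_diff: true difference, with positive values favoring the control
    """
    nd = NormalDist()
    z = nd.inv_cdf(0.5 + level / 2)
    std_err = ci_width / (2 * z)
    # NI is claimed when the upper CI limit falls below the margin.
    return nd.cdf((margin - true_diff) / std_err - z)

for margin in (0.10, 0.12, 0.15):
    print(margin, round(ni_power(0.199, margin), 3))
```

With these assumptions the approximation gives roughly 0.50, 0.66, and 0.84 for margins of 0.1, 0.12, and 0.15, somewhat below the 0.524, 0.681, and 0.866 in the first table but showing the same dependence on the margin.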

Software

• Frontier Science (Beta version 0.01)
  – R interface
  – Menu driven with user setting parameters and selecting options

• CBAR

• Grants

Predicted Interval Plot

Kernel-smoothing is used to obtain the estimated density. The mode is identified as the value with the highest estimated density.
