
Bayesian Design and Analysis of Active Control Clinical Trials

Bayesian Design and Analysis of Active Control Clinical Trials Author(s): Richard Simon Source: Biometrics, Vol. 55, No. 2 (Jun., 1999), pp. 484-487 Published by: International Biometric Society Stable URL: http://www.jstor.org/stable/2533796 . Accessed: 28/06/2014 13:30 Your use of the JSTOR archive indicates your acceptance of the Terms & Conditions of Use, available at . http://www.jstor.org/page/info/about/policies/terms.jsp . JSTOR is a not-for-profit service that helps scholars, researchers, and students discover, use, and build upon a wide range of content in a trusted digital archive. We use information technology and tools to increase productivity and facilitate new forms of scholarship. For more information about JSTOR, please contact [email protected]. . International Biometric Society is collaborating with JSTOR to digitize, preserve and extend access to Biometrics. http://www.jstor.org This content downloaded from 193.105.245.130 on Sat, 28 Jun 2014 13:30:56 PM All use subject to JSTOR Terms and Conditions

BIOMETRICS 55, 484-487 June 1999

Bayesian Design and Analysis of Active Control Clinical Trials

Richard Simon

Biometric Research Branch, National Cancer Institute, Bethesda, Maryland 20892, U.S.A. email: rich@brb.nci.nih.gov

SUMMARY. We consider the design and analysis of active control clinical trials, i.e., clinical trials comparing an experimental treatment E to a control treatment C considered to be effective. Direct comparison of E to placebo P, or no treatment, is sometimes ethically unacceptable. Much discussion of the design and analysis of such clinical trials has focused on whether the comparison of E to C should be based on a test of the null hypothesis of equivalence, on a test of a nonnull hypothesis that the difference is of some minimally medically important size $\delta$, or on one- or two-sided confidence intervals. These approaches are essentially the same for study planning. They all suffer from arbitrariness in specifying the size of the difference $\delta$ that must be excluded. We propose an alternative Bayesian approach to the design and analysis of active control trials. We derive the posterior probability that E is superior to P or that E is at least k% as good as C and that C is more effective than P. We also derive approximations for use with logistic and proportional hazard models. Selection of prior distributions is discussed, and results are illustrated using data from an active control trial of a drug for the treatment of unstable angina.

KEY WORDS: Active control trials; Bayesian; Clinical trials; Equivalence trials.

1. Introduction

Many randomized clinical trials compare an experimental treatment to an active control. For life-threatening diseases, it is generally not ethical to withhold a treatment of established effectiveness and utilize a placebo or no-treatment control. For non-life-threatening conditions, it is sometimes difficult to do so. In some situations, the experimental treatment has advantages in convenience and reduced side effects that would make it desirable if it were equivalent to the active control with regard to the primary efficacy endpoint. In other cases, such as seeking approval for the marketing of a new drug, interest focuses on demonstrating effectiveness of the experimental treatment by showing therapeutic equivalence to a control treatment for which effectiveness has been previously established. The interpretation of active control clinical trials is difficult, however (Temple, 1983; Fleming, 1990).

In cancer clinical trials, there is rarely a placebo arm or even a no-treatment arm. The clinical trial(s) that led to the adoption of the active control (C) will generally have compared C to some previous standard (P). For ease of exposition, we will use the term placebo loosely to refer to the standard approach to patient management prior to the adoption of the active control C.

The analysis of active control trials is usually based on attempting to determine whether the experimental treatment is therapeutically equivalent to the active control. Therapeutic equivalence means equivalent clinical outcome. There has been controversy over the appropriate statistical methods that should be used for the planning and analysis of active control trials. This controversy has been largely based on the impossibility of demonstrating exact therapeutic equivalence. Makuch and Simon (1978) and Durrleman and Simon (1990) have suggested that confidence intervals should be used for the reporting of results because statistical significance tests of the null hypothesis of equivalence may lead to the adoption of an inferior treatment when statistical power for detecting differences is poor. Blackwelder (1982) has recommended using a significance test of a nonnull hypothesis that the difference in efficacy is $\delta$, where $\delta$ represents the smallest difference of medical importance. Other suggestions have included using traditional tests of the null hypothesis while reversing the usual levels of $\alpha$ and $\beta$ to reflect the fact that the $\beta$ error may be more important than the $\alpha$ error in such trials.

In an important sense, none of the above approaches represents a satisfactory statistical framework for the design and analysis of active control trials. These approaches depend on the specification of a minimal difference $\delta$ in efficacy that one is willing to tolerate. None of the approaches deals with how $\delta$ is determined. Fleming (1990) and Gould (1991) have noted that the design and interpretation of active control trials must utilize information about previous trials of the active control. Fleming proposed that the new treatment be considered effective if an upper confidence limit for the amount that the new treatment may be inferior to the active control does not exceed a reliable estimate of the improvement of the active control over placebo or no treatment. Gould provided a method for creating a synthetic placebo control group in the active control trial based on previous trials comparing the active control to placebo. We present a general Bayesian approach to the utilization of information from previous trials in the design and analysis of an active control trial.


Two major objectives of active control trials can be distinguished. The first is to determine whether the experimental treatment is effective relative to P. This requires explicit use of prior information about outcomes of trials comparing P to the active control. Meaningful interpretation of active control trials is impossible without consideration of such information. Establishing whether or not the experimental treatment is effective relative to P is a first requirement. The second objective is to determine whether any medically important portion of the treatment effect for the active control is lost with the experimental treatment. In some cases, this objective is unrealistic because the size of the treatment effect (relative to P) for the active control is imprecisely determined. Active control trials are often called therapeutic equivalence trials. We use the former term here because therapeutic equivalence is only one of the possible objectives of trials with an active control structure.

2. The Model

We use the following model for the active control trial: $y = \alpha + \beta x + \gamma z + \epsilon$, where $y$ denotes the response of a patient, $x = 0$ for placebo or the experimental treatment and 1 for the control treatment, $z = 0$ for placebo or the control treatment and 1 for the experimental treatment, and $\epsilon$ is normally distributed experimental error. Hence, the expected response for C is $\alpha + \beta$, the expected response for E is $\alpha + \gamma$, and the expected response for P is $\alpha$. The likelihood function for the data ($D$) from the active control trial can be expressed as $\pi(D \mid \alpha, \beta, \gamma) = \pi(\bar{y}_c \mid \alpha, \beta)\,\pi(\bar{y}_e \mid \alpha, \gamma)$, where the first factor is the likelihood of the data for the control group and the second factor is the likelihood of the data for the experimental group. Throughout this paper, we use the notation $\pi(\cdot)$ informally to denote either probability density of observable data, prior probability density of a parameter, or posterior density of a parameter. The first factor is $N(\alpha + \beta, \sigma^2)$ and the second factor is $N(\alpha + \gamma, \sigma^2)$, where $\sigma$ is the standard error for the observed means $\bar{y}_c$ and $\bar{y}_e$. We assume that $\sigma^2$ is known, although it will generally be estimated. For the large sample sizes appropriate for active control trials, the additional variability caused by uncertainty in $\sigma^2$ should be very small. This assumption enables us to obtain simple analytical results, but a more exact treatment is possible using posterior distribution sampling methods.

The posterior distribution of $\theta = (\alpha, \beta, \gamma)$ has density proportional to $\pi(D \mid \alpha, \beta, \gamma)\,\pi(\theta)$. We shall assume that the parameters have independent normal prior densities $\pi(\alpha) \sim N(\mu_\alpha, \sigma_\alpha^2)$, $\pi(\beta) \sim N(\mu_\beta, \sigma_\beta^2)$, and $\pi(\gamma) \sim N(\mu_\gamma, \sigma_\gamma^2)$. Hence, the posterior distribution of $\theta$ is $\pi(\theta \mid D) \propto \pi(\bar{y}_c \mid \alpha, \beta)\,\pi(\bar{y}_e \mid \alpha, \gamma)\,\pi(\alpha)\,\pi(\beta)\,\pi(\gamma)$. The posterior distribution can be shown to be multivariate normal. The covariance matrix is

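$$
\Sigma = \frac{\sigma^2}{K}
\begin{pmatrix}
(1+r_\beta)(1+r_\gamma) & -(1+r_\gamma) & -(1+r_\beta) \\
-(1+r_\gamma) & r_\gamma + (1+r_\alpha)(1+r_\gamma) & 1 \\
-(1+r_\beta) & 1 & r_\beta + (1+r_\alpha)(1+r_\beta)
\end{pmatrix},
\tag{1}
$$

where $r_\alpha = \sigma^2/\sigma_\alpha^2$, $r_\beta = \sigma^2/\sigma_\beta^2$, $r_\gamma = \sigma^2/\sigma_\gamma^2$, and $K = r_\alpha(1+r_\beta)(1+r_\gamma) + r_\beta(1+r_\gamma) + r_\gamma(1+r_\beta)$. The mean vector $\eta = (\eta_\alpha, \eta_\beta, \eta_\gamma)$ of the posterior distribution is

$$
\begin{aligned}
\eta_\alpha &= \frac{r_\alpha(1+r_\beta)(1+r_\gamma)\,\mu_\alpha + r_\beta(1+r_\gamma)(\bar{y}_c - \mu_\beta) + r_\gamma(1+r_\beta)(\bar{y}_e - \mu_\gamma)}{K}, \\
\eta_\beta &= \frac{r_\beta\,[r_\gamma + (1+r_\alpha)(1+r_\gamma)]\,\mu_\beta + r_\alpha(1+r_\gamma)(\bar{y}_c - \mu_\alpha) + r_\gamma(\bar{y}_c - \bar{y}_e + \mu_\gamma)}{K}, \\
\eta_\gamma &= \frac{r_\gamma\,[r_\beta + (1+r_\alpha)(1+r_\beta)]\,\mu_\gamma + r_\alpha(1+r_\beta)(\bar{y}_e - \mu_\alpha) + r_\beta(\bar{y}_e - \bar{y}_c + \mu_\beta)}{K}.
\end{aligned}
\tag{2}
$$

This indicates that the posterior mean of $\alpha$ is a weighted average of three estimates of $\alpha$. The first estimate is the prior mean $\mu_\alpha$. The second estimate is the observed $\bar{y}_c$ minus the prior mean for $\beta$. This makes intuitive sense since the expectation of $\bar{y}_c$ is $\alpha + \beta$. The third estimate in the weighted average is the observed $\bar{y}_e$ minus the prior mean for $\gamma$. The expectation of $\bar{y}_e$ is $\alpha + \gamma$. The sum of the weights is $K$. The other posterior means are similarly interpreted.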
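The closed-form expressions (1) and (2) can be evaluated directly. The following is a minimal sketch, assuming the posterior precision matrix is the sum of the data term in (4) and the diagonal prior precisions; the function name `posterior` and the numerical inputs are illustrative and are not taken from the paper.

```python
def posterior(y_c, y_e, sigma2, prior_mean, prior_var):
    """Posterior mean and covariance of (alpha, beta, gamma).

    y_c, y_e   : observed mean responses on control and experimental arms
    sigma2     : variance of each observed mean (treated as known)
    prior_mean : (mu_alpha, mu_beta, mu_gamma)
    prior_var  : (var_alpha, var_beta, var_gamma), independent normal priors
    """
    mu_a, mu_b, mu_g = prior_mean
    r_a, r_b, r_g = (sigma2 / v for v in prior_var)
    K = r_a * (1 + r_b) * (1 + r_g) + r_b * (1 + r_g) + r_g * (1 + r_b)
    c = sigma2 / K
    # Covariance matrix, equation (1)
    Sigma = [
        [c * (1 + r_b) * (1 + r_g), -c * (1 + r_g), -c * (1 + r_b)],
        [-c * (1 + r_g), c * (r_g + (1 + r_a) * (1 + r_g)), c],
        [-c * (1 + r_b), c, c * (r_b + (1 + r_a) * (1 + r_b))],
    ]
    # Mean vector, equation (2): each entry is a weighted average with weights summing to K
    eta_a = (r_a * (1 + r_b) * (1 + r_g) * mu_a
             + r_b * (1 + r_g) * (y_c - mu_b)
             + r_g * (1 + r_b) * (y_e - mu_g)) / K
    eta_b = (r_b * (r_g + (1 + r_a) * (1 + r_g)) * mu_b
             + r_a * (1 + r_g) * (y_c - mu_a)
             + r_g * (y_c - y_e + mu_g)) / K
    eta_g = (r_g * (r_b + (1 + r_a) * (1 + r_b)) * mu_g
             + r_a * (1 + r_b) * (y_e - mu_a)
             + r_b * (y_e - y_c + mu_b)) / K
    return (eta_a, eta_b, eta_g), Sigma


# Illustrative inputs only: very large prior variances for alpha and gamma
# stand in for noninformative priors.
eta, Sigma = posterior(-2.48, -2.72, 0.01,
                       prior_mean=(0.0, -0.40, 0.0),
                       prior_var=(1e8, 0.215 ** 2, 1e8))
print(eta[2], Sigma[2][2] ** 0.5)  # posterior mean and sd of gamma
```

With nearly flat priors on $\alpha$ and $\gamma$, the output should be close to the noninformative special case, in which the posterior mean of $\gamma$ is the prior mean of $\beta$ plus the observed difference between arms.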

The marginal posterior distribution of $\gamma$ is normal with mean $\eta_\gamma$ and variance the (3, 3) element of $\Sigma$. The parameter $\gamma$ represents the contrast of experimental treatment versus placebo. One can thus easily compute the posterior probability that $\gamma > 0$, which would be a Bayesian analog of a statistical significance test of the null hypothesis that the experimental regimen is no more effective than placebo (if negative values of the parameter represent effectiveness).

The posterior distribution of $\gamma - k\beta$ is univariate normal with mean $\eta_\gamma - k\eta_\beta$ and variance $\Sigma_{33} + k^2\Sigma_{22} - 2k\Sigma_{23}$. Consequently, one can also easily compute the posterior probability that $\gamma - k\beta < 0$. For $k = 0.5$, if $\beta < 0$, this represents the probability that the experimental regimen is at least half as effective as the active control. Since there may be positive probability that $\beta > 0$, it is more appropriate to compute the joint probability that $\beta < 0$ and $\gamma - k\beta < 0$ to represent the probability that the experimental regimen is at least a fraction $k$ as effective as the active control.

In the special case where noninformative prior distributions are adopted for $\alpha$ and $\gamma$, one obtains

$$
\Sigma = \sigma_\beta^2
\begin{pmatrix}
1+r_\beta & -1 & -(1+r_\beta) \\
-1 & 1 & 1 \\
-(1+r_\beta) & 1 & 1+2r_\beta
\end{pmatrix}.
\tag{3}
$$

In this case, the posterior distribution of $\beta$ is $N(\mu_\beta, \sigma_\beta^2)$, the same as the prior distribution; the posterior distribution of $\gamma$ is $N(\mu_\beta + \bar{y}_e - \bar{y}_c, \sigma_\beta^2 + 2\sigma^2)$; and the posterior distribution of $\alpha$ is $N(\bar{y}_c - \mu_\beta, \sigma_\beta^2 + \sigma^2)$. It can be seen that the clinical trial comparing C to E contains information about $\alpha$ if an informative prior distribution is used for $\beta$.
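As a quick check, the special case follows from (1) by letting the prior variances of $\alpha$ and $\gamma$ grow without bound, so that $r_\alpha = r_\gamma = 0$. The steps below are a sketch under that assumption, using $r_\beta = \sigma^2/\sigma_\beta^2$:

```latex
\begin{align*}
r_\alpha = r_\gamma = 0 \;&\Rightarrow\; K = r_\beta,
  \qquad \frac{\sigma^2}{K} = \frac{\sigma^2}{r_\beta} = \sigma_\beta^2, \\
\Sigma_{11} &= \sigma_\beta^2\,(1 + r_\beta) = \sigma_\beta^2 + \sigma^2, \\
\Sigma_{22} &= \sigma_\beta^2, \\
\Sigma_{33} &= \sigma_\beta^2\,(1 + 2 r_\beta) = \sigma_\beta^2 + 2\sigma^2 .
\end{align*}
```

These diagonal entries are the posterior variances of $\alpha$, $\beta$, and $\gamma$ quoted in the text.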

One may permit correlation among the prior distributions. Let $S$ denote the covariance matrix for the multinormal prior distribution for $(\alpha, \beta, \gamma)$ and let $T = S^{-1}$. Then $\Sigma^{-1} = M + S^{-1}$, where

$$
M = \frac{1}{\sigma^2}
\begin{pmatrix}
2 & 1 & 1 \\
1 & 1 & 0 \\
1 & 0 & 1
\end{pmatrix},
\tag{4}
$$

and the posterior mean vector is the solution of $\Sigma^{-1}\eta = (1/\sigma^2)(\bar{y}_\cdot, \bar{y}_c, \bar{y}_e)' + S^{-1}\mu$, where $\mu = (\mu_\alpha, \mu_\beta, \mu_\gamma)'$ and $\bar{y}_\cdot = \bar{y}_c + \bar{y}_e$.

3. Example

The above results can be applied to binary outcome data by approximating the log odds of failure by a normal distribution. Let $\bar{y}$ denote the natural logarithm of the ratio of the number of failures in a treatment arm of the trial to the number of nonfailures; e.g., for the active control group, $\bar{y}_c = \log[f_c/(n_c - f_c)]$. The standard approximation for the variance of the logit is $\sigma^2 = n_c/[f_c(n_c - f_c)]$.

The ESSENCE clinical trial compared aspirin plus standard heparin to aspirin plus Lovenox (enoxaparin sodium) in the treatment of unstable angina and non-Q-wave myocardial infarction (MI). Lovenox is a low molecular weight heparin. We will analyze the regimens with regard to the double composite endpoint of death and MI at day 30. There were 121 events among 1564 patients in the control group and 99 events among 1607 patients in the experimental group. This results in a $\sigma$ value of approximately 0.10. We also obtain $\bar{y}_c = -2.48$ and $\bar{y}_e = -2.72$ for the trial.
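The logit summaries quoted above can be reproduced from the event counts. This is a small sketch of the normal approximation to the log odds; the helper name `logit_and_se` is an illustrative choice, not the paper's.

```python
import math


def logit_and_se(failures, n):
    """Log odds of failure and its approximate standard error for one arm."""
    log_odds = math.log(failures / (n - failures))
    variance = n / (failures * (n - failures))  # standard logit variance approximation
    return log_odds, math.sqrt(variance)


# Event counts reported for the ESSENCE trial (death or MI at day 30)
y_c, se_c = logit_and_se(121, 1564)  # aspirin plus standard heparin (control)
y_e, se_e = logit_and_se(99, 1607)   # aspirin plus Lovenox (experimental)

print(round(y_c, 2), round(se_c, 3))
print(round(y_e, 2), round(se_e, 3))
```

Both standard errors come out near 0.10, which is the common value of $\sigma$ used for the two arms in the text.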

It was of interest to relate the results of ESSENCE to that which would be expected for aspirin alone. Although aspirin plus heparin is not an approved regimen for unstable angina, it had been recommended by various consensus groups. A meta-analysis of six randomized trials comparing aspirin alone to aspirin plus heparin had been conducted using a random effects model (Oler et al., 1996). The point estimate of the relative risk was 0.67 with 95% confidence interval (0.44, 1.02).

We will use noninformative priors for $\alpha$ and $\gamma$. One can alternatively use the meta-analysis to establish the prior for $\alpha$. For the meta-analysis in this example, however, there was a large intertrial variance in the outcome for the P group (aspirin alone), and this would give a prior distribution for $\alpha$ very similar to a noninformative distribution. The prior for $\beta$ is taken from the random-effects meta-analysis. The relative risk is a good approximation to the odds ratio for low incidence diseases. Using a normal approximation to the log odds ratio gives a prior distribution for $\beta$ with $\mu_\beta = -0.40$ and $\sigma_\beta = 0.215$.

Applying the model described above to these data results in a normal posterior distribution for $\gamma$ with mean $-0.641$ and standard deviation 0.257. The parameter $\gamma$ represents the natural logarithm of the odds ratio of the experimental treatment versus placebo, which is aspirin alone. Transforming away from the log scale results in a point estimate of 0.53 for the odds ratio and 95% confidence limits of (0.32, 0.87). The posterior probability that the experimental regimen is superior to aspirin alone is $1 - \Phi(-0.641/0.257) = 0.9937$, where $\Phi$ is the standard normal cumulative distribution function. The posterior probability that the experimental regimen is at least 80% as effective as the active control and that the active control is more effective than aspirin alone is 0.957 using Monte Carlo integration with 10,000 replications to approximate the bivariate normal integral. The posterior probability that the experimental regimen is more effective than the active control and that the active control is more effective than aspirin alone is 0.928. A sensitivity analysis assuming correlation of $\pm 0.5$ between prior distributions gave very similar results.
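The figures in this example can be approximately reproduced from the rounded inputs quoted in the text. The sketch below assumes the noninformative special case (3) for the posterior of $(\beta, \gamma)$ and uses a simple Monte Carlo draw for the joint probabilities; the function name `joint_prob`, the seed, and the number of draws are illustrative choices. Because the inputs are rounded, results agree with the published values only to about two decimal places.

```python
import math
import random
from statistics import NormalDist

# Rounded inputs quoted in the text (log-odds scale)
y_c, y_e = -2.48, -2.72           # observed log odds, control and experimental arms
sigma = 0.10                      # approximate standard error of each arm's log odds
mu_beta, sd_beta = -0.40, 0.215   # prior for beta from the meta-analysis

# Posterior moments under noninformative priors for alpha and gamma
eta_beta = mu_beta
eta_gamma = mu_beta + y_e - y_c
var_beta = sd_beta ** 2
var_gamma = sd_beta ** 2 + 2 * sigma ** 2
cov_bg = sd_beta ** 2             # posterior covariance of beta and gamma
sd_gamma = math.sqrt(var_gamma)

# Pr(gamma < 0 | data): experimental regimen superior to aspirin alone
p_superior = NormalDist().cdf(-eta_gamma / sd_gamma)

# Odds ratio point estimate and 95% limits
or_point = math.exp(eta_gamma)
or_low = math.exp(eta_gamma - 1.96 * sd_gamma)
or_high = math.exp(eta_gamma + 1.96 * sd_gamma)


def joint_prob(k, n_draws=100_000, seed=1):
    """Monte Carlo estimate of Pr(beta < 0 and gamma - k*beta < 0 | data)."""
    rng = random.Random(seed)
    slope = cov_bg / var_beta                               # regression of gamma on beta
    cond_sd = math.sqrt(var_gamma - cov_bg ** 2 / var_beta)  # sd of gamma given beta
    hits = 0
    for _ in range(n_draws):
        beta = rng.gauss(eta_beta, sd_beta)
        gamma = rng.gauss(eta_gamma + slope * (beta - eta_beta), cond_sd)
        if beta < 0 and gamma - k * beta < 0:
            hits += 1
    return hits / n_draws


print(round(eta_gamma, 3), round(sd_gamma, 3), round(p_superior, 4))
print(round(or_point, 2), round(or_low, 2), round(or_high, 2))
print(joint_prob(0.8), joint_prob(1.0))
```

Sampling $\gamma$ conditionally on $\beta$ is one convenient way to draw from the bivariate normal posterior without a matrix library; any bivariate normal sampler would serve equally well.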

4. Extension to Proportional Hazards Model

Let the hazard be written as $\lambda(t) = \lambda_0(t)\exp(\beta x + \gamma z)$, where $\lambda_0(t)$ denotes the baseline hazard function and the indicator variables $x$ and $z$ are the same as described in Section 2. The data will be taken as the maximum likelihood estimate of the log hazard ratio for E relative to C for the active control study and will be denoted by $y$. Thus, for large samples, $y$ is approximately normally distributed with mean $\gamma - \beta$ and variance $\sigma^2 = 1/d_C + 1/d_E$, where the $d$'s denote the number of events observed on C and E, respectively. Using normal priors for $\beta$ and $\gamma$ as in Section 3, the same reasoning results in the posterior distribution of the parameters $(\beta, \gamma)$ being approximately normal with mean $\eta = (\eta_\beta, \eta_\gamma)$ and covariance matrix $\Sigma = (\lambda_{ij})^{-1}$, with $\lambda_{11} = 1/\sigma^2 + 1/\sigma_\beta^2$, $\lambda_{22} = 1/\sigma^2 + 1/\sigma_\gamma^2$, and $\lambda_{12} = -1/\sigma^2$, and mean vector determined by

$$
\Lambda\eta =
\begin{pmatrix}
-y/\sigma^2 + \mu_\beta/\sigma_\beta^2 \\
y/\sigma^2 + \mu_\gamma/\sigma_\gamma^2
\end{pmatrix},
\qquad \Lambda = (\lambda_{ij}).
\tag{5}
$$
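The two-parameter posterior for the proportional hazards case can be computed with a hand-written 2 x 2 inverse. This is a sketch assuming the precision entries and right-hand side of (5) follow from the normal approximation $y \sim N(\gamma - \beta, \sigma^2)$ combined with independent normal priors; the function name `ph_posterior` and all numerical inputs are hypothetical.

```python
def ph_posterior(y, sigma2, mu_beta, var_beta, mu_gamma, var_gamma):
    """Approximate posterior mean and covariance of (beta, gamma).

    y      : estimated log hazard ratio of E relative to C
    sigma2 : its variance, 1/d_C + 1/d_E
    """
    # Posterior precision matrix
    l11 = 1 / sigma2 + 1 / var_beta
    l22 = 1 / sigma2 + 1 / var_gamma
    l12 = -1 / sigma2
    det = l11 * l22 - l12 * l12
    # Covariance is the inverse of the 2 x 2 precision matrix
    Sigma = [[l22 / det, -l12 / det],
             [-l12 / det, l11 / det]]
    # Right-hand side of the linear system for the mean vector
    b1 = -y / sigma2 + mu_beta / var_beta
    b2 = y / sigma2 + mu_gamma / var_gamma
    eta_beta = Sigma[0][0] * b1 + Sigma[0][1] * b2
    eta_gamma = Sigma[1][0] * b1 + Sigma[1][1] * b2
    return (eta_beta, eta_gamma), Sigma


# Hypothetical inputs: 120 events on C, 100 events on E, log hazard ratio -0.20,
# and a nearly flat prior on gamma.
d_c, d_e = 120, 100
eta_ph, Sigma_ph = ph_posterior(y=-0.20, sigma2=1 / d_c + 1 / d_e,
                                mu_beta=-0.40, var_beta=0.215 ** 2,
                                mu_gamma=0.0, var_gamma=1e9)
print(eta_ph, Sigma_ph[1][1] ** 0.5)
```

With a nearly flat prior on $\gamma$, the posterior mean of $\gamma$ reduces to approximately $\mu_\beta + y$ with variance $\sigma_\beta^2 + \sigma^2$, which parallels the noninformative special case of Section 2.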

5. Design Considerations

A minimal objective of the active control trial is to determine whether or not E is effective relative to P. Hence, we might require that, if $\gamma = \beta$, then it should be very probable that the trial will result in data $\bar{y} = (\bar{y}_c, \bar{y}_e)$ such that $\Pr(\gamma < 0 \mid \bar{y}) > 0.95$, where $\gamma < 0$ represents effectiveness of the experimental treatment. Thus, we want $\Pr[\eta_\gamma/\Sigma_{33}^{1/2} < -1.645] > \xi$, where the probability is calculated assuming $\gamma = \beta$, $\beta$ is distributed according to its prior distribution, and $\xi$ is some appropriately large value such as 0.90. Because the posterior mean $\eta_\gamma$ is a linear combination of the data and is thus itself normally distributed with mean and variance denoted by $\mu_\eta$ and $\phi$, respectively, this expression can be written as

$$
\frac{-1.645\,\Sigma_{33}^{1/2} - \mu_\eta}{\phi^{1/2}} = z_\xi,
\tag{6}
$$

where $z_\xi$ is the $(100\xi)$th percentile of the standard normal distribution. When $\gamma = \beta$ and $E[\gamma] = \mu_\beta$, we obtain from (1) and (2)

$$
\mu_\eta = \{r_\gamma[r_\beta + (1 + r_\alpha)(1 + r_\beta)]\mu_\gamma + [r_\alpha(1 + r_\beta) + r_\beta]\mu_\beta\}/K
$$

and

$$
\phi = \{[r_\alpha(1 + r_\beta)]^2(\sigma^2 + \sigma_\alpha^2 + \sigma_\beta^2) + 2r_\beta^2\sigma^2\}/K^2.
$$

Hence, one can determine the value of $\sigma^2$ required. $\sigma^2$ represents the variance of the means $\bar{y}_e$ and $\bar{y}_c$ and hence is inversely proportional to the sample size per treatment arm in the active control trial.

In the special case where noninformative prior distributions are adopted for $\alpha$ and $\gamma$, the mean of the predictive distribution is $\mu_\eta = \mu_\beta$ with predictive variance $\phi = 2\sigma^2$. Using these results in (6) and simplifying yields

$$
\frac{-1.645\,(1 + 2\sigma^2/\sigma_\beta^2)^{1/2} - \mu_\beta/\sigma_\beta}{(2\sigma^2/\sigma_\beta^2)^{1/2}} = z_\xi.
\tag{7}
$$

The trial may be sized by finding the value of $\sigma^2$ that satisfies (7). It is of interest that $\mu_\beta/\sigma_\beta$ is the z value for the evaluation of the active control versus placebo. The required sample size for the active control trial is very sensitive to that z value. A z value of 3 in absolute value represents substantial evidence that the active control is indeed effective relative to placebo. In this case, for $\xi = 0.8$, one requires that the ratio $r = \sigma^2/\sigma_\beta^2 = 0.4$ in order for (7) to be satisfied. Since $\sigma_\beta$ is known and since $\sigma^2$ represents the variance of the mean response per treatment arm in the active control trial, the sample size per arm can be determined. Alternatively, if the z value is 2 in absolute value, then one requires that the ratio $r = \sigma^2/\sigma_\beta^2 = 0.05$ in order to satisfy (7). This represents eight times the sample size required for the case when the z value is 3. When the evidence for the effectiveness of the active control is marginal, then the active control design is neither feasible nor appropriate.
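Equation (7) has no closed-form solution for the ratio $r = \sigma^2/\sigma_\beta^2$, but its left-hand side decreases monotonically in $r$, so a bisection search works. The sketch below assumes that effectiveness corresponds to negative parameter values, so that the z value enters (7) through its absolute value; the function name `required_ratio` and its arguments are illustrative.

```python
import math
from statistics import NormalDist


def required_ratio(z_control, xi=0.80, post_prob=0.95):
    """Solve equation (7) for r = sigma^2 / sigma_beta^2.

    z_control : |mu_beta / sigma_beta|, the z value for control versus placebo
    xi        : required predictive probability of a positive trial
    post_prob : posterior probability threshold (0.95 gives the 1.645 multiplier)
    """
    z_post = NormalDist().inv_cdf(post_prob)
    z_xi = NormalDist().inv_cdf(xi)
    if z_control <= z_post:
        raise ValueError("prior evidence for the control is too weak to size the trial")

    def lhs(r):
        # Left-hand side of (7) with -mu_beta/sigma_beta written as z_control
        return (z_control - z_post * math.sqrt(1 + 2 * r)) / math.sqrt(2 * r)

    lo, hi = 1e-12, 1e6
    for _ in range(200):
        mid = math.sqrt(lo * hi)      # bisect on the log scale
        if lhs(mid) > z_xi:
            lo = mid                  # lhs too large: the root lies at a larger r
        else:
            hi = mid
    return math.sqrt(lo * hi)


r3 = required_ratio(3.0)
r2 = required_ratio(2.0)
print(round(r3, 2), round(r2, 3), round(r3 / r2, 1))
```

The two ratios come out close to the values 0.4 and 0.05 quoted in the text, and their quotient is about eight, which is the stated increase in sample size when the z value drops from 3 to 2.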

For the binary response approximation described in Section 3, we have approximately $\sigma^2 = 1/(npq)$, where $n$ is the sample size per treatment group in the active control trial, $p$ is the event probability, and $q = 1 - p$. If there is one previous randomized trial of active control versus placebo on which to base the prior distribution of $\beta$, then we have approximately that $\sigma_\beta^2 = 2/(n_0 pq)$, where $n_0$ denotes the average sample size per treatment group in that trial. Consequently, $\sigma^2/\sigma_\beta^2 = n_0/(2n)$. If the z value $\mu_\beta/\sigma_\beta$ is 3 in absolute value, then $n_0/(2n) = 0.4$, i.e., $n = 1.25n_0$, and the sample size required for the active control trial is 25% larger than that required for the trial demonstrating the effectiveness of the active control. On the other hand, if the z value is 2 in absolute value, then $n_0/(2n) = 0.05$, i.e., $n = 10n_0$.

Planning the trial to demonstrate that the new regimen is effective compared to placebo seems a minimal requirement. One can be more ambitious and plan the trial to ensure with high probability that the results will support the conclusion that the new treatment is at least 100k% as effective as the active control when in fact the new treatment is equivalent to the active control. For this objective, one obtains instead of (7) the requirement

$$
\frac{-1.645\,\left((1 - k)^2 + 2\sigma^2/\sigma_\beta^2\right)^{1/2} - (1 - k)\mu_\beta/\sigma_\beta}{(2\sigma^2/\sigma_\beta^2)^{1/2}} = z_\xi.
\tag{8}
$$
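The step from (7) to (8) can be sketched as follows, assuming noninformative priors for $\alpha$ and $\gamma$ so that the posterior covariance has the special-case form (3), and taking $\gamma = \beta$ when computing the predictive distribution:

```latex
\begin{align*}
E[\gamma - k\beta \mid D] &= \eta_\gamma - k\eta_\beta
   = (1-k)\mu_\beta + \bar{y}_e - \bar{y}_c, \\
\operatorname{Var}[\gamma - k\beta \mid D] &= \Sigma_{33} + k^2\Sigma_{22} - 2k\Sigma_{23}
   = \sigma_\beta^2\left[(1-k)^2 + 2\sigma^2/\sigma_\beta^2\right], \\
\bar{y}_e - \bar{y}_c \mid (\gamma = \beta) &\sim N(0,\; 2\sigma^2).
\end{align*}
```

The posterior mean of $\gamma - k\beta$ therefore has predictive mean $(1-k)\mu_\beta$ and predictive variance $2\sigma^2$. Requiring that the posterior mean divided by the posterior standard deviation fall below $-1.645$ with probability $\xi$, and dividing numerator and denominator by $\sigma_\beta$, gives the form shown in (8). Setting $k = 0$ recovers (7).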

6. Discussion

Our approach is based on the belief that active control trials cannot be properly designed or evaluated without quantification of the evidence for effectiveness of the control treatment. The standard frequentist methods also require this information in the specification of the smallest reduction in effectiveness $\delta$ of the new treatment compared to active control that is considered of medical importance. Without knowing how effective the control is compared to placebo, one cannot meaningfully specify $\delta$. The Bayesian approach presented here provides for taking into consideration the uncertainty in the degree of effectiveness of the control.

A meta-analysis of randomized clinical trials comparing P to C can be used to provide a prior distribution for $\alpha$ as well as for $\beta$. One must be aware, however, that a highly informative prior distribution for $\alpha$ is equivalent to assuming that the results for the experimental treatment on the positive control trial can be directly compared to the historical control of placebo arms included in the meta-analysis. Consequently, one may wish to use a noninformative prior for $\alpha$.

The posterior distributions of the parameters derived here can be used directly for interim monitoring of the active control trial. The approach can also be easily generalized to accommodate covariate information.

ACKNOWLEDGEMENTS

We thank Dr Sylvain Durrleman of Rhône-Poulenc Rorer for data on the ESSENCE trial and the reviewers and associate editor for very useful suggestions.

RÉSUMÉ

In the setting of clinical trials comparing an experimental treatment E to a reference treatment C considered to be effective (it is sometimes ethically unacceptable to compare a new treatment to a placebo P or to no treatment), we discuss whether, for the design of the trial and its analysis, E and C should be compared by testing the null hypothesis that they are equivalent, by testing the hypothesis of a nonzero difference between the two treatments equal to the minimum delta that still has clinical relevance, or by constructing one-sided or two-sided confidence intervals. These approaches, which are nearly identical in their consequences for trial planning, all suffer from an intolerable arbitrariness in the choice of the difference delta. We propose a Bayesian approach, which allows computation of the posterior probability that E is better than placebo P, or the posterior probability that both E is at least k% as good as C and C is better than P. We also propose approximations for use with logistic models and proportional hazards models. Finally, we discuss the choice of prior distributions and illustrate all of these results with data from a comparative trial in unstable angina.

REFERENCES

Blackwelder, W. (1982). Proving the null hypothesis in clinical trials. Controlled Clinical Trials 3, 345-353.

Durrleman, S. and Simon, R. (1990). Planning and monitoring of equivalence studies. Biometrics 46, 329-336.

Fleming, T. (1990). Evaluation of active control trials in AIDS. Journal of Acquired Immune Deficiency Syndromes 3, S82-S87.

Gould, A. (1991). Another view of active-controlled trials. Controlled Clinical Trials 12, 474-485.

Makuch, R. and Simon, R. (1978). Sample size requirements for evaluating a conservative therapy. Cancer Treatment Reports 62, 1037-1040.

Oler, A., Whooley, M. A., Oler, J., and Grady, D. (1996). Adding heparin to aspirin reduces the incidence of myocardial infarction and death in patients with unstable angina. Journal of the American Medical Association 276, 811-815.

Temple, R. (1983). Difficulties in evaluating positive control trials. Proceedings of the American Statistical Association (Biopharmaceutical Section) 1-7.

Received September 1997. Revised June 1998. Accepted July 1998.
