Designing Phase II Studies in the Context of a Programme of Clinical Research

Author(s): John Whitehead. Source: Biometrics, Vol. 41, No. 2 (Jun., 1985), pp. 373-383. Published by: International Biometric Society. Stable URL: http://www.jstor.org/stable/2530863
BIOMETRICS 41, 373-383 June 1985


John Whitehead*

Department of Applied Statistics, University of Reading, Whiteknights, P.O. Box 217, Reading RG6 2AN, England

SUMMARY Conventional statistical determinations of sample size in phase II studies typically lead to sample sizes of the order of 25 (Schoenfeld, 1980, International Journal of Radiation Oncology, Biology and Physics 6, 371-374). When the development of new treatments is proceeding rapidly relative to the recruitment of suitable patients, such requirements can prove to be too demanding. As a result, either sample sizes are reduced by a rather arbitrary weakening of the risk specifications, or certain new treatments go untested.

In this paper, the phase II testing of a number of treatments will be considered as a single study which has the objective of identifying the most promising treatment for phase III investigation. It is seen to be advantageous to test more treatments, with fewer subjects receiving each, than the conventional methods would allow.

1. Introduction

This paper concerns phase II studies of new medical treatments and, in particular, their objective of assessing efficacy. It is assumed that questions of toxicity and dosage have already been addressed in small-scale phase I studies, although these aspects will clearly also be monitored in phase II. Treatments which appear to be both safe and efficacious at the end of phase II will pass on to the more demanding scrutiny of a phase III, comparative clinical trial.

Statistical considerations have a role to play in determining the size of phase II studies. Gehan (1961), Herson (1979), Schoenfeld (1980), and Sylvester and Staquet (1980) have all provided guidance in this matter by considering the risk of overlooking an efficacious treatment. The sample sizes suggested are often quite large for this stage of clinical research. For example, suppose a treatment is sought which will be effective for 20% or more of patients suffering from a usually fatal condition. Using Gehan's (1961) approach, with the probability of overlooking such an effective treatment set at .05, we require a preliminary trial with 14 subjects. Gehan divides the phase II study into a preliminary and a follow-up trial. If the treatment is effective for one or more of the 14 patients in the preliminary trial, then a follow-up trial with between 1 and 90 patients is suggested. The actual sample size depends both on the number of successes in the preliminary trial and the desired accuracy of the estimate of efficacy which results. Another example, this time using the approach of Schoenfeld (1980), concerns a study to determine whether a new agent can raise a traditional success rate of 25% to 45%. To achieve a probability of .1 of overlooking such an improvement and a probability of .25 of studying further a treatment no better than current methods, requires 27 patients.
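The figure of 14 subjects in the first example can be checked with a short calculation. The sketch below assumes that the preliminary sample size is the smallest n for which a treatment with true success rate p yields no successes at all with probability at most the stated risk, that is (1 - p)^n <= .05; the function name is hypothetical and is not taken from Gehan (1961).

```python
def gehan_first_stage_size(min_success_rate: float, miss_probability: float) -> int:
    """Smallest n such that (1 - p)^n <= miss_probability.

    Assumption for this sketch: a treatment is 'overlooked' when none of the
    n preliminary patients responds, although its true success rate is p.
    """
    if not 0.0 < min_success_rate < 1.0:
        raise ValueError("min_success_rate must lie strictly between 0 and 1")
    if not 0.0 < miss_probability < 1.0:
        raise ValueError("miss_probability must lie strictly between 0 and 1")
    n = 1
    # Increase n until the chance of observing zero successes is small enough.
    while (1.0 - min_success_rate) ** n > miss_probability:
        n += 1
    return n


if __name__ == "__main__":
    # 20% success rate, risk .05 of overlooking the treatment.
    print(gehan_first_stage_size(0.20, 0.05))
```

With p = .2 the probability of no successes is about .055 for 13 patients and about .044 for 14, which is why 14 is the first sample size to meet the requirement.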

* Research was conducted while the author was on leave at the Fred Hutchinson Cancer Research Center, Seattle, Washington 98104, U.S.A.

Key words. Clinical research; Clinical trials; Phase II trials; Pilot studies; Selection procedures.


In some areas of clinical research these calculations lead to embarrassingly large sample sizes. Problems occur in the case of rare diseases, or treatments for a special and narrow class of patients, as recruitment to such studies will be slow. In fast-moving areas of research, concerning perhaps a newly identified disease or the implications of a breakthrough in disease treatment, there may be too many new ideas to be tested with this degree of rigour. The motivation for the present research has come from contact with the bone marrow transplant research team at the Fred Hutchinson Cancer Research Center in Seattle. The successful introduction of bone marrow transplantation as a treatment for cancers of the blood has led in turn to a search for the ideal combination of preparative and follow-up measures. The number of patients in any particular disease class undergoing this lengthy and intensive procedure is quite small, while the number of potential therapeutic innovations is large. Many of these will provide improvements on techniques used during the pioneering phase of the procedure.

The conflict between the rate of new ideas for treatment and the rate of patients for evaluation can be resolved in one of two ways. Either some of the new ideas are never tested, or the phase II sample size can be reduced. The latter can be manipulated by raising the probability of overlooking a good treatment, or by strengthening the definition of "good"; either way involves an abuse of statistical method.

Both Schoenfeld (1980) and Sylvester and Staquet (1980) feel that the most serious error which can occur is that of overlooking a good treatment. Consequently, the false positive rates of their procedures are high, and many treatments will be passed on from phase II to phase III studies. The phase III study is the definitive, comparative trial of the treatment, one which may lead to its widespread use. It is important that the sample size of a phase III study should be large: large enough to provide a statistical analysis which has high power and provides an accurate estimate of treatment efficacy. In situations where the recruitment of patients may not keep pace with the supply of novel treatments, phase II trials must be used more crudely, rejecting all but the most promising treatments in order that the programme of phase III trials does not become overloaded.

This paper will concern a series of phase II trials from which one or more treatments are to be selected for phase III study. Selection methods have been discussed by many authors, including Paulson (1952); Bechhofer (1954); Bechhofer, Kiefer, and Sobel (1968); and Gibbons, Olkin, and Sobel (1977). Their formulation seeks to maximise the probability of selecting the best treatment, which is a demanding requirement and once more leads to the specification of large sample sizes. A more moderate criterion is that the selected treatment should have a high expected efficacy, in a sense to be made clear in Section 2. The concept used derives from work done by Finney (1957, 1958) in the context of varietal selection in agriculture. The many differences between the two areas of application make the direct implementation of his methods impossible.

The approach developed here seeks to achieve the greatest therapeutic gain from the clinical investigation of a limited number of patients. The number of patients available for study will be treated as fixed and known. Often this number will be a projection of the likely patient population presenting during several months or even a few years of investigation. Such projections form a necessary part of any systematic planning of a research programme, and in practice the number of patients would probably be fixed and the study duration varied to accommodate them. It is assumed that there are potentially successful treatments available in such numbers that a conflict exists between testing treatments on many patients and testing all of the treatments. The treatments may not all be available simultaneously, and indeed the number of treatments likely to be proposed during the study period might also be a projection based on the current rate of innovative development. In situations where patients are plentiful and the supply of new treatments limited, the methods described here would be less appropriate. A consequence of the method developed here is that each treatment is tested on fewer patients than conventional designs would dictate. However, my purpose is not to justify the use of small sample sizes per se, but to recommend them only in conjunction with the comparative selection procedure described in later sections.

In Section 2 the selection process will be presented in an idealised setting which is admittedly unrealistic. An optimal solution will be found. Typically that optimal solution specifies far smaller sample sizes than the methods cited earlier. In Section 3 these theoretical results will be used to justify advice which, it is hoped, is relevant to the real world of clinical research. A discussion of the implications of this approach is given in Section 4. Theoretical justification of results quoted in Section 2 is provided in the Appendix.

2. An Idealisation

A series of phase II studies is to be conducted, each study involving a distinct and completely specified policy of treatment. The patients involved are from the appropriate target population. Each treatment will be administered to n patients whose responses can be classified as success or failure. The probability of successful application of the ith treatment, T_i, will be denoted by p_i (i = 1, ..., t). The most successful treatment will be passed on to a phase III trial; if two or more treatments tie as equally most successful then one will be chosen at random to pass on to phase III.

It will be assumed that the values p_1, ..., p_t are independently drawn from some prior distribution with density g(p). The criterion for an optimal selection scheme will involve the expected value E(p[1]), where p[1] denotes the probability of success of the selected treatment.

Here, the traditional formulation is turned around. Begin by supposing that N patients are available for the phase II testing which is to identify a single treatment for passing on to a phase III study. We shall find the number of treatments to test in order to maximise E(p[1]). Thus, N is fixed, and t and n are to be found, subject to the constraint that nt = N. Because n and t are also constrained to be integers, N will not be fixed too rigidly!

A convenient and flexible choice for the prior distribution of p is the beta distribution. With parameters r and s this gives g(p) = B(r, s)^{-1} p^{r-1} (1 - p)^{s-1} and has mean r/(r + s) and standard deviation [rs/(r + s + 1)]^{1/2}/(r + s). With this choice, it is shown in the Appendix that the expected success probability in the selected treatment is

\[ E(p_{[1]}) = \frac{r + n - \sum_{k=1}^{n} (J_k)^t}{r + s + n}, \tag{2.1} \]

where

\[ J_k = \sum_{j=0}^{k-1} \binom{n}{j} \frac{B(r + j,\, s + n - j)}{B(r, s)}. \tag{2.2} \]

The term J_k is just the probability that treatment T_i has fewer than k successes, for any value of i.
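Expressions (2.1) and (2.2) are straightforward to evaluate numerically. The following is a minimal sketch rather than the author's own program: the function names are hypothetical, the beta function is computed through log-gamma values for numerical stability, and the scan considers only integer pairs (n, t) with nt = N.

```python
import math


def log_beta(a: float, b: float) -> float:
    """Natural logarithm of the beta function B(a, b)."""
    return math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)


def prob_fewer_than(k: int, n: int, r: float, s: float) -> float:
    """J_k of (2.2): probability of fewer than k successes in n patients
    when the success probability has a beta(r, s) prior."""
    return sum(
        math.comb(n, j) * math.exp(log_beta(r + j, s + n - j) - log_beta(r, s))
        for j in range(k)
    )


def expected_best(r: float, s: float, n: int, t: float) -> float:
    """E(p[1]) of (2.1) for t treatments tested on n patients each."""
    shortfall = sum(prob_fewer_than(k, n, r, s) ** t for k in range(1, n + 1))
    return (r + n - shortfall) / (r + s + n)


def scan_designs(r: float, s: float, total_patients: int) -> dict:
    """Evaluate (2.1) for every integer pair (n, t) with n * t = N."""
    return {
        (n, total_patients // n): expected_best(r, s, n, total_patients // n)
        for n in range(1, total_patients + 1)
        if total_patients % n == 0
    }


if __name__ == "__main__":
    for (n, t), value in scan_designs(2, 8, 60).items():
        print(f"n = {n:2d}, t = {t:2d}, E(p[1]) = {value:.3f}")
```

Because t enters (2.1) only as an exponent, `expected_best` also accepts a non-integer t, which is convenient when the constraint nt = N is applied with n an integer but t is not.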

As an example of the behaviour of E(p[1]), consider the case r = 2 and s = 8. This might be appropriate when current treatments achieve the low success rate of 20%. The mean of this prior distribution is .2 and the standard deviation is .12. The prior probabilities that a new treatment achieves a success rate greater than 10%, 20%, 30%, 40%, and 60% are .77, .44, .20, .07, and .00, respectively (Pearson, 1934). Figure 1 illustrates the behaviour of E(p[1]) as the number of patients per treatment (n) increases from 1 to 20. Three curves are shown, corresponding to three different, fixed total sample sizes N = nt. Achievable, integer combinations of n and t are indicated. If 60 patients are available for phase II testing then the (n, t) combinations (2, 30), (3, 20), (4, 15), (5, 12), and (6, 10) are all possible. They yield .318, .326, .330, .330, and .329 for E(p[1]), respectively. Testing 10 to 15 treatments would be best. A more traditional approach might test only 5, 4, 3, or even 2 treatments with this sample, yielding values for E(p[1]) of .312, .302, .286, and .258, respectively.

Figure 1. A plot of E(p[1]) against n for r = 2, s = 8 and N = 40, 60, and 100. Expression (2.1) was evaluated for all integer n. Values at which t is also an integer are indicated by *.
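The prior summaries quoted for r = 2 and s = 8 (mean .2, standard deviation .12, and the tail probabilities .77, .44, .20, .07, .00) can be checked without tables. The sketch below is an illustration with hypothetical function names; it relies on the standard identity that, for integer r and s, the probability that a beta(r, s) variable exceeds x equals the probability that a binomial(r + s - 1, x) count is at most r - 1.

```python
import math


def beta_mean_sd(r: float, s: float) -> tuple:
    """Mean and standard deviation of a beta(r, s) distribution."""
    mean = r / (r + s)
    sd = math.sqrt(r * s / (r + s + 1)) / (r + s)
    return mean, sd


def beta_upper_tail(x: float, r: int, s: int) -> float:
    """P(p > x) for p ~ beta(r, s) with integer r and s.

    Uses the identity P(p > x) = P(Bin(r + s - 1, x) <= r - 1),
    which holds only for integer parameters.
    """
    m = r + s - 1
    return sum(math.comb(m, j) * x**j * (1.0 - x) ** (m - j) for j in range(r))


if __name__ == "__main__":
    mean, sd = beta_mean_sd(2, 8)
    print(f"mean = {mean:.2f}, sd = {sd:.2f}")
    for threshold in (0.1, 0.2, 0.3, 0.4, 0.6):
        print(f"P(p > {threshold:.1f}) = {beta_upper_tail(threshold, 2, 8):.2f}")
```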

Figure 1 illustrates the advantage of testing large numbers of treatments, rather than restricting attention to a few. Of course, this depends on the availability of treatments of genuine promise. In fact, given the availability of 60 patients for phase II studies, it would be more realistic to suggest that up to 10 or 15 treatments might be studied. Thus, it is unreasonable to remove treatments from study solely on the grounds that statistical theory calls for a sample size of 12 or more each, so that only 4 or fewer can be investigated. Perhaps only 6 treatments are worth investigating. Then all 6 should be studied on 10 patients each. It is disadvantageous to reduce sample size unless it releases patients for the study of extra treatments.

The small sample sizes in the example above arise in part from the fact that there is so much room for improving the poor performance of current treatments. As a second example, let the current success rate be 80%. The prior distribution for new treatments is taken to be beta with r = 8 and s = 2. Figure 2 illustrates the properties of E(p[1]) in this case. Now, larger samples are worthwhile. For N = 60, the values n = 6, 10, 12, and 15 give E(p[1]) = .874, .887, .887, and .884, respectively. In this case it is best to concentrate efforts on just 5 or 6 treatments.

The optimal sample sizes change only slowly if the parameters r and s are changed while the prior mean r/(r + s) is held constant. The effects are illustrated in parts I and II of Table 1, which shows the (n, t) combinations which maximise (2.1) over integer n such that N = nt = 60. In part I of the table the prior mean is .2; in part II it is .8. Sample sizes increase as the prior probability of discovering a major improvement decreases. In part II it can be seen that between 4 and 6 treatments should be tested whatever the shape of the prior.

Figure 2. A plot of E(p[1]) against n for r = 8, s = 2 and N = 40, 60, and 100. Expression (2.1) was evaluated for all integer n. Values at which t is also an integer are indicated by *.

Parts III and IV of Table 1 investigate the behaviour of the optimum when the prior mean varies while its standard deviation is held approximately constant. As the prior mean increases, the number of treatments to be tested decreases. For a prior mean between .111 and .333, 10 to 15 treatments would be appropriate; for a prior mean between .727 and .889, 4 to 6 would be preferable. The optimum number changes only slowly with the prior mean, which therefore does not have to be predicted with great accuracy.

Table 1 includes priors with (r, s) = (10, 40) and (25, 100). If such a prior were really felt to be realistic, then it is doubtful that the phase II study would be worthwhile. To see this, consider the size of the phase III trial which is to follow. Upwards of 200 patients would be required in phase III to distinguish, with high power, between a standard with a success rate of 20% and an experimental treatment with a success rate of 40%. To demonstrate the superiority of an experimental treatment with a smaller advantage over the standard, even more patients would be required. This would probably be impractical. However, a prior with (r, s) = (10, 40) or (25, 100) indicates that only small increases in success rate over 20% are likely; an increase to 40% is very unlikely indeed.
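The order of magnitude quoted for the phase III trial can be illustrated with the usual normal-approximation formula for comparing two proportions. The paper does not state the significance level or power behind "upwards of 200"; the two-sided level .05 and power .90 used below are assumptions made for this sketch, as is the function name.

```python
import math
from statistics import NormalDist


def two_arm_sample_size(p_standard: float, p_experimental: float,
                        alpha: float = 0.05, power: float = 0.90) -> int:
    """Patients per arm for a two-sided comparison of two success rates,
    using the unpooled normal approximation.

    alpha and power are illustrative defaults, not values from the paper.
    """
    z_alpha = NormalDist().inv_cdf(1.0 - alpha / 2.0)
    z_beta = NormalDist().inv_cdf(power)
    variance = (p_standard * (1.0 - p_standard)
                + p_experimental * (1.0 - p_experimental))
    per_arm = (z_alpha + z_beta) ** 2 * variance / (p_experimental - p_standard) ** 2
    return math.ceil(per_arm)


if __name__ == "__main__":
    per_arm = two_arm_sample_size(0.20, 0.40)
    print(per_arm, 2 * per_arm)  # patients per arm, total patients
```

Under these assumptions the comparison of 20% with 40% needs 106 patients per arm, or 212 in total, and halving the advantage to 30% roughly quadruples the requirement.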

Modifications to the selection procedure can be made. It might be desirable to choose u (u > 1) treatments to go forward into phase III testing. Let p[u] denote the success probability of the uth best at phase II, ties again being broken randomly. Then

\[ E(p_{[u]}) = \frac{r + n - \sum_{k=1}^{n} \sum_{i=0}^{u-1} \binom{t}{i} (J_k)^{t-i} (1 - J_k)^{i}}{r + s + n} \tag{2.3} \]

(see Appendix). When more than one treatment is to be selected, then even more should be tested. As an example, take u = 3, N = 60, and first r = 2 and s = 8. With only one patient per treatment, and sixty different treatments, E(p[3]) = .273, a result which cannot be bettered! Testing 30 treatments gives E(p[3]) = .269; testing 10 gives .243. With r = 8 and s = 2 the optimum occurs when 12 treatments are tested on 5 patients each, and E(p[3]) = .861.

Table 1
Optimal values of n and t for various prior beta distributions (N = 60 in each case)

Part   r     s     E(p)    SD(p)   n     t      E(p[1])
I      1     4     .2      .16     4     15     .426
       2     8     .2      .12     5     12     .330
       3     12    .2      .10     6     10     .292
       10    40    .2      .056    8     7.5    .231
       25    100   .2      .036    9     6.7    .213
II     4     1     .8      .16     11    5.5    .924
       8     2     .8      .12     11    5.5    .887
       12    3     .8      .10     12    5      .867
       40    10    .8      .056    13    4.6    .826
       100   25    .8      .036    14    4.3    .811
III    1     8     .111    .099    4     15     .233
       2     8     .2      .121    5     12     .330
       3     8     .273    .129    6     10     .401
       4     8     .333    .131    6     10     .455
       6     8     .429    .127    7     8.6    .537
IV     8     6     .571    .127    8     7.5    .673
       8     4     .667    .131    9     6.7    .770
       8     3     .727    .129    10    6      .826
       8     2     .8      .121    11    5.5    .887
       8     1     .889    .099    14    4.3    .949
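The expected success probability of the uth best treatment in (2.3) can be sketched in the same way as (2.1). The code below is an illustration under stated assumptions, not the author's program: the function names are hypothetical, t must be an integer, and the inner sum is taken to be the probability that at most u - 1 of the t treatments achieve k or more successes, which reduces to (J_k)^t when u = 1.

```python
import math


def log_beta(a: float, b: float) -> float:
    """Natural logarithm of the beta function B(a, b)."""
    return math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)


def prob_fewer_than(k: int, n: int, r: float, s: float) -> float:
    """J_k: beta-binomial probability of fewer than k successes in n."""
    return sum(
        math.comb(n, j) * math.exp(log_beta(r + j, s + n - j) - log_beta(r, s))
        for j in range(k)
    )


def expected_uth_best(r: float, s: float, n: int, t: int, u: int) -> float:
    """E(p[u]): expected success probability of the uth best of t treatments."""
    shortfall = 0.0
    for k in range(1, n + 1):
        j_k = prob_fewer_than(k, n, r, s)
        # Probability that at most u - 1 treatments reach k successes,
        # i.e. that the uth largest success count is below k.
        shortfall += sum(
            math.comb(t, i) * j_k ** (t - i) * (1.0 - j_k) ** i
            for i in range(u)
        )
    return (r + n - shortfall) / (r + s + n)


if __name__ == "__main__":
    for n, t in ((1, 60), (2, 30), (6, 10)):
        print(f"n = {n}, t = {t}, E(p[3]) = {expected_uth_best(2, 8, n, t, 3):.3f}")
```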

A different sort of modification is possible. It may be decided to select the best treatment only if it achieves no fewer than l successes. If no treatment reaches this standard, then no selection will be made. Conditional upon selection, the expected value of p[1] will be denoted by E_l(p[1]) and can be calculated from

\[ E_l(p_{[1]}) = \frac{r + n - \sum_{k=l+1}^{n} [(J_k)^t - (J_l)^t] / [1 - (J_l)^t]}{r + s + n} \tag{2.4} \]

(see Appendix).

Suppose once more that current treatments for a class of patients have only a 20% success rate. A beta prior with parameters r = 2 and s = 8 is appropriate for experimental treatments in the phase II study. It is decided to select a treatment only if it achieves a success rate of over 40% in phase II. This requirement is met approximately by a suitable choice of l. The upper part of Table 2 presents the effects of this strategy when N = 60. As sample size increases, so does E_l(p[1]), but the probability of no selection being possible (denoted by PNS in the table) also becomes larger. One way of reconciling these conflicting tendencies is by defining the expected gain due to the phase II study. If no selection is made, there is no gain. If a selection is made, then the gain is G = p[1] - .2. Thus, expected gain, denoted by E_l(G), is

\[ E_l(G) = (1 - \mathrm{PNS}) [E_l(p_{[1]}) - .2]. \tag{2.5} \]

The expected gain is maximised if 12 to 15 treatments are studied, the same strategy that maximises E(p[l]).
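The quantities tabulated for this strategy can be computed from the J_k values. The sketch below uses hypothetical function names and takes PNS to be the probability that every one of the t treatments has fewer than l successes, that is (J_l)^t; this follows from the definitions in the text but is written out here as an assumption of the sketch.

```python
import math


def log_beta(a: float, b: float) -> float:
    """Natural logarithm of the beta function B(a, b)."""
    return math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)


def prob_fewer_than(k: int, n: int, r: float, s: float) -> float:
    """J_k: beta-binomial probability of fewer than k successes in n."""
    return sum(
        math.comb(n, j) * math.exp(log_beta(r + j, s + n - j) - log_beta(r, s))
        for j in range(k)
    )


def conditional_selection(r: float, s: float, n: int, t: int,
                          l: int, current_rate: float) -> tuple:
    """Return (E_l(p[1]), PNS, E_l(G)) when selection requires >= l successes."""
    j_l_t = prob_fewer_than(l, n, r, s) ** t
    p_no_selection = j_l_t  # all t treatments fall short of l successes
    shortfall = sum(
        (prob_fewer_than(k, n, r, s) ** t - j_l_t) / (1.0 - j_l_t)
        for k in range(l + 1, n + 1)
    )
    expected_if_selected = (r + n - shortfall) / (r + s + n)
    expected_gain = (1.0 - p_no_selection) * (expected_if_selected - current_rate)
    return expected_if_selected, p_no_selection, expected_gain


if __name__ == "__main__":
    e_sel, pns, gain = conditional_selection(2, 8, 4, 15, 2, 0.2)
    print(f"E_l(p[1]) = {e_sel:.3f}, PNS = {pns:.3f}, E_l(G) = {gain:.3f}")
```

Small differences in the third decimal place of the gain, compared with tabulated values, can arise from whether the product is formed before or after rounding its factors.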

In the lower half of Table 2, the current success rate is assumed to be 80%, and the beta prior has parameters r = 8 and s = 2. Selection is to be made only if a success rate of 90% or better is observed in phase II. In formula (2.5) for expected gain, .8 replaces .2. Once more, the gain is highest for the same strategy that is optimum when E(p[1]) is considered; 5 or 6 treatments should be tested.

Table 2
The effect of selecting only if the best treatment achieves no fewer than l successes (in the upper half of the table, r = 2, s = 8; in the lower half, r = 8, s = 2; N = 60 throughout)

n     t     l     l/n     E(p[1])   E_l(p[1])   PNS     E_l(G)
2     30    1     0.50    .318      .318        .000    .118
3     20    1     0.33    .326      .326        .000    .126
4     15    2     0.50    .330      .334        .033    .130
5     12    2     0.40    .330      .333        .021    .130
6     10    3     0.50    .329      .352        .214    .119
10    6     4     0.40    .319      .355        .295    .109
12    5     5     0.42    .312      .370        .448    .094
15    4     6     0.40    .302      .375        .525    .083
20    3     8     0.40    .286      .390        .659    .065
30    2     12    0.40    .258      .410        .791    .045

2     30    2     1.00    .833      .833        .000    .033
3     20    3     1.00    .846      .846        .000    .046
4     15    4     1.00    .857      .857        .000    .057
5     12    5     1.00    .867      .867        .002    .067
6     10    6     1.00    .874      .875        .015    .074
10    6     9     0.90    .887      .889        .025    .087
12    5     11    0.92    .887      .894        .086    .086
15    4     14    0.93    .884      .902        .229    .079
20    3     18    0.90    .877      .901        .257    .075
30    2     27    0.90    .858      .909        .457    .059

3. Practical Implementation of This Strategy

Suppose that N patients are to take part in a phase II study. Up to T treatments are of interest, although some of them might only become available part way through the study. The methods of Section 2 can be used to indicate the optimal number of treatments, t_0, to be used in the study. If t_0 ≤ T then phase II can be designed to assign n patients to each of t_0 treatments; N might have to be adjusted slightly because n and t_0 are both to be integers. The initial selection of t_0 treatments from T will not be made at random; any evidence of the superiority of one treatment over another should be used to justify the choice. If t_0 > T, then all T treatments can be tested. If the resulting sample sizes greatly exceed those suggested by (for example) Schoenfeld (1980), then N might be reduced.

The allocation of patients to treatment is best made at random, or using a scheme which seeks balance with respect to major prognostic factors. The methods due to Pocock and Simon (1975) or Freedman and White (1976) might be useful here. Because sample sizes will be small it is unlikely that more than one prognostic factor could be considered during allocation. It might be that not all treatments are available throughout the study. Randomisation amongst available treatments, and efforts to increase the comparability of treatment groups with respect to stratification factors, are still worthwhile.

During the study some treatments may be abandoned because of toxicity or other problems distinct from lack of efficacy. Patients are likely to be withdrawn, or to be incapable of evaluation, and so the final sizes of the treatment groups may be unequal. Rather than classify the application of a treatment to a patient as a success or failure, it may be preferable to use an ordinal response or a survival time. For all these reasons, identifying the treatment which did best in phase II will not be straightforward. Once the appropriate response variable is chosen, a linear model can be fitted in which treatment appears as a factor and one or two important prognostic factors appear as covariates. The treatment selected will be the one with the largest estimated (beneficial) effect, after adjustment for the covariates. The significance of the treatment effect is irrelevant in this process. Interactions between treatments and covariates should be studied, and if striking, the application of the selected treatment might be limited to a subset of the patient population. It may happen that a few of the treatments are found to have similar efficacies. Rather than selecting from these at random, other considerations and evidence should be used to make the decision. It may be desirable to continue the phase II study of these treatments on some more patients before the final selection.

The original design for the study is going to be modified in practice by two processes: Clearly unsuitable treatments will be eliminated, and patients will withdraw or not be evaluated. The patients intended for treatments that are abandoned can be allocated amongst remaining treatments, counteracting the effect of patient withdrawal. Nevertheless, because of patient withdrawal, and the fact that the analysis will have to adjust for covariates, it would be wise to increase by one or two the number of patients per treatment indicated by the theory of Section 2. The number of treatments studied may have to be reduced in compensation.

The concept of using idealising assumptions in the choice of sample size and allowing the use of more sophisticated methods at analysis is not novel; indeed, it is often done in the planning of phase III studies. More complicated and realistic assumptions could be fed into the planning process, but little change is likely to such small sample sizes, and the adjustments would be complex and based only on the scanty information on responses and covariates available prior to phase II. Notice that, if binary responses are used, equal numbers do receive each treatment, and no covariates are of importance, then the use of a linear model will just select the treatment with the most successes.

The use of a lower bound l for the number of successes to be achieved before selection can be made is recommended; if the number of treatments is fixed following Section 2, there should be little chance of no selection being possible.

4. Discussion

The major suggestions made in this paper are as follows:

(i) Plan phase II studies as procedures for selecting one treatment to go forward to phase III out of those which are available.

(ii) Use the considerations of Section 2 to determine how many treatments to study.

(iii) As far as is possible, recruit to all treatments simultaneously and at random, or using a statistical allocation procedure.

(iv) Base the selection on estimated treatment effects after an analysis which uses an appropriate linear statistical model.

The suggestions are based on the assumption that the success probabilities of new treatments follow a beta distribution. When the treatments are modifications of already successful regimens, such as drugs which are analogues of effective compounds or adjustments to part of an overall package of therapy, the beta distribution should be appropriate. If the treatments are all quite new, then the assumption of a beta distribution may be unreasonable, a more accurate model being one allowing a high probability that a treatment is completely ineffective. Sometimes such a situation will be avoided by more extensive in vitro and animal experimentation, but if it holds, then an alternative to the beta prior should be sought in order to apply the methodology described here.

The important question of how the total sample size N should be fixed has not been addressed. Its choice depends in part on the ratio of the rate of development of new treatments to the rate of recruitment of suitable patients. It also depends on how patients are to be divided between phase II and phase III studies. Further research into the statistical aspects of choosing N would be worthwhile.

Two generalisations of this work would be worthwhile. The use of sequential strategies which would further reduce sampling is one. Although complex designs might be inappropriate, there is certainly scope for curtailed sampling when it is no longer possible for one treatment to overtake another and be selected. Another generalisation would allow the prior distribution of p to be different for each treatment considered. This would be used to model the situation in which treatments would be added to the study in order of their anticipated efficacy, later ones being poorer prospects than earlier ones.

It has been assumed that the phase II studies will involve the same class of patients as the eventual phase III trial. Sometimes this will not be true, worse risk patients being treated at phase II. The comparative nature of this method should make it applicable in these circumstances. A factorial structure may be present within the treatment list. However, sample sizes are not going to be adequate for the assessment of interaction. Unless one can be sure of the absence of a major interaction, this form of selecting a treatment policy will remain valid.

The strategy described here has made no use of significance levels and has not quantified the risk of overlooking a good treatment. When the whole sequence of trials which make up phase II is considered, the fate of individual treatments becomes less important. The criticism that this type of procedure is "unfair to the treatments" is effectively countered in Finney (1958, §12). It must be remembered that phase II is an internal procedure within a research institute or cooperative group. The selected procedure has yet to undergo a phase III comparative trial. The latter is rightly designed as a large and powerful study in which questions of significance and risks of error are important. It is the phase III results which will be used to convince colleagues and regulatory authorities of the beneficial effects of the new treatments.

ACKNOWLEDGEMENTS

The author is grateful for suggestions made by Professor R. N. Curnow of Reading University, England, and by Professors K. S. Brown, J. D. Kalbfleisch, J. McKay of Waterloo University, Ontario, and by two anonymous referees.

This work was funded by NIH grants CA-15704 and CA-30924.

RÉSUMÉ

Conventional determinations of the sample sizes needed for phase II studies lead to sizes of about 25 subjects (Schoenfeld, 1980, International Journal of Radiation Oncology, Biology and Physics 6, 371-374). When new treatments are developed too quickly relative to the number of patients available, such sample sizes are too large. Consequently, either sample sizes are arbitrarily reduced or some therapies cannot be tested.

In this paper, phase II for a set of treatments is regarded as a single study whose aim is to identify the most promising treatments for phase III. It is shown that testing more treatments with fewer subjects on each is more advantageous than applying the conventional methods.

REFERENCES

Bechhofer, R. E. (1954). A single-sample multiple decision procedure for ranking means of normal populations with known variances. Annals of Mathematical Statistics 25, 16-39.

Bechhofer, R. E., Kiefer, J., and Sobel, M. (1968). Sequential Identification and Ranking Problems. Chicago: University of Chicago Press.


Received January 1984; revised July 1984.

APPENDIX

The notation of Section 2 will be used. In addition, let treatment $T_i$ achieve $S_i$ successes in its $n$ applications, treatment $T_{[1]}$ with $S_{[1]}$ successes being selected. Now, by Bayes' theorem,

$$E(p_i \mid S_i) = \frac{r + S_i}{r + s + n}.$$

Furthermore,

$$E(p_{[1]}) = E(p_i \mid T_i \text{ is selected}) = E[E(p_i \mid S_i) \mid T_i \text{ is selected}] = \frac{r + E(S_{[1]})}{r + s + n}. \tag{A.1}$$

Now

$$E(S_{[1]}) = n - \sum_{k=1}^{n} \sum_{j=0}^{k-1} \Pr(S_{[1]} = j) = n - \sum_{k=1}^{n} \Pr(S_{[1]} < k) = n - \sum_{k=1}^{n} [\Pr(S_i < k)]^{t}. \tag{A.2}$$

Putting $J_k = \Pr(S_i < k)$, we have

$$J_k = \int_0^1 \sum_{j=0}^{k-1} \binom{n}{j} p^{j} (1 - p)^{n-j} g(p)\, dp = \sum_{j=0}^{k-1} \binom{n}{j} \frac{B(r + j,\, s + n - j)}{B(r, s)}. \tag{A.3}$$

Putting together (A.1), (A.2), and (A.3) yields (2.1) and (2.2). Equation (A.1) holds when [1] is replaced by [u], and

$$E(S_{[u]}) = n - \sum_{k=1}^{n} \Pr(S_{[u]} < k).$$
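As a numerical illustration of (A.1) to (A.3), the following is a minimal sketch that evaluates the expected success probability of the selected treatment. The function names are hypothetical, and the code assumes that t denotes the number of treatments, each given n patients, with a beta prior having parameters r and s. Log-gamma functions are used only to keep the beta-binomial terms numerically stable.

```python
from math import exp, lgamma


def log_beta(a, b):
    """Logarithm of the beta function B(a, b)."""
    return lgamma(a) + lgamma(b) - lgamma(a + b)


def log_binom(n, j):
    """Logarithm of the binomial coefficient C(n, j)."""
    return lgamma(n + 1) - lgamma(j + 1) - lgamma(n - j + 1)


def prob_fewer_than(k, n, r, s):
    """J_k = Pr(S_i < k) as in (A.3): a beta-binomial tail sum."""
    return sum(
        exp(log_binom(n, j) + log_beta(r + j, s + n - j) - log_beta(r, s))
        for j in range(k)
    )


def expected_selected_successes(n, t, r, s):
    """E(S_[1]) as in (A.2), with t treatments assumed independent."""
    return n - sum(prob_fewer_than(k, n, r, s) ** t for k in range(1, n + 1))


def expected_selected_p(n, t, r, s):
    """E(p_[1]) as in (A.1)."""
    return (r + expected_selected_successes(n, t, r, s)) / (r + s + n)


if __name__ == "__main__":
    # Hypothetical illustration: 24 patients split over t treatments.
    for t, n in [(1, 24), (2, 12), (4, 6), (8, 3)]:
        print(t, n, round(expected_selected_p(n, t, r=2.0, s=3.0), 4))
```

With a single treatment (t = 1) there is no selection, and the function returns the prior mean r/(r + s), which gives a quick sanity check on the implementation.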

Finney, D. J. (1957). Statistical problems of plant selection. Bulletin de l'Institut International de Statistique 36, 242-268.

Finney, D. J. (1958). Plant selection for yield improvement. Euphytica 7, 83-106.

Freedman, L. S. and White, S. J. (1976). On the use of Pocock and Simon's method for balancing treatment numbers over prognostic factors in the controlled clinical trial. Biometrics 32, 691-694.

Gehan, E. A. (1961). The determination of the number of patients required in a preliminary and a follow-up trial of a new chemotherapeutic agent. Journal of Chronic Diseases 13, 346-353.

Gibbons, J. D., Olkin, I., and Sobel, M. (1977). Selecting and Ordering Populations: A New Statistical Methodology. New York: Wiley.

Herson, J. (1979). Predictive probability early termination plans for phase II clinical trials. Biometrics 35, 775-783.

Paulson, E. (1952). On the comparison of several experimental categories with a control. Annals of Mathematical Statistics 23, 239-246.

Pearson, K. (1934). Tables of the Incomplete Beta-Function. London: Biometrika.

Pocock, S. J. and Simon, R. (1975). Sequential treatment assignment with balancing for prognostic factors in the controlled clinical trial. Biometrics 31, 103-115.

Schoenfeld, D. (1980). Statistical considerations for pilot studies. International Journal of Radiation Oncology, Biology and Physics 6, 371-374.

Sylvester, R. J. and Staquet, M. J. (1980). Design of phase II clinical trials in cancer using decision theory. Cancer Treatment Reports 64, 519-524.


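Now $S_{[u]} < k$ if $S_i < k$ for all $i$, with up to $(u - 1)$ exceptions. Hence,

$$E(S_{[u]}) = n - \sum_{k=1}^{n} \sum_{v=0}^{u-1} \binom{t}{v} J_k^{t-v} (1 - J_k)^{v}$$

and (2.3) follows. Finally, consider $E_1(p_{[1]})$:

$$E_1(p_{[1]}) = E(p_{[1]} \mid S_{[1]} \ge 1) = \frac{r + E(S_{[1]} \mid S_{[1]} \ge 1)}{r + s + n}$$

as in (A.1). Also

$$E(S_{[1]} \mid S_{[1]} \ge 1) = n - \sum_{k=1}^{n} \Pr(S_{[1]} < k \mid S_{[1]} \ge 1) = n - \sum_{k=1}^{n} \frac{\Pr(1 \le S_{[1]} < k)}{\Pr(S_{[1]} \ge 1)} = n - \sum_{k=1}^{n} \frac{J_k^{t} - J_1^{t}}{1 - J_1^{t}},$$

from which (2.4) follows.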
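The two results above can be checked numerically with a short sketch. The function names are hypothetical, and the code assumes t independent treatments, n patients on each, and a beta prior with parameters r and s. It evaluates the expected number of successes on the treatment ranked u-th, and the expected success probability of the selected treatment given that it achieved at least one success.

```python
from math import comb, exp, lgamma


def prob_fewer_than(k, n, r, s):
    """J_k = Pr(S_i < k) under the beta-binomial distribution of (A.3)."""
    log_b_prior = lgamma(r) + lgamma(s) - lgamma(r + s)
    total = 0.0
    for j in range(k):
        log_c = lgamma(n + 1) - lgamma(j + 1) - lgamma(n - j + 1)
        log_b = lgamma(r + j) + lgamma(s + n - j) - lgamma(r + s + n)
        total += exp(log_c + log_b - log_b_prior)
    return total


def expected_ranked_successes(u, n, t, r, s):
    """E(S_[u]): at most u - 1 of the t treatments reach k successes."""
    total = 0.0
    for k in range(1, n + 1):
        jk = prob_fewer_than(k, n, r, s)
        total += sum(
            comb(t, v) * jk ** (t - v) * (1.0 - jk) ** v for v in range(u)
        )
    return n - total


def conditional_expected_selected_p(n, t, r, s):
    """E_1(p_[1]) = E(p_[1] | S_[1] >= 1), using the final sum above."""
    j1t = prob_fewer_than(1, n, r, s) ** t
    total = sum(
        (prob_fewer_than(k, n, r, s) ** t - j1t) / (1.0 - j1t)
        for k in range(1, n + 1)
    )
    return (r + n - total) / (r + s + n)
```

One useful consistency check is that the expected successes summed over all ranks u = 1, ..., t must equal t times the expected successes on a single treatment, namely t n r/(r + s).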
