This article was downloaded by: [University of North Carolina Chapel Hill] On: 16 December 2008. Access details: [subscription number 768122806]. Publisher: Taylor & Francis. Informa Ltd, Registered in England and Wales, Registered Number: 1072954. Registered office: Mortimer House, 37-41 Mortimer Street, London W1T 3JH, UK
Communications in Statistics - Theory and Methods. Publication details, including instructions for authors and subscription information: http://www.informaworld.com/smpp/title~content=t713597238
Sample size re-estimation without unblinding for normally distributed outcomes with unknown variance. A. Lawrence Gould a; Weichung Joseph Shih a
a Biostatistics and Research Data Systems, Merck, Sharp, and Dohme Research Laboratories,
Online Publication Date: 01 January 1992
To cite this Article: Gould, A. Lawrence and Shih, Weichung Joseph (1992) 'Sample size re-estimation without unblinding for normally distributed outcomes with unknown variance', Communications in Statistics - Theory and Methods, 21:10, 2833-2853
To link to this Article: DOI: 10.1080/03610929208830947
URL: http://dx.doi.org/10.1080/03610929208830947
COMMUN. STATIST.-THEORY METH., 21(10), 2833-2853 (1992)
SAMPLE SIZE REESTIMATION WITHOUT UNBLINDING
FOR NORMALLY DISTRIBUTED OUTCOMES
WITH UNKNOWN VARIANCE
A. Lawrence Gould Weichung Joseph Shih
Biostatistics and Research Data Systems
Merck, Sharp, and Dohme Research Laboratories
West Point, PA 19486 Rahway, NJ 07065-914
Key Words and Phrases: sample size adjustment; interim analysis;
clinical trial; EM algorithm
ABSTRACT
Monitoring clinical trials in nonfatal diseases where ethical
considerations do not dictate early termination upon demonstration of
efficacy often requires examining the interim findings to assure that the
protocol-specified sample size will provide sufficient power against the null
hypothesis when the alternative hypothesis is true. The sample size may be
increased, if necessary, to assure adequate power. This paper presents a new
method for carrying out such interim power evaluations for observations
from normal distributions without unblinding the treatment assignments or
discernibly affecting the Type 1 error rate. Simulation studies confirm the
expected performance of the method.
Copyright © 1992 by Marcel Dekker, Inc.
Downloaded By: [University of North Carolina Chapel Hill] At: 15:23 16 December 2008
1. INTRODUCTION
GOULD AND SHIH
As a rule, group sequential methods (e.g., Pocock, 1977, 1982; O'Brien
and Fleming, 1979; Gould and Pecore, 1982; Lan and DeMets, 1983; Geller
and Pocock, 1987) allow early rejection (or, sometimes, acceptance) of the
null hypothesis if warranted by the interim findings. These methods often
are used in clinical trials in cancer, heart disease, and other life-threatening
conditions where ethical considerations require terminating the trial if there
is compelling early evidence of efficacy.
Double-blinded trials should remain so until completion if the null
hypothesis will not be accepted or rejected at an interim stage, to prevent
conscious or unconscious bias. However, the ability to check the assump-
tions made in determining the sample size without unblinding would be use-
ful, to assure that the trial has adequate power. Gould (1992) described a
means for doing so when the outcomes were binomially distributed. Nor-
mally distributed outcomes with unknown within-group variances require a
different approach because estimating the within-group variances requires
group mean information unneeded for binomially distributed outcomes.
Sample size readjustment for normally distributed data has been
studied previously, most recently by Lohr (1988) and by Wittes and Brittain
(1990). Lohr obtained the asymptotic properties of estimates of the mean
and covariance matrix of a multivariate normal distribution when the sam-
ple size can be adjusted on the basis of one or two interim analyses of the
data. Lohr's method is based on the usual corrected cross-product estimator
of the sample covariance matrix, and so would require knowing the
individual group means in the hypothesis-testing situation considered here.
Wittes and Brittain studied by simulation a procedure for adjusting the
sample size in finite samples that also requires knowing the individual group
means at the interim examination. The approach considered here does not
require knowing the group means at the interim examination, and applies
for finite samples from univariate normal distributions.
Section 2 below describes the method and how it affects the Type 1
error rate in finite samples. Section 3 discusses estimating $\sigma^2$, the common
within-group variance. Section 4 provides the findings from a series of simu-
lation studies that confirm the anticipated performance of the method.
Section 5 gives a numerical example, and Section 6 addresses briefly several
issues arising in the application of the method.
2. METHOD
2.1 Description The method is analogous to Stein's (1945) method for
obtaining a sample large enough to provide a specified-width confidence
interval, but differs in (a) not requiring identification of the treatment
assignments, and (b) using all of the information in the combined sample.
Suppose that $N$ observations are to be drawn, $\theta N$ from a $N(\mu_1, \sigma^2)$ distribution
and $(1-\theta)N$ from a $N(\mu_2, \sigma^2)$ distribution, $\sigma^2$ unknown, $0 < \theta < 1$.
For simplicity, assume $\theta = 0.5$, although this is not essential. The null
hypothesis $H_0$: $\mu_1 = \mu_2$ ordinarily would be tested against the alternative $H_1$:
$\mu_1 \neq \mu_2$ using a Student t test. Given Type 1 and Type 2 error rates $\alpha$ and
$\beta$, respectively, the total sample size would be determined from

$$N = \frac{4\tilde{\sigma}^2 (z_{\alpha/2} + z_{\beta})^2}{\Delta^2} \qquad (1)$$

where $\mu_1 - \mu_2$ is determined by a specific alternative hypothesis $H_1$: $\mu_1 - \mu_2
= \Delta$ (a known value), $\tilde{\sigma}^2$ is an assumed value for $\sigma^2$, and $z_\gamma$ is the value at
which the standard normal cdf equals $\gamma$. If $\tilde{\sigma}^2$ underestimates $\sigma^2$, the actual
likelihood of rejecting $H_0$ when $H_1$ is true will be less than the power
specified for the trial.
Now suppose that the sample size will be reconsidered after $n$ ($< N$)
observations (e.g., $n \approx N/2$) without knowing the treatment assignments.
With a reasonable estimate, $\hat{\sigma}^2$, of the within-group variance, $\sigma^2$, one can
determine via (1) the actual sample size

$$N' = \frac{4\hat{\sigma}^2 (z_{\alpha/2} + z_{\beta})^2}{\Delta^2} = N\,\frac{\hat{\sigma}^2}{\tilde{\sigma}^2} \qquad (2)$$

needed to provide $100(1-\beta)$% power for rejecting the null hypothesis. If $N'$
is "sufficiently larger" than $N$, additional patients would be obtained to
bring the final sample size up to $N'$; otherwise, the trial would be completed
as planned. For example, requiring $N'/N > 1.25$ means that the sample will
be increased only if the "correct" sample size is more than 25% larger than
the original sample size. To keep the final sample size within reasonable
limits, $N'$ might be limited to no more than some multiple $\omega$ of $N$ (e.g., $N' \le
2N$). The options when $N' > \omega N$ are discussed in Section 6.
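
As a rough illustration of the planning and re-estimation rule just described, the sketch below computes the total sample size from the usual normal-approximation formula, $N = 4\sigma^2(z_\alpha + z_\beta)^2/\Delta^2$, and then rescales the planned size by the ratio of the interim variance estimate to the assumed variance. The function names, the rounding, and the decision to cap the result at the maximum multiple are assumptions made for this example; capping is only one of the options considered for the case where the re-estimated size exceeds the limit.

```python
import math
from statistics import NormalDist


def total_sample_size(delta, sigma, alpha=0.05, power=0.90, two_sided=True):
    """Total N (both groups, equal allocation) from the normal approximation.

    Upper-tail quantiles are used here; because the two quantiles enter as a
    squared sum, lower-tail quantiles would give the same value.
    """
    z = NormalDist().inv_cdf
    z_alpha = z(1 - alpha / 2) if two_sided else z(1 - alpha)
    z_beta = z(power)
    return 4 * sigma ** 2 * (z_alpha + z_beta) ** 2 / delta ** 2


def reestimated_size(n_planned, sigma_hat, sigma_assumed,
                     min_ratio=1.25, max_ratio=2.0):
    """Final total sample size after a blinded interim variance estimate."""
    n_prime = n_planned * sigma_hat ** 2 / sigma_assumed ** 2
    if n_prime / n_planned <= min_ratio:
        return n_planned  # not "sufficiently larger": complete as planned
    # Hypothetical handling of the upper limit: cap at max_ratio * n_planned.
    return min(math.ceil(n_prime), math.floor(max_ratio * n_planned))


# Per-group size for delta = 0.30, sigma = 1, one-sided 5% test, 90% power.
print(total_sample_size(0.30, 1.0, two_sided=False) / 2)
# Planned N = 240 with assumed sigma 0.8; interim estimate of sigma is 1.0.
print(reestimated_size(240, sigma_hat=1.0, sigma_assumed=0.8))
```

The threshold `min_ratio` plays the role of the 1.25 in the text and `max_ratio` the role of the limiting multiple of $N$.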
2.2 Effect on Type 1 Error Rate Let the random variable $Z_1$
denote the difference between the means of the initial samples based on a
total of $n_1$ observations, and let the random variable $Z_2$ denote the difference
between the subsequent sample means, based on a total of $n_2$ observations.
$Z_1$ and $Z_2$ both estimate $\delta = \mu_1 - \mu_2$; neither $Z_1$ nor $Z_2$ actually
would be observed in practice because the group membership of the data
remains blinded. Suppose for simplicity that equal numbers of observations
are drawn from each distribution. Combining the two samples yields

$$N = n_1 + n_2, \quad m = m_1 + m_2, \quad Z = (n_1 Z_1 + n_2 Z_2)/N,$$

and

$$s^2 = (m_1 s_1^2 + m_2 s_2^2)/m,$$

where $s_i^2$ denotes an estimator of $\sigma^2$ based on $m_i$ degrees of freedom from
the initial ($i = 1$) or subsequent ($i = 2$) sample. Assume that $m_i s_i^2/\sigma^2$ has a
chi-square distribution with $m_i$ degrees of freedom, at least approximately.
Values for $s_1^2$ and $s_2^2$ are required in practice. The hypothesis $H_0$: $\delta = 0$ will
be tested using the statistic $t = Z/s$.
The probability of wrongly rejecting $H_0$ when $n_2$ does not depend on
$s_1^2$ is provided by the integral of a central t density with $m$ degrees of freedom
over the set of values $|t| > t_c$, an appropriate critical value. The
probability cannot be computed in this way when $n_2$ depends on $s_1^2$.
The joint density of the mean and sample variance from the initial
sample is essentially the product of a normal and a chi-square density. Conditional
on $s_1^2$, the same is true of the joint density of the mean and sample
variance from the second sample. Consequently, the joint density of the
statistics from both samples is the product of these densities. The joint
density of $Z$ and the sample variances can be written as
Since $n_1$ and $\sigma^2$ are fixed quantities, this expression can be simplified with
no loss of generality by the transforms $v_i = m_i s_i^2/\sigma^2$, $i = 1, 2$. With the
additional transformation from $Z$ to the test statistic $t$, the density becomes

The probability of rejecting $H_0$ is the integral of (3):

The quantity $t_c(v_1)$ depends on $v_1$ because the distribution of $t$ and $v_2$
depends on $n_2$, which is determined by $s_1^2$ and, therefore, by $v_1$. Consequently,
the order of integration in (4) cannot be interchanged, as the usual
derivation of the Student t density would require.
To illustrate the effect of the dependence, suppose that $n_2$ depends
on $v_1$ in the following way: $n_2 = n_{21}$ if $v_1 \le v_1^*$, and $n_2 = n_{22}$ if $v_1 > v_1^*$.
With the transformation $v_1, v_2 \to v$ ($= v_1 + v_2$), $w$ ($= v_1/v$), (4) can be written as
where $I_x(\cdot,\cdot)$ denotes the usual incomplete Beta function, $f_{\chi^2}(\cdot\,; m)$ denotes
a central chi-square density with $m$ degrees of freedom, $\Phi(\cdot)$ denotes the
standard normal cdf, $t_c^{(1)}$ denotes the critical value for a central t distribution
with $m^{(1)} = m_1 + m_{21}$ degrees of freedom, $m_{21} = n_{21} - 2$, etc. The
first integral in (5) is $\alpha$, the nominal Type 1 error rate. The remaining
terms of (5) represent the perturbation of the Type 1 error rate due to the
sequential sampling scheme. These latter two terms cancel if $n_{21} = n_{22}$.
The magnitude of the perturbation can be calculated easily. Thus,
suppose that $n_1 = 20$, so that $m_1 = 18$. This is not a large initial sample.
At the interim stage, decide to obtain $n_2 = 20$ more observations (10 from
each group) if $s_1^2 < 1.5$, or $n_2 = 40$ more observations if $s_1^2 > 1.5$. Suppose
that the test is to be at a nominal 5% level, so that the critical t value
would be $t_c = 2.03$ ($n_2 = 20$) or 2.00 ($n_2 = 40$). Assume that $\sigma = 1$. Then
the lower integration limit in (5) is $v_1^* = m_1 s_1^{*2}/\sigma^2 = 18 \times 1.5/1 = 27$.
Figure 1 plots the values of the algebraic sum of the second and third terms
of (5). The net value of this sum is -0.0002, which represents the negligible
difference between the true and nominal Type 1 error rates in this example.
The simulation findings presented below also support the assertion that this
approach has a negligible effect on the Type 1 error rate.
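
The numerical example above can also be checked informally by simulation. The sketch below is a Monte Carlo illustration, not the integral calculation used in the text: it draws the first 20 observations under the null hypothesis, chooses 20 or 40 further observations according to whether the first-stage pooled variance is below 1.5, and tests with the critical values 2.03 and 2.00 quoted above. The scaling of the statistic by the standard error of the mean difference, $\sqrt{4s^2/N}$, and all function names are assumptions of this example. The estimated rejection rate should land near the nominal 5% level, up to simulation error.

```python
import math
import random


def mean_and_ss(x):
    """Return the sample mean and the sum of squared deviations about it."""
    m = sum(x) / len(x)
    return m, sum((v - m) ** 2 for v in x)


def stage(rng, n, sigma):
    """Draw n observations (n/2 per group) under H0.

    Returns the difference between group means and the pooled
    within-group variance on n - 2 degrees of freedom.
    """
    a = [rng.gauss(0.0, sigma) for _ in range(n // 2)]
    b = [rng.gauss(0.0, sigma) for _ in range(n // 2)]
    (ma, ssa), (mb, ssb) = mean_and_ss(a), mean_and_ss(b)
    return ma - mb, (ssa + ssb) / (n - 2)


def simulate_type1(reps=20000, n1=20, cut=1.5, sigma=1.0, seed=1):
    """Estimate the two-sided rejection rate under H0 for the two-stage rule."""
    rng = random.Random(seed)
    rejections = 0
    for _ in range(reps):
        z1, s1sq = stage(rng, n1, sigma)
        # Rule from the text: 20 more observations if s1^2 < 1.5, else 40,
        # with the quoted critical values (36 and 56 degrees of freedom).
        n2, t_crit = (20, 2.03) if s1sq < cut else (40, 2.00)
        z2, s2sq = stage(rng, n2, sigma)
        m1, m2, n_tot = n1 - 2, n2 - 2, n1 + n2
        z = (n1 * z1 + n2 * z2) / n_tot
        ssq = (m1 * s1sq + m2 * s2sq) / (m1 + m2)
        # Assumed scaling: Var(z) = 4 * sigma^2 / n_tot with equal allocation.
        t = z / math.sqrt(4.0 * ssq / n_tot)
        rejections += abs(t) > t_crit
    return rejections / reps


print(simulate_type1())
```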
Figure 1. Deviation from True Type 1 Error Rate: Values of Differences Between Integrands
(Curves: $n_1 = 20$, $n_2 = 0$ or 20, integrated difference = -0.0103; $n_1 = 20$, $n_2 = 20$ or 40, integrated difference = -0.0002.)

The sample size re-estimation approach described here does not rule
out the possibility that the interim estimate of $\sigma^2$ might be small enough so
that no further observations would be required to assure the desired power,
i.e., that $n_2 = 0$. Essentially the same argument used to obtain (5)
establishes the following result:
Here, $t_c^{(1)}$ refers to the critical value for a t distribution with $m_1 = n_1 - 2$
degrees of freedom and $t_c^{(2)}$ refers to a t distribution with $m = m_1 + m_2$ d.f.
To illustrate the effect of possible early termination on the Type 1
error rate, suppose that $n_1 = 20$. At the interim stage, obtain $n_2 = 20$ more
observations if $s_1^2 \ge 0.5$, or call the trial complete if $s_1^2 < 0.5$. For a nominal
5% level test, the critical t value would be $t_c = 2.10$ ($n_2 = 0$) or 2.03 ($n_2 =
20$). If $\sigma = 1$ then the lower integration limit in (6) is $v_1^* = m_1 s_1^{*2}/\sigma^2 = 18
\times 0.5/1 = 9$. Figure 1 also displays the results of the calculations for this
case. Even with the small sample size (10 or 20 observations per group), the
RHS of (6) is -0.01, a small and conservative effect on the Type 1 error rate.
3. ESTIMATING $\sigma^2$
If the treatment assignments were known, $\sigma^2$ could be estimated by
pooling the within-group sample variances. Since the assignments are
unknown, $\sigma^2$ must be estimated some other way. We consider two ways to
estimate $\sigma^2$: a simple adjustment of the pooled sample variance based on
the difference between the means presumed by $H_1$; and the EM algorithm,
which does not depend on $H_1$.
3.1 Simple adjustment Suppose the interim sample contains $\theta n$ observations
from group 1 and $(1-\theta)n$ observations from group 2; $n$ is known, $\theta$ is
unknown. Let $x_{ij}$ denote the j-th observation from group $i$. The overall
estimate of $\sigma^2$ based on the pooled sample can be computed without
unblinding and written formally as

$$s^2 = \frac{1}{n-1}\sum_{i}\sum_{j}(x_{ij} - \bar{x})^2 = \frac{(n-2)\,s_w^2 + n\theta(1-\theta)(\bar{x}_1 - \bar{x}_2)^2}{n-1},$$

where $\bar{x}$ is the overall sample mean and $s_w^2$ denotes the unknown within-group
estimate of $\sigma^2$. Since the interim sample is blinded, $\theta$ and the group sample
means $\bar{x}_1$, $\bar{x}_2$ will be unknown, as will both terms of this last expression.
However, if the alternative hypothesis $H_1$: $\mu_1 - \mu_2 = \Delta$ is true and if $n$ is
large enough so that $\bar{x}_1 - \bar{x}_2$ is reasonably close to $\Delta$, then

$$n\theta(1-\theta)(\bar{x}_1 - \bar{x}_2)^2 \approx \theta(1-\theta)(n-1)\Delta^2,$$

so that if $\theta = 0.5$,

$$\hat{\sigma}^2 = \frac{n-1}{n-2}\,(s^2 - \Delta^2/4). \qquad (7)$$

When a blocked randomization scheme is used to assign subjects to treatments,
$\theta$ will be very nearly known and very close to 0.5. This will be true
especially if the block size is 1x or 2x the number of treatments. The effect
will be to improve the approximation immediately preceding (7).
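
A minimal sketch of the simple adjustment follows, assuming (7) has the form $\hat{\sigma}^2 = \frac{n-1}{n-2}(s^2 - \Delta^2/4)$, where $s^2$ is the ordinary sample variance of the pooled, blinded interim data and equal allocation is assumed. The function name is hypothetical. Note that the result can be negative when the assumed $\Delta$ is large relative to the spread of the data, which a practical implementation would need to handle.

```python
def simple_adjusted_variance(x, delta):
    """Blinded within-group variance estimate by the simple adjustment.

    x     : pooled interim observations (treatment labels not needed)
    delta : mean difference presumed under the alternative hypothesis
    """
    n = len(x)
    mean = sum(x) / n
    # One-sample variance of the pooled data, on n - 1 degrees of freedom.
    s2 = sum((v - mean) ** 2 for v in x) / (n - 1)
    # Remove the between-group component implied by delta (theta = 0.5).
    return (n - 1) / (n - 2) * (s2 - delta ** 2 / 4)


print(simple_adjusted_variance([0.0, 1.0, 2.0, 3.0], delta=1.0))
```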
3.2 EM Algorithm Since the treatment identifications are unknown,
any of the interim observations $x_i$, $i = 1, ..., n$ could be in either treatment
group, so that the treatment assignments are "missing at random" (Rubin,
1976). Let $\tau_i$ denote the treatment group membership indicator:
$\tau_i = 1$ (0) if sample member $i$ is in treatment group 1 (group 2).
$\tau_1, ..., \tau_n$ are independent random variables with $P(\tau_i = 1) = \theta$. Given $\tau_i$,
$x_i$ ($i = 1, ..., n$) has a normal distribution with density

The expression for the conditional probability (or expectation) of $\tau_i$ given $x_i$
therefore is $P(\tau_i = 1 \mid x_i) = E(\tau_i \mid x_i)$

The log likelihood of the interim observations follows from (8),

TABLE 1
Accuracy of EM algorithm estimate of sigma (100 iterations per case)
Panels: 25 obs/gp and 50 obs/gp. Rows: True $\sigma$. Columns: |True Mean Difference| / True $\sigma$ = 0, 0.5, 1, 2, each giving the Mean and S.D. of the estimate.

50 obs/gp       0               0.5             1               2
True σ      Mean   S.D.     Mean   S.D.     Mean   S.D.     Mean   S.D.
0.5         0.481  0.035    0.494  0.038    0.511  0.041    0.576  0.091
(other rows not legible in this copy)

Notes: (1) Each recursive computation of $\hat{\sigma}$ continued until convergence
(successive estimates differing by 0.01 or less) or until 50
cycles had been reached, whichever came first.
(2) The tabulated quantities are the estimated values of $\sigma$
and the corresponding standard deviations among the 100
repetitions of each case.
The EM algorithm (Dempster, Laird, and Rubin, 1977) for estimating $\sigma$
proceeds as follows. Assume $\theta = 0.5$. The "E" step consists of substituting
"current" estimates of $\mu_1$, $\mu_2$, and $\sigma$ into (9) to obtain provisional values for
the expectations of the $\tau_i$. The "M" step consists of obtaining maximum likelihood
estimates of $\mu_1$, $\mu_2$, and $\sigma^2$ after replacing the $\tau_i$ in (10) with their provisional
expectations. The "E" and "M" steps are repeated until the value
of $\sigma^2$ stabilizes; the resulting value is the estimate, $\hat{\sigma}^2$, of $\sigma^2$ required in (2).
Table 1 provides the results of a small simulation study investigating the
performance of this algorithm. Although $\sigma^2$ was estimated accurately,
$(\mu_1 - \mu_2)/\sigma$ was not estimated well. The averages over the iterations of the values
of $(\hat{\mu}_1 - \hat{\mu}_2)/\hat{\sigma}$, based on maximum likelihood estimators, ranged from 0.3 to
0.5 in 29 of the 32 cases shown in Table 1, in no particular pattern; the
exceptional values were 0.6, 0.7, and 0.8. This is consistent with Fowlkes's
(1979) assertion that the accuracy of the estimates of $\mu_1$ and $\mu_2$ cannot be
assured due to their sensitivity to the starting values.
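
The E and M steps described above can be sketched as follows. This is an illustrative implementation under stated assumptions, not the authors' code: it takes the standard forms for a two-component normal mixture with common variance and fixed mixing weight $\theta$, namely the posterior probability $P(\tau_i = 1 \mid x_i) = \theta f_1(x_i) / [\theta f_1(x_i) + (1-\theta) f_2(x_i)]$ in the E step and weighted means with a pooled weighted variance (divisor $n$) in the M step. The stopping rule (change in $\hat{\sigma}$ of 0.01 or less, at most 50 cycles) follows the note to Table 1. The function name, the starting values in the usage lines, and the numerical guards are hypothetical.

```python
import math
import random


def em_common_variance(x, mu1, mu2, sigma, theta=0.5, tol=0.01, max_cycles=50):
    """EM for a two-group normal mixture with common sigma and fixed theta."""
    n = len(x)
    for _ in range(max_cycles):
        # E step: expected group-1 membership of each observation.
        w = []
        for xi in x:
            log_odds = (math.log(theta / (1 - theta))
                        + ((xi - mu2) ** 2 - (xi - mu1) ** 2) / (2 * sigma ** 2))
            log_odds = max(-700.0, min(700.0, log_odds))  # avoid exp overflow
            w.append(1.0 / (1.0 + math.exp(-log_odds)))
        # M step: weighted maximum likelihood estimates.
        sw = min(max(sum(w), 1e-12), n - 1e-12)  # guard against empty groups
        mu1 = sum(wi * xi for wi, xi in zip(w, x)) / sw
        mu2 = sum((1 - wi) * xi for wi, xi in zip(w, x)) / (n - sw)
        ss = sum(wi * (xi - mu1) ** 2 + (1 - wi) * (xi - mu2) ** 2
                 for wi, xi in zip(w, x))
        new_sigma = math.sqrt(ss / n)
        converged = abs(new_sigma - sigma) <= tol
        sigma = new_sigma
        if converged:
            break
    return mu1, mu2, sigma


# Usage on simulated blinded data: 50 per group, true sigma 1, difference 0.5.
rng = random.Random(0)
data = ([rng.gauss(0.5, 1.0) for _ in range(50)]
        + [rng.gauss(0.0, 1.0) for _ in range(50)])
m = sum(data) / len(data)
s = math.sqrt(sum((v - m) ** 2 for v in data) / (len(data) - 1))
# Hypothetical starting values: means placed symmetrically about the pooled mean.
print(em_common_variance(data, m + s / 5.71, m - s / 5.71, s))
```

Because the means are only weakly identified, the returned mean estimates depend on the starting values, in line with the remark about Fowlkes (1979) above; the variance estimate is the quantity of interest.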
3.3 Initial values for EM algorithm We adapt a suggestion of
Fowlkes (1979) for finding initial parameter estimates for the EM
algorithm. Let $z_{(1)} < z_{(2)} < ... < z_{(n)}$ denote the ordered data at the
interim evaluation. Let $p_i = (i - 0.5)/n$ for $i = 1, ..., n$ and calculate
$q_i = \Phi^{-1}(p_i)$, where $\Phi^{-1}$ denotes the inverse of the standard normal
distribution function. Fit a simple linear regression by least squares to
the points $\{(q_i, z_{(i)}), i = 1, ..., n\}$; let $b$ denote the slope of the fitted
line, and let $a$ denote its intercept:

The initial values of $\sigma$, $\mu_1$, and $\mu_2$ then are

where $c$ is some chosen constant. The choice of $c$ influences the
estimation of the means, but not the variance. Ideally, we would like
$c = 2\sigma/(\mu_2 - \mu_1)$; however, although $b$ estimates $\sigma$, there is no good
estimate of $(\mu_2 - \mu_1)$. We get around this problem in the following way.
In most clinical trials that use a normal approximation for estimating
the sample size, the inverse of the coefficient of variation, $(\mu_1 - \mu_2)/\sigma$,
usually ranges between 0.20 and 0.50 (which correspond to about 430
and 70 patients per group, respectively, for power = .90, one-sided $\alpha$ =
0.05). We suggest taking the middle value in this range, 0.35, and
converting it to $c = 2 \times (1/0.35) = 5.71$.
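
The starting-value recipe can be sketched as below. The regression of the ordered data on normal quantiles follows the description above; the final step is an assumption of this example, since the display giving the initial values is not reproduced here: the sketch takes $\sigma^{(0)} = b$ and places the two means symmetrically about the intercept, $\mu_1^{(0)} = a + b/c$ and $\mu_2^{(0)} = a - b/c$, so that their separation is $2b/c$. The function name is hypothetical.

```python
from statistics import NormalDist


def em_initial_values(x, c=5.71):
    """Starting values (mu1, mu2, sigma) from a normal quantile regression."""
    z = sorted(x)
    n = len(z)
    inv_cdf = NormalDist().inv_cdf
    # Normal scores q_i = Phi^{-1}((i - 0.5) / n) for the ordered data.
    q = [inv_cdf((i - 0.5) / n) for i in range(1, n + 1)]
    q_bar = sum(q) / n
    z_bar = sum(z) / n
    # Least-squares slope and intercept of z_(i) on q_i.
    b = (sum((qi - q_bar) * (zi - z_bar) for qi, zi in zip(q, z))
         / sum((qi - q_bar) ** 2 for qi in q))
    a = z_bar - b * q_bar
    # Assumed form of the initial values: sigma = b, means a +/- b / c.
    return a + b / c, a - b / c, b


print(em_initial_values([4.1, 5.3, 2.2, 3.9, 6.0, 4.7, 3.1, 5.5]))
```

With $c = 5.71$ the initial standardized separation of the means is $2/c = 0.35$, the middle value suggested in the text.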
4. SIMULATION STUDIES
4.1 Design Simulation studies explored the behavior of the procedure
over a range of parameter values likely to occur in practice. The
values of sigma assumed by the design ($\tilde{\sigma}$) and the true value of sigma
($\sigma$) were set at 0.707, 1, 1.414, 2, 2.828, and 4. All combinations of $\sigma$
and $\tilde{\sigma}$ values were considered. The design always assumed $\Delta = 1$, and
the sample size was selected to provide 90% power for rejecting the null
hypothesis when the alternative was true. Equal samples were taken
from each distribution ($\theta = 0.5$). For the simulation, the true mean
differences were set at 0 (null hypothesis true), 0.5, 1, and 2. The
effects of evaluating the sample size after obtaining 25% and 50% of the
initially planned data were considered, as were the effects of two rules
for deciding to increase the sample size (increase if $N'/N > 1.33$ or
1.05). In all cases, $N' \le 2N$, reflecting a practical limitation on
increasing the size of ongoing studies. The effect of the algorithm used
to estimate $\sigma$ (simple or EM) also was evaluated. In all, 864 cases (36
combinations of $\sigma$ and $\tilde{\sigma}$, 3 nonzero true mean difference values, 2
examination time values, 2 values of sample size increase rule, 2 algorithms)
were run. Each case included a test with a zero mean
difference and a nonzero true mean difference, so there were 864 tests of
the null hypothesis when it was true. Each case was replicated 1000
times, and statistics were collected about the number of rejections of
the null hypothesis when it was true and when it was false, and the
distributions of the final sample size under either hypothesis.
4.2 Results The probability of rejecting $H_0$ when it was true
did not depend materially on any of the factors defining the cases:
none of the coefficients differed significantly from 0 in a logistic
regression relating the probability of wrongly rejecting $H_0$ to these factors
for each algorithm. Therefore, the 864 rejection frequency values
should be distributed like Binomial variates with n = 1000 and p =
0.05. Figure 2 displays the distributions of the rejection frequencies for
the two algorithms. The results agree closely with expectation.
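
For orientation, the reference distribution mentioned above can be tabulated directly. The short sketch below computes the mean, standard deviation, and a central probability for a Binomial(1000, 0.05) count of rejections; the helper name and the choice of the interval 30 to 70 are illustrative assumptions, not quantities reported in the study.

```python
import math


def binom_pmf(k, n=1000, p=0.05):
    """Probability of exactly k rejections in n independent runs."""
    return math.comb(n, k) * p ** k * (1 - p) ** (n - k)


mean = 1000 * 0.05                       # expected rejections per case
sd = math.sqrt(1000 * 0.05 * 0.95)       # binomial standard deviation
prob_30_to_70 = sum(binom_pmf(k) for k in range(30, 71))

print(mean, round(sd, 2), round(prob_30_to_70, 4))
```

Under the nominal 5% level, rejection counts per 1000 runs should therefore cluster around 50 with a standard deviation of about 7.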
Figure 2. Observed and Expected CDF of Rejections of $H_0$ in 1000 Runs
(432 cases for each way of estimating $\sigma^2$)
(Legend: observed CDF (EM algorithm); expected CDF if p = 0.05. Horizontal axis: rejections of $H_0$, 30 to 80.)
Figure 3 displays the effects of correctly and incorrectly
specifying the true mean difference and the true variance on the
likelihood of rejecting $H_0$ when $\Delta \neq 0$. The two algorithms for
estimating $\sigma$ behaved essentially identically. This probably reflects the
range of $\Delta/\sigma$ values used in the simulations (which covers most of the
situations in clinical trials that use a normal approximation for sample
size calculations). Overspecifying the true mean difference or
underspecifying the true variance caused a loss in power, as expected.
However, when the true mean difference and variance were correctly
specified, the power was very close to the assumed value of 90%,
usually exceeding it slightly. Since the EM procedure does not depend
on $\tilde{\sigma}$, the value assumed for $\sigma$ in calculating the sample size, the loss of
power when $\tilde{\sigma}$ (= $\sigma_{des}$ in Fig. 3) is less than $\sigma_{true}$ actually was due to
requiring that $N' \le 2N$.

Figure 3. Percent of 1000 Runs Rejecting $H_0$ as a Function of the True
Mean Difference (TMD) and $\sigma_{des}/\sigma_{true}$
(Curves: TMD = 0.5, TMD = 1, TMD = 2. Design assumptions: TMD = 1, power = 90%, $\sigma_{des} = \sigma_{true}$.)
5. EXAMPLE
Suppose that a difference $\Delta = 0.30$ is to be detected with 90%
power using a 1-sided 5% level test ($\alpha = .05$). A design taking $\tilde{\sigma} = 1.5$
would require 430 patients per group; a design with $\tilde{\sigma} = 0.80$ would
require 120 patients in each group. If the (unknown) true value of $\sigma$
actually were 1, then the trial should contain 190 patients per group.
In practice an interim examination might be carried out after observing
100 patients, 50 from each group, and might suggest that the final
sample should contain 200 patients in each group. If the trial had been
designed with $\tilde{\sigma} = 0.80$, this would mean that 160 more patients than
planned needed to be entered into the trial and assigned at random to
the two groups. If the trial had been designed with $\tilde{\sigma} = 1.5$, no further
patients beyond those planned would need to be recruited for the trial.
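
The figures in this example can be traced with the usual normal-approximation formula for the per-group sample size. The arithmetic below is a sketch under that assumption, using upper-tail standard normal quantiles $z_{0.95} = 1.6449$ and $z_{0.90} = 1.2816$; the values quoted in the example appear to be these results rounded to the nearest 10.

```latex
\begin{align*}
n_{\text{per group}}
  &= \frac{2\,\tilde{\sigma}^2\,(z_{0.95} + z_{0.90})^2}{\Delta^2}
   = \frac{2\,\tilde{\sigma}^2\,(1.6449 + 1.2816)^2}{0.30^2}
   \approx 190.3\,\tilde{\sigma}^2 \\
\tilde{\sigma} = 1.5:\quad  & 190.3 \times 2.25 \approx 428 \\
\tilde{\sigma} = 0.80:\quad & 190.3 \times 0.64 \approx 122 \\
\sigma = 1:\quad            & 190.3 \times 1.00 \approx 190 \\
\text{additional patients}  &= 2 \times (200 - 120) = 160
\end{align*}
```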
6. DISCUSSION
The method described here does not estimate reliably the true
difference between the treatment means (Fowlkes, 1979), and so does
not provide a way to ascertain the actual magnitude of $\mu_1 - \mu_2$. The
average and median "mean difference/$\sigma$" values estimated from the 100
repetitions of each case summarized in Table 1 did not depend materially
on the true "mean difference/$\sigma$" values.
The statistical power specified at the planning stage and checked
at the interim stage corresponds to a fixed alternative hypothesis that
the true mean difference equals $\Delta$, a quantity specified by the
researcher. In the context of a clinical trial, $\Delta$ would be the least
clinically meaningful difference worth detecting, identified a priori.
The method provides a given level of assurance for detecting a specified
difference if it is present. It is not designed to enhance the likelihood of
detecting the difference that appears to be present (which cannot be
estimated).
The method ordinarily needs to be applied only once, when enough
data are available to provide a reasonably reliable estimate of $\sigma^2$.
Table 1 suggests that as few as 25 observations per group should suffice.
From (2), $N'$ is a random variable with a heavy tail to the right; when
the assumed and true $\sigma$ values happen to be close, then overly large $N'$
values become disproportionately more likely with smaller values of $N$.
Thus, an interim look with fewer than 25 observations per group may
lead to too large a final sample size. The procedure does not have to be
repeated after obtaining a reliable estimate of $\sigma^2$ because the estimate
and, therefore, the sample size, will not change materially with further
looks. Moreover, adding new patients to a multicenter clinical trial
brings up many administrative issues, e.g., changes in contracts,
funding, perhaps number of centers, etc. The fewer of these that have
to be made, and the earlier, the better.
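
The remark about the right tail of $N'$ can be illustrated with a small simulation. The sketch below assumes, as a simplification, that the interim variance estimate behaves like $\sigma^2 \chi^2_{n-2}/(n-2)$ (the chi-square approximation used in Section 2.2) and that the assumed and true variances coincide, so that $N'/N$ is distributed as $\chi^2_{n-2}/(n-2)$. It estimates how often the rule $N'/N > 1.25$ would trigger an increase for different interim sample sizes. The function name and the interim sizes compared are illustrative choices.

```python
import random


def prob_increase(n_interim, ratio=1.25, reps=20000, seed=0):
    """Estimate P(N'/N > ratio) when the assumed and true sigma agree."""
    rng = random.Random(seed)
    df = n_interim - 2
    hits = 0
    for _ in range(reps):
        # Chi-square variate with df degrees of freedom: Gamma(df/2, scale 2).
        chi2 = rng.gammavariate(df / 2, 2.0)
        hits += (chi2 / df) > ratio
    return hits / reps


# Interim looks after 20, 50, and 100 observations in total.
for n in (20, 50, 100):
    print(n, prob_increase(n))
```

Under these assumptions the chance of an unnecessary increase falls as the interim sample grows, which is the qualitative point made above about early looks.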
When $N' > \omega N$, there are two options. The trial may be
terminated immediately and its results summarized without testing the
hypotheses. Such a trial would be regarded as uninformative about the
hypotheses, and reexamination of the assumptions about the variability
of the responses or the relevance of the target population would be
appropriate. Alternatively, the trial could be continued to completion
with the additional observations, accepting the possibility that the
actual power may be less than desired. Less power does not mean zero
power, so rejection of the null hypothesis still could occur on comple-
tion of the trial.
The reestimated sample size could turn out to be much smaller
than the planned size (e.g., 180 vs. 430 patients per group as in Section 5),
suggesting that the trial could be terminated after obtaining the
initial observations. This is unlikely to affect the Type 1 error rate
materially, as shown in section 2.2. However, unless ethical considera-
tions dictate otherwise, the trial should not be terminated because
demonstrating efficacy with respect to a single variable seldom is the
only objective of a trial.
The EM algorithm always reasonably estimates $\sigma$, regardless of
the true and assumed values of $\Delta$ and $\sigma$. This certainly is useful for
designing additional trials in the same indication before completing the
current trial. More importantly, however, the value of $N'$ provided by
(2) is the value likely to provide the required power for rejecting $H_0$ in
favor of the specified alternative. This is not necessarily true for the
simple method. The simple estimate of $\sigma$ assumes a value for $\Delta$ and,
from (7), may understate or overstate the true value of $\sigma$ depending on
whether this assumed value overestimates or underestimates the true
value. Overestimating the true value of $\Delta$ causes underestimation of $\sigma$,
so $N'$ is insufficient to provide the required power against $H_0$ in favor of
the specified alternative. This guards against an inflated sample size
when $H_0$ is true, but the power loss may be excessive when the true
value of $\Delta$ is only a little less than the value set by $H_1$. The converse is
true when the true value of $\Delta$ exceeds the assumed value, so that the
simple method has the undesirable property of moving the sample size
away from clinical reality (Spiegelhalter, Freeman, and Blackburn,
1986).
BIBLIOGRAPHY
Dempster, A.P., Laird, N.M., & Rubin, D.B. (1977). Maximum
likelihood from incomplete data via the EM algorithm (with
discussion). Journal of the Royal Statistical Society, B 39, 1-38.
Fowlkes, E.B. (1979). Some methods for studying the mixture of two
normal (lognormal) distributions. Journal of the American Statistical
Association 74, 561 - 575.
Geller, N. L. & Pocock, S. J. (1987). Interim analyses in randomized
clinical trials: Ramifications and guidelines for practitioners.
Biometrics 43, 213-223.
Gould, A. L. (1992). Interim analyses for monitoring clinical trials that
do not affect the type I error rate. Statistics in Medicine 11, 55-66.
Gould, A.L. and Pecore, V.J. (1982). Group sequential methods for
clinical trials allowing early acceptance of Ho and incorporating costs.
Biometrika 69, 75-80.
Lan, K.K.G. and DeMets, D.L. (1983). Design and analysis of group
sequential tests based on the Type 1 error spending rate function.
Biometrika 74, 149-154.
Lohr, S. L. (1988). Accurate multivariate estimation using double and
triple sampling. University of Minnesota Technical Report No. 505,
February 1988.
O'Brien, P. C. & Fleming, T. R. (1979). A multiple testing procedure
for clinical trials. Biometrics 35, 549-556.
Pocock, S.J. (1977). Group sequential methods in the design and
analysis of clinical trials. Biometrika 64, 191-199.
Pocock, S. J. (1982). Interim analyses for randomized clinical trials:
The group sequential approach. Biometrics 38, 153-162.
Rubin, D. B. (1976). Inference and missing data. Biometrika 63, 581-
592.
Stein, C. (1945). A two-sample test for a linear hypothesis whose
power is independent of the variance. Annals of Mathematical
Statistics 16, 243-258.
Wittes, J. and Brittain, E. (1990). The role of internal pilot studies in
increasing the efficiency of clinical trials. Statistics in Medicine 9,
65-72.
Received November 1991; Revised May 1992