This article was downloaded by: [Moskow State Univ Bibliote] On: 10 January 2014, At: 18:37. Publisher: Taylor & Francis. Informa Ltd Registered in England and Wales, Registered Number: 1072954. Registered office: Mortimer House, 37-41 Mortimer Street, London W1T 3JH, UK.
Journal of the American Statistical Association. Publication details, including instructions for authors and subscription information: http://amstat.tandfonline.com/loi/uasa20

Some Estimators in Sampling with Varying Probabilities without Replacement
Des Raj, Indian Statistical Institute. Published online: 11 Apr 2012.

To cite this article: Des Raj (1956) Some Estimators in Sampling with Varying Probabilities without Replacement, Journal of the American Statistical Association, 51:274, 269-284

To link to this article: http://dx.doi.org/10.1080/01621459.1956.10501326
SOME ESTIMATORS IN SAMPLING WITH VARYING PROBABILITIES WITHOUT REPLACEMENT
DES RAJ
Indian Statistical Institute
The problem considered is estimation of the total value of a character for a finite population from a sample when the units are selected with varying probabilities without replacement. Several unbiased estimators are proposed. Exact expressions and unbiased estimators for the variances of the estimators are obtained. Certain properties of Yates and Grundy's estimator of variance are proved. Examples are included to study the relative performance of the various estimators. The results obtained for unistage sampling in the first part of the paper are extended to multistage designs in the second part.
PART I
1. INTRODUCTION
IT IS well known that by assigning varying probabilities of selection to different units in a population, it is possible to reduce considerably the sampling error of the estimates over those obtained when sampling with equal probabilities. Recently, Horvitz and Thompson [4] have proposed an unbiased estimator for estimating the total of a finite population and have also estimated the variance of their estimator when sampling is carried out without replacement with varying probabilities at each draw. Their estimator is

$$\hat{y}_{HT} = \sum_{i=1}^{n} \frac{y_i}{\pi_i} \tag{1}$$

where $\pi_i$ is the probability that the ith unit in the population enters the sample of size n. Also

$$V(\hat{y}_{HT}) = \sum_{i=1}^{N} \frac{1-\pi_i}{\pi_i} y_i^2 + \sum_{i=1}^{N} \sum_{j \neq i} \frac{\pi_{ij} - \pi_i \pi_j}{\pi_i \pi_j} y_i y_j \tag{2}$$

and an unbiased estimate of the variance is

$$\hat{V}_{HT} = \sum_{i=1}^{n} \frac{1-\pi_i}{\pi_i^2} y_i^2 + \sum_{i=1}^{n} \sum_{j \neq i} \frac{\pi_{ij} - \pi_i \pi_j}{\pi_i \pi_j \pi_{ij}} y_i y_j \tag{3}$$

where $\pi_{ij}$ is the probability that the ith and jth units in the population enter the sample. One serious disadvantage of the estimator (3) is that it may assume negative values. For example, when a sample of size 2 is taken from a population of four units given by

$$y_1 = 1, \quad y_2 = 2, \quad y_3 = 3, \quad y_4 = 4$$

such that $\pi_{12} = \pi_{34} = 6/16$, $\pi_{13} = \pi_{14} = \pi_{23} = \pi_{24} = 1/16$, it is easy to see that the estimator (3) would be negative for all samples except $(y_1, y_2)$ and $(y_3, y_4)$. In such cases the estimate (1) becomes useless since its variance cannot be estimated
from the sample. Yates and Grundy [6] have proposed an alternative estimator for (2), which is believed to be less often negative. They recast (2) into

$$V(\hat{y}_{HT}) = \sum_{i=1}^{N} \sum_{j>i} (\pi_i \pi_j - \pi_{ij}) \left( \frac{y_i}{\pi_i} - \frac{y_j}{\pi_j} \right)^2 \tag{4}$$

so that their estimator is

$$\hat{V}_{YG} = \sum_{i=1}^{n} \sum_{j>i} \frac{\pi_i \pi_j - \pi_{ij}}{\pi_{ij}} \left( \frac{y_i}{\pi_i} - \frac{y_j}{\pi_j} \right)^2. \tag{5}$$

They remark that "although it is not immediately apparent that $(\pi_i \pi_j - \pi_{ij})/\pi_{ij}$ is necessarily positive, this appears to be the case when the usual method of selection is employed." It is easy to see that the estimator (5) can be negative. For instance, in the example considered earlier, (5) is negative whenever (3) is positive. We thus see that the estimators of variance given by Horvitz and Thompson [4] or by Yates and Grundy [6] can take negative values. This disquieting situation led the present author to search for estimators whose estimated variance is always positive. Such estimates and others are presented in this paper. In the scheme considered by previous authors there is only one unbiased estimator, and that too can be negative. We have given a whole class of estimates whose estimated variance is always positive. It is not claimed that this estimator is necessarily more efficient than the estimates presented earlier, although this has been found to be the case in several examples. One such example is that given by Yates and Grundy themselves.
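A quick enumeration makes the sign behaviour of (3) and (5) concrete. In the sketch below (Python; the function names are ours), $\pi_{12} = \pi_{34} = 6/16$ are taken as the equal values that, together with the four pairwise probabilities of 1/16, make the $\pi_{ij}$ sum to unity as required for samples of size 2:

```python
# Compute the Horvitz-Thompson (3) and Yates-Grundy (5) variance
# estimates for every sample of size 2 in the example above. We take
# pi_12 = pi_34 = 6/16: with the four pairwise values of 1/16 these are
# the equal values making the pi_ij sum to unity for n = 2.
from itertools import combinations

y = {1: 1.0, 2: 2.0, 3: 3.0, 4: 4.0}
pi_ij = {(1, 2): 6/16, (3, 4): 6/16,
         (1, 3): 1/16, (1, 4): 1/16, (2, 3): 1/16, (2, 4): 1/16}
# Inclusion probability of unit i: sum of pi_ij over pairs containing i.
pi = {i: sum(v for pair, v in pi_ij.items() if i in pair) for i in y}

def v_ht(sample):
    """Horvitz-Thompson variance estimate (3) for a sample of size 2."""
    i, j = sample
    single = sum((1 - pi[k]) / pi[k]**2 * y[k]**2 for k in sample)
    cross = 2 * (pi_ij[sample] - pi[i]*pi[j]) / (pi[i]*pi[j]*pi_ij[sample]) * y[i]*y[j]
    return single + cross

def v_yg(sample):
    """Yates-Grundy variance estimate (5) for a sample of size 2."""
    i, j = sample
    return (pi[i]*pi[j] - pi_ij[sample]) / pi_ij[sample] * (y[i]/pi[i] - y[j]/pi[j])**2

for s in combinations(sorted(y), 2):
    print(s, round(v_ht(s), 2), round(v_yg(s), 2))
```

The enumeration confirms the claims in the text: (3) is positive only for the samples $(y_1, y_2)$ and $(y_3, y_4)$, while (5) is negative on exactly those two samples, so each estimator fails where the other succeeds.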
2. PROBABILITIES OF SELECTION AND THEOREMS ON EXPECTATIONS
Let there be a population

$$u_1, u_2, \ldots, u_N \tag{6}$$

from which a sample of size n is drawn without replacement with varying probabilities of selection at each draw, the scheme of selection of a unit at a particular draw depending naturally on the units already drawn in the sample but not on the order in which they were drawn. For the first draw the probabilities of selection are

$$p_{i1} > 0, \quad \sum_{i=1}^{N} p_{i1} = 1, \quad (i = 1, 2, \ldots, N). \tag{7}$$

At the second draw we have N sets of probabilities of selection

$$\{p_{i2}^{(1)}\}, \{p_{i2}^{(2)}\}, \ldots, \{p_{i2}^{(N)}\} \tag{8}$$

according as the first, second, ..., Nth unit is drawn at the first draw. And so on for other draws. At the nth draw we have $\binom{N}{n-1}$ sets of probabilities of selection depending on which of the n-1 units have been drawn at the previous draws. Let the sample values obtained in order according to this general scheme be

$$\begin{pmatrix} y_1 & y_2 & \cdots & y_n \\ p_{i1} & p_{j2} & \cdots & p_{ln} \end{pmatrix} \tag{9}$$
where the superscripts of $p_{ij}$ have been omitted. Before proceeding further we will present two theorems which will considerably simplify our derivations. Let a sample of size n = 2 be drawn with varying probabilities of selection at each draw from a population of size N. Let $f(y_i, y_j)$ be a function of the sample values drawn. By $E_2(f)$ we shall denote the conditional expected value of f for a given value of the first draw, and similarly $V_2(f)$ denotes the conditional variance of f. In the same way $E_1(f)$ and $V_1(f)$ denote expected values and variances over all possible units obtained at the first draw. Then it is easy to prove the following:

Theorem 1. The expected value of f is equal to the expected value of the conditional expected values, i.e.,

$$E(f) = E_1 E_2(f). \tag{10}$$

Theorem 2. The variance of f is equal to the sum of the expected value of the conditional variances and the variance of the conditional expected values, i.e.,

$$V(f) = E_1 V_2(f) + V_1 E_2(f). \tag{11}$$

The two theorems are capable of an easy generalization for any sample size n. In fact

$$E_{12\cdots n}(f) = E_1 E_2 \cdots E_n(f), \tag{12}$$

$$V_{12\cdots n}(f) = (E_1 V_{23\cdots n} + V_1 E_{23\cdots n})(f). \tag{13}$$

3. ONE SET OF ESTIMATES
In order to estimate the total Y of a character y for the population (6) for the sampling scheme given in Section 2, one set of proposed estimates is

$$t_1 = \frac{y_1}{p_{i1}}, \quad t_2 = y_1 + \frac{y_2}{p_{j2}}, \quad \ldots, \quad t_n = y_1 + y_2 + \cdots + y_{n-1} + \frac{y_n}{p_{ln}}. \tag{14}$$

Using theorem 1, we have $E(t_n) = E_{12\cdots n}(t_n) = E_1 E_2 \cdots E_n(t_n)$. Since $E_n(y_n/p_{ln}) = Y - (y_1 + y_2 + \cdots + y_{n-1})$, we have $E(t_n) = Y$, so that $t_1, t_2, \ldots, t_n$ are all unbiased for estimating Y. Hence any linear function

$$t = \sum_{1}^{n} c_i t_i, \quad \sum_{1}^{n} c_i = 1 \tag{15}$$
is an unbiased estimate. To obtain the variance of $t_n$ we use theorem 2 and have
$$V(t_n) = \sum p_{i1} \sum p_{j2} \cdots \sum \frac{y_n^2}{p_{ln}} - \sum p_{i1} \cdots \sum p_{q,n-1} (Y - y_1 - \cdots - y_{n-1})^2, \quad n > 1 \tag{16}$$
where summation is always over the available units. An application of theorem 2 gives the very pleasing result

$$C(t_\lambda, t_\mu) = 0, \quad (\lambda \neq \mu) \tag{17}$$

where $C(t_\lambda, t_\mu)$ denotes the covariance between $t_\lambda$ and $t_\mu$. The result (17) shows that the estimates $t_1, t_2, \ldots, t_n$ are uncorrelated. The variance of t, the general linear unbiased estimate in this set-up, can then be easily obtained as

$$V(t) = \sum_{1}^{n} c_i^2 V(t_i). \tag{18}$$

In particular for the very practical situation n = 2, we have

$$t = c_1 \frac{y_1}{p_{i1}} + c_2 \left( y_1 + \frac{y_2}{p_{j2}} \right), \tag{19}$$

$$V(t) = c_1^2 V(t_1) + c_2^2 V(t_2). \tag{20}$$
For any choice of $c_1, c_2, \ldots, c_n$ such that their sum is unity we get an unbiased estimate of Y given by (15). It will be seen that the estimate for which the coefficients are all equal has a very desirable property. If one seeks an estimate which should reduce to N times the sample mean when the selection probabilities are equal, it is easy to see that the coefficients $c_1, c_2, \ldots, c_n$ should satisfy the following equations:

$$N c_1 + c_2 + \cdots + c_n = N/n,$$
$$(N-1) c_2 + \cdots + c_n = N/n,$$
$$\cdots$$
$$(N-n+1) c_n = N/n. \tag{21}$$
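The unbiasedness of the individual $t_i$, and hence of any t in (15), can be verified by complete enumeration for n = 2. The sketch below assumes a concrete scheme — pps at the first draw and pps of the remaining units at the second — with illustrative data:

```python
# Enumerate all ordered samples of size 2 under an assumed
# pps-then-pps-of-remainder scheme and check E(t1) = E(t2) = Y.
# The p and y values are illustrative only.
p = [0.1, 0.2, 0.3, 0.4]          # first-draw probabilities p_{i1}
y = [0.5, 1.2, 2.1, 3.2]          # character values; Y = 7.0
Y = sum(y)

E_t1 = E_t2 = 0.0
for i in range(4):
    for j in range(4):
        if j == i:
            continue
        prob = p[i] * p[j] / (1 - p[i])          # P(draw i first, then j)
        t1 = y[i] / p[i]                          # t1 = y1 / p_{i1}
        t2 = y[i] + y[j] * (1 - p[i]) / p[j]      # t2 = y1 + y2 / p_{j2}
        E_t1 += prob * t1
        E_t2 += prob * t2
```

Any linear combination $t = c_1 t_1 + c_2 t_2$ with $c_1 + c_2 = 1$ is then unbiased as well.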
We now come to the problem of estimating the variance of the estimator (15). Making use of the result (17) that $t_1, t_2, \ldots, t_n$ are uncorrelated, we have

$$E(t_\lambda t_\mu) = Y^2 \quad \text{for } \lambda \neq \mu. \tag{22}$$

Hence an unbiased estimate of the variance of t given by (15) is

$$\hat{V}(t) = t^2 - \frac{2 \sum' t_\lambda t_\mu}{n(n-1)} \tag{23}$$

where $\sum'$ denotes summation over the $\binom{n}{2}$ pairs. In case $c_1 = c_2 = \cdots = c_n$, so that the estimator is
$$t_{mean} = \frac{1}{n} \sum_{1}^{n} t_i, \tag{24}$$

we have from (23) that

$$\hat{V}(t_{mean}) = \frac{1}{n(n-1)} \sum_{i=1}^{n} (t_i - t_{mean})^2, \tag{25}$$

which is evidently always positive. In view of the fact that the estimators so far presented can be negative, the estimator $t_{mean}$ seems to be a very interesting estimate.
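These properties of $t_{mean}$ can be checked numerically. The sketch below (Python) enumerates all ordered samples of size 2 under a pps-then-pps-of-remainder scheme applied to the Yates and Grundy population used later in Section 7 — i.e., Case (ii) there — and verifies that every value of (25) is non-negative and that (25) is unbiased for $V(t_{mean})$:

```python
# Enumerate all ordered samples of size n = 2: first draw pps, second
# draw pps of the remainder (Case (ii) of Section 7), Yates-Grundy data.
p = [0.1, 0.2, 0.3, 0.4]
y = [0.5, 1.2, 2.1, 3.2]
Y = sum(y)                     # 7.0

samples = []                   # (probability, t_mean, v_hat)
for i in range(4):
    for j in range(4):
        if j == i:
            continue
        prob = p[i] * p[j] / (1 - p[i])
        t1 = y[i] / p[i]
        t2 = y[i] + y[j] * (1 - p[i]) / p[j]
        tm = (t1 + t2) / 2
        v_hat = ((t1 - tm)**2 + (t2 - tm)**2) / 2   # (25) with n(n-1) = 2
        samples.append((prob, tm, v_hat))

E_t = sum(pr * t for pr, t, _ in samples)               # = Y (unbiased)
V_t = sum(pr * (t - E_t)**2 for pr, t, _ in samples)    # true variance
E_v = sum(pr * v for pr, _, v in samples)               # = V_t (unbiased)
```

Every `v_hat` is non-negative, `E_v` equals the true variance, and `V_t` comes out at .365, agreeing with the $t_{mean}$ column of Table 2.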
4. A SECOND SET OF ESTIMATES
We shall now give another set of estimates as
$$t_1' = \frac{y_1}{p_{i1}}, \quad t_2' = \frac{1}{N-1} \frac{1}{p_{i1}} \frac{y_2}{p_{j2}}, \quad \ldots, \quad t_n' = \frac{1}{(N-1)(N-2)\cdots(N-n+1)} \frac{1}{p_{i1}} \frac{1}{p_{j2}} \cdots \frac{y_n}{p_{ln}}. \tag{26}$$

Using theorem 1 we have

$$E(t_1') = E(t_2') = \cdots = E(t_n') = Y$$

so that $t_1', t_2', \ldots, t_n'$ are unbiased for estimating Y. Hence

$$t' = \sum_{1}^{n} c_i t_i', \quad \sum_{1}^{n} c_i = 1 \tag{27}$$

is an unbiased estimate. For the variance of $t_n'$, an application of theorem 2 gives
$$V(t_n') = \frac{1}{[(N-1)(N-2)\cdots(N-n+1)]^2} \sum \frac{1}{p_{i1}} \sum \frac{1}{p_{j2}} \cdots \sum \frac{y_n^2}{p_{ln}} - Y^2, \quad n > 1 \tag{28}$$

while the covariance between $t_\lambda'$ and $t_\mu'$ ($\mu > \lambda$) is given by

$$C(t_\lambda' t_\mu') = \frac{1}{(N-\lambda)(N-\lambda+1)^2 \cdots (N-1)^2} \sum \frac{1}{p_{i1}} \sum \frac{1}{p_{j2}} \cdots \sum \frac{y_\lambda}{p_{m\lambda}} (Y - y_1 - \cdots - y_\lambda) - Y^2 \tag{29}$$

where $\sum$ always denotes summation over the available units. Using (28) and
(29) the variance of t' can be easily obtained. In particular for n = 2 we have

$$t' = c_1 \frac{y_1}{p_{i1}} + c_2 \frac{1}{N-1} \frac{y_2}{p_{i1} p_{j2}} \tag{30}$$

and

$$V(t') = c_1^2 \sum \frac{y_1^2}{p_{i1}} + c_2^2 \frac{1}{(N-1)^2} \sum \frac{1}{p_{i1}} \sum \frac{y_2^2}{p_{j2}} + 2 c_1 c_2 \frac{1}{N-1} \sum \frac{y_1}{p_{i1}} (Y - y_1) - Y^2. \tag{31}$$

It is of interest to note that t' reduces to N times the sample mean if

$$c_1 = c_2 = \cdots = c_n = \frac{1}{n}. \tag{32}$$
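The unbiasedness of this second set can also be verified by complete enumeration. The sketch below checks $E(t_2') = Y$ for n = 2, again assuming a pps-then-pps-of-remainder scheme with illustrative data:

```python
# Unbiasedness check for the second set: E(t2') = Y by complete
# enumeration, assuming pps at the first draw and pps of the remaining
# units at the second (N = 4, n = 2; data are illustrative).
p = [0.1, 0.2, 0.3, 0.4]
y = [0.5, 1.2, 2.1, 3.2]
N, Y = 4, sum(y)

E_t2p = 0.0
for i in range(N):
    for j in range(N):
        if j == i:
            continue
        prob = p[i] * p[j] / (1 - p[i])       # P(i first, then j)
        p_j2 = p[j] / (1 - p[i])              # conditional probability at draw 2
        t2p = y[j] / ((N - 1) * p[i] * p_j2)  # t2' from (26)
        E_t2p += prob * t2p
```

The telescoping argument behind this — each conditional expectation removes one factor $1/p$ and one factor $(N - k)$ — is exactly the calculation done with theorem 1 above.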
We now turn to the question of obtaining an unbiased estimate of the variance of t'. We notice that

$$E(y_i t_i') = \sum_{k=1}^{N} y_k^2 \quad (i = 1, 2, \ldots, n), \tag{33}$$

$$E(y_i t_j') = \frac{1}{N-1} \sum_{k=1}^{N} \sum_{l \neq k} y_k y_l \quad (j > i = 1, 2, \ldots, n). \tag{34}$$

Hence

$$E(G) = Y^2 \tag{35}$$

where

$$G = \frac{1}{n} \sum_{i=1}^{n} y_i t_i' + \frac{N-1}{\binom{n}{2}} \sum_{j>i} y_i t_j'. \tag{36}$$

Thus an unbiased estimate of the variance of t' is given by

$$\hat{V}(t') = t'^2 - G. \tag{37}$$

This estimate may assume negative values.
5. A THIRD SET OF ESTIMATES

We shall now give another set of estimates for Y. From the set of selection probabilities given in Section 2, we can calculate the unconditional probabilities with which the units will be drawn at each draw. Suppose

$$P_{ij} \quad (i = 1, 2, \ldots, N; \; j = 1, 2, \ldots, n) \tag{38}$$

is the unconditional probability that the ith unit (in the population) will be drawn at the jth draw. Then the proposed set of estimates is
$$t_1'' = \frac{y_1}{P_{i1}}, \quad t_2'' = \frac{y_2}{P_{j2}}, \quad \ldots, \quad t_n'' = \frac{y_n}{P_{ln}}. \tag{39}$$

It is easy to see that $E(t_1'') = E(t_2'') = \cdots = E(t_n'') = Y$, so that

$$t'' = \sum_{1}^{n} c_i t_i'', \quad \sum_{1}^{n} c_i = 1 \tag{40}$$

is an unbiased estimate. Also

$$V(t_n'') = \sum \frac{y_n^2}{P_{ln}} - Y^2 \tag{41}$$

where $\sum$ denotes summation over all units in the population. And

$$C(t_\lambda'' t_\mu'') = \sum p_{i1} \sum p_{j2} \cdots \sum p_{s,\mu-1} \sum p_{u\mu} \frac{y_\lambda}{P_{m\lambda}} \frac{y_\mu}{P_{u\mu}} - Y^2. \tag{42}$$
Hence the expression for the variance of t'' is apparent. In particular for n = 2, we have

$$t'' = c_1 \frac{y_1}{P_{i1}} + c_2 \frac{y_2}{P_{j2}}. \tag{43}$$

To obtain an estimate which reduces to N times the sample mean when the probabilities of selection are equal, it is easy to see that in this case

$$P_{i1} = P_{j2} = \cdots = P_{ln} = \frac{1}{N} \tag{44}$$

so that

$$c_1 = c_2 = \cdots = c_n = \frac{1}{n}. \tag{45}$$

To estimate the variance of t'' we make use of the fact that

$$E(t_\lambda'' t_\mu'') = Y^2, \quad (\lambda \neq \mu)$$

so that

$$\hat{V}(t'') = t''^2 - \frac{2 \sum' t_\lambda'' t_\mu''}{n(n-1)} \tag{46}$$

where $\sum'$ denotes summation over the $\binom{n}{2}$ pairs.
This estimate may assume negative values. As stated before, explicit expressions for the unconditional probabilities can be worked out in the general case. For example when n = 2, we have

$$P_{i1} = p_{i1},$$
$$P_{i2} = p_{11} p_{i2}^{(1)} + p_{21} p_{i2}^{(2)} + \cdots + p_{i-1,1} p_{i2}^{(i-1)} + p_{i+1,1} p_{i2}^{(i+1)} + \cdots + p_{N1} p_{i2}^{(N)} \tag{47}$$

where $p_{i2}^{(t)}$ is the probability that the ith unit is selected at the second draw when it is known that the tth unit has been selected at the first draw. In case the second unit is drawn with probabilities proportionate to size (pps) of the remaining units, we have

$$P_{i1} = p_i \ \text{(say)},$$
$$P_{i2} = p_i \left( \sum_{j=1}^{N} \frac{p_j}{1 - p_j} - \frac{p_i}{1 - p_i} \right). \tag{48}$$

If, however, the second draw is made with equal probabilities, we have

$$P_{i1} = p_i,$$
$$P_{i2} = \frac{1}{N-1} (1 - p_i). \tag{49}$$
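The expressions (47)-(48) are easy to check numerically. The sketch below (illustrative data) computes the unconditional second-draw probabilities under the pps-of-the-remainder scheme, verifies that they sum to one, and confirms by enumeration that the third-set estimator $t_2'' = y_2/P_{j2}$ is unbiased for Y:

```python
# Check of (47)-(48): under pps at the first draw and pps of the
# remaining units at the second, the unconditional probabilities P_i2
# sum to one, and t2'' = y2 / P_j2 is unbiased for Y. Data illustrative.
p = [0.1, 0.2, 0.3, 0.4]
y = [0.5, 1.2, 2.1, 3.2]
N, Y = 4, sum(y)

# (48): P_i2 = p_i * sum over t != i of p_t / (1 - p_t)
P2 = [p[i] * sum(p[t] / (1 - p[t]) for t in range(N) if t != i) for i in range(N)]

E_t2pp = 0.0
for i in range(N):
    for j in range(N):
        if j != i:
            E_t2pp += (p[i] * p[j] / (1 - p[i])) * y[j] / P2[j]
```

The unbiasedness is immediate from the definition of an unconditional probability: summing the selection probability of each ordered pair against $y_j/P_{j2}$ reproduces each $y_j$ exactly once.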
6. SOME ASPECTS OF YATES AND GRUNDY'S ESTIMATOR
Before studying the relative performance of the estimators proposed, we shall turn our attention to the remark made by Yates and Grundy [6] that "although it is not immediately apparent that their estimator is necessarily positive, this appears to be the case when the usual method of selection is employed." We shall prove that this estimator is positive in at least two important situations:

(i) When the first unit is selected with probabilities proportionate to some measure of size (pps) and the remaining units are selected with equal probability.

(ii) When the first unit is selected with probabilities proportionate to some measure of size and the second unit with probabilities proportionate to the sizes of the remaining units, the sample size being two.

We will present two theorems incorporating the results mentioned, which have also been noticed by Sen.

Theorem 3. In sampling with varying probabilities when the first unit is selected with pps and the remaining units with equal probability, Yates and Grundy's estimator of the error variance is always positive.

Proof: If the first unit is selected with probabilities

$$p_1, p_2, \ldots, p_N; \quad \sum p_i = 1 \tag{50}$$

and the remaining n-1 units with equal probability without replacement, we have
$$\pi_{ij} = \frac{n-1}{N-1} \left[ \frac{N-n}{N-2} (p_i + p_j) + \frac{n-2}{N-2} \right], \tag{51}$$

$$\pi_i = \frac{1}{n-1} \sum_{j \neq i} \pi_{ij} = \frac{N-n}{N-1} p_i + \frac{n-1}{N-1}. \tag{52}$$

Now Yates and Grundy's estimator of variance is given by

$$\hat{V}_{YG} = \sum_{j>i} \frac{\pi_i \pi_j - \pi_{ij}}{\pi_{ij}} \left( \frac{y_i}{\pi_i} - \frac{y_j}{\pi_j} \right)^2. \tag{53}$$

Substituting in (53) for $\pi_i$ and $\pi_{ij}$ from (51) and (52), we have

$$\hat{V}_{YG} = \sum_{j>i} \frac{N-n}{(N-1)^2 \pi_{ij}} \left[ (N-n) p_i p_j + \frac{n-1}{N-2} (1 - p_i - p_j) \right] \left( \frac{y_i}{\pi_i} - \frac{y_j}{\pi_j} \right)^2,$$
which is always positive.

Theorem 4. In sampling with varying probabilities for samples of size 2, when the first unit is selected with pps and the second unit with pps of the remaining units, Yates and Grundy's estimator of the variance is always positive.

Proof: For the sake of definiteness, we suppose that the units selected are $y_1$ and $y_2$. Then we have

$$\pi_{12} = p_1 p_2 \left[ 2 + \frac{p_1}{1-p_1} + \frac{p_2}{1-p_2} \right], \tag{54}$$

$$\pi_1 = p_1 \left[ 1 + \frac{p_2}{1-p_2} + A \right], \quad \pi_2 = p_2 \left[ 1 + \frac{p_1}{1-p_1} + A \right], \tag{55}$$

where

$$A = \frac{p_3}{1-p_3} + \frac{p_4}{1-p_4} + \cdots + \frac{p_N}{1-p_N}. \tag{56}$$

Yates and Grundy's estimator of the error variance then is

$$\hat{V}_{YG} = \frac{\pi_1 \pi_2 - \pi_{12}}{\pi_{12}} \left( \frac{y_1}{\pi_1} - \frac{y_2}{\pi_2} \right)^2. \tag{57}$$

For the two units selected it is easy to see that A is minimum when

$$\frac{p_3}{1-p_3} = \cdots = \frac{p_N}{1-p_N} = \frac{k}{N-2-k} \tag{58}$$
where

$$k = 1 - p_1 - p_2 \tag{59}$$

so that the minimum value $A_{min}$ of A is given by

$$A_{min} = \frac{(N-2)k}{N-2-k}. \tag{60}$$

Hence the numerator of (57) exceeds

$$p_1 p_2 \left[ A_{min}^2 + \frac{(N-1)k^2}{(1-p_1)(1-p_2)(N-2-k)} \right]$$

which is positive for N > 2. Thus Yates and Grundy's estimator of variance is positive in the two situations described before.
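Both theorems can be spot-checked numerically. In the sketch below (the function names are ours; the probability vectors are arbitrary values summing to one), the first half verifies the identity for $\pi_i \pi_j - \pi_{ij}$ used in the proof of Theorem 3, and the second half the positivity claim of Theorem 4:

```python
# Numerical checks of Theorems 3 and 4.

# Theorem 3: first draw pps, remaining n-1 draws with equal probabilities.
N, n = 7, 3
p = [0.05, 0.10, 0.15, 0.20, 0.10, 0.25, 0.15]

def pi_ij(i, j):      # joint inclusion probability (51)
    return (n - 1) / (N - 1) * ((N - n) * (p[i] + p[j]) + (n - 2)) / (N - 2)

def pi_i(i):          # inclusion probability (52)
    return ((N - n) * p[i] + (n - 1)) / (N - 1)

def yg_numerator(i, j):   # the positive form of pi_i*pi_j - pi_ij from the proof
    return (N - n) / (N - 1)**2 * ((N - n) * p[i] * p[j]
                                   + (n - 1) / (N - 2) * (1 - p[i] - p[j]))

# Theorem 4: n = 2, pps first draw, pps of the remainder at the second.
q = [0.05, 0.15, 0.10, 0.30, 0.25, 0.15]
M = len(q)

def incl2(i):         # (55): pi_i = p_i [1 + sum_{t != i} p_t/(1 - p_t)]
    return q[i] * (1 + sum(q[t] / (1 - q[t]) for t in range(M) if t != i))

def joint2(i, j):     # (54): pi_ij = p_i p_j [1/(1 - p_i) + 1/(1 - p_j)]
    return q[i] * q[j] * (1 / (1 - q[i]) + 1 / (1 - q[j]))
```

For every pair, `pi_i(i)*pi_i(j) - pi_ij(i, j)` agrees with `yg_numerator(i, j)` and is positive, and under the Theorem 4 scheme `incl2(i)*incl2(j) - joint2(i, j)` is positive as well, so the Yates-Grundy weights are positive in both situations.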
7. RELATIVE PERFORMANCE OF THE ESTIMATORS

We shall now study the relative performance of the estimators proposed in this paper and those given by Horvitz and Thompson [4] and Yates and Grundy [6]. For this purpose we shall consider the following population given by Yates and Grundy:

Unit    P     Y
  1    .1    0.5
  2    .2    1.2
  3    .3    2.1
  4    .4    3.2

This population was deliberately chosen by them as being more extreme than will normally be encountered in practice. The object is to estimate the total of the population by taking a sample of two units. We shall study the performance of the estimators

$$t_{mean}, \quad t_y, \quad t', \quad t'', \quad \hat{y}_{HT}$$

given by (24), (15) and (21), (27) and (32), (40) and (45), and (1) respectively. The discussion is restricted to the two important cases (i) and (ii) given in Section 6. The results obtained are presented in Tables 1 and 2. As proved before, Yates and Grundy's estimate of variance $\hat{V}_{YG}(\hat{y}_{HT})$ is positive in both cases, as is $\hat{V}(t_{mean})$, which is always positive. Horvitz and Thompson's estimator $\hat{V}_{HT}(\hat{y}_{HT})$ becomes negative twice under Case (i) and eight times under Case (ii). Certain other estimators (of variance) given in this paper also take negative values. With regard to the true variances themselves of the estimates, it is found that $t_{mean}$ is best in Case (i) while $t_y$ is best in Case (ii). In both cases, however, $t_{mean}$ has a much smaller variance as compared with the variances of other estimators. Judged by this and some other examples considered by the author, it appears that the estimator $t_{mean}$ compares very favorably with Yates and Grundy's or Horvitz and Thompson's estimator $\hat{y}_{HT}$. It has besides the distinct advantage that for any sample size and for any system of probability selection, the unbiased estimate of the variance would not be
negative, while no general statement of this kind can be made for other estimators. While the intention is not to show in this paper that the estimator $t_{mean}$ is always better than other possible estimators, the main point is that, given some reasonably correlated measures of size, by sampling with probability proportionate to these sizes, the estimator $t_{mean}$ will be superior from the point of view of possibly smaller variance, computational and algebraic simplicity, etc.

TABLE 1. UNBIASED ESTIMATES OF ERROR VARIANCE FOR CASE (i)

Units    V(t'')    V(t')    V(t_y)   V(t_mean)  V_YG(y_HT)  V_HT(y_HT)
1, 2       2.06    45.80     -1.14      .20        1.51       -1.11
1, 3      15.00   114.20      4.44      .81        4.34        2.26
1, 4      59.75   241.80     10.06     6.50        7.33        6.45
2, 1      -1.50     4.84     -1.76     2.72        1.51       -1.11
2, 3      11.25    15.64      4.00      .56         .92         .78
2, 4      56.20    34.20     19.84     5.76        3.05        3.93
3, 1      -6.42      .51     -2.80     2.89        4.34        2.26
3, 2      -6.84    -4.70     -2.29      .42         .92         .78
3, 4      50.35   -13.58     20.78     5.52         .72        3.01
4, 1     -14.24    -3.72     -3.96     2.72        7.33        6.45
4, 2     -15.34   -13.15     -2.56      .36        3.05        3.93
4, 3      -3.75   -24.82      5.00      .56         .72        3.01

True error
variance  6.222    9.701     3.619    2.223       2.884       2.884

TABLE 2. UNBIASED ESTIMATES OF ERROR VARIANCE FOR CASE (ii)

Units    V(t'')    V(t')    V(t_y)   V(t_mean)  V_YG(y_HT)  V_HT(y_HT)
1, 2      -4.63    93.20      1.86      .20         .41       -6.21
1, 3        .87   114.20      4.44      .81        1.52       -4.69
1, 4      18.74   134.60      7.74     1.82        2.79       -0.58
2, 1      -7.59    10.85     -1.32      .16         .41       -6.21
2, 3        .23    12.02      1.88      .16         .36       -3.78
2, 4      19.40    10.39      4.34      .64        1.08        1.20
3, 1     -10.49    -3.17     -2.40      .49        1.52       -4.69
3, 2      -8.26    -5.52     -1.42      .12         .36       -3.78
3, 4      19.47   -12.80      1.86      .12         .18        5.01
4, 1     -15.28    -9.86     -3.36      .81        2.79       -0.58
4, 2     -12.32   -13.15     -2.56      .36        1.08        1.20
4, 3       4.36   -17.01     -1.44      .09         .18        5.01

True error
variance  1.178    5.435      .316     .365        .823        .823
PART II

8. EXTENSION TO MULTISTAGE DESIGNS

Introduction. In the first part of this paper we have obtained some estimators in unistage designs when the units are selected with varying probabilities without replacement. Since large scale surveys are generally based on multistage designs, it would be appropriate to extend the estimators of the first part to such designs. The use of sampling with varying probabilities in subsampling designs was first suggested by Hansen and Hurwitz [2]. They found that a
subsampling design, in which only one first stage unit was selected from each stratum with probability proportionate to the measure of the size of the unit and a fixed number of second stage units was selected with equal probability from each of the selected first stage units, brought about marked improvement in the precision of the estimate as compared with sampling systems involving the use of equal probabilities. In another paper Hansen and Hurwitz [3] considered the problem of determining probabilities of selection which have optimum properties in the sense that the schemes provide maximum efficiency at a given cost. They however restricted themselves to the case in which the first stage units are selected with pps with replacement. To avoid the repetition of first stage units in order to achieve gains in efficiency, Midzuno [5] developed the scheme for the selection of n first stage units with pps from a stratum but without replacement for obtaining unbiased estimates of the population total. Midzuno, however, did not give unbiased estimates of the variance of the estimates. Horvitz and Thompson [4], though primarily interested in the unistage case, considered the multistage case where the first stage units are selected with varying probabilities and without replacement and random samples of predetermined sizes are chosen from the selected first stage units. If the population (or stratum) consists of N first stage units of which n units are sampled, and if for the ith unit in the population there is an estimator $T_i$ based on sampling at second and subsequent stages such that

$$E(T_i) = Y_i = \text{total of the ith unit},$$

Horvitz and Thompson's estimator is

$$\hat{y}_{HT} = \sum_{i=1}^{n} \frac{T_i}{\pi_i} \tag{61}$$

whose variance is

$$V(\hat{y}_{HT}) = \sum_{i=1}^{N} \frac{1-\pi_i}{\pi_i} Y_i^2 + \sum_{i=1}^{N} \sum_{j \neq i} \frac{\pi_{ij} - \pi_i \pi_j}{\pi_i \pi_j} Y_i Y_j + \sum_{i=1}^{N} \frac{\sigma_i^2}{\pi_i}, \tag{62}$$

where $\sigma_i^2$ is the variance of $T_i$ over the second and subsequent stages, an unbiased estimate of which is

$$\hat{V}_{HT} = \sum_{i=1}^{n} \frac{1-\pi_i}{\pi_i^2} T_i^2 + \sum_{i=1}^{n} \sum_{j \neq i} \frac{\pi_{ij} - \pi_i \pi_j}{\pi_i \pi_j \pi_{ij}} T_i T_j + \sum_{i=1}^{n} \frac{s_i^2}{\pi_i}. \tag{64}$$

The estimator (64), like the corresponding one in the unistage case, can take negative values. Durbin [1] gave an alternative estimate of the variance following Yates and Grundy [6], who were interested in the unistage case only. Recasting the between-unit part of the variance (62) as

$$V(\hat{y}_{HT}) = \sum_{i=1}^{N} \sum_{j>i} (\pi_i \pi_j - \pi_{ij}) \left( \frac{Y_i}{\pi_i} - \frac{Y_j}{\pi_j} \right)^2 + \sum_{i=1}^{N} \frac{\sigma_i^2}{\pi_i}, \tag{65}$$

the unbiased estimate of the variance obtained is

$$\hat{V}_{YG} = \sum_{j>i} \frac{\pi_i \pi_j - \pi_{ij}}{\pi_{ij}} \left( \frac{T_i}{\pi_i} - \frac{T_j}{\pi_j} \right)^2 + \sum_{i=1}^{n} \frac{s_i^2}{\pi_i} \tag{66}$$
which is believed to be less often negative.

Summary of results obtained. Let the population (or stratum) consist of N first stage units

$$U_1, U_2, \ldots, U_N \tag{67}$$

the total of the ith unit being $Y_i$. From these first stage units n units are selected without replacement with varying probabilities according to the scheme given in Section 2. We subsample a predetermined number of second stage units from each of the first stage units selected, and may proceed with further subsampling of the second stage units in a known and predetermined way, and so on. Suppose that for the ith first stage unit there is an estimator $T_i$ (based on subsampling at second and subsequent stages) such that

$$\mathcal{E}_2(T_i) = Y_i, \quad \mathcal{V}_2(T_i) = \sigma_i^2 \tag{68}$$

where $\mathcal{E}_2$ and $\mathcal{V}_2$ denote conditional expectations and variances over the second and subsequent stages.

Defining similarly $\mathcal{E}_1(f)$ and $\mathcal{V}_1(f)$ as the expected value and variance of f over different samples of units at the first stage, we can easily prove:

Theorem 5. The expected value of f is equal to the expected value of the conditional expected values, i.e.,

$$E(f) = \mathcal{E}_1 \mathcal{E}_2(f). \tag{69}$$

It may be noted that, in terms of the operators $E_1, E_2, \ldots, E_n$ of Part I, we have

$$\mathcal{E}_1 = E_1 E_2 \cdots E_n. \tag{70}$$

Theorem 6. The variance of f equals the sum of the expected value of the conditional variance and the variance of the conditional expected value, i.e.,

$$V(f) = \mathcal{E}_1 \mathcal{V}_2(f) + \mathcal{V}_1 \mathcal{E}_2(f). \tag{71}$$

We note that in terms of our operators of Part I,

$$\mathcal{V}_1 = E_1 V_{23\cdots n} + V_1 E_{23\cdots n}. \tag{72}$$

In order to estimate the total of the population (or stratum), one set of unbiased estimates is

$$z_1 = \frac{T_1}{p_{i1}}, \quad z_2 = T_1 + \frac{T_2}{p_{j2}}, \quad \ldots, \quad z_n = T_1 + T_2 + \cdots + T_{n-1} + \frac{T_n}{p_{ln}} \tag{73}$$

where $T_1, T_2, \ldots, T_n$ are unbiased estimates of the totals of the respective first stage units drawn in the sample in this order.
Thus

$$z = \sum_{1}^{n} c_i z_i, \quad \sum_{1}^{n} c_i = 1 \tag{74}$$

is unbiased. We have

$$V(z_n) = V(t_n) + E\left( \sigma_1^2 + \sigma_2^2 + \cdots + \sigma_{n-1}^2 + \frac{\sigma_n^2}{p_{ln}^2} \right) \tag{75}$$

where $t_n$ is the corresponding unistage estimator discussed in Part I. With regard to the covariance of $z_\lambda$ and $z_\mu$, theorem 5 gives

$$C(z_\lambda, z_\mu) = \sum_{k=1}^{N} \sigma_k^2, \quad (\lambda \neq \mu) \tag{76}$$

so that, unlike the unistage case, $z_\lambda$ and $z_\mu$ are not uncorrelated. Considering the series of estimates

$$v_1 = \frac{s_1^2}{p_{i1}}, \quad v_2 = s_1^2 + \frac{s_2^2}{p_{j2}}, \quad \ldots, \quad v_n = s_1^2 + s_2^2 + \cdots + s_{n-1}^2 + \frac{s_n^2}{p_{ln}} \tag{77}$$

where $s_i^2$ is an unbiased estimate of $\sigma_i^2$ based on the subsampling of the ith selected unit, it is easy to see that

$$v = \sum_{1}^{n} c_i v_i, \quad \sum_{1}^{n} c_i = 1 \tag{78}$$

is an unbiased estimate of $\sum_{1}^{N} \sigma_k^2$.

Hence an unbiased estimate of the variance of z is

$$\hat{V}(z) = z^2 - \frac{2 \sum_{j>i=1}^{n} z_i z_j}{n(n-1)} + v. \tag{79}$$

In case $c_1 = c_2 = \cdots = c_n = 1/n$, it is easy to see that $\hat{V}(z)$ is positive. Comparing (79) with the corresponding estimator (23) in the unistage case, we arrive at the following important rule for estimating the variance in multistage designs.

The estimate of variance in multistage sampling is the sum of two parts. The first part is equal to the estimate of variance calculated on the assumption that the first stage units have been measured without error. The second part is obtainable from the population total estimate itself by substituting the estimated variances for the estimates of the totals of the units.
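The rule above can be illustrated with a minimal numerical sketch for n = 2 and equal coefficients; the totals $T_i$, the within-unit variance estimates $s_i^2$, and the draw probabilities below are hypothetical values:

```python
# Sketch of the first multistage set (73) and the variance estimate (79)
# for n = 2 with c1 = c2 = 1/2. All numerical values are illustrative.
T1, T2 = 46.0, 83.0        # unbiased totals of the 1st and 2nd drawn units
s1, s2 = 4.0, 9.0          # unbiased estimates of sigma_1^2, sigma_2^2
p_i1, p_j2 = 0.25, 0.40    # realized selection probabilities

z1 = T1 / p_i1             # z1 = T1 / p_{i1}
z2 = T1 + T2 / p_j2        # z2 = T1 + T2 / p_{j2}
z = (z1 + z2) / 2

v1 = s1 / p_i1             # v series (77): totals replaced by s_i^2
v2 = s1 + s2 / p_j2
v = (v1 + v2) / 2

# (79) directly, and its always-positive form for equal coefficients:
V_hat = z**2 - 2 * z1 * z2 / 2 + v
V_hat_pos = ((z1 - z)**2 + (z2 - z)**2) / 2 + v
```

The first term of `V_hat` is the unistage estimate (23) computed from the $z_i$ as if the first stage totals were known exactly, while `v` re-uses the estimator's own form with the $s_i^2$ in place of the totals — exactly the two parts of the rule stated above.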
Another set of unbiased estimates is
$$z_1' = \frac{T_1}{p_{i1}}, \quad z_2' = \frac{1}{N-1} \frac{1}{p_{i1}} \frac{T_2}{p_{j2}}, \quad \ldots, \quad z_n' = \frac{1}{(N-1)(N-2)\cdots(N-n+1)} \frac{1}{p_{i1}} \frac{1}{p_{j2}} \cdots \frac{T_n}{p_{ln}}. \tag{80}$$

It is easy to see that

$$z' = \sum_{1}^{n} c_i z_i', \quad \sum_{1}^{n} c_i = 1 \tag{81}$$

is an unbiased estimate. And

$$V(z_n') = \frac{1}{[(N-1)(N-2)\cdots(N-n+1)]^2} \sum \frac{1}{p_{i1}} \sum \frac{1}{p_{j2}} \cdots \sum \frac{Y_n^2 + \sigma_n^2}{p_{ln}} - Y^2 \tag{82}$$

whereas

$$C(z_\lambda' z_\mu') = \frac{1}{(N-\lambda)(N-\lambda+1)^2 \cdots (N-1)^2} \sum \frac{1}{p_{i1}} \sum \frac{1}{p_{j2}} \cdots \sum \frac{Y_\lambda}{p_{m\lambda}} (Y - Y_1 - \cdots - Y_\lambda) - Y^2 \tag{83}$$

which is the same as obtained in the unistage case. In order to estimate the variance of z' we notice that

$$\mathcal{E}(T_i z_i') = \sum_{k=1}^{N} Y_k^2 + \sum_{k=1}^{N} \sigma_k^2 \quad (i = 1, 2, \ldots, n), \tag{84}$$

$$\mathcal{E}(T_i z_j') = \frac{1}{N-1} \sum_{k=1}^{N} \sum_{l \neq k} Y_k Y_l \quad (j > i = 1, 2, \ldots, n). \tag{85}$$

Considering the estimates

$$v_1' = \frac{s_1^2}{p_{i1}}, \quad v_2' = \frac{1}{N-1} \frac{1}{p_{i1}} \frac{s_2^2}{p_{j2}}, \quad \ldots, \quad v_n' = \frac{1}{(N-1)(N-2)\cdots(N-n+1)} \frac{1}{p_{i1}} \frac{1}{p_{j2}} \cdots \frac{s_n^2}{p_{ln}} \tag{86}$$

it is found that

$$v' = \sum_{1}^{n} c_i v_i', \quad \sum_{1}^{n} c_i = 1 \tag{87}$$

is an unbiased estimate of $\sum_{1}^{N} \sigma_k^2$.
Hence an unbiased estimate of V(z') is given by

$$\hat{V}(z') = z'^2 - \frac{1}{n} \sum_{i=1}^{n} T_i z_i' - \frac{N-1}{\binom{n}{2}} \sum_{j>i=1}^{n} T_i z_j' + v'. \tag{88}$$

Another set of unbiased estimates is

$$z_1'' = \frac{T_1}{P_{i1}}, \quad z_2'' = \frac{T_2}{P_{j2}}, \quad \ldots, \quad z_n'' = \frac{T_n}{P_{ln}} \tag{89}$$

where $P_{ij}$ is defined by (38). As before

$$z'' = \sum_{1}^{n} c_i z_i'', \quad \sum_{1}^{n} c_i = 1 \tag{90}$$

is unbiased for estimating Y. With regard to the variance of $z_n''$, we have

$$V(z_n'') = \sum \frac{Y_n^2 + \sigma_n^2}{P_{ln}} - Y^2. \tag{91}$$

The covariance between $z_\lambda''$ and $z_\mu''$ is given by (42) in Part I. The expression for the variance of z'' is apparent. An unbiased estimate of the variance of z'' is given by

$$\hat{V}(z'') = z''^2 - \frac{2 \sum' z_\lambda'' z_\mu''}{n(n-1)} + v \tag{92}$$

where $\sum'$ denotes summation over the different pairs in the sample.
REFERENCES

[1] Durbin, J., "Some results in sampling theory when the units are selected with unequal probabilities," Journal of the Royal Statistical Society, Series B, 15, No. 2 (1953), 262-9.
[2] Hansen, M. H., and Hurwitz, W. N., "On the theory of sampling from finite populations," Annals of Mathematical Statistics, 14 (1943), 333-62.
[3] Hansen, M. H., and Hurwitz, W. N., "On the determination of optimum probabilities in sampling," Annals of Mathematical Statistics, 20 (1949), 426-32.
[4] Horvitz, D. G., and Thompson, D. J., "A generalization of sampling without replacement from a finite universe," Journal of the American Statistical Association, 47 (1952), 663-85.
[5] Midzuno, H., "On the sampling system with probability proportionate to sum of sizes," Annals of the Institute of Statistical Mathematics, 3 (1952), 99-107.
[6] Yates, F., and Grundy, P. M., "Selection without replacement from within strata with probability proportional to size," Journal of the Royal Statistical Society, Series B, 15, No. 1 (1953), 253-61.