
This article was downloaded by: [Moskow State Univ Bibliote] on 10 January 2014, at 18:37. Publisher: Taylor & Francis. Informa Ltd, registered in England and Wales, registered number 1072954. Registered office: Mortimer House, 37-41 Mortimer Street, London W1T 3JH, UK.

Journal of the American Statistical Association. Publication details, including instructions for authors and subscription information: http://amstat.tandfonline.com/loi/uasa20

Some Estimators in Sampling with Varying Probabilities without Replacement. Des Raj, Indian Statistical Institute. Published online: 11 Apr 2012.

To cite this article: Des Raj (1956) Some Estimators in Sampling with Varying Probabilities without Replacement, Journal of the American Statistical Association, 51:274, 269-284.

To link to this article: http://dx.doi.org/10.1080/01621459.1956.10501326


Taylor & Francis makes every effort to ensure the accuracy of all the information (the "Content") contained in the publications on our platform. However, Taylor & Francis, our agents, and our licensors make no representations or warranties whatsoever as to the accuracy, completeness, or suitability for any purpose of the Content. Any opinions and views expressed in this publication are the opinions and views of the authors, and are not the views of or endorsed by Taylor & Francis. The accuracy of the Content should not be relied upon and should be independently verified with primary sources of information. Taylor and Francis shall not be liable for any losses, actions, claims, proceedings, demands, costs, expenses, damages, and other liabilities whatsoever or howsoever caused arising directly or indirectly in connection with, in relation to or arising out of the use of the Content.

This article may be used for research, teaching, and private study purposes. Any substantial or systematic reproduction, redistribution, reselling, loan, sub-licensing, systematic supply, or distribution in any form to anyone is expressly forbidden. Terms & Conditions of access and use can be found at http://amstat.tandfonline.com/page/terms-and-conditions


SOME ESTIMATORS IN SAMPLING WITH VARYING PROBABILITIES WITHOUT REPLACEMENT

DES RAJ

Indian Statistical Institute

The problem considered is estimation of the total value of a character for a finite population from a sample when the units are selected with varying probabilities without replacement. Several unbiased estimators are proposed. Exact expressions and unbiased estimators for the variances of the estimators are obtained. Certain properties of Yates and Grundy's estimator of variance are proved. Examples are included to study the relative performance of the various estimators. The results obtained for unistage sampling in the first part of the paper are extended to multistage designs in the second part.

PART I

1. INTRODUCTION

IT IS well known that by assigning varying probabilities of selection to different units in a population, it is possible to reduce considerably the sampling error of the estimates over those obtained when sampling with equal probabilities. Recently, Horvitz and Thompson [4] have proposed an unbiased estimator for estimating the total of a finite population and have also estimated the variance of their estimator when sampling is carried out without replacement with varying probabilities at each draw. Their estimator is

    Y_HT = Σ_{i=1}^{n} y_i/π_i    (1)

where π_i is the probability that the ith unit in the population enters the sample of size n. Also

    V(Y_HT) = Σ_{i=1}^{N} [(1 − π_i)/π_i] y_i² + Σ_{i=1}^{N} Σ_{j≠i} [(π_ij − π_iπ_j)/(π_iπ_j)] y_i y_j    (2)

and an unbiased estimate of the variance is

    V̂_HT = Σ_{i=1}^{n} [(1 − π_i)/π_i²] y_i² + Σ_{i=1}^{n} Σ_{j≠i} [(π_ij − π_iπ_j)/π_ij] y_i y_j/(π_iπ_j)    (3)

where π_ij is the probability that the ith and jth units in the population enter the sample. One serious disadvantage of the estimator (3) is that it may assume negative values. For example, when a sample of size 2 is taken from a population of four units given by

    y_1 = 1,  y_2 = 2,  y_3 = 3,  y_4 = 4

such that π_12 = π_34 = 3/8, π_13 = π_14 = π_23 = π_24 = 1/16 (so that Σ_{i<j} π_ij = 1, as required for a fixed sample size of two), it is easy to see that the estimator (3) would be negative for all samples except (y_1, y_2) and (y_3, y_4). In such cases the estimate (1) becomes useless since its variance cannot be estimated




from the sample. Yates and Grundy [6] have proposed an alternative estimator for (2) which is believed to be less often negative. They recast (2) into

    V(Y_HT) = Σ_{j>i} (π_iπ_j − π_ij) (y_i/π_i − y_j/π_j)²    (4)

so that their estimator is

    V̂_YG = Σ_{j>i} [(π_iπ_j − π_ij)/π_ij] (y_i/π_i − y_j/π_j)²    (5)

They remark that "although it is not immediately apparent that (π_iπ_j − π_ij)/π_ij is necessarily positive, this appears to be the case when the usual method of selection is employed." It is easy to see that the estimator (5) can be negative. For instance, in the example considered earlier, (5) is negative whenever (3) is positive. We thus see that the estimators of variance given by Horvitz and Thompson [4] or by Yates and Grundy [6] can take negative values. This disquieting situation led the present author to search for estimators whose estimated variance is always positive. Such estimates, and others, are presented in this paper. In the scheme considered by previous authors there is only one unbiased estimator, and its estimated variance can be negative. We have given a whole class of estimates whose estimated variance is always positive. It is not claimed that these estimators are necessarily more efficient than the estimates presented earlier, although this has been found to be the case in several examples. One such example is that given by Yates and Grundy themselves.
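The negativity claims above can be checked numerically. The sketch below evaluates estimators (3) and (5) for every sample of size 2 from the four-unit example; the value π_12 = π_34 = 3/8 is an assumption inferred from the constraint Σ_{i<j} π_ij = 1 for a fixed sample size of two, since the scan is illegible at that point.

```python
from itertools import combinations

# Example population: y = (1, 2, 3, 4) with the joint inclusion
# probabilities of the text; pi_12 = pi_34 = 3/8 is inferred (assumption).
y = {1: 1.0, 2: 2.0, 3: 3.0, 4: 4.0}
pi_pair = {pair: (3 / 8 if pair in [(1, 2), (3, 4)] else 1 / 16)
           for pair in combinations(y, 2)}

# First-order inclusion probabilities: pi_i = sum over pairs containing i.
pi = {i: sum(p for pair, p in pi_pair.items() if i in pair) for i in y}

def v_ht(i, j):
    """Horvitz-Thompson variance estimator (3) for the sample {i, j}."""
    single = sum((1 - pi[k]) / pi[k] ** 2 * y[k] ** 2 for k in (i, j))
    cross = 2 * (pi_pair[(i, j)] - pi[i] * pi[j]) / pi_pair[(i, j)] \
            * y[i] * y[j] / (pi[i] * pi[j])
    return single + cross

def v_yg(i, j):
    """Yates-Grundy variance estimator (5) for the sample {i, j}."""
    return (pi[i] * pi[j] - pi_pair[(i, j)]) / pi_pair[(i, j)] \
           * (y[i] / pi[i] - y[j] / pi[j]) ** 2
```

Evaluating `v_ht` over all six samples shows it negative for every sample except {1, 2} and {3, 4}, while `v_yg` is negative exactly where `v_ht` is positive, as the text asserts.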

2. PROBABILITIES OF SELECTION AND THEOREMS ON EXPECTATIONS

Let there be a population

    y_1, y_2, ..., y_N    (6)

from which a sample of size n is drawn without replacement with varying probabilities of selection at each draw, the scheme of selection of a unit at a particular draw depending naturally on the units already drawn in the sample but not on the order in which they were drawn. For the first draw the probabilities of selection are

    p_{i1} > 0,  Σ_{i=1}^{N} p_{i1} = 1,  (i = 1, 2, ..., N).    (7)

At the second draw we have N sets of probabilities of selection

    {p_{i2}^1}, {p_{i2}^2}, ..., {p_{i2}^N}    (8)

according as the first, second, ..., Nth unit is drawn at the first draw. And so on for other draws. At the nth draw we have C(N, n−1) sets of probabilities of selection, depending on which n−1 units have been drawn at the previous draws. Let the sample values obtained in order according to this general scheme be

    ( y_1     y_2     ...   y_n   )
    ( p_{i1}  p_{j2}  ...   p_{ln} )    (9)

Dow

nloa

ded

by [

Mos

kow

Sta

te U

niv

Bib

liote

] at

18:

37 1

0 Ja

nuar

y 20

14

SAMPLING WITH VARYING PROBABILITIES 271

where the superscripts of p_{ij} have been omitted. Before proceeding further we will present two theorems which will considerably simplify our derivations. Let a sample of size n = 2 be drawn with varying probabilities of selection at each draw from a population of size N. Let f(y_i, y_j) be a function of the sample values drawn. By E_2(f) we shall denote the conditional expected value of f for a given value of the first draw, and similarly V_2(f) denotes the conditional variance of f. In the same way E_1(f) and V_1(f) denote expected values and variances over all possible units obtained at the first draw. Then it is easy to prove the following:

Theorem 1. The expected value of f is equal to the expected value of the conditional expected values, i.e.,

    E_{12}(f) = E_1 E_2(f).    (10)

Theorem 2. The variance of f is equal to the sum of the expected value of the conditional variances and the variance of the conditional expected values, i.e.,

    V_{12}(f) = E_1 V_2(f) + V_1 E_2(f).    (11)

The two theorems are capable of easy generalization to any sample size n. In fact

    E_{12...n}(f) = E_1 E_2 ⋯ E_n(f),    (12)

    V_{12...n}(f) = (E_1 V_{23...n} + V_1 E_{23...n})(f).    (13)

3. ONE SET OF ESTIMATES

In order to estimate the total Y of a character y for the population (6) for the sampling scheme given in Section 2, one set of proposed estimates is

    t_1 = y_1/p_{i1},
    t_2 = y_1 + y_2/p_{j2},
      ⋮
    t_n = y_1 + y_2 + ⋯ + y_{n−1} + y_n/p_{ln}.    (14)

Using theorem 1, we have E(t_n) = E_{12...n}(t_n) = E_1 E_2 ⋯ E_n(t_n). Since E_n(y_n/p_{ln}) = Y − (y_1 + y_2 + ⋯ + y_{n−1}), we have E(t_n) = Y, so that t_1, t_2, ..., t_n are all unbiased for estimating Y. Hence any linear function

    t = Σ_{i=1}^{n} c_i t_i,  Σ_{i=1}^{n} c_i = 1    (15)

is an unbiased estimate. To obtain the variance of t_n we use theorem 2 and have


    V(t_n) = Σ p_{i1} Σ p_{j2} ⋯ Σ p_{q,n−1} Σ y_n²/p_{ln} − Σ p_{i1} Σ ⋯ Σ p_{q,n−1} (Y − y_1 − ⋯ − y_{n−1})²,  n > 1    (16)

where summation is always over the available units. An application of theorem 2 gives the very pleasing result

    C(t_λ, t_μ) = 0,  (λ ≠ μ)    (17)

where C(t_λ, t_μ) denotes the covariance between t_λ and t_μ. The result (17) shows that the estimates t_1, t_2, ..., t_n are uncorrelated. The variance of t, the general linear unbiased estimate in this set-up, can then be easily obtained as

    V(t) = Σ_{i=1}^{n} c_i² V(t_i).    (18)

In particular for the very practical situation n = 2, we have

    t = c_1 (y_1/p_{i1}) + c_2 (y_1 + y_2/p_{j2}),    (19)

    V(t) = c_1² [Σ y_1²/p_{i1} − Y²] + c_2² [Σ p_{i1} Σ y_2²/p_{j2} − Σ p_{i1} (Y − y_1)²].    (20)

For any choice of c_1, c_2, ..., c_n such that their sum is unity we get an unbiased estimate of Y given by (15). It will be seen that the estimate for which the coefficients are all equal has a very desirable property. If one seeks an estimate which reduces to N times the sample mean when the selection probabilities are equal, it is easy to see that the coefficients c_1, c_2, ..., c_n should satisfy the following equations:

    N c_1 + c_2 + ⋯ + c_n = N/n,
    (N−1) c_2 + c_3 + ⋯ + c_n = N/n,
      ⋮
    (N−n+1) c_n = N/n.    (21)

We now come to the problem of estimating the variance of the estimator (15). Making use of the result (17) that t_1, t_2, ..., t_n are uncorrelated, we have

    E(t_λ t_μ) = Y²,  for λ ≠ μ.    (22)

Hence an unbiased estimate of the variance of t given by (15) is

    V̂(t) = t² − 2 Σ′ t_λ t_μ / [n(n−1)]    (23)

where Σ′ denotes summation over the n(n−1)/2 pairs. In case c_1 = c_2 = ⋯ = c_n = 1/n, the estimator is


    t_mean = (1/n) Σ_{i=1}^{n} t_i,    (24)

we have from (23) that

    V̂(t_mean) = Σ_{i=1}^{n} (t_i − t_mean)² / [n(n−1)],    (25)

which is evidently never negative. In view of the fact that the estimators of variance so far presented can be negative, the estimator t_mean seems to be a very interesting estimate.
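The ordered estimators of (14), the estimator t_mean of (24), and the variance estimator (25) can be sketched and checked by complete enumeration. The population values, the first-draw probabilities, and the choice of pps selection among the remaining units at the second draw are all illustrative assumptions, not data from the paper.

```python
from itertools import permutations

y = [3.0, 5.0, 8.0]            # population values (illustrative); Y = 16
p = [0.2, 0.3, 0.5]            # first-draw probabilities (illustrative)
Y = sum(y)

def t_estimators(sample, probs):
    """The ordered estimators of (14): t_k = y_1 + ... + y_{k-1} + y_k/p_k."""
    return [sum(sample[:k]) + sample[k] / probs[k] for k in range(len(sample))]

def v_hat_tmean(ts):
    """Estimator (25): sum of (t_i - t_mean)^2 over n(n-1); never negative."""
    n = len(ts)
    tm = sum(ts) / n
    return sum((t - tm) ** 2 for t in ts) / (n * (n - 1))

# Complete enumeration of ordered samples (i, j) for n = 2,
# second draw pps among the remaining units.
outcomes = []
for i, j in permutations(range(3), 2):
    prob = p[i] * p[j] / (1 - p[i])
    t1, t2 = t_estimators([y[i], y[j]], [p[i], p[j] / (1 - p[i])])
    outcomes.append((prob, t1, t2))

e_t1 = sum(pr * t1 for pr, t1, _ in outcomes)
e_t2 = sum(pr * t2 for pr, _, t2 in outcomes)
cov = sum(pr * t1 * t2 for pr, t1, t2 in outcomes) - e_t1 * e_t2
vs = [v_hat_tmean([t1, t2]) for _, t1, t2 in outcomes]
```

The enumeration recovers E(t_1) = E(t_2) = Y, the zero covariance of (17), and a nonnegative value of (25) for every sample.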

4. A SECOND SET OF ESTIMATES

We shall now give another set of estimates as

    t_1′ = y_1/p_{i1},
    t_2′ = [1/(N−1)] (1/p_{i1}) (y_2/p_{j2}),
      ⋮
    t_n′ = [1/((N−1)(N−2)⋯(N−n+1))] (1/p_{i1}) (1/p_{j2}) ⋯ (y_n/p_{ln}).    (26)

Using theorem 1 we have

    E(t_1′) = E(t_2′) = ⋯ = E(t_n′) = Y

so that t_1′, t_2′, ..., t_n′ are unbiased for estimating Y. Hence

    t′ = Σ_{i=1}^{n} c_i t_i′,  Σ_{i=1}^{n} c_i = 1    (27)

is an unbiased estimate. For the variance of t_n′, an application of theorem 2 gives

    V(t_n′) = [1/((N−1)(N−2)⋯(N−n+1))²] Σ (1/p_{i1}) Σ (1/p_{j2}) ⋯ Σ (y_n²/p_{ln}) − Y²,  n > 1    (28)

while the covariance between t_λ′ and t_μ′ (μ > λ) is given by

    C(t_λ′, t_μ′) = [1/((N−λ)(N−λ+1)² ⋯ (N−1)²)] Σ (1/p_{i1}) Σ (1/p_{j2}) ⋯ Σ (y_λ/p_{mλ}) (Y − y_1 − ⋯ − y_λ) − Y²    (29)

where Σ always denotes summation over the available units. Using (28) and


(29) the variance of t′ can be easily obtained. In particular for n = 2 we have

    t′ = c_1 (y_1/p_{i1}) + c_2 [1/(N−1)] (1/p_{i1}) (y_2/p_{j2})    (30)

and

    V(t′) = c_1² Σ (y_1²/p_{i1}) + c_2² [1/(N−1)²] Σ (1/p_{i1}) Σ (y_2²/p_{j2}) + 2 c_1 c_2 [1/(N−1)] Σ (y_1/p_{i1}) (Y − y_1) − Y².    (31)

It is of interest to note that t′ reduces to N times the sample mean if

    c_1 = c_2 = ⋯ = c_n = 1/n.    (32)

We now turn to the question of obtaining an unbiased estimate of the variance of t′. We notice that

    E(y_i t_i′) = Σ_{k=1}^{N} y_k²,  (i = 1, 2, ..., n),    (33)

    E(y_i t_j′) = (Y² − Σ_{k=1}^{N} y_k²)/(N − 1),  (j > i = 1, 2, ..., n).    (34)

Hence

    E(G) = Y²    (35)

where

    G = (1/n) Σ_{i=1}^{n} y_i t_i′ + [2(N−1)/(n(n−1))] Σ_{j>i} y_i t_j′.    (36)

Thus an unbiased estimate of the variance of t′ is given by

    V̂(t′) = t′² − G.    (37)

This estimate may assume negative values.

5. A THIRD SET OF ESTIMATES

We shall now give another set of estimates for Y. From the set of selection probabilities given in Section 2, we can calculate the unconditional probabilities with which the units will be drawn at each draw. Suppose

    P_{ij}  (i = 1, 2, ..., N;  j = 1, 2, ..., n)    (38)

is the unconditional probability that the ith unit (in the population) will be drawn at the jth draw. Then the proposed set of estimates is


    t_1″ = y_1/P_{i1},
    t_2″ = y_2/P_{j2},
      ⋮
    t_n″ = y_n/P_{ln}.    (39)

It is easy to see that E(t_1″) = E(t_2″) = ⋯ = E(t_n″) = Y, so that

    t″ = Σ_{i=1}^{n} c_i t_i″,  Σ_{i=1}^{n} c_i = 1    (40)

is an unbiased estimate. Also

    V(t_n″) = Σ y_n²/P_{ln} − Y²    (41)

where Σ denotes summation over all units in the population. And

    C(t_λ″, t_μ″) = Σ p_{i1} Σ ⋯ Σ p_{t,λ−1} Σ p_{uλ} (y_u/P_{uλ}) Σ ⋯ Σ p_{s,μ−1} Σ p_{vμ} (y_v/P_{vμ}) − Y².    (42)

Hence the expression for the variance of t″ is apparent. In particular for n = 2, we have

    t″ = c_1 (y_1/P_{i1}) + c_2 (y_2/P_{j2}).    (43)

To obtain an estimate which reduces to N times the sample mean when the probabilities of selection are equal, it is easy to see that in this case

    P_{i1} = P_{i2} = ⋯ = P_{in} = 1/N    (44)

so that

    c_1 = c_2 = ⋯ = c_n = 1/n.    (45)

To estimate the variance of t″ we make use of the expected values of the products t_λ″ t_μ″, so that

    V̂(t″) = t″² − 2 Σ′ t_λ″ t_μ″ / [n(n−1)]    (46)

where Σ′ denotes summation over the n(n−1)/2 pairs.



This estimate may assume negative values. As stated before, explicit expressions for the unconditional probabilities can be worked out in the general case. For example when n = 2, we have

    P_{i1} = p_{i1},
    P_{i2} = p_{11} p_{i2}^1 + p_{21} p_{i2}^2 + ⋯ + p_{i−1,1} p_{i2}^{i−1} + p_{i+1,1} p_{i2}^{i+1} + ⋯ + p_{N1} p_{i2}^N    (47)

where p_{i2}^t is the probability that the ith unit is selected at the second draw when it is known that the tth unit has been selected at the first draw. In case the second unit is drawn with probabilities proportionate to size (pps) of the remaining units, we have

    p_{i1} = p_i (say),
    P_{i2} = p_i ( Σ_{j=1}^{N} p_j/(1 − p_j) − p_i/(1 − p_i) ).    (48)

If, however, the second draw is made with equal probabilities, we have

    P_{i1} = p_i,
    P_{i2} = (1 − p_i)/(N − 1).    (49)
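A quick sketch of (48) and (49) for an illustrative probability vector p (an assumption, not data from the paper). In both schemes the unconditional second-draw probabilities must sum to unity over the population.

```python
# Unconditional second-draw probabilities P_i2 for n = 2, under the two
# schemes of the text; p is an illustrative first-draw probability vector.
p = [0.1, 0.2, 0.3, 0.4]
N = len(p)

# Second draw pps among the remaining units, equation (48):
s = sum(pj / (1 - pj) for pj in p)
P2_pps = [pi * (s - pi / (1 - pi)) for pi in p]

# Second draw with equal probabilities, equation (49):
P2_eq = [(1 - pi) / (N - 1) for pi in p]
```

Both vectors are valid probability distributions over the four units, which is a useful sanity check on the two formulas.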

6. SOME ASPECTS OF YATES AND GRUNDY'S ESTIMATOR

Before studying the relative performance of the estimators proposed, we shall turn our attention to the remark made by Yates and Grundy [6] that "although it is not immediately apparent that their estimator is necessarily positive, this appears to be the case when the usual method of selection is employed." We shall prove that this estimator is positive in at least two important situations:

(i) when the first unit is selected with probabilities proportionate to some measure of size (pps) and the remaining units are selected with equal probability;

(ii) when the first unit is selected with probabilities proportionate to some measure of size and the second unit with probabilities proportionate to the sizes of the remaining units, the sample size being two.

We will present two theorems incorporating the results mentioned, which have also been noticed by Sen.

Theorem 3. In sampling with varying probabilities, when the first unit is selected with pps and the remaining units with equal probability, Yates and Grundy's estimator of the error variance is always positive.

Proof: If the first unit is selected with probabilities

    p_1, p_2, ..., p_N;  Σ p_i = 1    (50)

and the remaining n − 1 units with equal probability without replacement, we have


    π_ij = [(n−1)/(N−1)] [ (N−n)/(N−2) (p_i + p_j) + (n−2)/(N−2) ],    (51)

    π_i = Σ_{j≠i} π_ij = [(N−n)/(N−1)] p_i + (n−1)/(N−1).    (52)

Now Yates and Grundy's estimator of variance is given by

    V̂_YG = Σ_{j>i} [(π_iπ_j − π_ij)/π_ij] (y_i/π_i − y_j/π_j)².    (53)

Substituting in (53) for π_i and π_ij from (51) and (52), we have

    V̂_YG = Σ_{j>i} [(N−n)/(N−1)²] (1/π_ij) [ (N−n) p_i p_j + (n−1)/(N−2) (1 − p_i − p_j) ] (y_i/π_i − y_j/π_j)²

which is always positive.

Theorem 4. In sampling with varying probabilities for samples of size 2, when the first unit is selected with pps and the second unit with pps of the remaining units, Yates and Grundy's estimator of the variance is always positive.

Proof: For the sake of definiteness, we suppose that the units selected are y_1 and y_2. Then we have

    π_12 = p_1 p_2 [ 2 + p_1/(1 − p_1) + p_2/(1 − p_2) ],    (54)

    π_1 = p_1 [ 1 + p_2/(1 − p_2) + A ],  π_2 = p_2 [ 1 + p_1/(1 − p_1) + A ],    (55)

where

    A = p_3/(1 − p_3) + p_4/(1 − p_4) + ⋯ + p_N/(1 − p_N).    (56)

Yates and Grundy's estimator of the error variance then is

    V̂_YG = [(π_1π_2 − π_12)/π_12] (y_1/π_1 − y_2/π_2)².    (57)

For the two units selected it is easy to see that A is minimum when

    p_3/(1 − p_3) = ⋯ = p_N/(1 − p_N) = k/(N − 2 − k)    (58)


where

    k = 1 − p_1 − p_2

so that the minimum value A_min of A is given by

    A_min = (N − 2) k/(N − 2 − k).    (59)

Hence the numerator of (57) exceeds

    A_min² + (N − 1) k² / [ (1 − p_1)(1 − p_2)(N − 2 − k) ]    (60)

which is positive for N > 2. Thus Yates and Grundy's estimator of variance is positive in the two situations described before.
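Theorem 3 can also be checked numerically. The sketch below evaluates (51) and (52) for N = 4, n = 2 with the pps vector used in Section 7 and confirms that π_iπ_j − π_ij is positive for every pair, so each term of the Yates and Grundy estimator (5) is positive under scheme (i).

```python
from itertools import combinations

p = [0.1, 0.2, 0.3, 0.4]     # the population of Section 7
N, n = len(p), 2

def pi_joint(i, j):
    # Equation (51); for n = 2 it reduces to (p_i + p_j)/(N - 1).
    return (n - 1) / (N - 1) * ((N - n) / (N - 2) * (p[i] + p[j])
                                + (n - 2) / (N - 2))

def pi_single(i):
    # Equation (52).
    return (N - n) / (N - 1) * p[i] + (n - 1) / (N - 1)

# The Yates-Grundy factor pi_i*pi_j - pi_ij for every pair.
margins = {pair: pi_single(pair[0]) * pi_single(pair[1]) - pi_joint(*pair)
           for pair in combinations(range(N), 2)}
```

As sanity checks, the first-order probabilities of (52) sum to n, and (51) agrees with the direct enumeration π_ij = p_i/(N−1) + p_j/(N−1) for this scheme.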

7. RELATIVE PERFORMANCE OF THE ESTIMATORS

We shall now study the relative performance of the estimators proposed in this paper and those given by Horvitz and Thompson [4] and Yates and Grundy [6]. For this purpose we shall consider the following population given by Yates and Grundy:

    Unit    p      y
     1     .1    0.5
     2     .2    1.2
     3     .3    2.1
     4     .4    3.2

This population was deliberately chosen by them as being more extreme than will normally be encountered in practice. The object is to estimate the total of the population by taking a sample of two units. We shall study the performance of the estimators

    t_mean,  t,  t′,  t″,  Y_HT

given by (24), (15) and (21), (27) and (32), (40) and (45), and (1) respectively. The discussion is restricted to the two important cases (i) and (ii) given in Section 6. The results obtained are presented in Tables 1 and 2. As proved before, Yates and Grundy's estimate of variance V̂_YG(Y_HT) is positive in both cases, as is V̂(t_mean), which is always positive. Horvitz and Thompson's estimator V̂_HT(Y_HT) becomes negative twice under Case (i) and eight times under Case (ii). Certain other estimators (of variance) given in this paper also take negative values. With regard to the true variances themselves of the estimates, it is found that t_mean is best in Case (i) while t is best in Case (ii). In both cases, however, t_mean has a much smaller variance as compared with the variances of the other estimators. Judged by this and some other examples considered by the author, it appears that the estimator t_mean compares very favorably with Yates and Grundy's or Horvitz and Thompson's estimator Y_HT. It has besides the distinct advantage that for any sample size and for any system of probability selection, the unbiased estimate of the variance would not be



negative, while no general statement of this kind can be made for the other estimators. While the intention is not to show in this paper that the estimator t_mean is always better than other possible estimators, the main point is that, given some reasonably correlated measures of size, by sampling with probability proportionate to these sizes, the estimator t_mean will be superior from the point of view of possibly smaller variance, computational and algebraic simplicity, etc.

TABLE 1. UNBIASED ESTIMATES OF ERROR VARIANCE FOR CASE (i)

    Units   V̂(t″)    V̂(t′)    V̂(t)    V̂(t_mean)   V̂_YG(Y_HT)   V̂_HT(Y_HT)
    1, 2     2.06     45.80    -1.14      .20         1.51        -1.11
    1, 3    15.00    114.20     4.44      .81         4.34         2.26
    1, 4    59.75    241.80    10.06     6.50         7.33         6.45
    2, 1    -1.50      4.84    -1.76     2.72         1.51        -1.11
    2, 3    11.25     15.64     4.00      .56          .92          .78
    2, 4    56.20     34.20    19.84     5.76         3.05         3.93
    3, 1    -6.42       .51    -2.80     2.89         4.34         2.26
    3, 2    -6.84     -4.70    -2.29      .42          .92          .78
    3, 4    50.35    -13.58    20.78     5.52          .72         3.01
    4, 1   -14.24     -3.72    -3.96     2.72         7.33         6.45
    4, 2   -15.34    -13.15    -2.56      .36         3.05         3.93
    4, 3    -3.75    -24.82     5.00      .56          .72         3.01

    True error
    variance 6.222     9.701    3.619    2.223        2.884        2.884
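Several entries of Table 1 can be reproduced by complete enumeration of Case (i). The sketch below recovers the unbiasedness of t_mean, its true variance 2.223, and the tabulated entry .20 for the sample (1, 2) (the exact value is .2025, rounded in the table).

```python
from itertools import permutations

# Case (i): first draw pps, second draw with equal probability
# among the three remaining units.
p = [0.1, 0.2, 0.3, 0.4]
y = [0.5, 1.2, 2.1, 3.2]
Y = sum(y)                                 # 7.0

results = {}                               # ordered sample -> (prob, t_mean, v_hat)
for i, j in permutations(range(4), 2):
    prob = p[i] / 3                        # p_i times 1/3 for the second draw
    t1 = y[i] / p[i]                       # t_1 of (14)
    t2 = y[i] + 3 * y[j]                   # t_2 of (14): y_j / (1/3)
    t_mean = (t1 + t2) / 2                 # (24)
    v_hat = (t1 - t2) ** 2 / 4             # (25) with n = 2
    results[(i + 1, j + 1)] = (prob, t_mean, v_hat)

e = sum(pr * tm for pr, tm, _ in results.values())
v_true = sum(pr * tm * tm for pr, tm, _ in results.values()) - Y ** 2
```

Here `e` comes out to 7.0 and `v_true` to 2.223, agreeing with the last row of Table 1.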

TABLE 2. UNBIASED ESTIMATES OF ERROR VARIANCE FOR CASE (ii)

    Units   V̂(t″)    V̂(t′)    V̂(t)    V̂(t_mean)   V̂_YG(Y_HT)   V̂_HT(Y_HT)
    1, 2    -4.63     93.20     1.86      .20          .41        -6.21
    1, 3      .87    114.20     4.44      .81         1.52        -4.69
    1, 4    18.74    134.60     7.74     1.82         2.79        -0.58
    2, 1    -7.59     10.85    -1.32      .16          .41        -6.21
    2, 3      .23     12.02     1.88      .16          .36        -3.78
    2, 4    19.40     10.39     4.34      .64         1.08         1.20
    3, 1   -10.49     -3.17    -2.40      .49         1.52        -4.69
    3, 2    -8.26     -5.52    -1.42      .12          .36        -3.78
    3, 4    19.47    -12.80     1.86      .12          .18         5.01
    4, 1   -15.28     -9.86    -3.36      .81         2.79        -0.58
    4, 2   -12.32    -13.15    -2.56      .36         1.08         1.20
    4, 3     4.36    -17.01    -1.44      .09          .18         5.01

    True error
    variance 1.178     5.435     .316     .365         .823         .823

PART II

8. EXTENSION TO MULTISTAGE DESIGNS

Introduction. In the first part of this paper we have obtained some estimators in unistage designs when the units are selected with varying probabilities without replacement. Since large scale surveys are generally based on multistage designs, it would be appropriate to extend the estimators of the first part to such designs. The use of sampling with varying probabilities in subsampling designs was first suggested by Hansen and Hurwitz [2]. They found that a



subsampling design, in which only one first stage unit was selected from each stratum with probability proportionate to the measure of the size of the unit and a fixed number of second stage units was selected with equal probability from each of the selected first stage units, brought about marked improvement in the precision of the estimate as compared with sampling systems involving the use of equal probabilities. In another paper Hansen and Hurwitz [3] considered the problem of determining probabilities of selection which have optimum properties in the sense that the schemes provide maximum efficiency at a given cost. They however restricted themselves to the case in which the first stage units are selected with pps with replacement. To avoid the repetition of first stage units in order to achieve gains in efficiency, Midzuno [5] developed the scheme for the selection of n first stage units with pps from a stratum but without replacement for obtaining unbiased estimates of the population total. Midzuno, however, did not give unbiased estimates of the variance of the estimates. Horvitz and Thompson [4], though primarily interested in the unistage case, considered the multistage case where the first stage units are selected with varying probabilities and without replacement and random samples of predetermined sizes are chosen from the selected first stage units. If the population (or stratum) consists of N first stage units of which n units are sampled, and if for the ith unit in the population there is an estimator T_i based on sampling at second and subsequent stages such that

    E(T_i) = Y_i = total of the ith unit,

Horvitz and Thompson's estimator is

    Y_HT = Σ_{i=1}^{n} T_i/π_i    (61)

whose variance is

    (62)

an unbiased estimate of which is

    (64)

The estimator (64), like the corresponding one in the unistage case, can take negative values. Durbin [1] gave an alternative estimate of the variance following Yates and Grundy [6], who were interested in the unistage case only. Recasting the expression (63) as

(65)

the unbiased estimate of the variance obtained is

(66)



which is believed to be less often negative.

Summary of results obtained. Let the population (or stratum) consist of N first stage units

    u_1, u_2, ..., u_N    (67)

the total of the ith unit being Y_i. From these first stage units n units are selected without replacement with varying probabilities according to the scheme given in Section 2. We subsample a predetermined number of second stage units from each of the first stage units selected, and may proceed with further subsampling of the second stage units in a known and predetermined way, and so on. Suppose that for the ith first stage unit there is an estimator T_i (based on subsampling at second and subsequent stages) such that

    ε_2(T_i) = Y_i,  V_2(T_i) = σ_i²    (68)

where ε_2 and V_2 denote conditional expectations and variances over the second and subsequent stages.

Defining similarly ε_1(f) and V_1(f) as the expected value and variance of f over different samples of units at the first stage, we can easily prove:

Theorem 5. The expected value of f is equal to the expected value of the conditional expected values, i.e.,

    ε(f) = ε_1 ε_2(f).    (69)

It may be noted that, in terms of the operators E_1, E_2, ..., E_n of Part I, we have

    (70)

Theorem 6. The variance of f equals the sum of the expected value of the conditional variance and the variance of the conditional expected value, i.e.,

    V(f) = ε_1 V_2(f) + V_1 ε_2(f).    (71)

We note that, in terms of our operators of Part I,

    (72)

In order to estimate the total of the population (or stratum), one set of unbiased estimates is

    z_1 = T_1/p_{i1},
    z_2 = T_1 + T_2/p_{j2},
      ⋮
    z_n = T_1 + T_2 + ⋯ + T_{n−1} + T_n/p_{ln}    (73)

where T_1, T_2, ..., T_n are unbiased estimates of the totals of the respective first stage units drawn in the sample in this order.


Thus

    z = Σ_{i=1}^{n} c_i z_i,  Σ_{i=1}^{n} c_i = 1    (74)

is unbiased. We have

    (75)

where t_i is the corresponding unistage estimator discussed in Part I. With regard to the covariance of z_λ and z_μ, theorem 5 gives

    (76)

so that, unlike the unistage case, z_λ and z_μ are not uncorrelated. Considering the series of estimates

    v_1 = s_1²/p_{i1},
    v_2 = s_1² + s_2²/p_{j2},
      ⋮
    v_n = s_1² + s_2² + ⋯ + s_{n−1}² + s_n²/p_{ln},    (77)

where s_i² is an unbiased estimate of σ_i², it is easy to see that

    v = Σ_{i=1}^{n} c_i v_i,  Σ_{i=1}^{n} c_i = 1    (78)

is an unbiased estimate of Σ_{k=1}^{N} σ_k².

Hence an unbiased estimate of the variance of z is

    V̂(z) = z² − 2 Σ_{j>i=1}^{n} z_i z_j / [n(n−1)] + v.    (79)

In case c_1 = c_2 = ⋯ = c_n = 1/n, it is easy to see that V̂(z) is positive. Comparing (79) with the corresponding estimator (23) in the unistage case, we arrive at the following important rule for estimating the variance in multistage designs.

The estimate of variance in multistage sampling is the sum of two parts. The first part is equal to the estimate of variance calculated on the assumption that the first stage units have been measured without error. The second part is obtainable from the population total estimate itself by substituting the estimated variances for the estimates of the totals of the units.
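The construction (73) can be illustrated with a small two-stage enumeration. All of the numbers below (three first-stage units of two elements each, the pps draws, SRS subsampling of one element per selected unit) are illustrative assumptions, not a design from the paper.

```python
from itertools import permutations, product

units = [(1.0, 3.0), (2.0, 6.0), (4.0, 8.0)]   # first-stage units (element values)
p = [0.2, 0.3, 0.5]                             # first-draw probabilities
Y = sum(sum(u) for u in units)                  # population total, 24.0

e_z, total = 0.0, 0.0
for i, j in permutations(range(3), 2):
    p1 = p[i]
    p2 = p[j] / (1 - p[i])                      # second draw pps among the rest
    for a, b in product(range(2), repeat=2):    # which element is subsampled
        prob = p1 * p2 * 0.25                   # SRS of 1 out of 2, twice
        T1 = 2 * units[i][a]                    # unbiased: E2(T_i) = unit total
        T2 = 2 * units[j][b]
        z1 = T1 / p1                            # z_1 of (73)
        z2 = T1 + T2 / p2                       # z_2 of (73)
        e_z += prob * (z1 + z2) / 2             # z with c_1 = c_2 = 1/2
        total += prob
```

Summing over every first-stage order and every subsample recovers E(z) equal to the population total, illustrating the unbiasedness of (74) in the multistage setting.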

Another set of unbiased estimates is



    z_1′ = T_1/p_{i1},
    z_2′ = [1/(N−1)] (1/p_{i1}) (T_2/p_{j2}),
      ⋮
    z_n′ = [1/((N−1)(N−2)⋯(N−n+1))] (1/p_{i1}) (1/p_{j2}) ⋯ (T_n/p_{ln}).    (80)

It is easy to see that

    z′ = Σ_{i=1}^{n} c_i z_i′,  Σ_{i=1}^{n} c_i = 1    (81)

is an unbiased estimate. And

    (82)

whereas

    C(z_λ′, z_μ′) = [1/((N−λ)(N−λ+1)² ⋯ (N−1)²)] Σ (1/p_{i1}) Σ (1/p_{j2}) ⋯ Σ (Y_λ/p_{mλ}) (Y − Y_1 − ⋯ − Y_λ) − Y²    (83)

which is the same as obtained in the unistage case. In order to estimate the variance of z′ we notice that

    ε(T_i z_i′) = Σ_{k=1}^{N} Y_k² + Σ_{k=1}^{N} σ_k²,  (i = 1, 2, ..., n),    (84)

    ε(T_i z_j′) = (Y² − Σ_{k=1}^{N} Y_k²)/(N − 1),  (j > i = 1, 2, ..., n).    (85)

Considering the estimates

    v_1′ = s_1²/p_{i1},
    v_2′ = [1/(N−1)] (1/p_{i1}) (s_2²/p_{j2}),
      ⋮
    v_n′ = [1/((N−1)(N−2)⋯(N−n+1))] (1/p_{i1}) (1/p_{j2}) ⋯ (s_n²/p_{ln}),    (86)

it is found that

    v′ = Σ_{i=1}^{n} c_i v_i′,  Σ_{i=1}^{n} c_i = 1    (87)

is an unbiased estimate of Σ_{k=1}^{N} σ_k².



Hence an unbiased estimate of V(z′) is given by

    V̂(z′) = z′² − (1/n) Σ_{i=1}^{n} T_i z_i′ − [2(N−1)/(n(n−1))] Σ_{j>i=1}^{n} T_i z_j′ + v′.    (88)

Another set of unbiased estimates is

    z_1″ = T_1/P_{i1},  z_2″ = T_2/P_{j2},  ...,  z_n″ = T_n/P_{ln}    (89)

where P_{ij} is defined by (38). As before

    z″ = Σ_{i=1}^{n} c_i z_i″,  Σ_{i=1}^{n} c_i = 1    (90)

is unbiased for estimating Y. With regard to the variance of z″, we have

    V(z_n″) = Σ_{k=1}^{N} (Y_k² + σ_k²)/P_{kn} − Y².    (91)

The covariance between z_λ″ and z_μ″ is given by (42) in Part I. The expression for the variance of z″ is apparent. An unbiased estimate of the variance of z″ is given by

    V̂(z″) = z″² − 2 Σ′ z_λ″ z_μ″ / [n(n−1)] + v    (92)

where Σ′ denotes summation over the different pairs in the sample.

REFERENCES

[1] Durbin, J., "Some results in sampling theory when the units are selected with unequal probabilities," Journal of the Royal Statistical Society, Series B, 15, No. 2 (1953), 262-9.

[2] Hansen, M. H., and Hurwitz, W. N., "On the theory of sampling from finite populations," Annals of Mathematical Statistics, 14 (1943), 333-62.

[3] Hansen, M. H., and Hurwitz, W. N., "On the determination of optimum probabilities in sampling," Annals of Mathematical Statistics, 20 (1949), 426-32.

[4] Horvitz, D. G., and Thompson, D. J., "A generalization of sampling without replacement from a finite universe," Journal of the American Statistical Association, 47 (1952), 663-85.

[5] Midzuno, H., "On the sampling system with probability proportionate to sum of sizes," Annals of the Institute of Statistical Mathematics, 3 (1952), 99-107.

[6] Yates, F., and Grundy, P. M., "Selection without replacement from within strata with probability proportional to size," Journal of the Royal Statistical Society, Series B, 15, No. 1 (1953), 253-61.


