
An Empirical, General Population Assessment of the Properties of Variance Estimators of the Horvitz-Thompson Estimator Under Random-Order, Variable Probability Systematic Sampling

Stephen V. Stehman

Biometrics Unit, Cornell University, Ithaca, NY 14850 USA

W. Scott Overton

Department of Statistics, Oregon State University, Corvallis, Oregon 97331 USA

BU-936-M Revised August 1989


Technical Report 132

August 1989

DEPARTMENT OF STATISTICS, Oregon State University

Corvallis, Oregon

ABSTRACT

Previous empirical studies of the properties of variable probability, systematic sampling and the Horvitz-Thompson estimator, $\hat{T}_y$, have focused on specific, real-world populations (cf. Stehman and Overton, 1987; Cumberland and Royall, 1981; Rao and Singh, 1973). The study of special case populations is recognized as important, but these studies provide limited information that can be used to generalize to other populations. By a systematic simulation study of a specially designed set of populations, we have extended the assessment of the properties of $V(\hat{T}_y)$ and of estimators of $V(\hat{T}_y)$ to more general populations represented by the population space.

The population space is a standardized representation of bivariate populations with equal variance in each marginal distribution. Each representation of this space has a specific bivariate distribution shifted over the dimensions of the space. Three bivariate distribution forms, each with three correlations, were studied.

Two common estimators of $V(\hat{T}_y)$ were investigated, one proposed by Horvitz and Thompson (1952), the other by Yates and Grundy (1953) and Sen (1953). The sampling design studied was random-order, variable probability systematic. Both variance estimators require the pairwise inclusion probabilities. Calculating these pairwise probabilities exactly requires immense computing time, so two approximation formulas were studied, one due to Hartley and Rao (1962), the other due to Overton (1985). The two variance estimators were computed using each of the pairwise inclusion probability approximations.

Behavior of the estimators was represented by contour plots describing confidence interval coverage, root mean square error, relative bias, and proportion of negative estimates for the variance estimators over the range of populations in the population space. These plots provide the basis of a descriptive theory for the properties of the variance estimators. The population space approach successfully serves as a bridge between strictly special case, empirical results and a general analytical theory. The approach also furnishes a perspective from which more theoretical results can be pursued.

1. INTRODUCTION

Properties of the variance, and of estimators of the variance, of the Horvitz-Thompson estimator are investigated for variable probability, systematic sampling (hereafter denoted vps sampling). In a finite population of size $N$, assume that a response variable of interest, $y_i$, and an auxiliary variable, $x_i > 0$, are defined for each element of the universe. In variable probability sampling, a sample unit is selected with probability proportional to $x$. We will restrict attention to without-replacement, fixed sample size schemes. The probability that the $i$th population element will be selected in the sample is given by the inclusion probability $\pi_i = \sum_{\{s:\, i \in s\}} P_R(s)$, where $P_R(s)$ is the probability of selecting sample $s$ under sampling rule $R$. The probability of selecting both the $i$th and $j$th population units is the pairwise inclusion probability, $\pi_{ij} = \sum_{\{s:\, (i,j) \in s\}} P_R(s)$.

Horvitz and Thompson (1952) presented a general, finite population theory of estimation for variable probability sampling. The Horvitz-Thompson estimator, $\hat{T}_y = \sum_{u \in s} y_u/\pi_u$, is unbiased for the population total, $T_y = \sum_{i=1}^{N} y_i$. Two well-known estimators of the variance of $\hat{T}_y$ are the estimator proposed by Horvitz and Thompson,

$$v_{HT} = \sum_{i=1}^{n} \left(\frac{y_i}{\pi_i}\right)^2 (1 - \pi_i) + \sum_{i=1}^{n} \sum_{j \neq i} \frac{\pi_{ij} - \pi_i \pi_j}{\pi_{ij}} \, \frac{y_i}{\pi_i} \, \frac{y_j}{\pi_j},$$

and the estimator suggested by Yates and Grundy (1953) and Sen (1953),

$$v_{YG} = \sum_{i=1}^{n-1} \sum_{j=i+1}^{n} \frac{\pi_i \pi_j - \pi_{ij}}{\pi_{ij}} \left(\frac{y_i}{\pi_i} - \frac{y_j}{\pi_j}\right)^2,$$

where the summations for the two variance estimators are over the pairs of elements in the sample. Based on limited investigation, the estimator $v_{YG}$ is usually claimed superior to $v_{HT}$ because $v_{YG}$ has smaller sampling variance and is less likely to take on negative values (cf. Cassel et al. (1977, p. 166), Cochran (1977, p. 261)). The evidence for the superiority of $v_{YG}$ is sketchy. It is known that when the ratio $r_u = y_u/x_u$ is constant for all $u = 1, \ldots, N$, $\hat{T}_y = T_y$ for every sample, so $V(\hat{T}_y) = 0$. In this situation, $v_{YG} \equiv 0$, but $v_{HT}$ does not identically equal 0; being unbiased, $v_{HT}$ must be capable of negative values. Thus for populations in which $y$ is nearly proportional to $x$, $v_{YG}$ would appear to have smaller sampling variance.
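The two estimators can be sketched in a few lines of Python. This is a minimal illustration, not the GAUSS code used in the study: the function names are hypothetical, the pairwise probabilities are supplied by the caller (exactly or from an approximation), and the numbers in the example are made up rather than taken from a real design. The example reproduces the property just described: when $y_i/\pi_i$ is constant over the sample, $v_{YG}$ is exactly 0 while $v_{HT}$ is not.

```python
from itertools import combinations


def ht_total(y, pi):
    """Horvitz-Thompson estimate of the population total."""
    return sum(yi / p for yi, p in zip(y, pi))


def v_ht(y, pi, pij):
    """Horvitz-Thompson variance estimator.

    y, pi: sample values and first-order inclusion probabilities.
    pij:   function (i, j) -> pairwise inclusion probability, i != j.
    """
    n = len(y)
    z = [yi / p for yi, p in zip(y, pi)]          # expanded values y_i / pi_i
    total = sum(z[i] ** 2 * (1 - pi[i]) for i in range(n))
    for i in range(n):
        for j in range(n):
            if i != j:
                p = pij(i, j)
                total += (p - pi[i] * pi[j]) / p * z[i] * z[j]
    return total


def v_yg(y, pi, pij):
    """Yates-Grundy (Sen) variance estimator, summed over unordered pairs."""
    z = [yi / p for yi, p in zip(y, pi)]
    return sum(
        (pi[i] * pi[j] - pij(i, j)) / pij(i, j) * (z[i] - z[j]) ** 2
        for i, j in combinations(range(len(y)), 2)
    )


# Hypothetical sample in which y is proportional to pi
pi = [0.2, 0.4, 0.5]
y = [10 * p for p in pi]


def pairwise(i, j):
    # Made-up pairwise probabilities, for illustration only.
    return 0.9 * pi[i] * pi[j]


print(ht_total(y, pi))          # every y_i/pi_i equals 10, so about 30
print(v_yg(y, pi, pairwise))    # 0 up to rounding
print(v_ht(y, pi, pairwise))    # nonzero, about 123.3
```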

Several empirical studies have shown advantages for $v_{YG}$. Rao and Singh (1973) studied 34 natural populations using Brewer's probability proportional to $x$ selection procedure, and Cumberland and Royall (1981) examined random-order, variable probability systematic selection for 6 populations. Both studies found that $v_{HT}$ frequently resulted in negative estimates, and that the sampling variance of $v_{HT}$ was much larger for many of their populations. Stehman and Overton (1987a) presented some simulation results showing that the advantages of $v_{YG}$ were restricted to certain kinds of populations.

Clearly lacking in these comparisons of the two variance estimators, $v_{HT}$ and $v_{YG}$, is a general theory describing their properties. A theoretical analysis of confidence interval coverage, mean square error (MSE), bias, and proportion of negative estimates of the variance estimators has not been done. A major complication in the development of an analytic theory is that the pairwise inclusion probabilities depend on the particular finite population $x$'s. Writing specifically about variable probability, systematic sampling, Brewer (1963) concluded "... the selection probabilities for the various possible samples are functions of the sizes of all the population units and it is virtually impossible to construct an exact general theory." A thorough empirical investigation of the properties of these variance estimators is an intermediate step toward development of a more general theory.

The population space assessment described in Section 3 was an extensive simulation study of a specially designed set of populations. The sampling design investigated was random-order vps sampling with $n = 16$. Approximation formulas for the pairwise inclusion probabilities are commonly used in practice. Also, since all populations in the simulation study were relatively large ($N > 70$), computing the exact pairwise inclusion probabilities was not practical. Two approximation formulas for the pairwise inclusion probabilities were investigated,

$$\pi^{o}_{ij} = \frac{(n-1)\,\pi_i \pi_j}{n - \tfrac{1}{2}(\pi_i + \pi_j)} \qquad \text{(Overton, 1985)},$$

and the approximation $\pi^{hr}_{ij}$ of Hartley and Rao (1962).

Further description of the pairwise inclusion probability approximations and formulas for the estimators can be obtained from Stehman and Overton (1987b, 1989).
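Overton's approximation is simple to compute. The Python sketch below (hypothetical function names) builds $\pi^{o}_{ij}$ from first-order probabilities $\pi_i = n x_i / \sum_k x_k$ and checks two properties that follow from the formula: with equal $x$'s it reproduces the simple random sampling value $n(n-1)/[N(N-1)]$, and for moderately unequal $x$'s the row sums $\sum_{j \ne i} \pi^{o}_{ij}$ stay close to $(n-1)\pi_i$, the value any fixed-size design must satisfy. The $x$ values are made up, and the Hartley-Rao formula is not reproduced here.

```python
def inclusion_probs(x, n):
    """First-order probabilities pi_i = n * x_i / sum(x) for a size-n vps design."""
    total = sum(x)
    pi = [n * xi / total for xi in x]
    if max(pi) >= 1:
        raise ValueError("some pi_i >= 1; the vps design needs all pi_i < 1")
    return pi


def overton_pij(pi_i, pi_j, n):
    """Overton (1985) approximation to the pairwise inclusion probability."""
    return (n - 1) * pi_i * pi_j / (n - 0.5 * (pi_i + pi_j))


# Equal x's: the approximation reduces to the SRS value n(n-1)/(N(N-1)).
N, n = 100, 16
pi_eq = inclusion_probs([1.0] * N, n)
print(overton_pij(pi_eq[0], pi_eq[1], n), n * (n - 1) / (N * (N - 1)))

# Hypothetical unequal x's (evenly spaced from 1 to 3): compare row sums with (n-1) pi_i.
x = [1 + 2 * k / (N - 1) for k in range(N)]
pi = inclusion_probs(x, n)
worst = max(
    abs(sum(overton_pij(pi[i], pi[j], n) for j in range(N) if j != i)
        / ((n - 1) * pi[i]) - 1)
    for i in range(N)
)
print(f"largest relative row-sum error: {worst:.4f}")
```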

Properties of the variance estimators were obtained by simulation, using 5,000 replications of the sampling procedure for each investigated population. The number of replications was selected to provide a precise estimate of the coverage probabilities obtained by the variance estimators for constructing nominal 95% confidence intervals for $T_y$. With 5,000 replications, the standard deviation of the estimated proportion of confidence intervals covering the parameter is 0.003 when the true coverage is 0.95. Version 1.49 of the GAUSS Mathematical and Statistical System (Aptech Systems, Inc., Kent, WA) was used to run the simulations on IBM XT or AT computers.
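For readers who want to reproduce this kind of simulation, the following Python sketch shows one common way to carry out random-order vps systematic selection: randomly permute the frame, lay the $\pi_i$ end to end on $[0, n)$, and take the units containing the points $u, u+1, \ldots, u+n-1$ for a uniform random start $u$. The report does not describe its GAUSS implementation, so this is an assumed equivalent; the function name and the small test population are hypothetical. The last line checks the 0.003 figure, $\sqrt{0.95 \times 0.05 / 5000}$.

```python
import math
import random


def random_order_vps_sample(x, n, rng):
    """Draw a size-n random-order, variable probability systematic sample.

    Returns the list of selected unit indices. Requires n * x_i / sum(x) < 1.
    """
    total = sum(x)
    pi = [n * xi / total for xi in x]
    if max(pi) >= 1:
        raise ValueError("some pi_i >= 1")
    order = list(range(len(x)))
    rng.shuffle(order)                 # random ordering of the frame
    start = rng.random()               # random start u in [0, 1)
    sample, cum, k = [], 0.0, 0
    for unit in order:
        cum += pi[unit]                # unit occupies [cum - pi_unit, cum)
        while k < n and start + k < cum:
            sample.append(unit)        # pi_unit < 1, so at most one hit per unit
            k += 1
    if k < n:                          # guard against round-off in the final sum
        sample.append(order[-1])
    return sample


# Empirical inclusion frequencies should match pi_i (hypothetical sizes 1..10, n = 3).
rng = random.Random(1989)
x = list(range(1, 11))
n, reps = 3, 20000
counts = [0] * len(x)
for _ in range(reps):
    for unit in random_order_vps_sample(x, n, rng):
        counts[unit] += 1
pi = [n * xi / sum(x) for xi in x]
for i in range(len(x)):
    print(i, round(counts[i] / reps, 3), round(pi[i], 3))

# Monte Carlo standard deviation of an estimated coverage of 0.95 from 5,000 replications
print(round(math.sqrt(0.95 * 0.05 / 5000), 4))  # 0.0031
```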

Several procedures were used to validate the simulation programs and computing algorithms. Since $\hat{T}_y$ is unbiased for $T_y$, the estimated expected value of $\hat{T}_y$ was checked to make sure it was close to $T_y$. The computing formulas for the variance estimators were validated by setting $x_i = 1$ for all $i = 1, \ldots, N$, and verifying that the estimates matched those known for a simple random sample. If a computing formula or algorithm was changed during the course of the population space analysis, output from the modified program was checked to ensure that the new algorithm gave results equivalent to those of previously verified algorithms. Finally, the results obtained from the simulation programs matched those reported by Rao and Singh (1973) and Cumberland and Royall (1981) for their simulations using the same populations.
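The $x_i = 1$ check can be reproduced with a short Python sketch (hypothetical names; the sample values are made up). With equal $x$'s, $\pi_i = n/N$ and Overton's approximation gives $\pi^{o}_{ij} = n(n-1)/[N(N-1)]$, the exact simple random sampling value, so both $v_{HT}$ and $v_{YG}$ should reduce algebraically to the familiar estimator $N^2(1 - n/N)s^2/n$.

```python
from itertools import combinations
from statistics import variance


def v_ht(y, pi, pij):
    """Horvitz-Thompson variance estimator; pij(i, j) gives pairwise probabilities."""
    z = [yi / p for yi, p in zip(y, pi)]
    n = len(y)
    est = sum(z[i] ** 2 * (1 - pi[i]) for i in range(n))
    for i, j in combinations(range(n), 2):
        p = pij(i, j)
        est += 2 * (p - pi[i] * pi[j]) / p * z[i] * z[j]   # (i, j) and (j, i) terms
    return est


def v_yg(y, pi, pij):
    """Yates-Grundy variance estimator."""
    z = [yi / p for yi, p in zip(y, pi)]
    return sum((pi[i] * pi[j] - pij(i, j)) / pij(i, j) * (z[i] - z[j]) ** 2
               for i, j in combinations(range(len(y)), 2))


N, n = 100, 16
y = [3.1, 4.7, 2.2, 5.9, 4.4, 3.8, 6.1, 2.9, 5.0, 4.1, 3.3, 4.9, 5.5, 2.6, 3.7, 4.0]
pi = [n / N] * n                                   # x_i = 1 for every unit


def pij(i, j):
    # Overton (1985) approximation; with equal pi it equals n(n-1)/(N(N-1)).
    return (n - 1) * pi[i] * pi[j] / (n - 0.5 * (pi[i] + pi[j]))


v_srs = N ** 2 * (1 - n / N) * variance(y) / n   # statistics.variance uses divisor n - 1
print(v_ht(y, pi, pij), v_yg(y, pi, pij), v_srs)  # all three agree up to rounding
```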

2. GENERATION OF PSEUDO-RANDOM NUMBERS

Random numbers from the Uniform(0,1) and standard normal distributions were generated using the GAUSS functions RNDU and RNDN, respectively. Gamma random variables, $G$, were selected from a standard gamma distribution (gamma distribution with shape parameter $\alpha$ and scale parameter $\lambda$ set to 1). Only the standard gamma distribution was considered because any other gamma distribution can be obtained by scaling the standard gamma (cf. Kennedy and Gentle, 1980). For integer-valued $\alpha$, gamma random variables were generated by the following algorithm:

1) Generate $u_i$ from $U(0,1)$, $i = 1, \ldots, \alpha$.
2) Set $x_i = -\ln(u_i)$; $x_i$ is an exponential random variable with parameter $\theta = 1$.
3) $G = \sum_{i=1}^{\alpha} x_i$; the gamma random variable is the sum of $\alpha$ independent, identically distributed exponential random variables with $\theta = 1$.
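Steps 1 to 3 translate directly into Python. The sketch below is illustrative: Python's `random` module stands in for GAUSS's RNDU, and the function name is hypothetical. It uses $1 - u$ rather than $u$ only because `random.random()` can return 0. A standard gamma with parameter $\alpha$ has mean and variance $\alpha$, which the final lines compare against.

```python
import math
import random


def gamma_integer(alpha, rng):
    """Standard gamma variate (scale 1) for integer alpha: sum of alpha Exp(1) variates."""
    if alpha < 1 or int(alpha) != alpha:
        raise ValueError("alpha must be a positive integer")
    # 1 - rng.random() lies in (0, 1], so the logarithm is always defined.
    return sum(-math.log(1.0 - rng.random()) for _ in range(int(alpha)))


rng = random.Random(42)
draws = [gamma_integer(2, rng) for _ in range(20000)]
mean = sum(draws) / len(draws)
var = sum((d - mean) ** 2 for d in draws) / (len(draws) - 1)
print(round(mean, 2), round(var, 2))   # both should be near alpha = 2
```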

Random variables for standard gamma distributions with non-integer parameter $\alpha$, $0 < \alpha < 1$, were generated by algorithm GS provided by Kennedy and Gentle (1980, p. 213). Since a sum of $n$ independent standard gamma variables, each with parameter $\alpha_i$, is distributed as standard gamma with parameter $\sum_{i=1}^{n} \alpha_i$, gamma random variables from distributions with parameter $\alpha > 1$ were generated by summing independent gamma random variables generated by algorithm GS with proper choice of $\alpha_i$ and $n$.
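A Python sketch of the non-integer case, assuming that algorithm GS is the Ahrens-Dieter (1974) rejection method for $0 < \alpha < 1$ as it is usually stated. How $\alpha$ is split into the $\alpha_i$ is also an assumption, since the report says only that they were chosen properly. Function names are hypothetical.

```python
import math
import random


def gamma_gs(a, rng):
    """Standard gamma variate for 0 < a < 1 by the Ahrens-Dieter rejection method GS."""
    if not 0 < a < 1:
        raise ValueError("GS requires 0 < a < 1")
    b = (math.e + a) / math.e
    while True:
        p = b * rng.random()
        if p <= 1:
            x = p ** (1 / a)
            if rng.random() <= math.exp(-x):
                return x
        else:
            x = -math.log((b - p) / a)
            if rng.random() <= x ** (a - 1):
                return x


def gamma_sum(alpha, rng):
    """Standard gamma for any alpha > 0 as a sum of GS variates.

    Splits alpha into m = floor(alpha) + 1 equal parts, each below 1
    (one possible choice of the a_i; the report does not state its choice).
    """
    m = math.floor(alpha) + 1
    return sum(gamma_gs(alpha / m, rng) for _ in range(m))


rng = random.Random(7)
for alpha in (0.5, 2.6):
    draws = [gamma_sum(alpha, rng) for _ in range(20000)]
    print(alpha, round(sum(draws) / len(draws), 2))   # sample mean near alpha
```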

3. DESCRIPTION OF THE POPULATION SPACE

Three major families of populations were studied, one based on real data (STREAM), and two generated from known probability distributions (GAMNORM and BIGAMMA). Within each family, three different subfamilies representing low, medium, and high correlations between the response variable, $y$, and the design covariate, $x$, were studied. A subfamily was thus a set of populations with the same correlation, within the same major family.

All populations within a subfamily were created from a single base population. A subfamily was created from the base population by adding or subtracting constants to $x$ and/or $y$. Thus all populations in a subfamily are the same "cloud" of points shifted to various locations in the $(x, y)$-plane. All populations within a subfamily have $V_x = V_y$, where $V_x$ and $V_y$ are the population variances of $x$ and $y$, respectively, and populations within a subfamily also have the same $V_y$ and the same correlation between $x$ and $y$. Populations differing by an additive shift in the $x$'s have different inclusion probabilities. Additive shifts in the $x$'s are a feasible design tactic in some real surveys, and this feature of sample design can be explored conveniently in the context of the population space.

The variables $x$ and $y$ were standardized, $x' = x/\sqrt{V_x}$ and $y' = y/\sqrt{V_y}$, so comparisons would be invariant to the measurement scale of the variables. The standardized population centroid, $(\bar{X}', \bar{Y}')$, was used to locate populations within the population space. Note that $\bar{X}' = \bar{X}/\sqrt{V_x} = 1/cv(x)$ and $\bar{Y}' = \bar{Y}/\sqrt{V_y} = 1/cv(y)$, where $cv$ denotes the population coefficient of variation.
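A small Python sketch of how a base population can be moved to a chosen location in the population space by additive constants, as described above. The target centroid is given in standardized units; function names and the data are hypothetical.

```python
import math
from statistics import fmean, pvariance


def standardized_centroid(x, y):
    """(X-bar', Y-bar') = (mean(x)/sqrt(V_x), mean(y)/sqrt(V_y)) = (1/cv(x), 1/cv(y))."""
    return fmean(x) / math.sqrt(pvariance(x)), fmean(y) / math.sqrt(pvariance(y))


def shift_to_centroid(x, y, cx, cy):
    """Add constants to x and y so the population sits at standardized centroid (cx, cy).

    Additive shifts leave V_x, V_y, and corr(x, y) unchanged; only the centroid moves.
    """
    sx, sy = math.sqrt(pvariance(x)), math.sqrt(pvariance(y))
    dx = cx * sx - fmean(x)
    dy = cy * sy - fmean(y)
    return [xi + dx for xi in x], [yi + dy for yi in y]


# Hypothetical base population (not one of the report's families)
x = [1.2, 2.5, 0.7, 3.1, 1.9, 2.2, 0.9, 1.6]
y = [1.0, 2.9, 1.1, 2.7, 2.0, 2.6, 0.5, 1.8]
x7, y7 = shift_to_centroid(x, y, 7.0, 7.0)
print(standardized_centroid(x7, y7))     # approximately (7.0, 7.0)
```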

Confidence interval coverage, ratios of root mean square error (RMSE) of the variance estimators, and relative bias are invariant to the measurement scale of the $y$'s, and are, therefore, the same in the original and the standardized populations. The standardized population space is also appropriate for assessment of patterns of precision of $\hat{T}_y$. The standardized variance can be obtained easily from $V(\hat{T}_y)$, the variance of the Horvitz-Thompson estimator for the unstandardized variable $y$. For the standardized variable $y' = y/\sqrt{V_y}$,

$$V(\hat{T}_{y'}) = \frac{V(\hat{T}_y)}{V_y}.$$

A convenient representation of standardized variance is obtained by replacing $V_y$ by a quantity which is proportional to $V_y$, namely $V_{SRS}$, the variance of $\hat{T}_y$ for a simple random sample.
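As an illustration only, the standardized variance can be approximated without simulation by plugging Overton's $\pi^{o}_{ij}$ into the Yates-Grundy form of the population variance, $V(\hat{T}_y) = \sum_{i<j} (\pi_i\pi_j - \pi_{ij})(y_i/\pi_i - y_j/\pi_j)^2$, and dividing by $V_{SRS} = N^2(1 - n/N)S_y^2/n$. This is not how the report obtained Figure 4 (it used simulated samples for each population); the function name and the data are hypothetical. With equal $x$'s the ratio is exactly 1, and it is unchanged by rescaling $y$.

```python
from statistics import variance


def standardized_variance(x, y, n):
    """Approximate V(T_hat)/V_SRS under vps sampling of size n.

    V(T_hat) uses the Yates-Grundy population form with Overton's pi_ij
    substituted for the exact pairwise probabilities (an approximation).
    """
    N = len(x)
    total_x = sum(x)
    pi = [n * xi / total_x for xi in x]
    z = [yi / p for yi, p in zip(y, pi)]          # y_i / pi_i
    v_vps = 0.0
    for i in range(N):
        for j in range(i + 1, N):
            pij = (n - 1) * pi[i] * pi[j] / (n - 0.5 * (pi[i] + pi[j]))
            v_vps += (pi[i] * pi[j] - pij) * (z[i] - z[j]) ** 2
    v_srs = N ** 2 * (1 - n / N) * variance(y) / n  # S_y^2 with divisor N - 1
    return v_vps / v_srs


y = [float(k % 7 + 1) for k in range(40)]                  # hypothetical responses
x_equal = [1.0] * 40
x_prop = [yi + 0.5 * (k % 3) for k, yi in enumerate(y)]  # x roughly proportional to y

print(standardized_variance(x_equal, y, 8))     # 1.0 up to rounding: vps reduces to SRS
print(standardized_variance(x_prop, y, 8))      # well below 1
print(standardized_variance(x_prop, [3 * v for v in y], 8))  # unchanged by rescaling y
```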

4. FAMILIES OF POPULATIONS

4.1 STREAM Family

The first family of populations analyzed was constructed from a subset of the Phase I Stream Survey Pilot Study data (Messer et al., 1986). A previous study of this STREAM family was reported in Stehman and Overton (1987a). Some of the results shown here supersede that work.

Seventy-two of the 100 units from the Pilot Study sample were purposefully selected, yielding a base population with correlation 0.82 between the response variable, $y$ = length of stream reach, and the auxiliary variable, $x$ = direct watershed area of a stream reach. To create the base populations for the two other STREAM subfamilies, starting with the STREAM82 base population,

1) compute the least squares slope, $\beta$, and intercept, $a$, of the base population for subfamily STREAM82;
2) compute $e_i = y_i - \hat{y}_i$, the residual from the least squares fit, where $\hat{y}_i = a + \beta x_i$;
3) let $e_i^* = k e_i$ (multiply the residuals by a constant to obtain the specified correlation);
4) let $y_i^* = \hat{y}_i + e_i^*$.

The values $y_i^*$ were used as the response variable for the base population of the new subfamily. Choosing $k = 2.5$ resulted in a subfamily with $\rho = 0.50$ (subfamily STREAM50), and choosing $k = 0.25$ resulted in a subfamily with $\rho = 0.985$ (subfamily STREAM99). An advantage of this method of generating the base population was that the $x$'s were the same for each base population in the family. Thus the simulations were "blocked" in that each subfamily within the STREAM family had the same first and second order inclusion probabilities for all populations with common $\bar{X}'$.
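The four steps, together with the correlation they imply, can be sketched in Python. Because least squares residuals are uncorrelated with $x$, scaling them by $k$ gives $\rho^* = [1 + k^2(1-\rho_0^2)/\rho_0^2]^{-1/2}$; with $\rho_0 = 0.82$ this yields about 0.497 for $k = 2.5$ and 0.985 for $k = 0.25$, in line with the STREAM50 and STREAM99 values above. Function names are hypothetical, and the example data are made up, not the stream data.

```python
import math
from statistics import fmean, pvariance


def corr(u, v):
    """Population (divisor N) correlation coefficient."""
    mu, mv = fmean(u), fmean(v)
    cov = fmean([(a - mu) * (b - mv) for a, b in zip(u, v)])
    return cov / math.sqrt(pvariance(u) * pvariance(v))


def rescale_residuals(x, y, k):
    """Steps 1-4: least squares fit of y on x, multiply residuals by k, add back to fit."""
    mx, my = fmean(x), fmean(y)
    beta = fmean([(a - mx) * (b - my) for a, b in zip(x, y)]) / pvariance(x)
    a0 = my - beta * mx
    fitted = [a0 + beta * xi for xi in x]
    return [f + k * (yi - f) for f, yi in zip(fitted, y)]


def predicted_corr(r0, k):
    """Correlation after scaling residuals by k, given base correlation r0 > 0."""
    return 1 / math.sqrt(1 + k ** 2 * (1 - r0 ** 2) / r0 ** 2)


print(round(predicted_corr(0.82, 2.5), 3), round(predicted_corr(0.82, 0.25), 3))  # 0.497 0.985

# Hypothetical data: the realized correlation matches the closed form.
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
y = [1.4, 1.9, 3.5, 3.6, 5.6, 5.8]
y_star = rescale_residuals(x, y, 2.5)
print(corr(x, y_star), predicted_corr(corr(x, y), 2.5))   # equal up to rounding
```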

The STREAM family was created from data with unknown population distributional properties, thus limiting our ability to generalize to other populations. The empirical study was expanded to include families generated from known probability distributions to permit broader understanding. The next two families examined were constructed to represent distributions of random variables similar to those likely to be encountered in practice.

4.2 GAMNORM Family

For the GAMNORM family of populations, $x$ was randomly generated from a standard gamma distribution with parameters $\alpha = 2$ and $\lambda = 1$, and $y$ was generated, conditional on $x$, as a normal random variable. For each $x_i$, $y_i$ was obtained from the equation $y_i = \rho x_i + e_i$, where $e_i$ was a random variable distributed Normal$(0, \sigma_e^2)$, and $\sigma_e^2 = (1-\rho^2)\sigma_x^2$. The infinite population notation is appropriate because $\sigma^2$ denotes the population variance of the distribution of the generated random variable. Once $\rho(x, y)$ was specified, the value of $\sigma_e^2$ was fixed by the following argument. If the relationship between $y$ and $x$ is given by

$$y_i = \beta x_i + e_i, \qquad (1)$$

then $\sigma_y^2 = \beta^2 \sigma_x^2 + \sigma_e^2$ ($x_i$ independent of $e_i$), and

$$\frac{\sigma_y^2}{\sigma_x^2} = \beta^2 + \frac{\sigma_e^2}{\sigma_x^2}. \qquad (2)$$

Imposing the constraint that $\sigma_y^2 = \sigma_x^2$, $\beta = \sigma_{xy}/\sigma_x^2 = \rho\sigma_y/\sigma_x = \rho$, and using equation (2),

$$\beta^2 = 1 - \frac{\sigma_e^2}{\sigma_x^2}. \qquad (3)$$

Solving (3) for $\sigma_e^2$ yields

$$\sigma_e^2 = (1-\rho^2)\,\sigma_x^2. \qquad (4)$$

In practice, a set of $x$'s was generated and $V_x$ calculated. Then $V_x$ from the particular set of $x$'s was used instead of $\sigma_x^2$ in (4) for generation of the $y$'s. A subfamily base population was created by specifying $\rho$, generating the set of $e$'s, and forming the variable $y_i$ from (1). The target correlations were 0.5, 0.8, and 0.95, but due to the random data generation, the realized correlations were 0.48, 0.75, and 0.94.
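A Python sketch of the GAMNORM construction. The standard library's `random.gammavariate` and `random.gauss` stand in for the generators described in Section 2 (a substitution, not the report's code). Following the text, one shared set of $x$'s is generated and $V_x$ from that set replaces $\sigma_x^2$ in (4). Function names are hypothetical, and the realized correlations will differ from run to run, just as 0.48, 0.75, and 0.94 differed from the targets.

```python
import math
import random
from statistics import fmean, pvariance


def gamnorm_y(x, rho, rng):
    """Responses for one GAMNORM subfamily: y_i = rho * x_i + e_i (equation (1), beta = rho).

    e_i ~ Normal(0, (1 - rho^2) V_x), with V_x from the realized x's (equation (4)).
    """
    sd_e = math.sqrt((1 - rho ** 2) * pvariance(x))
    return [rho * xi + rng.gauss(0.0, sd_e) for xi in x]


def corr(u, v):
    """Population (divisor N) correlation coefficient."""
    mu, mv = fmean(u), fmean(v)
    cov = fmean([(a - mu) * (b - mv) for a, b in zip(u, v)])
    return cov / math.sqrt(pvariance(u) * pvariance(v))


rng = random.Random(1989)
x = [rng.gammavariate(2.0, 1.0) for _ in range(100)]   # one shared set of 100 x's
for rho in (0.5, 0.8, 0.95):
    y = gamnorm_y(x, rho, rng)
    print(rho, round(corr(x, y), 2))   # realized values scatter around the targets
```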

The same set of 100 $x$'s was used as the base population for all three subfamilies. Using a single set of $x$'s again provided "blocking" on the inclusion probabilities of the vps sampling design. The GAMNORM subfamily base populations could have been created using steps similar to those described in Section 4.1 to create the STREAM subfamily base populations. This change would have provided an additional level of blocking among the GAMNORM subfamilies.

4.3 BIGAMMA Family

The third family studied consisted of populations selected from a bivariate gamma distribution. Johnson and Kotz (1970, pp. 216-218) provide the following basic theory for generation of bivariate gamma random variables. As before, the standard gamma, with $\lambda = 1$, is used throughout.

If $W_0$, $W_1$, and $W_2$ are independent random variables, with $W_j$ distributed Gamma$(\theta_j)$, and if $X = W_0 + W_1$ and $Y = W_0 + W_2$, then $X$ is distributed Gamma$(\theta_0 + \theta_1)$, $Y$ is distributed Gamma$(\theta_0 + \theta_2)$, $(X, Y)$ is distributed Bivariate Gamma, and

$$\rho(X, Y) = \theta_0 \left[(\theta_0 + \theta_1)(\theta_0 + \theta_2)\right]^{-1/2}.$$

Generating bivariate random variables based on this result permitted specifying $\rho(X, Y)$ and provided marginal distributions of $X$ and $Y$ that were both gamma distributions. The parameter for both standard gamma marginal distributions was $\alpha = 2$. For this parameter specification, $\theta_1 = \theta_2 = 2 - \theta_0$, and $\sigma_x^2 = \sigma_y^2$. Setting $\theta_1 = \theta_2 = \theta$, we obtain $\rho(X, Y) = \theta_0/(\theta + \theta_0)$. Then for a specified $\rho$, $\theta_0 = \rho\theta/(1-\rho)$. Finally, solving for $\theta$ and $\theta_0$ yields $\theta = 2 - \theta_0$ and $\theta_0 = 2\rho$. $\theta$ and $\theta_0$ are the parameters needed to generate the bivariate gamma random variables.

The algorithm used to generate a bivariate gamma base population with specified $\rho(x, y)$ and marginal gamma distributions each with parameter $\alpha = 2$ was:

1) generate $W_0$ distributed Gamma$(2\rho)$;
2) generate $W_1$ distributed Gamma$(2(1-\rho))$;
3) generate $W_2$ distributed Gamma$(2(1-\rho))$;
4) set $x = W_0 + W_1$ and $y = W_0 + W_2$.

If a population with large $x$ values was generated so that at least one of the sampling units would be selected with probability 1 in a sample of size 16, that population was discarded and a new base population was generated. The three $\rho$'s specified were 0.5, 0.75, and 0.95, and the actual realized correlations were 0.49, 0.77, and 0.97.
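The four steps, plus the discard rule, can be sketched in Python. Again `random.gammavariate` stands in for the report's own gamma generators, the function name is hypothetical, and the population size is a parameter because the text does not state the BIGAMMA $N$ (100 in the example is an assumption).

```python
import math
import random
from statistics import fmean, pvariance


def bigamma_base(n_units, rho, n_sample, rng):
    """Bivariate gamma base population with Gamma(2) marginals and correlation rho.

    Uses x = W0 + W1, y = W0 + W2 with W0 ~ Gamma(2*rho), W1, W2 ~ Gamma(2*(1 - rho)).
    """
    if not 0 < rho < 1:
        raise ValueError("rho must lie in (0, 1)")
    theta0, theta = 2 * rho, 2 * (1 - rho)
    while True:
        w0 = [rng.gammavariate(theta0, 1.0) for _ in range(n_units)]
        x = [w + rng.gammavariate(theta, 1.0) for w in w0]
        y = [w + rng.gammavariate(theta, 1.0) for w in w0]
        # Discard populations in which some unit would have pi_i >= 1 for n_sample.
        if n_sample * max(x) / sum(x) < 1:
            return x, y


def corr(u, v):
    """Population (divisor N) correlation coefficient."""
    mu, mv = fmean(u), fmean(v)
    cov = fmean([(a - mu) * (b - mv) for a, b in zip(u, v)])
    return cov / math.sqrt(pvariance(u) * pvariance(v))


rng = random.Random(97)
for rho in (0.5, 0.75, 0.95):
    x, y = bigamma_base(100, rho, 16, rng)   # N = 100 is an assumption
    print(rho, round(corr(x, y), 2))
```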

For the BIGAMMA family, a different set of $x$'s was generated for each of the subfamily base populations. To obtain both marginal gamma distributions with parameter $\alpha = 2$, this bivariate random variable generation algorithm required generating a new population of $x$'s for each base population. To obtain blocking on the $x$'s, another algorithm for generating the bivariate random variables would be necessary.

5. RESULTS OF POPULATION SPACE ANALYSIS

Notation:
$\bar{X}'$, $\bar{Y}'$: population standardized means of $x$ and $y$
$\hat{T}_y$: Horvitz-Thompson estimator of the population total
$V(\hat{T}_y)$: variance of the Horvitz-Thompson estimator

Approximation Formulas
$\pi^{hr}_{ij}$: approximate formula for $\pi_{ij}$ (Hartley and Rao, 1962)
$\pi^{o}_{ij}$: approximate formula for $\pi_{ij}$ (Overton, 1985)

Variance Estimators
$v_{HT}$: Horvitz-Thompson variance estimator
$v_{YG}$: Yates-Grundy variance estimator
$v^{hr}_{HT}$: Horvitz-Thompson variance estimator calculated using $\pi^{hr}_{ij}$
$v^{o}_{HT}$: Horvitz-Thompson variance estimator calculated using $\pi^{o}_{ij}$
$v^{hr}_{YG}$: Yates-Grundy variance estimator calculated using $\pi^{hr}_{ij}$
$v^{o}_{YG}$: Yates-Grundy variance estimator calculated using $\pi^{o}_{ij}$

Scatter plots of the population located at $(\bar{X}', \bar{Y}') = (7, 7)$ for each subfamily are shown in Figure 1. Population quantile-quantile plots for the middle correlation subfamilies at $(\bar{X}', \bar{Y}') = (7, 7)$ are shown for the variable $x$ in Figure 2 and for the variable $y$ in Figure 3.

5.1 Comparison of Variance Estimators

The criteria for comparison of the variance estimators were:

1) confidence interval coverage obtained by nominal 95% intervals calculated as $\hat{T}_y \pm 1.96\sqrt{v}$;
2) estimated MSE;
3) relative bias, estimated by
$$\frac{\hat{E}(v) - \hat{V}(\hat{T}_y)}{\hat{V}(\hat{T}_y)},$$
where $\hat{E}(v)$ was the simulated expected value of $v$, and $\hat{V}(\hat{T}_y)$ was an unbiased estimator of $V(\hat{T}_y)$ obtained from the simulations;
4) probability of a sample resulting in negative $v^{hr}_{HT}$.
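The four criteria reduce to simple summaries of the simulated pairs $(\hat{T}_y, v)$. The Python sketch below is one way to compute them; the function name is hypothetical, the four replications in the example are made up, and two details the report does not spell out are labeled as assumptions in the code: how $V(\hat{T}_y)$ was estimated from the simulations, and how intervals were handled when $v < 0$.

```python
import math


def evaluate(t_hats, v_hats, t_true):
    """Summarize simulated (T_hat, v) pairs by the four comparison criteria.

    V(T_hat) is estimated by the mean of (T_hat - T)^2, which is unbiased because
    T_hat is unbiased and T is known in a simulation (an assumed choice).
    Samples with v < 0 are counted as not covering (also an assumption).
    """
    r = len(t_hats)
    v_true = sum((t - t_true) ** 2 for t in t_hats) / r
    covered = sum(
        1 for t, v in zip(t_hats, v_hats)
        if v >= 0 and abs(t - t_true) <= 1.96 * math.sqrt(v)
    )
    mean_v = sum(v_hats) / r
    return {
        "coverage": covered / r,
        "rmse": math.sqrt(sum((v - v_true) ** 2 for v in v_hats) / r),
        "relative_bias": (mean_v - v_true) / v_true,
        "prop_negative": sum(1 for v in v_hats if v < 0) / r,
    }


# Four hypothetical replications with T = 10
print(evaluate([9.0, 11.0, 10.0, 14.0], [1.0, 1.0, 4.0, -1.0], 10.0))
# coverage 0.75, rmse about 3.708, relative_bias about -0.722, prop_negative 0.25
```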

The behavior surfaces were described by a battery of contour plots generated by the interpolation and contour plotting routines supplied by the SURFER software package (Golden Software, Inc., P. O. Box 281, Golden, Colorado). The kriging option in SURFER was selected to create a regularly spaced grid from the irregularly spaced input data. The octant search option in SURFER, using the 10 nearest data points, was used for interpolating grid points.

5.2 Interpretation of Contour Plots

To guide the reader's interpretation of the contour plots, certain important features of the plots will be highlighted. The standard diagonal serves as a convenient spatial reference. Although many details of specific plots are discussed in Sections 5.2.1 through 5.2.5, focus on the overall patterns should be maintained. Figures 4 through 17 are organized such that each column on a page represents a family, and the rows represent subfamilies, arranged in the column by increasing correlation. Enlarged plots of the lower left corner are included to show the detail in that portion of the population space. All plots are located at the end of Section 5.


5.2.1 Standardized variance

The standardized variance compared the variance of $\hat{T}_y$ under random-order vps sampling to the variance of $\hat{T}_y$ under simple random sampling (Figure 4). The qualitative pattern of standardized variance was similar for all three families. The standardized variance surface was highest along the left edge of the population space, then sloped downward moving diagonally from the upper left to the lower right corner. A trough of minimum standardized variance was located near the standard diagonal for the medium and high correlation subfamilies, but was clearly below the standard diagonal for the low correlation subfamilies. The surface sloped gradually upward out of this trough when moving toward the lower right-hand corner.

The region in which variable probability sampling was more efficient than simple random sampling was larger for high and medium correlation subfamilies compared to the low correlation subfamilies. The contour showing equal precision for vps sampling and simple random sampling was almost directly over the standard diagonal for the three low correlation subfamilies, and the advantage of variable probability sampling increased with $\rho(x, y)$. In the upper left region of the population space, variable probability sampling was much less efficient than simple random sampling.


5.2.2 Confidence interval coverage

The main results observed from the contour plots (Figures 5-8) of observed confidence interval coverage (nominal 95% intervals) obtained from each variance estimator were:

1) $v^{o}_{YG}$ and $v^{hr}_{YG}$ provided similar coverage over the entire population space;
2) $v^{hr}_{HT}$ provided the poorest coverage of the 4 estimators studied;
3) $v^{o}_{HT}$ provided close to the nominal 95% coverage for most of the population space, but the pattern of $v^{o}_{HT}$ coverage differed from the pattern shown by $v^{o}_{YG}$ and $v^{hr}_{YG}$;
4) coverage was poorest along the extreme left edge of the population space for all variance estimators except $v^{hr}_{HT}$;
5) the relief of the surfaces increased with subfamily correlation;
6) qualitative patterns in coverage were similar for all three families.

For most of the population space, coverage provided by $v^{o}_{HT}$ was near the nominal 95% (Figure 5). Regions of low coverage occurred along the extreme left edge of the population space, and in the high correlation subfamilies with small $\bar{Y}'$. A wide plateau of high coverage extended from the upper right corner toward the lower left corner, narrowing toward the origin roughly parallel to the standard diagonal. The coverage surface sloped steeply downward off the left edge of the high plateau, the contours nearly parallel to the vertical axis. Another sharp decline in the surface occurred along the standard diagonal in the region near the origin. The downward slope of coverage off the high plateau was much gentler toward the lower right region of the population space. The gradients of the coverage surfaces were steeper as the subfamily correlation increased. Regions where coverage provided by $v^{o}_{HT}$ was higher than, or much lower than, the 95% nominal level were associated with regions of large positive and large negative relative bias of $v^{o}_{HT}$.

Coverage provided by $v^{hr}_{HT}$ was generally much worse than coverage provided by $v^{o}_{HT}$ (Figure 7). $v^{hr}_{HT}$ had very poor coverage in the region surrounding the standard diagonal. This region of poor coverage of $v^{hr}_{HT}$ was associated with a region of high probability of negative estimates (see Figure 17). Coverage levels for $v^{hr}_{HT}$ were unacceptably low for the high correlation subfamilies.

The coverage provided by $v^{o}_{YG}$ and $v^{hr}_{YG}$ (Figures 6 and 8) was 93 or 94% for most of the population space. $v^{o}_{YG}$ and $v^{hr}_{YG}$ had lower coverage than $v^{o}_{HT}$ along the extreme left edge of the population space, but both Yates-Grundy based estimators improved on the coverage of $v^{o}_{HT}$ in the high correlation subfamilies in the region near the horizontal axis.


5.2.3 Ratios of RMSE

Comparison of the variance estimators on the criterion of RMSE (Figures 9-12) was based on selected ratios of RMSEs. The main features of the RMSE comparisons were:

1) $v^{hr}_{YG}$ had smaller RMSE than $v^{hr}_{HT}$ for most of the population space, but RMSE of $v^{hr}_{HT}$ was less than or equal to RMSE of $v^{hr}_{YG}$ in some regions of the population space;
2) $v^{o}_{YG}$ almost always had smaller RMSE than $v^{o}_{HT}$;
3) $v^{o}_{HT}$ was far superior to $v^{hr}_{HT}$ along the standard diagonal, and never much poorer than $v^{hr}_{HT}$ in any region;
4) $v^{o}_{YG}$ and $v^{hr}_{YG}$ had very similar RMSE, with $v^{o}_{YG}$ having slightly smaller RMSE in populations located near the origin;
5) ratios of RMSEs showed greater variation over the population space in the high correlation subfamilies compared to the low correlation subfamilies.

The surface of the ratio RMSE($v^{hr}_{HT}$)/RMSE($v^{hr}_{YG}$) was roughly symmetrical about the standard diagonal. A region of high ratios extended along the standard diagonal, sloping downward from the upper right to the lower left. The gradient of this slope increased with the subfamily correlation, and was particularly steep in STREAM99. Throughout much of the space pictured in Figure 9, $v^{hr}_{YG}$ had smaller RMSE than $v^{hr}_{HT}$. Near the origin, $v^{hr}_{HT}$ had RMSE less than or equal to that of $v^{hr}_{YG}$ in the low correlation subfamilies, so $v^{hr}_{YG}$ was not uniformly superior to $v^{hr}_{HT}$ on the basis of the RMSE criterion. However, $v^{hr}_{YG}$ was never much worse than $v^{hr}_{HT}$ in terms of RMSE, while $v^{hr}_{HT}$ could be extremely poor relative to $v^{hr}_{YG}$. Since relative bias was nearly zero for both $v^{hr}_{HT}$ and $v^{hr}_{YG}$, these RMSE comparisons were essentially equivalent to variance comparisons.

RMSE of $v^{o}_{HT}$ was less than the RMSE of $v^{hr}_{HT}$ in the region surrounding the standard diagonal (Figure 10). A prominent feature of the surface of the ratio of RMSE of $v^{o}_{HT}$ to RMSE of $v^{hr}_{HT}$ was a deep, V-shaped trough along the standard diagonal, sloping downward and widening toward the upper right corner. This trough was deepest in the high correlation subfamilies, as the RMSE advantage of $v^{o}_{HT}$ relative to $v^{hr}_{HT}$ increased with the subfamily correlation. RMSE of $v^{hr}_{HT}$ was smaller than RMSE of $v^{o}_{HT}$ along the left edge of the population space, and in the region near the origin of the low and medium correlation subfamilies. Regions of superiority of $v^{o}_{HT}$ relative to $v^{hr}_{HT}$ corresponded to the regions in Figure 9 where $v^{hr}_{YG}$ was far superior to $v^{hr}_{HT}$. Thus the approximation $\pi^{o}_{ij}$ improved the RMSE of the Horvitz-Thompson variance estimator in those regions of the population space where $v^{hr}_{HT}$ had high RMSE.

RMSE of $v^{o}_{YG}$ was smaller than RMSE of $v^{o}_{HT}$ for most of the population space (Figure 11). The surface of the ratio of RMSEs of $v^{o}_{YG}$ to $v^{o}_{HT}$ decreased gradually from the upper left corner to the lower right corner of the population space, the contours of the surface running roughly parallel to the standard diagonal. The detailed plots of the region near the origin indicated a U-shaped ridge sloping gradually downward toward the origin along the standard diagonal. Although the gradients in the surfaces increased with correlation, the surfaces were less steep than those observed in Figures 9 and 10.

$v^{hr}_{YG}$ and $v^{o}_{YG}$ had almost identical RMSE throughout the population space (Figure 12). $v^{o}_{YG}$ had slightly smaller RMSE in populations located near the origin. This pattern was consistent for all three subfamilies.

5.2.4 Relative Bias

Of the four variance estimators investigated, only $v^{o}_{HT}$ displayed a significant relative bias (Figure 13). The other three variance estimators were nearly unbiased (Figures 14-16), with the exception that relative bias of $v^{o}_{YG}$ was above -0.10 along the extreme left edge of the population space near the origin for two of the STREAM subfamilies. The pattern of relative bias of $v^{o}_{HT}$ was similar in all families. Relative bias of $v^{o}_{HT}$ decreased from a high positive value in the upper left region to a high negative value in the lower right region, and the magnitude of the bias was largest in the high correlation subfamilies. $v^{o}_{HT}$ was unbiased in a band of the population space located just below and roughly parallel to the standard diagonal. The strong pattern in the bias of $v^{o}_{HT}$ suggests that an adjustment of the estimator to make it unbiased may be available.

5.2.5 Probability of Negative $v^{hr}_{HT}$ Estimates

For all samples obtained in the simulation study, $v^{o}_{YG}$ and $v^{hr}_{YG}$ were non-negative, and negative $v^{o}_{HT}$ estimates were extremely rare. Only $v^{hr}_{HT}$ was subject to frequent negative estimates (Figure 17). Negative $v^{hr}_{HT}$ estimates were rare for the low correlation subfamilies, but increased in frequency as the subfamily correlation increased. Negative estimates were infrequent in all subfamilies along the left edge of the population space and along the horizontal axis. The probability of a negative estimate was highest along the standard diagonal, and increased along this diagonal from the lower left to the upper right.


LIST OF FIGURES

1 Scatter Plots of Subfamily Populations at Standardized Centroid (7,7).
2 Population Quantile-Quantile Plots of the Variable x for the 3 Families.
3 Population Quantile-Quantile Plots of the Variable y for the 3 Families.
4 Standardized Variance, $V(\hat{T}_y)/V_{SRS}$.
5 Confidence Interval Coverage Obtained using $v^{o}_{HT}$ (nominal 95% intervals).
6 Confidence Interval Coverage Obtained using $v^{o}_{YG}$ (nominal 95% intervals).
7 Confidence Interval Coverage Obtained using $v^{hr}_{HT}$ (nominal 95% intervals).
8 Confidence Interval Coverage Obtained using $v^{hr}_{YG}$ (nominal 95% intervals).
9 Ratios of Root Mean Square Errors: RMSE($v^{hr}_{HT}$)/RMSE($v^{hr}_{YG}$).
10 Ratios of Root Mean Square Errors: RMSE($v^{o}_{HT}$)/RMSE($v^{hr}_{HT}$).
11 Ratios of Root Mean Square Errors: RMSE($v^{o}_{HT}$)/RMSE($v^{o}_{YG}$).
12 Ratios of Root Mean Square Errors: RMSE($v^{o}_{YG}$)/RMSE($v^{hr}_{YG}$).
13 Relative bias of $v^{o}_{HT}$.
14 Relative bias of $v^{o}_{YG}$.
15 Relative bias of $v^{hr}_{HT}$.
16 Relative bias of $v^{hr}_{YG}$.
17 Probability of a Sample with Negative $v^{hr}_{HT}$.

Figure 1. Scatter Plots of Subfamily Populations at Standardized Centroid (7,7).

[Scatter plots of y against x, one panel per subfamily: STREAM (corr = 0.50, 0.82, 0.99), GAMNORM (corr = 0.48, 0.75, 0.94), BIGAMMA (corr = 0.49, 0.77, 0.97).]

-25-

Figure 2. Population Quantile-Quantile Plots of the Variable x for the 3 Families.

(Middle correlation subfamilies at (X', Y') = (7, 7).)

[Quantile-quantile plot panels not reproduced; legible panel labels: GAMNORM75, BIGAMMA77.]

Figure 3. Population Quantile-Quantile Plots of the Variable y for the 3 Families.

(Middle correlation subfamilies at (X', Y') = (7, 7).)

[Quantile-quantile plot panels not reproduced; legible panel labels: GAMNORM75, BIGAMMA77.]

Figure 4. Standardized Variance, $V(T_\pi)/V_{SRS}$.

a) Complete Population Space.

b) Enlargement of Lower Left Corner.

(Contours plotted: 0.25, 0.50, 1.0, and 3.0.)

[Contour plots for the nine subfamilies not reproduced.]

Figure 5. Confidence Interval Coverage Obtained using $v_{HT}$ (nominal 95% intervals).

(Contours plotted: 85, 90, 93, 95, and 97)

[Contour plots for the nine subfamilies not reproduced.]


Figure 6. Confidence Interval Coverage Obtained using $v_{YG}$ (nominal 95% intervals).

[Contour plots for the nine subfamilies not reproduced.]

Figure 7. Confidence Interval Coverage Obtained using $v^{hr}_{HT}$ (nominal 95% intervals).

a) Complete Population Space.

b) Enlargement of Lower Left Corner.

[Contour plots for the nine subfamilies not reproduced.]


Figure 8. Confidence Interval Coverage Obtained using $v^{hr}_{YG}$ (nominal 95% intervals).

a) Complete Population Space.

b) Enlargement of Lower Left Corner.

[Contour plots for the nine subfamilies not reproduced.]

Figure 9. Ratios of Root Mean Square Errors: RMSE($v^{hr}_{HT}$)/RMSE($v^{hr}_{YG}$).

a) Complete Population Space.

(Contours plotted: 0.8, 1.0, 1.2, 2.0, 5.0)

[Contour plots for the nine subfamilies not reproduced.]

Figure 10. Ratios of Root Mean Square Errors: RMSE($v_{HT}$)/RMSE($v^{hr}_{HT}$).

a) Complete Population Space.

b) Enlargement of Lower Left Corner.

[Contour plots for the nine subfamilies not reproduced.]

Figure 11. Ratios of Root Mean Square Errors: RMSE($v_{HT}$)/RMSE($v_{YG}$).

a) Complete Population Space.

b) Enlargement of Lower Left Corner.

[Contour plots for the nine subfamilies not reproduced.]

Figure 12. Ratios of Root Mean Square Errors: RMSE($v_{YG}$)/RMSE($v^{hr}_{YG}$).

a) Complete Population Space.

b) Enlargement of Lower Left Corner.

(Contours plotted: 0.8, 1.0, 1.2, 2.0, 5.0)

[Contour plots for the nine subfamilies not reproduced.]

Figure 13. Relative Bias of $v_{HT}$.

a) Complete Population Space.

b) Enlargement of Lower Left Corner.

(Contours plotted: -0.10, 0.0, 0.10)

[Contour plots for the nine subfamilies not reproduced.]

Figure 14. Relative Bias of $v_{YG}$.

a) Complete Population Space.

[Contour plots for the nine subfamilies not reproduced.]

Figure 15. Relative Bias of $v^{hr}_{HT}$.

[Contour plots not reproduced.]

Figure 16. Relative Bias of $v^{hr}_{YG}$.

[Contour plots not reproduced.]

Figure 17. Probability of a Sample with Negative $v^{hr}_{HT}$.

(Contours plotted: 0.05, 0.10, 0.20, 0.30)

[Contour plots for the nine subfamilies not reproduced.]

6. ANALYSIS AND DESIGN ISSUES

The exploration of the population space revealed some potentially useful survey design and analysis considerations for random-order, vps sampling. Given information about the correlation, the population centroid, and the distributions of x and y, the population space assessment provides guidance on the choice of a variance estimator for specified survey objectives. Example recommendations are:

1) if the population is such that variable probability sampling has better precision than simple random sampling, $v^{hr}_{HT}$ should not be used;

2) none of the variance estimators works well if the population is located near the extreme left edge of the population space, but the random-order, vps design is inefficient in this circumstance and should be avoided (see the later comments on shifting populations out of this region);

3) $v_{HT}$ provides confidence intervals with good coverage for most populations, sometimes at the expense of positive bias and wider intervals than those obtained using $v_{YG}$ or $v^{hr}_{YG}$;

4) $v_{YG}$ is recommended over $v^{hr}_{YG}$ because the two estimators have similar properties and $v_{YG}$ is easier to compute.
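Recommendations 3 and 4 compare estimators that differ in algebraic form. The sketch below writes out the two basic forms and a normal-theory 95% interval. It assumes a fixed sample size n and known $\pi_i$ and $\pi_{ij}$, supplied through a hypothetical `pij` function; this section does not restate how the study obtained or approximated $\pi_{ij}$.

Under simple random sampling both forms reduce to $N(N-n)s^2/n$, which the example checks. They differ only when the $\pi_i$ are unequal. Setting a negative variance estimate to zero inside the interval is an assumption of this sketch.

```python
from itertools import combinations
from math import sqrt

def v_ht(s, y, pi, pij):
    """Horvitz-Thompson estimator of V(T_pi); may be negative."""
    v = sum((1 - pi[i]) * y[i] ** 2 / pi[i] ** 2 for i in s)
    for i, j in combinations(s, 2):
        p = pij(i, j)
        # each unordered pair stands for the two ordered terms (i, j) and (j, i)
        v += 2 * (p - pi[i] * pi[j]) * y[i] * y[j] / (p * pi[i] * pi[j])
    return v

def v_yg(s, y, pi, pij):
    """Yates-Grundy estimator of V(T_pi) for fixed-size designs."""
    return sum(
        (pi[i] * pi[j] - pij(i, j)) / pij(i, j) * (y[i] / pi[i] - y[j] / pi[j]) ** 2
        for i, j in combinations(s, 2)
    )

def ci95(s, y, pi, v):
    """Normal-theory interval T_pi +/- 1.96 sqrt(v); a negative v is set to 0 here."""
    t = sum(y[i] / pi[i] for i in s)
    half = 1.96 * sqrt(max(v, 0.0))
    return t - half, t + half

# Check under simple random sampling (hypothetical numbers): both estimators
# should reduce to N (N - n) s^2 / n.
N, n = 10, 4
y = [3.0, 7.0, 1.0, 4.0, 9.0, 2.0, 6.0, 5.0, 8.0, 10.0]
pi = [n / N] * N

def pij(i, j):
    """SRS pairwise inclusion probability, identical for every pair."""
    return n * (n - 1) / (N * (N - 1))

s = [0, 2, 5, 9]  # sampled y values 3, 1, 2, 10, so s^2 = 50/3
print(v_ht(s, y, pi, pij), v_yg(s, y, pi, pij))  # both 250 up to rounding
print(ci95(s, y, pi, v_yg(s, y, pi, pij)))
```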

Of importance to survey design, the population space analysis showed that shifting populations away from the left edge of the population space resulted in improved properties of the variance estimators (except $v^{hr}_{HT}$) and improved efficiency of the estimator $T_\pi$. A horizontal population shift is easily accomplished in the survey design by adding a constant to all population x's, so that $x^*_i = x_i + c$, and then sampling with inclusion probability proportional to $x^*$. The standardized variance plots (Figure 4) provide guidance on advantageous population space locations.
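The sketch below illustrates the horizontal shift with hypothetical sizes. Adding c to every $x_i$ and rescaling so that the $\pi_i$ sum to n raises the smallest $\pi_i$ toward $n/N$, the simple random sampling value, while the largest falls. A very large c therefore approaches simple random sampling, which is one reason the choice of c is not automatic. The function name and the check on $\pi_i > 1$ are assumptions of the sketch.

```python
def shifted_inclusion_probs(x, n, c=0.0):
    """Inclusion probabilities proportional to x*_i = x_i + c, summing to n."""
    x_star = [xi + c for xi in x]
    total = sum(x_star)
    pi = [n * v / total for v in x_star]
    if max(pi) > 1:
        raise ValueError("some pi_i exceed 1; this sketch does not handle certainty units")
    return pi

# Hypothetical sizes with one very small unit relative to the largest.
x = [1, 2, 3, 4, 10]
n = 2
for c in (0, 5, 50):
    pi = shifted_inclusion_probs(x, n, c)
    # the smallest pi rises from 0.1 toward n/N = 0.4 as c grows
    print(c, [round(p, 3) for p in pi])
```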

Shifting in the horizontal direction eliminates extremely small $\pi$'s, but deciding how far to shift the population is a complication. Reddy and Rao (1977) considered modifying the x values at the analysis stage to improve the precision of the estimator $T_\pi$. Their theoretical results may provide some information on how far to shift the population at the design stage. If small $\pi$'s that are detrimental to the precision of the estimators are not eliminated at the design stage, strategies "scoring" the small $\pi$'s to a higher value can be employed to reduce MSE (Overton and Stehman, 1987; Potter, 1988).
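The simplest form of scoring is a floor on the $\pi_i$ used in the estimator. The sketch below is a generic illustration assumed for exposition. It is not the specific procedure of Overton and Stehman (1987) or Potter (1988), and the floor value is hypothetical. Capping the weights $1/\pi_i$ in this way typically lowers variance at the cost of some bias, which is why the target is MSE rather than unbiasedness.

```python
def floored_ht_total(y_s, pi_s, floor):
    """Horvitz-Thompson total with every sample pi_i raised to at least `floor`.

    Raising a small pi_i caps its weight 1/pi_i at 1/floor. This generally
    trades some bias for lower variance; `floor` is a hypothetical tuning value.
    """
    return sum(yi / max(p, floor) for yi, p in zip(y_s, pi_s))

# Hypothetical sample in which one unit has a very small inclusion probability.
y_s = [5.0, 8.0, 40.0]
pi_s = [0.5, 0.4, 0.02]
print(floored_ht_total(y_s, pi_s, floor=0.0))  # ordinary T_pi: 10 + 20 + 2000
print(floored_ht_total(y_s, pi_s, floor=0.1))  # weight of unit 3 capped at 10
```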

Vertical shifts of a population to a more desirable region in the population space could also be considered to improve estimates after the sample data have been collected. Because the most drastic gradients in the population space surfaces were usually perpendicular to the horizontal axis, the advantages of a vertical shift appear minor relative to the potential gains of a horizontal shift.
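This sketch reads a vertical shift as analysing $y^*_i = y_i + c$ and then subtracting the known $Nc$; that interpretation is adopted here for illustration only. The resulting estimator is $T_\pi(y^*) - Nc = T_\pi(y) + c(\sum_{s} 1/\pi_i - N)$. The shift therefore changes the estimate only through the sample-to-sample departure of $\sum_s 1/\pi_i$ from N. The code checks that identity with hypothetical numbers and function names.

```python
def ht_total(y_s, pi_s):
    """Ordinary Horvitz-Thompson total T_pi(y) from sample values and pi_i."""
    return sum(yi / p for yi, p in zip(y_s, pi_s))

def shifted_ht_total(y_s, pi_s, c, N):
    """Estimate the y total from y* = y + c: T_pi(y*) - N*c."""
    return sum((yi + c) / p for yi, p in zip(y_s, pi_s)) - N * c

# Hypothetical sample of n = 2 from N = 8 units.
y_s, pi_s, N = [2.0, 3.0], [0.25, 0.5], 8
c = 1.0
# Algebraically T_pi(y*) - N c = T_pi(y) + c * (sum of 1/pi_i over the sample - N).
print(shifted_ht_total(y_s, pi_s, c, N))                           # 12.0
print(ht_total(y_s, pi_s) + c * (sum(1 / p for p in pi_s) - N))   # 12.0
```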

7. CONCLUSIONS

The population space assessment proved successful in strengthening the conclusions available from empirical studies, and in discovering associations between the behaviors of the variance estimators and characteristics of the populations. Previous empirical studies (Cumberland and Royall, 1981; Rao and Singh, 1973) did not reveal these patterns because they focused on a more restricted set of high correlation populations located near the standard diagonal. The standard diagonal was a region of special behavior. More general conclusions were obtained in the population space analysis by systematically exploring a wide variety of structured populations.

Summarizing the important findings of the population space assessment:

1) Properties of $v_{YG}$ and $v^{hr}_{YG}$ were virtually identical, so the simpler form $v_{YG}$ should be used in practice;

2) $v^{hr}_{HT}$ performed the poorest of the four variance estimators, and this estimator should be avoided;

3) The worst behavior of $v^{hr}_{HT}$ was in the region of the population space around the standard diagonal, precisely the region of populations examined in past empirical studies; past emphasis on these populations contributed to the perception that $v_{YG}$ was superior to $v_{HT}$;

4) The performance of $v_{HT}$ was far superior to that of $v^{hr}_{HT}$, particularly for populations in the region of the standard diagonal;

5) The extreme left edge of the population space was a region of poor behavior for random-order, vps sampling.

Patterns in the behaviors of the variance estimators were consistent across all three families. Surfaces for the STREAM family were usually steeper, possibly because the sampling fraction was higher for this family. Although only samples of size 16 were investigated, the results observed in the population space assessment were consistent with results observed in previous empirical studies for other sample sizes and populations (cf. Stehman and Overton, 1987a; Rao and Singh, 1973).

The population space analysis is similar in philosophy to a superpopulation model concept because a model was used to generate the base populations for the BIGAMMA and GAMNORM families. The population space results were therefore representative of a broad class of populations. As in any empirical study, however, the results depended on the particular realizations of the random variables generated in creating the BIGAMMA and GAMNORM families. The behavior surfaces represented a single realization of these families, whereas, ideally, the mean trajectory or surface would be described. Another source of variability in the representation of the estimator properties was that the behavior surfaces were estimated by simulation. The contour plots were therefore not exact representations of the true surfaces and were subject to some sampling variability.
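That sampling variability can be gauged point by point. A simulated coverage rate, or probability of a negative estimate, at one population space location is a binomial proportion. For R independent simulated samples its Monte Carlo standard error is $\sqrt{\hat p(1-\hat p)/R}$.

The number of replications used in the study is not stated in this section, so R = 1000 below is hypothetical. At that R, the adjacent coverage contours of 93 and 95 percent would be about three standard errors apart.

```python
from math import sqrt

def mc_se_proportion(p_hat, reps):
    """Binomial Monte Carlo standard error of a simulated proportion, such as
    the share of `reps` simulated intervals that cover the true total."""
    return sqrt(p_hat * (1 - p_hat) / reps)

# Hypothetical: an estimated coverage of 0.95 from 1000 simulated samples.
se = mc_se_proportion(0.95, 1000)
print(round(se, 4))  # 0.0069
```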

Theoretical comparison of the variance estimators in variable probability sampling has proven very difficult. The consistency of the variance estimator behaviors for the three families indicates that these behaviors are general, so a more general theory may be derivable, possibly even an analytic theory. Empirical identification of these patterns is an important step towards a theoretical understanding.

ACKNOWLEDGEMENTS

Ron Stillinger provided invaluable help in implementing the computing and graphics described in this report. John Carlile assisted with the contour plotting routines.

8. REFERENCES

Brewer, K. R. W. (1963). A model of systematic sampling with unequal probabilities. Austral. J. Statist. 5, 5-13.

Cumberland, W. G., and Royall, R. M. (1981). Prediction models and unequal probability sampling. J. Roy. Statist. Soc. B 43, 353-367.

Hartley, H. 0., and Rao, J. N. K. (1962). Sampling with unequal probability and without replacement. Ann. Math. Statist. 33, 350-374.


Horvitz, D. G., and Thompson, D. J. (1952). A generalization of sampling without replacement from a finite universe. J. Amer. Statist. Assoc. 47, 663-685.

Johnson, N. L., and Kotz, S. (1972). Distributions in Statistics: Continuous Multivariate Distributions. Wiley: New York.

Kennedy, W. J., and Gentle, J. E. (1980). Statistical Computing. Marcel Dekker: New York.

Messer, J.J., C.W. Ariss, J.R. Baker, S.K. Drousé, K.N. Eshleman, P.N. Kaufmann, R.A. Linthurst, J.M. Omernik, W.S. Overton, M.J. Sale, R.D. Shonbrod, S.M. Stanbaugh, and J.R. Tutshall, Jr. (1986). National Surface Water Survey: National Stream Survey, Phase I Pilot Survey. EPA-600/4-86-026, U.S. Environmental Protection Agency, Washington, D.C.

Overton, W. S. (1985). A Sampling Plan for Streams in the National Stream Survey. Technical Report 114, Department of Statistics, Oregon State University, Corvallis, Oregon, 97331.

Overton, W. S., and Stehman, S. V. (1987). An Empirical Study of Sampling and Other Errors in the National Stream Survey; II. Analysis of a Replicated Sample of Streams. Technical Report 11a, Department of Statistics, Oregon State University, Corvallis, Oregon, 97331.

Potter, F. (1988). Survey of procedures to control extreme sampling weights. To appear in Proceedings of the Section on Survey Research Methods, American Statistical Association Annual Meetings, 1988.

Rao, J. N. K., and Singh, M. P. (1973). On the choice of estimator in survey sampling. Austral. J. Statist. 15, 95-104.

Reddy, V. N., and Rao, T. J. (1977). Modified PPS method of estimation. Sankhya Ser. C 39(3), 185-197.

Sen, A. R. (1953). On the estimate of the variance in sampling with varying probabilities. J. Indian Soc. Agric. Statist. 7, 119-127.

Stehman, S. V., and Overton, W. S. (1987a). Estimating the variance of the Horvitz-Thompson estimator in variable probability, systematic samples. Proceedings of the Section on Survey Research Methods, American Statistical Association Annual Meetings, 1987, pp. 743-748.

Stehman, S. V., and Overton, W. S. (1987b). A comparison of variance estimators of the Horvitz-Thompson estimator in random order, variable probability, systematic sampling. Biometrics Unit Manuscript, Cornell University, 337 Warren Hall, Ithaca, New York, 14853.

Stehman, S. V., and Overton, W. S. (1989). Pairwise inclusion probability formulas in random-order, variable probability systematic sampling. Biometrics Unit Manuscript, Cornell University, 337 Warren Hall, Ithaca, New York, 14853.

Yates, F., and Grundy, P. M. (1953). Selection without replacement from within strata with probability proportional to size. J. Roy. Statist. Soc. B 15, 235-261.
