An Empirical, General Population Assessment of the Properties of
Variance Estimators of the Horvitz-Thompson Estimator Under
Random-Order, Variable Probability Systematic Sampling
Stephen V. Stehman
Biometrics Unit, Cornell University, Ithaca, NY 14850 USA
W. Scott Overton
Department of Statistics, Oregon State University, Corvallis, Oregon 97331 USA
BU-936-M Revised August 1989
An Empirical, General Population Assessment of the Properties
of Variance Estimators of the Horvitz-Thompson Estimator
Under Random-Order, Variable Probability Systematic Sampling
Stephen V. Stehman and W. Scott Overton
Biometrics Unit, Cornell University and Department
of Statistics, Oregon State University
Technical Report 132
August 1989
DEPARTMENT OF STATISTICS Oregon State University
Corvallis, Oregon
ABSTRACT
Previous empirical studies of the properties of
variable probability, systematic sampling and the
Horvitz-Thompson estimator, $\hat{T}_y$, have focused on specific,
real-world populations (cf. Stehman and Overton, 1987;
Cumberland and Royall, 1981; Rao and Singh, 1973). The study
of special case populations is recognized as important, but
these studies provide limited information that can be used to
generalize to other populations. By a systematic
simulation study of a specially designed set of
populations, the assessment of the properties of
$V(\hat{T}_y)$ and estimators of $V(\hat{T}_y)$ is extended to
more general populations represented by the population space.
The population space is a standardized representation of
bivariate populations with equal variance in each marginal
distribution. Each representation of this space has a
specific bivariate distribution shifted over the dimensions
of the space. Three bivariate distribution forms, each
with three correlations, were studied.
Two common estimators of $V(\hat{T}_y)$ were investigated, one
proposed by Horvitz and Thompson (1952), the other by Yates
and Grundy (1953) and Sen (1953). The sampling design
studied was random-order, variable probability systematic.
Both variance estimators require computing the pairwise
inclusion probabilities. Calculating these pairwise
probabilities requires immense computing time, so two
approximation formulas were studied, one due to Hartley and
Rao (1962), the other due to Overton (1985). The two
variance estimators were computed using each of the
pairwise inclusion probability approximations.
Behavior of the estimators was represented by contour
plots describing confidence interval coverage, root mean
square error, relative bias, and proportion of negative
estimates for the variance estimators over the range of
populations in the population space. These plots provide
the basis of a descriptive theory for the properties of the
variance estimators. The population space approach
successfully serves as a bridge between strictly special
case, empirical results and a general analytical theory.
The approach also furnishes a perspective in which more
theoretical results can be pursued.
1. INTRODUCTION
Properties of the variance and estimators of the
variance of the Horvitz-Thompson estimator are investigated
for variable probability, systematic sampling (hereafter
denoted vps sampling). In a finite population of size N,
assume that a response variable of interest, $y_i$, and an
auxiliary variable, $x_i > 0$, are defined for each element of
the universe. In variable probability sampling, a sample
unit is selected with probability proportional to x. We
will restrict attention to without replacement, fixed
sample size schemes. The probability that the ith
population element will be selected in the sample is given
by the inclusion probability
$$\pi_i = \sum_{\{s:\, i \in s\}} P_R(s),$$
where $P_R(s)$ is the probability of selecting sample s under
sampling rule R. The probability of selecting both the ith
and jth population units is the pairwise inclusion
probability,
$$\pi_{ij} = \sum_{\{s:\, (i,j) \in s\}} P_R(s).$$
Horvitz and Thompson (1952) presented a general,
finite population theory of estimation for variable
probability sampling. The Horvitz-Thompson estimator,
$\hat{T}_y = \sum_{u \in S} y_u/\pi_u$, is unbiased for the
population total, $T_y = \sum_{i=1}^{N} y_i$.
Two well-known estimators of the variance of $\hat{T}_y$ are the
estimator proposed by Horvitz and Thompson,
$$v_{HT} = \sum_{i=1}^{n} \left(\frac{y_i}{\pi_i}\right)^2 (1-\pi_i) + \sum_{i=1}^{n} \sum_{j \ne i} \left(\frac{\pi_{ij} - \pi_i\pi_j}{\pi_{ij}}\right) \frac{y_i}{\pi_i} \frac{y_j}{\pi_j},$$
and the estimator suggested by Yates and Grundy (1953) and
Sen (1953),
$$v_{YG} = \sum_{i=1}^{n-1} \sum_{j=i+1}^{n} \left(\frac{\pi_i\pi_j - \pi_{ij}}{\pi_{ij}}\right) \left(\frac{y_i}{\pi_i} - \frac{y_j}{\pi_j}\right)^2,$$
where the summations for the two variance estimators are
over the pairs of elements in the sample. Based on limited
investigation, the estimator $v_{YG}$ is usually claimed
superior to $v_{HT}$ because $v_{YG}$ has smaller sampling variance
and is less likely to take on negative values (cf. Cassel
et al. (1977, p. 166), Cochran (1977, p. 261)). The evidence
for the superiority of $v_{YG}$ is sketchy. It is known that
when the ratio $r_u = y_u/x_u$ is constant for all $u=1,\ldots,N$,
$V(\hat{T}_y) = 0$. In this situation, $v_{YG} \equiv 0$, but $v_{HT}$
does not identically equal 0; being unbiased, $v_{HT}$ must be
capable of negative values. Thus for populations in which y
is nearly proportional to x, $v_{YG}$ would appear to have
smaller sampling variance.
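The two estimators can be sketched in a few lines of Python. This is an illustration only, not the authors' GAUSS code: the function names, the list-of-lists layout for the pairwise probabilities, and the helper `srs_check` are assumptions made for this example. Under simple random sampling both estimators should reduce to the usual formula $N^2(1-n/N)s^2/n$, and when $y$ is exactly proportional to the inclusion probabilities the Yates-Grundy form is zero.

```python
from itertools import combinations


def ht_total(y, pi):
    """Horvitz-Thompson estimate of the total from sampled y and their pi."""
    return sum(yi / p for yi, p in zip(y, pi))


def v_ht(y, pi, pij):
    """Horvitz-Thompson variance estimator; pij[i][j] is pi_ij for sample units i, j."""
    n = len(y)
    total = sum((y[i] / pi[i]) ** 2 * (1.0 - pi[i]) for i in range(n))
    for i in range(n):
        for j in range(n):
            if i != j:
                weight = (pij[i][j] - pi[i] * pi[j]) / pij[i][j]
                total += weight * (y[i] / pi[i]) * (y[j] / pi[j])
    return total


def v_yg(y, pi, pij):
    """Yates-Grundy (Sen) variance estimator over the pairs i < j in the sample."""
    total = 0.0
    for i, j in combinations(range(len(y)), 2):
        weight = (pi[i] * pi[j] - pij[i][j]) / pij[i][j]
        total += weight * (y[i] / pi[i] - y[j] / pi[j]) ** 2
    return total


def srs_check(y, N):
    """Hypothetical helper: evaluate both estimators with the exact
    simple-random-sampling inclusion probabilities for population size N."""
    n = len(y)
    pi = [n / N] * n
    p2 = n * (n - 1) / (N * (N - 1))
    pij = [[p2] * n for _ in range(n)]
    return v_ht(y, pi, pij), v_yg(y, pi, pij)
```

For example, `srs_check([3.0, 7.0, 1.0, 9.0], 20)` returns two values that agree with each other and with the simple random sampling variance formula for that sample.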
Several empirical studies have shown advantages for
$v_{YG}$. Rao and Singh (1973) studied 34 natural populations
using Brewer's probability proportional to x selection
procedure, and Cumberland and Royall (1981) examined
random-order, variable probability systematic selection
for 6 populations. Both studies found that $v_{HT}$ frequently
resulted in negative estimates, and that the sampling
variance of $v_{HT}$ was much larger for many of their
populations. Stehman and Overton (1987a) presented some
simulation results showing that the advantages of $v_{YG}$ were
restricted to certain kinds of populations.
Clearly lacking in these comparisons of the two
variance estimators, $v_{HT}$ and $v_{YG}$, is a general theory
describing their properties. A theoretical analysis of
confidence interval coverage, mean square error (MSE),
bias and proportion of negative estimates of the variance
estimators has not been done. A major complication in
development of an analytic theory is that the pairwise
inclusion probabilities depend on the particular finite
population x's. Writing specifically about variable
probability, systematic sampling, Brewer (1963) concluded
" ... the selection probabilities for the various possible
samples are functions of the sizes of all the population
units and it is virtually impossible to construct an exact
general theory." A thorough empirical investigation of the
properties of these variance estimators is an intermediate
step toward development of a more general theory.
The population space assessment described in Section 3
was an extensive simulation study of a specially designed
set of populations. The sampling design investigated was
random-order, vps sampling with n=16. Approximation formulas
for the pairwise inclusion probabilities are commonly used in
practice. Also, since all populations in the simulation
study were relatively large (N>70), computing the exact
pairwise inclusion probabilities was not practical. Two
approximation formulas for the pairwise inclusion
probabilities were investigated,
$$\pi_{ij}^{o} = \frac{(n-1)\,\pi_i\pi_j}{n - \frac{1}{2}(\pi_i + \pi_j)} \quad \text{(Overton, 1985)},$$
and
(Hartley and Rao, 1962).
Further description of the pairwise inclusion probability
approximations and formulas for the estimators can be
obtained from Stehman and Overton (1987b, 1989).
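As a rough illustration of how cheaply the Overton approximation can be evaluated, the sketch below builds the full matrix of approximate pairwise probabilities from the first order probabilities. It assumes the formula reads $(n-1)\pi_i\pi_j / (n - \tfrac{1}{2}(\pi_i+\pi_j))$, written here in the algebraically equivalent form $2(n-1)\pi_i\pi_j/(2n-\pi_i-\pi_j)$; the function name and the choice to put $\pi_i$ on the diagonal are assumptions for this example.

```python
def overton_pairwise(pi, n):
    """Approximate pairwise inclusion probabilities (assumed Overton form).

    pi : list of first order inclusion probabilities
    n  : fixed sample size
    Returns a square list of lists; the diagonal holds pi[i] by convention.
    """
    size = len(pi)
    out = []
    for i in range(size):
        row = []
        for j in range(size):
            if i == j:
                row.append(pi[i])
            else:
                row.append(2.0 * (n - 1) * pi[i] * pi[j] / (2.0 * n - pi[i] - pi[j]))
        out.append(row)
    return out
```

One sanity check on this reading of the formula: with equal probabilities $\pi_i = n/N$ it reproduces the exact simple random sampling value $n(n-1)/(N(N-1))$.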
Properties of the variance estimators were obtained by
simulation, using 5,000 replications of the sampling
procedure for each investigated population. The number of
replications was selected to provide a precise estimate of
the coverage probabilities obtained by the variance
estimators for constructing nominal 95% confidence
intervals for $\hat{T}_y$. With 5,000 replications, the standard
deviation of the estimated proportion of confidence
intervals covering the parameter is 0.003. Version 1.49 of
the GAUSS Mathematical and Statistical System (Aptech
Systems, Inc., Kent, WA) was used to run the simulations on
IBM XT or AT computers.
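The report does not list its sampling code. A minimal Python sketch of one replication of random-order, variable probability systematic selection is given below, following the standard construction (randomly permute the units, cumulate the inclusion probabilities, and take the units whose cumulative intervals contain a random start plus 0, 1, ..., n-1). The function names are hypothetical, and the authors' GAUSS implementation may differ in detail.

```python
import random


def inclusion_probs(x, n):
    """First order inclusion probabilities proportional to x, summing to n."""
    total = sum(x)
    return [n * xi / total for xi in x]


def random_order_vps_systematic(x, n, rng):
    """Draw one fixed-size sample of n unit indices.

    rng is a random.Random instance. Requires every pi_i < 1, so that no
    unit can be hit by two selection points.
    """
    pi = inclusion_probs(x, n)
    if max(pi) >= 1.0:
        raise ValueError("a unit would be selected with certainty")
    order = list(range(len(x)))
    rng.shuffle(order)              # the random ordering of the population
    point = rng.random()            # random start in [0, 1)
    sample, cum = [], 0.0
    for position, unit in enumerate(order):
        # Force the final cumulative total to n to guard against rounding.
        cum = float(n) if position == len(order) - 1 else cum + pi[unit]
        if point < cum:
            sample.append(unit)
            point += 1.0            # next selection point is one unit further on
    return sample
```

Repeating this draw 5,000 times for a population and recording the estimate and variance estimates from each sample gives the kind of replication set described above.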
Several procedures were used to validate the
simulation programs and computing algorithms. Since $\hat{T}_y$ is
unbiased for $T_y$, the estimated expected value of $\hat{T}_y$ was
checked to make sure it was close to $T_y$. The computing
formulas for the variance estimators were validated by
setting $x_i = 1$ for all $i=1,\ldots,N$, and verifying that the
estimates matched those known for a simple random sample. If a
computing formula or algorithm was changed during the
course of the population space analysis, output from the
modified program was checked to ensure that the new
algorithm gave equivalent results to previously verified
algorithms. Finally, the results obtained from the
simulation programs matched those reported by Rao and Singh
(1973) and Cumberland and Royall (1981) for their
simulations using the same populations.
2. GENERATION OF PSEUDO-RANDOM NUMBERS
Random numbers from the Uniform(0,1) and standard
normal distributions were generated using the GAUSS
functions RNDU and RNDN, respectively. Gamma random
variables, G, were selected from a standard gamma
distribution (gamma distribution with parameter $\alpha$, and
parameter $\lambda$ set to 1). Only the standard gamma
distribution was considered because any other gamma
distribution can be obtained by scaling the standard gamma
(cf. Kennedy and Gentle, 1980). For integer-valued $\alpha$,
gamma random variables were generated by the following
algorithm:
1) Generate $u_i$ from U(0,1).
2) Set $x_i = -\ln(u_i)$; $x_i$ is an exponential random
variable with parameter $\theta=1$.
3) $G = \sum_{i=1}^{\alpha} x_i$; the gamma random variable is the
sum of $\alpha$ independent, identically distributed
exponential random variables with $\theta=1$.
Random variables for standard gamma distributions with
non-integer parameter, $\alpha$, $0 < \alpha < 1$, were generated by
algorithm GS provided by Kennedy and Gentle (p. 213, 1980).
Since a sum of n independent standard gamma variables, each
with parameter $\alpha_i$, is distributed as standard gamma with
parameter $\sum_{i=1}^{n} \alpha_i$, gamma random variables from
distributions with parameter $\alpha > 1$ were generated by summing
independent gamma random variables generated by algorithm GS
with proper choice of $\alpha_i$ and n.
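The integer-shape algorithm above can be sketched in Python as follows. The exponential step uses the inverse transform $-\ln(u)$. Algorithm GS is not reproduced here; for the fractional part of the shape the sketch substitutes the standard library's `random.gammavariate` as a stand-in, and it splits a non-integer shape into its integer and fractional parts, which is one possible choice of the $\alpha_i$ rather than necessarily the one the authors used.

```python
import math
import random


def exponential(rng):
    """Exponential(1) draw by inverse transform; u lies in (0, 1] so log is defined."""
    u = 1.0 - rng.random()
    return -math.log(u)


def gamma_integer_shape(alpha, rng):
    """Standard gamma with integer shape alpha as a sum of alpha exponentials."""
    return sum(exponential(rng) for _ in range(alpha))


def standard_gamma(alpha, rng):
    """Standard gamma draw for any alpha > 0, using additivity of gamma shapes.

    The fractional part is drawn with random.gammavariate, standing in for
    algorithm GS (an assumption of this sketch).
    """
    whole = int(math.floor(alpha))
    frac = alpha - whole
    g = gamma_integer_shape(whole, rng) if whole > 0 else 0.0
    if frac > 0.0:
        g += rng.gammavariate(frac, 1.0)
    return g
```

A quick check is that the sample mean of many draws is close to the shape parameter, since a standard gamma variable has mean $\alpha$.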
3. DESCRIPTION OF THE POPULATION SPACE
Three major families of populations were studied, one
based on real data (STREAM), and two generated from known
probability distributions (GAMNORM and BIGAMMA). Within
each family, three different subfamilies representing low,
medium, and high correlations between the response
variable, y, and the design covariate, x, were studied. A
subfamily was then a set of populations with the same
correlation, within the same major family.
All populations within a subfamily were created from a
single base population. A subfamily was created from the
base population by adding or subtracting constants to x
and/or y. Thus all populations in a subfamily are the same
"cloud" of points shifted to various locations in the
(x,y)-plane. All populations within a subfamily have
$V_y = V_x$, where $V_x$ and $V_y$ are the population variances of x
and y, respectively, and populations within a subfamily
also have the same $V_y$ and the same correlation between x
and y. Populations differing by an additive shift in the
x's have different inclusion probabilities. Additive
shifts in the x's are a feasible design tactic in some real
surveys, and this feature of sample design can be explored
conveniently in the context of the population space.
The variables x and y were standardized, $x' = x/\sqrt{V_x}$ and
$y' = y/\sqrt{V_y}$, so comparisons would be invariant to the
measurement scale of the variables. The standardized
population centroid, $(\bar{X}', \bar{Y}')$, was used to locate
populations within the population space. Note that
$\bar{X}' = \bar{X}/\sqrt{V_x} = 1/cv(x)$, and $\bar{Y}' = \bar{Y}/\sqrt{V_y} = 1/cv(y)$,
where cv denotes the population coefficient of variation.
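Placing a base population at a chosen point of the population space amounts to adding constants to x and y. The Python sketch below shows one way to do this; the function names are hypothetical, and the variance is computed with divisor N-1, which is an assumption (the report does not state which divisor was used).

```python
import statistics


def standardized_centroid(x, y):
    """Return (mean(x)/sd(x), mean(y)/sd(y)), i.e. (1/cv(x), 1/cv(y))."""
    return (statistics.mean(x) / statistics.stdev(x),
            statistics.mean(y) / statistics.stdev(y))


def shift_to_centroid(x, y, cx, cy):
    """Shift x and y by constants so the standardized centroid becomes (cx, cy).

    Adding a constant leaves the variances and the correlation unchanged,
    so only the location of the cloud of points moves.
    """
    dx = cx * statistics.stdev(x) - statistics.mean(x)
    dy = cy * statistics.stdev(y) - statistics.mean(y)
    xs = [xi + dx for xi in x]
    ys = [yi + dy for yi in y]
    if min(xs) <= 0.0:
        raise ValueError("shifted x must stay positive to define inclusion probabilities")
    return xs, ys
```

For instance, `shift_to_centroid(x, y, 7.0, 7.0)` moves a base population to the point (7, 7) used for the scatter plots in Figure 1.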
Confidence interval coverage, ratios of root mean
square error (RMSE) of the variance estimators, and
relative bias are invariant to the measurement scale of the
y's, and are, therefore, the same in the original and the
standardized populations. The standardized population
space is also appropriate for assessment of patterns of
precision of $\hat{T}_y$. The standardized variance can be obtained
easily from $V(\hat{T}_y)$, the variance of the Horvitz-Thompson
estimator for the unstandardized variable y. For the
standardized variable $y' = y/\sqrt{V_y}$,
$$V(\hat{T}_{y'}) = V(\hat{T}_y)/V_y.$$
A convenient representation of standardized variance is
obtained by replacing $V_y$ by a quantity which is
proportional to $V_y$, the variance of $\hat{T}_y$ for a simple random
sample.
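For reference, the simple random sampling variance that serves as the divisor can be written out. The block below assumes the usual without-replacement expansion estimator of the total and a population variance with divisor $N-1$; the symbol $V_{SRS}$ follows the caption of Figure 4, while the explicit form is a standard result rather than a formula quoted from the report.

```latex
V_{SRS} \;=\; \frac{N^{2}\left(1 - n/N\right) V_y}{n},
\qquad
V_y \;=\; \frac{1}{N-1}\sum_{i=1}^{N}\left(y_i - \bar{Y}\right)^{2},
\qquad
\text{standardized variance} \;=\; \frac{V(\hat{T}_y)}{V_{SRS}} .
```

Under this reading, values below 1 indicate populations where vps sampling is more precise than simple random sampling, and the contour at 1.0 marks equal precision.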
4. FAMILIES OF POPULATIONS
4.1 STREAM Family
The first family of populations analyzed was
constructed from a subset of the Phase I Stream Survey
Pilot Study data (Messer et al., 1986). A previous study of
this STREAM family was reported in Stehman and Overton
(1987a). Some of the results shown here supersede that
work.
Seventy-two of the 100 units from the Pilot Study
sample were purposefully selected yielding a base
population with correlation 0.82 between the response
variable, y=length of stream reach, and the auxiliary
variable, x=direct watershed area of a stream reach. To
create the base populations for the two other STREAM
subfamilies, starting with the STREAM82 base population,
1) compute the least squares slope, $\beta$, and intercept,
$a$, of the base population for subfamily STREAM82;
2) compute $e_i = y_i - \hat{y}_i$, the residual from the least
squares fit, where $\hat{y}_i = a + \beta x_i$;
3) let $e_i^* = k \cdot e_i$ (multiply the residuals by a constant
to obtain the specified correlation);
4) let $y_i^* = \hat{y}_i + e_i^*$.
The values $y_i^*$ were used as the response variable for the
base population of the new subfamily. Choosing k=2.5
resulted in a subfamily with $\rho=0.50$ (subfamily STREAM50),
and choosing k=0.25 resulted in a subfamily with $\rho=0.985$
(subfamily STREAM99). An advantage of this method of
generating the base population was that the x's were the
same for each base population in the family. Thus the
simulations were "blocked" in that each subfamily within
the STREAM family had the same first and second order
inclusion probabilities for all populations with common $\bar{X}'$.
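The residual rescaling can be sketched in Python as below. The function names are hypothetical. The helper `k_for_target` is an addition for illustration, not something stated in the report: because least squares residuals are uncorrelated with the fitted values, the new squared correlation is $SSR/(SSR + k^2\,SSE)$, which can be solved for the multiplier k that turns a correlation $\rho_0$ into a target $\rho_1$ (assuming a positive slope).

```python
import math


def least_squares(x, y):
    """Return (intercept, slope) of the least squares line of y on x."""
    mx = sum(x) / len(x)
    my = sum(y) / len(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return my - slope * mx, slope


def correlation(x, y):
    """Pearson correlation of two equal-length sequences."""
    mx = sum(x) / len(x)
    my = sum(y) / len(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    syy = sum((yi - my) ** 2 for yi in y)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    return sxy / math.sqrt(sxx * syy)


def rescale_residuals(x, y, k):
    """New response: fitted value plus k times the original residual."""
    intercept, slope = least_squares(x, y)
    fitted = [intercept + slope * xi for xi in x]
    return [f + k * (yi - f) for f, yi in zip(fitted, y)]


def k_for_target(rho0, rho1):
    """Multiplier k taking correlation rho0 to rho1 (derived here, not from the report)."""
    return math.sqrt((rho0 ** 2 / (1.0 - rho0 ** 2)) * ((1.0 - rho1 ** 2) / rho1 ** 2))
```

With a starting correlation of 0.82, this relation gives multipliers close to the reported values: about 2.48 for a target of 0.50 and about 0.25 for a target of 0.985.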
The STREAM family was created from data with unknown
population distributional properties, thus limiting our
ability to generalize to other populations. The empirical
study was expanded to include families generated from known
probability distributions to permit broader understanding.
The next two families examined were constructed to
represent distributions of random variables similar to
those likely to be encountered in practice.
4.2 GAMNORM Family
For the GAMNORM family of populations, x was randomly
generated from a standard gamma distribution with
parameters $\alpha=2$ and $\lambda=1$, and y was generated, conditional
on x, as a normal random variable. For each $x_i$, $y_i$ was
obtained from the equation $y_i = \beta x_i + \epsilon_i$, where $\epsilon_i$ was a
random variable distributed Normal$(0, \sigma_\epsilon^2)$, and
$\sigma_\epsilon^2 = (1-\rho^2)\sigma_y^2$. The
infinite population notation is appropriate because $\sigma^2$
denotes the population variance of the distribution of the
generated random variable. Once $\rho(x, y)$ was specified,
the value of $\sigma_\epsilon^2$ was fixed by the following argument. If
the relationship between y and x is given by
$$y_i = \beta x_i + \epsilon_i, \qquad (1)$$
then $\sigma_y^2 = \beta^2\sigma_x^2 + \sigma_\epsilon^2$ ($x_i$ independent of $\epsilon_i$), and
$$\sigma_y^2/\sigma_x^2 = \beta^2 + \sigma_\epsilon^2/\sigma_x^2. \qquad (2)$$
Imposing the constraint that $\sigma_x^2 = \sigma_y^2$,
$\beta = \sigma_{xy}/\sigma_x^2 = \rho\sigma_y/\sigma_x = \rho$, and
using equation (2),
$$\beta^2 = 1 - \sigma_\epsilon^2/\sigma_y^2. \qquad (3)$$
Solving (3) for $\sigma_\epsilon^2$ yields
$$\sigma_\epsilon^2 = (1-\rho^2)\sigma_y^2. \qquad (4)$$
In practice, a set of x's was generated and $V_x$
calculated. Then $V_x$ from the particular set of x's was
used instead of $\sigma_y^2$ in (4) for generation of the y's. A
subfamily base population was created by specifying $\rho$,
generating the set of $\epsilon$'s, and forming the variable $y_i$ from
(1). The target correlations were 0.5, 0.8, and 0.95, but
due to the random data generation, the realized
correlations were 0.48, 0.75, and 0.94.
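A minimal Python sketch of this construction follows. It uses `random.gammavariate` and `random.gauss` from the standard library in place of the GAUSS routines, takes $\beta = \rho$ as in the derivation, and plugs the realized variance of the generated x's into the error variance. The function names are hypothetical, and the variance divisor N-1 is an assumption.

```python
import math
import random
import statistics


def gamnorm_population(N, rho, rng):
    """Generate a GAMNORM-style base population of size N.

    x ~ standard gamma with shape 2; y = rho * x + eps, where eps is normal
    with variance (1 - rho^2) * V_x and V_x is the realized variance of x.
    """
    x = [rng.gammavariate(2.0, 1.0) for _ in range(N)]
    v_x = statistics.variance(x)                 # divisor N - 1 (assumed)
    sd_eps = math.sqrt((1.0 - rho ** 2) * v_x)
    y = [rho * xi + rng.gauss(0.0, sd_eps) for xi in x]
    return x, y


def correlation(x, y):
    """Pearson correlation, used to inspect the realized value."""
    mx, my = statistics.mean(x), statistics.mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)
```

With only N=100 units, the realized correlation from a single draw will wander around the target, which is consistent with the gap between target and realized values reported above.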
The same set of 100 x's was used as the base
population for all three subfamilies. Using a single set
of x's again provided "blocking" on the inclusion
probabilities of the vps sampling design. The GAMNORM
subfamily base populations could have been created using
steps similar to those described in Section 4.1 to create
the STREAM subfamily base populations. This change would
have provided an additional level of blocking among the
GAMNORM subfamilies.
4.3 BIGAMMA Family
The third family studied consisted of populations
selected from a bivariate gamma distribution. Johnson and
Kotz (pp. 216-218, 1970) provide the following basic theory
for generation of bivariate gamma random variables. As
before, the standard gamma, with $\lambda=1$, is used throughout.
If $W_0$, $W_1$, and $W_2$ are independent random
variables, with $W_j$ distributed Gamma($\theta_j$), and
if $X=W_0+W_1$ and $Y=W_0+W_2$, then X is distributed
Gamma($\theta_0+\theta_1$), Y is distributed Gamma($\theta_0+\theta_2$),
(X,Y) is distributed Bivariate Gamma, and
$\rho(X, Y) = \theta_0[(\theta_0+\theta_1)(\theta_0+\theta_2)]^{-1/2}$.
Generating bivariate random variables based on this result
permitted specifying $\rho(X,Y)$ and provided marginal
distributions of X and Y that were both gamma
distributions. The parameter for both standard gamma
marginal distributions was $\alpha=2$. For this parameter
specification, $\theta_1=\theta_2=2-\theta_0$, and $\sigma_x^2=\sigma_y^2$. Setting
$\theta_1=\theta_2=\theta$, we obtain $\rho(X,Y) = \theta_0/(\theta+\theta_0)$. Then for a
specified $\rho$, $\theta_0 = \rho\theta/(1-\rho)$. Finally, solving for $\theta$
and $\theta_0$ yields $\theta=2-\theta_0$, and $\theta_0 = 2\rho$. $\theta$ and $\theta_0$ are
the parameters needed to generate the bivariate gamma random
variables.
The algorithm used to generate a bivariate gamma base
population with specified $\rho(x, y)$ and marginal gamma
distributions each with parameter $\alpha=2$ was:
1) generate $w_0$ distributed Gamma($2\rho$);
2) generate $w_1$ distributed Gamma($2(1-\rho)$);
3) generate $w_2$ distributed Gamma($2(1-\rho)$);
4) set $x = w_0 + w_1$ and $y = w_0 + w_2$.
If a population with large x values was generated so that
at least one of the sampling units would be selected with
probability 1 in a sample of size 16, that population was
discarded and a new base population was generated. The
three $\rho$'s specified were 0.5, 0.75, and 0.95, and the
actual realized correlations were 0.49, 0.77, and 0.97.
For the BIGAMMA family, a different set of x's was
generated for each of the subfamily base populations. To
obtain both marginal gamma distributions with parameter
$\alpha=2$, this bivariate random variable generation algorithm
required generating a new population of x's for each base
population. To obtain blocking on the x's, another
algorithm for generating the bivariate random variables
would be necessary.
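The shared-component construction and the discard rule can be sketched in Python as follows. This uses `random.gammavariate` in place of the GAUSS routines and algorithm GS; the function names are hypothetical, and the discard rule is read as rejecting any population in which some unit would have $n x_i / \sum x \ge 1$.

```python
import math
import random


def bigamma_population(N, rho, rng, alpha=2.0):
    """Generate N pairs (x, y) with gamma(alpha) marginals and target correlation rho.

    Requires 0 < rho < 1 so that both shape parameters are positive.
    """
    theta0 = alpha * rho            # shape of the shared component w0
    theta = alpha * (1.0 - rho)     # shape of the separate components w1, w2
    x, y = [], []
    for _ in range(N):
        w0 = rng.gammavariate(theta0, 1.0)
        x.append(w0 + rng.gammavariate(theta, 1.0))
        y.append(w0 + rng.gammavariate(theta, 1.0))
    return x, y


def generate_valid_population(N, rho, n, rng):
    """Regenerate until no unit would be selected with probability 1 or more."""
    while True:
        x, y = bigamma_population(N, rho, rng)
        if n * max(x) / sum(x) < 1.0:
            return x, y


def correlation(x, y):
    """Pearson correlation, used to inspect the realized value."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)
```

Because x is built from freshly drawn components each time, every call produces a new set of x's, which is the reason blocking on the x's is not available with this construction.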
5. RESULTS OF POPULATION SPACE ANALYSIS
Notation:
$\bar{X}'$, $\bar{Y}'$  population standardized means of x and y
$\hat{T}_y$  Horvitz-Thompson estimator of the population total
$V(\hat{T}_y)$  variance of the Horvitz-Thompson estimator
Approximation Formulas
$\pi_{ij}^{hr}$  approximate formula for $\pi_{ij}$ (Hartley and Rao, 1962)
$\pi_{ij}^{o}$  approximate formula for $\pi_{ij}$ (Overton, 1985)
Variance Estimators
$v_{HT}$  Horvitz-Thompson variance estimator
$v_{YG}$  Yates-Grundy variance estimator
$v_{HT}^{hr}$  Horvitz-Thompson variance estimator calculated using $\pi_{ij}^{hr}$
$v_{HT}^{o}$  Horvitz-Thompson variance estimator calculated using $\pi_{ij}^{o}$
$v_{YG}^{hr}$  Yates-Grundy variance estimator calculated using $\pi_{ij}^{hr}$
$v_{YG}^{o}$  Yates-Grundy variance estimator calculated using $\pi_{ij}^{o}$
Scatter plots of the population located at
$(\bar{X}', \bar{Y}') = (7, 7)$ for each subfamily are shown in Figure 1.
Population quantile-quantile plots for the middle
correlation subfamilies at $(\bar{X}', \bar{Y}') = (7, 7)$ are shown for the
variable x in Figure 2 and for the variable y in Figure 3.
5.1 Comparison of Variance Estimators
The criteria for comparison of the variance estimators were:
1) confidence interval coverage obtained by nominal 95%
intervals calculated as $\hat{T}_y \pm 1.96\sqrt{v}$;
2) estimated MSE;
3) relative bias, estimated by
$$[E(v) - \hat{V}(\hat{T}_y)]/\hat{V}(\hat{T}_y),$$
where $E(v)$ was the simulated expected value of v, and
$\hat{V}(\hat{T}_y)$ was an unbiased estimator of $V(\hat{T}_y)$ obtained from
the simulations;
4) probability of a sample resulting in negative $v_{HT}^{hr}$.
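Given the replicated estimates of the total and the matching variance estimates, the four criteria can be computed along the following lines. This Python sketch is illustrative: the function names are hypothetical, the variance of the estimator is estimated by the sample variance of the replicated totals, and a negative variance estimate is counted as an interval that fails to cover, which is an assumption since the report does not say how such samples were scored.

```python
import math


def coverage_se(p, replications):
    """Binomial standard deviation of an estimated coverage proportion."""
    return math.sqrt(p * (1.0 - p) / replications)


def summarize(totals, variances, true_total, z=1.96):
    """Summarize one variance estimator over simulation replications.

    totals     : replicated estimates of the population total
    variances  : variance estimate from each replication (same order)
    true_total : the known population total
    """
    reps = len(totals)
    mean_t = sum(totals) / reps
    var_t = sum((t - mean_t) ** 2 for t in totals) / (reps - 1)
    covered = negative = 0
    for t, v in zip(totals, variances):
        if v < 0.0:
            negative += 1           # no real interval; scored as not covering
        elif abs(t - true_total) <= z * math.sqrt(v):
            covered += 1
    mean_v = sum(variances) / reps
    mse_v = sum((v - var_t) ** 2 for v in variances) / reps
    return {
        "coverage": covered / reps,
        "rmse": math.sqrt(mse_v),
        "relative_bias": (mean_v - var_t) / var_t,
        "prop_negative": negative / reps,
    }
```

As a check on the replication count quoted in Section 1, `coverage_se(0.95, 5000)` evaluates to roughly 0.003.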
The behavior surfaces were described by a battery of
contour plots generated by the interpolation and contour
plotting routines supplied by the SURFER software package
(Golden Software, Inc., P.O. Box 281, Golden, Colorado).
The kriging option in SURFER was selected to create a
regularly spaced grid from the irregularly spaced input
data. The octant search option in SURFER, using the 10
nearest data points, was used for interpolating grid
points.
5.2 Interpretation of Contour Plots
To guide the reader's interpretation of the contour
plots, certain important features of the plots will be
highlighted. The standard diagonal serves as a convenient
spatial reference. Although many details of specific plots
are discussed in Sections 5.2.1 through 5.2.5, focus on the
overall patterns should be maintained. Figures 4 through
17 are organized such that each column on a page represents
a family, and the rows represent subfamilies, arranged in
the column by increasing correlation. Enlarged plots of
the lower left corner are included to show the detail in
that portion of the population space. All plots are
located at the end of Section 5.
5.2.1 Standardized variance
The standardized variance compared the variance of $\hat{T}_y$
under random-order, vps sampling relative to the variance of
$\hat{T}_y$ under simple random sampling (Figure 4). The qualitative
pattern of standardized variance was similar for all three
families. The standardized variance surface was highest
along the left edge of the population space, then sloped
downward moving diagonally from the upper left to the lower
right corner. A trough of minimum standardized variance
was located near the standard diagonal for the medium and
high correlation subfamilies, but was clearly below the
standard diagonal for the low correlation subfamilies. The
surface sloped gradually upward out of this trough when
moving toward the lower righthand corner.
The region in which variable probability sampling was
more efficient than simple random sampling was larger for
high and medium correlation subfamilies compared to the low
correlation subfamilies. The contour showing equal
precision for vps sampling and simple random sampling was
almost directly over the standard diagonal for the three
low correlation subfamilies, and the advantage of variable
probability sampling increased with $\rho(x, y)$. In the upper
left region of the population space, variable probability
sampling was much less efficient than simple random
sampling.
5.2.2 Confidence interval coverage
The main results observed from the contour plots
(Figures 5-8) of observed confidence interval coverage
(nominal 95% intervals) obtained from each variance
estimator were:
1) $v_{YG}^{o}$ and $v_{YG}^{hr}$ provided similar coverage over the
entire population space;
2) $v_{HT}^{hr}$ provided the poorest coverage of the 4
estimators studied;
3) $v_{HT}^{o}$ provided close to the nominal 95% coverage
for most of the population space, but the pattern
of $v_{HT}^{o}$ coverage differed from the pattern shown
by $v_{YG}^{o}$ and $v_{YG}^{hr}$;
4) coverage was poorest along the extreme left
edge of the population space for all variance
estimators except $v_{HT}^{hr}$;
5) the relief of the surfaces increased with
subfamily correlation;
6) qualitative patterns in coverage were similar
for all three families.
For most of the population space, coverage provided by
$v_{HT}^{o}$ was near the nominal 95% (Figure 5). Regions of low
coverage occurred along the extreme left edge of the
population space, and in the high correlation subfamilies
with small $\bar{Y}'$. A wide plateau of high coverage extended
from the upper right corner toward the lower left corner,
narrowing toward the origin roughly parallel along the
standard diagonal. The coverage surface sloped steeply
downward off the left edge of the high plateau, the
contours nearly parallel to the vertical axis. Another
sharp decline in the surface occurred along the standard
diagonal in the region near the origin. The downward slope
of coverage off the high plateau was much gentler toward
the lower right region of the population space. The
gradients of the coverage surfaces were steeper as the
subfamily correlation increased. Regions of coverage
provided by $v_{HT}^{o}$ that were higher or much lower than the 95%
nominal level were associated with regions of large
positive and large negative relative bias of $v_{HT}^{o}$.
Coverage provided by $v_{HT}^{hr}$ was generally much worse than
coverage provided by $v_{HT}^{o}$ (Figure 7). $v_{HT}^{hr}$ had very poor
coverage in the region surrounding the standard diagonal.
This region of poor coverage of $v_{HT}^{hr}$ was associated with a
region of high probability of negative estimates (see
Figure 17). Coverage levels for $v_{HT}^{hr}$ were unacceptably low
for the high correlation subfamilies.
The coverage provided by $v_{YG}^{o}$ and $v_{YG}^{hr}$ (Figures 6 and 8)
was 93 or 94% for most of the population space. $v_{YG}^{o}$ and
$v_{YG}^{hr}$ had lower coverage than $v_{HT}^{o}$ along the extreme left edge
of the population space, but both Yates-Grundy based
estimators improved on the coverage of $v_{HT}^{o}$ in the high
correlation subfamilies in the region near the horizontal
axis.
5.2.3 Ratios of RMSE
Comparison of the variance estimators on the criterion
of RMSE (Figures 9-12) was based on selected ratios of
RMSE's. The main features of the RMSE comparisons were:
1) $v_{YG}^{hr}$ had smaller RMSE than $v_{HT}^{hr}$ for most of the
population space, but RMSE of $v_{HT}^{hr}$ was less than
or equal to RMSE of $v_{YG}^{hr}$ in some regions of the
population space;
2) $v_{YG}^{o}$ was almost always smaller in RMSE relative
to $v_{HT}^{o}$;
3) $v_{HT}^{o}$ was far superior to $v_{HT}^{hr}$ along the standard
diagonal, and never much poorer than $v_{HT}^{hr}$ in any
region;
4) $v_{YG}^{o}$ and $v_{YG}^{hr}$ had very similar RMSE, with $v_{YG}^{o}$
having slightly smaller RMSE in populations
located near the origin;
5) ratios of RMSE's showed greater variation over
the population space in the high correlation
subfamilies compared to the low correlation
subfamilies.
The surface of the ratio RMSE($v_{HT}^{hr}$)/RMSE($v_{YG}^{hr}$) was
roughly symmetrical about the standard diagonal. A region
of high ratios extended along the standard diagonal sloping
downward from the upper right to the lower left. The
gradient of this slope increased with the subfamily
correlation, and was particularly steep in STREAM99.
Throughout much of the space pictured in Figure 9, $v_{YG}^{hr}$ had
smaller RMSE than $v_{HT}^{hr}$. Near the origin, $v_{HT}^{hr}$ had RMSE less
than or equal to $v_{YG}^{hr}$ in the low correlation subfamilies, so
$v_{YG}^{hr}$ was not uniformly superior to $v_{HT}^{hr}$ on the basis of the
RMSE criterion. However, $v_{YG}^{hr}$ was never much worse than $v_{HT}^{hr}$
in terms of RMSE, while $v_{HT}^{hr}$ could be extremely poor
relative to $v_{YG}^{hr}$. Since relative bias was nearly zero for
both $v_{HT}^{hr}$ and $v_{YG}^{hr}$, these RMSE comparisons were essentially
equivalent to variance comparisons.
RMSE of $v_{HT}^{o}$ was less than the RMSE of $v_{HT}^{hr}$ in the
region surrounding the standard diagonal (Figure 10). A
prominent feature of the surface of the ratio of RMSE of
$v_{HT}^{o}$ to RMSE of $v_{HT}^{hr}$ was a deep, V-shaped trough along the
standard diagonal sloping downward and widening toward the
upper right corner. This trough was deepest in the high
correlation subfamilies as the RMSE advantage of $v_{HT}^{o}$
relative to $v_{HT}^{hr}$ increased with the subfamily correlation.
RMSE of $v_{HT}^{hr}$ was smaller than RMSE of $v_{HT}^{o}$ along the left
edge of the population space, and in the region near the
origin of the low and medium correlation subfamilies.
Regions of superiority of $v_{HT}^{o}$ relative to $v_{HT}^{hr}$ corresponded
to the regions in Figure 9 where $v_{YG}^{hr}$ was far superior to
$v_{HT}^{hr}$. Thus the approximation $\pi_{ij}^{o}$ improved the RMSE of the
Horvitz-Thompson variance estimator in those regions of the
population space where $v_{HT}^{hr}$ had high RMSE.
RMSE of $v_{YG}^{o}$ was smaller than RMSE of $v_{HT}^{o}$ for most of
the population space (Figure 11). The surface of the ratio
of RMSE's of $v_{YG}^{o}$ to $v_{HT}^{o}$ decreased gradually from the upper
left corner to the lower right corner of the population
space, the contours of the surface running roughly parallel
to the standard diagonal. The detailed plots of the region
near the origin indicated a U-shaped ridge sloping
gradually downward toward the origin along the standard
diagonal. Although the gradients in the surfaces increased
with correlation, the surfaces were less steep than those
observed in Figures 9 and 10.
$v_{YG}^{hr}$ and $v_{YG}^{o}$ had almost identical RMSE throughout the
population space (Figure 12). $v_{YG}^{o}$ had slightly smaller
RMSE in populations located near the origin. This pattern
was consistent for all three subfamilies.
5.2.4 Relative Bias
Of the four variance estimators investigated, only $v_{HT}^{o}$
displayed a significant relative bias (Figure 13). The
other three variance estimators were nearly unbiased
(Figures 14-16), with the exception that relative bias of
$v_{YG}^{o}$ was above -0.10 along the extreme left edge of the
population space near the origin for two of the STREAM
subfamilies. The pattern of relative bias of $v_{HT}^{o}$ was
similar in all families. Relative bias of $v_{HT}^{o}$ decreased
from a high positive value in the upper left region to a
high negative value in the lower right region, and the
magnitude of the bias was largest in the high correlation
subfamilies. $v_{HT}^{o}$ was unbiased in a band of the population
space located just below and roughly parallel to the
standard diagonal. The strong pattern in the bias of $v_{HT}^{o}$
suggests that an adjustment of the estimator to make it
unbiased may be available.
5.2.5 Probability of Negative $v_{HT}^{hr}$ Estimates
For all samples obtained in the simulation study, $v_{YG}^{o}$
and $v_{YG}^{hr}$ were non-negative, and negative $v_{HT}^{o}$ estimates were
extremely rare. Only $v_{HT}^{hr}$ was subject to frequent negative
estimates (Figure 17). Negative $v_{HT}^{hr}$ estimates were rare
for the low correlation subfamilies, but increased in
frequency as the subfamily correlation increased. Negative
estimates were infrequent in all subfamilies along the left
edge of the population space and along the horizontal axis.
The probability of a negative estimate was highest along
the standard diagonal, and increased along this diagonal
from the lower left to the upper right.
LIST OF FIGURES
1 Scatter Plots of Subfamily Populations at Standardized Centroid (7,7).
2 Population Quantile-Quantile Plots of the Variable x for the 3 Families.
3 Population Quantile-Quantile Plots of the Variable y for the 3 Families.
4 Standardized Variance, $V(\hat{T}_y)/V_{SRS}$.
5 Confidence Interval Coverage Obtained using $v_{HT}^{o}$ (nominal 95% intervals).
6 Confidence Interval Coverage Obtained using $v_{YG}^{o}$ (nominal 95% intervals).
7 Confidence Interval Coverage Obtained using $v_{HT}^{hr}$ (nominal 95% intervals).
8 Confidence Interval Coverage Obtained using $v_{YG}^{hr}$ (nominal 95% intervals).
9 Ratios of Root Mean Square Errors: RMSE($v_{HT}^{hr}$)/RMSE($v_{YG}^{hr}$).
10 Ratios of Root Mean Square Errors: RMSE($v_{HT}^{o}$)/RMSE($v_{HT}^{hr}$).
11 Ratios of Root Mean Square Errors: RMSE($v_{HT}^{o}$)/RMSE($v_{YG}^{o}$).
12 Ratios of Root Mean Square Errors: RMSE($v_{YG}^{o}$)/RMSE($v_{YG}^{hr}$).
13 Relative bias of $v_{HT}^{o}$.
14 Relative bias of $v_{YG}^{o}$.
15 Relative bias of $v_{HT}^{hr}$.
16 Relative bias of $v_{YG}^{hr}$.
17 Probability of a Sample with Negative $v_{HT}^{hr}$.
[Figure 1. Scatter plots of subfamily populations; recoverable panel labels: STREAM (corr = .50, .82, .99), BIGAMMA (corr = .49, .77, .97), GAMNORM (corr = 0.48, 0.75, 0.94). Plot not recoverable.]
Figure 2. Population Quantile-Quantile Plots of the
Variable x for the 3 Families.
(Middle correlation subfamilies at $(\bar{X}', \bar{Y}') = (7, 7)$.
Recoverable axis labels: BIGAMMA77, GAMNORM75, STREAM. Plot not recoverable.)
Figure 3. Population Quantile-Quantile Plots of the
Variable y for the 3 Families.
(Middle correlation subfamilies at $(\bar{X}', \bar{Y}') = (7, 7)$.
Recoverable axis labels: BIGAMMA77, GAMNORM75, STREAM. Plot not recoverable.)
-27-
Figure 4. Standardized Variance, V(T_π)/V_SRS.
a) Complete Population Space.
b) Enlargement of Lower Left Corner.
(Contours plotted: 0.25, 0.50, 1.0, and 3.0.)
[Contour plots for the nine subfamilies (GAMNORM48, BIGAMMA49, STREAM50, GAMNORM75, BIGAMMA77, STREAM82, GAMNORM94, BIGAMMA97, STREAM99) are not recoverable from the scanned source.]
Figure 5. Confidence Interval Coverage Obtained
using v_HT (nominal 95% intervals).
a) Complete Population Space.
b) Enlargement of Lower Left Corner.
(Contours plotted: 85, 90, 93, 95, and 97)
[Contour plots for the nine subfamilies (GAMNORM48, BIGAMMA49, STREAM50, GAMNORM75, BIGAMMA77, STREAM82, GAMNORM94, BIGAMMA97, STREAM99) are not recoverable from the scanned source.]
Figure 6. Confidence Interval Coverage Obtained
using v_YG (nominal 95% intervals).
a) Complete Population Space.
b) Enlargement of Lower Left Corner.
(Contours plotted: 85, 90, 93, 95, and 97)
[Contour plots for the nine subfamilies (GAMNORM48, BIGAMMA49, STREAM50, GAMNORM75, BIGAMMA77, STREAM82, GAMNORM94, BIGAMMA97, STREAM99) are not recoverable from the scanned source.]
Figure 7. Confidence Interval Coverage Obtained
using v_HT^hr (nominal 95% intervals).
a) Complete Population Space.
b) Enlargement of Lower Left Corner.
(Contours plotted: 70, 80, 90, 93, and 95)
[Contour plots for the nine subfamilies (GAMNORM48, BIGAMMA49, STREAM50, GAMNORM75, BIGAMMA77, STREAM82, GAMNORM94, BIGAMMA97, STREAM99) are not recoverable from the scanned source.]
Figure 8. Confidence Interval Coverage Obtained
using v_YG^hr (nominal 95% intervals).
a) Complete Population Space.
b) Enlargement of Lower Left Corner.
(Contours plotted: 85, 90, 93, 95, and 97)
[Contour plots for the nine subfamilies (GAMNORM48, BIGAMMA49, STREAM50, GAMNORM75, BIGAMMA77, STREAM82, GAMNORM94, BIGAMMA97, STREAM99) are not recoverable from the scanned source.]
Figure 9. Ratios of Root Mean Square Errors:
RMSE(v_HT^hr)/RMSE(v_YG^hr).
a) Complete Population Space.
b) Enlargement of Lower Left Corner.
(Contours plotted: 0.8, 1.0, 1.2, 2.0, 5.0)
[Contour plots for the nine subfamilies (GAMNORM48, BIGAMMA49, STREAM50, GAMNORM75, BIGAMMA77, STREAM82, GAMNORM94, BIGAMMA97, STREAM99) are not recoverable from the scanned source.]
Figure 10. Ratios of Root Mean Square Errors:
RMSE(v_HT)/RMSE(v_HT^hr).
a) Complete Population Space.
b) Enlargement of Lower Left Corner.
(Contours plotted: 0.8, 1.0, 1.2, 2.0, 5.0)
[Contour plots are not recoverable from the scanned source.]
Figure 11. Ratios of Root Mean Square Errors:
RMSE(v_HT)/RMSE(v_YG).
a) Complete Population Space.
b) Enlargement of Lower Left Corner.
(Contours plotted: 0.8, 1.0, 1.2, 2.0, 5.0)
[Contour plots for the nine subfamilies (GAMNORM48, BIGAMMA49, STREAM50, GAMNORM75, BIGAMMA77, STREAM82, GAMNORM94, BIGAMMA97, STREAM99) are not recoverable from the scanned source.]
Figure 12. Ratios of Root Mean Square Errors:
RMSE(v_YG)/RMSE(v_YG^hr).
a) Complete Population Space.
b) Enlargement of Lower Left Corner.
(Contours plotted: 0.8, 1.0, 1.2, 2.0, 5.0)
[Contour plots are not recoverable from the scanned source.]
Figure 13. Relative Bias of v_HT.
a) Complete Population Space.
b) Enlargement of Lower Left Corner.
(Contours plotted are: -0.10, 0.0, 0.10)
[Contour plots for the nine subfamilies (GAMNORM48, BIGAMMA49, STREAM50, GAMNORM75, BIGAMMA77, STREAM82, GAMNORM94, BIGAMMA97, STREAM99) are not recoverable from the scanned source.]
Figure 14. Relative Bias of v_YG.
a) Complete Population Space.
b) Enlargement of Lower Left Corner.
[Contour plots for the nine subfamilies (GAMNORM48, BIGAMMA49, STREAM50, GAMNORM75, BIGAMMA77, STREAM82, GAMNORM94, BIGAMMA97, STREAM99) are not recoverable from the scanned source.]
Figure 15. Relative Bias of v_HT^hr.
[Contour plots for the nine subfamilies (GAMNORM48, BIGAMMA49, STREAM50, GAMNORM75, BIGAMMA77, STREAM82, GAMNORM94, BIGAMMA97, STREAM99) are not recoverable from the scanned source.]
Figure 16. Relative Bias of v_YG^hr.
[Contour plots for the nine subfamilies (GAMNORM48, BIGAMMA49, STREAM50, GAMNORM75, BIGAMMA77, STREAM82, GAMNORM94, BIGAMMA97, STREAM99) are not recoverable from the scanned source.]
Figure 17. Probability of a Sample with Negative v_HT^hr.
a) Complete Population Space.
b) Enlargement of Lower Left Corner.
(Contours plotted: 0.05, 0.10, 0.20, 0.30)
[Contour plots for the nine subfamilies (GAMNORM48, BIGAMMA49, STREAM50, GAMNORM75, BIGAMMA77, STREAM82, GAMNORM94, BIGAMMA97, STREAM99) are not recoverable from the scanned source.]
6. ANALYSIS AND DESIGN ISSUES
The exploration of the population space revealed some
potentially useful survey design and analysis
considerations for random-order, vps sampling. Given
information about the correlation, population centroid, and
distribution of x and y, the population space assessment
provides guidance on the choice of a variance estimator for
specified survey objectives. Example recommendations are:
1) if the population is such that variable
probability sampling has better precision than
simple random sampling, v_HT^hr should not be used;
2) none of the variance estimators works well if the
population is located near the extreme left edge
of the population space, but the random-order, vps
design is inefficient in this circumstance and
should be avoided (see later comments on shifting
populations out of this region);
3) v_HT provides confidence intervals possessing good
coverage for most populations, sometimes at the
expense of positive bias and wider confidence
intervals than those obtained using v_YG or v_YG^hr;
4) v_YG is recommended over v_YG^hr since these two
estimators have similar properties and v_YG is
easier to compute.
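As a minimal sketch of what the estimators compared above compute, the following Python code evaluates the Horvitz-Thompson estimator of the total and the Horvitz-Thompson and Yates-Grundy-Sen forms of the variance estimator for one sample. The formulas are the standard textbook forms, not taken from this report; the function names are illustrative; and the matrix of pairwise inclusion probabilities `pij` is assumed to be supplied by the caller, so which exact or approximate formula produces it is left open.

```python
import numpy as np

def ht_total(y, pi):
    """Horvitz-Thompson estimate of the population total: sum of y_i / pi_i."""
    y, pi = np.asarray(y, float), np.asarray(pi, float)
    return float(np.sum(y / pi))

def v_ht(y, pi, pij):
    """Horvitz-Thompson form of the variance estimator.

    pij is an n x n matrix of pairwise inclusion probabilities for the
    sampled units (assumed supplied by the caller); its diagonal is ignored.
    """
    y, pi, pij = (np.asarray(a, float) for a in (y, pi, pij))
    z = y / pi
    single = np.sum((1.0 - pi) * z**2)          # sum (1 - pi_i) / pi_i^2 * y_i^2
    off = ~np.eye(len(y), dtype=bool)           # mask of off-diagonal pairs
    w = np.zeros_like(pij)
    w[off] = (pij[off] - np.outer(pi, pi)[off]) / pij[off]
    cross = np.sum(w * np.outer(z, z))          # sum over i != j
    return float(single + cross)

def v_yg(y, pi, pij):
    """Yates-Grundy-Sen form of the variance estimator (fixed sample size)."""
    y, pi, pij = (np.asarray(a, float) for a in (y, pi, pij))
    z = y / pi
    i, j = np.triu_indices(len(y), k=1)         # each pair i < j once
    w = (pi[i] * pi[j] - pij[i, j]) / pij[i, j]
    return float(np.sum(w * (z[i] - z[j]) ** 2))

# Illustration: simple random sampling of n = 2 from N = 4, where
# pi_i = n/N and pi_ij = n(n-1) / (N(N-1)).
y_s = [1.0, 3.0]
pi_s = [0.5, 0.5]
pij_s = [[0.5, 1 / 6], [1 / 6, 0.5]]
print(ht_total(y_s, pi_s), v_ht(y_s, pi_s, pij_s), v_yg(y_s, pi_s, pij_s))
```

In the simple random sampling illustration the two variance forms coincide; under variable probability designs they generally differ from sample to sample, and the Horvitz-Thompson form can be negative.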
Of importance to survey design, the population space
analysis showed that shifting populations away from the
left edge of the population space resulted in improved
properties of the variance estimators (except v_HT^hr) and
improved efficiency of the estimator T_π. A horizontal
population shift is easily accomplished in the survey
design by adding a constant to all population x's so that
x_i* = x_i + c, then sampling with inclusion probability
proportional to x*. The standardized variance plots
(Figure 4) provide guidance for advantageous population
space locations.
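The horizontal shift can be illustrated with a short Python sketch. It assumes a fixed sample size n and inclusion probabilities proportional to the shifted size variable, pi_i = n(x_i + c) / sum_j (x_j + c); the function name and the example values are hypothetical.

```python
import numpy as np

def inclusion_probs(x, n, c=0.0):
    """Inclusion probabilities proportional to the shifted variable x* = x + c."""
    x_star = np.asarray(x, float) + c
    if np.any(x_star <= 0):
        raise ValueError("shifted size variable must be positive")
    pi = n * x_star / x_star.sum()
    if np.any(pi > 1.0):
        raise ValueError("some inclusion probabilities exceed 1")
    return pi

x = [1.0, 2.0, 3.0, 4.0]
print(inclusion_probs(x, n=2))          # unshifted: smallest pi is 0.2
print(inclusion_probs(x, n=2, c=10.0))  # shifted: smallest pi rises to 0.44
```

A larger c pulls all inclusion probabilities toward n/N, which removes extremely small values at the cost of weakening the proportionality to x.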
Shifting in the horizontal direction eliminates
extremely small π's, but deciding how far to shift the
population is a complication. Reddy and Rao (1977)
considered modifying the x values at the analysis stage to
improve precision of the estimator T_π. Their theoretical
results may provide some information on how far to shift
the population at the design stage. If small π's
detrimental to the precision of the estimators are not
eliminated at the design stage, strategies "scoring" the
small π's to a higher value can be employed to reduce MSE
(Overton and Stehman, 1987; Potter, 1988).
Vertical shifts of a population to a more desirable
region in the population space could also be considered to
improve estimates after the sample data have been
collected. Because the most drastic gradients in the
population space surfaces were usually perpendicular to the
horizontal axis, the advantages of a vertical shift in a
population appear minor relative to the potential gains of
a horizontal shift.
7. CONCLUSIONS
The population space assessment proved successful in
strengthening the conclusions available from empirical
studies, and in discovering associations of behaviors of
the variance estimators with characteristics of the
populations. Previous empirical studies (Cumberland and
Royall, 1981; Rao and Singh, 1973) did not reveal these
patterns because they focused on a more restricted set of
high correlation populations located near the standard
diagonal. The standard diagonal was a region of special
behavior, but more general conclusions were obtained in the
population space analysis by systematically exploring a
wide variety of structured populations.
Summarizing the important findings of the population
space assessment:
1) Properties of v_YG and v_YG^hr were virtually
identical, so the simpler form v_YG should be
used in practice;
2) v_HT^hr performed the poorest of the four variance
estimators, and this estimator should be avoided;
3) The worst behavior of v_HT^hr was in the region of
the population space around the standard
diagonal, precisely the region of populations
examined in past empirical studies; past
emphasis on these populations contributed to the
perception that v_YG was superior to v_HT;
4) The performance of v_HT was far superior to that
of v_HT^hr, particularly for populations in the
region of the standard diagonal;
5) The extreme left edge of the population space
was a region of poor behavior for random-order,
vps sampling.
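The findings above rest on the summary measures shown in the figures: confidence interval coverage, relative bias, root mean square error, and the probability of a negative variance estimate. The Python sketch below shows one way such measures could be computed from simulated replicates. It is an illustration under stated assumptions, not the report's code: intervals are taken as T_π ± 1.96 sqrt(v), relative bias as (mean(v) - V)/V, a negative variance estimate is given a zero-width interval, and the function and argument names are hypothetical.

```python
import numpy as np

def estimator_summary(t_hat, v_hat, true_total, true_var, z=1.96):
    """Summarize simulated replicates of an estimator and its variance estimator.

    t_hat, v_hat: arrays of estimates of the total and of its variance,
    one entry per simulated sample.
    """
    t_hat = np.asarray(t_hat, float)
    v_hat = np.asarray(v_hat, float)
    # Assumption: a negative variance estimate yields a zero-width interval.
    half_width = z * np.sqrt(np.clip(v_hat, 0.0, None))
    covered = np.abs(t_hat - true_total) <= half_width
    return {
        "coverage": float(np.mean(covered)),
        "relative_bias": float((np.mean(v_hat) - true_var) / true_var),
        "rmse": float(np.sqrt(np.mean((v_hat - true_var) ** 2))),
        "prob_negative": float(np.mean(v_hat < 0)),
    }

# Four made-up replicates, for illustration only.
summary = estimator_summary(
    t_hat=[8.0, 10.0, 12.0, 10.0],
    v_hat=[4.0, 4.0, -1.0, 1.0],
    true_total=10.0,
    true_var=2.0,
)
print(summary)
```

A ratio of RMSEs such as those in Figures 9 to 12 would then be the quotient of the `rmse` entries for two variance estimators evaluated on the same replicates.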
Patterns in the behaviors of the variance estimators
were consistent across all three families. Surfaces for
the STREAM family were usually steeper, possibly because
the sampling fraction was higher for this family. Although
only samples of size 16 were investigated, the results
observed in the population space assessment were consistent
with results observed in previous empirical studies for
other sample sizes and populations (cf. Stehman and
Overton, 1987a; Rao and Singh, 1973).
The population space analysis is similar in philosophy
to a superpopulation model concept because a model was used
to generate the base populations for the BIGAMMA and
GAMNORM families. The population space results were,
therefore, representative of a broad class of populations.
But as in any empirical study, the results were dependent
on the particular realizations of the random variables
generated in creating the BIGAMMA and GAMNORM families.
The behavior surfaces represented a single realization of
these families, whereas, ideally, the mean trajectory or
surface would be described. Another source of variability
in the representation of the estimator properties was that
the behavior surfaces were estimated by simulation; that
is, the contour plots were not exact representations of the
true surfaces and were subject to some sampling
variability.
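To make the simulation step concrete, the following Python sketch draws random-order, variable probability systematic samples and forms a Monte Carlo estimate of the standardized variance V(T_π)/V_SRS at one point of the population space. It is a sketch under assumptions, not the procedure used for the report: the number of replicates `reps`, the seed, and the function names are hypothetical, and V_SRS is taken to be the variance of the expansion estimator under simple random sampling without replacement, N^2 (1 - n/N) S_y^2 / n.

```python
import numpy as np

def random_order_vps_sample(pi, rng):
    """Indices of one random-order, variable probability systematic sample.

    pi: inclusion probabilities, each at most 1, summing to the sample size n.
    """
    pi = np.asarray(pi, float)
    n = int(round(pi.sum()))
    order = rng.permutation(len(pi))        # random ordering of the population
    cum = np.cumsum(pi[order])              # cumulated probabilities along that order
    cum[-1] = n                             # guard against round-off in the last sum
    points = rng.uniform() + np.arange(n)   # random start, then unit steps
    pos = np.searchsorted(cum, points, side="right")
    pos = np.minimum(pos, len(pi) - 1)      # guard against a point rounding up to n
    return order[pos]

def standardized_variance(x, y, n, reps, seed=0):
    """Monte Carlo estimate of V(T_pi) / V_SRS for one population."""
    rng = np.random.default_rng(seed)
    x, y = np.asarray(x, float), np.asarray(y, float)
    N = len(x)
    pi = n * x / x.sum()
    t_hat = np.empty(reps)
    for r in range(reps):
        s = random_order_vps_sample(pi, rng)
        t_hat[r] = np.sum(y[s] / pi[s])     # Horvitz-Thompson estimate
    v_srs = N**2 * (1 - n / N) * np.var(y, ddof=1) / n
    return float(np.var(t_hat) / v_srs)

x_pop = np.arange(1.0, 21.0)
print(standardized_variance(x_pop, 2.0 * x_pop, n=4, reps=200))
```

Because the estimate is an average over a finite number of replicates, repeating the call with a different seed gives a slightly different value, which is the sampling variability of the surfaces noted above. When y is exactly proportional to x, as in the usage line, every sample gives the same estimate and the standardized variance is zero up to round-off.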
Theoretical comparison of the variance estimators in
variable probability sampling has proven very difficult.
The consistency of the variance estimator behaviors for
the three families indicates these behaviors to be general,
so a more general theory may be derivable, possibly even an
analytic theory. Empirical identification of these
patterns is an important step towards development of
theoretical understanding.
ACKNOWLEDGEMENTS
Ron Stillinger provided invaluable help in implementing the computing and graphics described in this report. John Carlile assisted with the contour plotting routines.
8. REFERENCES
Brewer, K. R. W. (1963). A model of systematic sampling with unequal probabilities. Austral. J. Statist. 5, 5-13.
Cumberland, W. G., and Royall, R. M. (1981). Prediction models and unequal probability sampling. J. Roy. Statist. Soc. Ser. B 43, 353-367.
Hartley, H. O., and Rao, J. N. K. (1962). Sampling with unequal probability and without replacement. Ann. Math. Statist. 33, 350-374.
Horvitz, D. G., and Thompson, D. J. (1952). A generalization of sampling without replacement from a finite universe. J. Amer. Statist. Assoc. 47, 663-685.
Johnson, N. L., and Kotz, S. (1972). Distributions in Statistics: Continuous Multivariate Distributions. Wiley: New York.
Kennedy, W. J., and Gentle, J. E. (1980). Statistical Computing. Marcel Dekker: New York.
Messer, J.J., C.W. Ariss, J.R. Baker, S.K. Drousé, K.N. Eshleman, P.N. Kaufmann, R.A. Linthurst, J.M. Omernik, W.S. Overton, M.J. Sale, R.D. Schonbrod, S.M. Stanbaugh, and J.R. Tutshall, Jr. (1986). National Surface Water Survey: National Stream Survey, Phase I - Pilot Survey. EPA-600/4-86-026, U.S. Environmental Protection Agency, Washington, D.C.
Overton, W. S. (1985). A Sampling Plan for Streams in the National Stream Survey. Technical Report 114, Department of Statistics, Oregon State University, Corvallis, Oregon, 97331.
Overton, W. S., and Stehman, S. V. (1987). An Empirical Study of Sampling and Other Errors in the National Stream Survey; II. Analysis of a Replicated Sample of Streams. Technical Report 11a, Department of Statistics, Oregon State University, Corvallis, Oregon, 97331.
Potter, F. (1988). Survey of procedures to control extreme sampling weights. To appear in Proceedings of the Section on Survey Research Methods, American Statistical Association Annual Meetings, 1988.
Rao, J. N. K., and Singh, M. P. (1973). On the choice of estimator in survey sampling. Austral. J. Statist. 15, 95-104.
Reddy, V. N., and Rao, T. J. (1977). Modified PPS method of estimation. Sankhya Ser. C 39(3), 185-197.
Sen, A. R. (1953). On the estimate of the variance in sampling with varying probabilities. J. Indian Soc. Agric. Statist. 7, 119-127.
Stehman, S. V., and Overton, W. S. (1987a). Estimating the variance of the Horvitz-Thompson estimator in variable probability, systematic samples. Proceedings of the Section on Survey Research Methods, American Statistical Association Annual Meetings, 1987, pp. 743-748.
Stehman, S. V., and Overton, W. S. (1987b). A comparison of variance estimators of the Horvitz-Thompson estimator in random order, variable probability, systematic sampling. Biometrics Unit Manuscript Bu-M ~', Cornell University, 337 Warren Hall, Ithaca, New York, 14853.
Stehman, S. V., and Overton, W. S. (1989). Pairwise inclusion probability formulas in random-order, variable probability systematic sampling. Biometrics Unit Manuscript Bu-M !QUa, Cornell University, 337 Warren Hall, Ithaca, New York, 14853.
Yates, F., and Grundy, P. M. (1953). Selection without replacement from within strata with probability proportional to size. J. Roy. Statist. Soc. Ser. B 15, 235-261.