A Weighting Class Adjustment Estimator for the Total under a Stratified Sampling Design in a Continuous Domain
Breda Munoz
Virginia Lesser*
Oregon State University
R82-9096-01
This presentation was supported under STAR Research Assistance Agreement No. CR82-9096-01 awarded by the U.S. Environmental Protection Agency to Oregon State University. It has not been formally reviewed by EPA. The views expressed in this document are solely those of the authors, and EPA does not endorse any products or commercial services mentioned in this presentation.
Overview
• Introduction: assumptions
• Estimator for the Total in a continuous domain
• Effect of missing data on the estimator for the total
• Adjustment estimator
Assumptions
• Probability sample $\{\mathbf{s}_1, \ldots, \mathbf{s}_n\}$
• Estimate of the population total of a variable Y: $T_y = \int_R y(\mathbf{s})\,d\mathbf{s}$
• Missing at random
[Figure: John Day stream network, with a color scale running from -5.47 to 9.87]
Estimating the Total
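• Horvitz-Thompson estimator for the total in a continuous domain (Cordy, 1993):
$$\hat{T}_{y,HT} = \sum_{i=1}^{n} \frac{y(\mathbf{s}_i)}{\pi(\mathbf{s}_i)}$$
  - Unbiased
• Estimator for the variance of the total:
$$\widehat{Var}\left(\hat{T}_{y,HT}\right) = \sum_{h=1}^{H}\sum_{i=1}^{n_h} \frac{y^2(\mathbf{s}_{hi})}{\pi^2(\mathbf{s}_{hi})} + \sum_{h=1}^{H}\sum_{h'=1}^{H}\sum_{i=1}^{n_h}\sum_{i' \neq i} \frac{\pi(\mathbf{s}_{hi},\mathbf{s}_{h'i'}) - \pi(\mathbf{s}_{hi})\,\pi(\mathbf{s}_{h'i'})}{\pi(\mathbf{s}_{hi},\mathbf{s}_{h'i'})}\cdot\frac{y(\mathbf{s}_{hi})\,y(\mathbf{s}_{h'i'})}{\pi(\mathbf{s}_{hi})\,\pi(\mathbf{s}_{h'i'})}$$
• Other:
  - Total and variance estimators (Yates and Grundy, 1953)
  - Local variance estimator (Stevens and Olsen, 2003)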
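A minimal Python sketch of the Horvitz-Thompson total and the variance estimator with squared and pairwise cross terms may help make the formulas concrete. The function names, the `joint_pi` callback, and the toy design (four independent uniform points on a domain of length 10) are assumptions made for illustration and are not taken from the presentation.

```python
def ht_total(y, pi):
    """Sum of y(s_i) / pi(s_i) over the sampled sites."""
    return sum(yi / p for yi, p in zip(y, pi))


def ht_variance(y, pi, joint_pi):
    """Squared terms plus pairwise cross terms.

    joint_pi(i, j) is a caller-supplied function (hypothetical interface)
    returning the joint inclusion density of sites i and j.
    """
    z = [yi / p for yi, p in zip(y, pi)]
    var = sum(zi ** 2 for zi in z)  # squared terms
    n = len(y)
    for i in range(n):
        for j in range(n):
            if i != j:  # cross terms over ordered pairs
                pij = joint_pi(i, j)
                var += (pij - pi[i] * pi[j]) / pij * z[i] * z[j]
    return var


# Assumed toy design: n = 4 independent uniform points on a domain of
# length 10, so pi = 4 / 10 and the joint density is 4 * 3 / 10**2.
y = [1.0, 2.0, 3.0, 4.0]
pi = [0.4] * 4
total = ht_total(y, pi)
variance = ht_variance(y, pi, lambda i, j: 0.12)
```

For this toy design the total is 25 and the variance estimate is 125/3 (about 41.67), which agrees with the familiar simple random sampling form (domain length squared times the sample variance of y, divided by n).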
[Figure with legend: observed, missing]
HT-total estimator under missing data
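• All $n$ sampled sites:
$$\hat{T}_{y,HT} = \sum_{i=1}^{n} \frac{y(\mathbf{s}_i)}{\pi(\mathbf{s}_i)}$$
• Observed sites only ($n_1$ sites):
$$\hat{T}_{y,HT} = \sum_{i=1}^{n_1} \frac{y(\mathbf{s}_i)}{\pi(\mathbf{s}_i)}$$
[Figure: three histogram panels, one for each missing-data rate]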
15% missing 30% missing 50% missing
92% 89% 70%
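To illustrate why restricting the unadjusted Horvitz-Thompson sum to observed sites matters, here is a small Python sketch with made-up numbers. The data, the inclusion density, and the response pattern are hypothetical and are not the simulation behind the plots.

```python
def ht_total(y, pi):
    """Sum of y(s_i) / pi(s_i) over the supplied sites."""
    return sum(yi / p for yi, p in zip(y, pi))


# Hypothetical sample of four sites with a common inclusion density.
y = [4.0, 6.0, 5.0, 5.0]
pi = [0.4, 0.4, 0.4, 0.4]
observed = [True, False, True, True]  # assumed response pattern

# Estimate using all sampled sites (what we would compute with no missing data).
full_estimate = ht_total(y, pi)

# Unadjusted estimate using only the observed sites.
y_obs = [yi for yi, r in zip(y, observed) if r]
pi_obs = [p for p, r in zip(pi, observed) if r]
naive_estimate = ht_total(y_obs, pi_obs)
```

Here the full-sample estimate is 50 while the unadjusted estimate from the three observed sites is 35: with positive outcomes, dropping sites without reweighting pulls the estimated total down.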
[Figure: plot comparing missing and observed values; horizontal axis from -5 to 10, vertical axis from 0.00 to 0.20]
Accounting for missing data
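• Adjusted estimator:
$$\hat{T}^*_{y,HT} = \sum_{i=1}^{n_1} y(\mathbf{s}_i)\,w(\mathbf{s}_i)$$
• Inclusion density functions:
$$\pi(\mathbf{s}) = \sum_{h \in H}\sum_{i=1}^{n_h} f_{hi}(\mathbf{s})$$
$$\pi(\mathbf{s},\mathbf{s}') = \sum_{h \in H}\sum_{h' \in H}\sum_{i=1}^{n_h}\sum_{i' \neq i} f_{hi,h'i'}(\mathbf{s},\mathbf{s}')$$
• Expected value:
$$E\left[\hat{T}^*_{y,HT}\right] = \int \sum_{i=1}^{n_1} y(\mathbf{s}_i)\,w(\mathbf{s}_i)\,f(\mathbf{s}_1,\ldots,\mathbf{s}_n)\prod_{i=1}^{n} d\mathbf{s}_i$$
• Under the stratified design:
$$\pi(\mathbf{s}) = \frac{n_h}{|R_j|}, \qquad \pi(\mathbf{s},\mathbf{s}') = \frac{n_h(n_h-1)}{|R_j|^2}$$
• Weight:
$$w(\mathbf{s}_i) = \frac{n_h}{n_{1h}}\cdot\frac{1}{\pi(\mathbf{s}_i)}$$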
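The weighting class adjustment can be sketched in Python as follows. This is an illustrative reading in which each observed site's design weight 1/pi is inflated by n_h / n_1h, the ratio of sampled to observed sites in its stratum. The function name, argument layout, and example values are assumptions, not the authors' code.

```python
from collections import defaultdict


def adjusted_total(y, pi, stratum, observed):
    """Weighting class adjusted total over observed sites.

    y[i] may be None where observed[i] is False; those sites are skipped.
    """
    n_h = defaultdict(int)   # sampled sites per stratum
    n1_h = defaultdict(int)  # observed sites per stratum
    for h, r in zip(stratum, observed):
        n_h[h] += 1
        if r:
            n1_h[h] += 1

    total = 0.0
    for yi, p, h, r in zip(y, pi, stratum, observed):
        if r:
            w = (n_h[h] / n1_h[h]) / p  # adjusted weight for this site
            total += yi * w
    return total


# Hypothetical sample: stratum "a" has 2 sites (1 observed),
# stratum "b" has 3 sites (2 observed).
y = [4.0, None, 5.0, 7.0, None]
pi = [0.5, 0.5, 0.25, 0.25, 0.25]
stratum = ["a", "a", "b", "b", "b"]
observed = [True, False, True, True, False]
estimate = adjusted_total(y, pi, stratum, observed)
```

In this example the weights are 4 in stratum "a" and 6 in stratum "b", so the estimate is 4(4) + 6(5 + 7) = 88. A stratum with no observed sites contributes nothing in this sketch; how to handle that case (for example by collapsing strata) is a design decision the sketch does not address.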
Variance of the Adjustment Estimator
• Observe that:
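$$Var\left(\hat{Y}^*_{HT}\right) = E\left[\left(\hat{Y}^*_{HT}\right)^2\right] - \left(E\left[\hat{Y}^*_{HT}\right]\right)^2$$
$$\left(\hat{Y}^*_{HT}\right)^2 = \left(\sum_{h=1}^{H}\sum_{i=1}^{n_{1h}} w(\mathbf{s}_{hi})\,y(\mathbf{s}_{hi})\right)^2$$
$$= \sum_{h=1}^{H}\sum_{i=1}^{n_{1h}} w^2(\mathbf{s}_{hi})\,y^2(\mathbf{s}_{hi}) + \sum_{h=1}^{H}\sum_{h'=1}^{H}\sum_{i=1}^{n_{1h}}\sum_{i' \neq i}^{n_{1h'}} w(\mathbf{s}_{hi})\,w(\mathbf{s}_{h'i'})\,y(\mathbf{s}_{hi})\,y(\mathbf{s}_{h'i'})$$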
Variance of the Adjustment Estimator
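• Expected value of the cross-product term:
$$\sum_{h=1}^{H}\sum_{h'=1}^{H}\sum_{i=1}^{n_{1h}}\sum_{i' \neq i}^{n_{1h'}} \int\!\!\int w(\mathbf{s})\,w(\mathbf{s}')\,y(\mathbf{s})\,y(\mathbf{s}')\,f_{hi,h'i'}(\mathbf{s},\mathbf{s}')\,d\mathbf{s}\,d\mathbf{s}'$$
• Summed joint densities, by case:
$$\frac{n_{1h}(n_{1h}-1)}{n_h(n_h-1)}\,\pi(\mathbf{s},\mathbf{s}'), \qquad \mathbf{s},\mathbf{s}' \in h\text{th stratum}$$
$$\frac{n_{1h}\,n_{1h'}}{n_h\,n_{h'}}\,\pi(\mathbf{s})\,\pi(\mathbf{s}'), \qquad \mathbf{s} \in \text{stratum } h,\ \mathbf{s}' \in \text{stratum } h'$$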
Variance of the Adjustment Estimator
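$$Var\left(\hat{Y}^*_{HT}\right) = \int w(\mathbf{s})\,y^2(\mathbf{s})\,d\mathbf{s} + \int\!\!\int y(\mathbf{s})\,y(\mathbf{s}')\left[\frac{\pi_w(\mathbf{s},\mathbf{s}')}{\pi_w(\mathbf{s})\,\pi_w(\mathbf{s}')} - 1\right] d\mathbf{s}\,d\mathbf{s}'$$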
Population: John Day Middle Fork stream reaches
• Area of 785 mi²
• 143 stream reaches divided into survey segments (~1 mile), giving 6,536 survey segments
• We simulate a continuous multivariate normal spatial random process
Population: John Day Middle Fork stream reaches
• The population of stream reaches was stratified into 6 strata based on the number of survey segments:
“<10”, “10-20”, “20-30”, “30-50”, “50-100”, “>100”
• 1,000 samples of size 100
Data
Site   Outcome   Strata   U     Prob. response   If U<1-P then R=0
S1     Y1        1        U1    .70              missing
S2     Y2        1        U2    .70              observed
S3     Y3        2        U3    .85              observed
S4     Y4        2        U4    .85              missing
S5     Y5        3        U5    .90              observed
S6     Y6        3        U6    .90              missing
S7     Y7        4        U7    .75              observed
S8     Y8        4        U8    .75              observed
S9     Y9        5        U9    .80              observed
S10    Y10       5        U10   .80              missing
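The missingness rule in the table (draw U, then set R=0 when U<1-P) can be sketched in Python. The response probabilities are the ones listed in the table. The seed, the helper name `response_indicator`, and the assumption that U is uniform on [0, 1) are illustrative choices, not details given in the presentation.

```python
import random


def response_indicator(u, p):
    """Return R = 0 (missing) if u < 1 - p, otherwise R = 1 (observed)."""
    return 0 if u < 1 - p else 1


# Response probabilities by stratum, as listed in the table.
response_prob = {1: 0.70, 2: 0.85, 3: 0.90, 4: 0.75, 5: 0.80}

# Ten sites, two per stratum, mirroring the table layout.
strata = [1, 1, 2, 2, 3, 3, 4, 4, 5, 5]

rng = random.Random(0)  # fixed seed, chosen arbitrarily for repeatability
status = []
for h in strata:
    u = rng.random()  # U assumed uniform on [0, 1)
    r = response_indicator(u, response_prob[h])
    status.append("observed" if r == 1 else "missing")
```

Under this rule a site in a stratum with response probability P is observed with probability P, so strata with lower P lose more sites on average.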
[Figure: three histogram panels, one for each missing-data rate]
15% Missing Rate 30% Missing Rate 50% Missing Rate
94.8% 94.1% 77.4%