Chapter 10: Variance Estimation
Jae-Kwang Kim
Iowa State University
Fall, 2014
Kim (ISU) Ch. 10: Fall, 2014 1 / 43
1 Introduction
2 Taylor series linearization
3 Replication variance estimation
Introduction

Use of variance estimates in sampling:
- Inferential purpose: constructing confidence intervals, hypothesis testing
- Descriptive purpose: evaluation of survey estimates, future survey planning

What is a good variance estimator?
- Unbiased, or nearly unbiased (a positive bias is conservative)
- Stable: the variance of the variance estimator is low
- Nonnegative
- Simple to calculate
Introduction

HT variance estimator (or SYG variance estimator): some problems
1 Can take negative values.
2 Requires the joint inclusion probabilities $\pi_{ij}$, which can be cumbersome to compute for large sample sizes.
Variance of variance estimator

Parameter of interest: $V(\hat\theta)$.
Let $\hat V$ be an (unbiased) estimator of $V(\hat\theta)$. We may assume that
$$\frac{d \hat V}{V(\hat\theta)} \sim \chi^2(d)$$
for some $d$ (the degrees of freedom of $\hat V$).
Variance of variance estimator (Cont'd)

By the properties of the $\chi^2$ distribution,
$$E(\hat V) = V(\hat\theta) \quad \text{and} \quad V(\hat V) = \frac{2\{V(\hat\theta)\}^2}{d}.$$
Thus,
$$CV(\hat V) = \frac{\sqrt{V(\hat V)}}{E(\hat V)} = \sqrt{\frac{2}{d}}.$$
How to compute $d$?
1 Method of moments: requires an estimate of $V(\hat V)$.
2 Rule of thumb: use $d = n_{PSU} - H$, where $n_{PSU}$ is the number of sampled PSUs and $H$ is the number of strata.
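As a quick numeric sketch of the rule of thumb above (the PSU and stratum counts below are hypothetical), the coefficient of variation $\sqrt{2/d}$ shrinks only slowly as the degrees of freedom grow:

```python
import math

def variance_estimator_cv(d):
    """CV of a variance estimator with d degrees of freedom,
    under the chi-square approximation d*Vhat/V ~ chi2(d)."""
    return math.sqrt(2.0 / d)

# Rule of thumb: d = (number of sampled PSUs) - (number of strata).
n_psu, n_strata = 62, 12
d = n_psu - n_strata
print(round(variance_estimator_cv(d), 3))  # sqrt(2/50) → 0.2
```

Halving the CV thus requires roughly four times the degrees of freedom.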
Alternative to HT variance estimation

Simplified variance estimator: motivation
1 Consider the variance estimator for PPS sampling:
$$\hat V_0 = \frac{1}{n(n-1)} \sum_{i \in A} \left( \frac{y_i}{p_i} - \frac{1}{n} \sum_{j \in A} \frac{y_j}{p_j} \right)^2,$$
which is always nonnegative and simple to compute.
2 What if we use $\hat V_0$ as an estimator for the variance of $\hat Y_{HT} = \sum_{i \in A} y_i/\pi_i$ by treating $\hat Y_{HT} \cong \hat Y_{PPS} = \frac{1}{n} \sum_{i \in A} y_i/p_i$?
3 Simplified variance estimator: use the PPS sampling variance estimator ($\hat V_0$) to estimate the variance of $\hat Y_{HT}$.
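A minimal sketch of $\hat V_0$ in Python (the $y$-values and single-draw probabilities $p_i = \pi_i/n$ below are hypothetical):

```python
def v0_simplified(y, p):
    """Simplified (PPS-style) variance estimator
    V0 = 1/(n(n-1)) * sum_i (y_i/p_i - (1/n) sum_j y_j/p_j)^2.
    Always nonnegative and needs no joint inclusion probabilities."""
    n = len(y)
    z = [yi / pi for yi, pi in zip(y, p)]
    zbar = sum(z) / n
    return sum((zi - zbar) ** 2 for zi in z) / (n * (n - 1))

# Hypothetical sample of n = 4 with p_i = pi_i / n.
y = [3.0, 5.0, 2.0, 8.0]
p = [0.25, 0.5, 0.125, 0.125]
print(v0_simplified(y, p))  # → 166.25
```

When $y_i/p_i$ is constant (exact PPS to size), $\hat V_0$ is exactly zero, reflecting that PPS sampling is then optimal.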
Theorem

$$E(\hat V_0) - \mathrm{Var}(\hat Y_{HT}) = \frac{n}{n-1} \left\{ \mathrm{Var}(\hat Y_{PPS}) - \mathrm{Var}(\hat Y_{HT}) \right\}$$
where $\mathrm{Var}(\hat Y_{PPS})$ is the variance of $\hat Y_{PPS}$ using $p_k = \pi_k/n$ as the selection probability for unit $k$ in each PPS draw, and
$$\mathrm{Var}(\hat Y_{PPS}) = \frac{1}{n} \sum_{i=1}^{N} p_i \left( \frac{y_i}{p_i} - Y \right)^2.$$
Remark

1 In most cases the bias is positive (thus, conservative estimation).
2 Under SRS, the relative bias of the simplified variance estimator is
$$\frac{E(\hat V_0) - \mathrm{Var}(\hat Y_{HT})}{\mathrm{Var}(\hat Y_{HT})} = \frac{n}{N-n}$$
and it is negligible if $n/N$ is negligible.
Remark (Cont'd)

3 Application to multi-stage sampling: express
$$\hat Y_{HT} = \sum_{i \in A_I} \frac{\hat Y_i}{\pi_{Ii}}.$$
The resulting simplified variance estimator can be written
$$\hat V_0 = \frac{1}{n(n-1)} \sum_{i \in A_I} \left( \frac{\hat Y_i}{p_i} - \hat Y_{HT} \right)^2 = \frac{n}{n-1} \sum_{i \in A_I} \left( \frac{\hat Y_i}{\pi_{Ii}} - \frac{1}{n} \hat Y_{HT} \right)^2$$
where $p_i = \pi_{Ii}/n$ and $n$ is the number of sampled PSUs.
Remark (Cont'd)

4 The bias is negligible if the primary sampling rate is negligible. If the sampling design is also a stratified (multi-stage) sampling such that
$$\hat Y_{HT} = \sum_{h=1}^{H} \sum_{i=1}^{n_h} w_{hi} \hat Y_{hi},$$
where $A_{Ih} = \{1, 2, \cdots, n_h\}$, the simplified variance estimator can be written
$$\hat V_0 = \sum_{h=1}^{H} \frac{n_h}{n_h - 1} \sum_{i=1}^{n_h} \left( w_{hi} \hat Y_{hi} - \frac{1}{n_h} \sum_{j=1}^{n_h} w_{hj} \hat Y_{hj} \right)^2.$$
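The stratified form of $\hat V_0$ can be sketched as follows (the weights and estimated PSU totals below are hypothetical):

```python
def v0_stratified(w, y):
    """Simplified variance estimator for stratified (multi-stage) sampling:
    V0 = sum_h n_h/(n_h-1) * sum_i (w_hi*Y_hi - mean_j w_hj*Y_hj)^2.
    `w` and `y` are per-stratum lists of PSU weights and estimated PSU totals."""
    v = 0.0
    for wh, yh in zip(w, y):
        nh = len(wh)
        t = [whi * yhi for whi, yhi in zip(wh, yh)]
        tbar = sum(t) / nh
        v += nh / (nh - 1) * sum((ti - tbar) ** 2 for ti in t)
    return v

# Two hypothetical strata with 3 and 2 sampled PSUs.
print(v0_stratified([[2.0, 2.0, 2.0], [3.0, 3.0]],
                    [[4.0, 6.0, 5.0], [7.0, 9.0]]))  # → 48.0
```

Only the weighted PSU totals within each stratum are needed, which is why this "ultimate cluster" form is the workhorse in production survey software.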
Taylor series linearization

- Estimate the variance of a nonlinear estimator by approximating the estimator by a linear function.
- First-order Taylor linearization: for $p$-dimensional $\bar{\mathbf{y}}$, if $\bar{\mathbf{y}}_n = \bar{\mathbf{Y}}_N + O_p(n^{-1/2})$, then
$$g(\bar{\mathbf{y}}_n) = g(\bar{\mathbf{Y}}) + \sum_{j=1}^{p} \frac{\partial g(\bar{\mathbf{Y}})}{\partial y_j} \left( \bar y_{jn} - \bar Y_j \right) + O_p(n^{-1}).$$
- Linearized variance:
$$V\{g(\bar{\mathbf{y}}_n)\} \doteq \sum_{i=1}^{p} \sum_{j=1}^{p} \frac{\partial g(\bar{\mathbf{Y}})}{\partial y_i} \frac{\partial g(\bar{\mathbf{Y}})}{\partial y_j} \mathrm{Cov}\{\bar y_{in}, \bar y_{jn}\}.$$
Two methods of obtaining linearized variance estimation

1 Direct method: use
$$\hat V\{g(\bar{\mathbf{y}}_n)\} \doteq \sum_{i=1}^{p} \sum_{j=1}^{p} \frac{\partial g(\bar{\mathbf{y}}_n)}{\partial y_i} \frac{\partial g(\bar{\mathbf{y}}_n)}{\partial y_j} \hat C\{\bar y_{in}, \bar y_{jn}\}.$$
2 Residual technique:
  1 Obtain a first-order Taylor expansion to get
  $$g(\bar{\mathbf{y}}_n) \doteq g(\bar{\mathbf{Y}}) + \frac{1}{N} \sum_{i \in A} \frac{1}{\pi_i} e_i$$
  for some $e_i$.
  2 The variance of $g(\bar{\mathbf{y}}_n)$ is then approximated by the variance of $N^{-1} \sum_{i \in A} \pi_i^{-1} e_i$. If we observed the $e_i$, we could estimate this variance directly; in practice, obtain a variance estimator of $N^{-1} \sum_{i \in A} \pi_i^{-1} e_i$ and replace $e_i$ by $\hat e_i$.
Example: Ratio

$$\hat R = \frac{\bar y}{\bar x}, \qquad R = \frac{\bar Y}{\bar X}$$
Taylor expansion:
$$\hat R = R + \bar X^{-1} (\bar y - R \bar x) + O_p(n^{-1})$$
Method 1 (direct method):
$$\hat V(\hat R) \doteq \bar x^{-2} \hat V(\bar y) + \bar x^{-2} \hat R^2 \hat V(\bar x) - 2 \bar x^{-2} \hat R \hat C(\bar x, \bar y)$$
Method 2 (residual technique):
$$\hat V(\hat R) \doteq \frac{1}{N^2} \sum_{i \in A} \sum_{j \in A} \frac{\pi_{ij} - \pi_i \pi_j}{\pi_{ij}} \frac{\hat e_i}{\pi_i} \frac{\hat e_j}{\pi_j}$$
where $\hat e_i = \bar x^{-1} (y_i - \hat R x_i)$.
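A sketch of the residual technique for the ratio, assuming SRS so that the double sum collapses to the familiar $(1 - n/N)\, s_e^2 / n$ with $\hat e_i = (y_i - \hat R x_i)/\bar x$ (data below hypothetical):

```python
def ratio_var_srs(x, y, N):
    """Residual-technique (linearized) variance estimate for R = ybar/xbar
    under SRS: (1 - n/N) * s_e^2 / n, with e_i = (y_i - R*x_i)/xbar.
    A sketch of the general HT double-sum specialized to SRS."""
    n = len(x)
    xbar = sum(x) / n
    R = sum(y) / sum(x)
    e = [(yi - R * xi) / xbar for xi, yi in zip(x, y)]
    s2e = sum(ei ** 2 for ei in e) / (n - 1)  # residuals sum to zero
    return (1 - n / N) * s2e / n

# When y is exactly proportional to x, all residuals vanish:
print(ratio_var_srs([1.0, 2.0, 3.0], [2.0, 4.0, 6.0], 100))  # → 0.0
```

The example illustrates why the ratio estimator is efficient when $y \approx R x$: the residuals, not the raw $y$-values, drive the variance.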
Ratio estimator $\hat Y_r = \bar X \hat R$

$$\hat V_1 \doteq \frac{1}{N^2} \sum_{i \in A} \sum_{j \in A} \frac{\pi_{ij} - \pi_i \pi_j}{\pi_{ij}} \frac{y_i - \hat R x_i}{\pi_i} \frac{y_j - \hat R x_j}{\pi_j}$$
$$\hat V_2 \doteq \frac{1}{N^2} \left( \frac{\bar X}{\bar x} \right)^2 \sum_{i \in A} \sum_{j \in A} \frac{\pi_{ij} - \pi_i \pi_j}{\pi_{ij}} \frac{y_i - \hat R x_i}{\pi_i} \frac{y_j - \hat R x_j}{\pi_j}$$
Which one do you prefer?
Variance estimation for the GREG estimator

For simplicity, assume that $c_i = \boldsymbol{\lambda}' \mathbf{x}_i$ so that
$$\hat Y_{GREG} = \sum_{i \in A} \frac{1}{\pi_i} g_i y_i$$
where
$$g_i = \mathbf{X}' \left( \sum_{i \in A} \frac{1}{\pi_i c_i} \mathbf{x}_i \mathbf{x}_i' \right)^{-1} \frac{1}{c_i} \mathbf{x}_i.$$
Two types of variance estimators:
$$\hat V_1 = \sum_{i \in A} \sum_{j \in A} \frac{\pi_{ij} - \pi_i \pi_j}{\pi_{ij}} \frac{\hat e_i}{\pi_i} \frac{\hat e_j}{\pi_j}$$
$$\hat V_2 = \sum_{i \in A} \sum_{j \in A} \frac{\pi_{ij} - \pi_i \pi_j}{\pi_{ij}} \frac{g_i \hat e_i}{\pi_i} \frac{g_j \hat e_j}{\pi_j}$$
They are asymptotically equivalent because $g_i \doteq 1$. $\hat V_2$ has a good conditional property.
Variance estimation for the poststratified estimator

Poststratified estimator:
$$\hat Y_{post} = \sum_{g=1}^{G} \frac{N_g}{\hat N_g} \hat Y_g$$
Unconditional variance estimator:
$$\hat V_1 = \sum_{i \in A} \sum_{j \in A} \frac{\pi_{ij} - \pi_i \pi_j}{\pi_{ij}} \frac{\hat e_i}{\pi_i} \frac{\hat e_j}{\pi_j}$$
where $\hat e_i = y_i - \bar y_g$ for $x_{ig} = 1$.
Conditional variance estimator:
$$\hat V_2 = \sum_{i \in A} \sum_{j \in A} \frac{\pi_{ij} - \pi_i \pi_j}{\pi_{ij}} \frac{g_i \hat e_i}{\pi_i} \frac{g_j \hat e_j}{\pi_j}$$
where $\hat e_i = y_i - \bar y_g$ and $g_i = N_g/\hat N_g$ for $x_{ig} = 1$.
Variance estimation for the poststratified estimator (Cont'd)

Under SRS:
$$\hat V_1 = \frac{N^2}{n} \left( 1 - \frac{n}{N} \right) \sum_{g=1}^{G} \frac{n_g - 1}{n - 1} s_g^2$$
$$\hat V_2 = \left( 1 - \frac{n}{N} \right) \frac{n}{n-1} \sum_{g=1}^{G} \frac{N_g^2}{n_g} \frac{n_g - 1}{n_g} s_g^2$$
where
$$s_g^2 = \frac{1}{n_g - 1} \sum_{i \in A_g} (y_i - \bar y_g)^2.$$
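A sketch of the two SRS formulas above (group sizes, population counts, and $y$-values are all hypothetical):

```python
def poststrat_vars_srs(groups, n, N):
    """Unconditional (V1) and conditional (V2) variance estimators for the
    poststratified estimator under SRS, following the slide formulas.
    `groups` maps a label g -> (N_g, list of y-values observed in A_g)."""
    f = 1 - n / N
    V1 = 0.0
    V2 = 0.0
    for Ng, ys in groups.values():
        ng = len(ys)
        ybar = sum(ys) / ng
        s2 = sum((yi - ybar) ** 2 for yi in ys) / (ng - 1)
        V1 += (ng - 1) / (n - 1) * s2
        V2 += Ng ** 2 / ng * (ng - 1) / ng * s2
    V1 *= N ** 2 / n * f
    V2 *= f * n / (n - 1)
    return V1, V2

# Hypothetical population of N = 100 in two poststrata, SRS of n = 5.
V1, V2 = poststrat_vars_srs({1: (50, [1.0, 2.0, 3.0]), 2: (50, [4.0, 6.0])}, 5, 100)
print(V1, V2)
```

$\hat V_2$ conditions on the realized group sample sizes $n_g$, which is why it is usually preferred when the $n_g$ are known after sampling.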
Replication method: Idea

1 Interested in estimating the variance of $\hat\theta$.
2 From the original sample $A$, generate $G$ resamples $A^{(1)}, A^{(2)}, \cdots, A^{(G)}$.
3 Based on the observations in resample $A^{(g)}$ $(g = 1, 2, \cdots, G)$, compute the replicate $\hat\theta^{(g)}$ of $\hat\theta$.
4 The replication variance estimator for $\hat\theta$ is computed as
$$\hat V = K_G \sum_{g=1}^{G} \left( \hat\theta^{(g)} - \hat\theta^{(\cdot)} \right)^2$$
for some suitable $K_G$, where $\hat\theta^{(\cdot)} = G^{-1} \sum_{g=1}^{G} \hat\theta^{(g)}$.
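The generic recipe above fits in a few lines (the replicate values below are hypothetical; for $G$ independent random groups, $K_G = 1/\{G(G-1)\}$):

```python
def replication_variance(theta_reps, K_G):
    """Generic replication variance estimator:
    V = K_G * sum_g (theta^(g) - theta^(.))^2,
    where theta^(.) is the mean of the replicates."""
    G = len(theta_reps)
    center = sum(theta_reps) / G
    return K_G * sum((t - center) ** 2 for t in theta_reps)

# Hypothetical replicates from G = 4 random groups.
reps = [10.2, 9.8, 10.5, 9.5]
print(replication_variance(reps, 1 / (4 * 3)))
```

All the methods that follow (random groups, jackknife, BRR, bootstrap) differ only in how the resamples are formed and in the choice of $K_G$.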
Replication methods for variance estimation

- Random group method
  - Independent random group method: Mahalanobis (1939, 1946), Deming (1946)
  - Non-independent random group method
- Balanced repeated replication: Plackett and Burman (1946), McCarthy (1966)
- Jackknife: Quenouille (1949), Tukey (1958)
- Bootstrap: Efron (1979)
Independent Random Group Method

Procedure:
1 A sample $A_1$ is drawn from the finite population according to the design $p$. Compute $\hat\theta_1$ from the observations in $A_1$.
2 Sample $A_1$ is replaced into the population and a second sample $A_2$ is drawn according to the same sampling design $p$. Compute $\hat\theta_2$ from the observations in $A_2$.
3 This process is repeated $G$ ($\ge 2$) times.
4 Use
$$\hat\theta_{RG} = \frac{1}{G} \sum_{k=1}^{G} \hat\theta^{(k)} \quad (1)$$
as an estimator for $\theta$, and use
$$\hat V(\hat\theta_{RG}) = \frac{1}{G} \frac{1}{G-1} \sum_{k=1}^{G} \left( \hat\theta^{(k)} - \hat\theta_{RG} \right)^2 \quad (2)$$
as a variance estimator for $\hat\theta_{RG}$.
Independent Random Group Method: Property

Let $\hat\theta_1, \cdots, \hat\theta_G$ be uncorrelated random variables with common expectation $E(\hat\theta_1) = \theta$. Then,
1 $\hat\theta_{RG}$ in (1) is unbiased for $\theta$.
2 $\hat V(\hat\theta_{RG})$ in (2) is unbiased for $V(\hat\theta_{RG})$.
Independent Random Group Method: Example

Suppose that a sample of households is to be drawn using a multistage sampling design. Two random groups are desired. An areal frame exists, and the target population is divided into two strata (defined, say, on the basis of geography). Stratum 1 contains $N_1$ PSUs and stratum 2 consists of one PSU that is to be selected with certainty. $G = 2$ independent random groups are to be used. Each sample is selected independently according to the following plan:
- Stratum 1: Two PSUs are selected using some $\pi$ps sampling design. From each selected PSU, an equal-probability systematic sample of $m_1$ households is selected.
- Stratum 2: The certainty PSU is divided into city blocks, with the block size varying between 10 and 15 households. An unequal-probability systematic sample of $m_2$ blocks is selected with probability proportional to the block sizes. All households in selected blocks are enumerated.
Independent Random Group Method: Example (Cont'd)

For point estimation, use
$$\hat\theta = \left( \hat\theta^{(1)} + \hat\theta^{(2)} \right) / 2.$$
For variance estimation, use
$$\hat V(\hat\theta) = \frac{1}{2(1)} \sum_{g=1}^{2} \left( \hat\theta^{(g)} - \hat\theta \right)^2 = \left( \hat\theta^{(1)} - \hat\theta^{(2)} \right)^2 / 4.$$
Easy concept and an unbiased variance estimator, but unstable; not often used in practice.
Non-independent Random Groups

Idea:
1 Given sample $A$, use a random mechanism to divide $A$ into $A = \cup_{g=1}^{G} A^{(g)}$, where $A^{(1)}, \cdots, A^{(G)}$ are disjoint.
2 Calculate $\hat\theta^{(1)}, \cdots, \hat\theta^{(G)}$ and treat them as independent.
3 Use
$$\hat V = \frac{1}{G} \frac{1}{G-1} \sum_{k=1}^{G} \left( \hat\theta^{(k)} - \hat\theta_{RG} \right)^2$$
as a variance estimator for $\hat\theta$.

Requirement: each $A^{(g)}$ should have the same design as $A$.
Impractical in some cases; unstable.
Non-independent Random Groups: Property

Let $\hat\theta_1, \cdots, \hat\theta_G$ be random variables with common expectation $E(\hat\theta_i) = \theta$. Then,
$$E\{\hat V(\hat\theta_{RG})\} - V(\hat\theta_{RG}) = -\frac{1}{G(G-1)} \mathop{\sum\sum}_{i \ne j} \mathrm{Cov}\left( \hat\theta^{(i)}, \hat\theta^{(j)} \right).$$
- If $\hat\theta^{(1)}, \cdots, \hat\theta^{(G)}$ are independent, then the RHS $= 0$.
- If $\hat\theta^{(1)}, \cdots, \hat\theta^{(G)}$ are identically distributed, then the RHS $= -\mathrm{Cov}(\hat\theta^{(1)}, \hat\theta^{(2)})$.
Example: Non-independent random group method under simple random sampling

- Interested in variance estimation for $\hat\theta = \bar y$ under simple random sampling.
- Partition the sample into $G$ groups of dependent samples $A = \cup_{g=1}^{G} A^{(g)}$, where $A^{(g)}$ is a simple random sample of size $b = n/G$.
- Compute $\hat\theta^{(g)} = \bar y^{(g)}$ from $A^{(g)}$. Note that
$$\hat\theta = \frac{1}{G} \sum_{g=1}^{G} \bar y^{(g)}.$$
- How large is the bias of $\hat V(\hat\theta_{RG})$?
$$\mathrm{Bias}(\hat V) = -\mathrm{Cov}\left( \bar y^{(1)}, \bar y^{(2)} \right) = \frac{1}{N} S^2$$
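The bias formula can be checked exactly on a tiny population by enumerating all ordered pairs of distinct units (a sketch for the simplest case $n = 2$, $G = 2$, one unit per group; the population values are hypothetical):

```python
from itertools import permutations

def rg_covariance(pop):
    """Exact covariance of the two random-group means when the sample is an
    SRS of size 2 split into two groups of one unit each.
    Theory (slide above): Cov = -S^2/N, so Bias(V) = +S^2/N."""
    pairs = list(permutations(pop, 2))
    m1 = sum(a for a, b in pairs) / len(pairs)
    m2 = sum(b for a, b in pairs) / len(pairs)
    return sum((a - m1) * (b - m2) for a, b in pairs) / len(pairs)

pop = [1.0, 2.0, 4.0, 7.0]
N = len(pop)
mean = sum(pop) / N
S2 = sum((y - mean) ** 2 for y in pop) / (N - 1)
print(rg_covariance(pop), -S2 / N)  # the two agree: -1.75 each
```

The negative covariance comes from sampling without replacement, and it vanishes as $N \to \infty$, matching the remark that the bias is negligible for large populations.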
Jackknife method for variance estimation

Motivation:
- Basic setup: let $(x_i, y_i)$ be IID from a bivariate distribution with mean $(\mu_x, \mu_y)$. Let $\theta = \mu_y/\mu_x$. The standard ratio estimator of $\theta$, $\hat\theta = \bar x^{-1} \bar y$, has bias of order $O(n^{-1})$.
- Quenouille (1949) idea: propose a bias-reduced estimator of $\theta$:
$$\hat\theta^{(\cdot)} = \frac{1}{n} \sum_{k=1}^{n} \hat\theta^{(k)}$$
where $\hat\theta^{(k)} = n\hat\theta - (n-1)\hat\theta^{(-k)}$ and
$$\hat\theta^{(-k)} = \left( \sum_{i \ne k} x_i \right)^{-1} \sum_{i \ne k} y_i.$$
Jackknife method for variance estimation (Cont'd)

Tukey (1958): treat $\hat\theta^{(1)}, \cdots, \hat\theta^{(n)}$ as an independent random group of size $n$ to get
$$\hat V_{JK}(\hat\theta) \doteq \frac{1}{n} \frac{1}{n-1} \sum_{k=1}^{n} \left( \hat\theta^{(k)} - \hat\theta^{(\cdot)} \right)^2 = \frac{n-1}{n} \sum_{k=1}^{n} \left( \hat\theta^{(-k)} - \hat\theta^{(-\cdot)} \right)^2,$$
where $\hat\theta^{(-\cdot)} = n^{-1} \sum_{k=1}^{n} \hat\theta^{(-k)}$.
Taylor theorem 2

Let $\{X_n, W_n\}$ be a sequence of random variables such that
$$X_n = W_n + O_p(r_n)$$
where $r_n \to 0$ as $n \to \infty$. If $g(x)$ is a function with continuous $s$-th derivative on the line segment joining $X_n$ and $W_n$, and the $s$-th order partial derivatives are bounded, then
$$g(X_n) = g(W_n) + \sum_{k=1}^{s-1} \frac{1}{k!} g^{(k)}(W_n)(X_n - W_n)^k + O_p(r_n^s)$$
where $g^{(k)}(a)$ is the $k$-th derivative of $g(x)$ evaluated at $x = a$.
Properties

- If $\hat\theta_n = \bar y$, then
$$\hat V_{JK} = \frac{1}{n} \frac{1}{n-1} \sum_{i=1}^{n} (y_i - \bar y)^2 = \frac{1}{n} s_y^2.$$
- Under some regularity conditions, $\bar y^{(-k)} - \bar y = O_p(n^{-1})$.
- For $\hat\theta = f(\bar x, \bar y)$, we have
$$\hat\theta^{(-k)} - \hat\theta = \frac{\partial f}{\partial x}(\bar x, \bar y)\left( \bar x^{(-k)} - \bar x \right) + \frac{\partial f}{\partial y}(\bar x, \bar y)\left( \bar y^{(-k)} - \bar y \right) + o_p(n^{-1}).$$
- The jackknife variance estimator defined by
$$\hat V_{JK} = \frac{n-1}{n} \sum_{k=1}^{n} \left( \hat\theta^{(-k)} - \hat\theta \right)^2$$
is asymptotically equivalent to the linearized variance estimator.
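A delete-one jackknife sketch confirming the first property, $\hat V_{JK} = s_y^2/n$ when $\hat\theta = \bar y$ (the data are hypothetical):

```python
def jackknife_variance(data, stat):
    """Delete-one jackknife:
    V_JK = (n-1)/n * sum_k (theta^(-k) - mean of replicates)^2."""
    n = len(data)
    reps = [stat(data[:k] + data[k + 1:]) for k in range(n)]
    center = sum(reps) / n
    return (n - 1) / n * sum((r - center) ** 2 for r in reps)

mean = lambda v: sum(v) / len(v)
y = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]
n = len(y)
s2 = sum((yi - mean(y)) ** 2 for yi in y) / (n - 1)
# For theta = ybar, the jackknife reproduces s^2/n exactly:
print(jackknife_variance(y, mean), s2 / n)
```

For nonlinear statistics the equality is only asymptotic, which is the content of the last property above.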
Example: Ratio

$$\hat R = \frac{\bar y}{\bar x}.$$
Jackknife replicates for $\hat R$:
$$\hat R^{(-k)} = \frac{\bar y^{(-k)}}{\bar x^{(-k)}} = \frac{n\bar y - y_k}{n\bar x - x_k}.$$
By Taylor theorem 2:
$$\hat R^{(-k)} = \hat R + (\bar x)^{-1}\left( \bar y^{(-k)} - \hat R \bar x^{(-k)} \right) + O_p(n^{-2}).$$
Example: Ratio (Cont'd)

Jackknife variance estimator:
$$\hat V_{JK} = \frac{n-1}{n} \sum_{k=1}^{n} \left( \hat R^{(-k)} - \hat R \right)^2 \doteq \frac{n-1}{n} \sum_{k=1}^{n} \left\{ (\bar x)^{-1} \left( \bar y^{(-k)} - \hat R \bar x^{(-k)} \right) \right\}^2 = \frac{1}{n(n-1)} \frac{1}{\bar x^2} \sum_{k=1}^{n} \left( y_k - \hat R x_k \right)^2.$$
The jackknife variance estimator for the ratio estimator $\hat Y_r = \bar X \hat R$ is asymptotically equivalent to
$$\hat V_{JK} = \left( \frac{\bar X}{\bar x} \right)^2 \frac{1}{n(n-1)} \sum_{k=1}^{n} \left( y_k - \hat R x_k \right)^2,$$
which is often called the conditional variance estimator.
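A sketch comparing the delete-one jackknife for $\hat R$ with the closed-form expression derived above (the data are hypothetical; the two agree only asymptotically, so they are close but not identical):

```python
def jackknife_var_ratio(x, y):
    """Delete-one jackknife variance for R = ybar/xbar, centered at R."""
    n = len(x)
    sx, sy = sum(x), sum(y)
    R = sy / sx
    reps = [(sy - y[k]) / (sx - x[k]) for k in range(n)]
    return (n - 1) / n * sum((r - R) ** 2 for r in reps)

def linearized_var_ratio(x, y):
    """Closed-form approximation from the slide:
    1/(n(n-1)) * (1/xbar^2) * sum_k (y_k - R*x_k)^2."""
    n = len(x)
    xbar = sum(x) / n
    R = sum(y) / sum(x)
    return sum((yk - R * xk) ** 2 for xk, yk in zip(x, y)) / (n * (n - 1) * xbar ** 2)

x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [1.2, 2.1, 2.9, 4.4, 5.1]
print(jackknife_var_ratio(x, y), linearized_var_ratio(x, y))  # close but not equal
```

The gap between the two shrinks at rate $O_p(n^{-1})$ relative to the variance itself, consistent with the asymptotic-equivalence claim.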
Example: Post-stratification (under SRS)

Point estimator:
$$\hat Y_{post} = \sum_{g=1}^{G} N_g \bar y_g = \sum_{g=1}^{G} \frac{N_g}{n_g} \sum_{i \in A_g} y_i.$$
$k$-th jackknife replicate of $\hat Y_{post}$: for $k \in A_h$,
$$\hat Y_{post}^{(-k)} - \hat Y_{post} = N_h \left( \bar y_h^{(-k)} - \bar y_h \right) = N_h (n_h - 1)^{-1} (\bar y_h - y_k).$$
Example: Post-stratification (under SRS, Cont'd)

Jackknife variance estimator:
$$\hat V_{JK}(\hat Y_{post}) = \frac{n-1}{n} \sum_{k=1}^{n} \left( \hat Y_{post}^{(-k)} - \hat Y_{post} \right)^2 = \frac{n-1}{n} \sum_{g=1}^{G} N_g^2 (n_g - 1)^{-1} s_g^2.$$
This is asymptotically equivalent to the conditional variance estimator.
Extension to complex sampling

Stratified cluster sampling design:
$$\hat Y_{HT} = \sum_{h=1}^{H} \sum_{i=1}^{n_h} w_{hi} \hat Y_{hi}.$$
Jackknife replicates: delete the $j$-th cluster in the $g$-th stratum:
$$\hat Y_{HT}^{(-gj)} = \sum_{h=1}^{H} \sum_{i=1}^{n_h} w_{hi}^{(-gj)} \hat Y_{hi}$$
where
$$w_{hi}^{(-gj)} = \begin{cases} 0 & \text{if } h = g \text{ and } i = j \\ (n_h - 1)^{-1} n_h w_{hi} & \text{if } h = g \text{ and } i \ne j \\ w_{hi} & \text{otherwise.} \end{cases}$$
Extension to complex sampling (Cont'd)

Jackknife variance estimator:
$$\hat V_{JK}(\hat Y_{HT}) = \sum_{h=1}^{H} \frac{n_h - 1}{n_h} \sum_{i=1}^{n_h} \left( \hat Y_{HT}^{(-hi)} - \frac{1}{n_h} \sum_{j=1}^{n_h} \hat Y_{HT}^{(-hj)} \right)^2.$$
Property:
$$\hat V_{JK}(\hat Y_{HT}) = \sum_{h=1}^{H} \frac{n_h}{n_h - 1} \sum_{i=1}^{n_h} \left( w_{hi} \hat Y_{hi} - \frac{1}{n_h} \sum_{j=1}^{n_h} w_{hj} \hat Y_{hj} \right)^2 \equiv \hat V_0.$$
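The identity $\hat V_{JK}(\hat Y_{HT}) \equiv \hat V_0$ can be checked numerically; below is a sketch with hypothetical two-stratum weights and estimated PSU totals ($\hat V_0$ is restated so the block is self-contained):

```python
def jackknife_var_stratified(w, y):
    """Stratified delete-one-PSU jackknife for Y_HT = sum_h sum_i w_hi*Y_hi,
    using the replicate weights from the slide."""
    total = sum(whi * yhi for wh, yh in zip(w, y) for whi, yhi in zip(wh, yh))
    v = 0.0
    for wg, yg in zip(w, y):
        ng = len(wg)
        tg = sum(whi * yhi for whi, yhi in zip(wg, yg))
        # Replicate totals after deleting PSU j in this stratum.
        reps = [total - tg + ng / (ng - 1) * (tg - wg[j] * yg[j]) for j in range(ng)]
        center = sum(reps) / ng
        v += (ng - 1) / ng * sum((r - center) ** 2 for r in reps)
    return v

def v0_stratified(w, y):
    """V0 = sum_h n_h/(n_h-1) * sum_i (w_hi*Y_hi - mean_j w_hj*Y_hj)^2."""
    v = 0.0
    for wh, yh in zip(w, y):
        nh = len(wh)
        t = [whi * yhi for whi, yhi in zip(wh, yh)]
        tbar = sum(t) / nh
        v += nh / (nh - 1) * sum((ti - tbar) ** 2 for ti in t)
    return v

w = [[2.0, 2.0, 2.0], [3.0, 3.0]]
y = [[4.0, 6.0, 5.0], [7.0, 9.0]]
print(jackknife_var_stratified(w, y), v0_stratified(w, y))  # 48.0 48.0
```

The algebraic equality means the full replicate computation buys nothing for a linear statistic; the jackknife pays off only for nonlinear estimators built from the same replicate weights.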
Balanced repeated replication

- Basic set-up: stratified sampling with $n_h = 2$, $\hat\theta = \sum_{h=1}^{H} W_h \bar y_h$.
- Half-sample replication: pick one element from each stratum:
$$\hat\theta^{(v)} = \sum_{h=1}^{H} W_h \left\{ \delta_h^{(v)} y_{h1} + (1 - \delta_h^{(v)}) y_{h2} \right\}, \quad v = 1, \cdots, 2^H,$$
where
$$\delta_h^{(v)} = \begin{cases} 1 & \text{if } y_{h1} \text{ is selected} \\ 0 & \text{if } y_{h2} \text{ is selected.} \end{cases}$$
- Note that
$$\hat\theta^{(v)} - \hat\theta = \sum_{h=1}^{H} 0.5\, W_h \left( 2\delta_h^{(v)} - 1 \right)(y_{h1} - y_{h2}).$$
- Also, the standard variance estimator for stratified sampling (with $n_h = 2$) is
$$\hat V(\hat\theta) = \sum_{h=1}^{H} W_h^2 (y_{h1} - y_{h2})^2 / 4.$$
Balanced repeated replication (Cont'd)

- Balanced condition: there always exist $\delta^{(1)}, \cdots, \delta^{(G)}$ with $H < G \le H + 4$ such that
$$\sum_{v=1}^{G} (2\delta_h^{(v)} - 1)(2\delta_{h'}^{(v)} - 1) = 0 \quad \text{if } h \ne h'.$$
- Variance estimator:
$$\hat V_{BRR} = \frac{1}{G} \sum_{v=1}^{G} (\hat\theta^{(v)} - \hat\theta)^2.$$
- Property: if the $\delta_h^{(v)}$ satisfy the balanced condition, then
$$\hat V_{BRR} = \sum_{h} W_h^2 (y_{h1} - y_{h2})^2 / 4,$$
which is equal to the variance estimator of a stratified sampling estimator with $n_h = 2$.
Balanced repeated replication: Example (BRR for H = 3)

$M_G$: Hadamard matrix of order $G$:
- a $G \times G$ matrix of $\pm 1$ with $M_G' M_G = G I_G$.
- If $M_G$ satisfies $M_G' M_G = G I_G$, then
$$M_{2G} = \begin{pmatrix} M & M \\ M & -M \end{pmatrix}.$$
- For $G = 4$,
$$M_4 = \begin{pmatrix} 1 & 1 & 1 & 1 \\ 1 & -1 & 1 & -1 \\ 1 & -1 & -1 & 1 \\ 1 & 1 & -1 & -1 \end{pmatrix}.$$
- The columns of $M_G$ are mutually orthogonal, so they satisfy the balanced condition.
Balanced repeated replication: Example (BRR for H = 3, Cont'd)

$G = 4$ BRR replicates can be constructed as follows:
$$\hat\theta^{(1)} = W_1 y_{11} + W_2 y_{21} + W_3 y_{31}$$
$$\hat\theta^{(2)} = W_1 y_{12} + W_2 y_{21} + W_3 y_{32}$$
$$\hat\theta^{(3)} = W_1 y_{12} + W_2 y_{22} + W_3 y_{31}$$
$$\hat\theta^{(4)} = W_1 y_{11} + W_2 y_{22} + W_3 y_{32}$$
One can check that
$$\frac{1}{G} \sum_{v=1}^{G} (\hat\theta^{(v)} - \hat\theta)^2 = \sum_{h} W_h^2 (y_{h1} - y_{h2})^2 / 4.$$
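The BRR identity can be verified numerically; a sketch for $H = 3$, $G = 4$, with sign patterns taken from $M_4$ after dropping the all-ones column (stratum weights and values hypothetical):

```python
def brr_variance(W, y1, y2, signs):
    """BRR variance: V = (1/G) * sum_v (theta^(v) - theta)^2, where replicate v
    picks y_h1 when signs[v][h] == +1 and y_h2 otherwise (n_h = 2 design)."""
    H = len(W)
    theta = sum(W[h] * (y1[h] + y2[h]) / 2 for h in range(H))
    v = 0.0
    for row in signs:
        rep = sum(W[h] * (y1[h] if row[h] == 1 else y2[h]) for h in range(H))
        v += (rep - theta) ** 2
    return v / len(signs)

# Rows of M4 with the all-ones column removed: a balanced set of half-samples.
signs = [( 1,  1,  1),
         (-1,  1, -1),
         (-1, -1,  1),
         ( 1, -1, -1)]
W = [0.5, 0.3, 0.2]
y1 = [4.0, 2.0, 6.0]
y2 = [6.0, 5.0, 2.0]
target = sum(W[h] ** 2 * (y1[h] - y2[h]) ** 2 / 4 for h in range(3))
print(brr_variance(W, y1, y2, signs), target)  # equal: 0.6125 0.6125
```

With only $G = 4$ replicates instead of all $2^3 = 8$ half-samples, the balanced set reproduces the stratified variance estimator exactly, which is the whole point of the Hadamard construction.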