23/4/19 www.uic.edu.hk/~xlpeng 1
STAT 4060 Design and Analysis of Surveys
Exam: 60%; Mid Test: 20%; Mini Project: 10%; Continuous assessment: 10%
What we have learned:
1. Simple random sampling, confidence interval and choice of sample size.
2. Ratio and regression estimators, systematic sampling.
3. Stratified random sampling, allocation of stratum weights.
4. Cluster sampling.
Population Parameter
Sample Statistics
Simple random sampling
We shall consider the use of simple random samples for estimating three population characteristics:
the population mean, denoted $\bar{Y}$: $\bar{Y} = \frac{1}{N}\sum_{j=1}^{N} Y_j$;
the population total, denoted $Y_T$: $Y_T = \sum_{j=1}^{N} Y_j$;
and the proportion $P$.
We shall discuss how the estimators behave in terms of their sampling distributions. The variance is often a crucial measure.
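Proof of (1.9)
Under simple random sampling without replacement, $Var(y_i) = \frac{N-1}{N}S^2$ and $cov(y_i, y_j) = -\frac{S^2}{N}$ for $i \neq j$, where $S^2 = \frac{1}{N-1}\sum_{j=1}^{N}(Y_j - \bar{Y})^2$ and $f = n/N$. Then
$$
\begin{aligned}
Var(\bar{y}) &= E(\bar{y}^2) - [E(\bar{y})]^2 = E\Big[\Big(\sum_{i=1}^{n} y_i / n\Big)^2\Big] - \bar{Y}^2 \\
&= \frac{1}{n^2} E\Big(\sum_{i} y_i^2 + \sum_{i \neq j} y_i y_j\Big) - \bar{Y}^2 \\
&= \frac{1}{n^2}\{\, n E(y_i^2) + n(n-1) E(y_i y_j) \,\} - \bar{Y}^2 \\
&= \frac{1}{n^2}\{\, n (Var(y_i) + \bar{Y}^2) + n(n-1)(cov(y_i, y_j) + \bar{Y}^2) \,\} - \bar{Y}^2 \\
&= \frac{1}{n^2}\big( n\,Var(y_i) + n(n-1)\,cov(y_i, y_j) \big) \\
&= \frac{1}{n}\cdot\frac{N-1}{N}S^2 - \frac{n-1}{n}\cdot\frac{S^2}{N} = \frac{N-n}{Nn}S^2 = (1-f)\frac{S^2}{n}.
\end{aligned}
$$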
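A quick numerical check of (1.9) can be made by enumerating every possible sample from a small population. The sketch below is only an illustration: the population values and the function names are hypothetical, and it assumes sampling without replacement with all $\binom{N}{n}$ samples equally likely.

```python
from itertools import combinations
from statistics import mean, pvariance, variance

def srs_mean_variance_exact(population, n):
    """Variance of the sample mean over all equally likely samples of size n."""
    sample_means = [mean(s) for s in combinations(population, n)]
    return pvariance(sample_means)

def srs_mean_variance_formula(population, n):
    """(1 - f) S^2 / n, with f = n / N and S^2 computed with divisor N - 1."""
    N = len(population)
    return (1 - n / N) * variance(population) / n

population = [3, 7, 8, 12, 15, 20]  # hypothetical values, N = 6
exact = srs_mean_variance_exact(population, 3)
formula = srs_mean_variance_formula(population, 3)
print(exact, formula)  # the two values should agree up to rounding error
```

Enumeration is only feasible for tiny populations, but it confirms the finite population correction $(1-f)$ without relying on simulation.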
Confidence interval for the population mean
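The interval itself is not preserved in this text. A common large-sample form, assumed here rather than taken from the slides, is $\bar{y} \pm z\sqrt{(1-f)s^2/n}$, which plugs the sample variance $s^2$ into (1.9). The function name and the default $z = 1.96$ are illustrative choices.

```python
import math
from statistics import mean, variance

def srs_confidence_interval(sample, N, z=1.96):
    """Approximate confidence interval for the population mean under s.r. sampling.

    Assumes the normal approximation is adequate; N is the population size.
    """
    n = len(sample)
    f = n / N                      # sampling fraction
    y_bar = mean(sample)
    se = math.sqrt((1 - f) * variance(sample) / n)
    return y_bar - z * se, y_bar + z * se

low, high = srs_confidence_interval([10, 12, 14, 16], N=100)
print(low, high)
```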
Ratio Estimation and Regression Estimation (Chapter 4, Textbook, Barnett, V., 1991)
2.1 Estimation of a population ratio: The ratio estimator
In some situations it is useful to estimate a (positive) ratio of two population characteristics: the totals, or means, of two (positive) variables X and Y, that is, $R = Y_T / X_T = \bar{Y} / \bar{X}$.
Two obvious estimators of R are:
The sample average of ratios,
$$\bar{r} = \frac{1}{n}\sum_{i=1}^{n} r_i = \frac{1}{n}\sum_{i=1}^{n} (y_i / x_i),$$
which is unbiased for estimating the population mean ratio
$$\bar{R} = \frac{1}{N}\sum_{j=1}^{N} R_j = \frac{1}{N}\sum_{j=1}^{N} (Y_j / X_j),$$
but biased for estimating R.
The ratio of the sample averages,
$$r = \bar{y} / \bar{x} = y_T / x_T,$$
which is widely used.
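The two candidate estimators generally give different answers on the same data. A minimal sketch, with hypothetical function names and made-up sample values:

```python
from statistics import mean

def mean_of_ratios(y, x):
    """Sample average of the individual ratios y_i / x_i."""
    return mean([yi / xi for yi, xi in zip(y, x)])

def ratio_of_means(y, x):
    """Ratio of sample averages, r = y_bar / x_bar = y_T / x_T."""
    return sum(y) / sum(x)

y = [2, 6]  # hypothetical sample values
x = [1, 2]
print(mean_of_ratios(y, x), ratio_of_means(y, x))
```

The two coincide only when every individual ratio $y_i / x_i$ is the same, or when all $x_i$ are equal.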
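The bias in estimating R by r
The bias in estimating R by r is the expectation of the following difference:
$$
r - R = (\bar{y} - R\bar{x}) / \bar{x}
= \frac{\bar{y} - R\bar{x}}{\bar{X}}\left(1 + \frac{\bar{x} - \bar{X}}{\bar{X}}\right)^{-1}
= \frac{\bar{y} - R\bar{x}}{\bar{X}}\left(1 - \frac{\bar{x} - \bar{X}}{\bar{X}} + \left(\frac{\bar{x} - \bar{X}}{\bar{X}}\right)^{2} - \dots\right). \qquad (2.3)
$$
Taking expectations of the leading terms,
$$
E(r - R) \approx E\left(\frac{\bar{y} - R\bar{x}}{\bar{X}}\right) - \frac{E[(\bar{y} - R\bar{x})(\bar{x} - \bar{X})]}{\bar{X}^{2}}.
$$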
Discussion about the bias
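$$
Var(r) \approx \frac{1-f}{n\bar{X}^{2}}\cdot\frac{\sum_{j=1}^{N}(Y_j - RX_j)^{2}}{N-1}
= \frac{1-f}{n\bar{X}^{2}}\left(S_Y^{2} - 2RS_{YX} + R^{2}S_X^{2}\right), \qquad (2.5)
$$
since, with $\bar{Y} = R\bar{X}$,
$$
Z_j = Y_j - RX_j = (Y_j - \bar{Y}) - (RX_j - R\bar{X}).
$$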
2.2 Ratio estimation of a population mean or total
$$\bar{y}_R = r\bar{X} = (\bar{X} / \bar{x})\,\bar{y}$$
$$y_{TR} = rX_T = N(\bar{X} / \bar{x})\,\bar{y} = N\bar{y}_R$$
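The ratio estimators of the mean and total scale the known population mean of X by the sample ratio. A short sketch, assuming $\bar{X}$ and $N$ are known; the function name and data are hypothetical:

```python
def ratio_estimates(y, x, X_bar, N):
    """Return (r, ratio estimate of the mean, ratio estimate of the total)."""
    r = sum(y) / sum(x)        # r = y_bar / x_bar
    y_bar_R = r * X_bar        # ratio estimator of the population mean
    y_TR = N * y_bar_R         # ratio estimator of the population total
    return r, y_bar_R, y_TR

r, y_bar_R, y_TR = ratio_estimates(y=[2, 6], x=[1, 3], X_bar=5, N=10)
print(r, y_bar_R, y_TR)
```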
Variance of ratio estimator
The estimate of the ratio R of the present weight to prestudy weight for the herd is:
Solution:
$$
Var(r) = \frac{1-f}{n\bar{X}^{2}}S_r^{2} = \frac{1}{880^{2}}\left(1 - \frac{12}{500}\right)\frac{8{,}848.646}{12} = 0.000929
$$
$$
se(r) = \sqrt{0.000929} = 0.030485
$$
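The arithmetic in this solution can be re-checked directly. The inputs below ($n = 12$, $N = 500$, $\bar{X} = 880$, $S_r^2 = 8{,}848.646$) are read off the worked solution; the variable names are illustrative.

```python
import math

n, N = 12, 500        # sample size and herd size from the worked solution
X_bar = 880           # mean prestudy weight used in the solution
S_r2 = 8848.646       # residual variance term from the solution

var_r = (1 - n / N) * S_r2 / (n * X_bar ** 2)
se_r = math.sqrt(var_r)
print(round(var_r, 6), round(se_r, 6))
```

Note that the standard error is computed from the unrounded variance; taking the square root of the rounded value 0.000929 gives a slightly different figure.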
This examines when the variance in (2.10) is smaller or larger than that in (1.9).
2.3 Regression estimation
Condition (2.15.1) demands that X and Y be linearly related. If the linear relationship does not pass through the origin, this suggests considering an alternative estimator, known as the regression estimator.
An ideal (perfect) linear relationship is
$$Y_j = \bar{Y} + b(X_j - \bar{X}). \qquad (2.16)$$
A practicable simple linear regression model is
$$Y_j = \bar{Y} + b(X_j - \bar{X}) + E_j. \qquad (2.17)$$
Consider the average (mean) of either (2.16) or (2.17), which suggests the estimator
$$\bar{y}_L = \bar{y} + b(\bar{X} - \bar{x}). \qquad (2.19)$$
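$$
\begin{aligned}
Var(\bar{y}_L) = E[(\bar{y}_L - \bar{Y})^{2}]
&= E\{[(\bar{y} - \bar{Y}) - b(\bar{x} - \bar{X})]^{2}\} \\
&= \frac{1-f}{n}\left(S_Y^{2} - 2bS_{YX} + b^{2}S_X^{2}\right) \qquad (2.20) \\
&\geq \frac{1-f}{n}S_Y^{2}\left(1 - \rho_{YX}^{2}\right),
\end{aligned}
$$
and this lower bound never exceeds
$$\frac{1-f}{n}S_Y^{2} = Var(\bar{y}).$$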
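From (2.20),
$$
\min_b \{Var(\bar{y}_L)\} = \min_b \frac{1-f}{n}\left(S_Y^{2} - 2bS_{YX} + b^{2}S_X^{2}\right) = \frac{1-f}{n}S_Y^{2}\left(1 - \rho_{YX}^{2}\right).
$$
The minimum is obtained with $b = b_{\min} = S_{YX} / S_X^{2} = \rho_{YX}S_Y / S_X$.
Thus the most efficient regression estimator of $\bar{Y}$ is
$$\bar{y}_L = \bar{y} + \rho_{YX}(S_Y / S_X)(\bar{X} - \bar{x}). \qquad (2.22)$$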
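The optimal value of b in (2.22) suggests the obvious estimate:
$$
\tilde{b} = \hat{b}_{\min} = \frac{s_{yx}}{s_x^{2}} = \frac{\sum_{i=1}^{n}(y_i - \bar{y})(x_i - \bar{x})}{\sum_{i=1}^{n}(x_i - \bar{x})^{2}}, \qquad (2.24)
$$
giving the estimator
$$\bar{y}_L = \bar{y} + \tilde{b}(\bar{X} - \bar{x}), \qquad (2.25)$$
which enjoys the following asymptotic properties:
$$E(\bar{y}_L) = \bar{Y} + O(n^{-1}).$$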
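The regression estimator with an estimated slope can be computed in a few lines. This is a sketch under the assumption that the population mean $\bar{X}$ is known; the function name and data are hypothetical.

```python
from statistics import mean

def regression_estimate(y, x, X_bar):
    """Return (regression estimate of the population mean, estimated slope)."""
    y_bar, x_bar = mean(y), mean(x)
    # The common divisor n - 1 cancels in s_yx / s_x^2, so raw sums suffice.
    s_yx = sum((yi - y_bar) * (xi - x_bar) for yi, xi in zip(y, x))
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    b = s_yx / s_xx
    return y_bar + b * (X_bar - x_bar), b

# Hypothetical data lying exactly on y = 2x + 1
estimate, slope = regression_estimate(y=[3, 5, 7], x=[1, 2, 3], X_bar=4)
print(estimate, slope)
```

When the sample mean of x falls below the known $\bar{X}$ and the slope is positive, the estimator adjusts $\bar{y}$ upward, which is the intended correction.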
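Asymptotic properties:
$$
Var(\bar{y}_L) = \frac{1-f}{n}\left(S_Y^{2} - S_{YX}^{2} / S_X^{2}\right) + O(n^{-3/2})
= \frac{1-f}{n}\left(1 - \rho_{YX}^{2}\right)S_Y^{2} + O(n^{-3/2}). \qquad (2.26)
$$
$$
V(\bar{y}_L) = \frac{1-f}{n}\left(s_y^{2} - \tilde{b}s_{yx}\right), \qquad (2.27)
$$
where $\tilde{b} = s_{yx} / s_x^{2}$ is the sample estimate of $b_{\min}$.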
2.4 Comparison of ratio and regression estimators
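$$
Var(\bar{y}_R) - Var(\bar{y}_L) \approx \frac{1-f}{n}\left(R^{2}S_X^{2} - 2R\rho_{YX}S_YS_X + \rho_{YX}^{2}S_Y^{2}\right)
= \frac{1-f}{n}\left(RS_X - \rho_{YX}S_Y\right)^{2} \geq 0.
$$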
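The perfect-square form of this difference can be verified numerically. The sketch below uses the large-sample variance expressions for the two estimators; the function name and parameter values are hypothetical.

```python
def variance_gap(R, S_X, S_Y, rho, f, n):
    """Return (Var(ratio) - Var(regression), the perfect-square form of the same gap)."""
    c = (1 - f) / n
    var_ratio = c * (S_Y ** 2 - 2 * R * rho * S_Y * S_X + R ** 2 * S_X ** 2)
    var_regression = c * S_Y ** 2 * (1 - rho ** 2)
    square_form = c * (R * S_X - rho * S_Y) ** 2
    return var_ratio - var_regression, square_form

gap, square_form = variance_gap(R=1.2, S_X=3.0, S_Y=4.0, rho=0.8, f=0.1, n=20)
print(gap, square_form)
```

The gap vanishes exactly when $R = \rho_{YX}S_Y / S_X$, that is, when the regression line passes through the origin.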
Stratified Simple Random Sampling (Chapter 5, Textbook, Barnett, V., 1991)
Consider another sampling method:
Some Notation
To estimate the population mean of a finite population, we assume that the population is stratified, that is to say, it has been divided into k non-overlapping groups, or strata, of sizes $N_1, N_2, \dots, N_k$, where $\sum_{i=1}^{k} N_i = N$.
The stratum means and variances are denoted by $\bar{Y}_1, \dots, \bar{Y}_k$ and $S_1^{2}, \dots, S_k^{2}$.
Estimation of Population Characteristics in Stratified Populations
Estimating $\bar{Y}$
The stratified sample mean is defined as
$$\bar{y}_{st} = \sum_{i=1}^{k} W_i\bar{y}_i,$$
where $\bar{y}_i$ is the sample mean in the ith stratum.
Here we assume the weights $W_i = N_i / N$ are given (known).
The mean and variance of $\bar{y}_{st}$
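Note that each stratum sample mean is unbiased for its stratum mean: $E(\bar{y}_i) = \bar{Y}_i$.
Since $\bar{Y} = \sum_{i=1}^{k} W_i\bar{Y}_i$, it follows that $E(\bar{y}_{st}) = \bar{Y}$.
Because it is assumed that sampling in different strata is independent, that is, $cov(\bar{y}_i, \bar{y}_j) = 0$ for $i \neq j$,
$$Var(\bar{y}_{st}) = \sum_{i=1}^{k} W_i^{2}\,Var(\bar{y}_i) = \sum_{i=1}^{k} W_i^{2}(1 - f_i)\frac{S_i^{2}}{n_i}, \qquad f_i = n_i / N_i.$$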
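The stratified estimate and its estimated variance combine per-stratum quantities with the weights $W_i = N_i / N$. A minimal sketch, assuming a simple random sample within each stratum and plugging in the sample variances $s_i^2$ for $S_i^2$; the function name and data are hypothetical.

```python
from statistics import mean, variance

def stratified_estimate(samples, stratum_sizes):
    """Return (stratified sample mean, estimated variance of that mean)."""
    N = sum(stratum_sizes)
    estimate = 0.0
    est_variance = 0.0
    for sample, N_i in zip(samples, stratum_sizes):
        n_i = len(sample)
        W_i = N_i / N                       # stratum weight
        f_i = n_i / N_i                     # stratum sampling fraction
        estimate += W_i * mean(sample)
        est_variance += W_i ** 2 * (1 - f_i) * variance(sample) / n_i
    return estimate, est_variance

y_st, v_st = stratified_estimate(samples=[[1, 3], [10, 14]], stratum_sizes=[20, 80])
print(y_st, v_st)
```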
Simple random sampling
Stratified sampling with proportional allocation
(a) When stratum size is large enough (condition expressed in terms of $N_i$ and $N$):
(b) When stratum size is not large enough:
The stratified sample mean will be more efficient than the s.r. sample mean if and only if the variation between the stratum means is sufficiently large compared with the within-strata variation!
Optimum Choice of Sample Size
To achieve the required precision of estimation
Some cost limitation
The simplest form assumes that there is some overhead cost, $c_0$, of administering the survey, and that individual observations from the ith stratum each cost an amount $c_i$. Thus the total cost is:
$$C = c_0 + \sum_{i=1}^{k} c_in_i$$
I. Minimum variance for fixed cost (Cont.)
Then
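The allocation that follows is not preserved in this text. The standard result for minimizing $Var(\bar{y}_{st})$ subject to a fixed total cost $C = c_0 + \sum c_in_i$ takes $n_i$ proportional to $W_iS_i / \sqrt{c_i}$; the sketch below assumes that result, ignores finite population corrections when rounding, and uses hypothetical names and values.

```python
import math

def optimum_allocation_fixed_cost(W, S, c, C, c0):
    """Stratum sample sizes n_i proportional to W_i * S_i / sqrt(c_i),
    scaled so that sum(c_i * n_i) equals the available budget C - c0."""
    budget = C - c0
    denom = sum(Wi * Si * math.sqrt(ci) for Wi, Si, ci in zip(W, S, c))
    return [budget * Wi * Si / math.sqrt(ci) / denom for Wi, Si, ci in zip(W, S, c)]

W, S, c = [0.5, 0.5], [2.0, 4.0], [1.0, 4.0]   # hypothetical strata
n_i = optimum_allocation_fixed_cost(W, S, c, C=110.0, c0=10.0)
print(n_i)  # fractional sizes; round to integers in practice
```

More variable strata receive larger samples and more expensive strata receive smaller ones; in this example the two effects cancel and both strata get the same size.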
II. Minimum cost for fixed variance
Consider choosing the $n_i$ to satisfy a fixed variance requirement, $Var(\bar{y}_{st}) = V$, for the minimum possible total cost.
Given $w_i$, $n_i = nw_i$,
Comparison of proportional allocation and optimum allocation
Thus the extent of the potential gain from optimum (Neyman) allocation compared with proportional allocation depends on the variability of the stratum variances: the larger this is, the greater the relative advantage of optimum allocation.
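The size of this gain can be illustrated numerically. The sketch assumes the standard variance expressions for a total sample of size n with equal unit costs: $(1/n - 1/N)\sum W_iS_i^2$ under proportional allocation and $(\sum W_iS_i)^2/n - \sum W_iS_i^2/N$ under Neyman allocation. The function name and values are hypothetical.

```python
def allocation_variances(W, S, n, N):
    """Return (variance under proportional allocation, variance under Neyman allocation)."""
    mean_S = sum(Wi * Si for Wi, Si in zip(W, S))          # weighted mean of S_i
    mean_S2 = sum(Wi * Si ** 2 for Wi, Si in zip(W, S))    # weighted mean of S_i^2
    var_prop = (1 / n - 1 / N) * mean_S2
    var_neyman = mean_S ** 2 / n - mean_S2 / N
    return var_prop, var_neyman

var_prop, var_neyman = allocation_variances(W=[0.5, 0.5], S=[2.0, 4.0], n=10, N=1000)
print(var_prop, var_neyman, var_prop - var_neyman)
```

The difference equals $\frac{1}{n}\sum W_i(S_i - \bar{S})^2$ with $\bar{S} = \sum W_iS_i$, so it is zero when all strata are equally variable and grows as the $S_i$ spread out.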
Cluster Sampling (Chapter 6, Textbook, Barnett, V., 1991)
Comparison of s.r. sampling with cluster sampling
Systematic Sampling
A systematic sample can be viewed as a cluster sample of size m = 1!
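The cluster view can be made concrete: with sampling interval k, the population splits into k possible systematic samples (clusters), and exactly one is chosen at random. The sketch below assumes N is a multiple of k; the function names and the ordered population are hypothetical.

```python
from statistics import mean, pvariance

def systematic_samples(population, k):
    """All k possible systematic samples: every k-th unit from each starting point."""
    return [population[start::k] for start in range(k)]

def systematic_mean_variance(population, k):
    """Variance of the systematic sample mean over the k equally likely starts."""
    return pvariance([mean(s) for s in systematic_samples(population, k)])

population = list(range(1, 13))   # hypothetical ordered population, N = 12
clusters = systematic_samples(population, k=3)
print(clusters)
print(systematic_mean_variance(population, k=3))
```

Because only one cluster is observed, the variance of the systematic mean depends entirely on how much the cluster means differ, which is why the ordering of the population matters.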
Systematic sample mean
Comparison of s.r. sampling with systematic sampling
Two ways of estimating ---