2015-7-3 xlpeng 1 Stratified Simple Random Sampling (Chapter 5, Textbook, Barnett, V., 1991)...

Post on 22-Dec-2015

23/4/19 www.uic.edu.hk/~xlpeng 1

Stratified Simple Random Sampling (Chapter 5, Textbook, Barnett, V., 1991)

Consider another sampling method:

Definition: Stratified Random Sample

A stratified random sample is obtained by dividing the population elements into non-overlapping groups, called strata, and then selecting a random sample directly and independently from each stratum.

A stratified SRS is a special case of stratified sampling that uses SRS for selecting units from each stratum.
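The definition can be sketched in a few lines of Python. This is a minimal illustration, not the textbook's procedure: the function name `stratified_srs`, the stratum labels, and the population of unit identifiers are all hypothetical.

```python
import random

def stratified_srs(strata, sizes, seed=None):
    """Draw an SRS without replacement, independently, from each stratum.

    strata: dict mapping stratum label -> list of population units
    sizes:  dict mapping stratum label -> sample size n_i for that stratum
    """
    rng = random.Random(seed)
    sample = {}
    for label, units in strata.items():
        # rng.sample draws without replacement, i.e. an SRS of size n_i
        sample[label] = rng.sample(units, sizes[label])
    return sample

# Hypothetical population: unit identifiers grouped into three strata
population = {
    "A": list(range(0, 50)),
    "B": list(range(50, 70)),
    "C": list(range(70, 100)),
}
sample = stratified_srs(population, {"A": 5, "B": 2, "C": 3}, seed=1)
```

Each stratum is sampled separately, so every stratum is guaranteed to appear in the sample with exactly the size chosen for it.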

Examples of stratification

1. For some types of income and expenditure surveys on households in urban areas, states, provinces, counties, and districts may be considered as the strata.

2. For business surveys on production and sales, stratification is usually based on industrial classifications like industry type and employment size.

Reasons for using stratified sampling

• Allow sub-estimates: they can then be combined to give an overall estimate, e.g. we estimate the income level at district level as well as for the whole of HK.

• Administrative convenience.

• Allow different sampling fractions and methods: they may be implemented in different sub-populations, e.g. small/large business, private/government housing, urban/rural households.

• More efficient estimates: if a heterogeneous population is divided into strata that are internally homogeneous.

Some Notations

To estimate the population mean of a finite population, we assume that the population is stratified; that is, it has been divided into k non-overlapping groups, or strata, of sizes

$N_1, N_2, \ldots, N_k$, where $\sum_{i=1}^{k} N_i = N$.

The stratum means and variances are denoted by

$\bar{Y}_1, \ldots, \bar{Y}_k$ and $S_1^2, \ldots, S_k^2$.

Estimation of Population Characteristics in Stratified Populations

Taking a stratified random sample

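The sample mean and variance for the ith stratum are denoted by $\bar{y}_i$ and $s_i^2$.

In each stratum, we have a sampling fraction: $f_i = n_i / N_i$, where $n_i$ is the size of the sample drawn from the ith stratum.

Estimating the population mean $\bar{Y}$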

The stratified sample mean is defined as

$\bar{y}_{st} = \sum_{i=1}^{k} W_i \bar{y}_i$

Here we assume the weights $W_i = N_i / N$ are given (known).
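As a minimal sketch, the stratified sample mean is the weighted average of the per-stratum sample means, with weights Wi = Ni / N. The function name `stratified_mean` and the numbers below are hypothetical and serve only to illustrate the calculation.

```python
def stratified_mean(stratum_sizes, samples):
    """Weighted average of stratum sample means, weights W_i = N_i / N.

    stratum_sizes: list of population stratum sizes N_i
    samples:       list of lists, the observations drawn from each stratum
    """
    N = sum(stratum_sizes)
    total = 0.0
    for N_i, y in zip(stratum_sizes, samples):
        W_i = N_i / N                 # known stratum weight
        ybar_i = sum(y) / len(y)      # sample mean in stratum i
        total += W_i * ybar_i
    return total

# Hypothetical strata of sizes 50, 30, 20 with small samples from each
ybar_st = stratified_mean([50, 30, 20], [[10, 12, 14], [20, 22], [30, 30]])
```

The weights come from the population stratum sizes, not from the sample sizes, so the estimate remains appropriate even when the allocation is not proportional.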

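The mean and variance of $\bar{y}_{st}$

Note that $\bar{y}_{st}$ is unbiased for the population mean $\bar{Y}$:

$E(\bar{y}_{st}) = \sum_{i=1}^{k} W_i E(\bar{y}_i) = \sum_{i=1}^{k} W_i \bar{Y}_i = \bar{Y}$

Since each $\bar{y}_i$ is the mean of an SRS from the ith stratum, with sampling fraction $f_i = n_i / N_i$,

$E(\bar{y}_i) = \bar{Y}_i$ and $\mathrm{Var}(\bar{y}_i) = (1 - f_i) S_i^2 / n_i$

Because it is assumed that “sampling in different strata is independent”, that is $\mathrm{Cov}(\bar{y}_i, \bar{y}_j) = 0$ for $i \neq j$,

$\mathrm{Var}(\bar{y}_{st}) = \sum_{i=1}^{k} W_i^2 \mathrm{Var}(\bar{y}_i) = \sum_{i=1}^{k} W_i^2 (1 - f_i) S_i^2 / n_i$

An unbiased estimator of $\mathrm{Var}(\bar{y}_{st})$

$s^2(\bar{y}_{st}) = \sum_{i=1}^{k} W_i^2 (1 - f_i) s_i^2 / n_i$

where

$s_i^2 = \frac{1}{n_i - 1} \sum_{j=1}^{n_i} (y_{ij} - \bar{y}_i)^2$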
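The usual unbiased variance estimator for the stratified sample mean sums Wi^2 (1 - fi) si^2 / ni over the strata, where fi = ni / Ni and si^2 is the sample variance with divisor ni - 1. The sketch below assumes that form; the function name and the data are hypothetical.

```python
import math
from statistics import variance  # sample variance, divisor n - 1

def stratified_var_estimate(stratum_sizes, samples):
    """Estimate Var(ybar_st) as sum of W_i^2 (1 - f_i) s_i^2 / n_i."""
    N = sum(stratum_sizes)
    total = 0.0
    for N_i, y in zip(stratum_sizes, samples):
        n_i = len(y)
        W_i = N_i / N        # stratum weight
        f_i = n_i / N_i      # sampling fraction in stratum i
        total += W_i ** 2 * (1 - f_i) * variance(y) / n_i
    return total

# Hypothetical strata of sizes 50, 30, 20, each with three observations
sizes = [50, 30, 20]
data = [[10, 12, 14], [20, 22, 24], [28, 30, 32]]
var_hat = stratified_var_estimate(sizes, data)
se_hat = math.sqrt(var_hat)  # estimated standard error of ybar_st
```

Each stratum needs at least two observations, otherwise its sample variance is undefined.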

Some special cases of $\mathrm{Var}(\bar{y}_{st})$
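As a sketch in standard notation (stratum weights $W_i = N_i/N$, stratum variances $S_i^2$, stratum sample sizes $n_i$ with $n = \sum n_i$, sampling fractions $f_i = n_i/N_i$ and $f = n/N$), three commonly quoted special cases of the variance of the stratified sample mean are given below. Which of these the original slide listed is an assumption.

```latex
% General form
\mathrm{Var}(\bar{y}_{st}) = \sum_{i=1}^{k} W_i^2 (1 - f_i) \frac{S_i^2}{n_i}

% (i) Sampling fractions f_i negligible
\mathrm{Var}(\bar{y}_{st}) \approx \sum_{i=1}^{k} \frac{W_i^2 S_i^2}{n_i}

% (ii) Proportional allocation, n_i = n W_i, so that f_i = f
\mathrm{Var}(\bar{y}_{st}) = \frac{1 - f}{n} \sum_{i=1}^{k} W_i S_i^2

% (iii) Proportional allocation with a common stratum variance S_i^2 = S_w^2
\mathrm{Var}(\bar{y}_{st}) = \frac{(1 - f) S_w^2}{n}
```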

Estimator of the “pooled variance”

Example: Advertising firm

An advertising firm conducts a sample survey to estimate the average number of hours each week that households watch TV. The county contains 2 towns, A and B, and a rural area. Town A is built around a factory and contains mostly factory workers and school-aged children. Town B is a suburb of a city and contains older residents with few children at home.

There are 155 households in town A, 62 in town B, and 93 in the rural area. The advertising firm interviews n = 40 households, with random samples of size n1 = 20 from town A, n2 = 8 from town B, and n3 = 12 from the rural area, following proportional allocation. The measurements of TV-viewing time, in hours per week, are shown below:

(a) Estimate the average TV-viewing time, in hours per week, for all households in the county.

(b) In the study, the families of town A tend to be younger and have more children than those of town B. Estimate the difference between the average TV-viewing time, in hours per week, for families of these 2 towns.

In both cases, provide an estimate of the standard error of the estimate.

Solution (a): The population of households falls into 3 groups, 2 towns and a rural area, with

$N_1 = 155$, $N_2 = 62$, $N_3 = 93$, and $N = 310$.

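Solution (b): Since the SRSs chosen within each stratum are independent, the variance of the difference between 2 independent random variables is the sum of their respective variances. The estimate of the difference is

$\bar{y}_1 - \bar{y}_2$, with estimated variance $(1 - f_1) s_1^2 / n_1 + (1 - f_2) s_2^2 / n_2$,

where $\bar{y}_i$, $s_i^2$, and $f_i = n_i / N_i$ are the sample mean, sample variance, and sampling fraction in stratum i.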
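The calculation for part (b) can be sketched as follows. The stratum sizes 155 and 62 are those given in the example; the observations are made-up placeholders, not the survey measurements, and the function name `stratum_diff` is hypothetical.

```python
import math
from statistics import mean, variance  # variance uses divisor n - 1

def stratum_diff(N1, y1, N2, y2):
    """Estimate the difference of two stratum means and its standard error."""
    def var_of_mean(N_i, y):
        n_i = len(y)
        # SRS variance of a stratum sample mean, with finite population correction
        return (1 - n_i / N_i) * variance(y) / n_i

    diff = mean(y1) - mean(y2)
    # Independent samples: the variance of the difference is the sum of variances
    se = math.sqrt(var_of_mean(N1, y1) + var_of_mean(N2, y2))
    return diff, se

# Placeholder observations (hours per week), NOT the data from the survey
town_a = [30, 34, 38, 32, 36]
town_b = [20, 28, 24, 26]
diff, se = stratum_diff(155, town_a, 62, town_b)
```

No stratum weights appear here, because the comparison is between two strata rather than an estimate for the whole county.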

Example: Estimation of the population total

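(c) Estimate the total number of hours each week that households view TV. Provide an estimate of the s.e. of the estimate.

Solution: The population total is estimated by $N \bar{y}_{st}$, where $\bar{y}_{st}$ is the stratified sample mean and $N = 310$; its estimated standard error is $N$ times the estimated standard error of $\bar{y}_{st}$.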

Simple random sampling

Stratified sampling with proportional allocation

(a) When stratum size is large enough:

$N_i / N$
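The comparison for large strata can be sketched with the standard decomposition of the population variance. Here $S^2$ is the overall population variance, $S_i^2$ and $\bar{Y}_i$ are the stratum variances and means, $W_i = N_i/N$, and $f = n/N$; the approximation step assumes every $N_i$ is large enough that $(N_i - 1)/(N - 1) \approx N_i/N$.

```latex
% Exact decomposition into within-stratum and between-stratum parts
(N - 1) S^2 = \sum_{i=1}^{k} (N_i - 1) S_i^2 + \sum_{i=1}^{k} N_i (\bar{Y}_i - \bar{Y})^2

% Large strata
S^2 \approx \sum_{i=1}^{k} W_i S_i^2 + \sum_{i=1}^{k} W_i (\bar{Y}_i - \bar{Y})^2

% SRS versus stratified sampling with proportional allocation
\mathrm{Var}(\bar{y}) = \frac{(1 - f) S^2}{n}, \qquad
\mathrm{Var}(\bar{y}_{st}) = \frac{1 - f}{n} \sum_{i=1}^{k} W_i S_i^2

% Hence
\mathrm{Var}(\bar{y}) - \mathrm{Var}(\bar{y}_{st}) \approx
\frac{1 - f}{n} \sum_{i=1}^{k} W_i (\bar{Y}_i - \bar{Y})^2 \ge 0
```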

(b) When stratum size is not large enough:

The stratified sample mean will be more efficient than the s.r. sample mean if and only if the variation between the stratum means is sufficiently large compared with the within-strata variation!
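A small numerical check makes the condition concrete. The sketch below computes the exact variances of the two estimators for tiny, made-up populations, using Var = (1 - f) S^2 / n for SRS and (1 - f)/n times the weighted sum of stratum variances for proportional allocation. All names and data are hypothetical.

```python
from statistics import variance  # divisor N - 1, matching the definition of S^2

def var_srs_vs_proportional(strata, n):
    """Exact Var(ybar) under SRS and Var(ybar_st) under proportional allocation.

    strata: list of lists holding the full population values of each stratum
    n:      total sample size
    """
    values = [y for stratum in strata for y in stratum]
    N = len(values)
    f = n / N
    var_srs = (1 - f) * variance(values) / n
    within = sum(len(s) / N * variance(s) for s in strata)  # sum of W_i S_i^2
    var_st = (1 - f) / n * within
    return var_srs, var_st

# Stratum means far apart, little variation inside each stratum
separated = var_srs_vs_proportional([[0, 1], [10, 11]], n=2)

# Identical stratum means, large variation inside each small stratum
same_means = var_srs_vs_proportional([[0, 10], [0, 10]], n=2)
```

In the first population stratification reduces the variance sharply; in the second, where the strata are small and their means coincide, the stratified mean is actually the less efficient of the two.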

[Figure: a population of 25 individuals arranged in a 5 × 5 grid, with rows labelled I to V and columns labelled A to E (15 males and 10 females)]

Take a stratified random sample with size 5 in each case, that is:

$\mathrm{var}(\bar{y}) = 12.87$

Optimum Choice of Sample Size

• To achieve the required precision of estimation

• Some cost limitation

The simplest form assumes that there is some overhead cost, c0, of administering the survey, and that individual observations from the ith stratum each cost an amount ci. Thus the total cost is:

$C = c_0 + \sum_{i=1}^{k} c_i n_i$

I. Minimum variance for fixed cost

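For a fixed total cost $C = c_0 + \sum_i c_i n_i$, the variance of the stratified sample mean is minimised by taking $n_i$ proportional to $W_i S_i / \sqrt{c_i}$, that is

$n_i = n \dfrac{W_i S_i / \sqrt{c_i}}{\sum_{j} (W_j S_j / \sqrt{c_j})}$

Then

$n = \dfrac{(C - c_0) \sum_i (W_i S_i / \sqrt{c_i})}{\sum_i W_i S_i \sqrt{c_i}}$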
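The fixed-cost allocation can be sketched as below, assuming the standard result that the variance is minimised when the stratum sample size ni is proportional to Wi Si / sqrt(ci), with the total scaled so that the budget C is spent exactly. The function name and all numerical inputs are hypothetical.

```python
import math

def optimum_allocation_fixed_cost(W, S, c, C, c0=0.0):
    """Real-valued n_i proportional to W_i S_i / sqrt(c_i), with total cost C.

    W: stratum weights, S: stratum standard deviations,
    c: cost per observation in each stratum, C: total budget, c0: overhead.
    """
    a = [W_i * S_i / math.sqrt(c_i) for W_i, S_i, c_i in zip(W, S, c)]
    b = [W_i * S_i * math.sqrt(c_i) for W_i, S_i, c_i in zip(W, S, c)]
    n = (C - c0) * sum(a) / sum(b)        # total sample size the budget allows
    return [n * a_i / sum(a) for a_i in a]

# Hypothetical inputs: three strata, budget 500 with overhead 50
W = [0.5, 0.3, 0.2]
S = [10.0, 20.0, 30.0]
c = [1.0, 4.0, 9.0]
n_i = optimum_allocation_fixed_cost(W, S, c, C=500.0, c0=50.0)
spent = 50.0 + sum(c_j * n_j for c_j, n_j in zip(c, n_i))
```

The returned sizes are not integers; in practice they would be rounded, which changes the cost slightly. Strata that are larger, more variable, or cheaper to sample receive more observations.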

II. Minimum cost for fixed variance

Consider how to satisfy $\mathrm{Var}(\bar{y}_{st}) = V$ for the minimum possible total cost.

Given the sample weights $w_i$, where $n_i = n w_i$:
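As a sketch of the standard argument, write the stratum sample sizes as $n_i = n w_i$ with $\sum w_i = 1$, and let $V$ be the required variance of the stratified sample mean $\bar{y}_{st}$. The symbols $W_i = N_i/N$, $S_i$, and $c_i$ denote the stratum weights, standard deviations, and unit costs.

```latex
% Variance as a function of n for fixed sample weights w_i
\mathrm{Var}(\bar{y}_{st}) = \frac{1}{n} \sum_{i=1}^{k} \frac{W_i^2 S_i^2}{w_i}
  - \frac{1}{N} \sum_{i=1}^{k} W_i S_i^2

% Setting this equal to V and solving for n
n = \frac{\sum_{i=1}^{k} W_i^2 S_i^2 / w_i}{V + \frac{1}{N} \sum_{i=1}^{k} W_i S_i^2}

% The total cost is minimised by w_i proportional to W_i S_i / \sqrt{c_i}, giving
n = \frac{\left( \sum_{i=1}^{k} W_i S_i \sqrt{c_i} \right)
          \left( \sum_{i=1}^{k} W_i S_i / \sqrt{c_i} \right)}
         {V + \frac{1}{N} \sum_{i=1}^{k} W_i S_i^2}
```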

Comparison of proportional allocation and optimum allocation

Thus the extent of the potential gain from optimum (Neyman) allocation compared with proportional allocation depends on the variability of the stratum variances: the larger this is, the greater the relative advantage of optimum allocation.
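The size of the gain can be sketched numerically. Ignoring the finite population correction (an assumption made here for simplicity), the variance of the stratified mean is the weighted sum of Wi Si^2 divided by n under proportional allocation, and the square of the weighted sum of Wi Si divided by n under Neyman allocation. The function names and inputs are hypothetical.

```python
def var_proportional(W, S, n):
    """Var(ybar_st) under proportional allocation, finite population correction ignored."""
    return sum(W_i * S_i ** 2 for W_i, S_i in zip(W, S)) / n

def var_neyman(W, S, n):
    """Var(ybar_st) under Neyman allocation, finite population correction ignored."""
    return sum(W_i * S_i for W_i, S_i in zip(W, S)) ** 2 / n

def gain(W, S, n):
    """Reduction in variance from Neyman relative to proportional allocation."""
    return var_proportional(W, S, n) - var_neyman(W, S, n)

W = [0.5, 0.3, 0.2]
unequal = gain(W, [10.0, 20.0, 30.0], n=100)   # stratum spreads differ
equal = gain(W, [20.0, 20.0, 20.0], n=100)     # stratum spreads identical
```

The difference equals the weighted variance of the stratum standard deviations about their weighted mean, divided by n, so it vanishes when all strata are equally variable and grows as they differ.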

$d_T = 420 \times 10\% = 42$

$d = d_T / 243 = 0.1728$

$V = (d / z)^2 = (0.0882)^2$

For optimum allocation:

$w_i = \dfrac{n_i}{n} = \dfrac{W_i S_i / \sqrt{c_i}}{\sum_i (W_i S_i / \sqrt{c_i})}$

The sample weights are about (0.527, 0.348, 0.124).

The required total sample size is now 31, consisting of 16, 11, and 4 for the three strata respectively.

With simple random sampling, we would need a sample of size 62!
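The quoted figures can be checked with a few lines of arithmetic. The sketch assumes that the tolerance on the total is 10% of 420, that 243 is the population size used to convert it to a tolerance on the mean, and that z = 1.96; the value of z is not stated in the text and is chosen because it reproduces 0.0882.

```python
z = 1.96                      # assumed normal quantile, not given in the text
d_total = 420 * 0.10          # tolerance on the total
d_mean = d_total / 243        # tolerance on the mean (243 assumed to be N)
root_V = d_mean / z           # square root of the required variance V

weights = [0.527, 0.348, 0.124]   # sample weights quoted in the text
n = 31                            # required total sample size quoted in the text
allocation = [round(n * w) for w in weights]
```

Rounding each product of n and a sample weight to the nearest integer gives the stratum sample sizes 16, 11, and 4, which add up to 31.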

Some practical considerations: unknown $N_i$ and $S_i^2$

Double sampling for stratification

Double sampling is two-phase sampling. For example, we may call many voters to identify their income level (phase 1 sample), while only a few can be interviewed (phase 2 sample) to complete a detailed questionnaire.
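A minimal sketch of the resulting estimator: the large phase 1 sample is used only to estimate the stratum weights, and the small phase 2 sample supplies the stratum means. The function name, the labels, and the data are hypothetical, and the variance of this estimator is not covered here.

```python
from statistics import mean

def double_sampling_mean(phase1_labels, phase2_samples):
    """Stratified mean with weights estimated from a phase 1 sample.

    phase1_labels:  stratum label recorded for every phase 1 unit
    phase2_samples: dict label -> y values measured in phase 2
    """
    n1 = len(phase1_labels)
    estimate = 0.0
    for label, ys in phase2_samples.items():
        w_hat = phase1_labels.count(label) / n1   # estimated stratum weight
        estimate += w_hat * mean(ys)
    return estimate

# Hypothetical data: 10 phase 1 calls classify voters by income level,
# and 4 of them complete the detailed phase 2 questionnaire
labels = ["low"] * 6 + ["high"] * 4
detailed = {"low": [10, 14], "high": [30, 34]}
estimate = double_sampling_mean(labels, detailed)
```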

Quota Sampling

Post-hoc stratification

Suppose plans have been drawn up to conduct a sample survey on a stratified population, and that stratum sizes and stratum variances are known. However, we may not be able to determine to which stratum an observation belongs until it has been drawn.

This arises, for example, where strata correspond to different personal details of people, such as their religious beliefs, income levels, and so on. For such factors, published national reports may provide a clear indication of stratum weights (sizes) and variances, but it can be most difficult to sample individuals from specific strata.

Sometimes we may have to draw our sample and stratify it subsequently: that is, carry out a post-hoc stratification.

Another possible use of post-hoc stratification is to correct “obvious lack of representativeness” in a s.r. sample.
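A minimal sketch of post-hoc stratification: draw a simple random sample, classify each unit after the fact, and reweight the stratum means with the known population weights. The weights 0.6 and 0.4 below correspond to a population of 15 males and 10 females; the observations and the function name are hypothetical.

```python
from collections import defaultdict
from statistics import mean

def post_stratified_mean(sample, weights):
    """Reweight an SRS by known stratum weights W_i.

    sample:  list of (stratum_label, y) pairs from a simple random sample
    weights: dict label -> known population weight W_i
    """
    groups = defaultdict(list)
    for label, y in sample:
        groups[label].append(y)
    missing = set(weights) - set(groups)
    if missing:
        # A stratum with no sampled units has no mean to reweight
        raise ValueError(f"no sampled units in strata: {sorted(missing)}")
    return sum(W_i * mean(groups[label]) for label, W_i in weights.items())

# Hypothetical SRS in which females are over-represented (3 of 5 units)
srs = [("M", 70), ("F", 60), ("M", 74), ("F", 64), ("F", 62)]
plain = mean(y for _, y in srs)
adjusted = post_stratified_mean(srs, {"M": 0.6, "F": 0.4})
```

The unadjusted mean gives females a share of 3/5 where the population share is 0.4; reweighting by the known stratum weights removes that imbalance.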

Conclusions on Stratified Sampling:
