Designing and Conducting Business Surveys for Official Statistics, Tbilisi, November 2016

Haraldsen & Snijkers 2016

Sampling Business Surveys

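TSE approach: Quality = 1 − Σ(Bias² + Variance)

\[ \text{Quality} = 1 - \left[ \left( B_{\text{Specification}} + B_{\text{Frame}} + B_{\text{Nonresponse}} + B_{\text{Measurement}} + B_{\text{Processing}} \right)^2 + \left( V_{\text{Sampling}} + V_{\text{Measurement}} + V_{\text{Processing}} \right) \right] \]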

[Figure: Survey Cycle and Extended Survey Cycle]

WHO? Population → Sample frame → Sample → Respondents → Adjusted
Errors: Coverage errors, Sampling errors, Nonresponse errors, Adjustment errors

WHAT? HOW? Construct → Measurement instrument → Response → Edited
Errors: Validity, Measurement errors, Processing errors

Both chains end in Data delivery.

Extended Survey Cycle:
• Specification
• Development & testing
• Survey communication & management
• Coding, cleaning & data integration
• Data delivery & documentation

Administrative → Statistical Business Register. Norway 2009

Identified legal entities   100.0 %
Errors subtracted            93.0 %
Passive subtracted           49.0 %

49.0 %

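\[ \text{Cost efficiency} = \frac{\text{Quality}}{\text{Cost}} = \frac{1 - \left[ \left( B_{\text{Specification}} + B_{\text{Frame}} \right)^2 + V_{\text{Sampling}} \right]}{\text{Sample size}} \]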


• IDENTIFY
• CLASSIFY
• FRAME
• SAMPLE
• UPDATE

Possible errors:
• Inactive included
• Active excluded
• One counted as several (multiplicity)
• Several counted as one (clustering)
• Inaccurate classification
• Misclassification




• With computerized questionnaires the relationship between cost and sample size is weaker than before

• For the individual company sample size does not affect response burden

• For the business world, however, it does

Sampling:


• Stratification as default

• Completely enumerated strata

• Partitions based on the most important domains in estimation

• Keep the number of stratifiers (and strata) low




Stratification = the partition of the population in such a way that the elements within a stratum are as similar as possible and the means of the strata are as different as possible.

• Decide what auxiliary variables to use as stratification variables
  = Those which predict the survey outcome well
  = Industry code + no of employees?
• Determine the number of strata
  = The point where the variance decrease flattens out
• Assign units to strata
• Allocate the number to be sampled from each stratum (sample allocation)
• Take constraints (like expected nonresponse) into consideration
• (Random) sampling within strata
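The stratification steps above can be sketched in a few lines of Python. This is a minimal illustration and not the authors' procedure: the size-class boundaries, the sampling fractions, the synthetic frame and the function names `assign_stratum` and `stratified_sample` are all assumptions made for the example.

```python
import random
from collections import defaultdict

# Hypothetical size-class boundaries (number of employees); illustration only.
SIZE_BOUNDS = [(0, 9, "small"), (10, 99, "medium"), (100, float("inf"), "large")]


def assign_stratum(employees):
    """Assign a unit to a size stratum based on its number of employees."""
    for low, high, label in SIZE_BOUNDS:
        if low <= employees <= high:
            return label
    raise ValueError(f"no stratum for {employees} employees")


def stratified_sample(frame, fractions, seed=0):
    """Draw a simple random sample within each stratum.

    frame     -- list of (unit_id, employees) tuples
    fractions -- sampling fraction per stratum; 1.0 means complete enumeration
    """
    rng = random.Random(seed)
    strata = defaultdict(list)
    for unit_id, employees in frame:
        strata[assign_stratum(employees)].append(unit_id)

    sample = {}
    for label, units in strata.items():
        n_h = min(len(units), round(fractions[label] * len(units)))
        sample[label] = sorted(rng.sample(units, n_h))
    return sample


# Synthetic frame: 50 small, 20 medium and 5 large businesses.
frame = (
    [(f"S{i}", 3) for i in range(50)]
    + [(f"M{i}", 40) for i in range(20)]
    + [(f"L{i}", 250) for i in range(5)]
)
sample = stratified_sample(frame, {"small": 0.1, "medium": 0.5, "large": 1.0})
print({label: len(units) for label, units in sample.items()})
```

Setting the fraction of the largest stratum to 1.0 makes it a completely enumerated stratum, while the other strata are sampled at random.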

Measures of variability


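Range

\[ \operatorname{Range}(X) = X_{\max} - X_{\min} \]

Variance

\[ s^2 = \frac{1}{n} \sum_{i=1}^{n} \left( X_i - \bar{x} \right)^2 \]

Standard deviation

\[ s = \sqrt{ \frac{1}{n} \sum_{i=1}^{n} \left( X_i - \bar{x} \right)^2 } \]

Coefficient of Variation

\[ cv = \frac{1}{\bar{x}} \sqrt{ \frac{1}{n} \sum_{i=1}^{n} \left( X_i - \bar{x} \right)^2 } \]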
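As a small worked example, the four measures can be computed as follows. The sketch uses the divisor n, as the slide appears to do (many textbooks use n − 1 for sample data); the input figures and the function name `variability` are made up for illustration.

```python
import math


def variability(values):
    """Return range, variance, standard deviation and CV using the 1/n definitions."""
    n = len(values)
    mean = sum(values) / n
    variance = sum((x - mean) ** 2 for x in values) / n
    std = math.sqrt(variance)
    return {
        "range": max(values) - min(values),
        "variance": variance,
        "std": std,
        "cv": std / mean,
    }


# Made-up turnover figures for five businesses.
stats = variability([2, 4, 4, 6, 9])
print(stats)
```

Because the coefficient of variation divides by the mean, it is unit-free and can be compared across variables measured on different scales.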

Stratification by intuition: Dalenius-Hodges cumulative method for determining stratum boundaries

[Figure: cumulative curve with total t = 29.9 and stratum boundaries at t/4 = 7.5, 2t/4 = 14.9 and 3t/4 = 22.4]

Stratification by intuition 2: Splitting large groups

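Size class   Frequency   Cumulative √f   Stratum
1-10         44           6.6            1
>10-20       52          13.8            1
>20-30       23          18.6            2
>30-40       21          23.2            2
>40-50       17          27.3            3
(class and frequency not shown for the remaining rows)
–            –           35.6            3
–            –           39.7            4
–            –           43.3            4
–            –           43.3            4
–            –           45.5            4
–            –           45.5            4
–            –           45.5            4
–            –           47.1            4

[Figure: cumulative curve with total t = 47.1 and stratum boundaries at t/4 = 11.8, 2t/4 = 23.5 and 3t/4 = 35.3]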
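The cumulative values on this slide are consistent with the Dalenius-Hodges rule of cumulating the square roots of the class frequencies. The sketch below reproduces the first five cumulative values from the frequencies 44, 52, 23, 21 and 17, and computes equally spaced cut points from the slide's total t = 47.1. The function names are hypothetical, and the rule the slide uses to assign classes to strata is not stated, so it is not implemented here.

```python
import math


def cum_sqrt_f(frequencies):
    """Cumulative sums of the square roots of the class frequencies."""
    total = 0.0
    cumulative = []
    for f in frequencies:
        total += math.sqrt(f)
        cumulative.append(total)
    return cumulative


def equal_cut_points(t, n_strata):
    """Cut points that divide the cumulative sqrt(f) scale into equal intervals."""
    return [k * t / n_strata for k in range(1, n_strata)]


# Frequencies of the five size classes 1-10, >10-20, ..., >40-50.
cumulative = cum_sqrt_f([44, 52, 23, 21, 17])
print([round(c, 1) for c in cumulative])

# Cut points for four strata, using the slide's total over all classes.
cut_points = equal_cut_points(47.1, 4)
print(cut_points)
```

Classes whose cumulative value falls near the same cut point end up in the same stratum, which is why strata tend to be narrow where units are dense and wide where they are sparse.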

Alternatives


• Varying sample variances: Hidiroglou (1986), Lavallée-Hidiroglou (1988)
• Power allocation
• Neyman allocation (Neyman 1934): allocation proportional to the estimated population standard deviation × stratum population size

\[ n_h = n \, \frac{s_h N_h}{\sum_h s_h N_h} \]

• Cochran (1977): cost = internal + external cost (response burden)
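A minimal sketch of the Neyman allocation formula, with made-up stratum sizes and standard deviations. The function name `neyman_allocation` is an assumption for illustration; in practice the results would still need rounding and capping at the stratum size N_h.

```python
def neyman_allocation(n, stratum_sizes, stratum_sds):
    """Allocate n across strata: n_h = n * N_h * s_h / sum_h(N_h * s_h)."""
    weights = [N_h * s_h for N_h, s_h in zip(stratum_sizes, stratum_sds)]
    total = sum(weights)
    return [n * w / total for w in weights]


# Made-up example: three strata with sizes N_h and standard deviations s_h.
allocation = neyman_allocation(100, [1000, 200, 50], [2.0, 10.0, 40.0])
print(allocation)
```

In this example the products N_h × s_h are equal, so the small but highly variable stratum of 50 units receives the same share of the sample as the stratum of 1000 units.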

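Probability proportional to size (PPS) sampling: direct rather than stratified sampling, with selection probabilities proportional to a suitable auxiliary variable

Systematic selection, with X the total of the auxiliary variable, n the sample size and s a random seed:

0 ≤ s ≤ X/n
n1 ≈ s
n2 ≈ s + X/n
n3 ≈ s + X/n + X/n
nn ≈ …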
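The selection scheme above can be sketched as systematic PPS sampling along the cumulated size measure. This is an illustrative sketch under assumptions: the size measures are made up, the function name `systematic_pps` is hypothetical, and units larger than the interval X/n may be selected more than once in this simple version.

```python
import random


def systematic_pps(sizes, n, seed=None):
    """Select n units with probability proportional to size, systematically.

    Returns the indices of the selected units.
    """
    rng = random.Random(seed)
    total = sum(sizes)              # X, the total of the size measure
    step = total / n                # X/n, the sampling interval
    start = rng.uniform(0, step)    # s, the random seed point
    points = [start + k * step for k in range(n)]

    selected = []
    cumulative = 0.0
    index = 0
    for point in points:
        # Advance to the unit whose cumulative size interval contains the point.
        while index < len(sizes) - 1 and cumulative + sizes[index] < point:
            cumulative += sizes[index]
            index += 1
        selected.append(index)
    return selected


# Made-up size measures (for example, number of employees) for five businesses.
sizes = [5, 10, 50, 20, 15]
print(systematic_pps(sizes, n=2, seed=1))
```

With these figures the interval X/n is 50, so the business of size 50 is selected with certainty while the smaller ones compete for the second place.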

Hard to sample:

• A range of products (or services) that vary between none and a lot
  – Large samples or samples based on census
• Prices (transactions) of a range of products (or services)
  – Multistage sampling
• Estimates of rare characteristics
  – Satellite registers
  – Two-stage sampling / filter questions


Cutoff sampling


                     Small (0-9 employees)   Medium (10-99 employees)   Large (100+ employees)
No of businesses     93.1 %                   6.4 %                      0.5 %
No of employees      28.5 %                  31.8 %                     39.7 %
Economic turnover    23.9 %                  24.6 %                     51.6 %

Variable rather than unit coverage:
• The large are fully enumerated
• The middle-sized are sampled
• The small are left out

Common in spite of immediate disadvantages:
• Estimates among the small are based on assumptions
• Biased estimates of the total
• Changes among the smallest go unnoticed
• How to determine the cutoff?
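A short calculation shows why cutoff sampling is attractive despite these disadvantages. It assumes the chart percentages are shares by size class (small, medium, large) for businesses, employees and turnover, in that order; the function name `coverage_after_cutoff` is hypothetical.

```python
# Percentage shares by size class, as read from the chart (an assumed reading).
SHARES = {
    "No of businesses": {"small": 93.1, "medium": 6.4, "large": 0.5},
    "No of employees": {"small": 28.5, "medium": 31.8, "large": 39.7},
    "Economic turnover": {"small": 23.9, "medium": 24.6, "large": 51.6},
}


def coverage_after_cutoff(shares, excluded=("small",)):
    """Share of each measure that remains in scope when size classes are cut off."""
    return {
        measure: round(sum(v for size, v in by_size.items() if size not in excluded), 1)
        for measure, by_size in shares.items()
    }


coverage = coverage_after_cutoff(SHARES)
for measure, percent in coverage.items():
    print(f"{measure}: {percent} % covered")
```

Under this reading, leaving out the small businesses removes most of the units from the survey while most of the employment and turnover stays covered.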

Sample rotation and overlap


Variance of change:

\[ \operatorname{Var}(t_2 - t_1) = \operatorname{Var}(t_1) + \operatorname{Var}(t_2) - 2 \operatorname{Cov}(t_1, t_2) \]
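To see what the covariance term does, consider the special case (an assumption for illustration) in which both estimates have the same variance \(\sigma^2\) and correlation \(\rho\):

```latex
\begin{aligned}
\operatorname{Var}(t_2 - t_1)
  &= \operatorname{Var}(t_1) + \operatorname{Var}(t_2) - 2\operatorname{Cov}(t_1, t_2) \\
  &= \sigma^2 + \sigma^2 - 2\rho\sigma^2 \\
  &= 2\sigma^2 (1 - \rho)
\end{aligned}
```

With uncorrelated estimates (\(\rho = 0\)) the variance of change is \(2\sigma^2\); with \(\rho = 0.8\) it drops to \(0.4\sigma^2\). A positive correlation between the two estimates, as sample overlap would be expected to produce, therefore makes estimates of change more precise.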

Panel element:
• Representativity
• +/- Learning effects
• (Perceived) response burden

Common, frozen frame

Sample rotation Response Burden

• For the business world, response burden is unaffected by sample rotation

Participation scenario, 2 surveys

                  (1,0)       (0,1)       (1,1)      Expected burden
Actual burden     B1          B2          B1 + B2
Allow overlap     p1(1-p2)    (1-p1)p2    p1p2       p1B1 + p2B2
No overlap        p1          p2          0          p1B1 + p2B2
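The claim that expected burden is the same with and without overlap can be checked numerically. The sketch assumes that the burden of taking part in both surveys is B1 + B2; the probabilities, burdens and the function name `expected_burden` are made up for illustration.

```python
def expected_burden(p1, p2, b1, b2, allow_overlap=True):
    """Expected response burden of one business across two surveys.

    p1, p2 -- selection probabilities for survey 1 and survey 2
    b1, b2 -- burden of taking part in survey 1 and survey 2
    """
    if allow_overlap:
        # Independent selection: the three participation scenarios of the table.
        only_1, only_2, both = p1 * (1 - p2), (1 - p1) * p2, p1 * p2
    else:
        # Coordinated selection: never in both (requires p1 + p2 <= 1).
        only_1, only_2, both = p1, p2, 0.0
    return only_1 * b1 + only_2 * b2 + both * (b1 + b2)


# Made-up probabilities and burdens (for example, hours per questionnaire).
with_overlap = expected_burden(0.2, 0.3, 4.0, 6.0, allow_overlap=True)
without_overlap = expected_burden(0.2, 0.3, 4.0, 6.0, allow_overlap=False)
print(with_overlap, without_overlap)
```

Both calls give the same expected burden, p1B1 + p2B2. Avoiding overlap changes how the burden is distributed between businesses, not how much burden there is in total.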


• Sample/population size decides the room for rotation

• Actual fairness is unrealistic

No of employees      0-4     5-9   10-19   20-49   50-99    100+
Sample no 1            0    1286    1135    1256    1124    1635
Sample no 2            0     400     947    2094    2236    1982
Sample no 3            0     948    1509     838    1124    1239
Sample no 4         3858    1606    3787    4188    2236    1982
Sample no 5        71969   12843   15140    8368    2236    1982
Sample no 6            0     386     758    1256    1124    1982
Total sampled      75828   17469   23276   17999   10081   10802
Population size   349104   25683   15140    8368    2236    1982

Sample rotation Perceived Response Burden

• Compared to those not sampled, fairness is a weak argument

• For most businesses response burden appears slightly more evenly distributed


No of employees      0-4     5-9   10-19   20-49   50-99    100+
Sample no 4         3858    1606    3787    4188    2236    1982
Sample no 5        71969   12843   15140    8368    2236    1982
Total sampled      75828   17469   23276   17999   10081   10802
Population size   349104   25683   15140    8368    2236    1982
Neither 4 nor 5     78 %    44 %   -25 %   -50 %  -100 %  -100 %
Both 4 and 5      0.23 %     3 %    25 %    50 %   100 %   100 %
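The last two rows can be reproduced from the sample and population sizes. The formulas below are an assumption inferred from the table values, not something the slide states: "neither" is computed as 1 − p4 − p5 and "both" as p4 × p5, where p4 and p5 are the sampling fractions of samples 4 and 5. The function name `overlap_shares` is hypothetical.

```python
SIZE_CLASSES = ["0-4", "5-9", "10-19", "20-49", "50-99", "100+"]
SAMPLE_4 = [3858, 1606, 3787, 4188, 2236, 1982]
SAMPLE_5 = [71969, 12843, 15140, 8368, 2236, 1982]
POPULATION = [349104, 25683, 15140, 8368, 2236, 1982]


def overlap_shares(n4, n5, population):
    """Return (neither, both) as percentages for one size class.

    Assumed formulas, inferred from the table values:
    neither = 1 - p4 - p5, both = p4 * p5.
    """
    p4, p5 = n4 / population, n5 / population
    return 100 * (1 - p4 - p5), 100 * p4 * p5


results = {}
for label, n4, n5, pop in zip(SIZE_CLASSES, SAMPLE_4, SAMPLE_5, POPULATION):
    results[label] = overlap_shares(n4, n5, pop)
    neither, both = results[label]
    print(f"{label}: neither {neither:.0f} %, both {both:.2f} %")
```

A negative "neither" share means the two samples together exceed the population of that size class, so some businesses must be in both samples and overlap cannot be avoided.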

Sample rotation Perceived Response Burden

• Expected survey holidays are probably the most convincing argument


No of employees      0-4     5-9   10-19   20-49   50-99    100+
Sample no 5        71969   12843   15140    8368    2236    1982
Sample no 6            0     386     758    1256    1124    1982
Total sampled      75828   17469   23276   17999   10081   10802
Population size   349104   25683   15140    8368    2236    1982

Rotation figures (values as listed, in column order from the smallest size class):
Available for rotation: 273276, 8214
To be rotated: 37914, 8734
Survey holiday calculation: 7
Conditional holiday calculation: 9, 2, 0

“A design that is robust to nonsampling errors will often be better than a highly optimized design that cannot be realized in practice”

Paul Smith’s summary of chapter 5

