Designing and Conducting Business Surveys for Official Statistics, Tbilisi, November 2016
Haraldsen & Snijkers 2016 1
Sampling Business Surveys
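TSE approach: Quality = 1 - ∑(Bias² + Variance)
$$\text{Quality} = 1 - \left[\left(B_{\text{Specification}} + B_{\text{Frame}} + B_{\text{Nonresponse}} + B_{\text{Measurement}} + B_{\text{Processing}}\right)^2 + \left(V_{\text{Sampling}} + V_{\text{Measurement}} + V_{\text{Processing}}\right)\right]$$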
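A minimal sketch of how the quality measure above could be evaluated. It assumes that bias components are summed before squaring and that variance components are simply added, as in the expanded formula. The function name and all numbers are hypothetical illustrations, not figures from the course.

```python
def tse_quality(biases, variances):
    """Quality = 1 - [(sum of bias components)^2 + (sum of variance components)]."""
    total_bias = sum(biases.values())
    total_variance = sum(variances.values())
    return 1 - (total_bias ** 2 + total_variance)


# Hypothetical (made-up) relative error components, for illustration only.
example_biases = {
    "specification": 0.02,
    "frame": 0.05,
    "nonresponse": 0.03,
    "measurement": 0.04,
    "processing": 0.01,
}
example_variances = {"sampling": 0.010, "measurement": 0.004, "processing": 0.001}

print(tse_quality(example_biases, example_variances))
```

Because the bias terms are summed before squaring, biases of opposite sign can cancel, while variances always add up.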
[Figure: Survey Cycle and Extended Survey Cycle]
Survey cycle steps: Population, Sample frame, Sample, Respondents; Construct, Measurement instrument, Response; Adjusted, Edited; Data delivery
Error sources: Coverage errors, Sampling errors, Nonresponse errors, Adjustment errors, Validity, Measurement errors, Processing errors
Guiding questions: WHO? WHAT? HOW?
Extended survey cycle: Specification, Development & testing, Survey communication & management, Coding, cleaning & data integration, Data delivery & documentation
Administrative --> Statistical Business Register. Norway 2009
[Bar chart, axis 0 % to 100 %]
Identified legal entities: 100.0 %
Errors subtracted: 93.0 %
Passive subtracted: 49.0 %
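$$\text{Cost Efficiency} = \frac{\text{Quality}}{\text{Cost}} = \frac{1 - \left[\left(B_{\text{Specification}} + B_{\text{Frame}}\right)^2 + V_{\text{Sampling}}\right]}{\text{Sample Size}}$$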
• IDENTIFY
• CLASSIFY
• FRAME
• SAMPLE
UPDATE
– Inactive included
– Active excluded
– One counted as several (multiplicity)
– Several counted as one (clustering)
– Inaccurate classification
– Misclassification
• With computerized questionnaires the relationship between cost and sample size is weaker than before
• For the individual company, sample size does not affect response burden
• For the business world, however, it does
Sampling:
• Stratification as default
• Completely enumerated strata
• Partitions based on the most important domains in estimation
• Keep the number of stratifiers (and strata) low
Stratification = the partition of the population in such a way that the elements within a stratum are as similar as possible and the means of the strata are as different as possible.
• Decide what auxiliary variables to use as stratification variables
  = Those which predict the survey outcome well
  = Industry code + no of employees?
• Determine the number of strata
  = The point where the variance decrease flattens out
• Assign units to strata
• Allocate the number to be sampled from each stratum (sample allocation)
• Take constraints (like expected nonresponse) into consideration
• (Random) sampling within strata
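The stratification steps listed above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the procedure used by any statistical office: number of employees is the only stratifier, allocation is proportional to stratum size, the expected response rate is a single hypothetical constant, and the largest stratum is completely enumerated. All function names and numbers are made up for the example.

```python
import math
import random
from collections import defaultdict


def stratify(units, boundaries):
    """Assign (unit_id, size) pairs to strata; stratum h = number of boundaries <= size."""
    strata = defaultdict(list)
    for unit_id, size in units:
        h = sum(size >= b for b in boundaries)
        strata[h].append(unit_id)
    return dict(strata)


def allocate(strata, n, expected_response_rate=1.0, take_all=()):
    """Proportional allocation of n expected responses, inflated for nonresponse."""
    sampled = {h: ids for h, ids in strata.items() if h not in take_all}
    total = sum(len(ids) for ids in sampled.values())
    # Take-all strata are completely enumerated.
    alloc = {h: len(strata[h]) for h in take_all if h in strata}
    for h, ids in sampled.items():
        target = n * len(ids) / total / expected_response_rate
        alloc[h] = min(len(ids), math.ceil(target))
    return alloc


def draw(strata, alloc, seed=0):
    """Simple random sampling without replacement within each stratum."""
    rng = random.Random(seed)
    return {h: sorted(rng.sample(ids, alloc[h])) for h, ids in strata.items()}


# Hypothetical population: 60 small, 30 medium and 10 large businesses.
population = (
    [(i, 5) for i in range(60)]
    + [(i, 50) for i in range(60, 90)]
    + [(i, 500) for i in range(90, 100)]
)
strata = stratify(population, boundaries=[10, 100])
alloc = allocate(strata, n=18, expected_response_rate=0.9, take_all=(2,))
sample = draw(strata, alloc)
print(alloc)
```

Dividing by the expected response rate is one simple way to take a nonresponse constraint into account; other constraints would need their own adjustments.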
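Measures of variability
Range: $\text{Range}(X) = X_{\max} - X_{\min}$
Variance: $s^2 = \frac{1}{n}\sum_{i=1}^{n}\left(X_i - \bar{x}\right)^2$
Standard deviation: $s = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(X_i - \bar{x}\right)^2}$
Coefficient of variation: $cv = \frac{1}{\bar{x}}\sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(X_i - \bar{x}\right)^2}$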
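The four measures can be computed directly from their definitions. The sketch below follows the formulas as written on the slide, with the divisor n rather than n - 1. The function names and the data are hypothetical.

```python
import math


def value_range(xs):
    """Range: largest minus smallest value."""
    return max(xs) - min(xs)


def variance(xs):
    """Variance with divisor n, as in the slide formula."""
    mean = sum(xs) / len(xs)
    return sum((x - mean) ** 2 for x in xs) / len(xs)


def standard_deviation(xs):
    """Square root of the variance."""
    return math.sqrt(variance(xs))


def coefficient_of_variation(xs):
    """Standard deviation relative to the mean."""
    mean = sum(xs) / len(xs)
    return standard_deviation(xs) / mean


# Hypothetical number of employees in eight businesses.
employees = [2, 4, 4, 4, 5, 5, 7, 9]
print(value_range(employees), variance(employees),
      standard_deviation(employees), coefficient_of_variation(employees))
```

The coefficient of variation is unit-free, which makes it convenient for comparing variability between strata of very different size.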
Stratification by intuition: Dalenius-Hodges cumulative √f method for determining stratum boundaries
[Chart: cumulative √f, axis 0 to 35]
t = 29.9
t/4 = 7.5
2t/4 = 14.9
3t/4 = 22.4
Stratification by intuition 2: Splitting large groups
Class:          1-10, >10-20, >20-30, >30-40, >40-50
Frequency (f):  44, 52, 23, 21, 17
Cumulative √f:  6.6, 13.8, 18.6, 23.2, 27.3, 35.6, 39.7, 43.3, 43.3, 45.5, 45.5, 45.5, 47.1
Stratum:        1, 1, 2, 2, 3, 3, 4, 4, 4, 4, 4, 4, 4
[Chart: cumulative √f, axis 0 to 50]
t = 47.1
t/4 = 11.8
2t/4 = 23.5
3t/4 = 35.3
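A short sketch of the cumulative √f rule, using the figures from the slide. It rests on one assumption: each stratum boundary is placed at the class whose cumulative √f value lies closest to the target k·t/4. That rule reproduces the stratum numbers shown on the slide, but the source does not spell it out. Function names are hypothetical.

```python
import math
from itertools import accumulate


def cum_sqrt_f(frequencies):
    """Cumulative sum of the square roots of the class frequencies."""
    return list(accumulate(math.sqrt(f) for f in frequencies))


def assign_strata(cum_values, n_strata):
    """Give each class a stratum number (1..n_strata) from its cumulative sqrt(f) value."""
    t = cum_values[-1]
    cut_indices = []
    for k in range(1, n_strata):
        target = k * t / n_strata
        # Assumed rule: cut at the class whose cumulative value is closest to the target.
        closest = min(range(len(cum_values)), key=lambda i: abs(cum_values[i] - target))
        cut_indices.append(closest)
    return [1 + sum(i > c for c in cut_indices) for i in range(len(cum_values))]


# Frequencies of the five classes listed on the slide.
print([round(v, 1) for v in cum_sqrt_f([44, 52, 23, 21, 17])])

# The thirteen cumulative values listed on the slide (t = 47.1), four strata.
slide_cum = [6.6, 13.8, 18.6, 23.2, 27.3, 35.6, 39.7, 43.3, 43.3, 45.5, 45.5, 45.5, 47.1]
print(assign_strata(slide_cum, 4))
```

The first call confirms that the leading cumulative values are running sums of √f for the listed frequencies; the second call returns the stratum row shown on the slide.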
Alternatives
• Varying sample variances: Hidiroglou (1986), Lavallée-Hidiroglou (1988)
• Power allocation
• Allocation proportional to the estimated population standard deviation × stratum population size (Neyman 1934; Cochran 1977):
  $n_h = n \, \dfrac{s_h N_h}{\sum_h s_h N_h}$
  = internal + external cost (response burden)
• Probability proportional (to size) sampling: direct rather than stratified sampling, proportional to suitable auxiliary information
  0 ≤ s(eed) ≤ X/n
  n_1 ≈ s
  n_2 ≈ s + X/n
  n_3 ≈ s + X/n + X/n
  n_n ≈ …
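Two of the alternatives above lend themselves to a brief sketch: Neyman allocation, and systematic selection with probability proportional to size using a random seed s and the step X/n. The function names, the `start` parameter and all numbers are hypothetical. The PPS routine assumes the units are processed in list order and that a unit larger than the step may be hit more than once.

```python
import random


def neyman_allocation(n, stratum_sizes, stratum_sds):
    """n_h = n * N_h * s_h / sum_h(N_h * s_h); returns unrounded allocations."""
    weights = [N_h * s_h for N_h, s_h in zip(stratum_sizes, stratum_sds)]
    total = sum(weights)
    return [n * w / total for w in weights]


def systematic_pps(sizes, n, start=None, rng=None):
    """Select n units with probability proportional to size.

    Selection points are start, start + X/n, start + 2X/n, ... where X is the
    total size and 0 <= start < X/n. A unit is selected when a point falls
    inside its interval of the cumulated sizes.
    """
    total = sum(sizes)
    step = total / n
    if start is None:
        start = (rng or random.Random()).uniform(0, step)
    points = [start + k * step for k in range(n)]
    selected, cumulative, next_point = [], 0.0, 0
    for index, size in enumerate(sizes):
        cumulative += size
        while next_point < n and points[next_point] < cumulative:
            selected.append(index)
            next_point += 1
    return selected


# Hypothetical strata: sizes 100 and 50, standard deviations 1 and 2.
print(neyman_allocation(10, [100, 50], [1, 2]))

# Hypothetical size measures for four units, two to be selected, seed s = 5.
print(systematic_pps([10, 20, 30, 40], n=2, start=5))
```

In practice the allocations would be rounded, and units larger than the step would normally be placed in a take-all stratum before the systematic selection.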
Hard to sample:
• A range of products (or services) that vary between none and a lot
  – Large samples or samples based on census
• Prices (transactions) of a range of products (or services)
  – Multistage sampling
• Estimates of rare characteristics
  – Satellite registers
  – Two-stage sampling/filter questions
Cutoff sampling
[Bar chart, axis 0 % to 100 %: distribution by business size]
                     Small (0-9 employees)   Medium (10-99 employees)   Large (100+ employees)
No of businesses     93.1 %                  6.4 %                      0.5 %
No of employees      28.5 %                  31.8 %                     39.7 %
Economic turnover    23.9 %                  24.6 %                     51.6 %
Variable rather than unit coverage:
• The large are fully enumerated
• The medium-sized are sampled
• The small are left out
Common in spite of immediate disadvantages:
• Estimates among the small based on assumptions
• Biased estimates of the total
• Changes among the smallest go unnoticed
• How to determine the cutoff?
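The idea of variable rather than unit coverage can be checked against the percentages in the chart. The sketch below assumes the nine chart values map to size classes and variables in the order listed (each group of three sums to about 100 %). The dictionary layout and function name are hypothetical.

```python
# Shares in per cent by size class, read from the chart.
shares = {
    "businesses": {"small": 93.1, "medium": 6.4, "large": 0.5},
    "employees": {"small": 28.5, "medium": 31.8, "large": 39.7},
    "turnover": {"small": 23.9, "medium": 24.6, "large": 51.6},
}


def coverage_after_cutoff(shares, excluded=("small",)):
    """Share of each variable still covered when the excluded size classes are left out."""
    return {
        variable: round(sum(v for size, v in by_size.items() if size not in excluded), 1)
        for variable, by_size in shares.items()
    }


print(coverage_after_cutoff(shares))
```

Leaving out the small businesses removes most units from the survey while still covering roughly seven tenths of employment and three quarters of turnover, which helps explain why cutoff sampling is common despite its disadvantages.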
Sample rotation and overlap
Variance of change = var (t1) + var (t2) – 2cov(t1,t2)
• Panel element
• Representativity
• +/- Learning effects
• (Perceived) response burden
• Common, frozen frame
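The variance-of-change formula shows why overlap between two samples matters: a positive covariance between the two estimates reduces the variance of their difference. In the sketch below the covariance is expressed through a correlation coefficient, a hypothetical parameter standing in for the effect of sample overlap. The numbers are made up.

```python
import math


def variance_of_change(var_t1, var_t2, corr):
    """var(t2 - t1) = var(t1) + var(t2) - 2 * cov(t1, t2)."""
    cov = corr * math.sqrt(var_t1 * var_t2)
    return var_t1 + var_t2 - 2 * cov


# Hypothetical variances of the estimates at time 1 and time 2.
print(variance_of_change(4.0, 4.0, corr=0.0))   # independent samples
print(variance_of_change(4.0, 4.0, corr=0.75))  # strongly overlapping samples
```

With independent samples the two variances simply add up, so estimates of change benefit from keeping a panel element in the sample.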
Sample rotation Response Burden
• For the business world response burden is unaffected by sample rotation
Participation scenario, 2 surveys
                  (1,0)       (0,1)       (1,1)       Expected burden
Actual burden     B1          B2          B1 + B2
Allow overlap     p1(1-p2)    (1-p1)p2    p1p2        p1B1 + p2B2
No overlap        p1          p2          0           p1B1 + p2B2
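The table can be verified numerically: weighting each participation scenario by its probability gives the same expected burden whether or not overlap is allowed. The sketch assumes independent selection in the two surveys when overlap is allowed, and that the burden of taking part in both is B1 + B2. The probabilities and burdens are hypothetical.

```python
import math


def expected_burden_overlap(p1, p2, b1, b2):
    """Independent selection: only survey 1, only survey 2, or both."""
    only_1 = p1 * (1 - p2)
    only_2 = (1 - p1) * p2
    both = p1 * p2
    return only_1 * b1 + only_2 * b2 + both * (b1 + b2)


def expected_burden_no_overlap(p1, p2, b1, b2):
    """Coordinated selection: a business is never in both samples (needs p1 + p2 <= 1)."""
    return p1 * b1 + p2 * b2


# Hypothetical inclusion probabilities and burdens (e.g. minutes per questionnaire).
p1, p2, b1, b2 = 0.2, 0.3, 30.0, 45.0
print(expected_burden_overlap(p1, p2, b1, b2))
print(expected_burden_no_overlap(p1, p2, b1, b2))
```

Rotation changes who carries the burden and how it is distributed, not its expected size, which is the point made for the business world as a whole.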
Sample rotation Response Burden
• For the business world response burden is unaffected by sample rotation
• Sample/population size decides the room for rotation
• Actual fairness is unrealistic
No of employees     0-4      5-9     10-19   20-49   50-99   100+
Sample no 1         0        1286    1135    1256    1124    1635
Sample no 2         0        400     947     2094    2236    1982
Sample no 3         0        948     1509    838     1124    1239
Sample no 4         3858     1606    3787    4188    2236    1982
Sample no 5         71969    12843   15140   8368    2236    1982
Sample no 6         0        386     758     1256    1124    1982
Total sampled       75828    17469   23276   17999   10081   10802
Population size     349104   25683   15140   8368    2236    1982
Sample rotation Perceived Response Burden
• Compared to those not sampled, fairness is a weak argument
• For most businesses response burden appears slightly more evenly distributed
No of employees     0-4      5-9     10-19   20-49   50-99    100+
Sample no 4         3858     1606    3787    4188    2236     1982
Sample no 5         71969    12843   15140   8368    2236     1982
Total sampled       75828    17469   23276   17999   10081    10802
Population size     349104   25683   15140   8368    2236     1982
Neither 4 nor 5     78 %     44 %    -25 %   -50 %   -100 %   -100 %
Both 4 and 5        0.23 %   3 %     25 %    50 %    100 %    100 %
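The last two rows of the table can be reproduced from the sample and population sizes. The formulas below are inferred from the numbers and are not stated in the source: "Neither 4 nor 5" matches 1 - p4 - p5 (the share left unsampled if the two samples do not overlap, negative when overlap is unavoidable), and "Both 4 and 5" matches p4 · p5 (independent selection). Variable and function names are hypothetical.

```python
size_classes = ["0-4", "5-9", "10-19", "20-49", "50-99", "100+"]
sample_4 = [3858, 1606, 3787, 4188, 2236, 1982]
sample_5 = [71969, 12843, 15140, 8368, 2236, 1982]
population = [349104, 25683, 15140, 8368, 2236, 1982]


def neither_share(n_a, n_b, pop):
    """Per cent left unsampled without overlap; negative means overlap is unavoidable."""
    return 100 * (1 - (n_a + n_b) / pop)


def both_share(n_a, n_b, pop):
    """Per cent expected in both samples under independent selection."""
    return 100 * (n_a / pop) * (n_b / pop)


neither = [neither_share(a, b, n) for a, b, n in zip(sample_4, sample_5, population)]
both = [both_share(a, b, n) for a, b, n in zip(sample_4, sample_5, population)]

for label, x, y in zip(size_classes, neither, both):
    print(f"{label:>6}: neither {x:7.1f} %, both {y:6.2f} %")
```

Among the smallest businesses almost nobody is hit twice, so a fairness argument has little to offer them, while among the largest overlap cannot be avoided at all.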
Sample rotation Perceived Response Burden
• Compared to those not sampled, fairness is a weak argument
• For most businesses response burden appears slightly more evenly distributed
• Expected survey holidays are probably the most convincing argument
No of employees                    0-4      5-9     10-19   20-49   50-99   100+
Sample no 5                        71969    12843   15140   8368    2236    1982
Sample no 6                        0        386     758     1256    1124    1982
Total sampled                      75828    17469   23276   17999   10081   10802
Population size                    349104   25683   15140   8368    2236    1982
Available for rotation             273276   8214
To be rotated                      37914    8734
Survey holiday calculation         7
Conditional holiday calculation    9        2       0
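The rotation rows for the two smallest size classes can be reproduced as follows. This is a reading of the numbers rather than a documented method: "Available for rotation" equals population size minus total sampled, "To be rotated" equals half of the total sampled, and the survey holiday of 7 matches the ratio of the two. The rotation share of 0.5 and the function name are assumptions. The conditional holiday figures are not reproduced, since the slide does not show how they were derived.

```python
def survey_holiday(population_size, total_sampled, rotation_share=0.5):
    """Return (available for rotation, to be rotated, holiday ratio).

    rotation_share = 0.5 is an assumption inferred from the slide's figures.
    """
    available = population_size - total_sampled
    to_rotate = total_sampled * rotation_share
    return available, to_rotate, available / to_rotate


# Size class 0-4 employees and 5-9 employees, figures from the table.
for label, pop, sampled in [("0-4", 349104, 75828), ("5-9", 25683, 17469)]:
    available, to_rotate, holiday = survey_holiday(pop, sampled)
    print(label, available, to_rotate, round(holiday, 1))
```

Only where the unsampled pool is several times larger than the number to be rotated can a rotated-out business expect a real break from surveys.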
“A design that is robust to nonsampling errors will often be better than a highly optimized design that cannot be realized in practice”
Paul Smith’s summary of chapter 5