Post on 31-Oct-2020
Grid sampling for a mixed-mode human survey and adjustment for non-response
Seppo Laaksonen¹
¹ University of Helsinki, e-mail: Seppo.Laaksonen@Helsinki.Fi
Acknowledgements: The study is a methodological part of an ongoing project initiated by Professors Mari Vaattovaara and Matti Kortteinen of the University of Helsinki. I also thank Henrik Lönnqvist and Teemu Kempainen, who are also working on the project.
NTTS 2013 _ Seppo Laaksonen 1
Type of area | Number of grids | Population aged 25-74 years
Stratum of 'poor' grids | 1058 | 232416
Stratum of 'rich' grids | 1187 | 70382
Municipality strata without confidentiality exclusion | 5020 | 390142
Excluded due to confidentiality from the grid-based sample but not from the municipality sampling | 1616 | 6785
All | 8881 | 699725
Table 1. Statistics of grids in which one or more adults live
Figure 1. Grids of 'rich' people (red; median income > 73206) vs. 'poor' people (blue; median income < 32092) in the municipalities of the survey. The remaining grids lie between these two groups or are empty of people.
Stratum | Poor grids ('poor, h') | Rich grids ('rich, h') | Municipality ('all, h') | Total | Population aged 25-74
Helsinki, most urbanised southern area | 110 | 46 | 1000 | 1156 | 27465
Helsinki, most urbanised northern area | 1142 | 8 | 1000 | 2150 | 40206
Helsinki, suburb | 2501 | 1324 | 2500 | 6325 | 147098
Espoo and Kauniainen | 546 | 3127 | 2000 | 5673 | 131840
Hyvinkää | 248 | 64 | 600 | 912 | 24944
Järvenpää | 115 | 38 | 600 | 753 | 21717
Kerava | 124 | 48 | 600 | 772 | 18874
Kirkkonummi | 89 | 173 | 600 | 862 | 20065
Lahti | 0 | 0 | 1000 | 1000 | 57059
Lohja | 0 | 0 | 600 | 600 | 22613
Mäntsälä and Pornainen | 49 | 22 | 600 | 671 | 13850
Nurmijärvi | 85 | 120 | 600 | 805 | 21924
Sipoo | 48 | 134 | 600 | 782 | 10269
Tuusula | 118 | 201 | 600 | 919 | 20948
Vantaa | 746 | 574 | 1500 | 2820 | 104930
Vihti | 81 | 121 | 600 | 802 | 15923
All | 6000 | 6000 | 15000 | 27000 | 699725
Table 2. Allocation of the gross sample to strata. The group 'Others' in the above scheme equals the municipality gross sample size.
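Inclusion probabilities
Single municipality strata:
$$\pi_{hk} = \frac{n_h}{N_h}$$
Strata with grid sampling, and thus with post-strata: for persons $k$ in the 'rich' grids (and similarly for the 'poor' grids and the 'all' grids)
$$\pi_{hk} = \frac{n_{h,\mathrm{rich}}}{N_{h,\mathrm{rich}}},$$
in which
$$N_{h,\mathrm{all}} = N_h - (N_{h,\mathrm{poor}} + N_{h,\mathrm{rich}}).$$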
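A minimal Python sketch of these inclusion probabilities and the corresponding design weights $w_k = 1/\pi_{hk}$, assuming equal-probability sampling without replacement within each stratum or post-stratum. The function names are illustrative, not from the study. The Lahti and Lohja figures are the gross sample sizes and populations from Table 2, where no grid sampling was done.

```python
def inclusion_probability(n_h: int, N_h: int) -> float:
    """pi_hk = n_h / N_h for every person k in stratum (or post-stratum) h."""
    if not 0 < n_h <= N_h:
        raise ValueError("expected 0 < n_h <= N_h")
    return n_h / N_h


def design_weight(n_h: int, N_h: int) -> float:
    """Design weight w_k = 1 / pi_hk = N_h / n_h."""
    return 1.0 / inclusion_probability(n_h, N_h)


def all_grid_population(N_h: int, N_poor: int, N_rich: int) -> int:
    """Population of the 'all' post-stratum: N_h,all = N_h - (N_h,poor + N_h,rich)."""
    N_all = N_h - (N_poor + N_rich)
    if N_all < 0:
        raise ValueError("poor and rich grids exceed the stratum population")
    return N_all


# Single municipality strata from Table 2: (name, gross sample n_h, population N_h aged 25-74)
for name, n_h, N_h in [("Lahti", 1000, 57059), ("Lohja", 600, 22613)]:
    print(f"{name}: pi = {inclusion_probability(n_h, N_h):.4f}, w = {design_weight(n_h, N_h):.2f}")
```

For Lahti this gives a gross design weight of 57059/1000, about 57.06, for every sampled person; in a grid stratum the same functions would be applied separately to each of the 'poor', 'rich' and 'all' post-strata.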
Statistic | Grid part, gross | Grid part, net | Municipality part, gross | Municipality part, net
Observations | 12000 | 4387 | 15000 | 5231
Population | 302357 | 302357 | 397368 | 397368
Mean | 25.8 | 70.6 | 27.1 | 77.8
Minimum | 8.3 | 18.2 | 13.1 | 39.0
Maximum | 45.6 | 164.2 | 57.1 | 167.8
CV (%) | 54.6 | 61.4 | 36.4 | 39.9
Table 3. Some statistics of the gross and net sample design weights
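Summary statistics of the kind shown in Table 3 can be computed from a list of weights along the following lines. This is a sketch: the table does not say whether its CV uses the population or the sample standard deviation, so the use of `statistics.pstdev` here is an assumption, and `weight_summary` is a hypothetical helper name.

```python
import statistics
from typing import Dict, Sequence


def weight_summary(weights: Sequence[float]) -> Dict[str, float]:
    """Count, sum ('population'), mean, minimum, maximum and CV (%) of survey weights.

    ASSUMPTION: CV = population standard deviation / mean * 100.
    """
    if len(weights) == 0:
        raise ValueError("no weights given")
    mean = statistics.fmean(weights)
    return {
        "observations": len(weights),
        "population": sum(weights),
        "mean": mean,
        "minimum": min(weights),
        "maximum": max(weights),
        "cv_percent": 100.0 * statistics.pstdev(weights) / mean,
    }
```

For the gross grid part, for example, `observations` would be 12000 and `population` (the sum of the weights) 302357.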
Our strategy for the weight adjustments is as follows:
(i) We take the initial weights $w_k$ and divide them by the estimated response probabilities (also called response propensities) $p_k$ of each respondent, obtained from the probit model.
(ii) Before going further, it is good to check that the probabilities $p_k$ are realistic, for instance that none of them is too small. Naturally, all probabilities are below one.
(iii) Since the sums of the weights from step (i) do not match the known population statistics by strata $h$ or by post-strata 'rich, h', 'poor, h' or 'all, h', they should be calibrated so that the sums equal the sums of the initial weights in each stratum. This is done by multiplying the weights from step (i) by the ratio
$$q_h = \frac{\sum_{k \in h} w_k}{\sum_{k \in h} w_k / p_k},$$
in which $h$ may refer to post-strata as well.
(iv) It is also good to check these weights against basic statistics, for example those presented in Table 3. If the weights are not plausible, the model should be revised.
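Steps (i) to (iii) can be sketched in Python as follows, assuming the response propensities $p_k$ have already been predicted from the probit model. The function name, the stratum labels and the lower bound `min_p` used for the plausibility check in step (ii) are illustrative assumptions, not values from the study.

```python
from collections import defaultdict
from typing import Dict, List, Sequence


def adjust_weights(strata: Sequence[str], w: Sequence[float], p: Sequence[float],
                   min_p: float = 0.01) -> List[float]:
    """Propensity-based non-response adjustment with calibration per (post-)stratum.

    (i)   divide each respondent's initial weight w_k by its propensity p_k;
    (ii)  reject implausible propensities (min_p is an ASSUMED threshold);
    (iii) multiply by q_h = sum_{k in h} w_k / sum_{k in h} (w_k / p_k),
          so the adjusted weights sum to the initial weights in each h.
    """
    if not (len(strata) == len(w) == len(p)):
        raise ValueError("strata, w and p must have equal length")
    for pk in p:
        if not min_p <= pk <= 1.0:
            raise ValueError(f"implausible response propensity: {pk}")
    raw = [wk / pk for wk, pk in zip(w, p)]                # step (i)
    sum_w: Dict[str, float] = defaultdict(float)
    sum_raw: Dict[str, float] = defaultdict(float)
    for h, wk, rk in zip(strata, w, raw):
        sum_w[h] += wk
        sum_raw[h] += rk
    q = {h: sum_w[h] / sum_raw[h] for h in sum_w}          # step (iii): ratio q_h
    return [q[h] * rk for h, rk in zip(strata, raw)]
```

Within each (post-)stratum the adjustment only redistributes weight: respondents with low propensities receive larger weights, while the stratum total stays equal to the sum of the initial weights, as step (iii) requires.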
Auxiliary variable (reference category) | Category | Probit estimate | Standard error | p-value
Type of grid (ref = Rich) | Intermediate | -0.064 | 0.006 | <.0001
Type of grid | Poor | -0.148 | 0.006 | <.0001
Gender (ref = Female) | Male | -0.292 | 0.003 | <.0001
Gender | Female | 0.000 | - | -
Age group (ref = 65-74 years) | 25-34 | -0.618 | 0.006 | <.0001
Age group | 35-44 | -0.575 | 0.006 | <.0001
Age group | 45-54 | -0.439 | 0.006 | <.0001
Age group | 55-64 | -0.161 | 0.005 | <.0001
Mother tongue (ref = Swedish) | Finnish | -0.009 | 0.007 | 0.208 (not significant)
Mother tongue | Swedish | 0.000 | - | -
Number of people (ref = 6+) | 1 | 0.179 | 0.013 | <.0001
Number of people | 2 | 0.359 | 0.013 | <.0001
Number of people | 3 | 0.272 | 0.013 | <.0001
Number of people | 4 | 0.289 | 0.013 | <.0001
Number of people | 5 | 0.216 | 0.014 | <.0001
Moved to the current house (ref = After 2006) | Before 1995 | 0.013 | 0.004 | 0.0008
Moved to the current house | 1995-2006 | -0.049 | 0.005 | <.0001
Moved to the current house | After 2006 | 0.000 | - | -
Current and previous living area (ref = Moved within the same zip code area) | Moved to southern Finland | 0.019 | 0.007 | 0.0113
Current and previous living area | Moved within southern Finland | 0.032 | 0.004 | <.0001
Labour market status (ref = Not unemployed) | Unemployed | -0.051 | 0.008 | <.0001
Table 4. Results of the response propensity modelling by probit regression
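To read Table 4, recall that a probit model predicts a response propensity $p_k = \Phi(x_k^\top \beta)$, where $\Phi$ is the standard normal distribution function. The sketch below combines the gender and age estimates of Table 4 (main effects only, without any gender-by-age interaction). The intercept is not reported in the table, so `HYPOTHETICAL_INTERCEPT` is a placeholder chosen only for illustration: the absolute propensities printed are therefore not those of the study, although the ordering of the groups does not depend on the intercept.

```python
from statistics import NormalDist

# Probit estimates from Table 4, relative to the reference categories
# (Female, 65-74 years), whose estimate is 0.
GENDER = {"Female": 0.0, "Male": -0.292}
AGE = {"25-34": -0.618, "35-44": -0.575, "45-54": -0.439, "55-64": -0.161, "65-74": 0.0}

# HYPOTHETICAL: the intercept is not given in Table 4; 0.0 is a placeholder.
HYPOTHETICAL_INTERCEPT = 0.0


def response_propensity(gender: str, age: str,
                        intercept: float = HYPOTHETICAL_INTERCEPT) -> float:
    """p_k = Phi(intercept + beta_gender + beta_age); other covariates at their reference categories."""
    linear_predictor = intercept + GENDER[gender] + AGE[age]
    return NormalDist().cdf(linear_predictor)


for gender in GENDER:
    for age in AGE:
        print(gender, age, round(response_propensity(gender, age), 3))
```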
Interaction by gender in the probit model: females of all age groups participate more. The relationship is fairly linear from age 35 onwards.
[Figure: probit estimates by age group (25-34, 35-44, 45-54, 55-64, 65-74), separately for males and females]
[Figure: probit estimates by income class: Highest, 2nd highest, 3rd highest, 3rd lowest, 2nd lowest, Lowest]
Probit estimates by income (earnings plus capital income): the relationship is fairly linear. Note that we could not get education data; income replaces it.
[Figure: probit estimates by current vs. previous house size: current house smaller, about the same size as earlier, larger]
Probit estimates by current and previous house size: if a person has moved to a larger house, the response propensity is higher; after a move to a smaller house, people are less motivated to participate.
[Figure 2 plot: cumulative percentage of respondents (vertical axis, 0-100) against response propensity in % (horizontal axis, 0-80), one curve for web and one for paper respondents]
Figure 2. Example of the cumulative response propensities for respondents via 'web' and via 'paper', respectively. Propensities are lower for web respondents. Nevertheless, a web option is good for survey participation, and more effort is needed to motivate people to use the web.
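Curves like those in Figure 2 are empirical cumulative distribution functions of the predicted propensities. A minimal sketch, assuming the propensities are expressed in per cent; the function name and the example values are hypothetical.

```python
from bisect import bisect_right
from typing import List, Sequence, Tuple


def cumulative_percent(propensities: Sequence[float],
                       points: Sequence[float]) -> List[Tuple[float, float]]:
    """For each propensity value x (%), the share (%) of respondents with propensity <= x."""
    values = sorted(propensities)
    if not values:
        raise ValueError("no propensities given")
    n = len(values)
    return [(x, 100.0 * bisect_right(values, x) / n) for x in points]


# HYPOTHETICAL propensities (%) for a handful of web and paper respondents
web = [12.0, 25.0, 30.0, 41.0]
paper = [20.0, 35.0, 50.0, 62.0]
points = [0, 20, 40, 60, 80]
print("web:  ", cumulative_percent(web, points))
print("paper:", cumulative_percent(paper, points))
```

A curve that rises earlier, as the web curve does here, indicates that a larger share of that group has low propensities.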
Statistic | Grid part, gross | Grid part, net | Municipality part, gross | Municipality part, net | Adjusted weights for all, net
Observations | 12000 | 4387 | 15000 | 5231 | 9618
Population | 302357 | 302357 | 397368 | 397368 | 699725
Mean | 25.8 | 70.6 | 27.1 | 77.8 | 72.8
Minimum | 8.3 | 18.2 | 13.1 | 39.0 | 12.4
Maximum | 45.6 | 164.2 | 57.1 | 167.8 | 754.3
CV (%) | 54.6 | 61.4 | 36.4 | 39.9 | 67.8
As expected, the adjustment leads to new weights with slightly higher variation.
Results on people's opinions of their living area by type of grid: means, with standard errors in parentheses. The indicators are scaled so that 0 = lowest and 100 = highest.
Indicator | Intermediate grids | Poor grids | Rich grids
General assessment | 74.6 (0.48) | 62.3 (0.55) | 83.3 (0.44)
Quality of environment | 74.5 (0.31) | 65.4 (0.37) | 79.6 (0.32)
Quality of housing conditions | 72.7 (0.34) | 65.4 (0.36) | 77.4 (0.33)
Quality of services | 68.9 (0.37) | 73.2 (0.37) | 68.8 (0.42)
Assessment of living area | 74.8 (0.35) | 67.6 (0.40) | 80.1 (0.31)
Amount of problems | 44.2 (0.60) | 66.7 (0.58) | 34.9 (0.58)
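The slides do not spell out the estimator behind these means. A common choice, assumed here, is the weighted (Hájek-type) mean $\bar{y}_d = \sum_{k \in d} w_k y_k / \sum_{k \in d} w_k$ within each grid type $d$, using the adjusted weights. The sketch below computes it; the function name and the example data are hypothetical, and standard errors are not computed.

```python
from collections import defaultdict
from typing import Dict, Sequence


def domain_means(y: Sequence[float], w: Sequence[float],
                 domain: Sequence[str]) -> Dict[str, float]:
    """Weighted mean of an indicator y (0-100 scale) within each domain, e.g. grid type."""
    if not (len(y) == len(w) == len(domain)):
        raise ValueError("y, w and domain must have equal length")
    numerator: Dict[str, float] = defaultdict(float)
    denominator: Dict[str, float] = defaultdict(float)
    for yk, wk, d in zip(y, w, domain):
        numerator[d] += wk * yk
        denominator[d] += wk
    return {d: numerator[d] / denominator[d] for d in numerator}


# HYPOTHETICAL respondents: indicator values, adjusted weights and grid types
print(domain_means([100.0, 50.0, 80.0], [1.0, 3.0, 2.0], ["Poor", "Poor", "Rich"]))
```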
Conclusion: Administrative areas and postal zip code areas are not ideal when designing and analysing survey data. Grids offer a flexible tool, since in principle they can be of any size, but confidentiality must be carefully taken into account. Results based on small grids are also more interesting than those based on ordinary areas. Basically, people living in a small grid know each other, which is not true of administrative and similar areas.