Survey Sampling Methodology in Social Research
The Baltic-Nordic-Ukrainian Summer School on Survey Statistics
Kyiv, August 23-27, 2009
Volodymyr Sarioglo
Institute for Demography and Social Research, Kyiv, Ukraine
Objectives of surveys in social research
Social studies have covered a wide range of topics from research into living conditions, transport, services, health, to studies on smoking, alcohol consumption and fertility.
Main household (population) sample surveys in Ukraine
Three major continuous surveys are carried out by the State Statistics Committee of Ukraine (SSCU):
- the Survey of Economic Activity of Population (SEAP or LFS), since 1999;
- the Household Living Conditions Survey (HLCS), since 1999;
- the Household Agricultural Activity Survey (HAAS), since 2000.
These surveys have been organized as integrated surveys: SEAP and HLCS since 1999; SEAP, HLCS and HAAS since 2004.
Demographic and Health Survey (2007).
Labour migration survey (first in 2008).
Municipal level: survey of ability and willingness to pay for water and heating consumption.
State regulations on quality reporting
The State Statistics Committee of Ukraine has adopted the Methodological Recommendations on standard quality reporting on the results of state household (population) sample surveys (March 2008).
The Strategy for the development of state statistics for 2009-2012, which is currently at the approval stage, provides for a number of actions to improve the quality of statistics, first of all relevance, accessibility, clarity and quality reporting.
The basic quality measures
- Relevance;
- Accuracy;
- Timeliness and punctuality;
- Accessibility and clarity;
- Comparability of statistics;
- Coherence.
The results of population sample surveys are considered to be of sufficient quality when they are reliable data on the characteristics of the target population, obtained on schedule, that satisfy users' needs as completely as possible, are consistent with available data from other sources, and are accessible for use.
Key quality indicators for quality reporting
- A description of how well the survey results conform to the needs of the main users (K1);
- Unit response rates for the main groups of units (K2);
- Item response rates for key indicators (K3);
- Imputation rates for key indicators (K4);
- Coefficients of variation for key indicators (K5);
- Length of time between the end of the reference period and the date when the final results become available (K6);
- The number of disseminated publications, in printed and electronic format (K7);
- A description of the differences in definitions, classifications and methodology between the results of a given sample survey and national and international standards (K8);
- The share of statistics that are used as sources for other statistical products (K9);
- Comparison of survey results with existing data from other sources (K10).
Principles of quality assurance
- The system of quality indicators is defined, and methodological tools for adequate measurement of quality are developed;
- Survey organization, the development of data processing procedures and the estimation of indicators are carried out with a view to ensuring the maximum quality of survey results. This is achieved, first of all, through effective use of auxiliary information at the main stages of the survey. All opportunities for raising quality are used, and quality is monitored constantly;
- Information about the quality of survey results is given to users in the fullest and most convenient form.
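Main stages of the survey process
[Diagram: flow chart of the survey process with the stages Research Objectives; Concepts and Population; Data Collection Mode; Questionnaire Design and Sampling Design; Data Collection; Data Processing; Estimation; Analysis; with a feedback arrow labelled "revise".]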
Sampling design (1)
- Probability sampling: each household (person) in the population has a known non-zero probability of being selected;
- Stratified: the population is stratified by geographical zone, residence area, size of settlement and so on;
- Multistage: several steps of unit selection are carried out in each stratum.
Sampling design (2)
The sample design for state household surveys in Ukraine is based on a procedure of stratified multistage selection.
The sample design procedure comprises the following steps:
1) exclusion of territories which cannot be surveyed;
2) exclusion of the population which is not subject to the survey;
3) stratification of the sampling frame;
4) selection of primary sampling units (PSUs);
5) selection of secondary sampling units (SSUs), for urban areas only;
6) selection of households.
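Steps 4 and 6 can be illustrated with a minimal sketch for a single stratum. The slides do not state the selection method, so the sketch assumes systematic selection of PSUs with probability proportional to size from an ordered list (for example, a list in serpentine order) and simple random sampling of a fixed number of households per PSU. The function `systematic_pps`, the PSU names and all sizes are hypothetical.

```python
import random

def systematic_pps(units, n, rng):
    """Systematic selection of n units with probability proportional to size.

    units: ordered list of (unit_id, size) pairs. The sketch assumes every
    size is smaller than the sampling step, so no unit is selected twice.
    Returns a list of (unit_id, inclusion_probability).
    """
    total = sum(size for _, size in units)
    step = total / n
    start = rng.uniform(0, step)
    points = [start + k * step for k in range(n)]
    selected, cumulative, j = [], 0.0, 0
    for unit_id, size in units:
        cumulative += size
        while j < n and points[j] < cumulative:
            selected.append((unit_id, n * size / total))
            j += 1
    return selected

rng = random.Random(2009)  # fixed seed so the sketch is reproducible

# Hypothetical stratum: PSUs with their numbers of households.
stratum = [("psu_a", 120), ("psu_b", 80), ("psu_c", 150),
           ("psu_d", 100), ("psu_e", 90), ("psu_f", 60)]
sizes = dict(stratum)
households_per_psu = 10

design = []
for psu_id, p_psu in systematic_pps(stratum, n=2, rng=rng):
    # Simple random sample of households within the selected PSU.
    households = rng.sample(range(sizes[psu_id]), households_per_psu)
    p_household = households_per_psu / sizes[psu_id]
    base_weight = 1.0 / (p_psu * p_household)
    design.append((psu_id, households, base_weight))

for psu_id, households, base_weight in design:
    print(psu_id, len(households), round(base_weight, 2))
```

Under these assumptions the design is self-weighting within the stratum: every selected household gets the same base weight, equal to the stratum size divided by the number of sampled households.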
Scheme of sample design for state household sample surveys (1)
Scheme of sample design for state household sample surveys (2)
The scheme of geographical serpentine for urban SSU and rural PSU selection
Retaining PSUs in the new sample
In designing the territorial sample for the SEAP in 2008-2013, a certain share of the rural PSUs selected for 2003-2008 was retained. The main advantages are:
- retaining the trained interviewers, their established working contacts with the staff of regional and central offices, and the existing information base;
- more reliable estimates of change for surveys that are conducted on a continuous basis;
- a lower probability of breaks (inconsistencies) in time series caused by changes of territories and of interviewer staff.
Rotation of households in the sample
Applying the new household rotation scheme in the labour force survey makes it possible:
- to obtain more reliable indicators by conducting interviews monthly instead of quarterly, even when the quarterly sample is allocated across all months of the quarter;
- to measure annual changes in employment and unemployment indicators more reliably and to compare corresponding periods (months and quarters) of two consecutive years;
- to study trends in these changes correctly, comparing them with corresponding data from alternative sources.
Main factors that make it necessary to develop survey designs
- changes in the needs of the main users;
- changes in the phenomena and processes being measured;
- development of the methodology of survey conduct, data processing and indicator estimation;
- the need to harmonize sample designs, interviewers' work schedules and quality control systems for integrated surveys.
Principles of sample size determination for continuous surveys
- determining the main indicators for which reliable estimation must be ensured in each survey;
- determining and analysing in detail the reliability of the main indicator estimates in the current surveys;
- determining the new sample size and allocation so as to provide the required level of reliability for the main indicator estimates.
Optimization of the sampling design can be understood as estimation with the highest possible precision over the range of sampling designs that can be afforded with the available resources.
HLCS sample allocation by region
[Figure: bar chart of HLCS sample size by region, from Crimea to Sevastopol city; vertical axis: sample size (0 to 1400); series: sample size in HLCS 2004-2008 and sample size in HLCS 2009-2013.]
Reliability of indicator estimate
To define the statistical reliability of an indicator estimate in general, taking both variance and bias into account, the mean squared error (MSE) is used:
$$MSE(\hat{\theta}; \theta) = V(\hat{\theta}) + B^2(\hat{\theta}; \theta)$$
The size of the MSE is usually taken as the measure of reliability: for a target indicator $\theta$, the estimate $\hat{\theta}^{(1)}$ is considered more reliable than $\hat{\theta}^{(2)}$ if
$$MSE(\hat{\theta}^{(1)}; \theta) < MSE(\hat{\theta}^{(2)}; \theta).$$
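As a small numerical illustration of this criterion (MSE as variance plus squared bias), the sketch below compares two estimates of the same indicator. The variances and biases are made-up numbers, and the helper `mse` is a hypothetical name.

```python
def mse(variance, bias):
    """Mean squared error as variance plus squared bias."""
    return variance + bias ** 2

# Hypothetical estimates: the first has a larger variance but a smaller bias.
mse_1 = mse(variance=4.0, bias=0.5)
mse_2 = mse(variance=2.5, bias=1.5)

more_reliable = "estimate 1" if mse_1 < mse_2 else "estimate 2"
print(mse_1, mse_2, more_reliable)
```

The example shows why variance alone is not enough: the second estimate has the smaller variance, yet its bias makes it the less reliable of the two.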
The level of reliability (example)
For reliable estimation of indicators from the results of a state household sample survey, the sample design must ensure that the coefficient of variation of the main indicator estimates based on annual data does not exceed 5% at the national level and 10% at the regional level.
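To get a feeling for what the 5% and 10% thresholds imply, the sketch below computes the sample size needed for an estimated proportion. It is only an approximation under stated assumptions: the simple random sampling variance p(1-p)/n is inflated by an assumed design effect, and the finite population correction is ignored. The rate of 25% and the design effect of 2 are hypothetical, as are the function names.

```python
import math

def cv_of_proportion(p, n, deff=1.0):
    """Approximate coefficient of variation of an estimated proportion."""
    return math.sqrt(deff * (1.0 - p) / (n * p))

def required_sample_size(p, target_cv, deff=1.0):
    """Smallest n for which the approximate CV does not exceed target_cv."""
    n = deff * (1.0 - p) / (p * target_cv ** 2)
    return math.ceil(round(n, 6))  # rounding guards against float noise

# Hypothetical indicator: a rate of 25% and an assumed design effect of 2.
n_national = required_sample_size(p=0.25, target_cv=0.05, deff=2.0)
n_regional = required_sample_size(p=0.25, target_cv=0.10, deff=2.0)
print(n_national, n_regional)
```

Halving the target CV multiplies the required sample size by four, which is why a regional threshold of 10% is much cheaper to meet per region than the national threshold of 5% would be.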
Indicator estimation by survey results
Direct estimation:
$$\hat{Y}_{DIR} = \hat{Y}_{HT} + \hat{\beta} \left( X - \hat{X}_{HT} \right)$$
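A minimal sketch of a direct estimator of this regression type follows. It assumes a single auxiliary variable with a known population total, design weights equal to inverse inclusion probabilities, and a weighted least squares slope as the coefficient; the slides do not say how the coefficient is computed. All data and function names are hypothetical.

```python
def ht_total(values, weights):
    """Weighted (Horvitz-Thompson type) estimate of a population total."""
    return sum(w * v for v, w in zip(values, weights))

def regression_estimate(y, x, weights, x_known_total):
    """Adjust the weighted total of y using the known total of x."""
    y_ht = ht_total(y, weights)
    x_ht = ht_total(x, weights)
    n_hat = sum(weights)  # estimated population size
    x_bar, y_bar = x_ht / n_hat, y_ht / n_hat
    sxy = sum(w * (xi - x_bar) * (yi - y_bar)
              for xi, yi, w in zip(x, y, weights))
    sxx = sum(w * (xi - x_bar) ** 2 for xi, w in zip(x, weights))
    beta = sxy / sxx  # weighted least squares slope (an assumption)
    return y_ht + beta * (x_known_total - x_ht), beta

# Hypothetical sample of four units, each with design weight 25.
y = [10.0, 20.0, 30.0, 40.0]
x = [1.0, 2.0, 3.0, 4.0]
weights = [25.0] * 4

y_dir, beta = regression_estimate(y, x, weights, x_known_total=260.0)
print(ht_total(y, weights), beta, y_dir)
```

Here the sample underestimates the known total of the auxiliary variable (250 against 260), so the estimate of the target total is shifted upwards in proportion to the slope.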
Survey data weighting
Statistical weights are calculated taking into account the following:
- probabilities of selection of households;
- actual levels of refusal of households and individual persons to participate in the survey;
- harmonization of survey results with demographic statistics on the size and sex/age structure of the population, etc.
The final weight for the r-th person is calculated as the product of the base weight of the household and the corresponding weight factors (reweighting factors):
$$w_r = w_{rB} \cdot k_{r1} \cdot \ldots \cdot k_{rn},$$
where $w_{rB}$ is the base weight of a household (person) and $k_{r1}, \ldots, k_{rn}$ are the weight factors.
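The product rule for the final weight can be sketched as follows. The selection probability, the response rate and the calibration factor are hypothetical numbers, and taking the non-response factor as the reciprocal of the response rate is a common simplification rather than the documented procedure of the surveys.

```python
from math import prod

def final_weight(base_weight, factors):
    """Final weight as the base weight multiplied by all weight factors."""
    return base_weight * prod(factors)

# Hypothetical household selected with probability 1/500.
base_weight = 1.0 / (1.0 / 500.0)

# Hypothetical factors: non-response adjustment for a response rate of 80%
# and a calibration factor aligning the sample with demographic statistics.
nonresponse_factor = 1.0 / 0.8
calibration_factor = 0.96

w_final = final_weight(base_weight, [nonresponse_factor, calibration_factor])
print(round(w_final, 2))
```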
Household response rates, HLCS 2000-2006
Household response rate indices, HLCS 2000-2006
[Figure: line chart of household response rate indices by year, 2000-2006; vertical axis: index (0.90 to 1.30); series: Ukraine, Cities, Towns, Rural area, Kyiv.]
Weight calibration equation (HLCS, 2004-2008)
[Equation: a system in the weights $w_i$ in which an objective function of the weights, summed over the $n$ sampled households, is minimised subject to 15 calibration constraints. The objective function and some indices are not legible in the source; the constraints have the form]
$$\sum_{i=1}^{n} m_i^{(k)} w_i = M^{(k)}, \quad k = 1, \ldots, 4;$$
$$\sum_{i=1}^{n} g_i^{(k)} w_i = G^{(k)}, \quad k = 1, \ldots, 4;$$
$$\sum_{i=1}^{n} d_i w_i = D;$$
$$\sum_{i=1}^{n^{(s)}} h_i^{(s)} w_i = N^{(s)}, \quad s \in \{tb, ts, rur\};$$
$$\sum_{i=1}^{n^{(s)}} j_i^{(s)} w_i = H^{(s)}, \quad s \in \{tb, ts, rur\}.$$
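The general idea of weight calibration can be sketched with linear calibration. This is an illustration under assumptions, not the HLCS procedure: it minimises the chi-square distance, the sum of (w_i - d_i)^2 / d_i, between initial weights d_i and calibrated weights w_i, subject to the calibrated totals of the auxiliary variables matching known totals. In that case the solution has the closed form w_i = d_i (1 + x_i' lambda). The data, the two constraints and all names are hypothetical.

```python
def solve(a, b):
    """Solve a small dense linear system by Gauss-Jordan elimination."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        # Partial pivoting: bring the largest entry of the column up.
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                f = m[r][col] / m[col][col]
                m[r] = [vr - f * vc for vr, vc in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]

def calibrate(d, x, totals):
    """Linear calibration of initial weights d to known totals.

    d: initial weights; x: one tuple of auxiliary values per unit;
    totals: known population totals of the auxiliary variables.
    """
    k = len(totals)
    a = [[sum(di * xi[p] * xi[q] for di, xi in zip(d, x))
          for q in range(k)] for p in range(k)]
    gap = [totals[p] - sum(di * xi[p] for di, xi in zip(d, x))
           for p in range(k)]
    lam = solve(a, gap)
    return [di * (1.0 + sum(l * v for l, v in zip(lam, xi)))
            for di, xi in zip(d, x)]

# Hypothetical sample: auxiliary values are (1, household size), so the
# constraints fix the number of households and the number of persons.
d = [100.0, 100.0, 100.0, 100.0]
x = [(1, 2), (1, 4), (1, 3), (1, 1)]
totals = [420.0, 1100.0]

w = calibrate(d, x, totals)
print([round(wi, 2) for wi in w])
print(sum(w), sum(wi * xi[1] for wi, xi in zip(w, x)))
```

With many constraints, as in the system of 15 equations on this slide, the same approach applies but the calibrated weights can become very small or very large, which is one reason to monitor the spread of the final weights.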
Quality of statistical weights system (SEAP)
Indicator                                1999   2000   2001   2002   2003   2004   2005   2006
Average weight, w_mean                    743    738    658    647    653    369    367    362
Minimal weight, w_min                     185    192    196    187    195     16     52     32
Maximal weight, w_max                    2433   3028   2752   2079   2665   4562   4774   6413
Coefficient of variation, CV(w), %      36.48  37.75  38.56  38.66  41.42  87.95  87.16  87.33
R1 = w_mean / w_min                      4.02   3.84   3.36   3.45   3.35  22.75   7.03  11.44
R2 = w_max / w_mean                      3.28   4.10   4.18   3.21   4.08  12.37  13.00  17.72
Range of variation of weights, ΔR        2248   2835   2556   1892   2470   4546   4722   6381
Quality of statistical weights system (HLCS)
Indicator                                1999   2000   2001   2002   2003   2004   2005   2006
Average weight, w_mean                   1921   1897   1889   1868   1812   1753   1694   1659
Minimal weight, w_min                     591    384    357     46     45     56     14      9
Maximal weight, w_max                    5585  10205   9456   7195  13817   7333   8357   8817
Coefficient of variation, CV(w), %      31.40  34.96  37.16  38.50  46.08  44.97  46.57  44.40
R1 = w_mean / w_min                      3.25   4.94   5.29  40.63  40.28  30.30 121.00 184.33
R2 = w_max / w_mean                      2.91   5.38   5.00   3.85   7.62   4.18   4.93   5.31
Range of variation of weights, ΔR        4994   9821   9099   7149  13772   7277   8343   8808
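The indicators in the SEAP and HLCS tables can be computed from a vector of final weights as in the sketch below. The weights are made-up numbers; the function name is hypothetical; and the use of the population standard deviation for the coefficient of variation is an assumption, since the slides do not say which variant is used.

```python
import statistics

def weight_quality(weights):
    """Summary indicators describing the spread of a set of weights."""
    w_mean = statistics.fmean(weights)
    w_min, w_max = min(weights), max(weights)
    return {
        "mean": w_mean,
        "min": w_min,
        "max": w_max,
        # Coefficient of variation in percent (population std, assumed).
        "cv": 100.0 * statistics.pstdev(weights) / w_mean,
        "r1": w_mean / w_min,   # average weight relative to the smallest
        "r2": w_max / w_mean,   # largest weight relative to the average
        "range": w_max - w_min,
    }

# Hypothetical final weights of five respondents.
quality = weight_quality([200, 400, 600, 800, 1000])
for name, value in quality.items():
    print(name, round(value, 2))
```

A sharp rise in R1 or R2 from one year to the next, as seen in both tables, points to a few units with extreme weights, which can inflate the variance of estimates.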
Change of variance and MSE on stages of the survey
When the stages under consideration are carried out consecutively, the combined effect of the different survey stages on the reliability of the results can be written as
$$seff(\hat{Y}) = deff(\hat{Y}) \cdot weff(\hat{Y}) \cdot ceff(\hat{Y}) \cdot meff(\hat{Y}) \cdot feff(\hat{Y})$$
Change of variance and MSE on stages of unemployment rate estimation in rural area of certain regions of Prichernomorsky economic district (LFS, 2004)
Estimates of poverty indicators
[Figure: bar chart of the poverty rate by region in 2007, from Crimea to Kyiv city; vertical axis: poverty rate (0 to 80); series: poverty rate (national line) and poverty rate (regional line).]
Reliability of estimation of the poverty rate by region
[Figure: bar chart of the CV of the poverty rate (national line) by region in 2007, from Crimea to Sevastopol city; vertical axis: CV (0.0 to 50.0).]
Comparison of surveys quality (1)
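Indicator   SEAP    HLCS    HAAS
K1          0.50    0.75    0.25
K2          0.75    0.75    0.25
K3          0.00    0.00    0.00
K4          0.00    0.25    0.00
K5          0.75    0.75    0.50
K6          0.50    0.75    0.25
K7          0.75    0.75    0.50
K8          0.75    0.75    0.25
K9          0.00    0.25    0.00
K10         0.00    0.25    0.00
I           0.400   0.525   0.200
Each indicator is scored $K_j \in \{0;\ 0.25;\ 0.50;\ 0.75;\ 1.0\}$ with equal weights $w_j = 0.1$, and the integrated index is
$$I = \sum_{j=1}^{10} K_j \cdot w_j$$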
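The integrated index I can be checked with a few lines of code: each of the ten indicator scores K1 to K10 is multiplied by its weight of 0.1 and the products are summed. The scores below are the ones given for SEAP, HLCS and HAAS on this slide; only the variable names are introduced here.

```python
# Scores K1..K10 for each survey, taken from the comparison table.
scores = {
    "SEAP": [0.50, 0.75, 0.00, 0.00, 0.75, 0.50, 0.75, 0.75, 0.00, 0.00],
    "HLCS": [0.75, 0.75, 0.00, 0.25, 0.75, 0.75, 0.75, 0.75, 0.25, 0.25],
    "HAAS": [0.25, 0.25, 0.00, 0.00, 0.50, 0.25, 0.50, 0.25, 0.00, 0.00],
}
indicator_weights = [0.1] * 10  # equal weights w_j = 0.1

integrated_index = {
    survey: sum(k * w for k, w in zip(values, indicator_weights))
    for survey, values in scores.items()
}

for survey, value in integrated_index.items():
    print(survey, f"{value:.3f}")
```

Unequal weights could be used in the same way to emphasise particular quality components, provided they still sum to one.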
Comparison of surveys quality (2)
Coherence of estimates with National accounts
[Figure: quarterly series, 2006Q1 to 2007Q2; vertical axis: monthly average per capita, UAH (0 to 900); series: total resources (excluding privileges and subsidies), HLCS; disposable income, SNA; wages, HLCS; wages (excluding taxes and obligatory payments), SNA.]
Some directions of future work
- Ensuring that key quality indicators are calculated in each state household survey.
- Development and implementation of standard quality reports for the three base surveys during 2009.
- Development of procedures for detailed measurement of the quality of survey results across all quality measures.
- Development and implementation of procedures for integrated quality estimation, mainly to compare the quality levels of different surveys and to reveal the critical quality aspects of each survey.
- Implementation of small area estimation procedures for estimating poverty and unemployment indicators at the regional level.
The end