The challenge of the sampling variance & A note on outlier treatment in EU-SILC
Tim Goedemé, PhD
Herman Deleeck Centre for Social Policy - University of Antwerp
Net-SILC 3
FAO
Rome, 2016-11-07
Introduction
• Indicators should be statistically sound and robust
• Most attention to non-sampling errors & validity
• Sampling errors are important too
• Statistics & samples are a powerful tool
- Need limited number of observations
- Point estimate and estimate of precision
However, without an estimate of its precision, a point estimate is pointless…
• … at least for evidence-based policy-making
1. Monitoring
• The indicators should be sufficiently precise to observe small changes
- Advantage of panel data
• Much interest in (small) subpopulations (vulnerable groups, regional statistics, various breakdowns)
1. Monitoring
“Worryingly, 2014 figures show that now 11.1 per cent of the inhabitants of Flanders live in poverty. Last year, this was only 10.8 per cent.”
Standard error of change in poverty is about 1 p.p.
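The quoted change can be checked with a quick calculation. The sketch below assumes a normal approximation and takes the standard error of about 1 p.p. from the slide; the function name `z_test_change` is illustrative only.

```python
from statistics import NormalDist

def z_test_change(estimate_before, estimate_after, se_change):
    """Two-sided z-test for a change between two point estimates.

    se_change is the standard error of the change itself (same units as
    the estimates), not the standard error of either single estimate.
    """
    change = estimate_after - estimate_before
    z = change / se_change
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    half_width = NormalDist().inv_cdf(0.975) * se_change
    return change, z, p_value, (change - half_width, change + half_width)

# Figures from the quotation: 10.8% -> 11.1%, standard error of change ~1 p.p.
change, z, p_value, ci = z_test_change(10.8, 11.1, 1.0)
print(f"change = {change:.1f} p.p., z = {z:.2f}, p = {p_value:.2f}")
print(f"95% CI of change: [{ci[0]:.2f}, {ci[1]:.2f}] p.p.")
```

The confidence interval of the change includes zero, so under these assumptions the data do not show that poverty increased.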
1. Monitoring
[Figure: 1/2 of 95% confidence interval of AROPE (total population) and AROP60 (children), EU-SILC 2011. Vertical axis from 0.0 to 4.0. Countries: PL DE SI ES IT SE FI UK CZ HU FR EE CY LV AT MT BE NL DK EL SK LU PT BG IE RO LT. Series: AROPE(tot), AROP60(-18).]
Source: Osier et al. 2013; EU-SILC 2011, own calculations
2. Computation
• Estimating the sampling variance requires:
- Good documentation of the sample (design)
- Access to high-quality microdata with sufficient information
- Adequate software, estimation methods, and expertise
2. Computation
[Figure: 95% confidence interval of % in severe material deprivation, BE, EU-SILC 2007. Vertical axis from 0 to 8. Categories: persons, households, full sample design.]
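Why the unit of analysis matters can be illustrated with toy data. The sketch below is not the estimation procedure behind the slide: it assumes a single stratum and treats clusters as drawn with replacement (the "ultimate cluster" approximation), and the function name and data are hypothetical. Passing person identifiers as clusters ignores clustering; passing household identifiers accounts for persons in a household sharing the same deprivation status. A full sample design would also need strata and primary sampling units, which are left out here.

```python
from collections import defaultdict
from math import sqrt

def se_weighted_proportion(y, weights, cluster_ids):
    """Linearised standard error of a weighted proportion.

    Assumes one stratum and clusters sampled with replacement.
    """
    total_weight = sum(weights)
    p = sum(w * v for w, v in zip(weights, y)) / total_weight
    # Linearised variable of the ratio estimator, summed within each cluster.
    cluster_totals = defaultdict(float)
    for v, w, c in zip(y, weights, cluster_ids):
        cluster_totals[c] += w * (v - p) / total_weight
    n = len(cluster_totals)
    variance = n / (n - 1) * sum(t * t for t in cluster_totals.values())
    return p, sqrt(variance)

# Toy data: 15 persons in 6 households; deprivation is shared within a household.
household = [1, 1, 1, 1, 2, 2, 3, 3, 3, 4, 5, 5, 5, 6, 6]
deprived = [1, 1, 1, 1, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0]
weights = [1.0] * len(household)
person = list(range(len(household)))

p, se_persons = se_weighted_proportion(deprived, weights, person)
_, se_households = se_weighted_proportion(deprived, weights, household)
print(f"p = {p:.3f}, SE (persons) = {se_persons:.3f}, SE (households) = {se_households:.3f}")
```

In this toy example the standard error that respects household clustering is clearly larger than the one that treats persons as independent draws.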
2. Computation
1. Standard error of difference is much smaller with consistent sample design (SD) variables.
2. Difference with 2011: the longer the time-span, the weaker the covariance (and the larger the standard error of the difference) will be
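The role of the covariance follows from Var(a - b) = Var(a) + Var(b) - 2 Cov(a, b). A minimal sketch with hypothetical numbers (the standard errors and correlations are assumptions, not EU-SILC results):

```python
from math import sqrt

def se_difference(se_a, se_b, covariance=0.0):
    """Standard error of (a - b): sqrt(Var(a) + Var(b) - 2 Cov(a, b))."""
    variance = se_a ** 2 + se_b ** 2 - 2 * covariance
    if variance < 0:
        raise ValueError("covariance is too large for the given standard errors")
    return sqrt(variance)

# Hypothetical: two yearly estimates, each with a standard error of 1.0 p.p.
se_a = se_b = 1.0
for correlation in (0.0, 0.3, 0.6):
    covariance = correlation * se_a * se_b
    print(correlation, round(se_difference(se_a, se_b, covariance), 2))
```

The stronger the positive covariance between the two estimates (for instance because the samples overlap), the smaller the standard error of their difference. Estimating that covariance requires sample design variables that are consistent across years.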
2. Computation
• So important issues:
- Sample design, weighting, imputation, characteristics of indicator
- Never simply compare confidence intervals
- Need of consistent sample design variables
- Need of stable and sound sample designs
- Need of appropriate estimation techniques
• Techniques such as small-area estimation could also help
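Why confidence intervals should not simply be compared: two 95% intervals can overlap while the difference between the estimates is still significant at the 5% level. A small sketch with hypothetical, independent estimates under a normal approximation:

```python
from math import sqrt
from statistics import NormalDist

Z95 = NormalDist().inv_cdf(0.975)

def ci95(estimate, se):
    """95% confidence interval under a normal approximation."""
    return estimate - Z95 * se, estimate + Z95 * se

def intervals_overlap(ci_a, ci_b):
    return ci_a[0] <= ci_b[1] and ci_b[0] <= ci_a[1]

# Hypothetical independent estimates (in %), each with a standard error of 1 p.p.
est_a, est_b, se_a, se_b = 10.0, 13.0, 1.0, 1.0
overlap = intervals_overlap(ci95(est_a, se_a), ci95(est_b, se_b))
z = (est_b - est_a) / sqrt(se_a ** 2 + se_b ** 2)  # covariance assumed to be zero
significant = abs(z) > Z95
print(f"intervals overlap: {overlap}, z = {z:.2f}, significant: {significant}")
```

With dependent estimates the gap between the two approaches can be even larger, because the covariance enters the test but not the separate intervals.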
3. Communication
• To policy-makers and politicians
• To the wider public
• For indicator databases such as RuLIS
• Improve awareness of both sampling and non-sampling errors
3. Communication
Informative value
Confidence interval > standard error
Standard error > degrees of freedom
Degrees of freedom > number of observations
3. Communication
• Develop tools to compute online the statistical significance of the difference between point estimates?
- Invisible database with microdata, OR
- Invisible database with all var-covar matrices and DFs
- Users simply have to indicate which two figures they would like to compare
4. A note on outlier treatment
• (Undesirable) influential value
• Wrong value
• Can strongly reduce sampling variance (e.g. of poverty gap)
4. A note on outlier treatment
• RuLIS:
- Automatic procedure (Normal / Lognormal distribution)
- Most disaggregate level, univariate checks
- Check on number of cases with corrected values
- No check of aggregates at individual level (e.g. total income)
- No check of other errors (not at extreme positions)?
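What an automatic univariate check under a lognormal assumption might look like is sketched below. This is not the RuLIS procedure, whose details are not given here: the rule (flag values whose log lies more than three standard deviations from the mean of the logs), the threshold and the data are all assumptions for illustration.

```python
from math import log
from statistics import mean, stdev

def flag_lognormal_outliers(values, threshold=3.0):
    """Return indices of positive values whose log lies more than
    `threshold` standard deviations from the mean of the logs.

    Zero and negative values are skipped (the log is undefined for them).
    Needs at least two positive values.
    """
    positive = [(i, log(v)) for i, v in enumerate(values) if v > 0]
    logs = [lv for _, lv in positive]
    centre, spread = mean(logs), stdev(logs)
    return [i for i, lv in positive if abs(lv - centre) > threshold * spread]

# Hypothetical incomes: 30 plausible values plus one extreme value at the end.
incomes = [1000 * 1.05 ** i for i in range(30)] + [5_000_000]
flagged = flag_lognormal_outliers(incomes)
share_flagged = len(flagged) / len(incomes)
print(flagged, f"{share_flagged:.1%} of cases flagged")
```

Tracking the share of flagged cases corresponds to the check on the number of corrected values. A rule like this only looks at one variable at a time and only at extreme positions, so it cannot detect errors in aggregates or errors in the middle of the distribution.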
4. A note on outlier treatment
• Strong variation in outlier detection and treatment
• More emphasis on trying to impute ‘correct’ value, e.g.:
- Logical deduction / simulation models
- Back to interviewers / data processing
- Comparison with administrative records
- Imputation based on multivariate regression models
- Comparison with previous waves (panel data)
• Corrections through calibration
4. A note on outlier treatment
• Previous EU-SILC study (Van Kerm, 2007):
- At aggregate level
- Winsorizing better than trimming
- Parametric modelling of tails promising approach
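The difference between trimming and winsorizing can be shown on a toy income vector. The cut-offs and data below are hypothetical (in practice they would typically come from quantiles of the distribution), and the sketch does not reproduce the cited study.

```python
def trim(values, lower, upper):
    """Drop observations outside [lower, upper]."""
    return [v for v in values if lower <= v <= upper]

def winsorize(values, lower, upper):
    """Keep every observation but pull extreme values back to the bounds."""
    return [min(max(v, lower), upper) for v in values]

incomes = [800, 950, 1100, 1200, 1300, 1500, 1700, 2000, 2400, 90000]
lower, upper = 900, 2400  # hypothetical cut-offs

trimmed = trim(incomes, lower, upper)
winsorized = winsorize(incomes, lower, upper)
print(len(trimmed), sum(trimmed) / len(trimmed))
print(len(winsorized), sum(winsorized) / len(winsorized))
```

Trimming shrinks the sample (and would require adjusting the weights), whereas winsorizing keeps all observations and only caps their values.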
Conclusion
• The sampling variance is an important challenge to indicators for evidence-based policy-making
• Increases awareness of both sampling and non-sampling errors
Conclusion
Outlier treatment
• No clear best practice in EU-SILC
• Check on aggregates at individual level (e.g. total income / consumption):
- Still undesirable outliers
- Accumulation of imputations?
• Sensitivity checks are key
3. Communication
• Ways to make database users aware of sampling error
- Add a note with a general warning / guideline (preferably most conservative estimate?)
- Only report 90/95% confidence intervals
- Include standard error (and DFs?) / confidence intervals
- Generalized variance function with indication of how standard error can be computed
(Options ordered from easy to difficult)
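How a generalized variance function could let users compute a standard error themselves is sketched below. The slide does not specify a functional form; the sketch assumes one commonly used form, relvariance = a + b / estimate, and the parameter values are hypothetical rather than published figures.

```python
from math import sqrt

def gvf_standard_error(estimate, a, b):
    """Standard error from a generalized variance function of the form
    relvariance = a + b / estimate.

    The parameters a and b would have to be published by the data producer.
    """
    relvariance = a + b / estimate
    if relvariance < 0:
        raise ValueError("GVF gives a negative relvariance for this estimate")
    return estimate * sqrt(relvariance)

# Hypothetical parameters and estimated total (e.g. number of persons in a group).
estimate = 1_200_000
se = gvf_standard_error(estimate, a=-0.00002, b=2500)
print(f"SE ~ {se:,.0f} ({se / estimate:.1%} of the estimate)")
```

A database would then only need to publish the parameters per indicator or domain, at the cost of an approximation compared with direct variance estimates.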