A Resampling Study of NASS Survey MPPS Sampling Strategy
By Stanley Weng
National Agricultural Statistics Service
U.S. Department of Agriculture
INTRODUCTION
• MPPS: Multivariate Probability Proportional to Size
• Addresses multiple, often competing, purposes (multiple targets) of a survey
• Used for the NASS Crops Survey (CS) and other surveys since 1999
MPPS
• Technically
Sample was selected using a Poisson method. Each farm i had a unique probability of selection, formed by
$$p_i = \min\{1,\ \max\{p_i^{(m)},\ m = 1, \ldots, M\}\}$$
where $p_i^{(m)}$ is the item $m$ selection probability, determined by
▪ auxiliary data with the assumption of the variance proportional to (a power of) the auxiliary variable value
▪ optimal allocation
▪ a desired item-level sample size
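The combination rule above can be sketched in a few lines. This is only an illustration: the slides do not give the form of the item-level probabilities, so the sketch assumes each one is a desired item sample size times the farm's share of a power of the auxiliary value. The function names, the `power` parameter, and the toy numbers are hypothetical.

```python
from typing import List, Sequence


def item_probabilities(aux: Sequence[float], n_target: float, power: float = 0.75) -> List[float]:
    """Assumed item-level probabilities: n_target * x_i**power / sum_j x_j**power.

    These may exceed 1; the cap is applied when items are combined.
    """
    sizes = [x ** power for x in aux]
    total = sum(sizes)
    if total == 0:
        return [0.0 for _ in aux]
    return [n_target * s / total for s in sizes]


def mpps_probabilities(item_probs: Sequence[Sequence[float]]) -> List[float]:
    """Combine item-level probabilities: p_i = min(1, max over items m of p_i^(m)).

    item_probs[m][i] is the selection probability of farm i for item m.
    """
    n_farms = len(item_probs[0])
    return [min(1.0, max(p[i] for p in item_probs)) for i in range(n_farms)]


# Made-up auxiliary data for four farms and two items.
corn = [100.0, 0.0, 400.0, 50.0]
wheat = [0.0, 900.0, 10.0, 90.0]
p_corn = item_probabilities(corn, n_target=2, power=1.0)
p_wheat = item_probabilities(wheat, n_target=2, power=1.0)
p = mpps_probabilities([p_corn, p_wheat])
print(p)
```

A farm that is large for any one item gets a high overall probability, which is how the rule serves several targets at once.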
MPPS
• Development and application of the MPPS strategy at NASS:
Amrhein, Hicks and Kott (1996)
Amrhein and Bailey (1998)
Bailey and Kott (1997)
Hicks, Amrhein and Kott (1996)
Kott, Amrhein and Hicks (1998).
A COMPARISON STUDY
• This study was designed to compare MPPS with the previously used (stratified) simple random sampling (SRS) strategy
THIS STUDY
• Explored the resampling approach to reveal the statistical characteristics and behavior of NASS Ag survey data
• Raised issues for further investigation to improve our understanding and practice of NASS Ag survey sampling and estimation
RESAMPLING
● Population bootstrap
• Base sample $S$
June Crop Survey MPPS samples
• Pseudo population $U^*$
Composed of replicates of base sample elements, according to the (integerized) weight of the element
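A minimal sketch of building such a pseudo population follows. The slides do not say how weights were integerized, so rounding to the nearest integer is an assumption, and the function name and toy values are hypothetical.

```python
from typing import List, Sequence, Tuple


def build_pseudo_population(values: Sequence[float], weights: Sequence[float]) -> List[Tuple[int, float]]:
    """Replicate each base-sample element round(weight) times.

    Returns (base_index, value) pairs so every replicate can be traced back
    to its base-sample element. Rounding is an assumed integerization rule.
    """
    pseudo: List[Tuple[int, float]] = []
    for i, (y, w) in enumerate(zip(values, weights)):
        copies = int(round(w))
        pseudo.extend((i, y) for _ in range(copies))
    return pseudo


# Made-up base sample: reported acres and sampling weights.
values = [120.0, 35.0, 0.0]
weights = [2.4, 10.6, 1.0]
u_star = build_pseudo_population(values, weights)
pseudo_total = sum(y for _, y in u_star)
print(len(u_star), pseudo_total)
```

Because the weights are rounded, the pseudo-population total generally differs slightly from the weighted base-sample total.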
RESAMPLING
• Resamples
Independent samples $S^*_r$, $r = 1, \ldots, R$, drawn from $U^*$ by the Poisson and SRS sampling strategies, respectively
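The two ways of drawing a resample can be sketched as below. This is illustrative only: it uses unstratified SRS for brevity, assumes the SRS size is set to the expected Poisson sample size, and uses a Horvitz-Thompson total for the Poisson draw and an expansion total for SRS. All names and data are hypothetical.

```python
import random
from typing import List, Sequence


def poisson_resample(probs: Sequence[float], rng: random.Random) -> List[int]:
    """Poisson sampling: include unit k independently with probability probs[k]."""
    return [k for k, p in enumerate(probs) if rng.random() < p]


def srs_resample(n_units: int, n: int, rng: random.Random) -> List[int]:
    """Simple random sampling of n units without replacement."""
    return rng.sample(range(n_units), n)


def ht_total(y: Sequence[float], probs: Sequence[float], sample: Sequence[int]) -> float:
    """Horvitz-Thompson total: sum of y_k / p_k over sampled units."""
    return sum(y[k] / probs[k] for k in sample)


def expansion_total(y: Sequence[float], sample: Sequence[int]) -> float:
    """SRS expansion total: (N / n) times the sample sum."""
    if not sample:
        return 0.0
    return len(y) / len(sample) * sum(y[k] for k in sample)


# Made-up pseudo population of 200 units; probabilities proportional to y.
y = [float(k % 10 + 1) for k in range(200)]
y_sum = sum(y)
probs = [40 * v / y_sum for v in y]

rng = random.Random(2005)
s_psn = poisson_resample(probs, rng)
# Assumed size adjustment: SRS gets the expected Poisson sample size.
s_srs = srs_resample(len(y), round(sum(probs)), rng)
t_psn = ht_total(y, probs, s_psn)
t_srs = expansion_total(y, s_srs)
print(len(s_psn), t_psn, len(s_srs), t_srs)
```

The Poisson sample size is random, while the SRS size is fixed, which is why a size adjustment of some kind is needed before the two are compared.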
RESAMPLING
● Resample totals $t^*_r$, $r = 1, 2, \ldots, R$, and their mean
$$\bar{t}^*_R = \frac{1}{R} \sum_{r=1}^{R} t^*_r$$
RESAMPLING
• Resampling variance estimate for the sample total estimate
Bootstrap statistic
$$V^*_R = \frac{1}{R-1} \sum_{r=1}^{R} \left( t^*_r - \bar{t}^*_R \right)^2$$
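As a small worked sketch, the mean of the resample totals and the resampling variance (the sum of squared deviations from that mean, divided by R - 1) can be computed as follows. The function names and the five toy totals are hypothetical.

```python
from typing import Sequence


def resample_mean(totals: Sequence[float]) -> float:
    """Mean of the R resample totals."""
    return sum(totals) / len(totals)


def resampling_variance(totals: Sequence[float]) -> float:
    """Sum of squared deviations from the resample mean, divided by R - 1."""
    r = len(totals)
    if r < 2:
        raise ValueError("need at least two resample totals")
    t_bar = resample_mean(totals)
    return sum((t - t_bar) ** 2 for t in totals) / (r - 1)


# Made-up resample totals.
totals = [98.0, 102.0, 100.0, 104.0, 96.0]
t_bar = resample_mean(totals)
v_star = resampling_variance(totals)
print(t_bar, v_star)
```

In the study this would be computed once per crop, state, and sampling strategy over the full set of resamples.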
DATA
• The crop component of the 2004 and 2005 June QAS, for all 42 participating states
• Certainty elements were eliminated from the sample to avoid unnecessary complication
RESAMPLING VAR ESTIMATES
● Based on 1000 resamples
• Naive Comparison
● Log-Log Plot
▪ Resampling variance estimate vs sample total across crops, for each state
▪ Overlay: Poisson (*) vs SRS (^)
Naive Comparison
• General linear trend
(Assumption: the variance is proportional to a power of the total)
• For the majority of crops, SRS variance appeared greater than Poisson variance (but often not appreciably)
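If the variance is proportional to a power of the total, the points fall on a straight line on the log-log scale and the slope is that power. A minimal least-squares sketch, with hypothetical names and synthetic data that follow an exact power law:

```python
import math
from typing import Sequence, Tuple


def fit_loglog(totals: Sequence[float], variances: Sequence[float]) -> Tuple[float, float]:
    """Least-squares fit of log(variance) = intercept + slope * log(total)."""
    xs = [math.log(t) for t in totals]
    ys = [math.log(v) for v in variances]
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return intercept, slope


# Synthetic data: variance = 3 * total**1.5 (not survey results).
totals = [1.0e4, 5.0e4, 2.0e5, 8.0e5]
variances = [3.0 * t ** 1.5 for t in totals]
intercept, slope = fit_loglog(totals, variances)
print(intercept, slope)
```

Fitting the two strategies separately within a state would give two lines whose vertical gap corresponds to the difference seen in the overlay.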
Log-Log Plot of Resampling Variance Est vs Total Across Crops: CA
Overlay: Poisson (*) vs SRS (^)
[Plot: log_var_psn (vertical axis, 14 to 22) vs log_tot (horizontal axis, 9.0 to 13.5); crops shown: pot srg saf sun dwh bar ctp oat ohy ctu wwh ric crn alf. NOTE: 1 obs hidden.]
Validity of the Comparison
• Additional information is needed to justify the comparison
• The quality of the resampling variance estimate depends on the statistical quality of the resample totals, which also provides evidence for the appropriateness of the sampling strategy
• Among various aspects, the most important: NORMALITY
Normality
● Q-Q plot of resample totals
• Demonstration: CA
▪ Most crops: Good shape of Q-Q plot (Corn, Potatoes)
▪ Exception: Other Hay
Evidence that Poisson was better than SRS
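The coordinates behind a normal Q-Q plot of resample totals can be sketched as follows. The plotting positions (k - 0.5)/R and the correlation summary are assumptions made for illustration, not the diagnostics used in the study, and the data are synthetic.

```python
import math
from statistics import NormalDist
from typing import List, Sequence, Tuple


def normal_qq_points(totals: Sequence[float]) -> List[Tuple[float, float]]:
    """Pair each ordered total with a standard normal quantile.

    Uses plotting positions (k - 0.5) / R, one common convention.
    """
    ordered = sorted(totals)
    r = len(ordered)
    z = NormalDist()
    return [(z.inv_cdf((k - 0.5) / r), t) for k, t in enumerate(ordered, start=1)]


def qq_correlation(points: Sequence[Tuple[float, float]]) -> float:
    """Pearson correlation of the Q-Q points; values near 1 suggest normality."""
    n = len(points)
    mx = sum(x for x, _ in points) / n
    my = sum(y for _, y in points) / n
    sxy = sum((x - mx) * (y - my) for x, y in points)
    sxx = sum((x - mx) ** 2 for x, _ in points)
    syy = sum((y - my) ** 2 for _, y in points)
    return sxy / math.sqrt(sxx * syy)


# Synthetic totals: one normal-shaped set, one right-skewed set.
R = 200
grid = [NormalDist().inv_cdf((k - 0.5) / R) for k in range(1, R + 1)]
symmetric = [100.0 + 5.0 * g for g in grid]
skewed = [100.0 * math.exp(g) for g in grid]
r_sym = qq_correlation(normal_qq_points(symmetric))
r_skew = qq_correlation(normal_qq_points(skewed))
print(r_sym, r_skew)
```

A skewed set of totals bends away from the straight line and gives a lower correlation than a normal-shaped set.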
Outliers on the log-log plot
• Located far apart from the general trend
• The two sampling strategies gave appreciably different estimates
• Demonstration:
▪ CA: Other Hay
▪ MT: Potatoes
Evidence that SRS was better
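One simple way to flag crops where the two strategies disagree appreciably is to threshold the log ratio of the two variance estimates. The threshold, the function name, and the placeholder crop labels and values below are hypothetical and are not results from the study.

```python
import math
from typing import Dict


def flag_divergent(var_poisson: Dict[str, float], var_srs: Dict[str, float],
                   threshold: float = 1.0) -> Dict[str, float]:
    """Return crops whose |log(var_srs / var_poisson)| exceeds the threshold.

    Positive values mean the SRS variance is larger; negative values mean
    the Poisson variance is larger.
    """
    flagged: Dict[str, float] = {}
    for crop, v_psn in var_poisson.items():
        if crop in var_srs:
            log_ratio = math.log(var_srs[crop] / v_psn)
            if abs(log_ratio) > threshold:
                flagged[crop] = log_ratio
    return flagged


# Placeholder inputs (made up).
var_poisson = {"crop_a": 100.0, "crop_b": 200.0, "crop_c": 50.0}
var_srs = {"crop_a": 120.0, "crop_b": 2000.0, "crop_c": 5.0}
flagged = flag_divergent(var_poisson, var_srs)
print(flagged)
```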
Log-Log Plot of Resampling Variance Est vs Total Across Crops: MT
Overlay: Poisson (*) vs SRS (^)
[Plot: log_var_psn (vertical axis, 12.5 to 25.0) vs log_tot (horizontal axis, 7 to 15); crops shown: mus sun can pot saf fla crn oat ohy dwh bar alf wwh swh. NOTE: 5 obs hidden.]
FINITE SAMPLE RESAMPLING
Complexities, due to the special features of survey sampling
● Nonindependence arising in sampling without replacement
● Other complexities of finite population structure by designs and estimators
FINITE SAMPLE RESAMPLING
Effects of discreteness
(Davison & Hinkley, 1997, 2.3.2)
▪ Discrete empirical distribution
and in particular,
▪ In finite population sampling, the pseudo population formed by replicates of sample elements
FINITE SAMPLE RESAMPLING
Issues with this study
• Comparable sample size
- Addressed by size adjustment
• Impact of the base sample
- Not clear
Impact of Base Sample
For finite population resampling, the general guideline is:
▪ The resampling population mimics the original population, and
▪ The resamples mimic the base sample, drawn from $U^*$ by a design identical to the one by which the base sample was originally drawn
(Särndal et al., 1992, Ch. 11)
AT ISSUE
● How should the resampling technique be modified to correctly accommodate the finite sampling situation?
AT ISSUE
● In the literature, most reported finite sample resampling studies used (stratified) SRS, which bears the most similarity to infinite population independent random sampling, the standard setting on which the resampling technique is based
SUMMARY
• An Approach
Resampling & analysis of resamples,
using statistical graphical and diagnostic techniques, to reveal statistical characteristics / behavior of NASS Ag survey data
SUMMARY
● Sampling strategy comparison
▪ Poisson seemed to be preferable to stratified simple random sampling
▪ A national comparison table of the two strategies across crops and states is to be produced for a comprehensive picture, with likely causal factors identified
FURTHER INVESTIGATION
To develop statistical understanding,
the resampling setting of this study and
other statistical information techniques
will be further explored
FURTHER INVESTIGATION
▪ Behavior of Studentized bootstrap statistics
▪ Statistical function
(Booth, Butler, and Hall, 1994;
Davison & Hinkley, 1997)
▪ Examine different survey data
THANK YOU
ALF  Alfalfa All Harvested Acres
BAR  Barley All Planted Acres
CAN  Canola All Planted Acres
CRN  Corn Planted Acres
CTP  Pima Cotton Planted Acres
CTU  Upland Cotton Planted Acres
DEB  Dry Beans Planted Acres
DWH  Durum Wheat Planted Acres
FLA  Flaxseed Planted Acres
MUS  Mustard All Planted Acres
OAT  Oats All Planted Acres
OHY  Other Hay Harvested Acres
PNT  Peanuts All Planted Acres
POT  Potatoes All Planted Acres
RIC  Rice All Grain Planted Acres
RYE  Rye All Planted Acres
SAF  Safflower All Planted Acres
SGB  Sugarcane All Planted Acres*
SOY  Soybeans All Planted Acres
SPT  Sweet potatoes Planted Acres
SRG  Sorghum All Planted Acres
SUG  Sugarcane For Sugar Harvested Acres
SUN  Sunflowers All Planted Acres
SWH  Spring Wheat Irr Planted Acres
WWH  Winter Wheat All Planted Acres