Using Commercial-off-the-Shelf (COTS) Software at the National
Agricultural Statistics Service
Darcy Miller
UNECE Work Session on Statistical Data Editing
September 18-20, 2018
THE FINDINGS AND CONCLUSIONS IN THIS PRELIMINARY PRESENTATION HAVE NOT BEEN FORMALLY DISSEMINATED BY THE U.S. DEPARTMENT OF AGRICULTURE AND SHOULD NOT BE CONSTRUED TO REPRESENT ANY AGENCY DETERMINATION OR POLICY
National Agricultural Statistics Service (NASS)
• Agency in the United States Department of Agriculture (USDA)
• Mission: “The National Agricultural Statistics Service provides timely, accurate, and useful statistics in service to U.S. Agriculture.”
• Hundreds of survey reports
– Surveys of farmers and farm businesses, scientific measurements, satellite data, weather data, and more
• Census of Agriculture
– Every 5 years
Editing and Imputation
• Data collected contain missing or erroneous values
• Often, customized code and/or a semi-manual process is used
• Major goal is a ‘clean’ dataset in which every record satisfies the edit logic
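As a rough illustration of what "edit logic" means in practice, the sketch below expresses a few edits as named checks and reports which ones a record fails. The field names (`total_acres`, `cropland_acres`) and the rules themselves are hypothetical and are not taken from any NASS system.

```python
from typing import Callable, Dict, List, Optional

Record = Dict[str, Optional[float]]

# Hypothetical edit rules: each check returns True when the record passes.
EDITS: Dict[str, Callable[[Record], bool]] = {
    # The item must be reported (missing values fail this edit).
    "acres_present": lambda r: r.get("total_acres") is not None,
    # A reported value must not be negative.
    "acres_nonnegative": lambda r: (
        r.get("total_acres") is None or r["total_acres"] >= 0
    ),
    # Cropland cannot exceed total acres when both are reported.
    "cropland_within_total": lambda r: (
        r.get("total_acres") is None
        or r.get("cropland_acres") is None
        or r["cropland_acres"] <= r["total_acres"]
    ),
}


def failed_edits(record: Record) -> List[str]:
    """Return the names of the edits that the record fails."""
    return [name for name, check in EDITS.items() if not check(record)]


def is_clean(record: Record) -> bool:
    """A record is 'clean' when it fails no edits."""
    return not failed_edits(record)
```

A record that fails one or more edits would then be routed to imputation or manual review; a dataset is clean once `is_clean` holds for every record.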
Edit and Imputation Review
• NASS continually seeks to improve its products
• Contracted with an external organization to review editing and imputation processes
• One of the recommendations: use COTS software
COTS Software
• Benefits:
– Generalized script
– Often supported
– Reduction in maintenance
– Ease of use/development
– Optimized code
– Reproducibility
• Challenges:
– Desired features/flexibilities may not be available
– Fitting COTS software in an established process
COTS Software at NASS: Editing and Imputation
• Blaise – hundreds of small surveys
– Interactive edit, changes primarily manual
• Banff evaluation
• IVEware
• PROC MI (FCS)
• PROC SURVEYIMPUTE
Blaise: Survey Processing
• 100+ surveys (smaller)
Banff Evaluated
PRISM: Census of Agriculture
• Significant change to demographic section of the form
• Additional minor changes continued through cognitive, content, and web instrument testing to the final form
• Short timeframe to code and update code
• Census of Agriculture is edited/imputed record by record
– Call to imputation code is made by the edit code
– Code for editing and imputation is custom script
PRISM: Census of Agriculture
• ~3 million records on list frame
[Figure: Census of Agriculture record-by-record edit and imputation flow. Labels: Keying or Web Collection; Questionnaire Section 1 (Statistical Edit, Nearest Neighbor 1, Donor Pool 1; 1 record); Questionnaire Section 2 (Statistical Edit, Nearest Neighbor 2, Donor Pool 2; 1 record); . . .; PROC MI (Batched, Demographics); Stored for Manual Review and Analysis; Clean Record Data]
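The record-by-record pattern described for the Census of Agriculture, where the edit code calls the imputation code and a nearest neighbor is drawn from a donor pool, can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not NASS's custom script: the function names, the field names, the single edit rule, and the unscaled Euclidean distance are all hypothetical, and the matching variables are assumed to be non-missing on both recipient and donors.

```python
import math
from typing import Dict, List, Optional, Sequence

Record = Dict[str, Optional[float]]


def distance(a: Record, b: Record, match_vars: Sequence[str]) -> float:
    """Unscaled Euclidean distance over the matching variables (an assumption)."""
    return math.sqrt(sum((a[v] - b[v]) ** 2 for v in match_vars))


def nearest_donor(recipient: Record, donor_pool: List[Record],
                  match_vars: Sequence[str]) -> Record:
    """Pick the donor closest to the recipient on the matching variables."""
    return min(donor_pool, key=lambda d: distance(recipient, d, match_vars))


def edit_and_impute(record: Record, donor_pool: List[Record],
                    match_vars: Sequence[str], target: str) -> Record:
    """Edit one record; call imputation only if the target item fails the edit.

    Hypothetical edit: the target item must be reported and non-negative.
    Returns a new record and leaves the input unchanged.
    """
    value = record.get(target)
    if value is not None and value >= 0:
        return dict(record)  # passes the edit, nothing to impute
    donor = nearest_donor(record, donor_pool, match_vars)
    fixed = dict(record)
    fixed[target] = donor[target]  # take the donor's reported value
    return fixed
```

For example, with a pool of two clean donors, a record reporting 120 total acres and a missing cropland value would receive the cropland value of the donor whose total acres are closest. In the flow pictured above, each questionnaire section would have its own donor pool and its own call of this kind.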
Census of Agriculture Weighting Process
• June Area Survey
– Annual survey
– Area based sample (theoretically complete)
– Demographic information is not edited/imputed
• Census of Agriculture weighting
– June Area Survey data used in the dual system estimation weighting, which incorporates coverage, undercount, misclassification, and nonresponse
• Edited/imputed demographic information for June Area Survey using PROC SURVEYIMPUTE
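For readers unfamiliar with dual system estimation, its simplest textbook form (the two-list, or Lincoln-Petersen, capture-recapture estimator) is sketched below. This is only the basic idea: the weighting described above also accounts for undercount, misclassification, and nonresponse, none of which is modeled here. The function name, the argument names, and the pairing of a list-based count with an area-sample count are illustrative assumptions.

```python
def dual_system_estimate(n_list: int, n_area: int, n_both: int) -> float:
    """Basic two-system population estimate: N_hat = n_list * n_area / n_both.

    n_list: units found by the first system (e.g. a list-based collection)
    n_area: units found by the second system (e.g. an area-based sample)
    n_both: units found by both systems (matched records)
    """
    if n_both <= 0:
        raise ValueError("at least one matched unit is required")
    if n_both > min(n_list, n_area):
        raise ValueError("matches cannot exceed either system's count")
    return n_list * n_area / n_both
```

With 800 units on the first system, 100 on the second, and 80 found by both, the estimate is 800 * 100 / 80 = 1000 units, which implies the first system covers about 80% of the population. Matching across systems is one reason consistent, edited/imputed demographic information matters on both sides.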
PRISM: Survey Processing
• <10 surveys (larger ~30,000 sampled)
IVEware
PROC MI
(all data collected)
Moving Forward
• NASS has had success in utilizing COTS software
– Primarily implemented in cases where timelines are short and data are new
• Continue to update edit & imputation processes to incorporate COTS software, where appropriate
– Features/flexibility
• Reduce challenges by modularizing processes
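One way to read "modularizing processes" is that each edit or imputation step sits behind a common interface, so a custom routine can be swapped for a call to COTS software without touching the rest of the process. The sketch below shows that idea only; the `Imputer` type, `mean_imputer`, and `run_pipeline` are hypothetical names, and the mean fill is a toy stand-in for whatever tool a real step would wrap.

```python
from typing import Callable, Dict, List, Optional

Record = Dict[str, Optional[float]]
# Every step takes a batch of records and returns a new batch.
Step = Callable[[List[Record]], List[Record]]


def mean_imputer(target: str) -> Step:
    """Toy imputation module: fill missing values of `target` with the observed mean."""
    def impute(records: List[Record]) -> List[Record]:
        observed = [r[target] for r in records if r.get(target) is not None]
        if not observed:
            return [dict(r) for r in records]  # nothing to borrow from
        fill = sum(observed) / len(observed)
        return [
            dict(r, **{target: fill}) if r.get(target) is None else dict(r)
            for r in records
        ]
    return impute


def run_pipeline(records: List[Record], steps: List[Step]) -> List[Record]:
    """Apply each step in order; steps can be replaced independently."""
    for step in steps:
        records = step(records)
    return records
```

Because `run_pipeline` depends only on the `Step` signature, replacing `mean_imputer` with a wrapper around another tool would not require changes to the surrounding process.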