International Conference on Establishment Surveys III
Montreal • June 18-21, 2007
United States Department of Agriculture
National Agricultural Statistics Service
Generalized Census Processing System at the National Agricultural Statistics Service
Thomas Jacob, Carol House
National Agricultural Statistics Service
Presentation Outline
• Census of Agriculture Overview
• 2002 Census Processing System
• Reasons for redesign
• Redesign initiatives
• Dashboard for continuous monitoring
• Can the system be more generalized?
• Acknowledgements
• Questions
Census of Agriculture Overview
• In 1997, the Census of Agriculture was transferred from the U.S. Bureau of the Census to NASS
• 2002: 3 million report forms mailed out
• 400+ system users in Headquarters and Field Offices
• Over 1,500 variables
• Over 110 published tables per state and for the U.S.
• Volume, volume, volume
2002 Census Processing System
• NASS contracted the National Processing Center (NPC) for
- Mail out, check in, image capture, and data capture (OMR + ICR)
• SAS-based system for edit, imputation, and analysis using Sybase and Redbrick databases
- Edit specifications captured using decision logic tables (DLT)
- Micro-level and macro-level analysis
- Automated edit using DLT
- Tried to implement Fellegi-Holt (FH) methodology and DLT as a two-tier edit
- Goal of 80% of data not touched by analysts
OMR = Optical Mark Recognition; ICR = Intelligent Character Recognition
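A decision logic table pairs a set of conditions with an action to take when they all hold. The following is a minimal Python sketch of that idea, not the NASS edit specifications or the SAS implementation: the rule names, the variables (`acres_total`, `acres_harvested`), the actions, and the function `apply_dlt` are all hypothetical.

```python
# Hypothetical decision logic table (DLT): each row is (name, conditions, action).
# Variable names, rules, and actions are invented for illustration only.

def apply_dlt(record, table):
    """Apply the first DLT row whose conditions all hold.

    Returns (possibly edited copy of the record, name of the rule that fired or None).
    """
    for name, conditions, action in table:
        if all(cond(record) for cond in conditions):
            return action(dict(record)), name
    return dict(record), None  # no row fired: the record passes untouched


DLT = [
    (
        "harvested_exceeds_total",
        [lambda r: r["acres_harvested"] > r["acres_total"]],
        # Assumed action: cap harvested acres at total acres.
        lambda r: {**r, "acres_harvested": r["acres_total"]},
    ),
    (
        "negative_total",
        [lambda r: r["acres_total"] < 0],
        # Assumed action: flag for analyst review instead of changing the value.
        lambda r: {**r, "needs_review": True},
    ),
]

clean, rule = apply_dlt({"acres_total": 100, "acres_harvested": 120}, DLT)
untouched, no_rule = apply_dlt({"acres_total": 100, "acres_harvested": 80}, DLT)
```

Keeping the rules in a table rather than in procedural code is what lets the same specification drive an automated edit.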
What Worked Well
• Completed Census on schedule
• Questionnaire imaging
• Analysis: macro and micro tools
• % of records touched
• Disclosure routines worked well, but independently
Reasons for Redesign
• Increase system speed
- Edit and imputation were extremely slow (could only edit 75 records at a time)
- Issues with loads between databases
- Slow communication lines
- Database design was inefficient
- Nearest-neighbor imputation used a sequential search
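To illustrate why a sequential search is costly, here is a minimal Python sketch of nearest-neighbor donor imputation that scans the whole donor pool for every recipient. The Euclidean distance, the matching variables, and the function names are assumptions made for this example; the slides do not describe the actual distance function or data.

```python
import math


def nearest_donor_sequential(recipient, donors, match_vars):
    """Scan every donor and return the one closest in Euclidean distance."""
    best, best_dist = None, math.inf
    for donor in donors:  # one full pass over the pool per recipient
        dist = math.sqrt(sum((donor[v] - recipient[v]) ** 2 for v in match_vars))
        if dist < best_dist:
            best, best_dist = donor, dist
    return best


def impute(recipient, donors, match_vars, target):
    """Fill a missing target value with the nearest donor's value."""
    donor = nearest_donor_sequential(recipient, donors, match_vars)
    return {**recipient, target: donor[target]}


# Hypothetical donor pool and recipient record.
donors = [
    {"acres": 50, "cattle": 10, "sales": 20000},
    {"acres": 200, "cattle": 80, "sales": 150000},
    {"acres": 500, "cattle": 5, "sales": 90000},
]
recipient = {"acres": 180, "cattle": 70, "sales": None}
imputed = impute(recipient, donors, ["acres", "cattle"], "sales")
```

With R recipients and D donors, this approach performs R × D distance computations, which grows quickly at census volumes.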
Reasons for Redesign
• Increase effectiveness and quality of the process
- Minimize data capture errors
- Time-consuming analysis
- Inadequate dashboard for identifying influential records
- Need for true interactive edit (IE)
- Disclosure routine in old FORTRAN code
[Diagram: system architecture. Components shown: data collection (CATI, Web, scanned paper forms and images, KFI); raw data; Sybase/OLTP; replication server; Redbrick/OLAP; Edit/Imputation/IE (batch edit, DLT edit, PRD, donor pool); data review and interactive edit; analysis; disclosure/tabulation.]
Redesign Initiatives
• Multiple modes of data collection (CATI, Web, KFI, …), all using the same module for loading data
• Key from Image (KFI) instead of scanning (OCR & OMR)
• Create an indicator denoting that additional information is present on the report form (respondent notes, remarks, altered stubs)
• Create images for respondents who responded through CATI or Web
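The first and third initiatives above can be sketched together: one loader that every collection mode passes through, which also sets the additional-information indicator. This is an illustrative Python sketch only; the `RawRecord` type, the field names, and the payload layout are hypothetical and do not reflect the actual NASS load module.

```python
from dataclasses import dataclass, field

MODES = {"CATI", "WEB", "KFI"}
# Hypothetical payload keys that signal extra information on the report form.
EXTRA_KEYS = ("respondent_notes", "remarks", "altered_stubs")


@dataclass
class RawRecord:
    record_id: str
    mode: str
    values: dict = field(default_factory=dict)
    has_additional_info: bool = False  # notes, remarks, or altered stubs present


def load_record(mode, payload):
    """Single entry point: every collection mode goes through the same loader."""
    if mode not in MODES:
        raise ValueError(f"unknown collection mode: {mode}")
    return RawRecord(
        record_id=str(payload["id"]),
        mode=mode,
        # Keep only the reported data items; extras are summarized by the flag.
        values={k: v for k, v in payload.items() if k != "id" and k not in EXTRA_KEYS},
        has_additional_info=any(payload.get(k) for k in EXTRA_KEYS),
    )


kfi = load_record("KFI", {"id": 1, "acres": 120, "remarks": "sold farm in May"})
web = load_record("WEB", {"id": 2, "acres": 75})
```

Routing every mode through one function means downstream edit and imputation steps see a single record layout regardless of how the data arrived.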
Redesign Initiatives
• Batch edit in Unix, IE on the local PC, using the same code and the same donors
• True interactive edit (IE)
• Dual screens for data review and image comparisons
• Improve donor search strategies: scalable using daemons & SAS/SHARE
• More use of previously reported data (PRD)
Redesign Initiatives
• Create new data models for both transactional (OLTP) and analytic (OLAP) databases
• Editing is in the OLTP environment; analysis is in the OLAP environment
• Introduce a replication server that moves and synchronizes data between OLTP and OLAP
• Perform more server-side processing using SAS/CONNECT to reduce interactive response times
OLTP = Online Transaction Processing; OLAP = Online Analytical Processing
Redesign Initiatives
• Disclosure module converted to SAS/BASE
• The system is more metadata driven
• Provide quality control grids to monitor the effects of editing on the data
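One way to picture a grid that monitors editing effects is a per-variable summary comparing records before and after the edit. The Python sketch below is an assumption about what such a grid might contain (change counts and totals); the slides do not specify the actual layout, and the function and field names are hypothetical.

```python
def edit_effect_grid(before, after, variables):
    """Summarize how editing changed each variable across matched records.

    `before` and `after` are lists of records in the same order.
    Missing values (None) count as 0 in the totals.
    """
    grid = {}
    for var in variables:
        changed = sum(1 for b, a in zip(before, after) if b[var] != a[var])
        grid[var] = {
            "records_changed": changed,
            "pct_changed": 100.0 * changed / len(before),
            "total_before": sum(b[var] or 0 for b in before),
            "total_after": sum(a[var] or 0 for a in after),
        }
    return grid


# Hypothetical pre-edit and post-edit records.
before = [{"acres": 100, "cattle": None}, {"acres": 250, "cattle": 40}]
after = [{"acres": 100, "cattle": 12}, {"acres": 205, "cattle": 40}]
grid = edit_effect_grid(before, after, ["acres", "cattle"])
```

A large shift between `total_before` and `total_after` for a variable would be a cue to look at what the edit is doing to that item.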
Dashboard for Continuous Monitoring
• Implementing a Quality Control module to track four major areas in a proactive mode
- Administrative: Management Information System (MIS) reports to track weekly progress
- Data: monitor what the system is doing to the data (tables, maps, graphs, outlier grids; independent check of record-level inconsistencies)
- Elapsed Times: track how long key processes are taking to run
- System Stability: track key indicators that can impact performance of databases, UNIX machines, SAS, etc.
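The elapsed-times area can be illustrated with a small timing wrapper. This is a generic Python sketch, not the NASS Quality Control module: the process names, the log structure, and the threshold are hypothetical.

```python
import time
from contextlib import contextmanager


@contextmanager
def track(process_name, log):
    """Record how long the wrapped block takes, even if it raises."""
    start = time.perf_counter()
    try:
        yield
    finally:
        log.append((process_name, time.perf_counter() - start))


def slow_processes(log, threshold_seconds):
    """Return names of processes whose elapsed time exceeded the threshold."""
    return [name for name, seconds in log if seconds > threshold_seconds]


elapsed_log = []  # (process_name, seconds) pairs

with track("batch_edit", elapsed_log):
    time.sleep(0.05)  # stand-in for a long-running step
with track("replication", elapsed_log):
    pass  # stand-in for a fast step

flagged = slow_processes(elapsed_log, threshold_seconds=0.02)
```

Logging every run rather than only failures makes it possible to spot a process that is gradually slowing down before it becomes a problem.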
Can the system be more generalized?
• Wanted to have one system for surveys and censuses
• Metadata can handle both
• Imputation can handle different types of imputation
• A few surveys are using the system
• Survey analysts are reluctant to use DLT for survey edits
• FH methodology sent back to research for further evaluation
Acknowledgment
We want to thank each and every member of the 2007 Census Team for their tireless efforts to make the redesign initiatives a reality.
Questions?