Date post: 08-Jun-2020
Analytics in Official Statistics: From Adaptive Survey Design to the U.S. 2020 Census

Michael T. Thieme, Assistant Director for Systems and Contracts, Decennial Census Programs, U.S. Census Bureau
Transcript
Page 1: Analytics in Official Statistics - Sas Institute › ... › analytics-in-official-statistics.pdf · 2018-04-27 · Analytics in Official Statistics: From Adaptive Survey Design to

Page 2:

Disclaimer

The thoughts and opinions in this presentation are those of the presenter and not necessarily those of the U.S. Census Bureau.

Page 3:

Where we are:

▪ Survey costs are rising

▪ Confidence in government is declining

▪ With it, so is the public’s willingness to participate in surveys

▪ Current methods for producing official statistics are unsustainable

Page 4:

How did we get here?

and

How do we keep going?

Page 5:

One Barrier to Where We Want to Go

Accidental Architecture

1 Source: Fostering Interoperability in Official Statistics: Common Statistical Production Architecture (UNECE, 2013)

Page 6:

This is what Accidental Architecture looks like at Census:

Page 7:

The Result?

▪ Higher system costs: development, operations, and maintenance

▪ Nearly nonexistent interoperability

▪ Less data accessibility, discoverability, and usability

▪ Much more difficult to use data analytics and adaptive survey design approaches

Page 8:

Part of the Answer:

Adaptive Survey Design

Page 9:

A new approach at the U.S. Census Bureau

Survey Data Collection Platform as a Service

Concurrent Analysis and Estimation System

Unified Tracking System (Paradata Repository)

Centralized Operational Analysis and Control (Multimode Operational Control System)

[Diagram labels, in extraction order:]
CaRDS (ACS/Decennial)
MAF/TIGER (Decennial)
Business Register (ECON)
Frame and Sample Systems
CaRDS (ACS and Decennial)
BR, MADB, StEPS II (ECON)
Integrated Field Operation Control Systems
Time & Attendance Systems
Response Processing Systems
Centurion/ISR
iCADE ATAC CQA/IVR COMET CLMS Enumeration

Page 10:

Modest Beginnings

▪ National Survey of College Graduates

▪ Developed R-Indicator Model

▪ Ran experiments

▪ Built confidence
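The slide names an R-indicator model without giving a formula. In the survey methodology literature, the R-indicator (representativity indicator) is commonly defined as R = 1 - 2·S(ρ), where S(ρ) is the standard deviation of the estimated response propensities across sample units. The following is a minimal sketch of that common definition only, not the Census Bureau's model; the function name `r_indicator` and the example propensities are hypothetical.

```python
import math


def r_indicator(propensities, weights=None):
    """Return R = 1 - 2 * S(rho) for a list of response propensities.

    S(rho) is computed here in its population form (divide by the total
    weight); some papers divide by N - 1 instead. Weights are optional
    design weights and default to 1 for every unit.
    """
    if not propensities:
        raise ValueError("need at least one propensity")
    if weights is None:
        weights = [1.0] * len(propensities)
    total = sum(weights)
    mean = sum(w * p for w, p in zip(weights, propensities)) / total
    variance = sum(w * (p - mean) ** 2 for w, p in zip(weights, propensities)) / total
    return 1.0 - 2.0 * math.sqrt(variance)


# Hypothetical propensities: identical propensities give R = 1 (fully
# representative response); more spread pushes R toward 0.
even_response = r_indicator([0.6, 0.6, 0.6, 0.6])
uneven_response = r_indicator([0.9, 0.9, 0.3, 0.3])
print(f"even: {even_response:.2f}, uneven: {uneven_response:.2f}")
```

In an adaptive design, an indicator like this could be monitored during data collection so that effort shifts toward groups whose low propensities are dragging R down; how the Bureau used it is not stated on the slide.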

Page 11:

Modest Beginnings

▪ Census Tests: 2014, 2015, 2016

▪ Administrative record modeling

▪ Optimization of field work

▪ Changed the way we do Censuses

Page 12:

The U.S. 2020 Census

Using Analytics to:

▪ Optimize the 2020 Census paid advertising campaign

▪ Identify vacant housing units

▪ Optimize the number of enumeration attempts

▪ Identify best time to knock on doors

▪ Optimize field worker efficiency

Page 13:

CAES 2020 Production Environment

1. Hortonworks Hadoop for storage and in-database processing

2. SAS 9.4 for non-distributed processing

3. SAS Viya for distributed processing

Page 14:

The Journey to CAES 2020

3 years of testing non-distributed versus distributed along 3 dimensions

1. Performance: How fast can we go?

2. Accuracy: When we go fast, do we come up with the same result?

3. Cost: What does it take to achieve better performance and the same level of precision?

Technology                                    Performance   Accuracy   Cost
2015 Pilot SAS LASR In-Memory
2016 Pilot SAS In-Database (via Map Reduce)
2018 Pilot SAS Viya In-Memory                 ?             ?          ?
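The accuracy dimension on this slide (when we go fast, do we come up with the same result?) can be illustrated as a tolerance-based comparison of scored predictions from two runs. This is a sketch under stated assumptions, not the pilot's actual validation procedure: the function name `compare_scores`, the housing-unit IDs, the scores, and the tolerance are all hypothetical.

```python
def compare_scores(baseline, candidate, abs_tol=1e-6):
    """Compare two {unit_id: predicted_probability} mappings.

    Returns (matched, mismatches). A unit is a mismatch when it is missing
    from either run or when the two scores differ by more than abs_tol.
    """
    mismatches = []
    for unit_id in sorted(set(baseline) | set(candidate)):
        b = baseline.get(unit_id)
        c = candidate.get(unit_id)
        if b is None or c is None or abs(b - c) > abs_tol:
            mismatches.append((unit_id, b, c))
    return (not mismatches), mismatches


# Hypothetical scores from a non-distributed and a distributed run.
non_distributed = {"HU001": 0.912345, "HU002": 0.104200, "HU003": 0.550000}
distributed = {"HU001": 0.9123451, "HU002": 0.104200, "HU003": 0.550000}

matched, diffs = compare_scores(non_distributed, distributed)
print("results matched" if matched else f"{len(diffs)} units differ")
```

An absolute tolerance is used because distributed solvers can sum in a different order and produce tiny floating-point differences; whether the pilot required exact or approximate agreement is not stated in the slides.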

Page 15:

2018 Pilot in Detail

Business Goal: speed up Decennial Administrative Records process

1. Performance:

Non-Distributed Model Processing Time: 38 HOURS

Distributed Model Processing Time: 2 HOURS

2. Accuracy:

Non-distributed and distributed RESULTS MATCHED

3. Cost:

Roughly 4 HOURS required to convert and validate each model

Preserved existing code structure and Math-Stat way of working
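The arithmetic behind these pilot figures can be worked through in a few lines. The 38-hour, 2-hour, and 4-hour values come from the slide; treating conversion as a one-time cost per model and comparing it directly against processing hours is a simplifying assumption (analyst hours and machine hours are not the same resource), and the function names are hypothetical.

```python
import math

NON_DISTRIBUTED_HOURS = 38.0  # from the slide
DISTRIBUTED_HOURS = 2.0       # from the slide
CONVERSION_HOURS = 4.0        # from the slide: convert and validate one model


def speedup(baseline_hours, new_hours):
    """How many times faster the new run is than the baseline."""
    return baseline_hours / new_hours


def break_even_runs(conversion_hours, baseline_hours, new_hours):
    """Runs needed before hours saved cover the one-time conversion cost
    (assumes conversion and processing hours are comparable)."""
    saved_per_run = baseline_hours - new_hours
    return math.ceil(conversion_hours / saved_per_run)


factor = speedup(NON_DISTRIBUTED_HOURS, DISTRIBUTED_HOURS)
saved = NON_DISTRIBUTED_HOURS - DISTRIBUTED_HOURS
runs = break_even_runs(CONVERSION_HOURS, NON_DISTRIBUTED_HOURS, DISTRIBUTED_HOURS)
print(f"speedup: {factor:.0f}x, hours saved per run: {saved:.0f}, break-even runs: {runs}")
# prints "speedup: 19x, hours saved per run: 36, break-even runs: 1"
```

Under these assumptions, the conversion effort is recovered on the first production run of each converted model.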

Page 16:

APPENDIX SLIDES

MOVED SLIDES FROM PREVIOUS DRAFT TO BACK OF PRESENTATION

Page 17:

Performance of AdRec Modeling Programs

Page 18:

Performance of Long-Running Occupied Model Program

Page 19:

Accuracy of Scored Predictions from Occupied Model

Page 20:

Cost of Converting 9.4 LOGISTIC to Viya LOGSELECT

Page 21:

CAES 2020 Production Environment

[Cluster architecture diagram. Labels are listed in extraction order, so the pairing of service lists with nodes may not match the original figure.]

Worker Node 9
Worker Node 8
Worker Node 7
Worker Node 6
Worker Node 5
CAES
Business User
Developer
Admin
CAES Cluster
High Speed Local Network
Communication, No Data Moved
Services: NameNode 1, Resource Manager 2, Journal Keeper, Zookeeper, SAS Embedded Process
Master Node 1
20 CPU Cores, 256 GB Memory, 12x 2 TB Disk Storage
Master Node 2
20 CPU Cores, 256 GB Memory, 12x 2 TB Disk Storage
Services: Resource Manager 1, Hive Metastore 2, HiveServer 2, WebHCat 2, Journal Keeper, SAS Embedded Process
Master Node 3
20 CPU Cores, 256 GB Memory, 12x 2 TB Disk Storage
Services: NameNode 1, History Server, Timeline Server, Journal Keeper, Zookeeper, SAS Embedded Process
Master Node 4
20 CPU Cores, 256 GB Memory, 12x 2 TB Disk Storage
Services: Hive Metastore 1, HiveServer 1, WebHCat 1, Zookeeper, SAS Embedded Process
Worker Node 4
Worker Node 3
28 CPU Cores, 384 GB Memory, 16 TB Disk Storage
Services: DataNode, NodeManager, Open Source R, SAS Embedded Process
Worker Node 2
Worker Node 1
20 CPU Cores, 256 GB Memory, 16 TB Disk Storage
Services: Knox Gateway, HDP Clients, RStudio
Virtual Machine 2
8 CPU vCores, 32 GB Memory, 1 TB vDisk Storage
Services: MySQL Database Server
Virtual Machine 3
8 CPU vCores, 32 GB Memory, 1 TB vDisk Storage
Services: Ambari Server, Ranger Audit Server, Ranger Policy Server, Zeppelin, HST Server, Activity Analyzer
Virtual Machine 1
8 CPU vCores, 32 GB Memory, 1 TB vDisk Storage
SAS 9.4 Metadata Server
SAS 9.4 Compute Server
SAS Mid-Tier Server
28 CPU Cores, 384 GB Memory, 16 TB Disk Storage
SAS Viya Worker Node 4
28 CPU Cores, 384 GB Memory, 16 TB Disk Storage
SAS Viya Worker Node 3
28 CPU Cores, 384 GB Memory, 16 TB Disk Storage
SAS Viya Worker Node 2
28 CPU Cores, 384 GB Memory, 16 TB Disk Storage
SAS Viya Worker Node 1
28 CPU Cores, 384 GB Memory, 16 TB Disk Storage
Services: SAS Visual Analytics (Viya enabled), SAS Visual Statistics (Viya enabled), SAS Visual Data Mining and Machine Learning
SAS Viya Controller Node
28 CPU Cores, 384 GB Memory, 16 TB Disk Storage
SAS Viya Microservice Node
28 CPU Cores, 384 GB Memory, 16 TB Disk Storage
Services: SAS Metadata Server, SAS Web Server, SAS Web Application Server, SAS Web Clients, SAS Environment Manager, SAS Data Loader For Hadoop, SAS Scoring Accelerator for Hadoop, SAS Compute Server
SAS Desktop Clients

Legend:
Textured blue box: SAS Virtual Machine or Bare Metal
Solid blue boxes: SAS Viya servers, Bare Metal recommended
Green boxes: Hortonworks servers
Green text: Hortonworks services
Blue text: SAS services

