Approach to using alternative data sources to support the 2021 Census in England and Wales

Cal Ghee
Office for National Statistics
26 September 2018

Page 2

Content

• Introduction
• What are ‘alternative data’?
• How we could use them
  o Prepare and collect
  o Process and analyse
  o Outputs
• Vision
• Aims
• Criteria for inclusion in design
• Next steps

Page 3

Introduction: the 2021 Census for England and Wales

National Statistician’s 2014 recommendation for the future provision of population statistics and the next census included:

Increased use of administrative data and surveys in order to enhance the statistics from the 2021 Census ...

… make the best use of all available data

Page 4

Introduction: meeting specific objectives of the census

1. to produce census statistics of the right quality and timeliness to meet user needs

2. to produce integrated outputs from census, administrative and survey data

Page 5

What are ‘alternative data’?

Administrative data
• collected primarily for administrative reasons
• statistical use usually secondary

Survey data
• gathered from statistical surveys including earlier censuses

Big data
• large, often unstructured
• potentially available in real time
• difficult to process efficiently using traditional methods and technologies
• many formats, including audio, video, computer logs, purchase transactions, sensors, social networking sites
• freely available on the web or held by the private sector

Paradata
Data that describe the process by which the data were collected, eg:
o the times of day responses were submitted
o time taken to complete the questionnaire
o number of attempts to complete the questionnaire
o mode of communication/response
o how many times field officers called, day of week, time of day, how many times they made contact and whether a response was subsequently received

Page 6

How could we use them? 2021 Census Operation

PREPARE & COLLECT
• Hard to count; field workload allocation***
• Collection Target Populations; predicted response profiles***
• Helping to create address frame in advance***
• Validation to reduce field visits*

PROCESS & ANALYSE DATA
• Coding: Adding to indexes/classifications***
• Coverage bias adjustments**
• QA detailed matching in areas of greatest uncertainty; CQS triangulation**
• Cleaning and editing*
• Edit & Imputation of single year of age*
• Placeholders in record imputation*
• Adjust for collected data in communal establishments**
• Maintenance of Output Areas**
• QA collected data, population estimates and characteristics***

OUTPUTS
• Replace previously collected or creation of new variables**
• Extended output categories eg qualifications***
• Journalistic topic analysis***

Key
• Aggregated data
• Record level data
• ***Confirmed use
• **Likely use (have demonstrated)
• *Possible use (research still to do)

Page 7

Prepare and collect

• Hard to count index; field workload allocation***
• Collection Target Populations; predicted response profiles***
• Helping to create address frame in advance***
• Validation to reduce field visits*

Key
• Aggregated data
• Record level data
• ***Confirmed use
• **Likely use (have demonstrated)
• *Possible use (research still to do)

Page 8

Example: validating field outcomes and making field visits more efficient

Address list (extract taken before Census Day)

Initial contact letters, with three possible outcomes:
• Response
• Undelivered as addressed (return to sender)
• Non-response

? Follow up

Page 9

Example: validating field outcomes and making field visits more efficient

Undelivered as addressed (return to sender)
• Desk check against alternative sources
• Field check
  o Address doesn’t exist / non-residential: remove from list
  o Address is current / has signs of residence: re-send letter, and field visit to encourage response

Page 10

Example: validating field outcomes and making field visits more efficient

Non-response
• Field visit
  o Address is current / has signs of residence: maintain field visits
  o Address doesn’t exist / non-residential: ? desk check against alternative sources
• If the desk check confirms the address doesn’t exist / is non-residential: remove from further visits
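
The decision flow in the two examples above (undelivered letters and non-response) can be sketched as a small function. This is only an illustration, not ONS's implementation: the outcome labels, the `Action` names, and the rule that visits continue when no desk check confirms a non-residential address are all assumptions made for the sketch.

```python
from enum import Enum


class Action(Enum):
    NO_FOLLOW_UP = "no follow-up needed"
    REMOVE_FROM_LIST = "remove from list"
    RESEND_AND_VISIT = "re-send letter and field visit"
    MAINTAIN_VISITS = "maintain field visits"
    REMOVE_FROM_VISITS = "remove from further visits"


def triage(outcome, field_says_residential, desk_says_residential=None):
    """Return the next action for one address.

    outcome: "response", "undelivered" or "non_response" (hypothetical labels).
    field_says_residential: True if the field check or visit found a current
        address with signs of residence.
    desk_says_residential: result of the desk check against alternative
        sources, or None if no desk check has been made.
    """
    if outcome == "response":
        return Action.NO_FOLLOW_UP
    if outcome == "undelivered":
        # Undelivered letters: keep the address only if it looks residential.
        if field_says_residential:
            return Action.RESEND_AND_VISIT
        return Action.REMOVE_FROM_LIST
    if outcome == "non_response":
        if field_says_residential:
            return Action.MAINTAIN_VISITS
        # Assumption: stop visiting only when the desk check confirms
        # that the address does not exist or is non-residential.
        if desk_says_residential is False:
            return Action.REMOVE_FROM_VISITS
        return Action.MAINTAIN_VISITS
    raise ValueError(f"unknown outcome: {outcome!r}")
```

For example, `triage("non_response", False, desk_says_residential=False)` returns `Action.REMOVE_FROM_VISITS`, while the same call without a desk check result keeps the address in the visit schedule.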

Page 11

Example: hard to count index, target groups and live response-chasing

Page 12

Hard-to-Count index
Predicts relative likelihood of self-response for each LSOA

Response Profiles
Predicts responses over time for groups of LSOAs sharing similar characteristics

Field Operations Simulation
Models field staff hours, number of paper questionnaires and reminders needed and impacts of interventions

Response Chasing Algorithm tool
Identifies gaps between predicted and actual returns and suggests interventions

[Chart: predicted responses and actual returns over time, with Census day marked]
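
As a rough illustration of the gap-identification idea behind the Response Chasing Algorithm tool, the sketch below compares predicted and actual return rates per area and flags the largest shortfalls. The function name, the 5-percentage-point tolerance and the area codes are hypothetical; the slides do not describe how the real tool is built.

```python
def response_gaps(predicted, actual, tolerance=0.05):
    """Flag areas whose actual return rate trails the prediction.

    predicted, actual: dicts mapping an area code to a return rate in [0, 1].
    tolerance: shortfall (as a proportion) that is ignored; an assumed value.
    Returns {area: shortfall}, largest shortfall first.
    """
    gaps = {}
    for area, expected in predicted.items():
        # Areas with no returns recorded yet are treated as 0.
        shortfall = expected - actual.get(area, 0.0)
        if shortfall > tolerance:
            gaps[area] = round(shortfall, 3)
    return dict(sorted(gaps.items(), key=lambda item: item[1], reverse=True))


# Hypothetical rates for three areas part-way through collection.
predicted = {"LSOA_A": 0.62, "LSOA_B": 0.80, "LSOA_C": 0.71}
actual = {"LSOA_A": 0.50, "LSOA_B": 0.79, "LSOA_C": 0.60}
flagged = response_gaps(predicted, actual)
```

Here `LSOA_A` and `LSOA_C` are flagged as candidates for an intervention such as extra field staff, while `LSOA_B` is within tolerance.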

Page 13

E.g. LSOA ‘X’

Hard-to-Count
Predicts relative likelihood of self-response for each LSOA
HtC willingness 5 (low self-response), digital group 2 (high digital take-up); 62% self-response

Response Profiles
This is the response profile expected for this type of LSOA, characterised by HtC group and age profile

Field Operations Simulation
248 field staff hours are required to reach target response rates

RCA
Our actual returns are falling short of our predictions. To meet our targets, we need to move additional field staff to this area.

[Chart: predicted responses and actual returns over time, with Census day marked]

Page 14

Process and analyse

• Coding: Adding to indexes/classifications***
• Coverage bias adjustments**
• QA detailed matching in areas of greatest uncertainty; CQS triangulation**
• Cleaning and editing*
• Edit & Imputation of single year of age*
• Placeholders in record imputation*
• Adjust for collected data in communal establishments**
• Maintenance of Output Areas**
• QA collected data, population estimates and characteristics***

Key
• Aggregated data
• Record level data
• ***Confirmed use
• **Likely use (have demonstrated)
• *Possible use (research still to do)

Page 15

Example: placeholders in record imputation

[Diagram: collected responses and imputed non-response records, with variables from admin records]

Direct use of admin data – pull data across from admin data linked to a non-responding address

Indirect uses
o ‘dummy’ forms (operational paradata – eg type of accommodation, likely number of residents)
o intelligence from the admin data (eg 4 people usually live here)

Modelling – eg postcode level data on where HE students are most likely to live – model imputed records to fit the proportion of residents likely to be students

How?

1. Estimate basic information from coverage survey (dual system estimation), create skeleton records to populate
2. Use donor system to populate the rest of the record
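
The two steps above can be sketched in miniature. The first function is the standard two-source (Lincoln-Petersen) form of dual system estimation, N = n1 × n2 / m. The second is a deliberately simplified donor step: the function names, the record fields, and the way skeleton characteristics are drawn are assumptions for illustration, not the census methodology itself.

```python
import random


def dual_system_estimate(census_count, survey_count, matched_count):
    """Two-source population estimate: N = n1 * n2 / m."""
    if matched_count <= 0:
        raise ValueError("need at least one matched record")
    return census_count * survey_count / matched_count


def impute_missing_records(collected, estimated_total, key_vars, rng):
    """Create skeleton records for the estimated shortfall, then fill them.

    collected: list of dicts, one per collected response.
    key_vars: the basic variables carried by each skeleton record.
    rng: a random.Random instance, passed in so runs are repeatable.
    """
    shortfall = max(0, round(estimated_total) - len(collected))
    imputed = []
    for _ in range(shortfall):
        # Step 1 (assumption): take the skeleton's basic characteristics from
        # a randomly chosen collected record. In practice they would come
        # from the coverage estimation itself.
        base = rng.choice(collected)
        skeleton = {key: base[key] for key in key_vars}
        # Step 2: pick a donor that matches the skeleton on the key variables
        # and copy the rest of its record across.
        donors = [r for r in collected
                  if all(r[key] == skeleton[key] for key in key_vars)]
        donor = rng.choice(donors)
        imputed.append({**donor, **skeleton, "imputed": True})
    return imputed


# Hypothetical toy data: 3 collected records, 5 coverage survey records, 3 matched.
collected = [
    {"age_group": "20-24", "sex": "F", "tenure": "rented"},
    {"age_group": "20-24", "sex": "F", "tenure": "owned"},
    {"age_group": "65+", "sex": "M", "tenure": "owned"},
]
estimate = dual_system_estimate(census_count=3, survey_count=5, matched_count=3)
new_records = impute_missing_records(
    collected, estimate, ["age_group", "sex"], random.Random(0)
)
```

Admin data could enter at either step, for example by supplying the skeleton's household size directly instead of relying on a donor.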

Page 16

Outputs

• Replace previously collected or creation of new variables**
• Extended output categories eg qualifications***
• Journalistic topic analysis***

Key
• Aggregated data
• Record level data
• ***Confirmed use
• **Likely use (have demonstrated)
• *Possible use (research still to do)

Page 17

Example: replacing ‘number of rooms’ question with admin data

User need still exists, but historically quality of collected data is poor

Link data to address list

• Valuation Office Agency data

• Linking by UPRN in advance of census

• Number of rooms

• Potential for more: property type, size of property

Process

• Clean data

• Edit rules*

• Impute missing values

Outputs

• Additional data available for outputs

• Reduced respondent burden

• Improved quality

* Census collects number of bedrooms: need to ensure consistency across variables
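
A minimal sketch of the link, edit and impute sequence above, using plain dictionaries. The field names, the edit rule (rooms may not be fewer than census bedrooms) and the median-based imputation are assumptions chosen for illustration; the slide only says that consistency with the bedrooms variable must be ensured.

```python
from statistics import median_low


def attach_rooms(addresses, admin_rooms):
    """Link an admin 'number of rooms' value to each address by UPRN.

    addresses: list of dicts with 'uprn' and census 'bedrooms'.
    admin_rooms: dict mapping UPRN to number of rooms (hypothetical extract).
    """
    linked = []
    for address in addresses:
        rooms = admin_rooms.get(address["uprn"])  # None when there is no link
        bedrooms = address.get("bedrooms")
        # Assumed edit rule: a dwelling cannot have fewer rooms than bedrooms.
        if rooms is not None and bedrooms is not None and rooms < bedrooms:
            rooms = None
        linked.append({**address, "rooms": rooms, "rooms_imputed": rooms is None})

    # Simple imputation: median of the accepted values, never below bedrooms.
    known = [r["rooms"] for r in linked if r["rooms"] is not None]
    if known:
        fallback = median_low(known)
        for record in linked:
            if record["rooms"] is None:
                record["rooms"] = max(fallback, record.get("bedrooms") or 0)
    return linked


# Hypothetical example: one good link, one failed edit, one missing link.
addresses = [
    {"uprn": 1, "bedrooms": 2},
    {"uprn": 2, "bedrooms": 3},
    {"uprn": 3, "bedrooms": 1},
]
admin_rooms = {1: 4, 2: 2}
result = attach_rooms(addresses, admin_rooms)
```

Keeping the `rooms_imputed` flag alongside the value makes it possible to report how much of the output variable came directly from the admin source.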

Page 18

VISION AND AIMS

VISION
Integrate alternative data to provide the best quality 2021 census and leave a positive legacy

AIMS
Improve and enhance:
- Collection - value for money and optimise response rates
- Processing - improve quality and trust through assurance
- Outputs - improve timeliness, meet more user needs

Continue to add value post census

Page 19

DESIGN PRINCIPLES

- Believe the collected census data
- Add value to the census
- Assure users of its quality
- Weigh up gains v costs and knock-on effects
- Think: Quality – Value – Trust
- Just because we can, doesn’t mean we should

Page 20

INCLUSION CRITERIA

Balance – improved quality, trust, value balanced against risk to accuracy, timeliness, interpretability

Detail – improved quality for a population sub-set or for core variables - quality v effort

Quality
• Relevance – still meeting user needs? Includes granularity/low level geography.
• Accuracy – improving? To what extent are we adding more uncertainty? Issues clustered in same areas/same groups of people – are we risking the whole but not actually improving sufficiently where needed?
• Timeliness – does adding more processes risk our timetable?
• Accessibility – does increased use of data available in public domain risk what we can publish?
• Interpretability – can we explain the use clearly? Especially regarding the quality of sources used. Circularity of use.
• Coherence – does the alternative data cover the same definitions? UK harmonisation, coherence over time.

Value – from integrating data appropriately. Use of and input into corporate transformation programmes; improving efficiency of collection and processing

Trust – ethical considerations, reduce respondent burden, build user assurance

Page 21

Potential uses – current thinking

Validation to reduce field visits*

Cleaning and editing*

Edit & Imputation of single year of age*

Placeholders in record imputation*

?

Page 22

Next steps

• Still in the research phase
• Large-scale rehearsal in 2019/20
• Gather and use as many alternative data sources as possible for the rehearsal
• Finalise the design for the 2021 Census

Dependencies
• Getting the data
• Quality of available data
• Development of methods

Page 23

Thank you

