
Better data: combining survey, administrative and big data · 2019-01-04 · Large Data ‣...

Date post: 08-Aug-2020
Transcript
Page 1

researchICTsolutions

Better data: combining survey, administrative and big data

- Dr. Christoph Stork

Page 2

Big Data = combining various data sets, including large data

‣ Large Data (private sector and administrative data): n → all (n approaches N)
  - all bank accounts
  - all mobile subscribers
  - all taxpayers
  - all driver licence holders

‣ Survey Data:
  ‣ Quarterly LFS South Africa 2016: n = 30,000
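Why "n approaches N" matters can be illustrated with the finite population correction. Below is a minimal Python sketch of the standard error of an estimated proportion under simple random sampling without replacement. The survey size n = 30,000 comes from the slide. The population size N = 40,000,000 and the proportion p = 0.3 are hypothetical values chosen only for illustration.

```python
import math


def standard_error(p: float, n: int, N: int) -> float:
    """Standard error of a sample proportion under simple random sampling
    without replacement, including the finite population correction (FPC)."""
    if not 0 < n <= N:
        raise ValueError("need 0 < n <= N")
    if n == N:
        return 0.0  # a full enumeration has no sampling error
    fpc = (N - n) / (N - 1)
    return math.sqrt(p * (1 - p) / n * fpc)


N = 40_000_000  # hypothetical population size
p = 0.3         # hypothetical true proportion

# Survey size from the slide first, then n moving towards N
for n in (30_000, N // 2, N - 1, N):
    print(f"n={n:>10,}  SE={standard_error(p, n, N):.6f}")
```

At n = 30,000 the correction is negligible when N is in the tens of millions. Sampling error only vanishes as n gets close to N, which is what large data offers for the population it actually covers.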

Page 3

Large Data

‣ Administrative data:
  - medical and tax records
  - driver licences, civil registry
‣ Social security, crime statistics, electricity consumption, educational statistics
‣ Commercial transactions data:
  - stock exchange data, FX
  - bank, credit card and supermarket transactions
  - insurance records, loyalty card records…
‣ Sensors and tracking devices: sensors, M2M, satellite, GPS devices…
‣ Online activities / social media: web scraping of online search activity, online page views, blogs / Facebook / Twitter

Page 4

Example: population density changes in the Colombo region (weekday vs weekend), based on 100 million Call Detail Records per day generated by Sri Lankan mobile operators

[Figure: density maps for Weekday and Sunday at 06:30, 12:30 and 18:30; legend: decrease in density / increase in density]

by Sriganesh Lokanathan
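A density-change map like this one is, at its core, an aggregation of call detail records (CDRs) by cell tower and time slot. Below is a minimal Python sketch. It assumes each record has already been reduced to a hypothetical (tower_id, day_type, hour) tuple. A real pipeline would also need tower coordinates, subscriber de-duplication and privacy safeguards. The sketch compares Sunday activity with average weekday activity at the same hour, and the tower names are made up.

```python
from collections import Counter
from typing import Iterable

# Hypothetical record format: (tower_id, day_type, hour), day_type in {"weekday", "sunday"}
Record = tuple[str, str, int]


def activity(records: Iterable[Record], day_type: str, hour: int, n_days: int) -> Counter:
    """Average number of CDRs per tower per day for one day type and hour."""
    counts = Counter(t for t, d, h in records if d == day_type and h == hour)
    return Counter({t: c / n_days for t, c in counts.items()})


def density_change(records: list[Record], hour: int, weekdays: int, sundays: int) -> dict[str, float]:
    """Relative change (Sunday vs weekday average) in CDR activity per tower at one hour.
    Positive values mean density increases on Sundays."""
    wk = activity(records, "weekday", hour, weekdays)
    su = activity(records, "sunday", hour, sundays)
    return {t: (su[t] - wk[t]) / wk[t] for t in wk if wk[t] > 0}


records = (
    [("fort", "weekday", 12)] * 500 + [("fort", "sunday", 12)] * 40
    + [("beach", "weekday", 12)] * 100 + [("beach", "sunday", 12)] * 60
)
print(density_change(records, hour=12, weekdays=5, sundays=1))  # {'fort': -0.6, 'beach': 2.0}
```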

Page 5

Big Data

[Diagram: Big Data drawing on three sources]
‣ Admin data: Ministry of Finance, Social Security, Civil Registry
‣ Survey data: Labour Force Survey, National Household Income and Expenditure Survey, Economic Surveys (informal sector)
‣ Private sector data: Banks, Mobile Operators, Other Companies

Page 6

Surveys are snapshots - Big Data is a movie

Example of Namibia surveys (timeline from 2000): 2001 Census, 2003/4 NHIES, 2006 DHS, 2006/7 NHIES, 2011 Census, 2012 LFS, 2013 LFS, 2014 LFS, 2016 LFS

Can Big Data be used to:
‣ Fill gaps (interpolate key statistics)?
‣ Reduce the frequency of surveys?
‣ Make statistics more accurate?
‣ Reduce sample size (census 4% of population)?

Page 7

Governments need reliable data

‣ Survey data can be misleading if sampling is not done properly… (e.g. US election polling)
‣ Big data can be misleading if what is being measured is not well understood
‣ Mobile operator data does not include info on non-users
‣ Bank data does not generate informal sector income data
‣ Social media as an early indicator of unemployment?
‣ Self-reinforcing trends
‣ Fake news

Page 8

Digital and Financial Divide

Page 9

Finscope 2015 Survey for South Africa

Big and Admin Data may overlook these

Page 10

Finscope 2015 Survey for South Africa

75% of South Africans may leave only a thin digital trace

Page 11

ZICTA 2015 ICT Survey: Individuals 10+ (Zambia, urban, rural)
[Chart: active mobile users; smartphone owners among mobile phone owners. Values shown: 7%, 18%, 14%, 39%, 68%, 51%]
Urban bias

ZICTA 2015 ICT Survey: Households with working… (access to mobile phone, access to computer, access to Internet) and Finscope 2015: Individuals 16+ (financially included); Zambia, male vs female
[Chart values shown: 11%, 8%, 78%, 57%, 21%, 15%, 82%, 61%]
Male bias
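Post-stratification is one common way to correct an urban bias like this, although the slides do not describe it. Observed units are reweighted so that each stratum matches its known population share, for example from a census. Below is a minimal Python sketch using hypothetical urban/rural shares and per-stratum indicator values. It only corrects imbalance between strata that the data actually observes. It cannot recover people who leave no digital trace at all.

```python
def post_stratified_mean(stratum_means: dict[str, float], population_shares: dict[str, float]) -> float:
    """Weight each stratum mean by its population share (shares must sum to 1)."""
    total = sum(population_shares.values())
    if abs(total - 1.0) > 1e-9:
        raise ValueError("population shares must sum to 1")
    return sum(stratum_means[s] * w for s, w in population_shares.items())


def naive_mean(stratum_means: dict[str, float], sample_counts: dict[str, int]) -> float:
    """Unweighted mean over all observations: over-represents the better-covered stratum."""
    n = sum(sample_counts.values())
    return sum(stratum_means[s] * c for s, c in sample_counts.items()) / n


# Hypothetical figures: a digital source that observes far more urban than rural people
means = {"urban": 0.60, "rural": 0.20}       # indicator value per stratum
observed = {"urban": 8_000, "rural": 2_000}  # observations per stratum in the data source
shares = {"urban": 0.40, "rural": 0.60}      # assumed true population shares

print(round(naive_mean(means, observed), 3))          # 0.52
print(round(post_stratified_mean(means, shares), 3))  # 0.36
```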

Page 12

Zambia: -90 dB

Page 13

Informal Business Surveys

Page 14

Page 15

Page 16

A listing is compiled for each enumeration area (EA). The listings serve as sample frames for the simple random selection of households and businesses.
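Drawing a simple random sample from each EA listing is straightforward once the listing exists. Below is a minimal Python sketch. It assumes each listing is a list of hypothetical household or business identifiers and that a fixed number of units is selected per EA. The seed only makes the draw reproducible.

```python
import random


def sample_per_ea(listings: dict[str, list[str]], per_ea: int, seed: int = 0) -> dict[str, list[str]]:
    """Simple random sample without replacement of `per_ea` units from each EA listing.
    EAs with fewer listed units than `per_ea` are taken in full."""
    rng = random.Random(seed)
    return {
        ea: rng.sample(units, min(per_ea, len(units)))
        for ea, units in listings.items()
    }


listings = {
    "EA-001": [f"HH-{i:03d}" for i in range(1, 121)],  # 120 listed households (hypothetical)
    "EA-002": [f"BUS-{i:02d}" for i in range(1, 9)],   # 8 listed businesses (hypothetical)
}
selected = sample_per_ea(listings, per_ea=10)
for ea, units in selected.items():
    print(ea, len(units), units[:3])
```

Every unit in an EA listing has the same chance of selection, so the inclusion probability within an EA is min(per_ea, len(listing)) / len(listing). Its inverse gives the design weight for that EA.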

Page 17

Large data itself is not enough for policy purposes

‣ Large data is available for the rich or middle class
‣ Large data for the Internet of Things
‣ Little is available for the informal sector or the poor:
  - outside of coverage areas
  - not using tech
  - no bank account, only using cash
  - no health insurance
  - no permanent address
‣ The informal sector makes up a large share of our societies

Page 18

Generating Big Data through Triangulation

[Diagram: admin data, survey data and private sector data combined]

Page 19

Estimating Income

Source: Banks
Frequency: Continuous
Type: Private sector large data
What: Salaries received (formal)
Data: address, age, gender, ID
Enrichment: years of work experience, education, skills, sector

Source: Ministry of Finance
Frequency: Continuous
Type: Admin large data
What: Pay-as-you-earn tax (formal salaries), income from self-employment
Data: address, age, gender, ID
Enrichment: years of work experience, education, skills

Source: Labour Force Survey
Frequency: Annual / occasional
Type: Survey
What: income of formally employed; income of informally employed; income of self-employed
Data: age, gender, work experience, education, location, sector, type of employment (full-time, part-time or occasional)
Enrichment: having a bank account; receiving salary in a bank account; paying social security
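Combining the bank and tax sources depends on linking records through the ID field that both sources carry, and then enriching them. Below is a minimal plain-Python sketch. The record layouts and field names are hypothetical. Real linkage would also need negotiated access, pseudonymised IDs and a way to handle mismatches, none of which this sketch covers.

```python
from dataclasses import dataclass


@dataclass
class BankSalary:  # hypothetical layout for "salaries received (formal)"
    person_id: str
    monthly_salary: float


@dataclass
class TaxRecord:  # hypothetical layout for PAYE records
    person_id: str
    age: int
    gender: str
    paye_declared: float


def link(bank: list[BankSalary], tax: list[TaxRecord]) -> tuple[list[dict], list[str]]:
    """Inner-join bank salaries with tax records on ID; also return the IDs that
    receive a salary into a bank account but are absent from the tax data."""
    tax_by_id = {t.person_id: t for t in tax}
    linked, unmatched = [], []
    for b in bank:
        t = tax_by_id.get(b.person_id)
        if t is None:
            unmatched.append(b.person_id)
            continue
        linked.append({
            "id": b.person_id,
            "age": t.age,
            "gender": t.gender,
            "bank_salary": b.monthly_salary,
            "paye_declared": t.paye_declared,
        })
    return linked, unmatched


bank = [BankSalary("A1", 12_000.0), BankSalary("B2", 8_500.0), BankSalary("C3", 4_000.0)]
tax = [TaxRecord("A1", 34, "F", 12_000.0), TaxRecord("B2", 51, "M", 8_000.0)]
linked, unmatched = link(bank, tax)
print(len(linked), unmatched)  # 2 ['C3']
```

Any triangulation exercise would need to examine differences between bank_salary and paye_declared, and also the IDs in unmatched.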

Page 20

Income Data Triangulation

[Diagram: 2015 LFS → estimate relationship with income data (no. of people receiving a formal salary) → interpolate using monthly or quarterly PAYE tax and salary transactions data → 2020 LFS → re-calibrate]

2015 LFS formal income = 2015 total salaries declared for tax = 2015 total salaries received in bank accounts

2015 LFS informal income = 2015 LFS total income - total salaries declared for tax
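The two identities above, combined with interpolation from monthly or quarterly PAYE data, can be sketched as follows. This is a minimal Python illustration. It assumes the formal share of total income estimated for the 2015 LFS year stays constant until the next LFS re-calibrates it. That constant-share assumption is ours and is not stated on the slide. All figures are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Benchmark:
    """An LFS year paired with the tax data for the same year (hypothetical units)."""
    lfs_total_income: float
    salaries_declared_for_tax: float

    @property
    def formal_income(self) -> float:
        # Identity from the slide: LFS formal income = total salaries declared for tax
        return self.salaries_declared_for_tax

    @property
    def informal_income(self) -> float:
        # Identity from the slide: LFS informal income = LFS total income - salaries declared for tax
        return self.lfs_total_income - self.salaries_declared_for_tax

    @property
    def formal_share(self) -> float:
        return self.formal_income / self.lfs_total_income


def interpolate(bench: Benchmark, paye_by_period: list[float]) -> list[tuple[float, float]]:
    """Estimate (formal, informal) income for each period between LFS rounds,
    assuming the benchmark formal share of total income stays fixed."""
    estimates = []
    for formal in paye_by_period:
        total = formal / bench.formal_share
        estimates.append((formal, total - formal))
    return estimates


bench_2015 = Benchmark(lfs_total_income=1_000.0, salaries_declared_for_tax=600.0)
print(bench_2015.informal_income)  # 400.0

quarterly_paye = [150.0, 153.0, 156.0]  # hypothetical quarterly PAYE salary totals
for formal, informal in interpolate(bench_2015, quarterly_paye):
    print(f"formal={formal:.1f} informal={informal:.1f}")
```

When the 2020 LFS arrives, a new Benchmark replaces the 2015 one (re-calibration). The difference between the interpolated and the newly measured informal income then shows how well the constant-share assumption held.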

Page 21

Understanding Informal Sector better

‣ Large/big data does not tell us much about the informal sector

‣ Estimate the link between the informal and formal sectors based on detailed LFSs

‣ Interpolating labour force statistics based on large data may then also allow informal sector statistics to be interpolated

Page 22

Detecting Strikes

‣ Bank transaction data classified as salary or wage payments can be screened for temporary interruptions by location
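This screening can be sketched by comparing each location's salary-payment count with its own recent history. Below is a minimal Python sketch. It assumes transactions have already been classified as salary payments and aggregated into a hypothetical count per location per pay period. The 50% drop threshold and the three-period median baseline are illustrative choices and do not come from the slide.

```python
from statistics import median


def flag_interruptions(counts: dict[str, list[int]], window: int = 3,
                       threshold: float = 0.5) -> dict[str, list[int]]:
    """For each location, return the indices of pay periods whose salary-payment count
    falls below `threshold` times the median of the preceding `window` periods."""
    flagged = {}
    for location, series in counts.items():
        hits = []
        for i in range(window, len(series)):
            baseline = median(series[i - window:i])
            if baseline > 0 and series[i] < threshold * baseline:
                hits.append(i)
        if hits:
            flagged[location] = hits
    return flagged


# Hypothetical salary-payment counts per pay period and location
counts = {
    "mine_town": [980, 1010, 995, 1002, 310, 990],  # sharp temporary drop in period 4
    "port_city": [540, 552, 548, 560, 557, 549],
}
print(flag_interruptions(counts))  # {'mine_town': [4]}
```

The baseline uses only earlier periods, so a drop does not dampen its own comparison. The median keeps a single interrupted period from distorting the baseline for the periods that follow.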

Page 23

Matching Jobs to Skills

Source: Universities, VTCs (SAQA)
Frequency: Annual
Type: Admin data
What: Supply of graduates by field and date

Source: Company online portal, company survey, skills audit, tracer studies
Frequency: Occasional
Type: Survey / demand driven
What: Vacancies by skill, address, date

Source: Labour Force Survey
Frequency: Annual / occasional
Type: Survey
What: Number of unemployed by age, gender, work experience, skills, education and location
Enrichment: Paying social security

Source: Social Security
Frequency: Continuous
Type: Admin data
What: Number of unemployed
Data details: address, age, gender, years of work experience, education, skills
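Once graduate supply by field and vacancies by skill are mapped to a common classification, the difference per field gives a first mismatch indicator. Below is a minimal Python sketch. The field names and counts are hypothetical. It also assumes a one-to-one mapping between fields of study and vacancy skills, which is usually the hard part in practice.

```python
def skills_gap(graduates: dict[str, int], vacancies: dict[str, int]) -> list[tuple[str, int]]:
    """Vacancies minus graduates per field: positive = shortage, negative = surplus.
    Sorted from the largest shortage to the largest surplus."""
    fields = set(graduates) | set(vacancies)
    gap = {f: vacancies.get(f, 0) - graduates.get(f, 0) for f in fields}
    return sorted(gap.items(), key=lambda item: (-item[1], item[0]))


graduates = {"accounting": 1_200, "ict": 300, "nursing": 450, "law": 900}     # hypothetical annual supply
vacancies = {"accounting": 800, "ict": 950, "nursing": 700, "plumbing": 200}  # hypothetical vacancies

for field, gap in skills_gap(graduates, vacancies):
    print(f"{field:<10} {gap:+}")
```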

Page 24

Combining Social Security with Labour Force Survey Data

[Diagram: 2015 LFS → estimate relationship → interpolate employment and unemployment numbers using quarterly social security data → 2020 LFS → re-calibrate]
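Employment follows the same pattern. Quarterly social security counts move the estimate between LFS rounds, and each new LFS re-calibrates it. Below is a minimal Python sketch. It assumes the ratio of LFS unemployment to registered claimants changes linearly between the 2015 and 2020 benchmarks, a simple benchmarking rule we chose for illustration. The series is hypothetical and shortened for brevity.

```python
def benchmark_interpolate(admin: list[float], lfs_start: float, lfs_end: float) -> list[float]:
    """Scale a quarterly administrative series so that it matches the LFS at both ends.

    admin[0] and admin[-1] are the quarters in which the two LFS rounds were taken;
    the LFS/admin ratio is interpolated linearly across the quarters in between."""
    if len(admin) < 2:
        raise ValueError("need at least two quarters")
    r0 = lfs_start / admin[0]
    r1 = lfs_end / admin[-1]
    last = len(admin) - 1
    return [a * (r0 + (r1 - r0) * i / last) for i, a in enumerate(admin)]


# Hypothetical registered unemployed (thousands) from social security, 2015 LFS quarter to 2020 LFS quarter
claimants = [100.0, 104.0, 110.0, 108.0, 115.0, 120.0]
estimates = benchmark_interpolate(claimants, lfs_start=250.0, lfs_end=276.0)
print([round(e, 1) for e in estimates])
```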

Page 25

Impact of raising minimum wages in SA

‣ Asking businesses? They will always say it will reduce jobs
‣ Triangulation:
  ‣ Bank and tax data: once enriched with demographic info, profile the lower end of the salary spectrum
  ‣ Labour Force Survey data: profile informal wage earners close to the minimum threshold
  ‣ Determine the gap between the two profiles
‣ Who are those below the minimum wage?
‣ How likely is a higher minimum wage to make them formal?
‣ Who would benefit?
‣ How strong will the incentive be to remain informal?
‣ Big Data allows the impact to be measured and the change to be reversed quickly if harmful
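The profile comparison can be sketched as follows. For each demographic group, compute the share of earners below the minimum wage in the formal (bank and tax) data and in the LFS informal data, then take the gap between the two. Below is a minimal Python sketch. The wage records, groups and threshold are all hypothetical. It only describes who falls below the line and does not model how workers or employers would respond.

```python
from collections import defaultdict

MIN_WAGE = 3_500.0  # hypothetical monthly minimum wage

# Hypothetical (source, group, monthly wage) records
records = [
    ("formal", "women", 3_200.0), ("formal", "women", 4_100.0), ("formal", "men", 5_000.0),
    ("formal", "men", 3_400.0), ("informal", "women", 2_100.0), ("informal", "women", 2_900.0),
    ("informal", "women", 3_800.0), ("informal", "men", 3_000.0), ("informal", "men", 4_200.0),
]


def share_below(records, threshold: float) -> dict[tuple[str, str], float]:
    """Share of earners below `threshold`, per (source, group)."""
    below, total = defaultdict(int), defaultdict(int)
    for source, group, wage in records:
        total[(source, group)] += 1
        below[(source, group)] += wage < threshold  # True counts as 1
    return {key: below[key] / total[key] for key in total}


shares = share_below(records, MIN_WAGE)
for group in sorted({g for _, g, _ in records}):
    formal = shares.get(("formal", group), 0.0)
    informal = shares.get(("informal", group), 0.0)
    print(f"{group:<6} formal={formal:.2f} informal={informal:.2f} gap={informal - formal:+.2f}")
```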

Page 26

Can Big Data be used to fill gaps (interpolate key statistics)?

Yes

Can Big Data be used to reduce the frequency of surveys?

Can Big Data be used to make statistics more accurate?

Can Big Data be used to reduce sample size?

We will only know once we try

Page 27

All this assumes access can be negotiated

Page 28

Thank you

Dr. Christoph Stork

www.researchictsolutions.com

