Better data: combining survey, administrative and big data · 2019-01-04


researchICTsolutions

Better data: combining survey, administrative and big data

- Dr. Christoph Stork


Big Data = combining various data sets, including large data

‣ Large Data (private sector and administrative data):
- n -> all (n approaches N)
- all bank accounts
- all mobile subscribers
- all tax payers
- all driver licence holders

‣ Survey Data:
- Quarterly LFS South Africa 2016, n = 30,000

Large Data

‣ Administrative data:
- medical and tax records
- driver licences, civil registry
- social security, crime statistics, electricity consumption, educational statistics

‣ Commercial transactions data:
- stock exchange data, FX
- bank, credit card and supermarket transactions
- insurance records, loyalty card records…

‣ Sensors and tracking devices: sensors, M2M, satellite, GPS devices…

‣ Online activities / social media: web scraping of online search activity, online page views, blogs / FB / Twitter

Example: Population density changes in Colombo region, weekday / weekend, based on 100 million Call Detail Records per day generated by Sri Lanka mobile operators

[Figure: maps of the Colombo region for a weekday and a Sunday at 06:30, 12:30 and 18:30; legend: decrease in density / increase in density. By Sriganesh Lokanathan, sriganesh@lirneasia.net]

Big Data

‣ Admin data: Ministry of Finance, Social Security, Civil Registry
‣ Survey data: Labour Force Survey, National Household Income and Expenditure Survey, Economic Surveys (informal sector)
‣ Private sector data: Banks, Mobile Operators, Other Companies

Surveys are snapshots - Big Data is a movie

Example of Namibia surveys (timeline from 2000): 2001 Census, 2003/4 NHIES, 2006 DHS, 2006/7 NHIES, 2011 Census, 2012 LFS, 2013 LFS, 2014 LFS, 2016 LFS

Can Big Data be used to:
‣ Fill gaps (interpolate key statistics)?
‣ Reduce frequency of surveys?
‣ Make statistics more accurate?
‣ Reduce sample size (census 4% of population)?

Governments need reliable data

‣ Survey data can be misleading if sampling is not done properly (e.g. US election polling)
‣ Big data can be misleading if what is being measured is not well understood
‣ Mobile operator data does not include info on non-users
‣ Bank data does not generate informal sector income data
‣ Social media as an early indicator of unemployment?
‣ Self-reinforcing trends
‣ Fake news

Digital and Financial Divide

Finscope 2015 Survey for South Africa


Big and Admin Data may overlook these

Finscope 2015 Survey for South Africa

75% of South Africa may leave only a thin digital trace

Zambia

[Chart: ZICTA 2015 ICT Survey, individuals 10+. Series: active mobile users; smartphone owners among mobile phone owners. Categories: Zambia, Urban, Rural. Values as extracted: 7%, 18%, 14%, 39%, 68%, 51%]

[Chart: ZICTA 2015 ICT Survey (households with working…) and Finscope 2015 (individuals 16+), male vs female. Categories: financially included, access to mobile phone, access to computer, access to Internet. Values as extracted: 11%, 8%, 78%, 57%, 21%, 15%, 82%, 61%]

Male bias

Urban bias

Zambia -90 db

Informal Business Surveys

Listing compiled for each enumeration area (EA) - the listings serve as sampling frames for the simple random selection of households and businesses
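
The selection step can be sketched in a few lines of Python. This is only an illustration, assuming each EA listing is a plain list of unit identifiers; the EA names, the unit IDs, the sample size of 10 and the helper `draw_sample` are hypothetical and are not taken from the survey methodology.

```python
import random

def draw_sample(listing, n, seed=None):
    """Simple random sample of n units, without replacement, from one EA listing."""
    rng = random.Random(seed)
    # If the EA has fewer than n listed units, take all of them.
    k = min(n, len(listing))
    return sorted(rng.sample(listing, k))

# Hypothetical listings: one sampling frame per enumeration area (EA).
listings = {
    "EA-001": [f"HH-{i:03d}" for i in range(1, 121)],   # 120 listed households
    "EA-002": [f"HH-{i:03d}" for i in range(1, 46)],    # 45 listed households
    "EA-003": [f"BIZ-{i:03d}" for i in range(1, 9)],    # 8 listed businesses
}

samples = {ea: draw_sample(units, n=10, seed=42) for ea, units in listings.items()}

for ea, units in samples.items():
    # Within an EA, every listed unit has the same selection probability.
    prob = len(units) / len(listings[ea])
    print(ea, len(units), f"selection probability {prob:.3f}")
```

The within-EA selection probability printed at the end is what a survey team would typically carry forward into the sampling weights.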

Large data by itself is not enough for policy purposes

‣ Large Data available for the rich or middle class
‣ Large data for Internet of Things
‣ Little available for the informal sector or poor
- Outside of coverage areas
- Not using tech
- Not having a bank account and only using cash
- No health insurance
- No permanent address

‣ Informal sector makes up a large share of our societies

Generating Big Data through Triangulation

[Diagram: admin data, survey data, private sector data]

Estimating Income

Source: Banks
Frequency: Continuous
Type: Private Sector Large Data
What: Salaries received (formal)
Data: address, age, gender, ID
Enrichment: years of work experience, education, skills, sector

Source: Ministry of Finance
Frequency: Continuous
Type: Admin Large Data
What: Pay as you earn tax (formal salaries), income from self employment
Data: address, age, gender, ID
Enrichment: years of work experience, education, skills

Source: Labour Force Survey
Frequency: Annual / Occasional
Type: Survey
What: Income of formally employed; income of informally employed; income of self-employed
Data: age, gender, work experience, education, location, sector, type of employment (full-time, part-time or occasional)
Enrichment: Having a bank account; receiving salary in bank account; paying Social Security

Income Data Triangulation

2015 LFS Formal Income = 2015 Total salaries declared for tax = 2015 Total salaries received in bank accounts

2015 LFS Informal Income = 2015 LFS total Income - Total salaries declared for tax

[Diagram: 2015 LFS (estimate relationship) and 2020 LFS (re-calibrate); between the two surveys, monthly or quarterly PAYE tax and salary transactions data are used to interpolate income data and the number of people receiving a formal salary]
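
A minimal sketch of this triangulation in Python follows. All figures are hypothetical, and the assumption that the informal-to-formal income ratio stays constant between two surveys is a simplification introduced here for illustration, not something the slides specify.

```python
# Hypothetical annual totals, in billions of local currency.
lfs_total_income_2015 = 500.0            # total income from the 2015 LFS
salaries_declared_for_tax = {            # PAYE totals from the tax authority
    2015: 400.0, 2016: 420.0, 2017: 450.0, 2018: 470.0, 2019: 500.0,
}

# Step 1: estimate the relationship in the survey year.
# Informal income = LFS total income - salaries declared for tax.
informal_2015 = lfs_total_income_2015 - salaries_declared_for_tax[2015]
informal_to_formal = informal_2015 / salaries_declared_for_tax[2015]

# Step 2: between surveys, apply the relationship to the continuous tax data
# (assumes the ratio is stable until the next survey).
estimates = {}
for year, formal in salaries_declared_for_tax.items():
    informal = informal_to_formal * formal
    estimates[year] = {"formal": formal, "informal": informal,
                       "total": formal + informal}

# Step 3: when the next LFS arrives, re-calibrate the ratio.
def recalibrate(lfs_total_income, formal_salaries):
    """Informal-to-formal income ratio implied by a new survey year."""
    return (lfs_total_income - formal_salaries) / formal_salaries

for year, row in estimates.items():
    print(year, row)
```

Comparing the carried-forward estimate for the next survey year with the value the new LFS actually reports gives a direct check on how well the relationship held.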

Understanding Informal Sector better

‣ Large / Big Data does not know much about the informal sector

‣ Estimate link between informal and formal sector based on detailed LFSs

‣ Interpolating Labour force statistics based on large data may then also allow you to interpolate informal sector statistics


Detecting Strikes

‣ Bank transaction data classified as salary or wage payments can be screened for temporary interruptions by location
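
One way such screening could look is sketched below. The data layout (monthly counts of salary payments per location), the location names, the 50% threshold and the function `flag_interruptions` are all assumptions made for illustration.

```python
from statistics import median

def flag_interruptions(payments_by_location, threshold=0.5):
    """Return {location: [periods]} where the number of salary payments
    falls below threshold x the location's own median count."""
    flagged = {}
    for location, series in payments_by_location.items():
        baseline = median(series.values())
        low = [period for period, count in series.items()
               if count < threshold * baseline]
        if low:
            flagged[location] = low
    return flagged

# Hypothetical counts of transactions classified as salary or wage payments.
payments = {
    "Town A": {"2019-01": 1000, "2019-02": 1010, "2019-03": 320, "2019-04": 990},
    "Town B": {"2019-01": 500, "2019-02": 510, "2019-03": 495, "2019-04": 505},
}

print(flag_interruptions(payments))  # {'Town A': ['2019-03']}
```

A flagged period is only a signal: a holiday shutdown or a change of bank by a large employer would look the same, so each flag would need checking against other sources.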


Matching Jobs to skills

Source: Universities, VTCs (SAQA)
Frequency: Annual
Type: Admin Data
What: Supply of graduates by field and date

Source: Company Online Portal, Company Survey, Skills Audit, tracer studies
Frequency: Occasional
Type: Survey / Demand driven
What: Vacancies by skill, address, date

Source: Labour Force Survey
Frequency: Annual / Occasional
Type: Survey
What: Number of unemployed by age, gender, work experience, skills, education and location
Enrichment: Paying Social Security

Source: Social Security
Frequency: Continuous
Type: Admin Data
What: Number of unemployed
Data details: address, age, gender, years of work experience, education, skills

Combining Social Security with Labour Force Survey Data

[Diagram: 2015 LFS (estimate relationship) and 2020 LFS (re-calibrate); between the two surveys, quarterly social security data is used to interpolate employment and unemployment numbers]
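
The calibrate, interpolate and re-calibrate cycle could be sketched as follows once both surveys are available. The numbers are hypothetical, and the choice to interpolate the ratio of LFS unemployed to registered unemployed linearly between the two survey dates is an assumption made here for illustration.

```python
def to_time(year, quarter):
    """Convert (year, quarter) to a decimal year, e.g. (2016, 3) -> 2016.5."""
    return year + (quarter - 1) / 4

def ratio_at(t, t0, r0, t1, r1):
    """Linearly interpolate the calibration ratio between two survey dates."""
    return r0 + (r1 - r0) * (t - t0) / (t1 - t0)

# Hypothetical survey benchmarks: unemployed according to the LFS.
lfs_unemployed = {(2015, 1): 300_000, (2020, 1): 360_000}

# Hypothetical quarterly social security counts of registered unemployed.
registered = {(2015, 1): 100_000, (2016, 3): 110_000,
              (2018, 1): 125_000, (2020, 1): 150_000}

# Calibrate: ratio of LFS unemployed to registered unemployed at each survey.
t0, t1 = to_time(2015, 1), to_time(2020, 1)
r0 = lfs_unemployed[(2015, 1)] / registered[(2015, 1)]
r1 = lfs_unemployed[(2020, 1)] / registered[(2020, 1)]

# Interpolate: apply the time-varying ratio to each quarterly count.
estimates = {
    period: ratio_at(to_time(*period), t0, r0, t1, r1) * count
    for period, count in registered.items()
}

for period, value in estimates.items():
    print(period, round(value))
```

By construction the estimates reproduce the LFS figures in the two survey quarters, while the quarters in between follow the movement of the social security series.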

Impact of raising minimum wages in SA

‣ Asking businesses? They will always say it will reduce jobs
‣ Triangulation:
- Bank and Tax Data: once enriched with demographic info, profile of lower salary spectrum
- Labour Force Survey Data: profile informal wage earners close to minimum threshold
- Determine the gap between the 2 profiles
- Who are those that are below the minimum wage?
- How likely would a higher minimum wage make them formal?
- Who would benefit?
- How strong will the incentive be to remain informal?

‣ Big Data allows measuring the impact and allows quick reversal if harmful
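
A first step in comparing the two profiles might look like the sketch below. The wage lists, the threshold of 3500 and the helper `below_minimum` are hypothetical; real profiles would also break the results down by the demographic variables mentioned above.

```python
def below_minimum(wages, minimum_wage):
    """Share of earners below the minimum wage and their average shortfall."""
    below = [w for w in wages if w < minimum_wage]
    share = len(below) / len(wages)
    avg_gap = sum(minimum_wage - w for w in below) / len(below) if below else 0.0
    return share, avg_gap

# Hypothetical monthly wages in local currency.
formal_low_salaries = [3400, 3500, 3600, 3900, 4200]   # bank and tax data
informal_wages = [1800, 2500, 3000, 3200, 3600]        # LFS informal wage earners
minimum_wage = 3500                                    # hypothetical threshold

for label, wages in [("formal", formal_low_salaries), ("informal", informal_wages)]:
    share, avg_gap = below_minimum(wages, minimum_wage)
    print(f"{label}: {share:.0%} below minimum, average shortfall {avg_gap:.0f}")
```

The difference between the two shares and shortfalls is one simple way to express the gap between the profiles.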


Can Big Data be used to fill gaps (interpolate key statistics)?

Yes

Can Big Data be used to reduce frequency of surveys?

Can Big Data be used to make statistics more accurate?

Can Big Data be used to reduce sample size?

We will only know once we try


All this assumes access can be negotiated

Thank you

Dr. Christoph Stork

christoph@researchictsolutions.com www.researchictsolutions.com
