
Joint UNECE/Eurostat Work Session on Statistical Data Confidentiality, 5-7 October 2015, Helsinki...

Page 1: Joint UNECE/Eurostat Work Session on Statistical Data Confidentiality, 5-7 October 2015, Helsinki. Beata Nowok, Administrative Data Research Centre – Scotland.
Page 2:

What is synthpop?

A software tool for producing synthetic versions of sensitive microdata

Administrative Data Research Centre - Scotland | Beata Nowok | 5-7 October 2015

Page 3:

Observed (input)

Sex     Age  Education             Marital status  Income  Life satisfaction
FEMALE  57   VOCATIONAL/GRAMMAR    MARRIED         800     PLEASED
MALE    41   SECONDARY             UNMARRIED       1500    MIXED
FEMALE  18   VOCATIONAL/GRAMMAR    UNMARRIED       NA      PLEASED
FEMALE  78   PRIMARY/NO EDUCATION  WIDOWED         900     MIXED
FEMALE  54   VOCATIONAL/GRAMMAR    MARRIED         1500    MOSTLY SATISFIED
MALE    20   SECONDARY             UNMARRIED       -8      PLEASED
FEMALE  39   SECONDARY             MARRIED         2000    MOSTLY SATISFIED
MALE    39   SECONDARY             MARRIED         1197    MIXED
FEMALE  38   VOCATIONAL/GRAMMAR    MARRIED         NA      MOSTLY DISSATISFIED
FEMALE  73   VOCATIONAL/GRAMMAR    WIDOWED         1700    PLEASED
FEMALE  54   SECONDARY             WIDOWED         2000    MOSTLY SATISFIED
MALE    30   VOCATIONAL/GRAMMAR    UNMARRIED       900     MOSTLY SATISFIED
MALE    68   SECONDARY             MARRIED         -8      DELIGHTED
MALE    61   PRIMARY/NO EDUCATION  MARRIED         -8      MIXED

Synthetic (output)

Sex     Age  Education             Marital status  Income  Life satisfaction
MALE    81   PRIMARY/NO EDUCATION  MARRIED         2100    PLEASED
MALE    54   VOCATIONAL/GRAMMAR    MARRIED         1700    PLEASED
FEMALE  32   VOCATIONAL/GRAMMAR    DIVORCED        870     MIXED
FEMALE  98   PRIMARY/NO EDUCATION  MARRIED         800     MOSTLY DISSATISFIED
FEMALE  50   PRIMARY/NO EDUCATION  MARRIED         NA      MOSTLY SATISFIED
FEMALE  37   VOCATIONAL/GRAMMAR    MARRIED         158     PLEASED
MALE    28   VOCATIONAL/GRAMMAR    NA              1500    MOSTLY SATISFIED
FEMALE  62   PRIMARY/NO EDUCATION  MARRIED         830     MOSTLY SATISFIED
MALE    78   PRIMARY/NO EDUCATION  MARRIED         NA      PLEASED
FEMALE  29   SECONDARY             MARRIED         580     MOSTLY SATISFIED
MALE    59   PRIMARY/NO EDUCATION  MARRIED         1300    MOSTLY SATISFIED
MALE    41   SECONDARY             UNMARRIED       1500    MIXED
MALE    18   SECONDARY             UNMARRIED       -8      PLEASED
FEMALE  73   PRIMARY/NO EDUCATION  WIDOWED         1350    MOSTLY SATISFIED

Data that look (structurally) like original data but contain artificial units only

Page 4:

Generating synthetic data: method

Sequentially replacing original data values with synthetic values generated from conditional probability distributions

[Diagram: a model Yj ~ (Y0, Y1, ..., Yj−1) is fitted ("fit") to the observed data, and synthetic values are drawn ("draw") from it.]
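
The fit-and-draw idea on this slide can be illustrated with a minimal base-R sketch. It is not synthpop's implementation: it uses a toy two-variable data set and a simple normal linear model instead of CART, and every name in it (obs, syn_y0, syn_y1, synthetic) is hypothetical.

```r
# Minimal base-R sketch of sequential synthesis.
# Assumption: a parametric (normal linear) model, NOT the CART default of synthpop.
set.seed(1)

# Toy "observed" data: two correlated numeric variables (made up for illustration).
n   <- 500
y0  <- rnorm(n, mean = 50, sd = 10)
y1  <- 2 * y0 + rnorm(n, sd = 5)
obs <- data.frame(y0 = y0, y1 = y1)

# First variable: it has no predictors, so resample it from the observed values.
syn_y0 <- sample(obs$y0, size = n, replace = TRUE)

# Fit: model y1 given y0 on the OBSERVED data.
fit <- lm(y1 ~ y0, data = obs)

# Draw: predict from the SYNTHETIC y0 and add noise with the fitted residual spread.
mu     <- predict(fit, newdata = data.frame(y0 = syn_y0))
syn_y1 <- rnorm(n, mean = mu, sd = summary(fit)$sigma)

synthetic <- data.frame(y0 = syn_y0, y1 = syn_y1)

# The relationship between the variables should be roughly preserved.
cor(obs$y0, obs$y1)
cor(synthetic$y0, synthetic$y1)
```

With more variables the same two steps repeat: each Yj is modelled on the observed Y0, ..., Yj−1 and then drawn using the already synthesised predecessors.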

Page 5:

http://cran.r-project.org/package=synthpop

Generating synthetic versions of sensitive microdata for statistical disclosure control

Page 6:
Page 7:
Page 8:

Generating synthetic data: synthpop

[Diagram: observed data -> syn() -> synthetic data]

Page 9:

Generating synthetic data: synthpop

Synthesis can be run with default parameters (CART – Classification and Regression Trees)

syn(data)
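
A default run might look like the sketch below. It assumes the SD2011 survey data set that ships with synthpop and a column subset chosen to mirror the example table earlier in the slides; the seed value is arbitrary and the object names (ods, sds) are illustrative only.

```r
library(synthpop)

# SD2011 is bundled with synthpop; this column subset is an assumption
# made to resemble the observed/synthetic table shown earlier.
vars <- c("sex", "age", "edu", "marital", "income", "ls")
ods  <- SD2011[, vars]

# Default synthesis: no method is specified, so the package defaults (CART) apply.
sds <- syn(ods, seed = 2015)   # seed chosen arbitrarily for reproducibility

# The result is an object of class "synds"; the synthetic data frame is in sds$syn.
head(sds$syn)
summary(sds)
```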


Page 10:
Page 11:
Page 12:

syn() & common data problems

Missing-data codes: cont.na
  categorical variables: additional factor level(s)
  continuous variables: specified by cont.na and modelled separately

Semi-continuous variables: semicont

Restricted values (interrelationships between variables): rules & rvalues

Linear constraints: denom

Non-negativity / non-normality: method set to ‘lognorm’, ‘sqrtnorm’ or ‘cubertnorm’

Deterministic relations: method set to “~I(…)”
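
As a hedged illustration of two of these options, the sketch below passes a missing-data code through cont.na and a restriction through rules and rvalues. It assumes the bundled SD2011 data, where income uses -8 as a missing code (as in the example table) and where the marital level for never-married respondents is named "SINGLE" in current package versions; the age threshold is an illustrative choice.

```r
library(synthpop)

# Assumed column subset of the bundled SD2011 data; age comes before marital,
# so age is already synthesised when the rule for marital is applied.
ods <- SD2011[, c("sex", "age", "edu", "marital", "income", "ls")]

sds <- syn(
  ods,
  cont.na = list(income = c(NA, -8)),     # treat -8 as a missing-data code for income
  rules   = list(marital = "age < 18"),   # restriction: records with age below 18 ...
  rvalues = list(marital = "SINGLE"),     # ... get this fixed marital status
  seed    = 2015                          # arbitrary seed
)

# Inspect marital status among synthetic under-18s.
table(sds$syn$marital[sds$syn$age < 18], useNA = "ifany")
```

The other options on the slide (semicont, denom, the transformed-normal methods and "~I(…)") follow the same pattern of extra arguments to syn() and are not shown here.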

Page 13:

syn()

Page 14:

Overview of synthpop functions

[Diagram: functions linking observed and synthetic data]

read.obs(), syn(), write.syn(), sdc()

descriptive: compare.synds(), summary.synds()

models: glm.synds(), summary.fit.synds(), compare.fit.synds()

data structure: utility.synds()
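
A rough sketch of how some of these functions fit together is given below. It assumes the bundled SD2011 data and calls the S3 generics compare() and summary(), which dispatch to the .synds and .fit.synds methods named on the slide; the model formula is purely illustrative. utility.synds() is left out because later package versions may expose it under a different name.

```r
library(synthpop)

# Assumed column subset of the bundled SD2011 data.
ods <- SD2011[, c("sex", "age", "edu", "marital", "income", "ls")]
sds <- syn(ods, cont.na = list(income = c(NA, -8)), seed = 2015)

# Descriptive check: observed vs synthetic distributions for selected variables.
compare(sds, ods, vars = c("sex", "edu"))

# Model-based check: fit the same model to the synthetic data ...
fit.syn <- glm.synds(sex ~ age + edu + marital, family = "binomial", data = sds)
summary(fit.syn)

# ... and compare its estimates with those obtained from the observed data.
compare(fit.syn, ods)
```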

Page 15:

compare()

Page 16:

compare()

Page 17:

compare()

Page 18:

utility.synds()

Page 19:

sdc() & statistical disclosure control

Data labelling: label

Removing replicated uniques: rm.replicated.uniques

Bottom- and top-coding: recode.vars, bottom.top.coding, recode.exclude

At synthesis stage: smoothing, minbucket
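
The post-synthesis options might be combined as in the sketch below. It assumes the bundled SD2011 data; the label text and the top-code of 5000 are arbitrary illustrative values, and -8 is excluded from recoding because it is a missing-data code in the example data.

```r
library(synthpop)

# Assumed column subset of the bundled SD2011 data.
ods <- SD2011[, c("sex", "age", "edu", "marital", "income", "ls")]
sds <- syn(ods, cont.na = list(income = c(NA, -8)), seed = 2015)

sds.sdc <- sdc(
  sds, ods,
  label                 = "FAKE_DATA",   # marks the records as synthetic
  rm.replicated.uniques = TRUE,          # drop synthetic records that replicate unique real ones
  recode.vars           = "income",      # variable to bottom-/top-code
  bottom.top.coding     = c(NA, 5000),   # no bottom code, top code at 5000 (illustrative)
  recode.exclude        = -8             # leave the missing-data code untouched
)

# After top-coding, income should not exceed the chosen top code.
max(sds.sdc$syn$income, na.rm = TRUE)
```

The smoothing and minbucket options from the last item apply when syn() is run, not in sdc(), and are not shown in this sketch.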

Page 20:

sdc()

Page 21:

Conclusions

The synthpop package for R:

facilitating generation, evaluation and analysis of synthetic data
