Date post: 28-Mar-2015
CAPRI CCSR Analysis of Information Loss: a Case Study From a UK Survey Mark Elliot Kingsley Purdam Confidentiality and Privacy Group (CAPRI) CCSR, University of Manchester

CAPRI CCSR

Analysis of Information Loss: a Case Study From a UK Survey

Mark Elliot

Kingsley Purdam

Confidentiality and Privacy Group (CAPRI)

CCSR, University of Manchester

Outline

• Concepts

• General Method

• Results

Concepts

• Analytical Completeness

– Effects of Recodes

• Analytical Validity

– Effects of Perturbations

General Method

• Select a sample of publications

• Contact authors

• Phase 1: Questionnaire

• Phase 2: Rerun of studies

Completeness Study

Example Recodes

• Age recoded from single years to five-year bands.

• Area removed from data set but region left in.

• Ethnicity recoded from 10 to 4 categories:

– a. White

– b. Black

– c. Asian

– d. Other
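
The first two recodes above can be sketched in a few lines of Python. This is only an illustration: the function names, the field names and the sample record are hypothetical and do not come from the survey files.

```python
def band_age(age: int, width: int = 5) -> str:
    """Recode a single-year age into a band label such as '35-39'."""
    if age < 0:
        raise ValueError("age must be non-negative")
    lower = (age // width) * width
    return f"{lower}-{lower + width - 1}"


def drop_area(record: dict) -> dict:
    """Remove the detailed area code but keep every other field (e.g. region)."""
    return {key: value for key, value in record.items() if key != "area"}


# Hypothetical record: field names and values are made up for illustration.
record = {"age": 37, "area": "A123", "region": "North West"}
recoded = drop_area(record)
recoded["age"] = band_age(record["age"])
print(recoded)  # {'age': '35-39', 'region': 'North West'}
```

Calling `band_age(37, width=10)` gives the coarser ten-year band `'30-39'`, the other age recode considered in the study.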

Percentages shown: 30.4%, 8.7%, 17.4%, 21.7%, 21.7%

Legend categories: 13+, 10-12, 7-9, 4-6, 1-3

Number of recodes impacting on analyses per author.

Percentages shown: 4.3%, 8.7%, 17.4%, 30.4%, 17.4%, 21.7%

Legend categories: 13+, 10-12, 7-9, 4-6, 1-3, 0

Number of recodes severely impacting on analyses per author.

Other                 17.4%
Severely affected     34.8%
Moderately affected    8.7%
Not affected          39.1%

Percentage of authors giving each category of response on whether removing area while retaining region would affect their analyses.

Severely affected     34.8%
Moderately affected   43.5%
Not affected          21.7%

Percentage of authors giving each category of response on whether recoding age into ten-year bands would affect their analyses.

• Utility index.

Utility = %none + (%moderate + %other)/2

• No great claims made about this but

– useful way of summarising results and

– can be compared to disclosure risk impact (using DIS: Skinner and Elliot 2003).
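
As a worked check, the formula can be applied to the author responses reported for two of the recodes (removing area while retaining region, and recoding age into ten-year bands). The sketch below is illustrative: the function name is hypothetical, and treating a missing "other" share as 0 is an assumption.

```python
def utility_index(pct_none: float, pct_moderate: float, pct_other: float = 0.0) -> float:
    """Utility = %none + (%moderate + %other) / 2, with all inputs as percentages."""
    return pct_none + (pct_moderate + pct_other) / 2


# Response percentages taken from the two author-response charts.
area_to_region = utility_index(pct_none=39.1, pct_moderate=8.7, pct_other=17.4)
age_ten_year = utility_index(pct_none=21.7, pct_moderate=43.5)  # no "other" category shown

print(round(area_to_region), round(age_ten_year))  # 52 43
```

Both values agree with the utility indices listed for "Area, 12 regions" (52) and "Age, Ten year bands" (43).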

Variable                            To                 Utility index
Age                                 Five year bands    74
Age                                 Ten year bands     43
Area                                12 regions         52
Area                                4 countries        59
Country of birth                    2 categories       67
Country of birth                    4 categories       63
Ethnic group                        4 categories       59
Distance of move                    3 categories       83
Distance to work                    5 categories       91
Primary economic status             4 categories       63
Secondary economic status           Omit               87
Family type                         3 categories       61
Work hours                          4 bands            93
Work hours                          Top coded at 50    93
Tenure                              3 categories       65
Industry                            9 categories       91
Marital status                      3 categories       74
Occupation                          9 categories       83
Number of highest qualification     Omit               78
Level of highest qualification      2 categories       80
Subject of highest qualification    Omit               91
Relationship to household head      4 categories       83
Socio-economic group                Omit               61
Term time address                   Omit               96
Method of transport to work         5 categories       96
Work place                          Omit               89
Number of cars in household         3 categories       89
Dwelling space type                 5 categories       93
Number of residents per room        3 categories       89

Utility index for the data after recode

A DIS analysis showing the probability of a correct match given a unique match of the SARs, using a base key (basic = age94, sex2, marital status5) plus a selection of other variables, before and after recoding

Key                               Recoded variable   Categories (before->after)   SARs    Recoded   Impact
Area,Age,sex,mstatus,Occupation   OCCUPATION         74->10                       0.055   0.025     0.459
Area,Age,sex,mstatus,Industry     INDUSTRY           63->10                       0.049   0.026     0.524
Area,Age,sex,mstatus,hours        HOURS              73->50                       0.044   0.038     0.864
Area,Age,sex,mstatus,cobirth      COBIRTH            42->2                        0.041   0.038     0.927
Area,Age,sex,mstatus,primecon     PRIMECON           10->4                        0.028   0.021     0.766
Area,Age,sex,mstatus,tenure       TENURE             10->3                        0.028   0.022     0.802
Area,Age,sex,mstatus,ethnic       ETHNIC             10->4                        0.023   0.020     0.870
Area,Age,sex,mstatus,primecon     Age                93->10                       0.028   0.020     0.726
Area,Age,sex,mstatus,primecon     Age                93->20                       0.028   0.021     0.753
Region,Age,sex,mstatus,primecon   Geography          273->12                      0.028   0.020     0.711

Table 3: Relationship between utility index and disclosure risk impact

Variable                  From            To                Utility index (UI)   Disclosure risk impact (DRI)   UI/(DRI*100)
Age                       Single years    Five year bands   74                   0.75                           0.98
Age                       Single years    Ten year bands    43                   0.73                           0.59
Area                      278 areas       12 regions        52                   0.71                           0.73
Country of birth          42 categories   2 categories      67                   0.93                           0.72
Ethnic group              10 categories   4 categories      59                   0.87                           0.68
Primary economic status   10 categories   4 categories      63                   0.77                           0.82
Work hours                Single hours    Top coded at 50   93                   0.86                           1.08
Industry                  61 categories   9 categories      91                   0.52                           1.74
Occupation                73 categories   9 categories      83                   0.46                           1.81
Tenure                    10 categories   3 categories      65                   0.80                           0.81
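
The last column of Table 3 can be recomputed from the UI and DRI columns. The sketch below is illustrative only: the function name is hypothetical, and because it starts from the rounded UI and DRI values printed in the table, a result can differ from the printed ratio in the last digit.

```python
def utility_per_risk(ui: float, dri: float) -> float:
    """Return UI / (DRI * 100), the ratio shown in the last column of Table 3."""
    if dri <= 0:
        raise ValueError("dri must be positive")
    return ui / (dri * 100)


# (UI, DRI) pairs copied from three rows of Table 3.
rows = {
    "Age, ten year bands": (43, 0.73),
    "Work hours, top coded at 50": (93, 0.86),
    "Occupation, 9 categories": (83, 0.46),
}

for label, (ui, dri) in rows.items():
    print(f"{label}: {utility_per_risk(ui, dri):.2f}")
```

In the table, Industry and Occupation have the largest ratios (1.74 and 1.81), and Age recoded to ten year bands has the smallest (0.59).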

Validity Study

ARGUS: Problems and Resolutions

• Key variable selection problematic.

– Not able to use Elliot and Dale (1999) scenario keys.

• Individual risk model doesn’t work on un-weighted data.

• Not able to block certain missing values from use.

Perturbed File 1

File with suppressions.

• All two dimensional tables.

• Three dimensional tables under scenarios.

Perturbed File 2

PRAMed file

• All variables PRAMed, levels set to maintain univariate distributions
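
To illustrate what maintaining univariate distributions can mean for PRAM (the post-randomisation method), here is a minimal sketch of one simple invariant scheme. It is an assumption made for illustration, not the ARGUS procedure or the settings used in the study; the function name, the `keep_prob` parameter and the sample column are hypothetical.

```python
import random
from collections import Counter


def pram_invariant(values, keep_prob=0.8, seed=None):
    """Perturb a categorical column with a simple invariant PRAM scheme.

    Each value is kept with probability keep_prob; otherwise it is replaced
    by a draw from the column's own empirical distribution pi. The implied
    transition matrix P = keep_prob * I + (1 - keep_prob) * 1 * pi^T
    satisfies pi^T P = pi^T, so the expected univariate distribution is
    unchanged (individual samples will still vary).
    """
    if not 0.0 <= keep_prob <= 1.0:
        raise ValueError("keep_prob must be in [0, 1]")
    rng = random.Random(seed)
    counts = Counter(values)
    categories = list(counts)
    weights = [counts[c] for c in categories]
    perturbed = []
    for value in values:
        if rng.random() < keep_prob:
            perturbed.append(value)  # value left as it is
        else:
            # Redraw from the empirical distribution (may return the same value).
            perturbed.append(rng.choices(categories, weights=weights, k=1)[0])
    return perturbed


# Hypothetical tenure column, made up for illustration.
tenure = ["own"] * 60 + ["rent"] * 30 + ["other"] * 10
perturbed = pram_invariant(tenure, keep_prob=0.8, seed=1)
print(Counter(tenure))
print(Counter(perturbed))
```

The counts after perturbation stay close to the originals on average, but the link between a record and its value is weakened, which is why analyses on small sub-sections of a file can be affected more than whole-file summaries.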

Perturbed File 3

1. Unperturbed! Control File.

Perturbed File 4

1. PRAMed as file 2.

2. Suppressions

• All two dimensional tables.

Overview of Results

• Basic analyses on the whole SAR (cross-tabs, correlations, simple regressions) lead to fairly consistent interpretations. However, some problems remain.

• Problems arise for all three perturbed files for more complex analyses and/or those involving sub-sections of the file (e.g. one geographical area).

Author/researcher descriptions of the effect of perturbations, by perturbation method used. Ten example studies.

                              Effect of perturbation
File   Perturbation method    None   Moderate   Severe
A      Suppressions           5      5          0
B      PRAM                   2      7          1
C      None                   10     0          0
D      Both                   1      5          4

Conclusions

• Study introduces methods for measuring the utility impact of disclosure control measures.

• The relationship between utility measures and disclosure risk measures represents the cost-benefit equation of disclosure control.

