Date post: 28-Mar-2015
The methodology used for the 2001 SARs Special Uniques Analysis Mark Elliot Anna Manning Confidentiality And Privacy Group (www.capri.man.ac.uk) University of Manchester

The methodology used for the 2001 SARs Special Uniques Analysis

Mark Elliot
Anna Manning

Confidentiality And Privacy Group (www.capri.man.ac.uk)
University of Manchester


Overview

• Description of DIS
• Description of SUDA
• Description of DIS-SUDA
• Numerical Study


Data Intrusion Simulation (DIS)

• Uses microdata set itself to estimate risk at the file level

• Provides estimates of matching probabilities, in particular the probability of a correct match given a unique match: pr(cm|um).

• Special method: sub-sampling and re-sampling.

• General method: derivation from the partition structure of the microdata file.
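The general method can be sketched from the counts of equivalence classes in the sample. The slides do not reproduce the formula, so the sketch below assumes the estimator form given in Skinner and Elliot (2002): pr(cm|um) is approximated by πU / (πU + (1 − π)P), where π is the sampling fraction, U the number of sample uniques and P the number of records in sample pairs. The function name and the tuple-per-record layout are illustrative assumptions.

```python
from collections import Counter


def dis_general(records, sampling_fraction):
    """Estimate pr(cm|um) from the partition structure of a microdata sample.

    `records` is a list of hashable key tuples, one per sample record.
    Assumed estimator form: pi*U / (pi*U + (1 - pi)*P).
    """
    class_sizes = Counter(records)  # size of each equivalence class
    uniques = sum(1 for size in class_sizes.values() if size == 1)
    in_pairs = sum(2 for size in class_sizes.values() if size == 2)
    numerator = sampling_fraction * uniques
    denominator = numerator + (1.0 - sampling_fraction) * in_pairs
    return numerator / denominator if denominator else float("nan")
```

Only classes of size one and two enter the estimate, so no re-sampling is needed.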


The DIS Method

Remove a small number of records from the microdata sample


The DIS Method II

Copy back a random number of the removed records (at a probability equivalent to the original sampling fraction)


The DIS Method III

Match the removed fragment against the truncated microdata file
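The three steps of the special method (remove, copy back, match) can be sketched as a small simulation. This is a minimal illustration, not the authors' implementation: the function names, the tuple-per-record layout and the default number of repeats are assumptions.

```python
import random
from collections import Counter


def dis_special_run(records, sampling_fraction, n_remove, rng):
    """One remove / copy-back / match cycle; returns (correct, unique) counts."""
    # Step I: remove a small number of records from the sample.
    removed = rng.sample(range(len(records)), n_remove)
    removed_set = set(removed)
    # Step II: copy each removed record back with probability equal
    # to the original sampling fraction.
    copied_back = {i for i in removed if rng.random() < sampling_fraction}
    truncated = [i for i in range(len(records))
                 if i not in removed_set or i in copied_back]
    # Step III: match the removed fragment against the truncated file.
    key_counts = Counter(records[i] for i in truncated)
    unique = correct = 0
    for i in removed:
        if key_counts[records[i]] == 1:  # exactly one matching record
            unique += 1
            if i in copied_back:         # the single match is the record itself
                correct += 1
    return correct, unique


def estimate_pr_cm_um(records, sampling_fraction, n_remove, repeats=100, seed=0):
    """Pool several runs and return correct matches / unique matches."""
    rng = random.Random(seed)
    correct = unique = 0
    for _ in range(repeats):
        c, u = dis_special_run(records, sampling_fraction, n_remove, rng)
        correct += c
        unique += u
    return correct / unique if unique else float("nan")
```

A unique match is correct only when the removed record was copied back and nothing else in the truncated file shares its key; a removed record that was not copied back but matches exactly one other record is a false match.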


DIS Validation

• Numerical studies using population data show no bias and small error; Elliot (2000)

• Statistical validation; Skinner and Elliot (2002)


Levels of Risk Analysis

• DIS
– Works at the file level
– Very good for comparative analyses
  • E.g. Small Area Microdata (SAM); Tranmer et al. (2003)
• BUT: record-level risk is important
– Variations in risk topography
– Risky records


Special Uniques

• Original concept: Elliot, Skinner & Dale (1998)
– Counterintuitive geographical effect indicated two types of sample uniques
– Random and special
– Special
  • Demographic peculiarity
– Random
  • Effect of sampling and variable definition


Special Uniques Definitions

• Changing definition:
1. Sample uniques which remain unique despite geographical aggregation.
2. Sample uniques which remain unique through any variable aggregation.
3. Sample uniques on a small number of key variables.


Theoretical and empirical properties of special and random uniques

Property                                       Special Uniques                                    Random Uniques
Cognitive
  Spontaneously recognisable                   Maybe                                              No
  Appear unusual                               Maybe                                              No
Data-analytical
  Sensitive to removal of key variables        No, except if one of the contributing variables    Yes
  Sensitive to changes in sampling fraction    No                                                 Yes
  Sensitive to changes in geographical detail  No                                                 Yes
  Uniqueness arises from                       Small number of variables                          Large number of variables


Special Uniques: Issues

Problem: how to look at all the variables?

– File may contain hundreds
– Combinatorial explosion
– Data storage issues
  (1) Storage requirements for locating minimal sample unique patterns (MSUs)
  (2) Storage of results for post-processing
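The scale of the combinatorial explosion is easy to quantify: a key of n variables has 2^n − 1 non-empty variable subsets, each a candidate pattern to test for uniqueness. The helper below is an illustrative sketch; its name and the optional size cap are assumptions.

```python
from math import comb


def n_variable_subsets(n_vars, max_size=None):
    """Count the non-empty variable subsets of size at most `max_size`."""
    max_size = n_vars if max_size is None else max_size
    return sum(comb(n_vars, k) for k in range(1, max_size + 1))
```

For a 12-variable key this gives 4095 subsets per record, while a file with hundreds of variables is far beyond exhaustive enumeration unless the subset size is capped.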


HIPERSTAD

Use of high performance computing
– Enables comprehensive analysis of patterns of uniqueness within each record
– Has allowed investigation of more complex grading systems


Risk Signatures

Example
– Unique pairs 3
– Unique triples 2
– Unique fourfolds 0
– Unique fivefolds 1
– Unique sixfolds 0
– Unique sevenfolds 0
– ………


An example of MSUs at record level

Size 2    Size 3      Size 5
(1,2)     (1,6,9)     (2,5,6,8,11)
(1,5)     (5,8,12)
(1,8)
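A risk signature is simply a count of a record's MSUs by size. The sketch below derives it from the MSUs listed in this example; the function name and the dictionary output are illustrative assumptions.

```python
from collections import Counter


def risk_signature(msus, max_size):
    """Return {size: number of MSUs of that size} for sizes 1..max_size."""
    sizes = Counter(len(msu) for msu in msus)
    return {size: sizes.get(size, 0) for size in range(1, max_size + 1)}


# MSUs of the example record, as variable-index tuples.
example_msus = [(1, 2), (1, 5), (1, 8), (1, 6, 9), (5, 8, 12), (2, 5, 6, 8, 11)]
signature = risk_signature(example_msus, max_size=7)
```

Here the signature has three unique pairs, two unique triples and one unique fivefold, with every other size at zero.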


(1) (2) (3) (4) (5) (6) (7) (8) (9) (10) (11) (12)

(1,2) (1,5) (1,6) (1,8) (1,9) (2,5) (2,6) (2,8) (2,11) (5,6) (5,8) (5,11) (5,12) (6,8) (6,9) (6,11) (8,11) (8,12)

(1,6,9) (2,5,6) (2,5,8) (2,5,11) (2,6,8) (2,6,11) (2,8,11) (5,6,8) (5,6,11) (5,8,11) (5,8,12) (6,8,11)

(2,5,6,8) (2,5,6,11) (2,5,8,11) (2,6,8,11) (5,6,8,11)

(2,5,6,8,11)

Figure 1: Lattice structure showing minimal uniques
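The lattice can be searched bottom-up: a variable subset is a minimal sample unique for a record if the record is unique on it and none of its proper subsets is already unique. The brute-force sketch below illustrates that definition only; it is not the SUDA algorithm itself, and the function name and zero-based column indices are assumptions.

```python
from itertools import combinations


def minimal_sample_uniques(sample, record_index, max_size=None):
    """List the MSUs of one record as tuples of column indices."""
    n_vars = len(sample[0])
    max_size = n_vars if max_size is None else max_size
    target = sample[record_index]
    msus = []
    for size in range(1, max_size + 1):
        for cols in combinations(range(n_vars), size):
            # A superset of a known MSU is unique but not minimal: skip it.
            if any(set(msu) <= set(cols) for msu in msus):
                continue
            matches = sum(
                1 for row in sample
                if all(row[c] == target[c] for c in cols)
            )
            if matches == 1:  # only the record itself matches
                msus.append(cols)
    return msus
```

Because subsets are visited in order of increasing size, every unique subset that survives the superset check is minimal. The cost grows with the number of subsets, which is why exhaustive search on wide files needs the high performance computing described earlier.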


Numerical Study

• Elliot et al. (2002) show a strong relationship between the SUDA output score (essentially a measure of the proportion of the lattice that is unique) and population equivalence class size

• However, SUDA’s output score is ad hoc. Two SUDA output scores from different analyses do not mean the same thing.


DIS-SUDA

• DIS and SUDA outputs both relate to the underlying partition structure in the population.

• However, relating the two is tricky as SUDA is ad hoc.

• The method we have developed involves first running DIS to calibrate SUDA


DIS-SUDA

• It exploits the fact that DIS accurately estimates the mean reciprocal equivalence class.

– this can be used to derive the number of population units corresponding to the sample uniques.

– which can then be distributed using the SUDA score.


DIS-SUDA

LET D = the DIS matching probability pr(cm|um)
LET U = the number of sample uniques
LET K = the number of variables in the key
LET S = SUDA output metric
LET Q = 1 + (8 - K)/20
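The transcript defines D, U, K, S and Q but does not reproduce the formula that combines them, so the sketch below is not the authors' DIS-SUDA formula. It only illustrates the calibration idea under a simple assumption: treat D × U as the total matching-probability mass over the sample uniques and share it out in proportion to each record's SUDA score S. K and Q are deliberately ignored, and the function name is hypothetical.

```python
def allocate_by_suda(dis_probability, suda_scores):
    """Share D * U across sample uniques in proportion to their SUDA scores.

    Illustrative proportional allocation only; not the published formula.
    """
    n_uniques = len(suda_scores)                # U
    total_mass = dis_probability * n_uniques    # D * U
    score_sum = sum(suda_scores)
    if score_sum == 0:
        # No information in the scores: fall back to the file-level value.
        return [dis_probability] * n_uniques
    return [total_mass * s / score_sum for s in suda_scores]
```

By construction the mean of the allocated values equals D, so the record-level scores stay consistent with the file-level DIS estimate. Values above 1 are possible under this naive scheme and would need capping in practice.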


DIS-SUDA Evaluation

• 1991 census data used
• Geographical area with a population of approximately 0.5m
• 50 parallel geographically stratified 2% samples drawn
• 12 key variables
• Restricted to variables coded at 100% in 1991
• DIS-SUDA run across all 50 samples
• Results summed across the 50 samples
• Compare DIS-SUDA scores with population uniques and 1/Fj
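The comparison in the last step needs, for every sample unique, its population equivalence class size Fj. The sketch below shows one way to obtain the two ground-truth quantities (population uniqueness and 1/Fj) when the full population is available. It uses simple Bernoulli sampling rather than the geographically stratified design of the study, and the function name and record layout are assumptions.

```python
import random
from collections import Counter


def evaluate_sample_uniques(population, sampling_fraction, seed=0):
    """Return (key, is_population_unique, 1/Fj) for each sample unique."""
    rng = random.Random(seed)
    # Simplification: independent Bernoulli sampling, not stratified.
    sample = [rec for rec in population if rng.random() < sampling_fraction]
    sample_counts = Counter(sample)          # sample class sizes
    population_counts = Counter(population)  # population class sizes Fj
    return [
        (key, population_counts[key] == 1, 1.0 / population_counts[key])
        for key, count in sample_counts.items()
        if count == 1
    ]
```

Grouping these tuples by banded DIS-SUDA score and averaging would give the percentage of population uniques and the mean reciprocal equivalence class per band.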


Percentage of records population unique by DIS-SUDA score (rounded up to one decimal place).

DIS-SUDA Score    Population Non-Uniques    Population Uniques
0.0-0.1           99.80%                    0.20%
0.1-0.2           96.50%                    3.50%
0.2-0.3           95.60%                    4.40%
0.3-0.4           88.00%                    12.00%
0.4-0.5           84.80%                    15.20%
0.5-0.6           73.70%                    26.30%
0.6-0.7           55.10%                    44.90%
0.7-0.8           48.30%                    51.70%
0.8-0.9           25.00%                    75.00%
0.9-1.0           8.70%                     91.30%


Mean reciprocal population equivalence class by DIS-SUDA score (grouped)

DIS-SUDA Score    Mean reciprocal equivalence class
0.0-0.1           0.04
0.1-0.2           0.13
0.2-0.3           0.17
0.3-0.4           0.28
0.4-0.5           0.34
0.5-0.6           0.43
0.6-0.7           0.60
0.7-0.8           0.65
0.8-0.9           0.85
0.9-1.0           0.95


Conclusions

• The combination of DIS and SUDA gives the desired record-level matching certainty metric

• Records that DIS-SUDA predicts to be population unique are extremely likely to be so.
