Official Statistics and Confidentiality

Date post: 24-Feb-2016
Page 1: Official Statistics and  Confidentiality

Official Statistics and Confidentiality

Maura Bardos

Page 2: Official Statistics and  Confidentiality

Outline
Overview of the Federal Statistical System
› Agencies
› Types of survey data collected
Challenges
› Statistical Disclosure and confidentiality
› Implications

Page 3: Official Statistics and  Confidentiality

Federal Statistical System
Headed by a Chief Statistician
Decentralized system in the United States
› 13 agencies with a statistics-oriented mission
› Statistical agencies are located throughout various agencies in the Federal Government
Examples: Census Bureau (Commerce Department), Energy Information Administration (Department of Energy), Bureau of Labor Statistics (Department of Labor)

Page 4: Official Statistics and  Confidentiality

Data
Where do the numbers come from?
Survey data
Regulations by OMB
› Response rates
› Legal obligations
› Confidentiality

Page 5: Official Statistics and  Confidentiality

Confidentiality
Confidential Information Protection and Statistical Efficiency Act of 2002 (CIPSEA): places the onus on federal employees to limit disclosure
› Took over 4 years to implement (Anderson and Seltzer)
3 ways to reduce disclosure risk within agencies:
› 1) Limiting identifiability of survey materials within the organization
› 2) Restricting access to data
› 3) Restricting the contents that may be released

Page 6: Official Statistics and  Confidentiality

Statistical Disclosure and Confidentiality
Statistical disclosure: “the identification of an individual (or of an attribute) through the matching of survey data with information available outside of the survey” (Groves et al.)
Disclosure is the inappropriate attribution of information to a data subject, whether an individual or an organization. The federal government identifies three different types of disclosure:
› Identity: a data subject is identified from a released file
› Attribute: sensitive information about a data subject is revealed through the released file
› Inferential: the released data make it possible to determine the value of some characteristic of an individual more accurately than otherwise would have been possible (FCSM)

Page 7: Official Statistics and  Confidentiality

Example

Page 8: Official Statistics and  Confidentiality

Challenges
Need to provide information
› FOIA requests, subpoenas
Satisfy requests for multiple clients. Must keep track of all withheld information
Maintain utility of data while preserving confidentiality
“Programming nightmare” to keep track of the relationship between variables, tables, and hierarchy

Page 9: Official Statistics and  Confidentiality

How To Prevent
Specific Strategies
› Data Swapping
› Noise
› Combining Cells
› Rounding
› Cell Suppression

Page 10: Official Statistics and  Confidentiality

Strategy: Data Swapping
Exchange of reported data values across data records (Fienberg, Steele, Makov, 1996)
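The exchange of values across records can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `swap_attribute`, the field names, and the choice to pair records purely at random are all hypothetical, and real swapping procedures usually pair records that agree on selected key variables.

```python
import random

def swap_attribute(records, key, fraction=0.10, seed=0):
    """Return a copy of `records` with `key` exchanged between randomly paired records.

    `fraction` is the approximate share of records involved in a swap
    (assumption: pairs are drawn at random, with no matching on other fields).
    """
    rng = random.Random(seed)
    swapped = [dict(r) for r in records]           # leave the input untouched
    n_pairs = max(1, int(len(records) * fraction / 2))
    chosen = rng.sample(range(len(records)), 2 * n_pairs)
    for i, j in zip(chosen[::2], chosen[1::2]):
        swapped[i][key], swapped[j][key] = swapped[j][key], swapped[i][key]
    return swapped

# Hypothetical microdata: 20 records with distinct household incomes.
records = [
    {"child": f"child_{i}", "county": "Alpha" if i % 2 else "Beta", "hh_income": 40 + i}
    for i in range(20)
]
swapped = swap_attribute(records, key="hh_income", fraction=0.10, seed=1)
```

Because values are only exchanged, the overall distribution of the swapped variable is unchanged; only its link to individual records is altered.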

Page 11: Official Statistics and  Confidentiality

Strategy: Swapping

Page 12: Official Statistics and  Confidentiality

Select 10%

Number  Child    County  HH Edu.    HH Income  Race  Sex
4       Pete     Alpha   High       61         W     M
        Alfonso  Beta    Very High  61         W     M

After swapping:

Number  Child    County  HH Edu.    HH Income  Race  Sex
4       Alfonso  Alpha   Very High  61         W     M

Page 13: Official Statistics and  Confidentiality

Strategy: Swapping

Page 14: Official Statistics and  Confidentiality

Strategy: Noise
Assign a multiplying factor, or noise factor, to all data
› For example: the value of a randomly generated variable might be added to each value in a dataset
“protect individual establishments without compromising the quality of our estimates”
Pro: More data can be published; less complicated, less time-consuming
Problem: Perturbs ALL data, non-sensitive and sensitive alike

Page 15: Official Statistics and  Confidentiality

Strategy: Noise
How is this done: Use multipliers
› The standard is to perturb data by about 10%
› Use multipliers ranging from 0.9 to 1.1
› Must preserve trend in data; otherwise useless for client’s analysis
› Use distributions to control variance (examples)
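A minimal sketch of multiplier noise, assuming each value gets its own multiplier drawn uniformly from 0.9 to 1.1. The uniform distribution and the function name `add_multiplicative_noise` are assumptions made for illustration; the slides only say that a distribution is chosen to control variance.

```python
import random

def add_multiplicative_noise(values, low=0.9, high=1.1, seed=0):
    """Multiply every value by a random factor in [low, high]."""
    rng = random.Random(seed)
    return [v * rng.uniform(low, high) for v in values]

# Hypothetical establishment-level values.
values = [120.0, 3400.0, 56.0]
noisy = add_multiplicative_noise(values, seed=42)
```

Every value moves by at most about 10%, so large values stay large and small values stay small, which is what lets the published trend survive the perturbation.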

Page 16: Official Statistics and  Confidentiality

Strategy: Noise

Page 17: Official Statistics and  Confidentiality

Example: Table with and without Noise

Page 18: Official Statistics and  Confidentiality

Tables
Before tabulation strategies: Data Swapping; Data Perturbation (Noise)
Tables of frequencies
› Percent of population with certain characteristics
› With outside knowledge, respondents with unique characteristics can be identified
› Sensitive information: identified by threshold
Tables of magnitude data
› Aggregate data, such as income of individuals, revenues of companies
› Extreme values
› Sensitive information: identified by linear sensitivity measure
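For frequency tables, a threshold check might look like the sketch below. The threshold of 3, the cell labels, and the function name are assumptions for illustration; the slides do not state which threshold an agency uses.

```python
def sensitive_frequency_cells(table, threshold=3):
    """Return the cells whose count is nonzero but below `threshold`."""
    return [cell for cell, count in table.items() if 0 < count < threshold]

# Hypothetical frequency table: (county, education level) -> number of respondents.
counts = {
    ("Alpha", "High"): 14,
    ("Alpha", "Very High"): 1,
    ("Beta", "High"): 9,
    ("Beta", "Very High"): 2,
}
flagged = sensitive_frequency_cells(counts, threshold=3)
```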

Page 19: Official Statistics and  Confidentiality

Strategy: Recoding Methods
Changing the values of outlier cases, since outliers are more likely to be sample or population uniques
Top coding: taking the largest values on a variable and giving them the same code value in the dataset
› For example: place all companies producing more than 100,000 barrels of oil per day in one category
Non-uniques are unperturbed
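Top coding can be sketched as capping each value at a chosen ceiling. The function name `top_code` and the production figures are hypothetical; the 100,000 barrel cutoff is the one used in the slide's example.

```python
def top_code(values, cap):
    """Replace every value above `cap` with `cap`; other values are unchanged."""
    return [min(v, cap) for v in values]

# Hypothetical daily production figures (barrels per day).
production = [12_000, 85_000, 140_000, 310_000]
coded = top_code(production, cap=100_000)
print(coded)  # [12000, 85000, 100000, 100000]
```

The two largest producers become indistinguishable from each other, while the smaller ones are left as reported.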

Page 20: Official Statistics and  Confidentiality

Example of Disclosure
How do we fix this?

Page 21: Official Statistics and  Confidentiality

Example Cont.
Collapsing of categories

Page 22: Official Statistics and  Confidentiality

Strategy: Rounding
Similar to noise. Cells are rounded; a random decision is made whether to round up or down
› Example: round values to a multiple of 5
$x - r = 5q$
where $x$ = cell value, $q$ = non-negative integer, $r$ = remainder
Rounded up to $5(q+1)$ with probability $r/5$
Rounded down to $5q$ with probability $1 - r/5$
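The random rounding rule above can be sketched as follows. The function name `random_round` and the sample cell values are hypothetical; the probabilities are the ones given on the slide, which make the expected rounded value equal to the original cell value.

```python
import random

def random_round(x, base=5, rng=random):
    """Round a non-negative integer x to a multiple of `base`.

    Writing x = base*q + r, round up with probability r/base and
    down with probability 1 - r/base. Exact multiples are unchanged.
    """
    q, r = divmod(x, base)
    if r == 0:
        return x
    return base * (q + 1) if rng.random() < r / base else base * q

rng = random.Random(0)
cells = [0, 3, 5, 12, 19]
rounded = [random_round(x, 5, rng) for x in cells]
```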

Page 23: Official Statistics and  Confidentiality

Original Table

Page 24: Official Statistics and  Confidentiality

Example: Rounding

Page 25: Official Statistics and  Confidentiality

Strategy: Rounding, now with constraints

Page 26: Official Statistics and  Confidentiality

How to identify cells with disclosure risks for magnitude data
› n-k rule
› p% rule

Page 27: Official Statistics and  Confidentiality

P-Percent Rule
If upper or lower estimates for a respondent’s value are closer to the reported value than some prespecified percentage (p) of the total cell value, the cell is sensitive (Groves, 372).
Assumptions:
› Any respondent can estimate the contribution of another respondent within 100% of its value
› The second largest respondent can use their own reported value to attempt to estimate the largest reported value, x1

Page 28: Official Statistics and  Confidentiality

P Percent Rule
For a given cell with N respondents, arrange the data in order from large to small: $x_1 \ge x_2 \ge \dots \ge x_N \ge 0$
A cell is sensitive if $S > 0$, where
$S = x_1 - \frac{100}{p}\,(T - x_2 - x_1)$
and $T$ is the total cell value.

Page 29: Official Statistics and  Confidentiality

Example

Consider a cell with a total value of 18,177.

N = 3; x1 = 17,000; x2 = 1,000; x3 = 177; p = 15
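Plugging these numbers into the p percent formula can be checked with a short sketch. The function name `p_percent_sensitivity` is hypothetical; the formula is the one stated on the previous slide, S = x1 - (100/p)(T - x2 - x1).

```python
def p_percent_sensitivity(contributions, p):
    """Return S = x1 - (100/p) * (T - x1 - x2) for one cell."""
    xs = sorted(contributions, reverse=True)
    x1 = xs[0]
    remainder = sum(xs[2:])        # T - x1 - x2: everything except the top two
    return x1 - 100.0 * remainder / p

s = p_percent_sensitivity([17_000, 1_000, 177], p=15)
print(s)       # 15820.0
print(s > 0)   # True, so the cell is sensitive under this rule
```

Here the remainder is only 177, so the second largest respondent could subtract its own 1,000 from the published total and land very close to the largest respondent's 17,000.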

Page 30: Official Statistics and  Confidentiality

(n, k) Rule
If a small number (n) of the respondents contribute a large percentage (k) of the total cell value, then the cell is sensitive (Groves 372)

Page 31: Official Statistics and  Confidentiality

Example
We are publishing production data on how many barrels a day of crude oil each refinery produces. This is secret information. If our competitors found out, it could be detrimental to our business.
There are 4 refineries in the state, producing 100, 50, 25, and 5 respectively.
Should this information be released? Check using the n-k rule with (2, 85) and the P Percent rule with p = 35%.
Using the P Percent rule, this cell is sensitive. However, it is not sensitive by the n-k rule.
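Both checks for this cell can be reproduced with a small sketch. The function names are hypothetical, and the strict inequalities follow the way the rules are written in these slides.

```python
def nk_sensitive(contributions, n, k):
    """(n, k) rule: sensitive if the n largest contributions exceed k% of the total."""
    xs = sorted(contributions, reverse=True)
    return 100.0 * sum(xs[:n]) / sum(xs) > k

def p_percent_sensitive(contributions, p):
    """p% rule: sensitive if S = x1 - (100/p) * (T - x1 - x2) is positive."""
    xs = sorted(contributions, reverse=True)
    return xs[0] - 100.0 * sum(xs[2:]) / p > 0

cell = [100, 50, 25, 5]                            # total T = 180
nk_result = nk_sensitive(cell, n=2, k=85)          # top two hold 150/180 = 83.3%
pp_result = p_percent_sensitive(cell, p=35)        # S = 100 - (100/35) * 30 = 14.29
print(nk_result, pp_result)                        # False True
```

The top two contributors hold about 83.3% of the total, just under the 85% cutoff, while S is about 14.29 and therefore positive, which matches the conclusion stated above.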

Page 32: Official Statistics and  Confidentiality

Relationship between n-k and p% rule

Page 33: Official Statistics and  Confidentiality

System of equations:
P%: $Z_2 > 100 - 1.35 Z_1$
(n,k): $Z_2 > 85 - Z_1$

Variable constraints:
$Z_2 < Z_1$
$Z_1 + Z_2 < 100$
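The two inequalities can be derived from the rules as stated earlier, under the assumption (not spelled out on the slide) that $Z_1$ and $Z_2$ are the percentage shares of the cell total held by the largest and second largest respondents, with $p = 35$ and $(n, k) = (2, 85)$ as in the refinery example:

```latex
\begin{aligned}
Z_i &= 100\,\frac{x_i}{T}, \qquad i = 1, 2 \\[4pt]
\text{p\% rule:}\quad
x_1 - \frac{100}{p}\,(T - x_1 - x_2) > 0
  &\iff Z_1 - \frac{100}{p}\,(100 - Z_1 - Z_2) > 0 \\
  &\iff \frac{p}{100}\,Z_1 > 100 - Z_1 - Z_2 \\
  &\iff Z_2 > 100 - \Bigl(1 + \frac{p}{100}\Bigr) Z_1 = 100 - 1.35\,Z_1
  \quad (p = 35) \\[4pt]
(n,k)\text{ rule:}\quad
Z_1 + Z_2 > k
  &\iff Z_2 > 85 - Z_1 \quad (n = 2,\ k = 85)
\end{aligned}
```

For the refinery cell, $Z_1 = 100 \cdot 100/180 \approx 55.56$ and $Z_2 = 100 \cdot 50/180 \approx 27.78$. That point lies above the p% boundary ($100 - 1.35 \cdot 55.56 \approx 25.0$) but below the (n, k) boundary ($85 - 55.56 = 29.44$).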

Page 34: Official Statistics and  Confidentiality

Relationship between n-k and p% rule

Page 35: Official Statistics and  Confidentiality

(55.56, 27.78)

Page 36: Official Statistics and  Confidentiality

Strategy: Sensitive Cell Suppression
Primary suppressions: the sensitive cells
Complementary/secondary suppressions: additional withheld data to ensure that the primary suppressions cannot be derived by linear combination
Goal: Minimize information lost. This is accomplished by selecting the smallest possible cell values for complementary cell suppression
Problem: Often requires a substantial amount of data to be withheld. Potential for errors may lead to the release of confidential data
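The need for complementary suppression can be illustrated with a simple check, assuming a two-way table whose row and column totals are published. The function name and the single-pass logic are illustrative assumptions: the check only catches cells recoverable by one subtraction, whereas a full audit has to consider all linear combinations.

```python
def exposed_suppressions(suppressed):
    """Return suppressed cells that are alone in their row or column.

    `suppressed` is a set of (row, col) positions withheld from a table
    with published marginal totals. A lone suppressed cell in a row or
    column can be recovered by subtracting the published cells from the total.
    """
    exposed = set()
    for (r, c) in suppressed:
        in_row = sum(1 for (r2, _) in suppressed if r2 == r)
        in_col = sum(1 for (_, c2) in suppressed if c2 == c)
        if in_row == 1 or in_col == 1:
            exposed.add((r, c))
    return exposed

primary_only = exposed_suppressions({(0, 0)})
with_complements = exposed_suppressions({(0, 0), (0, 1), (1, 0), (1, 1)})
```

With only the primary suppression, the cell is exposed. Adding three complementary suppressions so that every affected row and column has two withheld cells removes that direct subtraction attack.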

Page 37: Official Statistics and  Confidentiality

Strategy: Sensitive Cell Suppression
Small tables:
› Manual suppression
› Computerized audit procedures
Large tables:
› Much more complex, especially with related tables and hierarchical data
› Consistency

Page 38: Official Statistics and  Confidentiality

Real Example: Disclosure

Page 39: Official Statistics and  Confidentiality

Cell Suppression Example
Let’s return to a previous example: Sales Revenue
We determined that the cell must be suppressed. How do we accomplish this?

Page 40: Official Statistics and  Confidentiality

Example of a Solution

Page 41: Official Statistics and  Confidentiality

Conclusion: Data is secure
High levels of security and suppression to protect data are necessary, as data guide real-life policy issues.
Quality of this data depends not only on a high response rate, but also on accurate responses
Producing data is a function of “public trust”
However, the point of data collection is its use and analysis. The tradeoff between confidentiality and utilization must be examined

Page 42: Official Statistics and  Confidentiality

…Or is it?
Patriot Act 2001 (Anderson & Seltzer)
Section 508: Disclosure of information from National Center for Education Statistics surveys
Justice Department is able to obtain and use for investigation and prosecution reports, records, and information (including individually identifiable information)
The Patriot Act overrides the 1994 National Center for Education Statistics Act that protects confidentiality

Page 43: Official Statistics and  Confidentiality

Other examples from history
Second War Powers Act (1942-1947)
Repealed confidentiality protections of Title 13 governing the US Census Bureau (Anderson & Seltzer)
Japanese Americans and internment camps (USA Today)

Page 44: Official Statistics and  Confidentiality

2004 data on Arab-Americans (NYT)
› Released number of Arab-Americans per zip code
› Categorized by country of origin: Egyptian, Iraqi, Jordanian, Lebanese, Moroccan, Palestinian, Syrian and two general categories, "Arab/Arabic" and "Other Arab."
› Data obtained from a sample (the long form of the census)

Page 45: Official Statistics and  Confidentiality

In conclusion…
…the next time you fill out a survey, think about where your information may (or may not) be used.

Page 46: Official Statistics and  Confidentiality

Sources
Clemetson, Lynette. “Homeland Security given data on Arab-Americans.” New York Times. July 30, 2004. http://www.nytimes.com/2004/07/30/politics/30census.html

El Nasser, Haya. “Papers show Census role in WWII Camps.” USA Today. March 30, 2007. http://www.usatoday.com/news/nation/2007-03-30-census-role_N.htm

“DoD releases FY 2010 Budget Proposal.” US Department of Defense. May 7, 2009. http://www.defenselink.mil/releases/release.aspx?releaseid=12652

Seltzer, William and Margo Anderson. “NCES and the Patriot Act.” Paper prepared for the Joint Statistical Meetings. 2002. http://www.uwm.edu/~margo/govstat/jsm.pdf

Evans, Timothy, Laura Zayatz, and John Slanta. “Using Noise for Disclosure Limitation of Establishment Tabular Data.” US Census Bureau. 1996. http://www.census.gov/prod/2/gen/96arc/iiaevans.pdf

“Statistical Programs of the US Government.” Office of Management and Budget. 2009. http://www.whitehouse.gov/omb/assets/information_and_regulatory_affairs/09statprog.pdf

Page 47: Official Statistics and  Confidentiality

Sources of examples
Sullivan, Colleen. “An Overview of Disclosure Principles.” US Census Bureau. 1992. http://www.2010census.biz/srd/papers/pdf/rr92-09.pdf

“Statistical Policy Working Paper: Report on Statistical Disclosure Methodology.” Federal Committee on Statistical Methodology. 2005. http://www.fcsm.gov/working-papers/SPWP22_rev.pdf

Groves, Robert et al. Survey Methodology. Hoboken, NJ: John Wiley & Sons. 2004.

