
Microdata Simulation for Confidentiality of Tax Returns Using Quantile Regression and Hot Deck

Jennifer Huckett

Iowa State University

June 20, 2007


Outline

• Motivation

• Disclosure Limitation Methods

• Risk Assessment

• Simulation Study

• Results & Conclusions


Motivation

• Iowa Department of Revenue (IDR)
  – Collects and maintains individual tax return data

• Legislative Services Agency (LSA)
  – Examines impact of tax law changes on liability

• Current system
  – LSA submits requests to IDR
  – IDR computes liability, reports to LSA
  – Occurs several times each year
  – Inefficient for both IDR and LSA


• Solutions
  – Secure/remote access server
    • Data are not released
    • Some analyses suppressed
  – Statistical disclosure limitation (SDL)
    • Tabular
    • Microdata
      – enable IDR to provide LSA with data set
      – allow LSA to compute liability with ease and accuracy
      – MUST ENSURE CONFIDENTIALITY of RECORDS!


Establishment Connection

• Highly skewed distributions and unusual associations among them

• Groups of variables are related to one another in unusual ways

• Similar to business tax data or business expenditure/revenue data

• Confidentiality is critical


Traditional Approaches

• Recoding (e.g. aggregation)

• Noise addition

• Data swapping

• Data suppression

• Imputation

• Combinations of these


Our Approach

• Synthetic microdata simulation
  – Retain key demographic variables
  – Simulate values for some variables
    • Quantile regression conditional on key variables
    • Compute fitted values at selected quantiles
  – Impute values for remaining variables
    • Hot deck + rank swap
    • Hot deck based on simulated income variables


Quantile Regression

$$\min_{\beta} \sum_i \rho_\tau\big(y_i - \xi(x_i, \beta)\big)$$

• $\rho_\tau(y_i - \hat{y})$ = “tilted absolute value function” for the $\tau$th quantile

• $\xi(x_i, \beta)$ = linear function of predictors ($x_i$)

• Performed in R
  – quantreg package
  – rq function

Quantile Regression, Koenker 2005
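To make the objective concrete, the sketch below codes Koenker's tilted absolute value (check) function, $\rho_\tau(u) = u\,(\tau - I[u < 0])$, and minimises $\sum_i \rho_\tau(y_i - b)$ over a constant $b$, i.e. an intercept-only model. It is plain Python written for illustration, not the quantreg `rq` routine used in the study: `rq` fits the full linear model with a linear-programming solver, whereas this brute-force search only covers the intercept-only case. The function names are hypothetical.

```python
def check_loss(u: float, tau: float) -> float:
    """Tilted absolute value: rho_tau(u) = u * (tau - I[u < 0])."""
    indicator = 1.0 if u < 0 else 0.0
    return u * (tau - indicator)


def intercept_only_quantile(y: list[float], tau: float) -> float:
    """Minimise sum_i rho_tau(y_i - b) over a constant b.

    The objective is convex and piecewise linear with kinks at the data
    points, so a minimiser can always be found among the observed values.
    """
    def objective(b: float) -> float:
        return sum(check_loss(yi - b, tau) for yi in y)

    return min(y, key=objective)


data = [3, 1, 4, 1, 5, 9, 2, 6, 5]
print(intercept_only_quantile(data, 0.5))   # 4, the sample median
print(intercept_only_quantile(data, 0.25))  # 2, the sample lower quartile
```

The asymmetric weights are the whole idea: with $\tau = 0.25$, residuals below the fit cost three times as much per unit as residuals above it, which pushes the minimiser down to the lower quartile.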


Simulate via Quantile Regression

• Estimate $\xi(x_i, \beta)$ for quantiles $\tau$ from the set $\{0.01, 0.02, \ldots, 0.99\}$

• For each record on variable $y$
  – Randomly select $\tau^* \sim \text{Uniform}(0,1)$
  – Compute fitted $\hat{y}^*$ given $x_i$ at the quantiles $\tau$ just above and below $\tau^*$
  – Interpolate to obtain $y^{**}$ = simulated value
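The interpolation step can be sketched as follows for a single record, assuming its fitted values at $\tau = 0.01, \ldots, 0.99$ have already been computed (for example, with `rq` in R). Linear interpolation between neighbouring grid quantiles, and clamping of $\tau^*$ values below 0.01 or above 0.99 to the end points, are assumptions made here; the slides do not say how those cases are handled. Function names are hypothetical.

```python
import bisect
import random

# Quantile grid used for the fitted values: 0.01, 0.02, ..., 0.99.
TAUS = [k / 100 for k in range(1, 100)]


def interpolate_fitted(fitted: list[float], tau_star: float) -> float:
    """Linearly interpolate one record's fitted quantile curve at tau_star.

    fitted[k] is the record's fitted value at TAUS[k]. A tau_star outside
    [0.01, 0.99] is clamped to the nearest end point (an assumption).
    """
    if len(fitted) != len(TAUS):
        raise ValueError("need one fitted value per grid quantile")
    if tau_star <= TAUS[0]:
        return fitted[0]
    if tau_star >= TAUS[-1]:
        return fitted[-1]
    hi = bisect.bisect_left(TAUS, tau_star)  # first grid quantile >= tau_star
    lo = hi - 1                              # grid quantile just below
    w = (tau_star - TAUS[lo]) / (TAUS[hi] - TAUS[lo])
    return (1.0 - w) * fitted[lo] + w * fitted[hi]


def simulate_value(fitted: list[float], rng: random.Random) -> float:
    """Draw tau* ~ Uniform(0, 1) and return the interpolated simulated value."""
    return interpolate_fitted(fitted, rng.random())


# Toy fitted curve for one record: value k at quantile k/100.
curve = [float(k) for k in range(1, 100)]
print(interpolate_fitted(curve, 0.255))
print(simulate_value(curve, random.Random(2007)))
```

Because $\tau^*$ is drawn afresh for every record, each record's simulated value is a random draw from its own estimated conditional distribution rather than a copy of its original value.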


IDR Application: Key Demographic Variables

• Number of dependents
  – 0, 1, 2, …
  – Categorized into
    • 0
    • 1
    • ≥2

• County
  – 1, …, 99
  – Categorized into 4 population size groups

• State filing status
  1. single
  2. married filing joint
  3. married filing separate on combined return
  4. married filing separate returns
  5. head of household
  6. widow(er) with dependent child
  – Categorized into
    • 1
    • 2 and 3
    • 4, 5, and 6
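Collapsing dependents and filing status into the categories above amounts to a simple lookup. A minimal Python sketch follows; the string labels are hypothetical, and the county-to-population-size-group mapping is not given in the slides, so it is left out.

```python
def dependents_group(n_dependents: int) -> str:
    """Collapse the number of dependents into the groups 0, 1, >=2."""
    if n_dependents < 0:
        raise ValueError("number of dependents cannot be negative")
    if n_dependents == 0:
        return "0"
    if n_dependents == 1:
        return "1"
    return ">=2"


# State filing status codes 1-6 collapsed into {1}, {2, 3}, {4, 5, 6}.
FILING_STATUS_GROUP = {1: "1", 2: "2-3", 3: "2-3", 4: "4-6", 5: "4-6", 6: "4-6"}


def filing_status_group(code: int) -> str:
    """Map a state filing status code (1-6) to its collapsed group label."""
    try:
        return FILING_STATUS_GROUP[code]
    except KeyError:
        raise ValueError(f"unknown filing status code: {code}") from None


print([dependents_group(n) for n in (0, 1, 2, 5)])
print([filing_status_group(c) for c in range(1, 7)])
```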


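IDR Application: Quantile Regression for wages

$$\text{wages} = \beta_0 + \beta_1\,\text{age} + \beta_2\,\text{age}^2 + \beta_3\,\text{age}^3 + \beta_4\,\text{age}^4 + \beta_5\,I[\#\text{dep} = 1] + \beta_6\,I[\#\text{dep} \ge 2] + \beta_7\,I[\text{sfs} \in \{2,3\}] + \beta_8\,I[\text{sfs} \in \{4,5,6\}] + \beta_9\,I[\text{county} = 2] + \beta_{10}\,I[\text{county} = 3] + \beta_{11}\,I[\text{county} = 4]$$

where #dep is the number of dependents, sfs is the state filing status, and county is the county population size group; the omitted baselines are no dependents, filing status 1, and county group 1.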
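The wage model's covariate vector can be sketched as below. It assumes the baseline categories implied by the indicators (no dependents, filing status 1, county population group 1) and uses hypothetical group labels ("0"/"1"/">=2" for dependents, "1"/"2-3"/"4-6" for filing status). It only shows how one record's row and linear predictor are formed for a given coefficient vector at a fixed quantile; estimating the coefficients is left to a quantile regression routine such as quantreg's `rq`.

```python
def wage_design_row(age: float, dep_group: str, sfs_group: str,
                    county_group: int) -> list[float]:
    """Covariates for the wage quantile regression, intercept first.

    Baselines (all indicators zero): no dependents, filing status 1,
    county population group 1.
    """
    return [
        1.0,                                  # beta_0: intercept
        age, age ** 2, age ** 3, age ** 4,    # beta_1..beta_4: quartic in age
        float(dep_group == "1"),              # beta_5: I[#dep = 1]
        float(dep_group == ">=2"),            # beta_6: I[#dep >= 2]
        float(sfs_group == "2-3"),            # beta_7: I[sfs in {2, 3}]
        float(sfs_group == "4-6"),            # beta_8: I[sfs in {4, 5, 6}]
        float(county_group == 2),             # beta_9
        float(county_group == 3),             # beta_10
        float(county_group == 4),             # beta_11
    ]


def linear_predictor(beta: list[float], row: list[float]) -> float:
    """Fitted value xi(x_i, beta) = sum_j beta_j * x_ij for one quantile's beta."""
    if len(beta) != len(row):
        raise ValueError("beta and row must have the same length")
    return sum(b * x for b, x in zip(beta, row))


row = wage_design_row(40.0, "1", "4-6", 3)
print(row)
```

Evaluating `linear_predictor` with the coefficient vector estimated at each $\tau$ in the grid gives the record's fitted quantile curve used in the simulation step.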


IDR Application: Hot Deck and Rank Swap for Federal Tax

• Hot Deck
  – Mahalanobis distance: $d(i,j) = (x_i - x_j)' S_x^{-1} (x_i - x_j)$
  – closest 20 records

• Rank Swap
  – compute sample rank, r
  – draw random rank, r*, from discrete Uniform[r-10, r+10]
  – impute value from record with rank r*
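The hot deck and rank swap steps can be sketched in plain Python as follows. Several details are assumptions not stated on the slide: the donor is drawn uniformly from the 20 nearest records, ranks are computed over all donor-pool values of the imputed variable (e.g. federal tax), and the rank window is truncated at the ends of the ranking. A small Gauss-Jordan solver stands in for a linear-algebra library so the block has no dependencies; all function names are hypothetical.

```python
import random


def covariance(rows: list[list[float]]) -> list[list[float]]:
    """Sample covariance matrix S_x of the matching variables."""
    n, d = len(rows), len(rows[0])
    means = [sum(r[k] for r in rows) / n for k in range(d)]
    return [[sum((r[a] - means[a]) * (r[b] - means[b]) for r in rows) / (n - 1)
             for b in range(d)] for a in range(d)]


def solve(matrix: list[list[float]], vec: list[float]) -> list[float]:
    """Solve matrix @ z = vec by Gauss-Jordan elimination with partial pivoting."""
    d = len(vec)
    aug = [list(matrix[i]) + [vec[i]] for i in range(d)]
    for col in range(d):
        pivot = max(range(col, d), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(d):
            if r != col:
                factor = aug[r][col] / aug[col][col]
                aug[r] = [a - factor * b for a, b in zip(aug[r], aug[col])]
    return [aug[i][d] / aug[i][i] for i in range(d)]


def mahalanobis(xi: list[float], xj: list[float], cov: list[list[float]]) -> float:
    """d(i, j) = (x_i - x_j)' S_x^{-1} (x_i - x_j)."""
    diff = [a - b for a, b in zip(xi, xj)]
    z = solve(cov, diff)  # S_x^{-1} (x_i - x_j)
    return sum(a * b for a, b in zip(diff, z))


def nearest_donors(target, pool, cov, k=20) -> list[int]:
    """Indices of the k pool records closest to target in Mahalanobis distance."""
    return sorted(range(len(pool)), key=lambda j: mahalanobis(target, pool[j], cov))[:k]


def rank_swap(values, donor, rng, window=10):
    """Return the value at a random rank within `window` of the donor's rank.

    Ranks run 0..n-1; r* is drawn from the discrete uniform on
    [r - window, r + window], truncated at the ends (an assumption).
    """
    order = sorted(range(len(values)), key=lambda j: values[j])  # order[r] = record with rank r
    r = order.index(donor)
    r_star = rng.randint(max(0, r - window), min(len(values) - 1, r + window))
    return values[order[r_star]]


def hot_deck_rank_swap(target_x, pool_x, pool_y, rng, k=20, window=10):
    """Impute y for one record: pick a donor among the k nearest, then rank swap."""
    cov = covariance(pool_x)
    donor = rng.choice(nearest_donors(target_x, pool_x, cov, k))
    return rank_swap(pool_y, donor, rng, window)


rng = random.Random(0)
pool_x = [[float(i), float((2 * i) % 5)] for i in range(40)]  # e.g. simulated income variables
pool_y = [100.0 * i for i in range(40)]                        # e.g. federal tax
print(hot_deck_rank_swap([12.0, 3.0], pool_x, pool_y, rng))
```

Sorting by the quadratic form on the slide gives the same donor order as sorting by its square root, so the square root is never taken.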


Disclosure Risk Measurement

• Using methods detailed in Reiter (2005) and Duncan and Lambert (1986, 1989)

• Examine specific records
  – Original records
  – Released records
  – Model intruder behavior to assess disclosure risk

• Simulation Study


Original and Released Records


Intruder Behavior

• Target record, t
  – Intruder has information on target
  – Attempts to match t in released records

• Released records j = 1, …, r in Z

• Probability that record j belongs to target t is $\Pr(J = j \mid t, Z)$

• As …
  – probability decreases
  – disclosure risk decreases
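In the simplest setting, where the intruder knows the target's values on some available variables and assumes no released record has been perturbed, every released record that agrees with the target on all of those variables is an equally likely match, so $\Pr(J = j \mid t, Z) = 1/n_t$ for the $n_t$ agreeing records and 0 otherwise. The sketch below computes that baseline only; it is a simplification for illustration, not the full Reiter (2005) framework, which can also account for the chance that released values were perturbed by swapping, recoding, or simulation. Variable names and records are hypothetical.

```python
from typing import Any


def match_probabilities(target: dict[str, Any], released: list[dict[str, Any]],
                        known: list[str]) -> dict[int, float]:
    """Pr(J = j | t, Z) when all agreeing released records are equally likely.

    Records that disagree with the target on any known variable get
    probability 0 and are omitted from the result.
    """
    matches = [j for j, rec in enumerate(released)
               if all(rec[v] == target[v] for v in known)]
    if not matches:
        return {}
    return {j: 1.0 / len(matches) for j in matches}


released = [
    {"age": 34, "marital": "married", "minority": False},
    {"age": 34, "marital": "single", "minority": False},
    {"age": 34, "marital": "married", "minority": False},
    {"age": 71, "marital": "widowed", "minority": True},
]
target = {"age": 34, "marital": "married", "minority": False}
print(match_probabilities(target, released, ["age", "marital", "minority"]))
```

Under this rule a unique target has probability 1 and the probability shrinks as more released records share the target's known values, which is why perturbing or coarsening those values lowers the risk.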


Simulation Study

Schemes for SDL influence how the available variables, A, divide into $A_p$ (available, perturbed) and $A_d$ (available, unperturbed).


SDL Schemes in Simulation Study

• No SDL

• Swap 30% of marital status values

• Swap 30% of marital status and minority values

• Recode age into 5-year intervals

• Recode age into 5-year intervals and swap 30% of marital status and minority values

• Simulation via quantile regression and hot deck
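The traditional schemes in this list can be sketched as below. The slides do not specify the swapping mechanism, so this assumes a common form: a random 30% of records is selected and their values are exchanged in random pairs; to swap marital status and minority together, pass (marital, minority) tuples so both move as a unit. The 5-year age intervals are assumed to start at multiples of 5. Function names are hypothetical.

```python
import random
from typing import TypeVar

T = TypeVar("T")


def swap_fraction(values: list[T], fraction: float, rng: random.Random) -> list[T]:
    """Exchange values between randomly paired records.

    About `fraction` of the records are selected; the count is rounded and
    then reduced by one if odd, so every selected record has a partner.
    The input list is not modified.
    """
    out = list(values)
    n_swap = round(fraction * len(out))
    n_swap -= n_swap % 2
    chosen = rng.sample(range(len(out)), n_swap)
    for a, b in zip(chosen[0::2], chosen[1::2]):
        out[a], out[b] = out[b], out[a]
    return out


def recode_age_5yr(age: int) -> str:
    """Recode an exact age into a 5-year interval such as '35-39'."""
    low = (age // 5) * 5
    return f"{low}-{low + 4}"


rng = random.Random(7)
marital = ["married", "single", "married", "divorced", "single",
           "widowed", "married", "single", "married", "single"]
print(swap_fraction(marital, 0.30, rng))
print(recode_age_5yr(37))
```

Swapping preserves the marginal distribution of the swapped variable exactly, while recoding coarsens it; both reduce how many released records an intruder can single out.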


Targets

• Intruder has information on target, t, and wants to match with released records

• Consider a few targets
  – Unique record
  – Rare record
  – Common record


Results from Simulation Study

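$\Pr(J = j \mid t, Z)$ by target and SDL scheme:

| Target | No SDL | Marital swap | Marital and minority swap | Age recode | Swaps and recode | Quantile regression and hot deck |
|---|---|---|---|---|---|---|
| Unique | 1 | 1 | 0.1046 | 1 | 0.0178 | 0.0895 |
| Rare | 0.3333 | 0.1044 | 0.1304 | 0.0526 | 0.0225 | 0.0016 |
| Common | 0.0385 | 0.0320 | 0.0320 | 0.0068 | 0.0055 | 0.0008 |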


Conclusions & Future Work

• Risk behaves as we expect
  – Increased SDL
  – Decreased disclosure risk (except for unique!)

• Apply SDL techniques to American Community Survey data at the US Census Bureau

• Compare traditional techniques to quantile regression and hot deck by computing risk

• Measure utility of released data


Acknowledgements

• Iowa Department of Revenue

• Iowa’s Legislative Services Agency

• National Institute of Statistical Sciences

• US Census Bureau Dissertation Fellowship Award


References

• Duncan, G.T. and Lambert, D. 1986. “Disclosure-Limited Data Dissemination,” Journal of the American Statistical Association, 81, 10-28.

• Duncan, G.T. and Lambert, D. 1989. “The Risk of Disclosure for Microdata,” Journal of Business and Economic Statistics, 7, 207-217.

• Koenker, R. 2005. “Introduction,” Quantile Regression, Econometric Society Monograph Series, Cambridge University Press.

• Reiter, J.P. 2005. “Estimating Risks of Identification Disclosure in Microdata,” Journal of the American Statistical Association, 100(472), 1103-1113.

