
Part 4: Data Dependent Query Processing Methods

Page 1: Part 4: Data Dependent Query Processing Methods

Yang, et al. Differentially Private Data Publication and Analysis. Tutorial at SIGMOD’12

Part 4: Data Dependent Query Processing Methods

Yin “David” Yang, Zhenjie Zhang, Gerome Miklau

Previous session: Marianne Winslett, Xiaokui Xiao

Page 2: Part 4: Data Dependent Query Processing Methods

What we talked about in the last session

Privacy is a major concern in data publishing
Simple anonymization methods fail to provide sufficient privacy protection
Definition of differential privacy
  Hard to tell from query results whether a record is in the DB
  Plausible deniability
Basic solutions
  Laplace mechanism: inject Laplace noise into query results
  Exponential mechanism: choose a result randomly; a “good” result has higher probability
Data independent methods
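As a reminder of the Laplace mechanism recapped above, here is a minimal Python sketch. The function names and the `rng` parameter are illustrative assumptions, not taken from the tutorial; the noise scale sensitivity/ε is the usual calibration for a numeric query.

```python
import random

def laplace_noise(scale, rng):
    # The difference of two i.i.d. exponential draws is Laplace(0, scale).
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)

def laplace_mechanism(true_answer, sensitivity, epsilon, rng=None):
    # Hypothetical helper: perturb a numeric query result with
    # Laplace noise of scale sensitivity / epsilon.
    if rng is None:
        rng = random.Random()
    return true_answer + laplace_noise(sensitivity / epsilon, rng)

# Example: a count query (sensitivity 1) answered with epsilon = 0.5.
noisy_count = laplace_mechanism(42, sensitivity=1.0, epsilon=0.5)
```

The noise does not look at the data at all, which is what makes this a data independent method.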

Page 3: Part 4: Data Dependent Query Processing Methods

Data independent vs. data dependent

                        Data independent methods   Data dependent methods
Sensitive info          Query results              Query results + data dependent parameters
Error source            Injected noise             Injected noise + information loss
Noise type              Unbiased                   Often biased
Asymptotic error bound  Higher                     Lower, with data dependent constants
Practical accuracy      Higher                     Lower for some data

Page 4: Part 4: Data Dependent Query Processing Methods

Types of data dependent methods

Type 1: optimizing noisy results
  1. Inject noise
  2. Optimize the noisy query results based on their values
Type 2: transforming original data
  1. Transform the data to reduce the amount of necessary noise
  2. Inject noise

Page 5: Part 4: Data Dependent Query Processing Methods

Optimizing noisy results: Hierarchical Strategy

Hierarchical strategy (presented in the last session): tree with count in each node
Data dependent optimization:
  If a node N has a noisy count close to 0, set the noisy count at N to 0

[Figure: a tree with nodes N1 to N7 over the values v1 to v4; one node has noisy count 0.05 and optimized count 0]

Hay et al. Boosting the Accuracy of Differentially-Private Queries Through Consistency, VLDB’10.
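The optimization on this slide can be sketched in a few lines of Python. The threshold value and the node names are assumptions for illustration; the slide only says "close to 0". Because the step reads only the noisy counts, it is post-processing and consumes no extra privacy budget.

```python
def zero_small_counts(noisy_counts, threshold):
    # Replace every noisy count whose magnitude is below `threshold`
    # (a hypothetical tuning parameter) with exactly 0.
    return {
        node: (0.0 if abs(count) < threshold else count)
        for node, count in noisy_counts.items()
    }

# Hypothetical noisy counts for two tree nodes.
optimized = zero_small_counts({"N1": 0.05, "N2": 7.3}, threshold=0.5)
```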

Page 6: Part 4: Data Dependent Query Processing Methods

Optimizing noisy results: iReduct

Setting: answer a set of m queries
Goal: minimize their total relative error
  RelErr = |noisy result − actual result| / actual result
Example:
  Two queries, q1 and q2
  Actual results: q1: 10, q2: 20
  Observation: we should add less noise to q1 than to q2

Xiao et al. iReduct: Differential Privacy with Reduced Relative Errors, SIGMOD’11.

Page 7: Part 4: Data Dependent Query Processing Methods

Answering queries differently leads to different total relative error

Continuing the example
  Two queries, q1 and q2, with actual answers 10 and 20
  Suppose each of q1 and q2 has sensitivity 1
Two strategies:
  Answer q1 with ε/2, q2 with ε/2
    Noise scale on q1: 2/ε
    Noise scale on q2: 2/ε
  Answer q1 with 2ε/3, q2 with ε/3
    Noise scale on q1: 1.5/ε
    Noise scale on q2: 3/ε
    Lower relative error overall
But we don’t know which strategy is better before comparing their actual answers!
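To make the comparison concrete, the sketch below evaluates both budget splits for the example (actual answers 10 and 20, sensitivity 1). It assumes Laplace noise of scale sensitivity/budget and uses two standard facts: E|Lap(b)| = b and Var[Lap(b)] = 2b². The function name and the choice of reporting both an absolute and a squared measure are illustrative assumptions. Results are in units of 1/ε and 1/ε².

```python
def relative_error_summary(actual, budget_fractions, sensitivity=1.0):
    # Returns (sum of expected |relative error|, sum of expected squared
    # relative error), with epsilon factored out.
    abs_total = 0.0
    sq_total = 0.0
    for answer, fraction in zip(actual, budget_fractions):
        scale = sensitivity / fraction            # Laplace scale, times epsilon
        abs_total += scale / answer               # E|Lap(b)| = b
        sq_total += 2 * scale ** 2 / answer ** 2  # Var[Lap(b)] = 2 b^2
    return abs_total, sq_total

even = relative_error_summary([10, 20], [1 / 2, 1 / 2])
skewed = relative_error_summary([10, 20], [2 / 3, 1 / 3])
```

Under these assumptions the squared measure favours the second split (about 0.09 against 0.10), while the summed expected absolute relative error comes out the same for both (about 0.3), so which strategy looks better depends on how the error is measured as well as on the actual answers.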

Page 8: Part 4: Data Dependent Query Processing Methods

Idea of iReduct

1. Answer all queries with privacy budget ε/t
2. Refine the noisy results with budget ε/t
     Spend more budget on queries with smaller results
     How to refine a noisy count?
       Method 1: obtain a new noisy version, compute weighted average with the old version
       Method 2: obtain a refined version directly from a complicated distribution
3. Repeat the last step t−1 times
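Method 1 above can be sketched as follows. The slide does not say how the weights are chosen; inverse-variance weighting of two independent Laplace-noised answers is an assumption made here, and `combine` is a hypothetical helper name.

```python
def combine(old_value, old_scale, new_value, new_scale):
    # Weight each noisy answer by the inverse of its variance,
    # using Var[Lap(b)] = 2 * b**2.
    w_old = 1.0 / (2 * old_scale ** 2)
    w_new = 1.0 / (2 * new_scale ** 2)
    value = (w_old * old_value + w_new * new_value) / (w_old + w_new)
    variance = 1.0 / (w_old + w_new)
    return value, variance

# Two answers with the same noise scale: plain average, half the variance.
refined, refined_variance = combine(16.0, 2.0, 12.0, 2.0)
```

The answer with less noise gets the larger weight, so spending more of a refinement budget on a query pulls its estimate toward the fresher, more accurate version.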

Page 9: Part 4: Data Dependent Query Processing Methods

Example of iReduct

Step              Budget for q1    Budget for q2    Noisy q1   Noisy q2
Initial answers   ε/(2t)           ε/(2t)           16         14
Iteration 1       (ε/t)·14/30      (ε/t)·16/30      12         24
Iteration 2       (ε/t)·2/3        (ε/t)·1/3        9          22
Iteration 3       (ε/t)·22/31      (ε/t)·9/31       …          …
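The budget fractions in this example (14/30 and 16/30, then 2/3 and 1/3, then 22/31 and 9/31) are what one gets by splitting each iteration's ε/t in inverse proportion to the current noisy results. The sketch below reproduces them; the inverse-proportional rule is inferred from the numbers on the slide and may differ from the exact rule in the iReduct paper. It assumes positive integer noisy results.

```python
from fractions import Fraction

def budget_shares(noisy_results):
    # Share of the per-iteration budget for each query, inversely
    # proportional to its current noisy result (assumed positive).
    inverse = [Fraction(1, r) for r in noisy_results]
    total = sum(inverse)
    return [w / total for w in inverse]

shares_after_initial = budget_shares([16, 14])
shares_after_iter1 = budget_shares([12, 24])
shares_after_iter2 = budget_shares([9, 22])
```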

Page 10: Part 4: Data Dependent Query Processing Methods

Optimizing noisy results: MW

Problem: publish a histogram under DP that is optimized for a given query set.
Idea:
  Start from a uniform histogram.
  Repeat the following t times:
    Evaluate all queries.
    Find the query q with the worst accuracy.
    Modify the histogram to improve the accuracy of q using a technique called multiplicative weights (MW)

Hardt et al. A simple and practical algorithm for differentially private data release, arXiv.
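The loop above can be sketched in Python as follows. Several details are assumptions not stated on the slide: queries are 0/1 vectors over the bins, the total count n is treated as public, each iteration's ε/t is split evenly between an exponential-mechanism selection and a Laplace measurement, and the update rule multiplies each covered bin by exp(error/(2n)) before renormalising. All function names are illustrative.

```python
import math
import random

def answer(query, hist):
    # Evaluate a linear query (for example a range count) given as a 0/1 vector.
    return sum(q * h for q, h in zip(query, hist))

def laplace(scale, rng):
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)

def mw_update(hist, query, measurement):
    # One multiplicative-weights step: scale the bins covered by `query`
    # toward the measurement, then renormalise to keep the total count.
    n = sum(hist)
    error = measurement - answer(query, hist)
    scaled = [h * math.exp(q * error / (2 * n)) for q, h in zip(query, hist)]
    norm = n / sum(scaled)
    return [s * norm for s in scaled]

def mw_histogram(true_hist, queries, epsilon, t, rng):
    n = sum(true_hist)
    hist = [n / len(true_hist)] * len(true_hist)  # uniform start
    eps_step = epsilon / t
    for _ in range(t):
        # Exponential mechanism with budget eps_step / 2: queries with a
        # larger current error are more likely to be picked.
        errors = [abs(answer(q, true_hist) - answer(q, hist)) for q in queries]
        top = max(errors)
        weights = [math.exp((eps_step / 2) * (e - top) / 2) for e in errors]
        chosen = rng.choices(queries, weights=weights)[0]
        # Laplace measurement of the chosen query with budget eps_step / 2.
        measured = answer(chosen, true_hist) + laplace(2 / eps_step, rng)
        hist = mw_update(hist, chosen, measured)
    return hist

published = mw_histogram([8, 1, 1, 0], [[1, 1, 0, 0], [0, 0, 1, 1]],
                         epsilon=1.0, t=3, rng=random.Random(0))
```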

Page 11: Part 4: Data Dependent Query Processing Methods

Example of MW

[Figure: an exact histogram with two range count queries q1 and q2, followed by the published histogram at each step]
  Initial histogram: q1 is less accurate. No privacy budget cost!
  Iteration 1: optimize q1, privacy cost ε/t. q1 is still less accurate.
  Iteration 2: optimize q1, privacy cost ε/t. q2 is now less accurate.
  Iteration 3: optimize q2, privacy cost ε/t.

Page 12: Part 4: Data Dependent Query Processing Methods

Optimizing noisy results: NoiseFirst

Problem: publish a histogram

Original data in a medical statistical DB:

Name    Age   HIV+
Frank   42    Y
Bob     31    Y
Mary    28    Y
Dave    43    N
…       …     …

[Figure: histogram built from the original data]

Xu et al. Differentially Private Histogram Publication, ICDE’12.

Page 13: Part 4: Data Dependent Query Processing Methods

Reduce error by merging bins

[Figure: exact histogram, noisy histogram, and optimized histogram; three bins are labelled 2, 2, 2]
Bin-merging scheme computed through dynamic programming
Positive/negative noise cancels out!
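A simplified version of the bin-merging step can be sketched as a dynamic program over the noisy counts. This is a sketch under assumptions: the number of groups `k` is fixed by the caller and the cost is the plain sum of squared errors on the noisy counts, whereas the cited paper's scheme may choose the structure differently. The function name is illustrative.

```python
def merge_bins(noisy, k):
    # Split `noisy` into k contiguous groups minimising the total SSE,
    # then replace every bin by the mean of its group.
    n = len(noisy)
    prefix, prefix_sq = [0.0], [0.0]
    for v in noisy:
        prefix.append(prefix[-1] + v)
        prefix_sq.append(prefix_sq[-1] + v * v)

    def sse(i, j):  # SSE of bins i..j-1 around their mean
        s = prefix[j] - prefix[i]
        return (prefix_sq[j] - prefix_sq[i]) - s * s / (j - i)

    inf = float("inf")
    cost = [[inf] * (n + 1) for _ in range(k + 1)]
    back = [[0] * (n + 1) for _ in range(k + 1)]
    cost[0][0] = 0.0
    for g in range(1, k + 1):
        for j in range(g, n + 1):
            for i in range(g - 1, j):
                c = cost[g - 1][i] + sse(i, j)
                if c < cost[g][j]:
                    cost[g][j], back[g][j] = c, i

    # Walk the back-pointers to recover the group boundaries.
    groups, j = [], n
    for g in range(k, 0, -1):
        i = back[g][j]
        groups.append((i, j))
        j = i
    groups.reverse()

    merged = []
    for i, j in groups:
        merged.extend([(prefix[j] - prefix[i]) / (j - i)] * (j - i))
    return groups, merged

# Noisy versions of the exact counts 2, 2, 2, 10, 10.
groups, merged = merge_bins([3.0, 1.0, 2.0, 9.5, 10.5], k=2)
```

In this example the noise of +1, −1 and 0 on the first three bins averages out to the exact value 2, which is the cancellation effect the slide points to.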

Page 14: Part 4: Data Dependent Query Processing Methods

Next we focus on the second type.

Type 1: optimizing noisy results
  1. Inject noise
  2. Optimize the noisy query results based on their values
Type 2: transforming original data
  1. Transform the data to reduce the amount of necessary noise
  2. Inject noise

Page 15: Part 4: Data Dependent Query Processing Methods

Transforming data: StructureFirst

An alternative solution for histogram publication

[Figure: original histogram (∆=1) and histogram after merging bins (∆=1/3 and ∆=1/2 for the merged bins)]

Lower sensitivity means less noise!

Xu et al. Differentially Private Histogram Publication, ICDE’12.
Related: Xiao et al. Differentially Private Data Release through Multi-Dimensional Partitioning. SDM’10.
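The sensitivities on this slide follow from publishing the average of each merged group: adding or removing one record changes one bin count by 1, so the average of a group of size s changes by at most 1/s. A minimal sketch, assuming Laplace noise of scale ∆/ε per group and an illustrative function name:

```python
def merged_noise_scales(group_sizes, epsilon):
    # Laplace scale for each merged group: sensitivity 1/size, divided by epsilon.
    return [(1.0 / size) / epsilon for size in group_sizes]

# An unmerged bin, a group of three bins, and a group of two bins.
scales = merged_noise_scales([1, 3, 2], epsilon=1.0)
```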

Page 16: Part 4: Data Dependent Query Processing Methods

But the optimal structure is sensitive!

[Figure: the original histogram has different optimal structures with and without Alice]
Alice is an HIV+ patient!

Page 17: Part 4: Data Dependent Query Processing Methods

StructureFirst uses the Exponential Mechanism to render its structure differentially private.

Randomly perturb the optimal histogram structure
Set each boundary using the exponential mechanism

[Figure: the StructureFirst pipeline on an example]
  Original histogram:           1    2    1    4    5    1    1
  Merge bins (k*=3):            1.3  1.3  1.3  4.5  4.5  1    1
  Randomly adjust boundaries:   1.3  1.3  1.3  4    2.3  2.3  2.3
  Add Lap(∆/ε) noise:           1.2  1.2  1.2  5.1  2.4  2.4  2.4

The probability of each boundary choice depends on the resulting SSE
Adjusting the boundaries consumes ε1
Adding noise consumes ε2 = (ε−ε1)
The result satisfies ε-DP
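The boundary-setting step can be illustrated with a small exponential-mechanism sketch. It picks a single boundary in a short histogram, using the negative SSE of the resulting two groups as the quality score. The `sensitivity` parameter (how much one record can change the SSE) is left to the caller as an assumption, and the function names are illustrative; the pipeline on the slide sets several boundaries, one at a time.

```python
import math
import random

def sse(counts):
    mean = sum(counts) / len(counts)
    return sum((c - mean) ** 2 for c in counts)

def boundary_probabilities(counts, epsilon, sensitivity):
    # Probability of cutting after position c, for c = 1 .. len(counts) - 1,
    # proportional to exp(epsilon * score / (2 * sensitivity)), score = -SSE.
    cuts = list(range(1, len(counts)))
    scores = [-(sse(counts[:c]) + sse(counts[c:])) for c in cuts]
    best = max(scores)  # subtract the maximum for numerical stability
    weights = [math.exp(epsilon * (s - best) / (2 * sensitivity)) for s in scores]
    total = sum(weights)
    return {c: w / total for c, w in zip(cuts, weights)}

def sample_boundary(counts, epsilon, sensitivity, rng):
    probs = boundary_probabilities(counts, epsilon, sensitivity)
    cuts = list(probs)
    return rng.choices(cuts, weights=[probs[c] for c in cuts])[0]

probs = boundary_probabilities([1, 2, 1, 4, 5], epsilon=1.0, sensitivity=1.0)
cut = sample_boundary([1, 2, 1, 4, 5], 1.0, 1.0, random.Random(0))
```

The cut with the lowest SSE is the most likely outcome, but every cut has non-zero probability, which is what hides whether a particular record influenced the structure.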

Page 18: Part 4: Data Dependent Query Processing Methods

Observations on StructureFirst

Merging bins essentially compresses the data
  Reduced sensitivity vs. information loss
Question: can we apply other compression algorithms? Yes!
  Method 1: Perform Fourier transformation, take the first few coefficients, discard all others
    Rastogi and Nath. Differentially Private Aggregation Of Distributed Time-series With Transformation And Encryption, SIGMOD’10
  Method 2: apply the theory of sparse representation
    Li et al. Compressive Mechanism: Utilizing Sparse Representation in Differential Privacy, WPES’11
    Hardt and Roth. Beating Randomized Response on Incoherent Matrices. STOC’12
    Your new paper?
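The compression half of Method 1 can be sketched with a plain discrete Fourier transform. The sketch only shows the lossy step (keep the first k coefficients, zero the rest, transform back); in a differentially private setting noise would be added to the k retained coefficients, which is omitted here. Function names are illustrative, and taking the real part of the reconstruction is a simplification.

```python
import cmath

def dft(series):
    n = len(series)
    return [
        sum(x * cmath.exp(-2j * cmath.pi * k * i / n) for i, x in enumerate(series))
        for k in range(n)
    ]

def keep_first(coeffs, k):
    # Keep the first k coefficients and discard (zero out) all others.
    return coeffs[:k] + [0j] * (len(coeffs) - k)

def inverse_dft(coeffs):
    n = len(coeffs)
    return [
        sum(c * cmath.exp(2j * cmath.pi * k * i / n) for k, c in enumerate(coeffs)).real / n
        for i in range(n)
    ]

# A constant series is captured exactly by its first coefficient.
approx = inverse_dft(keep_first(dft([3.0, 3.0, 3.0, 3.0]), 1))
```

Fewer retained coefficients mean fewer values to perturb, at the price of losing whatever the discarded coefficients carried: the same trade-off between reduced sensitivity and information loss noted for bin merging.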

Page 19: Part 4: Data Dependent Query Processing Methods

Transforming original data: k-d-tree

Problem: answer 2D range count queries
Solution: index the data with a k-d-tree
The k-d-tree structure is sensitive!

Cormode et al. Differentially Private Space Decompositions. ICDE’12.
Xiao et al. Differentially Private Data Release through Multi-Dimensional Partitioning. SDM’10.

Page 20: Part 4: Data Dependent Query Processing Methods

How to protect the k-d-tree structure?

Core problem: differentially private median.
Method 1: exponential mechanism. (best) [1]
Method 2: simply replace median with mean. [3]
Method 3: cell-based method. [2]
  Partition the data with a grid.
  Compute differentially private counts using the grid.

[1] Cormode et al. Differentially Private Space Decompositions. ICDE’12.
[2] Xiao et al. Differentially Private Data Release through Multi-Dimensional Partitioning. SDM’10.
[3] Inan et al. Private Record Matching Using Differential Privacy. EDBT’10.
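Method 1 can be sketched as follows for one-dimensional values in a known range [lo, hi]. The construction is a common way to apply the exponential mechanism to the median and is given here as an assumption, not as the exact procedure of [1]: each gap between consecutive sorted values is scored by how far its rank is from n/2 (rank has sensitivity 1), a gap is sampled with weight length × exp(ε·score/2), and a point is drawn uniformly inside it. It assumes lo < hi.

```python
import math
import random

def private_median(values, lo, hi, epsilon, rng):
    n = len(values)
    # Clamp to the public range and add the range ends as sentinels.
    xs = [lo] + sorted(min(max(v, lo), hi) for v in values) + [hi]
    weights = []
    for i in range(len(xs) - 1):
        length = xs[i + 1] - xs[i]
        score = -abs(i - n / 2)  # i data values lie at or below this gap
        weights.append(length * math.exp(epsilon * score / 2))
    pick = rng.choices(range(len(weights)), weights=weights)[0]
    return rng.uniform(xs[pick], xs[pick + 1])

split = private_median([10, 20, 30, 40, 50], lo=0, hi=100,
                       epsilon=1.0, rng=random.Random(0))
```

A k-d-tree built this way would call such a routine once per split, so the budget spent on medians has to be divided across the levels of the tree.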

Page 21: Part 4: Data Dependent Query Processing Methods

Transforming original data: S&A

S&A: Sample and Aggregate
Goal: answer a query q whose result does not depend on the dataset cardinality, e.g., avg
Idea 1:
  Randomly partition the dataset into m blocks
  Evaluate q on each block
  Return average over m blocks + Laplace noise
  Sensitivity: (max−min)/m
Idea 2: median instead of average + exponential mechanism
  Sensitivity is 1!
  Zhenjie has more

Mohan et al. GUPT: Privacy Preserving Data Analysis Made Easy. SIGMOD’12.
Smith. Privacy-Preserving Statistical Estimation with Optimal Convergence Rates. STOC’11.
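Idea 1 can be sketched directly from the three steps on the slide. The assumptions here are that the output range [lo, hi] of q is known in advance (block outputs are clamped to it), that the dataset has at least m records, and that the function and parameter names are illustrative.

```python
import random

def laplace(scale, rng):
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)

def sample_and_aggregate(data, query, m, lo, hi, epsilon, rng):
    rows = list(data)
    rng.shuffle(rows)                         # random partition
    blocks = [rows[i::m] for i in range(m)]   # m disjoint blocks
    outputs = [min(max(query(b), lo), hi) for b in blocks]
    average = sum(outputs) / m
    # One record affects one block, so the average moves by at most (hi - lo) / m.
    return average + laplace((hi - lo) / m / epsilon, rng)

def mean(block):
    return sum(block) / len(block)

ages = [28, 31, 42, 43, 35, 39, 51, 47, 33, 30, 44, 38]
noisy_avg = sample_and_aggregate(ages, mean, m=4, lo=0, hi=100,
                                 epsilon=1.0, rng=random.Random(0))
```

A larger m lowers the noise but leaves fewer records per block, so each block's answer becomes a rougher estimate of q on the full dataset.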

Page 22: Part 4: Data Dependent Query Processing Methods

Systems using Differential Privacy

Privacy on the Map
PINQ
Airavat
GUPT

Page 23: Part 4: Data Dependent Query Processing Methods

Summary on data dependent methods

Data dependent vs. data independent
Optimizing noisy results
  Simple optimizations
  Iterative methods
Transforming original data
  Reduced sensitivity
  Caution: parameters may reveal information

Next: Zhenjie on differentially private data mining

