Lecture 23 Data Privacy - Duke University

Date post: 15-Jul-2020
Page 1

Lecture 23
Data Privacy
Guest Lecturer: Xi

CompSci 516
Data Intensive Computing Systems

(The slides were adapted from CompSci 216 Spring 15)

Page 2

Data and ____ ☜ your favorite subject

Page 3

Where is all this data coming from?

Page 4

Where is all this data coming from?

• Census surveys
• IRS Records

• Medical records
• Insurance records

• Search logs
• Browse logs
• Shopping histories

• Photos
• Videos
• Smart phone sensors
• Mobility trajectories

• …

Very sensitive information …

Page 5

Sometimes users can know and control who sees their information

Page 6

… but not always !!

Page 7

Example: Targeted advertising

Source: http://graphicsweb.wsj.com/documents/divSlider/media/ecosystem100730.png

Page 8

What websites track your behavior?

Source: http://blogs.wsj.com/wtk/

http://www.dictionary.com/

Page 9

Is your browser safe against tracking?

Source: http://panopticlick.com

Page 10

Servers track your information … so what?

[Figure: individuals 1, 2, 3, …, N each contribute a record (r1, r2, r3, …, rN) to a server, which stores them in a DB]

Either release the dataset OR answers to queries

Page 11

Does it matter … I am anonymous, right?

Source: http://xkcd.org/834/

What if we ensure our names and other identifiers are never released?

Page 12

Outline
• Why does naïve anonymization fail?
– The Massachusetts governor privacy breach
– AOL data publishing fiasco
– Facebook privacy violation

• How to ensure data analysis without privacy leakage?

• Applications & research direction

Page 13

The Massachusetts Governor Privacy Breach [Sweeney IJUFKS 2002]

Medical Data: Name, SSN, Visit Date, Diagnosis, Procedure, Medication, Total Charge, Zip, Birth date, Sex

Page 14

The Massachusetts Governor Privacy Breach [Sweeney IJUFKS 2002]

Medical Data: Name, SSN, Visit Date, Diagnosis, Procedure, Medication, Total Charge
Voter List: Name, Address, Date Registered, Party affiliation, Date last voted
In both: Zip, Birth date, Sex

Page 15

The Massachusetts Governor Privacy Breach [Sweeney IJUFKS 2002]

Medical Data: Name, SSN, Visit Date, Diagnosis, Procedure, Medication, Total Charge
Voter List: Name, Address, Date Registered, Party affiliation, Date last voted
In both: Zip, Birth date, Sex

• Governor of MA uniquely identified using Zip Code, Birth Date, and Sex. Name linked to Diagnosis

Page 16

The Massachusetts Governor Privacy Breach [Sweeney IJUFKS 2002]

Medical Data: Name, SSN, Visit Date, Diagnosis, Procedure, Medication, Total Charge
Voter List: Name, Address, Date Registered, Party affiliation, Date last voted
In both: Zip, Birth date, Sex

• Governor of MA uniquely identified using Zip Code, Birth Date, and Sex.

Quasi-identifier: the combination (Zip, Birth date, Sex) is unique for 87% of the US population
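
A minimal sketch of this linkage attack in Python. Every record, name and value below is made up for illustration; the only part taken from the slide is the join on the quasi-identifier (Zip, Birth date, Sex).

```python
# Hypothetical data: none of these records come from the lecture.
medical = [  # "anonymized" release: names and SSNs removed
    {"zip": "02138", "birth_date": "1950-01-01", "sex": "M", "diagnosis": "Diagnosis A"},
    {"zip": "02139", "birth_date": "1962-03-14", "sex": "F", "diagnosis": "Diagnosis B"},
]
voters = [  # public voter list, names included
    {"name": "Voter One", "zip": "02138", "birth_date": "1950-01-01", "sex": "M"},
    {"name": "Voter Two", "zip": "02141", "birth_date": "1971-09-30", "sex": "F"},
]
QUASI_ID = ("zip", "birth_date", "sex")


def link(medical, voters, keys=QUASI_ID):
    """Return (name, diagnosis) pairs whose quasi-identifier is unique in both tables."""
    def index(rows):
        groups = {}
        for row in rows:
            groups.setdefault(tuple(row[k] for k in keys), []).append(row)
        return groups

    med_idx, voter_idx = index(medical), index(voters)
    linked = []
    for qid, people in voter_idx.items():
        records = med_idx.get(qid, [])
        # A unique match on both sides ties a name to a diagnosis.
        if len(people) == 1 and len(records) == 1:
            linked.append((people[0]["name"], records[0]["diagnosis"]))
    return linked


reidentified = link(medical, voters)
print(reidentified)
```

Removing Name and SSN from the medical table does not help here, because the remaining three attributes already act as a key.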

Page 17

AOL data publishing fiasco

Page 18

AOL data publishing fiasco …

User ID | Query
Ashwin222 | Uefa cup
Ashwin222 | Uefa champions league
Ashwin222 | Champions league final
Ashwin222 | Champions league final 2013
Jun156 | exchangeability
Jun156 | Proof of de Finetti’s theorem
Brett12345 | Zombie games
Brett12345 | Warcraft
Brett12345 | Beatles anthology
Brett12345 | Ubuntu breeze
Austin222 | Python in thought
Austin222 | Enthought Canopy

Page 19

User IDs replaced with random numbers

User ID | Query
865712345 | Uefa cup
865712345 | Uefa champions league
865712345 | Champions league final
865712345 | Champions league final 2013
236712909 | exchangeability
236712909 | Proof of de Finetti’s theorem
112765410 | Zombie games
112765410 | Warcraft
112765410 | Beatles anthology
112765410 | Ubuntu breeze
865712345 | Python in thought
865712345 | Enthought Canopy
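
A short sketch of why this replacement is weak, assuming the pairing of IDs and queries suggested by the slide (the first ten rows are used here). The random number is applied consistently, so every query by the same person still carries the same label and a full search profile can be rebuilt per user.

```python
from collections import defaultdict

# First ten rows of the pseudonymized log; the pairing of numbers to
# queries is an assumption read off the slide layout.
log = [
    ("865712345", "Uefa cup"),
    ("865712345", "Uefa champions league"),
    ("865712345", "Champions league final"),
    ("865712345", "Champions league final 2013"),
    ("236712909", "exchangeability"),
    ("236712909", "Proof of de Finetti's theorem"),
    ("112765410", "Zombie games"),
    ("112765410", "Warcraft"),
    ("112765410", "Beatles anthology"),
    ("112765410", "Ubuntu breeze"),
]


def profiles(log):
    """Group queries by pseudonym: one search history per (unnamed) user."""
    by_user = defaultdict(list)
    for user_id, query in log:
        by_user[user_id].append(query)
    return dict(by_user)


histories = profiles(log)
for user_id, queries in histories.items():
    print(user_id, queries)
```

The histories themselves are the identifying information: a sufficiently specific set of queries can point to one person even though no name appears.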

Page 20

Privacy breach

[NYTimes 2006]

Page 21

Privacy violations from Facebook

Source: http://article.wn.com/view/2012/08/28/Facebooks_new_app_bazaar_violates_punters_privacy_lobbyists/

Page 22

Inference from Impressions: Sexual Orientation

[Korolova JPC 2011]

Ad targeting criteria | Number of impressions
Facebook profile + online data + who are interested in Men | 25
Facebook profile + online data + who are interested in Women | 0

Facebook uses private information to predict match to ad

Page 23

Reason for privacy breach
• Anyone can run a campaign with strict targeting criteria
– Zip, birthdate and sex uniquely identify 87% of US population

• “Private” and “Friends only” profile info used to determine match

• Default privacy settings lead to users having many publicly visible features
– Default privacy setting for Likes, location, work place, etc. is public

Page 24

Can Facebook release its graph?

• Suppose we just release the nodes and edges in the Facebook graph …

Page 25

[Figure: Mobile communication networks] [J. Onnela et al. PNAS 07]

[Figure: Sexual & Injection Drug Partners] [Potterat et al. STI 02]

Page 26

Naïve anonymization

[Figure: email communication graph over Alice, Bob, Cathy, Diane, Ed, Fred and Grace]

• Consider the above email communication graph
– Each node represents an individual
– Each edge between two individuals indicates that they have exchanged emails

• Replace node identifiers with random numbers.

Page 27

Lecture 2 : 590.03 Fall 13

Attacks on naïve anonymization

• Alice has sent emails to three individuals only

Page 28

Attacks on naïve anonymization

• Alice has sent emails to three individuals only
• Only one node in the anonymized network has degree three
• Hence, Alice can re-identify herself

Page 29

Attacks on naïve anonymization

• Cathy has sent emails to five individuals

Page 30

Attacks on naïve anonymization

• Cathy has sent emails to five individuals
• Only one node has degree five
• Hence, Cathy can re-identify herself

Page 31

Attacks on naïve anonymization

• Now consider that Alice and Cathy share their knowledge about the anonymized network

• What can they learn about the other individuals?

Page 32

Attacks on naïve anonymization

• First, Alice and Cathy know that only Bob has sent emails to both of them

Page 33

Attacks on naïve anonymization

• First, Alice and Cathy know that only Bob has sent emails to both of them

• Bob can be identified

Page 34

Attacks on naïve anonymization

• Alice has sent emails to Bob, Cathy, and Ed only

Page 35

Attacks on naïve anonymization

• Alice has sent emails to Bob, Cathy, and Ed only
• Ed can be identified

Page 36

Attacks on naïve anonymization

• Alice and Cathy can learn that Bob and Ed are connected

Page 37

Attacks

• Matching attacks: the adversary matches external information to a naively anonymized network - unique or partial node re-identification
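
The Alice and Cathy walkthrough can be sketched as a matching attack in Python. The slides do not list the graph's edges, so the edge list below is a hypothetical one chosen only to agree with the stated facts (Alice has degree three, Cathy has degree five, Bob is their only common contact, Bob and Ed are connected).

```python
import random

# Hypothetical edge list consistent with the facts stated on the slides.
edges = [
    ("Alice", "Bob"), ("Alice", "Cathy"), ("Alice", "Ed"),
    ("Bob", "Cathy"), ("Bob", "Ed"), ("Bob", "Fred"),
    ("Cathy", "Diane"), ("Cathy", "Fred"), ("Cathy", "Grace"),
]


def anonymize(edges, seed=0):
    """Naive anonymization: replace each name with a random number."""
    names = sorted({n for e in edges for n in e})
    ids = list(range(len(names)))
    random.Random(seed).shuffle(ids)
    mapping = dict(zip(names, ids))
    return [(mapping[a], mapping[b]) for a, b in edges], mapping


def adjacency(edges):
    adj = {}
    for a, b in edges:
        adj.setdefault(a, set()).add(b)
        adj.setdefault(b, set()).add(a)
    return adj


def nodes_with_degree(adj, d):
    return [n for n, nb in adj.items() if len(nb) == d]


anon_edges, secret = anonymize(edges)  # `secret` is never shown to the attackers
adj = adjacency(anon_edges)

(alice,) = nodes_with_degree(adj, 3)   # Alice knows she has three contacts
(cathy,) = nodes_with_degree(adj, 5)   # Cathy knows she has five contacts
(bob,) = adj[alice] & adj[cathy]       # their only common contact
(ed,) = adj[alice] - {bob, cathy}      # Alice's remaining contact
bob_and_ed_connected = ed in adj[bob]  # a fact about two other people
print(alice, cathy, bob, ed, bob_and_ed_connected)
```

Only degrees and neighborhoods are used, which is exactly the external information the two attackers already hold about themselves.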

Page 38

Local structure is highly identifying

[Figure: Friendster Network, ~4.5 million nodes; panels "Node Degree" and "Neighbor’s Degree"; scale from "Well Protected" to "Uniquely Identified"]

[Hay et al PVLDB 08]

Page 40

Sensitive values in social networks

• Some people are privacy conscious (like you)

• Most people are lazy and keep the default privacy settings (i.e., no privacy)

• Can infer your sensitive attributes based on the sensitive attribute of public individuals …

Page 41

Servers track your information … and you are not anonymous

Page 42

Why care about privacy?

• Redlining: the practice of denying, or charging more for, services such as banking, insurance, access to health care, or even supermarkets, or denying jobs to residents in particular, often racially determined, areas.

Page 43

Can data analysis be done without breaching the privacy of individuals?

Page 44

Outline
• Why does naïve anonymization fail?

• How to ensure data analysis without privacy leakage?

• Applications & research direction

Page 45

Private data analysis problem

[Figure: individuals 1, 2, 3, …, N each contribute a record (r1, r2, r3, …, rN) to a server, which stores them in a DB]

Utility:
Privacy: No breach about any individual

Page 46

Private data analysis examples

Application | Data Collector | Third Party (adversary) | Private Information | Function (utility)
Medical | Hospital | Epidemiologist | Disease | Correlation between disease and geography
Genome analysis | Hospital | Statistician/Researcher | Genome | Correlation between genome and disease
Advertising | Google/FB/Y! | Advertiser | Clicks/Browsing | Number of clicks on an ad by age/region/gender …
Social Recommendations | Facebook | Another user | Friend links / profile | Recommend other users or ads to users based on social network
Location Services | Verizon/AT&T | Verizon/AT&T | Location | Local Search

Page 47

Private data analysis methods

• Bare minimum protection:
– K-anonymity [Sweeney IJUFKS 2002]
– L-diversity [Machanavajjhala et al ICDE 2006]
– T-closeness [Li et al ICDE 2007]

• Ideal (state-of-the-art): Differential Privacy

Page 48

K-anonymity [Sweeney IJUFKS 2002]

• If every row corresponds to one individual, then every row should be indistinguishable from at least k-1 other rows on the quasi-identifier attributes

Page 49

K-anonymity

Original table:

Zip | Age | Nationality | Disease
13053 | 28 | Russian | Heart
13068 | 29 | American | Heart
13068 | 21 | Japanese | Flu
13053 | 23 | American | Flu
14853 | 50 | Indian | Cancer
14853 | 55 | Russian | Heart
14850 | 47 | American | Flu
14850 | 59 | American | Flu
13053 | 31 | American | Cancer
13053 | 37 | Indian | Cancer
13068 | 36 | Japanese | Cancer
13068 | 32 | American | Cancer

Generalized table (k=4):

Zip | Age | Nationality | Disease
130** | <30 | * | Heart
130** | <30 | * | Heart
130** | <30 | * | Flu
130** | <30 | * | Flu
1485* | >40 | * | Cancer
1485* | >40 | * | Heart
1485* | >40 | * | Flu
1485* | >40 | * | Flu
130** | 30-40 | * | Cancer
130** | 30-40 | * | Cancer
130** | 30-40 | * | Cancer
130** | 30-40 | * | Cancer
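
A small Python check of the k-anonymity property on the two tables from this slide. The helper name `k_of` is an illustrative choice, not something from the lecture; it reports the size of the smallest group of rows sharing the same quasi-identifier values.

```python
from collections import Counter

QUASI_ID = (0, 1, 2)  # column positions of Zip, Age, Nationality


def k_of(rows, quasi_id=QUASI_ID):
    """Largest k for which the table is k-anonymous (smallest group size)."""
    groups = Counter(tuple(row[i] for i in quasi_id) for row in rows)
    return min(groups.values())


original = [
    ("13053", 28, "Russian", "Heart"), ("13068", 29, "American", "Heart"),
    ("13068", 21, "Japanese", "Flu"), ("13053", 23, "American", "Flu"),
    ("14853", 50, "Indian", "Cancer"), ("14853", 55, "Russian", "Heart"),
    ("14850", 47, "American", "Flu"), ("14850", 59, "American", "Flu"),
    ("13053", 31, "American", "Cancer"), ("13053", 37, "Indian", "Cancer"),
    ("13068", 36, "Japanese", "Cancer"), ("13068", 32, "American", "Cancer"),
]
generalized = (
    [("130**", "<30", "*", d) for d in ("Heart", "Heart", "Flu", "Flu")]
    + [("1485*", ">40", "*", d) for d in ("Cancer", "Heart", "Flu", "Flu")]
    + [("130**", "30-40", "*", d) for d in ("Cancer",) * 4]
)

k_original = k_of(original)        # every row is unique on the quasi-identifier
k_generalized = k_of(generalized)  # three groups of four rows each
print(k_original, k_generalized)
```

The check looks only at the quasi-identifier columns; it says nothing about how the sensitive column is distributed inside each group.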

Page 50

K-anonymity in graphs

Page 51

Attack 1: homogeneity

[Generalized table (k=4) from Page 49]

Adversary's knowledge:

Name | Zip | Age | Nat.
Bob | 13053 | 35 | ??

Bob's zip and age place him in the (130**, 30-40) group, where every row has Cancer.

Bob has Cancer

Page 52

Attack 2: background knowledge

[Generalized table (k=4) from Page 49]

Adversary's knowledge:

Name | Zip | Age | Nat.
Umeko | 13068 | 24 | Japan

Page 53

Attack 2: background knowledge

[Generalized table (k=4) from Page 49]

Adversary's knowledge:

Name | Zip | Age | Nat.
Umeko | 13068 | 24 | Japan

Japanese have a very low incidence of Heart disease.

Umeko's group (130**, <30) contains only Heart and Flu, so ruling out Heart leaves Flu.

Umeko has Flu

Page 54

Recall attacks on k-anonymity

[Generalized table (k=4) from Page 49]

Name | Zip | Age | Nat.
Umeko | 13068 | 24 | Japan

Japanese have a very low incidence of Heart disease.
Umeko has Flu

Name | Zip | Age | Nat.
Bob | 13053 | 35 | ??

Bob has Cancer
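
Both attacks can be replayed in a few lines of Python. This is a sketch under the assumption that the adversary knows the target's exact zip and age; the helper names are illustrative.

```python
# The 4-anonymous table, as (zip pattern, age bucket, disease).
released = (
    [("130**", "<30", d) for d in ("Heart", "Heart", "Flu", "Flu")]
    + [("1485*", ">40", d) for d in ("Cancer", "Heart", "Flu", "Flu")]
    + [("130**", "30-40", d) for d in ("Cancer",) * 4]
)


def zip_matches(pattern, zip_code):
    """True if zip_code fits a pattern such as '130**' ('*' is a wildcard)."""
    return len(pattern) == len(zip_code) and all(
        p in ("*", z) for p, z in zip(pattern, zip_code)
    )


def age_matches(bucket, age):
    """True if age falls in a bucket written as '<30', '>40' or '30-40'."""
    if bucket.startswith("<"):
        return age < int(bucket[1:])
    if bucket.startswith(">"):
        return age > int(bucket[1:])
    low, high = map(int, bucket.split("-"))
    return low <= age <= high


def candidate_diseases(zip_code, age, ruled_out=()):
    """Diseases still possible for a person, given optional background knowledge."""
    in_group = {
        d for z, a, d in released
        if zip_matches(z, zip_code) and age_matches(a, age)
    }
    return in_group - set(ruled_out)


bob = candidate_diseases("13053", 35)                          # homogeneity
umeko = candidate_diseases("13068", 24, ruled_out=["Heart"])   # background knowledge
print(bob, umeko)
```

In both cases the group hides which row belongs to the target, yet the sensitive value is still pinned down to a single candidate.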

Page 55

3-diverse table

Zip | Age | Nationality | Disease
1306* | <=40 | * | Heart
1306* | <=40 | * | Flu
1306* | <=40 | * | Cancer
1306* | <=40 | * | Cancer
1485* | >40 | * | Cancer
1485* | >40 | * | Heart
1485* | >40 | * | Flu
1485* | >40 | * | Flu
1305* | <=40 | * | Heart
1305* | <=40 | * | Flu
1305* | <=40 | * | Cancer
1305* | <=40 | * | Cancer

Name | Zip | Age | Nat.
Umeko | 13068 | 24 | Japan

Japanese have a very low incidence of Heart disease.
Umeko has ?

Name | Zip | Age | Nat.
Bob | 13053 | 35 | ??

Bob has ?

Page 56

3-diverse table

[3-diverse table from Page 55]

Name | Zip | Age | Nat.
Umeko | 13068 | 24 | Japan

Japanese have a very low incidence of Heart disease.
Umeko has ?

Name | Zip | Age | Nat.
Bob | 13053 | 35 | ??

Bob has ?

L-Diversity Principle: Every group of tuples with the same Q-ID values has ≥ L distinct sensitive values of roughly equal proportions.
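
A hedged Python sketch of the simplest reading of this principle, often called distinct L-diversity: count the distinct sensitive values per group. It deliberately ignores the "roughly equal proportions" part, so it is a necessary check rather than the full definition.

```python
from collections import defaultdict


def distinct_l(rows):
    """Smallest number of distinct sensitive values in any Q-ID group.

    Each row is (zip, age, disease); the last field is the sensitive one.
    """
    groups = defaultdict(set)
    for *qid, sensitive in rows:
        groups[tuple(qid)].add(sensitive)
    return min(len(values) for values in groups.values())


three_diverse = (
    [("1306*", "<=40", d) for d in ("Heart", "Flu", "Cancer", "Cancer")]
    + [("1485*", ">40", d) for d in ("Cancer", "Heart", "Flu", "Flu")]
    + [("1305*", "<=40", d) for d in ("Heart", "Flu", "Cancer", "Cancer")]
)
four_anonymous = (
    [("130**", "<30", d) for d in ("Heart", "Heart", "Flu", "Flu")]
    + [("1485*", ">40", d) for d in ("Cancer", "Heart", "Flu", "Flu")]
    + [("130**", "30-40", d) for d in ("Cancer",) * 4]
)

l_new = distinct_l(three_diverse)   # every group has Heart, Flu and Cancer
l_old = distinct_l(four_anonymous)  # the all-Cancer group has one value
print(l_new, l_old)
```

The earlier 4-anonymous table scores 1 because of its all-Cancer group, which is what made the homogeneity attack possible.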

Page 57

L-diversity

• L-diversity Principle: Every group of tuples with the same Q-ID values has ≥ L distinct “well represented” sensitive values.

• The link between identity and attribute value is the sensitive information.
“Does Bob have cancer? Heart disease? Flu?”
“Does Umeko have cancer? Heart disease? Flu?”

• Privacy is breached when the attribute value can be inferred with high probability.
Pr[“Bob has cancer” | published table, adv. knowledge] > t

[Machanavajjhala et al ICDE 2006]

Page 58

L-diversity

• Limit adversarial knowledge
- Adversary knows ≤ L-2 negation statements, e.g. “Umeko does not have Heart Disease.”

• Consider the worst case
- Consider all possible conjunctions of ≤ (L-2) statements

At least L sensitive values should appear in every group

Page 59

L-diversity

• Limit adversarial knowledge
- Adversary knows ≤ L-2 negation statements, e.g. “Umeko does not have Heart Disease.”

• Consider the worst case
- Consider all possible conjunctions of ≤ (L-2) statements

The L distinct sensitive values in each group should be roughly of equal proportions

Page 60

T-closeness

[Li et al ICDE 2007]

The L distinct sensitive values in each group should be roughly of equal proportions

Let t=0.75. Privacy of individuals in the above group is ensured if,

Page 61

T-closeness

Theorem: For all groups g, for all s in S (sensitive values), and for all B (background knowledge), |B| ≤ (L-2)

is equivalent to

Page 62

Attack 3: Composition

Release 1:

Zip | Age | Income | Disease
130** | [25-30] | >50k | None
130** | [25-30] | >50k | Stroke
130** | [25-30] | >50k | Flu
130** | [25-30] | >50k | Cancer
902** | [60-70] | <50k | Flu
902** | [60-70] | <50k | Stroke
902** | [60-70] | <50k | Flu
902** | [60-70] | <50k | Cancer

Release 2:

Zip | Age | Nationality | Disease
130** | <40 | * | Cold
130** | <40 | * | Stroke
130** | <40 | * | Rash
1485* | >40 | * | Cancer
1485* | >40 | * | Flu
1485* | >40 | * | Cancer

If Bob is in both datasets, then Bob has Stroke!
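
The composition attack is just a set intersection. The sketch below assumes the adversary already knows which group Bob falls into in each release (the 130** groups); that placement is an assumption, since the slide does not state Bob's age or zip.

```python
from functools import reduce

# Sensitive values in Bob's group in each release (assumed placement).
release_1_group = {"None", "Stroke", "Flu", "Cancer"}  # (130**, [25-30], >50k)
release_2_group = {"Cold", "Stroke", "Rash"}           # (130**, <40)


def compose(*candidate_sets):
    """Diseases consistent with every release at once."""
    return reduce(set.intersection, candidate_sets)


bob_candidates = compose(release_1_group, release_2_group)
print(bob_candidates)
```

Each release on its own leaves at least three candidates for Bob, so the leak only appears when the two are combined.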

Page 63

Differential Privacy

• Consider two datasets
– D1: with Bob as one of the participants
– D2: without Bob

• Answers are roughly the same whether or not Bob is in the data

[Dwork et al TCC 2006]

Page 64

Differential Privacy

Algorithm A satisfies ε-differential privacy if:
- for every pair of neighboring tables D1, D2
- for every output O

Pr[A(D1) = O] ≤ e^ε · Pr[A(D2) = O]

Page 65

Meaning …

[Figure: D1 (Bob in the data) and D2 (Bob not in the data) mapped by A to the set of all outputs, with probabilities such as P[A(D1) = O1] and P[A(D2) = Ok]]

Page 66

Meaning …

[Figure: D1, D2 and output O1, annotated "Worst discrepancy in probabilities"]

Page 67

Privacy loss parameter ε

Algorithm A satisfies ε-differential privacy if:
- for every pair of neighboring tables D1, D2
- for every output O

Pr[A(D1) = O] ≤ e^ε · Pr[A(D2) = O]

• The smaller the ε, the more the privacy (and the worse the utility)

Page 68

Privacy loss parameter ε

Algorithm A satisfies ε-differential privacy if:
- for every pair of neighboring tables D1, D2
- for every output O

Pr[A(D1) = O] ≤ e^ε · Pr[A(D2) = O]

What the adversary learns about an individual is roughly the same even if the individual is not in the data (or lied about his/her value)

Page 69

Algorithm 1: randomized response

With probability p, report true value
With probability 1-p, report flipped value

Disease (Y/N), true | Disease (Y/N), reported
Y | Y
Y | N
N | N
Y | N
N | Y
N | N

Can estimate the true proportion of Y in the data based on the perturbed values (since we know p)

Page 70

Algorithm 1: randomized response

• Consider two databases D, D’ that differ in the jth value:
- D[j] ≠ D’[j], D[i] = D’[i] for all i ≠ j

• Consider output O:
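
One way to carry out the step for output O, written as a worked derivation. It assumes that each person's value is perturbed independently and that p > 1/2; both are assumptions, not statements from the slide. All factors with i ≠ j cancel because D[i] = D’[i], so only the jth term remains.

```latex
\frac{\Pr[A(D) = O]}{\Pr[A(D') = O]}
  = \prod_{i} \frac{\Pr\big[O[i] \mid D[i]\big]}{\Pr\big[O[i] \mid D'[i]\big]}
  = \frac{\Pr\big[O[j] \mid D[j]\big]}{\Pr\big[O[j] \mid D'[j]\big]}
  \le \frac{p}{1-p}
  = e^{\varepsilon},
  \qquad \varepsilon = \ln\frac{p}{1-p}
```

Under these assumptions, p = 0.75 gives ε = ln 3 ≈ 1.1, and p closer to 1/2 gives a smaller ε at the cost of noisier reports.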

Page 73

Algorithm 1: randomized response

• Suppose n1 out of n people replied ‘yes’ and the rest said ‘no’. What is the best estimator for the true proportion of ‘yes’ values?
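
A sketch of one natural estimator in Python, offered as an illustration rather than as the slide's own answer. If π is the true fraction of ‘yes’ values, the expected reported fraction is pπ + (1-p)(1-π); solving for π gives the formula in `estimate_yes_fraction`. The sample sizes and p below are made up.

```python
import random


def randomized_response(values, p, rng):
    """Report each True/False value truthfully with probability p, flipped otherwise."""
    return [v if rng.random() < p else (not v) for v in values]


def estimate_yes_fraction(n1, n, p):
    """Unbiased estimate of the true 'yes' fraction (requires p != 0.5).

    Derived from E[n1 / n] = p * pi + (1 - p) * (1 - pi).
    """
    return (n1 / n - (1 - p)) / (2 * p - 1)


rng = random.Random(0)
truth = [True] * 6000 + [False] * 14000  # hypothetical data, true fraction 0.3
reported = randomized_response(truth, 0.75, rng)
estimate = estimate_yes_fraction(sum(reported), len(reported), 0.75)
print(round(estimate, 3))
```

The estimate gets noisier as p approaches 1/2, because the denominator 2p - 1 shrinks; this is the utility side of the privacy trade-off.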

Page 74

Algorithm 2: Laplace mechanism

[Figure: a researcher sends query q to the database, which returns q(D) + η, the true answer q(D) plus noise η]

Laplace Distribution – Lap(λ)
h(η) ∝ exp(-|η| / λ)
Mean: 0, Variance: 2λ²

Privacy depends on the λ parameter

Page 75

Laplace mechanism example

Qn: Release the histogram of admissions by diagnosis
Ans:
• Compute the true histogram
• Add noise to each count in the histogram using noise from Lap(1/ε)

For ε = 1, each noisy count is within ± 1.38 of the true count with probability about 0.75
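
A minimal Python sketch of this histogram release using only the standard library. The admission counts are made up, and the scale 1/ε assumes that each admission falls in exactly one diagnosis bin and that neighboring tables differ by one admission.

```python
import random


def laplace_noise(scale, rng):
    """Sample Lap(scale): the difference of two exponentials with mean `scale`."""
    return rng.expovariate(1 / scale) - rng.expovariate(1 / scale)


def noisy_histogram(counts, epsilon, rng):
    """Add independent Lap(1/epsilon) noise to every count."""
    return {key: count + laplace_noise(1 / epsilon, rng) for key, count in counts.items()}


rng = random.Random(0)
admissions = {"Flu": 120, "Heart": 45, "Cancer": 30}  # hypothetical true histogram
released = noisy_histogram(admissions, 1.0, rng)
print(released)

# Empirical check of the +/- 1.38 figure for epsilon = 1:
# P(|noise| <= t) = 1 - exp(-t), which is about 0.75 at t = 1.38.
samples = [laplace_noise(1.0, rng) for _ in range(20000)]
within = sum(abs(x) <= 1.38 for x in samples) / len(samples)
print(round(within, 3))
```

The noise does not depend on the size of the counts, so large bins are barely affected in relative terms while very small bins can be swamped.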

Page 76

DP Composition

Qn: Release 2 histograms of admissions (a) by diagnosis, and (b) by age
Ans:
• Compute the true histograms
• Add noise to each count in the histograms using noise from Lap(1/ε)

For ε = 1, noisy counts are within ± 1.38 of true counts in both histograms (each with probability about 0.75) … but total privacy loss is (1+1) = 2
→ satisfies 2-differential privacy
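
The bookkeeping behind this composition can be sketched as a small budget tracker in Python. The class name and interface are hypothetical; the only rule taken from the slide is that the ε values of successive releases add up.

```python
class PrivacyBudget:
    """Tracks total privacy loss under sequential composition (epsilons add up)."""

    def __init__(self, total_epsilon):
        self.total = total_epsilon
        self.spent = 0.0

    def spend(self, epsilon):
        """Charge one release; refuse it if the budget would be exceeded."""
        if self.spent + epsilon > self.total + 1e-12:
            raise RuntimeError("privacy budget exhausted")
        self.spent += epsilon
        return self.total - self.spent  # remaining budget


budget = PrivacyBudget(total_epsilon=2.0)
budget.spend(1.0)  # histogram by diagnosis, epsilon = 1
budget.spend(1.0)  # histogram by age, epsilon = 1
print(budget.spent)
```

With a total budget of 2, the two histograms use it up completely, so a third release at any positive ε would be refused.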

Page 77

Interactive vs. publishing model

Interactive setting:
- queries are answered depending on the remaining privacy budget

Non-interactive setting:
- queries or the query types are known in advance
- publishing synthetic data
- no limit on the number of queries

Page 78

Outline
• Why does naïve anonymization fail?

• How to ensure data analysis without privacy leakage?

• Applications & research direction

Page 79

Applications of DP: OnTheMap

• A Census application that plots commuting patterns of workers http://onthemap.ces.census.gov/

Page 80

Applications of DP: OnTheMap

• DP synthetic data, with noise-adding mechanism
• To compute Quarterly Workforce Indicators
– Total employment
– Average earnings
– New hires & separations
– Unemployment statistics

Page 81

Applications of DP: RAPPOR

• Randomized aggregatable privacy-preserving ordinal response
- crowdsource statistics from end-user client software (Chromium)
- based on Randomized Response

Erlingsson et al
http://www.chromium.org/developers/design-documents/rappor

Page 82

Other DP Applications & Research

• Network data
- Private analysis of graph structure, Karwa et al., ACM Trans. Database Syst., 2014

• Multiple entries
- Formal privacy protection for data products combining individual and employer frames, Haney et al., UNECE/Eurostat 2015

• Trajectory & location trajectories
- Geo-indistinguishability, Andrés et al., CCS 2013
- DPT, He et al., VLDB 2015

• DP + Security
- Root ORAM, Wagh et al., arXiv 2016
- Private record linkage, Cao et al., ICDE 2015

• Beyond DP
- Pufferfish privacy, Machanavajjhala, ACM Trans. Database Syst., 2014
- Blowfish privacy, He et al., SIGMOD 2015

Page 83

Summary

• “Data-driven” revolution has transformed many fields, but need to address the privacy problem
- The Massachusetts governor privacy breach
- AOL data publishing fiasco
- Facebook privacy violation

• Tools like differential privacy can foster ‘safe’ data collection, analysis and data sharing.
- K-anonymity
- L-diversity
- T-closeness
- Differential privacy

• More details on data privacy (see Ashwin’s other course)
