Date post: | 26-Dec-2015 |
Category: |
Documents |
Upload: | debra-griffith |
View: | 217 times |
Download: | 0 times |
PRIVACY-PRESERVING RELEASES OF SOCIAL NETWORKS
Chih-Hua Tai
Dept. of CSIE, National Taipei University, New Taipei City, Taiwan
DATA MINING
The primary task in data mining: development of models about aggregated data. Finding frequent patterns Finding rules
2
FINDING PATTERNS
3
FINDING RULES
4
DATA MINING VS. PRIVACY
The primary task in data mining: development of models about aggregated data. Finding frequent patterns Finding rules
Can we develop accurate models without access to precise information in individual data records? Why?
5
PRIVACY ISSUES IN DATA
6
PRIVACY ISSUES IN DATA
7
PRIVACY ISSUES IN DATA
8
DATA MINING AND PRIVACY
The primary task in data mining: development of models about aggregated data.
Can we develop accurate models without access to precise information in individual data records?
Answer: yes, by randomization. R. Agrawal, R. Srikant “Privacy Preserving Data
Mining,” SIGMOD 2000 How about the data utility?
9
DATA MINING VS. SOCIAL NETWORKS
10
Attributes: Name, Salary, …
Links: Friends, Neighborhood, …
Communities: Interests, Activities, …
PRIVACY ISSUES ON SOCIAL NETWORKS
Personal information leaked, even if the vertex identifies are hidden…
11
Many information can be used to re-associate the vertex with its identity.
Vertex degree: k-degree anonymity , …
Neighborhood configuration: k-neighborhood anonymity, k-automorphism anonymity, k-isomorphism anonymity, grouping-and- collapsing, …
13
C.-H. Tai, P. S. Yu, D.-N. Yang and M.-S. Chen, "Privacy-preserving Social Network Publication Against Friendship Attacks," In KDD, 2011.
FRIENDSHIP ATTACK
Still there are another type of information for vertex re-identification – friendship attack
14
FRIENDSHIP ATTACK
Given a target individual A and the degree pair information D2 = (d1,d2), a friendship attack (D2,A) exploits D2 to identify a vertex v1 corresponding to A in a published social network where v1 connects to another vertex v2 with the degree pair (dv1
,dv2) = (d1,d2).
15
10
1 2 3
4 5
Alice
6 7
9
8
Ex 1. Assume that an attacker knows that Alice has 3 connections, Bob has 2 connections, and Alice and Bob are friends. The attacker identifies v9 as Alice with 100% confidence.
FRIENDSHIP ATTACK
16
In DBLP data set, the percentages of vertices that can be re-identified with a probability larger than 1/k by degree and friendship attacks.
Original Social Network
k-degree anonymized Social Network
kDegree Attack Friendship Attack Friendship Attack
5 0.28% 5.37% 2.89%
10 0.53% 10.69% 4.65%
15 0.73% 14.71% 5.82%
20 0.93% 18.44% 7.23%
NEW PRIVACY MODEL AGAINST FRIENDSHIP ATTACK
k2-degree AnonymityA social network is k2-degree anonymous
if, for every vertex with an incident edge of degree pair (d1,d2), there exist at least k − 1 other vertices, each of which also has an incident edge of the same degree pair.
17
10
1 2 3
4 5
6 7
9
8
Ex 2. Even with the knowledge (D2,A)=((3,2),Alice), the probability that an attacker can re-identify Alice in the 22-degree anonymous social network is limited to ½.
THE ANONYMIZATION
Problem formulation: Given a graph G(V, E) and an integer k, the problem is to
anonymize G to satisfy k2-degree anonymity such that information distortion is minimized.
The challenges: Any alteration on an edge will affect the degrees of
two vertices.
18
GRAPH ANONYMIZATION ALGORITHMS
Integer Programming formulation Obtain the optimal solution with bad scalability
DEgree SEqence ANonymization (DESEAN) Step1. Degree Sequence Anonymization.
- determine the groups of vertices protecting each others
Step2. Privacy Constraint Satisfaction.- eliminate the advantage of knowing friendship information
Step3. Anonymous Degree Realization.- have the vertices in the same group share the same vertex degree 19
Step1. Degree Sequence Anonymization. Cluster vertices with similar degrees and select a
target degree dx for each cluster x s. t. each cluster contains at least k vertices and the weighted degree difference ω Σvϵx (dx - d v) + (1 − ω) Σvϵx (d v - dx) is as small as possible.
ALGORITHM DESEAN
20
10
1 2 3
4 5
6 7
9
8
Ex 3. Given k = 2 and ω = 0.5.
ALGORITHM DESEAN
Step1. Degree Sequence Anonymization. Cluster vertices with similar degrees and select a
target degree dx for each cluster x s. t. each cluster contains at least k vertices and the weighted degree difference ω Σvϵx (dx - d v) + (1 − ω) Σvϵx (d v - dx) is as small as possible.
21
10
1 2 3
4 5
6 7
9
8
10
1 2 3
4 5
6 7
9
8
Ex 3. Given k = 2 and ω = 0.5.
Step2. Privacy Constraint Satisfaction. Add or delete edges between clusters to ensure
that, for each pair of clusters (x,y), the number of vertices in x directly connected to the vertices in y is either zero or not less than k.
ALGORITHM DESEAN
22
10
1 2 3
4 5
6 7
9
8
Ex 3. Given k = 2 and ω = 0.5.
Step2. Privacy Constraint Satisfaction. Add or delete edges between clusters to ensure
that, for each pair of clusters (x,y), the number of vertices in x directly connected to the vertices in y is either zero or not less than k.
ALGORITHM DESEAN
23
10
1 2 3
4 5
6 7
9
8
10
1 2 3
4 5
6 7
9
8
Ex 3. Given k = 2 and ω = 0.5.
Step3. Anonymous Degree Realization. Adjust edges in G s. t. the vertices in each
cluster x meet the target degree dx selected in Step 1.
10
1 2 3
4 5
6 7
9
8
ALGORITHM DESEAN
24
Ex 3. Given k = 2 and ω = 0.5.
Step3. Anonymous Degree Realization. Adjust edges in G s. t. the vertices in each
cluster x meet the target degree dx selected in Step 1.
10
1 2 3
4 5
6 7
9
8
10
1 2 3
4 5
6 7
9
8
ALGORITHM DESEAN
25
Ex 3. Given k = 2 and ω = 0.5.
26
C.-H. Tai, P. S. Yu, D.-N. Yang, and M.-S. Chen, ”Structural diversity for privacy in publishing social networks,” In SDM, 2011.
COMMUNITY IDENTIFICATION
Vertex identification is considered to be an important privacy issue in publishing social networks. ◦ k-degree anonymity, k-neighborhood anonymity, …
In addition to a vertex identity, each individual is also associated with a community identity. ◦ Could be used to infer the political party affiliation or disease
information sensitive to the public.◦ Is a kind of structural information
27
COMMUNITY IDENTIFICATION
Community information is explicitly given:
Ex.
Alice knows recently… Bob is sick Bob participates in this social network Bob makes 5 friends. (vertex degree attack)
Bob has AIDS!
AIDS Com. SLE Com.
28
COMMUNITY IDENTIFICATION
Community information is not given:
Ex.
Alice knows Bob participates in this social network and has 5 friends. (vertex degree attack)
Alice can know the approximation of Bob’s neighborhood.
29
COMMUNITY IDENTIFICATION
% of vertices violating k-structural diversity◦ (a) DBLP◦ (b) ca-CondMat
k-degree anonymization is insufficient 30
COMMUNITY IDENTIFICATION
structural diversity◦ (a) original DBLP◦ (b) 10-degree anonymized DBLP
Vertices with large degrees appear in a small set of communities 31
NEW PRIVACY MODEL AGAINST COMMUNITY IDENTIFICATION
k-Structural Diversity To protect against vertex degree attack, for each vertex, there
should be other vertices with the same degree located in at least k-1 other communities.
If a graph satisfies k-structural diversity, then it also satisfies k-degree anonymity.
32
THE ANONYMIZATION
Problem formulation: Given a graph G(V, E, C) and an integer k, 1 k |C|, the ≦ ≦
problem is to anonymize G to satisfy k-structural diversity such that information distortion is minimized.
The challenges: How to preserve community structures, even in
the implicit cases, while preserving privacy.
33
PROBLEM FORMULATION
Operation Adding Edge ◦ Connect two vertices belonging to the same community.◦ Can avoid destroying the communities.
34
PROCEDURE MERGENCE
◦ To protect a vertex v in an existing anonymous group, in which all the vertices have the same degree d
Com. 1
v
Com. 2
Com. 335
PROCEDURE MERGENCE
◦ To protect a vertex v in an existing anonymous group, in which all the vertices have the same degree d
Com. 1
v
Com. 2
Com. 336
PROCEDURE CREATION
To create a new anonymous group for a vertex v, such that all the vertices in the group locate in at least k difference communities and have the same degree as v
Com. 1 Com. 2 Com. 3
v
37
PROCEDURE CREATION
To create a new anonymous group for a vertex v, such that all the vertices in the group locate in at least k difference communities and have the same degree as v
Com. 1 Com. 2 Com. 3
v
38
THE EDGE-REDITECTION MECHANISM
◦ Is defined on w, v, x in the same community w: an anonymized vertex v and x : two not-yet-anonymized vertices
◦ Is to replace the edge (w, v) with the edge (w, x)
v w
v w
x
39
THE EDGE-REDITECTION MECHANISM
By mergence
v w
Com. 1 Com. 2
Com. 3
Q. Why needs the Edge-Reditection mechanism?
By mergence
v w
Com. 1 Com. 2
Com. 3
40
THE EDGE-REDITECTION MECHANISM
By creation
Com. 1
Com. 2 Com. 3
v w
Q. Why needs the Edge-Reditection mechanism?
By creation
Com. 1
Com. 2 Com. 3
v w
41
ALGORITHM EDGECONNECT (EC)
Procedures Mergence and Creation◦ Let Rv be the set of edges that could be redirected away from v
The Edge-Reditection mechanism
42
PROBLEM FORMULATION
Operation Adding Edge ◦ Connect two vertices belonging to the same community.◦ Can avoid destroying the communities.
Operation Splitting Vertex◦ Replace a vertex v with a set of substitute vertices, such that
each substitute vertex is connected with at least one edge incident to v originally.
◦ Each substitute vertex presents partial truth of the vertex v.
43
PROCEDURE CREATIONBYSPLIT
Split a set of vertices, including v, to create a new anonymous group
Com. 1 Com. 2
Com. 3
v
Com. 1 Com. 2
Com. 3
v1v2
44
PROCEDURE MERGEBYSPLIT
Split v into a set of substitute vertices s.t. each substitute vertex is protected in some existing anonymous group
Com. 1 Com. 2
Com. 3 v
v1
v2
v3
45
46
C.-H. Tai, P.-J. Tseng, P. S. Yu and M.-S. Chen, "Identities Anonymization in Dynamic Social Networks," In ICDM-11, 2011.
THE PROBLEM IN DYNAMIC SCENARIOS…
A dynamic social network will be sequentially released.
An attacker can monitor a victim for a period w.
Therefore, the adversary knowledge includes: The releases G t-w+1,
G t-w+2, …, G t during w A degree sequence Δv
w=(dvt-
w+1, dv
t-w+2, …, dvt) of a victim v
during w47
G2G1
Ex. John has two friends at time 1, and three friends at time 2.
PRIVACY MODEL: KW-STRUCTURAL DIVERSITY ANONYMITY
Base case of w=1 A group θd, consisting of all
vertices of degree d, is a k-shielding group if there is a vertex subset θ ⊆ θd s. t. (1) |θ |≥ k, and (2) any two vertices u and v in θ,
Cv ∩ Cv = ø, where C is the community identity.
48
The adversary knowledge includes: 1. The release
social network G t 2. A degree
sequence Δv
1=(dvt) of a
victim v
Ex. Mary has four friends.
???
G
PRIVACY MODEL: KW-STRUCTURAL DIVERSITY ANONYMITY
Dynamic scenarios of w>1 A consistent group ΘΔ is the set of vertices
that always share the same degree during w.
A consistent group ΘΔ is a k-shielding if
at each time instant t in w, there is a vertex subset Θ t
⊆ ΘΔ s. t. (1) |Θ t |≥ k, and (2) any two vertices u and v in Θ t, Cv
t ∩ Cv t = ø, where
C t is the community identity at time t.
49
The adversary knowledge of includes: 1. The releases G t-
w+1, G t-w+2, …, G t during w
2. A degree sequence Δv
w=(dvt-w+1, dv
t-
w+2, …, dvt) of a
victim v during w
…
THE ANONYMIZATION
Problem formulation: Suppose that every vertex in a series of sequential
releases G t-w+1, G t-w+2, …, G t-1 is protected. Given G t-w+1, G t-w+2, …, G t-1 and k, anonymize the
current social network Gt s. t. every vertex is protected in a k-shielding consistent group.
The challenges: The anonymization is depended on not only the
current social network but also previous w-1 releases.
Searching through all the w-1 releases to eliminate privacy leak is time consuming.
50
ANONYMIZATION ALGORITHM
Construct CS (Clustering Sequence) -Table to prevent the search through w graphs.
CS-Table Summary the vertex information in w-1 previous
releases. Fetch v’ info. in w graphs without scanning the graphs.
According to the degree sequence during w, sort the vertices in hierarchical clustering manner. Vertices in the same k-consistent shielding group are close in CS-Table. CS-Table can be incrementally updated.
According to the vertex ranking in CS-Table, anonymize each vertex to be protected.
51
THANK YOU~!