Int. J. Social Network Mining, Vol. 1, No. 1, 2012 51
Copyright © 2012 Inderscience Enterprises Ltd.
Anonymisation in social network: a literature survey and classification
Sanur Sharma*, Preeti Gupta and Vishal Bhatnagar
Ambedkar Institute of Technology, Geeta Colony, Delhi-110031, India
Fax: 91-11-22048044
*Corresponding author
Abstract: Social networks have gained importance in today's society because they bring people closer to each other. This closeness has its advantages, but it also has disadvantages in terms of security breaches, which can arise for many reasons. We survey the anonymisation techniques that are applied to social networks for privacy preservation, which is a pressing need in today's social network setup. We conducted an in-depth study of the existing literature from well-known international journals and arrived at a framework that will help researchers focus on specific and emerging areas in the application of anonymisation to privacy preservation in social networks.
Keywords: social network; privacy; anonymisation.
Reference to this paper should be made as follows: Sharma, S., Gupta, P. and Bhatnagar, V. (2012) ‘Anonymisation in social network: a literature survey and classification’, Int. J. Social Network Mining, Vol. 1, No. 1, pp.51–66.
Biographical notes: Sanur Sharma received her BTech in Computer Science and Engineering from GGSIPU University, Delhi, India in 2010 and is currently pursuing MTech in Information Security from GGSIPU University, Delhi, India. Her research interests include database, data warehouse, data mining, and social network analysis.
Preeti Gupta received her BTech in Computer Science and Engineering from Kurukshetra University, Haryana, India in 2001 and is currently pursuing MTech in Information Security from GGSIPU University, Delhi, India. She worked as a Lecturer from 2001 to 2005 and is currently working with the Government of India as a Scientist. Her research interests are network traffic analysis (botnets, social networks), tracking of cyber attacks/crimes and malware analysis.
Vishal Bhatnagar received his BTech in Computer Science and Engineering from Nagpur University in Nagpur, India in 1999 and MTech in Information Technology from Punjabi University, Patiala, India in 2005. He is an Assistant Professor in the Computer Science and Engineering Department at Ambedkar Institute of Technology (Government of Delhi), GGSIPU University, Delhi, India. His research interests include databases, advanced databases, data warehouses and data mining. He has been teaching for more than eight years. He has guided undergraduate and postgraduate students in various research projects on databases and data mining.
1 Introduction
A social network is a virtual community consisting of social structures made up of nodes/vertices that represent individuals and links that represent the relationships between them (Hinds and Lee, 2008). The users of a social network can be categorised into three types: passive members, who do not perform any activity; inviters, who encourage offline friends to join the social network; and linkers, who fully participate in the social evolution of the network (Kumar et al., 2006). The evolution of a social network is a three-step process involving node arrival, edge initiation and edge destination selection (Leskovec et al., 2008). Researchers and analysts study three main aspects of social networks: how a user joins a community or group, how the group grows, and how it changes over a period of time. These three questions are termed membership, growth and change (Backstrom et al., 2006).
Social networking has made public data easily available, because users have explicitly chosen to publish their links to others (Krishnamurthy and Wills, 2008). At the same time, users expect a level of privacy and control over their data. This has led to various privacy issues and challenges in social networks. Social networking sites are places where users not only post messages but also submit personal details such as age, e-mail ID and country at the time of registration. Various entities can access the information present on these sites, ranging from members of the group or network who are not friends to external applications. The situation is worsened by third-party advertisers, aggregators and crawlers that keep track of user activity and surfing habits (Bilge et al., 2009; Krishnamurthy and Wills, 2008). Adversaries can breach privacy in a number of ways, such as publication of specific information on the network to unintended recipients due to poorly understood defaults, accidental data release, intentional use of private data for marketing purposes by the social networking site, court orders and many more (Maiya and Berger-Wolf, 2009; Lucas and Borisov, 2008).
The hard fact about social networking is that it is difficult to know and control how an adversary could gather and use private or sensitive information, implicitly or explicitly (Ding et al., 2010; Wondracek et al., 2010; Krishnamurthy and Wills, 2008). General privacy risks associated with social networking include stalking; re-identification, such as demographic re-identification and face re-identification; and digital dossiers, which could reveal sensitive information such as current partners, political views and more (Gross and Acquisti, 2005).
Since the utility and importance of social networks cannot be neglected, privacy must be preserved in such a way that the utility of the data is maintained and analysts can still use it ethically. A balance needs to be maintained between privacy and utility (Krishnamurthy and Wills, 2008).
Social network sites like Facebook and Xing allow users to share all kinds of information, which can reveal private and personal details about the users of the network. Apart from this, the data is provided to researchers and third-party applications for analysing and studying trends using analytical tools such as data mining. So, in order to preserve the privacy of individuals in a social network, the data should be hidden in such a way that an unauthorised party cannot infer anything from the published data (Cormode and Srivastava, 2009) while an authorised party can analyse the data without any security breach. Anonymisation is one of the methods most often used for achieving security in real-life scenarios: it prevents the disclosure of sensitive information and decreases the success rate of attacks such as context-aware phishing and context-aware spam. A recent privacy breach on Facebook resulted in the leakage of personal information of 100 million users, which was published on Pirate Bay, the world’s largest file-sharing website (Pattaya Daily News, 2010). In another security breach, reported by the WSJ, some of the most popular applications on the social networking site, including Farmville, were leaking users’ unique ID numbers to advertisers; these IDs could be used to look up any user’s name, regardless of their profile privacy settings (Business ETC, 2011).
Much research has been done on solving the privacy issues of social networks using anonymisation techniques. In this paper, we classify this research to formulate a framework that can guide researchers who wish to work in this area. However, research on dynamic social networks is still in its infancy, which encourages us to study in depth both the static and dynamic aspects of implementing anonymisation in social networks. This is the motivation for our paper. This paper is organised as follows: Section 2 presents the research methodology. Section 3 outlines the classification method and framework, specifying the dimensions and the various approaches. Section 4 discusses the classification of articles. Section 5 presents research implications. Section 6 discusses the limitations of the study and Section 7 concludes by presenting some directions for future research on privacy preservation in social networks using anonymisation.
2 Research methodology
Because research on social networks and privacy is difficult to confine to specific disciplines, the relevant material is scattered across various journals and conferences. Anonymisation is the most common approach studied in research on the preservation of privacy in social networks.
To provide a comprehensive bibliography of the academic literature on anonymisation in social networks the following online journal and conference databases were searched:
• ACM
• Springer
• IEEE.
Each article was carefully reviewed and separately classified according to the three categories of social network dimensions and four approaches used for anonymisation of social networks, as shown in Table 1. Although this search was not exhaustive, it serves as a comprehensive base for an understanding of anonymisation in social networks.
The methodology adopted for the classification is based primarily on our rigorous study of the literature on social networks and anonymisation. This study revealed that an adversary targets three basic dimensions in a social network, namely identity disclosure, link disclosure and content disclosure, and that various anonymisation techniques are required to protect social network data from an adversary. Our classification framework and article classification are based primarily on these disclosures, the background knowledge possessed by an adversary and the anonymisation techniques.
3 Classification method
According to Liu and Terzi (2008) and Liu et al. (2008), privacy disclosure on released social network data consists of three dimensions:
1 identity disclosure
2 link disclosure
3 content disclosure.
These three dimensions cover all the attacks that an attacker could carry out on released social network data. To achieve complete privacy protection, all three dimensions should be considered. However, no single privacy-preserving technique achieves privacy protection for all three dimensions (Liu et al., 2008). Each privacy-preserving technique can protect against one of the above-mentioned disclosures. According to Tinabo et al. (2009), pseudonymisation (use of false names) and anonymisation (without names) are the main techniques for privacy protection. However, it is difficult to provide privacy simply by replacing names with false ones (Zhou and Pei, 2008; Liu and Terzi, 2008), so our paper focuses mainly on anonymisation techniques for privacy protection and does not discuss basic technologies such as ciphers. Anonymisation techniques can be classified into the following four approaches (Cormode and Srivastava, 2009; Zhou and Pei, 2008; Zheleva and Getoor, 2007):
1 clustering
2 clustering with constraints
3 modification of graph
4 hybrid.
The above four categories broadly cover the various anonymisation techniques. According to Zhou and Pei (2008), anonymising social network data is much more difficult and complicated than anonymising relational data, for various reasons. One major factor worth mentioning is that, unlike relational data, in which the major association is based on quasi-identifiers, in a social network many pieces of information can be used to identify individuals, such as labels of vertices and edges, neighbourhood graphs, induced sub-graphs and their combinations (Kleinberg, 2007; Zheleva and Getoor, 2007). This information corresponds to the background knowledge that an attacker possesses and may use in launching an attack. In fact, the various anonymisation techniques consider only some of the background knowledge, and combinations of it, that an attacker could possess and use. Here are some examples of background knowledge that are generally considered (Zhou et al., 2008):
1 vertex properties
2 vertex degree
3 sensitive attributes
4 link relationship
5 neighbourhood information
6 structural properties
7 graph metrics
8 sub-graphs.
Figure 1 Classification framework for anonymisation in social networks (see online version for colours)
A graphical classification framework for anonymisation techniques in social networks is proposed and shown in Figure 1. It is based on a review of the literature on anonymisation techniques in social networks. The literature review on major privacy-preserving techniques helped us to identify the major privacy dimensions and the techniques applied to them in social networks. This framework is also based on the research conducted by Liu and Terzi (2008) and Liu et al. (2008), who described the major privacy disclosure dimensions for social networks as identity disclosure, link disclosure and content disclosure. In addition, Cormode and Srivastava (2009), Zhou et al. (2008) and Zheleva and Getoor (2007) described the various approaches for anonymisation in social networks as clustering, clustering with constraints, modification and hybrid. We provide a brief description of these dimensions and approaches, with some references for further details, in the following sections.
3.1 Classification framework – SN privacy disclosures dimensions
In this study, privacy disclosures account for all the risk factors associated with released social network data. Detailed knowledge of all the dimensions is required in order to take preventive measures against unauthorised disclosure. The three dimensions of SN privacy disclosure are (Liu and Terzi, 2008; Liu et al., 2008):
1 Identity disclosure: Identity disclosure refers to revealing the identity of the individual who is associated with a node. The identity disclosure problem occurs when social network data is released publicly or to a third party and could then be analysed further by an attacker. Simple naive anonymisation (removing the personally identifying information or replacing it with a pseudorandom name) may not always guarantee privacy protection and could be susceptible to active and passive attacks (Backstrom et al., 2007). The situation gets even worse when the attacker has background knowledge. The attacker could use different types of queries for re-identification, such as vertex refinement queries, sub-graph queries and hub fingerprint queries (Hay et al., 2008).
2 Link disclosure: Link disclosure refers to the disclosure of relationships between targets. These relationships could be sensitive. The link disclosure problem occurs when some structural information is leaked or can be inferred from observed relationships or node attributes (Song et al., 2009; Zheleva and Getoor, 2007; Liben-Nowell and Kleinberg, 2003). Simple naive anonymisation may not always guarantee privacy protection against link disclosure and could be susceptible to active (inserted sub-graph knowledge) and passive attacks (Narayanan and Shmatikov, 2009; Backstrom et al., 2007). Another important scenario for link disclosure is the case in which users do not reveal their sensitive relationships, yet it is still possible to infer some or all of the link relationships using other non-anonymous nodes, which are compromised or bribed by the attacker to reveal the sensitive links (Korolova et al., 2008). Korolova et al. (2008) also mentioned that the number of non-anonymous nodes that need to be bribed decreases exponentially with an increase in look-ahead (edges seen).
3 Content disclosure: Content disclosure refers to the disclosure of data associated with the target, such as GPRS information, mail and telephone calls. This is possible by linking or matching various sets of released data (Ur and Ganapathy, 2009; Sweeney, 2002). Privacy preservation for content disclosure could be achieved by applying standard privacy-preserving techniques such as perturbation and k-anonymisation, where identity and attributes are represented as a table (Aggarwal and Yu, 2008).
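The weakness of naive anonymisation under identity disclosure can be illustrated with a minimal sketch. The graph, the names and the helper functions below are hypothetical and are not taken from any of the surveyed papers; the adversary is assumed to know only the degree of the target.

```python
import random
from collections import Counter

def naive_anonymise(edges, seed=0):
    """Replace node names with random integer pseudonyms; the structure is unchanged."""
    nodes = sorted({n for edge in edges for n in edge})
    ids = list(range(len(nodes)))
    random.Random(seed).shuffle(ids)
    mapping = dict(zip(nodes, ids))
    return [(mapping[u], mapping[v]) for u, v in edges], mapping

def degrees(edges):
    """Count how many edges touch each node."""
    deg = Counter()
    for u, v in edges:
        deg[u] += 1
        deg[v] += 1
    return deg

def candidates_by_degree(anon_edges, known_degree):
    """Pseudonyms whose degree matches the adversary's background knowledge."""
    return sorted(n for n, d in degrees(anon_edges).items() if d == known_degree)

# Hypothetical toy network.
edges = [("alice", "bob"), ("alice", "carol"), ("alice", "dave"), ("bob", "carol")]
anon_edges, mapping = naive_anonymise(edges)

# The adversary knows only that the target has three friends.
hits = candidates_by_degree(anon_edges, 3)
print(hits, mapping["alice"])
```

Because only one node in this toy graph has degree three, the candidate list contains a single pseudonym and the target is re-identified even though all names were removed.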
3.2 Classification framework – anonymisation approaches
Within the context of disclosures in a social network, anonymisation approaches can be seen as a process aimed at the preservation of privacy in social networks. For this, the data should be anonymised before its release. These anonymisation approaches should consider the privacy data models and the utility of the data (Zhou et al., 2008). We now broadly classify the anonymisation approaches that can be applied to social network data as follows.
1 Clustering: A clustering-based method clusters vertices and edges into groups and anonymises a sub-graph into a super-vertex (Zhou et al., 2008). Just as generalisation is used to hide an individual’s identity in relational data, clustering can be used to hide it in social network data. According to Zhou et al. (2008), clustering approaches can be further classified into vertex clustering methods, edge clustering methods, vertex and edge clustering methods and vertex-attribute mapping clustering methods.
2 Clustering with constraints: The cluster-edge anonymisation with constraints technique creates edges between equivalence classes, but it requires the equivalence class nodes to have the same constraints as any two nodes in the original data (Zheleva and Getoor, 2007).
3 Modification of graph: This approach makes use of insertion, deletion and/or swapping of some nodes and edges in a social network. It also includes perturbation or random modification and greedy graph modifications (Zhou et al., 2008).
4 Hybrid approach: This approach combines any of the above. There are various instances where a combination of clustering and graph modification has been used to achieve privacy (Zou et al., 2009; Zhou and Pei, 2010; Tripathy and Panda, 2010).
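As a rough illustration of the clustering approach, the following sketch collapses a graph into super-vertices and publishes only cluster sizes and edge counts. The cluster assignment is assumed to be given; how the clusters are chosen is the substance of the surveyed methods and is not modelled here. All names are hypothetical.

```python
from collections import Counter

def cluster_graph(edges, cluster_of):
    """Summarise a graph as super-vertices.

    Returns the size of each cluster, the number of edges inside each
    cluster and the number of edges between each pair of clusters.
    """
    sizes = Counter(cluster_of.values())
    intra = Counter()
    inter = Counter()
    for u, v in edges:
        cu, cv = cluster_of[u], cluster_of[v]
        if cu == cv:
            intra[cu] += 1
        else:
            inter[tuple(sorted((cu, cv)))] += 1
    return dict(sizes), dict(intra), dict(inter)

# Hypothetical graph and a hand-picked assignment of vertices to clusters.
edges = [(1, 2), (2, 3), (1, 3), (3, 4), (4, 5), (5, 6)]
cluster_of = {1: "A", 2: "A", 3: "A", 4: "B", 5: "B", 6: "B"}

sizes, intra, inter = cluster_graph(edges, cluster_of)
print(sizes, intra, inter)
```

Only the aggregate counts are released, so an individual vertex is hidden among the other members of its super-vertex, at the cost of losing the exact structure inside each cluster.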
4 Classification of the articles
A detailed distribution of the articles classified in accordance with the proposed framework is shown in Table 1. In a social network, many pieces of information are relevant to privacy preservation. This information includes the various disclosures, anonymisation approaches, background knowledge and anonymisation methods. Table 1 provides a brief summary of the articles and shows which of these each article covers.
Table 1 Distribution of articles according to the proposed classification model

Disclosure | Anonymisation approach | Background information | Anonymisation algorithm/method | Reference
Identity disclosure | Clustering | Vertex properties | Vertex-attribute mapping | Cormode et al. (2008)
Identity disclosure | Clustering | Vertex degree, neighbourhood | Bounded t-means algorithm and union split algorithm | Thompson and Yao (2009)
Identity disclosure | Clustering | Sub-graph properties | - | He et al. (2009)
Identity disclosure | Clustering | Sensitive attributes | p-sensitive k-anonymity | Ford et al. (2009)
Identity disclosure | Clustering with constraints | - | - | -
Identity disclosure | Modification of graph | Vertex degree, sub-graph properties | k-candidate anonymity and graph randomisation | Hay et al. (2007)
Identity disclosure | Modification of graph | Structural properties (vertex structure) | Topological anonymity | Singh and Zhan (2007)
Identity disclosure | Modification of graph | Vertex degree | k-degree anonymisation with minimum edge disclosure | Liu and Terzi (2008)
Identity disclosure | Modification of graph | Vertex properties, sub-graph properties | - | Hay et al. (2008)
Identity disclosure | Modification of graph | Neighbourhood | k-neighbourhood anonymity and graph isomorphism | Zhou and Pei (2008)
Identity disclosure | Modification of graph | Vertex label, vertex degree, neighbourhood | - | Wei and Lu (2008)
Identity disclosure | Modification of graph | Vertex degree, sub-graph | Randomisation | Ying et al. (2009)
Identity disclosure | Modification of graph | Vertex properties (vertex attributes), neighbourhood weight distribution | Weighted graph anonymisation | Li and Shen (2010a)
Identity disclosure | Modification of graph | Vertex properties (vertex attributes) | - | Li and Shen (2010b)
Identity disclosure | Modification of graph | Sub-graph properties | k-isomorphism | Cheng et al. (2010)
Identity disclosure | Hybrid | Vertex degree and structure properties | Basic inter clustering matching method and extended inter clustering matching methods | Thompson and Yao (2009)
Identity disclosure | Hybrid | Vertex attributes, structure properties | k-automorphism (k-match algorithm) | Zou et al. (2009)
Identity disclosure | Hybrid | Vertex properties, neighbourhood | k-anonymity method | Zhou and Pei (2010)
Identity disclosure | Hybrid | Neighbourhood, vertex properties | k-anonymity of sub-graphs | Tripathy and Panda (2010)
Link disclosure | Clustering | Vertex attributes, edge existence and structural properties, sensitive attributes | Edge clustering | Zheleva and Getoor (2007)
Link disclosure | Clustering | Neighbourhood, vertex properties | Vertex and edge clustering | Campan and Truta (2008)
Link disclosure | Clustering with constraints | Vertex attributes, edge existence and structural properties, sensitive attributes | Edge clustering | Zheleva and Getoor (2007)
Link disclosure | Modification of graph | Vertex properties | - | Liben-Nowell and Kleinberg (2003)
Link disclosure | Modification of graph | Structural properties and mode structure | Topological anonymity | Singh and Zhan (2007)
Link disclosure | Modification of graph | Sensitive link relationship | - | Zheleva and Getoor (2007)
Link disclosure | Modification of graph | Vertex degree, structural properties | Randomisation | Ying and Wu (2008)
Link disclosure | Modification of graph | Sensitive links, vertex proximity (vertex properties) | Randomisation | Ying and Wu (2009)
Link disclosure | Modification of graph | Vertex degree, vertex properties, structural properties | - | Zhang and Zhang (2009)
Link disclosure | Modification of graph | Vertex degree, sub-graph | Randomisation | Ying et al. (2009)
Link disclosure | Modification of graph | Structural properties, vertex properties | - | Backstrom and Leskovec (2011)
Link disclosure | Modification of graph | Sub-graph properties | k-isomorphism | Cheng et al. (2010)
Link disclosure | Hybrid | - | - | -
Content disclosure | Clustering | Sensitive attributes | p-sensitive k-anonymity | Ford et al. (2009)
Content disclosure | Clustering | Graph metrics to measure sensitive attribute values | p+-sensitive k-anonymity and (p, α)-sensitive k-anonymity | Sun et al. (2011)
Content disclosure | Modification of graph | - | k-anonymity | Sweeney (2002)
Content disclosure | Modification of graph | Sensitive attributes | l-diversity | Machanavajjhala et al. (2006)
Content disclosure | Modification of graph | Sensitive edge attributes | Linear programming model (LP) | Das et al. (2010a, 2010b)
Content disclosure | Hybrid | - | - | -
Almost all the techniques mentioned above for the anonymisation of social networks are centred around k-anonymity and randomisation.
4.1 k-anonymity
k-anonymity is a technique by which the information of an individual in published data cannot be distinguished from that of at least k − 1 other individuals in the same data (Sweeney, 2002). If the data is not properly anonymised, an adversary can re-identify individuals by linking the published data with external data (Sweeney, 2002). Though it was initially introduced for relational data, k-anonymity was later applied to social network data with some variations.
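For relational data, the definition above amounts to checking that every combination of quasi-identifier values is shared by at least k records. The sketch below is a minimal check under that reading; the table, the attribute names and the function name are hypothetical.

```python
from collections import Counter

def is_k_anonymous(records, quasi_identifiers, k):
    """True if every quasi-identifier combination occurs in at least k records."""
    groups = Counter(tuple(r[q] for q in quasi_identifiers) for r in records)
    return all(count >= k for count in groups.values())

# Hypothetical released table with generalised age ranges.
records = [
    {"age": "20-29", "country": "IN", "illness": "flu"},
    {"age": "20-29", "country": "IN", "illness": "cold"},
    {"age": "30-39", "country": "IN", "illness": "flu"},
    {"age": "30-39", "country": "IN", "illness": "asthma"},
]

print(is_k_anonymous(records, ["age", "country"], 2))
print(is_k_anonymous(records, ["age", "country"], 3))
```

Each (age, country) combination appears twice, so the table satisfies 2-anonymity but not 3-anonymity with respect to these quasi-identifiers.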
Hay et al. (2007) discussed naive anonymisation of social network data, in which the nodes are renamed and the structure of the social network graph is not modified. This cannot ensure the privacy of an individual, as an adversary could re-identify the target with some background knowledge. Depending on the background knowledge that the attacker is assumed to have, different authors have presented different variants of k-anonymity, as follows:
1 k-candidate anonymity: To ensure anonymity, the adversary should have a minimum level of uncertainty about the re-identification of any node in the graph (Hay et al., 2007). This can be achieved by k-candidate anonymity, which is a generalisation of the k-anonymity proposed by Sweeney (2002). k-candidate anonymity requires that any individual be indistinguishable from at least k − 1 other individuals in the graph. That is, no individual can be identified with a probability greater than 1/k (Hay et al., 2007).
2 k-degree anonymity: Liu and Terzi (2008) studied k-degree anonymity, in which every node has at least k − 1 other nodes in the graph with the same degree. This prevents re-identification of individuals by adversaries with prior knowledge of the degree of certain nodes. The authors defined a graph anonymisation problem that, given a graph G, asks for a k-degree anonymous graph that stems from G with the minimum number of graph modifications (Liu and Terzi, 2008).
3 k-neighbourhood anonymity: Zhou and Pei (2008) identified an essential type of privacy attack: the neighbourhood attack. If an adversary has some knowledge about the neighbours of a target victim and the relationships among the neighbours, the victim may be re-identified in a social network even if the victim’s identity is protected using conventional anonymisation techniques. Liu et al. (2008) defined k-neighbourhood anonymity as, “a node is k-anonymous in a graph G if there are at least (k – 1) other nodes v1,…,vk−1 ∈ VG such that the sub graphs constructed by the neighbours of each node v1,…,vk−1 are all isomorphic”. A graph satisfies k-neighbourhood anonymity if all the nodes are k-anonymous as defined above.
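The k-degree condition is the simplest of these variants to verify, and it also gives the 1/k bound of k-candidate anonymity for an adversary whose only background knowledge is the degree of the target. The sketch below checks an edge list; it is an illustrative check only, not the anonymisation algorithm of Liu and Terzi (2008), and it assumes that isolated nodes do not occur because they cannot be represented in an edge list. Function names are hypothetical.

```python
from collections import Counter

def degree_classes(edges):
    """Map each degree value to the number of nodes that have that degree."""
    deg = Counter()
    for u, v in edges:
        deg[u] += 1
        deg[v] += 1
    return Counter(deg.values())

def is_k_degree_anonymous(edges, k):
    """True if every node shares its degree with at least k - 1 other nodes."""
    return all(n >= k for n in degree_classes(edges).values())

def worst_case_reidentification(edges):
    """Highest probability of picking the target among nodes of equal degree."""
    return 1 / min(degree_classes(edges).values())

cycle = [(0, 1), (1, 2), (2, 3), (3, 0)]   # every node has degree 2
star = [(0, 1), (0, 2), (0, 3)]            # the centre has a unique degree

print(is_k_degree_anonymous(cycle, 4), worst_case_reidentification(cycle))
print(is_k_degree_anonymous(star, 2), worst_case_reidentification(star))
```

In the star graph the centre is the only node of degree three, so a degree-aware adversary identifies it with certainty, whereas in the cycle every node hides among four candidates.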
4.2 Randomisation
Ying and Wu (2008) pointed out that graph perturbation provides protection from structural attacks but introduces structural changes that result in information loss. Ying and Wu (2008) proposed two randomisation techniques for privacy protection in a social network graph:
1 Random add/del technique: In this technique, false edges are added to the network graph at random, followed by the deletion of the same number of true edges, so that the number of edges remains unchanged.
Ying et al. (2009) compared this approach with k-anonymity and found that the rand add/del technique provides protection from both identity and link disclosure, whereas k-anonymity, although it preserves more structural properties, can protect only against identity disclosure.
2 Random switch edges: In this technique, pairs of edges are swapped such that the nodes’ degrees remain unchanged.
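Both randomisation techniques can be sketched on an undirected edge set as follows. This is a minimal illustration under stated assumptions: nodes are integers, edges are stored as ordered pairs, and the function names, the retry limit and the example graph are hypothetical rather than taken from Ying and Wu (2008).

```python
import random
from collections import Counter

def norm(u, v):
    """Store an undirected edge with its smaller endpoint first."""
    return (u, v) if u <= v else (v, u)

def degrees(edge_set):
    deg = Counter()
    for u, v in edge_set:
        deg[u] += 1
        deg[v] += 1
    return deg

def rand_add_del(nodes, edges, m, rng):
    """Delete m true edges and add m false edges; the edge count is unchanged."""
    edge_set = {norm(u, v) for u, v in edges}
    non_edges = [norm(u, v) for i, u in enumerate(nodes)
                 for v in nodes[i + 1:] if norm(u, v) not in edge_set]
    removed = rng.sample(sorted(edge_set), m)
    added = rng.sample(non_edges, m)
    return (edge_set - set(removed)) | set(added)

def rand_switch(edges, swaps, rng, max_tries=1000):
    """Swap endpoints of edge pairs: (a, b), (c, d) become (a, d), (c, b)."""
    edge_set = {norm(u, v) for u, v in edges}
    done = tries = 0
    while done < swaps and tries < max_tries:
        tries += 1
        (a, b), (c, d) = rng.sample(sorted(edge_set), 2)
        if len({a, b, c, d}) < 4:
            continue  # a shared endpoint would create a self-loop or duplicate
        new1, new2 = norm(a, d), norm(c, b)
        if new1 in edge_set or new2 in edge_set:
            continue  # the swapped edge already exists
        edge_set -= {(a, b), (c, d)}
        edge_set |= {new1, new2}
        done += 1
    return edge_set

rng = random.Random(42)
nodes = list(range(6))
edges = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 5), (0, 5)]

perturbed = rand_add_del(nodes, edges, 2, rng)
switched = rand_switch(edges, 2, rng)
print(len(perturbed), degrees(switched) == degrees({norm(u, v) for u, v in edges}))
```

The add/del variant preserves only the number of edges, while the switch variant also preserves every node's degree, which is why the two differ in how much structural information survives.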
The k-anonymity and randomisation techniques mentioned above cater only for identity and link disclosures in a social network. Below are two privacy techniques that are also used to prevent attribute disclosure.
1 l-diversity: Machanavajjhala et al. (2006) discussed two attacks on k-anonymity that cause severe privacy problems: first, the lack of diversity in sensitive attributes and second, the background knowledge attack. To overcome these two privacy problems, the authors proposed a powerful privacy definition called l-diversity.
l-diversity provides privacy even when the data publisher does not know what kind of knowledge the adversary possesses. The main idea behind l-diversity is the requirement that the values of the sensitive attributes be well represented in each group (Machanavajjhala et al., 2006). To achieve l-diversity in social networks, Zhou and Pei (2010) introduced the l-diverse partition, in which vertices are partitioned into equivalence groups such that, in every equivalence group, at most 1/l of the vertices are associated with the most frequent sensitive label. The authors defined the l-diverse partition as follows: given a social network G = (V, E) with n vertices, where each vertex is associated with a non-sensitive label and a sensitive label, an l-diverse partition divides the vertices V into m equivalence groups such that freq(c) / |EG| ≤ 1/l for every group, where freq(c) is the number of vertices that carry the most frequent sensitive label c in group EG, and |EG| is the number of vertices in that group.
2 p-sensitive k-anonymity: The p-sensitive k-anonymity model has been defined as a refinement of k-anonymity. This property requires that there be at least p distinct values for each sensitive attribute within the records sharing a combination of key attributes (Sun et al., 2011). To overcome the shortcomings of the p-sensitive k-anonymity principle, Sun et al. (2011) proposed and empirically investigated two enhanced models: p+-sensitive k-anonymity and (p, α)-sensitive k-anonymity. Instead of publishing the original specific sensitive attributes, the new models publish the categories that the sensitive values belong to.
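Both conditions above can be checked directly from their definitions. The sketch below tests the inequality freq(c) / |EG| ≤ 1/l for a given partition and the p-sensitive k-anonymity property for a given table. It only verifies the conditions and does not construct a partition; groups are assumed to be non-empty, and all function names and data are hypothetical.

```python
from collections import Counter, defaultdict

def is_l_diverse_partition(groups, sensitive_label, l):
    """Check freq(c) / |EG| <= 1/l for every (non-empty) equivalence group."""
    for group in groups:
        counts = Counter(sensitive_label[v] for v in group)
        most_frequent = max(counts.values())
        # freq(c) / |EG| <= 1/l  is equivalent to  freq(c) * l <= |EG|
        if most_frequent * l > len(group):
            return False
    return True

def is_p_sensitive_k_anonymous(records, key_attrs, sensitive_attrs, p, k):
    """Each key-attribute group needs >= k records and >= p distinct
    values for every sensitive attribute."""
    groups = defaultdict(list)
    for r in records:
        groups[tuple(r[a] for a in key_attrs)].append(r)
    for members in groups.values():
        if len(members) < k:
            return False
        for s in sensitive_attrs:
            if len({m[s] for m in members}) < p:
                return False
    return True

# Hypothetical vertex partition with one sensitive label per vertex.
groups = [[1, 2, 3, 4], [5, 6]]
sensitive_label = {1: "flu", 2: "cold", 3: "flu", 4: "asthma", 5: "flu", 6: "cold"}
print(is_l_diverse_partition(groups, sensitive_label, 2))
print(is_l_diverse_partition(groups, sensitive_label, 3))

# Hypothetical released table.
records = [
    {"age": "20-29", "illness": "flu"},
    {"age": "20-29", "illness": "cold"},
    {"age": "30-39", "illness": "flu"},
    {"age": "30-39", "illness": "flu"},
]
print(is_p_sensitive_k_anonymous(records, ["age"], ["illness"], p=2, k=2))
```

The table is 2-anonymous on age, yet the second group carries a single illness value, so it fails the p = 2 requirement; this is the lack-of-diversity problem that both models are meant to address.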
More recently some other approaches have been proposed for achieving anonymity in released social network data such as:
1 Social-k: Beach et al. (2010b) proposed that it makes more sense to change social network APIs, or to provide an API with anonymity, than to anonymise the complete existing social network data. They noted that traditional k-anonymity requires anonymity across the entire dataset and thus proposed a new approach to k-anonymity, which states that “given a partial release of data from a personal dataset, wherein all data is quasi-identifiable, the released data must map to at least k distinct sets of individuals within the dataset”. Data is released without modification as long as the k-anonymity constraints are met and is otherwise selectively withheld. This contrasts with existing approaches, which release modified data, either distorted or generalised, to maintain k-anonymity.
2 q-anon: Beach et al. (2010a) proposed an anonymity model, q-anon, which measures the probability of an attacker logically deducing previously unknown information from a social network API while assuming that the data being protected may already be public information. The authors defined an interactive data release model for use by a social network API, which can provide anonymity without tying it to the attacker’s background knowledge. This data release model, named ‘q-anon’, provided better privacy than traditional anonymity methods. q-anon works by increasing the ambiguity in released data to protect it from re-identification attacks. Privacy is measured in terms of q, for which larger values represent greater ambiguity. q is measured by finding all unique user groups that could have accounted for the released data and then finding the largest fraction of those groups that include any one user. q is defined as the reciprocal of this fraction (Beach et al., 2010a).
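The measurement of q described above can be sketched as follows, assuming that the set of user groups that could have accounted for a release has already been enumerated. That enumeration is the hard part of the model and is not shown; the function name and the example groups are hypothetical.

```python
from collections import Counter
from fractions import Fraction

def q_value(candidate_groups):
    """Reciprocal of the largest fraction of candidate groups sharing one user."""
    groups = {frozenset(g) for g in candidate_groups}   # keep unique groups only
    counts = Counter(user for g in groups for user in g)
    largest_fraction = Fraction(max(counts.values()), len(groups))
    return 1 / largest_fraction

# Four hypothetical groups that could each explain the released data.
ambiguous = [{"a", "b"}, {"a", "c"}, {"b", "d"}, {"c", "d"}]
# Two groups that both contain user "a".
exposed = [{"a", "b"}, {"a", "c"}]

print(q_value(ambiguous))
print(q_value(exposed))
```

In the first release no user appears in more than half of the candidate groups, giving q = 2. In the second, user "a" appears in every candidate group, so q falls to 1 and an attacker can be sure that "a" contributed to the released data.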
5 Research implications
We reviewed the literature on anonymisation to find the nuggets of information pertaining to its use in social networks. The classification of the various studies will help researchers to identify prospective directions for future research. It will open up areas where the application of anonymisation will be helpful in providing secure access in social networks. Researchers who are already working on providing secure access to social networks will learn about the latest trends and directions of research. The in-depth study of such literature also helps to gain a better understanding of what the next steps should be towards providing secure access in social networks.
6 Limitations
All the anonymisation techniques mentioned in this paper strive to maintain a balance between the utility and privacy of sensitive data, which could lead to various scalability issues such as:
1 all these techniques are based on the static nature of social networks and do not consider their temporal nature, where republication of data could help an adversary to identify information
2 these techniques assume only some of the background knowledge that an adversary could possess; however, the attacker may use different background knowledge to launch an attack
3 it is very difficult to select which is the best technique to achieve anonymisation of social network data, as it varies from situation to situation.
7 Conclusions and future research
Application of anonymisation techniques for preservation of privacy in social networks is an emerging trend in the industry. It has attracted the attention of practitioners and academics alike. This paper has identified articles on the various anonymisation approaches for preservation of privacy in social networks, published between 2002 and 2011. It aims to give a research summary of the anonymisation approaches which are most often used. Although this review cannot claim to be exhaustive, it does provide reasonable insight into this subject. Possible directions for future work on applying anonymisation in social networks are:
• Research on the privacy preserving techniques will increase in the future based on the increasing interest in social networking.
• The majority of the reviewed articles relate to only a single type of disclosure and assume only some of the background knowledge which could be possessed by adversaries. However, the attacker may utilise various types of background knowledge to achieve his objective. Future work could thus examine the feasibility of combining various anonymisation techniques to achieve better privacy than the existing techniques.
• Most of the anonymisation techniques for social networks are based on the static nature of social network data. There are relatively few articles which consider the dynamic nature of social networks. Future research could be aimed at the applicability of dynamic techniques such as m-invariance and m-distinct for privacy preservation in dynamic releases of social network data.
References
Aggarwal, C.C. and Yu, P.S. (2008) ‘Privacy-preserving data mining: models and algorithms’, Advances in Database Systems, Vol. 34, pp.10–32, Springer, New York, NY, USA.
Backstrom, L. and Leskovec, J. (2011) ‘Supervised random walks: predicting and recommending links in social networks’, WSDM 2011, Proceedings of Fourth International Conference on Web Search and Data Mining, pp.635–644.
Backstrom, L., Dwork, C. and Kleinberg, J. (2007) ‘Wherefore art thou r3579x? Anonymized social networks, hidden patterns, and structural steganography’, International Conference on World Wide Web (WWW).
Backstrom, L., Huttenlocher, D., Kleinberg, J. and Lan, X. (2006) ‘Group formation in large social networks: membership, growth, and evolution’, KDD’06: Proceedings of the 12th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, New York, NY, USA, pp.44–54.
Beach, A., Gartrell, M. and Han, R. (2010a) ‘q-anon: rethinking anonymity for social networks’, 2nd International Conference on Social Computing (SocialCom).
Beach, A., Gartrell, M. and Han, R. (2010b) ‘Social-k: real-time k-anonymity guarantees for social network applications’, IEEE International Workshop on Security and Social Networking (SESOC), PerCom.
Bilge, L., Strufe, T., Balzarotti, D. and Kirda, E. (2009) ‘All your contacts are belong to us: automated identity theft attacks on social networks’, 18th International Conference on World Wide Web (WWW).
Business ETC (2011) Facebook Security Breach Allowed Advertisers Access to User Data, available at http://businessetc.thejournal.ie/facebook-security-breach-allowed-advertisers-access-to-user-data-134549-May2011/ (accessed on August 2011).
Campan, A. and Truta, T. (2008) ‘A clustering approach for data and structural anonymity in social networks’, Proc. SIGKDD Intl. Workshop on Privacy, Security, and Trust in KDD (PinKDD’08).
Cheng, J., Fu, A.W.C. and Liu, J. (2010) ‘K-isomorphism: privacy preserving network publication against structural attacks’, SIGMOD 2010, pp.459–470.
Cormode, G. and Srivastava, D. (2009) ‘Anonymized data: generation, models, usage’, Proc. SIGMOD Conference, pp.1015–1018.
Cormode, G., Srivastava, D., Yu, T. and Zhang, Q. (2008) ‘Anonymizing bipartite graph data using safe groupings’, Proceedings of the 34th International Conference on Very Large Databases (VLDB’08).
Das, S., Egecioglu, Ö. and Abbadi, A.E. (2010a) ‘Anonimos: an LP based approach for anonymizing weighted social network graphs’, Presented at CoRR.
Das, S., Egecioglu, Ö. and Abbadi, A.E. (2010b) ‘Anonymizing weighted social network graphs’, IEEE ICDE Conference.
Ding, X., Zhang, L., Wan, Z. and Gu, M. (2010) ‘A brief survey on de-anonymization attacks in online social networks’, Computational Aspects of Social Networks (CASoN), 2010 International Conference, pp.611–615.
Ford, R., Truta, T.M. and Campan, A. (2009) ‘P-sensitive k-anonymity for social networks’, Proc. DMIN, pp.403–409.
Gross, R. and Acquisti, A. (2005) ‘Information revelation and privacy in online social networks (the Facebook case)’, Workshop on Privacy in the Electronic Society (WPES), pp.71–80.
Hay, M., Miklau, G., Jensen, D., Towsley, D. and Weis, P. (2008) ‘Resisting structural reidentification in anonymized social networks’, VLDB.
Hay, M., Miklau, G., Jensen, D., Weis, P. and Srivastava, S. (2007) Anonymizing Social Networks, University of Massachusetts Technical Report.
He, X., Vaidya, J., Shafiq, B., Adam, N.R. and Atluri, V. (2009) ‘Preserving privacy in social networks: a structure-aware approach’, Web Intelligence, pp.647–654.
Hinds, H. and Lee, R.M. (2008) ‘Social network structure as a critical success condition for virtual communities’, HICSS ‘08: Proceedings of the 41st Annual Hawaii International Conference on System Sciences, IEEE Computer Society, Washington, DC, USA.
Kleinberg, J.M. (2007) ‘Challenges in mining social network data: processes, privacy, and paradoxes’, KDD, pp.4–5.
Korolova, A., Motwani, R., Nabar, S.U. and Xu, Y. (2008) ‘Link privacy in social networks’, International Conference on Data Engineering (ICDE), pp.1355–1357.
Krishnamurthy, B. and Wills, C.E. (2008) ‘Characterizing privacy in online social networks’, Proceedings of the Workshop on Online Social Networks in conjunction with ACM SIGCOMM Conference, pp.37–42, ACM, Seattle, WA USA.
Kumar, R., Novak, J. and Tomkins, A. (2006) ‘Structure and evolution of online social networks’, KDD, pp.611–617.
Leskovec, J., Backstrom, L., Kumar, R. and Tomkins, A. (2008) ‘Microscopic evolution of social networks’, ACM SIGKDD.
Li, Y. and Shen, H. (2010a) ‘Anonymizing graphs against weight-based attacks’, IEEE International Conference on Data Mining Workshops, pp.491–498.
Li, Y. and Shen, H. (2010b) ‘On identity disclosure in weighted graphs’, IEEE The 11th International Conference on Parallel and Distributed Computing, Applications and Technologies.
Liben-Nowell, D. and Kleinberg, J. (2003) ‘The link prediction problem for social networks’, CIKM ‘03: Proceedings of the Twelfth International Conference on Information and Knowledge Management, pp.556–559, ACM, New York, NY, USA.
Liu, K. and Terzi, E. (2008) ‘Towards identity anonymization on graphs’, SIGMOD.
Liu, K., Das, K., Grandison, T. and Kargupta, H. (2008) ‘Privacy preserving data analysis on graphs and social networks’, Next Generation of Data Mining, Chapter 21, pp.419–437, CRC Press, USA.
Lucas, M. and Borisov, N. (2008) ‘flyByNight: mitigating the privacy risks of social networking’, WPES.
Machanavajjhala, A., Gehrke, J., Kifer, D. and Venkitasubramaniam, M. (2006) ‘l-diversity: Privacy beyond k-anonymity’, International Conference on Data Engineering (ICDE).
Maiya, A.S. and Berger-Wolf, T.Y. (2009) ‘Inferring the maximum likelihood hierarchy in social networks’, Proceedings of the 12th IEEE International Conference on Computational Science and Engineering (CSE ‘09), pp.245–250, Vancouver, Canada.
Narayanan, A. and Shmatikov, V. (2009) ‘De-anonymizing social networks’, IEEE Symposium on Security and Privacy (S&P).
Pattaya Daily News (2010) Facebook Security Breach, Private Details Published on Pirate Bay, available at http://www.pattayadailynews.com/en/2010/07/29/facebook-security-breach-private-details-published-on-pirate-bay/ (accessed on August 2011).
Singh, L. and Zhan, J. (2007) ‘Measuring topological anonymity in social networks’, IEEE International Conference on Granular Computing.
Song, H.H., Cho, T.W., Dave, V., Zhang, Y. and Qiu, L. (2009) ‘Scalable proximity estimation and link prediction in online social networks’, Proc. Internet Measurement Conference, pp.322–335.
Sun, X., Sun, L. and Wang, H. (2011) ‘Extended k-anonymity models against sensitive attribute disclosure’, Presented at Computer Communications, pp.526–535.
Sweeney, L. (2002) ‘k-anonymity: a model for protecting privacy’, International Journal on Uncertainty, Fuzziness and Knowledge-based Systems, Vol. 10, No. 5, pp.557–570.
Thompson, B. and Yao, D. (2009) ‘The union-split algorithm and cluster-based anonymization of social networks’, Proceedings of the 4th International Symposium on Information, Computer and Communications Security (ASIACCS 2009), ACM, pp.218–227.
Tinabo, R., Mtenzi, F., O’Driscoll, C. and O’Shea, B. (2009) ‘Anonymisation vs. pseudonymisation: which one is most useful for both privacy protection and usefulness of e-healthcare data’, The 4th International Conference for Internet Technology and Secured Transactions (ICITST), London, UK, pp.1–6.
Tripathy, B.K. and Panda, G.K. (2010) ‘A new approach to manage security against neighborhood attacks in social networks’, ASONAM, pp.264–269.
Ur, B.E. and Ganapathy, V. (2009) ‘Evaluating attack amplification in online social networks’, Web 2.0 Security and Privacy.
Wei, Q. and Lu, Y. (2008) ‘Preservation of privacy in publishing social network data’, Proceedings of the 2008 International Symposium on Electronic Commerce and Security (ISECS 2008), IEEE Computer Society, pp.421–425.
Wondracek, G., Holz, T., Kirda, E. and Kruegel, C. (2010) A Practical Attack to De-anonymize Social Network Users, Tech. Rep. TRiSecLab-0110-001.
Ying, X. and Wu, X. (2008) ‘Randomizing social networks: a spectrum preserving approach’, SIAM International Conference on Data Mining (SDM), pp.739–750.
Ying, X. and Wu, X. (2009) ‘On link privacy in randomizing social networks’, PAKDD.
Ying, X., Pan, K., Wu, X. and Guo, L. (2009) ‘Comparisons of randomization and k-degree anonymization schemes for privacy preserving social network publishing’, The 3rd SNA-KDD Workshop, pp.1–10.
Zhang, L. and Zhang, W. (2009) ‘Edge anonymity in social network graphs’, CSE ‘09 Proceedings of the 2009 International Conference on Computational Science and Engineering, Vol. 04, Washington, DC, USA, pp.1–8.
Zheleva, E. and Getoor, L. (2007) ‘Preserving the privacy of sensitive relationships in graph data’, PinKDD, pp.153–171.
Zhou, B. and Pei, J. (2008) ‘Preserving privacy in social networks against neighborhood attacks’, IEEE 24th International Conference on Data Engineering, pp.506–515.
Zhou, B. and Pei, J. (2010) ‘The k-anonymity and l-diversity approaches for privacy preservation in social networks against neighborhood attacks’, Knowledge and Information Systems, Springer-Verlag, London.
Zhou, B., Pei, J. and Luk, W.S. (2008) ‘A brief survey on anonymization techniques for privacy preserving publishing of social network data’, SIGKDD Explorations, Vol. 10, No. 2, pp.12–22, NY, USA.
Zou, L., Chen, L. and Tamer Özsu, M. (2009) ‘K-automorphism: a general framework for privacy preserving network publication’, Proceedings of PVLDB, Vol. 2, No. 1, pp.946–957.