Anonymisation in social network: a literature survey and classification

Int. J. Social Network Mining, Vol. 1, No. 1, 2012

Copyright © 2012 Inderscience Enterprises Ltd.


Sanur Sharma*, Preeti Gupta and Vishal Bhatnagar
Ambedkar Institute of Technology, Geeta Colony, Delhi-110031, India
Fax: 91-11-22048044
E-mail: [email protected]
E-mail: [email protected]
E-mail: [email protected]
*Corresponding author

Abstract: Social networks have gained importance in today’s society because they bring people closer to each other. This closeness has advantages, but it also exposes users to security and privacy breaches, which can arise for many reasons. We survey the anonymisation techniques that are applied to social networks for privacy preservation, a pressing need in today’s social network setup. Based on an in-depth study of the existing literature from well-known international journals, we propose a framework that will help researchers focus on specific and emerging areas in the application of anonymisation to privacy preservation in social networks.

Keywords: social network; privacy; anonymisation.

Reference to this paper should be made as follows: Sharma, S., Gupta, P. and Bhatnagar, V. (2012) ‘Anonymisation in social network: a literature survey and classification’, Int. J. Social Network Mining, Vol. 1, No. 1, pp.51–66.

Biographical notes: Sanur Sharma received her BTech in Computer Science and Engineering from GGSIPU University, Delhi, India in 2010 and is currently pursuing MTech in Information Security from GGSIPU University, Delhi, India. Her research interests include database, data warehouse, data mining, and social network analysis.

Preeti Gupta received her BTech in Computer Science and Engineering from Kurukshetra University, Haryana, India in 2001 and is currently pursuing MTech in Information Security from GGSIPU University, Delhi, India. She worked as a Lecturer from 2001 to 2005 and is currently working with the Government of India as a Scientist. Her research interests are network traffic analysis (botnets, social networks), tracking of cyber attacks/crimes and malware analysis.

Vishal Bhatnagar received his BTech in Computer Science and Engineering from Nagpur University in Nagpur, India in 1999 and MTech in Information Technology from Punjabi University, Patiala, India in 2005. He is an Assistant Professor in the Computer Science and Engineering Department at Ambedkar Institute of Technology (Government of Delhi), GGSIPU University, Delhi, India. His research interests include database, advanced database, data warehouse and data mining. He has been teaching for more than eight years. He has guided undergraduate and postgraduate students in various research projects on databases and data mining.


1 Introduction

A social network is a virtual community consisting of social structures made up of nodes (vertices) that represent individuals and links that represent the relationships between them (Hinds and Lee, 2008). The users of a social network can be categorised into three types: passive members, who do not perform any activity; inviters, who encourage offline friends to join the social network; and linkers, who fully participate in the social evolution of the network (Kumar et al., 2006). The evolution of a social network is a three-step process involving node arrival, edge initiation and edge destination selection (Leskovec et al., 2008). Researchers and analysts study three main aspects of social networks: how a user joins a community or group, how the group evolves, and how it changes over a period of time. These three questions are also termed membership, growth and change (Backstrom et al., 2006).

Social networking has made public data easily available, since users have explicitly chosen to publish their links to others (Krishnamurthy and Wills, 2008). At the same time, users expect a level of privacy and control over their data. This has led to various privacy issues and challenges in social networks. Social networking sites are places where users not only post messages but also submit personal details such as age, e-mail ID and country at the time of registration. Various entities can access the information present on these sites, ranging from members of the group or network who are not friends to external applications. The situation is worsened by third-party advertisers, aggregators and crawlers, which keep track of user activity and surfing habits (Bilge et al., 2009; Krishnamurthy and Wills, 2008). Privacy can be breached in a number of ways, such as publication of specific information to unintended recipients because of poorly understood defaults, accidental data release, intentional use of private data for marketing purposes by the social networking site, court orders and many more (Maiya and Berger-Wolf, 2009; Lucas and Borisov, 2008).

The hard fact about social networking is that it is difficult to know and control how private or sensitive information could be gathered and used, implicitly or explicitly, by an adversary (Ding et al., 2010; Wondracek et al., 2010; Krishnamurthy and Wills, 2008). General privacy risks associated with social networking include stalking, re-identification (such as demographic re-identification and face re-identification) and the digital dossier, which could reveal sensitive information such as current partners, political views and more (Gross and Acquisti, 2005).

As the utility and importance of social networks cannot be neglected, privacy needs to be preserved in such a way that the utility of the data is maintained and the data can still be used ethically by analysts. A balance needs to be maintained between privacy and utility (Krishnamurthy and Wills, 2008).

Social network sites like Facebook and Xing allow users to share all kinds of information, which can reveal private and personal details about the users of the network. Apart from this, the data is provided to various researchers and third-party applications for analysing and studying trends using analytical tools like data mining. So, in order to preserve the privacy of individuals in a social network, the data should be hidden in such a way that an unauthorised party cannot infer anything from the published data (Cormode and Srivastava, 2009) while an authorised party can analyse the data without any security breach. Anonymisation is one of the methods most commonly used for achieving security in real-life scenarios: it prevents the disclosure of sensitive information and decreases the success rate of attacks such as context-aware phishing and context-aware spam. A recent privacy breach on Facebook resulted in the leakage of personal information of 100 million users, which was published on Pirate Bay, the world’s largest file sharing website (Pattaya Daily News, 2010). Another security breach, reported by the WSJ, occurred when some of the most popular applications on the social networking site, including Farmville, were leaking users’ unique ID numbers to advertisers; these IDs could be used to look up any user’s name, regardless of their profile privacy settings (Business ETC, 2011).

A great deal of research has been done on solving the privacy issues of social networks using anonymisation techniques. In this paper, we classify this research and formulate a framework that can guide researchers who wish to work in this area. Research on dynamic social networks, however, is still in its infancy, which encourages us to study both the static and the dynamic aspects of applying anonymisation to social networks. This is the motivation for our paper. The paper is organised as follows: Section 2 presents the research methodology. Section 3 outlines the classification method and framework, specifying the dimensions and the various approaches. Section 4 discusses the classification of articles. Section 5 presents research implications. Section 6 discusses the limitations of the study, and Section 7 concludes by presenting some directions for future research on privacy preservation in social networks using anonymisation.

2 Research methodology

As research on social networks and privacy is difficult to confine to specific disciplines, the relevant material is scattered across various journals and conferences. Anonymisation is the approach most commonly studied for the preservation of privacy in social network research.

To provide a comprehensive bibliography of the academic literature on anonymisation in social networks the following online journal and conference databases were searched:

• ACM

• Springer

• IEEE.

Each article was carefully reviewed and separately classified according to the three categories of social network dimensions and four approaches used for anonymisation of social networks, as shown in Table 1. Although this search was not exhaustive, it serves as a comprehensive base for an understanding of anonymisation in social networks.

The methodology adopted by the authors for the classification is based on our rigorous study of the literature on social networks and anonymisation. This study revealed that an adversary targets three dimensions of a social network, namely identity disclosure, link disclosure and content disclosure, and that various anonymisation techniques are required to protect social network data from such an adversary. Our classification framework and article classification are primarily based on these disclosures, the background knowledge possessed by an adversary and the anonymisation techniques.

3 Classification method

According to Liu and Terzi (2008) and Liu et al. (2008), privacy disclosure on released social network data consists of three dimensions:

1 identity disclosure

2 link disclosure

3 content disclosure.

These three dimensions cover all the attacks that an attacker could carry out on released social network data. To achieve complete privacy protection, all three dimensions should be considered. However, there is no single privacy-preserving technique that achieves privacy protection for all three dimensions (Liu et al., 2008); each privacy-preserving technique typically protects against one of the above-mentioned disclosures. According to Tinabo et al. (2009), pseudonymisation (use of false names) and anonymisation (without names) are the main techniques for privacy protection. However, it is difficult to provide privacy simply by replacing names with false ones (Zhou and Pei, 2008; Liu and Terzi, 2008), so our paper focuses mainly on anonymisation techniques for privacy protection and does not discuss basic technologies such as ciphers. Anonymisation techniques can be classified into the following four approaches (Cormode and Srivastava, 2009; Zhou and Pei, 2008; Zheleva and Getoor, 2007):

1 clustering

2 clustering with constraints

3 modification of graph

4 hybrid.

The above four categories broadly cover the various anonymisation techniques. According to Zhou and Pei (2008), anonymising social network data is much more difficult and complicated than anonymising relational data, for various reasons. One major factor is that, unlike relational data, in which identification is based mainly on quasi-identifiers, a social network offers many pieces of information that can be used to identify individuals, such as labels of vertices and edges, neighbourhood graphs, induced sub-graphs, and their combinations (Kleinberg, 2007; Zheleva and Getoor, 2007). This information corresponds to the background knowledge that an attacker possesses and may use to launch an attack. In fact, each anonymisation technique considers only some of the background knowledge, and combinations of it, that an attacker could possess and use. Here are some examples of background knowledge that are generally considered (Zhou et al., 2008):


1 vertex properties

2 vertex degree

3 sensitive attributes

4 link relationship

5 neighbourhood information

6 structural properties

7 graph metrics

8 sub-graphs.

Figure 1 Classification framework for anonymisation in social networks (see online version for colours)


A graphical classification framework for anonymisation techniques in social networks is proposed and shown in Figure 1. It is based on a review of the literature on anonymisation techniques in social networks. The literature review on the major privacy-preserving techniques in social networks helped us to identify the major privacy dimensions and the techniques applied to them. The framework is also based on the research conducted by Liu and Terzi (2008) and Liu et al. (2008), who described the major privacy disclosure dimensions for social networks as identity disclosure, link disclosure and content disclosure. In addition, Cormode and Srivastava (2009), Zhou et al. (2008) and Zheleva and Getoor (2007) described the approaches for anonymisation in social networks as clustering, clustering with constraints, modification and hybrid. We provide a brief description of these dimensions and approaches, with references for further details, in the following sections.

3.1 Classification framework – SN privacy disclosures dimensions

In this study, privacy disclosures account for all the risk factors associated with released social network data. Detailed knowledge of all the dimensions is required in order to take preventive measures against unauthorised disclosure. The three dimensions of SN privacy disclosure are (Liu and Terzi, 2008; Liu et al., 2008):

1 Identity disclosure: Identity disclosure refers to revealing the individual who is associated with a node. The identity disclosure problem occurs when social network data is released publicly or to a third party and could be used for further analysis by an attacker. Simple naive anonymisation (removing the personally identifying information or replacing it with a pseudorandom name) may not always guarantee privacy protection and could be susceptible to active and passive attacks (Backstrom et al., 2007). The situation gets even worse when the attacker has background knowledge. The attacker could use different types of queries for re-identification, such as vertex refinement queries, sub-graph queries and hub fingerprint queries (Hay et al., 2008).

2 Link disclosure: Link disclosure refers to the disclosure of relationships between the targets. These relationships could be sensitive. The link disclosure problem occurs when some structural information is leaked or can be inferred using observed relationships or node attributes (Song et al., 2009; Zheleva and Getoor, 2007; Liben-Nowell and Kleinberg, 2003). Simple naive anonymisation may not always guarantee privacy protection against link disclosure and could be susceptible to active attacks (inserted sub-graph knowledge) and passive attacks (Narayanan and Shmatikov, 2009; Backstrom et al., 2007). Another important scenario for link disclosure is the case in which users do not reveal their sensitive relationships, but it is still possible to infer some or all of the link relationships using other non-anonymous nodes, which are compromised or bribed by the attacker to reveal the sensitive links (Korolova et al., 2008). Korolova et al. (2008) also mentioned that the number of non-anonymous nodes that need to be bribed decreases exponentially with an increase in look-ahead (edges seen).


3 Content disclosure: Content disclosure refers to the disclosure of data associated with the target, such as GPRS information, mail and telephone calls. This is possible by linking or matching the various sets of released data (Ur and Ganapathy, 2009; Sweeney, 2002). Privacy preservation for content disclosure could be achieved by applying standard privacy-preserving techniques such as perturbation and k-anonymisation, where identities and attributes are represented as a table (Aggarwal and Yu, 2008).
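The linking step behind content disclosure can be illustrated with a minimal sketch. It assumes two hypothetical datasets, a released table without names and an external table with names, that share the quasi-identifiers age, country and city. The function name `link_records` and all field names and values are illustrative assumptions and are not taken from the cited works.

```python
from collections import defaultdict


def link_records(released, external, quasi_identifiers):
    """Link released records to named external records.

    A link is reported only when exactly one released record and exactly
    one external record share the same quasi-identifier values, i.e. when
    the match is unambiguous on both sides.
    """
    def key_of(row):
        return tuple(row[q] for q in quasi_identifiers)

    released_by_key = defaultdict(list)
    for row in released:
        released_by_key[key_of(row)].append(row)

    external_by_key = defaultdict(list)
    for row in external:
        external_by_key[key_of(row)].append(row)

    links = []
    for key, rows in released_by_key.items():
        people = external_by_key.get(key, [])
        if len(rows) == 1 and len(people) == 1:
            links.append((people[0]["name"], rows[0]))
    return links


# Hypothetical released data: names removed, content kept.
released = [
    {"age": 34, "country": "IN", "city": "Delhi", "content": "health forum posts"},
    {"age": 27, "country": "IN", "city": "Delhi", "content": "job search posts"},
    {"age": 27, "country": "IN", "city": "Delhi", "content": "political posts"},
]
# Hypothetical external data held by the adversary.
external = [
    {"name": "user_a", "age": 34, "country": "IN", "city": "Delhi"},
    {"name": "user_b", "age": 27, "country": "IN", "city": "Delhi"},
]

linked = link_records(released, external, ["age", "country", "city"])
for name, record in linked:
    print(name, "->", record["content"])
```

In this toy data only the 34-year-old is linked, because two released records share the quasi-identifier values of the 27-year-old. This ambiguity is the effect that k-anonymisation aims to enforce for every record.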

3.2 Classification framework – anonymisation approaches

Within the context of disclosures in a social network, anonymisation approaches can be seen as processes aimed at the preservation of privacy. For this purpose, the data should be anonymised before its release. Anonymisation approaches should consider both the privacy data models and the utility of the data (Zhou et al., 2008). We broadly classify the anonymisation approaches that can be applied to social network data as follows.

1 Clustering: A clustering-based method clusters vertices and edges into groups and anonymises a sub-graph into a super-vertex (Zhou et al., 2008). Just as generalisation is used to hide an individual’s identity in relational data, clustering can be used to hide it in social network data. According to Zhou et al. (2008), clustering approaches can be further classified into vertex clustering methods, edge clustering methods, vertex and edge clustering methods, and vertex-attribute mapping clustering methods.

2 Clustering with constraints: The cluster-edge anonymisation with constraints technique creates edges between equivalence classes, but it requires the nodes of the equivalence classes to satisfy the same constraints as any two nodes in the original data (Zheleva and Getoor, 2007).

3 Modification of graph: This approach inserts, deletes and/or swaps some nodes and edges in a social network. It also includes perturbation (random modification) and greedy graph modification (Zhou et al., 2008).

4 Hybrid approach: This approach combines any of the above. There are various instances where a combination of clustering and graph modification has been used to achieve privacy (Zou et al., 2009; Zhou and Pei, 2010; Tripathy and Panda, 2010).
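As a rough illustration of the clustering approach, the sketch below collapses groups of vertices into super-vertices and publishes only cluster sizes and edge counts. It assumes the assignment of vertices to clusters is already given. The function name `cluster_graph` and the example graph are hypothetical, and the sketch does not reproduce any specific algorithm from the cited papers.

```python
from collections import Counter


def cluster_graph(edges, cluster_of):
    """Collapse an undirected graph into super-vertices.

    edges      -- iterable of (u, v) pairs
    cluster_of -- dict mapping each vertex to its cluster label
    Returns (sizes, intra, inter):
      sizes[c]       number of vertices in cluster c
      intra[c]       number of edges with both endpoints in cluster c
      inter[(c, d)]  number of edges between clusters c and d (labels sorted)
    """
    sizes = Counter(cluster_of.values())
    intra, inter = Counter(), Counter()
    for u, v in edges:
        cu, cv = cluster_of[u], cluster_of[v]
        if cu == cv:
            intra[cu] += 1
        else:
            inter[tuple(sorted((cu, cv)))] += 1
    return sizes, intra, inter


# Hypothetical six-vertex graph split into two clusters of three.
edges = [("a", "b"), ("b", "c"), ("a", "c"), ("c", "d"), ("d", "e"), ("e", "f")]
cluster_of = {"a": "C1", "b": "C1", "c": "C1", "d": "C2", "e": "C2", "f": "C2"}

sizes, intra, inter = cluster_graph(edges, cluster_of)
print(dict(sizes), dict(intra), dict(inter))
```

Only the aggregate counts would be published, so an observer sees that cluster C1 holds three vertices and three internal edges but not which vertex is which. Larger clusters hide more but also discard more structure, which is the privacy and utility balance discussed above.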

4 Classification of the articles

A detailed distribution of the articles classified in accordance with the proposed framework is shown in Table 1. In a social network, many pieces of information are relevant to privacy preservation, including the disclosures considered, the anonymisation approaches, the background knowledge and the anonymisation methods. Table 1 provides a brief summary of the articles and indicates which of these each article covers.


Table 1 Distribution of articles according to the proposed classification model

| Disclosures | Anonymisation approaches | Background information | Anonymisation algorithm/method | References |
|---|---|---|---|---|
| Identity disclosure | Clustering | Vertex properties | Vertex-attribute mapping | Cormode et al. (2008) |
| Identity disclosure | Clustering | Vertex degree, neighbourhood | Bounded t-means algorithm and union split algorithm | Thompson and Yao (2009) |
| Identity disclosure | Clustering | Sub-graph properties | | He et al. (2009) |
| Identity disclosure | Clustering | Sensitive attributes | p-sensitive k-anonymity | Ford et al. (2009) |
| Identity disclosure | Clustering with constraints | | | |
| Identity disclosure | Modification of graph | Vertex degree, sub-graph properties | k-candidate anonymity and graph randomisation | Hay et al. (2007) |
| Identity disclosure | Modification of graph | Structural properties (vertex structure) | Topological anonymity | Singh and Zhan (2007) |
| Identity disclosure | Modification of graph | Vertex degree | k-degree anonymisation with minimum edge disclosure | Liu and Terzi (2008) |
| Identity disclosure | Modification of graph | Vertex properties, sub-graph properties | | Hay et al. (2008) |
| Identity disclosure | Modification of graph | Neighbourhood | k-neighbourhood anonymity and graph isomorphism | Zhou and Pei (2008) |
| Identity disclosure | Modification of graph | Vertex label, vertex degree, neighbourhood | | Wei and Lu (2008) |
| Identity disclosure | Modification of graph | Vertex degree, sub-graph | Randomisation | Ying et al. (2009) |
| Identity disclosure | Modification of graph | Vertex properties (vertex attributes), neighbourhood weight distribution | Weighted graph anonymisation | Li and Shen (2010a) |
| Identity disclosure | Modification of graph | Vertex properties (vertex attributes) | | Li and Shen (2010b) |
| Identity disclosure | Modification of graph | Sub-graph properties | k-isomorphism | Cheng et al. (2010) |
| Identity disclosure | Hybrid | Vertex degree and structure properties | Basic inter clustering matching method and extended inter clustering matching methods | Thompson and Yao (2009) |
| Identity disclosure | Hybrid | Vertex attributes, structure properties | k-automorphism (k-match algorithm) | Zou et al. (2009) |
| Identity disclosure | Hybrid | Vertex properties, neighbourhood | k-anonymity method | Zhou and Pei (2010) |
| Identity disclosure | Hybrid | Neighbourhood, vertex properties | k-anonymity of sub graphs | Tripathy and Panda (2010) |
| Link disclosure | Clustering | Vertex attributes, edge existence and structural properties, sensitive attributes | Edge clustering | Zheleva and Getoor (2007) |
| Link disclosure | Clustering | Neighbourhood, vertex properties | Vertex and edge clustering | Campan and Truta (2008) |
| Link disclosure | Clustering with constraints | Vertex attributes, edge existence and structural properties, sensitive attributes | Edge clustering | Zheleva and Getoor (2007) |
| Link disclosure | Modification of graph | Vertex properties | | Liben-Nowell and Kleinberg (2003) |
| Link disclosure | Modification of graph | Structural properties and mode structure | Topological anonymity | Singh and Zhan (2007) |
| Link disclosure | Modification of graph | Sensitive link relationship | | Zheleva and Getoor (2007) |
| Link disclosure | Modification of graph | Vertex degree, structural properties | Randomisation | Ying and Wu (2008) |
| Link disclosure | Modification of graph | Sensitive links, vertex proximity (vertex properties) | Randomisation | Ying and Wu (2009) |
| Link disclosure | Modification of graph | Vertex degree, vertex properties, structural properties | | Zhang and Zhang (2009) |
| Link disclosure | Modification of graph | Vertex degree, sub-graph | Randomisation | Ying et al. (2009) |
| Link disclosure | Modification of graph | Structural properties, vertex properties | | Backstrom and Leskovec (2011) |
| Link disclosure | Modification of graph | Sub-graph properties | k-isomorphism | Cheng et al. (2010) |
| Link disclosure | Hybrid | | | |
| Content disclosure | Clustering | Sensitive attributes | p-sensitive k-anonymity | Ford et al. (2009) |
| Content disclosure | Clustering | Graph metrics to measure sensitive attribute values | p+-sensitive k-anonymity and (p, α)-sensitive k-anonymity | Sun et al. (2011) |
| Content disclosure | Modification of graph | | k-anonymity | Sweeney (2002) |
| Content disclosure | Modification of graph | Sensitive attributes | l-diversity | Machanavajjhala et al. (2006) |
| Content disclosure | Modification of graph | Sensitive edge attributes | Linear programming model (LP) | Das et al. (2010a, 2010b) |
| Content disclosure | Hybrid | | | |


Almost all the techniques mentioned above for anonymisation of social network are centred around k-anonymity and randomisation.

4.1 k-anonymity

k-anonymity is a property of published data under which the information of each individual cannot be distinguished from that of at least k – 1 other individuals in the same data (Sweeney, 2002). If the data is not properly anonymised, individuals can be re-identified by linking the published data with external data acquired by an adversary (Sweeney, 2002). Although k-anonymity was initially introduced for relational data, it was later applied to social network data with some variations.
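For relational data, the definition above can be checked by grouping records on their quasi-identifier values and looking at the smallest group. The following is a minimal sketch under that reading of the definition. The function names `anonymity_level` and `is_k_anonymous` and the sample table are hypothetical.

```python
from collections import Counter


def anonymity_level(records, quasi_identifiers):
    """Return the largest k for which the table is k-anonymous.

    This is the size of the smallest group of records that share the same
    quasi-identifier values. An empty table has no groups, and 0 is
    returned for it here by convention.
    """
    groups = Counter(tuple(r[q] for q in quasi_identifiers) for r in records)
    return min(groups.values()) if groups else 0


def is_k_anonymous(records, quasi_identifiers, k):
    """True if every record shares its quasi-identifiers with at least k - 1 others."""
    return anonymity_level(records, quasi_identifiers) >= k


# Hypothetical table with generalised ages as quasi-identifiers.
table = [
    {"age": "30-39", "country": "IN", "interest": "cricket"},
    {"age": "30-39", "country": "IN", "interest": "chess"},
    {"age": "20-29", "country": "IN", "interest": "music"},
    {"age": "20-29", "country": "IN", "interest": "films"},
    {"age": "20-29", "country": "IN", "interest": "music"},
]

print(anonymity_level(table, ["age", "country"]))    # size of the smallest group
print(is_k_anonymous(table, ["age", "country"], 3))  # the 30-39 group has only 2 records
```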

Hay et al. (2007) discussed naive anonymisation of social network data, in which the nodes are renamed and the structure of the social network graph is not modified. This cannot ensure the privacy of an individual, as an adversary with some background knowledge could re-identify the target. Depending on the background knowledge that the attacker is assumed to have, different authors have presented different variants of k-anonymity, as follows:

1 k-candidate anonymity: To ensure anonymity, the adversary should have a minimum level of uncertainty about the re-identification of any node in the graph (Hay et al., 2007). This can be achieved by k-candidate anonymity, which is a generalisation of the k-anonymity proposed by Sweeney (2002). Under k-candidate anonymity, each individual is indistinguishable from at least k – 1 other candidates in the graph, so that no individual can be identified with a probability greater than 1/k (Hay et al., 2007).

2 k-degree anonymity: Liu and Terzi (2008) studied k-degree anonymity, in which every node has at least k – 1 other nodes in the graph with the same degree. This prevents re-identification of individuals by adversaries who have prior knowledge of the degree of certain nodes. The authors defined a graph anonymisation problem which, given a graph G, asks for a k-degree anonymous graph that stems from G with the minimum number of graph modifications (Liu and Terzi, 2008).

3 k-neighbourhood anonymity: Zhou and Pei (2008) identified an essential type of privacy attack, the neighbourhood attack. If an adversary has some knowledge about the neighbours of a target victim and the relationships among those neighbours, the victim may be re-identified in a social network even if the victim’s identity is protected using conventional anonymisation techniques. Liu et al. (2008) have defined k-neighbourhood anonymity as, “a node is k-anonymous in a graph G if there are at least (k – 1) other nodes v1,…,vk–1 ∈ VG such that the sub graphs constructed by the neighbours of each node v1,…,vk–1 are all isomorphic”. A graph satisfies k-neighbourhood anonymity if all of its nodes are k-anonymous as defined above.
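The degree-based variant is the simplest of these to check mechanically. The sketch below groups the vertices of a naively anonymised graph by degree, tests k-degree anonymity, and reports the worst-case re-identification probability for an adversary who knows only a target’s degree. The function names and the two small graphs are hypothetical. The sketch assumes an undirected graph given as an edge list with at least one edge, and it ignores isolated vertices because they do not appear in an edge list.

```python
from collections import Counter, defaultdict


def degree_candidate_sets(edges):
    """Group vertices by degree.

    An adversary who knows only the degree of a target cannot narrow the
    target down beyond the set of vertices with that degree.
    """
    degree = Counter()
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1
    candidates = defaultdict(set)
    for vertex, d in degree.items():
        candidates[d].add(vertex)
    return dict(candidates)


def is_k_degree_anonymous(edges, k):
    """True if every degree value is shared by at least k vertices."""
    return all(len(group) >= k for group in degree_candidate_sets(edges).values())


def worst_case_reidentification(edges):
    """Largest probability of picking the right vertex from its candidate set."""
    smallest = min(len(group) for group in degree_candidate_sets(edges).values())
    return 1 / smallest


# Naively anonymised graphs: labels are already replaced by integers.
path = [(1, 2), (2, 3), (3, 4)]   # degrees 1, 2, 2, 1
star = [(0, 1), (0, 2), (0, 3)]   # the centre has a unique degree

print(is_k_degree_anonymous(path, 2), worst_case_reidentification(path))
print(is_k_degree_anonymous(star, 2), worst_case_reidentification(star))
```

In the star graph the centre is the only vertex of degree 3, so renaming the vertices does not protect it. This is the weakness of naive anonymisation that the variants above address.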

4.2 Randomisation

Ying and Wu (2008) pointed out that graph perturbation provides protection from structural attacks, but it introduces structural changes that result in information loss. They proposed two randomisation techniques for privacy protection in a social network graph:


1 Random add/del technique: In this technique, false edges are added to the network graph at random, and then the same number of true edges are deleted, so that the total number of edges remains unchanged.

Ying et al. (2009) compared this approach with k-anonymity and found that the random add/del technique provides protection from both identity and link disclosure, whereas k-anonymity, although it preserves more structural properties, protects only against identity disclosure.

2 Random switch edges: In this technique, pairs of edges are swapped in such a way that the degree of every node remains unchanged.
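Both randomisation techniques can be sketched in a few lines for a simple undirected graph. The code below is an illustrative reading of the two descriptions above, not the implementation of Ying and Wu (2008). The function names, the `max_tries` parameter and the ring graph are assumptions made for the example, and the vertices are assumed to be mutually comparable so that each edge can be stored once in sorted order.

```python
import random
from collections import Counter


def normalise(u, v):
    """Store each undirected edge once, with its endpoints in sorted order."""
    return (u, v) if u <= v else (v, u)


def degree_sequence(edges):
    """Map each vertex to its degree."""
    degree = Counter()
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1
    return degree


def random_add_delete(nodes, edges, n_changes, rng):
    """Add n_changes false edges, then delete n_changes true edges.

    Raises ValueError if the graph has fewer than n_changes edges or
    fewer than n_changes missing edges.
    """
    original = {normalise(u, v) for u, v in edges}
    all_pairs = {normalise(u, v) for i, u in enumerate(nodes) for v in nodes[i + 1:]}
    non_edges = sorted(all_pairs - original)
    added = rng.sample(non_edges, n_changes)
    removed = rng.sample(sorted(original), n_changes)
    return (original - set(removed)) | set(added)


def random_switch(edges, n_switches, rng, max_tries=1000):
    """Swap endpoints of edge pairs: (a, b), (c, d) become (a, d), (c, b)."""
    current = {normalise(u, v) for u, v in edges}
    if len(current) < 2:
        return current
    done = 0
    for _ in range(max_tries):
        if done == n_switches:
            break
        (a, b), (c, d) = rng.sample(sorted(current), 2)
        new1, new2 = normalise(a, d), normalise(c, b)
        # Skip switches that would create a self-loop or a duplicate edge.
        if len({a, b, c, d}) < 4 or new1 in current or new2 in current:
            continue
        current -= {(a, b), (c, d)}
        current |= {new1, new2}
        done += 1
    return current


nodes = list(range(6))
ring = [(i, (i + 1) % 6) for i in range(6)]
rng = random.Random(0)  # fixed seed so that runs are repeatable

perturbed = random_add_delete(nodes, ring, 2, rng)
switched = random_switch(ring, 3, rng)
print(len(perturbed), degree_sequence(switched) == degree_sequence(ring))
```

Random add/del keeps only the edge count, while random switching also keeps every vertex degree. Switching therefore preserves more structure but leaves degree-based background knowledge intact.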

The k-anonymity and randomisation techniques mentioned above cater only for identity and link disclosure in a social network. The two privacy techniques below are also used to prevent attribute disclosure.

1 l-diversity: Machanavajjhala et al. (2006) discussed two attacks on k-anonymity that cause severe privacy problems: the first exploits a lack of diversity in the sensitive attributes, and the second exploits background knowledge. To overcome these two privacy problems, the authors proposed a powerful privacy definition called l-diversity.

l-diversity provides privacy even when the data publisher does not know what kind of knowledge the adversary possesses. The main idea behind l-diversity is the requirement that the values of the sensitive attributes are well represented in each group (Machanavajjhala et al., 2006). To achieve l-diversity in a social network, Zhou and Pei (2010) introduced the l-diverse partition, in which vertices are partitioned into equivalence groups such that, in every equivalence group, at most 1/l of the vertices are associated with the most frequent sensitive label. The authors defined the l-diverse partition in social networks as follows: given a social network G = (V, E) with n vertices, each associated with a non-sensitive label and a sensitive label, an l-diverse partition divides the vertices V into m equivalence groups such that, for every group, freq(c) / |EG| ≤ 1/l, where freq(c) is the number of vertices that carry the most frequent sensitive label c in group EG, and |EG| is the number of vertices in that equivalence group.

2 p-sensitive k-anonymity: The p-sensitive k-anonymity model has been defined as a refinement of k-anonymity. This property requires that there be at least p distinct values for each sensitive attribute within the records sharing a combination of key attributes (Sun et al., 2011). To overcome the shortcomings of the p-sensitive k-anonymity principle, Sun et al. (2011) proposed and empirically investigated two enhanced models: p+-sensitive k-anonymity and (p, α)-sensitive k-anonymity. Instead of publishing the original specific sensitive attributes, the new models publish the categories to which the sensitive values belong.
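The two conditions above reduce to simple counts over the sensitive labels in each equivalence group. The sketch below checks the l-diverse partition condition freq(c) / |EG| ≤ 1/l and the distinct-value part of p-sensitivity for a single sensitive attribute. It does not check the k-anonymity part of p-sensitive k-anonymity, and the function names and labels are hypothetical.

```python
from collections import Counter


def is_l_diverse_partition(groups, l):
    """Check freq(c) / |EG| <= 1/l for every equivalence group.

    groups -- list of lists, each holding the sensitive labels of the
              vertices in one equivalence group
    """
    for labels in groups:
        if not labels:
            continue  # an empty group imposes no constraint
        most_frequent = Counter(labels).most_common(1)[0][1]
        # Same inequality as freq(c) / |EG| <= 1/l, written without division.
        if most_frequent * l > len(labels):
            return False
    return True


def has_p_distinct_values(groups, p):
    """Check that every group contains at least p distinct sensitive values."""
    return all(len(set(labels)) >= p for labels in groups)


# Hypothetical partition with sensitive labels A, B and C.
partition = [
    ["A", "B", "C"],        # most frequent label covers 1/3 of the group
    ["A", "A", "B", "C"],   # most frequent label covers 2/4 of the group
]

print(is_l_diverse_partition(partition, 2))  # 1/3 <= 1/2 and 2/4 <= 1/2
print(is_l_diverse_partition(partition, 3))  # fails on the second group: 2/4 > 1/3
print(has_p_distinct_values(partition, 3))   # each group has 3 distinct labels
```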

More recently some other approaches have been proposed for achieving anonymity in released social network data such as:

1 Social-k: Beach et al. (2010b) argued that it makes more sense to change the social network APIs, or to provide an API with anonymity guarantees, than to anonymise the complete existing social network data. They noted that traditional k-anonymity requires anonymity across the entire dataset and thus proposed a new approach to k-anonymity, which states that “given a partial release of data from a personal dataset, wherein all data is quasi-identifiable, the released data must map to at least k distinct sets of individuals within the dataset”. Data is released without modification as long as the k-anonymity constraints are met and is otherwise selectively withheld. This contrasts with existing approaches, which release modified data, either distorted or generalised, to maintain k-anonymity.

2 q-anon: Beach et al. (2010a) proposed an anonymity model, q-anon, which measures the probability of an attacker logically deducing previously unknown information from a social network API while assuming that the data being protected may already be public information. The authors defined an interactive data release model for a social network API that provides anonymity without bounding it to the attacker’s background knowledge. According to the authors, this data release model provides better privacy than the traditional anonymity methods. q-anon works by increasing the ambiguity in released data to prevent re-identification attacks. Privacy is measured in terms of q, for which larger values represent greater ambiguity. To measure q, all unique user groups that could have accounted for the released data are found, and then the largest fraction of those groups that include any one user is determined; q is defined as the reciprocal of this fraction (Beach et al., 2010a).
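The measurement of q described above can be sketched as follows. The sketch assumes that the set of user groups consistent with the released data has already been enumerated. How that enumeration is done depends on the API and is not covered here. The function name `q_value` and the example groups are hypothetical.

```python
from collections import Counter
from fractions import Fraction


def q_value(consistent_groups):
    """Compute q from the user groups that could explain a data release.

    consistent_groups -- iterable of collections of user identifiers
    q is the reciprocal of the largest fraction of unique groups that
    contain any single user.
    """
    groups = {frozenset(g) for g in consistent_groups}  # keep unique groups only
    appearances = Counter(user for group in groups for user in group)
    if not appearances:
        raise ValueError("at least one non-empty user group is required")
    largest_fraction = Fraction(max(appearances.values()), len(groups))
    return 1 / largest_fraction


# Four groups could explain the release; each user appears in two of them.
ambiguous = [{"a", "b"}, {"a", "c"}, {"b", "d"}, {"c", "d"}]
# User "a" appears in every group, so the release pins "a" down.
exposed = [{"a", "b"}, {"a", "c"}]

print(q_value(ambiguous))  # largest fraction is 2/4
print(q_value(exposed))    # largest fraction is 2/2
```

A value of q equal to 1 means that some user is present in every consistent group, which is the least ambiguous case under this measure.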

5 Research implications

The authors conducted this literature review to gather information on the use of anonymisation in social networks. The classification of the studies in this area will help researchers to identify prospective directions for future research. It highlights the areas where the application of anonymisation will help to provide secure access to social networks. Researchers who are already working on secure access to social networks will learn about the latest trends and directions of research. An in-depth study of this literature also helps to gain a better understanding of what the next step towards secure access in social networks should be.

6 Limitations

All the anonymisation techniques mentioned in the paper strive to maintain a balance between the utility and the privacy of sensitive data, which could lead to various scalability issues such as:

1 all these techniques assume a static social network and do not consider its temporal nature, where republication of data could help an adversary to identify information

2 these techniques assume only some of the background knowledge which could be possessed by an adversary; however, the attacker may use different background knowledge to launch an attack

3 it is very difficult to select the best technique for anonymising social network data, as the choice varies from situation to situation.

7 Conclusions and future research

The application of anonymisation techniques for the preservation of privacy in social networks is an emerging trend in the industry. It has attracted the attention of practitioners and academics alike. This paper has identified various articles, published between 2002 and 2011, on anonymisation approaches for the preservation of privacy in social networks. It aims to give a research summary of the anonymisation approaches which are most often used. Although this review cannot claim to be exhaustive, it does provide reasonable insight into this subject. Possible directions for future work on applying anonymisation in social networks are:

• Research on privacy-preserving techniques is likely to increase in the future, driven by the growing interest in social networking.

• The majority of the reviewed articles address only a single type of disclosure and assume only some of the background knowledge which could be possessed by adversaries. However, an attacker may utilise various types of background knowledge to achieve his objective. Future work could thus examine the feasibility of combining various anonymisation techniques to achieve better privacy than the existing techniques.

• Most of the anonymisation techniques for social networks are based on the static nature of social network data. There are relatively few articles which consider the dynamic nature of social networks. Future research could be aimed at the applicability of dynamic techniques such as m-invariance and m-distinct for preserving privacy in dynamic releases of social network data.

References

Aggarwal, C.C. and Yu, P.S. (2008) ‘Privacy-preserving data mining: models and algorithms’, Advances in Database Systems, Vol. 34, pp.10–32, Springer, New York, NY, USA.

Backstrom, L. and Leskovec, J. (2011) ‘Supervised random walks: predicting and recommending links in social networks’, WSDM 2011, Proceedings of Fourth International Conference on Web Search and Data Mining, pp.635–644.

Backstrom, L., Dwork, C. and Kleinberg, J. (2007) ‘Wherefore art thou r3579x? Anonymized social networks, hidden patterns, and structural steganography’, International Conference on World Wide Web (WWW).

Backstrom, L., Huttenlocher, D., Kleinberg, J. and Lan, X. (2006) ‘Group formation in large social networks: membership, growth, and evolution’, KDD’06: Proceedings of the 12th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, New York, NY, USA, pp.44–54.

Beach, A., Gartrell, M. and Han, R. (2010a) ‘q-anon: rethinking anonymity for social networks’, 2nd International Conference on Social Computing (SocialCom).

Beach, A., Gartrell, M. and Han, R. (2010b) ‘Social-k: real-time k-anonymity guarantees for social network applications’, IEEE International Workshop on Security and Social Networking (SESOC), PerCom.

Bilge, L., Strufe, T., Balzarotti, D. and Kirda, E. (2009) ‘All your contacts are belong to us: automated identity theft attacks on social networks’, 18th International Conference on World Wide Web (WWW).

Business ETC (2011) Facebook Security Breach Allowed Advertisers Access to User Data, available at http://businessetc.thejournal.ie/facebook-security-breach-allowed-advertisers-access-to-user-data-134549-May2011/ (accessed on August 2011).

Campan, A. and Truta, T. (2008) ‘A clustering approach for data and structural anonymity in social networks’, Proc. SIGKDD Intl. Workshop on Privacy, Security, and Trust in KDD (PinKDD’08).

Cheng, J., Fu, A.W.C. and Liu, J. (2010) ‘K-isomorphism: privacy preserving network publication against structural attacks’, SIGMOD 2010, pp.459–470.

Cormode, G. and Srivastava, D. (2009) ‘Anonymized data: generation, models, usage’, Proc. SIGMOD Conference, pp.1015–1018.

Cormode, G., Srivastava, D., Yu, T. and Zhang, Q. (2008) ‘Anonymizing bipartite graph data using safe groupings’, Proceedings of the 34th International Conference on Very Large Databases (VLDB’08).

Das, S., Egecioglu, Ö. and Abbadi, A.E. (2010a) ‘Anonimos: an LP based approach for anonymizing weighted social network graphs’, Presented at CoRR.

Das, S., Egecioglu, Ö. and Abbadi, A.E. (2010b) ‘Anonymizing weighted social network graphs’, IEEE ICDE Conference.

Ding, X., Zhang, L., Wan, Z. and Gu, M. (2010) ‘A brief survey on de-anonymization attacks in online social networks’, Computational Aspects of Social Networks (CASoN), 2010 International Conference, pp.611–615.

Ford, R., Truta, T.M. and Campan, A. (2009) ‘P-sensitive k-anonymity for social networks’, Proc. DMIN, pp.403–409.

Gross, R. and Acquisti, A. (2005) ‘Information revelation and privacy in online social networks (the Facebook case)’, Workshop on Privacy in the Electronic Society (WPES), pp.71–80.

Hay, M., Miklau, G., Jensen, D., Towsley, D. and Weis, P. (2008) ‘Resisting structural reidentification in anonymized social networks’, VLDB.

Hay, M., Miklau, G., Jensen, D., Weis, P. and Srivastava, S. (2007) Anonymizing Social Networks, University of Massachusetts Technical Report.

He, X., Vaidya, J., Shafiq, B., Adam, N.R. and Atluri, V. (2009) ‘Preserving privacy in social networks: a structure-aware approach’, Web Intelligence, pp.647–654.

Hinds, H. and Lee, R.M. (2008) ‘Social network structure as a critical success condition for virtual communities’, HICSS ‘08: Proceedings of the 41st Annual Hawaii International Conference on System Sciences, IEEE Computer Society, Washington, DC, USA.

Kleinberg, J.M. (2007) ‘Challenges in mining social network data: processes, privacy, and paradoxes’, KDD, pp.4–5.

Korolova, A., Motwani, R., Nabar, S.U. and Xu, Y. (2008) ‘Link privacy in social networks’, International Conference on Data Engineering (ICDE), pp.1355–1357.

Krishnamurthy, B. and Wills, C.E. (2008) ‘Characterizing privacy in online social networks’, Proceedings of the Workshop on Online Social Networks in conjunction with ACM SIGCOMM Conference, pp.37–42, ACM, Seattle, WA USA.

Kumar, R., Novak, J. and Tomkins, A. (2006) ‘Structure and evolution of online social networks’, KDD, pp.611–617.

Leskovec, J., Backstrom, L., Kumar, R. and Tomkins, A. (2008) ‘Microscopic evolution of social networks’, ACM SIGKDD.

Li, Y. and Shen, H. (2010a) ‘Anonymizing graphs against weight-based attacks’, IEEE International Conference on Data Mining Workshops, pp.491–498.


Li, Y. and Shen, H. (2010b) ‘On identity disclosure in weighted graphs’, IEEE The 11th International Conference on Parallel and Distributed Computing, Applications and Technologies.

Liben-Nowell, D. and Kleinberg, J. (2003) ‘The link prediction problem for social networks’, CIKM ‘03: Proceedings of the Twelfth International Conference on Information and Knowledge Management, pp.556–559, ACM, New York, NY, USA.

Liu, K. and Terzi, E. (2008) ‘Towards identity anonymization on graphs’, SIGMOD.

Liu, K., Das, K., Grandison, T. and Kargupta, H. (2008) ‘Privacy preserving data analysis on graphs and social networks’, Next Generation of Data Mining, Chapter 21, pp.419–437, CRC Press, USA.

Lucas, M. and Borisov, N. (2008) ‘flyByNight: mitigating the privacy risks of social networking’, WPES.

Machanavajjhala, A., Gehrke, J., Kifer, D. and Venkitasubramaniam, M. (2006) ‘l-diversity: Privacy beyond k-anonymity’, International Conference on Data Engineering (ICDE).

Maiya, A.S. and Berger-Wolf, T.Y. (2009) ‘Inferring the maximum likelihood hierarchy in social networks’, Proceedings of the 12th IEEE International Conference on Computational Science and Engineering (CSE ‘09), pp.245–250, Vancouver, Canada.

Narayanan, A. and Shmatikov, V. (2009) ‘De-anonymizing social networks’, IEEE Symposium on Security and Privacy (S&P).

Pattaya Daily News (2010) Facebook Security Breach, Private Details Published on Pirate Bay, available at http://www.pattayadailynews.com/en/2010/07/29/facebook-security-breach-private-details-published-on-pirate-bay/ (accessed on August 2011).

Singh, L. and Zhan, J. (2007) ‘Measuring topological anonymity in social networks’, IEEE International Conference on Granular Computing.

Song, H.H., Cho, T.W., Dave, V., Zhang, Y. and Qiu, L. (2009) ‘Scalable proximity estimation and link prediction in online social networks’, Proc. Internet Measurement Conference, pp.322–335.

Sun, X., Sun, L. and Wang, H. (2011) ‘Extended k-anonymity models against sensitive attribute disclosure’, Presented at Computer Communications, pp.526–535.

Sweeney, L. (2002) ‘k-anonymity: a model for protecting privacy’, International Journal on Uncertainty, Fuzziness and Knowledge-based Systems, Vol. 10, No. 5, pp.557–570.

Thompson, B. and Yao, D. (2009) ‘The union-split algorithm and cluster-based anonymization of social networks’, Proceedings of the 4th International Symposium on Information, Computer, and Communications Security (ASIACCS 2009), ACM, pp.218–227.

Tinabo, R., Mtenzi, F., O’Driscoll, C. and O’Shea, B. (2009) ‘Anonymisation vs. pseudonymisation: which one is most useful for both privacy protection and usefulness of e-healthcare data’, The 4th International Conference for Internet Technology and Secured Transactions (ICITST), London, UK, pp.1–6.

Tripathy, B.K. and Panda, G.K. (2010) ‘A new approach to manage security against neighborhood attacks in social networks’, ASONAM, pp.264–269.

Ur, B.E. and Ganapathy, V. (2009) ‘Evaluating attack amplification in online social networks’, Web 2.0 Security and Privacy.

Wei, Q. and Lu, Y. (2008) ‘Preservation of privacy in publishing social network data’, Proceedings of the 2008 International Symposium on Electronic Commerce and Security (ISECS 2008), IEEE Computer Society, pp.421–425.

Wondracek, G., Holz, T., Kirda, E. and Kruegel, C. (2010) A Practical Attack to De-anonymize Social Network Users, Tech. Rep. TRiSecLab-0110-001.

Ying, X. and Wu, X. (2008) ‘Randomizing social networks: a spectrum preserving approach’, SIAM International Conference on Data Mining (SDM), pp.739–750.

Ying, X. and Wu, X. (2009) ‘On link privacy in randomizing social networks’, PAKDD.

Ying, X., Pan, K., Wu, X. and Guo, L. (2009) ‘Comparisons of randomization and k-degree anonymization schemes for privacy preserving social network publishing’, The 3rd SNA-KDD Workshop, pp.1–10.

Zhang, L. and Zhang, W. (2009) ‘Edge anonymity in social network graphs’, CSE ‘09 Proceedings of the 2009 International Conference on Computational Science and Engineering, Vol. 04, Washington, DC, USA, pp.1–8.

Zheleva, E. and Getoor, L. (2007) ‘Preserving the privacy of sensitive relationships in graph data’, PinKDD, pp.153–171.

Zhou, B. and Pei, J. (2008) ‘Preserving privacy in social networks against neighborhood attacks’, IEEE 24th International Conference on Data Engineering, pp.506–515.

Zhou, B. and Pei, J. (2010) ‘The k-anonymity and l-diversity approaches for privacy preservation in social networks against neighborhood attacks’, Knowledge and Information Systems, Springer-Verlag, London.

Zhou, B., Pei, J. and Luk, W.S. (2008) ‘A brief survey on anonymization techniques for privacy preserving publishing of social network data’, SIGKDD Explorations, Vol. 10, No. 2, pp.12–22, NY, USA.

Zou, L., Chen, L. and Tamer Özsu, M. (2009) ‘K-automorphism: a general framework for privacy preserving network publication’, Proceedings of PVLDB, Vol. 2, No. 1, pp.946–957.
