Community Detection in Multi-Layer Graphs: A Survey

Jungeun Kim, Jae-Gil Lee*

Department of Knowledge Service Engineering, KAIST
Daejeon, Republic of Korea

{je kim, jaegil}@kaist.ac.kr

ABSTRACT
Community detection, also known as graph clustering, has been extensively studied in the literature. The goal of community detection is to partition vertices in a complex graph into densely connected components, so-called communities. In recent applications, however, an entity is associated with multiple aspects of relationships, which brings new challenges in community detection. The multiple aspects of interactions can be modeled as a multi-layer graph comprised of multiple interdependent graphs, where each graph represents an aspect of the interactions. Great efforts have therefore been made to tackle the problem of community detection in multi-layer graphs. In this survey, we provide readers with a comprehensive understanding of community detection in multi-layer graphs and compare the state-of-the-art algorithms with respect to their underlying properties.

1. INTRODUCTION
Graph mining in complex networks has attracted significant attention during the past several years. One of the important tasks in graph mining is community detection, in which the objective is to partition a graph into several densely connected components. Such components correspond to sets of similar vertices and can thus be regarded as communities [17]. Since this problem arises in a broad range of applications, a large number of approaches have been proposed in the literature [10, 13, 15].

In contrast to the traditional problem, recent applications, such as mobile and social network analyses, give rise to intriguing new challenges [6]. In this context, data presumably encapsulates multiple aspects of human interactions, e.g., those among coworkers and those among friends. The multiple aspects of relationships can be represented by a multi-layer graph comprised of multiple interdependent graphs, where each graph represents an aspect of the relationships. Therefore, great efforts have been made to solve the challenge of community detection in multi-layer graphs.

*Jae-Gil Lee is the corresponding author.

The goal of this survey is to provide a timely account of the progress in community detection in multi-layer graphs. We offer a brief overview of primary algorithms and classify them with respect to their underlying strategies.

The rest of this paper is organized as follows. Section 2 discusses the background information regarding multi-layer graphs. Section 3 presents multi-layer graph datasets used in recent studies. Section 4 introduces community detection approaches in two-layer graphs. Section 5 introduces community detection approaches in multi-layer graphs. Section 6 presents comparisons of community detection approaches in multi-layer graphs. Section 7 suggests promising future research directions in multi-layer graphs. Finally, Section 8 concludes this survey.

2. BACKGROUND
In this section, we discuss some background information about multi-layer graphs. We present the formal definitions of multi-layer graphs [14] in Section 2.1. Then, we briefly summarize the community detection approaches for single graphs and the challenges for developing those for multi-layer graphs in Section 2.2. To enhance the readability of this survey, frequently used symbols are summarized in Table 1.

Table 1: The summary of symbols.

Symbol    Description
G         a graph
V         a set of vertices
S         a set of attributes
L         a set of layers
n         the number of vertices
m         the number of edges
k         the number of clusters
t         the number of attributes
l         the number of layers

SIGMOD Record, September 2015 (Vol. 44, No. 3) 37

2.1 Multiple Network Models

2.1.1 Multi-Layer Graphs
The definition of a multi-layer graph depends on that of a single-layer graph.

Definition 1. [14] A single-layer graph is a weighted graph $(V, w)$ where $V$ is a set of vertices and $w$ is a set of edge weights: $w : (V \times V) \to [0, 1]$.

Figure 1 shows an example of a single undirected graph (without specifying the edge weights). Assume that it is a subgraph of Facebook's network. Each vertex represents a user, and each edge denotes the relationship between two users. The weight of an edge is the strength of the relationship.

Figure 1: A single-layer graph.

When we start characterizing multi-layer graphs, understanding which vertices in one graph correspond to vertices in the other is important because the multi-layer graph is comprised of multiple interdependent graphs. A node mapping can formalize this task.

Definition 2. [14] A node mapping from a graph layer $L_1 = (V_1, w_1)$ to another graph layer $L_2 = (V_2, w_2)$ is a function $f : V_1 \times V_2 \to [0, 1]$. For each $u \in V_1$, the set $C(u) = \{v \in V_2 \mid f(u, v) > 0\}$ is the set of $V_2$ vertices corresponding to $u$.

Figure 2 illustrates an example of a multi-layer graph. Assume that layer 1 is the Facebook network and layer 2 is the Twitter network. If the users in the Facebook network also have an account on Twitter, then the Twitter network can be used to represent these users and their relationships. Note that every user can be identified by one account on each layer. This graph is generally called a pillar multi-layer graph since every user can be seen as a pillar traversing every layer denoting the level of physical reality [14]. A pillar multi-layer graph is formally defined by a node mapping with $|C(u)| \in \{0, 1\}$.

Figure 2: A pillar multi-layer graph.

A generic multi-layer graph is formally defined based both on a set of single layers and a matrix of node mappings.

Definition 3. [14] A multi-layer graph is a tuple $MLN = (L_1, \ldots, L_l, IM)$ where $L_i = (V_i, w_i)$, $i \in \{1, \ldots, l\}$, are graph layers and $IM$ (Identity Mapping) is an $l \times l$ matrix of node mappings, with $IM_{i,j} : V_i \times V_j \to [0, 1]$.

Figure 3: A general multi-layer graph.

Figure 3 shows an example of a multi-layer graph that is more complex than a pillar multi-layer graph. For example, in Figure 3, on layer 1, $A$ is an account of a user in FriendFeed, and $A'$, $A''$, and $A'''$ on layer 2 are the social media accounts that the user has registered. We call this network a general multi-layer graph. Note that a vertex in one graph layer can correspond to multiple vertices in another. This case typically arises in social media aggregators, such as FriendFeed, which support various social network services with a single access point as long as a user has registered for those services. Thus, vertices do not necessarily denote users, but more generally, accounts.
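To make Definitions 2 and 3 concrete, the following is a minimal Python sketch, not taken from [14]. It stores a node mapping sparsely as a dictionary, computes the correspondence set C(u), and checks the pillar condition that |C(u)| is 0 or 1 for every vertex. The function names and the two toy mappings are illustrative assumptions that only loosely mirror Figures 2 and 3.

```python
from typing import Dict, Set, Tuple

# A node mapping f: V1 x V2 -> [0, 1], stored sparsely as {(u, v): strength}.
# Pairs that are absent are treated as f(u, v) = 0.
NodeMapping = Dict[Tuple[str, str], float]


def corresponding(mapping: NodeMapping, u: str) -> Set[str]:
    """C(u): the vertices v of the target layer with f(u, v) > 0."""
    return {v for (src, v), strength in mapping.items()
            if src == u and strength > 0}


def is_pillar(mapping: NodeMapping, source_vertices: Set[str]) -> bool:
    """True if |C(u)| is 0 or 1 for every vertex u of the source layer."""
    return all(len(corresponding(mapping, u)) <= 1 for u in source_vertices)


# Hypothetical toy data: layer 1 holds the accounts A, B, C, D.
layer1 = {"A", "B", "C", "D"}

# Pillar case: each user has exactly one account on the other layer.
pillar_map = {("A", "A"): 1.0, ("B", "B"): 1.0,
              ("C", "C"): 1.0, ("D", "D"): 1.0}

# General case: account A corresponds to three accounts on layer 2.
general_map = {("A", "A'"): 1.0, ("A", "A''"): 1.0,
               ("A", "A'''"): 1.0, ("B", "B'"): 1.0}

print(corresponding(general_map, "A"))
print(is_pillar(pillar_map, layer1), is_pillar(general_map, layer1))
```

A full multi-layer graph in the sense of Definition 3 would keep one such mapping per ordered pair of layers; the sketch shows a single entry of that matrix.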

2.1.2 Heterogeneous Information Networks
The definition of a heterogeneous information network depends on that of an information network.

Definition 4. [23, 24] An information network is defined as a directed graph $G = (V, E)$ with an object type mapping function $\phi : V \to \mathcal{A}$ and a link type mapping function $\psi : E \to \mathcal{R}$, where each object $v \in V$ belongs to one particular object type $\phi(v) \in \mathcal{A}$, and each link $e \in E$ belongs to a particular relation $\psi(e) \in \mathcal{R}$.

If the number of object types $|\mathcal{A}| > 1$ or the number of link types $|\mathcal{R}| > 1$, the network is called a heterogeneous information network [23, 24]. A bibliographic information network is a typical example, containing objects from four types of entities: papers, venues, authors, and terms. Each paper has distinct types of links to a set of authors, a venue, a set of words, a set of citing papers, and a set of cited papers.

A heterogeneous information network can be translated to a general multi-layer graph in Definition 3, and vice versa. More specifically, an object type corresponds to a layer $L_i$, the links within an object type correspond to $w_i$, and the links between different object types correspond to $IM_{i,j}$. That is, these two definitions are syntactically equivalent.

Despite this equivalence, the two definitions are used with slightly different meanings. General multi-layer graphs emphasize multiple types of relationships between similar types of entities. For example, in Figure 3, all the entities are social media accounts. On the other hand, heterogeneous information networks emphasize heterogeneous types of entities connected by different relationships. Overall, we rely on the typical meaning of multi-layer graphs in this paper.

Figure 4: An example of a multi-layer graph called the AUCS dataset. Panels: (a) Work. (b) Lunch. (c) Facebook. (d) Friend. (e) Coauthor.

2.2 Current Status and Challenges

2.2.1 Community Detection in Single Graphs
Many community detection approaches have been proposed for single-layer graphs. Fortunato [7] and Schaeffer [19] conducted extensive surveys on this topic. Representative algorithms include graph partitioning algorithms, modularity-based algorithms, spectral algorithms, and structure definition algorithms [7, 19]. The objective of graph partitioning algorithms is to divide the vertices such that the cut size is minimal. The cut size is the number of edges lying between partitions. The goal of modularity-based algorithms is to partition the vertices such that modularity is maximal. Modularity is defined as the fraction of the edges that fall within the given groups minus the expected such fraction if edges were distributed at random. Spectral algorithms partition the graph into communities using the eigenvectors of graph matrices; a graph Laplacian matrix is typically used as the graph matrix. Structure definition algorithms discover communities that satisfy a strict structural property. In other words, they find communities satisfying a meta definition of a community such as a k-clique, r-quasi clique, or s-plex.
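The two objective functions described above can be illustrated with a short Python sketch. It assumes an undirected, unweighted graph without self-loops or duplicate edges and uses the common Newman-Girvan form of modularity, Q = sum over communities of (internal edges / m) minus (total degree / 2m) squared. The function names and the toy graph (two triangles joined by one edge) are illustrative assumptions.

```python
from typing import Dict, List, Tuple

Edge = Tuple[str, str]


def cut_size(edges: List[Edge], part: Dict[str, int]) -> int:
    """Number of edges whose endpoints lie in different partitions."""
    return sum(1 for u, v in edges if part[u] != part[v])


def modularity(edges: List[Edge], part: Dict[str, int]) -> float:
    """Fraction of intra-community edges minus its expected value
    under random edge placement that preserves vertex degrees."""
    m = len(edges)
    internal: Dict[int, int] = {}      # edges inside each community
    degree_sum: Dict[int, int] = {}    # total degree of each community
    for u, v in edges:
        degree_sum[part[u]] = degree_sum.get(part[u], 0) + 1
        degree_sum[part[v]] = degree_sum.get(part[v], 0) + 1
        if part[u] == part[v]:
            internal[part[u]] = internal.get(part[u], 0) + 1
    return sum(internal.get(c, 0) / m - (d / (2 * m)) ** 2
               for c, d in degree_sum.items())


# Two triangles {a, b, c} and {d, e, f} joined by the single edge (c, d).
edges = [("a", "b"), ("b", "c"), ("a", "c"),
         ("d", "e"), ("e", "f"), ("d", "f"),
         ("c", "d")]
part = {"a": 0, "b": 0, "c": 0, "d": 1, "e": 1, "f": 1}

print(cut_size(edges, part))    # 1
print(modularity(edges, part))  # 5/14, about 0.357
```

For this partition, each community has 3 of the 7 edges inside it and a total degree of 7, so Q = 2 (3/7 - (7/14)^2) = 5/14.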

2.2.2 Challenges for Multi-Layer Graphs
In contrast to the community detection problem in single graphs, new challenges arise for community detection in multi-layer graphs. Intuitively, each single layer has a piece of meaningful information from its own perspective; however, one can expect improved community detection results through the proper and efficient merging of information in each layer. Thus, an important open question is how to exploit and fuse the multiple aspects of information to generate improved understanding of vertices and their relationships. In addition, since we are confronted with managing multiple layers (often called networks of networks), scalability remains a significant challenge because of the larger resulting search spaces [2].

3. MULTI-LAYER GRAPH DATASETS
In this section, we introduce various multi-layer graph datasets. Figure 4 illustrates an example of a multi-layer graph called the AUCS dataset. In this graph, the multiple layers represent relationships between 61 employees of a university department in five different aspects: (i) coworking, (ii) having lunch together, (iii) Facebook friendship, (iv) offline friendship (having fun together), and (v) coauthorship. Popular datasets used in academic papers are as follows. Note that some layers are constructed by using attribute information. In these cases, an edge between two vertices is formed if the attribute similarity is higher than a given threshold. This list is also available at http://dm.kaist.ac.kr/datasets/multi-layer-network/.

• MIT Reality Mining [6]
This is a mobile phone dataset including 87 users on the MIT campus. Each layer represents the relationships defined by physical locations, bluetooth scans, and phone calls, respectively.

• Enron Email [17]
This is an email message dataset between employees of the Enron corporation. It contains 200,399 messages belonging to 158 members. One layer contains the relationships defined by the existence of email communications, and the other contains those defined by the similarity of text messages.

• Mobile Phone [6]
This is a mobile phone dataset collected by Nokia Research Center (NRC) Lausanne [8]. It contains about 200 mobile users in Lausanne, Switzerland. Each layer represents the relationships defined by physical locations, bluetooth scans, and phone calls, respectively.

• Cora [6]
This is a bibliographic dataset including 292 research papers. The layers represent three different research fields: natural language processing, data mining, and robotics.

• IMDB [1]
This is a movie database managed by IMDB, which contains 300 vertices and 18,368 edges. Vertices represent actors, and an edge is formed if two actors worked together. In this dataset, there exist four layers: (i) the first year of collaboration, (ii) the last year of collaboration, (iii) the average incomes, and (iv) the average number of sold tickets. In other words, the four layers have the same edges but different edge labels.

• Airline Transportation Multiplex [3]
This is a network composed of the airlines operating in Europe. It contains 450 vertices and 3,588 edges. The data includes thirty-seven layers in total, and each one corresponds to a different airline.

• SIAM Journal [25]
This is a bibliographic dataset containing 5,022 vertices, which are papers. Five layers are formed from five different similarity matrices. The first three are defined by the text similarity based on the abstract, title, and keywords, respectively. The other two are obtained from the number of common authors between papers and the citation relation.

• Political Blogs [26] [28]
This is a weblog network on US politics, which contains 1,490 vertices and 19,090 edges. Vertices represent weblogs, and edges represent hyperlinks between weblogs. Each blog in the dataset has an attribute denoting its political position as either liberal or conservative. Thus, one layer depicts explicit hyperlinks, and the other depicts political preferences.

• CiteSeer [12] [18] [21]
This is a citation network of computer science publications containing 3,312 vertices and 4,536 edges. One layer contains the relationships defined by citation, and the other contains those defined by content similarity.

• US Stock Market [27]
This is a US stock market graph database containing 11 graph layers. On average, it contains 3,636 vertices and 206,747 edges. Each layer is a graph made by setting a different correlation coefficient value based on stock prices.

• Arxiv Publication Database [28]
This is a bibliographic dataset including 13,396 vertices and 673,800 edges. Each layer corresponds to the citation relationships for a different research topic. Thus, the number of layers is equal to the number of topics.

• Flickr [17] [18]
This is a social network with tagged photos including 16,710 vertices and 716,063 edges. Each vertex represents a user, and an edge exists if one user is in another's contact list or if the two users favor the same images. In other words, one layer represents the relationships defined by the explicit contact list, and the other represents those defined by the common interest retrieved from photo sharing between two users.

• DBLP [1] [20] [26] [28]
This is a bibliographic dataset including up to 108,030 vertices and 276,658 edges. A vertex stands for an author, and an edge is formed if two authors write a research paper together or share the same research interest. Using this dataset, we can make a two-layer graph as well as a general multi-layer graph with more than two layers. In the two-layer graph, one layer contains the relationships defined by coauthorship while the other contains those defined by the sameness of the research interest. In the general multi-layer graph, each layer corresponds to the coauthorships in a different venue (conference or journal).

• LastFm [20]
This is a social music network which consists of 272,412 vertices and 350,239 edges. One layer represents the relationships based on friendships between users, and the other represents those based on the sameness of musical tastes.

• Higgs Twitter [5]
This is the multiplex of social interactions in Twitter including 456,631 vertices and 16,070,185 edges. The layers represent friendship, replying, mentioning, and retweeting, respectively.

• Wikipedia [18]
This dataset is from the static dump of English Wikipedia pages. It consists of 3,580,013 vertices and 162,085,383 edges. One layer contains the relationships defined by explicit page links, and the other contains those defined by text similarity between pages.

Table 2: The summary of multi-layer datasets.

No.  Name                                  # Vertices   # Edges       # Layers  Type     Publicly Available
1    AUCS                                  61           620           5         pillar   Y (fn. 1)
2    MIT Reality Mining [6]                87           -             3         pillar   Y (fn. 2)
3    Enron Email [17]                      158          200,399       2         pillar   Y (fn. 3)
4    Mobile Phone [6]                      200          -             3         pillar   N
5    Cora [6]                              292          -             3         pillar   Y (fn. 4)
6    IMDB [1]                              300          18,368        4         pillar   △ (fn. 5)
7    Airline Transportation Multiplex [3]  450          3,588         37        pillar   Y (fn. 6)
8    SIAM Journal [25]                     5,022        -             5         pillar   N
9    Political Blogs [26] [28]             1,490        19,090        2         pillar   Y (fn. 7)
10   CiteSeer [12] [18] [21]               3,312        4,536         2         pillar   Y (fn. 8)
11   US Stock Market [27]                  3,636        206,747       11        pillar   N
12   Arxiv Publication [28]                13,396       673,800       7         pillar   N
13   Flickr [17] [18]                      16,710       716,063       2         pillar   Y (fn. 9)
14   DBLP [1] [20] [26] [28]               108,030      276,658       various   pillar   Y (fn. 10)
15   LastFm [20]                           272,412      350,239       2         pillar   △ (fn. 11)
16   Higgs Twitter [5]                     456,631      16,070,185    4         pillar   Y (fn. 12)
17   Wikipedia [18]                        3,580,013    162,085,383   2         pillar   N
18   FriendFeed [4]                        9,717,499    15,000,000    various   general  Y (fn. 13)

• FriendFeed [4]
This is a social media aggregator. It contains about 400,000 users and 1 million posts with 15 million subscription relationships. In general, vertices stand for users, and edges stand for various relationships between users. On the other hand, vertices can also be posts, with edges denoting relationships between users and posts. Thus, the layers can vary, for example, by the types of services as well as by the different relationships between users and posts.

Table 2 shows a brief summary of the multi-layer network datasets. If certain information about a dataset does not exist in the reference papers, we fill in the blank with "-". For the "Publicly Available" column, if we can directly get the dataset through the web, we assign "Y". If we need extra effort (e.g., crawling) to get the dataset, we assign "△".

Footnotes to Table 2:
1 http://sigsna.net/impact/datasets/
2 http://realitycommons.media.mit.edu/index.html
3 http://bailando.sims.berkeley.edu/enron_email.html
4 http://www.cs.umass.edu/~mccallum/data.html
5 http://imdb.com
6 http://complex.unizar.es/~atnmultiplex/
7 http://networkdata.ics.uci.edu/data.php?id=102
8 http://www.cs.umd.edu/projects/linqs/projects/lbc/index.html
9 http://staff.science.uva.nl/~xirong/index.php?n=DataSet.Flickr3m
10 http://informatik.uni-trier.de/~ley/db/
11 http://www.last.fm
12 http://www.plexmath.eu/?page_id=320/
13 http://sigsna.net/impact/datasets/

4. COMMUNITY DETECTION IN TWO-LAYER GRAPHS
In this section, we introduce community detection algorithms in two-layer graphs. All algorithms described in this section can only support two-layer graphs and mostly consider structural and attribute information. One layer represents the original topology of a graph as structural information, and the other layer is derived by calculating the similarity between the vertices based on their attribute information. Such graphs with additional attribute information do not seem to conform to the definition of multi-layer graphs. However, they have been regarded as a typical case of two-layer graphs since attribute information can be easily transformed to a layer, e.g., by creating an edge between vertices if the attribute similarity between them is above a certain threshold. Thus, we categorize such graphs as two-layer graphs.
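The transformation of attribute information into a second layer can be sketched in a few lines of Python. The survey does not prescribe a particular similarity measure, so the use of Jaccard similarity over attribute sets, the function names, the toy attributes, and the threshold value 0.4 are all assumptions made for illustration.

```python
from itertools import combinations
from typing import Dict, Set, Tuple


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Jaccard similarity of two attribute sets (0.0 if both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def attribute_layer(attributes: Dict[str, Set[str]],
                    threshold: float) -> Set[Tuple[str, str]]:
    """Create an undirected edge (u, v) whenever the attribute
    similarity between u and v is above the threshold."""
    edges = set()
    for u, v in combinations(sorted(attributes), 2):
        if jaccard(attributes[u], attributes[v]) > threshold:
            edges.add((u, v))
    return edges


# Hypothetical vertices with keyword attributes.
attributes = {
    "A": {"xml", "skyline"},
    "B": {"xml"},
    "C": {"skyline", "xml"},
    "D": {"graphs"},
}

layer = attribute_layer(attributes, threshold=0.4)
print(sorted(layer))  # [('A', 'B'), ('A', 'C'), ('B', 'C')]
```

The pairwise loop is quadratic in the number of vertices, which is acceptable for a sketch but would need an index or blocking scheme for the larger datasets in Table 2.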

4.1 Cluster Expansion
Li et al. [12] proposed a hierarchical community detection algorithm based on both relations and textual attributes using the cluster expansion philosophy. This algorithm focuses on quickly finding initial cores as seeds of communities and expanding the cores into the communities in order to enhance scalability. In this paper, the CiteSeer dataset (No.10 in Table 2) was used. In the No.10 dataset, one layer represents the citation relationship between papers, and the other represents the degree of content similarity of the titles and abstracts of papers.

The algorithm consists of four major steps: core probing, core merging, affiliation, and classification. Figure 5 shows an overview of the algorithm (without specifying attribute information).

First, structural information is used solely to find cores, denoted as $K_i$, using the frequent itemset mining method derived from the Apriori algorithm. After the set of all outgoing relations is listed for each document, the process of finding cores can be transformed into that of computing frequent itemsets. Each core will be used as a community seed. This step will enhance the scalability of the subsequent steps since the analysis scope is limited to each core. Then, cores are merged based on textual analysis using text similarity (i.e., attribute information). In the core merging step of Figure 5, $K_3$ and $K_4$ are merged since they are linked and also topically relevant (not shown in the figure). In the affiliation step, initial communities are constructed through relation propagation. For each vertex $v_i$ in a cluster $C$, the algorithm finds all vertices that are adjacent to $v_i$ and adds them to $C$. Now every merged $K_i$ is expanded to $C_i$ in Figure 5. Since finding communities based solely on relation propagation may generate false hits, communities are refined based on classification using attribute analysis. In this step, LDA is used to reduce dimensionality, and all vertices are transformed into feature vectors that represent their topical positions. Then, vertices are classified based on the SVM, and negatively labeled vertices are removed. For example, $v_D$ is dropped from $C_1$.

Figure 5: The overview of the community discovery algorithm [12]. Panels: Original, Core Probing, Core Merging, Affiliation, Classification.
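The affiliation and classification steps can be sketched as follows. This is a simplified Python illustration rather than the implementation of [12]: the affiliation step adds every vertex adjacent to a core member, and the LDA plus SVM classification is replaced by an arbitrary predicate `keep`, which is an assumption standing in for the trained classifier. The toy adjacency list and vertex names are also hypothetical.

```python
from typing import Callable, Dict, Set


def affiliate(core: Set[str], adjacency: Dict[str, Set[str]]) -> Set[str]:
    """Relation propagation: add every vertex adjacent to a core member."""
    community = set(core)
    for v in core:
        community |= adjacency.get(v, set())
    return community


def refine(community: Set[str], keep: Callable[[str], bool]) -> Set[str]:
    """Classification: drop vertices that the classifier labels negative.
    `keep` is a stand-in for the LDA + SVM pipeline described in the text."""
    return {v for v in community if keep(v)}


# Hypothetical citation graph: a core {A, B, C} with one neighbor D.
adjacency = {
    "A": {"B", "C"},
    "B": {"A", "C"},
    "C": {"A", "B", "D"},
    "D": {"C", "E"},
    "E": {"D"},
}
core = {"A", "B", "C"}

expanded = affiliate(core, adjacency)
print(sorted(expanded))  # ['A', 'B', 'C', 'D']

# Suppose the classifier judges D to be off-topic for this community.
refined = refine(expanded, keep=lambda v: v != "D")
print(sorted(refined))   # ['A', 'B', 'C']
```

The expansion only visits the neighbors of core members, which is why restricting the analysis scope to the cores helps scalability.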

4.2 Matrix Factorization
Qi et al. [17] proposed a community detection algorithm based both on link structure and edge content using the Edge-Induced Matrix Factorization (EIMF). In this paper, the Enron email and Flickr datasets (No.3 and No.13 in Table 2) were used. In the No.13 dataset, one layer depicts the relationships defined by the contact list, and the other depicts those defined by the favorite photos shared by the users.

The main contribution of this algorithm is using edge content for the community detection process. Edge content can be a useful source of information when nodes interact with multiple communities, since it can assist in distinguishing between the different interactions of nodes. Figure 6 shows an example of an edge-based social network in the No.13 dataset. Intuitively, edges can be divided into two different groups, such as a family ($AB$, $BC$, $CD$, $AD$) and people with similar musical interests ($AE$, $AF$). Moreover, it is clear that the user $A$ belongs to both communities based on the edge content, whereas the same finding is unclear from a vertex-centric perspective.

Figure 6: An example of an edge-based social media network [17]. Edge groups: Family, Music.

This algorithm consists mainly of two parts: the EIMF based purely on the link structure, and the incorporation of the edge content into the EIMF.

In the first part, an incidence matrix is formed using the link structure. Then, the latent edge matrix $E$ is constructed from the incidence matrix using matrix factorization, which is obtained by minimizing Eq.(1).

$$O_l(E) = \lVert E^{T} \cdot E \cdot \Delta - \Gamma \rVert_F^{2} \qquad (1)$$

Here, $E$ is a $k \times m$ matrix with each column corresponding to a $k$-dimensional feature vector for an edge, $\Gamma$ is an $m \times n$ incidence matrix, and $\Delta$ is a normalization of the incidence matrix such that every column-wise sum becomes 1. Then, by the definition of matrix factorization, Eq.(1) indicates the error of the approximation by $E$ when compared with the link structure $\Gamma$. Since each column of $E$ represents the membership of the edge to $k$ communities, this procedure of matrix factorization can be regarded as a community detection technique using the link structure.
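The objective in Eq.(1) can be evaluated directly for a small example. The sketch below is written in plain Python and takes Γ to be the m × n edge-by-vertex incidence matrix and Δ its column-normalized version, as described above. It only computes the objective value for a given E and does not implement the optimization procedure of [17]. The helper names and the toy graph are assumptions.

```python
from typing import List, Sequence, Tuple

Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)]
            for row in a]


def transpose(a: Matrix) -> Matrix:
    return [list(col) for col in zip(*a)]


def incidence(edges: Sequence[Tuple[str, str]],
              vertices: Sequence[str]) -> Matrix:
    """Gamma: m x n matrix with a 1 where an edge touches a vertex."""
    index = {v: j for j, v in enumerate(vertices)}
    gamma = [[0.0] * len(vertices) for _ in edges]
    for i, (u, v) in enumerate(edges):
        gamma[i][index[u]] = 1.0
        gamma[i][index[v]] = 1.0
    return gamma


def normalize_columns(gamma: Matrix) -> Matrix:
    """Delta: Gamma rescaled so that every non-empty column sums to 1."""
    sums = [sum(col) for col in zip(*gamma)]
    return [[x / s if s else 0.0 for x, s in zip(row, sums)]
            for row in gamma]


def link_objective(e: Matrix, gamma: Matrix) -> float:
    """O_l(E) = || E^T E Delta - Gamma ||_F^2 for a k x m matrix E."""
    delta = normalize_columns(gamma)
    approx = matmul(matmul(transpose(e), e), delta)
    return sum((a - g) ** 2
               for row_a, row_g in zip(approx, gamma)
               for a, g in zip(row_a, row_g))


# Two disjoint edges, so two obvious edge communities.
vertices = ["A", "B", "C", "D"]
edges = [("A", "B"), ("C", "D")]
gamma = incidence(edges, vertices)

separate = [[1.0, 0.0], [0.0, 1.0]]   # each edge in its own community
merged = [[1.0, 1.0], [0.0, 0.0]]     # both edges in one community

print(link_objective(separate, gamma))  # 0.0
print(link_objective(merged, gamma))    # 4.0
```

Assigning the two disjoint edges to separate communities reproduces Γ exactly, while forcing them into one community incurs an error, which matches the reading of Eq.(1) as an approximation error with respect to the link structure.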

In the second part, two approaches are proposed to take the edge content into account by reflecting the similarity among the edge content in matrix factorization. The former approach is to optimize Eq.(2).

$$O(E) = O_l(E) + \lambda \cdot O_c(E) \qquad (2)$$

Here, $O_c(E)$ denotes the error of the approximation by $E$ when compared with the similarity of edge content instead of the link structure, and $\lambda$ is a weighting factor that balances the importance of the link structure and the edge content. The other approach is developed to avoid the necessity of tuning the parameter $\lambda$, and the reader can refer to [17] for the details.

4.3 Unified Distance
Zhou et al. [28] proposed a community detection algorithm, called SA-Cluster, based on both structural and attribute similarities using a unified distance measure. In this paper, the Political Blogs and DBLP datasets (No.9 and No.14 in Table 2) were used. In the No.14 dataset, one layer represents the relationships created by coauthorship between researchers, and the other layer represents those defined by the similarity of research interests.

The main contribution of SA-Cluster is twofold: (1) a unified distance measure to fuse structural and attribute similarities; (2) a weight self-adjustment method to modulate the degree of importance of structural and attribute similarities.

First, the unified distance measure is formulated based on the attribute-augmented graph using the Random Walk with Restart (RWR). Figure 7a shows the original coauthor network, and Figure 7b shows the attribute-augmented graph with research topics. In the attribute-augmented graph, attribute vertices are added to represent attribute values, and the original vertices are connected to the corresponding attribute vertices. For example, the research topics, "Skyline" and "XML", are added as attribute vertices (two shaded vertices L and M in Figure 7b). Then, the researchers are connected via attribute vertices if they are interested in the same research topic. Intuitively, the larger the number of common attribute values between two vertices, the higher the degree of similarity between the two vertices, since more random walk paths can exist.
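The effect of attribute vertices on random walk proximity can be illustrated with a small Python sketch. It uses a textbook Random Walk with Restart computed by power iteration with uniform transition probabilities; the exact distance formula and the edge weighting of SA-Cluster [28] are not reproduced here. The function names, the `topic:` prefix for attribute vertices, the restart probability 0.15, and the toy coauthor graph are assumptions.

```python
from typing import Dict, List, Set, Tuple


def augment(edges: List[Tuple[str, str]],
            topics: Dict[str, str]) -> Dict[str, Set[str]]:
    """Build the attribute-augmented graph: structure edges plus one
    edge from each vertex to the attribute vertex of its topic."""
    adjacency: Dict[str, Set[str]] = {}

    def link(a: str, b: str) -> None:
        adjacency.setdefault(a, set()).add(b)
        adjacency.setdefault(b, set()).add(a)

    for u, v in edges:
        link(u, v)
    for vertex, topic in topics.items():
        link(vertex, "topic:" + topic)
    return adjacency


def rwr(adjacency: Dict[str, Set[str]], start: str,
        restart: float = 0.15, iterations: int = 100) -> Dict[str, float]:
    """Random Walk with Restart from `start` via power iteration."""
    score = {v: 0.0 for v in adjacency}
    score[start] = 1.0
    for _ in range(iterations):
        nxt = {v: 0.0 for v in adjacency}
        for v, mass in score.items():
            share = (1.0 - restart) * mass / len(adjacency[v])
            for neighbor in adjacency[v]:
                nxt[neighbor] += share
        nxt[start] += restart  # jump back to the start vertex
        score = nxt
    return score


# Two coauthor pairs that are disconnected in the structure layer.
edges = [("A", "B"), ("C", "D")]
topics = {"A": "XML", "B": "XML", "C": "XML", "D": "Skyline"}

plain = rwr(augment(edges, {}), "A")
augmented = rwr(augment(edges, topics), "A")

print(plain["C"])      # 0.0: C is unreachable without attribute vertices
print(augmented["C"])  # positive: A and C share the topic vertex
```

Without attribute vertices, A cannot reach C at all; once both are linked to the shared topic vertex, walks from A reach C, which is the intuition behind fusing the two kinds of similarity in one measure.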

Second, the graph clustering algorithm that follows k-medoids clustering is performed based on the unified distance measure. More importantly, weight self-adjustment is conducted in each iteration of the algorithm. The weight of an attribute $a_i$ in the $(t+1)$th iteration is computed as Eq.(3).

$$w_i^{t+1} = \frac{1}{2}\left(w_i^{t} + \Delta w_i^{t}\right) \qquad (3)$$

Figure 7: A coauthor network example with a topic attribute [28]. Panels: (a) A coauthor graph. (b) An augmented graph.

The weight increment $\Delta w_i$ is measured by a majority voting mechanism. The voting mechanism counts the number of vertices within clusters that share the same attribute values to estimate the clustering tendency of the attribute, and then adjusts the attribute weight. That is, if a large number of vertices within clusters have the same value of an attribute $a_i$, then $a_i$ has a high clustering tendency, and its weight $w_i$ is increased accordingly.
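A minimal sketch of the update in Eq. (3) is given below. The voting rule (a vertex votes for an attribute when it shares that attribute's value with its cluster centroid) and the normalization that makes the increments sum to the number of attributes are assumptions made for this sketch; [28] gives the exact definition. The names `vote_counts` and `update_weights` are hypothetical.

```python
def vote_counts(clusters, centroids, attrs, attributes):
    """Count, per attribute, the vertices that share its value with their centroid."""
    votes = {a: 0 for a in attributes}
    for cluster_id, members in clusters.items():
        center = centroids[cluster_id]
        for v in members:
            for a in attributes:
                if attrs[v][a] == attrs[center][a]:
                    votes[a] += 1
    return votes


def update_weights(weights, votes):
    """Apply Eq. (3): w_i^{t+1} = (w_i^t + delta_w_i^t) / 2."""
    total = sum(votes.values())
    m = len(weights)
    new_weights = {}
    for a, w in weights.items():
        # Assumed normalization: the increments sum to m, the number of attributes.
        delta = m * votes[a] / total if total else 1.0
        new_weights[a] = 0.5 * (w + delta)
    return new_weights


# Hypothetical example with one cluster and two attributes.
attrs = {
    "a": {"topic": "XML", "country": "KR"},
    "b": {"topic": "XML", "country": "US"},
    "c": {"topic": "XML", "country": "DE"},
}
votes = vote_counts({0: ["a", "b", "c"]}, {0: "a"}, attrs, ["topic", "country"])
print(update_weights({"topic": 1.0, "country": 1.0}, votes))
```

With this normalization the weights keep summing to the number of attributes, so an attribute can only gain weight at the expense of the others.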

4.4 Model-Based Method

Xu et al. [26] proposed a model-based community detection approach based on both structural and attribute aspects of a graph. In this paper, the datasets and graph layers were the same as those used in the authors’ previous work [28].

The key point of this approach is the use of a probabilistic model that fuses both structural and attribute information instead of an artificial distance measure. The algorithm consists of two major parts: the construction of the probabilistic model and a variational approach to solve the model.

In the first part, a Bayesian probabilistic model is proposed for community detection over a clustered attributed graph. The clustered attributed graph used in this model is represented by $X$, $Y$, and $Z$, where $X = [X_{ij}]$ is an $n \times n$ adjacency matrix, $Y = [Y_i^t]$ is an $n \times t$ attribute matrix, and $Z = [Z_i]$ is an $n \times 1$ cluster vector that contains the cluster label of each vertex. This model defines a joint probability distribution $p(\alpha, \theta, \phi, Z \mid X, Y)$ for all possible communities and attributed graphs. Here, $\alpha$, $\theta$, and $\phi$ are the parameters of the probabilistic model: $\alpha$ denotes the vertex distribution of each cluster, $\theta$ the attribute distribution of each cluster, and $\phi$ the edge occurrence probabilities between clusters.

Based on the model, the problem of community detection is transformed into a probabilistic inference problem: finding the maximum a posteriori (MAP) configuration of communities $Z$ conditioned on $X$ and $Y$, as formulated by Eq. (4).

$$Z^{*} = \arg\max_{Z} \; p(Z \mid X, Y) \tag{4}$$

However, it is computationally infeasible to find the global maximum for a large set of $Z$.

In the second part, a variational algorithm is introduced to solve the probabilistic inference problem. The major principle is to approximate the distribution $p(\alpha, \theta, \phi, Z \mid X, Y)$ using a variational distribution $q(\alpha, \theta, \phi, Z)$. Additionally, if we restrict the variational distribution to a family of distributions that factorize as in Eq. (5), finding the global maximum reduces to finding the local maximum for each vertex, as in Eq. (6). Please refer to [26] for the details of the mathematical derivations.

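$$q(\alpha, \theta, \phi, Z) = q(\alpha)\, q(\theta)\, q(\phi) \prod_{i} q(Z_i) \tag{5}$$

$$Z^{*} = \arg\max_{Z} \; p(Z \mid X, Y) = \left[\arg\max_{Z_1} q(Z_1),\ \arg\max_{Z_2} q(Z_2),\ \ldots,\ \arg\max_{Z_N} q(Z_N)\right] \tag{6}$$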
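The practical effect of the factorization in Eq. (5) is that the MAP search in Eq. (6) decomposes into one independent argmax per vertex. The sketch below assumes that the variational probabilities $q(Z_i = k)$ have already been estimated and are stored row by row; the function name `map_assignment` and the numbers are illustrative only.

```python
def map_assignment(q):
    """q[i][k] holds q(Z_i = k); return the per-vertex argmax of Eq. (6)."""
    return [max(range(len(row)), key=row.__getitem__) for row in q]


# Hypothetical variational probabilities for three vertices and three clusters.
q = [
    [0.7, 0.2, 0.1],
    [0.1, 0.3, 0.6],
    [0.4, 0.5, 0.1],
]
print(map_assignment(q))
```

For N vertices and K clusters this inspects N·K values instead of the K^N joint configurations of $Z$.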

4.5 Pattern Mining

Silva et al. [21] proposed a community detection algorithm based on structural correlation pattern mining, called SCPM. In this paper, the CiteSeer, DBLP, and LastFm datasets (No.10, No.14, and No.15 in Table 2) were used. In the No.15 dataset, one layer contains friendships between users, and the other contains their shared musical preferences, e.g., favorite singers.

The main contribution of SCPM is to uncover the interaction between vertex attributes and dense subgraphs using both frequent itemset mining and quasi-clique mining. Here, a dense subgraph is defined by a $\gamma$-quasi-clique. A structural correlation pattern is formed if the proportion of the vertices in the dense subgraph that contain a given set of attribute values is above a threshold. In more detail, the algorithm first finds a frequent itemset $S$ (i.e., a set of attribute values appearing together in many vertices) from the entire graph $G$ and obtains the subgraph $G'$ induced by $S$. Then, it identifies a $\gamma$-quasi-clique $Q$ from $G'$. Finally, the structural correlation of $S$ is calculated by checking whether each vertex in $G'$ belongs to a quasi-clique $Q$. A structural correlation pattern should preserve a high value of structural correlation.

Figure 8 shows a toy example of SCPM. Figure 8a contains a set of attribute values for each vertex as well as an entire graph; Figure 8b depicts two examples of dense graphs. For a frequent itemset {A,B}, a subgraph {6,7,8,9,10,11} is induced from the entire graph, since these vertices include {A,B}. Then, a dense graph (the second one in Figure 8b) is obtained from this subgraph. Last, ({A,B}, {6,7,8,9,10,11}) is a structural correlation pattern with structural correlation 1, implying that the value set {A,B} appears on every vertex of the subgraph {6,7,8,9,10,11}.

[Figure 8: An example of structural correlation pattern mining [21].]

(a) The graph with vertex attributes.

vertex   values
1        A, C
2        A
3        A, C, D
4        A, D
5        A, E
6        A, B, C
7        A, B, E
8        A, B
9        A, B
10       A, B, D
11       A, B

(b) The dense subgraphs: {3,4,5,6} and {6,7,8,9,10,11}.
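The toy example can be retraced with a short Python sketch. The attribute table is the one from Figure 8a; the edge list is hypothetical, and the quasi-clique test (every vertex adjacent to at least ⌈γ(|Q| − 1)⌉ other members) is the usual degree-based definition, assumed here rather than taken from the text above. All function names are illustrative.

```python
import math
from itertools import combinations

# Vertex attributes from Figure 8a.
ATTRS = {
    1: {"A", "C"}, 2: {"A"}, 3: {"A", "C", "D"}, 4: {"A", "D"},
    5: {"A", "E"}, 6: {"A", "B", "C"}, 7: {"A", "B", "E"}, 8: {"A", "B"},
    9: {"A", "B"}, 10: {"A", "B", "D"}, 11: {"A", "B"},
}


def induced_vertices(attrs, itemset):
    """Vertices whose attribute values contain every value in itemset."""
    return {v for v, values in attrs.items() if itemset <= values}


def is_quasi_clique(vertices, edges, gamma):
    """Each vertex needs at least ceil(gamma * (|Q| - 1)) neighbours inside Q."""
    need = math.ceil(gamma * (len(vertices) - 1))
    for v in vertices:
        degree = sum(1 for u in vertices if u != v and frozenset((u, v)) in edges)
        if degree < need:
            return False
    return True


def structural_correlation(induced, quasi_cliques):
    """Fraction of the induced vertices covered by at least one quasi-clique."""
    covered = set().union(*quasi_cliques) & induced
    return len(covered) / len(induced) if induced else 0.0


# Hypothetical edges: a complete graph on {6..11} with three edges removed.
dense = induced_vertices(ATTRS, {"A", "B"})
missing = {frozenset((6, 11)), frozenset((7, 10)), frozenset((8, 9))}
edges = {frozenset(p) for p in combinations(sorted(dense), 2)} - missing

if is_quasi_clique(dense, edges, gamma=0.5):
    print(sorted(dense), structural_correlation(dense, [dense]))
```

The induced vertex set depends only on the attribute table, whereas the quasi-clique test depends on the edges, which is why SCPM needs both frequent itemset mining and quasi-clique mining.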

However, simply combining frequent itemset mining and quasi-clique mining will suffer from high computational overhead since the two problems are known to be #P-hard. Thus, two pruning techniques are proposed: (1) vertex pruning and (2) candidate set pruning. The former eliminates, in each iteration, vertices that do not belong to any quasi-clique in the graph induced by a given attribute-value set. The latter excludes candidate sets after the $(i+1)$th step if they do not satisfy the condition in the $i$th step.

4.6 Graph Merging

Ruan et al. [18] proposed a community detection approach, called CODICIL, to combine structural and attribute information using the graph merging process. In this paper, the Wikipedia, Flickr, and CiteSeer datasets (No.17, No.13, and No.10 in Table 2) were used. In the No.17 dataset, one layer represents explicit hyperlinks, and the other represents content similarities.

The main contribution of this algorithm is to strengthen the community signal by eliminating noise in the link structure using content information. Figure 9 shows the work flow of the proposed approach, which consists of four steps: creating content edges, combining edges, sampling edges with bias, and clustering. First, for each vertex $v_i$, its $k$ most content-similar neighbors are computed by calculating cosine similarity. Then, content edges are formed between the vertex $v_i$ and its top-$k$ neighbors. Second, the newly created content edge set and the original topological edge set are simply unified. Third, for each vertex $v_i$, the edges to retain are selected from its local neighborhood based on either cosine similarity or Jaccard similarity. Last, clustering is performed on the merged graph. Since the process of merging graphs is performed independently of community detection algorithms, any conventional community detection algorithm can be applied.

[Figure 9: The work flow of CODICIL [18]. Step 1 creates content edges from the term vectors of the vertices; step 2 combines the content edges and the topological edges into an edge union; step 3 samples edges with bias to obtain an edge subset; step 4 clusters the resulting graph.]
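The first three steps of this work flow can be sketched in plain Python as follows. This is a simplified illustration, not the CODICIL implementation: term vectors are plain dictionaries, the retention rule (keep the ⌈√d⌉ most similar edges of a vertex with d neighbors) and the use of Jaccard similarity over closed neighborhoods are assumptions of this sketch, and all function names are hypothetical. The final clustering step is left to any conventional algorithm.

```python
import math


def cosine(a, b):
    """Cosine similarity between two sparse term vectors (dicts)."""
    dot = sum(w * b.get(term, 0.0) for term, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def content_edges(term_vectors, k):
    """Step 1: link every vertex to its k most content-similar vertices."""
    edges = set()
    for v, vec in term_vectors.items():
        ranked = sorted(
            (u for u in term_vectors if u != v),
            key=lambda u: cosine(vec, term_vectors[u]),
            reverse=True,
        )
        edges.update(frozenset((v, u)) for u in ranked[:k])
    return edges


def jaccard(a, b):
    return len(a & b) / len(a | b) if a | b else 0.0


def sample_edges(edges):
    """Step 3: per vertex, keep its ceil(sqrt(degree)) most similar edges."""
    nbrs = {}
    for edge in edges:
        u, v = tuple(edge)
        nbrs.setdefault(u, set()).add(v)
        nbrs.setdefault(v, set()).add(u)
    kept = set()
    for v, ns in nbrs.items():
        keep = math.ceil(math.sqrt(len(ns)))
        ranked = sorted(
            ns,
            key=lambda u: jaccard(nbrs[v] | {v}, nbrs[u] | {u}),
            reverse=True,
        )
        kept.update(frozenset((v, u)) for u in ranked[:keep])
    return kept


def merge_graphs(topo_edges, term_vectors, k=2):
    """Steps 1 to 3: content edges, edge union, biased sampling."""
    union = {frozenset(e) for e in topo_edges} | content_edges(term_vectors, k)
    return sample_edges(union)


# Hypothetical input: four documents with hyperlinks a-b, b-c, c-d.
vectors = {
    "a": {"graph": 1.0, "cluster": 1.0},
    "b": {"graph": 1.0, "cluster": 0.5},
    "c": {"music": 1.0},
    "d": {"music": 1.0, "rock": 1.0},
}
merged = merge_graphs([("a", "b"), ("b", "c"), ("c", "d")], vectors, k=1)
print(sorted(tuple(sorted(e)) for e in merged))
```

Because `merge_graphs` returns an ordinary edge set, the clustering step stays decoupled from the merging, which is the design choice the text highlights.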

5. COMMUNITY DETECTION IN MULTI-LAYER GRAPHS

In this section, we introduce community detection algorithms that support multi-layer graphs containing two or more layers.

5.1 Matrix Factorization

Tang et al. [25] and Dong et al. [6] proposed graph clustering algorithms for multi-layer graphs based on matrix factorization. In these papers, the MIT Reality Mining, mobile phone, Cora, and SIAM Journal datasets (No.2, No.4, No.5, and No.8 in Table 2) were used. In the No.2 and No.4 datasets, the layers represent the relationships defined by physical locations, Bluetooth scans, and phone calls, respectively. In the No.5 dataset, the layers contain three different research domains: natural language processing, data mining, and robotics. In the No.8 dataset, the layers depict five different similarity matrices retrieved from the abstract, title, keywords, author, and citation fields.

The main idea of these two algorithms is to fuse different information by extracting common factors from multiple layers, which may then be used by general clustering methods. The major difference is that Tang et al. [25] approximate adjacency matrices while Dong et al. [6] approximate graph Laplacian matrices. To achieve this goal, they approximate each layer through a low-rank matrix factorization $O \approx P \Lambda P^{T}$, where $O$ is the object matrix to be approximated (either an adjacency matrix or a Laplacian matrix), $P$ is an $n \times n$ eigenvector matrix, and $\Lambda$ is an $n \times n$ eigenvalue matrix. When multiple layers are considered, $O$ is naturally extended to $O^{(i)}$, for $i = 1, \ldots, l$. Also, a common factor matrix should be reflected by the multiple factorizations. Hence, the objective function is defined as minimizing Eq. (7), where $P$ is an $n \times n$ matrix representing the common factor of all layers, $\Lambda^{(i)}$ is an $n \times n$ matrix capturing the characteristics of the $i$th layer, $\|\cdot\|_F$ is the Frobenius norm, and $\alpha$ is a regularization parameter.

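$$G = \frac{1}{2} \sum_{i=1}^{l} \left\| O^{(i)} - P \Lambda^{(i)} P^{T} \right\|_F^2 + \frac{\alpha}{2} \left( \sum_{i=1}^{l} \left\| \Lambda^{(i)} \right\|_F^2 + \left\| P \right\|_F^2 \right) \tag{7}$$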

However, this objective function is not jointly convex in $P$ and $\Lambda^{(i)}$. Thus, they proposed an alternating method that replaces the problem of finding the global minimum with that of finding a local minimum. In brief, they first fix $P$ and optimize $\Lambda^{(i)}$, and then fix $\Lambda^{(i)}$ and optimize $P$. This procedure is repeated until the solution converges.
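To make the alternating scheme more concrete, the gradients of Eq. (7) with respect to each block of variables can be written out. The following is a sketch derived directly from Eq. (7) under the additional assumption that every $O^{(i)}$ and $\Lambda^{(i)}$ is symmetric; it is not necessarily the exact update rule used in [25] or [6], and the residual $R^{(i)}$ is a symbol introduced here for brevity.

```latex
\begin{aligned}
R^{(i)} &= O^{(i)} - P \Lambda^{(i)} P^{T}, \\
\frac{\partial G}{\partial \Lambda^{(i)}} &= -P^{T} R^{(i)} P + \alpha \Lambda^{(i)}, \\
\frac{\partial G}{\partial P} &= -2 \sum_{i=1}^{l} R^{(i)} P \Lambda^{(i)} + \alpha P .
\end{aligned}
```

With $P$ fixed, each $\Lambda^{(i)}$ appears only in its own term of the sum, so the layers can be updated independently; with all $\Lambda^{(i)}$ fixed, the gradient for $P$ aggregates the residuals of all layers, which is how the common factor absorbs information from every layer.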

5.2 Pattern Mining

Zeng et al. [27] proposed a subgraph mining algorithm for finding quasi-cliques that appear on multiple layers with a frequency above a given threshold. In this paper, the US stock market database (No.11 in Table 2) was used. In the No.11 dataset, each layer represents a graph formed by different correlation coefficient values in terms of stock prices.

The main contribution of this algorithm is to find cross-graph quasi-cliques in a multi-layer graph that are frequent, coherent, and closed. Generally, a cross-graph quasi-clique has been defined as a maximal set of vertices that forms a quasi-clique on all layers [16]. However, this algorithm does not require the minimum support to be 100%, meaning that it attempts to find quasi-cliques that appear on at least a certain percentage of the layers in a multi-layer graph. The final output does not contain a quasi-clique $Q$ if any superset of $Q$ forms a quasi-clique with the same support, because the output must be closed.

To satisfy this goal, the algorithm first converts the subgraphs into their canonical forms. Since the algorithm does not take the exact topology of a quasi-clique into account as long as it satisfies the given properties, a subgraph can be represented by the minimum string under the assumption that all vertices have a total order. Then, the algorithm enumerates feasible candidates for $\gamma$-quasi-cliques by using a DFS strategy with pruning techniques. Finally, the algorithm selects closed $\gamma$-quasi-cliques based on a closure-checking scheme. A naive closure-checking scheme scans all $\gamma$-quasi-cliques and then checks whether those quasi-cliques can be subsumed by other quasi-cliques. Since this naive approach is very costly, the algorithm adopts an efficient variational approach using an enumeration tree that satisfies the condition that a descendant must subsume an ancestor. The key principle of the variational approach is to conduct the closure checking for each quasi-clique $Q$ after all of its descendants have been processed.
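The naive closure check described above can be sketched in a few lines of Python. The sketch assumes that the frequent quasi-cliques and their supports (the number of layers on which each appears) have already been mined; the function names, the data layout, and the example patterns are hypothetical, and the enumeration-tree optimization of [27] is not reproduced.

```python
def frequent(patterns, num_layers, min_support):
    """Keep vertex sets that appear on at least min_support of the layers."""
    return {q: s for q, s in patterns.items() if s / num_layers >= min_support}


def closed_patterns(patterns):
    """Naive closure check: drop Q if a proper superset has the same support.

    patterns maps a frozenset of vertices to its support count. The check
    compares every pattern with every other one, hence quadratic cost.
    """
    closed = {}
    for q, support in patterns.items():
        subsumed = any(
            q < other and support == other_support
            for other, other_support in patterns.items()
        )
        if not subsumed:
            closed[q] = support
    return closed


# Hypothetical quasi-cliques mined from a four-layer graph.
mined = {
    frozenset({1, 2, 3}): 3,
    frozenset({1, 2, 3, 4}): 3,
    frozenset({1, 2}): 4,
    frozenset({5, 6}): 1,
}
result = closed_patterns(frequent(mined, num_layers=4, min_support=0.5))
print(sorted((sorted(q), s) for q, s in result.items()))
```

The quadratic scan over all pattern pairs is exactly the cost that motivates processing each quasi-clique only after its descendants in the enumeration tree.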

Boden et al. [1] proposed a graph clustering algorithm for multi-layer graphs with edge labels, called MiMAG. In this paper, the IMDb, Arxiv, and DBLP datasets (No.6, No.12, and No.14 in Table 2) were used. In the No.6 dataset, each layer depicts different information about movies in which two actors star together. In the No.12 or No.14 datasets, each layer represents the citation or coauthorship relationships in different topics or conferences.

The main contribution of MiMAG is to find clusters, called MLCS (Multi-Layer Coherent Subgraph), satisfying both aspects of structural density and edge label similarity. In order to achieve the structural density of an MLCS, a $\gamma$-quasi-clique model is used. For the edge label similarity of an MLCS, a cell-based cluster model is used. Putting them together, the algorithm finds the densely connected subgraphs whose edge labels vary at most by a certain threshold $w$. Such a subgraph is called an MLCS when it satisfies the two conditions on at least two layers.

However, listing all MLCSs produces numerous similar clusters, possibly containing redundant information, since MiMAG allows MLCSs to overlap with each other. For example, in Figure 10, the clusters $C_2$ and $C_3$ are redundant since they share a large number of the same vertices, i.e., {f, g, h}, on layer 1.

[Figure 10: An example of overlapping clusters [1]. Vertices a to i are connected on three layers (Layer 1, Layer 2, and Layer 3), and three clusters are marked: $C_1$ on {a,b,c,d,e} with $Q(C_1) = 2.5$, $C_2$ on {d,e,f,g,h} with $Q(C_2) = 5$, and $C_3$ on {f,g,h,i} with $Q(C_3) = 5.3$.]

In order to avoid redundancy, a redundancy relation is introduced [1]. It defines a cluster $C$ to be redundant with respect to a cluster $C'$ if the edges of $C$ and those of $C'$ overlap at a high rate and the quality of $C'$ is higher than that of $C$. The quality of a cluster $C = (V, L)$ is defined as Eq. (8), where $V$ is a set of vertices, $L$ denotes a set of layers, and $\gamma_L(V)$ represents the average density of the cluster on $L$.

$$Q(C) = \begin{cases} |V| \cdot |L| \cdot \gamma_L(V), & \text{if } |V| \ge 8 \wedge |L| \ge 2 \\ -1, & \text{otherwise} \end{cases} \tag{8}$$

Thus, MiMAG prefers clusters that contain more vertices, are denser (i.e., contain more edges), and appear on more layers. In Figure 10, $C_2$ is formally redundant with respect to $C_3$.
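The quality function of Eq. (8) and the redundancy relation can be sketched as follows. The size thresholds are exposed as parameters whose defaults follow Eq. (8) as printed; the overlap measure (the fraction of the edges of $C$ that also occur in $C'$) and its threshold are assumptions of this sketch, since the text only requires the overlap to be "high". The function names and the example edges are hypothetical.

```python
def quality(num_vertices, num_layers, avg_density, min_vertices=8, min_layers=2):
    """Eq. (8): |V| * |L| * density if the cluster is large enough, else -1."""
    if num_vertices >= min_vertices and num_layers >= min_layers:
        return num_vertices * num_layers * avg_density
    return -1.0


def is_redundant(edges_c, quality_c, edges_other, quality_other, threshold=0.5):
    """C is redundant w.r.t. C' if their edges overlap heavily and Q(C') > Q(C).

    The overlap is measured as |E(C) & E(C')| / |E(C)|; both this measure and
    the threshold are illustrative assumptions.
    """
    if not edges_c:
        return False
    overlap = len(edges_c & edges_other) / len(edges_c)
    return overlap >= threshold and quality_other > quality_c


# Hypothetical edge sets; the quality values are the ones shown in Figure 10.
edges_c2 = {frozenset("fg"), frozenset("gh"), frozenset("fh"), frozenset("de")}
edges_c3 = {frozenset("fg"), frozenset("gh"), frozenset("fh"), frozenset("hi")}
print(is_redundant(edges_c2, 5.0, edges_c3, 5.3))
print(quality(num_vertices=8, num_layers=2, avg_density=0.5))
```

The relation is asymmetric: of two heavily overlapping clusters, only the one with the lower quality is treated as redundant.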

6. COMPARISON

In this section, we compare the ten community detection algorithms introduced in Sections 4 and 5 with respect to the following seven properties. When selecting properties, we refer to the popular properties of subspace clustering [9], since community detection in multi-layer graphs resembles subspace clustering in that both deal with multiple dimensions of datasets. Among the properties in [9], only those closely related to community detection in multi-layer graphs are selected. Then, if a property is satisfied by none of the algorithms in this paper, we exclude it. Overall, P.1–P.7 except P.4 correspond to a subset of the properties in [9]. P.4 is inspired by a widely known categorization of attribute selection: the filter model and the wrapper model [22].

• Property 1: Multiple layer ($l > 2$) applicability
All algorithms we introduced are designed for the community detection problem in multi-layer graphs. However, some algorithms support only a two-layer graph, while the others support a multi-layer graph containing more than two layers.

• Property 2: Consideration of each layer’s importance
Since each aspect of relationships may have different importance in the real world, considering the importance of each layer differently is more applicable than assigning uniform importance. Thus, it is crucial to automatically find the importance of each layer based on the layer’s characteristics. We call the importance of each layer its layer coefficient.

• Property 3: Flexible layer participation
The layer coefficient can vary across communities. Thus, capturing the optimal layer coefficient specific to each community is an important ability since it can distinguish the layer participation in each community. In this case, an algorithm can freely construct a community involved with a subset of layers rather than the entire or common set of available layers.

• Property 4: Algorithm insensitivity
Some approaches are tightly coupled with a specific graph clustering algorithm. This tight coupling may limit the freedom of users to choose a graph clustering algorithm. It is well known that certain graph clustering algorithms tend to perform particularly well or poorly on certain kinds of graphs [11]. Thus, the ability to apply any clustering algorithm can improve the quality of community detection.

• Property 5: No layer locality assumption
Some approaches find initial communities from a specific layer and then discover final communities by expanding and refining the initial communities on other layers. Those algorithms are regarded as having a locality assumption. In other words, it is assumed that all hidden communities can be derived from a local region of the layer.

• Property 6: Independence from the order of layers
The results of community detection could be sensitive to the order of processing layers. This limitation typically happens when an algorithm processes layers sequentially with a dedicated policy for each layer. In this case, an improper ordering will result in lower-quality results.

• Property 7: Overlapping layers
The communities can be defined in an overlapping way across layers. That is, a vertex can belong to a community $C_1$ on a certain set of layers but to a community $C_2$ on another set of layers.

Table 3 shows whether each algorithm supports the seven properties. Our perspective is that more Y’s indicate that the algorithm has more powerful and advanced features. Nevertheless, we cannot definitely say that the number of Y’s determines the superiority of one algorithm over another. An algorithm does not need all the properties if it is designed for a specific environment. In addition, the performance in terms of efficiency or accuracy is not addressed in Table 3, since an apples-to-apples comparison is not possible owing to the differences in problem settings. Overall, despite these limitations, we believe that this comparison will give useful insights into various approaches.

Table 3: The comparisons of community detection algorithms for multi-layer graphs.

Algorithm           P1  P2  P3  P4  P5  P6  P7
Li et al. [12]      N   N   N   N   N   N   Y
Qi et al. [17]      N   Y   N   Y   N   N   Y
Zhou et al. [28]    N   Y   N   N   Y   N   N
Xu et al. [26]      N   Y   N   N   Y   N   N
Silva et al. [21]   N   N   Y   N   Y   N   Y
Ruan et al. [18]    N   N   N   Y   Y   Y   N
Tang et al. [25]    Y   Y   N   Y   Y   Y   N
Dong et al. [6]     Y   Y   N   Y   Y   Y   N
Zeng et al. [27]    Y   N   Y   N   Y   Y   Y
Boden et al. [1]    Y   N   Y   N   Y   Y   Y

7. FUTURE RESEARCH DIRECTIONS

In this section, we present a few challenging but interesting future research directions.

• General multi-layer graph applicability
Most algorithms covered are only applicable to pillar multi-layer graphs. It is definitely true that they are simple but effective for modeling various real-world situations. However, since a one-to-one correspondence between vertices of different layers is not always guaranteed in the real world, it is more natural to consider an extension of the algorithms into general multi-layer graphs.

• Uncertainty in multi-layer graphs
Most studies assume that multi-layer graphs are already cleaned completely. However, in the real world, both vertices and edges could be noisy and ambiguous [23]. For example, in bibliographic datasets, different authors may have the same name. Even worse, information extracted from the real world may not be reliable. Thus, constructing multi-layer graphs with entity resolution and/or trustworthiness analysis certainly enhances the quality of the community detection process.

• Scalability issues
In the era of Big Data, the amount of available information grows rapidly. Thus, scalability of both computational time and memory requirement has become a critical issue. Although many researchers are trying to enhance scalability, most studies are being conducted with relatively small datasets because of unsatisfactory scalability. One feasible solution is to implement parallel and distributed versions of a community detection algorithm. Another is to use sampling for feature-vector matrices of multi-layer graphs.

• Temporal analysis
Graphs evolve over time, and the communities in graphs also change as time goes by. Thus, understanding and exploiting temporal characteristics are helpful for discovering deep insights about the communities. Although many researchers have studied this problem for single-layer graphs, there is almost no work done for multi-layer graphs. The complexity of modeling the evolution in multi-layer graphs is extremely high since it involves multiple layers and the connections between the multiple layers.


8. CONCLUSIONS

In this paper, we presented a comprehensive understanding of multi-layer graphs and the state-of-the-art community detection algorithms for multi-layer graphs. In recent applications, each entity often engages in multiple relations. Hence, high-quality communities in multi-layer graphs can be discovered by exploiting and fusing all these different aspects of information. We classified community detection algorithms in multi-layer graphs into six types based on their underlying strategies: cluster expansion, matrix factorization, unified distance, model-based, pattern mining, and graph merging. These algorithms were compared with each other using seven properties. Also, various multi-layer graph datasets used in related studies were summarized for ease of reference. Finally, we tried to provide insights and directions for further research in this domain.

9. ACKNOWLEDGMENTS

This research, “Geospatial Big Data Management, Analysis and Service Platform Technology Development,” was supported by the MOLIT (The Ministry of Land, Infrastructure and Transport), Korea, under the national spatial information research program supervised by the KAIA (Korea Agency for Infrastructure Technology Advancement) (15NSIP-B081011-02).

10. REFERENCES

[1] B. Boden, S. Günnemann, H. Hoffmann, and T. Seidl. Mining coherent subgraphs in multi-layer graphs with edge labels. In Proc. 2012 ACM SIGKDD Int’l Conf. on Knowledge Discovery and Data Mining, pages 1258–1266, Beijing, China, Aug. 2012.

[2] S. V. Buldyrev, R. Parshani, G. Paul, H. E. Stanley, and S. Havlin. Catastrophic cascade of failures in interdependent networks. Nature, 464(7291):1025–1028, Apr. 2010.

[3] A. Cardillo, J. Gómez-Gardeñes, M. Zanin, M. Romance, D. Papo, F. del Pozo, and S. Boccaletti. Emergence of network features from multiplexity. Scientific Reports, 3(1344), Feb. 2013.

[4] F. Celli, F. M. L. D. Lascio, M. Magnani, B. Pacelli, and L. Rossi. Social network data and practices: The case of friendfeed. In Proc. 3rd Int’l Conf. on Social Computing, Behavioral Modeling, and Prediction, pages 346–353, Bethesda, Maryland, Mar. 2010.

[5] M. D. Domenico, A. Lima, P. Mougel, and M. Musolesi. The anatomy of a scientific rumor. Scientific Reports, 3(2980), Oct. 2013.

[6] X. Dong, P. Frossard, P. Vandergheynst, and N. Nefedov. Clustering with multi-layer graphs: A spectral perspective. IEEE Trans. on Signal Processing, 60(11):5820–5831, Dec. 2011.

[7] S. Fortunato. Community detection in graphs. Physics Reports, 486(3):75–174, Feb. 2010.

[8] N. Kiukkonen, J. Blom, O. Dousse, D. Gatica-Perez, and J. Laurila. Towards rich mobile phone datasets: Lausanne data collection campaign. In Proc. The 7th Int’l Conf. on Pervasive Services, Berlin, Germany, July 2010.

[9] H.-P. Kriegel, P. Kröger, and A. Zimek. Clustering high-dimensional data: A survey on subspace clustering, pattern-based clustering, and correlation clustering. ACM Trans. on Knowledge Discovery from Data, 3(1):1–58, Mar. 2009.

[10] S. Lee, M. Ko, K. Han, and J.-G. Lee. On finding fine-granularity user communities by profile decomposition. In Proc. 2012 ASONAM Int’l Conf. on Advances in Social Networks Analysis and Mining, pages 631–639, Istanbul, Turkey, Aug. 2012.

[11] J. Leskovec, K. J. Lang, and M. W. Mahoney. Empirical comparison of algorithms for network community detection. In Proc. 19th Int’l World Wide Web Conf., pages 631–640, Raleigh, North Carolina, Apr. 2010.

[12] H. Li, Z. Nie, W.-C. Lee, L. Giles, and J.-R. Wen. Scalable community discovery on textual data with relations. In Proc. 17th Int’l Conf. on Information and Knowledge Management, pages 1203–1212, Napa Valley, California, Oct. 2008.

[13] S. Lim, S. Ryu, S. Kwon, K. Jung, and J.-G. Lee. LinkSCAN*: Overlapping community detection using the link-space transformation. In Proc. 30th Int’l Conf. on Data Engineering, pages 292–303, Chicago, Illinois, Apr. 2014.

[14] M. Magnani and L. Rossi. The ML-model for multi-layer social networks. In Proc. 2011 ASONAM Int’l Conf. on Advances in Social Networks Analysis and Mining, pages 5–12, Kaohsiung City, Taiwan, July 2011.

[15] S. Moon, J.-G. Lee, and M. Kang. Scalable community detection from networks by computing edge betweenness on mapreduce. In Proc. 2014 Int’l Conf. on Big Data and Smart Computing, pages 145–148, Bangkok, Thailand, Jan. 2014.

[16] J. Pei, D. Jiang, and A. Zhang. On mining cross-graph quasi-cliques. In Proc. 2005 ACM SIGKDD Int’l Conf. on Knowledge Discovery and Data Mining, pages 228–238, Chicago, Illinois, Aug. 2005.

[17] G.-J. Qi, C. C. Aggarwal, and T. Huang. Community detection with edge content in social media networks. In Proc. 28th Int’l Conf. on Data Engineering, pages 534–545, Brisbane, Australia, Apr. 2012.

[18] Y. Ruan, D. Fuhry, and S. Parthasarathy. Efficient community detection in large networks using content and links. In Proc. 22nd Int’l World Wide Web Conf., pages 1089–1098, Rio de Janeiro, Brazil, May 2013.

[19] S. E. Schaeffer. Graph clustering. Computer Science Review, 1(1):27–64, Aug. 2007.

[20] A. Silva, W. M. Jr., and M. J. Zaki. Structural correlation pattern mining for large graphs. In Proc. 8th Workshop on Mining and Learning with Graphs, pages 119–126, Washington D.C., Aug. 2010.

[21] A. Silva, W. M. Jr., and M. J. Zaki. Mining attribute-structure correlated patterns in large attributed graphs. Proc. of the VLDB Endowment, 5(5):466–477, Sept. 2012.

[22] K. Sim, V. Gopalkrishnan, A. Zimek, and G. Cong. A survey on enhanced subspace clustering. Data Mining and Knowledge Discovery, 26(2):332–397, Feb. 2013.

[23] Y. Sun and J. Han. Mining heterogeneous information networks: a structural analysis approach. ACM SIGKDD Explorations Newsletter, 14(2):20–28, Dec. 2013.

[24] Y. Sun, J. Han, X. Yan, P. S. Yu, and T. Wu. Pathsim: Meta path-based top-k similarity search in heterogeneous information networks. Proc. of the VLDB Endowment, 4(11):992–1003, Aug. 2011.

[25] W. Tang, Z. Lu, and I. S. Dhillon. Clustering with multiple graphs. In Proc. 9th Int’l Conf. on Data Mining, pages 1016–1021, Miami, Florida, Dec. 2009.

[26] Z. Xu, Y. Ke, Y. Wang, H. Cheng, and J. Cheng. A model-based approach to attributed graph clustering. In Proc. 2012 ACM SIGMOD Int’l Conf. on Management of Data, pages 505–516, Indianapolis, Indiana, June 2012.

[27] Z. Zeng, J. Wang, L. Zhou, and G. Karypis. Coherent closed quasi-clique discovery from large dense graph databases. In Proc. 2006 ACM SIGKDD Int’l Conf. on Knowledge Discovery and Data Mining, pages 797–802, Philadelphia, Pennsylvania, Aug. 2006.

[28] Y. Zhou, H. Cheng, and J. X. Yu. Graph clustering based on structural/attribute similarities. Proc. of the VLDB Endowment, 2(1):718–729, Aug. 2009.
