Community Detection Algorithm and Community Quality Metric
Mingming Chen & Boleslaw K. Szymanski
Department of Computer Science
Rensselaer Polytechnic Institute
Community Structure
Many networks display community structure
  Groups of nodes within which connections are denser than between them
Community detection algorithms
Community quality metrics
Two Related Community Detection Topics
Community detection algorithm
  LabelRank: a stabilized label propagation community detection algorithm (Xie and Szymanski, 2013)
  LabelRankT: extended algorithm for dynamic networks based on LabelRank (Xie, Chen, and Szymanski, 2013)
A new community quality metric solving two problems of Modularity (M. E. J. Newman, 2006; Newman and Girvan, 2004)
LabelRank Algorithm
Four operators applied to the labels
  Label propagation operator
  Inflation operator
  Cutoff operator
  Conditional update operator
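[Figure: four-node network (nodes 1 to 4); every answer reaches node 1 with weight 1.]
Question: NP = P?
Node 1: No; Node 2: No; Node 3: No; Node 4: Yes.
P1(No) = 3/4; P1(Yes) = 1/4. Node 1: No.
[Figure: the same network, with the weight on node 4's answer raised to 97.]
P1(No) = 3/100; P1(Yes) = 97/100. Node 1: Yes.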
Label Propagation Operator
$$W \times P$$
where W is the n × n weighted adjacency matrix and P is the n × n label probability distribution matrix, composed of n (1 × n) row vectors $P_i$, one for each node.
Each element $P_i(c)$ holds the current estimate of the probability of node i observing label $c \in C$, where C is the set of labels (here, suppose C = {1, 2, …, n}). Ex. $P_i$ = (0.1, 0.2, …, 0.05, …)
To initialize P, each node is assigned a distribution of probabilities of all incoming edges:
$$P_i(c) = \frac{w_{ic}}{\sum_{k \in Nb(i)} w_{ik}}, \quad \forall c \in C \ \text{s.t.}\ w_{ic} > 0.$$
Label Propagation Operator
Each node receives the label probability distribution from its neighbors and computes the new distribution $P'_i$:
$$P'_i(c) = \frac{\sum_{j \in Nb(i)} w_{ij} P_j(c)}{\sum_{k \in Nb(i)} w_{ik}}, \quad \forall c \in C.$$
Example (node 1 receives the distributions P1, P2, P3, and P4 with equal weight):
Before propagation:
P1 = (0.25, 0.25, 0.25, 0.25, 0, 0, 0, 0, 0, 0)
P2 = (0.25, 0.25, 0, 0, 0.25, 0.25, 0, 0, 0, 0)
P3 = (0.25, 0, 0.25, 0, 0, 0, 0.25, 0.25, 0, 0)
P4 = (0.25, 0, 0, 0.25, 0, 0, 0, 0, 0.25, 0.25)
After propagation:
P1 = (0.25, 0.125, 0.125, 0.125, 0.0625, 0.0625, 0.0625, 0.0625, 0.0625, 0.0625)
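The initialization and propagation steps can be sketched in a few lines of Python. This is a minimal illustration rather than the authors' implementation. The 10-node graph below (node 1 linked to 2, 3, 4, each of which has two further neighbors, with a unit self-loop on every node) is an assumption inferred from the example vectors, and the function names `build_weights`, `initialize`, and `propagate` are hypothetical.

```python
def build_weights(edges, nodes):
    """Unit-weight undirected graph with a self-loop on every node (assumed)."""
    weights = {i: {i: 1.0} for i in nodes}
    for u, v in edges:
        weights[u][v] = 1.0
        weights[v][u] = 1.0
    return weights


def initialize(weights):
    """P_i(c) = w_ic / sum_k w_ik for every label c with w_ic > 0."""
    P = {}
    for i, nbrs in weights.items():
        total = sum(nbrs.values())
        P[i] = {c: w / total for c, w in nbrs.items() if w > 0}
    return P


def propagate(weights, P):
    """One step: P'_i(c) = sum_j w_ij * P_j(c) / sum_k w_ik."""
    new_P = {}
    for i, nbrs in weights.items():
        total = sum(nbrs.values())
        dist = {}
        for j, w in nbrs.items():
            for label, prob in P[j].items():
                dist[label] = dist.get(label, 0.0) + w * prob / total
        new_P[i] = dist
    return new_P


# Assumed topology, chosen so that the initial P1..P4 match the example.
nodes = range(1, 11)
edges = [(1, 2), (1, 3), (1, 4), (2, 5), (2, 6),
         (3, 7), (3, 8), (4, 9), (4, 10)]
weights = build_weights(edges, nodes)
P0 = initialize(weights)
P1 = propagate(weights, P0)
print(sorted(P1[1].items()))
```

Distributions are stored as sparse dictionaries (label to probability), so labels with zero probability take no space.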
Inflation Operator
Each element $P_i(c)$ is raised to the in-th power:
$$\Gamma_{in} P_i(c) = \frac{P_i(c)^{in}}{\sum_{j \in C} P_i(j)^{in}}$$
It increases the probabilities of labels with high probability but decreases those of labels with low probability during label propagation.
Example ($\Gamma_{in}$, in = 2):
Before inflation: P1 = (0.25, 0.125, 0.125, 0.125, 0.0625, 0.0625, 0.0625, 0.0625, 0.0625, 0.0625)
After inflation: P1 = (0.129, 0.0323, 0.0323, 0.0323, 0.00806, 0.00806, 0.00806, 0.00806, 0.00806, 0.00806)
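A minimal sketch of the inflation step, assuming the formula is applied exactly as written (raise each probability to the power in, then divide by the sum of the raised values so the result sums to 1). The function name `inflate` and the three-label input are illustrative only.

```python
def inflate(dist, power=2.0):
    """Raise each probability to `power`, then renormalize to sum to 1."""
    raised = {label: p ** power for label, p in dist.items()}
    total = sum(raised.values())
    return {label: value / total for label, value in raised.items()}


# Illustrative input (not taken from the slides).
before = {"a": 0.5, "b": 0.3, "c": 0.2}
after = inflate(before, power=2.0)
print(after)
```

With a power of 2 the ratio between any two labels is squared, so the gap between strong and weak labels widens at every iteration.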
Cutoff Operator
The cutoff operator $\Phi_r$ on P removes labels that are below the threshold $r \in [0,1]$, with the help of the inflation operator, which decreases the probabilities of low-probability labels during propagation.
$\Phi_r$ efficiently reduces the space complexity from quadratic to linear.
Example ($\Phi_r$, r = 0.1):
Before cutoff: P1 = (0.129, 0.0323, 0.0323, 0.0323, 0.00806, 0.00806, 0.00806, 0.00806, 0.00806, 0.00806)
After cutoff: P1 = (0.129)
With r = 0.1, the average number of labels in each node is less than 3.
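The cutoff step amounts to a threshold filter on a sparse distribution. The sketch below assumes that labels with probability exactly equal to r are kept and that the surviving probabilities are not renormalized, which is how the example reads; the name `cutoff` is hypothetical.

```python
def cutoff(dist, r=0.1):
    """Drop every label whose probability is below the threshold r."""
    return {label: p for label, p in dist.items() if p >= r}


# Values of P1 after inflation, taken from the example (labels 1..10).
values = [0.129, 0.0323, 0.0323, 0.0323,
          0.00806, 0.00806, 0.00806, 0.00806, 0.00806, 0.00806]
p1 = {label: p for label, p in enumerate(values, start=1)}
kept = cutoff(p1, r=0.1)
print(kept)
```

Because each node keeps only a handful of labels, storing P as per-node dictionaries needs space proportional to the number of nodes rather than n × n.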
Conditional Update Operator
At each iteration, it updates a node i only when it is significantly different from its incoming neighbors in terms of labels:
$$\sum_{j \in Nb(i)} isSubset(C_i^*, C_j^*) \le q k_i,$$
where $C_i^*$ is the set of maximum probability labels at node i at the last step, $isSubset(s_1, s_2)$ returns 1 if $s_1 \subseteq s_2$ and 0 otherwise, $k_i$ is the node degree, and $q \in [0,1]$.
isSubset can be viewed as a measure of similarity between two nodes.
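The update condition can be sketched as follows. This assumes that $k_i$ equals the number of entries in the neighbor list passed in, and that ties for the maximum probability are detected by exact equality; `max_labels` and `should_update` are hypothetical names.

```python
def max_labels(dist):
    """Set of labels that share the maximum probability in dist."""
    best = max(dist.values())
    return {label for label, p in dist.items() if p == best}


def should_update(i, neighbors, prev_P, q):
    """True when at most q * k_i neighbors contain node i's top labels."""
    top_i = max_labels(prev_P[i])
    similar = sum(1 for j in neighbors[i]
                  if top_i <= max_labels(prev_P[j]))  # isSubset(C_i*, C_j*)
    return similar <= q * len(neighbors[i])


# Illustrative distributions from the previous step (not from the slides).
prev_P = {
    1: {"a": 0.9, "b": 0.1},
    2: {"a": 0.8, "b": 0.2},
    3: {"a": 0.6, "c": 0.4},
    4: {"b": 1.0},
}
neighbors = {1: [2, 3], 4: [1, 2, 3]}
print(should_update(1, neighbors, prev_P, q=0.5))
print(should_update(4, neighbors, prev_P, q=0.5))
```

Node 1 agrees with both of its neighbors, so it is left alone; node 4 agrees with none and is updated.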
Effect of Conditional Update Operator
Running time of LabelRank
O(Tm): m is the number of edges and T is the number of iterations.
LabelRank is a linear-time algorithm: for a fixed number of iterations, the cost grows linearly with the number of edges.
Performance of LabelRank
LabelRankT
It is LabelRank with one extra conditional update rule, by which only nodes involved in changes are updated.
Changes are handled by comparing the neighbors of node i at two consecutive steps, $Nb_{t-1}(i)$ and $Nb_t(i)$.
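One way to read the extra rule is as a set comparison between two snapshots of the network. The sketch below is an assumption about how such a comparison could look, not the published LabelRankT code; `changed_nodes` is a hypothetical helper, and snapshots are given as dictionaries from node to neighbor set.

```python
def changed_nodes(prev_adj, curr_adj):
    """Nodes whose neighbor set differs between two consecutive snapshots."""
    nodes = set(prev_adj) | set(curr_adj)
    return {i for i in nodes
            if set(prev_adj.get(i, ())) != set(curr_adj.get(i, ()))}


# Illustrative snapshots: edge (1, 3) appears at step t.
snapshot_prev = {1: {2}, 2: {1}, 3: set()}
snapshot_curr = {1: {2, 3}, 2: {1}, 3: {1}}
print(sorted(changed_nodes(snapshot_prev, snapshot_curr)))
```

Only the returned nodes would need their label distributions refreshed; all other nodes keep the distributions from the previous step.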
Two Problems of Modularity Maximization
Split large communities
  Favors small communities
Resolution limit problem
  Modularity optimization may fail to discover communities smaller than a certain scale, even in cases where communities are unambiguously defined.
  This scale depends on the total number of edges in the network and the degree of interconnectedness of the communities.
  Favors large communities
Fortunato et al., 2008; Li et al., 2008; Arenas et al., 2008; Berry et al., 2009; Good et al., 2010; Ronhovde et al., 2010; Fortunato, 2010; Lancichinetti et al., 2011; Traag et al., 2011; Darst et al., 2013.
Modularity
Modularity (Q): the fraction of edges falling within communities minus the expected value in an equivalent network with edges placed at random
$$Q = \frac{1}{2|E|} \sum_{ij} \left[ A_{ij} - \frac{k_i k_j}{2|E|} \right] \delta_{c_i, c_j}, \qquad \delta_{c_i, c_j} = \begin{cases} 1 & \text{if nodes } i \text{ and } j \text{ are in the same community,} \\ 0 & \text{otherwise.} \end{cases}$$
Equivalent definition
$$Q = \sum_{c_i \in C} \left[ \frac{|E^{in}_{c_i}|}{|E|} - \left( \frac{2|E^{in}_{c_i}| + |E^{out}_{c_i}|}{2|E|} \right)^2 \right]$$
$|E^{in}_{c_i}|$: the number of intra edges of community $c_i$;
$|E^{out}_{c_i}|$: the number of inter edges of community $c_i$.
M. E. J. Newman, 2006.
Newman and Girvan, 2004.
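The two definitions can be checked against each other numerically. The sketch below assumes a simple undirected, unweighted graph without self-loops, given as an edge list plus a node-to-community mapping; both function names are hypothetical. The test graph is the complete graph on 8 nodes split into two groups of four.

```python
from itertools import combinations


def modularity_pairwise(edges, community):
    """Q from the node-pair definition (sum over ordered pairs i, j)."""
    m = len(edges)
    degree, adjacent = {}, set()
    for u, v in edges:
        degree[u] = degree.get(u, 0) + 1
        degree[v] = degree.get(v, 0) + 1
        adjacent.add((u, v))
        adjacent.add((v, u))
    total = 0.0
    for i in degree:
        for j in degree:
            if community[i] == community[j]:
                a_ij = 1.0 if (i, j) in adjacent else 0.0
                total += a_ij - degree[i] * degree[j] / (2.0 * m)
    return total / (2.0 * m)


def modularity_by_community(edges, community):
    """Q from the per-community counts of intra and inter edges."""
    m = len(edges)
    e_in, e_out = {}, {}
    for u, v in edges:
        cu, cv = community[u], community[v]
        if cu == cv:
            e_in[cu] = e_in.get(cu, 0) + 1
        else:
            e_out[cu] = e_out.get(cu, 0) + 1
            e_out[cv] = e_out.get(cv, 0) + 1
    total = 0.0
    for c in set(community.values()):
        ein, eout = e_in.get(c, 0), e_out.get(c, 0)
        total += ein / m - ((2 * ein + eout) / (2.0 * m)) ** 2
    return total


k8_edges = list(combinations(range(8), 2))
two_groups = {v: v // 4 for v in range(8)}
one_group = {v: 0 for v in range(8)}
print(modularity_pairwise(k8_edges, two_groups))
print(modularity_by_community(k8_edges, two_groups))
```

The per-community form needs a single pass over the edges, whereas the pairwise form is quadratic in the number of nodes, so the former is the practical choice for large networks.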
Modularity with Split Penalty
Modularity (Q): the modularity of the community detection result
$$Q = \frac{1}{2|E|} \sum_{ij} \left[ A_{ij} - \frac{k_i k_j}{2|E|} \right] \delta_{c_i, c_j}.$$
Split penalty (SP): the fraction of edges that connect nodes of different communities
$$SP = \frac{1}{2|E|} \sum_{ij} A_{ij} (1 - \delta_{c_i, c_j}).$$
Qs = Q – SP: addresses Modularity's problem of favoring small communities
$$Q_s = Q - SP = \frac{1}{2|E|} \sum_{ij} \left[ A_{ij} - \frac{k_i k_j}{2|E|} \right] \delta_{c_i, c_j} - \frac{1}{2|E|} \sum_{ij} A_{ij} (1 - \delta_{c_i, c_j}).$$
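A short sketch of Q, SP, and Qs computed from per-community edge counts, assuming a simple undirected, unweighted graph. The function names are hypothetical, and the test graph (two 4-node cliques joined by a single edge) is an illustrative assumption.

```python
from itertools import combinations


def modularity(edges, community):
    """Q from per-community counts of intra and inter edges."""
    m = len(edges)
    e_in, e_out = {}, {}
    for u, v in edges:
        cu, cv = community[u], community[v]
        if cu == cv:
            e_in[cu] = e_in.get(cu, 0) + 1
        else:
            e_out[cu] = e_out.get(cu, 0) + 1
            e_out[cv] = e_out.get(cv, 0) + 1
    return sum(e_in.get(c, 0) / m
               - ((2 * e_in.get(c, 0) + e_out.get(c, 0)) / (2.0 * m)) ** 2
               for c in set(community.values()))


def split_penalty(edges, community):
    """SP: fraction of edges whose endpoints lie in different communities."""
    inter = sum(1 for u, v in edges if community[u] != community[v])
    return inter / len(edges)


def modularity_with_split_penalty(edges, community):
    """Qs = Q - SP."""
    return modularity(edges, community) - split_penalty(edges, community)


# Two 4-cliques (nodes 0-3 and 4-7) joined by the single edge (3, 4).
clique_edges = (list(combinations(range(4), 2))
                + list(combinations(range(4, 8), 2))
                + [(3, 4)])
clique_split = {v: v // 4 for v in range(8)}
print(modularity(clique_edges, clique_split))
print(split_penalty(clique_edges, clique_split))
print(modularity_with_split_penalty(clique_edges, clique_split))
```

In the double sum over ordered pairs each inter-community edge is counted twice, which cancels the factor 1/2 in front, so SP reduces to the plain fraction of inter-community edges.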
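Qs with Community Density
Resolution limit: Modularity optimization may fail to detect communities smaller than a certain scale
Intuitively, put density into Modularity and Split Penalty to solve the resolution limit problem
$$Q_{ds} = \frac{1}{2|E|} \sum_{ij} \left[ A_{ij} d_{c_i} - \frac{k_i k_j}{2|E|} d_{c_i}^2 \right] \delta_{c_i, c_j} - \frac{1}{2|E|} \sum_{ij} A_{ij} \, d_{c_i, c_j} (1 - \delta_{c_i, c_j})$$
$$d_{c_i} = \frac{|E^{in}_{c_i}|}{|c_i| (|c_i| - 1) / 2}, \qquad d_{c_i, c_j} = \frac{|E_{c_i, c_j}|}{|c_i| |c_j|}$$
Here $|c_i|$ is the number of nodes in community $c_i$ and $|E_{c_i, c_j}|$ is the number of edges between communities $c_i$ and $c_j$.
Equivalent definition
$$Q_{ds} = \sum_{c_i \in C} \left[ \frac{|E^{in}_{c_i}|}{|E|} d_{c_i} - \left( \frac{2|E^{in}_{c_i}| + |E^{out}_{c_i}|}{2|E|} d_{c_i} \right)^2 - \sum_{c_j \in C, \, c_j \ne c_i} \frac{|E_{c_i, c_j}|}{2|E|} d_{c_i, c_j} \right]$$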
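The community-level form of Qds translates directly into code. The sketch below assumes a simple undirected, unweighted graph and treats the density of a single-node community as 0 (the formula leaves that case undefined). The function name `qds` is hypothetical, and the two test graphs (two 4-cliques joined by one edge, two 7-node paths joined by one edge) are illustrative assumptions.

```python
from itertools import combinations


def qds(edges, community):
    """Qds from per-community edge counts and densities."""
    m = len(edges)
    size = {}
    for c in community.values():
        size[c] = size.get(c, 0) + 1
    e_in = {c: 0 for c in size}
    e_out = {c: 0 for c in size}
    between = {}  # ordered community pair -> number of connecting edges
    for u, v in edges:
        cu, cv = community[u], community[v]
        if cu == cv:
            e_in[cu] += 1
        else:
            e_out[cu] += 1
            e_out[cv] += 1
            between[(cu, cv)] = between.get((cu, cv), 0) + 1
            between[(cv, cu)] = between.get((cv, cu), 0) + 1
    total = 0.0
    for c, n in size.items():
        pairs = n * (n - 1) / 2.0
        d_c = e_in[c] / pairs if pairs > 0 else 0.0  # assumption for |c| = 1
        total += e_in[c] / m * d_c
        total -= ((2 * e_in[c] + e_out[c]) / (2.0 * m) * d_c) ** 2
        for (a, b), count in between.items():
            if a == c:
                d_ab = count / float(size[a] * size[b])
                total -= count / (2.0 * m) * d_ab
    return total


# Two 4-cliques joined by one edge, and two 7-node paths joined by one edge.
clique_edges = (list(combinations(range(4), 2))
                + list(combinations(range(4, 8), 2)) + [(3, 4)])
clique_split = {v: v // 4 for v in range(8)}
tree_edges = [(i, i + 1) for i in range(13)]
tree_split = {v: v // 7 for v in range(14)}
print(qds(clique_edges, clique_split))
print(qds(tree_edges, tree_split))
```

Both test graphs have 13 edges, with 6 intra edges per community and 1 edge between the communities, so they receive the same Q; only the density terms separate the cliques from the sparse trees.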
Example of Two Well-Separated Communities
                 Modularity (Q)   Split Penalty (SP)   Qs = Q – SP   Qds
2 communities    0.5              0                    0.5           0.5
1 community      0                0                    0             0.245
Example of Two Weakly Connected Communities
                 Modularity (Q)   Split Penalty (SP)   Qs = Q – SP   Qds
2 communities    0.357            0.143                0.214         0.339
1 community      0                0                    0             0.25
Ambiguity between One and Two Communities
                 Modularity (Q)   Split Penalty (SP)   Qs = Q – SP   Qds
2 communities    0.3              0.2                  0.1           0.263
1 community      0                0                    0             0.249
Ambiguity between One and Two Communities
                 Modularity (Q)   Split Penalty (SP)   Qs = Q – SP   Qds
2 communities    0.25             0.25                 0             0.188
1 community      0                0                    0             0.245
Example of One Well Connected Community
                 Modularity (Q)   Split Penalty (SP)   Qs = Q – SP   Qds
2 communities    0.167            0.333                -0.167        0.0417
1 community      0                0                    0             0.23
Example of One Very Well Connected Community
                 Modularity (Q)   Split Penalty (SP)   Qs = Q – SP   Qds
2 communities    0.0455           0.455                -0.409        -0.239
1 community      0                0                    0             0.168
Example of One Complete Graph
Community quality on a complete graph with 8 nodes
                 Modularity (Q)   Split Penalty (SP)   Qs = Q – SP   Qds
2 communities    -0.0714          0.571                -0.643        -0.643
1 community      0                0                    0             0
Modularity Has Nothing to Do with #Nodes
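$$Q(\text{clique}) = Q(\text{tree}) = 2 \left[ \frac{12}{26} - \left( \frac{13}{26} \right)^2 \right] = 0.4231$$
$$Q_s(\text{clique}) = Q_s(\text{tree}) = 2 \left[ \frac{12}{26} - \left( \frac{13}{26} \right)^2 - \frac{1}{26} \right] = 0.3462$$
$$Q_{ds}(\text{clique}) = 2 \left[ \frac{12}{26} \cdot 1 - \left( \frac{13}{26} \cdot 1 \right)^2 - \frac{1}{26} \cdot \frac{1}{4 \cdot 4} \right] = 0.4183$$
$$Q_{ds}(\text{tree}) = 2 \left[ \frac{12}{26} \cdot \frac{2}{7} - \left( \frac{13}{26} \cdot \frac{2}{7} \right)^2 - \frac{1}{26} \cdot \frac{1}{7 \cdot 7} \right] = 0.2214$$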
5-clique Example
                 Modularity (Q)   Split Penalty (SP)   Qs = Q – SP   Qds
30 communities   0.8758           0.09091              0.7848        0.8721
15 communities   0.8879           0.04545              0.8424        0.4305
∆Qs = (0.8424 - 0.7848) = 0.0576 > ∆Q = (0.8879 - 0.8758) = 0.0121
Thanks!
Q & A
Example of Two Weakly Connected Communities
                 Modularity (Q)   Split Penalty (SP)   Qs = Q – SP   Qds
2 communities    0.309            0.25                 0.0586        0.264
1 community      -0.00586         0.125                -0.131        0.202