The overlapping community structure of complex networks.

Date post: 31-Mar-2015
Upload: kaia-pote
Transcript
Page 1: The overlapping community structure of complex networks.

The overlapping community structure of complex networks

Page 2: The overlapping community structure of complex networks.

• Networks and complex systems

• The structure of networks

• Finding communities

• Divisive and agglomerative methods

• Network construction in examples

• Statistical features

• The importance of observing networks

Introduction

Page 3: The overlapping community structure of complex networks.

1. Networks and complex systems

Purpose:
• understand the structural and fundamental properties
• description of the global organization: coexistence of structural subunits (communities)
• distribution and clustering properties of the local structural units
• global features

Communities: larger units in the network; groups of vertices more densely connected to each other than to the rest of the network

Page 4: The overlapping community structure of complex networks.

Examples

A person belongs to several communities at once: the scientific community, the family, schoolmates, and the people connected through a hobby

Page 5: The overlapping community structure of complex networks.

such blocks:

• in the industrial sectors

• functionally related proteins

• word association communities

(next illustration)

Page 6: The overlapping community structure of complex networks.

The communities of the word:

‘bright’

Page 7: The overlapping community structure of complex networks.

Problems with the identification of communities

• many different kinds of methods exist, but they usually do not allow for overlapping communities

They divide the network into smaller pieces. However, overlaps are important.

Page 8: The overlapping community structure of complex networks.

Nested and overlapping structure of the communities

Page 9: The overlapping community structure of complex networks.

Divisive and agglomerative methods

fail to identify the communities when overlaps are significant

Page 10: The overlapping community structure of complex networks.

• We would like to discuss an approach to analysing the main statistical features

→ we need new characteristic quantities

• Introduce a technique for exploring overlapping communities on a large scale

Page 11: The overlapping community structure of complex networks.

2. The structure of networks

• Clusters/communities:

Those parts of the network in which the nodes are more highly connected to each other than to the rest of the network.

• Membership number: $m_i$

number of communities that node i belongs to

• Overlap size between communities α and β: $s^{ov}_{\alpha,\beta}$

the number of nodes that communities α and β share

Page 12: The overlapping community structure of complex networks.

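• Community degree: $d^{com}_{\alpha}$

the number of links community α has in the network of communities, where each link stands for an overlap with another community

• Size of community α: $s^{com}_{\alpha}$

the number of nodes in community α

We would like to examine the distribution of these quantities:

$m \to P(m)$
$s^{ov} \to P(s^{ov})$
$d^{com} \to P(d^{com})$
$s^{com} \to P(s^{com})$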
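The four quantities defined above can be read off directly from a list of communities. The following is a minimal sketch, assuming each community is given as a set of node labels; the function name `community_statistics` and the toy input are illustrative only and do not come from the slides.

```python
from itertools import combinations


def community_statistics(communities):
    """Return membership numbers, overlap sizes, community degrees and sizes.

    `communities` is assumed to be a list of node sets (hypothetical input format).
    """
    communities = [set(c) for c in communities]

    # Membership number m_i: how many communities contain node i.
    membership = {}
    for com in communities:
        for node in com:
            membership[node] = membership.get(node, 0) + 1

    # Overlap size s^ov: number of shared nodes, stored for overlapping pairs only.
    overlap = {}
    for (a, com_a), (b, com_b) in combinations(enumerate(communities), 2):
        shared = len(com_a & com_b)
        if shared > 0:
            overlap[(a, b)] = shared

    # Community degree d^com: number of other communities a community overlaps with.
    degree = [0] * len(communities)
    for a, b in overlap:
        degree[a] += 1
        degree[b] += 1

    # Community size s^com: number of nodes.
    sizes = [len(com) for com in communities]
    return membership, overlap, degree, sizes


# Toy example (made-up data): node 4 belongs to all three communities.
toy = [{1, 2, 3, 4}, {4, 5, 6}, {3, 4, 7, 8}]
membership, overlap, degree, sizes = community_statistics(toy)
```

The distributions P(m), P(sov), P(dcom) and P(scom) are then histograms over the values returned here.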

Page 13: The overlapping community structure of complex networks.

• k-clique: complete subgraph of size k

• k-clique community:

union of all k-cliques that can be reached from each other through a series of adjacent k-cliques

→ two k-cliques are adjacent if they share k-1 nodes

(Figure: 3-cliques and 3-clique percolation clusters)
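The definition can be turned into code almost literally. The sketch below is a brute-force illustration for small graphs only, not the method described later in the talk: it enumerates every k-clique, links cliques that share k-1 nodes, and merges linked cliques with a union-find structure. The function name and the toy edge list are assumptions made for the example.

```python
from itertools import combinations


def k_clique_communities(edges, k):
    """Brute-force k-clique communities of a small undirected graph."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)

    # Enumerate all k-cliques: node subsets of size k whose pairs are all linked.
    cliques = [frozenset(c) for c in combinations(sorted(adj), k)
               if all(b in adj[a] for a, b in combinations(c, 2))]

    # Union-find over cliques; adjacent cliques share exactly k-1 nodes.
    parent = list(range(len(cliques)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i, j in combinations(range(len(cliques)), 2):
        if len(cliques[i] & cliques[j]) == k - 1:
            parent[find(i)] = find(j)

    # A community is the union of the nodes of all cliques in one component.
    groups = {}
    for i, clique in enumerate(cliques):
        groups.setdefault(find(i), set()).update(clique)
    return sorted(groups.values(), key=sorted)


# Toy graph: triangles 1-2-3 and 2-3-4 are adjacent; triangle 4-5-6 only touches node 4.
toy_edges = [(1, 2), (1, 3), (2, 3), (2, 4), (3, 4), (4, 5), (4, 6), (5, 6)]
toy_communities = k_clique_communities(toy_edges, 3)
```

In the toy graph node 4 ends up in both communities, which is exactly the kind of overlap the definition permits.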

Page 14: The overlapping community structure of complex networks.

Overlapping k-clique communities (k=4)

Overlaps:
• yellow-blue: 1 node
• yellow-green: 2 nodes and 1 link

Page 15: The overlapping community structure of complex networks.

3. Finding communities

Requirements:

• The method of identification should:

– not be too restrictive
– be based on the density of links
– be local
– not allow any cut-node or cut-link inside a community
– allow overlaps

Page 16: The overlapping community structure of complex networks.

• Algorithm: we use an algorithm that is exponential in the worst case
→ on the networks studied it proved to be more efficient than polynomial algorithms

Procedure:
1. Locate all cliques of the network
2. Identify the communities by carrying out a standard component analysis of the clique-clique overlap matrix

We use the method for binary networks: undirected, unweighted links

Arbitrary networks can always be transformed to binary ones:
• ignore any directionality
• keep only those links that are stronger than a threshold w*
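The transformation to a binary network can be sketched as follows. This is an illustration under stated assumptions: links are given as `(source, target, weight)` triples, and the weights of the two opposite directions are summed when directionality is dropped (the rule used later in the talk for the word-association data; other data sets may call for a different rule). The function name `binarize` is hypothetical.

```python
def binarize(weighted_directed_links, w_star):
    """Return the set of undirected, unweighted links stronger than w_star."""
    undirected = {}
    for source, target, weight in weighted_directed_links:
        if source == target:
            continue  # self-loops carry no community information
        key = frozenset((source, target))  # ignore direction
        # Assumption: opposite directions are merged by summing their weights.
        undirected[key] = undirected.get(key, 0.0) + weight
    # Keep only links strictly stronger than the threshold w*.
    return {key for key, weight in undirected.items() if weight > w_star}


# Toy data (made up): a<->b survives only because both directions add up.
toy_links = [("a", "b", 0.02), ("b", "a", 0.01), ("a", "c", 0.01), ("c", "d", 0.5)]
binary_links = binarize(toy_links, w_star=0.025)
```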

Page 17: The overlapping community structure of complex networks.

Strategy:

• according to experience, in real networks the typical size of the complete subgraphs is between 10 and 100

→ a complete subgraph of size s contains $\binom{s}{k}$ different k-cliques

→ locating the k-cliques individually and examining the adjacency between them would be extremely slow

Do not look for k-cliques; rather

1. locate the large complete subgraphs

2. look for the k-clique-connected subsets of given k by studying the overlap between them
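To get a feeling for why enumerating k-cliques individually is slow, the number of k-cliques inside one complete subgraph of size s is the binomial coefficient "s choose k". A short check using Python's standard `math.comb`; the helper name is illustrative only.

```python
from math import comb


def contained_k_cliques(s, k):
    """Number of distinct k-cliques inside one complete subgraph of size s."""
    return comb(s, k)


# Sizes taken from the range quoted on the slide (10 to 100), with k = 4.
small = contained_k_cliques(10, 4)   # 210
large = contained_k_cliques(100, 4)  # 3921225
```

A single complete subgraph of size 100 therefore already hides millions of 4-cliques, which motivates working with the large complete subgraphs themselves.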

Page 18: The overlapping community structure of complex networks.

Method:

1. Extract all complete subgraphs (cliques):

• Cliques have to be located in decreasing order of their size (first of all, the largest clique size has to be determined):
– start with this size
– repeatedly choose a node
– extract every clique of this size containing that node
– delete the node and its edges (so the same clique will not be found multiple times)
– when no nodes are left, the clique size is decreased by one

• Finding the cliques of size s that contain node v:
– construct set A: nodes all linked to each other; it initially contains only v and is then enlarged until it reaches size s
– construct set B: the set of nodes that are linked to every node in A but not necessarily to the other nodes in B; it initially consists of the neighbours of v
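The A/B construction can be sketched as a small recursive search. This is a minimal illustration of that single step, assuming the graph is stored as a dictionary mapping each node to the set of its neighbours; it does not include the outer loop over decreasing clique sizes or the node deletion, and the function name `cliques_of_size` is hypothetical.

```python
def cliques_of_size(adj, v, s):
    """Return every clique of size s that contains node v.

    adj: dict mapping node -> set of neighbouring nodes (assumed input format).
    """
    found = []

    def extend(A, B):
        # A: nodes all linked to each other; B: nodes linked to every node in A.
        if len(A) == s:
            found.append(frozenset(A))
            return
        if len(A) + len(B) < s:
            return  # not enough candidates left for A to reach size s
        candidates = sorted(B)
        for i, u in enumerate(candidates):
            # Move u from B to A; only later candidates that are also linked
            # to u stay in B, so each clique is generated exactly once.
            remaining = set(candidates[i + 1:]) & adj[u]
            extend(A | {u}, remaining)

    extend({v}, set(adj[v]))
    return found


# Toy graph (made up): complete graph on 1..4 plus node 5 attached to node 4.
toy_adj = {
    1: {2, 3, 4},
    2: {1, 3, 4},
    3: {1, 2, 4},
    4: {1, 2, 3, 5},
    5: {4},
}
triangles_with_1 = cliques_of_size(toy_adj, 1, 3)
```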

Page 19: The overlapping community structure of complex networks.

2. Prepare the clique-clique overlap matrix:

(symmetric)

Diagonal elements → size of the clique

Off-diagonal elements → the number of common nodes
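A clique-clique overlap matrix of this form can be built in a few lines. The sketch assumes the cliques are already available as node sets; the function name and toy cliques are illustrative.

```python
def clique_overlap_matrix(cliques):
    """Symmetric matrix: entry (i, j) is the number of nodes shared by cliques i and j.

    The diagonal entry (i, i) is therefore the size of clique i.
    """
    cliques = [set(c) for c in cliques]
    n = len(cliques)
    return [[len(cliques[i] & cliques[j]) for j in range(n)] for i in range(n)]


# Toy cliques (made up for the example).
toy_cliques = [{1, 2, 3, 4}, {2, 3, 4, 5}, {5, 6, 7}]
toy_matrix = clique_overlap_matrix(toy_cliques)
```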

Page 20: The overlapping community structure of complex networks.

k-clique communities: adjacent cliques share at least k-1 nodes

→ we have to erase every off-diagonal entry smaller than k-1

→ erase every diagonal element smaller than k

→ replace the remaining elements by 1

→ component analysis of this matrix
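The thresholding and component analysis can be sketched as below. This is an illustration under the assumption that the input is the list of cliques found in the previous step, given as node sets; the overlap matrix is recomputed inside the function so the example stands on its own, and the function name is hypothetical.

```python
def communities_from_cliques(cliques, k):
    """k-clique communities obtained from the thresholded clique-clique overlap matrix."""
    cliques = [set(c) for c in cliques]
    n = len(cliques)
    overlap = [[len(cliques[i] & cliques[j]) for j in range(n)] for i in range(n)]

    # Erase diagonal entries smaller than k: such cliques contain no k-clique.
    keep = [overlap[i][i] >= k for i in range(n)]

    def linked(i, j):
        # Off-diagonal entries smaller than k-1 are erased, the rest become 1.
        return i != j and keep[i] and keep[j] and overlap[i][j] >= k - 1

    # Component analysis of the resulting binary matrix (depth-first search).
    seen = set()
    communities = []
    for start in range(n):
        if not keep[start] or start in seen:
            continue
        seen.add(start)
        stack = [start]
        members = set()
        while stack:
            i = stack.pop()
            members |= cliques[i]
            for j in range(n):
                if linked(i, j) and j not in seen:
                    seen.add(j)
                    stack.append(j)
        communities.append(members)
    return communities


# Toy cliques (made up): the first two share three nodes, the third shares one.
toy_cliques = [{1, 2, 3, 4}, {2, 3, 4, 5}, {5, 6, 7}]
communities_k4 = communities_from_cliques(toy_cliques, 4)
communities_k3 = communities_from_cliques(toy_cliques, 3)
```

For k=4 the size-3 clique is erased and the other two merge; for k=3 it survives as a separate community that overlaps the first one in node 5.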

Page 21: The overlapping community structure of complex networks.

Efficiency:

• CPU time depends very strongly on the structure of the input data

• Plotting the CPU time (t) against the number of edges (M) gives the

fit: $t = A\,M^{B\ln M}$

(A, B: fitting parameters)
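Taking logarithms turns this fit into a straight line: ln t = ln A + B (ln M)^2, so A and B can be estimated by ordinary least squares on the transformed data. The sketch below illustrates that transformation only; the timing data are synthetic, generated from made-up parameter values, and are not measurements from the talk.

```python
import math


def fit_runtime(edge_counts, times):
    """Least-squares estimate of A and B in t = A * M**(B * ln M)."""
    xs = [math.log(m) ** 2 for m in edge_counts]  # x = (ln M)^2
    ys = [math.log(t) for t in times]             # y = ln t
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
             / sum((x - mean_x) ** 2 for x in xs))
    intercept = mean_y - slope * mean_x
    return math.exp(intercept), slope  # A, B


# Synthetic data from hypothetical parameters, used only to check the fit.
A_TRUE, B_TRUE = 0.002, 0.1
edge_counts = [100, 1000, 10000, 100000]
times = [A_TRUE * m ** (B_TRUE * math.log(m)) for m in edge_counts]
A_fit, B_fit = fit_runtime(edge_counts, times)
```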

Page 22: The overlapping community structure of complex networks.

Further examples for local community structure:

The four communities of the word ‘gold’: k=4

w*=0.025

Page 23: The overlapping community structure of complex networks.

Communities of the word ‘day’: k=4

w*=0.025

Page 24: The overlapping community structure of complex networks.

Communities of the word ‘play’: k=4

w*=0.025

Page 25: The overlapping community structure of complex networks.

Community structure around a particular node: we should scan through some ranges of k and w*

Examples:

1. Social network of scientific collaborators

2. The communities of the word ‘bright’ in the South Florida Free Association norms list

3. The communities of the protein Zds1 in the DIP core list of the protein-protein interaction of Saccharomyces cerevisiae

Page 26: The overlapping community structure of complex networks.

Social network of scientific

collaborators

k=4, w*=0.75

Page 27: The overlapping community structure of complex networks.

The communities of the word:

‘bright’

k=4, w*=0.025

Page 28: The overlapping community structure of complex networks.

The molecular-biological network of protein-protein interactions

k=4, w*=0.75

Page 29: The overlapping community structure of complex networks.

We try to find the communities of proteins based on their interactions

Most proteins can be associated with
• protein complexes
• certain functions

For some proteins no function is yet available
→ appearing as part of a community can be a prediction of their function

Example: protein Ycr072c (essential for the viability of the cell)
• there is no biological function yet available
• the most important biological process for this community: ‘ribosome biogenesis/assembly’

→ our protein is likely to be involved in this process

Page 30: The overlapping community structure of complex networks.

Network of the protein-protein

interactions of S. cerevisiae

(k=4)

Page 31: The overlapping community structure of complex networks.

Divisive and agglomerative methods

Divisive methods:

cut the network into smaller and smaller pieces
– each node is forced to remain in only one community and becomes separated from its other communities

→ these other communities usually fall apart and disappear

example: ‘bright’ → stays together with the words connected to ‘light’

→ most of its other communities disintegrate

Agglomerative methods:
• do the same in the reverse direction
• lead to a tree-like hierarchical rendering of the communities

Page 32: The overlapping community structure of complex networks.

The construction of the networks mentioned above

1. Co-authorship: each article contributes $\frac{1}{n-1}$ to the weight of the link between every pair of its n authors

2. South Florida Free Association norms list:
→ the weight of a directed link from one word to another indicates the frequency with which the people in the survey associated the end point of the link with its starting point
→ the directed links are replaced with undirected ones
→ weight: equal to the sum of the weights of the corresponding two oppositely directed links

3. DIP (Database of Interacting Proteins) core list of the protein-protein interactions of Saccharomyces cerevisiae: each interaction represents an unweighted link between the interacting proteins
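The co-authorship weighting can be illustrated with a short sketch, assuming each article adds 1/(n-1) to the link between every pair of its n authors and that articles are given as lists of author names. The function name and the two toy articles are made up for the example.

```python
from itertools import combinations


def coauthorship_weights(articles):
    """Link weights of a co-authorship network.

    Assumption: an article with n authors adds 1/(n-1) to every author pair.
    """
    weights = {}
    for authors in articles:
        authors = sorted(set(authors))
        n = len(authors)
        if n < 2:
            continue  # single-author articles create no links
        for pair in combinations(authors, 2):
            weights[pair] = weights.get(pair, 0.0) + 1.0 / (n - 1)
    return weights


# Toy input (made up): one two-author and one three-author article.
toy_articles = [["A", "B"], ["A", "B", "C"]]
toy_weights = coauthorship_weights(toy_articles)
```

Here the pair (A, B) collects 1 from the first article and 1/2 from the second, while the pairs involving C receive 1/2 each.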

Page 33: The overlapping community structure of complex networks.

4. Statistical features

Values of k, w*:

Purpose: we would like to analyse the statistical properties of the community structure of the entire network

→ finding a community structure that is as highly structured as possible

This leads us to the percolation phenomenon:

If the number of links is increased above a critical point, a giant component appears.

Page 34: The overlapping community structure of complex networks.

Approach the critical point!

• for each value of k (typically 3-6) we lower the threshold w* until the largest community becomes twice as big as the second largest one

→ find as many communities as possible, but
– no giant community that smears out the details of the community structure by merging many smaller communities

• f*: the fraction of links stronger than w*

– use those k values for which f* is not too small (that is, not smaller than 0.5)

• co-authorship: k=6 f*= 0.93

• protein interaction network: k=5 f*= 0.75

• word-association: k=4 f*= 0.67
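The threshold scan can be sketched as follows. This is one possible reading of the stopping rule, not the authors' code: thresholds are tried from high to low, and the lowest one at which the largest community is still at most twice the second largest is returned. The callable `community_sizes_at` is a hypothetical stand-in for a full community search at a given w*, and all numbers in the toy example are made up.

```python
def fraction_stronger(weights, w_star):
    """f*: fraction of links whose weight is larger than w_star."""
    weights = list(weights)
    return sum(1 for w in weights if w > w_star) / len(weights)


def choose_threshold(candidates, community_sizes_at):
    """Lower w* step by step and stop before a giant community appears.

    community_sizes_at: hypothetical callable returning the community sizes
    found at a given threshold.
    """
    chosen = None
    for w_star in sorted(candidates, reverse=True):
        sizes = sorted(community_sizes_at(w_star), reverse=True)
        if len(sizes) >= 2 and sizes[0] > 2 * sizes[1]:
            break  # largest community is more than twice the second largest
        chosen = w_star
    return chosen


# Toy lookup table standing in for real community searches (made-up sizes).
toy_sizes = {0.9: [5, 4], 0.5: [8, 5], 0.3: [12, 6], 0.1: [40, 6]}
w_star = choose_threshold(toy_sizes, toy_sizes.__getitem__)
f_star = fraction_stronger([0.1, 0.2, 0.4, 0.8], w_star)
```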

Page 35: The overlapping community structure of complex networks.

Statistics of the k-clique communities

• Cumulative distribution function of the community size: power law

$P(s^{com}) \sim (s^{com})^{-\tau}$

– the exponent $-\tau$ ranges between -1 and -1.6

– valid over nearly the entire range of community size
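A cumulative distribution of this kind, and a rough estimate of its exponent, can be computed as sketched below. The straight-line fit on log-log axes is a simple, commonly used but crude estimator chosen here for brevity; the slides do not say how the exponent was obtained. Function names and the toy sizes are illustrative.

```python
import math


def cumulative_distribution(values):
    """P(x): fraction of observations that are greater than or equal to x."""
    n = len(values)
    return {x: sum(1 for v in values if v >= x) / n for x in sorted(set(values))}


def loglog_slope(distribution):
    """Least-squares slope of ln P(x) against ln x."""
    points = [(math.log(x), math.log(p)) for x, p in distribution.items()]
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    return (sum((x - mean_x) * (y - mean_y) for x, y in points)
            / sum((x - mean_x) ** 2 for x, _ in points))


# Toy community sizes (made up) that follow P(s) = 1/s exactly.
toy_sizes = [1, 1, 2, 4]
toy_distribution = cumulative_distribution(toy_sizes)
tau = -loglog_slope(toy_distribution)
```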

Page 36: The overlapping community structure of complex networks.

• The cumulative distribution of the community degree:

– starts exponentially, then crosses over to a power law

• exponential decay:

$P(d^{com}) \sim \exp\left(-d^{com}/d^{com}_0\right)$

most of the communities have a size of the order of k, and their distribution dominates this part of the curve

→ a characteristic scale appears: $d^{com}_0 \approx k\delta$

• power-law tail: $P(d^{com}) \sim (d^{com})^{-\tau}$

on average each node of a community contributes δ to the community degree

→ this power-law tail is proportional to that of the community size distribution

Page 37: The overlapping community structure of complex networks.
Page 38: The overlapping community structure of complex networks.

• The cumulative distribution of the overlap size:

– close to a power law

– large exponent

– there is no characteristic overlap size in the network

• The cumulative distribution of the membership number: P(m)

– a node can belong to several communities

– collaboration, word-association:
  • no characteristic value
  • the data are close to a power-law dependence with a large exponent

– protein-protein interaction: the largest membership number is only 4

(consistent with the similarly short distribution of its community degree)

Page 39: The overlapping community structure of complex networks.
Page 40: The overlapping community structure of complex networks.

From statistical features:

• two communities overlapping with a given community are likely to overlap with each other as well (the average clustering coefficient is high)

• Specific scaling of $P(d^{com})$: the signature of the hierarchical nature of the system (the network of the communities still exhibits a degree distribution with a fat tail, but a characteristic scale appears below which the distribution is exponential)

Complex systems have different levels of organization with units specific to each level
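The clustering statement refers to the network whose nodes are communities and whose links are overlaps. A minimal sketch of how its average clustering coefficient could be computed, assuming communities are given as node sets; the function names and toy communities are illustrative.

```python
from itertools import combinations


def community_network(communities):
    """Adjacency of the community network: two communities are linked if they overlap."""
    communities = [set(c) for c in communities]
    adj = {i: set() for i in range(len(communities))}
    for i, j in combinations(range(len(communities)), 2):
        if communities[i] & communities[j]:
            adj[i].add(j)
            adj[j].add(i)
    return adj


def average_clustering(adj):
    """Mean over nodes of (links among neighbours) / (possible links among neighbours)."""
    total = 0.0
    for neighbours in adj.values():
        d = len(neighbours)
        if d < 2:
            continue  # nodes with fewer than two neighbours contribute zero
        links = sum(1 for a, b in combinations(neighbours, 2) if b in adj[a])
        total += 2.0 * links / (d * (d - 1))
    return total / len(adj)


# Toy communities (made up): the first three overlap pairwise, the fourth only with the third.
toy = [{1, 2, 3, 4}, {4, 5, 6}, {3, 4, 7, 8}, {8, 9}]
toy_adj = community_network(toy)
toy_clustering = average_clustering(toy_adj)
```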

Page 41: The overlapping community structure of complex networks.

5. The importance of observing networks

• Community structure

→ prediction of some essential features of the system
→ possibility to ‘zoom in’ on a unit and uncover its communities
→ interpret the local organization of large networks
→ predict how the modular structure changes if a unit is removed

We can simultaneously look at the network at a higher level of organization and locate the communities.

