Data Mining and Machine Learning:
Fundamental Concepts and Algorithms
dataminingbook.info

Mohammed J. Zaki (Department of Computer Science, Rensselaer Polytechnic Institute, Troy, NY, USA)
Wagner Meira Jr. (Department of Computer Science, Universidade Federal de Minas Gerais, Belo Horizonte, Brazil)
Chapter 4: Graph Data
Graphs
A graph G = (V, E) comprises a finite nonempty set V of vertices or nodes, and a set E ⊆ V × V of edges consisting of unordered pairs of vertices.

The number of nodes in the graph G, given as |V| = n, is called the order of the graph, and the number of edges, given as |E| = m, is called the size of G.

A directed graph or digraph has an edge set E consisting of ordered pairs of vertices.

A weighted graph consists of a graph together with a weight w_ij for each edge (v_i, v_j) ∈ E.

A graph H = (V_H, E_H) is called a subgraph of G = (V, E) if V_H ⊆ V and E_H ⊆ E.
Undirected and Directed Graphs
[Figure: an example undirected graph and a directed graph, both on vertices v1, ..., v8.]
Degree Distribution
The degree of a node v_i ∈ V is the number of edges incident with it, and is denoted as d(v_i) or just d_i.

The degree sequence of a graph is the list of the degrees of the nodes sorted in non-increasing order.

Let N_k denote the number of vertices with degree k. The degree frequency distribution of a graph is given as

$$(N_0, N_1, \ldots, N_t)$$

where t is the maximum degree for a node in G.

Let X be a random variable denoting the degree of a node. The degree distribution of a graph gives the probability mass function f for X, given as

$$\bigl(f(0), f(1), \ldots, f(t)\bigr)$$

where $f(k) = P(X = k) = \frac{N_k}{n}$ is the probability of a node with degree k.
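These quantities are easy to compute directly from an edge list. A minimal Python/NumPy sketch, using a small hypothetical edge list (any labeling of nodes as 0, ..., n−1 works):

```python
import numpy as np

# Degree distribution from an edge list (hypothetical 8-node example;
# nodes are labeled 0, ..., n-1).
edges = [(0, 1), (0, 2), (0, 3), (1, 3), (2, 3), (3, 4), (4, 5), (5, 6)]
n = 8

degree = np.zeros(n, dtype=int)
for u, v in edges:
    degree[u] += 1
    degree[v] += 1

t = degree.max()                           # maximum degree t
N = np.bincount(degree, minlength=t + 1)   # N[k] = no. of nodes with degree k
f = N / n                                  # f(k) = P(X = k) = N_k / n

print("degree sequence:", sorted(degree, reverse=True))
print("degree frequency distribution:", N)
print("degree distribution:", f)
```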
Degree Distribution
[Figure: the example undirected graph on vertices v1, ..., v8.]

The degree sequence of the graph is

$$(4, 4, 4, 3, 2, 2, 2, 1)$$

Its degree frequency distribution is

$$(N_0, N_1, N_2, N_3, N_4) = (0, 1, 3, 1, 3)$$

The degree distribution is given as

$$\bigl(f(0), f(1), f(2), f(3), f(4)\bigr) = (0, 0.125, 0.375, 0.125, 0.375)$$
Path, Distance and Connectedness
A walk in a graph G between nodes x and y is an ordered sequence of vertices, starting at x and ending at y,

$$x = v_0, v_1, \ldots, v_{t-1}, v_t = y$$

such that there is an edge between every pair of consecutive vertices, that is, (v_{i−1}, v_i) ∈ E for all i = 1, 2, ..., t. The length of the walk, t, is measured in terms of hops – the number of edges along the walk.

A path is a walk with distinct vertices (with the exception of the start and end vertices). A path of minimum length between nodes x and y is called a shortest path, and the length of the shortest path is called the distance between x and y, denoted as d(x, y). If no path exists between the two nodes, the distance is assumed to be d(x, y) = ∞.

Two nodes v_i and v_j are connected if there exists a path between them. A graph is connected if there is a path between all pairs of vertices. A connected component, or just component, of a graph is a maximal connected subgraph.

A directed graph is strongly connected if there is a (directed) path between all ordered pairs of vertices. It is weakly connected if there exists a path between node pairs only by considering edges as undirected.
Adjacency Matrix
A graph G = (V, E), with |V| = n vertices, can be represented as an n × n, symmetric, binary adjacency matrix A, defined as

$$A(i, j) = \begin{cases} 1 & \text{if } v_i \text{ is adjacent to } v_j \\ 0 & \text{otherwise} \end{cases}$$

If the graph is directed, then the adjacency matrix A is in general not symmetric.

If the graph is weighted, then we obtain an n × n weighted adjacency matrix A, defined as

$$A(i, j) = \begin{cases} w_{ij} & \text{if } v_i \text{ is adjacent to } v_j \\ 0 & \text{otherwise} \end{cases}$$

where w_ij is the weight on edge (v_i, v_j) ∈ E.
Graphs from Data Matrix
Many datasets that are not in the form of a graph can still be converted into one.

Let D = {x_i}_{i=1}^n, with x_i ∈ R^d, be a dataset. Define a weighted graph G = (V, E), with edge weight

$$w_{ij} = sim(x_i, x_j)$$

where sim(x_i, x_j) denotes the similarity between points x_i and x_j.

For instance, using the Gaussian similarity

$$w_{ij} = sim(x_i, x_j) = \exp\left\{ -\frac{\|x_i - x_j\|^2}{2\sigma^2} \right\}$$

where σ is the spread parameter.
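As an illustration, here is a minimal NumPy sketch of this construction; the `similarity_graph` helper and the toy data matrix are assumptions, with σ and the edge threshold chosen to mirror the Iris graph on the next slide:

```python
import numpy as np

def similarity_graph(X, sigma=1.0, threshold=0.0):
    """Weighted adjacency matrix W under Gaussian similarity; an edge
    (i, j) exists iff w_ij >= threshold."""
    diff = X[:, None, :] - X[None, :, :]           # pairwise differences
    W = np.exp(-(diff ** 2).sum(axis=-1) / (2 * sigma ** 2))
    np.fill_diagonal(W, 0.0)                       # no self-loops
    W[W < threshold] = 0.0                         # drop weak edges
    return W

# Toy data matrix standing in for Iris; same parameters as the next slide.
X = np.random.default_rng(0).normal(size=(20, 4))
W = similarity_graph(X, sigma=1 / np.sqrt(2), threshold=0.777)
print("order n =", X.shape[0], "; size m =", int((W > 0).sum()) // 2)
```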
Iris Similarity Graph: Gaussian Similarity
σ = 1/√2; edge exists iff w_ij ≥ 0.777
order: |V| = n = 150; size: |E| = m = 753
[Figure: the Iris similarity graph; the three marker styles correspond to the three Iris classes.]
Topological Graph Attributes
Graph attributes are local if they apply to only a single node (or an edge), and global if they refer to the entire graph.

Degree: The degree of a node v_i ∈ G is defined as

$$d_i = \sum_j A(i, j)$$

The corresponding global attribute for the entire graph G is the average degree:

$$\mu_d = \frac{\sum_i d_i}{n}$$

Average Path Length: The average path length is given as

$$\mu_L = \frac{\sum_i \sum_{j>i} d(v_i, v_j)}{\binom{n}{2}} = \frac{2}{n(n-1)} \sum_i \sum_{j>i} d(v_i, v_j)$$
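For unweighted graphs the distances d(v_i, v_j) are hop counts, so one breadth-first search per node suffices. A sketch, assuming the graph is stored as a dict mapping each node to its set of neighbors (the `bfs_distances` helper is reused by the later sketches):

```python
from collections import deque

def bfs_distances(adj, src):
    """Hop distances from src to every reachable node; adj maps each
    node to the set of its neighbors."""
    dist = {src: 0}
    q = deque([src])
    while q:
        u = q.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                q.append(v)
    return dist

def average_path_length(adj):
    """mu_L = 2 / (n (n-1)) * sum_{i<j} d(v_i, v_j), for a connected graph."""
    n = len(adj)
    # summing over all ordered pairs counts each unordered pair twice
    total = sum(sum(bfs_distances(adj, u).values()) for u in adj)
    return total / (n * (n - 1))
```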
Iris Graph: Degree Distribution
[Figure: degree distribution of the Iris graph; x-axis: degree k (1 to 43), y-axis: f(k).]
Iris Graph: Path Length Histogram
[Figure: path length histogram for the Iris graph; the frequencies for path lengths k = 1, ..., 11 are 753, 1044, 831, 668, 529, 330, 240, 146, 90, 30, 12.]
Eccentricity, Radius and Diameter
The eccentricity of a node v_i is the maximum distance from v_i to any other node in the graph:

$$e(v_i) = \max_j \{ d(v_i, v_j) \}$$

The radius of a connected graph, denoted r(G), is the minimum eccentricity of any node in the graph:

$$r(G) = \min_i \{ e(v_i) \} = \min_i \{ \max_j \{ d(v_i, v_j) \} \}$$

The diameter, denoted d(G), is the maximum eccentricity of any vertex in the graph:

$$d(G) = \max_i \{ e(v_i) \} = \max_{i,j} \{ d(v_i, v_j) \}$$

For a disconnected graph, these values are computed over the connected components of the graph.

The diameter of a graph G is sensitive to outliers. The effective diameter is more robust; it is defined as the minimum number of hops for which a large fraction, typically 90%, of all connected pairs of nodes can reach each other.
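With the BFS helper from the earlier sketch, eccentricity, radius, and diameter take only a few lines (a connected graph is assumed):

```python
def eccentricity(adj):
    """e(v) = max_j d(v, v_j); one BFS per node, connected graph assumed."""
    return {v: max(bfs_distances(adj, v).values()) for v in adj}

def radius_and_diameter(adj):
    """r(G) = min_v e(v) and d(G) = max_v e(v)."""
    ecc = eccentricity(adj)
    return min(ecc.values()), max(ecc.values())
```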
Clustering Coefficient
The clustering coefficient of a node v_i is a measure of the density of edges in the neighborhood of v_i.

Let G_i = (V_i, E_i) be the subgraph induced by the neighbors of vertex v_i. Note that v_i ∉ V_i, as we assume that G is simple.

Let |V_i| = n_i be the number of neighbors of v_i, and |E_i| = m_i be the number of edges among the neighbors of v_i. The clustering coefficient of v_i is defined as

$$C(v_i) = \frac{\text{no. of edges in } G_i}{\text{maximum number of edges in } G_i} = \frac{m_i}{\binom{n_i}{2}} = \frac{2 \cdot m_i}{n_i(n_i - 1)}$$

The clustering coefficient of a graph G is simply the average clustering coefficient over all the nodes, given as

$$C(G) = \frac{1}{n} \sum_i C(v_i)$$

C(v_i) is well defined only for nodes with degree d(v_i) ≥ 2; for nodes with d_i < 2 we define C(v_i) = 0.
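A direct translation into Python over the same dict-of-sets representation (a sketch; edges among neighbors are counted via membership tests):

```python
def clustering_coefficient(adj, v):
    """C(v) = 2 m_v / (n_v (n_v - 1)), with C(v) = 0 when d(v) < 2."""
    nbrs = adj[v]
    nv = len(nbrs)
    if nv < 2:
        return 0.0
    # count edges among the neighbors of v (each edge is seen twice)
    mv = sum(1 for u in nbrs for w in adj[u] if w in nbrs) // 2
    return 2 * mv / (nv * (nv - 1))

def graph_clustering_coefficient(adj):
    """C(G) = average of C(v) over all nodes."""
    return sum(clustering_coefficient(adj, v) for v in adj) / len(adj)
```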
Transitivity and Efficiency
Transitivity of the graph is defined as

$$T(G) = \frac{3 \times \text{no. of triangles in } G}{\text{no. of connected triples in } G}$$

where the subgraph composed of the edges (v_i, v_j) and (v_i, v_k) is a connected triple centered at v_i, and a connected triple centered at v_i that also includes the edge (v_j, v_k) is called a triangle (a complete subgraph of size 3).

The efficiency for a pair of nodes v_i and v_j is defined as 1/d(v_i, v_j). If v_i and v_j are not connected, then d(v_i, v_j) = ∞ and the efficiency is 1/∞ = 0. The efficiency of a graph G is the average efficiency over all pairs of nodes, whether connected or not, given as

$$\frac{2}{n(n-1)} \sum_i \sum_{j>i} \frac{1}{d(v_i, v_j)}$$
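Both measures are equally direct to compute; a sketch assuming integer (or otherwise orderable) node labels and reusing `bfs_distances` from the earlier sketch:

```python
def transitivity(adj):
    """T(G) = 3 * (no. of triangles) / (no. of connected triples)."""
    triples = sum(len(adj[v]) * (len(adj[v]) - 1) // 2 for v in adj)
    # each triangle is seen once from each of its three corners
    closed = sum(1 for v in adj for u in adj[v] for w in adj[v]
                 if u < w and w in adj[u])
    return closed / triples

def efficiency(adj):
    """Average of 1/d(u, v) over all pairs; disconnected pairs contribute 0."""
    nodes = list(adj)
    n = len(nodes)
    total = 0.0
    for i, u in enumerate(nodes):
        dist = bfs_distances(adj, u)          # helper from the earlier sketch
        total += sum(1 / dist[v] for v in nodes[i + 1:] if v in dist)
    return 2 * total / (n * (n - 1))
```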
Clustering Coefficient
[Figure: the example graph on v1, ..., v8, and the subgraph induced by the neighbors of v4, namely v1, v3, v5, v7.]

The clustering coefficient of v4 is

$$C(v_4) = \frac{2}{\binom{4}{2}} = \frac{2}{6} \approx 0.33$$

The clustering coefficient for G is

$$C(G) = \frac{1}{8}\left(\frac{1}{2} + \frac{1}{3} + 1 + \frac{1}{3} + \frac{1}{3}\right) = \frac{2.5}{8} = 0.3125$$
Centrality Analysis
A centrality is a function c : V → R that induces a ranking on V.

Degree Centrality: The simplest notion of centrality is the degree d_i of a vertex v_i – the higher the degree, the more important or central the vertex.

Eccentricity Centrality: Eccentricity centrality is defined as

$$c(v_i) = \frac{1}{e(v_i)} = \frac{1}{\max_j \{ d(v_i, v_j) \}}$$

The less eccentric a node is, the more central it is.

Closeness Centrality: Closeness centrality uses the sum of all the distances to rank how central a node is:

$$c(v_i) = \frac{1}{\sum_j d(v_i, v_j)}$$
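All three measures reduce to one BFS sweep per node. A sketch reusing `bfs_distances`, again assuming a connected graph:

```python
def centralities(adj):
    """Degree, eccentricity, and closeness centrality for every node."""
    scores = {}
    for v in adj:
        dist = bfs_distances(adj, v)           # hop distances from v
        scores[v] = {
            "degree": len(adj[v]),
            "eccentricity": 1 / max(dist.values()),
            "closeness": 1 / sum(dist.values()),
        }
    return scores
```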
Betweenness Centrality
The betweenness centrality measures how many shortest paths between all pairs of vertices include v_i. It gives an indication as to the central "monitoring" role played by v_i for various pairs of nodes.

Let η_jk denote the number of shortest paths between vertices v_j and v_k, and let η_jk(v_i) denote the number of such paths that include or contain v_i.

The fraction of paths through v_i is denoted as

$$\gamma_{jk}(v_i) = \frac{\eta_{jk}(v_i)}{\eta_{jk}}$$

The betweenness centrality for a node v_i is defined as

$$c(v_i) = \sum_{j \ne i} \sum_{\substack{k \ne i \\ k > j}} \gamma_{jk}(v_i) = \sum_{j \ne i} \sum_{\substack{k \ne i \\ k > j}} \frac{\eta_{jk}(v_i)}{\eta_{jk}}$$
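Counting η_jk and η_jk(v_i) for all pairs is the costly part. A common approach, used here as a sketch, is Brandes' algorithm (not spelled out on the slide): one BFS per source counts shortest paths, and a backward pass accumulates the fractions; the final halving matches the sum over unordered pairs j < k.

```python
from collections import deque

def betweenness(adj):
    """Betweenness via Brandes' algorithm for unweighted graphs."""
    bc = {v: 0.0 for v in adj}
    for s in adj:
        dist, order = {s: 0}, []
        sigma = {v: 0.0 for v in adj}; sigma[s] = 1.0   # shortest-path counts
        preds = {v: [] for v in adj}
        q = deque([s])
        while q:
            u = q.popleft(); order.append(u)
            for w in adj[u]:
                if w not in dist:
                    dist[w] = dist[u] + 1
                    q.append(w)
                if dist[w] == dist[u] + 1:              # u precedes w
                    sigma[w] += sigma[u]
                    preds[w].append(u)
        delta = {v: 0.0 for v in adj}                   # pair dependencies
        for w in reversed(order):
            for u in preds[w]:
                delta[u] += sigma[u] / sigma[w] * (1 + delta[w])
            if w != s:
                bc[w] += delta[w]
    return {v: c / 2 for v, c in bc.items()}            # each pair seen twice
```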
Centrality Values
[Figure: the example graph on v1, ..., v8.]

Centrality            v1     v2     v3     v4     v5     v6     v7     v8
Degree                4      3      2      4      4      1      2      2
Eccentricity          0.5    0.33   0.33   0.33   0.5    0.25   0.25   0.33
  e(v_i)              2      3      3      3      2      4      4      3
Closeness             0.100  0.083  0.071  0.091  0.100  0.056  0.067  0.071
  Σ_j d(v_i, v_j)     10     12     14     11     10     18     15     14
Betweenness           4.5    6      0      5      6.5    0      0.83   1.17
Prestige or Eigenvector Centrality
Let p(u) be a positive real number, called the prestige score for node u. Intuitively, the more prestigious the nodes that point to a given node, the higher its prestige:

$$p(v) = \sum_u A(u, v) \cdot p(u) = \sum_u A^T(v, u) \cdot p(u)$$

Across all the nodes, we have

$$p' = A^T p$$

where p is an n-dimensional prestige vector. By recursive expansion, we see that

$$p_k = A^T p_{k-1} = (A^T)^2 p_{k-2} = \cdots = (A^T)^k p_0$$

where p_0 is the initial prestige vector. It is well known that the vector p_k converges to the dominant eigenvector of A^T.
Computing Dominant Eigenvector: Power Iteration
The dominant eigenvector of A^T and the corresponding eigenvalue can be computed using the power iteration method.

It starts with an initial vector p_0, and in each iteration it multiplies on the left by A^T, scaling the intermediate vector p_k by dividing it by its maximum entry p_k[i] to prevent numeric overflow.

The ratio of the maximum entry in iteration k to that in k − 1, given as λ = p_k[i] / p_{k−1}[i], yields an estimate for the eigenvalue.

The iterations continue until the difference between successive eigenvector estimates falls below some threshold ε > 0.

PowerIteration(A, ε):
1   k ← 0                            // iteration counter
2   p_0 ← 1 ∈ R^n                    // initial vector
3   repeat
4       k ← k + 1
5       p_k ← A^T p_{k−1}            // eigenvector estimate
6       i ← arg max_j { p_k[j] }     // index of maximum entry
7       λ ← p_k[i] / p_{k−1}[i]      // eigenvalue estimate
8       p_k ← (1 / p_k[i]) · p_k     // scale vector
9   until ‖p_k − p_{k−1}‖ ≤ ε
10  p ← (1 / ‖p_k‖) · p_k            // normalize eigenvector
11  return p, λ
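A direct NumPy transcription of the pseudocode (a sketch; on the 5-node example that follows, it converges to λ ≈ 1.466):

```python
import numpy as np

def power_iteration(A, eps=1e-6):
    """Dominant eigenvector and eigenvalue of A^T via power iteration
    with max-entry scaling, following the pseudocode above."""
    p = np.ones(A.shape[0])
    while True:
        p_next = A.T @ p
        i = np.argmax(p_next)           # index of the maximum entry
        lam = p_next[i] / p[i]          # eigenvalue estimate
        p_next = p_next / p_next[i]     # scale to prevent overflow
        if np.linalg.norm(p_next - p) <= eps:
            return p_next / np.linalg.norm(p_next), lam
        p = p_next
```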
Power Iteration for Eigenvector Centrality: Example
[Figure: a directed graph on vertices v1, ..., v5.]

$$A = \begin{pmatrix} 0&0&0&1&0 \\ 0&0&1&0&1 \\ 1&0&0&0&0 \\ 0&1&1&0&1 \\ 0&1&0&0&0 \end{pmatrix} \qquad A^T = \begin{pmatrix} 0&0&1&0&0 \\ 0&0&0&1&1 \\ 0&1&0&1&0 \\ 1&0&0&0&0 \\ 0&1&0&1&0 \end{pmatrix}$$
Power Method via Scaling
Starting from p_0 = (1, 1, 1, 1, 1)^T, each row shows the unscaled product A^T p_{k−1}, the scaled vector p_k, and the eigenvalue estimate λ:

k   A^T p_{k−1}                      p_k (scaled)               λ
1   (1, 2, 2, 1, 2)                  (0.5, 1, 1, 0.5, 1)        2
2   (1, 1.5, 1.5, 0.5, 1.5)          (0.67, 1, 1, 0.33, 1)      1.5
3   (1, 1.33, 1.33, 0.67, 1.33)      (0.75, 1, 1, 0.5, 1)       1.33
4   (1, 1.5, 1.5, 0.75, 1.5)         (0.67, 1, 1, 0.5, 1)       1.5
5   (1, 1.5, 1.5, 0.67, 1.5)         (0.67, 1, 1, 0.44, 1)      1.5
6   (1, 1.44, 1.44, 0.67, 1.44)      (0.69, 1, 1, 0.46, 1)      1.444
7   (1, 1.46, 1.46, 0.69, 1.46)      (0.68, 1, 1, 0.47, 1)      1.462
Convergence of the Ratio to Dominant Eigenvalue
[Figure: the eigenvalue estimate λ as a function of the iteration number; it converges to λ = 1.466 within about 16 iterations.]
PageRank
PageRank is based on (normalized) prestige combined with a random jump assumption. The PageRank of a node v recursively depends on the PageRank of other nodes that point to it.

Normalized Prestige: Define N as the normalized adjacency matrix

$$N(u, v) = \begin{cases} \frac{1}{od(u)} & \text{if } (u, v) \in E \\ 0 & \text{if } (u, v) \notin E \end{cases}$$

where od(u) is the out-degree of node u. Normalized prestige is given as

$$p(v) = \sum_u N^T(v, u) \cdot p(u)$$

Random Jumps: In the random surfing approach, there is a small probability of jumping from one node to any of the other nodes in the graph. The normalized adjacency matrix for a fully connected graph is

$$N_r = \frac{1}{n} \mathbf{1}_{n \times n}$$

where 1_{n×n} is the n × n matrix of all ones.
PageRank: Normalized Prestige and Random Jumps
The PageRank vector is recursively defined as

$$p' = (1-\alpha) N^T p + \alpha N_r^T p = \bigl((1-\alpha) N^T + \alpha N_r^T\bigr) p = M^T p$$

where α denotes the probability of random jumps. The solution is the dominant eigenvector of M^T, where M = (1−α)N + αN_r is the combined normalized adjacency matrix.

Sink Nodes: If od(u) = 0, then only random jumps from u are allowed. The modified M matrix replaces each row M_u as follows:

$$M_u = \begin{cases} M_u & \text{if } od(u) > 0 \\ \frac{1}{n} \mathbf{1}_n^T & \text{if } od(u) = 0 \end{cases}$$

where 1_n is the n-dimensional vector of all ones.
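A compact NumPy sketch putting the pieces together: building N, replacing sink rows, mixing in random jumps, and iterating. The α default and the L1 convergence test are assumptions:

```python
import numpy as np

def pagerank(A, alpha=0.15, eps=1e-9):
    """PageRank by power iteration on M^T, with
    M = (1 - alpha) N + alpha N_r and sink rows replaced by 1/n."""
    n = A.shape[0]
    out = A.sum(axis=1)                              # out-degrees od(u)
    N = np.where(out[:, None] > 0,
                 A / np.maximum(out, 1)[:, None],    # N(u, v) = 1/od(u)
                 1.0 / n)                            # sink row -> uniform
    M = (1 - alpha) * N + alpha / n                  # random-jump mixture
    p = np.ones(n) / n
    while True:
        p_next = M.T @ p
        p_next /= p_next.sum()                       # keep it a distribution
        if np.abs(p_next - p).sum() <= eps:
            return p_next
        p = p_next
```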
Hub and Authority Scores (HITS)
The authority score of a page is analogous to PageRank or prestige, and it depends on how many "good" pages point to it. The hub score of a page is based on how many "good" pages it points to. In other words, a page with high authority has many hub pages pointing to it, and a page with a high hub score points to many pages that have high authority.

Let a(u) be the authority score and h(u) the hub score of node u. We have

$$a(v) = \sum_u A^T(v, u) \cdot h(u) \qquad h(v) = \sum_u A(v, u) \cdot a(u)$$

In matrix notation, we obtain

$$a' = A^T h \qquad h' = A a$$

Recursively, we have

$$a_k = A^T h_{k-1} = A^T (A a_{k-1}) = (A^T A) a_{k-1}$$
$$h_k = A a_k = A (A^T h_{k-1}) = (A A^T) h_{k-1}$$

The authority score converges to the dominant eigenvector of A^T A, whereas the hub score converges to the dominant eigenvector of A A^T.
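The alternating updates are two matrix–vector products per round. A sketch with norm scaling to keep the scores bounded (the fixed iteration count is an assumption; a convergence test would also work):

```python
import numpy as np

def hits(A, iters=100):
    """Hub and authority scores via alternating updates; a converges to
    the dominant eigenvector of A^T A, and h to that of A A^T."""
    n = A.shape[0]
    a, h = np.ones(n), np.ones(n)
    for _ in range(iters):
        a = A.T @ h; a /= np.linalg.norm(a)   # authorities from hubs
        h = A @ a;  h /= np.linalg.norm(h)    # hubs from authorities
    return a, h
```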
Small World Property
Real-world graphs exhibit the small-world property, namely that there is a short path between any pair of nodes. A graph G exhibits small-world behavior if the average path length µ_L scales logarithmically with the number of nodes in the graph, that is, if

$$\mu_L \propto \log n$$

where n is the number of nodes in the graph.

A graph is said to have the ultra-small-world property if the average path length is much smaller than log n, that is, if µ_L ≪ log n.
Scale-free Property
In many real-world graphs it has been observed that the empirical degree distribution f(k) exhibits scale-free behavior, captured by a power-law relationship with k, that is, the probability that a node has degree k satisfies the condition

$$f(k) \propto k^{-\gamma}$$

Taking the logarithm on both sides gives

$$\log f(k) = \log(\alpha k^{-\gamma}) \quad\text{or}\quad \log f(k) = -\gamma \log k + \log \alpha$$

which is the equation of a straight line in the log-log plot of k versus f(k), with −γ giving the slope of the line.

A power-law relationship leads to scale-free or scale-invariant behavior because scaling the argument by some constant c does not change the proportionality.
Clustering Effect
Real-world graphs often also exhibit a clustering effect, that is, two nodes are more likely to be connected if they share a common neighbor. The clustering effect is captured by a high clustering coefficient for the graph G.

Let C(k) denote the average clustering coefficient for all nodes with degree k; then the clustering effect also manifests itself as a power-law relationship between C(k) and k:

$$C(k) \propto k^{-\gamma}$$

In other words, a log-log plot of k versus C(k) exhibits straight-line behavior with negative slope −γ.
Degree Distribution: Human Protein Interaction Network
|V| = n = 9521, |E| = m = 37060

[Figure: log-log plot of degree (log2 k) versus probability (log2 f(k)); the points follow a straight line of slope −γ = −2.15.]
Cumulative Degree Distribution
F^c(k) = 1 − F(k), where F(k) is the CDF for f(k)

[Figure: log-log plot of degree (log2 k) versus log2 F^c(k); the points follow a straight line of slope −(γ − 1) = −1.85.]
Average Clustering Coefficient
[Figure: log-log plot of degree (log2 k) versus average clustering coefficient (log2 C(k)); the points follow a straight line of slope −γ = −0.55.]
Erdős–Rényi Random Graph Model

The ER model specifies a collection of graphs G(n, m) with n nodes and m edges, such that each graph G ∈ G has equal probability of being selected:

$$P(G) = \frac{1}{\binom{M}{m}} = \binom{M}{m}^{-1}$$

where $M = \binom{n}{2} = \frac{n(n-1)}{2}$ and $\binom{M}{m}$ is the number of possible graphs with m edges (on n nodes).

Random Graph Generation: Randomly select two distinct vertices v_i, v_j ∈ V, and add an edge (v_i, v_j) to E, provided the edge is not already in the graph G. Repeat the process until exactly m edges have been added to the graph.

Let X be a random variable denoting the degree of a node for G ∈ G, and let p denote the probability of an edge in G:

$$p = \frac{m}{M} = \frac{m}{\binom{n}{2}} = \frac{2m}{n(n-1)}$$
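The generation procedure translates almost verbatim into Python; a standard-library sketch (the `seed` parameter is an assumption for reproducibility):

```python
import random

def erdos_renyi(n, m, seed=None):
    """Sample from G(n, m): add m distinct undirected edges at random."""
    rng = random.Random(seed)
    E = set()
    while len(E) < m:
        i, j = rng.sample(range(n), 2)       # two distinct vertices
        E.add((min(i, j), max(i, j)))        # canonical form, no duplicates
    return E
```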
Random Graphs: Average Degree
The degree of a node follows a binomial distribution with probability of success p, given as

$$f(k) = P(X = k) = \binom{n-1}{k} p^k (1-p)^{n-1-k}$$

since a node can be connected to n − 1 other vertices.

The average degree µ_d is then given as the expected value of X:

$$\mu_d = E[X] = (n-1)p$$

The variance of the degree is

$$\sigma_d^2 = \mathrm{var}(X) = (n-1)p(1-p)$$
Random Graphs: Degree Distribution
As n → ∞ and p → 0, the expected value and variance of X can be rewritten as

$$E[X] = (n-1)p \simeq np \quad\text{as } n \to \infty$$
$$\mathrm{var}(X) = (n-1)p(1-p) \simeq np \quad\text{as } n \to \infty \text{ and } p \to 0$$

The binomial distribution can then be approximated by a Poisson distribution with parameter λ, given as

$$f(k) = \frac{\lambda^k e^{-\lambda}}{k!}$$

where λ = np represents both the expected value and variance of the distribution.

Thus, ER random graphs do not exhibit a power-law degree distribution.
Random Graphs: Clustering Coefficient and Diameter
Clustering Coefficient: Consider a node v_i with degree k. Since p is the probability of an edge, the expected number of edges m_i among the neighbors of v_i is simply

$$m_i = \frac{p \, k(k-1)}{2}$$

The clustering coefficient is

$$C(v_i) = \frac{2 m_i}{k(k-1)} = p$$

which implies that C(G) = (1/n) Σ_i C(v_i) = p. Since for sparse graphs we have p → 0, this means that ER random graphs do not show the clustering effect.

Diameter: The expected degree of a node is µ_d = λ, so in one hop a node can reach λ nodes. Coarsely, in k hops it can reach λ^k nodes. Thus we have

$$\sum_{k=1}^{t} \lambda^k \le n, \quad\text{which implies that } t = \log_\lambda n$$

It follows that the diameter of the graph is

$$d(G) \propto \log n$$

Thus, ER random graphs are small-world.
Watts–Strogatz Small-world Graph Model
The Watts–Strogatz (WS) model starts with a regular graph of degree 2k, having n nodes arranged in a circular layout, with each node connected to its k neighbors on the right and left.

The regular graph has high local clustering. Adding a small amount of randomness leads to the emergence of the small-world phenomenon.

[Figure: the Watts–Strogatz regular graph for n = 8, k = 2, on vertices v0, ..., v7.]
WS Regular Graph: Clustering Coefficient and Diameter
The clustering coefficient of a node v is given as

$$C(v) = \frac{m_v}{M_v} = \frac{3k-3}{4k-2}$$

As k increases, the clustering coefficient approaches 3/4, because C(G) = C(v) → 3/4 as k → ∞. The WS regular graph thus has a high clustering coefficient.

The diameter of a regular WS graph is given as

$$d(G) = \begin{cases} \left\lceil \frac{n}{2k} \right\rceil & \text{if } n \text{ is even} \\ \left\lceil \frac{n-1}{2k} \right\rceil & \text{if } n \text{ is odd} \end{cases}$$

The regular graph has a diameter that scales linearly in the number of nodes, and thus it is not small-world.
Random Perturbation of Regular Graph
Edge Rewiring: For each edge (u, v) in the graph, with probability r, replace v with another randomly chosen node, avoiding loops and duplicate edges.

The WS regular graph has m = kn total edges, so after rewiring, rm of the edges are random, and (1 − r)m are regular.

Edge Shortcuts: Alternatively, add a few shortcut edges between random pairs of nodes, with r being the probability, per edge, of adding a shortcut edge.

The total number of random shortcut edges added to the network is mr = knr. The total number of edges in the graph is then m + mr = (1 + r)m = (1 + r)kn.
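A standard-library sketch of the shortcut variant (the `seed` parameter is an assumption; n > 2k is assumed so that the ring has exactly kn edges):

```python
import random

def watts_strogatz_shortcuts(n, k, r, seed=None):
    """Ring of n nodes, each joined to its k nearest neighbors on either
    side, plus about r*k*n random shortcut edges."""
    rng = random.Random(seed)
    E = {(min(i, (i + j) % n), max(i, (i + j) % n))
         for i in range(n) for j in range(1, k + 1)}
    m = len(E)                                # m = k*n regular edges
    added = 0
    while added < round(r * m):
        u, v = rng.sample(range(n), 2)
        e = (min(u, v), max(u, v))
        if e not in E:                        # avoid duplicate edges
            E.add(e)
            added += 1
    return E
```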
Watts–Strogatz Graph: Shortcut Edges
[Figure: a WS graph with shortcut edges for n = 20, k = 3.]
Properties of Watts–Strogatz Graphs
Degree Distribution: Let X be the random variable denoting the number of shortcuts incident to a node. Then the probability of a node having j shortcut edges is given as

$$f(j) = P(X = j) = \binom{n'}{j} p^j (1-p)^{n'-j}$$

with $E[X] = n'p = 2kr$ and $p = \frac{2kr}{n-2k-1} = \frac{2kr}{n'}$, where $n' = n - 2k - 1$.

The expected degree of each node in the network is therefore 2k + E[X] = 2k(1 + r). The degree distribution is not a power law.

Clustering Coefficient: The clustering coefficient is

$$C(v) \simeq \frac{3(k-1)}{(1+r)(4kr + 2(2k-1))} = \frac{3k-3}{4k-2 + 2r(2kr + 4k - 1)}$$

Thus, for small values of r the clustering coefficient remains high.

Diameter: Small values of the shortcut edge probability r are enough to reduce the diameter from O(n) to O(log n).
Watts–Strogatz Model: Diameter (circles) and Clustering Coefficient (triangles)

[Figure: as the edge probability r grows from 0 to 0.2, the diameter d(G) drops sharply from its initial value of 167, while the clustering coefficient C(G) decreases only slowly.]
Barabási–Albert Scale-free Model
The Barabási–Albert (BA) model yields a scale-free degree distribution based on preferential attachment; that is, edges from a new vertex are more likely to link to nodes with higher degrees.

Let G_t denote the graph at time t, and let n_t denote the number of nodes and m_t the number of edges in G_t.

Initialization: The BA model starts with G_0, with each node connected to its left and right neighbors in a circular layout. Thus m_0 = n_0.

Growth and Preferential Attachment: The BA model derives a new graph G_{t+1} from G_t by adding exactly one new node u and adding q ≤ n_0 new edges from u to q distinct nodes v_j ∈ G_t, where node v_j is chosen with probability π_t(v_j) proportional to its degree in G_t, given as

$$\pi_t(v_j) = \frac{d_j}{\sum_{v_i \in G_t} d_i}$$
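A standard-library sketch of the growth process; the degree-proportional choice is implemented by sampling uniformly from a list that contains each node once per incident edge (the `seed` parameter is an assumption):

```python
import random

def barabasi_albert(n0, q, t, seed=None):
    """BA growth: start from a ring on n0 nodes; at each step attach a new
    node to q distinct existing nodes, with P(v) proportional to d(v)."""
    rng = random.Random(seed)
    E = {(min(i, (i + 1) % n0), max(i, (i + 1) % n0)) for i in range(n0)}
    # a node appears in this list once per incident edge, so uniform
    # sampling from it is degree-proportional sampling
    targets = [v for e in E for v in e]
    for u in range(n0, n0 + t):
        chosen = set()
        while len(chosen) < q:
            chosen.add(rng.choice(targets))
        for v in chosen:
            E.add((v, u))
            targets += [u, v]        # update degrees for the next step
    return E
```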
Barabási–Albert Graph: n_0 = 3, q = 2, t = 12

At t = 0, start with 3 vertices v0, v1, and v2, fully connected (shown in gray).

At t = 1, vertex v3 is added, with edges to v1 and v2, chosen according to the distribution

$$\pi_0(v_i) = 1/3 \quad\text{for } i = 0, 1, 2$$

At t = 2, v4 is added. Nodes v2 and v3 are preferentially chosen according to the probability distribution

$$\pi_1(v_0) = \pi_1(v_3) = \frac{2}{10} = 0.2 \qquad \pi_1(v_1) = \pi_1(v_2) = \frac{3}{10} = 0.3$$

[Figure: the resulting BA graph on vertices v0, ..., v14.]
Properties of the BA Graphs
Degree Distribution: The degree distribution for BA graphs is given as

$$f(k) = \frac{(q+2)(q+1)q}{(k+2)(k+1)k} \cdot \frac{2}{q+2} = \frac{2q(q+1)}{k(k+1)(k+2)}$$

For constant q and large k, the degree distribution scales as

$$f(k) \propto k^{-3}$$

The BA model yields a power-law degree distribution with γ = 3, especially for large degrees.

Diameter: The diameter of BA graphs scales as

$$d(G_t) = O\left(\frac{\log n_t}{\log \log n_t}\right)$$

suggesting that they exhibit ultra-small-world behavior when q > 1.

Clustering Coefficient: The expected clustering coefficient of the BA graphs scales as

$$E[C(G_t)] = O\left(\frac{(\log n_t)^2}{n_t}\right)$$

which is only slightly better than for random graphs.
Barabási–Albert Model: Degree Distribution (n_0 = 3, t = 997, q = 3)

[Figure: log-log plot of degree (log2 k) versus probability (log2 f(k)); the points follow a straight line of slope −γ = −2.64.]