Power laws

Leonid E. Zhukov

School of Data Analysis and Artificial Intelligence

Department of Computer Science

National Research University Higher School of Economics

Network Science

Leonid E. Zhukov (HSE) Lecture 2 19.01.2016 1 / 31

Table of contents

1 Probability basics

2 Power law distribution

3 Scale-free networks

4 Parameter estimation

5 Zipf’s law

Continuous distribution

Continuous random variable X

Probability density function p(x) (PDF):

$$\Pr(a \le X \le b) = \int_a^b p(x)\,dx$$

$$p(x) \ge 0, \qquad \int_{-\infty}^{\infty} p(x)\,dx = 1$$

Cumulative distribution function (CDF)

$$F(x) = \Pr(X \le x) = \int_{-\infty}^{x} p(x')\,dx', \qquad \frac{d}{dx}F(x) = p(x)$$

Complementary cumulative distribution function (cCDF)

$$\bar{F}(x) = \Pr(X > x) = 1 - F(x) = \int_x^{\infty} p(x')\,dx'$$

Continuous distribution

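Gaussian: $p(x) = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{(x-\mu)^2}{2\sigma^2}}$, $F(x) = \frac{1}{2}\left[1 + \operatorname{erf}\left(\frac{x-\mu}{\sigma\sqrt{2}}\right)\right]$

Exponential ($x \ge 0$): $p(x) = \lambda e^{-\lambda x}$, $F(x) = 1 - e^{-\lambda x}$, $\bar{F}(x) = e^{-\lambda x}$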
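The Gaussian and exponential formulas above can be checked numerically. The following is a minimal sketch using only the Python standard library; the function names and the finite-difference helper are illustrative choices, not part of the lecture.

```python
import math

def gaussian_pdf(x, mu=0.0, sigma=1.0):
    """p(x) = exp(-(x - mu)^2 / (2 sigma^2)) / (sigma sqrt(2 pi))."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))

def gaussian_cdf(x, mu=0.0, sigma=1.0):
    """F(x) = 1/2 [1 + erf((x - mu) / (sigma sqrt(2)))]."""
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))

def exponential_pdf(x, lam=1.0):
    """p(x) = lam * exp(-lam x) for x >= 0, zero otherwise."""
    return lam * math.exp(-lam * x) if x >= 0 else 0.0

def exponential_cdf(x, lam=1.0):
    """F(x) = 1 - exp(-lam x) for x >= 0, zero otherwise."""
    return 1.0 - math.exp(-lam * x) if x >= 0 else 0.0

def exponential_ccdf(x, lam=1.0):
    """cCDF = 1 - F(x), which equals exp(-lam x) for x >= 0."""
    return 1.0 - exponential_cdf(x, lam)

def numerical_derivative(f, x, h=1e-5):
    """Central finite difference, used to check dF/dx = p(x)."""
    return (f(x + h) - f(x - h)) / (2.0 * h)
```

Comparing `numerical_derivative(gaussian_cdf, x)` with `gaussian_pdf(x)` at a few points is a quick way to confirm that the CDF and PDF are consistent with each other.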

Discrete distribution

Discrete random variable $X_i$

Probability mass function (PMF) $p(x)$:

$$p(x) = \Pr(X_i = x)$$

$$p(x) \ge 0, \qquad \sum_x p(x) = 1$$

Cumulative distribution function (CDF)

$$F(x) = \Pr(X_i \le x) = \sum_{x' \le x} p(x')$$

Empirical distributions

Newman, 2005

Power Laws

Continuous approximation

Power law

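$$p(x) = C x^{-\alpha} = \frac{C}{x^{\alpha}}, \quad \text{for } x \ge x_{\min}$$

Normalization ($\alpha > 1$)

$$1 = \int_{x_{\min}}^{\infty} p(x)\,dx = C \int_{x_{\min}}^{\infty} \frac{dx}{x^{\alpha}} = \frac{C}{\alpha - 1}\, x_{\min}^{-\alpha+1}$$

$$C = (\alpha - 1)\, x_{\min}^{\alpha-1}$$

Power law PDF

$$p(x) = \frac{\alpha - 1}{x_{\min}} \left(\frac{x}{x_{\min}}\right)^{-\alpha}$$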
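As a sanity check on the normalization constant, the PDF can be integrated numerically. This is a rough sketch, assuming a midpoint rule in the variable $u = \ln x$ and a finite upper cutoff; the function names, the cutoff `log_span`, and the step count are illustrative choices.

```python
import math

def power_law_pdf(x, alpha, x_min):
    """p(x) = (alpha - 1) / x_min * (x / x_min)^(-alpha) for x >= x_min."""
    if x < x_min:
        return 0.0
    return (alpha - 1.0) / x_min * (x / x_min) ** (-alpha)

def integrate_pdf(alpha, x_min, log_span=40.0, steps=200_000):
    """Midpoint rule for the integral of p(x) over
    [x_min, x_min * e^log_span], using the substitution u = ln x."""
    u0 = math.log(x_min)
    h = log_span / steps
    total = 0.0
    for i in range(steps):
        x = math.exp(u0 + (i + 0.5) * h)
        total += power_law_pdf(x, alpha, x_min) * x * h  # dx = x du
    return total
```

For $\alpha > 1$ the part of the integral beyond the cutoff is negligible, so `integrate_pdf(2.5, 1.0)` should come out very close to 1.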

Power Laws

Poisson: $p(k) = \frac{\lambda^k}{k!}\, e^{-\lambda}$; exponential: $p(x) = C e^{-\lambda x}$; power law: $p(x) = C x^{-\alpha}$

Power Laws

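Power law PDF

$$p(x) = C x^{-\alpha} = \frac{\alpha-1}{x_{\min}} \left(\frac{x}{x_{\min}}\right)^{-\alpha}$$

Complementary cumulative distribution function (cCDF)

$$\bar{F}(x) = \Pr(X > x) = \int_x^{\infty} p(x')\,dx'$$

$$\bar{F}(x) = \bar{C} x^{-(\alpha-1)} = \frac{C}{\alpha-1}\, x^{-(\alpha-1)} = \left(\frac{x}{x_{\min}}\right)^{-(\alpha-1)}$$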
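Because the cCDF has a closed form, it can be inverted to draw samples: setting $\bar{F}(x) = u$ with $u$ uniform on $(0, 1]$ gives $x = x_{\min}\, u^{-1/(\alpha-1)}$. The sketch below illustrates this inverse-transform approach; it is not taken from the slides, and the function names are assumptions.

```python
import random

def power_law_ccdf(x, alpha, x_min):
    """cCDF = (x / x_min)^(-(alpha - 1)) for x >= x_min, 1 below x_min."""
    return 1.0 if x < x_min else (x / x_min) ** (-(alpha - 1.0))

def sample_power_law(n, alpha, x_min, rng):
    """Inverse-transform sampling: x = x_min * u^(-1 / (alpha - 1)),
    with u uniform on (0, 1]."""
    exponent = -1.0 / (alpha - 1.0)
    # rng.random() is in [0, 1), so 1 - rng.random() is in (0, 1].
    return [x_min * (1.0 - rng.random()) ** exponent for _ in range(n)]

def empirical_ccdf(samples, x):
    """Fraction of samples strictly greater than x."""
    return sum(1 for s in samples if s > x) / len(samples)
```

With a few thousand samples, `empirical_ccdf(samples, x)` should track `power_law_ccdf(x, alpha, x_min)` closely, up to sampling noise.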

Power Laws

Power law: $p(x) = C x^{-\alpha}$, $\bar{F}(x) = \bar{C} x^{-(\alpha-1)}$

$$\log p(x) = \log C - \alpha \log x, \qquad \log \bar{F}(x) = \log \bar{C} - (\alpha - 1) \log x$$
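On log-log axes both curves are straight lines, with slopes $-\alpha$ and $-(\alpha-1)$. A small sketch, assuming $\alpha = 2.5$ and $x_{\min} = 1$ as illustrative values and a hand-written least-squares helper, recovers these slopes from exact function values:

```python
import math

def fit_line(xs, ys):
    """Ordinary least squares fit y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

alpha, x_min = 2.5, 1.0                      # illustrative parameters
xs = [x_min * 2 ** i for i in range(10)]     # logarithmically spaced points
log_x = [math.log10(x) for x in xs]
log_pdf = [math.log10((alpha - 1) / x_min * (x / x_min) ** (-alpha)) for x in xs]
log_ccdf = [math.log10((x / x_min) ** (-(alpha - 1))) for x in xs]

pdf_slope, pdf_intercept = fit_line(log_x, log_pdf)      # slope close to -alpha
ccdf_slope, ccdf_intercept = fit_line(log_x, log_ccdf)   # slope close to -(alpha - 1)
```

The fit here is exact because the inputs are noiseless; it only illustrates the geometry of the log-log plot.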

Empirical distributions

log-log scale

Newman, 2005

Moments

PDF

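$$p(x) = \frac{C}{x^{\alpha}}, \quad x \ge x_{\min}$$

First moment (mean value), $\alpha > 2$:

$$\langle x \rangle = \int_{x_{\min}}^{\infty} x\,p(x)\,dx = C \int_{x_{\min}}^{\infty} \frac{dx}{x^{\alpha-1}} = \frac{\alpha-1}{\alpha-2}\, x_{\min}$$

Second moment, $\alpha > 3$:

$$\langle x^2 \rangle = \int_{x_{\min}}^{\infty} x^2 p(x)\,dx = C \int_{x_{\min}}^{\infty} \frac{dx}{x^{\alpha-2}} = \frac{\alpha-1}{\alpha-3}\, x_{\min}^2$$

$k$-th moment, $\alpha > k + 1$:

$$\langle x^k \rangle = \frac{\alpha-1}{\alpha-1-k}\, x_{\min}^k$$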
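The moment formula and its convergence condition fit in a few lines. A minimal sketch, with a hypothetical function name and the convention (an assumption here) of returning infinity when the integral diverges:

```python
import math

def power_law_moment(k, alpha, x_min):
    """k-th moment (alpha - 1) / (alpha - 1 - k) * x_min^k.
    The integral diverges for alpha <= k + 1; that case returns infinity."""
    if alpha <= k + 1:
        return math.inf
    return (alpha - 1.0) / (alpha - 1.0 - k) * x_min ** k
```

For example, with $\alpha = 2.5$ and $x_{\min} = 1$ the mean is finite, `power_law_moment(1, 2.5, 1.0)` gives 3, while the second moment diverges.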

Moments

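First moment (mean) with an upper cutoff $x_{\max}$:

$$\langle x \rangle = C \int_{x_{\min}}^{x_{\max}} \frac{dx}{x^{\alpha-1}} = \frac{\alpha-1}{\alpha-2} \left( x_{\min} - \frac{x_{\min}^{\alpha-1}}{x_{\max}^{\alpha-2}} \right)$$

Clauset et al., 2009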
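The cutoff formula shows how the mean depends on the largest observed value. The sketch below evaluates it as written, keeping $C = (\alpha-1)\,x_{\min}^{\alpha-1}$ from the untruncated normalization and assuming $\alpha \ne 2$; the function name is illustrative.

```python
def truncated_mean(alpha, x_min, x_max):
    """(alpha - 1) / (alpha - 2) * (x_min - x_min^(alpha-1) / x_max^(alpha-2)).
    Uses the normalization constant of the untruncated power law and is
    undefined at alpha == 2, where the integral is logarithmic."""
    if alpha == 2:
        raise ValueError("alpha == 2 needs the logarithmic form of the integral")
    ratio = x_min ** (alpha - 1.0) / x_max ** (alpha - 2.0)
    return (alpha - 1.0) / (alpha - 2.0) * (x_min - ratio)
```

For $\alpha > 2$ the value approaches $\frac{\alpha-1}{\alpha-2}\,x_{\min}$ as $x_{\max}$ grows; for $\alpha < 2$ it keeps growing with $x_{\max}$, for instance as $\sqrt{x_{\max}} - 1$ when $\alpha = 1.5$ and $x_{\min} = 1$.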

Scale invariance

Scaling of the density

$$x \to bx, \qquad p(bx) = C (bx)^{-\alpha} = b^{-\alpha} C x^{-\alpha} \propto p(x)$$

Scale invariance

$$\frac{p(100x)}{p(10x)} = \frac{p(10x)}{p(x)}$$
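Scale invariance means the ratio $p(bx)/p(x) = b^{-\alpha}$ does not depend on $x$. A short sketch contrasting this with an exponential, for which the ratio does depend on $x$ (function names and default parameters are illustrative):

```python
import math

def power_law(x, alpha=2.5, c=1.0):
    """Unnormalized power law c * x^(-alpha)."""
    return c * x ** (-alpha)

def exponential(x, lam=1.0, c=1.0):
    """Unnormalized exponential c * exp(-lam x)."""
    return c * math.exp(-lam * x)

def rescale_ratio(f, x, b):
    """Ratio f(b x) / f(x); independent of x only for a power law."""
    return f(b * x) / f(x)
```

Evaluating `rescale_ratio(power_law, x, 10.0)` at different `x` gives the same number, $10^{-\alpha}$, whereas `rescale_ratio(exponential, x, 10.0)` changes with `x`.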

Power law histograms

Newman, 2005

Scale-free networks

Node degree distribution

$k_i$ - node degree, i.e. number of nearest neighbors, $k_i = 1, 2, \ldots, k_{\max}$

$n_k$ - number of nodes with degree $k$, $n_k = \sum_i I(k_i = k)$

total number of nodes $n = \sum_k n_k$

Degree distribution $P(k_i = k) \equiv P(k)$

$$P(k) = \frac{n_k}{\sum_k n_k} = \frac{n_k}{n}$$

CDF

$$F(k) = \sum_{k' \le k} P(k') = \frac{1}{n} \sum_{k' \le k} n_{k'}$$

cCDF

$$\bar{F}(k) = 1 - \sum_{k' \le k} P(k') = \frac{1}{n} \sum_{k' > k} n_{k'}$$
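These definitions translate directly into code. The following sketch computes $P(k)$, $F(k)$ and $\bar{F}(k)$ from a list of node degrees; the degree list is made-up illustrative data, not taken from any network in the lecture.

```python
from collections import Counter

def degree_distribution(degrees):
    """Return (pmf, cdf, ccdf) dictionaries keyed by observed degree k."""
    n = len(degrees)
    counts = Counter(degrees)          # n_k: number of nodes with degree k
    pmf, cdf, ccdf = {}, {}, {}
    running = 0                        # number of nodes with degree <= k
    for k in sorted(counts):
        running += counts[k]
        pmf[k] = counts[k] / n
        cdf[k] = running / n
        ccdf[k] = (n - running) / n    # nodes with degree strictly above k
    return pmf, cdf, ccdf

degrees = [1, 1, 1, 2, 2, 3, 5]        # hypothetical degrees of a 7-node graph
pmf, cdf, ccdf = degree_distribution(degrees)
```

Here `pmf[1]` is 3/7, `cdf[2]` is 5/7, and `ccdf[2]` is 2/7, matching the sums in the formulas above.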

Discrete power law distribution

Power law distribution

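$$P(k) = C k^{-\gamma} = \frac{C}{k^{\gamma}}$$

Normalization

$$\sum_{k=1}^{\infty} P(k) = C \sum_{k=1}^{\infty} k^{-\gamma} = C \zeta(\gamma) = 1; \qquad C = \frac{1}{\zeta(\gamma)}$$

$\zeta(\gamma)$ is the Riemann zeta function, $\gamma > 1$

$$P(k) = \frac{k^{-\gamma}}{\zeta(\gamma)}$$

Log-log coordinates

$$\log P(k) = -\gamma \log k + \log C$$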
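The Python standard library has no zeta function, so the sketch below approximates $\zeta(\gamma)$ by a partial sum plus an Euler-Maclaurin tail correction. This approximation and the function names are assumptions made for illustration; a numerical library routine could be substituted.

```python
import math

def zeta(gamma, terms=1000):
    """Approximate the Riemann zeta function for gamma > 1:
    partial sum over k = 1..terms plus the leading Euler-Maclaurin
    estimate of the remaining tail."""
    if gamma <= 1:
        raise ValueError("the series diverges for gamma <= 1")
    partial = math.fsum(k ** (-gamma) for k in range(1, terms + 1))
    tail = terms ** (1.0 - gamma) / (gamma - 1.0) - 0.5 * terms ** (-gamma)
    return partial + tail

def discrete_power_law_pmf(gamma, k_max):
    """List of P(k) = k^(-gamma) / zeta(gamma) for k = 1..k_max."""
    norm = zeta(gamma)
    return [k ** (-gamma) / norm for k in range(1, k_max + 1)]
```

As a check, `zeta(2.0)` should agree with $\pi^2/6$ to many digits, and the probabilities returned by `discrete_power_law_pmf` should sum to just under 1 for a large `k_max`.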

Power law networks

Probability mass function PMF/mPDF

Power law networks

Complementary cumulative distribution function cCDF

Power law networks

Actor collaboration graph, N = 212,250 nodes, $\langle k \rangle = 28.8$, $\gamma = 2.3$

WWW, N = 325,729 nodes, $\langle k \rangle = 5.6$, $\gamma = 2.1$

Power grid data, N = 4941 nodes, $\langle k \rangle = 5.5$, $\gamma = 4$

Barabasi et al., 1999

Power law networks

In- and out-degrees of WWW crawl 1999

Broder et al., 1999

Parameter estimation: α

Maximum likelihood estimation of parameter α

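Let $\{x_i\}$ be a set of $n$ observations (points) independently sampled from the distribution

$$P(x_i) = \frac{\alpha-1}{x_{\min}} \left(\frac{x_i}{x_{\min}}\right)^{-\alpha}$$

Probability of the sample

$$P(\{x_i\} \mid \alpha) = \prod_{i=1}^{n} \frac{\alpha-1}{x_{\min}} \left(\frac{x_i}{x_{\min}}\right)^{-\alpha}$$

Bayes' theorem

$$P(\alpha \mid \{x_i\}) = \frac{P(\{x_i\} \mid \alpha)\, P(\alpha)}{P(\{x_i\})}$$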

Maximum likelihood

log-likelihood

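Assuming a uniform prior $P(\alpha)$, and up to an additive constant independent of $\alpha$:

$$L = \ln P(\alpha \mid \{x_i\}) = n \ln(\alpha - 1) - n \ln x_{\min} - \alpha \sum_{i=1}^{n} \ln \frac{x_i}{x_{\min}}$$

maximization $\frac{\partial L}{\partial \alpha} = 0$

$$\alpha = 1 + n \left[ \sum_{i=1}^{n} \ln \frac{x_i}{x_{\min}} \right]^{-1}$$

error estimate

$$\sigma = \sqrt{n} \left[ \sum_{i=1}^{n} \ln \frac{x_i}{x_{\min}} \right]^{-1} = \frac{\alpha - 1}{\sqrt{n}}$$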
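The estimator and its error can be tried on synthetic data. This is a minimal sketch that assumes $x_{\min}$ is known and that at least one sample lies strictly above it; the sampler uses inverse-transform sampling of the cCDF, and all function names are illustrative.

```python
import math
import random

def sample_power_law(n, alpha, x_min, rng):
    """Inverse-transform sampling: x = x_min * u^(-1 / (alpha - 1))."""
    exponent = -1.0 / (alpha - 1.0)
    return [x_min * (1.0 - rng.random()) ** exponent for _ in range(n)]

def fit_alpha(samples, x_min):
    """Maximum likelihood estimate alpha = 1 + n / sum(ln(x_i / x_min))
    and error estimate sigma = (alpha - 1) / sqrt(n),
    using only the samples with x_i >= x_min."""
    tail = [x for x in samples if x >= x_min]
    n = len(tail)
    log_sum = sum(math.log(x / x_min) for x in tail)
    alpha_hat = 1.0 + n / log_sum
    sigma = (alpha_hat - 1.0) / math.sqrt(n)
    return alpha_hat, sigma

rng = random.Random(1)
data = sample_power_law(10_000, alpha=2.5, x_min=1.0, rng=rng)
alpha_hat, sigma = fit_alpha(data, x_min=1.0)
```

With 10,000 samples the estimate should land within a few multiples of `sigma` (about 0.015 here) of the true exponent 2.5.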

Parameter estimation: xmin

Kolmogorov-Smirnov test (compare model and experimental CDF)

$$D = \max_x \left| F(x \mid \alpha, x_{\min}) - F_{\exp}(x) \right|$$

Clauset et al., 2009
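One way to use the statistic $D$, following the general approach described by Clauset et al., is to fit $\alpha$ for each candidate $x_{\min}$ and keep the candidate with the smallest $D$. The sketch below is an illustration under that assumption; the function names, the candidate list and the synthetic data are hypothetical.

```python
import math
import random

def model_cdf(x, alpha, x_min):
    """Power law CDF F(x) = 1 - (x / x_min)^(-(alpha - 1)) for x >= x_min."""
    return 1.0 - (x / x_min) ** (-(alpha - 1.0))

def fit_alpha(samples, x_min):
    """Maximum likelihood exponent for the samples with x >= x_min."""
    tail = [x for x in samples if x >= x_min]
    return 1.0 + len(tail) / sum(math.log(x / x_min) for x in tail)

def ks_distance(samples, alpha, x_min):
    """Largest gap between model CDF and empirical CDF on the tail."""
    tail = sorted(x for x in samples if x >= x_min)
    n = len(tail)
    d = 0.0
    for i, x in enumerate(tail):
        f = model_cdf(x, alpha, x_min)
        # The empirical CDF jumps from i/n to (i+1)/n at x; check both sides.
        d = max(d, abs(f - i / n), abs(f - (i + 1) / n))
    return d

def choose_x_min(samples, candidates):
    """Return (D, x_min, alpha) for the candidate with the smallest D."""
    best = None
    for x_min in candidates:
        alpha = fit_alpha(samples, x_min)
        d = ks_distance(samples, alpha, x_min)
        if best is None or d < best[0]:
            best = (d, x_min, alpha)
    return best

rng = random.Random(0)
# Synthetic power law data with alpha = 2.5 and x_min = 1 (inverse transform).
data = [(1.0 - rng.random()) ** (-1.0 / 1.5) for _ in range(5000)]
best_d, best_x_min, best_alpha = choose_x_min(data, [1.0, 2.0, 4.0])
```

On data that really follow a power law, `ks_distance` is small for the true exponent and clearly larger for a wrong one, which is what makes it usable as a selection criterion.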

Empirical models

Clauset et al., 2009

Word counting

Word frequency table (6318 unique words, minimum frequency 800, corpus size > 85 million):

6187267  the
4239632  be
3093444  of
2687863  and
2186369  a
1924315  in
1620850  to
........
801  incredibly
801  historically
801  decision-making
800  wildly
800  reformer
800  quantum

Zipf's law

Zipf's law: the frequency of a word in a natural language corpus is inversely proportional to its rank in the frequency table, $f(k) \sim 1/k$.

$$f(k) = \frac{1/k^s}{\sum_{k'=1}^{N} \left(1/{k'}^{s}\right)}$$

George Zipf, American linguist, 1935
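The normalized form of Zipf's law is straightforward to evaluate. A minimal sketch, where the function name and the vocabulary size of 1000 ranks are illustrative assumptions:

```python
def zipf_frequencies(n_ranks, s=1.0):
    """Relative frequencies f(k) = (1 / k^s) / sum_{k'=1..N} (1 / k'^s)
    for ranks k = 1..N."""
    weights = [1.0 / k ** s for k in range(1, n_ranks + 1)]
    total = sum(weights)
    return [w / total for w in weights]

freqs = zipf_frequencies(1000, s=1.0)
```

With $s = 1$ the most frequent word is twice as frequent as the second and ten times as frequent as the tenth, whatever the vocabulary size.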

Rank-frequency plot

Sort items by their frequency in decreasing order (frequency table)

The fraction of words with frequency higher than or equal to that of the k-th word is the cCDF $\bar{F}(x_k) = \Pr(X \ge x_k)$, where $x_k$ is the frequency of the k-th word. The number of words with frequency at or above that of the k-th word is exactly its rank k.

Plot word rank as a function of word frequency: rank k on the y axis, frequency on the x axis.

Use the rank-frequency plot instead of computing and plotting the cumulative distribution of a quantity.

6187267  the
4239632  be
3093444  of
2687863  and
2186369  a
1924315  in
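The rank-frequency construction can be sketched with the six counts listed above. The function name is illustrative, and the example assumes no two words share the same count.

```python
counts = {
    "the": 6187267, "be": 4239632, "of": 3093444,
    "and": 2687863, "a": 2186369, "in": 1924315,
}

def rank_frequency(word_counts):
    """Sort words by decreasing frequency and attach ranks starting at 1."""
    ordered = sorted(word_counts.items(), key=lambda item: item[1], reverse=True)
    return [(rank, word, freq) for rank, (word, freq) in enumerate(ordered, start=1)]

table = rank_frequency(counts)

# The rank of a word equals the number of words whose frequency is at least
# as high, i.e. the unnormalized cCDF evaluated at that word's frequency.
ranks_from_ccdf = {
    word: sum(1 for f in counts.values() if f >= freq)
    for word, freq in counts.items()
}
```

Plotting the `(freq, rank)` pairs from `table` on log-log axes gives the rank-frequency plot without any binning.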

Rank-frequency plot

References

Power laws, Pareto distributions and Zipf's law, M. E. J. Newman, Contemporary Physics, pages 323-351, 2005.

Power-Law Distributions in Empirical Data, A. Clauset, C. R. Shalizi, M. E. J. Newman, SIAM Review, Vol 51, No 4, pp. 661-703, 2009.

A Brief History of Generative Models for Power Law and Lognormal Distributions, M. Mitzenmacher, Internet Mathematics, Vol 1, No 2, pp. 226-251.
