Estimating Sizes of Social Networks via Biased Sampling

Liran Katzir, Edo Liberty, and Oren Somekh

Yahoo! Labs, Haifa, Israel

International World Wide Web Conference, 28th March - 1st April 2011, Hyderabad, India

Yahoo! Labs: WWW’2011 1 / 20


Social Network size estimation

Goal:

Obtain estimates of the sizes of (sub)populations in a social network.

Why:

Advertisement - estimate of market share.

Business development - merger/acquisition or asset valuation.


The Problem

Difficulties:

Social networks have become pretty big:

Facebook (650,000,000)
Qzone (200,000,000)
Twitter (175,000,000)
...

No public API for population size queries:

What is the total number of registered users?
What is the number of registered (self-declared) 20–30 year olds living in New York?

Even if a public API is provided, an independent estimate is needed.

An exhaustive crawl is time/space/communication intensive and violates “politeness”.


Population size estimation

Population sizes can be estimated efficiently using the “birthday paradox”.

The “birthday paradox”:

Given r uniform samples from a set of n elements, the expected number of collisions is $\frac{r(r-1)}{2n}$.

A collision is a pair of identical samples.

Example:

Samples: $X = (d, b, b, a, b, e)$. Total of 3 collisions: $(x_2, x_3)$, $(x_2, x_5)$, and $(x_3, x_5)$.
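The collision count in the example above can be checked with a short sketch. The function name `count_collisions` is an illustrative choice, not something taken from the slides; it counts unordered pairs of identical samples by grouping equal values.

```python
from collections import Counter
from math import comb


def count_collisions(samples):
    """Count unordered pairs (i, j), i < j, with samples[i] == samples[j]."""
    # A value that occurs k times contributes C(k, 2) colliding pairs.
    return sum(comb(k, 2) for k in Counter(samples).values())


samples = ["d", "b", "b", "a", "b", "e"]
collisions = count_collisions(samples)  # "b" occurs 3 times, giving C(3, 2) = 3 pairs
```

Grouping by value keeps the count linear in the number of samples, rather than comparing all pairs directly.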


Population size estimation

Using the birthday paradox inversely:

When observing C collisions, the population size can be estimated by

$\Rightarrow\; n \simeq \frac{r^2}{2C}$

If $r = \text{const} \cdot \sqrt{n}$, this gives a rather good estimator.
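As a rough illustration of the inverse birthday-paradox estimate, the sketch below draws uniform samples from a population of known size and applies $n \simeq r^2 / (2C)$. The population size, the constant 10, and the random seed are arbitrary assumptions chosen for the demonstration; they do not come from the slides.

```python
import random
from collections import Counter
from math import comb, isqrt


def estimate_size_uniform(samples):
    """Estimate the population size as r^2 / (2C) from uniform samples."""
    r = len(samples)
    c = sum(comb(k, 2) for k in Counter(samples).values())
    if c == 0:
        return None  # no collisions observed: the estimate is undefined
    return r * r / (2 * c)


rng = random.Random(0)      # fixed seed, for repeatability only
n = 100_000                 # true population size (assumed for the demo)
r = 10 * isqrt(n)           # r = const * sqrt(n), with const = 10
samples = [rng.randrange(n) for _ in range(r)]
estimate = estimate_size_uniform(samples)
```

With these settings the expected number of collisions is about 50, so the estimate should typically land within a few tens of percent of n.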

Similar to mark-and-recapture, which counts collisions between two sample sets (but is essentially equivalent).

Newer versions of mark-and-recapture also handle non-uniform but a priori known distributions [Chao, 1987].

Social network size estimation [Ye and Wu, 2010]

Alas, we cannot sample users uniformly from most social networks...


Uniform distribution on graphs

Social networks can be viewed as undirected graphs, which we can traverse using their public APIs.

Special random walks can generate close-to-uniform samples:

1 Bipartite query-web page graph [Bharat and Broder, 1998] [Bar-Yossef and Gurevich, 2007].

2 Social network [Gjoka et al., 2010].

This uses only $r = \text{const} \cdot \sqrt{n}$ samples, but obtaining each sample might be hard.


Graph size estimation

It is possible to estimate the size of some graphs directly.

1 Estimate the size of a tree [Knuth, 1974].

2 Estimate the size of a directed acyclic graph [Pitt, 1987].

We give an estimator for the size of undirected graphs (and subgraphs) which:

1 Counts collisions but uses the graph’s stationary distribution (does not require a uniform sample).

2 Requires asymptotically fewer than $\sqrt{n}$ samples to converge.

3 Obtains samples efficiently (provably small number of random walk steps).


Assumptions

The graph can be traversed from nodes to neighboring nodes.

We can perform a random walk on the graph:

Start at any node.

In each step, proceed to one of the neighbors uniformly at random.
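The walk described above can be sketched on an in-memory adjacency list. The toy graph and the names `random_walk` and `graph` are hypothetical; against a real social network, the neighbor lookup would be a call to the network's public API.

```python
import random


def random_walk(graph, start, steps, rng):
    """Simple random walk: at each step move to a uniformly random neighbor."""
    node = start
    path = [node]
    for _ in range(steps):
        node = rng.choice(graph[node])  # uniform choice among the neighbors
        path.append(node)
    return path


# Toy undirected graph as adjacency lists (every edge is listed in both directions).
graph = {
    "a": ["b", "c"],
    "b": ["a", "c"],
    "c": ["a", "b", "d"],
    "d": ["c"],
}
path = random_walk(graph, "a", 10, random.Random(1))
```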


Facts about random walks

This random walk yields the stationary distribution.

1 The probability of getting the i-th node is $\frac{d_i}{D}$.

2 $d_i$ – the i-th node’s degree.

3 $D = \sum_{i=1}^{n} d_i$.

Taking a few steps/several walks ensures independence between two consecutive samples.
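That $d_i / D$ is stationary for this walk can be verified exactly on a small graph: applying one step of the walk to the degree-proportional distribution leaves it unchanged. The sketch below uses exact fractions; the toy graph and function names are assumptions made for illustration.

```python
from fractions import Fraction


def degree_distribution(graph):
    """Return pi with pi[v] = d_v / D, where D is the sum of all degrees."""
    total = sum(len(nbrs) for nbrs in graph.values())
    return {v: Fraction(len(nbrs), total) for v, nbrs in graph.items()}


def one_step(graph, dist):
    """Push a distribution through one step of the simple random walk."""
    nxt = {v: Fraction(0) for v in graph}
    for v, p in dist.items():
        for u in graph[v]:
            nxt[u] += p / len(graph[v])  # v moves to each neighbor w.p. 1/d_v
    return nxt


graph = {
    "a": ["b", "c"],
    "b": ["a", "c"],
    "c": ["a", "b", "d"],
    "d": ["c"],
}
pi = degree_distribution(graph)
is_stationary = one_step(graph, pi) == pi
```

Each neighbor v of a node u contributes $(d_v / D) \cdot (1 / d_v) = 1 / D$, and u has $d_u$ neighbors, which is why the check succeeds on any undirected graph.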


Algorithm Outline

1 Sample r users using a random walk.

2 C – the number of collisions.

3 $\Psi_1$ – the sum of the sampled nodes’ degrees.

4 $\Psi_{-1}$ – the sum of the inverses of the sampled nodes’ degrees.

The estimated number of nodes:

$\hat{n} = \frac{\Psi_1 \Psi_{-1}}{2C}$.
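The outline above can be sketched as a single function. It assumes the samples are already available as (node id, degree) pairs drawn by the random walk; that input format and the name `estimate_graph_size` are illustrative assumptions, not part of the slides.

```python
from collections import Counter
from math import comb


def estimate_graph_size(samples):
    """Estimate n as Psi_1 * Psi_{-1} / (2C) from (node_id, degree) samples."""
    counts = Counter(node for node, _ in samples)
    collisions = sum(comb(k, 2) for k in counts.values())  # C
    if collisions == 0:
        return None  # the estimator is undefined without collisions
    psi_1 = sum(deg for _, deg in samples)            # sum of degrees
    psi_minus_1 = sum(1 / deg for _, deg in samples)  # sum of inverse degrees
    return psi_1 * psi_minus_1 / (2 * collisions)


# Hypothetical samples from a 3-regular graph: r = 4 samples, one collision.
regular_samples = [("u", 3), ("v", 3), ("u", 3), ("w", 3)]
regular_estimate = estimate_graph_size(regular_samples)  # about r^2 / (2C) = 8
```

When all degrees are equal, $\Psi_1 \Psi_{-1} = r^2$, so the estimator reduces to the uniform-sampling formula $r^2 / (2C)$.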


Example

Sampling process:

Sampled Nodes: d f f c c d

Sampled Node Degree: 3 2 2 4 4 3

C: 0 0 1 1 2 3

$\Psi_1$: 3 5 7 11 15 18

$\Psi_{-1}$: 1/3 5/6 16/12 19/12 22/12 26/12

$\hat{n}$: – – 4 8 6 6

Input social network graph: (figure not reproduced)
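The running values in this example can be reproduced step by step. The sketch below uses exact fractions and a hypothetical helper `running_estimates`; it updates C, $\Psi_1$, and $\Psi_{-1}$ after each sample. The exact estimates come out as 14/3, 209/24, 55/8, and 13/2, so the integers shown on the slide appear to be these values truncated.

```python
from fractions import Fraction


def running_estimates(samples):
    """Return (C, Psi_1, Psi_{-1}, n_hat) after each (node, degree) sample."""
    seen = {}              # node -> number of times sampled so far
    c = 0                  # running collision count
    psi_1 = 0              # running sum of degrees
    psi_m1 = Fraction(0)   # running sum of inverse degrees
    rows = []
    for node, deg in samples:
        c += seen.get(node, 0)  # a new sample collides with each earlier copy
        seen[node] = seen.get(node, 0) + 1
        psi_1 += deg
        psi_m1 += Fraction(1, deg)
        n_hat = psi_1 * psi_m1 / (2 * c) if c else None
        rows.append((c, psi_1, psi_m1, n_hat))
    return rows


samples = [("d", 3), ("f", 2), ("f", 2), ("c", 4), ("c", 4), ("d", 3)]
rows = running_estimates(samples)
```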


Proof Intuition

Notations:

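n – the graph size, r – the number of samples

$d_i$ – the degree of node i, $D = \sum_{i=1}^{n} d_i$

Expectations:

$E[\Psi_1] = rD \sum_{i=1}^{n} \left(\frac{d_i}{D}\right)^2$, $E[\Psi_{-1}] = \frac{rn}{D}$

$E[C] = \binom{r}{2} \sum_{i=1}^{n} \left(\frac{d_i}{D}\right)^2$

$\frac{E[\Psi_1]\, E[\Psi_{-1}]}{2E[C]} = n \frac{r}{r-1} \simeq n$

$\hat{n} = \frac{\Psi_1 \Psi_{-1}}{2C} \simeq \frac{E[\Psi_1]\, E[\Psi_{-1}]}{2E[C]} \simeq n$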
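The cancellation in the ratio of expectations can be checked exactly for any small degree sequence. The sketch below plugs a made-up degree list into the three expectation formulas and confirms that the ratio equals $n \frac{r}{r-1}$; the degree values and the function name are assumptions chosen for illustration.

```python
from fractions import Fraction
from math import comb


def expected_ratio(degrees, r):
    """Compute E[Psi_1] * E[Psi_{-1}] / (2 E[C]) exactly from the formulas."""
    n = len(degrees)
    total = sum(degrees)                                 # D
    s = sum(Fraction(d, total) ** 2 for d in degrees)    # sum of (d_i / D)^2
    e_psi_1 = r * total * s
    e_psi_m1 = Fraction(r * n, total)
    e_c = comb(r, 2) * s
    return e_psi_1 * e_psi_m1 / (2 * e_c)


degrees = [3, 2, 4, 1, 2]                # hypothetical degree sequence, n = 5
ratio = expected_ratio(degrees, r=10)    # equals n * r / (r - 1) = 50/9
```

The sum $\sum_i (d_i / D)^2$ and the factor D both cancel, which is why the result does not depend on the degree sequence.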


Analytic Results

Main statement:

Using $r(n, \varepsilon, \delta)$ samples: $\Pr[n(1-\varepsilon) \le \hat{n} \le n(1+\varepsilon)] \ge 1-\delta$

Uniform vs Biased:

Sampling method                                                                                 Number of samples
Any graph, uniform                                                                              $O(\sqrt{n})$
Synthetic graph, Zipfian degree distribution ($\alpha = 2$, $d_m = \sqrt{n}$), random walk      $O(\sqrt[4]{n} \log n)$

Example – $n = 10^9$:

$\sqrt{n} \approx 30{,}000$.

$\sqrt[4]{n} \log n \approx 6{,}000$.
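The two figures in the example can be reproduced approximately. The slide gives neither the base of the logarithm nor the hidden constants, so the sketch below evaluates both the base-2 and the natural logarithm as assumptions; the base-2 value is the one closest to the slide's rounded figure.

```python
import math

n = 10**9

uniform_samples = math.sqrt(n)               # roughly 31,600
biased_log2 = n ** 0.25 * math.log2(n)       # roughly 5,300 (assuming log base 2)
biased_ln = n ** 0.25 * math.log(n)          # roughly 3,700 (assuming natural log)

savings = uniform_samples / biased_log2      # roughly a factor of 6
```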


Setup

Networks of known sizes:

Network      Size         Edges
Synthetic    1,000,000    Zipfian, $\alpha = 2$, $d_m = 1000$
DBLP         845,211      co-authorship
IMDB         1,955,508    co-casting


A Synthetic Network, Degree Zipfian $\alpha = 2$, $d_m = 1000$

[Figure: Synthetic network, confidence interval. x-axis: number of samples (percentage of network size); y-axis: size estimation (relative to network size). Curves: Unif. dist., non-unique, 95% and 5%; Deg. dist., non-unique, 95% and 5%.]


DBLP - The Digital Bibliography and Library Project

[Figure: DBLP network, confidence interval. x-axis: number of samples (percentage of network size); y-axis: size estimation (relative to network size). Curves: Unif. dist., non-unique, 95% and 5%; Deg. dist., non-unique, 95% and 5%.]


IMDB - The Internet Movie Database

[Figure: IMDB, confidence interval. x-axis: number of samples (percentage of network size); y-axis: size estimation (relative to network size). Curves: Unif. dist., non-unique, 95% and 5%; Deg. dist., non-unique, 95% and 5%.]


Facebook

Date                   April 2009                                October 2010
Sampling method        uniform                                   random walk
Number of samples      $0.98 \cdot 10^6$                         $1 \cdot 10^6$
Collision estimator    $237 \cdot 10^6$                          $475 \cdot 10^6$
Facebook report        $200 \cdot 10^6$ to $250 \cdot 10^6$      $500 \cdot 10^6$

Thanks to Minas Gjoka for the Facebook crawls.


Conclusions

An efficient algorithm for estimating the size of a social network using its public API was presented.

Its effectiveness was demonstrated on synthetic and real-world networks.

The algorithm outperforms prior-art methods by using biased sampling.

The algorithm also applies to sub-populations.


Thanks!
