Date post: 25-Jun-2020
Estimating Graph Properties through Sampling Shweta Jain Advisor: Prof. C. Seshadhri University of California, Santa Cruz 1
Transcript
Page 1: Estimating Graph Shweta Jain Properties through › wp-content › uploads › 2018 › 05 › slides_S… · Shweta Jain Advisor: Prof. C. Seshadhri University of California, Santa

Estimating Graph Properties through Sampling

Shweta Jain
Advisor: Prof. C. Seshadhri

University of California, Santa Cruz

Page 2:

Large Graphs

❖ Social Network
❖ Routing Networks
❖ Protein-Interaction Networks
❖ Citation Networks

Page 3:

Peculiarities of real-world graphs

❖ Degree distribution
    ❖ Heavy tailed

[Figure: degree distributions. A: Actor collaboration network, B: WWW, C: Power Grid data [Barabási et al., 1999]]

Source: www.sciencemag.com

Page 4:

Peculiarities of real-world graphs

❖ Counts of patterns: cycles, triangles, cliques
❖ Avg. distance between nodes - small world property
❖ High clustering coefficients

[Figure: example patterns: 3-clique (triangle), 5-clique, 5-cycle]

Page 5:

Need for graph sampling

❖ Scale - traditional graph-theoretic algorithms impractical
❖ Limitations of access model e.g. streaming
❖ Can utilize unique characteristics of real-world graphs

Page 6:

Goals

❖ Estimate global characteristics from small sample.
❖ Fast, work well on real-world instances.
❖ Accurate, with provable error bounds.

Page 7:

Applications

❖ Computationally hard problems - clique counting
❖ Restricted access model - estimating the degree distribution

Page 8:

A Fast and Provable Method for Estimating Clique Counts using Turán’s Theorem

Shweta Jain, C. Seshadhri

University of California, Santa Cruz

WWW 2017 Best Paper

Page 9:

Cliques

❖ k-clique: set of k vertices all connected to each other.
❖ [Holland et al., 1970], [Milo et al., 2002], [Burt, 2004], [Przulj et al., 2004], [Hanneman et al., 2005], [Hormozdiari et al., 2007], [Faust, 2010], [Jackson, 2010], [Tsourakakis et al., 2015], [Sizemore et al., 2016] - clique counts appear in all these papers.
❖ Used in modeling, community detection, spam detection etc.

[Figure: 3-clique (triangle), 4-clique, 5-clique]

Page 10:

Problem Statement

❖ Given a simple, undirected graph G, and a positive integer k, estimate the number of k-cliques in G.

[Figure: animation counting the 5-cliques in an example graph: #5-cliques = 0, 1, 2, 3]
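
For intuition, the quantity in the problem statement can be computed exactly on tiny graphs by brute force. The sketch below is not from the slides: `count_k_cliques` and the example graph are hypothetical names chosen for illustration, and the approach is exponential in k, which is why the talk turns to estimation.

```python
from itertools import combinations

def count_k_cliques(edges, k):
    """Count k-cliques by testing every k-subset of vertices (tiny graphs only)."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    count = 0
    for subset in combinations(sorted(adj), k):
        # A subset is a clique when every pair inside it is an edge.
        if all(b in adj[a] for a, b in combinations(subset, 2)):
            count += 1
    return count

# Hypothetical example: the complete graph on 6 vertices has C(6, 5) = 6 five-cliques.
k6_edges = list(combinations(range(6), 2))
print(count_k_cliques(k6_edges, 5))
```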

Page 11:

Prior theoretical work

❖ Clique counting:
    ❖ Arboricity and subgraph listing algorithms. [Chiba et al., 1985]
    ❖ Finding dense subgraphs with size bounds. [Alon et al., 1994]
    ❖ Efficient algorithms for clique problems. [Vassilevska, 2009]
❖ Maximal clique counting:
    ❖ Finding all cliques of an undirected graph. [Bron et al., 1973]
    ❖ Worst case time complexity of generating all maximal cliques. [Tomita et al., 2004]
    ❖ Listing all maximal cliques in large sparse real-world graphs. [Eppstein et al., 2013]

Page 12:

Challenge

❖ Combinatorial explosion!

GRAPH          VERTICES   EDGES   7-CLIQUES   10-CLIQUES
web-BerkStan   0.6M       6M      9T          50000T
as-skitter     2M         11M     73B         22T
com-lj         4M         34M     510T        14000000T
com-orkut      3M         110M    360B        31T

Enumeration is costly. Hence, approximate.

Page 13:

Practical approaches

❖ Color Coding [Alon et al., 1994], [Hormozdiari et al., 2007], [Betzler et al., 2011], [Zhao et al., 2012]
❖ Edge Sampling, GRAFT [Tsourakakis et al., 2009], [Tsourakakis et al., 2011], [Rahman et al., 2014]
❖ MCMC based [Bhuiyan et al., 2012]
❖ Parallel algorithm using MapReduce [Finocchi et al., 2015]
❖ kClist [Danisch et al., 2018]

Page 14:

Our contribution

❖ We present a randomized algorithm, TuránShadow, that approximates the number of k-cliques in G and has the following properties:
    ❖ Runs on a single machine
    ❖ Provable error bounds

Page 15:

Our contribution

❖ Extremely fast and accurate
❖ For 10-cliques, no other method terminated for all graphs within min{100 × TuránShadow's running time, 7 hours}!

GRAPH          7-CLIQUES   TIME          ERROR %
web-BerkStan   9.3T        < 4 minutes   1.05
as-skitter     73B         < 3 minutes   0.23
com-orkut      361B        < 2 hours     1.97

Page 16:

Main theorem

Let S be the Turán k-clique shadow of G. Then w.h.p. TuránShadow outputs a (1 ± ε)-approximation to the number of k-cliques in G.

The running time of TuránShadow is O*(α|S| + m + n).

α: degeneracy
m: #edges
n: #vertices

Page 17:

Degeneracy

❖ α: degeneracy of graph
❖ Measure of density, low for real-world graphs
❖ Let T: set of all subgraphs of G
❖ Degeneracy = $\max_{t \in T} \min_{v \in t} \{\text{degree of } v \text{ in } G|_t\}$
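
The max-min definition above equals the largest minimum degree seen while repeatedly deleting a minimum-degree vertex, which is a standard fact about degeneracy. A minimal sketch, assuming the graph is a dict mapping each vertex to a set of neighbors; `degeneracy_ordering` is a hypothetical helper, and this quadratic-time version favors clarity over the linear-time bucket implementation.

```python
def degeneracy_ordering(adj):
    """Peel minimum-degree vertices; return (degeneracy, removal order)."""
    deg = {v: len(nbrs) for v, nbrs in adj.items()}
    removed = set()
    order = []
    degeneracy = 0
    while len(order) < len(adj):
        # Pick a remaining vertex of minimum remaining degree.
        v = min((u for u in adj if u not in removed), key=lambda u: deg[u])
        degeneracy = max(degeneracy, deg[v])
        removed.add(v)
        order.append(v)
        for w in adj[v]:
            if w not in removed:
                deg[w] -= 1
    return degeneracy, order

# Hypothetical example: a triangle {0, 1, 2} with a pendant vertex 3 attached to 2.
example = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3}, 3: {2}}
print(degeneracy_ordering(example)[0])  # 2
```

The removal order is the degeneracy ordering that the later shadow-construction slides use to turn G into a DAG.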

Page 18:

How many edges can an n-vertex graph have without having a triangle?

Ans: $n^2/4$

[Turán, 1941] If the graph has more than $n^2/4$ edges, then it must have a triangle.

Page 19:

[Erdős, 1941] If the graph has even one more edge than $n^2/4$, then it must have $\Omega(n)$ triangles.

$\text{density} = \#\text{edges} / \binom{n}{2}$; at $n^2/4$ edges this is about $1/2$.

Thus, if density > $1/2$, then the graph necessarily has $\Omega(n)$ triangles.

Page 20:

Turán’s theorem

Generalizes for larger k.

If a graph on n vertices has density greater than $1 - \frac{1}{k-1}$, then it must have $\Omega(n^{k-2})$ k-cliques.
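
The density test that the rest of the talk relies on can be written down directly. This is a sketch under the slides' convention that density is #edges divided by C(n, 2); the classical theorem is stated in terms of edge counts, so the density form should be read as the slides' approximate version. The function names are hypothetical.

```python
from math import comb

def edge_density(num_vertices, num_edges):
    """#edges / C(n, 2); defined as 0 when there are fewer than 2 vertices."""
    if num_vertices < 2:
        return 0.0
    return num_edges / comb(num_vertices, 2)

def turan_threshold(k):
    """Density threshold 1 - 1/(k-1) from the slide, for k >= 2."""
    return 1 - 1 / (k - 1)

def exceeds_turan_density(num_vertices, num_edges, k):
    return edge_density(num_vertices, num_edges) > turan_threshold(k)

print(turan_threshold(3))                 # 0.5, the triangle case
print(exceeds_turan_density(5, 10, 5))    # True: the complete graph on 5 vertices
```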

Page 21:

Naïve algorithm

[Figure: repeated random samples drawn from G]

n = 1M, k = 5

#5-cliques = 100T

E[#samples] ≅ $10^{16}$
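
The formula behind the figure on this slide did not survive extraction. The arithmetic below is a sketch under two assumptions that are not confirmed by the slides: that one sample means drawing k vertices uniformly at random and checking whether they form a clique, and that the slide's figure uses the approximation n^k for the number of k-subsets.

```python
from math import comb

n, k = 10**6, 5                 # values from the slide
num_cliques = 100 * 10**12      # 100T 5-cliques, from the slide

# Assumption: expected samples until the first hit = (#k-subsets) / (#k-cliques).
approx_expected = n**k / num_cliques        # n^k approximation: 1e16
exact_expected = comb(n, k) / num_cliques   # exact binomial: roughly 8.3e13

print(f"{approx_expected:.1e} {exact_expected:.1e}")
```

Either way the number of samples is far beyond what is practical, which is the point of the slide.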

Page 22:

Key Idea

Real world graphs have dense pockets.

Drill down on dense pockets and count cliques within them!

Page 23:

Turán Shadow

[Figure: G is decomposed into dense subgraphs G1, ..., G9 (annotated "Turán density!"); decompose: G -> (G1, k1), (G2, k2), (G3, k3), ...]

Page 24:

Turán Shadow

[Figure: repeated samples are drawn from one shadow subgraph, C = G1, with n1 = 21 and k1 = 5; E[#samples] is annotated with an upper bound marked "Erdős!"]

Page 25:

Constructing the shadow

❖ Convert G to a DAG - order by degeneracy
❖ Build clique enumeration tree, stopping whenever Turán density is reached.

Page 26:

Constructing the shadow

[Figure: vertices v1, v2, v3, v4, …, vn laid out in order]

Convert G to DAG

Page 27:

Constructing the shadow

[Figure: vertices v1, v2, v3, v4, …, vn in order; the out-neighbors of v1 are v2, v3, v4]

Convert G to DAG

Check out-neighborhood of v1

Page 28:

Constructing the shadow

[Figure: vertices v1, v2, v3, v4, …, vn in order; the out-neighbors of v1 are v2, v3, v4]

Convert G to DAG

Check out-neighborhood of v1: Γ+(v1)

Is density > Turán density (k-1)?

Yes: Add to TuránShadow

Page 29:

Constructing the shadow

[Figure: vertices v1, v2, v3, v4, …, vn in order; v4 is the common out-neighbor of v1 and v2]

Convert G to DAG

Check out-neighborhood of v1

Is density > Turán density (k-1)?

Yes: Add to TuránShadow

No: Expand further, checking Γ+(v1) ⋂ Γ+(v2)
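
The construction walked through on the last few slides can be sketched as a small work-list procedure. This is an illustrative simplification, not the authors' implementation: it assumes the graph is a dict of neighbor sets, it orients edges by one fixed vertex order passed in as `order` (the slides order by degeneracy), and it stops at an out-neighborhood when its density exceeds the Turán threshold for the clique size still needed there. The helper names are hypothetical.

```python
from itertools import combinations
from math import comb

def induced_edge_count(adj, verts):
    return sum(1 for a, b in combinations(verts, 2) if b in adj[a])

def build_shadow(adj, k, order):
    """Return leaves (vertex_set, size); their size-clique counts sum to #k-cliques (k >= 2)."""
    rank = {v: i for i, v in enumerate(order)}
    shadow, stack = [], [(set(adj), k)]
    while stack:
        verts, size = stack.pop()
        need = size - 1  # clique size still required inside each out-neighborhood
        for v in verts:
            out = {w for w in adj[v] if w in verts and rank[w] > rank[v]}
            if len(out) < need:
                continue  # too few vertices to hold a clique of the needed size
            if need == 1:
                shadow.append((out, need))
                continue
            density = induced_edge_count(adj, out) / comb(len(out), 2)
            if density > 1 - 1 / (need - 1):
                shadow.append((out, need))   # dense enough: stop here
            else:
                stack.append((out, need))    # too sparse: expand further
    return shadow

def count_cliques(adj, verts, size):
    """Brute-force count, used only to check the shadow on tiny graphs."""
    return sum(1 for s in combinations(verts, size)
               if all(b in adj[a] for a, b in combinations(s, 2)))

# Hypothetical example: a 5-clique on 0..4 plus a triangle {4, 5, 6}.
edges = list(combinations(range(5), 2)) + [(4, 5), (5, 6), (4, 6)]
adj = {v: set() for v in range(7)}
for a, b in edges:
    adj[a].add(b)
    adj[b].add(a)
leaves = build_shadow(adj, 3, sorted(adj))
print(sum(count_cliques(adj, verts, size) for verts, size in leaves))  # 11 triangles
```

Each k-clique is charged to its first vertex in the order, so the leaf counts add up to the total without double counting.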

Page 30:

Sampling

Leaves: G1(n1, k1), G2(n2, k2), G3(n3, k3), ..., Gl(nl, kl)

Sample leaf i with probability $\binom{n_i}{k_i} \big/ \sum_{j=1}^{l} \binom{n_j}{k_j}$

Randomly sample ki vertices from leaf i

Page 31:

Sampling

Leaves: G1(n1, k1), G2(n2, k2), G3(n3, k3), ..., Gl(nl, kl)

Bernoulli r.v. X = 1 if the sampled vertices form a ki-clique, else 0

$\mathrm{Exp}[X] = \#k\text{-cliques in } G \big/ \sum_{j=1}^{l} \binom{n_j}{k_j}$
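
The sampling step on the two slides above can be sketched as follows. It assumes the shadow is given as a list of (vertex_set, clique_size) leaves and the graph as a dict of neighbor sets; `estimate_from_shadow` is a hypothetical name. Scaling the success ratio by the sum of the binomials inverts the expectation formula for X.

```python
import random
from itertools import combinations
from math import comb

def estimate_from_shadow(adj, leaves, num_samples, rng):
    """Estimate the k-clique count from shadow leaves by weighted leaf sampling."""
    weights = [comb(len(verts), size) for verts, size in leaves]
    total = sum(weights)
    if total == 0:
        return 0.0
    hits = 0
    for _ in range(num_samples):
        # Pick leaf i with probability C(n_i, k_i) / sum_j C(n_j, k_j).
        verts, size = rng.choices(leaves, weights=weights, k=1)[0]
        # Pick k_i vertices of that leaf uniformly at random.
        sample = rng.sample(sorted(verts), size)
        if all(b in adj[a] for a, b in combinations(sample, 2)):
            hits += 1  # X = 1: the sampled vertices form a clique
    return hits / num_samples * total

# Hypothetical single-leaf shadow: a 4-clique in which we look for triangles.
k4 = {v: {u for u in range(4) if u != v} for v in range(4)}
print(estimate_from_shadow(k4, [({0, 1, 2, 3}, 3)], 50, random.Random(1)))  # 4.0
```

In this toy case every sample succeeds, so the estimate is exactly C(4, 3) = 4; on real shadows the Turán density of each leaf is what keeps the success probability from being tiny.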

Page 32:

Putting it all together

❖ Construct Turán Shadow
❖ Set up distribution over leaves
❖ Sample from distribution and scale success ratio

Page 33:

7 and 10 Clique Count Estimation Performance

TuránShadow terminated in minutes for all graphs except com-orkut (3M/100M) for which it took 3 hours.

[Figure: bar chart of running time, Time (s), log scale up to 100,000, for k=7 and k=10 on loc-gow, web-Stan, amazon, youtube, Google, BerkStan, as-skitter, Patents, soc-pokec, com-lj, com-orkut]

Page 34:

3-100x speedup for k=7.

For k=10, no other algorithm terminated for all graphs in min{100x, 7 hours}

[Figure: bar chart of speedup for k = 7, log scale up to 1,000, relative to ES and GRAFT on loc-gow, web-Stan, amazon, youtube, Google, BerkStan, as-skitter, Patents, soc-pokec, com-lj, com-orkut]

Page 35:

Size of shadow

[Figure: "Shadow size, k=7": shadow size versus number of edges, both axes on a log scale from 10^5 to 10^10]

Shadow size roughly linear in m.

Page 36:

Less than 2% error with just 50,000 samples.

Page 37:

Trends in clique counts

[Figure: number of cliques (log scale, 1E-01 to 1E+17) versus clique size k = 5, ..., 10 for com-lj, web-BerkStan, com-orkut, as-skitter, com-youtube, amazon0601, cit-Patents]

Page 38:

What we achieved

❖ We make clique-counting feasible for larger cliques.
❖ Single commodity machine. No need to use MapReduce.
❖ Extremely fast and accurate
❖ Provable error bounds

Page 39:

Open Questions

❖ Feasible for cliques of size k > 10?
❖ Can we count near-cliques?
❖ Can this approach be used for dense subgraph discovery?

Page 40:

Thank you

Questions?

Page 41:

Provable and Practical Approximations For the Degree Distribution using Sublinear Graph Samples*

Talya Eden, Tel Aviv University
Shweta Jain, University of California, Santa Cruz
Ali Pinar, Sandia National Labs
Dana Ron, Tel Aviv University
C. Seshadhri, University of California, Santa Cruz

* Talya and Shweta are equal contributors.

WWW 2018

Page 42:

Large Graphs

❖ Social Network
❖ Routing Networks
❖ Protein-Interaction Networks
❖ Citation Networks

Page 43:

Degree Distribution

❖ Degree(v) = #vertices v is connected to

[Figure: a vertex v with d = 5]

Page 44:

Degree Distribution

❖ Degree(v) = #vertices v is connected to
❖ Degree distribution: histogram of number of vertices of a certain degree

[Figure: a vertex v with d = 5, and a histogram with x-axis d (1 to 5) and y-axis #vertices (0 to 4)]

Page 45:

Heavy tail

[Figure: degree distributions. A: Actor collaboration network, B: WWW, C: Power Grid data [Barabási et al., 1999]]

Source: www.sciencemag.com

Page 46:

Why sample

❖ If access to whole graph: O(n) algorithm

[Figure: animation building the degree histogram one vertex at a time; x-axis: d (1 to 5), y-axis: #vertices (0 to 4)]

Page 47:

Why sample

❖ But what if we did not have access to whole graph?
❖ Internet, routing networks
❖ Crawl based methods, traceroutes [Faloutsos et al., 1999]
❖ Contains bias! [Achlioptas et al., 2009]
❖ Cannot simply scale sample.
❖ [Faloutsos et al., 1999], [Leskovec et al., 2006], [Ebbes et al., 2008], [Maiya et al., 2011], [Ahmed et al., 2010, 2014] - aim to capture representative graph sample

Page 48:

Problem Definition

❖ ccdh: complementary cumulative degree histogram
❖ N(d) = #vertices with degree >= d
❖ monotonically non-increasing, smooth

Can we estimate N(d) for any given d?
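
When the whole degree sequence is available, the ccdh defined above is a suffix sum of the degree histogram. A minimal sketch, assuming degrees are given as a list of non-negative integers; `ccdh` is a hypothetical function name.

```python
from collections import Counter

def ccdh(degrees):
    """Return {d: N(d)} for d = 1..max degree, where N(d) = #vertices with degree >= d."""
    if not degrees:
        return {}
    hist = Counter(degrees)
    result, running = {}, 0
    # Walk from the largest degree down, accumulating the histogram.
    for d in range(max(degrees), 0, -1):
        running += hist[d]
        result[d] = running
    return result

# Hypothetical example: a star with one center of degree 6 and six leaves.
star_degrees = [6, 1, 1, 1, 1, 1, 1]
print(ccdh(star_degrees)[1], ccdh(star_degrees)[5])  # 7 1
```

This is the full-access baseline; the rest of the talk is about estimating N(d) without reading every degree.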

Page 49:

Query Model

1. Vertex queries: u.a.r. v ∈ V

[Figure: "Can I get a vertex?" "Here you go!" (asked twice)]

Page 50:

Query Model

2. Neighbor queries: u.a.r. neighbor u of v

[Figure: "Can I have a neighbor of A?" "Here you go!"; "Can I have a neighbor of B?" "Here you go!"]

Page 51:

Query Model

3. Degree queries: degree dv

[Figure: "Can I have the degree of A?" "4"; "Can I have the degree of B?" "9"]

Page 52:

Query Model

1. Vertex queries: u.a.r. v ∈ V
2. Neighbor queries: u.a.r. neighbor u of v
3. Degree queries: degree dv
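
One way to make the three query types concrete is a small oracle object that hides the graph and counts queries. This is an illustrative sketch only: the class name `GraphOracle`, its method names, and the query counter are assumptions, not an interface from the talk.

```python
import random

class GraphOracle:
    """Hides a graph behind the three query types and counts how many are made."""

    def __init__(self, adj, seed=None):
        self._adj = {v: list(nbrs) for v, nbrs in adj.items()}
        self._vertices = list(self._adj)
        self._rng = random.Random(seed)
        self.queries = 0

    def random_vertex(self):
        """Vertex query: a vertex chosen uniformly at random."""
        self.queries += 1
        return self._rng.choice(self._vertices)

    def random_neighbor(self, v):
        """Neighbor query: a uniformly random neighbor of v, or None if v is isolated."""
        self.queries += 1
        nbrs = self._adj[v]
        return self._rng.choice(nbrs) if nbrs else None

    def degree(self, v):
        """Degree query: the degree of v."""
        self.queries += 1
        return len(self._adj[v])

# Hypothetical example: a star with center 0 and leaves 1..6.
star = {0: {1, 2, 3, 4, 5, 6}, **{i: {0} for i in range(1, 7)}}
oracle = GraphOracle(star, seed=0)
print(oracle.degree(0), oracle.random_neighbor(3))  # 6 0
```

Counting queries in the oracle makes it easy to compare estimators by how much of the graph they touch.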

Page 53:

Prior work

❖ Vertex sampling [Stumpf et al., 2005], [Lee et al., 2006]
❖ Edge Sampling [Stumpf et al., 2005], [Lee et al., 2006]
❖ Random Walk with Jump [Lee et al., 2006]
❖ Forest Fire Sampling [Faloutsos et al., 2006]
❖ Snowball Sampling [Maiya et al., 2011]
❖ Linear system solver [Zhang et al., 2015]

All need to sample at least 10-30% of the graph!

Page 54:

Main contribution

❖ Randomized algorithm SADDLES that estimates N(d)
❖ Uses a sublinear number of queries for any degree distribution bounded below by a power law.
❖ Strongly sublinear!

Power law exponent   Number of samples
2                    $n^{1/2}$
3                    $n^{2/3}$

Page 55:

Main contribution

❖ In practice, we needed to sample only 1% of the graph
❖ Works well for all degrees

[Figure: web-Google, N(d) versus degree d on log-log axes; series: actual, SADDLES, VS, VS inv, OWS, OWS inv, FF, RWJ, IN inv]

Page 56:

Query complexity

❖ Depends on 2 parameters:
    ❖ h-index = $\min_d \max(d, N(d))$
        ❖ Largest d, such that there are at least d vertices of degree >= d.
        ❖ Same as the bibliometric h-index!

[Figure: plot of N(d) against d; h marks where the curve meets the line d = N(d)]

Page 57:

Query complexity

❖ Depends on 2 parameters:
    ❖ h-index = $\min_d \max(d, N(d))$
    ❖ z-index = $\min_{d : N(d) > 0} \sqrt{d \cdot N(d)}$
        ❖ replace max by geometric mean
❖ h and z are large for power laws!
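
Both parameters are easy to compute from a full degree sequence, which helps when checking them on small examples. The sketch below makes two assumptions: the h-index follows the worded definition from the earlier slide (largest d with at least d vertices of degree >= d), and the z-index minimum ranges over integer d >= 1 with N(d) > 0. The function names are hypothetical.

```python
from math import sqrt

def h_index(degrees):
    """Largest d such that at least d vertices have degree >= d."""
    h = 0
    for i, d in enumerate(sorted(degrees, reverse=True), start=1):
        if d >= i:
            h = i
        else:
            break
    return h

def z_index(degrees):
    """min over d >= 1 with N(d) > 0 of sqrt(d * N(d)); 0.0 if all degrees are 0."""
    positive = [x for x in degrees if x > 0]
    if not positive:
        return 0.0
    best = None
    for d in range(1, max(positive) + 1):
        n_d = sum(1 for x in positive if x >= d)  # N(d)
        value = sqrt(d * n_d)
        if best is None or value < best:
            best = value
    return best

# Hypothetical example: the complete graph on 5 vertices (all degrees 4).
print(h_index([4] * 5), z_index([4] * 5))
```

On a star both values are small (h = 1), which matches the intuition that a single hub is hard to find by sampling; heavy-tailed graphs with many high-degree vertices give larger h and z.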

Page 58:

Vertex sampling

❖ Sample u.a.r. vertices
❖ Bin them according to degree
❖ Need [expression lost in extraction] samples

Have to take many samples to hit a high-degree vertex

[Figure: two plots against d]
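
The vertex-sampling estimator described above amounts to scaling the fraction of sampled vertices with degree at least d by n. A minimal sketch, assuming the degree sequence is available as a list standing in for vertex and degree queries; the function name is hypothetical.

```python
import random

def estimate_N_vertex_sampling(degrees, d, num_samples, rng):
    """Estimate N(d) as n * (fraction of u.a.r. sampled vertices with degree >= d)."""
    n = len(degrees)
    hits = sum(1 for _ in range(num_samples) if degrees[rng.randrange(n)] >= d)
    return n * hits / num_samples

# Hypothetical example: a star; the single degree-6 vertex is rarely sampled.
star_degs = [6] + [1] * 6
print(estimate_N_vertex_sampling(star_degs, 1, 100, random.Random(0)))  # 7.0
```

For d = 1 every sample is a hit, so the estimate is exactly n; for d = 6 most runs with few samples miss the hub entirely, which is the weakness the slide points out.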

Page 59:

Edge sampling

Undirected edge -> 2 directed edges

wt((v,u)) = [expression lost in extraction]

[Figure: example graph whose directed edges carry weights 1/6 and 1]

Page 60:

Edge sampling

[Figure: example graph whose directed edges carry weights 1/6 and 1]

Sum of weights of edges incident on a vertex = 1

Page 61:

Edge sampling

[Figure: example graph whose directed edges carry weights 1/6 and 1]

Sum of weights of edges incident on a vertex = 1

Page 62:

Edge sampling

[Figure: example graph whose directed edges carry weights 1/6 and 1]

Sum of all weights = n

Page 63:

Edge sampling

To get N(d), set weights of irrelevant edges to 0

Say, d = 5

Sum of all weights = N(d)

[Figure: the same example graph; the weight-1 edges are now 0 and the 1/6 edges are unchanged]

Page 64:

Edge sampling

Set of objects, we want their sum

Sample randomly

Take average of sampled weights

Scale by number of edges to get total sum

Example: (2/6)/4 × 12 = 1

[Figure: the 12 directed edge weights (six 0s and six 1/6s) and a random sample drawn from them]
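
The edge-sampling estimator from these slides can be sketched as below. The weight formula itself was lost in extraction, so the code assumes one convention consistent with the stated facts (weights at a vertex sum to 1, all weights sum to n): directed edge (v, u) carries weight 1/d_u, zeroed when d_u < d. The function names are hypothetical.

```python
import random

def exact_weight_sum(adj, d):
    """Sum of weights over all directed edges; equals N(d) under the assumed convention."""
    return sum(1 / len(adj[u]) for v in adj for u in adj[v] if len(adj[u]) >= d)

def estimate_N_edge_sampling(adj, d, num_samples, rng):
    """Average weight of u.a.r. directed edges, scaled by the number of directed edges."""
    directed = [(v, u) for v in adj for u in adj[v]]
    total = 0.0
    for _ in range(num_samples):
        _v, u = rng.choice(directed)
        if len(adj[u]) >= d:          # irrelevant edges contribute weight 0
            total += 1 / len(adj[u])
    return total / num_samples * len(directed)

# Hypothetical example matching the slides' numbers: a star with a degree-6 center.
star = {0: {1, 2, 3, 4, 5, 6}, **{i: {0} for i in range(1, 7)}}
print(round(exact_weight_sum(star, 5), 9))  # 1.0, i.e. N(5) = 1
```

With d = 5 the star has six directed edges of weight 1/6 and six of weight 0 among its 12 directed edges, the same multiset of weights as in the slide's example.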

Page 65:

Main Idea

❖ Combine vertex sampling and edge sampling
❖ But we don’t have edge sampling
❖ Simulate it!

Page 66:

Theoretical work

❖ Average degree [Feige et al., 2006], [Goldreich et al., 2002, 2008]
❖ Number of star graphs, moments [Eden et al., 2011]
❖ Number of triangles [Eden et al., 2014]

Page 67:

Simulated Edge Sampling

❖ Sample some vertices
❖ The edges incident to these vertices form the edge set that we will perform random sampling on.

Page 68:

Simulated Edge Sampling

❖ Sample r vertices
❖ Set up distribution D to sample vertex v ∝ dv
❖ Repeat q times:
    ❖ Sample a vertex v from D
    ❖ Sample u.a.r. neighbor u of v
❖ Find average weight of samples
❖ Scale appropriately

[Figure: a set of r sampled vertices; an edge leads from a sampled vertex v to a neighbor u]
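
The steps above can be sketched as follows. Several details are assumptions rather than statements from the slides: the r vertices are drawn with replacement, the weight of a sampled edge (v, u) is 1/d_u when d_u >= d and 0 otherwise, and "scale appropriately" is taken to mean multiplying the average weight by (n / r) times the total degree of the sampled vertices. The function name is hypothetical.

```python
import random

def simulated_edge_sampling(adj, d, r, q, rng):
    """Estimate N(d) using only vertex, degree and neighbor queries on adj."""
    vertices = list(adj)
    n = len(vertices)
    sampled = [rng.choice(vertices) for _ in range(r)]   # r vertex queries
    degs = [len(adj[v]) for v in sampled]                # degree queries
    d_total = sum(degs)
    if d_total == 0:
        return 0.0
    total = 0.0
    for _ in range(q):
        v = rng.choices(sampled, weights=degs, k=1)[0]   # v drawn with probability ∝ d_v
        u = rng.choice(list(adj[v]))                     # neighbor query
        if len(adj[u]) >= d:
            total += 1 / len(adj[u])
    # Assumed scaling: (n / r) * d_total estimates the number of directed edges.
    return (n / r) * d_total * (total / q)

# Hypothetical example: a 5-cycle, where every vertex has degree 2.
cycle5 = {i: {(i - 1) % 5, (i + 1) % 5} for i in range(5)}
print(simulated_edge_sampling(cycle5, 2, 4, 20, random.Random(0)))  # 5.0
```

On a regular graph the estimate is exact; on skewed graphs the q edge samples share the same r vertices and are therefore correlated.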

Page 69:

Putting it all together

Sample vertices

Enough vertices with degree > d found?

Yes: Use estimator of vertex sampling

No: Sample edges and use estimator of edge sampling

[Figure: two plots against d, one for each branch]

Page 70:

r and q

❖ Total samples: [expression lost in extraction]
❖ How big do r and q need to be?
    ❖ If VS: r = [expression lost in extraction]
    ❖ If ES: r = [expression lost in extraction]
    ❖ Similarly, q = [expression lost in extraction]

[Figure: a degree d vertex; we want at least 1 of its d neighbors to be in R]

Page 71:

Query complexity

❖ Query complexity: [expression lost in extraction]
    ❖ Vertex queries: [expression lost in extraction]
    ❖ Neighbor queries: [expression lost in extraction]

[Figure: plot of N(d) against d; h marks where the curve meets the line d = N(d)]

Page 72:

Simulated Edge Sampling

❖ Single edge sample is uniform at random
❖ But multiple edge samples are correlated
❖ Key insights:
    ❖ Correlation can be contained if h and z are high. Power laws have high h and z!
    ❖ 1-hop distance is enough - don’t need to do long random walks

Page 73:

h and z

❖ Indeed large!

GRAPH          VERTICES   EDGES   AVG. DEG.   h      z
web-BerkStan   0.6M       6M      10          707    220
as-skitter     2M         11M     7           982    184
com-lj         4M         34M     9           810    114
com-orkut      3M         110M    38          1638   172

Page 74:

Results

[Figure: two log-log plots of N(d) versus degree d, for cit-Patents and web-Google; series in each: actual, SADDLES, VS, VS inv, OWS, OWS inv, FF, RWJ, IN inv]

Page 75:

Thank you

Questions?

