Christos Gkantsidis, Milena Mihail, Amin Saberi. Presented by Paul Bogdan, February 28th, 2007


1

“Hybrid Search Schemes for Unstructured Peer-to-Peer Networks”

“Random Walks in Peer-to-Peer Networks”

Christos Gkantsidis, Milena Mihail, Amin Saberi

Presented by Paul Bogdan

February 28th, 2007

2

“Hybrid Search Schemes for Unstructured Peer-to-Peer Networks”

Christos Gkantsidis, Milena Mihail, Amin Saberi

3

Outline

• Random Graph Models

• Flooding and Normalization

• Random Walks and Replication

• Generalized Search Schemes

• Experimental Evaluation

4

Motivation

• Flooding with a small time-to-live (TTL) performs well in regular graphs

• Performance metric: number of messages exchanged per distinct response

• Flooding performance degrades as the TTL increases and in irregular networks

• Random walks perform better than flooding in terms of:
  – scalability
  – granularity

• Hybrid and generalized search schemes:
  – random walks with lookahead
  – random walks with 1-step replication

5

Contribution

• Random walks (RW) with shallow flooding offer good performance (analytic justification):
  – R1: In a random graph model with O(n) nodes of constant degree and O(n^(1/2)) nodes of degree O(n^(1/2)), the expected time to discover Ω(n) nodes is O(n^(1/2)).
  – R2: Random walks with lookahead 1 or with 1-step replication perform better when the degrees of the underlying topology are highly uneven.

• Normalized Flooding (NF):
  – R3: NF achieves performance comparable to flooding in regular graphs.
  – R4: NF with 1-step replication achieves performance comparable to RW with 1-step replication.
  – R5: Local information about the network (node degrees) yields a global benefit.

• Generalized Search Schemes

6

Random Graph Models

• Random Regular Graphs – G_{n,d}

G_{n,d} denotes a random graph on n nodes in which every node has degree d.

The sum of the degrees of G_{n,d} is D = nd.

• Random Graphs with Supernodes – G_{n,d,α,β}

Given constants α and β, G_{n,d,α,β} denotes a random graph with αn^(1/2) nodes of degree βn^(1/2) (large vertices), while the remaining nodes have degree d (small vertices).

The sum of the degrees of G_{n,d,α,β} is D = (αβ + d)n.
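To make the supernode model concrete, here is a minimal Python sketch. It builds a degree sequence for G_{n,d,α,β} and wires it up by pairing degree "stubs" uniformly at random, which is a configuration model. The slides do not say how the random graph is generated, so the stub-pairing construction, the rounding of αn^(1/2) and βn^(1/2) to integers, and the function names are all our assumptions.

```python
import random

def supernode_degrees(n, d, alpha, beta):
    """Hypothetical helper: round(alpha*sqrt(n)) large vertices of degree
    round(beta*sqrt(n)); every other vertex is small with degree d."""
    root = n ** 0.5
    n_large = round(alpha * root)
    large_deg = round(beta * root)
    return [large_deg] * n_large + [d] * (n - n_large)

def configuration_model(degrees, rng):
    """Pair degree stubs uniformly at random and return the edge list.
    Self-loops and parallel edges are kept (a simplifying assumption)."""
    stubs = [v for v, k in enumerate(degrees) for _ in range(k)]
    if len(stubs) % 2:
        raise ValueError("degree sum must be even")
    rng.shuffle(stubs)
    return list(zip(stubs[::2], stubs[1::2]))

degrees = supernode_degrees(n=10_000, d=4, alpha=1, beta=2)
edges = configuration_model(degrees, random.Random(0))
# Actual degree sum vs. the slide's (alpha*beta + d) * n
print(sum(degrees), (1 * 2 + 4) * 10_000)
```

With n = 10,000, d = 4, α = 1 and β = 2, the degree sum is 59,600 against (αβ + d)n = 60,000. The gap is the lower-order term d·αn^(1/2), which comes from the small vertices replaced by large ones.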

7

Flooding and Normalization

• Theorem 3.1: Let G_{n,d} be a random regular graph, and consider a flooding scenario from node v with time-to-live τ. Let S be the set of distinct nodes queried by flooding, with |S| ≤ |V|/2.

Claims:

(1) For τ ≤ […], the number of distinct responses is […] and the number of distinct responses per message is at least […].

(2) For any ε, […], the number of distinct responses is […] and the number of distinct responses per message is at least […] almost surely.

(3) For any S with |S| ≤ |V|/2, the number of distinct responses is […] and the number of distinct responses per message is at least […].

8

(1) Proof:

Claim (1): For τ ≤ […], the number of distinct responses is […] and the number of distinct responses per message is at least […].

• The number of distinct responses received with TTL τ is |S(v)| = […].

• For all vertices and for all i with 1 ≤ i ≤ […], with probability 1 − […], we have […].

• Similarly, for all vertices and for all i with 1 ≤ i ≤ […], we have […].

• Hence the number of distinct responses received with TTL τ is […].

• For a random graph G_{n,d} we have […], so the number of distinct responses per message is […].

9

(2) Proof:

• Same steps as in the proof of (1).

Claim (2): For any ε, […], the number of distinct responses is […] and the number of distinct responses per message is at least […] almost surely.

10

[Further bounds for sets S with |V|/4 ≤ |S| ≤ |V|/2; the formulas are not recoverable.]

11

(3) Proof:

• Same steps as in the proof of (1).

Claim (3): For any S with |S| ≤ |V|/2, the number of distinct responses is […] and the number of distinct responses per message is at least […].

12

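Flooding and Normalization

• Theorem 3.2: Let G_{n,d,α,β} be a random graph with supernodes, and consider a flooding scenario from a node v of degree d with time-to-live τ.

Claim: For some τ = O(log log n), the number of distinct responses is Ω(n).

Proof: Consider flooding with τ = c·log_{d−1}(log n) + 1, and the set S_{τ−1}(v) of vertices visited with TTL τ − 1.

Assume that this set does not contain a large-degree vertex.

As in d-regular graphs, this set then contains at least (d − 1)^(τ−1) edges.

Each of these edges ends at a small vertex with probability about d/(d + αβ), since small vertices carry a d/(d + αβ) fraction of the degree sum D = (αβ + d)n. Hence the probability that no vertex in Γ(S_{τ−1}(v)) is a large vertex is bounded by

$$\left(\frac{d}{d+\alpha\beta}\right)^{(d-1)^{\tau-1}} = \left(\frac{d}{d+\alpha\beta}\right)^{(\log n)^{c}},$$

which vanishes as n grows. So within the first O(log log n) steps the flood reaches a large vertex.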
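The proof's TTL choice τ = c·log_{d−1}(log n) + 1 makes (d − 1)^(τ−1) equal (log n)^c. That is why the bound (d/(d + αβ))^((d−1)^(τ−1)) vanishes while τ stays O(log log n). The sketch below assumes natural logarithms, which the slides do not specify, and leaves τ real-valued for illustration. The function names are hypothetical.

```python
import math

def supernode_ttl(n, d, c):
    """tau = c * log_{d-1}(log n) + 1, with natural logarithms (assumption)."""
    return c * math.log(math.log(n), d - 1) + 1

def miss_probability_bound(n, d, alpha, beta, c):
    """(d / (d + alpha*beta)) ** ((d - 1) ** (tau - 1)): the bound on the chance
    that no vertex adjacent to S_{tau-1}(v) is a large vertex."""
    tau = supernode_ttl(n, d, c)
    exponent = (d - 1) ** (tau - 1)        # equals (log n) ** c
    return (d / (d + alpha * beta)) ** exponent

for n in (10**4, 10**6, 10**9):
    print(n, round(supernode_ttl(n, d=4, c=2), 2), miss_probability_bound(n, 4, 1, 1, 2))
```

The TTL grows very slowly with n, while the failure bound drops quickly.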

13

Flooding and Normalization

• Theorem 3.3: Let G_{n,d,α,β} be a random graph with supernodes, and consider a normalized flooding scenario from node v with TTL τ ≤ […]. Then the number of distinct responses is Ω((d − 1)^(τ−1)) and the number of messages per response is O(1).

Proof:

From Theorem 3.1, the number of minigroups seen is (d − 1)^(τ−1). The expected number of small vertices among them is Q = d(d − 1)^(τ−1)/(d + αβ).

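Let X_i, i = 1, …, N, be independent random variables with P[X_i = 1] = p_i and P[X_i = 0] = 1 − p_i, and let p = (1/N) Σ_i p_i. For a > 0 the Chernoff bounds give

$$\Pr\Big[\sum_{i=1}^{N} X_i < pN - a\Big] \le \exp\Big(-\frac{a^2}{2pN}\Big) \quad\text{and}\quad \Pr\Big[\sum_{i=1}^{N} X_i > pN + a\Big] \le \exp\Big(-\frac{a^2}{2pN} + \frac{a^3}{2(pN)^2}\Big).$$

Apply the lower-tail bound with pN = Q and a = Q/2. The probability that fewer than Q/2 small vertices are seen is then at most exp(−Q/8), which is vanishingly small.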
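As a numerical check of this proof step, the sketch below makes three computations:

- It computes Q = d(d − 1)^(τ−1)/(d + αβ).
- It evaluates the lower-tail bound exp(−a²/(2pN)) with pN = Q and a = Q/2, which gives exp(−Q/8).
- It compares that bound with a Monte Carlo estimate. The estimate treats each of the N = (d − 1)^(τ−1) minigroups as independently small with probability p = d/(d + αβ).

The independence model for the simulation and the helper names are our assumptions.

```python
import math
import random

def expected_small(d, tau, alpha, beta):
    """Q = d * (d-1)**(tau-1) / (d + alpha*beta)."""
    return d * (d - 1) ** (tau - 1) / (d + alpha * beta)

def lower_tail_bound(Q):
    """exp(-a^2 / (2 p N)) with p N = Q and a = Q/2, i.e. exp(-Q/8)."""
    a = Q / 2
    return math.exp(-a ** 2 / (2 * Q))

def empirical_lower_tail(N, p, trials, rng):
    """Monte Carlo estimate of Pr[sum of N Bernoulli(p) < p*N/2]."""
    hits = 0
    for _ in range(trials):
        s = sum(rng.random() < p for _ in range(N))
        hits += s < p * N / 2
    return hits / trials

d, tau, alpha, beta = 4, 5, 1, 1
N, p = (d - 1) ** (tau - 1), d / (d + alpha * beta)
Q = expected_small(d, tau, alpha, beta)    # equals p * N
print(Q, lower_tail_bound(Q), empirical_lower_tail(N, p, 2000, random.Random(0)))
```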

14

Random Walks and Replication

• Random Walk with Lookahead:
  – a random walk that performs a shallow flood at each step of the walk
  – a RW with lookahead 1 visits Ω(n) nodes with response time O(n^(1/2))

• Theorem 4.2: Let G_{n,d,α,β} be a random graph with supernodes, and consider a random walk from a node v. Then, in the 1-step replication scenario, the expected number of messages and the response time needed to obtain […] distinct responses are […].
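A random walk with lookahead 1 and a random walk with 1-step replication discover the same nodes: the current node plus all of its neighbours. They differ in cost. With lookahead, every step also probes each neighbour. With 1-step replication, each node already indexes its neighbours' content, so only the walk steps cost messages.

This is our reading of the two schemes, and the sketch below counts messages under that assumption. The toy topology in the usage (a ring with a few hub nodes) is hypothetical.

```python
import random

def rw_lookahead(adj, source, steps, rng):
    """Random walk where every visited node also checks all of its neighbours.
    Returns (distinct nodes discovered, messages with lookahead,
    messages with 1-step replication)."""
    discovered = set()
    node = source
    lookahead_msgs = 0
    for _ in range(steps):
        node = rng.choice(adj[node])
        lookahead_msgs += 1 + len(adj[node])   # one walk step + one probe per neighbour
        discovered.add(node)
        discovered.update(adj[node])
    discovered.discard(source)
    return len(discovered), lookahead_msgs, steps

# Toy irregular topology (assumption): a ring of small nodes plus 40 hubs,
# each hub linked to 50 random ring nodes.
rng = random.Random(7)
n, hubs = 2000, 40
adj = [[(i - 1) % n, (i + 1) % n] for i in range(n)]
for h in range(hubs):
    hub = n + h
    adj.append([])
    for v in rng.sample(range(n), 50):
        adj[hub].append(v)
        adj[v].append(hub)
print(rw_lookahead(adj, 0, 200, rng))
```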

15

• Theorem 4.3: Let G_{n,d,α,β} be a random graph with supernodes, and consider normalized flooding from v with TTL τ ≈ log n / (2 log(d − 1)). Then, in the 1-step replication scenario, the number of distinct responses is at least […] and the number of messages is at most […].

Proof:

The number of minigroups seen is (d − 1)^(τ−1), and by the Chernoff bounds […] of these minigroups correspond to large vertices.
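The slides use normalized flooding without defining it. The sketch below assumes the usual definition: every node forwards the query to min(k, available) neighbours chosen uniformly at random, never back to the sender, where k is typically the minimum degree d_min. With 1-step replication enabled, a reached node also answers for its neighbours. The function name and signature are hypothetical.

```python
import random
from collections import deque

def normalized_flood(adj, source, ttl, k, rng, one_step_replication=False):
    """Normalized flooding (assumed definition): forward to min(k, available)
    random neighbours, excluding the sender. Returns (distinct responses, messages)."""
    reached = {source}
    answered = set()
    messages = 0
    queue = deque([(source, None, 0)])
    while queue:
        node, parent, hops = queue.popleft()
        answered.add(node)
        if one_step_replication:
            answered.update(adj[node])        # node answers for its neighbours too
        if hops == ttl:
            continue
        eligible = [nb for nb in adj[node] if nb != parent]
        for nb in rng.sample(eligible, min(k, len(eligible))):
            messages += 1
            if nb not in reached:
                reached.add(nb)
                queue.append((nb, node, hops + 1))
    answered.discard(source)
    return len(answered), messages

star = [[1, 2, 3, 4, 5]] + [[0] for _ in range(5)]
rng = random.Random(3)
print(normalized_flood(star, 0, ttl=1, k=2, rng=rng))                             # (2, 2)
print(normalized_flood(star, 0, ttl=1, k=2, rng=rng, one_step_replication=True))  # (5, 2)
```

The star example shows the effect of replication on a high-degree node. The same two messages yield two responses without replication and five with it.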

16

Generalized Search Schemes

• Search procedure:
  – A node of degree d initiates a search with a budget k, the number of messages that may be propagated in the network.
  – Among its d neighbors, the node picks non-negative quantities k_1, k_2, …, k_d such that k_1 + k_2 + … + k_d = k.
  – For every neighbor i, the initiating node forwards the message with budget k_i (if k_i = 0, the message is not sent).
  – Each neighbor i reduces the budget by 1 and repeats the process as long as the budget is greater than 0.
  – A node that receives the message a second time, from another neighbor, forwards it with the corresponding budget.

• Random walks and flooding are special cases of this scheme
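The budget-splitting procedure above can be sketched in a few lines. How the budget is split is left as a parameter. Two illustrative rules are given: a random-walk rule that gives the whole budget to one neighbour, and a flooding-like rule that splits it evenly. Both rules and all names are our assumptions.

Every message consumes one unit of budget, and the rest is passed on. So the total number of messages equals the initial budget k, as long as no node has degree 0.

```python
import random
from collections import deque

def walk_split(b, d, rng):
    """Random-walk rule: the whole budget goes to one uniformly chosen neighbour."""
    j = rng.randrange(d)
    return [b if i == j else 0 for i in range(d)]

def even_split(b, d, rng):
    """Flooding-like rule: split as evenly as possible, remainder to random neighbours."""
    shares = [b // d] * d
    for i in rng.sample(range(d), b % d):
        shares[i] += 1
    return shares

def budget_search(adj, source, budget, split, rng):
    """A node holding budget b > 0 sends neighbour i a message with budget k_i,
    where (k_1, ..., k_d) = split(b, d, rng) sums to b (k_i = 0: no message).
    A receiver spends one unit and repeats with k_i - 1, even if it was reached
    before. Returns (distinct nodes reached, messages sent)."""
    reached = {source}
    messages = 0
    queue = deque([(source, budget)])
    while queue:
        node, b = queue.popleft()
        if b <= 0:
            continue
        for nb, k in zip(adj[node], split(b, len(adj[node]), rng)):
            if k > 0:
                messages += 1
                reached.add(nb)
                queue.append((nb, k - 1))
    return len(reached) - 1, messages

ring = [[(i - 1) % 50, (i + 1) % 50] for i in range(50)]
rng = random.Random(5)
print(budget_search(ring, 0, 20, walk_split, rng))
print(budget_search(ring, 0, 20, even_split, rng))
```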

17

Experimental Evaluation

• Methodology

  – Performance metrics:
    • median and mean number of distinct peers discovered (hits)
    • minimum, maximum and standard deviation of the number of hits
    • number of messages
    • granularity of the number of messages
    • response time

  – Topologies:
    • random d-regular graphs
    • power-law graphs
    • bimodal topologies
    • clustered topologies

18

Normalized Flooding (NF)

• Mean number of unique peers discovered as a function of the initial TTL

• NF and standard flooding behave similarly in regular graphs

• NF controls the number of messages and achieves higher efficiency

19

Normalized Flooding (NF)

• With NF, the number of unique peers increases exponentially with the TTL

• In topologies with high-degree nodes, the number of peers increases faster than exponentially with the TTL

20

Random Walk with 1-step replication

21

Random Walk with LookAhead (RWLA)

• In terms of unique peers discovered, RWLA performs similarly to a longer RW without lookahead

• RWLA response time is much smaller than that of a standard RW

22

Edge Criticality & Searching with weights

• Generalized Searching performs similarly to Standard Flooding in regular graphs

• Generalized Searching behaves similarly to Standard Flooding in other topologies if normalized edge criticality is used.

23

Conclusions

• Normalized Flooding (NF) could replace standard flooding in irregular graphs

• RW with 1-step replication performs better than RW and NF in irregular graphs

• Open for improvement:
  – analytic investigation of generalized schemes
  – quantifying directional flooding

24

“Random Walks in Peer-to-Peer (P2P) Networks”

Christos Gkantsidis, Milena Mihail, Amin Saberi

25

Outline

• Motivation

• Statistical Estimation and Random Walks (RW)

• Searching: methodology and the importance of topology

• Construction and Summary

26

Motivation

• Random walks (RW) have been proposed for building search and topology-maintenance protocols in P2P networks:
  – RW improve search performance compared to flooding (Cao et al., 2002)
  – a RW approach to constructing and maintaining unstructured topologies provides good connectivity properties (constant degree, constant expansion)

• Claim: the RW approach is a good candidate for simulating uniform sampling:
  – the number of simulation steps required can be as low as the number of samples needed by independent uniform sampling

• Searching and overlay topology construction:
  – for the same number of messages, RW searching performs better than flooding in clustered and slowly changing topologies
  – construction of P2P networks by random walks

27

Statistical Estimation & Random Walks

• Coupon collection and Chernoff bounds
  – There are n types of coupons, and each draw returns a type chosen uniformly at random
  – T_n: the time by which coupons of all n types have been drawn
  – T_{αn}: the time by which αn distinct types have been encountered, 0 < α < 1

$$E[T_n] = \frac{n}{n} + \frac{n}{n-1} + \dots + \frac{n}{1} = n\log n + O(n), \qquad E[T_{\alpha n}] = \sum_{i=0}^{\alpha n-1} \frac{n}{n-i} = O(n)$$

  – X_1, …, X_k: independent Bernoulli trials with P[X_i = 1] = p_i and P[X_i = 0] = 1 − p_i, and X = X_1 + … + X_k
  – p: the probability that a randomly drawn object has a particular property
  – Two probabilities are exponentially small in ε²pk. The first is that the property is found in substantially fewer draws than its frequency in the search space suggests, Pr[X ≤ (1 − ε)pk]. The second is that the estimator X/k is off by more than a factor ε, Pr[|X/k − p| ≥ εp].
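Independent uniform sampling is the baseline that random walks are compared against. The expectations above, E[T_n] = Σ_{i=0}^{n−1} n/(n − i) = n·H_n and E[T_{αn}], can be computed exactly and checked by simulation. The helper names below are hypothetical, and the simulation draws coupon types with Python's `random.randrange`.

```python
import math
import random

def harmonic(m):
    return sum(1 / j for j in range(1, m + 1))

def expected_draws(n, distinct):
    """E[draws until `distinct` types are seen] = sum_{i=0}^{distinct-1} n / (n - i)."""
    return sum(n / (n - i) for i in range(distinct))

def simulate_draws(n, distinct, rng):
    """Draw uniform coupon types until `distinct` types have been seen."""
    seen, draws = set(), 0
    while len(seen) < distinct:
        seen.add(rng.randrange(n))
        draws += 1
    return draws

n, alpha = 1000, 0.5
rng = random.Random(42)
print(expected_draws(n, n), n * harmonic(n), n * math.log(n))   # E[T_n] vs n log n
print(expected_draws(n, int(alpha * n)))                         # E[T_{alpha n}] = O(n)
print(sum(simulate_draws(n, n, rng) for _ in range(100)) / 100)  # empirical T_n
```

For α = 1/2, E[T_{αn}] is about n·ln 2 ≈ 693. This is linear in n, whereas collecting all n types costs about n log n.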

28

Statistical Estimation & Random Walks

• Random walks (RW), convergence and cover time
  – G = (V, E) is an undirected graph with |V| = n, and d_i is the degree of vertex i
  – A_{ij} is the adjacency matrix and P is the transition matrix, with P_{ij} = A_{ij}/d_i; its stationary distribution is π_i = d_i/(2|E|)
  – f: V → {0, 1} marks the vertices that have a given property, and p = Σ_{v∈V} π_v f(v)
  – Convergence rate: the rate at which the RW approaches the stationary distribution
  – Cover time: the time by which all nodes have been visited
  – Trajectory sample average: the rate at which the value of f, averaged over successive vertices of the RW trajectory, approaches p
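The definitions above can be checked on a small graph. The sketch below uses exact fractions. It builds P_{ij} = A_{ij}/d_i for a simple random walk and π_i = d_i/(2|E|), verifies that πP = π, and computes p for an indicator f. The example graph and the function names are our own.

```python
from fractions import Fraction

def transition_matrix(adj):
    """P[i][j] = A[i][j] / d_i for the simple random walk."""
    n = len(adj)
    return [[Fraction(adj[i].count(j), len(adj[i])) for j in range(n)] for i in range(n)]

def stationary(adj):
    """pi_i = d_i / (2|E|)."""
    two_m = sum(len(nbrs) for nbrs in adj)
    return [Fraction(len(nbrs), two_m) for nbrs in adj]

def property_mass(pi, f):
    """p = sum_v pi_v f(v) for an indicator f: V -> {0, 1}."""
    return sum(pi_v for pi_v, fv in zip(pi, f) if fv)

# Small irregular example: edges 0-1, 0-2, 0-3, 1-2 (adjacency lists).
adj = [[1, 2, 3], [0, 2], [0, 1], [0]]
P, pi = transition_matrix(adj), stationary(adj)
piP = [sum(pi[i] * P[i][j] for i in range(len(adj))) for j in range(len(adj))]
print(pi, piP == pi, property_mass(pi, [0, 1, 1, 0]))
```

In this example π is proportional to degree, not uniform. So p weights the marked vertices by their degrees.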

29

Statistical Estimation & Random Walks

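• The convergence rate is governed by the second eigenvalue λ_2 of P (1). With y_t the vertex visited by the RW at time t, the distance from stationarity is

$$\Delta(t) = \max_{S \subseteq V} \big|\Pr[y_t \in S] - \pi(S)\big|,$$

which decreases geometrically in t at a rate governed by λ_2, with a dependence on π_min = min_v π_v.

• Cover time (2): suppose π_min = Ω(1/n) and the spectral gap 1 − λ_2 is bounded away from 0. Then the expected cover time is E[C] = O(n log n), and the expected time to visit αn distinct vertices is E[C_{αn}] = O(n). These match E[T_n] and E[T_{αn}] for independent uniform sampling.

• Trajectory sample average (3): discard the first τ steps, where τ depends on π_min and λ_2. The average Y = (1/k) Σ_{t=τ+1}^{τ+k} f(y_t) over the next k steps then satisfies a Chernoff-type bound: Pr[|Y − p| ≥ εp] decays exponentially in k(1 − λ_2).

(1): [11], (2): [12, 13], (3): [3, 4, 5, 6]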
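For a concrete instance of the convergence rate, take the complete graph K_4. The walk's transition matrix has eigenvalue 1 and a triple eigenvalue −1/3. Δ(t) is the largest gap between Pr[y_t ∈ S] and π(S) over all sets S, which equals the total variation distance (1/2) Σ_v |Pr[y_t = v] − π_v|. For a walk started at vertex 0, Δ(t) is exactly (3/4)(1/3)^t.

The sketch below iterates the walk's distribution with exact fractions. The choice of graph and the helper names are ours.

```python
from fractions import Fraction

def step(dist, adj):
    """One step of the simple random walk: new[j] = sum_i dist[i] * A[i][j] / d_i."""
    new = [Fraction(0)] * len(adj)
    for i, nbrs in enumerate(adj):
        for j in nbrs:
            new[j] += dist[i] / len(nbrs)
    return new

def delta(dist, pi):
    """max_S |Pr[y_t in S] - pi(S)|, i.e. the total variation distance."""
    return sum(abs(a - b) for a, b in zip(dist, pi)) / 2

n = 4
adj = [[j for j in range(n) if j != i] for i in range(n)]   # complete graph K_4
pi = [Fraction(1, n)] * n                                     # regular graph: uniform pi
dist = [Fraction(1)] + [Fraction(0)] * (n - 1)                # walk starts at vertex 0
for t in range(1, 6):
    dist = step(dist, adj)
    print(t, delta(dist, pi))   # equals (3/4) * (1/3)**t
```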

30

Statistical Estimation & Random Walks

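• Second eigenvalue, expansion and conductance
  – For S ⊆ V, C(S) is the cutset of S (the edges with one endpoint in S and the other in V∖S), and vol(S) is the sum of the degrees of the vertices in S

• Expansion:

$$\phi = \min_{S \subseteq V,\; |S| \le |V|/2} \frac{|C(S)|}{|S|}$$

• Conductance:

$$\Phi = \min_{S \subseteq V,\; \mathrm{vol}(S) \le \mathrm{vol}(V)/2} \frac{|C(S)|}{\mathrm{vol}(S)}$$

• Known bound:

$$1 - 2\Phi \le \lambda_2 \le 1 - \frac{\Phi^2}{2}$$

[11, 14, 15, 16, 17, 18, 19]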
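Both minima can be computed by brute force on a tiny graph and checked against the Cheeger-type inequality 1 − 2Φ ≤ λ_2 ≤ 1 − Φ²/2. The sketch below uses the cycle C_6. For C_n, the second eigenvalue of the walk is cos(2π/n), a standard closed form. The graph choice and the function names are ours, and the search over all subsets is exponential, so it only works for very small graphs.

```python
import math
from itertools import combinations

def cycle(n):
    return [[(i - 1) % n, (i + 1) % n] for i in range(n)]

def cut_size(adj, S):
    """|C(S)|: number of edges with exactly one endpoint in S."""
    return sum(1 for u in S for v in adj[u] if v not in S)

def expansion_and_conductance(adj):
    """Brute-force phi and Phi over all non-empty proper subsets S."""
    n = len(adj)
    vol_V = sum(len(a) for a in adj)
    phi = Phi = math.inf
    for size in range(1, n):
        for S in map(set, combinations(range(n), size)):
            c = cut_size(adj, S)
            vol_S = sum(len(adj[u]) for u in S)
            if size <= n / 2:
                phi = min(phi, c / size)
            if vol_S <= vol_V / 2:
                Phi = min(Phi, c / vol_S)
    return phi, Phi

n = 6
phi, Phi = expansion_and_conductance(cycle(n))
lam2 = math.cos(2 * math.pi / n)          # second eigenvalue of the walk on C_n
print(phi, Phi, lam2, 1 - 2 * Phi <= lam2 <= 1 - Phi ** 2 / 2)
```

For C_6, the worst set is an arc of three vertices. It has 2 cut edges, |S| = 3 and vol(S) = 6, which gives φ = 2/3 and Φ = 1/3.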

31

Searching

• Performance metrics for flooding and RW:
  – average number of distinct copies of an item located by the search
  – number of messages used by the search algorithm

• RW performs better than flooding when there are:
  – multiple search requests for the same item in a slowly changing topology
  – peer clustering (see [20, 21, 22, 23, 24, 25] for details)

• Searching analysis:
  – methodology
  – flat topologies with uniformly distributed content
  – topologies with peer clustering
  – re-issuing the same query
  – real topologies

32

Searching – Methodology

• Performance metrics:
  – mean number of distinct copies found (Mean)
  – spread around the mean (Std) and the failure probability

• Cost:
  – number of messages or queries issued during the search

• Peer-to-peer topologies (≈ 1 million nodes):
  – flat regular expanders, two-tier topologies with clustering, power-law graphs, samples from real topologies

• Dynamic topologies:
  – rewiring

• Content placement:
  – content clustering affects search performance

33

Searching – Flat Topologies

• Experiment:
  – one request in a network of 500K peers
  – mean hits, minimum number of hits and Std are similar for flooding and RW
  – the entire distribution of hits is similar for flooding and RW

34

Searching – Topologies with Peer Clustering

• The cluster topology consists of:
  – 5 flat regular graphs of 40K nodes each; 1000 nodes are picked at random from each of them to form another flat regular graph

• The number of hits for RW is more concentrated around the mean than for flooding

35

Searching – Reissuing the Same Query

• Experimental setup (the procedure below is repeated 4 times):
  – each peer sends a request and waits for responses
  – between requests, 2% of the links are rewired
  – each peer initiates a new search

• RW performs better than flooding in mean hits and failure probability

36

Searching - Reissuing the Same Query

• The performance of successive searches depends on the number of topology changes between consecutive searches

• The performance of flooding increases as the rate of topology changes increases

• RW performance remains the same for small variations

37

Searching – Real Topologies

• The number of hits for RW is more concentrated around the mean than for flooding

• P2P topologies have good expansion properties

38

Construction

• P2P network construction is concerned with:
  – peers arriving at and leaving the network dynamically
  – strong and weak decentralization
  – low network overhead per addition or deletion

39

Baseline Construction of Expander Graphs

• ABASE (undirected graph):
  – each of the n vertices chooses d vertices at random
  – the total number of edges is nd, and the expected vertex degree is 2d

• Theorem 4.1: Let G(V, E) be a graph constructed by ABASE. Then G is an expander with high probability: for a positive constant α < 1,

$$\Pr\left[\min_{S \subseteq V,\; |S| \le |V|/2} \frac{|C(S)|}{|S|} \ge \alpha\right] = 1 - o(1)$$
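The ABASE construction is simple to simulate: each vertex picks d random vertices and links to them. The sketch below builds such a graph and checks the edge count nd and the mean degree 2d. It then evaluates the cut ratio |C(S)|/|S| on random half-size sets. This sketch makes several assumptions:

- Self-loops and repeated picks are not filtered out.
- Uniformly chosen halves are not the worst-case sets in the theorem's minimum, so the printed value only illustrates the typical case.
- The function names are hypothetical.

```python
import random

def a_base(n, d, rng):
    """ABASE sketch: vertex u links to d vertices chosen uniformly at random
    (self-loops and repeats are not filtered in this simple version)."""
    return [(u, rng.randrange(n)) for u in range(n) for _ in range(d)]

def degrees(n, edges):
    deg = [0] * n
    for u, v in edges:
        deg[u] += 1
        deg[v] += 1
    return deg

def cut_ratio(edges, S):
    """|C(S)| / |S|: edges with exactly one endpoint in S, per vertex of S."""
    return sum((u in S) != (v in S) for u, v in edges) / len(S)

n, d = 2000, 3
rng = random.Random(11)
edges = a_base(n, d, rng)
deg = degrees(n, edges)
print(len(edges), sum(deg) / n)          # 6000 6.0  (n*d edges, mean degree 2d)
samples = [set(rng.sample(range(n), n // 2)) for _ in range(50)]
print(min(cut_ratio(edges, S) for S in samples))
```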

40

Baseline Construction of Expander Graphs with Constant Overhead in Random Bits

• A’BASE construction algorithm:
  – start a RW at a random vertex of H, a constant-degree expander graph
  – whenever ABASE needs a random number, take it from the RW on H

• Theorem 4.2: Let G(V, E) be a graph constructed by A’BASE. There are positive constants α and 0 < β < 0.5 such that, almost surely, every subset S with at least β|V| and at most 0.5|V| vertices has cutset expansion α:

$$\Pr\left[\min_{S \subseteq V,\; \beta|V| \le |S| \le |V|/2} \frac{|C(S)|}{|S|} \ge \alpha\right] = 1 - o(1)$$

41

Distributed Construction of Expanders with Constant Overhead on Network Resources

• A’H construction:
  – d daemons, one for each Hamilton cycle
  – a newly arriving node contacts the daemon associated with the i-th Hamilton cycle
  – after c steps, the node is inserted into cycle i between the peer that currently hosts daemon i and one of its neighbors in that cycle
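The A’H idea can be sketched as an overlay that is the union of d Hamilton cycles, with one daemon per cycle. A joining peer is spliced into each cycle next to the peer that currently hosts that cycle's daemon. Splicing keeps every cycle Hamiltonian, so each peer always has exactly 2d overlay neighbours.

The slides do not say what the "c steps" are. This sketch assumes they are c random-walk steps taken by the daemon along its cycle before the splice. The class and method names are hypothetical.

```python
import random

class HamiltonCycleOverlay:
    """Hypothetical A'H sketch: d Hamilton cycles over the peers, one daemon each."""

    def __init__(self, d, c, rng):
        self.d, self.c, self.rng = d, c, rng
        # Every cycle starts as 0 -> 1 -> 2 -> 0.
        self.succ = [{0: 1, 1: 2, 2: 0} for _ in range(d)]
        self.pred = [{0: 2, 1: 0, 2: 1} for _ in range(d)]
        self.daemon = [0] * d                     # peer hosting daemon i

    def join(self, new):
        for i in range(self.d):
            succ, pred = self.succ[i], self.pred[i]
            host = self.daemon[i]
            for _ in range(self.c):               # assumed: daemon walks c random steps
                host = succ[host] if self.rng.random() < 0.5 else pred[host]
            nxt = succ[host]                      # splice: host -> new -> nxt
            succ[host], succ[new] = new, nxt
            pred[nxt], pred[new] = new, host
            self.daemon[i] = host

    def neighbours(self, u):
        """Successor and predecessor of u on each cycle (a multiset of size 2d)."""
        return [self.succ[i][u] for i in range(self.d)] + [self.pred[i][u] for i in range(self.d)]

    def cycle_length(self, i):
        """Peers met when following cycle i from peer 0 back to peer 0."""
        length, u = 1, self.succ[i][0]
        while u != 0:
            length, u = length + 1, self.succ[i][u]
        return length

overlay = HamiltonCycleOverlay(d=3, c=5, rng=random.Random(2))
for peer in range(3, 100):
    overlay.join(peer)
print([overlay.cycle_length(i) for i in range(3)])   # [100, 100, 100]
```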

42

Distributed Construction of Expanders with Constant Overhead on Network Resources

• A’M construction:
  – d daemons, one for each Hamilton cycle
  – a new arrival consists of two nodes, X and Y; X and Y contact the central server to discover the location of the d daemons
  – X becomes a neighbor of daemon i, and Y becomes a neighbor of the daemon's initial neighbor

43

Summary

• For searching: random walks (RW) are superior to flooding

• For construction: RW add new peers with constant overhead

• Open problems:
  – a strongly decentralized construction algorithm
  – can deletions and the expansion of small sets be handled better?
  – how do P2P network parameters (e.g. capacities) affect the performance of RW?