Presentation by Nanda Kishore Lella Lella.2@wright

1

A Measurement Study of Peer-to-Peer File Sharing Systems

by

Stefan Saroiu, P. Krishna Gummadi, and Steven D. Gribble

Presentation by

Nanda Kishore Lella

2

Outline

• P2P Overview
– What is a peer?
– Example applications
– Benefits of P2P

• P2P Content Sharing
– Challenges
– Group management/data placement approaches
– Measurement studies

• Conclusion

3

What is Peer-to-Peer (P2P)?

• Most people think of P2P as music sharing

Examples:

• Napster

• Gnutella

4

What is a peer?

• Contrasted with the client-server model

• Servers are centrally maintained and administered

• A client has fewer resources than a server

5

What is a peer?

• A peer’s resources are similar to the resources of the other participants

• P2P – peers communicating directly with other peers and sharing resources

6

P2P Application Taxonomy

P2P Systems

• Distributed Computing

• File Sharing

• Collaboration

• Platforms – JXTA

7

P2P Goals/Benefits

• Cost sharing

• Resource aggregation

• Improved scalability/reliability

• Increased autonomy

• Anonymity/privacy

• Dynamism

• Ad-hoc communication

8

P2P File Sharing

• Content exchange – Gnutella

• File systems – OceanStore

• Filtering/mining – OpenCola

9

Research Areas

• Peer discovery and group management

• Data location and placement

• Reliable and efficient file exchange

• Security/privacy/anonymity/trust

10

Current Research

• Group management and data placement – Chord, CAN, Tapestry, Pastry

• Anonymity – Publius

• Performance studies – Gnutella measurement study, etc.

11

Management/Placement Challenges

• Per-node state

• Bandwidth usage

• Search time

• Fault tolerance/resiliency

12

Approaches

• Centralized

• Flooding

• Document Routing

13

Centralized

• Napster model

• Benefits:
– Efficient search
– Limited bandwidth usage
– Efficient network handling

• Drawbacks:
– Central point of failure
– Limited scale

[Figure: centralized lookup among peers Bob, Alice, Jane, and Judy]
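The centralized approach can be shown as a single index that maps file names to the peers sharing them. This is a minimal sketch that assumes an in-memory dictionary on the server. The class, method, and file names are hypothetical. Search costs one lookup, and all state lives on one machine, which is the single point of failure listed above.

```python
from collections import defaultdict


class CentralIndex:
    """Toy Napster-style index (hypothetical): file name -> peers sharing it."""

    def __init__(self):
        self._files = defaultdict(set)  # file name -> set of peer names

    def register(self, peer, filenames):
        # A peer uploads its list of shared files to the server on login.
        for name in filenames:
            self._files[name].add(peer)

    def unregister(self, peer):
        # A departing peer must be removed from every entry it appears in.
        for peers in self._files.values():
            peers.discard(peer)

    def search(self, filename):
        # One lookup at the server; the download itself would go peer to peer.
        return sorted(self._files.get(filename, set()))


index = CentralIndex()
index.register("Alice", ["song.mp3", "talk.mp3"])
index.register("Bob", ["song.mp3"])
print(index.search("song.mp3"))  # ['Alice', 'Bob']
index.unregister("Alice")
print(index.search("song.mp3"))  # ['Bob']
```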

14

Flooding

• Gnutella model

• Benefits:
– No central point of failure
– Limited per-node state

• Drawbacks:
– Slow searches
– Bandwidth intensive

[Figure: query flooding among peers Bob, Alice, Jane, Judy, and Carl]
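Flooding can be sketched as a breadth-first search that stops after a hop limit (TTL). This is a minimal sketch. The overlay links among the peers named in the figure and the file name are hypothetical. The shared `seen` set stands in for each peer dropping a query it has already handled. The sketch counts every forwarded message, including duplicates, because each one consumes bandwidth.

```python
from collections import deque


def flood_search(graph, shared, start, filename, ttl):
    """Flood a query from `start`, forwarding at most `ttl` hops.

    graph:  peer -> list of neighbor peers (hypothetical overlay)
    shared: peer -> set of file names the peer shares
    Returns (peers holding the file, number of query messages sent).
    """
    seen = {start}
    frontier = deque([(start, ttl, None)])  # (peer, hops left, sender)
    hits, messages = [], 0
    while frontier:
        peer, remaining, sender = frontier.popleft()
        if filename in shared.get(peer, set()):
            hits.append(peer)
        if remaining == 0:
            continue  # TTL exhausted: do not forward further
        for neighbor in graph[peer]:
            if neighbor == sender:
                continue  # do not send the query straight back
            messages += 1  # each forward costs bandwidth, even if it is a duplicate
            if neighbor not in seen:
                seen.add(neighbor)
                frontier.append((neighbor, remaining - 1, peer))
    return hits, messages


graph = {
    "Bob": ["Alice", "Jane"],
    "Alice": ["Bob", "Judy"],
    "Jane": ["Bob", "Judy"],
    "Judy": ["Alice", "Jane", "Carl"],
    "Carl": ["Judy"],
}
shared = {"Carl": {"song.mp3"}}
print(flood_search(graph, shared, "Bob", "song.mp3", ttl=2))  # ([], 4)
print(flood_search(graph, shared, "Bob", "song.mp3", ttl=3))  # (['Carl'], 6)
```

With TTL 2 the query never reaches Carl. Raising the TTL finds the file but costs more messages, which is the slow-search and bandwidth trade-off listed above.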

15

Document Routing

• Freenet, Chord, CAN, Tapestry, Pastry model

• Benefits:
– More efficient searching
– Limited per-node state

• Drawbacks:
– Limited fault tolerance vs. redundancy

[Figure: a query for ID 212 routed among nodes 001, 012, 212, 305, and 332]

16

Document Routing – CAN

• Associate to each node and item a unique ID in a d-dimensional space

• Goals
– Scales to hundreds of thousands of nodes
– Handles rapid arrival and failure of nodes

• Properties
– Routing table size O(d)
– Guarantees that a file is found in at most $d \cdot n^{1/d}$ steps, where n is the total number of nodes

Slide modified from another presentation
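Here is a worked example of the CAN bound $d \cdot n^{1/d}$ for a hypothetical network of $n = 10^6$ nodes. A higher dimensionality shortens the worst-case path, while the O(d) routing table grows.

```latex
n = 10^6:\qquad
d = 2:\ 2 \cdot 10^{6/2} = 2000, \qquad
d = 3:\ 3 \cdot 10^{6/3} = 300, \qquad
d = 6:\ 6 \cdot 10^{6/6} = 60
```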

17

CAN Example: Two Dimensional Space

• Space divided between nodes

• Together, the nodes cover the entire space

• Each node covers either a square or a rectangular area

• Example:
– Node n1:(1, 2), the first node to join, covers the entire space

[Figure: 8 × 8 grid, axes 0 to 7; node n1 owns the whole space]


18


• Node n2:(4, 2) joins; the space is divided between n1 and n2

[Figure: grid divided into zones for nodes n1 and n2]


19


• Node n3:(3, 5) joins; the space is divided again

[Figure: grid divided into zones for nodes n1, n2, and n3]


20


• Nodes n4:(5, 5) and n5:(6, 6) join

[Figure: grid divided into zones for nodes n1 to n5]


21


• Nodes: n1:(1, 2); n2:(4, 2); n3:(3, 5); n4:(5, 5); n5:(6, 6)

• Items: f1:(2, 3); f2:(5, 1); f3:(2, 1); f4:(7, 5)

[Figure: grid with zones for nodes n1 to n5 and items f1 to f4]


22


• Each item is stored by the node that owns the zone containing the item's point in the space

[Figure: grid with zones for nodes n1 to n5; each item f1 to f4 is stored at the node owning its point]
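The join-and-store steps above can be sketched with zones as rectangles on an 8 × 8 space. This is a minimal sketch under assumptions that do not come from the slides. A joining node halves the zone containing its point along the zone's longer side (along x on ties) and takes the half that contains its point. The space does not wrap around, whereas the real CAN coordinate space is a torus. The `Zone`, `join`, and `owner_of` names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Zone:
    owner: str
    x0: float
    y0: float
    x1: float
    y1: float

    def contains(self, p):
        # Half-open rectangle [x0, x1) x [y0, y1)
        return self.x0 <= p[0] < self.x1 and self.y0 <= p[1] < self.y1


def join(zones, name, point, size=8):
    """Add node `name` at `point` by splitting the zone that contains it (assumed rule)."""
    if not zones:
        zones.append(Zone(name, 0, 0, size, size))  # first node owns everything
        return
    old = next(z for z in zones if z.contains(point))
    if (old.x1 - old.x0) >= (old.y1 - old.y0):  # split the longer side, x on ties
        mid = (old.x0 + old.x1) / 2
        low = Zone(old.owner, old.x0, old.y0, mid, old.y1)
        high = Zone(old.owner, mid, old.y0, old.x1, old.y1)
    else:
        mid = (old.y0 + old.y1) / 2
        low = Zone(old.owner, old.x0, old.y0, old.x1, mid)
        high = Zone(old.owner, old.x0, mid, old.x1, old.y1)
    new_half, kept_half = (low, high) if low.contains(point) else (high, low)
    new_half.owner = name  # joining node takes the half holding its point
    zones.remove(old)
    zones.extend([kept_half, new_half])


def owner_of(zones, point):
    """An item is stored by the node whose zone contains the item's point."""
    return next(z.owner for z in zones if z.contains(point))


zones = []
for name, pt in [("n1", (1, 2)), ("n2", (4, 2)), ("n3", (3, 5)),
                 ("n4", (5, 5)), ("n5", (6, 6))]:
    join(zones, name, pt)

items = {"f1": (2, 3), "f2": (5, 1), "f3": (2, 1), "f4": (7, 5)}
placement = {f: owner_of(zones, p) for f, p in items.items()}
print(placement)  # {'f1': 'n1', 'f2': 'n2', 'f3': 'n1', 'f4': 'n5'}
```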


23

CAN: Query Example

• Each node knows its neighbors in the d-space

• Forward the query to the neighbor that is closest to the query ID

• Can route around some failures
– Some failures require local flooding

• Example: assume n1 queries f4

[Figure: grid with zones for nodes n1 to n5 and items f1 to f4]


24

CAN: Query Example

[Figure: the query from n1 for f4 advancing across the grid]


25

CAN: Query Example

[Figure: the query from n1 for f4 advancing across the grid]


26

CAN: Query Example

[Figure: the query from n1 for f4 advancing across the grid]
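The greedy forwarding rule (send the query to the neighbor closest to the target point) can be sketched as follows. The zone layout is hypothetical: it is what results from halving zones as n1 to n5 join at the slides' coordinates, and it is not read from the figures. The sketch measures distance from the target point to each neighbor's rectangle, ignores wraparound, and has no failure handling beyond a hop limit.

```python
import math

# Hypothetical final zones (x0, y0, x1, y1) for the slide example.
ZONES = {
    "n1": (0, 0, 4, 4),
    "n2": (4, 0, 8, 4),
    "n3": (0, 4, 4, 8),
    "n4": (4, 4, 6, 8),
    "n5": (6, 4, 8, 8),
}


def overlaps(a0, a1, b0, b1):
    return min(a1, b1) > max(a0, b0)


def are_neighbors(a, b):
    """Zones are neighbors if they touch along one axis and overlap along the other."""
    ax0, ay0, ax1, ay1 = a
    bx0, by0, bx1, by1 = b
    touch_x = (ax1 == bx0 or bx1 == ax0) and overlaps(ay0, ay1, by0, by1)
    touch_y = (ay1 == by0 or by1 == ay0) and overlaps(ax0, ax1, bx0, bx1)
    return touch_x or touch_y


def contains(zone, p):
    x0, y0, x1, y1 = zone
    return x0 <= p[0] < x1 and y0 <= p[1] < y1


def distance_to_zone(zone, p):
    """Euclidean distance from point p to the rectangle (0 if p is inside)."""
    x0, y0, x1, y1 = zone
    dx = max(x0 - p[0], 0, p[0] - x1)
    dy = max(y0 - p[1], 0, p[1] - y1)
    return math.hypot(dx, dy)


def route(start, target, max_hops=16):
    """Greedily forward toward `target`; return the list of nodes visited."""
    path, current = [start], start
    while not contains(ZONES[current], target):
        if len(path) > max_hops:
            raise RuntimeError("hop limit reached without finding the target zone")
        neighbors = [n for n in ZONES
                     if n != current and are_neighbors(ZONES[current], ZONES[n])]
        current = min(neighbors, key=lambda n: distance_to_zone(ZONES[n], target))
        path.append(current)
    return path


print(route("n1", (7, 5)))  # ['n1', 'n2', 'n5']
```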


27

Node Failure Recovery

• Simple failures
– Know your neighbor’s neighbors
– When a node fails, one of its neighbors takes over its zone

• More complex failure modes
– Simultaneous failure of multiple adjacent nodes
– Scoped flooding to discover neighbors
– Hopefully, a rare event


28

Document Routing – Chord

• MIT project

• Uni-dimensional ID space

• Keep track of log N nodes

• Search through log N nodes to find desired key

[Figure: Chord ring with nodes N5, N10, N20, N32, N60, N80, N99, N110 and key K19]

29

Document Routing – Chord(2)

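[Figure: Chord ring with nodes N5, N10, N20, N32, N60, N80, N99, N110 and key K19]

• Each node and key is assigned an ID.

• If a node needs a key, it searches its table (about log N entries) for the key.

• If that fails, it forwards the query to the last (farthest) node in its table that precedes the key, and that node repeats the process until the key is found.

• Search through log N nodes to find desired key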
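The lookup described above can be sketched with Chord's finger table, where finger i of node n points at successor(n + 2^i). This is a minimal sketch. It assumes a 7-bit identifier space (IDs 0 to 127) so that the ring nodes from the figure fit. It also assumes a global sorted node list to compute successors, whereas a real Chord node only knows its own fingers. Starting the lookup for K19 at N80 is a hypothetical choice.

```python
M = 7                                              # assumed ID bits: IDs 0..127
RING = 2 ** M
NODES = sorted([5, 10, 20, 32, 60, 80, 99, 110])   # node IDs from the figure


def successor(ident):
    """First node whose ID is >= ident, wrapping around the ring."""
    ident %= RING
    for node in NODES:
        if node >= ident:
            return node
    return NODES[0]


def between(x, a, b, inclusive_right=False):
    """True if x lies on the ring after a and before b (or at b if inclusive)."""
    if inclusive_right and x == b:
        return True
    if a < b:
        return a < x < b
    return x > a or x < b  # interval wraps past 0


def fingers(n):
    """Finger i points at successor(n + 2**i); each node keeps only M entries."""
    return [successor(n + 2 ** i) for i in range(M)]


def lookup(start, key):
    """Return (node responsible for key, nodes visited along the way)."""
    node, path = start, [start]
    while not between(key, node, successor(node + 1), inclusive_right=True):
        # Jump to the farthest finger that still precedes the key.
        node = next((f for f in reversed(fingers(node)) if between(f, node, key)),
                    successor(node + 1))
        path.append(node)
    return successor(node + 1), path


print(lookup(80, 19))  # (20, [80, 5, 10])
```

With 8 nodes the query visits 3 nodes before reaching N20, the successor of K19. This matches the log N behavior on the slide for this small ring.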

30

Doc Routing – Tapestry/Pastry

• Global mesh of meshes

• Suffix-based routing

• Uses underlying network distance in constructing mesh

[Figure: mesh of nodes 13FE, ABFE, 1290, 239E, 73FE, 9990, F990, 993E, 04FE, 43FE]

31

Naming in Tapestry

[Figure: the same mesh of nodes as on the previous slide]

• Every node has a four-digit hexadecimal name, similar to an IP address

• Each digit in the name can take 16 values (0 to F)

• The keys stored at a node are determined by the node's name

32

Tapestry Routing

[Figure: a message addressed to 4598 routed among nodes 6789, B4F8, 9098, 7598, 4598, and B437]
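Suffix-based routing resolves the destination one trailing digit at a time, so each hop matches at least one more digit of 4598. This is a minimal sketch using the node names from the figure. It assumes every node can see the full node list, which stands in for Tapestry's per-level neighbor maps. The next-hop rule (fewest extra matching digits, then lowest name) is a hypothetical tie-break and not Tapestry's network-distance-aware choice.

```python
def shared_suffix(a, b):
    """Number of trailing hex digits that a and b have in common."""
    count = 0
    for x, y in zip(reversed(a), reversed(b)):
        if x != y:
            break
        count += 1
    return count


def suffix_route(source, dest, nodes):
    """Route from source to dest, extending the matched suffix on every hop."""
    path, current = [source], source
    while current != dest:
        level = shared_suffix(current, dest)
        candidates = [n for n in nodes if shared_suffix(n, dest) > level]
        if not candidates:
            raise LookupError(f"no node extends the suffix match from {current}")
        # Resolve as few extra digits as possible per hop (digit-by-digit routing).
        current = min(candidates, key=lambda n: (shared_suffix(n, dest), n))
        path.append(current)
    return path


nodes = ["6789", "B4F8", "9098", "7598", "4598", "B437"]
print(suffix_route("6789", "4598", nodes))
# ['6789', 'B4F8', '9098', '7598', '4598']
```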

33

Remaining Problems?

• Hard to handle highly dynamic environments

• Methods don’t consider peer characteristics

34

Measurement Studies

Gnutella vs. Napster

35

[Figure: Napster and Gnutella architectures side by side, with servers at napster.com. Legend: P = peer, S = server, Q = query, R = response, D = file download]

36

Methodology

Two stages:

1. Periodically crawl Gnutella/Napster
• Discover peers and their metadata

2. Feed output from the crawl into measurement tools:
• Bottleneck bandwidth – SProbe
• Latency – SProbe
• Peer availability – LF
• Degree of content sharing – Napster crawler

37

Crawling

• May 2001

• Napster crawl
– Query the index server and keep track of results
– Query about returned peers
– Doesn’t capture users sharing unpopular content

• Gnutella crawl
– Send out ping messages with large TTL

38

39

Measurement Study

• How many peers are server-like…client-like?

• Bandwidth, latency

• Connectivity

• Who is sharing what?

40

41

Graph results

• CDF: cumulative distribution function

• From this graph, we see that while 78% of the participating peers have downstream bottleneck bandwidths of at least 1000 Kbps, only 8% of the peers have upstream bottleneck bandwidths of at least 10 Mbps.

• 22% of the participating peers have upstream bottleneck bandwidths of 100 Kbps or less.
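Statements like "78% have at least X" and "22% have X or less" are read off an empirical CDF. This is a minimal sketch of both readings. The bandwidth samples are hypothetical and are not data from the study.

```python
from bisect import bisect_left, bisect_right


def fraction_at_most(samples, x):
    """Empirical CDF F(x): fraction of samples <= x."""
    ordered = sorted(samples)
    return bisect_right(ordered, x) / len(ordered)


def fraction_at_least(samples, x):
    """Complement used in 'at least x' statements: fraction of samples >= x."""
    ordered = sorted(samples)
    return (len(ordered) - bisect_left(ordered, x)) / len(ordered)


# Hypothetical downstream bandwidths in Kbps (not measurements from the study).
kbps = [56, 56, 128, 384, 768, 1000, 1500, 3000, 10000, 45000]
print(fraction_at_most(kbps, 100))    # 0.2
print(fraction_at_least(kbps, 1000))  # 0.5
```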

42

43

Reported Bandwidth

44

Graph results

• About 25% of Napster users are connected with modems (64 Kbps or less), while the percentage of Gnutella users with similar connectivity is as low as 8%.

• 50% of the users in Napster and 60% of the users in Gnutella use broadband connections.

• Only about 20% of the users in Napster and 30% of the users in Gnutella have very high bandwidth connections.

• Overall, Gnutella users on average tend to have higher downstream bottleneck bandwidths than Napster users.

45

46

Graph results

• Approximately 20% of the peers have latencies of at least 280 ms.

• Another 20% have latencies of at most 70 ms.

47

48

Graph results

• This graph shows two clusters: a smaller one at (20 to 60 Kbps, 100 to 1,000 ms) and a larger one at (over 1,000 Kbps, 60 to 300 ms).

• The horizontal lines in the graph suggest that latency also depends on the peer's location relative to the measuring system.

49

Measured Uptime

50

51

Number of Shared Files

52

53

Correlation of Free-Riding with B/W

[Figure: CDF of number of downloads per reported bandwidth (Napster); x-axis: number of downloads (0 to 100), y-axis: percentage of hosts; series: Unknown, Modem + ISDN, Dual ISDN + Cable + DSL, T1 + T3]

[Figure: CDF of number of uploads per reported bandwidth (Napster); x-axis: number of uploads (0 to 100), y-axis: percentage of hosts; series: Unknown, Modem + ISDN, Dual ISDN + Cable + DSL, T1 + T3]

54

55

56

[Figure: CDFs of downstream bottleneck bandwidths for all Napster users and for all users who reported unknown bandwidths; x-axis: measured (SProbe) downstream bottleneck bandwidth in Kbps (1 to 100,000), y-axis: percentage of hosts]

57

Power law

• A connected cluster of peers that spans the entire network survives even in the presence of a large percentage p of random peer breakdowns, where p can be as large as:

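$$p \le 1 + \left(1 - \frac{\alpha - 2}{3 - \alpha}\, m^{\alpha - 2} K^{3 - \alpha}\right)^{-1}$$

where m is the minimum node degree, K is the maximum node degree, and α < 3 is the exponent of the power-law degree distribution.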
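The power-law resilience bound $p \le 1 + \left(1 - \frac{\alpha - 2}{3 - \alpha} m^{\alpha - 2} K^{3 - \alpha}\right)^{-1}$ can be evaluated directly. This is a minimal sketch. It assumes 2 < α < 3, the range where the ratio term is positive. The inputs m = 1, K = 100, α = 2.5 are hypothetical and are not values from the study.

```python
def max_random_failure_fraction(m, K, alpha):
    """Largest fraction p of random peer failures the spanning cluster survives.

    m: minimum node degree, K: maximum node degree, alpha: power-law exponent.
    Assumes 2 < alpha < 3.
    """
    if not 2 < alpha < 3:
        raise ValueError("sketch assumes 2 < alpha < 3")
    kappa = (alpha - 2) / (3 - alpha) * m ** (alpha - 2) * K ** (3 - alpha)
    return 1 + 1 / (1 - kappa)


print(round(max_random_failure_fraction(m=1, K=100, alpha=2.5), 3))  # 0.889
```

For these inputs the degree term equals 10, so p = 1 - 1/9. Even with a maximum degree of only 100, the network could lose close to 89% of its peers at random and still keep a spanning cluster.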

58

59

Gnutella

Fri Feb 16 05:21:52-05:23:22 PST, 1771 hosts

Popular sites:

• 212.239.171.174

• adams-00-305a.Stanford.EDU

• 0.0.0.0

60

30% random failures

1771 – 471 – 294 hosts Fri Feb 16 05:21:52-05:23:22 PST

61

4% orchestrated failures

Fri Feb 16 05:21:52-05:23:22 PST, 1771 - 63 hosts
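The contrast between random and orchestrated failures can be illustrated with a toy experiment. Remove the same number of peers either at random or by highest degree, then measure the largest connected component that survives. The hub-and-leaf topology below is hypothetical and is not the crawled Gnutella graph. The random seed is arbitrary.

```python
import random
from collections import deque


def largest_component(graph, removed):
    """Size of the largest connected component after deleting `removed` nodes."""
    best, seen = 0, set(removed)
    for source in graph:
        if source in seen:
            continue
        seen.add(source)
        size, queue = 0, deque([source])
        while queue:
            node = queue.popleft()
            size += 1
            for nb in graph[node]:
                if nb not in seen:
                    seen.add(nb)
                    queue.append(nb)
        best = max(best, size)
    return best


def hub_graph(hubs=3, leaves_per_hub=10):
    """Hypothetical topology: fully connected hubs, each with its own leaf peers."""
    graph = {f"h{i}": set() for i in range(hubs)}
    for i in range(hubs):
        for j in range(hubs):
            if i != j:
                graph[f"h{i}"].add(f"h{j}")
        for k in range(leaves_per_hub):
            leaf = f"h{i}-leaf{k}"
            graph[leaf] = {f"h{i}"}
            graph[f"h{i}"].add(leaf)
    return graph


graph = hub_graph()
k = 3  # number of peers that fail
targeted = sorted(graph, key=lambda n: len(graph[n]), reverse=True)[:k]
randomly = random.Random(0).sample(sorted(graph), k)
print("targeted:", largest_component(graph, targeted))  # 1
print("random:  ", largest_component(graph, randomly))
```

Removing the three highest-degree peers leaves only isolated leaves. Three random removals usually hit leaves and leave most of the network connected.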

62

Results Overview

• Lots of heterogeneity between peers
– Systems should consider peer capabilities

• Peers lie
– Systems must be able to verify reported peer capabilities or measure true capabilities

63

Points of Discussion

• Is it all hype?

• Should P2P be a research area?

• Do P2P applications/systems have common research questions?

• What are the “killer apps” for P2P systems?

64

Conclusion

• P2P is an interesting and useful model

• There are lots of technical challenges to be solved
