Seminar: Information Management in the Web

Date post: 13-Jan-2016
Page 1: Seminar:  Information Management in the Web


Seminar: Information Management in the Web

Gnutella, Freenet and more: an overview of file sharing architectures

Thomas Zahn

Page 2

Peer-to-Peer - Introduction

• "opposite" of Client/Server

• no central servers => information is highly distributed

• every peer acts as a client AND server

• -> can issue queries, reply to queries and route messages at the same time

• every peer can directly "talk" to any other peer

Page 3

Popular Peer-to-Peer Networks

• Napster

• Gnutella

• Freenet

• FastTrack (Kazaa)

• CHORD, CAN, PASTRY, TAPESTRY

Page 4

Napster

• was used primarily for file sharing

• NOT a pure peer-to-peer network

• => hybrid system

• peer turns to central DB for querying (client/server)

• peer downloads directly from other peer(s) (peer-to-peer)

Page 5

Napster

Figure: a peer, other peers and the central DB

1. Query (peer -> central DB)

2. Response (central DB -> peer)

3. Download request (peer -> peer)

4. File (peer -> peer)
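The hybrid split above (client/server lookup, peer-to-peer transfer) can be sketched as a toy Python model. It is not Napster's actual protocol: the class and function names are hypothetical, and the `register` step, in which peers announce their shared files to the central DB, is an assumption the slide does not show.

```python
class CentralIndex:
    """Central DB (client/server part): maps file names to the peers sharing them."""

    def __init__(self) -> None:
        self.files: dict[str, set[str]] = {}

    def register(self, peer: "Peer") -> None:
        # assumed step: a peer announces its shared files when it connects
        for name in peer.shared:
            self.files.setdefault(name, set()).add(peer.name)

    def query(self, keyword: str) -> dict[str, list[str]]:
        # 1. Query / 2. Response: matching file names and the peers holding them
        return {name: sorted(holders) for name, holders in self.files.items() if keyword in name}


class Peer:
    """A peer serves its own files directly to other peers (peer-to-peer part)."""

    def __init__(self, name: str, shared: dict[str, bytes]) -> None:
        self.name = name
        self.shared = dict(shared)

    def serve(self, file_name: str) -> bytes:
        return self.shared[file_name]


def fetch(index: CentralIndex, peers: dict[str, Peer], keyword: str):
    hits = index.query(keyword)
    if not hits:
        return None
    name, holders = next(iter(hits.items()))
    # 3. Download request / 4. File: goes straight to a holder, bypassing the index
    return name, peers[holders[0]].serve(name)


if __name__ == "__main__":
    peers = {"p1": Peer("p1", {}), "p2": Peer("p2", {"song.mp3": b"audio bytes"})}
    index = CentralIndex()
    for p in peers.values():
        index.register(p)
    print(fetch(index, peers, "song"))  # ('song.mp3', b'audio bytes')
```

The index only ever answers lookups; the file bytes never pass through it, which is exactly the "hybrid" property the slide describes.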

Page 6

Gnutella - overview

• pure peer-to-peer

• used for file sharing

• very popular => practically proven (?)

• very simple protocol

• no routing "intelligence"

• messages are always broadcast

Page 7

Gnutella - PING/PONG

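Figure: peer 1 is connected to peer 2; peer 2 is connected to peers 3, 4 and 5; peers 6, 7 and 8 are one hop further out

• Ping 1 (issued by peer 1) is broadcast hop by hop until all 7 other peers have received it

• every peer answers with a Pong that is routed back along the path the Ping came from: Pong 2; Pong 3, 4, 5 (relayed by peer 2); Pong 6, 7, 8 (relayed twice)

• peer 1's known hosts grow accordingly: 2, then 3, 4, 5, then 6, 7, 8

Query/Response works analogously.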
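The flooding above can be sketched in Python as a breadth-first broadcast with duplicate suppression and a TTL. This is a simplified model, not the Gnutella wire protocol: `ping_flood` is a hypothetical name, the example topology assumes peers 6, 7 and 8 hang off peer 3 (the figure does not say which peer they attach to), and Pongs are counted hop by hop rather than bundled as in the figure.

```python
from collections import deque


def ping_flood(graph: dict[int, list[int]], origin: int, ttl: int):
    """Flood a Ping from `origin` for at most `ttl` hops.

    Returns (ping_messages, pong_hops, known_hosts): messages sent while
    flooding, hops travelled by all Pongs on their way back, and the hosts
    the origin learns about.
    """
    came_from = {origin: None}  # peer -> neighbour it first received the Ping from
    frontier = deque([(origin, ttl)])  # breadth-first: first copy arrives via a shortest path
    ping_messages = 0
    while frontier:
        peer, ttl_left = frontier.popleft()
        if ttl_left == 0:
            continue  # TTL exhausted: this peer answers but does not forward
        for neighbour in graph[peer]:
            if neighbour == came_from[peer]:
                continue  # never broadcast back to the sender
            ping_messages += 1
            if neighbour in came_from:
                continue  # duplicate Ping (message ID already seen): dropped
            came_from[neighbour] = peer
            frontier.append((neighbour, ttl_left - 1))

    pong_hops = 0
    for peer in came_from:
        hop = peer
        while hop != origin:  # each Pong retraces the Ping's path to the origin
            hop = came_from[hop]
            pong_hops += 1
    known_hosts = set(came_from) - {origin}
    return ping_messages, pong_hops, known_hosts


if __name__ == "__main__":
    # 1 - 2, 2 - {3, 4, 5}; assumed: 6, 7, 8 attach to peer 3
    graph = {1: [2], 2: [1, 3, 4, 5], 3: [2, 6, 7, 8], 4: [2], 5: [2], 6: [3], 7: [3], 8: [3]}
    print(ping_flood(graph, origin=1, ttl=7))  # (7, 16, {2, 3, 4, 5, 6, 7, 8})
```

For this topology the sketch sends 7 Pings, as in the figure; with `ttl=2`, peers 6, 7 and 8 are never reached. In a graph with cycles, the dropped duplicates still count as sent messages, which is where the broadcast overhead comes from.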

Page 8

Gnutella - Pro & Con

• VERY simple protocol => easy to implement

• very little overhead

• practically proven functionality (?)

• message broadcasts flood network

• => heavy network traffic

• => bad, bad scalability

Page 9

Gnutella – Reachable Peers

N = number of connections per peer, T = TTL (hops)

          T=1       T=2       T=3       T=4       T=5       T=6       T=7       T=8
N=2         2         4         6         8        10        12        14        16
N=3         3         9        21        45        93       189       381       765
N=4         4        16        52       160       484     1,456     4,372    13,120
N=5         5        25       105       425     1,705     6,825    27,305   109,225
N=6         6        36       186       936     4,686    23,436   117,186   585,936
N=7         7        49       301     1,813    10,885    65,317   391,909 2,351,461
N=8         8        64       456     3,200    22,408   156,864 1,098,056 7,686,400
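The table follows a closed form if N is taken as the number of connections per peer, T as the TTL in hops, and every peer forwards a message to its N - 1 other neighbours without any peer being reached twice (in a real network with cycles the figures are therefore an upper bound):

```latex
R(N,T) \;=\; N \sum_{i=0}^{T-1} (N-1)^{i} \;=\;
\begin{cases}
  2T, & N = 2,\\[4pt]
  \dfrac{N\left[(N-1)^{T} - 1\right]}{N-2}, & N > 2.
\end{cases}
```

For example, R(4, 3) = 4(1 + 3 + 9) = 52 and R(8, 8) = 8 · 960,800 = 7,686,400, matching the table.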

Page 10

Gnutella – Generated Traffic in Bytes (1)

            T=1         T=2         T=3         T=4         T=5         T=6         T=7         T=8
N=2         166         332         498         664         830         996       1,162       1,328
N=3         249         747       1,743       3,735       7,719      15,687      31,623      63,495
N=4         332       1,328       4,316      13,280      40,172     120,848     362,876   1,088,960
N=5         415       2,075       8,715      35,275     141,515     566,475   2,266,315   9,065,675
N=6         498       2,988      15,438      77,688     388,938   1,945,188   9,726,438  48,632,688
N=7         581       4,067      24,983     150,479     903,455   5,421,311  32,528,447 195,171,263
N=8         664       5,312      37,848     265,600   1,859,864  13,019,712  91,138,648 637,971,200

• query message length: 83 bytes

• simple query relaying (no responses)
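Table (1) is the reachable-peer count multiplied by the 83-byte query length. A short Python sketch that regenerates it, assuming N is the number of connections per peer, T the TTL in hops, and that no peer is reached twice; `reachable` and `query_traffic` are illustrative names:

```python
QUERY_BYTES = 83  # query message length given on the slide


def reachable(n: int, ttl: int) -> int:
    """Peers reached when each peer has n connections and forwards to its n - 1 other neighbours."""
    return sum(n * (n - 1) ** i for i in range(ttl))


def query_traffic(n: int, ttl: int) -> int:
    """Bytes for simple query relaying (no responses): one 83-byte message per reached peer."""
    return QUERY_BYTES * reachable(n, ttl)


if __name__ == "__main__":
    print("   " + "".join(f"T={t}".rjust(12) for t in range(1, 9)))
    for n in range(2, 9):
        print(f"N={n}" + "".join(f"{query_traffic(n, t):>12,}" for t in range(1, 9)))
```

Because the per-hop fan-out is N - 1, every extra hop multiplies the traffic by roughly N - 1, which is why the last columns explode for N ≥ 5.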

Page 11

Gnutella – Generated Traffic in Bytes (2)

              T=1           T=2           T=3           T=4           T=5           T=6           T=7           T=8
N=3        283.68       1,418.4      4,822.56      13,900.3      36,594.7      91,061.3        218,15       508.638
N=4        378.24      2,647.68      12,860.2      53,710.1       206,897       758,371     2,688,530     9,306,220
N=5         472.8       4,255.2      26,949.6       147,986        753,17     3,658,050    17,214,200    79,185,000
N=6        567.36      6,240.96        48,793       332,473     2,105,470    12,743,500    74,798,500   429,398,000
N=7        661.92      8,604.96      80,092.3       651,991     4,941,123    35,823,800   252,002,000 1,734,360,000
N=8        756.48      11,347.2        122,55     1,160,440    10,242,000    86,526,900   709,521,000 5,693,470,000

• Mean percentage of users who typically share content: 30%

• Mean percentage of users who typically have responses to search queries: 40%

• Mean number of search responses the typical respondent offers: 10

• Mean length of search responses the typical respondent offers: 60

"Standard client settings yield a whopping 17MB generated in response to […] search query"

Page 12

Freenet - Concepts

• peer-to-peer file storage & retrieval system

• every document has a globally unique ID

• efficient (?) retrieval algorithm

  – documents are retrieved with sublinear effort

• routing based on how likely a peer is to be able to answer

• focus on security

Page 13

Freenet – Query Routing (1)

• every peer maintains a routing table

• the table contains known peers along with the IDs of the documents they are storing

• a request is routed to the peer most likely to have an answer (closest matching ID)

• responses are sent back upstream

• intermediate peers also store the document and augment their routing tables

Page 14

Freenet – Query Routing (2)

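Figure: peers A, B, C and D

Initial routing tables and document caches:

• A: routing table B: 14, 20; X: 47, 60 | doc cache 5, 89

• B: routing table C: 19, 30; D: 45, 51 | doc cache 14, 20

• C: routing table B: 14, 20 | doc cache 19, 30

• D: routing table B: 14, 20; Z: 105, 110 | doc cache 17, 45, 51, 102, 205

1. A sends a query for doc 17 to B

2. B forwards the query to its best match, C (ID 19)

3. C has no match -> backtrack

4. B forwards the query to its 2nd best match, D (ID 45)

5. D sends back doc 17

6. B routes the response back to A

After the response:

• B: routing table C: 19, 30; D: 17, 45, 51 | doc cache 14, 17, 20

• A: routing table B: 14, 17, 20; X: 47, 60 | doc cache 5, 17, 89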
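The walk-through above can be reproduced with a small Python sketch of closest-match-first depth-first search with backtracking. It is a toy model, not Freenet's implementation: the names are hypothetical, document IDs are the small integers from the figure (real Freenet uses hashed keys and a hops-to-live counter), backtracking is modelled with a shared `visited` set, and peers X and Z lie outside the toy network.

```python
from dataclasses import dataclass, field


@dataclass
class Peer:
    cache: set[int] = field(default_factory=set)               # IDs of stored documents
    routes: dict[str, set[int]] = field(default_factory=dict)  # neighbour -> IDs it is known to hold


def distance(keys: set[int], doc_id: int) -> float:
    """How close a neighbour's closest known ID is to the requested one."""
    return min((abs(k - doc_id) for k in keys), default=float("inf"))


def request(peers: dict[str, Peer], current: str, doc_id: int, ttl: int, visited=None):
    """Closest-match-first depth-first search; returns the peer that held the document, or None."""
    visited = set() if visited is None else visited
    visited.add(current)
    node = peers[current]
    if doc_id in node.cache:
        return current
    if ttl == 0:
        return None
    # peers outside this toy network (X, Z) and already visited peers are skipped
    candidates = sorted(
        (n for n in node.routes if n in peers and n not in visited),
        key=lambda n: distance(node.routes[n], doc_id),
    )
    for neighbour in candidates:  # best match first, then 2nd best after a backtrack, ...
        source = request(peers, neighbour, doc_id, ttl - 1, visited)
        if source is not None:
            # the response passes through here: cache the document, remember the route
            node.cache.add(doc_id)
            node.routes[neighbour].add(doc_id)
            return source
    return None  # no match below this peer: backtrack


def example_network() -> dict[str, Peer]:
    """Initial state of the four peers in the figure."""
    return {
        "A": Peer({5, 89}, {"B": {14, 20}, "X": {47, 60}}),
        "B": Peer({14, 20}, {"C": {19, 30}, "D": {45, 51}}),
        "C": Peer({19, 30}, {"B": {14, 20}}),
        "D": Peer({17, 45, 51, 102, 205}, {"B": {14, 20}, "Z": {105, 110}}),
    }


if __name__ == "__main__":
    peers = example_network()
    print(request(peers, "A", 17, ttl=10))  # D
    print(sorted(peers["B"].routes["D"]))   # [17, 45, 51]
    print(sorted(peers["A"].cache))         # [5, 17, 89]
```

Running it reproduces the figure's final state: B learns that D holds 17, and both A and B now cache the document. If every branch fails, the search degenerates into a full depth-first traversal, the worst case the discussion slide mentions.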

Page 15

Freenet – Document Insert

• analogous to query routing

• insert is routed to the peer most likely to be interested in new doc (closest matching ID)

• intermediate peers cache the document and augment their routing tables

• the insert is forwarded until its TTL is reached
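An insert can be sketched the same way, reusing the four peers from the query-routing figure. In this hypothetical Python model the insert is forwarded greedily to the closest matching ID, every peer on the path (including the inserting one) caches the document, and the TTL counts peers; the rule that each peer records its upstream neighbour as a source of the new ID is an assumption, since the slide only says routing tables are augmented.

```python
from dataclasses import dataclass, field


@dataclass
class Peer:
    cache: set[int] = field(default_factory=set)
    routes: dict[str, set[int]] = field(default_factory=dict)  # neighbour -> known document IDs


def distance(keys: set[int], doc_id: int) -> float:
    return min((abs(k - doc_id) for k in keys), default=float("inf"))


def insert(peers: dict[str, Peer], start: str, doc_id: int, ttl: int) -> list[str]:
    """Route an insert towards the closest known ID until the TTL runs out.

    Returns the path of peers that stored the document.
    """
    path: list[str] = []
    upstream = None
    current = start
    while True:
        node = peers[current]
        node.cache.add(doc_id)
        if upstream is not None:
            # assumed update rule: the upstream neighbour is now a known source of doc_id
            node.routes.setdefault(upstream, set()).add(doc_id)
        path.append(current)
        ttl -= 1
        candidates = [n for n in node.routes if n in peers and n not in path]
        if ttl <= 0 or not candidates:
            return path
        upstream, current = current, min(candidates, key=lambda n: distance(node.routes[n], doc_id))


def example_network() -> dict[str, Peer]:
    return {
        "A": Peer({5, 89}, {"B": {14, 20}, "X": {47, 60}}),
        "B": Peer({14, 20}, {"C": {19, 30}, "D": {45, 51}}),
        "C": Peer({19, 30}, {"B": {14, 20}}),
        "D": Peer({17, 45, 51, 102, 205}, {"B": {14, 20}, "Z": {105, 110}}),
    }


if __name__ == "__main__":
    peers = example_network()
    print(insert(peers, "A", 18, ttl=3))  # ['A', 'B', 'C']
```

With `ttl=3`, an insert of ID 18 from A is stored on A, B and C (at B, C's 19 is the closest match) and never reaches D, so later requests for IDs near 18 are steered towards the same region.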

Page 16

Freenet - Discussion

• efficient routing algorithm (compared to Gnutella)

• adequate security features/heuristics (the more popular a document, the more frequently it gets cached)

• no metasearch

• no updates or deletes possible

• worst case query routing = DFS

Page 17

FUtella – Concepts

• peer-to-peer platform for general knowledge sharing

• tries to model the learning style of humans

• content-based routing

• combines and extends approaches from:

  – Gnutella (message format)
  – JXTA (peer groups)
  – JXTA Search (queryspaces and registrations)
  – Freenet (routing of registration discoveries)

Page 18

FUtella - Knowledge Groups

Figure: a knowledge group in the FUtella net

• queryspace: "Computer Architecture"

• group head: peer E

• members: M1 - Mi

• a registration for the group (pointing to E) is inserted into the FUtella net

Page 19

FUtella - Knowledge Group Discovery 1

Figure: peers A, B, C and D

Routing tables and registration caches:

• A: routing table "computer" -> B, "data base" -> X | registration cache "computer": B, "data base": X

• B: routing table "computer analysis" -> C, "computer systems" -> D, "data base" -> A | registration cache "computer analysis": Y, "computer systems": Z, "data base": X

• C: routing table "computer" -> B, "computer analysis" -> Y | registration cache "computer": B, "computer analysis": Y

• D: routing table "computer" -> B, "computer systems" -> Z, "computer architecture" -> E | registration cache "computer systems": Z, "computer": B, "computer architecture": E

1. Discovery request "computer architecture" (A -> B)

2. Forward discovery request (B -> C)

3. C has no cached registration for "computer architecture" -> backtrack

4. Forward discovery request to 2nd best match (B -> D)

Page 20

FUtella - Knowledge Group Discovery 2

Figure: peers A, B and D after the discovery response

• B: routing table "computer analysis" -> C, "computer architecture" -> D, "computer systems" -> D, "data base" -> A | registration cache "computer analysis": Y, "computer architecture": E, "computer systems": Z, "data base": X

• A: routing table "computer" -> B, "computer architecture" -> D, "data base" -> X | registration cache "computer": B, "computer architecture": E, "data base": X

• D (unchanged): routing table "computer" -> B, "computer systems" -> Z, "computer architecture" -> E | registration cache "computer systems": Z, "computer": B, "computer architecture": E

5. Discovery response containing registration "computer architecture": E (D -> B)

6. Forward discovery response (B -> A)

Page 21

FUtella - Query Processing

Figure: peers A, B, C and D; knowledge group "computer architecture" with group head E and members M1 ... Mi

1. Discovery request "computer architecture"

2. Forward discovery request

3. C has no cached registration for "computer architecture" -> backtrack

4. Forward discovery request to 2nd best match

5. Discovery response containing cached registration

6. Forward discovery response

7. Send query to the knowledge group "computer architecture" (group head E)

8. E forwards the query to each member (M1 ... Mi)

9. Query responses
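Discovery and query processing can be combined in one Python sketch. The slides do not define how queryspace names are compared, so the word-overlap (Jaccard) `similarity` below is a hypothetical stand-in; all names are illustrative, the group's contents are made up, and the group head is modelled as a simple function over its members' items. As in the figure, peers on the response path record the responder (D) as the route for "computer architecture".

```python
from dataclasses import dataclass, field


def similarity(a: str, b: str) -> float:
    """Hypothetical content match: word overlap (Jaccard) of two queryspace names."""
    wa, wb = set(a.split()), set(b.split())
    return len(wa & wb) / len(wa | wb)


@dataclass
class Peer:
    routes: dict[str, str] = field(default_factory=dict)         # queryspace -> peer to ask
    registrations: dict[str, str] = field(default_factory=dict)  # queryspace -> group head


def discover(peers: dict[str, Peer], current: str, queryspace: str, visited=None):
    """Best-match-first depth-first discovery; returns (responder, group_head) or None."""
    visited = set() if visited is None else visited
    visited.add(current)
    node = peers[current]
    if queryspace in node.registrations:
        return current, node.registrations[queryspace]
    # sorted() is stable, so equally similar entries keep their table order
    entries = sorted(node.routes.items(), key=lambda e: similarity(e[0], queryspace), reverse=True)
    for _, neighbour in entries:
        if neighbour in visited or neighbour not in peers:
            continue
        found = discover(peers, neighbour, queryspace, visited)
        if found is not None:
            responder, head = found
            # as in the figure: cache the registration and route towards the responder
            node.routes[queryspace] = responder
            node.registrations[queryspace] = head
            return found
    return None  # backtrack


def query_group(members: dict[str, list[str]], query: str) -> list[str]:
    """Group head forwards the query to every member and collects the matching items."""
    return [item for items in members.values() for item in items if query in item]


def example_network() -> dict[str, Peer]:
    return {
        "A": Peer({"computer": "B", "data base": "X"}, {"computer": "B", "data base": "X"}),
        "B": Peer({"computer analysis": "C", "computer systems": "D", "data base": "A"},
                  {"computer analysis": "Y", "computer systems": "Z", "data base": "X"}),
        "C": Peer({"computer": "B", "computer analysis": "Y"},
                  {"computer": "B", "computer analysis": "Y"}),
        "D": Peer({"computer": "B", "computer systems": "Z", "computer architecture": "E"},
                  {"computer systems": "Z", "computer": "B", "computer architecture": "E"}),
    }


if __name__ == "__main__":
    peers = example_network()
    responder, head = discover(peers, "A", "computer architecture")  # ("D", "E")
    group = {"M1": ["cache coherence"], "Mi": ["pipelining", "cache sizing"]}  # made-up contents
    print(head, query_group(group, "cache"))  # E ['cache coherence', 'cache sizing']
```

At B, the entries for C and D score equally under this stand-in metric, so C is tried first and backtracked from, reproducing steps 2 to 4 of the figure.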

Page 22

FUtella - Test Results (1)

Figure: Total Number of Messages (y-axis: # msg, 0 to 250,000) for static, semi-dynamic and dynamic peers; series: threshold 2, no threshold, Gnutella

Page 23

FUtella - Test Results (2)

Figure: Average Hit Ratio (y-axis: 0% to 100%) for static, semi-dynamic and dynamic peers; series: threshold 2, no threshold, Gnutella

Page 24

Conclusion

• first and second generation P2P systems are still the most widely used

• practically proven

• very flexible in terms of topology

• bad scalability (Gnutella)

• no guaranteed upper bound on query effort (Freenet)

• (scientifically) a far better approach: DHTs (see next presentation)

Page 25

Questions ?


