Date post: 28-Dec-2015
MSCS6060 Parallel and Distributed Systems

Peer-to-Peer Computing

Rong Ge

Some slides and figures are from www.list.gmu.edu and HPL survey

Outline

• What are P2P technologies?
• Taxonomy
• P2P applications and services
• Research issues

Mainframe → Client-Server → P2P

• Mainframe era:
  – 1970s
  – Dumb terminals connected to a big mainframe
  – Mainframes possibly networked together
• Client-server:
  – Late 1980s
  – Many clients, one user per client
  – Dedicated servers
  – A single client can access multiple servers
  – Significant computing resources on the client
• Peer-to-Peer (P2P):
  – Late 1990s
  – Each computer is both a client and a server
  – Takes on whatever role is appropriate for a given task at a given time
  – Harnesses the computing and communication power of the entire network

What do you think makes P2P increasingly common?

P2P versus Client-Server: Idealized View

From Peer-to-Peer Computing, Milojicic et al, HP Laboratories, HPL-2002-57, March 8th, 2002

No Clear Border

From Peer-to-Peer Computing, Milojicic et al, HP Laboratories, HPL-2002-57, March 8th, 2002

Hybrid P2P Systems

From Peer-to-Peer Computing, Milojicic et al, HP Laboratories, HPL-2002-57, March 8th, 2002

Peer-to-Peer Computing

• The individual nodes have symmetric roles.
• Each node may act as both a client and a server. (IRTF P2P research group)
• The participants share a part of their own hardware resources (processing power, storage capacity, network link capacity, printers, …)
• Individual computers communicate directly over the Internet without central entities
• The participants are resource (service and content) providers as well as resource (service and content) requestors (servent concept)

R. Schollmeier, “A definition of peer-to-peer networking for the classification of peer-to-peer architectures and applications,” in Proc. of P2P’01, pp. 101-102, Aug. 2001

Peer-to-Peer Is Not New

• P2P networking is not a new idea
  – Telephone
  – Usenet News in 1979
  – DNS
• P2P is best known through Napster, the first widely used P2P file-sharing service

Napster

From THE FUTURE OF PEER-TO-PEER COMPUTING, Loo, CACM Sept 2003

P2P Application Examples

• Napster
  – Music sharing
• Information (file) sharing
  – Kazaa, Gnutella
  – Morpheus, Freenet, Grokster, …
• Distributed data processing
  – SETI@home
  – Folding@home
  – Popular Power
• Distributed applications
  – Distributed file system
  – DDoS

P2P Dominates Internet Traffic

• P2P has dominated Internet traffic
  – In 2006, P2P accounted for more than 60% of Internet traffic
  – Since YouTube is based on HTTP, Web traffic grew in 2007

Source: CacheLogic.

Statistics of P2P Traffic

Some Statistics about P2P Systems

• More than 663 million users registered with Skype, with around 10 million online users (2010)
• Around 4.7M hosts participate in SETI@home (2006)
• BitTorrent (BT) accounts for 1/3 of Internet traffic (2007)
• More than 200,000 simultaneous online users on PPLive (2007)
• More than 3,000,000 users downloaded PPStream (2008)

Taxonomy of Computer Systems

From Peer-to-Peer Computing, Milojicic et al, HP Laboratories, HPL-2002-57, March 8th, 2002

Why P2P?

• Get rid of servers
  – Single point of failure, centralized control and management, access fees and management fees, …
• Clients are not so dumb
  – Billions of MHz of CPU, many terabytes of disk, millions of gigabits of network bandwidth, …
• P2P is about resource sharing
  – Flexible, efficient information sharing

P2P changes the way the Web (Internet) works

Taxonomy of P2P Systems

From Peer-to-Peer Computing, Milojicic et al, HP Laboratories, HPL-2002-57, March 8th, 2002

Classification of P2P Systems

From Peer-to-Peer Computing, Milojicic et al, HP Laboratories, HPL-2002-57, March 8th, 2002

Taxonomy of P2P Applications

From Peer-to-Peer Computing, Milojicic et al, HP Laboratories, HPL-2002-57, March 8th, 2002

Taxonomy of P2P Markets

From Peer-to-Peer Computing, Milojicic et al, HP Laboratories, HPL-2002-57, March 8th, 2002

P2P Markets versus P2P Applications

From Peer-to-Peer Computing, Milojicic et al, HP Laboratories, HPL-2002-57, March 8th, 2002

P2P System Architecture

From Peer-to-Peer Computing, Milojicic et al, HP Laboratories, HPL-2002-57, March 8th, 2002

Summary

• Distributed computing
  – Client/server
  – P2P
• P2P computing
  – Participants are both servers and clients
• Taxonomy of P2P computing
  – Systems
  – Applications
  – Architecture

MSCS6060 Spring 2010

MSCS6060 Parallel and Distributed Systems

Peer-to-Peer Computing Cont’d

Rong Ge

Some slides and figures are from www.list.gmu.edu and HPL survey

Outline

• P2P system models and operations
• Challenges and issues in P2P

P2P Operations

• Operations in P2P systems consist of three phases
  – Peer discovery (bootstrap)
    • Well-known nodes, cached peers, broadcasting, …
  – Resource discovery (search)
    • Locate a resource given its identifier
    • Central servers maintain an index of all information
    • Unstructured P2P networks use flooding
    • Structured P2P networks use a distributed hash table (DHT)
  – Communication or data transfer
    • Direct communication, NAT/firewall traversal

P2P System Models

• Centralized
  – Central indexing servers maintain a directory of shared data
  – Napster, Kuro, etc.
• Decentralized unstructured
  – Neither a central directory server nor any precise control over network topology or data placement
  – Gnutella, Kazaa, etc.
• Decentralized structured
  – No centralized directory, but data placement and network topology are tightly controlled based on a Distributed Hash Table (DHT)
  – CAN, Chord, Pastry, Tapestry, etc.
• Hierarchical
• Hybrid

Centralized P2P

• Utilize a central directory for object location
• For file-sharing P2P, the location is queried from central servers and the file is then downloaded directly from peers
• Benefits
  – Simplicity
  – Limited bandwidth usage
• Drawbacks
  – Unreliable (single point of failure), performance bottleneck, and scalability limits
  – Vulnerable to DoS attacks
  – Copyright infringement

[Figure: peers upload indexes to the centralized server; (1) query, (2) response, (3) transfer]
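
The upload-indexes, query, response, and transfer steps of the centralized model can be sketched in a few lines of Python. This is a minimal in-memory illustration, not Napster's actual protocol: the names `CentralIndex`, `Peer`, and `fetch` are hypothetical, and network calls are replaced by direct method calls.

```python
from collections import defaultdict


class CentralIndex:
    """Hypothetical central directory: maps a file name to the peers sharing it."""

    def __init__(self):
        self._index = defaultdict(set)

    def upload_index(self, peer_name, filenames):
        # A peer registers the names of the files it shares.
        for name in filenames:
            self._index[name].add(peer_name)

    def query(self, filename):
        # Steps 1 and 2: a peer asks the server, which answers with peer names.
        return sorted(self._index.get(filename, ()))


class Peer:
    def __init__(self, name, files):
        self.name = name
        self.files = dict(files)  # file name -> content

    def serve(self, filename):
        # Step 3: the transfer happens peer to peer, bypassing the server.
        return self.files[filename]


def fetch(index, peers, filename):
    """Query the central index, then download directly from the first holder."""
    holders = index.query(filename)
    if not holders:
        return None
    return peers[holders[0]].serve(filename)


peers = {
    "alice": Peer("alice", {"song.mp3": b"la-la"}),
    "bob": Peer("bob", {"talk.pdf": b"slides"}),
}
index = CentralIndex()
for p in peers.values():
    index.upload_index(p.name, p.files)

result = fetch(index, peers, "talk.pdf")
```

Only the lookup touches the server; the file bytes never do. That keeps server bandwidth low, but the index remains a single point of failure.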

Unstructured P2P (1/2)

• Each request is flooded to directly connected peers, which then flood their neighbors
  – Until the request is answered or a certain scope is reached (TTL limit)
• Can be hierarchical
  – A supernode acts as a local central index for files shared by local peers and forwards queries to other supernodes
• Benefits
  – Decentralized, reliable, fault-tolerant, …
• Drawbacks
  – Excessive query traffic
  – Not scalable
  – Most critically, a search may fail to find content that is actually in the system
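
To see why flooding generates excessive query traffic, a rough worst-case count helps. The sketch below assumes every peer has the same number of neighbors and that the overlay has no cycles, so no duplicate query is ever suppressed. Both are simplifying assumptions, and the function name `flood_messages` is hypothetical.

```python
def flood_messages(degree, ttl):
    """Upper bound on Query messages generated by one flooded search.

    Assumes every peer has `degree` neighbors and a cycle-free overlay,
    so each receiver forwards to all neighbors except the sender.
    """
    total = 0
    frontier = degree  # messages sent by the originator at hop 1
    for _ in range(ttl):
        total += frontier
        frontier *= degree - 1  # every receiver forwards to degree - 1 peers
    return total


one_hop = flood_messages(degree=4, ttl=1)
seven_hops = flood_messages(degree=4, ttl=7)
```

Under these assumptions the count grows geometrically with the TTL, which is the scalability problem listed above. Lowering the TTL cuts traffic but shrinks the search horizon, so content outside the horizon is never found.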

Unstructured P2P (2/2)

[Figure: search and transfer among peer nodes and supernodes; (1) query flooded to connected peers, (2) query flooded between supernodes]

File Exchange over Gnutella

Search
• To search for a file, a node, say n, sends a search Query message to its neighbor nodes.
• On receiving a search Query, nodes look for a match in their local data set.
• If a match is found, a Hit message is generated and sent back over the same path through which the Query message came to the node.
• The Query message is forwarded further if its TTL is not zero.

Download
• On receiving Hit messages, node n selects a node from which to download the file.
• The download happens over an HTTP connection.
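
The Query/Hit exchange can be simulated on a small overlay. The sketch below is an illustration under assumptions, not the Gnutella wire protocol: the function name `search`, the dictionary-based overlay, and the four-peer chain are all made up for the example, and messages are modeled as queue entries rather than network packets.

```python
from collections import deque


def search(overlay, files, origin, wanted, ttl):
    """Flood a Query from `origin`; return the reverse path of every Hit.

    overlay: dict mapping a node to the list of its neighbors
    files:   dict mapping a node to the set of file names it stores
    Each returned path runs from the matching node back to `origin`.
    """
    came_from = {origin: None}  # who delivered the Query to each node
    queue = deque([(origin, ttl)])
    hits = []
    while queue:
        node, remaining = queue.popleft()
        if node != origin and wanted in files.get(node, set()):
            # The Hit retraces the path the Query took, hop by hop.
            path, hop = [], node
            while hop is not None:
                path.append(hop)
                hop = came_from[hop]
            hits.append(path)
        if remaining == 0:
            continue  # TTL exhausted: the Query is not forwarded further
        for neighbor in overlay[node]:
            if neighbor not in came_from:  # duplicate Queries are dropped
                came_from[neighbor] = node
                queue.append((neighbor, remaining - 1))
    return hits


# Assumed chain topology A - B - C - D, with the file stored only on D.
overlay = {"A": ["B"], "B": ["A", "C"], "C": ["B", "D"], "D": ["C"]}
files = {"D": {"song.mp3"}}

found = search(overlay, files, "A", "song.mp3", ttl=3)
missed = search(overlay, files, "A", "song.mp3", ttl=2)
```

With a TTL of 3 the Query reaches D and the Hit travels back through C and B. With a TTL of 2 the Query stops at C, so the search fails although the file exists in the system.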

Search and Download

[Figure: Peers A, B, C, and D; steps (1) to (3) are Query messages, steps (4) to (6) are Hit messages, and step (7) is the Download]

Structured P2P

• Each peer is assigned an ID and knows a given number of peers
• Each shared resource is assigned a hashed ID
• A request is directed to the peer whose ID is most similar to the resource ID, using a Distributed Hash Table (DHT)
• Benefits
  – Scalable
  – More efficient searching
• Drawbacks
  – Routing table maintenance
  – Exact-match search only

Distributed Hash Tables

• Hash table: (key, value) pairs
• Responsibility for maintaining the mapping is distributed among the nodes
• Scalable; able to handle continual node arrivals, departures, and failures
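
A toy model shows how responsibility for (key, value) pairs can be split among nodes and rebalanced on arrivals and departures. It is a sketch under stated assumptions: "closest ID" is interpreted as the first peer ID at or after the key's ID on a ring (one possible rule, in the style of Chord), the class `Ring` and the 32-bit ID space are hypothetical, and every lookup uses global knowledge of all peers rather than per-node routing tables.

```python
import hashlib
from bisect import bisect_left, insort

ID_BITS = 32  # size of the identifier space, chosen only for the example


def hashed_id(text):
    """Map a string to an ID on the ring [0, 2**ID_BITS)."""
    digest = hashlib.sha1(text.encode("utf-8")).digest()
    return int.from_bytes(digest, "big") % (2 ** ID_BITS)


class Ring:
    def __init__(self, peer_names):
        self.ids = []    # sorted peer IDs
        self.store = {}  # peer ID -> {key: value} held by that peer
        for name in peer_names:
            self.join(name)

    def successor(self, key_id):
        # First peer ID at or after key_id, wrapping around the ring.
        i = bisect_left(self.ids, key_id)
        return self.ids[i % len(self.ids)]

    def put(self, key, value):
        self.store[self.successor(hashed_id(key))][key] = value

    def get(self, key):
        return self.store[self.successor(hashed_id(key))].get(key)

    def join(self, name):
        new_id = hashed_id(name)
        if new_id in self.store:
            raise ValueError("peer ID collision")
        insort(self.ids, new_id)
        self.store[new_id] = {}
        # Hand over the keys that now map to the new peer.
        for peer_id in self.ids:
            if peer_id == new_id:
                continue
            for key in list(self.store[peer_id]):
                if self.successor(hashed_id(key)) == new_id:
                    self.store[new_id][key] = self.store[peer_id].pop(key)

    def leave(self, name):
        # A departing peer re-inserts its keys, which land on its successor.
        gone = hashed_id(name)
        orphaned = self.store.pop(gone)
        self.ids.remove(gone)
        for key, value in orphaned.items():
            self.put(key, value)


ring = Ring(["peer-a", "peer-b", "peer-c"])
ring.put("song.mp3", "stored on some peer")
ring.join("peer-d")
ring.leave("peer-a")
value = ring.get("song.mp3")
```

Only the keys adjacent to the joining or leaving peer move, which is what lets a DHT absorb churn. Lookups work on the exact key only, which matches the exact-match drawback of structured systems.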

BitTorrent

[Figure: a seed and two peers]

• BitTorrent tracker: a server that assists in the communication between peers (BitTorrent can also locate peers through a DHT)
• BitTorrent index: a list of .torrent files, including descriptions

Hierarchical P2P

• Peers can have different roles in groups: superpeers and peers
• The first c peers to join become the superpeers of the group
• A peer must contact one superpeer when joining a group
• The superpeers form an overlay network
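
The join rule can be sketched as follows. The class `Group` is hypothetical, and the choice of which superpeer a new peer contacts (here, the least-loaded one) is an assumption, since the slide only requires that exactly one superpeer is contacted.

```python
class Group:
    """Hypothetical group in which the first `c` peers to join become superpeers."""

    def __init__(self, c):
        self.c = c
        self.superpeers = []
        self.members = {}  # ordinary peer -> the superpeer it contacted

    def load(self, superpeer):
        # Number of ordinary peers attached to this superpeer.
        return sum(1 for sp in self.members.values() if sp == superpeer)

    def join(self, peer):
        """Return (role, contact); contact is None for a superpeer."""
        if len(self.superpeers) < self.c:
            self.superpeers.append(peer)
            return ("superpeer", None)
        # Assumption: attach to the least-loaded superpeer (ties go to the earliest).
        contact = min(self.superpeers, key=self.load)
        self.members[peer] = contact
        return ("peer", contact)


group = Group(c=2)
roles = [group.join("p%d" % i) for i in range(6)]
```

With c = 2, peers p0 and p1 become superpeers and the remaining four peers are split evenly between them.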

Issues of P2P

• Search
  – Full index, partial index, semantic search
• Flash crowd
• Free riding
• Topological awareness
• NAT traversal
• Fault resilience
• Security
  – Spurious content
  – Anonymity
  – Trust, reputation
• Non-technical issues
  – Copyright infringement, intellectual piracy

P2P Search Algorithms

• How to search for a resource?
  – Centralized index model
  – Decentralized unstructured
    • Flooded requests model
    • Hierarchical model (supernode)
  – Decentralized structured
    • Document routing model, DHT-based routing
• Advanced issues
  – Keyword search
  – Semantic context search

Flash Crowd

• Definition
  – A sudden, unanticipated growth in demand for a particular object
  – The object may have been cold previously or newly released
• Issues
  – Overhead: how many query messages are generated?
  – Speed: how long does it take to find and download the object?

Free Riding

• Peers share little or no data in P2P file-sharing systems
• Measurement
  – Nearly 70% of Gnutella users share no files
  – Nearly 50% of all responses are returned by the top 1% of sharing hosts
• Incentive mechanisms encourage user cooperation
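
The two measurements above are simple ratios, and a short sketch shows how they could be computed from a trace. The function names and the ten-peer data set are made up for illustration; the numbers are not real Gnutella measurements.

```python
def free_rider_fraction(files_shared):
    """Fraction of peers sharing no files. `files_shared`: peer -> file count."""
    if not files_shared:
        return 0.0
    idle = sum(1 for count in files_shared.values() if count == 0)
    return idle / len(files_shared)


def top_share(responses, top_fraction):
    """Share of all responses returned by the top `top_fraction` of hosts."""
    counts = sorted(responses.values(), reverse=True)
    total = sum(counts)
    if total == 0:
        return 0.0
    k = max(1, round(len(counts) * top_fraction))  # at least one host
    return sum(counts[:k]) / total


# Toy trace: ten peers, seven of which share nothing (illustrative only).
files_shared = {"p%d" % i: 0 for i in range(10)}
files_shared.update({"p0": 120, "p1": 40, "p2": 5})
responses = {"p%d" % i: 0 for i in range(10)}
responses.update({"p0": 50, "p1": 30, "p2": 20})

riders = free_rider_fraction(files_shared)
concentration = top_share(responses, top_fraction=0.1)
```

An incentive mechanism could use statistics like these as input, for example by giving priority to peers whose contribution is above some threshold.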

Topological Awareness

• When peers choose neighbors without any knowledge of the underlying physical topology, a serious mismatch can arise between the P2P logical overlay network and the underlying physical network
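
One way to quantify the mismatch is to compare the cost of an overlay route with the cost of the direct physical route between the same endpoints, a ratio often called stretch. The sketch below is an illustration only: the latency values, the peer names, and the function names are assumptions.

```python
def path_cost(latency, path):
    """Sum the physical latencies along consecutive hops of an overlay path."""
    return sum(latency[frozenset(hop)] for hop in zip(path, path[1:]))


def stretch(latency, overlay_path):
    """Overlay path cost divided by the direct physical cost between its ends."""
    direct = latency[frozenset((overlay_path[0], overlay_path[-1]))]
    return path_cost(latency, overlay_path) / direct


# Assumed latencies in milliseconds: A and B share a site, C is far away.
latency = {
    frozenset(("A", "B")): 5,
    frozenset(("A", "C")): 100,
    frozenset(("B", "C")): 100,
}

# A and B are not overlay neighbors, so their traffic detours through C.
detour_cost = path_cost(latency, ["A", "C", "B"])
detour_stretch = stretch(latency, ["A", "C", "B"])
```

In this toy case two peers in the same site exchange messages over a route 40 times more expensive than the direct link, which is the kind of waste that topology-aware neighbor selection tries to avoid.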

Lessons for P2P System Designers

• Take the heterogeneity of the peers into account
  – Different peers should be delegated different responsibilities
• Measure the performance of peers online
  – Adapt to changes in peer status
• Fairness (incentive)
  – Encourage server-like peers and discourage client-like peers (free riders) with resource management mechanisms

Conclusion

• P2P may change the way the Web/Internet works
• Lots of creative applications remain to be developed
• Expect rapid growth in Internet traffic
• Still lots of problems
  – Illegal copies (copyright problem)
  – Security
  – Undesired traffic
  – …

References

[1] R. Schollmeier, “A definition of peer-to-peer networking for the classification of peer-to-peer architectures and applications,” in Proc. of P2P’01, pp. 101-102, Aug. 2001
[2] A. Crespo and H. Garcia-Molina, “Routing indices for peer-to-peer systems,” in Proc. of 22nd Int’l Conf. on Distributed Computing Systems (ICDCS’02), pp. 23-35, July 2002
[3] V. Kalogeraki, D. Gunopulos, and D. Zeinalipour-Yazti, “A local search mechanism for peer-to-peer networks,” in Proc. of 11th Int’l Conf. on Information and Knowledge Management (CIKM’02), pp. 300-307, 2002
[4] Q. Lv, P. Cao, E. Cohen, K. Li, and S. Shenker, “Search and replication in unstructured peer-to-peer networks,” in Proc. of 16th ACM Int’l Conf. on Supercomputing (ICS’02), pp. 84-95, New York, June 2002
[5] D. Tsoumakos and N. Roussopoulos, “Adaptive probabilistic search for peer-to-peer networks,” in Proc. of 3rd Int’l Conf. on Peer-to-Peer Computing (P2P’03), pp. 102-109, 1-3 Sept. 2003
[6] B. Yang and H. Garcia-Molina, “Improving search in peer-to-peer networks,” in Proc. of 22nd Int’l Conf. on Distributed Computing Systems (ICDCS’02), pp. 5-14, 2002
[7] D. Zeinalipour-Yazti, V. Kalogeraki, and D. Gunopulos, “Information retrieval techniques for peer-to-peer networks,” Computing in Science & Engineering [see also IEEE Computational Science and Engineering], vol. 6, no. 4, pp. 20-26, July-Aug. 2004
[8] Yunhao Liu, Xiaomei Liu, Li Xiao, Lionel M. Ni, and Xiaodong Zhang, “Location-Aware Topology Matching in P2P Systems,” The 23rd Conference of the IEEE Computer and Communications Societies (INFOCOM’04), vol. 4, pp. 2220-2230, 7-11 March 2004

SETI@Home

• Search for extraterrestrial (ET) intelligence
• A central site collects radio telescope data
• Data is divided into work chunks of 300 Kbytes
• A user obtains a client, which runs in the background
• A peer sets up a TCP connection to the central computer and downloads a chunk
• The peer performs an FFT on the chunk, uploads the results, and gets a new chunk
• According to statistics from 2004
  – Nearly 5 million participants in 226 countries
  – Nearly 2 million CPU years of work
  – Over 1.3 billion results received
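
The download, FFT, upload cycle can be sketched as a small work-unit loop. This is a toy stand-in, not the real SETI@home client: the TCP connection is replaced by an in-memory list, the analysis is reduced to finding the strongest frequency bin, and all function names are hypothetical. The 300 Kbyte chunk size from the slide is read here as 300 KiB.

```python
import cmath

CHUNK_BYTES = 300 * 1024  # "300 Kbytes" from the slide, read as 300 KiB


def split_into_chunks(data, chunk_bytes=CHUNK_BYTES):
    """Central site: divide the recorded data into fixed-size work chunks."""
    return [data[i:i + chunk_bytes] for i in range(0, len(data), chunk_bytes)]


def fft(samples):
    """Recursive radix-2 FFT; len(samples) must be a power of two."""
    n = len(samples)
    if n == 1:
        return list(samples)
    even = fft(samples[0::2])
    odd = fft(samples[1::2])
    out = [0j] * n
    for k in range(n // 2):
        twiddle = cmath.exp(-2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + twiddle
        out[k + n // 2] = even[k] - twiddle
    return out


def analyze(chunk):
    """Stand-in analysis: index of the strongest non-DC frequency bin."""
    if len(chunk) < 4:
        return 0
    n = 1 << (len(chunk).bit_length() - 1)  # largest power of two <= len(chunk)
    spectrum = fft([complex(b) for b in chunk[:n]])
    magnitudes = [abs(x) for x in spectrum[1:n // 2]]
    return 1 + magnitudes.index(max(magnitudes))


def run_client(work_queue):
    """Peer: fetch a chunk, compute locally, report the result, repeat."""
    results = {}
    while work_queue:
        chunk_id, chunk = work_queue.pop(0)  # "download" a chunk
        results[chunk_id] = analyze(chunk)   # compute, then "upload"
    return results


# Tiny synthetic signal with a period of four samples; 8-byte chunks keep it fast.
data = bytes([228, 128, 28, 128] * 4)
work = list(enumerate(split_into_chunks(data, chunk_bytes=8)))
results = run_client(work)
```

Note that this model is centralized in coordination: peers contribute CPU cycles but never talk to each other, only to the central site.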

