Distributed Computing Peer to Peer Computing Chapter 10: PEER TO PEER SYSTEMS
Page 1: Distributed Computing Peer to Peer Computing Chapter 10: PEER TO PEER SYSTEMS.

Distributed Computing – Peer to Peer Computing

Chapter 10: PEER TO PEER SYSTEMS

Page 2: Distributed Computing Peer to Peer Computing Chapter 10: PEER TO PEER SYSTEMS.

What is peer-to-peer (P2P) computing? Webster's definition:

Peer: one that is of equal standing with another. P2P computing is computing between equals.

Page 3: Distributed Computing Peer to Peer Computing Chapter 10: PEER TO PEER SYSTEMS.

Intro to P2P Systems The scope for expanding popular services by adding to the number of computers hosting them is limited when all the hosts must be owned & managed by the service provider

Administration and fault-recovery costs

Bandwidth that can be provided to a single server

Major service providers all face this problem, with varying severity

Page 4: Distributed Computing Peer to Peer Computing Chapter 10: PEER TO PEER SYSTEMS.

Intro to P2P Systems P2P applications exploit resources available at the edges of the Internet: storage, content, cycles, human presence

Traditional client-server systems provide access to these, but only on a single machine or tightly coupled servers. This centralized design required few decisions about the placement & management of resources

In P2P, algorithms for the placement and subsequent retrieval of information objects are a key aspect of the system design. The system is fully decentralized & self-organizing, and can dynamically balance the storage and processing loads between all the participating computers as they join and leave

Page 5: Distributed Computing Peer to Peer Computing Chapter 10: PEER TO PEER SYSTEMS.

P2P Design Characteristics Their design ensures that each user contributes resources to the system

Although they may differ in the resources that they contribute, all the nodes in a peer-to-peer system have the same functional capabilities and responsibilities

Their correct operation does not depend on the existence of any centrally administered systems

They can be designed to offer a limited degree of anonymity to the providers and users of resources

A key issue for their efficient operation is the choice of algorithm for placing and retrieving data across many hosts: balancing the load and providing availability without much overhead

Participants' availability to the system is unpredictable

Page 6: Distributed Computing Peer to Peer Computing Chapter 10: PEER TO PEER SYSTEMS.

Evolution of P2P Can be grouped into three generations

First generation – the Napster music exchange service [OpenNap 2001]

Second generation – file-sharing applications with greater scalability, anonymity & fault tolerance: Gnutella, Kazaa, Freenet

Third generation – developed with the help of middleware layers providing application-independent management of distributed resources on a global scale, e.g. Pastry, Tapestry, CAN, Chord, JXTA

Guarantee delivery of requests in a bounded number of network hops

Place replicas of resources, keeping in mind volatile availability, trustworthiness & locality

Page 7: Distributed Computing Peer to Peer Computing Chapter 10: PEER TO PEER SYSTEMS.

Cluster, Grid, P2P: Characteristics

Characteristic             | Cluster              | Grid                                    | P2P
Population                 | Commodity computers  | High-end computers                      | Edge of network (desktop PCs)
Ownership                  | Single               | Multiple                                | Multiple
Discovery                  | Membership services  | Centralised index & decentralised info  | Decentralised
User management            | Centralised          | Decentralised                           | Decentralised
Resource management        | Centralised          | Distributed                             | Distributed
Allocation/Scheduling      | Centralised          | Decentralised                           | Decentralised
Inter-operability          | VIA based?           | No standards yet                        | No standards
Single system image        | Yes                  | No                                      | No
Scalability                | 100s                 | 1000s?                                  | Millions? [@Home]
Capacity                   | Guaranteed           | Varies, but high                        | Varies
Throughput                 | Medium               | High                                    | Very high
Speed (latency, bandwidth) | Low latency, high bandwidth | High latency, low bandwidth      | High latency, low bandwidth

Page 8: Distributed Computing Peer to Peer Computing Chapter 10: PEER TO PEER SYSTEMS.

Example P2P Applications: SETI@home, Napster, Gnutella, FastTrack

Page 9: Distributed Computing Peer to Peer Computing Chapter 10: PEER TO PEER SYSTEMS.

SETI@home

SETI@home uses the National Astronomy and Ionospheric Center's 305 meter telescope at Arecibo, Puerto Rico.

[Figure: a screenshot of the SETI@home client program] 2.4 million volunteers as of Oct. 2000

Page 10: Distributed Computing Peer to Peer Computing Chapter 10: PEER TO PEER SYSTEMS.

Distributed computation: SETI@home is the best-known example of the usage & exploitation of edge resources

SETI@home (Search for Extra-Terrestrial Intelligence) partitions a stream of digitized radio telescope data into 107-second work units, each about 350 KB, and distributes them to client computers

Each work unit is redundantly distributed to 3–4 clients, to guard against errors & bad nodes

Coordination work is handled by a single server

3.91 million PCs had participated by 2002; in one year they processed 221 million work units, an average processing rate of 27.36 teraflops

Page 11: Distributed Computing Peer to Peer Computing Chapter 10: PEER TO PEER SYSTEMS.

Napster Centralized MP3 file sharing: clients/peers hold the files; servers hold the catalog and broker relationships

Clients upload their IP address, the music files they share, and their requests

Clients request locations where their requests can be met

File transfer is P2P – a proprietary protocol

Page 12: Distributed Computing Peer to Peer Computing Chapter 10: PEER TO PEER SYSTEMS.

Napster & its Legacy

[Figure: Napster: peer-to-peer file sharing with a centralized, replicated index]

1. File location request sent to a Napster index server
2. Server returns a list of peers offering the file
3. File request sent directly to a peer
4. File delivered peer-to-peer
5. Index update sent to the Napster server

Page 13: Distributed Computing Peer to Peer Computing Chapter 10: PEER TO PEER SYSTEMS.

Napster & its Legacy The architecture included centralized index servers – the main reason for its defeat in the lawsuit (no anonymity for receivers & providers of content)

Lessons learned from Napster:

It is feasible to develop a large-scale P2P service

It can scale resources to meet demand, exploiting locality

Limitations:

Consistency between replicas was not strong, but for music files this requirement is not very demanding anyway

The index servers used to locate resources were a bottleneck

Page 14: Distributed Computing Peer to Peer Computing Chapter 10: PEER TO PEER SYSTEMS.

Gnutella Completely decentralized – no servers with catalogs

Shares any type of file

A Gnutella node is called a SERVENT (SERVer + cliENT); it:

Issues queries and views search results

Accepts queries from other SERVENTs, checks for matches against its local database and responds with the corresponding results

Page 15: Distributed Computing Peer to Peer Computing Chapter 10: PEER TO PEER SYSTEMS.

Gnutella (cont) Joining the network:

The new node connects to a well-known SERVENT

It then sends a PING message to discover other nodes

PONG messages are sent in reply by hosts offering connections with the new node

Direct connections are then made

Page 16: Distributed Computing Peer to Peer Computing Chapter 10: PEER TO PEER SYSTEMS.

Gnutella (cont) Searching for a file:

A node broadcasts its QUERY to all its peers, who in turn broadcast it to their peers

Nodes route QUERYHIT messages containing the location details back along the QUERY path to the sender

To download a file, a direct connection is made using the host details in the QUERYHIT message

Page 17: Distributed Computing Peer to Peer Computing Chapter 10: PEER TO PEER SYSTEMS.

Gnutella (cont) Gnutella broadcasts its messages. To prevent flooding, a TTL (time to live) is attached to each message. To prevent forwarding the same message twice, each servent maintains a list of recently seen messages.

Page 18: Distributed Computing Peer to Peer Computing Chapter 10: PEER TO PEER SYSTEMS.

Gnutella (cont)

[Figure: joining a Gnutella network via GnuCache]

(1) User A connects to the GnuCache to get the list of available servents already connected in the network
(2) GnuCache sends the list back to user A
(3) User A sends the request message GNUTELLA CONNECT to user B
(4) User B replies with a GNUTELLA OK message, granting user A permission to join the network

Page 19: Distributed Computing Peer to Peer Computing Chapter 10: PEER TO PEER SYSTEMS.

Gnutella (cont) Typical query scenario (see the sketch below):

A sends a query message to its neighbour B

B first checks that the message is not an old one

Then it checks for a match against its local data

If there is a match, it sends a QUERYHIT message back to A

Otherwise B decrements the TTL by 1 and forwards the query message to C, D, and E

C, D, and E perform the same steps as B and forward the query message further to F, G, H, and I
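To make the flooding behaviour concrete, here is a minimal Python sketch of the query handling described above (the class and method names are our own illustration, not Gnutella's actual wire protocol): each servent keeps a set of recently seen message IDs, checks its local files, and otherwise decrements the TTL and forwards the query to its neighbours; QUERYHITs are modelled as return values travelling back along the query path.

import uuid

class Servent:
    """Minimal sketch of a Gnutella-style servent (illustrative only)."""

    def __init__(self, name, files):
        self.name = name
        self.files = set(files)      # locally shared file names
        self.neighbours = []         # directly connected servents
        self.seen = set()            # recently seen message IDs (duplicate suppression)

    def query(self, keyword, ttl=7):
        """Originate a QUERY and flood it to all neighbours."""
        msg_id = uuid.uuid4().hex
        hits = []
        for n in self.neighbours:
            hits += n.handle_query(msg_id, keyword, ttl, path=[self])
        return hits

    def handle_query(self, msg_id, keyword, ttl, path):
        if msg_id in self.seen:          # already handled this message: drop it
            return []
        self.seen.add(msg_id)
        hits = []
        if keyword in self.files:        # local match: a QUERYHIT goes back along 'path'
            hits.append((self.name, keyword))
        if ttl > 1:                      # decrement TTL and flood onwards
            for n in self.neighbours:
                if n not in path:
                    hits += n.handle_query(msg_id, keyword, ttl - 1, path + [self])
        return hits

# Example: A -- B -- C, where only C shares the file.
a, b, c = Servent("A", []), Servent("B", []), Servent("C", ["song.mp3"])
a.neighbours, b.neighbours = [b], [a, c]
print(a.query("song.mp3"))   # -> [('C', 'song.mp3')]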

Page 20: Distributed Computing Peer to Peer Computing Chapter 10: PEER TO PEER SYSTEMS.

Gnutella (cont) Problems

Broadcast messages congest the network

Loss of reply packets (dynamic environment)

Page 21: Distributed Computing Peer to Peer Computing Chapter 10: PEER TO PEER SYSTEMS.

FastTrack A hybrid between centralized and decentralized; it has 2 tiers of control (see the sketch below):

Ordinary nodes that connect to super nodes in a centralized fashion

Super nodes that connect to each other in a decentralized manner
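A minimal sketch of the two-tier structure just described, using made-up class names (not FastTrack's real protocol): ordinary nodes register their file lists with a super node (centralized tier), while a query is answered from the super node's index and flooded to other super nodes (decentralized tier).

class SuperNode:
    """Sketch of a FastTrack-style super node (illustrative only)."""

    def __init__(self):
        self.index = {}          # file name -> set of ordinary nodes that registered it
        self.peers = []          # other super nodes (decentralised tier)

    def register(self, node, files):
        # Ordinary nodes upload their file lists to their super node (centralised tier).
        for f in files:
            self.index.setdefault(f, set()).add(node)

    def search(self, keyword, ttl=2, seen=None):
        # Answer from the local index, then flood the query to other super nodes.
        seen = seen if seen is not None else set()
        seen.add(self)
        hits = set(self.index.get(keyword, set()))
        if ttl > 0:
            for sn in self.peers:
                if sn not in seen:
                    hits |= sn.search(keyword, ttl - 1, seen)
        return hits

class OrdinaryNode:
    def __init__(self, name, files, super_node):
        self.name = name
        super_node.register(self, files)   # connect to a super node and publish file list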

Page 22: Distributed Computing Peer to Peer Computing Chapter 10: PEER TO PEER SYSTEMS.

FastTrack (cont)

Page 23: Distributed Computing Peer to Peer Computing Chapter 10: PEER TO PEER SYSTEMS.

FastTrack (cont) Joining the network? – via a bootstrapping node. Querying?

Problems (like Gnutella): broadcast messages between super nodes; loss of reply packets

Page 24: Distributed Computing Peer to Peer Computing Chapter 10: PEER TO PEER SYSTEMS.

Some key issues Scalability

Networks can grow to millions of nodes

Challenge in achieving efficient peer and resource discovery

High amount of query/response traffic

Availability

Potential for commercial content provision; such services require high availability and accessibility

Anonymity

What is the right level of anonymity?

Page 25: Distributed Computing Peer to Peer Computing Chapter 10: PEER TO PEER SYSTEMS.

Some key issues (cont) Security

Due to the open nature of P2P, the environment has to be assumed hostile

Concerns include: privacy and anonymity, file authenticity, and threats such as worms and viruses

Fault Resilience

The system must still be able to function even when several important nodes go off-line

Page 26: Distributed Computing Peer to Peer Computing Chapter 10: PEER TO PEER SYSTEMS.

Some key issues (cont) Standards and Interoperability

Lack of standards leads to poor interoperability between applications; this can be improved by using common protocols

Copyright / Access Control

Classic case of Napster being shut down

Other applications have learned to work around the law

Possibility of paid access in future

Page 27: Distributed Computing Peer to Peer Computing Chapter 10: PEER TO PEER SYSTEMS.

Some key issues (cont) Quality of Service (QoS)

The metrics to be used are not clearly defined

Trade-off between achieving QoS and costs

Complexity of Queries

Must be able to support query languages of varying degrees of expressiveness, from simple keywords to SQL-like searches

Search Mechanism

Different search algorithms are used to reduce search time and maximize the search space covered

Page 28: Distributed Computing Peer to Peer Computing Chapter 10: PEER TO PEER SYSTEMS.

Some key issues (cont) Load Balancing

Existence of hot-spots (overloaded nodes) due to: uneven node distribution throughout the logical space; uneven object distribution among nodes; uneven demand distribution among objects; query and routing hot-spots

Self-organization

Ability to adapt itself to the dynamic nature of the Internet

Depends on the architecture of the system

Page 29: Distributed Computing Peer to Peer Computing Chapter 10: PEER TO PEER SYSTEMS.

P2P Middleware - GUID Resources are identified by a Globally Unique Identifier (GUID)

A GUID is derived from a secure hash of (part of) the resource's state; the hash makes the resource self-certifying, because a client receiving the resource can check the hash (see the sketch below)

This requires that the states of resources are immutable, so P2P systems are inherently best suited to the storage of immutable objects such as music files and images

Sharing of mutable objects can be managed by a set of trusted servers that manage the sequence of versions, e.g. OceanStore, Ivy – more in Section 10.6
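A minimal sketch of the self-certifying property, assuming a SHA-1 hash of the object's immutable content (real systems vary in the hash function and GUID length used):

import hashlib

def guid_for(content: bytes) -> str:
    """Derive a GUID from an object's (immutable) content with a secure hash."""
    return hashlib.sha1(content).hexdigest()          # 160-bit GUID, hex-encoded

def verify(content: bytes, guid: str) -> bool:
    """A client receiving 'content' under 'guid' recomputes the hash to check
    that the object was not tampered with -- this is what makes it self-certifying."""
    return guid_for(content) == guid

data = b"some immutable music file bytes"
g = guid_for(data)
assert verify(data, g)                 # content matches its GUID
assert not verify(data + b"x", g)      # any change to the content breaks the GUID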

Page 30: Distributed Computing Peer to Peer Computing Chapter 10: PEER TO PEER SYSTEMS.

Overlay routing vs IP routing

IP vs application-level routing overlay:

Scale
  IP: IPv4 is limited to 2^32 addressable nodes. The IPv6 name space is much more generous (2^128), but addresses in both versions are hierarchically structured and much of the space is pre-allocated according to administrative requirements.
  Overlay: Peer-to-peer systems can address more objects. The GUID name space is very large and flat (> 2^128), allowing it to be much more fully occupied.

Load balancing
  IP: Loads on routers are determined by network topology and associated traffic patterns.
  Overlay: Object locations can be randomized and hence traffic patterns are divorced from the network topology.

Network dynamics (addition/deletion of objects/nodes)
  IP: Routing tables are updated asynchronously on a best-efforts basis with time constants on the order of 1 hour.
  Overlay: Routing tables can be updated synchronously or asynchronously with fractions-of-a-second delays.

Fault tolerance
  IP: Redundancy is designed into the IP network by its managers, ensuring tolerance of a single router or network connectivity failure. n-fold replication is costly.
  Overlay: Routes and object references can be replicated n-fold, ensuring tolerance of n failures of nodes or connections.

Target identification
  IP: Each IP address maps to exactly one target node.
  Overlay: Messages can be routed to the nearest replica of a target object.

Security and anonymity
  IP: Addressing is only secure when all nodes are trusted. Anonymity for the owners of addresses is not achievable.
  Overlay: Security can be achieved even in environments with limited trust. A limited degree of anonymity can be provided.

Page 31: Distributed Computing Peer to Peer Computing Chapter 10: PEER TO PEER SYSTEMS.

Peer-to-Peer Middleware A key problem in P2P application design:

"Provide a mechanism to enable clients to access data resources quickly & dependably wherever they are located throughout the network"

Napster used a unified index

Second-generation P2P file-sharing systems such as Gnutella & Freenet employ partitioned & distributed indexes, but the algorithms used are specific to each system

Page 32: Distributed Computing Peer to Peer Computing Chapter 10: PEER TO PEER SYSTEMS.

P2P Middleware P2P middleware is designed specifically for the placement & subsequent location of the distributed objects managed by P2P systems

Functional Requirements

Simplify the construction of distributed services

Locate resources and communicate with resource providers & consumers

Add & remove resources at will, at any time

Provide an API for P2P programmers

Non-Functional Requirements

Global scalability

Load balancing

Optimization for local interaction between neighbouring peers

Accommodating highly dynamic host availability

Security of data in an environment with heterogeneous trust

Anonymity, deniability and resistance to censorship

Page 33: Distributed Computing Peer to Peer Computing Chapter 10: PEER TO PEER SYSTEMS.

Routing Overlay In P2P we cannot maintain a database at every client node giving the location of all the resources

Knowledge of resource locations must be partitioned and distributed

Each node is made responsible for maintaining detailed knowledge of the locations of nodes and objects in a portion of the namespace, as well as general knowledge of the topology of the entire namespace

A high degree of replication of this knowledge is necessary to ensure dependability in the face of the volatile availability of hosts and intermittent network connectivity.

Page 34: Distributed Computing Peer to Peer Computing Chapter 10: PEER TO PEER SYSTEMS.

Distribution of information in a routing overlay

[Figure: each of nodes A, B, C and D maintains routing knowledge for the objects and nodes in its own portion of the namespace]

The routing overlay takes responsibility for locating nodes and objects

Page 35: Distributed Computing Peer to Peer Computing Chapter 10: PEER TO PEER SYSTEMS.

Routing Overlay Ensures that any node can access any object by routing each request through a sequence of nodes, exploiting knowledge at each of them to locate the destination object

It also maintains knowledge of the location of all the replicas of an object and delivers requests to the nearest live node holding one

The GUIDs used to identify nodes and objects are an example of "pure names" or opaque identifiers: they reveal nothing about the location of the object

Page 36: Distributed Computing Peer to Peer Computing Chapter 10: PEER TO PEER SYSTEMS.

Tasks of Routing Overlay Main task of the routing overlay

A client wishing to invoke an operation on an object submits a request including the object's GUID to the routing overlay, which routes the request to a node at which a replica of the object resides

Other tasks of the routing overlay

A node wishing to make a new object available to a P2P service computes a GUID for the object and announces it to the routing overlay, which then ensures that the object is reachable by all other clients.

When clients request the removal of an object from the service, the routing overlay must make it unavailable.

Nodes may join or leave the service: when a node joins, the routing overlay arranges for it to assume some of the responsibilities of other nodes; when a node leaves, its responsibilities are distributed amongst the other nodes.

Page 37: Distributed Computing Peer to Peer Computing Chapter 10: PEER TO PEER SYSTEMS.

Basic programming interface for a distributed hash table (DHT) as implemented by the PAST API over Pastry

put(GUID, data) – the data is stored in replicas at all nodes responsible for the object identified by GUID.

remove(GUID) – deletes all references to GUID and the associated data.

value = get(GUID) – the data associated with GUID is retrieved from one of the nodes responsible for it.
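A toy, single-process stand-in for this put/get/remove contract (a real DHT such as PAST stores the replicas at the nodes whose GUIDs are closest to the object's GUID; this sketch only illustrates the interface):

class ToyDHT:
    """Single-process stand-in for a PAST-style DHT interface (illustrative only)."""

    def __init__(self):
        self.store = {}                    # GUID -> data; a real DHT replicates this
                                           # at the nodes responsible for each GUID
    def put(self, guid, data):
        self.store[guid] = data            # store replicas at all responsible nodes

    def get(self, guid):
        return self.store.get(guid)        # retrieve from one responsible node

    def remove(self, guid):
        self.store.pop(guid, None)         # delete all references to GUID and its data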

Page 38: Distributed Computing Peer to Peer Computing Chapter 10: PEER TO PEER SYSTEMS.

Basic programming interface for distributed object location and routing (DOLR) as implemented by Tapestry

publish(GUID) – GUID can be computed from the object (or some part of it, e.g. its name). This function makes the node performing the publish operation the host for the object corresponding to GUID.

unpublish(GUID) – makes the object corresponding to GUID inaccessible.

sendToObj(msg, GUID, [n]) – following the object-oriented paradigm, an invocation message is sent to an object in order to access it. This might be a request to open a TCP connection for data transfer or to return a message containing all or part of the object's state. The final optional parameter [n], if present, requests the delivery of the same message to n replicas of the object.
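For contrast, a toy stand-in for the DOLR interface (again illustrative only): the overlay records which hosts have published each GUID and routes messages to them, rather than storing the data itself. The handle() method on hosts is a hypothetical callback invented for this sketch.

class ToyDOLR:
    """Single-process stand-in for a Tapestry-style DOLR interface (illustrative only)."""

    def __init__(self):
        self.hosts = {}                              # GUID -> list of publishing hosts

    def publish(self, guid, host):
        self.hosts.setdefault(guid, []).append(host)

    def unpublish(self, guid, host):
        if host in self.hosts.get(guid, []):
            self.hosts[guid].remove(host)

    def send_to_obj(self, msg, guid, n=1):
        # Deliver msg to n replicas of the object; each host decides how to respond.
        return [host.handle(msg, guid) for host in self.hosts.get(guid, [])[:n]]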

Page 39: Distributed Computing Peer to Peer Computing Chapter 10: PEER TO PEER SYSTEMS.

Overlay Case Study: Pastry All nodes & objects are assigned 128-bit GUIDs

For nodes: computed by applying a secure hash function to the node's public key

For objects: computed by applying a secure hash function to the object's name or some part of its stored state

The resulting GUIDs are randomly distributed in the range 0 to 2^128 – 1

They provide no clue as to how the values were computed, and clashes between GUIDs for different nodes or objects are extremely unlikely; Pastry can nevertheless detect & manage this unlikely event

In a network with N participating nodes, the Pastry algorithm will correctly route a message addressed to any GUID in O(log N) steps

If the GUID identifies a node that is currently active, the message is delivered to that node; otherwise it is delivered to the active node with the numerically closest GUID (see the sketch below)
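A small sketch of the "numerically closest GUID" delivery rule, treating the GUID space as circular so that 0 and 2^128 – 1 are adjacent (the helper names are ours, not Pastry's):

GUID_SPACE = 2 ** 128

def circular_distance(a: int, b: int) -> int:
    """Distance between two GUIDs on the circular 0 .. 2^128 - 1 space."""
    d = abs(a - b)
    return min(d, GUID_SPACE - d)

def closest_live_node(target_guid: int, live_node_guids):
    """The node a message for target_guid is finally delivered to: the target
    itself if it is live, otherwise the numerically closest live node."""
    return min(live_node_guids, key=lambda n: circular_distance(n, target_guid))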

Page 40: Distributed Computing Peer to Peer Computing Chapter 10: PEER TO PEER SYSTEMS.

Pastry Routing steps use an underlying transport protocol (normally UDP) to transfer the message to a Pastry node that is closer to its destination

Closeness here refers to an entirely artificial space – the space of GUIDs; the real transport of a message across the Internet between two Pastry nodes may require many IP hops

For better paths, Pastry uses a locality metric based on network distance in the underlying network (hop count, round-trip latency) to select appropriate neighbours when setting up the routing tables used at each node

Pastry is fully self-organizing: new nodes obtain information from neighbours to construct their tables, and nodes can detect the absence of a node and update their tables accordingly

Page 41: Distributed Computing Peer to Peer Computing Chapter 10: PEER TO PEER SYSTEMS.

Pastry: Routing Algo Explained in two stages

Stage 1: a simplified form of the algorithm which routes messages correctly but inefficiently, without a routing table

Stage 2: the full routing algorithm, which routes a request to any node in O(log N) messages

Page 42: Distributed Computing Peer to Peer Computing Chapter 10: PEER TO PEER SYSTEMS.

Figure 10.6: Circular routing alone is correct but inefficient Based on Rowstron and Druschel [2001]

The dots depict live nodes. The space is treated as circular: node 0 is adjacent to node 2^128 – 1. The diagram illustrates the routing of a message from node 65A1FC to D46A1C using leaf set information alone, assuming leaf sets of size 8 (l = 4). This is a degenerate style of routing that would scale very poorly; it is not used in practice.

[Figure: nodes 65A1FC, D13DA3, D467C4, D471F1 and D46A1C placed on the circular GUID space from 0 to FFFFF....F (2^128 – 1)]

Page 43: Distributed Computing Peer to Peer Computing Chapter 10: PEER TO PEER SYSTEMS.

Pastry: Routing Algo Each Pastry node maintains a tree-structured routing table giving GUIDs and IP addresses for a set of nodes spread throughout the entire range of 2^128 possible GUID values, with increased density of coverage for GUIDs numerically close to its own

Fig 10.7 shows the structure of the routing table

Fig 10.8 illustrates the actions of the routing algorithm

Page 44: Distributed Computing Peer to Peer Computing Chapter 10: PEER TO PEER SYSTEMS.

Routing tables are structured as follows:

GUIDs are viewed as hexadecimal values & the table classifies GUIDs based on their hexadecimal prefixes

The table has as many rows as there are hexadecimal digits in a GUID, so for the prototype described here there are 128/4 = 32 rows

Each row contains 15 entries – one for each possible value of the next hexadecimal digit, excluding the value in the local node's own GUID (see the sketch below)
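A small sketch (our own helper, not Pastry code) of how a remote GUID's position in the routing table follows from this structure: the row is the length of the longest common hexadecimal prefix with the local GUID, and the column is the next hexadecimal digit of the remote GUID; because that digit always differs from the local node's own digit at that position, only 15 columns are needed per row.

def table_position(local_guid: str, remote_guid: str):
    """Return (row, column) of remote_guid in local_guid's routing table.
    GUIDs are hexadecimal strings (32 digits in full Pastry; truncated here
    as in the figures)."""
    p = 0
    while p < len(local_guid) and local_guid[p] == remote_guid[p]:
        p += 1                              # p = longest common prefix length
    if p == len(local_guid):
        return p, None                      # identical GUIDs: no entry needed
    return p, int(remote_guid[p], 16)       # row p, column = (p+1)-th hex digit

# From the routing example: 65A1FC vs D46A1C share no prefix.
print(table_position("65A1FC", "D46A1C"))   # -> (0, 13): row 0, column D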

Page 45: Distributed Computing Peer to Peer Computing Chapter 10: PEER TO PEER SYSTEMS.

Figure 10.7: First four rows of a Pastry routing table

Page 46: Distributed Computing Peer to Peer Computing Chapter 10: PEER TO PEER SYSTEMS.

Figure 10.8: Pastry routing example Based on Rowstron and Druschel [2001]

[Figure: nodes 65A1FC, D13DA3, D4213F, D462BA, D467C4, D471F1 and D46A1C on the circular GUID space from 0 to FFFFF....F (2^128 – 1), showing the routing hops]

Routing a message from node 65A1FC to D46A1C: with the aid of a well-populated routing table, the message can be delivered in ~log16(N) hops.

Page 47: Distributed Computing Peer to Peer Computing Chapter 10: PEER TO PEER SYSTEMS.

Figure 10.9: Pastry’s routing algorithm

To handle a message M addressed to a node D (where R[p,i] is the element at column i, row p of the routing table):

1. If (L-l < D < Ll) { // the destination is within the leaf set or is the current node
2.   Forward M to the element Li of the leaf set with GUID closest to D, or to the current node A.
3. } else { // use the routing table to despatch M to a node with a closer GUID
4.   Find p, the length of the longest common prefix of D and A, and i, the (p+1)th hexadecimal digit of D.
5.   If (R[p,i] ≠ null) forward M to R[p,i] // route M to a node with a longer common prefix
6.   else { // there is no entry in the routing table
7.     Forward M to any node in L or R with a common prefix of length p, but a GUID that is numerically closer.
8.   }
9. }

Page 48: Distributed Computing Peer to Peer Computing Chapter 10: PEER TO PEER SYSTEMS.

Pastry's routing algorithm The algorithm will succeed in delivering message M to its destination because lines 1, 2 & 7 perform the actions described in stage 1

The remaining steps are designed to improve the algorithm's performance by reducing the number of hops required (see the sketch below)
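A hedged Python sketch of the routing decision in Figure 10.9, under simplifying assumptions of our own: GUIDs are integers, the leaf set L is a sorted list that includes the current node A, the routing table R is a list of rows indexed by hexadecimal digit, and wrap-around of the circular GUID space is ignored.

def hex32(guid: int) -> str:
    """128-bit GUID rendered as 32 hexadecimal digits."""
    return format(guid, "032x")

def common_prefix_len(a: str, b: str) -> int:
    p = 0
    while p < len(a) and a[p] == b[p]:
        p += 1
    return p

def route(D, A, L, R):
    """Return the GUID of the node to which A forwards a message addressed to D.
    L: A's leaf set as a sorted list of GUIDs (integers), including A itself.
    R: A's routing table, R[p][i] = GUID of a known node sharing a p-digit hex
       prefix with A whose (p+1)-th digit is i, or None."""
    if L[0] <= D <= L[-1]:
        # Destination lies within the leaf set: deliver to the numerically
        # closest member (lines 1-2).
        return min(L, key=lambda g: abs(g - D))
    p = common_prefix_len(hex32(A), hex32(D))
    i = int(hex32(D)[p], 16)
    if R[p][i] is not None:
        return R[p][i]                     # a node with a longer common prefix (line 5)
    # Rare case, no suitable routing-table entry (line 7): fall back to any known
    # node numerically closer to D than A is; the real algorithm additionally
    # requires a common prefix of length at least p.
    known = L + [g for row in R for g in row if g is not None]
    return min(known, key=lambda g: abs(g - D))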

Page 49: Distributed Computing Peer to Peer Computing Chapter 10: PEER TO PEER SYSTEMS.

Host Integration The new node computes its GUID and contacts a nearby node (how does it obtain the address of a nearby node?)

Let X be the new node; it sends a join request to A

A dispatches the join request as a normal message to the node numerically closest to X, using the Pastry algorithm; call that node Z

A, Z and all the nodes (B, C, ...) through which the message was routed to Z add the relevant parts of their routing tables and leaf sets to the reply sent to X

X examines this information and constructs its own routing table & leaf set, requesting additional information from other nodes if needed

X's leaf set should be very similar to Z's leaf set, since Z is its numerical neighbour

Once X's routing table is constructed, it sends its leaf set and routing table entries to the nodes concerned so that they can update their own tables (a sketch follows)
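A rough sketch of the table construction just described; routing_table_row() and leaf_set() are hypothetical accessors invented for this example, not Pastry's real API.

def build_tables_for_new_node(route_path):
    """Illustrative sketch of node arrival.
    route_path: the nodes A, B, C, ..., Z traversed by X's join request, ending
    at Z, the live node whose GUID is numerically closest to X's."""
    routing_table = {}
    for p, node in enumerate(route_path):
        # The p-th node on the route shares (roughly) a p-digit prefix with X,
        # so its row p is a good starting point for row p of X's table.
        routing_table[p] = node.routing_table_row(p)   # hypothetical accessor
    leaf_set = route_path[-1].leaf_set()               # Z's leaf set approximates X's
    # X then sends its new leaf set and routing-table entries to the nodes
    # concerned so that they can update their own tables.
    return routing_table, leaf_set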

Page 50: Distributed Computing Peer to Peer Computing Chapter 10: PEER TO PEER SYSTEMS.

RT updates Node failure or departure

A node is considered failed when its immediate neighbours are unable to contact it

The node that discovers the failure looks up the next nearest live node and requests its leaf set

This leaf set overlaps the failed node's leaf set; the discovering node chooses the best node from it to replace the failed node (see the sketch below)
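A rough sketch of this repair step; leaf_set() and the guid attribute are hypothetical names for this example, not Pastry's real API.

def repair_leaf_set(my_leaf_set, failed, next_nearest_live):
    """When a leaf-set member fails, ask the nearest live node on that side of
    the failed node for its leaf set (which overlaps the failed node's) and
    choose the best replacement from it."""
    borrowed = next_nearest_live.leaf_set()            # hypothetical accessor
    candidates = [n for n in borrowed if n not in my_leaf_set and n is not failed]
    replacement = min(candidates, key=lambda n: abs(n.guid - failed.guid))
    my_leaf_set[my_leaf_set.index(failed)] = replacement
    return my_leaf_set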

Page 51: Distributed Computing Peer to Peer Computing Chapter 10: PEER TO PEER SYSTEMS.

Self study: Locality; Fault tolerance; Dependability – MS Pastry; Evaluation of MS Pastry

Page 52: Distributed Computing Peer to Peer Computing Chapter 10: PEER TO PEER SYSTEMS.

Thanks

Dr. Raihan! Remember & keep up your promise.

