C. Yang (Oct. 30): Searching in Peer-to-Peer Networks

Searching In Peer-To-Peer Networks

Chunlin Yang

What’s P2P - Unofficial Definition

• All of the computers in the network are equal

• Each computer functions as a client as well as a server with no administrator

• The user of each computer decides which of its data will be shared on the network

What’s P2P - Continued

• To share huge volumes of data among peers in the network

• No dedicated servers or hierarchy among the computers in the network

• Examples: Gnutella, Freenet, and Napster

Why P2P

• Three fundamental Internet assets: information, bandwidth, and storage space

• Information: as its volume grows, finding useful information in real time becomes increasingly difficult

• Bandwidth: much has been done to increase it, yet hot sites such as Yahoo and eBay attract more and more traffic and become bottlenecks

Why P2P - Continued

• Computing resources: processor speeds and storage capacities keep increasing, yet data centers accumulate more and more computation tasks

• P2P networking can greatly improve the utilization of Internet resources

Why P2P - Continued

• Balance traffic to reduce the peak load on the network

• Increase the reliability and fault tolerance of the global system

• Tolerate server downtime, e.g., in email delivery, or slice a large email into small packets and transfer them over multiple paths

Basic Searching Algorithms

• Gnutella: BFS

• Freenet: DFS

• Napster: Index Server

Basic Searching Algorithm Gnutella

• Each node of the network simultaneously acts as a client as well as a server

• Conducts its own searches while listening for incoming queries

• Completely decentralized, every node is equal

Basic Searching Algorithm Gnutella - Continued

• A node sends the query to all its neighbors; each neighbor searches its own resources and forwards the message to all of its own neighbors

• If a query is satisfied, a response will be sent back to the original requester using the reverse path

Basic Searching Algorithm Gnutella - Continued

• Queries are assigned GUIDs so that a node does not process the same query twice

• A TTL of 7 (reaching about 10,000 nodes) keeps the flood from congesting the network

• Problem: queries can travel in cycles, and flooding causes excessive traffic
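The Gnutella flooding described above can be sketched as a minimal Python simulation, assuming the overlay is an adjacency dict and each node's shared files are a set. The function name gnutella_flood and its parameters are hypothetical. The sketch shows the GUID check that drops duplicates, the TTL limit, and the reverse path a hit travels back along.

```python
from collections import deque
import uuid


# Hypothetical sketch: names and data layout are assumptions, not from the slides.
def gnutella_flood(adj, data, source, wanted, ttl=7):
    """Flood a query breadth-first; return (hits, messages_sent).

    Each hit is the reverse path [holder, ..., source] the response would follow.
    """
    guid = uuid.uuid4().hex            # every query carries a unique GUID
    seen = {source: {guid}}            # GUIDs each node has already processed
    parent = {source: None}            # neighbor the query arrived from
    hits, messages = [], 0
    queue = deque([(source, ttl)])
    while queue:
        node, t = queue.popleft()
        if node != source and wanted in data.get(node, ()):
            path = [node]              # walk parents back to the requester
            while parent[path[-1]] is not None:
                path.append(parent[path[-1]])
            hits.append(path)
        if t == 0:
            continue                   # TTL exhausted: stop forwarding here
        for nb in adj[node]:
            if nb == parent[node]:
                continue               # do not send the query straight back
            messages += 1
            if guid in seen.setdefault(nb, set()):
                continue               # duplicate GUID: drop it (breaks cycles)
            seen[nb].add(guid)
            parent[nb] = node
            queue.append((nb, t - 1))
    return hits, messages
```

Even with duplicate suppression, messages_sent counts every copy that crosses a link, which is the excessive traffic the last bullet refers to.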

Basic Searching Algorithm Freenet

• Cooperative file distribution that improves document distribution efficiency by sharing bandwidth and disk space

• Each file is identified by a unique ID and associated with its locations

• Network of equal nodes, each acting as client and server

Basic Searching Algorithm Freenet - Continued

• Information is stored on hosts under searchable keys

• Uses a depth-first search with depth limit D: each node forwards the query to a single neighbor and waits for a definite response from that neighbor

• If the query was not satisfied, the node forwards the query to another neighbor

Basic Searching Algorithm Freenet - Continued

• If the query was satisfied, the response is sent back to the query source along the reverse path

• Each node along the path also copies the data into its own database

• As a result, more popular information becomes easier to access
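A minimal Python sketch of the Freenet-style depth-first search above, assuming each node's database is a dict from key to data. freenet_search and its arguments are hypothetical names, and a shared visited set stands in for Freenet's own loop handling. On success, each node on the way back stores a copy, which is what makes popular data easier to reach.

```python
# Hypothetical sketch: names and data layout are assumptions, not from the slides.
def freenet_search(adj, store, node, key, depth_limit, visited=None):
    """Depth-first search with a depth limit; return the data or None.

    store maps node -> {key: data}. Nodes on the successful path cache the data.
    """
    if visited is None:
        visited = set()
    visited.add(node)
    if key in store[node]:
        return store[node][key]
    if depth_limit == 0:
        return None                        # depth limit D reached: report failure
    for nb in adj[node]:                   # one neighbor at a time, wait for its answer
        if nb in visited:
            continue
        data = freenet_search(adj, store, nb, key, depth_limit - 1, visited)
        if data is not None:
            store[node][key] = data        # copy into this node's database on the way back
            return data
    return None                            # every neighbor failed
```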

Basic Searching Algorithm Napster

• A centralized server keeps a database of online users and song locations for quick search

• Once the server returns a song’s location, the client downloads the song directly from that peer (peer-to-peer file transfer)

• Legal problem: ignores copyright

• Technical problem: the same issues as client-server systems; the index server is a bottleneck, and the service fails if it goes down
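The centralized index can be sketched in a few lines of Python, assuming the server simply maps song titles to the set of online peers sharing them. The IndexServer class and its methods are hypothetical. Only the lookup goes through the server; the download itself would be a direct peer-to-peer transfer and is not shown.

```python
from collections import defaultdict


# Hypothetical sketch: class and method names are assumptions, not Napster's protocol.
class IndexServer:
    """Central Napster-style index: song title -> peers that share it."""

    def __init__(self):
        self.index = defaultdict(set)      # song -> set of peer IDs
        self.online = set()                # peers currently logged in

    def login(self, peer, songs):
        self.online.add(peer)
        for song in songs:
            self.index[song].add(peer)

    def logout(self, peer):
        self.online.discard(peer)
        for peers in self.index.values():
            peers.discard(peer)

    def search(self, song):
        """Online peers holding the song; the client then downloads from one directly."""
        return sorted(self.index.get(song, set()) & self.online)
```

Every search hits the same object, which is exactly the bottleneck and single point of failure noted above.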

Improving Search Algorithms in Peer-to-Peer Networks

• Iterative Deepening

• Directed BFS

• Local Indices

• Routing Indices

• NEVRLATE

Iterative Deepening

• Multiple breadth-first searches are initiated with successively larger depth limits, until either the query is satisfied or the maximum depth has been reached

• Example: policy P(a,b,c) uses depth a in the first iteration, depth b in the second, and depth c in the third

Iterative Deepening - Continued

• A source node S first initiates a BFS of depth a. When a node at depth a receives and processes the query, it stores the query temporarily

• The query is thus frozen at all nodes a hops from the source

• S receives response messages from the nodes that have processed the query

Iterative Deepening - Continued

• After a predefined waiting period W, if the query has been satisfied, S does nothing

• Otherwise, S starts another iteration by initiating a BFS of depth b

• S sends a resend message with a TTL of a; nodes only forward the resend message until it reaches the nodes at a hops

Iterative Deepening - Continued

• A node at hop a drops the resend message and unfreezes the corresponding query by forwarding it to all its neighbors with a TTL of b-a

• When the query reaches nodes at hop b, the process continues in a similar fashion

• At the final depth c, the query is not frozen, and S does not initiate another iteration even if the query is not satisfied. Problem?
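A Python simulation of the policy P(a,b,c), assuming the overlay is an adjacency dict. iterative_deepening, want_results and the other names are hypothetical, and the waiting period W and the resend message are represented only by resuming from the frozen frontier. Each round extends the BFS from the nodes holding the frozen query, so nodes closer to S are not processed again.

```python
# Hypothetical sketch: names and the satisfaction rule (want_results) are assumptions.
def iterative_deepening(adj, data, source, wanted, policy, want_results=1):
    """Simulate iterative deepening under policy = (a, b, c, ...).

    Returns (results, depth) where depth is the limit of the last round run.
    """
    depth = {source: 0}
    frontier = [source]                    # nodes currently holding the frozen query
    results = []
    for limit in policy:
        # Resume the BFS from the frozen frontier until it reaches depth `limit`.
        while frontier and depth[frontier[0]] < limit:
            next_frontier = []
            for v in frontier:
                for nb in adj[v]:
                    if nb not in depth:
                        depth[nb] = depth[v] + 1
                        next_frontier.append(nb)
                        if wanted in data.get(nb, ()):
                            results.append(nb)
            frontier = next_frontier
        if len(results) >= want_results:
            return results, limit          # satisfied: S starts no further round
        # Not satisfied: after waiting W, S resends and the frozen nodes continue.
    return results, policy[-1]             # final depth reached: give up
```

The last return is the "Problem?" case: the search stops at depth c whether or not anything was found.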

Directed BFS

• A node sends the query to a subset of its neighbors that are likely to return many results with minimal response time

• A node maintains simple statistics on its neighbors, such as the results of past queries or the latency of the connection with that neighbor

• From these statistics, several heuristics can be used to pick the neighbors to send a query to:

Directed BFS - Continued

• The neighbor that returned the highest number of results for previous queries

• The neighbor whose response messages took the lowest average number of hops

• The neighbor that has forwarded the largest number of messages

• The neighbor with the shortest message queue
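The four heuristics can be expressed as sort keys over per-neighbor statistics. This Python sketch assumes the statistics are already collected. NeighborStats, HEURISTICS and directed_neighbors are hypothetical names, and the node would forward the query only to the neighbors returned.

```python
from dataclasses import dataclass


# Hypothetical sketch: field and function names are assumptions, not from the slides.
@dataclass
class NeighborStats:
    results: int = 0        # results returned for previous queries
    avg_hops: float = 0.0   # average hop count of its response messages
    forwarded: int = 0      # messages it has forwarded
    queue_len: int = 0      # current length of its message queue


# Each heuristic maps stats to a sort key where smaller means better.
HEURISTICS = {
    "max_results": lambda s: -s.results,
    "min_hops": lambda s: s.avg_hops,
    "max_forwarded": lambda s: -s.forwarded,
    "short_queue": lambda s: s.queue_len,
}


def directed_neighbors(stats, heuristic, k=1):
    """Return the k best neighbors under one heuristic (ties broken by name)."""
    score = HEURISTICS[heuristic]
    return sorted(stats, key=lambda n: (score(stats[n]), n))[:k]
```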

Local Indices

• Each node n maintains an index over the data of all nodes within r hops of itself

• r is a system-wide variable known as the radius of the index

• When a node receives a query, it can process it on behalf of every node within r hops, so data is searched on fewer nodes, reducing cost while keeping the same satisfaction

Local Indices - Continued

• A system-wide policy specifies the depths at which the query should be processed

• All nodes at depths not listed in the policy simply forward the query

• Example: with P(1,5), only nodes at depths 1 and 5 process the query, while nodes at other depths just forward it

• Reason: each node holds information about the nodes within 4 hops of itself
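A Python sketch of query processing under a local-indices policy, assuming an adjacency dict for the overlay. build_local_index and local_indices_search are hypothetical names. For brevity, each index is built on demand instead of being maintained in advance. A processing node answers for every node within r hops of itself, while the other nodes only forward.

```python
from collections import deque


# Hypothetical sketch: names are assumptions; real nodes keep their index up to date.
def build_local_index(adj, data, node, r):
    """Map item -> set of nodes holding it, over all nodes within r hops of node."""
    dist = {node: 0}
    queue = deque([node])
    index = {}
    while queue:
        v = queue.popleft()
        for item in data.get(v, ()):
            index.setdefault(item, set()).add(v)
        if dist[v] == r:
            continue                       # do not look beyond the index radius
        for nb in adj[v]:
            if nb not in dist:
                dist[nb] = dist[v] + 1
                queue.append(nb)
    return index


def local_indices_search(adj, data, source, wanted, policy, r):
    """BFS to depth max(policy); only nodes at depths in the policy consult an index."""
    dist = {source: 0}
    queue = deque([source])
    found = set()
    while queue:
        v = queue.popleft()
        if dist[v] in policy:              # this node processes the query
            found |= build_local_index(adj, data, v, r).get(wanted, set())
        if dist[v] == max(policy):
            continue                       # last processing depth: stop forwarding
        for nb in adj[v]:                  # other nodes just forward
            if nb not in dist:
                dist[nb] = dist[v] + 1
                queue.append(nb)
    return found
```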

Routing Indices

• Goal: allow a node to select the “best” neighbors to send a query to

• A Routing Index is a data structure, with associated algorithms, that, given a query, returns a list of neighbors ranked according to their goodness for the query

• The goodness should in general reflect the number of documents in nearby nodes.

Routing Indices - Continued

• Each node has a local index for quickly finding local documents when a query is received

• Nodes also keep a Compound Routing Index containing:

• The number of documents along each path

• The number of documents on each topic of interest

Routing Indices Example


Documents with topics (DB, N, T, L) reachable through each path:

Path   #docs   DB    N     T     L
A      150     30    20    0     100
B      100     20    0     10    30
C      1000    0     300   0     50
D      200     100   0     100   150
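One way to turn a compound routing index into a ranking, sketched in Python with the rows of the table above, is to estimate how many documents down each path match every topic in the query, assuming topics are distributed independently: goodness = #docs × Π (topic count / #docs). The estimator and the names goodness and rank_paths are assumptions for illustration. For a query on DB and L, this ranks D (75) ahead of A (20), B (6) and C (0).

```python
# Hypothetical sketch: the independence-based estimator is an assumption.
# Rows of the example table: path -> (#docs, documents per topic).
CRI = {
    "A": (150, {"DB": 30, "N": 20, "T": 0, "L": 100}),
    "B": (100, {"DB": 20, "N": 0, "T": 10, "L": 30}),
    "C": (1000, {"DB": 0, "N": 300, "T": 0, "L": 50}),
    "D": (200, {"DB": 100, "N": 0, "T": 100, "L": 150}),
}


def goodness(total, per_topic, query):
    """Estimated number of documents matching all query topics (independence assumed)."""
    if total == 0:
        return 0.0
    estimate = float(total)
    for topic in query:
        estimate *= per_topic.get(topic, 0) / total
    return estimate


def rank_paths(cri, query):
    """Neighbors (paths) ordered from best to worst goodness for the query."""
    return sorted(cri, key=lambda p: (-goodness(*cri[p], query), p))
```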

Routing Indices - Maintenance

• When a connection is established between two nodes, they exchange their routing indices; each node updates its own index and sends an update message to its neighbors

• When a node I disconnects from the network and its neighbor D detects this, D removes the row for I and sends its new routing index to all its neighbors so they can update theirs
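The update rule can be sketched in Python, assuming each row is a (#docs, per-topic counts) pair and that the index a node advertises to a neighbor is its local counts plus all rows except that neighbor's own row, so a neighbor's documents are not echoed back to it. The functions add_rows, advertise, on_connect and on_disconnect are hypothetical. Each event handler returns the index to send to every current neighbor.

```python
# Hypothetical sketch: row layout and function names are assumptions.
def add_rows(a, b):
    """Sum two (#docs, {topic: count}) rows without mutating either."""
    topics = dict(a[1])
    for topic, count in b[1].items():
        topics[topic] = topics.get(topic, 0) + count
    return (a[0] + b[0], topics)


def advertise(local, ri, to_neighbor):
    """Index this node sends to_neighbor: local counts plus every other row."""
    agg = (local[0], dict(local[1]))       # copy so callers cannot alter `local`
    for nb, row in ri.items():
        if nb != to_neighbor:
            agg = add_rows(agg, row)
    return agg


def on_connect(local, ri, neighbor, their_index):
    """Store the new neighbor's row, then compute updates for all neighbors."""
    ri[neighbor] = their_index
    return {nb: advertise(local, ri, nb) for nb in ri}


def on_disconnect(local, ri, neighbor):
    """Drop the departed neighbor's row, then compute updates for the rest."""
    ri.pop(neighbor, None)
    return {nb: advertise(local, ri, nb) for nb in ri}
```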

NEVRLATE

• Network-Efficient Vast Resource Lookup At The Edge

• Directory servers are organized into a logical 2-dimensional grid, i.e., a set of sets of servers

• This enables registration in one “horizontal” dimension and lookup in the other “vertical” dimension

NEVRLATE - Continued

• Each node is a directory server

• Within each set of servers (a vertical cloud), every member can reach every other member

• The set of sets of servers is the entire NEVRLATE network.


NEVRLATE - Continued

• Each host registers its resources and location with one node in each set

• When a query arrives, only one set needs to be searched to obtain all locations holding the requested information

• A host can also register with two nodes in each set for fault tolerance
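A toy Python model of the grid, assuming each vertical cloud is a list of server IDs and that a host picks its server in each set with a CRC32 hash of its name. The Nevrlate class, the hashing choice and the replicas parameter are hypothetical. Registration touches one server per set (or two for fault tolerance), so a lookup only needs to ask the servers of a single set.

```python
import zlib


# Hypothetical sketch: class, hashing and parameter names are assumptions.
class Nevrlate:
    def __init__(self, sets):
        self.sets = sets                   # list of vertical clouds (lists of server IDs)
        # server -> {resource: set of hosts registered there}
        self.dirs = {s: {} for cloud in sets for s in cloud}

    def register(self, host, resource, replicas=1):
        """Horizontal registration: `replicas` servers in every set."""
        for cloud in self.sets:
            start = zlib.crc32(host.encode()) % len(cloud)
            for i in range(min(replicas, len(cloud))):
                server = cloud[(start + i) % len(cloud)]
                self.dirs[server].setdefault(resource, set()).add(host)

    def lookup(self, resource, cloud_index=0):
        """Vertical lookup: ask every server in a single set."""
        hosts = set()
        for server in self.sets[cloud_index]:
            hosts |= self.dirs[server].get(resource, set())
        return hosts
```

Because every set holds a complete copy of the registrations, any one set answers the query; the cost is that each registration is written once per set.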

Extension

• Total rank of a neighbor: weighted sum of all its key ranks

• Assumption: higher-ranked nodes are always better to access or closer to resources

• Dominating-set marking process: when Rule 1 or Rule 2 removes a node from the DS, remove the node with the lower rank instead of the one with the lower UID

Extension - Continued

• Based on the marking process of Wu & Li, nodes in the connected dominating set have relatively higher connectivity than non-DS nodes

• Dominating-set nodes must keep resource information and resource locations for their neighbor nodes

• During a search, requests are sent only to DS nodes, reducing cost and traffic while maintaining satisfaction
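The rank-based variant of the marking process can be sketched in Python, assuming an undirected graph given as a dict of neighbor sets and a precomputed rank per node. total_rank, marked_ds and the example weights are hypothetical. Marking follows Wu & Li: a node is marked if two of its neighbors are not adjacent. Rules 1 and 2 compare (rank, uid) pairs, so the lower-ranked node is the one unmarked, and the UID only breaks ties.

```python
from itertools import combinations


# Hypothetical sketch: names and the (rank, uid) tie-break are assumptions.
def total_rank(key_ranks, weights):
    """Weighted sum of a node's per-key ranks."""
    return sum(weights[k] * r for k, r in key_ranks.items())


def marked_ds(adj, rank):
    """Marking process with Rules 1 and 2, using rank instead of uid as priority."""
    def key(v):
        return (rank[v], v)                # rank first; uid only breaks ties

    closed = {v: adj[v] | {v} for v in adj}
    # Marking: v is marked if it has two neighbors that are not adjacent.
    marked = {v for v in adj
              if any(u not in adj[w] for u, w in combinations(adj[v], 2))}
    keep = set(marked)
    for v in marked:
        # Rule 1: a marked neighbor u covers N[v] and has a higher priority.
        if any(closed[v] <= closed[u] and key(v) < key(u)
               for u in adj[v] & marked):
            keep.discard(v)
            continue
        # Rule 2: marked neighbors u, w jointly cover N(v); v has the lowest priority.
        for u, w in combinations(adj[v] & marked, 2):
            if adj[v] <= (adj[u] | adj[w]) and key(v) < min(key(u), key(w)):
                keep.discard(v)
                break
    return keep
```

Both rules are evaluated against the marks from the marking step, as in the distributed version, so the result does not depend on the order in which nodes are visited.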


• Clustering: when constructing a cluster, choose the node with the highest rank (instead of the lowest UID) as the clusterhead, and choose the node with the lowest rank as the gateway, since it carries low traffic

• Consider not only a node’s own rank but also the total rank of its neighbors

• Max-min ranking: when searching, choose both the maximum- and the minimum-ranked neighbor for the key index rank

• Reason: the max-ranked neighbor may carry high traffic, while the min-ranked one carries low traffic

• Networks and resources are dynamic, so this helps to re-rank the network over time

• Example: Glades Rd / Palmetto Park Rd (figure with SW and NE directions)
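The clustering and max-min ideas can be sketched in Python, assuming each node's rank has already been computed. rank_clusters, pick_gateway and max_min_neighbors are hypothetical names, and the gateway definition (a non-head member of one cluster with a neighbor in another) is an assumption. Clusterheads are chosen greedily in decreasing rank order, a centralized version of lowest-ID clustering with rank in place of the UID.

```python
# Hypothetical sketch: names and the gateway definition are assumptions.
def rank_clusters(adj, rank):
    """Return node -> clusterhead; higher rank wins (ties: lower uid)."""
    head_of = {}
    for v in sorted(adj, key=lambda n: (-rank[n], n)):
        if v in head_of:
            continue                       # already a member of some cluster
        head_of[v] = v                     # v becomes a clusterhead
        for u in adj[v]:
            head_of.setdefault(u, v)       # unassigned neighbors join v's cluster
    return head_of


def pick_gateway(adj, rank, head_of, h1, h2):
    """Lowest-ranked member of cluster h1 that has a neighbor in cluster h2."""
    candidates = [v for v, h in head_of.items()
                  if h == h1 and v != h1 and any(head_of[u] == h2 for u in adj[v])]
    return min(candidates, key=lambda n: (rank[n], n), default=None)


def max_min_neighbors(adj, key_rank, v):
    """Max-min selection: the highest- and the lowest-ranked neighbor of v."""
    nbrs = sorted(adj[v], key=lambda n: (key_rank[n], n))
    return (nbrs[-1], nbrs[0]) if nbrs else ()
```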

Summary
