Fast Searching in Peer-to-Peer Networks

Self-Organizing Parallel Search Clusters

Rocky Dunlap

Agenda

• Peer-to-peer Networks

• Search Links/Index Links Model

• Parallel Search Clusters

• Self-Organizing Parallel Search Clusters

• Further Research

Peer-to-Peer Networks

• Peer = Client + Server

• Anyone can send/process messages

• Highly Distributed

• Highly Parallel

• Data-centric routing

P2P Networks – Two Types

• Unstructured
– “Loose” network structure
– Requires less control of peers (casual searching)
– Fault tolerance, churn
– Keyword searching

• Structured
– Specific network structure
– Distributed Hash Tables (smart routing)
– Guarantees: bounded hops, bounded state, ability to search entire network

Unstructured Searching

The Problems

• Query saturation – every node processes every query

• Query processing redundancy

• Slow response time from distant nodes

• In reality, cannot search entire network (TTL)

• Need a model for studying P2P networks
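The TTL and redundancy problems above can be illustrated with a small flooding sketch. This is a minimal illustration, not any particular protocol: the `flood` function, the dictionary-based graph, and the 8-node ring are assumptions made for the example. Each node forwards the query to every neighbor except the one it came from, and every forwarded copy is counted as a message.

```python
from collections import deque

def flood(graph, origin, ttl):
    """Flood a query from `origin`; return (nodes reached, messages sent)."""
    reached = {origin}
    messages = 0
    # Each queue entry: (node, hops left, neighbor the query arrived from).
    frontier = deque([(origin, ttl, None)])
    while frontier:
        node, hops_left, sender = frontier.popleft()
        if hops_left == 0:
            continue  # TTL expired: the query dies here
        for neighbor in graph[node]:
            if neighbor == sender:
                continue  # do not send the query straight back
            messages += 1  # counted even if the neighbor has already seen it
            if neighbor not in reached:
                reached.add(neighbor)
                frontier.append((neighbor, hops_left - 1, node))
    return reached, messages

# Hypothetical topology: a ring of 8 peers.
ring = {i: [(i - 1) % 8, (i + 1) % 8] for i in range(8)}

short_reach, short_msgs = flood(ring, origin=0, ttl=2)
full_reach, full_msgs = flood(ring, origin=0, ttl=4)
print(len(short_reach), short_msgs)  # TTL 2 reaches only part of the ring
print(len(full_reach), full_msgs)    # TTL 4 reaches everyone, with one redundant copy
```

With a small TTL part of the ring is never searched; with a TTL large enough to reach everyone, every node processes the query and at least one copy arrives at a node that has already seen it.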

SIL Model

• Search Links (forwarding)

• Index Links (non-forwarding)

SIL Model

Index links provide full coverage

Searches remain inside cluster
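The two link types can be sketched in a few lines. This is a simplified illustration of the idea, assuming a hypothetical `Node` class and `search` function: a query travels only along search links, while an index link lets one node answer on behalf of another without the query ever being forwarded there.

```python
from collections import deque

class Node:
    def __init__(self, name, docs):
        self.name = name
        self.docs = set(docs)    # keywords this node holds locally
        self.search_links = []   # neighbors the query is forwarded to
        self.index = {}          # name -> keywords of nodes indexed here

    def add_index_link(self, other):
        # Index link from `other` to self: self keeps a copy of other's index.
        self.index[other.name] = set(other.docs)

def search(origin, keyword):
    """Forward along search links only; answer from local docs and held indexes."""
    hits, processed = set(), {origin.name}
    queue = deque([origin])
    while queue:
        node = queue.popleft()
        if keyword in node.docs:
            hits.add(node.name)
        hits.update(n for n, kws in node.index.items() if keyword in kws)
        for nxt in node.search_links:
            if nxt.name not in processed:
                processed.add(nxt.name)
                queue.append(nxt)
    return hits, processed

# Two hypothetical clusters: {a, b} and {c, d}.
a, b = Node("a", ["rock"]), Node("b", ["blues"])
c, d = Node("c", ["rock"]), Node("d", ["jazz"])
a.search_links, b.search_links = [b], [a]
c.search_links, d.search_links = [d], [c]
a.add_index_link(c); b.add_index_link(d)   # cluster 1 indexes cluster 2
c.add_index_link(a); d.add_index_link(b)   # cluster 2 indexes cluster 1

jazz_hits, jazz_processed = search(a, "jazz")
rock_hits, rock_processed = search(a, "rock")
```

Here the query for "jazz" is processed only by `a` and `b`, yet it still finds `d` through the index that `b` holds.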

Parallel Search Clusters

• Assumptions
– Keep network essentially unstructured (keyword searching, fault tolerance)
– Search rate is high
– Update rate is low

• Limit the number of nodes that process each query

• Provide full (or high) coverage of network

• Index links allow some nodes to proxy searches for others
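A rough sketch of the trade-off described above, under assumed details: nodes are assigned to clusters round-robin, and the indexes of outside nodes are spread evenly over each cluster's members. The functions `build_clusters` and `coverage` are hypothetical names for this example, not part of the SIL model itself.

```python
def build_clusters(node_ids, k):
    """Split nodes into k clusters and assign each outside node a proxy in every cluster."""
    clusters = [node_ids[i::k] for i in range(k)]
    proxies = []  # proxies[c][m] = outside nodes whose index member m of cluster c holds
    for members in clusters:
        outside = [n for n in node_ids if n not in members]
        assignment = {m: set() for m in members}
        for j, n in enumerate(outside):
            assignment[members[j % len(members)]].add(n)
        proxies.append(assignment)
    return clusters, proxies

def coverage(members, assignment):
    """Nodes whose content a cluster can answer for: its members plus indexed outsiders."""
    covered = set(members)
    for held in assignment.values():
        covered |= held
    return covered

nodes = list(range(12))
clusters, proxies = build_clusters(nodes, k=3)
coverages = [coverage(m, p) for m, p in zip(clusters, proxies)]
```

With 12 nodes and 3 clusters, a query is processed by 4 nodes instead of 12, each of which holds 2 extra indexes, and every cluster still covers the whole network. The cost is that each update has to reach one proxy in every other cluster, which is why the low update rate is assumed.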

The Challenge

• Self-Organizing Parallel Search Clusters

• Decentralized

• Nodes only know a few neighbors

• Dealing with “churn”

• Minimal interruption of normal operations

Proposed Solution

• Existing clusters split into two new clusters

• Advantages
– Solves origin problem (start with one cluster)
– Clusters split autonomously
– Automatic load balancing

• Three-phase approach
– Color
– Replicate Links
– Split

Splitting Cluster

[Figure sequence: a cluster splitting in three phases. Phase 1: Coloring, with Color (radius = 2). Phase 2: Replicate Links. Phase 3: Split.]
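The slides name the three phases but do not spell out their mechanics, so the following is only one possible reading, written as a centralized sketch. Assumptions: coloring marks every node within `radius` hops of an initiator as "red" and the rest as "blue"; link replication copies index entries so that each half can answer for everything the other half could; the split drops search links that cross colors. All function names and the 6-node path topology are hypothetical, and the sketch does not check that the blue half stays connected.

```python
from collections import deque

def color(search_links, initiator, radius):
    """Phase 1: nodes within `radius` hops of the initiator are red, the rest blue."""
    dist = {initiator: 0}
    queue = deque([initiator])
    while queue:
        node = queue.popleft()
        if dist[node] == radius:
            continue
        for nb in search_links[node]:
            if nb not in dist:
                dist[nb] = dist[node] + 1
                queue.append(nb)
    return {n: ("red" if n in dist else "blue") for n in search_links}

def answerable(members, index):
    """Everything a group can answer for: its members plus the indexes they hold."""
    known = set(members)
    for m in members:
        known |= index.get(m, set())
    return known

def replicate_links(colors, index):
    """Phase 2: copy index entries so each color covers what the other color knows."""
    halves = {c: sorted(n for n in colors if colors[n] == c) for c in ("red", "blue")}
    if not halves["red"] or not halves["blue"]:
        raise ValueError("coloring must leave nodes on both sides")
    new_index = {n: set(index.get(n, ())) for n in colors}
    for src, dst in (("red", "blue"), ("blue", "red")):
        missing = sorted(answerable(halves[src], index) - answerable(halves[dst], index))
        for i, item in enumerate(missing):  # spread new entries round-robin
            new_index[halves[dst][i % len(halves[dst])]].add(item)
    return new_index

def split(search_links, colors):
    """Phase 3: drop every search link that crosses colors."""
    return {n: {nb for nb in nbs if colors[nb] == colors[n]}
            for n, nbs in search_links.items()}

# Hypothetical cluster: a path n0 - n1 - ... - n5; n0 also indexes an outside node x1.
names = [f"n{i}" for i in range(6)]
links = {n: set() for n in names}
for u, v in zip(names, names[1:]):
    links[u].add(v)
    links[v].add(u)
index = {"n0": {"x1"}}

colors = color(links, "n0", radius=2)
new_index = replicate_links(colors, index)
new_links = split(links, colors)
```

After the three steps the red half (n0 to n2) and the blue half (n3 to n5) no longer share search links, but each can still answer for all six nodes and for x1.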

Further Research

• Initiating the split

• Choosing the radius for coloring phase
– Want two clusters of same size

• Overloading index links

• Dealing with “churn”
– Nice nodes
– Not-so-nice nodes

• Merge operation?

• Simulation
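For the radius question above, a centralized illustration may help frame the problem. It assumes global knowledge of the cluster, which real peers would not have, and the function names are hypothetical: compute hop distances from the initiator, then pick the radius whose ball is closest to half the cluster.

```python
from collections import deque

def hop_distances(search_links, initiator):
    """Breadth-first hop distance from the initiator to every reachable node."""
    dist = {initiator: 0}
    queue = deque([initiator])
    while queue:
        node = queue.popleft()
        for nb in search_links[node]:
            if nb not in dist:
                dist[nb] = dist[node] + 1
                queue.append(nb)
    return dist

def balanced_radius(search_links, initiator):
    """Radius whose ball around the initiator is closest to half of the cluster."""
    dist = hop_distances(search_links, initiator)
    total = len(search_links)

    def imbalance(r):
        inside = sum(1 for d in dist.values() if d <= r)
        return abs(2 * inside - total)

    return min(range(max(dist.values()) + 1), key=imbalance)

# Hypothetical cluster: a path of 6 nodes, 0 - 1 - 2 - 3 - 4 - 5.
path = {i: [j for j in (i - 1, i + 1) if 0 <= j < 6] for i in range(6)}
radius_from_end = balanced_radius(path, 0)
radius_from_middle = balanced_radius(path, 2)
```

The best radius depends on where the initiator sits, which is part of what makes a decentralized choice an open question.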

Bibliography

• B. F. Cooper and H. Garcia-Molina. SIL: Modeling and Measuring Scalable Peer-to-peer Search Networks. http://www-db.stanford.edu/~cooperb/pubs/searchnets.pdf, 2003.

• B. Yang and H. Garcia-Molina. Improving Search in Peer-to-Peer Networks. http://dbpubs.stanford.edu:8090/pub/2002-28, 2002.
