

Routing Indices For P-to-P Systems

ICDCS 2002

Introduction
• Search in a P2P system
– Mechanisms without an index
– Mechanisms with specialized index nodes (centralized search)
– Mechanisms with indices at each node
• Structured P2P network
• Unstructured P2P network
• Parallel vs. sequential search
– Response time
– Network traffic

Routing indices (RI)
• Query
– Documents are on zero or more “topics”, and queries request documents on particular topics.
– Document topics are independent
• Local index
• RI
– Each node has a local routing index, which contains the following information:
• The number of documents along each path
• The number of documents on each topic of interest
– Allows a node to select the “best” neighbors to send a query to
• The RI may be “coarser” than the local indices
– Overcounts
– Undercounts
• Goodness measure
– Number of results in a path
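As a rough illustration of how the number of results in a path might be estimated from an RI entry, the sketch below relies on the topic-independence assumption stated above: the path's total document count is multiplied by the fraction of documents on each query topic. The function name, the dictionary layout, and all sample numbers are hypothetical.

```python
def estimate_results(total_docs, topic_counts, query_topics):
    """Estimate how many documents along one path match all query topics.

    Assumes topics are independent, so the fraction of documents matching
    every topic is the product of the per-topic fractions.
    """
    if total_docs == 0:
        return 0.0
    estimate = float(total_docs)
    for topic in query_topics:
        estimate *= topic_counts.get(topic, 0) / total_docs
    return estimate


# Hypothetical RI at one node: neighbor -> (total documents, per-topic counts)
routing_index = {
    "B": (100, {"databases": 20, "languages": 30}),
    "C": (1000, {"databases": 0, "languages": 50}),
    "D": (200, {"databases": 100, "languages": 150}),
}

query = ["databases", "languages"]

# Rank neighbors from most to least promising for this query.
ranked = sorted(
    routing_index,
    key=lambda n: estimate_results(*routing_index[n], query),
    reverse=True,
)
```

Because the RI only holds aggregate counts, the estimate can overcount or undercount the true number of matching documents.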

• Using routing indices
– Storage space
• N: number of nodes in the P2P network
• b: branching factor
• c: number of categories
• s: counter size in bytes
Centralized index: s*(c+1)*N
Distributed system: s*(c+1)*b (each node)
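The two storage formulas can be compared with a few lines of arithmetic. The parameter values below are hypothetical and chosen only for illustration; the (c+1) factor presumably covers one counter per category plus one for the total document count.

```python
def centralized_index_bytes(s, c, n):
    """Storage for one central index covering all n nodes: s*(c+1)*N."""
    return s * (c + 1) * n


def distributed_index_bytes_per_node(s, c, b):
    """Storage for the RI kept at each node, one row per neighbor: s*(c+1)*b."""
    return s * (c + 1) * b


# Hypothetical parameters (not taken from the slides)
N = 60_000   # nodes in the network
b = 4        # branching factor (neighbors per node)
c = 1_000    # categories
s = 4        # bytes per counter

central = centralized_index_bytes(s, c, N)
per_node = distributed_index_bytes_per_node(s, c, b)
total_distributed = per_node * N
```

With these numbers each node stores far less than the central index would, although the total summed over all nodes is b times larger.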

• Creating routing indices

• Maintaining routing indices
– Trade-off between RI freshness and update cost
– Does not require the participation of a disconnecting node
• Discussion
– What if the search topics are dependent?
– Can the number of “hops” necessary to reach a document be estimated?

Alternative Routing Indices

• Hop-count RI
– Aggregated RIs for each “hop” up to a maximum number of hops are stored
– Search cost
• Number of messages
– The goodness of a neighbor
• The ratio between the number of documents available through that neighbor and the number of messages required to get those documents
– Regular tree with fanout F
– It takes F^h messages to find all documents at hop h
– Storage cost?
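One way to turn the documents-per-message ratio into a number is to discount the documents found at each hop by the messages a regular tree with fanout F needs to reach them. The sketch below is an assumption-laden illustration: it weights the first hop by 1 and divides each further hop by another factor of F, and the exact indexing of the discount, the function name, and the sample counts are all hypothetical.

```python
def hop_count_goodness(docs_per_hop, fanout):
    """Discounted document count for one neighbor.

    docs_per_hop[j] is the estimated number of matching documents at
    hop j+1 through the neighbor. Documents farther away cost more
    messages in a regular tree, so they are divided by fanout**j.
    """
    return sum(docs / fanout ** j for j, docs in enumerate(docs_per_hop))


# Hypothetical neighbors: X has documents close by, Y has more but farther away.
goodness_x = hop_count_goodness([13, 10], fanout=3)  # 13 + 10/3
goodness_y = hop_count_goodness([0, 31], fanout=3)   # 0 + 31/3
```

Under this weighting X is preferred even though Y leads to more documents in total, because X's documents are cheaper to reach.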

• Exponentially aggregated RI
– Store the result of applying the regular-tree cost formula to a hop-count RI
– How to compute the goodness of a path for a query containing several topics?

Cycles in the P2P network (HW)

Improving Search in Peer-to-Peer Networks

ICDCS 2002

Beverly Yang
Hector Garcia-Molina

Outline

• Introduction

• Techniques

• Experiment

Introduction

• We present three techniques for efficient search in P2P systems.
– The basic idea is to reduce the number of nodes that process a query

Current Techniques

• Gnutella
– BFS with depth limit D.
– Wastes bandwidth and processing resources
• Freenet
– DFS with depth limit D.
– Poor response time.

Iterative Deepening

• Under policy P = {a, b, c}; waiting time W

• See example.
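A minimal sketch of iterative deepening under a policy of successive depth limits is given below. It is a simplification: each iteration re-runs a depth-limited BFS from scratch, and the waiting time W between iterations is not modelled. The graph, the data placement, and all function names are hypothetical.

```python
from collections import deque


def bfs_within_depth(graph, source, depth_limit):
    """Return the set of nodes within depth_limit hops of source."""
    seen = {source}
    frontier = deque([(source, 0)])
    while frontier:
        node, depth = frontier.popleft()
        if depth == depth_limit:
            continue  # do not expand beyond the limit
        for neighbor in graph[node]:
            if neighbor not in seen:
                seen.add(neighbor)
                frontier.append((neighbor, depth + 1))
    return seen


def iterative_deepening(graph, data, source, wanted, policy, needed):
    """Try each depth limit in policy until `needed` results are found."""
    results = set()
    for depth_limit in policy:
        reached = bfs_within_depth(graph, source, depth_limit)
        results = {n for n in reached if wanted in data.get(n, ())}
        if len(results) >= needed:
            return depth_limit, results
    return policy[-1], results


# Hypothetical chain A - B - C - D; only C and D hold item "x".
graph = {"A": ["B"], "B": ["A", "C"], "C": ["B", "D"], "D": ["C"]}
data = {"C": {"x"}, "D": {"x"}}
depth_used, hits = iterative_deepening(graph, data, "A", "x", [1, 2, 3], needed=1)
```

The query stops at the shallowest depth in the policy that satisfies it, so nodes beyond that depth never process it.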

Directed BFS

• A source sends query messages to just a subset of its neighbors
• A node maintains simple statistics on its neighbors
– Number of results received from each neighbor
– Latency of connection

Candidate nodes

• Returned the highest number of results
• Low hop-count
• High number of messages
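The neighbor selection in Directed BFS could be sketched as a ranking over the per-neighbor statistics. The example below uses only two of the statistics mentioned above (results received and latency); the tie-breaking rule, the class name, and the sample values are assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class NeighborStats:
    results_returned: int = 0  # results received through this neighbor so far
    latency_ms: float = 0.0    # measured latency of the connection


def pick_neighbors(stats, k):
    """Choose k neighbors: most results first, ties broken by lower latency."""
    ranked = sorted(
        stats,
        key=lambda n: (-stats[n].results_returned, stats[n].latency_ms),
    )
    return ranked[:k]


# Hypothetical statistics kept by the query source
stats = {
    "n1": NeighborStats(results_returned=5, latency_ms=80.0),
    "n2": NeighborStats(results_returned=12, latency_ms=200.0),
    "n3": NeighborStats(results_returned=5, latency_ms=30.0),
}
chosen = pick_neighbors(stats, k=2)
```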

Local Indices

• Each node n maintains an index over the data of all nodes within a radius of r hops.
• All nodes at depths not listed in the policy simply forward the query.
• Example: policy P = {1, 5}
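The effect of a policy on which nodes process a query can be sketched as follows. The radius value r = 2 used in the example is an assumption (the slide does not give one), and coverage is approximated by depth only: a processing node at depth d is taken to answer for data held at depths d-r through d+r.

```python
def action_at_depth(depth, policy):
    """Nodes at depths listed in the policy process the query; others forward it."""
    return "process" if depth in policy else "forward"


def depths_covered(policy, radius):
    """Depths whose data is answered for by some processing node's local index."""
    covered = set()
    for depth in policy:
        covered.update(range(max(0, depth - radius), depth + radius + 1))
    return sorted(covered)


policy = {1, 5}
actions = [action_at_depth(d, policy) for d in range(1, 6)]
covered = depths_covered(policy, radius=2)  # radius is a hypothetical value
```

Only nodes at depths 1 and 5 spend processing effort, yet with this radius the data at every depth from 0 to 7 is still searched.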

Experimental Setup

• For each response, we log:
– Number of hops taken
– IP from which the Response message came
– Response time
– Individual results

Experimental result

Efficient Content Location Using Interest-Based Locality in Peer-to-Peer Systems

Kunwadee Sripanidkulchai

Bruce Maggs

Hui Zhang

IEEE INFOCOM 2003

Motivation

• Although flooding is simple and robust, it is not scalable.

• A content location solution in which peers are organized into an interest-based structure on top of Gnutella.

• The algorithm is called interest-based shortcuts.

Interest-based locality

Shortcuts Architecture and Design Goals

• To create additional links on top of a peer-to-peer system’s overlay

• As a separate performance enhancement layer on top of existing content location mechanisms

Content location paths

Shortcut Discovery

• The first lookup returns a set of peers that store the content

• These are potential candidates.

• One peer is selected at random from the set and added as a shortcut

• For scalability, each peer allocates a fixed-size amount of storage to implement shortcuts.

Shortcut selection

• We rank shortcuts based on their perceived utility

• A peer sequentially asks all of the shortcuts on its list.
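Shortcut discovery and selection can be pictured as a small fixed-size list that is ranked and then queried in order. In the sketch below, utility is taken to be the number of past successful lookups and the lowest-ranked shortcut is evicted when the list is full; both choices, like the class and method names, are assumptions for illustration only.

```python
import random


class ShortcutList:
    def __init__(self, capacity):
        self.capacity = capacity  # fixed amount of shortcut storage
        self.successes = {}       # peer -> number of lookups it answered

    def discover(self, providers, rng=random):
        """Add one randomly chosen peer from the set returned by a lookup."""
        if not providers:
            return None
        peer = rng.choice(sorted(providers))
        if peer not in self.successes:
            if len(self.successes) >= self.capacity:
                # Assumed eviction rule: drop the least useful shortcut.
                worst = min(self.successes, key=self.successes.get)
                del self.successes[worst]
            self.successes[peer] = 0
        return peer

    def ranked(self):
        """Shortcuts ordered by perceived utility, best first."""
        return sorted(self.successes, key=self.successes.get, reverse=True)

    def locate(self, content, has_content):
        """Ask shortcuts one by one; return the first peer holding the content."""
        for peer in self.ranked():
            if has_content(peer, content):
                self.successes[peer] += 1
                return peer
        return None  # caller falls back to the underlying mechanism


# Hypothetical peers and their content
store = {"p1": {"a"}, "p2": {"a", "b"}}
has = lambda peer, item: item in store.get(peer, set())

shortcuts = ShortcutList(capacity=2)
shortcuts.discover({"p1"})
shortcuts.discover({"p2"})
found_at = shortcuts.locate("b", has)
```

When `locate` returns `None`, the peer would fall back to the underlying content location mechanism, such as Gnutella flooding.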

Ranking metrics

• Probability of providing content

• Latency of the path to the shortcut

• Load at the shortcut

• A combination of metrics can be used based on each peer’s preference

Performance indices

• Success rate

• Load characteristics

• Query scope

• Minimum reply path lengths

• Additional state

Potential and Limitations

• Adding 5 shortcuts at a time produces success rates that are close to the best possible.

• Slightly increasing the shortest path length from 1 to 2 hops yields a better success rate.

Conclusion

• A simple and practical mechanism was proposed.

Similarity Discovery in structured P2P Overlays

Introduction
• Structured P2P network
– Only supports search with a single keyword
• Similarity between two documents
– Keyword sets
– Vector space
– Measure
• Problems
– Search problem
– New keyword?

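$\theta = \cos^{-1}\left(\frac{X \cdot Y}{|X|\,|Y|}\right)$, where X and Y denote the keyword vectors of the two documents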
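A cosine-based similarity measure over keyword vectors can be sketched in a few lines. The vectors below are hypothetical term-weight vectors over a shared keyword vocabulary; a smaller angle means more similar documents.

```python
import math


def angle_between(x, y):
    """Angle in radians between two keyword vectors of equal length."""
    dot = sum(a * b for a, b in zip(x, y))
    norm_x = math.sqrt(sum(a * a for a in x))
    norm_y = math.sqrt(sum(b * b for b in y))
    if norm_x == 0 or norm_y == 0:
        raise ValueError("angle is undefined for a zero vector")
    # Clamp to [-1, 1] to guard against floating-point rounding.
    cosine = max(-1.0, min(1.0, dot / (norm_x * norm_y)))
    return math.acos(cosine)


# Hypothetical documents over the vocabulary (p2p, index, music)
doc_a = [1, 1, 0]
doc_b = [1, 1, 0]
doc_c = [0, 0, 1]
same = angle_between(doc_a, doc_b)
unrelated = angle_between(doc_a, doc_c)
```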

Meteorograph

• Absolute angle

Publishing and Searching

• Publish
– Hash
– Publish the item to a node np with the hash key closest to the hash value
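The "closest key" step of publishing can be sketched as a lookup in a sorted list of node keys. The hash function itself is left abstract here, keys are assumed to be plain numbers on a line without wrap-around, and the function name and sample keys are hypothetical.

```python
import bisect


def closest_node(node_keys, item_key):
    """Return the node key closest to item_key.

    node_keys must be a non-empty list sorted in ascending order.
    """
    i = bisect.bisect_left(node_keys, item_key)
    # Only the keys just below and just above the insertion point can be closest.
    candidates = node_keys[max(0, i - 1): i + 1]
    return min(candidates, key=lambda k: abs(k - item_key))


# Hypothetical node keys in the overlay
node_keys = [10, 40, 90]
target = closest_node(node_keys, 55)
```

A search for items similar to a given one could start from the same node, since items with nearby hash values end up on the same or neighboring nodes.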

• Search problem
– Nearest answers
– k-nearest answers

• Partial

• Comprehensive

• Search strategy

• Discussions

• What happens when the keyword vector is represented by ?

Other issues

• Load balance

• Changes of vector space
– Republished?
– Comprehensive set of keywords
– Other methods?

SWAM: A Family of Access Methods for Similarity-Search in Peer-to-Peer Data Networks

Farnoush Banaei-Kashani

Cyrus Shahabi

(CIKM04)

PDN access method

• Defines

• How to organize the PDN topology to an index-like structure

• How to use the index structure

Hilbert space

• Hilbert space (V, Lp)
• Key k = (a1, a2, …, ad)
– d: the dimension of the vector space
– The domain is a contiguous and finite interval o
• The Lp norm, with p in Z+
– The distance function to measure the dissimilari
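For reference, the Lp distance between two d-dimensional keys can be computed as below. This is the standard definition of the norm; the function name and sample keys are hypothetical.

```python
def lp_distance(k1, k2, p):
    """Lp distance between two keys of the same dimension, for integer p >= 1."""
    if len(k1) != len(k2):
        raise ValueError("keys must have the same dimension")
    if p < 1:
        raise ValueError("p must be a positive integer")
    return sum(abs(a - b) ** p for a, b in zip(k1, k2)) ** (1.0 / p)


# Hypothetical 2-dimensional keys
manhattan = lp_distance((0, 0), (3, 4), p=1)
euclidean = lp_distance((0, 0), (3, 4), p=2)
```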

Topology

• Topology of a PDN can be modelled as a directed graph G(N, E)

• A(n) is the set of neighbors for node n

• A node maintains
– A limited amount of information about its neighbors, which includes
• The keys of the tuples maintained at neighbors
• The physical addresses of neighbors

• The processing of the query is completed when all expected tuples in the relevant result set are visited

• Access methods
– Join, leave for virtual nodes
– Forward for using local information to process queries and make forwarding decisions

The small world example

• Grid component

• Random graph component

• Processing of queries (exact, range, kNN) in a topology with high locality

Flat partitioning

• SWAM also employs the space partitioning idea: flat partitioning

Query Processing

• Exact-Match query processing

• Range query processing

• kNN Query processing

Data Indexing in Peer-to-Peer DHT Networks

ICDCS 2004

• Locating data using incomplete information.
– How to search data in a DHT
• Data descriptors and queries
– Semi-structured XML data
– Query
• Most specific query for d

• Relationship between queries

• Given the most specific query, finding the location of the file is simple

• How about less specific queries?

• Solution
– Provide query-to-query service
• For a given query q, the index service returns a list of more specific queries, covered by q
– DHT storage system must be extended
• Insert(q, qi), q -> qi, adds a mapping (q; qi) to the index of the node responsible for key q.
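The query-to-query service can be pictured with an in-memory stand-in for the DHT. In the sketch below a single dictionary plays the role of all nodes (in a real DHT each key would live on the node responsible for it), and the query strings, class name, and method names other than the insert operation are hypothetical.

```python
class QueryIndex:
    def __init__(self):
        self.index = {}  # query -> set of more specific queries it covers
        self.files = {}  # most specific query -> location of the file

    def insert(self, q, qi):
        """Add the mapping (q; qi) to the index stored under key q."""
        self.index.setdefault(q, set()).add(qi)

    def store(self, most_specific_query, location):
        """Record where the file for a most specific query is kept."""
        self.files[most_specific_query] = location

    def lookup(self, q):
        """Return the more specific queries covered by q."""
        return sorted(self.index.get(q, ()))

    def resolve(self, q):
        """Follow mappings from q down to most specific queries."""
        found, stack, seen = set(), [q], set()
        while stack:
            current = stack.pop()
            if current in seen:
                continue
            seen.add(current)
            if current in self.files:
                found.add(self.files[current])
            stack.extend(self.index.get(current, ()))
        return found


idx = QueryIndex()
idx.store("author=Smith&title=P2P", "node-17")
idx.insert("author=Smith", "author=Smith&title=P2P")
idx.insert("title=P2P", "author=Smith&title=P2P")
locations = idx.resolve("title=P2P")
```

A user who knows only part of the descriptor starts from a less specific query and follows the returned queries until a most specific one leads to the file.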