
Publish/Subscribe Systems with Distributed Hash Tables and Languages from IR Christos Tryfonopoulos...

Date post: 30-Mar-2015
Page 1: Publish/Subscribe Systems with Distributed Hash Tables and Languages from IR Christos Tryfonopoulos & Manolis Koubarakis Intelligent Systems Lab Dept.

Publish/Subscribe Systems with Distributed Hash Tables and Languages from IR

Christos Tryfonopoulos & Manolis Koubarakis

Intelligent Systems Lab
Dept. of Electronic & Computer Engineering
Technical University of Crete, Greece

http://www.intelligence.tuc.gr

Page 2

Overview

Motivation
Distributed resource sharing
The DHTrie protocols
Local filtering algorithms
Conclusions

Page 3

Motivation

Resource sharing is at the core of today’s computing (Web, P2P, Grid).

One-time as well as continuous querying functionality is needed.

Data models and languages based on Information Retrieval are useful for annotating and querying resources.

Many mature technologies to build on (e.g., overlay networks, agents).

Page 4

Related work

Distributed information retrieval: pSearch, PlanetP, [Li et al. 2003], [Cohen et al. 2003], [Reynolds et al. 2002], …

Publish/subscribe

Non-DHT-based: SIFT, SIENA, Le Subscribe, Gryphon, P2P-DIET, …

DHT-based: Scribe, Bayeux (topic-based); [Tam et al. 2003], [Pietzuch et al. 2003], [Terpstra et al. 2003], [Triantafillou et al. 2004] (content-based)

Page 5

Distributed resource (file) sharing

Two kinds of basic functionality are expected:

One-time querying: a user poses the query “I want photos of the Euro 2004 champions”. The system returns a list of pointers to matching resources.

Publish/subscribe: a user posts a continuous query to be notified when a photo of the “Euro 2004 champions” is published. The system notifies the subscriber with a pointer to the peer that published the photo.
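The difference between the two kinds of functionality can be made concrete with a small, centralized sketch. It is not part of the system described here: the class and method names are hypothetical, resources are plain text descriptions, and a query is assumed to be a set of words that must all occur. A one-time query is evaluated against what has already been published, while a continuous query is stored and checked against every later publication.

```python
from dataclasses import dataclass, field


@dataclass
class ResourceIndex:
    """Toy, centralized model of the two functionalities (hypothetical names).

    Resources are short text descriptions; a query is a set of words that must
    all occur in a description."""

    resources: list[str] = field(default_factory=list)
    subscriptions: list[tuple[str, frozenset[str]]] = field(default_factory=list)
    notifications: list[tuple[str, str]] = field(default_factory=list)

    def one_time_query(self, words: set[str]) -> list[str]:
        """Evaluate the query once, against resources published so far."""
        return [r for r in self.resources if words <= set(r.lower().split())]

    def subscribe(self, subscriber: str, words: set[str]) -> None:
        """Store a continuous query; it is checked against future publications."""
        self.subscriptions.append((subscriber, frozenset(words)))

    def publish(self, resource: str) -> None:
        """Store the resource and notify every subscriber whose query it satisfies."""
        self.resources.append(resource)
        tokens = set(resource.lower().split())
        for subscriber, words in self.subscriptions:
            if words <= tokens:
                self.notifications.append((subscriber, resource))
```

For example, after `publish("photo euro 2004 champions greece")`, the one-time query `{"euro", "2004", "champions"}` returns that description, whereas a subscription with the same words made afterwards is notified only when a later matching photo is published.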

Page 6

Distributed resource sharing

One-time query scenario

[Figure: peers attached to a network of super-peers. Labels: peer, super-peer, publication, query, reply, information provider, download.]

Page 7

Distributed resource sharing

Publish/subscribe scenario

[Figure: peers attached to a network of super-peers. Labels: peer, super-peer, publication, continuous query, notification, information provider, download.]

Page 8

Achievements in the context of DIET

Languages and data models from IR (emphasis on textual information).

Efficient filtering algorithms.

The system P2P-DIET: a super-peer-based P2P system, implemented on top of the lightweight mobile agent platform DIET Agents.

DIET project: www.dfki.uni-kl.de/IVSWEB/DIET DIET Agents: http://diet-agents.sourceforge.net/

P2P-DIET: http://www.intelligence.tuc.gr/p2pdiet

Current work: Solve the pub/sub problem using ideas from DHTs.

Page 9

Distributed Hash Tables (DHTs)

Created to solve the object location problem in a distributed (dynamic) network of nodes.

They support only one operation: given a key, map the key onto a node.

Many existing systems (Chord, CAN, Pastry, Tapestry, P-Grid, DKS, Viceroy, …).

Most of them need a logarithmic number of messages, O(log N) in a network of N nodes, to locate a node.
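A minimal sketch of the key-to-node mapping, assuming Chord-style consistent hashing: node names and keys are hashed into the same identifier space, and a key belongs to the first node clockwise from its identifier (its successor). The identifier length `M`, the use of SHA-1 truncated to `M` bits, and the names `h` and `Ring` are illustrative assumptions. The sketch computes the mapping from a global sorted list of node IDs; a real DHT resolves it with O(log N) messages by routing through per-node routing state.

```python
import bisect
import hashlib

M = 16  # hypothetical identifier length in bits (Chord itself uses 160-bit SHA-1)


def h(value: str) -> int:
    """Hash a string into the M-bit identifier space."""
    digest = hashlib.sha1(value.encode("utf-8")).digest()
    return int.from_bytes(digest, "big") % (2 ** M)


class Ring:
    """Chord-style consistent hashing: each key belongs to its successor node."""

    def __init__(self, node_names: list[str]) -> None:
        self.ids = sorted({h(name) for name in node_names})

    def successor(self, key_id: int) -> int:
        """First node ID at or after key_id, wrapping around the ring."""
        i = bisect.bisect_left(self.ids, key_id)
        return self.ids[i % len(self.ids)]

    def lookup(self, key: str) -> int:
        """Map a key onto the node responsible for it."""
        return self.successor(h(key))
```

`Ring([f"peer{i}" for i in range(100)]).lookup("TITLEalgorithms")` returns the ID of the node responsible for that key; calling it again with the same key returns the same node.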

Page 10

Data model…

Publications are attribute-value pairs (A,s), where A is a named attribute and s is a text value.

An example of a publication in model AWP

{(AUTHOR, “John Smith”),

(TITLE, “Information dissemination in P2P systems”),

(ABSTRACT, “In this paper we show …”)}
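One possible in-memory encoding of the publication above, assuming a dictionary from attribute name to text value (which presumes that no attribute is repeated). The helper names `words` and `word_positions` are hypothetical; the tokenization (lower-casing, splitting on non-word characters) is an assumption, and word positions are kept because proximity operators in queries depend on word order.

```python
import re

# Hypothetical encoding of the AWP publication: attribute name -> text value.
# A dict assumes each attribute occurs at most once in a publication.
publication: dict[str, str] = {
    "AUTHOR": "John Smith",
    "TITLE": "Information dissemination in P2P systems",
    "ABSTRACT": "In this paper we show …",
}


def words(text: str) -> list[str]:
    """Lower-case word tokens of a text value, in order of appearance."""
    return re.findall(r"\w+", text.lower())


def word_positions(text: str) -> dict[str, list[int]]:
    """Positions of each word in a text value (useful for proximity operators)."""
    positions: dict[str, list[int]] = {}
    for i, w in enumerate(words(text)):
        positions.setdefault(w, []).append(i)
    return positions
```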

Page 11

…and query language

Examples of continuous queries in model AWP

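Example 1: AUTHOR must contain the word “James”, and TITLE must satisfy a word pattern over “filtering”, “algorithms” and “satisfaction” that uses the proximity operator $\prec_{[0,0]}$.

Example 2: $(\mathrm{AUTHOR} = \text{"John Smith"}) \wedge (\mathrm{TITLE} \sqsupseteq \mathrm{algorithms} \prec_{[0,2]} \mathrm{complexity})$

Here $A \sqsupseteq wp$ means that the value of attribute $A$ satisfies the word pattern $wp$, and $w_1 \prec_{[l,u]} w_2$ means that $w_2$ occurs after $w_1$ with at least $l$ and at most $u$ words in between.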
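The following sketch evaluates conjunctive AWP-style queries against a publication. It assumes a hypothetical tuple encoding of word patterns (plain words, `and`, `or`, and a binary proximity operator `prox` between two words), and it reads $w_1 \prec_{[l,u]} w_2$ as "w2 follows w1 with between l and u intervening words", which is an assumption of this sketch. Negation and proximity between composite patterns are left out, and the names `satisfies` and `matches` are illustrative only.

```python
import re
from typing import Union

# Hypothetical encoding of AWP word patterns:
#   "word"                     the word occurs in the text value
#   ("and", p1, p2), ("or", p1, p2)
#   ("prox", w1, l, u, w2)     w2 occurs after w1 with l..u words in between
Pattern = Union[str, tuple]


def words(text: str) -> list[str]:
    return re.findall(r"\w+", text.lower())


def satisfies(tokens: list[str], wp: Pattern) -> bool:
    """Check a word pattern against the tokens of one attribute value."""
    if isinstance(wp, str):
        return wp.lower() in tokens
    op = wp[0]
    if op == "and":
        return satisfies(tokens, wp[1]) and satisfies(tokens, wp[2])
    if op == "or":
        return satisfies(tokens, wp[1]) or satisfies(tokens, wp[2])
    if op == "prox":
        _, w1, low, high, w2 = wp
        firsts = [i for i, t in enumerate(tokens) if t == w1.lower()]
        seconds = [j for j, t in enumerate(tokens) if t == w2.lower()]
        # j - i - 1 is the number of words strictly between the two occurrences
        return any(low <= j - i - 1 <= high for i in firsts for j in seconds)
    raise ValueError(f"unknown operator: {op!r}")


def matches(publication: dict[str, str], query: list[tuple]) -> bool:
    """A query is a conjunction of atoms ("=", A, s) or ("contains", A, wp)."""
    for kind, attr, arg in query:
        value = publication.get(attr)
        if value is None:
            return False
        if kind == "=":
            ok = value == arg
        elif kind == "contains":
            ok = satisfies(words(value), arg)
        else:
            raise ValueError(f"unknown atom: {kind!r}")
        if not ok:
            return False
    return True
```

For example, `[("=", "AUTHOR", "John Smith"), ("contains", "TITLE", ("prox", "algorithms", 0, 2, "complexity"))]` matches a publication with AUTHOR "John Smith" and TITLE "Algorithms and their complexity" (two intervening words) but not one titled "Algorithms for streams and their complexity" (four).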

Page 12

Distributed resource sharing revisited

Publish/subscribe scenario

[Figure: peers attached to a network of super-peers. Labels: peer, super-peer, publication, continuous query, notification, information provider, download.]

Page 13

The DHTrie protocols

Subscribing with a continuous query

Assume a query q of the form:

$(A_1 = s_1) \wedge \dots \wedge (A_m = s_m) \wedge (A_{m+1} \sqsupseteq wp_{m+1}) \wedge \dots \wedge (A_n \sqsupseteq wp_n)$

Then, for a random attribute $A_i$ and a random word $w_j$ contained in either $s_i$ or $wp_i$, we create the string $A_i w_j$ and use it as the key to forward the query to the peer responsible for the identifier $H(A_i w_j)$.
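The subscription step can be sketched as follows. The sketch assumes that each atom of q is flattened to an attribute name and a text from which words are drawn (the value $s_i$ or the words of $wp_i$), so it does not distinguish equality from containment; `H` is taken to be SHA-1 reduced to a hypothetical `M`-bit identifier space, and `subscription_key` is an illustrative name. The returned identifier is what the DHT would then route the query towards.

```python
import hashlib
import random
import re

M = 16  # hypothetical identifier length in bits


def h(key: str) -> int:
    """Hypothetical choice of H: SHA-1 reduced to the M-bit identifier space."""
    return int.from_bytes(hashlib.sha1(key.encode("utf-8")).digest(), "big") % (2 ** M)


def subscription_key(query: list[tuple[str, str]], rng: random.Random) -> tuple[str, int]:
    """Pick a random attribute A_i and a random word w_j of its value (s_i or the
    words of wp_i); return the key string A_i w_j and the identifier H(A_i w_j)."""
    attr, value = rng.choice(query)
    word = rng.choice(re.findall(r"\w+", value.lower()))
    key = attr + word
    return key, h(key)
```

With the atoms `("AUTHOR", "John Smith")` and `("TITLE", "algorithms complexity")`, the key is one of `AUTHORjohn`, `AUTHORsmith`, `TITLEalgorithms` or `TITLEcomplexity`, depending on the random choices.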

Page 14

The DHTrie protocols (cont’d)

Publishing a resource

Assume a publication p of the form:

$p = \{(A_1, s_1), (A_2, s_2), \ldots, (A_m, s_m)\}$

Obtain a list of peer IDs by hashing the string $A_i w_j$ for every attribute $A_i$ in p and every word $w_j$ in its value $s_i$. Covering all pairs is necessary to ensure correctness, since a matching query may have been indexed under any one of them. Use indirect message passing and the DHT infrastructure to forward the message.

Each receiving node contacts the neighbors included in the recipient list, removes them from it, and forwards the message.
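A sketch of how the recipient identifiers for a publication might be computed, assuming a hypothetical hash `h` (SHA-1 reduced to `M` bits) and a simple lower-case tokenization without stopword removal or stemming. Every attribute-word pair is hashed, so whichever pair a matching query was indexed under, its identifier appears in the list; duplicates are removed and the list is sorted, which also suits a gap-compressed encoding of recipient lists.

```python
import hashlib
import re

M = 16  # hypothetical identifier length in bits


def h(key: str) -> int:
    """Hypothetical choice of H: SHA-1 reduced to the M-bit identifier space."""
    return int.from_bytes(hashlib.sha1(key.encode("utf-8")).digest(), "big") % (2 ** M)


def publication_keys(publication: dict[str, str]) -> set[str]:
    """All strings A_i w_j: every attribute A_i of p with every word w_j of s_i."""
    return {
        attr + w
        for attr, text in publication.items()
        for w in re.findall(r"\w+", text.lower())
    }


def recipient_ids(publication: dict[str, str]) -> list[int]:
    """Sorted, de-duplicated identifiers H(A_i w_j); the DHT resolves each one to
    the peer responsible for it."""
    return sorted({h(k) for k in publication_keys(publication)})
```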

Page 15

Direct message passing

The traditional way to forward a message to more than one recipient:

Send a lookup() message for each recipient. For k recipients we need O(k log N) lookup messages.

Multicast techniques are not applicable, since the group of peers to be contacted is not known a priori.
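To make the cost of direct message passing tangible, the sketch below simulates a static Chord-like ring and counts the messages when every recipient is reached by its own lookup. The ring size, node names, use of SHA-1, and the simplification that recipients are already known node IDs (so only forward routing hops are counted, not replies) are assumptions for illustration; `ChordRing` and `direct_cost` are hypothetical names. Greedy routing picks, at each hop, the farthest finger that does not overshoot the target, which bounds every lookup by `M` hops.

```python
import bisect
import hashlib

M = 12  # hypothetical identifier length in bits


def h(name: str) -> int:
    return int.from_bytes(hashlib.sha1(name.encode("utf-8")).digest(), "big") % (2 ** M)


def in_interval(x: int, a: int, b: int) -> bool:
    """True if x lies in the ring interval (a, b]."""
    return a < x <= b if a < b else (x > a or x <= b)


class ChordRing:
    """Static Chord-like ring: finger i of node n is successor(n + 2**i)."""

    def __init__(self, names: list[str]) -> None:
        self.ids = sorted({h(n) for n in names})
        self.fingers = {
            n: [self.successor((n + 2 ** i) % 2 ** M) for i in range(M)] for n in self.ids
        }

    def successor(self, key: int) -> int:
        i = bisect.bisect_left(self.ids, key)
        return self.ids[i % len(self.ids)]

    def route(self, source: int, target: int) -> int:
        """Messages needed to reach node `target` from `source` by greedy finger routing."""
        hops, node = 0, source
        while node != target:
            # farthest finger that does not overshoot the target
            node = next(f for f in reversed(self.fingers[node]) if in_interval(f, node, target))
            hops += 1
        return hops


def direct_cost(ring: ChordRing, source: int, recipients: set[int]) -> int:
    """Direct message passing: one independent lookup per recipient."""
    return sum(ring.route(source, r) for r in recipients)
```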

Page 16

Indirect message passing

Incorporate the recipient list into the message.

Avoid asking the same routing question more than once.

Opportunistic forwarding.

Increase in message size, due to:

Publication size: process the publication (remove stopwords, apply stemming) and use an inverted (and compressed) index.

Recipient list size: use gap compression instead of sending full peer IDs.
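The following sketch illustrates the idea of indirect message passing on a simulated static Chord-like ring, together with gap compression of the recipient list. It is a simplified reading of the slide, not the DHTrie protocol itself: a single message carries the recipient list, kept in clockwise order; each node that receives it contacts the recipients that are its direct neighbours (fingers), removes them from the list, and forwards the remainder one hop towards the nearest remaining recipient. Names, ring parameters and the hash function are assumptions.

```python
import bisect
import hashlib
from itertools import accumulate

M = 12  # hypothetical identifier length in bits


def h(name: str) -> int:
    return int.from_bytes(hashlib.sha1(name.encode("utf-8")).digest(), "big") % (2 ** M)


def dist(a: int, b: int) -> int:
    """Clockwise distance from a to b on the identifier ring."""
    return (b - a) % (2 ** M)


class ChordRing:
    """Static Chord-like ring: finger i of node n is successor(n + 2**i)."""

    def __init__(self, names: list[str]) -> None:
        self.ids = sorted({h(n) for n in names})
        self.fingers = {
            n: [self.successor((n + 2 ** i) % 2 ** M) for i in range(M)] for n in self.ids
        }

    def successor(self, key: int) -> int:
        i = bisect.bisect_left(self.ids, key)
        return self.ids[i % len(self.ids)]


def indirect_send(ring: ChordRing, source: int, recipients: set[int]) -> tuple[int, set[int]]:
    """Forward one message carrying the recipient list; return (messages, delivered)."""
    # keep the list ordered clockwise from the source
    pending = sorted(recipients - {source}, key=lambda r: dist(source, r))
    messages, delivered, node = 0, set(), source
    while pending:
        neighbours = set(ring.fingers[node])
        # contact recipients that are direct neighbours, then drop them from the list
        reached = [r for r in pending if r in neighbours]
        messages += len(reached)
        delivered.update(reached)
        pending = [r for r in pending if r not in neighbours]
        if pending:
            # forward the rest to the farthest finger before the nearest pending recipient
            node = max((f for f in neighbours if 0 < dist(node, f) < dist(node, pending[0])),
                       key=lambda f: dist(node, f))
            messages += 1
    return messages, delivered


def gap_encode(ids: list[int]) -> list[int]:
    """Sorted IDs as the first value plus successive differences (smaller numbers)."""
    ids = sorted(ids)
    return ids[:1] + [b - a for a, b in zip(ids, ids[1:])]


def gap_decode(gaps: list[int]) -> list[int]:
    return list(accumulate(gaps))
```

Because the remaining recipients always lie ahead of the current node in clockwise order, the message travels around the ring at most once, and recipients are reached along one shared path instead of by separate lookups.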

Page 17

The DHTrie protocols

Notifying interested subscribers

To find all matching queries at a peer, we use the filtering algorithm BestFitTrie.

[Tryfonopoulos, Koubarakis, Drougas, SIGIR 2004]

Once all matching queries are found, a notification message is created and forwarded to the subscribing peers using indirect message passing.
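A small sketch of the notification step, assuming the filtering algorithm has produced `(query_id, subscriber_peer_id)` pairs. Grouping by subscriber means that a peer with several satisfied queries receives one notification; the sorted list of subscriber IDs is what an indirect message would carry. The names and the notification layout (a pointer to the publishing peer plus query IDs) are hypothetical.

```python
from collections import defaultdict


def build_notifications(
    matching: list[tuple[str, int]], pointer: str
) -> tuple[dict[int, tuple[str, list[str]]], list[int]]:
    """Group matching (query_id, subscriber_peer_id) pairs by subscriber.

    Returns one notification per subscriber (pointer to the publishing peer plus
    the IDs of the satisfied queries) and the sorted recipient list."""
    by_peer: dict[int, list[str]] = defaultdict(list)
    for query_id, subscriber in matching:
        by_peer[subscriber].append(query_id)
    notifications = {peer: (pointer, qids) for peer, qids in by_peer.items()}
    return notifications, sorted(by_peer)
```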

Page 18

Some (preliminary) results

[Figure: Continuous Query Subscription. Average number of messages per query (y-axis, 5 to 15) vs. number of peers (x-axis, 5,000 to 50,000).]

Page 19

Filtering algorithms at each super-peer

Query clustering algorithm BestFitTrie

The data structure is a hash table of tries; the hash table is used for fast access to trie roots.

We search for the best place to store query q in two phases:
1. Best position trie-wise
2. Best position forest-wise

The matching procedure examines only tries whose roots are contained in the incoming document.
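The clustering idea can be illustrated with a much simplified forest of tries. This is not the published BestFitTrie: it places a query greedily, choosing among the tries rooted at one of its words the one where it can follow the longest existing path, and otherwise starts a new trie rooted at the alphabetically first word (a hypothetical rule). It keeps the property that matters for correctness: each query sits at the node whose root-to-node path holds exactly its words, so matching only needs to descend along words that occur in the document, starting from tries whose roots occur in it.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    children: dict[str, "Node"] = field(default_factory=dict)
    queries: list[str] = field(default_factory=list)


class TrieForest:
    """Hash table of tries. Invariant: a query is stored at the node whose
    root-to-node path contains exactly the query's words."""

    def __init__(self) -> None:
        self.roots: dict[str, Node] = {}

    @staticmethod
    def _descend(node: Node, words: set[str]) -> tuple[Node, list[str]]:
        """Greedily follow children labelled with query words as far as possible."""
        used: list[str] = []
        while True:
            step = next((w for w in node.children if w in words), None)
            if step is None:
                return node, used
            used.append(step)
            node = node.children[step]

    def insert(self, query_id: str, query_words: set[str]) -> None:
        words = set(query_words)
        best_node, best_path = None, []
        for root_word in words & set(self.roots):  # only tries rooted at a query word
            node, used = self._descend(self.roots[root_word], words)
            if best_node is None or 1 + len(used) > len(best_path):
                best_node, best_path = node, [root_word] + used
        if best_node is None:  # no suitable trie: start a new one
            root_word = min(words)  # hypothetical choice of root
            best_node = self.roots[root_word] = Node()
            best_path = [root_word]
        for w in sorted(words - set(best_path)):  # store the remaining words as a chain
            best_node = best_node.children.setdefault(w, Node())
        best_node.queries.append(query_id)

    def match(self, document_words: list[str]) -> list[str]:
        """Visit only tries whose root word occurs in the document."""
        doc = set(document_words)
        found: list[str] = []
        stack = [self.roots[w] for w in doc if w in self.roots]
        while stack:
            node = stack.pop()
            found.extend(node.queries)  # every word on the path to this node is in doc
            stack.extend(c for w, c in node.children.items() if w in doc)
        return found
```

With `{filtering, algorithms}` and `{filtering, algorithms, complexity}` inserted, the second query extends the path of the first instead of opening a new trie.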

Page 20

Filtering algorithms at each super-peer

PrefixTrie: prefix-based clustering (handles a query as a sequence of words)

BestFitTrie: set-based clustering (handles a query as a set of words)
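The difference between the two clustering styles shows up in how many trie nodes are needed. In the sketch below, prefix-based clustering inserts each query's words in the order given, while set-based clustering is approximated by inserting the words in alphabetical order, a simple canonical ordering used here as a stand-in (BestFitTrie itself chooses placements more carefully). The function names are hypothetical.

```python
def count_trie_nodes(sequences: list[list[str]]) -> int:
    """Insert word sequences into one trie; return the number of nodes created."""
    root: dict = {}
    created = 0
    for seq in sequences:
        node = root
        for w in seq:
            if w not in node:
                node[w] = {}
                created += 1
            node = node[w]
    return created


def prefix_clustering_nodes(queries: list[list[str]]) -> int:
    """Prefix-based: a query is a sequence, so word order matters."""
    return count_trie_nodes(queries)


def set_clustering_nodes(queries: list[list[str]]) -> int:
    """Set-based stand-in: duplicates dropped, words put in a canonical (sorted) order."""
    return count_trie_nodes([sorted(set(q)) for q in queries])
```

For the queries `filtering algorithms` and `algorithms filtering survey`, the prefix trie needs five nodes, whereas the canonical-order trie needs three, because both queries share the path `algorithms → filtering`.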

Page 21

Filtering algorithms at each super-peer

Page 22

Filtering algorithms at each super-peer

[Figure: comparison of BestFitTrie and PrefixTrie. Series: BestFitTrie 1M, PrefixTrie 1M, BestFitTrie 3M, PrefixTrie 3M.]

Page 23

Other interesting issues

Load balancing
The frequency of occurrence of words may overload certain peers.
Index queries under infrequent words.
Use controlled replication.

Word frequency computation
Also useful for other types of queries (VSM).
Global vs. local ranking schemes.
We propose a hybrid ranking scheme, with updating and estimation mechanisms.
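One way to apply the load-balancing idea is to index a query under its least frequent attribute-word pair instead of a random one. The sketch assumes that some (global or locally estimated) frequency count of the keys $A_i w_j$ is available as a `Counter`; how those frequencies are obtained is the word frequency computation question above and is not modelled here. Unknown keys count as frequency 0, ties are broken alphabetically, and `choose_index_key` is a hypothetical name.

```python
from collections import Counter


def choose_index_key(atoms: list[tuple[str, list[str]]], frequency: Counter) -> str:
    """Return the attribute-word key A_i w_j with the lowest estimated frequency.

    `atoms` pairs each attribute of the query with the words usable as keys;
    keys missing from `frequency` count as 0, ties are broken alphabetically."""
    candidates = [attr + w for attr, words in atoms for w in words]
    return min(candidates, key=lambda k: (frequency[k], k))
```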

Page 24

Thank you

Funding sources:
IST/FET project DIET (www.dfki.uni-kl.de/IVSWEB/DIET)
IST/FET project Evergrow (http://www.evergrow.org)
Heraclitus Ph.D. Fellowship Program (Greece)

