Subscription Partitioning and Routing in Content-based Publish/Subscribe Networks

Page 1: Subscription Partitioning and Routing in Content-based Publish/Subscribe Networks

1

Subscription Partitioning and Routing in Content-based Publish/Subscribe Networks

Yi-Min Wang, Lili Qiu, Dimitris Achlioptas,

Gautam Das, Paul Larson, and Helen J. Wang

Microsoft Research

DISC 2002, Toulouse, France

Page 2: Subscription Partitioning and Routing in Content-based Publish/Subscribe Networks

2

Motivation

The increasing popularity of event notification
  Yahoo! Alerts, MSN Mobile, AOL Anywhere, InfoSpace, …
  Complements the traditional polling model on the Web
  Examples: stock quotes, sports scores, weather, news, …

Event Distribution Network (EDN)
  Distributed and scalable event distribution
  Parallels the idea of a Content Distribution Network (CDN), applied to event distribution
  Built on top of a self-configuring overlay network of servers
  Content-based publish/subscribe through in-network processing of aggregated subscription filters
  Versus simply extending topic-based pub/sub with all filtering done at the end servers

Page 3: Subscription Partitioning and Routing in Content-based Publish/Subscribe Networks

3

Flat dispatcher-based model

[Figure: flat dispatcher-based model. Labels: Publishers (event sources), Dispatcher, Servers, Notification Routing Service, Subscribers; arrows for event traffic and notification traffic.]

Page 4: Subscription Partitioning and Routing in Content-based Publish/Subscribe Networks

4

Subscription Partitioning

Basic idea: similarity-based clustering for reducing total event traffic
  Event Space Partitioning (ESP)
  Filter Set Partitioning (FSP)

[Figure: ESP and FSP side by side, each divided into Partition 1 and Partition 2.]

Page 5: Subscription Partitioning and Routing in Content-based Publish/Subscribe Networks

5

Equality Predicates

Hash predicates to get a uniform distribution
  Treat the hashed domain as the event space
Use Event Space Partitioning
  A subscription is a point; it does not intersect multiple sub-spaces
Use over-partitioning for better load balancing
  Use an offline greedy algorithm to assign buckets to servers for load balancing
  Use an indirection table to dynamically map buckets to servers for load re-balancing
Use Bloom filters to further reduce traffic
  Fast detection of true negatives at the expense of a (very low) false-positive rate
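The hashing, over-partitioning, and indirection-table ideas on this slide can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the slide does not spell out the greedy rule, so "heaviest bucket first, onto the lightest server" is an assumption, and the names `bucket_of`, `greedy_assign`, `server_for`, the SHA-256 hash, and the tiny workload are all hypothetical.

```python
import hashlib
from collections import Counter

NUM_SERVERS = 5        # hypothetical cluster size
OVER_PARTITION = 10    # buckets per server (the over-partitioning ratio)
NUM_BUCKETS = NUM_SERVERS * OVER_PARTITION


def bucket_of(value: str) -> int:
    """Hash an equality-predicate value (e.g. a stock symbol) to a bucket."""
    digest = hashlib.sha256(value.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % NUM_BUCKETS


def greedy_assign(bucket_load, num_servers):
    """Assumed greedy rule: heaviest bucket first, onto the lightest server."""
    server_load = [0] * num_servers
    table = {}
    for bucket in sorted(bucket_load, key=bucket_load.get, reverse=True):
        target = min(range(num_servers), key=server_load.__getitem__)
        table[bucket] = target
        server_load[target] += bucket_load[bucket]
    return table


# Made-up subscription workload: one equality filter per subscription.
subscriptions = ["MSFT"] * 50 + ["IBM"] * 20 + ["SUNW"] * 5 + ["ORCL"] * 5
counts = Counter(bucket_of(s) for s in subscriptions)
bucket_load = {b: counts.get(b, 0) for b in range(NUM_BUCKETS)}

# Indirection table: bucket -> server. Re-balancing only rewrites entries here.
indirection = greedy_assign(bucket_load, NUM_SERVERS)


def server_for(value: str) -> int:
    """Route a subscription or an event with this value to exactly one server."""
    return indirection[bucket_of(value)]
```

Because a subscription and a matching event carry the same value, both hash to the same bucket and therefore reach the same server, which is why a point subscription never spans several sub-spaces.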

Page 6: Subscription Partitioning and Routing in Content-based Publish/Subscribe Networks

6

Simulation Results

Actual Notification Money log
  1.48M subscriptions with 0.29M unique filters over 21,741 stock symbols
  Zipf-like distribution

[Figure: log-log plot of the number of subscriptions for each symbol (y-axis, 1 to 1,000,000) against stock symbol popularity ranking (x-axis, 1 to 100,000). Series: actual data; least-squares line fit for the middle part.]
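A least-squares line fit on a log-log popularity plot, as described for this figure, can be sketched as below. The data is synthetic (an exact power law), not the log from the slide, and `fit_loglog` and the slice used for the "middle part" are hypothetical choices made only for illustration.

```python
import math


def fit_loglog(points):
    """Least-squares line through (log10 rank, log10 count) pairs.

    Returns (slope, intercept); for a Zipf-like law the slope is negative.
    """
    logs = [(math.log10(rank), math.log10(count)) for rank, count in points if count > 0]
    n = len(logs)
    mean_x = sum(x for x, _ in logs) / n
    mean_y = sum(y for _, y in logs) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in logs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in logs)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


# Synthetic data, not the real log: an exact power law count = 1e6 / rank.
ranked = [(rank, 1_000_000 / rank) for rank in range(1, 10_001)]
middle = ranked[100:5000]          # fit only the middle of the ranking
slope, intercept = fit_loglog(middle)
```

On real data the head and tail of the ranking usually deviate from the line, which is presumably why the slide fits only the middle part.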

Page 7: Subscription Partitioning and Routing in Content-based Publish/Subscribe Networks

7

Simulation Results (Cont.)

Simulate 100M new subscriptions from 43,734 symbols
  Scaled-up Zipf-like distribution
  Perturbation and permutation
  Uniform distribution
50 servers with over-partitioning ratio = 10
Without load re-balancing
  Load imbalance (max/min) ranged from 1.41 to 6.66 (Uniform case)
With imbalance threshold of 2.0
  Re-balancing was triggered only 5 times, each time involving re-assignment of up to 3 buckets and migration of up to 0.7% of subscriptions.
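The max/min imbalance metric and threshold-triggered re-balancing can be illustrated with a small sketch. The slide gives only the metric and the threshold; the move rule below (shift the largest bucket that still narrows the gap from the most loaded to the least loaded server) is an assumption, and `imbalance`, `rebalance`, and the sample loads are hypothetical.

```python
def imbalance(server_load):
    """Max/min load ratio; infinite when some server holds nothing."""
    low = min(server_load)
    return float("inf") if low == 0 else max(server_load) / low


def rebalance(table, bucket_load, num_servers, threshold=2.0):
    """Rewrite bucket -> server entries until the ratio is within threshold.

    Returns the list of (bucket, old_server, new_server) moves that were made.
    """
    moves = []
    while True:
        load = [0] * num_servers
        for bucket, server in table.items():
            load[server] += bucket_load[bucket]
        if imbalance(load) <= threshold:
            return moves
        hot = max(range(num_servers), key=load.__getitem__)
        cold = min(range(num_servers), key=load.__getitem__)
        gap = load[hot] - load[cold]
        # Only a bucket smaller than the gap narrows the hot/cold spread.
        candidates = [b for b, s in table.items() if s == hot and 0 < bucket_load[b] < gap]
        if not candidates:
            return moves  # no single move helps any further
        bucket = max(candidates, key=bucket_load.__getitem__)
        table[bucket] = cold
        moves.append((bucket, hot, cold))


# Made-up example: four buckets on two servers, initially 90 versus 10.
bucket_load = {0: 40, 1: 30, 2: 20, 3: 10}
table = {0: 0, 1: 0, 2: 0, 3: 1}
moves = rebalance(table, bucket_load, num_servers=2, threshold=2.0)
```

Here a single bucket move brings the two servers to 50 each, so only the subscriptions in that one bucket migrate; small buckets are what keep each re-balancing step cheap.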

Page 8: Subscription Partitioning and Routing in Content-based Publish/Subscribe Networks

8

Range Predicates

Use Filter Set Partitioning
K-Mean clustering
  Use center point to represent a rectangle
R-tree-based clustering
  R-tree: dynamic index structure for multi-dimensional data rectangles
  Offline R-tree algorithm
    Exhaustively and recursively search for partitions that minimize the sum of bounding rectangle volumes
  Online R-tree algorithm
    Insert from the root down the path that greedily minimizes the increase in bounding rectangle volume
Simulation results
  Off-line R-tree > On-line R-tree > K-Mean > Random
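The greedy rule behind the online R-tree algorithm can be shown in a flattened, one-level form: each new subscription rectangle goes to the partition whose bounding rectangle grows the least. This sketch is a simplification under stated assumptions, not the R-tree itself: it has no tree descent, no node splitting, and no load-balancing constraint, and `volume`, `enlarge`, and `insert_online` are hypothetical names.

```python
from math import prod


def volume(rect):
    """Volume of an axis-aligned rectangle given as ((low, high), ...) per dimension."""
    return prod(high - low for low, high in rect)


def enlarge(box, rect):
    """Smallest rectangle that contains both arguments."""
    return tuple((min(bl, rl), max(bh, rh)) for (bl, bh), (rl, rh) in zip(box, rect))


def insert_online(bounds, rect):
    """Place rect in the partition whose bounding rectangle grows the least.

    bounds[i] is the bounding rectangle of partition i, or None while empty.
    Returns the chosen partition index and updates bounds in place.
    """
    best, best_cost = 0, None
    for i, box in enumerate(bounds):
        cost = volume(rect) if box is None else volume(enlarge(box, rect)) - volume(box)
        if best_cost is None or cost < best_cost:
            best, best_cost = i, cost
    bounds[best] = rect if bounds[best] is None else enlarge(bounds[best], rect)
    return best


bounds = [None, None]   # two partitions, e.g. two servers
subscriptions = [
    ((0.0, 1.0), (0.0, 1.0)),
    ((0.2, 0.8), (0.2, 0.8)),
    ((10.0, 11.0), (10.0, 11.0)),
    ((10.5, 11.5), (10.0, 11.0)),
]
placement = [insert_online(bounds, r) for r in subscriptions]
```

The two nearby rectangles around the origin land in one partition and the two around (10, 10) in the other, so each server's bounding rectangle stays small and attracts less event traffic.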

Page 9: Subscription Partitioning and Routing in Content-based Publish/Subscribe Networks

9

Related Work

Pub/Sub systems
  Echo, Elvin, Gryphon, Herald, Hierarchical Proxy Architecture, Information Bus, JEDI, Keryx, Ready, Scribe, Siena, …
Clustering in pub/sub
  All the previous work focuses on reducing the number of multicast groups [OAA+00, RLW+02, WKM00]

Page 10: Subscription Partitioning and Routing in Content-based Publish/Subscribe Networks

10

Summary

Proposed two subscription partitioning and routing approaches
  Event Space Partitioning
  Filter Set Partitioning
Evaluated performance via simulations
  Subscription partitioning reduces network traffic
  Over-partitioning helps to achieve good load balancing dynamically
  Bloom filter further reduces event traffic
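To make the Bloom filter point concrete, here is a minimal sketch of a filter that a dispatcher could consult before forwarding an event. The sizing (1024 bits, 3 hashes), the SHA-256-based hashing, and the `BloomFilter` class are assumptions for illustration; the slides do not describe the actual construction.

```python
import hashlib


class BloomFilter:
    """Set summary with no false negatives and a tunable false-positive rate."""

    def __init__(self, num_bits=1024, num_hashes=3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, item):
        # Derive num_hashes bit positions by salting one hash function.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{item}".encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, item):
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def might_contain(self, item):
        """False means definitely absent; True means present or a false positive."""
        return all(self.bits[pos // 8] & (1 << (pos % 8)) for pos in self._positions(item))


# Summarize the subscribed symbols of one server (made-up values).
subscribed = ["MSFT", "IBM", "ORCL"]
summary = BloomFilter()
for symbol in subscribed:
    summary.add(symbol)


def should_forward(event_symbol):
    """Drop events that no subscription can match; forward everything else."""
    return summary.might_contain(event_symbol)
```

An event whose symbol is reported absent can be dropped safely, since a Bloom filter never produces false negatives; the occasional false positive only costs one unnecessary forward that the server then filters out.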

Page 11: Subscription Partitioning and Routing in Content-based Publish/Subscribe Networks

11

Simulation Results

10,000 random subscriptions per server on average
Offline R-tree performs the best; reduces event traffic by 20% to 60%

[Figure: hit ratio per server (y-axis, 0 to 1) versus number of servers (x-axis, 0 to 20). Series: Random, Offline R-tree, Online R-tree, Offline/Online R-tree, Offline K-Mean, Online K-Mean, Offline/Online K-Mean.]

Page 12: Subscription Partitioning and Routing in Content-based Publish/Subscribe Networks

12

Model of Content-based Pub/Sub

Content-based filtering
  Event schema with d attributes, supporting equality and range predicates
  Event: a point in the d-dimensional space
  Subscription: a rectangle in that space
  Match: a rectangle contains the point
Content-based routing
  Based on a subset of attributes
  Consider d’-dimensional points and rectangles where d’ ≤ d
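The point-in-rectangle model above can be written down in a few lines. This is a sketch under assumptions: the `Subscription` class, its dictionary representation, and the attribute names are hypothetical, and an attribute missing from a subscription is treated as unconstrained.

```python
from dataclasses import dataclass


@dataclass
class Subscription:
    """A rectangle: attribute -> (low, high), inclusive. Equality is low == high."""
    ranges: dict

    def matches(self, event: dict) -> bool:
        """True when the event point lies inside the rectangle."""
        return all(
            attr in event and low <= event[attr] <= high
            for attr, (low, high) in self.ranges.items()
        )

    def project(self, routing_attrs):
        """Keep only the d' routing attributes; the result is a looser filter."""
        kept = {a: r for a, r in self.ranges.items() if a in routing_attrs}
        return Subscription(kept)


sub = Subscription({
    "symbol": ("MSFT", "MSFT"),        # equality predicate
    "price": (100.0, 120.0),           # range predicate
    "volume": (1000, float("inf")),    # open-ended range
})
event = {"symbol": "MSFT", "price": 105.0, "volume": 500}

full_match = sub.matches(event)                                  # all d attributes
routed = sub.project({"symbol", "price"}).matches(event)         # d' = 2 attributes
```

Routing on the projected rectangle can only produce false positives, never false negatives: the example event is routed toward the subscriber's server on symbol and price, and the full filter at that server then rejects it on volume.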

Page 13: Subscription Partitioning and Routing in Content-based Publish/Subscribe Networks

13

EDN Network Architecture

1. Submit subscriptions
2. Subscription routing
3. Content-based route updates
4. Peer exchange of route updates
5. Content-based event routing
6. Notification delivery

[Figure: EDN nodes connecting event sources, notification routing services, and subscribers, with arrows numbered 1 to 6 for the steps above.]

Page 14: Subscription Partitioning and Routing in Content-based Publish/Subscribe Networks

14

Backup Slides

Page 15: Subscription Partitioning and Routing in Content-based Publish/Subscribe Networks

15

Optimize various performance metrics, subject to load-balancing constraints
  Minimize total event traffic
    Volume of union of rectangles
  Maximize overall system throughput
  Minimize end-to-end latency

[Figure: subscription rectangles shown with a precise summary and an imprecise summary.]
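The link between event traffic and the volume of the union of rectangles can be illustrated in two dimensions. The sketch below compares the union area (what a precise summary would admit) with the area of a single bounding rectangle; using one bounding rectangle as the imprecise summary is an assumption made for illustration, and `union_area` and `bounding_area` are hypothetical helpers.

```python
def union_area(rects):
    """Area of the union of 2-D rectangles given as (x1, y1, x2, y2).

    Uses coordinate compression: split the plane into grid cells along every
    rectangle edge and add up the cells covered by at least one rectangle.
    """
    xs = sorted({x for x1, _, x2, _ in rects for x in (x1, x2)})
    ys = sorted({y for _, y1, _, y2 in rects for y in (y1, y2)})
    area = 0.0
    for xa, xb in zip(xs, xs[1:]):
        for ya, yb in zip(ys, ys[1:]):
            covered = any(
                x1 <= xa and xb <= x2 and y1 <= ya and yb <= y2
                for x1, y1, x2, y2 in rects
            )
            if covered:
                area += (xb - xa) * (yb - ya)
    return area


def bounding_area(rects):
    """Area of the single rectangle that encloses all subscriptions."""
    x1 = min(r[0] for r in rects)
    y1 = min(r[1] for r in rects)
    x2 = max(r[2] for r in rects)
    y2 = max(r[3] for r in rects)
    return (x2 - x1) * (y2 - y1)


# Made-up subscriptions: two overlapping rectangles and one far away.
rects = [(0, 0, 2, 2), (1, 1, 3, 3), (8, 8, 10, 10)]
precise = union_area(rects)
imprecise = bounding_area(rects)
```

If events were spread uniformly over the space, the share of events a server receives would be proportional to the area its summary covers, so the difference between the two areas is space that yields only false-positive traffic.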

Page 16: Subscription Partitioning and Routing in Content-based Publish/Subscribe Networks

16

The EDN Optimization Problem

[Figure: centralized architecture and distributed architecture, with event sources, servers, notification routing service, and subscribers. Numbered steps: 1. Partition existing subscriptions; 2. Summary reporting; 3. Route events; 4. Route new subscriptions; 5. (label not recoverable).]

Page 17: Subscription Partitioning and Routing in Content-based Publish/Subscribe Networks

17

Three Research Directions

Theoretical Study
  Optimal or approximation algorithms for simplified versions
System Design and Simulation
  Subscription partitioning for reducing event traffic
  Summary-based routing for enhancing system throughput
Indigo-based Implementation
  Extensible routing & pub/sub architecture

Page 18: Subscription Partitioning and Routing in Content-based Publish/Subscribe Networks

18

An R-tree-based EDN pub/sub system

[Figure: components Summary Manager, Summary-Based Router, and Single-Node Filtering Engine; data items Event (= Point), Subscription (= Rectangle), Subscription Rectangles, Maximal Rectangles, and Summary Bounding Rectangles.]

Page 19: Subscription Partitioning and Routing in Content-based Publish/Subscribe Networks

19

System Design and Simulation: Summary-based Routing

Basic idea: summary precision-based load balancing for enhancing system throughput

[Figure: publishers, a dispatcher, and Ns servers. Labels: Tp, Td, Tl, Ts, F, R.]

Page 20: Subscription Partitioning and Routing in Content-based Publish/Subscribe Networks

20

If the dispatcher is not the bottleneck, use the precise summary.

Otherwise, reduce summary precision until either the outgoing link or the servers are about to become the bottleneck.
  Throughput increasing

Further reduction of summary precision would generate excessive false-positive traffic that throttles back the dispatcher.
  Throughput decreasing

Page 21: Subscription Partitioning and Routing in Content-based Publish/Subscribe Networks

21

Simulation results

Imprecise summaries enhance throughput

[Figure: relative throughput (y-axis, 0.5 to 3) versus summary precision (x-axis, 0% to 100%). Series: 100,000 rectangles (Rp=0.75; Ro=0.97), 50,000 rectangles (Rp=0.67; Ro=0.89), 20,000 rectangles (Rp=0.54; Ro=0.82), 10,000 rectangles (Rp=0.42; Ro=0.73).]

Page 22: Subscription Partitioning and Routing in Content-based Publish/Subscribe Networks

22

[Figure: relative throughput (y-axis, 0.5 to 3) versus summary precision (x-axis, 0% to 100%). Series: with partitioning, without partitioning.]

Imprecise summaries combined with R-tree-based partitioning further enhance throughput

Page 23: Subscription Partitioning and Routing in Content-based Publish/Subscribe Networks

23

Dispatcher-to-link and dispatcher-to-server bottleneck ratios

[Figure: dispatcher bottleneck ratios (y-axis, 0 to 6) versus summary precision (x-axis, 0% to 100%). Series: Ratio_s (Ns/F=20), Ratio_o, Ratio_s (Ns/F=10), Ratio_s (Ns/F=2).]

Page 24: Subscription Partitioning and Routing in Content-based Publish/Subscribe Networks

24

EDN on Herald

Piggyback subscription routing & summary reporting on multicast tree forming process

Need to additionally consider notification traffic (because subscribers are now part of multicast tree)

[Figure labels: Subscriber; Subscription Routing.]

Page 25: Subscription Partitioning and Routing in Content-based Publish/Subscribe Networks

25

Indigo-based Implementation

Indigo M2 routing & pub/sub architecture was not extensible
  EDN used M2 messaging and built a WS-compliant, extensible routing & pub/sub architecture on top of it
Close collaboration with Indigo
  Extensibility proposals to Indigo
    Some appeared in M3
      But most sealed for security for now
    Some being considered for M4

Page 26: Subscription Partitioning and Routing in Content-based Publish/Subscribe Networks

26

EDN Extensible Routing and Pub/Sub

[Figure: layered components: Indigo Messaging; EDN Route Manager; EDN Subscription Manager; WS-Eventing Subscription Manager; MS Route Manager; WS-Routing Route Manager; Namespace Binding Layer; XPath Filter Matcher; EDN R-tree Matcher.]

Page 27: Subscription Partitioning and Routing in Content-based Publish/Subscribe Networks

27

Other XML-Messaging/Indigo interactions

State dependency management
  Design tool for new features involving “state transplant”
    E.g., System Restore (across time), Intellimirror (across space)
  Repair tool providing consistent undo
    System Restore + rollback of “atomic units”
    GoBack3 + roll-forward of “atomic units”
  Troubleshooting tool
    Trace-diff & state-diff approaches

Our automatic, bottom-up, black-box discovery approach complements their manual, top-down, logical declaration approach (TravisM)

Install-time and run-time information augments the authoring-time information

Targeted problem spaces help identify things to declare for manageability

