SCRIBEA large-scale and decentralized application-level multicast infrastructure
Overview
Pastry
PAST distributed file system layered on top of
Pastry
SCRIBE decentralized publish/subscribe system
Pastry – Quick Review
Chord like routing Consistent
hashing Prefix routing Leaf set
Pastry – locality properties
Short routes Total distance
traveled Average dist 1.59 to
2.2 times actual dist Route convergence
Dist Traveled by 2 messages sent to same key
Equal to dist between to nodes before routes converge
Pastry API
nodeID = pastryInit(Credentials) Causes node to join pastry network
route(msg,key) send(msg,IP-addr) Applications must export:
deliver(msg,key) forward(msg,key,nextID) newLeafs(leafSet)
SCRIBE
Built on top of Pastry Support large number of groups Handle a high rate of membership
turnover SCRIBE nodes can:
Create groups Join groups Multicast messages to groups
SCRIBE API
create(credentials, groupID) join(credentials, groupID) leave(credentials, groupID) multicast(credentials, groupID, message)
SCRIBE – Creating a Group
Pastry route(msg, key) SCRIBE route(CREATE, groupID) groupID => hash textual name cat creator
name Message delivered to closest key which become
rendez-vous point for the group (root of multicast tree for group) Adds to local list of groups Stores credentials
Alternative use itself as root good choice if creator sends to group often
SCRIBE – Joining a Group
Pastry route(msg, key) SCRIBE route(JOIN, groupID)
routed to rendez-vous point along the way multicast tree formed
SCRIBE – Leaving a Group
Remove from local group children list If list becomes empty forward to parent Part of the multicast tree may be
removed
SCRIBE – Sending a multicast message
route(MULTICAST, groupID) ask for rendez-vous IP address
If rendez-vous fails re-request rendez-vous point Pastry handles node duplication
All messages are sent through the rendez-vous point
SCRIBE – Repairing the Multicast Tree
Messages are delivered only in best-effort may be out of order delivery
Periodic heartbeat message sent to all children
Child rejoins the tree through sending a new JOIN message if suspects parent has failed
Can repair rendez-vous point Pastry handles node duplication in leaf nodes Children nodes JOIN new root when missing
heartbeat is detected
SCRIBE – Forming a Multicast Tree Rendez-vous point (root) Forwarders
may or may not be members of the group maintain a children table (IP and nodeID)
for group
SCRIBE - Strengths
Pastry handles root duplication Rendez-vous point does not handle all
join requests Locality properties of Pastry
short routes delay from rendez-vous point to member is
short route convergence
load imposed on physical network is small
SCRIBE – Experimental Evalutation Simulation experimental results Focus on three metrics:
delay to deliver events to group members stress on each node stress on each physical network link
SCRIBE – Simulator Evaluation 5050 routers and 100,000 end nodes 1,500 groups of different sizes 10 different runs using same parameters
but different random seeds Averaged all results Compared results with IP multicast
SCRIBE – Delay Penalty
RMD – ratio between max delay using SCRIBE & max delay using IP multicast
RAD – ratio between average delay using SCRIBE
& average delay using IP multicast
SCRIBE – Node Stress
Average node is responsible for forwarding a small number of multicast messages
SCRIBE – Link Stress
Total num links = 1,035,295 SCRIBE = 2,489,824 messages (mean
2.4) IP multicast = 758,853 messages (mean
0.7)
SCRIBE – Bottleneck remover Bottlenecks
Low capacity nodes High capacity nodes with extremely high children
entries Drop children if over capacity
Select child to drop and send message with children table
Child chooses new parent node and sends JOIN message
Result Removes long tail in node stress graph Increases average link stress
SCRIBE – Scalability Small Groups
50,000 nodes 30,000 groups 11 members each group SCRIBE performs poorly for large number
small groups SCRIBE collapse
Removes long paths removing nodes that are not members of a
group & have only one entry in their children table
SCRIBE – Scalability Small Groups Average link stress 6.1 to 3.3 Average number of children 21.2 to 8.5
SCRIBE - Conclusion
Fully decentralized Support large number of groups Support large group size Multiple multicast sources per group
QUESTIONS