Post on 19-Jan-2016
Scalable Self-Repairing Publish/Subscribe
Robbert van Renesse
Ken Birman
Werner Vogels
Cornell University
Background
• ISIS, Horus, Ensemble systems
– Strong properties (for replicated data)
– Adaptive (changing network/app behavior)
• Problems…
– as fast as slowest receiver
– “Jim Gray effect”
– no IP Multicast
New Direction
• Probabilistically Strong Guarantees
– Randomized protocols
• Compartmentalization
• No reliance on IP multicast, clock sync
• Auto-configuration, self-repair
JBI
Three Main Components
• Astrolabe– Aggregation Service
• SelectCast– Dissemination Service
• Bimodal Multicast– End-to-end reliability
Aggregation
• Ability to summarize information from distributed sources.
• aka data fusion in sensor networks.
• The basis for scalability!
• Standard service in databases.
• Why not in distributed systems?
Examples
• Barrier Synchronization
• Voting
• Resource Location
• Multicast Routing
Astrolabe
• Astrolabe takes continuous snapshots of the global state of a distributed system, and aggregates this information in user-specified ways.
Four Design Principles
• Scalability through Hierarchy
• Flexibility through Mobile SQL
• Robustness through p2p Gossip
• Security through Certificates
DNS-like Domain Hierarchy
(figure labels: Attribute list; Domains identified by path names)
MIB
• Each domain has an attribute list called “MIB” (management information base).
• MIBs of internal domains generated by aggregating child domains’ MIBs.
Domain Table
• No servers for any domain: a MIB is replicated on all hosts in its domain!
• Each host maintains not only the MIBs of its own domains, but also those of its sibling domains.
• Sibling MIBs organized in “domain tables”.
Domain Table Example
ID    CONTACTS            ISSUED  NMEMBERS  MIN(LOAD)
dom1  10.0.0.1, 10.0.0.2  T1      5         0.31
dom2  10.0.1.1            T2      10        0.13
dom3  10.0.2.3            T3      8         1.5
dom4  10.1.2.5, 10.3.2.1  T4      18        0.0
Aggregation
Domain1
id        Load  Weblogic?  SMTP?  Word Version
…
swift     2.0   0          1      6.2
falcon    1.5   1          0      4.1
cardinal  4.5   1          0      6.0
Domain2
id        Load  Weblogic?  SMTP?  Word Version
…
gazelle   1.7   0          0      4.5
zebra     3.2   0          1      6.2
gnu       0.5   1          0      6.2
id        Min Load  WL contact    SMTP contact
domain1   1.5       123.45.61.3   123.45.61.17
domain2   1.7       127.16.77.6   127.16.77.11
domain3   3.1       14.66.71.8    14.66.71.12
SQL query “summarizes” data
Dynamically changing query output is visible domain-wide (like spreadsheet)
Example queries
– SELECT SUM(nmembers) AS nmembers
– SELECT MAX(depth) + 1 AS depth
– SELECT MIN(minl) AS minl (minimum load)
– …
• Functions gossiped with everything else.
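The three example queries can be read as a function from child MIBs to a parent MIB. A minimal Python sketch follows; the attribute names come from the queries above, while the `aggregate` helper and the sample values are hypothetical and not Astrolabe's actual API.

```python
# Hypothetical child MIBs; attribute names follow the example queries.
children = [
    {"nmembers": 5, "depth": 1, "minl": 0.31},
    {"nmembers": 10, "depth": 2, "minl": 0.13},
    {"nmembers": 8, "depth": 1, "minl": 1.5},
]

def aggregate(child_mibs):
    """Compute a parent MIB from child MIBs, mirroring the three example queries."""
    return {
        "nmembers": sum(c["nmembers"] for c in child_mibs),   # SUM(nmembers)
        "depth": max(c["depth"] for c in child_mibs) + 1,     # MAX(depth) + 1
        "minl": min(c["minl"] for c in child_mibs),           # MIN(minl)
    }

parent = aggregate(children)
print(parent)
```

Because each parent attribute depends only on the children's attributes, the same function can be reapplied at every level of the hierarchy.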
Aggregation
Domain1
Name      Load  Weblogic?  SMTP?  Word Version
…
swift     2.0   0          1      6.2
falcon    1.5   1          0      4.1
cardinal  4.5   1          0      6.0
Domain2
Name      Load  Weblogic?  SMTP?  Word Version
…
gazelle   1.7   0          0      4.5
zebra     3.2   0          1      6.2
gnu       0.5   1          0      6.2
Name      Avg Load  WL contact    SMTP contact
SF        2.6       123.45.61.3   123.45.61.17
NJ        1.8       127.16.77.6   127.16.77.11
Paris     3.1       14.66.71.8    14.66.71.12
O(log n) info per host
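The O(log n) figure can be checked with a little arithmetic. The sketch below assumes a balanced hierarchy with a fixed branching factor, which the slides do not state; `rows_per_host` is a hypothetical helper that counts one domain table per level.

```python
def rows_per_host(n_hosts, branching):
    """Rows a host stores: one domain table of `branching` rows per hierarchy level.

    Assumes a balanced tree in which every domain has `branching` children.
    """
    levels = 0
    capacity = 1
    while capacity < n_hosts:
        capacity *= branching
        levels += 1
    return levels * branching

for n in (100, 10_000, 1_000_000):
    print(n, rows_per_host(n, 100))
```

Under these assumptions, growing the system from 100 hosts to a million only triples the number of rows each host keeps.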
Other Examples
1. Which are the three lowest loaded hosts?
2. Which domains contain hosts with an out-of-date virus database?
3. Do >30% of hosts measure elevated radiation?
4. Which domains contain subscribers interested in some topic?
5. Where is the nearest logging server?
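Question 1 illustrates why such queries aggregate well: each domain only needs to report its own three lowest-loaded hosts, and the parent merges those short lists. A sketch using the host loads from the Aggregation slide; the `lowest` helper is hypothetical.

```python
import heapq

def lowest(k, rows):
    """Return the k (name, load) pairs with the smallest load."""
    return heapq.nsmallest(k, rows, key=lambda r: r[1])

# Host loads taken from the Aggregation slide.
domain1 = lowest(3, [("swift", 2.0), ("falcon", 1.5), ("cardinal", 4.5)])
domain2 = lowest(3, [("gazelle", 1.7), ("zebra", 3.2), ("gnu", 0.5)])

# The parent sees only each child's top three, never the full host lists.
overall = lowest(3, domain1 + domain2)
print(overall)
```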
Epidemic or Gossip Protocols
• Used to keep domain tables up-to-date
• Randomized communication between (nearby) hosts:
– Fast (latency grows O(log n))
– Hard to stop (robust even in the face of Denial-of-Service attacks)
– Probabilistically real-time guarantees on latency (based on epidemiological analysis)
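The logarithmic latency can be illustrated with a toy simulation. This is a simplified push-only model, assuming uniformly random peers and no message loss; it ignores the "nearby" bias and is not Astrolabe's actual protocol. The function name and seed are illustrative.

```python
import random

def gossip_rounds(n_hosts, seed=0):
    """Count rounds until an update that starts at host 0 reaches every host."""
    rng = random.Random(seed)
    informed = {0}
    rounds = 0
    while len(informed) < n_hosts:
        # Every informed host pushes the update to one peer chosen at random.
        targets = [rng.randrange(n_hosts) for _ in informed]
        informed.update(targets)
        rounds += 1
    return rounds

for n in (64, 1024, 16384):
    print(n, gossip_rounds(n))
```

The informed set can at most double per round, so at least log2(n) rounds are needed; in this model the observed count stays within a small multiple of that bound.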
How it works…
ID    CONTACTS            ISSUED  NMEMBERS  MIN(LOAD)
dom1  10.1.0.1, 10.2.0.1  T1      5         0.23
dom2  10.3.0.1            T3      1         0.3
dom3  10.4.0.1            T4      8         0.0
ID    CONTACTS            ISSUED  NMEMBERS  MIN(LOAD)
domA  10.0.0.1, 10.0.0.2  T5      2         0.31
domB  10.0.1.1            T6      1         0.13
domC  10.0.2.3            T7      2         1.5
domD  10.1.2.5, 10.3.2.1  T8      3         0.0
gossip
SQL
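One way to picture a gossip exchange is as a merge of two copies of the same domain table, keeping whichever row was issued more recently. The sketch assumes that ISSUED values are comparable timestamps; the integer values and the `merge_tables` helper are hypothetical.

```python
def merge_tables(mine, theirs):
    """Merge two copies of a domain table, keeping the most recently issued row."""
    merged = dict(mine)
    for dom_id, row in theirs.items():
        if dom_id not in merged or row["issued"] > merged[dom_id]["issued"]:
            merged[dom_id] = row
    return merged

# Hypothetical copies of one table held by two gossiping hosts.
mine = {
    "dom1": {"issued": 1, "minload": 0.23},
    "dom2": {"issued": 3, "minload": 0.3},
}
theirs = {
    "dom1": {"issued": 5, "minload": 0.31},  # newer than my copy
    "dom2": {"issued": 2, "minload": 0.13},  # older than my copy
    "dom3": {"issued": 4, "minload": 0.0},   # unknown to me
}
merged = merge_tables(mine, theirs)
print(merged)
```

After the merge, the aggregation queries would be re-evaluated over the updated rows to refresh the parent MIB.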
SelectCast
• Disseminate messages through the Astrolabe hierarchy
• (Application-level) Routers selected through domain aggregation:
SELECT
FIRST(3, routers) AS routers,
MIN(minload) AS minload
ORDER BY minload
Exploit heterogeneity, don’t hide it!
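One plausible reading of the query above: order the child domains by `minload`, concatenate their router lists in that order, and keep the first three. The sketch below follows that reading; the exact semantics of `FIRST` are an assumption here, and the row data is hypothetical.

```python
def select_routers(child_rows, k=3):
    """Pick up to k routers, preferring those from the least-loaded child domains."""
    ordered = sorted(child_rows, key=lambda r: r["minload"])          # ORDER BY minload
    routers = [addr for row in ordered for addr in row["routers"]][:k]  # FIRST(k, routers)
    minload = min(r["minload"] for r in child_rows)                   # MIN(minload)
    return {"routers": routers, "minload": minload}

# Hypothetical child rows.
rows = [
    {"routers": ["10.0.1.1"], "minload": 0.13},
    {"routers": ["10.0.0.1", "10.0.0.2"], "minload": 0.31},
    {"routers": ["10.1.2.5", "10.3.2.1"], "minload": 0.0},
]
chosen = select_routers(rows)
print(chosen)
```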
Multicast Tree
Fault Masking
Filtering (Pub/Sub)
• SQL condition on each message
• For example:
– MIN(version) < 3
– MAX(radiation) > 300
– OR(subject) // BLOOM FILTERS
– TRUE
• Generalization of topic-based publishing
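The OR(subject) condition can be sketched with a small Bloom filter: each host encodes its subscriptions as a bit mask, a domain's mask is the bitwise OR of its children's masks, and a message is forwarded into a domain only if all of its subject's bits are set. The filter width, hash count, and helper names below are assumptions for illustration.

```python
import hashlib

BITS = 64    # assumed filter width
HASHES = 3   # assumed number of hash functions

def bloom(subjects):
    """Encode a set of subjects as a BITS-wide Bloom filter held in an int."""
    bits = 0
    for s in subjects:
        for i in range(HASHES):
            digest = hashlib.sha256(f"{i}:{s}".encode()).digest()
            bits |= 1 << (int.from_bytes(digest[:4], "big") % BITS)
    return bits

def may_contain(filter_bits, subject):
    """True if the subject may be present (false positives are possible)."""
    probe = bloom([subject])
    return (filter_bits & probe) == probe

# A domain's filter is the OR of its members' filters.
domain_filter = bloom(["sports"]) | bloom(["weather"])
print(may_contain(domain_filter, "sports"))
```

A false positive only costs an unnecessary forward into a domain; it never causes a subscriber to miss a message.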
Filtering Example
Scalability
• Latency, memory use, CPU load, and load on network links all grow as O(log N) and are independent of the update rate.
• Highly robust to omission and crash failures.
• Confirmed by analysis, simulation, and experiment.
• O(1) lookup for most useful queries.
Emulab topology (U. Utah)
Experiments
Real vs. Simulation
The real thing Simulation
Membership
• Domain failure detected when its attributes are no longer being updated.
• Domains discovered (and partitions repaired) through
– gossip
– occasional broadcast and multicast
– configuration
• Special precautions for domains separated by firewalls and NAT boxes
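The failure-detection rule in the first bullet amounts to expiring rows whose attributes have gone stale. A minimal sketch, assuming integer timestamps and a fixed timeout; both, along with the `live_domains` helper, are hypothetical.

```python
def live_domains(table, now, timeout):
    """Keep only the rows that were refreshed within the last `timeout` time units."""
    return {d: row for d, row in table.items() if now - row["issued"] <= timeout}

# Hypothetical domain table with integer timestamps.
table = {
    "dom1": {"issued": 95},
    "dom2": {"issued": 60},  # not updated for 40 time units
}
alive = live_domains(table, now=100, timeout=30)
print(sorted(alive))
```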
Security
• Integrated PKI
– integrity, no confidentiality
– prevents “Sybil” Attacks
• Remove outliers
– Summarize in a robust way
• Compartmentalize
– Exploit domain hierarchy
Bimodal Multicast
• Probabilistic end-to-end reliability
• Uses IP Multicast or SelectCast for initial dissemination
• Runs a background gossip protocol to do repairs of message loss
• Performance improves with scale
– share buffering load
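The background repair step can be sketched as a pairwise exchange in which a host compares its buffer with a peer's and pulls whatever it missed. Representing buffers as dicts keyed by sequence number is an assumption made for this sketch, as is the `repair` helper.

```python
def repair(receiver_buf, peer_buf):
    """One gossip exchange: the receiver pulls the messages it is missing from a peer."""
    missing = peer_buf.keys() - receiver_buf.keys()
    for seq in missing:
        receiver_buf[seq] = peer_buf[seq]
    return sorted(missing)

# Host A lost message 2 during the initial (unreliable) dissemination.
host_a = {1: "m1", 3: "m3"}
host_b = {1: "m1", 2: "m2", 3: "m3"}
recovered = repair(host_a, host_b)
print(recovered, host_a)
```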
Work in Progress
• Evaluate Scalability and Performance
– emulation, simulation, deployment
• Improve support for low power apps
– self configuration
• Improve expressiveness
– pattern matching