Date post: | 05-Dec-2014 |
Upload: | palvaro |
Blazes: coordination analysis for distributed programs
Peter Alvaro, Neil Conway, Joseph M. Hellerstein (UC Berkeley); David Maier (Portland State)
Distributed systems are hard
Asynchrony Partial Failure
Asynchrony isn’t that hard
Amelioration: logical timestamps, deterministic interleaving
Partial failure isn’t that hard
Amelioration: replication, replay
Asynchrony × partial failure is hard²
Logical timestamps, deterministic interleaving
Replication, replay
Today: Consistency criteria for fault-tolerant distributed systems
Blazes: analysis and enforcement
This talk is all setup
Frame of mind:
1. Dataflow: a model of distributed computation
2. Anomalies: what can go wrong?
3. Remediation strategies
   1. Component properties
   2. Delivery mechanisms
Framework: Blazes – coordination analysis and synthesis
Little boxes: the dataflow model
Generalization of distributed services
Components interact via asynchronous calls (streams)
Components: input interfaces, output interface
Streams: nondeterministic order
Example: a join operator
Inputs: R, S; output: T
Example: a key/value store
Inputs: put, get; output: response
Example: a pub/sub service
Inputs: publish, subscribe; output: deliver
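The join operator example can be sketched in plain Ruby as a component with two input interfaces and one output stream. This is only an illustration: the class name JoinOperator, the method names r and s, and the tuple layout are assumptions made for the sketch, not part of the talk. It checks that every arrival order of the same input messages yields the same output contents.

```ruby
require 'set'

# Hypothetical component: input interfaces r and s, output stream t.
class JoinOperator
  attr_reader :t

  def initialize
    @r_seen = Hash.new { |h, k| h[k] = [] } # join key => R values seen so far
    @s_seen = Hash.new { |h, k| h[k] = [] } # join key => S values seen so far
    @t = Set.new                            # output contents (a set, so unordered)
  end

  # An R tuple arrives: remember it and join it with every S tuple already seen.
  def r(key, val)
    @r_seen[key] << val
    @s_seen[key].each { |sv| @t << [key, val, sv] }
  end

  # An S tuple arrives: remember it and join it with every R tuple already seen.
  def s(key, val)
    @s_seen[key] << val
    @r_seen[key].each { |rv| @t << [key, rv, val] }
  end
end

# Deliver messages in the given order and return the output contents.
def run_join(messages)
  join = JoinOperator.new
  messages.each { |iface, key, val| join.public_send(iface, key, val) }
  join.t
end

messages = [[:r, 1, 'a'], [:r, 2, 'b'], [:s, 1, 'x'], [:s, 1, 'y']]
in_order = run_join(messages)

# Try every delivery order of the same messages.
same_contents = messages.permutation.all? { |order| run_join(order) == in_order }
```

Because the output is compared as a set, the sketch says nothing about the order in which T tuples are produced, only about which tuples appear.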
Logical dataflow
“Software architecture”
[Diagram: data source, Service X, filter, cache, client; streams a, b, c]
Dataflow is compositional
[Diagram: data source, Service X, filter, aggregator, client]
Components are recursively defined
Dataflow exhibits self-similarity
[Diagram: a component containing Buffer, Buffer, and group/count; streams c, q, r]
[Diagram: DB, HDFS, Hadoop, Index, Combine, Static HTTP, App1, App2, Buy, Content; streams: user requests, App1 answers, App2 answers]
Physical dataflow
[Diagram: data source, Service X, filter, aggregator, client; streams a, b, c]
“System architecture”
What could go wrong?
Cross-run nondeterminism
[Diagram: data source, Service X, filter, aggregator, client; streams a, b, c; shown for Run 1 and Run 2]
Nondeterministic replays
Cross-instance nondeterminism
[Diagram: data source, Service X replicas, client]
Transient replica disagreement
Divergence
[Diagram: data source, Service X replicas, client]
Permanent replica disagreement
Hazards
[Diagram: data source, Service X, filter, aggregator, client; streams a, b, c]
Order → Contents?
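A small illustration of the hazard, where delivery order changes output contents. The TinyKVS class below is a made-up, last-writer-wins store written for this sketch; it is not the key/value store analyzed later in the talk.

```ruby
# Hypothetical order-sensitive component: a later put overwrites an earlier one.
class TinyKVS
  def initialize
    @store = {}
  end

  # Writes produce no response.
  def put(key, val)
    @store[key] = val
    nil
  end

  # Reads respond with whatever value is stored at the moment they arrive.
  def get(key)
    @store[key]
  end
end

# Deliver messages in the given order and collect the responses.
def kvs_responses(messages)
  kvs = TinyKVS.new
  messages.map { |op, *args| kvs.public_send(op, *args) }.compact
end

# Same three messages, two delivery orders.
run1 = kvs_responses([[:put, 'k', 1], [:get, 'k'], [:put, 'k', 2]])
run2 = kvs_responses([[:put, 'k', 1], [:put, 'k', 2], [:get, 'k']])
# run1 is [1] and run2 is [2]: the contents of the response stream differ.
```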
Preventing the anomalies
1. Understand component semantics (and disallow certain compositions)
Component properties
• Convergence
  – Component replicas receiving the same messages reach the same state
  – Rules out divergence
Convergence
Convergent data structure (e.g., Set CRDT): Insert, Read
Commutativity, associativity, idempotence
Tolerant to: reordering, batching, retry/duplication
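As a minimal sketch of a convergent data structure, here is a grow-only set in plain Ruby. The class name GSet and its methods are assumptions for illustration; the talk only names "Set CRDT". The replicas below receive the same inserts reordered, batched, and duplicated, and all reach the same state.

```ruby
require 'set'

# Hypothetical grow-only set replica (insert and read only, no removal).
class GSet
  attr_reader :elements

  def initialize(elements = Set.new)
    @elements = elements.dup
  end

  # Insert is idempotent: adding an element twice changes nothing.
  def insert(x)
    @elements << x
    self
  end

  # Merge is set union, which is commutative, associative, and idempotent.
  def merge(other)
    GSet.new(@elements | other.elements)
  end

  def read
    @elements.dup
  end
end

inserts = %w[a b c b]

in_order   = inserts.reduce(GSet.new) { |g, x| g.insert(x) }
reordered  = inserts.reverse.reduce(GSet.new) { |g, x| g.insert(x) }
batched    = GSet.new.merge(GSet.new(Set['a', 'b'])).merge(GSet.new(Set['c', 'b']))
duplicated = (inserts + inserts).reduce(GSet.new) { |g, x| g.insert(x) }

states = [in_order, reordered, batched, duplicated].map(&:read)
```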
Convergence isn’t compositional
[Diagram: data source, convergent component, client]
Convergent (identical input contents ⇒ identical state)
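One way to read this slide: convergence constrains a component's eventual state, not the stream it emits to downstream components. The sketch below is an illustration under that reading. The Replica class and the message format are invented for the example. Two replicas receive identical input contents and end in identical states, yet a read that overtakes an insert produces a different output for the client.

```ruby
require 'set'

# Hypothetical replica: a convergent set whose read results flow downstream.
class Replica
  attr_reader :state, :output

  def initialize
    @state = Set.new
    @output = [] # snapshots sent to the client
  end

  def deliver(msg)
    kind, value = msg
    case kind
    when :insert then @state << value
    when :read   then @output << @state.to_a.sort
    end
  end
end

insert_a = [:insert, 'a']
insert_b = [:insert, 'b']
read     = [:read, nil]

r1 = Replica.new
[insert_a, insert_b, read].each { |m| r1.deliver(m) } # read arrives last

r2 = Replica.new
[insert_a, read, insert_b].each { |m| r2.deliver(m) } # read overtakes insert 'b'

same_state  = r1.state == r2.state   # identical input contents, identical state
same_output = r1.output == r2.output # but the client sees different snapshots
```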
Component properties
• Convergence
  – Component replicas receiving the same messages reach the same state
  – Rules out divergence
• Confluence
  – Output streams have deterministic contents
  – Rules out all stream anomalies
Confluent ⇒ convergent
Confluence
output set = f(input set)
Confluence is compositional
output set = (f ∘ g)(input set)
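A short sketch of why confluent stages compose. The two lambdas are made-up stages, a filter and a map, each of which computes a function of its input set. Their composition f ∘ g is therefore also a function of the input set, so every delivery order gives the same output contents.

```ruby
require 'set'

# Hypothetical confluent stages: each output set depends only on the input set.
g = ->(inputs) { inputs.select { |w| w.length > 2 }.to_set } # filter short words
f = ->(inputs) { inputs.map(&:upcase).to_set }               # upcase every word

# f ∘ g: apply g first, then f.
pipeline = ->(inputs) { f.call(g.call(inputs)) }

words = %w[to blazes or not to bloom]
expected = pipeline.call(words)

# Every arrival order of the same input contents yields the same output set.
all_orders_agree = words.permutation.all? { |order| pipeline.call(order) == expected }
```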
Preventing the anomalies
1. Understand component semantics (and disallow certain compositions)
2. Constrain message delivery orders
   1. Ordering
Ordering – global coordination
[Diagram: data source, order-sensitive component, client; deterministic outputs]
The first principle of successful scalability is to batter the consistency mechanisms down to a minimum. – James Hamilton
Preventing the anomalies
1. Understand component semantics (and disallow certain compositions)
2. Constrain message delivery orders
   1. Ordering
   2. Barriers and sealing
Barriers – local coordination
[Diagram: data source, order-sensitive component, client; deterministic outputs]
Sealing – continuous barriers
Do partitions of (infinite) input streams “end”?
Can components produce deterministic results given “complete” input partitions?
Sealing: partition barriers for infinite streams
Finite partitions of infinite inputs are common
…in distributed systems
– Sessions
– Transactions
– Epochs / views
…and applications
– Auctions
– Chats
– Shopping carts
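Sealing can be sketched as a per-partition barrier: a component holds back its result for a partition (a session, say) until it learns that the partition is complete. The SealedCounter class and the event format below are assumptions for illustration. The sketch also assumes a seal is delivered only after all of its partition's messages, which a real system would have to guarantee by some other means.

```ruby
# Hypothetical component: counts messages per partition and emits a count
# only once that partition has been sealed.
class SealedCounter
  attr_reader :output

  def initialize
    @counts = Hash.new(0)
    @sealed = {}
    @output = []
  end

  def message(partition, _payload)
    # Assumption of this sketch: no message arrives after its partition's seal.
    raise ArgumentError, "partition #{partition} already sealed" if @sealed[partition]
    @counts[partition] += 1
  end

  # The producer promises that no more messages will arrive for this partition.
  def seal(partition)
    return if @sealed[partition]
    @sealed[partition] = true
    @output << [partition, @counts[partition]]
  end
end

# Deliver events in the given order; return the output contents, sorted.
def run_sealed(events)
  counter = SealedCounter.new
  events.each do |kind, partition, payload|
    kind == :seal ? counter.seal(partition) : counter.message(partition, payload)
  end
  counter.output.sort
end

# Two interleavings of the same two sessions.
a = run_sealed([[:msg, 's1', 'x'], [:msg, 's2', 'y'], [:msg, 's1', 'z'],
                [:seal, 's1'], [:seal, 's2']])
b = run_sealed([[:msg, 's2', 'y'], [:seal, 's2'], [:msg, 's1', 'z'],
                [:msg, 's1', 'x'], [:seal, 's1']])
```

Only the partitions need to be complete before results are emitted; messages from different partitions can interleave freely, which is why this is cheaper than ordering the whole stream.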
Blazes: consistency analysis + coordination selection
Blazes:
Mode 1: Grey boxes
Grey boxes
Example: pub/sub
x = publish; y = subscribe; z = deliver
[Diagram: inputs x, y; output z; deterministic but unordered]

Severity  Label    Confluent  Stateless
1         CR       X          X
2         CW       X
3         OR_gate             X
4         OW_gate

x → z: CW
y → z: CW_T
Grey boxes
Example: key/value store
x = put; y = get; z = response
[Diagram: inputs x, y; output z; deterministic but unordered]
x → z: OW_key
y → z: OR_T
Label propagation – confluent composition
[Diagram: components labeled CW, CR, CR, CR, CR; composite label CW]
Deterministic outputs
Label propagation – unsafe composition
[Diagram: components labeled OW, CR, CR, CR, CR]
Tainted outputs
Interposition point
Label propagation – sealing
[Diagram: components labeled OW_key, CR, CR, CR, CR; inputs carry Seal(key=x); composite label OW_key]
Deterministic outputs
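The three label propagation cases above can be mimicked with a deliberately simplified sketch. This is not the Blazes analysis itself. It assumes a linear path of components, takes the most severe label on the path as the composite label (using the severity ranks from the table), and treats outputs as deterministic only if every order-sensitive component has a gate key matching the seal key of the input. The Component struct, analyze_path, and the sealed_on parameter are invented for the example.

```ruby
# Severity ranks copied from the slides' table; the rest is illustrative.
SEVERITY = { 'CR' => 1, 'CW' => 2, 'OR' => 3, 'OW' => 4 }.freeze

# gate: the partition key of an order-sensitive (OR/OW) component, or nil.
Component = Struct.new(:label, :gate)

# Returns [composite_label, verdict] for a linear path of components.
# sealed_on: the key on which the input stream is sealed, or nil if unsealed.
def analyze_path(path, sealed_on: nil)
  worst = path.max_by { |c| SEVERITY.fetch(c.label) }
  order_sensitive = path.select { |c| SEVERITY.fetch(c.label) >= 3 }
  # Simplifying assumption: a seal on the gate key makes an OR/OW component safe.
  safe = order_sensitive.all? { |c| !c.gate.nil? && c.gate == sealed_on }
  [worst.label, safe ? :deterministic : :tainted]
end

crs = Array.new(4) { Component.new('CR', nil) }

confluent = analyze_path([Component.new('CW', nil)] + crs)
unsafe    = analyze_path([Component.new('OW', nil)] + crs)
sealed    = analyze_path([Component.new('OW', :key)] + crs, sealed_on: :key)
```

In the unsafe case, the order-sensitive component is where coordination would have to be interposed, either by ordering its inputs or by sealing them.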
Blazes:
Mode 2: White boxes
White boxes

    module KVS
      state do
        interface input, :put, [:key, :val]
        interface input, :get, [:ident, :key]
        interface output, :response, [:response_id, :key, :val]
        table :log, [:key, :val]
      end
      bloom do
        log <+ put
        log <- (put * log).rights(:key => :key)
        response <= (log * get).pairs(:key=>:key) do |s,l|
          [l.ident, s.key, s.val]
        end
      end
    end

Negation (→ order sensitive)
Partitioned by :key
put → response: OW_key
get → response: OR_key
White boxes

    module PubSub
      state do
        interface input, :publish, [:key, :val]
        interface input, :subscribe, [:ident, :key]
        interface output, :response, [:response_id, :key, :val]
        table :log, [:key, :val]
        table :sub_log, [:ident, :key]
      end
      bloom do
        log <= publish
        sub_log <= subscribe
        response <= (log * sub_log).pairs(:key=>:key) do |s,l|
          [l.ident, s.key, s.val]
        end
      end
    end

publish → response: CW
subscribe → response: CR
The Blazes frame of mind:
• Asynchronous dataflow model
• Focus on consistency of data in motion
  – Component semantics
  – Delivery mechanisms and costs
• Automatic, minimal coordination
Queries?