Date posted: 11-Jul-2015
Life after CAP
CAP conjecture [reminder]
• Can only have two of:
– Consistency
– Availability
– Partition-tolerance
• Examples
– Databases, 2PC, centralized algorithms (C & A)
– Distributed databases, majority protocols (C & P)
– DNS, Bayou (A & P)
CAP theorem
• Formalization by Gilbert & Lynch
• What does impossible mean?
– There exists an execution that violates one of C, A, or P
– Not possible to guarantee that an algorithm has all three at all times
• Shard data with different CAP tradeoffs
• Detect partitions and weaken consistency
Partition-tolerance & availability
• What is partition-tolerance?
– Consistency and availability are provided by the algorithm
– Partitions are external events (scheduler/oracle)
• Partition-tolerance is really a failure model
• Partition-tolerance is equivalent to message omissions
• In the CAP theorem
– Proof rests on partitions that never heal
– Datacenters can guarantee recovery from partitions!
• Can guarantee that conflict resolution eventually happens
How do we ensure consistency?
• Main technique for staying consistent
– Quorum principle
– Example: majority quorums
• Always write to and read from a majority of nodes
• At least one node knows most recent value
[Figure: WRITE(v) and READ each contact a majority of the 9 nodes; majority(9) = 5, so the two quorums intersect]
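The majority-quorum principle above can be sketched in a few lines. This is a minimal single-process simulation, not a real replicated system: the class name and structure are my own, timestamps stand in for a real ordering protocol, and read-repair (as in ABD) is omitted.

```python
import random

class MajorityQuorumRegister:
    """Sketch of a majority-quorum register over N simulated nodes.

    Each node stores a (timestamp, value) pair. Writes and reads contact
    an arbitrary majority; since any two majorities intersect, at least
    one contacted node always holds the most recent write.
    """

    def __init__(self, n):
        self.n = n
        self.majority = n // 2 + 1          # e.g. majority(9) = 5
        self.nodes = [(0, None)] * n        # (timestamp, value) per node
        self.clock = 0

    def _pick_majority(self):
        return random.sample(range(self.n), self.majority)

    def write(self, value):
        self.clock += 1
        for i in self._pick_majority():
            self.nodes[i] = (self.clock, value)

    def read(self):
        # Return the value with the highest timestamp among a majority.
        replies = [self.nodes[i] for i in self._pick_majority()]
        return max(replies, key=lambda tv: tv[0])[1]

reg = MajorityQuorumRegister(9)
reg.write("v1")
reg.write("v2")
assert reg.read() == "v2"  # quorum intersection guarantees freshness
```

Because the write quorum and the read quorum each contain 5 of 9 nodes, they overlap in at least one node, so the read always observes the latest timestamp.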
Quorum Principle
• Majority quorum
– Pro: tolerates up to N/2 − 1 crashes
– Con: have to read/write N/2 + 1 values
• Read/write quorums (Dynamo, ZooKeeper, Chain Replication)
– Read R nodes, write W nodes, s.t. R + W > N (and W > N/2)
– Pro: adjust performance of reads vs. writes
– Con: availability can suffer
• Maekawa quorums
– Arrange nodes in an M×M grid
– Write to a row + column, read a column (always overlap)
– Pro: only need to read/write O(√N) nodes
– Con: tolerates at most O(√N) crashes (reconfiguration)
[Figure: 3×3 Maekawa grid]
P1 P2 P3
P4 P5 P6
P7 P8 P9
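The grid quorums can be computed directly. A sketch follows, using the simpler Maekawa variant where every quorum is a full row plus a full column (the slide's write-row+column / read-column split is a refinement); the function name is mine.

```python
def maekawa_quorum(node, m):
    """Row+column quorum for `node` in an m×m grid (nodes 0..m*m-1).

    Any two such quorums intersect: node A's row crosses node B's
    column, so quorums of only O(sqrt(N)) nodes suffice.
    """
    row, col = divmod(node, m)
    row_members = {row * m + c for c in range(m)}
    col_members = {r * m + col for r in range(m)}
    return row_members | col_members

m = 3  # the 9-node P1..P9 grid above (node 0 = P1, ..., node 8 = P9)
q1 = maekawa_quorum(0, m)  # P1: row {P1,P2,P3} + column {P1,P4,P7}
q5 = maekawa_quorum(4, m)  # P5: row {P4,P5,P6} + column {P2,P5,P8}
assert q1 & q5             # e.g. P2 and P4 are in both
assert len(q1) == 2 * m - 1

# Every pair of grid quorums overlaps.
quorums = [maekawa_quorum(i, m) for i in range(m * m)]
assert all(qa & qb for qa in quorums for qb in quorums)
```

For N = 9 the quorum size 2M − 1 = 5 happens to equal the majority; the O(√N) saving shows up at scale, e.g. a 10×10 grid needs 19 nodes per quorum instead of 51.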
Probabilistic Quorums
• Quorums of size α√N (α > 1)
intersect with probability ≥ 1 − e^(−α²)
– Example: N=16 nodes, quorum size 7,
intersects ~95%, tolerates 9 failures
– Maekawa: N=16 nodes, quorum size 7,
intersects 100%, tolerates 4 failures
– Pro: Small quorums, high fault-tolerance
– Con: Could fail to intersect, N usually large
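The slide's N=16 example is easy to check empirically. This sketch (the helper name is mine) samples random quorum pairs and compares the observed intersection rate against the 1 − e^(−α²) lower bound:

```python
import math
import random

def intersection_rate(n, q, trials=10_000):
    """Estimate how often two random size-q quorums out of n nodes intersect."""
    hits = 0
    for _ in range(trials):
        a = set(random.sample(range(n), q))
        b = set(random.sample(range(n), q))
        hits += bool(a & b)
    return hits / trials

n, q = 16, 7                     # the slide's example: N=16, quorum size 7
alpha = q / math.sqrt(n)         # alpha = 1.75
bound = 1 - math.exp(-alpha**2)  # analytic lower bound, ~0.95
rate = intersection_rate(n, q)
assert rate >= 0.95              # observed rate meets the ~95% figure
```

In practice the observed rate for such a small N is even higher than the bound; the bound becomes the relevant figure as N grows.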
Quorums and CAP
• With quorums we can get
– C & P: a partition can make the quorum unavailable
– C & A: absence of partitions ensures availability and atomicity
• Decision faced when failing to get a quorum [Brewer'11]
– Sacrifice availability by waiting for the partition to merge
– Sacrifice atomicity by ignoring the quorum
• Can we get CAP for weaker consistency?
What does atomicity really mean?
• Linearization points
– Read ops appear to take effect instantaneously at all nodes, at some time between invocation and response
– Write ops appear to take effect instantaneously at all nodes, at some time between invocation and response
[Figure: timeline with processes P1, P2, P3; P3 performs W(5) then W(6); P1 and P2 each perform a read R; each operation spans an interval from invocation to response]
Definition of Atomicity
[Figure: same timeline; here the reads return R:5 and R:6 — an atomic execution]
Definition of Atomicity
[Figure: both reads return 6 (R:6, R:6) — atomic]
[Figure: P2 reads 6 (R:6) and P1 then reads 5 (R:5) — not atomic]
Atomicity too strong?
[Figure: P2 reads 6 (R:6), then P1 reads 5 (R:5) — not atomic]
• Linearization points too strong?
– Why not just have R:5 appear atomically right after W(5)?
– Lamport: "If P2's operator phones P1 and tells her 'I just read 6'..."
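The non-atomic execution above can be checked mechanically. Below is a brute-force sketch (only feasible for tiny histories); the numeric timestamps are my encoding of the diagram: P3 writes 5 then 6, P2's read of 6 responds before P1's read of 5 is even invoked.

```python
from itertools import permutations

# Each op: (name, invoke_time, respond_time, kind, value)
history = [
    ("W5", 0, 1, "write", 5),
    ("W6", 2, 3, "write", 6),
    ("R6", 4, 5, "read", 6),
    ("R5", 6, 7, "read", 5),   # strictly after R6 in real time
]

def linearizable(ops):
    """Does some total order of ops respect real-time precedence and
    register semantics (each read returns the latest written value)?"""
    for order in permutations(ops):
        pos = {op[0]: i for i, op in enumerate(order)}
        # Real time: if a responded before b was invoked, a precedes b.
        if any(a[2] < b[1] and pos[a[0]] > pos[b[0]]
               for a in ops for b in ops):
            continue
        current, ok = None, True
        for _, _, _, kind, value in order:
            if kind == "write":
                current = value
            elif value != current:
                ok = False
                break
        if ok:
            return True
    return False

assert not linearizable(history)  # R:5 after R:6 has no valid order

# If R:5's interval overlapped the writes, the order W5, R5, W6, R6
# would be valid and the history would be linearizable.
relaxed = [op if op[0] != "R5" else ("R5", 0, 7, "read", 5)
           for op in history]
assert linearizable(relaxed)
```

The only order compatible with real time in the original history is W5, W6, R6, R5, and a read of 5 after W(6) violates register semantics, which is exactly the diagram's point.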
Atomicity too strong?
[Figure: the same execution — not atomic, but sequentially consistent]
• Sequential consistency
– Weaker than atomicity
– Sequential consistency removes the "real-time" requirement
– Any global ordering is OK as long as it respects each process's local ordering
– Does Gilbert's proof fall apart for sequential consistency?
• Causal memory
– Weaker than sequential consistency
– No need for a global view; each process may have a different view
– Local: reads/writes return immediately to the caller
– CAP theorem does not apply to causal memory
[Figure: P1 writes 0 and later reads 1; P2 writes 1 and later reads 0 — causally consistent]
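The execution in the figure separates the two models nicely: no single total order of all four operations exists, yet each process's own causal view is fine. A brute-force sketch (op encoding is mine, reusing the permutation idea but with only program-order constraints):

```python
from itertools import permutations

# Program order per process, from the figure:
# P1: write 0, then read -> 1 ; P2: write 1, then read -> 0
p1 = [("w", 0), ("r", 1)]
p2 = [("w", 1), ("r", 0)]

def sequentially_consistent(procs):
    """Is there one total order of all ops respecting each process's
    program order and register semantics?"""
    ops = [(pid, i, op) for pid, proc in enumerate(procs)
           for i, op in enumerate(proc)]
    for order in permutations(ops):
        pos = {(pid, i): k for k, (pid, i, _) in enumerate(order)}
        if any(pos[(pid, i)] > pos[(pid, i + 1)]
               for pid, proc in enumerate(procs)
               for i in range(len(proc) - 1)):
            continue
        current, ok = None, True
        for _, _, (kind, value) in order:
            if kind == "w":
                current = value
            elif current != value:
                ok = False
                break
        if ok:
            return True
    return False

assert not sequentially_consistent([p1, p2])  # no single order works

# But each process can order the concurrent writes its own way:
view1 = [[("w", 0), ("w", 1), ("r", 1)]]  # P1 sees W(1) last
view2 = [[("w", 1), ("w", 0), ("r", 0)]]  # P2 sees W(0) last
assert sequentially_consistent(view1)
assert sequentially_consistent(view2)
```

Letting different processes order concurrent writes differently is precisely what causal memory permits and sequential consistency forbids.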
Going really weak
• Eventual consistency
– When the network is not partitioned, all nodes eventually have the same value
– I.e., don't be "consistent" at all times, but only after partitions heal!
• Based on a powerful technique: gossiping
– Periodically exchange "logs" with one random node
– Exchanges must be constant-sized packets
– Set reconciliation, Merkle trees, etc.
– Use (clock, node_id) to break ties between events in the log
• Properties of gossiping
– All nodes will have the same value in O(log N) time
– No positive-feedback cycles that congest the network
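The O(log N) spreading time is easy to see in simulation. This sketch models only the simplest push-style gossip (each informed node pushes to one random peer per round), not the push-pull log exchange the slide describes; the function name is mine.

```python
import math
import random

def gossip_rounds(n):
    """Simulate push gossip: each round, every informed node sends the
    value to one uniformly random node. Returns the number of rounds
    until all n nodes are informed; expected O(log n)."""
    informed = {0}          # node 0 starts with the new value
    rounds = 0
    while len(informed) < n:
        informed |= {random.randrange(n) for _ in informed}
        rounds += 1
    return rounds

n = 1024
avg = sum(gossip_rounds(n) for _ in range(20)) / 20
# Dissemination takes roughly log2(n) + ln(n) rounds (~17 for n=1024),
# far fewer than the ~n steps one-by-one delivery would need.
assert avg < 5 * math.log2(n)
```

The doubling phase (each informed node recruits roughly one new node per round) gives the log2 term; the endgame of reaching the last few nodes adds the ln term.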
BASE
• Catch-all for any consistency model C' that enables C'-A-P
– Eventual consistency
– PRAM consistency
– Causal consistency
• Main ingredients
– Stale data
– Soft state (state that can be regenerated)
– Approximate answers
Summary
• No need to ensure CAP at all times
– Switch between algorithms, or satisfy a subset at different times
• Weaken the consistency model
– Choose weaker consistency:
• Causal memory (relatively strong) works around CAP
– Only be consistent when the network isn't partitioned:
• Eventual consistency (very weak) works around CAP
• Weaken partition-tolerance
– Some environments never partition, e.g. datacenters
– Tolerate unavailability in small quorums
– Some environments have recovery guarantees (partitions heal within X hours); perform conflict resolution then
Related Work (ignored in talk)
• PRAM consistency (Pipelined RAM)
– Weaker than causal and non-blocking
• Eventual Linearizability (PODC’10)
– Becomes atomic after quiescent periods
• Gossiping & set reconciliation
– Lots of related work