Date posted: 11-Jul-2015
Life after CAP
CAP conjecture [reminder]
• Can only have two of:
– Consistency
– Availability
– Partition-tolerance
• Examples
– Databases, 2PC, centralized algorithms (C & A)
– Distributed databases, majority protocols (C & P)
– DNS, Bayou (A & P)
CAP theorem
• Formalization by Gilbert & Lynch
• What does impossible mean?
– There exists an execution that violates one of C, A, or P
– Not possible to guarantee that an algorithm has all three at all times
• Shard data with different CAP tradeoffs
• Detect partitions and weaken consistency
Partition-tolerance & availability
• What is partition-tolerance?
– Consistency and availability are provided by the algorithm
– Partitions are external events (scheduler/oracle)
• Partition-tolerance is really a failure model
• Partition-tolerance is equivalent to message omissions
• In the CAP theorem
– Proof rests on partitions that never heal
– Datacenters can guarantee recovery from partitions!
• Can guarantee that conflict resolution eventually happens
How do we ensure consistency?
• Main technique for staying consistent
– Quorum principle
– Example: majority quorums
• Always write to and read from a majority of nodes
• At least one node knows most recent value
[Figure: WRITE(v) and READ each contact a majority of the 9 nodes; majority(9) = 5, so the two quorums intersect]
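The majority-quorum principle above can be sketched in a few lines. This is a minimal single-process simulation, not a real replicated system: the class name and structure are my own, timestamps stand in for a real ordering protocol, and read-repair (as in ABD) is omitted.

```python
import random

class MajorityQuorumRegister:
    """Sketch of a majority-quorum register over N simulated nodes.

    Each node stores a (timestamp, value) pair. Writes and reads contact
    an arbitrary majority; since any two majorities intersect, at least
    one contacted node always holds the most recent write.
    """

    def __init__(self, n):
        self.n = n
        self.majority = n // 2 + 1          # e.g. majority(9) = 5
        self.nodes = [(0, None)] * n        # (timestamp, value) per node
        self.clock = 0

    def _pick_majority(self):
        return random.sample(range(self.n), self.majority)

    def write(self, value):
        self.clock += 1
        for i in self._pick_majority():
            self.nodes[i] = (self.clock, value)

    def read(self):
        # Return the value with the highest timestamp among a majority.
        replies = [self.nodes[i] for i in self._pick_majority()]
        return max(replies, key=lambda tv: tv[0])[1]

reg = MajorityQuorumRegister(9)
reg.write("v1")
reg.write("v2")
assert reg.read() == "v2"  # quorum intersection guarantees freshness
```

Because the write quorum and the read quorum each contain 5 of 9 nodes, they overlap in at least one node, so the read always observes the latest timestamp.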
Quorum Principle
• Majority quorum
– Pro: tolerates up to N/2 − 1 crashes
– Con: have to read/write N/2 + 1 values
• Read/write quorums (Dynamo, ZooKeeper, Chain Replication)
– Read R nodes, write W nodes, s.t. R + W > N (and W > N/2)
– Pro: adjust performance of reads vs. writes
– Con: availability can suffer
• Maekawa quorums
– Arrange nodes in an M×M grid
– Write to a row + column, read a column (always overlap)
– Pro: only need to read/write O(√N) nodes
– Con: tolerates at most O(√N) crashes (reconfiguration)
[Figure: 3×3 Maekawa grid]
P1 P2 P3
P4 P5 P6
P7 P8 P9
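The grid quorums can be computed directly. A sketch follows, using the simpler Maekawa variant where every quorum is a full row plus a full column (the slide's write-row+column / read-column split is a refinement); the function name is mine.

```python
def maekawa_quorum(node, m):
    """Row+column quorum for `node` in an m×m grid (nodes 0..m*m-1).

    Any two such quorums intersect: node A's row crosses node B's
    column, so quorums of only O(sqrt(N)) nodes suffice.
    """
    row, col = divmod(node, m)
    row_members = {row * m + c for c in range(m)}
    col_members = {r * m + col for r in range(m)}
    return row_members | col_members

m = 3  # the 9-node P1..P9 grid above (node 0 = P1, ..., node 8 = P9)
q1 = maekawa_quorum(0, m)  # P1: row {P1,P2,P3} + column {P1,P4,P7}
q5 = maekawa_quorum(4, m)  # P5: row {P4,P5,P6} + column {P2,P5,P8}
assert q1 & q5             # e.g. P2 and P4 are in both
assert len(q1) == 2 * m - 1

# Every pair of grid quorums overlaps.
quorums = [maekawa_quorum(i, m) for i in range(m * m)]
assert all(qa & qb for qa in quorums for qb in quorums)
```

For N = 9 the quorum size 2M − 1 = 5 happens to equal the majority; the O(√N) saving shows up at scale, e.g. a 10×10 grid needs 19 nodes per quorum instead of 51.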
Probabilistic Quorums
• Quorums of size α√N (α > 1)
intersect with probability ≥ 1 − e^(−α²)
– Example: N=16 nodes, quorum size 7,
intersects ~95%, tolerates 9 failures
– Maekawa: N=16 nodes, quorum size 7,
intersects 100%, tolerates 4 failures
– Pro: Small quorums, high fault-tolerance
– Con: Could fail to intersect, N usually large
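The slide's N=16 example is easy to check empirically. This sketch (the helper name is mine) samples random quorum pairs and compares the observed intersection rate against the 1 − e^(−α²) lower bound:

```python
import math
import random

def intersection_rate(n, q, trials=10_000):
    """Estimate how often two random size-q quorums out of n nodes intersect."""
    hits = 0
    for _ in range(trials):
        a = set(random.sample(range(n), q))
        b = set(random.sample(range(n), q))
        hits += bool(a & b)
    return hits / trials

n, q = 16, 7                     # the slide's example: N=16, quorum size 7
alpha = q / math.sqrt(n)         # alpha = 1.75
bound = 1 - math.exp(-alpha**2)  # analytic lower bound, ~0.95
rate = intersection_rate(n, q)
assert rate >= 0.95              # observed rate meets the ~95% figure
```

In practice the observed rate for such a small N is even higher than the bound; the bound becomes the relevant figure as N grows.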
Quorums and CAP
• With quorums we can get
– C & P: a partition can make the quorum unavailable
– C & A: absence of partitions ensures availability and atomicity
• Decision faced when failing to get a quorum [Brewer'11]
– Sacrifice availability by waiting for the partition to merge
– Sacrifice atomicity by ignoring the quorum
• Can we get CAP for weaker consistency?
What does atomicity really mean?
• Linearization points
– Read ops appear to take effect instantaneously at all nodes, at some time between invocation and response
– Write ops appear to take effect instantaneously at all nodes, at some time between invocation and response
[Figure: timeline with processes P1, P2, P3; P3 performs W(5) then W(6); P1 and P2 each perform a read R; each operation spans an interval from invocation to response]
Definition of Atomicity
[Figure: same timeline; here the reads return R:5 and R:6 — an atomic execution]
Definition of Atomicity
[Figure: both reads return 6 (R:6, R:6) — atomic]
[Figure: P2 reads 6 (R:6) and P1 then reads 5 (R:5) — not atomic]
Atomicity too strong?
[Figure: P2 reads 6 (R:6), then P1 reads 5 (R:5) — not atomic]
• Linearization points too strong?
– Why not just have R:5 appear atomically right after W(5)?
– Lamport: "If P2's operator phones P1 and tells her 'I just read 6'..."
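The non-atomic execution above can be checked mechanically. Below is a brute-force sketch (only feasible for tiny histories); the numeric timestamps are my encoding of the diagram: P3 writes 5 then 6, P2's read of 6 responds before P1's read of 5 is even invoked.

```python
from itertools import permutations

# Each op: (name, invoke_time, respond_time, kind, value)
history = [
    ("W5", 0, 1, "write", 5),
    ("W6", 2, 3, "write", 6),
    ("R6", 4, 5, "read", 6),
    ("R5", 6, 7, "read", 5),   # strictly after R6 in real time
]

def linearizable(ops):
    """Does some total order of ops respect real-time precedence and
    register semantics (each read returns the latest written value)?"""
    for order in permutations(ops):
        pos = {op[0]: i for i, op in enumerate(order)}
        # Real time: if a responded before b was invoked, a precedes b.
        if any(a[2] < b[1] and pos[a[0]] > pos[b[0]]
               for a in ops for b in ops):
            continue
        current, ok = None, True
        for _, _, _, kind, value in order:
            if kind == "write":
                current = value
            elif value != current:
                ok = False
                break
        if ok:
            return True
    return False

assert not linearizable(history)  # R:5 after R:6 has no valid order

# If R:5's interval overlapped the writes, the order W5, R5, W6, R6
# would be valid and the history would be linearizable.
relaxed = [op if op[0] != "R5" else ("R5", 0, 7, "read", 5)
           for op in history]
assert linearizable(relaxed)
```

The only order compatible with real time in the original history is W5, W6, R6, R5, and a read of 5 after W(6) violates register semantics, which is exactly the diagram's point.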
Atomicity too strong?
[Figure: the same execution — not atomic, but sequentially consistent]
• Sequential consistency
– Weaker than atomicity
– Sequential consistency removes the "real-time" requirement
– Any global ordering is OK as long as it respects each process's local ordering
– Does Gilbert's proof fall apart for sequential consistency?
• Causal memory
– Weaker than sequential consistency
– No need for a global view; each process may have a different view
– Local: reads/writes return immediately to the caller
– CAP theorem does not apply to causal memory
[Figure: P1 writes 0 and later reads 1; P2 writes 1 and later reads 0 — causally consistent]
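The execution in the figure separates the two models nicely: no single total order of all four operations exists, yet each process's own causal view is fine. A brute-force sketch (op encoding is mine, reusing the permutation idea but with only program-order constraints):

```python
from itertools import permutations

# Program order per process, from the figure:
# P1: write 0, then read -> 1 ; P2: write 1, then read -> 0
p1 = [("w", 0), ("r", 1)]
p2 = [("w", 1), ("r", 0)]

def sequentially_consistent(procs):
    """Is there one total order of all ops respecting each process's
    program order and register semantics?"""
    ops = [(pid, i, op) for pid, proc in enumerate(procs)
           for i, op in enumerate(proc)]
    for order in permutations(ops):
        pos = {(pid, i): k for k, (pid, i, _) in enumerate(order)}
        if any(pos[(pid, i)] > pos[(pid, i + 1)]
               for pid, proc in enumerate(procs)
               for i in range(len(proc) - 1)):
            continue
        current, ok = None, True
        for _, _, (kind, value) in order:
            if kind == "w":
                current = value
            elif current != value:
                ok = False
                break
        if ok:
            return True
    return False

assert not sequentially_consistent([p1, p2])  # no single order works

# But each process can order the concurrent writes its own way:
view1 = [[("w", 0), ("w", 1), ("r", 1)]]  # P1 sees W(1) last
view2 = [[("w", 1), ("w", 0), ("r", 0)]]  # P2 sees W(0) last
assert sequentially_consistent(view1)
assert sequentially_consistent(view2)
```

Letting different processes order concurrent writes differently is precisely what causal memory permits and sequential consistency forbids.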
Going really weak
• Eventual consistency
– When the network is not partitioned, all nodes eventually have the same value
– I.e., don't be "consistent" at all times, but only after partitions heal!
• Based on a powerful technique: gossiping
– Periodically exchange "logs" with one random node
– Exchanges must be constant-sized packets
– Set reconciliation, Merkle trees, etc.
– Use (clock, node_id) to break ties between events in the log
• Properties of gossiping
– All nodes will have the same value in O(log N) time
– No positive-feedback cycles that congest the network
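The O(log N) spreading time is easy to see in simulation. This sketch models only the simplest push-style gossip (each informed node pushes to one random peer per round), not the push-pull log exchange the slide describes; the function name is mine.

```python
import math
import random

def gossip_rounds(n):
    """Simulate push gossip: each round, every informed node sends the
    value to one uniformly random node. Returns the number of rounds
    until all n nodes are informed; expected O(log n)."""
    informed = {0}          # node 0 starts with the new value
    rounds = 0
    while len(informed) < n:
        informed |= {random.randrange(n) for _ in informed}
        rounds += 1
    return rounds

n = 1024
avg = sum(gossip_rounds(n) for _ in range(20)) / 20
# Dissemination takes roughly log2(n) + ln(n) rounds (~17 for n=1024),
# far fewer than the ~n steps one-by-one delivery would need.
assert avg < 5 * math.log2(n)
```

The doubling phase (each informed node recruits roughly one new node per round) gives the log2 term; the endgame of reaching the last few nodes adds the ln term.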
BASE
• Catch-all for any consistency model C' that enables C'-A-P
– Eventual consistency
– PRAM consistency
– Causal consistency
• Main ingredients
– Stale data
– Soft state (state that can be regenerated)
– Approximate answers
Summary
• No need to ensure CAP at all times
– Switch between algorithms, or satisfy a subset at different times
• Weaken the consistency model
– Choose weaker consistency:
• Causal memory (relatively strong) works around CAP
– Only be consistent when the network isn't partitioned:
• Eventual consistency (very weak) works around CAP
• Weaken partition-tolerance
– Some environments never partition, e.g. datacenters
– Tolerate unavailability in small quorums
– Some environments have recovery guarantees (partitions heal within X hours); perform conflict resolution then
Related Work (ignored in talk)
• PRAM consistency (Pipelined RAM)
– Weaker than causal and non-blocking
• Eventual Linearizability (PODC’10)
– Becomes atomic after quiescent periods
• Gossiping & set reconciliation
– Lots of related work