Write Fast, Read in the Past: Causal Consistency for Client-Side Applications
Marc Shapiro
Joint work with:
Marek Zawirski, Inria & LIP6
Annette Bieniusa, U. Kaiserslautern
Valter Balegas, U. Nova Lisboa
Sérgio Duarte, U. Nova Lisboa
Carlos Baquero, U. Minho
Nuno Preguiça, U. Nova Lisboa
Cloud-backed applications
[Figure: 1. client request; 2. app server processes request & stores update in the database; 3. reply; 4. update transmitted to other replicas; round trips of 50~200 ms and 10~200 ms; ad-hoc app interface]
• Large amounts of shared mutable data
• Geo-replication: high availability, low latency, fault tolerance
SwiftCloud approach
[Figure: the app and a partial database run at/near the client, backed by full-database DCs; the client processes requests and stores updates locally, transmits them asynchronously, and can fail over to another DC]
• App + database at/near client
• Update shared store locally
• Availability + consistency
Requirements & challenges
Scalability with #items, #clients
• Partial replication of data
• … and of metadata
• Small, bounded metadata
Availability ⇒ causal consistency
• Eventual exactly-once delivery
• Causal order
• … despite partial replication
• … despite transient failures
• … despite permanent failures
Convergence
Metadata
Tracking causality (delivery check sketched below)
• Dependence: piggy-backed on update message
• Delivery: compare with incoming message
Representations:
• Graph: worst-case cost, transitive-closure cost
• Vector: compact transitive closure, constant cost + optimisations
Issue: size(vector) = O(#masters)
Log: pruning not contingent on clients
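A minimal sketch of the vector-based delivery check, not SwiftCloud's actual code (names are illustrative): an update carries its dependency vector, and a replica delivers it only when its own vector dominates those dependencies and the update is the next one expected from its sender.

    # Hypothetical sketch of vector-clock dependency checking.
    # An update from `src` with sequence number `seqno` and dependency
    # vector `deps` is deliverable once everything `deps` records has been
    # applied locally, and it is the next update expected from `src`.
    def deliverable(local_vv, src, seqno, deps):
        # all causal dependencies already applied locally?
        if any(local_vv.get(r, 0) < n for r, n in deps.items()):
            return False
        # in-order, exactly-once delivery from the sending replica
        return seqno == local_vv.get(src, 0) + 1

    local = {"DC1": 10, "DC2": 7}
    print(deliverable(local, "DC2", 8, {"DC1": 9}))   # True: deps met, next in order
    print(deliverable(local, "DC2", 8, {"DC1": 12}))  # False: DC1 updates missing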
[Figure: example vector entry, value 10 for replica 1: in a dependency vector it means "depend on the 10th update of replica 1"; in a delivery vector, "received the 10th update from replica 1"]
Causality + scalability: DC
Full replica state:
• Causally consistent
• Transitively closed
Vector clock:
• size = O(#DCs)
[Figure: several DCs, each serving multiple clients (C)]
Causality + scalability: client
• Partial replica: cache interest set
• Topology: communicate with DC only
• State = DC state | interest set ∪ client updates (sketch below)
• Concurrency: size(vector) = O(#DCs)!
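A toy sketch of this client state, with hypothetical names, not the SwiftCloud API: the cache holds the DC's versions of the interest set, and reads overlay the client's own pending updates, which gives Read My Writes locally.

    # Sketch: client cache = DC state restricted to the interest set,
    # overlaid with the client's own not-yet-acknowledged updates.
    class ClientCache:
        def __init__(self, interest_set):
            self.interest_set = set(interest_set)
            self.dc_snapshot = {}     # object id -> value shipped by the DC
            self.pending = {}         # object id -> value written locally, unacked

        def refresh(self, dc_state):
            self.dc_snapshot = {k: v for k, v in dc_state.items()
                                if k in self.interest_set}

        def write(self, key, value):
            self.pending[key] = value  # applied locally, propagated asynchronously

        def read(self, key):
            # the client's own writes win over the cached DC snapshot
            return self.pending.get(key, self.dc_snapshot.get(key))

    c = ClientCache({"x", "y"})
    c.refresh({"x": 1, "y": 2, "z": 3})
    c.write("y", 5)
    print(c.read("y"), c.read("x"), c.read("z"))  # 5 1 None (z not in interest set)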
[Figure: clients communicate with a single DC; remote updates are causally consistent, local updates give Read My Writes]
Causal consistency & fail-over
Asynchronous propagation
• At least once: stubborn retransmission
• At most once: filter out duplicates: how? (sketch below)
• Consistency with new DC?
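The summary slide answers "how?" with client IDs merged into the clocks; the sketch below assumes a simpler per-client sequence number at the receiving DC, purely for illustration: at-least-once retransmission plus this filter yields exactly-once application.

    # Sketch of at-most-once delivery on top of at-least-once retransmission:
    # each client stamps updates with (client_id, seqno); a DC applies an update
    # only if it has not already recorded that sequence number.
    applied = {}   # client_id -> highest seqno applied (assumes in-order per client)

    def receive(client_id, seqno, apply_update):
        if seqno <= applied.get(client_id, 0):
            return False              # duplicate from retransmission or fail-over
        apply_update()
        applied[client_id] = seqno
        return True

    print(receive("paris-client", 1, lambda: None))  # True: first copy applied
    print(receive("paris-client", 1, lambda: None))  # False: duplicate dropped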
[Figure: a client fails over from its DC to another DC]
Inconsistency with new DC
Causal gap at client: y.add(C) depends on x.add(T)
Oregon cannot satisfy the dependency: fail-over impossible; client blocks until Ireland recovers
[Figure: the Ireland DC applies x.add(T), y.add(A), y.add(B) but has transmitted only y.add(A) to Oregon when it fails; the Paris client, having read x and y at Ireland, issues y.add(C); after fail-over to Oregon the read is not available and the write is not transmitted]
Solution: reading in the past
Client served only updates that are K-stable ⇒ no gaps (sketch below)
• K=1: always fresh, fail-over problematic
• K=N: fail-over guaranteed, staleness
  ‣ updates possibly buffered indefinitely
Default: K=2, covers common failures
• Also improves throughput
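A sketch of the K-stability rule with illustrative bookkeeping (the real protocol derives this from its clocks): an update is exposed to clients only after K data centres store it, so any DC a client fails over to already holds everything the client may depend on.

    # Sketch: serve an update to client reads only once at least K data centres
    # store it ("K-stable"), so a fail-over target cannot be missing a dependency.
    K = 2
    acks = {}   # update id -> set of DCs that have acknowledged it

    def record_ack(update_id, dc):
        acks.setdefault(update_id, set()).add(dc)

    def visible_to_clients(update_id):
        return len(acks.get(update_id, set())) >= K

    record_ack("y.add(B)", "Ireland")
    print(visible_to_clients("y.add(B)"))   # False: only 1 DC has it
    record_ack("y.add(B)", "Oregon")
    print(visible_to_clients("y.add(B)"))   # True: 2-stable, safe to serve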
[Figure: same Oregon/Ireland/Paris scenario, but the client is served only K-stable updates (here y.add(A)), so after fail-over both x.read and y.read succeed]
Convergence by construction
Merging concurrent updates:
• Deterministic
• Dependent only on delivery poset
• Not on delivery order, local info
Sufficient conditions:
• Delivered updates commute
• (or) Monotonic semi-lattice
CRDTs (sketch below)
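Both sufficient conditions in one standard example, a grow-only counter (textbook CRDT construction, not code from the talk): increments commute, and replica states form a monotonic semi-lattice whose merge is an entry-wise max.

    # Sketch: a grow-only counter CRDT. Increments commute, and merge is the
    # entry-wise max (the join of the semi-lattice), so replicas that see the
    # same updates converge regardless of delivery order.
    class GCounter:
        def __init__(self, replica_id):
            self.replica_id = replica_id
            self.counts = {}                      # replica id -> increments seen

        def increment(self):
            self.counts[self.replica_id] = self.counts.get(self.replica_id, 0) + 1

        def merge(self, other):
            for r, n in other.counts.items():
                self.counts[r] = max(self.counts.get(r, 0), n)

        def value(self):
            return sum(self.counts.values())

    a, b = GCounter("DC1"), GCounter("DC2")
    a.increment(); b.increment(); b.increment()
    a.merge(b); b.merge(a)
    print(a.value(), b.value())   # 3 3: both replicas reach the same state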
Conflict-free Replicated Data Types (CRDTs)
• Converge concurrent updates
• Encapsulate replication & resolution
• Re-usable data types
• Correct by construction
Register
• Last-Writer Wins
• Multi-Value
Set
• Grow-Only
• 2P
• Observed-Remove
Map
Counter
• Unlimited
• Restricted non-negative
Graph
• Directed
• Monotonic DAG
• Edit graph
Sequence
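As one more data point from the list above, a compact Observed-Remove set sketch (the usual tombstone-based construction, not SwiftCloud's implementation): each add gets a unique tag and a remove deletes only the tags it has observed, so a concurrent add wins.

    # Sketch of an Observed-Remove (OR) set: adds create unique tags, removes
    # tombstone only the tags they observed, merge is a union of both.
    import uuid

    class ORSet:
        def __init__(self):
            self.added = {}        # element -> set of unique add-tags
            self.removed = {}      # element -> set of tags observed by a remove

        def add(self, element):
            self.added.setdefault(element, set()).add(uuid.uuid4().hex)

        def remove(self, element):
            observed = self.added.get(element, set())
            self.removed.setdefault(element, set()).update(observed)

        def merge(self, other):
            for e, tags in other.added.items():
                self.added.setdefault(e, set()).update(tags)
            for e, tags in other.removed.items():
                self.removed.setdefault(e, set()).update(tags)

        def contains(self, element):
            live = self.added.get(element, set()) - self.removed.get(element, set())
            return bool(live)

    s1, s2 = ORSet(), ORSet()
    s1.add("x")
    s2.merge(s1)
    s2.remove("x")                 # removes only the add it has observed
    s1.add("x")                    # concurrent re-add with a fresh tag
    s1.merge(s2); s2.merge(s1)
    print(s1.contains("x"), s2.contains("x"))   # True True: the concurrent add wins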
Highly-Available Transactions
Unit of reading / writing
• Read committed
• Read atomic
Update is visible only when
• preceding updates are visible
• all updates in the transaction are visible
Single ID, increment
• read in the past
No synchronisation necessary (sketch below)
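A sketch of this visibility rule with an illustrative update representation: an update becomes visible only when its causal predecessors are visible and every sibling update from the same transaction has arrived, so clients never see half a transaction.

    # Sketch: an update is exposed only if (1) the updates it causally depends on
    # are already visible and (2) every update of its own transaction is present.
    def visible(update, present, visible_set):
        return (update["deps"] <= visible_set and
                update["txn"] <= present)

    present = {"u1", "u2"}            # updates received by this replica
    visible_set = set()
    u1 = {"id": "u1", "deps": set(), "txn": {"u1", "u2"}}
    u2 = {"id": "u2", "deps": set(), "txn": {"u1", "u2"}}
    for u in (u1, u2):
        if visible(u, present, visible_set):
            visible_set.add(u["id"])
    print(sorted(visible_set))        # ['u1', 'u2']: both or neither become visible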
Experiments
• 3 DCs in Amazon EC2; 1 DC = 1 EC2 medium instance
• 100 client nodes in PlanetLab; multiple sessions per node
• Cache size: 512 objects
Social networking application
• 90% cache hits
Evaluate:
• Availability, responsiveness
• Size of metadata
• Latency, throughput
• remote = application logic & data in DC
• SwiftCloud: application, updates at client, asynchronous commit
• SwiftCloud RIP: read K-stable
• stable updates: read one, write K
Update caching + Read-In-Past minimise latency
[Figure: response time per operation class. Legend: writes (RIP); reads (RIP); reads/writes (remote, no FT); reads (remote + stable update); writes (remote + stable update). Annotations: one vs. two RTTs; operations with > 1 cache miss, selective queries. Labels: read-in-past + client-assisted fault tolerance; client-side caching & updates]
Latency vs. throughput
[Figure: latency vs. throughput, SwiftCloud and classic synchronous replication, each with 1, 2 and 3 DCs; annotations 10× and 60×]
Latency vs. throughput
[Figure: latency vs. throughput, SwiftCloud and classic synchronous replication, each with 1, 2 and 3 DCs; annotations 6× and 12×]
Staleness for fault tolerance
Metadata overhead
[Figure: system metadata size in bytes (0 to 5000) vs. #users (7500 to 37500), comparing SwiftCloud with PRACTI/Depot]
Summary
Fast response, high availability
• Shared store replicated at client
• Full consistency guarantees
Availability: causal consistency is the strongest possible
• (+ transactions + red-blue)
Challenges & solutions:
• Large database: partial replication of data and metadata
• Small, bounded metadata: DC-based, pruning
• Fail-over
  ‣ At-most-once: client ID + merge clocks
  ‣ Avoid gaps: read in the past (K-stable)
• Convergence: CRDTs
Current & future work
SyncFree EU project
• Inria, Nova, Minho, Kaiserslautern, UC Louvain, Koç
• Basho, Trifork, Rovio
Objectives
• CRDTs in theory + real apps
• Extreme scalability (DC and clients)
• Transactions, invariants, security
• Languages, patterns, tools
• Crowd-sourced experiments
Backup slides
Internet round-trip times
RTT (ms)
Endpoints                Speed of light    min     mean    max
Berkeley – Canberra            79.9       174.0   174.7   176.0
Berkeley – New York            27.4        85.0    85.0    85.0
Berkeley – Trondheim           55.6       197.0   197.0   197.0
Pittsburgh – Ottawa             4.3        44.0    44.1    62.0
Pittsburgh – Hong Kong         85.9       217.0   223.1   393.0
Pittsburgh – Dublin            42.0       115.0   115.7   116.0
Pittsburgh – Seattle           22.9        83.0    83.9    84.0
Red/Blue consistency
Mix synchronous (red) and asynchronous (blue) updates
Correctness
• All reads and updates are causally ordered
• Blue updates commute with both red and blue ones
• Red updates are totally ordered with red ones (but not with blue ones)
[Li et al., OSDI 2012]
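A textbook illustration in the spirit of Li et al., not taken from these slides: for a bank account, deposits commute and can be blue (asynchronous), while withdrawals must preserve a non-negative balance and therefore run red (totally ordered).

    # Sketch of a red/blue split for a bank account (illustrative only).
    balance = 100

    def deposit(amount):          # blue: commutes with every other operation
        global balance
        balance += amount

    def withdraw(amount):         # red: needs a total order to keep balance >= 0
        global balance
        if balance >= amount:
            balance -= amount
            return True
        return False              # would overdraw if applied concurrently elsewhere

    deposit(50)                   # safe to apply in any order at any replica
    print(withdraw(120), balance) # True 30: only safe because withdrawals serialise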
Eventual Consistency
• Every update eventually reaches every replica at least once
• An update has effect on a replica at most once
• At all times, the state of a replica satisfies the objects' specifications
• Two replicas that received the same set of updates eventually reach the same state
Transactional: replace "update" with "transaction comprising one or more updates".
CRDTs in the wild
Walter [SOSP 2011]: c-set
Riak 2.0: counter, set, map
Facebook Apollo (announced 2014-06)
StateBox (Erlang), KnockBox (Clojure), meangirls (Ruby), riak-java-crdt (Java)
SwiftCloud: geo-replication right to the client machine
Extreme numbers of clients
Large database
Causal consistency for servers and clients
Fault tolerance
Metadata proportional to # clients?
Full replication?
Switching servers?
SwiftCloud: geo-replication right to the client machine
Low latency…
…for reads:
• Partial replication/caching in clients
…for writes:
• Weak consistency
• Mergeable writes ⇒ CRDTs
• + Transactions with parallel snapshot isolation (PSI transactions)
Causal transactions
• Read a causally-consistent state
• All-or-nothing, isolated updates
• No consensus (not ACID)
[Figure: timeline of causal transactions writing (!x1, !y1, !x2, !y2) and reading (?x1, ?x2, ?y1, ?y2) versions of x and y across replicas]
[Figure: spectrum of consistency models. Easy to program / low performance: Strict Serialisability, Serialisability, Snapshot Isolation, Parallel SI, Non-Monotonic SI, Update Serialisability. Hard to program / high performance, fast writes, live: Transactional Causal Consistency, Causal Consistency, Eventual Consistency, Not Available. Dimensions: causal order, transactions, partial vs. total order of writes, total order of reads, total order of reads & writes, real time. Properties: wait-free queries (decoupled reads/writes), relaxed read/write ordering, minimum commitment sync, genuine partial replication, forward freshness]