Dynamo
Kay Ousterhout
Goals
• Small files
• Always writeable
• Low latency (measured at the 99.9th percentile)
Non-goals
• Untrusted nodes
• Relational schema
• Hierarchical namespace
Key-Value Store
Big (Old?) Ideas
Replication: Store every object on N nodes
N = 3
Consistent Hashing
[Figure: hash ring with nodes A, B, C, D, E, F, G and the position of a key]
Consistent Hashing
[Figure: the same ring and key with an additional node H]
Consistent Hashing + Virtual Nodes
[Figure: ring with nodes A through H and a key, illustrating virtual nodes]
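A minimal Python sketch of a ring with virtual nodes, under stated assumptions: the class name HashRing, the MD5-based ring_position helper, the "node#i" token labels, and the default of 8 virtual nodes per physical node are illustrative choices, not details from the slides. preference_list walks clockwise from the key's position and collects N distinct physical nodes, which is one way to realize "store every object on N nodes".

```python
import bisect
import hashlib


def ring_position(label: str) -> int:
    """Map a string to a ring position (MD5 is used only as a stable hash)."""
    return int.from_bytes(hashlib.md5(label.encode("utf-8")).digest(), "big")


class HashRing:
    """Hypothetical ring: each physical node owns several virtual-node tokens."""

    def __init__(self, vnodes_per_node: int = 8):
        self.vnodes_per_node = vnodes_per_node
        self._tokens = []  # sorted list of (position, physical node name)

    def add_node(self, node: str) -> None:
        for i in range(self.vnodes_per_node):
            bisect.insort(self._tokens, (ring_position(f"{node}#{i}"), node))

    def remove_node(self, node: str) -> None:
        self._tokens = [t for t in self._tokens if t[1] != node]

    def preference_list(self, key: str, n: int = 3) -> list:
        """Walk clockwise from the key, collecting n distinct physical nodes."""
        if not self._tokens:
            return []
        start = bisect.bisect_right(self._tokens, (ring_position(key), ""))
        result = []
        for offset in range(len(self._tokens)):
            _, node = self._tokens[(start + offset) % len(self._tokens)]
            if node not in result:
                result.append(node)
                if len(result) == n:
                    break
        return result


ring = HashRing()
for name in "ABCDEFG":
    ring.add_node(name)
print(ring.preference_list("some-key", n=3))  # three distinct node names
```

When a node such as H joins, only keys whose clockwise walk now meets one of H's tokens change their preference list; every other key keeps its replicas.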
Conflict Resolution: Vector Clocks
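A small Python sketch of how vector clocks can tell "newer" versions from conflicting ones. The dict representation and the function names (increment, descends, compare, merge) are assumptions made for illustration; the slides only show clocks written as [node, counter] pairs such as [G, 1].

```python
def increment(clock: dict, node: str) -> dict:
    """Return a copy of the clock with this node's counter bumped by one."""
    new = dict(clock)
    new[node] = new.get(node, 0) + 1
    return new


def descends(a: dict, b: dict) -> bool:
    """True if clock a has seen every event that clock b has seen."""
    return all(a.get(node, 0) >= count for node, count in b.items())


def compare(a: dict, b: dict) -> str:
    a_ge, b_ge = descends(a, b), descends(b, a)
    if a_ge and b_ge:
        return "equal"
    if a_ge:
        return "a_newer"
    if b_ge:
        return "b_newer"
    return "concurrent"  # neither descends from the other: a conflict


def merge(a: dict, b: dict) -> dict:
    """Component-wise maximum, used after a conflict has been reconciled."""
    return {n: max(a.get(n, 0), b.get(n, 0)) for n in a.keys() | b.keys()}


v1 = increment({}, "G")        # {"G": 1}, written [G, 1] on the slides
via_b = increment(v1, "B")     # an update coordinated by B
via_c = increment(v1, "C")     # an independent update coordinated by C
print(compare(via_b, v1))      # a_newer
print(compare(via_b, via_c))   # concurrent
print(sorted(merge(via_b, via_c).items()))  # [('B', 1), ('C', 1), ('G', 1)]
```

Versions that compare as "concurrent" cannot be ordered by the store itself, so both have to be kept until something reconciles them.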
Basic put()
[Figure: ring of nodes A through H with a coordinator; the incoming write k:v is stored as k:v([G, 1])]
N = 3, W = 2
get() with Sloppy Quorum
[Figure: ring of nodes A through H with a coordinator; two k? requests go out and two k:v([G, 1]) replies come back]
N = 3, W = 2, R = 2
Takeaway: Tunable!
Change N,R,W for desired consistency, availability
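One way to reason about a choice of N, R, W is to check whether every read set must intersect every write set. The brute-force Python sketch below enumerates all such sets; the function name quorums_always_overlap is hypothetical. It models strict quorums only: with a sloppy quorum, writes may land on stand-in nodes during failures, so the overlap is not guaranteed in that case.

```python
from itertools import combinations


def quorums_always_overlap(n: int, r: int, w: int) -> bool:
    """True if any r replicas read and any w replicas written share a node."""
    replicas = range(n)
    return all(
        set(write_set) & set(read_set)
        for write_set in combinations(replicas, w)
        for read_set in combinations(replicas, r)
    )


print(quorums_always_overlap(3, 2, 2))  # True: the setting used on the slides
print(quorums_always_overlap(3, 1, 1))  # False: fast, but reads may miss writes
print(quorums_always_overlap(3, 1, 3))  # True: cheap reads, expensive writes
```

The enumeration agrees with the usual rule of thumb R + W > N: raising R or W buys consistency at the cost of latency and availability, and lowering them does the reverse.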
Antientropy: Merkle Trees
[Figure: Merkle trees for Node A and Node B. Leaves hold object hashes; each interior node holds the hash of its children.]
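A hedged Python sketch of Merkle-tree comparison between two replicas. It assumes both nodes build their trees over the same ordered key list, hashes a missing key as an empty value, and uses SHA-256; these choices, and the names build_tree and differing_leaves, are illustrative and not taken from the slides.

```python
import hashlib


def h(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()


def build_tree(objects: dict, keys: list) -> list:
    """Return tree levels, leaves first; levels[-1][0] is the root hash."""
    if not keys:
        raise ValueError("need at least one key")
    level = [h(k.encode() + b"\x00" + objects.get(k, b"")) for k in keys]
    levels = [level]
    while len(level) > 1:
        # Each parent is the hash of its (one or two) children.
        level = [
            h(level[i] + (level[i + 1] if i + 1 < len(level) else b""))
            for i in range(0, len(level), 2)
        ]
        levels.append(level)
    return levels


def differing_leaves(a: list, b: list) -> list:
    """Descend from the root, visiting only subtrees whose hashes differ."""
    if a[-1][0] == b[-1][0]:
        return []  # equal roots: nothing to exchange
    suspects = [0]
    for depth in range(len(a) - 2, -1, -1):
        suspects = [
            child
            for idx in suspects
            for child in (2 * idx, 2 * idx + 1)
            if child < len(a[depth]) and a[depth][child] != b[depth][child]
        ]
    return suspects


keys = [f"k{i}" for i in range(7)]
node_a = {k: b"v" for k in keys}
node_b = dict(node_a, k3=b"stale")
del node_b["k6"]
diff = differing_leaves(build_tree(node_a, keys), build_tree(node_b, keys))
print([keys[i] for i in diff])  # ['k3', 'k6']
```

When the replicas agree, only the two root hashes need to be compared; when they differ, the descent narrows the exchange to the keys that are out of sync.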
More Ideas
• Hinted handoff
• Explicit join/leave mechanism (gossip-based)
• Seed nodes
• Local notion of failure
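Of the ideas above, hinted handoff is the easiest to sketch in a few lines. The Python below is a toy model under stated assumptions: the Node class, the put and deliver_hints functions, and the in-memory dicts are all hypothetical. A write meant for an unreachable replica goes to a stand-in node together with a hint naming the intended owner, and the stand-in forwards it once the owner is reachable again.

```python
class Node:
    """Toy replica; `up` models this node's local view of reachability."""

    def __init__(self, name: str):
        self.name = name
        self.up = True
        self.data = {}
        self.hints = []  # (intended owner name, key, value)


def put(key, value, preference, fallbacks):
    """Write to every preferred node; park writes for down nodes on a stand-in."""
    spare = iter(fallbacks)
    for owner in preference:
        if owner.up:
            owner.data[key] = value
            continue
        stand_in = next((n for n in spare if n.up), None)
        if stand_in is not None:
            stand_in.hints.append((owner.name, key, value))


def deliver_hints(holder, nodes_by_name):
    """Forward parked writes whose intended owner is reachable again."""
    remaining = []
    for owner_name, key, value in holder.hints:
        owner = nodes_by_name[owner_name]
        if owner.up:
            owner.data[key] = value
        else:
            remaining.append((owner_name, key, value))
    holder.hints = remaining


a, b, c, d = (Node(n) for n in "ABCD")
nodes = {n.name: n for n in (a, b, c, d)}
a.up = False
put("k", "v", preference=[a, b, c], fallbacks=[d])
print(d.hints)   # [('A', 'k', 'v')]
a.up = True
deliver_hints(d, nodes)
print(a.data)    # {'k': 'v'}
```

The write still reaches three nodes while A is down, which is what keeps the store "always writeable" at the cost of temporarily relaxed replica placement.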
Comments
System Scope
“…the system needs to have scalable and robust solutions for load balancing, membership and failure detection, failure recovery, replica synchronization, overload handling, state transfer, concurrency and job scheduling, request marshaling, request routing, system monitoring and alarming, and configuration management.”
Triggers
• Client registers to be notified when key changes
• Challenging with eventual consistency!
• Solution: sloppy triggers
Setting N,R,W
• …is annoying!
• Can we auto-configure R?
• What about W?
Supporting Multiple N,R,W
• Different availability/consistency tradeoffs for each object
• …without making multiple systems