Dynamo: Amazon’s Highly Available Key-value Store

Date post: 15-Mar-2016
Giuseppe DeCandia et al., SOSP ’07

Introduction

• Dynamo manages state for applications that require only primary-key access to data

• Dynamo applications need scalability, high availability, and fault tolerance, but don’t need the complexity of a relational DB
– ACID properties => little parallelism, low availability

Assumptions:

• Applications perform simple read/write ops on single, small (< 1 MB) data objects that are identified by a unique key
– Example: the shopping cart

• Replace ACID properties with weaker guarantees: eventual consistency, no isolation promises

• Services must operate efficiently on commodity hardware

• Used only by internal services, so security isn’t an issue

Service Level Agreements (SLA)

• Clients and servers negotiate SLAs to establish the kind of service and the expected performance

• Amazon expects the guarantees to apply to 99.9% of requests
– The authors claim that most industry systems express SLAs in terms of “average”, “median”, and “expected variance”, which is much weaker than Amazon’s requirements
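
To see why a 99.9% target is stricter than a mean or median, here is a small sketch. The latency numbers are invented for illustration, and `percentile` is a hypothetical nearest-rank helper, not something defined in the paper.

```python
import math
import statistics

def percentile(samples, p):
    """Nearest-rank percentile: the smallest sample with at least p% of samples at or below it."""
    ordered = sorted(samples)
    rank = max(1, math.ceil(p / 100 * len(ordered)))
    return ordered[min(rank, len(ordered)) - 1]

# Invented data: 995 fast requests (10 ms) and 5 slow ones (500 ms).
latencies_ms = [10] * 995 + [500] * 5

mean_ms = statistics.mean(latencies_ms)       # pulled up only slightly by the slow requests
median_ms = statistics.median(latencies_ms)   # ignores the slow requests entirely
p999_ms = percentile(latencies_ms, 99.9)      # lands in the slow tail
```

With this data the median is 10 ms and the mean is about 12 ms, yet the 99.9th percentile is 500 ms. An SLA stated on the mean would look healthy even though one request in two hundred is slow.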

Design Considerations

• Services control properties such as durability and consistency, and evaluate tradeoffs (cost vs. performance, for example)

• Replicated databases cannot guarantee strong consistency and high availability at the same time when network failures are possible
– Optimistic replication updates replicas as a background process to get eventual consistency

Design Considerations: Resolving Conflicting Updates

• When
– Dynamo targets services that require “always writeable” data storage; e.g., users must always be able to add to or delete from the shopping cart
– So conflicts are resolved during reads, not writes

• By Whom
– Let each application decide for itself
– But … default is “last write wins”.

Other Key Design Principles

• Incremental scalability: adding a single node should not affect the system significantly

• Symmetry: all nodes have the same responsibilities

• Decentralization: favor P2P techniques over centralized control

• Heterogeneity: take advantage of differences in server capabilities.

Comparison to Other Systems

• Peer-to-Peer (Freenet, Chord, …)
– Structured v unstructured: access times
– Conflict resolution for concurrent updates without wide-area file locking

• Distributed File Systems and Databases (Google, Bayou, Coda, …)
– Treatment of system partitions
– Conflict resolution, eventual consistency
– Strong consistency v eventual consistency

Dynamo v Other Decentralized Storage Systems

• “Always writeable”
– Updates won’t be rejected because of failure or concurrent updates

• One administrative domain; nodes are assumed to be trustworthy

• Applications don’t require hierarchical name spaces or relational schema

• Operations must be performed within a few hundred milliseconds.

System Architecture

• The Dynamo data storage system contains items that are associated with a single key

• Operations that are implemented: get( ) and put( ).
– get(key)
– put(key, context, object), where context refers to various kinds of system metadata
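
The shape of this interface can be sketched with a toy, single-process store. `ToyStore` and the `{"version": n}` context are hypothetical stand-ins; the slides only say that context carries system metadata, and nothing here is distributed.

```python
class ToyStore:
    """Single-process stand-in for the get/put interface (assumption: not Dynamo's code)."""

    def __init__(self):
        self._data = {}  # key -> (context, object)

    def get(self, key):
        """Return (context, object); both are None if the key is absent."""
        return self._data.get(key, (None, None))

    def put(self, key, context, obj):
        """Store obj as a new version derived from the version named by context."""
        version = 0 if context is None else context["version"] + 1
        self._data[key] = ({"version": version}, obj)

store = ToyStore()
store.put("cart:alice", None, ["book"])
context, cart = store.get("cart:alice")
store.put("cart:alice", context, cart + ["pen"])  # pass back the context obtained from get()
```

The point of the sketch is the calling pattern: the caller hands the context from an earlier get( ) back to put( ), so the store can tell which version the update was based on.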

Problem | Technique | Advantage
Partitioning | Consistent hashing | Incremental scalability
High availability for writes | Vector clocks, reconciled during reads | Version size is decoupled from update rates
Temporary failures | Sloppy quorum, hinted handoff | Provides high availability & durability guarantee when some of the replicas are not available
Permanent failures | Anti-entropy using Merkle trees | Synchronizes divergent replicas in the background
Membership & failure detection | Gossip-based protocol | Preserves symmetry and avoids having a centralized registry for storing membership and node liveness information

Table 1: Summary of techniques used in Dynamo and their advantages

Partitioning Algorithm

• Partitioning = dividing data storage across all nodes. Supports scalability

• Very similar to Chord-based schemes

• Consistent hashing scheme distributes content across multiple nodes
– In consistent hashing the effect of adding a node is localized: on average, K/n objects must be remapped (K = # of keys, n = # of nodes)

Partitioning Algorithm

• Hash function produces an m-bit number which defines a circular name space (like Chord)

• Nodes are assigned numbers randomly in the name space

• Compute Hash(data key) and assign the key to a node using a successor function, as in Chord
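
The ring lookup can be sketched as follows. This is a minimal illustration, not Dynamo's implementation: MD5 reduced to m = 32 bits is chosen for convenience, node positions come from hashing hypothetical node names as a stand-in for random assignment, and `Ring` and `ring_position` are made-up names.

```python
import bisect
import hashlib

def ring_position(name, m=32):
    """Hash a string to an m-bit position on the ring (MD5 chosen for illustration)."""
    digest = hashlib.md5(name.encode("utf-8")).digest()
    return int.from_bytes(digest, "big") % (2 ** m)

class Ring:
    def __init__(self, nodes):
        # Sorted (position, node) pairs; hashing the name stands in for a random position.
        self._points = sorted((ring_position(n), n) for n in nodes)
        self._positions = [p for p, _ in self._points]

    def successor(self, key):
        """Return the first node at or clockwise after the key's position."""
        i = bisect.bisect_left(self._positions, ring_position(key))
        return self._points[i % len(self._points)][1]  # wrap around the ring

keys = [f"key-{i}" for i in range(1000)]
before = Ring(["A", "B", "C"])
after = Ring(["A", "B", "C", "D"])
moved = [k for k in keys if before.successor(k) != after.successor(k)]
```

Every key in `moved` now belongs to the new node `D`; no key moves between the existing nodes. That is the sense in which the effect of adding a node is localized.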

Load Distribution

• Random assignment of node to position in ring may produce non-uniform distribution of data.

• Solution: virtual nodes
– Assign several random numbers to each physical node; it is then responsible for the data that would be stored on each of those virtual nodes if they were separate machines
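
The effect of virtual nodes on load can be explored with a sketch like the one below. It assumes positions are derived by hashing a hypothetical `"node#token"` label with MD5; the node names, key names, and token counts are all invented for illustration.

```python
import bisect
import hashlib
from collections import Counter

def ring_position(name):
    """Map a string to a position on the ring (MD5 chosen for illustration)."""
    return int.from_bytes(hashlib.md5(name.encode("utf-8")).digest(), "big")

def build_ring(nodes, tokens_per_node):
    """Place each physical node at several positions ("virtual nodes") on the ring."""
    return sorted(
        (ring_position(f"{node}#{token}"), node)
        for node in nodes
        for token in range(tokens_per_node)
    )

def load(nodes, tokens_per_node, num_keys=5000):
    """Count how many of num_keys synthetic keys each physical node ends up owning."""
    ring = build_ring(nodes, tokens_per_node)
    positions = [p for p, _ in ring]
    counts = Counter()
    for i in range(num_keys):
        idx = bisect.bisect_left(positions, ring_position(f"key-{i}"))
        counts[ring[idx % len(ring)][1]] += 1
    return counts

nodes = ["A", "B", "C", "D"]
one_token = load(nodes, tokens_per_node=1)
many_tokens = load(nodes, tokens_per_node=100)
```

Comparing `one_token` with `many_tokens` typically shows the per-node counts moving closer to an even split as the number of tokens per node grows, since each node's share becomes the sum of many small arcs instead of one arc of arbitrary size.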

Replication

• Data is replicated at N nodes

• Succ(key) = coordinator node
– The coordinator replicates the object at the N-1 successor nodes in the ring, skipping ring positions that belong to an already chosen physical node, so that replicas land on distinct machines and fault tolerance increases
– Preference list: the list of nodes that store a particular key
– There are actually > N nodes on the preference list, so that N “healthy” nodes can still be found when some nodes fail.
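
Building a preference list from a ring that contains virtual nodes can be sketched as a clockwise walk that keeps only distinct physical nodes. The ring contents and the function name below are hypothetical.

```python
def preference_list(ring, start_index, n):
    """Walk clockwise from start_index, collecting the first n distinct physical nodes."""
    chosen = []
    for step in range(len(ring)):
        _, node = ring[(start_index + step) % len(ring)]
        if node not in chosen:  # skip further virtual nodes of a host already chosen
            chosen.append(node)
        if len(chosen) == n:
            break
    return chosen

# Hypothetical ring of (position, physical node) pairs, sorted by position.
# Nodes A and B each appear twice because they own two virtual nodes.
ring = [(10, "A"), (20, "A"), (30, "B"), (40, "C"), (50, "B"), (60, "D")]

from_start = preference_list(ring, 0, 3)  # skips A's second position
wrapping = preference_list(ring, 4, 3)    # wraps past the end of the ring
```

Here `from_start` is `['A', 'B', 'C']` and `wrapping` is `['B', 'D', 'A']`. Asking for more than N entries in the same way would yield the longer list used to route around failed nodes.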

Data Versioning

• Updates can be propagated to replicas asynchronously; the put( ) call may return before all updates have been applied.
– Implication: a subsequent get( ) may return stale data.

• Barring failure, most updates are applied within bounded time, but server or network failure can delay updates “for an extended period of time”.

Data Versioning

• Some apps can be designed to work in this environment; e.g., the “add-to/delete-from cart” operation.
– It’s okay to add to an old cart, as long as all versions of the cart are eventually reconciled

• Dynamo treats each modification as a new (& immutable) version of the object.
– Multiple versions can exist at the same time

Reconciliation

• Usually, new versions subsume the old versions, so there is no problem

• Sometimes concurrent updates and failures generate conflicting versions

• Typically this is handled by merging
– For add-to-cart operations, nothing is lost
– For delete-from-cart operations, deleted items might reappear after the reconciliation
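
A union-based merge shows both behaviours. This is a sketch under the assumption that a cart is a plain set of item names and that merging means set union; the slides do not spell out the merge rule.

```python
def merge_carts(*versions):
    """Reconcile divergent cart versions by taking the union of their items."""
    merged = set()
    for items in versions:
        merged |= set(items)
    return merged

base = {"book", "pen"}
replica_1 = base | {"lamp"}  # one branch adds a lamp
replica_2 = base - {"pen"}   # a concurrent branch deletes the pen

merged = merge_carts(replica_1, replica_2)
```

The merged cart is `{"book", "pen", "lamp"}`: the added lamp survives, but so does the pen that the other branch deleted, because the union cannot tell a deletion from an item that was never seen.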

Parallel Version Branches

• There may be multiple versions of the same data, each coming from a different path (e.g., if there’s been a network partition)

• Vector clocks are used to identify causally related versions and parallel (concurrent) versions
– For causally related versions, accept the final version as the “true” version
– For parallel (concurrent) versions, use some reconciliation technique to resolve the conflict
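
The causal-versus-concurrent test can be sketched with vector clocks stored as dictionaries mapping a node name to a counter. The node names `Sx`, `Sy`, `Sz` and the helper names are illustrative assumptions.

```python
def increment(clock, node):
    """Return a copy of clock with node's counter advanced by one."""
    updated = dict(clock)
    updated[node] = updated.get(node, 0) + 1
    return updated

def descends(a, b):
    """True if a has seen every update that b has (a >= b for every node)."""
    return all(a.get(node, 0) >= count for node, count in b.items())

def compare(a, b):
    """Classify two clocks as equal, causally ordered, or concurrent."""
    a_covers_b, b_covers_a = descends(a, b), descends(b, a)
    if a_covers_b and b_covers_a:
        return "equal"
    if a_covers_b:
        return "a is newer"
    if b_covers_a:
        return "b is newer"
    return "concurrent"

d1 = increment({}, "Sx")  # first write, handled by Sx
d2 = increment(d1, "Sx")  # second write, also handled by Sx
d3 = increment(d2, "Sy")  # one branch continues at Sy
d4 = increment(d2, "Sz")  # a parallel branch continues at Sz
```

`compare(d2, d1)` reports that `d2` is newer, so `d1` can be discarded. `compare(d3, d4)` reports "concurrent": neither clock covers the other, so both versions must be kept until they are reconciled.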

Execution of get( ) and put( )

• Operations can originate at any node in the system.

• Clients may
– Route the request through a generic load balancer, which selects a node based on load
– Use client software that routes the request directly to the coordinator for that object

• The coordinator contacts R nodes for reading and W nodes for writing, where R + W > N
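
The reason for requiring R + W > N is that every read set then shares at least one replica with every write set. For small N this can be checked by brute force; the function name below is hypothetical.

```python
from itertools import combinations

def quorums_always_overlap(n, r, w):
    """Check that every r-node read set shares a node with every w-node write set."""
    replicas = range(n)
    return all(
        set(read_set) & set(write_set)  # an empty intersection is falsy
        for read_set in combinations(replicas, r)
        for write_set in combinations(replicas, w)
    )

strict = quorums_always_overlap(3, 2, 2)  # R + W = 4 > N = 3
loose = quorums_always_overlap(3, 1, 1)   # R + W = 2 <= N = 3
```

`strict` is `True` and `loose` is `False`: with R = W = 1 a read can land on a replica that the latest write never touched. Lowering R or W trades that overlap for lower latency.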

“Sloppy Quorum”

• put( ): the coordinator writes to the first N healthy nodes on the preference list. If W writes succeed, the write is considered to be successful

• get( ): coordinator reads from N nodes; waits for R responses.
– If they agree, return value.
– If they disagree, but are causally related, return the most recent value
– If they are causally unrelated, apply reconciliation techniques and write back the corrected version
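
The three get( ) cases can be sketched as one filtering step over the R responses: drop every version that another response causally supersedes and see what is left. Vector clocks are assumed to be dictionaries of node name to counter, and `resolve_read` is a hypothetical helper.

```python
def descends(a, b):
    """True if clock a has seen every update that clock b has."""
    return all(a.get(node, 0) >= count for node, count in b.items())

def resolve_read(responses):
    """Drop every version whose clock is dominated by another response's clock.

    responses: list of (vector_clock, value) pairs returned by R replicas.
    Returns the surviving versions; more than one means concurrent branches.
    """
    survivors = []
    for clock, value in responses:
        dominated = any(
            other != clock and descends(other, clock) for other, _ in responses
        )
        if not dominated and (clock, value) not in survivors:
            survivors.append((clock, value))
    return survivors

related = resolve_read([({"Sx": 1}, "v1"), ({"Sx": 2}, "v2"), ({"Sx": 2}, "v2")])
unrelated = resolve_read([({"Sx": 2, "Sy": 1}, "a"), ({"Sx": 2, "Sz": 1}, "b")])
```

`related` holds only the most recent version, `v2`. `unrelated` holds both versions, which is the case where the application's reconciliation logic runs and the merged result is written back.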

Hinted Handoff

• What if a write operation can’t reach some of the nodes on the preference list?

• To preserve availability and durability, store the replica temporarily on another node, accompanied by a metadata “hint” that remembers where the replica should be stored.

• Hinted handoff ensures that read and write operations don’t fail because of network partitioning or node failures.
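
A sketch of the idea, with in-memory dictionaries standing in for nodes and the network. The function names, the `stores` and `hints` structures, and the node names are assumptions made for illustration.

```python
def put_with_hints(preference_list, healthy, n, key, value, stores, hints):
    """Write to the first n healthy nodes; replicas meant for down nodes carry a hint."""
    intended = preference_list[:n]
    down = [node for node in intended if node not in healthy]
    targets = [node for node in preference_list if node in healthy][:n]
    for node in targets:
        stores.setdefault(node, {})[key] = value
        if node not in intended and down:
            # This node is standing in for a failed one: remember the real owner.
            hints.setdefault(node, []).append((down.pop(0), key, value))

def deliver_hints(healthy, stores, hints):
    """Hand hinted replicas back once their intended owner is reachable again."""
    for holder, pending in list(hints.items()):
        still_waiting = []
        for owner, key, value in pending:
            if owner in healthy:
                stores.setdefault(owner, {})[key] = value
                stores[holder].pop(key, None)  # the stand-in may drop its copy
            else:
                still_waiting.append((owner, key, value))
        hints[holder] = still_waiting

stores, hints = {}, {}
pref = ["A", "B", "C", "D"]
put_with_hints(pref, {"B", "C", "D"}, 3, "k", "v", stores, hints)  # A is down
hints_after_put = {holder: list(p) for holder, p in hints.items()}
deliver_hints({"A", "B", "C", "D"}, stores, hints)                 # A has recovered
```

While `A` is down, `D` holds the third replica together with the hint `("A", "k", "v")`. After `A` recovers, the replica is copied to `A` and removed from `D`.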

Handling Permanent Failures

• Hinted replicas may be lost before they can be returned to the original node. Other problems may cause replicas to be lost or fall out of agreement

• Merkle trees allow two nodes to compare a set of replicas and determine fairly easily
– Whether or not they are consistent
– Where the inconsistencies are

Handling Permanent Failures

• Merkle trees have leaves whose values are hashes of the values associated with keys (one key/leaf)
– Parent nodes contain hashes of their children
– Eventually, root contains a hash that represents everything in that replica

• To detect inconsistency between two sets of replicas, compare the roots
– Source of inconsistency can be detected by looking at internal nodes
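
The comparison can be sketched with a small recursive tree. The sketch assumes both replicas build their trees over the same sorted, non-empty key list; SHA-256 and the tuple layout are illustrative choices, not details from the slides.

```python
import hashlib

def h(data):
    """Hex digest used for both leaves and internal nodes (SHA-256 for illustration)."""
    return hashlib.sha256(data).hexdigest()

def merkle(keys, store):
    """Build a tree over a sorted, non-empty key list; returns (hash, keys, left, right)."""
    if len(keys) == 1:
        value = store.get(keys[0], "")
        return (h(f"{keys[0]}={value}".encode()), keys, None, None)
    mid = len(keys) // 2
    left, right = merkle(keys[:mid], store), merkle(keys[mid:], store)
    return (h((left[0] + right[0]).encode()), keys, left, right)

def differing_keys(a, b):
    """Descend only into subtrees whose hashes differ; return the keys that disagree."""
    if a[0] == b[0]:
        return []          # identical subtree: nothing below needs comparing
    if a[2] is None:       # reached a leaf
        return list(a[1])
    return differing_keys(a[2], b[2]) + differing_keys(a[3], b[3])

keys = ["k1", "k2", "k3", "k4"]
replica_a = {"k1": "x", "k2": "y", "k3": "z", "k4": "w"}
replica_b = {"k1": "x", "k2": "y", "k3": "STALE", "k4": "w"}
tree_a, tree_b = merkle(keys, replica_a), merkle(keys, replica_b)
out_of_sync = differing_keys(tree_a, tree_b)
```

`out_of_sync` is `['k3']`. The subtree covering `k1` and `k2` has equal hashes on both sides, so it is never descended into, which is what keeps the amount of data exchanged small.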

Failures

• Like Google, Amazon has a number of data centers, each with many commodity machines.
– Individual machines fail regularly
– Sometimes entire data centers fail due to power outages, network partitions, tornados, etc.

• To handle failure of entire centers, replicas are spread across multiple data centers.

Membership and Failure Detection

• Temporary failures or accidental additions of nodes are possible but shouldn’t cause load re-balancing.

• Additions and deletions of nodes are explicitly executed by an administrator.

• A gossip-based protocol is used to ensure that every node eventually has a consistent view of the membership list.

Gossip-based Protocol

• Periodically, each node contacts another node in the network, randomly selected.

• Nodes compare their membership histories and reconcile them.
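
A toy simulation of these two steps is sketched below. It assumes each node's membership view is a dictionary of member name to version number and that reconciling means keeping the higher version per member; the slides do not specify the history format, so both are assumptions.

```python
import random

def reconcile(view_a, view_b):
    """Merge two membership views, keeping the higher version number for each member."""
    merged = dict(view_a)
    for member, version in view_b.items():
        if version > merged.get(member, -1):
            merged[member] = version
    return merged

def gossip_round(views, rng):
    """Every node contacts one randomly chosen peer and both adopt the merged view."""
    nodes = list(views)
    for node in nodes:
        peer = rng.choice([n for n in nodes if n != node])
        merged = reconcile(views[node], views[peer])
        views[node], views[peer] = dict(merged), dict(merged)

# Each node starts out knowing only about itself.
views = {name: {name: 1} for name in "ABCDE"}
rng = random.Random(0)
rounds = 0
while len({tuple(sorted(v.items())) for v in views.values()}) > 1 and rounds < 200:
    gossip_round(views, rng)
    rounds += 1
```

Because views only ever grow toward the per-member maximum, repeated random exchanges drive all nodes to the same membership list without any central registry.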

Load Balancing for Additions and Deletions

• When a node is added, it acquires key values from other nodes in the network.
– Nodes learn of the addition through the gossip protocol and contact the new node to offer their keys, which are transferred once the new node accepts
– When a node is removed, a similar process happens in reverse

• Experience has shown that this approach leads to a relatively uniform distribution of key/value pairs across the system

Summary

• Experience with Dynamo indicates that it meets the requirements of scalability and availability.

• Service owners are able to customize their storage system to emphasize performance, durability, or consistency. The primary parameters are N, R, and W.

• The developers conclude that decentralization and eventual consistency can provide a satisfactory platform for hosting highly-available applications.

