
AWS Under the covers with Amazon DynamoDB IP Expo 2013

Date post: 01-Dec-2014
Upload: amazon-web-services
Description:
In this session you'll learn about the decisions that went into designing and building Amazon DynamoDB, and how it lets you stay focused on your application while enjoying single-digit millisecond latencies at any scale. We'll dive deep into how to model data, maintain maximum throughput, and drive analytics against your data, while profiling real-world use cases, tips, and tricks from customers running on Amazon DynamoDB today.
Under the covers with Dynamo DB Ian Meyers IP Expo 2013
Transcript
Page 1: AWS Under the covers with Amazon DynamoDB IP Expo 2013

Under the covers with DynamoDB

Ian Meyers, IP Expo 2013

Page 2:

Fully managed, provisioned-throughput NoSQL database

Fast, predictable, configurable performance

Fully distributed, fault tolerant HA architecture

Integration with EMR & Hive

[Diagram: AWS Global Infrastructure, showing Compute, Storage, Database (RDS, DynamoDB, Redshift), Networking, App Services, and Deployment & Administration]

Page 3:

Consistent, predictable performance.

Single-digit millisecond latency.

Backed by solid-state drives.

Page 4:

Flexible data model.

Key/attribute pairs. No schema required.

Easy to create. Easy to adjust.

Page 5:

Seamless scalability.

No table size limits. Unlimited storage.

No downtime.

Page 6:

Durable.

Consistent, disk-only writes.

Replication across data centers and availability zones.

Page 7:

Without the operational burden.

No Cluster to Manage

No HA to Manage

Page 8:

Consistent writes.

Atomic increment and decrement.

Optimistic concurrency control: conditional writes.

Page 9:

Transactions.

Native item-level transactions only.

Puts, updates and deletes are ACID.

Transaction API for Java.

Page 10:

Three decisions + three clicks = ready for use

Page 11:

Three decisions + three clicks = ready for use

Primary keys

Level of throughput

Secondary Indexes

Page 13:

Provisioned throughput.

Reserve IOPS for reads and writes.

Scale up or down at any time.

Page 14:

Pay per capacity unit.

Priced per hour of provisioned throughput.

Page 15:

Write throughput.

Size of item x writes per second

$0.0065 per hour for 10 write units.
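
The "size of item x writes per second" rule can be sketched as simple arithmetic. This is a minimal illustration, not official pricing logic: the 1 KB unit size and the rounding up to whole blocks of 10 units are assumptions, and the function names are hypothetical. Only the $0.0065 per 10 write units figure comes from the slide.

```python
import math

# Assumption (not on the slide): one write unit covers an item of up to 1 KB.
WRITE_UNIT_BYTES = 1024
# From the slide: $0.0065 buys 10 write units, priced per hour.
PRICE_PER_10_WRITE_UNITS = 0.0065


def write_units(item_size_bytes: int, writes_per_second: int) -> int:
    """Units needed: item size rounded up to whole units, times writes per second."""
    units_per_write = math.ceil(item_size_bytes / WRITE_UNIT_BYTES)
    return units_per_write * writes_per_second


def hourly_write_cost(units: int) -> float:
    """Hourly cost in blocks of 10 units (rounding up is an assumption)."""
    return math.ceil(units / 10) * PRICE_PER_10_WRITE_UNITS


# A 1,500 byte item needs two units per write under the 1 KB assumption.
units = write_units(item_size_bytes=1500, writes_per_second=100)
print(units, round(hourly_write_cost(units), 4))
```

Because item size is rounded up per write, trimming an item to fit inside one unit can halve the throughput you need to provision.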

Page 16:

Read throughput.

Strong or eventual consistency

Strong consistency: read data will reflect all previous transactions.

$0.0065 per hour for 50 units.

Page 17:

Read throughput.

Strong or eventual consistency

$0.0065 per hour for 100 units.

Eventual consistency: read data may reflect old values.

Page 18:

Read throughput.

Strong or eventual consistency

Same latency expectations.

Mix and match at ‘read time’.
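
The read pricing on the previous slides implies that an eventually consistent read costs half as much as a strongly consistent one (100 units versus 50 for the same $0.0065 per hour). A rough sketch of sizing a mixed workload follows. It assumes every item fits within a single read unit and that billing rounds up to whole blocks of 50 units; the function names are hypothetical.

```python
import math

# From the slides: $0.0065 per hour buys 50 strongly consistent reads per
# second, or 100 eventually consistent reads per second.
PRICE_PER_BLOCK = 0.0065
STRONG_UNITS_PER_BLOCK = 50


def read_units(strong_rps: int, eventual_rps: int) -> int:
    """Units for a mixed workload; an eventual read counts as half a unit."""
    return strong_rps + math.ceil(eventual_rps / 2)


def hourly_read_cost(units: int) -> float:
    """Hourly cost in blocks of 50 units (rounding up is an assumption)."""
    return math.ceil(units / STRONG_UNITS_PER_BLOCK) * PRICE_PER_BLOCK


# 200 strong reads/s plus 600 eventual reads/s share one provisioned pool.
units = read_units(strong_rps=200, eventual_rps=600)
print(units, round(hourly_read_cost(units), 4))
```

Since consistency is chosen per request, an application can reserve strong reads for the few paths that need them and let everything else run at half the cost.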

Page 19:

Provisioned throughput is managed by DynamoDB.

Page 20:

Data is partitioned and managed by DynamoDB.

Page 21:

Reserved capacity.

Save up to 53% with a 1-year reservation.

Save up to 76% with a 3-year reservation.

Page 22:

Authentication.

Session-based to minimize latency.

Uses the Amazon Security Token Service.

Handled by AWS SDKs.

Integrates with IAM.

Page 23:

Monitoring.

CloudWatch metrics: latency, consumed read and write throughput, errors and throttling.

Page 24:

Indexing.

Items are indexed by primary and secondary keys.

Primary keys can be composite.

Secondary keys index on other attributes.

Page 25:

ID         Date                        Total
id = 100   date = 2012-05-16-09-00-10  total = 25.00
id = 101   date = 2012-05-15-15-00-11  total = 35.00
id = 101   date = 2012-05-16-12-00-10  total = 100.00
id = 102   date = 2012-03-20-18-23-10  total = 20.00
id = 102   date = 2012-03-20-18-23-10  total = 120.00

Page 26:

ID         Date                        Total
id = 100   date = 2012-05-16-09-00-10  total = 25.00
id = 101   date = 2012-05-15-15-00-11  total = 35.00
id = 101   date = 2012-05-16-12-00-10  total = 100.00
id = 102   date = 2012-03-20-18-23-10  total = 20.00
id = 102   date = 2012-03-20-18-23-10  total = 120.00

Hash key: ID.

Page 27:

ID         Date                        Total
id = 100   date = 2012-05-16-09-00-10  total = 25.00
id = 101   date = 2012-05-15-15-00-11  total = 35.00
id = 101   date = 2012-05-16-12-00-10  total = 100.00
id = 102   date = 2012-03-20-18-23-10  total = 20.00
id = 102   date = 2012-03-20-18-23-10  total = 120.00

Hash key: ID. Range key: Date. Together they form the composite primary key.

Page 28:

ID         Date                        Total
id = 100   date = 2012-05-16-09-00-10  total = 25.00
id = 101   date = 2012-05-15-15-00-11  total = 35.00
id = 101   date = 2012-05-16-12-00-10  total = 100.00
id = 102   date = 2012-03-20-18-23-10  total = 20.00
id = 102   date = 2012-03-20-18-23-10  total = 120.00

Hash key: ID. Range key: Date. Secondary range key: Total.

Page 29:

Conditional updates.

PutItem, UpdateItem and DeleteItem can take optional conditions for the operation.

UpdateItem performs atomic increments.
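
To make the idea of a conditional update concrete, here is a toy in-memory model of optimistic concurrency. It is not the DynamoDB API: the `Table` class, its methods, and the `version` attribute are all hypothetical, and the real service evaluates the condition and applies the increment on the server side.

```python
class ConditionFailed(Exception):
    """Raised when an expected attribute no longer matches the stored value."""


class Table:
    """Toy in-memory table keyed by id (a stand-in, not a real client)."""

    def __init__(self):
        self._items = {}

    def put(self, key, item):
        self._items[key] = dict(item)

    def get(self, key):
        return dict(self._items[key])

    def update_if(self, key, expected, changes):
        """Apply changes only if every expected attribute still matches."""
        item = self._items[key]
        for name, value in expected.items():
            if item.get(name) != value:
                raise ConditionFailed(name)
        item.update(changes)

    def increment(self, key, attribute, amount=1):
        """Models an atomic add: the caller never reads the old value first."""
        item = self._items[key]
        item[attribute] = item.get(attribute, 0) + amount
        return item[attribute]


table = Table()
table.put("item-1", {"votes": 10, "version": 1})

# Two writers both read version 1; only the first conditional write succeeds.
table.update_if("item-1", {"version": 1}, {"votes": 11, "version": 2})
try:
    table.update_if("item-1", {"version": 1}, {"votes": 99, "version": 2})
    second_writer_won = True
except ConditionFailed:
    second_writer_won = False

print(second_writer_won, table.get("item-1"))
```

The losing writer would normally re-read the item and retry, which is why this pattern needs no locks.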

Page 30:

One API call, multiple items

BatchGet returns multiple items by key.

Throughput is measured by IO, not API calls.

BatchWrite performs up to 25 put or delete operations.
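
Given the limit of 25 operations per batch write, a client with more pending operations has to split them up. A minimal sketch of that chunking step is below; the `batches` helper is hypothetical and only builds the groups, it does not call any service.

```python
from typing import Iterator, List, Sequence

# From the slide: a batch write carries up to 25 put or delete operations.
MAX_BATCH_WRITE_ITEMS = 25


def batches(operations: Sequence[dict],
            size: int = MAX_BATCH_WRITE_ITEMS) -> Iterator[List[dict]]:
    """Yield consecutive groups of at most `size` operations."""
    for start in range(0, len(operations), size):
        yield list(operations[start:start + size])


# 60 pending puts become three calls: 25, 25, and 10 operations.
puts = [{"id": n} for n in range(60)]
batch_sizes = [len(batch) for batch in batches(puts)]
print(batch_sizes)
```

Batching reduces the number of API round trips only; as the slide notes, consumed throughput still depends on the IO performed.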

Page 31:

Query vs Scan

Query for composite key lookups.

Scan for full table scans and exports.

Both support pages and limits.

Maximum response is 1 MB in size.
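
The difference between the two operations can be sketched with a toy in-memory table. This models the behaviour only and is not the DynamoDB API: `query` and `scan` are hypothetical functions, the rows are taken from the example order table earlier in the deck, and a real query would read only the items under one hash key rather than filtering a dictionary.

```python
from typing import Dict, List, Optional, Tuple

Key = Tuple[int, str]

# Rows from the example order table: (id, date) -> total.
ORDERS: Dict[Key, float] = {
    (100, "2012-05-16-09-00-10"): 25.00,
    (101, "2012-05-15-15-00-11"): 35.00,
    (101, "2012-05-16-12-00-10"): 100.00,
}


def query(hash_key: int, range_prefix: str = "") -> List[Key]:
    """Return keys under one hash key, ordered by range key."""
    return sorted(k for k in ORDERS
                  if k[0] == hash_key and k[1].startswith(range_prefix))


def scan(limit: int, start_after: Optional[Key] = None) -> Tuple[List[Key], Optional[Key]]:
    """Walk the whole table one page at a time; returns (page, last_key)."""
    keys = sorted(ORDERS)
    if start_after is not None:
        keys = [k for k in keys if k > start_after]
    page = keys[:limit]
    last_key = page[-1] if len(keys) > limit else None
    return page, last_key


# Page through a scan until no continuation key is returned.
pages = []
cursor = None
while True:
    page, cursor = scan(limit=2, start_after=cursor)
    pages.append(page)
    if cursor is None:
        break

print(query(101, "2012-05-16"), [len(p) for p in pages])
```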

Page 32:

Unlimited storage.

Unlimited attributes per item.

Unlimited items per table.

Maximum of 64k per item.

Page 33:

message_id = 1   part = 1   message = <first 64k>
message_id = 1   part = 2   message = <second 64k>
message_id = 1   part = 3   message = <third 64k>

Split across items.
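
A rough sketch of splitting one large value across several items keyed by (message_id, part), and joining it again on read. The helper names are hypothetical, and the part size ignores the bytes taken by attribute names and the key, so a real implementation would need a slightly smaller chunk.

```python
from typing import Dict, List

# Per-item limit from the slide; attribute name overhead is ignored here.
MAX_PART_BYTES = 64 * 1024


def split_message(message_id: int, body: bytes,
                  part_size: int = MAX_PART_BYTES) -> List[Dict]:
    """Return one item per chunk, keyed by (message_id, part)."""
    return [
        {"message_id": message_id, "part": index + 1,
         "message": body[start:start + part_size]}
        for index, start in enumerate(range(0, len(body), part_size))
    ]


def join_message(items: List[Dict]) -> bytes:
    """Reassemble the body by ordering the items on their part number."""
    ordered = sorted(items, key=lambda item: item["part"])
    return b"".join(item["message"] for item in ordered)


body = b"x" * (150 * 1024)
parts = split_message(1, body)
print(len(parts), join_message(parts) == body)
```

With message_id as the hash key and part as the range key, all parts of one message can be fetched with a single query in part order.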

Page 34:

message_id = 1   message = http://s3.amazonaws.com...
message_id = 2   message = http://s3.amazonaws.com...
message_id = 3   message = http://s3.amazonaws.com...

Store a pointer to S3.

Page 35:

Time Series Data

Separate Data by Throughput Required

Hot Data in Tables with High Provisioned IO

Older Data in Tables with Low IO
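
One way to separate time series data by throughput is a table per month, with the current month provisioned high and older months low. The sketch below is illustrative only: the `events_YYYY_MM` naming scheme and both functions are hypothetical, and the throughput figures simply reuse the April and March numbers from the hot and cold table example in this deck.

```python
from datetime import datetime


def table_for(timestamp: datetime, prefix: str = "events") -> str:
    """Name of the monthly table an event belongs to (hypothetical scheme)."""
    return f"{prefix}_{timestamp:%Y_%m}"


def throughput_for(table_month: datetime, now: datetime) -> dict:
    """High throughput for the current month's table, low for older ones."""
    is_current = (table_month.year, table_month.month) == (now.year, now.month)
    if is_current:
        return {"read": 1000, "write": 1000}
    return {"read": 200, "write": 1}


now = datetime(2013, 4, 16, 9, 59, 1)
print(table_for(now), throughput_for(now, now))
print(table_for(datetime(2013, 3, 1)), throughput_for(datetime(2013, 3, 1), now))
```

Rolling to a new table each month means old tables can be dialled down, archived, or dropped without touching the hot data.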

Page 36:

[Diagram: April, March, February, January, December]

Page 37:

Hot and cold tables.

April - 1000 Read IOPS, 1000 Write IOPS

event_id = 1000   timestamp = 2013-04-16-09-59-01   key = value
event_id = 1001   timestamp = 2013-04-16-09-59-02   key = value
event_id = 1002   timestamp = 2013-04-16-09-59-02   key = value

March - 200 Read IOPS, 1 Write IOPS

event_id = 1000   timestamp = 2013-03-01-09-59-01   key = value
event_id = 1001   timestamp = 2013-03-01-09-59-02   key = value
event_id = 1002   timestamp = 2013-03-01-09-59-02   key = value

Page 38:

Archive data.

Move old data to S3: lower cost.

Still available for analytics.

Run queries across hot and cold data with Elastic MapReduce.

Page 39:

Partitioning

Page 40:

Uniform workload.

Data stored across multiple partitions.

Data is primarily distributed by primary key.

Provisioned throughput is divided evenly across partitions.

Page 41:

To achieve and maintain full provisioned throughput, spread workload evenly across hash keys.

Page 42:

Non-Uniform workload.

Might be throttled, even at high levels of throughput.
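
Because provisioned throughput is divided evenly across partitions, a single hot hash key can be throttled while the table as a whole is under its limit. The sketch below illustrates the arithmetic under stated assumptions: the SHA-256 placement function and the partition count are stand-ins (the service's real partitioning is internal), and both function names are hypothetical.

```python
import hashlib
from collections import Counter
from typing import Dict


def partition_of(hash_key: str, partitions: int) -> int:
    """Stand-in placement function; the service's real hashing is internal."""
    digest = hashlib.sha256(hash_key.encode("utf-8")).hexdigest()
    return int(digest, 16) % partitions


def throttled_requests(load: Dict[str, int], provisioned: int, partitions: int) -> int:
    """Requests per second above each partition's even share of throughput."""
    share = provisioned / partitions
    per_partition: Counter = Counter()
    for key, requests in load.items():
        per_partition[partition_of(key, partitions)] += requests
    return int(sum(max(0, used - share) for used in per_partition.values()))


# 400 units over 4 partitions gives each partition a share of 100.
# One key receiving all 400 requests/s overshoots its partition by 300.
print(throttled_requests({"status-404": 400}, provisioned=400, partitions=4))
```

Raising the table's provisioned throughput helps only in proportion to the hot partition's share, which is why key design matters more than headroom.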

Page 43:

Distinct values for hash keys.

BEST PRACTICE 1:

Hash key elements should have a high number of distinct values.

Page 44:

user_id = mza        first_name = Matt     last_name = Wood
user_id = jeffbarr   first_name = Jeff     last_name = Barr
user_id = werner     first_name = Werner   last_name = Vogels
user_id = simone     first_name = Simone   last_name = Brunozzi
...                  ...                   ...

Lots of users with unique user_id.

Workload well distributed across hash key.

Page 45:

Avoid limited hash key values.

BEST PRACTICE 2:

Hash key elements should have a high number of distinct values.

Page 46:

status = 200   date = 2012-04-01-00-00-01
status = 404   date = 2012-04-01-00-00-01
status = 404   date = 2012-04-01-00-00-01
status = 404   date = 2012-04-01-00-00-01

Small number of status codes.

Uneven, non-uniform workload.

Page 47:

Model for even distribution.

BEST PRACTICE 3:

Access by hash key value should be evenly distributed across the dataset.

Page 48:

mobile_id = 100   access_date = 2012-04-01-00-00-01
mobile_id = 100   access_date = 2012-04-01-00-00-02
mobile_id = 100   access_date = 2012-04-01-00-00-03
mobile_id = 100   access_date = 2012-04-01-00-00-04
...               ...

Large number of devices.

A small number are much more popular than the others.

Workload unevenly distributed.

Page 49:

mobile_id = 100.1   access_date = 2012-04-01-00-00-01
mobile_id = 100.2   access_date = 2012-04-01-00-00-02
mobile_id = 100.3   access_date = 2012-04-01-00-00-03
mobile_id = 100.4   access_date = 2012-04-01-00-00-04
...                 ...

Sample access pattern.

Workload randomized by hash key.
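
The suffixed keys above (100.1, 100.2, and so on) can be produced by appending a random shard number to a hot hash key on write. This is a minimal sketch of that idea; the helper names and the choice of four shards are assumptions for illustration.

```python
import random
from typing import List


def shard_keys(base_key: str, shards: int) -> List[str]:
    """All hash key values a reader must query to see every write."""
    return [f"{base_key}.{n}" for n in range(1, shards + 1)]


def key_for_write(base_key: str, shards: int, rng: random.Random) -> str:
    """Pick one shard at random so writes spread across hash key values."""
    return rng.choice(shard_keys(base_key, shards))


rng = random.Random()
print(key_for_write("100", 4, rng))
print(shard_keys("100", 4))
```

The trade-off is on the read side: fetching everything for device 100 now takes one query per shard key, with the results merged by the application.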

Page 50:

Distribute scans across dataset

BEST PRACTICE 4:

Improve retrieval times by scanning partitions concurrently using the Parallel Scan feature.

Page 51:

Parallel Scan: separate thread for each table segment

[Diagram: the application's main thread starts Worker Thread 0, Worker Thread 1, and Worker Thread 2]
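
The one-thread-per-segment pattern can be sketched with a thread pool over a toy in-memory table. This models the structure only: `scan_segment` and `parallel_scan` are hypothetical, and slicing a list stands in for the segment and total segment values a real scan request would carry.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import List


def scan_segment(table: List[dict], segment: int, total_segments: int) -> List[dict]:
    """Toy segment scan: each worker sees a disjoint slice of the table."""
    return table[segment::total_segments]


def parallel_scan(table: List[dict], total_segments: int) -> List[dict]:
    """Main thread fans out one worker per segment, then merges the results."""
    with ThreadPoolExecutor(max_workers=total_segments) as pool:
        futures = [pool.submit(scan_segment, table, segment, total_segments)
                   for segment in range(total_segments)]
        results: List[dict] = []
        for future in futures:
            results.extend(future.result())
    return results


table = [{"id": n} for n in range(10)]
items = parallel_scan(table, total_segments=3)
print(sorted(item["id"] for item in items))
```

Results arrive grouped by segment rather than in key order, so any ordering has to be applied after the merge.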

Page 52:

Reporting & Analytics

Page 53:

Seamless scale.

Scalable methods for data processing.

Scalable methods for backup/restore.

Page 54:

Amazon Elastic MapReduce.

Managed Hadoop service for data-intensive workflows.

aws.amazon.com/emr

Page 55:

create external table items_db
  (id string, votes bigint, views bigint)
stored by 'org.apache.hadoop.hive.dynamodb.DynamoDBStorageHandler'
tblproperties (
  "dynamodb.table.name" = "items",
  "dynamodb.column.mapping" = "id:id,votes:votes,views:views"
);

Page 56:

select id, votes, views
from items_db
order by views desc;

