Understanding AWS Database Options (DAT201) | AWS re:Invent 2013

© 2013 Amazon.com, Inc. and its affiliates. All rights reserved. May not be copied, modified, or distributed in whole or in part without the express consent of Amazon.com, Inc.

DAT201- Understanding AWS Database Options

Sundar Raghavan – Amazon RDS

Zac Sprackett – Vice President of Operations with SugarCRM

Michael Thomas – Principal Software Engineer with Scopely

November 13, 2013

Today’s discussion

AWS Database Options and Decision Factors

Best Practice Tips and Techniques

SugarCRM

Scopely

Q & A

Starting with the Customer

• How many of you use databases on AWS?

• How many of you use Amazon RDS, Amazon DynamoDB, Amazon Redshift, or Amazon ElastiCache?

• How many of you have a well defined DR strategy for your databases?

• How many of you are building geo-spatial and context sensitive applications?

• We suggest that you attend Werner’s keynote!

9 AWS Regions including 25 Availability Zones and growing

• US East (Northern Virginia)

• US West x 2 (N. California and Oregon)

• US GovCloud (US ITAR Region -- Oregon)

• LATAM (São Paulo)

• Europe West (Dublin)

• Asia Pacific Region (Singapore)

• Asia Pacific Region (Tokyo)

• Australia Region (Australia)

46 world-wide points of presence

>10 data centers in US East alone

Introducing: Cross Region Support

• RDS Snapshot Copy

• All engines

“We are very happy with RDS cross region snapshot copy feature as it gives us the ability to copy our data from one AWS region to another AWS region with minimal effort. Prior to this feature, it used to take 3 days and a number of manual steps to copy our snapshots. Now we have an automated process that helps us to achieve disaster recovery capabilities in just few steps.”

Joel Callaway, IT Operations Manager, Zoopla Property Group Ltd, UK

Your Mission is Clear

1. Zero to App in ____ Minutes

2. Zero to Millions of users in ____ Days

3. Zero to “Hero” in ____ Months

Focus on your App

Application tier

Load balancer

Database tier

Your Stack

Application tier

Load balancer

Database tier

Security, Innovation, Scale, Transactions, Performance, Durability, Availability, Skills..

Security, Innovation, Scale, Performance, Availability…

Security, Scale, Availability…

Your Stack of Worries

Not available on AWS

Spectrum of Database Options

SQL NoSQL

Low Cost High Cost

Do-it Yourself Fully Managed

Spectrum of Options

SQL NoSQL


MySQL, Oracle, SQL Server

Amazon Redshift

Spectrum of Options

SQL NoSQL


MySQL, Oracle, SQL Server, MariaDB

Vertica, ParAccel, …

Spectrum of Options

SQL NoSQL


MongoDB, Cassandra, Redis, Memcached

DynamoDB, ElastiCache (Memcached), ElastiCache (Redis), SimpleDB

Thinking About the Questions

Should I use SQL or NoSQL?

Should I use MySQL or PostgreSQL?

Should I use Redis, Memcached, or ElastiCache?

Should I use MongoDB, Cassandra, or DynamoDB?

Actually, Thinking About the Right Questions

What are my scale and latency needs?

What are my transactional and consistency needs?

What are my read/write, storage and IOPS needs?

What are my time to market and server control needs?

Factors to Consider

Factor | SQL | NoSQL
Application | App with complex business logic? | Web app with lots of users?
Transactions | Complex txns, joins, updates? | Simple data model, updates, queries?
Scale | Developer managed | Automatic, on-demand scaling
Performance | Developer architected | Consistent, high performance at scale
Availability | Architected for fail-over | Seamless and transparent
Core Skills | SQL + Java/Ruby/Python/PHP | NoSQL + Java/Ruby/Python/PHP

Best of both worlds: Possible to Use SQL and NoSQL models in one App

Factors to Consider

Self-Managed Service

• Full control over the instance, db and OS parameters

• Upgrades, back-ups, fail-over are yours to manage

• All aspects of security are managed by you

• Complex replication topologies and data management

Managed Service

• Off-load the infrastructure and software management

• Automate database life-cycle with APIs

• Focus on database access and app security

• Limited control over replication topologies

Pace of Innovation – a Bonus

RDS team launched 23+ features

• SQL Server TDE, Version upgrade

• Oracle TDE, Statspack, Fine grain access, 3TB/30K IOPS

• Cross Region Snapshot Copy, Parallel replica, Chained replica

• Multi-AZ SLA, Log access, VPC groups, …

NoSQL team launched 10+ features

• Redis engine support

• Amazon DynamoDB Fine grain access control

• Amazon DynamoDB local, Geospatial indexing library

• Transaction library, Local secondary index, parallel scan

Redshift team launched 20+ features

• Encryption with HSM support

• Audit logging, SNS notification, snapshot sharing

• COPY from Amazon EMR/HDFS/SSH

• Faster resize, improved concurrency, distributed tables, …

Amazon RDS is a managed SQL database service.

Simple to deploy and scale

Without any operational burden

Reliable and cost effective

Choice of Database engines

Schema design

Frequent server upgrades

Storage upgrades

Backup and recovery

Software upgrades

Patching

Hardware crash

Query construction

Query optimization

Configuration

Migration

Off load the “administration”

Focus on the “innovation”

Optimizing for Developer Productivity

Multiple databases per instance

Use MySQL tools & drivers

Quickly set up Read Replicas

High availability Multi-AZ option (99.95% SLA)

Ability to promote Read replicas, Rename as Master

Diagnostics

Native MySQL replication

SSL for encryption over the wire

Monitor metrics

No shell, super user, or direct file system access (Think security!)

Optimizing for Developer Productivity

MySQL Manual for Read Replica OR Amazon RDS console

ElastiCache is a managed caching service.

Easy to set up and operate cache clusters

Scale cache clusters with push button ease

Without any operational burden

Ultra fast response time for read scaling

Supports Memcached and Redis engines

ElastiCache is a Performance Booster

[Diagram: Clients, Elastic Load Balancing, EC2 App Instances, ElastiCache (Redis) Master and Read Replica handling app reads and cache updates, RDS MySQL DB Instance with PIOPS]

• ElastiCache: serve most read queries, in-memory performance

• RDS MySQL: read/write queries, SSD performance
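The read path on this slide, where the cache serves most reads and the database handles the rest, can be sketched as a cache-aside lookup. This is a minimal sketch under stated assumptions, not ElastiCache or RDS client code: plain dictionaries stand in for the cache cluster and the database, and the names `CacheAside`, `get_user`, and `update_user` are hypothetical.

```python
class CacheAside:
    """Cache-aside reads with cache updates on write (dicts stand in for real stores)."""

    def __init__(self, database):
        self.database = database   # stand-in for the RDS MySQL instance
        self.cache = {}            # stand-in for an ElastiCache cluster
        self.db_reads = 0          # counts how often the database is hit

    def get_user(self, user_id):
        # Serve the read from the cache when possible.
        if user_id in self.cache:
            return self.cache[user_id]
        # Cache miss: fall back to the database and populate the cache.
        self.db_reads += 1
        value = self.database.get(user_id)
        if value is not None:
            self.cache[user_id] = value
        return value

    def update_user(self, user_id, value):
        # Writes go to the database first, then refresh the cached copy.
        self.database[user_id] = value
        self.cache[user_id] = value


store = CacheAside({1: "alice"})
first = store.get_user(1)    # miss: read from the database
second = store.get_user(1)   # hit: served from the cache
store.update_user(1, "alicia")
third = store.get_user(1)    # hit: sees the updated value
```

Only the first read touches the database; a real deployment would also need expiry and a policy for cache node failures, which this sketch leaves out.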

Amazon DynamoDB is a managed NoSQL database service.

Store and retrieve any amount of data

Scale throughput to millions of IO

Without any operational burden

Single digit millisecond latencies

Manage tables: CreateTable, UpdateTable, DeleteTable, DescribeTable, ListTables

“Select”, “insert”, “update” items: PutItem, GetItem, UpdateItem, DeleteItem

Bulk select or update (max 1MB): BatchGetItem, BatchWriteItem

Query specific items OR scan the full table: Query, Scan

Optimizing for Developer Productivity

Amazon Redshift is a managed data warehouse service.

Fast response time (~10x faster than typical relational stores)

Without any operational burden

Under $1,000 per TB per year

Petabyte scale columnar database

So, what are the tips and techniques for successful deployments?

Thousands of Successful Deployments: Two Highlights

SugarCRM (CRM Software): Zac Sprackett

Scopely (Gaming Platform): Mike Thomas


Crafting Loyal Customers with SugarCRM

Every Customer. Every User. Every Time.

S. Zachariah Sprackett, VP of Operations, SugarCRM

November 13, 2013

SugarCRM

• Redefining Customer Relationship Management

• Unique product bundling
  – On Premise and Hosted offerings

• Manifest destiny
  – Source code access and SQL database per customer

• Scale
  – From one seat customers to multi thousand seat customers

• Globally distributed customer base

Deployment Models

Traditional SaaS vs. SugarCRM

Application Stack

Shadow

Apache

PHP

MySQL

Elastic Search

HTML5 & JavaScript

Linux

Email Archiving

Background Jobs

Cloud Stacks

[Diagram components: Amazon S3, Amazon Glacier, ElastiCache, EC2 Elastic Search, EC2 Job Servers, EC2 Web Servers, Amazon SES, RDS DB Instance, RDS DB Instance Read Replica]

Cloud Providers

[Diagram components: Route 53, Managed Elastic IP, two EC2 HA Proxy instances, Cloud Stack, Management Console]

Globally Distributed Cloud Providers

Delivering On Time and On Budget

• Amazon lets you easily spin up testing environments
  – Testing only works if you make use of it. Don’t make assumptions
  – Monitor everything

• Change in cost model can surprise finance
  – Planned capital expenditures versus after the fact operational expenditures
  – Use reserved instances
  – Third party tools such as Cloudability can help alert you to issues early

• Manage access keys effectively to control cost
  – Learn to love AWS Identity and Access Management (IAM)

Things to Watch Out For

• Understand your IO requirements
  – Make effective use of instance-backed, Amazon EBS, and Provisioned IOPS file systems

• Use the heck out of read replicas

• Snapshots are incredibly useful
  – But not available from a read replica

• Don’t use the default parameter group for Amazon RDS
  – Unless you really like restarting databases

• Cold Standby is not instant on
  – Don’t get stuck waiting for deployments in a forced failover scenario

• ElastiCache is not clustered across availability zones

• Watch out for the SLA
  – 99.95% for a region even across two AZs
  – This doesn’t include user error

• You still need DBAs and Ops but they get to do cooler stuff
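To make the SLA point concrete, the 99.95% figure can be turned into a downtime budget. This is a worked arithmetic sketch, assuming a 30-day month and a 365-day year; the function name `allowed_downtime_minutes` is hypothetical, and the actual measurement period and exclusions are defined by the SLA terms, not by this calculation.

```python
def allowed_downtime_minutes(sla_percent, days):
    """Minutes of downtime permitted over `days` at the given availability percentage."""
    total_minutes = days * 24 * 60
    return total_minutes * (100 - sla_percent) / 100


monthly = allowed_downtime_minutes(99.95, 30)    # 30-day month
yearly = allowed_downtime_minutes(99.95, 365)    # non-leap year
print(f"{monthly:.1f} minutes per month, {yearly / 60:.2f} hours per year")
```

Under these assumptions the budget is about 21.6 minutes per month, or roughly 4.38 hours per year, and outages caused by user error come on top of that.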

We’re Hiring

Email: [email protected]

Free Trials: http://www.sugacrm.com/try-sugar



Scopely

Michael Thomas – Principal Software Engineer with Scopely

November 13, 2013

Our technical infrastructure allows developers to build games efficiently for both iOS and Android.

ABOUT SCOPELY

Millions of Users Billions of Turns

All titles have reached the Top 5 in the App Store, and the last three have been #1.

Challenges

• Build a single platform to support many different kinds of games – asynchronous turn based, single player, synchronous, etc.

• Scale up and down as games are tested, launched, grow, and are retired.

• We are not an infrastructure company – we must focus on building features that support game development.

Platform Features

• Accounts / authentication

• Gameplay / state persistence

• Chat / messaging

• In game economy

• Facebook integration

• Gifting

• Single Player state tracking

• Promotion / cross-promotion system

• Statistics

• Tournaments

• Achievements

• Email targeting

• Suggested friends

• In game news system

• External partner integration

• Invitation attribution

• Push notifications

• Content management

• Generic storage API

• Application / device configuration

• AB Testing

Different Features/Different Requirements

• Dynamic scaling (game launches, promotions, tests)

• High write/read ratio (playing turns)

• Transactional consistency (real money purchases)

• Indexed data (user accounts)

• Complex, real-time data (leaderboards)

Operational Data Storage

[Diagram: Scopely Gaming Platform backed by DynamoDB, RDS, two ElastiCache clusters, and S3]

• Memcached for performance, scalability, and cost savings.

• Amazon DynamoDB for unbounded data with heavy write load.

• Redis for fast, complex caching and message passing.

• MySQL for bounded, transactional, queryable data.

• Amazon S3 for asset and image storage.

Analytics Data Pipeline

[Diagram: Scopely Gaming Platform feeding the components below]

• SQS: In-Flight Events

• EC2: Message Loader

• S3: Staged Messages

• RDS: Process / Job Tracking

• S3: Processed Data

• EC2: Redshift Loader

• Redshift Data Warehouse

• EMR: Transformer

Schema Mapping DSL

    from centipede.schema.table import Table
    from centipede.attributes import *


    class GemsTurn(Table):
        # Each column pairs a type with a function that extracts its value from a message.
        # GemsTurnHelper is defined elsewhere (not shown on the slide).
        user_id = Integer, lambda message: message['Data']['GameData']['CurrentPlayerId']
        current_turn = Integer, lambda message: message['Data']['GameData']['CurrentTurn']
        end_date = Timestamp, lambda message: message['Data']['GameData']['EndDate']
        expiration = Timestamp, lambda message: message['Data']['GameData']['Expiration']
        game_id = Guid, lambda message: message['Data']['GameData']['GameId']
        resigning_user_id = Integer, lambda message: message['Data']['GameData']['ResigningPlayerId']
        start_context = Integer, lambda message: message['Data']['GameData']['StartContext']
        start_date = Timestamp, lambda message: message['Data']['GameData']['StartDate']
        status = Integer, lambda message: message['Data']['GameData']['Status']
        tournament_id = Guid, lambda message: message['Data']['GameData']['TournamentId']
        tournament_price_category = Integer, lambda message: message['Data']['GameData']['TournamentPriceCategory']
        tournament_price_paid = Integer, lambda message: message['Data']['GameData']['TournamentPricePaid']
        tutorial_type = Integer, lambda message: message['Data']['GameData']['TutorialType']
        winning_user_id = Integer, lambda message: message['Data']['GameData']['WinningPlayerId']
        awards = List, lambda message: message['Data']['GameData']['Players'][GemsTurnHelper.current_user_index(message)]['Awards']
        coins_gathered = List, lambda message: message['Data']['GameData']['Players'][GemsTurnHelper.current_user_index(message)]['CoinsGathered']
        custom_statistics = VarChar, lambda message: message['Data']['GameData']['Players'][GemsTurnHelper.current_user_index(message)]['CustomStatistics']
        has_hidden_game = Boolean, lambda message: message['Data']['GameData']['Players'][GemsTurnHelper.current_user_index(message)]['HasHiddenGame']
        last_nudge_date = Timestamp, lambda message: message['Data']['GameData']['Players'][GemsTurnHelper.current_user_index(message)]['LastNudgeDate']
        score = Integer, lambda message: message['Data']['GameData']['Players'][GemsTurnHelper.current_user_index(message)]['Score']
        score_for_award = Integer, lambda message: message['Data']['GameData']['Players'][GemsTurnHelper.current_user_index(message)]['ScoreForAward']
        opponent_user_id = Integer, lambda message: message['Data']['GameData']['Players'][GemsTurnHelper.opponent_user_index(message)]['UserId']
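The slide shows only the declaration side of the mapping; how the `centipede` library applies it is not shown. The following is a minimal sketch of one way a declarative mapping like this could be turned into a flat row, under stated assumptions: `COLUMNS` and `map_message` are hypothetical names, plain Python converters replace the `centipede` attribute types, and turning missing fields into `None` is an illustrative choice rather than the library's documented behavior.

```python
# Hypothetical stand-in for the mapping idea: column name -> (converter, extractor).
COLUMNS = {
    "user_id": (int, lambda m: m["Data"]["GameData"]["CurrentPlayerId"]),
    "current_turn": (int, lambda m: m["Data"]["GameData"]["CurrentTurn"]),
    "game_id": (str, lambda m: m["Data"]["GameData"]["GameId"]),
}


def map_message(message, columns=COLUMNS):
    """Flatten one nested event message into a row dict, one key per column."""
    row = {}
    for name, (convert, extract) in columns.items():
        try:
            row[name] = convert(extract(message))
        except (KeyError, IndexError, TypeError, ValueError):
            # Assumption: missing or malformed fields become NULLs
            # rather than failing the whole batch.
            row[name] = None
    return row


message = {"Data": {"GameData": {"CurrentPlayerId": "42", "GameId": "abc-123"}}}
row = map_message(message)
```

Keeping the extractors as small lambdas next to the column type means a new game event only needs a new table class, not new loader code.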

Use Case: Leaderboards

• “What is my rank in today’s tournament?”

• Hard to cache since a single player getting a new high score changes everyone’s rank

• Highly optimized schema required 4 m2.2xlarge RDS nodes

• Latency for “what is my rank” could be above 100ms

• Redis sorted sets provide exactly what we need. Two m2.xlarge instances are more than enough. Rank query is now in single digit milliseconds.

Redis
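The rank query from the leaderboard use case can be illustrated without a Redis server. This is a minimal in-memory sketch of sorted-set style ranking, not Redis client code: the `Leaderboard` class and its methods are hypothetical, ties share a rank here (Redis orders tied members by name instead), and in Redis itself the equivalent operations would be the `ZADD` and `ZREVRANK` commands.

```python
import bisect


class Leaderboard:
    """In-memory stand-in for a sorted set of player scores."""

    def __init__(self):
        self._scores = {}    # player -> current best score
        self._sorted = []    # all best scores, ascending

    def submit(self, player, score):
        # Keep only each player's best score, like a high-score table.
        old = self._scores.get(player)
        if old is not None:
            if score <= old:
                return
            self._sorted.pop(bisect.bisect_left(self._sorted, old))
        self._scores[player] = score
        bisect.insort(self._sorted, score)

    def rank(self, player):
        # Rank 1 is the highest score; players with equal scores share a rank.
        score = self._scores[player]
        higher = len(self._sorted) - bisect.bisect_right(self._sorted, score)
        return higher + 1


board = Leaderboard()
board.submit("ann", 120)
board.submit("bob", 300)
board.submit("cy", 200)
rank_before = board.rank("ann")   # two players are ahead of ann
board.submit("ann", 350)          # a new high score shifts everyone else down
rank_after = board.rank("ann")
bob_rank = board.rank("bob")
```

The rank lookup is a binary search over scores that are kept in order, which is why a single new high score does not force every other player's rank to be recomputed and stored.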

Use Case: Game/Turn State

• Extremely high throughput. Extremely large dataset.

• Semi-structured data – each game models “state” differently.

• Always queried by UserID or GameID.

• Maxed out an Amazon RDS instance – instead of spending time sharding / optimizing Amazon RDS, we moved to Amazon DynamoDB.

• Saves operational time and development time by not having to worry about growing games/adding new games/traffic spikes.

DynamoDB

Use Case: User Accounts

• Need to maintain uniqueness across multiple columns (email, username, etc.)

• Queryable on multiple facets (email, username, external identifier)

• Entire table needs to be scanned regularly (promotions)

• Bounded data size

MySQL (RDS)
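The user account requirements above, uniqueness across several columns plus lookups on multiple facets, map directly onto relational constraints. This is a minimal sketch that uses Python's built-in `sqlite3` module as a stand-in for MySQL on RDS; the table layout and column names are assumptions for illustration, not Scopely's schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE users (
        id INTEGER PRIMARY KEY,
        email TEXT NOT NULL UNIQUE,
        username TEXT NOT NULL UNIQUE,
        external_id TEXT UNIQUE
    )
    """
)
conn.execute(
    "INSERT INTO users (email, username, external_id) VALUES (?, ?, ?)",
    ("ann@example.com", "ann", "fb-1"),
)

# A second account reusing the email is rejected by the database itself.
try:
    conn.execute(
        "INSERT INTO users (email, username, external_id) VALUES (?, ?, ?)",
        ("ann@example.com", "ann2", "fb-2"),
    )
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

# The same row can be found through any of its unique facets.
by_username = conn.execute(
    "SELECT id FROM users WHERE username = ?", ("ann",)
).fetchone()
by_external = conn.execute(
    "SELECT id FROM users WHERE external_id = ?", ("fb-1",)
).fetchone()
total_users = conn.execute("SELECT COUNT(*) FROM users").fetchone()[0]
```

Each unique constraint is also an index, so the per-facet lookups stay fast, and the bounded data size keeps the regular full-table scans affordable.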

Use Case: Global Caching

• Cache everything possible in Memcached including both entities in Amazon DynamoDB and RDS.

• Single interface providing session caching, memcached caching, and Amazon DynamoDB access encourages consistent use of caching.

Memcached (ElastiCache)

    using System;

    public class CoherentStorage
    {
        public Cache L1Cache { get; set; }
        public Cache L2Cache { get; set; }
        public DynamoClient Dynamo { get; set; }

        private readonly Games _game;

        public CoherentStorage(Games game)
        {
            _game = game;
            // Cache.Request (L1) and a Memcached-backed cache (L2) sit in front of DynamoDB.
            L1Cache = Cache.Request;
            L2Cache = Cache.GetMemcached(String.Format("{0}GameState", game));
            Dynamo = DynamoClient.Instance;
        }

        // Method bodies are omitted on the slide.
        public void Save(object instance) { }

        public void Delete(object instance) { }

        public T Get<T>(object id, bool skipCache = false, bool consistentRead = true) { }
    }
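Since the slide leaves the method bodies empty, the behavior behind `Save`, `Delete`, and `Get` is not shown. The following is a sketch of how a two-level read-through store of this shape might behave, under loud assumptions: the class `LayeredStorage` and its logic are hypothetical, dictionaries stand in for the request cache, Memcached, and the DynamoDB table, and the `consistentRead` option is left out.

```python
class LayeredStorage:
    """Hypothetical read-through storage: request cache, shared cache, then the table."""

    def __init__(self, shared_cache, table):
        self.l1 = {}              # per-request cache, discarded with the request
        self.l2 = shared_cache    # stand-in for a Memcached cluster
        self.table = table        # stand-in for a DynamoDB table

    def get(self, key, skip_cache=False):
        if not skip_cache:
            for cache in (self.l1, self.l2):
                if key in cache:
                    value = cache[key]
                    self.l1[key] = value   # promote to the nearest cache
                    return value
        value = self.table.get(key)
        if value is not None:
            self.l1[key] = value
            self.l2[key] = value
        return value

    def save(self, key, value):
        # Write to the table first, then refresh both cache levels.
        self.table[key] = value
        self.l1[key] = value
        self.l2[key] = value

    def delete(self, key):
        self.table.pop(key, None)
        self.l1.pop(key, None)
        self.l2.pop(key, None)


shared, table = {}, {"game:1": {"turn": 3}}
request_a = LayeredStorage(shared, table)
state = request_a.get("game:1")          # misses both caches, reads the table
table["game:1"] = {"turn": 4}            # another writer bypasses the caches
request_b = LayeredStorage(shared, table)
cached = request_b.get("game:1")         # served from the shared cache (stale)
fresh = request_b.get("game:1", skip_cache=True)
```

The stale read in the usage lines shows why a `skipCache` style flag is useful: callers that cannot tolerate staleness can go straight to the table, and that read refreshes the caches for everyone else.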


Tips & Traps

• Know your data – use reasonable heuristics for expected data growth.

• Each data storage technology introduces some level of operational and engineering overhead. Choose wisely.

• Get creative with Amazon DynamoDB.

• Prepare for the unexpected with Metadata columns in MySQL.

Please give us your feedback on this presentation

As a thank you, we will select prize winners daily for completed surveys!

DAT201

Date post:	26-Jan-2015
Category:	Technology
Upload:	amazon-web-services