Date posted: 26-Jan-2015
Uploaded by: amazon-web-services
© 2013 Amazon.com, Inc. and its affiliates. All rights reserved. May not be copied, modified, or distributed in whole or in part without the express consent of Amazon.com, Inc.
DAT201: Understanding AWS Database Options
Sundar Raghavan – Amazon RDS
Zac Sprackett – Vice President of Operations with SugarCRM
Michael Thomas – Principal Software Engineer with Scopely
November 13, 2013
Today’s discussion
• AWS Database Options and Decision Factors
• Best Practice Tips and Techniques: SugarCRM and Scopely
• Q & A
Starting with the Customer
• How many of you use databases on AWS?
• How many of you use Amazon RDS, Amazon DynamoDB, Amazon Redshift, or Amazon ElastiCache?
• How many of you have a well-defined disaster recovery (DR) strategy for your databases?
• How many of you are building geospatial and context-sensitive applications?
• We suggest that you attend Werner’s keynote!
• 9 AWS Regions, including 25 Availability Zones, and growing
• 46 worldwide points of presence
• More than 10 data centers in US East alone
• Regions: US East (Northern Virginia); US West x 2 (N. California and Oregon); US GovCloud (US ITAR Region, Oregon); Europe West (Dublin); Asia Pacific (Singapore); Asia Pacific (Tokyo); Australia; LATAM (São Paulo)
Introducing: Cross-Region Support
• RDS Snapshot Copy
• All engines
Zoopla:
“We are very happy with RDS cross region snapshot copy feature as it gives us the ability to copy our data from one AWS region to another AWS region with minimal effort. Prior to this feature, it used to take 3 days and a number of manual steps to copy our snapshots. Now we have an automated process that helps us to achieve disaster recovery capabilities in just few steps.”
Joel Callaway, IT Operations Manager, Zoopla Property Group Ltd, UK
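The cross-region copy can also be scripted, which is what makes an automated DR process possible. The sketch below assumes boto3 and a manual (not automated) snapshot. The helper names, the account ID, and the snapshot names are hypothetical. The request is issued from an RDS client in the destination region, and the source snapshot is named by its ARN.

```python
from typing import Any, Dict


def snapshot_arn(region: str, account_id: str, snapshot_id: str) -> str:
    """ARN of a manual RDS DB snapshot, the form a cross-region copy expects."""
    return f"arn:aws:rds:{region}:{account_id}:snapshot:{snapshot_id}"


def cross_region_copy_request(source_region: str, account_id: str,
                              source_snapshot_id: str,
                              target_snapshot_id: str) -> Dict[str, Any]:
    """Keyword arguments for CopyDBSnapshot, issued in the destination region."""
    return {
        "SourceDBSnapshotIdentifier": snapshot_arn(
            source_region, account_id, source_snapshot_id),
        "TargetDBSnapshotIdentifier": target_snapshot_id,
        # boto3 convenience parameter naming the region that holds the source.
        "SourceRegion": source_region,
    }


def copy_snapshot_cross_region(rds_client, **kwargs) -> Dict[str, Any]:
    """rds_client is assumed to be boto3.client("rds", region_name=<destination>)."""
    return rds_client.copy_db_snapshot(**cross_region_copy_request(**kwargs))
```

Building the request separately from the call lets a DR job log or test exactly what it will send before it touches AWS.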
Your Mission is Clear
1. Zero to App in ____ Minutes
2. Zero to Millions of users in ____ Days
3. Zero to “Hero” in ____ Months
Focus on Your App
Your Stack: Application tier, Load balancer, Database tier
Your Stack of Worries
• Security, Innovation, Scale, Transactions, Performance, Durability, Availability, Skills...
• Security, Innovation, Scale, Performance, Availability...
• Security, Scale, Availability...
Not available on AWS
Spectrum of Database Options
Axes: SQL to NoSQL; Do-It-Yourself to Fully Managed; Low Cost to High Cost
• SQL, fully managed: MySQL, Oracle, SQL Server (Amazon RDS), Amazon Redshift
• SQL, do-it-yourself: MySQL, Oracle, SQL Server, MariaDB, Vertica, ParAccel, …
• NoSQL, do-it-yourself: MongoDB, Cassandra, Redis, Memcache
• NoSQL, fully managed: DynamoDB, ElastiCache (Memcache), ElastiCache (Redis), SimpleDB
Thinking About the Questions
• Should I use SQL or NoSQL?
• Should I use MySQL or PostgreSQL?
• Should I use Redis, Memcache, or ElastiCache?
• Should I use MongoDB, Cassandra, or DynamoDB?
Actually, Thinking About the Right Questions
• What are my scale and latency needs?
• What are my transactional and consistency needs?
• What are my read/write, storage, and IOPS needs?
• What are my time-to-market and server control needs?
Factors to Consider: SQL vs. NoSQL
• Application. SQL: app with complex business logic? NoSQL: web app with lots of users?
• Transactions. SQL: complex transactions, joins, updates? NoSQL: simple data model, updates, queries?
• Scale. SQL: developer managed. NoSQL: automatic, on-demand scaling.
• Performance. SQL: developer architected. NoSQL: consistent, high performance at scale.
• Availability. SQL: architected for failover. NoSQL: seamless and transparent.
• Core skills. SQL: SQL + Java/Ruby/Python/PHP. NoSQL: NoSQL + Java/Ruby/Python/PHP.
Best of both worlds: it is possible to use SQL and NoSQL models in one app.
Factors to Consider: Self-Managed vs. Managed Service
Self-Managed
• Full control over the instance, database, and OS parameters
• Upgrades, backups, and failover are yours to manage
• All aspects of security are managed by you
• Complex replication topologies and data management
Managed Service
• Off-load infrastructure and software management
• Automate the database life cycle with APIs
• Focus on database access and app security
• Limited control over replication topologies
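Here is a minimal sketch of "automate the database life cycle with APIs". It provisions a Multi-AZ MySQL instance with automated backups and can take a manual snapshot later. It assumes boto3's RDS client (`create_db_instance`, `create_db_snapshot`). The identifiers, instance class, and retention period are illustrative choices, not recommendations from the talk.

```python
from typing import Any, Dict


def multi_az_mysql_request(instance_id: str, instance_class: str,
                           storage_gb: int, username: str, password: str,
                           backup_retention_days: int = 7) -> Dict[str, Any]:
    """Keyword arguments for CreateDBInstance (MySQL, Multi-AZ)."""
    return {
        "DBInstanceIdentifier": instance_id,
        "DBInstanceClass": instance_class,
        "Engine": "mysql",
        "AllocatedStorage": storage_gb,
        "MasterUsername": username,
        "MasterUserPassword": password,
        "MultiAZ": True,  # standby replica in a second Availability Zone
        "BackupRetentionPeriod": backup_retention_days,  # days to keep automated backups (0 disables them)
    }


def provision(rds_client, **kwargs) -> Dict[str, Any]:
    """rds_client is assumed to be boto3.client("rds")."""
    return rds_client.create_db_instance(**multi_az_mysql_request(**kwargs))


def snapshot(rds_client, instance_id: str, snapshot_id: str) -> Dict[str, Any]:
    return rds_client.create_db_snapshot(DBSnapshotIdentifier=snapshot_id,
                                         DBInstanceIdentifier=instance_id)
```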
Pace of Innovation – a Bonus
RDS team launched 23+ features
• SQL Server TDE, version upgrade
• Oracle TDE, Statspack, fine-grained access, 3 TB / 30K IOPS
• Cross-Region Snapshot Copy, parallel replica, chained replica
• Multi-AZ SLA, log access, VPC groups, …
NoSQL team launched 10+ features
• Redis engine support
• Amazon DynamoDB fine-grained access control
• Amazon DynamoDB Local, geospatial indexing library
• Transaction library, local secondary indexes, parallel scan
Redshift team launched 20+ features
• Encryption with HSM support
• Audit logging, SNS notification, snapshot sharing
• COPY from Amazon EMR/HDFS/SSH
• Faster resize, improved concurrency, distributed tables, …
Amazon RDS is a managed SQL database service.
• Simple to deploy and scale
• Without any operational burden
• Reliable and cost-effective
• Choice of database engines
Database tasks: schema design, frequent server upgrades, storage upgrades, backup and recovery, software upgrades, patching, hardware crashes, query construction, query optimization, configuration, migration
Off-load the “administration”. Focus on the “innovation”.
Optimizing for Developer Productivity
• Multiple databases per instance
• Use MySQL tools & drivers
• Quickly set up read replicas
• High availability with the Multi-AZ option (99.95% SLA)
• Ability to promote read replicas and rename them as master
• Diagnostics
• Native MySQL replication
• SSL for encryption over the wire
• Monitor metrics
• Not available: shell, superuser, or direct file system access (think security!)
Optimizing for Developer Productivity
Setting up a read replica: follow the MySQL manual, OR use the Amazon RDS console
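The console step also has an API equivalent. This is a minimal sketch assuming boto3's RDS client (`create_db_instance_read_replica`, `promote_read_replica`). The instance identifiers are hypothetical.

```python
from typing import Any, Dict, Optional


def read_replica_request(source_id: str, replica_id: str,
                         instance_class: Optional[str] = None) -> Dict[str, Any]:
    """Keyword arguments for CreateDBInstanceReadReplica."""
    request: Dict[str, Any] = {
        "DBInstanceIdentifier": replica_id,
        "SourceDBInstanceIdentifier": source_id,
    }
    if instance_class is not None:
        # When omitted, the replica inherits the source instance's class.
        request["DBInstanceClass"] = instance_class
    return request


def add_read_replica(rds_client, source_id: str, replica_id: str,
                     instance_class: Optional[str] = None) -> Dict[str, Any]:
    return rds_client.create_db_instance_read_replica(
        **read_replica_request(source_id, replica_id, instance_class))


def promote_to_standalone(rds_client, replica_id: str) -> Dict[str, Any]:
    # Stops replication and turns the replica into an independent instance.
    return rds_client.promote_read_replica(DBInstanceIdentifier=replica_id)
```

Renaming a promoted replica would be a further `modify_db_instance` call with `NewDBInstanceIdentifier`. That call is not shown here.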
ElastiCache is a managed caching service.
• Easy to set up and operate cache clusters
• Scale cache clusters with push-button ease
• Without any operational burden
• Ultra-fast response time for read scaling
• Supports Memcached and Redis engines
ElastiCache is a Performance Booster
Diagram: clients reach EC2 app instances through Elastic Load Balancing. The app sends reads to ElastiCache (Redis, with a read replica), which serves most read queries with in-memory performance. Read/write queries and cache updates go to the master RDS MySQL DB instance with PIOPS (SSD performance).
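The read path in that diagram follows the cache-aside pattern. This is a sketch under stated assumptions. `DictCache` is a hypothetical in-process stand-in for an ElastiCache client, exposing only `get(key)` and `set(key, value, ttl)`. Real Memcached and Redis clients name the TTL argument differently. `load_from_db` and `save_to_db` stand for queries against RDS.

```python
import json
import time
from typing import Any, Callable, Dict, Optional, Tuple


class DictCache:
    """In-process stand-in for an ElastiCache client (get/set with a TTL)."""

    def __init__(self) -> None:
        self._data: Dict[str, Tuple[str, float]] = {}

    def get(self, key: str) -> Optional[str]:
        entry = self._data.get(key)
        if entry is None:
            return None
        value, expires_at = entry
        if time.monotonic() >= expires_at:  # expired entries count as misses
            del self._data[key]
            return None
        return value

    def set(self, key: str, value: str, ttl: int) -> None:
        self._data[key] = (value, time.monotonic() + ttl)


def read_through(cache, key: str, load_from_db: Callable[[str], Any],
                 ttl: int = 300) -> Any:
    """Cache-aside read: serve hits from memory, fall back to the database."""
    cached = cache.get(key)
    if cached is not None:
        return json.loads(cached)
    value = load_from_db(key)  # miss: query the database
    cache.set(key, json.dumps(value), ttl)
    return value


def write_and_refresh(cache, key: str, value: Any,
                      save_to_db: Callable[[str, Any], None],
                      ttl: int = 300) -> None:
    """Write to the database first, then update the cached copy."""
    save_to_db(key, value)
    cache.set(key, json.dumps(value), ttl)
```

Writing to the database before refreshing the cache keeps RDS the source of truth. If the process dies between the two steps, the cache holds an older value until the TTL expires.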
Amazon DynamoDB is a managed NoSQL database service.
• Store and retrieve any amount of data
• Scale throughput to millions of I/Os
• Without any operational burden
• Single-digit millisecond latencies
Optimizing for Developer Productivity
• Manage tables: CreateTable, UpdateTable, DeleteTable, DescribeTable, ListTables
• “Select”, “insert”, “update” items: PutItem, GetItem, UpdateItem, DeleteItem
• Bulk select or update (max 1MB): BatchGetItem, BatchWriteItem
• Query specific items OR scan the full table: Query, Scan
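Because each Query or Scan call returns a bounded amount of data (1 MB per call), results arrive in pages. This sketch shows the usual loop, assuming the current boto3 low-level DynamoDB client (`client.query`) and its expression syntax. The table and attribute names are hypothetical.

```python
from typing import Any, Dict, List


def query_all(client, table: str, key_condition: str,
              values: Dict[str, Any]) -> List[Dict[str, Any]]:
    """Run a DynamoDB Query and follow LastEvaluatedKey until it is exhausted."""
    request: Dict[str, Any] = {
        "TableName": table,
        "KeyConditionExpression": key_condition,
        "ExpressionAttributeValues": values,
    }
    items: List[Dict[str, Any]] = []
    while True:
        page = client.query(**request)  # one page, at most 1 MB of data
        items.extend(page.get("Items", []))
        last_key = page.get("LastEvaluatedKey")
        if not last_key:  # no key means this was the final page
            return items
        request["ExclusiveStartKey"] = last_key  # resume after the last item read
```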
Amazon Redshift is a managed data warehouse service.
• Petabyte-scale columnar database
• Fast response time (~10x faster than typical relational stores)
• Without any operational burden
• Under $1,000 per TB per year
So, what are the tips and techniques for successful deployments?
Thousands of Successful Deployments: Two Highlights
• SugarCRM (CRM software): Zac Sprackett
• Scopely (gaming platform): Mike Thomas
Crafting Loyal Customers with SugarCRM
Every Customer. Every User. Every Time.
S. Zachariah Sprackett, VP of Operations, SugarCRM
November 13, 2013
SugarCRM
• Redefining Customer Relationship Management
• Unique product bundling
  – On-premise and hosted offerings
• Manifest destiny – source code access and a SQL database per customer
• Scale – from one-seat customers to multi-thousand-seat customers
• Globally distributed customer base
Deployment Models: Traditional SaaS, SugarCRM
Application Stack: Apache, PHP, MySQL, Elasticsearch, HTML5 & JavaScript, Linux, Shadow, email archiving, background jobs
Cloud Stack components: Route 53, managed Elastic IP, EC2 HAProxy (x2), EC2 web servers, EC2 job servers, EC2 Elasticsearch, ElastiCache, RDS DB instance and RDS read replica, Amazon S3, Amazon Glacier, Amazon SES
Cloud Stack Management Console: manages globally distributed cloud providers
Delivering On Time and On Budget
• Amazon lets you easily spin up testing environments
  – Testing only works if you make use of it. Don’t make assumptions. Monitor everything.
• A change in cost model can surprise finance
  – Planned capital expenditures versus after-the-fact operational expenditures
  – Use reserved instances
  – Third-party tools such as Cloudability can help alert you to issues early
• Manage access keys effectively to control cost
  – Learn to love AWS Identity and Access Management (IAM)
Things to Watch Out For
• Understand your IO requirements
  – Make effective use of instance store, Amazon EBS, and Provisioned IOPS storage
• Use the heck out of read replicas
• Snapshots are incredibly useful
  – But they are not available from a read replica
• Don’t use the default parameter group for Amazon RDS
  – Unless you really like restarting databases
• Cold standby is not instant-on
  – Don’t get stuck waiting for deployments in a forced failover scenario
• ElastiCache is not clustered across Availability Zones
• Watch out for the SLA
  – 99.95% for a region, even across two AZs
  – This doesn’t include user error
• You still need DBAs and Ops, but they get to do cooler stuff
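The parameter-group warning follows from a detail of RDS. The default group cannot be modified, so tuning later means switching the instance to a new group, and that switch takes a reboot. Creating your own group up front avoids it. The sketch below assumes boto3's RDS client (`create_db_parameter_group`, `modify_db_parameter_group`, `modify_db_instance`). The group name, family, and override values are hypothetical.

```python
from typing import Any, Dict, Mapping, Tuple


def custom_parameter_group(name: str, family: str, description: str,
                           overrides: Mapping[str, Any]
                           ) -> Tuple[Dict[str, Any], Dict[str, Any]]:
    """Keyword arguments for CreateDBParameterGroup and ModifyDBParameterGroup."""
    create = {
        "DBParameterGroupName": name,
        "DBParameterGroupFamily": family,  # must match the engine and version
        "Description": description,
    }
    modify = {
        "DBParameterGroupName": name,
        "Parameters": [
            {
                "ParameterName": key,
                "ParameterValue": str(value),
                # Accepted for static and dynamic parameters alike; the change
                # applies at the next reboot of each instance using the group.
                "ApplyMethod": "pending-reboot",
            }
            for key, value in overrides.items()
        ],
    }
    return create, modify


def attach(rds_client, instance_id: str, name: str) -> Dict[str, Any]:
    """Point an instance at the custom group (it takes effect after a reboot)."""
    return rds_client.modify_db_instance(DBInstanceIdentifier=instance_id,
                                         DBParameterGroupName=name,
                                         ApplyImmediately=True)
```

Call `create_db_parameter_group(**create)` before `modify_db_parameter_group(**modify)`. Each modify call accepts at most 20 parameters, so larger override sets need to be split across calls.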
Scopely
Michael Thomas – Principal Software Engineer with Scopely
November 13, 2013
ABOUT SCOPELY
Our technical infrastructure allows developers to build games efficiently for both iOS and Android.
• Millions of users, billions of turns
• All titles have reached the Top 5 in the App Store, and the last three have been #1.
Challenges
• Build a single platform to support many different kinds of games: asynchronous turn-based, single-player, synchronous, etc.
• Scale up and down as games are tested, launched, grow, and are retired.
• We are not an infrastructure company; we must focus on building features that support game development.
Platform Features
• Accounts / authentication
• Gameplay / state persistence
• Chat / messaging
• In-game economy
• Facebook integration
• Gifting
• Single-player state tracking
• Promotion / cross-promotion system
• Statistics
• Tournaments
• Achievements
• Email targeting
• Suggested friends
• In-game news system
• External partner integration
• Invitation attribution
• Push notifications
• Content management
• Generic storage API
• Application / device configuration
• A/B testing
Different Features, Different Requirements
• Dynamic scaling (game launches, promotions, tests)
• High write/read ratio (playing turns)
• Transactional consistency (real-money purchases)
• Indexed data (user accounts)
• Complex, real-time data (leaderboards)
Scopely Gaming Platform: Operational Data Storage
• Memcached (ElastiCache) for performance, scalability, and cost savings.
• Amazon DynamoDB for unbounded data with a heavy write load.
• Redis (ElastiCache) for fast, complex caching and message passing.
• MySQL (RDS) for bounded, transactional, queryable data.
• Amazon S3 for asset and image storage.
Scopely Gaming Platform: Analytics Data Pipeline
SQS (in-flight events) → EC2 (message loader) → S3 (staged messages) → EMR (transformer) → S3 (processed data) → EC2 (Redshift loader) → Redshift data warehouse
RDS tracks processes and jobs throughout the pipeline.
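A common way to implement the "Redshift loader" step is a `COPY` from the processed S3 prefix. This is a minimal sketch, not Scopely's loader. It assumes gzipped, pipe-delimited output from the EMR step and uses `IAM_ROLE` authorization (older loaders passed access keys via `CREDENTIALS`). The table name, bucket, and role ARN are hypothetical.

```python
import re

_IDENTIFIER = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def build_copy_sql(table: str, s3_prefix: str, iam_role_arn: str,
                   delimiter: str = "|") -> str:
    """Build a Redshift COPY statement for gzipped, delimited files under a prefix."""
    if not _IDENTIFIER.fullmatch(table):
        raise ValueError(f"unsafe table name: {table!r}")
    if not s3_prefix.startswith("s3://"):
        raise ValueError(f"expected an s3:// prefix: {s3_prefix!r}")
    if any("'" in part for part in (s3_prefix, iam_role_arn, delimiter)):
        raise ValueError("quotes are not allowed in COPY literals")
    # COPY loads every object whose key starts with the prefix; multiple files
    # are loaded in parallel across the cluster's slices.
    return (f"COPY {table} FROM '{s3_prefix}' "
            f"IAM_ROLE '{iam_role_arn}' "
            f"DELIMITER '{delimiter}' GZIP;")
```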
Schema Mapping DSL

from centipede.schema.table import Table
from centipede.attributes import *

# GemsTurnHelper (current/opponent player index lookups) is defined elsewhere; it is not shown on the slide.


class GemsTurn(Table):
    # Each attribute is a (column type, extractor) tuple; the lambda pulls the value out of a game message.
    user_id = Integer, lambda message: message['Data']['GameData']['CurrentPlayerId']
    current_turn = Integer, lambda message: message['Data']['GameData']['CurrentTurn']
    end_date = Timestamp, lambda message: message['Data']['GameData']['EndDate']
    expiration = Timestamp, lambda message: message['Data']['GameData']['Expiration']
    game_id = Guid, lambda message: message['Data']['GameData']['GameId']
    resigning_user_id = Integer, lambda message: message['Data']['GameData']['ResigningPlayerId']
    start_context = Integer, lambda message: message['Data']['GameData']['StartContext']
    start_date = Timestamp, lambda message: message['Data']['GameData']['StartDate']
    status = Integer, lambda message: message['Data']['GameData']['Status']
    tournament_id = Guid, lambda message: message['Data']['GameData']['TournamentId']
    tournament_price_category = Integer, lambda message: message['Data']['GameData']['TournamentPriceCategory']
    tournament_price_paid = Integer, lambda message: message['Data']['GameData']['TournamentPricePaid']
    tutorial_type = Integer, lambda message: message['Data']['GameData']['TutorialType']
    winning_user_id = Integer, lambda message: message['Data']['GameData']['WinningPlayerId']

    # Per-player columns: index into Players for the current player (or, last, the opponent).
    awards = List, lambda message: message['Data']['GameData']['Players'][GemsTurnHelper.current_user_index(message)]['Awards']
    coins_gathered = List, lambda message: message['Data']['GameData']['Players'][GemsTurnHelper.current_user_index(message)]['CoinsGathered']
    custom_statistics = VarChar, lambda message: message['Data']['GameData']['Players'][GemsTurnHelper.current_user_index(message)]['CustomStatistics']
    has_hidden_game = Boolean, lambda message: message['Data']['GameData']['Players'][GemsTurnHelper.current_user_index(message)]['HasHiddenGame']
    last_nudge_date = Timestamp, lambda message: message['Data']['GameData']['Players'][GemsTurnHelper.current_user_index(message)]['LastNudgeDate']
    score = Integer, lambda message: message['Data']['GameData']['Players'][GemsTurnHelper.current_user_index(message)]['Score']
    score_for_award = Integer, lambda message: message['Data']['GameData']['Players'][GemsTurnHelper.current_user_index(message)]['ScoreForAward']
    opponent_user_id = Integer, lambda message: message['Data']['GameData']['Players'][GemsTurnHelper.opponent_user_index(message)]['UserId']
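The idea behind the DSL (pair each warehouse column with a function that extracts it from a nested message) can be sketched in plain Python. This is not centipede's implementation. `current_player` is a hypothetical stand-in for `GemsTurnHelper.current_user_index` that assumes each player entry carries a `UserId`. Turning missing fields into `None` (NULL) is this sketch's choice.

```python
from typing import Any, Callable, Dict

Extractor = Callable[[dict], Any]


def game_data(message: dict) -> dict:
    return message["Data"]["GameData"]


def current_player(message: dict) -> dict:
    # Hypothetical: pick the player whose UserId matches CurrentPlayerId.
    data = game_data(message)
    return next(p for p in data["Players"] if p["UserId"] == data["CurrentPlayerId"])


GEMS_TURN_COLUMNS: Dict[str, Extractor] = {
    "user_id": lambda m: game_data(m)["CurrentPlayerId"],
    "current_turn": lambda m: game_data(m)["CurrentTurn"],
    "score": lambda m: current_player(m)["Score"],
}


def to_row(columns: Dict[str, Extractor], message: dict) -> Dict[str, Any]:
    """Apply every extractor; a missing field becomes None (NULL in the warehouse)."""
    row: Dict[str, Any] = {}
    for name, extract in columns.items():
        try:
            row[name] = extract(message)
        except (KeyError, IndexError, StopIteration):
            row[name] = None
    return row
```

Keeping extraction declarative means a game that changes its message shape only needs its column map edited, not the loader code.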
Use Case: Leaderboards
• “What is my rank in today’s tournament?”
• Hard to cache, since a single player getting a new high score changes everyone’s rank
• Required a highly optimized schema on 4 m2.2xlarge RDS nodes
• Latency for “what is my rank” could be above 100 ms
• Redis sorted sets provide exactly what we need. Two m2.xlarge instances are more than enough, and the rank query now takes single-digit milliseconds.
Redis
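Here is a sketch of the sorted-set approach, assuming the redis-py client (`zscore`, `zadd` with a mapping, `zrevrank`). The key scheme and the keep-the-best-score rule are assumptions, not Scopely's code. A rank lookup with `ZREVRANK` is O(log N), so it stays fast no matter how many scores change between reads.

```python
from typing import Optional


def leaderboard_key(tournament_id: str, day: str) -> str:
    # Hypothetical naming: one sorted set per tournament per day.
    return f"leaderboard:{tournament_id}:{day}"


def record_score(r, key: str, user_id: str, score: float) -> None:
    """Keep each player's best score in the sorted set."""
    best = r.zscore(key, user_id)
    if best is None or score > best:
        r.zadd(key, {user_id: score})


def my_rank(r, key: str, user_id: str) -> Optional[int]:
    """1-based rank, highest score first; None if the player has no score."""
    rank = r.zrevrank(key, user_id)  # 0-based position in descending order
    return None if rank is None else rank + 1
```

The read-then-write in `record_score` is not atomic, so two concurrent submissions for one player could race. On Redis 6.2 or later, `ZADD ... GT` does the comparison server-side (recent redis-py versions expose it as `zadd(key, mapping, gt=True)`).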
Use Case: Game/Turn State
• Extremely high throughput. Extremely large dataset.
• Semi-structured data: each game models “state” differently.
• Always queried by UserID or GameID.
• Maxed out an Amazon RDS instance; instead of spending time sharding/optimizing Amazon RDS, we moved to Amazon DynamoDB.
• Saves operational and development time by not having to worry about growing games, adding new games, or traffic spikes.
DynamoDB
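This sketch shows one way to store turn state keyed by GameId. The assumptions are illustrative, not from the talk: the attribute names are hypothetical, each game's state is kept as an opaque JSON string so games can model it differently, and a conditional write rejects a turn if someone else has already advanced the game. Lookups by UserID would need a second table or index, which is not shown. The request targets boto3's low-level `put_item`.

```python
import json
from typing import Any, Dict


def save_turn_request(table: str, game_id: str, expected_turn: int,
                      state: Dict[str, Any]) -> Dict[str, Any]:
    """PutItem arguments that succeed only if the game is still on expected_turn."""
    return {
        "TableName": table,
        "Item": {
            "GameId": {"S": game_id},  # partition key
            "CurrentTurn": {"N": str(expected_turn + 1)},
            # Each game shapes its state differently, so store it as a JSON blob.
            "GameState": {"S": json.dumps(state, sort_keys=True)},
        },
        # New games have no item yet; existing games must match the expected turn.
        "ConditionExpression": "attribute_not_exists(GameId) OR #t = :expected",
        "ExpressionAttributeNames": {"#t": "CurrentTurn"},
        "ExpressionAttributeValues": {":expected": {"N": str(expected_turn)}},
    }
```

With a real client, a lost race surfaces as a `ClientError` whose code is `ConditionalCheckFailedException`. The caller can reload the game and retry.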
Use Case: User Accounts
• Need to maintain uniqueness across multiple columns (email, username, etc.)
• Queryable on multiple facets (email, username, external identifier)
• Entire table needs to be scanned regularly (promotions)
• Bounded data size
MySQL (RDS)
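This is why a relational store fits: UNIQUE constraints enforce uniqueness per column, and the same indexes serve lookups by any facet. SQLite stands in for RDS MySQL here so the sketch runs anywhere. The table and column names are hypothetical. The `metadata` column is an assumed free-form JSON field for attributes not yet in the schema.

```python
import sqlite3
from typing import Optional, Tuple

SCHEMA = """
CREATE TABLE user_account (
    id          INTEGER PRIMARY KEY,
    email       TEXT NOT NULL UNIQUE,
    username    TEXT NOT NULL UNIQUE,
    external_id TEXT UNIQUE,  -- hypothetical partner/social ID; NULLs never collide
    metadata    TEXT          -- free-form JSON for attributes added later
);
"""

LOOKUP_FACETS = ("email", "username", "external_id")


def create_account(conn: sqlite3.Connection, email: str, username: str,
                   external_id: Optional[str] = None) -> Optional[int]:
    """Insert a user; return its id, or None if any unique column collides."""
    try:
        with conn:  # commit on success, roll back on error
            cur = conn.execute(
                "INSERT INTO user_account (email, username, external_id) "
                "VALUES (?, ?, ?)", (email, username, external_id))
        return cur.lastrowid
    except sqlite3.IntegrityError:
        return None


def find_by(conn: sqlite3.Connection, facet: str,
            value: str) -> Optional[Tuple[int, str, str]]:
    """Look a user up by any indexed facet."""
    if facet not in LOOKUP_FACETS:  # column names cannot be bound as parameters
        raise ValueError(f"unknown facet: {facet!r}")
    return conn.execute(
        f"SELECT id, email, username FROM user_account WHERE {facet} = ?",
        (value,)).fetchone()
```

In MySQL the unique columns would be `VARCHAR`, because InnoDB cannot build a full UNIQUE index on a `TEXT` column without a prefix length.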
Use Case: Global Caching
• Cache everything possible in Memcached, including entities stored in both Amazon DynamoDB and RDS.
• A single interface providing session caching, Memcached caching, and Amazon DynamoDB access encourages consistent use of caching.
Memcached (ElastiCache)
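using System;

// Storage facade: an L1 cache, a Memcached-backed L2 cache, and DynamoDB as the backing store.
// Cache, DynamoClient, and Games are Scopely platform types; method bodies were not shown on the slide.
public class CoherentStorage
{
    public Cache L1Cache { get; set; }
    public Cache L2Cache { get; set; }
    public DynamoClient Dynamo { get; set; }

    private readonly Games _game;

    public CoherentStorage(Games game)
    {
        _game = game;
        L1Cache = Cache.Request;
        L2Cache = Cache.GetMemcached(String.Format("{0}GameState", game));  // one Memcached cache name per game
        Dynamo = DynamoClient.Instance;
    }

    public void Save(object instance) { /* not shown on the slide */ }

    public void Delete(object instance) { /* not shown on the slide */ }

    public T Get<T>(object id, bool skipCache = false, bool consistentRead = true)
    {
        throw new NotImplementedException();  // body not shown on the slide
    }
}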
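The class above exposes only its shape. This Python sketch shows the read path its properties suggest: check L1, then L2, then the backing store, repopulating the caches on the way back. Plain dicts stand in for the request cache, Memcached, and DynamoDB. Writing the backing store before the caches is an assumption, and the C# `consistentRead` flag (which would map to DynamoDB's `ConsistentRead` on a real read) is not modeled.

```python
from typing import Any, Dict, Optional


class TieredStore:
    """Two cache tiers in front of a system of record (all plain dicts here)."""

    def __init__(self, l2: Dict[str, Any], backing: Dict[str, Any]) -> None:
        self.l1: Dict[str, Any] = {}  # per-request cache
        self.l2 = l2                  # shared cache (Memcached in the slide)
        self.backing = backing        # system of record (DynamoDB in the slide)

    def get(self, key: str, skip_cache: bool = False) -> Optional[Any]:
        if not skip_cache:
            if key in self.l1:
                return self.l1[key]
            value = self.l2.get(key)
            if value is not None:
                self.l1[key] = value  # promote to the faster tier
                return value
        value = self.backing.get(key)
        if value is not None:
            self.l1[key] = value      # repopulate both cache tiers
            self.l2[key] = value
        return value

    def save(self, key: str, value: Any) -> None:
        self.backing[key] = value     # system of record first
        self.l2[key] = value
        self.l1[key] = value

    def delete(self, key: str) -> None:
        self.backing.pop(key, None)
        self.l2.pop(key, None)
        self.l1.pop(key, None)
```

Routing every read and write through one object is what makes consistent caching the default, which is the point of the "single interface" bullet.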
Tips & Traps
• Know your data: use reasonable heuristics for expected data growth.
• Each data storage technology introduces some level of operational and engineering overhead. Choose wisely.
• Get creative with Amazon DynamoDB.
• Prepare for the unexpected with metadata columns in MySQL.