MySQL: Cloud vs Bare Metal,
Performance and ReliabilityLos Angeles MySQL Meetup
Vladimir Fedorkov, March 31, 2014
Let’s meet each other
• Performance geek
– All kinds MySQL and some Sphinx
• Working for Blackbird
• Twitter @vfedorkov
– Slides will be there: http://astellar.com
Credits and thank yous
• LAMySQL Meetup
– Joe Devon
– Caroline Rose
• Blackbird, AWS advances tech partner
– Laine Campbell
– http://blackbird.io
– Blackbird’s Rsync and EU team
How does it start
• Startup deciding to go with MySQL
• Getting successful
• That server gets slower and slower
• Now what to do?
Usual ways to grow
• Buy bigger server (scale up)
• Buy another server (scale out)
• Tune up your application
• Limit functionality
Questions to answer
• Which server to buy?
– Where?
• How long does it take to get new server?
• How long does it take to make it work?
• Why it didn’t help?
• Why application is slow?
Which server to get?
• How much would infrastructure will cost me?
• Which performance will I get?
• How easy would be to add more servers?
• How much would it cost me?
Physical servers
• Predictable performance
• Raw infrastructure
• Takes long to install
• Old-style operations
Amazon EC2
• Virtually provisioned boxes
– Managed infrastructure
• Boxes
• Network
• Attached disks
– Web/API bound operations
• Strong scalability suport
– Full OS access
• Flexible performance
– More your pay – more you’ll get
Amazon RDS
• Fully managed MySQL instances (SaaS)
– Oracle, MSSQL, RedShift, etc.
– No OS access
• Same on-demand scalability as EC2
– Scripted scaling up/down
• MySQL-specific operations can be scripted
Resources
Bare metal EC2 RDS
CPU PhysicalVirtual, limited
Virtual, limited
RAM Physical Shared Shared
IO PredictableNetwork-
based, limitedNetwork-
based, limited
Day-to-day MySQL operations
• Hardware faults
• Software bugs
• Reliable backup and restore
• Monitoring
• Upgrading and replication setup
• Networking and Security
Physical servers vs EC2 vs RDS
Bare metal EC2 RDS
Provisioning By you Ready-to-go boxes DB instances
Monitoring Your own OS-level by amazon All by amazon
BackupsYour own
Or no backups
Snapshot-basedManual on MySQL
level
Automated, no delayed replicas
Replication Manual Manual Auto magic
Networking All yours Provided by AWS Provided by AWS
Hardware faultsand software bugs
All yours Included Included
Amazon cloud details
• Several regions (US, Europe, Asia)
• Several availability zones inside one region
• Different instance types
– General
– Memory optimized
– Compute-optimized
– GPU-instances
– Storage optimized
Storage
• EBS
– EC2 / RDS
• POIPS
– EC2 / RDS
• SSD-based backend
• Ephemeral
– EC2 only, attached SSD
• Attached
RDS details
• Parameter groups instead of my.cnf
• One click Web interface
– API available
• Limited visibility and operations
• Cloud-specific behavior
– Upgrade as a Surprise
• Can be disables in control panel
– Instance relocation as-a-surprise
Know your application
• Configure MySQL properly
• Tune your queries
• Monitor your system
• Predict your traffic
EC2 and RDS specific
• Hardware might be different
– Even each benchmark run
• IO is highly relies on network
• CPU & IO limiters are in place
• You don’t know your cloud neighbors
First step
• MySQL configuration
– 5% settings makes 95% performance
• Which every query could kill
• What to tune at the first place
– innodb_buffer_pool_size
• Set on ¾ memory in RDS by default
– innodb_file_per_table
– key_buffer for MyISAM
– server-id for EC2 and standalone
Second step
• Network
– skip_name_resolv
• May affect authentication
– max_connect_errors
• FLUSH HOSTS
– max_allowed_packet
• IO
– innodb_flush_log_at_trx_commit
– innodb_log_file_size
Changes to avoid (usually)
• max_join_size
• sort_buffer_size
• tx_isolation
• Remember about 5% settings and 95% performance!
Backups
• What would you do if your MySQL has gone?
– Replicas-Replicas-Replicas
• … and binary logs– For point-in-time recovery
– Better to another host
– expire_logs_days
– max_binlog_size
• RDS snapshots
• Volume snapshots on EC2
– Works like LVM backups
Consider worst case
• Limit InnoDB IO capacity on EBS
– Affecting replica creation time!
• Tune flush_log_at_trx_commit
• Keep tmp_table_size big enough
– Temporary tables are BAD!
– There are drawbacks!
• Know your queries!
Queries
• Most likely the cause of MySQL slowness
– Affect application speed
• Fastest MySQL query is the query that did not MySQL at all
– Cached
– Sent to another storage (you name it)
– Just didn’t run
• Helps in case of overload
Bad queries: full table scans
SELECT * FROM table …
• WHERE DAY(FROM_UNIXTIME(`ts`)) = 205
• WHERE enabled != 1
• WHERE id NOT IN (1,2,3,…,10)
• WHERE url LIKE ‘%something%’
– LIKE ‘something%’ is still okay
• ORDER BY RAND()– All time classic
Poor index usage
• SELECT * / COUNT(*) FROM users WHERE enabled=1
– Hello half of the table
• How to fix?– Pre-aggregation
– Caches!
• Not in MySQL query cache!
– Right time to use external storage (Mongo, Redis, Sphinx, etc)
Temporary tables
• Bad
– Very bad if they hit the disk
• Will hit if there are TEXT or BLOB fields
• When?
– GROUP BY
– Sub SELECTS (better in 5.6)
– DISTINCT + ORDER BY
What else to do?
• Scale
• Run heavy queries on “reporting” replica
• Rewrite bad queries where possible
When load is really high
• Query cache (off)
• Thread cache
• Table cache
• Slow query log
• wait_timeout
• connection pooling
• Sharding and replication