Date post: | 10-May-2015 |
Life After Sharding: Managing a Complex Data Cloud
Boris Livshutz, AppDynamics
Why are you here?
• You already shard, plan to shard, or need to shard your data
• You’re considering a NoSQL solution for production
2 Copyright © AppDynamics. All rights reserved.
About AppDynamics
• Distributed application monitoring for enterprise applications
• The data layer is part of any enterprise app, so we monitor it too
• We collect massive amounts of metrics from our customers and store it all in MySQL
About Me
• 2 decades of experience building DB kernels, OLAP, server side development
• 4 years at AppDynamics scaling our server and helping our largest customers
What is a Data Cloud?
• Distinct set of data distributed across multiple nodes
• Multiple nodes work together to manage data
• Common examples:
• Sharded RDBMS
• NoSQL
• Data nodes can be part of a rented cloud or on-premise
Before: The Monolithic DB
• Monitoring Tools
• Cacti, Nagios, MySQL Enterprise, Enterprise Manager, Foglight
• Both open source and commercial systems
• Alerting: Emails to NOC and DBAs, regarding one database in trouble
• Management
• Query one database: SQL shell, Toad, etc.
• Backup: Hot backup tools for each database
• Schema upgrades: Connect to one database and run upgrade script
Why We Need a Data Cloud
• The limits of vertical scale
• One Dell box: 256GB RAM, 32 cores, 36 disks in RAID-60
• MySQL wasn’t able to use more than 12-16 cores
• 8 TB of data is hard to back up or copy
• ALTER TABLE is almost impossible on the largest tables
• No more growth option, no 256 core CPU!
• Hardware very expensive ($50K), cannot duplicate in test env
• Replication cannot keep up
• Advantages to horizontal scale
• Commodity hardware, easy to buy and expand
• $4K per box: 8 cores, 48GB RAM, 5 disks
• MySQL is able to fully leverage the hardware, easier to tune
Choosing a Data Cloud
• Shard existing RDBMS
• Change application logic to be shard-aware (lots of code changes!)
• Use a proxy (ScaleBase, DbShards, Spock, HiveDB)
• NoSQL
• You are brave!
• Give up on ACID, decades of stability, etc
• Gain failover, auto-resharding, etc OOTB
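To make "shard-aware application logic" concrete, here is a minimal sketch of hash-based routing, assuming the application picks a shard from a stable hash of a shard key. The host names, the shard count, and the `shard_for` and `hosts_for` helpers are hypothetical and are not taken from the talk.

```python
import hashlib
from typing import List

# Hypothetical layout: a few replica sets with two replicas each.
# Host names are made up for illustration.
REPLICA_SETS = [
    [f"shard{n}-a.example.internal", f"shard{n}-b.example.internal"]
    for n in range(7)
]


def shard_for(key: str, num_shards: int = len(REPLICA_SETS)) -> int:
    """Map a shard key to a replica-set index using a stable hash."""
    digest = hashlib.md5(key.encode("utf-8")).digest()
    # Use the first 8 bytes as an unsigned integer, then reduce modulo the shard count.
    return int.from_bytes(digest[:8], "big") % num_shards


def hosts_for(key: str) -> List[str]:
    """Return the replica hosts that hold the data for this key."""
    return REPLICA_SETS[shard_for(key)]
```

Every query path in the application has to call something like `hosts_for` before it opens a connection, which is where the "lots of code changes" come from. A proxy moves that lookup out of the application.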
Dev Complete - Now What?
• Can you just throw it over the wall to Ops?
• Almost no off the shelf tools to monitor and manage the data cloud
• DIY: only choice is to do it yourself. Sorry
What did we do?
• We had one MySQL that kept growing and growing
• Sharded MySQL into 7 replica sets, 2 replicas each
• We couldn’t release it until Ops was ready to keep it up 24x7
• Built our own “glue” to manage and monitor this beast.
• We ate our own dog food
• We partnered and didn’t re-invent the wheel.
Managing the Data Cloud
• ScaleBase
• Central point of management for data cloud
• The only source of truth: keeps track of each replica, location, naming, heartbeat, load
Instant access to data in the Data Cloud
• Access DB data through the ScaleBase LoadBalancer
• Can set the mode to send both queries and DML to all replicas, a subset, or just one
• We send SQL to a specific replica without knowing its location
• The only location we connect to is the ScaleBase LoadBalancer
• Other 3rd party tools can also connect to the ScaleBase LoadBalancer without knowing about our Data Cloud
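The idea of one connection point that forwards a statement to all replicas, a subset, or a single one can be sketched as follows. This is a toy illustration only, not ScaleBase's actual interface: the `ReplicaRouter` class, the replica names, and the `targets` parameter are assumptions, and in-memory SQLite databases stand in for MySQL replicas.

```python
import sqlite3


class ReplicaRouter:
    """Toy stand-in for a load balancer that hides replica locations."""

    def __init__(self, replica_names):
        # Each "replica" is an in-memory SQLite database (illustration only).
        self.replicas = {name: sqlite3.connect(":memory:") for name in replica_names}

    def execute(self, sql, params=(), targets=None):
        """Run sql on every replica (targets=None) or only on the named ones.

        Returns a dict mapping replica name to the rows it returned.
        """
        names = list(self.replicas) if targets is None else list(targets)
        results = {}
        for name in names:
            conn = self.replicas[name]
            cursor = conn.execute(sql, params)
            results[name] = cursor.fetchall()
            conn.commit()
        return results


router = ReplicaRouter(["rs1-a", "rs1-b", "rs2-a"])
# DDL goes to all replicas.
router.execute("CREATE TABLE metrics (name TEXT, value REAL)")
# DML goes to one replica, addressed by name rather than by host.
router.execute("INSERT INTO metrics VALUES (?, ?)", ("cpu", 0.5), targets=["rs2-a"])
# A query goes to all replicas and comes back keyed by replica name.
counts = router.execute("SELECT COUNT(*) FROM metrics")
```

The caller only ever names a replica. Where that replica lives is the router's concern, which is the property the slide describes.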
Measure performance across your data cloud
Measure performance – Replica deep dive
Unified Alerting
• System-wide alerts all come from a single source: ScaleBase
• Alerts go to PagerDuty to reach the right people on duty
• Alerts clearly identify replica set and replica node
• Allows quick resolutions by pinpointing problems in the data cloud
• NOC Response: SQL connection to troubleshoot via ScaleBase
• Only need to know the replica and replica set from alert and can immediately investigate with SQL queries
• NOC Response: Use monitoring tool for deep dive investigation into the replica
Synchronized maintenance tasks
• Backups
• Synchronized
• Backup is just a “job” in the ScaleBase engine; ScaleBase runs it on every replica
• ScaleBase tracks the status of each job execution on each replica
• Schema upgrades: the upgrade program doesn't need to know where things are in the data cloud
• Upgrader just connects to ScaleBase, and the upgrade SQL is sent to the whole data cloud automatically
• Configuration Changes
• Global changes can be done in SQL by connecting to ScaleBase and executing the same change on ALL replicas
• One SQL statement can be sent to all replicas by ScaleBase. Any errors will be logged
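Running one statement on every replica, tracking status per replica, and logging errors can be sketched as below. This is a minimal illustration under stated assumptions, not the ScaleBase job engine: `run_on_all`, the replica names, and the use of in-memory SQLite databases in place of MySQL replicas are all hypothetical.

```python
import logging
import sqlite3

logging.basicConfig(level=logging.INFO)
log = logging.getLogger("cloud_jobs")


def run_on_all(replicas, sql):
    """Run one statement on every replica and record a status for each."""
    status = {}
    for name, conn in replicas.items():
        try:
            conn.execute(sql)
            conn.commit()
            status[name] = "ok"
        except sqlite3.Error as exc:
            # A failure on one replica is logged and does not stop the others.
            log.error("replica %s failed: %s", name, exc)
            status[name] = f"failed: {exc}"
    return status


replicas = {name: sqlite3.connect(":memory:") for name in ("rs1-a", "rs1-b", "rs2-a")}
# Make one replica drift from the others so the upgrade fails there.
replicas["rs1-b"].execute("CREATE TABLE metrics (name TEXT)")
status = run_on_all(replicas, "CREATE TABLE metrics (name TEXT, value REAL)")
```

Keeping a per-replica status map means that a partially applied schema change shows up as one named replica to fix, not as a mystery discovered later.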
Conclusions
• Lessons Learned
• Development, test and Ops need to work together
• Educate more of the team
• Most problems that arise are operational, not code bugs
• The right vendors really make it easier than doing everything yourself
• Future
• Automate failback with hot spare
• Try new technologies like XtraDB Cluster.
Vendors
Questions?