Life After Sharding: Monitoring and Management of a Complex Data Cloud

Date post: 10-May-2015
Slides from Boris Livshutz' presentation at OSCON 2012.
Page 1: Life After Sharding: Monitoring and Management of a Complex Data Cloud

Life After Sharding: Managing a Complex Data Cloud

Boris Livshutz, AppDynamics

Why are you here?

• You already shard, plan to shard, or need to shard your data

• You’re considering a NoSQL solution for production

2 Copyright © AppDynamics. All rights reserved.

About AppDynamics

• Distributed application monitoring for enterprise applications

• The data layer is part of any enterprise app, so we monitor it too

• We collect massive amounts of metrics from our customers and store it all in MySQL

About Me

• 2 decades of experience building DB kernels, OLAP, server side development

• 4 years at AppDynamics scaling our server and helping our largest customers

What is a Data Cloud?

• Distinct set of data distributed across multiple nodes

• Multiple nodes work together to manage data

• Common examples:

• Sharded RDBMS

• NoSQL

• Data nodes can be part of a rented cloud or on-premise

Before: The Monolithic DB

• Monitoring Tools

• Cacti, Nagios, MySQL Enterprise, Enterprise Manager, Foglight

• Both open source and commercial systems

• Alerting: Emails to NOC and DBAs, regarding one database in trouble

• Management

• Query one database: SQL shell, Toad, etc.

• Backup: Hot backup tools for each database

• Schema upgrades: Connect to one database and run upgrade script

Why We Need a Data Cloud

• The limits of vertical scale

• One Dell box – 256GB RAM, 32 cores, 36 disks in raid-60

• MySQL wasn’t able to use more than 12-16 cores

• 8 TB of data is hard to back up or copy

• ALTER TABLE is almost impossible on the largest tables

• No more growth option, no 256 core CPU!

• Hardware very expensive ($50K), cannot duplicate in test env

• Replication cannot keep up

• Advantages to horizontal scale

• Commodity hardware, easy to buy and expand

• $4K per box: 8 cores, 48GB RAM, 5 disks

• MySQL is able to fully leverage the hardware, easier to tune

Choosing a Data Cloud

• Shard existing RDBMS

• Change application logic to be shard-aware (lots of code changes!)

• Use a proxy (Scalebase, DbShards, Spock, HiveDB)

• NoSQL

• You are brave!

• Give up on ACID, decades of stability, etc.

• Gain failover, auto-resharding, etc OOTB
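To make "shard-aware application logic" concrete, here is a minimal sketch of key-based routing in Python. The shard list, host names, and the helper names `shard_for` and `dsn_for` are hypothetical, chosen only for illustration; the slides do not say how keys were mapped to shards, and a proxy would hide this step from the application entirely.

```python
import hashlib

# Hypothetical shard map (illustrative host names, not from the talk).
SHARDS = [
    "mysql://shard0.example.internal/metrics",
    "mysql://shard1.example.internal/metrics",
    "mysql://shard2.example.internal/metrics",
]


def shard_for(key: str, shard_count: int = len(SHARDS)) -> int:
    """Map a sharding key to a shard index using a stable hash.

    hashlib is used instead of the built-in hash(), because hash() of a
    string changes between Python processes.
    """
    digest = hashlib.md5(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % shard_count


def dsn_for(key: str) -> str:
    """Return the connection string of the shard that owns this key."""
    return SHARDS[shard_for(key)]
```

Every data-access path in the application has to call something like `dsn_for` before it opens a connection, which is where the "lots of code changes" come from. Plain modulo hashing also remaps most keys when the shard count changes, so it is only a starting point.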

Dev Complete - Now What?

• Can you just throw it over the wall to Ops?

• Almost no off-the-shelf tools to monitor and manage the data cloud

• DIY: the only choice is to do it yourself. Sorry.

What did we do?

• We had one MySQL that kept growing and growing

• Sharded MySQL into 7 replica sets, 2 replicas each.

• We couldn’t release it until Ops was ready to keep it up 24x7

• Built our own “glue” to manage and monitor this beast.

• We ate our own dog food

• We partnered and didn’t re-invent the wheel.

Managing the Data Cloud

• ScaleBase

• Central point of management for data cloud

• The only source of truth: keeps track of each replica, location, naming, heartbeat, load

Instant access to data in the Data Cloud

• Access DB data through the Scalebase LoadBalancer

• Can set the mode to send both queries and DML to all replicas, a subset, or just one

• We send SQL to a specific replica without knowing its location

• The only location we connect to is the Scalebase LoadBalancer

• Other 3rd party tools can also connect to the Scalebase LoadBalancer without knowing about our Data Cloud

Measure performance across your data cloud

Measure performance – Replica deep dive

Unified Alerting

• System-wide alerts all come from a single source: Scalebase

• Alerts go to PagerDuty to reach the right people on duty

• Alerts clearly identify replica set and replica node

• Allows quick resolutions by pinpointing problems in the data cloud

• NOC Response: SQL connection to troubleshoot via Scalebase

• Only need the replica and replica set from the alert to start investigating with SQL queries right away

• NOC Response: Use monitoring tool for deep dive investigation into the replica
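As a rough illustration of alerts that "clearly identify replica set and replica node", the sketch below turns per-replica heartbeat timestamps into alerts. It is not Scalebase's or PagerDuty's API: the `Alert` type, the `stale_replica_alerts` function, and the 30-second threshold are all assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Alert:
    replica_set: str  # which replica set is affected
    replica: str      # which node inside that replica set
    message: str


def stale_replica_alerts(heartbeats, now, max_age_seconds=30.0):
    """Return one alert per replica whose last heartbeat is too old.

    heartbeats maps (replica_set, replica) -> last heartbeat time in seconds.
    The threshold is a hypothetical default, not a value from the talk.
    """
    alerts = []
    for (replica_set, replica), last_seen in sorted(heartbeats.items()):
        age = now - last_seen
        if age > max_age_seconds:
            alerts.append(
                Alert(replica_set, replica, f"no heartbeat for {age:.0f}s")
            )
    return alerts
```

Because each alert carries both identifiers, whoever is on duty can go straight to the affected node instead of searching the whole data cloud.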

Synchronized maintenance tasks

• Backups

• Synchronized

• Backup is just a “job” in the Scalebase engine; Scalebase runs it on every replica

• Scalebase tracks the status of each job execution on each replica

• Schema upgrades: the upgrade program doesn't need to know where things are in the data cloud

• The upgrader just connects to Scalebase, and the upgrade SQL is sent to the whole data cloud automatically

• Configuration Changes

• Global changes can be made in SQL by connecting to Scalebase and executing the same change on ALL replicas.

• One SQL statement can be sent to all replicas by Scalebase. Any errors will be logged
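The "one statement to every replica, track the status of each" pattern can be sketched in a few lines. This is only an approximation of the idea, not how Scalebase implements it: the `run_on_all` helper is hypothetical, and in-memory SQLite databases stand in for the MySQL replicas so the example is self-contained.

```python
import sqlite3


def run_on_all(replicas, sql):
    """Run one SQL statement on every replica and record a status for each.

    replicas maps a replica name to an open DB-API connection.
    A failure on one replica is recorded and does not stop the others.
    """
    status = {}
    for name, conn in replicas.items():
        try:
            conn.execute(sql)
            conn.commit()
            status[name] = "ok"
        except sqlite3.Error as exc:
            status[name] = f"error: {exc}"
    return status


# Stand-in replicas: in-memory SQLite databases instead of MySQL nodes.
replicas = {f"rs{i}-a": sqlite3.connect(":memory:") for i in range(1, 4)}
result = run_on_all(
    replicas, "CREATE TABLE metrics (id INTEGER PRIMARY KEY, value REAL)"
)
```

Returning a per-replica status rather than raising on the first failure mirrors the slide's point that errors are logged per replica, so a partially applied change is visible and can be retried on just the nodes that failed.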

Conclusions

• Lessons Learned

• Development, test, and Ops need to work together.

• Educate more of the team

• Most problems that arise are operational, not code bugs

• The right vendors really make it easier than doing everything yourself

• Future

• Automate failback with hot spare

• Try new technologies like XtraDB Cluster.

Vendors

Questions?
