
Pluk2013 bodybuilding ratheesh

Slides from my talk at Percona Live London 2013. The talk covers database administration and how we manage Percona XtraDB at Bodybuilding.com, along with a few benchmarks of Percona Server, Fusion-io, and the XFS/ext4 file systems.
Transcript
Page 1: Pluk2013 bodybuilding ratheesh


Bodybuilding.com •  Information •  Motivation •  Supplementation

Page 2: Pluk2013 bodybuilding ratheesh

About Me..

•  Ratheesh Kaniyala

•  MySQL Database Admin @ Bodybuilding.com

•  Responsible for keeping the site up and running

•  Work with system engineers, developers, product owners to get the best out of the database system

Page 3: Pluk2013 bodybuilding ratheesh

Agenda

•  About Bodybuilding.com

•  MySQL Architecture @ Bodybuilding.com

•  Backups

•  Monitoring

•  Some Benchmarks on Fusion-io

•  NoSQL @ BBcom

Page 4: Pluk2013 bodybuilding ratheesh

Bodybuilding.com

•  What is Bodybuilding.com?

We:

•  Provide Information + Motivation + Supplementation for leading a healthy lifestyle

•  Are not just for bodybuilders

•  Have hundreds of vendors

•  Have thousands of content items (articles, exercises, videos)

•  Have millions of users – still growing

Page 5: Pluk2013 bodybuilding ratheesh
Page 6: Pluk2013 bodybuilding ratheesh
Page 7: Pluk2013 bodybuilding ratheesh
Page 8: Pluk2013 bodybuilding ratheesh

Statistics

•  Ranked the #319 US website and #709 worldwide

•  15th largest forum, with over 100 million posts

•  About 65 TB served from Akamai and 57 TB from CloudFront

•  About 25 million unique visitors per month

•  About 5,000 Bodyspace registrations per day

Page 9: Pluk2013 bodybuilding ratheesh

APPLICATIONS

•  Our main applications:

   •  Busy forum – powered by vBulletin, MySQL

   •  Bodyspace – PHP, Java, MySQL, MongoDB

   •  Store – ATG (Art Technology Group), Oracle 11g

   •  Content – PHP, MySQL

Page 10: Pluk2013 bodybuilding ratheesh

Why Percona?

•  This is what Percona says:

   •  Your queries will run faster and more consistently.
   •  You will consolidate servers on powerful hardware.
   •  You will delay sharding, or avoid it entirely.
   •  You will spend less time tuning and administering.
   •  You will achieve higher uptime.
   •  You will troubleshoot without guesswork.

•  And we found this to be true for every point.

Page 11: Pluk2013 bodybuilding ratheesh

High Level Tech Diagram

[Diagram] Components: web servers, caching layer, security layer, data layer (MySQL RDBMS, NoSQL, Sphinx/Solr)

Page 12: Pluk2013 bodybuilding ratheesh

DATA LAYER

Cluster A Cluster B Cluster C Cluster D

Page 13: Pluk2013 bodybuilding ratheesh

A Typical MySQL Cluster

[Diagram] Cluster roles: writes & some reads (active master), reads, backups, offsite backup, analytics, search indexes, query replay for lru_dump

Page 14: Pluk2013 bodybuilding ratheesh

HA

•  Multi-master asynchronous replication

•  Active – Passive mode

•  What happens when any slave dies?

•  What happens when passive master dies?

•  What happens when Active master dies?

Page 15: Pluk2013 bodybuilding ratheesh

Active Master Died – Case 1

Read only slaves

•  M2 and all slaves will end up being at the same position

•  Point all slaves to M2 with CHANGE MASTER

•  Resume traffic

Page 16: Pluk2013 bodybuilding ratheesh

Active Master Died – Case 2

Read only slaves

•  M2 and all slaves are in different states

•  Promote the most up-to-date slave as master

•  Fix the slave pool by looking into binary logs
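
To make the "most up-to-date slave" step concrete, here is a minimal sketch of how the promotion candidate could be picked, assuming Python with mysql-connector-python; the host names and credentials are placeholders, and this is not the actual tooling from the talk.

```python
# Hypothetical sketch: pick the most advanced slave after the active master dies.
# Assumes mysql-connector-python; hosts and credentials are placeholders.
import mysql.connector

SLAVES = ["slave1", "slave2", "slave3"]

def executed_position(host):
    """Return (Relay_Master_Log_File, Exec_Master_Log_Pos) from SHOW SLAVE STATUS."""
    conn = mysql.connector.connect(host=host, user="repl_admin", password="secret")
    cur = conn.cursor(dictionary=True)
    cur.execute("SHOW SLAVE STATUS")
    status = cur.fetchone()
    conn.close()
    # Binlog file names are zero-padded (mysql-bin.000123), so the
    # (file, position) tuple compares correctly as a whole.
    return status["Relay_Master_Log_File"], status["Exec_Master_Log_Pos"]

# The slave that executed the furthest into the dead master's binlogs is the
# promotion candidate; the rest get re-pointed to it with CHANGE MASTER TO,
# after reconciling the binary logs as described above.
candidate = max(SLAVES, key=executed_position)
print("Promote:", candidate)
```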

Page 17: Pluk2013 bodybuilding ratheesh

Maintenance on Active master

Read only slaves

•  Move all slaves to M2

•  Change DNS to make M2 active

Page 18: Pluk2013 bodybuilding ratheesh

Maintenance on active contd…

Read only slaves

•  Move all slaves to M2

•  Change DNS to make M2 active

Page 19: Pluk2013 bodybuilding ratheesh

Maintenance on active contd…

•  It is not that easy to move N slaves to M2 manually, because the binlog file and position on M2 do not match those on M1

•  Can we automate this slave migration? Yes

•  How? There is an algorithm that I can share

Page 20: Pluk2013 bodybuilding ratheesh

Slave migration algorithm

•  Step 1: For each slave node, STOP SLAVE

•  Step 2: Sleep for 10 seconds so that the secondary master gets slightly ahead of the stopped slaves

•  Step 3: STOP SLAVE on the secondary master and record its SHOW SLAVE STATUS output

•  Step 4: For each slave node, START SLAVE UNTIL the position recorded in step 3

•  Step 5: For each slave node, STOP SLAVE (required for changing master later)
   a) Check the slave status and stop the slave only when its current position equals the position recorded in step 3
   b) If the slave is too far behind and does not catch up to the UNTIL position within 20 seconds, skip the node and note it in the console and the log file. This node has to be handled manually later; look in the log file for directions on how to change master on it.

Page 21: Pluk2013 bodybuilding ratheesh

Slave migration algorithm contd..

•  Step 6: Run SHOW MASTER STATUS on the secondary master and record the position

•  Step 7: For each slave node, CHANGE MASTER TO the secondary master with the position recorded in step 6

•  Step 8: START SLAVE on the secondary master

•  Step 9: For each slave node, START SLAVE
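
The steps above can be condensed into a small script. Below is a sketch of one way to do it, assuming Python with mysql-connector-python; hosts, credentials, and timeouts are placeholders, error handling and logging are omitted, and this is not the actual migration tool used at Bodybuilding.com.

```python
# Sketch of steps 1-9 above (mysql-connector-python assumed).
# Hosts and credentials are placeholders; error handling/logging omitted.
import time
import mysql.connector

SLAVES = ["slave1", "slave2", "slave3"]   # read-only slave pool
M2 = "secondary-master"                    # passive master

def connect(host):
    return mysql.connector.connect(host=host, user="repl_admin", password="secret")

def run(conn, sql):
    cur = conn.cursor(dictionary=True)
    cur.execute(sql)
    return cur.fetchall() if cur.with_rows else []

slaves = {h: connect(h) for h in SLAVES}
m2 = connect(M2)
skipped = set()

# Step 1: stop replication on every slave.
for c in slaves.values():
    run(c, "STOP SLAVE")

# Step 2: let the secondary master get slightly ahead of the stopped slaves.
time.sleep(10)

# Step 3: stop the secondary master and record how far it got in M1's binlogs.
run(m2, "STOP SLAVE")
st = run(m2, "SHOW SLAVE STATUS")[0]
until_file, until_pos = st["Relay_Master_Log_File"], st["Exec_Master_Log_Pos"]

# Step 4: run every slave up to exactly that position.
for c in slaves.values():
    run(c, "START SLAVE UNTIL MASTER_LOG_FILE='%s', MASTER_LOG_POS=%d"
           % (until_file, until_pos))

# Step 5: wait for each slave to reach the UNTIL position, then stop it.
for host, c in slaves.items():
    for _ in range(20):
        st = run(c, "SHOW SLAVE STATUS")[0]
        if (st["Relay_Master_Log_File"], st["Exec_Master_Log_Pos"]) == (until_file, until_pos):
            run(c, "STOP SLAVE")   # required before CHANGE MASTER (step 5)
            break
        time.sleep(1)
    else:
        # Step 5b: too far behind -- leave this node for manual fix-up.
        print("WARNING: %s did not catch up; change master on it manually" % host)
        skipped.add(host)

# Step 6: record the secondary master's own binlog coordinates.
ms = run(m2, "SHOW MASTER STATUS")[0]
new_file, new_pos = ms["File"], ms["Position"]

# Steps 7 and 9: re-point each healthy slave at the secondary master and restart it.
# (Replication credentials are assumed to be retained from the existing config.)
for host, c in slaves.items():
    if host in skipped:
        continue
    run(c, "CHANGE MASTER TO MASTER_HOST='%s', MASTER_LOG_FILE='%s', MASTER_LOG_POS=%d"
           % (M2, new_file, new_pos))
    run(c, "START SLAVE")

# Step 8: resume replication on the secondary master.
run(m2, "START SLAVE")
```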

Page 22: Pluk2013 bodybuilding ratheesh

Backups

•  Nightly logical backups (mysqldump)

•  Every 2 hours – LVM snapshot backups of the MySQL datadir:

•  Stop slave

•  Take snapshot

•  All backups are done on a dedicated backup slave (B1)

•  Copy to NAS

•  Be ready for next snapshot

•  The mylvmbackup script provides various hook points for calling custom scripts

•  Note: if you are not validating your backups, do you really care about your data?

Page 23: Pluk2013 bodybuilding ratheesh

Snapshot Process

[Diagram] mylvmbackup hook points in the snapshot flow: preconnect → premount → backupsuccess → update monitoring

Page 24: Pluk2013 bodybuilding ratheesh

Backupsuccess

•  Start MySQL on the snapshot data – this runs MySQL recovery and keeps the snapshot ready to use

•  Validate the backup by running custom queries, counts, statistics, etc., so you are sure it can be used for server rebuilds and recovery
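
A rough sketch of what such a backupsuccess hook could look like, assuming Python; the datadir, port, user, and validation query are placeholders rather than the actual hook used here.

```python
# Hypothetical backupsuccess hook: start mysqld on the snapshot copy so InnoDB
# recovery runs, then run sanity checks. Paths, port, and the check query are
# placeholders, not the production hook.
import subprocess
import time
import mysql.connector

SNAPSHOT_DATADIR = "/mnt/snapshot/mysql"   # where mylvmbackup mounted the snapshot
PORT = 3307                                # throwaway instance, away from production

# 1. Bring up a temporary mysqld on the snapshot data so crash recovery runs
#    and the copy is proven usable for rebuilds and recovery.
subprocess.Popen([
    "mysqld",
    "--datadir=" + SNAPSHOT_DATADIR,
    "--port=%d" % PORT,
    "--socket=/tmp/mysql_backup_check.sock",
])
time.sleep(60)  # crude wait for the instance to come up

# 2. Run custom queries, counts, statistics, etc. against the snapshot.
conn = mysql.connector.connect(host="127.0.0.1", port=PORT,
                               user="backup_check", password="secret")
cur = conn.cursor()
cur.execute("SELECT COUNT(*) FROM forum.post")   # placeholder check
(rows,) = cur.fetchone()
if rows == 0:
    raise SystemExit("backup validation failed: no rows in forum.post")
print("backup looks usable: %d rows in forum.post" % rows)
```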

Page 25: Pluk2013 bodybuilding ratheesh

Monitoring

•  We use Zabbix for monitoring – alerting + trending

•  Close to 100,000 checks

•  MySQL Performance Monitor plugin for InnoDB

•  Custom scripts can use the Zabbix trapper – e.g., data size trend (sketch below)

•  Dbview (a PHP-based tool) for real-time processlist and slow query history

•  Red light is ON – means trouble
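
As an illustration of a trapper-based check, here is a small sketch that pushes a data-size value to Zabbix, assuming Python and the standard zabbix_sender utility; the host names, credentials, and item key are made up.

```python
# Hypothetical data-size trend check pushed to Zabbix via zabbix_sender.
# Item key, hosts, and credentials are placeholders; the trapper item must
# already be defined on the Zabbix side.
import subprocess
import mysql.connector

conn = mysql.connector.connect(host="db-backup-slave", user="monitor", password="secret")
cur = conn.cursor()

# Total data + index size across all schemas, from information_schema.
cur.execute("""
    SELECT COALESCE(SUM(data_length + index_length), 0)
    FROM information_schema.tables
""")
(total_bytes,) = cur.fetchone()

# Push the value to the Zabbix trapper item.
subprocess.check_call([
    "zabbix_sender",
    "-z", "zabbix.example.com",    # Zabbix server
    "-s", "db-backup-slave",       # host the item belongs to
    "-k", "mysql.total_data_bytes",
    "-o", str(total_bytes),
])
```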

Page 26: Pluk2013 bodybuilding ratheesh

Monitoring Contd..

Page 27: Pluk2013 bodybuilding ratheesh
Page 28: Pluk2013 bodybuilding ratheesh
Page 29: Pluk2013 bodybuilding ratheesh

DBVIEW

Page 30: Pluk2013 bodybuilding ratheesh

DBVIEW contd..

Page 31: Pluk2013 bodybuilding ratheesh

Benchmarking

•  To understand the capacity of the hardware and software

•  To understand what happens when you have a high traffic day

•  To understand the impact of configuration changes

•  To test the storage unit

Page 32: Pluk2013 bodybuilding ratheesh

Benchmarking with linkbench

•  LinkBench from Facebook simulates a social-graph workload

•  The default workload pattern roughly matches our workload

•  https://github.com/facebook/linkbench

•  https://www.facebook.com/notes/facebook-engineering/linkbench-a-database-benchmark-for-the-social-graph/10151391496443920

•  Configurable read/write operation mix; requests can be throttled

•  Can target hot nodes

•  Extensible – you can write plugins (not tried!)

•  Note: LinkBench partitions the linktable. If you want to benchmark without partitioning, create the table unpartitioned and then load the data.

Page 33: Pluk2013 bodybuilding ratheesh

Linkbench results

•  Ops/Sec per thread

•  [Thread-122]: 2190000/100000000 requests finished: 2.2% complete at 28311.8 ops/sec 77.4/100000

•  Ops/Sec Total

•  [main]: REQUEST PHASE COMPLETED. 100000000 requests done in 2641 seconds. Requests/second = 37852
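
A throwaway helper for pulling those ops/sec figures out of the benchmark log, assuming Python and the output format shown above (this is not part of LinkBench itself); pipe the request-phase log through it on stdin.

```python
# Parse ops/sec figures out of LinkBench request-phase output (format as shown above).
import re
import sys

thread_rate = re.compile(r"complete at ([\d.]+) ops/sec")
total_rate = re.compile(r"Requests/second = (\d+)")

for line in sys.stdin:
    m = thread_rate.search(line)
    if m:
        print("per-thread sample: %.1f ops/sec" % float(m.group(1)))
    m = total_rate.search(line)
    if m:
        print("request phase total: %s ops/sec" % m.group(1))
```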

Page 34: Pluk2013 bodybuilding ratheesh

Benchmark Cases and Parameters

•  Database Size – 175 GB, 400 GB, 40 GB

•  RAM – 64 GB

•  innodb_buffer_pool_size – 50 GB for all tests

•  Storage – Fusion ioDrive2

•  File System – XFS, EXT4

•  Database Server Specs – Dell R620, 2 sockets × 8 cores with HT => 32 logical processors, Intel Xeon E5-2680, Debian Linux Squeeze, 2.6.32 kernel

Page 35: Pluk2013 bodybuilding ratheesh

Innodb settings for fusionio

•  innodb_read_ahead=0

•  innodb_read_io_threads=8

•  innodb_adaptive_checkpoint=keep_average

•  innodb_flush_method=O_DIRECT

•  innodb_io_capacity=10000

•  innodb_flush_neighbor_pages=0
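
A quick way to confirm what the running server actually reports for these settings is to query them, as in this sketch (Python assumed); the exact variable names differ between Percona Server 5.1 and 5.5 builds, so the list is illustrative.

```python
# Print the InnoDB settings relevant to the Fusion-io tuning above.
# Variable names vary between Percona Server releases; missing ones are reported.
import mysql.connector

SETTINGS = [
    "innodb_read_ahead",
    "innodb_read_io_threads",
    "innodb_adaptive_checkpoint",
    "innodb_flush_method",
    "innodb_io_capacity",
    "innodb_flush_neighbor_pages",
]

conn = mysql.connector.connect(host="localhost", user="monitor", password="secret")
cur = conn.cursor()
for name in SETTINGS:
    cur.execute("SHOW GLOBAL VARIABLES LIKE %s", (name,))
    row = cur.fetchone()
    if row:
        print("%s = %s" % row)
    else:
        print("%s not present in this build" % name)
```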

Page 36: Pluk2013 bodybuilding ratheesh

Benchmark results

[Bar chart] Ops/sec on XFS vs EXT4 – Percona 5.1: XFS 25000, EXT4 30000; Percona 5.5: XFS 33000, EXT4 38000. LinkBench, 175G database on Fusion ioDrive2, 200 threads, 70:30 RW mix.

Page 37: Pluk2013 bodybuilding ratheesh

Innodb Mutex in 5.1

Page 38: Pluk2013 bodybuilding ratheesh

Bigger Database

[Bar chart] Ops/sec on XFS vs EXT4 – XFS 10000, EXT4 13000. Percona 5.5.33, 400G database, 200 threads, 70:30 RW mix.

Page 39: Pluk2013 bodybuilding ratheesh

Fusion Fullness

[Bar chart] Ops/sec on XFS vs EXT4 with the card more full (175G DB + 400G junk: XFS 21000, EXT4 33000) vs less full (XFS 33000, EXT4 38000). Percona 5.5.33, 175G DB, 200 threads, 70:30 RW mix.

Page 40: Pluk2013 bodybuilding ratheesh

Fragmentation Effect

[Bar chart] Ops/sec on XFS vs EXT4 – XFS 27000, EXT4 36000. Percona 5.5.33, 175G DB, 200 threads, 70:30 RW mix.

Page 41: Pluk2013 bodybuilding ratheesh

Doublewrite Effect

[Bar chart] Ops/sec with doublewrite ON (38000) vs doublewrite OFF (45000). Percona 5.5.33, 175G DB, 200 threads, 70:30 RW mix.

•  No atomicity without doublewrite

•  Percona supports atomic writes using the DirectFS NVM filesystem

Page 42: Pluk2013 bodybuilding ratheesh

READ Workload

[Bar chart] Ops/sec on XFS vs EXT4 – roughly 59000 vs 59500. 90% reads, 10% writes, 200 threads, Percona 5.5.33, 175G DB.

Page 43: Pluk2013 bodybuilding ratheesh

WRITE Workload

[Bar chart] Ops/sec on XFS vs EXT4 – XFS 6000, EXT4 14000. 90% writes, 10% reads, 200 threads, Percona 5.5.33, 175G DB.

Page 44: Pluk2013 bodybuilding ratheesh

More threads

[Bar chart] Ops/sec on XFS vs EXT4 – XFS 18000, EXT4 35000. Percona 5.5.33, 175G DB, 400 threads, 70:30 RW mix.

Page 45: Pluk2013 bodybuilding ratheesh

Bufferpool Instances

[Bar chart] Ops/sec with 1, 2, and 8 buffer pool instances, all in the 37000–40000 range. Percona 5.5.33, 175G DB, 200 threads, 70:30 RW mix. Does NUMA play a role?

Page 46: Pluk2013 bodybuilding ratheesh

1 bufferpool instance

Page 47: Pluk2013 bodybuilding ratheesh

Low concurrency

•  Is important because the workload may not always be highly concurrent

•  System resources are best utilized under high concurrency, but there are plenty of situations where only 4 or 8 threads are running

•  How does the database behave in these situations?

•  Could there be a performance regression?

Page 48: Pluk2013 bodybuilding ratheesh

Low concurrency contd..

[Bar chart] Ops/sec at low concurrency – 4 threads: Percona 5.1 7500, Percona 5.5 7300; 8 threads: Percona 5.1 11000, Percona 5.5 10500.

Page 49: Pluk2013 bodybuilding ratheesh

O_DIRECT vs Buffered

[Bar chart] Ops/sec on XFS vs EXT4 – buffered: XFS 16500, EXT4 30000; O_DIRECT: XFS 33000, EXT4 38000.

Page 50: Pluk2013 bodybuilding ratheesh

EXT4 Buffered vs XFS Buffered

Page 51: Pluk2013 bodybuilding ratheesh

CPU USAGE

Page 52: Pluk2013 bodybuilding ratheesh

Await

Page 53: Pluk2013 bodybuilding ratheesh

Fusion Throughput

Page 54: Pluk2013 bodybuilding ratheesh

Innodb Semaphores

Page 55: Pluk2013 bodybuilding ratheesh

Innodb Data RW

Page 56: Pluk2013 bodybuilding ratheesh

EXT4 O_DIRECT vs XFS O_DIRECT

Page 57: Pluk2013 bodybuilding ratheesh

CPU Usage

Page 58: Pluk2013 bodybuilding ratheesh

Fusion Throughput

Page 59: Pluk2013 bodybuilding ratheesh

Innodb data RW

Page 60: Pluk2013 bodybuilding ratheesh

Innodb Semaphores

Page 61: Pluk2013 bodybuilding ratheesh

Innodb Transactions

Page 62: Pluk2013 bodybuilding ratheesh

NOSQL

•  User feeds, Nested comments, Personalization

•  User feeds can be nasty

•  Some users have thousands of friends

•  Millions of feeds

•  Feeds can have events that cannot share a schema

•  E.g. Somebody updated weight != somebody updated bodyfat %

•  MongoDB – a savior plus a killer

•  Savior because of its schemaless nature

•  Killer because of its locking issues and index immaturity

•  Sharding may solve the problem, but this is not a shardable problem

Page 63: Pluk2013 bodybuilding ratheesh

User Feeds

[Diagram] Feeds collection: { _id, Text, Date, Event_ids: [Id1, Id2, …] }; Events collection: { _id, Weight }, { _id, BodyFat }
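
A minimal pymongo sketch of the feeds/events split shown in the diagram; the field names follow the diagram, while the database name and sample values are invented for illustration.

```python
# Sketch of the feeds/events document model from the diagram (pymongo assumed).
# Database name and sample values are placeholders.
from datetime import datetime
from pymongo import MongoClient

db = MongoClient("mongodb://localhost:27017")["bodyspace"]

# Events carry whatever fields the event type needs -- no shared schema.
weight_id = db.events.insert_one({"Weight": 82.5}).inserted_id
bodyfat_id = db.events.insert_one({"BodyFat": 14.2}).inserted_id

# A feed entry ties the text shown to the user to the underlying events.
db.feeds.insert_one({
    "Text": "Updated weight and body fat",
    "Date": datetime.utcnow(),
    "Event_ids": [weight_id, bodyfat_id],
})

# Rendering a user feed: fetch recent feed docs, then look up their events.
for feed in db.feeds.find().sort("Date", -1).limit(20):
    events = list(db.events.find({"_id": {"$in": feed["Event_ids"]}}))
    print(feed["Text"], events)
```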

Page 64: Pluk2013 bodybuilding ratheesh

Thank You

