PERCONA LIVE EUROPE: AMSTERDAM

OCTOBER 5, 2016

Jeremy Tinley, Senior MySQL Operations Engineer

Twitter: @techwolf359

SSDs at Etsy: A War Story

Why Are We Here?

What is this talk about?
• Evolution of sharded databases at Etsy
• What problems we faced along the way
• How SSDs made everything better

What is this talk NOT about?
• Cloud, Serverless, DevOps, Containers
• A deep dive into how SSDs work (hint: it's magic)

What will there definitely be?
• Hardware Specs, Vendors and Models
• Slides Online After Presentation
• Cat Pictures

MySQL Architecture at Etsy

Three Main Databases
• Shards: All User Generated Data
• Tickets: Globally Unique IDs
• Index: ID to Shard Mapping, Convenience Data

Active-Active Reads+Writes
• id % 2: odd goes to A, even goes to B
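
The odd/even rule above can be sketched in a few lines. This is only an illustration: the function name `side_for_id` is hypothetical, and "A" and "B" are simply labels for the two sides of an active-active pair.

```python
def side_for_id(record_id: int) -> str:
    """Return which side of an active-active pair handles this ID.

    Mirrors the rule on the slide: id % 2, odd goes to A, even goes to B.
    The function name is illustrative and not taken from the talk.
    """
    return "A" if record_id % 2 == 1 else "B"


if __name__ == "__main__":
    for record_id in (1, 2, 3, 10):
        print(record_id, "->", side_for_id(record_id))
```

Because both sides accept reads and writes, a split like this spreads traffic across the pair while keeping any single ID on one side.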

MySQL Architecture at Etsy

Data Lifecycle
• Fetch a new unique ID from tickets
• Pick a shard location and write mapping to index
• Write user data to shards

All production databases are physical hosts in a data center
• No containers
• No virtualization
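
The three-step data lifecycle can be sketched with in-memory stand-ins for the tickets, index and shards databases. Everything here is an assumption made for illustration: the names, the shard count, and especially the modulo placement in `pick_shard`, since the talk does not say how a shard location is picked.

```python
import itertools

NUM_SHARDS = 4  # hypothetical shard count, for illustration only

ticket_counter = itertools.count(1)          # stands in for the tickets database
index = {}                                   # stands in for the index database: id -> shard
shards = {n: {} for n in range(NUM_SHARDS)}  # stand in for the shard databases


def next_ticket() -> int:
    """Step 1: fetch a new globally unique ID."""
    return next(ticket_counter)


def pick_shard(record_id: int) -> int:
    """Choose a shard. Assumption: simple modulo placement."""
    return record_id % NUM_SHARDS


def create_record(data: dict) -> int:
    record_id = next_ticket()
    shard = pick_shard(record_id)
    index[record_id] = shard          # Step 2: write the ID -> shard mapping
    shards[shard][record_id] = data   # Step 3: write the user data to the shard
    return record_id


def read_record(record_id: int) -> dict:
    """Reads consult the index first, then go to the mapped shard."""
    return shards[index[record_id]][record_id]
```

Keeping the mapping in a separate index is what later allows data to move between hosts without changing IDs.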

Shards v1

Shards v1 - Architecture

Hardware
• (60) HP G8 / 96GB / 160GB x16 RAID-10 (1.1TB)

Logical Layout
• Active-Active / Master-Master Replication
• 1 database on 1 MySQL instance per server
• MySQL 5.1 -> 5.5

Shards v1 - Problems

Problem: Consistently Running out of Disk Capacity
• User generated data was growing fairly linearly
• Data generated *about* users grew faster
• Ended with 30 pairs of servers

Problem: Migration of Data Was Painful
• Row-by-Row migration of data
• Set a migration lock on index to stop writes

Shards v2

Shards v2 - Architecture

Hardware
• (60) Dell R720 / 128GB / 320GB x16 RAID-10 (2.2TB)
• (60) HP G8 / 96GB / 160GB x16 RAID-10 (1.1TB)

Logical Layout
• Active-Active / Master-Master Replication
• 1.1TB: 10 databases on 1 MySQL instance per server
• 2.2TB: 22 databases on 1 MySQL instance per server
• MySQL 5.5

Shards v2 - Architecture

Problem Solved: Disk Capacity
• Was 60TB, Now 180TB
• Doubled the server footprint instead of tripling it

Problem Solved: Migration Complexity
• 960 database “buckets”
• Expand by relocating a database onto another host
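
The capacity and bucket figures above can be checked against the hardware slides. This sketch assumes servers come in A/B pairs that hold the same databases, so each pair contributes one set of buckets. The per-server sizes are the slide values, which is why the raw totals (66TB and 198TB) sit a little above the rounded 60TB and 180TB quoted above.

```python
# (server count, usable GB per server, databases per server), from the slides
V1_FLEET = [(60, 1100, 1)]
V2_FLEET = [(60, 2200, 22), (60, 1100, 10)]


def total_capacity_gb(fleet):
    """Sum usable capacity across every server in the fleet."""
    return sum(count * gb for count, gb, _ in fleet)


def bucket_count(fleet):
    """Count distinct databases.

    Assumption: servers are deployed as A/B pairs holding the same
    databases, so only one server per pair is counted.
    """
    return sum((count // 2) * dbs for count, _, dbs in fleet)


if __name__ == "__main__":
    v1_gb = total_capacity_gb(V1_FLEET)
    v2_gb = total_capacity_gb(V2_FLEET)
    print("v1 capacity (TB):", v1_gb / 1000)
    print("v2 capacity (TB):", v2_gb / 1000)
    print("growth factor:", v2_gb / v1_gb)
    print("v2 buckets:", bucket_count(V2_FLEET))
```

Under that pairing assumption, 30 pairs with 22 databases plus 30 pairs with 10 databases gives the 960 buckets on the slide.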

Shards v2 - Problems

Problem: Data Redundancy
• Starting with 60a+60b physical servers
• Adding 60 4-hour delayed replicas
• Adding 60 offsite replicas
• Faced with 240 servers

Problem: Running on Half Servers Every Week
• Schema change process is to pull A, apply on A, put A back in & repeat on B
• Suffering a double server failure is unlikely, but why risk it?
• Adding another realtime replica to A+B == 6 copies of data
• Faced with 360 servers!
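
The server counts on this slide follow from one server per copy of the data for each of the 60 server slots. A minimal sketch of that arithmetic, with role labels invented for readability:

```python
SERVERS_PER_COPY = 60  # "60a+60b": 60 servers on side A, 60 on side B

masters = ["master A", "master B"]
redundancy = ["4-hour delayed replica", "offsite replica"]
realtime = ["realtime replica of A", "realtime replica of B"]


def servers_needed(copies):
    """One physical server per copy of the data, per server slot."""
    return SERVERS_PER_COPY * len(copies)


if __name__ == "__main__":
    print("masters only:", servers_needed(masters))
    print("with delayed + offsite:", servers_needed(masters + redundancy))
    all_copies = masters + redundancy + realtime
    print("with realtime replicas:", servers_needed(all_copies))
    print("copies of data:", len(all_copies))
```

Each extra copy costs another 60 servers, which is how four copies reach 240 servers and six copies reach 360.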

Shards v2 - Problems

Problem: 360 Servers is Too Many
• DBA staff of 2
• Automation exists but is not evolved
• Cost inefficient by way of power, data center space and time
• Maintenance is very time consuming (patching, upgrades, firmware)

Problem: Warranty Expiration
• Half of production expiring within 12 months

Shards v3

Shards v3 - Non-Master Replicas First

Hardware
• (13) Dell R630 / 384GB / 960GB 12/12 RAID-6 SSD (19.2TB)
• (13) Dell R630 / 384GB / 960GB x10 RAID-6 SSD (7.6TB)

Logical Layout
• 19.2TB: Multi-Instance per Server for real-time, delayed
• 7.6TB: Multi-Instance per Server for offsite
• MySQL 5.5
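
The usable sizes quoted for these servers are consistent with RAID-6 giving up two drives' worth of capacity per array to parity. The sketch below reads "12/12" as two separate 12-drive RAID-6 arrays. That reading is an assumption, although it reproduces the 19.2TB figure. The helper name `raid6_usable_gb` is hypothetical.

```python
def raid6_usable_gb(drives: int, drive_gb: int, arrays: int = 1) -> int:
    """Usable capacity of one or more identical RAID-6 arrays.

    RAID-6 stores two drives' worth of parity per array, so each array
    yields (drives - 2) * drive_gb of usable space.
    """
    if drives < 4:
        raise ValueError("RAID-6 needs at least 4 drives per array")
    return arrays * (drives - 2) * drive_gb


if __name__ == "__main__":
    # Assumption: "12/12" means two 12-drive arrays in one chassis.
    print("12/12 x 960GB:", raid6_usable_gb(12, 960, arrays=2) / 1000, "TB")
    print("x10 x 960GB:", raid6_usable_gb(10, 960) / 1000, "TB")
```

The 10-drive result is 7.68TB, which the slide lists as 7.6TB.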

Shards v3 - Non-Master Replicas First

Problems Solved: Data Redundancy, Running on Half Servers, 360 is Too Many
• 26 servers doing the work of 240 servers
• 1U instead of 2U chassis
• Tested running a master on a consolidated server: it worked!

Confidence made us think, why not start replacing everything with SSDs?

Shards v3 - Hardware Issues

Upgrading Index
• Replaced with hardware similar to the previous servers
• Ran for less than 24 hours before it crashed
• Multiple disk failures: losing 3 disks in RAID-6 is an array failure

Time to go to Dell
• Replaced with Intel 800GB (3610)
• Problem solved!

Shards v3 - Hardware Issues

Consolidated Servers Started Crashing
• SSD vendor was LITEON
• Issue with garbage collection and controller timeouts
• Firmware upgrade was supposed to fix it, but it didn’t
• Continued to have issues with drives being kicked out of the array
• Also had problems with over-utilization/write endurance on SSDs
• Replaced with Samsung 960GB (PM863) drives that have a higher write endurance
• Both problems solved!

Shards v3 - Planning

Slow Down, Re-evaluate
• What is our goal?
• How can we avoid more nightmares?

Goal was Server Density
• How much can you fit into a single server?
• How can capacity continue to be easy to expand?

Shards v3 - Planning

Wrote a Document Detailing the Project
• Start with Problem Statement: “We Have Too Many Servers”
• Key Wins:
  • Schema Change Speed Faster on SSDs
  • Power Utilization
  • Data Center Space Reduction
• Detailed Technical Implementation
  • “…but will it scale?”
  • How do splits work?
• Deployment Plan
• Risks and Unknowns

Circulated the Document Widely

Shards v3 - Architecture

Hardware
• (30) Dell R630 / 512GB / 800GB 12x12 RAID-6 (15TB)

Logical Layout
• Active-Active / Master-Master Replication
• (20) 22 databases x 3 instances per server [66 dbs]
• (10) 10 databases x 6 instances per server [60 dbs]
• MySQL 5.5
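
The logical layout above can be cross-checked against the 960 database "buckets" from Shards v2. The sketch assumes every bucket exists once on side A and once on side B of the active-active pair. That interpretation is not stated in the talk, but it matches the numbers exactly.

```python
# (servers, databases per instance, instances per server), from the slide
V3_MASTERS = [(20, 22, 3), (10, 10, 6)]

BUCKETS = 960  # database "buckets" from the Shards v2 slide
SIDES = 2      # assumption: each bucket lives on both side A and side B


def databases_hosted(layout):
    """Total databases hosted across all servers in the layout."""
    return sum(servers * dbs * instances for servers, dbs, instances in layout)


if __name__ == "__main__":
    for servers, dbs, instances in V3_MASTERS:
        print(f"{servers} servers x {dbs * instances} dbs each")
    print("total databases hosted:", databases_hosted(V3_MASTERS))
    print("expected (buckets x sides):", BUCKETS * SIDES)
```

The two instance sizes mirror the old 22-database and 10-database servers, stacked three and six to a host.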

Shards v3 - Architecture

Problem Solved: 360 is Too Many
• Originally 120 servers, was projected to be 360 servers
• Now we only have 56!
• Started with 60TB, then 180TB, now 450TB
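
The totals on this slide appear to combine figures from earlier slides. The sketch below assumes the 56 servers are the 30 new master servers plus the 26 consolidated replica servers, and that 450TB is the master fleet at 15TB per server. Both readings are inferences from the numbers, not statements from the talk.

```python
MASTER_SERVERS = 30      # Dell R630 masters from the v3 architecture slide
REPLICA_SERVERS = 26     # 13 + 13 consolidated replica servers
MASTER_USABLE_TB = 15    # usable capacity per master server
PROJECTED_SERVERS = 360  # where the v2 design was heading

total_servers = MASTER_SERVERS + REPLICA_SERVERS
master_capacity_tb = MASTER_SERVERS * MASTER_USABLE_TB
servers_avoided = PROJECTED_SERVERS - total_servers

if __name__ == "__main__":
    print("total servers:", total_servers)
    print("master capacity (TB):", master_capacity_tb)
    print("servers avoided vs projection:", servers_avoided)
```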

Shards v3 - Graphs - Site Performance

Site Performance During Schema Change is Bad!
• We Pull Side A
• Side B Receives Side A Traffic but is Cold
• I/O Wait Jumps
• PHP Response Time Gets Much Slower
• 15-30 Minutes for Warm-Up

SSDs Solve This!
• Random Reads are Faster
• Swinging A to B Still Incurs Buffer Pool Churn
• I/O is no longer a bottleneck
• Site Performance stays Steady

Shards v3 - Graphs - CPU Utilization

How do 2 years of CPU evolution stack up?
• Pretty amazing, actually.
• A single 10-database instance runs at 10% CPU
• Six 10-database instances run at 15% CPU
• 50% increase in CPU for 6x density
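
The "50% increase" is relative to the single-instance baseline (10% to 15% CPU), not 50 percentage points. A short worked check using only the slide's figures:

```python
single_instance_cpu = 10  # % CPU with one 10-database instance
six_instance_cpu = 15     # % CPU with six 10-database instances
instances = 6

# Relative growth in CPU use when going from 1 to 6 instances.
relative_increase = (six_instance_cpu - single_instance_cpu) / single_instance_cpu

# Average CPU share attributable to each instance at 6x density.
cpu_per_instance = six_instance_cpu / instances

if __name__ == "__main__":
    print(f"relative CPU increase: {relative_increase:.0%}")
    print(f"CPU per instance at 6x density: {cpu_per_instance}%")
```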

Shards v3 - Graphs - Query Performance

At 3-6x density, how will this impact query latency?
• Old hardware was 707µs, new hardware is 359µs!

Shards v3 - Graphs - Other Wins

What kind of wins do we see by reducing hardware counts so significantly?
• 24k watts of power down to 8k watts of power
• Apparently it uses a lot of power to keep disks spinning

Backup Times Improved
• New servers had 10Gbit NICs
• Shuffled the backup servers around to eliminate port congestion
• 150mb throttle to no throttle
• 9 hours to 1 hour for backups!

Management of Servers Greatly Improved
• Upgraded to MySQL 5.6 in a week
• Top-level masters took only 2 days

Lessons Learned

1. Planning Gives You Confidence
2. Team Smart vs You Smart
3. Estimating Scaling Can Be Tricky
4. Learn How to Performance Test Disks
5. Don’t Fear Large Change
6. Monitor Write Endurance
7. Graph Disk Performance
