C* Summit EU 2013: Cassandra on Flash: Performance & Efficiency Lessons Learned

transcript

Cassandra: No Moving Parts

Cassandra on Flash Memory

Matt Kennedy

(@mattmorefaster)

October 17, 2013

What is this talk about?

▸ Efficiency
• Definition: noun 1. The state or quality of being efficient.

▸ Efficient
• Definition: adjective 1. (especially of a system or machine) achieving maximum productivity with minimum wasted effort or expense

#CassandraEU

Flash vs Disk Cost Efficiency

▸ Capacity: 4TB (disk) vs 3TB (flash)

▸ IOPS: 150 (disk) vs 200,000 (flash)

▸ Cost per IOP: $$$$ (disk) vs ¢¢¢¢ (flash)

What is flash?

NAND Flash Memory

Flash is a persistent memory technology invented by Dr. Fujio Masuoka at Toshiba in 1980.

[Diagram: flash memory cell, with labels Bit Line, Source Line, Word Line, Control Gate, Floating Gate]

Consumer Volume Drives Economics

Flash in Servers

Direct Cut Through Architecture

[Diagram: LEGACY APPROACH vs FUSION DIRECT APPROACH, with labels Host CPU, RAID Controller, Super Capacitors, Data Path Controller]

The goal of every I/O operation is to move data to/from DRAM and flash.

Cassandra I/O - Writes

http://www.datastax.com/docs/1.2/dml/about_writes

Cassandra I/O - Reads

http://www.datastax.com/docs/1.2/dml/about_reads

DRAM Dictates Cassandra Scaling

▸Key Design Principle:

▸Working Set < DRAM

Cost of DRAM Modules

[Chart: relative cost ($) of 4GB, 8GB, 16GB, and 32GB DRAM modules]

When do we scale out?

▸A typical server…

CPU Cores: 32 with HT

Memory: 128 GB

…is your working set > 128GB?

Is there a better way?

▸With NoSQL Databases, we tend to scale out for DRAM

Combined Resources

CPU Cores: 192

Memory: 768 GB

• Low CPU utilization

• High Utility cost
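The scale-out arithmetic behind these totals can be sketched as follows. This is a minimal illustration, assuming the 32-core / 128 GB server from the "typical server" slide; the function name and the 768 GB working set are hypothetical, chosen only to reproduce the combined figures above.

```python
import math

def nodes_for_working_set(working_set_gb, dram_per_node_gb=128, cores_per_node=32):
    """Return (nodes, total_cores, total_dram_gb) when scaling out purely for DRAM.

    Assumption: every node is identical and the only sizing rule is that
    the cluster's combined DRAM must hold the working set.
    """
    nodes = math.ceil(working_set_gb / dram_per_node_gb)
    return nodes, nodes * cores_per_node, nodes * dram_per_node_gb

# Hypothetical 768 GB working set on 128 GB servers.
nodes, cores, dram = nodes_for_working_set(768)
print(nodes, cores, dram)
```

Sizing for DRAM alone pulls in CPU cores whether or not the workload needs them, which is where the low CPU utilization comes from.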

Flash Offers A New Architectural Choice

▸ Nanoseconds (10^-9): CPU Cache, DRAM

▸ Microseconds (10^-6): Server-based Flash

▸ Milliseconds (10^-3): Disk Drives

How can we use flash in Cassandra?

Four Deployment Options

1. All Flash

2. Data Placement (CASSANDRA-2749)

3. Use Logical Data Centers

4. Cache Layer

Cassandra with All-Flash Storage

Step 1: Mount ioMemory at /var/lib/cassandra

Step 2:

Data Placement

▸ https://issues.apache.org/jira/browse/CASSANDRA-2749
• Thanks Marcus!

▸ Takes advantage of filesystem hierarchy

▸ Use mount points to pin Keyspaces or Column Families to flash:
• /var/lib/cassandra/data/{Keyspace}/{CF}

▸Use flash for high performance needs, disk for capacity needs
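A rough sketch of the directory layout described above, to show how a mount point decides which data lands on flash. The helper names, keyspace names, and mount list are hypothetical; only the /var/lib/cassandra/data/{Keyspace}/{CF} pattern comes from the slide.

```python
import posixpath

DATA_ROOT = "/var/lib/cassandra/data"  # layout from the slide

def column_family_dir(keyspace, column_family, data_root=DATA_ROOT):
    """Build the on-disk directory for one column family."""
    return posixpath.join(data_root, keyspace, column_family)

def is_pinned_to_flash(keyspace, column_family, flash_mounts):
    """True if the CF directory sits at or below one of the flash mount points.

    flash_mounts is a hypothetical list of paths where flash devices are
    mounted; a keyspace-level mount covers every CF inside it.
    """
    target = column_family_dir(keyspace, column_family)
    for mount in flash_mounts:
        mount = mount.rstrip("/")
        if target == mount or target.startswith(mount + "/"):
            return True
    return False

# Hypothetical example: flash mounted over one whole keyspace.
mounts = ["/var/lib/cassandra/data/users"]
print(is_pinned_to_flash("users", "sessions", mounts))
print(is_pinned_to_flash("archive", "events", mounts))
```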

Data Centers for Storage Control

▸ DC1 (Interactive requests)

▸ DC2 (Hadoop MR Jobs)

▸ DC3 (High density replicas)

[Diagram: a single Cassandra cluster containing the three data centers; diagram labels: PERFORMANCE, CAPACITY/NODE, MEDIUM]
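One way to express per-data-center replicas is a keyspace that uses NetworkTopologyStrategy. The sketch below only builds the CQL text; the keyspace name and the replica counts are hypothetical and do not come from the talk, and the data center names must match whatever the cluster's snitch reports.

```python
def create_keyspace_cql(keyspace, replicas_per_dc):
    """Build a CREATE KEYSPACE statement with one replica count per data center.

    replicas_per_dc maps a data center name to its replication factor.
    """
    parts = ["'class': 'NetworkTopologyStrategy'"]
    for dc, rf in replicas_per_dc.items():
        parts.append("'{}': {}".format(dc, int(rf)))
    return "CREATE KEYSPACE {} WITH replication = {{{}}};".format(
        keyspace, ", ".join(parts)
    )

# Hypothetical replica counts for the three data centers on the slide.
print(create_keyspace_cql("demo", {"DC1": 2, "DC2": 1, "DC3": 1}))
```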

Flash Caching

▸ Use flash to cache blocks from spinning disk
• Larger, cheaper caches than DRAM
• Helps stabilize performance during compaction

▸ Open-source & commercial options:
• Flashcache: Facebook-developed write-through/back/around cache
  ▸ Kernel patch
  ▸ https://github.com/facebook/flashcache/
• bcache: write-through/back/around cache
  ▸ Kernel patch
  ▸ http://bcache.evilpiepirate.org/
• Fusion ioTurbine: write-through, commercially supported
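The three write policies named above differ in where a write lands first. The toy model below is a sketch for illustration only: the class and its fields are hypothetical, it has no eviction, and it is not how Flashcache, bcache, or ioTurbine are implemented.

```python
class BlockCache:
    """Toy model of a flash cache in front of a slower disk."""

    POLICIES = ("write-through", "write-back", "write-around")

    def __init__(self, policy):
        if policy not in self.POLICIES:
            raise ValueError("unknown policy: " + policy)
        self.policy = policy
        self.flash = {}     # cached blocks
        self.disk = {}      # backing store
        self.dirty = set()  # blocks in flash not yet on disk (write-back only)

    def write(self, block, data):
        if self.policy == "write-through":
            # Write to both; disk is always current.
            self.flash[block] = data
            self.disk[block] = data
        elif self.policy == "write-back":
            # Write to flash only; disk is updated later by flush().
            self.flash[block] = data
            self.dirty.add(block)
        else:
            # write-around: bypass the cache and drop any stale cached copy.
            self.disk[block] = data
            self.flash.pop(block, None)

    def read(self, block):
        if block in self.flash:
            return self.flash[block]
        data = self.disk[block]
        self.flash[block] = data  # populate the cache on a miss
        return data

    def flush(self):
        """Copy dirty blocks to disk (only meaningful for write-back)."""
        for block in sorted(self.dirty):
            self.disk[block] = self.flash[block]
        self.dirty.clear()
```

In this model, write-back acknowledges a write once it is in flash, while write-around keeps write traffic out of the cache so that only blocks that are actually read get cached.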

The Numbers

YCSB Testing Setup

YCSB Load Generator

[Diagram labels: 10GB, 16-cores, 24GB DRAM]

Workloads use uniform random key selection instead of Zipfian.

150 million 1KB records, RF=3: ~120GB SSTables/node

50/50 R/W Uniform distribution 10hrs

[Chart: mixed ops/sec]

Update Latency
▸ Average: 511 µs
▸ 95th Pctl: 1 ms
▸ 99th Pctl: 2 ms

Read Latency
▸ Average: 7.0 ms
▸ 95th Pctl: 18 ms
▸ 99th Pctl: 42 ms
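For readers unfamiliar with the average / 95th / 99th percentile summary used on these slides, the sketch below computes the same three figures from a list of latency samples with the nearest-rank method. The function name and sample data are hypothetical, and YCSB's own measurement code may bucket or round its values differently.

```python
def summarize_latencies(samples_ms):
    """Return average, 95th, and 99th percentile latency (nearest-rank method)."""
    ordered = sorted(samples_ms)
    n = len(ordered)

    def percentile(p):
        # Nearest rank: smallest value with at least p% of samples at or below it.
        rank = (p * n + 99) // 100  # integer ceiling of p * n / 100
        return ordered[max(rank, 1) - 1]

    return {
        "avg": sum(ordered) / n,
        "p95": percentile(95),
        "p99": percentile(99),
    }

# Hypothetical samples: 1 ms through 100 ms, one of each.
print(summarize_latencies(list(range(1, 101))))
```

The average hides the tail: a handful of slow requests barely moves it but shows up directly in the 99th percentile, which is why the slides report all three.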

95/5 R/W Uniform distribution

[Chart: series for 75 threads, 200 threads, and 300 threads]

# threads   Avg Lat.      95th pctl   99th pctl
75          1.4/0.22 ms   2/0 ms      5/0 ms
200         3.1/0.19 ms   7/0 ms      13/0 ms
300         4.4/2.2 ms    11/0 ms     19/0 ms

Consolidation

http://techblog.netflix.com/2012/07/benchmarking-high-performance-io-with.html

Real-World Cassandra on Fusion

• 3-4x consolidation factor
• 3-6x reduction in latency
• 2.2x ROI

Efficiency: Performance or Consolidation?

Cassandra @ ~100,000 ops/sec (mixed workload)

[Chart: Memory/Disk vs ioMemory]

http://www.fusionio.com/white-papers/accelerate-cassandra-without-the-cluster-crawl/

Thank You

fusionio.com | SAME PLANET. DIFFERENT WORLD.

@mattmorefaster

Cassandra: ioDrive2 vs 10 disk RAID-0

50/50 R/W Uniform distribution

[Chart: mixed ops/sec, with axis ticks at 100000 and 120000]

Update Latency
▸ Average: 311 µs
▸ 95th Pctl: 0 ms
▸ 99th Pctl: 1 ms

Read Latency
▸ Average: 8.2 ms
▸ 95th Pctl: 20 ms
▸ 99th Pctl: 62 ms

YCSB: Bulk Load (CL=ALL)

[Chart: inserts/sec]

Avg Latency: 0.9 ms
95th Percentile: 1 ms
99th Percentile: 4 ms
