Scylla Summit 2017: A Toolbox for Understanding Scylla in the Field

Date post: 22-Jan-2018
Upload: scylladb
Transcript
Page 1: Scylla Summit 2017: A Toolbox for Understanding Scylla in the Field

Scylla Performance Toolbox

ScyllaDB

Avi Kivity

Page 2: Scylla Summit 2017: A Toolbox for Understanding Scylla in the Field

Understanding environment and application impact on performance

CTO, ScyllaDB

Avi Kivity

Page 3: Scylla Summit 2017: A Toolbox for Understanding Scylla in the Field

Avi Kivity

KVM hypervisor author and ex-maintainer

ScyllaDB co-founder and CTO

Page 4: Scylla Summit 2017: A Toolbox for Understanding Scylla in the Field

Agenda

▪ Environment

▪ Tracing

▪ Metrics

Page 5: Scylla Summit 2017: A Toolbox for Understanding Scylla in the Field

Environment

Page 6: Scylla Summit 2017: A Toolbox for Understanding Scylla in the Field

Environment

▪ Networking

▪ Disk interrupts

▪ Disk write cache

▪ Virtualization and containers

Page 7: Scylla Summit 2017: A Toolbox for Understanding Scylla in the Field

Networking model (multiqueue)

[Diagram labels: NIC, OS/HW, six cores, Rx Queue]

Page 8: Scylla Summit 2017: A Toolbox for Understanding Scylla in the Field

Networking model (singlequeue)

[Diagram labels: NIC, OS/HW, six cores, Rx Queue, S/W Rx Queue]

Page 9: Scylla Summit 2017: A Toolbox for Understanding Scylla in the Field

Networking model (hybrid)

▪ Each core group is assigned a single hardware queue

▪ One core in core group handles networking

▪ Useful when too few hardware queues

▪ Too difficult to draw

Page 10: Scylla Summit 2017: A Toolbox for Understanding Scylla in the Field

How is the networking model configured?

▪ Determined by scylla_setup based on the hardware

▪ Stored in /etc/scylla.d/perftune.yaml

$ cat /etc/scylla.d/perftune.yaml
cpu_mask: '0x000000ff'
mode: mq
nic: eth0
tune:
- net
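To make the cpu_mask value in the file above easier to read, here is a minimal Python sketch that expands a hexadecimal mask into CPU indices. The helper name cpus_from_mask is hypothetical, and the reading "bit N set means CPU N is included" is the usual Linux CPU-mask convention, assumed here rather than stated in the slides.

```python
def cpus_from_mask(mask: str) -> list[int]:
    """Return the indices of the bits set in a hexadecimal CPU mask."""
    value = int(mask, 16)  # accepts '0x000000ff' as well as 'ff'
    cpus = []
    bit = 0
    while value:
        if value & 1:
            cpus.append(bit)
        value >>= 1
        bit += 1
    return cpus


# The mask shown on the slide selects the first eight CPUs.
print(cpus_from_mask("0x000000ff"))
```

A mask of 0x000000ff therefore matches the eight-CPU machine shown in the top output on the next slide.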

Page 11: Scylla Summit 2017: A Toolbox for Understanding Scylla in the Field

Unbalanced networking

top - 11:40:29 up 3 min, 1 user, load average: 4.48, 4.36, 3.16

Tasks: 152 total, 8 running, 151 sleeping, 0 stopped, 0 zombie

%Cpu0 : 34.3 us, 17.0 sy, 0.0 ni, 0.0 id, 0.0 wa, 6.1 hi, 42.6 si, 0.0 st

%Cpu1 : 33.0 us, 5.0 sy, 0.0 ni, 59.1 id, 0.0 wa, 0.6 hi, 2.3 si, 0.0 st

%Cpu2 : 40.3 us, 4.3 sy, 0.0 ni, 52.2 id, 0.0 wa, 0.1 hi, 3.1 si, 0.0 st

%Cpu3 : 37.3 us, 5.7 sy, 0.0 ni, 54.7 id, 0.0 wa, 0.0 hi, 2.3 si, 0.0 st

%Cpu4 : 31.0 us, 4.3 sy, 0.0 ni, 61.8 id, 0.0 wa, 0.2 hi, 2.7 si, 0.0 st

%Cpu5 : 41.3 us, 5.3 sy, 0.0 ni, 49.8 id, 0.0 wa, 0.1 hi, 3.5 si, 0.0 st

%Cpu6 : 31.0 us, 4.3 sy, 0.0 ni, 62.7 id, 0.0 wa, 0.0 hi, 2.0 si, 0.0 st

%Cpu7 : 34.0 us, 2.3 sy, 0.0 ni, 59.4 id, 0.0 wa, 0.2 hi, 4.1 si, 0.0 st

KiB Mem : 62882836 total, 61356464 free, 1129072 used, 397300 buff/cache

KiB Swap: 0 total, 0 free, 0 used. 61124456 avail Mem

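The imbalance in the top output above (CPU 0 spends 42.6% in softirq while the others stay under 5%) can also be spotted programmatically. The following is a rough Python sketch, not a Scylla tool: the function names and the "three times the median" threshold are assumptions chosen for illustration.

```python
import re
import statistics

# Matches per-CPU summary lines as printed by top, e.g.
# "%Cpu0 : 34.3 us, 17.0 sy, ..., 42.6 si, 0.0 st"
CPU_LINE = re.compile(r"%Cpu(\d+)\s*:(.*)")
FIELD = re.compile(r"([\d.]+)\s+(us|sy|ni|id|wa|hi|si|st)\b")


def parse_top_cpus(text):
    """Map CPU index -> {field name: percentage} for each %CpuN line."""
    cpus = {}
    for line in text.splitlines():
        match = CPU_LINE.match(line.strip())
        if match:
            fields = {name: float(value)
                      for value, name in FIELD.findall(match.group(2))}
            cpus[int(match.group(1))] = fields
    return cpus


def softirq_hotspots(cpus, factor=3.0):
    """CPUs whose softirq share (si) exceeds `factor` times the median si."""
    median_si = statistics.median(f["si"] for f in cpus.values())
    return sorted(cpu for cpu, f in cpus.items()
                  if f["si"] > factor * median_si)


# First four CPU lines from the slide.
SAMPLE = """\
%Cpu0 : 34.3 us, 17.0 sy, 0.0 ni, 0.0 id, 0.0 wa, 6.1 hi, 42.6 si, 0.0 st
%Cpu1 : 33.0 us, 5.0 sy, 0.0 ni, 59.1 id, 0.0 wa, 0.6 hi, 2.3 si, 0.0 st
%Cpu2 : 40.3 us, 4.3 sy, 0.0 ni, 52.2 id, 0.0 wa, 0.1 hi, 3.1 si, 0.0 st
%Cpu3 : 37.3 us, 5.7 sy, 0.0 ni, 54.7 id, 0.0 wa, 0.0 hi, 2.3 si, 0.0 st
"""

print(softirq_hotspots(parse_top_cpus(SAMPLE)))
```

A single CPU standing out this way suggests that one core is handling network processing for the whole machine, which is the situation the networking-model slides describe.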

Page 12: Scylla Summit 2017: A Toolbox for Understanding Scylla in the Field

Disk write cache - write back cache

Write-back cache

▪ Scylla writes to disk

▪ Disk places data in DRAM cache, and acknowledges

▪ Disk initiates data write to actual SSD in background

▪ Scylla asks disk to verify that the data made it to non-volatile storage

▪ Disk waits until background write completes

  o Potential stall

Page 13: Scylla Summit 2017: A Toolbox for Understanding Scylla in the Field

Disk write cache - write back

[Sequence diagram between Scylla, Disk controller, and Media. Labeled events: Write, Media access, Flush, ACK, Media access complete, ACK. A STALL marker appears on the diagram.]

Page 14: Scylla Summit 2017: A Toolbox for Understanding Scylla in the Field

Disk write cache - write back cache

Write-back cache

▪ Scylla writes to disk

▪ Disk places data in DRAM cache, and acknowledges

▪ Disk initiates data write to actual SSD in background

▪ Scylla asks disk to verify that the data made it to non-volatile storage

▪ Disk does not wait until background write completes

  o No stall

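One way to see which cache mode the kernel believes each disk is using is to read the block layer's sysfs attribute. The slides do not mention this; the sketch below assumes a Linux system that exposes /sys/block/<device>/queue/write_cache (reporting "write back" or "write through"), and the function name write_cache_modes is hypothetical.

```python
from pathlib import Path


def write_cache_modes(sysfs_block: str = "/sys/block") -> dict[str, str]:
    """Map block device name -> contents of its queue/write_cache attribute.

    Devices without the attribute are skipped; a missing sysfs directory
    yields an empty result.
    """
    root = Path(sysfs_block)
    if not root.is_dir():
        return {}
    modes = {}
    for dev in sorted(root.iterdir()):
        attr = dev / "queue" / "write_cache"
        if attr.is_file():
            modes[dev.name] = attr.read_text().strip()
    return modes


for name, mode in write_cache_modes().items():
    print(f"{name}: {mode}")
```

The reported mode only tells you how the kernel treats flushes for the device; whether a flush actually stalls still depends on the drive, as the two slides above describe.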

Page 15: Scylla Summit 2017: A Toolbox for Understanding Scylla in the Field

Disk write cache - write back

[Sequence diagram between Scylla, Disk controller, and Media. Labeled events: Write, Media access, Flush, ACK, Media access complete, ACK.]

Page 16: Scylla Summit 2017: A Toolbox for Understanding Scylla in the Field

Beware of iowait

▪ iowait caused by pushing XFS out of its comfort zone

top - 11:40:29 up 3 min, 1 user, load average: 4.48, 4.36, 3.16

Tasks: 152 total, 8 running, 151 sleeping, 0 stopped, 0 zombie

%Cpu0 : 34.1 us, 10.2 sy, 0.0 ni, 0.0 id, 47.0 wa, 6.1 hi, 2.6 si, 0.0 st

Page 17: Scylla Summit 2017: A Toolbox for Understanding Scylla in the Field

Tracing

Page 18: Scylla Summit 2017: A Toolbox for Understanding Scylla in the Field

Types of tracing

▪ Single-shot

▪ Probabilistic

▪ Slow query

Page 19: Scylla Summit 2017: A Toolbox for Understanding Scylla in the Field

Single-shot tracing

▪ Useful for gaining an understanding of a query during development

▪ Issue from cqlsh

Page 20: Scylla Summit 2017: A Toolbox for Understanding Scylla in the Field

Probabilistic tracing

▪ Useful for gaining insight into what the application is doing

▪ Controlled by nodetool

▪ Start with very low probability to avoid disturbing the workload

$ nodetool settraceprobability 0.000001
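To get a feel for how low that probability is, the expected number of traced requests is simply the request rate multiplied by the probability. The short Python sketch below works this out; the 100,000 requests per second figure is a made-up workload, and the function name is hypothetical.

```python
def expected_traces_per_second(requests_per_second: float,
                               probability: float) -> float:
    """Expected number of traced requests per second at a trace probability."""
    if not 0.0 <= probability <= 1.0:
        raise ValueError("probability must be between 0 and 1")
    return requests_per_second * probability


# Hypothetical node serving 100,000 requests/s with the probability above.
rate = expected_traces_per_second(100_000, 0.000001)
print(f"{rate:.2f} traced requests/s, roughly one every {1 / rate:.0f} s")
```

At this setting only a handful of requests per minute are traced, which keeps the tracing overhead small; the probability can then be raised gradually if more samples are needed.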

Page 21: Scylla Summit 2017: A Toolbox for Understanding Scylla in the Field

Slow-query logging

▪ Catch that long (and slow) tail

▪ Caution: a slow query can interfere with fast queries

Page 22: Scylla Summit 2017: A Toolbox for Understanding Scylla in the Field

Metrics

Page 23: Scylla Summit 2017: A Toolbox for Understanding Scylla in the Field

Metrics overview

▪ Aggregated vs. Shard metrics

▪ CPU metrics

▪ I/O metrics

▪ Coordinator-side metrics

▪ Replica-side metrics

Page 24: Scylla Summit 2017: A Toolbox for Understanding Scylla in the Field

Zooming into aggregated metrics

▪ Start with cluster-level view

▪ Look at individual nodes

  o Cluster runs at speed of slowest node

▪ Look at individual shards

  o Node runs at speed of slowest shard

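The drill-down from cluster to node to shard can be sketched in a few lines of Python. The latency numbers, node names, and shard names below are invented for illustration; in practice the values would come from the monitoring stack.

```python
def slowest(latencies: dict[str, float]) -> str:
    """Return the key with the highest latency."""
    return max(latencies, key=latencies.get)


# Hypothetical p99 read latencies in milliseconds, per node and shard.
cluster = {
    "node1": {"shard0": 2.1, "shard1": 2.4},
    "node2": {"shard0": 2.0, "shard1": 9.8},
    "node3": {"shard0": 2.2, "shard1": 2.3},
}

# A node is only as fast as its slowest shard, so summarize each node
# by its worst shard before comparing nodes.
node_level = {node: max(shards.values()) for node, shards in cluster.items()}
worst_node = slowest(node_level)
worst_shard = slowest(cluster[worst_node])

print(f"slowest node: {worst_node}, slowest shard on it: {worst_shard}")
```

An average across the cluster would hide the single slow shard on node2, which is why the slide recommends zooming in step by step.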

Page 25: Scylla Summit 2017: A Toolbox for Understanding Scylla in the Field

CPU metrics

▪ Utilization / load

  o For throughput load, should achieve 100%

  o If not

    • Does one shard reach 100% and the others don’t?

      – Hot partition

      – Check networking environment

    • Sufficient client concurrency?

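The checklist above amounts to a small decision tree, sketched here in Python. The function name and the 95% cutoff used to treat a shard as saturated are assumptions for illustration (the slide speaks of 100%).

```python
def diagnose_cpu(shard_utilization: list[float],
                 saturated: float = 95.0) -> str:
    """Classify per-shard CPU utilization (percent) for a throughput workload."""
    if not shard_utilization:
        raise ValueError("need at least one shard")
    hot = [u >= saturated for u in shard_utilization]
    if all(hot):
        return "saturated: all shards are fully utilized"
    if any(hot):
        return ("imbalanced: look for a hot partition "
                "and check the networking environment")
    return "underutilized: check whether client concurrency is sufficient"


print(diagnose_cpu([99.0, 98.5, 99.2, 97.8]))
print(diagnose_cpu([100.0, 41.0, 38.0, 44.0]))
print(diagnose_cpu([35.0, 32.0, 30.0, 36.0]))
```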

Page 26: Scylla Summit 2017: A Toolbox for Understanding Scylla in the Field

I/O Queue metrics

I/O by type of operation: query, compaction, commitlog

▪ Bandwidth, IOPS (and average size)

▪ Delay

▪ Correlates with iostat command output

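The "average size" mentioned above follows directly from the other two metrics: bandwidth divided by IOPS. A minimal Python sketch, with made-up numbers and a hypothetical function name:

```python
def average_io_size(bandwidth_bytes_per_s: float, iops: float) -> float:
    """Average request size in bytes, derived from bandwidth and IOPS."""
    if iops <= 0:
        raise ValueError("iops must be positive")
    return bandwidth_bytes_per_s / iops


# Hypothetical compaction traffic: 200 MiB/s at 6,400 requests per second.
size = average_io_size(200 * 1024 * 1024, 6400)
print(f"average request size: {size / 1024:.0f} KiB")
```

Comparing the result per operation type (query, compaction, commitlog) against the iostat figures for the same device is a quick consistency check.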

Page 27: Scylla Summit 2017: A Toolbox for Understanding Scylla in the Field

Coordinator-side metrics

▪ CQL requests per second

▪ CQL connections and their distribution

  o High connection open rate?

  o Sufficient connections per shard?

  o Bad connection distribution?

▪ Statements prepared

  o Is the client using prepared statements correctly?

▪ Foreground reads and writes

▪ Background reads and writes

▪ Reconciliation

Page 28: Scylla Summit 2017: A Toolbox for Understanding Scylla in the Field

Replica-side metrics

▪ Reads and writes - hot shard, hot node

▪ Cache hits/misses - compare with expectations

▪ Cache total memory - watch for sudden drops

▪ Active SSTable reads - high value indicates weak I/O

▪ Queued SSTable reads - high value indicates weak I/O

▪ Current compactions

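For the cache hits/misses item, "compare with expectations" can be made concrete by turning the two counters into a hit ratio and checking it against the ratio the workload is expected to achieve. The sketch below is illustrative only; the counter values and the 0.95 expectation are invented.

```python
def cache_hit_ratio(hits: int, misses: int) -> float:
    """Fraction of cache lookups that were hits (0.0 when there were none)."""
    total = hits + misses
    return hits / total if total else 0.0


# Hypothetical counters sampled over one interval on a single shard.
ratio = cache_hit_ratio(hits=9_000, misses=1_000)
expected = 0.95  # what the application's access pattern should achieve
if ratio < expected:
    print(f"hit ratio {ratio:.2f} is below the expected {expected:.2f}")
```

A ratio well below expectations, or one that falls together with a sudden drop in cache memory, is a cue to look at the other replica-side metrics in the list.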

Page 29: Scylla Summit 2017: A Toolbox for Understanding Scylla in the Field

Summary

Page 30: Scylla Summit 2017: A Toolbox for Understanding Scylla in the Field

Summary

▪ Many moving parts

▪ Despite automation, things can go wrong

▪ Application may get things wrong

▪ Need combination of methodical approach and intuition

▪ Engage the developers so we can improve things

Page 31: Scylla Summit 2017: A Toolbox for Understanding Scylla in the Field

THANK YOU

[email protected]

@AviKivity

Please stay in touch

Any questions?
