Date post: | 22-Jan-2018 |
Category: | Technology |
Upload: | scylladb |
PRESENTATION TITLE ON ONE LINE AND ON TWO LINES
First and last namePosition, company
Scylla Performance Toolbox
ScyllaDB
Avi Kivity
Understanding environment and application impact
on performance
CTO, ScyllaDB
Avi Kivity
Avi Kivity
KVM hypervisor author and ex-maintainer
ScyllaDB co-founder and CTO
Agenda
▪ Environment
▪ Tracing
▪ Metrics
Environment
▪ Networking
▪ Disk interrupts
▪ Disk write cache
▪ Virtualization and containers
Networking model (multiqueue)
[Diagram: NIC, OS/HW layer, Rx Queue, six cores]
Networking model (singlequeue)
[Diagram: NIC, OS/HW layer, Rx Queue, S/W Rx Queue, six cores]
Networking model (hybrid)
▪ Each core group is assigned a single hardware queue
▪ One core in core group handles networking
▪ Useful when too few hardware queues
▪ Too difficult to draw
How is the networking model configured?
▪ Determined by scylla_setup based on the hardware
▪ Stored in /etc/scylla.d/perftune.yaml
$ cat /etc/scylla.d/perftune.yaml
cpu_mask: '0x000000ff'
mode: mq
nic: eth0
tune:
- net
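The cpu_mask value above is a hexadecimal bitmask with one bit per CPU. As a small illustration, the sketch below decodes such a mask into CPU indices. The helper name cpus_from_mask is made up for this example and is not part of perftune, and the sketch assumes the mask is a single hex word (no comma-separated groups).

```python
def cpus_from_mask(mask: str) -> list[int]:
    """Return the CPU indices whose bits are set in a hex mask string.

    Hypothetical helper for illustration; assumes a single hex word.
    """
    value = int(mask, 16)  # base 16 accepts an optional '0x' prefix
    return [bit for bit in range(value.bit_length()) if (value >> bit) & 1]


# The mask from the slide covers the eight lowest-numbered CPUs.
print(cpus_from_mask("0x000000ff"))
```

With the mask from the slide, the networking tuning applies to CPUs 0 through 7.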
Unbalanced networking
top - 11:40:29 up 3 min, 1 user, load average: 4.48, 4.36, 3.16
Tasks: 152 total, 8 running, 151 sleeping, 0 stopped, 0 zombie
%Cpu0 : 34.3 us, 17.0 sy, 0.0 ni, 0.0 id, 0.0 wa, 6.1 hi, 42.6 si, 0.0 st
%Cpu1 : 33.0 us, 5.0 sy, 0.0 ni, 59.1 id, 0.0 wa, 0.6 hi, 2.3 si, 0.0 st
%Cpu2 : 40.3 us, 4.3 sy, 0.0 ni, 52.2 id, 0.0 wa, 0.1 hi, 3.1 si, 0.0 st
%Cpu3 : 37.3 us, 5.7 sy, 0.0 ni, 54.7 id, 0.0 wa, 0.0 hi, 2.3 si, 0.0 st
%Cpu4 : 31.0 us, 4.3 sy, 0.0 ni, 61.8 id, 0.0 wa, 0.2 hi, 2.7 si, 0.0 st
%Cpu5 : 41.3 us, 5.3 sy, 0.0 ni, 49.8 id, 0.0 wa, 0.1 hi, 3.5 si, 0.0 st
%Cpu6 : 31.0 us, 4.3 sy, 0.0 ni, 62.7 id, 0.0 wa, 0.0 hi, 2.0 si, 0.0 st
%Cpu7 : 34.0 us, 2.3 sy, 0.0 ni, 59.4 id, 0.0 wa, 0.2 hi, 4.1 si, 0.0 st
KiB Mem : 62882836 total, 61356464 free, 1129072 used, 397300 buff/cache
KiB Swap: 0 total, 0 free, 0 used. 61124456 avail Mem
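In the output above, CPU 0 spends 42.6% of its time in softirq (si) and has no idle time, while the other cores sit at 2 to 4% si. A rough way to spot this pattern automatically is sketched below. The function names, the 3x-median rule, and the 10% floor are assumptions chosen for this illustration, not a Scylla tool or a recommended threshold.

```python
import re
from statistics import median

# Per-CPU lines as shown on the slide (press "1" in top to get them).
SAMPLE = """\
%Cpu0 : 34.3 us, 17.0 sy, 0.0 ni, 0.0 id, 0.0 wa, 6.1 hi, 42.6 si, 0.0 st
%Cpu1 : 33.0 us, 5.0 sy, 0.0 ni, 59.1 id, 0.0 wa, 0.6 hi, 2.3 si, 0.0 st
%Cpu2 : 40.3 us, 4.3 sy, 0.0 ni, 52.2 id, 0.0 wa, 0.1 hi, 3.1 si, 0.0 st
%Cpu3 : 37.3 us, 5.7 sy, 0.0 ni, 54.7 id, 0.0 wa, 0.0 hi, 2.3 si, 0.0 st
%Cpu4 : 31.0 us, 4.3 sy, 0.0 ni, 61.8 id, 0.0 wa, 0.2 hi, 2.7 si, 0.0 st
%Cpu5 : 41.3 us, 5.3 sy, 0.0 ni, 49.8 id, 0.0 wa, 0.1 hi, 3.5 si, 0.0 st
%Cpu6 : 31.0 us, 4.3 sy, 0.0 ni, 62.7 id, 0.0 wa, 0.0 hi, 2.0 si, 0.0 st
%Cpu7 : 34.0 us, 2.3 sy, 0.0 ni, 59.4 id, 0.0 wa, 0.2 hi, 4.1 si, 0.0 st
"""

CPU_LINE = re.compile(r"%Cpu(\d+)\s*:\s*(.*)")
FIELD = re.compile(r"([\d.]+)\s+(\w+)")


def parse_top_cpu_lines(text: str) -> dict[int, dict[str, float]]:
    """Map CPU index to its {us, sy, ni, id, wa, hi, si, st} percentages."""
    cpus = {}
    for line in text.splitlines():
        match = CPU_LINE.match(line.strip())
        if match:
            fields = FIELD.findall(match.group(2))
            cpus[int(match.group(1))] = {name: float(val) for val, name in fields}
    return cpus


def softirq_outliers(cpus, factor: float = 3.0, floor: float = 10.0) -> list[int]:
    """CPUs whose softirq share exceeds both factor x median and the floor."""
    limit = max(factor * median(v["si"] for v in cpus.values()), floor)
    return sorted(cpu for cpu, v in cpus.items() if v["si"] > limit)


print(softirq_outliers(parse_top_cpu_lines(SAMPLE)))
```

On the slide's data only CPU 0 is reported, which matches a single core handling all network interrupts.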
Disk write cache - write back cache
Write-back cache
▪ Scylla writes to disk
▪ Disk places data in DRAM cache, and acknowledges
▪ Disk initiates data write to actual SSD in background
▪ Scylla asks disk to verify that the data made it to non-volatile
storage
▪ Disk waits until background write completes
  o Potential stall
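One way to get a feel for the flush cost described above is to time a flush directly. The sketch below writes a block and times os.fsync() on it. It is only a rough probe: it goes through the page cache, which is not how Scylla itself issues I/O, and the function name, round count, and block size are arbitrary choices for this example. Pass a directory on the disk of interest, since the default temporary directory may be on tmpfs.

```python
import os
import tempfile
import time


def time_flushes(directory=None, rounds: int = 5, size: int = 1 << 20) -> list[float]:
    """Write `size` bytes, then time os.fsync(); repeat `rounds` times.

    Returns the flush durations in seconds. Hypothetical helper for
    illustration only.
    """
    samples = []
    payload = os.urandom(size)
    with tempfile.NamedTemporaryFile(dir=directory) as f:
        fd = f.fileno()
        for _ in range(rounds):
            os.write(fd, payload)
            start = time.perf_counter()
            os.fsync(fd)  # asks the OS and the disk to make the data durable
            samples.append(time.perf_counter() - start)
    return samples


for i, seconds in enumerate(time_flushes()):
    print(f"flush {i}: {seconds * 1000:.2f} ms")
```

Flush times that are long or vary widely between rounds are consistent with a disk that has to drain its write-back cache before acknowledging.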
Disk write cache - write back
[Sequence diagram between Scylla, the disk controller, and the media. Labels: Write, Media access, Flush, ACK, Media access complete, ACK. A STALL is marked on the diagram.]
Disk write cache - write back cache
Write-back cache
▪ Scylla writes to disk
▪ Disk places data in DRAM cache, and acknowledges
▪ Disk initiates data write to actual SSD in background
▪ Scylla asks disk to verify that the data made it to non-volatile
storage
▪ Disk does not wait until background write completes
  o No stall
Disk write cache - write back
[Sequence diagram between Scylla, the disk controller, and the media. Labels: Write, Media access, Flush, ACK, Media access complete, ACK.]
Beware of iowait
▪ iowait caused by pushing XFS out of its comfort zone
top - 11:40:29 up 3 min, 1 user, load average: 4.48, 4.36, 3.16
Tasks: 152 total, 8 running, 151 sleeping, 0 stopped, 0 zombie
%Cpu0 : 34.1 us, 10.2 sy, 0.0 ni, 0.0 id, 47.0 wa, 6.1 hi, 2.6 si, 0.0 st
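The wa column in top is derived from the per-CPU counters in /proc/stat on Linux. As a sketch, the iowait share between two snapshots of that file can be computed as below. The function names are made up for this example, and the snapshot values are hypothetical numbers picked to resemble the slide, not measured data.

```python
IOWAIT = 4  # /proc/stat field order: user nice system idle iowait irq softirq steal


def cpu_times(proc_stat_text: str) -> dict[str, list[int]]:
    """Map 'cpuN' to its list of counters, skipping the aggregate 'cpu' line."""
    times = {}
    for line in proc_stat_text.splitlines():
        parts = line.split()
        if parts and parts[0].startswith("cpu") and parts[0] != "cpu":
            times[parts[0]] = [int(x) for x in parts[1:]]
    return times


def iowait_percent(before: str, after: str) -> dict[str, float]:
    """Percentage of time each CPU spent in iowait between two snapshots."""
    first, second = cpu_times(before), cpu_times(after)
    result = {}
    for cpu, old in first.items():
        if cpu not in second:
            continue
        deltas = [new - prev for prev, new in zip(old, second[cpu])]
        total = sum(deltas[:8])  # guest time is already counted in user
        result[cpu] = 100.0 * deltas[IOWAIT] / total if total else 0.0
    return result


# Hypothetical snapshots; on a real system read /proc/stat twice.
before = "cpu0 0 0 0 0 0 0 0 0 0 0"
after = "cpu0 34 0 10 0 47 6 3 0 0 0"
print(iowait_percent(before, after))
```

A shard's CPU that shows sustained iowait is blocked inside the kernel rather than running the reactor, which is why the slide flags it.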
Tracing
Types of tracing
▪ Single-shot
▪ Probabilistic
▪ Slow query
Single-shot tracing
▪ Useful for gaining an understanding of a query during
development
▪ Issue from cqlsh (TRACING ON)
Probabilistic tracing
▪ Useful to gain an insight about what the application is doing
▪ Controlled by nodetool
▪ Start with very low probability to avoid disturbing the workload
$ nodetool settraceprobability 0.000001
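To judge whether a probability is low enough, it helps to estimate how many requests it will actually trace. The arithmetic is sketched below. The request rate of 100,000 requests per second is a hypothetical figure for illustration, and the sketch assumes each request is sampled independently with the configured probability.

```python
def expected_traces_per_second(requests_per_second: float, probability: float) -> float:
    """Expected number of traced requests per second on one node."""
    return requests_per_second * probability


# Hypothetical load of 100,000 requests/s with the probability from the slide.
rate = expected_traces_per_second(100_000, 0.000001)
print(f"{rate:.3f} traced requests/s, about one every {1 / rate:.0f} s")
```

At that load the setting traces roughly one request every ten seconds, which is why it is a safe starting point; raise it gradually if too few traces are collected.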
Slow-query logging
▪ Catch that long (and slow) tail
▪ Caution: a slow query can interfere with fast queries
Metrics
Metrics overview
▪ Aggregated vs. Shard metrics
▪ CPU metrics
▪ I/O metrics
▪ Coordinator-side metrics
▪ Replica-side metrics
Zooming into aggregated metrics
▪ Start with cluster-level view
▪ Look at individual nodes
  o Cluster runs at speed of slowest node
▪ Look at individual shards
  o Node runs at speed of slowest shard
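The drill-down from cluster to node to shard can be sketched as a small search over per-shard values. The data layout, node names, and latency numbers below are hypothetical and only stand in for whatever the monitoring system exports.

```python
# Hypothetical per-shard p99 read latencies in milliseconds: node -> shard -> value.
latency_ms = {
    "node1": {0: 2.1, 1: 2.4, 2: 2.0, 3: 2.2},
    "node2": {0: 2.3, 1: 2.2, 2: 2.5, 3: 48.0},
    "node3": {0: 2.0, 1: 2.6, 2: 2.1, 3: 2.3},
}


def slowest_shard(metrics: dict[str, dict[int, float]]) -> tuple[str, int, float]:
    """Find the slowest node first, then the slowest shard on that node."""
    node_worst = {node: max(shards.values()) for node, shards in metrics.items()}
    node = max(node_worst, key=node_worst.get)
    shard = max(metrics[node], key=metrics[node].get)
    return node, shard, metrics[node][shard]


node, shard, value = slowest_shard(latency_ms)
print(f"slowest: {node} shard {shard} at {value} ms")
```

An average over the cluster would hide the single slow shard in this data, which is the reason for zooming in level by level.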
CPU metrics
▪ Utilization / load
  o For throughput load, should achieve 100%
  o If not
    • Does one shard reach 100% and the others don’t?
      – Hot partition
      – Check networking environment
    • Sufficient client concurrency?
I/O Queue metrics
I/O by type of operation: query, compaction, commitlog
▪ Bandwidth, IOPS (and average size)
▪ Delay
▪ Correlates with iostat command output
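The average request size mentioned above follows from the other two numbers: bandwidth divided by IOPS. A minimal sketch of that arithmetic, with hypothetical figures:

```python
def average_request_size(bandwidth_bytes_per_s: float, iops: float) -> float:
    """Average I/O request size in bytes; 0.0 when there is no I/O."""
    return bandwidth_bytes_per_s / iops if iops else 0.0


# Hypothetical compaction traffic: 200 MiB/s at 1,600 IOPS.
size = average_request_size(200 * 1024 * 1024, 1600)
print(f"average request size: {size / 1024:.0f} KiB")
```

Large average sizes are typical of sequential work such as compaction, while small ones suggest point reads, so the derived value helps tell the operation types apart.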
Coordinator-side metrics
▪ CQL requests per second
▪ CQL connections and their distribution
  o High connection open rate?
  o Sufficient connections per shard?
  o Bad connection distribution?
▪ Statements prepared
  o Is the client using prepared statements correctly?
▪ Foreground reads and writes
▪ Background reads and writes
▪ Reconciliation
Replica-side metrics
▪ Reads and writes - hot shard, hot node
▪ Cache hits/misses - compare with expectations
▪ Cache total memory - watch for sudden drops
▪ Active SSTable reads - high value indicates weak I/O
▪ Queued SSTable reads - high value indicates weak I/O
▪ Current compactions
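Comparing cache hits and misses with expectations usually means turning the two counters into a hit ratio first. The sketch below does that and flags a ratio that falls short of an expected value. The function names, the sample counters, and the 5-point tolerance are assumptions for illustration.

```python
def hit_ratio(hits: int, misses: int) -> float:
    """Fraction of cache lookups served from cache; 0.0 with no lookups."""
    total = hits + misses
    return hits / total if total else 0.0


def below_expectation(hits: int, misses: int, expected: float,
                      tolerance: float = 0.05) -> bool:
    """True when the observed ratio is more than `tolerance` under `expected`."""
    return hit_ratio(hits, misses) < expected - tolerance


# Hypothetical counters for a workload expected to hit the cache 90% of the time.
print(hit_ratio(700, 300), below_expectation(700, 300, expected=0.9))
```

A ratio well under the expected value is a cue to check the data model, the working-set size, or a recent drop in cache memory.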
Summary
▪ Many moving parts
▪ Despite automation, things can go wrong
▪ Application may get things wrong
▪ Need combination of methodical approach and intuition
▪ Engage the developers so we can improve things
THANK YOU
@AviKivity
Please stay in touch
Any questions?