
"Metrics: Where and How", Vsevolod Polyakov

Date post: 16-Apr-2017
Page 1: "Metrics: Where and How", Vsevolod Polyakov

Metrics: where and how

graphite-oriented story

Page 2

• Vsevolod Polyakov

• Platform Engineer at Grammarly

Page 3

Graphite

All whisper-based systems

Page 4

Default graphite architecture

Page 5

What?

• RRD-like (gram.ly/gfsx)

• so.it.is.my.metric → /so/it/is/my/metric.wsp

• Fixed retention (by name/pattern)

• Fixed size (actually no)

Page 6

Retention and size

• 1s:1d → 1 036 828 bytes

• 10s:10d → 1 036 828 bytes

• 1s:365d → 378 432 028 bytes (~3 000 metrics per 1 TB)

• 10s:365d → 37 843 228 bytes (~30 000 metrics per 1 TB)

whisper calc
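The sizes above follow directly from whisper's fixed on-disk layout:

- a 16-byte file header;
- 12 bytes of archive info per archive;
- 12 bytes per point (a 4-byte timestamp plus an 8-byte float64 value).

The sketch below is a minimal calculator built on that assumption. It is not the "whisper calc" tool itself. It only understands single-letter units, with "m" read as minutes, as in whisper retention strings.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

const (
	metadataSize    = 16 // aggregation type, max retention, xFilesFactor, archive count (4 bytes each)
	archiveInfoSize = 12 // offset, seconds per point, number of points (4 bytes each)
	pointSize       = 12 // uint32 timestamp + float64 value
)

// unitSeconds covers single-letter units only; "m" means minutes.
var unitSeconds = map[byte]int{'s': 1, 'm': 60, 'h': 3600, 'd': 86400, 'w': 604800, 'y': 31536000}

// parseSeconds turns "10s", "30d", ... into a number of seconds.
func parseSeconds(s string) (int, error) {
	if len(s) < 2 {
		return 0, fmt.Errorf("bad duration %q", s)
	}
	mult, ok := unitSeconds[s[len(s)-1]]
	if !ok {
		return 0, fmt.Errorf("unknown unit in %q", s)
	}
	n, err := strconv.Atoi(s[:len(s)-1])
	if err != nil {
		return 0, err
	}
	return n * mult, nil
}

// whisperSize returns the file size in bytes for a retention schema
// such as "10s:30d,1m:120d,10m:365d".
func whisperSize(schema string) (int, error) {
	archives := strings.Split(schema, ",")
	size := metadataSize + archiveInfoSize*len(archives)
	for _, a := range archives {
		parts := strings.SplitN(a, ":", 2)
		if len(parts) != 2 {
			return 0, fmt.Errorf("bad archive %q", a)
		}
		step, err := parseSeconds(parts[0])
		if err != nil {
			return 0, err
		}
		keep, err := parseSeconds(parts[1])
		if err != nil {
			return 0, err
		}
		size += keep / step * pointSize // points in this archive × 12 bytes
	}
	return size, nil
}

func main() {
	for _, s := range []string{"1s:1d", "10s:10d", "1s:365d", "10s:365d", "10s:30d,1m:120d,10m:365d"} {
		n, err := whisperSize(s)
		if err != nil {
			fmt.Println(err)
			continue
		}
		fmt.Printf("%s → %d bytes\n", s, n)
	}
}
```

This reproduces the four single-archive figures on this slide exactly. Applied to the three-archive schema 10s:30d,1m:120d,10m:365d, the same formula gives 5 814 772 bytes. That is (259 200 + 172 800 + 52 560) points × 12 bytes, plus 16 + 3 × 12 header bytes.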

Page 7

Retention and size

• 10s:30d,1m:120d,10m:365d → 4 564 864 bytes

• 240 864 metrics in 1 TB

• aggregation methods: average, sum, min, max, and last

• can be assigned per metric
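When whisper rolls points from a higher-precision archive into a lower-precision one, it combines them with the configured method. The sketch below shows only what the five listed methods compute for one lower-precision slot. It deliberately ignores xFilesFactor (the minimum fraction of known points whisper requires before writing an aggregate), and it assumes the input points are in time order.

```go
package main

import "fmt"

// aggregate combines the points that fall into one lower-precision slot.
// vals must be in time order so that "last" means the newest point.
func aggregate(method string, vals []float64) (float64, error) {
	if len(vals) == 0 {
		return 0, fmt.Errorf("no points to aggregate")
	}
	switch method {
	case "average", "sum":
		s := 0.0
		for _, v := range vals {
			s += v
		}
		if method == "average" {
			return s / float64(len(vals)), nil
		}
		return s, nil
	case "min", "max":
		r := vals[0]
		for _, v := range vals[1:] {
			if (method == "min" && v < r) || (method == "max" && v > r) {
				r = v
			}
		}
		return r, nil
	case "last":
		return vals[len(vals)-1], nil
	}
	return 0, fmt.Errorf("unknown aggregation method %q", method)
}

func main() {
	// Six 10s points rolled up into one 1m point.
	vals := []float64{1, 2, 3, 4, 5, 9}
	for _, m := range []string{"average", "sum", "min", "max", "last"} {
		v, _ := aggregate(m, vals)
		fmt.Println(m, v)
	}
}
```

Picking the method per metric matters. Counters usually want sum, while gauges usually want average, max or last.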

Page 8

How

• terraform (https://www.terraform.io/)

• docker (https://www.docker.com/)

• ansible (https://www.ansible.com/)

• rocker (https://github.com/grammarly/rocker)

• rocker-compose (https://github.com/grammarly/rocker-compose)

Page 9

Default graphite architecture

Page 10

carbon-cache.py

• single-core

• many options in config file

• the default daemon


Page 11

architecture: carbon-cache.py

Page 12

Start load testing

• m4.xlarge instance (4 CPU, 16 GB RAM, 256 GB EBS gp2 disk)

• retentions = 1s:1d

• MAX_CACHE_SIZE, MAX_UPDATES_PER_SECOND, MAX_CREATES_PER_MINUTE = inf

• everything else at defaults

• almost 1.5 h to reach the limit :(
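The talk does not say which load generator was used. As a hypothetical sketch, a test like this one only needs to push lines in Graphite's plaintext protocol ("<metric path> <value> <unix timestamp>\n") to carbon, whose default plaintext port is 2003. The names `formatLine`, `writeBatch` and the `loadtest.metric` prefix are illustrative, not from the source.

```go
package main

import (
	"bufio"
	"fmt"
	"io"
	"os"
	"strconv"
	"time"
)

// formatLine renders one datapoint in Graphite's plaintext protocol.
func formatLine(metric string, value float64, ts int64) string {
	return metric + " " + strconv.FormatFloat(value, 'f', -1, 64) + " " + strconv.FormatInt(ts, 10) + "\n"
}

// writeBatch sends one point for each of n distinct metrics
// (prefix.0 .. prefix.n-1), all stamped with ts. With a 1s:1d retention,
// calling it once per second keeps n whisper files busy, i.e. n req/s.
func writeBatch(w io.Writer, prefix string, n int, ts int64) error {
	bw := bufio.NewWriter(w)
	for i := 0; i < n; i++ {
		if _, err := bw.WriteString(formatLine(prefix+"."+strconv.Itoa(i), float64(i), ts)); err != nil {
			return err
		}
	}
	return bw.Flush()
}

func main() {
	// Printing to stdout keeps the sketch self-contained. For a real test,
	// pass a net.Conn from net.Dial("tcp", "carbon-host:2003") instead
	// ("carbon-host" is a placeholder) and repeat once per second.
	if err := writeBatch(os.Stdout, "loadtest.metric", 3, time.Now().Unix()); err != nil {
		fmt.Fprintln(os.Stderr, err)
	}
}
```

Raising n step by step while watching the carbon-cache queue is one way to find the point where the cache starts growing without bound.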

Page 13

carbon-cache.py cache size → 75k req/s

Page 16

results

• 75 000 req/s max

• 60 000 req/s sustained speed

• I/O :(

Page 17

Try to tune!

• WHISPER_SPARSE_CREATE = true (don't allocate space on creation), which makes the I/O load non-linear

• CACHE_WRITE_STRATEGY = sorted (default)
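With the "sorted" strategy, carbon counts the datapoints per metric in the cache and orders the metrics by that count. It then flushes them to disk in that order, with the ordering fixed when the list is created. The sketch below is a toy model of that ordering, not carbon's implementation. The cache is assumed to be a plain map from metric name to pending values, and ties are broken by name only to make the output deterministic.

```go
package main

import (
	"fmt"
	"sort"
)

// sortedFlushOrder snapshots the cache and returns metric names ordered
// by number of cached datapoints, most first.
func sortedFlushOrder(cache map[string][]float64) []string {
	names := make([]string, 0, len(cache))
	for name := range cache {
		names = append(names, name)
	}
	sort.Slice(names, func(i, j int) bool {
		ci, cj := len(cache[names[i]]), len(cache[names[j]])
		if ci != cj {
			return ci > cj
		}
		return names[i] < names[j] // deterministic tie-break
	})
	return names
}

func main() {
	cache := map[string][]float64{
		"a": {1},
		"b": {1, 2, 3},
		"c": {1, 2},
	}
	fmt.Println(sortedFlushOrder(cache)) // [b c a]
}
```

Because the order is computed once per pass, points that arrive during a long flush wait for the next pass. That matches the cache-flush problem seen in the results.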

Page 18

cache size 1k → 195k req/s

Page 19

results

• 120 000 req/s sustained speed

• cache flush problem :(

Page 20

Try to tune!

• CACHE_WRITE_STRATEGY = max gives a strong flush preference to frequently updated metrics and also reduces random file I/O.
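The difference from "sorted" is that "max" re-evaluates the choice on every write. It always pops the metric with the most cached datapoints, so a metric that keeps receiving points keeps winning. A toy model, again assuming a map-based cache, with a linear scan to keep the sketch short (not how carbon is necessarily implemented):

```go
package main

import "fmt"

// popMax picks the metric with the most cached datapoints, removes it from
// the cache and returns its points for a single whisper update.
// Ties go to the smaller name, only for determinism.
func popMax(cache map[string][]float64) (string, []float64, bool) {
	best, bestN := "", -1
	for name, pts := range cache {
		if len(pts) > bestN || (len(pts) == bestN && name < best) {
			best, bestN = name, len(pts)
		}
	}
	if bestN < 0 {
		return "", nil, false // empty cache
	}
	pts := cache[best]
	delete(cache, best)
	return best, pts, true
}

func main() {
	cache := map[string][]float64{"a": {1}, "b": {1, 2, 3}, "c": {1, 2}}
	name, pts, _ := popMax(cache)
	fmt.Println(name, len(pts)) // b 3
	// "a" gets busy before the next write and jumps ahead of "c".
	cache["a"] = append(cache["a"], 2, 3, 4)
	name, pts, _ = popMax(cache)
	fmt.Println(name, len(pts)) // a 4
}
```

Rarely updated metrics can wait a long time under this policy, which is one way the cache keeps growing.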

Page 21

from 1k to 150k

Page 22

results

• 90 000 req/s sustained speed

• cache flush problem :(

Page 23

Try to tune!

• CACHE_WRITE_STRATEGY = naive: just flush, in no particular order. Better suited to storage that handles random I/O well.

Page 24

from 45k to 135k

Page 25

results

• 120 000 req/s sustained speed

• still CPU-bound

Page 26

sorted

max

naive

Page 27

• Maybe it's an EBS I/O limitation? → try a 512 GB disk (gp2 IOPS scale with volume size).

• No.

Page 28

go-carbon

• multi-core single daemon

• written in golang

• not many options to tune :(


Page 29

Start load testing

• m4.xlarge instance (4 CPU, 16 GB RAM, 256 GB EBS gp2 disk)

• retentions = 1s:1d

• max-size = 0

• max-updates-per-second = 0

• almost 1 h to reach the limit :(

Page 30

1k → 130k req/s ~3k/min

Page 32

results

• 120 000 req/s sustained speed

• but that's without sparse file creation

• so let's try to implement it

Page 33

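try to tune!

```go
// Inside go-whisper's Create(), after the header and archive info are written.
// Sparse create: instead of zero-filling the data area in 16 KiB chunks,
// seek to the last byte of the file (absolute offset Size()-1) and write a
// single 0, so the filesystem allocates data blocks lazily.
if _, err = whisper.file.Seek(int64(whisper.Size()-1), 0); err != nil {
	return nil, err
}
if _, err = whisper.file.Write([]byte{0}); err != nil {
	return nil, err
}
// Original zero-fill, now disabled:
// remaining := whisper.Size() - whisper.MetadataSize()
// chunkSize := 16384
// zeros := make([]byte, chunkSize)
// for remaining > chunkSize {
// 	if _, err = whisper.file.Write(zeros); err != nil {
// 		return nil, err
// 	}
// 	remaining -= chunkSize
// }
// if _, err = whisper.file.Write(zeros[:remaining]); err != nil {
// 	return nil, err
// }
```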

Page 34

180 000 req/s!

Page 36

try to tune!

• max-updates-per-second = 1500
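Capping whisper updates per second trades file writes for cache. Points pile up in memory, and each write then carries several points for one metric instead of one. The toy simulation below illustrates that effect. It is not go-carbon's scheduler. It assumes every metric receives one point per second and the writer updates metrics round-robin. As go-carbon does for max-updates-per-second, it treats a limit of 0 as "no limit".

```go
package main

import "fmt"

// simulate returns the average number of points written per whisper update
// and the number of points still waiting in cache after `seconds` seconds.
func simulate(metrics, maxUpdates, seconds int) (float64, int) {
	if metrics <= 0 {
		return 0, 0
	}
	if maxUpdates <= 0 || maxUpdates > metrics {
		maxUpdates = metrics // 0 means "no limit"
	}
	pending := make([]int, metrics) // points waiting in cache, per metric
	next, written, updates := 0, 0, 0
	for s := 0; s < seconds; s++ {
		for m := range pending {
			pending[m]++ // one new point per metric per second
		}
		for u := 0; u < maxUpdates; u++ {
			written += pending[next] // one update flushes all of a metric's points
			pending[next] = 0
			updates++
			next = (next + 1) % metrics
		}
	}
	cached := 0
	for _, p := range pending {
		cached += p
	}
	if updates == 0 {
		return 0, cached
	}
	return float64(written) / float64(updates), cached
}

func main() {
	for _, limit := range []int{0, 100} {
		avg, cached := simulate(1000, limit, 100)
		fmt.Printf("limit %d: %.2f points per update, %d points left in cache\n", limit, avg, cached)
	}
}
```

With 1000 metrics and a cap of 100 updates/s, each write carries about ten points and the cache holds a steady backlog. The real daemon shows the same pattern at a much larger scale.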

Page 37

results

• TL;DR: 210 000 to 240 000 req/s sustained speed

• cache size: 31 000 000!

Page 39

try to tune!

• max-updates-per-second = 0

• input-buffer = 400 000

Page 40

results

• 270 000 req/s sustained speed

• cache size: 10 to 20 million requests!

Page 42

try to tune!

• vm.dirty_background_ratio=40

• vm.dirty_ratio=60

Page 43

300 000 req/s

Page 44

results

• 300 000 req/s sustained speed

• 180k+ req/s with almost no caching

Page 45

Re:Lays

Page 46

Default graphite architecture

Page 47

arch forward

Page 48

arch named/regexp

Page 49

arch hash

Page 50

arch hash, replication factor: 2
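Hash-based relays place caches on a consistent-hash ring and send each metric to the node that owns its position. With a replication factor of 2, the relay also sends it to the next distinct node clockwise. The sketch below is a generic ring with virtual nodes, hashing with MD5. It only shows the idea and is not byte-for-byte compatible with the ring used by carbon-relay.py or carbon-c-relay.

```go
package main

import (
	"crypto/md5"
	"encoding/binary"
	"fmt"
	"sort"
	"strconv"
)

// ring is a generic consistent-hash ring with virtual nodes.
type ring struct {
	points []uint32          // sorted hash positions
	owner  map[uint32]string // position -> node
	nodes  int
}

func hash32(s string) uint32 {
	sum := md5.Sum([]byte(s))
	return binary.BigEndian.Uint32(sum[:4])
}

func newRing(nodes []string, vnodes int) *ring {
	r := &ring{owner: make(map[uint32]string), nodes: len(nodes)}
	for _, n := range nodes {
		for i := 0; i < vnodes; i++ {
			p := hash32(n + ":" + strconv.Itoa(i))
			if _, taken := r.owner[p]; taken {
				continue // skip the rare collision
			}
			r.owner[p] = n
			r.points = append(r.points, p)
		}
	}
	sort.Slice(r.points, func(i, j int) bool { return r.points[i] < r.points[j] })
	return r
}

// replicas walks clockwise from the metric's hash and returns the first
// `factor` distinct nodes: with factor 2, every metric lives on two caches.
func (r *ring) replicas(metric string, factor int) []string {
	if factor > r.nodes {
		factor = r.nodes
	}
	h := hash32(metric)
	i := sort.Search(len(r.points), func(i int) bool { return r.points[i] >= h })
	seen := map[string]bool{}
	var out []string
	for k := 0; len(out) < factor && k < len(r.points); k++ {
		n := r.owner[r.points[(i+k)%len(r.points)]]
		if !seen[n] {
			seen[n] = true
			out = append(out, n)
		}
	}
	return out
}

func main() {
	r := newRing([]string{"cache-a", "cache-b", "cache-c"}, 100)
	for _, m := range []string{"so.it.is.my.metric", "servers.web1.cpu"} {
		fmt.Println(m, r.replicas(m, 2))
	}
}
```

Adding or removing a cache only moves the metrics whose ring segment changes, instead of reshuffling everything as a plain `hash % N` would. Replication factor 2 doubles the write load on the caches.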

Page 51

carbon-relay.py

• Twisted-based

• native (ships with Graphite)

Page 52

Start load testing

• c4.xlarge instance (4 CPU, 7.5 GB RAM)

• ~1 Gbit/s LAN

• default parameters

• hashing

• 10 connections

Page 53

WTF!

Page 54

carbon-relay-ng

• golang-based

• web-panel

• live-updates

• aggregators

• spooling


Page 55

< 150 000 req/s

Page 56

carbon-c-relay

• written in C

• advanced cluster management

Page 57

from 100 000 to 1 600 000 req/s

Page 58

1 400 000 req/s sustained speed. Or not?

Page 59

So… go-carbon + carbon-c-relay = ♡

Page 60

BTW: InfluxDB, 130k req/s on a cluster

Page 61

influx

Page 62

OpenTSDB single instance + HBase cluster = up to 150k req/s

Page 63

Also

• zipper:

• https://github.com/grobian/carbonserver

• https://github.com/grobian/carbonwriter

• https://github.com/dgryski/carbonzipper

• https://github.com/dgryski/carbonapi

• https://github.com/dgryski/carbonmem

• https://github.com/jssjr/carbonate

Page 64

plans

• Cyanite, retest

• newTS

• OpenTSDB tuning

• zipper tuning

Page 65

feel free to ask

• Vsevolod Polyakov


• skype: ctrlok1987

• github.com/ctrlok

• twitter.com/ctrlok

• slack: HangOps

• Gitter: dev_ua/devops

• skype: DevOps from Ukraine

