Analyze Prometheus Metrics Like a Data Scientist

Georg Öttl, Promcon 2017, Munich
Page 2

About me / experiences

● Enterprise Software Dev.

● Data Science Services

● Dev / DevOps / Ops

● Developer who likes Math

Twitter: @goettl

Page 3

Objective of the talk

Pushing the limits of Prometheus: can I build a more reliable alerting model with insights from data science?

● A journey through improving alerts / dashboards with insights from data science

● Integration points with open source data science tools

● Bring light into the dark (like Prometheus did)

Page 4

... should I?

Don't use deep learning and data science when a straightforward rule-based system, built in 15 minutes, does well.

Data science can help you detect patterns and facts in your metrics that you can't see otherwise.

Page 5

What is already available? When do I start?

● Great architecture to get high-quality data

● Numerical data

  ● Apply mathematical functions to it

● Easy and fast to navigate (PromQL)

● Alert / rule model

● Chart / histogram visualization with Grafana

Page 6

Next step: get data out of Prometheus... to be used in open source data science tools

Page 7

What data to export?

● Raw metrics data, with no functions applied to it

● As much as possible

  ● Without putting too much load on Prometheus or running into a timeout
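One way to export "as much as possible" without running into timeouts is to split a long time range into several smaller query_range requests. The sketch below is a minimal version that assumes Unix-second timestamps and a numeric step in seconds. The helper name split_range and the max_points budget are hypothetical. They are not part of Prometheus or any client library.

```python
from typing import Iterator, Tuple


def split_range(start: int, end: int, step: int, max_points: int) -> Iterator[Tuple[int, int]]:
    """Split [start, end] (Unix seconds) into consecutive query_range windows.

    Each window covers at most max_points evaluation timestamps
    (start, start + step, ...), so every request stays small.
    """
    if step <= 0 or max_points <= 0:
        raise ValueError("step and max_points must be positive")
    chunk_start = start
    while chunk_start <= end:
        # A window [s, s + (n - 1) * step] contains n evaluation timestamps.
        chunk_end = min(chunk_start + (max_points - 1) * step, end)
        yield chunk_start, chunk_end
        # Begin the next window one step later so no timestamp is fetched twice.
        chunk_start = chunk_end + step
```

Each (start, end) pair can then go into a separate /api/v1/query_range request with the same step. The windows stay on the original start + k * step grid, so the concatenated results line up.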

Page 8

Two ways to get data out of Prometheus

● HTTP API (poll)

  ● Exploratory data analysis

● Remote write API (push)

  ● Streaming analysis

Page 9

HTTP API - /api/v1/query_range

import requests

response = requests.get(
    url='http://127.0.0.1:9090/api/v1/query_range',
    params={
        'query': 'sum({__name__=~".+"}) by (__name__,instance)',
        'start': '1502809554',
        'end': '1502839554',
        'step': '1m',
    })
response.raise_for_status()

Response (abridged):

{"data": {..., "resultType": "matrix", "result": [{"metric": {"method": "GET", ...}, "values": [[1500008340, "3"], ...]}, ...]}}
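To get from this JSON response to a tabular format, the matrix result can first be flattened into long-format rows. The sketch below uses only the standard library. It assumes the parsed JSON body as input (for example, response.json() with requests). The series-id format (sorted "label=value" pairs) is an arbitrary choice made for this sketch.

```python
from typing import Any, Dict, List, Tuple


def matrix_to_rows(payload: Dict[str, Any]) -> List[Tuple[str, float, float]]:
    """Flatten a query_range 'matrix' response into (series_id, timestamp, value) rows."""
    data = payload["data"]
    if data["resultType"] != "matrix":
        raise ValueError("expected a matrix result, got " + data["resultType"])
    rows: List[Tuple[str, float, float]] = []
    for series in data["result"]:
        labels = series["metric"]
        # Stable id from the sorted label set, e.g. "__name__=x,method=GET".
        series_id = ",".join(f"{k}={v}" for k, v in sorted(labels.items()))
        for ts, raw_value in series["values"]:
            # Sample values arrive as strings and may be "NaN" or "+Inf";
            # float() parses all of these.
            rows.append((series_id, float(ts), float(raw_value)))
    return rows
```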

Page 10

Target format for data science tools (tabular, CSV)

X (features):

id   time   value   req_dur   ...
A    1      1       4         ...
A    2      2       5         ...
B    1      2       3         ...
B    2      3       2         ...

y (labels):

id   time   value
A    1      1
A    2      1
B    1      0
B    2      0
...  ...    ...
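Long-format rows can be pivoted into the X layout shown above, with one row per (id, time) and one column per metric. The sketch below does this with the standard csv module. The input tuple layout (id, time, metric, value) is an assumption of this sketch. Pandas' DataFrame.pivot is a common alternative for the same step.

```python
import csv
import io
from typing import Dict, Iterable, List, Tuple


def to_wide_csv(rows: Iterable[Tuple[str, int, str, float]]) -> str:
    """Pivot (id, time, metric, value) rows into CSV with one column per metric."""
    table: Dict[Tuple[str, int], Dict[str, float]] = {}
    metrics: List[str] = []
    for entity, t, metric, value in rows:
        if metric not in metrics:
            metrics.append(metric)  # keep first-seen column order
        table.setdefault((entity, t), {})[metric] = value
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(["id", "time", *metrics])
    for entity, t in sorted(table):
        cells = table[(entity, t)]
        # Missing samples become empty cells instead of invented zeros.
        writer.writerow([entity, t, *(cells.get(m, "") for m in metrics)])
    return out.getvalue()
```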

Page 11

Easiest ways to export

● Grafana

● Python (Robust Perception blog entry)

Page 12

Reduce the data: use domain knowledge to select a relevant data subset

{__name__=~".+"}
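The selector {__name__=~".+"} matches every series. Once domain knowledge has narrowed the list to a few relevant metric names, a tighter selector can be generated from that list. The sketch below is a minimal version. The function name is hypothetical, and the metric names in the example are illustrative choices only.

```python
import re
from typing import Sequence

# Valid Prometheus metric names: [a-zA-Z_:][a-zA-Z0-9_:]*
_METRIC_NAME = re.compile(r"[a-zA-Z_:][a-zA-Z0-9_:]*")


def name_selector(metric_names: Sequence[str]) -> str:
    """Build a PromQL selector that matches only the given metric names."""
    if not metric_names:
        raise ValueError("select at least one metric")
    for name in metric_names:
        if not _METRIC_NAME.fullmatch(name):
            raise ValueError(f"not a valid metric name: {name!r}")
    # PromQL regex matchers are fully anchored, so each alternative must match
    # a whole name. Valid names contain no regex metacharacters, so no escaping.
    return '{__name__=~"' + "|".join(metric_names) + '"}'
```

For example, name_selector(["http_requests_total", "process_cpu_seconds_total"]) returns {__name__=~"http_requests_total|process_cpu_seconds_total"}.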

Page 13

Tip: Use alerts as an initial set of training labels

y = ALERTS{alertname="high_latency"}

Then tidy up, verify true positives, annotate manually, ...
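ALERTS series only have samples while an alert is pending or firing. If ALERTS{alertname="high_latency", alertstate="firing"} is queried over the same range and step as the features, any timestamp without a sample can therefore be read as label 0. The sketch below assumes both queries share one evaluation grid. The function name is hypothetical.

```python
from typing import Iterable, List


def alert_labels(grid: Iterable[int], firing_timestamps: Iterable[int]) -> List[int]:
    """Return 1 for each grid timestamp at which the alert was firing, else 0.

    Assumes both inputs come from query_range calls with the same start and
    step, so timestamps line up exactly.
    """
    firing = set(firing_timestamps)
    # No ALERTS sample at a timestamp means the alert was not firing then.
    return [1 if t in firing else 0 for t in grid]
```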

Page 14

Normalize Prometheus data types

● Gauges and histograms are OK

● Counters have to be processed

  ● A counter only goes up (until it resets), so its raw values never repeat and carry no statistical value

  ● Use e.g. a derivative function (such as PromQL's rate()) to convert a counter to a gauge equivalent
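If raw counter samples have already been exported, the derivative can also be taken after the export. The sketch below computes per-second rates between consecutive samples and treats any decrease as a counter reset. It is a simplified stand-in, not a port of PromQL's rate(). That function also extrapolates to the edges of its range window, so its numbers will differ slightly.

```python
from typing import List, Sequence, Tuple


def counter_to_rate(samples: Sequence[Tuple[float, float]]) -> List[Tuple[float, float]]:
    """Convert (timestamp, counter_value) samples into (timestamp, per-second rate)."""
    rates: List[Tuple[float, float]] = []
    for (t0, v0), (t1, v1) in zip(samples, samples[1:]):
        dt = t1 - t0
        if dt <= 0:
            raise ValueError("timestamps must be strictly increasing")
        # After a reset the counter restarts from 0, so v1 is the increase since then.
        increase = v1 - v0 if v1 >= v0 else v1
        rates.append((t1, increase / dt))
    return rates
```

For samples (0, 10), (60, 70), (120, 130), (180, 30), the result is [(60, 1.0), (120, 1.0), (180, 0.5)]. The drop from 130 to 30 is handled as a reset.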

Page 15

Examples

Applied data science on Prometheus metrics

Page 16

Example 1

I can predict the latency of HTTP requests

● Can I use the Prometheus function predict_linear?

● Are there other predictions possible?

(R notebook: predict_linear)
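Prometheus documents predict_linear(v range-vector, t scalar) as predicting the value of a series t seconds from now using simple linear regression. Re-implementing that regression outside Prometheus makes it easy to compare range-window and horizon settings offline. The sketch below is a plain ordinary least-squares fit written for illustration. It is not Prometheus' own code and may differ from it in edge cases.

```python
from typing import Sequence, Tuple


def predict_linear(samples: Sequence[Tuple[float, float]], eval_time: float,
                   seconds_ahead: float) -> float:
    """Fit v = intercept + slope * t by least squares and evaluate it at
    eval_time + seconds_ahead."""
    n = len(samples)
    if n < 2:
        raise ValueError("need at least two samples")
    mean_t = sum(t for t, _ in samples) / n
    mean_v = sum(v for _, v in samples) / n
    cov = sum((t - mean_t) * (v - mean_v) for t, v in samples)
    var = sum((t - mean_t) ** 2 for t, _ in samples)
    if var == 0:
        raise ValueError("timestamps must not all be equal")
    slope = cov / var
    intercept = mean_v - slope * mean_t
    return intercept + slope * (eval_time + seconds_ahead)
```

For example, the samples (0, 1), (60, 121), (120, 241) lie on v = 1 + 2t, so predict_linear(samples, 120, 60) returns 361.0.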

Page 17

Example 2

There are better-suited metrics for predicting HTTP 5xx failures than the one I use

Page 18

Choose method

Page 19

Get metrics into the right format for the method

● Training data with labels needed (X, y)

● Seasonally adjust
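The slides don't say which seasonal adjustment was used. One simple option, shown here as a swapped-in technique, is to subtract the average value for each hour of the day. This removes a daily pattern before the features are compared. The sketch below assumes Unix-second timestamps and a daily cycle in UTC.

```python
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple


def remove_daily_seasonality(samples: Sequence[Tuple[int, float]]) -> List[Tuple[int, float]]:
    """Subtract the mean of each UTC hour-of-day from the samples in that hour."""
    buckets: Dict[int, List[float]] = defaultdict(list)
    for ts, value in samples:
        buckets[(ts // 3600) % 24].append(value)
    hourly_mean = {hour: sum(vals) / len(vals) for hour, vals in buckets.items()}
    # Keep the input order; each value becomes its deviation from the hourly mean.
    return [(ts, value - hourly_mean[(ts // 3600) % 24]) for ts, value in samples]
```

Weekly patterns, or more robust methods such as STL decomposition, would need a different bucketing or a dedicated library.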

Page 20

Apply a feature selection algorithm

from sklearn.feature_selection import RFE
from sklearn.ensemble import RandomForestRegressor
...
# perform feature selection: recursively drop the weakest feature until one remains
rfe = RFE(
    RandomForestRegressor(
        n_estimators=500,
        random_state=1,
        min_samples_split=5
    ),
    n_features_to_select=1)
fit = rfe.fit(X, y)
...

Selected Feature: POST
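RFE needs scikit-learn and a trained model. As a quick, dependency-free cross-check of which series carry signal, the feature columns can also be ranked by their absolute Pearson correlation with y. This is a different and much simpler technique: a univariate filter, not recursive feature elimination. It therefore only captures linear, one-feature-at-a-time relationships. The column names used below are hypothetical method label values.

```python
import math
from typing import Dict, List, Sequence, Tuple


def _pearson(x: Sequence[float], y: Sequence[float]) -> float:
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # constant column or label: no linear relationship measurable
    return sxy / math.sqrt(sxx * syy)


def rank_by_correlation(features: Dict[str, Sequence[float]],
                        y: Sequence[float]) -> List[Tuple[str, float]]:
    """Rank feature columns by absolute Pearson correlation with y, best first."""
    scores = [(name, abs(_pearson(col, y))) for name, col in features.items()]
    return sorted(scores, key=lambda item: item[1], reverse=True)
```

For example, consider the columns GET = [1, 1, 1, 1], POST = [0, 1, 0, 1] and PUT = [3, 1, 2, 2], with y = [0, 1, 0, 1]. POST ranks first with a score of 1.0, PUT comes second with about 0.71, and the constant GET column scores 0.0.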

Page 21

Feedback cycle

Rewrite your alerts and dashboards to use the method="POST" series to better predict HTTP 5xx errors

Page 22

Example 3 - metrics / feature selection with the tsfresh library

● Metrics selection / ranking similar to example 2

● Metrics extension by applying functions to metrics

https://github.com/blue-yonder/tsfresh
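tsfresh automates this at scale. It computes many characteristics per series and can filter them by relevance. To make "metrics extension" concrete without the library, the hand-rolled sketch below derives a few features from one series. Two of the names mirror tsfresh feature calculators (abs_energy, mean_abs_change). However, this is not tsfresh's API, and it covers only a tiny fraction of what tsfresh computes.

```python
import statistics
from typing import Dict, Sequence


def extract_features(values: Sequence[float]) -> Dict[str, float]:
    """Turn one metric's series into a handful of derived features."""
    if len(values) < 2:
        raise ValueError("need at least two samples")
    changes = [abs(b - a) for a, b in zip(values, values[1:])]
    return {
        "mean": statistics.mean(values),
        "stdev": statistics.pstdev(values),  # population standard deviation
        "min": min(values),
        "max": max(values),
        "abs_energy": sum(v * v for v in values),  # sum of squared values
        "mean_abs_change": sum(changes) / len(changes),
    }
```

Applied to every series, this yields one row per series with one column per derived feature. A feature selection step like the one in example 2 can then rank that table.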

Page 23

Prometheus data science mantra

● Create hypotheses about your system and metrics

● Get metrics (DevOps) and convert them into the right format

● Use statistical methods to verify the hypotheses

● Feed the results back into the system: dashboards and alerts

Page 24

Lessons learned

● The alert model improves with insights from descriptive statistics and ML!

● Depending on the result, correct, discard, or handle data differently

● Day-to-day use case: e.g. less trial and error when configuring the predict_linear function

● No need to process streaming metrics with ML/AI yet

Page 25

Thanks for having me here at promcon.io 2017! Questions?

Georg Öttl, Twitter: @goettl

