
Migrating to Prometheus at Fastly - PromCon NA 2020

Date post: 27-Jul-2020
Monitoring at Scale Migrating to Prometheus at Fastly PROMCON 2018 | Marcus Barczak @ickymettle
Transcript

Monitoring at Scale
Migrating to Prometheus at Fastly

PROMCON 2018 | Marcus Barczak @ickymettle


How were we monitoring Fastly?




Growing pains with Ganglia

๏ Operational overhead.

๏ Limited graphing functions.

๏ No alerting support.

๏ No real API for consuming metric data.




Growing pains doubled

๏ Now supporting two systems.

๏ Where do I put my metrics?

๏ Still writing external plugins and agents.

๏ Monitoring treated as a "post-release" phase.


Scaling our infrastructure horizontally required scaling our monitoring vertically.


Third time lucky


Project goals

๏ Scale with our infrastructure growth.

๏ Be easy to deploy and operate.

๏ Engineer friendly instrumentation libraries.

๏ First class API support for data access.

๏ To reinvigorate our monitoring culture.

See: https://peter.bourgon.org/observability-the-hard-parts/


?


Getting started

๏ Build a proof of concept.

๏ Pair with pilot team to instrument their services.

๏ Iterate through the rest.

๏ Run both systems in parallel.

๏ Decommission SaaS system and Ganglia.


Infrastructure build


[Diagram: in the SJC datacenter, two Prometheus servers (prometheus A and prometheus B) each scrape the same targets.]


[Diagram: the same pair of Prometheus servers (prometheus A and prometheus B) runs in each datacenter (SJC, JFK, ATL), and each server scrapes the targets in its own datacenter.]


[Diagram: a pair of Prometheus servers (prometheus A and prometheus B) scrapes targets in each datacenter (SJC, JFK, ATL). In GCP, federator A and federator B sit in front of the datacenter servers, together with the frontend stack.]


[Diagram: a pair of Prometheus servers (prometheus A and prometheus B) scrapes targets in each datacenter (SJC, JFK, ATL). In GCP, federator A and federator B sit alongside the frontend stack. Query traffic between GCP and the datacenters travels over TLS.]
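The slides do not say how the federators pull data from the per-datacenter pairs. One plausible reading is that each federator is an ordinary Prometheus server using the built-in `/federate` endpoint. The sketch below generates such a scrape job under that assumption; the hostnames, the `datacenter` label, and the `job:.*` matcher are all hypothetical and not taken from the talk.

```python
import json
from typing import Dict, List


def federation_job(datacenter: str, servers: List[str], matchers: List[str]) -> Dict:
    """Build one Prometheus scrape job that federates from a datacenter's A/B pair."""
    return {
        "job_name": f"federate_{datacenter.lower()}",
        "scheme": "https",            # query traffic travels over TLS
        "honor_labels": True,         # keep the labels set by the source servers
        "metrics_path": "/federate",  # Prometheus' built-in federation endpoint
        "params": {"match[]": list(matchers)},
        "static_configs": [
            {"targets": list(servers), "labels": {"datacenter": datacenter}}
        ],
    }


if __name__ == "__main__":
    # Hypothetical hostnames; the real naming scheme is not in the slides.
    pairs = {
        dc: [f"prometheus-a.{dc.lower()}.example.com:443",
             f"prometheus-b.{dc.lower()}.example.com:443"]
        for dc in ("SJC", "JFK", "ATL")
    }
    # Federate only aggregated recording rules (an assumed naming convention).
    jobs = [federation_job(dc, hosts, ['{__name__=~"job:.*"}'])
            for dc, hosts in pairs.items()]
    print(json.dumps({"scrape_configs": jobs}, indent=2))
```

Restricting `match[]` to pre-aggregated series keeps the federators small, which is the usual reason to federate rather than scrape everything twice.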


Prometheus Server Software Stack

Ghost Tunnel: TLS termination and auth.

Service Discovery Sidecar: target configuration.

Rules Loader: recording and alert rules.

Prometheus


Prometheus Server Software Stack

Ghost Tunnel: TLS termination and auth.

Service Discovery Sidecar: target configuration.

Rules Loader: recording and alert rules.

Prometheus

Typical Server Software Stack

Service Discovery Proxy: service discovery and TLS exporter proxy.

Exporters: built into services or sidecar.


Build your own service discovery?


Fastly's infrastructure is bare metal hardware: no cloud conveniences.


Service discovery requirements

๏ Automatic discovery of targets.

๏ Self-service registration of exporter endpoints.

๏ TLS encryption for all exporter traffic.

๏ Minimal exposure of exporter TCP ports.


Prometheus Server Software Stack

Ghost Tunnel: TLS termination and auth.

PromSD Sidecar: target configuration. Queries the PromSD Proxy for available targets and generates config for Prometheus.

Prometheus: scrapes proxied targets over TLS.

Typical Server Software Stack

PromSD Proxy: service discovery and TLS exporter proxy.

Exporters: built into services or sidecar.


PromSD sidecar

1. Fetch the list of hosts in a datacenter from configly:

   "exporter_hosts": [ "10.0.0.1", "10.0.0.2", "10.0.0.3", "10.0.0.4" ]

2. Request the /targets endpoint of the promsd proxy on each host to get the list of available scrape targets.

3. Output all targets as a file service discovery JSON file:

   [
     {
       "targets": [ "10.0.0.1:9702", "10.0.0.2:9702" ],
       "labels": {
         "__metrics_path__": "/node_exporter_9100/metrics",
         "job": "node_exporter"
       }
     },
     {
       "targets": [ "10.0.0.1:9702", "10.0.0.2:9702" ],
       "labels": {
         "__metrics_path__": "/varnishstat_exporter_19102/metrics",
         "job": "varnishstat_exporter"
       }
     }
   ]

4. Prometheus reads the file and scrapes the configured targets.
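The sidecar's four steps can be sketched roughly as follows. This is a minimal illustration, not Fastly's implementation: the shape of a `/targets` entry (`{"job": ..., "path": ...}`), the function names, and the injected `fetch_targets` callable are all assumptions. Only port 9702 and the output format come from the slide.

```python
import json
import os
import tempfile
from typing import Callable, Dict, List

PROXY_PORT = 9702  # proxy port seen in the slide's example targets

# Hypothetical shape of one /targets entry: {"job": "...", "path": "..."}
TargetEntry = Dict[str, str]


def build_target_groups(
    hosts: List[str],
    fetch_targets: Callable[[str], List[TargetEntry]],
) -> List[dict]:
    """Ask every host's proxy for its exporters and group hosts per exporter."""
    grouped: Dict[tuple, List[str]] = {}
    for host in hosts:
        for entry in fetch_targets(host):
            key = (entry["job"], entry["path"])
            grouped.setdefault(key, []).append(f"{host}:{PROXY_PORT}")
    return [
        {
            "targets": sorted(addresses),
            "labels": {"__metrics_path__": path, "job": job},
        }
        for (job, path), addresses in sorted(grouped.items())
    ]


def write_file_sd(groups: List[dict], path: str) -> None:
    """Write via a temp file and rename so readers never see a partial file."""
    directory = os.path.dirname(os.path.abspath(path))
    with tempfile.NamedTemporaryFile("w", dir=directory, delete=False) as tmp:
        json.dump(groups, tmp, indent=2)
    os.replace(tmp.name, path)


def fake_fetch(host: str) -> List[TargetEntry]:
    """Stand-in for an HTTPS GET of the proxy's /targets endpoint on `host`."""
    return [
        {"job": "node_exporter", "path": "/node_exporter_9100/metrics"},
        {"job": "varnishstat_exporter", "path": "/varnishstat_exporter_19102/metrics"},
    ]


if __name__ == "__main__":
    groups = build_target_groups(["10.0.0.1", "10.0.0.2"], fake_fetch)
    print(json.dumps(groups, indent=2))
```

On the Prometheus side, a scrape job with a `file_sd_configs` entry pointing at the written file would pick up changes without a restart, since Prometheus watches file SD files for updates.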


PromSD proxy

1. Fetch the list of installed systemd services from systemd: node_exporter, process_exporter, varnishstat_exporter.

2. For each corresponding systemd service, fetch the local exporter target address from configly:

   "node_exporter": { "prometheus_properties": { "target": "127.0.0.1:9100" } }, … "varnishstat_exporter": { "prometheus_properties": { "target": "127.0.0.1:19102" } }

3. Expose an API used by Prometheus and the promsd sidecar:

   /node_exporter_9100/metrics
   /varnish_exporter_19102/metrics
   /targets (sidecar)
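The proxy's routing logic can be sketched as below, assuming the path convention `/<service>_<port>/metrics` seen in the slide. The `Route` type, the function names, the `/targets` response shape, and the assumption that local exporters serve plain HTTP on `/metrics` are hypothetical. TLS termination and the HTTP server itself are left out.

```python
from typing import Dict, List, NamedTuple, Optional


class Route(NamedTuple):
    job: str       # systemd service name, reused as the Prometheus job label
    upstream: str  # local exporter address, e.g. "127.0.0.1:9100"


def build_routes(installed_services: List[str],
                 properties: Dict[str, dict]) -> Dict[str, Route]:
    """Map each proxied path to the local exporter registered for a service."""
    routes: Dict[str, Route] = {}
    for service in installed_services:
        props = properties.get(service, {}).get("prometheus_properties")
        if props is None:
            continue  # installed service with no registered exporter target
        upstream = props["target"]
        port = upstream.rsplit(":", 1)[1]
        routes[f"/{service}_{port}/metrics"] = Route(service, upstream)
    return routes


def targets_response(routes: Dict[str, Route]) -> List[Dict[str, str]]:
    """Body of the /targets endpoint consumed by the sidecar (assumed shape)."""
    return [{"job": route.job, "path": path}
            for path, route in sorted(routes.items())]


def upstream_url(routes: Dict[str, Route], request_path: str) -> Optional[str]:
    """Resolve an incoming scrape path to the local exporter URL, if any."""
    route = routes.get(request_path)
    return None if route is None else f"http://{route.upstream}/metrics"


if __name__ == "__main__":
    services = ["node_exporter", "process_exporter", "varnishstat_exporter"]
    # process_exporter is omitted here purely for illustration of the skip path.
    configly = {
        "node_exporter": {"prometheus_properties": {"target": "127.0.0.1:9100"}},
        "varnishstat_exporter": {"prometheus_properties": {"target": "127.0.0.1:19102"}},
    }
    routes = build_routes(services, configly)
    print(targets_response(routes))
    print(upstream_url(routes, "/node_exporter_9100/metrics"))
```

Because every exporter is reached through a path on the proxy, only the proxy's port needs to be open on the host, which matches the "minimal exposure of exporter TCP ports" requirement.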


It worked!

๏ Really easy to leverage the file SD mechanism.

๏ New targets can be added with one line of config.

๏ TLS and authentication everywhere.

๏ Single exporter port open per host.


Prometheus Adoption


Prometheus at Scale at Fastly

114 Prometheus servers globally

28.4M time series

2.2M samples/second
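A quick back-of-the-envelope reading of these three figures, as a sketch only. It assumes load is spread evenly across servers and that every series is sampled at the same rate, neither of which the slide states. Under those assumptions the numbers work out to roughly 249,000 series per server and an average of about 12.9 seconds between samples of a given series.

```python
# Figures from the slide.
servers = 114
time_series = 28.4e6
samples_per_second = 2.2e6

# Averages under the even-spread assumption described above.
series_per_server = time_series / servers
samples_per_server = samples_per_second / servers
avg_seconds_between_samples = time_series / samples_per_second

print(f"series per server:          {series_per_server:,.0f}")
print(f"samples/second per server:  {samples_per_server:,.0f}")
print(f"avg seconds between samples: {avg_seconds_between_samples:.1f}")
```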


... a few hours later


Prometheus wins

๏ Engineers love it.

๏ Dashboard and alert quality have increased.

๏ PromQL enables some deep insights.

๏ Scaling linearly with our infrastructure growth.


Still some rough edges

๏ Metrics exploration without prior knowledge.

๏ Alertmanager's flexibility.

๏ Federation and global views.

๏ Long term storage still an open question.


😍


Thanks! @ickymettle fastly.com

