
Talk @ GrafanaCon 2016

Date post: 16-Apr-2017
Upload: utkarsh-bhatnagar
Transcript
Page 1: Talk @ GrafanaCon 2016

Elastic-Monitoring using@

Page 2: Talk @ GrafanaCon 2016

Utkarsh Bhatnagar

• Senior Software Engineer @ Sony Interactive Entertainment (PlayStation).
• An active contributor to Grafana.
• Project initiator for wizzy – a user friendly CLI tool for GRAFANA

GitHub - https://github.com/utkarshcmu
Email – [email protected]

Page 3: Talk @ GrafanaCon 2016
Page 4: Talk @ GrafanaCon 2016

PlayStation Outage!

Page 5: Talk @ GrafanaCon 2016

Hi, I am Jack.

Sometime 2 years back…

Page 6: Talk @ GrafanaCon 2016

POC on Monitoring

Requirements:

• 50,000 unique metrics from one source
• Data points every minute
• About 72 million data points per day
• Data retention 60 days
• User friendly UI with possible customization
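
The 72 million figure follows directly from the first two requirements. A minimal sketch of that arithmetic, extended to the 60-day retention window, is below. The storage estimate is an assumption: it uses 12 bytes per point, which is the on-disk point size of Whisper (the Graphite storage format that appears later in the talk), and ignores file headers and any downsampled archives.

```python
# Back-of-the-envelope sizing for the POC requirements listed above.
UNIQUE_METRICS = 50_000
POINTS_PER_METRIC_PER_DAY = 24 * 60   # one data point every minute
RETENTION_DAYS = 60

# Assumption: 12 bytes per stored point (Whisper's point size).
# The slide itself does not state a storage format.
BYTES_PER_POINT = 12

points_per_day = UNIQUE_METRICS * POINTS_PER_METRIC_PER_DAY
points_retained = points_per_day * RETENTION_DAYS
storage_gb = points_retained * BYTES_PER_POINT / 1e9

print(f"{points_per_day:,} data points per day")
print(f"{points_retained:,} data points retained over {RETENTION_DAYS} days")
print(f"~{storage_gb:.1f} GB at {BYTES_PER_POINT} bytes per point")
```

Under these assumptions the POC is a modest workload: 72,000,000 points per day and roughly 52 GB of raw points at full resolution.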

Page 7: Talk @ GrafanaCon 2016

Choosing the technology!

Page 8: Talk @ GrafanaCon 2016

POC Design & Architecture

METRIC SOURCE

Page 9: Talk @ GrafanaCon 2016

POC Completed!

Mission accomplished!

1 metric source
50,000 unique metrics

72 million data points per day

Page 10: Talk @ GrafanaCon 2016

Metrics Onboarding

Team 1 Requirements:
• 100,000 unique metrics
• About 200 million data points per day

Team 2 Requirements:
• 400,000 unique metrics
• About 600 million data points per day

Team 3 Requirements:
• 500,000 unique metrics
• About 2 billion data points per day

Team 4 Requirements:
• 800,000 unique metrics
• About 5 billion data points per day

And more…

Page 11: Talk @ GrafanaCon 2016

POC Design & Architecture

METRIC SOURCE

Page 12: Talk @ GrafanaCon 2016

How to Scale?

Should he continue with Graphite?
Should he ask to reduce metrics or data points?
How to dynamically scale Graphite?
Does Grafana support other datasources?
OpenTSDB / InfluxDB / KairosDB / Prometheus?
Can the infrastructure scale to support a variable load of metrics?

Challenges:
• Multiple teams
• Millions of unique metrics
• Above 10 billion data points a day
• Process 3 million logs every minute and generate metrics
• Reprocessing of metrics and logs if needed
• Provide real time monitoring for all of the above using GRAFANA!

Page 13: Talk @ GrafanaCon 2016

Strategy

Divide & Conquer

Team 1 Requirements:
• 100,000 unique metrics
• About 200 million data points per day

Team 2 Requirements:
• 500,000 unique metrics
• About 2 billion data points per day

Team 3 Requirements:
• 3 million logs a minute
• Generate metrics in real time

And more…

Team 1 Requirements:
• 100,000 unique metrics
• About 200 million data points per day

Page 14: Talk @ GrafanaCon 2016

Design & Architecture

POC METRIC SOURCE

POC works for:

1 metric source
50,000 unique metrics
72 million data points per day

Team 1 requirements:

1 metric source
100,000 unique metrics
200 million data points per day

TEAM 1 METRIC SOURCE

Page 15: Talk @ GrafanaCon 2016

Team 1 Conquered!

This strategy works! Bring it on!

Page 16: Talk @ GrafanaCon 2016

Strategy

Divide & Conquer

Team 1 Requirements:
• 100,000 unique metrics
• About 200 million data points per day

Team 2 Requirements:
• 500,000 unique metrics
• About 2 billion data points per day

Team 3 Requirements:
• 3 million logs a minute
• Generate metrics in real time

And more…

Team 2 Requirements:
• 500,000 unique metrics
• About 2 billion data points per day

Page 17: Talk @ GrafanaCon 2016

Design & Architecture

POC METRIC SOURCE

Team 2 requirements:

1 metric source
500,000 unique metrics
2 billion data points per day

TEAM 1 METRIC SOURCE

TEAM 2 METRIC SOURCE

Page 18: Talk @ GrafanaCon 2016

Team 2 Conquered!

Page 19: Talk @ GrafanaCon 2016

Design & Architecture

POC METRIC SOURCE

TEAM 1 METRIC SOURCE

TEAM 2 METRIC SOURCE

Team 2 requirements:

1 metric source
500,000 unique metrics
2 billion data points per day

Page 20: Talk @ GrafanaCon 2016
Page 21: Talk @ GrafanaCon 2016

Scaling Graphite

Clustering Graphite

CARBON RELAY

CARBON CACHE + WHISPER + GRAPHITE WEB
CARBON CACHE + WHISPER + GRAPHITE WEB
CARBON CACHE + WHISPER + GRAPHITE WEB
. . .

GRAPHITE WEB
GRAPHITE WEB

LOAD BALANCER
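
In a cluster like the one above, the relay has to decide which cache node receives each metric. One common approach is consistent hashing, which carbon-relay offers as a routing method. The sketch below is a toy hash ring meant to illustrate the idea only; it is not carbon's implementation, and the node names, replica count, and `HashRing` class are hypothetical. The slides do not say which relay method was actually used.

```python
import bisect
import hashlib


class HashRing:
    """Toy consistent-hash ring mapping metric names to cache nodes."""

    def __init__(self, nodes, replicas=100):
        # Each node is placed on the ring many times ("virtual nodes")
        # so that metrics spread more evenly across nodes.
        self._ring = []
        for node in nodes:
            for i in range(replicas):
                self._ring.append((self._position(f"{node}:{i}"), node))
        self._ring.sort()
        self._positions = [pos for pos, _ in self._ring]

    @staticmethod
    def _position(key):
        # Use the first 32 bits of an MD5 digest as the ring position.
        digest = hashlib.md5(key.encode("utf-8")).hexdigest()
        return int(digest[:8], 16)

    def node_for(self, metric):
        # Walk clockwise to the first ring entry at or after the metric,
        # wrapping around to the start of the ring if needed.
        idx = bisect.bisect_left(self._positions, self._position(metric))
        return self._ring[idx % len(self._ring)][1]


# Hypothetical node and metric names, for illustration only.
ring = HashRing(["cache-a", "cache-b", "cache-c"])
for metric in ("team1.api.latency.p99", "team2.db.connections"):
    print(metric, "->", ring.node_for(metric))
```

The property that matters for scaling is that adding a node only reassigns the metrics that land on the new node; everything else stays where its existing data files already live.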

Page 22: Talk @ GrafanaCon 2016

Design & Architecture

POC METRIC SOURCE

TEAM 1 METRIC SOURCE

TEAM 2 METRIC SOURCE

Team 2 requirements:

1 metric source
500,000 unique metrics
2 billion data points per day

CR

G G G. . .

GW GW

LB

Page 23: Talk @ GrafanaCon 2016

Team 2 Conquered!

But… happiness lasted only for a month

Page 24: Talk @ GrafanaCon 2016

Design & Architecture

POC METRIC SOURCE

TEAM 1 METRIC SOURCE

TEAM 2 METRIC SOURCE

Team 2 requirements:

1 metric source
500,000 unique metrics
2 billion data points per day

CR

G G G. . .

GW GW

LB

Page 25: Talk @ GrafanaCon 2016

Scalable Alternatives To Graphite

Page 26: Talk @ GrafanaCon 2016

Design & Architecture

POC METRIC SOURCE

TEAM 1 METRIC SOURCE

TEAM 2 METRIC SOURCE

Team 2 requirements:

1 metric source
500,000 unique metrics
2 billion data points per day

CR

G G G. . .

GW GW

LB

Page 27: Talk @ GrafanaCon 2016

Team 2 Conquered!

Finally!

Page 28: Talk @ GrafanaCon 2016

Strategy

Divide & Conquer

Team 1 Requirements:
• 100,000 unique metrics
• About 200 million data points per day

Team 2 Requirements:
• 500,000 unique metrics
• About 2 billion data points per day

Team 3 Requirements:
• 3 million logs a minute
• Generate metrics in real time

And more…

Team 3 Requirements:
• 3 million logs a minute
• Generate metrics in real time

Page 29: Talk @ GrafanaCon 2016

How to process logs at scale?

Page 30: Talk @ GrafanaCon 2016

Design & Architecture

POC METRIC SOURCE

TEAM 1 METRIC SOURCE

Team 3 requirements:

Over 5000 log sources
3 million logs per minute

TEAM 2 METRIC SOURCE

LOGS SOURCES
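
Generating metrics from logs comes down to parsing each line, bucketing it by time, and counting. The sketch below shows that step on a single machine. The log format, the `logs` metric prefix, and both function names are hypothetical, and the output uses Graphite's plaintext line format (`<path> <value> <timestamp>`) purely as an illustration. The transcript does not name the pipeline used at this scale, so this is not the speaker's implementation.

```python
import re
from collections import Counter

# Hypothetical log shape: "<epoch_seconds> <service> <http_status>"
LOG_LINE = re.compile(r"^(?P<ts>\d+) (?P<service>[\w-]+) (?P<status>\d{3})$")


def logs_to_metrics(lines, bucket_seconds=60):
    """Count log events per (bucket start, service, status class)."""
    counts = Counter()
    for line in lines:
        match = LOG_LINE.match(line.strip())
        if match is None:
            continue  # skip lines that do not parse
        ts = int(match["ts"])
        bucket = ts - ts % bucket_seconds  # start of the time bucket
        status_class = match["status"][0] + "xx"
        counts[(bucket, match["service"], status_class)] += 1
    return counts


def to_plaintext(counts, prefix="logs"):
    """Render counts as '<path> <value> <timestamp>' lines."""
    return [
        f"{prefix}.{service}.{status_class}.count {value} {bucket}"
        for (bucket, service, status_class), value in sorted(counts.items())
    ]


sample = [
    "1479340800 checkout 200",
    "1479340815 checkout 200",
    "1479340830 checkout 503",
    "1479340861 login 200",
    "not a log line",
]
for line in to_plaintext(logs_to_metrics(sample)):
    print(line)
```

At 3 million logs per minute across more than 5000 sources, the same counting step would have to be spread over many workers; the per-bucket counters are additive, which is what makes that split possible.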

Page 31: Talk @ GrafanaCon 2016

Team 3 Conquered!

But… one day…

Page 32: Talk @ GrafanaCon 2016

Design & Architecture

POC METRIC SOURCE

TEAM 1 METRIC SOURCE

TEAM 2 METRIC SOURCE

LOGS SOURCES

Page 33: Talk @ GrafanaCon 2016

Design & Architecture

METRIC SOURCE 1

METRIC SOURCE 2

METRIC SOURCE 3

METRIC SOURCE N

LOGS SOURCES

LB

Alerting

Page 34: Talk @ GrafanaCon 2016

Metrics & Logs Sources

Application Metrics
- Apps using a stats library written by Alexander Filipchik (Principal Engineer @ PlayStation)

Custom metrics
- From other sources

Page 35: Talk @ GrafanaCon 2016

Some numbers

• More than 4 million unique metrics supported
  - creation and deletion happens all the time
• More than 11 billion data points written per day
  - across all TSDBs
• Processing about 40 billion events per day
  - logs and metrics events in near real time (within 30 seconds)
• More than 3000 requests per minute to Grafana dashboards
  - around 7000 during outages
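
The daily totals above are easier to reason about as sustained per-second rates. A minimal sketch of the conversion, using only the figures from this slide and assuming the load is spread evenly over the day (real traffic will have peaks):

```python
SECONDS_PER_DAY = 24 * 60 * 60

# Daily figures taken from the slide; the slide states them as lower bounds.
figures_per_day = {
    "data points written": 11_000_000_000,
    "events processed": 40_000_000_000,
}
for name, per_day in figures_per_day.items():
    print(f"{name}: ~{per_day / SECONDS_PER_DAY:,.0f} per second")

# Dashboard traffic is given per minute on the slide.
dashboard_requests_per_minute = {"normal": 3000, "outage": 7000}
for label, per_minute in dashboard_requests_per_minute.items():
    print(f"Grafana requests ({label}): ~{per_minute / 60:.0f} per second")
```

Averaged out, that is roughly 127,000 data points written and 463,000 events processed every second, against only about 50 dashboard requests per second in normal operation.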

Page 36: Talk @ GrafanaCon 2016

Lessons Learned

Page 37: Talk @ GrafanaCon 2016

Strategy

Divide & Conquer

Page 38: Talk @ GrafanaCon 2016
Page 39: Talk @ GrafanaCon 2016

Look for alternatives!

Page 40: Talk @ GrafanaCon 2016

Choose scalable components!

(Subject to effort and time)

Page 41: Talk @ GrafanaCon 2016

Automation

Page 42: Talk @ GrafanaCon 2016

Design & Architecture

METRIC SOURCE 1

METRIC SOURCE 2

METRIC SOURCE 3

METRIC SOURCE N

LOGS SOURCES

LB

Alerting

Page 43: Talk @ GrafanaCon 2016
Page 44: Talk @ GrafanaCon 2016

What’s Next for Jack?

Page 45: Talk @ GrafanaCon 2016
Page 46: Talk @ GrafanaCon 2016

Power of Open-Source

Sep 21st, 2015

Nov 17th, 2016

Grafana Pull Requests:
• Total – 144
• Accepted – 128
• Declined – 14
• Open – 2

Contribute More!

Page 48: Talk @ GrafanaCon 2016

Use Cases

• Prod, Stage and Dev installations of Grafana
• Move/Copy rows, panels from one dashboard to another
• Version control your dashboards
• Manage Grafana entities like orgs, etc. via CLI
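
Version-controlling dashboards works best when each dashboard is stored as a file whose contents change only when the dashboard does. The sketch below illustrates that idea with plain JSON files. It is not wizzy's implementation or file layout (neither is shown in the transcript); the function names, the slug rule, and the example dashboard are all hypothetical.

```python
import json
import tempfile
from pathlib import Path


def save_dashboard(dashboard, directory):
    """Write one dashboard dict to '<directory>/<slug>.json'."""
    slug = dashboard["title"].lower().replace(" ", "-")
    path = Path(directory) / f"{slug}.json"
    path.parent.mkdir(parents=True, exist_ok=True)
    # Sorted keys and fixed indentation keep diffs small between commits.
    text = json.dumps(dashboard, indent=2, sort_keys=True) + "\n"
    path.write_text(text, encoding="utf-8")
    return path


def load_dashboard(path):
    """Read a dashboard dict back from a JSON file."""
    return json.loads(Path(path).read_text(encoding="utf-8"))


# Made-up dashboard; a real one would come from a Grafana export.
example = {"title": "Team 1 Overview", "rows": [], "tags": ["team1"]}
with tempfile.TemporaryDirectory() as workdir:
    saved = save_dashboard(example, workdir)
    restored = load_dashboard(saved)
    print(saved.name, restored == example)
```

Files written this way can be committed to any version control system, and the same files can then be pushed to separate Prod, Stage and Dev installations.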

Page 49: Talk @ GrafanaCon 2016
Page 50: Talk @ GrafanaCon 2016

Utkarsh Bhatnagar

• Senior Software Engineer @ Sony Interactive Entertainment (PlayStation).
• An active contributor to Grafana.
• Project initiator for wizzy – a user friendly CLI tool for GRAFANA

GitHub - https://github.com/utkarshcmu
Email – [email protected]

