Monitoring Is Never Done

Date post: 08-Aug-2015
Upload: melanie-cey
Transcript
Page 1: Monitoring Is Never Done

Monitoring is Never “Done”

@melaniemj

Page 2: Monitoring Is Never Done

Responsibilities @ Yardi

Implementation and administration of monitoring, alerting, and log aggregation/analysis tools.

o 15,000+ Devices
o 9 Datacenters
o 5,000+ Customer Installations
o We monitor Windows envs with Linux envs

Page 3: Monitoring Is Never Done

This was me in 2008 @ Point2

Page 4: Monitoring Is Never Done

How code is delivered

Page 5: Monitoring Is Never Done

How code operates in production

Page 6: Monitoring Is Never Done

A good problem to have

Everyone wants “the monitoring” so they can say “it’s monitored”

Page 7: Monitoring Is Never Done

Communicating Work

o Classify
o Quantify
o Qualify

Page 8: Monitoring Is Never Done

Words....

o Logging
o Alerting
o Dashboards
o Reports
o 4-9s
o 24x7x365 this shit can’t go down
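
The "4-9s" item is shorthand for 99.99% availability. As a rough worked example of what that word commits a team to, the sketch below converts a number of nines into a yearly downtime budget. The function name `downtime_budget_minutes` is hypothetical, and a 365-day year is assumed.

```python
def downtime_budget_minutes(nines: int, days: int = 365) -> float:
    """Return the allowed downtime in minutes for `nines` nines of availability.

    Assumption: availability is measured over `days` full days.
    """
    if nines < 1:
        raise ValueError("nines must be at least 1")
    unavailability = 10.0 ** -nines  # e.g. 4 nines -> 0.0001
    return days * 24 * 60 * unavailability


if __name__ == "__main__":
    for n in (2, 3, 4, 5):
        print(f"{n} nines: {downtime_budget_minutes(n):.2f} minutes/year")
```

Under these assumptions, four nines leaves roughly 52.6 minutes of downtime per year, which is a useful number to have in hand when someone says "this can't go down".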

Page 9: Monitoring Is Never Done

Can it be this simple?

Let’s talk about “the monitoring” for X

Be awesome

X is monitored

Page 10: Monitoring Is Never Done

DCVA (OODA)

Page 11: Monitoring Is Never Done

1. Definition

I can hit this one page, so it’s up, right?

No thanks, let’s redefine status
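
One way to read "redefine status" is that status should come from several named checks rather than from a single page load. The sketch below is only an illustration of that idea, not the speaker's implementation: `evaluate_status`, the state names, and the example check names are all hypothetical.

```python
from typing import Callable, Dict, List, Union

Check = Callable[[], bool]


def evaluate_status(checks: Dict[str, Check]) -> Dict[str, Union[str, List[str]]]:
    """Run every named check and summarize the result.

    Assumed states (hypothetical): "unknown" with no checks, "ok" when all
    pass, "down" when all fail, and "degraded" for anything in between.
    """
    if not checks:
        return {"state": "unknown", "failed": []}

    results = {}
    for name, check in checks.items():
        try:
            results[name] = bool(check())
        except Exception:
            # A check that raises is treated as a failed check.
            results[name] = False

    failed = sorted(name for name, ok in results.items() if not ok)
    if not failed:
        state = "ok"
    elif len(failed) == len(results):
        state = "down"
    else:
        state = "degraded"
    return {"state": state, "failed": failed}


if __name__ == "__main__":
    # Hypothetical checks; real ones would query the system being monitored.
    example = {
        "home_page_loads": lambda: True,
        "login_works": lambda: False,
    }
    print(evaluate_status(example))
```

In this sketch, a page that loads while login is broken reports as degraded instead of "up", and the list of failed checks tells the reader where to look first.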

Page 12: Monitoring Is Never Done

1. Definition

o What questions are you trying to answer?
o What information do you need when a failure occurs?
o What are the most common failures?
o Who is the audience for the information?

Page 13: Monitoring Is Never Done

2. Checks & Collections

o Environment & Code
o Data points
o Detailed logs
o Current state

Page 14: Monitoring Is Never Done

3. Visualization

o Analysis
o Dashboards
o Correlations

Page 15: Monitoring Is Never Done

4. Action

o Fault detection
o Alerting
o RCA
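
As a small illustration of how fault detection can feed alerting, the sketch below fires an alert only after several failed checks in a row, which is one common way to avoid paging people for a single blip. This is an assumed technique chosen for the example, not something the slides prescribe; the `FaultDetector` class and its default threshold of 3 are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class FaultDetector:
    """Turn a stream of check results into alert decisions."""

    threshold: int = 3             # assumed: failures in a row before alerting
    consecutive_failures: int = 0  # current run of failed checks
    alerting: bool = False         # True while an alert is already open

    def observe(self, check_passed: bool) -> bool:
        """Record one check result; return True only when a new alert should fire."""
        if check_passed:
            # Recovery resets the run and closes any open alert.
            self.consecutive_failures = 0
            self.alerting = False
            return False
        self.consecutive_failures += 1
        if self.consecutive_failures >= self.threshold and not self.alerting:
            self.alerting = True
            return True
        return False


if __name__ == "__main__":
    detector = FaultDetector(threshold=2)
    for passed in (False, False, False, True, False, False):
        print(passed, detector.observe(passed))
```

The detector alerts once per outage rather than on every failed check, and the recorded check history is the kind of data that later supports RCA.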

Page 16: Monitoring Is Never Done
Page 17: Monitoring Is Never Done

Cycle

1. Definition (What to collect)

2. Checks & Collections (How to collect)

3. Visualization (Make collections pretty)

4. Action (Inform on failure)

Page 18: Monitoring Is Never Done

Team Time Distribution

Page 19: Monitoring Is Never Done

Time Distribution (Desired)

Page 20: Monitoring Is Never Done

Is “X” monitored?

When “X” goes into some degraded state:

o The right people know.

o They have enough information to find the problem, recover, and later do RCA.

o If they don’t, they revisit the definition.

Page 21: Monitoring Is Never Done

How does your team

o Classify
o Quantify
o Qualify

Page 22: Monitoring Is Never Done

Monitoring is Never “Done”

Melanie Cey @melaniemj

Senior Systems Analyst
Systems Reliability Engineering @ Yardi

