Monitoring Is Never Done

Posted on 08-Aug-2015

Monitoring is Never “Done”

@melaniemj

Responsibilities @ Yardi

Implementation and administration of monitoring, alerting, and log aggregation/analysis tools.

o 15,000+ devices
o 9 datacenters
o 5,000+ customer installations
o We monitor Windows envs with Linux envs

This was me in 2008 @ Point2

How code is delivered

How code operates in production

A good problem to have

Everyone wants “the monitoring” so they can say “it’s monitored”

Communicating Work

o Classify
o Quantify
o Qualify

Words....

o Logging
o Alerting
o Dashboards
o Reports
o 4-9s
o 24x7x365 this shit can’t go down

Can it be this simple?

Let’s talk about “the monitoring” for X

Be awesome

X is monitored

DCVA (OODA)

1. Definition

I can hit this one page so it’s up right?

No thanks, let’s redefine status
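A minimal sketch of what a redefined status could look like: instead of one page hit, status is derived from several independent checks. The check names and their results here are hypothetical placeholders, not anything from the talk.

```python
from dataclasses import dataclass
from typing import Callable, Dict


@dataclass
class CheckResult:
    ok: bool
    detail: str  # human-readable context for whoever reads the result


def overall_status(checks: Dict[str, Callable[[], CheckResult]]) -> str:
    """Combine several checks into 'up', 'degraded', 'down', or 'unknown'."""
    results = {name: check() for name, check in checks.items()}
    if not results:
        return "unknown"  # no checks defined means we cannot claim "up"
    failed = [name for name, result in results.items() if not result.ok]
    if not failed:
        return "up"
    if len(failed) == len(results):
        return "down"
    return "degraded"


# Hypothetical checks: the page loads, but logging in does not work.
def page_loads() -> CheckResult:
    return CheckResult(True, "HTTP 200")


def login_works() -> CheckResult:
    return CheckResult(False, "auth backend timeout")


def db_reachable() -> CheckResult:
    return CheckResult(True, "query ok")


status = overall_status(
    {"page": page_loads, "login": login_works, "db": db_reachable}
)
print(status)
```

With these placeholder checks the single page hit would have said "up", while the combined view reports a degraded state.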

1. Definition

o What questions are you trying to answer?
o What information do you need when a failure occurs?
o What are the most common failures?
o Who is the audience for the information?

2. Checks & Collections

o Environment & Code
o Data points
o Detailed logs
o Current state
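As a rough illustration of a collection, the sketch below gathers one timestamped sample that combines environment, code version, and a data point. The function name, the field names, and the choice of disk usage as the data point are assumptions made for the example.

```python
import os
import platform
import shutil
import time
from typing import Any, Dict


def collect_sample(service: str, version: str) -> Dict[str, Any]:
    """Collect one timestamped sample (hypothetical schema)."""
    # Disk usage of the filesystem root, used here as an example data point.
    usage = shutil.disk_usage(os.path.abspath(os.sep))
    return {
        "timestamp": time.time(),       # when the sample was taken
        "service": service,             # what was sampled
        "code_version": version,        # code: which build is running
        "host": platform.node(),        # environment: where it runs
        "os": platform.system(),        # environment: Windows, Linux, ...
        "disk_free_pct": round(100.0 * usage.free / usage.total, 1),
    }


sample = collect_sample("X", "1.0.0")
print(sample)
```

Keeping environment and code version next to each data point is one way to have the needed context on hand when a failure occurs.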

3. Visualization

o Analysis
o Dashboards
o Correlations

4. Action

o Fault detection
o Alerting
o RCA
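A small sketch of the action step, under stated assumptions: a fault is flagged only after several consecutive samples breach a threshold, and the alert carries recent samples so the reader has context for recovery and later RCA. The threshold, the metric name, and the alert fields are hypothetical.

```python
from typing import Any, Dict, List, Optional


def detect_fault(
    samples: List[float], threshold: float, consecutive: int = 3
) -> Optional[int]:
    """Return the index at which `consecutive` samples in a row have
    exceeded `threshold`, or None if that never happens."""
    run = 0
    for i, value in enumerate(samples):
        run = run + 1 if value > threshold else 0
        if run >= consecutive:
            return i
    return None


def build_alert(
    service: str, metric: str, samples: List[float], index: int
) -> Dict[str, Any]:
    """Attach the last few samples so the alert is useful on its own."""
    return {
        "service": service,
        "metric": metric,
        "recent": samples[max(0, index - 4): index + 1],
    }


# Hypothetical response times in milliseconds.
latencies = [110, 130, 520, 140, 610, 640, 700, 150]
fault_index = detect_fault(latencies, threshold=500, consecutive=3)
alert = (
    build_alert("X", "latency_ms", latencies, fault_index)
    if fault_index is not None
    else None
)
print(fault_index, alert)
```

Requiring consecutive breaches is one design choice among many; it trades a slower alert for fewer alerts on a single spike such as the lone 520 above.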

Cycle

o Definition (what to collect)
o Checks & Collections (how to collect)
o Visualization (make collections pretty)
o Action (inform on failure)

Team Time Distribution

Time Distribution (Desired)

Is “X” monitored?

When “X” goes into some degraded state:

o The right people know.
o They have enough information to find the problem, recover, and later do RCA.
o If they don’t, they will revisit the definition.

How does your team

o Classify
o Quantify
o Qualify

Monitoring is Never “Done”

Melanie Cey @melaniemj

Senior Systems Analyst
Systems Reliability Engineering @ Yardi