Fact-Based Monitoring - PuppetConf 2014

Date post: 25-Dec-2014
Upload: puppet-labs
Description:
Fact-Based Monitoring - Alexis Le-Quoc, Datadog
Transcript
Page 1: Fact-Based Monitoring - PuppetConf 2014

Fact-based Monitoring

PuppetConf 2014

Alexis Lê-Quôc @alq

Page 2: Fact-Based Monitoring - PuppetConf 2014

Alexis Lê-Quôc, @alq

CTO at Datadog

Page 3: Fact-Based Monitoring - PuppetConf 2014

Poll: Monitoring makes me…

happy proud

cry want to hide

Page 4: Fact-Based Monitoring - PuppetConf 2014

Puppet brings Automation to Systems Management

Page 5: Fact-Based Monitoring - PuppetConf 2014

Improve Monitoring

the way Puppet has improved

Systems Management

Page 6: Fact-Based Monitoring - PuppetConf 2014

“The good old days”

• Your “CMDB” was Excel

• SSH in and hack away

• Little time for anything else

Page 7: Fact-Based Monitoring - PuppetConf 2014

Then Puppet came…

• Expressive rules that capture expected result

• Using facts and classifiers (a.k.a. metadata) to figure out where to apply changes

• That freed up a lot of our time*

* on a per-machine basis

Page 8: Fact-Based Monitoring - PuppetConf 2014

–Me (just now)

“Puppet brings immunity of configuration to change in infrastructure”

Page 9: Fact-Based Monitoring - PuppetConf 2014

I have seen this before…

Page 10: Fact-Based Monitoring - PuppetConf 2014

–C.J. Date (1977)

“[SQL brings] immunity of application to change in storage structure and access strategy”

http://www.cs.berkeley.edu/~brewer/cs262/SystemR.pdf

Page 11: Fact-Based Monitoring - PuppetConf 2014

SQL

• 1974 IBM introduces System R and its Structured Query Language

• Expressive rules that capture expected result

• Using facts and predicates (a.k.a. metadata) to figure out what data to get

• That freed up a lot of development time

Page 12: Fact-Based Monitoring - PuppetConf 2014

SQL

• From a time-consuming, imperative mess (“how”)

• … to expressive data queries (“what”)

SQL query

SELECT (desired facts) FROM (existing facts) WHERE (matching criteria)

Page 13: Fact-Based Monitoring - PuppetConf 2014

Puppet

• From a time-consuming, imperative mess (“how”)

• … to expressive configuration queries (“what”)

puppet apply

CHANGE (desired facts) FROM (existing puppet facts) WHERE (matching puppet classes)

Page 14: Fact-Based Monitoring - PuppetConf 2014

Is there a pattern?

Page 15: Fact-Based Monitoring - PuppetConf 2014

–MCollective overview

“Break free from ever more complex naming conventions for hostnames as a means of identity. Use a very rich set of meta data provided by each machine to address them.”

Page 16: Fact-Based Monitoring - PuppetConf 2014

MCollective

• From a time-consuming, imperative mess (“how”)

• … to expressive orchestration queries (“what”)

mco rpc service restart service=nginx -F webpool=A

EXEC (desired actions) FROM (existing puppet facts) WHERE (matching puppet classes)

Page 17: Fact-Based Monitoring - PuppetConf 2014

Back to monitoring

• Monitoring is to behavior what Puppet is to configuration

• Monitoring is to behavior what MCollective is to orchestration

Page 18: Fact-Based Monitoring - PuppetConf 2014

Monitoring

• From a time-consuming, imperative mess (“how”)

• … to expressive monitoring queries (“what”)

Monitoring query

MONITOR (desired behavior) FROM (existing heartbeats/metrics) WHERE (matching puppet facts)

Page 19: Fact-Based Monitoring - PuppetConf 2014

Examples

• “All provisioned web servers in the production environment, datacenter ABC must respond to queries within 200ms”

• “All PostgreSQL servers must have a postgres: bgwriter process running”

• “At least one ActiveMQ server is up to support mcollective”

• Never mention a hostname
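
The examples above can be read as quantified rules ("all", "at least one") over nodes selected by facts. A minimal sketch in Python, assuming a hypothetical in-memory inventory; the `Node` type, the `select` helper and the fact keys `role` and `environment` are illustrative and do not come from Puppet or any monitoring tool:

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    # Hypothetical record: facts reported by the node, plus observed state.
    facts: dict
    processes: set = field(default_factory=set)
    up: bool = True


def select(nodes, **wanted):
    """WHERE clause: keep nodes whose facts match every requested key/value."""
    return [n for n in nodes if all(n.facts.get(k) == v for k, v in wanted.items())]


def all_run(nodes, process):
    """'All matching servers must have <process> running.'"""
    return all(process in n.processes for n in nodes)


def at_least_one_up(nodes):
    """'At least one matching server is up.'"""
    return any(n.up for n in nodes)


inventory = [
    Node({"role": "postgres", "environment": "production"}, {"postgres: bgwriter"}),
    Node({"role": "postgres", "environment": "production"}, set()),
    Node({"role": "activemq", "environment": "production"}, up=False),
    Node({"role": "activemq", "environment": "production"}, up=True),
]

postgres_ok = all_run(select(inventory, role="postgres"), "postgres: bgwriter")
activemq_ok = at_least_one_up(select(inventory, role="activemq"))
print(postgres_ok, activemq_ok)  # False True
```

No rule names a host: nodes enter or leave a rule's scope as their facts change. Note that `all()` over an empty selection is vacuously true, so a real rule would probably also check that the selection is non-empty.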

Page 20: Fact-Based Monitoring - PuppetConf 2014

Hosts are not the center of the monitoring universe.

Facts are!

Hosts are just places where facts occur.

Page 21: Fact-Based Monitoring - PuppetConf 2014

The proof is in the pudding…

Page 22: Fact-Based Monitoring - PuppetConf 2014

Hosts at the center of the universe

a.k.a. the Wrong Way

Page 23: Fact-Based Monitoring - PuppetConf 2014

–Nagios Core 4 manual on monitoring clusters

“Its fairly straightforward, so hopefully you find things easy to understand…”

Page 24: Fact-Based Monitoring - PuppetConf 2014

Host-centric: Monitor a DNS cluster

check_command check_service_cluster!"DNS Cluster"!0!1!$SERVICESTATEID:host1:DNS Service$,$SERVICESTATEID:host2:DNS Service$,$SERVICESTATEID:host3:DNS Service$

Where do host1, host2, host3 come from?

Page 25: Fact-Based Monitoring - PuppetConf 2014

Host-centric: can’t use facts directly

• “Host groups solve this problem”. No, they don’t.

• Combinatorial explosion, e.g. trivially

• 4 data centers (us-1, us-2, eu, apac)

• 5 classes (web, db, cache, appserver, hadoop)

• 3 environments (test, staging, prod)

• => up to 119 materialized host groups
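
The slide does not spell out how 119 is derived. One reading that reproduces it, stated here as an assumption: a host group is materialized for every combination in which each dimension is either pinned to one value or left unconstrained, excluding the single group with no constraint at all, i.e. (4+1) × (5+1) × (3+1) − 1 = 119. A short Python check of that count:

```python
from itertools import product

datacenters = ["us-1", "us-2", "eu", "apac"]
classes = ["web", "db", "cache", "appserver", "hadoop"]
environments = ["test", "staging", "prod"]

# None means "this dimension is left unconstrained" (an assumption, see above).
choices = [dim + [None] for dim in (datacenters, classes, environments)]

# Drop the one combination that constrains nothing.
host_groups = [c for c in product(*choices) if any(v is not None for v in c)]
print(len(host_groups))  # 119
```

Adding a fourth dimension multiplies the count again, which is why materializing groups ahead of time scales poorly compared with filtering on facts at query time.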

Page 26: Fact-Based Monitoring - PuppetConf 2014

Nagios-bashing?

• No!

• Same fatal flaw with all host-centric monitoring tools

• Host-centric monitoring forces an extra, expensive step:

• replicate fact-based conditionals in host-centric templates
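
To illustrate that extra step, here is a sketch that renders the host-centric `check_command` line shown earlier from a fact-based selection. The inventory, the `role` fact and the `render_cluster_check` helper are hypothetical; only the argument layout of `check_service_cluster` is taken from the slide.

```python
def render_cluster_check(label, service, hosts, warn, crit):
    """Build a check_service_cluster command line from an explicit host list."""
    states = ",".join(f"$SERVICESTATEID:{h}:{service}$" for h in hosts)
    return f'check_command check_service_cluster!"{label}"!{warn}!{crit}!{states}'


# Hypothetical fact inventory: hostname -> facts.
inventory = {
    "host1": {"role": "dns"},
    "host2": {"role": "dns"},
    "host3": {"role": "dns"},
    "host4": {"role": "web"},
}

# The fact-based conditional has to be evaluated here, outside the monitoring tool...
dns_hosts = sorted(h for h, facts in inventory.items() if facts["role"] == "dns")

# ...and its result frozen into a host-centric template.
line = render_cluster_check("DNS Cluster", "DNS Service", dns_hosts, 0, 1)
print(line)
```

The rendered line is only valid until the set of hosts matching the facts changes, at which point the template has to be regenerated and reloaded.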

Page 27: Fact-Based Monitoring - PuppetConf 2014

–puppet-nagios author

“Please note that this module is not for the faint of heart. Even I (the author) have my head hurt each time I have to make modifications to it…”

Page 28: Fact-Based Monitoring - PuppetConf 2014

Facts at the center of the universe

a.k.a. the Right Way

"De Revolutionibus manuscript p9b" by Nicolas Copernicus - www.bj.uj.edu.pl. Licensed under Public domain via Wikimedia Commons - http://commons.wikimedia.org/wiki/File:De_Revolutionibus_manuscript_p9b.jpg#mediaviewer/File:De_Revolutionibus_manuscript_p9b.jpg

Page 29: Fact-Based Monitoring - PuppetConf 2014

Earlier Examples

• “All provisioned web servers in the production environment, datacenter ABC must respond to queries within 200ms”

• “All PostgreSQL servers must have a postgres: bgwriter process running”

• “At least one ActiveMQ server is up to support mcollective”

Page 30: Fact-Based Monitoring - PuppetConf 2014

In Sensu (heartbeats)

• “All PostgreSQL servers must have a postgres: bgwriter process running”

class postgres::monitoring::sensu {
  sensu::subscription { 'postgres': }
}

• Monitoring using a fact-based query

• Is node of class “postgres” and subscribed to “postgres” or not?

• If so, it will execute the postgres check
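
A toy model of the subscription mechanism described above, in Python. It is not Sensu's API: the `Client` and `Check` types and the `clients_for` function are made up for illustration, and the check command is a placeholder string.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Client:
    name: str
    subscriptions: frozenset


@dataclass(frozen=True)
class Check:
    name: str
    command: str            # placeholder, not a real plugin invocation
    subscribers: frozenset


def clients_for(check, clients):
    """A client runs a check when it shares at least one subscription with it."""
    return [c for c in clients if c.subscriptions & check.subscribers]


clients = [
    Client("node-a", frozenset({"postgres"})),
    Client("node-b", frozenset({"web"})),
]
bgwriter = Check(
    name="postgres_bgwriter",
    command="<check that 'postgres: bgwriter' is running>",
    subscribers=frozenset({"postgres"}),
)

print([c.name for c in clients_for(bgwriter, clients)])  # ['node-a']
```

The check definition never lists nodes. Membership follows from the Puppet class that declares the subscription, so classifying a new node as `postgres` is enough for it to start running the check.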

Page 31: Fact-Based Monitoring - PuppetConf 2014

In Datadog (metrics)

• “All provisioned web servers in the production environment, datacenter ABC must respond to queries within 200ms”

$ puppet module install datadog-datadog_agent

class { 'datadog_agent':
  api_key      => …,
  tags         => [$environment],
  fact_to_tags => ["datacenter"]
}

include datadog_agent::integrations::nginx

Page 32: Fact-Based Monitoring - PuppetConf 2014

In Datadog (metrics)

• Monitoring using a fact-based query

• Puppet facts directly reused

max(nginx.request.latency{production,datacenter:ABC}) < 200
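
What such a query computes can be sketched in a few lines of Python. This is an illustrative model, not Datadog's implementation or client API: the `Point` type, the sample values, the millisecond unit and the `scoped_max` helper are all assumptions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Point:
    metric: str
    value: float        # latency in milliseconds (assumed unit)
    tags: frozenset     # e.g. {"production", "datacenter:ABC"}


def scoped_max(points, metric, scope):
    """max(metric{scope}): largest value among points carrying every tag in scope."""
    values = [p.value for p in points if p.metric == metric and scope <= p.tags]
    return max(values) if values else None


points = [
    Point("nginx.request.latency", 120.0, frozenset({"production", "datacenter:ABC"})),
    Point("nginx.request.latency", 180.0, frozenset({"production", "datacenter:ABC"})),
    Point("nginx.request.latency", 450.0, frozenset({"staging", "datacenter:ABC"})),
]

scope = frozenset({"production", "datacenter:ABC"})
worst = scoped_max(points, "nginx.request.latency", scope)
healthy = worst is not None and worst < 200
print(worst, healthy)  # 180.0 True
```

The staging point is excluded by its tags rather than by its hostname, and the tags themselves originate from Puppet facts, so the monitoring scope tracks the configuration without a separate host list.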

Page 33: Fact-Based Monitoring - PuppetConf 2014

What to take away

Page 34: Fact-Based Monitoring - PuppetConf 2014

Fact-based monitoring

1. Hosts are not at the center of the monitoring universe

2. Expressive monitoring uses queries

3. Monitoring queries should use Puppet facts

Page 35: Fact-Based Monitoring - PuppetConf 2014

Thank you!

