Date post: | 23-Jan-2017 |
Category: |
Technology |
Upload: | kris-buytaert |
From #MonitoringSucks to #MonitoringLove (and back)
@KrisBuytaert
T-Dose 2015, Eindhoven, .nl
Kris Buytaert
● I used to be a Dev,
● Then Became an Op
● Chief Trolling Officer and Open Source Consultant @inuits.eu
● Everything is an effing DNS Problem
● Building Clouds since before the bookstore
● Organising Conferences
● Evangelizing devops
An opinionated talk about the Open Source Monitoring tooling landscape
In which I hope to learn from YOU
#devops=~C(L)AMS
● Culture
● (Lean)
● Automation
● Monitoring and Measurement
● Sharing
Damon Edwards and John Willis
Gene Kim
Monitoring is usually an afterthought
ENOBUDGET, ENOTIME
A 2008 OLS Paper
● We have bloated Java tools
● Some open core stuff
● DIY folks want traditional Nagios
● DBA Required
#monitoringsucks
● John Vincent (@lusis), June 2011
● A sub #devops movement
● https://github.com/monitoringsucks/
Why #monitoringsucks
● Manual config (gui)
● Not in sync with reality
● Hosts only
● Services sometimes
● Application never
● Chaos or out of sync with reality
● Alert Fatigue
Let's forget about
● Tools with no (stable) API
● Tools with strong focus on GUI
● Unless you are an SME with < 100 nodes
● Zenoss, Hyperic, GroundWork, ....
● P.S. : don't even mention proprietary software to me
What we want
● Small, well-suited components
• Collect
• Transport / Mangle
• Store
• Analyse
• Act / Alert
• Visualize
#monitoringlove
• Ulf Mansson #devopsdays Rome 2011
• A new era of tooling
• #monitoringlove hacksessions @inuits
• #monitorama
Icinga
• 2009 Fork
• I consider Nagios dead
• Vibrant Community (or they stalk me)
• Throw great parties in Nürnberg
• Nobody can pronounce it anyhow
• https://github.com/Inuits/puppet-icinga/
Automation
#monitoringlove
But the love was about :
Sensu
● Awesome for non static environments
● Scaling a clustered RabbitMQ ?
● This is Europe, U no do cloud
Automation of #monitoring brought back the #love
Monitoring a service
vs
Monitoring a Service
definition of done:
monitored and in production
A software project is not done until your last end user is dead
Culture,
Automation,
Measurement : measure all the things
Sharing
Deploy Statistics
● Time To Deploy
● Deploy Frequency
● Lifecycle frequency
● Map to other metrics
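The first two deploy statistics can be computed from nothing more than start and finish timestamps. A minimal sketch, assuming a hypothetical list of deploy records (the `deploys` data and both function names are illustrative, not from the talk):

```python
from datetime import datetime
from statistics import mean

# Hypothetical deploy records: (deploy started, live in production).
deploys = [
    (datetime(2015, 11, 2, 9, 0), datetime(2015, 11, 2, 9, 40)),
    (datetime(2015, 11, 3, 14, 0), datetime(2015, 11, 3, 14, 20)),
    (datetime(2015, 11, 6, 10, 0), datetime(2015, 11, 6, 11, 0)),
]

def time_to_deploy_minutes(records):
    """Mean number of minutes between the start and the end of a deploy."""
    return mean((done - start).total_seconds() / 60 for start, done in records)

def deploys_per_day(records):
    """Deploy frequency over the time span covered by the records."""
    finished = sorted(done for _, done in records)
    # Clamp the span to at least one day so a single burst does not divide by ~0.
    span_days = max((finished[-1] - finished[0]).total_seconds() / 86400, 1)
    return len(records) / span_days

print(time_to_deploy_minutes(deploys))
print(deploys_per_day(deploys))
```

Once these numbers exist as time series they can be plotted next to any other metric, which is what "map to other metrics" is about.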
CollectD all the metrics, at high intervals
Oldschool graphite
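Graphite's carbon daemon accepts a plaintext protocol of one `path value timestamp` line per data point, by default on TCP port 2003. In practice a collectd output plugin would ship the metrics; the sketch below only illustrates the wire format. The host name and metric path are assumptions made up for the example.

```python
import socket
import time

def graphite_line(path, value, timestamp=None):
    """Format one data point in Graphite's plaintext protocol."""
    ts = int(time.time()) if timestamp is None else int(timestamp)
    return f"{path} {value} {ts}\n"

def send_to_graphite(lines, host="graphite.example.org", port=2003):
    """Send pre-formatted lines to a carbon plaintext listener.

    The host is a placeholder; 2003 is carbon's default plaintext port.
    """
    payload = "".join(lines).encode("ascii")
    with socket.create_connection((host, port), timeout=5) as sock:
        sock.sendall(payload)

# Formatting only; nothing is sent over the network here.
print(graphite_line("servers.web01.load.shortterm", 0.42, 1446825600), end="")
```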
Self Service
Gdash based pipelines
Puppetized Templates (wip)
Gdash
Grafana
Graphite++
● Dashboards
• Grafana
● Engines :
• InfluxDB
• Cyanite
Triggers on Graphs
● Export Java Metrics
● JMXTrans
● Export JMXConfigs
● Configure NRPE Check
● Export NagiosCheck
● Collect JMX Exports on JMXTransNode
● Graph Em
Collect Icinga Configs on Icinga
Aggregation
● Alert on streams
● Alert on aggregated metrics
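Alerting on an aggregate rather than on individual hosts can be sketched as follows: average one metric across all hosts per interval, then only fire once the average over a sliding window of intervals crosses a threshold. The class name, threshold and window size are assumptions chosen for illustration; this is not the API of any tool mentioned in the talk.

```python
from collections import deque
from statistics import mean

class AggregateAlert:
    """Fire when the windowed mean of a cross-host average exceeds a threshold."""

    def __init__(self, threshold, window=3):
        self.threshold = threshold
        self.samples = deque(maxlen=window)  # one aggregated sample per interval

    def observe(self, per_host_values):
        """Feed one interval of {host: value}; return True if the alert fires."""
        self.samples.append(mean(per_host_values.values()))
        window_full = len(self.samples) == self.samples.maxlen
        return window_full and mean(self.samples) > self.threshold

alert = AggregateAlert(threshold=0.8, window=2)
print(alert.observe({"web01": 0.99, "web02": 0.20}))  # one hot host, window not full
print(alert.observe({"web01": 0.90, "web02": 0.90}))
print(alert.observe({"web01": 0.95, "web02": 0.85}))
```

A single busy host never pages anyone on its own here; only a sustained, cluster-wide rise does, which is one way to cut down on alert fatigue.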
Riemann
● I still don't get it ?
● Distributed Top
● Do you like Clojure ?
● Riemann Health plugin ?
● s/riemann-health/collectd/g;
● Output to graphite
Graphs to Knowledge
Skyline
• Oculus
• Creating Information out of this data
• Big data
• Machine Learning
But I have log files..
Logs and Metrics
● Graylog2
● ELSA (Enterprise Log Search and Archive)
● ELK Stack
● Collect from anywhere
● Filter
● Send anywhere
● Queueing
APM
But what about my apps ?
Half the world cheers about SAAS tools :(
Packetbeat
● Traffic Flow through network
● Transactions causing errors
● SQL per HTTP
● API call usage
PacketBeat
So your DC fails
Whom to alert when ?
'New' kids on the block
● Flapjack
flapjack.io
monitoring notification routing + event processing system
● OpenDuty
github.com/szechuen/OpenDuty
Duty management
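The "whom to alert when" question boils down to a routing decision per event. A minimal sketch of that idea, assuming a hypothetical on-call rota and severity labels; it does not reflect how Flapjack or OpenDuty are actually configured:

```python
from datetime import time

# Hypothetical rota: (shift start, shift end, contact). Addresses are placeholders.
ROTA = [
    (time(8, 0), time(18, 0), "ops-day@example.org"),
]
FALLBACK = "ops-night-pager@example.org"

def route(severity, at):
    """Return the contact to notify for an event, or None if nobody is paged."""
    if severity != "critical":
        return None  # non-critical events stay on the dashboard
    for start, end, contact in ROTA:
        if start <= at < end:
            return contact
    return FALLBACK  # outside every shift: the out-of-hours pager

print(route("critical", time(10, 30)))
print(route("critical", time(3, 0)))
print(route("warning", time(10, 30)))
```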
My Alerting Strategy
Is still in beta
And back :(
In 2014 I'm still running the same check for
- service registration (consul)
- high availability (pacemaker/corosync)
- monitoring (icinga)
But I love where Monitoring is heading
We have far fewer false positives
And we have a Maintainable Monitoring Infra
Kinda
[email protected]@inuits.eu
Further Reading
@krisbuytaert
http://www.krisbuytaert.be/blog/
http://www.inuits.eu/
Inuits
Duboistraat 50
2060 Antwerpen
Belgium
891.514.231
+32 475 961221