Munin and Nagios 2010-10-07 1 / 53
Munin and Nagios
Stig Sandbeck Mathisen
Redpill Linpro AS
2010-10-07
PRODUCTS • DEVELOPMENT • APPLICATION MANAGEMENT • IT OPERATIONS • SUPPORT • TRAINING
Outline
1. Introduction
2. The Munin project
3. Munin master and node
4. Munin plugins
5. Integration with Nagios
6. Example graph set
7. New features in Munin 1.4
8. New features in Munin 2.0
9. Speeding up munin
10. The end
About the speaker
System Administrator - 12 years with open source based system administration. 5 years at Redpill Linpro.
Debian Developer - varnish, munin, puppet, facter
About Redpill Linpro
Leading Nordic provider of professional Open Source services - across the stack.
Presence in Denmark, Finland, Norway and Sweden
Hosting and training facilities in Oslo and Karlstad.
190 employees in Gothenburg, Helsinki, Karlstad, Oslo, Stavanger and Stockholm.
More than 300 customers across the Nordic countries, 60% in the “enterprise tier”
15 years in the Open Source business.
Introduction to Munin
Munin is a networked resource monitoring tool. It gathers data from your systems, creates lots of graphs, and...
can show you trends
can help you predict bottlenecks
can show you old data for comparison with current numbers
can send events to other systems
History
Originally called LRRD, about the time Nagios was still known as Netsaint. The old code is still available as the “LRRD” project at SourceForge.
2002: Started by Linpro
2004: 1.0 released, development moved from CVS to Subversion
2005: 1.2 released
2009: 1.4 released
Master
The munin master runs from cron. It runs four jobs, each with its own lock. A new cron run can start while the previous one is still running.
munin-update: Contacts each node, and retrieves plugin configuration and data
munin-graph: Creates graphs from RRD files
munin-limits: Checks for limit breaches
munin-html: Updates the HTML documents
Node
The munin node listens for connections on 4949/TCP, and runs plugins on request. It runs each plugin when the master asks, to retrieve configuration and values.
The node uses Net::Server
Plugins are commonly written in shell, perl, python, ruby, awk...
SNMP plugins run on a node, and query other hosts
Wire protocol
The wire protocol is simple and in clear text. Keywords are “list”, “config” and “fetch”.
Example
# munin node at puppet1.example.org
list
apache_accesses apache_processes [...] uptime users vmstat
config uptime
graph_title Uptime
[...]
.
fetch uptime
uptime.value 148.03
.
quit
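The reply to “fetch” in the session above is easy to consume from a script. The following is a minimal sketch, not part of Munin: the helper name parse_fetch is an assumption made for illustration. It reads a reply on standard input, skips the “#” banner line, and stops at the terminating “.” line.

```shell
#!/bin/sh
# Hypothetical helper, not part of Munin: read the reply to a "fetch"
# command on stdin and print "field value" pairs until the "." terminator.
parse_fetch() {
    while IFS= read -r line; do
        # Strip a trailing carriage return, in case the reply uses CRLF.
        line=${line%"$(printf '\r')"}
        [ "$line" = "." ] && break
        case "$line" in
            \#*) continue ;;            # skip banner/comment lines
            *.value\ *)
                field=${line%%.value*}  # text before ".value"
                value=${line#*.value }  # text after ".value "
                printf '%s %s\n' "$field" "$value"
                ;;
        esac
    done
}

# Canned reply, copied from the session above.
printf 'uptime.value 148.03\n.\n' | parse_fetch
```

Against a live node, the same function could be fed from a TCP client, for example printf 'fetch uptime\nquit\n' | nc puppet1.example.org 4949 | parse_fetch, assuming nc is installed and the node allows the connection.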
What is a munin plugin?
A plugin is a standalone executable. It is run by the Munin node when the Munin master connects. The plugin prints a clear text key/value list of configuration and values on STDOUT.
plugins
Most plugins are shell scripts, but many are written in Perl, Python or Ruby.
Plugin design
Munin plugins are designed to be simple to develop. If run with the argument “config”, a plugin displays its configuration. If run with no arguments, it outputs its values.
Magic markers inside the plugin list other capabilities, like the optional arguments “autoconf” and “suggest”.
Magic markers
#%# capabilities=autoconf
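To show how the arguments and the magic marker fit together, here is a minimal sketch of a plugin that answers “autoconf”, “config” and the default fetch. The function wrapper, the field name and the use of /proc/uptime are assumptions chosen for illustration; a real plugin would normally put the case statement at the top level of the script.

```shell
#!/bin/sh
#%# family=auto
#%# capabilities=autoconf

# Sketch of a plugin answering "autoconf", "config" and the default fetch.
# The function wrapper and field name are illustrative assumptions.
uptime_plugin() {
    case "$1" in
        autoconf)
            # Report whether the plugin can work on this host.
            if [ -r /proc/uptime ]; then
                echo yes
            else
                echo 'no (/proc/uptime not readable)'
            fi
            ;;
        config)
            echo 'graph_title Uptime'
            echo 'uptime.label uptime'
            ;;
        *)
            if [ -r /proc/uptime ]; then
                awk '{printf "uptime.value %.2f\n", $1}' /proc/uptime
            else
                echo 'uptime.value U'   # "U" marks an unknown value
            fi
            ;;
    esac
}

uptime_plugin "$@"
```

Run with “autoconf”, the sketch prints “yes” or “no” followed by a reason in parentheses, so an installer can decide whether to enable the plugin on a given host.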
Plugin configuration
Each plugin has a sensible default configuration. You can configure plugins in /etc/munin/plugin-conf.d/ on the munin node. You can also configure plugins in munin.conf on the munin master.
Configuration items
user (default is “nobody”)
group (default is “nogroup”)
Environment variables for the plugin
Example
#!/bin/sh
case "$1" in
    config)
        echo 'graph_title System Boredom Index'
        echo 'graph_vlabel boredom in %'
        echo 'time.label Total time'
        echo 'bored.label Bored time'
        ;;
    *)
        # /proc/uptime holds the uptime and the idle time, in seconds
        awk '{printf "time.value %d\nbored.value %d\n", $1, $2}' /proc/uptime
        ;;
esac
Boring example graph
A single line is described as a label, and given a value.
Several lines can be combined in interesting ways with CDEF.
Example
time.graph no
bored.graph no
foo.cdef bored,100,*,time,/
foo.label Part time bored silly
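A CDEF is written in reverse Polish notation: operands are pushed on a stack, and each operator consumes the two topmost values. The sketch below evaluates an expression of the same shape as the foo.cdef line. The helper name rpn and the sample values are assumptions for illustration, and only the four basic operators are handled.

```shell
#!/bin/sh
# Hypothetical helper: evaluate a comma-separated RPN expression the way a
# CDEF combines fields. Usage: rpn EXPR name=value...
rpn() {
    expr=$1
    shift
    awk -v expr="$expr" 'BEGIN {
        # Field values arrive as name=value arguments.
        for (i = 1; i < ARGC; i++) {
            split(ARGV[i], kv, "=")
            val[kv[1]] = kv[2]
        }
        n = split(expr, tok, ",")
        sp = 0
        for (i = 1; i <= n; i++) {
            t = tok[i]
            if (t == "+" || t == "-" || t == "*" || t == "/") {
                b = stack[sp--]; a = stack[sp--]
                if (t == "+") r = a + b
                else if (t == "-") r = a - b
                else if (t == "*") r = a * b
                else r = a / b
                stack[++sp] = r
            } else if (t in val) {
                stack[++sp] = val[t]    # a field name
            } else {
                stack[++sp] = t + 0     # a numeric constant
            }
        }
        printf "%.2f\n", stack[sp]
    }' "$@"
}

# Made-up sample: 50 seconds bored out of 200 seconds total.
rpn 'bored,100,*,time,/' bored=50 time=200
```

With these sample values the expression computes 50 * 100 / 200, the percentage of total time spent bored.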
Munin events
Munin generates events whenever a value rises above, or sinks below, a predefined limit. Many plugins support limits, but some do not set them by default.
Example
[cpu]
env.iowait_warning 5
env.iowait_critical 20
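The effect of a warning and a critical limit can be sketched as a small classifier. This is an illustration only, not Munin's munin-limits code: the helper name classify is hypothetical, it handles upper limits only, and it assumes a value must be strictly above a limit to breach it.

```shell
#!/bin/sh
# Hypothetical helper: classify a value against upper warning and critical
# limits, as in the iowait example. awk does the comparison so that
# non-integer values work.
classify() {
    awk -v v="$1" -v w="$2" -v c="$3" 'BEGIN {
        if (v + 0 > c + 0)      print "CRITICAL"
        else if (v + 0 > w + 0) print "WARNING"
        else                    print "OK"
    }'
}

# Limits from the example: warning at 5, critical at 20.
classify 3.5 5 20
classify 12 5 20
classify 42 5 20
```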
Munin and Nagios events
Munin ships with usable defaults to send events to Nagios.
Events...
correspond with Nagios events
are processed by a templating system
are sent to a defined contact
Supported events
CRITICAL
WARNING
UNKNOWN
OK
Integration with Nagios
The contact.nagios.text template is defined by default in munin. Use the contact.nagios.command configuration setting to send events.
Where?
This is configured in /etc/munin/munin.conf
Configuration for the munin master
On the munin master, we define a contact for Nagios, using send_nsca, and configure send_nsca to encrypt the messages.
example munin.conf
contact.nagios.command /usr/sbin/send_nsca \
-H nagios.example.com -c /etc/send_nsca.cfg
example send_nsca.cfg
password="I like cheese!"
encryption_method=8
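send_nsca reads passive check results on standard input, one per line, as host name, service description, return code and plugin output separated by tabs. The sketch below formats such a line. The helper name nsca_line and the sample values are assumptions; in a working setup the contact text template produces this input, not a hand-written helper.

```shell
#!/bin/sh
# Hypothetical helper: format one passive service check result in the
# tab-separated form that send_nsca reads on standard input.
nsca_line() {
    host=$1 service=$2 state=$3 text=$4
    case "$state" in
        OK)       code=0 ;;
        WARNING)  code=1 ;;
        CRITICAL) code=2 ;;
        *)        code=3 ;;   # UNKNOWN and anything unrecognised
    esac
    printf '%s\t%s\t%s\t%s\n' "$host" "$service" "$code" "$text"
}

# Made-up sample event.
nsca_line foo.example.com 'HDD temperature' WARNING 'sde=29.00'
```

For a manual test, the output can be piped into the command from the munin.conf example: /usr/sbin/send_nsca -H nagios.example.com -c /etc/send_nsca.cfg.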
Configuration for the nagios server
On the nagios side, we set up nsca to receive events from our munin masters.
Example part of nsca.conf
server_port=5667
nsca_user=nagios
nsca_group=nagios
command_file=/var/run/nagios/nagios.cmd
password="I like cheese!"
decryption_method=8
Configuration for the nagios server
Nagios, by default, does not accept external commands. We need to configure this to enable nsca to send the events to Nagios.
Example part of nagios.conf
check_external_commands=1
log_external_commands=1
command_file=/var/run/nagios/nagios.cmd
Nagios service configuration
To accept munin services, you will need a passive service check.
example nagios service
passive_checks_enabled 1
active_checks_enabled 0
max_check_attempts 1
Nagios service configuration
For services we have not heard from in a while, we use the “freshness” feature to set the service state to UNKNOWN. Munin reports state on all services every 24 hours by default, so a threshold slightly longer than that is sensible.
example nagios service
check_command check_passive_timeout
check_freshness 1
freshness_threshold 93600 # 26 hours
normal_check_interval 604800 # 1 week
The check_passive_timeout command is a dummy script that always returns state UNKNOWN.
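Such a dummy check can be very small. The following is a sketch under the usual Nagios plugin convention that the exit code carries the state (0 OK, 1 WARNING, 2 CRITICAL, 3 UNKNOWN). The message text is an assumption, and the function wrapper is only there to make the behaviour easy to try out.

```shell
#!/bin/sh
# Sketch of a dummy check that always reports UNKNOWN. Nagios plugins
# signal state through the exit code: 0 OK, 1 WARNING, 2 CRITICAL, 3 UNKNOWN.
check_passive_timeout() {
    echo 'UNKNOWN: no passive result received within the freshness threshold'
    return 3
}
```

As a standalone script, the file would end by calling the function and exiting with its status, for example check_passive_timeout; exit $?.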
Integration with other systems
You can define your own contact text and command to interface with any system you need. Both the text and the command can use a Text::Balanced template to get event data from Munin.
Command run defined in contact.example.command
Text for STDIN defined in contact.example.text
Default STDIN text is defined in: contact.default.text
Text::Balanced templates
The command and text templates use Text::Balanced. This provides variables, tests and loops.
Example
[${var:group};${var:host}] -> ${var:graph_title} ->
warnings: ${loop<,>:wfields ${var:label}=${var:value}} /
criticals: ${loop<,>:cfields ${var:label}=${var:value}}
Example
[example.com;foo] -> HDD temperature -> warnings:
sde=29.00,sda=26.00,sdc=25.00,sdd=26.00,sdb=26.05
/ criticals:
Example: mail
Add the following to /etc/munin/munin.conf:
Example
contact.mail.command /usr/bin/mail -s "[...]" [email protected]
If contact.mail.text is not defined, it falls back to contact.default.text. Since contact.default.text is designed as a mail template, you do not need to provide a contact.mail.text for mail.
The “[...]” should be replaced with a Text::Balanced template to provide a useful subject.
Example graph set
Peak in network connections and traffic. We look for corresponding graphs.
Note: logarithmic scale on the “netstat” graph ensures detail is not lost to peaks.
Example graph set
Apache HTTPD has a corresponding peak in traffic. In this case this is a customer publishing a new version of their development tool.
Example graph set
We check the server health. Do we have a bottleneck? It does not look like it, but there is a strange bump in I/O latency and utilisation not related to the network traffic. Also, the periods of 100% utilisation mean we have to talk the customer into getting faster disks or more RAM.
Example graph set
Found more corresponding graphs: “MySQL”. We notice no cached SELECTs? Contact the customer? No corresponding network traffic - looks like a local job. Probably worth mentioning to the customer, but don’t panic.
New features in Munin 1.4
Released in December 2009, after three months of solid development following four years of having the project in “maintenance mode”.
23 committers, with contributions from many more
1500 changesets
100 new plugins, including JVM profiling plugins.
TLS / SSL support
Better SNMP Support
Multigraph plugins
Documentation improvements, including good per-plugin documentation
Better non-Linux support
Multigraph plugins
“You are in a maze of twisty little graphs, all alike”. Multigraph plugins create a tree of graphs, all from one plugin.
nested graphs
fast plugins
new scaling issues on the master
Version numbers
Starting with “munin 2.0”, the project has changed what the version numbers signify.
Major version - new features
Minor version - bug fixes
Asynchronous proxy node
This node contacts munin-node periodically, and stores the result. The master connects to the proxy node, and retrieves stored results.
The master does not have to wait for nodes to respond.
This will increase peak write loads on the master.
[Diagram: the Munin master connects to a server that runs both the async proxy node and the Munin node]
SSH transport
With Munin 1.4, we have SSH tunnelling. Starting with Munin 2.0, we have a native SSH transport.
[old-style-host]
address host.example.com
[new-style-host]
address ssh://[email protected]:\
/path/to/stdio-enabled-node --params
For now, this requires the use of the “asynchronous proxy node”, to limit the privileges of the SSH node.
Zooming graphs
No longer locked to specific time periods, you can now drill down in the graphs to look at interesting time periods.
Multi master
With the “Asynchronous Proxy Node”, we can also support multi master setups without conflicts.
One master for the customer, one for the hosting provider.
[Diagram: two Munin masters (provider and customer) and four Munin nodes]
Unresolved issues in 2.0
There is a fair bit of work left before 2.0 can be released.
Performance and scaling
Functional and pretty HTML (got functional so far)
Whatever we broke in 1.4...
Scaling munin
On a 1 CPU (2 threads) system, 65k RRD files are...
Updated in 1 minute
Graphed in 40 minutes
The cron job runs every 5 minutes; Houston, we have a problem.
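The size of the problem follows from the two figures on this slide. A short worked calculation, using only those numbers (the variable names are illustrative):

```shell
#!/bin/sh
# Figures from the slide: graphing takes 40 minutes, cron fires every 5.
graph_minutes=40
cron_interval=5
# Number of cron intervals that pass during one graphing pass.
overlap=$((graph_minutes / cron_interval))
echo "a graphing pass spans $overlap cron intervals"
```

Since each master job holds its own lock, a new graphing pass cannot usefully start until the previous one finishes, so graphs on such a system can lag the data by up to 40 minutes even though the data itself is updated within a minute.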
FastCGI
FastCGI to the rescue!
This gives us graphs on demand. When we have a large number of graphs, it makes little sense to update them all.
The “munin 2.0” CGI grapher can make zoomable graphs for any time period.
We still need a CGI HTML generator to reduce resource usage for large graph sets. This is not implemented yet.
Storage tuning
Enough RAM for file system caching
Everything on SSD
OpenSolaris / FreeBSD: ZFS ZIL/L2ARC
Linux: FlashCache
Turn off atime
Linux ext3/ext4: mount option data=journal
Munin links
http://munin-monitoring.org/ - project, documentation, bugtracker, svn source code access
http://exchange.munin-monitoring.org/ - Extra plugins for Munin contributed by the Munin community
http://munin-monitoring.org/wiki/HowToContactNagios
In-depth configuration examples for Munin / Nagios integration
http://munin-monitoring.org/wiki/MuninAlertVariables
What to put in the contact.example.text when making templates
Munin book
Gabriele Pohl and Michael Renner have published an entire, thoroughly written book on Munin in German: “Munin - Graphisches Netzwerk- und System-Monitoring” (ISBN 978-3-937514-48-2), published by Open Source Press in cooperation with Linpro.
Questions from the audience?
I was going to write “If I timed this correctly, we should now have around 10 minutes for questions”. I decided against it. What if I missed the time? That would be embarrassing.