Computer monitoring with the Open Monitoring Distribution

Date post: 10-May-2015
Upload: kelvin-vanderlip
Description:
Kelvin Vanderlip's slides (as a PDF) for his 3/1/2012 talk to the UUASC-LA group on Nagios, check_mk and the "Open Monitoring Distribution"
Transcript
Page 1: Computer monitoring with the Open Monitoring Distribution

Monitoring with Open Monitoring Distro

Kel Vanderlip 3-1-2012 UUASC-LA 1

Kelvin Vanderlip Oracle Linux systems administrator, Sunrider International, Torrance [email protected]

TONIGHT’S OUTLINE:

1. Overview of Nagios
2. Check_MK
3. What is the “Open Monitoring Distribution”?
4. Operating a monitoring system

Page 2: Computer monitoring with the Open Monitoring Distribution

A thought for the night:

Better to remain silent and be thought a fool than to speak out and remove all doubt. -- Abraham Lincoln (also attr. Confucius)

Page 3: Computer monitoring with the Open Monitoring Distribution

In the beginning…

Page 4: Computer monitoring with the Open Monitoring Distribution

Why do you care about monitoring?

• You choose a job in which success depends on hard disks, NFS, DNS, DHCP, NIS, mgetty, cron jobs, postfix, routing, FTP, swap space, fans, UPS systems, switches, CPU registers…

Page 5: Computer monitoring with the Open Monitoring Distribution

So you ask your sole staffer, “Is it running?” He says, “I don’t know. Can I install NetSaint?”

Page 6: Computer monitoring with the Open Monitoring Distribution

Time passes:

“NetSaint is not affiliated with World Wide Digital Security, Inc. (WWDSI); Richard S. Carson and Associates, Inc; and the marks WEB SAINT, SAINT, SAINTWRITER, SAINTEXPRESS, and SAINTBASIC owned by Richard S. Carson and Associates”

Page 7: Computer monitoring with the Open Monitoring Distribution

Meet Ethan Galstad:

“This website stands as a testament to a long-running Open Source project that began with a simple idea in my mind. I had no inkling of the future success that NetSaint (and later Nagios) would come by. I almost never released it to the OSS community, but thank goodness I did. For without the constant flow of ideas from NetSaint and Nagios users, the project would have died off a long time ago. Cheers to everyone in the community who has participated in this project at some point in their life. My hat is off to you...

-Ethan Galstad: Creator, Developer, Founder of NetSaint, Nagios, and Nagios Enterprises -and happy participant in a wider movement”

Page 8: Computer monitoring with the Open Monitoring Distribution

As I said, how do you get started in the monitoring business?

Page 9: Computer monitoring with the Open Monitoring Distribution

Your server room grows, and you are still asking yourself: “Is it still working?”

Page 10: Computer monitoring with the Open Monitoring Distribution

All about Nagios:

Nagios is a scheduling engine. It is written in C. It runs on Linux. It’s an RPM and a DEB.

• Input: Text configuration files (lots and lots!)
• Output: Schedule many forks to run external monitoring applications, some locally, some on remote servers.
• Input: Each called monitoring application returns status and performance information
• Output: status.dat, a “snapshot” text file kept up to date several times a minute describing the last state for each thing Nagios is checking

Page 11: Computer monitoring with the Open Monitoring Distribution

Status.dat is updated 3-6 times a minute:

########################################
# NAGIOS STATE RETENTION FILE
#
# THIS FILE IS AUTOMATICALLY GENERATED
# BY NAGIOS. DO NOT MODIFY THIS FILE!
########################################
info {
    created=1330182965
    version=3.2.3
    last_update_check=0
    update_available=0
    update_uid=1330021387
    last_version=
    new_version=
}
program {
    modified_host_attributes=0
    modified_service_attributes=0
    enable_notifications=1
    active_service_checks_enabled=1
    passive_service_checks_enabled=1
    active_host_checks_enabled=1
    passive_host_checks_enabled=1
    enable_event_handlers=1
    obsess_over_services=0
    obsess_over_hosts=0
    check_service_freshness=1
    check_host_freshness=0
    enable_flap_detection=1
    enable_failure_prediction=1
    process_performance_data=1
    global_host_event_handler=
    global_service_event_handler=
    next_comment_id=40
    next_downtime_id=1
    next_event_id=572
    next_problem_id=290
    next_notification_id=457
}
host {
    host_name=Compellent
    modified_attributes=0
    check_command=check-mk-ping
    check_period=24X7
    notification_period=24X7
    event_handler=
    has_been_checked=1
    check_execution_time=0.013
    check_latency=0.135
    check_type=0
    current_state=0
    last_state=0
    last_hard_state=0
    last_event_id=0
    current_event_id=0
    current_problem_id=0
    last_problem_id=0
    plugin_output=OK - 10.10.99.79: rta 0.785ms, lost 0%
    long_plugin_output=
    performance_data=rta=0.785ms;200.000;500.000;0; pl=0%;40;80;; rtmax=1.628ms;;;; rtmin=0.426ms;;;;
    last_check=1330182913
    next_check=1330182974
    check_options=0
    current_attempt=1
    max_attempts=1
    normal_check_interval=1.000000
    retry_check_interval=1.000000
    state_type=1
    last_state_change=1330021647
    last_hard_state_change=1330021647
    last_time_up=1330182914
    last_time_down=0
    last_time_unreachable=0
    notified_on_down=0
    notified_on_unreachable=0
    last_notification=0
    current_notification_number=0
    current_notification_id=0
    notifications_enabled=1
    state_history=0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
}

Page 12: Computer monitoring with the Open Monitoring Distribution

More status.dat:

service {
    host_name=ebs-soa1
    service_description=CPU load
    modified_attributes=0
    check_command=check_mk-cpu.loads
    check_period=24X7
    notification_period=24X7
    event_handler=
    has_been_checked=1
    check_execution_time=0.000
    check_latency=0.316
    check_type=1
    current_state=0
    last_state=0
    last_hard_state=0
    last_event_id=0
    current_event_id=0
    current_problem_id=0
    last_problem_id=0
    current_attempt=1
    max_attempts=1
    normal_check_interval=1.000000
    retry_check_interval=1.000000
    state_type=1
    last_state_change=1330021648
    last_hard_state_change=1330021648
    last_time_ok=1330182934
    last_time_warning=0
    last_time_unknown=0
    last_time_critical=0
    plugin_output=OK - 15min Load 0.19 at 8 CPUs
    long_plugin_output=
    performance_data=load1=0.25;40;80;0; load5=0.25;40;80;0; load15=0.19;40;80;0;
    last_check=1330182934
    next_check=0
    check_options=0
    notified_on_unknown=0
    notified_on_warning=0
    notified_on_critical=0
    current_notification_number=0
    current_notification_id=0
    last_notification=0
    notifications_enabled=1
    active_checks_enabled=0
    passive_checks_enabled=1
    event_handler_enabled=0
    problem_has_been_acknowledged=0
    acknowledgement_type=0
    flap_detection_enabled=1
    failure_prediction_enabled=1
    process_performance_data=1
    obsess_over_service=1
    is_flapping=0
    percent_state_change=0.00
    check_flapping_recovery_notification=0
    state_history=0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
}

The file goes on for about 4 more megabytes. You will never read this.
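Since nobody reads this file by hand, a small script can. The following is a minimal sketch, not part of the talk, of how the block layout shown on these slides could be parsed. It assumes the on-disk layout of one `key=value` pair per line inside `type { ... }` blocks; the function names `parse_status_dat` and `problem_services` are made up for illustration, and accepting both `service` and `servicestatus` as block names is an assumption about what a given Nagios version writes.

```python
def parse_status_dat(text):
    """Parse status.dat-style text into a list of (block_type, fields) pairs."""
    blocks = []
    block_type = None
    fields = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and the comment banner
        if line.endswith("{"):
            block_type = line[:-1].strip()  # e.g. "host" or "service"
            fields = {}
        elif line == "}":
            if block_type is not None:
                blocks.append((block_type, fields))
            block_type = None
        elif block_type is not None and "=" in line:
            # Split on the first "=" only: values such as performance_data
            # contain further "=" characters.
            key, _, value = line.partition("=")
            fields[key] = value
    return blocks


def problem_services(blocks):
    """Return (host, service, state) for service blocks that are not OK (state 0)."""
    return [
        (f["host_name"], f["service_description"], int(f["current_state"]))
        for kind, f in blocks
        if kind in ("service", "servicestatus") and f.get("current_state", "0") != "0"
    ]


# Tiny hand-made sample in the same shape as the slides (values are illustrative).
SAMPLE = """\
host {
    host_name=Compellent
    current_state=0
    performance_data=rta=0.785ms;200.000;500.000;0; pl=0%;40;80;;
}
service {
    host_name=ebs-soa1
    service_description=CPU load
    current_state=2
}
"""
```

Calling `problem_services(parse_status_dat(SAMPLE))` yields only the one service whose `current_state` is non-zero, which is roughly the question the web pages answer for you.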

Page 13: Computer monitoring with the Open Monitoring Distribution

When you think of Nagios, you think of its web output:

Nagios’ CGI is a visualization engine. It is written in C. It runs on Linux.

• Input: status.dat
• Output: web pages describing what’s in status.dat
• Input: Mouse clicks from the operator
• Output: changes to what is viewed, and changes to Nagios’ current state

Page 14: Computer monitoring with the Open Monitoring Distribution

Page 15: Computer monitoring with the Open Monitoring Distribution

There are lots of books about using Nagios. I have read most of them, and they all helped me out.

A good Nagios implementation is a study in organizational behavior. If you run Nagios, and you find that no one else in the organization ever fixes anything based on Nagios findings, stop looking at the screen and start talking to people.

Socially, using Nagios successfully forces you to involve your co-workers. They have to “buy in” to the Nagios outputs, which means they have to understand what it does and how it reports its findings.

Managerially, keeping the email notification flood from Nagios under control is a prerequisite if you want anyone to actually use an email as a basis for corrective action.

Festival (a loudspeaker) and SMS work great. A report based on an SQL query works great. Creating a Navy-like “Officer of the watch” worked in the U.K.

Page 16: Computer monitoring with the Open Monitoring Distribution

Doing Nagios means:

• Visit a server, and poke around.
• List what is important to check.
• For each important thing you want to check:
  • Find or write some code to check it
  • Set limits which your code can test to decide whether what you are checking is OK, or not
  • Schedule the code to run over and over
  • If the test is not OK, send a message to the interested party (email seems to be a favorite).
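To make "find or write some code to check it" and "set limits" concrete, here is a minimal sketch of a check in Python. It relies on the standard Nagios plugin convention of exit codes 0, 1, 2 and 3 for OK, WARNING, CRITICAL and UNKNOWN, plus one line of output with optional performance data after a `|`. The choice of the 1-minute load average, the thresholds 4.0 and 8.0, and the function names are assumptions for illustration only.

```python
import os

# Standard Nagios plugin exit codes.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3
STATE_NAMES = {OK: "OK", WARNING: "WARNING", CRITICAL: "CRITICAL", UNKNOWN: "UNKNOWN"}


def evaluate(value, warn, crit):
    """Compare a measurement against the warning and critical limits."""
    if value >= crit:
        return CRITICAL
    if value >= warn:
        return WARNING
    return OK


def format_result(label, value, warn, crit):
    """Return (exit_code, output_line) with perfdata after the '|'."""
    state = evaluate(value, warn, crit)
    text = "%s - %s is %.2f" % (STATE_NAMES[state], label, value)
    perfdata = "%s=%.2f;%s;%s" % (label, value, warn, crit)
    return state, "%s | %s" % (text, perfdata)


def main():
    """Check the 1-minute load average (hypothetical limits 4.0 / 8.0)."""
    try:
        load1 = os.getloadavg()[0]
    except (OSError, AttributeError):
        print("UNKNOWN - load average not available")
        return UNKNOWN
    state, line = format_result("load1", load1, warn=4.0, crit=8.0)
    print(line)
    return state
```

Run as a plugin, the script would finish with `sys.exit(main())` so that Nagios sees the state as the process exit code. Scheduling and notification are then Nagios's job, not the check's.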

Page 17: Computer monitoring with the Open Monitoring Distribution

Find or write some code to check it

• Grab a check from Nagios libexec apps (C, Perl, Python, bash) and put it where it can perform the check

Set limits so your code can decide whether it’s OK or not

• Configure command line parameters for the check where it will be called

Schedule the code to run over and over

• Reconfigure Nagios’s inputs to include the check and run it, perhaps using a transport mechanism

If the test is not OK, send a message to the interested party

• Nagios checks return state to Nagios, which can fork to send notifications

Page 18: Computer monitoring with the Open Monitoring Distribution

Nagios transport systems:

[Diagram; panel labels: Active checks, Passive checks, Export state. SSH works as well…]

Page 19: Computer monitoring with the Open Monitoring Distribution

PROBLEMS WITH THE TRADITIONAL NAGIOS APPROACH:

• How many times do you have to visit each server?
• How many times do you have to modify Nagios’s input files?
• How many times do you discover something you are not monitoring?
• Is all this worth it?

Page 20: Computer monitoring with the Open Monitoring Distribution

Home is no better. Can you count the servers?

Page 21: Computer monitoring with the Open Monitoring Distribution

Welcome to Check_MK!

Page 22: Computer monitoring with the Open Monitoring Distribution

So how about a new approach to managing Nagios?

1. Write a shell script which checks everything you can think of checking on a Linux box in one operation
2. Send this script to each server once. Configure each server’s xinetd so that the script can be called using port 6556
3. Remotely run this script and feed its output to a process which writes a separate Nagios configuration for each “service” found
4. Schedule Nagios to run a single check once a minute: call the remote shell script over port 6556, and process the result in the “check” itself
5. The check returns each individual “service” measurement it finds to Nagios by writing to the Nagios passive “external command file”
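Steps 3 and 4 boil down to "connect to port 6556, read everything, split it up". The sketch below shows that idea in Python under stated assumptions: the agent writes its whole report and then closes the connection, and each part of the report starts with a `<<<name>>>` marker line, as in the agent output shown later in this deck. The function names are hypothetical and this is not Check_MK's own code.

```python
import socket


def fetch_agent_output(host, port=6556, timeout=10.0):
    """Read the complete agent report from host:port until the agent closes."""
    chunks = []
    with socket.create_connection((host, port), timeout=timeout) as conn:
        while True:
            data = conn.recv(4096)
            if not data:
                break
            chunks.append(data)
    return b"".join(chunks).decode("utf-8", errors="replace")


def split_sections(output):
    """Group agent output lines by their <<<section>>> header."""
    sections = {}
    current = None
    for line in output.splitlines():
        if line.startswith("<<<") and line.endswith(">>>"):
            current = line[3:-3]
            sections.setdefault(current, [])
        elif current is not None:
            sections[current].append(line)
    return sections
```

With an agent installed, `split_sections(fetch_agent_output("ebsprod-is1"))` would return a dictionary keyed by section name (`df`, `mounts`, `ps`, and so on), each holding the raw lines for one group of measurements.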

Page 23: Computer monitoring with the Open Monitoring Distribution

Write a shell script which checks everything you can think of checking on a Linux box in one operation

• It is already written for you by M.K., for HP-UX, Linux and Windows, probably others.

Send this script to each server once. Configure each server’s xinetd so that the script can be called using port 6556

• Installing the shell script, creating directories, and reconfiguring and restarting xinetd are done for you by the check_mk_agent.rpm or .deb

Page 24: Computer monitoring with the Open Monitoring Distribution

Getting the check_mk agent installed on a Linux box:

OMD[torrance]:~$ scp /home/kelvinv/check_mk-agent-1.1.12p6-1.noarch.rpm root@ebsprod-is1:
root@ebsprod-is1's password:
OMD[torrance]:~/etc/check_mk$ ssh ebsprod-is1
torrance@ebsprod-is1's password:
[root@ebsprod-is1 ~]# rpm -Uhv check_mk-agent-1.1.12p6-1.noarch.rpm
Preparing...         ########################################### [100%]
   1:check_mk-agent  ########################################### [100%]
Activating startscript of xinetd
Reloading xinetd...
Reloading configuration: [ OK ]

Page 25: Computer monitoring with the Open Monitoring Distribution

#!/bin/bash
# +------------------------------------------------------------------+
# |             ____ _               _        __  __ _  __           |
# |            / ___| |__   ___  ___| | __   |  \/  | |/ /           |
# |           | |   | '_ \ / _ \/ __| |/ /   | |\/| | ' /            |
# |           | |___| | | |  __/ (__|   <    | |  | | . \            |
# |            \____|_| |_|\___|\___|_|\_\___|_|  |_|_|\_\           |
# |                                                                  |
# | Copyright Mathias Kettner 2010                  [email protected] |
# +------------------------------------------------------------------+
#
# This file is part of Check_MK.
# The official homepage is at http://mathias-kettner.de/check_mk.
#
# check_mk is free software; you can redistribute it and/or modify it
# under the terms of the GNU General Public License as published by
# the Free Software Foundation in version 2. check_mk is distributed
# in the hope that it will be useful, but WITHOUT ANY WARRANTY; with-
# out even the implied warranty of MERCHANTABILITY or FITNESS FOR A
# PARTICULAR PURPOSE. See the GNU General Public License for more de-
# tails.

# Remove locale settings to eliminate localized outputs where possible
export LC_ALL=C
unset LANG

export MK_LIBDIR="/usr/lib/check_mk_agent"
export MK_CONFDIR="/etc/check_mk"

# Make sure, locally installed binaries are found
PATH=$PATH:/usr/local/bin

Page 26: Computer monitoring with the Open Monitoring Distribution

More tests in check_mk_agent.linux:

echo '<<<check_mk>>>'
echo Version: 1.1.12p6
echo AgentOS: linux
echo PluginsDirectory: $PLUGINSDIR
echo LocalDirectory: $LOCALDIR
echo AgentDirectory: $MK_CONFDIR

# If we are called via xinetd, try to find only_from configuration
if [ -n "$REMOTE_HOST" ]
then
    echo -n 'OnlyFrom: '
    echo $(sed -n '/^service[[:space:]]*check_mk/,/}/s/^[[:space:]]*only_from[[:space:]]*=[[:space:]]*\(.*\)/\1/p' /etc/xinetd.d/* | head -n1)
fi

# Partitions (-P prevents line wrapping for long mount points)
# Caution: NFS mounts are always excluded, in order to
# avoid hangs. They are better monitored on the server
# than on the client anyway.
echo '<<<df>>>'
df -PTlk -x smbfs -x tmpfs -x cifs -x iso9660 -x udf -x nfsv4 | sed 1d

# VMWare shows its own filesystems with 'vdf'. Just one
# problem: it outputs not 7 but only 6 columns
if which vdf > /dev/null
then
    vdf -P | grep ^/vmfs/volumes | sed 's/ / vmfs /'
fi

Page 27: Computer monitoring with the Open Monitoring Distribution

More tests in check_mk_agent.linux:

# Check mount options. Filesystems may switch to 'ro' in case
# of a read error.
echo '<<<mounts>>>'
grep ^/dev < /proc/mounts

# processes including username, without kernel processes
echo '<<<ps>>>'
ps ax -o user,vsz,rss,pcpu,command --columns 10000 | sed -e 1d -e 's/ *\([^ ]*\) *\([^ ]*\) *\([^ ]*\) *\([^ ]*\) */(\1,\2,\3,\4) /'

Page 28: Computer monitoring with the Open Monitoring Distribution

Running check_mk_agent with “telnet <remote host> 6556”:

Connected to nagios.
Escape character is '^]'.
<<<check_mk>>>
Version: 1.1.12p6
AgentOS: linux
PluginsDirectory: /usr/lib/check_mk_agent/plugins
LocalDirectory: /usr/lib/check_mk_agent/local
AgentDirectory: /etc/check_mk
OnlyFrom:
<<<df>>>
/dev/mapper/tom--rp--debian-root ext3 9607396 3714444 5404916 41% /
/dev/xvda1 ext2 233191 30735 190015 14% /boot
/dev/xvdb1 ext4 30961664 7003764 22385140 24% /opt
<<<nfsmounts>>>
<<<mounts>>>
/dev/mapper/tom--rp--debian-root / ext3 rw,relatime,errors=remount-ro,data=ordered 0 0
/dev/xvda1 /boot ext2 rw,relatime,errors=continue 0 0
/dev/xvdb1 /opt ext4 rw,relatime,barrier=1,data=ordered 0 0
<<<ps>>>
(root,8356,808,0.0) init [2]
(root,0,0,0.0) [kthreadd]
(root,0,0,0.0) [migration/0]
(root,0,0,0.0) [ksoftirqd/0]
(root,0,0,0.0) [ksoftirqd/1]
/etc/

Page 29: Computer monitoring with the Open Monitoring Distribution

Remotely run this script and feed its output to a process which writes a separate Nagios configuration for each “service” found

Edit /opt/omd/<site>/etc/check_mk/main.mk to add a host, or use WATO, then:

> check_mk -I <hostname>

Schedule Nagios to run a single check once a minute: call the remote shell script over port 6556, and process the result in the “check” itself

The check returns each individual “service” measurement it finds to Nagios by writing to the Nagios passive “external command file”

> check_mk -O

Update the whole Nagios configuration for a server which has a new configuration

> check_mk -II <hostname>

> check_mk -O

And loop forever…
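For readers who have not used passive checks: a result is handed to Nagios by writing one line to its external command file (a named pipe). The sketch below formats such a line with the standard `PROCESS_SERVICE_CHECK_RESULT` external command. The helper names are hypothetical, the path of the command file is deliberately left as a parameter because it depends on the installation, and this is an illustration of the mechanism rather than what Check_MK does internally.

```python
import time


def passive_result_line(host, service, state, output, now=None):
    """Format one PROCESS_SERVICE_CHECK_RESULT external command line.

    state is the usual plugin code: 0 OK, 1 WARNING, 2 CRITICAL, 3 UNKNOWN.
    """
    timestamp = int(time.time() if now is None else now)
    return "[%d] PROCESS_SERVICE_CHECK_RESULT;%s;%s;%d;%s\n" % (
        timestamp, host, service, state, output)


def submit(command_file, line):
    """Write the line to the Nagios external command file (a named pipe).

    command_file is site specific; look it up in your own nagios.cfg.
    """
    with open(command_file, "w") as pipe:
        pipe.write(line)
```

One agent call can therefore fan out into many service results: the single active check loops over everything it parsed and submits one such line per service.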

Page 30: Computer monitoring with the Open Monitoring Distribution

Again, to test and see what check_mk_agent will report on your server, install the RPM and then, locally, run

> telnet localhost 6556

To see what configuration has been created for Nagios, look at these files on the Nagios server:

> less /opt/omd/<site>/etc/nagios/conf.d/check_mk_objects.cfg

> less /opt/omd/<site>/etc/nagios/conf.d/check_mk_templates.cfg

Page 31: Computer monitoring with the Open Monitoring Distribution

Charts and graphs:

• Long-term performance history is important: SLAs, correlating things over time.
• Nagios keeps almost no history.
• MySQL can save history, but needs maintenance (it fills up and Nagios stalls). Besides, you still need to “visualize” what is going on.
• RRD is a great database service to keep temporal history. It never fills up.
• RRD includes visualization tools (graphs).
• Traditionally, it has been a job to incorporate RRD into Nagios, usually using 3rd party packages.
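Why a round-robin database "never fills up" is easiest to see in a toy model: storage is a fixed number of slots, and each new sample overwrites the oldest one. The class below is only a conceptual sketch of that idea using a bounded deque. It is not the rrdtool API or file format, and the class and method names are invented for illustration.

```python
from collections import deque


class RoundRobinArchive:
    """Fixed-size sample store: the newest sample displaces the oldest."""

    def __init__(self, slots):
        # deque(maxlen=...) silently drops the oldest entry when full.
        self.samples = deque(maxlen=slots)

    def update(self, timestamp, value):
        """Record one measurement."""
        self.samples.append((timestamp, value))

    def average(self):
        """Mean of the samples currently retained, or None if empty."""
        if not self.samples:
            return None
        return sum(value for _, value in self.samples) / len(self.samples)
```

The trade-off is the one the slide hints at: disk usage is constant forever, at the price of losing detail once data ages out of the archive.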

Page 32: Computer monitoring with the Open Monitoring Distribution

“PNP is an addon to Nagios which analyzes performance data provided by plugins and stores them automatically into RRD-databases (Round Robin Databases, see RRD Tool). During development of PNP we set value on easy installation and little maintenance while running it. An administrator should do other things than configure graphing tools.”

Page 33: Computer monitoring with the Open Monitoring Distribution

Besides configuring Nagios and RRD, other things an administrator should be doing include documentation. Wouldn’t it be great if you could move between the Nagios CGI screens, the PNP4Nagios charts and a documentation wiki? DokuWiki has worked for me. It is also used in OMD for users, passwords and privileges across OMD applications (e.g. NagViz).

“DokuWiki is a standards compliant, simple to use Wiki, mainly aimed at creating documentation of any kind. It is targeted at developer teams, workgroups and small companies. It has a simple but powerful syntax which makes sure the datafiles remain readable outside the Wiki and eases the creation of structured texts. All data is stored in plain text files – no database is required.”

Page 34: Computer monitoring with the Open Monitoring Distribution

Here’s the punch line:

Page 35: Computer monitoring with the Open Monitoring Distribution

OMD Quick introduction

First install the package matching your operating system:

# zypper install omd-0.50-sles11sp1-25.x86_64.rpm

Now create a monitoring instance (OMD calls this "site"):

# omd create UULAC

And let's start Nagios and all other processes:

# omd start UULAC

Other OMD features:

• Run several monitoring sites in parallel
• Install and use several different versions of OMD in parallel
• Easily update, duplicate, rename and manage sites

Page 36: Computer monitoring with the Open Monitoring Distribution

What OMD contains, Page 1

• nagios-3.2.3: The current version of Nagios
• nagios-plugins: Standard external apps which take and report measurements
• Nsca: The listener for passive checks from remote servers
• check_nrpe: The check application which calls checks on remote hosts
• Shinken-0.6.99: (drop-in Nagios replacement, a whole world to explore)
• Nagvis: The management-level view of state: live maps, schematics
• Pnp4nagios: RRD and useful graphs. Compare services across hosts.
• rrdtool/rrdcached
• Check_MK: God’s gift to the sysadmin
• MK Livestatus: replace status.dat with a callable data provider
• Multisite: Easily add additional monitoring sites.

Page 37: Computer monitoring with the Open Monitoring Distribution

What OMD contains, Page 2

• Dokuwiki: A nice no-SQL wiki linked from Check_MK’s screens
• Thruk: A Perl CGI to view Nagios state (unexplored)
• Mod-Gearman: Process queue manager, reduces Nagios fork load
• check_logfiles: Locally read log files and report to Nagios
• check_oracle_health: Locally perform several Oracle DB checks
• check_mysql_health: Locally perform several MySQL DB checks
• Jmx4perl: (unexplored)
• check_webinject: wget-like web site checker, easy to use from Nagios
• check_multi: The all singing, all dancing, Python-writing Nagios check

Page 38: Computer monitoring with the Open Monitoring Distribution

The Check_MK dashboard (actually called “Multisite”):

Page 39: Computer monitoring with the Open Monitoring Distribution

You configure check_mk by editing ~/etc/check_mk/main.mk:

# KCV Dec 2011
snmp_default_community = 'public'
snmp_communities = [
    ( "SunriderR0!", ["UCS"] ),
    ( "SunriderR0!", ["Compellent"] ),
]
monitoring_host = "nagios"
ntp_default_levels = (10, 80.0, 110.0)

# hosts not added in WATO
all_hosts = [
    "copy-server|linux|dev|tcp",
    "ebsprod-ap1|linux|dev|tcp",
    "ebsprod-ap2|linux|dev|tcp",
    "ebsprod-db1|linux|dev|tcp",
    "ebsprod-db2|linux|dev|tcp",
    "fortunedelight|linux|dev|tcp",
    "ip158|linux|dev|tcp",
    "istore-1|linux|dev|tcp",
    "istore-uat|linux|dev|tcp",
    "landing-page|windows|ping",
    "pci-kickstart|dev|linux|tcp",
    "soa11g|linux|dev|tcp",
    "xbiz1-ap1|linux|dev|tcp",
    "xbiz3-db1|linux|dev|tcp",
    "xuat1-is1|linux|dev|tcp",
]
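Each `all_hosts` entry is a host name followed by `|`-separated tags, and the rules on the following slides select hosts by those tags. As a rough sketch of how such strings can be taken apart, assuming nothing beyond the format visible above (the function names are hypothetical and this is not Check_MK's own implementation):

```python
def parse_host_entry(entry):
    """Split 'name|tag1|tag2' into (name, {tag1, tag2})."""
    name, *tags = entry.split("|")
    return name, set(tags)


def hosts_with_tags(all_hosts, required):
    """Return the names of hosts that carry every tag in `required`."""
    required = set(required)
    return [
        name
        for name, tags in map(parse_host_entry, all_hosts)
        if required <= tags  # subset test: all required tags present
    ]
```

Because tags are treated as a set, the order in which they are written does not matter, which is why an entry such as `pci-kickstart|dev|linux|tcp` is selected the same way as the others.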

Page 40: Computer monitoring with the Open Monitoring Distribution

Getting anything else into Nagios’s cfg file:

extra_nagios_conf += r"""
define command {
    command_name check-ping
    command_line $USER1$/check_ping -H $HOSTADDRESS$ -w $ARG1$ -c $ARG2$ -p 5
}
define command{
    command_name check_dp_pool
    command_line $USER1$/check_dp_pool.pl -w $ARG1$ -c $ARG2$ $ARG3$
}
define command{
    command_name check_by_ssh_kel
    command_line $USER1$/check_by_ssh -H $HOSTADDRESS$ -t 20 -l kelvinv -C $ARG1$
}
define command{
    command_name check_sunrider_dig
    command_line $USER1$/check_dig -l $HOSTNAME$.sunrider.com -H 10.10.2.1
}
define command{
    command_name check_gearman
    command_line $USER1$/check_gearman -H localhost
}
define command{
    command_name check_http
    command_line $USER1$/check_http -H $HOSTADDRESS$ -s $ARG1$
}
"""

Page 41: Computer monitoring with the Open Monitoring Distribution

Modifying what check_mk -II writes into the nagios .cfg files:

extra_service_conf["notification_options"] = [
    ( "n", ALL_HOSTS, ["NTP Time"] ),
    ( "n", ALL_HOSTS, ["CUPS Queue.*"] ),
    ( "c,r", ALL_HOSTS, ["Ping"] ),
    ( "n", ALL_HOSTS, ["lpstat_queue"] ),
    ( "n", ALL_HOSTS, ["Gearman"] ),
]

extra_service_conf["normal_check_interval"] = [
    ( "2", ["hp"], ALL_HOSTS, ["Check_MK"] ),
    ( "5", ["db"], ALL_HOSTS, ["ASM disk"] ),
    ( "3", ALL_HOSTS, ["YP client"] ),
]

Page 42: Computer monitoring with the Open Monitoring Distribution

Add active checks (not run by check_mk_agent) into the Nagios cfg file:

legacy_checks = [
    ( ( "check-ping!500,10%!1000,20%", "Ping", True), [ "tcp" ], ALL_HOSTS ),
    ( ( "check_sunrider_dig", "DNS Entry", True), [ "tcp" ], ALL_HOSTS ),
    ( ( "check_dp_pool!600000!300000!VAULT", "pool_VAULT", True), [ "sunuxdp" ] ),
    ( ( "check_gearman", "Gearman", True), [ "nagios" ] ),
    ( ( "check_http!'ibeCCtdMinisites.jsp?language=US'", "iStore-1 web", True), [ "istore-1" ] ),
    ( ( "check_by_ssh_kel!'/opt/nrpe/libexec/check_ypwhich.sh'", "YP client", True), [ "itauxap1", "itauxap2" ] ),
]

Page 43: Computer monitoring with the Open Monitoring Distribution

host_groups = [
    ( "Production", [ "prod" ], ALL_HOSTS ),
    ( "Test", [ "test" ], ALL_HOSTS ),
    ( "Development", [ "dev" ], ALL_HOSTS ),
    ( "Production PCI", [ "prod_pci" ], ALL_HOSTS ),
    ( "Business Analyst", [ "ba" ], ALL_HOSTS ),
    ( "Backup", [ "backup" ], ALL_HOSTS ),
    ( "Database", [ "db" ], ALL_HOSTS ),
    ( "Application", [ "ap" ], ALL_HOSTS ),
    ( "Storage", [ "store" ], ALL_HOSTS ),
    ( "Monitors", [ "mon" ], ALL_HOSTS ),
    ( "Infrastructure", [ "infra" ], ALL_HOSTS ),
    ( "Networking", [ "net" ], ALL_HOSTS ),
    ( "Physical", [ "phy" ], ALL_HOSTS ),
    ( "VMware", [ "vmware" ], ALL_HOSTS ),
    ( "Xen", [ "xen" ], ALL_HOSTS ),
    ( "Oracle VM", [ "oravm" ], ALL_HOSTS ),
    ( "HP-UX", [ "hp" ], ALL_HOSTS ),
    ( "Linux", [ "linux" ], ALL_HOSTS ),
    ( "otheros", [ "Other OS" ], ALL_HOSTS ),
    ( "priv", [ "Private" ], ALL_HOSTS ),
]

Page 44: Computer monitoring with the Open Monitoring Distribution

Tell check_mk to ignore data returned by check_mk_agent:

ignored_services = [
    ( [ "eprodap1", "eprodap2", "eproddb1", "eproddb2", "ebspatch-ap2", "ebstest-ap1", "ebstest-db1", "ip94", "ip88pci" ], [ "IPMI Sensor Fan_Fan_[0-9]$"] ),
    ( [ "eprodap1", "eprodap2", "eproddb1", "eproddb2", "ebspatch-ap2", "ebstest-ap1", "ebstest-db1", "ip94", "ip88pci" ], [ "IPMI Sensor Power_Unit_VRM_[0-9]$"] ),
    ( [ "itauxap2" ], [ "Logical Volume /dev/vg00/lvol2$" ] ),
    ( [ "itauxdev"], [ "asm_procs$" ], ),
]
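The trailing `$` in these patterns matters because the service names are regular expressions. The sketch below shows one plausible way such rules could be evaluated. It assumes, as a labeled assumption rather than a statement about Check_MK internals, that each pattern is matched from the start of the service description, and it handles only the two-element rule shape (host list, pattern list) shown on this slide. The function name is hypothetical.

```python
import re


def is_ignored(host, service, rules):
    """True if any (hosts, patterns) rule covers this host and service.

    Assumption: patterns are regular expressions anchored at the start
    of the service description (re.match), so a trailing '$' is needed
    to avoid matching longer names as well.
    """
    for hosts, patterns in rules:
        if host in hosts and any(re.match(p, service) for p in patterns):
            return True
    return False
```

With the first rule above, `Fan_Fan_3` on `ip94` would be ignored while `Fan_Fan_12` would not, because `[0-9]$` allows exactly one digit before the end of the name.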

Page 45: Computer monitoring with the Open Monitoring Distribution

Tell check_mk to use these warn and critical parameters. Used by check_mk, results passed into Nagios as passive checks. Not in Nagios cfg files!

check_parameters = [
    ( (90, 95), [ "copy-server" ], [ "fs_/u99$" ] ),
    ( (90, 95), [ "ebspatch-ap2" ], [ "fs_/u01$" ] ),
    ( (96, 98), [ "ebspatch-ap2" ], [ "fs_/u01/oracle/EBSTEST/db/apps_st/data$" ] ),
    ( (92, 96), [ "ebstest-ap1" ], [ "fs_/u01$" ] ),
    ( (93, 95), [ "ebs-ap1", "ebs-db1", "ebs-ap2", "ebs-db2" ], [ "fs_/u01$" ] ),
    ( (75, 150), [ "ebs-db3" ], [ "ORA EBSAP31 Sessions$" ] ),
    ( (90, 90), [ "eprodap1" ], [ "fs_/u01$" ] ),
    ( (85, 90), [ "eprodap1", "eprodap2" ], [ "fs_/home$" ] ),
    ( (85, 90), [ "eprodap1", "eprodap2", "eproddb1" ], [ "fs_/u01/storage$" ] ),
]
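To show how a (warn, crit) pair from `check_parameters` could meet a line of the agent's `<<<df>>>` output, here is a small sketch. It makes several assumptions that are not from the talk: the first matching rule wins, patterns are matched from the start of the service name, and the fallback levels of (80, 90) are just an illustrative default. All function names are hypothetical.

```python
import re


def parse_df_line(line):
    """Parse one 'df -PTlk' line into (mount_point, percent_used)."""
    device, fstype, size_kb, used_kb, avail_kb, percent, mount = line.split(None, 6)
    return mount, int(percent.rstrip("%"))


def levels_for(host, item, rules, default=(80, 90)):
    """Pick (warn, crit) from the first rule matching host and item.

    Assumptions: first match wins; default levels are illustrative.
    """
    for levels, hosts, patterns in rules:
        if host in hosts and any(re.match(p, item) for p in patterns):
            return levels
    return default


def fs_state(percent_used, levels):
    """Map a usage percentage to a Nagios state (0 OK, 1 WARN, 2 CRIT)."""
    warn, crit = levels
    if percent_used >= crit:
        return 2
    if percent_used >= warn:
        return 1
    return 0
```

The point the slide makes still holds in this sketch: the thresholds live on the monitoring server and are applied there, so nothing about them has to be configured on the monitored host or written into the Nagios cfg files.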

Page 46: Computer monitoring with the Open Monitoring Distribution

Copies of the check_mk_agent scripts are installed here. This demonstrates how OMD keeps its versions nicely separated.

OMD[torrance]:~/etc/check_mk$ locate check_mk_agent
/opt/omd/versions/0.50/share/check_mk/agents/check_mk_agent.aix
/opt/omd/versions/0.50/share/check_mk/agents/check_mk_agent.freebsd
/opt/omd/versions/0.50/share/check_mk/agents/check_mk_agent.hpux
/opt/omd/versions/0.50/share/check_mk/agents/check_mk_agent.linux
/opt/omd/versions/0.50/share/check_mk/agents/check_mk_agent.solaris
/opt/omd/versions/0.50/share/check_mk/agents/windows/check_mk_agent.cc
/opt/omd/versions/0.50/share/check_mk/agents/windows/check_mk_agent.exe
/opt/omd/versions/0.50/share/doc/check_mk/treasures/check_mk_agent.hp
/opt/omd/versions/0.52/share/check_mk/agents/check_mk_agent.aix
/opt/omd/versions/0.52/share/check_mk/agents/check_mk_agent.freebsd
/opt/omd/versions/0.52/share/check_mk/agents/check_mk_agent.hpux
/opt/omd/versions/0.52/share/check_mk/agents/check_mk_agent.linux
/opt/omd/versions/0.52/share/check_mk/agents/check_mk_agent.solaris
/opt/omd/versions/0.52/share/check_mk/agents/windows/check_mk_agent.cc
/opt/omd/versions/0.52/share/check_mk/agents/windows/check_mk_agent.exe

Page 47: Computer monitoring with the Open Monitoring Distribution

Example of adding a custom check run on a remote host using ssh:

In main.mk:

legacy_checks = [
    ( ( "check_by_ssh_kel!'/opt/nrpe/libexec/check_ypwhich.sh'", "YP client", True), [ "itauxap1", "itauxap2" ] ),
]

define command{
    command_name check_by_ssh_kel
    command_line $USER1$/check_by_ssh -H $HOSTADDRESS$ -t 20 -l kelvinv -C $ARG1$
}

On itauxap1 at /opt/nrpe/libexec/check_ypwhich.sh:

#!/bin/sh
# Report CRITICAL (exit 2) unless ypwhich names the expected NIS server.
SERVER=`ypwhich`
# Quote $SERVER so an empty or multi-word value cannot break the test.
if [ -z "$SERVER" ]
then
    echo "CRITICAL: ypwhich NULL"
    exit 2
fi
if [ "$SERVER" != "itauxap1.sunrider.com" ]
then
    echo "CRITICAL: ypwhich INCORRECT: $SERVER"
    exit 2
fi
echo "ypwhich OK: $SERVER"
exit 0

Page 48: Computer monitoring with the Open Monitoring Distribution

Another custom check, returns count of printer queue depth on HP-UX. KISS!

# more check_lpstat-o.sh
#!/bin/sh
COUNT=`lpstat -o | wc -l`
echo lpstat queue $COUNT \| queue=$COUNT
exit 0
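The `| queue=$COUNT` part of that output is performance data: everything after the pipe is a list of `label=value;warn;crit;min;max` tokens that graphing tools pick up. A minimal sketch of splitting such a line in Python follows. It assumes space-separated tokens and does not handle quoted labels containing spaces; the function names are hypothetical.

```python
def split_plugin_output(line):
    """Split a plugin output line into (text, perfdata) at the first '|'."""
    text, _, perf = line.partition("|")
    return text.strip(), perf.strip()


def parse_perfdata(perf):
    """Parse 'label=value;warn;crit;min;max' tokens into a dict.

    Values are kept as strings because they may carry units
    (for example '0.785ms' or '0%'). Quoted labels are not handled.
    """
    metrics = {}
    for token in perf.split():
        label, _, rest = token.partition("=")
        fields = rest.split(";")
        metrics[label] = {
            "value": fields[0],
            "warn": fields[1] if len(fields) > 1 else "",
            "crit": fields[2] if len(fields) > 2 else "",
        }
    return metrics
```

For the lpstat check this gives a single metric named `queue` with no thresholds, which is enough for a graph of queue depth over time even though the check itself always exits 0.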

Page 49: Computer monitoring with the Open Monitoring Distribution

livestatus.o:

• It is a replacement output method for Nagios’s status.dat
• Like NDO, it uses the Nagios Event Broker API and loads as a module into Nagios.
• Unlike NDO, it does not write; it just responds to queries
• Used by Check_MK_Multisite, NagVis, Thruk to populate CGIs with data.
• Of course, it’s automatically set up in OMD…

In nagios.cfg:

broker_module=/usr/local/lib/mk-livestatus/livestatus.o /var/run/nagios/rw/live

Anyone using “Livestatus Query Language”?
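For anyone curious what a Livestatus query looks like: it is a few lines of text sent to the Unix socket named in the `broker_module` line, ending with a blank line. The sketch below builds a query with the `GET`, `Columns:` and `Filter:` headers and reads back the default semicolon-separated rows. The helper names are hypothetical, the socket path must be taken from your own configuration, and the code is an illustration rather than how Multisite itself talks to Livestatus.

```python
import socket


def build_query(table, columns, filters=()):
    """Build a Livestatus query string; a blank line terminates it."""
    lines = ["GET %s" % table, "Columns: %s" % " ".join(columns)]
    lines += ["Filter: %s" % f for f in filters]
    return "\n".join(lines) + "\n\n"


def query_livestatus(socket_path, query):
    """Send a query to the Livestatus Unix socket and return rows of fields."""
    with socket.socket(socket.AF_UNIX, socket.SOCK_STREAM) as conn:
        conn.connect(socket_path)
        conn.sendall(query.encode("utf-8"))
        conn.shutdown(socket.SHUT_WR)  # signal that the request is complete
        chunks = []
        while True:
            data = conn.recv(4096)
            if not data:
                break
            chunks.append(data)
    text = b"".join(chunks).decode("utf-8", errors="replace")
    # Default output format: one row per line, fields separated by ';'.
    return [row.split(";") for row in text.splitlines()]
```

For example, `query_livestatus("/var/run/nagios/rw/live", build_query("services", ["host_name", "description", "state"], ["state = 2"]))` would list the services currently in CRITICAL state, without ever touching status.dat.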

Page 50: Computer monitoring with the Open Monitoring Distribution

livecheck:

• It is a replacement for Nagios’s heavy fork()
• Lightweight (100k RAM) helper process, called by Nagios to execute external applications.
• New, only in latest distro, I have not used it yet.

In nagios.cfg:

broker_module=...../livestatus.o livecheck=/omd/sites/mysite/lib/mk-livestatus/livecheck

OMD includes the gearmand helper process, works for me.

Page 51: Computer monitoring with the Open Monitoring Distribution

MultiSite:

It’s the CGI created by Mathias (“Check_MK”); it is what you see in your browser.

“Multisite allows each user to customize the builtin views or create completely new views. This is done in the GUI by flexibly combining datasources, layouts, filters, sortings, groupings, column-painters and inter-view-links. The idea behind is, that the administrators of the monitoring system should be able to create custom views for their users or customers, while those are presented a GUI as simple as possible.”

Reads data from livestatus.o, so refresh is almost instant (triggered by Nagios events, I think).

Allows “multi-site” Nagios monitoring to be trivially easy:

• Set up more than one site using OMD (local or distributed)
• Edit “multisite.mk”
• Watch the world: I have Hong Kong, they have me.

Includes a configurable sidebar, I have not been there yet.

Page 52: Computer monitoring with the Open Monitoring Distribution

Check_MK things we might not have touched on tonight:

• Python: I have not had to look at it yet!
• WATO: GUI management for Multisite
• Application-level monitoring: Aggregation, BI services
• Logwatch: grep your favorite logs and read from the GUI
• Windows: “check_mk_agent.exe install”
• NagViz: The management view; pays your bills!
• Mailing lists: sign up, they are active.

Page 53: Computer monitoring with the Open Monitoring Distribution

I’m out of slides! Questions? Kelvin Vanderlip [email protected]

