NSClient++: What's New?
5 years of vaporware
Presentation © Michael Medin
These slides represent the work and opinions of the author and do not constitute official positions of any organization sponsoring the author’s work
This material has not been peer reviewed and is presented here as-is with the permission of the author.
The author assumes no liability for any content or opinion expressed in this presentation and/or use of content herein.
Disclaimer!
It is not their fault!
It is not my fault!
It is your fault!
Developer (not manager)
◦ Not working with Nagios
Accidentally ended up in our NOC
◦ Hated BB so we migrated to Nagios
2003: The birth of NSClient++
◦ NSClient sucked (broke Exchange)
◦ NRPE_NT was too much work
2004: The open sourcing of NSClient++
◦ “Just for fun”
2007: The rebirth of NSClient++
◦ Got a lot of emails and hits on the webpage
2011: The present
◦ 0.3.9 out last May
◦ 0.4.0 out as alpha
My Background
Windows Monitoring and NSClient++◦ Quick Introduction
What’s new in 0.3.9◦ Disk/File/*◦ Scheduled Tasks◦ Aliases◦ Crash Handling
What’s new in 0.4.0◦ New core◦ Unix support◦ New settings subsystem◦ New protocol◦ Python Scripting
The end of NSClient++! Q/A
Agenda
Windows Monitoring and NSClient++
Quick Introduction
What is NSClient?
◦ A (pretty old) program: pNSClient
◦ A (pretty limited) protocol: check_nt
◦ A (pretty incorrect) concept: ”Windows monitoring”
What is it not?
◦ NSClient++!
    NSClient++ was written as a replacement for pNSClient
    But it has evolved much since then
NSClient: Terminology
NSClient++
◦ Freedom!
    Custom scripts
    Decentralized or centralized
    Active or passive
    Can monitor “anything” (including your application)
    Can perform “tasks” (fix your problems)
Other options:
◦ SNMP
    Generally complex to use and limited on “standard” hardware
◦ pNSClient/NRPE_NT/OpMonAgent/*
    Old, outdated and usually limited functionality
◦ “Agentless” WMI
    Limited functionality
    Enforces centralized and active monitoring
But...
◦ I am biased, so you might not want to take my word for it...
Why should you use NSClient++
Protocol  Method                                            Encryption  Auth  Payload  M. args.  M. cmds  HTTP
NSClient  Active                                            No          Yes   No       Yes       No       No
NRPE      Active                                            No          No    1024     Yes       No       No
NSCA      Passive                                           Yes         Yes   512      Yes       Yes      No
NRDP      Passive                                           Yes         Yes   ∞        Yes       Yes      Yes
NSCP      Active/Passive/Configuration/Commands/Extensible  Yes         Yes   ∞        Yes       Yes      Yes
DNSCP     MQ                                                No          Yes   ∞        Yes       Yes      No
check_mk  Active                                            ?           No    ∞        No        Yes      No
Several Protocols
Internals:◦ C++◦ Around 75.000 lines of code◦ Actively developed (unfortunately only by me)◦ Modularized design (use what you need)
Runs on:◦ Windows: NT4, w2k, XP, w2k3, Vista, w2k8, X64, X86 …◦ Unix: Linux/Debian (probably many/most others as well)
Current Version:◦ 0.3.9 with 0.4.0 in beta
Most features require NRPE or NSCA (or NSCP)
Documentation online (WIKI)
◦ http://nsclient.org
About NSClient++
Not supported by a commercial entity◦ Donations welcome◦ Sponsoring available (contact me for details)
Used by a lot of people (I think)◦ Impossible to estimate any figures
Please, Help out!◦ Add documentation◦ Report problems◦ Come with ideas, thoughts, etc…
About NSClient++ (cont.)
Thank you!
Using NSClient++
NSClient++ is a command line program!◦ nsclient++ -start (net start nsclientpp)◦ nsclient++ -stop (net stop nsclientpp)◦ nsclient++ -test
Configuration:◦ notepad nsc.ini
Testing:
1. Local (nsclient++ -test)
2. From CLI (check_nrpe ...)
3. From Nagios (add command)
Works with “anything” ◦ Including many non Nagios based systems
Using NSClient++ (0.3.9)
nsclient++ -test is your friend!
New command line syntax!◦ nscp --service --start◦ nscp --service --stop◦ nscp --help
Testing◦ nscp --test
Configuration:◦ nscp --settings-help◦ nscp --settings --migrate-to ini◦ nscp --settings --set …◦ …
Run scripts:◦ nscp --client --module PythonScript --command execute-and-load-python --script test.py --install
Using NSClient++ (0.4.0)
nscp --test is your friend!
NSClient++: What’s new in 0.3.9
Overview
Major simplification to the disk/file checker
◦ CheckFile (removed)
◦ CheckFile2 (deprecated)
◦ CheckFiles (replaces the above)
Volume support (for real this time)
Aliases
NSCA/NRPE enhancements
Scheduled task checks
Crash Handling
A bunch of new commands
Bug fixes and many more things…
0.3.9 What's new: Overview
We have recruited a new member to the team!
A girl actually… …Still a bit wet behind the ears…
New team member!
Evelina was born 2010-07-21
CheckFile(1,2,s,…)
The good:
◦ Powerful interface!
◦ Simple to use!
◦ Out-of-the-box solution! (on which you can expand)
The bad:
◦ Nothing! Really, I mean it!
…and then… yesterday…
◦ …in the bar…
◦ …all hopes shattered…
◦ …apparently it is still too complicated…
Overview
Same as was introduced for eventlog last year
Based on SQL WHERE clauses
◦ generated > -2d AND severity = 'error'
◦ size > 5k
◦ size > 5k OR size < 1k
◦ size > 5k AND written > -2d
◦ (size > 5k OR size < 1k) AND written > -2d
◦ …
The new Filters
Type        Description
filename    Name of the file
path        Path of the file
size        Size of the file
accessed    When the file was last accessed
written     When the file was last written
creation    When the file was created
version     The exe file version (slow)
line_count  Number of lines in the file (slow)
Filter keywords
Operator  Safe  Meaning
=         eq    Equality
!=        ne    Not equal
>         gt    Greater than
<         lt    Less than
=>        ge    Greater than or equal
=<        le    Less than or equal
like            String similarity (substring matching)
not like        Opposite of like
regexp          Regular expression matching
Filter operators
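The filter examples combine keywords such as size and written with comparison operators. Below is a minimal Python sketch of how one such expression could be evaluated by hand. It assumes that `k` means 1024 bytes and that a negative time such as `-2d` means "two days before now"; the slides confirm neither. The helper names are hypothetical, and this is not how NSClient++ parses filters internally.

```python
import re
from datetime import datetime, timedelta

# Suffix multipliers are an assumption; the slides only show "5k", "1k" and "-2d".
SIZE_UNITS = {"": 1, "k": 1024, "m": 1024 ** 2, "g": 1024 ** 3}
TIME_UNITS = {"s": "seconds", "m": "minutes", "h": "hours", "d": "days"}


def parse_size(text):
    """Turn a size literal such as '5k' into a byte count."""
    match = re.fullmatch(r"(\d+)([kmg]?)", text.strip().lower())
    if not match:
        raise ValueError(f"bad size: {text!r}")
    return int(match.group(1)) * SIZE_UNITS[match.group(2)]


def parse_relative_time(text, now):
    """Turn a relative time such as '-2d' into an absolute datetime."""
    match = re.fullmatch(r"([+-]?)(\d+)([smhd])", text.strip().lower())
    if not match:
        raise ValueError(f"bad relative time: {text!r}")
    delta = timedelta(**{TIME_UNITS[match.group(3)]: int(match.group(2))})
    return now - delta if match.group(1) == "-" else now + delta


def matches(file_info, now):
    """Hand-written equivalent of: (size > 5k OR size < 1k) AND written > -2d"""
    size = file_info["size"]
    size_hit = size > parse_size("5k") or size < parse_size("1k")
    recent = file_info["written"] > parse_relative_time("-2d", now)
    return size_hit and recent
```

Each file that `matches` would count as one "hit"; the warn and crit options then decide how many hits turn the check into a warning or critical state.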
Option         Description
path           The root path to use
pattern        The file pattern to use
filter         Define the filter (there can only be one)
warn           How many hits constitute a warning state. Examples: warn=>5, warn==5, warn=!=5
crit           How many hits constitute a critical state.
truncate       Length of returned data. Since NRPE/NSCA has a limited capacity this is important. (Will be deprecated in 0.4.0)
syntax         How to format the return data
master-syntax  How to format the “message string”
debug=true     Displays a lot more information in the logfile/console
Command Options
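To show how these options travel together, here is a small Python sketch that assembles a check_nrpe call for CheckFiles. The host address, directory and the crit threshold are hypothetical values. It also assumes that crit accepts the same form as warn and that the agent is configured to allow arguments over NRPE. Only the option names come from the table.

```python
import shlex


def build_check_files_args(host, options):
    """Build a check_nrpe argument list that passes key=value options to CheckFiles."""
    args = ["check_nrpe", "-H", host, "-c", "CheckFiles", "-a"]
    args += [f"{key}={value}" for key, value in options.items()]
    return args


options = {
    "path": r"C:\logs",                        # hypothetical directory
    "pattern": "*.log",
    "filter": "size > 5k AND written > -2d",   # only one filter is allowed
    "warn": ">5",                              # becomes warn=>5
    "crit": ">10",                             # hypothetical threshold
}

argv = build_check_files_args("192.0.2.10", options)  # documentation-range address
print(" ".join(shlex.quote(arg) for arg in argv))
```

Keeping the options in a dictionary makes it easy to see that a bound such as `warn=>5` is really the option `warn` with the value `>5`.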
CheckDriveSize … CheckAll=volumes …
Other new features
◦ Added a new option to ignore drives which are not readable (like the Office 2010 q: drive): ignore-unreadable
◦ Added magic modifiers (from check_mk): magic=0.7
Volume support (for real this time)
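The magic modifier is borrowed from check_mk, where it relaxes percentage thresholds on large volumes and tightens them on small ones. The sketch below follows the formula used by check_mk's filesystem check as I understand it, including its 20 GB reference size. Whether NSClient++ uses exactly the same formula and constants is an assumption, and the function name is hypothetical.

```python
def magic_adjusted_level(level_pct, size_gb, magic, normsize_gb=20.0):
    """Scale a usage threshold (in percent) according to the volume size.

    A volume of exactly normsize_gb keeps its threshold. With magic < 1,
    larger volumes get a higher threshold and smaller ones a lower threshold.
    """
    relative_size = size_gb / normsize_gb      # size in units of the reference size
    felt_size = relative_size ** magic         # dampened size
    scale = felt_size / relative_size          # < 1 for large volumes when magic < 1
    return 100.0 - (100.0 - level_pct) * scale


for size in (20, 200, 2000):
    print(size, round(magic_adjusted_level(80.0, size, 0.7), 1))
```

With magic=1.0 the scale factor is always 1, so every volume keeps the configured threshold regardless of size.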
Scheduled Tasks
Works the ”same” as CheckEventLog
◦ ”filter=exit_code ne 0”
Two modules:
◦ CheckTaskSched.dll
    Works on Windows NT4 and beyond
    But cannot check ”new” tasks (from Vista and beyond)
◦ CheckTaskSched2.dll
    Works on Windows Vista and beyond
    Has fewer filter keywords
Scheduled Tasks
Type                  Description
title                 Task's name
application           The application
comment               Retrieves the comment for the work item.
parameters            Retrieves the command-line parameters of a task.
working_directory     Retrieves the working directory of the task.
exit_code             Retrieves the last exit code returned by the executable associated with the work item on its last run.
max_run_time          Retrieves the maximum length of time the task can run.
status                Retrieves the status of the work item. Possible values include: ready, running, not_scheduled, has_not_run, disabled, has_more_runs, no_valid_triggers
most_recent_run_time  Retrieves the most recent time the work item began running.
Filter keywords
CheckTaskSched "filter=exit_code ne 0" "syntax=%title%: %exit_code%" warn=>0
WARNING:test.job (1)
CheckTaskSched "filter=status = 'running' AND most_recent_run_time < -30m" "syntax=%title% (%most_recent_run_time%)" warn=>0
WARNING:test.job (2011-02-10 23:14:35)
Sample Commands
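The sample commands share one pattern: filter the items, count the hits, compare the count with the warn bound, and format each hit with the syntax string. Below is a rough Python sketch of that flow. It assumes `warn=>0` means "warn when there is more than zero hits". The task list and function names are hypothetical, and the syntax string is chosen to match the output format shown on the slide.

```python
import re


def render(syntax, item):
    """Replace %keyword% placeholders with the matching values from item."""
    return re.sub(r"%(\w+)%", lambda m: str(item[m.group(1)]), syntax)


def check(items, predicate, syntax, warn_above=0):
    """Return (status, message) for the items that satisfy predicate."""
    hits = [item for item in items if predicate(item)]
    status = "WARNING" if len(hits) > warn_above else "OK"
    message = ", ".join(render(syntax, hit) for hit in hits)
    return status, message


tasks = [
    {"title": "test.job", "exit_code": 1},
    {"title": "backup.job", "exit_code": 0},   # hypothetical second task
]

# Equivalent in spirit to: filter=exit_code ne 0, warn=>0
status, message = check(tasks, lambda t: t["exit_code"] != 0, "%title% (%exit_code%)")
print(f"{status}:{message}")
```

The same `check` helper would cover the second sample by swapping the predicate for one that tests the status and most recent run time.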
Aliases
System
◦ alias_cpu
    CPU load past 5 minutes, 80/90% bounds
◦ alias_cpu_ex
    CPU load past 5 minutes, custom bounds
◦ alias_mem
    Memory utilization (all), 80/90% bounds
◦ alias_mem_ex
    Memory utilization (all), custom bounds
◦ alias_up
    System uptime
Out of the box aliases
Disk/Drive
◦ alias_disk
    All fixed drives
◦ alias_disk_loose
    All fixed drives, ignore any problematic drives
◦ alias_volumes
    All volumes
◦ alias_volumes_loose
    All volumes, ignore any problematic drives
◦ alias_file_size
    Check the size of a given file (filename, size)
◦ alias_file_age
    Check the age of a given file
Out of the box aliases (continued)
Eventlog
◦ alias_event_log
    Check for errors in the event log
Scheduled Tasks
◦ alias_sched_all
    No scheduled jobs have failed
◦ alias_sched_long
    No task has been running for longer than a given time
◦ alias_sched_task
    Check if a given task succeeded
Misc
◦ alias_updates
    Check that updates are applied
Out of the box aliases (continued)
Processes
◦ alias_service
    All services in “sensible state”
◦ alias_service_ex
    All services in “sensible state” (exclude various services)
◦ alias_process
    A process must be running
◦ alias_process_stopped
    A process must not be running
◦ alias_process_count
    A process must not have more than X instances
◦ alias_process_hung
    A process must not be hung
Out of the box aliases (continued)
Crash Handling
Using Google Breakpad
◦ Same as Google Chrome, Mozilla Firefox, etc.
Three options (not mutually exclusive)
1. Send crash dumps to crash.nsclient.org
    Server can be changed if you want to have an internal server or proxy server.
2. Store crash dumps for analysis
    Will also be checked with check_nscp
3. Restart service
Crash Handling
[crash]
restart=1
service_name=nsclientpp
submit=0
url=http://crash.nsclient.org/submit
archive=1
#folder=<appfolder>/dumps
Configuring Crash Handling
Miscellaneous Fixes
NSCA◦ Fixed problems with sending ”many” results back
NRPE◦ Added support for large payloads
Checks◦ Added ”check_nscp” to check health of NSClient++◦ Added new check for running other checks ”with a timeout”◦ Added new negate check (to negate the result of another check)
All filters (read CheckEventLog et al)◦ Many fixes and additions (regular expressions)
Process checks◦ Added support for checking if processes have ”hung”
Performance data◦ Added it to many places where it was intermittently missing before
Other stuff (The highlights)
Roadmap: What's to come?
0.3.9
• Last 0.3.x
0.4.0
• Core switch
• Linux support
• Distributed Monitoring (v1)
0.4.1
• Bugfixes
0.4.2
• Monitoring Kits
• New windows check-subsystem
• True passive checks
• Distributed Monitoring (v2)
0.4.3
• Bugfixes
Roadmap (rough)
NSClient++: What’s new in 0.4.0
Overview
Brand new core based upon libraries◦ Things should *work* not just “work”◦ More modular and extensible
Unix support◦ Both as a client and server
New settings subsystem◦ Registry, improved ini support, http, etc
New protocol◦ NSCP (HTTP(s), MQ, Native)
Distributed monitoring◦ Many new things in this area (including MQ)
Python scripting◦ Primary goal (for me) is to create “unit-test”
Updated installer◦ Wix 3.5, more customizable
What’s new 0.4.0
“Monitoring Kits”◦ Monitoring solutions for “standard things”
New windows check-subsystem◦ More modern and less arcane (no NT4 support)◦ Remote checking
.Net plugin support◦ Possibly internal VBA scripting support
Metrics cache and aggregation◦ Lightweight version of CEP◦ “crit=cpu > 80% AND transactions_per_sec < 10”
What’s coming 0.4.2
Filter-like API (in addition to options)◦ “warn=any drive > 90% OR c: > 80%”
Remote updates/upgrades◦ Allow NSCP to upgrade itself
“port” of the “standard plugins”?◦ Run your favorite check_xxx from inside NSClient++
Unix plugins?◦ Run CheckCPU on unix machines?
Client/web Interface?◦ A nice little program (systray)
Let me know what you would like to see!
What might be coming?
Brand new core
The flux capacitor
This is why it was so long in the making◦ Merging each new version took forever!
New internal protocol◦ Removed all internal “limits” (think buffer sizes)◦ Allows many new features◦ Allows much more advanced internal scripts◦ Allows for “non NRPE based checks”
A lot of new bugs?
◦ This is the scary part (for me), but in my testing it seems very stable
A completely new core
Unix support
Good question…
◦ Since no one seems to like to program on Windows I brought NSClient++ to “unix”
◦ Because I can
With the new core comes portability
So, perhaps the better question was: Why not?
Will NOT be supported for some time though◦ Unless someone wants to help out
Why?!?!
New Settings
Hierarchical settings subsystem◦ [/settings/NRPE/server]◦ allow arguments=false
Instead of ◦ [NRPE Server]◦ allow_arguments=false
Why did I do this?◦ Because it was fun ◦ Number of options has started to explode◦ Simpler to use the registry (as well as xml?)
Settings
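Because the new section names are paths, a settings tree can be walked with ordinary prefix matching. The Python sketch below reads the new-style fragment from the slide with the standard configparser module. The helper functions are hypothetical and only illustrate the idea of hierarchical lookup; NSClient++ has its own settings implementation.

```python
import configparser

NEW_STYLE = """
[/settings/NRPE/server]
allow arguments=false
"""

parser = configparser.ConfigParser()
parser.read_string(NEW_STYLE)


def get_setting(parser, path, key, default=None):
    """Look up key in the section whose name is the hierarchical path."""
    if parser.has_option(path, key):
        return parser.get(path, key)
    return default


def children(parser, parent):
    """List the sections that live below parent in the hierarchy."""
    prefix = parent.rstrip("/") + "/"
    return [name for name in parser.sections() if name.startswith(prefix)]


print(get_setting(parser, "/settings/NRPE/server", "allow arguments"))
print(children(parser, "/settings/NRPE"))
```

Grouping by path is what keeps a growing number of options manageable: everything for one module sits under one prefix.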
Settings have “URLs”
◦ old://${exe-path}/nsc.ini
◦ ini://${base-path}/nsclient.ini
◦ registry://HKEY_LOCAL_MACHINE/software/NSClient++
◦ http://my.central.server/config/${hostname}.ini
Allows extensions (not via plugins though)
◦ Maybe in the future:
    lua://${base-path}/config.lua
    python://${base-path}/config.py
You can mix and match:
◦ ini://${base-path}/nsclient.ini
    Can “include”: registry://HKEY_LOCAL_MACHINE/software/NSClient++
    Which in turn includes http://conf.server/${hostname}.conf
What’s in it for you?
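A settings location is a scheme that selects the backend, followed by a location that may contain `${...}` placeholders. Here is a minimal Python sketch of splitting and expanding such a string. The variable values are hypothetical, and the function names are illustrative rather than part of NSClient++.

```python
import re


def expand(url, variables):
    """Replace ${name} placeholders with values from variables."""
    return re.sub(r"\$\{([^}]+)\}", lambda m: variables[m.group(1)], url)


def split_settings_url(url):
    """Split 'scheme://location' into (scheme, location)."""
    scheme, separator, location = url.partition("://")
    if not separator:
        raise ValueError(f"not a settings URL: {url!r}")
    return scheme, location


# Hypothetical values; the real ones would come from the running agent.
variables = {"base-path": "C:/Program Files/NSClient++", "hostname": "web01"}

for raw in ("ini://${base-path}/nsclient.ini",
            "http://my.central.server/config/${hostname}.ini"):
    print(split_settings_url(expand(raw, variables)))
```

Dispatching on the scheme is what would let one store (say, an ini file) include another (the registry or an HTTP server) without the caller caring where a value finally comes from.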
Ability to load the same plugin twice.
Normal (default alias is python)
◦ [/modules]
◦ PythonScript=
◦ [/settings/python/scripts]
◦ test.py
Multiple modules (define two aliases foo and bar)
◦ [/modules]
◦ foo=PythonScript
◦ bar=PythonScript
◦ [/settings/foo/scripts]
◦ test1.py
◦ [/settings/bar/scripts]
◦ test2.py
Multiple modules and alias
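The alias decides which settings section an instance of the module reads. The Python sketch below parses the two-alias example with configparser and pairs each alias with its scripts. It assumes the `/settings/<alias>/scripts` naming seen on the slide applies generally. This illustrates the layout only, not NSClient++'s loader.

```python
import configparser

CONFIG = """
[/modules]
foo=PythonScript
bar=PythonScript

[/settings/foo/scripts]
test1.py

[/settings/bar/scripts]
test2.py
"""

# allow_no_value lets bare lines such as "test1.py" act as keys without values.
parser = configparser.ConfigParser(allow_no_value=True)
parser.optionxform = str  # keep the original case of keys
parser.read_string(CONFIG)


def scripts_by_alias(parser, module_name="PythonScript"):
    """Map each alias of module_name to the scripts listed in its own section."""
    result = {}
    for alias, module in parser["/modules"].items():
        section = f"/settings/{alias}/scripts"
        if module == module_name and parser.has_section(section):
            result[alias] = list(parser[section].keys())
    return result


print(scripts_by_alias(parser))
```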
It depends…
◦ If you are “still” using check_nt: Probably not
◦ If you are using NSCA: Maybe not
◦ If you want to use all new features: Yes
How do I change?
◦ It is pretty simple… nscp --settings --migrate-to ini
◦ (or) nscp --settings --migrate-to registry
Do I need to change?
New protocol
[Figure: Active NRPE. Labels: Nagios Server, Firewall, Windows Computer, NSClient++. One check_nrpe per check (CPU, Disk, Mem, ...), with repeated "Fork" steps.]
[Figure: Active NSCP. Labels: Nagios Server, Firewall, Windows Computer, NSClient++. A single check_nscp covers CPU, Disk, Mem, ...]
Allows more than one command to be sent
Used internally for plugins
Supports both passive and active checks
Supports configuration, management, etc…
Extensible
But will also support:
◦ Multiple locales (based on utf)
◦ Unlimited payloads (soft configurable)
◦ Real performance data (not strings)
New protocol
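To illustrate what "real performance data (not strings)" is moving away from: classic Nagios plugins return performance data as text in the form `label=value[unit];warn;crit;...`, which every consumer has to parse again. The Python sketch below turns one such item into a typed record. The `PerfValue` structure is hypothetical and is not the NSCP wire format.

```python
import re
from dataclasses import dataclass
from typing import Optional


@dataclass
class PerfValue:
    label: str
    value: float
    unit: str
    warn: Optional[float] = None
    crit: Optional[float] = None


_PERF_ITEM = re.compile(
    r"'?(?P<label>[^'=]+)'?="
    r"(?P<value>-?\d+(?:\.\d+)?)(?P<unit>[^;\s]*)"
    r"(?:;(?P<warn>[^;\s]*))?(?:;(?P<crit>[^;\s]*))?"
)


def _to_float(text):
    """Convert a threshold to float; missing or non-numeric thresholds become None."""
    try:
        return float(text)
    except (TypeError, ValueError):
        return None


def parse_perf_item(text):
    """Parse a single performance data item (min/max fields are ignored)."""
    match = _PERF_ITEM.match(text.strip())
    if not match:
        raise ValueError(f"not a performance data item: {text!r}")
    return PerfValue(
        label=match["label"],
        value=float(match["value"]),
        unit=match["unit"],
        warn=_to_float(match["warn"]),
        crit=_to_float(match["crit"]),
    )


print(parse_perf_item("cpu=12%;80;90"))
```

A protocol that carries the numbers as numbers removes this parsing step and the ambiguity around units and quoting.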
Distributed monitoring
Submission (evolution)
[Figure: three stages of the submission pipeline. Stage 1: Scheduler, CheckCPU, ..., Command broker, NSCA, NSCA Server. Stage 2: as stage 1, with an Event broker and an NSCA Agent in front of the NSCA Server. Stage 3: as stage 2, plus an XXX Agent, a Real time plugin and an XXX Server.]
Other scenarios
[Figure: CheckEventLog, Event broker, SYSLOG Agent, SysLog Server; check_nrpe, NRPE Server, Command broker, CheckCPU, ...; Event broker, NSCA Agent, NSCA Server.]
An extension of the passive checks
◦ ”Something” can send notification events
◦ ”Something” can receive notification events
◦ Agents can forward notification events
◦ Replaces NSCAListener module
Supports routing
Not a one-to-one mapping
◦ Multiple consumers
◦ Multiple producers
Allows
◦ Passive plugins (other than the built-in NSCA)
◦ Script and rule based routing
Submissions and handlers
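The "not a one-to-one mapping" point can be pictured as a small publish/subscribe broker: any number of producers publish to a channel and every consumer subscribed to that channel receives the event. The Python sketch below is illustrative only. The class, channel name and event fields are hypothetical and do not reflect the NSClient++ API.

```python
from collections import defaultdict


class EventBroker:
    """Route notification events from many producers to many consumers."""

    def __init__(self):
        self._consumers = defaultdict(list)

    def subscribe(self, channel, consumer):
        """Register a callable that receives every event sent to channel."""
        self._consumers[channel].append(consumer)

    def publish(self, channel, event):
        """Deliver event to all consumers of channel; return how many got it."""
        consumers = self._consumers.get(channel, ())
        for consumer in consumers:
            consumer(event)
        return len(consumers)


broker = EventBroker()
nsca_out, syslog_out = [], []                    # stand-ins for two agents
broker.subscribe("alerts", nsca_out.append)
broker.subscribe("alerts", syslog_out.append)

# Two different producers submit to the same channel.
broker.publish("alerts", {"source": "CheckEventLog", "status": "WARNING"})
broker.publish("alerts", {"source": "CheckCPU", "status": "OK"})
print(len(nsca_out), len(syslog_out))
```

Script and rule based routing would fit in as a consumer that inspects each event and republishes it on another channel.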
Python scripting
Built-in python scripting
Has full API support
◦ Can build ”modules” in python
◦ Can access settings
◦ Can do “anything”
Primarily used by me for unit-testing
Requires a working python install
Python Scripting
The end of NSClient++!
Le Roi est mort, vive le Roi!
0.4.x (ish) will be the last ”Windows” monitoring agent
The idea is to make it more:
◦ A platform/client/server for distributed monitoring
    Regardless of OS/system
    Regardless of monitoring solution
Don’t worry…
◦ It will still work just fine as a ”Windows Monitoring Agent”
◦ But in addition to this you will be able to do more.
So what’s this all about?
Questions?
Q&A
Michael Medin
http://www.linkedin.com/in/mickem
Information about NSClient++: http://nsclient.org
Facebook: facebook.com/nsclient
Slides and examples: http://nsclient.org/nscp/conferances/2011/nwcna/
Thank You!