© Opsera 2010 Commercial in Confidence
OV203: Advanced Opsview Configuration and Management
Ton Voon, Opsera
Web training
March 2010
Introduction
Who am I?
Who are you?
• Name
• What you do
• Experience with Opsview
• What you are most interested in
What will we be learning?
Ton Voon is the Product Architect for Opsview and is the main person in charge of the design and scope of Opsview. He has been involved in the development of Opsview since 2005.
The main documentation site is http://docs.opsview.org/doku.php?id=opsview-community
Aims of training course
Understand the advanced concepts of how Opsview monitors
Understand how a distributed Opsview system works
Understand how to add new custom plugins
Understand what backups are required for Opsview
Theory: Understanding the advanced concepts that will appear in Opsview
Distributed Opsview: Understand how it works and what limitations exist
Plugins: How to extend Opsview to monitor specific characteristics
Agenda
Advanced monitoring concepts
Distributed architecture
Modules
Custom plugins
ODW
API
Configuration files
Backup and recovery
Troubleshooting
Advanced concepts
Checks: active versus passive
Active: Run on a periodic basis
Passive: Result that arrives on demand
• Manually reset the state of a failure
• or set a freshness interval to change the state automatically after a period of time
Active checks are "polling checks". Try to use active checks because, when things are fixed, the service will automatically change status at the next polling interval.
Examples of passive checks: a backup start/finish message; a link up or link down from an interface; or entries in a log file.
Passive checks need to be reset, otherwise the state stays the same. You can:
 - submit a result via the contextual menu for the service. This sends the result to the slave (if appropriate), which processes it and sends the result back up
 - configure the service check so that, after a defined interval, it automatically resets the state
You can also submit a result to test failure scenarios.
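Underneath, a passive result reaches Nagios as an external command. The sketch below is a minimal Python illustration, assuming the standard Nagios PROCESS_SERVICE_CHECK_RESULT command and a command pipe at /usr/local/nagios/var/rw/nagios.cmd (an assumed path; check command_file in nagios.cfg). It is not how the Opsview contextual menu is implemented, and on a distributed system the result has to reach the Nagios instance that runs the service. The function names are hypothetical.

```python
import time

# Assumed location of the Nagios external command pipe; check command_file in nagios.cfg
COMMAND_FILE = "/usr/local/nagios/var/rw/nagios.cmd"

RETURN_CODES = {0: "OK", 1: "WARNING", 2: "CRITICAL", 3: "UNKNOWN"}


def passive_result_command(host, service, return_code, output, timestamp=None):
    """Format one PROCESS_SERVICE_CHECK_RESULT external command line."""
    if return_code not in RETURN_CODES:
        raise ValueError(f"return code must be one of {sorted(RETURN_CODES)}")
    if timestamp is None:
        timestamp = int(time.time())
    # Nagios reads one command per line, so keep only the first line of output
    first_line = output.splitlines()[0] if output else ""
    return (f"[{timestamp}] PROCESS_SERVICE_CHECK_RESULT;"
            f"{host};{service};{return_code};{first_line}\n")


def submit(command, command_file=COMMAND_FILE):
    """Write the command to the Nagios command pipe (needs write access to it)."""
    with open(command_file, "w") as pipe:
        pipe.write(command)
```

For example, resetting a hypothetical "Nightly backup" service on host "backupserver" would be submit(passive_result_command("backupserver", "Nightly backup", 0, "OK - reset manually")).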
State changes
A Soft state is an initial failure state
A failure state will change to Hard if it is still failing after a certain number of checks
Based on Max Check Attempts
Can retry at different intervals
Aim: cater for temporary “glitches”
When a service transitions from an OK state to a failed state, the check attempt will be 1. This increments for each subsequent failure. When the max check attempts is reached, the state is considered a hard state.
Hostgroup Hierarchy always shows soft states, so it always has the most recent state information.
Beware of ignoring soft states - there may be a transient problem that needs resolving (usually load related).
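The soft/hard rule can be sketched as a small function. This is a simplified model, assumed for illustration: every non-OK result counts as a failure, and soft recoveries (an OK after a soft failure) are not distinguished. The function name and string labels are hypothetical, not Opsview internals.

```python
def label_state_types(results, max_check_attempts):
    """Label each result as (state, check attempt, 'SOFT' or 'HARD')."""
    labelled = []
    attempt = 0
    for state in results:
        if state == "OK":
            # An OK result resets the attempt counter.
            # Simplification: soft recoveries are not modelled separately here.
            attempt = 0
            labelled.append((state, 1, "HARD"))
            continue
        # Each consecutive failure increments the attempt, capped at the maximum
        attempt = min(attempt + 1, max_check_attempts)
        state_type = "HARD" if attempt >= max_check_attempts else "SOFT"
        labelled.append((state, attempt, state_type))
    return labelled
```

With max check attempts of 1, the first failure is immediately HARD, which is why that setting gives immediate notification.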
Notifications
Will send notifications on hard state changes
Can configure to receive emails, SMS or RSS/Atom feeds
• Emails: slave, RSS: master, SMS: configurable
Extra state: recovery
Checks continue to be run - will re-notify unless acknowledged or marked as downtime
Limitation: Notification from a slave will not necessarily have all information that the master has, since the parent/child topology will be different
Notifications are only sent on hard state changes, which means a delay is introduced. You can set max check attempts to 1 if you want to be notified straight away.
Opsview by default supports 3 different notification methods:
 - Emails (this requires setting up a mail system on the Opsview server)
 - SMS (requires either an SMS gateway or an attached GSM modem)
 - RSS
See: http://docs.opsview.org/doku.php?id=opsview-community:notificationmethods for more information.
If you want to overcome the limitations of slaves, you can configure notifications to be sent from the master.
A "recovery" state is when a host or service transitions from a hard failure back to OK.
Flapping
If a host or service changes state too frequently
Disables notifications temporarily
The flapping value is calculated from the number of state changes that have occurred in the last 21 states for a service.
There is a high flapping threshold (when something goes into a flapping state) and a low flapping threshold (when something comes out of a flapping state). We use the default values of high: 30% and low: 20%.
You can disable flap detection for each service check. Flap detection is enabled by default.
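A minimal sketch of the flapping calculation with the 30%/20% thresholds above. It is simplified: it counts every state change equally, whereas Nagios weights recent changes more heavily, so real percentages differ slightly. States are integers (0 = OK, 2 = CRITICAL, and so on) and the function names are hypothetical.

```python
WINDOW = 21            # number of most recent states considered
HIGH_THRESHOLD = 30.0  # percent: at or above this, start flapping
LOW_THRESHOLD = 20.0   # percent: at or below this, stop flapping


def percent_state_change(states):
    """Unweighted percentage of state changes across the last WINDOW states."""
    recent = states[-WINDOW:]
    if len(recent) < 2:
        return 0.0
    changes = sum(1 for prev, cur in zip(recent, recent[1:]) if prev != cur)
    # 21 states give 20 possible transitions
    return 100.0 * changes / (len(recent) - 1)


def update_flapping(is_flapping, states):
    """Apply the high/low thresholds; between them, keep the current flag."""
    pct = percent_state_change(states)
    if pct >= HIGH_THRESHOLD:
        return True
    if pct <= LOW_THRESHOLD:
        return False
    return is_flapping
```

The gap between the two thresholds is deliberate: a service at 25% stays in whichever state it was already in, so it does not bounce in and out of flapping.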
Parent/child relationships
Host can have a parent to denote a dependency
You can assign multiple parents for a host
Defines a network topology, acting as a dependency for determining network reachability
If a host is marked as down, then its parents are checked to determine the network reachability.
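A minimal sketch of how the parent topology separates DOWN from UNREACHABLE. It follows the standard Nagios rule that a failing host is UNREACHABLE when none of its parents is UP. The host names and the dictionary representation are hypothetical, and the sketch assumes the topology has no cycles.

```python
def host_status(host, failing, parents):
    """Classify host as 'UP', 'DOWN' or 'UNREACHABLE'.

    failing: set of hosts whose own host check is failing
    parents: dict mapping a host to its list of parent hosts
    """
    if host not in failing:
        return "UP"
    host_parents = parents.get(host, [])
    if not host_parents:
        # Nothing between the monitoring server and this host, so it is really down
        return "DOWN"
    # With multiple parents, one UP parent is enough for the host to be reachable
    if any(host_status(p, failing, parents) == "UP" for p in host_parents):
        return "DOWN"
    return "UNREACHABLE"
```

If a router fails, only the router is reported as DOWN; everything behind it is UNREACHABLE, which keeps notifications focused on the real cause.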
Performance graphing
A way of storing numeric data over time
Automatically created from plugin’s performance data
RRD based, averaged
5 minute intervals
Service checks must run at least once an hour, otherwise “gaps” appear
Gauge or counter data points
Map file for changing non-compliant plugins
RRD stands for Round Robin Database. It is a fixed-size database and very fast, but it loses resolution over time.
If your plugin returns valid performance data, the database will be automatically updated.
There is a default resolution for all RRDs (5 minutes is the smallest resolution):
 - 50 hours averaged over 5 mins
 - 14 days averaged over 30 mins
 - 2 months averaged over 2 hours
 - 2 years averaged over 1 day
This produces an RRD file which is 24K in size.
The service check must run at least once an hour, otherwise no data is recorded in the RRD for that period.
Gauge is automatic and the default. To set counter values, suffix the value with “c”.
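The archive sizes above can be checked with a little arithmetic. This sketch assumes "2 months" means 60 days, "2 years" means 730 days, and that each RRD holds one data source stored as 8-byte doubles. Under those assumptions, the stored values come to about 21K, which is roughly in line with the quoted 24K once the RRD header is added.

```python
STEP = 300  # seconds: the 5-minute base resolution

# (retention in seconds, averaging period in seconds) for each archive
ARCHIVES = [
    (50 * 3600, 5 * 60),     # 50 hours averaged over 5 minutes
    (14 * 86400, 30 * 60),   # 14 days averaged over 30 minutes
    (60 * 86400, 2 * 3600),  # "2 months", assumed to be 60 days, over 2 hours
    (730 * 86400, 86400),    # "2 years", assumed to be 730 days, over 1 day
]


def steps_per_row(archives, step=STEP):
    """How many 5-minute primary values are averaged into one stored row."""
    return [period // step for _, period in archives]


def rows_per_archive(archives):
    """Number of rows each round-robin archive keeps before wrapping."""
    return [retention // period for retention, period in archives]


def data_bytes(archives, bytes_per_value=8):
    """Space for the stored values, assuming one data source of 8-byte doubles."""
    return sum(rows_per_archive(archives)) * bytes_per_value
```

The steps_per_row values also explain fractional graph values: a daily row is the average of 288 five-minute samples.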
Checkpoint
What are the two types of checks?
What are the two state change types?
A process graph shows there are 3.2 processes. How come I get a non-integer value?
What happens with service defined with max check attempts of 1? When is this useful?
An active service is checked every 3 minutes, with a retry check interval of 1 minute and max check attempts of 4. How long will it be between the last OK and the time you are notified?
Opsview has 4 types of service checks: active, SNMP polling, passive and SNMP traps. The first two are active checks and the last two are passive.
You can get fractional values for data points that are only ever integers because of the constant averaging in the RRD.
Max check attempts of 1 is useful for passive checks, and also for services you want to be alerted on immediately.
Answer!
Timeline (minutes between results):
OK --3-- OK --3-- CRIT (soft) --1-- CRIT (soft) --1-- CRIT (soft) --1-- CRIT (hard)
6 minutes since last OK
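The answer generalises to a simple formula. This sketch assumes checks run exactly on schedule, with no check latency. The first failure is found at the next regular check, and the remaining attempts run at the retry interval. The function name is hypothetical.

```python
def minutes_until_hard(check_interval, retry_interval, max_check_attempts):
    """Minutes from the last OK result until the failure becomes HARD (and notifies).

    The first failure is found at the next regular check; the remaining
    (max_check_attempts - 1) attempts run at the retry interval.
    """
    return check_interval + (max_check_attempts - 1) * retry_interval
```

For the checkpoint question, minutes_until_hard(3, 1, 4) gives 3 + 3 × 1 = 6 minutes.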
Distributed architecture
Distributed slaves
Slaves provide monitoring from a different location
• Reduces bandwidth
• Spreads load
• Independent monitoring system
• Simplifies firewall configuration
Slaves can consist of clustered nodes
• Balances workload
• Redundancy and automatic failover
Distributed slaves, part 2
Runs as an independent monitoring server, reporting to master (using NSCA)
Stores MRTG and NMIS data locally
Can enable web interface on slave (standard Nagios CGIs)
Managed from master server
Each host is assigned to a slave
Results from slave are marked as “stale” if no results arrive
Slave-node services automatically created
One of the main features of Opsview is the handling of distributed systems. Opsview handles:
 * installing the Opsview software on the slave
 * upgrading Opsview software when the master is upgraded
 * synchronising /usr/local/nagios/libexec
 * generating configuration for master and slave
 * single point of control from the master web UI
The technology used to send results back to the master is Nagios Service Check Acceptor (NSCA).
You can enable the web interface on the slave server, so that you can see standard Nagios screens from the slave.
"Freshness checking", a Nagios concept of expecting results within a certain timeframe, has an additional 30 minute window before it starts to mark services into a different state.
A "Slave-node: {hostname}" service is automatically created to monitor the slave.
http://docs.opsview.org/doku.php?id=opsview-community:slavesetup
Distributed slaves, part 3
Limitation: Same OS and architecture as master
Limitation: Only the first line of plugin output on the slave, up to 511 bytes, is sent to the master
Limitation: Loses results if connection drops
Limitation: Acknowledgements and downtime not synchronised
Limitation: Time must be synchronised, but time zone can be different
Can have them clustered
• Failover
• Load balancing
Slaves need to be the same architecture as the master because the Opsview master sends all files in /usr/local/nagios/bin, including the nagios executable, to the slaves. This includes architecture-specific files.
Plugin output from a slave to the master is limited to the first 511 bytes or the first line, whichever comes first. This is due to the transport mechanism used.
Opsview uses NSCA to send results from the slaves to the master. If it fails to send the check results, it drops them, so results can be lost.
Time must be synchronised between master and slave. The automatically created check_opsview_slave_node checks that their clocks are within 5 seconds of each other.
The timezone does not have to match, but we recommend it is set the same as the master.
Software-level clustering for slaves is provided in Opsview. You can also use your own virtual machines to provide failover and redundancy if desired.
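To see what the master will actually display for a slave's plugin output, the truncation rule above can be approximated in a few lines. This is a sketch that assumes the output is UTF-8 and that the first line is taken before the 511-byte cut. The function name is hypothetical.

```python
NSCA_MAX_OUTPUT = 511  # bytes, the limit described above


def output_as_seen_by_master(plugin_output):
    """Approximate what the master receives: the first line, cut to 511 bytes."""
    first_line = plugin_output.split("\n", 1)[0]
    raw = first_line.encode("utf-8")[:NSCA_MAX_OUTPUT]
    # Drop any multi-byte character split by the cut rather than failing
    return raw.decode("utf-8", errors="ignore")
```

This is why plugins should put the important information, including performance data, at the start of the first line.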
Synchronisation of states
Principle: single point of control from Opsview master
Changes made through the user interface will be propagated to slaves (5 second delay)
• So acknowledgements and downtimes on the master are replicated on slaves
At Opsview reload, states for hosts and services on slave will be synchronised with the master if the last updated time is older on the slave
A limitation of Opsview prior to 3.5.2 was that if you moved a host or service that was acknowledged from one slave to another, the new slave would not know about the acknowledgement and thus notifications from the slave would be sent.
From Opsview 3.5.2 onwards, the state on the slave is synchronised with the master as long as the last_updated field on the slave is older than the master’s view. This occurs at Opsview reload time and at cluster takeover time.
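The synchronisation rule is a timestamp comparison. In this minimal sketch, the ServiceState fields and the synchronise function are hypothetical. They illustrate only the "older loses" rule described above, not Opsview's actual data structures.

```python
from dataclasses import dataclass


@dataclass
class ServiceState:
    acknowledged: bool
    in_downtime: bool
    last_updated: int  # Unix timestamp of the last state update


def synchronise(slave_view, master_view):
    """Return the state the slave should use after a reload or takeover."""
    # The master wins only when the slave's information is older
    if slave_view.last_updated < master_view.last_updated:
        return master_view
    return slave_view
```

A host moved to a new slave starts with no recent state there, so the master's acknowledgement wins and no spurious notification is sent.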
Distributed Architecture Diagram
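[Figure: the Opsview Master serves web clients over HTTP port 80 and connects over ssh (port 22) to Slave A and to the clustered Slave B1, Slave B2 and Slave B3, spread across Datacenter 1 and Datacenter 2. The slave cluster nodes also connect to each other over ssh. Results return to the master via nsca. Slaves monitor hosts using NRPE, SNMP or check_by_ssh, and can optionally serve their own web clients over HTTP port 80.]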
All communication between master and slave is over port 22 (SSH). This is usually initiated from the master to the slave. It is possible to do a “reverse SSH”, where the slave initiates an SSH tunnel from the slave to the master, which the master then uses to connect to the slave.
Setting up a slave requires exchanging SSH keys.
Cluster nodes require SSH key exchange between each other so they can share their state information.
You can have a different number of slave cluster nodes for each slave system.
Clustered slaves
Can have an arbitrary number of clustered nodes, usually 2 or 3
At reload time, the hosts for a slave are split across cluster nodes
On each node, a service (“Cluster-node: {nodename}”) is automatically generated to monitor every other node
An event handler is setup to take over on failure
Requires an ssh connection between the slave cluster nodes
For more information about slave clusters, see: http://docs.opsview.org/doku.php?id=opsview-community:slaveclusters
[Figure: the Master runs Slave-node: A, Slave-node: B and Slave-node: C checks; the slave as a whole monitors hosts 1, 2, 3, 4, 5, 6]
Node A: monitors hosts 1, 2; Cluster-node: B (takeover host 3); Cluster-node: C (takeover host 5)
Node B: monitors hosts 3, 4; Cluster-node: A (takeover host 1); Cluster-node: C (takeover host 6)
Node C: monitors hosts 5, 6; Cluster-node: A (takeover host 2); Cluster-node: B (takeover host 4)
Example slave cluster system
The Opsview Master has 6 hosts which are being monitored by the slave system. There are 3 nodes in the slave.
At Opsview reload time, the 6 hosts are split across all the nodes (based on an algorithm called Set::Cluster on CPAN - http://search.cpan.org/dist/Set-Cluster/ ).
The “Cluster-node” services are automatically generated to look at every other node in that slave system. This requires SSH.
Each cluster node service has a list of hosts it will take over if that specific node fails - this is calculated at reload time (and not dynamically).
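The split and takeover lists can be illustrated with a simple stand-in for Set::Cluster. This sketch deliberately uses plain contiguous chunks plus a round-robin of each node's hosts over the surviving nodes. It is not the Set::Cluster algorithm, though for the 6-host, 3-node example it reproduces the assignments shown above. The function names are hypothetical.

```python
def split_hosts(hosts, nodes):
    """Split hosts into contiguous, near-equal chunks (a stand-in, not Set::Cluster)."""
    base, extra = divmod(len(hosts), len(nodes))
    plan, start = {}, 0
    for i, node in enumerate(nodes):
        size = base + (1 if i < extra else 0)
        plan[node] = hosts[start:start + size]
        start += size
    return plan


def takeover_plan(plan):
    """Map (watching node, failed node) to the hosts the watcher takes over."""
    takeovers = {}
    for failed, failed_hosts in plan.items():
        survivors = [node for node in plan if node != failed]
        # Share the failed node's hosts round-robin across the surviving nodes
        for i, host in enumerate(failed_hosts):
            watcher = survivors[i % len(survivors)]
            takeovers.setdefault((watcher, failed), []).append(host)
    return takeovers
```

Because the plan is computed once at reload, a second node failure finds no precomputed takeover for the hosts of the first failed node.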
Clustered slaves, part 2
MRTG data stored locally: the first node in the cluster will poll devices
NMIS data stored locally: will rsync with other cluster nodes
Every 15 minutes, the status information for each node is sent to every other node (for synchronisation at takeover)
Limitation: Single node failure only
Limitation: Slave clusters should be in the same network segment
The NMIS data is rsync’d between cluster nodes every hour (based on the nagios user’s crontab)
Checkpoint
Why would you use slaves?
How many slaves can you run?
If there are 2 cluster nodes in a slave, how many Cluster-node checks are automatically created?
What if there are 4 cluster nodes?
What would happen if 2 nodes in a cluster failed?
There is no software limit to the number of slaves, but there is a cost as reload times increase.
For 2 cluster nodes, 2 checks are automatically set up, each monitoring the other.
For 3 cluster nodes, 6 checks would be set up.
For 4, 12 checks are set up.
The formula is: N x (N-1), where N is the number of nodes in a slave.
If two nodes failed, some of the hosts/services would be marked as stale.
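The N x (N-1) formula comes from each node watching every other node. A minimal sketch that enumerates the generated checks as ordered pairs follows. The function name is hypothetical, and the "Cluster-node: {nodename}" naming follows the slides.

```python
from itertools import permutations


def cluster_node_checks(nodes):
    """One 'Cluster-node: {target}' check on each node for every other node."""
    # Ordered pairs (watcher, target) with watcher != target: N x (N-1) of them
    return [(watcher, f"Cluster-node: {target}")
            for watcher, target in permutations(nodes, 2)]
```

For example, cluster_node_checks(["A", "B"]) gives [("A", "Cluster-node: B"), ("B", "Cluster-node: A")].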
Distributed slaves setup
Setup
Set up users and groups on slaves
Prerequisite software needs to be installed. Use opsview-slave package
Doesn’t install actual software - will be sent from master
Upgrades will update slaves as part of the upgrade process
Can enable a web interface on the slave
The procedure for setting up a slave web interface is documented at http://docs.opsview.org/doku.php?id=opsview-community:slavesetup#slave_web_interface
Operation
Slave runs its active checks
Every result is written to /usr/local/nagios/var:
• cache_host.log
• cache_service.log
Every 5 seconds, /usr/local/nagios/bin/process-cache-data is called to send host and service results to the master
Output from these calls is saved to cache.log and the return code is saved to /usr/local/nagios/var/ocsp.status
The Opsview master will show the service as passive, though it is actively run on slave
process-cache-data uses send_nsca to transport results back to the master. This has a limit of 511 characters in the output.
Troubleshooting
Check ocsp.status for last return code on slave
Check cache.log for errors
Check:
• echo "" | /usr/local/nagios/bin/send_nsca -H 127.0.0.1 -c /usr/local/nagios/etc/send_nsca.cfg
• Expected output: 0 data packet(s) sent to host successfully.
Check ssh on master
Check netstat -an | grep 5667 on master
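As an alternative to grepping netstat, you can try connecting to the NSCA port. This minimal Python sketch assumes NSCA on the master listens on port 5667, as in the netstat check, and that it is run on the master itself. A successful connection shows that something is accepting connections on that port, not that it is NSCA. The function name is hypothetical.

```python
import socket

NSCA_PORT = 5667  # the port checked with netstat above


def port_is_listening(host="127.0.0.1", port=NSCA_PORT, timeout=3.0):
    """Return True if a TCP connection to host:port can be opened."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Refused, timed out or unreachable
        return False
```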
Modules
We use the term Modules for functionality that is “loosely coupled” to Opsview but we still provide integration with it.
Opsview Core comes with:
 * Nagvis
 * MRTG
 * NMIS
Nagvis
Nagvis provides a visual representation of the status of various objects
Maps are the grouping of objects together with a background image
Automap is a replacement network map view
Technology:
• PHP5
Integration:
• Apache configuration
• Authentication
• Host group hierarchy when choosing host groups
Because Nagvis is PHP5 based, Opsview delegates the PHP5 page rendering to Apache, by disabling proxying through to the Opsview Web application.
When configuring Apache, you can use the auth ticket method for authenticating. This means that users of Opsview can access Nagvis seamlessly using this authentication ticket. If a user tries to access Nagvis without this ticket, they will be redirected back to the Opsview login screen.
For more information about Nagvis: http://docs.opsview.org/doku.php?id=opsview-community:nagvis
Nagvis: Using
Initial page at /nagvis
Access to maps
Adding a new map
Adding a background
Adding a new state object (host groups can be based on the hierarchy)
Limitation: No fine-grained access controls within Nagvis
Limitation: Times displayed are in UTC
Beware of Nagvis access controls! Maps can be assigned to EVERYONE for view and EVERYONE for edit. This means any authenticated user could edit your maps. Also, since the maps can be edited, this means it is possible to get a drop down list of all the hosts and services on your system! You can overcome this by making sure you only have named users for edit.
MRTG and NMIS
Both used for interface statistics
Use SNMP to collect information from hosts
Uses RRD files to store its data
More information about MRTG and NMIS in the OV204 course
Custom Plugins
Plugin specifications
A plugin must provide:
• A return code on completion
• Output to stdout (preferably 1 line, less than 511 bytes)
A plugin should provide:
• Help output when run with -h (written to stdout)
A plugin can provide:
• Performance data. Everything after the pipe symbol (“|”) is considered performance data (preferably on that 1st line)
Full plugin guidelines:
• http://nagiosplug.sourceforge.net/developer-guidelines.html
Performance data format
Performance data will be automatically graphed if it is of the correct format
label=value[uom][;warn][;critical]
Can have multiple sections of these - use space to separate
Can change the order and the number of the sections and the insert routine will update the data appropriately
Be aware of averaging!
The warning and critical levels are optional.
The full performance format includes maximum and minimum values, but these are ignored in Opsview.
If the label changes, then it will be considered a new performance plot.
ODW will save the raw value. However, threshold information is not retained.
For counter values, use performance data like: inputbytes=119c
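To check what a plugin's performance data will produce, the format above can be parsed with a regular expression. This minimal sketch handles label=value[uom][;warn][;critical] items separated by spaces, and treats a “c” unit as a counter. It assumes labels contain no spaces: the full guidelines allow quoted labels with spaces, which this sketch does not handle. The function name is hypothetical.

```python
import re

# label=value[uom][;warn][;critical]; any trailing ;min;max fields are ignored
PERF_RE = re.compile(
    r"^(?P<label>[^=]+)=(?P<value>-?[0-9.]+)(?P<uom>[a-zA-Z%]*)"
    r"(?:;(?P<warn>[^;]*))?(?:;(?P<crit>[^;]*))?"
)


def parse_perfdata(output):
    """Return a list of data points from everything after the first pipe."""
    if "|" not in output:
        return []
    points = []
    for item in output.split("|", 1)[1].split():
        match = PERF_RE.match(item)
        if not match:
            continue  # skip items that are not in the expected format
        points.append({
            "label": match.group("label"),
            "value": float(match.group("value")),
            "uom": match.group("uom"),
            "warn": match.group("warn") or None,
            "crit": match.group("crit") or None,
            "counter": match.group("uom") == "c",
        })
    return points
```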
© Opsera 2010 Commercial in Confidence
Custom checks
Write it
Run it on the command line
Have a -h option to print out help text
Drop it into /usr/local/nagios/libexec
Will be automatically available in Opsview servicecheck page
Recommendation: Plugins return a short amount of data on 1 line
You can create a plugin in any language as long as it is executable by the nagios user.
We will create an example plugin that just returns OK.
You may not need to write your own plugin if an existing one can handle the checking for you. For instance, instead of a dedicated virus update plugin, just use check_file_age to test that the virus definition file is up to date.
You could also use a "proxy" method for getting results. For instance, query a database to get results for a test (say, the number of sessions in your web application, or the balance of a test account).
FOSDEM example plugin: http://nagiosplugins.org/fosdem
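Here is a sketch of such an example plugin in Python. It returns OK with one short line of output and a performance data item, prints help text for -h, and returns UNKNOWN for unexpected arguments. The plugin name check_example and its output text are hypothetical. The standard return codes (0 OK, 1 WARNING, 2 CRITICAL, 3 UNKNOWN) come from the plugin guidelines.

```python
import sys

OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3  # standard plugin return codes

HELP = """check_example - a hypothetical plugin that always returns OK

Usage: check_example [-h]
  -h    print this help text to stdout and exit
"""


def run(argv):
    """Return (exit_code, text_for_stdout) for the given arguments."""
    if "-h" in argv:
        return OK, HELP
    if argv:
        return UNKNOWN, "UNKNOWN - unexpected arguments: " + " ".join(argv)
    # One short line; performance data goes after the pipe symbol
    return OK, "OK - everything is fine | checks=1"


def main(argv):
    """Write the plugin output to stdout and return the exit code."""
    code, text = run(argv)
    sys.stdout.write(text if text.endswith("\n") else text + "\n")
    return code
```

To use it as a plugin, save it as an executable file that starts with a #!/usr/bin/env python3 line and ends with sys.exit(main(sys.argv[1:])). Then drop it into /usr/local/nagios/libexec and run it on the command line first.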
Creating plugins easily
Use Nagios::Plugin
Distributed with Opsview Agents, installed in /usr/local/nagios/perl/lib
Start plugin with:
    use FindBin;
    use lib "$FindBin::Bin/../perl/lib";
    use Nagios::Plugin;
More documentation:
• http://search.cpan.org/dist/Nagios-Plugin/lib/Nagios/Plugin.pm
ODW
History
Nagios is very good at “what is happening now”
Nagios’ reporting uses nagios.log file to get status
Lots of logic to work out status changes held in report code
Need to move to a database-driven approach
NDOutils and Runtime
NDOutils (Nagios Data Objects) is a project to put Nagios status data into a MySQL database
Some limitations in how NDOutils saves its data:
• No configuration information over time
• Some tables are too big
Nagios has started to move data into a database structure - this is the project called NDOutils (Nagios Data Objects).
Opsview uses NDOutils, and Opsera have been very active in promoting and updating the software, but recognise there are some limitations in how the data is represented in NDOutils. For instance, the rows recording each result take about 1K per result in NDOutils, whereas in ODW they take about 250 bytes.
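The per-result figures make a large difference at scale. This sketch works through the arithmetic for a hypothetical system of 1,000 services checked every 5 minutes, taking 1K as 1,000 bytes. The system size is an illustrative assumption, not a measurement.

```python
def yearly_storage_bytes(services, interval_minutes, bytes_per_result, days=365):
    """Bytes needed to keep every result for `days` days."""
    results_per_day = services * (24 * 60 // interval_minutes)
    return results_per_day * days * bytes_per_result
```

yearly_storage_bytes(1000, 5, 1000) gives 105,120,000,000 bytes (about 105 GB) at the NDOutils rate. yearly_storage_bytes(1000, 5, 250) gives 26,280,000,000 bytes (about 26 GB) at the ODW rate.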
Opsview Data Warehouse
ODW designed to be a data warehouse
Denormalised data, for easier searching
Long term storage tables
Raw results - performance data is exactly as received
Summary tables for quick queries
Schema diagram and further documentation
• http://docs.opsview.org/doku.php?id=opsview-community:odw
ODW: Operation
Need to opt in to the import process from System Preference
Cron job called import_runtime runs at 4 minutes past the hour to collect data and summarise
Plugin: check_odw_hostgroup_availability
If you change the crontab entry, be aware that an upgrade will revert the crontab.
Be aware that the tables used by ODW are MyISAM tables, which use table-level locking. If you have a long-running query, it may lock up the import process, but the import process will continue when the tables are unlocked again.
If you do a lot of reporting, you may want to consider setting up replication of the ODW database onto a different MySQL server, where you can run the reports without affecting the main Opsview instance.
Architecture
Only the import_runtime script is used to add data into ODW.
import_runtime is also aware of multiple Opsview masters, so it is possible to have a shared ODW across multiple Opsview masters. This allows comparisons between hosts and services on different systems.
The main limitation is that host names must be unique across all Opsview masters: http://docs.opsview.org/doku.php?id=opsview-community:sharedodw
Reports
Opsview comes with some PDF reports
Being phased out in favour of the Opsview Enterprise Reports Module
• Based on Jasper Reports technology
• Available to Enterprise subscribers
• Documentation at http://docs.opsview.org/doku.php?id=reportingmodule
• Some predefined reports, such as Weekly Availability by Keyword and Weekly Performance by Keyword
Uses ODW to gather all data metrics for availability and performance
The PDF reports are retained in Opsview Core, but are being phased out due to complexity of code and poor functionality.
The new reports infrastructure uses Jasper Reports for its base technology, and includes several report types
API
Reasons for API
Automated configuration changes
Currently supports:
• creating / cloning / deleting hosts
• scheduling downtime
• reloads
For further information about the API, see: http://docs.opsview.org/doku.php?id=opsview-community:api
<opsview>
<authentication><username>admin</username><password>initial</password></authentication>
<host action="create">
<name>host</name>
<ip>10.10.10.10</ip>
<check_command><name>ping</name></check_command>
<hostgroup><id>2</id></hostgroup>
<icon><name>LOGO - Opsview</name></icon>
</host>
</opsview>
This is an example XML file to push to Opsview.
Example invocations
curl -H 'Content-Type: text/xml' -d @file.xml http://opsviewserver/api
opsview_api -f file.xml
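The curl call can also be made from a script. This minimal Python sketch builds the same XML as the example file with ElementTree and POSTs it with Content-Type: text/xml. The server name opsviewserver and the admin/initial credentials are the placeholders from the example, and the function names are hypothetical. The response body is returned as-is, since its format is not described here.

```python
import urllib.request
import xml.etree.ElementTree as ET

API_URL = "http://opsviewserver/api"  # placeholder host, as in the curl example


def create_host_xml(username, password, name, ip, hostgroup_id,
                    check_command="ping", icon="LOGO - Opsview"):
    """Build the <opsview> document for creating one host."""
    root = ET.Element("opsview")
    auth = ET.SubElement(root, "authentication")
    ET.SubElement(auth, "username").text = username
    ET.SubElement(auth, "password").text = password
    host = ET.SubElement(root, "host", action="create")
    ET.SubElement(host, "name").text = name
    ET.SubElement(host, "ip").text = ip
    ET.SubElement(ET.SubElement(host, "check_command"), "name").text = check_command
    ET.SubElement(ET.SubElement(host, "hostgroup"), "id").text = str(hostgroup_id)
    ET.SubElement(ET.SubElement(host, "icon"), "name").text = icon
    return ET.tostring(root, encoding="unicode")


def post_to_api(xml_text, url=API_URL):
    """POST the XML document, like curl -H 'Content-Type: text/xml' -d @file.xml."""
    request = urllib.request.Request(
        url, data=xml_text.encode("utf-8"),
        headers={"Content-Type": "text/xml"}, method="POST")
    with urllib.request.urlopen(request) as response:
        return response.read().decode("utf-8")
```

Building the XML with a library rather than string concatenation ensures that characters such as & or < in host names are escaped correctly.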
Configuration files
Configuration files
/usr/local/nagios/etc
• opsview.conf and opsview.defaults
• map.local and map
/usr/local/opsview-web
• opsview_web_local.yml and opsview_web.yml
/usr/local/nagios/share/stylesheets
• custom.css
/usr/local/nagios/nagvis/etc
• nagvis.ini.php
/usr/local/nagios/nmis/conf
• nmis.conf
opsview.defaults is a shipped file and is subject to change. If there are variables that you need to override, copy the variable into opsview.conf and amend it there. opsview.conf will not usually be changed during an upgrade.
The same applies to the map and map.local files. However, you shouldn’t need to use the map file, as correctly formatted performance data will be automatically graphed.
The Nagios configuration files are regenerated every time Opsview is reloaded.
Backup and recovery
Backup scope
A cronjob is run around 3am which invokes a backup
The backups save configuration data and some key Runtime data
The scope is to be able to recover a system quickly
Key variables in opsview.conf:
• $backup_dir - which directory to store the backups in
• $backup_retention_days - number of days’ worth of backups to keep
You will have to design your own backup strategy for long term archival
Long term archival is not handled as part of the backup script because of the amount of data that could be in ODW and Runtime. For more information about backups: http://docs.opsview.org/doku.php?id=opsview-community:backups
Data to consider backing up
ODW
Runtime
MRTG
NMIS
Slaves
• MRTG
• NMIS
• Nagios logs
Restore process
Assumes you are restoring to the same server as the original Opsview server
Stop Opsview and Opsview Web
Restore files
Restore the Opsview, Runtime (subset) and Reports databases
Restart Opsview and Opsview Web
Reload Opsview
For the latest restore process, see: http://docs.opsview.org/doku.php?id=opsview-community:backups
If you are looking to migrate Opsview onto different hardware, see: http://docs.opsview.org/doku.php?id=opsview-community:migratinghardware
Troubleshooting
Reload process
Creates Nagios configuration for master
Runs MRTG configuration generation in background
For each slave:
• Creates Nagios configuration
• Transfers files
Validates all configuration
Reloads Nagios simultaneously
Opsview uses Parallel::Forker for the workflow management. There is a limit set to only run 4 concurrent jobs at once - you can increase this if you have more CPUs on your master server.
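The effect of the concurrency limit can be sketched in Python. Threads stand in for the processes that Parallel::Forker forks, and the jobs are hypothetical callables, for example generating and transferring the configuration for one slave. This is an illustration of a bounded worker pool, not Opsview's reload code. The final validation and Nagios reload would run only after all jobs have returned.

```python
from concurrent.futures import ThreadPoolExecutor

MAX_CONCURRENT_JOBS = 4  # the default limit described above; raise it with more CPUs


def run_reload_jobs(jobs, max_jobs=MAX_CONCURRENT_JOBS):
    """Run independent reload jobs (callables), at most max_jobs at a time.

    Results are returned in the order the jobs were given; an exception in
    any job is re-raised when its result is collected.
    """
    with ThreadPoolExecutor(max_workers=max_jobs) as pool:
        futures = [pool.submit(job) for job in jobs]
        return [future.result() for future in futures]
```

With 10 slaves and the default limit, at most 4 slave jobs run at any moment and the rest wait for a free slot.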
Reload troubleshooting
/admin/reload - will show common errors, including Nagios validation errors
/usr/local/nagios/var/rw/config_output
• Last full debug output of all configuration generation
/usr/local/nagios/var/log/create_and_send_configs.debug
• Last reload process workflow
File locations
/usr/local/nagios - Nagios and main Opsview core
/usr/local/opsview-web - Opsview web application
/var/log/opsview - logs for master server
• opsviewd.log - For main opsviewd daemon and other master jobs
• opsview-web.log - For web application
/usr/local/nagios/var - logs and status files for Nagios (on slaves too)
• var/log/opsview-slave.log - For Opsview-specific slave jobs
Performance RRD data
• /usr/local/nagios/var/rrd/{hostname}/{servicename}/{metric}/value.rrd
• thresholds.rrd
Performance graphs are stored in /usr/local/nagios/var/rrd/{hostname}/{servicename}/{metric}/value.rrd.
Thresholds are stored in thresholds.rrd, with information about the warning and critical levels.
Opsview uses Log4perl for logging, and you can set location and rotation information in its configuration. However, the Log4perl configuration file is overwritten as part of an upgrade.
Opsview not running
Restarting Opsview: /etc/init.d/opsview restart
Restarting Opsview Web: /etc/init.d/opsview-web restart
Do these either as root or after switching to the nagios user
• su - nagios
You must use su - nagios. The dash means to pick up some environment variables required by Nagios.
Errors relating to browser
The Opsview Web Server is not running
• If an upgrade is in progress
Error retrieving update from Opsview. Will continue to retry
• AJAX update problem. Will continue to repoll. Problems with web service?
Errors relating to Opsview
If a service fails which is for the Opsview master or slaves, escalate to Opsera
Opsera provide commercial support for Opsview. See http://opsera.com/jsp/opsera_product/Opsview%20product.jsp for details
Summary
Understand the advanced concepts of how Opsview monitors
Understand how a distributed Opsview system works
Understand how to add new custom plugins
Understand what backups are required for Opsview