The Role of OpManager in
Event and Fault Management
Team OpManager
www.opmanager.com
Agenda
• Brushing up on fault management
  – Reactive vs. proactive
• The four processes and OpManager’s role
  – Detect
  – Isolate
  – Inform
  – Resolve
Reactive Fault Management
• Firefighting in nature
• Troubleshooting starts after business is impacted
• Higher resolution time
• Least preferred by both IT admins & end users
[Illustration: User to IT Admin: "It is not working!"]
Proactive Fault Management
• Alerts on an impending fault
• Resolution time reduced drastically
• Reduced operation cost
[Illustration: IT Admin to User: "NMS has reported a problem & I’m working on it"]
What is Fault and Event Management?
• Detect events
• Make sense of them
• Present only actionable events
*An event can be informational, a cleared event, a warning, trouble, or even a critical problem
The four processes
The four processes explained
• Detect: Active monitoring, Passive monitoring
• Isolate: De-duplication, Correlation, Automation
• Inform: Visual representation, Ticketing, Alerting
• Resolve: Automatic correction, Troubleshooting tools
Detect – Capture events
• Active Polling/ Probing/ Query monitoring
Active Monitoring: e.g. SNMP Polling
Active polling can be done through SNMP, WMI, Telnet, SSH, custom scripts, remote queries & more…
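To make the idea of active polling concrete, here is a minimal sketch in which the monitoring side initiates every check. It uses a plain TCP connect as a stand-in for an SNMP or WMI query; the function names and the inventory layout are illustrative assumptions, not OpManager's implementation.

```python
import socket
from typing import Callable


def probe_tcp(host: str, port: int, timeout: float = 1.0) -> bool:
    """Actively probe a device: True if a TCP connection can be opened."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def poll_once(
    targets: dict[str, tuple[str, int]],
    probe: Callable[[str, int], bool] = probe_tcp,
) -> dict[str, str]:
    """Poll every target once and report 'up' or 'down' per device name."""
    return {
        name: "up" if probe(host, port) else "down"
        for name, (host, port) in targets.items()
    }
```

A scheduler would call `poll_once` at a fixed interval and turn each "down" result into an event. The `probe` parameter is there so that a different check (or a fake one in tests) can be swapped in.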
Detect – Capture events
• Passive or Event-based Monitoring
Passive Monitoring: e.g. SNMP Trap
Sources for passive monitoring include SNMP traps, Syslog, NetFlow, packet forwarding & more…
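In passive monitoring the device sends the event and the NMS only has to receive and decode it. As a small illustration, the sketch below decodes the `<PRI>` header of a received syslog message, where the priority value is facility × 8 + severity. The listener itself (typically a UDP socket) is left out, and the function name and returned fields are assumptions made for this example.

```python
import re

# Syslog severity names, indexed by severity code 0..7.
SEVERITIES = ["emergency", "alert", "critical", "error",
              "warning", "notice", "informational", "debug"]

_PRI = re.compile(r"^<(\d{1,3})>(.*)$", re.DOTALL)


def parse_syslog(raw: bytes) -> dict:
    """Decode the <PRI> header of one received syslog datagram."""
    text = raw.decode("utf-8", errors="replace")
    match = _PRI.match(text)
    if match is None:
        raise ValueError("missing <PRI> header")
    pri = int(match.group(1))
    if pri > 191:  # highest valid value: facility 23, severity 7
        raise ValueError("priority out of range")
    facility, severity = divmod(pri, 8)
    return {
        "facility": facility,
        "severity": SEVERITIES[severity],
        "message": match.group(2).strip(),
    }
```

For instance, a message starting with `<34>` decodes to facility 4 and severity "critical".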
Isolate – Present actionable faults
• Helps identify the root cause of the problem quickly; reduces Mean-Time-To-Resolve (MTTR)
• Includes tasks to
  – Understand the event source
  – Filter out redundant or known events
  – Project only actionable faults
*The Network Management System’s fault management engine plays a vital role
Isolate – Present actionable faults
De-duplication
• Drops recurrent events from the display
• Builds them into an event history
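De-duplication as described above can be sketched roughly as follows: one visible alarm per device and event type, with every repeat kept only in a history. The class name, the dictionary keys, and the choice of (device, type) as the uniqueness key are assumptions for illustration.

```python
from collections import defaultdict


class AlarmTable:
    """Keeps one visible alarm per (device, event type) plus a full history."""

    def __init__(self) -> None:
        self.active: dict[tuple[str, str], dict] = {}
        self.history: dict[tuple[str, str], list[dict]] = defaultdict(list)

    def add(self, event: dict) -> bool:
        """Record an event; return True only if it creates a new alarm."""
        key = (event["device"], event["type"])
        self.history[key].append(event)  # every occurrence is kept as history
        if key in self.active:
            self.active[key]["count"] += 1  # repeat: update, do not display again
            return False
        self.active[key] = {**event, "count": 1}
        return True
```

Only events for which `add` returns `True` would be shown to the operator; the rest stay available in `history` for later analysis.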
Isolate – Present actionable faults
De-duplication
• OpManager Alarms view: shows unique alerts for every device and alarm type
• Detailed alarm history page with a list of alarm actions
Isolate – Present actionable faults
Correlation
• Relates previous events and interdependencies
• Projects only the root cause of the problem
Isolate – Present actionable faults
Correlation
• OpManager has automated and custom network maps that let you identify the root cause much more quickly.
• Lets you configure device dependencies to project only the root cause of the problem
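One way to picture dependency-based correlation: if a device is unreachable and something upstream of it is also down, only the upstream device is reported. The sketch below assumes each device has at most one configured parent; the function name and data layout are hypothetical and do not reflect how OpManager stores dependencies.

```python
def root_causes(down: set[str], parent: dict[str, str]) -> set[str]:
    """Return the down devices that have no down device upstream of them."""
    roots = set()
    for device in down:
        node = parent.get(device)
        seen = {device}  # guards against accidental dependency loops
        while node is not None and node not in seen:
            if node in down:
                break  # an upstream device is down: this one is a side effect
            seen.add(node)
            node = parent.get(node)
        else:
            roots.add(device)  # reached the top without finding a down ancestor
    return roots
```

With `parent = {"switch1": "router1", "server1": "switch1", "server2": "switch1"}` and all of `switch1`, `server1`, and `server2` down, only `switch1` is returned.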
Isolate – Present actionable faults
Automation
• Ignore incidental events
• Remove cleared faults
• Suppress known alarms (automated/manual suppression)
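The three automation rules above can be sketched as a single filter over a time-ordered event list. The severity labels ("info", "clear") and the per-device maintenance window used to model known alarms are assumptions for this example.

```python
from datetime import datetime


def actionable(
    events: list[dict],
    maintenance: dict[str, tuple[datetime, datetime]],
) -> list[dict]:
    """Reduce time-ordered events to the faults an operator should act on.

    Each event is a dict with 'device', 'type', 'severity' and 'time' keys.
    maintenance maps a device to a (start, end) window of expected alarms.
    """
    open_faults: dict[tuple[str, str], dict] = {}
    for event in events:
        key = (event["device"], event["type"])
        if event["severity"] == "info":
            continue  # ignore incidental events
        if event["severity"] == "clear":
            open_faults.pop(key, None)  # remove cleared faults
            continue
        window = maintenance.get(event["device"])
        if window and window[0] <= event["time"] <= window[1]:
            continue  # suppress known alarms
        open_faults[key] = event
    return list(open_faults.values())
```

A fault that is raised and later cleared never reaches the operator, and neither does anything raised on a device inside its maintenance window.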
Isolate – Present actionable faults
Automation
• Threshold configuration – Consecutive Times and Rearm Value
• Suppress known alarms – Downtime Scheduler
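A threshold with "consecutive times" and a "rearm value" could behave roughly like the sketch below. It assumes the alarm is raised only after the threshold is exceeded on a number of consecutive polls, and is cleared once the value falls to or below the rearm value. The class and its return values are illustrative, and the exact semantics in OpManager may differ.

```python
from typing import Optional


class ThresholdMonitor:
    """Raises after N consecutive violations; clears at or below the rearm value."""

    def __init__(self, threshold: float, rearm: float, consecutive: int) -> None:
        self.threshold = threshold
        self.rearm = rearm
        self.consecutive = consecutive
        self.violations = 0
        self.alarmed = False

    def update(self, value: float) -> Optional[str]:
        """Feed one polled value; return 'raise', 'clear', or None."""
        if not self.alarmed:
            if value > self.threshold:
                self.violations += 1
                if self.violations >= self.consecutive:
                    self.alarmed = True
                    return "raise"
            else:
                self.violations = 0  # the streak is broken
            return None
        if value <= self.rearm:
            self.alarmed = False
            self.violations = 0
            return "clear"
        return None
```

Keeping the rearm value below the threshold avoids an alarm that flaps on and off while the metric hovers around the threshold.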
Isolate – Present actionable faults
Automation
• Suppress known alarms – Manual suppression for devices and interfaces
Inform – Notify admins
• Visual representation of faults to assist NOC admins
• Ticketing and alerting remote admins
Inform – Notify admins
• Alarm color coding
• Web alarms and dashboards
• Dynamic network or custom maps showing the network and device status
Inform – Notify admins
Trouble ticketing
• Through email for other helpdesk products
• Automatic ticket creation with ManageEngine ServiceDesk Plus, through integration
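Email-based ticketing usually comes down to sending a structured message to the helpdesk's mailbox. The sketch below only builds such a message with Python's standard library; the alarm fields, subject layout, and addresses are assumptions, and the format a particular helpdesk product expects may differ.

```python
from email.message import EmailMessage


def ticket_email(alarm: dict, helpdesk: str, sender: str) -> EmailMessage:
    """Build an email that a helpdesk mailbox could turn into a ticket."""
    msg = EmailMessage()
    msg["From"] = sender
    msg["To"] = helpdesk
    msg["Subject"] = (
        f"[{alarm['severity'].upper()}] {alarm['device']}: {alarm['message']}"
    )
    msg.set_content(
        f"Device: {alarm['device']}\n"
        f"Severity: {alarm['severity']}\n"
        f"Details: {alarm['message']}\n"
    )
    return msg
```

The returned message can then be handed to `smtplib.SMTP(...).send_message(msg)` against whatever mail server is in use.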
Inform – Notify admins
• Alert remote admins – Email, SMS, RSS feeds, Twitter Alerts, iPhone/ Smartphone GUI
[Screenshots: RSS, Twitter DM, Smart Phone UI, SMS]
Resolve – Aid faster resolution
• Needs proprietary knowledge of your IT infrastructure, policies & agreed SLAs.
• NMS should help
  – Execute such automation logic (and communicate execution faults, if any)
  – Back manual troubleshooting with a set of IT tools
Resolve – Aid faster resolution
Automated Fault Resolution
• Run a command or run a program on a remote machine, with options to append error messages
• Restart a Windows service or the server if the service is found to be down
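As a rough illustration of "run a command and append error messages", the sketch below runs a corrective command and returns a note that could be attached to the alarm. It runs the command locally; executing it on a remote machine (for example over SSH) is outside the sketch. The function name and the result fields are assumptions.

```python
import subprocess
import sys


def run_remediation(command: list[str], timeout: float = 30.0) -> dict:
    """Run a corrective command and capture a note for the alarm."""
    try:
        done = subprocess.run(
            command, capture_output=True, text=True, timeout=timeout
        )
    except (OSError, subprocess.TimeoutExpired) as exc:
        # The command could not be started or did not finish in time.
        return {"ok": False, "note": f"Execution failed: {exc}"}
    note = done.stdout.strip()
    if done.returncode != 0:
        note = f"Exit code {done.returncode}: {done.stderr.strip()}"
    return {"ok": done.returncode == 0, "note": note}


# Stand-in command for the example; a real rule might restart a service instead.
result = run_remediation([sys.executable, "-c", "print('service restarted')"])
print(result)
```

Returning the error text instead of raising lets the NMS communicate execution faults alongside the original alarm.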
Resolve – Aid faster resolution
Server Troubleshooting Tools
• Remote Process Diagnostics
• Device tools: Ping, Trace route, and tools to remotely connect to the server – Web console, Telnet/SSH, MS terminal server
Resolve – Aid faster resolution
Network Troubleshooting Tools
• Switch Port Mapper
• Network Traffic Analysis
• Switch port disabling option
Resolve – Aid faster resolution
Network Troubleshooting Tools
• WAN link hop-wise latency count graph
• Network Change and Configuration Management (NCCM)
Resolve – Aid faster resolution
Other Troubleshooting Tools
• Real-time performance graphs
• MIB Browser and Syslog viewer
Tons of features that we’ve not talked about
ManageEngine OpManager is comprehensive, easy-to-use network monitoring & management software.
For a free trial, visit www.opmanager.com
For product demos, mail us at [email protected]
Call +1 888 720 9500
• Automatic network discovery
• Device and interface monitoring templates
• Network Maps/Custom Maps
• WAN RTT and VoIP monitoring
• Network Traffic Analysis
• Network Change and Configuration Management
• Server monitoring (Windows/Linux/UNIX-flavor OSes)
• ESX VMware monitoring
• MS Exchange, SQL and Active Directory monitoring
• Service monitoring, website monitoring, process and file/folder monitoring
• Processing SNMP TRAPs, Syslogs & Event Log
• Monitors any pingable and SNMP enabled device
About ManageEngine
ManageEngine is the only IT management vendor focused on bringing a complete IT management portfolio to the mid-sized enterprise.
Trusted by over 45,000 customers, including 3 out of every 5 Fortune 500 companies. More at www.manageengine.com
Summary
• Fault and Event Management: proactive and reactive approaches
• Four processes of Fault Management:
  – Detect: Active and passive monitoring
  – Isolate: De-duplication, correlation, automation
  – Inform: Visual fault representation, ticketing and alerting
  – Resolve: Automated scripts and tools to aid manual troubleshooting
• OpManager’s role in each process of Fault and Event Management
• About ManageEngine and its various IT management products