ITIL v3 – Event Management Best Practices
Agenda
ITIL v2, ITIL v3 and Event Management
Image from http://iig.umit.at/
ITIL v2 ITIL v3
Event Management is new in ITIL v3, but it can also be implemented with ITIL v2
Service Operation & Event Management
Types of Events
Informational events
Exception events
Incidents
The Service Operation module is responsible for the activities and processes required to deliver and manage services at agreed levels
Business services actually deliver value to the business only in this stage
Event Management plays a significant role in ensuring the operational health of IT services
Monitoring Event Management
Event Management Roles
Service Desk
IT Operations
Application Management
Event Management challenges
User Reported Incidents - Ensure that the right events are being generated by monitoring tools
Too many events – Implement strategies for Event correlation
Manual & time-consuming tasks - Identify routine manual tasks that can be automated
Are the right events being generated?
Why are 34% of Incidents being reported by Users?
Why is monitoring not generating these events?
% of Incidents Generated by Users
% of Incidents generated by Monitoring Tools
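The split between user-reported and tool-generated incidents can be computed from a ticket export. The following is a minimal sketch, assuming each incident record carries a hypothetical `source` field; the sample data simply reproduces the 34% figure above and assumes the remaining incidents came from monitoring tools.

```python
from collections import Counter


def incident_share_by_source(incidents):
    """Return the percentage of incidents per reporting source."""
    counts = Counter(incident["source"] for incident in incidents)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {source: 100.0 * n / total for source, n in counts.items()}


# Hypothetical sample: 34 user-reported and 66 tool-generated incidents.
sample = [{"source": "user"}] * 34 + [{"source": "monitoring"}] * 66
shares = incident_share_by_source(sample)
for source, percent in shares.items():
    print(f"{source}: {percent:.0f}%")
```

A high user-reported share suggests that monitoring is missing events it should be generating.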
Do we have enough monitoring?
Ensure that the right events are being generated
Identify CIs (configuration items) that are not monitored
Establish baselines & Reconfigure thresholds
Buy additional tools to add monitoring capability
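The first two steps above can be sketched in a few lines. This example assumes the CI list comes from a CMDB export and the monitored list from the monitoring tool; the function names are hypothetical. The threshold rule (mean plus three standard deviations) is one common way to derive a threshold from a baseline, chosen here for illustration only, not a rule stated in this deck.

```python
from statistics import mean, stdev


def unmonitored_cis(cmdb_cis, monitored_cis):
    """Return CIs present in the CMDB but absent from monitoring."""
    return sorted(set(cmdb_cis) - set(monitored_cis))


def baseline_threshold(samples, k=3.0):
    """Suggest an alert threshold of mean + k * sample standard deviation.

    Assumption: the samples were collected during normal operation.
    """
    if len(samples) < 2:
        raise ValueError("need at least two samples to compute a baseline")
    return mean(samples) + k * stdev(samples)


# Hypothetical inventory and CPU utilisation samples (percent).
gaps = unmonitored_cis(["web01", "db01", "sw01"], ["web01"])
threshold = baseline_threshold([40, 50, 60])
print("Not monitored:", gaps)
print("Suggested CPU threshold:", threshold)
```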
“Manageable Events” before “Event Management”!
Eliminate Duplicate events
Dependency correlation
Root Cause Analysis
Event Correlation techniques
Automation
Run Book Automation
Image from http://ideachampions.com
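Two of the techniques on this slide, duplicate elimination and dependency correlation, can be illustrated as follows. This is a minimal sketch under stated assumptions: events are dictionaries with hypothetical `ci`, `type` and `time` (seconds) fields, and `depends_on` is a hypothetical map from each CI to the CI it depends on. It is not how any particular monitoring product implements correlation.

```python
def deduplicate(events, window_seconds=300):
    """Keep one event per (ci, type) within a time window.

    The window starts at the last event that was kept for that key.
    """
    last_kept = {}
    kept = []
    for event in sorted(events, key=lambda e: e["time"]):
        key = (event["ci"], event["type"])
        previous = last_kept.get(key)
        if previous is None or event["time"] - previous > window_seconds:
            kept.append(event)
            last_kept[key] = event["time"]
    return kept


def suppress_dependent(events, depends_on):
    """Drop 'down' events for CIs whose upstream dependency is also down."""
    down = {e["ci"] for e in events if e["type"] == "down"}

    def has_down_ancestor(ci):
        seen = set()
        parent = depends_on.get(ci)
        while parent is not None and parent not in seen:
            if parent in down:
                return True
            seen.add(parent)
            parent = depends_on.get(parent)
        return False

    return [
        e for e in events
        if not (e["type"] == "down" and has_down_ancestor(e["ci"]))
    ]


# Hypothetical scenario: a switch fails and takes two servers with it.
raw = [
    {"ci": "web01", "type": "down", "time": 10},
    {"ci": "sw01", "type": "down", "time": 0},
    {"ci": "web01", "type": "down", "time": 20},
    {"ci": "db01", "type": "down", "time": 15},
]
unique = deduplicate(raw)
root = suppress_dependent(unique, {"web01": "sw01", "db01": "sw01"})
print([e["ci"] for e in root])
```

Four raw events shrink to a single actionable event on the switch, which is the likely root cause.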
Event Management Automation
Sample Scenario – Automate manual task
Weekly scheduled task
Check for free disk space on a server
If < 20GB, free up space by deleting temp files
If the delete fails, log a ticket to the Service Desk (SD)
If the delete succeeds, check free space again
If > 20GB, send a note to the server ops team
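The scenario above could be scripted roughly as follows. This is a sketch, not a finished run book: `log_ticket` and `notify_ops` are hypothetical callbacks standing in for the Service Desk and mail integrations, and the scenario does not say what to do if space is still low after a successful cleanup, so logging a ticket in that case is an assumption.

```python
import shutil
from pathlib import Path

THRESHOLD_BYTES = 20 * 1024**3  # 20 GB, per the scenario


def free_bytes(path):
    """Return free space in bytes on the volume containing path."""
    return shutil.disk_usage(path).free


def delete_temp_files(temp_dir):
    """Delete regular files directly inside temp_dir (no recursion)."""
    for entry in Path(temp_dir).iterdir():
        if entry.is_file():
            entry.unlink()


def weekly_disk_check(path, temp_dir, log_ticket, notify_ops,
                      get_free=free_bytes, cleanup=delete_temp_files):
    """Run the weekly check and return a short status string."""
    if get_free(path) >= THRESHOLD_BYTES:
        return "ok"  # enough space, nothing to do
    try:
        cleanup(temp_dir)
    except OSError as error:
        log_ticket(f"Temp file cleanup failed for {path}: {error}")
        return "ticket"
    # Confirm free space again; exactly 20 GB is treated as sufficient.
    if get_free(path) >= THRESHOLD_BYTES:
        notify_ops(f"Temp files deleted on {path}; free space restored")
        return "cleaned"
    # Assumption: the scenario does not cover this branch.
    log_ticket(f"Free space on {path} still below 20 GB after cleanup")
    return "ticket"
```

The `get_free` and `cleanup` parameters exist so the decision logic can be exercised without touching a real disk; a scheduler such as cron would call `weekly_disk_check` once a week.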
Automation – Reducing Incidents
Sample Scenario – Filter false alarms
Server down event from monitoring tool
Ping the server
If the ping fails, try connecting to a web app on the server; if that fails too, log an Incident to SD
If the ping succeeds, run other troubleshooting tasks such as DNS lookup and traceroute to determine why the original check failed
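The triage steps above might look like this in code. It is a sketch under several assumptions: the `ping` command is invoked with the Unix-style `-c` flag, `log_incident` is a hypothetical callback for the Service Desk integration, traceroute is left out for brevity, and a failed ping with a responding web app is treated as a false alarm (for example, ICMP blocked), which the scenario does not spell out.

```python
import socket
import subprocess
import urllib.request


def ping_host(host):
    """Send one ICMP echo using the system ping (Unix-style '-c' assumed)."""
    result = subprocess.run(["ping", "-c", "1", host],
                            stdout=subprocess.DEVNULL,
                            stderr=subprocess.DEVNULL)
    return result.returncode == 0


def web_app_responds(url, timeout=5):
    """Return True if the URL answers without a network or HTTP error."""
    try:
        with urllib.request.urlopen(url, timeout=timeout):
            return True
    except OSError:  # URLError and HTTPError are OSError subclasses
        return False


def dns_resolves(host):
    """Return True if the host name resolves to an address."""
    try:
        socket.gethostbyname(host)
        return True
    except OSError:
        return False


def triage_server_down(host, url, log_incident, ping=ping_host,
                       web_check=web_app_responds, dns_check=dns_resolves):
    """Re-check a 'server down' event; return (status, details)."""
    if not ping(host):
        if not web_check(url):
            log_incident(f"{host}: ping and web app check both failed")
            return "incident", {}
        # Assumption: the web app responding means the server is up.
        return "false_alarm", {"ping_ok": False}
    # Ping works now, so gather hints on why the original check failed.
    return "false_alarm", {"ping_ok": True, "dns_ok": dns_check(host)}
```

Only the branch where both checks fail creates an Incident; the other branches return diagnostic details that can be attached to the original event.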
Conventional IT Organization - NOC vs. Service Desk
Events sent directly to operations via email
Events sent directly to app mgmt via email
Separate teams with separate tools
Critical events may be missed leading to Service disruption
Service Desk missing visibility into critical events that caused the disruption.
The million dollar question - Is it the Network, the Server or the Application?
What caused the Service disruption?
ITIL v3 Best practice – Operations Bridge
Preventive tasks – routine operations
Exceptions – Unusual activity
Incidents – Disruption of Service
Service Desk (tool) as the Operations Bridge
All actionable IT events and Incidents are logged in Service Desk
Reduced Incidents due to high quality event management
Basis for identifying automation opportunities
Operations Bridge
"A physical location where IT Services and IT Infrastructure are monitored and managed."
Service Desk as Operations Bridge
Critical & Warning events are logged to Service Desk
Incidents & Service Requests are already logged
Service Desk acts as the Operations Bridge
IT operations, Application Management & Service Desk teams work together
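Logging only Critical and Warning events to the Service Desk amounts to a severity filter in front of ticket creation. A minimal sketch, assuming events carry a hypothetical `severity` field and `create_ticket` is a hypothetical callback into the Service Desk tool:

```python
ACTIONABLE_SEVERITIES = {"critical", "warning"}


def route_to_service_desk(events, create_ticket):
    """Create a ticket for each Critical or Warning event.

    Returns whatever create_ticket returned for each logged event.
    """
    logged = []
    for event in events:
        if event["severity"].lower() in ACTIONABLE_SEVERITIES:
            logged.append(create_ticket(event))
    return logged


# Hypothetical events from network, server and application monitoring.
incoming = [
    {"ci": "sw01", "severity": "Critical"},
    {"ci": "web01", "severity": "Info"},
    {"ci": "db01", "severity": "warning"},
]
created = route_to_service_desk(incoming, lambda e: f"ticket for {e['ci']}")
print(created)
```

Informational events stay in the monitoring tools, so the Service Desk sees only events that need action.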
Info for current (and future) ManageEngine customers
Event Management Feature / Best Practice | ManageEngine Product
Incident Analysis Report | Service Desk Plus
Monitoring Scope: Network, Server, VMware, VoIP | OpManager
Monitoring Scope: Applications and Databases | Applications Manager
Event Correlation Techniques | OpManager
Root Cause Analysis | Applications Manager
Run Book Automation | OpManager (due in the new release)
Operations Bridge | Service Desk Plus
Summary
Optimize monitoring scope to ensure that the right events are generated
Implement Event correlation strategies to filter events
Use automation to reduce manual tasks and false alerts
Consider using your Service Desk as an Operations Bridge