Processes in Service Operation
Event Management
• GOAL
– To detect Events, make sense of them, and determine the appropriate control action
• OBJECTIVE
– To provide the entry point for the execution of many Service Operation processes and activities
• SCOPE
– Any aspect of Service Management that needs to be controlled and which can be automated
Concept : Event
An Event is a change of state that is significant for the management of a Configuration Item or service.
The term is also used to mean an alert or notification.
Events typically require IT Operations personnel to take action, and often lead to Incidents being logged.
For example, status indications such as: Action Needed, Backup In Progress, In Maintenance, Unavailable, Available, Processing, Unauthorized Access, Service Degraded
Concept Of Alert
An Alert is a warning that…
– A threshold has been reached
– Something has changed
– A failure has occurred
• Alerts are often created and managed by system management tools and are managed by the Event Management process
• The purpose of an Alert is to ensure that the person with the skills appropriate to deal with the event is notified
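The Alert concept above (a threshold is reached, and the person with the appropriate skills is notified) can be sketched roughly as follows. This is only an illustration: the `Alert` fields, the `ON_CALL` roster, the names in it, and the fallback to a "service-desk" queue are all assumptions, not part of any particular system management tool.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Alert:
    source: str  # Configuration Item or service that raised the alert
    reason: str  # why the alert was raised
    skill: str   # skill needed to deal with the underlying event


# Hypothetical on-call roster: skill -> person to notify (assumed names).
ON_CALL = {"storage": "alice", "network": "bob"}


def check_threshold(source: str, metric: str, value: float,
                    limit: float, skill: str) -> Optional[Alert]:
    """Return an Alert when the measured value reaches the limit, else None."""
    if value >= limit:
        reason = f"{metric} threshold reached ({value} >= {limit})"
        return Alert(source, reason, skill)
    return None


def route(alert: Alert, roster: dict) -> str:
    """Pick the person whose skill matches the alert.

    Falling back to "service-desk" is an assumption made for this sketch.
    """
    return roster.get(alert.skill, "service-desk")


alert = check_threshold("db-server-01", "disk_used_pct", 93.0, 90.0, "storage")
if alert is not None:
    print(f"notify {route(alert, ON_CALL)}: {alert.source}: {alert.reason}")
```

In practice the threshold check would normally live in the monitoring tool; the sketch only shows the routing idea.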
Key Metrics
• No. of events by:
– Category
– Significance
• No. and % of events:
– That required human intervention, and whether this was performed
– That resulted in Incidents or Changes
– Caused by existing Problems or Known Errors
– Compared with the number of Incidents
• No. and % of:
– Repeated or duplicated events
– Events indicating performance issues
– Events indicating potential availability issues
– Each type of event per platform or application
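A few of the metrics above could be computed from an event log along these lines. This is a minimal sketch: the record layout, the field names, and the significance labels are assumptions chosen for illustration, and the sample data is invented.

```python
from collections import Counter

# Hypothetical event log; every field name and value here is an assumption.
events = [
    {"category": "storage", "significance": "warning", "human": True, "incident": True},
    {"category": "storage", "significance": "informational", "human": False, "incident": False},
    {"category": "network", "significance": "exception", "human": True, "incident": False},
    {"category": "network", "significance": "informational", "human": False, "incident": False},
]


def count_and_pct(records, flag):
    """Number and percentage of records whose boolean field `flag` is set."""
    n = sum(1 for r in records if r[flag])
    pct = 100.0 * n / len(records) if records else 0.0
    return n, pct


# No. of events by category and by significance
by_category = Counter(e["category"] for e in events)
by_significance = Counter(e["significance"] for e in events)

# No. and % of events that required human intervention / resulted in Incidents
human_n, human_pct = count_and_pct(events, "human")
incident_n, incident_pct = count_and_pct(events, "incident")
```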
Implementation Challenges
Correct level of filtering
Rolling out necessary monitoring agents
Obtaining funding
Acquiring necessary skills
Service Operation Processes
Incident Management
• GOAL
– To restore normal service operation as quickly as possible and minimize the adverse impact on business operations
• OBJECTIVE
– To ensure that the best possible levels of service quality and availability are maintained
• SCOPE
– Incident Management includes any Event which disrupts, or which could disrupt, a service. This includes Events which are communicated directly by users, either through the Service Desk or through an interface from Event Management to Incident Management tools
Basic Concepts
• Timescales
• Incident model
• Major Incidents
NOTE: People sometimes use loose terminology and/or confuse a Major Incident with a Problem. In reality, an Incident remains an Incident forever – it may grow in impact or priority to become a Major Incident, but an Incident never ‘becomes a Problem’. A Problem is the unknown cause of one or more Incidents and remains a separate entity always.
Key Metrics
• Total number of Incidents
– Breakdown at each stage
• Mean elapsed time to achieve Incident resolution or circumvention, broken down by impact code
• Percentage of Incidents handled within agreed response time
• Incident response time
• Average cost per Incident
• Number and percentage of:
– Major Incidents, backlog, incorrectly assigned or categorized
– Resolved remotely, without the need for a visit
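Two of these metrics, mean elapsed time by impact code and percentage handled within the agreed response time, might be calculated as in the sketch below. The record fields, the impact codes, and the 30-minute target are assumptions for illustration only.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical incident records (times in minutes); field names are assumed.
incidents = [
    {"impact": "high", "response_min": 10, "resolve_min": 120},
    {"impact": "high", "response_min": 40, "resolve_min": 240},
    {"impact": "low", "response_min": 20, "resolve_min": 600},
]

AGREED_RESPONSE_MIN = 30  # assumed response-time target


def mean_resolution_by_impact(records):
    """Mean elapsed time to resolution or circumvention, per impact code."""
    grouped = defaultdict(list)
    for r in records:
        grouped[r["impact"]].append(r["resolve_min"])
    return {impact: mean(times) for impact, times in grouped.items()}


def pct_within_response(records, target_min):
    """Percentage of incidents whose first response met the agreed target."""
    if not records:
        return 0.0
    ok = sum(1 for r in records if r["response_min"] <= target_min)
    return 100.0 * ok / len(records)


resolution = mean_resolution_by_impact(incidents)
within = pct_within_response(incidents, AGREED_RESPONSE_MIN)
```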
Implementation Challenges
• The ability to detect Incidents as early as possible
• Convincing all staff (technical teams as well as users) that all Incidents must be logged
• Availability of information about Problems and Known Errors
• Integration into the:
– Configuration Management System (CMS)
– Service Level Management process (SLM)
– Service Knowledge Management System (SKMS)
Service Operation Processes
Request Fulfillment
• GOAL
– To deal with Service Requests from the users/customers
• OBJECTIVE
– To provide a channel for users to request and receive standard services for which a predefined approval and qualification process exists
– To provide information to users and customers about the availability of services and the procedure for obtaining them
– To source and deliver the components of requested standard services (e.g. licenses and software media)
– To assist with general information, complaints or comments
• SCOPE
– Each organization will need to decide and document which requests it will handle through the Request Fulfillment process and which others will have to go through a more formal Change Management process
Concept of the Service Request
A request from a user for information, advice, a standard change or access to an IT service.
For example:
• To reset a password
• To provide standard IT services for a user
Service requests are usually handled by a Service Desk and do not require an RFC to be submitted.
Concept of the Request Model
The Request Model is a way of predefining the steps that should be taken to handle a process (in this case a process for dealing with a particular type of request) in an agreed way
Support tools can be used to manage the required process. This will ensure that standard requests are handled in a predefined path and within predefined timescales
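A Request Model as described above (predefined steps, predefined timescales) could be represented in a support tool roughly like this. The password reset example comes from the slides, but the individual steps and their durations are invented for illustration.

```python
from dataclasses import dataclass
from datetime import timedelta


@dataclass(frozen=True)
class Step:
    name: str
    max_duration: timedelta  # predefined timescale for this step


@dataclass(frozen=True)
class RequestModel:
    request_type: str
    steps: tuple  # ordered Step objects: the predefined path

    def total_timescale(self) -> timedelta:
        """Sum of the predefined timescales of all steps."""
        return sum((s.max_duration for s in self.steps), timedelta())


# Hypothetical model; the steps and limits are assumptions, not ITIL text.
PASSWORD_RESET = RequestModel(
    "password reset",
    (
        Step("verify the user's identity", timedelta(minutes=10)),
        Step("reset the password", timedelta(minutes=5)),
        Step("confirm with the user and close", timedelta(minutes=15)),
    ),
)


def overdue_steps(model, actual_durations):
    """Names of steps whose actual duration exceeded the predefined timescale."""
    return [
        step.name
        for step, took in zip(model.steps, actual_durations)
        if took > step.max_duration
    ]
```

For example, `overdue_steps(PASSWORD_RESET, [...])` with one measured duration per step returns the steps that ran past their limit.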
Key Metrics
Backlog
Average Cost
Met SLA
Did not meet SLA
Satisfaction Surveys
Implementation Challenges
Clearly defining and documenting the type of requests that will be handled within the Request Fulfillment process (and those that will either go through the Service Desk and be handled as Incidents or need to go through formal Change Management), so that all parties are absolutely clear on the scope
Establishing self-help front-end capabilities that allow the users to interface successfully with the Request Fulfillment process
Service Operation Processes
Problem Management
• GOAL
– To diagnose the root cause of Incidents, to determine the resolution to those Problems and to implement resolutions through appropriate control procedures
• OBJECTIVE
– Primarily to prevent Problems and resulting Incidents, to eliminate recurring Incidents and to minimize the impact of Incidents that cannot be prevented
• SCOPE
– The management of the lifecycle of all Problems
Problem Management
Problem Management is the process responsible for managing the Lifecycle of all Problems
Problem Management consists of two major processes :
1. Reactive Problem Management is generally executed as part of Service Operation and is, therefore, covered in the Service Operation book
2. Proactive Problem Management is initiated in Service Operation, but is generally driven as part of Continual Service Improvement.
Problem
The unknown cause of one or more Incidents. The cause is not usually known at the time a Problem Record is created, and the Problem Management process is responsible for further investigation.
Problem Investigation & Diagnosis
• Chronological Analysis
• Pain Value Analysis
• Kepner and Tregoe
• Brainstorming
• Ishikawa Diagrams
• Pareto Analysis
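Of the techniques listed, Pareto Analysis is the easiest to show in a few lines: rank causes by how often they occur and keep the few that account for most Incidents. The sketch below assumes a conventional 80% cutoff and uses invented cause labels; neither comes from the slides.

```python
from collections import Counter


def pareto(causes, cutoff_pct=80.0):
    """Rank causes by frequency; keep those that together reach cutoff_pct.

    Returns (cause, count, cumulative_percentage) tuples, most frequent first.
    The 80% default is a common convention, assumed here for illustration.
    """
    counts = Counter(causes).most_common()
    total = sum(n for _, n in counts)
    ranked, cumulative = [], 0
    for cause, n in counts:
        # Stop once the causes kept so far already cover the cutoff.
        if 100.0 * cumulative / total >= cutoff_pct:
            break
        cumulative += n
        ranked.append((cause, n, round(100.0 * cumulative / total, 1)))
    return ranked


# Hypothetical causes recorded against ten Incidents.
incident_causes = (
    ["disk full"] * 5 + ["expired certificate"] * 3 + ["config drift", "unknown"]
)
top_causes = pareto(incident_causes)
```

With this sample data, two causes cover 80% of the Incidents, so they are the natural candidates for Problem investigation.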
Workaround
• A technique which reduces or eliminates the impact of an Incident or Problem for which a full resolution is not yet available
For Example…
• Restarting a failed Configuration Item
• Rerouting workload
• Workarounds for incidents that do not have associated problem records are documented in the incident record
Shared Data
Incident # xxxx
Category : …
Step 1 : …
Step 2 : …
Step 3 : …
Step 4 : …
Known Error
A Problem that has a documented Root Cause and a Workaround
Known Errors are created and managed throughout their lifecycle by Problem Management. Known Errors may also be identified by development or suppliers
Problem + Root Cause + Workaround = Known Error
Known Error Database ( KEDB )
• A database containing all Known Error records
• The purpose is to store previous knowledge of Incidents and Problems, and how they were overcome, to allow quicker diagnosis and resolution if they recur
• This database is created by Problem Management and used by Incident and Problem Management
• The Known Error Database is part of the Service Knowledge Management System
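The relationship between a Problem, a Known Error, and the KEDB can be sketched as a small data structure. This is an illustration under stated assumptions: the class and field names are invented, and the substring match in `search` is a deliberately naive stand-in for whatever matching a real tool provides.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Problem:
    problem_id: str
    symptom: str
    root_cause: Optional[str] = None
    workaround: Optional[str] = None

    def is_known_error(self) -> bool:
        # A Known Error is a Problem with a documented root cause and workaround.
        return bool(self.root_cause) and bool(self.workaround)


class KEDB:
    """Toy Known Error Database keyed by problem id (hypothetical design)."""

    def __init__(self):
        self._records = {}

    def add(self, problem: Problem) -> None:
        if not problem.is_known_error():
            raise ValueError("only Known Errors belong in the KEDB")
        self._records[problem.problem_id] = problem

    def search(self, incident_description: str):
        """Known Errors whose symptom text appears in the incident description."""
        text = incident_description.lower()
        return [p for p in self._records.values() if p.symptom.lower() in text]


kedb = KEDB()
kedb.add(Problem("P1", "login timeout", "connection pool exhausted",
                 "restart the authentication service"))
matches = kedb.search("Users report Login timeout on the portal")
```

Incident Management would use a lookup like `search` to find an existing workaround before starting a fresh diagnosis.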
(Figure: Known Error records collected in the KEDB)
Concept of The Problem Model
A problem model is a way of predefining the steps that should be taken to handle a process (in this case a process for dealing with a particular type of problem) in an agreed way
Support tools can then be used to manage the required process. This will ensure that ‘standard’ problems are handled in a pre-defined path and within pre-defined timescales
Key Metrics
• Total problems recorded
• % of problems resolved within SLA
• # or % of problems that exceed resolution targets
• Aged problems
• Average cost per problem
• # of Major Problems identified
• # of Major Problem reviews conducted
• Known Errors added to KEDB
Implementation Challenges
• The establishment of an effective Incident Management process and tools
• Formal interfaces and common practices between the two processes
• Links between Incident and Problem Management tools
• The ability to relate Incident and Problem Management Records
• Second- and third-line staff need to have a good working relationship with first-line staff
• Business impact is well understood by staff undertaking investigation of Problems
• Problem Management is able to use all Knowledge and Configuration Management resources available
Service Operation Processes
Access Management
• GOAL
– To execute the policies and actions defined in Security and Availability Management
• OBJECTIVE
– To provide the entry rights for users to be able to use a service or group of services
• SCOPE
– Access Management ensures that users are given the rights to use the service, but it does not ensure this access is available at all agreed times
Concepts
• Access
• Identity
• Rights ( also called privileges )
• Services or service groups
• Directory services
Key Metrics
Number of …
– Requests for access (Service Request, RFC, etc.)
– Incidents requiring a reset of access rights
– Incidents caused by incorrect access settings
Instances of access granted: by service, user, department, etc.
Implementation Challenges
Provision of a database of all users and the rights that they have been granted
The ability to…
– Verify the identity of a user
– Verify the identity of the approving person or body
– Verify that a user qualifies for access to a specific service
– Link multiple access rights to an individual user
– Determine the status of the user at any time
– Manage changes to a user’s access requirements
– Restrict access rights to unauthorized users
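Several of the abilities listed above map onto a simple access check. The sketch below is hypothetical throughout: the `User` fields, the department-based qualification rule, and the service name are assumptions made for illustration, not a description of any directory service.

```python
from dataclasses import dataclass, field


@dataclass
class User:
    user_id: str
    identity_verified: bool = False
    active: bool = True  # status of the user
    rights: set = field(default_factory=set)  # services the user may use


# Hypothetical qualification rule: which departments qualify for a service.
QUALIFYING_DEPARTMENTS = {"payroll-app": {"finance"}}


def grant_access(user: User, department: str, service: str,
                 approver_verified: bool) -> bool:
    """Grant a right only when every check in the list above passes."""
    if not (user.identity_verified and approver_verified):
        return False  # identity of user or approver not verified
    if not user.active:
        return False  # user status does not allow access
    if department not in QUALIFYING_DEPARTMENTS.get(service, set()):
        return False  # user does not qualify for this service
    user.rights.add(service)  # multiple rights are linked to one user
    return True


def revoke_all(user: User) -> None:
    """Restrict access, e.g. when the user's status changes."""
    user.active = False
    user.rights.clear()


finance_user = User("u100", identity_verified=True)
granted = grant_access(finance_user, "finance", "payroll-app", approver_verified=True)
```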