Decision Support Systems – a tool for Alarm Management
Aniruddha Datta, Ph.D., IEEE Fellow
J.W. Runyon, Jr. '35 Professor II
Director, Center for Bioinformatics and Genomics Systems Engineering
Department of Electrical and Computer Engineering
Texas A&M University, College Station, TX-77843
Pankaj Goel
Ph.D. Student
Some Background Info
• Ph.D. 1991 (Robust Adaptive Control Systems)
• 1991-2001 : Teaching and research (Adaptive Control, AIMC, Modern Control and PID control)
• 2001-2003: National Cancer Institute- Bioinformatics Trainee
• 2003-Present : Teaching and research related to cancer genomics.
• 2015-Present: Plant Genomics Research Driven by Agricultural Applications
• 2016 : Control, Automation and Process Safety (Just getting started)
2/21/2017 SC Meeting 2
Industrial Revolution
Figure 1 : Industrial revolution[1]
Challenges
• Integrated safety strategies
• User friendly solutions
• Migration strategy from semi-automated to fully-automated systems
• Train operators to handle abnormal situations
Automation Pyramid
[Figure: five-level automation pyramid.
Levels, top to bottom: Enterprise Resource Planning (ERP); Manufacturing and Execution; Application Servers, Supervision and Control; Automation Controllers; Sensors and Actuators.
Side labels: amount of accessible data; number of devices; time (msec, minutes, days, weeks); business value ($$$).]
Figure 2 : Automation Pyramid[2]
Layers of Protection
Figure 3 : Layers of protection[3]
Alarm System
• “An alarm is an audible and/or visible means of indicating to the operator an equipment malfunction, process deviation, or abnormal condition requiring a response” (ANSI/ISA-18.2, 2009).
• With the process variable denoted P and the alarm set point denoted S, the alarm state A can be defined mathematically as:
High alarm: A = 1 if P >= S; A = 0 if P < S
Low alarm: A = 1 if P <= S; A = 0 if P > S
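The definition above can be sketched in a few lines of Python. The function name `alarm_state` and the `kind` argument ("high" or "low") are assumptions made for illustration; they do not come from ANSI/ISA-18.2.

```python
def alarm_state(p: float, s: float, kind: str = "high") -> int:
    """Return 1 if the alarm is active, 0 otherwise.

    p    -- process variable P
    s    -- alarm set point S
    kind -- "high" (alarm when P >= S) or "low" (alarm when P <= S);
            an illustrative parameter, not part of any standard.
    """
    if kind == "high":
        return 1 if p >= s else 0
    if kind == "low":
        return 1 if p <= s else 0
    raise ValueError(f"unknown alarm kind: {kind!r}")
```

For example, `alarm_state(105.0, 100.0)` evaluates a high alarm with P above S, and `alarm_state(5.0, 10.0, "low")` evaluates a low alarm with P below S.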
Characteristics of Alarm System
Figure 4 : Characteristics of Alarm system
Some Examples
Figure 5: Poor Alarm[4]
Figure 6: Good Alarm[5]
Example
Figure 7: Alarm screen[6]
Cost of poor Alarm Management
[Figure: cost of poor alarm management.
Alarm labels: nuisance alarm; standing alarm; critical alarm; performance target; missed alarms.
Plant state: Normal; Upset; Shutdown; Disturbed.
Operator priority: Process Optimization (Important); Production (Very important); Equipment Damage (Urgent); Safety and Environment (Critical).
Costs: poor control; energy waste; feedstock expense; containment loss; more wear & tear; equipment damage; injuries/fatalities; ESD activation; environmental violations.]
Figure 8: Cost of poor alarm management [7]
Evidence
Incident list related to alarm management issues
S. No. | Incident | Year | Root causes related to alarms | Injuries/Fatalities | Financial loss reported
1 | Three Mile Island | 1979 | Operators were overloaded with numerous alarms; several key alarms were misleading | | $1-2 billion
2 | Piper Alpha oil rig | 1988 | Inadequate shift handovers; issues with false alarms | 167 fatalities | $3.4 billion
3 | Texaco Milford Haven refinery, UK | 1994 | Poorly prioritized alarms; poor design of displays; alarm flood | 26 injuries | $71 million
4 | Channel Tunnel fire | 1996 | Rail control centers were flooded with alarms and information | | $308 million
5 | Tosco Avon accident, Martinez, California | 1997 | No alarm on temperature indication and control system with high priority alarms | 1 fatality, 46 injuries |
6 | Longford gas explosion, Australia | 1998 | Inappropriate response to critical alarms | 2 fatalities, 8 injuries |
7 | First Chemical Corporation, Pascagoula, Mississippi | 2002 | System was not protected with enough layers of protection, including alarms, safety interlocks and overpressure protection | 3 injuries |
8 | BP Texas refinery incident | 2005 | Failed management of instruments and alarms | 180 injuries, 15 fatalities | $1.5 billion
9 | Buncefield oil storage, Hemel Hempstead | 2005 | Shortcomings in design, provision and operation of the protection alarms and shutdown systems | 40 injuries |
10 | Kalamazoo River oil spill | 2010 | Numerous alarms from the affected Line 6B, but controllers thought the alarms were from phase separation, and the leak was not reported | |
Table 1 : Incident list[8]
Alarm management lifecycle
2/21/2017 SC Meeting 13
Figure 9: Alarm management lifecycle[9]
Issues
Alarm flooding - “A condition during which the alarm rate is greater than the operator can effectively manage (e.g. more than 10 alarms per 10 minutes)”.
According to the ASM Consortium, “Alarm flooding is the phenomenon of presenting more alarms in a given period of time than a human operator can effectively respond”.
Alarm flooding results in more workload on an operator and increased chances of missing a critical alarm.
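The "more than 10 alarms per 10 minutes" rule of thumb can be checked against an alarm log. The sketch below is only illustrative: the function name `flood_intervals`, the use of fixed 10-minute windows counted from the first alarm, and the input format (a list of activation timestamps) are all assumptions.

```python
from collections import Counter
from datetime import datetime, timedelta


def flood_intervals(alarm_times, window=timedelta(minutes=10), limit=10):
    """Return the start times of fixed windows that hold more than `limit` alarms.

    alarm_times -- iterable of datetime objects, one per alarm activation
    window      -- window length (10 minutes, following the quoted example)
    limit       -- a window with more than this many alarms counts as a flood
    """
    alarm_times = list(alarm_times)
    if not alarm_times:
        return []
    start = min(alarm_times)
    # Integer index of the window each alarm falls into, counted from `start`.
    counts = Counter((t - start) // window for t in alarm_times)
    return [start + i * window for i, n in sorted(counts.items()) if n > limit]
```

A sliding window would catch floods that straddle two fixed windows; fixed windows are used here only to keep the sketch short.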
Issues
The main causes of alarm flooding are:
a. Standing alarms - alarms which remain in the alarm state for a long period of time
b. Chattering alarms - alarms which turn on, then off, then on again within a short period of time (e.g. 1 min)
c. Fleeting and/or momentary alarms - alarms which turn on and off very quickly, but do not necessarily repeat
d. Stale alarms - alarms which go into alarm and do not return to the normal state for at least 24 hrs
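The alarm behaviours listed above can be turned into simple checks on the activation and clearing times of a single alarm tag. This is a sketch under assumptions: the function names, the two-activation minimum for chattering, and the 5-second fleeting threshold are illustrative choices; only the 1-minute and 24-hour figures come from the list.

```python
from datetime import datetime, timedelta


def is_chattering(activations, window=timedelta(minutes=1), min_activations=2):
    """True if at least `min_activations` activations fall within `window`.

    activations -- datetimes at which one alarm tag went into alarm.
    min_activations=2 mirrors "on, then off, then on again"; it is an assumption.
    """
    times = sorted(activations)
    for i in range(len(times) - min_activations + 1):
        if times[i + min_activations - 1] - times[i] <= window:
            return True
    return False


def is_fleeting(activated, cleared, max_duration=timedelta(seconds=5)):
    """True if the alarm cleared very quickly (the 5 s threshold is assumed)."""
    return cleared - activated <= max_duration


def is_stale(activated, now, threshold=timedelta(hours=24)):
    """True if the alarm has stayed in alarm for at least 24 hours."""
    return now - activated >= threshold
```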
Key performance indicators
KPI | EEMUA 191 | ISA 18.2
Average alarms per day | <144 (up to 288 may be manageable) | ~150 (~300 may be manageable)
Average standing alarms | <10 | <5 per day
Peak alarms / 10 minutes | <10 | <=10
Average alarms / 10-minute interval | 1 | ~1 (~2 may be manageable)
Distribution % (low/med/high) | 80/15/5 | 80/15/5
Table 2 : Key performance indicators [9,10]
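Several of the indicators in Table 2 can be computed directly from a time-stamped alarm log and then compared with the targets by hand. The sketch below assumes a log of (timestamp, priority) pairs with priorities labelled "low", "med" and "high"; the function name `alarm_kpis` and the dictionary keys are hypothetical.

```python
from collections import Counter
from datetime import datetime, timedelta


def alarm_kpis(alarms, start, end, window=timedelta(minutes=10)):
    """Compute basic alarm-rate KPIs for the period [start, end).

    alarms -- list of (timestamp, priority), priority in {"low", "med", "high"}
    """
    if end <= start:
        raise ValueError("end must be later than start")
    total = len(alarms)
    span = end - start
    # Alarms per fixed window, indexed from the start of the period.
    per_window = Counter((t - start) // window for t, _ in alarms)
    by_priority = Counter(p for _, p in alarms)
    return {
        "avg_per_day": total / (span / timedelta(days=1)),
        "avg_per_window": total / (span / window),
        "peak_per_window": max(per_window.values(), default=0),
        "priority_pct": {
            p: (100 * by_priority[p] / total if total else 0.0)
            for p in ("low", "med", "high")
        },
    }
```

Standing alarms are not covered here because they require alarm state durations rather than activation times.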
Human Response
[Figure: decision loop. INPUT: Sensor (Instrument, Mechanical) -> LOGIC: Decision (Logic solver; Observe, Diagnose, Decide and Act) -> OUTPUT: Final Control Element (Logic Solver, instrument etc.)]
Figure 10: Decision loop[40]
Figure 11: Human response timeline[40]
Decision Support System
• Previous research work
– AWARE (2007), HSE UK: A tool for early detection of runaway events
– OP-AIDE (1999), Purdue University: an intelligent operator decision support system for diagnosis and assessment of abnormal situations
– DKIT (1996), Purdue University: Online real time fault diagnostic system
Decision Support System
[Figure: components of the decision support system tool: data acquisition; data mining; fault identification and diagnostics; HAZID & prior knowledge.]
Figure 12: Decision support tool
Decision Support System
• Benefits:
– Reduce alarm flood by:
• Early detection of faults
• Using historical data and knowledge to predict root causes
• Assistance in advanced alarming techniques
– Reduce action time during abnormal situation:
• System based on prior information and knowledge
• Guiding user interface for the operator to take an action
Acknowledgements
• Dr. Sam Mannan
• Ms. Valerie Green
• Ms. Alanna Scheinerman
• All members of Steering Committee
• All members of MKOPSC
References
1. Online source: http://www.tritoninnovation.com/industry40/ (Accessed on: 10/16/2016)
2. Modified from Hollender, M. (2010). Collaborative process automation systems. ISA.
3. Exida. Layers of Protection. 2014; Available from: www.exida.com
4. Online source : https://www.carboncleaningusa.com/how-can-i-reset-my-check-engine-light (accessed on:10/14/2016)
5. Online source: http://carrising.info/car-dashboard-lights-and-symbols-guide (accessed on: 10/14/2016)
6. Online source: https://www.youtube.com/watch?v=vj44kr54Y9Q (accessed on: 10/17/2016)
7. Tanner, R., Gould, J., Turner, R., & Atkinson, T. (2005). Keeping the peace (and quiet). ISA InTech September.
8. Sources for Table 1 (incident list):
i. EEMUA 191 Edition 3, Appendix 1 The cost of poor alarm performance.
ii. Online source: http://www.nrc.gov/reading-rm/doc-collections/fact-sheets/3mile-isle.html
iii. Report of President's Commission on the Accident at Three Mile Island
iv. US chemical safety and hazard investigation board investigation report , Report No. 2005-04-I-TX refinery explosion and fire
v. US chemical safety and hazard investigation board investigation report , Report No. 2003-01-I-MS explosion and fire
vi. Online source: http://www.hse.gov.uk/comah/sragtech/casetexaco94.html
vii. EPA Chemical accident investigation report Tosco Avon accident Martinez California
viii. COMAH: The underlying causes of the explosion and fire at the Buncefield oil storage depot, Hemel Hempstead, Hertfordshire on 11 December 2005
ix. Online source : https://en.wikipedia.org/wiki/List_of_pipeline_accidents_in_the_United_States_in_the_21st_century
x. BP Grangemouth Scotland Major incident investigation report (HSE & SEPA)
xi. The Esso Longford Gas Plant Accident , Report of the Longford Royal Commission
9. ANSI/ISA-18.2-2009 “Management of Alarm Systems for the Process Industries”.
10. EEMUA 191-2013, “Alarm Systems: A Guide to Design, Management and Procurement Edition 3”. The Engineering Equipment and Materials Users Association.
11. Rothenberg, Douglas H. Alarm management for process control: a best-practice guide for design, implementation, and use of industrial alarm systems. Momentum Press, 2009.
12. Habibi, Eddie. Alarm Management: A Comprehensive Guide. International Society of Automation, 2011.
13. Hollifield, Bill, et al. The high performance HMI handbook. Plant Automation Services, 2008.
14. Liptak, B. G. "Instrument Engineers’ Handbook Fourth Edition, Process Control and Optimization, vol. 2." (2006): 904.
15. Medica, Librerias Aula. "Guidelines for Safe Automation of Chemical Processes." Guidelines for Safe Automation of Chemical Processes-0816905541-103, 52 (1993).
16. Technical Report ISA-TR 18.2.4-2012- Enhanced and Advanced alarm methods
17. UK HSE information Sheet – Better alarm handling
18. UK HSE The management of alarm systems CRR166/1998
19. Abnormal Situation Management Consortium, www.asmconsortium.net
20. Exida- Saved by the Bell: Using Alarm Management to make Your Plant Safer
21. Honeywell Whitepaper- Alarm Management Standards – Are You Taking Them Seriously?
22. Alarm management and HMI: What documents are standards, what are not, What’s the difference and why should you care?- Bill Hollifield
23. Doug Metzger , ASM alarm management guideline and ISA-18.2: How do they stack up ?
24. Whitepaper : Guide to effective alarm management. Honeywell
25. Bullemer, Peter T., et al. "Towards Improving Operator Alarm Flood Responses: Alternative Alarm Presentation Techniques." Instrumentation Society of America (ISA) Automation Week, Mobile, Alabama (2011).
26. Nimmo, Ian. "Rescue your plant from alarm overload."
27. Mattiasson, C. T. "The alarm system from the operator's perspective" Human Interfaces in Control Rooms, Cockpits and Command Centres, 1999. International Conference on IET, 1999
28. Nimmo, Ian, and I. A. C. Honeywell "The Importance of Alarm Management Improvement Project." ISA INTERKAMMA (1999)
29. Koene, Johannes, and Hiranmayee Vedam "Alarm management and rationalization." Third International conference on loss prevention. 2000
30. Metzger, Doug, and Ron Crowe "Technology Enables New Alarm Management Approaches." ISA Technical Conference, Houston, TX. 2001
31. Dal Vernon, C. Reising, and Tim Montgomery. "Achieving Effective Alarm System Performance: Results of ASM® Consortium Benchmarking against the EEMUA Guide for Alarm Systems."
32. Dal Vernon, C. Reising, Joshua L. Downs, and Danni Bayn. "Human performance models for response to alarm notifications in the process industries: An industrial case study." Proceedings of the Human Factors and Ergonomics Society Annual Meeting. Vol. 48. No. 10. SAGE Publications, 2004.
33. Bullemer, Peter T., et al. "Towards Improving Operator Alarm Flood Responses: Alternative Alarm Presentation Techniques." Instrumentation Society of America (ISA) Automation Week, Mobile, Alabama (2011).
34. HSE Human Factors Briefing Note No. 9 Alarm Handling.
35. The Cost/Benefit of Alarm Management: An Economic Justification for Alarm System Re-engineering by PAS
36. Vedam, H., Venkatasubramanian, V., & Bhalodia, M. (1998). A B-spline based method for data compression, process monitoring and diagnosis. Computers & Chemical Engineering, 22, S827-S830.
37. Vedam, H. (1999). OP-AIDE: an intelligent operator decision support system for diagnosis and assessment of abnormal situations in process plants.
38. Dinkar, M. (1996). DKIT: A Blackboard-based, distributed, multi-expert environment for Abnormal Situation Management.
39. Vedam, H., Dash, S., & Venkatasubramanian, V. (1999). An intelligent operator decision support system for abnormal situation management. Computers & Chemical Engineering, 23, S577-S580.
40. Myers, P. M. (2013). Layer of Protection Analysis–Quantifying human performance in initiating events and independent protection layers. Journal of Loss Prevention in the Process Industries, 26(3), 534-546.
Backup Slides
Abnormal Situation
Abnormal situation (AS):
– A disturbance or series of disturbances in a process that cause plant operations to deviate from their normal operating state
– Can develop, extend or change over time in dynamic process control environments
Abnormal Situation Management (ASM):
– Timely detection and diagnosis of faults
– Situation assessment and countermeasure planning (identify hazards, avoid/mitigate them and plan for emergencies)
– Early diagnosis: avoid progression of an event
How is an alarm generated?
a) An instrument (sensor in the field) sends a signal to the control room; the DCS/PLC compares it to a preset value; any deviation is displayed and sounded
b) The preset values are the process set values
c) The operator needs to take corrective action
Alarm Management:
• Alarm management is the process of implementing documentation, design, usage, and maintenance procedures to construct an effective alarm system.
• Principles of alarm management are:
1. Alert the operator with real-time information about abnormal plant conditions so that appropriate action can be taken
2. Provide information and guidance about the required operator action
3. Each alarm should be relevant and require a response from the operator
4. Alarm levels shall enable the operator to take timely and appropriate action before the abnormal plant condition intensifies
5. The alarm system shall be designed to address human abilities and limits.
Evolution of alarm management
Symptoms of an unhealthy/ineffective alarm system:
1. No appropriate master alarm database: no records of why alarms were designed the way they are. (The master alarm database is an important part of the PSM element Process Safety Information.)
2. Alarms appear without the need for any operator action.
3. No clear guideline/specification for “How to add or delete an alarm”.
4. Poor alarm testing procedures and records.
5. Operating procedures are not written with alarms in mind.
6. The operator does not know how to respond to some alarms.
7. Alarm settings change during shift changeovers.
8. Important alarms are missed during incidents.
9. Minor upsets result in a significant number of alarms that the operator cannot keep up with.
10. Alarms remain active for a considerable amount of time (even 24 hours), or alarms are active even when there is no upset.
11. Too many high priority alarms.
The root causes of the above-mentioned symptoms are:
1. No approved design basis or plant-wide/site-wide philosophy and alarm management procedures in place.
2. Alarms are constantly added (during HAZOP, LOPA, PHA studies) and rarely deleted.
3. Inadequate information in plant procedures and practices.
4. Inadequate operator training.
5. Complex alarm system design and displays/HMIs in use.
6. Incorrect prioritization of alarms.
7. Alarm limits and priorities are rarely reviewed during the operation of the plant.
8. Ineffective corrective actions, plant equipment not in service and variations in plant operating conditions.
9. Lack of Management of Change procedures.
10. Not enough proven technology.
[Figure: Safety Life Cycle shown alongside the Alarm Management Lifecycle.
Safety Life Cycle: Hazard and Risk Assessment; Allocation of Safety Functions to Protection Layers; Safety Requirement Specification; SIS Design & Engineering; Design and Development of Other Means of Risk Reduction; Installation, Commissioning and Validation; Operation and Maintenance; Modification.
Alarm Management Lifecycle: Philosophy; Identification; Rationalization; Detail Design; Implementation; Operation; Maintenance; Audit; Monitoring; Management of Change.]
Alarm Suppression
Suppression is a technique used to assist in handling a high volume of alarms occurring due to an inadvertent shutdown of equipment or an unplanned shutdown.
The principle is to hide or mask alarms which no longer have any value to the operator.
Types:
1. Alarm shelving (manual suppression)
2. Designed suppression (automatic suppression)
3. Out of Service
Type of Suppression | Description & Characteristics | Example | Problem
State-based Suppression (Static Suppression) | Suppress alarms with pre-defined states of operation, equipment and process. Planned event; manually initiated transition; time frame: short term (hours), long term (months) | Reactor startup; distillation column in start-up mode | State alarms
Alarm Flood Suppression (Dynamic Suppression) | Suppress alarms which are not relevant and meaningful in case of an event and when the same process can lead to a hazardous situation | Compressor trip | Flooding of alarm
Alarm suppression principles which can be used:
1. Suppress alarms during testing
2. Suppress redundant alarms (alarms on the same deviation)
3. Suppress alarms from equipment that is out of order or under maintenance
4. Suppress based on the operating state or shutdown logic
5. Use first-out logic for shutdown alarms and suppress the others (consequences of the shutdown)
6. If alarm suppression hinges on a signal that is not trustworthy, the alarm shall not be suppressed.
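Principles 4 to 6 above can be illustrated with two small functions. This is a sketch under assumptions: the function names, the rule table that maps an operating state to the alarm tags hidden in that state, and the tag names in the usage note are hypothetical.

```python
def apply_state_suppression(active_alarms, state, state_rules, state_signal_trusted=True):
    """Return the alarm tags to display for the current operating state.

    active_alarms        -- list of active alarm tags
    state                -- current operating state, e.g. "startup"
    state_rules          -- dict mapping a state to the set of tags to suppress
    state_signal_trusted -- if False, nothing is suppressed (principle 6)
    """
    if not state_signal_trusted:
        return list(active_alarms)
    hidden = state_rules.get(state, set())
    return [tag for tag in active_alarms if tag not in hidden]


def first_out(trip_alarms):
    """Split shutdown alarms into the first-out alarm and the suppressed rest.

    trip_alarms -- list of (timestamp, tag) pairs
    Returns (first_tag, [later tags in time order]); (None, []) if empty.
    """
    ordered = sorted(trip_alarms)
    if not ordered:
        return None, []
    return ordered[0][1], [tag for _, tag in ordered[1:]]
```

For example, with `state_rules = {"startup": {"LOW_FLOW"}}`, a `LOW_FLOW` alarm is hidden during start-up while a `HIGH_PRESS` alarm is still shown, unless the state signal is flagged as untrusted.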
Requirements for the alarm shelving function:
1. Display of shelved alarms
2. Time limit for shelving
3. Access control for shelving
4. Ability to un-shelve alarms
5. Record of each alarm shelved
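The five requirements above map naturally onto a small data structure. The sketch below is a minimal illustration, not a reference implementation: the class name `AlarmShelf`, the 8-hour maximum shelving time and the log format are assumptions.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta


@dataclass
class AlarmShelf:
    authorized_users: set                          # access control (requirement 3)
    max_duration: timedelta = timedelta(hours=8)   # time limit (requirement 2); value assumed
    shelved: dict = field(default_factory=dict)    # tag -> expiry time
    log: list = field(default_factory=list)        # record of every action (requirement 5)

    def shelve(self, tag, user, now, duration):
        if user not in self.authorized_users:
            raise PermissionError(f"{user} may not shelve alarms")
        if duration > self.max_duration:
            raise ValueError("shelving time exceeds the allowed limit")
        self.shelved[tag] = now + duration
        self.log.append((now, user, "shelve", tag))

    def unshelve(self, tag, user, now):
        """Manually return an alarm to service (requirement 4)."""
        if user not in self.authorized_users:
            raise PermissionError(f"{user} may not un-shelve alarms")
        self.shelved.pop(tag, None)
        self.log.append((now, user, "unshelve", tag))

    def displayed(self, now):
        """List currently shelved alarms (requirement 1), expiring old entries."""
        for tag in [t for t, expiry in self.shelved.items() if expiry <= now]:
            del self.shelved[tag]
            self.log.append((now, "system", "expire", tag))
        return sorted(self.shelved)
```

Automatic expiry keeps a shelved alarm from being forgotten, which is the purpose of the time limit.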
Human Response