Alarm Management
 Definition of an alarm
 Based on ISA-18.2: an audible and/or visible means of indicating to the operator an equipment malfunction, process deviation, or abnormal condition requiring a response.
 The keywords
 Audible and/or Visible Indication
 For the operator
 Indicating either
 Equipment Malfunction
 Process deviation
 Abnormal condition
 Requiring Response
 The Core Principles in an Alarm System
 Relevant (Not Spurious)
 Prioritized
 Informative (With documentation and diagnostic)
 Unique (Not Redundant)
 Timely
 Advisory (Action Required)
 The purpose of an alarm is to prevent:
 Personal injury
 Environmental damage
 Economic Loss
 Equipment Damage
 Product Quality
 Downtime
 Benefits of a good alarm system
 Improve Operational Effectiveness
 Avoid costly plant trips
 Without effective alarming, the demand falls on the ESD (Emergency Shutdown) system
 Minimize upset time
 A good alarm system quickly alerts the operator so the plant can be brought back to a stable state
 Improve HSE
 HSE-related alarms, such as fire alarms and environmental and occupational safety alarms
 Reduce operator workload
 More time to concentrate on optimization
 Legal and Insurance Compliance
 Alarm Screening / Identification
 A pre-rationalization process that combines maintenance, sophisticated analysis tools and experience to quickly reduce the number of alarms
 Performed by Control System Engineer, E&I Engineers, Operators and Technicians
 Alarm Rationalization
 The process of reviewing each alarm identified from the identification stage
 The purpose of rationalization is to identify:
 Purpose and validity
 Consequence of inaction
 Alarm Class
 Priority
 Alarm Set point
 Operator Action
 Alarm Rationalization Checklist. There should be an alarm only if:
 There is a potential cause of the alarm
 There is a need and an opportunity for operator intervention
 There is a consequence of inaction
 Consequence of Inaction
 Is typically a set of tables which are divided into
 Economic
 Personal Safety
 Environment
 Reputation (Normally not included)
 Economic Impact can be measured in
 Time
 Cost
 % (of operating cost / feed)
 Lost production units (such as Mtpa for a mining plant or BPSD for an oil refinery)
The details of these definitions are outlined below:
Employee / Contractor / Personnel Safety
To estimate the personnel consequence of an event, consider the following extensions of the keywords given in the margin of the risk matrix:
· Low: Reversible health effects of concern. Reversible injuries requiring treatment that do not lead to restricted duties. Medical treatment.
· Moderate: Severe reversible health effects of concern. Reversible injury or moderate irreversible damage or impairment to one or more persons. Lost-time illness or injury.
· High: Life-threatening or irreversible health effects or disabling illness. Single fatality and/or severe irreversible damage or severe impairment to one or more persons.

Table 1 – Personnel and Safety
Environment
To estimate the environmental consequence of an event, consider the following extensions of the keywords given in the margin of the risk matrix:
· Low: Near-source, confined, short-term reversible impact (typically a week).
· Moderate: Near-source, confined impact with medium-term recovery (typically a month).
· High: Unconfined impact requiring long-term recovery and leaving residual damage (typically years).

Table 2 – Environmental Impact
Reputation / Community Trust
To estimate the reputation and community consequence of an event, consider the following extensions of the keywords given in the margin of the risk matrix:
· Low: Impact on the reputation of a Business Unit. Significant public exposure in local media. Tangible expressions of trust / mistrust amongst a few community members with some influence on public opinion and decision-makers.
· Moderate: Impact on the reputation of the Product Group. Comment from a national NGO that impacts credibility with neighbours / regional government. Public exposure in the national media. Tangible expressions of trust / mistrust amongst some community members with moderate influence on public opinion and decision-makers.
· High: Impact on the reputation of the Rio Tinto Group. Comment from an international NGO. Public exposure in international media. Tangible expressions of trust / mistrust amongst most community members with significant influence on decision-makers. Widespread loss / gain of trust across the community, setting the agenda for decision-makers and key stakeholders.

Table 3 – Community & Reputation
Business / Production Loss / Equipment Damage
To estimate the business consequence of an event, consider the following extensions of the keywords given in the margin of the risk matrix:
· Low: < 2.5% of operating cost; < 0.15 Mtpa; < 4 hrs downtime.
· Moderate: 2.5–7.5% of operating cost; 0.15–0.5 Mtpa; 4–8 hrs downtime.
· High: > 7.5% of operating cost; > 0.5 Mtpa; > 8 hrs downtime.

Table 4 – Business Impact
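The Table 4 thresholds can be applied mechanically. The sketch below is a hypothetical helper, not part of any tool: it assumes that a value exactly on a boundary (e.g. 2.5% or 4 hrs) counts as Moderate, and that the worst-scoring criterion sets the overall rating, since the table does not state either rule.

```python
from typing import Dict, Tuple

LEVELS = ("Low", "Moderate", "High")

# Hypothetical criterion names; (low limit, high limit) copied from Table 4.
THRESHOLDS: Dict[str, Tuple[float, float]] = {
    "op_cost_pct": (2.5, 7.5),     # % of operating cost
    "lost_mtpa": (0.15, 0.5),      # lost production, Mtpa
    "downtime_hours": (4.0, 8.0),  # hours of downtime
}


def criterion_level(value: float, low: float, high: float) -> int:
    """Return 0/1/2 for Low/Moderate/High; boundary values count as Moderate (assumption)."""
    if value < low:
        return 0
    if value > high:
        return 2
    return 1


def business_impact(op_cost_pct: float, lost_mtpa: float, downtime_hours: float) -> str:
    values = {
        "op_cost_pct": op_cost_pct,
        "lost_mtpa": lost_mtpa,
        "downtime_hours": downtime_hours,
    }
    # Assumption: the most severe criterion determines the overall rating.
    worst = max(criterion_level(values[k], *THRESHOLDS[k]) for k in THRESHOLDS)
    return LEVELS[worst]


print(business_impact(1.0, 0.05, 6.0))  # Moderate (downtime falls in the 4-8 hrs band)
```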
 Alarm Priority
 There are typically three alarm priorities, which determine the audible tone and the visual indication on the HMI
 The 3 Priorities
 Low
 Medium / Normal
 High / Critical
 Alarm Priority Matrix
 Is typically a 3 × 3 matrix with two dimensions: consequence and urgency of operator response
 It is used to determine the priority of an alarm during alarm rationalization
Urgency of controller response (rows) vs. consequence (columns: Low / Moderate / High):
· > 30 minutes (longest time): Low / Low / High
· 5 to 30 minutes (typical time): Low / High / Critical
· < 5 minutes (fastest time): High / Critical / Critical

Alarm Priority Matrix
The example above is for a mining plant, where the process response is usually slow. For a process plant, the urgency bands would be > 10 minutes, 2–10 minutes and < 2 minutes.
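During rationalization the matrix is simply a two-key lookup. The sketch below is illustrative only: the band names (slow/medium/fast) and function names are hypothetical, the defaults follow the mining-plant example, and placing exact boundary values (5 or 30 minutes) in the middle band is an assumption because the matrix does not say.

```python
from typing import Dict

# Rows: urgency band; columns: consequence (Low, Moderate, High).
PRIORITY_MATRIX: Dict[str, Dict[str, str]] = {
    "slow":   {"Low": "Low",  "Moderate": "Low",      "High": "High"},
    "medium": {"Low": "Low",  "Moderate": "High",     "High": "Critical"},
    "fast":   {"Low": "High", "Moderate": "Critical", "High": "Critical"},
}


def urgency_band(minutes: float, fast_limit: float = 5.0, slow_limit: float = 30.0) -> str:
    """Classify the time available for operator response.

    Defaults follow the mining-plant example; pass fast_limit=2, slow_limit=10
    for the process-plant bands. Boundary values fall in the middle band (assumption).
    """
    if minutes < fast_limit:
        return "fast"
    if minutes > slow_limit:
        return "slow"
    return "medium"


def alarm_priority(minutes: float, consequence: str,
                   fast_limit: float = 5.0, slow_limit: float = 30.0) -> str:
    band = urgency_band(minutes, fast_limit, slow_limit)
    return PRIORITY_MATRIX[band][consequence]


print(alarm_priority(15, "Moderate"))                               # High
print(alarm_priority(15, "Moderate", fast_limit=2, slow_limit=10))  # Low
```

Keeping the band limits as parameters lets the same table serve both the mining-plant and process-plant timings.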
 Alarm Priority Distribution
 Best practice is to distribute priority levels as follows:
 High = 5–15%
 Medium = 15 – 30%
 Low = 55 – 80%
Alarm Management Reports
 The fundamental alarm management report is the Alarm count over time
 Cluster Analysis
 Used to analyze chattering alarms.
 In Matrikon Alarm Manager, "Maximum number of events to analyze" is the number of events examined when checking for clusters. It is good practice to set this as high as possible, e.g. 10,000,000.
 A cluster is a group of alarm occurrences that repeat within the specified cluster time (successive windows); the cluster time is also called the time window. The time window is typically set to 60 seconds.
 From the report,
 The number of clusters tells how many clusters were found.
 The average cluster members value tells the average number of times the alarm chatters within a cluster.
 The most important analysis in the report is total alarm occurrences vs. chatter occurrences; their ratio is the cluster member %.
 Total alarm occurrences is the total number of alarms.
 Chatter occurrences is the total number of alarms that fall inside a chattering window.
 Cluster member % = chatter occurrences / total alarm occurrences × 100%
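The cluster statistics above can be approximated for a single alarm tag as follows. This is a minimal sketch, not Matrikon's algorithm: it assumes an occurrence joins the current cluster when it arrives within the time window of the previous occurrence, and that a cluster needs at least two members. The function names are hypothetical.

```python
from typing import Dict, List, Sequence


def find_clusters(timestamps: Sequence[float], window_s: float = 60.0) -> List[List[float]]:
    """Group occurrences of ONE alarm tag (times in seconds) into clusters."""
    clusters: List[List[float]] = []
    current: List[float] = []
    for t in sorted(timestamps):
        if current and t - current[-1] <= window_s:
            current.append(t)          # still chattering: extend the cluster
        else:
            if len(current) >= 2:      # close the previous group if it was a cluster
                clusters.append(current)
            current = [t]
    if len(current) >= 2:
        clusters.append(current)
    return clusters


def cluster_report(timestamps: Sequence[float], window_s: float = 60.0) -> Dict[str, float]:
    clusters = find_clusters(timestamps, window_s)
    total = len(timestamps)
    chatter = sum(len(c) for c in clusters)
    return {
        "clusters": len(clusters),
        "avg_cluster_members": chatter / len(clusters) if clusters else 0.0,
        "total_occurrences": total,
        "chatter_occurrences": chatter,
        "cluster_member_pct": 100.0 * chatter / total if total else 0.0,
    }


times = [0, 20, 45, 300, 1000, 1030, 1050, 1070, 5000]
print(cluster_report(times))  # 2 clusters, 7 of 9 occurrences are chatter
```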
 Symptomatic Analysis
 Is used to find an alarm that consistently tends to appear after a parent alarm occurs. This alarm (often referred to as the child alarm) is said to be predictable.
 Predictability measures the percentage of parent alarm occurrences that are followed by the child alarm. A predictability > 50% is the recommended threshold for treating the child alarm as predictable; such a child alarm can be automatically inhibited when the parent alarm occurs.
 Significance measures the percentage of child alarm occurrences that are preceded by the parent alarm.
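Predictability and significance can be sketched from two lists of alarm timestamps. The 60-second look-ahead/look-back window and the function names are assumptions for illustration; a commercial tool may define the window differently. A child occurring at the same instant as the parent counts as "following" here.

```python
from bisect import bisect_left
from typing import List, Sequence


def _any_in_range(sorted_times: List[float], lo: float, hi: float) -> bool:
    """True if some t in sorted_times satisfies lo <= t <= hi."""
    i = bisect_left(sorted_times, lo)
    return i < len(sorted_times) and sorted_times[i] <= hi


def predictability(parent: Sequence[float], child: Sequence[float], window_s: float = 60.0) -> float:
    """% of parent occurrences followed by the child within window_s seconds."""
    if not parent:
        return 0.0
    child_sorted = sorted(child)
    hits = sum(_any_in_range(child_sorted, p, p + window_s) for p in parent)
    return 100.0 * hits / len(parent)


def significance(parent: Sequence[float], child: Sequence[float], window_s: float = 60.0) -> float:
    """% of child occurrences preceded by the parent within window_s seconds."""
    if not child:
        return 0.0
    parent_sorted = sorted(parent)
    hits = sum(_any_in_range(parent_sorted, c - window_s, c) for c in child)
    return 100.0 * hits / len(child)


parent = [0, 100, 200, 300]
child = [5, 110, 500]
p = predictability(parent, child)  # 50.0: two of four parents are followed by the child
s = significance(parent, child)    # about 66.7: two of three children follow a parent
print(p, s, "inhibit child" if p > 50.0 else "keep child")
```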
 Total Alarm Count
 Sum of Audible Alarms (exclude alarms that default filters apply to) between the start and end time.
 Total Event Count
 Sum of all Events (Alarms, Return to Normals, Acknowledgements, Operator Actions, System Messages, etc) between the start and end time. Exclude events that default filters apply to.
 Total Intervention Count
 Sum of all Interventions (message type is Operator Action) between the start and end time. Exclude events that default filters apply to.
 Alarm Rate
 The Alarm Rate is the average number of alarms per hour for the selected areas (excluding alarms to which default filters apply), divided by the total number of operators assigned to those areas.
 Peak Alarm Rate
 The Peak Alarm Rate divides the data into 10-minute slices. Take the maximum number of alarms in any 10-minute slice for the selected areas (excluding alarms to which default filters apply), divide by the total number of operators assigned to those areas, and multiply the result by 6 to obtain an hourly peak alarm rate.
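Both rates can be computed from a list of alarm timestamps. This sketch assumes the timestamps (in seconds) have already had the default filters applied and that 10-minute slices are aligned to the start of the interval; the function names are hypothetical.

```python
from collections import Counter
from typing import Sequence

SLICE_S = 600  # one 10-minute slice, in seconds


def alarm_rate(alarm_times: Sequence[float], start: float, end: float, operators: int) -> float:
    """Average alarms per hour per operator over [start, end)."""
    count = sum(1 for t in alarm_times if start <= t < end)
    hours = (end - start) / 3600.0
    return count / hours / operators


def peak_alarm_rate(alarm_times: Sequence[float], start: float, end: float, operators: int) -> float:
    """Busiest 10-minute slice, per operator, multiplied by 6 to give an hourly rate."""
    per_slice = Counter(int((t - start) // SLICE_S) for t in alarm_times if start <= t < end)
    busiest = max(per_slice.values(), default=0)
    return busiest / operators * 6


# One hour, two operators: 10 alarms in the first slice, then one in each of three later slices.
times = [60 * i for i in range(10)] + [700, 1300, 2000]
print(alarm_rate(times, 0, 3600, 2))       # 6.5
print(peak_alarm_rate(times, 0, 3600, 2))  # 30.0
```

Following the Average and Maximum Alarm Rate (10 mins) definitions, dividing these two results by 6 gives the 10-minute equivalents (about 1.08 and 5.0 here).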
 Percent Upset (Percent Hours in Burst Mode)
 Divide the data into 10-minute slices. If a slice contains more than 5 alarms, flag it as a burst.
 Percent Upset is the percentage of 10-minute slices in which more than 5 alarms were "annunciated" per operator.
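A sketch of the Percent Upset calculation, assuming filtered alarm timestamps in seconds and ignoring any trailing partial slice (the text does not say how a partial slice is handled). The function name and parameters are hypothetical.

```python
from collections import Counter
from typing import Sequence


def percent_upset(alarm_times: Sequence[float], start: float, end: float, operators: int,
                  threshold: float = 5, slice_s: float = 600) -> float:
    """% of complete 10-minute slices with more than `threshold` alarms per operator."""
    n_slices = int((end - start) // slice_s)  # trailing partial slice ignored (assumption)
    if n_slices == 0:
        return 0.0
    stop = start + n_slices * slice_s
    counts = Counter(int((t - start) // slice_s) for t in alarm_times if start <= t < stop)
    bursts = sum(1 for i in range(n_slices) if counts[i] / operators > threshold)
    return 100.0 * bursts / n_slices


# Slice 0 has 7 alarms, slice 3 has 6, slice 5 has exactly 5 (not a burst).
times = [10 * i for i in range(7)] + [1800 + i for i in range(6)] + [3000 + i for i in range(5)]
print(percent_upset(times, 0, 3600, 1))  # 2 of 6 slices are bursts, about 33.3
```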
 Intervention Rate
 Intervention Rate = Sum of Interventions (message type is Operator Action) for selected areas during the interval / Total Number of operators assigned to those areas / Number of hours in the interval.
 Intervention to Alarm Ratio
 Intervention to Alarm Ratio = Intervention Rate / Alarm Rate
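The two intervention formulas above can be sketched as follows. The event record layout (timestamp, message type) and the message-type strings other than "Operator Action" are hypothetical; returning None when there were no alarms is a design choice, since the ratio is undefined in that case.

```python
from typing import Iterable, Optional, Tuple

# Hypothetical record layout: (timestamp in seconds, message type).
Event = Tuple[float, str]


def intervention_rate(events: Iterable[Event], operators: int, hours: float) -> float:
    """Operator Action messages per operator per hour."""
    n = sum(1 for _, msg_type in events if msg_type == "Operator Action")
    return n / operators / hours


def intervention_to_alarm_ratio(i_rate: float, a_rate: float) -> Optional[float]:
    """Intervention Rate / Alarm Rate; None when there were no alarms."""
    return i_rate / a_rate if a_rate > 0 else None


events = [(0, "Alarm"), (12, "Acknowledgement"), (30, "Operator Action"), (95, "Operator Action")]
i_rate = intervention_rate(events, operators=1, hours=1.0)  # 2.0
print(intervention_to_alarm_ratio(i_rate, a_rate=4.0))       # 0.5
```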
 Priority Distribution
 Priority Distribution represents the condition field as a percentage. Results should match the "Alarms By Condition Range" in Excel.
 A good priority distribution is as follows:
 Critical Alarm = 0
 High Alarms 0–5%
 Medium Alarms 5–15%
 Low Alarms >80%
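Checking an alarm log against these targets is a count and a range comparison. In the sketch below the target dictionary mirrors the list above; the High/Medium/Low ranges given under Alarm Priority Distribution could be passed in instead. Function names are hypothetical.

```python
from collections import Counter
from typing import Dict, Iterable, Tuple

# Target ranges in percent: (minimum, maximum).
TARGETS: Dict[str, Tuple[float, float]] = {
    "Critical": (0.0, 0.0),
    "High": (0.0, 5.0),
    "Medium": (5.0, 15.0),
    "Low": (80.0, 100.0),
}


def priority_distribution(priorities: Iterable[str]) -> Dict[str, float]:
    """Percentage of alarms at each priority level listed in TARGETS."""
    counts = Counter(priorities)
    total = sum(counts.values())
    if total == 0:
        return {p: 0.0 for p in TARGETS}
    return {p: 100.0 * counts[p] / total for p in TARGETS}


def within_targets(dist: Dict[str, float],
                   targets: Dict[str, Tuple[float, float]] = TARGETS) -> Dict[str, bool]:
    return {p: lo <= dist.get(p, 0.0) <= hi for p, (lo, hi) in targets.items()}


dist = priority_distribution(["Low"] * 85 + ["Medium"] * 10 + ["High"] * 5)
print(dist)
print(within_targets(dist))  # every level is within its target range
```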
 Top 20 Alarm Percent
 Sum of 20 "Most Frequent Alarms" divided by the total number of alarms in the time span selected. Filters used should apply to both the numerator and the denominator.
 Top 20 Interventions Percent
 Sum of 20 "Most Frequent Interventions" divided by the total number of interventions in the time span selected. Filters used should apply to both the numerator and the denominator.
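Both Top 20 metrics reduce to the same calculation over a list of tags (one entry per alarm or per intervention). Because the numerator and denominator come from the same filtered list, the filters automatically apply to both. The tag names in the example are hypothetical.

```python
from collections import Counter
from typing import Iterable


def top_n_percent(tags: Iterable[str], n: int = 20) -> float:
    """Share (%) of all occurrences contributed by the n most frequent tags."""
    counts = Counter(tags)
    total = sum(counts.values())
    if total == 0:
        return 0.0
    top = sum(count for _, count in counts.most_common(n))
    return 100.0 * top / total


print(top_n_percent(["FIC-101"] * 3 + ["LAH-200"], n=1))  # 75.0
```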
 Average Alarm Rate (10 mins)
 This calculation is performed exactly as Alarm Rate. Divide the result by 6 for a 10 minute slice.
 Maximum Alarm Rate (10 mins)
 This calculation is performed exactly as Peak Alarm Rate, however do not multiply the final result by 6 as you would in Peak Alarm Rate.
 Average Alarm Rate (Daily)
 Average Alarm Rate (Daily) = Sum(Total # of Audible Alarms) / Total Number of Assigned Operators / # of Days (for intervals shorter than 1 day, use a fraction of a day, e.g. 12 hours = 0.5 days)
 Maximum Alarm Rate (Daily)
 If the Interval is 1 day or less, the Maximum Alarm Rate will be the same as the Average Alarm Rate (Daily).
 If the Interval is greater than 1 day, then:
 For each day in the selected interval, calculate Sum(Total # of Audible Alarms for that day) / Total Number of Assigned Operators. The Maximum Alarm Rate (Daily) is the largest of these daily values.
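A sketch of both daily rates, assuming filtered audible-alarm timestamps in seconds and days aligned to the start of the interval. For multi-day intervals, a trailing partial day is treated as its own bucket (an assumption; the text does not cover it). The function name is hypothetical.

```python
from collections import Counter
from typing import Sequence, Tuple

DAY_S = 86400  # seconds per day


def daily_alarm_rates(alarm_times: Sequence[float], start: float, end: float,
                      operators: int) -> Tuple[float, float]:
    """Return (Average Alarm Rate (Daily), Maximum Alarm Rate (Daily)) per operator."""
    in_range = [t for t in alarm_times if start <= t < end]
    days = (end - start) / DAY_S  # fractional when the interval is shorter than a day
    average = len(in_range) / operators / days
    if days <= 1:
        return average, average  # for 1 day or less, the two rates are the same
    per_day = Counter(int((t - start) // DAY_S) for t in in_range)
    maximum = max(per_day.values(), default=0) / operators
    return average, maximum


times = ([10 * i for i in range(100)]
         + [DAY_S + 10 * i for i in range(300)]
         + [2 * DAY_S + 10 * i for i in range(200)])
print(daily_alarm_rates(times, 0, 3 * DAY_S, operators=2))  # (100.0, 150.0)
```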
 Intervention Rate (Daily)
 Daily Intervention Rate = Sum of Interventions (message type is Operator Action) for selected areas during selected interval / Total Number of Operators for those areas / Number of days in the interval.
 Intervention Rate (10 mins)
 10 min Intervention Rate = Sum of Interventions (message type is Operator Action) for selected areas during the selected interval / Total Number of Operators for those areas / Number of 10-minute slices in the interval (6 per hour).
 Percent Time < 1 Interventions
 Number of 10-minute intervals where "Intervention Rate (10 mins)" is less than or equal to 1, divided by the total number of 10-minute intervals, × 100%.
 This calculation is already normalized per operator.
 Percent Time > 10 Interventions
 Number of 10-minute intervals where "Intervention Rate (10 mins)" is greater than or equal to 10, divided by the total number of 10-minute intervals, × 100%.
 This calculation is already normalized per operator.
 Percent Time 1–10 Interventions
 100% − (Percent Time < 1 Interventions) − (Percent Time > 10 Interventions)
 Percent Time < 1 Alarms
 Number of 10-minute intervals where "Average Alarm Rate (10 mins)" is less than or equal to 1, divided by the total number of 10-minute intervals, × 100%.
 This calculation is already normalized per operator.
 Percent Time > 10 Alarms
 Number of 10-minute intervals where "Average Alarm Rate (10 mins)" is greater than or equal to 10, divided by the total number of 10-minute intervals, × 100%.
 This calculation is already normalized per operator.
 Percent Time 1–10 Alarms
 100% − (Percent Time < 1 Alarms) − (Percent Time > 10 Alarms)
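The "Percent Time" metrics for alarms and interventions share one calculation: normalize each 10-minute slice per operator, then count slices in each band. The sketch follows the written definitions (≤ 1 and ≥ 10) rather than the strict inequalities in the headings. Function names are hypothetical, and incomplete trailing slices are ignored (an assumption).

```python
from typing import List, Sequence, Tuple


def per_slice_rates(times: Sequence[float], start: float, end: float, operators: int,
                    slice_s: float = 600) -> List[float]:
    """Count events in each complete 10-minute slice and normalize per operator."""
    n = int((end - start) // slice_s)
    counts = [0] * n
    for t in times:
        i = int((t - start) // slice_s)
        if 0 <= i < n:
            counts[i] += 1
    return [c / operators for c in counts]


def percent_time_bands(rates: Sequence[float]) -> Tuple[float, float, float]:
    """Return (% slices <= 1, % slices >= 10, % in the 1-10 band)."""
    if not rates:
        return 0.0, 0.0, 0.0
    low = 100.0 * sum(1 for r in rates if r <= 1) / len(rates)
    high = 100.0 * sum(1 for r in rates if r >= 10) / len(rates)
    return low, high, 100.0 - low - high


print(percent_time_bands([0, 0.5, 1, 3, 12, 10, 4, 2]))  # (37.5, 25.0, 37.5)
```

The same functions work for alarm timestamps or Operator Action timestamps; only the input list changes.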
 Performance Category