TRACKING OF FAULTS AND FOLLOW-UP
Accelerator Fault Tracking project
Jakub Janczyk (TE-MPE-PE / BE-CO-DS)
with input from: Andrea Apollonio, Chris Roderick, Rudiger Schmidt, Benjamin Todd, Daniel Wollmann
R2E/Availability Workshop 2
Agenda
• Purpose of fault tracking
• What has been done in the Past
• Accelerator Fault Tracking project – plans & status
• Summary
10/14/2014
Purpose of fault tracking
Complete and consistent tracking makes it possible to:
• Identify problems as early as possible to allow for timely mitigation
• Identify key issues which will limit performance of accelerators or equipment in the future (Run2, Run3, HL-LHC)
• Increase availability, in both the short and long term, by dealing with issues ASAP
Track Faults in two areas:
1. Directly affecting accelerator operation – identify root causes (e.g. R2E effects, glitches in electrical network, etc.)
2. Equipment (electronic) faults independently of immediate impact on accelerator operation
What has been done in the Past
• A lot of different tools for logging of faults, used by different teams:
  • eLogbook, Post-Mortem, RadWG page, tools in equipment groups (JIRA, Excel, OneNote, eLogbook)
• A lot of effort was required from individual teams/working groups to gather and exploit fault data
• Nevertheless, difficult to get a consistent picture
Credit: M. Brugger
Cardiogram: the "life" of the LHC from an operational point of view
• Graphical analytic tool for combining data from different sources
• Initially created by members of Availability WG: B. Todd, L. Ponce, A. Apollonio
• Tedious work to gather and prepare all the necessary data: several months for the 2010-2012 cardiogram
Cardiogram - example
Data shown in the cardiogram (figure labels):
• Accelerator Mode (Proton Physics, Ion Physics, etc.)
• Access
• Fill Number
• Particle Momentum
• Beam Intensities
• Stable Beams
• PM Beam Dump
• Beam Dump Classification
• Fault
• Fault Lines (Systems / Fault Classifications)
Credit: AWG
Cardiogram – data preparation
Credit: Benjamin Todd
Accelerator Fault Tracking project
Project launched February 2014 (BE/CO, BE/OP, TE/MPE collaboration)
Based on initial inputs from:
• Evian Workshops
• Availability Working Group
• Workshop on Machine Availability & Dependability for Post-LS1 LHC
• BE/OP
Goals:
• Capture consistent and complete fault data
• Facilitate fault tracking from perspective of all interested parties (OP, equipment groups, working groups)
• Single source of data – easier to complete, clean and analyse.
• Provide consistent / standardized statistics, analyses, reports for different users (8:30 meetings, weekly reports / summaries)
• Interactive overview of faults (cardiogram on demand)
• Proactively identify incomplete data
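The goal of proactively identifying incomplete data could look roughly like the sketch below. It is only an illustration: the record fields (`system`, `start`, `end`, `root_cause`) and the function names are assumptions made for this example and are not taken from the AFT data model.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Dict, List, Optional


@dataclass
class FaultRecord:
    """Hypothetical fault record; field names are illustrative, not the AFT schema."""
    fault_id: int
    system: Optional[str]
    start: Optional[datetime]
    end: Optional[datetime]
    root_cause: Optional[str]


def missing_fields(record: FaultRecord) -> List[str]:
    """Return the names of the checked fields that are still empty."""
    checked = ("system", "start", "end", "root_cause")
    return [name for name in checked if getattr(record, name) in (None, "")]


def incomplete_records(records: List[FaultRecord]) -> Dict[int, List[str]]:
    """Map fault_id to its missing field names, for records with at least one gap."""
    report: Dict[int, List[str]] = {}
    for record in records:
        gaps = missing_fields(record)
        if gaps:
            report[record.fault_id] = gaps
    return report
```

A report of this kind could be sent back to whoever entered the fault, so that gaps are closed while the details are still fresh.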
Plans (as presented by Chris Roderick @ LMC 30-04-2014)
Provide infrastructure to consistently & coherently capture, persist and make available accelerator fault data for further analysis.
Foreseen project stages:
1. Put in place a fault tracking infrastructure to capture LHC fault data from an operational perspective
• Enable data exploitation by others (e.g. AWG and OP) to identify areas to improve accelerator availability for physics
• Ready before LHC beam commissioning
• Infrastructure should already support capture of equipment group fault data, but not primary focus
2. Focus on equipment group fault data capture
3. Explore integration with other CERN data management systems (e.g. Infor EAM)
• potential to perform deeper analyses of system and equipment availability
• in turn - start predicting and improving dependability
To support data analysis, the AFT data extraction infrastructure should also provide data complementary to the actual fault data, such as accelerator operational modes and states.
Scope:
Initial focus on LHC, but aim to provide a generic infrastructure capable of handling fault data of any CERN accelerator.
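As an illustration of why complementary data such as accelerator modes helps analysis, the sketch below looks up the mode that was active when a fault started. The data layout (a time-sorted list of mode changes) and the function name `mode_at` are assumptions made for this example; they do not describe the actual AFT extraction interface.

```python
from bisect import bisect_right
from datetime import datetime
from typing import List, Optional, Tuple

# (start time, mode name); each mode is assumed to last until the next entry starts.
ModeChange = Tuple[datetime, str]


def mode_at(mode_changes: List[ModeChange], when: datetime) -> Optional[str]:
    """Return the accelerator mode active at `when`, or None if `when`
    precedes the first entry. `mode_changes` must be sorted by start time."""
    starts = [start for start, _ in mode_changes]
    index = bisect_right(starts, when) - 1
    return mode_changes[index][1] if index >= 0 else None


# Illustrative values only, not real operational data.
mode_changes = [
    (datetime(2012, 6, 1, 0, 0), "Proton Physics"),
    (datetime(2012, 6, 3, 0, 0), "Ion Physics"),
]

fault_start = datetime(2012, 6, 2, 14, 30)
print(mode_at(mode_changes, fault_start))  # prints "Proton Physics"
```

With such a lookup, fault statistics can be split by operational mode without each analyst having to rebuild the mode history by hand.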
[Timeline figure: a "We are here..." marker placed along a Time axis]
Status
• AFT is under development – Web application, available for different users, and integration with eLogbook for LHC operators
• Functionalities available from day 1 will be as planned for first stage of the project
• AFT test version available
• We’re open to start discussion with equipment groups
Turnaround Time
Summary
• Consistent and complete tracking of faults is the key to identifying and efficiently mitigating issues
• The AFT will ease the recording of faults and their root causes in a complete and consistent way
• Run2 data will be essential to identify future performance/availability limitations towards HL-LHC
• Quality and completeness of the data require effort from all involved parties
• Open to discuss integration of equipment group data
Questions
Extra Slides
Roles and simplified workflow
[Figure: panels labelled 2010, 2011 and 2012]
Multiple failures
• It is easy to see when multiple failures occur at the same time, but it is not obvious whether they are related.
• One of the goals of the AFT project is to capture data that makes it possible to show the relations between faults.
Examples (figure annotations):
• Faults related: water leak, and problems caused by the water leak
• Faults not related: QPS failed and the rest of them are accesses in shadow
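The difference between related faults and faults that merely happen at the same time can be sketched with an explicit link between records. This is a minimal illustration under stated assumptions: the `caused_by` field and all names below are hypothetical and are not taken from the AFT data model.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import List, Optional


@dataclass
class Fault:
    """Hypothetical fault entry; `caused_by` is an assumed field for this example."""
    fault_id: int
    description: str
    start: datetime
    end: datetime
    caused_by: Optional[int] = None  # fault_id of the fault that triggered this one


def overlaps(a: Fault, b: Fault) -> bool:
    """True if the two faults were active at the same time."""
    return a.start < b.end and b.start < a.end


def consequences_of(root: Fault, faults: List[Fault]) -> List[Fault]:
    """Faults explicitly recorded as caused by `root` (direct links only)."""
    return [f for f in faults if f.caused_by == root.fault_id]


def concurrent_but_unlinked(root: Fault, faults: List[Fault]) -> List[Fault]:
    """Faults that overlap `root` in time but carry no recorded link to it."""
    return [
        f for f in faults
        if f.fault_id != root.fault_id
        and overlaps(root, f)
        and f.caused_by != root.fault_id
    ]
```

Time overlap alone only shows that faults were simultaneous; the recorded link is what lets an analysis attribute downtime to the root fault rather than counting each consequence separately.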
Access without faults
• In 2012, there were around 40 accesses without any associated fault
• The reasons for these accesses are not classified, but often something is repaired
• This is inconsistent data, and the cardiogram makes it possible to spot it
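A consistency check of this kind can also be automated. The sketch below flags access periods that do not overlap any recorded fault; it assumes that accesses and faults are available as simple (start, end) intervals, which is an assumption made for this example rather than a description of the actual data sources.

```python
from datetime import datetime
from typing import List, Tuple

Interval = Tuple[datetime, datetime]  # (start, end)


def intervals_overlap(a: Interval, b: Interval) -> bool:
    """True if the two time intervals share any period of time."""
    return a[0] < b[1] and b[0] < a[1]


def accesses_without_fault(
    accesses: List[Interval], faults: List[Interval]
) -> List[Interval]:
    """Return the accesses that do not overlap any recorded fault."""
    return [
        access for access in accesses
        if not any(intervals_overlap(access, fault) for fault in faults)
    ]
```

Each access returned by such a check is a candidate for follow-up: either a fault was not recorded, or the reason for the access needs its own classification.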
Access without faults - examples
Figure annotations:
• Few accesses: ATLAS, change of PC, repair of QPS, intervention on the crates of the BPMD
• LHCb: fixing muon detectors
• Accesses in shadow of QPS fail: QPS (reset cards), ALICE and CMS, Cryogenics (valve regulation), RF (replacing broken attenuator)
• ATLAS access