eLTC, Markus Zerlauth, 7th March 2008
Post Mortem System (during HWC and Beam operation)
Markus Zerlauth, AB-CO-MI
Acknowledgments: Jorg, Verena, Vito, Adriaan, Nikolai and many others
Outline
PM System during Hardware Commissioning
– Automated testing
– Application software
PM System towards parallel commissioning and beam operation
– Infrastructure
– Application software for beam PMA
Conclusions and outlook
PM System during HWC
Post Mortem Analysis during HWC
During HWC, each circuit follows a pre-defined set of current cycles to validate the functionality of powering equipment and protection systems
LSA test plan per circuit type (example: individually powered quads in sector 45)
HWC sequencer executes the tests and collects the PMA results before sending data to MTF
Hand-shake mechanism between sequencer and PMA for increased automation
Main requirement during HWC: efficient analysis and validation of specific test cycles (prior knowledge about what to expect!)
3. PNO.2 auto analysis
PM system and PM Analysis for HWC
Courtesy of PM and sequencer teams
Executed test creates an analysis request plus a data completeness check (collection of related PM data)
Test is analysed and approved in the PM Event Handler (dedicated tools for equipment experts and a general PM data viewer provided by CO-MA)
– Pass / Fail decision to continue / repeat the test step
PM System towards Beam (and parallel HWC)
Infrastructure
PM infrastructure upgrade – Preparing for the full LHC load
[Diagram: PO/QPS gateway, … → n FE servers → BE server (shown twice)]
Redundant and scalable services in CCR and TCR, connected to independent network resources
Redundancy for client connections
Multiple Front-end (FE) servers for primary data storage of clients
Back-end (BE) servers with large disks for complete data image
Single Proliant cs-ccr-pm1
Until Mar 08: single server, single process on cs-ccr-pm1
Single points of failure due to dependencies in CCR (network, upgrades, etc.)
Small data volume (250GB in 3 years)
PM infrastructure - miscellaneous
Today:
– Switch to the new platform was performed successfully this week; the platform can now be scaled transparently to the coming needs
– HWC clients have started to re-compile the operational GW to use the new client lib
– Upgrade is a major improvement in terms of reliability, availability and performance
Further infrastructure work during 2008:
– Scalability / load tests (some 10 GB / dump – BLM data concentrator)
– DB catalogue of PM dumps (including PM data before Mar 2008)
– Dependency studies of major clients (machine protection systems) on network, mains supply, etc. (watch out for correlated failures!)
– In collaboration with CO-FE, feasibility study launched for the use of local memory in FE
– Further extension of PM client lib towards use of multiple FE servers per client (>2) and better automated load sharing
– Set of monitoring and data consistency tools
– Decision on data lifetime on various system levels (e.g. CASTOR as long-term archive)
PM System towards Beam
Application Software
Main Requirements for beam PMA:
– needs to reveal cause of emergency beam abort / possible equipment damage to improve operational procedures and protection systems
• Initiating event
• Event sequence leading to dump/incident
– Validate correct functioning of protection systems (redundancy within system, etc.)
– Automated analysis modules in view of systems and data volume
Main difficulty & difference to HWC: No prior knowledge about what will happen, many more systems involved, no ‘good’ use-cases yet (-> experience)
Operations crew needs clear indication after an emergency beam dump, whether the machine is ready to proceed or not
Application SW for LHC Beam PMA
ES: Application SW for LHC Beam PMA
Likely use-case
Initiating event: Power converter failure (e.g. due to water fault) in recombination dipole RD1.LR5
Possible event sequences derived from the BIS:
– PC (interlocked circuit) > WIC/PIC > BIC > dump triggered
– Current decay > FMCM > BIC > dump triggered
– PC > orbit change:
• Beam loss @ collimators > BLMs > BIC > dump triggered
• Fast orbit change > BPM interlock > BIC > dump triggered
Multiple triggers arriving at BIS
– Even for ‘simple’ use case the event sequence might not be conclusive
– Which one was first? Depends on reaction and transmission times, thresholds, accuracy of internal time-stamps, …
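The ambiguity between multiple triggers can be illustrated with a minimal sketch. The system names are taken from the use-case above; the time-stamps, accuracies and the function names (`order_triggers`, `first_is_conclusive`) are hypothetical and only serve the illustration. The first trigger is treated as conclusive only if it precedes every other trigger by more than the combined time-stamp accuracy.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Trigger:
    source: str      # system that raised the trigger, e.g. "FMCM"
    t_us: float      # time-stamp in microseconds (hypothetical value)
    sigma_us: float  # assumed time-stamp accuracy in microseconds


def order_triggers(triggers):
    """Return the triggers sorted by nominal time-stamp."""
    return sorted(triggers, key=lambda t: t.t_us)


def first_is_conclusive(triggers):
    """True if the earliest trigger leads all others by more than the combined accuracy."""
    ordered = order_triggers(triggers)
    if len(ordered) < 2:
        return True
    first = ordered[0]
    return all(
        other.t_us - first.t_us > first.sigma_us + other.sigma_us
        for other in ordered[1:]
    )


# Hypothetical triggers arriving at the BIS for one dump.
triggers = [
    Trigger("BLM", 1250.0, 40.0),
    Trigger("FMCM", 1000.0, 100.0),
    Trigger("PIC", 1100.0, 50.0),
]
sequence = [t.source for t in order_triggers(triggers)]
conclusive = first_is_conclusive(triggers)
print(sequence, conclusive)
```

With these assumed numbers the FMCM trigger is nominally first, but the PIC trigger lies within the combined accuracy, so the sequence is flagged as not conclusive.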
Data flow and analysis
[Diagram, systems providing PM data: BLM, BPM, FGC, QPS, PIC/WIC, BIS, …, XPOC]
Data completeness and consistency check at system and global level (minimum data, configurable)
Upon beam dump / self triggering, systems start pushing data to PM system, Logging, Alarms, etc…
Individual System Analysis/Checks: Validation of machine protection features, pre-analysis of PM buffers into result files, flagging of interesting systems/data reduction, database catalogue
Global PM Analysis: global event sequence, summaries, advised actions, event DB, …
[Diagram labels: I/XPOC, IPOC-BIS, event sequence, circuit events, BLM/BPM > threshold, FMCM, …; global event sequence, advised actions, machine protection OK]
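A configurable completeness check of the kind mentioned above could look roughly like the following sketch. The minimum buffer counts, the file names and the function `check_completeness` are assumptions made for illustration, not the actual PM configuration.

```python
# Hypothetical configuration: minimum number of PM buffers expected per system.
MIN_BUFFERS = {"BIS": 1, "BLM": 2, "BPM": 2, "FGC": 1}


def check_completeness(received, minimum=None):
    """Return {system: number of missing buffers} for systems below their minimum."""
    if minimum is None:
        minimum = MIN_BUFFERS
    missing = {}
    for system, required in minimum.items():
        got = len(received.get(system, []))
        if got < required:
            missing[system] = required - got
    return missing


# Hypothetical set of buffers received after a dump.
received = {
    "BIS": ["bis_0.pm"],
    "BLM": ["blm_0.pm"],
    "BPM": ["bpm_0.pm", "bpm_1.pm"],
}
missing = check_completeness(received)
print(missing)
```

An empty result would mean the minimum data set is present; here one BLM buffer and the FGC buffer are reported as missing.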
Open Post Mortem framework
“Open” to cope with diversity:
– Large variety of systems and data sources which provide PM data
– External systems with PM-like functionality (e.g. I/XPOC-LBDS, IPOC-BIS, HWC tools, etc..) that need to be integrated
– Multiple analysis modules, contributed by different parties, written in different languages (C/C++/JAVA/Labview,...) that work on the same data and need to be executed in the right order
– Different users that want to use the system for different purposes (operators/PM team/experts)
How: coherent overall architecture with:
– Support to plug in different analysis modules into the analysis data flow (order of execution and the data they should process)
– Standardized data structures (result files, XML or DB) for data exchange between the different analysis modules
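The plug-in idea can be sketched as a small registry in which each analysis module declares which modules must run before it. Everything here is hypothetical (the `analysis_module` decorator, the module names, the shared result dictionary standing in for standardized result files); real modules written in C/C++/Java/LabVIEW would exchange data through files, XML or a DB rather than an in-memory dictionary.

```python
from graphlib import TopologicalSorter

# Hypothetical registry: module name -> (callable, names of modules that must run first).
REGISTRY = {}


def analysis_module(name, after=()):
    """Decorator registering an analysis module and its prerequisites."""
    def register(func):
        REGISTRY[name] = (func, tuple(after))
        return func
    return register


@analysis_module("completeness")
def completeness(results):
    results["completeness"] = "ok"


@analysis_module("bis_sequence", after=["completeness"])
def bis_sequence(results):
    results["bis_sequence"] = ["PIC", "BIC"]


@analysis_module("global_summary", after=["bis_sequence", "completeness"])
def global_summary(results):
    results["global_summary"] = f"{len(results['bis_sequence'])} events"


def run_all():
    """Run every registered module in dependency order, sharing one result dictionary."""
    graph = {name: set(deps) for name, (_, deps) in REGISTRY.items()}
    order = list(TopologicalSorter(graph).static_order())
    results = {}
    for name in order:
        REGISTRY[name][0](results)
    return order, results


order, results = run_all()
print(order)
print(results["global_summary"])
```

Declaring prerequisites per module, instead of keeping one hard-coded sequence, lets independently contributed modules be added without editing a central list.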
Data flow and analysis – First vertical slice
[Diagram, systems providing PM data: BLM, BPM, FGC, QPS, PIC/WIC, BIS, …, XPOC]
Data completeness and consistency check (configurable)
Upon beam dump / self triggering, systems start pushing data to PM system, Logging, Alarms, etc…
Individual System Analysis/Checks: Validation of machine protection features, pre-analysis of PM buffers into result files, flagging of interesting systems/data reduction, database catalogue
Global PM Analysis: global event sequence, summaries, advised actions, event DB, …
[Diagram labels: X/IPOC, IPOC-BIS, event sequence, circuit events, BLM/BPM > threshold, FMCM, …; global event sequence, advised actions, machine protection OK]
Individual System Analysis (event building) for HWC
First prototyping has been done based on HWC experience
In addition to the existing classification / system, a fully automated process will create an SCEvent classification (Single Circuit Event):
– Relating all equipment data belonging to this event (Layout DB)
– Identifying the event sequence by retrieving the history buffer of interlock systems (Logging DB)
– (Pre-)analysis of the event based on event sequence and PM data
– Event summary for DB upload and further (global) analysis
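As a rough illustration of event building, the sketch below groups PM dumps into per-circuit events. It is not the SCEvent implementation: the `LAYOUT` dictionary is a hypothetical stand-in for a Layout DB lookup, and the equipment names, time-stamps and the one-second grouping window are assumptions.

```python
from collections import defaultdict

# Hypothetical stand-in for a Layout DB lookup: equipment name -> circuit name.
LAYOUT = {
    "FGC_A": "CIRCUIT_1",
    "QPS_A": "CIRCUIT_1",
    "FGC_B": "CIRCUIT_2",
}


def build_sc_events(dumps, window_s=1.0):
    """Group (equipment, timestamp) PM dumps into per-circuit events.

    Dumps of the same circuit join one event as long as each dump lies
    within window_s seconds of the previous dump in that event.
    """
    by_circuit = defaultdict(list)
    for equipment, t in dumps:
        by_circuit[LAYOUT[equipment]].append((t, equipment))

    events = []
    for circuit, items in by_circuit.items():
        items.sort()
        current = [items[0]]
        for item in items[1:]:
            if item[0] - current[-1][0] <= window_s:
                current.append(item)
            else:
                events.append({"circuit": circuit, "sequence": [e for _, e in current]})
                current = [item]
        events.append({"circuit": circuit, "sequence": [e for _, e in current]})
    return events


# Hypothetical dumps: (equipment, time-stamp in seconds).
dumps = [("QPS_A", 10.00), ("FGC_A", 10.02), ("FGC_B", 10.50), ("FGC_A", 25.00)]
events = build_sc_events(dumps)
for event in events:
    print(event)
```

Each resulting event carries the circuit and the time-ordered list of equipment that contributed data, which is the kind of summary that could then be uploaded for further analysis.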
Milestones
First Infrastructure upgrade completed (1st priority) March 08
– Start testing BI data transfer to PM system, scalability tests with BLM (Global PM trigger in timing)
April 08
– With help of OP and equip experts, work out a series of use cases for first months/years of operation with beam
– Specification of data completeness and result file structure for minimum Individual System Analysis (tbd with LBDS, BIS, HWC, BI, ...)
July 08
– Basic PM framework including individual system analysis for HWC, BIS, XPOC, BLM and BPM
– GUI for first event sequence and results, standardised data viewers
2nd half 08
– First global PMA modules
Conclusions and Outlook
HWC is useful to gain experience for the use of PM tools during operation
Variety of systems and data sources requires an open framework to accept user-provided code
Focus on standardisation and first vital individual system / POC checks for the first months of operation
With experience, build up more advanced modules for global PMA
As recent focus has been HWC, lots of work is still to be done, but we have a plan and a motivated team across CO and OP (Verena, Jorg, Vito, Roman, Adriaan, Hubert, Dmitriy, Nikolai, Markus, ...)
Thanks a lot for your attention - Questions?