System Management for distributed DCS
Camilo Lara, KIP
CBM Conference 2006
Contents
• Introduction: Planned Controls Architecture
• Extended Controls Architecture
• The SysMES Project
  - System Properties
  - System Functionality
  - Architecture
• Current State
• Outlook
Planned Architecture
[Diagram: Detectors, DCS Boards (Linux) and a DAQ Component each running a Controls Server, GUI Controls Clients, and the Network, linked by the controls communication protocol.]
Planned Architecture
[Diagram: Detectors, DCS Boards (Linux) and a DAQ Component each running a Controls Server, GUI Controls Clients, and the Network, annotated with the data flow.]
• Controls servers obtain and save sensor data from the detectors
• Controls servers send information to the network using a controls communication protocol
• Controls clients get and display information from the network
Planned Architecture
[Diagram: Detectors, DCS Boards (Linux) and a DAQ Component each running a Controls Server, GUI Controls Clients, and the Network, annotated with the command path.]
• Commands can be sent from clients to servers
Controls Systems Characteristics
Controls system (e.g. EPICS)
Strong points
• Designed for the measurement and visualization of system information
• Very good scalability
• Very high data measurement rate
• Measured values build a real-time database
Weak points
• Normally static configuration
• Difficult to implement a high-availability infrastructure
• Limited interaction with the controls servers on the front end
• Limited possibilities for correlating information to detect undesirable states
• Limited facilities for automatic or manual reaction in case of failure
Extended Architecture
[Diagram: Detectors, DCS Boards and a DAQ Component each running a Controls Server, GUI Controls Clients, and the Network, extended by SysMES Clients and the SysMES Framework. Link labels: Message, Job.]
• Data is obtained from the detectors as before
• The SysMES client gets and stores information from the controls server
• SysMES clients and the framework communicate only if necessary
• A SysMES client can react automatically without framework interaction
• The SysMES framework sends a reaction to the clients
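
The split between local reaction and escalation to the framework can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the class names `Client` and `Framework`, the dictionary-shaped message, and the job name `inspect` are hypothetical and do not describe the actual SysMES interfaces.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class Framework:
    """Hypothetical stand-in for the management framework."""
    received: List[dict] = field(default_factory=list)

    def handle_message(self, message: dict) -> dict:
        # Store the message and answer with an (illustrative) job.
        self.received.append(message)
        return {"job": "inspect", "target": message["client"]}


@dataclass
class Client:
    """Hypothetical stand-in for a client next to a controls server."""
    name: str
    framework: Framework
    # Monitor name -> local reaction, used without contacting the framework.
    local_reactions: Dict[str, Callable[[float], str]] = field(default_factory=dict)
    actions: List[str] = field(default_factory=list)

    def on_value(self, monitor: str, value: float, limit: float) -> None:
        if value <= limit:
            return  # value is fine: no reaction, no network traffic
        reaction = self.local_reactions.get(monitor)
        if reaction is not None:
            # React locally, without framework interaction.
            self.actions.append(reaction(value))
            return
        # No local rule applies: send a message and act on the returned job.
        job = self.framework.handle_message(
            {"client": self.name, "monitor": monitor, "value": value}
        )
        self.actions.append(f"job:{job['job']}")


framework = Framework()
client = Client(
    name="dcs-board-01",
    framework=framework,
    local_reactions={"temperature": lambda value: "local:reduce-load"},
)
client.on_value("temperature", 85.0, limit=80.0)  # handled on the client
client.on_value("voltage", 5.9, limit=5.5)        # escalated to the framework
client.on_value("voltage", 5.0, limit=5.5)        # within limits, ignored
```

In this sketch only the second reading produces network traffic, which is the point of letting the client decide first.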
Distributed System Management
• Highly available architecture
  - Using clustering and redundancy (dynamic)
• Possibility to interact with the DCS Board using Jobs
  - Execution of binaries by the client (e.g. restart of the Controls Server)
• Possibility to reconfigure SysMES Clients/Servers on the fly
  - Changing the management capabilities without restart
• Possibility to recover a SysMES Client configuration or state on the fly
  - After a crash, a previous configuration and data state can be recovered
• Complex rule system for triggering on conditions
  - Complex rule triggering and reaction on the client or externally on the servers
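
Recovering a client configuration after a crash can be sketched as follows. This is a minimal illustration under stated assumptions: the class `ClientConfigStore`, the JSON file format, and the configuration keys are hypothetical and are not taken from the SysMES implementation. The write goes to a temporary file that is then renamed, so a crash during saving leaves the previous configuration intact.

```python
import json
import os
import tempfile


class ClientConfigStore:
    """Hypothetical on-disk store for the client's current configuration."""

    def __init__(self, path: str) -> None:
        self.path = path

    def save(self, config: dict) -> None:
        # Write to a temporary file first, then replace the old file in one step.
        directory = os.path.dirname(self.path) or "."
        fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
        with os.fdopen(fd, "w") as handle:
            json.dump(config, handle)
        os.replace(tmp_path, self.path)

    def recover(self, default: dict) -> dict:
        # After a restart, fall back to the default only if nothing was saved.
        try:
            with open(self.path) as handle:
                return json.load(handle)
        except FileNotFoundError:
            return dict(default)


workdir = tempfile.mkdtemp()
path = os.path.join(workdir, "client_config.json")
default_config = {"monitors": {}}

store = ClientConfigStore(path)
first_start = store.recover(default_config)  # nothing saved yet
store.save({"monitors": {"temperature": {"interval_s": 5}}})

# Simulate a crash and restart: a fresh object reads the saved state.
recovered = ClientConfigStore(path).recover(default_config)
```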
The SysMES Project
System Management for Embedded Systems
System properties
• Based on available, established standards
  - Interoperability and manufacturer independence
  - XML (Extensible Markup Language)
  - CIM (Common Information Model)
• Object-oriented modelling of the complete system
  - Simple modelling of the relationships between objects
  - Reusability
• Decentralization
  - Decentralized modelling
  - Decentralized storage of information
  - Decentralized and dynamic configuration management
  - Decentralized management
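
The idea of modelling resources and their relationships as objects and exchanging them as XML can be sketched in a few lines. The `Resource` class and the element and attribute names below are made up for illustration; they are not the CIM schema or the CIM-XML encoding.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass, field
from typing import List


@dataclass
class Resource:
    """Hypothetical managed resource with a containment relationship."""
    name: str
    kind: str
    children: List["Resource"] = field(default_factory=list)

    def to_xml(self) -> ET.Element:
        # Each resource becomes one element; children are nested inside it.
        element = ET.Element("resource", {"name": self.name, "kind": self.kind})
        for child in self.children:
            element.append(child.to_xml())
        return element


board = Resource(
    name="dcs-board-01",
    kind="DCSBoard",
    children=[Resource(name="controls-server", kind="ControlsServer")],
)
xml_text = ET.tostring(board.to_xml(), encoding="unicode")

# The receiving side can rebuild the structure from the text form.
parsed = ET.fromstring(xml_text)
```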
System properties
• High availability and scalability
  - No single point of failure
  - Redundant storage of information
  - Clustering of management resources and DB
  - Load balancing
• Flexibility
  - Platform independent
  - Self-management
  - Automatic reaction to triggering conditions
• Reliability
  - Transaction-based communication to avoid information loss
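
One common way to avoid information loss is to keep each message on the sender until the receiver acknowledges it. The sketch below shows that pattern only; it is not the SysMES protocol, and the names `Outbox` and `FlakyTransport` are hypothetical.

```python
from typing import Callable, Dict, List


class Outbox:
    """Keeps every message until the transport confirms delivery."""

    def __init__(self, send: Callable[[int, str], bool]) -> None:
        self._send = send
        self._pending: Dict[int, str] = {}
        self._next_id = 0

    def submit(self, payload: str) -> int:
        msg_id = self._next_id
        self._next_id += 1
        self._pending[msg_id] = payload  # stored before any send attempt
        return msg_id

    def flush(self) -> int:
        # Try to deliver everything; drop a message only after an acknowledgement.
        for msg_id, payload in list(self._pending.items()):
            if self._send(msg_id, payload):
                del self._pending[msg_id]
        return len(self._pending)


class FlakyTransport:
    """Test transport that fails the first attempt for every message."""

    def __init__(self) -> None:
        self.delivered: List[str] = []
        self._attempts: Dict[int, int] = {}

    def __call__(self, msg_id: int, payload: str) -> bool:
        self._attempts[msg_id] = self._attempts.get(msg_id, 0) + 1
        if self._attempts[msg_id] == 1:
            return False  # simulated network failure, no acknowledgement
        self.delivered.append(payload)
        return True


transport = FlakyTransport()
outbox = Outbox(transport)
outbox.submit("temperature high on dcs-board-01")
outbox.submit("controls server restarted")

pending_after_first = outbox.flush()   # both attempts fail, nothing is lost
pending_after_second = outbox.flush()  # retry succeeds
```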
System functionality
• Modelling
  - Object-oriented modelling of resources in UML
  - Creation of objects from this model
  - Transfer of objects to the management framework
• Monitoring
  - Monitoring occurs directly in the client
  - Interface to other monitoring systems
• Message generation
  - Evaluation of the measured values in the client
    • Dynamic decision of which values have to be processed
    • Prevention of management environment system overload
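
Evaluating values on the client so that only relevant ones become messages can be sketched with a simple range check plus deadband. This is one possible filtering scheme chosen for illustration, not necessarily the one used in SysMES; the class name, thresholds, and message fields are assumptions.

```python
from typing import List, Optional


class DeadbandMonitor:
    """Reports out-of-range values, but suppresses near-duplicate reports."""

    def __init__(self, low: float, high: float, deadband: float) -> None:
        self.low = low
        self.high = high
        self.deadband = deadband
        self._last_reported: Optional[float] = None

    def evaluate(self, value: float) -> Optional[dict]:
        if self.low <= value <= self.high:
            # Back in range: forget the last report, send nothing.
            self._last_reported = None
            return None
        if (
            self._last_reported is not None
            and abs(value - self._last_reported) < self.deadband
        ):
            return None  # too close to what was already reported
        self._last_reported = value
        return {"value": value, "state": "out-of-range"}


monitor = DeadbandMonitor(low=10.0, high=70.0, deadband=2.0)
readings = [40.0, 41.0, 71.0, 71.2, 75.0, 60.0, 72.0]

messages: List[dict] = []
for reading in readings:
    message = monitor.evaluate(reading)
    if message is not None:
        messages.append(message)
```

Of the seven readings, only three leave the client in this sketch, which keeps the load on the management environment low.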
System functionality
• Message Handling
  - Decentralized storage of messages in the DB and on the client
• Job Management
  - Communication with clients through Jobs
  - Different Job types
    • Configuration Jobs: e.g. changeMonitor
    • Update Jobs: e.g. addMonitor
    • Management Jobs: e.g. deleteMessages
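
How a client might dispatch the three example jobs can be sketched as a small handler table. Only the job names `changeMonitor`, `addMonitor`, and `deleteMessages` come from the slide; the job format, the arguments, and the status strings are assumptions made for this illustration.

```python
from typing import Callable, Dict, List


class SketchClient:
    """Hypothetical client that executes jobs by name."""

    def __init__(self) -> None:
        self.monitors: Dict[str, int] = {}  # monitor name -> interval in seconds
        self.messages: List[str] = []
        self._handlers: Dict[str, Callable[..., None]] = {
            "addMonitor": self._add_monitor,          # Update Job
            "changeMonitor": self._change_monitor,    # Configuration Job
            "deleteMessages": self._delete_messages,  # Management Job
        }

    def _add_monitor(self, name: str, interval_s: int) -> None:
        self.monitors[name] = interval_s

    def _change_monitor(self, name: str, interval_s: int) -> None:
        if name not in self.monitors:
            raise KeyError(name)
        self.monitors[name] = interval_s

    def _delete_messages(self) -> None:
        self.messages.clear()

    def execute(self, job: dict) -> str:
        handler = self._handlers.get(job["name"])
        if handler is None:
            return "rejected"  # unknown job type
        try:
            handler(**job.get("args", {}))
        except KeyError:
            return "failed"    # e.g. changing a monitor that does not exist
        return "done"


sketch_client = SketchClient()
sketch_client.messages.append("old message")
results = [
    sketch_client.execute({"name": "addMonitor", "args": {"name": "cpu", "interval_s": 10}}),
    sketch_client.execute({"name": "changeMonitor", "args": {"name": "cpu", "interval_s": 2}}),
    sketch_client.execute({"name": "changeMonitor", "args": {"name": "disk", "interval_s": 2}}),
    sketch_client.execute({"name": "deleteMessages"}),
    sketch_client.execute({"name": "unknownJob"}),
]
```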
System functionality
• Configuration Management
  - Client knows its current configuration state
  - Client stores its current configuration for recovery
  - All possible configurations are stored on the server
• Complex Rule Handling
  - Three-tier rule management system
    • Tier 1: rule management on the client (reaction < 10 ms)
    • Tier 2: simple rule management on the server (reaction < 300 ms)
    • Tier 3: complex rule management on the server using an expert system (reaction < 1 s)
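
The three tiers can be read as an escalation chain: an event is handled by the first tier that has a matching rule. The sketch below illustrates that reading only. The reaction-time budgets are the ones quoted on the slide; the rule functions, event fields, and reaction names are hypothetical, and the last tier is a plain function standing in for an expert system.

```python
from typing import Callable, List, Optional, Tuple

Rule = Callable[[dict], Optional[str]]


def client_rules(event: dict) -> Optional[str]:
    # Tier 1: simple threshold handled directly on the client.
    if event["monitor"] == "temperature" and event["value"] > 80:
        return "throttle"
    return None


def simple_server_rules(event: dict) -> Optional[str]:
    # Tier 2: simple rule evaluated on the server.
    if event["monitor"] == "controls_server" and event["value"] == "down":
        return "restart-controls-server"
    return None


def expert_rules(event: dict) -> Optional[str]:
    # Tier 3: placeholder for the expert system; always produces a reaction here.
    return "notify-operator"


# (tier name, reaction-time budget in seconds, rule function)
TIERS: List[Tuple[str, float, Rule]] = [
    ("client", 0.010, client_rules),
    ("server-simple", 0.300, simple_server_rules),
    ("server-expert", 1.000, expert_rules),
]


def react(event: dict) -> Tuple[str, float, str]:
    """Return the first tier that reacts, its budget, and the reaction."""
    for name, budget_s, rule in TIERS:
        reaction = rule(event)
        if reaction is not None:
            return name, budget_s, reaction
    raise RuntimeError("no tier produced a reaction")


hot = react({"monitor": "temperature", "value": 91})
down = react({"monitor": "controls_server", "value": "down"})
odd = react({"monitor": "fan_speed", "value": 0})
```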
Architecture – Physical View
[Diagram components: Full Client, Thin Client, AccessPoints, LAM, WAM, CIM Servers, DataBase Cluster, Admin GUI.]
Architecture – Logic View
[Diagram layers: Modelling, Clients, Management Framework.
Modules shown: Class Management Model, Object Management Model, XMI2MOF, Deploy Module, Message Module, Job Module, Rule Module, Connection Module, CIM Connector, Database, Monitoring, Message Handling, Job Handling, Rule Handling, Third Party Interface, Operating System.
Link labels: xmi, mof, object, xml.]
CIM Server
• ArgoUML / Poseidon / Rational Rose
• XMI2MOF
• CIM Object Manager
• CIM Navigator
• Java
Architecture – Logic View
[Same module diagram (layers: Modelling, Clients, Management Framework; link labels: xmi, mof, object, xml); this view details the WAM / LAM Server.]
WAM / LAM Server
• Enterprise JavaBeans
• JBoss 4.0
• Tomcat 5.0.28
• MySQL Database Cluster
• Java AccessPoint
Architecture – Logic View
[Same module diagram (layers: Modelling, Clients, Management Framework; link labels: http, xml, xmi, mof, object); this view details the Thin / Full Client.]
Thin / Full Client
• Java Interpreter / C Compiler
• Linux
• μcLinux
• Third Party Interface
Current State
• Prototype implementation at the Kirchhoff Institute for Physics has been completed
• Management of the HLT experimental cluster
  - 32 Linux PCs
  - Monitoring with EPICS/SNMP, Lemon and Ganglia
  - Rules, Jobs and Configuration Management used
• Next test at the University of Paderborn, Arminius Cluster, 3rd to 6th March
  - 200 PCs x 2 Intel Xeon 3.2 GHz processors
  - Online analysis of simulated ALICE TPC (Time Projection Chamber) data
  - Management of a cluster analysing 1000 simulated proton-proton events at 1-3 MByte per event
Outlook
• Future implementation on the 500-node ALICE HLT cluster
• Dynamisation of the HLT cluster:
  - On-the-fly rerouting of data through the analysis chain
  - On-the-fly shutdown of idle nodes to save resources
• Extension of the CBM project detector controls system (if wanted?!)
Summary
The SysMES Framework includes the advantages of current controls systems and extends their functionality to dynamic management
It is suited to complete controls systems and cluster management systems
Thank you for your attention
Any questions?