Date posted: 12-Apr-2017
C2MON A highly scalable monitoring platform for Big Data scenarios at CERN
Matthias Bräger, CERN · [email protected]
20th – 21st of May 2016, J on the Beach, Malaga, Spain
About me
▪ Responsible for the Technical Infrastructure Monitoring (TIM) service at CERN
▪ Head of the CERN Control and Monitoring Platform (C2MON): http://cern.ch/c2mon
▪ Working at CERN since Dec. 2007
Before CERN:
▪ 2 years at the LOGICA space department for ESOC, Darmstadt, Germany
▪ 4 years as a Java developer at IBM, Mainz, Germany
European Organization for Nuclear Research
▪ Founded in 1954 (60 years ago!)
▪ 21 Member States
▪ ~ 3'360 staff, fellows, students...
▪ ~ 10'000 scientists from 113 different countries
▪ Budget: 1 billion CHF/year
http://cern.ch
From Physics to Industry
http://cern.ch/knowledgetransfer
ATLAS
CMS LHCb
Alice LHC
The world's biggest machine
Generated 30 petabytes in 2012 – more than 100 PB in total!
LHC - Large Hadron Collider
▪ 27 km ring of superconducting magnets
▪ Started operation in 2010 with 3.5 + 3.5 TeV; 4 + 4 TeV in 2012
▪ 2013 – 2015: Long Shutdown 1 (machine upgrade)
▪ Restarted in April 2015 with up to 6.5 + 6.5 TeV
Some ATLAS facts
▪ 25 m diameter, 46 m length, 7'000 tons
▪ 100 million channels
▪ 40 MHz collision rate (~ 1 PB/s)
▪ Physics data (> 100 PB)
▪ Metadata of physics data
▪ Sensor data of technical installations
▪ Log data
▪ Configuration data
▪ Documents
▪ Media data
▪ Others
Is Hadoop used for storing the ~ 30 PB/year of physics data?
No ;-( Experimental data are mainly stored on tape
CERN uses Hadoop e.g. for storing the metadata of the experimental data
Physics Data Handling
▪ Run 1: 30 PB per year, demanding 100'000 processors, with peaks of 6 GB/s writing to tape spread across 80 tape drives
▪ Run 2: > 50 PB per year, with peaks of 10 GB/s writing to tape
CERN’s Computer Center (1st floor)
Physics Data Handling
2013: already more than 100 PB stored in total!
▪ > 88 PB on 55'000 tapes
▪ > 13 PB on disk (total disk space 45 PB)
▪ > 140 PB of free tape storage waiting for Run 2
CERN’s tape robot
Why tape storage?
▪ Cost of tape storage is a lot less than disk storage
▪ No electricity consumption when tapes are not being accessed
▪ Tape storage size = data + 1 copy; Hadoop storage size = data + 2 copies
▪ No requirement to have all recorded physics data available within seconds
CERN’s tape robot
▪ Physics data (> 100 PB)
▪ Metadata of physics data
▪ Sensor data of technical installations
▪ Log data
▪ Configuration data
▪ Documents
▪ Media data
▪ Others
A lot of systems to control
85'000 devices, > 2 million I/O endpoints
Many more when including subsystems!
Electricity
Magnets
Cryogenics
Controls Computers
Vacuum
Ventilation
Cooling
Safety
Main systems controlled from one central point: The CERN Control Centre
A complex controls infrastructure...
• Each dot is a process
• Each line is a network connection
• Many different types of data sources and protocols
• Complex data structure
• Different data rates
The configuration hell
How to subscribe to my data?
? C2MON
What if…
… there is a platform that:
• handles low-level data subscriptions
• monitors the different data sources
• reconfigures acquisition processes at runtime
• standardises messages and data storage
• reduces data streams to relevant information
• always keeps the latest values available
• provides custom data streams
• provides access to history
… and is modular and open source?
[Diagram: myApp on top of C2MON, backed by MySQL / HSQL / Oracle / Elasticsearch]
C2MON – A great platform for many use cases
C2MON History
Use C2MON…
• to feed your analytics framework
• to structure and persist your data in Elasticsearch for offline analytics
• as backbone for your SCADA system
• as a configurable data proxy
• to write innovative new Java and web applications
• …
[Diagram: myApp on top of C2MON, backed by MySQL / HSQL / Oracle / Elasticsearch]
Use C2MON to realise IoT scenarios
[Diagram: C2MON interconnecting System X, System Y and System Z]
C2MON - CERN Control and Monitoring Platform
▪ Modular and scalable at all layers
▪ Optimized for high availability & big data volumes
▪ Server based on an in-memory cache solution
▪ Two big monitoring services (TIM & DIAMON) running in production with C2MON at CERN
▪ Central LHC alarm system ("LASER") in migration phase
Ready for outside collaborations!
[Diagram: myApp → Client API → C2MON Server (Business logic, Client Apps, myMod modules) → DAQ API → myDAQ, with Acquisition / Filtering / Validation on the DAQ layer]
http://cern.ch/C2MON
Architecture
Client API
C2MON Server
History / Backup
In-Memory
- configuration
- rule logic
- latest sensor values
- assuring high availability
DAQs
No downtime if the DB is not available.
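The "no downtime" property rests on the latest values living in memory while database writes can be deferred. A minimal sketch of that idea in Java (the class and buffering strategy are my own illustration, not C2MON's actual implementation):

```java
import java.util.Map;
import java.util.Queue;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentLinkedQueue;

/** Hypothetical sketch: latest tag values are always served from memory;
 *  writes destined for an unavailable DB are buffered for later replay. */
class InMemoryTagCache {
    private final Map<String, Object> latestValues = new ConcurrentHashMap<>();
    private final Queue<String> pendingDbWrites = new ConcurrentLinkedQueue<>();
    private volatile boolean dbAvailable = false; // simulated DB outage

    void update(String tagName, Object value) {
        latestValues.put(tagName, value);   // clients always see the latest value
        if (!dbAvailable) {
            pendingDbWrites.add(tagName);   // buffer until the DB comes back
        }
    }

    Object latest(String tagName) {
        return latestValues.get(tagName);
    }

    int pendingWrites() {
        return pendingDbWrites.size();
    }
}
```

Clients reading from such a cache never block on the history database, which is why a DB outage only delays persistence instead of stopping the server.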
Sensor == Tag
In-Memory approach: Scale with data and processing needs
Increase data in memory, reduce database dependency:
[Diagram: Application talking directly to the DB → Application with an In-Memory layer in front of the DB → multiple Applications sharing a distributed In-Memory layer backed by the DB]
In-Memory Data Grid solutions
Forrester Wave™: In-Memory Data Grids, Q3 2015
Popular Open Source solutions:
The Tag family
Tag: id, name, value, quality, timestamp
• ControlTag – internally used for Process and Equipment surveillance
• DataTag – used for data acquisition *
• RuleTag – e.g. (#123 + #234) > 2 [ERROR], true [OK]
* Support of primitive arrays and arbitrary Objects
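A RuleTag combines other tags referenced by id, as in the slide's example (#123 + #234) > 2. A toy evaluator for exactly that rule shape (a simplification for illustration; C2MON's real rule language is far more expressive, and the class below is not part of its API):

```java
import java.util.Map;

/** Toy evaluation of a rule of the form "(#idA + #idB) > threshold":
 *  sums the current values of two referenced tags and maps the result
 *  to an alarm state, as in the slide's example. */
class SimpleRule {
    static String evaluate(Map<Long, Double> tagValues, long idA, long idB, double threshold) {
        double sum = tagValues.get(idA) + tagValues.get(idB);
        return sum > threshold ? "ERROR" : "OK";
    }
}
```

The server re-evaluates such rules whenever one of the referenced tags receives a new value, so RuleTags behave like derived sensors.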
TagService tagService = C2monServiceGateway.getTagService();

tagService.subscribeByName("my.tags.foo1", tag -> {
  // String name = tag.getName();
  // Object value = tag.getValue();
  // boolean valid = tag.isValid();
});
Example: Subscription to one Tag
C2MON
myApp
TagService tagService = C2monServiceGateway.getTagService();

tagService.subscribeByName("my.tags.*", tag -> {
  // String name = tag.getName();
  // Object value = tag.getValue();
  // boolean valid = tag.isValid();
});
Example: Subscription to Tag stream
C2MON
myApp
my.tags.foo1 my.tags.foo2 … my.tags.xyz900.joe
Wildcard support
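A pattern like "my.tags.*" has to be matched against concrete tag names such as my.tags.foo1 or my.tags.xyz900.joe. One way to sketch that matching in Java (the Client API does this internally; the class below is my own illustration, not C2MON code):

```java
import java.util.regex.Pattern;

/** Hypothetical sketch: convert a tag-name wildcard into a regular
 *  expression and match it against a concrete tag name. */
class WildcardMatcher {
    static boolean matches(String wildcard, String tagName) {
        // Quote everything literally, then re-open the pattern at each '*'
        // so it becomes ".*" (any sequence of characters).
        String regex = Pattern.quote(wildcard).replace("*", "\\E.*\\Q");
        return tagName.matches(regex);
    }
}
```

With this, a single subscription covers every present and future tag whose name fits the pattern, which is what makes the stream subscription on the previous slide possible.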
Acquisition layer
DAQ Process
Equipment (or Service)
SubEquipment
Machine running DAQ process
DAQ Process takes care of:
• Equipment/Service monitoring
• Data acquisition for configured Tags
• Raw data validation & filtering
• Sending data to the server tier
Raw data validation & filtering on DAQ layer
Dynamic Filtering
▪ Dynamic time dead-band filtering for protecting against data bursts
Static Filtering
▪ Static time dead-band filtering
▪ Value redundancy
▪ Value dead-band filtering
Data Validation
▪ Value in defined range?
▪ Correct value data type?
▪ Source timestamp in the future?
▪ Outdated information?
[Diagram: myDAQ → DAQ API (Acquisition / Filtering / Validation) → C2MON Server (Business logic, myMod)]
Configurable by Tag
Basic configuration structure

Process
└ Equipment [0..*]
    ├ DataTag(s)
    │   ├ Metadata [0..*]
    │   └ Alarm [0..*]
    │       └ Metadata [0..*]
    ├ SubEquipment [0..*]
    │   └ DataTag(s)
    │       ├ Metadata [0..*]
    │       └ Alarm [0..*]
    │           └ Metadata [0..*]
    └ Commands [0..*]
ConfigurationService confService = C2monServiceGateway.getConfigurationService();

confService.createProcess("P_JONTHEBEACH");
confService.createEquipment("P_JONTHEBEACH", "E_REST", "cern.c2mon.daq.rest.RESTMessageHandler");
Example: New DAQ configuration
Inject your own DAQ flavour
C2MON Server
P_JONTHEBEACH
E_REST
HashMap<String, String> tagAddress = new HashMap<>();
tagAddress.put("mode", "POST");
tagAddress.put("postFrequency", "60");

confService.createDataTag(
    "E_REST",
    DataTag.create("Temperature_Malaga", Integer.class, new DataTagAddress(tagAddress)).build()
);
Example: Create DataTag
C2MON Server
P_JONTHEBEACH
E_REST
HashMap<String, Object> metas = new HashMap<>();
metas.put("GPS", "N 36° 43' 16.585'', W 4° 25' 17.035''");
metas.put("Country", "Spain");

confService.updateTag(
    DataTag.update("Temperature_Malaga").metadata(metas).build()
);
Example: Add metadata to Tag
C2MON
myApp
[Diagram: DAQ API (JSON/XML) → C2MON Server → Client API (JSON) → Bootstrap Web Console and Alarm Console, with a Database behind the server]
Open Source in all layers
Easy prototyping thanks to Spring Boot
Elasticsearch as time-series data storage
← C2MON →
C2MON example: Technical Infrastructure Monitoring (TIM)
▪ Operational since 2005
▪ Used to monitor and control infrastructure at CERN
▪ 24/7 service
▪ ~ 100 different main users at CERN
▪ Since Jan. 2012 based on a new server architecture with C2MON
CERN Control Centre at LHC startup
TIM – Main features
▪ Unifies sensor data from a multitude of sources and protocols
▪ Provides simple dashboarding and access to historical values
▪ Central configuration management
▪ Filters raw data streams
▪ Workflow-based sensor and alarm declaration
Cooling Safety Systems Electricity Access Network and Hardware Controls
Cryogenics
TIM Server based on C2MON
Client Tier
Data Analysis Video Viewer TIM Viewer Access Management
Alarm Console
Data Acquisition & Filtering
> 1200 commands > 1300 rules
> 91k data sensors > 41k alarms
Web Apps
TIM Server based on C2MON
Client Tier
Data Analysis Video Viewer TIM Viewer Access Management
Alarm Console
Data Acquisition & Filtering
ca. 400 million raw values per day
reduced by filtering to ca. 2 million updates
> 1200 commands > 1300 rules
> 91k data sensors > 41k alarms
Web Apps
C2MON in Action
[Diagram: multiple DAQs feeding clustered C2MON Servers backed by Terracotta, with a Terracotta standby node; client tier with Access, Dashboard and Data Analysis applications]
Acquisition & Filtering: > 15 equipment types, > 400 million updates/day (TIM)
Business Logic Layer: > 91k data sensors, > 300k alarms, > 20k commands, > 48'000 business rules, > 28 million updates/day (DIAMON)
Client
OPC PLC
Alarm Web Interface
What’s coming next?
▪ Offering C2MON to the Open Source community: http://cern.ch/c2mon
▪ Collaboration with universities and industry
▪ Providing a new, simple acquisition API for zero configuration
▪ Introduction of a Streaming Analytics module
Technologies used for C2MON

C2MON server: no J2EE server, and only open source!
▪ Java 7, Spring 4.2
▪ Persistence framework: MyBATIS (server), Hibernate (client)
▪ Dependency management through Maven
▪ Supported databases: Oracle, HSQL, MySQL, Elasticsearch
▪ Middleware: JMS (ActiveMQ)
  ▪ no direct communication between message publisher and consumer
  ▪ secured broker access
  ▪ broker clustering
  ▪ persistent queues
  ▪ topics for broadcast/reply messages
▪ Message transport format: XML and JSON
▪ Remote caching solution for the C2MON server cluster: Terracotta/Ehcache (support for other caches foreseen)
  ▪ horizontally scalable
  ▪ proven technology
  ▪ open source
  ▪ support contract possible
Questions? Thank you for your attention!