A message-queuing framework for STAR's online - IOPscience

Journal of Physics Conference Series

OPEN ACCESS

A message-queuing framework for STARs onlinemonitoring and metadata collectionTo cite this article D Arkhipkin et al 2011 J Phys Conf Ser 331 022003

View the article online for updates and enhancements

You may also likeSTAR Online Framework from MetadataCollection to Event Analysis and SystemControlD Arkhipkin and J Lauret

-

Research on Airport Deicing Based onQueuing Theory mdashmdash Take an airport inthe west as an exampleYuming Zou and Liang Zhang

-

Overview Impact Of Application OfQueuing Theory Model On ProductivityPerformance In A Banking SectorSunday A Afolalu Kunle O BabaremuSamson O Ongbali et al

-

This content was downloaded from IP address 18910510195 on 29122021 at 1751

A message-queuing framework for STARrsquos online

monitoring and metadata collection

D Arkhipkin J Lauret and W Betts

Physics Department Brookhaven National Laboratory Upton NY 11973-5000 USA

E-mail arkhipkinbnlgov jlauretbnlgov wbettsbnlgov

Abstract We report our experience on migrating STARs Online Services (Run ControlSystem Data Acquisition System Slow Control System and Subsystem Monitoring) from directreadwrite database accesses to a modern non-blocking message-oriented infrastructure Basedon the Advanced Messaging Queuing Protocol (AMQP) and standards this novel approachdoes not specify the message data structure allowing great flexibility in its use After carefulconsideration we chose Google Protocol Buffers as our primary (de)serialization format forstructured data exchange This migration allows us to reduce the overall system complexityand greatly improve the reliability of the metadata collection and the performance of our onlineservices in general We will present this new framework through its software architectureoverview providing details about our staged and non-disruptive migration process as well asdetails of the implementation of pluggable components to provide future improvements withoutcompromising stability and availability of services

1 Introduction

An acronym for the Solenoidal Tracker At RHIC (Relativistic Heavy Ion Collider) STAR tracksthousands of particles produced by ion collision searching for signatures of a state of mattercalled the quark-gluon plasma (QGP) a form that is thought to have existed just after theBig Bang at the dawn of the universe A primary goal of STAR is to bring about a betterunderstanding of the universe in its earliest stages by making it possible for scientists to betterunderstand the nature of the QGP

The STAR collaboration consists of over 500 scientists and engineers representing 60institutions in 12 countries As the size of the collaboration and the scope of its work continuesto grow so does the challenge of having the computing power and data processing resourcesto carry out that work efficiently To meet this challenge a new framework was designed toenhance STAR metadata collector system as presented in this paper

STARrsquos Data Acquisition system (DAQ) Slow Controls system and metadata collectordaemons framework collectively referred as STAR SCADA (Supervisory Control And DataAcquisition) use open-source and open-licensed EPICS framework [1] In this work we willfocus primarily on the improvement of the metadata collection system which corresponds toabout 2000 channels out of approximately 14000 being processed and monitored at STAR [2]

This paper is structured in the following way first we present a brief overview of the existingsystem and itrsquos limitations then we discuss the requirements and expectations for the newmetadata collection system and finally we present our new MQ-based framework performancetesting results as well as concluding remarks and outlook

International Conference on Computing in High Energy and Nuclear Physics (CHEP 2010) IOP PublishingJournal of Physics Conference Series 331 (2011) 022003 doi1010881742-65963312022003

Published under licence by IOP Publishing Ltd 1

Figure 1 Existing setuprsquos bottlenecks 1) Communication failures between EPICS and STAR services2) Subsystem Monitors using Online Database directly 3) Migration macros no complex event processingcapabilities

2 Metadata Collectors Overview

There are many types of the metadata that STAR needs for timely and efficient physicsdata production Most of that metadata comes from various STAR detector subsystemslike calorimeters tracking chambers trigger detectors (voltages currents onoff status etc)Other metadata sources include collider information (beam state various scalers) environment(temperature on a platform humidity) and many run-time parameters such as run numbers filelocations and number of events recorded

STAR has been in operation since 1999 with much of the online software written back in the90rsquos based on best estimates of STARrsquos growth in the first decade Now entering the seconddecade of operations with greatly increased data rates and processing power the time has comefor the improvement of our online stack In this work we describe our attempt to upgradethe online data collector system going beyond the traditional Data Archiver as it is known inSCADA world

The STAR software infrastructure utilizes a typical data warehousing schema consisting ofan operational database layer data access metadata layers and an informational access layerData Collectors (DCs) belong to the operational database layer the front line of the detectordata acquisition process At the moment we collect data from about 2000 EPICS-based channelsin a ldquopullpush moderdquo with read frequencies varying from ldquoas fast as possiblerdquo to once per 5minutes with a prospect of reaching 5000 channels two years from now This data is not onlystored but also complex real-time transformations get performed on it

At the heart of our online data processing reside MySQL databases with MyISAM tablesas storage engine which serve as a run-time and long-term storage at the same time Usersaccess MySQL directly with no intermediate layer or API involved While it is possible toachieve fairly good horizontal scaling with MySQL by adding more slave servers for read-onlyoperations the whole system is seriously constrained by using this database-centric design whenit comes to having many users writing to the database simultaneously The multi-master MySQLreplication setup does not help to balance a system load for writes and the MySQL ClusterEdition [3] provides the RAM-only data storage not suitable for our task

Our concerns about the existing setup include but are not limited to the strong coupling tothe MySQL API the lack of fault-tolerance only limited load balancing is possible and the dataflow is strictly synchronous Also STAR will benefit from switching to publishsubscribe mode


2

of operations (reduces network traffic by a large factor compared to constant polling mode ofoperations) and some way of ldquopluggablerdquo data storage mechanism Fig 1 illustrates some of theexisting bottlenecks in our system The next section summarizes our ideas and expectations onhow to deal with the aforementioned concerns

3 Proposed Framework Requirements and Expectations

Requirements for the new system include a standard API for all clients (unified stable interfaceto access database) with minimal external library dependencies and asynchronous data storageaccess for writes Since we work in a near real-time mode non-blocking IO is desired We need asolution in line with ldquoset it and forget itrdquo ideal This implies reliable data storage access withoutsacrificing performance In addition we need automatic load balancing and fail-over for bothreads and writes and a simple way to configure all clients due to vast array of technologies usedto process the data

While we were primarily looking at an Online API we thought that switching from adatabase-centric model to message-queuing (MQ) bus services will allow us to meet theserequirements Most MQ implementations enforce usage of a predefined message-level dataformat or are available for specific languages or platforms only - this is not suitable for our taskAfter a careful study of all available solutions we settled on the Advanced Message QueuingProtocol (AMQP) having an advantage of being interoperable and technology agnostic - it doesnot constrain us to a specific data format or programming language Since our infrastructureis built primarily upon Scientific Linux we decided to use the qpid server (implementation ofAMQP 010 standard) developed and supported by Red Hat [4] AMQP treats all messagesas binary chunks therefore we can choose absolutely any message serialization standard Manydifferent solutions for structured data serialization exist on the market so in our search wefocused on three things fast (de)serialization speed support for many programming languagesand schema evolution Google Protocol Buffers (protobuf ) satisfied all our requirements so wehave built our framework using this library Vendorrsquos performance report quotes the throughputrates of 547k messages per second for qpid (tuned 1-GigE network 32-byte transfers see [5] fordetails) and Google Protobufrsquos (de)serialize speed could be as high as 300k objects per secondper node (typical setup has several publisher nodes so it is not considered a limit for our system)

Figure 2 Existing setup no middleware layeronly Channel Access communications

Figure 3 Proposed setup using AMQP asmiddleware layer Cloud icons indicate distributedcapabilities


3

4 Framework Implementation

To achieve the desired goals we have defined data exchange structures in the protobuf formatand introduced three simple building blocks EPICS to MQ service MQ to database servicedatabase to MQ service named epics2mq mq2db db2mq Fig 2 illustrates the historical setupand Fig 3 shows the upgraded version

Figure 4 Remotecontrol room exampleDistributed access ca-pabilities provided byMQ server

epics2mq is a daemon which polls EPICS periodically serializes resultsusing Protocol Buffers and sends messages to the AMQP server EPICSand AMQP server parameters are read from a configuration file as wellas the EPICS channel names to poll There is no need to write asingle line of code to add more channels to the data collector systemjust run another epics2mq instance with separate configuration file listingdesired EPICS channels As a bonus epics2mq code is quite simplewhich makes code quality assurance checking a breeze Our future plansinclude a counterpart to this component named mq2epics which willallow to propagate commands from MQ back to EPICS system - requiredcomponent for remote control capabilites

mq2db is a daemon which receives messages published by epics2mqde-serializes the results (Protocol Buffers) and stores data to the backenddatabase via an abstract interface Therersquos no need to modify mq2db

daemon to store extra data channels because it automatically convertsldquopathrdquo and data into databasetablerecord form for all messages arrivingover AMQP (it will create new databases and tables automatically)Also one can have two or more completely independent parallelized dataarchivers using various databases as storage backends by running anotherinstance of mq2db in parallel to the primary archiver

db2mq is a daemon which awaits client data requests over AMQPfetches data from a configured database backend and finally sends data to clients via AMQP

By introducing message-queuing and an interoperable serialization format we achieved thefollowing

bull loose coupling only protobuf structure descriptions are shared between all services

bull synchronous andor asynchronous data transfer support for clients

bull database polling is completely eliminated and we can apply fine-grained load balancingtechniques as we see fit

bull ability to use any storage engine we like for speed maintenance or other reasons

bull easy extensionexpansion of any component better scalability

bull messages could be routed to external facilities - Remote Control Rooms or monitors (seeFig 4)

bull clients rely on clustered AMQP broker for reliable message transfer - no need to deal withdb-specific error handling all over client codes

bull we can simultaneously utilize any database backend suitable for a task it could be well-known maintainable MySQL with tiered SASSSDRAM storage or fast RAM-basedMonetDB or distributed scalable round-robin database like MongoDB or write-tolerantNoSQL Cassandra database etc

bull clients may choose between requestresponse or publishsubscribe mechanisms addingflexibility to the system

5 Performance Testing and Deployment

With the most of the components ready we performed basic qpid performance testing usingperftest The results are encouraging our 5-years old hardware was capable to process


4

approximately 50000 messages per second with no system parameter tuning involved Wefed the MQ server with simulated data from 120 live EPICS channels exported to AMQP inthe following groups 20 integers 40 floats 40 doubles and 20 strings of 16 chars each Next thesame test was repeated for 1-128 publishers (Data Sources) and one subscriber (Data Archiver)For 128 publishers overall message processing rate observed was less than five percent smallerthan the rate we have got for a a single publisher

The second test was to see if this data processing rate is sustainable for a long time Thesame test was run for two weeks continuously with no single failure or delay in data transmissionencountered and the test server load was reported to be less than 05 for the whole period ofobservations

Based on the test results we decided to deploy a functional MQ-based setup for the upcomingrun in 2011 in parallel to the existing system As one can see in Fig 5 it includes two clusteredinstances of qpid several data sources attached to EPICS channels via Channel Access separateMySQL-based Data Archiver and real-time web-based monitoring GUI

6 Summary and Outlook

Figure 5 Deployment setupfor RHIC Run 11 features clusteredqpid instances selected data sourcesarchiver and online users

In this work we presented a message-oriented frameworkdesigned and implemented for the STAR Online DAQdomain Since our primary goal is to employ a vendor-independent message-queuing system as a system bus wedeveloped an AMQP-based framework which will allowus to simplify online metadata collection and maintenanceensure easy expansion and scalability and improve metadatacollection performance and reliability for the upcoming run

Preliminary performance tests indicate that overallsystem performance will be at least two orders of magnitudebetter compared to the EPICS-only setup In addition MQ-based setup allows easy real-time browser-based data flowmonitoring composed of widely available technologies likeWeb Sockets and AJAX because of the possibility to talkto the messaging system directly from web browser withno intermediate services involved (see [6] for example) - notachievable with our existing system Having an MQ-based system will allow STAR to integratededicated stream-processing software into our SCADA thus allowing more complex streamtransformations to be implemented on demand in a scalable way Future plans to extend ourframework may include Control System Studio (see [7]) evaluation and integration as well asextended hardware control capabilities up to device driver level

Acknowledgments

This work was supported by the Office of Nuclear Physics within the US Department of EnergyrsquosOffice of Science

References[1] EPICS - Experimental Physics and Industrial Control System [Online] URL httpwwwapsanlgovepics

[2] D Reichhold et al 2003 Nucl Instrum Meth A 449 792[3] MySQL Cluster home page [Online] URL httpwwwmysqlcomproductscluster

[4] Red Hat [Online] URL httpwwwredhatcom

[5] RHEL6 High Performance Network with MRG - MRG Messaging Throughput amp Latency (Preprint

httpwwwredhatcomfpdfMRG-Messaging-1Gig-10Gig-IB-Xeon-v2pdf)[6] Kamaloka-js AMQP bindings for native JavaScript [Online] URL httpsfedorahostedorgkamaloka-js

[7] Control System Studio [Online] URL httpcs-studiosourceforgenet


5

A message-queuing framework for STARrsquos online

monitoring and metadata collection

D Arkhipkin J Lauret and W Betts

Physics Department Brookhaven National Laboratory Upton NY 11973-5000 USA

E-mail arkhipkinbnlgov jlauretbnlgov wbettsbnlgov

Abstract We report our experience on migrating STARs Online Services (Run ControlSystem Data Acquisition System Slow Control System and Subsystem Monitoring) from directreadwrite database accesses to a modern non-blocking message-oriented infrastructure Basedon the Advanced Messaging Queuing Protocol (AMQP) and standards this novel approachdoes not specify the message data structure allowing great flexibility in its use After carefulconsideration we chose Google Protocol Buffers as our primary (de)serialization format forstructured data exchange This migration allows us to reduce the overall system complexityand greatly improve the reliability of the metadata collection and the performance of our onlineservices in general We will present this new framework through its software architectureoverview providing details about our staged and non-disruptive migration process as well asdetails of the implementation of pluggable components to provide future improvements withoutcompromising stability and availability of services

1 Introduction

An acronym for the Solenoidal Tracker At RHIC (Relativistic Heavy Ion Collider) STAR tracksthousands of particles produced by ion collision searching for signatures of a state of mattercalled the quark-gluon plasma (QGP) a form that is thought to have existed just after theBig Bang at the dawn of the universe A primary goal of STAR is to bring about a betterunderstanding of the universe in its earliest stages by making it possible for scientists to betterunderstand the nature of the QGP

The STAR collaboration consists of over 500 scientists and engineers representing 60institutions in 12 countries As the size of the collaboration and the scope of its work continuesto grow so does the challenge of having the computing power and data processing resourcesto carry out that work efficiently To meet this challenge a new framework was designed toenhance STAR metadata collector system as presented in this paper

STARrsquos Data Acquisition system (DAQ) Slow Controls system and metadata collectordaemons framework collectively referred as STAR SCADA (Supervisory Control And DataAcquisition) use open-source and open-licensed EPICS framework [1] In this work we willfocus primarily on the improvement of the metadata collection system which corresponds toabout 2000 channels out of approximately 14000 being processed and monitored at STAR [2]

This paper is structured in the following way first we present a brief overview of the existingsystem and itrsquos limitations then we discuss the requirements and expectations for the newmetadata collection system and finally we present our new MQ-based framework performancetesting results as well as concluding remarks and outlook


Published under licence by IOP Publishing Ltd 1









2








3





















4








Acknowledgments









5









2








3





















4








Acknowledgments









5








3





















4








Acknowledgments









5





















4








Acknowledgments









5








Acknowledgments









5

Date post:	11-Feb-2022
Category:	Documents
Upload:	others
View:	0 times
Download:	0 times

A message-queuing framework for STAR's online - IOPscience

Documents