
Exp Astron (2012) 34:105–121
DOI 10.1007/s10686-012-9301-6

ORIGINAL ARTICLE

AVES: A high performance computer cluster array for the INTEGRAL satellite scientific data analysis

Memmo Federici · Bruno Luigi Martino · Pietro Ubertini

Received: 28 October 2011 / Accepted: 26 April 2012 / Published online: 24 May 2012
© Springer Science+Business Media B.V. 2012

Abstract In this paper we describe a new computing system array, designed, built and now used at the Space Astrophysics and Planetary Institute (IAPS) in Rome, Italy, for the INTEGRAL Space Observatory scientific data analysis. This new system has become necessary in order to reduce the processing time of the INTEGRAL data accumulated during more than 9 years of in-orbit operation. In order to fulfill the scientific data analysis requirements with a moderately limited investment, the starting approach has been to use a ‘cluster’ array of commercial quad-CPU computers, with the extremely large scientific and calibration data archive kept on line.

Keywords Cluster · AVES · OSA · INTEGRAL · IBIS

1 Introduction

The INTernational Gamma-Ray Astrophysics Laboratory of the European Space Agency, “INTEGRAL”, was proposed as the ESA M2 mission in the framework of the ‘Horizon 2000’ program at the beginning of the 1990s, and was successfully placed in orbit on 17 October 2002.

INTEGRAL is the heaviest scientific satellite flown by the European Space Agency (ESA) to date. The payload has a total weight of about four tons and comprises two main gamma-ray instruments, the imaging telescope IBIS

M. Federici (B) · P. Ubertini
IAPS INAF, Via Fosso del Cavaliere 100, Roma 00133, Italy
e-mail: [email protected]

B. L. Martino
IASI CNR Roma, Viale Manzoni 33, Roma 00100, Italy


[8] and the high resolution spectrometer SPI [9] that are complemented by an X-ray monitor, named Jem-X [1], and an Optical Monitor Camera (OMC) [7].

In order to detect the faintest soft gamma-ray sources with INTEGRAL [10], long exposure times are necessary. The nature of the high energy instruments, which combine tungsten coded masks with position sensitive detectors in the focal plane, requires all the collected scientific data to be kept on line, and also implies very heavy ‘cross-correlation’ processing algorithms and iterative procedures in order to extract the full scientific information.

After more than 9 years in orbit, the large amount of scientific data collected cannot be efficiently handled with the processing systems and the data analysis software package designed at the beginning of the mission. In fact, the observatory has produced to date a large amount of scientific data, by now exceeding 10 TB, and the initially provided h/w and s/w data analysis system is too slow and partially obsolete to allow an efficient analysis of the large data set. This is particularly true for the IBIS gamma-ray imager.

Furthermore, the recent ESA decision to extend the mission operation till 2014 has imposed a different approach in order to drastically reduce the average computing time needed to obtain high resolution sky images over a wide field of view. As an example, selected deep exposure fields, exceeding 10–20 Ms in the Galactic centre, need day-to-week computing times for a full scientific analysis, making survey studies impractical. In the latter case more than 200 Ms of data have to be treated during a single analysis run, resulting in weeks to months of computer crunching time.

This has finally made it impractical to use standard PCs and/or commercially available workstations. To overcome this problem we have designed, produced and tested a new system to run the scientific data analysis at the Space Astrophysics and Planetary Institute (IAPS) in Rome, Italy. The new system, based on a ‘cluster’ approach, is named AVES [3] and has been fully operational since 2010. This new facility is expected to be the main computing system used in the next few years for the INTEGRAL ‘all-sky survey’ key program and for the selected deep-field analysis planned by the IBIS team. After several months of operational tests, followed by a short commissioning phase, the system is now fully validated. The new facility has proved to be fully compliant with the expected computing performance and has clearly provided a breakthrough in the data analysis performance.

2 The INTEGRAL data analysis system

INTEGRAL has been conceived by ESA as an observatory satellite for which most of the observation time is available to the astronomical community at large. Most of the scientific observations are long (lasting from a few hundred ks to several Ms) and are segmented into the so-called ‘Science Windows’ (SCWs), each usually lasting about 2000 s on the same sky position. This technique, named ‘dithering’, is optimised to minimise the negative impact of the Cosmic Ray induced background on the high energy detectors, i.e. SPI, IBIS and Jem-X.


The resulting data sets are large (several GB per day) and complex, as they need to refer to auxiliary data (instrument set-up, calibration, pointing and aspect data, astrometry, etc.). In order to release the collected data to the community in a form suitable for scientific analysis, it was decided very early on to develop a centre in which the data would be received, analysed, archived and distributed. The INTEGRAL Science Data Centre (ISDC) was conceived to meet these general goals. It grew from a consortium of European and US institutes who responded to a call by ESA to develop and operate the ISDC [2] as a scientific institution, in analogy with the on-board instruments. The ISDC was proposed and then implemented by a Consortium led by a Swiss PI with the support of several scientific institutes participating in the INTEGRAL programme.

The data analysis is organised in three different steps: the near real time analysis (Quick Look Analysis, QLA) and the Standard Analysis, both performed at the ISDC site in Geneva (Switzerland), and the off-line standard scientific analysis performed by the users on any set of data delivered via web interface by the ISDC. The latter is usually performed applying the standard s/w provided to the ISDC by the individual PI-led Instrument Teams, named Offline Science Analysis (OSA) [2]. OSA is continuously maintained and updated at the ISDC, taking into account the evolution of the instruments, spacecraft and orbit. The latest released version is OSA 9, the ninth release since the satellite launch.

Usually, the scientific data analysis is carried out by the different scientific teams on traditional computers. In fact, the OSA s/w allows the required analysis to be performed irrespective of the environment where it runs. It generates a number of corrections that are derived from the calibration files embedded in the archive or obtained from the previous steps in the processing. The secondary products generated by the analysis are indexed in such a way that they are associated with the relevant original information without unnecessary duplication. The total amount of data, including the raw telemetry data stream, is about 10 GB per revolution, organised in a well defined tree structure comprising more than 10,000 files, largely transparent to the final user.

The cumulative amount of data obtained so far is about 10 TB; the projection, considering the current mission extension to 2014, is 15 TB. The handling of such a large data set requires a continuous improvement of the storage capacity and processing power of the computer systems used by the scientific institutes interested in the scientific data exploitation. The complexity of the data package obviously makes it impractical, if not impossible, to perform complex analyses via a remote connection to the Central Data Storage at the ISDC. Therefore, most of the research institutes involved in the project have adopted ‘ad hoc’ solutions to solve this issue.

The main problems to solve are clearly the data transfer from the ISDC, via web interface, to the local data storage and, finally, the availability of data analysis h/w able to handle properly the complex task of treating the INTEGRAL data package. One of the main difficulties is the control and check of the completeness of the (large) data package and of its on-line redundancy.


In fact, it is essential that the data stored locally on site are aligned with those at the ISDC. It is also essential to have a redundant set of locally stored data, since re-downloading the whole (9 years long) data set would require an uninterrupted connection with the ISDC lasting for months, not to mention the difficulty of providing a reliable ongoing verification of the consistency of the downloaded data. One additional difficulty is that the data package format provided by the ISDC is not easy to manage: it now consists of about 10^7 files, possibly containing errors/inconsistencies that become evident only when they are analyzed. Those ‘corrupted’ files often cause the loss of the data analysis process on conventional computer systems, which, in turn, implies re-starting the analysis after having spotted and removed the faulty file(s). Furthermore, the quality control of the downloaded data package is a long-standing problem which has been left to the effort of individual scientists, who eventually solve the issue with ‘ad hoc’ local solutions that are difficult to export to other institutes/sites.
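Purely as an illustration of how such an alignment check can be automated, the sketch below compares the local data tree against a remote rsync mirror in dry-run mode and reports the files that differ or are missing. It is only a minimal sketch: the server address, the rsync module and the local path are placeholders, not the actual AVES configuration.

    #!/bin/bash
    # Hypothetical sketch: verify that the local INTEGRAL archive matches the
    # ISDC master copy. Host, rsync module and paths are placeholders.
    ISDC_MIRROR="rsync://isdc.example.org/arc/rev_3"   # assumed rsync mirror
    LOCAL_ARCHIVE="/archive/integral/rev_3"

    # --dry-run: do not transfer anything, only report differences.
    # --itemize-changes: one line per file that would be updated or created.
    # --checksum: compare file contents, not just size and timestamp.
    rsync --archive --dry-run --itemize-changes --checksum \
          "$ISDC_MIRROR/" "$LOCAL_ARCHIVE/" > rsync_report.txt

    if [ -s rsync_report.txt ]; then
        echo "Local archive is NOT aligned with the ISDC mirror:"
        wc -l < rsync_report.txt
    else
        echo "Local archive is aligned with the ISDC mirror."
    fi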

3 The AVES architecture

The INTEGRAL mission extension to 2014, with a possible further prolongation till 2016, implied a substantial increase of the scientific data base storage needs and heavier data crunching, due to the long integration time of ‘ultra deep observations’. In fact, our local computing facility at that time had been designed and built just before the INTEGRAL launch: the whole analysis process was becoming clearly inefficient, and the time needed to obtain the final scientific products had slowly increased from days, at the beginning of the program, to weeks or months. On the other hand, the analysis of the whole data set was often necessary to extract time profiles, spectra, photon lists, etc. of the sources detected during the years-long survey. A step forward in h/w computing power was clearly necessary, also in view of the new mission duration horizon.

Taking into account the new system requirements, a study was carried out to design and implement a new, more efficient h/w and s/w system, based on a ‘parallel’ cluster made of commercial PCs. The final goal was to drastically reduce, by a factor of 10–100, the duration of the scientific analysis of the large data set incrementally provided by INTEGRAL. The new system was named AVES (Advanced Versatile Economic computer System).

The main design requirements, apart from the system speed, were: (i) a total cost of less than 50–100 k€, (ii) the possibility of incremental expansion in case of further mission prolongation, (iii) the use of the existing OSA s/w, possibly integrated with a further s/w layer, (iv) remote multi-user capability and, last but not least, (v) full ‘transparency’ for the scientists. The final choice was a ‘parallel’ machine array based on commercial PCs. In fact, one of the main problems related to the use of a parallel machine array is that OSA is composed of about 180 programs designed to work with serial


instances. Clearly, the SCW-by-SCW processing makes it possible to have any number of nodes working on a given analysis at the same time using the common OSA s/w. Nevertheless, running the OSA programs on a “pure” parallel h/w would have implied a massive re-write of the software, out of the scope of the new system plan. AVES is able to run parallel software through the MPI (Message Passing Interface) library: a program written, for example, in C and linked against the MPI libraries is automatically distributed over the “nodes” of the cluster and runs in parallel. AVES is therefore capable of running “pure” parallel software, provided it is designed for a parallel environment. In the OSA software case, the AVES master controller sends to the Slurm (Simple Linux Utility for Resource Management, Version 0.33) [11] manager the instances to be run simultaneously (in parallel) on the nodes of the cluster, as specified below.
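To make the distinction concrete, the following minimal sketch shows how a “pure” parallel MPI program could be launched on the cluster nodes through Slurm. It is only an illustration: the program name and the resource figures are invented, and the exact option names may differ between Slurm versions.

    #!/bin/bash
    # Hypothetical sketch: run an MPI executable on 4 nodes (16 tasks in total)
    # through the Slurm resource manager. './my_mpi_prog' is a placeholder.
    # mpicc -o my_mpi_prog my_mpi_prog.c   # build the C program against the MPI library
    # srun allocates the requested nodes and starts one MPI rank per task.
    srun --nodes=4 --ntasks=16 ./my_mpi_prog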

As specified in the system design architecture, a new AVES-dedicated s/w was developed to overcome the above mentioned difficulty. The new software was produced in-house by the IAPS designers and engineers; its main task is to distribute the workload of each run, triggered by the multi-user clients, over the cluster nodes, automatically dividing the OSA analysis into N jobs operating on N cores. This solution produces a result similar to that obtained with a parallel computing configuration. As planned, the new set-up provided a performance increase in the whole scientific analysis by a factor of more than 100 compared with the previously used server, consisting of a single-processor PC. AVES has been fully operational since 2010 and is based on a cluster of low-cost commercial PCs, interconnected via a local area network (LAN) and operating with an embedded large data storage capability. Each node is equipped with an Intel Q6600 quad-core processor, 4 GB of RAM and a 320 GB SATA-2 hard disk. It is a modular system which offers the possibility of expanding the number of nodes up to 65,000, far beyond the actual INTEGRAL computing requirements. At present it consists of an array of 34 computers with 120 GB of RAM and 14 TB of shared storage, providing a computing power of 3 × 10^11 floating point operations per second, i.e. 300 GFlops, adequate for the current applications. The system block diagram is shown in Fig. 1.
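The idea behind this workload distribution can be illustrated with the minimal sketch below: a list of Science Windows is split into N chunks and one serial OSA instance is launched per chunk through Slurm, so that the chunks run concurrently on different nodes. The wrapper script name, file names and option values are hypothetical; the real AVES software is considerably more elaborate (queue management, logging, error recovery).

    #!/bin/bash
    # Hypothetical sketch of AVES-style job splitting. 'scw.list' contains one
    # Science Window identifier per line; 'run_osa_chunk.sh' is an invented
    # wrapper that runs the serial OSA analysis on the SCWs listed in its argument.
    NODES=${1:-10}                      # number of nodes allocated to this user

    # Split the SCW list into NODES roughly equal chunks: chunk_aa, chunk_ab, ...
    split --number=l/"$NODES" scw.list chunk_

    # Launch one serial OSA instance per chunk on a separate node.
    for chunk in chunk_*; do
        srun --nodes=1 --ntasks=1 ./run_osa_chunk.sh "$chunk" &
    done
    wait                                # block until all chunks are finished
    echo "All $NODES OSA instances completed."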

3.1 The AVES hardware

The AVES cluster has the following hardware features:

– 120 CPU cores.
– 120 GB of total RAM (1066 MHz clock frequency).
– 8 TB of total shared disk space (UFS, Union File System), expandable to 900 TB (in the configuration with 60 nodes).
– 2 PCs used as master control unit and active reserve, on which the software implementing the interface (porting) between the users, the analysis software (OSA) and the cluster resource manager (Slurm) is installed.


Fig. 1 Block diagram of the hardware structure of the AVES system (nodes on a Gbit LAN, master controller with spare and web interface, users’ home folder of 6–12 TB, shared FS of 6–900 TB, 48 TB cluster data storage fed by the ISDC data, fast temporary area, firewall and 10 kVA UPS): 480 TB of UFS (Union File System) capacity in the 30-node configuration

– 1 PC configured as a NAS in RAID-5 technology, dedicated to the management of 6 TB of disk space for the users’ “Home area” and to the temporary memory.

The choice of a RAID-5 technology ensures a good reliability of the stored data and, in turn, a high level of long-term data stability. However, it is well known that file management in a RAID configuration usually degrades the read/write speed; in this case the degradation is mainly due to the huge number of files present in the scientific data package. To mitigate this read/write speed degradation, the system provides a temporary memory area on a conventional hard disk of moderate capacity (2 TB), directly connected to the SATA bus of the NAS PC. The AVES management software then takes care of moving the data from the temporary area to the home area. This procedure is much faster than direct read/write access to the RAID structure. This operational mode is a peculiarity of the AVES system and improves the average computing performance by up to a factor of 3.
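A minimal sketch of this staging strategy is shown below: the analysis writes its products to the fast temporary disk, and the results are moved to the RAID-5 home area only at the end of the run. Directory names and the analysis command are placeholders, not the actual AVES configuration.

    #!/bin/bash
    # Hypothetical sketch of the fast temporary-area staging described above.
    FAST_TMP="/fast_tmp/$USER/run_$$"     # directory on the 2 TB SATA disk (placeholder)
    HOME_AREA="/home_raid/$USER/results"  # directory on the RAID-5 home area (placeholder)

    mkdir -p "$FAST_TMP" "$HOME_AREA"

    # Run the (placeholder) analysis with its working directory on the fast disk,
    # so that the many small read/write operations do not hit the RAID array.
    ( cd "$FAST_TMP" && "$HOME/bin/run_analysis.sh" )

    # Move the finished products to the RAID-5 home area in one sequential pass.
    rsync --archive --remove-source-files "$FAST_TMP/" "$HOME_AREA/"
    find "$FAST_TMP" -type d -empty -delete   # clean up the temporary directories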

The communication between the AVES nodes is obtained via the integrated network interfaces present on each motherboard, interconnected through an


ethernet switch at a speed of 1 Gbit/s. Protection against intrusions from the external network is guaranteed by a dedicated PC acting as a firewall (pfSense 1.2.2). This firewall communicates with the users inside and outside the system through:

– 1 fiber optic connection to the outside world at a speed of 1 Gbit/s.
– 2 wired connections at a speed of 1 Gbit/s.

The bulk Data Storage and historical archive containing all the data transmitted by the INTEGRAL mission consists of two twin NexSan SATABeast units. The total storage capacity amounts to 84 disks of 1 TB each, operated in RAID-6 mode. The disk management is carried out by three PCs/servers connected to the cluster via fiber optic links with a throughput of 1 Gbit/s, as shown in Fig. 2.

The set-up provides a safe connection between the scientific users and the Data Storage archive via a dedicated hardware firewall. This firewall allows a safe and high speed interconnection among the (internal) IAPS AVES cluster users, in turn ensuring that the scientific data products obtained by the AVES processing are shared in a controlled and protected way. It also allows a secure and reliable connection to the ISDC, which is the source of the pre-processed telemetry data flow. This guarantees the full synchronization of the data stored in house (the AVES Data Storage) with the main ISDC archive.

The whole hardware dedicated to the data analysis is powered by two dedicated UPS power supplies, each with a capacity of 10 kVA: one is dedicated to the AVES cluster computer array and the other one to the Data Storage. The cluster PCs are housed in an expandable rack, shown in Fig. 3. This solution is compliant with the requirements of robustness, low cost and expandability. The rear view shows the standard low-cost commercial interconnection wiring.

Fig. 2 The AVES Data Storage block diagram (two 48 TB cluster data storage units, two servers for data download, a web interface controller and a 10 kVA UPS, connected to the cluster internal switch and to the firewall)


Fig. 3 Front and back views of the AVES cluster rack

3.2 The AVES software

The software controlling all the AVES functions is composed of:

– The free-of-charge operating system Linux Debian Lenny, kernel 2.6.26-1-686.

– The cluster resource manager Slurm.
– The AVES-specific s/w.

The “heart” of the s/w consists of about 50 original programs entirely developed for the AVES cluster, mostly written as ‘bash’ scripts (bash is one of the command language interpreters available for the Linux operating system).

The basic operations performed by the specific s/w can be summarized as follows:

– Data Storage: OCFS2 mount (GPL Oracle Clustered File System, Version 2).
– Home space.
– AVES startup.
– Resources allocation.
– Running of specific tools.
– Synchronization of the instances submitted to the calculation queue.

3.3 Local UFS storage

The diagram of Fig. 4 summarizes the operations carried out by the start-up program. When the AVES cluster is switched on, the bootstrap s/w mainly checks the active nodes and generates a memory disk area shared as a single UFS (Union File System), visible to the users and built from the spare area of each PC’s operating system disk. The UFS is easily expandable by installing one or more additional hard disks on each node.


Fig. 4 Flow chart of the UFS initialization (ClustStoragInit): wait for the hosts to be on-line, mount the local storages, mount the master shared storage and collate the storages; possible exit statuses are Success, HostOffLine, NoLocMount, NoSharMount and NoCollate
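How the spare disk areas might be checked and combined is sketched below, under the assumption that each node exports its spare area via NFS and that the branches are merged with a FUSE-based union filesystem. The paper does not describe the actual bootstrap code or mount mechanism at this level of detail, so the node names, paths and commands here are purely illustrative.

    #!/bin/bash
    # Hypothetical sketch of ClustStoragInit-like behaviour: wait for the nodes,
    # mount each node's spare area and collate them into a single union mount.
    NODES="node01 node02 node03"          # placeholder node list
    UNION_MNT="/mnt/ufs"
    BRANCHES=""

    for n in $NODES; do
        # Wait until the node answers on the network.
        until ping -c1 -W1 "$n" > /dev/null 2>&1; do sleep 5; done

        mnt="/mnt/local/$n"
        mkdir -p "$mnt"
        # Assumed export of the spare area via NFS (mechanism not specified in the paper).
        mount -t nfs "$n:/spare" "$mnt" || { echo "NoLocMount: $n"; exit 1; }
        BRANCHES="${BRANCHES:+$BRANCHES:}$mnt=RW"
    done

    # Collate all branches into one Union File System visible to the users.
    mkdir -p "$UNION_MNT"
    unionfs-fuse "$BRANCHES" "$UNION_MNT" || { echo "NoCollate"; exit 1; }
    echo "Success: UFS mounted on $UNION_MNT"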


3.4 The AVES environment setting and user login

In Fig. 5 we describe the basic operations performed during the user login. The main steps of the standard login procedure are the following:

Fig. 5 Flow chart of the AVES login procedure: user validation, optional resume of an existing session, selection of the number of nodes, check against the per-user maximum and against the nodes still available, node allocation

Environment settings At this stage the environment variables needed by the OSA software are set automatically and stored in a configuration file. After login, each user can either override these settings or work with the pre-defined ones. This procedure makes the analysis software easier to use for non-expert users: it minimizes the knowledge required of the OSA initialization procedure and spares the user from typing a list of complex commands.
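A minimal sketch of such an environment-setting step is shown below: a per-user configuration file with the OSA-related variables is created at first login and sourced afterwards. The variable names are typical of OSA installations but should be taken as indicative only, and the paths are placeholders.

    #!/bin/bash
    # Hypothetical sketch of the automatic OSA environment setting done at login.
    CFG="$HOME/.aves_osa_env"

    if [ ! -f "$CFG" ]; then
        # First login: write the pre-defined settings to the configuration file.
        cat > "$CFG" <<'EOF'
    # Indicative OSA-related variables; paths are placeholders.
    export ISDC_ENV=/opt/osa               # OSA installation directory
    export REP_BASE_PROD=/archive/integral # INTEGRAL data repository
    export COMMONLOGFILE=+osa_run.log      # log file used by the OSA tools
    EOF
    fi

    # Every login: load the stored settings; the user may edit $CFG to override them.
    . "$CFG"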

Restoring existing sessions Users can connect remotely via any ssh client, available for every operating system (Microsoft Windows, Linux, Unix, Mac OS, etc.). The analysis procedure, once started, may require an execution time of several hours, and the user can log out from AVES while the work session is still running. At the next login the user will find a list of her/his still running jobs and can verify whether the (possibly several) jobs are still running properly. In this way the user can run multiple analyses, as long as her/his allocated resources are not exhausted. Figure 5 also shows the management of the maximum number of nodes available to each user: if the cluster is fully occupied, the next user logging in is informed about the cluster status and the consequent impossibility of having immediate access to the system.
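The list of still-running jobs presented at the next login can be obtained directly from the resource manager; the sketch below simply queries Slurm for the jobs belonging to the current user (the output format options are illustrative and may vary between Slurm versions).

    #!/bin/bash
    # Hypothetical sketch: show the user's running and pending jobs at login.
    echo "Your AVES sessions still in the queue:"
    squeue --user="$USER" --format="%.10i %.20j %.8T %.10M %.6D"
    # Columns: job id, job name, state, elapsed time, number of nodes.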

Allocation of the computing resources Each user is assigned ‘a priori’ a maximum number of usable nodes, ranging from a minimum of five nodes (equivalent to 20 CPU cores) to a maximum of twenty nodes (80 CPU cores). When the job is finished the session is automatically closed and the resources previously allocated to that user become available again for other users. The information generated by all the executed jobs is stored in log files that are used to build a database containing all the relevant information about the runs performed.
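The per-user limit could be enforced with a simple check before the allocation request is passed to Slurm, as in the illustrative sketch below (the limits mirror the figures quoted above; the salloc call and job name are placeholders).

    #!/bin/bash
    # Hypothetical sketch of the per-user node allocation check.
    MIN_NODES=5        # 20 CPU cores
    MAX_NODES=20       # 80 CPU cores
    REQUESTED=${1:-$MIN_NODES}

    if [ "$REQUESTED" -gt "$MAX_NODES" ]; then
        echo "Request reduced to the per-user maximum of $MAX_NODES nodes."
        REQUESTED=$MAX_NODES
    elif [ "$REQUESTED" -lt "$MIN_NODES" ]; then
        REQUESTED=$MIN_NODES
    fi

    # Ask Slurm for the allocation; the session ends when the job finishes.
    salloc --nodes="$REQUESTED" --job-name="aves_$USER"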

3.5 The analysis software

Once logged in via the command “ssh -X user@AVES” followed by the user’s password, the user can start working from a graphic “Welcome” window (see Fig. 6). This Man-Machine Interface (MMI) summarizes the characteristics of the cluster computing system, including a description of the installed programs and a summary list of the projects executed by the OSA analysis. The basic information is the name, the directory and the number of nodes available to the user (see Fig. 7). Depending on the program to be run, the user interacts with a number of graphical interfaces (GUIs) showing the list of the necessary parameters. Finally, the user is allowed to save all the set parameters in a file to be reused at a later stage.

As described in the “Welcome” page, the following customized programs are available:

– cog_create_gui: a sort of adaptation (wrapper) of the “og_create” program of the OSA package, which prepares the items for the subsequent analysis.

– cibis_science_gui: a similar adaptation of the “ibis_science_analysis” program of the OSA package, which carries out the full scientific analysis.

Those adaptations are similar to a “porting”. The term “porting” usually indicates the conversion of software built to run as serial flows (where each instance is sent to a single CPU) into software that can be compiled and run as parallel flows (parts of one instance, or several instances, on several CPUs simultaneously).


Fig. 6 The “Welcome” AVES login page

A new feature allows the user to choose the desired SCW list for the analysis and to automatically obtain the corresponding sky images (mosaics) for each selected period, i.e. one mosaic for each satellite orbit (lasting 3 days). This feature allows the scientific observations to be grouped per orbit, significantly improving the statistical significance of the weak sources, and provides high quality spectra and light curves of the cosmic gamma-ray sources.

Fig. 7 Summary of the jobs executed by the user


The two programs cog_create_gui and cibis_science_gui fully exploit the AVES cluster computing power (see Figs. 8 and 9 for details).
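As an illustration of the per-orbit grouping described above, the sketch below splits a Science Window list into one file per satellite revolution, assuming the common INTEGRAL convention that the first four digits of a SCW identifier encode the revolution number. The file names, and the wrapper reused from the earlier sketch, are placeholders.

    #!/bin/bash
    # Hypothetical sketch: group a SCW list by revolution so that one mosaic
    # can be produced per satellite orbit. Assumes SCW identifiers of the form
    # RRRRPPPPSSSF, where RRRR is the revolution number.
    mkdir -p per_revolution
    awk '{ rev = substr($1, 1, 4); print > ("per_revolution/rev_" rev ".list") }' scw.list

    # Each per-revolution list can then be fed to the analysis, e.g.:
    # ./run_osa_chunk.sh per_revolution/rev_0950.list
    ls per_revolution/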

The above mentioned programs handle the computing resources (nodes) and split the serial job into many jobs which are, in turn, run on each CPU allocated to that specific user/job. This procedure achieves an increase by a factor of 100 in the computing speed, and hence in the analysis of the scientific data, compared to the system used by the INTEGRAL project prior to the launch (2002), which was equipped with CPUs of the Intel Pentium 4 family.

Fig. 8 Flow chart of the cog_create_gui object creator: parameter acquisition, splitting of the list file, spawning of the object-creation processes; possible exit statuses are Success, Abort and CreateError


Fig. 9 Flow chart of the cibis_science_gui science analysis: parameter acquisition, spawning of the analysis processes on the objects, wait for the end of the analysis; possible exit statuses are Success, Abort and WrongAnalysis

There are two more utilities developed to ease the job execution:

– search: allows a quick search of specific words within data structures.

This utility has been developed to allow the rapid identification of any errors in the output trace.

– qc: program that validates the INTEGRAL data archive.

Fig. 10 The qc GUI interface

The use of “qc” is simplified by a graphical interface, as shown in Fig. 10. The program autonomously performs a quality check on the correctness of the files in the list that identifies the observations of interest. The check consists of two types of verification: (i) the readability of the files containing the compressed data relevant for the observation, and (ii) the existence of the keywords that identify the correct information. After entering the file list containing the data structures of the scientific analysis, the program produces three output files (a minimal sketch of this check is given after the list below):

– file1.swg: new input file list (defined “good”) containing only the science windows that have passed the check.

– file1.swc: corrupted science windows, automatically discarded from the input file.

– file1.swm: science windows missing from the archive, which are automatically discarded from the input file.
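Below is a minimal sketch of such a check, assuming gzip-compressed FITS files and using a simple header probe as a stand-in for the real keyword verification; the file naming convention and the keyword tested are illustrative only, not the actual qc implementation.

    #!/bin/bash
    # Hypothetical sketch of the qc validation. 'file1' lists one science-window
    # file per line; three output lists are produced as described above.
    LIST=${1:-file1}
    : > "$LIST.swg"; : > "$LIST.swc"; : > "$LIST.swm"

    while read -r scw_file; do
        if [ ! -f "$scw_file" ]; then
            echo "$scw_file" >> "$LIST.swm"          # missing from the archive
        elif ! gzip -t "$scw_file" 2>/dev/null; then
            echo "$scw_file" >> "$LIST.swc"          # unreadable / corrupted
        elif ! zcat "$scw_file" | head -c 28800 | grep -q "SWID"; then
            echo "$scw_file" >> "$LIST.swc"          # expected keyword not found
        else
            echo "$scw_file" >> "$LIST.swg"          # passed both checks: "good"
        fi
    done < "$LIST"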

The proper use of file1.swc and file1.swm allows the cluster system to build a database, available to all the users, with the information useful for further analyses. This database has very fast access times and, in turn, allows an archive of all corrupted files and missing data to be maintained. An automated routine can then repair these corrupted files by downloading them again from the main ISDC archive in Geneva.

3.6 Graphical User Interface (GUI)

The use of AVES is facilitated by a series of graphical interfaces that allow an easy selection of the parameters necessary for the scientific analysis, and their saving for later use. The “GUI” consists of dozens of graphical windows which generate the input parameters for the required analysis. These interfaces are written in Bash (Bourne-Again SHell) using Xdialog and the GTK (GIMP Toolkit) libraries. They run extremely fast and are very light in terms of resource consumption, making it easier to perform the processing through the network. The AVES GUIs overcome the difficulty of writing the parameters of the OSA analysis programs, which, in most cases, consist of very long and complex command strings. The appearance of the GUIs is similar to that shown in Fig. 10. The wide variety of graphical user interfaces includes multiple-choice menus, drop-down lists, input strings, etc.
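A tiny example of how such a Bash/Xdialog window might collect an OSA parameter is given below; the widget geometry, labels and the downstream use of the values are invented for illustration (Xdialog follows the option syntax of the classic dialog utility).

    #!/bin/bash
    # Hypothetical sketch of an AVES-style GUI element written in Bash + Xdialog:
    # ask the user for the science-window list and for the energy band, then
    # echo the values that would otherwise have to be typed by hand.
    SCW_LIST=$(Xdialog --stdout --title "AVES" \
                       --inputbox "Science window list file:" 10 60 "scw.list")
    E_BAND=$(Xdialog --stdout --title "AVES" \
                     --menu "Energy band (keV):" 15 40 3 \
                     1 "20-40" 2 "40-100" 3 "100-300")

    echo "Selected list: $SCW_LIST, energy band option: $E_BAND"
    # The real GUIs assemble the full OSA parameter string from dozens of such widgets.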

4 Scientific results

The AVES system is now used as the basic scientific data analysis tool at the Space Astrophysics and Planetary Institute (IAPS) in Rome and is producing the IAPS bulk scientific output of the INTEGRAL imager IBIS (comprising ISGRI [6], PICsIT [5] and the VETO [8]). To date, several scientific papers and talks based on AVES data analysis have been produced by IAPS and presented at national and international conferences and symposia. As an example we mention here the discovery paper on the new INTEGRAL gamma-ray source IGR J16328-4726, a new candidate Supergiant Fast X-ray Transient [4]. In this work the authors analyzed scientific observations obtained with the IBIS high resolution gamma-ray imager. The total elapsed observation time exceeds 10 Ms, corresponding to approximately 5000 SCWs.

The overall time needed to process the above mentioned observation was of the order of a few days. A similar work, in terms of data packet size and observing time, carried out in the past on the previously used servers would have required a total computing time of several months.

5 Summary and conclusions

The AVES cluster is a computing system that has been designed, built and tested, and is now fully operational.

Since 2010, after a short commissioning phase, it has been dedicated to the analysis of the INTEGRAL scientific data and, in particular, of the IBIS/ISGRI deep observation data. The ‘ad hoc’ software developed for AVES basically allows the complex standard s/w package OSA, distributed by ESA via the ISDC in Geneva, to be run on a ‘parallel’ h/w, i.e. the cluster. At the beginning of the AVES design, the possibility of re-writing the OSA package in a ‘native’ parallel way was considered. In fact, the major problem associated with the use of this s/w on a parallel machine is the impossibility of re-compiling the original routine sources for a parallel architecture. On the other hand, the OSA s/w has been developed with an effort of several hundred man-years, making it impossible to re-write the full s/w in a native parallel way. To overcome this programmatic show-stopper, it was decided to develop the AVES cluster, which has solved the problem in an original and efficient manner, with a moderate financial investment supported by a dedicated ASI grant.

In 2010 the ESA Space Programme Committee extended the INTEGRAL in-flight operation till 2014, subject to a positive science evaluation in mid 2012. ESA will consider a further extension for the period 2014–2016, in view of the outstanding scientific performance of the mission and the on-board hardware health. If approved, this extension will give the high energy scientific community the possibility of using the INTEGRAL Observatory in the coming years, a unique tool to investigate the most energetic phenomena present in the Universe.

If the INTEGRAL mission is extended, it will be easy to upgrade the AVES cluster with a moderate investment. In fact, it has already been planned to increase the number of nodes to 60, with a corresponding increase of the computing power to about 1 TFlops. Finally, in view of several requests received, we are studying the possibility of giving access to ‘external’ scientists belonging to the high-energy community.

The AVES cluster is a low cost and innovative system that can easily be re-used in different applications. As an example, a similar system, though on a larger scale, has recently been proposed by IAPS for the Cherenkov Telescope Array project. On a shorter time scale, we plan to employ it for the data analysis of the NuSTAR satellite, foreseen to be launched in March 2012.

Acknowledgements MF is grateful to G. Sabatino for the procurement of the h/w and the set-up of the system. The IAPS authors acknowledge the ASI financial support via grants ASI-INAF I/008/07/0 and I/033/10/0.

References

1. Budtz-Jorgensen, C.: JEM-X: the X-ray monitor on INTEGRAL. SPIE 5165, 139–150 (2004)
2. Courvoisier, T.J.-L., et al.: The INTEGRAL science data centre (ISDC). Astron. Astrophys. 411, L53–L57 (2003)
3. Federici, M., et al.: POS, p. 92. Published online at http://pos.sissa.it/cgi-bin/reader/conf.cgi?confid=96 (2009)
4. Fiocchi, M., et al.: IGR J16328-4726: a new candidate supergiant fast X-ray transient. ApJ 725, L68–L72 (2010)
5. Labanti, C., et al.: The Ibis-Picsit detector onboard Integral. Astron. Astrophys. 411, L149–L152 (2003)
6. Lebrun, F., et al.: ISGRI: the INTEGRAL soft gamma-ray imager. Astron. Astrophys. 411, L141–L148 (2003)
7. Mas-Hesse, J.M., et al.: OMC: an optical monitoring camera for INTEGRAL. Instrument description and performance. Astron. Astrophys. 411, L261 (2003)
8. Ubertini, P., et al.: IBIS: the imager on-board INTEGRAL. Astron. Astrophys. 411, L131 (2003)
9. Vedrenne, G., et al.: SPI: the spectrometer aboard INTEGRAL. Astron. Astrophys. 411, L63–L70 (2003)
10. Winkler, C., et al.: The INTEGRAL mission. Astron. Astrophys. 411, L1–L6 (2003)
11. Yoo, A., Jette, M., Grondona, M.: Job scheduling strategies for parallel processing. Lect. Notes Comp. Sci. 2862, 44–60 (2003)

