Post on 20-Dec-2015
S.G. Ansari, April 18, 2023
Grid
GaiaGrid – A Three Year Experience
Salim Ansari
Toulouse, 20th October 2005
Why Grid?
The GDAAS Study had underestimated the computational power necessary to carry out the Gaia Data Analysis prototype.
The number of parallel activities spun out of control, as algorithm providers began delivering algorithms that could not be implemented on the limited infrastructure dedicated to GDAAS.
The need for a collaborative environment was clear.
Objectives
1. to increase computational power whenever and wherever needed at low cost,
2. to provide a framework for developing Shell task algorithms for Gaia, and
3. to establish a collaborative environment, where the community may share and exchange results.
Constraints
Motto: Low cost, high return on investment
- Low-cost hardware budget: reuse of low-end PCs
- Small investment in industrial effort: [0.5 FTE]
- System administration: 1 junior staff + maintenance [1 FTE]
Core vs. Shell Tasks
As a result of the GDAAS Study, two categories of algorithms had been established:
Core Tasks:
- Initial Data Treatment
- Global Iterative Solution
- Cross-correlations
Acts upon the totality of the data (centralised).
Shell Tasks:
- Classification
- Photometric analysis
- Spectroscopic analysis
Any data analysis that involves remote expertise and acts upon a portion of the data at a time.
Gaia Virtual Organisation, June 2005
The Processing Scope
Michael Perryman, GAIA-MP-009, 17 August 2004, Version 1.1

Task          Processing power in total   Duration [1.2 Teraflop machine]*
Core Tasks    40 × 10^18 FLOPs            385 days CPU time on the target ‘2012 machine’
Shell Tasks   90 × 10^18 FLOPs            880 days CPU time on the target ‘2012 machine’
TOTAL         10^21 FLOPs                 Assuming factor 10 in uncertainty

Top Tasks
- GIS processing: 125 days (CPU processing on 2012 machine)
- first-look: 125 days (assumed equal to GIS at present)
- spectro PSF fitting: 71 days
- variability period: 33 days
- various DMS classes: 60 days
- DMS: ASM analysis
- multiples: ASM

* Assuming a 40 GFlop machine today extrapolated to 2012 with Moore’s Law
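The durations quoted for the core and shell tasks follow from dividing the total operation count by the machine speed. A minimal sketch of that arithmetic, assuming the 1.2 Teraflop figure is a sustained rate; the helper name `cpu_days` is hypothetical and not part of any Gaia software:

```python
SECONDS_PER_DAY = 86_400

def cpu_days(total_flops: float, machine_flops_per_s: float) -> float:
    """Days of CPU time needed to execute total_flops at a sustained rate."""
    return total_flops / machine_flops_per_s / SECONDS_PER_DAY

# Assumption: the 1.2 Teraflop '2012 machine' runs at its full rate throughout.
TARGET_MACHINE = 1.2e12

core = cpu_days(40e18, TARGET_MACHINE)   # Core Tasks, 40 x 10^18 FLOPs
shell = cpu_days(90e18, TARGET_MACHINE)  # Shell Tasks, 90 x 10^18 FLOPs

print(f"Core tasks:  {core:.0f} days")
print(f"Shell tasks: {shell:.0f} days")
```

Under these assumptions the core tasks come out at roughly 386 days, in line with the 385 days quoted. The shell tasks come out at roughly 868 days, slightly below the quoted 880 days, which presumably reflects rounding in the source figures.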
The first months
Setting up the hardware and nodes was easy and took 2 man months.
Globus was installed on:
- ESTEC nodes
- ESRIN was already up and running
- CESCA node in Barcelona
- ULB node in Brussels
- ARI node in Heidelberg
The GridAssist tool was identified as a potential workflow tool.
Task distribution on GaiaGrid
[Figure: diagram with the components GDAAS DB, Gaia Simulator, Core Processing, Initial Data Treatment, GridAssist Controller, Data Access Layer and Globus Node / Shell Task, with the site labels Barcelona, ESTEC, ULB and ESRIN.]
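Because a shell task acts upon a portion of the data at a time, its input can be split into independent chunks and spread over the available nodes. The slides do not describe how the GridAssist Controller actually schedules work, so the following is only an illustration of the idea using a simple round-robin plan. The function `assign_cells` is hypothetical, and the node names are site labels borrowed from the diagram, not a statement of which sites ran which tasks.

```python
from collections import defaultdict

def assign_cells(cell_ids, nodes):
    """Assign each data cell to a node in round-robin order."""
    plan = defaultdict(list)
    for i, cell in enumerate(cell_ids):
        plan[nodes[i % len(nodes)]].append(cell)
    return dict(plan)

# Assumption: three equally capable nodes; names are illustrative only.
nodes = ["ESTEC", "ULB", "ESRIN"]
plan = assign_cells(range(8), nodes)

for node, cells in plan.items():
    print(node, cells)
```

A real scheduler would also weigh the number of CPUs per site and the network link to each node, which round-robin ignores.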
Current Infrastructure
9 infrastructures in 7 countries (voluntary) [51 CPUs]:
- ESTEC [14 CPUs] (SCI-CI) + 1 Gigabit dedicated link to Surfnet
- ESAC [4 CPUs] (SCI-SD) + 8 Mb link to REDIRIS
- ESRIN [16 CPUs] (EOP) + 155 Mb link to GARR
- CESCA [5 CPUs] (Barcelona) + REDIRIS connectivity
- ARI [2 CPUs] (Heidelberg) + academic backbone
- ULB [1 CPU] (Brussels) + academic backbone
- DutchSpace [7 CPUs] (Leiden) + commercial link
- IoA [1 CPU] (Cambridge) + academic backbone
- UGE [1 CPU] (Geneva) + academic backbone
2 Data Storage Elements:
- CESCA [5 Terabytes]
- ESTEC [2 Terabytes]
- ESAC [up to 4 Terabytes]
The current infrastructure has been created on an experimental basis and should not yet be considered part of an operational environment.
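As a quick consistency check, the per-site CPU counts can be tallied against the quoted totals. This is a small sketch using only the numbers listed on the slide; the dictionary name `cpus_per_site` is illustrative.

```python
# CPU counts per infrastructure, as listed on the slide.
cpus_per_site = {
    "ESTEC": 14,
    "ESAC": 4,
    "ESRIN": 16,
    "CESCA": 5,
    "ARI": 2,
    "ULB": 1,
    "DutchSpace": 7,
    "IoA": 1,
    "UGE": 1,
}

total_cpus = sum(cpus_per_site.values())
print(f"{len(cpus_per_site)} infrastructures, {total_cpus} CPUs")
```

The tally agrees with the 9 infrastructures and 51 CPUs stated in the heading.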
Current Applications
- Gaia Simulator
- Astrometric Binary Star Shell Task
- Variability Star Analysis Shell Task
- RVS Cross Correlation Shell Task
Global Gaia Data Processing
The GridAssist Client: Performance Grid Computation
[Screenshot with the site labels Heidelberg, Rome, Leiden and Barcelona.]
The GridAssist Client: Distributed Grid Computation
[Screenshot with the site labels Barcelona, Brussels and Noordwijk.]
Results
The Gaia Simulator profited tremendously from GaiaGrid, which accelerated the simulations of the Astrometric Binary Stars. These would otherwise have needed to be scheduled on a single infrastructure at CESCA, which was at the same time running GDAAS tasks.
The Astrometric Binary Star Analysis for a single HTM cell (383 systems) is down to 15 minutes (and falling) on 2 infrastructures, from 3 hours on a single CPU in Brussels.
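The improvement from 3 hours to 15 minutes can be expressed as a speed-up factor and a per-system processing time. A short sketch of that arithmetic, using only the figures quoted above; the function name `speedup` is illustrative:

```python
def speedup(before_minutes: float, after_minutes: float) -> float:
    """Ratio of the original run time to the improved run time."""
    return before_minutes / after_minutes

SYSTEMS_PER_CELL = 383          # binary systems in the HTM cell
before_min = 3 * 60             # single CPU in Brussels
after_min = 15                  # two infrastructures on GaiaGrid

factor = speedup(before_min, after_min)
seconds_per_system_before = before_min * 60 / SYSTEMS_PER_CELL
seconds_per_system_after = after_min * 60 / SYSTEMS_PER_CELL

print(f"Speed-up: {factor:.0f}x")
print(f"Per system: {seconds_per_system_before:.1f} s -> {seconds_per_system_after:.1f} s")
```

The number of CPUs used on the two infrastructures is not stated, so the parallel efficiency cannot be derived from these figures alone.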
Possible Implementation: The Gaia Collaboration Environment
[Figure: diagram with the components Binary Star Analysis, Variable Star Analysis, Radial Velocity Cross Correlations, Photometric Analysis, Classification, GaiaLib, Core Interface and Gaia Data Results.]
The Gaia Community would develop, analyse and update the data transparently, without having any notion of where each component is running and without having to worry about CPU and storage limitations.
Security Issues
All ESTEC and ESRIN Grid machines lie outside the ESA firewall.
Security is controlled via ESA Grid certification.
Currently no distinction is made between projects (e.g. GaiaGrid and PlanckGrid).
The GridAssist tool provides basic functionality to distinguish an administrator (a person who may add/remove sources) from a workflow user.
Certification
The Certification Authority for ESA Grid currently lies with ESTEC (SCI-C).
This is under review in light of higher-level discussions within the EIROForum Grid Group.
Future Activities
The GaiaGrid environment is available to anyone wishing to experiment with parallelization and distribution of tasks.
In the current Gaia Data Processing framework, the environment can only be used standalone.
The possibility of also using the Grid environment to carry out some core tasks is being investigated.
GaiaGrid can be considered the testbed for all algorithms under development.
Conclusions
GaiaGrid has demonstrated that it is easy to set up a Grid environment.
GaiaGrid has also demonstrated its collaborative capabilities by allowing the sharing of results amongst multiple institutes.
The deployment of the Gaia Simulator has led programmers to think more “portably”.
Lessons learned
The development of Gaia algorithms is a task that involves a community of people dispersed across Europe.
No single group should believe that they can implement all of these algorithms without the proper support of the community.
A sound collaboration environment is essential to ensure that everyone in a single community has a common understanding of the problems.
Processing is cheap and the technology is simple, but cumbersome to maintain. Each shell task has to be installed on all the Grid machines used in any Virtual Organisation.
There is no magic to Grid!
The main hurdles in Grid involve security and certification. Who should be allowed to run jobs on my machine(s)?
Grid should always be considered as “added value”, but should not be considered within the scope of day-to-day operations like the data processing in Gaia (if it becomes that, you have underestimated the effort of carrying out your project and should review your internal resources for the long term).