CERN
2001 Summer Student Lectures
Computing at CERN
Lecture 4 — The Grid & Software
Tony Cass
LHC Computing Worldwide
This picture, from the CMS CTP, shows how a regional centre, here Fermilab, fits into the computing environment between CERN and universities.
It is assumed here that high bandwidth networks are available between CERN and this US-based regional centre. However, the possibility of an Air Freight link for data transfers is also indicated.
Although regional centres, in the US and elsewhere, will certainly exist, we do not yet know how best to make use of the facilities they will offer. Can we link CERN and all the regional centres into one global facility, usable from everywhere? Or do the regional centres just provide resources for their local clients?
LHC Computing Worldwide - MONARC
The MONARC Project has been set up to study these issues.
– Models Of Networked Analysis at Regional Centres
More input on the practicalities of global analysis is needed for
– the Computing Progress Reports to be produced this year by ATLAS and CMS, and maybe other experiments,
– Funding Agencies, especially in the US, and
– Planning!
The Grid
Over the past year, the “Grid” metaphor for providing access to remote computing resources has become popular.
Will the Grid bind Regional centres together?
– Studies are underway in Europe and the US.
Just as a Power Grid provides transparent, on-demand access to electrical power…
… the Computing Grid provides transparent, on-demand access to computing facilities.
The Globus Toolkit
Providing transparent access to different computing resources requires an interface layer which hides details of
– batch systems (LSF, LoadLeveler, Condor),
– security and authentication
– …
The Globus Toolkit has been developed as just such an interface layer and is being tested at CERN and other HEP labs.
– There’s still a long way to go, though!
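To make the idea of an interface layer concrete, here is a minimal Python sketch. It is not the Globus API: the class names, the `submit` method and the site names are hypothetical, and the returned strings only indicate which native submission command each batch system uses (arguments simplified).

```python
from abc import ABC, abstractmethod


class BatchSystem(ABC):
    """Hypothetical common interface; not part of Globus or any real product."""

    @abstractmethod
    def submit(self, job_file: str) -> str:
        """Return the native command a site would use to queue the job."""


class LSF(BatchSystem):
    def submit(self, job_file: str) -> str:
        return f"bsub < {job_file}"


class LoadLeveler(BatchSystem):
    def submit(self, job_file: str) -> str:
        return f"llsubmit {job_file}"


class Condor(BatchSystem):
    def submit(self, job_file: str) -> str:
        return f"condor_submit {job_file}"


# Hypothetical sites, each running a different batch system.
SITES = {"site-a": LSF(), "site-b": LoadLeveler(), "site-c": Condor()}


def run_anywhere(site: str, job_file: str) -> str:
    """The user names a site and a job; the layer hides which batch system runs it."""
    return SITES[site].submit(job_file)
```

The user code calls only `run_anywhere`; adding a new batch system means adding one class, with no change on the user side.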
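LHC Computing Grid
[Figure: tiered model of the LHC Computing Grid. CERN hosts the LHC Computing Centre; Tier 1 centres are shown for Germany, the USA (FermiLab and Brookhaven), the UK, France, Italy, NL and others; Tier 2 centres, labs (Lab a, Lab b, Lab c, Lab m) and universities (Uni a, Uni b, Uni n, Uni x, Uni y) follow, down to the physics department and the desktop.]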
Authentication: Kerberos vs PKI
Kerberos is a popular authentication and access control system.
– I prove I know something (my password) and a central server gives me a ticket to access resources.
– I have a ticket, so I just need to type my password once,
– but a central server is needed at each site.
In a Public Key system, I have a certificate signed by some trusted body which I need to show to prove who I am.
– My certificate will be accepted by anybody who trusts the organisation that signed my certificate,
– but I must protect it so you don’t steal it and use it instead! So I have to type a password or passphrase whenever I need to use the certificate.
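The ticket idea can be illustrated with a toy Python sketch. This is not the Kerberos protocol (real Kerberos never sends the password, uses session keys and expiring tickets); it only shows the "password once, ticket afterwards, one central server per site" pattern. The key, the user table and all function names are hypothetical.

```python
import hashlib
import hmac

SITE_KEY = b"demo-only-key"        # assumed shared by the ticket server and the site's resources
PASSWORDS = {"alice": "s3cret"}    # toy user table held by the central server


def _sign(user: str) -> str:
    """Signature that only holders of SITE_KEY can produce or check."""
    return hmac.new(SITE_KEY, user.encode(), hashlib.sha256).hexdigest()


def issue_ticket(user: str, password: str) -> str:
    """Central server: check the password once and hand back a signed ticket."""
    if PASSWORDS.get(user) != password:
        raise PermissionError("wrong user or password")
    return f"{user}:{_sign(user)}"


def resource_accepts(ticket: str) -> bool:
    """Any resource at the site: check the ticket, never ask for the password."""
    user, _, signature = ticket.partition(":")
    return hmac.compare_digest(signature, _sign(user))


ticket = issue_ticket("alice", "s3cret")   # the password is typed once
```

The same `ticket` can then be shown to any number of resources at the site, but a resource at another site, without `SITE_KEY`, cannot check it; that is the limitation a certificate signed by a widely trusted body avoids.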
Software Concerns for LHC
Software will throw LHC data away.
– software (human!) errors will lose data forever. Would you take this responsibility? Can you write bug free code?
– What would you do if you were managing the worldwide effort?
Object Oriented techniques are today’s “industry standard” and LHC experiments must impose best practice.
– There are also secondary considerations:
» widespread use of OO techniques outside HEP implies widespread availability of support tools and software, and
» OO trained (ex) physicists will find more employment opportunities.
Software Concerns for LHC II
Everything will change between now and 2005.
– The computing environment
» Unix vs NT.
– The programming language
» C++ vs Java.
– The “in things”
» OO vs ?; Java vs ?; what will “computers” look like in 2005?
These all changed for LEP and those planning for LHC must take this into account.
– But maybe we’re being too worried. LEP was planned at a time when IBM mainframes and DEC minis looked invincible. The LEP experiments still coped with change.
Data and Computation for Physics Analysis
[Figure: data-flow diagram with boxes for detector, event filter (selection & reconstruction), raw data, event reconstruction, event simulation, event summary data, batch physics analysis, analysis objects (extracted by physics topic) and interactive physics analysis.]
Storage Solutions: Zebra, Objectivity/DB, ROOT
Simulation Packages: GEANT3, GEANT4, FLUKA
Analysis and visualisation packages: HBOOK, PAW, ROOT, Lizard, Iguana, JAS…
Experiment frameworks provide interfaces to storage and common services. HEP toolkits and packages provided to meet common needs.
Everything built using language standards, e.g. STL
OO Techniques and Data Storage/Management
HEP has added many Data Storage and Management systems on top of Fortran
– e.g. Zebra for data structures, FATMEN for event/file management
With the move to OO, can HEP use OO databases for event storage and management?
– Can it be done?
– Is it efficient?
It seems the answer is yes. How do we really switch to this model?
– LHC software designers have to embrace this model of working now and work to provide optimised storage/processing environments.
Why use an Object Database?
Raw data is reconstructed to produce ESD/AOD and then interesting events are selected for further study.
In the traditional scheme this produces different data sets and going back from a high to a low level is difficult.
With an object model and an object database it is much easier to navigate between the different levels of description of an event.
[Figure: the traditional chain of separate data sets (Raw, ESD, AOD, dst/ntuple, each with its own Event Header) compared with an object model in which Event Tags and an Event Header link to AOD, ESD and Raw Data objects (Particles, Tracks, Tracker Hits); a Bookkeeping Database is also shown.]
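Navigation between the levels of description can be sketched in Python as follows. This is an in-memory illustration only, with no database involved: the class and field names are assumptions, as is the pairing of particles with AOD, tracks with ESD and tracker hits with raw data. In an object database the references would be followed in the same way, with the referenced objects loaded from storage when needed.

```python
from dataclasses import dataclass


@dataclass
class RawData:
    tracker_hits: list


@dataclass
class ESD:
    tracks: list
    raw: RawData      # reference back to the level below


@dataclass
class AOD:
    particles: list
    esd: ESD          # reference back to the level below


@dataclass
class Event:
    header: dict
    aod: AOD


raw = RawData(tracker_hits=[(1.0, 2.0), (1.1, 2.1)])
event = Event(
    header={"run": 1, "event": 42},
    aod=AOD(particles=["mu+"], esd=ESD(tracks=["track-0"], raw=raw)),
)

# Starting from the high-level description, walk down to the raw hits.
hits = event.aod.esd.raw.tracker_hits
```

With separate data sets, the same step would mean finding the right raw data file and locating the matching event in it by hand.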
Why use an Object Database?
Hiding the details of the file storage is done by the database manager. An RDBMS (e.g. Oracle) also hides details of file storage, so why use an ODBMS?
– With an ODBMS, the underlying details of the I/O are hidden. The program variables are the storage variables, so there is no need for explicit copying by the programmer.
– Physicists don’t set out to select all tracks of a given event. They might want to access some tracks of an event, though. This sort of access maps better onto an object database.
– An ODBMS allows applications to “suggest” that parts of an event should be stored close to each other—rather than storing all tracks close together.
But these don’t seem to be general requirements—the ODBMS market has not taken off. We need to be careful!
Data Databases in 2001
RDBMS vendors have been moving towards the ODBMS market for some time, introducing Object-Relational DBMS.
Oracle 9i, with the recently announced C++ interface, provides all the ODBMS features of the previous slide.
– You can now navigate between objects in the database.
We are now actively testing the use of Oracle 9i for physics data. Particular aspects being investigated are
– Scalability
– Storage overhead
– Mass Storage System Integration
– Data Import/Export
Initial results are promising.
Toolkits versus Frameworks
Toolkits
– Sets of generic procedures [1] that can be invoked to perform related tasks.
– Do not constrain users (apart from parameter lists!).
– Can be provided by experiments but also by others, e.g. IT or 3rd parties.
Frameworks
– Systems to decide the order of execution,
» invoke procedures to do necessary work in determined order, including user procedures.
– Constrain users to work within the overall architecture.
– Are experiment specific.
[1] Note that the word “procedure” is used here in a general sense. In terms of procedural languages, procedures are subroutines and functions. For an Object Oriented language, a procedure is a class.
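The difference in who controls the order of execution can be shown with a small Python sketch. The names `mean_energy` and `MiniFramework` are invented for illustration and do not correspond to any HEP package.

```python
import statistics


# Toolkit style: a generic procedure. The user decides when to call it;
# the only constraint is the parameter list.
def mean_energy(energies):
    return statistics.mean(energies)


# Framework style: users register procedures and the framework
# decides the order in which they are invoked.
class MiniFramework:
    def __init__(self):
        self._steps = []

    def register(self, order, procedure):
        self._steps.append((order, procedure))

    def run(self, event):
        for _, procedure in sorted(self._steps, key=lambda step: step[0]):
            event = procedure(event)
        return event


framework = MiniFramework()
# Registered out of order on purpose: the framework sorts the steps itself.
framework.register(2, lambda ev: {**ev, "mean": mean_energy(ev["energies"])})
framework.register(1, lambda ev: {**ev, "energies": [e for e in ev["energies"] if e > 0]})

result = framework.run({"energies": [-1.0, 2.0, 4.0]})
```

The user procedure registered as step 2 calls the toolkit function, which is the usual arrangement: toolkits are used inside a framework rather than replaced by it.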
Toolkit Design
A toolkit should be
– generic, so it can be used in more than one framework,
– independent, i.e. not forcing the use of other toolkits, and
– well defined, with clear interfaces so it can be replaced.
As an example, consider a video recorder.
[Figure: a VCR and a TV set connected through the interfaces IEuroConnector, IRfInput, IUserInterface and IInfraredInput.]
• Each interface is specialised in a domain.
• Interfaces are independent of concrete implementations.
• You can mix devices from several manufacturers.
• Application built by composing.
• Standardizing on the interfaces gives us big leverage.
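The video recorder example might look like this in Python. The interface name `IEuroConnector` is taken from the slide; its `show` method, the two TV classes and the `Vcr` class are assumptions made for this sketch.

```python
from abc import ABC, abstractmethod


class IEuroConnector(ABC):
    """Interface name from the slide; the method is an assumption."""

    @abstractmethod
    def show(self, frame: str) -> str:
        """Display one frame arriving over the connector."""


class BrandATv(IEuroConnector):
    def show(self, frame: str) -> str:
        return f"BrandA displays {frame}"


class BrandBTv(IEuroConnector):
    def show(self, frame: str) -> str:
        return f"BrandB displays {frame}"


class Vcr:
    """Depends only on the interface, never on a concrete TV set."""

    def __init__(self, output: IEuroConnector):
        self._output = output

    def play(self, tape: str) -> str:
        return self._output.show(tape)


# The application is built by composing devices from different makers.
living_room = Vcr(BrandATv())
bedroom = Vcr(BrandBTv())
```

Replacing one TV by the other requires no change to `Vcr`, which is the leverage gained by standardizing on the interface.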
Data Analysis Toolkits for LHC
Just as for the data storage/management, HEP has developed a specialised, Fortran based, analysis environment: HBOOK, PAW and CERNLIB as a whole.
These needed to be rewritten/reinvented as HEP moved to OO techniques.
– Can we instead profit from commercial data analysis tools?
OO based simulation packages are needed now, and GEANT4 is becoming a reality.
– The GEANT4 project, launched in 1994, is also a demonstration of effective worldwide collaboration on a major software project,
» and there is much interest in GEANT4 beyond HEP.
(LHC) Framework Design Choices
From the user point of view, an experiment computing framework ensures that they can write code for a specific purpose (e.g. analysis or detector reconstruction) without having to worry about anything else.
– The framework ensures that
» objects and services they need are made available, and
» any objects they create will be stored if required.
The three LHC frameworks are best distinguished by the choices they have made in two areas.
– Exposure of the persistency model for storage. Do users work with transient or persistent objects? Do users see the inheritance from the base persistence class?
– Procedure invocation. Do users themselves decide the order of invocation of a set of procedures to produce a given object? Or do they demand the object and leave the framework to decide which procedures must be invoked to produce it?
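The second option, where the user demands an object and the framework works out which procedures to invoke, can be sketched as below. `OnDemandStore` and the object names are hypothetical and not taken from any of the three frameworks.

```python
class OnDemandStore:
    """Toy demand-driven store: objects are produced the first time they are requested."""

    def __init__(self):
        self._producers = {}   # object name -> (input names, producing function)
        self._cache = {}
        self.calls = []        # order in which producers actually ran

    def register(self, name, inputs, function):
        self._producers[name] = (inputs, function)

    def get(self, name):
        if name not in self._cache:
            inputs, function = self._producers[name]
            # Recursively demand the inputs before producing this object.
            arguments = [self.get(input_name) for input_name in inputs]
            self.calls.append(name)
            self._cache[name] = function(*arguments)
        return self._cache[name]


store = OnDemandStore()
store.register("hits", [], lambda: [1, 2, 3])
store.register("tracks", ["hits"], lambda hits: [h * 10 for h in hits])
store.register("vertex", ["tracks"], lambda tracks: sum(tracks) / len(tracks))

# The user asks only for the final object; the order is decided by the store.
vertex = store.get("vertex")
```

In the first option the user would instead call the hit, track and vertex procedures explicitly, in that order.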
GAUDI (after the Catalan architect)
[Figure: GAUDI architecture. An Application Manager steers the Algorithms, which work on the Transient Event Store. The Event Data Service, Detector Data Service and Histogram Service manage the Transient Event Store, Transient Detector Store and Transient Histogram Store; each store is connected to Data Files through a Persistency Service and Converters. Also shown: Message Service, JobOptions Service, Particle Prop. Service and Other Services.]
CARF (CMS Analysis and Reconstruction Framework)
[Figure: CARF architecture. Labels: Application Framework; Physics modules (Reconstruction Algorithms, Data Monitoring, Event Filter, Physics Analysis); Calibration Objects, Event Objects and Visualization Objects; Utility Toolkit; Extension toolkit; LHC++, ODBMS, Geant4, CLHEP and PAW Successor; C++ standard library.]
AliROOT (Alice and ROOT)
[Figure: AliROOT simulation. Labels: Run Control, ROOT particle stack, hits structures, Geometry Database, Generators, Fast MC and a Virtual MC interface to Geant3.21, Geant4 and FLUKA, with the transport engine selected at run time.]
LHC Frameworks: Another Comparison
In “Object Solutions”, Booch says that there are three basic types of object oriented applications.
If they focus on … they are …
– direct visualization and manipulation of the objects that define a certain domain: user-centric
– preserving the integrity of the persistent objects in a system: data-centric
– the transformation of objects that are interesting to the system: computation-centric
Using this categorisation, we could say that
– AliROOT is user-centric
– CARF is data-centric
– GAUDI is computation-centric
Non-event data
To make sense of an event, the raw detector data is not enough. Non-event data is needed
– for the overall geometry and structure of the detector, including information about magnetic fields, and
– as they are not perfectly still, to understand the real positions of the subdetectors at the moment of the collision;
– to have the correct detector calibration at the time of the collision as detector response also changes (e.g. with temperature); and
– about the run conditions of the accelerator, e.g. beam energy, at the time of the collision.
All of these non-event data must also be stored and managed.
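One common way to manage such time-dependent data is to store each value with the time from which it is valid and to look up the entry in force at the moment of the collision. The Python sketch below assumes this scheme; `ConditionsTable`, the integer times and the gain values are invented for illustration.

```python
import bisect


class ConditionsTable:
    """Toy store of values that each become valid at a given time."""

    def __init__(self):
        self._starts = []   # validity start times, kept sorted
        self._values = []   # value valid from the matching start time

    def add(self, valid_from, value):
        index = bisect.bisect_left(self._starts, valid_from)
        self._starts.insert(index, valid_from)
        self._values.insert(index, value)

    def at(self, time):
        """Return the value whose validity started most recently before or at `time`."""
        index = bisect.bisect_right(self._starts, time) - 1
        if index < 0:
            raise LookupError("no conditions valid at this time")
        return self._values[index]


calibration = ConditionsTable()
calibration.add(0, {"gain": 1.00})
calibration.add(100, {"gain": 1.02})   # e.g. after a temperature change
calibration.add(250, {"gain": 0.99})

gain_at_collision = calibration.at(180)["gain"]
```

The same lookup would serve for alignment constants or beam energy: the event carries a time, and the table supplies what was true at that time.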
The Overall Picture
How do these different collections fit together?
Globally, then, tags point to a collection of events which are in a collection of runs, each of which has certain properties such as energy or calibration constants.
[Figure: class diagram relating Tag, Collection, Event, DataSet, Run and RunParam through one-to-many (*) associations, drawn once for each of the two storage options below.]
Everything could be in one single (object) database.
Or event data can be kept in one database with non event data kept in a different database, either object or relational.
When are objects created?
Once
– as part of some standard processing step (e.g. reconstruction) run
» for all (interesting) events in batch mode, or
» when needed for any individual event.
Many times
– Once at least! See above…
– But also as necessary if recomputing using local data is faster than fetching the existing objects from some remote system.
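The recompute-or-fetch decision comes down to comparing two time estimates. A minimal Python sketch, assuming a simple cost model (fixed latency plus size divided by bandwidth) and made-up numbers:

```python
def should_recompute(object_size_mb, bandwidth_mb_per_s, latency_s, recompute_s):
    """Return True if recomputing locally is faster than fetching remotely.

    Assumed cost model: fetch time = latency + size / bandwidth.
    """
    fetch_s = latency_s + object_size_mb / bandwidth_mb_per_s
    return recompute_s < fetch_s


# Slow wide-area link: fetching 50 MB takes 0.5 + 50 / 1 = 50.5 s, recomputing takes 20 s.
slow_link = should_recompute(50, 1, 0.5, 20)

# Fast link: fetching takes 0.5 + 50 / 100 = 1.0 s, so fetching wins.
fast_link = should_recompute(50, 100, 0.5, 20)
```

A real system would also have to consider whether the local input data and the right calibration are available, which this sketch ignores.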
Looking Forwards: Summary
LHC demands for CPU and I/O capacity significantly exceed those of the LEP experiments.
– Fortunately, experiments such as COMPASS have intermediate requirements and allow us to study the problems before LHC startup.
– CPU cost trends suggest we can afford distributed computing farms which provide adequate resources
» but we have to start installing these in 2003/2004.
Software quality is a major concern for the LHC experiments.
– Object Oriented techniques are being adopted.
– This allows us to consider the use of Object Oriented Databases for data management and other commercial packages for analysis work.
Computing at CERN: Conclusions
Computing at CERN is interesting!
Computing at CERN is about Data!
Computing facilities at CERN are essential for designing, building and operating both accelerators and detectors.
Computers, of course, play a key role in the reconstruction and analysis of the raw data collected by experiments.
There are many interesting challenges as we look forward to high data rate experiments in the next couple of years and beyond to the LHC.