
CERN

2001 Summer Student Lectures

Computing at CERN

Lecture 4 — The Grid & Software

Tony Cass — [email protected]


LHC Computing Worldwide

This picture, from the CMS CTP (Computing Technical Proposal), shows how a regional centre, here Fermilab, fits into the computing environment between CERN and universities.

It is assumed here that high-bandwidth networks are available between CERN and this US-based regional centre. However, the possibility of an air-freight link for data transfers is also indicated.

Although regional centres, in the US and elsewhere, will certainly exist, we do not yet know how best to make use of the facilities they will offer. Can we link CERN and all the regional centres into one global facility, usable from everywhere? Or do the regional centres just provide resources for their local clients?


LHC Computing Worldwide - MONARC

The MONARC Project (Models Of Networked Analysis at Regional Centres) has been set up to study these issues.

More input on the practicalities of global analysis is needed for
– the Computing Progress Reports to be produced this year by ATLAS and CMS, and maybe other experiments,
– funding agencies, especially in the US, and
– planning!


The Grid

Over the past year, the “Grid” metaphor for providing access to remote computing resources has become popular. Will the Grid bind regional centres together?
– Studies are underway in Europe and the US.

Just as a Power Grid provides transparent, on-demand access to electrical power, the Computing Grid provides transparent, on-demand access to computing facilities.


The Globus Toolkit

Providing transparent access to different computing resources requires an interface layer which hides details of
– batch systems (LSF, LoadLeveler, Condor),
– security and authentication,
– …

The Globus Toolkit has been developed as just such an interface layer and is being tested at CERN and other HEP labs.
– There’s still a long way to go, though!
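The kind of interface layer described above can be sketched as follows. This is not the Globus API; the class and method names are invented for illustration, and the backends only return strings where real ones would talk to `bsub`, Condor submit files, and so on.

```python
# Minimal sketch (not the Globus API) of an interface layer that hides
# which batch system actually runs a job. All names are hypothetical.
from abc import ABC, abstractmethod

class BatchSystem(ABC):
    """Common interface hiding LSF/LoadLeveler/Condor details."""
    @abstractmethod
    def submit(self, executable: str) -> str:
        """Submit a job and return a job identifier."""

class LSFBackend(BatchSystem):
    def submit(self, executable: str) -> str:
        return f"lsf-job:{executable}"     # a real backend would call 'bsub'

class CondorBackend(BatchSystem):
    def submit(self, executable: str) -> str:
        return f"condor-job:{executable}"  # a real backend would write a submit file

def run_anywhere(backend: BatchSystem, executable: str) -> str:
    # User code sees only the common interface, never the site's batch system.
    return backend.submit(executable)
```

The point is that the same `run_anywhere` call works at any site, regardless of which scheduler that site happens to run.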

LHC Computing Grid

[Diagram: the LHC Computing Grid hierarchy. CERN at the centre; Tier 1 regional centres in Germany, the UK, France, Italy, the Netherlands and the USA (FermiLab and Brookhaven), among others; Tier 2 centres serving labs (Lab a, Lab b, Lab c, Lab m) and universities (Uni a, Uni b, Uni n, Uni x, Uni y); and below these, physics department and desktop machines.]


Authentication — Kerberos vs PKI

Kerberos is a popular authentication and access control system.
– I prove I know something (my password) and a central server gives me a ticket to access resources.
– I have a ticket, so I just need to type my password once,
– but a central server is needed at each site.

In a Public Key system, I have a certificate signed by some trusted body which I need to show to prove who I am.
– My certificate will be accepted by anybody who trusts the organisation that signed my certificate,
– but I must protect it so you don’t steal it and use it instead! So I have to type a password or passphrase whenever I need to use the certificate.
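The Kerberos side of this comparison can be sketched in a few lines. This is a deliberately simplified toy, not the real protocol (no session keys, encryption or realms): the central server and the service share a secret, the server issues a time-limited signed ticket, and the service can check it without contacting the server again. All names and the secret are invented.

```python
# Toy sketch of the Kerberos idea: a central server (KDC) shares a secret
# with each service and issues time-limited tickets; the service verifies
# tickets locally. Real Kerberos is far more involved.
import hmac, hashlib, time

SERVICE_SECRET = b"shared-between-kdc-and-service"  # hypothetical secret

def issue_ticket(user: str, lifetime: int = 3600) -> str:
    # KDC: after the user proves knowledge of their password (not shown),
    # sign "user:expiry" with the service's secret.
    expiry = int(time.time()) + lifetime
    payload = f"{user}:{expiry}"
    mac = hmac.new(SERVICE_SECRET, payload.encode(), hashlib.sha256).hexdigest()
    return f"{payload}:{mac}"

def verify_ticket(ticket: str) -> bool:
    # Service: check the signature and expiry; no call back to the KDC.
    user, expiry, mac = ticket.rsplit(":", 2)
    payload = f"{user}:{expiry}"
    good = hmac.new(SERVICE_SECRET, payload.encode(), hashlib.sha256).hexdigest()
    return hmac.compare_digest(mac, good) and int(expiry) > time.time()
```

Note that this relies on symmetric secrets, which is exactly why a central server is needed at each site; the PKI scheme replaces the shared secret with a signature anyone holding the signer’s public key can check.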


Software Concerns for LHC

Software will throw LHC data away.
– Software (human!) errors will lose data forever. Would you take this responsibility? Can you write bug-free code?
– What would you do if you were managing the worldwide effort?

Object Oriented techniques are today’s “industry standard” and LHC experiments must impose best practice.
– There are also secondary considerations:
» widespread use of OO techniques outside HEP implies widespread availability of support tools and software, and
» OO-trained (ex-)physicists will find more employment opportunities.


Software Concerns for LHC II

Everything will change between now and 2005:
– the computing environment
» Unix vs NT;
– the programming language
» C++ vs Java;
– the “in things”
» OO vs ?; Java vs ?; what will “computers” look like in 2005?

These all changed for LEP, and those planning for LHC must take this into account.
– But maybe we’re being too worried. LEP was planned at a time when IBM mainframes and DEC minis looked invincible. The LEP experiments still coped with change.


Data and Computation for Physics Analysis

[Diagram: data flows from the detector through the event filter (selection & reconstruction) to raw data; event reconstruction produces event summary data; batch physics analysis extracts analysis objects by physics topic, which feed interactive physics analysis; event simulation feeds into the same chain.]

– Storage solutions: Zebra, Objectivity/DB, ROOT.
– Simulation packages: GEANT3, GEANT4, FLUKA.
– Experiment frameworks provide interfaces to storage and common services. HEP toolkits and packages are provided to meet common needs.
– Analysis and visualisation packages: HBOOK, PAW, ROOT, Lizard, Iguana, JAS…
– Everything is built using language standards, e.g. the STL.


OO Techniques and Data Storage/Management

HEP has added many data storage and management systems on top of Fortran,
– e.g. Zebra for data structures, FATMEN for event/file management.

With the move to OO, can HEP use OO databases for event storage and management?
– Can it be done?
– Is it efficient?

It seems the answer is yes. How do we really switch to this model?
– LHC software designers have to embrace this model of working now and work to provide optimised storage/processing environments.


Why use an Object Database?

Raw data is reconstructed to produce ESD/AOD and then interesting events are selected for further study.

In the traditional scheme this produces different data sets (raw, ESD, AOD, dst/ntuple), and going back from a high to a low level is difficult.

With an object model and an object database it is much easier to navigate between the different levels of description of an event.

[Diagram: event tags point to an event header, which links the AOD, ESD and raw data of an event; the raw data links tracker hits, the ESD tracks and the AOD particles, all managed through a bookkeeping database.]
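The navigation the slide describes can be made concrete with a small sketch. The class names mirror the diagram, but this is illustrative toy code, not any experiment’s real event model: each summary level simply holds a reference to the level below it.

```python
# Sketch of level-to-level navigation in an object model: an analysis
# holding an AOD can walk back to the ESD and raw data by following
# references, instead of re-reading separate data sets.
class RawData:
    def __init__(self, tracker_hits):
        self.tracker_hits = tracker_hits

class ESD:  # Event Summary Data, keeps a reference to the raw data
    def __init__(self, raw: RawData, tracks):
        self.raw, self.tracks = raw, tracks

class AOD:  # Analysis Object Data, keeps a reference to the ESD
    def __init__(self, esd: ESD, particles):
        self.esd, self.particles = esd, particles

raw = RawData(tracker_hits=[(1.0, 2.0), (1.5, 2.2)])
esd = ESD(raw, tracks=["track0"])
aod = AOD(esd, particles=["pi+"])

# High level to low level is just following references:
hits = aod.esd.raw.tracker_hits
```

In an object database these references survive storage, so the same navigation works on persistent events; in the traditional file-based scheme the link from an ntuple entry back to its raw event has to be rebuilt by hand.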


Why use an Object Database?

Hiding the details of the file storage is done by the database manager. An RDBMS (e.g. Oracle) also hides details of file storage, so why use an ODBMS?
– With an ODBMS, the underlying details of the I/O are hidden. The program variables are the storage variables; there is no need for explicit copying by the programmer.
– Physicists don’t set out to select all tracks of a given event. They might want to access some tracks of an event, though. This sort of access maps better onto an object database.
– An ODBMS allows applications to “suggest” that parts of an event should be stored close to each other, rather than storing all tracks close together.

But these don’t seem to be general requirements; the ODBMS market has not taken off. We need to be careful!


Databases in 2001

RDBMS vendors have been moving towards the ODBMS market for some time, introducing object-relational DBMSs. Oracle 9i, with the recently announced C++ interface, provides all the ODBMS features of the previous slide.
– You can now navigate between objects in the database.

We are now actively testing the use of Oracle 9i for physics data. Particular aspects being investigated are
– scalability,
– storage overhead,
– mass storage system integration, and
– data import/export.

Initial results are promising.


Toolkits versus Frameworks

Toolkits
– Sets of generic procedures¹ that can be invoked to perform related tasks.
– Do not constrain users (apart from parameter lists!).
– Can be provided by experiments but also by others, e.g. IT or 3rd parties.

Frameworks
– Systems that decide the order of execution,
» invoking procedures, including user procedures, to do the necessary work in a determined order.
– Constrain users to work within the overall architecture.
– Are experiment specific.

¹ Note that the word “procedure” is used here in a general sense. In terms of procedural languages, procedures are subroutines and functions. For an Object Oriented language, a procedure is a class.
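The distinction above is essentially inversion of control, and it can be shown in a few lines. All names here are invented for illustration: a toolkit is code the user calls, in an order the user chooses; a framework is code that calls the user’s procedures, in an order the framework chooses.

```python
# Toolkit style: a generic procedure the user invokes whenever they like.
def fit_track(hits):
    return ("fitted", len(hits))

# Framework style: the user registers algorithms; the framework decides
# when and in what order to invoke them (inversion of control).
class Framework:
    def __init__(self):
        self.algorithms = []

    def register(self, algorithm):
        self.algorithms.append(algorithm)

    def run(self, event):
        # The framework, not the user, fixes the order of execution.
        return [alg(event) for alg in self.algorithms]

fw = Framework()
fw.register(lambda ev: f"reconstructed {ev}")
fw.register(lambda ev: f"histogrammed {ev}")
results = fw.run("event-42")
```

This is why a toolkit can be shared across experiments while a framework is experiment specific: the framework owns the control flow, and the control flow encodes the experiment’s processing model.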


Toolkit Design

A toolkit should be
– generic, so it can be used in more than one framework;
– independent, i.e. not forcing the use of other toolkits;
– well defined, with clear interfaces, so it can be replaced.

As an example, consider a video recorder (VCR) with interfaces such as IEuroConnector, IRfInput, IInfraredInput and IUserInterface, connected to a TV set:
– Each interface is specialised in a domain.
– Interfaces are independent of concrete implementations.
– You can mix devices from several manufacturers.
– The application is built by composing.
– Standardising on the interfaces gives us big leverage.
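The VCR example can be put in code. The interface names follow the slide; the concrete “manufacturer” classes are invented for illustration. Each interface covers one domain, any implementation of an interface is interchangeable, and the application is assembled by composition.

```python
# Sketch of interface-based composition, following the VCR example.
from typing import Protocol

class IUserInterface(Protocol):
    def press_play(self) -> str: ...

class IInfraredInput(Protocol):
    def receive(self, code: int) -> str: ...

# Hypothetical devices from different manufacturers, each implementing
# exactly one interface.
class SonyFrontPanel:
    def press_play(self) -> str:
        return "playing"

class PhilipsRemoteReceiver:
    def receive(self, code: int) -> str:
        return f"ir-code {code}"

class VCR:
    # Built by composing independent interface implementations; the VCR
    # depends only on the interfaces, never on the concrete devices.
    def __init__(self, panel: IUserInterface, remote: IInfraredInput):
        self.panel, self.remote = panel, remote

vcr = VCR(SonyFrontPanel(), PhilipsRemoteReceiver())
```

Because `VCR` only names the interfaces, swapping in a different manufacturer’s front panel or receiver needs no change to the `VCR` class itself, which is exactly the leverage standardised interfaces give.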


Data Analysis Toolkits for LHC

Just as for data storage/management, HEP has developed a specialised, Fortran-based analysis environment: HBOOK, PAW and CERNLIB as a whole.

These needed to be rewritten/reinvented as HEP moved to OO techniques.
– Can we instead profit from commercial data analysis tools?

OO-based simulation packages are needed now, and GEANT4 is becoming a reality.
– The GEANT4 project, launched in 1994, is also a demonstration of effective worldwide collaboration on a major software project,
» and there is much interest in GEANT4 beyond HEP.


(LHC) Framework Design Choices

From the user’s point of view, an experiment computing framework ensures that they can write code for a specific purpose (e.g. analysis or detector reconstruction) without having to worry about anything else.
– The framework ensures that
» the objects and services they need are made available, and
» any objects they create will be stored if required.

The three LHC frameworks are best distinguished by the choices they have made in two areas:
– Exposure of the persistency model for storage. Do users work with transient or persistent objects? Do users see the inheritance from the base persistence class?
– Procedure invocation. Do users themselves decide the order of invocation of a set of procedures to produce a given object? Or do they demand the object and leave the framework to decide which procedures must be invoked to produce it?
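The second choice, demand-driven invocation, can be sketched as follows. This is toy code with invented names, not any LHC framework’s API: the user registers producers, then simply asks for an object; the framework runs whatever producers are needed, transparently resolving dependencies.

```python
# Sketch of on-demand object production: the user demands an object and
# the framework decides which registered procedures to invoke.
class OnDemandStore:
    def __init__(self):
        self._cache = {}
        self._producers = {}   # object name -> function that creates it

    def register(self, name, producer):
        self._producers[name] = producer

    def get(self, name):
        if name not in self._cache:
            # The framework, not the user, decides to run the producer.
            self._cache[name] = self._producers[name](self)
        return self._cache[name]

store = OnDemandStore()
store.register("hits", lambda s: [1, 2, 3])
store.register("tracks", lambda s: [sum(s.get("hits"))])  # needs hits first

tracks = store.get("tracks")   # triggers hit production transparently
```

The alternative design simply has the user call the producers in an order they choose; the trade-off is user control versus the framework’s ability to skip, cache or reorder work.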


GAUDI (after the Catalan architect)

[Diagram: the GAUDI architecture. An Application Manager runs Algorithms, which work on a Transient Event Store, a Transient Detector Store and a Transient Histogram Store. Each store is served by a Persistency Service using Converters to move data to and from data files. Common services (Message Service, JobOptions Service, Particle Properties Service, Histogram Service and others) are available to all algorithms.]
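The pattern in the diagram can be sketched in a few lines. This is a toy illustration, not the real GAUDI API: algorithms read and write a transient store by path and never touch files; moving data between the store and storage is the persistency service’s job.

```python
# Toy sketch of the transient-store pattern: algorithms see only the
# store, never the underlying file format or I/O.
class TransientEventStore:
    def __init__(self):
        self._data = {}

    def get(self, path):
        return self._data[path]

    def put(self, path, obj):
        self._data[path] = obj

class TrackCountingAlgorithm:
    # A framework algorithm: reads its inputs from the store and
    # publishes its output back to the store.
    def execute(self, store: TransientEventStore):
        tracks = store.get("/Event/Tracks")
        store.put("/Event/NTracks", len(tracks))

store = TransientEventStore()
# In the real framework a persistency service would fill this from a file.
store.put("/Event/Tracks", ["t1", "t2", "t3"])
TrackCountingAlgorithm().execute(store)
```

Separating algorithms from persistency this way is what lets the storage technology change (Zebra, Objectivity, ROOT, an RDBMS) without rewriting any physics code.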


CARF (CMS Analysis and Reconstruction Framework)

[Diagram: CARF layering. The application framework and physics modules (reconstruction algorithms, data monitoring, event filter, physics analysis, calibration objects, event objects, visualization objects) sit on a utility toolkit and an extension toolkit, which in turn build on LHC++ (ODBMS, Geant4, CLHEP, the PAW successor) and the C++ standard library.]


AliROOT (ALICE and ROOT)

[Diagram: AliROOT structure. Run Control drives a Virtual MC interface; the concrete transport engine (Geant3.21, Geant4, FLUKA or a fast MC) is selected at run time. Event generators, a geometry database, the ROOT particle stack and hits structures complete the picture.]
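The Virtual MC idea in the diagram is a common interface with the concrete engine chosen at run time. A minimal sketch, with invented class names and string outputs standing in for real particle transport:

```python
# Sketch of run-time selection of a transport engine behind one interface.
class TransportEngine:
    def transport(self, particle: str) -> str:
        raise NotImplementedError

class Geant3Engine(TransportEngine):
    def transport(self, particle: str) -> str:
        return f"geant3:{particle}"

class FlukaEngine(TransportEngine):
    def transport(self, particle: str) -> str:
        return f"fluka:{particle}"

ENGINES = {"geant3": Geant3Engine, "fluka": FlukaEngine}

def make_engine(name: str) -> TransportEngine:
    # Run-time selection: the rest of the simulation code is unchanged
    # whichever engine is picked here (e.g. from a job configuration).
    return ENGINES[name]()

engine = make_engine("fluka")
```

Because geometry, generators and hit structures all talk to `TransportEngine`, comparing Geant3, Geant4 and FLUKA on the same detector becomes a one-line configuration change.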


LHC Frameworks: Another Comparison

In “Object Solutions”, Booch says that there are three basic types of object oriented applications. If they focus on
– direct visualization and manipulation of the objects that define a certain domain, they are user-centric;
– preserving the integrity of the persistent objects in a system, they are data-centric;
– the transformation of objects that are interesting to the system, they are computation-centric.

Using this categorisation, we could say that
– AliROOT is user-centric,
– CARF is data-centric, and
– GAUDI is computation-centric.


Non-event data

To make sense of an event, the raw detector data is not enough. Non-event data is needed
– for the overall geometry and structure of the detector, including information about magnetic fields;
– to understand the real positions of the subdetectors at the moment of the collision, as they are not perfectly still;
– to have the correct detector calibration at the time of the collision, as detector response also changes (e.g. with temperature); and
– about the run conditions of the accelerator, e.g. beam energy, at the time of the collision.

All of these non-event data must also be stored and managed.
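A standard way to manage such conditions data is to attach an interval of validity to each value and look it up by the event’s timestamp. The sketch below is illustrative only; the class name and the integer timestamps are invented.

```python
# Sketch of a conditions database: each calibration carries a validity
# start time, and a lookup returns the value valid at the event's time.
import bisect

class ConditionsDB:
    def __init__(self):
        self._starts = []   # sorted interval start times
        self._values = []

    def store(self, valid_from: int, value):
        i = bisect.bisect(self._starts, valid_from)
        self._starts.insert(i, valid_from)
        self._values.insert(i, value)

    def lookup(self, event_time: int):
        # Latest entry whose validity started at or before the event.
        i = bisect.bisect_right(self._starts, event_time) - 1
        if i < 0:
            raise KeyError("no calibration valid at this time")
        return self._values[i]

db = ConditionsDB()
db.store(0, {"gain": 1.00})
db.store(100, {"gain": 1.02})   # detector response drifted, e.g. with temperature
gain_at_150 = db.lookup(150)["gain"]
```

The same interval-of-validity scheme covers alignment, calibration and accelerator run conditions alike, which is why they can share one storage and management system.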


The Overall Picture

Globally, then, tags point to a collection of events which are in a collection of runs, each of which has certain properties such as energy or calibration constants.

[Diagram: two schemas, each with a Tag Collection pointing to Events, which belong to a Data Set and to Runs with Run Parameters.]

Everything could be in one single (object) database. Or event data can be kept in one database with non-event data kept in a different database, either object or relational.

How do these different collections fit together?


When are objects created?

Once
– as part of some standard processing step (e.g. reconstruction) run
» for all (interesting) events in batch mode, or
» when needed for any individual event.

Many times
– Once at least! See above…
– But also as necessary, if recomputing using local data is faster than fetching the existing objects from some remote system.
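The “many times” case boils down to a cost comparison. A minimal sketch, with invented names and cost figures, of the decision between recomputing locally and fetching a stored copy:

```python
# Sketch of the recompute-vs-fetch decision for a derived object.
def obtain(name, recompute_cost, fetch_cost, recompute, fetch):
    # Pick whichever path delivers the object faster.
    if recompute_cost < fetch_cost:
        return recompute(name)
    return fetch(name)

result = obtain(
    "tracks-evt7",
    recompute_cost=2.0,     # hypothetical seconds of local CPU
    fetch_cost=8.0,         # hypothetical seconds over the wide-area network
    recompute=lambda n: f"recomputed {n}",
    fetch=lambda n: f"fetched {n}",
)
```

In a Grid setting the fetch cost depends on where the object is stored, so the same object may sensibly be recomputed at one site and fetched at another.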


Looking Forwards — Summary

LHC demands for CPU and I/O capacity significantly exceed those of the LEP experiments.
– Fortunately, experiments such as COMPASS have intermediate requirements and allow us to study the problems before LHC startup.
– CPU cost trends suggest we can afford distributed computing farms which provide adequate resources,
» but we have to start installing these in 2003/2004.

Software quality is a major concern for the LHC experiments.
– Object Oriented techniques are being adopted.
– This allows us to consider the use of Object Oriented Databases for data management and other commercial packages for analysis work.


Computing at CERN — Conclusions

Computing at CERN is interesting! Computing at CERN is about data!

Computing facilities at CERN are essential for designing, building and operating both accelerators and detectors.

Computers, of course, play a key role in the reconstruction and analysis of the raw data collected by experiments.

There are many interesting challenges as we look forward to high-data-rate experiments in the next couple of years and beyond to the LHC.

