COMPUTING IN 2004:
PHYSICS DATA CHALLENGE III
Massimo Masera ([email protected])
Commissione Scientifica Nazionale I, 22 June 2004
Outline
• The framework: status of AliRoot for the Physics Data Challenge III (PDC III)
• The production environment: AliEn
• AliEn as a meta-Grid: the use of LCG and Grid.It in the PDC III
• Current status of the PDC III
  – Phase I is finished
  – Phase II is starting
  – Phase III
• Conclusions
ALICE Physics Data Challenges (plan as of 2003)

Period (milestone) | Fraction of the final capacity | Physics objective
06/01-12/01 | 1% | pp studies; reconstruction of TPC and ITS
06/02-12/02 | 5% | First test of the complete chain from simulation to reconstruction for the PPR; simple analysis tools; digits in ROOT format
01/04-06/04 | 10% | Complete chain used for trigger studies; prototype of the analysis tools; comparison with parameterised MonteCarlo; simulated raw data
01/06-06/06 | 20% | Test of the final system for reconstruction and analysis
AliRoot
AliRoot layout
[Diagram: AliRoot layout. AliRoot is built on top of ROOT. The STEER module holds AliSimulation, AliReconstruction, AliAnalysis and the ESD. Event generators (HIJING, MEVSIM, PYTHIA6, HBTP, HBTAN, ISAJET, DPMJET) are interfaced through EVGEN; the transport codes G3, G4 and FLUKA are interfaced through the Virtual MC. Detector modules: ITS, TPC, TRD, TOF, PHOS, EMCAL, ZDC, RICH, PMD, CRT, FMD, MUON, START, plus RALICE and STRUCT. AliEn is the production environment and the interface with the world.]
Simulation and Reconstruction in AliRoot
• In the present data challenge, simulation and reconstruction are steered by two classes: AliSimulation and AliReconstruction
  – Simple user interface:

    AliSimulation sim;
    sim.Run();
    AliReconstruction rec;
    rec.Run();

  – Goal: run the standard simulation and reconstruction for all detectors
• This is the simplest example:
  – the number of events and the config file can be set
  – merging and region of interest are also implemented
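A minimal ROOT macro sketch of this interface is shown below. The config-file constructor argument and the event count passed to Run() reflect the steering classes described in this talk, but the exact signatures and the "Config.C" name should be read as assumptions for illustration.

```cpp
// Minimal sketch (to be run as an aliroot macro). The constructor argument and
// the event count passed to Run() are assumptions based on the
// AliSimulation/AliReconstruction interface described in this talk.
void runSimRec(Int_t nEvents = 5)
{
  AliSimulation sim("Config.C");   // assumed: generator/geometry config file
  sim.Run(nEvents);                // simulate and digitize nEvents events

  AliReconstruction rec;
  rec.Run();                       // standard reconstruction for all detectors
}
```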
Event Summary Data (ESD)
• The AliESD class is essentially a container for data: no functions for the analysis
• It is the result of the reconstruction carried out systematically via batch/Grid jobs
• It aims to be the starting point for the analysis
• At reconstruction time, it can be used to exchange information among different rec. steps
Event Summary Data (ESD)
[Diagram: the ITS stand-alone tracker, the ITS vertexers and the combined ITS, TPC and TRD trackers, together with TOF, PHOS and MUON, fill the ESD, which is written to a file.]

The following detectors are currently contributing to the ESD: ITS, TPC, TRD, TOF, PHOS and MUON.
The ESD structure is sufficient for the following "kinds of physics": strangeness, charm, HBT and jets (the ones to be tried in the DC2004). All the objects stored in the ESD are accessed via abstract interfaces (i.e. they do not depend on sub-detector code).
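As an illustration of this design, a minimal analysis macro only needs the AliESD interface and never touches sub-detector code. The sketch below assumes the ESD objects are written to a ROOT tree; the file, tree and branch names are assumptions, not necessarily the PDC III conventions.

```cpp
// Hedged analysis sketch: only AliESD is used, no sub-detector classes.
// File, tree and branch names are assumptions for illustration.
void countEsdTracks(const char* fileName = "AliESDs.root")
{
  TFile* file = TFile::Open(fileName);
  TTree* tree = (TTree*)file->Get("esdTree");

  AliESD* esd = 0;
  tree->SetBranchAddress("ESD", &esd);   // the ESD object of each event

  for (Long64_t iEvent = 0; iEvent < tree->GetEntries(); iEvent++) {
    tree->GetEntry(iEvent);
    printf("event %lld: %d reconstructed tracks\n",
           iEvent, esd->GetNumberOfTracks());
  }
  file->Close();
}
```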
Example: primary vertex
[Class diagram: AliVertexer and AliESDVertex live in the STEER directory (interfaces and sub-detector-independent code). AliITSVertexer, in the ITS directory, derives from AliVertexer and has four concrete implementations: AliITSVertexerIons (3-D information for central Pb-Pb events), AliITSVertexerZ (p-p and peripheral events, new code), AliITSVertexerFast (just a Gaussian smearing of the generated vertex) and AliITSVertexerTracks (high-precision vertexer with reconstructed tracks, e.g. pp D0). AliReconstruction obtains the vertexer through AliReconstructor::CreateVertexer, implemented by AliITSReconstructor::CreateVertexer; the resulting AliESDVertex is stored in the AliESD.]
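The pattern in the diagram can be pictured with a small, self-contained C++ sketch: the steering layer deals only with abstract base classes, while the detector library decides which concrete vertexer to instantiate. The classes below are simplified stand-ins, not the real AliRoot implementations.

```cpp
// Self-contained toy version of the CreateVertexer factory pattern.
// The classes are simplified stand-ins for AliVertexer, AliESDVertex,
// AliReconstructor, etc.; they are not the real AliRoot code.
#include <cstdio>

struct ESDVertex { double z; };                      // result stored in the ESD

struct Vertexer {                                    // abstract interface (STEER)
  virtual ~Vertexer() {}
  virtual ESDVertex FindVertex() = 0;
};

struct ITSVertexerZ : public Vertexer {              // concrete ITS code (p-p case)
  virtual ESDVertex FindVertex() { ESDVertex v; v.z = 0.12; return v; }
};

struct Reconstructor {                               // abstract factory (STEER)
  virtual ~Reconstructor() {}
  virtual Vertexer* CreateVertexer() = 0;
};

struct ITSReconstructor : public Reconstructor {     // detector-specific factory
  virtual Vertexer* CreateVertexer() { return new ITSVertexerZ; }
};

int main() {
  Reconstructor* rec = new ITSReconstructor;         // chosen by the steering code
  Vertexer* vertexer = rec->CreateVertexer();        // steering sees only the interface
  ESDVertex vtx = vertexer->FindVertex();
  std::printf("primary vertex z = %.2f cm\n", vtx.z);
  delete vertexer;
  delete rec;
  return 0;
}
```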
AliRoot: present situation
• Major changes in the last year…
  – New multi-file I/O in full production
  – New coordinate system
  – New reconstruction and simulation "drivers" (the AliSimulation and AliReconstruction classes)
  – First attempt at the ESD and analysis framework
  – Improvements in reconstruction and simulation
• … However, the system is evolving
  – ESD: the philosophy is still evolving
  – Introduction of FLUKA and of the new geometrical modeller
  – Development of the analysis framework
  – Raw data for all the detectors (already available for ITS and TPC)
  – Introduction of the condition database infrastructure
AliEn and the Grid
The ALICE Production Environment: AliEn

• Standards are now emerging for the basic building blocks of a GRID
  – There are millions of lines of code in the OS domain dealing with these issues
• Why not use these to build the minimal GRID that does the job?
  – Fast development of a prototype, no problem in exploring new roads, restarting from scratch, etc.
  – Hundreds of users and developers
  – Immediate adoption of emerging standards
• An example: AliEn, by ALICE (5% of the code developed, 95% imported)
[Diagram: AliEn architecture, with the components grouped into external software, low-level AliEn core components & services, and high-level interfaces. Components shown: DBI, DBD, RDBMS (MySQL), LDAP, Perl Core, Perl Modules, External Libraries, File & Metadata Catalogue, SOAP/XML, CE, SE, Logger, Database Proxy, Authentication, RB, User Interface, ADBI, Config Mgr, Package Mgr, Web Portal, User Application, API (C/C++/perl), CLI, GUI, FS, V.O. Packages & Commands.]
AliEn Timeline
[Timeline, 2001-2005: Start; Functionality + Simulation, with the first production (distributed simulation); Interoperability + Reconstruction, with the Physics Performance Report (mixing & reconstruction); Performance, Scalability, Standards + Analysis, with the 10% Data Challenge (analysis).]
From AliEn to a Meta-Grid

• The Workload Management is "pull-model": a server holds a master queue of jobs, and it is up to the CE that provides the CPU cycles to call it and ask for a job
• The system is integrated with a large-scale job submission and bookkeeping system "tuned" for Data Challenge productions, with job splitting, statistics, pie charts, automatic resubmissions, etc.
• The Job Monitoring model requires no "sensors" installed on the WN: it is the job wrapper itself that talks to the server
• Several Grid infrastructures are (becoming) available: LCG, Grid.It, possibly others
• Lots of resources but, in principle, different middlewares
• The pull model is well suited for implementing higher-level submission systems, since it does not require knowledge about the periphery, which may be very complex (see the sketch below)
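The sketch below is a toy, self-contained illustration of the pull model: the server only keeps the master queue, and it is the computing element that calls in to fetch work. The class and method names are invented for the example; this is not AliEn code.

```cpp
// Toy pull-model sketch: the server holds the master queue; a CE with free
// CPU cycles polls it for a job. Names are illustrative, not AliEn code.
#include <cstdio>
#include <queue>
#include <string>

struct Job { int id; std::string jdl; };

class MasterQueue {                       // kept centrally by the server
public:
  void Submit(const Job& job) { fJobs.push(job); }
  bool PullJob(Job& job) {                // called by the CE, never pushed by the server
    if (fJobs.empty()) return false;
    job = fJobs.front();
    fJobs.pop();
    return true;
  }
private:
  std::queue<Job> fJobs;
};

int main() {
  MasterQueue server;
  Job sim = {1, "sim.jdl"};
  Job rec = {2, "rec.jdl"};
  server.Submit(sim);
  server.Submit(rec);

  Job job;                                // a CE providing CPU cycles asks for work
  while (server.PullJob(job))
    std::printf("CE runs job %d (%s)\n", job.id, job.jdl.c_str());
  return 0;
}
```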
From AliEn to a Meta-Grid

Design strategy:
• Use AliEn as a general front-end
  – Owned and shared resources are exploited transparently
• Minimize points of contact between the systems
  – No need to reimplement services etc.
  – No special services required to run on remote CEs/WNs
• Make full use of the provided services: Data Catalogues, scheduling, monitoring…
  – Let the Grids do their jobs (they should know how)
• Use high-level tools and APIs to access Grid resources
  – Developers put a lot of abstraction effort into hiding the complexity and shielding the user from implementation changes
Available resources for PDC III

• Several AliEn "native" sites (some rather large)
  – Bari, CERN, CNAF, Catania, Cyfronet, FZK, JINR, LBL, Lyon, OSC, Prague, Torino
• LCG-2 core sites
  – CERN, CNAF, FZK, NIKHEF, RAL, Taiwan (more than 1000 CPUs)
  – At CNAF and Catania, the same resources can be accessed either by LCG/Grid.It or by AliEn
• GRID.IT sites
  – LNL.INFN, PD.INFN and several smaller ones (about 400 CPUs, not including CNAF)
• Implementation: LCG resources are managed through a "gateway", an AliEn client (CE+SE) sitting on top of an LCG User Interface (illustrated below)

The whole of LCG computing is seen as a single, large AliEn CE associated with a single, large SE.
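A schematic way to picture the gateway is a single computing-element interface with two implementations: one for a native AliEn site and one that forwards the job to the LCG Resource Broker through an LCG User Interface. The sketch below is purely illustrative; the class names are invented and only the general idea of the LCG-2 submission step is shown.

```cpp
// Illustrative gateway sketch (not AliEn code): the rest of the system sees
// one CE interface, whether the job goes to a local batch system or to LCG.
#include <cstdio>
#include <string>

struct ComputingElement {
  virtual ~ComputingElement() {}
  virtual void Submit(const std::string& jdl) = 0;
};

struct NativeCE : public ComputingElement {          // an AliEn "native" site
  virtual void Submit(const std::string& jdl) {
    std::printf("submitting %s to the local batch system\n", jdl.c_str());
  }
};

struct LCGGatewayCE : public ComputingElement {      // AliEn client on top of an LCG UI
  virtual void Submit(const std::string& jdl) {
    // On a real LCG UI this step would hand the JDL to the Resource Broker
    // (via the LCG-2 job-submission command); here it is only printed.
    std::printf("forwarding %s to the LCG Resource Broker\n", jdl.c_str());
  }
};

int main() {
  NativeCE torino;
  LCGGatewayCE lcg;
  torino.Submit("sim-torino.jdl");
  lcg.Submit("sim-lcg.jdl");
  return 0;
}
```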
Software installation

• Both AliEn and AliRoot are installed via LCG jobs
  – Do some checks, download the tarballs, uncompress, build the environment script and publish the relevant tags
  – A single command is available to get the list of available sites, send the jobs everywhere and wait for completion. A full update on LCG-2 + GRID.IT (16 sites) takes ~30'
  – Manual intervention is still needed at a few sites (e.g. CERN/LSF)
  – Ready for integration into the AliEn automatic installation system
• Misconfiguration of the experiment software shared area caused most of the trouble in the beginning
[Diagram: installAliEn.sh/installAliEn.jdl and installAlice.sh/installAlice.jdl jobs are submitted from an LCG-UI to the sites (NIKHEF, Taiwan, RAL, CNAF, TO.INFN, …).]
[Diagram: the user submits jobs to the AliEn server, which dispatches them through its catalogue either to the native AliEn CEs/SEs or, through an LCG UI and the LCG RB, to the LCG CEs/SEs. The LCG catalogue maps LCG LFNs to LCG PFNs; an LCG LFN is registered in the AliEn catalogue as an AliEn PFN (LCG LFN = AliEn PFN).]

AliEn, Genius & EDG/LCG
• LCG-2 is one CE of AliEn, which integrates LCG and non-LCG resources
  – If LCG-2 can run a large number of jobs, it will be used heavily
  – If LCG-2 cannot do that, AliEn selects other resources, and it will be used less
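The two-level name resolution shown in the diagram can be pictured with a toy example: the AliEn catalogue stores, as the "physical" name of an entry, the LCG logical name (or GUID), and the LCG catalogue then resolves that to the real storage location. All names below are invented for the illustration.

```cpp
// Toy illustration of "LCG LFN = AliEn PFN"; all names are invented.
#include <cstdio>
#include <map>
#include <string>

int main() {
  std::map<std::string, std::string> alienCatalogue;  // AliEn LFN -> AliEn PFN
  std::map<std::string, std::string> lcgCatalogue;    // LCG  LFN -> LCG  PFN

  alienCatalogue["/alice/sim/2004/event1.root"] = "lfn:alice-guid-0001";
  lcgCatalogue["lfn:alice-guid-0001"] = "srm://castor.example.ch/alice/0001";

  std::string alienPfn = alienCatalogue["/alice/sim/2004/event1.root"];
  std::printf("AliEn PFN (= LCG LFN): %s\n", alienPfn.c_str());
  std::printf("resolved by LCG to:    %s\n", lcgCatalogue[alienPfn].c_str());
  return 0;
}
```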
Physics Data Challenge III
PDC 3 schema

[Diagram: production of RAW at CERN and at the Tier1/Tier2 centres; shipment of RAW to CERN; reconstruction of RAW in all T1-2's; analysis. AliEn provides the job control and the data transfer.]
Phases of ALICE Physics Data Challenge 2004
• Phase 1: production of underlying events using heavy-ion MC generators
  – Status: completed
• Phase 2: mixing of signal events into the underlying events
  – Status: starting
• Phase 3: analysis of signal+underlying events
  – Goal: to test the data analysis model of ALICE
  – Status: will begin in ~2 months
[Diagram: a signal-free underlying event is combined with the signal in the merging step, giving the mixed event.]
Statistics for phase 1 of ALICE PDC 2004

• Number of jobs:
  – Central 1 (long, 12 hours): 20 K
  – Peripheral 1 (medium, 6 hours): 20 K
  – Peripheral 2 to 5 (short, 1 to 3 hours): 16 K
• Number of files:
  – AliEn file catalogue: 3.8 million (no degradation in performance observed)
  – CERN Castor: 1.3 million
• File size:
  – Total: 26 TB
• CPU work:
  – Total: 285 MSI-2K hours
  – LCG: 67 MSI-2K hours
Phase I: 1 Pb-Pb event

[Screenshot of the AliEn catalogue: each Pb-Pb event corresponds to 36 files.]
• Phase 1 resource statistics:
  – 27 production centres, 12 major producers, no single site dominating the production
  – The individual contribution of the sites not displayed is at the level of Bari's
  – See slide 28 for a comparison between AliEn and LCG sites
  – Italian contribution > 40%
• Phase 1 CPU profile:
  – Aiming for sustained running (as allowed by resource availability); average 450 CPUs, maximum 1450 CPUs (not appearing in the plot due to binning)

[Plot of running jobs vs. time; annotations mark the 1,000,000-files point and a Castor problem.]
Problems with Phase I

• Two months of delay, mainly due to a delayed release of LCG-2
• No SE in LCG-2 + poor storage availability at LCG sites
  – Natural solution in PDC Phase I: all files migrated to Castor@CERN
• Castor-related problems
  – Initial lack of storage w.r.t. requests (30 TB… not yet available)
  – A 300,000-file limit above which the system performance dropped
  – Server reinstallation in March
• LCG: most of the problems are related to the configuration of the sites
  – Software management tools are still rudimentary
  – Large sites often have tighter security restrictions & other idiosyncrasies
  – Investigating and fixing problems is hard and time-consuming
• The most difficult part of the management is monitoring LCG through a "keyhole"
  – Only integrated information is available natively: MonALISA for AliEn, GridICE for LCG
LCG / AliEn

• Statistics after round 1 (ended April 4): job distribution (LCG 46%)
  – Alice::CERN::LCG is the interface to LCG-2
  – Alice::Torino::LCG is the interface to GRID.IT
• In the 2nd round AliEn was used more, because the lack of storage caused continuous stop/start of the production
SITUATION AT THE END OF ROUND 1
Phase 2 layout

[Diagram: the user submits jobs to the AliEn server; they run on the AliEn CE/SE or, through an LCG-UI and the LCG RB, on the LCG CE/SE. Files are copied and registered with edg-copy-and-register, with the output stored in CERN Castor. LCG LFN = AliEn PFN (lcg://host/<GUID>).]

Phase 2 is about to start: mixing of the underlying events with signal events (jets, muons, J/ψ).
We plan to fully use the LCG DM tools; we may have problems of storage at the local SEs.
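For the mixing step itself, a hedged AliRoot macro sketch is given below: the signal configuration is simulated and merged with pre-produced underlying events. The MergeWith call reflects the AliRoot merging facility mentioned earlier in this talk, but the exact signature, file names and the number of signal events per underlying event are assumptions.

```cpp
// Hedged mixing sketch (aliroot macro). The MergeWith signature, the file
// names and the paths are assumptions for illustration.
void mixSignal(Int_t nEvents = 10)
{
  AliSimulation sim("Config.C");                 // config describing the signal
  sim.MergeWith("underlying/galice.root", 1);    // assumed: background file, signal events per underlying event
  sim.Run(nEvents);                              // produces mixed signal+underlying digits
}
```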
Problems with Phase II
• Phase II will generate lots (1M) of rather small (~7 MB) files
• We would need an extra stager at CERN, but this is not available at the moment
• We could use some TB of disk space, but this too is not available
• We are testing a plug-in to AliEn using tar to bunch small files
• The space available on the local LCG storage elements seems very low… we will see
• Preparation of the LCG-2 JDL is more complicated, due to the use of the data management features
• This has introduced a two-week delay; we hope to start soon!
Phase 3 layout
[Diagram: a user query goes to the AliEn server; the catalogues resolve it into lists of LFNs (lfn 1-3, lfn 4-6, lfn 7-8) that are processed on the AliEn CE/SE and, through an LCG-UI and the LCG RB, on the LCG CEs/SEs.]

Phase 3, foreseen in two months: analysis of signal+underlying events, to test the data analysis model of ALICE.
AliEn job-splitting tests with ARDA in September…
…ARDA workshop today at CERN
ARDA, EGEE, gLite, LCG…
• ARDA was an RTAG (September 2003) devoted to analysis:
  – it found AliEn "the most complete system among all considered"
  – it became an LCG project. Setting-up meeting: January 2004
• ARDA is interfaced to the EGEE middleware (gLite), disclosed on May 18th. A prototype with the EGEE MW is due by September 2004
• gLite is presently based on the AliEn shell, the Wisconsin CE, the Globus gatekeeper, VOMS, GAS, …
• Next steps (F. Hemmer, PEB, June 7th): integration of R-GMA and EDG-WMS (developed by INFN)
• Support of LCG-2 will be maintained until EGEE satisfies the requirements of the experiments (PEB, June 7th)
• This picture seems to have been reversed now (EGEE all-activities meeting, June 18th):
  – LCG-2 will evolve into LCG-3, focused on production
  – gLite will evolve in parallel and will be focused on development and analysis
• These parallel evolutions will occasionally converge: gLite components will be merged into the LCG-x MW as soon as they are completed
• This major change within the LCG project occurred abruptly, without prior discussion at PEB and SC2 level and without the approval of the experiments
• ALICE will examine the current situation in its next offline week (start: June 28th)
Conclusions
We will have an additional DC
The difficult start of the ongoing DC taught a lesson:
• We cannot stay 18 months without testing our "production capabilities"
• In particular we have to maintain the readiness of
  – Code (AliRoot + MW)
  – ALICE distributed computing facilities
  – LCG infrastructure
  – Human "production machinery"
• Getting all the partners into "production mode" was a non-negligible effort
• We have to plan carefully the size and physics objectives of this data challenge
ALICE Physics Data Challenges

Period (milestone) | Fraction of the final capacity | Physics objective
06/01-12/01 | 1% | pp studies; reconstruction of TPC and ITS
06/02-12/02 | 5% | First test of the complete chain from simulation to reconstruction for the PPR; simple analysis tools; digits in ROOT format
01/04-06/04 | 10% | Complete chain used for trigger studies; prototype of the analysis tools; comparison with parameterised MonteCarlo; simulated raw data
05/05-07/05 (NEW) | TBD | Refinement of jet studies; test of new infrastructure and MW; TBD
01/06-06/06 | 20% | Test of the final system for reconstruction and analysis
ALICE Offline Timeline
[Timeline, 2004-2006: ALICE PDC04; PDC04 analysis; design of new components; development of new components; PDC06 preparation and Pre-challenge '06; PDC06; final development of AliRoot; preparation for first data taking. Milestones: PDC04, PDC06 AliRoot ready, PDC06.]
Conclusions

• Several problems and difficulties… However, our DC is progressing and Phase I is concluded
• The DC is completely carried out on the Grid
• AliEn
  – Tools are OK for DC running and resource control
  – Feedback from the CEs and WNs proved to be essential for early spotting of problems
  – Centralized and compact master services allow for fast upgrades
  – DM was working just fine (provided that the underlying MSS systems work well)
  – The file catalogue works great: 4M entries and no noticeable performance degradation
• AliEn as a meta-grid works well, across three grids, and this is a success in itself
• The INFN contribution to the DC and to the grid activities of the experiment is relevant
  – >40% of CPU cycles provided by INFN sites. The efficiency was very high and the cooperation of the site managers was prompt
  – The interface between AliEn and LCG/Grid.It has been developed in Italy
• We are going to use the LCG SEs for Phase II…
• Possible bottleneck for Phase II: lack of local storage resources
• Analysis:
  – AliEn job splitting
  – We hope to test the first ARDA prototype in the Fall