STEP’09: LAST CHALLENGE BEFORE DATA TAKING
Patricia Méndez Lorenzo (CERN, IT/GS)
ALICE Offline Week, Grid Status & Experience session
CERN, 24/06/09
OUTLOOK
CCRC’08: Reminder
What is STEP’09: Origin
STEP’09 for ATLAS, CMS and LHCb
STEP’09 for ALICE: Goals and Results
Summary and Conclusions
WHAT IS STEP’09: CCRC’08 REMINDER
WLCG Common Computing Readiness Challenge 2008 (CCRC’08)
It was the first large WLCG Service Challenge involving all 4 experiments together
Proposed by CMS and ATLAS during the pre-CHEP WLCG workshop in Victoria (2007)
Goal: measure the readiness of the Grid services and operations before the real data taking
Complementary to the experiments’ Full Dress Rehearsals
Run in two phases: February and May 2008
ALICE RESULTS DURING THE CCRC’08
Slides taken from the CCRC’08 Post-Mortem WS at CERN (June 2008)
WHAT IS STEP’09: ORIGIN
WLCG Scale Testing for the Experiment Program at the WLCG, 2009: STEP’09
Proposed by CMS during the WLCG pre-CHEP workshop in Prague (2009); scheduled for June 2009
Similar scope to CCRC’08, with special emphasis on data management (data recording, MSS behaviour and transfers)
STEP’09 post-mortem in July
All experiments presented their programs during the WLCG GDB in April 2009
STEP’09 FOR CMS

Test: T0 multi-VO tape recording at full rate
• Test writing to tape at the T0 at full data-taking rate, overlapping with the other VOs
• Sustained writing to tape for several days

Test: T1 archiving and processing at the requested scale
• Stress tests of the MSS at scale with all concurrent T1 workflows (pre-staging especially relevant)

Test: transfer tests at the requested scale
• Special emphasis on T1-T1 tests
• Transfer tests can easily be run among any Tiers in parallel with other VOs to evaluate overlap (not needed by CMS)

Test: analysis at commissioned T2s at the requested scale
• CMS should run analysis at a scale that uses all pledged T2 resources
In common with ALICE
STEP’09 FOR ATLAS

Test: DDM functional tests
• Tests the full ATLAS data placement model, including tape (RAW) writing
• ATLAS ready to create nominal load and file sizes
• T0-T1 average rate: 940 MB/s
• Calibration data distribution also foreseen

Test: simulation production
• G4 HITS production
• HITS production at T2s and upload to T1s
• HITS merging at T1s and archival on tape
• MC reconstruction at T1s only
• Pre-staging of merged HITS from tape
• Output AODs merged to tape and distributed to the other clouds

Test: repeat cosmic-ray data re-processing
• RAW pre-staging from tape and data access from the WNs

Test: run HammerCloud in all clouds
• Loads CPU capacity at the T2s
• Tests data access from the WNs
In common with ALICE
STEP’09 FOR LHCB
Participation in STEP’09 as part of their specific Full Experiment System Test (FEST’09)

LHCb goals:
• Data injection into the HLT farm (the file size can be tuned)
• Distribution to the T1 sites, using the standard shares
• Reconstruction at the T1 sites (long enough queues at the sites are needed)

Storage requirements:
• 3.5 TB/day for RAW (T1D0) at the Tier-0
• < 1 TB/day for RAW at the Tier-1s
In common with ALICE
STEP’09 FOR ALICE
Grid activities
• Replication T0->T1: planned together with cosmics data taking, or a repeat of the CCRC’08 exercise with the same rates (100 MB/s) and the same destinations (all T1 sites)
• Re-processing with data recalls from tape at the T1s: a highly desirable exercise; the data are already available on the T1 MSS storage

Non-Grid activities
• Transfer rate tests from DAQ@PIT to CASTOR: validation of the new CASTOR and xrootd for RAW; critically dependent on the availability of CASTOR v2.1.8
• Transfer rate test coupled with the 1st-pass reco@T0
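For scale, the quoted 100 MB/s aggregate replication rate implies a daily volume that can be checked with one line of arithmetic; the even split across the 6 T1 sites below is an illustrative assumption (real shares follow per-site pledges):

```python
# Daily volume implied by the CCRC'08-style aggregate rate of 100 MB/s (decimal units).
RATE_MB_S = 100                  # aggregate T0 -> T1 rate from the slide
SECONDS_PER_DAY = 86_400

daily_tb = RATE_MB_S * SECONDS_PER_DAY / 1e6   # MB -> TB
print(f"{daily_tb:.2f} TB/day replicated")      # -> 8.64 TB/day

# Hypothetical even split over the 6 T1 sites (assumption for illustration only)
per_t1_tb = daily_tb / 6
print(f"{per_t1_tb:.2f} TB/day per T1")         # -> 1.44 TB/day
```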
ALICE NON-GRID ACTIVITIES
RAW data transfers from PIT to CASTOR: basically validated
• The goal was 1.25 GB/s for one week (just finished)
• DAQ managed to fill the entire alicedisk pool (850 TB)

Validation and feedback of CASTOR v2.1.8 and xrootd: very positive results
• The xrootd copy P2->disk is basically validated
• The second part is the disk->tape copy (to a recyclable pool of tapes) at the same speed of 1.25 GB/s (this is the full Pb+Pb rate); activity still ongoing

Pass 1 reconstruction of RAW data at the T0: still pending
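As a sanity check (decimal units assumed throughout), one week at the 1.25 GB/s goal amounts to just under the size of the alicedisk pool, consistent with DAQ having filled it:

```python
# One week of sustained writing at the Pb+Pb full rate vs. the alicedisk pool size.
RATE_GB_S = 1.25        # goal rate from the slide
WEEK_S = 7 * 86_400     # seconds in one week
POOL_TB = 850           # alicedisk pool capacity from the slide

written_tb = RATE_GB_S * WEEK_S / 1000   # GB -> TB
print(f"{written_tb:.0f} TB written in one week")  # -> 756 TB
assert written_tb <= POOL_TB  # fits within (and nearly fills) the 850 TB pool
```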
ALICE GRID ACTIVITIES: RESULTS
ALICE began the STEP’09 exercise on the 1st of June and finished it on the 18th of June
Production results:
• New record of 15000 concurrent jobs by the 1st of June
(Plot annotation: new MC cycle)
PROBLEMS FACED BY ALICE: PRODUCTION
Instabilities with the CREAM-CE system at CERN
• The system faced instabilities for some days, fully affecting the production by the 17th of June (both CREAM-CE services down)
• This morning the system came back into production

A power cut on the 18th of June
• voalice03 (the CREAM VOBOX) could not be recovered
• In addition, the VOBOXes will be out of warranty at the end of the year; 4 VOBOXes have been requested (2 production, 2 backup)

New site entered production: CESGA (Santiago de Compostela, Spain)
• 800 jobs submitted for 29 CPUs
• The site was reporting 0 running/waiting jobs through VOview
• ALICE has changed its query to the information system, now based on VOview
ALICE FTS TRANSFERS
General result: a very successful exercise during the whole STEP’09 period
• New FTD module in production
• During the whole period the 6 T1 sites were available, with few issues, always solved within the day
• Very good support from the FTS experts during the whole period
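At its core the FTD-to-FTS hand-off amounts to submitting bulk jobs of source/destination SURL pairs. A minimal sketch of rendering such a bulk list, one `source destination` pair per line (the format accepted by the glite FTS bulk-submission option, to the best of my recollection); the host names and paths are made up:

```python
def fts_bulk_file(pairs: list[tuple[str, str]]) -> str:
    """Render source/destination SURL pairs, one space-separated pair per line."""
    return "\n".join(f"{src} {dst}" for src, dst in pairs)

# Hypothetical replication of two RAW files from CERN to two T1s
pairs = [
    ("srm://castorsrm.cern.ch/castor/cern.ch/alice/raw/run1.root",
     "srm://srm.ndgf.org/alice/raw/run1.root"),
    ("srm://castorsrm.cern.ch/castor/cern.ch/alice/raw/run2.root",
     "srm://srm.gridka.de/alice/raw/run2.root"),
]
print(fts_bulk_file(pairs))
```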
(Plot annotation: ALICE requirement)
PROBLEMS FACED BY ALICE: TRANSFERS
Pre-staging of files: MEETING WITH FIO STILL PENDING
• The operation takes forever; new files have to be created instead of pre-staging those already existing
• Asked CMS and LHCb for their own procedures:
• CMS has implemented a PhEDEx utility at the client level, able to perform the pre-staging for CASTOR sites; they compared methods using the SRM APIs, manual pre-staging and the PhEDEx utility itself; the staging speed in the 3 cases is comparable, and CMS used the STEP’09 exercise to define the best way to perform the pre-staging
• LHCb is using the GFAL libraries to perform an asynchronous pre-staging of the files
PROBLEMS FACED BY ALICE: TRANSFERS
File overwriting: SOLVED
• This procedure would allow a prior removal of an already transferred file
• ALICE implemented the corresponding option correctly; however, it was still failing
• FTS experts involved in the discussion: the 'overwrite' flag is properly passed to the FTS agent, however it selects the SRMv1 endpoint instead of SRMv2.2
• While the details are being checked, ALICE should choose the fully qualified SURL to ensure the usage of SRMv2.2
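A fully qualified SURL pins the SRM version by spelling out the v2.2 web-service endpoint explicitly, instead of the short form that leaves endpoint discovery (and hence the v1/v2 choice) to the client. A minimal helper, assuming the conventional port 8443 and the `/srm/managerv2` service path used by SRM v2.2 endpoints:

```python
def qualified_surl(host: str, path: str, port: int = 8443,
                   service: str = "/srm/managerv2") -> str:
    """Build a fully qualified SRM v2.2 SURL of the form
    srm://host:port/service?SFN=/path, so the client cannot fall back to SRMv1."""
    return f"srm://{host}:{port}{service}?SFN={path}"

# Short form: srm://host/path (version left to endpoint discovery).
# Qualified form (hypothetical host and path):
print(qualified_surl("srm-alice.cern.ch", "/castor/cern.ch/alice/raw/run.root"))
# -> srm://srm-alice.cern.ch:8443/srm/managerv2?SFN=/castor/cern.ch/alice/raw/run.root
```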
PROBLEMS FACED BY ALICE: TRANSFERS
Issues per site
• NDGF: using a wrong SURL while transferring files: SOLVED
• RAL: permission denied to write in the corresponding SE area (twice): SOLVED
• SARA: no space available (twice): SOLVED
• FZK: gridFTP issue; there was a problem of dCache pools being filled up, and also a GPFS problem of not correctly reporting space: SOLVED
• This week, CERN: transfers stuck for more than 60 h; still under investigation; it seems some sites do not allow concurrent transfers
SUMMARY AND CONCLUSIONS
STEP’09 has been the 2nd multi-VO exercise before the real data taking
• Proposed by CMS during the pre-CHEP workshop in Prague
ALICE emphasized the testing of the data management elements of the computing model
• Key elements for the 4 LHC experiments
ALICE results: very good behaviour in terms of production, MSS@T1 and FTS transfers
The 4 LHC experiments will present their results during the STEP’09 post-mortem WS at CERN (9-10 July)