Page 1

WLCG ‘Weekly’ Service Report

Harry.Renshall@cern.ch ~~~

WLCG Management Board, 22nd July 2008

Page 2

Introduction

• This ‘weekly’ report covers two weeks (MB summer schedule)

• Last week (7 to 12):
  • Tuesday: MB F2F
  • Wednesday: GDB, C-RSG
  • Friday: OB

• This week (14 to 20):
  • Monday: CMS CRUZET3 cosmic ray run finished

• Notes from the daily meetings can be found at:
  • https://twiki.cern.ch/twiki/bin/view/LCG/WLCGOperationsMeetings

• (Some additional info from CERN C5 reports & other sources)


Page 3

C-RSG

All reviewers have had one or more meetings with their experiments and are filling in a common (but adaptable) template leading to the 2009 resource requirements.

Executive summary per experiment:
• ALICE: Email exchanges and a teleconference have taken place, and ALICE have completed their template, but follow-up is needed.
• ATLAS: Only one reviewer was available. A first iteration of the template has been done but only partially completed. The second reviewer is now active.
• CMS: The template is fully complete (it maps directly onto the CMS computing model). Heavy-ion running is not being reviewed at this time (it is separately funded, outside of CERN).
• LHCb: Full information has been given to enable the template to be adapted/completed.

The group notes they will have to renormalise the resulting experiment numbers to a common set of assumptions on the LHC running conditions.
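For illustration only, such a renormalisation could take a form like the sketch below. The linear scaling with assumed LHC live time and all the numbers are invented for the example; this is not the Scrutiny Group's actual procedure.

    def renormalise(requested, experiment_live_seconds, common_live_seconds):
        """Rescale a resource request from an experiment's own assumed LHC live
        time to the common live-time assumption (illustrative linear scaling)."""
        return requested * common_live_seconds / experiment_live_seconds

    # Invented example: a 12 MSI2k CPU request made assuming 5.0e6 live seconds,
    # renormalised to a common assumption of 4.0e6 live seconds -> 9.6 MSI2k.
    print(renormalise(12.0, 5.0e6, 4.0e6))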

The plan is to report on the scrutiny of the validity of the 2009 resource requests in August. The CSO has agreed that these can already be made public, though more detail may be added for the November C-RRB. In future years there may be a C-RRB in the summer to review the Scrutiny Group reports for the following year, given the need to start hardware procurements well in advance of need.

The group also had a report on the results of the Common Computing Readiness Challenge (CCRC'08) at its fourth meeting.

The group will meet in August to finalise the 2009 reports, and will then decide the date of one or more autumn meetings once they see how the LHC is performing, bearing in mind that they must report finally to the C-RRB meeting of 11 November.

Page 4

OB

• The OB heard an LCG project status report from I. Bird and, from myself, a CCRC'08 post-mortem report including a SWOT analysis and a report on the procedural progress of the C-RSG.

• The weaknesses are seen as:
  • Some of the services – including but not limited to storage / data management – are still not sufficiently robust.
  • Communication is still an issue / concern. This requires work / attention from everybody – it is not a one-way flow.
  • Not all activities (e.g. reprocessing, chaotic end-user analysis) were fully demonstrated even in May, nor was there sufficient overlap between all experiments (and all activities).

• The main Threat perceived by the WLCG management is that of falling back from reliable service mode into “fire-fighting” at the first sign of serious problems.

• However, a consistent message is being given that experiments, sites and WLCG are ‘more or less’ ready for the expected 2008 data taking although constant attention will be needed at all levels.

Page 5

Site Reports (1/2)

• CNAF:
  • 10 July: submitted a post-mortem on recent power and network switch problems. Full services reported running by 19 July.

• BNL:
  • 7 July: the primary link to TRIUMF failed due to an outage in the Seattle area, and failover to the secondary via the CERN OPN did not come up. The workaround is to turn off the primary interface at BNL or TRIUMF, but a proper solution is still being worked on.
  • 9 July: a storage server network connection failure took some time to solve, with various components being changed. It left some ATLAS files inaccessible.
  • 14 July: the inaccessible-file problem was understood and put down to a problem introduced by dCache patch level 8. Files which for some reason failed to transfer out of BNL were left pinned by dCache; an SRM transfer first tries to pin files and gives up when it cannot, while other access methods work. The workaround is to periodically look for such pinned files and unpin them (a minimal automation sketch follows below). There is no long-term solution yet. Sites have been alerted, but the problem is probably now also being seen at IN2P3 after their patch level 8 upgrade.
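The periodic unpin workaround could be automated along the following lines. This is a minimal sketch only: the "site-dcache-admin" wrapper and its "list-stale-pins" / "unpin" sub-commands are hypothetical placeholders for whatever dCache admin-interface commands a site actually uses, not real dCache CLI names.

    import subprocess
    import time

    POLL_INTERVAL = 3600  # check once per hour

    def list_stale_pins():
        """Ask the (hypothetical) site admin wrapper for the PNFS IDs of files
        that are still pinned although no transfer is active on them."""
        out = subprocess.run(["site-dcache-admin", "list-stale-pins"],
                             capture_output=True, text=True, check=True)
        return [line.strip() for line in out.stdout.splitlines() if line.strip()]

    def unpin(pnfsid):
        """Release the stale pin on one file via the same hypothetical wrapper."""
        subprocess.run(["site-dcache-admin", "unpin", pnfsid], check=True)

    if __name__ == "__main__":
        while True:
            stale = list_stale_pins()
            for pnfsid in stale:
                unpin(pnfsid)
            print("released %d stale pins" % len(stale))
            time.sleep(POLL_INTERVAL)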

Page 6

Site Reports (2/2)

• FZK:
  • 19 July: at about 19:20 a major network router failed. Almost all services were affected. Some services were up again on Sunday, but some are still degraded or unavailable (as of 13:00 Monday). In particular, some dCache pool nodes are not yet available. We are working on it and a post-mortem analysis will follow.

• General:
  • 17 July: GGUS conducted the first service verification of the Tier 1 site operator alarm ticket procedure. Failures of the procedure at NDGF and CERN are understood and being fixed.

Page 7

Experiment reports (1/3)

• LHCb:
  • DC06 simulation is running smoothly under DIRAC3, but reconstruction and stripping tests are still ongoing, so there is no official date yet for the start of DC06.

• ALICE:
  • Production was hit by MyProxy problems – see the post-mortem at https://twiki.cern.ch/twiki/bin/view/FIOgroup/ScLCGPxOperations
  • Working on integration of the CREAM-CE with AliEn.

Page 8

Experiment reports (2/3)

• CMS:
  • CRUZET 3 cosmics run from 7 to 14 July. Quite a good experience, and more mature in terms of data handling in general. Reconstruction submissions to all Tier 1 sites are ongoing. Preparing for the next global cosmics exercise in the second half of August, but expect weekly cosmics data tests on Wednesdays and Thursdays.

• Work has been finalised on the P5->CERN transfer system, and a repacker replay has been running since 17 July, redoing the repack of the CRUZET-3 data. Plans: next Monday CMS will start more replays with some real T0 prompt-reconstruction testing.

• CMS expect a centrally-triggered, large transfer load of many CSA07 MC datasets to the CMS T2s, as a needed step to complete the migration of user analysis to T2 sites. Each T2 should expect to be asked to host a fraction of ~30 TB of those datasets.

• CMS have a CASTOR directory of 2.3 × 10^6 files of 160 KB each, which are webcam dumps and have gone to tape. They are looking at deleting them and at stopping fresh ones being written.
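For scale, a back-of-the-envelope check (not a figure from the report): the data volume involved is modest; the operational issue is the sheer file count.

    # Rough size of the CMS webcam-dump directory described above.
    n_files = 2.3e6                       # number of files
    size_kb = 160                         # KB per file
    total_gb = n_files * size_kb / 1e6    # KB -> GB (decimal units)
    print("~%.0f GB in %.1e files" % (total_gb, n_files))   # ~368 GB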

Page 9

Experiment reports (3/3)

• ATLAS:
  • CERN CASTORATLAS was upgraded to 2.1.7-10 on 14 July to avoid a fatal data size overflow problem.
  • ATLAS taking cosmics with test triggers resulted in some very large datasets being (successfully) distributed to BNL.

• ATLAS are now running cosmics at weekends. On 20 July the ATLAS CERN site services became stuck, but the resulting T0 to T1 catch-up when the services were restarted on Monday morning reached an impressive 2.5 GB/s. Clearly more process monitoring alarms are needed (a minimal sketch of such a check follows after this list).

• ATLAS workflow management bookkeeping needs process-level access to their elog instance (via an elog API call), and this is about to be made available after a security analysis.
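As an illustration of the kind of process-level monitoring alarm called for above, here is a minimal sketch. The service name, heartbeat file path, and alarm action are hypothetical placeholders, not the actual ATLAS site-services setup; only pgrep is assumed to be available on the host.

    import os
    import subprocess
    import time

    SERVICE = "atlas-site-services"              # hypothetical process name
    HEARTBEAT_FILE = "/var/log/atlas/heartbeat"  # hypothetical progress marker
    MAX_SILENCE = 15 * 60                        # alarm after 15 minutes without progress

    def process_running(name):
        """Return True if a process matching this name is running (checked with pgrep)."""
        return subprocess.run(["pgrep", "-f", name],
                              stdout=subprocess.DEVNULL).returncode == 0

    def making_progress(path, max_age):
        """Return True if the heartbeat file has been touched within max_age seconds."""
        try:
            return (time.time() - os.path.getmtime(path)) < max_age
        except OSError:
            return False

    def raise_alarm(message):
        """Placeholder alarm: a real deployment would page the operator or open a ticket."""
        print("ALARM: " + message)

    if __name__ == "__main__":
        if not process_running(SERVICE):
            raise_alarm(SERVICE + " is not running")
        elif not making_progress(HEARTBEAT_FILE, MAX_SILENCE):
            raise_alarm(SERVICE + " is running but appears stuck")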

Page 10

Summary


Solid progress on many fronts

