CCRC’08 Review from a DM perspective

Date post: 22-Feb-2016
CERN IT Department CH-1211 Genève 23 Switzerland www.cern.ch/ CCRC’08 Review from a DM perspective Alberto Pace (With slides from T.Bell, F.Donno, D.Duelmann, M.Kasemann, J.Shiers, …)
Transcript
Page 1: CCRC’08 Review  from a DM perspective

CERN IT Department, CH-1211 Genève 23, Switzerland
www.cern.ch/it

CCRC’08 Review from a DM perspective

Alberto Pace (with slides from T. Bell, F. Donno, D. Duelmann, M. Kasemann, J. Shiers, …)

Page 2: CCRC’08 Review  from a DM perspective

Before the main topic

• Safety reminder
  – The computer center has different safety requirements than normal offices
  – This is why authorization is needed to enter!
  – This is why there are safety courses!
  – Noise above the level acceptable for long-term work
  – Wind above the level acceptable for long-term work
  – False floor: 1 meter deep!
  – No differential power switch!!
• In case of accident, call the fire brigade

Page 3: CCRC’08 Review  from a DM perspective

CCRC’08

• Wiki site
  – https://twiki.cern.ch/twiki/bin/view/LCG/WLCGCommonComputingReadinessChallenges
• Ongoing challenge with all 4 experiments

Page 4: CCRC’08 Review  from a DM perspective

Online and offline databases

Page 5: CCRC’08 Review  from a DM perspective

CPU Usage ATLAS/CMS DBs

Page 6: CCRC’08 Review  from a DM perspective

Physical Reads

Page 7: CCRC’08 Review  from a DM perspective

Network traffic

Page 8: CCRC’08 Review  from a DM perspective

DB service - some observations

• In general: DB load still dominated by activities that did not scale up significantly during CCRC
  – load changes by CCRC on monitoring, work-flow, production systems smaller than e.g. fluctuations between software releases
  – major contribution scaling with reconstruction jobs not yet visible at CERN and Tier 1 sites
• Exception: ATLAS reprocessing at BNL, TRIUMF and NDGF
  – increased dCache load on calibration files (POOL) introduced a bottleneck
  – Consequence: extremely long (idle) database connections on the conditions database
    • CORAL failover between T1 sites worked
    • Increased DB session limits, session sniping added, dCache pool for calibration files added
• DB service ran smoothly and without major disruptions
  – As usual, several node reboots
    • minor impact thanks to cluster architecture
  – 2h streams intervention (downstream capture) was scheduled in agreement with experiments and service coordination during CCRC
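The "session sniping" mentioned above can be read as disconnecting database sessions that have been idle for too long. The slides do not describe how it was implemented, so the following is only a minimal sketch of that idea: the `Session` class, the `sessions_to_snipe` function, the user names and the one-hour threshold are all hypothetical and chosen for illustration.

```python
from dataclasses import dataclass


@dataclass
class Session:
    """Hypothetical record for one open database session."""
    session_id: int
    user: str
    idle_seconds: float  # time since the session last did any work


def sessions_to_snipe(sessions, max_idle_seconds):
    """Return the sessions that have been idle longer than the threshold."""
    return [s for s in sessions if s.idle_seconds > max_idle_seconds]


# Illustrative values only; these are not measurements from CCRC'08.
sessions = [
    Session(1, "reader_a", 30.0),
    Session(2, "reader_b", 7200.0),
    Session(3, "reader_c", 10800.0),
]

for s in sessions_to_snipe(sessions, max_idle_seconds=3600.0):
    print(f"would disconnect session {s.session_id} ({s.user}), "
          f"idle for {s.idle_seconds:.0f} s")
```

A real deployment would take the idle time from the database itself and would have to decide how clients reconnect, which is outside the scope of this sketch.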

Page 9: CCRC’08 Review  from a DM perspective

Castor and Grid Data Management

Page 10: CCRC’08 Review  from a DM perspective

Tier-0 to Tier-1 Exports

Page 11: CCRC’08 Review  from a DM perspective

February Summary

http://gridview.cern.ch/

Page 12: CCRC’08 Review  from a DM perspective

Not limited by Castor

Page 13: CCRC’08 Review  from a DM perspective

Successful Stage-in test

Page 14: CCRC’08 Review  from a DM perspective

SRM – 2 ... Working

Page 15: CCRC’08 Review  from a DM perspective

TAPE issues

Page 16: CCRC’08 Review  from a DM perspective

Total performance to tape

• Alice and LHCb running Castor 2.1.4 without policies, so around 100% improvement in write performance expected with 2.1.6
• With simulated file sizes, Atlas data rates have improved to 30 MB/s writing
• Focus on file size and policies has shown some improvements in write performance
• Read efficiency remains low and dominates drive utilisation due to low number of files read per mount and non-production users

Page 17: CCRC’08 Review  from a DM perspective

Tape usage read dominated

• Random read dominates drive time (90% reading)
• Writing under control of Castor policies
• Reading much more difficult to improve from the Castor side

[Figure: Mounts per Day during CCRC. Read and write mounts per day (axis 0 to 7000) for alice, atlas, cms and lhcb.]

Page 18: CCRC’08 Review  from a DM perspective

Production vs Users

• Data retrieved for CCRC period for CMS
• CMS production is under cmsprod and phedex (25% total)
• Requests for tape recalls dominated by non-production
• Equivalent data for Atlas shows production requests < 5%

[Figure: Offline Requests for CMS during Feb CCRC. Requests per day for offline files (axis 0 to 10000) by account: cmsprod, phedex, cms165, cms064, cms124, cms067, Others.]
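The production share quoted on this slide is a simple ratio: requests from the production accounts divided by all requests. The sketch below shows that calculation. The function name is hypothetical, and the per-account counts are placeholders picked only so that the result reproduces the 25% stated on the slide; they are not the measured values from the chart.

```python
def production_share(requests_by_account, production_accounts):
    """Fraction of all requests that come from production accounts."""
    total = sum(requests_by_account.values())
    if total == 0:
        return 0.0
    production = sum(
        count
        for account, count in requests_by_account.items()
        if account in production_accounts
    )
    return production / total


# Placeholder counts for illustration only (NOT the CCRC'08 measurements).
example_requests = {"cmsprod": 150, "phedex": 100, "cms165": 300, "Others": 450}

share = production_share(example_requests, {"cmsprod", "phedex"})
print(f"production share: {share:.0%}")
```

The same function applied to the Atlas accounts would give the "< 5%" comparison, given the corresponding per-account counts.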

Page 19: CCRC’08 Review  from a DM perspective

Options

• Do nothing
  – Hope things work out OK
• Tape prioritization in Castor
  – complete minimum implementation of VDQM2 and tape queue prioritization
  – A new long term strategy may be necessary
• Dedicate resources
  – Fragmentation risks
• Hardware investment
  – Purchase 50 tape drives and servers
  – Cost is 15K CHF/drive and 6K CHF/tape server, total 1050 kCHF
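The hardware total can be checked with a short calculation. The sketch assumes one tape server per tape drive, which the slide does not state explicitly but which is the pairing that reproduces the quoted 1050 kCHF; the variable and function names are illustrative.

```python
DRIVES = 50
COST_PER_DRIVE_KCHF = 15    # 15K CHF per tape drive (from the slide)
COST_PER_SERVER_KCHF = 6    # 6K CHF per tape server (from the slide)
SERVERS = DRIVES            # assumption: one tape server per drive


def total_cost_kchf(drives, servers, drive_cost, server_cost):
    """Total purchase cost in kCHF for the given drives and servers."""
    return drives * drive_cost + servers * server_cost


total = total_cost_kchf(DRIVES, SERVERS, COST_PER_DRIVE_KCHF, COST_PER_SERVER_KCHF)
print(f"drives:  {DRIVES * COST_PER_DRIVE_KCHF} kCHF")
print(f"servers: {SERVERS * COST_PER_SERVER_KCHF} kCHF")
print(f"total:   {total} kCHF")
```

Under that assumption the drives account for 750 kCHF and the servers for 300 kCHF.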

Page 20: CCRC’08 Review  from a DM perspective

Problems reported

Page 21: CCRC’08 Review  from a DM perspective

Castor

• Invalid checksum value returned by the CASTOR gridftp2 server (reported by CMS on 05/02)
  • FIXED in 1.13-11 (07/02)
• Gsiftp TURLs returned by CASTOR are relative (reported by S2 and CMS on 06/02)
  • FIXED in 1.13-11 (07/02)
• Unable to map request to space for policy TRANSFER_WAN (reported by CMS on 07/02)
  • FIXED in 1.13-13 (08/02)
• The srmDaemon attempts to free an unallocated pointer and crashes (reported by CNAF)
  • FIXED in 1.13-14 (14/02)
• Some of the databases at CERN have shown an index to be missing (found by S2)
  • FIXED in 1.3.10-1 (15/02)
• Insufficient user privileges to make a request of type StagePutDoneRequest in service class 'atldata' (reported by S2 and ATLAS on 19/02)
  ☺ PutDone executed by and allowed for (root,root)
  To be fixed; workaround provided on 23/02

Page 22: CCRC’08 Review  from a DM perspective

Castor

• Missing access control on spaces based on VOMS groups and roles (reported by ATLAS/LHCb on 19/02)
  Followed by Storage Solution WG
• Could not get user information: VOMS credential ops does not match grid mapping dteam (reported by S2 and CNAF on 21/02)
  ☹ Not yet understood
• Error creating statement, Oracle code: 12154 ORA-12154: TNS:could not resolve the connect identifier specified (reported by S2 and CNAF on 12/02)
  • Not yet understood
  ☞ It happens at service startup. A restart cures the problem
• Server unresponsive at RAL? - Space token ATLASDATADISK does not exist (reported by S2 and ATLAS on 28/02)
  Number of threads increased from 100 to 150 (28/02)

Page 23: CCRC’08 Review  from a DM perspective

Castor Summary

• 10 software problems reported, no major problems
• 6 problems fixed (in 2-3 days on average)
• Developers and operations people very responsive.

Page 24: CCRC’08 Review  from a DM perspective

DPM

• Default ACLs on directories do not work (reported by ATLAS on 13/02)
  • FIXED in 1.6.7-4 (certified)
• Slow file removal (reported by ATLAS on 22/02):
  • ext3 filesystems much slower than xfs for delete operations (2048 files of 1.5 GB removed in 90 minutes on ext3 against 5 seconds on xfs; tests performed on 25/02)
• DPM 1.6.10 is being certified and will be the release available for CCRC08 in May.
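To put the ext3 versus xfs removal timings quoted above in perspective, the short calculation below derives the slowdown factor and the average time per file. It uses only the figures from the slide (2048 files of 1.5 GB, 90 minutes against 5 seconds); the variable and function names are illustrative.

```python
FILES = 2048
FILE_SIZE_GB = 1.5
EXT3_SECONDS = 90 * 60   # 90 minutes on ext3
XFS_SECONDS = 5          # 5 seconds on xfs


def per_file_seconds(total_seconds, files):
    """Average wall-clock time spent removing one file."""
    return total_seconds / files


slowdown = EXT3_SECONDS / XFS_SECONDS
ext3_per_file = per_file_seconds(EXT3_SECONDS, FILES)
xfs_per_file = per_file_seconds(XFS_SECONDS, FILES)
total_data_gb = FILES * FILE_SIZE_GB

print(f"data removed:  {total_data_gb:.0f} GB")
print(f"ext3 per file: {ext3_per_file:.2f} s")
print(f"xfs per file:  {xfs_per_file * 1000:.2f} ms")
print(f"slowdown:      {slowdown:.0f}x")
```

In other words, the test removed about 3 TB of data, and ext3 took roughly a thousand times longer than xfs to do it.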

Page 25: CCRC’08 Review  from a DM perspective

Conclusion

• CCRC’08 is a success so far
• All DM software and tools have been able to scale to the challenge and beyond
• All is well under control in both the database and data management areas
• There remain strategic directions where investigations and major improvements or simplifications need discussion:
  – Improve efficiency for analysis
  – Tape area in general
  – Service for online database, piquet service for support
  – Synergies between DM tools and Castor
  – Job scheduling in Castor, improve/common database schema for Grid DM tools and Castor
  – ...

