13-14 June 2005, Geneva, CERN

TIER1 CNAF PRESENTATION
Outline:
- Hardware and software status of our CASTOR installation and management tools
- Usage of our installation by the LHC experiments
- Comments
MANPOWER

At present there are 4 people at TIER1 CNAF involved in administering our CASTOR installations and front-ends:
- Ricci Pier Paolo (50%; also active in SAN/NAS HA disk storage management and test, and Oracle administration) [email protected]
- Giuseppe (50%; also active in the ALICE experiment as Tier1 reference, SAN HA disk storage management and test, and managing the Grid front-end to our resources) [email protected]
- Elisabetta (50%; involved in Oracle and RLS development and administration, and SAN disk storage management and test) [email protected]

In addition, we have 1 CNAF FTE working with the development team at CERN (started March 2005):
- Lopresti Giuseppe [email protected]
HARDWARE STATUS
At present our CASTOR (1.7.1.5) system consists of:
- 1 STK L5500 silo, partitioned for 2 tape form factors
- About 2000 slots in LTO-2 form factor
- About 3500 slots in 9940B form factor
- 6 LTO-2 drives with 2 Gb/s FC interface
- 2 9940B drives with 2 Gb/s FC interface; 2 more have just been acquired and will be installed by the end of June
- Sun Blade v100 with 2 internal IDE disks in software RAID-0, running ACSLS 7.0
1300 LTO-2 Imation TAPES
650 9940B Imation TAPES
HARDWARE STATUS (2)
8 tapeservers: 1U Supermicro, 3 GHz, 2 GB RAM, with 1 Qlogic 2300 FC HBA, running the STK CSC Development Toolkit provided by CERN (under a licence agreement with STK): ssi, tpdaemon and rtcpd.

The 8 tapeservers are connected directly to the FC drive outputs:
- DRIVE LTO-2 0,0,10,0 -> tapesrv-0.cnaf.infn.it
- DRIVE LTO-2 0,0,10,1 -> tapesrv-1.cnaf.infn.it
- DRIVE LTO-2 0,0,10,2 -> tapesrv-2.cnaf.infn.it
- DRIVE LTO-2 0,0,10,3 -> tapesrv-3.cnaf.infn.it
- DRIVE LTO-2 0,0,10,4 -> tapesrv-4.cnaf.infn.it
- DRIVE LTO-2 0,0,10,5 -> tapesrv-5.cnaf.infn.it
- DRIVE 9940B 0,0,10,6 -> tapesrv-6.cnaf.infn.it
- DRIVE 9940B 0,0,10,7 -> tapesrv-7.cnaf.infn.it

2 more will be installed soon (tapesrv-8, tapesrv-9) with the 2 new 9940B drives. Using the 9940B drives has drastically reduced the error rate (we report only one 9940B tape marked RDONLY due to a SCSI error, and NEVER had "hung" drives in 6 months of activity).
HARDWARE STATUS (3)
castor.cnaf.infn.it (central machine): 1 IBM x345 2U machine, 2x3 GHz Intel Xeon, RAID-1, with dual power supply. O.S. Red Hat A.S. 3.0. Runs all central CASTOR 1.7.1.5 services (Nsdaemon, vmgrdaemon, Cupvdaemon, vdqmdaemon, msgdaemon) and the ORACLE client for the central database.

castor-4.cnaf.infn.it (ORACLE machine): 1 IBM x345, O.S. Red Hat A.S. 3.0, running ORACLE DATABASE 9i rel 2. 2 more x345 machines are in standby: they store all the backup information of the ORACLE db (.exp, .dbf) and can be used to replace the above machines if needed.

castor-1.cnaf.infn.it (monitoring machine): 1 DELL 1650, R.H. 7.2, running the CASTOR monitoring service (Cmon daemon) and the NAGIOS central service for monitoring and notification. It also hosts the rtstat and tpstat commands, which are usually run with the -S option against the tapeservers.
HARDWARE STATUS (4)
Stagers with diskserver: 1U Supermicro, 3 GHz, 2 GB RAM, with 1 Qlogic 2300 FC HBA accessing our SAN, running Cdbdaemon, stgdaemon and rfiod. 1 stager for EACH LHC experiment:
- disksrv-1.cnaf.infn.it ATLAS stager with 2 TB locally
- disksrv-2.cnaf.infn.it CMS stager with 3.2 TB locally
- disksrv-3.cnaf.infn.it LHCb stager with 3.2 TB locally
- disksrv-4.cnaf.infn.it ALICE stager with 3.2 TB locally
- disksrv-5.cnaf.infn.it TEST and PAMELA stager
- disksrv-6.cnaf.infn.it stager with 2 TB locally (archive purposes: LVD, ALICE TOF, CDF, VIRGO, AMS, BABAR, ARGO and other HEP experiments...)

Diskservers: 1U Supermicro, 3 GHz, 2 GB RAM, with 1 Qlogic 2300 FC HBA accessing our SAN, running rfiod. Red Hat 3.0 Cluster has been tested but is not used in production for rfiod.
HARDWARE STATUS (5)
Storage Element front-end for CASTOR
castorgrid.cr.cnaf.infn.it (DNS alias load balanced over 4 machines for WAN gridftp)

SRM v.1 is installed and in production on the above machines.
TIER1 INFN CNAF Storage

(Overview diagram; main components listed below.)
- Linux SL 3.0 clients (100-1000 nodes), accessing over WAN or TIER1 LAN via NFS, RFIO, GridFTP and others
- STK180 library with 100 LTO-1 tapes (10 TB native)
- STK L5500 robot (5500 slots), 6 IBM LTO-2 drives, 2 (4) STK 9940B drives
- PROCOM 3600 FC NAS2, 9000 GB
- PROCOM 3600 FC NAS3, 4700 GB
- NAS1, NAS4: 3ware IDE SAS, 1800+3200 GB
- AXUS BROWIE: about 2200 GB, 2 FC interfaces
- 2 Gadzoox Slingshot 4218 18-port FC switches
- STK BladeStore: about 25000 GB, 4 FC interfaces
- Infortrend A16F-R1A2-M1: 4 x 3200 GB SATA
- W2003 Server with LEGATO Networker (backup)
- CASTOR HSM servers (H.A.)
- Diskservers with Qlogic FC HBA 2340; IBM FastT900 (DS 4500), 3/4 x 50000 GB, 4 FC interfaces
- 2 Brocade Silkworm 3900 32-port FC switches
- Infortrend A16F-R1211-M2 + JBOD: 5 x 6400 GB SATA
- SAN 1 (200 TB, + 200 TB end of June), SAN 2 (40 TB)
- Totals: HSM (400 TB), NAS (20 TB)
CASTOR HSM

(Overview diagram; main components listed below.)
- STK L5500 silo, 2000 + 3500 slots
- 6 LTO-2 drives (20-30 MB/s)
- 2 9940B drives (25-30 MB/s)
- 1300 LTO-2 tapes (200 GB native)
- 650 9940B tapes (200 GB native)
- TOTAL CAPACITY with 200 GB tapes: 250 TB LTO-2 (400 TB max), 130 TB 9940B (700 TB max)
- Sun Blade v100 with 2 internal IDE disks in software RAID-1, running ACSLS 7.0, OS Solaris 9.0
- 1 CASTOR (CERN) central services server, RH AS 3.0
- 8 tapeservers, Linux RH AS 3.0, Qlogic 2300 HBA
- 6 stagers with diskserver, RH AS 3.0, 15 TB local staging area
- 1 ORACLE 9i rel 2 DB server, RH AS 3.0
- 8 or more rfio diskservers, RH AS 3.0, min 20 TB staging area (variable)
- Point-to-point FC 2 Gb/s connections; SAN 1 and SAN 2 links are fully redundant FC 2 Gb/s connections (dual-controller HW and Qlogic SANsurfer Path Failover SW)
- Access over WAN or TIER1 LAN

EXPERIMENT           Staging area (TB)   Tape pool (TB native)
ALICE                8                   12 (LTO-2)
ATLAS                6                   20 (MIXED)
CMS                  2                   1 (9940B)
LHCb                 18                  30 (LTO-2)
BABAR, AMS + oth.    2                   4 (9940B)
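The capacity figures above follow directly from the tape counts and the 200 GB native cartridge size. A quick check (using 1 TB = 1000 GB; the slide rounds the LTO-2 figure of 260 TB down to 250 TB):

```python
# Sanity-check the quoted tape capacities: cartridges (or slots) x 200 GB native.

def capacity_tb(n_tapes, gb_per_tape=200):
    """Total native capacity in TB for n_tapes cartridges."""
    return n_tapes * gb_per_tape / 1000.0

lto2_now = capacity_tb(1300)    # LTO-2 tapes currently owned
lto2_max = capacity_tb(2000)    # LTO-2 slots available in the silo
b9940_now = capacity_tb(650)    # 9940B tapes currently owned
b9940_max = capacity_tb(3500)   # 9940B slots available in the silo

print(lto2_now, lto2_max)    # 260.0 400.0
print(b9940_now, b9940_max)  # 130.0 700.0
```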
CASTOR Grid Storage Element
GridFTP access goes through the castorgrid SE, a DNS cname pointing to 3 servers, with DNS round-robin for load balancing. During LCG Service Challenge 2 we also introduced a load-average-based selection: every M minutes the IP of the most loaded server is replaced in the cname (see graph).
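The selection step described above can be sketched as follows. This is a hypothetical illustration (function name, IPs and load values are invented); the actual cname-update mechanism at CNAF is not described in the slides.

```python
# Hypothetical sketch of the load-based selection: every M minutes, exclude
# the most loaded gridftp server from the set of IPs published in the cname,
# leaving DNS round-robin to balance over the remaining members.

def select_cname_members(load_by_ip):
    """Return the sorted IPs to publish, dropping the most loaded server."""
    most_loaded = max(load_by_ip, key=load_by_ip.get)
    return sorted(ip for ip in load_by_ip if ip != most_loaded)

# Example: three gridftp servers with their current load averages.
loads = {"131.154.0.1": 0.7, "131.154.0.2": 4.2, "131.154.0.3": 1.1}
print(select_cname_members(loads))  # the server with load 4.2 is excluded
```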
NOTIFICATION (Nagios)
MONITORING (Nagios)

(Graphs shown: LHCb CASTOR tape pool; number of processes on a CMS disk SE; eth0 traffic through a CASTOR LCG SE.)
DISK ACCOUNTING

(Charts shown: pure disk space (TB); CASTOR disk space (TB).)
CASTOR USAGE
Access to the CASTOR system is:
1) via Grid, using our SE front-ends (from WAN)
2) via rfio, using the castor rpm and rfio commands installed on our WNs and UIs (from LAN)

Only 17% (65 TB / 380 TB) of the total HSM space was effectively used by the experiments over a 1.5-year period, because:
1) As TIER1 storage we offer "pure" disk as primary storage over SAN, which the experiments prefer (GSIftp, nfs, xrootd, bbftp, GPFS ...).
2) The lack of optimization for parallel stage-in operations (pre-stage), together with the reliability/performance problems that arose with LTO-2, gives generally very bad performance when reading from CASTOR, so experiments mostly ask for "pure" disk resources (next year's requests are NOT for tape HW).
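The pre-stage optimization missing in point 2 amounts to batching: if pending recall requests are grouped by cartridge before mounting, each tape is mounted once rather than once per file. The sketch below is a hypothetical illustration of that idea, not CASTOR's actual recall scheduler; all names and data are invented.

```python
from collections import defaultdict

def plan_recalls(requests):
    """Group (file, tape) recall requests by tape, so each cartridge is
    mounted once and all its files are read in a single mount session."""
    by_tape = defaultdict(list)
    for filename, tape in requests:
        by_tape[tape].append(filename)
    return dict(by_tape)

# Four file requests touching two tapes: 2 mounts instead of 4.
requests = [("f1", "LTO123"), ("f2", "LTO456"), ("f3", "LTO123"), ("f4", "LTO123")]
print(plan_recalls(requests))  # {'LTO123': ['f1', 'f3', 'f4'], 'LTO456': ['f2']}
```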
COMMENTS

As said, we have a lot of disk space to manage and no definitive solution yet (xrootd, gpfs, dcache to be tested, etc...).

1) CASTOR already has a working SRM interface. Is CASTOR-2 reliable and scalable enough to manage pure diskpool spaces? We think it should be designed for this use as well (as dcache and xrootd are).

2) The limits of the rfio protocol and of the new stager performance could seriously limit the potential performance scalability of a pure CASTOR diskpool (e.g. a single open() call needs many database queries). A single diskserver running rfiod can fulfil only a limited number of requests. At our site we have a limited number of diskservers with a large amount of space each (10-15 TB), and the rfiod limit caused access failures for jobs (we use rfiod for DIRECT access to the local filesystem outside CASTOR, e.g. for CMS).

SOLUTION TO FAILURES => possibility to use swap memory. SOLUTION TO PERFORMANCE => more RAM? Other? Can rfio be modified for our site-specific use?
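The bottleneck in point 2 can be shown with a back-of-the-envelope model: the same total job load spread over few large diskservers exceeds a per-server connection limit, while many small diskservers absorb it. The per-server limit and all numbers below are hypothetical, chosen only to illustrate the effect.

```python
# Toy model of the rfiod bottleneck: max_conn is an assumed per-diskserver
# cap on concurrent rfiod requests (the real limit depends on RAM and
# rfiod configuration, which the slides do not quantify).

def failed_jobs(n_jobs, n_diskservers, max_conn):
    """Jobs that cannot get an rfiod connection, assuming even load spread."""
    per_server = n_jobs / n_diskservers
    overflow = max(0.0, per_server - max_conn) * n_diskservers
    return int(overflow)

# Few large diskservers (our situation) vs. many small ones, same total space:
print(failed_jobs(1000, 4, 100))   # 600 jobs fail
print(failed_jobs(1000, 20, 100))  # 0 jobs fail
```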
COMMENTS (2)

1) We need the authorization method in CASTOR to be compatible with LDAP as well, not only with the password and group files.

2) It would also be useful to include rfstage (or something similar) in the official release.

3) HA: we are planning to use the 2 stand-by machines as HA for the CASTOR central services and a vdqm replica, plus an Oracle 9i rel 2 stand-by database (Dataguard) or RAC.
CONCLUSION

It is possible to set up collaborations with other groups (in order to expand the development team at CERN): TIER1 and LHC computing with IHEP.

THANK YOU FOR YOUR ATTENTION!