INFN-T1 site report

Page 1: INFN-T1 site report


INFN-T1 site report

Andrea Chierici

On behalf of INFN-T1 staff

28th October 2009

Page 2: INFN-T1 site report

Overview

Infrastructure
Network
Farming
Storage


Page 3: INFN-T1 site report

Infrastructure


Page 4: INFN-T1 site report

                     INFN-T1 2005                      INFN-T1 2009
Racks                40                                120
Power source         University                        Directly from supplier (15 kV)
Power transformers   1 (~1 MVA)                        3 (~2.5 MVA)
UPS                  1 diesel engine/UPS (~640 kVA)    2 rotary UPS (~3400 kVA) + 1 diesel engine (~640 kVA)
Chillers             1 (~530 kVA)                      7 (~2740 kVA)

Page 5: INFN-T1 site report

[Diagram: CNAF electrical infrastructure. 15,000 V supply feeding the rotary UPS (up to 3.8 MW) and the chillers; indicated loads of 1.4 MW, 1.4 MW, 1.2 MW and 1 MW.]

Page 6: INFN-T1 site report

Mechanical and electrical surveillance

Page 7: INFN-T1 site report

Network


Page 8: INFN-T1 site report

INFN CNAF TIER1 Network

[Network diagram: INFN CNAF Tier1 network.
WAN: GARR reached through a Cisco 7600 (2x10 Gb/s); dedicated 10 Gb/s LHC-OPN link for T0-T1 (CERN) and T1-T1 (PIC, RAL, TRIUMF); 10 Gb/s T0-T1 backup over the LHC-OPN links CNAF-KIT, CNAF-IN2P3 and CNAF-SARA; other T1-T1s (BNL, FNAL, TW-ASGC, NDGF), T1-T2s and CNAF general-purpose traffic carried over GARR; Cisco Nexus 7000.
LAN: Extreme BD10808 and BD8810 core switches (10 Gb/s and 4x10 Gb/s links); worker nodes attached through Extreme Summit450 and Summit400 switches with 2x1 Gb/s or 4x1 Gb/s uplinks. In case of network congestion: uplink upgrade from 4x1 Gb/s to 10 Gb/s or 2x10 Gb/s.
Storage: storage servers (disk servers, CASTOR stagers) connected to the storage devices over a Fibre Channel SAN through an FC director.]

Page 9: INFN-T1 site report

Farming


Page 10: INFN-T1 site report

New tender

1U twin solution with these specs: 2x Intel Nehalem E5520 @ 2.26 GHz, 24 GB RAM, 2x 320 GB SATA HD @ 7200 rpm, 2x 1 Gbps Ethernet

118 twins, reaching 20500 HEP-SPEC, measured on SLC4 (a rough per-node check follows below)

Delivery and installation foreseen within 2009
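A rough sanity check on the tender figures (assuming each 1U twin chassis hosts two dual-socket nodes; the slide does not spell this out):

\[
118 \times 2 = 236\ \text{nodes}, \qquad
\frac{20500\ \text{HEP-SPEC}}{236\ \text{nodes}} \approx 87\ \text{HEP-SPEC per node}.
\]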


Page 11: INFN-T1 site report

Computing resources

Including machines from the new tender, INFN-T1 computing power will reach 42000 HEP-SPEC within 2009

A further increase within January 2010 will bring us to 46000 HEP-SPEC

Within May 2010 we will reach 68000 HEP-SPEC (as pledged to WLCG); this will basically triple the current computing power
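As a consistency check (reading "current computing power" as the capacity installed before the new tender):

\[
42000 - 20500 = 21500\ \text{HEP-SPEC (current)}, \qquad
\frac{68000}{21500} \approx 3.2 ,
\]

which indeed corresponds to roughly tripling the current capacity.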


Page 12: INFN-T1 site report

Resource usage per VO


Page 13: INFN-T1 site report

KSI2K pledged vs used


Page 14: INFN-T1 site report

New accounting system

Grid, local and overall job visualization
Tier1/Tier2 separation
Several parameters monitored: avg and max RSS, avg and max Vmem (added in the latest release); a minimal aggregation sketch follows below
KSI2K/HEP-SPEC accounting
WNoD accounting
Available at: http://tier1.cnaf.infn.it/monitor
Feedback welcome to: [email protected]
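As an illustration only (this is not the portal's actual implementation; the record fields vo, rss_kb and vmem_kb are hypothetical), a minimal Python sketch of how per-VO averages and maxima of RSS and VMem could be aggregated from batch-system job records:

from collections import defaultdict

def aggregate_memory(jobs):
    """Aggregate avg/max RSS and VMem per VO from batch job records.

    Each job record is a dict with hypothetical fields:
    'vo', 'rss_kb' and 'vmem_kb'.
    """
    stats = defaultdict(lambda: {"n": 0, "rss_sum": 0, "rss_max": 0,
                                 "vmem_sum": 0, "vmem_max": 0})
    for job in jobs:
        s = stats[job["vo"]]
        s["n"] += 1
        s["rss_sum"] += job["rss_kb"]
        s["vmem_sum"] += job["vmem_kb"]
        s["rss_max"] = max(s["rss_max"], job["rss_kb"])
        s["vmem_max"] = max(s["vmem_max"], job["vmem_kb"])
    return {vo: {"avg_rss_kb": s["rss_sum"] / s["n"],
                 "max_rss_kb": s["rss_max"],
                 "avg_vmem_kb": s["vmem_sum"] / s["n"],
                 "max_vmem_kb": s["vmem_max"]}
            for vo, s in stats.items()}

# Example with made-up records:
jobs = [{"vo": "cms", "rss_kb": 900000, "vmem_kb": 1800000},
        {"vo": "cms", "rss_kb": 1200000, "vmem_kb": 2100000},
        {"vo": "atlas", "rss_kb": 700000, "vmem_kb": 1500000}]
print(aggregate_memory(jobs))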


Page 15: INFN-T1 site report

New accounting: sample picture


Page 16: INFN-T1 site report

GPU Computing (1)

We are investigating GPU computing
NVIDIA Tesla C1060, used for porting software and performing comparison tests
https://agenda.cnaf.infn.it/conferenceDisplay.py?confId=266: meeting with Bill Dally (chief scientist and vice president of NVIDIA)


Page 17: INFN-T1 site report

GPU Computing (2)

Applications currently tested:
Bioinformatics: CUDA-based paralog filtering in Expressed Sequence Tag clusters
Physics: implementing a second-order electromagnetic particle-in-cell code on the CUDA architecture
Physics: spin-glass Monte Carlo simulations
The first two apps showed more than a 10x increase in performance!


Page 18: INFN-T1 site report

GPU Computing (3)

We plan to buy 2 more workstations in 2010, with 2 GPUs each; we are waiting for the Fermi architecture, foreseen for spring 2010
We will continue the activities currently ongoing and will probably test some Monte Carlo simulations for SuperB
We plan to test selection and shared usage of GPUs via grid


Page 19: INFN-T1 site report

Storage


Page 20: INFN-T1 site report

2009-2010 tenders

Disk tender requested
Baseline: 3.3 PB raw (~2.7 PB-N, see the note below)
1st option: 2.35 PB raw (~1.9 PB-N)
2nd option: 2 PB raw (~1.6 PB-N)
Options to be requested during Q2 and Q3 2010
New disk in production ~end of Q1 2010
4000 tapes (~4 PB) acquired with the library tender
4.9 PB needed at the beginning of 2010
7.7 PB probably needed by mid-2010
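Assuming "PB-N" denotes net (usable) capacity after RAID and filesystem overhead (an interpretation, not stated on the slide), the three configurations imply a consistent ~80% usable fraction:

\[
\frac{2.7}{3.3} \approx 0.82, \qquad
\frac{1.9}{2.35} \approx 0.81, \qquad
\frac{1.6}{2.0} = 0.80 .
\]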

Page 21: INFN-T1 site report

Castor@INFN-T1
To be upgraded to 2.1.7-27
1 SRM v2.2 end-point available
Supported protocols: rfio, gridftp
Still cumbersome to manage: requires frequent interventions in the Oracle DB; lack of management tools
CMS migrated to StoRM for D0T1

Page 22: INFN-T1 site report

WLCG Storage Classes at INFN-T1 today
Storage Class: offers different levels of storage quality (e.g. copy on disk and/or on tape)
DnTm = n copies on disk and m copies on tape (see the sketch after this list)
Implementation of 3 Storage Classes needed for WLCG (but usable also by non-LHC experiments):
Disk0-Tape1 (D0T1) or "custodial nearline": data migrated to tape and deleted from disk when the staging area is full; space managed by the system; disk is only a temporary buffer
Disk1-Tape0 (D1T0) or "replica online": data kept on disk, no tape copy; space managed by the VO
Disk1-Tape1 (D1T1) or "custodial online": data kept on disk AND one copy kept on tape; space managed by the VO (i.e. if the disk is full, the copy fails)
Current implementations: CASTOR, and GPFS/TSM + StoRM
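Purely as an illustration of the DnTm naming scheme (not a configuration format actually used at INFN-T1), the three classes can be summarised in a small Python data structure:

# Hypothetical summary (illustration only) of the three storage classes above.
STORAGE_CLASSES = {
    "D0T1": {"disk_copies": 0, "tape_copies": 1, "alias": "custodial nearline",
             "space_managed_by": "system"},   # disk is only a temporary buffer
    "D1T0": {"disk_copies": 1, "tape_copies": 0, "alias": "replica online",
             "space_managed_by": "VO"},       # data kept on disk, no tape copy
    "D1T1": {"disk_copies": 1, "tape_copies": 1, "alias": "custodial online",
             "space_managed_by": "VO"},       # if the disk is full, the copy fails
}

def describe(name):
    c = STORAGE_CLASSES[name]
    return (f"{name} ('{c['alias']}'): {c['disk_copies']} disk / "
            f"{c['tape_copies']} tape copies, space managed by {c['space_managed_by']}")

for sc in STORAGE_CLASSES:
    print(describe(sc))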

Page 23: INFN-T1 site report

YAMSS: present status
Yet Another Mass Storage System
Scripting and configuration layer to interface GPFS & TSM
Can work driven by StoRM or stand-alone: experiments not using the SRM model can work with it
GPFS-TSM (no StoRM) interface ready: full support for migrations and tape-ordered recalls (see the sketch after this list)
StoRM in production at INFN-T1 and in other centres around the world for "pure" disk access (i.e. no tape)
Integration of StoRM with YAMSS for migrations and tape-ordered recalls ongoing (almost completed)
Bulk migrations and recalls tested with a typical use case (stand-alone YAMSS, without StoRM): the weekly production workflow of the CMS experiment
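To illustrate what "tape ordered recalls" means, a minimal sketch (not the YAMSS code; the metadata fields path, tape and position are hypothetical) that groups recall requests by cartridge and sorts them by position on tape, so each tape is mounted once and read sequentially:

from collections import defaultdict

def order_recalls(requests):
    """Group recall requests by tape cartridge and sort by on-tape position.

    Each request is a dict with hypothetical fields: 'path', 'tape'
    (cartridge label) and 'position' (offset on tape). Returns a list of
    (tape, paths) pairs so that each cartridge is mounted once and read
    front-to-back instead of seeking back and forth.
    """
    by_tape = defaultdict(list)
    for req in requests:
        by_tape[req["tape"]].append(req)
    ordered = []
    for tape, reqs in by_tape.items():
        reqs.sort(key=lambda r: r["position"])
        ordered.append((tape, [r["path"] for r in reqs]))
    return ordered

# Example with made-up requests:
requests = [
    {"path": "/gpfs/cms/file_a", "tape": "T10K001", "position": 420},
    {"path": "/gpfs/cms/file_b", "tape": "T10K002", "position": 10},
    {"path": "/gpfs/cms/file_c", "tape": "T10K001", "position": 7},
]
for tape, paths in order_recalls(requests):
    print(tape, paths)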

Page 24: INFN-T1 site report

Why GPFS & TSM
Tivoli Storage Manager (developed by IBM) is a tape-oriented storage manager, widely used (also in the HEP world, e.g. FZK)
Built-in functionality is present in both products to implement backup and archiving from GPFS
The development of an HSM solution is based on the combination of features of GPFS (since v3.2) and TSM (since v5.5)
Since GPFS v3.2, the new concept of "external storage pool" extends the use of policy-driven Information Lifecycle Management (ILM) to tape storage
External pools are real interfaces to external storage managers, e.g. HPSS or TSM
HPSS is very complex (no benefit in this sense compared to CASTOR)

Page 25: INFN-T1 site report


YAMSS: hardware set-up

[Diagram: YAMSS hardware set-up. ~500 TB for GPFS on a CX4-960; 4 GridFTP servers (4x2 Gbps) and 6 NSD servers (6x2 Gbps) on the LAN; 3 HSM/STA nodes; 8 T10KB tape drives (1 TB per tape, 1 Gbps per drive); TSM server with its db, attached at 4 Gbps FC; SAN and TAN interconnects of 20x4 Gbps, 8x4 Gbps and 3x4 Gbps.]

Page 26: INFN-T1 site report

YAMSS: validation tests

Concurrent read/write access to the MSS for transfers and from the farm (StoRM not used in these tests)
3 HSM nodes serving 8 T10KB drives: 6 drives (at maximum) used for recalls, 2 drives (at maximum) used for migrations
On the order of 1 GB/s of aggregated traffic (summed up below):
• ~550 MB/s from tape to disk
• ~100 MB/s from disk to tape
• ~400 MB/s from disk to the computing nodes (not shown in this graph)
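The three rates are consistent with the "order of 1 GB/s" figure quoted above:

\[
550 + 100 + 400 = 1050\ \text{MB/s} \approx 1\ \text{GB/s aggregated}.
\]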

Page 27: INFN-T1 site report

Questions?


