CERN Computer Centre Tier SC4 Planning FZK October 20th 2005 [email protected]
Transcript
Page 1: CERN Computer Centre Tier SC4 Planning FZK October 20th 2005 [email protected]

CERN Computer Centre Tier SC4 Planning

FZK October 20th 2005

[email protected]

Page 2: CERN Computer Centre Tier SC4 Planning FZK October 20th 2005 [email protected]

2 [email protected]

See https://uimon.cern.ch/twiki/bin/view/LCG/ServiceChallengeFourProgress. The Twiki shows work in progress:

– Service Level Definition - what is required
– Technical Factors - components, capacity and constraints
– LCG Service Co-ordination Meeting Status
– The set of activities required to deliver the building blocks on which SC4 can be built

This leads to our (evolving) hardware configurations for grid servers, operational procedures and staffing

We hope it will prove useful to other sites

Page 3: CERN Computer Centre Tier SC4 Planning FZK October 20th 2005 [email protected]

3 [email protected]

Using existing buildings

Physical location – B513

» Main Computer Room, ~1,500m2 & 1.5kW/m2, built for mainframes in 1970, upgraded for LHC PC clusters 2003-2005.

» Second ~1,200m2 room created in the basement in 2003 as additional space for LHC clusters and to allow ongoing operations during the main room upgrade. Cooling limited to 500W/m2.

» Contains half of tape robotics (less heat/m2).

– Tape Robot building ~50m from B513
» Constructed in 2001 to avoid loss of all CERN data due to an incident in B513. Contains half of tape robotics.
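As a rough cross-check (not stated on the slide), the quoted floor areas and power densities imply roughly the following cooling capacities:

    # Back-of-envelope cooling capacity implied by the figures quoted above.
    main_room_kw = 1500 * 1.5   # ~1,500 m2 at 1.5 kW/m2 -> about 2,250 kW
    basement_kw = 1200 * 0.5    # ~1,200 m2 at 0.5 kW/m2 -> about 600 kW
    print(main_room_kw, basement_kw)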

Page 4: CERN Computer Centre Tier SC4 Planning FZK October 20th 2005 [email protected]

4 [email protected]

Capacity today

2000 KSi2K batch – 1,100 worker nodes
– Adding 2000 KSi2K in December

10 STK tape silos of 6,000 slots each
– 5 interconnected silos in each of two separate buildings
– Physics data split between the buildings
– About half of the slots now occupied (after media migration)
– 50 9940B tape drives at 30 MB/sec
– 200GB capacity cartridges – 6PB total

About 2 PB raw disk storage – older servers used mirrored disks, newer ones RAID.
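A quick consistency check of the tape figures (a sketch assuming decimal units and the half-occupancy stated above):

    # 10 STK silos x 6,000 slots, about half occupied, 200 GB per cartridge.
    slots = 10 * 6000
    occupied = slots // 2
    total_pb = occupied * 200 / 1e6   # GB -> PB (decimal units)
    print(total_pb)                   # 6.0, matching the 6 PB quoted above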

Page 5: CERN Computer Centre Tier SC4 Planning FZK October 20th 2005 [email protected]

5 [email protected]

Activities

» Physics computing services
  Interactive cluster – lxplus
  Batch computing – lxbatch
  Data recording, storage and management
  Grid computing infrastructure

» Laboratory computing infrastructure
  Campus networks – general purpose and technical
  Home directory, email & web servers (10k+ users)
  Administrative computing servers

Page 6: CERN Computer Centre Tier SC4 Planning FZK October 20th 2005 [email protected]

6 [email protected]

Physics Computing Requirements

25,000k SI2K in 2008, rising to 56,000k in 2010
– 2,500-3,000 boxes (multicore, blade …?)
– 500kW-600kW @ 200W/box

2.5MW @ 0.1W/SI2K

6,800TB online disk in 2008, 11,800TB in 2010
– 1,200-1,500 boxes
– 600kW-750kW

15PB of data per year
– 30,000 500GB cartridges/year
– Five 6,000-slot robots/year

Sustained data recording at up to 2GB/s
– Over 250 tape drives and associated servers
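A few arithmetic cross-checks of these figures; note that the per-box and per-SI2K power numbers are separate estimates on the slide, so they are recomputed separately:

    # Compute power: 2,500-3,000 boxes at 200 W/box, and 25,000k SI2K at 0.1 W/SI2K.
    print(2750 * 200 / 1000)              # ~550 kW, within the quoted 500-600 kW range
    print(25_000_000 * 0.1 / 1e6)         # 2.5 MW
    # Tape: 15 PB per year on 500 GB cartridges.
    cartridges = 15e6 / 500               # 30,000 cartridges per year
    print(cartridges, cartridges / 6000)  # i.e. five 6,000-slot robots per year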

Page 7: CERN Computer Centre Tier SC4 Planning FZK October 20th 2005 [email protected]

7 [email protected]

Tape plans

By end 2005 we will have 40 high-duty-cycle new-model tape drives and matching robotics from each of IBM (3592B) and another vendor for evaluation.

Drive data rates are expected to approach 100MB/sec

Cartridge sizes are expected to approach 500GB

Cartridge cost is a canonical US$120, so about 25 cts/GB (compared with 60 cts/GB today).

For LHC startup operations we plan on 200 drives with these characteristics.
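The cost-per-GB figures follow directly from the canonical cartridge price (assuming the same US$120 applies to today's 200 GB media):

    # Cents per GB at a canonical US$120 per cartridge.
    print(100 * 120 / 500)   # 24, i.e. about 25 cts/GB for 500 GB cartridges
    print(100 * 120 / 200)   # 60 cts/GB for today's 200 GB cartridges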

Page 8: CERN Computer Centre Tier SC4 Planning FZK October 20th 2005 [email protected]

8 [email protected]

Grid operations servers

Hardware matched to QoS requirements
– Today mostly on ad-hoc older disk servers/farm PCs
– Migrate immediately critical/high services to more reliable but simple mid-range servers
– Evaluate high-availability solutions to be deployed by SC4 startup, looking at:
» FC SAN multiple host/disk interconnects
» HA Linux (automatic failover)
» Logical volume replication
» Application-level replication
» Ready-to-go spare hardware for less critical services (with simple operational procedures)
– Objective: to reach 24 by 7 availability levels.
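As a purely illustrative sketch (not CERN's actual tooling), automatic failover to ready-to-go spare hardware amounts to monitoring the primary service and repointing clients when it stops responding; the host names and port below are invented for the example:

    import socket
    import time

    PRIMARY = ("grid-svc-primary.example.org", 8443)  # hypothetical endpoints
    SPARE = ("grid-svc-spare.example.org", 8443)

    def alive(host, port, timeout=2.0):
        """Return True if a TCP connection to the service succeeds."""
        try:
            with socket.create_connection((host, port), timeout=timeout):
                return True
        except OSError:
            return False

    active = PRIMARY
    while True:
        if active == PRIMARY and not alive(*PRIMARY):
            active = SPARE  # in practice: move a service alias or IP to the spare
            print("primary unreachable, failing over to spare")
        time.sleep(30)  # poll every 30 seconds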

Page 9: CERN Computer Centre Tier SC4 Planning FZK October 20th 2005 [email protected]

9 [email protected]

Mid-range server building block

Dual 2.8GHz Xeon, 2GB mem, 4 hot-swap 250GB disks

Page 10: CERN Computer Centre Tier SC4 Planning FZK October 20th 2005 [email protected]

10 [email protected]

Mid-range server - back

Dual gigabit ethernet, dual power supply

Page 11: CERN Computer Centre Tier SC4 Planning FZK October 20th 2005 [email protected]

11 [email protected]

Mid-range server burn-in test racks

2 different vendors

Page 12: CERN Computer Centre Tier SC4 Planning FZK October 20th 2005 [email protected]

12 [email protected]

Building 513 ground floor

Space for more…

Page 13: CERN Computer Centre Tier SC4 Planning FZK October 20th 2005 [email protected]

13 [email protected]

Current Oracle RAC cluster building blocks

Fibre Channel disk and switch infrastructure

Page 14: CERN Computer Centre Tier SC4 Planning FZK October 20th 2005 [email protected]

14 [email protected]

Oracle RAC cluster

SATA2 FC-attached disks and QLogic switches

Page 15: CERN Computer Centre Tier SC4 Planning FZK October 20th 2005 [email protected]

15 [email protected]

Oracle RAC cluster - back

QLogic HBAs in the mid-range servers

Page 16: CERN Computer Centre Tier SC4 Planning FZK October 20th 2005 [email protected]

16 [email protected]

Oracle RAC cluster

Dual FC switch connections

Page 17: CERN Computer Centre Tier SC4 Planning FZK October 20th 2005 [email protected]

17 [email protected]

Who

– Contract Shift Operators: 1 person 24x7

– Technician-level System Administration Team
» 10 team members, plus 3 people for machine room operations, plus an engineer-level manager. 24 by 7 on-call.

– Engineer-level teams for Physics computing
» System & Hardware support: approx 10 FTE
» Service support: approx 10 FTE
» ELFms software: 3 FTE plus students and collaborators.

~30 FTE-years total investment since 2001

