Page 1: PRAGUE site report

1

PRAGUE site report

Page 2: PRAGUE site report

2

Overview

• Supported HEP experiments and staff

• Hardware on Prague farms

• Statistics from running the LHC experiments' Data Challenges (DC)

• Experience

Page 3: PRAGUE site report

3

Experiments and people

• Three institutions in Prague
  – Academy of Sciences of the Czech Republic
  – Charles University in Prague
  – Czech Technical University in Prague

• Collaborate on experiments
  – CERN – ATLAS, ALICE, TOTEM, *AUGER*
  – FNAL – D0
  – BNL – STAR
  – DESY – H1

• Collaborating community of 125 persons
  – 60 researchers
  – 43 students and PhD students
  – 22 engineers and 21 technicians

• LCG Computing staff – takes care of GOLIAS (farm at IOP AS CR) and SKURUT (farm located at CESNET)

  – Jiri Kosina – LCG, experiment software support, networking
  – Jiri Chudoba – ATLAS and ALICE SW and running
  – Jan Svec – HW, operating system, PBSPro, networking, D0 SW support (SAM, JIM)
  – Vlastimil Hynek – runs D0 simulations
  – Lukas Fiala – HW, networking, web

Page 4: PRAGUE site report

4

Available HW in Prague

• Two independent farms in Prague
  – GOLIAS – Institute of Physics AS CR
    • LCG2 (testZone – ATLAS & ALICE production), D0 (SAM and JIM installation)
  – SKURUT – CESNET, z.s.p.o.
    • EGEE preproduction farm, also used for ATLAS DC
    • Separate nodes used for GILDA (a tool/interface developed at INFN to let new users easily use the grid and demonstrate its power), with GENIUS installed on top of the user interface
  – Sharing of resources D0:ATLAS:ALICE = 50:40:10 (dynamically changed when needed)

• GOLIAS: 80 nodes (2 CPUs each), 40 TB
  – 32 dual-CPU nodes, PIII 1.13 GHz, 1 GB RAM
  – In July 2004 bought 49 new dual-CPU Xeon 3.06 GHz nodes, 2 GB RAM (worker nodes)
    • Currently considering whether Hyper-Threading should be on or off (memory and scheduler problems in older(?) kernels)
  – 10 TB disk space; we use LVM to create 3 volumes of 3 TB, one per experiment, NFS-mounted on the SE
  – In July 2004 added 30 TB of disk space, now in tests (30 TB XFS NFS-exported partition; unreliable with pre-2.6.5 kernels, newer kernels seem reliable so far)
  – PBSPro batch system
  – New server room: 18 racks, more than half still empty, 180 kW secured input electric power

[Photo: GOLIAS]

Page 5: PRAGUE site report

5

Available HW in Prague

• SKURUT – located at CESNET
  – 32 dual-CPU nodes, PIII 700 MHz, 1 GB RAM (16 LCG2 + 16 GILDA)
  – OpenPBS batch system
  – LCG2 installation: 1x CE+UI, 1x SE, WNs (count varies)
  – GILDA installation: 1x CE+UI, 1x SE, 1x RB (installation in progress); WNs are manually moved between LCG2 and GILDA as needed
  – Will be used for EGEE tutorials

Page 6: PRAGUE site report

6

Network connection

• General – GEANT connection
  – 1 Gbps backbone at GOLIAS, over the 10 Gbps Prague metropolitan backbone
  – CZ – GEANT 2.5 Gbps (over 10 Gbps HW)
  – USA 0.8 Gbps (Telia)

• Dedicated connection – provided by CESNET
  – Delivered by CESNET in collaboration with NetherLight
  – 1 Gbps (on a 10 Gbps line) optical connection GOLIAS-CERN; currently routed by a PC with a PCI-X bus, which seems sufficient for the current traffic (see the rough check below)
  – Planning to use Liberouter (http://www.liberouter.org), a hardware gigabit router on a PC card
  – Plan to provide the connection to other institutions in Prague
  – Connections to FERMILAB, RAL or Taipei are under consideration
  – Independent optical connection between the collaborating institutes in Prague, to be finished by the end of 2004
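A rough sanity check of why a commodity PC with a PCI-X bus can keep up with the 1 Gbps link (a minimal sketch; the 64-bit/133 MHz PCI-X variant and the double bus crossing are assumptions, not stated on the slides):

```python
# Back-of-the-envelope check: PCI-X headroom for a 1 Gbps software router.
# Assumes a 64-bit, 133 MHz PCI-X slot; the actual router hardware is not specified.

PCI_X_WIDTH_BITS = 64
PCI_X_CLOCK_HZ = 133e6
LINE_RATE_BPS = 1e9  # 1 Gbps GOLIAS-CERN optical link

pci_x_peak_bps = PCI_X_WIDTH_BITS * PCI_X_CLOCK_HZ  # ~8.5 Gbps theoretical peak
bus_load_bps = 2 * LINE_RATE_BPS                    # packets cross the bus twice (NIC -> RAM -> NIC)

print(f"PCI-X peak bandwidth:  {pci_x_peak_bps / 1e9:.1f} Gbps")
print(f"Bus load at line rate: {bus_load_bps / 1e9:.1f} Gbps")
print(f"Headroom:              {pci_x_peak_bps / bus_load_bps:.1f}x")
```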

Page 7: PRAGUE site report

7

Data Challenges

Page 8: PRAGUE site report

8

ATLAS – July 1 – September 21

GOLIAS               jobs    CPU (days)   Elapsed (days)
all                  4811    1653         1992
long (cpu > 100 s)   2377    1653         1881
short                2434    0.4          111

Number of jobs in DQ: 1349 done + 1231 failed = 2580 jobs, 52%

SKURUT               jobs    CPU (days)   Elapsed (days)
all                  1446    1507         1591
long (cpu > 100 s)    870    1507         1554
short                 576    0.2          37

Number of jobs in DQ: 362 done + 572 failed = 934 jobs, 38%

(These rates and the CPU/elapsed efficiencies are recomputed in the sketch below.)
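The percentages above are simply done / (done + failed); a minimal sketch of recomputing them, together with the CPU/elapsed efficiency from the tables (the numbers are copied from this slide, the helper names are illustrative only):

```python
# Recompute the DQ success rates and CPU/elapsed efficiencies quoted on this slide.

def success_rate(done: int, failed: int) -> float:
    """Fraction of DQ jobs that finished successfully."""
    return done / (done + failed)

def cpu_efficiency(cpu_days: float, elapsed_days: float) -> float:
    """Share of wall-clock (elapsed) time actually spent on CPU."""
    return cpu_days / elapsed_days

print(f"GOLIAS DQ success rate: {success_rate(1349, 1231):.0%}")           # ~52%
print(f"SKURUT DQ success rate: {success_rate(362, 572):.0%}")             # ~39% (slide rounds to 38%)
print(f"GOLIAS CPU/elapsed (all jobs): {cpu_efficiency(1653, 1992):.0%}")  # ~83%
print(f"SKURUT CPU/elapsed (all jobs): {cpu_efficiency(1507, 1591):.0%}")  # ~95%
```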

Page 9: PRAGUE site report

9

Local job distribution

• GOLIAS – not enough ATLAS jobs

[Chart: local job distribution on GOLIAS among ALICE, D0 and ATLAS, 2 Aug – 23 Aug]

Page 10: PRAGUE site report

10

Local job distribution

• SKURUT – ATLAS jobs
  – usage much better

Page 11: PRAGUE site report

11

ATLAS - CPU Time

[Histograms: CPU time (hours) per job on PIII 1.13 GHz, Xeon 3.06 GHz and PIII 700 MHz nodes]

queue limit: 48 hours, later changed to 72 hours

Page 12: PRAGUE site report

12

ATLAS – jobs distribution, statistics for 1.7.–6.10.2004

[Plot: number of jobs started per day (njobs), 13.7.2004 – 5.10.2004]

Page 13: PRAGUE site report

13

ATLAS - Real and CPU Time

[Histogram: number of jobs (njobs) vs. time (s), comparing CPU time and real time]

very long tail for real time – some jobs were hanging during IO operation

Page 14: PRAGUE site report

14

ATLAS Total statistics

• Total time used:
  – 1593 days of CPU time
  – 1829 days of real time

Page 15: PRAGUE site report

16

Memory

[Histogram: number of jobs (njobs) vs. memory (MB), ALICE jobs 1.7.–6.10.2004]

Page 16: PRAGUE site report

17

ALICE

[Histogram: number of ALICE jobs (njobs) vs. time (s), comparing CPU time and real time]

Page 17: PRAGUE site report

19

ALICE Total statistics

• Total time used:
  – 2076 days of CPU time
  – 2409 days of real time

(A combined efficiency note for the ATLAS and ALICE totals follows below.)
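Taking the ATLAS totals from slide 14 together with the ALICE totals above, the overall CPU-time to real-time ratio is easy to read off; it quantifies how much wall-clock time was lost to waiting (for example the hanging I/O jobs noted on slide 13). A minimal sketch, using only the numbers quoted on these slides:

```python
# Overall CPU-time / real-time ratio for the two Data Challenges.
totals = {
    "ATLAS": (1593, 1829),  # (CPU days, real-time days) from slide 14
    "ALICE": (2076, 2409),  # (CPU days, real-time days) from this slide
}

for experiment, (cpu_days, real_days) in totals.items():
    print(f"{experiment}: {cpu_days / real_days:.0%} of real time spent on CPU")
# ATLAS: ~87%, ALICE: ~86%
```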

Page 18: PRAGUE site report

20

LCG installation

• LCG installation on GOLIAS
  – We use PBSPro. In cooperation with Peer Haaselmayer (FZK), a “cookbook” for LCG2 + PBSPro was created (some patching is needed)
  – Worker nodes – the first node installation is done using LCFGng, then the node is immediately switched off
  – From then on everything is done manually – we find it much more convenient and transparent, and the manual installation guide helps
  – Currently installed LCG2 version 2_2_0

• LCG installation on SKURUT
  – Almost default LCG2 installation, only with some tweaking of PBS queue properties
  – We recently found that the OpenPBS shipped with LCG2 already contains the required_property patch, which is very convenient for better resource management
    • Currently trying to integrate this feature into PBSPro

