GridPP Deployment Status GridPP14 Jeremy Coles J.Coles@rl.ac.uk 6th September 2005.

Page 1: GridPP Deployment Status GridPP14 Jeremy Coles J.Coles@rl.ac.uk 6 th September 2005.

GridPP Deployment Status

GridPP14

Jeremy Coles

J.Coles@rl.ac.uk

6th September 2005

Page 2:

Overview

1 The main changes over the last two months

2 Trends in basic EGEE metrics

3 Utilisation and efficiency

4 Deployment priorities

5 Brief look at service challenges

6 Summary

Page 3:

Old vs new SFT

• http://goc.grid.sinica.edu.tw/gocwiki/Site_Functional_Tests
• See Piotr Nyczyk’s mail to LCG-ROLLOUT 21st July

Page 4:

Old vs new SFT

1. Change in critical tests
2. Change in impact of test order
3. Tests are run more regularly
4. THINGS NOW LOOK MUCH MORE STABLE!

Page 5:

The new SFTs are used to populate regional weekly views

Page 6:

… and monthly views. The variations need to be understood (avg. 24hrs)

Sites with large farms upgrading?

Tier-1 scheduler lost

Page 7:

GridPP is still the largest contributor of resources

Page 8:

UK job slots have increased by >20% in last few months

[Figure: UK total job slots. x-axis: Date; y-axis: Published job slots (0 to 3500)]

Page 9:

Next to CERN additions this is one of the major recent increases

[Figure: EGEE total job slots and UK total job slots against date, June 2004 to August 2005; y-axis 0 to 20000]

Page 10:

Contribution to EGEE CPU resources therefore remains good at ~20%

[Figure: UK % total CPU. x-axis: Date (June 2004 to August 2005); y-axis: Percentage contribution (0% to 35%)]

Page 11:

This has translated into GridPP taking an average of about 20% of the work recently

[Figure: % EGEE jobs running in UK. x-axis: Date (June 2004 to August 2005); y-axis: Percentage of jobs in UK (0% to 70%)]

Page 12:

Which reflects the fact that our sites remain at least as stable as the EGEE average

[Figure: gstat metric for EGEE and UKI, 24/01/2005 to 22/08/2005; y-axis: gstat metric (0 to 45)]

Page 13:

A reminder of the “gstat metric” basis

Status  Description                                            Example
0       na or no status available
10      ok or normal status                                    No problems
20      info or useful information                             Storage over 90% full
30      note or important information                          GridIce tests are failing
40      warn or subject may fail soon                          Blank values or wrong format in configuration
50      error or subject has failed and problem is localised   A query failed (e.g. no cpu information found)
60      crit or subject has failed and problem is fatal
-       maint or subject is under maintenance                  Scheduled downtime at site
-       off or subject has monitoring off                      Site is undertaking work that would trigger alerts

Gstat metric = ((#ok sites)*10+(#info sites)*20+(#note sites)*30+(#warn sites)*40+(#error sites)*50+(#crit sites)*60) / (#sites – (#maint+#off))
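As a minimal sketch of the formula above (not the gstat implementation itself), the metric can be computed from a list of per-site status names. The function name `gstat_metric` and the list-of-strings input are assumptions made for illustration; the weights and the exclusion of `maint` and `off` sites from the denominator come from the slide.

```python
# Status weights from the slide; "na" carries the status value 0,
# so it adds nothing to the numerator but still counts as a site.
STATUS_WEIGHTS = {
    "na": 0, "ok": 10, "info": 20, "note": 30,
    "warn": 40, "error": 50, "crit": 60,
}
# Sites in these states are subtracted from the denominator.
EXCLUDED = {"maint", "off"}


def gstat_metric(site_statuses):
    """Weighted status sum divided by (#sites - (#maint + #off)).

    site_statuses is a hypothetical input: one status name per site.
    Returns None when every site is in maintenance or has monitoring off.
    """
    counted = [s for s in site_statuses if s not in EXCLUDED]
    if not counted:
        return None
    return sum(STATUS_WEIGHTS[s] for s in counted) / len(counted)


# Two ok sites, one warn site, one site in scheduled downtime:
# (2*10 + 1*40) / (4 - 1)
example = gstat_metric(["ok", "ok", "warn", "maint"])
print(example)
```

A region in which every counted site is ok scores 10, so lower values indicate a healthier set of sites.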

Page 14:

Occupancy averages at 55% for August (26% for period from June 04)

[Figure: % EGEE slots used and % UK slots used. x-axis: Date (June 2004 to August 2005); y-axis: % job slots used (0% to 90%)]

Page 15:

Several sites have been running full for July/August.

The plot below is for the Tier-1 in August

Maximum Capacity

Page 16:

August was the busiest month for the Tier-1 as evidenced by the total KSI2K delivered (KSI2K*CPUMonths)

[Figure: CPU Use (KSI2K*CPUMonths) by month, Jan to Dec; y-axis 0 to 900. Series: Other, SNO, Zeus, H1, Minos, UKQCD, LHCB, DZERO, CMS, CDF, Babar, Atlas, Alice]

Page 17:

There has been a Tier-1 investigation into job efficiency over the year (CPU time/Elapsed time)

• Low efficiencies impact utilisation (in terms of CPU time provided)
• Produced by global performance problems on LCG SEs, coupled with problems in logging and book-keeping services
• Approximately 400 KSI2K*CPUmonths per month Feb-June – about 50% of total capacity
• Farm occupancy (job slots used) has increased

Efficiency is >1 if a job runs more than 1 CPU-intensive process

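The efficiency definition on this slide (CPU time divided by elapsed time) can be illustrated with a short sketch. The function name and the example numbers are made up for illustration and are not taken from the Tier-1 accounting data.

```python
def job_efficiency(cpu_seconds, elapsed_seconds):
    """Job efficiency as defined on the slide: CPU time / elapsed time."""
    if elapsed_seconds <= 0:
        raise ValueError("elapsed time must be positive")
    return cpu_seconds / elapsed_seconds


# Hypothetical job that spends most of its wall-clock time waiting,
# for example on a slow storage element: low efficiency.
blocked = job_efficiency(cpu_seconds=600, elapsed_seconds=6000)

# Hypothetical job running two CPU-intensive processes at once:
# accumulated CPU time exceeds elapsed time, so the ratio is above 1.
two_procs = job_efficiency(cpu_seconds=7200, elapsed_seconds=4000)

print(blocked, two_procs)
```

A job that blocks on an external resource keeps occupying its slot while accumulating little CPU time, which is why farm occupancy can rise while the CPU time delivered does not.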
Page 18:

Specific weighted job efficiencies for ATLAS in July

• Straight line structures show jobs which ran for a period of time before blocking on an external resource and eventually being killed by an elapsed time limit
• Clusters at low efficiency probably show performance problems on external storage elements
• Many problems seen here are NOW FIXED

Page 19:

We have seen a good general response to 2.6.0 deployment

[Figure: Sites at each release, 24/01/2005 to 22/08/2005. x-axis: Date; y-axis: Sites at release (0 to 40). Series: LCG-2_4_0, LCG-2_3_1, LCG-2_3_0, LCG-2_5_0, LCG-2_6_0, Sites]

Page 20:

SRMs and data migration

• SRMs and data migration – dCache/DPM
– We have most experience with dCache-SRM but gaining knowledge of DPM
– The mailing list remains active – join and review the archives BEFORE attempting an installation so that we can support you better
– There is now a GridPP wiki, which brings us on to …

Links to all areas mentioned can be found on the deployment links page: http://www.gridpp.ac.uk/deployment/links.html

Page 21:

Our support model needs to be developed

UKI ROC ticket tracking system (Footprints)

Site A

GGUS

Regional service 1

Tier-1 helpdesk (Remedy)

Grid-Ireland helpdesk (Remedy)

GOSC (Footprints)

CIC-on-duty

Users Experiments/VOs

Savannah – bug tracking

Site administrators

LCG-ROLLOUT

TB-SUPPORT

Page 22:

Other areas (examples)

Technical
• Implications of LCG Baseline Services Group findings
• Procurement and deployment of more resources while maintaining a steady service

General
• PPARC signs the LCG MoU shortly – this commits all sites to a certain basic level of service (Tier-2s 72hrs response)
• The operations workshop at Culham (near RAL) later this month http://egee.in2p3.fr/events/UKI/
• A training course for GridPP sysadmins to help prepare sites for SC4 and the increasing service demands (PPARC signs an LCG MoU soon!)
• A UK support workshop for users and sysadmins?

Page 23:

Service Challenge 3 enters a new phase

Phase 1 (throughput tests) – July 2005
– dCache-SRM working at all sites
– Tier-1 managed rates (on UKLIGHT) up to 650 Mb/s to CERN. This is similar to SC2 rates.
– Edinburgh – 10TB data transferred. Sustained rates of 220-250 Mb/s
– Imperial – Rates reached 400-480 Mb/s
– Lancaster – 958GB (978 files) over 8 days (~27 Mb/s sustained)

Phase 2 (service phase) from 1st September 2005
– The experiments will use the SC3 infrastructure for testing their models and production
– Experiment (basic functionality) test jobs are being developed (to run as part of the SFTs) to check sites

Page 24:

Service Challenge 4 will affect all sites – start preparing!

• SC4 consists of a Setup Phase starting on 1st April 2006, during which a number of Throughput tests will be performed

• followed by a Service Phase from 1st May 2006 until the 30th September 2006

• All service components for SC4 need to be delivered ready for production by the 31st January 2006

• Final testing and integration of components and services must be completed by 31st March 2006

… more details in the panel discussion later today.

Page 25:

Summary

1 We have seen changes in SFTs

2 GridPP remains a major contributor to LCG/EGEE resources

3 Use of resources is increasing – there were concerns about efficiency

4 Sites did well with the upgrade during a vacation period

5 Two major deployment tasks – support & SRM implementations

6 Service Challenge 3 enters the “Service Phase”. SC4 planning starts

