GridPP3 Project Management GridPP20 Sarah Pearce 11 March 2008.


GridPP3 Project Management

GridPP20, Sarah Pearce

11 March 2008

11 March 2008 GridPP20: GridPP3 Project Management

Slide 2


Slide 3

What’s the project map for?

• To show us how well GridPP is delivering against requirements

• To report to the Oversight Committee on:

– Areas where GridPP is doing well (or OK)

– Areas that need attention

• To report on staff posts

• GridPP is not in direct control of all metrics, but can aim to put pressure in areas where we see problems

• Will need to be complete for next OC – May?


Slide 4

From production…

[Figure: GridPP2 project map, status date 30/Jun/07 + next 90 days. GridPP2 Goal: To develop and deploy a large scale production quality grid in the UK for the use of the Particle Physics community. Top-level areas: LCG, M/S/N, LHC Apps, Non-LHC Apps, Management and External, each broken down into numbered elements (e.g. 2.1) and their milestones and metrics (e.g. 2.1.1), alongside Production Grid Milestones and Production Grid Metrics. Elements include Design, Service Challenges, Development, Deployment, Security, InfoMon, Storage, Metadata, Workload, Network, Portal, ATLAS, CMS, LHCb, GANGA, BaBar, SamGrid, UKQCD, PhenoGrid, Interoperability, Engagement, Knowledge Transfer, Dissemination, Project Planning and Project Execution. Legend: Monitor OK; Monitor not OK; Milestone complete; Milestone overdue; Milestone due soon; Milestone not due soon; Item not active.]
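The map's legend amounts to a status rule evaluated against the status date. A minimal sketch in Python, assuming each milestone carries a due date, an optional completion date and an active flag (hypothetical fields; the tool behind the map is not described here), and reading "due soon" as falling within the 90 days after the status date:

```python
from dataclasses import dataclass
from datetime import date, timedelta
from enum import Enum
from typing import Optional


class Status(Enum):
    COMPLETE = "Milestone complete"
    OVERDUE = "Milestone overdue"
    DUE_SOON = "Milestone due soon"
    NOT_DUE_SOON = "Milestone not due soon"
    NOT_ACTIVE = "Item not active"


@dataclass
class Milestone:
    number: str                       # map number, e.g. "2.1.1" (illustrative)
    due: date                         # hypothetical field: planned due date
    completed: Optional[date] = None  # hypothetical field: date marked complete
    active: bool = True               # hypothetical field: item still tracked


def classify(m: Milestone, status_date: date, window_days: int = 90) -> Status:
    """Assign a legend status relative to the status date.

    Assumption: "due soon" means due within `window_days` after the
    status date (the map header reads "+ next 90 days").
    """
    if not m.active:
        return Status.NOT_ACTIVE
    if m.completed is not None and m.completed <= status_date:
        return Status.COMPLETE
    if m.due < status_date:
        return Status.OVERDUE
    if m.due <= status_date + timedelta(days=window_days):
        return Status.DUE_SOON
    return Status.NOT_DUE_SOON
```

For example, with the map's status date of 30/Jun/07, `classify(Milestone("2.1.2", date(2007, 8, 15)), date(2007, 6, 30))` returns `Status.DUE_SOON`. Checking "not active" and "complete" first means a finished milestone is never shown as overdue.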


Slide 5

…to exploitation

[Figure: GridPP3 project map. GridPP3 Goal: To provide UK computing for the Large Hadron Collider. Top-level areas cover the experiments (ATLAS, CMS, LHCb, other experiments), Grid services, Tier-1, Tier-2, Management and External, broken down into numbered elements (1.1 to 6.4, plus 0.1). Elements include Operations, Deployment, Middleware, Security, Network, Data and storage, Hardware procurement, Resource delivery, Front end systems, Storage systems, the Tier-2s (London, NorthGrid, ScotGrid, SouthGrid), Planning & tracking, Outreach & engagement, EGEE, LCG, National Grid Infrastructure and GridPP2+.]


Slide 6

Main features

• Led by experiments – key to delivering for LHC

• Tier-1 and Tier-2 areas

– Aggregated per Tier-2

• Mainly metrics, with some deliverables

• Based around services delivered – especially meeting MoU commitments

• Includes section for GridPP2+


Slide 7

Milestones and metrics

[Figure: GridPP3 project map (Goal: To provide UK computing for the Large Hadron Collider) expanded to show the numbered milestones and metrics under each element, for example 1.2.1 to 1.2.12 for LHCb and 2.1.1 to 2.1.14 for overall operations.]


Slide 8

Example Experiment metrics

Number LHCb

1.2.1 UK share of LHCb production computing needs

1.2.2 MC production (generation) efficiency

1.2.3 T1 MC production (reconstruction, stripping) efficiency

1.2.4 T1 MC/Event user analysis - UK share/efficiency

1.2.5 T2 data transfer - T2->RAL

1.2.6 T2 data transfer - T2->others (failover?)

1.2.7 T1 data transfer - Incoming

1.2.8 T1 data transfer - Outgoing

1.2.9 T1 data storage: Tape

1.2.10 T1 data storage: Disk

1.2.11 LHCb SAM tests uptime T1

1.2.12 LHCb SAM tests uptime T2

Number ATLAS

1.101 Tier 1 - Available job slots for reconstruction

1.102 Tier 2 - Available job slots for group analysis

1.103 Tier 1 - Available job slots for MC production

1.104 Tier 1 - Job success rates in batch system

1.105 Tier 1 - Available storage in usable service classes

1.106 Tier 1 - Data reading rates from storage system to batch farm

1.107 Tier 1 - Rates of data movement from tape to disk for reprocessing

1.108 Tier 1 - Data availability in storage system

1.109 Tier 1 - Data loss per quarter (when not recoverable)

1.110 Tier 1 - Data acceptance from CERN, Tier 1s, Tier 2s

1.111 Tier 1 - MoU service levels

1.112 Tier 2 - Data acceptance from Tier 1

1.113 Tier 2 - Available simulation slots

1.114 Tier 2 - Available analysis slots


Slide 9

Overall operations metrics

Number Title

2.1.1 Fraction of UK sites in Production

2.1.2 Number of supported VOs

2.1.3 Fraction of kSI2k used

2.1.4 GridPP kSI2K Available

2.1.5 GridPP disk storage available

2.1.6 Job failure rates

2.1.7 UK contribution to LHC experiments

2.1.8 UK contribution to non-LHC experiments

2.1.9 Deployment team meetings

2.1.10 UK wide deployment support active

2.1.11 GridPP deployment web-pages up-to-date

2.1.12 Training needs addressed

2.1.13 GridPP helpdesk functioning adequately

2.1.14 Number of sites on VO blacklists
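Several of these operations metrics are simple ratios over a quarter. A minimal sketch of computing three of them, assuming hypothetical quarterly totals (the field names and the kSI2k-hours unit are illustrative, not taken from GridPP's reporting):

```python
from dataclasses import dataclass


@dataclass
class QuarterTotals:
    # Hypothetical quarterly inputs; names are illustrative only.
    sites_total: int
    sites_in_production: int
    capacity_available: float  # e.g. kSI2k-hours offered (assumed unit)
    capacity_used: float       # same unit as capacity_available
    jobs_submitted: int
    jobs_failed: int


def fraction(numerator: float, denominator: float) -> float:
    """Ratio that returns 0.0 instead of raising when the denominator is 0."""
    return numerator / denominator if denominator else 0.0


def operations_metrics(t: QuarterTotals) -> dict[str, float]:
    """Ratio-style metrics keyed by their number in the table above."""
    return {
        "2.1.1": fraction(t.sites_in_production, t.sites_total),
        "2.1.3": fraction(t.capacity_used, t.capacity_available),
        "2.1.6": fraction(t.jobs_failed, t.jobs_submitted),
    }
```

With 18 of 20 sites in production, 750 of 1000 capacity units used and 20 of 400 jobs failed, `operations_metrics` gives 0.9, 0.75 and 0.05 respectively.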


Slide 10

Tier-1 metrics – examples

Number Resource delivery

3.2.1 Tier-1 KSI2K Available to EGEE/LCG

3.2.2 Tier-1 delivering to LCG MoU

3.2.3 Fraction of available T1 KSI2K used in quarter

3.2.4 Fraction of available T1 KSI2K used in quarter

3.2.5 UB schedule implemented and upheld

3.2.6 Time on VO blacklists

3.2.7 Respond to tickets within required time

3.2.8 Job efficiencies

Number Hardware procurement

3.1.1 Disk tender started

3.1.2 Disk delivered

3.1.3 Disk available and in production as per plan

3.1.4 Tape tender started

3.1.5 Tape delivered

3.1.6 Tape available and in production as per plan

3.1.7 CPU tender started

3.1.8 CPU delivered

3.1.9 CPU available and in production as per plan

3.1.10 New machine room migration plan available

3.1.11 New machine room - migration complete

3.1.12 New machine room available to accept hardware

3.1.13 Network upgraded

• Services

• Storage


Slide 11

Tier-2 metrics

Number Title

4.x.1 % of promised (by that time) disk available

4.x.2 % of promised (by that time) CPU available

4.x.3 Average SAM (SLL page) availability performance over the last quarter

4.x.4 Average SAM (SLL page) reliability performance over the last quarter

4.x.5 Average SLL ATLAS test performance?

4.x.6 Average SLL disk test performance?

4.x.7 Amount of CPU delivered

4.x.8 Number of TB of disk used

4.x.9 Number of technical meetings held

4.x.10 Number of management meetings held

4.x.11 Tier-2 delivering to LCG MoU

4.x.12 Quarterly operational performance review
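Metrics 4.x.1/4.x.2 compare delivered capacity with what had been promised by a given date, and 4.x.3/4.x.4 distinguish availability from reliability. A minimal sketch, assuming a date-sorted schedule of cumulative promised capacity and SAM results sampled at fixed intervals; the reliability rule (scheduled downtime excluded from the denominator) follows the common WLCG-style convention and is an assumption here, not a definition from this talk:

```python
from bisect import bisect_right
from datetime import date
from enum import Enum
from typing import Optional, Sequence


class Sample(Enum):
    OK = "ok"                # SAM tests passed in this interval
    DOWN = "down"            # tests failed, no downtime declared
    SCHEDULED = "scheduled"  # site in declared (scheduled) downtime


def percent_of_promised(schedule: Sequence[tuple[date, float]],
                        when: date, available: float) -> Optional[float]:
    """4.x.1 / 4.x.2: available capacity as a % of what was promised by `when`.

    `schedule` is a date-sorted list of (date, cumulative promised capacity).
    Returns None if nothing (or zero) had been promised by that date.
    """
    dates = [d for d, _ in schedule]
    i = bisect_right(dates, when)  # number of schedule entries on or before `when`
    if i == 0:
        return None
    promised = schedule[i - 1][1]
    if promised <= 0:
        return None
    return 100.0 * available / promised


def availability(samples: Sequence[Sample]) -> float:
    """4.x.3: fraction of all intervals in which the site passed its tests."""
    if not samples:
        return 0.0
    return sum(s is Sample.OK for s in samples) / len(samples)


def reliability(samples: Sequence[Sample]) -> float:
    """4.x.4: as availability, but scheduled downtime is not counted against the site."""
    counted = [s for s in samples if s is not Sample.SCHEDULED]
    if not counted:
        return 0.0
    return sum(s is Sample.OK for s in counted) / len(counted)
```

For a quarter with 90 passing, 6 failing and 4 scheduled-downtime samples, availability is 0.9 while reliability is 90/96 = 0.9375, which is why the two metrics are tracked separately.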


Slide 12

Risk register

ID Name: Li Im Risk for each assessment (Li = likelihood, Im = impact; areas assessed: Pro. Grid, GridPP, LCG, MSN, Apps)

R1 Recruitment/retention difficulties: 2 2 4 | 2 2 4 | 2 2 4 | 2 2 4
R2 Sudden loss of key staff: 1 3 3 | 1 3 3 | 1 3 3 | 1 4 4
R3 Minimal Contingency: 4 2 8
R4 GridPP deliverables late: 1 3 3 | 2 3 6 | 2 2 4
R5 Sub-components not delivered to project: 1 2 2 | 2 3 6 | 3 3 9 | 2 3 6
R6 Non take-up of project results: 2 1 2 | 1 4 4 | 2 2 4 | 1 4 4
R7 Change in project scope: 1 1 1 | 2 2 4
R8 Bad publicity: 1 3 3 | 1 3 3 | 1 3 3 | 2 3 6
R9 External OS dependence: 3 1 3
R10 External middleware dependence: 4 2 8 | 1 4 4 | 3 2 6 | 2 2 4
R11 Lack of monitoring of staff: 1 2 2 | 2 2 4 | 2 2 4 | 1 3 3
R12 Withdrawal of an experiment: 2 3 6 | 1 4 4
R13 Lack of cooperation between Tier centres: 2 2 4 | 1 3 3
R14 Scalability problems: 1 2 2 | 2 2 4
R15 Software maintainability problems: 2 2 4 | 2 3 6 | 4 3 12 | 1 4 4
R16 Technology shifts: 1 2 2 | 2 3 6 | 2 3 6
R17 Repetition of research: 3 2 6
R18 Lack of funding to meet LCG PH-1 goals: 4 1 4
R20 Conflicting software requirements: 3 2 6 | 2 3 6
R22 Hardware resources inadequate: 2 3 6 | 2 3 6 | 2 3 6
R25 Hardware procurement problems: 2 2 4 | 2 3 6
R26 LAN Bottlenecks: 1 3 3
R27 Tier-2 organisation fails: 2 2 4
R28 Experiment Requirements not met: 2 3 6
R29 SYSMAN effort inadequate: 2 3 6
R30 Firewalls interfere with Grid: 2 3 6
R31 Inability to establish trust relationships
R32 Security inadequate to operate Grid: 2 3 6
R33 Interoperability: 2 3 6
R35 Failure of international cooperation: 2 1 2
R36 e-Science and GridPP divergence: 2 3 6
R37 Institutes do not embrace Grid: 2 2 4
R38 Grid does not work as required: 4 2 8 | 4 2 8
R39 Delay of the LHC: 2 2 4
R40 Lack of future funding: 2 3 6 | 3 3 9 | 4 3 12
R41 Network backbone failure: 0 4 1
R42 Network backbone bottleneck: 2 2 4
R43 Network backbone upgrade delay: 1 4 4
R44 Inadequate User Support: 2 3 6
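The Risk values in the register are consistent with Risk = Li × Im in nearly every row (for example 4 2 8 and 4 3 12). A minimal sketch, assuming that scoring rule and a hypothetical attention threshold of 9, of scoring each risk by its worst assessment and listing those that need attention; the sample data is a few rows copied from the register:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Assessment:
    likelihood: int  # "Li" column
    impact: int      # "Im" column

    @property
    def risk(self) -> int:
        # Assumed scoring rule: Risk = Li x Im
        return self.likelihood * self.impact


def worst_risks(register: dict[str, list[Assessment]],
                threshold: int) -> list[tuple[str, int]]:
    """Return (risk ID, highest score) for risks at or above `threshold`,
    highest score first, ties broken by ID string."""
    scored = [
        (rid, max(a.risk for a in assessments))
        for rid, assessments in register.items()
        if assessments  # skip risks with no assessment, such as R31
    ]
    flagged = [(rid, score) for rid, score in scored if score >= threshold]
    return sorted(flagged, key=lambda item: (-item[1], item[0]))


# A few rows from the register above.
register = {
    "R3": [Assessment(4, 2)],
    "R5": [Assessment(1, 2), Assessment(2, 3), Assessment(3, 3), Assessment(2, 3)],
    "R15": [Assessment(2, 2), Assessment(2, 3), Assessment(4, 3), Assessment(1, 4)],
    "R31": [],
    "R40": [Assessment(2, 3), Assessment(3, 3), Assessment(4, 3)],
}
```

With these rows, `worst_risks(register, threshold=9)` returns `[("R15", 12), ("R40", 12), ("R5", 9)]`; R3 peaks at 8 and falls below the threshold. Scoring by the worst assessment is a design choice: a risk that is high in any one area is surfaced even if it is low elsewhere.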

11 March 2008 GridPP20: GridPP3 Project Management

Slide 13

Reporting

[Figure: reporting structure. Boxes shown: OC, CB, PMB, Project Manager, Technical director, Production Manager, Tier-1 Manager, Tier-1 Staff, User Board Chair, ATLAS, LHCb, CMS, Other expts, Expt. support, User support, Tier-2 Coordinators, Tier-2 Hardware Support Posts, Storage, Data, Info. Mon., WLMS, Security, Network, Portal.]


Slide 14

Quarterly reports

• Produced by the manager in each area

• Reporting on progress in the quarter, including:

– Effort figures

– Resources delivered

– Service levels

– Metrics and milestones

– Issues arising

• Expected 1 month after the end of each quarter
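Reporting deadlines follow mechanically from the quarter boundaries. A minimal sketch, assuming calendar-year quarters and reading "1 month after the end of each quarter" as the last day of the following month (both are assumptions; GridPP's actual reporting calendar may differ):

```python
import calendar
from datetime import date


def quarter_end(year: int, quarter: int) -> date:
    """Last day of a calendar quarter (assumption: calendar-year quarters)."""
    if quarter not in (1, 2, 3, 4):
        raise ValueError("quarter must be 1, 2, 3 or 4")
    month = 3 * quarter
    # calendar.monthrange returns (weekday of the 1st, number of days in month)
    return date(year, month, calendar.monthrange(year, month)[1])


def report_due(year: int, quarter: int) -> date:
    """Interpret "1 month after the end of the quarter" as the last day
    of the following month."""
    end = quarter_end(year, quarter)
    year2, month2 = (end.year + 1, 1) if end.month == 12 else (end.year, end.month + 1)
    return date(year2, month2, calendar.monthrange(year2, month2)[1])
```

Under these assumptions, `report_due(2008, 1)` gives 30 April 2008 and `report_due(2008, 4)` rolls over into the next year, giving 31 January 2009.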


Slide 15

