LCG – LHC Computing Grid Project Status Report, 12 February 2003 – Denis Linglin (last updated: 9/02/03 07:24)
Transcript
Page 1:

LHC Computing Grid Project

Status Report

12 February 2003

Page 2:

Project Goals

• applications – environment, common tools, frameworks, persistency, ..

• computing system
  – data recording, reconstruction, managed storage (CERN)
  – global grid service of collaborating computer centres
  – global analysis environment

• central role of data challenges – deploy & evolve – experience, confidence

Goal – prepare and deploy the LHC computing environment to help the experiments analyse the data coming from the detectors

Page 3:

Two Phases

Phase 1 – 2002-05 – R&D
  – Applications – prototyping, development
  – Develop and operate a Grid Service
  – Computing Services TDR – July 2005

Phase 2 – 2006-08 – Construction & operation
  – Installation, commissioning and operation of the initial global LHC data analysis Grid

Page 4:

Requirements & Implementation

• SC2 brings together the Four Experiments and Tier 1 Regional Centres

• it identifies common domains and sets requirements for the project
  – may use an RTAG – Requirements and Technical Assessment Group
  – limited scope, two-month lifetime with intermediate report
  – one member per experiment + experts

• PEB manages the implementation
  – organizing projects, work packages
  – coordinating between the Regional Centres
  – collaborating with Grid projects
  – organizing grid services

• SC2 approves the work plan, monitors progress

Info from SC2

(Organisation chart: LHCC, Computing RRB, Overview Board, Software and Computing Committee, Project Execution Board)

Page 5:

SC2 Requirements Specification – status of RTAGs

– On applications: final reports
  • data persistency – Apr 02
  • software support process – May 02
  • mathematical libraries – May 02
  • detector geometry description – Oct 02
  • Monte Carlo generators – Oct 02
  • applications architectural blueprint – Oct 02
  • detector simulation – Dec 02

– On Fabrics
  • mass storage requirements – May 02

– On Grid technology and deployment area
  • Grid technology use cases – Jun 02
  • Regional Centre categorisation – Jun 02

– Current status of RTAGs (and available reports) on www.cern.ch/lcg/sc2

Info from SC2

Page 6:

Work Planning Status

• High-level planning paper prepared and presented to LHCC in July
• Level 1 and 2 milestones agreed with LHCC referees – November 2002
• PBS/WBS agreed with experiments – December 2002
• see www.cern.ch/lcg/peb (Planning)

• Formal work plans agreed for:
  – Data Persistency (POOL)
  – Support for the Software Process & Infrastructure (SPI)
  – Mass Storage
  – Core software services (SEAL)

• Work plans in preparation:
  – Mathematical Libraries
  – Physics Interfaces (PI)

• LHC Global Grid Service
  – first service definition in preparation – February 2003

Info from SC2

Page 7:

LCG Level 1 Milestones

(Timeline chart, quarters Q1 2002 – Q4 2005, with an applications track and a grid service track; a “here we are” marker sits at early 2003. Milestones shown:)

– launch workshop
– Hybrid Event Store available for general users
– Distributed production using grid services
– First Global Grid Service (LCG-1) available
– Distributed end-user interactive analysis
– Full Persistency Framework
– LCG-1 reliability and performance targets
– “50% prototype” (LCG-3) available
– LHC Global Grid TDR

Page 8:

LCG Project Implementation

PEB: 4 Areas of Work
• Applications – Torre Wenaus
• Grid deployment – Ian Bird
• Fabrics – Bernd Panzer
• Provision of Grid Technology – David Foster

(Organisation chart, as on page 4: LHCC, Computing RRB, Overview Board, Software and Computing Committee, Project Execution Board)

Page 9:

Applications Area

Area manager – Torre Wenaus

• Importance of RTAGs to define scope
• Open weekly applications area meetings
• Software Architects Forum
  – process for taking LCG-wide software decisions
• Staffing of projects
  – CERN, experiments, other institutes
  – CERN resources being merged into a single group – EP/SFT – and moving people together in building 32

Page 10:

Simulation

• RTAGs have defined formal requirements for LCG for:
  – detector geometry description
  – MC generators
  – detector simulation

• Support required for both GEANT 4 and FLUKA

• GEANT4
  – independent collaboration, including HEP institutes, LHC and other experiments, other sciences
  – significant LHC-related resources (including CERN)
  – MoU being re-defined now
  – need to ensure long-term support
  – CERN resources will be under the direction of the project
  – process for agreeing common LHC priorities

Page 11:

Grid Deployment

Area Manager – Ian Bird

• Planning, building, commissioning, operating a stable, reliable, manageable Grid for Data Challenges and the general analysis workload

• Integrating fabrics from many Regional Centres and CERN

Page 12:

Distributed Analysis must work

• CERN will provide the data reconstruction & recording service (Tier 0) – but only a small part of the analysis capacity

Summary of Computing Capacity Required for all LHC Experiments in 2008

                            ----------- CERN -----------    Other     Total    CERN as %     Total     CERN as %
                            Tier 0    Tier 1     Total     Tier 1    Tier 1    of Tier 1   Tier 0+1    of total
Processing (KSI2000)        12,000     8,000    20,000     49,000    57,000       14%        69,000       29%
Disk (PetaBytes)               1.1       1.0       2.1        8.7       9.7       10%          10.8       20%
Magnetic tape (PetaBytes)     12.3       1.2      13.5       20.3      21.6        6%          33.9       40%

Current planning for capacity at CERN + principal Regional Centres:
– 2002: 650 KSI2000 – <1% of the capacity required in 2008
– 2005: 6,600 KSI2000 – <10% of the 2008 capacity

KSI2000 at CC-IN2P3 : March 2002 ~190, Nov. 2002 ~275, March 2003 ~700

% CPU (LHC/∑CC-in2p3) = 16% in 2002
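As a cross-check on the table and planning figures above, the short Python sketch below re-derives the “CERN as %” columns and the <1% / <10% planning fractions from the raw numbers on this slide. It is purely illustrative; results are rounded to whole percent, so one or two values may differ from the slide by a rounding step.

```python
# Illustrative cross-check of the 2008 capacity table on this slide
# (CPU in KSI2000, disk and tape in PetaBytes, transcribed from the slide).
capacity_2008 = {
    # resource: (CERN Tier 0, CERN Tier 1, other Tier 1)
    "Processing (KSI2000)":      (12_000, 8_000, 49_000),
    "Disk (PetaBytes)":          (1.1, 1.0, 8.7),
    "Magnetic tape (PetaBytes)": (12.3, 1.2, 20.3),
}

for resource, (cern_t0, cern_t1, other_t1) in capacity_2008.items():
    cern_total = cern_t0 + cern_t1        # CERN Tier 0 + 1
    tier1_total = cern_t1 + other_t1      # all Tier 1 centres
    grand_total = cern_t0 + tier1_total   # Tier 0 + 1 world-wide
    print(f"{resource}: CERN is {cern_t1 / tier1_total:.0%} of Tier 1, "
          f"{cern_total / grand_total:.0%} of Tier 0+1")

# Planned CPU at CERN + principal Regional Centres vs. the 69,000 KSI2000
# required in 2008 (the "<1%" and "<10%" figures quoted above).
for year, ksi2000 in [(2002, 650), (2005, 6_600)]:
    print(f"{year}: {ksi2000 / 69_000:.1%} of the 2008 CPU requirement")
```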

Page 13:

Data Challenges in 2002

Page 14:

Spring02: CPU Resources – 6 million events, ~20 sites

(Pie chart of CPU resource shares by site: Wisconsin 18%, INFN 18%, CERN 15%, IN2P3 10%, Moscow 10%, FNAL 8%, RAL 6%, IC 6%, UFL 5%, Caltech 4%, UCSD 3%, Bristol 3%, HIP 1%)

Most resources not at CERN (CERN not even the biggest single resource)

Page 15:

[email protected]

grid tools used at 11 sites

Page 16:

Grid Deployment

• Experiments can do (and are doing) their event production using distributed resources, with a variety of solutions:
  – classic distributed production – send jobs to specific sites, simple bookkeeping (see the sketch below)
  – some use of Globus, and some of the HEP Grid tools
  – other integrated solutions (ALIEN)
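The fragment below is only a toy sketch of that “classic” pattern – jobs assigned to specific, named sites with a simple per-job bookkeeping record. It is not a tool used by any of the experiments; the site names, quotas and job count are invented for the example.

```python
# Hypothetical sketch of "classic" distributed production:
# send jobs to specific sites and keep simple bookkeeping of who ran what.
from dataclasses import dataclass, field

@dataclass
class ProductionSite:
    name: str
    slots: int                       # jobs this site has agreed to run
    jobs: list = field(default_factory=list)

def dispatch(job_ids, sites):
    """Fill each site up to its agreed quota, in order; return the bookkeeping."""
    ledger = {}                      # job id -> site name (the "simple bookkeeping")
    remaining = iter(job_ids)
    for site in sites:
        for _ in range(site.slots):
            job = next(remaining, None)
            if job is None:
                return ledger
            site.jobs.append(job)
            ledger[job] = site.name
    return ledger

# Invented example: 10 simulation jobs spread over three named sites.
sites = [ProductionSite("CERN", 4), ProductionSite("IN2P3", 3), ProductionSite("FNAL", 3)]
print(dispatch(range(10), sites))    # {0: 'CERN', 1: 'CERN', ..., 9: 'FNAL'}
```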

• The hard problem for distributed computing is data analysis – ESD and AOD
  – chaotic workload
  – unpredictable data access patterns

This is the problem that the LCG has to solve, and this is where Grid technology should really help

Page 17:

Deploying the LHC Grid

• The priority for 2003 is to move from testbeds to a SERVICE

• We need to learn how to OPERATE a Grid

Service Quality and Reliability are as important as functionality

Page 18:

Grid Deployment Board

• Grid Deployment Board – chair Mirco Mazzucato
  – representatives from the experiments and from each country with an active Regional Centre taking part in the LCG Grid Service
  – forges the agreements, takes the decisions, defines the standards and policies that are needed to set up and manage the LCG Global Grid Services
  – coordinates the planning of resources for physics and computing data challenges

• First meeting 4 October in Milano

• First task is the detailed definition of LCG-1, the initial LCG Global Grid Service

Page 19:

Grid Deployment - The Strategy

Get a basic grid service into production so that we know what works, what doesn’t, what the priorities are

And evolve from there to the full LHC service

• Agree on a common set of middleware to be used for the first LCG grid service – LCG-1

• target:
  – full definition of LCG-1 by February 2003
  – LCG-1 in operation mid-2003
  – LCG-1 in full service by end of 2003

• this will be conservative – stability before functionality – and will not satisfy all of the HEPCAL requirements

• but must be sufficient for the data challenges scheduled in 2004

Page 20:

Centres taking part in LCG-1

around the world – around the clock

Page 21:

Centres taking part in LCG-1 – centres that have declared resources (Dec. 2002)

Tier 0
• CERN

Tier 1 Centres
• Brookhaven National Lab
• CNAF Bologna
• Fermilab
• FZK Karlsruhe
• IN2P3 Lyon
• Rutherford Appleton Lab (UK)
• University of Tokyo
• CERN

Other Centres
• Academica Sinica (Taipei)
• Barcelona
• Caltech
• GSI Darmstadt
• Italian Tier 2s (Torino, Milano, Legnaro)
• Manno (Switzerland)
• Moscow State University
• NIKHEF Amsterdam
• Ohio Supercomputing Centre
• Sweden (NorduGrid)
• Tata Institute (India)
• Triumf (Canada)
• UCSD
• UK Tier 2s
• University of Florida – Gainesville
• University of Prague
• ……

Page 22:

LCG-1 as a service for LHC experiments

• Mid-2003
  – 5-10 of the larger regional centres
  – available as one of the services used for simulation campaigns
• 2H03
  – add more capacity at operational regional centres
  – add more regional centres
  – activate operations centre, user support infrastructure
• Early 2004
  – principal service for physics data challenges

Grid Technology in LCG

LCG expects to obtain Grid Technology, along with maintenance and support, from projects funded by national and regional e-science initiatives – and, later, from industry

Page 23:

Grid Technology in LCG

Coordination by the project CTO – David Foster

This area of the project is concerned with:
• ensuring that the LCG requirements are known to current and potential Grid projects
• active lobbying for suitable solutions – influencing plans and priorities
• evaluating potential solutions
• negotiating support for tools developed by Grid projects
• developing a plan to supply solutions that do not emerge from other sources

BUT this must be done with caution –
important to avoid HEP-SPECIAL solutions
important to migrate to standards as they emerge
(avoid emotional attachment to prototypes)

Page 24:

Grid Technology Status

• A base set of requirements has been defined (HEPCAL, HEP common application layer):
  – 43 use cases
  – ~2/3 of which should be satisfied ~2003 by currently funded projects
• Good experience of working with Grid projects in Europe and the United States
• Practical results from testbeds used for physics simulation campaigns
• GLUE initiative – has shown how to integrate the EDG and VDT toolkits
• An initial agreement is being made on a joint toolkit for LCG-1

Page 25:

Grid Technology Status

• We are still solving basic reliability & functionality problems
  – This is worrying, as we still have a long way to go to get to a solid service
  – At end 2002, a solid service in mid-2003 looks (surprisingly) ambitious
• HEP needs to limit divergence in developments
  – Complexity adds cost
• We have not yet addressed system-level issues
  – How to manage and maintain the Grid as a system providing a high-quality, reliable service
  – Few tools in current developments address problem determination, error recovery, fault tolerance, etc.
• Some of the advanced functionality we will need is only being thought about now
  – Comprehensive data management, SLAs, reservation schemes, interactive use
• Many, many initiatives are underway and more are coming

How do we manage the complexity of all this?

Page 26:

Establishing Priorities

• We need to create a basic infrastructure that works well
  – LHC needs a systems architecture and high-quality middleware – reliable and fault tolerant
  – Tools for systems administration
  – Focus on mainline physics requirements and robust data handling
  – Simple end-user tools that deal with the complexity
• Need to look at the overall picture of what we are trying to do and focus resources on key priority developments

We must simplify and make the simple things work well. It is easy to expand scope, much harder to contract it!

Page 27:

Grid Technology – Next Steps

• leverage the considerable investments being made
  – proposals being prepared for the EU 6th Framework Programme, the NSF-DoE funding round, and various national science infrastructure funding opportunities
• priority target: hardening/re-engineering of current prototypes, with correctly funded maintenance and support
• but – expect several major architectural changes before things mature

Page 28:

Target for the end of the decade

LHC data analysis using

“global collaborative environments integrating large-scale, globally distributed computational systems and complex data collections linking tens of thousands of computers and hundreds of terabytes of storage”

The researchers concentrating on science, unaware of the details and complexity of the environment they are exploiting

Success will be when the scientist does not mention the Grid

Page 29:

A few things to keep in mind

A global grid infrastructure needs a coordinated management structure

Middleware for a global infrastructure
– International development programme
– World-wide support & maintenance
– Regional and national sensitivities

Avoid HEP specials
– Basic middleware for global science – not just for HEP
– Plan for convergence with industrial solutions

Collaborative, complementary development projects
– partnership of computer science, software engineering, scientists
– funding from multiple agencies – national, regional, ..

Page 30:

Grid Technology Summary

• many R&D projects funded
  – to develop and demonstrate middleware
  – limited duration – many already in mid-life
• excellent initial experience
  – shows the potential for science grids
  – has given a lot of insight
  – but – we are understanding that this is very hard to do
• consolidation of the results and coordination of future efforts is now needed to build a solution for LHC
• a priority now is to –
  – harden/re-implement the current prototypes and pilot products
  – understand support issues
  – add the essential missing features for a production environment that were not part of the R&D projects

Page 31:

Fabric Area
Area Manager – Bernd Panzer

• CERN Tier 0+1 centre
  – high-performance data recording
  – automated systems management & operation
  – integration in the LHC Grid
• Tier 1, 2 centre collaboration
  – develop/share experience on installing and operating a Grid
  – exchange information on planning and experience of large fabric management
  – look for areas for collaboration and cooperation
  – use HEPiX as the communications forum
• Technology tracking & costing
  – new technology assessment (PASTA III) just completed (Feb 03)
  – re-costing of Phase II will be done in 1H03 in light of:
    • PASTA III
    • re-assessment of experiment trigger rates, event sizes (LHCC)
    • but no significant re-assessment of the analysis model

Page 32:

Mass Storage Requirements

• Current mass storage requirements defined by ALICE for high-performance data recording:
  – 350 MB/sec in 2002
  – 750 MB/sec in 2005
  – 1.2 GB/sec in 2008

• Attempt to define requirements for mass storage support for analysis stalled –
  – analysis model not clear enough
  – worrying for Tier 1 centres
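For scale, the recording targets above convert into the following rough daily volumes, assuming (purely for illustration) that the quoted rate is sustained around the clock:

```python
# Back-of-the-envelope conversion of the ALICE recording targets to TB/day,
# assuming the quoted rate is sustained continuously (illustration only).
SECONDS_PER_DAY = 24 * 3600
targets_mb_per_s = {2002: 350, 2005: 750, 2008: 1200}   # MB/sec, from the slide

for year, rate in targets_mb_per_s.items():
    tb_per_day = rate * SECONDS_PER_DAY / 1e6            # 10^6 MB per TB
    print(f"{year}: {rate} MB/s  ->  ~{tb_per_day:.0f} TB per day of running")
# 2002: ~30 TB/day, 2005: ~65 TB/day, 2008: ~104 TB/day
```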

Page 33:

Resources in Regional Centres

• Estimates of resources in Regional Centres being gathered by Grid Deployment Board

• Expect to be complete this month

• Then we will compare with Data Challenge requirements

• Delivery efficiency is a key factor – hard to estimate at present

Page 34:

Resources at CERN

Page 35:

LCG Phase 1 - Externally-Funded Personnel Profile at CERN

(Bar chart: externally-funded personnel at CERN, 2001–2005, in FTE weighted by experience (vertical scale 0–70), broken down by funding source: EU, USA, CERNMat, Sweden, Israel, Hungary, Portugal, Switzerland, Spain, France, Germany, Italy, UK)

(Second chart: requested vs. committed FTE per year, 2002–2005, with the cumulative balance, on a scale of -60 to +80 FTE)

Page 36:

Computing Materials at CERN – Infrastructure + Physics

(Stacked cost profile, 2002–2010, in MCHF (vertical scale 0–60), comparing External Income + MTP with the Medium Term Plan. Categories shown: Engineering and Control Systems; Infrastructure (non-physics); Infrastructure (LHC experiments); CC preparation; Prototype Tier 0+1; Tier 0+1 installation, commissioning and operation; Production Computing (LEP/Fixed Target); Production Computing (LHC Experiments); Short-term staff for Phase 2; Staff for Tier 0/1 – 20 FTE)

Page 37:

Challenges - I

General background -
• Complexity of the project – Regional Centres, Grid projects, experiments, funding sources and funding motivation
• The project is operating in an environment where –
  – there is already a great deal of activity – applications software, data challenges, grid testbeds
  – requirements are changing as understanding and experience develop
• Fundamental technologies are evolving independently of the project and LHC

Page 38:

Challenges - II

Going well -
• Obtaining agreement on common requirements between the LHC experiments
• Integrating all of the players in implementation teams
  – CERN staff and visitors, experiments, other institutes
• Resources in Regional Centres
  – but we need to understand delivery efficiency

Going reasonably well -
• Influence on external projects to which LCG supplies resources – GEANT4, ROOT
• Influence on grid projects and evolution

Page 39:

Challenges - III

Still in question -
• Production quality service on a Grid – harder than it looks
  – Proceed with caution – realistic targets
  – Urgent to establish how well middleware works; get suppliers focused on support, stability
• Grids imply operation and management by the community – evolution from empires to a federation
• We are a long way from demonstrating that we can do effective ESD analysis on a Grid

