Page 1:

The LHC Computing Grid – February 2008

CERN’s use of gLite

Dr Markus Schulz

LCG Deployment Leader

24 April 2008

4th Black Forest Grid Workshop

Markus Schulz, CERN, IT Department

Page 2:

Outline

• CERN
• LHC – the computing challenge
  – Data rate, computing, community
• Grid projects @ CERN
  – WLCG, EGEE
• gLite middleware
  – Code base
  – Software life cycle
• Outlook and summary

Page 3:

The LHC Computing Challenge

• Signal/noise: 10^-9
• Data volume: high rate * large number of channels * 4 experiments → 15 PetaBytes of new data each year
• Compute power: event complexity * number of events * thousands of users → 200k of (today's) fastest CPUs
• Worldwide analysis & funding: computing funded locally in major regions & countries; efficient analysis everywhere → GRID technology

Page 4:

Flow to the CERN Computer Center

[Diagram: data flow into the CERN Computer Centre over several dedicated 10 Gbit links]

Page 5:

Flow out of the center

Page 6:

LHC Computing Grid project (LCG)

• Dedicated 10Gbit links between the T0 and each T1 center

Page 7:

LCG Service Hierarchy

Tier-0: the accelerator centre
• Data acquisition & initial processing
• Long-term data curation
• Distribution of data to Tier-1 centres

Tier-1 centres:
• Canada – TRIUMF (Vancouver)
• France – IN2P3 (Lyon)
• Germany – Forschungszentrum Karlsruhe
• Italy – CNAF (Bologna)
• Netherlands – NIKHEF/SARA (Amsterdam)
• Nordic countries – distributed Tier-1
• Spain – PIC (Barcelona)
• Taiwan – Academia Sinica (Taipei)
• UK – CLRC (Oxford)
• US – FermiLab (Illinois), Brookhaven (NY)

Tier-1: "online" to the data acquisition process, high availability
• Managed mass storage – grid-enabled data service
• Data-heavy analysis
• National, regional support

Tier-2: ~200 centres in ~35 countries
• Simulation
• End-user analysis – batch and interactive

Page 8:

LHC DATA ANALYSIS

HEP code key characteristics:
• Modest memory requirements (~2 GB/job)
• Performs well on PCs
• Independent events → trivial parallelism
• Large data collections (TB → PB)
• Shared by very large user collaborations

For all four experiments:
• ~15 PetaBytes per year
• ~200K processor cores
• > 6,000 scientists & engineers

Page 9:

LHC Computing Multi-science

• 1999 – MONARC project
  – First LHC computing architecture – hierarchical distributed model
• 2000 – growing interest in grid technology
  – HEP community main driver in launching the DataGrid project
• 2001-2004 – EU DataGrid project
  – Middleware & testbed for an operational grid
• 2002-2005 – LHC Computing Grid (LCG)
  – Deploying the results of DataGrid to provide a production facility for LHC experiments
• 2004-2006 – EU EGEE project phase 1
  – Starts from the LCG grid
  – Shared production infrastructure
  – Expanding to other communities and sciences
• 2006-2008 – EU EGEE project phase 2
  – Expanding to other communities and sciences
  – Scale and stability
  – Interoperations/interoperability
• 2008-2010 – EU EGEE project phase 3
  – More communities
  – Efficient operations
  – Less central coordination


Page 10:

WLCG Collaboration
• The Collaboration
  – 4 LHC experiments
  – ~250 computing centres
  – 12 large centres (Tier-0, Tier-1)
  – 38 federations of smaller "Tier-2" centres
  – Growing to ~40 countries
  – Grids: EGEE, OSG, NorduGrid
• Technical Design Reports
  – WLCG, 4 experiments: June 2005
• Memorandum of Understanding
  – Agreed in October 2005
• Resources
  – 5-year forward look
• Relies on EGEE and OSG – and other regional efforts like NDGF

Page 11:

The EGEE project

• EGEE
  – Started in April 2004, now in its second phase with 91 partners in 32 countries
  – 3rd phase (2008-2010) starts next month
• Objectives
  – Large-scale, production-quality grid infrastructure for e-Science
  – Attracting new resources and users from industry as well as science
  – Maintain and further improve the "gLite" Grid middleware

Page 12:

Registered Collaborating Projects

• Applications – improved services for academia, industry and the public
• Support Actions – key complementary functions
• Infrastructures – geographical or thematic coverage

25 projects have registered as of September 2007 (see web page).

Page 13:

Collaborating infrastructures

Page 14:

Virtual Organizations

[Charts: distribution of Virtual Organizations by number of VO members (1–1000) and by number of supporting sites (1–200), shown separately for CPUs and storage]

Total users: 5034 – Affected people: 10200 – Median members per VO: 18
Total VOs: 204 – Registered VOs: 116 – Median sites per VO: 3

Page 15:

Archeology, Astronomy, Astrophysics, Civil Protection, Comp. Chemistry, Earth Sciences, Finance, Fusion, Geophysics, High Energy Physics, Life Sciences, Multimedia, Material Sciences, …

>250 sites, 48 countries, >50,000 CPUs, >20 PetaBytes, >10,000 users, >200 VOs, >150,000 jobs/day

Page 16:

Sustainability
• Need to prepare for a permanent Grid infrastructure in Europe and the world
• Ensure a high quality of service for all user communities
• Independent of short project funding cycles
• Infrastructure managed in collaboration with National Grid Initiatives (NGIs)
• European Grid Initiative (EGI)
• Future of projects like OSG, NorduGrid, ...?

Page 17:

For more information:

Thank you for your kind attention!

www.cern.ch/lcg www.eu-egee.org

www.eu-egi.org/

www.gridcafe.org

www.opensciencegrid.org

Page 18:

Summary of Computing Resource Requirements – all experiments, 2008 (from LCG TDR, June 2005)

                       CERN   All Tier-1s   All Tier-2s   Total
CPU (MSPECint2000s)      25            56            61     142
Disk (PetaBytes)          7            31            19      57
Tape (PetaBytes)         18            35             –      53

LHC Computing Requirements
[Chart: LHC CPU capacity (MSI2K), 2001–2010, split between CERN, Tier-1 and Tier-2]

Page 19:

Grid Activity

• Ramp-up needed over the next 8 months: factor 5 (towards 170K jobs/day and 16000 KSpecInt years)
• WLCG ran ~44 M jobs in 2007 – the workload has continued to increase and is now at ~165k jobs/day

Page 20:

October 2007 – CPU Usage

• > 85% of CPU Usage is external to CERN

* NDGF usage for September 2007


Page 21:

Tier-2 Sites – October 2007

• 30 sites deliver 75% of the CPU
• 30 sites deliver 1%

Page 22:

2007 – CERN Tier-1 Data Distribution

Average data rate per day by experiment (Mbytes/sec)

1.5 Gbyte/s

Page 23:

Site Reliability

Tier-2 sites: 83 Tier-2 sites being monitored

Targets – CERN + Tier-1s:
                Before July   July 07   Dec 07   Avg. last 3 months
Each site              88%       91%       93%                  89%
8 best sites           88%       93%       95%                  93%

Page 24:

Grid Computing at CERN

• Core grid infrastructure services (~300 nodes)
  – CA, VOMS servers, monitoring hosts, information system, testbeds
• Grid catalogues
  – Using ORACLE clusters as backend DB
  – 20 instances
• Workload management nodes
  – 16 RBs, 15 WMS (different flavours, not all fully loaded)
  – 22 CEs (for headroom)
• Worker nodes (limited by power < 2.5 MW)
  – LSF-managed cluster
  – 16000 cores, currently adding 12000 cores (2 GB/core)
  – We use node disks only as scratch space for OS installation
• Extensive use of fabric management
  – Quattor for install and config, Lemon+Leaf for fabric monitoring

Page 25:

Grid Computing at CERN

• Storage (CASTOR-2)
  – Disk caches: 5 PByte (20k disks); mid 2008 an additional 12k disks (16 PB)
  – Linux boxes with RAID disks
  – Tape storage: 18 PB (~30k cartridges, 700 GB/cartridge)
  – We have to add 10 PB this year (the robots can be extended)
  – Why tapes?
    • Still 3 times lower system costs
    • Long-term stability is well understood
    • The gap is closing
• Networking
  – T0 -> T1: dedicated 10 Gbit links
  – CIXP Internet exchange point for links to T2
  – Internal: 10 Gbit infrastructure

Page 26:

www.glite.org

Page 27:

gLite Middleware Distribution
• Combines components from different providers
  – Condor and Globus (via VDT)
  – LCG
  – EDG/EGEE
  – Others
• After prototyping phases in 2004 and 2005, convergence with the LCG-2 distribution was reached in May 2006
  – gLite 3.0
  – gLite 3.1 (2007)
• Focus on providing a deployable MW distribution for the EGEE production service

[Diagram: the LCG-2 and gLite prototyping lines of 2004 and 2005 converge into the gLite 3.0 product in 2006]

Page 28:

gLite Services

gLite offers a range of services

Page 29:

Middleware structure
• Applications have access both to Higher-level Grid Services and to Foundation Grid Middleware
• Higher-level Grid Services are supposed to help users build their computing infrastructure, but should not be mandatory
• Foundation Grid Middleware will be deployed on the EGEE infrastructure
  – Must be complete and robust
  – Should allow interoperation with other major grid infrastructures
  – Should not assume the use of Higher-level Grid Services

Foundation Grid Middleware: security model and infrastructure; Computing (CE) and Storage Elements (SE); accounting; information and monitoring
Higher-level Grid Services: workload management, replica management, visualization, workflow, grid economies, ...
Applications sit on top of both layers.

Overview paper: http://doc.cern.ch//archive/electronic/egee/tr/egee-tr-2006-001.pdf

Page 30:

Workload Management (compact)

[Diagram: scale of the workload management chain – user desktops, a few central services (~50 nodes), 1–20 per site, 1–24000 per site]

Page 31:

Software Process
• Introduced new software lifecycle process
  – Based on the gLite process and LCG-2 experience
  – Components are updated independently
  – Updates are delivered on a weekly basis to the PPS
    • Each week either a gLite 3.1 or a gLite 3.0 update (if needed)
    • Move after 2 weeks to production
  – Acceptance criteria for new components
  – Clear link between component versions, Patches and Bugs
    • Semi-automatic release notes
  – Clear prioritization by stakeholders
    • TCG for medium-term (3-6 months) and EMT for short-term goals
  – Clear definition of roles and responsibilities
  – Documented in MSA3.2
  – In use since July 2006

Page 32:

Component based process

[Diagram: illustration of a component-based release process – components A, B and C move independently through build, integration and certification and are bundled into Updates 1–4 at regular release intervals]

Page 33:

gLite code base

Page 34:

gLite code details

Page 35:

gLite code details
[Chart: lines of code per component, labels ranging from 1K to 10K]

Page 36:

gLite code details
The list is not complete; some components are provided as binaries and are only packaged by the ETICS system.

Page 37:

Complex Dependencies

Page 38:

Data Management

Page 39:

Dependency Challenge

We spent significant resources on streamlining dependencies; the goal is improved portability of the code.

Page 40:

Code Stability
• Since mid 2006 we have processed more than 500 Patches (changes) to the gLite middleware
• Covering two release branches

[Chart: number of patches per month from 06/2006 to 01/2008 (0–50 per month), split between gLite 3.0 and gLite 3.1]

Page 41:

The Process is Monitored
• To spot problems and manage resources

Page 42:

Code Stability: Bugs
• Bugs: almost constant rate

Page 43:

Code Stability
• We are improving

Page 44:

Experience Gained
• Middleware is almost irrelevant
  – As long as some minimal functionality is provided
    • Authentication, authorization, accounting, robustness, scalability
    • Leave complex services to user communities
• Distributed operations is the challenge
  – Policies, security, monitoring, user support, ...
  – Upgrades take months...

[Chart: number of tickets processed by GGUS per month, Apr-06 to Mar-07, by category (Operations/CoD, Network/ENOC, User/Others, All) – about 1300 tickets/month]

Page 45:

Experience Gained
• We are victims of our own success
  – Moved prototypes into production very early
    • Complex software stack
  – Didn't understand the right split (lightweight WNs)
    • Hard to port to other platforms
  – Since we have users, we can only slowly migrate to new approaches (standards)
• Interoperability/interoperation
  – Is absolutely a key area
• All middleware that is sufficiently fit will survive
  – OSG 100%
  – Prototypes: ARC, Unicore, Naregi
• Standards have to follow practice
  – OGF only recently agreed to this concept

Page 46:

Summary
• The core of a grid production infrastructure for LHC computing is available
• In the next two years the focus will be on:
  – Scaling (jobs and data management)
    • Handle chaotic use cases (analysis)
    • Data management for analysis
  – Monitoring to improve:
    • Availability and reliability
    • Automate more (reduce operations cost)
  – Interoperation
• Lessons learned:
  – Operations is the hardest part of grid computing
  – A strict, monitored software lifecycle is essential
  – Keep the software as simple as possible
  – You can't monitor too much

Page 47:

GridMap Prototype Visualization
[Screenshot of the GridMap prototype: grid topology view (grouping), metric selection for the size and colour of rectangles, VO selection, overall site or site-service selection, SAM status and GridView availability data, drill-down into a region by clicking on its title, context-sensitive information, colour key and description of the current view]
Link: http://gridmap.cern.ch

Page 48:

www.glite.org – Middleware components

Page 49:

Standards
• EGEE needs to interoperate with other infrastructures:
  – To provide users with the ability to access resources available on collaborating infrastructures
• The best solution is to have common interfaces, through the development and adoption of standards
• The gLite reference forum for standardization activities is the Open Grid Forum (OGF)
  – Many contributions (e.g. OGSA-AUTH, BES, JSDL, new GLUE-WG, UR, RUS, SAGA, INFOD, NM, …)
• Problems:
  – Infrastructures are already in production
  – Standards are still in evolution and often underspecified
• OGF-GIN follows a pragmatic approach
  – Balance between application needs vs. technology push

Page 50:

Authentication
• gLite authentication is based on X.509 PKI
  – Certificate Authorities (CAs) issue (long-lived) certificates identifying individuals (much like a passport)
    • Commonly used in web browsers to authenticate to sites
  – Trust between CAs and sites is established (offline)
  – In order to reduce vulnerability, on the Grid user identification is done by using (short-lived) proxies of their certificates
• Support for Short-Lived Credential Services (SLCS)
  – Issue short-lived certificates or proxies to its local users, e.g. from Kerberos or from Shibboleth credentials (new in EGEE II)
• Proxies can
  – Be delegated to a service such that it can act on the user's behalf
  – Be stored in an external proxy store (MyProxy)
  – Be renewed (in case they are about to expire)
  – Include additional attributes
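A minimal command-line sketch of this proxy workflow, using the standard VOMS and MyProxy client tools; the VO name "myvo" and the MyProxy host are placeholders, not taken from the slides:

  # create a short-lived proxy carrying VOMS attributes for the VO "myvo"
  voms-proxy-init --voms myvo

  # inspect lifetime, identity and VOMS attributes (FQANs) of the proxy
  voms-proxy-info --all

  # optionally store a longer-lived credential in a MyProxy server, so that
  # services (e.g. the WMS) can renew the short proxy before it expires
  myproxy-init -s myproxy.example.org -d -n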

Page 51:

Authorization
• VOMS is now a de-facto standard
  – Attribute Certificates provide users with additional capabilities defined by the VO
  – Basis for the authorization process
• Authorization: currently via mapping to a local user on the resource
  – glexec changes the local identity (based on suexec from Apache)
• Designing an authorization service with a common interface agreed with multiple partners
  – Uniform implementation of authorization in gLite services
  – Easier interoperability with other infrastructures
  – Prototype being prepared now

Page 52:

Common AuthZ interface
[Diagram: a common SAML-XACML interface and library connect the site-central authorization services of EGEE (LCAS + LCMAPS with L&L plug-ins and GPBox) and OSG (GUMS + SAZ, Prima + gPlazma) to the services that call them – glexec for pilot jobs on worker nodes (both EGEE and OSG), edg-gk, edg-gridftp, the gt4 interface (pre-WS GT4 gatekeeper, gridftp, opensshd), dCache and CREAM; the SAML-XACML query returns obligations such as the local user/group mapping (e.g. "user001, somegrp") used to map the user to a pool account]

Page 53:

Information Schema
• gLite is using the GLUE schema (version 1.3)
  – Abstract modelling for Grid resources and mapping to concrete schemas that can be used in Grid Information Services
  – The definition of this schema started in April 2002 as a collaboration between the EU-DataTAG and US-iVDGL projects
• The GLUE schema is now an official activity of OGF
  – Starting points are the GLUE Schema 1.3, the NorduGrid schema and CIM (used by NAREGI)
  – Will produce the GLUE 2.0 specification
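In practice the GLUE 1.3 data is published through LDAP-based BDIIs (see the next slide), so a hedged sketch of a query could look like the following; the BDII hostname and VO are placeholders, while the attribute names are standard GLUE 1.3 ones:

  # list computing elements with their waiting jobs and free CPUs
  ldapsearch -x -H ldap://top-bdii.example.org:2170 -b "o=grid" \
      '(objectClass=GlueCE)' GlueCEUniqueID GlueCEStateWaitingJobs GlueCEStateFreeCPUs

  # roughly the same information, summarized per VO, via the lcg-infosites wrapper
  lcg-infosites --vo myvo ce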

Page 54:

Information System Architecture
[Diagram: information providers feed resource-level BDIIs, which are aggregated by site BDIIs and then by top-level BDIIs; the top-level BDIIs are replicated behind a DNS round-robin alias and filtered through FCR, and clients query them]

Page 55:

Performance Improvements
[Charts: information system response time versus number of parallel requests (18–90), comparing SLC3, SLC4 and quad-core SLC4 hosts; and, on a log scale, response time for 3–60 parallel requests comparing an indexed with a non-indexed database]

Page 56:

EGEE Data Management
[Diagram: the data management stack – user tools and VO frameworks sit on top of lcg_utils and FTS; below them GFAL covers cataloging ((RLS), LFC), storage (SRM, Classic SE) and data transfer (gridftp, RFIO) alongside vendor-specific APIs; the information system and environment variables tie the layers together]
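As an illustration of the lcg_utils layer, a hedged sketch of the typical copy-and-register workflow; the VO name, SE host and logical file names are placeholders:

  # copy a local file to a Storage Element and register it in the LFC under a logical name
  lcg-cr --vo myvo -d se.example.org -l lfn:/grid/myvo/user/higgs.root file:/tmp/higgs.root

  # list the replicas (SURLs) known for that logical file
  lcg-lr --vo myvo lfn:/grid/myvo/user/higgs.root

  # copy the grid file back to local disk
  lcg-cp --vo myvo lfn:/grid/myvo/user/higgs.root file:/tmp/higgs-copy.root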

Page 57:

Storage Element
• Storage Resource Manager (SRM)
  – Standard that hides the storage system implementation (disk or active tape)
  – Handles authorization
  – Web service based on gSOAP
  – Translates SURLs (Storage URLs) to TURLs (Transfer URLs)
  – Disk-based: DPM, dCache, StoRM, BeStMan; tape-based: CASTOR, dCache
  – SRM 2.2
    • Space tokens (manage space by VO/user), advanced authorization
    • Better handling of directories, lifetime, ...
• File I/O: POSIX-like access from local nodes or the grid via GFAL (Grid File Access Layer)
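A hedged sketch of the SURL-to-TURL translation from the command line; the SE host and file path are placeholders:

  # ask the SRM at the SE for a gsiftp transfer URL (TURL) for a given storage URL (SURL)
  lcg-gt srm://se.example.org/dpm/example.org/home/myvo/higgs.root gsiftp

The returned TURL is what GFAL (or a transfer tool) actually opens for POSIX-like or gridFTP access.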

Page 58:

SRM basic and use cases tests


Page 59:

Storage Element – DPM
[Diagram: a client talks to the DPM head node, which manages the namespace (/dpm/<domain>/home/<vo>, files owned by uid/gid) and a set of DPM disk servers; data transfers go directly to/from the disk servers]
• Direct data transfer from/to disk server (no bottleneck)
• External transfers via gridFTP (de-facto standard)
• Target: small to medium sites – one or more disk servers
• Disk Pool Manager (DPM) manages storage on the disk servers
• Uses LFC as local catalog – same features for role-based ACLs, etc.
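A hedged sketch of how the DPM namespace is typically browsed and managed with the DPM name-server client tools; the head-node host, domain and VO names are placeholders:

  # point the clients at the DPM head node of the site
  export DPNS_HOST=dpm-head.example.org

  # create a directory in the VO's part of the namespace and inspect it
  dpns-mkdir /dpm/example.org/home/myvo/user
  dpns-ls -l /dpm/example.org/home/myvo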

Page 60:

LCG File Catalog
• LFC maps LFNs to SURLs
  – Logical File Name (LFN): user file name
    • In the VO namespace, aliases supported
  – Globally Unique IDentifier (GUID)
    • Unique string assigned by the system to the file
  – Site URL (SURL): identifies a replica
    • A Storage Element and the logical name of the file inside it
• GSI security: ACLs (based on VOMS)
  – To each VOMS group/role corresponds a virtual group identifier
  – Support for secondary groups
• Web Service query interface: Data Location Interface (DLI)
• Hierarchical namespace
• Supports sessions and bulk operations
[Diagram: an LFC entry links a GUID and its ACL to one or more LFNs (aliases) and to the SURLs of the replicas; example commands: lfc-ls -l /grid/vo/, lfc-getacl /grid/vo/data]

Page 61:

Encrypted Data Storage
• Intended for VOs with very strong security requirements, e.g. the medical community
  – Anonymity (patient data is kept separate)
  – Fine-grained access control (only selected individuals)
  – Privacy (even the storage administrator cannot read the data)
• Interface to DICOM (Digital Imaging and COmmunications in Medicine)
• Hydra keystore
  – Stores keys for data encryption
  – N instances; decryption works with a subset of the stores
[Diagram: a DICOM trigger retrieves an image, stores the encrypted image & ACL on a DPM via gridftp/SRM I/O, the keys & ACL in Hydra and the patient data & ACL in AMGA; a client then does a patient look-up, retrieves the keys, gets a TURL and reads the data]

Page 62:

File Transfer Service
• FTS: reliable, scalable and customizable file transfer
  – Multi-VO service, used to balance usage of site resources according to the SLAs agreed between a site and the VOs it supports
  – WS interface, support for different user and administrative roles (VOMS)
  – Manages transfers through channels
    • Mono-directional network pipes between two sites
  – File transfers handled as jobs
    • Prioritization
    • Retries in case of failures
  – Automatic discovery of services
• Designed to scale up to the transfer needs of very data-intensive applications
  – Demonstrated about 1 GB/s sustained
  – Over 9 petabytes transferred in the last 6 months (> 10 million files)
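A hedged sketch of how a user or VO framework submits and tracks a transfer job with the FTS command-line clients; the FTS endpoint, SURLs and the job identifier are placeholders:

  # submit a transfer job between two SRM endpoints; prints the job identifier
  glite-transfer-submit -s https://fts.example.org:8443/glite-data-transfer-fts/services/FileTransfer \
      srm://source-se.example.org/dpm/example.org/home/myvo/higgs.root \
      srm://dest-se.example.org/dpm/example.org/home/myvo/higgs.root

  # poll the state of the job (Submitted, Active, Done, Failed, ...)
  glite-transfer-status -s https://fts.example.org:8443/glite-data-transfer-fts/services/FileTransfer <job-id>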

Page 63:

FTS server architecture
• All components are decoupled from each other
  – Each interacts only with the (Oracle) database
• Experiments interact via a web-service
• VO agents do VO-specific operations (1 per VO)
• Channel agents do channel-specific operations (e.g. the transfers)
• Monitoring and statistics can be collected via the DB

Page 64:

High Level Services: Catalogues
• File catalogs
  – LFC from LCG
    • Fully VOMS aware
    • Read-only replicas
    • MySQL and ORACLE
  – Hydra: stores keys for data encryption
    • Being interfaced to GFAL (done by December 2007)
    • Currently only one instance, but in future there will be 3 instances: at least 2 need to be available for decryption
    • Not yet certified in gLite 3.0; certification will start soon
  – AMGA Metadata Catalog: generic metadata catalogue
    • Joint JRA1-NA4 (ARDA) development; used mainly by Biomed
    • Released for Postgres and ORACLE

Page 65:

Job Management Services
[Diagram: the User Interface submits jobs to the Workload Management service, which queries the Information System and the File and Replica Catalogs, discovers services, updates credentials via the Authorization Service, and submits to a Computing Element at site X; the Computing Element and Storage Element publish their state to the Information System, the user retrieves the output, and Logging & Bookkeeping tracks the job throughout]

Page 66:

Workload Management System
• Workload Management System (WMS)
  – Assigns jobs to resources according to user requirements
    • Possibly including data location and user-defined ranking of resources
  – Handles I/O data (input and output sandboxes)
  – Supports compound jobs and workflows (Directed Acyclic Graphs)
    • One-shot submission of a group of jobs, shared input sandbox
  – Has a Web Service interface: WMProxy
    • UI→WMS decoupled from WMS→CE
  – Supports automatic re-submissions
• Logging & Bookkeeping
  – Tracks jobs while they are running
• Job Provenance
  – Stores and retains data on finished jobs
  – Provides data-mining capabilities
  – Allows job re-execution
  – Prototype!
[Chart annotations: 15000 jobs/day, 20000 jobs/day]
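A hedged sketch of what submission through WMProxy looks like from the User Interface; the JDL attributes are standard, while the file name and the ranking expression are illustrative choices, not taken from the slides:

  # hello.jdl – a minimal job description
  Executable    = "/bin/hostname";
  Arguments     = "-f";
  StdOutput     = "std.out";
  StdError      = "std.err";
  OutputSandbox = {"std.out", "std.err"};
  Requirements  = other.GlueCEStateStatus == "Production";
  Rank          = -other.GlueCEStateEstimatedResponseTime;

  # submit with automatic proxy delegation, then follow and retrieve the job
  glite-wms-job-submit -a hello.jdl
  glite-wms-job-status <job-id>
  glite-wms-job-output <job-id>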

Page 67:

Resource Access
• Currently the most critical element for interoperation
  – Many implementations
  – The standard is new and underspecified:
    • Basic Execution Service (BES) document published on 28/8/2007
• EGEE needs to support the applications running on the infrastructure
  – Support of the legacy Globus pre-WS service: LCG-CE
    • Now being certified on SL4
• In order to improve the interoperability with other infrastructures, a new WS-I Compute Element has been developed
  – CREAM offers direct access to the resource via WSDL and CLI
  – Supports JSDL
  – Collaboration with OMII-EU to implement a BES-compliant interface
    • See demos at SC06 (Tampa) and a future demo at SC07 (Reno)
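A hedged sketch of the direct CLI access that CREAM provides, bypassing the WMS; the CE endpoint, queue name and JDL file are placeholders:

  # submit a JDL job directly to a CREAM CE, with automatic proxy delegation
  glite-ce-job-submit -a -r cream-ce.example.org:8443/cream-lsf-grid hello.jdl

  # check the job and retrieve its output
  glite-ce-job-status <job-id>
  glite-ce-job-output <job-id>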

Page 68:

Compute Element(s)
• LCG-CE (GT4 pre-WS GRAM) – in production
  – Globus client, Condor-G
  – Globus code not modified
• Condor-C (was gLite CE)
  – Condor-G
  – Maintained by Condor
  – VDT includes BLAH & glexec
• CREAM (WS-I) – existing prototype
  – CREAM and BES client, ICE, Condor-G (prototype)
[Diagram: users and the gLite WMS reach the site either through a Globus client / Condor-G (to the LCG-CE with GT4 + add-ons, or to Condor-C with BLAH) or through ICE / a CREAM or BES client (to CREAM with CEMon); all CE flavours plug into the site batch system, the GIP, and the EGEE authorization, information system and accounting services]

Page 69:

CREAM status
• CREAM passed the EGEE acceptance tests
  – > 90000 jobs in 8 days by 50 simultaneous users
  – 111 failures (all LSF errors)
  – Load phase: ~1K jobs/hour; run phase: ~10K jobs/day; ~6K jobs on the CE at any time
• All missing features (needed to work on the EGEE infrastructure) are being implemented now
• Certification started April 2008

Page 70:

Coming: support for pilot jobs
• Several VOs submit pilot jobs with a single identity for all of the VO
  – The pilot job gets the user job when it arrives on the WN and executes it
  – Just-in-time scheduling; VO policies implemented at the central queue
• Use the same mechanism for changing the identity on the Computing Element also on the Worker Nodes (glexec)
  – The site may know the identity of the real user
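A heavily hedged sketch of the idea on the worker node; the proxy path and the glexec install location are placeholders, and only the environment variable name is taken from the glexec documentation rather than from the slides:

  # the pilot points glexec at the (delegated) proxy of the real payload user ...
  export GLEXEC_CLIENT_CERT=/tmp/payload-user-proxy.pem

  # ... and glexec, after site authorization, re-runs the payload under that user's local identity
  /opt/glite/sbin/glexec /path/to/payload-wrapper.sh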

Page 71:

Why Interoperability Matters
[Diagram: a large number of batch systems (PBS/Torque, LSF, Condor, LoadLeveler, Sun Grid Engine) sit behind a variety of CE interfaces (GRAM v2, GRAM v4, ARC, CREAM, NAREGI, Unicore) used by the different infrastructures (EGEE, OSG, NorduGrid, Naregi, DEISA, TeraGrid)]

Page 72:

Why?

• Required common interfaces
  – We now have multiple "common" interfaces
• Tried to solve one problem
  – But we created another
• Reasons:
  – The infrastructures were developed independently
  – Initially there were no standards
  – Standards take time to mature
    • We need to build the infrastructures now!
  – Good standards require experience
  – Experimentation with different approaches

Page 73:

Select Strategy

• Long-term solution
  – Common interfaces
  – Standards
• Medium-term solutions
  – Gateways
  – Adaptors and translators
• Short-term solutions
  – Parallel infrastructures
    • User driven
    • Site driven

Page 74:

Interoperability models
[Diagram: four interoperability models – adaptors and translators (API plugins), user-driven parallel infrastructures, site-driven parallel infrastructures, and gateways]

Page 75:

Example Interoperations Activity
• November 2004
  – Initial meeting with OSG to discuss interoperation
  – Use common schema, GLUE v1.2
• January 2005
  – Proof of concept was demonstrated
  – Modifications to the software releases – interoperability achieved
• August 2005
  – Month of focussed activity on operations issues
  – First OSG site available
• November 2005
  – First user jobs from GEANT4 arrived on OSG sites
• March 2006
  – Operations progress: information system bootstrapping, trouble tickets, operations VO, ...
• Summer 2006
  – CMS successfully taking advantage of interoperations – without being aware of it!
• Summer 2007
  – Joining software certification testbeds, to ensure interoperability is maintained

Page 76:

Grid Interoperability Now

• Building upon the many bi-lateral activities
• Started at GGF-16 (now OGF) in February 2006
• Demonstrate what we can for SC 2006
  – Applications, security, job management
  – Information systems, data management

Page 77:

GIN Information System
[Diagram: a generic information provider plus per-grid providers for EGEE, OSG, NDGF, Naregi, TeraGrid and PRAGMA feed a GIN BDII (with an ARC BDII alongside); translators map each grid's native information into the GLUE schema]

Page 78:

The EGEE Infrastructure
• Test-beds & services: production service, pre-production service, certification test-beds (SA3), training infrastructure (NA4), training activities (NA3)
• Support structures & processes: Operations Coordination Centre, Regional Operations Centres, Global Grid User Support, EGEE Network Operations Centre (SA2), Operational Security Coordination Team, Operations Advisory Group (+NA4)
• Security & policy groups: Joint Security Policy Group, EuGridPMA (& IGTF), Grid Security Vulnerability Group

Page 79:

User Support
• GGUS – now well established
  – Used more and more
  – 10 of 11 ROCs provide dedicated effort to manage the process – similar to operator-on-duty teams
  – Development plan (DSA1.1) and assessment of progress (MSA1.8) delivered
[Chart: number of tickets processed by GGUS per month, Apr-06 to Mar-07, by category (Operations/CoD, Network/ENOC, User/Others, All)]

Page 80:

Grid Operations

• Grid Operator on Duty ("CoD")
  – Teams from 10 of 11 ROCs participate
  – 5-weekly rotations: each week 1 team primary and 1 team backup
  – Critical activity in maintaining usability and stability of sites
  – Important tools
    • Site Availability Tests (SAM)
    • Information system monitoring
    • GGUS system for trouble ticket management
  – Portal for operations: https://cic.gridops.org
• Significant work on operations procedures
  – Evolved throughout EGEE and EGEE-II
  – Contribute to the establishment of regional grid infrastructures through related projects – well beyond Europe now

Page 81:

Accounting

• Accounting system set up by UK/I – now well established
  – All sites reporting into it
  – Now starting to deploy a version that reports by user
  – User DN is encrypted for privacy
  – Policy (in draft) that defines who can access what information and for what purpose
• Storage accounting – prototype available now
  – Schema has been defined
  – Uses the information system to publish available and used storage space data, for different classes of storage
  – Sensor queries the BDII and stores into R-GMA and the APEL system
  – Portal to query the data is based on the CPU accounting portal

Page 82:

Security & Policy
Collaborative policy development – many policy aspects are collaborative works, e.g.:
• Joint Security Policy Group
• Certification Authorities
  – EUGridPMA, IGTF, etc.
• Grid Acceptable Use Policy (AUP)
  – Common, general and simple AUP for all VO members using many Grid infrastructures
    • EGEE, OSG, SEE-GRID, DEISA, national Grids, ...
• Incident Handling and Response
  – Defines basic communication paths
  – Defines requirements (MUSTs) for IR
  – Not intended to replace or interfere with local response plans
[Diagram: the Security & Availability Policy ties together Usage Rules, Certification Authorities, Audit Requirements, Incident Response, User Registration & VO Management, the Application Development & Network Admin Guide, and VO Security]

Page 83:

Security groups
• Joint Security Policy Group
  – Joint with WLCG, OSG, and others
  – Focus on policy issues
  – Strong input to e-IRG
• EUGridPMA
  – Pan-European trust federation of CAs
  – Included in IGTF (and was the model for it)
  – Success: most grid projects now subscribe to the IGTF
• Grid Security Vulnerability Group
  – New group in EGEE-II
  – Looking at how to manage vulnerabilities
  – Risk analysis is fundamental
  – Hard to balance between openness and giving away insider information
• Operational Security Coordination Team
  – Main day-to-day operational security work
  – Incident response and follow-up
  – Members in all ROCs and sites
  – Recent security incident (not grid-related) was a good shakedown
[Diagram: the regional Grid PMAs – EUGridPMA (Europe), TAGPMA (the Americas) and APGridPMA (Asia-Pacific)]

Page 84:

Grid Monitoring
• Becoming a critical activity to achieve reliability and stability
• Areas: system management (fabric management, best practices, security, ...), grid services (grid sensors, transport, repositories, views, ...), system analysis (application monitoring, ...)
• "... to help improve the reliability of the grid infrastructure ..."
• "... provide stakeholders with views of the infrastructure allowing them to understand the current and historical status of the service ..."
• "... to gain understanding of application failures in the grid environment and to provide an application view of the state of the infrastructure ..."
• "... improving system management practices"
  – Provide site manager input to requirements on grid monitoring and management tools
  – Propose existing tools to the grid monitoring working group
  – Produce a Grid Site Fabric Management cook-book
  – Identify training needs

Page 85:

Prototype site implementation

[Screenshot: list of service checks for the prototype site monitoring]

Page 86:

GridMap Prototype Visualization
[Screenshot of the GridMap prototype: grid topology view (grouping), metric selection for the size and colour of rectangles, VO selection, overall site or site-service selection, SAM status and GridView availability data, drill-down into a region by clicking on its title, context-sensitive information, colour key and description of the current view]
Link: http://gridmap.cern.ch

Page 87:

Availability metrics - GridView

