Page 1: The LHCb Computing TDR

The LHCb Computing TDR

Domenico Galli, Bologna

INFN CSN1

Napoli, 22.9.2005

Page 2: The LHCb Computing TDR


Outline

- LHCb software;
- Distributed Computing;
- Computing Model;
- LHCb & LCG;
- Milestones;
- LHCb request for 2006.

Page 3: The LHCb Computing TDR


LHCb Software Framework

- LHCb software has been developed inside a general Object Oriented framework (Gaudi), designed to provide a common infrastructure and environment for the different software applications of the experiment.
- Use of the framework discipline in all applications helps to ensure the integrity of the overall software design and results in maximum reuse of the core software components.
- Gaudi is an architecture-centric, requirements-driven framework:
  - adopted by ATLAS; used by GLAST & HARP;
  - the same framework is used both online & offline.

Page 4: The LHCb Computing TDR


Object Diagram of the Software Framework

Page 5: The LHCb Computing TDR


Gaudi Design Choices

- Decoupling between the objects describing the data and the algorithms.
- Distinction between a transient and a persistent representation of the data objects.
- Data flow between algorithms proceeds via the so-called Transient Store.
- Same classes for real and MC data. Clear separation between reconstructed data and the corresponding Monte Carlo Truth data (connection through smart references).
- Interfaces (pure abstract classes in C++) developed independently of their actual implementation.
- Run-time loading of components (dynamic libraries); a minimal sketch of these last two choices follows.
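
The last two choices can be illustrated with a minimal sketch in plain C++ (illustrative code, not the actual Gaudi API; all names here are hypothetical): components implement a pure abstract interface, and concrete implementations are resolved by name at run time through a factory registry standing in for dynamic-library loading.

    // Minimal sketch (plain C++, not the real Gaudi API) of interfaces as
    // pure abstract classes plus run-time lookup of components by name.
    #include <functional>
    #include <iostream>
    #include <map>
    #include <memory>
    #include <string>

    // Interface: a pure abstract class, independent of any implementation.
    struct IAlgorithm {
        virtual ~IAlgorithm() = default;
        virtual void execute() = 0;
    };

    // Registry standing in for run-time loading from dynamic libraries:
    // clients know only the component's name, never its concrete type.
    std::map<std::string, std::function<std::unique_ptr<IAlgorithm>()>>& factories() {
        static std::map<std::string, std::function<std::unique_ptr<IAlgorithm>()>> f;
        return f;
    }

    struct TrackFit : IAlgorithm {               // one concrete implementation
        void execute() override { std::cout << "fitting tracks\n"; }
    };

    int main() {
        factories()["TrackFit"] = [] { return std::make_unique<TrackFit>(); };
        auto alg = factories()["TrackFit"]();    // resolved by name at run time
        alg->execute();
    }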

Page 6: The LHCb Computing TDR


Decoupling between Data and Algorithms

- OO modeling should mimic the real world.
- The tasks of event simulation, reconstruction and analysis consist of the manipulation by algorithms of mathematical or physical quantities such as points, vectors, matrices, hits, momenta, etc.
- This kind of task maps naturally onto a procedural language such as Fortran, which makes a clear distinction between data and code. A priori, there is no reason why using an object-oriented language such as C++ should change the way of doing physics analysis.
- The decoupling allows programmers to concentrate separately on data and on algorithms, and gives longer stability to the data objects, as algorithms evolve much more rapidly.
- Data objects (the LHCb Event Model): provide manipulation of internal data members; they only contain enough basic internal functionality to give algorithms access to their content and derived information.
- Algorithms and tools: perform the actual data transformations; they process data objects of some type and produce new data objects of a different type.

[Diagram: an Algorithm object transforms a DataObject into a NewDataObject.]
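
A minimal sketch of this split in plain C++ (illustrative types, not the LHCb Event Model): the data objects carry only their content, while the algorithm owns the transformation from one type to another.

    // Sketch (plain C++, hypothetical names) of the data/algorithm split:
    // data objects expose only their content; algorithms transform them.
    #include <iostream>
    #include <vector>

    struct Hit   { double x, y, z; };                      // event-model data: no physics logic
    struct Track { std::vector<Hit> hits; double chi2; };  // derived data object

    // Algorithm: consumes data objects of one type, produces another type.
    struct TrackBuilder {
        std::vector<Track> operator()(const std::vector<Hit>& hits) const {
            // trivial stand-in for pattern recognition: one track from all hits
            return { Track{hits, /*chi2=*/0.0} };
        }
    };

    int main() {
        std::vector<Hit> hits{{0, 0, 0}, {1, 1, 1}};
        auto tracks = TrackBuilder{}(hits);
        std::cout << tracks.size() << " track(s) built\n";
    }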

Page 7: The LHCb Computing TDR


Transient and Persistent Data

- Gaudi makes a clear distinction between a transient and a persistent representation of the data objects, for all categories of data.
- Algorithms see only data objects in the transient representation:
  - algorithms are shielded from the technology chosen to store the persistent data objects;
  - we have changed from ZEBRA to ROOT/IO to LCG POOL without the physics code encapsulated in the algorithms being affected.
- The two representations can be optimized following different criteria (e.g. execution vs. I/O performance).
- Different technologies can be accessed (e.g. for the different data types).
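
The shielding can be sketched in plain C++ (illustrative names, not the Gaudi converter machinery): the algorithm touches only the transient object, and the persistency technology hides behind an interface that can be swapped without touching physics code.

    // Sketch (plain C++, hypothetical names) of shielding algorithms from
    // the persistency technology behind a technology-neutral interface.
    #include <iostream>
    #include <memory>
    #include <sstream>
    #include <string>

    struct Event { int run, number; };          // transient representation only

    struct IPersistencySvc {                    // technology-neutral interface
        virtual ~IPersistencySvc() = default;
        virtual std::string write(const Event&) = 0;
    };

    // Swapping this class (cf. ZEBRA -> ROOT/IO -> POOL in the real history)
    // leaves the algorithm code untouched.
    struct TextPersistencySvc : IPersistencySvc {
        std::string write(const Event& e) override {
            std::ostringstream os;
            os << e.run << '/' << e.number;
            return os.str();
        }
    };

    int main() {
        std::unique_ptr<IPersistencySvc> svc = std::make_unique<TextPersistencySvc>();
        std::cout << "persisted: " << svc->write(Event{123, 42}) << '\n';
    }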

Page 8: The LHCb Computing TDR


The Data Flow between the Algorithms

- The data flow between the algorithms proceeds via the Transient Event Store (TES). Algorithms retrieve their input data from the TES and publish their output data to the TES.
- 3 categories of data with different lifetimes:
  - event data (valid for the time it takes to process one event);
  - detector data (valid as long as detector conditions don't change);
  - statistical data (lifetime corresponding to a complete job).
- The transient store is organized in a tree-like structure; logically related data items are grouped in containers.
- Algorithms may not modify data already on the TES, and may not add new objects to existing containers. A given container can only be manipulated by the algorithm that publishes it on the TES.
- This ensures that subsequent algorithms interested in this data can be executed in any order (see the sketch below).

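
A minimal sketch of the store contract in plain C++ (illustrative, not the Gaudi TES API): a tree of containers addressed by path, where a container can be registered exactly once and is read-only afterwards.

    // Sketch (plain C++, not the Gaudi TES API) of the transient-store
    // contract: path-addressed containers, write-once, read-only after.
    #include <any>
    #include <iostream>
    #include <map>
    #include <stdexcept>
    #include <string>
    #include <vector>

    class TransientStore {
        std::map<std::string, std::any> store_;   // path -> container
    public:
        // publish: only the producing algorithm may register a container, once
        void put(const std::string& path, std::any data) {
            if (!store_.emplace(path, std::move(data)).second)
                throw std::runtime_error("container already published: " + path);
        }
        // retrieve: const access, so consumers cannot modify published data
        template <class T>
        const T& get(const std::string& path) const {
            return std::any_cast<const T&>(store_.at(path));
        }
    };

    int main() {
        TransientStore tes;
        tes.put("/Event/Rec/Tracks", std::vector<int>{1, 2, 3});        // producer
        const auto& tracks = tes.get<std::vector<int>>("/Event/Rec/Tracks");
        std::cout << tracks.size() << " tracks read back\n";            // consumers run in any order
    }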

Page 9: The LHCb Computing TDR


Smart References

- Clear separation between reconstructed data and the corresponding Monte Carlo Truth data.
- No references in Digits allow transparent navigation to the corresponding MC Digits. This allows using exactly the same classes for reconstructed real data and reconstructed simulated data.
- The relationship to Monte Carlo is preserved by the fact that the MC Digits and the Digits use the unique electronics channel identifier as a key (see the sketch below).
- Smart references implement the relationships between objects in different containers, from the class further along the processing sequence towards the class earlier in the sequence.
- Linkers and Relations implement relationships between objects distant in the processing chain.
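
The key-based MC link can be sketched in plain C++ (illustrative structures): Digits and MCDigits live in separate containers but share the unique electronics channel identifier, so MC truth is reachable without any reference stored inside the reconstructed Digit itself.

    // Sketch (plain C++, illustrative) of navigating from a Digit to its
    // MCDigit via the shared electronics channel identifier used as a key.
    #include <iostream>
    #include <map>

    struct Digit   { unsigned channelID; double adc; };
    struct MCDigit { unsigned channelID; int mcParticle; };

    int main() {
        std::map<unsigned, MCDigit> mcDigits{{42u, MCDigit{42u, 7}}};
        Digit d{42u, 113.5};                  // same Digit class for real and MC data

        auto it = mcDigits.find(d.channelID); // navigate via the shared key
        if (it != mcDigits.end())
            std::cout << "digit on channel " << d.channelID
                      << " came from MC particle " << it->second.mcParticle << '\n';
    }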

Page 10: The LHCb Computing TDR


LHCb Data Processing Applications and Data Flow

Page 11: The LHCb Computing TDR


LHCb Data Processing Applications and Data Flow (II)

- Each application is a producer and/or consumer of data for the other applications.
- The applications are all based on the Gaudi framework: they communicate via the LHCb Event Model and make use of the unique LHCb Detector Description. This ensures consistency between the applications and allows algorithms to migrate from one application to another as necessary.
- The subdivision between the different applications has been driven by:
  - different scopes (simulation and reconstruction);
  - convenience (simulation and digitization);
  - CPU consumption and repetitiveness of the tasks performed (reconstruction and analysis).

Page 12: The LHCb Computing TDR


Event Sizes & Processing Requirements

                                    Aim   Current
    Event size [kB]
      RAW                            25        35
      rDST                           25         8
      DST                            75        58
    Event processing [kSI2k·s/evt]
      Reconstruction                2.4       2.7
      Stripping                     0.2       0.6
      Analysis                      0.3        ??
      Simulation (bb-incl)           50        50

Page 13: The LHCb Computing TDR


Conditions DB

[Figure: versions of four conditions (VELO alignment, HCAL calibration, RICH pressure, ECAL temperature) as a function of time, with validity boundaries t1 ... t11. Production version at time T: VELO: v3 for T<t3, v2 for t3<T<t5, v3 for t5<T<t9, v1 for T>t9; HCAL: v1 for T<t2, v2 for t2<T<t8, v1 for T>t8; RICH: v1 everywhere; ECAL: v1 everywhere.]

- Tools and framework to deal with the conditions DB and with a non-perfect detector geometry are in place (an interval-of-validity lookup is sketched below).
- The LCG COOL project is providing the underlying infrastructure for the conditions DB.
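
The lookup in the figure can be sketched in plain C++ (illustrative, not the COOL API): each condition keeps its versions keyed by the start of their validity, and the production version at time T is the one whose interval contains T.

    // Sketch (plain C++, illustrative) of an interval-of-validity lookup:
    // the version valid at time T is the last one starting at or before T.
    #include <iostream>
    #include <iterator>
    #include <map>
    #include <string>

    // start-time -> version; upper_bound(T) gives the first entry after T,
    // so the preceding entry is the version valid at T.
    using ConditionHistory = std::map<double, std::string>;

    std::string versionAt(const ConditionHistory& h, double t) {
        auto it = h.upper_bound(t);
        return it == h.begin() ? "none" : std::prev(it)->second;
    }

    int main() {
        // cf. the VELO alignment line of the figure (boundaries t3, t5, t9)
        ConditionHistory velo{{0, "v3"}, {3, "v2"}, {5, "v3"}, {9, "v1"}};
        std::cout << "VELO alignment at T=4: " << versionAt(velo, 4) << '\n'; // v2
    }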

Page 14: The LHCb Computing TDR


Distributed Computing

- LCG (LHC Computing Grid): a set of baseline services for Workload Management (job submission and follow-up) and Data Management (storage, file transfer, etc.).
- DIRAC (Workload Management tool) & GANGA (Distributed Analysis tool): higher-level services which are experiment-dependent.
- DIRAC has been conceived as a lightweight system with the following requirements:
  - be able to accommodate evolving grid opportunities;
  - be easy to deploy on various platforms: other resources provided by sites not participating in the LCG; a large number of desktop workstations;
  - present all the heterogeneous resources as a single pool to the user.
- A single central Task Queue is foreseen both for production and user analysis jobs.

Page 15: The LHCb Computing TDR


DIRAC Architecture

- Resources: represent Grid Computing and Storage Elements; they provide access to their capacity and status information.
- Services: provide access to the various functionalities of the DIRAC system in a well-controlled way.
- Agents: lightweight software components running close to the computing and storage resources; they allow the services to carry out their tasks in a distributed computing environment.

Page 16: The LHCb Computing TDR


DIRAC Interface to LCG

There are several ways to interface DIRAC to LCG:
- sending jobs directly to the LCG Computing Element (used in DC 03);
- interfacing DIRAC to the LCG Resource Broker (not yet reliable enough in DC 04);
- using Pilot Agents (successfully exercised in DC 04).

Page 17: The LHCb Computing TDR


DIRAC Pilot Agent

- The jobs that are sent to the LCG-2 Resource Broker (RB) do not contain any particular LHCb job as payload; they only execute a simple script, which downloads and installs a standard DIRAC agent.
- Since the only environment necessary for the agent to run is the Python interpreter, this is possible on all the LCG sites.
- This pilot agent is configured to use the hosting Worker Node (WN) as a DIRAC CE. Once this is done, the WN is reserved for the DIRAC WMS and is effectively turned into a virtual DIRAC production site for the time of the reservation.
- The pilot agent can verify the resources available on the WN (local disk space, CPU time limit, etc.) and request from the DIRAC Job Management Service only jobs matching these resources (see the sketch below).
- The reservation jobs are sent whenever there are waiting jobs in the DIRAC Task Queue eligible to run on LCG.
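
The pull model can be sketched as follows (plain C++ with hypothetical structures; the real DIRAC agent is a Python component that talks to the Job Management Service over the network): the pilot first measures what the worker node offers, then asks the central queue only for jobs that fit, instead of having a job pushed to it blindly.

    // Sketch (plain C++, hypothetical structures) of the pilot-agent pull
    // model: measure the worker node, then pull only jobs that fit.
    #include <iostream>
    #include <optional>
    #include <vector>

    struct Resources { long diskMB; long cpuSeconds; };
    struct Job       { int id; Resources needs; };

    // stand-in for the matching step of the central Job Management Service
    std::optional<Job> requestJob(const std::vector<Job>& queue, const Resources& wn) {
        for (const auto& j : queue)
            if (j.needs.diskMB <= wn.diskMB && j.needs.cpuSeconds <= wn.cpuSeconds)
                return j;                  // first job fitting the reserved WN
        return std::nullopt;               // nothing eligible: release the WN
    }

    int main() {
        Resources wn{2000, 36000};         // measured on the worker node
        std::vector<Job> queue{{1, {5000, 90000}}, {2, {1000, 20000}}};
        if (auto j = requestJob(queue, wn))
            std::cout << "pulled job " << j->id << '\n';   // job 2 fits
    }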

Page 18: The LHCb Computing TDR


Porting Pilot-Agent Technology to EGEE

- Work is going on in INFN-Grid to implement the pilot-agent technology in the EGEE middleware.
- To be addressed:
  - security issues in the agent-to-Job-Management-Service communication;
  - accounting issues.

Page 19: The LHCb Computing TDR


GANGA - User Interface to the Grid

- Goal: simplify the management of analysis for end-user physicists by developing a tool for accessing Grid services with built-in knowledge of how Gaudi works.
- Required user functionality:
  - job preparation and configuration;
  - job submission, monitoring and control;
  - resource browsing, booking, etc.
- Done in collaboration with ATLAS.
- Uses Grid middleware services: interfaces to the Grid via DIRAC, creating synergy between the two projects.

[Diagram: GANGA sits between the user's GAUDI program and the collective & resource Grid services, behind a GUI; job options and algorithms go in, histograms, monitoring information and results come out.]

Page 20: The LHCb Computing TDR


Computing Model

Page 21: The LHCb Computing TDR


The LHCb Dataflow

[Diagram: the LHCb dataflow. RAW data from the on-line farm (and RAW MC data from MC production at Tier-2s) go to CERN and the Tier-1s for reconstruction, producing rDST; pre-selection analysis at CERN and the Tier-1s produces DST+RAW and TAG; physics analysis at CERN and the Tier-1s and local analysis at Tier-3s consume selected DST+RAW, TAG, user DST, user TAG and n-tuples, leading to the paper; calibration data feed back into reconstruction. Reconstruction and pre-selection run as scheduled jobs, analysis as chaotic jobs.]

Page 22: The LHCb Computing TDR


LHCb rDST: a Trick to Save Resources

- The rDST is an intermediate format (the final format is the DST).
- The rDST contains the information needed in the next analysis step.
- Missing quantities must be re-calculated at the next analysis step: more CPU resources, less disk resources.
- This is convenient, since the additional CPU resources needed to re-calculate these quantities are cheaper than the disk needed to store them.
- The quantities to be written on the rDST are chosen in order to optimize costs.
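
The trade-off can be stated as a back-of-envelope criterion (an illustrative formula, not taken from the TDR): for a quantity of per-event size \(\Delta s\) that takes \(\Delta t\) of CPU time to re-derive, it is dropped from the rDST when

\[
N_{\mathrm{evt}}\,\Delta s\,c_{\mathrm{disk}} \;>\; N_{\mathrm{evt}}\,\Delta t\,c_{\mathrm{CPU}},
\]

with \(N_{\mathrm{evt}}\) the number of events stored and \(c_{\mathrm{disk}}\), \(c_{\mathrm{CPU}}\) the unit prices (cf. the price rows of the Tier-2 cost table on page 36).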

Page 23: The LHCb Computing TDR


Streaming

[Diagram: HLT output streaming to the CERN computing centre. The HLT writes 2 kHz of RAW data (25 kB/evt; 60 MB/s; 2×10^10 evt/a; 500 TB/a; 2 streams; 1 a = 10^7 s over a 7-month period), split into four physics streams: b-exclusive (200 Hz), di-muon (600 Hz), D* (300 Hz) and b-inclusive (900 Hz). Pre-selection analysis (0.2 kSi2k·s/evt) produces rDST (25 kB/evt) and TAG; the stored formats per stream are DST+RAW (100 kB/evt) for b-exclusive and b-inclusive, and rDST+RAW (50 kB/evt) for di-muon and D*.]
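
As a consistency check of the quoted yearly figures (using the stated 1 a = 10^7 s):

\[
2\,\mathrm{kHz} \times 10^{7}\,\mathrm{s/a} = 2\times10^{10}\,\mathrm{evt/a},
\qquad
2\times10^{10}\,\mathrm{evt/a} \times 25\,\mathrm{kB/evt} = 500\,\mathrm{TB/a}.
\]

(The quoted 60 MB/s sits above the nominal 2 kHz × 25 kB/evt = 50 MB/s, presumably as headroom.)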

Page 24: The LHCb Computing TDR


Computing Model - Resource Summary

CPU power [MSi2k] (equivalent number of 2.4 GHz PIV boxes)

                     2006          2007          2008           2009           2010
    CERN             0.27   (312)  0.54   (624)  0.90   (1040)  1.25   (1445)  1.88   (2173)
    Tier-1s (6)      1.33  (1537)  2.65  (3063)  4.42   (5109)  5.55   (6416)  8.35   (9653)
    Tier-2s (14)     2.29  (2647)  4.59  (5306)  7.65   (8843)  7.65   (8843)  7.65   (8843)
    Total            3.89  (4497)  7.78  (8994)  12.97 (14994)  14.45 (16705)  17.87 (20670)

1 2.4 GHz PIV = 865 Si2k
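
The PIV-equivalent counts are simply the MSi2k figures divided by the conversion above, e.g. for CERN in 2006:

\[
0.27\,\mathrm{MSi2k} \,/\, 865\,\mathrm{Si2k\ per\ box} \approx 312\ \text{boxes}.
\]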

Page 25: The LHCb Computing TDR


Computing Model - Resource Profiles

[Plots: "CERN CPU" and "Tier-1 CPU" resource profiles, MSI2k vs. date from January 2008 to November 2010, comparing LHCb, CMS, ATLAS and ALICE.]

Page 26: The LHCb Computing TDR


Computing Model - Resource Summary (II)

2006 2007 2008 2009 2010

Disk [TiB]

CERN 248 496 826 1095 1363

Tier-1s 730 1459 2432 2897 3363

Tier-2s 7 14 23 23 23

Total 984 1969 3281 4015 4749

MSS [TiB]

CERN 408 825 1359 2857 4566

Tier-1s 622 1244 2074 4285 7066

Total 1030 2069 3433 7144 11632

Page 27: The LHCb Computing TDR


LHCb & LCG

- DC04 (May-August 2004):
  - 187 Mevts simulated and reconstructed;
  - 61 TiB of data produced;
  - 43 LCG sites used;
  - 50% using LCG resources (61% efficiency pure LCG, 76% with pilot).
- DC04v2 (December 2004):
  - 100 Mevts simulated and reconstructed.
- DC04 stripping:
  - helped in debugging CASTOR-SRM functionality;
  - CASTOR-SRM now functional (at CERN, CNAF, PIC).
- RTTC production (May 2005):
  - 200 Mevts simulated (minimum bias) in 3 weeks (up to 5500 jobs simultaneously).

Page 28: The LHCb Computing TDR


LHCb & LCG: Large-Scale Production in 2005 on the Grid

- The RTTC production lasted just 20 days.
- The startup was very fast: in a few days almost all available sites were in production.
- The system was able to run with 4000 CPUs over 3 weeks, with a peak of over 5500 CPUs.
- 168 M events produced (11 M events as final output after the L0 trigger cut).

Page 29: The LHCb Computing TDR


RTTC-2005 Production Share

    Country                          Events produced
    UK                               60 M
    Italy                            42 M
    Switzerland                      23 M
    France                           11 M
    Netherlands                      10 M
    Spain                             8 M
    Russia                            3 M
    Greece                            2.5 M
    Canada                            2 M
    Germany                           0.3 M
    Belgium                           0.2 M
    Sweden                            0.2 M
    Romania, Hungary, Brazil, USA     0.8 M

5% produced with plain DIRAC sites; 95% produced with LCG sites.

Page 30: The LHCb Computing TDR


CNAF Tier-1 Share (May-August): Total CPU Time

http://tier1.cnaf.infn.it/monitor/LSF/plots/acct/

Page 31: The LHCb Computing TDR


CPU Exploited by LHCb at the CNAF Tier-1 During the Year 2005

- From the CNAF LSF monitor: http://tier1.cnaf.infn.it/monitor/LSF/plots/acct/ (no data available before May 2005):
  - May 2005: 222 kSi2k;
  - Jun 2005: 110 kSi2k;
  - Jul 2005: 76 kSi2k;
  - Aug 2005: 310 kSi2k.
- Average CPU power exploited by LHCb in 120 days: 180 kSi2k = 150 cpu2005 (1 cpu2005 = 3.2 GHz Xeon = 1.2 kSi2k).

Page 32: The LHCb Computing TDR


LHCb & LCG - SC3 & Beyond

- Storage Elements for permanent storage should have a common SRM interface; LHCb supports the LCG requirements for SRM (v2.1).
- Evaluating gLite-FTS for transfers in Service Challenge 3 (SC3).
- Evaluating the LCG File Catalog in SC3; previously used the AliEn FC and the LHCb bookkeeping DB.
- LHCb uses its own "metadata" catalogue (the LHCb Bookkeeping DB); an implementation based on the ARDA metadata interface is being tested.

Page 33: The LHCb Computing TDR


LHCb Collaboration with the CNAF Tier-1

- The LHCb Italian computing group is moving further toward a close collaboration with the Italian Tier-1, now that the LHCb on-line task (Farm Monitor & Control) has completed its bootstrap phase.
- Collaboration items:
  - parallel file system for physics analysis;
  - STORM for the parallel file system;
  - Workload Manager benchmarks.

Page 34: The LHCb Computing TDR


LHCb Computing Milestones

- Analysis at all Tier-1's - November 2005.
- Start data processing phase of DC'06 - May 2006:
  - distribution of RAW data from CERN;
  - reconstruction/stripping at Tier-1's including CERN;
  - DST distribution to CERN & other Tier-1's.
- Alignment/calibration challenge - October 2006:
  - align/calibrate detector;
  - distribute DB slice - synchronize remote DB's;
  - reconstruct data.
- Production system and software ready for data taking - April 2007.

Page 35: The LHCb Computing TDR


LHCb Computing Milestones (II)

- LHCb envisages a large-scale MC production commencing in January 2006, ready for use in DC06 in May. It will be of the order of hundreds of Mevents.
- The physics request will be planned by the end of October, mainly for:
  - physics studies;
  - HLT studies.
- The 2006 MC production is not included in DC'06 (it is no longer a real "challenge").
- From now on, a practically continuous MC production is foreseen for LHCb. This supports the request for a chunk of computing resources (mainly CPUs) permanently allocated to LHCb: the LHCb Italian Tier-2.

Page 36: The LHCb Computing TDR


LHCb Tier-2 (@CNAF): Additional Size and Cost
(linear ramp-up 2006 → 2008; strictly according to the current LHCb Computing Model)

                                   2006    2007    2008    2009    2010   total
    CPU price [€/Si2k]             0.58    0.38    0.25    0.17    0.12
    Disk price [€/GiB]             2.25    1.40    0.88    0.55    0.34
    CPU running [MSi2k]            0.34    0.69    1.15    1.15    1.15
    CPU running [3.2 GHz Xeon]      280     576     960     960     960
    Disk running [TiB]                1       2       3       3       3
    CPU replacement [MSi2k]           -       -       -    0.34    0.35
    Disk replacement [TiB]            -       -       -       1       1
    CPU to be acquired [MSi2k]     0.34    0.35    0.46    0.34    0.35
    Disk to be acquired [TiB]         1       1       1       1       1
    CPU cost [k€]                 196.5   132.4   117.1    56.1    43.3   545.5
    Disk cost [k€]                  2.2     1.4     0.9     0.5     0.3     5.4
    Total cost [k€]               198.7   133.8   118.0    56.7    43.7   550.9

3.2 GHz Xeon = 1.2 kSi2k
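
A quick check of the cost rows: each year's CPU cost is the acquired CPU times that year's price, e.g. for 2006:

\[
0.34\,\mathrm{MSi2k} \times 0.58\ \text{€/Si2k} \approx 197\ \mathrm{k€},
\]

consistent with the quoted 196.5 k€ up to rounding of the acquired-CPU figure.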

Page 37: The LHCb Computing TDR


LHCb Tier-2 (@CNAF): Additional Infrastructure

                             2006   2007   2008   2009   2010
    CPU [MSi2k]              0.34   0.69   1.15   1.15   1.15
    Disk [TiB]                  1      2      3      3      3
    Electric power [kW]        38     76    127    127    127
    Number of PCs             140    288    480    480    480
    Number of racks             4      8     13     13     13
    Power + cooling [kW]       95    190    317    317    317

1 kSi2k → 110 W; 1 TiB → 70 W
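
The power figures follow from these conversion factors, e.g. for 2008:

\[
1.15\,\mathrm{MSi2k} \times 110\,\mathrm{W/kSi2k} + 3\,\mathrm{TiB} \times 70\,\mathrm{W/TiB} \approx 127\,\mathrm{kW},
\]

and the power+cooling row is a factor 2.5 above the electric power (95 = 2.5 × 38, 317 ≈ 2.5 × 127).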

Page 38: The LHCb Computing TDR


LHCb Requests for 2006

- 200 k€: Tier-2 resources (140 dual-processor boxes + 1 TiB of disk).
- Since the resources are allocated at CNAF, resource management can be flexible: CPUs can be moved from Tier-1 queues to Tier-2 queues and back by software operations alone.
- But the Tier-2 has to be logically separated from the Tier-1 (e.g. different batch queues).

Page 39: The LHCb Computing TDR


Summary

- LHCb has in place a robust software framework.
- Grid computing can be successfully exploited for production-like tasks.
- Next steps:
  - realistic Grid user analyses;
  - prepare the reconstruction to deal with real data: particularly calibration, alignment, ...;
  - stress testing of the computing model;
  - building the Tier-2.

