LHCb Computing Model and Grid Status – Glenn Patrick, GRIDPP13, Durham, 5 July 2005
Page 1

LHCb Computing Model and Grid Status

Glenn Patrick, GRIDPP13, Durham – 5 July 2005

Page 2

LHCb – June 2005

[Photograph of the LHCb detector as installed on 03 June 2005, with labels: HCAL, ECAL, LHCb Magnet, muon filters MF1–MF3 and MF4.]

Page 3

Computing completes TDRs

Jan 2000 – June 2005

Page 4

Online System

[Diagram of the online/DAQ system: Front-end Electronics (FE) feed a Multiplexing Layer of switches into the Readout Network; 94 Sub-Farm Controllers (SFCs), each with its own switch and CPUs, form the CPU farm of ~1600 CPUs, controlled by the TFC system and the ECS. Level-1 traffic: 1000 kHz, 5.5 GB/s; HLT traffic: 40 kHz, 1.6 GB/s; 7.1 GB/s total into the readout network; ~250 MB/s total to the storage system. Scalable in depth (more CPUs, <2200) and in width (more detectors in Level-1).]

Trigger chain: 40 MHz → Level-0 (hardware) → 1 MHz → Level-1 (software) → 40 kHz → HLT (software) → 2 kHz → Tier 0.

Raw data: 2 kHz, 50 MB/s, written to Tier 0 and distributed to the Tier 1s.

Page 5

HLT Output

Stream               b-exclusive   dimuon   D*    b-inclusive   Total
Trigger rate (Hz)        200         600    300       900        2000
Fraction                 10%         30%    15%       45%        100%
Events/year (10^9)         2           6      3         9          20
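The events-per-year row follows from the trigger rates, assuming a nominal 10^7 s of data taking per year (an assumption consistent with the table, not stated on the slide):

\[
N_{\text{year}} = R_{\text{trig}} \times 10^{7}\,\mathrm{s},
\qquad
2000\,\mathrm{Hz} \times 10^{7}\,\mathrm{s} = 2\times10^{10} = 20\times10^{9}\ \text{events.}
\]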

200 Hz hot stream: fully reconstructed on the online farm in real time; the “hot stream” (RAW + rDST) is written to Tier 0.

2 kHz RAW data written to Tier 0 for reconstruction at CERN and the Tier 1s.

Dimuon (J/ψ): calibration for proper-time resolution.

D* → D0(Kπ): clean peak allows PID calibration.

b-inclusive: understand bias on other B selections.

Page 6

Data Flow

[Diagram of the LHCb data flow: Simulation (Gauss) → Digitisation (Boole) → Reconstruction (Brunel) → Analysis (DaVinci), with the associated data: MC Truth, Raw Data, DST, Stripped DST and Analysis Objects. All applications are built on the Gaudi framework and share the Detector Description, the Conditions Database and the Event Model / Physics Event Model.]

Page 7

LHCb Computing Model

14 candidates

CERN Tier 1 essential for accessing the “hot stream” for:
1. First alignment & calibration.
2. First high-level analysis.

Page 8

Distributed Data

RAW DATA: 500 TB

CERN = Master Copy

2nd copy distributed over six Tier 1s
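The 500 TB figure is consistent with the HLT output quoted earlier (2 kHz, 50 MB/s), again assuming a nominal 10^7 s of data taking per year:

\[
50\,\mathrm{MB/s} \times 10^{7}\,\mathrm{s} = 5\times10^{14}\,\mathrm{B} = 500\,\mathrm{TB}
\quad\text{(about 25 kB per event at 2 kHz).}
\]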

STRIPPING: 140 TB/pass/copy
Pass 1: during data taking at CERN and Tier 1s (7 months)
Pass 2: after data taking at CERN and Tier 1s (1 month)

RECONSTRUCTION: 500 TB/pass
Pass 1: during data taking at CERN and Tier 1s (7 months)
Pass 2: during winter shutdown at CERN, Tier 1s and online farm (2 months)
Pass 3: during shutdown at CERN, Tier 1s and online farm
Pass 4: before next year's data taking at CERN and Tier 1s (1 month)

Page 9

Stripping Job – 2005

[Workflow diagram: read INPUTDATA and stage the files in one go; check file status against the production DB for each group (group1, group2, ... groupN); once staged, check file integrity (bad-file info is sent to the production DB) and run DaVinci stripping on each good file; merge the outputs into DST and ETC and send the file info back.]

Usage of SRM

Stripping runs on reduced DSTs (rDST).

Pre-selection algorithms categorise events into streams.

Events that pass are fully reconstructed and full DSTs written.

CERN, CNAF, PIC used so far – sites based on CASTOR.
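A minimal Python sketch of the stripping workflow in the diagram above, with illustrative stub functions standing in for the SRM staging, integrity checks and the DaVinci pre-selection (this is not the actual 2005 DIRAC/production code):

# Illustrative sketch of the 2005 stripping-job workflow; all functions are stubs.
def stage_files(lfns):
    """Ask the storage system (via SRM) to stage all input files in one go."""
    print("staging %d files" % len(lfns))       # stub: would issue an SRM request

def is_staged(lfn):
    return True                                 # stub: poll SRM until the file is on disk

def check_integrity(lfn):
    return True                                 # stub: compare size/checksum with bookkeeping

def run_davinci_stripping(lfn):
    return lfn + ".stripped"                    # stub: run the DaVinci pre-selection on one rDST

def strip_group(group, prod_db):
    """Process one group of input files, reporting bad files to the production DB."""
    stage_files(group)                          # read INPUTDATA and stage them in one go
    outputs = []
    for lfn in group:
        if not is_staged(lfn):
            continue                            # not yet staged: leave for a later pass
        if not check_integrity(lfn):
            prod_db.append(("bad file", lfn))   # send bad-file info to the production DB
            continue
        outputs.append(run_davinci_stripping(lfn))   # good file: run DaVinci stripping
    return outputs

prod_db = []
groups = [["lfn:/lhcb/rdst/group1_%03d.rdst" % i for i in range(3)]]   # illustrative LFNs
stripped = [out for g in groups for out in strip_group(g, prod_db)]
# Merging step: combine the per-file DSTs and event-tag collections (ETC)
# and send the output file info back to the production DB.
print("merged %d outputs into DST and ETC" % len(stripped))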

Page 10

Resource Profile

CPU (MSI2k.yr)    2006   2007   2008   2009   2010
CERN              0.27   0.54   0.90   1.25   1.88
Tier-1s           1.33   2.65   4.42   5.55   8.35
Tier-2s           2.29   4.59   7.65   7.65   7.65
TOTAL             3.89   7.78  12.97  14.45  17.88

Disk (TB)         2006   2007   2008   2009   2010
CERN               248    496    826   1095   1363
Tier-1s            730   1459   2432   2897   3363
Tier-2s              7     14     23     23     23
TOTAL              984   1969   3281   4015   4749

MSS (TB)          2006   2007   2008   2009   2010
CERN               408    825   1359   2857   4566
Tier-1s            622   1244   2074   4285   7066
TOTAL             1030   2069   3433   7144  11632

Page 11

Comparisons – CPU

Tier 1 CPU – integrated (Nick Brook)

[Plot: "Tier1 CPU" – integrated Tier 1 CPU (MSI2k) versus date, monthly through 2008–2010, with curves for LHCb, CMS, ATLAS and ALICE.]

Page 12

Comparisons – Disk

LCG TDR – LHCC, 29.6.2005 (Jurgen Knobloch)

[Plot: disk requirements (PB) versus year, 2007–2010, split into CERN, Tier-1 and Tier-2 contributions for ALICE, ATLAS, CMS and LHCb; 54% pledged.]

Page 13

Comparisons – Tape

LCG TDR – LHCC, 29.6.2005 (Jurgen Knobloch)

[Plot: tape requirements (PB) versus year, 2007–2010, split into CERN and Tier-1 contributions for ALICE, ATLAS, CMS and LHCb; 75% pledged.]

Page 14

DIRAC Architecture

[Architecture diagram, grouped into three layers:
User interfaces: Production manager, GANGA UI, User CLI, Job monitor, BK query web page, FileCatalog browser.
DIRAC services: Job Management Service, JobMonitorSvc, JobAccountingSvc (with Accounting DB), InformationSvc, FileCatalogSvc, MonitoringSvc, BookkeepingSvc.
DIRAC resources: DIRAC sites running Agents on DIRAC CEs, the LCG Resource Broker in front of CE 1, CE 2 and CE 3, and DIRAC Storage (disk files accessed via gridftp, bbftp, rfio).]

Services Oriented Architecture.

Page 15

Data Challenge 2004

[Plot: LHCb DC'04 events produced (M) by month, May–August, split into DIRAC and LCG contributions, with annotations: DIRAC alone; LCG in action; 1.8 × 10^6 events/day; LCG paused; LCG restarted; 3–5 × 10^6 events/day; Phase 1 completed; 187 M events produced in total.]

20 DIRAC sites + 43 LCG sites were used. Data written to Tier 1s.

• Overall, 50% of events produced using LCG.
• At the end, 75% produced by LCG.

UK second largest producer (25%) after CERN.

Page 16

RTTC – 2005

Real Time Trigger Challenge – May/June 2005

150M minimum-bias events to feed the online farm and test the software trigger chain.

Completed in 20 days (169M events) on 65 different sites.
95% produced with LCG sites, 5% produced with "native" DIRAC sites.
Average of 10M events/day. Average of 4,000 CPUs.

Country                          Events produced
UK                               60 M
Italy                            42 M
Switzerland                      23 M
France                           11 M
Netherlands                      10 M
Spain                             8 M
Russia                            3 M
Greece                            2.5 M
Canada                            2 M
Germany                           0.3 M
Belgium                           0.2 M
Sweden                            0.2 M
Romania, Hungary, Brazil, USA     0.8 M

[Pie chart: 37%.]

Page 17

Looking Forward

[Timeline 2005–2008: SC3, SC4, LHC service operation; cosmics, first beams, first physics, full physics run.]

Next challenge: SC3 – Sept. 2005
Analysis at Tier 1s – Nov. 2005
Start DC06 processing phase – May 2006
Alignment/calibration challenge – October 2006
Ready for data taking – April 2007

Page 18

LHCb and SC3

Phase 1 (Sept. 2005):

a) Movement of 8 TB of digitised data from CERN/Tier 0 to the LHCb Tier 1 centres in parallel over a 2-week period (~10k files). Demonstrate automatic tools for data movement and bookkeeping.

b) Removal of replicas (via LFN) from all Tier 1 centres.

c) Redistribution of 4 TB of data from each Tier 1 centre to Tier 0 and the other Tier 1 centres over a 2-week period. Demonstrate that data can be redistributed in real time to meet stripping demands.

d) Moving of stripped DST data (~1 TB, 190k files) from CERN to all Tier 1 centres.

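For scale, the Phase 1a transfer of 8 TB in ~10k files over two weeks corresponds, assuming continuous transfer, to roughly

\[
\frac{8\times10^{12}\,\mathrm{B}}{14 \times 86400\,\mathrm{s}} \approx 6.6\,\mathrm{MB/s}
\quad\text{aggregate, with an average file size of}\quad
\frac{8\,\mathrm{TB}}{10^{4}\ \text{files}} \approx 0.8\,\mathrm{GB}.
\]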

Phase 2 (Oct. 2005):

a) MC production in Tier 2 centres with DST data collected in Tier 1 centres in real time, followed by stripping in Tier 1 centres (2 months). Data stripped as it becomes available.

b) Analysis of stripped data in Tier 1 centres.


Page 19

SC3 Requirements

Tier 1:
• Read-only LCG file catalogue (LFC) for >1 Tier 1.
• SRM version 1.1 interface to MSS.
• GridFTP server for MSS.
• File Transfer Service (FTS) and LFC client/tools.
• gLite CE.
• Hosting CE for agents (with a managed, backed-up file system).


Tier 2:
• SRM interface to SE.
• GridFTP access.
• FTS and LFC tools.


[Diagram: a hosting CE runs the Job Agent, Monitor Agent and Transfer Agent against a Request DB, with a software repository on the local SE; jobs execute on the worker nodes (WN) of the site CE.]

Page 20

SC3 Resources

Phase 1:
• Temporary MSS access to ~10 TB of data at each Tier 1 (with SRM).
• Permanent access to 1.5 TB on disk at each Tier 1 with SRM interface.


Phase 2:
• LCG version 2.5.0 in production for the whole of LCG.
• CPU for MC production: ~250 (2.4 GHz) WN over 2 months (non-Tier 1).
• CPU for stripping: ~2 (2.4 GHz) WN per Tier 1 for the duration of this phase.
• Storage (permanent): 10 TB across all Tier 1s and CERN for MC output; 350 GB of disk with SRM interface at each Tier 1 and CERN for stripping output.


Page 21

Data Production on the Grid

[Diagram: the Production manager feeds the Production DB and the DIRAC Job Management Service; Pilot Job Agents are submitted through the LCG Resource Broker to CE 1, CE 2 and CE 3, while native DIRAC sites run Agents on DIRAC CEs; on the worker node (WN) the Agent runs the Job, reports to the DIRAC Job Monitoring Service, stores output on the local or a remote SE and registers it in the LFC File Catalog.]
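A minimal sketch of the "pull" scheduling pattern the diagram illustrates: a Pilot Job Agent, once it lands on a worker node, asks the central job management service for a matching job. The class and method names are illustrative, not the real DIRAC interfaces.

class JobManagementService:
    """Stand-in for the central DIRAC Job Management Service / task queue."""
    def __init__(self, jobs):
        self.queue = list(jobs)

    def request_job(self, capabilities):
        # Return a waiting job whose requirements this worker node can satisfy.
        for job in self.queue:
            if job["requirements"] <= capabilities:
                self.queue.remove(job)
                return job
        return None

    def report_status(self, job_id, status):
        print("job %s: %s" % (job_id, status))   # would update the Job Monitoring Service

def pilot_agent(jms, capabilities):
    """Runs on the worker node after being scheduled via the LCG Resource Broker."""
    while True:
        job = jms.request_job(capabilities)
        if job is None:
            break                                # nothing matches: the pilot simply exits
        jms.report_status(job["id"], "running")
        # ... install the software, run the application, store the output on the
        # local or a remote SE, and register the replicas in the LFC file catalogue ...
        jms.report_status(job["id"], "done")

jms = JobManagementService([{"id": 1, "requirements": {"lhcb-software"}}])
pilot_agent(jms, capabilities={"lhcb-software", "srm-access"})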

Page 22

UK: Workflow Control

Production Desktop – Gennady Kuznetsov (RAL)

[Diagram: production workflows cover the primary event and spill-over events.
Steps: Sim (Gauss B, Gauss MB), Digi (Boole B, Boole MB), Reco (Brunel B, Brunel MB).
Modules within a step: software installation, Gauss execution, check logfile, dir listing, bookkeeping report.]
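A minimal sketch of the step/module decomposition shown above; the module names come from the slide, while the classes themselves are illustrative rather than the Production Desktop code.

class Module:
    """Smallest unit of work, e.g. 'Software installation' or 'Gauss execution'."""
    def __init__(self, name, action=None):
        self.name, self.action = name, action

    def run(self):
        print("module:", self.name)
        if self.action:
            self.action()

class Step:
    """An ordered list of modules, e.g. the simulation step for B events."""
    def __init__(self, name, modules):
        self.name, self.modules = name, modules

    def run(self):
        print("step:", self.name)
        for module in self.modules:
            module.run()

# The simulation step as on the slide; Digi (Boole) and Reco (Brunel) steps
# for B and minimum-bias (MB) events would be built the same way.
sim_step = Step("Sim (Gauss B)", [
    Module("Software installation"),
    Module("Gauss execution"),
    Module("Check logfile"),
    Module("Dir listing"),
    Module("Bookkeeping report"),
])

for step in [sim_step]:
    step.run()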

Page 23

UK: LHCb Metadata and ARDA

Carmine Cioffi (Oxford)

[Diagram: a web browser and a GANGA application use the ARDA client API, over TCP/IP streaming, to query the Bookkeeping ARDA server, implemented as a Tomcat servlet.]

Page 24

UK: Ganga

See next talk!

[Diagram: Ganga4 prepares and configures jobs for Gaudi, Athena and script applications; submits, kills, gets output and updates status on the AtlasPROD, DIAL, DIRAC, LCG2, gLite, localhost and LSF backends; stores and retrieves job definitions; plus split, merge, monitor and dataset selection.]

Karl Harrison (Cambridge), Alexander Soroko (Oxford), Alvin Tan (Birmingham), Ulrik Egede (Imperial), Andrew Maier (CERN), Kuba Moscicki (CERN)

Page 25

UK: Conditions Database

[Diagram: conditions data (VELO alignment, HCAL calibration, RICH pressure, ECAL temperature) versus time (t1 ... t11) and version. The production version at time T selects:
VELO: v3 for T < t3, v2 for t3 < T < t5, v3 for t5 < T < t9, v1 for T > t9.
HCAL: v1 for T < t2, v2 for t2 < T < t8, v1 for T > t8.
RICH: v1 everywhere.
ECAL: v1 everywhere.]

LCG COOL project providing the underlying structure for the conditions database.

User interface: Nicolas Gilardi (Edinburgh)
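A minimal sketch of the interval-of-validity lookup the production-version table describes; the times t1...t11 are represented by integers, and this is illustrative rather than the COOL API.

# Each condition maps to a list of (valid-from time, version) pairs under the
# production tag; an entry is valid until the next entry for the same condition.
production_tag = {
    "VELO alignment":   [(0, "v3"), (3, "v2"), (5, "v3"), (9, "v1")],   # boundaries t3, t5, t9
    "HCAL calibration": [(0, "v1"), (2, "v2"), (8, "v1")],              # boundaries t2, t8
    "RICH pressure":    [(0, "v1")],
    "ECAL temperature": [(0, "v1")],
}

def version_at(condition, T):
    """Return the version of `condition` valid at time T under the production tag."""
    version = None
    for since, v in production_tag[condition]:
        if T >= since:
            version = v
        else:
            break
    return version

assert version_at("VELO alignment", 4) == "v2"      # t3 < T < t5
assert version_at("HCAL calibration", 10) == "v1"   # T > t8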

Page 26

UK: Analysis with DIRAC

[Diagram: a DIRAC job specifies its data as LFNs; matching against the Task Queue checks all SEs which have the data (or any site if no data is specified); an Agent picks up the job, which executes on a WN, installs the software and accesses the data from the closest SE.]

Software Installation + Analysis via DIRAC WMS

Stuart Patterson (Glasgow)

DIRAC API for analysis job submission

[
  Requirements = other.Site == "DVtest.in2p3.fr";
  Arguments = "jobDescription.xml";
  JobName = "DaVinci_1";
  OutputData = { "/lhcb/test/DaVinci_user/v1r0/LOG/DaVinci_v12r11.alog" };
  parameters = [ STEPS = "1"; STEP_1_NAME = "0_0_1" ];
  SoftwarePackages = { "DaVinci.v12r11" };
  JobType = "user";
  Executable = "$LHCBPRODROOT/DIRAC/scripts/jobexec";
  StdOutput = "std.out";
  Owner = "paterson";
  OutputSandbox = { "std.out", "std.err", "DVNtuples.root", "DaVinci_v12r11.alog", "DVHistos.root" };
  StdError = "std.err";
  ProductionId = "00000000";
  InputSandbox = { "lib.tar.gz", "jobDescription.xml", "jobOptions.opts" };
  JobId = ID
]

PACMAN: DIRAC installation tools
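A small Python sketch showing how a job description like the one above could be assembled programmatically; the field names are taken from the slide, while the helper itself is illustrative and not part of the DIRAC API.

def to_jdl(fields):
    """Render a dict of job parameters as a ClassAd-style JDL text block."""
    def fmt(value):
        if isinstance(value, (list, tuple)):
            return "{ " + ", ".join(fmt(v) for v in value) + " }"
        return '"%s"' % value

    body = ";\n  ".join("%s = %s" % (key, fmt(value)) for key, value in fields.items())
    return "[\n  %s\n]" % body

job = {
    "JobName": "DaVinci_1",
    "JobType": "user",
    "SoftwarePackages": ["DaVinci.v12r11"],
    "InputSandbox": ["lib.tar.gz", "jobDescription.xml", "jobOptions.opts"],
    "OutputSandbox": ["std.out", "std.err", "DVNtuples.root", "DVHistos.root"],
    "StdOutput": "std.out",
    "StdError": "std.err",
}
print(to_jdl(job))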

Page 27

Conclusion

Half way there! But the climb gets steeper.

[Diagram: the climb from DC03 and DC04, through 2005 Monte Carlo production on the Grid, up to 2007 data taking with distributed reconstruction, data stripping and distributed analysis.]

