Page 1:

CMS on the Grid

Vincenzo Innocente

CERN/EP

Toward a fully distributed Physics Analysis

Page 2:

Computing Architecture: Challenges at LHC

Bigger Experiment, higher rate, more data

Larger and dispersed user community performing non-trivial queries against a large event store

Make best use of new IT technologies

Increased demand for both flexibility and coherence:
- ability to plug in new algorithms
- ability to run the same algorithms in multiple environments
- guarantees of quality and reproducibility
- high performance
- user-friendliness

Page 3:

b physics: a challenge for CMS computing

A large distributed effort already today: ~150 physicists in the CMS Heavy-Flavour group, with more than 40 institutions involved.

Requires precise and specialized algorithms for vertex reconstruction and particle identification. Most CMS triggered events include B particles; high-level software triggers select exclusive channels in events triggered in hardware using inclusive conditions.

Challenges:
- Allow remote physicists to access detailed event information
- Migrate reconstruction and selection algorithms effectively to the High Level Trigger

Page 4:

CMS Experiment-Data Analysis

[Diagram: CMS experiment-data analysis flow. The Event Filter / Object Formatter stores raw data; Quasi-online Reconstruction stores reconstructed objects (rec-Obj) and calibrations; Detector Control and Online Monitoring supply environmental data. Simulation, Data Quality, Calibrations, Group Analysis and User Analysis each request parts of events on demand from the Persistent Object Store Manager / Database Management System, leading ultimately to a physics paper.]

Page 5:

Analysis Model: Hierarchy of Processes (Experiment, Analysis Groups, Individuals)

- Reconstruction: experiment-wide activity (10^9 events), driven by new detector calibrations or understanding; re-processing ~3 times per year; 3000 SI95 sec/event, 1 job year (re-processing: 3 jobs per year).
- Selection: ~20 groups' activity (10^9 -> 10^7 events), trigger-based and physics-based refinements; iterative selection once per month; 25 SI95 sec/event, ~20 jobs per month.
- Analysis: ~25 individuals per group (10^6 - 10^8 events), algorithms applied to data to get results; different physics cuts & MC comparison ~1 time per day; 10 SI95 sec/event, ~500 jobs per day.
- Monte Carlo: 5000 SI95 sec/event.

1 GHz ~ 50 SI95
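As a rough check of these scales, the following back-of-envelope calculation (not from the talk, just the slide's numbers plus the 1 GHz ~ 50 SI95 conversion) translates SI95-seconds per event into 1 GHz CPU-years:

```python
# Back-of-envelope check of the processing scales quoted on this slide.
# Assumption: one "CPU" is a 1 GHz processor, i.e. ~50 SI95 as stated above.

SI95_PER_CPU = 50.0          # 1 GHz ~ 50 SI95
SECONDS_PER_YEAR = 3.15e7

def cpu_years(events, si95_sec_per_event):
    """1 GHz CPU-years needed to process `events` at the quoted cost."""
    return events * si95_sec_per_event / SI95_PER_CPU / SECONDS_PER_YEAR

# Experiment-wide reconstruction: 10^9 events at 3000 SI95 s/event
print(cpu_years(1e9, 3000))   # ~1900 CPU-years -> about a year on a ~2000-CPU farm
# Group-level selection pass: 10^8 events at 25 SI95 s/event
print(cpu_years(1e8, 25))     # ~1.6 CPU-years per pass
# Individual analysis pass: 10^7 events at 10 SI95 s/event
print(cpu_years(1e7, 10))     # ~0.06 CPU-years, i.e. roughly three CPU-weeks
```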

Page 6:

Data handling baseline: CMS computing in year 2007

Data model: typical objects of 1 KB - 1 MB.

3 PB of storage space; 10,000 CPUs; 31 sites (1 Tier-0 + 5 Tier-1 + 25 Tier-2) all over the world.

I/O rates disk -> CPU: 10,000 MB/s aggregate, average 1 MB/s per CPU:
- RAW -> ESD generation: ~0.2 MB/s I/O per CPU
- ESD -> AOD generation: ~5 MB/s I/O per CPU
- AOD analysis into histograms: ~0.2 MB/s I/O per CPU
- DPD generation from AOD and ESD: ~10 MB/s I/O per CPU

Wide-area I/O capacity: of order 700 MB/s aggregate over all payload intercontinental TCP/IP streams.

This implies a system with heavy reliance on access to site-local (cached) data: a Data Grid.
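A quick, purely illustrative consistency check of these figures shows why the baseline leans on site-local data:

```python
# Rough consistency check of the I/O figures on this slide (2007 baseline).
CPUS = 10_000
AVG_IO_PER_CPU_MB_S = 1.0        # average disk->CPU rate per CPU
WAN_AGGREGATE_MB_S = 700.0       # aggregate intercontinental payload bandwidth

local_io = CPUS * AVG_IO_PER_CPU_MB_S        # 10,000 MB/s, as quoted
wan_fraction = WAN_AGGREGATE_MB_S / local_io # ~0.07

print(f"aggregate local I/O : {local_io:,.0f} MB/s")
print(f"WAN / local ratio   : {wan_fraction:.0%}")
# Only ~7% of the required I/O can cross the wide area, hence the reliance on
# site-local (cached) data, i.e. a Data Grid.
```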

Page 7:

Three Computing Environments: Different Challenges

Centralized quasi-online processing: keep up with the rate; validate and distribute data efficiently.

Distributed organized processing: automation.

Interactive chaotic analysis: efficient access to data and “metadata”; management of “private” data; rapid application development.

Page 8:

The Final Challenge: A Coherent Analysis Environment

Beyond the interactive analysis tool (user point of view): data analysis & presentation with N-tuples, histograms, fitting, plotting, …

A great range of other activities with fuzzy boundaries (developer point of view):
- Batch and interactive work, from “pointy-clicky” to Emacs-like power tools to scripting
- Setting up configuration management tools, application frameworks and reconstruction packages
- Data store operations: replicating entire data stores; copying runs, events and event parts between stores; not just copying but also doing something more complicated (filtering, reconstruction, analysis, …)
- Browsing data stores down to object detail level
- 2D and 3D visualisation
- Moving code across final analysis, reconstruction and triggers

Today this involves (too) many tools

Page 9:

Varied components and data flows: One Portal

[Diagram: the user works through a portal with tool plug-in modules, a local analysis tool (Lizard/ROOT/…) or web browser, and local disk (Tier 3/4/5). Physics queries flow through query and data-extraction web services and TAG/AOD extraction/conversion/transport services to RDBMS-based data warehouses and PIAF/Proof-type analysis farms (Tier 1/2). Production data flows from the production system and data repositories and from ORCA analysis farms, or a distributed `farm' using grid queues (Tier 0/1/2); TAGs/AODs flow back toward the user.]

Page 10:

CMS TODAY: Home-Made Tools

Data production and analysis exercises; granularity (Data Product): the Data-Set (a simulated physics channel).

Development and deployment of a distributed data processing system (hardware & software).
Test and integration of Grid middleware prototypes.
R&D on distributed interactive analysis.

Page 11:

Current CMS Production

[Diagram: current production chain. Pythia writes HEPEVT ntuples; CMSIM (GEANT3) produces Zebra files with HITS; the ORCA/COBRA ooHit formatter loads them into an Objectivity database; ORCA/COBRA digitization (merging signal and pile-up) writes back to Objectivity; OSCAR/COBRA (GEANT4) provides the alternative simulation path. ORCA user analysis produces ntuples or ROOT files, and IGUANA performs interactive analysis on the Objectivity database.]

Page 12:

CMS distributed production tools

RefDB: production flow manager
- Web portal, MySQL backend

IMPALA (Intelligent Monte Carlo Production Local Actuator)
- Job scheduler: “to-do” discovery, job decomposition, script assembly from templates, error recovery and re-submission (a minimal sketch of the decomposition idea follows after this list)

BOSS (Batch Object Submission System): job control, monitoring and tracking
- Envelope script, filters the output stream, logs into a MySQL backend

DAR: distribution of software in binary form (shared libraries and binaries)
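The decomposition-from-templates idea can be illustrated with a short, purely hypothetical Python sketch. This is not the actual IMPALA code; the wrapper command, template text and job size below are invented for illustration only.

```python
# Illustrative sketch only: NOT the real IMPALA. It shows the idea of splitting
# a production request into per-job scripts assembled from a template.

TEMPLATE = """#!/bin/sh
# dataset={dataset} run={run} first_event={first} n_events={n}
cmsim_wrapper --dataset {dataset} --run {run} --skip {first} --events {n}
"""  # cmsim_wrapper is a hypothetical command

def decompose(dataset, total_events, events_per_job=500):
    """Split a production request into per-job shell scripts."""
    scripts = []
    for run, first in enumerate(range(0, total_events, events_per_job)):
        n = min(events_per_job, total_events - first)
        scripts.append(TEMPLATE.format(dataset=dataset, run=run,
                                       first=first, n=n))
    return scripts

# Request quoted on the next page: "Produce 100000 events dataset mu_MB2mu_pt4"
jobs = decompose("mu_MB2mu_pt4", 100_000)
print(len(jobs), "job scripts generated")   # 200 scripts of 500 events each
```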

Page 13:

Current data processing

[Diagram: a request such as “Produce 100000 events dataset mu_MB2mu_pt4” enters the production RefDB; the Production Manager, through the production interface, distributes tasks to Regional Centers. At each RC farm, IMPALA decomposes the request into job scripts and monitors them, BOSS records job state in its DB, output goes to farm storage, a request summary file is returned, and data location is published through the production DB.]

Page 14:

Production 2002, Complexity

Number of Regional Centers: 11
Number of Computing Centers: 21
Number of CPUs: ~1000
Largest local center: 176 CPUs
Number of production passes for each dataset (including analysis-group processing done by production): 6-8
Number of files: ~11,000
Data size (not including fz files from simulation): 17 TB
File transfer by GDMP and by perl scripts over scp/bbcp: 7 TB toward T1, 4 TB toward T2
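Two illustrative numbers follow from this table; the 155 Mbps link speed is borrowed from the Legnaro Tier-2 page later in the talk and is used here only to give a feel for the transfer scale:

```python
# Quick scale estimates from the production-2002 numbers above.
TB = 1e12
files, data_tb = 11_000, 17.0
to_t1_tb = 7.0

avg_file_gb = data_tb * TB / files / 1e9          # ~1.5 GB per file
link_mb_s = 155e6 / 8 / 1e6                       # 155 Mbps -> ~19 MB/s
days_to_t1 = to_t1_tb * TB / (link_mb_s * 1e6) / 86_400

print(f"average file size : {avg_file_gb:.1f} GB")
print(f"7 TB at 155 Mbps  : {days_to_t1:.1f} days of sustained transfer")
```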

Page 15:

Spring02: CPU Resources

[Pie chart: CPU share by site. Wisconsin 18%, INFN 18%, CERN 15%, IN2P3 10%, Moscow 10%, FNAL 8%, RAL 6%, IC 6%, UFL 5%, Caltech 4%, UCSD 3%, Bristol 3%, HIP 1%.]

As of 4.4.02: 700 active CPUs, plus 400 CPUs to come.

Page 16:

INFN-Legnaro Tier-2 prototype

[Diagram: Fast Ethernet switches, interconnected by a 32-port GigaEth 1000 BT switch, link computational nodes N1-N24 and disk server nodes S1-S16.]

Nx – Computational node: dual PIII 1 GHz, 512 MB, 3x75 GB EIDE disks + 1x20 GB for the O.S.
Sx – Disk server node: dual PIII 1 GHz, dual PCI (33/32 – 66/64), 512 MB, 3x75 GB EIDE RAID 0-5 disks (expandable up to 10), 1x20 GB disk for the O.S.

2001: 35 nodes, 70 CPUs, 3500 SI95, 8 TB; 11 servers, 1100 SI95, 2.5 TB
2001-2-3: up to 190 nodes
To WAN: 34 Mbps in 2001, 155 Mbps in 2002

Page 17:

CMS TOMORROW: Transition to Grid Middleware

Use Virtual Data tools for workflow management at the Data-Set level.

Use the Grid security infrastructure and workload manager.

Deploy a Grid-enabled portal for interactive analysis.

Global monitoring of Grid performance and quality of service.

CMS Grid workshop at CERN, 11-14/6/2002: http://documents.cern.ch/AGE/current/fullAgenda.php?ida=a02826#s7

Page 18:

Toward ONE Grid

Build a unique CMS-Grid framework (EU+US). EU and US grids are not interoperable today: wait for help from DataTAG-iVDGL-GLUE and work in parallel in EU and US.

Main US activities: MOP, the Virtual Data System, interactive analysis.

Main EU activities: integration of IMPALA with EDG WP1+WP2 software; batch analysis (user job submission & analysis farm).

Page 19:

PPDG MOP system

The PPDG-developed MOP system allows submission of CMS production jobs from a central location, running them at remote locations and returning the results.

Relies on: GDMP for replication; Globus GRAM, Condor-G and local queuing systems for job scheduling; IMPALA for job specification.

Being deployed in the USCMS testbed; proposed as the basis for the next CMS-wide production infrastructure.

Page 20:

Prototype VDG System (production)

[Diagram: the user submits requests to an abstract planner (MOP/WP1), which consults the RefDB, the virtual data catalog and the materialized data catalog; a concrete planner (WP1) and executor run wrapper scripts (CMKIN, CMSIM, ORCA/COBRA) on compute resources under BOSS with a local tracking DB; replica management and catalog services (GDMP replica catalog, Objectivity metadata catalog) connect storage resources and local Grid storage.]

Page 21:

Prototype VDG System (production)

[Same diagram as the previous page, with a legend marking which components have no code yet, which already exist, and which are implemented using MOP.]

Page 22:

IMPALA/BOSS integration with EDG

[Diagram: from the user environment, IMPALA-generated jobs are handled by DOLLY and BOSS (with a MySQL DB) and a job executer on the EDG User Interface, then submitted to the EDG Resource Broker (EDG-RB); the Grid dispatches them to a Computing Element batch manager whose worker nodes (WN1 … WNn) run CMKIN/IMPALA, sharing files over NFS; the RefDB at CERN supplies the production requests.]

Page 23:

Globally Scalable Monitoring Service

[Diagram: farm monitors collect data by push & pull (rsh & ssh existing scripts, snmp) from the Regional Center farm; they register with lookup services, and clients (or other services) discover them through those lookup services via a proxy; a component factory provides GUI marshaling, code transport and RMI data access.]

Page 24:

CLARENS: a Portal to the Grid

Grid-enabling the working environment for physicists' data analysis. Clarens consists of a server communicating with various clients via the commodity XML-RPC protocol; this ensures implementation independence. The server will provide a remote API to Grid tools:
- The Virtual Data Toolkit: object collection access
- Data movement between Tier centres using GSI-FTP
- CMS analysis software (ORCA/COBRA)
- Security services provided by the Grid (GSI)

[Diagram: a client talks over RPC (http/https) to the web server hosting Clarens, which forwards calls to the services.]

No Globus is needed on the client side, only a certificate.

The current prototype is running on the Caltech proto-Tier-2.
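To make the client side concrete, here is a minimal sketch of an XML-RPC call using Python's standard xmlrpc.client module. The server URL and the non-introspection method name are illustrative, not the actual Clarens API.

```python
# Minimal sketch of an XML-RPC client talking to a Clarens-like service.
# The URL and the browse.* method are hypothetical examples.
import xmlrpc.client

# Any language with an XML-RPC library can act as a client in the same way.
server = xmlrpc.client.ServerProxy("https://tier2.example.org:8443/clarens")

# Standard XML-RPC introspection, if the server exposes it:
print(server.system.listMethods())

# A hypothetical remote call, e.g. asking a data-extraction service for the
# object collections matching a pattern:
# collections = server.browse.list_collections("jpsi")
```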

Page 25:

Clarens Architecture

- A common protocol is spoken by all types of clients to all types of services: implement a service once for all clients, and implement client access once per client type, using a common protocol already implemented for “all” languages (C++, Java, Fortran, etc. :-)
- The common protocol is XML-RPC, with SOAP close to working; CORBA is doable but would require a different server above Clarens (it uses IIOP, not HTTP)
- Handles authentication using Grid certificates, connection management, data serialization and, optionally, encryption
- The implementation uses stable, well-known server infrastructure (Apache) that has been debugged and audited over a long period by many
- The Clarens layer itself is implemented in Python, but can be reimplemented in C++ should performance be inadequate

More information at http://clarens.sourceforge.net, along with a web-based demo
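As a toy counterpart to the client sketch above, the "implement the service once, speak a common protocol" point can be illustrated with Python's standard SimpleXMLRPCServer. This is not the Apache-hosted Clarens implementation and does no certificate handling; the method name and data are invented.

```python
# Toy XML-RPC service: written once, callable from any XML-RPC-capable client.
# NOT the actual Clarens server; purely illustrative.
from xmlrpc.server import SimpleXMLRPCServer

def list_collections(pattern):
    """Hypothetical service method returning matching dataset names."""
    return [name for name in ("jpsi_to_mumu", "bs_to_jpsiphi")
            if pattern in name]

server = SimpleXMLRPCServer(("localhost", 8080), allow_none=True)
server.register_introspection_functions()          # enables system.listMethods
server.register_function(list_collections, "browse.list_collections")
server.serve_forever()
```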

Page 26:

2007: Grid-enabled Analysis

Sub-event components map to Grid Data-Products.

Balance of load between network and CPU.

Complete data and software base “virtually” available at the physicist's desktop.

Page 27:

Evolution of Computing in CMS

Ramp up the production systems in 2005-07 (30%, +30%, +40% of the cost each year); match the computing power available with LHC luminosity.

[Chart: CPU computing power in kSI95 versus year (2000-2007), split among CERN T0/T1 (shared), Regional T1's and Regional T2's.]

2006: 200M reconstructed ev/month, 100M re-reconstructed ev/month, 30k ev/s analysis.
2007: 300M reconstructed ev/month, 200M re-reconstructed ev/month, 50k ev/s analysis.

Old schedule: the new one is stretched by 15 more months.

Page 28:

Grid-enabled Analysis

[Diagram: a consistent user interface (federation wizards, detector/event display, data browser, analysis job wizards, generic analysis tools) rests on a coherent set of basic tools and mechanisms; CMS tools (ORCA, FAMOS, OSCAR, COBRA, POM tools) and the GRID connect to the distributed data store & computing infrastructure; software development and installation complete the picture.]

Page 29:

Simulation, Reconstruction & Analysis Software System

[Diagram: physics modules (reconstruction algorithms, data monitoring, event filter, physics analysis) plug into the generic application framework and specific frameworks built on basic services, handling calibration, event and configuration objects; adapters and extensions connect to an ODBMS, Geant3/4, CLHEP, a Paw replacement, the C++ standard library and an extension toolkit. With Grid-aware data products this becomes a Grid-enabled application framework, uploadable on the Grid.]

Page 30:

Conclusions

CMS considers the Grid the enabling technology for the effective deployment of a coherent and consistent data processing environment; this is the only basis for an efficient physics analysis program at the LHC.

The “Spring 2002” production has just finished successfully; distributed analysis has started, and making use of grid middleware is the next milestone.

CMS is engaged in an active development, test and deployment program covering all software and hardware components that will constitute the future LHC grid.

