Distributed Analysis K. Harrison LHCb Collaboration Week, CERN, 1 June 2006
Page 1: Distributed Analysis K. Harrison LHCb Collaboration Week, CERN, 1 June 2006.

Distributed Analysis

K. Harrison
LHCb Collaboration Week, CERN, 1 June 2006

Page 2: Distributed Analysis K. Harrison LHCb Collaboration Week, CERN, 1 June 2006.


Aims of distributed analysis

• Physicist defines job to analyse (large) dataset(s)
• Use distributed resources (computing Grid)
• LHCb distributed-analysis system based on LCG (Grid infrastructure), DIRAC (workload management) and Ganga (user interface)

[Diagram: a single job is submitted, the workload is distributed over Subjob 1, Subjob 2, Subjob 3, ... Subjob n, and the combined output is returned]
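The splitting idea can be sketched in a few lines of plain Python (illustrative only, not the actual LHCb tools; the file names and counts are made up): the input files are grouped into chunks, the same analysis runs on each chunk as a subjob, and the per-subjob outputs are combined at the end.

# Illustrative sketch only: split an input dataset into subjobs by files
def split_into_subjobs(files, files_per_subjob):
    """Group the input files into chunks, one chunk per subjob."""
    return [files[i:i + files_per_subjob]
            for i in range(0, len(files), files_per_subjob)]

def combine_outputs(subjob_results):
    """Combine per-subjob results (here simply summed event counts)."""
    return sum(subjob_results)

files = ["file%03d.dst" % n for n in range(100)]          # hypothetical file names
subjobs = split_into_subjobs(files, files_per_subjob=10)  # 10 subjobs of 10 files each
print(len(subjobs), "subjobs of", len(subjobs[0]), "files each")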

Page 3: Distributed Analysis K. Harrison LHCb Collaboration Week, CERN, 1 June 2006.


LHCb computing model

• Baseline solution: analysis at Tier-1 centres
• Analysis at Tier-2 centres not in baseline solution, but not ruled out

[Diagram: LHCb computing model, showing Tier-1 and Tier-2 centres]

Page 4: Distributed Analysis K. Harrison LHCb Collaboration Week, CERN, 1 June 2006.


DIRAC submission to LCG: Pilot Agents

[Diagram: DIRAC components (JobReceiver, Data Optimiser, Matcher, JobDB, TaskQueue, Agent Director, Agent Monitor) feeding Pilot Agents through the LCG WMS to a Computing Resource, with the LFC consulted for data location]

• Data Optimiser queries the Logical File Catalogue (LFC) to identify sites for job execution
• Agent Director submits Pilot Agents for jobs in the waiting state
• Agent Monitor tracks Agent status, and triggers further submission as needed (the pilot-agent idea is sketched below)
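The pilot-agent pattern can be sketched roughly as follows (plain Python, not DIRAC's actual code; all class, function and field names are illustrative): a pilot is submitted for each waiting job, and only once a pilot is running on a worker node does it pull a matching job from the central task queue.

# Sketch of the pilot-agent idea: pilots pull matching work from a task queue
class TaskQueue:
    def __init__(self):
        self.waiting = []                         # jobs in "waiting" state

    def add(self, job):
        self.waiting.append(job)

    def match(self, site):
        """Hand out the first waiting job whose data is visible at 'site'."""
        for job in self.waiting:
            if site in job["candidate_sites"]:    # sites found via the file catalogue
                self.waiting.remove(job)
                return job
        return None

def pilot_agent(site, queue):
    """What a pilot might do once it is running on a worker node."""
    job = queue.match(site)
    if job is None:
        return "no matching work"                 # pilot exits harmlessly
    return "ran " + job["name"]                   # placeholder for the real payload

queue = TaskQueue()
queue.add({"name": "DaVinci analysis", "candidate_sites": ["CERN", "RAL"]})
print(pilot_agent("RAL", queue))                  # -> ran DaVinci analysis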

Page 5: Distributed Analysis K. Harrison LHCb Collaboration Week, CERN, 1 June 2006.


DIRAC submission to LCG: Bond Analogy

[Same workflow as the previous slide, illustrated with a Bond-themed graphic]

• Data Optimiser queries the Logical File Catalogue (LFC) to identify sites for job execution
• Agent Director submits Pilot Agents for jobs in the waiting state
• Agent Monitor tracks Agent status, and triggers further submission as needed

Page 6: Distributed Analysis K. Harrison LHCb Collaboration Week, CERN, 1 June 2006.


Ganga job abstraction

• A job in Ganga is constructed from a set of building blocks, not all required for every job:
  – Application: what to run
  – Backend: where to run
  – Input Dataset: data read by the application
  – Output Dataset: data written by the application
  – Splitter: rule for dividing the job into subjobs
  – Merger: rule for combining outputs
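Put together, a job built from these blocks might look roughly like this in CLIP (syntax as on the later slides; the inputdata value and the splitter and merger names are assumptions for illustration, not necessarily the real plugin names):

j = Job( application = "DaVinci", backend = "Dirac" )   # what to run / where to run
j.application.optsfile = "myOpts.txt"                    # job options (as on the CLIP slide)
j.inputdata = [ "LFN:/lhcb/production/file1.dst" ]       # data read by the application (hypothetical LFN)
j.splitter = "SplitByFiles"                              # rule for dividing into subjobs (assumed name)
j.merger = "SmartMerger"                                 # rule for combining outputs (assumed name)
j.submit()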

Page 7: Distributed Analysis K. Harrison LHCb Collaboration Week, CERN, 1 June 2006.


Framework for plugin handling

• Ganga provides a framework for handling different types of Application, Backend, Dataset, Splitter and Merger, implemented as plugin classes

• Each plugin class has its own schema

[Class diagram: GangaObject is the base class; the plugin interfaces IApplication, IBackend, IDataset, ISplitter and IMerger derive from it. Example plugins and schemas: DaVinci (user attributes: version, cmt_user_path, masterpackage, optsfile, extraopts) and Dirac (user attribute: CPUTime; system attributes: destination, id, status)]
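The plugin pattern can be illustrated with a minimal sketch (plain Python, not Ganga's real internals): a common base class, one interface per building block, and a per-plugin schema listing its attributes. All values shown are hypothetical.

# Minimal sketch of the plugin idea: base class, interfaces, per-plugin schema
class GangaObject:
    schema = {}                      # attribute name -> default value

    def __init__(self, **kwargs):
        for name, default in self.schema.items():
            setattr(self, name, kwargs.get(name, default))

class IApplication(GangaObject):     # interface: what to run
    pass

class IBackend(GangaObject):         # interface: where to run
    pass

class DaVinci(IApplication):         # example application plugin
    schema = {"version": None, "cmt_user_path": None, "masterpackage": None,
              "optsfile": None, "extraopts": None}

class Dirac(IBackend):               # example backend plugin
    schema = {"CPUTime": None,                                   # user-settable
              "destination": None, "id": None, "status": None}   # set by the system

app = DaVinci(version="v12r15", optsfile="myOpts.txt")   # hypothetical values
print(app.version, app.optsfile)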

Page 8: Distributed Analysis K. Harrison LHCb Collaboration Week, CERN, 1 June 2006.


Ganga Command-Line Interface in Python (CLIP)

• CLIP provides interactive job definition and submission from an enhanced Python shell (IPython)
  – Especially good for trying things out, and understanding how the system works

  # List the available application plug-ins
  list_plugins("application")
  # Create a job for submitting DaVinci to DIRAC
  j = Job(application="DaVinci", backend="Dirac")
  # Set the job-options file
  j.application.optsfile = "myOpts.txt"
  # Submit the job
  j.submit()
  # Search for string in job's standard output
  !grep "Selected events" $j.outputdir/stdout

Page 9: Distributed Analysis K. Harrison LHCb Collaboration Week, CERN, 1 June 2006.


Ganga scripting

• From the command line, a script myScript.py can be executed in the Ganga environment using: ganga myScript.py
  – Allows automation of repetitive tasks

• Scripts for basic tasks included in distribution:

  # Create a job for submitting Gauss to DIRAC
  ganga make_job Gauss DIRAC test.py
  # Edit test.py to set Gauss properties, then submit job
  ganga submit test.py
  # Query status, triggering output retrieval if job is completed
  ganga query

Approach similar to the one typically used when submitting to a local batch system

Page 10: Distributed Analysis K. Harrison LHCb Collaboration Week, CERN, 1 June 2006.


Ganga Graphical User Interface (GUI)

• GUI consists of central monitoring panel and dockable windows

• Job definition based on mouse selections and field completion

• Highly configurable: choose what to display and how

[Screenshot: GUI showing the Job builder, Job details, Logical Folders, Scriptor, Job Monitoring panel and Log window]

Page 11: Distributed Analysis K. Harrison LHCb Collaboration Week, CERN, 1 June 2006.


Shocking News!

• LHCb Distributed Analysis system is working well
• DIRAC and Ganga providing complementary functionality
• People with little or no knowledge of Grid technicalities are using the system for physics analysis
• More than 75 million events processed in past three months
• Fraction of jobs completing successfully averaging about 92%
• Extended periods with success rate >95%

"How can this be happening?"  "Did he say 75 million?"  "Who's doing this?"

Page 12: Distributed Analysis K. Harrison LHCb Collaboration Week, CERN, 1 June 2006.


Beginnings of a success story

• 2nd LHCb-UK Software Course held at Cambridge, 10th-12th January 2006

• Half day dedicated to Distributed Computing: presentations and 2 hours of practical sessions
  – U.Egede: Distributed Computing & Ganga
  – R.Nandakumar: UK Tier-1 Centre
  – S.Paterson: DIRAC
  – K.Harrison: Grid submission made simple

• Made clear to participants a number of things
  – Tier-1 centres have a lot of resources
  – Easy to submit jobs to Grid using Ganga
  – DIRAC ensures high success rate

Distributed analysis not just possible in theory but possible in practice

Photographs by P.Koppenburg

Page 13: Distributed Analysis K. Harrison LHCb Collaboration Week, CERN, 1 June 2006.


Cambridge pioneers of distributed analysis

• C.Lazzeroni: B+ → D0(KS0 π+π−)K+
• J.Storey: Flavour tagging with protons
• Project students:
  – M.Dobrowolski: B+ → D0(KS0 K+K−)K+
  – S.Kelly: B0 → D+D− and Bs0 → Ds+Ds−
  – B.Lum: B0 → D0(KS0 π+π−)K*0
  – R.Dixon del Tufo: Bs0
  – A.Willans: B0 → K*0 +−
• R.Dixon del Tufo had previous experience of Grid, Ganga and HEP software
• Others encountered these for first time at LHCb-UK software course

"Cristina decided she preferred Cobra to Python"

Photograph by A.Buckley
CHEP06, Mumbai

Page 14: Distributed Analysis K. Harrison LHCb Collaboration Week, CERN, 1 June 2006.


Work model (1)

• Usual strategy has been to develop/test/tune algorithms using signal samples and small background samples on local disks, then process (many times) larger samples (>700k events) on Grid
• Used pre-GUI version of Ganga, with job submission performed using Ganga scripting interface
  – Users need only look at the few lines for specifying DaVinci version, master package, job options and splitting requirements
  – Splitting parameters are files per job and maximum total number of files (very useful for testing on a few files); a sketch of such a script follows this list
  – Script-based approach popular with both new users (very little to remember) and experienced users (similar to what they usually do to submit to a batch system)
  – Jobs submitted to both DIRAC and local batch system (Condor)
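A script of the kind described above might look roughly as follows (a hedged sketch only: the DaVinci version, master package, options file, and the splitter plugin and attribute names are all assumptions, not the actual LHCb script distributed with Ganga):

# Sketch of a submission script: user edits only a few lines
j = Job( application = "DaVinci", backend = "Dirac" )    # or a local batch backend such as Condor
j.application.version       = "v12r15"                   # hypothetical DaVinci version
j.application.masterpackage = "PhysSel"                  # hypothetical master package
j.application.optsfile      = "myAnalysis.opts"          # hypothetical options file

# Splitting requirements: files per subjob, plus a cap on the total number
# of files (handy for quick tests on just a few files)
j.splitter = "SplitByFiles"                              # assumed plugin name
j.splitter.filesPerJob = 20                              # assumed attribute names
j.splitter.maxFiles    = 100

j.submit()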

Page 15: Distributed Analysis K. Harrison LHCb Collaboration Week, CERN, 1 June 2006.


Work model (2)

• Interactive Ganga session started to have status updates and output retrieval
• DIRAC monitoring page also used for checking job progress
• Jobs usually split so that output files were small enough to be returned in sandbox (i.e. retrieved automatically by Ganga)
• Large outputs placed on CERN storage element (CASTOR) by DIRAC
  – Outputs retrieved manually using LCG transfer command (lcg-cp) and logical-file name given by DIRAC (see the sketch after this list)
• Hbook files merged in Ganga framework using GPI script:
  – ganga merge 16,27,32-101 myAnalysis.hbook
• ROOT files merged using standalone ROOT script (from C.Jones)
• Excellent support from S.Paterson and A.Tsaregorodtsev for DIRAC problems/queries, and from M.Bargiotti for LCG catalogue problems
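Manual retrieval of such an output might look roughly like this (a sketch only: the LFN is hypothetical and the exact lcg-cp options in use at the time may have differed):

# Sketch of retrieving a large output placed on a storage element by DIRAC
import os
import subprocess

lfn  = "/grid/lhcb/user/k/kharrison/myAnalysis.root"     # hypothetical LFN given by DIRAC
dest = os.path.join(os.getcwd(), "myAnalysis.root")      # where to put the local copy

subprocess.call(["lcg-cp", "--vo", "lhcb",
                 "lfn:" + lfn, "file:" + dest])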

Page 16: Distributed Analysis K. Harrison LHCb Collaboration Week, CERN, 1 June 2006.


Example plots from jobs run on distributed-analysis system

• J.Storey: Flavour tagging with protons
  – Analysis run on 100k Bs → J/ψ tagHLT events
• C.Lazzeroni: Evaluation of background for B+ → D0(KS0 π+π−)K+
  – Analysis run on 400k B+ → D0(KS0 π+π−)K*0 events

Results presented at CP Measurements WG meeting, 16 March 2006

Page 17: Distributed Analysis K. Harrison LHCb Collaboration Week, CERN, 1 June 2006.


Project reports

• R.Dixon del Tufo: Bs0
• M.Dobrowolski: B+ → D0(KS0 K+K−)K+
• B.Lum: B0 → D0(KS0 π+π−)K*0
• A.Willans: B0 → K*0 +−
• S.Kelly: B0 → D+D− and Bs0 → Ds+Ds−

• Reports make extensive use of results obtained using distributed-analysis system, especially for background estimates

• Aim to have all reports turned into LHCb notes

Page 18: Distributed Analysis K. Harrison LHCb Collaboration Week, CERN, 1 June 2006.


Job statistics (1)

DIRAC job state:   outputready   stalled   failed   other   all
Number of jobs:    5036          127       257      68      5488

• Statistics taken from DIRAC monitoring page for analysis jobs submitted from Cambridge (user ids: cristina, deltufo, kelly, lum, martad, storey, willans) between 20 February 2006 (week after CHEP06) and 15 May 2006

• Estimated success rate: outputready/all = 5036/5488 = 92%
• Individual job typically processes 20 to 40 files of 500-1000 events each
  – Estimated number of events successfully processed: 30 × 500 × 5036 ≈ 7.55 × 10^7
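A quick arithmetic check of these figures (plain Python, using the mid-range values of 30 files per job and 500 events per file quoted above):

# Check the success rate and the estimated number of events processed
jobs_ok, jobs_all = 5036, 5488
print("success rate: %.0f%%" % (100.0 * jobs_ok / jobs_all))        # ~92%
files_per_job, events_per_file = 30, 500
print("events: %.2e" % (files_per_job * events_per_file * jobs_ok)) # ~7.55e7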

Page 19: Distributed Analysis K. Harrison LHCb Collaboration Week, CERN, 1 June 2006.


Job statistics (2)

• Stalled jobs: 127/5488 = 2.3%
  – Proxy expires before job completes
    • Problem essentially eliminated by having Ganga create proxy with long lifetime
  – Problems accessing data?
• Failed jobs: 257/5488 = 4.7%
  – 73 failures where input data listed in bookkeeping database (and physically at CERN), but not in LCG file catalogue
    • Files registered by M.Bargiotti, then jobs ran successfully
  – 115 failures 7-20 April because of transient problem with DIRAC installation of software (associated with upgrade to v2r10)

Excluding above failures, job success rate is: 5036/5300 = 95%

Page 20: Distributed Analysis K. Harrison LHCb Collaboration Week, CERN, 1 June 2006.


Conclusions

• LHCb distributed-analysis system is being successfully used for physics studies
• Ganga makes the system easy to use
• DIRAC ensures system has high efficiency
• Extended periods with job success rate >95%
• More than 75 million events processed in past three months
• Working on improvements, but this is already a useful tool
• To get started using the system, see user documentation on Ganga web site: http://cern.ch/ganga

"He did say 75 million!"

