Date post: 16-Jan-2016
Page 1: Computing and LHCb. Raja Nandakumar. The LHCb experiment: Universe is made of matter. Still not clear why. Andrei Sakharov’s theory of CP violation.

Computing and LHCb

Raja Nandakumar

Page 2

The LHCb experiment

Universe is made of matter
  Still not clear why
  Andrei Sakharov’s theory of CP violation
Study CP violation
  Indirect evidence of new physics
There are many other questions (of course)
The LHCb experiment has been built
  Hope to answer some of these questions

Page 3

The LHCb detector

February 2002
Cavern ready for detector installation
August 2008

Page 4

How the data looks

Page 5

The detector records …

>1 million channels of data every bunch crossing
  25 ns between bunch crossings
Trigger reduces this to about 2000 events/sec
  ~7 million events/hour
25 KB raw event size
  4.3 TB/day
    Not as much as ATLAS / CMS but still …
  Assuming continuous operation
    Breaks for fills, etc.
These events will need to be farmed out of CERN
  Reconstructed and stripped at Tier-1s
  Then replicated to all LHCb Tier-1 sites
  Finally available for user analysis
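
The rates quoted on this slide can be cross-checked with a few lines of arithmetic. The sketch below assumes the 25 KB figure is the size of one raw event (the only reading that reproduces 4.3 TB/day) and uses decimal units (1 TB = 10^9 KB); the function names are illustrative.

```python
# Back-of-the-envelope check of the data-rate figures quoted on the slide.
TRIGGER_RATE_HZ = 2000     # events per second after the trigger (from the slide)
RAW_EVENT_SIZE_KB = 25     # assumed to be KB per event, not KB per second
SECONDS_PER_HOUR = 3600
SECONDS_PER_DAY = 86_400


def events_per_hour(rate_hz: float) -> float:
    """Events written per hour of continuous running."""
    return rate_hz * SECONDS_PER_HOUR


def terabytes_per_day(rate_hz: float, event_size_kb: float) -> float:
    """Raw data volume per day of continuous running, in decimal TB."""
    return rate_hz * event_size_kb * SECONDS_PER_DAY / 1e9


if __name__ == "__main__":
    print(f"{events_per_hour(TRIGGER_RATE_HZ) / 1e6:.1f} million events/hour")
    print(f"{terabytes_per_day(TRIGGER_RATE_HZ, RAW_EVENT_SIZE_KB):.2f} TB/day")
```

Under these assumptions the numbers come out at 7.2 million events/hour and 4.32 TB/day, consistent with the rounded values on the slide. Real running is not continuous, so the actual daily volume would be lower.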

Page 6

The LHCb computing model

[Diagram of the data flow, labelled CERN. Stages and file types as labelled: Production (T2/T1/T0), simulation + digitization, producing .digi; Reconstruction (T1/T0), .digi to .rdst; Stripping (T1/T0), .rdst to .dst; .dst transferred via FTS to T1/T0; User Analysis (T1/T0).]

LHCb job submission Computing distributed all over the

world Particle physics is collaborative across

institutes in various nations Both cpu, storage available at various sites

Welcome to the world of grid computing Take advantage of distributed resources Set up a framework for other disciplines

alsoFault tolerant job execution.Also used by Medicine, Chemistry, Space

science, … LHCb interface : DIRAC

Page 8

What the user sees …

Submit job to the “grid”
  Ganga (ATLAS/LHCb)
  Sometimes needs a lot of persuasion
Usually the job comes back successful
On occasion problems are seen
  Frequently wrong parameters, code, …
  Correct and resubmit

Page 9

What the user does not see …

Page 10

Requirements of DIRAC

Fault tolerance
  Retries
  Duplication
  Failover
  Caching
  Watchdogs
  Logs
  Thread safety
Guard against possible grid problems …
  Network, timeouts
  Drive failures
  Systems hacked
  Bugs in code
  Overloaded machine, service
  Fire, cooling problems
If it cannot go wrong, it still will
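
Two of the requirements listed here, retries and failover, can be illustrated together. The following is a minimal sketch of the general pattern, not DIRAC's actual implementation or API: `call_with_failover`, its parameters, and the endpoint names are all hypothetical.

```python
import time
from typing import Callable, Optional, Sequence, TypeVar

T = TypeVar("T")


def call_with_failover(
    endpoints: Sequence[str],
    operation: Callable[[str], T],
    retries_per_endpoint: int = 3,
    base_delay_s: float = 0.0,
) -> T:
    """Try each endpoint in order; retry a few times before failing over."""
    last_error: Optional[Exception] = None
    for endpoint in endpoints:
        for attempt in range(retries_per_endpoint):
            try:
                return operation(endpoint)
            except Exception as exc:  # broad on purpose: any failure counts
                last_error = exc
                # Exponential backoff between attempts (no wait if base is 0).
                time.sleep(base_delay_s * (2 ** attempt))
    raise RuntimeError("all endpoints failed") from last_error


if __name__ == "__main__":
    def flaky(endpoint: str) -> str:
        # Hypothetical service call: the primary is down, the backup works.
        if endpoint == "primary":
            raise ConnectionError("primary is down")
        return f"ok from {endpoint}"

    print(call_with_failover(["primary", "backup"], flaky))
```

The retry loop absorbs transient faults such as timeouts, while the outer loop handles a service that stays down. Retrying is only safe when the operation can be repeated without side effects, which is an assumption of this sketch.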

Page 11

Submitting jobs on the grid

Two ways of submitting jobs
Push jobs out to a site’s batch system
  The grid is a simple multiple batch system
  Job waits at the site until it runs
    Lose control of jobs when they leave us (LHCb)
    Many things can change in the time between job submission and running
  We only see the batch systems / queues
    We do not see the status of the grid in real time
  Cause of low success rate (previous experience)
    Load on site
    Site temporary downtime
    Change in job priority within the experiment
Pull jobs into the site
  Pilot jobs

Page 12

Pilot jobs

“Wrapper” jobs
  Submitted to a site
    If site is available, free & there are waiting jobs
  Pilot job returns information at current time
    Job may have resource requirements too …
  Look at local environment and request job from DIRAC
    DIRAC returns job with highest priority matching available resource
      Internal job prioritisation within DIRAC
      Has latest information on experiment priorities
    Exit after a short delay if no matching job found
Have fine-grained (level of worker node) view of the grid
Very high job success rate
Pioneered by LHCb
Very simple requirements for sites
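
The pull model described on this slide can be sketched as a central queue that hands the highest-priority matching job to whichever pilot asks. This is an illustration under stated assumptions, not DIRAC code: the classes `Job`, `WorkerResources`, and `TaskQueue`, the matching criteria (CPU time and platform), and the site and platform strings are all hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Job:
    job_id: int
    priority: int                    # higher value is served first
    min_cpu_hours: float = 0.0       # resource requirement of the job
    platform: Optional[str] = None   # None means "runs anywhere"


@dataclass
class WorkerResources:
    """What a pilot reports about the worker node it landed on."""
    site: str
    cpu_hours_left: float
    platform: str


class TaskQueue:
    """Central queue: jobs wait here, not in the sites' batch systems."""

    def __init__(self):
        self._waiting = []

    def submit(self, job: Job) -> None:
        self._waiting.append(job)

    def match(self, res: WorkerResources) -> Optional[Job]:
        """Return and remove the highest-priority job that fits, if any."""
        candidates = [
            j for j in self._waiting
            if j.min_cpu_hours <= res.cpu_hours_left
            and j.platform in (None, res.platform)
        ]
        if not candidates:
            return None
        best = max(candidates, key=lambda j: j.priority)
        self._waiting.remove(best)
        return best


def run_pilot(queue: TaskQueue, res: WorkerResources) -> str:
    """One pilot: inspect the node, ask for work, run it or exit."""
    job = queue.match(res)
    if job is None:
        return "no matching job, pilot exits"
    return f"running job {job.job_id} at {res.site}"


if __name__ == "__main__":
    queue = TaskQueue()
    queue.submit(Job(1, priority=1))
    queue.submit(Job(2, priority=5, min_cpu_hours=10))
    queue.submit(Job(3, priority=3, platform="slc4"))
    node = WorkerResources(site="SITE_A", cpu_hours_left=2, platform="slc4")
    print(run_pilot(queue, node))
```

Because matching happens only when a pilot is already running on a node, the decision uses the priorities and resources as they are at that moment, which is the point the slide makes about late binding.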

Page 13

Does all on previous slide
Refinements still needed (as always)
  Job prioritisation still static
    Dynamic job prioritisation on the way
  Basic logs all in place
    Not everything easy to view for user / shifter
    Being improved
  More improvements in resilience upcoming
DIRAC portal: http://lhcbweb.pic.es
  All needed information for LHCb users
    Locating data, job monitoring, …
  Restricted information for outsiders
    Grid privacy issues
Ganga + DIRAC the only official LHCb grid interface
  Will support any reasonable use case

Page 14

Successes …

A single machine is the DIRAC server
  No particular load issues seen

Page 15

Analysis also going on

Comparison of different Monte Carlo

Page 16

The occasional problem

Black hole worker nodes
  Bad environment that cannot match jobs
    Sink for our pilot jobs
  Once a sink for production jobs also
    Migration from SL3 to SL4
  Introduce short sleep time before pilot exits
DoS attack on CERN servers
  Software being downloaded from CERN
    Was done if software was not available locally
  Now users do not install software
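
The black hole remedy mentioned on this slide, a short sleep before an idle pilot exits, can be sketched as follows. A node that never matches a job would otherwise finish each pilot almost instantly and be handed the next one by the batch system; the delay caps how fast it can drain them. The function names and the 60-second default are assumptions, since the slide gives no value.

```python
import time
from typing import Any, Callable, Optional


def pilot_main(
    request_job: Callable[[], Optional[Any]],
    run_job: Callable[[Any], None],
    idle_exit_delay_s: float = 60.0,          # hypothetical default
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Ask for a job; if none matches, wait a little before exiting."""
    job = request_job()
    if job is None:
        # Holding the batch slot briefly limits how many pilots a
        # misconfigured node can swallow per hour.
        sleep(idle_exit_delay_s)
        return "exited idle"
    run_job(job)
    return "job done"


def max_pilots_per_hour(startup_s: float, idle_exit_delay_s: float) -> float:
    """Upper bound on pilots one batch slot can consume when nothing matches."""
    return 3600.0 / (startup_s + idle_exit_delay_s)


if __name__ == "__main__":
    print(max_pilots_per_hour(startup_s=1.0, idle_exit_delay_s=0.0))
    print(max_pilots_per_hour(startup_s=1.0, idle_exit_delay_s=59.0))
```

The `sleep` parameter is injected only so the behaviour can be exercised without actually waiting; a real pilot would simply call `time.sleep`.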

Page 17

We do not understand …

Very, very preliminary
  Still working on understanding this
“Same” class of CPUs at different sites
CPU time scaled median for the CPU class

Page 18

Now over to ATLAS …

