KANGA: ROOT Access to BABAR Data for Physics Analysis

David Kirkby, UC Irvine
for the BABAR Computing Group

CHEP ‘03 - Data Management & Persistency
25 March 2003

Primary Reference:

T.J.Adye, A.Dorigo, R.Dubitzky, A.Forti, S.J.Gowdy, G.Hamel de Monchenault, R.G.Jacobsen, D.Kirkby, S.Kluth, E.Leonardi, A.Salnikov, L.Wilden, Comp. Phys. Comm. 150, p.197-214 (2003).

The BABAR Experiment

The BABAR experiment records e+e- collisions at the SLAC PEP-II collider.

BABAR has ~600 collaborators from 77 institutions in 10 countries. Approximately half are from US institutions.

The BABAR Detector

The BABAR detector has ~200k channels read out at ~100 Hz into a typical raw-data event size of 25 kB.

The experiment wrote ~300 TB to tape for the ~40/fb recorded during 2001, with ~10 TB kept on disk at SLAC.

Projected luminosity increases will deliver an integrated ~500/fb by end of 2006.

BABAR Physics Analysis and Data Access

BABAR has published ~36 physics papers since Feb 2001.

The typical physics analysis only needs access to a “micro-DST” for sparse subsets of data and Monte Carlo.

[Figure: BABAR data tiers, labeled Raw/Simulated hit data, Reconstructed data, Event summary data, Analysis objs., Tag, and Monte Carlo truth data, with the “Micro-DST” (incl. truth subset) marked. Per-event size labels in the figure: ~0.7 kB, ~3.0 kB, ~8.5 kB, ~120 kB, ~53 kB, ~15 kB.]

Until 1999, data was stored exclusively in an Objy database (now >750 TB). Raw, Sim & Reco are no longer kept.

BABAR Analysis Framework

BABAR analysis uses a standard software framework:

• Begin/NextEvent/Finalize transitions.

• Each transition is passed through a sequence of execution “modules” with a common base class.

• Special modules handle data I/O and the conversion between persistent & transient obj. representations.

• User modules deal only with transient object representations.

• Data access is handled differently for event- and non-event (“conditions”) sources.

This framework design completely decouples the reconstruction and analysis code from the data store technology, at some cost in performance.
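
The module structure described on this slide can be sketched as follows. This is a minimal illustration in Python, not BABAR framework code; the names `Module`, `begin`, `next_event`, `finalize`, `InputModule` and `CountTracks`, and the dictionary used as an event, are all assumptions made for the example.

```python
class Module:
    """Common base class: one hook per framework transition."""

    def begin(self):
        pass

    def next_event(self, event):
        pass

    def finalize(self):
        pass


class InputModule(Module):
    """I/O module: converts persistent records into transient objects."""

    def __init__(self, persistent_records):
        self._records = iter(persistent_records)

    def next_event(self, event):
        record = next(self._records)
        # Persistent -> transient conversion happens only here.
        event["tracks"] = [{"p": p} for p in record["track_momenta"]]


class CountTracks(Module):
    """User module: sees only the transient representation."""

    def begin(self):
        self.total = 0

    def next_event(self, event):
        self.total += len(event["tracks"])


def run(sequence, n_events):
    """Pass each transition through the module sequence in order."""
    for module in sequence:
        module.begin()
    for _ in range(n_events):
        event = {}
        for module in sequence:
            module.next_event(event)
    for module in sequence:
        module.finalize()


# Hypothetical persistent records standing in for a data store.
records = [{"track_momenta": [1.2, 0.4]}, {"track_momenta": [2.5]}]
counter = CountTracks()
run([InputModule(records), counter], n_events=len(records))
print(counter.total)
```

Replacing `InputModule` with a module that reads a different store would leave `CountTracks` untouched, which is the kind of decoupling the slide describes.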

Motivation for KANGA

An Aug 99 review of BABAR Computing examined challenges involved in producing first physics results under conference deadline pressure.

Access to data, both at SLAC and at remote sites, was identified as a critical bottleneck in physics analysis.

Objectivity (Objy) performance problems were recognized as a weakness of the computing model at the time, in particular the limitations imposed by large files (~2 GB for analysis data) and poor lock-server scaling with many (~100) clients.

Review committee recommended that BABAR develop a “limited-function short-to-medium term solution”…

KANGA Design Requirements

This recommendation led to the following design requirements:

1. Access to the identical micro-DST data available from Objy. No support for access to lower-level data.

2. Compatible with existing framework and user analysis code. Changes almost transparent to analysis users (relink required).

3. Fast event filtering using simple “attributes” (TAG) data.

4. Simple and efficient distribution of data to remote (non-SLAC) sites.

The Implementation: KANGA (ROO)

Kind ANd Gentle Analysis (without Relying On Objectivity)

The key technical decision was to use ROOT objects and files for persistent data store.

In general, there are many tradeoffs involved in the Objy/ROOT decision.

Our decision was made in the context of a limited-function, short-term solution that would enhance the capabilities of a continuing Objy data store, and that could be completed quickly.

KANGA was implemented and deployed in ~4 months by a small (~5) team in 1999.

Event Data: Overview

KANGA event data is stored in ROOT TTree objects. Each branch represents a small set of persistent classes with one branch instance per event.

Events from one run are usually grouped into a single file containing 2 trees (Analysis objs, Tag attributes). Typical size is ~1.7 kB per event for data (21.6 GB per /fb) and 4.7 kB for Monte Carlo. Tag attributes are stored as built-in types.

[Figure: layout of a KANGA file (~10^6 of these now): an Analysis Objs tree with branches class-1 … class-n and a Tag attributes tree with branches attr-1 … attr-m.]
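
The two-tree layout lends itself to a read pattern in which the small tag entry is examined first and the larger analysis objects are loaded only for events that pass. The sketch below illustrates that pattern with plain Python lists standing in for the two trees; it does not use the ROOT API, and the attribute names, the event contents and the `load_analysis_objects` helper are assumptions made for illustration.

```python
# Two parallel "trees" with one entry per event (stand-ins for ROOT TTrees).
tag_tree = [
    {"total_energy": 10.4, "n_muons": 0},
    {"total_energy": 9.8, "n_muons": 2},
    {"total_energy": 10.6, "n_muons": 1},
]
analysis_tree = [
    {"candidates": ["K+", "pi-"]},
    {"candidates": ["mu+", "mu-"]},
    {"candidates": ["mu+", "pi-", "pi+"]},
]

loads = 0  # counts how often the larger tree is actually read


def load_analysis_objects(index):
    """Hypothetical stand-in for the more expensive read of analysis objects."""
    global loads
    loads += 1
    return analysis_tree[index]


def select(tag_cut):
    """Yield analysis objects only for events whose tag passes the cut."""
    for index, tag in enumerate(tag_tree):
        if tag_cut(tag):
            yield index, load_analysis_objects(index)


selected = list(select(lambda tag: tag["n_muons"] >= 1))
print([index for index, _ in selected], loads)
```

Events that fail the tag cut never trigger a read of the analysis tree, so the cost of a sparse selection scales with the number of accepted events rather than with the file size.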

Event Data: Architecture

BABAR event data I/O is managed by special-purpose framework execution modules. Only those modules dealing directly with persistent analysis objects and Tag attributes were re-implemented for KANGA.

[Figure: framework module chains connecting the RAW and DST stores, built from InputModule, Reco.Module, OutputModule and AnalysisModule boxes.]

A significant factor in the rapid deployment of KANGA was the earlier design decision to completely decouple the event store technology from the analysis framework.

Event Data: Attribute Tags

The design requirement of fast selection on a sparse set of event attributes (total energy, # of muons, etc.) required a small compromise in the persistent/transient decoupling to gain improved efficiency.

Instead of converting attributes, use “adapter pattern” to implement transient interface directly in terms of persistent objects.

This compromise ties transient class directly to ROOT persistent class, but without exposing persistent class to user code.
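
A minimal sketch of the adapter idea, under stated assumptions: `TagAccessor`, `PersistentTagRow` and `PersistentTagAdapter` are hypothetical names, and a dictionary stands in for the ROOT persistent object. The point illustrated is that the transient interface reads through to the persistent data instead of copying it, while user code only ever sees the interface.

```python
from abc import ABC, abstractmethod


class TagAccessor(ABC):
    """Transient interface that user code programs against (hypothetical)."""

    @abstractmethod
    def get_float(self, name):
        ...

    @abstractmethod
    def get_int(self, name):
        ...


class PersistentTagRow:
    """Stand-in for the persistent tag record read from a file."""

    def __init__(self, values):
        self.values = values


class PersistentTagAdapter(TagAccessor):
    """Implements the transient interface directly on the persistent object.

    No transient copy is built; each call reads through to the stored values.
    """

    def __init__(self, row):
        self._row = row

    def get_float(self, name):
        return float(self._row.values[name])

    def get_int(self, name):
        return int(self._row.values[name])


def user_cut(tag: TagAccessor) -> bool:
    """User code depends on TagAccessor only, never on PersistentTagRow."""
    return tag.get_int("n_muons") >= 1 and tag.get_float("total_energy") > 10.0


row = PersistentTagRow({"total_energy": 10.6, "n_muons": 1})
print(user_cut(PersistentTagAdapter(row)))
```

The adapter class is tied to the persistent layout, which is the compromise the slide mentions, but `user_cut` would work unchanged with any other `TagAccessor` implementation.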

Event Data: Object References

Direct references (e.g., by pointer) between transient classes require special handling to be persisted.

Implemented general mechanism to support persistence of references between transient objects valid in a single execution context.

In practice, this limits references to be within an event and does not support inter-event references.

BABAR transient classes do not use direct references, and rely instead on indirect indexing. So this feature is not currently being exploited.
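
To illustrate why indirect indexing avoids the need for reference persistence, the sketch below stores references as integer indices into a list owned by the same event. The classes `Track`, `Candidate` and `Event` and the dictionary-based persistent form are assumptions for the example, not BABAR classes.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Track:
    momentum: float


@dataclass
class Candidate:
    # Indices into the same event's track list, instead of object pointers.
    track_indices: List[int]


@dataclass
class Event:
    tracks: List[Track]
    candidates: List[Candidate]

    def tracks_of(self, candidate):
        """Resolve the indirect references within this event only."""
        return [self.tracks[i] for i in candidate.track_indices]


def to_persistent(event):
    """Indices are plain integers, so they serialize without special handling."""
    return {
        "tracks": [t.momentum for t in event.tracks],
        "candidates": [c.track_indices for c in event.candidates],
    }


def from_persistent(record):
    """Rebuild the transient event; the indices remain valid as stored."""
    return Event(
        tracks=[Track(p) for p in record["tracks"]],
        candidates=[Candidate(list(ix)) for ix in record["candidates"]],
    )


event = Event([Track(1.2), Track(0.4), Track(2.5)], [Candidate([0, 2])])
restored = from_persistent(to_persistent(event))
print([t.momentum for t in restored.tracks_of(restored.candidates[0])])
```

Because an index is only meaningful relative to one event's list, this scheme is naturally limited to references within an event, matching the restriction noted above.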

Event Data: Schema Evolution

“Schema” describes the organization of data in a persistent object.

Schema evolution is desirable to support improvements in data representation and pruning of obsolete data.

ROOT I/O supports schema evolution for TObject subclasses via user-managed version numbers for each persistent class that are used to dispatch appropriate input-streamer code at obj-read time.

KANGA additionally requires updated classes to implement a standard (frozen) interface for persistent->transient conversion.
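
The combination of version-number dispatch and a frozen conversion interface can be sketched as below. This is not ROOT's streamer mechanism; it is a plain-Python illustration in which the class names, the `to_transient` method, the record layout and the MeV-to-GeV change between revisions are all invented for the example.

```python
from dataclasses import dataclass


@dataclass
class TransientCluster:
    """Transient class seen by user code; unchanged by schema evolution."""
    energy_gev: float


class ClusterP_v1:
    """Revision 1 of the persistent class: energy stored in MeV."""
    VERSION = 1

    def __init__(self, fields):
        self.energy_mev = fields["energy_mev"]

    def to_transient(self):  # frozen conversion interface
        return TransientCluster(self.energy_mev / 1000.0)


class ClusterP_v2:
    """Revision 2 of the persistent class: energy stored in GeV."""
    VERSION = 2

    def __init__(self, fields):
        self.energy_gev = fields["energy_gev"]

    def to_transient(self):  # same frozen interface
        return TransientCluster(self.energy_gev)


# All revisions stay available so that old files remain readable.
READERS = {cls.VERSION: cls for cls in (ClusterP_v1, ClusterP_v2)}


def read(record):
    """Dispatch on the stored version number, then convert to transient."""
    persistent = READERS[record["version"]](record["fields"])
    return persistent.to_transient()


old = read({"version": 1, "fields": {"energy_mev": 1500.0}})
new = read({"version": 2, "fields": {"energy_gev": 1.5}})
print(old == new)
```

User code receives a `TransientCluster` in both cases, so it does not need to know which revision was on disk.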

After schema evolution, only new objects are written by new code.

New and existing code must be linked against all versions of persistent classes. No change required to user modules.

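[Figure: “Before” and “After” panels showing persistent-class revisions on either side of the framework modules; labels: “Rev.1 Modules Rev.1”, “Rev.1 Modules Rev.2”, “Rev.2 Modules Rev.2”.]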

Conditions Data: Overview

Non-event data tracks slowly-varying (<1 Hz) data-taking conditions, e.g. high-voltages, gas flows, temperatures.

Calibration results are also considered “conditions”.

Conditions data is accessed using time as a key, unlike event data.

The full BABAR conditions DB is implemented in Objy and supports a flexible revision mechanism.

KANGA Conditions Data

KANGA supports access to the limited set of conditions needed for typical physics analysis.

Access is read-only and limited to a single revision.

The most recent revision of specific conditions is automatically extracted from Objy and stored in a single ROOT file of ~20 MB. Separate files are used for data and MC.

ROOT persistent implementation uses a binary tree (BTree class) for efficient time-key lookup with 1s resolution.
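
A time-keyed lookup of this kind can be illustrated with a sorted list and binary search. Note the substitution: the sketch uses Python's `bisect` module on a sorted list rather than a tree class, and the `ConditionsStore` name, the interval convention (each entry valid until the next one starts) and the payload contents are assumptions for the example.

```python
import bisect


class ConditionsStore:
    """Read-only, single-revision lookup keyed by time in whole seconds."""

    def __init__(self, intervals):
        # intervals: iterable of (start_time_s, payload); each payload is
        # assumed valid from its start time until the next interval begins.
        ordered = sorted(intervals, key=lambda item: item[0])
        self._starts = [int(start) for start, _ in ordered]
        self._payloads = [payload for _, payload in ordered]

    def lookup(self, time_s):
        """Return the payload whose validity interval contains time_s."""
        position = bisect.bisect_right(self._starts, int(time_s)) - 1
        if position < 0:
            raise KeyError(f"no conditions valid at t={time_s}")
        return self._payloads[position]


store = ConditionsStore([
    (1000, {"hv_volts": 1930}),
    (5000, {"hv_volts": 1960}),
])
print(store.lookup(4999)["hv_volts"], store.lookup(5000)["hv_volts"])
```

Each lookup costs O(log n) comparisons in the number of stored intervals, which is the property a tree-based index also provides.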

Correct association of event- and non-event ROOT files requires some non-trivial bookkeeping.

Event Collections

Physics analysis typically involves analyzing sparse subsets of the events in a data file, but different analyses require different subsets.

Sparse collections used for analysis are grouped into ~100 “skims”. Skims were initially written using self-contained copies of each event. Grouping correlated skims into ~20 “streams” limited event-duplication overhead to ~200%.

More recently, pointer-based collections were implemented. These are more efficient for bulk storage and distribution, but carry additional book-keeping overhead. Now moving in this direction.
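
The trade-off between self-contained skims and pointer-based collections can be made concrete with a toy example. Everything here is hypothetical: the skim names, the event ids and the in-memory dictionary standing in for the base event store.

```python
# Base event store: event id -> event payload (hypothetical contents).
events = {eid: {"id": eid, "size_kb": 1.7} for eid in range(10)}

# Which events each skim selects; overlapping selections are common.
selections = {
    "dimuon": [1, 2, 3, 7],
    "charm": [2, 3, 4],
    "tau": [3, 7, 9],
}

# Copy-based skims: every skim holds its own copy of each selected event.
copy_skims = {
    name: [dict(events[eid]) for eid in ids] for name, ids in selections.items()
}
copied = sum(len(skim) for skim in copy_skims.values())
unique = len({eid for ids in selections.values() for eid in ids})

# Pointer-based collections: store only event ids and resolve them on read.
pointer_skims = {name: list(ids) for name, ids in selections.items()}


def read_collection(name):
    """Follow the stored ids back to the single copy in the base store."""
    return [events[eid] for eid in pointer_skims[name]]


print(copied, unique)
```

In the copy-based layout, events selected by several skims are stored several times; the pointer-based layout stores each event once, but reading a collection then depends on the base store being present and correctly located, which is the book-keeping cost mentioned above.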

KANGA Book-keeping & Production

The set of available KANGA event-data files and their processing history is tracked in a relational DB managed with perl scripts (“SkimTools” package).

This DB is used to schedule and monitor jobs for producing KANGA files from Objy (as well as physics skims from unfiltered data and MC).

Users can query this database to prepare a TCL fragment that configures their analysis job to analyze a dataset.

Size of DB is ~400 MB. Tables and scripts are compatible with Oracle and MySQL.

Data Export

Straightforward and efficient data export was a primary requirement of the KANGA design.

Goals:

• Only transfer files that are new (once created, a file is assumed to never change).

• Mirror SLAC filesystem layout to simplify logical-to-physical name mapping between sites.

Initial implementation based on rsync was not efficient for typical directories containing O(1000) files.

Present implementation uses the relational DB to efficiently generate lists of new files to transfer.
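
A sketch of how a relational DB can produce the list of new files directly, without scanning directories. It uses an in-memory SQLite database purely as a stand-in; the table names `files` and `exported`, their columns, the site name and the file paths are all assumptions for illustration and are not taken from the SkimTools schema.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.executescript(
    """
    CREATE TABLE files (path TEXT PRIMARY KEY, created INTEGER);
    CREATE TABLE exported (path TEXT, site TEXT, PRIMARY KEY (path, site));
    """
)
db.executemany(
    "INSERT INTO files VALUES (?, ?)",
    [("run1/a.root", 100), ("run1/b.root", 110), ("run2/c.root", 200)],
)
db.execute("INSERT INTO exported VALUES ('run1/a.root', 'siteA')")


def new_files_for(site):
    """List files not yet exported to `site`.

    Files are assumed never to change once created, so the presence of a
    row in `exported` is enough; no per-file comparison is needed.
    """
    rows = db.execute(
        """
        SELECT f.path FROM files AS f
        LEFT JOIN exported AS e ON e.path = f.path AND e.site = ?
        WHERE e.path IS NULL
        ORDER BY f.created
        """,
        (site,),
    )
    return [path for (path,) in rows]


def mark_exported(site, paths):
    """Record a completed transfer so the files are not listed again."""
    db.executemany(
        "INSERT INTO exported VALUES (?, ?)", [(p, site) for p in paths]
    )


todo = new_files_for("siteA")
print(todo)
mark_exported("siteA", todo)
print(new_files_for("siteA"))
```

Keeping the stored paths relative and identical at every site corresponds to the mirrored filesystem layout named in the goals, so the same list can be used on both ends of a transfer.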

Experience and Outlook

Since May 2002, the primary KANGA event store has been based at Rutherford (RAL).

RAL currently stores 22 TB of data and Monte Carlo (~8B events) in 1.1M files.

A survey in early 2002 found that at least 19 institutions operated a local KANGA event store, including 5 with the majority of data available.

Head-to-head comparisons of analysis results obtained with KANGA and Objy provide a valuable QA tool.

Although conceived as a short-term solution, KANGA is still with us 3 years later.

Burden of duplicated support and storage is becoming unsustainable.

BABAR is now implementing a new Computing Model in which ROOT is the primary event store technology.

This migration involves the eventual complete phase-out of Objectivity from the event store, and possibly significant changes to the original KANGA design to support other features of the new Computing Model.

