Page 1: Database Access Elizabeth Gallas - Oxford - October 06, 2009 ATLAS Week - Barcelona, Spain What does a job need ? 1. Data (Events) 2. Database (Geometry,

Database Access

Elizabeth Gallas - Oxford -

October 06, 2009

ATLAS Week - Barcelona, Spain

What does a job need?
1. Data (Events)
2. Database (Geometry, Conditions)
3. Efficient I/O (sometimes across a network), CPU
4. (A Purpose and a) Place for Output

Needs:
1. Food
2. Water
3. Love
4. Place for output

Page 2:

Outline
- Overview of Oracle Databases in ATLAS
- Conditions Database
- Database replication
- Database Distribution Technologies
- Emphasis on “Frontier”
- Decision for grid-wide Frontier deployment
- What you need to know and do
- TAG DB: Architecture, Services, Resource planning
- Ongoing Work (on current topics)
- Summary and Conclusions

Insufficient time to describe many ongoing activities. Please see the presentations from the recent Software week:
http://indico.cern.ch/conferenceDisplay.py?confId=50976
But there has been a lot of activity since then as well!

Page 3:

Overview – Oracle usage in ATLAS
Oracle is used extensively: every stage of data taking & analysis

Configuration
- PVSS – Detector Control System (DCS) Configuration & Monitoring
- Trigger – Trigger Configuration (online and simulation)
- OKS – Configuration databases for the TDAQ
- Detector Description – Geometry

File and Job management
- T0 – Tier 0 processing
- DQ2/DDM – distributed file and dataset management
- Dashboard – monitor jobs and data movement on the ATLAS grid
- PanDA – workload management: production & distributed analysis

Dataset selection catalogue
- AMI (dataset selection catalogue)

Conditions data (non-event data for offline analysis)
- Conditions Database in Oracle
- POOL files in DDM (referenced from the Conditions DB)

Event summary - event-level metadata
- TAGs – ease selection of and navigation to events of interest

Page 4:

Thanks to Oracle Operations Support
Many applications share Oracle resources, which can affect operational capacity.
Supporting the Databases is critical!

Special thanks to:
CERN Physics Database Services
- Support Oracle-based services at CERN
- Coordinate Distributed Database Operations (WLCG 3D)
Tier-1 (Tier-2) DBAs and system managers
ATLAS DBAs: Florbela Viegas, Gancho Dimitrov
- Schedule/Apply Oracle interventions
- Advise us on Application development
- Coordinate database monitoring (experts, shifters)

Helping to develop, maintain and distribute this critical data takes specialized knowledge and considerable effort, which is frequently underestimated.

Page 5:

“Conditions”

“Conditions” – general term for information which is not ‘event-wise’, reflecting the conditions or states of a system. Conditions are valid for an interval ranging from very short to infinity.

Sets of Conditions can be versioned (a version is called a COOL Tag).

Any conditions data needed for offline processing and/or analysis must be stored in the ATLAS Conditions Database or in its referenced POOL files (DDM).

[Figure: ATLAS Conditions Database (any non-event-wise data needed for offline processing/analysis), with inputs labelled ZDC, DCS, TDAQ, OKS, LHC, DQM]
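The notion of conditions that are valid for an interval, and versioned by tag, can be sketched in a few lines. This is an illustration only: the `ConditionsFolder` class, its method names, and the integer time stamps are assumptions made for the example and are not the COOL API. Each payload is taken to be valid from its `since` value until the next entry stored under the same tag, or to infinity if there is none.

```python
import bisect

INFINITY = float("inf")  # open-ended validity


class ConditionsFolder:
    """Hypothetical in-memory folder: one sorted list of intervals per tag."""

    def __init__(self):
        self._since = {}     # tag -> sorted list of interval start times
        self._payload = {}   # tag -> payloads, in the same order

    def store(self, tag, since, payload):
        """Store a payload valid from `since` until the next stored entry."""
        starts = self._since.setdefault(tag, [])
        values = self._payload.setdefault(tag, [])
        i = bisect.bisect_left(starts, since)
        if i < len(starts) and starts[i] == since:
            values[i] = payload          # overwrite an existing interval start
        else:
            starts.insert(i, since)
            values.insert(i, payload)

    def lookup(self, tag, when):
        """Return (since, until, payload) valid at `when`, or None."""
        starts = self._since.get(tag, [])
        i = bisect.bisect_right(starts, when) - 1
        if i < 0:
            return None                  # unknown tag, or `when` precedes all data
        until = starts[i + 1] if i + 1 < len(starts) else INFINITY
        return starts[i], until, self._payload[tag][i]


folder = ConditionsFolder()
folder.store("example-tag-v1", 100, {"gain": 1.00})
folder.store("example-tag-v1", 200, {"gain": 1.02})
print(folder.lookup("example-tag-v1", 150))
print(folder.lookup("example-tag-v1", 250))
```

Asking for the same time under a different tag returns a different (or no) payload, which is what makes a computation reproducible once the tag is fixed.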

Page 6:

Conditions DB infrastructure in ATLAS
Relies on considerable infrastructure: COOL, CORAL, Athena (developed by ATLAS and CERN IT) -- generic schema design which can store / accommodate / deliver a large amount of data for a diverse set of subsystems.

Athena / Conditions DB
- Considerable improvements in R15 (TWiki: DeliverablesForRelease15)
  - More efficient use of COOL connections
  - Enabling of Frontier / Squid connections to Oracle
  - IOVDbSvc refinements
  - … and much more
- Continued refinement – by subsystem
- As we chip away at inefficient usage
- Thanks to sculpting
- Coordination by Richard Hawkings

COOL Tagging - distinct sets of Conditions making specific computations reproducible
- Coordination by Paul Laycock

Used at every stage of data taking and analysis
- From online calibrations, alignment, monitoring to offline … processing … more calibrations … further alignment … reprocessing … analysis … to luminosity and data quality

Page 7:

DB Access Software Components

[Figure: software components for access to database-resident data]

Page 8:

Oracle Distribution of Conditions data
Oracle stores a huge amount of essential data ‘at our fingertips’
- But ATLAS has many… many… many… fingers
- May be looking for oldest to newest data

Conditions in Oracle – Master copy at Tier-0, replicated to 10 Tier-1 sites
- Running jobs at Oracle sites (direct access) performs well
  - Important to continue testing, optimize RAC
- But direct Oracle access on the grid from a remote site over the Wide Area Network (WAN):
  - Even after tuning efforts, direct access requires many back/forth communications on the network – excessive RTT (Round Trip Time) … SLOW
  - Cascade effect: jobs hold connections for longer … which prevents new jobs from starting …

Use alternative technologies, especially over WAN: “caching” Conditions from Oracle when possible
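A rough worked example shows why round trips dominate over a WAN. The sketch below only multiplies a number of sequential round trips by the RTT; it ignores server time and data transfer. The RTT value is the ~290 msec quoted later in this talk for access to CC‐IN2P3 from Tokyo, while the round-trip counts are hypothetical and chosen only to show the scaling.

```python
def wan_wait_seconds(round_trips: int, rtt_ms: float) -> float:
    """Time spent purely waiting on the network for sequential round trips."""
    return round_trips * rtt_ms / 1000.0


RTT_MS = 290.0  # RTT figure quoted in the talk; used here as an illustration

# Hypothetical numbers of sequential client/server exchanges per job.
for round_trips in (10, 100, 1000):
    wait = wan_wait_seconds(round_trips, RTT_MS)
    print(f"{round_trips:5d} round trips -> {wait:7.1f} s of network waiting")
```

Under these assumptions a job that needs a thousand sequential exchanges spends several minutes waiting while holding its database connection, which is the cascade effect described above.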

[Figure labels: Online CondDB; Offline master CondDB; Tier-1 replica (two shown); Tier-0 farm; Computer centre; Outside world; Isolation / cut; Calibration updates]

Page 9:

Technologies for Conditions “caching”

“DB Release”: make a system of files containing the data needed. Used in reprocessing campaigns. Includes:
- SQLite replicas: “mini” Conditions DB with specific Folders, IOV range, CoolTag (a ‘slice’ – small subset of many rows in particular tables)
- And associated POOL files, PFC

“Frontier”: store results in a web cache
- Developed by Fermilab; used by CDF, adopted and further refined for the CMS model
- One or more Frontier / Squid servers located at/near the Oracle RAC
  - negotiate transactions between grid jobs and the Oracle DB – load levelling
  - reduce the load on Oracle by caching results of repeated queries
  - reduce latency observed connecting to Oracle over the WAN
- Additional Squid servers at remote sites help even more

Picture on next slide
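The benefit of caching the results of repeated queries can be illustrated with a minimal read-through cache. This is a sketch of the general idea only, not of the Frontier or Squid implementation: the `CachingProxy` class, the `slow_database` stand-in, and the query string are all hypothetical.

```python
class CachingProxy:
    """Hypothetical read-through cache placed in front of a slow backend."""

    def __init__(self, backend):
        self._backend = backend   # callable: query string -> result
        self._cache = {}          # query string -> cached result
        self.hits = 0
        self.misses = 0

    def query(self, sql):
        if sql in self._cache:
            self.hits += 1        # served without touching the backend
            return self._cache[sql]
        self.misses += 1
        result = self._backend(sql)
        self._cache[sql] = result
        return result


backend_calls = []


def slow_database(sql):
    """Stand-in for the database; records every query that reaches it."""
    backend_calls.append(sql)
    return f"rows for: {sql}"


proxy = CachingProxy(slow_database)
for _ in range(3):  # three jobs asking the identical question
    proxy.query("SELECT payload FROM conditions WHERE tag = 'example'")

print(proxy.hits, proxy.misses, len(backend_calls))  # prints: 2 1 1
```

Only the first request reaches the backend; identical requests from later jobs are answered from the cache, which is where both the load reduction and the latency reduction come from.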

Page 10:

FroNTier for ATLAS (picture: David Front)
[Figure labels. Client machine: ATHENA, IOVDbSvc, COOL, CORAL (Oracle, sqlite, MySQL, FroNTier client), squid. Server site: squid, Tomcat FroNTier servlet, Oracle DB server. File access is labelled "Get File From ROOT Via POOL". Layers are marked ATLAS and LCG.]

Page 11:

What does a remote collaborator (or grid job) need?

Fred Luehring, Indiana U

Use case: TRT Monitoring (one of MANY examples…)
- Needs: latest Conditions (Oracle + POOL files)
- Explored all 3 access methods
- Talked: in hallways … then at meetings (BNL Frontier, Atlas DB) … with experts and many other users facing similar issues …

In this process:
- Many, many talented people from around the globe have been involved – impressive collaboration!
- Collective realization that use cases continue to grow for distributed Processing … Calibration … Alignment … Analysis …
- Expect sustained surge in all use cases w/ collision data

Frontier technology seems to satisfy the needs of most use cases in a reasonable time.
Now a matter of final testing to refine configurations, going global … for all sites wanting to run jobs with latest data …

Page 12:

Very recent results from Japan … TadaAki Isobe reported last week
http://www.icepp.s.u-tokyo.ac.jp/~isobe/rc/conddb/DBforUserAnalysisAtTokyo.pdf

ReadReal.py script, Athena 15.4.0
3 Methods:
- SQLite: access files on NFS
- w/ LYON Oracle: ~290 msec RTT to CC‐IN2P3
- w/ FroNTier: ~200 msec RTT to BNL
  - Zip‐level: 5
  - Zip‐level: 0 (i.e. no compression): takes ~15% longer

Work is ongoing to understand these kinds of tests.

Page 13:

Frontier/Squid – Grid-wide deployment
Thanks to many !!! We have established “Proof of Principle”
- Much testing and refinement of various components ongoing
- This is quickly falling into place

Requested / Received – approval from Atlas Computing Management for full-scale deployment enabling grid-wide Frontier access to Conditions data

Frontier/Squid servers:
- Established/Fleshed-out at Tier-1: BNL, FZK
- New Frontier Service at Tier-0: CERN
- Additional sites: RAL, TRIUMF, LYON? improve robustness/fail over

Squid servers (experience at Indiana, Michigan, SLAC, Glasgow, DESY …)
- Required at all Tier-1 and Tier-2 sites intending to be an analysis center
- Complemented by HOTDISK for POOL file subscription

Documentation:
- Frontier installation (needed only at selected Tier-1 sites)
  https://www.racf.bnl.gov/docs/services/frontier/installation/
- Squid installation (all Tier-1, Tier-2, and no limitation beyond that …)
  https://www.racf.bnl.gov/docs/services/frontier/squid-installation
- Squid testing:
  https://www.racf.bnl.gov/docs/services/frontier/testing
- TWiki to register Squid Servers on the ATLAS grid (Rod Walker)
  https://twiki.cern.ch/twiki/bin/view/Atlas/T2SquidDeployment
- Hypernews (E-group): Email: [email protected] Forum: https://groups.cern.ch/group/hn-atlas-DBOps/
- Database Operations TWiki (will be updated)
  https://twiki.cern.ch/twiki/bin/view/Atlas/DatabaseOperations
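The robustness gained from having more than one Frontier site can be pictured as a client that tries servers in order and moves on when one is unreachable. The sketch below is a generic illustration under stated assumptions: the `fetch_with_failover` helper, the `fake_fetch` stand-in, and the `example.org` host names are hypothetical and do not describe how the Frontier client is actually configured.

```python
def fetch_with_failover(servers, fetch):
    """Try each server in order; return (server, result) from the first success."""
    errors = []
    for server in servers:
        try:
            return server, fetch(server)
        except ConnectionError as exc:
            errors.append(f"{server}: {exc}")   # remember why this one failed
    raise ConnectionError("all servers failed: " + "; ".join(errors))


def fake_fetch(server):
    """Stand-in for a network request; the first host is pretended to be down."""
    if server == "frontier-a.example.org":
        raise ConnectionError("unreachable")
    return "conditions payload"


servers = ["frontier-a.example.org", "frontier-b.example.org"]
used, payload = fetch_with_failover(servers, fake_fetch)
print(used, payload)
```

With a single server in the list the job would have failed; the second entry is what turns an outage at one site into a slower but successful request.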

Page 14:

Ongoing Work (in Conditions DB Access)
Addressing inefficiencies in individual subsystems is essential for collective long-term stability
- Many use cases, many different types of queries

Associated with Frontier deployment:
- Final checks to ensure the cache is not stale
- Understanding the ‘scale’ of deployment needed
- POOL containers smaller IOVs
- POOL file subscriptions (sites install HOTDISK)
- PFC (POOL File Catalog) update automation
- Squid Server – setup, testing, registration
- Hammercloud tests (JElmsheuser, DvanderSter, RWalker)
  http://homepages.physik.uni-muenchen.de/~johannes.elmsheuser/dbaccess/
  Gradually expanding to include more sites

Ultimate: streamline/unify configuration of grid jobs
- Configure sites in uniform way according to capability
- Input to AGIS – ATLAS Grid Information System
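The stale-cache concern listed above can be made concrete with one common approach, time-based expiry. This is not necessarily the mechanism chosen for the ATLAS deployment; the `ExpiringCache` class and the 60-second lifetime are assumptions for illustration. An entry older than the allowed age is dropped, so the next request has to go back to the source.

```python
import time


class ExpiringCache:
    """Hypothetical cache whose entries are discarded after `max_age_seconds`."""

    def __init__(self, max_age_seconds, clock=time.monotonic):
        self._max_age = max_age_seconds
        self._clock = clock          # injectable so the behaviour can be tested
        self._entries = {}           # key -> (time stored, value)

    def put(self, key, value):
        self._entries[key] = (self._clock(), value)

    def get(self, key):
        """Return the cached value, or None if it is absent or too old."""
        entry = self._entries.get(key)
        if entry is None:
            return None
        stored_at, value = entry
        if self._clock() - stored_at > self._max_age:
            del self._entries[key]   # stale: force a refetch from the source
            return None
        return value


now = [0.0]                          # fake clock for the demonstration
cache = ExpiringCache(max_age_seconds=60, clock=lambda: now[0])
cache.put("latest-calibration", "v1")
now[0] = 30.0
print(cache.get("latest-calibration"))   # still fresh
now[0] = 61.0
print(cache.get("latest-calibration"))   # expired
```

The trade-off is the usual one: a short lifetime keeps jobs close to the latest Conditions but sends more requests to the database, while a long lifetime does the opposite.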

Page 15:

ATLAS TAGs in the ATLAS Computing model
Stages of ATLAS reconstruction:
- RAW data file
- ESD (Event Summary Data) ~ 500 kB/event
- AOD (Analysis Object Data) ~ 100 kB/event
- TAG (not an acronym) ~ 1 kB/event (stable)

TAGs are produced in reconstruction in 2 formats:
- File based: AthenaAwareNTuple format (AANT)
  - TAG files are distributed to all Tier 1 sites
- Oracle Database
  - Event TAG DB populated from files in ‘upload’ process

Can be re-produced in re-processing
Available globally through network connection

In addition:
- ‘Run Metadata’ at Temporal, Fill, Run, LB levels
- File and Dataset related Metadata

TAG Browser (ELSSI) – uses combined Event, Run, File … Metadata

[Figure: reconstruction stages RAW, ESD, AOD, TAG]
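The per-event sizes quoted on this slide translate directly into dataset sizes. The short calculation below uses those figures (ESD ~500 kB, AOD ~100 kB, TAG ~1 kB per event); the event count of one million is a hypothetical round number, and decimal units (1 GB = 10^6 kB) are assumed.

```python
# Approximate per-event sizes in kB, as quoted on the slide.
KB_PER_EVENT = {"ESD": 500, "AOD": 100, "TAG": 1}


def dataset_size_gb(fmt: str, n_events: int) -> float:
    """Approximate size in GB, assuming 1 GB = 1e6 kB."""
    return KB_PER_EVENT[fmt] * n_events / 1e6


N_EVENTS = 1_000_000  # hypothetical sample size, for illustration only

for fmt in KB_PER_EVENT:
    print(f"{fmt}: {dataset_size_gb(fmt, N_EVENTS):6.1f} GB")
```

For the same events the TAG sample is roughly a hundred times smaller than the AOD, which is what makes it practical to hold TAGs in a relational database and query them interactively.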

Page 16:

TAG Services and Architecture
Evolution of TAG services / architecture model
- From: Everything deployed at all voluntary sites
- To: Specific aspects deployed to optimize resources
- Decoupling of services underway – increases flexibility of the system to deploy resources depending on evolution of usage

TAG Upload: now automated/triggered, running stably w/ monitoring
- Automated upload for initial reconstruction
- Controlled by Tier-0
- Integrated with AMI and DQ2 tools where appropriate
- Balanced for read and write operations

Page 17:

TAG Services and Architecture
Components:
- TAG Database(s) at CERN and voluntary Tier-1s, Tier-2s
- ELSSI – Event Level Selection Service Interface
- TAG usage in every Software tutorial: ELSSI and file based TAG usage
- Web Services
  - Extract - dependencies: Atlas Software, AFS maxidisk to hold root files from Extract
  - Skim - Atlas Software, DQ2, Ganga, …
- Surge in effort helping to make TAG jobs grid-enabled

Response times vary:
- O(sec) for interactive queries (event selection, histograms …)
- O(min) for extract
- O(hr) for skim

Page 18:

TAGs by Oracle Site (size in GB)

CERN ATLR (total 404 GB)
  ATLAS_TAGS_CSC_FDR      22
  ATLAS_TAGS_COMM_2009   142
  ATLAS_TAGS_USERMIX      18
  ATLAS_TAGS_COMM        200
  ATLAS_TAGS_ROME         22

BNL (total 383 GB)
  ATLAS_TAGS_CSC_FDR      18
  ATLAS_TAGS_COMM_2009   105
  ATLAS_TAGS_COMM        260

TRIUMF (total 206 GB)
  ATLAS_TAGS_CSC_FDR      16
  ATLAS_TAGS_COMM        190

DESY
  ATLAS_TAGS_MC          231

RAL – gearing up …

Respectable level of TAG deployment – should entertain wide variety of users (commissioning and physics analysis)

TAG upload now routine for commissioning

Intensive work on deployment of TAG services making them increasingly accessible to users (ELSSI)

Page 19:

Summary of Ongoing Efforts
Oracle Database (online, offline, offsite) -- shared resource
- Applications: living, breathing, consuming … and evolving systems
- Developers coming to understand global consequences
- Cooperation and Read/Write control via interfaces critical for collective stability

Hot topics
- Online Operations
- Trying to anticipate modes of user analysis
- Conditions distribution
  - Frontier deployment
  - Hammercloud testing of direct and Frontier access
- Data Quality and Luminosity
- TAG development and resource optimization

Page 20:

BACKUP

Page 21:

Online Database Coordinator: Giovanna Lehmann (w/ support from many)

A lot of work finalizing development of applications
Last few months: many changes enhancing stability for collisions (details on each bullet would fill many slides)
- ATONR (Online Oracle RAC) isolation from GPN
- Devising methods for offline online data
- Understanding operational reliance on ATONR
- Phasing in of locking of owner accounts
- Applications evolving from development to operations
- Source of Oracle Streams interruptions

Page 22:

Dynamic (TAG) services composition
User accesses “integrated ELSSI”
- Which data do you want to query? (data09, mc08...)
- Select metadata criteria
- Select service(s): count, histogram, tabulate, extract, skim
- System detects user identification and location

System transparently deploys services based on this input
- Using internal service catalogue
- Configure appropriate service and execute operation
- Optimize response using load-balancing of infrastructure

Anticipate deployment of additional features:
- Logging access patterns to learn more about user behaviour, deploy resources to optimize service
- Log good “service combinations” – allow sharing between users.

Elisabeth Vinek - ATLAS Software & Computing Week, 03.09.2009
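The selection step described on this slide (service catalogue, user location, load balancing) could look roughly like the following. Everything here is hypothetical: the catalogue layout, the site names, the region codes, and the rule "prefer the user's region, then take the least-loaded endpoint" are assumptions made to illustrate the idea, not the ELSSI design.

```python
# Hypothetical service catalogue: where each service runs and how busy it is.
SERVICE_CATALOGUE = [
    {"service": "count", "site": "site-a", "region": "EU", "load": 0.70},
    {"service": "count", "site": "site-b", "region": "EU", "load": 0.20},
    {"service": "count", "site": "site-c", "region": "US", "load": 0.10},
    {"service": "skim",  "site": "site-c", "region": "US", "load": 0.50},
]


def choose_endpoint(service, user_region, catalogue):
    """Pick the least-loaded endpoint, preferring the user's own region."""
    candidates = [e for e in catalogue if e["service"] == service]
    if not candidates:
        raise LookupError(f"no endpoint offers service {service!r}")
    local = [e for e in candidates if e["region"] == user_region]
    pool = local or candidates      # fall back to any region if none is local
    return min(pool, key=lambda e: e["load"])


print(choose_endpoint("count", "EU", SERVICE_CATALOGUE)["site"])
print(choose_endpoint("skim", "EU", SERVICE_CATALOGUE)["site"])
```

Logging which endpoint was chosen for which request is then enough to start learning access patterns, the additional feature anticipated above.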

Page 23:

Snapshot of ATLAS 3D replication
- Conditions data needed for offline analysis on the grid
- 10 Tier-1 sites
- Oracle Streams replication technology

Page 24:

Where are the POOL files?
DQ2 (DDM) - distributes Event data files and Conditions POOL files.
TWiki: StorageSetUp for T0, T1's and T2's

ADC/DDM maintains ToA sites (Tiers of ATLAS)
- ToA sites are subscribed to receive DQ2 POOL files
- ToA sites have "space tokens" (areas for file destinations) such as:
  - "DATADISK" for real event data
  - "MCDISK" area for simulated event data
  - …
  - "HOTDISK" area for holding POOL files needed by many jobs; has more robust hardware for more intense access

Some sites also use Charles Waldman's "pcache":
- Duplicates files to a scratchdisk accessible to local jobs, avoiding network access to "hotdisk".
- Magic in pcache tells the job to look in the scratchdisk first.

Are POOL files deployed to all ToA sites 'on the GRID'? ADC – in progress
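The "look in the scratchdisk first" behaviour can be sketched as a local-first file lookup. This is a generic illustration and not the pcache implementation: the `open_via_local_cache` function, the directory layout, and the file name are hypothetical, and two temporary directories stand in for the scratch disk and the "hotdisk" area.

```python
import shutil
import tempfile
from pathlib import Path


def open_via_local_cache(name, scratch_dir, remote_dir):
    """Return a local path for `name`, copying from `remote_dir` only on a miss."""
    local = Path(scratch_dir) / name
    if not local.exists():
        local.parent.mkdir(parents=True, exist_ok=True)
        shutil.copy2(Path(remote_dir) / name, local)   # the one remote access
    return local


# Demonstration with temporary directories standing in for real storage.
with tempfile.TemporaryDirectory() as remote, tempfile.TemporaryDirectory() as scratch:
    (Path(remote) / "cond.pool.root").write_bytes(b"payload")
    first = open_via_local_cache("cond.pool.root", scratch, remote)    # copies the file
    second = open_via_local_cache("cond.pool.root", scratch, remote)   # served locally
    print(first == second, first.read_bytes())
```

Every job after the first on the same node reads the scratch copy, so a file needed by many jobs is fetched over the network once per node rather than once per job.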

Page 25:

Request: sites create HOTDISK
Email from Stephane Jezequel (Sept 15):

Could you please forward this request to all ATLAS Grid sites which are included in DDM:

As discussed during the ATLAS software week, sites are requested to implement the space token ATLASHOTDISK. More information:
https://twiki.cern.ch/twiki/bin/view/Atlas/StorageSetUp#The_ATLASHOTDISK_space_token

Sites should assign at least 1 TB to this space token (should foresee 5 TB). In case of storage crisis at the site, the 1 TB can be reduced to 0.5 TB. Because of the special usage of these files, sites should decide to assign a specific pool or not.

When it is done, please report to DDM Ops (Savannah ticket is a good solution) to create the new DDM site.

Page 26:

Where are the PFCs (POOL File catalogs)?
Mario Lassnig - modified DQ2 client dq2-ls
- Can create the PFC ‘on the fly’ for the POOL files on a system
- Written to work for "SRM systems"
- Systems without SRM cannot subscribe automatically
  - Can get files via more manual procedures
  - Options in dq2-ls detect the type of system but may not always successfully remove SRM-specific descriptors
  - Unclear at the time of the meeting in which cases development is required

DQ2 client continues to evolve/improve with use cases…

Update PFC with new POOL files …
- Detection of new POOL file arrival
- Generate updated PFC
- Run above script if needed, preparing file for local use
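The update cycle listed above (detect new POOL files, then regenerate the catalogue) can be sketched as a directory scan. Several things here are assumptions made for the example: the `*.pool.root` file suffix, the function names, and above all the catalogue itself, which is written as a plain text listing and is not the real PFC format produced by dq2-ls.

```python
import tempfile
from pathlib import Path


def find_new_files(pool_dir, known):
    """Return names of POOL files present in `pool_dir` but not in `known`."""
    present = {p.name for p in Path(pool_dir).glob("*.pool.root")}
    return sorted(present - set(known))


def write_catalogue(pool_dir, catalogue_path):
    """Rewrite a simplified catalogue (one file name per line); return the names."""
    names = sorted(p.name for p in Path(pool_dir).glob("*.pool.root"))
    Path(catalogue_path).write_text("\n".join(names) + "\n")
    return names


with tempfile.TemporaryDirectory() as pool_dir:
    catalogue = Path(pool_dir) / "catalogue.txt"
    (Path(pool_dir) / "cond_001.pool.root").write_bytes(b"")
    known = write_catalogue(pool_dir, catalogue)

    # A new file arrives; detect it and regenerate the catalogue only if needed.
    (Path(pool_dir) / "cond_002.pool.root").write_bytes(b"")
    arrived = find_new_files(pool_dir, known)
    if arrived:
        known = write_catalogue(pool_dir, catalogue)
    print(arrived, known)
```

Run periodically, a check of this kind is the sort of automation the "PFC update automation" item refers to; the real procedure would call the DQ2 client instead of writing a text file.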

Page 27:

Thanks to many experts in many areas
Richard Hawkings, Andrea Valassi (CERN IT), Hans von der Schmitt, Walter Lampl, Shaun Roe, Paul Laycock, David Front, Slava Khomutnikov, Stefan Schlenker, Saverio D'auria, Joerg Stelzer, Vakho Tsulaia, Thilo Pauly, Marcin Nowak, Yuri Smirnov, Solveig Albrand, Fred Luehring, John DeStefano, Carlos Gamboa, Rod Walker, Bob Ball, Jack Cranshaw, Alessandro Desalvo, Xin Zhao, Mario Lassnig, ADC … ! apologies ! Not complete list ! Should delete slide … too incomplete…

Many, many, many application developers and subsystem experts who ensure the data going in is what we want coming out.

Page 28:

Features of Athena:
Previous to Release 15.4:
- Athena (RH) looks at the IP the job is running at, and uses dblookup.xml in the release to decide the order of database connections to try to get the Conditions data.

Release 15.4:
- Athena looks for Frontier environment variable; if found, ignores the dblookup, using instead another env
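The decision described for Release 15.4 amounts to "an environment variable, if set, overrides the lookup file". The sketch below shows only that control flow. The variable name `EXAMPLE_FRONTIER_SETTING`, the connection strings, and the function name are placeholders invented for this example; the slide text is cut off before it names what is used instead, so nothing here should be read as the actual Athena behaviour.

```python
import os

# Placeholder name: the real variable is not given in the text above.
HYPOTHETICAL_FRONTIER_VAR = "EXAMPLE_FRONTIER_SETTING"


def connection_order(dblookup_order, environ=os.environ):
    """Return the list of connections to try, in order.

    If the (hypothetical) Frontier variable is set, the lookup-file order is
    ignored; otherwise the order taken from the lookup file is used unchanged.
    """
    setting = environ.get(HYPOTHETICAL_FRONTIER_VAR)
    if setting:
        return [f"frontier:{setting}"]
    return list(dblookup_order)


lookup_file_order = ["oracle:nearest-site", "sqlite:local-replica"]  # hypothetical

print(connection_order(lookup_file_order, environ={}))
print(connection_order(lookup_file_order,
                       environ={HYPOTHETICAL_FRONTIER_VAR: "http://squid.example.org"}))
```

Passing the environment as an argument keeps the function easy to test without touching the real process environment.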

