Analysis Tools at D0

PPDG Analysis Grid Computing Project, CS 11 Caltech Meeting

Lee Lueking, Fermilab Computing Division

December 19, 2002

December 19, 2002 Lee Lueking - FNAL/CD

DZero Analysis Tools

• SAM dataset tools

– Database schema

– Datasets and query dimensions

• MC_RunJob

• Monte Carlo request system

– Physics groups submit web-based requests for MC.

– Provides prioritization, work assignment, and tracking.

• D0 Framework, D0Tools, D0 Run Time Environment (RTE)

• ROOT

– Using ROOT non-intrusive I/O for the DZero EDM (Event Data Model)

– ROOT/SAM and ROOT/SAM-Grid (Philippe Canal, up next)

More in this talk

SAM Simplified Database Schema

[Diagram: simplified SAM schema. Entities shown: Files (ID, Name, Format, Size, # Events); Events (ID, Event Number, Trigger L1, Trigger L2, Trigger L3, Off-line Filter, Thumbnail); Volume; Project; Data Tier; Physical Data Stream; Trigger Configuration; Creation & Processing Info; Run; Event-File Catalog; Run Conditions (Luminosity, Calibration, Trigger DB, Alignment); Group and User Information; Station Config. & Cache Info; File Storage Locations; MC Process & Decay.]

• The SAM schema has over 100 tables.

• Several other related tablespaces are also available.

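[Diagram: the SAM schema viewed as one giant table. Rows, labeled "DataFile", are files (file1, file2, file3, file4, file5, …, filen); columns, labeled "Dimension Name", are file, Run, Event, Date, Trigger, Apo, App vsn, ….]

Challenge: Transform the complex SAM schema into a form that is user-friendly and that avoids badly formed user SQL queries.

Solution: Transform the schema so that it looks like one giant table, with one row per data file and one column per dimension.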
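The "one giant table" idea can be illustrated with an ordinary SQL view that joins normalized tables into one row per data file and one column per dimension. This is a minimal sketch using Python's sqlite3 module; it is an analogy, not SAM's implementation, and the table names, columns, and the `files_where` helper are hypothetical.

```python
import sqlite3

# Hypothetical, heavily simplified tables; the real SAM schema has over 100.
SCHEMA = """
CREATE TABLE runs       (run_id  INTEGER PRIMARY KEY, run_number INTEGER);
CREATE TABLE data_tiers (tier_id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE data_files (file_id INTEGER PRIMARY KEY, file_name TEXT,
                         run_id INTEGER, tier_id INTEGER);

-- The "giant table": one row per data file, one column per dimension.
CREATE VIEW file_dimensions AS
SELECT f.file_name  AS file_name,
       r.run_number AS run_number,
       t.name       AS data_tier
FROM data_files f
JOIN runs       r ON r.run_id  = f.run_id
JOIN data_tiers t ON t.tier_id = f.tier_id;

INSERT INTO runs       VALUES (1, 100930), (2, 100931);
INSERT INTO data_tiers VALUES (1, 'digitized'), (2, 'reconstructed');
INSERT INTO data_files VALUES (1, 'file1', 1, 1), (2, 'file2', 1, 2),
                              (3, 'file3', 2, 1);
"""

DIMENSIONS = {"file_name", "run_number", "data_tier"}

def files_where(conn, **constraints):
    """Return names of files whose dimensions equal every given value."""
    unknown = set(constraints) - DIMENSIONS
    if unknown:
        raise ValueError(f"unknown dimensions: {sorted(unknown)}")
    # Column names come only from the whitelist; values are bind parameters.
    where = " AND ".join(f"{name} = ?" for name in constraints) or "1 = 1"
    sql = f"SELECT file_name FROM file_dimensions WHERE {where} ORDER BY file_name"
    return [row[0] for row in conn.execute(sql, list(constraints.values()))]

conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
print(files_where(conn, run_number=100930))                        # ['file1', 'file2']
print(files_where(conn, run_number=100930, data_tier="digitized"))  # ['file1']
```

Users see only dimension names and values; the joins live in one place, which is what keeps ad hoc queries well formed.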

Dimensions, Links, and Chains

Matthew Vranicar (Piocon Technologies), SAM team

• This transformation is done using what we call links and chains.

• A link describes how to relate two tables.

• A chain is a set of links that connects the desired dimension (a column in a table) to the data file.

• Links and chains are stored in the database itself and loaded into the middle-tier server when it starts up.

• A grammar is provided for users to build complex queries employing dimensions.
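Links and chains can be sketched as small data structures that generate SQL. In this hypothetical Python version, a `Link` is one join condition between two tables and a `Chain` is the sequence of links from the data file table to the table holding a dimension. The table names, the `CHAINS` registry, and `files_matching` are assumptions for illustration, not SAM's actual schema or middle-tier code.

```python
import sqlite3
from dataclasses import dataclass

ALLOWED_OPS = {"=", "!=", ">", "<", ">=", "<=", "LIKE"}

@dataclass(frozen=True)
class Link:
    """How to relate two tables: left.left_col = right.right_col."""
    left: str
    left_col: str
    right: str
    right_col: str

    def condition(self) -> str:
        return f"{self.left}.{self.left_col} = {self.right}.{self.right_col}"

@dataclass(frozen=True)
class Chain:
    """Links that connect one dimension (a column in some table) to data_files."""
    table: str               # table that holds the dimension column
    column: str              # the dimension column itself
    links: tuple[Link, ...]  # join path from data_files out to `table`

    def to_sql(self, op: str) -> str:
        if op not in ALLOWED_OPS:
            raise ValueError(f"unsupported operator {op!r}")
        tables = ["data_files"] + [link.right for link in self.links]
        conditions = [link.condition() for link in self.links]
        # The user's value is a bind parameter, never pasted into the SQL text.
        conditions.append(f"{self.table}.{self.column} {op} :value")
        return (f"SELECT DISTINCT data_files.file_name FROM {', '.join(tables)} "
                f"WHERE {' AND '.join(conditions)} ORDER BY 1")

# Hypothetical chains keyed by dimension name (SAM keeps its own in the database).
CHAINS = {
    "run_number": Chain("runs", "run_number",
                        (Link("data_files", "run_id", "runs", "run_id"),)),
    "data_tier": Chain("data_tiers", "name",
                       (Link("data_files", "tier_id", "data_tiers", "tier_id"),)),
}

def files_matching(conn, dimension, op, value):
    sql = CHAINS[dimension].to_sql(op)
    return [row[0] for row in conn.execute(sql, {"value": value})]

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE runs (run_id INTEGER PRIMARY KEY, run_number INTEGER);
CREATE TABLE data_tiers (tier_id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE data_files (file_id INTEGER PRIMARY KEY, file_name TEXT,
                         run_id INTEGER, tier_id INTEGER);
INSERT INTO runs VALUES (1, 100930), (2, 100931);
INSERT INTO data_tiers VALUES (1, 'digitized'), (2, 'reconstructed');
INSERT INTO data_files VALUES (1, 'file1', 1, 1), (2, 'file2', 1, 2), (3, 'file3', 2, 1);
""")
print(files_matching(conn, "run_number", "=", 100930))  # ['file1', 'file2']
```

Because the user only names a dimension, the join path comes from a stored chain rather than from hand-written SQL.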

Dimensions: Examples

• There are dozens of dimensions available.

• Additional dimensions are easily defined.

• Examples of defined dimensions:

– APPL_NAME, APPL_NAME_ANALYZED, CONSUMED_DATE, CONSUMED_STATUS, CONSUMER, CONSUMER_GROUP, CONSUMER_ID, CREATE_DATE, DATASET_DEF_ID, DATASET_DEF_NAME, DATASET_ID, DATASET_VERSION, DATA_FILE_LOCATION_STATUS, DATA_TIER, DATA_TIER_ANALYZED, DELIVERED_STATUS, EVENT_NUMBER, FAMILY, FAMILY_ANALYZED, FILE_ANALYZED, FILE_NAME, FILE_PARTITION, FILE_STATUS, FULL_PATH, LOGICAL_DATASTREAM_NAME, PARAM_TYPE, RUN_ID, RUN_NUMBER, RUN_QUALITY, VERSION, VERSION_ANALYZED, WORK_GRP_NAME, etc.

• __SET__: a special dimension that lets you include an existing dataset definition.
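One plausible way to resolve __SET__ is textual substitution: replace each reference with the saved definition's dimension string, wrapped in parentheses, and expand recursively. The sketch below assumes a reference is written `__SET__ <definition name>` and that saved definitions are available as a Python dict; `expand_sets` and the definition names are hypothetical, and the helper refuses a definition that includes itself.

```python
import re

# Assumption: a reference is written "__SET__ <definition name>" (case-insensitive).
SET_REF = re.compile(r"__set__\s+([^\s()]+)", re.IGNORECASE)

def expand_sets(dims: str, definitions: dict, _active: tuple = ()) -> str:
    """Replace every __SET__ reference with its saved definition, in parentheses."""
    def substitute(match):
        name = match.group(1)
        if name in _active:
            raise ValueError(f"dataset definition {name!r} includes itself")
        if name not in definitions:
            raise KeyError(f"unknown dataset definition {name!r}")
        # Expand recursively, remembering which definitions are being expanded.
        inner = expand_sets(definitions[name], definitions, _active + (name,))
        return f"({inner})"
    return SET_REF.sub(substitute, dims)

# Hypothetical saved definitions.
definitions = {
    "electron_stream": "physical_datastream_name electron+jet",
    "my_runs": "__SET__ electron_stream and run_number 100930",
}
print(expand_sets("__SET__ my_runs minus data_tier digitized", definitions))
# ((physical_datastream_name electron+jet) and run_number 100930) minus data_tier digitized
```

Wrapping each expansion in parentheses keeps the included definition's operators grouped, whatever set operator surrounds the reference.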

Query Syntax and Grammar

• Constraint operators: =, !=, >, <, >=, <=, like, not like, in, not in, between, is null, is not null

• Set operators: and, or, minus (union and intersection to be added)

• Syntax: --dim="[(]name [conOper] value [setOper name [conOper] value][)] ..."

• Command line examples:

– sam define dataset --defname=dataset_definition_name --group=work_group_name --dim="(run_number 100930 data_tier digitized) minus physical_datastream_name electron+jet"

– sam create dataset --defname=dataset_definition_name
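The set operators can be read as set algebra on the files each constraint selects: and as intersection, or as union, minus as difference. Below is a minimal Python sketch over a hypothetical in-memory catalog, assuming a query has already been parsed into nested tuples of the form (set operator, left, right) or (constraint operator, dimension, value). The tuple format, the catalog, and `evaluate` are illustrative only, not SAM's internals.

```python
import operator

# Hypothetical miniature catalog: one dict per file, keyed by dimension name.
FILES = [
    {"file_name": "file1", "run_number": 100930, "data_tier": "digitized",
     "physical_datastream_name": "electron+jet"},
    {"file_name": "file2", "run_number": 100930, "data_tier": "digitized",
     "physical_datastream_name": "muon"},
    {"file_name": "file3", "run_number": 100930, "data_tier": "reconstructed",
     "physical_datastream_name": "electron+jet"},
    {"file_name": "file4", "run_number": 100931, "data_tier": "digitized",
     "physical_datastream_name": "muon"},
]

CONSTRAINTS = {"=": operator.eq, "!=": operator.ne, ">": operator.gt,
               "<": operator.lt, ">=": operator.ge, "<=": operator.le}

def evaluate(node, files=FILES):
    """Return the set of file names selected by a parsed dimension query."""
    op, left, right = node
    if op == "and":      # both sides hold: intersection
        return evaluate(left, files) & evaluate(right, files)
    if op == "or":       # either side holds: union
        return evaluate(left, files) | evaluate(right, files)
    if op == "minus":    # selected on the left but not on the right
        return evaluate(left, files) - evaluate(right, files)
    # Otherwise the node is a constraint: (operator, dimension, value).
    test = CONSTRAINTS[op]
    return {f["file_name"] for f in files if test(f[left], right)}

query = ("minus",
         ("and", ("=", "run_number", 100930), ("=", "data_tier", "digitized")),
         ("=", "physical_datastream_name", "electron+jet"))
print(sorted(evaluate(query)))  # ['file2']
```

The `query` tuple mirrors the `sam define dataset` example, reading the side-by-side `run_number 100930 data_tier digitized` as an implicit and (an assumption).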

Note: Through an SBIR, Matthew Vranicar (Piocon) and Randolf Herber (FNAL CD) are providing additional features. A tokenizer (flex) and a parser (bison) will make query handling more reliable by checking the grammar and handling parentheses. Security, user access control, and database resource management are also being added.
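The flex/bison front end itself is not shown here; as a stand-in, a small hand-written recursive-descent parser in Python illustrates what grammar checking and parenthesis handling involve. It covers only a subset of the constraint operators (=, !=, >, <, >=, <=, like, not like) and makes assumptions the slide does not state: set operators apply left to right with equal precedence, a constraint with no operator means equality, and two constraints written side by side are joined with and. `DimParser` and `parse` are hypothetical names.

```python
import re

CON_OPS = {"=", "!=", ">", "<", ">=", "<=", "like"}  # "not like" handled separately
SET_OPS = {"and", "or", "minus"}

class DimParser:
    """Recursive-descent parser for a subset of the dimension query grammar."""

    def __init__(self, text: str):
        # Parentheses are tokens of their own; everything else splits on whitespace.
        self.tokens = re.findall(r"[()]|[^\s()]+", text)
        self.pos = 0

    def peek(self, offset: int = 0):
        i = self.pos + offset
        return self.tokens[i] if i < len(self.tokens) else None

    def peek_lower(self, offset: int = 0):
        tok = self.peek(offset)
        return tok.lower() if tok is not None else None

    def take(self) -> str:
        tok = self.peek()
        if tok is None:
            raise SyntaxError("unexpected end of query")
        self.pos += 1
        return tok

    def word(self) -> str:
        tok = self.take()
        if tok in ("(", ")"):
            raise SyntaxError(f"expected a name or value, got {tok!r}")
        return tok

    def parse(self):
        tree = self.expr()
        if self.peek() is not None:
            raise SyntaxError(f"unexpected token {self.peek()!r}")
        return tree

    def expr(self):
        # Left to right; a missing set operator between two terms means "and".
        node = self.term()
        while self.peek() not in (None, ")"):
            op = self.take().lower() if self.peek_lower() in SET_OPS else "and"
            node = (op, node, self.term())
        return node

    def term(self):
        if self.peek() == "(":
            self.take()
            node = self.expr()
            if self.peek() != ")":
                raise SyntaxError("missing ')'")
            self.take()
            return node
        name = self.word()
        if self.peek_lower() in CON_OPS:
            op = self.take().lower()
        elif self.peek_lower() == "not" and self.peek_lower(1) == "like":
            self.pos += 2
            op = "not like"
        else:
            op = "="  # assumption: "name value" with no operator means equality
        return (op, name, self.word())

def parse(text: str):
    return DimParser(text).parse()

print(parse("(run_number 100930 data_tier digitized) minus "
            "physical_datastream_name electron+jet"))
# ('minus', ('and', ('=', 'run_number', '100930'), ('=', 'data_tier', 'digitized')), ('=', 'physical_datastream_name', 'electron+jet'))
```

An unbalanced query such as `(run_number 100930` is rejected with `SyntaxError("missing ')'")` before anything reaches the database.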

MC_RunJob: What is it?

Greg Graham (FNAL/CMS), David Evans (Lancaster University/DZero)

• Python-based workflow planner

• Metadata language interpreter

• Flexible and generic

[Diagram: processing lines of packages (Pkg 1, Pkg 2, Pkg 3) separated by phase boundaries; datasets labeled Dataset1, DS 2, DS 3, and Dataset4.]

MC_RunJob: Work Flow Planner

• Chain a set of inputs, processes, and outputs.

• Set up a local execution environment.

• Parallelize the job according to the local environment.

• Produce metadata to describe each processing step.

• Retrieve input data and deliver output data.

• Manage jobs at cluster scale (being adapted to work with SAM-Grid).
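The planner's core loop (chain steps, split the work across local workers, record metadata for each step) can be sketched in a few lines of Python. This is not MC_RunJob's code: `Step`, `WorkflowPlanner`, the thread pool standing in for the local execution environment, and the toy event functions are all hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass, field
from typing import Callable

@dataclass
class Step:
    """One processing stage, e.g. a generator or a simulation package."""
    name: str
    version: str
    run: Callable[[list], list]  # takes a list of events, returns a list of events

@dataclass
class WorkflowPlanner:
    steps: list
    workers: int = 2             # stand-in for what the local environment offers
    metadata: list = field(default_factory=list)

    def execute(self, events: list) -> list:
        data = events
        for step in self.steps:
            n_in = len(data)
            # Split the input into contiguous slices, one per worker, and run them
            # in parallel; pool.map returns the results in input order.
            size = max(1, -(-n_in // self.workers))
            slices = [data[i:i + size] for i in range(0, n_in, size)]
            with ThreadPoolExecutor(max_workers=self.workers) as pool:
                parts = list(pool.map(step.run, slices))
            data = [event for part in parts for event in part]
            # One metadata record per processing step.
            self.metadata.append({"step": step.name, "version": step.version,
                                  "events_in": n_in, "events_out": len(data)})
        return data

# Toy stand-ins for a generator => simulation => trigger-simulation chain.
planner = WorkflowPlanner([
    Step("gen", "v1", lambda evts: [{"id": e} for e in evts]),
    Step("sim", "v1", lambda evts: [{**e, "hits": 3 * e["id"]} for e in evts]),
    Step("trigsim", "v1", lambda evts: [e for e in evts if e["hits"] % 2 == 0]),
], workers=2)

output = planner.execute(list(range(5)))
print([e["id"] for e in output])  # [0, 2, 4]
print(planner.metadata[-1])       # {'step': 'trigsim', 'version': 'v1', 'events_in': 5, 'events_out': 3}
```

Threads keep the sketch self-contained; a real planner would hand each slice to a batch system or farm node instead.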

MC_RunJob: Metadata Language Interpreter

• Define a metadata language: keywords

• Convert metadata into jobs: macros

• Generate metadata for processors

• Log the output metadata so it can be declared to and stored in SAM

[Diagram labels: Metadata, Physics Result]
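A sketch of the interpreter's role, assuming keywords are declared with a type and an optional default: a request is validated against the keyword definitions, rendered into a macro for the job, and the same values are kept as metadata for the output. The keyword names, the `pythia` default, the macro format, and `interpret` are hypothetical.

```python
from string import Template

# Hypothetical keyword definitions: name -> (type, default); default None = required.
KEYWORDS = {
    "generator": (str, "pythia"),
    "process": (str, None),
    "num_events": (int, 1000),
    "random_seed": (int, None),
}

# Hypothetical macro format for a generator job.
MACRO = Template(
    "generator = $generator\n"
    "process = $process\n"
    "num_events = $num_events\n"
    "random_seed = $random_seed\n"
)

def interpret(request: dict):
    """Validate a request against the keywords; return (macro text, output metadata)."""
    unknown = set(request) - set(KEYWORDS)
    if unknown:
        raise KeyError(f"unknown keywords: {sorted(unknown)}")
    values = {}
    for name, (kind, default) in KEYWORDS.items():
        if name in request:
            values[name] = kind(request[name])  # convert, e.g. "42" -> 42
        elif default is not None:
            values[name] = default
        else:
            raise KeyError(f"missing required keyword {name!r}")
    macro = MACRO.substitute(values)
    # The same values are kept so the output can later be declared with them.
    return macro, dict(values)

macro, meta = interpret({"process": "ttbar", "random_seed": "42"})
print(macro)
print(meta)  # {'generator': 'pythia', 'process': 'ttbar', 'num_events': 1000, 'random_seed': 42}
```

Keeping one keyword table as the source for both the macro and the output metadata means the job's configuration and its catalog description cannot drift apart.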

MC_RunJob: Use at DZero

• MC production at remote farms

• User MC production on central resources

• MC storage in SAM using keywords

• Runs MC software: Gen => GEANT => Digi => Trigsim

• Runs MC reconstruction

• Runs MC/data analyzers: ROOT-tuple/tree makers

• Plans to use it more broadly for data analysis

Flexibility

• Object-oriented Python

– The basic machinery is quite generic.

– Essentially, one Python class and a list of metadata allow an executable to run in a SAM-friendly way; this can be made as elaborate as desired.

– SAM gets the metadata definition from MC_RunJob, so when you add new features, telling SAM about them is as simple as typing “sam load keywords”.
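"One Python class and a list of metadata" might look roughly like the sketch below. The class name `MetadataJob`, the convention of passing each metadata pair to the executable as a `--keyword=value` argument, and the returned record are assumptions; a Python one-liner that echoes its arguments stands in for a real executable.

```python
import subprocess
import sys
from dataclasses import dataclass

@dataclass
class MetadataJob:
    """Hypothetical: one executable plus a list of (keyword, value) metadata pairs."""
    command_prefix: list  # executable and any fixed arguments
    metadata: list        # e.g. [("num_events", 10), ("random_seed", 42)]

    def command(self) -> list:
        # Assumption: each metadata pair is passed as --keyword=value.
        return self.command_prefix + [f"--{key}={value}" for key, value in self.metadata]

    def run(self) -> dict:
        result = subprocess.run(self.command(), capture_output=True, text=True)
        # Keep the configuration next to the outcome, ready to describe the output.
        return {"metadata": dict(self.metadata),
                "return_code": result.returncode,
                "stdout": result.stdout}

# A stand-in executable: Python printing the arguments it received.
job = MetadataJob([sys.executable, "-c", "import sys; print(sys.argv[1:])"],
                  [("num_events", 10), ("random_seed", 42)])
record = job.run()
print(record["return_code"], record["stdout"].strip())
# 0 ['--num_events=10', '--random_seed=42']
```

Subclassing would be the natural place to "complicate it as desired", for example by overriding `command()` for an executable that reads a macro file instead of command-line options.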

For Additional Information

• Dataset Dimensions:
– http://d0db.fnal.gov/sam_project_editor

• MC_RunJob:
– http://www-clued0.fnal.gov/mc_runjob/mainframe.html

• Monte Carlo Request System:
– http://www-d0.fnal.gov/computing/mcprod/mcc.html

• Other D0 stuff:
– http://www-d0.fnal.gov/atwork

• SAM/Root:
– http://d0db.fnal.gov/sam/doc/userdocs/SamRoot.html

