RefDB: The Reference Database for CMS Monte Carlo Production
Véronique Lefébure, CERN & HIP
CHEP 2003, San Diego, California, 25th of March 2003
Véronique Lefébure - CHEP2003 2
Functionalities of RefDB
1. Management of Physics Production Requests
2. Distribution, Coordination and Progress Tracking of Production around the World: Production Assignments
3. Definition of Production Instructions for the workflow planner
4. Catalogue Publication of Real and Virtual Data
MySQL database hosted at CERN; web server with .htaccess access control and PHP scripts
General Data Flow
Web Interface: http://cmsdoc.cern.ch/…./*.php
[Diagram: Physicists (many) submit Requests to RefDB; the Production Coordinator (one) creates Assignments; Production Operators (many) run the Workflow Planner* on the CPUs, and each run's RUN Summary is sent by e-mail to a mail box read by RefDB.]
*IMPALA, McRunjob, CMSProd
Statistics
• RefDB was designed and implemented in November and December 2001, and has been used intensively by CMS since January 2002:
  – DAQ TDR Spring 2002 Production
  – 2003 Production in preparation for the 2004 Data Challenge
• ~20 Requestors
• > 20 Regional Centres, > 40 Production sites, 70 Production Operators
• > 2000 Requests, Assignments
• > 300 Parameter Files, > 1300 Parameter Values
• ~24 MB of MySQL data
Physics Production Request
• Definition of an Atomic Production Request (“Derivation”):
  Defined by the Physicist:
  1. Executable (“Transformation”)
  2. Input Physics Parameters
  3. Input Data and Number of Events
  Defined by the Production Coordinator:
  4. Input Production Parameters
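The four parts of an atomic request, and who supplies each, can be sketched as a small Python record. The class names, field names, the collection name `dataset_A/Hits` and the `CommitInterval` key are hypothetical illustrations, not RefDB's actual tables or columns.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass(frozen=True)
class Executable:
    """The "Transformation", selected by software name, version and executable name."""
    software: str
    version: str
    name: str


@dataclass
class AtomicRequest:
    """Hypothetical model of an atomic production request (a "Derivation")."""
    # Parts 1-3: supplied by the Physicist.
    executable: Executable
    parameter_files: List[str]        # input physics parameter files
    input_collection: Optional[str]   # None when a new dataset is generated
    n_events: int
    # Part 4: supplied later by the Production Coordinator.
    production_parameters: Dict[str, str] = field(default_factory=dict)

    def is_complete(self) -> bool:
        """In this sketch, a request is complete once all four parts are present."""
        return self.n_events > 0 and bool(self.production_parameters)


req = AtomicRequest(Executable("ORCA", "ORCA_7_1_1", "writeAllDigis"),
                    ["detector.par"], "dataset_A/Hits", 1000)
print(req.is_complete())  # False: part 4 not yet defined
req.production_parameters["CommitInterval"] = "100"
print(req.is_complete())  # True
```

Keeping part 4 as a separate, initially empty field mirrors the split of responsibility: the physicist's request can be registered before the coordinator has chosen production parameters.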
Physics Production Request: 1. Executable
• Selected according to:
  – Software Name
  – Software Version
  – Executable Name
  (e.g. “ORCA ORCA_7_1_1 writeAllDigis”)
• Binaries distributed with the DAR* tool
• Based on tagged code (CVS, SCRAM)
• Private code may also be supported (system for loading and archiving code)
• I/O File-Type constraints
• Monitoring Schema and Algorithm (can be used by BOSS**)
* DAR: “Distribution After Release” (http://computing.fnal.gov/cms/natasha/DAR)
** BOSS: “Batch Object Submission System” (http://www.bo.infn.it/cms/computing/BOSS)
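A minimal sketch of selecting an executable by its (software name, version, executable name) triple, as in the “ORCA ORCA_7_1_1 writeAllDigis” example, and of using I/O file-type constraints to check whether one executable's output can feed the next. The catalogue entries and the file-type labels (`fz`, `hits`, `digis`) are hypothetical.

```python
from typing import Dict, FrozenSet, NamedTuple, Tuple


class ExecutableInfo(NamedTuple):
    inputs: FrozenSet[str]   # file types the executable reads
    outputs: FrozenSet[str]  # file types the executable writes


# Hypothetical catalogue keyed by (software name, version, executable name).
CATALOGUE: Dict[Tuple[str, str, str], ExecutableInfo] = {
    ("ORCA", "ORCA_7_1_1", "writeHits"): ExecutableInfo(frozenset({"fz"}), frozenset({"hits"})),
    ("ORCA", "ORCA_7_1_1", "writeAllDigis"): ExecutableInfo(frozenset({"hits"}), frozenset({"digis"})),
}


def select(selection: str) -> ExecutableInfo:
    """Look up an executable from a 'Software Version Executable' string."""
    software, version, name = selection.split()
    try:
        return CATALOGUE[(software, version, name)]
    except KeyError:
        raise ValueError(f"no executable registered for {selection!r}") from None


def can_chain(first: str, second: str) -> bool:
    """True if some file type written by `first` can be read by `second`."""
    return bool(select(first).outputs & select(second).inputs)


print(can_chain("ORCA ORCA_7_1_1 writeHits", "ORCA ORCA_7_1_1 writeAllDigis"))  # True
print(can_chain("ORCA ORCA_7_1_1 writeAllDigis", "ORCA ORCA_7_1_1 writeHits"))  # False
```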
Tables: Software & Executable
• Software: Name, Version, Dates
• SoftwareType: Name
• SoftwareMap
• DARFile: Name, Dates, Status
• DarFileElement
• Executable: Name, Package
• ExecutableUse
• FileType: Name
• MonitoringDefinition: Schema, Algorithms
• ProductionStep: Name, Shortname
[Diagram annotations: Distribution, Web forms, in, out]
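A relational sketch of a few of these tables, using Python's built-in `sqlite3` in place of MySQL. The column names, the `direction` column on ExecutableUse and the package names are assumptions for illustration; the query lists the input and output file types of each executable in one software release.

```python
import sqlite3

# Hypothetical, simplified rendering of some Software & Executable tables.
# RefDB itself ran on MySQL; these column names are illustrative only.
SCHEMA = """
CREATE TABLE Software   (id INTEGER PRIMARY KEY, name TEXT, version TEXT);
CREATE TABLE Executable (id INTEGER PRIMARY KEY, name TEXT, package TEXT,
                         software_id INTEGER REFERENCES Software(id));
CREATE TABLE FileType   (id INTEGER PRIMARY KEY, name TEXT UNIQUE);
-- ExecutableUse: which file types an executable reads ('in') or writes ('out').
CREATE TABLE ExecutableUse (
    executable_id INTEGER REFERENCES Executable(id),
    filetype_id   INTEGER REFERENCES FileType(id),
    direction     TEXT CHECK (direction IN ('in', 'out'))
);
"""

conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
conn.execute("INSERT INTO Software VALUES (1, 'ORCA', 'ORCA_7_1_1')")
conn.executemany("INSERT INTO Executable VALUES (?, ?, ?, 1)",
                 [(1, "writeHits", "pkg_hits"), (2, "writeAllDigis", "pkg_digis")])
conn.executemany("INSERT INTO FileType VALUES (?, ?)",
                 [(1, "fz"), (2, "hits"), (3, "digis")])
conn.executemany("INSERT INTO ExecutableUse VALUES (?, ?, ?)",
                 [(1, 1, "in"), (1, 2, "out"), (2, 2, "in"), (2, 3, "out")])

# Input and output file types of every executable in one release.
rows = conn.execute("""
    SELECT e.name, u.direction, f.name
    FROM Executable AS e
    JOIN Software AS s      ON s.id = e.software_id
    JOIN ExecutableUse AS u ON u.executable_id = e.id
    JOIN FileType AS f      ON f.id = u.filetype_id
    WHERE s.name = 'ORCA' AND s.version = 'ORCA_7_1_1'
    ORDER BY e.name, u.direction
""").fetchall()
for row in rows:
    print(row)
```

Modelling ExecutableUse as a link table with a direction keeps the file-type constraints as data, so new executables can be registered through web forms without schema changes.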
Tables: Monitoring
• MonitoringBlock: Regular Expression, Piece of code
• MonitoringDefinition
• MonitoringProcess
• MonitoringProcessType
• MonitoringSchema
• ProductionStep
• MonitoringObject: Name, Type, Description
[Diagram annotations: pre, run, post]
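A MonitoringBlock pairs a regular expression with a piece of code. A minimal sketch, assuming each block fills one named monitoring object by matching a line of the job's log file and converting the captured text; the log format and the object names are hypothetical.

```python
import re
from typing import Callable, Dict, List, NamedTuple


class MonitoringBlock(NamedTuple):
    object_name: str                  # the MonitoringObject this block fills
    pattern: re.Pattern               # regular expression applied to each log line
    convert: Callable[[str], object]  # "piece of code" turning the match into a value


BLOCKS: List[MonitoringBlock] = [
    MonitoringBlock("NbOfEvents", re.compile(r"Processed (\d+) events"), int),
    MonitoringBlock("RunNumber", re.compile(r"Run number:\s*(\d+)"), int),
    MonitoringBlock("OutputFile", re.compile(r"Writing to (\S+)"), str),
]


def parse_log(lines: List[str]) -> Dict[str, object]:
    """Apply every block to every line; the last match of a block wins."""
    summary: Dict[str, object] = {}
    for line in lines:
        for block in BLOCKS:
            match = block.pattern.search(line)
            if match:
                summary[block.object_name] = block.convert(match.group(1))
    return summary


log = ["Run number: 1042", "Writing to /data/run1042.root", "Processed 500 events"]
print(parse_log(log))
```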
Physics Production Request: 2. Physics Parameters
• The Input Parameter File is made of one or more File Fragments
  – Modularity:
    • Detector parameters
    • Beam-luminosity parameters, …
• Parameter File Fragment: list of (Name, Value) pairs, one per parameter
  – Specialised scripts for file formatting
  – Uniqueness checked
• Single Parameter and its Value:
  – selected by the Physicist
  – or a new parameter and/or new value entered by him/her
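A minimal sketch of assembling an input parameter file from fragments of (Name, Value) pairs, with two checks: identical fragments are registered only once, and no parameter may be defined by two fragments. The content-hash registry, the `Name = Value` output format and the parameter names are assumptions; RefDB itself used specialised per-software formatting scripts.

```python
import hashlib
from typing import Dict, List, Tuple

Fragment = List[Tuple[str, str]]  # (Name, Value) pairs

# Hypothetical registry of known fragments, keyed by a hash of their content.
REGISTRY: Dict[str, Fragment] = {}


def register(fragment: Fragment) -> str:
    """Store a fragment once; identical content always maps to the same key."""
    canonical = "\n".join(f"{name}={value}" for name, value in sorted(fragment))
    key = hashlib.sha1(canonical.encode()).hexdigest()[:12]
    REGISTRY.setdefault(key, sorted(fragment))
    return key


def build_parameter_file(keys: List[str]) -> str:
    """Concatenate fragments into one file, rejecting a parameter set twice."""
    seen = set()
    lines = []
    for key in keys:
        for name, value in REGISTRY[key]:
            if name in seen:
                raise ValueError(f"parameter {name!r} appears in more than one fragment")
            seen.add(name)
            lines.append(f"{name} = {value}")
    return "\n".join(lines) + "\n"


detector = register([("MagneticField", "4.0"), ("Geometry", "geom_v1")])
beam = register([("Luminosity", "2E33")])
print(build_parameter_file([detector, beam]), end="")
```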
Tables: Input Parameters
• Parameter: Name, Description
• ParameterFile: ListOfParameterValues, Location, URL
• ParameterType: Name
• ParameterValue: Value, Description
• SoftwareType: Name
• ParameterMap
[Diagram annotation: Web forms]
Physics Production Request: 3. Input Data
• Number of Events to be produced or processed
• Input Data:
  – Selection of the Logical Name of an Input Data Collection (Real or Virtual Data)
    • Type checked
  or
  – Definition of the Name of a new Dataset
    • Uniqueness checked
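The two branches can be sketched as one validation function: an existing input collection must be known and of a type the executable accepts, while a new dataset name must not already exist. The in-memory catalogue contents and the names `dataset_A`, `dataset_B` are hypothetical.

```python
from typing import Dict, FrozenSet, Optional, Set

# Hypothetical catalogue contents: logical collection name -> data type.
COLLECTIONS: Dict[str, str] = {"dataset_A/Hits": "hits", "dataset_A/Digis": "digis"}
DATASETS: Set[str] = {"dataset_A"}


def validate_input(n_events: int, accepted_types: FrozenSet[str],
                   input_collection: Optional[str] = None,
                   new_dataset: Optional[str] = None) -> str:
    """Accept either an existing input collection or a new dataset name."""
    if n_events <= 0:
        raise ValueError("the number of events must be positive")
    if (input_collection is None) == (new_dataset is None):
        raise ValueError("give exactly one of input_collection or new_dataset")
    if input_collection is not None:
        data_type = COLLECTIONS.get(input_collection)
        if data_type is None:
            raise ValueError(f"unknown collection {input_collection!r}")
        if data_type not in accepted_types:  # type check
            raise ValueError(f"{input_collection!r} has unsupported type {data_type!r}")
        return f"process {n_events} events from {input_collection}"
    if new_dataset in DATASETS:  # uniqueness check
        raise ValueError(f"dataset {new_dataset!r} already exists")
    return f"generate {n_events} events for new dataset {new_dataset}"


print(validate_input(500, frozenset({"hits"}), input_collection="dataset_A/Hits"))
print(validate_input(1000, frozenset({"hits"}), new_dataset="dataset_B"))
```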
Datasets and Collections
• Dataset
  – Physics Channel: primary interactions
  – Detector Configuration (geometry, material, magnetic field)
• Collection
  – for particle tracking through the detector, track reconstruction and physics reconstruction, one can change:
    • Software
    • Software versions
    • Parameters
• One Dataset, many Collections (re-processing, beam luminosities, filtering, cloning and adding new objects, analysis ntuples, …)
[Diagram label: Production Cycle]
Tables: Dataset & Collection
• Dataset: Name, Description, Validity, Date, Cross-section, NbOfEvents
• DataType
• DatasetMap
• Collection: DatasetName, CollectionName, Status, NbOfEvents
• Geometry
• Software
• Executable
• ParameterFile
• Owner: Name
• ProductionCycle: Calo/Tk/MuDigis (on/off), Name
• PUCondition
[Diagram annotation: Input Collection]
Tables: Pile-Up Conditions
• Dataset
• DataType
• DatasetMap
• Collection
• ParameterFile
• PUCondition: Name
[Diagram annotation: “Minimum Bias”]
Physics Production Request: 4. Production Parameters
• Data Clustering
• Commit Interval
• Monitoring
• Job Splitting
• Placeholders in the Parameter file:
  – for defining
    • output file names
    • input/output run numbers, random number seeds, …
  – overwritten by
    • the PHP script that gives access to the Parameter file
    • the workflow planner, with values defined by RefDB
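A minimal sketch of placeholder replacement, assuming placeholders written as `$NAME` and using Python's `string.Template` as the substitution mechanism. The actual placeholder syntax in RefDB parameter files is not given here, and the keys `OutputFile`, `FirstRun`, `RandomSeed` are hypothetical.

```python
from string import Template

# Hypothetical parameter-file template with placeholders for per-job values.
PARAMETER_TEMPLATE = Template(
    "OutputFile = ${DATASET}_run${RUN}.root\n"
    "FirstRun = $RUN\n"
    "RandomSeed = $SEED\n"
)


def instantiate(run: int, seed: int, dataset: str) -> str:
    """Fill the placeholders; substitute() raises KeyError if one is left unset."""
    return PARAMETER_TEMPLATE.substitute(RUN=run, SEED=seed, DATASET=dataset)


print(instantiate(run=1042, seed=987654, dataset="dataset_A"), end="")
```

Using `substitute()` rather than `safe_substitute()` makes a forgotten placeholder fail loudly instead of leaving `$SEED` in a job's input.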
Job decomposition is defined either
  – by the granularity of the input data (runs), or
  – by an adequate Number of Events per Run, giving a reasonable job CPU time and output data size
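The second rule can be sketched as choosing the largest number of events per run that keeps both job CPU time and output size under a limit, then cutting the request into runs of that size. The per-event CPU time, per-event output size and the limits are hypothetical inputs.

```python
from typing import List


def events_per_run(cpu_s_per_event: float, mb_per_event: float,
                   max_cpu_s: float, max_output_mb: float) -> int:
    """Largest event count per run keeping job CPU time and output size within limits."""
    return max(1, int(min(max_cpu_s / cpu_s_per_event, max_output_mb / mb_per_event)))


def split_by_events(n_events: int, per_run: int) -> List[int]:
    """Split a request into runs of `per_run` events; the last run may be shorter."""
    n_runs = -(-n_events // per_run)  # integer ceiling division
    return [min(per_run, n_events - i * per_run) for i in range(n_runs)]


# Hypothetical numbers: 60 s and 2 MB per event, 8-hour jobs, 1000 MB of output.
per_run = events_per_run(60, 2, 8 * 3600, 1000)
print(per_run)                         # 480
print(split_by_events(1000, per_run))  # [480, 480, 40]
```

With the first rule (granularity of the input data), no calculation is needed: each input run simply becomes one job.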
Physics Production Request: Procedure
• All steps via web forms
• Pre-registered “Requestors” for each Physics Group: .htaccess permissions
• Creation of Parameter File(s) or selection of existing ones
• Request web form starting from any point in the production chain: atomic or chain requests
  – Selection of Identity (Name, Group)
  – Selection of Software, Version, Executable
  – Selection of Parameter file(s)
  – Selection of Input Collection, or Definition of Dataset Name + Description for new Physics Channels
  – Uniqueness of Request checked
• E-mail notification to the Requestor, Group Coordinator and Production Coordinator
Production Assignments
• Assignment of (slices of) Requests to Regional Centres
• Assignments created centrally by the Production Coordinator, based on:
  – minimizing file transfers
  – local physics interest
  – farm performance and status, as a function of time
  – local manpower availability, as a function of time
  – priority of the request
• RC = 1 farm, many farms, or the Grid
  – Assignments can be re-assigned by the local production coordinator to local production sites
• Assignment Status updated quasi-online
  – Job Monitoring: log file parsed, summary sent by e-mail
  – Estimation of local and global production rates
• AssignmentID = key for Production Instructions
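A toy sketch of slicing a request among Regional Centres in proportion to an assumed capacity figure, in whole runs. The real assignment also weighed file transfers, local physics interest, manpower and priority, which this sketch ignores; the centre names and capacities are hypothetical, and largest-remainder rounding is a choice made here, not RefDB's documented method.

```python
from typing import Dict


def slice_request(n_events: int, capacity: Dict[str, float],
                  events_per_run: int) -> Dict[str, int]:
    """Share a request's runs among Regional Centres in proportion to capacity.

    Uses largest-remainder rounding so that the slices add up to the total.
    """
    n_runs = -(-n_events // events_per_run)  # ceiling division
    total = sum(capacity.values())
    shares = {rc: n_runs * c / total for rc, c in capacity.items()}
    runs = {rc: int(share) for rc, share in shares.items()}
    leftover = n_runs - sum(runs.values())
    # Give the runs lost to truncation to the largest fractional parts.
    by_remainder = sorted(shares, key=lambda rc: shares[rc] - runs[rc], reverse=True)
    for rc in by_remainder[:leftover]:
        runs[rc] += 1
    return runs


print(slice_request(100_000, {"RC_A": 3, "RC_B": 2, "RC_C": 1}, events_per_run=500))
# {'RC_A': 100, 'RC_B': 67, 'RC_C': 33}
```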
[Figure: production progress for 4 million events with 2×10³³ cm⁻²s⁻¹ pile-up (PU), April 12th to June 6th: about 1.2 seconds per event over 2 months.]
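The figures quoted on the plot are consistent with one another if “1.2 seconds per event” is read as the aggregate throughput of the whole production (one event completed every 1.2 s, summed over all sites). That reading is an interpretation. The check below assumes the year 2002, which does not affect the day count.

```python
from datetime import date

n_events = 4_000_000
seconds_per_event = 1.2  # read here as aggregate throughput over all sites

needed_days = n_events * seconds_per_event / 86_400
calendar_days = (date(2002, 6, 6) - date(2002, 4, 12)).days

print(f"{needed_days:.1f} days needed, {calendar_days} calendar days")
# 55.6 days needed, 55 calendar days
```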
Tables: Request & Assignment
• Assignment: Dates (assignment, start, end), Status, NbOfEvents (assigned, produced), NbOfEventsperRun, ChainAssignment, MasterCopyLocation
• RegionalCentre: Name, NickName, HostRC, MotherRCID, Dates (start, end)
• Request: Dates (request, delivery), NbOfEvents (requested, produced), NtupleOnly, Status
• Person
• PersonType
• PersonMap
• PhysicsGroup
• Dataset, Collection, ProductionCycle [diagram annotations: input, output]
• MonitoringDefinition
Production Instructions
• Production Instructions:
  – Executable Name, Software, Version
  – Parameter File URL
  – Job Splitting Instructions URL
    • Table of Placeholders versus Values
  – Monitoring Instructions URL
    • Parsing script for the e-mail summary
    • Parsing scripts and schema for BOSS (optional)
  – URL for Geometry File or META files, i.e. Detector Configuration (pre-created)
  – Dataset Name, Production Cycle
• NB: the workflow planner knows which output files are to be saved
• Chain Assignments: for running several executables sequentially in one job
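A minimal sketch of the production instructions a workflow planner might receive for one assignment, keyed by AssignmentID, with a chain assignment represented as a list of steps run in order. The field names follow the list above; the JSON encoding, the `example.org` URLs and the cycle name `Digis` are assumptions.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import List, Optional


@dataclass
class StepInstructions:
    executable: str
    software: str
    version: str
    parameter_file_url: str
    job_splitting_url: str       # table of placeholders versus values
    monitoring_url: str          # e-mail summary parser, optional BOSS schema
    geometry_url: Optional[str]  # geometry or META files (detector configuration)


@dataclass
class ProductionInstructions:
    assignment_id: int
    dataset: str
    production_cycle: str
    chain: List[StepInstructions] = field(default_factory=list)  # run sequentially in one job


step = StepInstructions("writeAllDigis", "ORCA", "ORCA_7_1_1",
                        "https://example.org/params/1", "https://example.org/split/1",
                        "https://example.org/monitor/1", None)
instr = ProductionInstructions(assignment_id=1234, dataset="dataset_A",
                               production_cycle="Digis", chain=[step])
print(json.dumps(asdict(instr), indent=2))
```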
Production Book-Keeping
• One table per Dataset, one row per Generation Run
• For each Production Cycle:
  – Run Number
  – Seeds
  – (Cross-section)
  – LFN
  – Status
  – Assignment ID
  – Number of input Events
  – Number of output Events
• Monitored values sent by e-mail at the end of successful jobs
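A minimal sketch of a per-dataset book-keeping table, one row per run and production cycle, updated from the summary a job e-mails at the end. It uses `sqlite3` instead of MySQL; the `key: value` summary format, the table naming scheme, the column names and the `done` status are assumptions, and the optional cross-section column is omitted.

```python
import sqlite3
from typing import Dict


def create_dataset_table(conn: sqlite3.Connection, dataset: str) -> str:
    """One book-keeping table per dataset (dataset must be a safe SQL identifier)."""
    table = f"bk_{dataset}"  # hypothetical naming scheme
    conn.execute(f"""CREATE TABLE {table} (
        run INTEGER, cycle TEXT, seeds TEXT, lfn TEXT, status TEXT,
        assignment_id INTEGER, n_in INTEGER, n_out INTEGER,
        PRIMARY KEY (run, cycle))""")
    return table


def parse_summary(body: str) -> Dict[str, str]:
    """Turn 'key: value' lines of an e-mailed job summary into a dict."""
    return {k.strip(): v.strip() for k, v in
            (line.split(":", 1) for line in body.splitlines() if ":" in line)}


def record_run(conn: sqlite3.Connection, table: str, body: str) -> None:
    """Insert or update the row for one successful run."""
    s = parse_summary(body)
    conn.execute(f"INSERT OR REPLACE INTO {table} VALUES (?, ?, ?, ?, 'done', ?, ?, ?)",
                 (int(s["Run"]), s["Cycle"], s["Seeds"], s["LFN"],
                  int(s["AssignmentID"]), int(s["EventsIn"]), int(s["EventsOut"])))


conn = sqlite3.connect(":memory:")
table = create_dataset_table(conn, "dataset_A")
record_run(conn, table, "Run: 1042\nCycle: Digis\nSeeds: 12345 67890\n"
                        "LFN: dataset_A/Digis/run1042\nAssignmentID: 1234\n"
                        "EventsIn: 500\nEventsOut: 500")
print(conn.execute(f"SELECT run, status, n_out FROM {table}").fetchall())
# [(1042, 'done', 500)]
```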
Data Catalogue
• RefDB Tables:
  – List of Catalogues
    • Objectivity/DB, POOL
    • disk or tapes
  – Catalogue-to-Publication-Site Map
  – Catalogue-to-Collection Map
• Completeness checking
• Scripts for Dataset queries
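Completeness checking can be sketched as comparing the runs recorded as produced for a collection with the runs each catalogue actually publishes. The catalogue names and run numbers are hypothetical.

```python
from typing import Dict, Set


def check_completeness(produced: Set[int], catalogues: Dict[str, Set[int]]) -> Dict[str, Set[int]]:
    """For each catalogue, return the produced runs it does not yet contain."""
    return {name: produced - runs for name, runs in catalogues.items()}


produced_runs = {1, 2, 3, 4}
published = {
    "site_A_pool": {1, 2, 3, 4},
    "site_B_objy": {1, 2},
}
missing = check_completeness(produced_runs, published)
print(missing)  # {'site_A_pool': set(), 'site_B_objy': {3, 4}}
```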
Prospects
• Local installation of RefDB for “private” productions
• Extension of I/O file-type checking to software compatibility
[Figure: overview of the RefDB schema, showing the tables Software, Executable, ExecutableUse, FileType, ProductionStep, MonitoringBlock, MonitoringDefinition, MonitoringProcess, MonitoringSchema, MonitoringObject, Parameter, ParameterFile, ParameterValue, Dataset, Collection, Geometry, ProductionCycle, PUCondition, Assignment, RegionalCentre, Request, Person and PhysicsGroup.]