Andrew C. Smith, 15th October 2006 – ICFA Workshop On Grid Activities – LHCb Data Management Tools
ICFA Workshop On Grid Activities
LHCb Data Management Tools
Overview
Brief introduction to the LHCb Computing Model
Data Management requirements: RAW, stripped, MC
DIRAC Data Management System: Storage Element, File Catalogues, Replica Manager, Transfer Agent, bulk transfer, FTS
Automatic data transfers: ReplicationAgent, RAW/stripped DST replication
File Integrity Checking
Computing Model Intro.
CERN – central production centre
Distribution of RAW data, quasi-real time, to the 6 LHCb Tier1s
Tier1s (including CERN) – RAW data reconstruction and stripping
Stripped DSTs to be distributed to all other Tier1s
Load-balanced availability for analysis
Tier2s – Monte Carlo production centres
Simulation files uploaded to Tier1s/CERN
DM Requirements 1
RAW data files produced at the LHCb Online Farm
Files created at 60 MB/s
Dedicated 1 Gb/s link to Castor at the Computing Centre
Files divided between Tier1 centres
Ratio determined by pledged computing resources
Files transferred to their assigned Tier1 centre
RAW files in Castor have one Tier1 replica
Reliable bulk transfer system required
Capable of sustained 60 MB/s out of CERN
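The pledge-driven split of RAW files among Tier1s can be sketched as a weighted round-robin; the pledge weights and function names below are invented for illustration (the talk only states that the ratio follows pledged resources):

```python
import itertools
from collections import Counter

# Hypothetical pledge weights -- invented for the example.
pledges = {"CNAF": 1, "FZK": 1, "IN2P3": 2, "PIC": 1, "RAL": 2, "SARA": 1}

def assign_tier1(files, pledges):
    """Weighted round-robin: each site appears 'pledge' times per cycle."""
    cycle = itertools.cycle(
        [site for site, weight in sorted(pledges.items()) for _ in range(weight)])
    return {f: next(cycle) for f in files}

files = [f"/lhcb/raw/run{i:04d}.raw" for i in range(8)]
assignment = assign_tier1(files, pledges)
counts = Counter(assignment.values())
```

With 8 files and a total weight of 8, each site receives files in exact proportion to its pledge.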
DM Requirements 2
Stripped DST files produced at Tier1 sites (including CERN)
RAW files reconstructed (currently in groups of 20/40)
Resulting rDSTs stripped once created
Stripped DSTs to be distributed to all other Tier1s
Reliable transfer system required between Tier1 sites
Either copy stripped DSTs 'file-by-file'
Or collect files at Tier1s and perform bulk transfers
Monte Carlo files mostly produced at Tier2 sites
Uploaded to CERN/Tier1s
Typical T2-T1 throughput ~1.1MB/s yearly average
DIRAC DM System
[Diagram – DIRAC Data Management Components: User Interface, WMS and Transfer Agent clients use the Replica Manager; the Replica Manager talks to the File Catalogues (A, B, C) and to the Storage Element service; SRM, GridFTP and HTTP storage plug-ins provide access to the physical storage; a Request DB feeds the Transfer Agent.]
The main components are:
Storage Element and storage access plug-ins
Replica Manager
File Catalogs
Storage Element
DIRAC StorageElement is an abstraction of a Storage facility
Access to storage is provided by plug-in modules for each available access protocol.
Pluggable transport modules: srm, gridftp, bbftp, sftp, http,…
The Storage Element is used mostly to get access to the files
The Grid SE (also called Storage Element) is the underlying resource used
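The plug-in design above can be illustrated with a minimal sketch; the class and method names are hypothetical and do not reproduce the actual DIRAC API, only the pattern of one transport module per protocol behind a common StorageElement abstraction:

```python
class StoragePlugin:
    """Base class: one subclass per access protocol (srm, gridftp, ...)."""
    protocol = None

    def get_file(self, path, dest):
        raise NotImplementedError


class SRMStorage(StoragePlugin):
    protocol = "srm"

    def get_file(self, path, dest):
        # A real plug-in would invoke the SRM client here.
        return f"srm copy {path} -> {dest}"


class GridFTPStorage(StoragePlugin):
    protocol = "gridftp"

    def get_file(self, path, dest):
        return f"gridftp copy {path} -> {dest}"


class StorageElement:
    """Abstraction of a storage facility: dispatches to protocol plug-ins."""

    def __init__(self, name, plugins):
        self.name = name
        self.plugins = {p.protocol: p for p in plugins}

    def get_file(self, path, dest, protocol=None):
        # Use the requested protocol, or the first one configured.
        plugin = self.plugins[protocol] if protocol else next(iter(self.plugins.values()))
        return plugin.get_file(path, dest)


se = StorageElement("CERN-RAW", [SRMStorage(), GridFTPStorage()])
```

New protocols (bbftp, sftp, http, ...) would be added as further subclasses without touching the StorageElement itself.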
File Catalogs
DIRAC Data Management was designed to work with multiple File Catalogs
All available catalogs have identical APIs
Can be used interchangeably
Available catalogs:
LCG File Catalog – LFC
Current baseline choice
Processing Database File Catalog
Exposing the Processing DB Datafiles and Replicas tables as a File Catalog (more later)
BK database replica tables
To be phased out
+ others…
Replica Manager
Replica Manager provides logic for all data management operations
File upload/download to/from Grid
File replication across SEs
Registration in catalogs
etc.
Keeps a list of active File Catalogs
All registrations applied to all catalogues
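The mirrored-registration idea can be sketched as follows; this is illustrative pseudocode rather than actual DIRAC code, standing in for the identical catalogue APIs:

```python
class InMemoryCatalog:
    """Stand-in for a real catalogue (LFC, Processing DB); identical API."""

    def __init__(self, name):
        self.name = name
        self.replicas = {}  # LFN -> set of SE names

    def add_replica(self, lfn, se):
        self.replicas.setdefault(lfn, set()).add(se)


class ReplicaManager:
    """Applies every registration to all active catalogues."""

    def __init__(self, catalogs):
        self.catalogs = catalogs

    def register_replica(self, lfn, se):
        for cat in self.catalogs:
            cat.add_replica(lfn, se)


lfc = InMemoryCatalog("LFC")
procdb = InMemoryCatalog("ProcessingDB")
rm = ReplicaManager([lfc, procdb])
rm.register_replica("/lhcb/data/run0001.raw", "CERN-RAW")
```

Because every catalogue exposes the same interface, catalogues can be added or removed from the active list without changing the Replica Manager logic.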
Transfer Agent + RequestDB
Data Management requests stored in the RequestDB
XML containing the parameters required for the operation
e.g. Operation, LFN, SourceSE, TargetSE, etc.
Transfer Agent picks up requests from the RequestDB and executes them
Operations performed through the Replica Manager
Replica Manager returns a full log of the operations
Transfer Agent performs retries based on the logs; retries attempted until success
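A request of this kind might look like the XML below; the exact RequestDB schema is not given in the talk, so the element names are assumptions based only on the parameters listed above:

```python
import xml.etree.ElementTree as ET

# Hypothetical RequestDB entry -- element names are illustrative.
request_xml = """
<request>
  <operation>replicateAndRegister</operation>
  <lfn>/lhcb/data/2006/RAW/run00123.raw</lfn>
  <sourceSE>CERN-RAW</sourceSE>
  <targetSE>RAL-RAW</targetSE>
</request>
"""

req = ET.fromstring(request_xml)
params = {child.tag: child.text for child in req}
```

An agent can then dispatch on `params["operation"]` without knowing the request's origin.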
Bulk Data Management
[Diagram – bulk replication machinery. LHCb side (DIRAC DMS): Request DB, FC interface, Transfer Manager interface, ReplicaManager, TransferAgent. LCG side: LCG File Catalog and the File Transfer Service driving the transfer network between the Tier0 SE and Tier1 SEs A, B and C.]
Bulk asynchronous file replication
Requests set in RequestDB
Transfer Agent executes periodically
‘Waiting’ or ‘Running’ requests obtained from RequestDB
FTS bulk transfer jobs submitted and monitored
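The periodic agent cycle described above can be sketched as follows; the request fields and the FTS submission call are placeholders (the real submission goes through the FTS interface, not shown here):

```python
def run_agent_cycle(request_db, submit_bulk_job):
    """One periodic pass: turn 'Waiting'/'Running' requests into bulk jobs."""
    submitted = []
    for req in request_db:
        if req["status"] in ("Waiting", "Running"):
            # In reality this would submit and monitor an FTS transfer job.
            job_id = submit_bulk_job(req["lfns"], req["sourceSE"], req["targetSE"])
            req["status"] = "Running"
            req["jobID"] = job_id
            submitted.append(job_id)
    return submitted


db = [
    {"status": "Waiting", "lfns": ["/lhcb/raw/1", "/lhcb/raw/2"],
     "sourceSE": "CERN-RAW", "targetSE": "RAL-RAW"},
    {"status": "Done", "lfns": [], "sourceSE": "", "targetSE": ""},
]
fake_submit = lambda lfns, src, dst: f"fts-job-{len(lfns)}-{src}->{dst}"
submitted = run_agent_cycle(db, fake_submit)
```

Completed requests ('Done') are skipped, so the cycle is safe to run repeatedly.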
FTS Architecture
Point to point channels defined:
CERN-T1s
Tier1-Tier1 matrix
Bulk Transfers Tested During SC3 and LHCb’s DC06
[Diagram: each Tier1 – PIC, IN2P3, CNAF, FZK, RAL, SARA – runs an FTS server managing its incoming channels.]
Bulk Transfer Performance
[Plot: transfer rate (MB/s, 0–60) vs date, 9/10/05 to 6/11/05, for CERN_Castor -> RAL_dCache, PIC_Castor, SARA_dCache, IN2P3_HPSS, GRIDKA_dCache and CNAF_Castor during SC3. Annotations mark many Castor 2 problems, a service intervention, SARA problems, and the required rate.]
Half-Time Summary
RAW data arrives at Castor; 60 MB/s out of CERN to Tier1s
DIRAC Transfer Agent interfaced to LCG FTS
Monte Carlo files generated at Tier2s
Uploaded to Grid SEs using DIRAC DMS functionality
Stripped DSTs created at Tier1s
Mechanism still to be chosen for distribution:
Files transferred as they become available, or
Wait for a collection of files and perform bulk transfers, utilising Tier1–Tier1 channels
Strategy for replication also to be decided
LHCb Online to Castor
Files created at LHCb Online Farm at 60MB/s
These files must be transferred to Castor
DIRAC Instance installed on gateway at Farm
Online ‘data mover’ places transfer request
Processed by ReplicaManager and TransferAgent
[Diagram – DIRAC at the pit: the Online 'data mover' queries the Online Run Database and places requests via XML-RPC in the RequestDB; the Transfer Agent and ReplicaManager transfer files from Online storage to CERN Castor, registering them in the BK DB, LFC and ADTDB through the FC API.]
Auto Data Transfers
DIRAC components developed to perform data driven production, reconstruction and stripping
ProcessingDB contains a pseudo file catalogue
Offers an API to manipulate catalogue entries
Based on 'transformations' contained in the DB
File 'mask' applied to LFNs
Can select files of given properties and locations
Data Management instance spawned: 'AutoDataTransferDB'
TransformationAgent manipulates the ProcessingDB API
Selects files of a particular type, i.e. raw/dst/rdst etc.
Submits DIRAC jobs to the WMS based on these files
Perform reconstruction or stripping
This component adapted to create ‘ReplicationAgent’ for Data Management operations
ReplicationAgent
Replication agent developed to allow automatic data transfers when files become available
Transformations defined for each DM operation to be performed
Defines source and target SEs
File mask
Number of files to be transferred in each job
ReplicationAgent operation:
Checks active files in the ProcDB
Applies mask based on file type
Checks the location of the file
Files which pass the mask and match the SourceSE are selected for the transformation
Once a threshold number of files is found, bulk transfer jobs are submitted
ReplicationAgent logic generalised so multiple transforms can be defined and run simultaneously
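The selection-and-threshold step above can be sketched as follows; the tuple layout and function name are invented for illustration, not the actual ProcessingDB API:

```python
def select_for_transformation(files, mask, source_se, threshold):
    """files: (lfn, file_type, se) tuples, as if read from the ProcessingDB.
    Returns bulk jobs of 'threshold' files each, once enough accumulate."""
    selected = [lfn for (lfn, ftype, se) in files
                if ftype == mask and se == source_se]
    if len(selected) < threshold:
        # Not enough files yet -- wait for the next agent cycle.
        return []
    return [selected[i:i + threshold]
            for i in range(0, len(selected) - threshold + 1, threshold)]


files = [("/lhcb/raw/1", "raw", "CERN-RAW"),
         ("/lhcb/raw/2", "raw", "CERN-RAW"),
         ("/lhcb/dst/1", "dst", "CERN-RAW"),   # fails the 'raw' mask
         ("/lhcb/raw/3", "raw", "CERN-RAW")]
jobs = select_for_transformation(files, mask="raw", source_se="CERN-RAW",
                                 threshold=3)
```

Running several transformations at once amounts to calling this selection with different (mask, source SE, target SE) triples over the same file list.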
Automatic RAW Replication
[Diagram – automatic RAW replication: in the ADTDB, the ReplicationAgent reads the ProcDB (alongside the BK DB and LFC behind the FC API) and places requests in the RequestDB of the DIRAC Data WMS; the Transfer Agent and ReplicaManager submit gLite FTS jobs replicating RAW data from CERN Castor to the Tier1 SEs.]
Performance…
Stripped DST Replication
[Diagram – stripped DST replication: the same chain (ReplicationAgent, ADTDB, RequestDB, Transfer Agent, ReplicaManager, gLite FTS, BK DB/LFC via the FC API) replicates stripped DSTs produced on the Tier1 CE/WN from the local Tier1 SE to all the other Tier1 SEs.]
File Integrity Checking
Need to maintain integrity of the file catalogues
Catalogue entries present on SEs:
Regular listing of catalogue entries
Check that these entries exist on the SEs via SRM functionality
Files missing from SEs can be re-replicated
SE contents against catalogues:
List the contents of the SE
Check against the catalogue for corresponding replicas
Possible because of file naming conventions: file paths on the SE are always 'SEhost/SAPath/LFN'
Files missing from the catalogue can be re-registered in the catalogue or deleted from the SE, depending on file properties
These processes will eventually be run regularly as a DIRAC Agent or daemon process
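The two consistency checks can be sketched as set comparisons; the function names and example paths are invented for illustration, relying only on the 'SEhost/SAPath/LFN' naming convention stated in the talk:

```python
def check_catalog_against_se(catalog_lfns, se_paths, se_host, sa_path):
    """Return LFNs registered in the catalogue but missing on the SE."""
    expected = {f"{se_host}{sa_path}{lfn}" for lfn in catalog_lfns}
    return sorted(expected - set(se_paths))


def check_se_against_catalog(catalog_lfns, se_paths, se_host, sa_path):
    """Return SE paths with no corresponding catalogue replica ('dark' data).
    Works because SE paths always follow 'SEhost/SAPath/LFN'."""
    prefix = f"{se_host}{sa_path}"
    return sorted(p for p in se_paths
                  if not (p.startswith(prefix) and p[len(prefix):] in catalog_lfns))


catalog = {"/lhcb/raw/a", "/lhcb/raw/b"}           # b has no SE copy
on_se = ["srm://host/lhcb/data/lhcb/raw/a",
         "srm://host/lhcb/data/lhcb/raw/c"]        # c is unregistered
missing = check_catalog_against_se(catalog, on_se, "srm://host", "/lhcb/data")
dark = check_se_against_catalog(catalog, on_se, "srm://host", "/lhcb/data")
```

The first list feeds re-replication; the second is either re-registered or deleted, depending on file properties.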
Summary
DIRAC DMS built from the ReplicaManager accessing the File Catalogue and Storage Element interfaces
TransferAgent also extended to perform bulk transfers using FTS
DMS used to get RAW data from LHCb Online to Castor
Then to distribute it to the Tier1s in a load-balanced way
Reconstruction jobs created automatically
Data-driven mechanism to perform reconstruction and stripping
Transfer jobs created automatically to distribute data
Questions…?