Don Quijote
Data Management for the ATLAS Automatic Production System
CHEP 2004
Miguel Branco – CERN [email protected]
27/09/2004 Don Quijote - CHEP 2004 2
Overview
Introduction
Architecture
End-user tools and APIs
Future Plans, Conclusion and Additional Information
ATLAS Data Challenges
ATLAS decided to undertake a series of Data Challenges in order to validate its Computing Model, its software, its data model
Started summer 2004:
  o ATLAS DC-2
Introduced the new ATLAS Automatic Production System:
  o Unsupervised production across many sites spread over three different Grids (US Grid3, NorduGrid, LCG-2)
  o 3 major components:
      o Windmill: ATLAS Production Supervisor
      o Job Executors: one executor per “grid-flavor”
      o Common Data Management system
The decision was taken to implement a single data management system capable of accessing all ATLAS Data Challenges data
Don Quijote
Don Quijote (DQ) is a high-level interface for grid data management for the ATLAS Automatic Production System
Allow transparent registration and movement of replicas between all grid “flavors” used by ATLAS
  o US Grid3, NorduGrid and LCG-2
Avoid creating yet another replica and metadata catalog
Use existing catalogs and data management tools
  o Find common features between tools and catalogs
  o “Bridge” them and provide a unified interface
Accessible as a Service
  o Lightweight clients
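The “bridge” idea above can be sketched as one common interface with an adapter per catalog. This is only an illustration: the names ReplicaCatalog, InMemoryCatalog, UnifiedCatalog, list_replicas and register_replica are hypothetical and are not the DQ API, and a dictionary stands in for a real backend such as Globus RLS or LCG RLS.

```python
from abc import ABC, abstractmethod


class ReplicaCatalog(ABC):
    """Hypothetical common interface; not the actual DQ API."""

    @abstractmethod
    def list_replicas(self, lfn):
        """Return the physical file names (PFNs) registered for a logical file name."""

    @abstractmethod
    def register_replica(self, lfn, pfn):
        """Record that pfn is a replica of lfn."""


class InMemoryCatalog(ReplicaCatalog):
    """Stand-in for a real catalog backend (e.g. Globus RLS or LCG RLS)."""

    def __init__(self, grid):
        self.grid = grid
        self._replicas = {}

    def list_replicas(self, lfn):
        return list(self._replicas.get(lfn, []))

    def register_replica(self, lfn, pfn):
        self._replicas.setdefault(lfn, []).append(pfn)


class UnifiedCatalog:
    """Queries every grid's catalog through the same interface."""

    def __init__(self, catalogs):
        self._catalogs = {c.grid: c for c in catalogs}

    def list_replicas(self, lfn):
        # Map each grid name to the replicas its catalog knows about,
        # leaving out grids that hold no replica of this LFN.
        found = {}
        for grid, catalog in self._catalogs.items():
            pfns = catalog.list_replicas(lfn)
            if pfns:
                found[grid] = pfns
        return found


if __name__ == "__main__":
    # File name and storage host are made up for the example.
    ng = InMemoryCatalog("NorduGrid")
    lcg = InMemoryCatalog("LCG-2")
    ng.register_replica("dc2.example.root", "gsiftp://se.ng.example.org/dc2.example.root")
    unified = UnifiedCatalog([ng, lcg, InMemoryCatalog("Grid3")])
    print(unified.list_replicas("dc2.example.root"))
```

The caller only ever sees UnifiedCatalog, so a new catalog type would need a new adapter class but no change on the client side.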
Architecture
[Diagram: Client; Globus RLS 2.x (US Grid3); Globus RLS 2.x (NorduGrid); LCG RLS (LCG-2)]
Servers
  o One per “Grid”
  o GSI-enabled version and insecure version (with service certificate)
  o Multiple configuration settings
Client
  o C++ client API
  o User interface tools in Python
  o Configuration file indicating endpoint of each server
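The slides do not show the client configuration file itself. As a rough sketch of what "one endpoint per server" could look like, the following assumes an INI layout; the section names, the endpoint key and the URLs are all hypothetical.

```python
import configparser

# Hypothetical configuration text: the real DQ file format is not shown in the slides.
EXAMPLE_CONFIG = """
[lcg]
endpoint = https://dq-lcg.example.org:8443/dq

[grid3]
endpoint = https://dq-grid3.example.org:8443/dq

[nordugrid]
endpoint = https://dq-ng.example.org:8443/dq
"""


def load_endpoints(text):
    """Return a {grid name: server endpoint} mapping parsed from INI text."""
    parser = configparser.ConfigParser()
    parser.read_string(text)
    return {section: parser[section]["endpoint"] for section in parser.sections()}


if __name__ == "__main__":
    for grid, endpoint in load_endpoints(EXAMPLE_CONFIG).items():
        print(grid, endpoint)
```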
Architecture
[Diagram: DQ Client exchanging messages with the DQ-LCG server, the DQ-Grid3 server and the DQ-NG server]
Request: Replicate this LFN to castorgrid.ific.uv.es
  o Client: Who has replicas of the LFN?
  o Server: These are my replicas
  o Client: Ok. Stage this one and return me a GridFTP Transport URL
  o Server: Here is the TURL
  o Client: Whomever owns castorgrid.ific.uv.es please copy a file from this Transport URL and register the replica in the replica catalog maintaining these metadata attributes.
  o Server: Ok. Taking care of it. Will let you know when it’s done.
  o 3rd party-transfer: Source Storage from NG to castorgrid.ific.uv.es
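The message exchange in this diagram can be sketched with in-memory stand-ins for the per-grid servers. Everything here is illustrative: FakeDqServer, replicate and their methods are hypothetical names, no real transfer or catalog call is made, and which server owns castorgrid.ific.uv.es is an assumption made only for the example.

```python
class FakeDqServer:
    """In-memory stand-in for one per-grid DQ server (hypothetical interface)."""

    def __init__(self, grid, storages):
        self.grid = grid
        self.storages = set(storages)  # storage hosts this server is responsible for
        self.catalog = {}              # LFN -> list of storage hosts holding a replica

    def find_replicas(self, lfn):
        return list(self.catalog.get(lfn, []))

    def stage(self, lfn, storage):
        # A real server would stage the file and return a GridFTP transport URL.
        return f"gsiftp://{storage}/{lfn}"

    def copy_and_register(self, turl, lfn, storage, metadata):
        # A real server would start a third-party transfer from turl here.
        self.catalog.setdefault(lfn, []).append(storage)
        return {"lfn": lfn, "source": turl, "destination": storage, "metadata": metadata}


def replicate(servers, lfn, destination, metadata=None):
    """Ask who has the LFN, obtain a TURL, then hand the copy to the destination's owner."""
    # "Who has replicas of the LFN?"
    source = next((s for s in servers if s.find_replicas(lfn)), None)
    if source is None:
        raise LookupError(f"no replica of {lfn} found")
    # "Stage this one and return me a GridFTP Transport URL"
    turl = source.stage(lfn, source.find_replicas(lfn)[0])
    # "Whomever owns <destination> please copy a file from this Transport URL"
    owner = next((s for s in servers if destination in s.storages), None)
    if owner is None:
        raise LookupError(f"no server owns {destination}")
    return owner.copy_and_register(turl, lfn, destination, metadata or {})


if __name__ == "__main__":
    ng = FakeDqServer("NorduGrid", ["se.ng.example.org"])
    # Assumption for the example: the LCG-2 server is the owner of this storage.
    lcg = FakeDqServer("LCG-2", ["castorgrid.ific.uv.es"])
    ng.catalog["dc2.example.root"] = ["se.ng.example.org"]
    print(replicate([lcg, ng], "dc2.example.root", "castorgrid.ific.uv.es"))
```

The client never moves data itself in this sketch; it only brokers the TURL between the source and destination servers, which matches the third-party transfer shown in the diagram.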
DQ modules
Current structure:
[Diagram: module structure]
  o DqCore, DqConfigFile, DqFactory, DqInterface, DqMonitor, DqUI
  o DqGlobusRls, DqPoolRls, DqClassicReplicaAccess, DqLcgReplicaAccess
  o DqLcgInfoService, DqVdtInfoService, DqNgInfoService
  o DqServerLcg, DqServerNg, DqServerVdt
  o C++ Client Module
  o dq.py: Python Module (C++/Python wrapper)
  o dms.py: Production User Interface
  o dms2.py: End-user Client tool
Functionalities provided by API
What can be done using client API or command-line tools?
  o Search for replicas of logical files as well as metadata attributes
  o List storage locations
  o Replicate files between storage locations
  o Get a locally accessible physical file from a grid-storage
  o Put a file into a grid storage
  o Validate a file: md5 checksum, file size
  o Subject to security:
      o Renaming logical files
      o Removing logical files and physical replicas
All actions above can be executed within or across different grids
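The "validate a file" item (md5 checksum, file size) can be illustrated with the Python standard library. This is a generic sketch of that kind of check, not DQ's own implementation; the function names file_md5 and validate_file are hypothetical.

```python
import hashlib
import os
import tempfile


def file_md5(path, chunk_size=1024 * 1024):
    """Compute the MD5 hex digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def validate_file(path, expected_md5, expected_size):
    """Return True only if both the size and the MD5 checksum match."""
    if os.path.getsize(path) != expected_size:
        return False  # cheap check first; skip hashing on a size mismatch
    return file_md5(path) == expected_md5.lower()


if __name__ == "__main__":
    # Write a small temporary file and validate it against its own checksum.
    payload = b"example payload"
    with tempfile.NamedTemporaryFile(delete=False) as tmp:
        tmp.write(payload)
    try:
        expected = hashlib.md5(payload).hexdigest()
        print(validate_file(tmp.name, expected, len(payload)))
    finally:
        os.remove(tmp.name)
```

Checking the size before the checksum avoids reading a large file when the cheaper test already fails.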
End-user tools
Provide a single tool for end-users to manage data files
  o Integrates all tools that users would have to know about into a single one: POOL, EDG, Globus, Castor, …
Act as a Replica Manager
  o Although being “POOL-aware”, there is nothing ATLAS or HEP-specific
Eases security requirements for end-users
  o Temporarily and for some requests only!
Future plans
Decouple DQ modules into full Service Oriented Architecture
  o Outsource module implementations
Monitoring of Server requests
  o Most commonly accessed files/partitions/datasets, …
Reliable File Transfer service (Tier0 exercise)
Working on Documentation
  o Twiki-based
Interface to EGEE/gLite from ARDA project
  o Prototype being developed by Frederik Orellana
Future?
  o No plans for major rewrite, only refactoring
  o Most important is to maintain the same interface for end-users and for the production system
Conclusion
Don Quijote is becoming the default grid data file access layer for ATLAS
  o “New catalogs are coming from grid projects; we should stick with our present DQ insulation layer” (ATLAS Database and Data Management project)
Accomplished goal of exposing the middleware of different grids through a unified interface
Client tools for end-users as well as for production managers
DQ usage:
  o Can access ~32 TB of data and ~140K files produced so far by the ATLAS DC since early June
  o Total requests: over 600 000, mostly to the replica catalogs without file movement
  o File Transfers: only around 3 TB so far; will increase to around 35 TB with Tier 0 exercise in coming weeks
Overall, still a bit to go to provide a unified system to access ATLAS production data
  o DQ aims to help build that unified system
Additional information
DQ web page:
  o http://cern.ch/mbranco/cern/donquijote/
DQ docs (twiki):
  o https://uimon.cern.ch/twiki/bin/view/Atlas/DonQuijote
Feel free to contact me:
  o [email protected]