www.see-grid-sci.eu
SEE-GRID-SCI
The SEE-GRID-SCI initiative is co-funded by the European Commission under the FP7 Research Infrastructures contract no. 211338
gLite Data Services and Data Management
Branko Marovic, RCUB - UoB
Meteo VO Training, Belgrade, 24-25 June 2008
Application Gridification 3/31
Data Management
LCG-2 (LCG-2 User Guide, “man” pages)
• LCG-UTILS API – C/C++
• LFC API – C/C++, Python
• GFAL API – C/C++, Python
http://grid-deployment.web.cern.ch/grid-deployment/gis/GFAL/GFALindex.html
SEEGRID Wiki “SG Using file replicas and RFIO: UI configuration, rfiod, usage in apps, limitations and workarounds”
http://wiki.egee-see.org/index.php/SG_Using_file_replicas_and_RFIO:_UI_configuration%2C_rfiod%2C_usage_in_apps%2C_limitations_and_workarounds
• Configuring UI, SE, RB
• Site testing of RFIO/GFAL
• Typical problems and solutions
• Java access to LFC and LCG-UTILS
Java LFC/GFAL wrapper http://grid02.rcub.bg.ac.yu/LFCJavaAPI/index.html
Customizable LFC web front end (upload, list, replicate, delete) http://grid02.rcub.bg.ac.yu/repmngr/
gLite http://grid-deployment.web.cern.ch/grid-deployment/documentation/DataManagement/R3.0/
Scope of data services in gLite
Simply put, the DMS provides all the operations that we are all used to performing:
• Uploading/downloading files
• Creating files/directories
• Renaming files/directories
• Deleting files/directories
• Moving files/directories
• Listing directories
• Creating symbolic links
Scope of data services in gLite
Files are write-once, read-many
• Files cannot be changed unless removed or replaced
• If users edit files, then they manage the consequences! Maybe just create a new filename!
No intention of providing a global file management system
3 service types for data
• Storage
• Catalogs
• Transfer
Data Issues and Grid Solutions
Resource centers need to meet growing demand for storage
• Storage Element capable of managing multiple disk pools
• Disk Pool Manager (DPM), dCache, CASTOR
Data is stored on different storage system technologies
• Common interface required to hide underlying complexity
• Storage Resource Manager (SRM) – storage management protocol
• GridFTP – secure file transfer
Data is stored at different locations with separate namespaces
• File catalogue to provide a uniform view of Grid data
• LCG File Catalog (LFC)
Applications need to access Grid data management services
• Data management API
• GFAL
Data Management
The Storage Element (SE) is the service that allows a user or an application to store data for future retrieval. In gLite, every SE must have a GSIFTP server, which offers basically the same functionality as FTP, enhanced to support GSI security.
Files that are copied to an SE should then be registered in a catalog. A catalog is basically a database that maps the name of a file (logical file name, LFN) to its physical location (physical file name).
A file in a catalog may have more than one LFN (in principle, an LFN has nothing to do with the file's real name) and more than one replica (that is, the same file may be present on two different SEs). What uniquely identifies the file is the guid, grid unique identifier, a string of 40 bytes.
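The mapping described above (several LFNs and several replicas, all tied together by one GUID) can be pictured with a small in-memory model. This is only an illustrative sketch: the class and method names (ToyCatalog, register, add_alias, add_replica, replicas) and the example.org hosts are hypothetical and are not part of the LFC API, and the GUID here is simply a random UUID.

```python
import uuid
from dataclasses import dataclass, field


@dataclass
class CatalogEntry:
    """One file: its GUID, its aliases (LFNs) and its replicas (SURLs)."""
    guid: str
    lfns: set = field(default_factory=set)
    surls: set = field(default_factory=set)


class ToyCatalog:
    """Hypothetical in-memory catalog; this is NOT the LFC API."""

    def __init__(self):
        self._entries = {}   # guid -> CatalogEntry
        self._guid_of = {}   # lfn  -> guid

    def register(self, lfn, surl):
        """Register a new file under one LFN with its first replica."""
        if lfn in self._guid_of:
            raise ValueError(f"LFN already in use: {lfn}")
        guid = f"guid:{uuid.uuid4()}"
        self._entries[guid] = CatalogEntry(guid, {lfn}, {surl})
        self._guid_of[lfn] = guid
        return guid

    def add_alias(self, guid, lfn):
        """Attach an additional LFN to an existing file."""
        if lfn in self._guid_of:
            raise ValueError(f"LFN already in use: {lfn}")
        self._entries[guid].lfns.add(lfn)
        self._guid_of[lfn] = guid

    def add_replica(self, guid, surl):
        """Record that a copy of the file also exists at another SURL."""
        self._entries[guid].surls.add(surl)

    def replicas(self, lfn):
        """Resolve an LFN to the sorted list of its replicas."""
        return sorted(self._entries[self._guid_of[lfn]].surls)


catalog = ToyCatalog()
guid = catalog.register("lfn:/grid/gilda/demo/track1",
                        "srm://se1.example.org/gilda/track1")
catalog.add_replica(guid, "srm://se2.example.org/gilda/track1")
catalog.add_alias(guid, "lfn:/grid/gilda/demo/best_track")
print(catalog.replicas("lfn:/grid/gilda/demo/best_track"))
```

Both LFNs resolve to the same two replicas because the lookup always goes through the GUID.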
Data management example
[Figure: the “User interface” sends the input “sandbox” to the Resource Broker, which passes it on with Broker Info to the Computing Element; the output “sandbox” returns the same way (max. 20 MByte). DataSets info is held in the LCG File Catalogue (LFC). Storage Element 1 and Storage Element 2 hold the data.]
1st job writes and replicates output onto 2 SEs
Data management example cont.
[Figure: the “User interface” sends the input “sandbox” to the Resource Broker, which passes it on with Broker Info to the Computing Element; the output “sandbox” returns the same way (max. 20 MByte). DataSets info is held in the LCG File Catalogue (LFC). Storage Element 1 and Storage Element 2 hold the data.]
2nd job reads input from an SE
Keep computation close to the stored data
Data management example
[Figure: “User interface”, LCG File Catalogue (LFC), Storage Element 1 and Storage Element 2. In the catalogue, “Myfile.dat” is linked through its guid to File_on_se1 and File_on_se2.]
• File replicated onto 2 SEs
Resolving logical file name
[Figure: “User interface”, LCG File Catalogue (LFC), Storage Element 1 and Storage Element 2, with the following labels.]
“Myfile.dat”: “Logical filename”
File_on_se1 (“SURL”: site URL)
File_on_se2 (“SURL”: site URL)
“GUID”: Global Unique Identifier
Content is available on 2 SEs
File content cannot change: no need to synchronize replicas
Name conventions
Logical File Name (LFN): an alias created by a user to refer to some item of data, e.g.
lfn:/grid/gilda/budapest23/run2/track1
Globally Unique Identifier (GUID): a non-human-readable unique identifier for an item of data, e.g.
guid:f81d4fae-7dec-11d0-a765-00a0c91e6bf6
Site URL (SURL), also called Physical File Name (PFN) or Site FN: the location of an actual piece of data on a storage system, e.g.
srm://pcrd24.cern.ch/flatfiles/cms/output10_1 (SRM)
sfn://lxshare0209.cern.ch/data/alice/ntuples.dat (Classic SE)
Transport URL (TURL): temporary locator of a replica plus access protocol, understood by an SE, e.g.
rfio://lxshare0209.cern.ch//data/alice/ntuples.dat
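The four kinds of name can be told apart by their scheme prefix. The sketch below is a minimal illustration using only the Python standard library; the helper names (classify, storage_host) are hypothetical, and the scheme table is an assumption that covers only the prefixes shown on this slide plus gsiftp, not an exhaustive list.

```python
from urllib.parse import urlparse

# Assumed mapping from URL scheme to the kind of grid file name.
NAME_KINDS = {
    "lfn": "LFN",
    "guid": "GUID",
    "srm": "SURL",
    "sfn": "SURL",
    "rfio": "TURL",
    "gsiftp": "TURL",
}


def classify(name):
    """Return 'LFN', 'GUID', 'SURL' or 'TURL' for a grid file name."""
    scheme, sep, _rest = name.partition(":")
    if not sep:
        raise ValueError(f"no scheme prefix in {name!r}")
    kind = NAME_KINDS.get(scheme.lower())
    if kind is None:
        raise ValueError(f"unknown scheme {scheme!r} in {name!r}")
    return kind


def storage_host(name):
    """Return the SE host of a SURL or TURL, or None for an LFN or GUID."""
    if classify(name) not in ("SURL", "TURL"):
        return None
    return urlparse(name).hostname


for example in (
    "lfn:/grid/gilda/budapest23/run2/track1",
    "guid:f81d4fae-7dec-11d0-a765-00a0c91e6bf6",
    "srm://pcrd24.cern.ch/flatfiles/cms/output10_1",
    "rfio://lxshare0209.cern.ch//data/alice/ntuples.dat",
):
    print(classify(example), storage_host(example))
```

Only SURLs and TURLs carry a host name; LFNs and GUIDs are location-independent, which is why the catalogue is needed to resolve them.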
Name conventions
Users primarily access and manage files through “logical filenames”
• Mapping by the “LFC” catalogue server
LFC Namespace: defined by the user
LFC has a directory tree structure: lfn:/grid/<VO_name>/<you create it>
LFC directories
LFC directories = virtual directories
Each entry in the directory may be stored on different SEs
[Figure: in the LCG File Catalogue (LFC), the directory lfn:/grid/gilda/budapest23/run2/ contains the entries input1, input2 and input3, which point to files on the Storage Elements listed below.]
Storage Element 1: sfn://grid005.iucc.ac.il/storage/gilda/generated/2007-06-23/fileb233d43f-5bc6-4ede-a5fe-611d48be2ba5
Storage Element 2: srm://aliserv6.ct.infn.it/dpm/ct.infn.it/home/gilda/generated/2007-06-23/filea21ab3e2-8ff6-4a44-82a7-f2
Storage Element 3: sfn://trigriden01.unime.it/flatfiles/SE00/gilda/generated/2007-06-23/filec79a9e3c-2485-4206-a2a5-235f
Storage Element 4: sfn://grid005.iucc.ac.it/flatfiles/SE00/gilda/generated/2007-06-23/filec79a9e3c-2485-4206-a2a5-235f
LCG File Catalog
gLite supports two different types of catalogs: LFC (LCG File Catalog) and RLS (Replica Location Server).
The catalog can be accessed using data management commands from the UI. Two environment variables must be set: the file catalog type and its address:
export LCG_CATALOG_TYPE=lfc
export LFC_HOST=lfc-atlas-test.cern.ch
LFNs in the LFC have a particular form: they are organized in a hierarchical, directory-like structure that looks like this:
lfn:/grid/<VO>/<dir>/<filename>
There are several LFC hosts on LCG and they are not synchronized, so users must keep to the same host throughout their activity! Usually there is a central LFC per VO, so in practice there is basically no risk of this kind.
Two sets of commands
LFC = LCG File Catalogue
LCG = LHC Computing Grid
LHC = Large Hadron Collider
Use LFC commands to interact with the catalogue only
• To create a catalogue directory
• To list files
• Used by you, your application and by lcg-utils (see below)
lcg-utils
• Couples catalogue operations with file management: keeps SEs and catalogue in step!
• Copy files to/from/between SEs
• Replicate files
LFC basics
LFC Namespace: defined by the user
LFC has a directory tree structure: /grid/<VO_name>/<you create it>
• All members of a given VO have read-write permissions in their directory
• Commands look like UNIX with “lfc-” in front (often)
Storage Element
Provides
• Storage for files: massive storage system, disk or tape based
• Transfer protocol (gsiFTP) ~ GSI-based FTP server
• Striped file transfer – cluster as back-end
[Figure: a file request with a VOMS proxy reaches the Storage Element server, which handles authentication and authorization in front of the file system.]
Data management commands
Data management commands are of the form lcg-*.
Some of them only access the catalog:
lcg-aa    add alias
lcg-ra    remove alias
lcg-rf    register file
lcg-uf    unregister file
lcg-la    list aliases
lcg-lg    list guid
lcg-lr    list replicas
Some of them perform real data movement operations, usually updating the catalog accordingly:
lcg-cp    copy a file locally (this command does not write to the catalog)
lcg-cr    copy and register a file on an SE
lcg-del   delete (physically) a file and its entry in the catalog
lcg-rep   replicate a file from one SE to another
For these commands to work, besides the 2 catalog variables, another environment variable must be set:
export LCG_GFAL_INFOSYS=<BDII_address:2170>
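Before calling any lcg-* command from a script, it can help to check that the three variables are present. The following is a minimal sketch using only the Python standard library; the helper name missing_vars and the example dictionary are illustrative assumptions, and the host value is the one shown earlier on the slides.

```python
import os

# The two catalog variables plus the information system variable.
REQUIRED_VARS = ("LCG_CATALOG_TYPE", "LFC_HOST", "LCG_GFAL_INFOSYS")


def missing_vars(env=None):
    """Return the required variables that are unset or empty in env."""
    env = os.environ if env is None else env
    return [name for name in REQUIRED_VARS if not env.get(name)]


# Example environment with the information system variable left out.
example_env = {
    "LCG_CATALOG_TYPE": "lfc",
    "LFC_HOST": "lfc-atlas-test.cern.ch",
}
print(missing_vars(example_env))  # ['LCG_GFAL_INFOSYS']
```

Calling missing_vars() with no argument checks the real process environment instead of the example dictionary.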
GFAL C API
GFAL (Grid File Access Library) is a POSIX-like interface for operations on files on Storage Elements
• Enables remote handling of files
• Libraries are in C and can be included in C/C++ sources (GFAL Java API tomorrow!)
• The most common I/O operations are available: just prefix gfal_ to the function name (open(), read()…)
• man gfal for further details
• The destination SE must provide secure rfio (classic SEs don’t)
GFAL API Description: http://grid-deployment.web.cern.ch/grid-deployment/documentation/LFC_DPM/gfal/html
GFAL API code snippet
Examples in gLite3 User Guide (Appendix F) https://edms.cern.ch/file/722398//gLite-3-UserGuide.pdf
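#include <fcntl.h>
#include <sys/stat.h>
#include "gfal_api.h"

int fd, cod_ex;
struct stat file_stat;

fd = gfal_open(file_ref, O_RDONLY, 0644);            /* open the remote file read-only */
cod_ex = gfal_stat(file_ref, &file_stat);            /* get its size and attributes */
...
cod_ex = gfal_read(fd, buffer, file_stat.st_size);   /* read the whole file into buffer */
...
cod_ex = gfal_close(fd);                             /* release the descriptor */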
LFC and LCG utils
• List directory
• Create a local file, then upload it to an SE and register it with a logical name (lfn) in the catalogue
• Create a duplicate in another SE
• List the replicas
[Figure: “User interface”, LCG File Catalogue (LFC), Storage Element 1 and Storage Element 2, with the operations labelled lfc-* and lcg-*.]
LFC and LCG utils
• List directory
• Create a local file, then upload it to an SE and register it with a logical name (lfn) in the catalogue
• Create a duplicate in another SE
• List the replicas
• Create a second logical file name for a file
• Download a file from an SE to the UI
[Figure: “User interface”, LCG File Catalogue (LFC), Storage Element 1 and Storage Element 2, with the operations labelled lfc-* and lcg-*.]
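The steps listed above can be sketched as a sequence of command lines. The Python helper below only builds and prints the commands, it does not run them; the function names, the VO, the hosts and the paths are hypothetical, and the options (--vo, -d, -l) are written as recalled from the lcg-utils “man” pages, so they should be checked on the UI before use. The alias step is left out because it needs the GUID that lcg-cr prints.

```python
import posixpath
import shlex


def workflow_commands(vo, catalog_path, local_path, first_se, second_se):
    """Build argv lists for: list, upload and register, replicate, list replicas."""
    lfn = "lfn:" + catalog_path
    return [
        # lfc-* commands take the catalogue path without the lfn: prefix
        ["lfc-ls", "-l", posixpath.dirname(catalog_path)],
        ["lcg-cr", "--vo", vo, "-d", first_se, "-l", lfn, "file://" + local_path],
        ["lcg-rep", "--vo", vo, "-d", second_se, lfn],
        ["lcg-lr", "--vo", vo, lfn],
    ]


def download_command(vo, catalog_path, local_path):
    """Build the argv list that copies a replica back to the UI."""
    return ["lcg-cp", "--vo", vo, "lfn:" + catalog_path, "file://" + local_path]


commands = workflow_commands(
    vo="seegrid",
    catalog_path="/grid/seegrid/demo/hello.txt",
    local_path="/tmp/hello.txt",
    first_se="se1.example.org",
    second_se="se2.example.org",
)
commands.append(download_command("seegrid", "/grid/seegrid/demo/hello.txt",
                                 "/tmp/hello_copy.txt"))
for argv in commands:
    print(shlex.join(argv))
```

Keeping each command as a list of arguments makes it straightforward to pass to subprocess.run later without shell quoting problems.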
LFC commands
There are, on the UI, some commands that interact directly with the LFC catalog. Thanks to the hierarchical structure of LFNs, files in the LFC catalog can be browsed as if they were in a UNIX filesystem. Try this:
> lfc-ls /grid/seegrid
The lfc-ls command works just like ls on a local filesystem (it also accepts the -l option). In the same way, lfc-mkdir, lfc-chmod and lfc-chown behave almost like their UNIX counterparts.
Despite the ease of use of the LFC commands, usually only lfc-ls is used. Commands that write to the catalog or delete “directories” from it should be used with great caution: the risk is to cause inconsistencies between the catalog and the files on the SEs.
The data management commands ensure that such inconsistencies are not created. These commands write to the catalog, but they also check that no “harm” is done to the system.
Summary of the LFC Catalog commands
lfc-chmod        Change access mode of the LFC file/directory
lfc-chown        Change owner and group of the LFC file/directory
lfc-delcomment   Delete the comment associated with the file/directory
lfc-getacl       Get file/directory access control lists
lfc-ln           Make a symbolic link to a file/directory
lfc-ls           List file/directory entries in a directory
lfc-mkdir        Create a directory
lfc-rename       Rename a file/directory
lfc-rm           Remove a file/directory
lfc-setacl       Set file/directory access control lists
lfc-setcomment   Add/replace a comment
Low level commands
There are some “low-level” commands available to grid users that should be used with caution, since they work only on the SE without updating the catalog. Still, 2 of them will prove to be real friends to anyone who has to look for files on the grid:
edg-gridftp-ls gsiftp://<SE_address>/<dir>/
globus-url-copy <src_file> <dest_file>
The first command lists the content of a directory on a remote SE; the second is the basis of every lcg tool that has to move data. The <src_file> and <dest_file> must be in fully qualified format:
file:///<abs_path>/<file_name> for local files
gsiftp://<SE_address>/<abs_path>/<file_name> for remote files
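The two fully qualified formats can be produced by small helpers. This is a sketch under stated assumptions: the function names are hypothetical, each path is passed as a complete absolute path starting with “/”, and the host and paths in the example are made up. The command is only assembled, not executed.

```python
def local_url(abs_path):
    """Return the file:// URL of a local file given its absolute path."""
    if not abs_path.startswith("/"):
        raise ValueError("local path must be absolute")
    return "file://" + abs_path


def gsiftp_url(se_address, abs_path):
    """Return the gsiftp:// URL of a file on a remote SE."""
    if not abs_path.startswith("/"):
        raise ValueError("remote path must be absolute")
    return f"gsiftp://{se_address}{abs_path}"


def copy_command(src_url, dest_url):
    """Assemble the argv list for globus-url-copy <src_file> <dest_file>."""
    return ["globus-url-copy", src_url, dest_url]


argv = copy_command(
    local_url("/home/user/data/input1.dat"),
    gsiftp_url("se1.example.org", "/storage/seegrid/input1.dat"),
)
print(" ".join(argv))
```

Rejecting relative paths early avoids building URLs such as file://data/input1.dat, in which “data” would be read as a host name.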
Other useful low-level commands (to be used carefully!) are
edg-gridftp-rm <URL>
edg-gridftp-rmdir <URL>
edg-gridftp-rename <src_URL> <dest_URL>
File Transfer Service
FTS is a low-level data movement service
Why is it needed?
• Improves reliability of transfers
• Provides asynchronous file transfer: transfers are scheduled when resources are available
• Provides control of transfer properties (channel concept)
No catalogue interactions yet: users have to handle SURLs
FTS Concepts
Transfer Job
• A set of source/destination pairs specifying files to transfer
• Submitted to FTS for processing
Channel
• A job is assigned to a channel after submission
• Represents a point-to-point network link
• Catch-all channels are possible: any-to-me, me-to-any
• Similar to a queue where you can specify the VO share for the queue, the number of concurrent file transfers and the number of concurrent streams (gridFTP)
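The channel concept can be illustrated with a toy model in which a transfer is matched against point-to-point and catch-all channels. This is not how the FTS server is implemented: the class, the “*” wildcard, the site names and the rule that a more specific channel wins over a catch-all one are all assumptions made for the sake of the example.

```python
from dataclasses import dataclass

ANY = "*"  # hypothetical marker for the "any" side of a catch-all channel


@dataclass(frozen=True)
class Channel:
    source: str      # source site, or ANY
    dest: str        # destination site, or ANY
    max_files: int   # number of concurrent file transfers
    streams: int     # number of concurrent gridFTP streams per transfer

    def matches(self, src_site, dst_site):
        """True if this channel can carry a transfer between the two sites."""
        return self.source in (ANY, src_site) and self.dest in (ANY, dst_site)

    def specificity(self):
        """2 for point-to-point, 1 for any-to-me or me-to-any, 0 for any-to-any."""
        return (self.source != ANY) + (self.dest != ANY)


def pick_channel(channels, src_site, dst_site):
    """Pick the most specific matching channel (assumed selection rule)."""
    candidates = [c for c in channels if c.matches(src_site, dst_site)]
    if not candidates:
        raise LookupError(f"no channel for {src_site} -> {dst_site}")
    return max(candidates, key=Channel.specificity)


channels = [
    Channel("SITE-A", "SITE-B", max_files=5, streams=2),  # point-to-point
    Channel(ANY, "SITE-B", max_files=2, streams=1),       # any-to-me
]
print(pick_channel(channels, "SITE-A", "SITE-B"))
print(pick_channel(channels, "SITE-C", "SITE-B"))
```

In this model a transfer from SITE-A uses the dedicated link, while a transfer from any other site falls back to the any-to-me channel with its lower concurrency limits.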
FTS architecture
All components are decoupled from each other
• Each interacts only with the database
Experiments interact via web-service
• User: FileTransfer
• Admin: ChannelManagement
VO agents assign jobs to channels
Channel agents manage assigned file transfers
Monitoring and statistics can be collected via the DB
Summary of FTS client commands
glite-transfer-submit         Submit a transfer job: needs at least source and destination SURL
glite-transfer-status         Given one or more job IDs, query their status
glite-transfer-cancel         Delete the transfer with the given job ID
glite-transfer-list           Query the status of all the user’s jobs; supports options for query restrictions
glite-transfer-channel-list   Show all available channels; detailed info only if the user has admin privileges