
gLite Data Services and Data Management

gLite Data Services and Data Management. Meteo VO Training, Belgrade, 24-25 June 2008. Branko Marovic, RCUB - UoB.
Transcript
Page 1: gLite Data Services and Data Management

www.see-grid-sci.eu

SEE-GRID-SCI

The SEE-GRID-SCI initiative is co-funded by the European Commission under the FP7 Research Infrastructures contract no. 211338

gLite Data Services and Data Management

Branko Marovic
RCUB - UoB

Meteo VO Training, Belgrade
24-25 June 2008

Page 2: gLite Data Services and Data Management

Application Gridification 3/31

Data Management

• LCG-2 (LCG-2 User Guide, “man” pages)
  • LCG-UTILS API – C/C++
  • LFC API – C/C++, Python
  • GFAL API – C/C++, Python
    http://grid-deployment.web.cern.ch/grid-deployment/gis/GFAL/GFALindex.html
• SEEGRID Wiki: “SG Using file replicas and RFIO: UI configuration, rfiod, usage in apps, limitations and workarounds”
  http://wiki.egee-see.org/index.php/SG_Using_file_replicas_and_RFIO:_UI_configuration%2C_rfiod%2C_usage_in_apps%2C_limitations_and_workarounds
  • Configuring UI, SE, RB
  • Site testing of RFIO/GFAL
  • Typical problems and solutions
• Java access to LFC and LCG-UTILS
  • Java LFC/GFAL wrapper: http://grid02.rcub.bg.ac.yu/LFCJavaAPI/index.html
  • Customizable LFC web front end (upload, list, replicate, delete): http://grid02.rcub.bg.ac.yu/repmngr/
• gLite
  http://grid-deployment.web.cern.ch/grid-deployment/documentation/DataManagement/

R3.0/

Page 3: gLite Data Services and Data Management

Scope of data services in gLite

Simply put, the DMS provides all the operations that we are all used to performing:
• Uploading/downloading files
• Creating files/directories
• Renaming files/directories
• Deleting files/directories
• Moving files/directories
• Listing directories
• Creating symbolic links

Page 4: gLite Data Services and Data Management

Scope of data services in gLite

• Files are write-once, read-many
  • Files cannot be changed unless removed or replaced
  • If users edit files, then
    • they manage the consequences!
    • Maybe just create a new filename!
• No intention of providing a global file management system
• 3 service types for data
  • Storage
  • Catalogs
  • Transfer

Page 5: gLite Data Services and Data Management

Data Issues and Grid Solutions

• Resource centers need to meet a growing demand for storage
  • Storage Element capable of managing multiple disk pools
    • Disk Pool Manager (DPM), dCache, CASTOR
• Data is stored on different storage system technologies
  • Common interface required to hide the underlying complexity
    • Storage Resource Manager (SRM) – storage management protocol
    • GridFTP – secure file transfer
• Data is stored at different locations with separate namespaces
  • File catalogue to provide a uniform view of Grid data
    • LCG File Catalog (LFC)
• Applications need to access Grid data management services
  • Data management API
    • GFAL

Page 6: gLite Data Services and Data Management

Data Management

The Storage Element is the service that allows a user or an application to store data for future retrieval. In gLite, every SE must have a GSIFTP server, which offers basically the same functionality as FTP but is enhanced to support GSI security.

Files that are copied to an SE should then be registered in a catalog. A catalog is basically a database that maps the name of a file (logical file name) to its physical location (physical file name).

A file in a catalog may have more than one LFN (an LFN is only an alias and, in principle, has nothing to do with the file's real name) and more than one replica (that is, the same file may be present on two different SEs). What uniquely identifies the file is its GUID (grid unique identifier), a string of about 40 bytes: the guid: prefix followed by a 36-character UUID.
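The mapping described above can be pictured with a small in-memory model. This is only a sketch to make the LFN / GUID / replica relationship concrete: the class and method names are made up for illustration, the host names are placeholders, and nothing here is the LFC API.

```python
import uuid
from dataclasses import dataclass, field


@dataclass
class CatalogEntry:
    """One file: a GUID, its aliases (LFNs) and its replicas (SURLs)."""
    guid: str
    lfns: set = field(default_factory=set)
    replicas: set = field(default_factory=set)


class ToyCatalog:
    """Hypothetical in-memory stand-in for a file catalog (not the LFC API)."""

    def __init__(self):
        self._by_guid = {}
        self._guid_of_lfn = {}

    def register(self, lfn, surl):
        """Register a new file under one LFN with one replica; return its GUID."""
        if lfn in self._guid_of_lfn:
            raise ValueError(f"LFN already in use: {lfn}")
        guid = "guid:" + str(uuid.uuid4())
        self._by_guid[guid] = CatalogEntry(guid, {lfn}, {surl})
        self._guid_of_lfn[lfn] = guid
        return guid

    def add_alias(self, guid, lfn):
        """Give an existing file a second logical name."""
        if lfn in self._guid_of_lfn:
            raise ValueError(f"LFN already in use: {lfn}")
        self._by_guid[guid].lfns.add(lfn)
        self._guid_of_lfn[lfn] = guid

    def add_replica(self, guid, surl):
        """Record that the same file is also stored at another SURL."""
        self._by_guid[guid].replicas.add(surl)

    def list_replicas(self, lfn):
        """Resolve an LFN to its GUID, then return all replica SURLs."""
        return sorted(self._by_guid[self._guid_of_lfn[lfn]].replicas)
```

Whichever alias is used for the lookup, the same GUID and therefore the same set of replicas is found.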

Page 7: gLite Data Services and Data Management

Data management example

[Figure: job flow between the “User interface”, Resource Broker, Computing Element, Storage Element 1, Storage Element 2 and the LCG File Catalogue (LFC). Labels: Input “sandbox”; Input “sandbox” + Broker Info; Output “sandbox”; DataSets info; Max. 20MByte. Caption: 1st job writes and replicates output onto 2 SEs.]

Page 8: gLite Data Services and Data Management

Data management example cont.

[Figure: job flow between the “User interface”, Resource Broker, Computing Element, Storage Element 1, Storage Element 2 and the LCG File Catalogue (LFC). Labels: Input “sandbox”; Input “sandbox” + Broker Info; Output “sandbox”; DataSets info; Max. 20MByte. Caption: 2nd job reads input from an SE.]

Keep computation close to storage data

Page 9: gLite Data Services and Data Management

Data management example

[Figure: “User interface”, LCG File Catalogue (LFC), Storage Element 1 and Storage Element 2. Labels: “Myfile.dat”, Myfile.dat, File_on_se1, File_on_se2, guid.]

• File replicated onto 2 SEs

Page 10: gLite Data Services and Data Management

Resolving logical file name

[Figure: “User interface”, LCG File Catalogue (LFC), Storage Element 1 and Storage Element 2. Labels: “Myfile.dat”; Myfile.dat (“Logical filename”); “GUID” Global Unique Identifier; File_on_se1 (“SURL”: site URL); File_on_se2 (“SURL”: site URL).]

• Content is available on 2 SEs
• File content cannot change, so there is no need to synchronize replicas

Page 11: gLite Data Services and Data Management

Name conventions

• Logical File Name (LFN): an alias created by a user to refer to some item of data, e.g.
  lfn:/grid/gilda/budapest23/run2/track1
• Globally Unique Identifier (GUID): a non-human-readable unique identifier for an item of data, e.g.
  guid:f81d4fae-7dec-11d0-a765-00a0c91e6bf6
• Site URL (SURL) (or Physical File Name (PFN) or Site FN): the location of an actual piece of data on a storage system, e.g.
  srm://pcrd24.cern.ch/flatfiles/cms/output10_1 (SRM)
  sfn://lxshare0209.cern.ch/data/alice/ntuples.dat (Classic SE)
• Transport URL (TURL): temporary locator of a replica plus access protocol, understood by an SE, e.g.
  rfio://lxshare0209.cern.ch//data/alice/ntuples.dat
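Since the four kinds of name differ in their prefix, a script can tell them apart by looking at the scheme. The sketch below is an illustration only: the function name is hypothetical, and treating gsiftp as a TURL scheme is an assumption not taken from the slide.

```python
# Scheme -> kind of grid file name. The lfn, guid, srm, sfn and rfio
# entries follow the examples on the slide; gsiftp is an assumption.
KINDS = {
    "lfn": "LFN",
    "guid": "GUID",
    "srm": "SURL",
    "sfn": "SURL",
    "rfio": "TURL",
    "gsiftp": "TURL",
}


def classify(name):
    """Return 'LFN', 'GUID', 'SURL' or 'TURL' based on the name's scheme."""
    scheme, sep, rest = name.partition(":")
    if not sep or not rest:
        raise ValueError(f"not a grid file name: {name!r}")
    try:
        return KINDS[scheme.lower()]
    except KeyError:
        raise ValueError(f"unknown scheme: {scheme!r}") from None
```

For example, classify("lfn:/grid/gilda/budapest23/run2/track1") gives "LFN", while the rfio:// example gives "TURL".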

Page 12: gLite Data Services and Data Management

Name conventions

• Users primarily access and manage files through “logical filenames”
• Mapping by the “LFC” catalogue server
• LFC Namespace: defined by the user
• LFC has a directory tree structure:
  lfn:/grid/<VO_name>/<you create it>

Page 13: gLite Data Services and Data Management

LFC directories

• LFC directories = virtual directories
• Each entry in the directory may be stored on different SEs

[Figure: in the LCG File Catalogue (LFC), the directory lfn:/grid/gilda/budapest23/run2/ holds the entries input1, input2 and input3; the replicas are on the Storage Elements listed below.]

Storage Element 1: sfn://grid005.iucc.ac.il/storage/gilda/generated/2007-06-23/fileb233d43f-5bc6-4ede-a5fe-611d48be2ba5
Storage Element 2: srm://aliserv6.ct.infn.it/dpm/ct.infn.it/home/gilda/generated/2007-06-23/filea21ab3e2-8ff6-4a44-82a7-f2
Storage Element 3: sfn://trigriden01.unime.it/flatfiles/SE00/gilda/generated/2007-06-23/filec79a9e3c-2485-4206-a2a5-235f
Storage Element 4: sfn://grid005.iucc.ac.it/flatfiles/SE00/gilda/generated/2007-06-23/filec79a9e3c-2485-4206-a2a5-235f

Page 14: gLite Data Services and Data Management

LCG File Catalog

gLite supports two different types of catalogs: LFC (LCG File Catalog) and RLS (Replica Location Service).

The catalog can be accessed using data management commands from the UI. Two environment variables must be set: the file catalog type and its address:

  export LCG_CATALOG_TYPE=lfc
  export LFC_HOST=lfc-atlas-test.cern.ch

LFNs in LFC have a particular form: they are organized in a hierarchical, directory-like structure and look like this:

  lfn:/grid/<VO>/<dir>/<filename>

There are several LFC hosts on LCG and they are not synchronized, so the user's choice of host has to be consistent throughout their activity! Usually there is one central LFC per VO, so in practice this risk rarely arises.
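A script that wraps the data management commands may want to check these settings up front and build LFNs in the form shown above. The following is a minimal sketch under that assumption; both function names are hypothetical and only the two variable names and the lfn:/grid/<VO>/... layout come from the slide.

```python
import os
import posixpath


def catalog_settings(environ=None):
    """Return (catalog type, LFC host), or fail if either variable is unset."""
    env = os.environ if environ is None else environ
    missing = [k for k in ("LCG_CATALOG_TYPE", "LFC_HOST") if not env.get(k)]
    if missing:
        raise RuntimeError("unset catalog variables: " + ", ".join(missing))
    return env["LCG_CATALOG_TYPE"], env["LFC_HOST"]


def make_lfn(vo, *parts):
    """Build lfn:/grid/<VO>/<dir>/<filename> from a VO name and path parts."""
    # Split every part on "/" and drop empty components.
    pieces = [c for p in parts for c in p.split("/") if c]
    if not vo or "/" in vo or any(c in (".", "..") for c in pieces):
        raise ValueError("invalid VO name or path component")
    return "lfn:" + posixpath.join("/grid", vo, *pieces)
```

For instance, make_lfn("seegrid", "run2", "input1") returns lfn:/grid/seegrid/run2/input1.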

Page 15: gLite Data Services and Data Management

Two sets of commands

LFC = LCG File Catalogue
LCG = LHC Computing Grid
LHC = Large Hadron Collider

• Use LFC commands to interact with the catalogue only
  • To create a catalogue directory
  • To list files
  • Used by you, your application and by lcg-utils (see below)
• lcg-utils
  • Couples catalogue operations with file management
    • Keeps SEs and catalogue in step!
  • Copy files to/from/between SEs
  • Replicate files

Page 16: gLite Data Services and Data Management

LFC basics

• LFC Namespace: defined by the user
• LFC has a directory tree structure:
  /grid/<VO_name>/<you create it>
• All members of a given VO have read-write permissions in their directory
• Commands look like UNIX with “lfc-” in front (often)

Page 17: gLite Data Services and Data Management

Storage Element

• Provides
  • Storage for files: massive storage system, disk or tape based
  • Transfer protocol (gsiFTP) ~ GSI-based FTP server
  • Striped file transfer – cluster as back-end

[Figure: a file request + VOMS proxy reaches the Storage Element server, which sits in front of the file system. Labels: Authentication, authorization.]

Page 18: gLite Data Services and Data Management

Data management commands

Data management commands are of the form lcg-*. Some of them only access the catalog:

  lcg-aa   add alias
  lcg-ra   remove alias
  lcg-rf   register file
  lcg-uf   unregister file
  lcg-la   list aliases
  lcg-lg   list guid
  lcg-lr   list replicas

Some of them perform real data movement operations, usually updating the catalog accordingly:

  lcg-cp   copy a file locally (this command does not write to the catalog)
  lcg-cr   copy and register a file on an SE
  lcg-del  delete (physically) a file and its entry in the catalog
  lcg-rep  replicate a file from one SE to another

In order for these commands to work, besides the 2 catalog variables, another environment variable must be set:

  export LCG_GFAL_INFOSYS=<BDII_address:2170>
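As an illustration of how the upload, replicate and list steps fit together, the sketch below only builds the argument lists and does not run anything. The option names (--vo, -d, -l) are as recalled from the gLite 3 User Guide and should be checked against the “man” pages on your UI; the Python function names, the VO and the SE host names are placeholders.

```python
def lcg_cr(vo, dest_se, lfn, local_path):
    """Argument list to copy a local file to an SE and register it under an LFN."""
    if not local_path.startswith("/"):
        raise ValueError("local_path must be absolute")
    # "file://" + "/abs/path" gives the fully qualified file:///abs/path form.
    return ["lcg-cr", "--vo", vo, "-d", dest_se, "-l", lfn, "file://" + local_path]


def lcg_rep(vo, dest_se, lfn):
    """Argument list to replicate an already registered file to another SE."""
    return ["lcg-rep", "--vo", vo, "-d", dest_se, lfn]


def lcg_lr(vo, lfn):
    """Argument list to list the replicas of a file."""
    return ["lcg-lr", "--vo", vo, lfn]


if __name__ == "__main__":
    lfn = "lfn:/grid/seegrid/demo/test.dat"
    for argv in (
        lcg_cr("seegrid", "se1.example.org", lfn, "/home/user/test.dat"),
        lcg_rep("seegrid", "se2.example.org", lfn),
        lcg_lr("seegrid", lfn),
    ):
        print(" ".join(argv))
```

On a UI with a valid proxy and the three environment variables set, each list could be handed to subprocess.run.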

Page 19: gLite Data Services and Data Management

GFAL C API

• GFAL (Grid File Access Library) is a POSIX-like interface for operations on files on Storage Elements
• Enables remote handling of files
• Libraries are in C and can be included in C/C++ sources (GFAL Java API tomorrow!)
• The most common I/O operations are available; just prefix gfal_ to the function name (open(), read()…)
• man gfal for further details
• The destination SE must provide secure rfio (classic SEs don’t)
• GFAL API Description:
  http://grid-deployment.web.cern.ch/grid-deployment/documentation/LFC_DPM/gfal/html

Page 20: gLite Data Services and Data Management

GFAL API code snippet

Examples in gLite3 User Guide (Appendix F) https://edms.cern.ch/file/722398//gLite-3-UserGuide.pdf

  #include <fcntl.h>
  #include <sys/stat.h>
  #include "gfal_api.h"

  int fd, cod_ex;
  struct stat file_stat;

  /* file_ref names the remote file; buffer must hold file_stat.st_size bytes */
  fd = gfal_open(file_ref, O_RDONLY, 0644);
  cod_ex = gfal_stat(file_ref, &file_stat);
  ...
  cod_ex = gfal_read(fd, buffer, file_stat.st_size);
  ...
  cod_ex = gfal_close(fd);

Page 21: gLite Data Services and Data Management

LFC and LCG utils

• List directory
• Create a local file, then upload it to an SE and register it with a logical name (lfn) in the catalogue
• Create a duplicate in another SE
• List the replicas

[Figure: “User interface”, LCG File Catalogue (LFC), Storage Element 1 and Storage Element 2, with arrows labelled lfc-* and lcg-*.]

Page 22: gLite Data Services and Data Management

LFC and LCG utils

• List directory
• Create a local file, then upload it to an SE and register it with a logical name (lfn) in the catalogue
• Create a duplicate in another SE
• List the replicas
• Create a second logical file name for a file
• Download a file from an SE to the UI

[Figure: “User interface”, LCG File Catalogue (LFC), Storage Element 1 and Storage Element 2, with arrows labelled lfc-* and “? lcg-*”.]

Page 23: gLite Data Services and Data Management

LFC commands

There are some commands on the UI that interact directly with the LFC catalog. Thanks to the LFN structure, files in the LFC catalog can be browsed as if they were in a Unix filesystem. Try this:

> lfc-ls /grid/seegrid

The lfc-ls command works just like ls on a local filesystem (it also accepts the -l option). In the same way, lfc-mkdir, lfc-chmod and lfc-chown behave almost like their Unix counterparts.

Although the LFC commands are easy to use, usually only lfc-ls is needed. Commands that modify the catalog, by writing to it or deleting “directories” from it, should be used with great caution: the risk is to cause inconsistencies between the catalog and the files on the SEs.

The data management commands ensure that such inconsistencies are not created. These commands write to the catalog, but they also check that no “harm” is done to the system.
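To make the notion of inconsistency concrete: given the replica locations the catalog knows about and the files actually found on the SEs, the two kinds of mismatch are plain set differences. This is a sketch only; the function name and the result keys are hypothetical, and how the two lists are obtained is left open.

```python
def find_inconsistencies(catalog_surls, se_surls):
    """Compare the SURLs recorded in the catalog with the SURLs present on SEs.

    "dangling": catalog entries whose file is missing on the SE
                (e.g. the file was removed with a low-level tool).
    "orphaned": files on the SE that no catalog entry points to
                (e.g. the catalog entry was removed with lfc-rm).
    """
    catalog = set(catalog_surls)
    stored = set(se_surls)
    return {
        "dangling": sorted(catalog - stored),
        "orphaned": sorted(stored - catalog),
    }
```

Both lists are empty when the catalog and the SEs are in step.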

Page 24: gLite Data Services and Data Management

LFC Catalog commands

Summary of the LFC Catalog commands

  lfc-chmod       Change access mode of the LFC file/directory
  lfc-chown       Change owner and group of the LFC file/directory
  lfc-delcomment  Delete the comment associated with the file/directory
  lfc-getacl      Get file/directory access control lists
  lfc-ln          Make a symbolic link to a file/directory
  lfc-ls          List file/directory entries in a directory
  lfc-mkdir       Create a directory
  lfc-rename      Rename a file/directory
  lfc-rm          Remove a file/directory
  lfc-setacl      Set file/directory access control lists
  lfc-setcomment  Add/replace a comment

Page 25: gLite Data Services and Data Management

Low level commands

Some “low-level” commands are available to grid users. They should be used with caution, because they work directly on the SE without updating the catalog. Still, 2 of them will prove to be real friends to anyone who has to look for files on the grid:

  edg-gridftp-ls gsiftp://<SE_address>/<dir>/
  globus-url-copy <src_file> <dest_file>

The first command lists the content of a directory on a remote SE; the second one is the basis for every lcg tool that has to move data. The <src_file> and <dest_file> have to be in a fully qualified format:

  file:///<abs_path>/<file_name>                 for local files
  gsiftp://<SE_address>/<abs_path>/<file_name>   for remote files

Other useful low-level commands (to be used carefully!) are

  edg-gridftp-rm <URL>
  edg-gridftp-rmdir <URL>
  edg-gridftp-rename <src_URL> <dest_URL>
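The fully qualified formats above are easy to get wrong by one slash, so a small helper can build them. The sketch below is an illustration with hypothetical function names and placeholder host names; it only assembles the strings and the globus-url-copy argument list, and runs nothing.

```python
import posixpath


def local_url(abs_path, file_name):
    """Build file:///<abs_path>/<file_name> for a local file."""
    if not abs_path.startswith("/") or "/" in file_name:
        raise ValueError("need an absolute directory and a bare file name")
    return "file://" + posixpath.join(abs_path, file_name)


def gsiftp_url(se_address, abs_path, file_name):
    """Build gsiftp://<SE_address>/<abs_path>/<file_name> for a remote file."""
    if not abs_path.startswith("/") or "/" in file_name:
        raise ValueError("need an absolute directory and a bare file name")
    return "gsiftp://" + se_address + posixpath.join(abs_path, file_name)


def globus_url_copy(src, dest):
    """Argument list for globus-url-copy <src_file> <dest_file>."""
    for url in (src, dest):
        if not url.startswith(("file:///", "gsiftp://")):
            raise ValueError(f"not fully qualified: {url}")
    return ["globus-url-copy", src, dest]
```

For example, local_url("/home/user", "a.dat") gives file:///home/user/a.dat, with the three slashes that the format requires.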

Page 26: gLite Data Services and Data Management

File Transfer Service

• FTS is a low-level data movement service
• Why is it needed?
  • Improves reliability for transfers
  • Provides asynchronous file transfer
    • schedules transfers when resources are available
  • Provides control of transfer properties (channel concept)
• No catalogue interactions yet: users have to handle SURLs

Page 27: gLite Data Services and Data Management

FTS Concepts

• Transfer Job
  • A set of source/destination pairs specifying files to transfer
  • Submitted to FTS for processing
• Channel
  • A job is assigned to a channel after submission
  • Represents a point-to-point network link
  • Catch-all channels are possible: any-to-me, me-to-any
  • Similar to a queue where you can specify
    • VO share for the queue
    • Number of concurrent file transfers
    • Number of concurrent streams (gridFTP)
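The channel idea can be sketched as a small matching rule: a point-to-point channel is preferred, and a catch-all channel is used when no dedicated link exists. This is a toy model, not the FTS implementation: the names and default values are hypothetical, and channels are keyed by SE host here for simplicity.

```python
from dataclasses import dataclass
from urllib.parse import urlsplit

ANY = "*"  # wildcard end of a catch-all channel (any-to-me, me-to-any)


@dataclass(frozen=True)
class Channel:
    source: str             # source host, or ANY
    dest: str               # destination host, or ANY
    max_transfers: int = 5  # number of concurrent file transfers (made-up default)
    streams: int = 1        # number of concurrent gridFTP streams (made-up default)

    def matches(self, src_host, dst_host):
        return self.source in (ANY, src_host) and self.dest in (ANY, dst_host)


def assign_channel(channels, src_surl, dst_surl):
    """Pick the most specific channel for one source/destination SURL pair."""
    src_host = urlsplit(src_surl).hostname
    dst_host = urlsplit(dst_surl).hostname
    candidates = [c for c in channels if c.matches(src_host, dst_host)]
    if not candidates:
        raise LookupError(f"no channel from {src_host} to {dst_host}")
    # Fewer wildcards means a more specific channel.
    return min(candidates, key=lambda c: (c.source == ANY) + (c.dest == ANY))
```

A transfer job, being a set of source/destination pairs, would call assign_channel once per pair.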

Page 28: gLite Data Services and Data Management

FTS architecture

• All components are decoupled from each other
  • Each interacts only with the database
• Experiments interact via a web service
  • User: FileTransfer
  • Admin: ChannelManagement
• VO agents assign jobs to channels
• Channel agents manage the assigned file transfers
• Monitoring and statistics can be collected via the DB

Page 29: gLite Data Services and Data Management

FTS client

Summary of FTS client commands

  glite-transfer-submit        Submit a transfer job; needs at least a source and a destination SURL
  glite-transfer-status        Given one or more job IDs, query their status
  glite-transfer-cancel        Cancel the transfer with the given job ID
  glite-transfer-list          Query the status of all the user’s jobs; supports options for query restrictions
  glite-transfer-channel-list  Show all available channels; detailed info only if the user has admin privileges

