Page 1: EGEE Data Management Peter Kunszt

INFSO-RI-508833

Enabling Grids for E-sciencE

www.eu-egee.org

EGEE Data Management

Peter Kunszt

Diligent – EGEE JRA1 Meeting, 2004 December 16

Page 2: EGEE Data Management Peter Kunszt

Contents

• Component Overview
• gLite Catalogs
  – Overview
  – Concepts
  – Implementations
  – Distribution
• gLite Transfer Management
  – Scheduling model
  – Implementation
• Deployment models
• Distribution mechanisms
• Discussion

Page 3: EGEE Data Management Peter Kunszt

Guiding Principles

• Service Oriented Architecture
• Interoperability
• Portability
• Modularity
• Scalability
• Web Services
• Building on existing components in a lightweight manner
  – AliEn, LCG, Condor, Globus, SRM, ...

Page 4: EGEE Data Management Peter Kunszt

Data Management Tasks

• File Management
  – Storage
  – Access
  – Placement
  – Cataloguing
  – Security
• Metadata Management
  – Secure database access
  – Schema management
  – File-based metadata
  – Generic metadata

Page 5: EGEE Data Management Peter Kunszt

Product Overview

• File Storage
  – Storage Elements with SRM (Storage Resource Manager) interface
  – POSIX I/O interface through glite-io
  – Supports transfer protocols (bbftp, https, ftp, gsiftp, rfio, dcap, …)
• Catalogs
  – File and Replica Catalog
  – File Authorization Service
  – Metadata Catalog
  – Distribution of catalogs, conflict resolution (messaging)
• Transfer
  – Top-level Data Scheduler as global entry point (there may be many)
  – Site File Placement Service managing transfers and catalog interactions
  – Site File Transfer Service managing incoming transfers (the network resource)

Page 6: EGEE Data Management Peter Kunszt

File Movement and Management

• Data scheduling and high-level optimization

• Job-like data transfers (queuing, ordering, etc)

• Possibility to use reliable managed file transfer

• Site self-consistency (locality of reference)

• SRM-based managed storage (permanent and volatile)

Page 7: EGEE Data Management Peter Kunszt

File Movement and Management

Internals

Page 8: EGEE Data Management Peter Kunszt

Catalog Contents

[Diagram: catalog contents and how the names relate]

• Global Unique IDentifier (GUID): unique, system-defined, immutable UUID
• Logical File Name (LFN): unique, user-defined, mutable
• SymLink (two shown)
• Storage URL (three shown)
• Metadata
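
The relationships on this slide can be sketched as a small data model. This is a minimal illustration, assuming that each file has one GUID and one Logical File Name, plus any number of symlinks and Storage URLs (one per replica). The class, field and method names are hypothetical and are not taken from the gLite interfaces.

```python
import uuid
from dataclasses import dataclass, field


@dataclass
class CatalogEntry:
    """One file as seen by the catalog (illustrative model, not the gLite schema)."""
    lfn: str                                      # unique, user-defined, mutable
    guid: str = field(default_factory=lambda: str(uuid.uuid4()))  # system-defined, never changed here
    symlinks: list = field(default_factory=list)  # additional logical names
    surls: list = field(default_factory=list)     # Storage URLs, one per replica
    metadata: dict = field(default_factory=dict)


class Catalog:
    def __init__(self):
        self._by_guid = {}   # GUID -> CatalogEntry
        self._by_name = {}   # LFN or symlink -> GUID

    def create(self, lfn):
        if lfn in self._by_name:
            raise ValueError(f"name already in use: {lfn}")
        entry = CatalogEntry(lfn=lfn)
        self._by_guid[entry.guid] = entry
        self._by_name[lfn] = entry.guid
        return entry

    def resolve(self, name):
        """Look up an entry by LFN or by symlink."""
        return self._by_guid[self._by_name[name]]

    def add_symlink(self, link, target):
        if link in self._by_name:
            raise ValueError(f"name already in use: {link}")
        entry = self.resolve(target)
        entry.symlinks.append(link)
        self._by_name[link] = entry.guid

    def add_replica(self, name, surl):
        self.resolve(name).surls.append(surl)

    def rename(self, old_lfn, new_lfn):
        """Change the LFN; the GUID, symlinks and replicas stay untouched."""
        entry = self.resolve(old_lfn)
        if entry.lfn != old_lfn:
            raise ValueError("rename expects the LFN, not a symlink")
        if new_lfn in self._by_name:
            raise ValueError(f"name already in use: {new_lfn}")
        del self._by_name[old_lfn]
        entry.lfn = new_lfn
        self._by_name[new_lfn] = entry.guid
```

Because every name maps to the GUID rather than to another name, renaming the LFN leaves symlinks and replicas valid in this sketch.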

Page 9: EGEE Data Management Peter Kunszt

Concepts

• Directories
• Symlinks
• Authorization: ACL and base (unix) permissions
• File metadata (size, ctime, mtime, checksum, status, type)
• File-based metadata (key-value pairs on files), the schema is associated per directory
• Extensible metadata including schema manipulation
• Maybe virtual directories (cached metadata queries) in the future
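
As an illustration of combining ACLs with base (unix) permissions, here is a minimal sketch. It assumes that an explicit ACL entry for a user takes precedence over the owner/group/other bits; the slides do not state the actual precedence rule, and all names below are hypothetical.

```python
from dataclasses import dataclass, field

READ, WRITE = 4, 2  # Unix-style permission bits


@dataclass
class Entry:
    owner: str
    group: str
    mode: int                                 # base permissions, e.g. 0o640
    acl: dict = field(default_factory=dict)   # user name -> permission bits


def is_allowed(entry, user, groups, wanted):
    """Return True if `user` (member of `groups`) holds all `wanted` bits."""
    # Assumption: an ACL entry for the user overrides the base permissions.
    if user in entry.acl:
        return entry.acl[user] & wanted == wanted
    if user == entry.owner:
        bits = (entry.mode >> 6) & 7
    elif entry.group in groups:
        bits = (entry.mode >> 3) & 7
    else:
        bits = entry.mode & 7
    return bits & wanted == wanted
```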

Page 10: EGEE Data Management Peter Kunszt

Interface Design

[Diagram: interface hierarchy, grouped into Base Interfaces, Service Interfaces, Feature Interfaces and End-user Interface]

Interfaces shown: ServiceBase, FASBase, ReplicaCatalog, FileCatalog, MetadataBase, MetadataSchema, FiReMan, FAS, MetadataCatalog, SEIndex

Page 11: EGEE Data Management Peter Kunszt

Metadata Capabilities

• Metadata directly in the File Catalog
  – Like POSIX file metadata: key-value pairs stored.
  – Metadata Schema (description of key-value pairs) may be different for each directory, but all files in the same directory share the same keys
  – Limited query and search capabilities to single directory or single schema: the hierarchy has to restrict the query (we don’t allow a global find-like operation on metadata)
• Unconstrained Metadata
  – Any schema possible
  – Schema manipulation interface available
  – Generic query interface (just pass in a query string)
• Application-specific Metadata
  – On top of any of these two gLite specifications, applications can build their own metadata interface
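
The first model (metadata directly in the File Catalog) can be sketched as follows. This is a minimal illustration, assuming a schema is simply the set of keys allowed in a directory and that every query must name one directory. The class and method names are hypothetical and do not reproduce the gLite MetadataCatalog or MetadataSchema interfaces.

```python
import posixpath


class FileMetadataStore:
    def __init__(self):
        self._schemas = {}   # directory -> set of allowed keys
        self._values = {}    # file path -> {key: value}

    def set_schema(self, directory, keys):
        self._schemas[directory] = set(keys)

    def set_attributes(self, path, **attrs):
        """Store key-value pairs on a file, checked against its directory's schema."""
        directory = posixpath.dirname(path)
        if directory not in self._schemas:
            raise KeyError(f"no schema defined for {directory}")
        unknown = set(attrs) - self._schemas[directory]
        if unknown:
            raise KeyError(f"keys not in schema of {directory}: {sorted(unknown)}")
        self._values.setdefault(path, {}).update(attrs)

    def query(self, directory, **conditions):
        """Return files directly in `directory` matching all conditions.

        The directory argument is mandatory: there is no global find.
        """
        return sorted(
            path for path, attrs in self._values.items()
            if posixpath.dirname(path) == directory
            and all(attrs.get(k) == v for k, v in conditions.items())
        )
```

Requiring the directory in every query is what keeps the search bounded to one schema in this sketch.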

Page 12: EGEE Data Management Peter Kunszt

gLite Catalog Implementations

• Fireman Interface
  – Oracle 9i implementation
  – MySQL implementation
• MetadataCatalog Interface
  – MySQL implementation
  – Oracle 9i implementation
• MetadataSchema Interface
  – MySQL implementation
  – Oracle 9i implementation
• Apply interfaces to existing implementations
  – Will have a Fireman interface also over the AliEn FC
  – Fireman interface over the LCG FC
  – MetadataCatalog and MetadataSchema over existing application catalogs
  – …

[Legend: DONE; In progress or planning]

Page 13: EGEE Data Management Peter Kunszt

Catalog Deployment Models

• Single central catalog (AliEn, LCG-2 model)
  – All operations go there
• Local catalogs with a central component
  – Update operation only on local catalogs
  – Update operation on both local and central catalogs
• Local catalogs, no central component – only indices for certain queries
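
The difference between the models can be illustrated with a small sketch. It assumes a local catalog that records replicas and, optionally, notifies a central component that only remembers which sites know a given logical file name. The names are hypothetical, and the call is synchronous for brevity.

```python
class CentralIndex:
    """Central component holding only 'which site knows this LFN'."""

    def __init__(self):
        self._sites = {}   # lfn -> set of site names

    def note(self, lfn, site):
        self._sites.setdefault(lfn, set()).add(site)

    def sites_for(self, lfn):
        return sorted(self._sites.get(lfn, ()))


class LocalCatalog:
    def __init__(self, site, index=None):
        self.site = site
        self.index = index   # None models "no central component"
        self.replicas = {}   # lfn -> list of Storage URLs

    def register(self, lfn, surl):
        # The update is always applied locally first.
        self.replicas.setdefault(lfn, []).append(surl)
        # With a central component, the update is also forwarded there.
        if self.index is not None:
            self.index.note(lfn, self.site)
```

A client would ask the index for candidate sites and then query only those local catalogs for the actual Storage URLs.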

Page 14: EGEE Data Management Peter Kunszt

Distribution Mechanism 1

• Data Scheduler (global and local schedulers)
  – Global scheduler (VO-specific) takes requests like
      • Copy set of files from A to B
      • Make set of files available at C
      • Upload files from GSIFTP server to D
      • Delete files
      • Maybe also metadata operations
  – Local scheduler fetches tasks from known global schedulers
      • Coupled tightly to a local transfer service
      • Manage transfer where the local site is a target
      • Assure atomicity of transfer and catalog operations
• Transfer Service
  – Queue data transfers to/from a given Storage Element (SRM)
  – Receives jobs from local scheduler
  – Manages transfers through a set of states
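
The pull model and the transfer states can be sketched as below. This is a minimal illustration under several assumptions: the state names, the task tuple layout and the `copy` callable are hypothetical (the slides do not name the states), and "atomicity" is reduced to registering a replica in the catalog only after the copy has succeeded.

```python
from enum import Enum


class State(Enum):
    SUBMITTED = "submitted"
    ACTIVE = "active"
    DONE = "done"
    FAILED = "failed"


# Allowed state transitions (assumed, for illustration only).
ALLOWED = {
    State.SUBMITTED: {State.ACTIVE, State.FAILED},
    State.ACTIVE: {State.DONE, State.FAILED},
    State.DONE: set(),
    State.FAILED: set(),
}


class Transfer:
    def __init__(self, source, destination):
        self.source, self.destination = source, destination
        self.state = State.SUBMITTED

    def advance(self, new_state):
        if new_state not in ALLOWED[self.state]:
            raise ValueError(f"illegal transition {self.state} -> {new_state}")
        self.state = new_state


class GlobalScheduler:
    def __init__(self):
        self._tasks = []   # (source SURL, target site, destination SURL)

    def submit(self, source, target_site, destination):
        self._tasks.append((source, target_site, destination))

    def fetch_for(self, site):
        """Hand out (and remove) all tasks whose target is `site`."""
        mine = [t for t in self._tasks if t[1] == site]
        self._tasks = [t for t in self._tasks if t[1] != site]
        return mine


class LocalScheduler:
    def __init__(self, site, global_schedulers, copy, catalog):
        self.site = site
        self.global_schedulers = global_schedulers
        self.copy = copy         # callable(source, destination), raises on failure
        self.catalog = catalog   # dict: destination SURL -> source SURL

    def run_once(self):
        finished = []
        for scheduler in self.global_schedulers:
            for source, _site, destination in scheduler.fetch_for(self.site):
                transfer = Transfer(source, destination)
                transfer.advance(State.ACTIVE)
                try:
                    self.copy(source, destination)
                    # Catalog is updated only once the data is in place.
                    self.catalog[destination] = source
                    transfer.advance(State.DONE)
                except Exception:
                    self.catalog.pop(destination, None)
                    transfer.advance(State.FAILED)
                finished.append(transfer)
        return finished
```

Pulling tasks from the global scheduler means the target site decides when to use its own network and storage, which matches the "local site is a target" rule above.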

Page 15: EGEE Data Management Peter Kunszt

Distribution Mechanism 2

• Certainly possible to just rely on DB replication
• Middleware distribution of updates between catalogs
  – Using a messaging system (JMS using JORAM)
  – Publish updates to message queue locally
  – Subscribe to updates at central catalogs / index nodes
  – Asynchronous messaging queues take care of update delivery
  – Scales well to the number of sites we deal with
  – However, error messages have to be queued for retrieval as well
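
The publish/subscribe flow, including the queued error messages, can be sketched with an in-process stand-in. This uses Python's standard `queue` module and is not the JMS or JORAM API; the message layout and the validation rule (an LFN must be an absolute path) are assumptions made for the example.

```python
import queue


class UpdateChannel:
    """In-process stand-in for a message queue pair (updates and errors)."""

    def __init__(self):
        self.updates = queue.Queue()
        self.errors = queue.Queue()


def publish(channel, site, lfn, surl):
    """Local catalog side: queue an update and return immediately."""
    channel.updates.put({"site": site, "lfn": lfn, "surl": surl})


def drain_into(channel, central):
    """Central side: apply all pending updates to `central` (lfn -> set of SURLs).

    Updates that cannot be applied are queued on the error queue so the
    publishing site can retrieve them later.
    """
    applied = 0
    while True:
        try:
            msg = channel.updates.get_nowait()
        except queue.Empty:
            return applied
        if not msg["lfn"].startswith("/"):
            channel.errors.put({"update": msg, "reason": "LFN must be absolute"})
            continue
        central.setdefault(msg["lfn"], set()).add(msg["surl"])
        applied += 1
```

Since the publisher never waits for the central catalog, a failure can only be reported back through a second queue, which is the point of the last bullet.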

Page 16: EGEE Data Management Peter Kunszt

To be understood

• What to distribute and how
  – All of the data? (Replication)
  – Just parts? (Indexing)
  – Read-write mechanisms and updates between many copies (Policies)
• Metadata usage
  – Schema manipulation capabilities – what is really needed
  – Metadata services by experiments may interface with gLite or implement the gLite interfaces themselves
      • Is a set of canned queries good enough? If yes, the user does not need a generic query interface.
      • Does all of the metadata need to be local? Or will some metadata have to be fetched from remote sites?
      • What kinds of distributed queries are necessary at all?
      • What kind of metadata is for local/laptop usage?
      • What kinds of update semantics are needed, if at all? (Single instance, single master, multi master)

Page 17: EGEE Data Management Peter Kunszt

Summary

• gLite Data Management provides a complete set of file management middleware including data and catalog distribution

• Many extensible modules based on simple interfaces. Capabilities may easily be extended if needed.

• Actual usage patterns need to be understood in order to set up an efficient deployment scenario.

• Many difficult open questions remain, which have to be answered individually for each Grid VO.

We look forward to working with the community to address these issues.

