Data Access on the TeraGrid (Possibilities & Directions)
Dan Fraser, ANL; Ann Chervenak, ISI
TeraGrid data workshop, Jan '07, San Diego
Overview
Architecture ideas from LCG
- Trending toward SOA (changeable parts)
- Possible synergy with TeraGrid Data (end)
Globus Data Directions
- GridFTP and potential benefits for TG users
- Reliable File Transfer (RFT)
- Replica Location Service (RLS): distributed database that records the locations of data copies
- Data Replication Service (DRS): integrates RFT & RLS to replicate & register files
- Data Access and Integration Service (DAIS): service to access relational and XML databases
Architecture ideas from LCG
Storage systems: CASTOR (3 T1s), dCache (9 T1s & T2s), DPM (T2s only), MOPS (T2s only)
- GridFTP is the underlying transfer mechanism
Transfer services: RFT (OSG), FTS (EGEE) [SRM-cp, Unicore, Oracle, IBM tape]
- Reliability & retries; single point of control for VOs & bulk transfers
- Advanced client tools – RLS, DRS, RFT (policies)
Experimental toolkits: PhEDEx (CMS), Don Quixote (ATLAS), FTD (ALICE), DIRAC (LHCb)
- Subscribe to datasets; DIY meta-scheduler; pick best copy
SRM interface (LCG requirement) + POSIX I/O
Experimental framework – pure science codes
Proposed DMIS interface
PhEDEx (Physics Experiment Data Export) (slide from LCG presentation)
- Large-scale dataset replica management system
- Managed data flow following a transfer topology (Tier0 → Tier1 → Tier2)
- Routed multi-hop transfers; routing agents determine the best route
- Reliable point-to-point transfers built on unreliable Grid transfer tools
- Set of quasi-independent, asynchronous software agents posting messages on a central blackboard; nodes subscribe for data allocated from other nodes (see the sketch after this list)
- Enables distribution management at the dataset level rather than at the file level
- Implements the experiment's policy on data placement; allows prioritization and scheduling
- In production (~3 years): managing transfers of several TB/day; ~100 TB known to PhEDEx, ~200 TB total replicated; running at CERN, 7 Tier-1s, 10 Tier-2s
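The blackboard-plus-subscription pattern above is only described in prose on the slide; the following is a minimal, illustrative Python sketch of the idea (all class and method names are invented for illustration and are not PhEDEx's actual interfaces): independent agents post transfer tasks to a shared blackboard, and nodes that have subscribed to a dataset pick up the files routed to them.

```python
from collections import defaultdict

class Blackboard:
    """Central message board that quasi-independent agents read and post to."""
    def __init__(self):
        self.messages = []                      # pending transfer tasks
        self.subscriptions = defaultdict(set)   # dataset -> subscribing nodes

    def subscribe(self, node, dataset):
        """A node subscribes to a whole dataset, not to individual files."""
        self.subscriptions[dataset].add(node)

    def post_files(self, dataset, files, source):
        """An injection agent announces new files of a dataset at a source node."""
        for node in self.subscriptions[dataset]:
            for f in files:
                self.messages.append({"file": f, "from": source, "to": node})

    def take_tasks(self, node):
        """A transfer agent at 'node' collects the tasks addressed to it."""
        mine = [m for m in self.messages if m["to"] == node]
        self.messages = [m for m in self.messages if m["to"] != node]
        return mine

# Example: Tier-1 and Tier-2 sites subscribe to a dataset published at Tier-0.
bb = Blackboard()
bb.subscribe("T1_FNAL", "dataset/A")
bb.subscribe("T2_Wisconsin", "dataset/A")
bb.post_files("dataset/A", ["a.root", "b.root"], source="T0_CERN")
print(bb.take_tasks("T1_FNAL"))   # two transfer tasks routed toward FNAL
```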
What can we learn?
- Different communities have different needs and use their own data-specific tools.
- One monolithic file system could be nice, but is not necessary to get work done. (yet)
- Common central tools help everyone (catalogues, metadata, replica management, easy reliable file access, workflows [scheduling]).
- Trend toward isolating "data specialized" code.
- Common interfaces allow teams to play nicely together.
- GSI is a big win, eventually. (hidden by portals)
- … details to be filled in by people in this room
Overview
Architecture ideas from the LCG
- Trending toward SOA (changeable parts)
- Possible synergy with TeraGrid Data
Globus Data Directions
- GridFTP and potential benefits for TG users
- Reliable File Transfer (RFT)
- Replica Location Service (RLS): distributed database that records the locations of data copies
- Data Replication Service (DRS): integrates RFT & RLS to replicate & register files
- Data Access and Integration Service (DAIS): service to access relational and XML databases
What is GridFTP?
[Diagram: server architecture]
- GridFTP server: separate control and data channels; striping
- Data Storage Interfaces (DSI): POSIX, SRB
- XIO drivers: TCP, UDT (UDP), parallel streams, GSI
- Client interfaces: globus-url-copy, C library, RFT (3rd party)
- I/O, file systems, clients
Extensible I/O (XIO) system
- Provides a framework that implements a Read/Write/Open/Close abstraction
- Drivers are written that implement the functionality (file, TCP, UDP, GSI, etc.)
- Different functionality is achieved by building protocol stacks
- GridFTP drivers will allow 3rd-party applications to easily access files stored under a GridFTP server
- Other drivers could be written to allow access to other data stores
- Changing drivers requires minimal change to the application code: GridFTP was ported to use UDT in less than a day AFTER the UDT driver was written (see the sketch after this list)
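Globus XIO itself is a C library; the Python sketch below only illustrates the stacking idea described above (open/read/close passed through a configurable stack of drivers) with invented names, not the real XIO API.

```python
class Driver:
    """Base driver: passes each operation through to the driver below it."""
    def __init__(self, below=None):
        self.below = below
    def open(self, target):  return self.below.open(target)
    def read(self, n):       return self.below.read(n)
    def close(self):         return self.below.close()

class FileDriver(Driver):
    """Bottom of the stack: an actual transport (here, a local file)."""
    def open(self, target):
        self.f = open(target, "rb")
    def read(self, n):
        return self.f.read(n)
    def close(self):
        self.f.close()

class HexDumpDriver(Driver):
    """Example transform driver: could equally be GSI, compression, UDT, ..."""
    def read(self, n):
        return self.below.read(n).hex()

def build_stack(*driver_classes):
    """Assemble a protocol stack; swapping drivers needs no application changes."""
    stack = None
    for cls in reversed(driver_classes):
        stack = cls(stack) if stack else cls()
    return stack

# Application code sees only open/read/close, regardless of the stack beneath.
handle = build_stack(HexDumpDriver, FileDriver)
handle.open(__file__)
print(handle.read(8))   # first 8 bytes of this script, hex-encoded by the transform driver
handle.close()
```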
Parallelism vs Striping
Memory-to-memory striping performance
[Chart: bandwidth (Mbps, axis up to 30,000) vs. degree of striping (0–70), with one curve per parallel-stream count: 1, 2, 4, 8, 16, 32]
Why people use GridFTP
- Security (GSI, and now SSH)
- Performance using parallel streams
- Performance using striping (and parallel streams)
- Partial file transfer
- Third-party control (reliable & restartable)
- Data extensibility
- Protocol extensibility
Top GridFTP Myths
- Hard to install
- Requires all of Globus
- Requires GSI
Moving Forward
[Diagram: the GridFTP architecture above, with additions]
- GridFTP server: separate control and data channels; striping; new: small-file optimization, virtual deployment, dynamic registration
- Data Storage Interfaces (DSI): POSIX, SRB; new: HPSS
- XIO drivers: TCP, UDT (UDP), parallel streams, GSI; new: SSH
- Client interfaces: globus-url-copy, C library, RFT (3rd party)
- I/O, file systems, clients
Future GridFTP Directions
Client/server side:
- Lots of Small Files optimizations (beta) – transfer a sequence of small files as if they were one file
- Dynamic mover registration infrastructure (GFork): enhance reliability (especially during striping), dynamically scale to meet ever-changing transfer demands, enable users to configure fast transfers
- Dynamic deployment via virtual machines (InfiniBand)
- Managed Object Placement Service (MOPS)
XIO side:
- Enable transfers using SSH (beta)
DSI side:
- HPSS (beta)
Help us prioritize for TeraGrid
Lots Of Small Files (LOSF)
Pipelining
- Many transfer requests outstanding at once
- The client sends the second request before the first completes
- Request latency is hidden in the data transfer time (see the sketch after this list)
Cached data channel connections
- Reuse established data channels (Mode E)
- No additional TCP or GSI connect overhead
Fast LOSF
- 1 GB of data partitioned into equal-sized files: with pipelining, performance doesn't degrade until the file size drops to roughly 100 KB
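The benefit claimed above (request latency hidden in transfer time) can be made concrete with a back-of-the-envelope model; the numbers below (50 ms request latency, 10 ms per small file on the wire) are invented for illustration, not measurements from the slide.

```python
# Toy model of the LOSF pipelining benefit: with one outstanding request, every
# file pays the full request latency; with pipelining, the next request is sent
# before the current transfer completes, so only the first latency is exposed.

def sequential_time(n_files, request_latency, transfer_time):
    return n_files * (request_latency + transfer_time)

def pipelined_time(n_files, request_latency, transfer_time):
    # Requests overlap with data movement once the pipeline is primed.
    return request_latency + n_files * transfer_time

n = 10_000                      # e.g., 1 GB split into 100 KB files
latency, xfer = 0.050, 0.010    # seconds (illustrative values only)
print(f"sequential: {sequential_time(n, latency, xfer):7.1f} s")
print(f"pipelined:  {pipelined_time(n, latency, xfer):7.1f} s")
```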
GridFTP Advanced Configurations
GFork
- Robust UNIX fork/setuid model
- Allows server state to be maintained across connections
Dynamic backends
- Stability in the event of backend failure
- Growing resource pools for peak demands
Frontend Replication
[Diagram: on the server host, a GFork server with a GridFTP plugin forks several GridFTP server instances; the instances share state over a state-sharing link and handle clients' control channel connections via inherited links]
Dynamic Backends
[Diagram: a client's control connection reaches a frontend instance forked by the GFork server (GridFTP plugin) on the frontend host; the frontend looks up available backends; on each backend host a registration daemon registers with the plugin, and inetd forks backend instances]
- Multiple backends (BEs) register with the plugin
- The plugin maintains the list of available BEs
- The frontend (FE) instance selects N BEs for use
- If any one BE fails, another can be used
- The BE pool can grow and shrink (a sketch of this registration/selection logic follows)
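A minimal sketch of the registration/selection behavior described above, with invented names (the real GFork plugin is C code; this only mirrors the logic on the slide): backends register, the frontend picks N of them, and a failed backend is replaced from the pool.

```python
import random

class BackendRegistry:
    """State kept by the GridFTP plugin in the GFork frontend (illustrative only)."""
    def __init__(self):
        self.available = set()

    def register(self, backend):       # a backend's registration daemon checks in
        self.available.add(backend)

    def unregister(self, backend):     # pool can shrink as well as grow
        self.available.discard(backend)

    def select(self, n):
        """Frontend instance picks N backends for a striped transfer."""
        if len(self.available) < n:
            raise RuntimeError("not enough registered backends")
        return random.sample(sorted(self.available), n)

    def replace_failed(self, chosen, failed):
        """If any one backend fails, substitute another from the pool."""
        self.unregister(failed)
        spares = self.available - set(chosen)
        chosen[chosen.index(failed)] = next(iter(spares))
        return chosen

registry = BackendRegistry()
for host in ("be1", "be2", "be3", "be4"):
    registry.register(host)
movers = registry.select(2)
print("using backends:", movers)
print("after failure: ", registry.replace_failed(movers, movers[0]))
```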
Reliable File Transfer (RFT)
- RFT accepts a SOAP description of the desired transfer and writes it to a database
- It then uses the Java GridFTP client library to initiate 3rd-party transfers on behalf of the requestor
- Restart markers are stored in the database to allow restart in the event of an RFT failure
- Supports concurrency, i.e., multiple files in transit at the same time, which gives good performance on many small files
Reliable File Transfer: comparison with globus-url-copy
- Supports all the same options (buffer size, etc.)
- Increased reliability because state is stored in a database
- Service interface: the client can submit the transfer request and then disconnect and go away
- Think of it as a job scheduler for transfer jobs
- Two ways to check status: subscribe for notifications, or poll for status (see the sketch after this list)
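The slide describes the RFT design (request persisted to a database, restart markers, poll or subscribe for status) without code; the sketch below uses SQLite to illustrate why persisting state makes a transfer restartable. All table, column, and function names are invented for illustration and are not RFT's actual schema or API.

```python
import sqlite3

# Persist the transfer request and its restart marker, as the RFT design does
# with its database; a throwaway in-memory SQLite DB stands in for it here.
db = sqlite3.connect(":memory:")
db.execute("""CREATE TABLE transfers (
    id INTEGER PRIMARY KEY, source TEXT, dest TEXT,
    bytes_done INTEGER, state TEXT)""")

def submit(source, dest):
    """Client submits a request and can then disconnect; the service owns the job."""
    cur = db.execute("INSERT INTO transfers VALUES (NULL, ?, ?, 0, 'pending')",
                     (source, dest))
    return cur.lastrowid

def record_marker(job_id, bytes_done):
    """Restart marker: after a service failure, the transfer resumes from here."""
    db.execute("UPDATE transfers SET bytes_done=?, state='active' WHERE id=?",
               (bytes_done, job_id))

def poll(job_id):
    """One of the two status options on the slide (the other being notifications)."""
    return db.execute("SELECT state, bytes_done FROM transfers WHERE id=?",
                      (job_id,)).fetchone()

job = submit("gsiftp://siteA/data/file1", "gsiftp://siteB/data/file1")
record_marker(job, 10_000_000)
print(poll(job))   # ('active', 10000000) -- in the real design this survives a restart
```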
Globus Replica Management
Current Globus tools:
- Replica Location Service (RLS): provides registration and discovery of data items
- Data Replication Service (DRS): pull-based replication of existing data items using RFT, with registration of the files in RLS
Long-term plan (CEDS): provide flexible, policy-driven replication services
- Maintain a certain level of redundancy for all data items
- Subscribe to data items with certain characteristics and automatically receive copies of new, matching data items
- Keep replicas consistent with one another
Replication Scenario: The LIGO Project
- Laser Interferometer Gravitational Wave Observatory
- Data sets are first published at Caltech; publication includes specification of metadata attributes
- Data sets may be replicated at up to 10 LIGO sites
- Sites perform metadata queries to identify desired data, then pull copies from Caltech or other LIGO sites
- Customized data management system: the Lightweight Data Replicator (LDR), built on existing Globus tools (GridFTP, RLS)
The Globus Replica Location Service
- A Replica Location Service (RLS) is a distributed registry that records the locations of data copies and allows replica discovery
- RLS maintains mappings between logical identifiers and target names
- Must perform and scale well: support hundreds of millions of objects and hundreds of clients
- E.g., the LIGO (Laser Interferometer Gravitational Wave Observatory) project runs RLS servers at 10 sites, maintaining associations between 11 million logical file names and 120 million physical file locations
RLS Framework
[Diagram: several Local Replica Catalogs feed a layer of Replica Location Index nodes]
- Local Replica Catalogs (LRCs) contain consistent information about logical-to-target mappings
- Replica Location Index (RLI) nodes aggregate information about one or more LRCs
- LRCs use soft-state update mechanisms to inform RLIs about their state: relaxed consistency of the index
- Optional compression of state updates reduces communication, CPU, and storage overheads
(a minimal sketch of the LRC/RLI split follows)
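A minimal sketch of the two-level LRC/RLI split described above, assuming simple in-memory maps; the real RLS uses relational database backends and compressed soft-state updates, and the class and method names here are illustrative, not the RLS API.

```python
class LocalReplicaCatalog:
    """LRC: authoritative logical-name -> physical-location mappings at one site."""
    def __init__(self, site):
        self.site = site
        self.mappings = {}          # logical name -> set of physical locations

    def add(self, logical, physical):
        self.mappings.setdefault(logical, set()).add(physical)

    def soft_state_update(self):
        """What gets pushed periodically to the RLIs: just the logical names."""
        return set(self.mappings)

class ReplicaLocationIndex:
    """RLI: aggregates (logical name -> which LRCs know about it)."""
    def __init__(self):
        self.index = {}             # logical name -> set of LRC sites

    def receive_update(self, lrc):
        for name in lrc.soft_state_update():
            self.index.setdefault(name, set()).add(lrc.site)

    def lookup(self, logical):
        """Returns the LRCs to query for the actual physical locations."""
        return self.index.get(logical, set())

caltech = LocalReplicaCatalog("caltech")
caltech.add("ligo:frame-001", "gsiftp://caltech/frames/001.gwf")
rli = ReplicaLocationIndex()
rli.receive_update(caltech)
print(rli.lookup("ligo:frame-001"))   # {'caltech'} -> then ask that LRC for locations
```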
RLS Status
- Stable component: greatly improved performance and scalability in the last 2 years; no major changes to existing RLS functionality or interfaces
- New interface: WS-RF-compatible web services interface (WS-RLS)
- The major difficulty for users has been installation and configuration of open-source relational database backends
- New features: support for an embedded database backend (SQLite), easier configuration of relational database backends, and a pure Java client for RLS (available approx. March 2007)
- Planned features: dynamic deployment of RLS services, better support for RLS configuration management in VOs, finer-grained authorization support for users
Motivation for the Data Replication Service
- Data-intensive applications need higher-level data management services that integrate lower-level Grid functionality: efficient data transfer (GridFTP, RFT), replica registration and discovery (RLS), and eventually replica validation, consistency management, etc.
- The goal is to generalize the custom data management systems developed by several application communities
- Eventual plan: provide a suite of general, configurable, higher-level data management services
- The Globus Data Replication Service (DRS) is the first of these services
The Data Replication Service
- Included in the GT 4.0.2 release
- Design based on the publication component of the LIGO Lightweight Data Replicator system, developed by Scott Koranda
- The client specifies (via the DRS interface) which files are required at the local site
- DRS uses: the Globus Delegation Service to delegate proxy credentials; RLS to discover where replicas exist in the Grid; a selection algorithm to choose among available source replicas (provides a callout; the default is random selection); the Reliable File Transfer (RFT) service to copy data to the site via the GridFTP data transport protocol; and RLS again to register the new replicas
DRS Functionality
- Delegate a credential via the Delegation Service
- Create a Replicator resource via DRS
- Discover replicas of the desired files in RLS and select among them
- Transfer data to the local site with the Reliable File Transfer Service, using GridFTP servers
- Register the new replicas in RLS catalogs
- Monitor the Replicator resource and trigger events
- Inspect the state of the DRS resource and its Resource Properties
- Destroy the Replicator resource
[Diagram: a client drives a DRS Replicator resource (with Resource Properties), which uses an RFT Transfer resource, the RLS index and catalogs, and GridFTP servers]
(a minimal sketch of this workflow follows)
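The steps above are listed on the slide but not shown in code; the sketch below strings the discover/select/transfer/register phases together as plain Python. Every data structure and function name is invented for illustration; the real DRS is a WSRF service that delegates credentials and drives RFT and GridFTP, none of which is modeled here.

```python
# Illustrative pull-based replication in the spirit of the DRS steps above.
import random

rls_index = {"file1": ["siteA", "siteB"]}            # logical name -> sites with copies
rls_catalogs = {"siteA": {"file1": "gsiftp://siteA/file1"},
                "siteB": {"file1": "gsiftp://siteB/file1"}}
local_site = "siteC"
local_catalog = {}

def replicate(logical_name):
    # (credential delegation and Replicator-resource creation are omitted)
    # Discover replicas and select a source; the DRS default is random selection.
    source_site = random.choice(rls_index[logical_name])
    source_url = rls_catalogs[source_site][logical_name]
    # Transfer to the local site (stand-in for RFT driving GridFTP).
    local_url = source_url.replace(source_site, local_site)
    print(f"transferring {source_url} -> {local_url}")
    # Register the new replica so others can discover it.
    local_catalog[logical_name] = local_url
    rls_index[logical_name].append(local_site)

replicate("file1")
print(local_catalog, rls_index)
```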
Next Generation: Data Placement Services
Center for Enabling Petascale Distributed Science (CEDS)
- Recently funded by DOE SciDAC-2 as a Center for Enabling Technologies
- Includes: USC Information Sciences Institute, Argonne National Laboratory, University of Wisconsin–Madison, Lawrence Berkeley National Laboratory, Fermi National Accelerator Laboratory
- Higher-level, policy-driven placement of data
- End-to-end provisioning of data resources to carry out placement decisions
Layered Architecture
Higher-Level Data Placement Services
- Decide where to place objects and replicas in the distributed Grid environment
- Policy-driven, based on the needs of the application
- Effectively create a placement workflow that is passed to the Reliable Distribution Service layer for execution
- Push- or pull-based service that places an explicit list of data items
- Metadata-based placement: decide where data objects are placed based on the results of metadata queries for data with certain attributes
- N-Copies: maintain N copies of data items; the placement service checks existing replicas and creates/deletes replicas to maintain N copies (a minimal sketch follows this list)
- Publication/subscription: allows sites or clients to subscribe to topics of interest; data objects are placed as indicated by subscriptions
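The N-copies policy mentioned above reduces to a simple reconciliation check; this is an illustrative sketch under invented names and a naive ranking rule, not the CEDS design.

```python
def enforce_n_copies(replicas, candidate_sites, n):
    """Return (sites to add a copy to, sites to delete a copy from) so that
    exactly n replicas remain. `replicas` is the set of sites holding a copy."""
    replicas = set(replicas)
    if len(replicas) < n:
        missing = n - len(replicas)
        targets = [s for s in candidate_sites if s not in replicas][:missing]
        return targets, []
    if len(replicas) > n:
        surplus = sorted(replicas)[n:]   # a real policy would rank sites, not sort names
        return [], surplus
    return [], []

sites = ["anl", "ncsa", "sdsc", "tacc", "psc"]
print(enforce_n_copies({"anl"}, sites, 3))                          # add two copies
print(enforce_n_copies({"anl", "ncsa", "sdsc", "tacc"}, sites, 3))  # delete one copy
```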
Reliable Distribution Layer
- Responsible for carrying out the distribution or placement "plan" generated by the higher-level service
- Extends the functionality of reliable file transfer services
- Provides feedback to the higher-level placement services on the outcome of the placement workflow
- Calls on lower-level services to coordinate (e.g., the GridFTP data transport service)
OGSA-DAI in a nutshell
- An extensible framework for data access and integration
- Exposes heterogeneous data resources to a grid through web services
- Interact with data resources: queries and updates, data transformation/compression, data delivery, application-specific functionality (see the pipeline sketch after this list)
- Supports relational and XML databases and text and binary files
- Supports various delivery options and transforms
- Supports secure-conversation message-level security using X.509 certificates
- A base for higher-level services: federation, mining, visualisation, …
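OGSA-DAI clients compose activities (query, transform, deliver) into a single request; the Python below only mimics that composition style with invented functions and a throwaway in-memory database, and is not the actual OGSA-DAI client API.

```python
import gzip, json, sqlite3

# Invented stand-ins for OGSA-DAI activities: each stage consumes the previous
# stage's output, mirroring the query -> transform -> deliver pattern above.
def sql_query(conn, sql):
    return [dict(row) for row in conn.execute(sql)]

def compress(rows):
    return gzip.compress(json.dumps(rows).encode())

def deliver(blob, destination):
    with open(destination, "wb") as out:
        out.write(blob)
    return destination

# Tiny in-memory "data resource" so the pipeline actually runs end to end.
conn = sqlite3.connect(":memory:")
conn.row_factory = sqlite3.Row
conn.execute("CREATE TABLE obs (id INTEGER, flux REAL)")
conn.executemany("INSERT INTO obs VALUES (?, ?)", [(1, 0.3), (2, 1.7)])

result = deliver(compress(sql_query(conn, "SELECT * FROM obs WHERE flux > 1")),
                 "delivery.json.gz")
print("delivered", result)
```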
OGSA-DAI motivation
Entering an age of data
- Data explosion: the CERN LHC will generate 1 GB/s = 10 PB/year; Pixar generates ~100 TB per movie
- Storage is getting cheaper
- Data is stored in many different ways: relational databases, XML databases, text and binary files
- Need ways to facilitate data discovery, data access, and data integration
- Empower e-Business and e-Science; the grid is a vehicle for achieving this
Data services
[Diagram: a Data Service exposes three Data Service Resources (SQLOne, XMLOne, FilesOne); each Data Service Resource uses a Data Resource Accessor to reach its underlying data resource (relational database, XML database, or files)]
Architecture ideas from LCG (repeated for comparison with the TeraGrid translation below)
Storage systems: CASTOR (3 T1s), dCache (9 T1s & T2s), DPM (T2s only), MOPS (T2s only)
- GridFTP is the underlying transfer mechanism
Transfer services: RFT (OSG), FTS (EGEE) [SRM-cp, Unicore, Oracle, IBM tape]
- Reliability & retries; single point of control for VOs & bulk transfers
- Advanced client tools – RLS, DRS, RFT (policies)
Experimental toolkits: PhEDEx (CMS), Don Quixote (ATLAS), FTD (ALICE), DIRAC (LHCb)
- Subscribe to datasets; DIY meta-scheduler; pick best copy
SRM interface (LCG requirement) + POSIX I/O
Experimental framework – pure science codes
Proposed DMIS interface
Translation to TeraGrid??
Storage systems: GPFS, HPSS, SRB, xfer, pNFS?
- Underlying transfer mechanisms: GridFTP; pFTP; GridFTP; ??
Transfer mechanism: single point of control for VOs
- Advanced client tools – RLS, DRS, RFT (policies)
Experimental toolkits: Atmospheric, Astronomy, Medicine, …
- Subscribe to datasets; DIY meta-scheduler; pick best copy
Experimental framework – pure science codes
TGCP interface | HPSS interface | (possible use of DMIS?)
For More Information
GridFTP: http://gridftp.org
RLS:
- "Performance and Scalability of a Replica Location Service," High Performance Distributed Computing Conference, 2004. http://www.isi.edu/~annc/papers/chervenakhpdc13.pdf
- Documentation: http://www.globus.org/toolkit/docs/4.0/data/rls
DRS:
- "Wide Area Data Replication for Scientific Collaborations," Grid Computing (Grid2005). http://www.isi.edu/~annc/papers/grid2005final.pdf
- Documentation: http://www.globus.org/toolkit/docs/4.0/techpreview/datarep
Discussion? (over dinner?)