Globus Data Replication Services

Date post: 11-Jan-2016
Page 1: Globus  Data Replication Services

Globus Data Replication Services

Ann Chervenak, Robert Schuler
USC Information Sciences Institute

Page 2: Globus  Data Replication Services

Motivation for Data Replication Services

• Data-intensive applications need higher-level data management services that integrate lower-level Grid functionality
    • Efficient data transfer (GridFTP, RFT)
    • Replica registration and discovery (RLS)
    • Eventually validation of replicas, consistency management, etc.
• Goal is to generalize the custom data management systems developed by several application communities
• Eventually plan to provide a suite of general, configurable, higher-level data management services
• Globus Data Replication Service (DRS) is the first of these services

Page 3: Globus  Data Replication Services

The Data Replication Service

• Included in the Tech Preview of the GT4.0 release
• Design is based on the publication component of the Lightweight Data Replicator system
    • Developed by Scott Koranda from U. Wisconsin at Milwaukee
• Functionality
    • Replicate a set of files in the Grid on a local site
    • Users identify a set of desired files
    • DRS queries Replica Location Service to discover current locations of these files
    • Creates local replicas of desired files using the Reliable File Transfer Service
    • Registers new replicas in Replica Location Service for discovery

Page 4: Globus  Data Replication Services

Outline

• Terminology
• Functionality of Data Replication Service
• Background: Components used by DRS
    • Replica Location Service
    • GridFTP Data Transport protocol
    • Reliable File Transfer Service
• DRS Design
• Implementation in GT4 environment
• Evaluation of DRS performance in a wide area Grid
• Future work

Page 5: Globus  Data Replication Services

Some Terminology

• A logical file name (LFN) is a unique identifier for the contents of a file
    • Typically, a scientific collaboration defines and manages the logical namespace
    • Guarantees uniqueness of logical names within that organization
• A physical file name (PFN) is the location of a copy of the file on a storage system
    • The physical namespace is managed by the file system or storage system
• For example, the LIGO environment currently contains:
    • More than six million unique logical files
    • More than 40 million physical files stored at ten sites
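
The LFN-to-PFN relationship above is one-to-many: a single logical name can map to several physical copies. A minimal sketch of that mapping, assuming a toy in-memory catalog; the class name `ReplicaCatalog`, the `lfn://` naming scheme, and the host names are hypothetical illustrations and not part of RLS:

```python
from collections import defaultdict


class ReplicaCatalog:
    """Toy in-memory mapping from logical file names to physical file names."""

    def __init__(self):
        self._replicas = defaultdict(set)  # LFN -> set of PFNs

    def register(self, lfn, pfn):
        # Record one more physical copy of the logical file.
        self._replicas[lfn].add(pfn)

    def lookup(self, lfn):
        # Return every known physical location, or an empty list if none.
        return sorted(self._replicas.get(lfn, ()))


catalog = ReplicaCatalog()
# Hypothetical names: one logical file with copies at two sites.
catalog.register("lfn://example/frame-0001", "gsiftp://site-a.example.org/data/frame-0001")
catalog.register("lfn://example/frame-0001", "gsiftp://site-b.example.org/archive/frame-0001")
print(catalog.lookup("lfn://example/frame-0001"))
```

The logical name stays stable while copies are added or removed, which is why the collaboration manages the logical namespace and the storage systems manage the physical one.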

Page 6: Globus  Data Replication Services

DRS Overview

• Client uses DRS interface to specify which files are required at local site
• DRS uses:
    • Delegation Service to delegate proxy credentials
    • Globus RLS to discover whether replicas exist locally and where they exist in the Grid
    • Selection algorithm to choose among available source replicas
    • Globus Reliable File Transfer service to copy data to local site
        • This uses GridFTP data transport protocol
    • Globus RLS to register new replicas
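
The slides do not say which selection algorithm DRS applies. As an illustration only, the sketch below assumes a simple policy: prefer replicas on hosts from a caller-supplied preference list, and otherwise fall back to the lexicographically first URL. The function `select_source` and the host names are hypothetical.

```python
from urllib.parse import urlparse


def select_source(replica_urls, preferred_hosts=()):
    """Pick one source replica URL. Hypothetical policy, not the DRS algorithm."""
    if not replica_urls:
        raise ValueError("no source replicas available")
    # Lower rank means more preferred; unknown hosts rank after all listed ones.
    rank = {host: i for i, host in enumerate(preferred_hosts)}

    def key(url):
        host = urlparse(url).hostname
        return (rank.get(host, len(rank)), url)

    return min(replica_urls, key=key)


candidates = [
    "gsiftp://b.example.org/data/f1",
    "gsiftp://a.example.org/data/f1",
]
print(select_source(candidates, preferred_hosts=["b.example.org"]))
```

A real policy could weigh measured bandwidth or load instead; the point is only that selection sits between discovery and transfer.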

Page 7: Globus  Data Replication Services

Background: The Replica Location Service

• A Replica Location Service (RLS) is a distributed registry that records the locations of data copies and allows replica discovery
    • RLS maintains mappings between logical identifiers and target names
    • Must perform and scale well: support hundreds of millions of objects, hundreds of clients
• E.g., Laser Interferometer Gravitational Wave Observatory (LIGO)
    • RLS servers at 8 sites
    • Maintain associations between 6 million logical file names and 40 million physical file locations

Page 8: Globus  Data Replication Services

RLS Features

[Figure: RLS deployment diagram showing Replica Location Indexes (RLIs) and Local Replica Catalogs (LRCs)]

• Local Replica Catalogs (LRCs) contain consistent information about logical-to-target mappings
• Replica Location Index (RLI) nodes aggregate information about one or more LRCs
• LRCs use soft state update mechanisms to inform RLIs about their state: relaxed consistency of index
• Optional compression of state updates reduces communication, CPU and storage overheads
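
The soft state idea above can be sketched as follows: each LRC periodically tells an RLI which logical names it holds, and the RLI forgets an LRC's entries if no refresh arrives within a timeout. This is a toy model under stated assumptions; the class names, the `timeout_seconds` parameter, and the explicit `now` argument (used to make the example deterministic) are all hypothetical and do not reflect the RLS API.

```python
import time


class LocalReplicaCatalog:
    """Toy LRC: holds the authoritative logical-to-target mappings for one site."""

    def __init__(self, name):
        self.name = name
        self.mappings = {}  # logical name -> set of target names

    def add(self, logical, target):
        self.mappings.setdefault(logical, set()).add(target)

    def send_update(self, index, now=None):
        # Soft state update: send only the set of logical names, not the targets.
        index.receive_update(self.name, set(self.mappings), now)


class ReplicaLocationIndex:
    """Toy RLI: remembers which LRCs recently claimed each logical name."""

    def __init__(self, timeout_seconds):
        self.timeout = timeout_seconds
        self._state = {}  # LRC name -> (logical names, time received)

    def receive_update(self, lrc_name, logical_names, now=None):
        received = time.time() if now is None else now
        self._state[lrc_name] = (logical_names, received)

    def lookup(self, logical, now=None):
        # Entries older than the timeout are ignored: the index may be stale
        # or incomplete, which is the relaxed consistency the slide mentions.
        now = time.time() if now is None else now
        return sorted(
            name
            for name, (names, seen) in self._state.items()
            if logical in names and now - seen <= self.timeout
        )


lrc = LocalReplicaCatalog("lrc-a")
lrc.add("lfn-1", "gsiftp://site-a.example.org/data/f1")
rli = ReplicaLocationIndex(timeout_seconds=60)
lrc.send_update(rli, now=0)
print(rli.lookup("lfn-1", now=30))   # refreshed recently, so lrc-a is listed
print(rli.lookup("lfn-1", now=120))  # no refresh within 60 s, so nothing is listed
```

A client that gets an LRC name back from the index still has to query that LRC for the actual target names.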

Page 9: Globus  Data Replication Services

Background: GridFTP

• A secure, robust, fast, efficient, standards-based, widely accepted data transfer protocol
• Features:
    • Standard FTP: get/put etc., 3rd-party transfer
    • GSS binding, extended directory listing, simple restart
    • Striped/parallel data channels
    • Partial file transfer
    • TCP buffer setting
    • Progress monitoring, extended restart
• The Globus Toolkit supplies a reference implementation:
    • Server
    • Client tools (globus-url-copy)
    • Development Libraries
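
Parallel data channels and partial file transfer both rest on addressing a file by byte range. The sketch below only illustrates that idea by dividing a file into one contiguous range per channel; it is an assumption made for illustration and does not describe how GridFTP actually schedules blocks across channels. The function name `split_ranges` is hypothetical.

```python
def split_ranges(file_size, channels):
    """Divide [0, file_size) into contiguous (offset, length) pieces, one per channel."""
    if channels < 1:
        raise ValueError("need at least one channel")
    base, extra = divmod(file_size, channels)
    ranges, offset = [], 0
    for i in range(channels):
        # The first `extra` channels carry one additional byte each.
        length = base + (1 if i < extra else 0)
        if length:
            ranges.append((offset, length))
        offset += length
    return ranges


print(split_ranges(10, 4))  # [(0, 3), (3, 3), (6, 2), (8, 2)]
```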

Page 10: Globus  Data Replication Services

Background: Reliable File Transfer Service

• RFT accepts SOAP description of transfer
• Writes state to a database
• Uses Java GridFTP client library to initiate 3rd-party transfers
• Restart Markers stored in the database
    • Allow for restart in the event of RFT failure
• Supports concurrency, i.e., multiple files in transit
• Check status:
    • Subscribe to notifications
    • Poll for status

[Figure: RFT Client and RFT Service exchanging SOAP Messages and (optional) Notifications; Control and Data channels]
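
The restart-marker idea can be sketched with a small persistent table: progress is written to a database after each chunk, so a later attempt resumes from the last recorded offset instead of starting over. This is a simulation under stated assumptions; the table layout, function names, and the `fail_after` failure injection are hypothetical and are not RFT's schema or API.

```python
import sqlite3

# In-memory database for the example; a real service would use durable storage.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE restart_marker (transfer_id TEXT PRIMARY KEY, bytes_done INTEGER)")


def save_marker(transfer_id, bytes_done):
    db.execute("INSERT OR REPLACE INTO restart_marker VALUES (?, ?)", (transfer_id, bytes_done))
    db.commit()


def resume_offset(transfer_id):
    row = db.execute(
        "SELECT bytes_done FROM restart_marker WHERE transfer_id = ?", (transfer_id,)
    ).fetchone()
    return row[0] if row else 0


def transfer(transfer_id, total_bytes, chunk, fail_after=None):
    """Simulate a copy; fail_after raises once that many bytes were sent in this attempt."""
    offset = resume_offset(transfer_id)  # pick up where the last attempt stopped
    sent = 0
    while offset < total_bytes:
        if fail_after is not None and sent >= fail_after:
            raise ConnectionError("simulated failure")
        step = min(chunk, total_bytes - offset)
        offset += step
        sent += step
        save_marker(transfer_id, offset)  # persist progress after every chunk
    return offset


try:
    transfer("t1", total_bytes=100, chunk=10, fail_after=30)
except ConnectionError:
    print("failed at offset", resume_offset("t1"))
print("finished at offset", transfer("t1", total_bytes=100, chunk=10))
```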

Page 11: Globus  Data Replication Services

DRS Functionality

• Initiate a DRS Request
• Create a delegated credential
• Create a Replicator resource
• Monitor Replicator resource
• Discover replicas of desired files in Replica Location Service, select among replicas
• Transfer data to local site with Reliable File Transfer Service
• Register new replicas in RLS catalogs
• Allow client inspection of DRS results
• Destroy Replicator resource
• DRS implemented in Globus Toolkit Version 4, complies with Web Services Resource Framework (WS-RF)
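
The discover, transfer, and register steps listed above form a simple pipeline. The sketch below wires them together with caller-supplied functions; the stage names match the "Stage" values used later in this deck, while the function `replicate`, its parameters, and the stub callables are hypothetical stand-ins for RLS and RFT calls.

```python
def replicate(lfns, discover, select, transfer, register, on_stage=lambda stage: None):
    """Run the discover -> transfer -> register stages for a set of logical file names."""
    on_stage("discover")
    # Find all known copies of each file, then choose one source per file.
    sources = {lfn: select(discover(lfn)) for lfn in lfns}
    on_stage("transfer")
    # Copy each chosen source to the local site; transfer returns the new local name.
    local = {lfn: transfer(src) for lfn, src in sources.items()}
    on_stage("register")
    # Make the new local replicas discoverable.
    for lfn, pfn in local.items():
        register(lfn, pfn)
    return local


# Stub services for illustration only.
replicas = {"lfn-1": ["gsiftp://remote.example.org/data/f1"]}
registered = {}
stages = []
result = replicate(
    ["lfn-1"],
    discover=lambda lfn: replicas[lfn],
    select=lambda urls: urls[0],
    transfer=lambda src: src.replace("remote.example.org", "local.example.org"),
    register=registered.__setitem__,
    on_stage=stages.append,
)
print(stages)  # ['discover', 'transfer', 'register']
print(registered)
```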

Page 12: Globus  Data Replication Services

Relationship to Other Globus Services

• At requesting site, deploy:
    • WS-RF Services
        • Data Replication Service
        • Delegation Service
        • Reliable File Transfer Service
    • Pre WS-RF Components
        • Replica Location Service (Local Replica Catalog, Replica Location Index)
        • GridFTP Server

[Figure: Local Site diagram. Web Service Container with Data Replication Service (Replicator Resource), Reliable File Transfer Service (RFT Resource), and Delegation Service (Delegated Credential); Local Replica Catalog, Replica Location Index, and GridFTP Server]

Page 13: Globus  Data Replication Services

WSRF in a Nutshell

• Service
• State Management:
    • Resource
    • Resource Property
• State Identification:
    • Endpoint Reference
• State Interfaces:
    • GetRP, QueryRPs, GetMultipleRPs, SetRP
• Lifetime Interfaces:
    • SetTerminationTime
    • ImmediateDestruction
• Notification Interfaces
    • Subscribe
    • Notify
• ServiceGroups

[Figure: a Service exposing GetRP, GetMultRPs, SetRP, QueryRPs, Subscribe, SetTermTime, and Destroy on a Resource with RPs, addressed by EPRs]
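
The state, lifetime, and notification interfaces above can be pictured with an in-process analogue. In WSRF these are web service operations on a resource addressed by an Endpoint Reference; the class below is only a local sketch of the same ideas, and its name, method names, and the explicit `now` argument are hypothetical.

```python
import time


class Resource:
    """Toy stateful resource with resource properties (RPs) and a lifetime."""

    def __init__(self, properties, termination_time):
        self._rps = dict(properties)
        self.termination_time = termination_time
        self._subscribers = {}  # RP name -> list of callbacks

    # State interfaces
    def get_rp(self, name):
        return self._rps[name]

    def get_multiple_rps(self, names):
        return {n: self._rps[n] for n in names}

    def set_rp(self, name, value):
        old = self._rps.get(name)
        self._rps[name] = value
        if old != value:
            # Notification: tell every subscriber of this RP about the change.
            for callback in self._subscribers.get(name, []):
                callback(name, value)

    # Notification interface
    def subscribe(self, name, callback):
        self._subscribers.setdefault(name, []).append(callback)

    # Lifetime interfaces
    def set_termination_time(self, when):
        self.termination_time = when

    def expired(self, now=None):
        return (time.time() if now is None else now) >= self.termination_time


replicator = Resource({"Stage": "discover"}, termination_time=100)
events = []
replicator.subscribe("Stage", lambda name, value: events.append((name, value)))
replicator.set_rp("Stage", "transfer")
print(events)                      # [('Stage', 'transfer')]
print(replicator.expired(now=50))  # False
```

Subscribing and polling (`get_rp`) are the two ways a client can follow a resource; a container would destroy the resource once `expired` becomes true.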

Page 14: Globus  Data Replication Services

Create Delegated Credential

[Figure: Client and Service Container (Delegation, Data Rep., RFT) with Replica Index, Replica Catalogs, GridFTP Servers, and MDS; the Delegation service holds the Credential resource and its RP, created from the user proxy]

• Initialize user proxy cert.
• Create delegated credential resource
• Set termination time
• Credential EPR returned

Page 15: Globus  Data Replication Services

Create Replicator Resource

[Figure: Client and Service Container (Delegation, Data Rep., RFT) with Replica Index, Replica Catalogs, GridFTP Servers, and MDS; the Data Rep. service holds the Replicator resource and its RP alongside the Credential resource]

• Create Replicator resource
• Pass delegated credential EPR
• Set termination time
• Replicator EPR returned
• Access delegated credential resource

Page 16: Globus  Data Replication Services

Monitor Replicator Resource

[Figure: Client and Service Container (Delegation, Data Rep., RFT) with Replica Index, Replica Catalogs, GridFTP Servers, and MDS; Credential, Replicator, and MDS Index resources with their RPs]

• Periodically polls Replicator RP via GetRP or GetMultRP
• Add Replicator resource to MDS Information service Index
• Subscribe to ResourceProperty changes for “Status” RP and “Stage” RP
• Conditions may trigger alerts or other actions (Trigger service not pictured)

Page 17: Globus  Data Replication Services

Query Replica Information

[Figure: Client and Service Container (Delegation, Data Rep., RFT) with Replica Index, Replica Catalogs, GridFTP Servers, and MDS; Credential, Replicator, and MDS Index resources with their RPs]

• Notification of “Stage” RP value changed to “discover”
• Replicator queries RLS Replica Index to find catalogs that contain desired replica information
• Replicator queries RLS Replica Catalog(s) to retrieve mappings from logical name to target name (URL)

Page 18: Globus  Data Replication Services

Transfer Data

[Figure: Client and Service Container (Delegation, Data Rep., RFT) with Replica Index, Replica Catalogs, GridFTP Servers, and MDS; Credential, Replicator, MDS Index, and Transfer resources with their RPs]

• Notification of “Stage” RP value changed to “transfer”
• Create Transfer resource
• Pass credential EPR
• Set Termination Time
• Transfer resource EPR returned
• Access delegated credential resource
• Set up GridFTP Server transfer of file(s)
• Data transfer between GridFTP Server sites
• Periodically poll “ResultStatus” RP via GetRP
• When “Done”, get state information for each file transfer

Page 19: Globus  Data Replication Services

Register Replica Information

[Figure: Client and Service Container (Delegation, Data Rep., RFT) with Replica Index, Replica Catalogs, GridFTP Servers, and MDS; Credential, Replicator, MDS Index, and Transfer resources with their RPs]

• Notification of “Stage” RP value changed to “register”
• Replicator registers new file mappings in RLS Replica Catalog
• RLS Replica Catalog sends update of new replica mappings to the Replica Index

Page 20: Globus  Data Replication Services

Client Inspection of State

[Figure: Client and Service Container (Delegation, Data Rep., RFT) with Replica Index, Replica Catalogs, GridFTP Servers, and MDS; Credential, Replicator, MDS Index, and Transfer resources with their RPs]

• Notification of “Status” RP value changed to “Finished”
• Client inspects Replicator state information for each replication in the request

Page 21: Globus  Data Replication Services

Resource Termination

[Figure: Client and Service Container (Delegation, Data Rep., RFT) with Replica Index, Replica Catalogs, GridFTP Servers, and MDS; Credential, Replicator, MDS Index, and Transfer resources shown against a TIME axis]

• Termination time (set by client) expires eventually
• Resources destroyed (Credential, Transfer, Replicator)

Page 22: Globus  Data Replication Services

Performance Measurements: Wide Area Testing

• The destination for the pull-based transfers is located in Los Angeles
    • Dual-processor, 1.1 GHz Pentium III workstation with 1.5 GBytes of memory and 1 Gbit Ethernet
    • Runs a GT4 container and deploys services including RFT and DRS as well as GridFTP and RLS
• The remote site where desired data files are stored is located at Argonne National Laboratory in Illinois
    • Dual-processor, 3 GHz Intel Xeon workstation with 2 gigabytes of memory and 1.1 terabytes of disk
    • Runs a GT4 container as well as GridFTP and RLS services

Page 23: Globus  Data Replication Services

DRS Operations Measured

• Create the DRS Replicator resource
• Discover source files for replication using local RLS Replica Location Index and remote RLS Local Replica Catalogs
• Initiate a Reliable File Transfer operation by creating an RFT resource
• Perform RFT data transfer(s)
• Register the new replicas in the RLS Local Replica Catalog

Page 24: Globus  Data Replication Services

Experiment 1: Replicate 10 Files of Size 1 Gigabyte

Component of Operation      Time (milliseconds)
Create Replicator Resource                317.0
Discover Files in RLS                     449.0
Create RFT Resource                       808.6
Transfer Using RFT                    1186796.0
Register Replicas in RLS                 3720.8

• Data transfer time dominates
• Wide area data transfer rate of 67.4 Mbits/sec
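
The quoted rate follows from the table if a gigabyte is taken as 10^9 bytes, which is an assumption here since the slides do not state the convention. A short worked check; the function name `rate_mbits_per_sec` is hypothetical:

```python
def rate_mbits_per_sec(num_files, bytes_per_file, elapsed_ms):
    """Average transfer rate in megabits per second (1 Mbit = 10**6 bits)."""
    bits = num_files * bytes_per_file * 8
    return bits / (elapsed_ms / 1000) / 1e6


# 10 files of 1 gigabyte (assumed 10**9 bytes) moved in 1186796.0 ms.
rate = rate_mbits_per_sec(10, 10**9, 1186796.0)
print(f"{rate:.1f} Mbits/sec")  # 67.4 Mbits/sec
```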

Page 25: Globus  Data Replication Services

Experiment 2: Replicate 1000 Files of Size 10 Megabytes

Component of Operation      Time (milliseconds)
Create Replicator Resource               1561.0
Discover Files in RLS                       9.8
Create RFT Resource                      1286.6
Transfer Using RFT                     963456.0
Register Replicas in RLS                11278.2

• Time to create Replicator and RFT resources is larger
    • Need to store state for 1000 outstanding transfers
• Data transfer time still dominates
• Wide area data transfer rate of 85 Mbits/sec
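
To see how strongly transfer time dominates in this experiment, the measured components can be summed and the transfer share computed. The sketch below uses only the figures from the table; the variable names are hypothetical, and the per-file overhead is a simple average that assumes the non-transfer cost is spread evenly over the 1000 files.

```python
times_ms = {
    "Create Replicator Resource": 1561.0,
    "Discover Files in RLS": 9.8,
    "Create RFT Resource": 1286.6,
    "Transfer Using RFT": 963456.0,
    "Register Replicas in RLS": 11278.2,
}
num_files = 1000

total_ms = sum(times_ms.values())
transfer_share = times_ms["Transfer Using RFT"] / total_ms
overhead_ms = total_ms - times_ms["Transfer Using RFT"]

print(f"total: {total_ms / 1000:.1f} s")
print(f"transfer share: {transfer_share:.1%}")
print(f"non-transfer overhead per file: {overhead_ms / num_files:.2f} ms")
```

Under these numbers the four non-transfer components together account for well under 2% of the elapsed time.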

Page 26: Globus  Data Replication Services

Future Work

• Continued performance testing of DRS:
    • Increasing the size of the files being transferred
    • Increasing the number of files per DRS request
• Add and refine DRS functionality as needed by GEON and other applications
    • E.g., add a push-based replication capability
• Add fine-grained authorization capability to RLS, DRS
• Long-term: Will develop a suite of general, configurable, composable, high-level data management services

