Data Access on the TeraGrid (Possibilities & Directions)
Dan Fraser, ANL; Ann Chervenak, ISI
TeraGrid data workshop, Jan '07, San Diego
Overview
Architecture ideas from LCG
- Trending toward SOA (changeable parts)
- Possible synergy with TeraGrid Data (end)
Globus Data Directions
- GridFTP and potential benefits for TG users
- Reliable File Transfer (RFT)
- Replica Location Service (RLS): distributed database that records the locations of data copies
- Data Replication Service (DRS): integrates RFT & RLS to replicate & register files
- Data Access and Integration Service (DAIS): service to access relational and XML databases
Architecture ideas from LCG
Storage systems: CASTOR (3 T1s), dCache (9 T1s & T2s), DPM (T2s only), MOPS (T2s only)
- GridFTP is the underlying transfer mechanism
Transfer services: RFT (OSG), FTS (EGEE) [SRM-cp, Unicore, Oracle, IBM tape]
- Reliability & retries; single point of control for VOs & bulk transfers
- Advanced client tools – RLS, DRS, RFT (policies)
Experimental toolkits: PhEDEx (CMS), Don Quixote (ATLAS), FTD (ALICE), DIRAC (LHCb)
- Subscribe to datasets; DIY meta-scheduler; pick best copy
SRM interface (LCG requirement) + POSIX I/O
Experimental framework – pure science codes
Proposed DMIS interface
PhEDEx (Physics Experiment Data Export) (slide from LCG presentation)
- Large-scale dataset replica management system
- Managed data flow following a transfer topology (Tier0 → Tier1 → Tier2)
- Routed multi-hop transfers; routing agents determine the best route
- Reliable point-to-point transfers built on unreliable Grid transfer tools
- Set of quasi-independent, asynchronous software agents posting messages on a central blackboard; nodes subscribe for data allocated from other nodes (see the sketch after this list)
- Enables distribution management at the dataset level rather than at the file level
- Implements the experiment's policy on data placement; allows prioritization and scheduling
- In production (~3 years): managing transfers of several TB/day; ~100 TB known to PhEDEx, ~200 TB total replicated; running at CERN, 7 Tier-1s, 10 Tier-2s
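The blackboard-plus-subscription pattern above is only described in prose on the slide; the following is a minimal, illustrative Python sketch of the idea (all class and method names are invented for illustration and are not PhEDEx's actual interfaces): independent agents post transfer tasks to a shared blackboard, and nodes that have subscribed to a dataset pick up the files routed to them.

```python
from collections import defaultdict

class Blackboard:
    """Central message board that quasi-independent agents read and post to."""
    def __init__(self):
        self.messages = []                      # pending transfer tasks
        self.subscriptions = defaultdict(set)   # dataset -> subscribing nodes

    def subscribe(self, node, dataset):
        """A node subscribes to a whole dataset, not to individual files."""
        self.subscriptions[dataset].add(node)

    def post_files(self, dataset, files, source):
        """An injection agent announces new files of a dataset at a source node."""
        for node in self.subscriptions[dataset]:
            for f in files:
                self.messages.append({"file": f, "from": source, "to": node})

    def take_tasks(self, node):
        """A transfer agent at 'node' collects the tasks addressed to it."""
        mine = [m for m in self.messages if m["to"] == node]
        self.messages = [m for m in self.messages if m["to"] != node]
        return mine

# Example: Tier-1 and Tier-2 sites subscribe to a dataset published at Tier-0.
bb = Blackboard()
bb.subscribe("T1_FNAL", "dataset/A")
bb.subscribe("T2_Wisconsin", "dataset/A")
bb.post_files("dataset/A", ["a.root", "b.root"], source="T0_CERN")
print(bb.take_tasks("T1_FNAL"))   # two transfer tasks routed toward FNAL
```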
What can we learn?
- Different communities have different needs and use their own data-specific tools.
- One monolithic file system could be nice, but is not necessary to get work done. (yet)
- Common central tools help everyone (catalogues, metadata, replica management, easy reliable file access, workflows [scheduling]).
- Trend toward isolating "data specialized" code.
- Common interfaces allow teams to play nicely together.
- GSI is a big win, eventually. (hidden by portals)
- … details to be filled in by people in this room
Overview
Architecture ideas from the LCG
- Trending toward SOA (changeable parts)
- Possible synergy with TeraGrid Data
Globus Data Directions
- GridFTP and potential benefits for TG users
- Reliable File Transfer (RFT)
- Replica Location Service (RLS): distributed database that records the locations of data copies
- Data Replication Service (DRS): integrates RFT & RLS to replicate & register files
- Data Access and Integration Service (DAIS): service to access relational and XML databases
What is GridFTP?
[Diagram: server architecture]
- GridFTP server: separate control and data channels; striping
- Data Storage Interfaces (DSI): POSIX, SRB
- XIO drivers: TCP, UDT (UDP), parallel streams, GSI
- Client interfaces: globus-url-copy, C library, RFT (3rd party)
- I/O, file systems, clients
Extensible I/O (XIO) system
- Provides a framework that implements a Read/Write/Open/Close abstraction
- Drivers are written that implement the functionality (file, TCP, UDP, GSI, etc.)
- Different functionality is achieved by building protocol stacks
- GridFTP drivers will allow 3rd-party applications to easily access files stored under a GridFTP server
- Other drivers could be written to allow access to other data stores
- Changing drivers requires minimal change to the application code: GridFTP was ported to use UDT in less than a day AFTER the UDT driver was written (see the sketch after this list)
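Globus XIO itself is a C library; the Python sketch below only illustrates the stacking idea described above (open/read/close passed through a configurable stack of drivers) with invented names, not the real XIO API.

```python
class Driver:
    """Base driver: passes each operation through to the driver below it."""
    def __init__(self, below=None):
        self.below = below
    def open(self, target):  return self.below.open(target)
    def read(self, n):       return self.below.read(n)
    def close(self):         return self.below.close()

class FileDriver(Driver):
    """Bottom of the stack: an actual transport (here, a local file)."""
    def open(self, target):
        self.f = open(target, "rb")
    def read(self, n):
        return self.f.read(n)
    def close(self):
        self.f.close()

class HexDumpDriver(Driver):
    """Example transform driver: could equally be GSI, compression, UDT, ..."""
    def read(self, n):
        return self.below.read(n).hex()

def build_stack(*driver_classes):
    """Assemble a protocol stack; swapping drivers needs no application changes."""
    stack = None
    for cls in reversed(driver_classes):
        stack = cls(stack) if stack else cls()
    return stack

# Application code sees only open/read/close, regardless of the stack beneath.
handle = build_stack(HexDumpDriver, FileDriver)
handle.open(__file__)
print(handle.read(8))   # first 8 bytes of this script, hex-encoded by the transform driver
handle.close()
```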
Parallelism vs Striping
Memory-to-memory striping performance
[Chart: bandwidth (Mbps, axis up to 30,000) vs. degree of striping (0–70), with one curve per parallel-stream count: 1, 2, 4, 8, 16, 32]
Why people use GridFTP
- Security (GSI, and now SSH)
- Performance using parallel streams
- Performance using striping (and parallel streams)
- Partial file transfer
- Third-party control (reliable & restartable)
- Data extensibility
- Protocol extensibility
Top GridFTP Myths
- Hard to install
- Requires all of Globus
- Requires GSI
Moving Forward
[Diagram: the GridFTP architecture above, with additions]
- GridFTP server: separate control and data channels; striping; new: small-file optimization, virtual deployment, dynamic registration
- Data Storage Interfaces (DSI): POSIX, SRB; new: HPSS
- XIO drivers: TCP, UDT (UDP), parallel streams, GSI; new: SSH
- Client interfaces: globus-url-copy, C library, RFT (3rd party)
- I/O, file systems, clients
Future GridFTP Directions
Client/server side:
- Lots of Small Files optimizations (beta) – transfer a sequence of small files as if they were one file
- Dynamic mover registration infrastructure (GFork): enhance reliability (especially during striping), dynamically scale to meet ever-changing transfer demands, enable users to configure fast transfers
- Dynamic deployment via virtual machines (InfiniBand)
- Managed Object Placement Service (MOPS)
XIO side:
- Enable transfers using SSH (beta)
DSI side:
- HPSS (beta)
Help us prioritize for TeraGrid
Lots Of Small Files (LOSF)
Pipelining
- Many transfer requests outstanding at once
- The client sends the second request before the first completes
- Request latency is hidden in the data transfer time (see the sketch after this list)
Cached data channel connections
- Reuse established data channels (Mode E)
- No additional TCP or GSI connect overhead
Fast LOSF
- 1 GB of data partitioned into equal-sized files: with pipelining, performance doesn't degrade until the file size drops to roughly 100 KB
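The benefit claimed above (request latency hidden in transfer time) can be made concrete with a back-of-the-envelope model; the numbers below (50 ms request latency, 10 ms per small file on the wire) are invented for illustration, not measurements from the slide.

```python
# Toy model of the LOSF pipelining benefit: with one outstanding request, every
# file pays the full request latency; with pipelining, the next request is sent
# before the current transfer completes, so only the first latency is exposed.

def sequential_time(n_files, request_latency, transfer_time):
    return n_files * (request_latency + transfer_time)

def pipelined_time(n_files, request_latency, transfer_time):
    # Requests overlap with data movement once the pipeline is primed.
    return request_latency + n_files * transfer_time

n = 10_000                      # e.g., 1 GB split into 100 KB files
latency, xfer = 0.050, 0.010    # seconds (illustrative values only)
print(f"sequential: {sequential_time(n, latency, xfer):7.1f} s")
print(f"pipelined:  {pipelined_time(n, latency, xfer):7.1f} s")
```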
GridFTP Advanced Configurations
GFork
- Robust UNIX fork/setuid model
- Allows server state to be maintained across connections
Dynamic backends
- Stability in the event of backend failure
- Growing resource pools for peak demands
Frontend Replication
[Diagram: on the server host, a GFork server with a GridFTP plugin forks several GridFTP server instances; the instances share state over a state-sharing link and handle clients' control channel connections via inherited links]
Dynamic Backends
[Diagram: a client's control connection reaches a frontend instance forked by the GFork server (GridFTP plugin) on the frontend host; the frontend looks up available backends; on each backend host a registration daemon registers with the plugin, and inetd forks backend instances]
- Multiple backends (BEs) register with the plugin
- The plugin maintains the list of available BEs
- The frontend (FE) instance selects N BEs for use
- If any one BE fails, another can be used
- The BE pool can grow and shrink (a sketch of this registration/selection logic follows)
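A minimal sketch of the registration/selection behavior described above, with invented names (the real GFork plugin is C code; this only mirrors the logic on the slide): backends register, the frontend picks N of them, and a failed backend is replaced from the pool.

```python
import random

class BackendRegistry:
    """State kept by the GridFTP plugin in the GFork frontend (illustrative only)."""
    def __init__(self):
        self.available = set()

    def register(self, backend):       # a backend's registration daemon checks in
        self.available.add(backend)

    def unregister(self, backend):     # pool can shrink as well as grow
        self.available.discard(backend)

    def select(self, n):
        """Frontend instance picks N backends for a striped transfer."""
        if len(self.available) < n:
            raise RuntimeError("not enough registered backends")
        return random.sample(sorted(self.available), n)

    def replace_failed(self, chosen, failed):
        """If any one backend fails, substitute another from the pool."""
        self.unregister(failed)
        spares = self.available - set(chosen)
        chosen[chosen.index(failed)] = next(iter(spares))
        return chosen

registry = BackendRegistry()
for host in ("be1", "be2", "be3", "be4"):
    registry.register(host)
movers = registry.select(2)
print("using backends:", movers)
print("after failure: ", registry.replace_failed(movers, movers[0]))
```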
Reliable File Transfer (RFT)
- RFT accepts a SOAP description of the desired transfer and writes it to a database
- It then uses the Java GridFTP client library to initiate 3rd-party transfers on behalf of the requestor
- Restart markers are stored in the database to allow restart in the event of an RFT failure
- Supports concurrency, i.e., multiple files in transit at the same time, which gives good performance on many small files
Reliable File Transfer: comparison with globus-url-copy
- Supports all the same options (buffer size, etc.)
- Increased reliability because state is stored in a database
- Service interface: the client can submit the transfer request and then disconnect and go away
- Think of it as a job scheduler for transfer jobs
- Two ways to check status: subscribe for notifications, or poll for status (see the sketch after this list)
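The slide describes the RFT design (request persisted to a database, restart markers, poll or subscribe for status) without code; the sketch below uses SQLite to illustrate why persisting state makes a transfer restartable. All table, column, and function names are invented for illustration and are not RFT's actual schema or API.

```python
import sqlite3

# Persist the transfer request and its restart marker, as the RFT design does
# with its database; a throwaway in-memory SQLite DB stands in for it here.
db = sqlite3.connect(":memory:")
db.execute("""CREATE TABLE transfers (
    id INTEGER PRIMARY KEY, source TEXT, dest TEXT,
    bytes_done INTEGER, state TEXT)""")

def submit(source, dest):
    """Client submits a request and can then disconnect; the service owns the job."""
    cur = db.execute("INSERT INTO transfers VALUES (NULL, ?, ?, 0, 'pending')",
                     (source, dest))
    return cur.lastrowid

def record_marker(job_id, bytes_done):
    """Restart marker: after a service failure, the transfer resumes from here."""
    db.execute("UPDATE transfers SET bytes_done=?, state='active' WHERE id=?",
               (bytes_done, job_id))

def poll(job_id):
    """One of the two status options on the slide (the other being notifications)."""
    return db.execute("SELECT state, bytes_done FROM transfers WHERE id=?",
                      (job_id,)).fetchone()

job = submit("gsiftp://siteA/data/file1", "gsiftp://siteB/data/file1")
record_marker(job, 10_000_000)
print(poll(job))   # ('active', 10000000) -- in the real design this survives a restart
```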
Globus Replica Management
Current Globus tools:
- Replica Location Service (RLS): provides registration and discovery of data items
- Data Replication Service (DRS): pull-based replication of existing data items using RFT, with registration of the files in RLS
Long-term plan (CEDS): provide flexible, policy-driven replication services
- Maintain a certain level of redundancy for all data items
- Subscribe to data items with certain characteristics and automatically receive copies of new, matching data items
- Keep replicas consistent with one another
Replication Scenario: The LIGO Project
- Laser Interferometer Gravitational Wave Observatory
- Data sets are first published at Caltech; publication includes specification of metadata attributes
- Data sets may be replicated at up to 10 LIGO sites
- Sites perform metadata queries to identify desired data, then pull copies from Caltech or other LIGO sites
- Customized data management system: the Lightweight Data Replicator (LDR), built on existing Globus tools (GridFTP, RLS)
The Globus Replica Location Service
- A Replica Location Service (RLS) is a distributed registry that records the locations of data copies and allows replica discovery
- RLS maintains mappings between logical identifiers and target names
- Must perform and scale well: support hundreds of millions of objects and hundreds of clients
- E.g., the LIGO (Laser Interferometer Gravitational Wave Observatory) project runs RLS servers at 10 sites, maintaining associations between 11 million logical file names and 120 million physical file locations
RLS Framework
[Diagram: several Local Replica Catalogs feed a layer of Replica Location Index nodes]
- Local Replica Catalogs (LRCs) contain consistent information about logical-to-target mappings
- Replica Location Index (RLI) nodes aggregate information about one or more LRCs
- LRCs use soft-state update mechanisms to inform RLIs about their state: relaxed consistency of the index
- Optional compression of state updates reduces communication, CPU, and storage overheads
(a minimal sketch of the LRC/RLI split follows)
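A minimal sketch of the two-level LRC/RLI split described above, assuming simple in-memory maps; the real RLS uses relational database backends and compressed soft-state updates, and the class and method names here are illustrative, not the RLS API.

```python
class LocalReplicaCatalog:
    """LRC: authoritative logical-name -> physical-location mappings at one site."""
    def __init__(self, site):
        self.site = site
        self.mappings = {}          # logical name -> set of physical locations

    def add(self, logical, physical):
        self.mappings.setdefault(logical, set()).add(physical)

    def soft_state_update(self):
        """What gets pushed periodically to the RLIs: just the logical names."""
        return set(self.mappings)

class ReplicaLocationIndex:
    """RLI: aggregates (logical name -> which LRCs know about it)."""
    def __init__(self):
        self.index = {}             # logical name -> set of LRC sites

    def receive_update(self, lrc):
        for name in lrc.soft_state_update():
            self.index.setdefault(name, set()).add(lrc.site)

    def lookup(self, logical):
        """Returns the LRCs to query for the actual physical locations."""
        return self.index.get(logical, set())

caltech = LocalReplicaCatalog("caltech")
caltech.add("ligo:frame-001", "gsiftp://caltech/frames/001.gwf")
rli = ReplicaLocationIndex()
rli.receive_update(caltech)
print(rli.lookup("ligo:frame-001"))   # {'caltech'} -> then ask that LRC for locations
```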
RLS Status
- Stable component: greatly improved performance and scalability in the last 2 years; no major changes to existing RLS functionality or interfaces
- New interface: WS-RF-compatible web services interface (WS-RLS)
- The major difficulty for users has been installation and configuration of open-source relational database backends
- New features: support for an embedded database backend (SQLite), easier configuration of relational database backends, and a pure Java client for RLS (available approx. March 2007)
- Planned features: dynamic deployment of RLS services, better support for RLS configuration management in VOs, finer-grained authorization support for users
Motivation for the Data Replication Service
- Data-intensive applications need higher-level data management services that integrate lower-level Grid functionality: efficient data transfer (GridFTP, RFT), replica registration and discovery (RLS), and eventually replica validation, consistency management, etc.
- The goal is to generalize the custom data management systems developed by several application communities
- Eventual plan: provide a suite of general, configurable, higher-level data management services
- The Globus Data Replication Service (DRS) is the first of these services
The Data Replication Service
- Included in the GT 4.0.2 release
- Design based on the publication component of the LIGO Lightweight Data Replicator system, developed by Scott Koranda
- The client specifies (via the DRS interface) which files are required at the local site
- DRS uses: the Globus Delegation Service to delegate proxy credentials; RLS to discover where replicas exist in the Grid; a selection algorithm to choose among available source replicas (provides a callout; the default is random selection); the Reliable File Transfer (RFT) service to copy data to the site via the GridFTP data transport protocol; and RLS again to register the new replicas
DRS Functionality
- Delegate a credential via the Delegation Service
- Create a Replicator resource via DRS
- Discover replicas of the desired files in RLS and select among them
- Transfer data to the local site with the Reliable File Transfer Service, using GridFTP servers
- Register the new replicas in RLS catalogs
- Monitor the Replicator resource and trigger events
- Inspect the state of the DRS resource and its Resource Properties
- Destroy the Replicator resource
[Diagram: a client drives a DRS Replicator resource (with Resource Properties), which uses an RFT Transfer resource, the RLS index and catalogs, and GridFTP servers]
(a minimal sketch of this workflow follows)
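The steps above are listed on the slide but not shown in code; the sketch below strings the discover/select/transfer/register phases together as plain Python. Every data structure and function name is invented for illustration; the real DRS is a WSRF service that delegates credentials and drives RFT and GridFTP, none of which is modeled here.

```python
# Illustrative pull-based replication in the spirit of the DRS steps above.
import random

rls_index = {"file1": ["siteA", "siteB"]}            # logical name -> sites with copies
rls_catalogs = {"siteA": {"file1": "gsiftp://siteA/file1"},
                "siteB": {"file1": "gsiftp://siteB/file1"}}
local_site = "siteC"
local_catalog = {}

def replicate(logical_name):
    # (credential delegation and Replicator-resource creation are omitted)
    # Discover replicas and select a source; the DRS default is random selection.
    source_site = random.choice(rls_index[logical_name])
    source_url = rls_catalogs[source_site][logical_name]
    # Transfer to the local site (stand-in for RFT driving GridFTP).
    local_url = source_url.replace(source_site, local_site)
    print(f"transferring {source_url} -> {local_url}")
    # Register the new replica so others can discover it.
    local_catalog[logical_name] = local_url
    rls_index[logical_name].append(local_site)

replicate("file1")
print(local_catalog, rls_index)
```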
Next Generation: Data Placement Services
Center for Enabling Petascale Distributed Science (CEDS)
- Recently funded by DOE SciDAC-2 as a Center for Enabling Technologies
- Includes: USC Information Sciences Institute, Argonne National Laboratory, University of Wisconsin–Madison, Lawrence Berkeley National Laboratory, Fermi National Accelerator Laboratory
- Higher-level, policy-driven placement of data
- End-to-end provisioning of data resources to carry out placement decisions
Layered Architecture
Higher-Level Data Placement Services
- Decide where to place objects and replicas in the distributed Grid environment
- Policy-driven, based on the needs of the application
- Effectively create a placement workflow that is passed to the Reliable Distribution Service layer for execution
- Push- or pull-based service that places an explicit list of data items
- Metadata-based placement: decide where data objects are placed based on the results of metadata queries for data with certain attributes
- N-Copies: maintain N copies of data items; the placement service checks existing replicas and creates/deletes replicas to maintain N copies (a minimal sketch follows this list)
- Publication/subscription: allows sites or clients to subscribe to topics of interest; data objects are placed as indicated by subscriptions
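The N-copies policy mentioned above reduces to a simple reconciliation check; this is an illustrative sketch under invented names and a naive ranking rule, not the CEDS design.

```python
def enforce_n_copies(replicas, candidate_sites, n):
    """Return (sites to add a copy to, sites to delete a copy from) so that
    exactly n replicas remain. `replicas` is the set of sites holding a copy."""
    replicas = set(replicas)
    if len(replicas) < n:
        missing = n - len(replicas)
        targets = [s for s in candidate_sites if s not in replicas][:missing]
        return targets, []
    if len(replicas) > n:
        surplus = sorted(replicas)[n:]   # a real policy would rank sites, not sort names
        return [], surplus
    return [], []

sites = ["anl", "ncsa", "sdsc", "tacc", "psc"]
print(enforce_n_copies({"anl"}, sites, 3))                          # add two copies
print(enforce_n_copies({"anl", "ncsa", "sdsc", "tacc"}, sites, 3))  # delete one copy
```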
Reliable Distribution Layer
- Responsible for carrying out the distribution or placement "plan" generated by the higher-level service
- Extends the functionality of reliable file transfer services
- Provides feedback to the higher-level placement services on the outcome of the placement workflow
- Calls on lower-level services to coordinate (e.g., the GridFTP data transport service)
OGSA-DAI in a nutshell
- An extensible framework for data access and integration
- Exposes heterogeneous data resources to a grid through web services
- Interact with data resources: queries and updates, data transformation/compression, data delivery, application-specific functionality (see the pipeline sketch after this list)
- Supports relational and XML databases and text and binary files
- Supports various delivery options and transforms
- Supports secure-conversation message-level security using X.509 certificates
- A base for higher-level services: federation, mining, visualisation, …
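OGSA-DAI clients compose activities (query, transform, deliver) into a single request; the Python below only mimics that composition style with invented functions and a throwaway in-memory database, and is not the actual OGSA-DAI client API.

```python
import gzip, json, sqlite3

# Invented stand-ins for OGSA-DAI activities: each stage consumes the previous
# stage's output, mirroring the query -> transform -> deliver pattern above.
def sql_query(conn, sql):
    return [dict(row) for row in conn.execute(sql)]

def compress(rows):
    return gzip.compress(json.dumps(rows).encode())

def deliver(blob, destination):
    with open(destination, "wb") as out:
        out.write(blob)
    return destination

# Tiny in-memory "data resource" so the pipeline actually runs end to end.
conn = sqlite3.connect(":memory:")
conn.row_factory = sqlite3.Row
conn.execute("CREATE TABLE obs (id INTEGER, flux REAL)")
conn.executemany("INSERT INTO obs VALUES (?, ?)", [(1, 0.3), (2, 1.7)])

result = deliver(compress(sql_query(conn, "SELECT * FROM obs WHERE flux > 1")),
                 "delivery.json.gz")
print("delivered", result)
```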
OGSA-DAI motivation
Entering an age of data
- Data explosion: the CERN LHC will generate 1 GB/s = 10 PB/year; Pixar generates ~100 TB per movie
- Storage is getting cheaper
- Data is stored in many different ways: relational databases, XML databases, text and binary files
- Need ways to facilitate data discovery, data access, and data integration
- Empower e-Business and e-Science; the grid is a vehicle for achieving this
Data services
[Diagram: a Data Service exposes three Data Service Resources (SQLOne, XMLOne, FilesOne); each Data Service Resource uses a Data Resource Accessor to reach its underlying data resource (relational database, XML database, or files)]
Architecture ideas from LCG (repeated for comparison with the TeraGrid translation below)
Storage systems: CASTOR (3 T1s), dCache (9 T1s & T2s), DPM (T2s only), MOPS (T2s only)
- GridFTP is the underlying transfer mechanism
Transfer services: RFT (OSG), FTS (EGEE) [SRM-cp, Unicore, Oracle, IBM tape]
- Reliability & retries; single point of control for VOs & bulk transfers
- Advanced client tools – RLS, DRS, RFT (policies)
Experimental toolkits: PhEDEx (CMS), Don Quixote (ATLAS), FTD (ALICE), DIRAC (LHCb)
- Subscribe to datasets; DIY meta-scheduler; pick best copy
SRM interface (LCG requirement) + POSIX I/O
Experimental framework – pure science codes
Proposed DMIS interface
Translation to TeraGrid??
Storage systems: GPFS, HPSS, SRB, xfer, pNFS?
- Underlying transfer mechanisms: GridFTP; pFTP; GridFTP; ??
Transfer mechanism: single point of control for VOs
- Advanced client tools – RLS, DRS, RFT (policies)
Experimental toolkits: Atmospheric, Astronomy, Medicine, …
- Subscribe to datasets; DIY meta-scheduler; pick best copy
Experimental framework – pure science codes
TGCP interface | HPSS interface | (possible use of DMIS?)
For More Information
GridFTP: http://gridftp.org
RLS:
- "Performance and Scalability of a Replica Location Service," High Performance Distributed Computing Conference, 2004. http://www.isi.edu/~annc/papers/chervenakhpdc13.pdf
- Documentation: http://www.globus.org/toolkit/docs/4.0/data/rls
DRS:
- "Wide Area Data Replication for Scientific Collaborations," Grid Computing (Grid2005). http://www.isi.edu/~annc/papers/grid2005final.pdf
- Documentation: http://www.globus.org/toolkit/docs/4.0/techpreview/datarep
Discussion? (over dinner?)