
Design and Evaluation of a Simple Data Interface for Efficient Data Transfer Across Diverse Storage

ZHENGCHUN LIU, Argonne National Laboratory

RAJKUMAR KETTIMUTHU, Argonne National Laboratory

JOAQUIN CHUNG, Argonne National Laboratory

RACHANA ANANTHAKRISHNAN, The University of Chicago

MICHAEL LINK, The University of Chicago

IAN FOSTER, Argonne National Laboratory and The University of Chicago

Modern science and engineering computing environments often feature storage systems of different types, from parallel file systems in high-performance computing centers to object stores operated by cloud providers. To enable easy, reliable, secure, and performant data exchange among these different systems, we propose Connector, a pluggable data access architecture for diverse, distributed storage. By abstracting low-level storage system details, this abstraction permits a managed data transfer service (Globus in our case) to interact with a large and easily extended set of storage systems. Equally important, it supports third-party transfers: that is, direct data transfers from source to destination that are initiated by a third-party client but do not engage that third party in the data path. The abstraction also enables management of transfers for performance optimization, error handling, and end-to-end integrity. We present the Connector design, describe implementations for different storage services, evaluate tradeoffs inherent in managed vs. direct transfers, motivate recommended deployment options, and propose a performance model-based method that allows for easy characterization of performance in different contexts without exhaustive benchmarking.

Additional Key Words and Phrases: Data Transfer, Cloud Storage, Storage Interface

1 INTRODUCTION

Easy access to data produced by scientific research is essential if such research products are to be widely available for research, education, business, and other purposes [41]. Far from being mere rehashes of old datasets, evidence shows that studies based on analyses of previously published data can achieve just as much impact as the original projects [23]. Reducing barriers to the sharing of scientific data is a multi-faceted challenge [55], but one fundamental need is for mechanisms that can enable efficient, secure, and reliable access to data, regardless of location.

The work described here is concerned with addressing two important obstacles to scientific data access, namely storage system diversity and efficient data movement. Ideally, once a scientist locates data of interest, they should be able to retrieve required components easily, reliably, efficiently, and securely, without concern for the details of the source and destination storage systems. In practice, considerations such as domain practices, cost, performance, data source, and analysis workflows result in scientists storing data on a wide range of storage systems, often with idiosyncratic interfaces: for example, commercial cloud object-based storage, such as Amazon Simple Storage Service (S3), Google Cloud Storage, and Microsoft Azure Blob Storage; community object-based storage solutions, such as Ceph and OpenStack Swift; parallel filesystems, such as Lustre, GPFS, and Intel DAOS; and cloud-based file hosting services and synchronization services, such as Box, Google Drive, Microsoft OneDrive, and Dropbox. Furthermore, the datasets that are created and analyzed in science are frequently large, reaching terabytes or even petabytes in size.


Thus the scientist must be able not only to access diverse data stores but to optimize data movement among different combinations of such systems. This need creates its own challenges in terms of leveraging these resources without overburdening application researchers [45].

A first important step towards enabling seamless data access and movement among diverse storage systems was taken more than a decade ago, when Allcock et al. [7] introduced the Data Storage Interface (DSI) within the Globus implementation of the GridFTP protocol as a unified storage interface for use by data movement tools. Globus GridFTP and its DSI were initially supported on POSIX-compliant file systems [7], but the storage landscape increasingly includes cloud object stores, tape archives, and other proprietary storage systems. The evolution of DSI to accommodate new storage systems while maintaining backward compatibility, and also to incorporate support for third-party transfers and modern authentication and authorization mechanisms, produced the Connector abstraction that we describe in this paper. This abstraction, as instantiated in an interface and implementation, enables a wide variety of storage systems to be accessed in a consistent, performant, and secure manner, simply by installing the required Connector server software [15]. Equally important, it supports the management of transfers by cloud-hosted management services, such as the Globus service that we consider in this paper.

While a uniform interface to storage has many advantages, any abstraction layer tends to introduce overheads that can impact performance. Understanding the nature of these overheads is essential to determining where and when the abstraction may be used. To develop this understanding, we first present here the Connector abstraction and then evaluate overheads and performance when the Globus implementation of this abstraction is used to move data between diverse storage systems. The primary contributions of this paper are the following:

• We describe Connector, a data storage interface that permits uniform access to a wide range of storage systems, including both cloud storage and conventional file systems, while supporting third-party managed transfers.
• We describe how this interface can be incorporated into a data transfer service.
• We propose a performance model-based method for exploring performance issues in different contexts without exhaustive benchmarking.
• We draw conclusions about implications for our design and the Globus implementation, and recommend best practices.

The rest of this paper is organized as follows. In §2 we describe the motivation for a uniform interface for data movement across diverse storage systems, including cloud-based storage services. In §3 we propose Connector, based on the original DSI, to address new challenges in cloud-based storage services. The details of Connector and six sample implementations are introduced in §4. In §5 we present a performance-model-based approach to study the overhead of data movement using Connector; in §6 we analyze the throughput of Connector-based data movement; and in §7 we evaluate the influence of data integrity checking, which has characteristics unique to Connector. The results presented in §6 are in line with the performance analysis in §5. In §8 we discuss best practices for production deployment. In §9 we review related work, and in §10 we summarize our conclusions and discuss future work.

2 MOTIVATION

The science and engineering community uses a large and growing number of storage systems. Each such system has been created in response to specific needs for storing and accessing science and engineering data, needs that cover a broad spectrum of cost, scale, performance, and other requirements. Different systems focus on distinct requirements, provide distinct services to their clients, and often implement different interfaces and protocols for data access.


For example, POSIX I/O provides open(), close(), read(), write(), and lseek() operations, with strict consistency and coherence requirements. For instance, write operations have to be visible to other clients immediately after the system call returns. POSIX serves as the uniform interface for many storage systems, such as Lustre and GPFS, that are widely used in science institutions. Although high-level parallel I/O middleware libraries, such as MPI-IO [49], HDF5 [24], ADIOS [33], and PnetCDF [30], provide relaxed semantics for file system access, most scientific HPC applications use POSIX as their default interface for interacting with local storage [38].
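As a minimal illustration of this call pattern, the following sketch uses Python's os module, which wraps the corresponding POSIX system calls (the file path is a placeholder):

import os

# Open (creating if necessary), write, seek back to the start, and read,
# using the POSIX-style calls exposed by Python's os module. Under POSIX
# semantics the written data is visible to other clients as soon as
# os.write() returns.
fd = os.open("/tmp/example.dat", os.O_RDWR | os.O_CREAT, 0o644)
os.write(fd, b"hello storage")
os.lseek(fd, 0, os.SEEK_SET)   # reposition the file offset
data = os.read(fd, 13)         # read back the 13 bytes just written
os.close(fd)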

Object stores, widely used in cloud computing, manage data as objects instead of files. They provide a single flat global name space, support just a few simple operations, such as PUT and GET, and support only a weak form of consistency: eventual consistency. Although APIs provided by cloud storage providers enable high-speed access to data from within the cloud, they are poorly suited to moving data between the cloud and local science institutions or among different cloud service providers. The user must log into the system in order to perform download or upload operations between cloud storage and another institution. This data movement pattern is unreliable, since any system interruption results in failure, and it is difficult to incorporate into research automation tools such as workflow systems and HPC schedulers, which typically use POSIX interfaces for data staging.

2.1 Third-Party Transfer

An important data transfer pattern in many science and engineering settings involves a "third party" (a user or agent working on behalf of a user) initiating and managing data movement between two remote computers (or instruments). Support for such third-party data transfers allows users to initiate, manage, and monitor data movement from anywhere, without direct access to the systems involved in the data movement. It also facilitates the integration of data movement operations into a wide variety of data automation tools, from shell scripts to scientific workflow engines, such as Galaxy [25], Kepler [8], Parsl [12], Pegasus [20], and Swift/T [56]. The ability to request data transfers enables the workflow systems to execute transparently on remote resources.

A third-party transfer necessarily engages two distinct communication channels, for control and data. The control channel is used for sending protocol messages between system components, for example, from the third-party management service to data movers, in order to authenticate and authorize users and to initiate streaming. The data channel provides the link between the source and destination of the data being transferred. Cloud-based storage services, in contrast, provide only two-party (i.e., server-client) data movement, which makes it difficult to integrate such services into scientific workflows in which some components may run on supercomputers that do not have WAN connectivity and typically communicate with dedicated data transfer nodes via parallel file systems.

2.2 Transfer Management

While the servers connected to storage are the workhorses for data movement, the clients that drive data transfers play a critical role in determining transfer performance and reliability. The client needs to provide all parameters for any transfer, including the security credential to be used, network usage levels (e.g., number of concurrent connections, parallelism), and integrity and privacy levels. The configuration of these parameters by the client, as part of the request that it issues to servers to transfer files, has significant impact on transfer performance. Moreover, while the GridFTP protocol is built for supporting reliable transfers, the onus falls on the client to track the information sent back from the servers on how much data has been moved and to request restarts when these are needed to ensure a complete transfer. Similarly, any failures, including those that result from integrity checks performed upon completion of a file transfer,


are returned to the client; it is up to the client application to request that data be retransferred. For large transfers that include recursive transfers of directories, the client needs to expand directories and track progress at a per-file level in order to ensure that all files and folders are moved.

Client applications that provide such transfer management capabilities and the rich features required to drive high performance and reliability are not trivial to write and maintain. The Globus transfer service provides such a client as a hosted service for managed transfers. It drives maximum efficiency by combining information on the files that need to be moved and the capacity at source and destination to determine performance parameters. It also provides reliability by tracking transfer progress and retrying on faults, and it negotiates the security needed to navigate transfers between sites. By thus providing a fire-and-forget solution for users, it delivers significant usability benefits.

The Connector storage abstraction layer thus has two roles: (1) to enable efficient access to data stored on a variety of different storage systems and (2) to support the operation of managed transfer applications such as Globus so that they can achieve secure, performant, and reliable transfers across diverse combinations of such systems. The latter role requires specialized capabilities, such as enabling a third party to establish its identity and authority to make a request to the storage system; to request that a transfer be initiated; to enable encryption; to monitor transfer progress; to detect errors and termination; and to request checksums.

3 A UNIFORM STORAGE INTERFACE: FROM DSI TO CONNECTOR

Allcock et al. [7] described the initial DSI (illustrated in Figure 1) in 2005, targeting a uniform interface for various local storage systems and data management systems such as HPSS [18], Xrootd [21], and iRODS [29]. This original

Fig. 1. The Connector abstraction: an application (e.g., a data transfer tool) accesses diverse storage systems (POSIX file systems such as Lustre and GPFS, HPSS tape archives, sync storage, and cloud storage) through the Connector layer.

DSI provides a uniform interface to the data storage systems for file transfers between local storage systems; it was intended primarily for use by the open source Globus implementation of GridFTP. The DSI layer accepts requests such as stat, send, and recv and performs these functions using the appropriate APIs of the storage system with which it interfaces. DSI consists of several function signatures and a set of semantics.


Continued advances and diversification in network bandwidth and data storage/management techniques, such as object stores, cloud-based storage services, and conventional parallel file systems, meant that the original DSI design was no longer able to handle efficiently the many different data store/retrieval APIs and diverse authentication methods. Subsequent extensions to the original DSI to support modern authentication methods, handle certain limitations of cloud storage APIs (such as call quotas), incorporate automatic retries and fault-tolerant capabilities, and add management capabilities produced what we refer to here as the Connector abstraction, as shown in Figure 3 and detailed in §4. An implementation of this abstraction is created by instantiating the various Connector functions. The first Connector developed by the Globus team was for POSIX-compliant file systems. Since then, researchers around the world have implemented others. Examples include HPSS [3], iRODS [2], StoRM [5], SDQuery [48], Xrootd [22], Swift [39], and MAPFS [42]—some in collaboration with the Globus team and some independently. To set context, we introduce key Connector interface functions (a schematic sketch follows the list):

• Start is called to establish a new session to access the storage. This hook gives a Connector an opportunity to set internal state that will be threaded through to all other function calls associated with this session. It also provides an opportunity to reject the access request.
• Destroy is called to terminate a session. The Connector should clean up all memory associated with the session.
• Stat is called to get information (e.g., size, last modified time) about a given file or resource and to verify that a file exists and has the proper permissions.
• Command handles simple (succeed/fail or single-line response) storage system operations such as directory or object creation and permission changes.
• Recv is used to receive data from the application and write to the underlying storage system.
• Send is used to read data from the underlying storage system and send the data to the application.
• SetCredential allows the application to provide the credential required for the Connector to authenticate with the underlying storage system.
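To make the shape of this contract concrete, here is a schematic sketch in Python; the real Connector API is a set of C function signatures, and the method names and arguments below are illustrative only, mirroring the functions listed above.

from abc import ABC, abstractmethod

class Connector(ABC):
    """Schematic sketch of the Connector contract (illustrative, not the real C API)."""

    @abstractmethod
    def start(self, session_info):
        """Establish a new session; set per-session state or reject the access request."""

    @abstractmethod
    def destroy(self, session):
        """Terminate a session and clean up all memory associated with it."""

    @abstractmethod
    def stat(self, session, path):
        """Return size, modification time, and permissions for a file or resource."""

    @abstractmethod
    def command(self, session, cmd, *args):
        """Handle simple succeed/fail operations such as directory creation or permission changes."""

    @abstractmethod
    def recv(self, session, path):
        """Receive data from the application and write it to the underlying storage."""

    @abstractmethod
    def send(self, session, path):
        """Read data from the underlying storage and send it to the application."""

    @abstractmethod
    def set_credential(self, session, credential):
        """Accept the credential used to authenticate with the underlying storage."""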

In addition to these interface functions, the Connector includes various helper functions. For example, Figure 2 shows the implementation of the stat() function for the POSIX Connector. The finished_stat call at the end of the

Listing 1: Example Interface Function

#include <string.h>      /* strdup() */
#include <sys/stat.h>    /* stat(), struct stat */

/* dsi_operation_t, dsi_stat_info_t, dsi_stat_t, SUCCESS, and finished_stat()
   are provided by the Connector (DSI) API. */
static void POSIX_stat(dsi_operation_t op, dsi_stat_info_t *stat_info)
{
    dsi_stat_t  stat_out;
    struct stat stat_in;

    stat(stat_info->pathname, &stat_in);   /* query the local file system */
    stat_out.mode  = stat_in.st_mode;
    stat_out.nlink = stat_in.st_nlink;
    stat_out.uid   = stat_in.st_uid;
    stat_out.gid   = stat_in.st_gid;
    stat_out.size  = stat_in.st_size;
    stat_out.mtime = stat_in.st_mtime;
    stat_out.atime = stat_in.st_atime;
    stat_out.ctime = stat_in.st_ctime;
    stat_out.name  = strdup(stat_info->pathname);
    finished_stat(op, SUCCESS, &stat_out, 1);   /* helper: report the result to the application */
}

Fig. 2. The stat() interface function as implemented for the POSIX Connector.


function above is a helper function (implemented by the application) that allows the Connector to interact with the application. Here is a list of other helper functions (a usage sketch follows the list):

• read/write transfers data between the Connector and the application.
• get_concurrency tells the Connector how many outstanding reads or writes it should have. A data transfer application would specify this value based on the number of parallel TCP streams used for the data transfer.
• get_blocksize indicates the buffer size that the Connector should exchange with the application via read/write.
• get_read_range tells the Connector which data it should be sending. This handles restart (including "holey" transfers) and partial data transfers.
• bytes_written should be called whenever the Connector successfully completes a write to the storage system. This allows the data transfer application to generate performance and restart markers.
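The sketch below illustrates, again in Python rather than the actual C API, how a Connector's recv path might combine these helpers; the session object, storage handle, and exact helper signatures are hypothetical.

def recv(session, path, app):
    # Illustrative receive loop: pull buffers from the application and write
    # them to the underlying storage, reporting progress via bytes_written().
    blocksize = app.get_blocksize()          # buffer size to exchange with the app
    storage = session.open_for_write(path)   # hypothetical storage handle
    while True:
        offset, chunk = app.read(blocksize)  # next buffer (and its offset) from the app
        if not chunk:                        # no more data for this transfer
            break
        storage.write(offset, chunk)
        app.bytes_written(offset, len(chunk))  # lets the app emit restart/performance markers
    storage.close()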

Applications can load and switch Connectors at runtime. When the application requires action from the storage system (e.g., store/retrieve data, metadata, directory creation), it passes a request to the loaded Connector module. The Connector then services that request and notifies the server when it is finished. An API is provided to the Connector author to assist in implementation. The Connector author is not expected to know the details of the application. Instead, this API provides functions for reading and writing data to and from the network.

4 GLOBUS ARCHITECTURE AND CONNECTOR IMPLEMENTATION

Rapid growth in both the use of the Globus service and the variety of available cloud and other storage resources has created the need for additional Connectors. We describe here six popular Connectors that are integrated into the GCS [15]. GCS comprises components for data access, security, and installation and configuration. For bulk data access, GridFTP—a high-performance, secure, reliable data transfer protocol optimized for high-bandwidth wide-area networks—is used. GCS also includes an HTTPS server for direct two-party data access. The storage Connectors provide the abstraction layer for secure access to a storage type and include key capabilities such as the security protocols needed by the storage system, management of security credentials or tokens, limit and throttling policy management, and the key I/O access to the storage system. Figure 3 shows the flow of authentication to make a Connector work with Globus.

Every storage blob requires that a credential be registered with the endpoint's GCS Manager service using its REST API. The credentials are never sent via the hosted Globus transfer service; instead, they are sent directly from the user's client (browser or command line client) to the GCS server. When the storage blob is accessed, the credential is read from the GCS Manager by the GridFTP server and passed to the Connector. More specifically, for the POSIX, Box, and Ceph connectors, the credential is simply the local username to which the user's login identity is mapped. For AWS S3, the credential is a user-submitted S3 Access Key ID and Secret Key. For Google Drive and Google Cloud Storage, the credential is a token that is sent to the GCS Manager directly by Google, after the user completes the Google OAuth2 login. Non-persistent GridFTP clients such as globus-url-copy [7] pass through credentials; users must provide them anew upon each failure. In the rest of this section, we briefly introduce the cloud stores with which we later evaluate Connector using a performance-modeling-based method, one of the key contributions of this paper.

Amazon Simple Storage Service (AWS-S3) is a service offered by Amazon Web Services (AWS) that provides object storage through a web service interface.

Wasabi [53], an enterprise-class, tier-free cloud storage service, provides an S3-compliant interface to use withstorage applications, gateways, and other platforms.


Fig. 3. Data and authentication flow of the Connector. The end user's browser registers a credential with the GCS Manager on the data transfer node (username/password for POSIX, Ceph, and Box; an S3 access key for Amazon S3; a Google refresh token, issued via Google.com, for Google Cloud Storage and Google Drive), and the GridFTP server with the Connector uses that credential for Get/Put operations against the storage system.

Google-Drive is a file storage and synchronization service developed by Google. G Suite [27], a suite of cloud computing, productivity and collaboration tools, software, and products developed by Google Cloud, is widely used by educational institutions, whose users thus have significant Google Drive storage allocations. Google Drive is also being used as second-tier storage for research data. The Connector helps in handling certain limitations of the Google Drive API (such as call quotas) through automatic retries and fault-tolerant capabilities in the Globus transfer service.

Ceph [54], an open-source software storage platform, delivers object, block, and file storage in one unified system. It is based on the Reliable Autonomic Distributed Object Store, which distributes objects across a cluster of storage nodes and replicates objects for fault tolerance. Ceph decouples data and metadata: Object Storage Devices store data, and Metadata Servers store metadata, with metadata distributed dynamically among multiple Metadata Servers.

box.com provides a service similar to Google-Drive. Its growing use in universities and national laboratories for research data motivated the development of a Box Connector, which enables bridging to other storage and, as with Google Drive, handles limitations of the native API.

Google-Cloud storage, like AWS-S3, is a RESTful file storage web service for storing and accessing data on Google Cloud Platform infrastructure. Its service specifications make it more suitable than Google-Drive for enterprise use. Its growing use for research data motivated the implementation of a Google-Cloud Connector, which is being used to move data between Google-Cloud and research institute storage and between AWS-S3 and Google-Cloud.

5 PERFORMANCE MODELING BASED OVERHEAD EVALUATION

Liu et al. observe that per-file overhead is the performance killer when transferring many small files between science facilities [34], and that science workloads often have that unfortunate characteristic [37]. Although the influence of per-file overhead can be alleviated by transferring many files concurrently, the DTN resource requirement will be higher to support large concurrency (the number of files transferred concurrently). We describe here a performance model that captures per-file overheads, and we present experiments that allow us to measure indirectly the per-file overhead


when DSI cloud Connectors are involved in a transfer. All experiment source code, environment setup instructions, and experiment result analysis code are available at https://github.com/ramsesproject/dsi.

5.1 Experiment Design

Figure 4 shows a configuration in which the Connector is deployed on a DTN managed by a research institution. We refer to this configuration as Conn-local.

Fig. 4. A typical scenario in which the Connector is deployed locally in the science institution: a POSIX Connector runs on the local DTN and a cloud Connector (Box, S3, etc.) on a gateway DTN, with control flow from the user and data flow to the cloud storage over the WAN.

GridFTP [7] has been optimized for moving data efficiently over wide area networks [32]. Thus, for AWS-S3 and Google-Cloud, we evaluate another deployment scenario in which the Connector runs on a VM operated by the same cloud provider, ideally in the same region as the storage. Figure 5 shows the case where the GCS and corresponding storage Connector are deployed in the same region as the cloud storage. Here, cloud storage (AWS-S3 or Google-Cloud) APIs are used only for local data access, and GridFTP is used to move data over the wide area network. We refer to this configuration as Conn-cloud.

Fig. 5. A scenario in which the Connector is deployed in the same cloud as the storage, as used for AWS-S3 and Google-Cloud: a GCS instance with the cloud Connector runs on a cloud VM near the storage, while a POSIX Connector on the local DTN participates in the GridFTP transfer over the WAN.

5.2 Analysis of Experiment Results

5.2.1 Regression analysis. These statistical processes enable estimation of the relationships between a dependent variable (e.g., data movement performance) and one or more independent variables (e.g., file sizes). Consider the model function

y = α + βx, (1)

which describes a line with slope β and y-intercept α. This relationship may not hold exactly for the largely unobserved population of values of the independent and dependent variables; we call the unobserved deviations from this equation the errors. Suppose we observe n data pairs and call them (x_i, y_i), i = 1, . . . , n. We can describe the underlying relationship between y_i and x_i involving the error term ϵ_i by

Y_i = α + βx_i + ϵ_i. (2)

We can then estimate α and β by solving the following minimization problem:

min_{α,β} Q(α, β), for Q(α, β) = Σ_{i=1}^{n} (y_i − α − βx_i)². (3)
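For instance, given observed pairs (x_i, y_i), the least-squares estimates of α and β can be obtained with a degree-one polynomial fit; a sketch using NumPy with made-up observations:

import numpy as np

# Hypothetical observations: x = number of files, y = transfer time in seconds.
x = np.array([50, 100, 200, 400, 600, 800, 1000], dtype=float)
y = np.array([110, 118, 132, 161, 190, 219, 247], dtype=float)

# np.polyfit with degree 1 solves the minimization in Equation (3):
# it returns [beta, alpha] minimizing sum((y - alpha - beta * x)**2).
beta, alpha = np.polyfit(x, y, 1)
print(f"alpha (intercept) = {alpha:.1f} s, beta (slope) = {beta:.4f} s per file")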

5.2.2 Performance model. We consider the transfer of multiple files sequentially between two endpoints. We assume that each file introduces a fixed overhead of t0 [34], and that the end-to-end theoretical throughput (the minimum of source read, network, and destination write, as studied by Liu et al. [35]) is R. Then the time T to transfer N files totaling B bytes using a concurrency of one (i.e., transferring the N files one by one) is

T = N × t0 + B/R + S0, (4)

where S0 is the transfer startup cost in seconds (to be measured in §5.4). S0 will be close to zero for a two-party transfer (e.g., using the native API) but higher for a third-party (e.g., cloud-hosted Globus) transfer because there will be coordination cost between the transfer client and the servers at source and sink.

We then use Equation 4 to measure the per-file overhead indirectly via Equation 3 by letting α = B/R + S0 and β = t0. Since S0 typically is a constant value, α will reflect the network use efficiency, namely, how fast the network can transfer one single large file.
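Concretely, fitting T = α + βN to measured (N, T) pairs and applying Equation 4 gives t0 = β and R ≈ B/(α − S0); a sketch under assumed values (the measurements below are illustrative, and S0 is taken as the 2.3 s measured in §5.4):

import numpy as np

B = 5 * 1024**3                # total dataset size in bytes (5 GB, as in our experiments)
S0 = 2.3                       # per-transfer startup cost in seconds (see Section 5.4)

# Hypothetical measurements: number of files N and total transfer time T (seconds).
N = np.array([50, 100, 200, 400, 600, 800, 1000], dtype=float)
T = np.array([162, 167, 181, 210, 239, 266, 295], dtype=float)

beta, alpha = np.polyfit(N, T, 1)   # fit T = alpha + beta * N
t0 = beta                           # per-file overhead (Equation 4)
R = B / (alpha - S0)                # effective end-to-end rate in bytes/s
print(f"t0 = {t0:.3f} s per file, R = {R / 1e6:.1f} MB/s")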

5.2.3 Pearson correlation. This coefficient [13], ρ(x, y), is a measure of the linear correlation between variables x and y. It has a value between +1 and −1, where 1 is total positive linear correlation, 0 is no linear correlation, and −1 is total negative linear correlation. It is calculated by

ρ(x, y) = cov(x, y) / (σ_x σ_y), (5)

where cov(x, y) = E[(X − µ_x)(Y − µ_y)] is the covariance of variables x and y, and σ_x and σ_y denote the standard deviations of x and y, respectively.

We performed experiments to verify this model and to estimate the overhead (i.e., t0) of a file indirectly. We then

used Pearson correlation to quantify the linear relation between data transfer time, t, and number of files, f, in datasets that total the same size across all experiments. Table 1 presents the correlation coefficient between data transfer time and number of files for transfers to and from the six storage systems using Connector (deployed locally and in the cloud, where applicable) as well as the native API. We see that the coefficients are close to 1 in all cases, indicating a strong linear relation between transfer time and number of files. Thus we can use Equation 4 as a performance model for data transfers to/from all six storage systems and use regression analysis to resolve the model parameters.
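The coefficients in Table 1 can be computed directly from the measured (f, t) pairs, as in the following sketch with NumPy (the sample values are illustrative):

import numpy as np

# Hypothetical measurements: number of files f and transfer time t in seconds.
f = np.array([50, 100, 200, 400, 600, 800, 1000], dtype=float)
t = np.array([112, 120, 135, 163, 193, 220, 250], dtype=float)

# Equation (5): rho = cov(f, t) / (sigma_f * sigma_t).
rho = np.cov(f, t, bias=True)[0, 1] / (f.std() * t.std())
assert np.isclose(rho, np.corrcoef(f, t)[0, 1])   # np.corrcoef gives the same value
print(f"Pearson correlation rho(t, f) = {rho:.3f}")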

In all experiments, we kept the total dataset size fixed but varied the number of files. In choosing dataset sizes, we aimed to keep the experiment time short enough to reduce the influence of fluctuating external load on storage and network, but long enough to mitigate transfer service startup cost (i.e., per-transfer overhead). To this end, we used a 5 GB dataset for Wasabi, AWS-S3, and Google-Cloud and a 1 GB dataset for Google-Drive and box.com, considering their difference in peak end-to-end performance. We split each dataset into N ∈ {50, 100, 200, 400, 600, 800, 1000} equal-sized files.

We moved each dataset using both the appropriate Connector and the service providers' native APIs and then used the regression analysis methods described above to build a performance model for both the Connector and the native API. To mitigate the influence of external load on the cloud and network, we repeated each experiment between 3 and 10 times, depending on the observed fluctuation.
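A sketch of how such a dataset can be generated (directory, file names, and sizes are placeholders; this is not our exact benchmark harness):

import os

def make_dataset(directory, total_bytes, num_files):
    # Split a dataset of total_bytes into num_files equal-sized files of random data.
    os.makedirs(directory, exist_ok=True)
    file_size = total_bytes // num_files
    for i in range(num_files):
        with open(os.path.join(directory, f"part-{i:04d}.bin"), "wb") as fh:
            fh.write(os.urandom(file_size))

# Example: a 5 GB dataset split into 200 equal-sized files.
# make_dataset("/tmp/dataset-200", 5 * 1024**3, 200)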

Table 1. Correlation coefficient, ρ(t, f), between transfer time, t, and number of files, f, to and from different storage systems using the native API and Connector deployed locally (at a science institution) and at the cloud (where applicable).

Transfer Direction     Connector-Local   Connector-Cloud   Native-API
To AWS-S3              0.999             0.973             0.995
From AWS-S3            0.989             0.993             0.989
To Wasabi              0.999             N/A               0.998
From Wasabi            0.997             N/A               0.998
To Google-Cloud        0.997             0.999             0.993
From Google-Cloud      0.999             0.996             0.992
To Google-Drive        0.994             N/A               0.992
From Google-Drive      0.989             N/A               0.995
To Ceph                0.996             0.999             0.999
From Ceph              0.986             0.994             0.976
To box.com             0.998             N/A               0.999
From box.com           0.996             N/A               0.998

5.3 Results and Discussion

We performed the following experiments without Globus integrity checking [14, 15]. The Globus integrity check detects any data corruption that occurred during transmission over the network and/or while writing data to the destination storage by reading the data at the destination (after it was written to the storage), computing the checksum, and verifying it against the source checksum. In practice, integrity checking is of vital importance for storage-to-storage transfers [16, 37, 47]. See §7 for more discussion of the importance and performance impacts of integrity checks.

5.3.1 Amazon S3. We used boto3 [1], a Python interface to AWS, as the native tool to download from and upload to AWS-S3 buckets, and we compared its performance with that of the AWS-Connector. Figure 6 shows the experimental results and performance model predictions. When uploading from local storage to AWS-S3, Conn-local performs worse than the native API; Conn-cloud has less per-file overhead but lower throughput than the native API. Thus, the Connector can outperform the native API when transferring many small files, where per-file overhead becomes significant. For downloads from AWS-S3 to the local file system, the per-file overhead trend is similar to that for uploads, but the AWS-Connector's efficiency is worse than that of the native API, leading to worse performance when downloading large files.
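For reference, the native-API baseline in these experiments amounts to timing a sequential boto3 upload (or download) of the files; a minimal sketch (bucket name and paths are placeholders, and for Wasabi an endpoint_url would be supplied when creating the client):

import glob
import os
import time
import boto3

s3 = boto3.client("s3")
bucket = "example-benchmark-bucket"   # placeholder bucket name

files = sorted(glob.glob("/tmp/dataset-200/part-*.bin"))
start = time.perf_counter()
for path in files:                    # concurrency of one: transfer files one by one
    s3.upload_file(path, bucket, os.path.basename(path))
elapsed = time.perf_counter() - start
print(f"Uploaded {len(files)} files in {elapsed:.1f} s")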

5.3.2 Wasabi. Wasabi provides an S3-compliant interface to use with storage applications, gateways, and other platforms. We used APIs from boto3 (the same API used for AWS-S3) as the native tool to download from and upload to Wasabi buckets, and we compared its performance with the Wasabi connector.

Figure 7 presents the regression analysis results for both upload to and download from Wasabi using boto3 and the Globus Wasabi connector. We see from Figure 7 that the native tool and the Connector have similar per-file overheads,


[Figure 6 plots transfer time vs. number of files. Fitted models, upload: Tapi = 0.1438N + 101.1, Tconn-local = 0.2161N + 151.2, Tconn-cloud = 0.0694N + 160.2; download: Tapi = 0.1469N + 72.8, Tconn-local = 0.2660N + 116.6, Tconn-cloud = 0.1425N + 115.9.]

Fig. 6. Transfer time for 5 GB between local file system and AWS-S3 vs. number of files. Conn-cloud: AWS-Connector is deployed in AWS near the S3 bucket (Figure 5); Conn-local: deployed in a science institution (Figure 4).

for both downloads from and uploads to Wasabi. In terms of average throughput achieved, the Connector is slightly slower for uploads and slightly faster for downloads. Overall, we conclude that the Connector will perform worse than the native tool when uploading many large files, but it will be comparable to the native tool when transferring many small files, a common use case in practice.

[Figure 7 plots transfer time vs. number of files. Fitted models, upload: Tapi = 0.1117N + 34.7, Tconn = 0.1176N + 46.1; download: Tapi = 0.1737N + 74.0, Tconn = 0.1712N + 69.2.]

Fig. 7. Transfer time vs. number of files for 5 GB between local filesystem and Wasabi.

We note that this experiment was designed to measure per-file overhead. This overhead can be mitigated by using either high concurrency or prefetch, as studied by Liu et al. [34]. We study throughput performance in §6.

5.3.3 Google Cloud Storage. Google Cloud also provides native APIs [26] for upload to and download from a storage bucket. These APIs behave similarly to the boto3 APIs that we used for AWS-S3 and Wasabi; we can authenticate once and reuse the credential to transfer all files sequentially for the regression analysis. Figure 8 compares experimental results and the fitted performance model.

When the GoogleCloud-Connector is deployed locally, per-file overhead is much higher than that of the native API in both data movement directions. As modeled in Equation 4, however, the Connector achieves much higher efficiency


[Figure 8 plots transfer time vs. number of files. Fitted models, upload: Tapi = 0.2326N + 56.9, Tconn-local = 0.3096N + 46.4, Tconn-cloud = 0.2618N + 44.9; download: Tapi = 0.1911N + 49.7, Tconn-local = 0.3111N + 11.9, Tconn-cloud = 0.1653N + 38.1.]

Fig. 8. Transfer time vs. number of files for 5 GB between the local file system and Google-Cloud. Conn-cloud: Connector is deployed in the Google Cloud near the bucket (Figure 5); Conn-local is deployed in a science institution (Figure 4).

than the native API does (the bias of the linear model is smaller). In other words, the Connector will perform better than the API when transferring a few big files but worse when transferring many small files.

In contrast, when the Connector is deployed near the storage bucket on a Google Cloud VM, GridFTP (optimized for WAN data movement) is used for the WAN transfer and the API only within the cloud, namely, to move data between the Cloud VM and the storage bucket within the same data center. As shown in Figure 8, the per-file overhead of the Connector is slightly worse than that of the API for upload but better than the API for download, thanks to GridFTP WAN data movement optimizations. Again, the Connector achieves much higher efficiency than does the native API. These results reveal that if the Connector is deployed near the storage bucket, it will perform better than the native API, provided the network bandwidth is not the bottleneck.

5.3.4 Google Drive. Transfers to and from Google-Drive are significantly slower than with the other storage services studied. Thus, to optimize experiment times and to minimize the influence of external load on our experiments, we used datasets totaling 1 GB, rather than 5 GB as for other Connectors, for our regression analysis experiments.

We see that the GoogleDrive-Connector and the API perform similarly for uploads from the local file system to Google-Drive. For downloads, the GoogleDrive-Connector introduces a little more per-file overhead than the native API does, but its network use efficiency is higher. Thus, it can achieve performance similar to that of the native API for big files, but it underperforms for smaller files.

5.3.5 Ceph. Similar to our evaluation for the AWS-Connector and GoogleCloud-Connector, we consider two deployment scenarios for the Ceph-Connector: (1) close to the Ceph storage system (referred to as cloud) and (2) locally in the science institution (referred to as local).

Figure 10 compares the performance model and actual experiment measurements. We see that, as in the case of the AWS-Connector and GoogleCloud-Connector, the Connector incurs much lower per-file overheads when deployed near the storage system. That is mostly because GridFTP allows out-of-order data movement, which leads to better efficiency, and GridFTP handles data movement over the wide-area network when the Connector is deployed near the cloud storage.

5.3.6 Box.com. As we did when evaluating other Connectors, here again we used native APIs (here, those provided by the Box SDK [4]) to move data between box.com and local storage in order to compare Box and the Connector. From


[Figure 9 plots transfer time vs. number of files. Fitted models, upload: Tapi = 1.2545N + 58.7, Tconn = 1.2511N + 59.2; download: Tapi = 0.4185N + 46.8, Tconn = 0.5529N + 27.5.]

Fig. 9. Transfer time vs. number of files for 1 GB between local filesystem and Google-Drive.

[Figure 10 plots transfer time vs. number of files. Fitted models, upload: Tapi = 0.2061N + 30.6, Tconn-local = 0.3513N + 78.7, Tconn-cloud = 0.1586N + 39.0; download: Tapi = 0.0882N + 109.6, Tconn-local = 0.3615N + 52.9, Tconn-cloud = 0.0990N + 31.5.]

Fig. 10. Transfer time vs. number of files for 5 GB between local file system and Ceph cloud storage.

the experimental measurements in Figure 11, we observe that the Box.com-Connector and the native API have similar per-file overheads.

[Figure 11 plots transfer time vs. number of files. Fitted models, upload: Tapi = 0.1523N + 5.8, Tdsi = 0.1544N + 10.9; download: Tapi = 0.8692N + 50.2, Tdsi = 0.8514N + 33.8.]

Fig. 11. Transfer time vs. number of files for 1 GB between the local file system and box.com.


5.4 Transfer Startup Cost

Equation 4 includes, for each transfer, a startup cost of S0. This cost varies according to the transfer method used. If a user logs in to a cloud service and initiates a two-party transfer directly, the cost may be relatively low. In the case of a cloud-hosted third-party transfer service such as Globus, it will be higher. To measure this cost in different contexts, we designed an experiment that transfers a single file of different sizes. Thus, the performance model is

T = B × tu + S0, (6)

where B is the size of the single file in GB and tu is the time to transfer 1 GB. To resolve S0, we transfer a single file with B ∈ {1, 3, . . . , 17, 19} GB from a local file system to a cloud store (in this case, Wasabi), and fit the resulting runtimes to Equation 6. Figure 12 shows the relation between B and T. We see a strong linear relationship between B and T and a transfer startup cost of 2.3 seconds, which is negligible in most cases except where one transfers a particularly small amount of data in a particularly high-throughput environment.
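The startup cost S0 is then simply the intercept of a linear fit of T against B; a sketch in which the timing values are generated from the fitted Connector model of Figure 12 (Tdsi = 8.1B + 2.3) rather than taken from raw measurements:

import numpy as np

# Single-file sizes in GB and the corresponding transfer times in seconds.
B = np.array([1, 3, 5, 7, 9, 11, 13, 15, 17, 19], dtype=float)
T = 8.1 * B + 2.3                  # values generated from the fitted model in Figure 12

tu, S0 = np.polyfit(B, T, 1)       # Equation (6): T = B * tu + S0
print(f"per-GB time tu = {tu:.2f} s/GB, startup cost S0 = {S0:.2f} s")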

[Figure 12 plots transfer time vs. single-file size. Fitted models: Tapi = 6.8B + 0.1, Tdsi = 8.1B + 2.3.]

Fig. 12. Transfer time vs. single-file dataset size for upload to Wasabi: Globus Connector third-party and API two-party.

6 THROUGHPUT ANALYSIS

Based on the investigation of per-file overhead, we see that datasets with big files are friendlier to transfer tools [34]. Here, we used the friendliest datasets to benchmark the best transfer performance at different concurrency levels. Specifically, in order to use a concurrency of cc with a Connector, we initiated a transfer with cc files, each of size 1 GB. When a native API was used, we initiated cc threads to transfer cc files concurrently. In practice, aggregate throughput first increases quickly with concurrency and eventually drops slowly, because of local contention. As noted in previous studies [11, 36, 57], there is no one-size-fits-all setting for concurrency. Thus, for all experiments in this section, we increased concurrency until we saw negative benefit.
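For the native-API baseline, a concurrency of cc can be approximated with a simple thread pool around boto3; a sketch (bucket name and file paths are placeholders):

from concurrent.futures import ThreadPoolExecutor
import boto3

s3 = boto3.client("s3")
bucket = "example-benchmark-bucket"   # placeholder bucket name

def upload(path):
    s3.upload_file(path, bucket, path.rsplit("/", 1)[-1])

def upload_with_concurrency(paths, cc):
    # Transfer cc files at a time using cc worker threads.
    with ThreadPoolExecutor(max_workers=cc) as pool:
        list(pool.map(upload, paths))

# Example: eight 1 GB files transferred with a concurrency of eight.
# upload_with_concurrency([f"/tmp/bench/file-{i}.bin" for i in range(8)], cc=8)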


6.1 Wasabi

Figure 13 compares the S3 Connector and the Wasabi API. We see that transferring multiple files concurrently does help to some extent by overlapping the per-file overhead. As for throughput, as evaluated in the preceding section, the Wasabi-Connector achieves performance similar to that of the native API.

[Figure 13 plots throughput (Gbps) vs. concurrency for the Connector and the API, for uploads to and downloads from Wasabi.]

Fig. 13. Transfer performance as a function of concurrency: Globus Connector third-party and Wasabi two-party API.

6.2 AWS S3

Figure 14 shows transfer performance between the local DTN and AWS S3 as a function of the concurrency used. We see that uploads to AWS S3 are consistently faster via the AWS API than via the AWS-Connector, while for downloads the reverse is true. Furthermore, the Connector performance is consistently better when on AWS rather than local. We attribute the superior download performance of the AWS-Connector to its use of the wide-area-network-optimized GridFTP, which for example allows out-of-order transmissions. Thus, the AWS-Connector can extract data from S3 as fast as S3 will allow via the local area network (within the AWS region) and transmit it in parallel (out of order if needed) over the wide area network using GridFTP.

[Figure 14 plots throughput (Gbps) vs. concurrency for Conn-Local, Conn-Cloud, and API, for uploads to and downloads from AWS-S3.]

Fig. 14. Transfer performance as a function of concurrency.


For downloads, the limitation seems to be the network performance of the AWS EC2 instance on which the AWS-Connector is located. The m5.8xlarge instance (32 vCPU, 128 GB RAM) that we used to host the AWS-Connector on AWS is supposed to deliver 10 Gbps external network performance. However, an iperf test with 16 parallel TCP streams from the AWS instance to our local DTN showed only 4.7 Gbps (i.e., downloads in Figure 14b).

6.3 Google Cloud Storage

Figure 15 shows transfer performance as a function of the concurrency used. In the Conn-cloud case, the Connector runs on a Google Cloud virtual machine instance with 32 vCPUs and 128 GB RAM that is close to the Google-Cloud bucket. We used iperf with 16 parallel TCP streams to measure network bandwidth between our local DTN and the VM instance on Google Cloud; we achieved 4 Gbps from Google Cloud to the local DTN (i.e., download) and 7.3 Gbps from the local DTN to the Google Cloud instance (i.e., upload). Since the data do not go through this VM instance when using the native API, native API transfers are not limited by the above-mentioned peak iperf throughput values (which are likely limited by the VM's network). Thus, it is not fair to compare the throughput achieved by the API with that achieved by the Connector when the API throughput is above the peak iperf measurements. We see in Figure 15a that the Connector upload performance is consistently better than that of the native API.

[Figure 15 plots throughput (Gbps) vs. concurrency for Conn-Local, Conn-Cloud, and API.]

Fig. 15. Transfer (upload to and download from Google-Cloud) performance as a function of concurrency.

In the download case, since the VM's network egress bandwidth is only 4 Gbps, the comparison after 4 Gbps is reached (there are protocol overheads in practice) does not make sense. However, in those experiments that are not limited by network bandwidth (i.e., when concurrency is less than 5), the Connector clearly performs better than the API, with the cloud-placed Connector (Conn-cloud) performing better than the locally placed connector (Conn-local). These results are in line with our regression analysis in §5.3.3.

6.4 Ceph

Depending on resource availability and the deployment of the Ceph storage, and similar to AWS-S3 and Google-Cloud, the Ceph-Connector can be deployed in one of two ways: (1) on a local DTN or (2) near the Ceph storage. Here we conduct experiments to benchmark the throughput of the Ceph-Connector and compare it with native APIs. Our Ceph is deployed on a bare-metal node at the University of Chicago site of the NSF Chameleon cloud [31]. We deployed the Ceph-Connector in two locations: adjacent to the Ceph storage in Chicago, and in Texas at the TACC site of the NSF Chameleon cloud. Since the data channel of the Ceph-Connector uses the S3 protocol, we compared it against using boto3


Chameleon cloud. Since the data channel of Ceph-Connector uses the S3 protocol, we compared it against using boto3

to access Ceph. Figure 16 compares the performance of the two Ceph-Connector deployments with that of the nativeAPI (i.e., boto3). Ceph-Connector always get the best performance when deployed near the Ceph system, thanks tothe optimized data movement over WAN delivered by GridFTP.

[Figure 16 plots throughput (Gbps) vs. concurrency for Conn-Local, Conn-Closeby, and API.]

Fig. 16. Transfer (upload to and download from Ceph) performance as a function of concurrency.

6.5 Inter-cloud Transfers

The ability to transfer files directly from one cloud store to another, instead of downloading and re-uploading files to and from an intermediate point, such as a user's workstation, can be a major boost to researcher productivity. In addition to increasing performance, the fire-and-forget nature of third-party transfer increases reliability and eliminates the need to keep an intermediate node running for the duration of the transfer from one cloud to another.

6.5.1 Connector cross-cloud performance. Globus logs show that moving data between AWS-S3 and Google-Cloud is a common use case. We first evaluated performance for moving data between cloud providers. Figure 17 shows performance vs. concurrency when moving data between AWS-S3 and Google-Cloud using Connectors. Since there is no straightforward or automated way to do cross-cloud transfers using the native cloud storage APIs, we benchmark the performance of the Connector alone to determine best practice for cross-cloud transfers (more on best practices in §8). An iperf3 network speed test with 16 parallel TCP streams between the AWS VM and the Google Cloud VM achieved about 4.5 Gbps in each direction. Thus, as shown in Figure 17a, the Connectors can reach peak throughput when deployed at the cloud providers. If deployed locally, however, they achieve only about half of that performance, a reduction that we attribute to network connectivity among AWS, Google Cloud, and the local DTN.

6.5.2 Connector comparison. MultCloud [40], like Globus, supports data movement across cloud storage services, including Google-Drive, box.com, and AWS-S3. In comparing performance, we used our analysis of file size characteristics [37] to select a test dataset of 50 files totaling 1 GB. Since the free trial version of MultCloud only supports transferring files one by one, we also set concurrency to one for the Globus Connector in this experiment. We used a local DTN to run the corresponding Connector for the experiment, although a cloud-based DTN gives better performance. We see in Figure 18 that the Connector outperforms MultCloud in all cases.


[Figure 17 plots throughput (Gbps) vs. concurrency for AWS-to-Google and Google-to-AWS transfers, via an in-cloud DTN and via a local DTN.]

Fig. 17. Transfer performance between AWS-S3 and Google-Cloud vs. concurrency, for local and in-cloud DTN.

7 INTEGRITY CHECKING

It is good practice to perform integrity checking on files transferred over wide area networks because factors such as faulty routers and file systems can cause silent data corruption [16, 37, 47]. The 16-bit TCP checksum is inadequate to catch network transmission errors, and other errors can occur when accessing storage. Indeed, a recent study [37] reported at least one checksum failure per 1.26 TB moved from storage to storage over a wide area network. While this number is likely an overestimate, as it does not distinguish between data corruption and cases in which a file is modified while a transfer is in progress, it emphasizes the importance of integrity checking.

The Connector abstraction interface supports integrity checking via GridFTP [7]. A client can verify transmission integrity by having a file read and a checksum computed at the source before transmission, and the file then reread and a second checksum computed at the destination. This "strong integrity checking" approach has the advantage that it can detect not only errors incurred during data transport over the network but also errors incurred while writing data. However, the additional read operations can impact performance, particularly if a Connector is located remotely from the cloud storage. Given the wide variety of storage systems, Connector placement strategies, and transfer workloads,


[Figure 18 plots throughput (Mbps) for the Connector and MultCloud across transfer directions: GDr to AWS, GDr to Box, AWS to Box, AWS to GDr, Box to GDr, and Box to AWS.]

Fig. 18. Throughput comparison: MultCloud vs. Globus.

we cannot provide a complete analysis of integrity checking costs. However, we present some relevant results for high-throughput storage systems (where even a small integrity checking overhead can have a significant influence) in Figure 19–Figure 21 for Wasabi, AWS-S3, and Google-Cloud, respectively. In each case, the Connector is located on a computer at our institution (Argonne), and the transfer involves c files of 300 MB each, where c is the concurrency. As one can see, transfer rates are lower when integrity checking is enabled, but not remarkably so, given that each file is being reread over the wide area network after writing.
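The essence of strong integrity checking is to compare a checksum computed at the source before the transfer with one computed by rereading the data at the destination after it has been written; a minimal sketch (the algorithm choice and paths are illustrative, and Globus uses its own checksum machinery):

import hashlib

def file_checksum(path, algorithm="sha256", chunk_size=4 * 1024 * 1024):
    # Stream the file in chunks so arbitrarily large files can be checksummed.
    digest = hashlib.new(algorithm)
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

# Compare the source checksum with one computed by rereading the destination copy.
# if file_checksum("/src/data.bin") != file_checksum("/dst/data.bin"):
#     raise IOError("integrity check failed; retransfer the file")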

[Figure 19 plots throughput (Gbps) vs. concurrency with integrity checking ON and OFF.]

Fig. 19. Transfer (upload to Wasabi) performance, with integrity checking ON versus OFF.

8 BEST PRACTICE

The GridFTP-based Globus transfer service has been heavily optimized for moving data over wide area networks [7, 32]. Here we provide recommendations for Connector deployment when aiming either to maximize throughput or to minimize costs.


Fig. 20. Transfer (upload to AWS-S3) performance, with integrity checking ON versus OFF. [Throughput (Gbps) vs. concurrency (1–11).]

Fig. 21. Transfer (upload to Google-Cloud) performance, with integrity checking ON versus OFF. [Throughput (Gbps) vs. concurrency (1–11).]

8.1 Throughput Maximization

Best practice for throughput maximization when moving data to and from cloud storage is to deploy the corresponding Connector near the cloud storage service: for example, using a Google Cloud compute instance as a DTN to run one or more GoogleCloud-Connectors, and using AWS EC2 instance(s) to run one or more AWS-Connectors. Moreover, the benchmark experiments in §6.5.1 show that for inter-cloud transfers, this configuration (deploying the Connector near the cloud storage) can achieve a 100% improvement in throughput compared with a configuration in which the Connector is deployed locally at the user's site (or at another location far from the cloud storage). The transfer throughput achievable in these two cases depends on the size (in terms of vCPUs and memory) of the allocated instance(s). We have found that two vCPUs and 4 GB of memory are sufficient to saturate a 10 Gbps network. To achieve high performance at reduced cost, these cloud-hosted DTN instances can adopt an elastic resource allocation approach, increasing the resources allocated to the Connector when demand is high and reducing them at other times [17].

We note that such cloud-hosted Connectors can be shared by several science institutions that use the same federated authentication mechanism, such as XSEDE [50].


8.2 Cost Minimization

An alternative deployment approach is to run Connectors on computers hosted at science institutions. This approach does not require any additional hardware, but it means that all accesses to cloud storage involve wide area data transfers using the cloud provider's protocols. The results presented earlier in this paper suggest that this approach incurs little performance loss for datasets with large files but significant performance loss for datasets with many small files.

Performance-cost calculations may differ when integrity checking is enabled, as discussed in §7. Since Connector integrity checking involves rereading a file after it is written, and since cloud storage providers usually charge for network usage when data is moved out of the cloud, it is advantageous, when integrity checking is enabled, to deploy a cloud storage Connector in the same cloud as the storage.
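As a rough worked example, assume an egress price of $0.09/GB (an assumed, illustrative figure; actual prices vary by provider, region, and tier) and an upload of a 1 TB dataset to cloud object storage with strong integrity checking enabled. If the Connector performing the post-write reread runs outside the cloud, that reread alone incurs roughly 1000 GB x $0.09/GB, or about $90, of egress charges on top of the transfer itself; if the Connector runs in the same cloud as the storage, the reread stays within the provider's network and incurs essentially no egress cost.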

9 RELATED WORK

Others have developed implementations of the Globus GridFTP DSI. EUDAT [51] implemented a DSI [2] for the Integrated Rule-Oriented Data System (iRODS) [43] data management software. Sanchez et al. [46] proposed a parallel DSI for GridFTP and offered an implementation for the MAPFS [42] parallel file system. A DSI implementation for OpenStack Object Storage (Swift) is also available [39]. However, no DSI implementation targets cloud stores, and none provide performance evaluations.

Others have developed uniform interfaces to cloud storage, but by supporting multiple protocols in a client rather than in a Connector as proposed here. We described MultCloud in §6.5.2. Rclone [44] is a command line program that offers an rsync-like tool to synchronize files with cloud storage. It integrates APIs for various cloud stores but does not provide transfer management functionality. iRODS implements an interface to AWS S3 [52].

As cloud computing has become the de facto standard for big data processing, Abramson et al. [6] proposed the Metropolitan Data Caching Infrastructure (MeDiCI) architecture to simplify the movement of data between different clouds and a centralized storage site. The scenario is similar to the one we evaluated in §6.5.1, but MeDiCI is cache-based and targets on-demand cloud computing.

Liu et al. [34] used regression analysis to measure per-file overhead indirectly and concluded that the bottleneck in transferring many small files between HPC facilities is not any single subsystem but rather the per-file overheads introduced by the major components in wide area file transfers. Deelman et al. [19] have developed similar models. The benefits of parallel streams for transfer performance are well known [28]. Several researchers have studied the impact of concurrency, parallelism, and other parameters on GridFTP transfer performance [9–11, 36, 57], for example, based on historical data [10] or lightweight probing [9].

10 CONCLUSION

In this paper, we described an architecture, interfaces, and implementation methods for unifying the interface to a wide range of storage systems. This architecture enables the plug-and-play integration of storage Connectors for different storage systems, simplifying both the use of different storage systems and the development of new Connectors. Integration of these Connectors with the Globus data transfer service enables data movement across various storage systems in a "fire-and-forget" fashion. We described Connector implementations for a range of storage types, from POSIX file systems to HPC parallel file systems and cloud object stores. We used a performance-model-based analysis to evaluate Connector implementations and used both experiments and analysis to draw conclusions about implications for design and implementation. The proposed performance evaluation method can also be used to inspect and explain the performance of other file transfer services. We conclude that the Connector model enables effective use of distributed storage with only modest performance loss relative to native access in most cases, and performance improvements in other cases, due to the optimization of data movement over wide area networks delivered by the open source GridFTP.

ACKNOWLEDGMENTS

This material was based upon work supported by the U.S. Department of Energy, Office of Science, under contract DE-AC02-06CH11357.

REFERENCES
[1] [n.d.]. AWS SDK for Python (Boto3). https://aws.amazon.com/sdk-for-python. Accessed April 1, 2020.
[2] [n.d.]. B2STAGE-GridFTP (iRODS-DSI). https://github.com/EUDAT-B2STAGE/B2STAGE-GridFTP. Accessed April 1, 2020.
[3] [n.d.]. GridFTP-DSI-for-HPSS. https://github.com/JasonAlt/GridFTP-DSI-for-HPSS. Accessed April 1, 2020.
[4] [n.d.]. Introducing the Box SDK. http://opensource.box.com/box-python-sdk. Accessed April 1, 2020.
[5] [n.d.]. StoRM GridFTP DSI. https://github.com/italiangrid/storm-gridftp-dsi. Accessed April 1, 2020.
[6] David Abramson, Jake Carroll, Chao Jin, Michael Mallon, Zane van Iperen, Hoang Nguyen, Allan McRae, and Liang Ming. 2019. A Cache-Based Data Movement Infrastructure for On-demand Scientific Cloud Computing. In Supercomputing Frontiers, David Abramson and Bronis R. de Supinski (Eds.). Springer International Publishing, Cham, 38–56.
[7] William Allcock, John Bresnahan, Rajkumar Kettimuthu, Michael Link, Catalin Dumitrescu, Ioan Raicu, and Ian Foster. 2005. The Globus Striped GridFTP Framework and Server. In ACM/IEEE Conference on Supercomputing (SC '05). IEEE Computer Society, Washington, DC, USA, 54. https://doi.org/10.1109/SC.2005.72
[8] I. Altintas, C. Berkley, E. Jaeger, M. Jones, B. Ludascher, and S. Mock. 2004. Kepler: An extensible system for design and execution of scientific workflows. In 16th International Conference on Scientific and Statistical Database Management. 423–424. https://doi.org/10.1109/SSDM.2004.1311241
[9] Engin Arslan, Kemal Guner, and Tevfik Kosar. 2016. HARP: Predictive transfer optimization based on historical analysis and real-time probing. In SC'16: Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis. IEEE, 288–299.
[10] Engin Arslan and Tevfik Kosar. 2018. High-speed transfer optimization based on historical analysis and real-time tuning. IEEE Transactions on Parallel and Distributed Systems 29, 6 (2018), 1303–1316.
[11] Engin Arslan, Bahadir A Pehlivan, and Tevfik Kosar. 2018. Big data transfer optimization through adaptive parameter tuning. J. Parallel and Distrib. Comput. 120 (2018), 89–100.
[12] Yadu Babuji, Anna Woodard, Zhuozhao Li, Daniel S. Katz, Ben Clifford, Rohan Kumar, Lukasz Lacinski, Ryan Chard, Justin M. Wozniak, Ian Foster, Michael Wilde, and Kyle Chard. 2019. Parsl: Pervasive Parallel Programming in Python. In 28th ACM International Symposium on High-Performance Parallel and Distributed Computing. https://doi.org/10.1145/3307681.3325400
[13] Jacob Benesty, Jingdong Chen, Yiteng Huang, and Israel Cohen. 2009. Pearson correlation coefficient. In Noise Reduction in Speech Processing. Springer, 1–4.
[14] Kyle Chard, Ian Foster, and Steven Tuecke. 2017. Globus: Research Data Management as Service and Platform. In Proceedings of the Practice and Experience in Advanced Research Computing 2017 on Sustainability, Success and Impact (New Orleans, LA, USA) (PEARC17). Association for Computing Machinery, New York, NY, USA, Article 26, 5 pages. https://doi.org/10.1145/3093338.3093367
[15] Kyle Chard, Steven Tuecke, and Ian Foster. 2016. Globus: Recent enhancements and future plans. In XSEDE16 Conference on Diversity, Big Data, and Science at Scale. ACM, 27.
[16] Batyr Charyyev, Ahmed Alhussen, Hemanta Sapkota, Eric Pouyoul, Mehmet H Gunes, and Engin Arslan. 2019. Towards securing data transfers against silent data corruption. In IEEE/ACM International Symposium in Cluster, Cloud, and Grid Computing.
[17] Joaquin Chung, Zhengchun Liu, Rajkumar Kettimuthu, and Ian Foster. 2019. Toward an Elastic Data Transfer Infrastructure. In 15th International Conference on eScience. 262–265. https://doi.org/10.1109/eScience.2019.00036
[18] HPSS Collaboration. [n.d.]. High Performance Storage System. http://www.hpss-collaboration.org/ Accessed June 1, 2020.
[19] Ewa Deelman, Christopher Carothers, Anirban Mandal, Brian Tierney, Jeffrey S Vetter, Ilya Baldin, Claris Castillo, Gideon Juve, Dariusz Krol, Vickie Lynch, Ben Mayer, Jeremy Meredith, Thomas Proffen, Paul Ruth, and Rafael Ferreira da Silva. 2017. PANORAMA: An approach to performance modeling and diagnosis of extreme-scale workflows. International Journal of High Performance Computing Applications 31, 1 (2017), 4–18.
[20] Ewa Deelman, Karan Vahi, Gideon Juve, Mats Rynge, Scott Callaghan, Philip J Maechling, Rajiv Mayani, Weiwei Chen, Rafael Ferreira Da Silva, Miron Livny, and Kent Wenger. 2015. Pegasus, a workflow management system for science automation. Future Generation Computer Systems 46 (2015), 17–35.
[21] Alvise Dorigo, Peter Elmer, Fabrizio Furano, and Andrew Hanushevsky. 2005. XROOTD: A highly scalable architecture for data access. WSEAS Transactions on Computers 1, 4.3 (2005), 348–353. https://xrootd.slac.stanford.edu/ Accessed June 1, 2020.
[22] Alvise Dorigo, Peter Elmer, Fabrizio Furano, and Andrew Hanushevsky. 2005. XROOTD: A highly scalable architecture for data access. WSEAS Transactions on Computers 1, 4.3 (2005), 348–353.
[23] Editorial. 2018. Data sharing and the future of science. Nature Communications 9, 1 (19 Jul 2018), 2817. https://doi.org/10.1038/s41467-018-05227-z
[24] Mike Folk, Albert Cheng, and Kim Yates. 1999. HDF5: A file format and I/O library for high performance computing applications. In Supercomputing, Vol. 99. 5–33.
[25] Jeremy Goecks, Anton Nekrutenko, James Taylor, and Galaxy Team. 2010. Galaxy: A comprehensive approach for supporting accessible, reproducible, and transparent computational research in the life sciences. Genome Biology 11, 8 (2010), R86.
[26] Google. [n.d.]. Cloud Application Programming Interface. https://cloud.google.com/apis Accessed June 1, 2020.
[27] Google. [n.d.]. G Suite. https://gsuite.google.com Accessed June 1, 2020.
[28] Thomas J Hacker, Brian D Noble, and Brian D Athey. 2004. Improving throughput and maintaining fairness using parallel TCP. In IEEE INFOCOM 2004, Vol. 4. IEEE, 2480–2489.
[29] iRODS. [n.d.]. The Integrated Rule-Oriented Data System (iRODS). https://irods.org/ Accessed June 1, 2020.
[30] Jianwei Li, Wei-keng Liao, A. Choudhary, R. Ross, R. Thakur, W. Gropp, R. Latham, A. Siegel, B. Gallagher, and M. Zingale. 2003. Parallel netCDF: A High-Performance Scientific I/O Interface. In ACM/IEEE Conference on Supercomputing. 39–39. https://doi.org/10.1109/SC.2003.10053
[31] Kate Keahey, Pierre Riteau, Dan Stanzione, Tim Cockerill, Joe Mambretti, Paul Rad, and Paul Ruth. 2019. Chameleon: A Scalable Production Testbed for Computer Science Research. In Contemporary High Performance Computing: From Petascale toward Exascale (1 ed.), Jeffrey Vetter (Ed.). Chapman & Hall/CRC Computational Science, Vol. 3. CRC Press, Boca Raton, FL, Chapter 5, 123–148.
[32] Rajkumar Kettimuthu, Zhengchun Liu, David Wheeler, Ian Foster, Katrin Heitmann, and Franck Cappello. 2018. Transferring a petabyte in a day. Future Generation Computer Systems 88 (2018), 191–198. https://doi.org/10.1016/j.future.2018.05.051
[33] Qing Liu, Jeremy Logan, Yuan Tian, Hasan Abbasi, Norbert Podhorszki, Jong Youl Choi, Scott Klasky, Roselyne Tchoua, Jay Lofstead, Ron Oldfield, Manish Parashar, Nagiza Samatova, Karsten Schwan, Arie Shoshani, Matthew Wolf, Kesheng Wu, and Weikuan Yu. 2014. Hello ADIOS: The challenges and lessons of developing leadership class I/O frameworks. Concurrency and Computation: Practice and Experience 26, 7 (2014), 1453–1473.
[34] Yuanlai Liu, Zhengchun Liu, Rajkumar Kettimuthu, Nageswara Rao, Zizhong Chen, and Ian Foster. 2019. Data Transfer between Scientific Facilities: Bottleneck Analysis, Insights and Optimizations. In 19th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing. 122–131. https://doi.org/10.1109/CCGRID.2019.00023
[35] Zhengchun Liu, Prasanna Balaprakash, Rajkumar Kettimuthu, and Ian Foster. 2017. Explaining Wide Area Data Transfer Performance. In 26th International Symposium on High-Performance Parallel and Distributed Computing (Washington, DC, USA) (HPDC'17). ACM, New York, NY, USA, 167–178. https://doi.org/10.1145/3078597.3078605
[36] Zhengchun Liu, Rajkumar Kettimuthu, Ian Foster, and Peter H. Beckman. 2018. Toward a smart data transfer node. Future Generation Computer Systems 89 (2018), 10–18. https://doi.org/10.1016/j.future.2018.06.033
[37] Zhengchun Liu, Rajkumar Kettimuthu, Ian Foster, and Nageswara S. V. Rao. 2018. Cross-geography Scientific Data Transferring Trends and Behavior. In 27th International Symposium on High-Performance Parallel and Distributed Computing (Tempe, Arizona) (HPDC'18). ACM, New York, NY, USA, 267–278. https://doi.org/10.1145/3208040.3208053
[38] Zhengchun Liu, Ryan Lewis, Rajkumar Kettimuthu, Kevin Harms, Philip Carns, Nageswara Rao, Ian Foster, and Michael Papka. 2020. Characterization and Identification of HPC Applications at a Leadership Computing Facility. In 34th ACM International Conference on Supercomputing. https://doi.org/10.1145/3392717.3392774
[39] Richard Moore. 2013. Data Services for Campus Researchers. https://bit.ly/2XYGKbK.
[40] MultCloud. [n.d.]. Multiple Cloud Storage Manager. https://www.multcloud.com/ Accessed June 1, 2020.
[41] Irene V Pasquetto, Bernadette M Randles, and Christine L Borgman. 2017. On the reuse of scientific data. Data Science Journal (2017). https://doi.org/10.5334/dsj-2017-008
[42] María S. Pérez, Jesús Carretero, Félix García, José M. Peña, and Víctor Robles. 2006. MAPFS: A flexible multiagent parallel file system for clusters. Future Generation Computer Systems 22, 5 (2006), 620–632. https://doi.org/10.1016/j.future.2005.09.006
[43] Arcot Rajasekar, Reagan Moore, Chien-yi Hou, Christopher A Lee, Richard Marciano, Antoine de Torcy, Michael Wan, Wayne Schroeder, Sheau-Yen Chen, Lucas Gilbert, Chien-Yi Hou, Christopher A. Lee, Richard Marciano, Paul Tooby, Antoine de Torcy, and Bing Zhu. 2010. iRODS primer: Integrated rule-oriented data system. Synthesis Lectures on Information Concepts, Retrieval, and Services 2, 1 (2010), 1–143.
[44] Rclone. [n.d.]. Rclone - rsync for cloud storage. https://rclone.org/ Accessed June 1, 2020.
[45] Robert Ross, Lee Ward, Philip Carns, Gary Grider, Scott Klasky, Quincey Koziol, Glenn K Lockwood, Kathryn Mohror, Bradley Settlemyer, and Matthew Wolf. 2019. Storage systems and I/O: Organizing, storing, and accessing data for scientific discovery. Technical Report. US-DOE Office of Science.
[46] Alberto Sánchez, María S Pérez, Pierre Gueant, Jesús Montes, and Pilar Herrero. 2006. A parallel data storage interface to GridFTP. In OTM Confederated International Conferences On the Move to Meaningful Internet Systems. Springer, 1203–1212.
[47] Jonathan Stone and Craig Partridge. 2000. When the CRC and TCP checksum disagree. ACM SIGCOMM Computer Communication Review 30, 4 (2000), 309–319.
[48] Yu Su, Yi Wang, Gagan Agrawal, and Rajkumar Kettimuthu. 2013. SDQuery DSI: Integrating data management support with a wide area data transfer protocol. In International Conference on High Performance Computing, Networking, Storage and Analysis. 1–12.
[49] Rajeev Thakur, William Gropp, and Ewing Lusk. 1999. On implementing MPI-IO portably and with high performance. In 6th Workshop on I/O in Parallel and Distributed Systems. 23–32.
[50] John Towns, Timothy Cockerill, Maytal Dahan, Ian Foster, Kelly Gaither, Andrew Grimshaw, Victor Hazlewood, Scott Lathrop, Dave Lifka, Gregory D Peterson, Ralph Roskies, J. Ray Scott, and Nancy Wilkins-Diehr. 2014. XSEDE: Accelerating scientific discovery. Computing in Science & Engineering 16, 5 (2014), 62–74.
[51] Marie van de Sanden, Christine Staiger, Claudio Cacciari, Roberto Mucci, Carl Johan Hakansson, Adil Hasan, Stephane Coutin, Hannes Thiemann, Benedikt von St Vieth, and Jens Jensen. 2015. D5.3: Final Report on EUDAT Services. Technical Report. EUDAT.
[52] M Wan, R Moore, and A Rajasekar. 2009. Integration of cloud storage with data grids. In 3rd International Conference on the Virtual Computing Initiative.
[53] Wasabi. [n.d.]. Cloud Object Storage by Wasabi. https://wasabi.com Accessed June 1, 2020.
[54] Sage A Weil, Scott A Brandt, Ethan L Miller, Darrell DE Long, and Carlos Maltzahn. 2006. Ceph: A scalable, high-performance distributed file system. In 7th Symposium on Operating Systems Design and Implementation. 307–320.
[55] Mark D. Wilkinson, Michel Dumontier, IJsbrand Jan Aalbersberg, Gabrielle Appleton, Myles Axton, Arie Baak, Niklas Blomberg, Jan Willem Boiten, Luiz Bonino da Silva Santos, Philip E. Bourne, Jildau Bouwman, Anthony J. Brookes, Tim Clark, Merce Crosas, Ingrid Dillo, Olivier Dumon, Scott Edmunds, Chris T. Evelo, Richard Finkers, Alejandra Gonzalez-Beltran, Alasdair J.G. Gray, Paul Groth, Carole Goble, Jeffrey S. Grethe, Jaap Heringa, Peter A C 't Hoen, Rob Hooft, Tobias Kuhn, Ruben Kok, Joost Kok, Scott J. Lusher, Maryann E Martone, Albert Mons, Abel L. Packer, Bengt Persson, Philippe Rocca-Serra, Marco Roos, Rene van Schaik, Susanna Assunta Sansone, Erik Schultes, Thierry Sengstag, Ted Slater, George Strawn, Morris A. Swertz, Mark Thompson, Johan Van Der Lei, Erik Van Mulligen, Jan Velterop, Andra Waagmeester, Peter Wittenburg, Katherine Wolstencroft, Jun Zhao, and Barend Mons. 2016. The FAIR Guiding Principles for scientific data management and stewardship. Scientific Data 3 (2016). https://doi.org/10.1038/sdata.2016.18
[56] Justin M Wozniak, Timothy G Armstrong, Michael Wilde, Daniel S Katz, Ewing Lusk, and Ian T Foster. 2013. Swift/T: Large-scale application composition via distributed-memory dataflow processing. In 13th IEEE/ACM International Symposium on Cluster, Cloud, and Grid Computing. IEEE, 95–102.
[57] Esma Yildirim, Engin Arslan, Jangyoung Kim, and Tevfik Kosar. 2015. Application-level optimization of big data transfers through pipelining, parallelism and concurrency. IEEE Transactions on Cloud Computing 4, 1 (2015), 63–75.

GOVERNMENT LICENSE

The submitted manuscript has been created by UChicago Argonne, LLC, Operator of Argonne National Laboratory ("Argonne"). Argonne, a U.S. Department of Energy Office of Science laboratory, is operated under Contract No. DE-AC02-06CH11357. The U.S. Government retains for itself, and others acting on its behalf, a paid-up nonexclusive, irrevocable worldwide license in said article to reproduce, prepare derivative works, distribute copies to the public, and perform publicly and display publicly, by or on behalf of the Government. The Department of Energy will provide public access to these results of federally sponsored research in accordance with the DOE Public Access Plan. http://energy.gov/downloads/doe-public-access-plan.


