Post on 15-Mar-2020

www.cineca.it

Data transfer at CINECA: how to enjoy it!

Giacomo Mariani

g.mariani@cineca.it

Bologna, 03/05/2011

02/05/11www.cineca.it 2

Table of contents

• Resources: what we have

• Use cases: what users need

• Tools: how we support them

Resources

File system

• HPC machines at CINECA (PLX and SP6) have a similar file system organization:

$HOME: permanent, backed up, local, 2 GB;

$CINECA_DATA: permanent, not backed up, shared (GPFS), quite slow;

$CINECA_SCRATCH: temporary, local (20-day retention), very fast;

cart (from "cartuccia", Italian for cartridge): backs up your data on tape.

Network

• Each HPC machine is a cluster: many nodes connected through a high-performance network (InfiniBand).

• Each cluster is connected to the others through a private network.

• The clusters are reachable from the public network through GARR (1 Gb/s).

• DEISA2/PRACE projects have a dedicated private network with 10 Gb/s of guaranteed bandwidth.
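
As a rough illustration of what these bandwidths mean in practice, the following sketch estimates the ideal transfer time of a file over the two links. The function name transfer_seconds is hypothetical, and the figures ignore protocol overhead, disk speed and link sharing, so real transfers will be slower.

```shell
#!/bin/bash
# Ideal transfer time in seconds for a file of $1 gigabytes
# over a link of $2 gigabits per second (1 byte = 8 bits).
# Hypothetical helper: it ignores protocol overhead and contention.
transfer_seconds() {
    echo $(( $1 * 8 / $2 ))
}

# A 150 GB file over the public GARR link (1 Gb/s)
# and over the dedicated DEISA2/PRACE network (10 Gb/s).
echo "GARR:         $(transfer_seconds 150 1) s"
echo "DEISA2/PRACE: $(transfer_seconds 150 10) s"
```

Under these assumptions the same 150 GB file needs about 20 minutes at 1 Gb/s and about 2 minutes at 10 Gb/s.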

Use Cases

A: Move data from user workstations to CINECA file systems and vice versa.

B: Move data across file systems on a single HPC machine.

C: Move data across different machines.

D: Archive data.

Tools

cp/mv/scp/sftp

• cp and mv are local commands.

• sftp and scp are secure methods for transferring data between computers on a network: SFTP stands for "SSH File Transfer Protocol" (often expanded as "Secure File Transfer Protocol"); SCP stands for "Secure CoPy". The data are encrypted before being sent across the Internet.

• The SFTP and SCP commands allow you to connect from a local machine to a remote machine in order to transfer data.

• The machine on which you run the client software is the local system.

• The machine on the other end of your data transfer session (the machine running the server software) is the remote machine.

cp/mv/scp/sftp: use cases

A: user@ws$ scp $HOME/path/to/file user@sp.sp6.cineca.it:path/to/

B: user@sp6$ cp $HOME/path/to/file $CINECA_SCRATCH/path/to/

   user@sp6$ mv $CINECA_SCRATCH/path/to/file $CINECA_DATA/path/to/

rsync

• rsync is a software tool for Unix and Windows systems which synchronizes files and directories from one location to another while optimizing data transfer.

• rsync tries to make the best use of the available network bandwidth.

• rsync can list or copy directory contents and copy files, optionally with compression and recursion.

• rsync was originally written as a replacement for rcp and scp.

• Like its predecessors, it requires a source and a destination to be specified; one of them may be remote.

• rsync does not overwrite unmodified files: by default, files whose size and modification time already match at the destination are skipped.
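
This behaviour can be checked locally with a small sketch that runs the same rsync command twice on a throwaway directory. The function name count_transferred and the directory and file names are hypothetical; the sketch assumes that rsync and mktemp are installed on the machine where it runs, and it does nothing if rsync is missing.

```shell
#!/bin/bash
# Count how many regular files rsync sends from $1/ to $2/.
# --itemize-changes prints one line per change; lines starting
# with ">f" are files transferred to the destination.
count_transferred() {
    rsync -a --itemize-changes "$1/" "$2/" | grep -c '^>f'
}

if command -v rsync >/dev/null 2>&1; then
    work=$(mktemp -d)                  # throwaway working directory
    mkdir "$work/src" "$work/dst"
    echo "some data" > "$work/src/file.txt"

    first=$(count_transferred "$work/src" "$work/dst")
    second=$(count_transferred "$work/src" "$work/dst")
    echo "first run:  $first file(s) transferred"
    echo "second run: $second file(s) transferred"
    rm -rf "$work"
fi
```

The second run should find the file already up to date and transfer nothing.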

rsync: use cases

A: user@ws$ rsync -va $HOME/path/to/dir user@sp6:$CINECA_DATA/path/to/

B: user@sp6$ module load rsync

   user@sp6$ rsync --timeout=600 -avHS -r --numeric-ids --bwlimit=80000 --block-size=1048576 --progress $CINECA_SCRATCH/path/file $CINECA_DATA/path/

C: user@sp6$ rsync -va $CINECA_DATA/path/to/directory user@remote-hpc:$HPC_DATA/path/to/

What is GridFTP

GridFTP is a protocol which extends established technologies like FTP (File Transfer Protocol) and SCP (Secure CoPy) with the following improvements:

• Authentication via GSI

• Multiple parallel channels with streams and stripes

• Third-party transfers

• Tunability for network and I/O parameters

How do I use GridFTP?

How do I use globus-url-copy?

globus-url-copy, the main GridFTP client, is used like a normal FTP client. It may only be necessary to define some variables.

Transferring:

user$ globus-url-copy sshftp://<username>@sp.sp6.cineca.it/<remote_path/to/yourfile> file:///home/user/<local_path/to/yourfile>

user$ globus-url-copy gsiftp://grid.cineca.it/<remote_path/to/yourfile> file:///home/user/<local_path/to/yourfile>

How do I authenticate with globus-url-copy?

globus-url-copy, the main GridFTP client, can handle various authentication mechanisms.

The instances at CINECA support only two of them:

• username-password authentication (optionally with RSA keys)

• GSI authentication

Useful options

• -p <number>: number of streams, i.e. parallel TCP channels.

• -stripe: enable striped transfer.

• -tcp-bs <size>: size (in bytes) of the TCP buffer.

• -pp: allow pipelining.

• -list: used with one argument, lists the given directory.

• -v: verbose output.

• -dbg: very verbose output.
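
To show how these options combine, the sketch below assembles a globus-url-copy command line and prints it instead of running it, so it can be tried on a machine without the Globus tools. The function name build_guc_cmd, the default values and the paths are hypothetical; the option names are the ones listed above.

```shell
#!/bin/bash
# Print (do not execute) a globus-url-copy command line.
# Arguments: source URL, destination URL, number of parallel
# streams (default 4), TCP buffer size in bytes (default 1048576).
# The defaults are illustrative assumptions, not CINECA settings.
build_guc_cmd() {
    src=$1
    dst=$2
    streams=${3:-4}
    tcp_bs=${4:-1048576}
    echo "globus-url-copy -v -p $streams -tcp-bs $tcp_bs $src $dst"
}

# Example: download one file with 8 parallel streams.
build_guc_cmd gsiftp://grid.cineca.it/path/to/file \
              file:///home/user/path/to/file 8
```

Printing the command first makes it easy to review the chosen values before pasting the line into a session where globus-url-copy is available.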

Third Party Transfer with Stripe

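[Diagram: a GridFTP client drives a transfer between GridFTP Server1 and GridFTP Server2. Each server has a front end (FE) and several back ends (BEn); the labels on the links are PUT/SPUT and GET/SGET.]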

Clients: command line

UberFTP: the most "official" client; it improves on globus-url-copy by making it more interactive:

http://dims.ncsa.illinois.edu/set/uberftp

Interactive commands

This listing is generated by typing 'help' at the command prompt.

!         ?         active    ascii     binary
blksize   bugs      bye       cat       cd
cdup      chgrp     chmod     cksum     close
dcau      debug     dir       family    get
glob      hash      help      keepalive lcat
lcd       lcdup     lchgrp    lchmod    lclose
ldir      lls       lmkdir    lopen     lpwd
lquote    lrename   lrm       lrmdir    ls
lsize     lstage    mget      mkdir     mode
mput      open      order     parallel  passive
pbsz      pget      pput      prot      put
pwd       quit      quote     rename    resume
retry     rm        rmdir     runique   size
stage     sunique   tcpbuf    versions  wait

Clients: Java

Java CoG Kit Project Webstart: File Transfer GUI.

http://www-unix.globus.org/cog/demo/

Java-Eclipse: GridFTP Client.

http://bi.offis.de/gridftp/index.html

GridFTP: use cases

A: user@ws$ globus-url-copy /path/to/file sshftp://user@remote-hpc/path/to/

C: user@sp6$ globus-url-copy -pp -restart gsiftp://grid.cineca.it/path/to/dir sshftp://user@remote-hpc/path/to/

   user@sp6$ globus-url-copy -p 8 -restart gsiftp://grid.cineca.it/path/to/file sshftp://user@remote-hpc/path/to/

Usage example

Examples:

user@bsd> globus-url-copy -list \
    sshftp://user@sp.sp6.cineca.it/
user@bsd> globus-url-copy -list \
    gsiftp://grid.cineca.it/
user@bsd> grid-proxy-init
user@bsd> globus-url-copy -list \
    gsiftp://grid.cineca.it/
user@bsd> uberftp
UberFTP> help
UberFTP> open grid.cineca.it
UberFTP> ls; pwd

Usage example

Examples:

user@bsd> ssh -Y user@astrct
user@astrct:~> grid-proxy-init
user@astrct:~> globus-url-copy -vb -len 50M \
    /dev/zero gsiftp://grid.cineca.it/dev/null
user@astrct:~> globus-url-copy -vb -p 6 \
    -len 50M /dev/zero \
    gsiftp://grid.cineca.it/dev/null

Already available at CINECA

• A public installation (without the -stripe option), available to both CINECA and DEISA users, which is reachable with:

  SSH authentication at sshftp://user@sp.sp6.cineca.it

  GSI authentication at gsiftp://grid.cineca.it

• A DEISA-users-only installation (with the -stripe option), reachable with GSI authentication only at gsiftp://grid-deisa.sp6.cineca.it

All of them are on SP6 and let users read and write file systems according to their permissions.
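
The choice of endpoint can be summarized in a small lookup. This is only a sketch: the function name gridftp_endpoint and the keywords ssh, gsi and gsi-stripe are made up for illustration, while the URLs are the ones listed above.

```shell
#!/bin/bash
# Print the GridFTP endpoint for a given authentication method.
# $1: ssh | gsi | gsi-stripe (hypothetical keywords)
# $2: user name, only used for the SSH endpoint
gridftp_endpoint() {
    case $1 in
        ssh)        echo "sshftp://$2@sp.sp6.cineca.it" ;;
        gsi)        echo "gsiftp://grid.cineca.it" ;;
        gsi-stripe) echo "gsiftp://grid-deisa.sp6.cineca.it" ;;
        *)          echo "unknown authentication method: $1" >&2
                    return 1 ;;
    esac
}

gridftp_endpoint ssh user
gridftp_endpoint gsi
gridftp_endpoint gsi-stripe
```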

cart commands

• You need to be authorized in order to use the cart commands.

• To request authorization, send an email to superc@cineca.it

• The default quota is 500 GB, but it can be extended if you provide good reasons.

• The official guide is available at: https://hpc.cineca.it/content/production-environment-and-tools#cart

• cart_new <vol_name>: create a new volume <vol_name>

• cart_dir: show all the volumes defined by the user

• cart_dir <vol_name>: show the files stored on the volume <vol_name>

• cart_put <vol_name> <file>: save <file> in the volume <vol_name>

• cart_get <vol_name> <file>: retrieve the <file> from the volume <vol_name>

• cart_del <vol_name> <file>: delete the <file> from the volume <vol_name>

• cart_del <vol_name>: delete the volume <vol_name> (the volume must be empty)

• cart_acl -u <user> <vol_name>: grant access to a given volume for a specific user

• cart_dir_access -u <owner> [<vol_name>]: show the volumes/files of a different owner (cart_acl is needed)

• cart_get_access -u <owner> <vol_name> <file>: retrieve <file> from a volume of a different owner (cart_acl is needed)

cart commands

#!/bin/bash
# @ job_name = archive
# @ output = archive.out
# @ error = archive.err
# @ wall_clock_limit = 30:00
# @ job_type = serial
# @ resources = ConsumableMemory(1500MB)
# @ class = archive
# @ account_no = myaccount ! only for new style users
# @ queue

export CART=cart_name
export DIR=$CINECA_SCRATCH/xxx

echo "storing my $DIR directory on cartridge $CART"

# list the defined volumes, then the content of the target volume
cart_dir
cart_dir $CART

### tar the directory (-z compresses with gzip, hence the .tar.gz name)
cd $CINECA_SCRATCH
tar -cvzf $DIR.tar.gz $DIR

### copy my tar file on the cart
cart_put $CART $DIR.tar.gz

Use cases: comparative table

Columns: A: Local ↔ CINECA; B: CINECA ↔ CINECA; C: CINECA ↔ HPC machines; D: CINECA ↔ Archive

cp/mv: VV

scp/sftp: VV V

rsync: VV V V

GridFTP: V VVV

cart_*: V

File size: comparative table

Tool (mode)                     remote<->remote   local<->remote
scp/sftp (interactive)          Not available     20 GB
rsync (interactive)             Not available     20 GB
globus-url-copy (interactive)   Unlimited         150 GB
scp/sftp (batch)                Not available     WCT-limited
rsync (batch)                   Not available     WCT-limited
globus-url-copy (batch)         Unlimited         WCT-limited

WCT = Wall Clock Time
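
Read as a decision rule for local<->remote transfers, the table above can be sketched as follows. The function name suggest_tool and the wording of its answers are hypothetical; the 20 GB and 150 GB thresholds are the interactive limits from the table.

```shell
#!/bin/bash
# Suggest how to run a local<->remote transfer of $1 gigabytes,
# using the interactive limits from the comparative table.
suggest_tool() {
    if [ "$1" -le 20 ]; then
        echo "scp/sftp, rsync or globus-url-copy (interactive)"
    elif [ "$1" -le 150 ]; then
        echo "globus-url-copy (interactive)"
    else
        echo "batch job (limited by wall clock time)"
    fi
}

suggest_tool 10
suggest_tool 100
suggest_tool 500
```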

Documentation

• Data management page at CINECA

• GridFTP page at the official Globus site

• GridFTP page at CINECA

• Official CINECA GridFTP guide

• UberFTP official web page

• Globus GUI web page

• OFFIS GUI web page

• For further information...

Thanks

:wq