ECMWF 1 COM HPCF 2004: High performance file I/O
High performance file I/O
Computer User Training Course 2004
Carsten Maaß
User Support
Topics
• introduction to General Parallel File System (GPFS)
• GPFS in the ECMWF High Performance Computing Facility (HPCF)
• staging data to an HPCF cluster
• retrieving results from an HPCF cluster
• maximizing file I/O performance in FORTRAN
• maximizing file I/O performance in C and C++
The General Parallel File System (GPFS)
• each GPFS file system can be shared simultaneously by multiple nodes
• can be configured for high availability
• performance generally much better than locally attached disks (due to the parallel nature of the underlying file system)
• provides Unix file system semantics with very minor exceptions
• GPFS provides data coherency
– modifications to file content by any node are immediately visible to all nodes
Metadata coherency
GPFS does NOT provide full metadata coherency:
• the stat system call might return incorrect values for atime, ctime and mtime (which can result in the ls command providing incorrect information)
• metadata is coherent if all nodes have sync'd since the last metadata modification
Use gpfs_stat and/or gpfs_fstat if exact atime, ctime and mtime values are required.
File locking
GPFS supports all 'traditional' Unix file locking mechanisms:
• lockf
• flock
• fcntl
Remember: Unix file locking is advisory, not mandatory.
Comparison with NFS

Concept                                                      GPFS  NFS
file systems shared between multiple nodes                   yes   yes
path names the same across different nodes                   yes*  yes*
data coherent across all nodes                               yes   no
different parts of a file can be simultaneously
updated on different nodes                                   yes   no
high performance                                             yes   no
traditional Unix file locking semantics                      yes   no

* if configured appropriately
GPFS in an HPCF cluster
[Diagram: GPFS client nodes (p690 servers with 32 GB or 128 GB of memory each) connect through a dual-plane SP Switch2 (two ~350 MB/s switch planes) to the I/O nodes, which act as the GPFS servers.]
HPCF file system setup (1/2)
• all HPCF file systems are of type GPFS
• each GPFS file system is global
– accessible from any node within a cluster
• GPFS file systems are not shared between the two HPCF clusters
HPCF file system setup (2/2)
ECMWF's file system locations are described by environment variables (hpca):

             block size  quota  total size
             [KB]        [MB]   [GB]        suitable for
$HOME        64          80     8.7         permanent files: sources, .profile, utilities, libs
$TEMP        256         —      1100        temporary files (select/delete is applied)
$TMPDIR      256         —      1100        data to be deleted automatically at the end of a job

• Do not rely on select/delete!
• Clear your disk space as soon as possible!
Transferring data to/from an HPCF cluster
Depending on the source and size of the data:
• ecrcp from ecgate to hpca for larger transfers
• NFS to facilitate commands like ls etc. on remote machines
Maximizing FORTRAN I/O performance
In roughly decreasing order of importance:
• use large record sizes
– aim for at least 100 KB
– multi-megabyte records are even better
• use FORTRAN unformatted instead of formatted files
• use FORTRAN direct access files instead of sequential
• reuse the kernel's I/O buffers:
– if a file which was recently written sequentially is to be read, start at the end and work backwards
What's wrong with short record sizes?
• each read call will typically result in:
– sending a request message to an I/O node
– waiting for the response
– receiving the data from the I/O node
• the time spent waiting for the response will be at least a few milliseconds regardless of the size of the request
• data transfer rates for short requests can be as low as a few hundred thousand bytes per second
• random access I/O on short records can be even slower!!!
Reminder: at the HPCF's clock rate of 1.3 GHz, one millisecond spent waiting for data wastes over 1,000,000 CPU cycles!
FORTRAN direct I/O example 1

      real*8 a(1000,1000)
      . . .
      open (21, file='input.dat', access='DIRECT', recl=8000000,
     -      status='OLD')
      read (21, rec=1) a
      close (21)
      . . .
      open (22, file='output.dat', access='DIRECT', recl=8000000,
     -      status='NEW')
      write (22, rec=1) a
      close (22)
      . . .
FORTRAN direct I/O example 2

      real*8 a(40000), b(40000)
      open (21, file='input.dat', access='DIRECT', recl=320000,
     -      status='OLD')
      open (22, file='output.dat', access='DIRECT', recl=320000,
     -      status='NEW')
      . . .
      do i = 1, n
         read (21, rec=i) a
         do j = 1, 40000
            b(j) = ... a(j) ...
         enddo
         write (22, rec=i) b
      enddo
      close (21)
      close (22)
      . . .
Maximizing C and C++ I/O performance
In roughly decreasing order of importance:
• use large length parameters in read and write calls
– aim for at least 100 KB
– bigger is almost always better
• use binary data formats (i.e. avoid printf, scanf etc.)
• use open, close, read, write etc. (avoid stdio routines like fopen, fclose, fread and fwrite)
• reuse the kernel's I/O buffers:
– if a file which was recently written sequentially is to be read, start at the end and work backwards
What's wrong with the stdio routines?
• the underlying buffer size is quite small
– use setbuf or setvbuf to increase buffer sizes
• writing in parallel using fwrite risks data corruption
Although . . .
• stdio is still much better than issuing short requests directly
– e.g. fgetc and fputc are served from the buffer
Unit summary
• introduction to General Parallel File System (GPFS)
• GPFS in the ECMWF High Performance Computing Facility (HPCF)
• staging data to an HPCF cluster
• retrieving results from an HPCF cluster
• maximizing file I/O performance in FORTRAN
• maximizing file I/O performance in C and C++