ECMWF 1 COM HPCF 2004: High performance file I/O
High performance file I/O
Computer User Training Course 2004
Carsten Maaß
User Support
Topics
• introduction to General Parallel File System (GPFS)
• GPFS in the ECMWF High Performance Computing Facility (HPCF)
• staging data to an HPCF cluster
• retrieving results from an HPCF cluster
• maximizing file I/O performance in FORTRAN
• maximizing file I/O performance in C and C++
The General Parallel File System (GPFS)
• each GPFS file system can be shared simultaneously by multiple nodes
• can be configured for high availability
• performance generally much better than locally attached disks (due to the parallel nature of the underlying file system)
• provides Unix file system semantics with very minor exceptions
• GPFS provides data coherency
– modifications to file content by any node are immediately visible to all nodes
Metadata coherency
GPFS does NOT provide full metadata coherency:
• the stat system call might return incorrect values for atime, ctime and mtime (which can result in the ls command providing incorrect information)
• metadata is coherent if all nodes have sync'd since the last metadata modification
Use gpfs_stat and/or gpfs_fstat if exact atime, ctime and mtime values are required.
File locking
GPFS supports all 'traditional' Unix file locking mechanisms:
• lockf
• flock
• fcntl
Remember: Unix file locking is advisory, not mandatory.
Comparison with NFS

Concept                                                      GPFS  NFS
file systems shared between multiple nodes                   yes   yes
path names the same across different nodes                   yes*  yes*
data coherent across all nodes                               yes   no
different parts of a file can be simultaneously
updated on different nodes                                   yes   no
high performance                                             yes   no
traditional Unix file locking semantics                      yes   no

* if configured appropriately
GPFS in an HPCF cluster
[Diagram: GPFS client nodes (p690 servers with 32 GB or 128 GB of memory each) connect through a dual-plane SP Switch2 (two ~350 MB/s switch planes) to the I/O nodes, which act as the GPFS servers.]
HPCF file system setup (1/2)
• all HPCF file systems are of type GPFS
• each GPFS file system is global
– accessible from any node within a cluster
• GPFS file systems are not shared between the two HPCF clusters
HPCF file system setup (2/2)
ECMWF's file system locations are described by environment variables (hpca):

             block size  quota  total size
             [KB]        [MB]   [GB]        suitable for
$HOME        64          80     8.7         permanent files: sources, .profile, utilities, libs
$TEMP        256         —      1100        temporary files (select/delete is applied)
$TMPDIR      256         —      1100        data to be deleted automatically at the end of a job

• Do not rely on select/delete!
• Clear your disk space as soon as possible!
Transferring data to/from an HPCF cluster
Depending on the source and size of the data:
• ecrcp from ecgate to hpca for larger transfers
• NFS to facilitate commands like ls etc. on remote machines
Maximizing FORTRAN I/O performance
In roughly decreasing order of importance:
• use large record sizes
– aim for at least 100 KB
– multi-megabyte records are even better
• use FORTRAN unformatted instead of formatted files
• use FORTRAN direct access files instead of sequential
• reuse the kernel's I/O buffers:
– if a file which was recently written sequentially is to be read, start at the end and work backwards
What's wrong with short record sizes?
• each read call will typically result in:
– sending a request message to an I/O node
– waiting for the response
– receiving the data from the I/O node
• the time spent waiting for the response will be at least a few milliseconds regardless of the size of the request
• data transfer rates for short requests can be as low as a few hundred thousand bytes per second
• random access I/O on short records can be even slower!!!
Reminder: at the HPCF's clock rate of 1.3 GHz, one millisecond spent waiting for data wastes over 1,000,000 CPU cycles!
FORTRAN direct I/O example 1

      real*8 a(1000,1000)
      . . .
      open (21, file='input.dat', access='DIRECT', recl=8000000,
     -      status='OLD')
      read (21, rec=1) a
      close (21)
      . . .
      open (22, file='output.dat', access='DIRECT', recl=8000000,
     -      status='NEW')
      write (22, rec=1) a
      close (22)
      . . .
FORTRAN direct I/O example 2

      real*8 a(40000), b(40000)
      open (21, file='input.dat', access='DIRECT', recl=320000,
     -      status='OLD')
      open (22, file='output.dat', access='DIRECT', recl=320000,
     -      status='NEW')
      . . .
      do i = 1, n
         read (21, rec=i) a
         do j = 1, 40000
            b(j) = ... a(j) ...
         enddo
         write (22, rec=i) b
      enddo
      close (21)
      close (22)
      . . .
Maximizing C and C++ I/O performance
In roughly decreasing order of importance:
• use large length parameters in read and write calls
– aim for at least 100 KB
– bigger is almost always better
• use binary data formats (i.e. avoid printf, scanf etc.)
• use open, close, read, write etc. (avoid stdio routines like fopen, fclose, fread and fwrite)
• reuse the kernel's I/O buffers:
– if a file which was recently written sequentially is to be read, start at the end and work backwards
What's wrong with the stdio routines?
• the underlying buffer size is quite small
– use setbuf or setvbuf to increase buffer sizes
• writing in parallel using fwrite risks data corruption
Although . . .
• stdio is still much better than issuing short requests directly
– e.g. fgetc and fputc are served from the buffer
Unit summary
• introduction to General Parallel File System (GPFS)
• GPFS in the ECMWF High Performance Computing Facility (HPCF)
• staging data to an HPCF cluster
• retrieving results from an HPCF cluster
• maximizing file I/O performance in FORTRAN
• maximizing file I/O performance in C and C++