
HDF vs NetCDF

Date posted: 09-Oct-2015
Description: A comparison of two scientific data formats.
Parallel I/O Performance Study and Optimizations with HDF5, a Scientific Data Package
MuQun Yang, Christian Chilan, Albert Cheng, Quincey Koziol, Mike Folk, Leon Arber
The HDF Group, Champaign, IL 61820
Transcript
Outline:
• HDF5, netCDF, netCDF4 performance comparison
• Conclusion
[Diagram: processors P0–P3 writing to the file system]
• May achieve good performance
HDF5:
• Hierarchical file structure
• Chunked storage
• Parallel I/O through MPI-IO
NetCDF:
• Linear data layout
NetCDF4 (built on HDF5) adds:
• More than one unlimited dimension
• Various compression filters
• Parallel I/O through MPI-IO
Every processor has a noncontiguous selection, and the access requests are interleaved.
Write operation with 32 processors; each processor's selection has 512K rows and 8 columns (32 MB per processor):
• Independent I/O: 1,659.48 s
• Collective I/O: 4.33 s
[Diagram: rows 1 and 2 of the array interleaved across processors P0–P3]
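The gap between the two timings above can be made concrete with a toy count of file accesses. The model below is an illustration, not the real MPI-IO algorithm: it only counts how many contiguous file regions get written when 32 processors each own an 8-column slab of a row-major array, independently versus after two-phase collective aggregation.

```python
# Toy cost model (an illustration, not the actual MPI-IO implementation)
# of the slide's example: 32 processors each write a 512K x 8 column slab
# of a row-major 2D array, so the per-row pieces of all processors
# interleave in the file.

ROWS, NPROCS, COLS_PER_PROC, ITEMSIZE = 512 * 1024, 32, 8, 8  # doubles

def independent_io_ops(rows, nprocs):
    # Independent I/O: every processor seeks to and writes each 8-column
    # piece of every row on its own -> rows * nprocs tiny writes.
    return rows * nprocs

def collective_io_ops(rows):
    # Collective (two-phase) I/O: aggregators gather the interleaved
    # pieces, so each full row becomes one contiguous write.
    return rows

def piece_bytes(cols_per_proc, itemsize):
    # Size of one independent write: just 64 bytes in this example.
    return cols_per_proc * itemsize

print(independent_io_ops(ROWS, NPROCS))      # 16777216 tiny writes
print(collective_io_ops(ROWS))               # 524288 row-sized writes
print(piece_bytes(COLS_PER_PROC, ITEMSIZE))  # 64
```

Millions of 64-byte seek-and-write operations versus half a million row-sized contiguous writes is the pattern behind 1,659 s versus 4.33 s.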
Performance comparison
FLASH I/O benchmark
• Platforms: NCAR Bluesky (Power4), LLNL uP (Power5)
• Benchmark is the I/O kernel of FLASH.
• FLASH I/O generates 3D blocks of size 8x8x8 on Bluesky and 16x16x16 on uP.
• Each processor handles 80 blocks and writes them into 3 output files.
• The performance metric given by FLASH I/O is the parallel execution time.
• The more processors, the larger the problem size (weak scaling).
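Because each processor always handles 80 blocks, this is a weak-scaling setup: total data grows linearly with the processor count. A minimal sketch of that arithmetic, where BYTES_PER_CELL is a hypothetical placeholder (FLASH stores several variables per cell; the exact payload is not given in these slides):

```python
# Weak scaling in FLASH I/O: per-processor work is fixed (80 blocks), so
# the total problem size grows linearly with the number of processors.

BLOCKS_PER_PROC = 80
BYTES_PER_CELL = 8  # assumption: one double per cell, for illustration only

def total_cells(nprocs, block_edge):
    # block_edge = 8 on Bluesky, 16 on uP; each block is block_edge**3 cells.
    return nprocs * BLOCKS_PER_PROC * block_edge ** 3

def total_bytes(nprocs, block_edge):
    return total_cells(nprocs, block_edge) * BYTES_PER_CELL

for n in (16, 64, 144):
    print(n, "procs ->", total_bytes(n, 16), "bytes")
```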
 
[Figure: FLASH I/O parallel execution times vs. number of processors on ASCI White, Bluesky (Power 4), and uP (Power 5)]
 
Data: 60 1D-4D double-precision float and integer arrays
 
[Figure: bandwidth (MB/s) vs. number of processors (0–144), PnetCDF collective vs. NetCDF4 collective]
• Performance of parallel NetCDF4 is close to PnetCDF
 
[Figure: bandwidth (MB/s) vs. number of processors (0–144); output sizes 995 MB and 15.5 GB]
 
Performance optimizations: chunked storage
Contiguous storage:
• Only one HDF5 I/O call
• Good for collective I/O
Chunked storage:
• Required for extendible data variables
• Required for filters
• Better subsetting access time
For more information about chunking:
http://hdf.ncsa.uiuc.edu/UG41r3_html/Perform.fm2.html#149138
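The "better subsetting access time" point can be sketched with the chunk-index arithmetic behind it (a simplified illustration, not the HDF5 internals): a hyperslab selection only has to touch the chunks it overlaps, and each chunk is one contiguous block in the file.

```python
# Sketch of chunked-storage subsetting: compute which chunks a hyperslab
# selection [start, start + count) overlaps. Only those chunks need I/O.
from itertools import product

def touched_chunks(start, count, chunk_shape):
    """Chunk coordinates overlapped by the hyperslab, one range per dim."""
    ranges = []
    for s, c, ch in zip(start, count, chunk_shape):
        first, last = s // ch, (s + c - 1) // ch
        ranges.append(range(first, last + 1))
    return list(product(*ranges))

# 2D dataset chunked into 10x10 chunks; select a 15x5 hyperslab at (5, 8):
print(touched_chunks((5, 8), (15, 5), (10, 10)))
# -> [(0, 0), (0, 1), (1, 0), (1, 1)]  (4 chunks, not the whole dataset)
```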
Performance issue: chunk I/Os
[Diagram: the selections of P0 and P1 each spread across chunks 1–4]
Improvement 3: multi-chunk I/O optimization
The option to do collective I/O per chunk must be kept, because of:
• Collective I/O bugs inside different MPI-IO packages
• Limitations of system memory
Problem: bad performance caused by improper use of collective I/O.
Optimization: make the correct decision about the way to do collective I/O, and let users participate in the decision-making process.
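The decision-making described above can be sketched as a small rule over which processors touch which chunks. Everything here is hypothetical for illustration (the rule names, the 0.5 ratio threshold); it is not HDF5's actual algorithm, only the shape of the choice between one linked collective I/O, per-chunk collective I/O, and independent I/O.

```python
# Illustrative sketch of multi-chunk I/O decision-making (hypothetical
# thresholds, not HDF5's real ones). Input: for each chunk, the set of
# processor ranks whose selection touches it.

def choose_io_mode(chunk_owners, nprocs, ratio=0.5):
    # If every chunk is touched by every processor, one collective I/O
    # linking all chunks together is the obvious choice.
    if all(len(owners) == nprocs for owners in chunk_owners.values()):
        return {"mode": "linked-chunk collective"}
    # Otherwise decide chunk by chunk: collective I/O only pays off when
    # enough processors participate in that chunk; fall back to
    # independent I/O for chunks touched by few processors.
    per_chunk = {}
    for chunk, owners in chunk_owners.items():
        per_chunk[chunk] = ("collective" if len(owners) / nprocs >= ratio
                            else "independent")
    return {"mode": "multi-chunk", "per_chunk": per_chunk}

# 4 processors; chunk 0 is touched by all of them, chunk 1 by only one:
print(choose_io_mode({0: {0, 1, 2, 3}, 1: {2}}, nprocs=4))
```

Exposing knobs like the participation ratio is one way users can "participate in the decision-making process" that the slides mention.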
 
Collective I/O support and optimization inside HDF5:
• HDF5 provides collective I/O support for non-regular selections.
• Supporting collective I/O for chunked storage is not trivial; users can participate in the decision-making process that selects among the different I/O options.
• I/O performance is quite comparable when the parallel NetCDF and parallel HDF5 libraries are used in similar manners.
 
Foundation Teragrid grants, the
