HDF Update
Mike Folk
The HDF Group
HDF and HDF-EOS Workshop XII
Aurora, Colorado
October 16, 2008
Oct. 16, 2008 1HDF and HDF-EOS Workshop XII
Topics
What’s up with The HDF Group?
Announcement!
NASA Commits $3.1M to The HDF Group to Support Earth System Science
NASA Commits …
• “The HDF Group has received a 3-year contract from NASA to provide ongoing development and support for the HDF technologies used by NASA’s Earth Observing System.
• The project continues the relationship that was first established in 1994, when HDF was selected as the standard format for the EOS Data and Information System (EOSDIS).
• Since that time, over 4 petabytes of mission data and derived data products have been stored in HDF4 and HDF5, with an estimated 1.6 million users.
• Under the new contract, The HDF Group will support NASA’s EOS program in five critical areas:
  • Provide user support to EOS data providers and data consumers
  • Perform software development and quality assurance
  • Assure long-term access to HDF data
  • Integrate with complementary technologies and applications
  • Advise follow-on earth systems projects
What is The HDF Group
And why does it exist?
History of The HDF Group
• 18 Years at University of Illinois National Center for Supercomputing Applications
• Spun off from the University in July 2006
• Non-profit
• 20+ scientific, technology, professional staff
• Intellectual property:
  • The HDF Group owns HDF4 and HDF5
  • HDF formats and libraries to remain open
  • BSD-type license
The HDF Group Mission
To ensure long-term accessibility of HDF data through sustainable development and support of HDF technologies.
Goals
• Maintain, evolve HDF for sponsors and communities that depend on it
• Provide consulting, training, tuning, development, research
• Sustain the group for long term to assure data access over time
The HDF Group Services
• Helpdesk and Mailing Lists
  • Available to all users as a first level of support
• Standard Support
  • Rapid issue resolution support
• Consulting
  • Needs assessment, troubleshooting, design reviews, etc.
• Enterprise Support
  • Coordinating HDF activities across departments
• Special Projects
  • Adapting customer applications to HDF
  • New features and tools, with changes normally incorporated into open source product
  • Research and Development
• Training
  • Tutorials and hands-on practical experience
Members of the HDF support community
• NASA
• Sandia National Laboratory (2)
• University of Illinois/NCSA
• A leading U.S. aerospace company
• NOAA Science Data Stewardship
• New projects and partners
  • A major product lifecycle management company
  • A bioinformatics software company
  • Engineering Research and Development Center – Topographic Engineering Center
  • NPOESS
  • ITT VIS
Initiatives and areas of increased interest
• Bioinformatics
• High performance computing (HPC)
• Microsoft products (HPC, .NET, others)
• Database integration
• Improving concurrency
• Performance and storage efficiency
• Improving high level language support
Basic Library Releases
Overview of basic library releases
HDF5 1.8.0 (Feb 08)
• Major release with file format changes and features.
• File format changes affect backward/forward compatibility with previous releases.
• See “New Features in Release 1.8.0 and Format Compatibility Considerations”: http://hdfgroup.org/HDF5/doc/ADGuide/CompatFormat180.html
HDF5 1.8 minor releases
• 1.8.1 (May 08)
  • A minor release with bug fixes
  • Provided full 1.8 support for Fortran applications
  • Enhanced tools with 1.8.0 features
• HDF5 1.8.2 coming Nov 08
  • Minor bug fixes
  • Tool enhancements
HDF5 1.6 minor releases
• 1.6.7 (Feb 08)
  • Modification to address Aura issue
• 1.6.8 coming Nov 08
  • Minor bug fixes
Future HDF5 releases (highlights)
• Release HDF5 1.10.0
  • Performance improvements
  • Some new features
  • Support for Fortran 2003 features
  • Target date November 2009
• When to drop support for 1.6.*?
HDF 4 minor releases
• 4.2r3 (Feb 08)
  • Improved support for apps using HDF4 and NetCDF3
  • Improved support for data sets and coordinate variables with the same names
• Release 4.2r4 coming Nov 08
  • Minor bug fixes, tool enhancements
  • Support for C shared libraries
  • Support for 32-bit version on Mac Intel
• http://hdfgroup.org/products/hdf4/
H4-H5 Conversion Software 2.0 (May)
• Re-built with HDF5 1.8.1 and HDF 4.2r3.
• Conversion tool h4toh5 enhanced
  • Converts HDF-EOS2 files to HDF5 files
  • Makes HDF5 files readable by NetCDF4
• http://hdfgroup.org/h4toh5/
HDF-EOS library
HDF-EOS2 and HDF-EOS5
• Auto configuration for HDF-EOS2 and HDF-EOS5
  • Compile and test libraries with automatic configuration tools
  • Thank you, Abe!
• Testing of EOS2 and EOS5
  • Test daily with HDF4 and HDF5 development code
  • Periodically test on EOS-critical platforms
• EOS website support
h5check 1.0 (March 2008)
• A validation tool to verify whether an HDF5 file is encoded according to the HDF5 File Format Specification.
• To ensure format integrity and long-term compatibility between versions of the HDF5 library.
• By default, the file is verified against 1.8.x. Can also verify against 1.6.x.
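As a rough illustration of the most basic check such a validator has to make, the sketch below looks for the 8-byte HDF5 format signature at offset 0 and then at offsets 512, 1024, 2048, and so on, which is where the format specification allows the superblock to start when a user block precedes it. This is not h5check itself, which verifies far more of the file. The function name `find_hdf5_signature` and the search limit are assumptions made for this example.

```python
import io

# 8-byte format signature defined by the HDF5 File Format Specification.
HDF5_SIGNATURE = b"\x89HDF\r\n\x1a\n"


def find_hdf5_signature(stream, max_offset=1 << 20):
    """Return the byte offset of the HDF5 signature, or None if absent.

    Hypothetical helper: checks offset 0, then 512, 1024, 2048, ...
    up to max_offset (an arbitrary limit chosen for this sketch).
    """
    offset = 0
    while offset <= max_offset:
        stream.seek(offset)
        if stream.read(len(HDF5_SIGNATURE)) == HDF5_SIGNATURE:
            return offset
        offset = 512 if offset == 0 else offset * 2
    return None


# In-memory stand-ins for real files.
plain = io.BytesIO(HDF5_SIGNATURE + b"\x00" * 100)
with_userblock = io.BytesIO(b"U" * 512 + HDF5_SIGNATURE + b"\x00" * 100)
not_hdf5 = io.BytesIO(b"not an HDF5 file")

print(find_hdf5_signature(plain))           # 0
print(find_hdf5_signature(with_userblock))  # 512
print(find_hdf5_signature(not_hdf5))        # None
```

Finding the signature only shows that a file claims to be HDF5; checking that every structure behind it follows the specification is the job of the tool described above.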
Major Improvements for Existing Tools
• Improved handling of large datasets by h5diff, h5repack, hdiff, and hrepack
• Other added capabilities
  • h5import: to import strings
  • h5diff: to deal with NaN values
  • h5dump: to dump objects in requested order
  • h5repack:
    • To apply multiple filters to all objects
    • To add a userblock
    • To align datasets in file at byte offsets that support efficient access
In the works: h52jpeg
• Converts datasets in an HDF5 file to a JPEG image.
• Prototype available, if you are interested.
Please send us your comments and requests regarding the HDF4 and HDF5 libraries and tools.
HDF Java
• HDF-Java 2.5 release
  • Beta 1 release Feb 08
  • Full release planned for Dec. 2008
• HDF5 JNI updated for HDF5 1.8.x with 1.6 flag
• Binary for 32-bit Linux and 64-bit Solaris
• Also added daily testing for hdf-java products
Also in the pipeline
• Full Java Support for HDF5 1.8.x
  • Add and test new functions in Java wrapper
  • Implement and test new functions in C JNI
  • Use new functions in HDF-Java objects
• Add many new features
• Improve performance
• Revise HDFView User’s Guide
Surviving a System Failure
Surviving a System Failure in HDF5
• Problem:
  • In the event of an application or system crash, data in HDF5 files are susceptible to corruption
  • Corruption can occur if structural metadata is being written when the crash occurs
• Initial Objective:
  • Guarantee an HDF5 file with consistent metadata can be reconstructed in the event of a crash
  • No guarantee on state of raw data – contains whatever data made it to disk prior to crash
HDF5 Metadata Journaling Recovery
[Diagram: Application crashes, leaving a Corrupted HDF5 File and a Companion Journal File; the H5recover Tool uses them to produce a Restored HDF5 File]
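The recovery flow can be illustrated with a generic write-ahead journaling sketch. It assumes that every metadata change is recorded in a journal before it reaches the file, and that recovery replays only transactions that were committed. The record layout and the names `log_record` and `recover` are hypothetical, not the HDF5 journal format or the recovery tool's actual logic, and the starting state is simplified to a dictionary that is already known to be consistent.

```python
import json


def log_record(journal, txn, op, key=None, value=None):
    """Append one record to the journal (a list standing in for the journal file)."""
    journal.append(json.dumps({"txn": txn, "op": op, "key": key, "value": value}))


def recover(metadata, journal):
    """Rebuild consistent metadata by replaying committed transactions only.

    `metadata` is a simplified stand-in for the last consistent state.
    """
    records = [json.loads(line) for line in journal]
    committed = {r["txn"] for r in records if r["op"] == "commit"}
    restored = dict(metadata)
    for r in records:
        if r["op"] == "set" and r["txn"] in committed:
            restored[r["key"]] = r["value"]
    return restored


journal = []

# Transaction 1 completes.
log_record(journal, 1, "begin")
log_record(journal, 1, "set", "/dset/dims", [10])
log_record(journal, 1, "commit")

# Transaction 2 is interrupted by a crash before its commit record is written.
log_record(journal, 2, "begin")
log_record(journal, 2, "set", "/dset/dims", [20])

print(recover({}, journal))  # {'/dset/dims': [10]}
```

The half-finished second transaction is discarded, so the restored metadata is consistent even though the most recent change is lost.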
Faster HDF5 Data Appends
Fast Data Appends
• Problem: Metadata operations limit the rate at which HDF5 can append data to datasets.
• Solution: new data structure for indexing chunks:
  • Allows constant time extend, shrink and lookup of chunks in datasets with single unlimited dimension
  • # of metadata I/O operations to append to dataset is independent of # of chunks
  • Also allows single-writer/multiple-reader access
• Details at: http://hdfgroup.uiuc.edu/RFC/HDF5/ReviseChunks/
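To see why a single unlimited dimension permits constant-time lookup, consider the minimal sketch below: with fixed-size chunks, the chunk number is just the element index divided by the chunk length, so chunk addresses can sit in a flat array instead of a tree. The class `ChunkIndex1D` is a hypothetical in-memory illustration, not the on-disk structure described in the RFC.

```python
class ChunkIndex1D:
    """Toy chunk index for a dataset with one unlimited dimension."""

    def __init__(self, chunk_len):
        self.chunk_len = chunk_len
        self.addresses = []  # addresses[i] = file address of chunk i

    def append_chunk(self, address):
        # Extending the dataset adds one entry, regardless of how many
        # chunks already exist (amortized constant time for a Python list).
        self.addresses.append(address)

    def shrink(self, n_chunks):
        # Keep only the first n_chunks entries.
        del self.addresses[n_chunks:]

    def locate(self, element):
        # Constant-time lookup: no tree traversal is needed.
        chunk_no, offset = divmod(element, self.chunk_len)
        return self.addresses[chunk_no], offset


index = ChunkIndex1D(chunk_len=1000)
for i in range(3):
    index.append_chunk(4096 + i * 8000)  # made-up file addresses

print(index.locate(2500))  # (20096, 500)
```

Element 2500 falls in chunk 2 at offset 500, and the cost of finding it does not grow with the number of chunks.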
HDF Performance Framework
A framework for performance regression testing
• A tool for
  • Testing on multiple platforms
  • Testing different versions
  • Long term regression testing
  • Assistance in debugging
• New for 1.8:
  • API and format versioning
  • Improved reporting interfaces
• Future related work
  • Quality monitoring of the software, such as code coverage, memory usage
Other library work
Library Features
• Improved external link support
  • External link: link to HDF5 object in another file
  • Can more easily specify path lookup of external files
  • Adding external link support for h5ls and h5dump
• Time datatype improvements
  • Expand time type to support native formats better
  • Adapt tools to display them properly
• Port to OpenVMS (limited support)
Improving performance
• Faster file free-space management while file open
  • Many transactions can create many holes
  • Free space management recovers unused space
  • Up to 38x improvement in experiments
• Direct I/O: file I/O goes directly between application and storage, bypassing operating system read and write caches
• Disabling automatic metadata cache flushing
  • In experiments, direct I/O combined with metadata cache disabling improved I/O speed by about 2x.
Remote access
Three “remote access” projects
• HDF5-OPeNDAP handler
  • See talk by Kent Yang: “HDF5 OPeNDAP project update and demo”
• HDF5-iRODS integration
  • See Peter Cao’s talk Thursday: “HDF5 iRODS”
• Accessing HDF5 through SSHFS-FUSE
Accessing HDF5 through SSHFS-FUSE
• Access to files on remote NFS system limited
• Combining FUSE (Filesystem in Userspace) with SSHFS (Secure Shell File System)
  • FUSE provides application with local view of remote file system
    • Another way to mount remote file system
  • SSHFS allows the local file system to access parts of remote file.
    • e.g., “read” operation on the remote filesystem can be served through SSH
    • Subsetting can be efficiently done with SSHFS
• Extract a dataset (5 MB) from a 96 MB HDF5 file
  • Download whole file + subset locally: 9.85 seconds
  • Subset with SSHFS: 0.47 seconds
• Technical report in the works
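The subsetting pattern behind these numbers can be sketched with an ordinary seek-and-read, which is all an application needs to do once the remote file system is mounted. The example below uses a local temporary file as a stand-in for a file under an SSHFS mount point (96 KB rather than 96 MB); the helper `read_subset` is hypothetical, and how many bytes actually cross the network in practice depends on SSHFS caching and read-ahead settings.

```python
import os
import tempfile


def read_subset(path, offset, length):
    """Read only `length` bytes starting at `offset`, not the whole file."""
    with open(path, "rb") as f:
        f.seek(offset)
        return f.read(length)


# Local stand-in for a remote file (scaled down from 96 MB to 96 KB).
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(os.urandom(96 * 1024))
    path = tmp.name

subset = read_subset(path, offset=40 * 1024, length=5 * 1024)
print(len(subset))  # 5120
os.remove(path)

# Ratio of the two timings reported above.
print(f"{9.85 / 0.47:.1f}x")  # 21.0x
```

With the reported timings, subsetting through SSHFS was roughly 21 times faster than downloading the whole file first, which fits with the subset being only about 5% of the file.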
HDF4 Layout Map Project
• Problem
  • Long-term readability of HDF data dependent on long-term availability of HDF software
• Proposed solution
  • Create a map of the layout of data objects in an HDF file, allowing a simple reader to be written to access the data
• See today’s talk by Folk and Duerr: “Ensuring Long Term Access to Remotely Sensed HDF4 Data with Layout Maps.”
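The idea of a "simple reader" can be sketched as follows, assuming a map that records, for each data object, its byte offset, element count, byte order, and element type. The map layout, the field names, and the sample data are all hypothetical; the project's actual map format is described in the talk cited above. The point is only that, given such a map, the data can be read with basic file operations and no HDF library.

```python
import io
import struct

# Hypothetical layout map entry: where the object lives and how to decode it.
layout_map = {
    "temperature": {"offset": 16, "count": 4, "byte_order": ">", "type_code": "f"},
}


def read_object(stream, entry):
    """Read one data object using only the information in its map entry."""
    fmt = f"{entry['byte_order']}{entry['count']}{entry['type_code']}"
    stream.seek(entry["offset"])
    raw = stream.read(struct.calcsize(fmt))
    return list(struct.unpack(fmt, raw))


# Stand-in for a data file: 16 bytes of other content, then four big-endian floats.
data_file = io.BytesIO(b"\x00" * 16 + struct.pack(">4f", 1.0, 2.0, 3.5, 4.25))

print(read_object(data_file, layout_map["temperature"]))  # [1.0, 2.0, 3.5, 4.25]
```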
HDF and .NET Framework
• Prototype .NET wrappers for HDF5 1.8.0
  • Based on subset of HDF5 C routines
• Released in March, 2008
• Unsupported
  • Considerable interest, but currently no funding to support or maintain
  • Use hdf-forum email list for questions
netCDF-4
Released June 2008!!
Investigation of HDF Support in Some Open Source Software Packages
Five open source packages
• PyHDF
  • Python interface to HDF4
  • http://pysclint.sourceforge.net/pyhdf/
• Geospatial Data Abstraction Library (GDAL)
  • Translator library for Raster Geospatial Data Formats
  • Supports about 100 file formats
  • http://gdal.org/
• NCAR Command Language (NCL)
  • Interpreted Language for Data Analysis and Visualization
  • http://ncl.ucar.edu/
• Grid Analysis and Display System (GrADS)
  • Interpreted Language for Data Analysis and Visualization
  • http://iges.org/grads/
• GNU Data Language (GDL)
  • Interpreted Language for Data Analysis and Visualization
  • http://gnudatalanguage.sourceforge.net/
Evaluation criteria
• Formats
  • HDF4, HDF5, netCDF
  • Objects supported in each language
• Installation
  • Availability of binaries
  • Other requirements
• Adequacy of documentation
• Technical report available soon.
Windows Virtualization
Motivation: high cost of maintaining many different Windows configurations
Maintenance & Testing with VMware
• Multiple virtual machines run in parallel
• Only relevant software installed
• Each represents a supported configuration
• Run nightly tests of HDF4, HDF5
• Each is powered on, tested, cleaned automatically
• Technical report available soon.
HDF5 Data Transform Pilot Study
• Tools for Flight Test Data
• Framework to define and apply transformations to data being read
• Transformations specified in Python
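One way such a framework might look is sketched below: transformations are plain Python functions registered against a dataset name and applied automatically whenever that dataset is read. Everything here is hypothetical, including the registry, the decorator, the dataset name, and the calibration factor; the pilot study's actual design is not described in these slides.

```python
# In-memory stand-in for raw datasets stored in a file.
raw_store = {"airspeed_counts": [100, 200, 300]}

# Registry mapping a dataset name to its transformation.
transforms = {}


def transform(name):
    """Decorator that registers a function as the transform for `name`."""
    def register(func):
        transforms[name] = func
        return func
    return register


@transform("airspeed_counts")
def counts_to_knots(value):
    return value * 0.5  # made-up calibration factor


def read(name):
    """Read a dataset, applying its registered transform if one exists."""
    data = raw_store[name]
    func = transforms.get(name)
    return [func(v) for v in data] if func else list(data)


print(read("airspeed_counts"))  # [50.0, 100.0, 150.0]
```

The stored values are never modified; the conversion happens only on the way out, so different users could register different transformations over the same data.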
Science Data Stewardship
• Goal: migrate data to a single standards-based archive format.
• Approach: investigate how to store NASA ECS data and metadata in HDF5 Archival Information Packages (AIP).
• See talk by Yang, Duerr et al: “Using HDF5 Archive Information Package to preserve HDF-EOS2 data”
Thank You All
and
Thank You NASA!
Acknowledgements
This report is based upon work supported in part by a Cooperative Agreement with the National Aeronautics and Space Administration (NASA) under NASA Awards NNX06AC83A and NNX08AO77A.
Any opinions, findings, and conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of the National Aeronautics and Space Administration.
Questions/comments?