Introduction to Hierarchical Data Format (HDF)

Hierarchical Data Format (HDF) is a non-proprietary format designed to handle large sets of complex, multidimensional, heterogeneous data.  It is supported by the HDF Group, a non-profit organization dedicated to long-term development and maintenance of HDF. There are two versions of the format, HDF4 and HDF5.  HDF5 is a simplification of HDF4, with a better defined data model and a much simpler API. I have worked with both formats, and HDF5 is IMO the better choice, unless you have legacy files and code in HDF4.

HDF5 has a very liberal licensing policy; you may distribute it in binary or source form, as long as you acknowledge the HDF Group appropriately (licensing info).  This includes commercial uses.

HDF5 is supported by Matlab, Mathematica, and other commercial and non-commercial products.  It is a C library with bindings for C++, Fortran,  Java, Python, Julia, .Net, R, and others.  This makes it a good format for cross-platform/cross-language exchange/transmission of data. The distribution includes the library, an Java-based HDF viewer, command-line utilities, and source for a test suite.

Internally, HDF5 is very much like a file system structure, with group objects that can hold other groups or data sets.  You can refer to groups using file system like paths: /path/to/stuff.  It’s basically a Big Data/No SQL format.  Data is self-describing, so any application that wants to read an HDF5 file knows exactly what is in each data set.  It has a very flexible metadata/attribute system.  You can easily describe and read/write any data type that meets specific data needs in your environment, and allow anyone to understand that data type later. An HDF5 file is internally indexed by a BSP tree, so you can randomly access groups and data sets without having to read in the entire file first. HDF5 data sets can be laid out in contiguous or “chunked” form.  Chunking is necessary for compression and for creating data sets with extendible dimensions. In other words, you can continue to grow the dimensions of a data set within the file.

This was just a short introduction, but I hope you can see that HDF is a powerful way to store and exchange data, and check it out to see if it can help you with your data storage needs and issues.



Leave a Reply

Fill in your details below or click an icon to log in: Logo

You are commenting using your account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s

This site uses Akismet to reduce spam. Learn how your comment data is processed.