HDF5 performance, including speed, memory usage, and storage efficiency, can be affected by how an HDF5 file is accessed or stored. Listed below are performance issues that can occur and how to avoid them.
Open objects use up memory, and the amount may be substantial when many objects are left open. Close each object as soon as it is no longer needed.
There are APIs to determine if datasets and groups are left open. H5F_GET_OBJ_COUNT will get the number of open objects in the file, and H5F_GET_OBJ_IDS will return a list of the open object identifiers.
The metadata cache can also affect memory usage. Modify the metadata cache settings to minimize the size and growth of the cache as much as possible without decreasing performance.
By default, the metadata cache is 2 MB in size and is allowed to grow to a maximum of 32 MB per file. The metadata cache can be disabled or modified. Memory used for the cache is not released until the datasets or the file are closed.
See the H5P_SET_CACHE API for setting the cache, as well as the Information on the Metadata Cache FAQ.
Using chunking inefficiently can cause a number of performance issues. See the advanced topic, Chunking in HDF5, for detailed information on the use of chunking.
Also be aware that if a dataset is read in whole chunks and no chunk needs to be read from disk more than once, the chunk cache is not needed, and its size can be set to 0 if memory is short.
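Variable Length: Datasets with variable-length datatypes cannot be compressed effectively: a compression filter applies only to the references stored in the dataset, not to the variable-length data itself, which is kept in the file's global heap. Also, frequently editing datasets with variable-length datatypes and closing the file between edits can leave holes in the file. A workaround is to leave the file open while editing the datasets.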
A fixed length dataset that is compressed can be used as an alternative to using a variable length datatype.
Compound datatypes work well with C, but they are slow when used with Fortran or Java. They are also cumbersome, because in F90 and Java you can only read and write data by field. [It is not possible to pass an array of Fortran structures to a C function in a portable manner. In any case, the Fortran layer has to repack the Fortran array into an array of C structures. The main problem is that Fortran enforces type checking at compile time, and it is impossible to overload the h5dread_f and h5dwrite_f functions with a datatype that is defined by the user.]