The Creating a Dataset tutorial topic defines a dataset as a multidimensional array of data elements together with supporting metadata.
- Chunks are too small.
If a very small chunk size is specified for a dataset, the dataset can become excessively large and access to it can be slow. Each chunk needs its own entry in the dataset's chunk index, so the smaller the chunk size, the more chunks HDF5 has to keep track of and the longer it takes to search for a chunk.
- Chunks are too large.
An entire chunk has to be read (and, if compressed, decompressed) before any operation can be performed on it. Reading a small subset therefore carries a performance penalty when the chunk size is substantially larger than the subset. A dataset may also be larger than expected if some chunks contain only a small amount of data.
- A chunk does not fit in the Chunk Cache.
Every chunked dataset has a chunk cache associated with it, with a default size of 1 MB. The chunk cache improves performance by keeping frequently accessed chunks in memory so that they do not have to be read from disk each time. If a chunk is too large to fit in the chunk cache, performance can degrade significantly. However, the size of the chunk cache can be increased by calling H5Pset_chunk_cache on the dataset access property list used to open the dataset.
It is a good idea to:
- Avoid very small chunk sizes, and be aware of the default 1 MB chunk cache size.
- Test the data with different chunk sizes to determine the optimal chunk size to use.
- Consider the chunk size in terms of the most common access patterns that will be used once the dataset has been created.
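Before testing candidate chunk sizes against real data, some simple arithmetic can rule out poor choices for a given access pattern. The hypothetical helper below (plain C, no HDF5 calls) computes, for a 2-D dataset, how many chunks HDF5 must index, the size of one uncompressed chunk, and how many bytes must be read to fetch one full row, assuming every chunk a read touches is read in its entirety. It also checks a chunk against the 1 MB default cache size, taken here as 1,048,576 bytes. It is a rough model for comparison, not a measurement of real I/O.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Default chunk cache size: 1 MB, taken here as 1024 * 1024 bytes. */
#define DEFAULT_CHUNK_CACHE_BYTES ((uint64_t)1024 * 1024)

/* Chunks needed to cover `extent` elements with chunks of `chunk`
 * elements along one dimension (ceiling division). */
static uint64_t chunks_along(uint64_t extent, uint64_t chunk)
{
    return (extent + chunk - 1) / chunk;
}

/* Statistics for a 2-D dataset of shape dims[0] x dims[1] stored in
 * chunks of shape cdims[0] x cdims[1] with elements of elem_size bytes:
 *   total_chunks  - number of chunks HDF5 must keep track of
 *   chunk_bytes   - size of one uncompressed chunk
 *   bytes_per_row - bytes read to fetch one full row, assuming every
 *                   chunk the row touches is read in full */
void chunk_stats_2d(const uint64_t dims[2], const uint64_t cdims[2],
                    size_t elem_size, uint64_t *total_chunks,
                    uint64_t *chunk_bytes, uint64_t *bytes_per_row)
{
    uint64_t across = chunks_along(dims[1], cdims[1]);

    *total_chunks = chunks_along(dims[0], cdims[0]) * across;
    *chunk_bytes = cdims[0] * cdims[1] * (uint64_t)elem_size;
    /* One row lies in a single chunk along dimension 0 but crosses
     * every chunk along dimension 1. */
    *bytes_per_row = across * *chunk_bytes;
}

/* True if one uncompressed chunk fits in the default chunk cache. */
bool fits_default_cache(uint64_t chunk_bytes)
{
    return chunk_bytes <= DEFAULT_CHUNK_CACHE_BYTES;
}
```

For a 1000 by 1000 dataset of 8-byte doubles, row-shaped chunks {1, 1000} need 8,000 bytes per row read, column-shaped chunks {1000, 1} need 8,000,000, and 10 by 10 chunks produce 10,000 chunks to index. A single 1000 by 1000 chunk (8,000,000 bytes) does not fit in the default cache.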
To set the storage layout of a dataset:
- Create a Dataset Creation Property List. (See H5Pcreate / h5pcreate_f)
- Modify the property list:
  - To use the chunked storage layout, call H5Pset_chunk / h5pset_chunk_f.
  - To use the compact storage layout, call H5Pset_layout / h5pset_layout_f.
- Create a dataset with the modified property list. (See H5Dcreate / h5dcreate_f)
- Close the property list. (See H5Pclose / h5pclose_f)
For example code, see the HDF5 Examples page, in particular the Examples by API section, which has examples in several languages.
The dataset layout is a property of the Dataset Creation Property List. This means that once the dataset has been created, its layout cannot be changed. To change the layout, use the h5repack utility to copy the file to a new file with the new layout.