The purpose of this page is to briefly describe the new HDF5 Virtual Dataset (VDS) feature and provide a gateway to available documentation.
With a growing amount of data in HDF5, the need has emerged to access data stored across HDF5 files using standard HDF5 objects, such as groups and datasets, without rewriting or rearranging the data.
HDF5 has long supported building hierarchical structures across existing HDF5 files through the mounting and external link features. What it has lacked is a way to present data stored in several HDF5 datasets and files as a single HDF5 dataset, accessible through the HDF5 APIs, without rewriting or rearranging the data.
To address this, The HDF Group has implemented a new feature called the HDF5 Virtual Dataset (VDS).
The feature is a logical next step in the development of HDF5. It enables HDF5 users to access and work with data stored in a collection of HDF5 files using well-known tools, existing HDF5 applications, and higher-level libraries such as h5py, MATLAB, and IDL, without changing the way the data is collected and stored.
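The idea of presenting several source datasets as one dataset can be illustrated with a small toy model. The sketch below is plain Python and does not use the HDF5 library. The names `Mapping`, `ToyVirtualDataset`, and `read`, the one-dimensional layout, and the fill-value behavior for a missing source are all assumptions made for illustration, not the HDF5 API.

```python
from dataclasses import dataclass
from typing import Dict, List, Sequence


@dataclass(frozen=True)
class Mapping:
    """One virtual-to-source mapping (hypothetical, 1-D for simplicity)."""
    virtual_start: int  # first index covered in the virtual dataset
    length: int         # number of elements mapped
    source_name: str    # key identifying a source dataset
    source_start: int   # first index used in the source dataset


class ToyVirtualDataset:
    """Holds only mappings; the data itself stays in the sources."""

    def __init__(self, size: int, mappings: Sequence[Mapping], fill_value: int = 0):
        self.size = size
        self.mappings = list(mappings)
        self.fill_value = fill_value

    def read(self, start: int, stop: int, sources: Dict[str, List[int]]) -> List[int]:
        """Read virtual elements [start, stop) by pulling from the sources."""
        if not 0 <= start <= stop <= self.size:
            raise IndexError("read range is outside the virtual dataset")
        out = [self.fill_value] * (stop - start)
        for m in self.mappings:
            # Overlap between the requested range and this mapping.
            lo = max(start, m.virtual_start)
            hi = min(stop, m.virtual_start + m.length)
            if lo >= hi:
                continue
            src = sources.get(m.source_name)
            if src is None:
                continue  # missing source: fill value stays in place
            offset = m.source_start + (lo - m.virtual_start)
            out[lo - start:hi - start] = src[offset:offset + (hi - lo)]
        return out


if __name__ == "__main__":
    sources = {"a": [1, 2, 3], "b": [4, 5, 6]}
    vds = ToyVirtualDataset(6, [Mapping(0, 3, "a", 0), Mapping(3, 3, "b", 0)])
    print(vds.read(0, 6, sources))
```

The point of the model is that reads are resolved through the mappings at access time, so the source data is never copied or rearranged. It assumes each source is long enough to satisfy its mapping.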
The following examples illustrate situations that will benefit from use of virtual datasets:
HDF5 VDS User’s Guide (This document is not yet available.)
Until the HDF5 VDS User's Guide becomes available, users may find the following resources helpful:
RFC: HDF5 Virtual Dataset (PDF). This document includes several sections illustrating the use of virtual datasets (VDS) and discusses the VDS programming model, some feature constraints, and several use cases.
Note: The current version of this document reflects the design, strategies, and general approach employed in the VDS feature, but the implemented API differs from the one given in the specification. An update to the document is expected to correct this divergence.
| ||Sets the mapping between virtual and source datasets|
| ||Retrieves the number of mappings for the virtual dataset|
| ||Retrieves a dataspace identifier for the selection within the virtual dataset used in the mapping|
| ||Retrieves a dataspace identifier for the selection within the source dataset used in the mapping|
| ||Retrieves the name of a source dataset used in the mapping|
| ||Retrieves the filename of a source dataset used in the mapping|
| ||Sets maximum number of missing source files and/or datasets with printf-style names when getting the extent of an unlimited virtual dataset|
| ||Returns maximum number of missing source files and/or datasets with printf-style names when getting the extent for an unlimited virtual dataset|
| ||Sets the view of the virtual dataset to include or exclude missing mapped elements|
| ||Retrieves the view of a virtual dataset|
| ||Determines whether a hyperslab selection is regular|
| ||Retrieves a regular hyperslab selection|
| ||Specifies the layout to be used for a dataset|
| ||Retrieves the layout in use for a dataset|
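Two of the properties described above, the maximum number of missing printf-style sources and the view that includes or excludes missing mapped elements, both affect how far an unlimited virtual dataset appears to extend. The following plain-Python sketch is one interpretation of those one-line descriptions, not a statement of how the library is implemented. The function names, the 0-based block numbering, and the boolean `include_missing` flag are hypothetical.

```python
from typing import Set


def blocks_found_with_gap(available: Set[int], gap_size: int) -> int:
    """Scan blocks 0, 1, 2, ... and stop once more than gap_size
    consecutive blocks are missing. Returns the number of blocks up to
    and including the last one found (assumed semantics)."""
    if gap_size < 0:
        raise ValueError("gap_size must be non-negative")
    last_found = -1
    missing_run = 0
    index = 0
    while missing_run <= gap_size:
        if index in available:
            last_found = index
            missing_run = 0
        else:
            missing_run += 1
        index += 1
    return last_found + 1


def visible_block_count(available: Set[int], gap_size: int, include_missing: bool) -> int:
    """Number of blocks visible through the toy 'view'.

    include_missing=True  : extend to the last block found; missing
                            blocks in between would hold fill values.
    include_missing=False : stop at the first missing block.
    """
    if include_missing:
        return blocks_found_with_gap(available, gap_size)
    count = 0
    while count in available:
        count += 1
    return count


if __name__ == "__main__":
    present = {0, 1, 3}  # block 2 is missing
    print(visible_block_count(present, gap_size=1, include_missing=True))
    print(visible_block_count(present, gap_size=1, include_missing=False))
```

In this model a gap size of 0 ends the search at the first missing block, while a larger gap size lets the search continue past short runs of missing sources.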
The following additional documentation will be posted as it becomes available:
No new tools are necessary to examine or manipulate virtual datasets. Where necessary, existing HDF5 tools have been updated to be aware of the new properties, but tool operations on virtual datasets will be essentially transparent to the user.
The Virtual Dataset design document below describes feature requirements, how the feature works, and why design choices were made.
| RFC: HDF5 Virtual Dataset (PDF)||This document describes requirements that guided development of the Virtual Dataset (VDS) feature, feature constraints, several use cases, the VDS programming model, and some details of the implementation.|
This document contains useful illustrations that provide an intuitive understanding of virtual datasets.
Note: The current version of the document reflects the design, strategies, and general approach employed in the VDS feature, but the implemented API differs from the one given in the specification. An update to the document is expected to correct this divergence.