For NOAA and NASA, the data problem with their Joint Polar Satellite System (JPSS) is how best to handle the large-volume data stream from the five instruments on each satellite. JPSS is a new generation of low-Earth-orbiting satellites that monitor environmental conditions and provide data for long-range weather and climate forecasts.
Here is NOAA's description of one of the instruments from a 2013 news release:
CrIS is the Cross-track Infrared Sounder. CrIS is the first in a series of advanced operational sounders that provides more accurate, detailed atmospheric temperature and moisture observations for weather and climate applications. This high-spectral-resolution infrared instrument measures the three-dimensional structure of atmospheric temperatures, water vapor and trace gases. It provides more than 1,000 infrared spectral channels at an improved horizontal spatial resolution and measures temperature profiles with keen vertical resolution to an accuracy approaching 1 Kelvin (the absolute temperature scale). This information helps significantly improve prediction, including both short-term weather “now casting” and longer-term forecasting. It also provides a vital tool for NOAA to continuously take the pulse of the planet and assist in understanding major seasonal and multi-year shifts.
The solution: HDF5 and custom tools.
We know from other projects, such as NASA's Earth Science Data and Information System (ESDIS) project, that HDF Group software can store large amounts of climate data. Over 13 years (as of September 30, 2013), ESDIS has archived 9.8 petabytes of climate data using our software. Each petabyte is a million gigabytes.
How might data be extracted in a timely manner from such a large dataset? A data granule holds the data from a short period of observation by an instrument, and data is stored by granule in HDF5 files. With a custom tool built by The HDF Group, data granules can be aggregated and extracted, so only the data for a certain period of time and a limited location need be retrieved for study. This is far more efficient than opening the entire file to access any portion of the data. Since the archived data does not change, granules can be extracted repeatedly, and the data files themselves need be downloaded only once.
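The idea behind granule-level access can be sketched in plain Python: given the start time of each granule, only the granules whose time span overlaps a requested window need to be read, rather than the whole file. The granule duration and dataset size below are illustrative, not actual JPSS metadata values.

```python
from datetime import datetime, timedelta

GRANULE_SECONDS = 85.75  # illustrative granule duration; real values vary by instrument

def select_granules(granule_starts, window_start, window_end):
    """Return indices of granules whose time span overlaps [window_start, window_end)."""
    hits = []
    for i, start in enumerate(granule_starts):
        end = start + timedelta(seconds=GRANULE_SECONDS)
        if start < window_end and end > window_start:
            hits.append(i)
    return hits

# 1000 consecutive granules starting at midnight:
starts = [datetime(2013, 9, 30, 0, 0) + timedelta(seconds=GRANULE_SECONDS * i)
          for i in range(1000)]

# A ten-minute window touches only 8 of the 1000 granules:
wanted = select_granules(starts,
                         datetime(2013, 9, 30, 0, 10),
                         datetime(2013, 9, 30, 0, 20))
```

In the real files the per-granule index lives in the HDF5 metadata, so this selection happens without reading any raw instrument data.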
The HDF Group
The software that The HDF Group has developed for the JPSS project is described below.
The HDF Group developers created and currently support the following tools for the JPSS project:
With h5edit, users can edit attributes in an HDF5 file.
With h5augjpss, users can modify a JPSS product file to make it accessible by netCDF-4 based applications.
The nagg tool was created to provide individual users the ability to rearrange product data granules from downloaded files into new files with aggregations or packaging that are better suited as input for a particular application.
The prototype Chunking and Compression Performance (CPP) tool allows users to access their data files using different parameters such as chunk sizes, compression methods, access patterns, and chunk cache settings. The tool provides performance statistics to help users find the optimum parameters for creating and accessing their HDF5 files.
JPSS data is distributed in HDF5 files containing raw data and indexing metadata that allows fast access to the raw data. The HDF Group continues to develop software libraries and tools to improve access to this data. As part of this effort, The HDF Group has created a library of C and Fortran routines to access and manipulate data referenced by object and region references and to access and manipulate data packed into integer values. We continue to seek feedback from JPSS applications developers and users, as well as from the wider HDF5 community, and will improve this library as requested.
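JPSS products store quality flags as bit fields packed into integer values; the packed-data routines mentioned above unpack and repack such fields. The underlying bit arithmetic can be sketched in a few lines of Python. The two-field layout here (scan quality in bits 0-1, calibration state in bits 2-4) is invented for illustration; the real field layouts are defined by the individual JPSS product specifications.

```python
def unpack_field(packed, offset, length):
    """Extract an unsigned bit field of `length` bits, `offset` bits from the LSB."""
    mask = (1 << length) - 1
    return (packed >> offset) & mask

def pack_field(packed, offset, length, value):
    """Return `packed` with the given bit field replaced by `value`."""
    mask = (1 << length) - 1
    return (packed & ~(mask << offset)) | ((value & mask) << offset)

# Invented layout: bits 0-1 = scan quality (0-3), bits 2-4 = calibration state (0-7)
byte = 0
byte = pack_field(byte, 0, 2, 3)   # scan quality = 3
byte = pack_field(byte, 2, 3, 5)   # calibration state = 5
```

The C and Fortran library routines do the equivalent shifting and masking for whole arrays of packed values.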
The HDF Group is maintaining HDF5 software on the following systems used by JPSS:
The latest versions of documentation for software developed for the JPSS project are available for download. See the list below.
The latest versions of the software developed for the JPSS project are available for download. See the list below.
The library contains C and Fortran APIs to:
The 1.1.5 release is a minor release. It was tested with HDF5-1.8.17 and HDF5-1.10.0-patch1.
Please see the Release Notes for complete details regarding this release.
HL REGION 1.1.5 Source Code
Source code is available for Unix (tar) and Windows (zip). A check-sum (MD5 format) is provided for the source tar file.
HL REGION 1.1.5 Pre-built Binary Distributions
The pre-built binary distributions in the table below include the HL REGION libraries and include files.
| Platform | Compilers |
|---|---|
| Linux 2.6 CentOS 6 x86_64 | gcc, gfortran 4.4.7 (w/Fortran) |
| Windows (64-bit) | CMake VS 2013 C, gfortran |
| Windows (64-bit) | CMake VS 2015 C, gfortran |
nagg is a tool for aggregating JPSS data granules from existing files into new files with a different number of granules per file or different combinations of compatible products than in the original files. The tool was created to provide individual users the ability to rearrange NPP product data granules from downloaded files into new files with aggregations or packaging that are better suited as input for a particular application.
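nagg's core operation, regrouping an ordered sequence of granules into a different aggregation size, can be sketched as simple batching. The real tool also aligns output files to a fixed aggregation schedule and handles packaging of compatible products together, which this sketch omits; the granule names are invented.

```python
def reaggregate(granules, per_file):
    """Regroup an ordered list of granules into output files of `per_file` granules each."""
    return [granules[i:i + per_file] for i in range(0, len(granules), per_file)]

# Ten granules downloaded as ten single-granule files, repackaged four per file:
granules = [f"granule_{i:03d}" for i in range(10)]
files = reaggregate(granules, 4)
# -> 3 output files holding 4, 4, and 2 granules
```

An application that processes, say, one orbit at a time can thus be fed files sized to match, instead of whatever granules-per-file arrangement the archive happened to deliver.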
The 1.6.2 release provides an environment variable to override the limit on the total number of granules processed at run time and adds two new command options:
The tool was tested on Linux 64-bit systems. For more information on this release, see the RELEASE.txt file.
Source code and binaries
(For earlier versions, see: All Releases)
The h5edit tool is a command-line tool to edit HDF5 files. The current version is limited to operations on attributes only. It supports:
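As an illustration of the command style, an edit session might pass a command file to the tool; the attribute names and values below are invented, and the exact option names and attribute-definition grammar should be checked against the h5edit Reference Manual.

```
h5edit --command-file commands.txt granule.h5
```

where `commands.txt` contains attribute commands such as:

```
CREATE /All_Data/dset1 valid_range {DATATYPE H5T_STD_I32LE DATASPACE SIMPLE ( 2 ) DATA { 0, 100 }};
DELETE /All_Data/dset1 obsolete_attr;
```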
The h5edit 1.3.1 release contains the following changes:
See the Release Notes for complete details on this release.
Source code (tgz format)
The h5augjpss tool is designed to modify a JPSS HDF5 product file to be accessible by the netCDF-4 version 4.1.3 library. The tool:
The tool was tested on Linux 32 and 64-bit systems. For more information, see the RELEASE.txt file.
Source code (tgz format)
See the README.txt file in the source code for information on building and running h5augjpss.
The Chunking and Compression Performance (CPP) tool is a prototype designed to help assess the effect of using various file parameters and access patterns on performance and storage.
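The kind of tradeoff the tool measures can be illustrated with a little arithmetic: the chunk shape determines both how many chunks a dataset occupies and how many chunks a given access pattern must touch. The dataset and chunk shapes below are made-up examples, not JPSS defaults.

```python
import math

def chunk_grid(dataset_shape, chunk_shape):
    """Number of chunks along each dimension, and in total."""
    grid = [math.ceil(d / c) for d, c in zip(dataset_shape, chunk_shape)]
    return grid, math.prod(grid)

def chunks_touched(sel_start, sel_shape, chunk_shape):
    """How many chunks a rectangular (hyperslab) selection intersects."""
    count = 1
    for s0, n, c in zip(sel_start, sel_shape, chunk_shape):
        first = s0 // c
        last = (s0 + n - 1) // c
        count *= last - first + 1
    return count

# A 3000 x 4000 dataset stored in 100 x 100 chunks:
grid, total = chunk_grid((3000, 4000), (100, 100))        # 30 x 40 = 1200 chunks
# Reading one full row touches 40 chunks; an unaligned 100 x 100 tile touches 4:
row_cost = chunks_touched((0, 0), (1, 4000), (100, 100))
tile_cost = chunks_touched((50, 50), (100, 100), (100, 100))
```

Each touched chunk must be read (and, if compressed, decompressed) in full, so matching the chunk shape and chunk cache size to the dominant access pattern is exactly the tuning question the CPP tool helps answer empirically.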
The tool was tested on Linux 64-bit systems. For more information, see the README.txt file.
Source code (tgz format)