OpenMPI Build Issues

This page documents the current status of HDF5 tested with different versions of OpenMPI. We strongly recommend using the latest version of OpenMPI when possible. Please report any problems you find that are not documented on this page to the Service Desk.

HDF5 and OpenMPI

The HDF Group has tested HDF5-1.10.5 with various versions of OpenMPI. As a result of the efforts of both The HDF Group and our dedicated users, we have discovered the following issues to be aware of:

With OpenMPI 2.x through 3.1.3, as well as 4.0.0, potential data corruption and crashes have been observed due to bugs within OMPIO. For the time being, these problems have generally been resolved by switching to the ROMIO I/O backend, which is done by supplying the command-line option “--mca io ^ompio” to mpirun (the “^” excludes OMPIO, so ROMIO is selected instead). For more information, refer to 5

With OpenMPI 1.10 and the 3.0 series using the ROMIO I/O backend, crashes related to datatype flattening have been observed in the “t_filters_parallel” test on various Linux machines. Switching to the OMPIO I/O backend by adding “--mca io ompio” to mpirun has been sufficient to resolve these crashes. However, with OpenMPI 1.10, test failures still occur in “t_filters_parallel” due to a bug in MPI_Mprobe.

With OpenMPI 1.10 and 2.0.0 through 2.1.4, test failures have been observed in “testphdf5” and “t_bigio” due to an MPIO file driver write failure. As of OpenMPI 2.1.5 and 3.1.0, these tests appear to pass without problems.

With OpenMPI 3.0.0 through 3.0.2, test failures have been observed in “testphdf5” and “t_shapesame” due to an MPIO file driver write failure. As of OpenMPI 3.0.3, these tests appear to pass without problems.
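The version ranges listed above can be transcribed into a small lookup helper. The following is only a sketch: the function name ompi_known_issues is hypothetical, and the patterns simply restate the ranges on this page rather than an official compatibility matrix.

```shell
# Print the issues this page lists for a given OpenMPI version string.
# The patterns restate the version ranges described above; they are an
# illustrative transcription, not an authoritative compatibility table.
ompi_known_issues() {
    case "$1" in
        2.*|3.0.*|3.1.[0-3]|4.0.0)
            echo "OMPIO: possible data corruption and crashes" ;;
    esac
    case "$1" in
        1.10|1.10.*|3.0.*)
            echo "ROMIO: datatype flattening crashes in t_filters_parallel" ;;
    esac
    case "$1" in
        1.10|1.10.*|2.0.*|2.1.[0-4])
            echo "MPIO write failure in testphdf5 and t_bigio" ;;
    esac
    case "$1" in
        3.0.[0-2])
            echo "MPIO write failure in testphdf5 and t_shapesame" ;;
    esac
}

# Example: list everything recorded for OpenMPI 3.0.2.
ompi_known_issues 3.0.2
```

A version that matches none of the patterns prints nothing, which only means that this page records no issue for it.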

Where possible, we recommend that users update to the latest stable version of OpenMPI within a given series and ideally to the latest series available. Specifically (and as one might expect), the previous points show that we have found the best compatibility with HDF5 using OpenMPI 2.1.5 (2.1.6 has not yet been tested), 3.0.3, 3.1.3 and 4.0.0. While the data corruption issues discovered with OpenMPI 3.1.3 and 4.0.0 are serious enough to potentially warrant holding off on such an upgrade, the OpenMPI team has been made aware of the issues and they can be worked around in the meantime by switching to the ROMIO I/O backend.
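As a rough illustration of the backend switch, the sketch below builds an mpirun command line for either I/O backend. The function name mpirun_io_cmd, the process count and the application path are hypothetical. The command is printed rather than executed so that it can be reviewed first.

```shell
# Build an mpirun command line that pins the MPI-IO backend.
#   romio -> "--mca io ^ompio"  (the "^" excludes OMPIO, so ROMIO is used)
#   ompio -> "--mca io ompio"
# The command is only printed; copy it or wrap it in eval to run it.
mpirun_io_cmd() {
    backend="$1"
    shift
    case "$backend" in
        romio) io_value='^ompio' ;;
        ompio) io_value='ompio' ;;
        *) echo "unknown backend: $backend" >&2; return 1 ;;
    esac
    echo "mpirun --mca io $io_value $*"
}

# Hypothetical usage: 4 ranks of a placeholder application.
mpirun_io_cmd romio -np 4 ./my_parallel_app
```

When mpirun is launched indirectly (for example by "make check"), the same selection can usually be made through the environment instead, since OpenMPI reads MCA parameters from variables named OMPI_MCA_<parameter>, such as OMPI_MCA_io.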

Parallel HDF5

As of HDF5-1.10.5, Parallel HDF5 is supported with OpenMPI.

Please note that OpenMPI 3.0 has an issue handling datasets greater than 2 GB. There is a patch available for the issue. If you are still encountering issues with HDF5 and you are using a later version of OpenMPI or have applied this patch, then please try HDF5-1.10.6 (currently in "develop").

The t_bigio test fails when running "make check"

When building HDF5 with OpenMPI 1.10.x, the t_bigio test will fail. Users should update to a more recent version of OpenMPI to resolve the issue.

The issue is due to a bug in the OpenMPI MPI datatype code. This bug was fixed in the latest versions of OpenMPI 2.1.x, 3.0.x, 3.1.x, and 4.0.x.
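Before running "make check", it can help to confirm which OpenMPI series is installed. The sketch below assumes that the first line of "mpirun --version" has the form "mpirun (Open MPI) X.Y.Z". The helper names are hypothetical.

```shell
# Extract the version number from the first line of `mpirun --version`,
# assumed here to look like "mpirun (Open MPI) X.Y.Z". Reads from stdin.
ompi_version_from_banner() {
    sed -n '1s/^mpirun (Open MPI) \([0-9][0-9.]*\).*$/\1/p'
}

# Succeed (exit status 0) when the version belongs to the 1.10 series.
is_ompi_1_10() {
    case "$1" in
        1.10|1.10.*) return 0 ;;
        *) return 1 ;;
    esac
}

# Only query mpirun if it is actually installed.
if command -v mpirun >/dev/null 2>&1; then
    ver=$(mpirun --version 2>/dev/null | ompi_version_from_banner)
    if is_ompi_1_10 "$ver"; then
        echo "OpenMPI $ver detected: expect t_bigio to fail; consider upgrading."
    fi
fi
```

If mpirun belongs to a different MPI implementation, the pattern does not match and the check stays silent.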

The following output appears when the test fails because of this issue:

MPI tests finished with no errors
0.87user 1.38system 0:01.21elapsed 186%CPU (0avgtext+0avgdata 123696maxresident)k
0inputs+128outputs (7major+84441minor)pagefaults 0swaps
Finished testing t_mpi
make[4]: Leaving directory `/home/users/ntu/juntao00/hdf5-1.10.4/testpar'
make[4]: Entering directory `/home/users/ntu/juntao00/hdf5-1.10.4/testpar'
Testing t_bigio
t_bigio Test Log
Testing Dataset1 write by ROW
Testing Dataset2 write by COL
Testing Dataset3 write select ALL proc 0, NONE others
Testing Dataset4 write point selection
MPI_ABORT was invoked on rank 1 in communicator MPI_COMM_WORLD
with errorcode 1.
NOTE: invoking MPI_ABORT causes Open MPI to kill all MPI processes.
You may or may not see output from other processes, depending on
exactly when Open MPI kills them.
[ntu02:26084] 2 more processes have sent help message help-mpi-api.txt / mpi-abort
[ntu02:26084] Set MCA parameter "orte_base_help_aggregate" to 0 to see all help / error messages
13.80user 3.49system 0:05.98elapsed 289%CPU (0avgtext+0avgdata 804020maxresident)k
0inputs+112outputs (7major+376166minor)pagefaults 0swaps
make[4]: *** [t_bigio.chkexe_] Error 1

Compiler complains about missing MPI calls

Please be aware that releases prior to HDF5-1.10.5 used MPI functions that have been deprecated since MPI 2.0. OpenMPI 4.0 removed those deprecated functions, which causes the compiler to complain about missing MPI calls. Examples of the affected functions are MPI_Type_extent, MPI_Address and MPI_Type_struct.

To work around this issue, either upgrade to HDF5-1.10.5 or later, or build OpenMPI with MPI-1 backward compatibility enabled (the --enable-mpi1-compatibility configure option in OpenMPI 4.0).
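To see whether a source tree still calls the functions named above, a recursive search is usually enough. This is a minimal sketch: the function name find_removed_mpi1_calls is hypothetical, and the search covers only the three functions mentioned on this page, not every MPI-1 function that OpenMPI 4.0 removed.

```shell
# Search a source tree for the three removed MPI-1 calls named above.
# Their MPI-2 replacements are MPI_Type_get_extent, MPI_Get_address and
# MPI_Type_create_struct. -w matches whole words only, so identifiers
# that merely contain one of these names are not reported.
find_removed_mpi1_calls() {
    grep -rnwE 'MPI_Type_extent|MPI_Address|MPI_Type_struct' "${1:-.}"
}

# Usage (hypothetical path to an unpacked source tree):
#   find_removed_mpi1_calls ./hdf5-1.10.4/src
#
# If the calls cannot be replaced, OpenMPI 4.0 itself can be rebuilt
# with the MPI-1 symbols enabled, for example:
#   ./configure --enable-mpi1-compatibility && make && make install
```

Each match is printed as file:line:text, so no output means that none of the three calls were found.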

The parallel flush test fails

Prior to HDF5-1.10.5, the parallel flush test failed. This failure can be ignored. Redirect the "make check" output to a file and use the "-i" option to ignore errors:

make -i check >& check.output

Then view the output file (check.output) for any errors that occurred. If the parallel flush test failure is the only error, then you can ignore it.
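Scanning a long check.output by eye is error-prone, so the failing targets can be pulled out with a small filter. The sketch below assumes GNU make's usual failure lines of the form "[target] Error N", with "(ignored)" appended under -i. The function name failed_check_targets is hypothetical.

```shell
# List the make targets that failed in a saved "make -i check" log.
# Assumes GNU make's "[target] Error N" failure lines.
# A POSIX-shell equivalent of the redirect shown above is:
#   make -i check > check.output 2>&1
failed_check_targets() {
    sed -nE 's/.*\[([^]]+)\] Error ([0-9]+).*/\1 (exit \2)/p' "$1"
}

# Usage:
#   failed_check_targets check.output
```

If the only target reported is the parallel flush test, the run matches the known failure described above.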