Test newer versions of OpenMPI 4.X.X series
I may have observed the effects of a bug in older versions of the OpenMPI 4.0.X series when attempting to run a single-node HPL calculation on Expanse with the Singularity.hpl-2.3-ubuntu-18.04-openmpi-4.0.4-openblas-0.3.14 container. The single-node job fails at startup with the set of PMIX errors shown in [1]. This issue appears to have been observed previously [2] [3] [4]. Unfortunately, the suggested temporary workarounds of setting PMIX_MCA_gds=^ds21 or PMIX_MCA_gds=hash do not work. However, it seems that the bug causing the problem should be fixed in the latest releases of the OpenMPI 4.X.X series. Hence, this commit adds new Ubuntu 18.04 + OpenMPI 4.0.5 and Ubuntu 18.04 + OpenMPI 4.1.0 definition files.

[1]
[exp-8-32:06710] PMIX ERROR: NOT-FOUND in file dstore_base.c at line 2866
[exp-8-32:06710] PMIX ERROR: NOT-FOUND in file server/pmix_server.c at line 3408
[exp-8-32:06742] PMIX ERROR: OUT-OF-RESOURCE in file client/pmix_client.c at line 231
[exp-8-32:06742] OPAL ERROR: Error in file pmix3x_client.c at line 112
*** An error occurred in MPI_Init
*** on a NULL communicator
*** MPI_ERRORS_ARE_FATAL (processes in this communicator will now abort,
*** and potentially your MPI job)
[exp-8-32:06742] Local abort before MPI_INIT completed completed successfully, but am not able to aggregate error messages, and not able to guarantee that all other processes were killed!
--------------------------------------------------------------------------
Primary job terminated normally, but 1 process returned a non-zero exit code. Per user-direction, the job has been aborted.
--------------------------------------------------------------------------
--------------------------------------------------------------------------
mpirun detected that one or more processes exited with non-zero status, thus causing the job to be terminated. The first process to do so was:

  Process name: [[43048,1],0]
  Exit code:    1
--------------------------------------------------------------------------
[exp-8-32:06710] PMIX ERROR: ERROR in file gds_ds21_lock_pthread.c at line 99
[exp-8-32:06710] PMIX ERROR: ERROR in file gds_ds21_lock_pthread.c at line 99

[2] https://github.com/open-mpi/ompi/issues/6761
[3] https://github.com/open-mpi/ompi/issues/6981
[4] https://github.com/open-mpi/ompi/issues/7516
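The PMIX_MCA_gds workarounds from the linked issues are applied by setting an environment variable before launching the job. A minimal sketch of doing that from Python is below, assuming the variable only needs to be present in the environment of the launch command (Singularity passes the host environment into the container by default). The helper name run_with_pmix_gds and the probe command are hypothetical stand-ins for the real mpirun + singularity exec launch line. As noted above, neither value resolved the failure here.

```python
import os
import subprocess
import sys

# The two PMIx gds settings suggested in the linked Open MPI issues.
SUGGESTED_GDS_VALUES = ("^ds21", "hash")


def run_with_pmix_gds(cmd, gds_value):
    """Run cmd with PMIX_MCA_gds set to gds_value (hypothetical helper).

    The variable is added to a copy of the current environment, so the
    calling process's own environment is left unchanged.
    """
    env = dict(os.environ, PMIX_MCA_gds=gds_value)
    return subprocess.run(cmd, env=env, capture_output=True, text=True)


if __name__ == "__main__":
    # Hypothetical probe standing in for the real launch command; it just
    # echoes the variable back to show that the child process received it.
    probe = [sys.executable, "-c", "import os; print(os.environ['PMIX_MCA_gds'])"]
    for value in SUGGESTED_GDS_VALUES:
        result = run_with_pmix_gds(probe, value)
        print(value, "->", result.stdout.strip())
```

Copying os.environ rather than mutating it keeps each trial isolated, so both settings can be tried back to back from the same script.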
- definition-files/ubuntu/Singularity.ubuntu-18.04-openmpi-4.0.5 (134 additions, 0 deletions)
- definition-files/ubuntu/Singularity.ubuntu-18.04-openmpi-4.1.0 (134 additions, 0 deletions)