4.15. Advice for packagers
4.15.1. Do not use Open MPI’s internal dependent libraries
The Open MPI community strongly suggests that binary Open MPI packages should not include Hwloc, Libevent, PMIx, or PRRTE. Although several of these libraries are required by Open MPI (and are therefore bundled in the Open MPI source code distribution for end-user convenience), binary Open MPI packages should limit themselves solely to Open MPI artifacts. Specifically: configure and build Open MPI against external installations of these required packages.
Packagers may therefore wish to configure Open MPI with something like the following:
# Install Sphinx so that Open MPI can re-build its docs with the
# installed PRRTE's docs
virtualenv venv
. ./venv/bin/activate
pip install -r docs/requirements.txt
./configure --with-libevent=external --with-hwloc=external \
--with-pmix=external --with-prrte=external ...
Important
Note the installation of the Sphinx tool so that Open MPI can re-build its documentation with the external PRRTE’s documentation.
Failure to do this will mean Open MPI’s documentation will be correct for the version of PRRTE that is bundled in the Open MPI distribution, but may not be entirely correct for the version of PRRTE that you are building against.
The external keywords will force Open MPI’s configure to ignore all the bundled libraries and only look for external versions of these support libraries. This also has the benefit of causing configure to fail if it cannot find the required support libraries outside of the Open MPI source tree — a good sanity check to ensure that your package is correctly relying on the independently-built and installed versions.
See this section for more information about the required support library --with-FOO command line options.
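If the support libraries are installed into non-default prefixes rather than system locations, the --with-FOO options can also be given an installation directory instead of the external keyword. A minimal sketch, where the /opt/... paths are placeholders for wherever your distribution installs these libraries:
# Point configure at explicit installation prefixes for the required
# support libraries (the /opt/... paths are hypothetical)
shell$ ./configure --with-libevent=/opt/libevent \
           --with-hwloc=/opt/hwloc \
           --with-pmix=/opt/pmix \
           --with-prrte=/opt/prrte ...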
4.15.2. Have Sphinx installed
Since you should be (and will be) installing Open MPI against an external PRRTE and PMIx, you should have Sphinx installed before running Open MPI’s configure script.
This will allow Open MPI to (re-)build its documentation according to the PMIx and PRRTE that you are building against.
To be clear: the Open MPI distribution tarball comes with pre-built documentation — rendered in HTML and nroff — that is suitable for the versions of PRRTE and PMIx that are bundled in that tarball.
However, if you are building Open MPI against non-bundled versions of PRRTE / PMIx (as all packagers should be), Open MPI needs to re-build its documentation with specific information from those external PRRTE / PMIx installs. For that, you need to have Sphinx installed before running Open MPI’s configure script.
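As a quick pre-flight check before running configure, you can verify that a suitable sphinx-build is on your PATH (the version shown below is only illustrative):
# Confirm that Sphinx is available in the build environment
shell$ sphinx-build --version
sphinx-build 7.2.6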
4.15.3. Components (“plugins”): static or DSO?
Open MPI contains a large number of components (sometimes called “plugins”) to effect different types of functionality in MPI. For example, some components effect Open MPI’s networking functionality: they may link against specialized libraries to provide highly-optimized network access.
Open MPI can build its components as Dynamic Shared Objects (DSOs) or include them statically in its core libraries (regardless of whether those libraries are built as shared or static libraries).
Note
As of Open MPI head of development, configure’s global default is to build all components as static (i.e., as part of the Open MPI core libraries, not as DSOs). Prior to Open MPI v5.0.0, the global default behavior was to build most components as DSOs.
4.15.3.1. Why build components as DSOs?
There are advantages to building components as DSOs:
Open MPI’s core libraries — and therefore MPI applications — will have very few dependencies. For example, if you build Open MPI with support for a specific network stack, the libraries in that network stack will be dependencies of the DSOs, not Open MPI’s core libraries (or MPI applications).
Removing Open MPI functionality that you do not want is as simple as removing a DSO from $libdir/openmpi.
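For example, after make install, a packager could inspect the installed component directory and prune an unwanted DSO; the mca_btl_tcp.so name below is only an illustration of the mca_<framework>_<component>.so naming pattern:
# List the components that were installed as DSOs
shell$ ls $libdir/openmpi/mca_*.so

# Remove one unwanted component (hypothetical example: the TCP BTL)
shell$ rm $libdir/openmpi/mca_btl_tcp.so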
4.15.3.2. Why build components as part of Open MPI’s core libraries?
The biggest advantage to building the components as part of Open MPI’s core libraries is when running at (very) large scales when Open MPI is installed on a network filesystem (vs. being installed on a local filesystem).
For example, consider launching a single MPI process on each of 1,000 nodes. In this scenario, the following is accessed from the network filesystem:
1. The MPI application
2. The core Open MPI libraries and their dependencies (e.g., libmpi). Depending on your configuration, this is probably on the order of 10-20 library files.
3. All DSO component files and their dependencies. Depending on your configuration, this can be 200+ component files.
If all components are physically located in the libraries, then the third step loads zero DSO component files. When using a networked filesystem while launching at scale, this can translate to large performance savings.
Note
If not using a networked filesystem, or if not launching at scale, loading a large number of DSO files may not consume a noticeable amount of time during MPI process launch. Put simply: loading DSOs as individual files generally only matters when using a networked filesystem while launching at scale.
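If you want to gauge the scale for your own build, a rough count of the DSO component files that each MPI process may open at launch is sketched below (only meaningful for builds whose components are DSOs):
# Count the DSO component files in the installation
shell$ ls $libdir/openmpi/mca_*.so | wc -l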
4.15.3.3. Direct controls for building components as DSOs or not
Open MPI head of development has two configure-time defaults regarding the treatment of components that may be of interest to packagers:
Open MPI’s libraries default to building as shared libraries (vs. static libraries). For example, on Linux, Open MPI will default to building libmpi.so (vs. libmpi.a).
Note
See the descriptions of --disable-shared and --enable-static in this section for more details about how to change this default; a configure sketch follows this list. Also be sure to see this warning about building static apps.
Open MPI will default to including its components in its libraries (as opposed to being compiled as dynamic shared objects, or DSOs). For example, libmpi.so on Linux systems will contain the UCX PML component, instead of the UCX PML being compiled into mca_pml_ucx.so and dynamically opened at run time via dlopen(3).
Note
See the descriptions of --enable-mca-dso and --enable-mca-static in this section for more details about how to change these defaults.
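For instance, a packager who wants to change the first default and produce static core libraries could do so roughly as follows. This is only a sketch of the --enable-static / --disable-shared options named above; adjust it to your packaging policy:
# Also build static core libraries (e.g., libmpi.a) alongside the
# default shared libraries; see the warning about static applications
shell$ ./configure --enable-static ...

# Or build only static core libraries
shell$ ./configure --enable-static --disable-shared ...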
A side effect of these two defaults is that all the components included in the Open MPI libraries will bring their dependencies with them. For example (on Linux), if the XYZ PML component in the MPI layer requires libXYZ.so, then these defaults mean that libmpi.so will depend on libXYZ.so. This dependency will likely be telegraphed into the Open MPI binary package that includes libmpi.so.
Conversely, if the XYZ PML component was built as a DSO, then — assuming no other parts of Open MPI require libXYZ.so — libmpi.so would not be dependent on libXYZ.so. Instead, the mca_pml_xyz.so DSO would have the dependency upon libXYZ.so.
Packagers can use these facts to potentially create multiple binary Open MPI packages, each with different dependencies, by (for example) using --enable-mca-dso to selectively build some components as DSOs and leave the others included in their respective Open MPI libraries.
See the section on building accelerator support for a practical example where this can be useful.
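As a concrete sketch of the approach described above, consider the UCX PML example from earlier in this section: building only that component as a DSO moves the UCX dependency out of libmpi.so and into mca_pml_ucx.so, which could then be shipped in a separate sub-package. The make and ldd invocations below are only illustrative:
# Build only the UCX PML as a DSO; all other components remain in the
# core Open MPI libraries
shell$ ./configure --enable-mca-dso=pml-ucx ...
shell$ make -j 8 all install

# Compare the dependency lists: the UCX libraries should appear for the
# DSO, but not for libmpi.so (exact output varies by system)
shell$ ldd $libdir/libmpi.so
shell$ ldd $libdir/openmpi/mca_pml_ucx.so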
4.15.3.4. GNU Libtool dependency flattening
When compiling Open MPI’s components statically as part of Open MPI’s core libraries, GNU Libtool — which is used as part of Open MPI’s build system — will attempt to “flatten” dependencies.
For example, the ompi_info(1) command links against the Open MPI core library libopen-pal. This library will have dependencies on various HPC-class network stack libraries. For simplicity, the discussion below assumes that Open MPI was built with support for Libfabric and UCX, and therefore libopen-pal has direct dependencies on libfabric and libucx.
In this scenario, GNU Libtool will automatically attempt to “flatten” these dependencies by linking ompi_info(1) directly to libfabric and libucx (vs. letting libopen-pal pull the dependencies in at run time).
In some environments (e.g., Ubuntu 22.04), the compiler and/or linker will automatically utilize the linker CLI flag -Wl,--as-needed, which will effectively cause these dependencies to not be flattened: ompi_info(1) will not have a direct dependency on either libfabric or libucx.
In other environments (e.g., Fedora 38), the compiler and linker will not utilize the -Wl,--as-needed linker CLI flag. As such, ompi_info(1) will show direct dependencies on libfabric and libucx.
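To see which of these behaviors your build environment produced, you can inspect ompi_info(1)’s shared-library dependencies directly; this is only a diagnostic sketch, and the output depends on your distribution, compiler, and configuration:
# Inspect ompi_info's direct shared-library dependencies; look for
# libfabric / UCX entries to see whether the dependencies were flattened
shell$ ldd $(which ompi_info)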
Just to be clear: these flattened dependencies are not a problem. Open MPI will function correctly with or without the flattened dependencies. There is no performance impact associated with having — or not having — the flattened dependencies. We mention this situation here in the documentation simply because it surprised some Open MPI downstream package managers to see that ompi_info(1) in Open MPI head of development had more shared library dependencies than it did in prior Open MPI releases.
If packagers want ompi_info(1) to not have these flattened dependencies, use either of the following mechanisms (a configure sketch appears after the note below):
Use --enable-mca-dso to force all components to be built as DSOs (this was actually the default behavior before Open MPI v5.0.0).
Add LDFLAGS=-Wl,--as-needed to the configure command line when building Open MPI.
Note
The Open MPI community specifically chose not to automatically utilize this linker flag for the following reasons:
Having the flattened dependencies does not cause any correctness or performance problems.
There are multiple mechanisms (see above) for users or packagers to change this behavior, if desired.
Certain environments have chosen to have — or not have — this flattened dependency behavior. It is not Open MPI’s place to override these choices.
In general, Open MPI’s configure script only utilizes compiler and linker flags if they are needed. All other flags should be the user’s / packager’s choice.
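A sketch of the second mechanism listed above, passing the linker flag through configure so that it is used when ompi_info(1) and the other executables are linked:
# Ask the linker to omit shared-library dependencies that are not
# actually needed by each executable
shell$ ./configure LDFLAGS=-Wl,--as-needed ...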
4.15.3.5. Building accelerator support as DSOs
If you are building a package that includes support for one or more accelerators, it may be desirable to build accelerator-related components as DSOs (see the static or DSO? section for details).
Rationale
Accelerator hardware is expensive, and may only be present on some compute nodes in an HPC cluster. Specifically: there may not be any accelerator hardware on “head” or compile nodes in an HPC cluster. As such, invoking Open MPI commands on a “head” node with an MPI that was built with static accelerator support but no accelerator hardware may fail to launch because of run-time linker issues (because the accelerator hardware support libraries are likely not present).
Building Open MPI’s accelerator-related components as DSOs allows Open MPI to try opening the accelerator components, but proceed if those DSOs fail to open due to the lack of support libraries.
Using the --enable-mca-dso command line parameter to Open MPI’s configure command allows packagers to build all accelerator-related components as DSOs. For example:
# Build all the accelerator-related components as DSOs (all other
# components will default to being built in their respective
# libraries)
shell$ ./configure --enable-mca-dso=btl-smcuda,rcache-rgpusm,rcache-gpusm,accelerator
Per the example above, this allows packaging $libdir as part of the “main” Open MPI binary package, but then packaging $libdir/openmpi/mca_accelerator_*.so and the other named components as sub-packages. These sub-packages may inherit dependencies on the CUDA and/or ROCm packages, for example. The “main” package can be installed on all nodes, and the accelerator-specific sub-package can be installed on only the nodes with accelerator hardware and support libraries.
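For example, a packager might split the installed files roughly along the following lines; the file list is only illustrative, and the exact set of accelerator-related DSOs depends on which components were named in --enable-mca-dso:
# DSOs for a hypothetical "openmpi-accelerator" sub-package; these files
# carry the CUDA and/or ROCm dependencies, while everything else stays
# in the "main" Open MPI package
shell$ ls $libdir/openmpi/mca_accelerator_*.so \
           $libdir/openmpi/mca_btl_smcuda.so \
           $libdir/openmpi/mca_rcache_rgpusm.so \
           $libdir/openmpi/mca_rcache_gpusm.so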