11.2.7. ROCm
ROCm is the name of the software stack used by AMD GPUs. It includes the ROCm Runtime (ROCr), the HIP programming model, and numerous numerical and machine learning libraries tuned for the AMD Instinct accelerators. More information can be found on the AMD ROCm webpages.
11.2.7.1. Building Open MPI with ROCm support
ROCm-aware support means that the MPI library can send and receive data from AMD GPU device buffers directly. In Open MPI v5.0.x, ROCm support is available through UCX. While other communication transports might work as well, UCX is the only transport formally supported for ROCm devices in this release series.
Because UCX provides the ROCm support, UCX itself must be built with ROCm support.
To see if your UCX library was built with ROCm support, run the following command:
# Check if ucx was built with ROCm support
shell$ ucx_info -v
# configured with: --with-rocm=/opt/rocm --without-knem --without-cuda
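The check above can be scripted so that a build or job script fails early when UCX lacks ROCm support. The following is a minimal sketch, not part of Open MPI or UCX: the function name `ucx_has_rocm` is hypothetical, and it assumes only that the `ucx_info -v` output contains a "configured with" line listing the configure flags, as in the output shown above.

```shell
# Hypothetical helper: reads `ucx_info -v` output on stdin and succeeds
# (exit status 0) only if the "configured with" line lists --with-rocm.
# Note that "--without-rocm" does not contain the string "--with-rocm",
# so a UCX build that explicitly disabled ROCm is correctly rejected.
ucx_has_rocm() {
    grep -i -q -e 'configured with.*--with-rocm'
}
```

It could be used as `ucx_info -v | ucx_has_rocm || echo "UCX was not built with ROCm support"`.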
If you need to build the UCX library yourself to include ROCm support, please see the UCX documentation for building UCX with Open MPI. The configure steps should look something like:
# Configure UCX with ROCm support
shell$ cd ucx
shell$ ./configure --prefix=/path/to/ucx-rocm-install \
           --with-rocm=/opt/rocm --without-knem
# Configure Open MPI with UCX and ROCm support
shell$ cd ompi
shell$ ./configure --with-rocm=/opt/rocm \
           --with-ucx=/path/to/ucx-rocm-install \
           <other configure params>
11.2.7.2. Checking that Open MPI has been built with ROCm support
Verify that Open MPI has been built with ROCm using the ompi_info(1) command:
# Use ompi_info to verify ROCm support in Open MPI
shell$ ./ompi_info | grep "MPI extensions"
MPI extensions: affinity, cuda, ftmpi, rocm
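For use in scripts, the same verification can be reduced to an exit status. This is a small sketch under stated assumptions: the function name `ompi_has_rocm_ext` is hypothetical, and it assumes the "MPI extensions" line is a comma-separated list in the format shown above.

```shell
# Hypothetical helper: reads `ompi_info` output on stdin and succeeds
# (exit status 0) only if "rocm" appears as a separate entry on the
# "MPI extensions" line.
ompi_has_rocm_ext() {
    grep -E -q 'MPI extensions:(.*[ ,])?rocm([ ,]|$)'
}
```

It could be used as `ompi_info | ompi_has_rocm_ext && echo "Open MPI has the rocm extension"`.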
11.2.7.3. Using ROCm-aware UCX with Open MPI
If UCX and Open MPI have been configured with ROCm support, specifying the UCX pml component is sufficient to take advantage of the ROCm support in the libraries. For example, the command to execute the osu_latency benchmark from the OSU benchmarks with ROCm buffers using Open MPI and UCX ROCm support is something like this:
shell$ mpirun -n 2 --mca pml ucx \
           ./osu_latency D D
Note: some additional configure flags are required to compile the OSU benchmark to support ROCm buffers. Please refer to the UCX ROCm instructions for details.
11.2.7.4. Runtime querying of ROCm support in Open MPI
Starting with Open MPI v5.0.0, MPIX_Query_rocm_support(3) is available as an extension to check the availability of ROCm support in the library. To use the function, the code needs to include mpi-ext.h. Note that mpi-ext.h is an Open MPI specific header file.
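A runtime query could look roughly like the following. To stay consistent with the shell listings in this section, the sketch is a shell snippet that writes a small C test program to a file. The file name `rocm_query.c` is hypothetical. The guard macro `OMPI_HAVE_MPI_EXT_ROCM` reflects my understanding of how Open MPI's mpi-ext.h advertises the extension and should be checked against the installed header.

```shell
# Write a small C test program (hypothetical file name) that queries
# ROCm-aware support at run time.
cat > rocm_query.c <<'EOF'
#include <stdio.h>
#include <mpi.h>
#include <mpi-ext.h>  /* Open MPI specific: declares the MPIX_ extensions */

int main(int argc, char *argv[])
{
    int rocm_aware = 0;

    MPI_Init(&argc, &argv);

    /* Only call the extension if this Open MPI build provides it
       (assumed guard macro; verify in your mpi-ext.h). */
#if defined(OMPI_HAVE_MPI_EXT_ROCM) && OMPI_HAVE_MPI_EXT_ROCM
    rocm_aware = MPIX_Query_rocm_support();
#endif

    printf("ROCm-aware support: %s\n", rocm_aware ? "yes" : "no");

    MPI_Finalize();
    return 0;
}
EOF
```

The program can then be compiled with the Open MPI wrapper compiler, for example `mpicc rocm_query.c -o rocm_query`, and launched with `mpirun -n 1 ./rocm_query`. The preprocessor guard keeps the program compilable against Open MPI builds that lack the extension, in which case it reports "no".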
11.2.7.5. Collective component supporting ROCm device memory
The UCC-based collective component in Open MPI can be configured and compiled to include ROCm support.
An example of configuring UCC and Open MPI with ROCm support is shown below:
# Configure and compile UCC with ROCm support
shell$ cd ucc
shell$ ./configure --with-rocm=/opt/rocm \
           --with-ucx=/path/to/ucx-rocm-install \
           --prefix=/path/to/ucc-rocm-install
shell$ make -j && make install
# Configure and compile Open MPI with UCX, UCC, and ROCm support
shell$ cd ompi
shell$ ./configure --with-rocm=/opt/rocm \
           --with-ucx=/path/to/ucx-rocm-install \
           --with-ucc=/path/to/ucc-rocm-install
Using the UCC component in an application requires setting some additional parameters:
shell$ mpirun --mca pml ucx --mca osc ucx \
           --mca coll_ucc_enable 1 \
           --mca coll_ucc_priority 100 -np 64 ./my_mpi_app
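The same parameters can also be supplied through the environment, because Open MPI reads MCA parameters from variables named `OMPI_MCA_<parameter>`. The sketch below mirrors the values of the mpirun command above and is an alternative way to pass them, not a required step.

```shell
# Environment-variable equivalents of the --mca options used above.
export OMPI_MCA_pml=ucx                 # same as: --mca pml ucx
export OMPI_MCA_osc=ucx                 # same as: --mca osc ucx
export OMPI_MCA_coll_ucc_enable=1       # same as: --mca coll_ucc_enable 1
export OMPI_MCA_coll_ucc_priority=100   # same as: --mca coll_ucc_priority 100
```

With these variables exported, the launch command reduces to `mpirun -np 64 ./my_mpi_app`. This form can be convenient in batch scripts where the same settings apply to several runs.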