10.1. OpenFabrics Interfaces (OFI) / Libfabric support
10.1.1. What is OFI / Libfabric?
“OFI” stands for the OpenFabrics Interfaces, which are implemented in the Libfabric library. The two terms are typically used interchangeably.
Open MPI supports many different underlying networks via Libfabric, including (but not limited to):
Cornelis Networks Omni-Path
In general, the OFI-based components in Open MPI will auto-select themselves as appropriate at run time.
That being said, additional questions are available in this FAQ section to provide more information about specific OFI-based network types and support.
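To check which OFI-based components are present in a given installation, ompi_info can be consulted (a quick sketch; the exact output depends on how Open MPI was configured and built):
shell$ ompi_info | grep -i ofi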
10.1.2. What are the Libfabric (OFI) components in Open MPI?
Open MPI has three main components for Libfabric (a.k.a., OFI) communications:
ofi MTL: Available since Open MPI v1.10, this component is used with the cm PML and provides two-sided MPI communication (e.g., MPI_SEND and MPI_RECV).
The ofi MTL requires that the Libfabric provider support reliable datagrams with ordered tagged messaging (specifically: FI_EP_RDM endpoints with the FI_TAGGED capability and FI_ORDER_SAS ordering).
ofi BTL: Available since Open MPI v4.0.0, this component is used for one-sided MPI communication (e.g., MPI_PUT).
The ofi BTL requires that the Libfabric provider support reliable datagrams, RMA and atomic operations, and remote atomic completion notifications (specifically: FI_EP_RDM endpoints with the FI_RMA and FI_ATOMIC capabilities and the FI_DELIVERY_COMPLETE op flag).
usnic BTL: This BTL is used exclusively with Cisco usNIC-based networks, and will auto-select itself over the other OFI-based components when run on Cisco usNIC-based networks.
See each Libfabric provider's man page (e.g., fi_sockets(7)) to understand which providers will work with each of the above-listed Open MPI components. Some providers may need to be paired with one of the Libfabric utility providers; for example, the verbs provider needs to be paired with the ofi_rxm utility provider to provide reliable datagram endpoint support (FI_EP_RDM).
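If Libfabric's fi_info(1) utility is installed, it can help show which providers on a system advertise a given endpoint type and capability (a hedged example; check fi_info(1) for the exact options supported by your Libfabric version):
shell$ fi_info -t FI_EP_RDM -c FI_TAGGED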
Both the ofi MTL and the ofi BTL have MCA parameters to specify the Libfabric provider(s) that will be included in or excluded from the selection process. For example:
shell$ mpirun --mca pml cm --mca mtl ofi --mca mtl_ofi_provider_include psm2 mpi_hello
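The ofi BTL has an analogous provider-selection parameter; for example (a sketch that again assumes the psm2 provider is the desired one):
shell$ mpirun --mca btl_ofi_provider_include psm2 mpi_hello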
In addition, each component has its own specific parameters; see
ompi_info --param <framework> <component> --level 9 for a full
list. For example:
shell$ ompi_info --param mtl ofi --level 9
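The same form works for the other OFI-based components; for example, to list the ofi BTL's parameters:
shell$ ompi_info --param btl ofi --level 9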
For more information, refer to the Libfabric web site.
10.1.3. Omni-Path: How can the multi-rail settings be adjusted if multiple HFI (Host Fabric Interface) cards are installed in the system?
The Multi-Rail feature allows a process to use multiple HFIs to transfer a message in order to improve message bandwidth. Multi-rail support is handled by the PSM2 library and is off by default. The multi-rail settings can be modified using the following environment variable:
PSM2_MULTIRAIL=[0,1,2]: 0=Disabled, 1=Enable multi-rail across all HFIs in the system, 2=Enable multi-rail within a NUMA node.
The variable above may be passed on the mpirun command line (via the -x option) or set in the environment. For example:
shell$ mpirun -mca mtl [psm2|ofi] -x PSM2_MULTIRAIL=1 -n 2 -H host1,host2 ./a.out
When using the OFI MTL, please ensure that the PSM2 OFI provider is used for communication with Omni-Path (OPA) devices.
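For example, the following sketch combines explicit selection of the psm2 provider with multi-rail enabled (reusing the host names from the example above):
shell$ mpirun --mca pml cm --mca mtl ofi --mca mtl_ofi_provider_include psm2 -x PSM2_MULTIRAIL=1 -n 2 -H host1,host2 ./a.out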
10.1.4. Omni-Path: What is Multi-HFI support in PSM2 and how does it differ from multi-rail?
Multi-HFI support describes the use of multiple HFIs in a system by the MPI ranks local to a node in order to load-balance the hardware resources. It differs from the Multi-Rail feature, which allows a single process to use all HFIs in the system. For an MPI job with multiple processes on a single node, the default PSM2 behavior depends on the affinity settings of the MPI processes: the PSM2 library defaults to using the HFI (Host Fabric Interface) located in the same NUMA node as the MPI process.
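Because the default HFI selection depends on process affinity, it can be useful to verify the bindings that mpirun applies (a hedged example using standard Open MPI options; adjust the binding policy as needed):
shell$ mpirun --report-bindings --bind-to numa -n 2 ./a.out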
Users can restrict access to a single HFI using the environment variable:
HFI_UNIT=N: valid values of N are 0, 1, 2, and 3.
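For example, the following restricts the launched ranks to HFI 0 (a sketch mirroring the mpirun invocation from the previous question):
shell$ mpirun -mca mtl psm2 -x HFI_UNIT=0 -n 2 -H host1,host2 ./a.out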
More details can be found in the PSM2 Programmer’s Guide and the Omni-Path Fabric Performance Tuning Guide.
Please also see the full Omni-Path documentation for more details.