Describe the bug
We ran two fi_rma_bw instances simultaneously (other bandwidth tests behave the same) and found that each instance achieves only half the bandwidth we expected. For comparison, we also ran other combinations: one perftest (an RDMA device performance micro-benchmark), two perftest instances, one fi_rma_bw, and one perftest plus one fi_rma_bw. Only the last combination gave unexpected results. A single fi_rma_bw reaches the line rate of the RDMA NIC.
Can you provide some possible reasons? For example, does libfabric take up the resources of an entire NIC?
To Reproduce
Two servers, each with two Mellanox CX7 cards; the two NICs on each server are configured as a bond.
One host runs two fi_rma_bw instances as servers, and the other host runs two fi_rma_bw instances as clients.
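For anyone trying to reproduce this, here is a minimal sketch of how the two concurrent server/client pairs might be launched. It only builds and prints the command lines. The port numbers and the server address are placeholders, not values from our setup, and the helper `rma_bw_cmd` is hypothetical. It assumes the common fabtests options `-p` (provider), `-B` (source port, used by the server to listen) and `-P` (destination port, used by the client); please check `fi_rma_bw -h` on your build, and note that other options (for example the endpoint type) may be needed depending on the provider.

```python
import shlex


def rma_bw_cmd(provider, port, server_addr=None):
    """Build an fi_rma_bw argv list (hypothetical helper).

    Without server_addr the command is for the server side and listens on
    `port`; with server_addr it is for the client side and connects to
    `port` on that address.
    """
    argv = ["fi_rma_bw", "-p", provider]
    if server_addr is None:
        argv += ["-B", str(port)]               # server: source (listen) port
    else:
        argv += ["-P", str(port), server_addr]  # client: destination port + server
    return argv


PROVIDER = "verbs;ofi_rxm"   # one of the provider strings listed under Environment
PORTS = [9228, 9229]         # placeholder ports, one per concurrent instance
SERVER_ADDR = "192.0.2.10"   # placeholder address of the server host

for port in PORTS:
    # shlex.join quotes the ';' in the provider string for safe use in a shell
    print("server:", shlex.join(rma_bw_cmd(PROVIDER, port)))
    print("client:", shlex.join(rma_bw_cmd(PROVIDER, port, SERVER_ADDR)))
```

Each pair uses its own port so that the two instances do not collide on the same host.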
Expected behavior
Each fi_rma_bw instance should reach ~200 Gb/s, because a single CX7's line rate is 200 Gb/s.
Environment:
provider: verbs or verbs;ofi_rxm or verbs;ofi_rxd
version: libfabric-1.20
Additional context
We've ruled out the possibility of a bottleneck on the network.
Could you please provide more detail on the setup that you tested on? How is the interface configured, and how are the nodes connected? Is it a back-to-back connection, or are they connected via a switch?