Use IB_MERGE_VFS argument when detecting PCI path
When running in a cloud-hypervisor guest, IB VFs are exposed as an
RCiEP (Root Complex Integrated Endpoint). If the IB VFs are merged,
NCCL does not correctly detect the PCI topology.
Thomas Barrett authored and bureddy committed Jan 31, 2024
1 parent 090d825 commit 99925f1
Showing 2 changed files with 3 additions and 1 deletion.
2 changes: 2 additions & 0 deletions include/p2p_plugin.h
@@ -123,6 +123,8 @@ int nccl_p2p_ib_speed(int speed);

int64_t ncclParamSharpMaxComms();

+int64_t ncclParamIbMergeVfs();
+
int ncclIbRelaxedOrderingCapable(void);

nccl_p2p_plugin_t nccl_p2p_get_plugin_type();
2 changes: 1 addition & 1 deletion src/p2p_plugin.c
@@ -385,7 +385,7 @@ ncclResult_t nccl_p2p_ib_pci_path(nccl_ib_dev_t *devs, int num_devs, char* dev_n
// Merge multi-port NICs into the same PCI device
p[strlen(p)-1] = '0';
// Also merge virtual functions (VF) into the same device
-p[strlen(p)-3] = '0';
+if (ncclParamIbMergeVfs()) p[strlen(p)-3] = p[strlen(p)-4] = '0';
// And keep the real port aside (the ibv port is always 1 on recent cards)
*real_port = 0;
for (int d=0; d<num_devs; d++) {
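
To make the effect of the changed line concrete, here is a minimal standalone sketch of the same character edits applied to a resolved PCI path. It assumes the path ends in a `dddd:bb:dd.f` address, which is what the index arithmetic in the hunk implies. The helper name `merge_pci_path` and its `merge_vfs` argument are hypothetical and stand in for the call to `ncclParamIbMergeVfs()`; this is not the plugin's code.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper (not part of the plugin): applies the same in-place
 * edits as the patched hunk to a path ending in "dddd:bb:dd.f".
 * merge_vfs stands in for the value returned by ncclParamIbMergeVfs(). */
static void merge_pci_path(char *p, int merge_vfs) {
    size_t n = strlen(p);
    if (n < 4) return;      /* too short to end in "dd.f"; leave untouched */

    /* Merge multi-port NICs: zero the function digit. */
    p[n - 1] = '0';

    /* Merge VFs only when requested: zero both device digits. */
    if (merge_vfs) {
        p[n - 3] = '0';
        p[n - 4] = '0';
    }
}
```

With merging enabled, a path ending in `0000:00:1a.3` becomes `0000:00:00.0`; with merging disabled, only the function digit changes and the path ends in `0000:00:1a.0`, so devices in different slots keep distinct paths.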