Issues: openucx/ucx
Error: Transport retry count exceeded on mlx5_0:1/RoCE
#6000
by afernandezody
was closed Feb 1, 2021
Issues list
Does the UCX support NVIDIA PXN without extra path when using NVSwitch?
#10293
opened Nov 12, 2024 by
MoFHeka
How to initialize UCP when there are multiple GPUs in one machine and multiple GPU machines in cluster?
#10276
opened Nov 5, 2024 by
MoFHeka
CUDA-Aware UCX with a mixture of CPU-only & GPU Nodes
Bug
#10273
opened Nov 4, 2024 by
judicaelclair
Single network card is very fast, dual network card speed is super slow
Bug
#10249
opened Oct 23, 2024 by
yangrudan
How preallocate buffer through rendezvous protocol before ucp_tag_recv_nbx actually receiving?
#10148
opened Sep 12, 2024 by
MoFHeka
Is there any benchmark of P2P communication between NCCL and UCX(ucp)?
#10147
opened Sep 11, 2024 by
MoFHeka
rocm_ipc_md.c:79 UCX ERROR Failed to create ipc for 0x7f9030c18000/8000
Bug
#10087
opened Aug 25, 2024 by
jinz2014
Address Registration Error in CUDA Aware MPICH 4.2.2 + UCX 1.17.0 Application
#10085
opened Aug 23, 2024 by
cl3to
how does use PCIe peer-to-peer or NVLink between two containers that each have an isolated GPU
#10070
opened Aug 20, 2024 by
linxiaochou
UCX ignores exclusively setting TCP devices when RoCE is available.
Bug
#10049
opened Aug 6, 2024 by
bertiethorpe
How to change single copy VIA xpmem execution to the sender process
#10019
opened Jul 17, 2024 by
arun-chandran-edarath
How to Use Shared Memory for Intra-node Inter-process Data Transfer?
Bug
#10016
opened Jul 16, 2024 by
Clownier
mlx5 connect on mlx5_1 failed: Connection timed out
Bug
#9971
opened Jun 24, 2024 by
shinoharakazuya
UCX blocked after .sendStreamNonBlocking(sendBuffer, new SendCallback(sendBuffer, this));
Bug
#9964
opened Jun 17, 2024 by
pereverges
UCX installation done with OFED doesn't recognize cuda, cuda_cpy etc.
Bug
#9950
opened Jun 12, 2024 by
RamHPC
Performance regression in collectives due to UCX_PROTO_ENABLE
Bug
#9914
opened May 30, 2024 by
angainor