Describe the bug
prov/verbs: one side calls fi_connect and then fi_eq_sread, but it takes about 18.5 s before fi_eq_sread on the other side receives the connect message. Why?
By the way, how can I set the reconnect interval? Thanks.
To Reproduce
Steps to reproduce the behavior:
1. prov/verbs: one side calls fi_connect and then fi_eq_sread.
2. After 18.5 s, fi_eq_sread on the other side receives the connect message.
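To put a number like the 18.5 s above on a log line, one option is to time the blocking wait with a monotonic clock. The following is only a minimal sketch: `blocking_step_fn`, `time_step_ms`, and `fake_wait` are hypothetical names made up for illustration and are not part of libfabric, and `fake_wait` merely sleeps in place of the real event wait.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdint.h>
#include <time.h>

/* Hypothetical signature for the blocking step to be timed. */
typedef int (*blocking_step_fn)(void *ctx);

/* Current value of the monotonic clock, in milliseconds. */
static int64_t now_ms(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (int64_t)ts.tv_sec * 1000 + ts.tv_nsec / 1000000;
}

/* Run step(ctx), store the elapsed milliseconds in *elapsed_ms,
 * and return whatever step returned. */
static int time_step_ms(blocking_step_fn step, void *ctx, int64_t *elapsed_ms)
{
    int64_t start = now_ms();
    int ret = step(ctx);
    *elapsed_ms = now_ms() - start;
    return ret;
}

/* Stand-in for the real wait (assumption): sleeps for *ctx milliseconds. */
static int fake_wait(void *ctx)
{
    int ms = *(int *)ctx;
    struct timespec delay = { ms / 1000, (long)(ms % 1000) * 1000000L };
    nanosleep(&delay, NULL);
    return 0;
}
```

In a real test, `fake_wait` would be replaced by a small wrapper around the application's own fi_eq_sread call, and the elapsed time printed on both the connecting and the listening side so the two can be compared.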
@hsh258 It's clearly an issue with your environment. Have you checked whether the problem reproduces with standalone tools like rdma_{server,client} or qperf?
@sydidelot
Hi,
I want to change the interval between ConnectRequest retries. How can I do that? Thanks.
@hsh258 The only timeout I am aware of in the verbs provider is this one: #define VERBS_RESOLVE_TIMEOUT 2000 // ms
But honestly, it would be a better option for you to fix your network than trying to hack libfabric :)
@sydidelot
Hi,
When the problem reproduces, I find that the first RRoCE ConnectRequest is valid (322 bytes) on the sending side, but it arrives as an invalid packet (120 bytes) on the receiving side: the MAD Header, CM ConnectRequest, and Invariant CRC of the InfiniBand packet are dropped.
Only when the second RRoCE ConnectRequest is sent does the receiving side receive it normally.
Could you tell me the cause? Thanks.
I'm sorry, I don't do consulting on networking problems.
As I told you earlier, you should verify your network infrastructure with standalone tools like rdma_server + rdma_client or qperf. If the problem reproduces with these tools, then it's not specific to libfabric.
At DDN, we've been using libfabric with the verbs provider on large scale systems: issues during connection establishment are usually due to a problem in the network fabric itself.