[v1.22.x] prov/efa: backport changes #10673

Merged

shijin-aws
Contributor

Backports #10594 and #10657.

Libfabric currently refills the rx pkt pool on every cq read whenever there is at least 1 pkt to post,
which means it can end up calling ibv_post_recv 1-by-1 when only 1 pkt is ready to post per cq read.
Such 1-by-1 posting is less performant than posting once in a batch.

This patch improves the strategy by introducing a threshold for refilling.
When the number of internal rx pkts to post is lower than this threshold, the refill is skipped.

It also introduces FI_EFA_INTERNAL_RX_REFILL_THRESHOLD, which allows tuning this threshold.

Signed-off-by: Shi Jin <[email protected]>
(cherry picked from commit a149f51)
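
The refill strategy described in this commit can be sketched roughly as follows. This is a minimal illustration and not the actual libfabric code: the struct, the function names, the default value of 8, and reading the variable with `getenv` are all assumptions made for the example. Only the environment variable name and the "skip when below the threshold" rule come from the commit message.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical fallback; the commit message does not state the real default. */
#define SKETCH_DEFAULT_REFILL_THRESHOLD 8

/* Hypothetical bookkeeping, not libfabric's actual data structures. */
struct sketch_rx_state {
	size_t pkts_to_post;     /* rx pkts consumed and not yet reposted */
	size_t refill_threshold; /* minimum batch size worth posting */
	size_t post_calls;       /* number of batched posts issued so far */
	size_t pkts_posted;      /* total pkts handed back to the device */
};

/*
 * Read the threshold from the environment variable named in the commit.
 * Assumption: plain getenv/strtoul parsing, which may differ from how the
 * provider really reads its parameters.
 */
size_t sketch_read_threshold(void)
{
	const char *val = getenv("FI_EFA_INTERNAL_RX_REFILL_THRESHOLD");
	char *end = NULL;
	unsigned long parsed;

	if (!val || *val == '\0')
		return SKETCH_DEFAULT_REFILL_THRESHOLD;

	parsed = strtoul(val, &end, 10);
	if (*end != '\0' || parsed == 0)
		return SKETCH_DEFAULT_REFILL_THRESHOLD;

	return (size_t)parsed;
}

/*
 * Called once per cq read. Skips the refill while fewer than
 * refill_threshold pkts are pending, otherwise posts them all in one
 * batch. Returns the number of pkts posted by this call.
 */
size_t sketch_refill(struct sketch_rx_state *rx)
{
	size_t batch;

	assert(rx != NULL);

	if (rx->pkts_to_post < rx->refill_threshold)
		return 0; /* below the threshold: skip this refill */

	batch = rx->pkts_to_post;
	rx->post_calls++; /* stands in for a single batched receive post */
	rx->pkts_posted += batch;
	rx->pkts_to_post = 0;
	return batch;
}
```

With a threshold of 4 and one pkt consumed per cq read, this sketch issues one batched post every fourth read instead of one post per read.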
@shijin-aws requested a review from a team January 3, 2025 00:35
@shijin-aws force-pushed the unsolicited_recv_handshake_v1.22.x branch from 4068fa4 to d91ffef January 3, 2025 05:10
…oth sides

Currently, when the local side supports unsolicited write recv while the peer
doesn't, the peer will crash because it expects to get
a valid wr_id for the IBV_WC_RECV_RDMA_WITH_IMM op code. This peer crash
can cause confusing error messages on the sender side's cq while it is still
sending data to the peer. When the local side doesn't support unsolicited write recv
while the peer does, the local side will also get a cq error for the rdma op,
reported as "Unexpected status".

This patch makes the initiator of rdma write imm
detect the unsolicited write recv support status on both sides. If
the two sides are inconsistent, the initiator returns an error with a clear
error message that describes the mitigation.

Signed-off-by: Shi Jin <[email protected]>
(cherry picked from commit ed5560a)
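
The initiator-side check described in this commit might look roughly like the sketch below. It is only an illustration under stated assumptions: the struct names, status codes, and message strings are hypothetical, and the sketch assumes the peer's support status is already known locally (the branch name suggests it is learned during the handshake, but the commit message does not say how). The real error text and the mitigation it recommends are not reproduced here.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical status codes, not libfabric error codes. */
enum sketch_status {
	SKETCH_OK = 0,
	SKETCH_ERR_PEER_UNSUPPORTED,  /* local supports it, peer does not */
	SKETCH_ERR_LOCAL_UNSUPPORTED  /* peer supports it, local does not */
};

/* Hypothetical views of the local endpoint and the remote peer. */
struct sketch_endpoint {
	bool use_unsolicited_write_recv;
};

struct sketch_peer {
	bool use_unsolicited_write_recv; /* assumed to be known already */
};

/*
 * Run by the initiator before issuing an rdma write with immediate data.
 * Returns SKETCH_OK only when both sides agree on unsolicited write recv.
 */
enum sketch_status sketch_check_write_imm(const struct sketch_endpoint *ep,
					  const struct sketch_peer *peer)
{
	assert(ep != NULL && peer != NULL);

	if (ep->use_unsolicited_write_recv == peer->use_unsolicited_write_recv)
		return SKETCH_OK;

	return ep->use_unsolicited_write_recv ? SKETCH_ERR_PEER_UNSUPPORTED
					      : SKETCH_ERR_LOCAL_UNSUPPORTED;
}

/* Placeholder messages; the actual wording in the patch is not shown here. */
const char *sketch_describe(enum sketch_status status)
{
	switch (status) {
	case SKETCH_OK:
		return "unsolicited write recv status is consistent";
	case SKETCH_ERR_PEER_UNSUPPORTED:
		return "local supports unsolicited write recv but the peer does not";
	case SKETCH_ERR_LOCAL_UNSUPPORTED:
		return "peer supports unsolicited write recv but local does not";
	}
	return "unknown status";
}
```

Failing early on the initiator replaces the peer crash and the "Unexpected status" cq error with an error that names the mismatch directly.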
@shijin-aws force-pushed the unsolicited_recv_handshake_v1.22.x branch from d91ffef to b991637 January 3, 2025 22:56
@sunkuamzn merged commit 15bbaa2 into ofiwg:v1.22.x Jan 6, 2025
9 checks passed