-
Notifications
You must be signed in to change notification settings - Fork 21
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
issue: 4183221 propagate socket errors #276
base: release_3_40
Are you sure you want to change the base?
issue: 4183221 propagate socket errors #276
Conversation
fa87745
to
519b192
Compare
Tested with sockperf |
BTW, maybe instead of propagating errors to socket and handling socket deletion cases, we can mark some bit on the mem_buf_desc/pbuf, and on the next rexmit attempt lwip/socket will check the bit and fail it. |
519b192
to
4b913e2
Compare
ZC API tester yielded: |
9dfc5b1
to
e78f6dd
Compare
TODO - squash |
9620582
to
8b7a5b4
Compare
Following the split concern @AlexanderGrissik - validated
|
bot:retest |
ea28c28
to
7c9b9a2
Compare
Do we want this to vNext? |
CLONED = 0x01, | ||
ZCOPY = 0x02, | ||
INVALID = 0x04, | ||
}; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please do another scan and check we do not use other values than these for flags.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
verified
src/core/sock/sockinfo_tcp.cpp
Outdated
@@ -1372,6 +1372,17 @@ err_t sockinfo_tcp::ip_output(struct pbuf *p, struct tcp_seg *seg, void *v_p_con | |||
int count = 0; | |||
void *cur_end; | |||
|
|||
if (unlikely(is_set(attr.flags, XLIO_TX_PACKET_REXMIT))) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Lets move this method to a separate inline method. Lets make sure that possible splits will not break it.
src/core/sock/sockinfo_tcp.cpp
Outdated
@@ -1372,6 +1387,10 @@ err_t sockinfo_tcp::ip_output(struct pbuf *p, struct tcp_seg *seg, void *v_p_con | |||
int count = 0; | |||
void *cur_end; | |||
|
|||
if (unlikely(is_socket_in_error_state(p, (struct tcp_pcb *)v_p_conn, flags))) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
maybe is_buffer_in_error_state
src/core/sock/sockinfo_tcp.cpp
Outdated
@@ -1347,6 +1347,21 @@ ssize_t sockinfo_tcp::tcp_tx_slow_path(xlio_tx_call_attr_t &tx_arg) | |||
return tcp_tx_handle_done_and_unlock(total_tx, errno_tmp, is_dummy, is_send_zerocopy); | |||
} | |||
|
|||
static bool is_socket_in_error_state(const struct pbuf *p, struct tcp_pcb *pcb, uint16_t flags) | |||
{ | |||
if (unlikely(flags & XLIO_TX_PACKET_REXMIT)) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i would move this check to sockinfo_tcp::ip_output because now we have 2 ifs
src/core/proto/mem_buf_desc.h
Outdated
TYPICAL = 0, | ||
CLONED = 0x01, | ||
ZCOPY = 0x02, | ||
INVALID = 0x04, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would rename it to something more specific than just invalid. Maybe: HAD_CQE_ERROR
src/core/sock/sockinfo_tcp.cpp
Outdated
@@ -1372,6 +1384,13 @@ err_t sockinfo_tcp::ip_output(struct pbuf *p, struct tcp_seg *seg, void *v_p_con | |||
int count = 0; | |||
void *cur_end; | |||
|
|||
if (unlikely(flags & XLIO_TX_PACKET_REXMIT)) { | |||
const mem_buf_desc_t *mem_buf_desc = (const mem_buf_desc_t *)p; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please consider cpp style cast
src/core/sock/sockinfo_tcp.cpp
Outdated
@@ -1372,6 +1384,13 @@ err_t sockinfo_tcp::ip_output(struct pbuf *p, struct tcp_seg *seg, void *v_p_con | |||
int count = 0; | |||
void *cur_end; | |||
|
|||
if (unlikely(flags & XLIO_TX_PACKET_REXMIT)) { | |||
const mem_buf_desc_t *mem_buf_desc = (const mem_buf_desc_t *)p; | |||
if (unlikely(is_socket_in_error_state(mem_buf_desc, (struct tcp_pcb *)v_p_conn))) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
same
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What do you think about making this method a member and doing:
p_si_tcp->is_socket_in_error_state()
Also the name is_socket_in_error_state is a bit confusing because it suggests some read only check but in fact it sets flags. Maybe check_previous_buffer_failure?
src/core/sock/sockinfo_tcp.cpp
Outdated
@@ -5096,8 +5118,9 @@ int sockinfo_tcp::rx_wait_helper(int &poll_count, bool blocking) | |||
// It can be too expansive for the application to get nothing just because of lock contention. | |||
// In this case it will be better to have a lock() version of poll_and_process_element_rx. | |||
// And then we should continue polling untill we have ready packets or we drained the CQ. | |||
bool all_drained = true; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why these are here?
6b5b2bc
to
c174be4
Compare
c174be4
to
489c0bc
Compare
In case we got an ECQE - set the socket in a error state. Effectively this sends a TCP-RST. Signed-off-by: Tomer Cabouly <[email protected]>
489c0bc
to
780d15e
Compare
In case we got an ECQE - set the socket in a closing state. Effectively this sends a TCP-RST.
Description
Currently, in case we got an ECQE - we don't notify our callers we got to an invalid state.
The callers will keep trying to transmit TX segments, unaware of any issues.
This causes callers to not be able to identify and overcome error states, causing issues like 4183221.
What
Propagates the pcb to sq_wqe_prop, so when getting an ECQE - we could notify the TCP/IP stack.
Why ?
Justification for the PR. If there is existing issue/bug please reference.
How ?
Change type
What kind of change does this PR introduce?
Check list