issue: 3586273 Use XLIO_DEFERRED_CLOSE by default #136

Open
wants to merge 163 commits into base: vNext
Changes from 1 commit
Commits (163 commits):
1976934
issue: 3514044 Introducing cq_mgr_regrq and cq_mgr_strq
AlexanderGrissik Aug 27, 2023
1495ac4
issue: 3514044 Renaming cq_mgr_mlx5 to cq_mgr_regrq
AlexanderGrissik Aug 27, 2023
fb6022f
issue: 3514044 Renaming cq_mgr_mlx5_strq to cq_mgr_strq
AlexanderGrissik Aug 27, 2023
f21880d
issue: 3514044 Moving cq_mgr_regrq tx methods to cq_mgr
AlexanderGrissik Aug 27, 2023
ea4a4e6
issue: 3514044 Moving cq_mgr_regrq events to cq_mgr
AlexanderGrissik Aug 27, 2023
8f5999c
issue: 3514044 Moving cq_mgr_regrq add_qp_tx to cq_mgr
AlexanderGrissik Aug 27, 2023
16f67fc
issue: 3514044 Moving cq_mgr_regrq RX common to cq_mgr
AlexanderGrissik Aug 27, 2023
52d0cf5
issue: 3514044 Moving Tx from cq_mgr to cq_mgr_tx
AlexanderGrissik Aug 27, 2023
7eb4156
issue: 3514044 Rename cq_mgr to cq_mgr_rx
AlexanderGrissik Aug 28, 2023
553cc35
issue: 3514044 Remove qp_rec struct
AlexanderGrissik Aug 28, 2023
797c40c
issue: 3514044 Squash qp_mgr_eth to qp_mgr
AlexanderGrissik Aug 28, 2023
5fb7681
issue: 3514044 Remove DEFINED_DPCP from qp_mgr and styling fixes
AlexanderGrissik Oct 3, 2023
3e1a3bf
issue: 3514044 Squash qp_mgr_eth_mlx5 to qp_mgr
AlexanderGrissik Oct 3, 2023
6dfaffc
issue: 3514044 Squash qp_mgr_eth_mlx5_dpcp to qp_mgr
AlexanderGrissik Oct 4, 2023
c890d0d
issue: 3514044 Split qp_mgr to hw_queue_tx and hw_queue_rx
AlexanderGrissik Oct 8, 2023
4bd1df4
issue: 3514044 Squash rfs_rule_dpcp to rfs_rule
AlexanderGrissik Oct 8, 2023
01594f5
issue: 3514044 Removing m_attach_flow_data vector from rfs
AlexanderGrissik Oct 9, 2023
d5bb9e4
issue: 3514044 Removing hqrx from attach_flow_data_t
AlexanderGrissik Oct 9, 2023
70b1bb3
issue: 3514044 Removing ibv steering flows
AlexanderGrissik Oct 11, 2023
bc3e728
issue: 3514044 Adding flow tag check through dpcp::adapter
AlexanderGrissik Oct 15, 2023
ea7dfd0
issue: 3514044 Require dpcp for configure and CI
AlexanderGrissik Oct 15, 2023
e2acc51
issue: 3514044 Rebasing changes on top 3.20.5 with coverity fixes
AlexanderGrissik Oct 16, 2023
0e91471
issue 3514044 Fixing package test with mandatory dpcp
AlexanderGrissik Oct 16, 2023
d702e3c
issue: 3514044 Updating min dpcp version to 1.1.43
AlexanderGrissik Dec 31, 2023
67185ae
issue: 3514044 Replacing .inl file with .h
AlexanderGrissik Jan 11, 2024
3ce76f7
issue: 3514044 Removing option_strq
AlexanderGrissik Jan 11, 2024
6cda7bf
issue: 3514044 Removing unnecessary checks
AlexanderGrissik Jan 11, 2024
ecb0c8a
issue: 3745279 Fix artifact generation in CI
alexbriskin Jan 17, 2024
bf0b744
issue: 3664594 Return ETIMEDOUT err for timed out socket
AlexanderGrissik Dec 28, 2023
18f5d31
Issue: 3375239 - add email scan in packages
dpressle Jan 8, 2024
b7ac236
issue: 3724170 Support building as a static library
alexbriskin Jan 8, 2024
2e5c9c6
issue: 3724170 disable LTO in Jenkins compiler tests
alexbriskin Jan 16, 2024
5e8fca0
issue: 3704820 Fix strides in WQE for NGINX master
iftahl Jan 22, 2024
ff34156
issue: 3678579 Update last_unacked on ACK recv
iftahl Nov 22, 2023
4ff44ec
issue: 3678579 Fix last_unsent on retransmission
iftahl Nov 22, 2023
2bf7224
issue: 3678579 Fix last_unacked in tcp_rexmit_rto
iftahl Nov 22, 2023
43fe6e5
issue: 3678579 Update last_unacked in tcp_rexmit
iftahl Nov 22, 2023
95845c3
issue: 3678579 Remove iterating lists to find last
iftahl Nov 22, 2023
8a9d5b2
issue: 3678579 Coverity
iftahl Dec 14, 2023
07203a2
issue: 3690535 Remove SO_XLIO_RING_USER_MEMORY
pasis Dec 3, 2023
b12b93a
issue: 3690535 Reduce ring_allocation_logic size
pasis Dec 3, 2023
8ec6c52
issue: 3690535 Improve condition of ring migration support
pasis Dec 8, 2023
1e92b28
issue: 3690535 Print ring allocation logic type in logs
pasis Dec 8, 2023
871ff34
issue: 3690535 Remove unused fields in sockinfo_tcp
pasis Jan 30, 2024
46d4172
[CI] Coverity: add snapshot action
vialogi Jan 24, 2024
70467eb
issue: 3668182 Add tcp_write_zc/tcp_prealloc_zc
alexbriskin Nov 13, 2023
4713505
issue: 3668182 Connect tcp_write_zc to sockinfo_tcp::tcp_tx
alexbriskin Nov 13, 2023
670058e
issue: 3668182 Remove PBUF_DESC_MAP for send zerocopy
alexbriskin Nov 13, 2023
9ebbfef
issue: 3668182 Allow snd_buf drop below 0 in zero-copy path
alexbriskin Nov 14, 2023
249369d
issue: 3668182 Add sockinfo_tcp::tcp_tx_express
alexbriskin Nov 20, 2023
3c63b73
issue: 3668182 Use tcp_tx_express in TLS tx zerocopy
alexbriskin Dec 5, 2023
cf7ff57
issue: 3668182 Remove zerocopy flow from tcp_write
alexbriskin Dec 31, 2023
94f8392
issue: 3668182 Refactor sockinfo_tcp
alexbriskin Dec 31, 2023
ff55e60
issue: 3668182 Refactor LwIP + sockinfo_tcp + dst_entry_tcp
alexbriskin Nov 15, 2023
1e4e730
issue: 3668182 Fix PR comments
alexbriskin Jan 23, 2024
00db3d7
issue: 3668182 Revert tcp_seg::bufs to pbuf_clen()
alexbriskin Jan 30, 2024
1bc79a7
issue: 3724170 Add missing ifdef __cplusplus
alexbriskin Jan 28, 2024
181f582
issue: 3724170 Remove references to os_api
alexbriskin Jan 29, 2024
72f513d
issue: 3724170 Make xlio.h C standard compliant
benlwalker Jan 26, 2024
65bbb00
issue: 3724170 Disable the constructor/destructor in static build
alexbriskin Jan 29, 2024
24e427f
issue: 3724170 Make socketxtreme API regular function declarations
alexbriskin Jan 30, 2024
569a551
issue: 3724170 Fix compilation for static build
alexbriskin Feb 4, 2024
4534754
issue: 3724170 Disable the *_check functions for the static build
alexbriskin Feb 4, 2024
660afb5
issue: 3771283 Fix function pointer check
alexbriskin Feb 7, 2024
0fe1ee1
version: 3.30.0
galnoam Feb 12, 2024
ae058d1
issue: 3786434 Remove C23 feature from public xlio_extra.h
pasis Feb 19, 2024
2cdbc84
version: 3.30.1
galnoam Feb 22, 2024
9ef53fc
issue: 3792731 Fix -Walloc-size-larger-than warning
pasis Feb 22, 2024
077d5b2
issue: 3514044 Fix null pointer dereference
iftahl Feb 13, 2024
e82e642
issue: 3795922 Remove pbuf_split_64k()
pasis Feb 2, 2024
83fc3f5
issue: 3795922 Remove refused_data in lwip
pasis Feb 2, 2024
74c38c2
issue: 3781322 Fix for 100% CPU load
iftahl Feb 18, 2024
cbdbfec
issue: 3813802 Terminate process instead of 'throw' on panic
pasis Mar 6, 2024
8b076f9
issue: 3813802 Don't wrap xlio_raw_post_recv() with IF_VERBS_FAILURE
pasis Mar 6, 2024
b2c8589
issue: 3813802 Avoid partial initialization of an event_data_t object
pasis Mar 6, 2024
fb1a8ea
issue: 3813802 Remove dst_entry::m_p_send_wqe
pasis Mar 6, 2024
5a7881e
issue: 3813802 Fix type overflow warning in time_converter_rtc
pasis Mar 7, 2024
fa7aadd
issue: 3813802 Fix IP_FRAG_DEBUG=1 build
pasis Mar 7, 2024
4b205fb
issue: 3813802 Include system headers in the right way
pasis Mar 7, 2024
eecf0a5
issue: 3813802 Remove unneeded cppcheck suppressions
pasis Mar 7, 2024
6803858
issue: 3770816 Use override instead virtual
alexbriskin Feb 5, 2024
a095c8b
issue: 3770816 Use nullptr instead of NULL
alexbriskin Feb 6, 2024
b66af63
issue: 3770816 Remove redundant void argument lists
alexbriskin Feb 6, 2024
be5a307
issue: 3770816 Replace empty destructor with default
alexbriskin Feb 6, 2024
e5f3ead
issue: 3788369 Rename thread_local_event_handler
pasis Feb 21, 2024
8561e11
issue: 3788369 Fix subsequent xlio_get_socket_rings_fds() calls
pasis Feb 22, 2024
e583d42
issue: 3788369 Return TX ring by xlio_get_socket_rings_fds()
pasis Feb 23, 2024
757e0a1
issue: 3788369 Don't reset TCP connection twice
pasis Feb 23, 2024
6215359
issue: 3788369 Don't hardcode TCP send buffer for TCP_NODELAY
pasis Feb 27, 2024
a750ddc
issue: 3788369 Disable MSG_ZEROCOPY tests in gtests
pasis Feb 29, 2024
7a4f2da
issue: 3788369 Remove XLIO_ZC_TX_SIZE
pasis Mar 2, 2024
9c38f68
issue: 3788369 Remove redundant max_send_sge field
pasis Mar 2, 2024
cee8ca7
issue: 3788369 Pass iovec to tcp_write_express()
pasis Mar 2, 2024
98e3ada
issue: 3788369 Don't poll RX while checking is_rst()
pasis Mar 2, 2024
a42d50d
issue: 3788369 Fix LwIP type length related to segment/pbuf size
pasis Mar 2, 2024
75221e0
issue: 3788369 Fix Nagle's algorithm for negative snd_buf
pasis Mar 2, 2024
140401b
issue: 3788369 Remove redundant snd_buf check in LwIP
pasis Mar 2, 2024
236d43f
issue: 3788369 Remove pbuf_desc::map
pasis Mar 2, 2024
8779882
issue: 3788369 Introduce XLIO socket API
pasis Feb 4, 2024
4abcf4f
issue: 3788369 Remove lwip/init.[ch]
pasis Mar 4, 2024
cdcc99c
issue: 3788369 Remove pbuf_custom wrapper structure
pasis Mar 4, 2024
fea6e82
version: 3.30.2
galnoam Mar 11, 2024
1d23a9d
issue: HPCINFRA-1321 add Dockerfile for static tests
vialogi Mar 7, 2024
6e9a005
issue: HPCINFRA-1321 Switch cppcheck to a docker
vialogi Mar 5, 2024
bda508b
issue: HPCINFRA-1321 Switch csbuild to a docker
vialogi Mar 6, 2024
accbf7b
issue: HPCINFRA-1321 Switch Tidy to a docker
vialogi Mar 6, 2024
9521056
issue: 3777348 Remove unused pipeinfo class
AlexanderGrissik Feb 12, 2024
e449276
issue: 3777348 Removing cleanable_obj from socket_fd_api
AlexanderGrissik Feb 12, 2024
4305473
issue: 3777348 Removing unused pkt_sndr_source class
AlexanderGrissik Feb 12, 2024
29c6b63
issue: 3777348 Replacing pkt_rcvr_source class with sockinfo
AlexanderGrissik Feb 12, 2024
a1cb209
issue: 3777348 Simplifying timers for TCP sockets
AlexanderGrissik Feb 13, 2024
19441e4
issue: 3777348 Moving wakeup_pipe to be a member of sockinfo
AlexanderGrissik Feb 14, 2024
1aa581a
issue: 3777348 Replacing socket_fd_api access with sockinfo
AlexanderGrissik Feb 15, 2024
fcbba6e
issue: 3777348 Merging socket_fd_api with sockinfo
AlexanderGrissik Feb 15, 2024
0168f14
issue: 3777348 Moving sockinfo inline impl outside the class
AlexanderGrissik Feb 25, 2024
d135aac
issue: 3777348 sockinfo Reordering methods
AlexanderGrissik Feb 25, 2024
eca2eba
issue: 3777348 Moving sock stats outside the socket
AlexanderGrissik Feb 26, 2024
38f8cd0
issue: 3777348 Reordering sockinfo members
AlexanderGrissik Mar 11, 2024
06a7917
issue: 3777348 Removing m_flow_tag_enabled check
AlexanderGrissik Mar 13, 2024
1949822
issue: 3777348 Remove support for SO_XLIO_FLOW_TAG
AlexanderGrissik Mar 13, 2024
28ded23
issue: 3777348 Avoid process_timestamps checking on each packet
AlexanderGrissik Mar 13, 2024
0d975c0
issue: 3777348 Remove precached sysvars from sockinfo
AlexanderGrissik Mar 13, 2024
cb0b278
issue: 3777348 Remove access to m_sock_wakeup_pipe for socketxtreme
AlexanderGrissik Mar 13, 2024
08956f8
issue: 3777348 Avoid checking m_iomux_ready_fd_array for Socketxtrme
AlexanderGrissik Mar 13, 2024
6434e58
issue: 3777348 Avoid unnecessary access to ring_allocation_tx members
AlexanderGrissik Mar 13, 2024
87a76ea
issue: 3777348 Use thread_local dummy lock
AlexanderGrissik Mar 14, 2024
1ff8f40
issue: 3777348 Avoid copying src/dst addresses for TCP flow-tag DP
AlexanderGrissik Mar 14, 2024
7fb5963
issue: 3808935 Add nullptr checks before dereferencing
alexbriskin Mar 18, 2024
af806d1
issue: 3829626 Fix new TCP timers registration for reused sockets
AlexanderGrissik Mar 19, 2024
3a98c90
issue: 3829626 Fixing statistics init for reused sockets
AlexanderGrissik Mar 19, 2024
27931ce
issue: 3788369 Replace XLIO_HUGEPAGE_LOG2 with XLIO_HUGEPAGE_SIZE
pasis Mar 12, 2024
bf77b2c
issue: 3788369 Remove xlio_key prototypes
pasis Mar 18, 2024
0dcadd3
issue: 3788369 Move public types definitions to xlio_types.h
pasis Mar 18, 2024
4df427c
issue: 3788369 Add external allocator to XLIO Socket API
pasis Mar 18, 2024
f959cff
issue: 3788369 Add XLIO Socket API to the xlio_api_t pointers
pasis Mar 18, 2024
1baf574
issue: 3777348 Adding lock_spin_simple for smaller space utilization"
AlexanderGrissik Mar 17, 2024
6ab22fa
issue: 3777348 Adding template cached_obj_pool
AlexanderGrissik Mar 17, 2024
347c362
issue: 3777348 Socketxtreme completions ring pool
AlexanderGrissik Feb 29, 2024
4de688b
issue: Fix big endian build and clean unused macros
pasis Mar 11, 2024
221ea72
version: 3.30.3
galnoam Mar 20, 2024
dcdcd64
issue: 3788369 Keep global collection of the polling groups
pasis Mar 17, 2024
dde0276
issue: 3788369 Keep sockets list per polling group
pasis Mar 17, 2024
8452e00
issue: 3788369 poll_group takes reference to ring
pasis Mar 17, 2024
617759e
issue: 3788369 Throw exception if netdev not found for a ring
pasis Mar 17, 2024
25351d9
issue: 3788369 Release native rings in the poll_group destructor
pasis Mar 21, 2024
0b3eb59
issue: 3788369 Don't free buffer unconditionally in XLIO Socket API
pasis Mar 24, 2024
69ca61f
issue: 3788369 Use reclaim_recv_buffers() in XLIO Socket API
pasis Mar 24, 2024
8ad2b35
issue: 3788369 Pass proper hugepage_size to XLIO Socket API
pasis Mar 25, 2024
7b8853d
issue: 3788369 Poll local ring before XLIO socket destruction
pasis Mar 25, 2024
64bf555
issue: 3788369 Re-read env params in xlio_init_ex()
pasis Mar 25, 2024
4771bdd
issue: 3788369 Avoid POSIX connect() in xlio_socket_connect()
pasis Mar 25, 2024
dfdbd8b
issue: 3788369 Remove get_fd() from XLIO Socket API
pasis Mar 25, 2024
86fc67a
issue: 3829626 Fix seg fault in TCP timers
iftahl Mar 27, 2024
65130bd
issue: 3818038 Remove BlueFlame doorbell method
pasis Apr 1, 2024
6f485a1
issue: 3818038 Remove likely() from the inline WQE branch
pasis Apr 2, 2024
ea38dd7
issue: 3844385 Fix new TCP timers registration lock contention
AlexanderGrissik Apr 2, 2024
8e64060
version: 3.30.4
galnoam Apr 4, 2024
9b7eec0
issue: 3788164 Fix RX poll on TX option for UTLS
pasis Apr 5, 2024
0678a45
issue: 3855390 Fixing adding TCP timer twice warning
AlexanderGrissik Apr 8, 2024
db61660
issue: 3795997 Control TSO max payload size
iftahl Apr 4, 2024
1e18c6a
version: 3.30.5
galnoam Apr 9, 2024
4b47570
issue: 3586273 Use XLIO_DEFERRED_CLOSE by default
iftahl Apr 14, 2024
c0d0e34
issue: add 200ms sleep to gtest
iftahl Apr 17, 2024
issue: 3788369 Pass iovec to tcp_write_express()
tcp_write_express() is a zerocopy version of tcp_write() and not limited
by the sndbuf or queue length. Therefore, it's unlikely to fail.
However, it still can fail to allocate pbuf or TCP segment in theory.

Move the loop over iovec from higher level methods to
tcp_write_express(), so we keep consistent TCP state if the memory
allocation error happens in the middle of a complex send operation.

Additionally, improve tcp_write_express() robustness and don't allow to
append data to last segment if it's a retransmit.

Signed-off-by: Dmytro Podgornyi <dmytrop@nvidia.com>
pasis authored and galnoam committed Mar 11, 2024
commit cee8ca70f9a076740729d8536ac58444212baa3d
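
The all-or-nothing behaviour described in the commit message can be sketched in isolation. This is a minimal illustration, not the XLIO implementation: `struct seg`, `struct conn`, `seg_alloc()`, `enqueue_all()` and the `allocs_left` test hook are hypothetical stand-ins for `tcp_seg`, `tcp_pcb` and the lwIP allocators, and merging into an existing last segment is left out. New segments are staged in a local list and linked into the connection only after every iovec element has been segmented, so an allocation failure midway leaves the connection untouched.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <sys/uio.h>

/* Hypothetical, simplified stand-ins for lwIP's tcp_seg and tcp_pcb. */
struct seg {
    struct seg *next;
    const char *data; /* zero-copy: points into the caller's buffer */
    size_t len;
    size_t seq; /* byte offset of this segment in the stream */
};

struct conn {
    struct seg *unsent;      /* head of the unsent queue */
    struct seg *last_unsent; /* tail of the unsent queue */
    size_t snd_lbb;          /* offset of the next byte to be queued */
    int allocs_left;         /* test hook: negative means unlimited */
};

static struct seg *seg_alloc(struct conn *c)
{
    if (c->allocs_left == 0) {
        return NULL; /* simulated out-of-memory */
    }
    if (c->allocs_left > 0) {
        c->allocs_left--;
    }
    return calloc(1, sizeof(struct seg));
}

/* Enqueue all iov elements or none. Returns 0 on success, -1 on allocation failure. */
static int enqueue_all(struct conn *c, const struct iovec *iov, unsigned iovcnt, size_t seglen_max)
{
    struct seg *queue = NULL; /* staged segments, not yet visible through c */
    struct seg *tail = NULL;
    size_t total_len = 0;

    assert(seglen_max > 0);

    for (unsigned i = 0; i < iovcnt; ++i) {
        const char *data = iov[i].iov_base;
        const size_t len = iov[i].iov_len;
        size_t pos = 0;

        while (pos < len) {
            const size_t left = len - pos;
            const size_t seglen = left > seglen_max ? seglen_max : left;
            struct seg *s = seg_alloc(c);
            if (!s) {
                goto memerr;
            }
            s->data = data + pos;
            s->len = seglen;
            /* The offset accounts for the elements already staged in this call. */
            s->seq = c->snd_lbb + total_len + pos;
            if (tail) {
                tail->next = s;
            } else {
                queue = s;
            }
            tail = s;
            pos += seglen;
        }
        total_len += len;
    }

    /* Commit: only now does the queue state of the connection change. */
    if (queue) {
        if (c->last_unsent) {
            c->last_unsent->next = queue;
        } else {
            c->unsent = queue;
        }
        c->last_unsent = tail;
    }
    c->snd_lbb += total_len;
    return 0;

memerr:
    /* Roll back: free the staged segments; the queue in c was never modified. */
    while (queue) {
        struct seg *next = queue->next;
        free(queue);
        queue = next;
    }
    return -1;
}
```

With two buffers of 10 and 5 bytes and `seglen_max` of 4, the sketch stages five segments (4, 4, 2, 4, 1 bytes). If the fourth allocation fails, the call returns -1 and `unsent`, `last_unsent` and `snd_lbb` keep their previous values.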
3 changes: 2 additions & 1 deletion src/core/lwip/tcp.h
@@ -476,7 +476,8 @@ err_t tcp_shutdown(struct tcp_pcb *pcb, int shut_rx, int shut_tx);

 err_t tcp_write(struct tcp_pcb *pcb, const void *dataptr, u32_t len, u16_t apiflags,
                 pbuf_desc *desc);
-err_t tcp_write_express(struct tcp_pcb *pcb, const void *arg, u32_t len, pbuf_desc *desc);
+err_t tcp_write_express(struct tcp_pcb *pcb, const struct iovec *iov, u32_t iovcnt,
+                        pbuf_desc *desc);

 #define TCP_PRIO_MIN 1
 #define TCP_PRIO_NORMAL 64
181 changes: 109 additions & 72 deletions src/core/lwip/tcp_out.c
@@ -726,122 +726,159 @@ err_t tcp_write(struct tcp_pcb *pcb, const void *arg, u32_t len, u16_t apiflags,
 /**
  * Write data for sending (but does not send it immediately).
  *
- * It waits in the expectation of more data being sent soon (as
- * it can send them more efficiently by combining them together).
- * To prompt the system to send data now, call tcp_output() after
- * calling tcp_write_express().
- *
  * The function will zero-copy the data into the payload, i.e. the data pointer, instead of the
- * data, will be copied.
+ * data, will be set.
  *
  * @param pcb Protocol control block for the TCP connection to enqueue data for.
- * @param arg Pointer to the data to be enqueued for sending.
- * @param len Data length in bytes
+ * @param iov Vector of the data buffers to be enqueued for sending.
+ * @param iovcnt Number of the iov elements.
  * @param desc Additional metadata that allows later to check the data mkey/lkey.
  * @return ERR_OK if enqueued, another err_t on error
  */
-err_t tcp_write_express(struct tcp_pcb *pcb, const void *arg, u32_t len, pbuf_desc *desc)
+err_t tcp_write_express(struct tcp_pcb *pcb, const struct iovec *iov, u32_t iovcnt, pbuf_desc *desc)
 {
     struct pbuf *p;
-    struct tcp_seg *seg = NULL, *prev_seg = NULL, *queue = NULL;
-    u32_t pos = 0; /* position in 'arg' data */
-    u8_t optflags = TF_SEG_OPTS_ZEROCOPY;
-    const u32_t mss_local = tcp_tso(pcb) ? pcb->tso.max_payload_sz : pcb->mss;
+    struct tcp_seg *seg = NULL;
+    struct tcp_seg *queue = NULL;
+    struct tcp_seg *last;
+    const u32_t seglen_max = tcp_tso(pcb) ? pcb->tso.max_payload_sz : pcb->mss;
+    u32_t pos;
     u32_t seglen;
+    u32_t last_seglen;
+    u32_t total_len = 0;
     u16_t queuelen = 0;
-
-    if (len < pcb->mss) {
-        const int byte_queued = pcb->snd_nxt - pcb->lastack;
-        pcb->snd_sml_add = (pcb->unacked ? pcb->unacked->len : 0) + byte_queued;
-    }
+    u8_t optflags = TF_SEG_OPTS_ZEROCOPY;

     /*
-     * Chain a new pbuf to the end of pcb->unsent if there is enough space.
-     *
-     * We may run out of memory at any point. In that case we must
-     * return ERR_MEM and not change anything in pcb. Therefore, all
-     * changes are recorded in local variables and committed at the end
-     * of the function. Some pcb fields are maintained in local copies:
-     *
-     * queuelen = pcb->snd_queuelen
-     *
-     * These variables are set consistently by the phases.
-     * seg points to the last segment tampered with.
-     * pos records progress as data is segmented.
+     * We may run out of memory at any point. In that case we must return ERR_MEM and not change
+     * anything in pcb. Therefore, all changes are recorded in local variables and committed at
+     * the end of the function. Some pcb fields are maintained in local copies.
      */
-    if (pcb->unsent != NULL) {
-        seg = pcb->last_unsent;
-        u32_t space = LWIP_MAX(mss_local, pcb->tso.max_payload_sz) - seg->len;

-        if (space > 0 && (seg->flags & TF_SEG_OPTS_ZEROCOPY) &&
-            pbuf_clen(seg->p) < pcb->tso.max_send_sge) {
-            seglen = space < len ? space : len;
+    last = pcb->last_unsent;
+    const bool can_merge =
+        last && (last->flags & TF_SEG_OPTS_ZEROCOPY) && TCP_SEQ_GEQ(last->seqno, pcb->snd_nxt);
+    if (!can_merge) {
+        /* We cannot append data to a segment of different type or a retransmitted segment. */
+        last = NULL;
+    }
+    last_seglen = last ? last->len : 0;
+
+    for (unsigned i = 0; i < iovcnt; ++i) {
+        u8_t *data = (u8_t *)iov[i].iov_base;
+        const u32_t len = iov[i].iov_len;
+        pos = 0;

-            if ((p = tcp_pbuf_prealloc_express(seglen, pcb, PBUF_ZEROCOPY, desc, NULL)) == NULL) {
-                goto memerr;
+        /* Chain a new pbuf to the last segment if there is enough space. */
+        if (last) {
+            seg = last;
+            const u32_t space = seglen_max - seg->len;
+
+            if (space > 0 && pbuf_clen(seg->p) < pcb->tso.max_send_sge) {
+                seglen = space < len ? space : len;
+
+                p = tcp_pbuf_prealloc_express(seglen, pcb, PBUF_ZEROCOPY, desc, NULL);
+                if (!p) {
+                    goto memerr;
+                }
+                p->payload = data;
+                pbuf_cat(seg->p, p);
+                seg->len += p->tot_len;
+                pos += seglen;
+                queuelen++;
             }
-            p->payload = (u8_t *)arg;
-            pbuf_cat(seg->p, p);
-            seg->len += p->tot_len;
-            pos += seglen;
-            queuelen++;
         }
-    }

-    while (pos < len) {
-        u32_t left = len - pos;
-        seglen = left > mss_local ? mss_local : left;
+        while (pos < len) {
+            u32_t left = len - pos;
+            seglen = left > seglen_max ? seglen_max : left;

-        if ((p = tcp_pbuf_prealloc_express(seglen, pcb, PBUF_ZEROCOPY, desc, NULL)) == NULL) {
-            goto memerr;
-        }
-        p->payload = (u8_t *)arg + pos;
-        queuelen++;
+            p = tcp_pbuf_prealloc_express(seglen, pcb, PBUF_ZEROCOPY, desc, NULL);
+            if (!p) {
+                goto memerr;
+            }
+            p->payload = data + pos;

-        if ((seg = tcp_create_segment(pcb, p, 0, pcb->snd_lbb + pos, optflags)) == NULL) {
-            tcp_tx_pbuf_free(pcb, p);
-            goto memerr;
-        }
+            seg = tcp_create_segment(pcb, p, 0, pcb->snd_lbb + total_len + pos, optflags);
+            if (!seg) {
+                tcp_tx_pbuf_free(pcb, p);
+                goto memerr;
+            }

-        if (queue == NULL) {
-            queue = seg;
-        } else {
-            prev_seg->next = seg;
+            if (!queue) {
+                queue = seg;
+            }
+            if (last) {
+                last->next = seg;
+            }
+            last = seg;
+
+            pos += seglen;
+            queuelen++;
         }

-        prev_seg = seg;
-        pos += seglen;
+        total_len += len;
     }

+    /* Set the PSH flag in the last segment that we enqueued. */
+    if (enable_push_flag && seg != NULL && seg->tcphdr != NULL) {
+        TCPH_SET_FLAG(seg->tcphdr, TCP_PSH);
+    }
+
 #if TCP_OVERSIZE
     pcb->unsent_oversize = 0;
 #endif /* TCP_OVERSIZE */

-    if (pcb->last_unsent == NULL) {
+    if (!pcb->last_unsent) {
         pcb->unsent = queue;
     } else {
+        /* The next field is either NULL or equals to queue, so we can overwrite. */
         pcb->last_unsent->next = queue;
     }
-    pcb->last_unsent = seg;
+    if (last) {
+        pcb->last_unsent = last;
+    }

-    /*
-     * Finally update the pcb state.
-     */
-    pcb->snd_lbb += len;
-    pcb->snd_buf -= len;
+    /* Update the pcb state. */
+    pcb->snd_lbb += total_len;
+    pcb->snd_buf -= total_len;
     pcb->snd_queuelen += queuelen;

-    /* Set the PSH flag in the last segment that we enqueued. */
-    if (enable_push_flag && seg != NULL && seg->tcphdr != NULL) {
-        TCPH_SET_FLAG(seg->tcphdr, TCP_PSH);
+    /* TODO Move Minshall's logic to tcp_output(). */
+    if (total_len < pcb->mss) {
+        const u32_t byte_queued = pcb->snd_nxt - pcb->lastack;
+        pcb->snd_sml_add = (pcb->unacked ? pcb->unacked->len : 0) + byte_queued;
     }

     return ERR_OK;

 memerr:
+    /* Error path - restore unsent queue. */
     pcb->flags |= TF_NAGLEMEMERR;
     if (queue != NULL) {
         tcp_tx_segs_free(pcb, queue);
     }
+    if (pcb->last_unsent && last_seglen > 0) {
+        pcb->last_unsent->next = NULL;
+        p = pcb->last_unsent->p;
+        while (last_seglen > 0) {
+            last_seglen -= p->len;
+            p = p->next;
+        }
+        if (p) {
+            pcb->last_unsent->len -= p->tot_len;
+            struct pbuf *ptmp = pcb->last_unsent->p;
+            while (ptmp) {
+                ptmp->tot_len -= p->tot_len;
+                if (ptmp->next == p) {
+                    ptmp->next = NULL;
+                }
+                ptmp = ptmp->next;
+            }
+            assert(pcb->last_unsent->len == last_seglen);
+            assert(pcb->last_unsent->p->tot_len == last_seglen);
+        }
+    }
     return ERR_MEM;
 }
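
The error path above restores the last unsent segment by cutting off the pbufs that were appended to it during the failed call. A minimal sketch of that chain-trimming step follows, assuming a hypothetical `struct buf` that keeps only the `next`, `len` and `tot_len` fields of a pbuf; `chain_trim()` is an illustrative name, not an lwIP or XLIO function. Unlike the diff, the sketch walks the chain with a separate counter so that the original length stays available to the caller, and it returns the detached tail so the caller can decide how to release it.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical minimal pbuf: only the fields the rollback touches. */
struct buf {
    struct buf *next;
    size_t len;     /* bytes in this buffer */
    size_t tot_len; /* bytes in this buffer plus all following ones */
};

/*
 * Keep the first keep_len bytes of the chain and detach the rest.
 * Returns the detached tail, or NULL if there is nothing to detach.
 * keep_len must fall on a buffer boundary.
 */
static struct buf *chain_trim(struct buf *head, size_t keep_len)
{
    struct buf *cut = head;
    size_t remaining = keep_len;

    /* Find the first buffer that lies beyond keep_len. */
    while (cut && remaining > 0) {
        assert(cut->len <= remaining);
        remaining -= cut->len;
        cut = cut->next;
    }
    if (!cut || cut == head) {
        return NULL; /* nothing was appended, or nothing would be kept */
    }

    /* Every kept buffer loses the bytes carried by the detached tail. */
    const size_t dropped = cut->tot_len;
    for (struct buf *b = head; b != cut;) {
        struct buf *next = b->next;
        b->tot_len -= dropped;
        if (next == cut) {
            b->next = NULL; /* unlink the tail from the last kept buffer */
        }
        b = next;
    }
    return cut;
}
```

For a chain of 4, 3 and 2 bytes, `chain_trim(head, 4)` leaves a single 4-byte buffer with `tot_len` 4 and returns the 3-byte buffer as the head of the detached tail.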

53 changes: 20 additions & 33 deletions src/core/sock/sockinfo_tcp.cpp
@@ -1010,20 +1010,10 @@ ssize_t sockinfo_tcp::tcp_tx(xlio_tx_call_attr_t &tx_arg)
                                                          is_non_file_zerocopy, errno_tmp);
         }

-        err = tcp_write_express(&m_pcb, tx_ptr, tx_size, &tx_arg.priv);
+        const struct iovec iov = {.iov_base = tx_ptr, .iov_len = tx_size};
+        err = tcp_write_express(&m_pcb, &iov, 1, &tx_arg.priv);
         if (unlikely(err != ERR_OK)) {
-            if (unlikely(err == ERR_CONN)) { // happens when remote drops during big write
-                si_tcp_logdbg("connection closed: tx'ed = %d", total_tx);
-                shutdown(SHUT_WR);
-                return tcp_tx_handle_partial_send_and_unlock(total_tx, EPIPE, is_dummy,
-                                                             is_non_file_zerocopy, errno_tmp);
-            }
-            if (unlikely(err != ERR_MEM)) {
-                // we should not get here...
-                BULLSEYE_EXCLUDE_BLOCK_START
-                si_tcp_logpanic("tcp_write return: %d", err);
-                BULLSEYE_EXCLUDE_BLOCK_END
-            }
+            // tcp_write_express() can return only ERR_MEM error.
             return tcp_tx_handle_partial_send_and_unlock(total_tx, EAGAIN, is_dummy,
                                                          is_non_file_zerocopy, errno_tmp);
         }
@@ -1161,9 +1151,13 @@ ssize_t sockinfo_tcp::tcp_tx_slow_path(xlio_tx_call_attr_t &tx_arg)
                                                          is_send_zerocopy, errno_tmp);
         }

-        err_t err = (apiflags & XLIO_TX_PACKET_ZEROCOPY)
-            ? tcp_write_express(&m_pcb, tx_ptr, tx_size, &tx_arg.priv)
-            : tcp_write(&m_pcb, tx_ptr, tx_size, apiflags, &tx_arg.priv);
+        err_t err;
+        if (apiflags & XLIO_TX_PACKET_ZEROCOPY) {
+            const struct iovec iov = {.iov_base = tx_ptr, .iov_len = tx_size};
+            err = tcp_write_express(&m_pcb, &iov, 1, &tx_arg.priv);
+        } else {
+            err = tcp_write(&m_pcb, tx_ptr, tx_size, apiflags, &tx_arg.priv);
+        }
         if (unlikely(err != ERR_OK)) {
             if (unlikely(err == ERR_CONN)) { // happens when remote drops during big write
                 si_tcp_logdbg("connection closed: tx'ed = %d", total_tx);
@@ -6050,30 +6044,23 @@ int sockinfo_tcp::tcp_tx_express(const struct iovec *iov, unsigned iov_len, uint
     mdesc.opaque = opaque_op;

     int bytes_written = 0;
-
-    lock_tcp_con();
-
-    err_t err;
     for (unsigned i = 0; i < iov_len; ++i) {
-        err = tcp_write_express(&m_pcb, iov[i].iov_base, iov[i].iov_len, &mdesc);
-        if (err != ERR_OK) {
-            /* The only error in tcp_write_express is a memory error
-             * In this version we don't implement any error recovery or avoidance
-             * mechanism and an error at this stage is irrecoverable.
-             * The considered alternatives are:
-             * - Setting the socket an error state (this is the one we chose here)
-             * - Rolling back any written buffers, i.e. recovering
-             * - Reserving the pbuf(s)/tcp_seg(s) before calling for tcp_write_express */
-            m_conn_state = TCP_CONN_ERROR;
-            m_error_status = ENOMEM;
-            return tcp_tx_handle_errno_and_unlock(ENOMEM);
-        }
         bytes_written += iov[i].iov_len;
     }
+
+    lock_tcp_con();
+
+    err_t err = tcp_write_express(&m_pcb, iov, iov_len, &mdesc);
+    if (unlikely(err != ERR_OK)) {
+        // The only error in tcp_write_express() is a memory error.
+        m_conn_state = TCP_CONN_ERROR;
+        m_error_status = ENOMEM;
+        return tcp_tx_handle_errno_and_unlock(ENOMEM);
+    }
     if (!(flags & XLIO_EXPRESS_MSG_MORE)) {
         tcp_output(&m_pcb);
     }

     unlock_tcp_con();

     return bytes_written;
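
The caller-side pattern in these hunks can be shown on its own: a single buffer is wrapped in a one-element `iovec`, and because the write either accepts the whole vector or fails, the byte count can be summed before the call. The helpers `iov_from_buffer()` and `iov_total_len()` below are hypothetical names used only for illustration; they are not part of XLIO.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/uio.h>

/* Wrap one contiguous buffer in an iovec element (no copy is made). */
static struct iovec iov_from_buffer(void *buf, size_t len)
{
    struct iovec iov = {.iov_base = buf, .iov_len = len};
    return iov;
}

/* Sum the lengths of all elements; this is the byte count of a successful write. */
static size_t iov_total_len(const struct iovec *iov, unsigned iovcnt)
{
    size_t total = 0;

    assert(iov != NULL || iovcnt == 0);
    for (unsigned i = 0; i < iovcnt; ++i) {
        total += iov[i].iov_len;
    }
    return total;
}
```

A caller holding a plain pointer and size would pass `&iov, 1` for the vector arguments, as `tcp_tx()` and `tcp_tx_slow_path()` do in the diff.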