issue: 3586273 Use XLIO_DEFERRED_CLOSE by default #136

Open
wants to merge 163 commits into
base: vNext

iftahl (Contributor) commented Apr 14, 2024

For incoming sockets - no change.
For outgoing sockets - an outgoing socket occupies a local port, so the port should be released in the socket destructor to prevent a race in which another socket takes the same port.
This race might cause XLIO to hold two sockets with the same RFS object at the same time, which is fatal (particularly for TCP).
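
A minimal sketch of the idea, not the XLIO implementation: the local port stays reserved for the whole lifetime of the socket object and is released only in the destructor, so a deferred close cannot let a second socket take the same port while the first object is still alive. `port_registry` and `outgoing_socket` are hypothetical names used only for illustration.

```cpp
#include <cassert>
#include <cstdint>
#include <memory>
#include <set>

// Hypothetical registry of local ports held by outgoing sockets.
class port_registry {
public:
    // Returns true if the port was free and is now reserved.
    bool reserve(uint16_t port) { return m_used.insert(port).second; }
    void release(uint16_t port) { m_used.erase(port); }
    bool in_use(uint16_t port) const { return m_used.count(port) != 0; }

private:
    std::set<uint16_t> m_used;
};

// Hypothetical outgoing socket: the port is tied to the object's lifetime.
class outgoing_socket {
public:
    outgoing_socket(port_registry &reg, uint16_t port)
        : m_reg(reg), m_port(port), m_owns_port(reg.reserve(port)) {}

    outgoing_socket(const outgoing_socket &) = delete;
    outgoing_socket &operator=(const outgoing_socket &) = delete;

    // Release the port only here, when the object is really destroyed.
    ~outgoing_socket() {
        if (m_owns_port) {
            m_reg.release(m_port);
        }
    }

    // Deferred close: marks the socket closed but keeps the port reserved.
    void close() { m_closed = true; }

    bool owns_port() const { return m_owns_port; }
    bool is_closed() const { return m_closed; }

private:
    port_registry &m_reg;
    uint16_t m_port;
    bool m_owns_port;
    bool m_closed = false;
};
```

In this sketch, a second socket that asks for the same port after `close()` but before destruction fails to reserve it, which is the property the description relies on.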

Change type

What kind of change does this PR introduce?

  • Bugfix
  • Feature
  • Code style update
  • Refactoring (no functional changes, no api changes)
  • Build related changes
  • CI related changes
  • Documentation content changes
  • Tests
  • Other

Check list

  • Code follows the de facto style guidelines of this project
  • Comments have been inserted in hard to understand places
  • Documentation has been updated (if necessary)
  • Test has been added (if possible)

AlexanderGrissik and others added 30 commits January 14, 2024 16:02
Signed-off-by: Alexander Grissik <[email protected]>
At most a single element of this vector is always used.
Once rfs constructor is complete there must be exactly one attach_flow_data element in case of ring_simple.
For ring_tap this element remains null.

Signed-off-by: Alexander Grissik <[email protected]>
Signed-off-by: Alexander Grissik <[email protected]>
Set errno to ETIMEDOUT and return -1 from recv when a socket has timed out, instead of returning 0 with errno 0.
For instance, on a TCP keep-alive timeout.

Signed-off-by: Alexander Grissik <[email protected]>
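
From the caller's side, the change above separates a timeout from an orderly shutdown, which recv conventionally reports by returning 0. A hedged sketch of how an application might classify the result; `recv_status` and `classify_recv` are hypothetical helpers, not part of XLIO:

```cpp
#include <cassert>
#include <cerrno>
#include <sys/types.h>

// Hypothetical classification of a recv() result.
enum class recv_status { data, peer_closed, timed_out, would_block, error };

// rc is the value returned by recv(); err is errno captured right after the call.
inline recv_status classify_recv(ssize_t rc, int err) {
    if (rc > 0) {
        return recv_status::data;
    }
    if (rc == 0) {
        return recv_status::peer_closed; // orderly shutdown by the peer
    }
    if (err == ETIMEDOUT) {
        return recv_status::timed_out; // e.g. TCP keep-alive timeout
    }
    if (err == EAGAIN || err == EWOULDBLOCK) {
        return recv_status::would_block; // non-blocking socket, no data yet
    }
    return recv_status::error;
}
```

With the previous behavior (0 return value and 0 errno), a timed-out socket would have fallen into the `peer_closed` branch.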
The idea is to scan all rpm/deb packages for personal emails;
we should not be releasing packages with such emails.
The scan is done on both the metadata info and the changelog
of a specific package.

Issue: HPCINFRA-919
Signed-off-by: Daniel Pressler <[email protected]>
pasis and others added 15 commits April 1, 2024 00:09
reclaim_recv_single_buffer() accumulates buffers in a list. In the
performance oriented API we want to reuse hot buffers immediately, so
reclaim_recv_buffers() implementation is more suitable.

Signed-off-by: Dmytro Podgornyi <[email protected]>
The memory callback provides hugepage size of the underlying pages.
Replace hardcoded 0 with real hugepage size.

Keep the page size in the xlio_allocator object. This field is relevant only
to the hugepage allocation method and is 0 in all other cases.

Signed-off-by: Dmytro Podgornyi <[email protected]>
XLIO Socket API must guarantee that the XLIO_SOCKET_EVENT_TERMINATED is
not followed by any other events. Therefore, all the TX completion
events must be completed by that moment.

Do a polling iteration before calling socket destructor to increase the
chance that all the relevant WQEs are completed. This mechanism needs to
be improved in the future.

Signed-off-by: Dmytro Podgornyi <[email protected]>
xlio_init_ex() changes some default parameters. However, a global object
can trigger the safe_mce_sys() constructor at the start. Therefore, we need
to re-read the environment variables to guarantee that the changed
parameters take effect.

Signed-off-by: Dmytro Podgornyi <[email protected]>
Avoid using connect() with sock fd interface, because fd_collection
doesn't keep xlio_socket_t objects.

Signed-off-by: Dmytro Podgornyi <[email protected]>
xlio_socket_t objects aren't connected to the fd_collection anymore.
Therefore, all the methods must be called from the sockinfo_tcp objects
directly.

Also, xlio_socket_fd() is not relevant anymore and can be removed.

Signed-off-by: Dmytro Podgornyi <[email protected]>
Fix iteration over a std::list of TCP sockets when a
socket is erased during the iteration.
Resolved by advancing the iterator before the erase.

Signed-off-by: Iftah Levi <[email protected]>
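
The pattern described in this commit can be sketched as follows. This is an illustration under assumptions, not the actual XLIO code: `tcp_socket_stub` and `erase_closable` are hypothetical names. Erasing a std::list element invalidates only iterators to that element, so advancing the loop iterator before the erase keeps it valid.

```cpp
#include <cassert>
#include <cstddef>
#include <list>

// Hypothetical stand-in for a TCP socket entry.
struct tcp_socket_stub {
    int id;
    bool closable;
};

// Erase every closable socket while iterating over the list.
inline std::size_t erase_closable(std::list<tcp_socket_stub> &socks) {
    std::size_t erased = 0;
    for (auto it = socks.begin(); it != socks.end();) {
        auto cur = it++; // advance first, so erasing cur leaves it valid
        if (cur->closable) {
            socks.erase(cur);
            ++erased;
        }
    }
    return erased;
}
```

An equivalent form is `it = socks.erase(it)`, which uses the iterator returned by `erase`.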
rdma-core limits the number of UARs per context to 16 by default. After
creating 16 QPs, XLIO receives duplicate blueflame registers for
each subsequent QP. As a result, the blueflame doorbell method can write WQEs
concurrently without serialization, which leads to data corruption.

BlueFlame can hurt throughput, since the copy to the blueflame
register is expensive. It can improve latency in some low-latency
scenarios; however, XLIO targets high traffic/PPS rates.
Removing the blueflame method also slightly improves performance in some
scenarios.

BlueFlame can be restored in the future to improve low-latency
scenarios; however, it will need some rework to avoid the data
corruption.

Signed-off-by: Dmytro Podgornyi <[email protected]>
The inline WQE branch is not likely in most throughput scenarios.

Signed-off-by: Dmytro Podgornyi <[email protected]>
Avoid calling register_socket_timer_event when a socket is already registered (TIME-WAIT).
Although this causes no functional issue, it posts events to internal-thread at too high a rate.
This leads to lock contention inside internal-thread and degrades HTTP CPS performance.

Signed-off-by: Alexander Grissik <[email protected]>
Signed-off-by: Gal Noam <[email protected]>
UTLS uses tcp_tx_express() for non blocking sockets. However, this TX
method doesn't support XLIO_RX_POLL_ON_TX_TCP. Additional RX polling
improves scenarios such as WEB servers.

Insert RX polling into UTLS TX path to resolve performance degradation.

Signed-off-by: Dmytro Podgornyi <[email protected]>
In heavy CPS scenarios a socket may go to TIME-WAIT state and be reused before the first TCP timer registration is performed by internal-thread.
1. Setting timer_registered=true while posting the event prevents a second attempt to post the event.
2. Adding a sanity check in add_new_timer that verifies that the socket is not already in the timer map.

Signed-off-by: Alexander Grissik <[email protected]>
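
The two steps above can be sketched as below. Only `timer_registered` and `add_new_timer` come from the commit message; the classes, the event queue, and the map layout are assumptions made for illustration, and the locking a real multi-threaded implementation needs is omitted.

```cpp
#include <cassert>
#include <cstddef>
#include <unordered_map>
#include <vector>

// Hypothetical socket stub carrying the registration flag.
struct sock_stub {
    int id = 0;
    bool timer_registered = false;
};

// Hypothetical single-threaded model of the internal thread's timer handling.
class timer_thread_stub {
public:
    // Step 1: set the flag while posting, not when the event is handled,
    // so a reused socket cannot post a second registration event.
    bool post_register_event(sock_stub &s) {
        if (s.timer_registered) {
            return false;
        }
        s.timer_registered = true;
        m_pending.push_back(&s);
        return true;
    }

    // Handle all posted events; returns how many timers were added.
    std::size_t process_events() {
        std::size_t added = 0;
        for (sock_stub *s : m_pending) {
            if (add_new_timer(*s)) {
                ++added;
            }
        }
        m_pending.clear();
        return added;
    }

    // Step 2: sanity check, refuse a socket that is already in the timer map.
    bool add_new_timer(sock_stub &s) { return m_timers.emplace(s.id, &s).second; }

    std::size_t timer_count() const { return m_timers.size(); }

private:
    std::vector<sock_stub *> m_pending;
    std::unordered_map<int, sock_stub *> m_timers;
};
```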
Added new env parameter - XLIO_MAX_TSO_SIZE.
It allows the user to control maximum size of TSO,
instead of taking the maximum cap by HW.
The default size is 256KB (maximum by current HW).
Values higher than HW capabilities won't be taken into account.

Signed-off-by: Iftah Levi <[email protected]>
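
One way to read the rule above is that the effective TSO size is the user's value capped by the HW limit. A sketch under that assumption: the function name, the plain decimal byte-count parsing, and the fallback on unparsable input are hypothetical and are not XLIO's actual parsing code.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <cstdlib>

// Default from the commit message: 256KB.
constexpr uint32_t DEFAULT_MAX_TSO_SIZE = 256U * 1024U;

// env_value: raw string of XLIO_MAX_TSO_SIZE, or nullptr when unset.
// hw_cap: maximum TSO size reported by the device.
inline uint32_t effective_max_tso(const char *env_value, uint32_t hw_cap) {
    unsigned long requested = DEFAULT_MAX_TSO_SIZE;
    if (env_value != nullptr && *env_value != '\0') {
        char *end = nullptr;
        unsigned long parsed = std::strtoul(env_value, &end, 10);
        // Assumption: accept only a fully parsed, non-zero decimal number.
        if (end != env_value && *end == '\0' && parsed > 0) {
            requested = parsed;
        }
    }
    // Values above the HW capability are not taken into account.
    return static_cast<uint32_t>(std::min<unsigned long>(requested, hw_cap));
}
```

A caller could pass `std::getenv("XLIO_MAX_TSO_SIZE")` as the first argument.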
Signed-off-by: Gal Noam <[email protected]>
@iftahl iftahl requested a review from AlexanderGrissik April 14, 2024 08:26
@iftahl iftahl force-pushed the deferred_close branch 3 times, most recently from b70ab11 to 72c5bc9 Compare April 14, 2024 18:01
For incoming sockets - no change.
For outgoing sockets - an outgoing socket occupies a local port, so the
port should be released in the socket destructor to prevent a race in
which another socket takes the same port.
This race might cause XLIO to hold two sockets with the same RFS object at
the same time, which is fatal (particularly for TCP).

Signed-off-by: Iftah Levi <[email protected]>
For tests related to bind()

Signed-off-by: Iftah Levi <[email protected]>
iftahl (Contributor, Author) commented Apr 17, 2024

bot:retest

1 similar comment
galnoam (Collaborator) commented Jun 5, 2024

bot:retest

Labels
postponed Postponed for further decisions
8 participants