Flow timeout timing/v16.1 #12313

Draft
wants to merge 18 commits into
base: master
from

Conversation

victorjulien
Member

More precise flow timeout enforcement.

Cleans up unix socket initialization.

SV_BRANCH=OISF/suricata-verify#2203

Replaces #12278:
Tried to improve the commits and a number of code comments to make it easier to follow what has changed.
Also reorganized the commits a bit.

https://redmine.openinfosecfoundation.org/issues/7455

codecov bot commented Dec 20, 2024

Codecov Report

Attention: Patch coverage is 90.34483% with 14 lines in your changes missing coverage. Please review.

Project coverage is 83.26%. Comparing base (6f937c7) to head (98ad048).

Additional details and impacted files
@@           Coverage Diff           @@
##           master   #12313   +/-   ##
=======================================
  Coverage   83.26%   83.26%           
=======================================
  Files         912      912           
  Lines      257643   257693   +50     
=======================================
+ Hits       214521   214577   +56     
+ Misses      43122    43116    -6     
Flag Coverage Δ
fuzzcorpus 61.13% <43.57%> (-0.01%) ⬇️
livemode 19.54% <46.42%> (+0.14%) ⬆️
pcap 44.38% <84.28%> (-0.04%) ⬇️
suricata-verify 62.88% <90.71%> (+0.02%) ⬆️
unittests 59.17% <18.18%> (-0.02%) ⬇️


@suricata-qa

Information:

ERROR: QA failed on SURI_TLPW2_autofp_suri_time.

field baseline test %
SURI_TLPW1_stats_chk
.app_layer.flow.dhcp 563 600 106.57%

Pipeline 24036

When a thread fails to spawn, include the thread name in the error
message.
No longer init then deinit part of the engine at startup of the unix
socket mode.
Timeout checks would access certain fields w/o locking, which could lead
to thread safety issues.
Can be used to log when the tcp session reuse logic triggers.
Rename to be consistent with other naming:

STREAM_PKT_FLAG_TCP_PORT_REUSE -> STREAM_PKT_FLAG_TCP_SESSION_REUSE
Use a more precise calculation for timing out flows, using both the
seconds and the micro seconds.

Ticket: OISF#7455.
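
For readers following along, a minimal sketch of a timeout check that uses both the seconds and the microseconds. The names `ExampleTime` and `ExampleFlowTimedOut` are hypothetical and not taken from the patch, and the inclusive boundary is an assumption:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical timestamp type: whole seconds plus microseconds. */
typedef struct ExampleTime {
    uint64_t secs;
    uint32_t usecs; /* expected range: 0..999999 */
} ExampleTime;

/* Return true once `timeout_secs` have fully elapsed since `lastts`.
 * Assumption: the flow expires exactly at lastts + timeout (inclusive). */
static bool ExampleFlowTimedOut(ExampleTime lastts, uint32_t timeout_secs, ExampleTime now)
{
    assert(lastts.usecs < 1000000 && now.usecs < 1000000);
    const uint64_t expire_secs = lastts.secs + timeout_secs;
    if (now.secs != expire_secs)
        return now.secs > expire_secs;
    /* Same second as the expiry time: the microseconds decide. */
    return now.usecs >= lastts.usecs;
}
```

With a last-seen time of 100.500000 and a 30 second timeout, this sketch reports the flow as still alive at 130.499999 and as timed out at 130.500000, a distinction that a comparison on whole seconds cannot make.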
The flow worker needs to get the opportunity to run the flow update
before globally making its current timestamp available. This is to
avoid another thread using the time to evict the flow that is about to
get a legitimate update.

Ticket: OISF#7455.
If a thread doesn't receive packets for a while the packet timestamp
will no longer be used to determine a reasonable minimum timestamp for
flow timeout handling.

To avoid the minimum timestamp being set a bit too aggressively,
increase the time a thread can be inactive.
Flow Manager skips rows based on a minimized tracker that tracks the
next second at which the first flow may time out.

If seconds match a flow can still be timing out.
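
The row-skip condition described above can be sketched as follows. `ExampleRowNeedsCheck` and its parameters are hypothetical names used for illustration only; the sketch assumes the tracker stores the earliest second at which a flow in the row may expire:

```c
#include <stdbool.h>
#include <stdint.h>

/* `next_ts` is the tracked second at which the first flow in the row may
 * time out; `now_secs` is the seconds part of the current timestamp.
 * The row is skipped only while `now_secs` is strictly before `next_ts`:
 * when the seconds are equal, a flow in the row can already be timing out. */
static bool ExampleRowNeedsCheck(uint32_t next_ts, uint32_t now_secs)
{
    return now_secs >= next_ts;
}
```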
When timing out flows, use the timestamp from the "owning" thread. This
avoids problems with threads being out of sync with each other.

Ticket: OISF#7455.
@victorjulien victorjulien marked this pull request as draft December 20, 2024 15:49
As this may mean that a thread's ts is a bit ahead of the minimum time
the flow manager normally uses, it can evict flows a bit faster.

Ticket: OISF#7455.
Until now many accesses to the Thread structure required taking a global
lock, leading to performance issues. In practice this only happened in
offline mode.

This patch adds a finer grained locking scheme. It assumes that the
Thread object itself cannot disappear, and adds a spinlock to protect
updates to the structure.

Additionally, the `pktts` field is made an atomic, so that it can be
read w/o taking the spinlock. Updates to it are still done under lock.
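
A rough sketch of this locking pattern, written with C11 `<stdatomic.h>` rather than the project's own lock and atomic wrappers. `ExampleThread`, its fields other than the idea of `pktts`, and the helper functions are hypothetical and do not reflect the actual `Thread` layout:

```c
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical per-thread record. */
typedef struct ExampleThread {
    atomic_flag lock;       /* spinlock protecting updates to the record */
    _Atomic uint64_t pktts; /* packet timestamp, readable without the lock */
    uint64_t updates;       /* plain field, only touched under the lock */
} ExampleThread;

#define EXAMPLE_THREAD_INIT { ATOMIC_FLAG_INIT, 0, 0 }

/* Writers take the spinlock, so all updates to the record are serialized. */
static void ExampleThreadSetTs(ExampleThread *t, uint64_t ts)
{
    while (atomic_flag_test_and_set_explicit(&t->lock, memory_order_acquire)) {
        /* spin until the lock is released */
    }
    atomic_store_explicit(&t->pktts, ts, memory_order_release);
    t->updates++;
    atomic_flag_clear_explicit(&t->lock, memory_order_release);
}

/* Readers only need the timestamp, so they skip the lock entirely. */
static uint64_t ExampleThreadGetTs(ExampleThread *t)
{
    return atomic_load_explicit(&t->pktts, memory_order_acquire);
}
```

The reader path never contends on the spinlock, which is the point of making the timestamp atomic while keeping writes under lock.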
The idea of sealing the thread store is that its members can be accessed
w/o holding a lock to the whole store at runtime.
Since `Thread` objects are part of a big allocation, more than one
Thread could be on a single cache line, leading to false sharing. Atomic
updates to one `Thread` could then lead to poor performance accessing
another `Thread`. Align to CLS (cache line size) to avoid this.
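
To illustrate the alignment idea, a small sketch using C11 `alignas`. The 64 byte value of `EXAMPLE_CLS`, the struct `ExampleAlignedThread`, and the helper `ExampleSameCacheLine` are assumptions for illustration; the patch uses the project's own CLS definition:

```c
#include <assert.h>
#include <stdalign.h>
#include <stdbool.h>
#include <stdint.h>

#define EXAMPLE_CLS 64 /* assumed cache line size in bytes */

/* Aligning the first member raises the alignment of the whole struct,
 * and the compiler pads its size to a multiple of that alignment. */
typedef struct ExampleAlignedThread {
    alignas(EXAMPLE_CLS) _Atomic uint64_t pktts;
    int type;
} ExampleAlignedThread;

static_assert(alignof(ExampleAlignedThread) == EXAMPLE_CLS, "struct must be CLS aligned");
static_assert(sizeof(ExampleAlignedThread) % EXAMPLE_CLS == 0, "struct must fill whole lines");

/* True if both addresses fall on the same (assumed) cache line. */
static bool ExampleSameCacheLine(const void *a, const void *b)
{
    return (uintptr_t)a / EXAMPLE_CLS == (uintptr_t)b / EXAMPLE_CLS;
}
```

In an array of such structs, neighbouring elements no longer share a line, provided the array itself starts on a line boundary; for heap memory that means an aligned allocation such as `aligned_alloc` or `posix_memalign`.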
Some checks can be done w/o holding a lock:
- seeing if the flow matches the packet
- if the hash row needs a timeout check

This patch skips taking a lock in these conditions.
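
A simplified sketch of the first case, checking the flow against the packet before taking any lock. All names here (`ExampleFlow`, `ExamplePacket`, `ExampleFlowUpdate`) are hypothetical, and the sketch assumes the tuple fields are never modified after the flow is created, which is what makes the unlocked comparison safe:

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

typedef struct ExamplePacket {
    uint32_t src, dst;
    uint16_t sp, dp;
    uint8_t proto;
} ExamplePacket;

typedef struct ExampleFlow {
    /* Assumed immutable after creation: safe to read without the lock. */
    uint32_t src, dst;
    uint16_t sp, dp;
    uint8_t proto;
    atomic_flag lock; /* spinlock protecting the mutable state below */
    uint64_t pkts;
} ExampleFlow;

/* Lock-free pre-check: compares only the immutable tuple. */
static bool ExampleFlowMatches(const ExampleFlow *f, const ExamplePacket *p)
{
    return f->src == p->src && f->dst == p->dst && f->sp == p->sp &&
           f->dp == p->dp && f->proto == p->proto;
}

/* Returns true if the packet was accounted to the flow. */
static bool ExampleFlowUpdate(ExampleFlow *f, const ExamplePacket *p)
{
    if (!ExampleFlowMatches(f, p))
        return false; /* mismatch: the lock is never taken */
    while (atomic_flag_test_and_set_explicit(&f->lock, memory_order_acquire)) {
        /* spin until the lock is released */
    }
    f->pkts++;
    atomic_flag_clear_explicit(&f->lock, memory_order_release);
    return true;
}
```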
Explain meaning of `ts` in flow manager's main loop.
@victorjulien victorjulien force-pushed the flow-timeout-timing/v16.1 branch from acefd1d to 98ad048 Compare December 20, 2024 16:06
@suricata-qa

Information:

ERROR: QA failed on SURI_TLPW2_autofp_suri_time.

field baseline test %
SURI_TLPW1_stats_chk
.app_layer.flow.dhcp 563 600 106.57%

Pipeline 24044
