Jitter in RT communication under system load (Linux) #1374
4 comments · 23 replies
-
@Schrolli91 thanks for doing these tests. Could you please repeat them with … Another question: does your custom implementation also have a broker? If yes, does it run on the same core as one of the clients?
-
@Schrolli91 I played around with it over the long weekend, and at the moment I suspect our lock-free algorithms. They have the advantage that we do not require a mutex, which is perfect for a safety-critical system, but they may have the disadvantage of taking longer under high CPU load. In most lock-free structures you find some kind of loop: the work is performed, and if the structure is unchanged after the work is done, the change is committed; but if the structure was modified in the meantime, you start again. If this is the cause, we could implement a queue guarded by a mutex, which does not have this problem at all. That might solve your problem, but it would not be usable in a safety-critical context, since blocking calls are forbidden.
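For illustration, the retry pattern described above can be sketched like this (a hypothetical minimal example, not iceoryx's actual queue code): the loop computes a new state and tries to publish it with a single compare-and-swap; under heavy contention the CAS fails and the loop restarts, which is where load-dependent latency can creep in.

```cpp
#include <atomic>
#include <cstdint>

// Shared state that concurrent producers try to advance.
std::atomic<std::uint64_t> head{0};

// Claim the next slot index with a CAS retry loop. Single-threaded this
// succeeds on the first attempt; under contention, compare_exchange_weak
// fails, reloads 'expected' with the freshly observed value, and the loop
// retries -- an unbounded number of times in the worst case.
std::uint64_t reserveSlot()
{
    std::uint64_t expected = head.load(std::memory_order_relaxed);
    std::uint64_t desired;
    do
    {
        desired = expected + 1; // the "work": compute the next state
    } while (!head.compare_exchange_weak(expected, desired,
                                         std::memory_order_acq_rel,
                                         std::memory_order_relaxed));
    return desired - 1; // the index that was atomically claimed
}
```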
-
You can mitigate this problem by starting RouDi with
-
@Schrolli91 I created issue #1436, which should solve your jitter problem when you turn off the monitoring in RouDi. I would like to add you as a reviewer, if that's alright with you, so that you can test whether the jitter is really gone.
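If I remember correctly, RouDi exposes a monitoring switch on its command line; the exact option name below is an assumption, so please verify it against your build:

```shell
# Assumed: iox-roudi's monitoring can be disabled via its monitoring-mode
# option. Check `iox-roudi --help` for the exact spelling in your version.
iox-roudi --monitoring-mode off
```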
-
Hey folks,
we are currently investigating the suitability of Iceoryx for applications with hard real-time requirements under Linux. We noticed a lot of unexpected jitter, especially compared to a custom in-house library that uses a plain shared-memory mechanism. After some tests we are not sure where this extreme jitter comes from; it occurs mainly when the system is under load.
The only difference we see compared to our simpler tool is the use of atomic operations (CAS) in the lock-free queue handling and chunk management. According to the "perf" tool, these operations have the greatest overhead and therefore the highest impact on latency during pub/sub communication.
Maybe one of you has an idea why Iceoryx is so much more susceptible to jitter, or what we should investigate further.
Best regards
Operating system:
Debian 11 - Kernel 5.10.84-rt58
Iceoryx Version 2.0.1
Hardware: Industrial PC with Quadcore CPU
Compiler version:
gcc 10.2.1
Observed result or behavior:
We see a lot of jitter within the communication. This occurs especially under increased system load.
A custom lib based on a simple "shmem" approach (p2p/one-way) does not show this extreme jitter.
Expected result or behavior:
Deterministic transmission times of the packets by the usage of Iceoryx.
Conditions where it occurred / Performed steps:
Test Application:
Tests were executed with a simple Publisher/Subscriber application based on provided examples.
Both applications subscribe to each other and ping-pong messages back and forth.
One million packets were sent for warm-up, and then RTTs were measured individually for 10 million messages.
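The measurement loop described above could be sketched roughly as follows (a hypothetical harness, not the actual test code; `pingPong` stands in for one publish plus blocking receive round trip over the Iceoryx untyped API):

```cpp
#include <algorithm>
#include <chrono>
#include <cstdint>
#include <vector>

// Run 'warmup' untimed round trips, then record the RTT of 'samples'
// further round trips in nanoseconds.
template <typename PingPongFn>
std::vector<std::int64_t> measureRtt(PingPongFn pingPong,
                                     std::uint64_t warmup,
                                     std::uint64_t samples)
{
    using clock = std::chrono::steady_clock;

    for (std::uint64_t i = 0; i < warmup; ++i)
    {
        pingPong(); // warm up caches, page faults, allocator state
    }

    std::vector<std::int64_t> rtts;
    rtts.reserve(samples);
    for (std::uint64_t i = 0; i < samples; ++i)
    {
        auto start = clock::now();
        pingPong();
        auto stop = clock::now();
        rtts.push_back(
            std::chrono::duration_cast<std::chrono::nanoseconds>(stop - start)
                .count());
    }
    return rtts;
}
```

Jitter can then be reported from the collected samples, e.g. as max minus min, or as the spread between percentiles.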
System configuration:
RouDi
Affinity:
taskset -c 2 chrt -f 80
Prio:
FIFO 80
Pinned to Core 2
Publisher
Affinity:
taskset -c 2 chrt -f 80
Prio:
FIFO 80
Pinned to Core 2
Subscriber
Affinity:
taskset -c 3 chrt -f 80
Prio:
FIFO 80
Pinned to Core 3
Stress/Systemload is generated via:
stress-ng --cpu 4 --io 2 --vm 2 --vm-bytes 128M --fork 4 --timeout 0
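Putting the configuration above together, the setup could be reproduced roughly like this (the binary names are placeholders; the `taskset`/`chrt` combination and the stress-ng invocation are taken from the configuration listed above):

```shell
# RouDi and the publisher pinned to core 2, the subscriber to core 3,
# all with SCHED_FIFO priority 80 (requires root or CAP_SYS_NICE).
taskset -c 2 chrt -f 80 ./iox-roudi &
taskset -c 2 chrt -f 80 ./publisher &
taskset -c 3 chrt -f 80 ./subscriber &

# Generate background load (CPU, I/O, memory, fork pressure) indefinitely:
stress-ng --cpu 4 --io 2 --vm 2 --vm-bytes 128M --fork 4 --timeout 0
```

Note that RouDi and the publisher share core 2, so the publisher competes with RouDi for that core under SCHED_FIFO.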
Here is a comparison of the jitter we observed with Iceoryx versus our simpler internal library.
(The absolute values are not the problem; the enormous jitter with Iceoryx is.)
Iceoryx (untyped API) – polling – 80 bytes – with stress
Custom implementation – polling – 80 bytes – with stress