Possible memory leak in Cyclone/Iceoryx subscriber history queue #471
Comments
This sounds like an issue with Iceoryx more than an issue with Cyclone itself: we just use the Iceoryx API to publish data and subscribe to it. It would be good to see whether the Iceoryx guys agree with that initial assessment. I am not sure whom to ping for help, perhaps @elBoberido?
@ksuszka what do you mean by aborting the execution abruptly? Is there still a graceful shutdown, or is the application killed? @eboasson this error happens when the chunks taken from the subscriber are not released. Every time a chunk is taken out of the queue, it is internally stored in a fixed-size used-chunk list. When the list is full and there is no space for further tracking, the subscriber immediately releases the chunk and returns the TOO_MANY_CHUNKS_HELD_IN_PARALLEL error.
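A minimal sketch of that mechanism, using the iceoryx C++ subscriber API directly rather than the code path rmw_cyclonedds actually takes; the service description, payload type, and queue capacity below are made-up values for illustration:

```cpp
#include "iceoryx_posh/popo/subscriber.hpp"
#include "iceoryx_posh/runtime/posh_runtime.hpp"

#include <chrono>
#include <cstdint>
#include <iostream>
#include <thread>

struct CounterTopic
{
    uint64_t counter;
};

int main()
{
    iox::runtime::PoshRuntime::initRuntime("iox-slow-subscriber-sketch");

    // Hypothetical service description and queue capacity.
    iox::popo::SubscriberOptions options;
    options.queueCapacity = 250U;
    iox::popo::Subscriber<CounterTopic> subscriber({"Example", "Counter", "Data"}, options);

    while (true)
    {
        subscriber.take()
            .and_then([](auto& sample) {
                // While `sample` is alive the underlying chunk occupies a slot in the
                // subscriber's fixed-size used-chunk list; the slot is freed when the
                // Sample goes out of scope at the end of this lambda.
                std::cout << "received: " << sample->counter << std::endl;
            })
            .or_else([](auto& result) {
                if (result == iox::popo::ChunkReceiveResult::TOO_MANY_CHUNKS_HELD_IN_PARALLEL)
                {
                    // The used-chunk list was already full, so iceoryx released the
                    // chunk immediately and reports the condition here.
                    std::cerr << "too many chunks held in parallel" << std::endl;
                }
            });

        std::this_thread::sleep_for(std::chrono::milliseconds(10));
    }
}
```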
Writing here just to link the issues ;) We are also experiencing memory leaks that may or may not be related to CycloneDDS; more information here: ros2/geometry2#630 (comment)
FYI, for us (with @nachovizzo) the root cause was actually with tf2: ros2/geometry2#636
Bug report
Required Info:
Steps to reproduce issue
I prepared a separate repo with two very simple applications (a publisher and a subscriber) to show the issue. https://github.com/ksuszka/cyclonedds_iceoryx_memory_leak/tree/chunks-leak
Using this repo:
Build the Docker image:
Open four terminal windows.
In the first terminal window run:
In the second terminal window run:
In the third terminal window run:
In the fourth terminal window again run:
And wait a minute.
After some time you will most likely start to get errors:
Expected behavior
Messages which cannot be processed due to a slow subscriber are dropped silently.
Actual behavior
Messages which cannot be processed due to a slow subscriber are dropped silently for a few seconds and then Iceoryx errors start to appear.
Additional information
In this example a really slow subscriber has a QoS with a history depth (250) slightly smaller than the maximum history depth available in the precompiled ros-humble-iceoryx-* package (256). When the subscriber's queue fills up it should stay at a constant size, and that is mostly the case if there is a single very fast publisher. With multiple parallel publishers, however, it starts to leak slowly, which can be observed with the iox-introspection-client (if it is compiled separately).
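For reference, a minimal sketch (not the code from the linked repository) of the kind of slow subscriber described here, assuming a hypothetical `chatter` topic with `std_msgs/String` messages:

```cpp
#include <chrono>
#include <thread>

#include "rclcpp/rclcpp.hpp"
#include "std_msgs/msg/string.hpp"

int main(int argc, char** argv)
{
  rclcpp::init(argc, argv);
  auto node = rclcpp::Node::make_shared("slow_subscriber");

  // KEEP_LAST history of 250, slightly below the 256 limit of the
  // precompiled ros-humble-iceoryx-* packages.
  auto qos = rclcpp::QoS(rclcpp::KeepLast(250));

  auto sub = node->create_subscription<std_msgs::msg::String>(
    "chatter", qos,
    [](std_msgs::msg::String::ConstSharedPtr msg) {
      (void)msg;
      // Simulate a very slow consumer so the history queue stays full.
      std::this_thread::sleep_for(std::chrono::milliseconds(500));
    });

  rclcpp::spin(node);
  rclcpp::shutdown();
  return 0;
}
```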
The issue is easily reproducible if publishers abort execution abruptly (this method is used in the example repository); however, AFAIK that is not a requirement for the issue to occur. We first noticed the leaking chunks in our system, which has a few dozen nodes, and then tried to find an easily reproducible, simple case.
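To make the abrupt-termination scenario concrete, a hypothetical publisher along these lines (again not the code from the linked repository) would publish a burst of messages and then kill the process without any graceful shutdown:

```cpp
#include <chrono>
#include <cstdlib>
#include <thread>

#include "rclcpp/rclcpp.hpp"
#include "std_msgs/msg/string.hpp"

int main(int argc, char** argv)
{
  rclcpp::init(argc, argv);
  auto node = rclcpp::Node::make_shared("abrupt_publisher");
  auto pub = node->create_publisher<std_msgs::msg::String>(
    "chatter", rclcpp::QoS(rclcpp::KeepLast(10)));

  std_msgs::msg::String msg;
  msg.data = "hello";
  for (int i = 0; i < 1000; ++i)
  {
    pub->publish(msg);
    std::this_thread::sleep_for(std::chrono::milliseconds(1));
  }

  // Terminate abruptly: no rclcpp::shutdown(), no destructors, so nothing is
  // released cooperatively.
  std::abort();
}
```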
For more background, we found this issue due to another possible bug: ros2/geometry2#630. That bug makes tf_buffer a really slow reader of the /parameter_events topic. This topic has a QoS history depth of 1000, so it cannot even be handled at the moment with the default Iceoryx limits. We recompiled Iceoryx with a history depth of 4096 and the system seemed to work fine, but after a few hours we started to get errors that too many chunks were held in parallel on the /parameter_events topic, which didn't make sense. We then observed with the iox-introspection-client that if you start a simple node with the default parameter handling, and then start spawning and closing other, unrelated nodes that broadcast their parameters in parallel, the number of memory chunks held by the first node slowly and randomly increases over time.