Ramping load average on node noticed in 4.12 - 4.13 #1791

andrew-wilson-88 · 2023-11-10T14:09:03Z

Hello,

I've been experiencing a ramping load average over a period of ~3 weeks on one of the nodes in my cluster. It was first noticed in 4.12; the cluster has since been updated to 4.13 and the problem persists.

After logging into the node, it looks like crio is the culprit. Rebooting the node (which runs the same pods once it comes back online) drops the load average, but over the following few weeks it ramps back up to an unsustainable level. Here is a graph for reference, as well as a link to the must-gather that was run just prior to a reboot:

https://drive.google.com/file/d/1I3mB3MN8rZtCEplgmT9dfxxvFRiqfxYZ/view?usp=sharing

I don't think there is anything in particular wrong with the cluster, as the other nodes don't exhibit the same symptoms (though they are smaller and have a minimal rate of change).

I have a feeling it may be a garbage collection issue, with the load ramping up exponentially due to constant iteration over objects that no longer exist.

Ultimately, I'm wondering whether anybody else has experienced a similar issue and could shed some light on anything I should look out for, or on statistics I should monitor to keep a closer eye on the issue.
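For anyone wanting to track the same thing, here is a minimal sketch of how the ramp could be sampled and quantified on the node itself. It assumes a Linux host exposing `/proc/loadavg`; the names `LoadSample`, `parse_loadavg` and `growth_per_day` are made up for illustration and are not part of any OKD or CRI-O tooling. It only measures the trend, it does not identify the cause.

```python
from typing import NamedTuple


class LoadSample(NamedTuple):
    load1: float
    load5: float
    load15: float
    runnable: int  # currently runnable scheduling entities
    total: int     # total scheduling entities on the system


def parse_loadavg(text: str) -> LoadSample:
    """Parse the contents of /proc/loadavg, e.g. '0.52 0.61 0.70 2/1234 56789'."""
    fields = text.split()
    runnable, total = fields[3].split("/")
    return LoadSample(float(fields[0]), float(fields[1]), float(fields[2]),
                      int(runnable), int(total))


def growth_per_day(samples: list[tuple[float, float]]) -> float:
    """Least-squares slope of (timestamp_seconds, load) pairs, in load per day."""
    if len(samples) < 2:
        raise ValueError("need at least two samples to estimate a trend")
    n = len(samples)
    mean_t = sum(t for t, _ in samples) / n
    mean_l = sum(l for _, l in samples) / n
    num = sum((t - mean_t) * (l - mean_l) for t, l in samples)
    den = sum((t - mean_t) ** 2 for t, _ in samples)
    if den == 0:
        raise ValueError("samples must span more than one timestamp")
    return num / den * 86400  # convert per-second slope to per-day


if __name__ == "__main__":
    # Take a single reading on the node; collect these periodically
    # (e.g. from cron) and feed (timestamp, load15) pairs to growth_per_day.
    with open("/proc/loadavg") as f:
        print(parse_loadavg(f.read()))
```

A steadily positive slope across several days of samples, with the same pods running, would match the ramp described above. The growing `total` count may also be worth watching, since a leak of threads or processes would show up there.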

Regards,
Andrew

andrew-wilson-88 · 2023-12-13T13:45:49Z

I have closed this, as the issue has not resurfaced since the last set of upgrades was applied.
