Possible Memory Issue #3880
Hey @bmalinconico, thanks for reaching out, and sorry you ran into this. I was able to see the issue. I did do a few experiments with this reproducer, and the instrumentation does suspiciously seem to be the "straw that breaks the camel's back"; e.g. this does somewhat look to be a bug in pg that the extra memory pressure of tracing seems to trigger... But I don't have convincing evidence either way yet. Just to confirm, you mentioned:
Did you mean that you were upgrading from 1.23 and then downgraded again?
Can you share the output from the Ruby VM crash? It may help in tracking the issue down. Or, even better, if you're able to get a core dump, it would be a really useful tool for tracking this down.
@ivoanjo thanks for confirming. Yes, I first encountered this when I upgraded to 2.3.0 from 1.2.3 (among other changes). In order to isolate the issue, I rolled everything back and started piecing it back together. GC compaction was triggering this even on 1.2.3 (same PG version). Our test suites have occasionally triggered the seg fault when auto-compaction was on. I'll see if I can pull the crash log from that.
I'm a bit confused about this part -- you've mentioned version 1.2.3 in a few places above. Is this version 1.23.something? 🤔 1.2.0 is quite old (July 2022) and there were no other point releases in that series.
That would be great! 👀
Sorry, I was typing it out from memory. This was encountered when upgrading from 1.22.0 -> 2.3.0; when we encountered the error we rolled back to 1.22, and the issue was still present.
I was not able to find a CI run with the output in it. I've got a branch with auto_compact enabled, and I'm trying to get the failure to occur.
Thanks! Hopefully that will help shine some light on this.
Current behaviour
I have found that enabling auto-compaction in the Ruby GC causes what appear to be random memory-related bugs. The bug manifests in the following ways:
Manifestation 1
Where XXX is any random built-in or app-specific class.
This is raised on a call to `exec` or `exec_params` in the DD Postgres instrumentation.

Manifestation 2
This error is produced by the reproduction steps I will provide later.
I patched the DD PG instrumentation for `exec_params` and rescued the error with a pry session. The params array contained no empty values, and a retry of the block succeeded (a reconstruction of that patch is sketched below).
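The real patch didn't make it into this report, so here is a hypothetical reconstruction of the shape described; the module name and the rescued error class are assumptions, since the actual error snippets were lost above:

```ruby
# Hypothetical reconstruction of the debugging patch described above; the
# rescued error class and module name are assumptions, not the real patch.
require "pg"
require "pry"

module ExecParamsDebugger
  def exec_params(*args, &block)
    super
  rescue StandardError
    # Inspect the arguments here; per the report, the params array
    # contained no empty values at this point.
    binding.pry
    retry # retrying the same call succeeded
  end
end

# Prepend after the Datadog instrumentation loads so `super` still goes
# through the instrumented path.
PG::Connection.prepend(ExecParamsDebugger)
```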
Manifestation 3
Occasional segfaults.
All of these errors feel like something is holding a reference to memory that is being moved, resulting in random garbage getting passed down the stack and occasionally referencing a freed memory location.
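For context, this is the standard Ruby (3.0+) GC API referenced throughout this issue, not code from my app:

```ruby
# Enable GC auto-compaction (Ruby 3.0+). Once enabled, major GC runs may
# move live objects; anything holding a raw reference across a move (e.g.
# a C extension) can then see garbage, matching the symptoms above.
GC.auto_compact = true

# Compaction can also be triggered manually, which helps when trying to
# catch a move at a specific point.
GC.compact
```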
Expected behaviour
Not an error!
Steps to reproduce
I was unable to reproduce this on my local machine, though a local containerized environment may be able to. I was only able to reproduce it in a container running on EC2; that machine is Linux x86_64.
Dockerfile to reproduce this image
My running application reproduces this error easily when I have the compacting garbage collector enabled, due to its volume of activity. Reproducing it in a shell is much more time-consuming, as you (presumably) need to wait for a compaction run.
I'll also acknowledge this may not be datadog, but I've tried to narrow it down as much as I can.
I'm going to reiterate that reproducing this is annoying, since there is no small amount of luck in getting a compacting GC run to trigger at the right time. Doing the above in concurrent fibers increased the odds of it happening (probably due to increased memory churn); still, I'm providing the smallest repro I can, sketched below.
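The original script wasn't captured in this issue, so this is a minimal sketch of the shape just described; the DSN, query, fiber count, and iteration counts are all placeholders:

```ruby
# Sketch of the reproduction shape described above; connection details,
# query, and counts are placeholders, not the original repro.
require "pg"

GC.auto_compact = true # the setting that appears to trigger the issue

conn = PG.connect(ENV.fetch("DATABASE_URL"))

fibers = Array.new(8) do
  Fiber.new do
    5_000.times do |i|
      # Churn memory so compaction has plenty of objects to move...
      _junk = Array.new(100) { "x" * rand(1..128) }
      # ...while exercising the instrumented query path.
      conn.exec_params("SELECT $1::int", [i])
      GC.start if (i % 1_000).zero? # encourage major GC runs
      Fiber.yield # hand control to the next fiber
    end
  end
end

# Round-robin the fibers until they have all finished.
fibers.each { |f| f.resume if f.alive? } until fibers.none?(&:alive?)
```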
Environment
Configuration block (`Datadog.configure ...`):
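The actual contents of the block weren't captured here; a representative example of enabling the pg integration in the 2.x gem, with a placeholder service name:

```ruby
# Representative configuration only; the reporter's actual block was not
# captured in this issue.
require "datadog"

Datadog.configure do |c|
  c.service = "my-app"     # placeholder service name
  c.tracing.instrument :pg # the Postgres instrumentation discussed above
end
```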