Memory Leak with KafkaJS Causing Application Crash #2332
Comments
@ffc-gwakefield Can you please provide a reproduction application exhibiting this behavior? We might be able to fix it, but based on what you're saying it sounds like a kafkajs issue that we may be compounding. Also, can you link the kafkajs issue?
Here is the link to the kafkajs issue: tulios/kafkajs#1704 I'll get back with the team and see if we can create a reproduction application for you to use. In the meantime, the issue on kafkajs has a link to a fork with a failing test, along with the fix for kafkajs if that's useful.
Thanks @ffc-gwakefield. In the meantime I recommend you just disable the kafkajs feature flag within the agent:
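The snippet from this comment is not reproduced here. As a rough sketch only, and assuming the flag is named `kafkajs_instrumentation` under `feature_flag` in the agent's `newrelic.js` configuration file (an assumption to verify against the documentation for your agent version), disabling it might look like this. The `app_name` value is a hypothetical placeholder.

```javascript
'use strict';

// newrelic.js: agent configuration file (sketch, not a complete config).
exports.config = {
  // Hypothetical application name used only for illustration.
  app_name: ['my-kafka-service'],
  feature_flag: {
    // ASSUMPTION: flag name for the kafkajs instrumentation; check your agent version.
    kafkajs_instrumentation: false
  }
};
```

If the flag exists under that name in your version, the agent should skip instrumenting kafkajs on the next restart.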
@ffc-gwakefield a reproduction application would really help here. I can't seem to reproduce it.
Thank you, @bizob2828, for your prompt replies and for looking into this. It's unfortunate that you couldn't replicate it, because our teams are quite busy since our projects are behind due to this issue. But hopefully we can build something when things settle down here.
I'm sorry you're all busy but I can't do much without a repro case. Seems like the better angle may be to get the kafkajs folks to fix the issue. We're just compounding the problem as we instrument setTimeout.
Closing due to lack of repro case. If you can provide a repro, please feel free to reopen.
Description
NewrelicJS fills up the heap and crashes applications when paired with KafkaJS in an Express service. This is due to faulty code in KafkaJS's checkPendingRequests loop, which causes Newrelic to shim setTimeout in a transaction and create a SegmentTrace object on each call. Because the KafkaJS loop runs away, thousands of these objects are created and the Node heap on the server quickly runs out of memory. The transaction doesn't start recording segments until the Express HTTP endpoint is hit.
Expected Behavior
Newrelic can handle runaway loops like the one in KafkaJS without maxing out the heap, or offers a feature/config to limit this behavior.
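To make the failure mode and the requested limit concrete, here is a minimal, self-contained sketch. It is not the New Relic agent's code or API: `SketchTransaction`, `instrumentTimer`, `runRunawayLoop`, and the `maxSegments` cap are all hypothetical names, and a fake synchronous timer queue stands in for the real setTimeout so the example finishes immediately.

```javascript
// Hypothetical stand-in for an APM transaction; NOT the New Relic agent's real API.
class SketchTransaction {
  constructor(maxSegments = Infinity) {
    this.maxSegments = maxSegments; // hypothetical cap on recorded segments
    this.segments = [];
    this.dropped = 0;
  }

  addSegment(name) {
    if (this.segments.length >= this.maxSegments) {
      this.dropped += 1; // count the call, but keep nothing on the heap
      return;
    }
    this.segments.push({ name, start: Date.now() });
  }
}

// Wraps a timer function so that every call is recorded as a segment.
function instrumentTimer(timerFn, transaction) {
  return function instrumented(callback, delay) {
    transaction.addSegment('timers.setTimeout');
    return timerFn(callback, delay);
  };
}

// Simulates a loop that keeps rescheduling itself through the wrapped timer.
// A plain array is used as a fake timer queue and drained synchronously.
function runRunawayLoop(transaction, iterations) {
  const queue = [];
  const fakeSetTimeout = (cb) => { queue.push(cb); };
  const schedule = instrumentTimer(fakeSetTimeout, transaction);

  let remaining = iterations;
  const pollLoop = () => {
    if (remaining <= 0) return;
    remaining -= 1;
    schedule(pollLoop, 0); // each reschedule adds one more segment
  };

  pollLoop();
  while (queue.length > 0) queue.shift()();
}

const unbounded = new SketchTransaction();
runRunawayLoop(unbounded, 10000);
console.log(unbounded.segments.length); // 10000

const capped = new SketchTransaction(900);
runRunawayLoop(capped, 10000);
console.log(capped.segments.length, capped.dropped); // 900 9100
```

Without a cap, retained segments grow linearly with the number of timer calls for as long as the transaction stays open. With the hypothetical cap, memory use is bounded at the cost of dropping detail about later calls.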
Troubleshooting or NR Diag results
Running a Node heapprofile or cpuprofile for a couple of seconds shows that Newrelic is using all of the CPU and memory for the trace segment calls. I captured a few and got the same results. Interestingly, with the latest package there was a large increase in CPU time spent garbage collecting, but it did not help. The heap still filled up and eventually crashed the app.
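For anyone trying to capture similar evidence, the sketch below shows one possible way to write a heap snapshot early, before the heap is close to its limit. It uses Node's built-in `v8.getHeapStatistics()` and `v8.writeHeapSnapshot()`. The helper names `isHeapNearLimit` and `watchHeap`, the 50% threshold, and the 5 second interval are assumptions chosen for illustration, not values from this report.

```javascript
const v8 = require('node:v8');

// Pure helper: true once used heap reaches the given fraction of the heap limit.
function isHeapNearLimit(stats, threshold) {
  return stats.used_heap_size / stats.heap_size_limit >= threshold;
}

// Polls heap statistics and writes a single snapshot once the threshold is crossed.
// Threshold and interval defaults are illustrative assumptions.
function watchHeap({ threshold = 0.5, intervalMs = 5000 } = {}) {
  const timer = setInterval(() => {
    if (isHeapNearLimit(v8.getHeapStatistics(), threshold)) {
      clearInterval(timer);
      // Synchronous: blocks the event loop and needs substantial extra memory.
      const file = v8.writeHeapSnapshot();
      console.log(`heap snapshot written to ${file}`);
    }
  }, intervalMs);
  timer.unref(); // do not keep the process alive just for the watcher
  return timer;
}

// Example usage (not started automatically here):
// watchHeap({ threshold: 0.5, intervalMs: 5000 });
```

Triggering at a low threshold matters because writing a snapshot itself needs memory on the order of the heap size, which is consistent with the app crashing when a snapshot is attempted late.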
Usually trying to capture an entire heapsnapshot results in the app crashing as it's already using most of the machine's memory. But I managed to get one before it completely bloated the memory and with enough swap space:
Steps to Reproduce
Your Environment
RHEL 9 Linux L2
Node v18.20.2
@nestjs/axios ^3.0.0
@nestjs/core ^10.3.7
kafkajs ^2.2.0
Additional context
I realize the main problem is kafkajs, which seems to be an unmaintained library now, but this is causing newrelic to spin out of control. Is there any way to mitigate this for badly behaved software like kafkajs?