Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Cooperative Rebalance #1081

Merged
merged 5 commits into from
Nov 18, 2024
Merged

Conversation

serj026
Copy link
Contributor

@serj026 serj026 commented Jun 17, 2024

The purpose of these changes is to add a functional version of node-rdkafka that supports incremental cooperative rebalancing. Given that librdkafka has supported this type of rebalancing for nearly four years, this update aims to bring node-rdkafka in line with librdkafka capabilities and enhancing its functionality.

We use the code from this fork in our production environment for services that handle 5000 requests/messages per second. Using the cooperative-sticky partitioning strategy has shown good results by avoiding "stop-the-world" situations. Specifically, we have observed increased throughput and reduced spikes during scale-up and scale-down operations of pods in Kubernetes.

@SeanReece
Copy link

@GaryWilber Any way we could get some eyes on cooperative rebalancing? :) This would be huge for us.

@GaryWilber
Copy link
Collaborator

@iradul Do you have access to trigger the PR test checks/workflow for this? I don't see a way to trigger it myself.

@ZhukovAntonPls
Copy link

ZhukovAntonPls commented Aug 28, 2024

@SeanReece @GaryWilber @iradul Guys, how can we escalate the testing and merging process of this PR? It would be very useful for the whole community

@ZhukovAntonPls
Copy link

@SeanReece @GaryWilber @iradul Pls help to finish this enhancement

@SeanReece
Copy link

FWIW We've decided to move to https://github.com/confluentinc/confluent-kafka-javascript which supports cooperative rebalance and provides a node-rdkafka compatible API.

@wiktor-obrebski
Copy link

@serj026, from my understanding, the only thing preventing this PR from being merged is that the automatic build hasn’t started. I believe this is because the PR/commits were created before the system was fully in place.

Could you try adding an empty commit or recreating the PR with the same code? This should re-trigger the build.

@serj026
Copy link
Contributor Author

serj026 commented Nov 18, 2024

@SeanReece We tried using confluent-kafka-javascript when the code appeared in master, but encountered a potential memory leak during our e2e tests. We have our own wrapper for node-rdkafka, and in some tests, we observed that memory usage increased to as much as 5.5GB, compared to node-rdkafka, where utilization was around ~580MB.

@wiktor-obrebski
Copy link

Looks like it did not trigger the build. I have no idea why it did not help, as it looks like other PRs do not have same issue.

CI trigger definition: https://github.com/Blizzard/node-rdkafka/blob/master/.github/workflows/test.yml

@wiktor-obrebski
Copy link

Ok, I get confused. It looks like we need "workflow approval" (something other than PR approve) just to run the CI.
@GaryWilber @iradul Any chance you can help here and trigger the CI?

@GaryWilber GaryWilber merged commit 1b33604 into Blizzard:master Nov 18, 2024
5 of 6 checks passed
@wiktor-obrebski
Copy link

Thank you!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants