Confluent connections interfere with KafkaJS connections when SSL is involved #55
Comments
Hi @apeloquin-agilysys, thanks for the detailed steps as always. I could reproduce this with Confluent Cloud and the given test, using Node's built-in test runner. I dug in and investigated it. I ran only […]. I increased the number of topics (and thus consumers) and made the client connections concurrent for KafkaJS (rather than one after the other) until it started failing consistently with the same issue. Based on a few other bugs filed, like this one:
As far as I understand, TLS connection establishment can take more than 1s, and in that case KafkaJS will spawn another connection when this timeout expires. Confluent Cloud also has some throttling built in to prevent these connection-storm situations, and that just makes it worse. To fix this and make the test run, I added this to the KafkaJS configuration: […] This made the test pass for me, and the TLS issues no longer occur. I did need to increase the timeout for […]. We have actually created a PR with KafkaJS to change this default, but it has not been accepted yet. As for why running it with Confluent consumers makes a difference as opposed to your normal case, I think it's possible that the additional connections from the Confluent consumers make the KafkaJS connections take a little more time, causing the reconnects and the throttling. But that's a guess at the moment.
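A minimal sketch of the kind of KafkaJS setting described above, assuming the option in question is `connectionTimeout` (KafkaJS documents a default of 1000 ms, which matches the ">1s" behaviour mentioned). The helper name `buildKafkaJsOptions`, the client ID, and the 10-second value are hypothetical and chosen for illustration; they are not values taken from this thread.

```typescript
// Sketch only: this shape mirrors part of the options object accepted by
// KafkaJS's `new Kafka({...})`. Client ID, broker placeholder, and the 10 s
// value are illustrative assumptions, not values from this thread.
interface KafkaJsClientOptions {
  clientId: string;
  brokers: string[];
  ssl: boolean;
  connectionTimeout: number; // milliseconds; KafkaJS defaults to 1000
}

// Hypothetical helper: builds client options with a longer connection timeout
// so that a slow TLS handshake is not abandoned after one second.
function buildKafkaJsOptions(
  brokers: string[],
  connectionTimeoutMs: number = 10000,
): KafkaJsClientOptions {
  if (brokers.length === 0) {
    throw new Error("at least one broker is required");
  }
  if (connectionTimeoutMs <= 0) {
    throw new Error("connectionTimeoutMs must be positive");
  }
  return {
    clientId: "example-app",
    brokers,
    ssl: true,
    connectionTimeout: connectionTimeoutMs,
  };
}

// "XXXX" stands in for the real broker host, as elsewhere in this issue.
const options = buildKafkaJsOptions(["XXXX:9092"]);
console.log(options.connectionTimeout);
```

The resulting object would be passed to `new Kafka(options)` from the `kafkajs` package, with `sasl` credentials added as the cluster requires. How large the timeout needs to be depends on the environment, so treat the value as something to tune.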
I'm closing this issue, given that we believe it's a KafkaJS issue we have seen a few times before with Confluent Cloud's throttling and a large number of connections, which is solved by the timeout. In this case, there's nothing we can do within this library.
This is an odd one.
We're attempting to phase in use of the Confluent library in our app for new functionality. This app has existing functionality using KafkaJS, and the existing functionality is in production. Given the "early access" nature of the Confluent library, it makes sense to adopt it for the new functionality, but retain KafkaJS for the existing functionality for now.
The new functionality was added, integration tests showed everything working, but as soon as we deployed to an environment using external Kafka cluster(s) the KafkaJS connections immediately started failing with:
I narrowed it down to starting multiple consumers for both KafkaJS and Confluent concurrently... with SSL connections.
I created a test that replicates the scenario by creating consumers for 5 topics each with KafkaJS and Confluent concurrently on a Confluent Cloud cluster. There are also tests that run just 5 KafkaJS consumers or just 5 Confluent consumers to prove there is no issue when run independently.
Notes: XXXX is used in place of identifying details/credentials.
The test for both ultimately fails to connect all the consumers and, on the KafkaJS side, produces many occurrences of this error (which is not present when running KafkaJS only):
It would be helpful to understand what is conflicting here and if it can be prevented on the Confluent side or if there is a way to work around it.
I have confirmed that if I start the KafkaJS consumers before the Confluent consumers, the KafkaJS connections succeed. This is not viable in a real-world scenario, however, because if the connection is later dropped and the consumer tries to reconnect, it will encounter this same issue.