After a machine/VM restart, the Kafka pods crash and the ZooKeeper pod reports an Unresolved address exception. We run a single ZooKeeper replica and a single Kafka replica.
This started after we moved from Kafka 3.4.0 to 3.7.0, the version supported by Strimzi Operator 0.40 (0.40.0-kafka-3.7.0). The Strimzi operator log shows a Session lost/Expired exception.
The issue is intermittent: it happens roughly 6 out of 10 times after a machine restart. We have only seen it since moving Kafka from 3.4.0 to a higher version, and we had to upgrade because Strimzi operator 0.40.0-kafka-3.7.0 doesn't support Kafka 3.4.0. Since the issue is more prominent with newer versions of Strimzi/Kafka, could this please be looked into?
The workaround we used was to restart the ZooKeeper pods and then the Kafka pods (if they did not come up automatically). We had to restart the ZooKeeper pod multiple times. Another workaround was to uninstall and reinstall Kafka.
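For anyone wanting to script the first workaround, here is a minimal sketch of the restart order (ZooKeeper first, then Kafka). It only builds the commands and does not run them. It assumes that "restart" means deleting the pod so that it is recreated. The ZooKeeper pod name and namespace are taken from the hostname in the log under Additional context; the Kafka pod name `kafka-cluster-kafka-0` is a guess, not something confirmed in this report. The helper name `restart_commands` is made up for illustration.

```python
def restart_commands(namespace, zookeeper_pods, kafka_pods):
    """Build kubectl commands that delete ZooKeeper pods first, then Kafka pods."""
    ordered = list(zookeeper_pods) + list(kafka_pods)
    return [["kubectl", "delete", "pod", pod, "-n", namespace] for pod in ordered]


commands = restart_commands(
    "foundation-env-default",       # namespace, from the hostname in the log
    ["kafka-cluster-zookeeper-0"],  # ZooKeeper pod, from the hostname in the log
    ["kafka-cluster-kafka-0"],      # assumed Kafka pod name
)
```

Each entry can be passed to `subprocess.run(cmd, check=True)`. In our case the ZooKeeper pod sometimes needed more than one restart before Kafka came up.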
Steps to reproduce
Install the Strimzi Operator, Kafka and ZooKeeper
Restart the machine/VM (you may need to restart it several times to hit this issue)
Watch the Kafka pods in the namespace where Kafka is installed
Check the logs of the ZooKeeper pod
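To make the last step repeatable across several restarts, the log check can be sketched roughly as below. The function name `find_unresolved_binds` is hypothetical. It assumes the log text has already been captured (for example with `kubectl logs` on the ZooKeeper pod) and that the exception line directly follows the ERROR line, as it does in the log under Additional context.

```python
import re

# Matches the ERROR line ZooKeeper logs when the quorum port cannot be bound.
BIND_ERROR = re.compile(r"ERROR Couldn't bind to (?P<host>[^\s/]+)/\S*:(?P<port>\d+)")


def find_unresolved_binds(log_text):
    """Return (host, port) pairs for bind errors followed by 'Unresolved address'."""
    lines = log_text.splitlines()
    hits = []
    for i, line in enumerate(lines):
        match = BIND_ERROR.search(line)
        if not match:
            continue
        # The exception class is logged on the line right after the ERROR entry.
        next_line = lines[i + 1] if i + 1 < len(lines) else ""
        if "Unresolved address" in next_line:
            hits.append((match.group("host"), int(match.group("port"))))
    return hits
```

An empty list after a restart means this particular failure did not show up in the captured log.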
Expected behavior
Kafka pods should not crash and ZooKeeper should not report an Unresolved address exception after a machine/VM restart.
Kafka version
3.7.0
Strimzi version
0.40
Kubernetes version
v1.29.1+rke2r1
Installation method
Helm Chart
Infrastructure
AWS EC2
Additional context
Zookeeper Exception:
2024-05-31 06:22:46,870 INFO Created server with tickTime 500 ms minSessionTimeout 1000 ms maxSessionTimeout 10000 ms clientPortListenBacklog -1 datadir /var/lib/zookeeper/data/version-2 snapdir /var/lib/zookeeper/data/version-2 (org.apache.zookeeper.server.ZooKeeperServer) [QuorumPeermyid=1(secure=[0:0:0:0:0:0:0:0]:2181)]
2024-05-31 06:22:46,870 ERROR Couldn't bind to kafka-cluster-zookeeper-0.kafka-cluster-zookeeper-nodes.foundation-env-default.svc/<unresolved>:2888 (org.apache.zookeeper.server.quorum.Leader) [QuorumPeermyid=1(secure=[0:0:0:0:0:0:0:0]:2181)]
java.net.SocketException: Unresolved address
at java.base/java.net.ServerSocket.bind(ServerSocket.java:380)
at java.base/java.net.ServerSocket.bind(ServerSocket.java:342)
at org.apache.zookeeper.server.quorum.Leader.createServerSocket(Leader.java:322)
at org.apache.zookeeper.server.quorum.Leader.lambda$new$0(Leader.java:301)
at java.base/java.util.stream.ReferencePipeline$3$1.accept(ReferencePipeline.java:197)
at java.base/java.util.concurrent.ConcurrentHashMap$KeySpliterator.forEachRemaining(ConcurrentHashMap.java:3573)
at java.base/java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:509)
at java.base/java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:499)
at java.base/java.util.stream.ForEachOps$ForEachOp.evaluateSequential(ForEachOps.java:150)
at java.base/java.util.stream.ForEachOps$ForEachOp$OfRef.evaluateSequential(ForEachOps.java:173)
at java.base/java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:234)
at java.base/java.util.stream.ReferencePipeline.forEach(ReferencePipeline.java:596)
at org.apache.zookeeper.server.quorum.Leader.<init>(Leader.java:304)
at org.apache.zookeeper.server.quorum.QuorumPeer.makeLeader(QuorumPeer.java:1340)
at org.apache.zookeeper.server.quorum.QuorumPeer.run(QuorumPeer.java:1551)
2024-05-31 06:22:46,870 WARN Unexpected exception (org.apache.zookeeper.server.quorum.QuorumPeer) [QuorumPeermyid=1(secure=[0:0:0:0:0:0:0:0]:2181)]
java.io.IOException: Leader failed to initialize any of the following sockets: [kafka-cluster-zookeeper-0.kafka-cluster-zookeeper-nodes.foundation-env-default.svc/<unresolved>:2888]
at org.apache.zookeeper.server.quorum.Leader.<init>(Leader.java:307)
at org.apache.zookeeper.server.quorum.QuorumPeer.makeLeader(QuorumPeer.java:1340)
at org.apache.zookeeper.server.quorum.QuorumPeer.run(QuorumPeer.java:1551)
2024-05-31 06:22:46,870 INFO Peer state changed: looking (org.apache.zookeeper.server.quorum.QuorumPeer) [QuorumPeermyid=1(secure=[0:0:0:0:0:0:0:0]:2181)]
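The log above shows the bind failing because the pod's own hostname is unresolved. One possible explanation, not confirmed in this report, is that the DNS record is not yet available when ZooKeeper starts after the reboot. A rough way to check how long the name takes to resolve is sketched below; it would need to run somewhere with the same DNS view as the pod, such as a debug container in the same namespace. The function name and the `resolver` parameter are illustrative only.

```python
import socket
import time


def wait_until_resolvable(host, attempts=10, delay_seconds=1.0, resolver=socket.getaddrinfo):
    """Return the attempt number on which host first resolved, or None if it never did."""
    for attempt in range(1, attempts + 1):
        try:
            resolver(host, None)
            return attempt
        except socket.gaierror:
            # Name not resolvable yet; pause before the next attempt.
            if attempt < attempts:
                time.sleep(delay_seconds)
    return None
```

Calling it with `kafka-cluster-zookeeper-0.kafka-cluster-zookeeper-nodes.foundation-env-default.svc` right after a restart would show whether the name resolves immediately (returns 1), only after some retries, or not at all (returns None).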