The FE Leader keeps reporting an UnknownHostException exception #550

Open
yandongxiao opened this issue Jun 24, 2024 · 0 comments
Labels
bug Something isn't working

Comments

@yandongxiao
Collaborator

Describe the bug

If the number of replicas for CN or BE is reduced without performing a DROP operation, the FE Leader will continuously report the following error:

  1. The error below is reported for each BE or CN node that has not been DROPPED.
  2. Each occurrence takes up approximately 5KB of log space.
  3. The error is output every five seconds.
  4. The FE logs become unreadable.
    At one error every five seconds (12 per minute), each CN that is not DROPPED causes the FE to generate 24 * 60 * 12 * 5KB ≈ 86MB of logs per day.
2024-06-24 10:53:25.176+08:00 WARN (heartbeat mgr|14) [HeartbeatMgr.runAfterCatalogReady():165] get bad heartbeat response: type: BACKEND, status: BAD, msg: java.net.UnknownHostException: kube-starrocks-cn-0.kube-starrocks-cn-search.starrocks.svc.cluster.local
Jun 24, 2024 10:53:25 AM io.grpc.internal.ManagedChannelImpl$NameResolverListener handleErrorInSyncContext
WARNING: [Channel<16218>: (kube-starrocks-cn-0.kube-starrocks-cn-search.starrocks.svc.cluster.local:9070)] Failed to resolve name. status=Status{code=UNAVAILABLE, description=Unable to resolve host kube-starrocks-cn-0.kube-starrocks-cn-search.starrocks.svc.cluster.local, cause=java.lang.RuntimeException: java.net.UnknownHostException: kube-starrocks-cn-0.kube-starrocks-cn-search.starrocks.svc.cluster.local: Name or service not known
        at io.grpc.internal.DnsNameResolver.resolveAddresses(DnsNameResolver.java:223)
        at io.grpc.internal.DnsNameResolver.doResolve(DnsNameResolver.java:282)
        at io.grpc.grpclb.GrpclbNameResolver.doResolve(GrpclbNameResolver.java:63)
        at io.grpc.internal.DnsNameResolver$Resolve.run(DnsNameResolver.java:318)
        at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
        at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
        at java.base/java.lang.Thread.run(Thread.java:829)
Caused by: java.net.UnknownHostException: kube-starrocks-cn-0.kube-starrocks-cn-search.starrocks.svc.cluster.local: Name or service not known
        at java.base/java.net.Inet6AddressImpl.lookupAllHostAddr(Native Method)
        at java.base/java.net.InetAddress$PlatformNameService.lookupAllHostAddr(InetAddress.java:930)
        at java.base/java.net.InetAddress.getAddressesFromNameService(InetAddress.java:1543)
        at java.base/java.net.InetAddress$NameServiceAddresses.get(InetAddress.java:848)
        at java.base/java.net.InetAddress.getAllByName0(InetAddress.java:1533)
        at java.base/java.net.InetAddress.getAllByName(InetAddress.java:1386)
        at java.base/java.net.InetAddress.getAllByName(InetAddress.java:1307)
        at io.grpc.internal.DnsNameResolver$JdkAddressResolver.resolveAddress(DnsNameResolver.java:631)
        at io.grpc.internal.DnsNameResolver.resolveAddresses(DnsNameResolver.java:219)
        ... 6 more
}
2024-06-24 10:53:25.235+08:00 WARN (starmgr-heartbeatmgr-0|100) [StarletAgent.heartbeat():94] caught GRPC exception when sending heartbeat to worker kube-starrocks-cn-0.kube-starrocks-cn-search.starrocks.svc.cluster.local:9070, io.grpc.StatusRuntimeException: UNAVAILABLE: Unable to resolve host kube-starrocks-cn-0.kube-starrocks-cn-search.starrocks.svc.cluster.local.
2024-06-24 10:53:25.236+08:00 WARN (starmgr-heartbeatmgr-0|100) [StarletAgent.heartbeat():110] sending heartbeat to worker kube-starrocks-cn-0.kube-starrocks-cn-search.starrocks.svc.cluster.local:9070 failed, GRPC:UNAVAILABLE: Unable to resolve host kube-starrocks-cn-0.kube-starrocks-cn-search.starrocks.svc.cluster.local.
2024-06-24 10:53:30.191+08:00 WARN (heartbeat-mgr-pool-4|201) [HeartbeatMgr$BackendHeartbeatHandler.call():321] backend heartbeat got exception, addr: kube-starrocks-cn-0.kube-starrocks-cn-search.starrocks.svc.cluster.local:9050
org.apache.thrift.transport.TTransportException: java.net.UnknownHostException: kube-starrocks-cn-0.kube-starrocks-cn-search.starrocks.svc.cluster.local
        at org.apache.thrift.transport.TSocket.open(TSocket.java:226) ~[libthrift-0.13.0.jar:0.13.0]
        at com.starrocks.common.GenericPool$ThriftClientFactory.create(GenericPool.java:148) ~[starrocks-fe.jar:?]
        at com.starrocks.common.GenericPool$ThriftClientFactory.create(GenericPool.java:133) ~[starrocks-fe.jar:?]
        at org.apache.commons.pool2.BaseKeyedPooledObjectFactory.makeObject(BaseKeyedPooledObjectFactory.java:62) ~[commons-pool2-2.3.jar:2.3]
        at org.apache.commons.pool2.impl.GenericKeyedObjectPool.create(GenericKeyedObjectPool.java:1036) ~[commons-pool2-2.3.jar:2.3]
        at org.apache.commons.pool2.impl.GenericKeyedObjectPool.borrowObject(GenericKeyedObjectPool.java:356) ~[commons-pool2-2.3.jar:2.3]
        at org.apache.commons.pool2.impl.GenericKeyedObjectPool.borrowObject(GenericKeyedObjectPool.java:278) ~[commons-pool2-2.3.jar:2.3]
        at com.starrocks.common.GenericPool.borrowObject(GenericPool.java:101) ~[starrocks-fe.jar:?]
        at com.starrocks.system.HeartbeatMgr$BackendHeartbeatHandler.call(HeartbeatMgr.java:270) ~[starrocks-fe.jar:?]
        at com.starrocks.system.HeartbeatMgr$BackendHeartbeatHandler.call(HeartbeatMgr.java:256) ~[starrocks-fe.jar:?]
        at java.util.concurrent.FutureTask.run(FutureTask.java:264) ~[?:?]
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128) ~[?:?]
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628) ~[?:?]
        at java.lang.Thread.run(Thread.java:829) ~[?:?]
Caused by: java.net.UnknownHostException: kube-starrocks-cn-0.kube-starrocks-cn-search.starrocks.svc.cluster.local
        at java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:229) ~[?:?]
        at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:392) ~[?:?]
        at java.net.Socket.connect(Socket.java:609) ~[?:?]
        at org.apache.thrift.transport.TSocket.open(TSocket.java:221) ~[libthrift-0.13.0.jar:0.13.0]
        ... 13 more
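
The log-volume estimate in the list above can be reproduced with a few lines of Python. This is only a rough sketch: the 5KB error size and the five-second interval are the approximate figures from this report, and the function name `daily_log_kb` is made up for illustration. With exactly one error every five seconds it works out to roughly 86MB per node per day.

```python
# Rough estimate of the FE log volume caused by nodes that were not DROPPED.
# Inputs are the approximate figures from this report, not measured constants.

SECONDS_PER_DAY = 24 * 60 * 60


def daily_log_kb(error_size_kb: float, interval_seconds: float, nodes: int = 1) -> float:
    """Return the approximate log volume in KB per day for `nodes` undropped nodes."""
    errors_per_day = SECONDS_PER_DAY / interval_seconds
    return errors_per_day * error_size_kb * nodes


if __name__ == "__main__":
    kb = daily_log_kb(error_size_kb=5, interval_seconds=5)
    print(f"{kb / 1000:.1f} MB per day per node")  # prints "86.4 MB per day per node"
```

The estimate scales linearly with the number of undropped nodes, so scaling a CN group down by several replicas multiplies the daily log growth accordingly.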

Expected behavior

When the number of BE or CN replicas is reduced, the Operator should control, in a proper way, whether the removed BE/CN nodes are DROPPED.
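
As an illustration of what such a DROP step could look like for CN nodes, here is a minimal sketch in Python. It is not the Operator's implementation. It assumes the pods belong to a StatefulSet (which removes the highest ordinals first when scaled down), that host names follow the pattern seen in the log above, and that 9050 (the `addr` port in the log) is the heartbeat service port. The helper names `removed_pod_hosts` and `drop_cn_statements` are hypothetical. The sketch only builds `ALTER SYSTEM DROP COMPUTE NODE` statements and does not execute them; BE nodes hold data and are deliberately left out.

```python
from typing import List


def removed_pod_hosts(statefulset: str, headless_service: str, namespace: str,
                      old_replicas: int, new_replicas: int,
                      cluster_domain: str = "cluster.local") -> List[str]:
    """FQDNs of the pods that disappear when scaling from old_replicas to new_replicas.

    Assumes StatefulSet naming: <statefulset>-<ordinal>.<service>.<namespace>.svc.<domain>,
    with the highest ordinals removed on scale-down.
    """
    return [
        f"{statefulset}-{ordinal}.{headless_service}.{namespace}.svc.{cluster_domain}"
        for ordinal in range(new_replicas, old_replicas)
    ]


def drop_cn_statements(hosts: List[str], heartbeat_port: int = 9050) -> List[str]:
    """Build one DROP COMPUTE NODE statement per removed CN host (port is an assumption)."""
    return [f'ALTER SYSTEM DROP COMPUTE NODE "{host}:{heartbeat_port}";' for host in hosts]


if __name__ == "__main__":
    hosts = removed_pod_hosts("kube-starrocks-cn", "kube-starrocks-cn-search",
                              "starrocks", old_replicas=1, new_replicas=0)
    for statement in drop_cn_statements(hosts):
        print(statement)
```

Run against the values from this report, the sketch yields a single statement for `kube-starrocks-cn-0`, the host the FE Leader keeps failing to resolve.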

Please complete the following information

  • Operator Version: v1.9.6
@yandongxiao yandongxiao added the bug Something isn't working label Jun 24, 2024