Missing kubernetes logs makes diagnostic hard #73

MonsieurNicolas · 2022-10-12T16:57:38Z

Kubernetes logs are not propagated in real time, which makes it difficult to figure out what is happening when things go wrong.

For example, we recently had this in the supercluster logs (it stays stuck like this until it hits the 8-hour timeout):

[11:52:00 INF] Waiting for replicas on stellar-supercluster/ssc-0416z-9db5e7-sts-bd
[11:52:00 INF] Saw event for statefulset ssc-0416z-9db5e7-sts-bd: Added
[11:52:00 INF] StatefulSet stellar-supercluster/ssc-0416z-9db5e7-sts-bd: 0/3 replicas ready
[11:52:59 INF] Waiting for replicas on stellar-supercluster/ssc-0416z-9db5e7-sts-bd
[11:52:59 INF] Saw event for statefulset ssc-0416z-9db5e7-sts-bd: Added
[11:52:59 INF] StatefulSet stellar-supercluster/ssc-0416z-9db5e7-sts-bd: 0/3 replicas ready
[11:53:58 INF] Waiting for replicas on stellar-supercluster/ssc-0416z-9db5e7-sts-bd
[11:53:58 INF] Saw event for statefulset ssc-0416z-9db5e7-sts-bd: Added
[11:53:58 INF] StatefulSet stellar-supercluster/ssc-0416z-9db5e7-sts-bd: 0/3 replicas ready
[11:54:58 INF] Waiting for replicas on stellar-supercluster/ssc-0416z-9db5e7-sts-bd
[11:54:58 INF] Saw event for statefulset ssc-0416z-9db5e7-sts-bd: Added
[11:54:58 INF] StatefulSet stellar-supercluster/ssc-0416z-9db5e7-sts-bd: 0/3 replicas ready
[11:55:57 INF] Waiting for replicas on stellar-supercluster/ssc-0416z-9db5e7-sts-bd
[11:55:57 INF] Saw event for statefulset ssc-0416z-9db5e7-sts-bd: Added
[11:55:57 INF] StatefulSet stellar-supercluster/ssc-0416z-9db5e7-sts-bd: 0/3 replicas ready
[11:56:15 INF] There are 7 pods in total
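
As a rough illustration of how a stall like this could be flagged from the log text alone, here is a minimal Python sketch. It assumes the exact line format shown in the excerpt; `stalled_seconds` is a hypothetical helper name, not something that exists in supercluster.

```python
import re
from datetime import datetime

# Matches lines such as:
# [11:52:00 INF] StatefulSet stellar-supercluster/ssc-...-sts-bd: 0/3 replicas ready
LINE_RE = re.compile(
    r"\[(\d{2}:\d{2}:\d{2}) INF\] StatefulSet (\S+): (\d+)/(\d+) replicas ready"
)


def stalled_seconds(log_text):
    """Map each StatefulSet to the seconds its ready count has stayed unchanged.

    Assumption: all timestamps fall on the same day (no midnight rollover).
    """
    first_seen = {}  # name -> (ready count, time that count was first reported)
    latest = {}      # name -> time of the most recent report
    for match in LINE_RE.finditer(log_text):
        stamp = datetime.strptime(match.group(1), "%H:%M:%S")
        name, ready = match.group(2), int(match.group(3))
        # Restart the clock whenever the ready count changes.
        if name not in first_seen or first_seen[name][0] != ready:
            first_seen[name] = (ready, stamp)
        latest[name] = stamp
    return {
        name: int((latest[name] - first_seen[name][1]).total_seconds())
        for name in first_seen
    }
```

A caller could compare the returned durations against a threshold of its choosing and, once exceeded, dump pod status and events instead of waiting for the full timeout.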

In this particular case, a bad image name was passed to Kubernetes, and running a couple of diagnostic commands quickly reveals the problem:

+ kubectl get pods -n stellar-supercluster
NAME                         READY   STATUS             RESTARTS   AGE
ssc-1451z-331471-sts-bd-0    3/4     InvalidImageName   0          24m
ssc-1451z-331471-sts-cq-0    3/4     InvalidImageName   0          24m
ssc-1451z-331471-sts-kb-0    3/4     InvalidImageName   0          24m
ssc-1451z-331471-sts-lo-0    3/4     InvalidImageName   0          24m
ssc-1451z-331471-sts-sdf-0   3/4     InvalidImageName   0          24m
ssc-1451z-331471-sts-sp-0    3/4     InvalidImageName   0          24m
ssc-1451z-331471-sts-wx-0    3/4     InvalidImageName   0          24m

 Warning  InspectFailed     25m (x4 over 25m)    kubelet            Failed to apply default image tag "docker-registry.services.stellar-ops.com/dev/stellar-core:19.4.1-1101.da3754bb5.focal~perftests": couldn't parse image reference "docker-registry.services.stellar-ops.com/dev/stellar-core:19.4.1-1101.da3754bb5.focal~perftests": invalid reference format
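
The reference above fails to parse because of the `~` in the tag: Docker image tags may only contain ASCII letters, digits, underscores, periods, and dashes, must not start with a period or dash, and are limited to 128 characters. A minimal Python sketch of a pre-flight check follows. It is simplified and validates only the tag portion of a reference; `split_tag`, `tag_is_valid`, and `invalid_tag_chars` are hypothetical helper names, not part of supercluster or any Kubernetes client.

```python
import re

# Tag grammar for Docker image references: one word character followed by
# up to 127 word characters, periods, or dashes.
TAG_RE = re.compile(r"[A-Za-z0-9_][A-Za-z0-9_.-]{0,127}")


def split_tag(image):
    """Return (name, tag); tag is None when the reference carries no tag."""
    # Drop a digest suffix such as "@sha256:..." if present.
    name = image.split("@", 1)[0]
    last_slash = name.rfind("/")
    last_colon = name.rfind(":")
    # A colon before the last slash belongs to a registry port, not a tag.
    if last_colon > last_slash:
        return name[:last_colon], name[last_colon + 1:]
    return name, None


def tag_is_valid(image):
    """True when the reference has no tag or the tag matches the tag grammar."""
    _, tag = split_tag(image)
    return tag is None or TAG_RE.fullmatch(tag) is not None


def invalid_tag_chars(image):
    """Return the sorted set of characters in the tag that the grammar rejects."""
    _, tag = split_tag(image)
    if tag is None:
        return []
    return sorted(set(re.sub(r"[A-Za-z0-9_.-]", "", tag)))
```

Running a check like this before creating the StatefulSets would reject the image name from this report immediately, with `~` reported as the offending character.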


MonsieurNicolas added the "bug" label (Something isn't working) Oct 12, 2022

mbsdf self-assigned this Sep 13, 2023

MonsieurNicolas unassigned mbsdf Nov 15, 2023

sisuresh added the "core-team" label (Issue can be worked on by the core team) Oct 7, 2024

anupsdf assigned jayz22 Oct 15, 2024
