-
Notifications
You must be signed in to change notification settings - Fork 1.6k
-
Notifications
You must be signed in to change notification settings - Fork 1.6k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Hot reload stuck in progress after pausing inputs #9354
Comments
Seems like reproducing this issue is not as easy as I thought but I have some more insight: I managed to find all the pods where this issue happens by querying this in prometheus:
I noticed that the pods that stop sending logs also have the I checked how many hot reloads back to back it took for it to fail and it seems like it was 2 reloads in 30s increments but in some it was even 2mins. I tried reproducing it spamming the hot reload curl in multiple different kind of pods with different scenarios:
I didn’t find any of these criterias to affect if the hot reload will fail even when spamming the hot reload curl nonstop without any wait. Not really sure what else it could be but it’s a tough one 😛 |
Bug Report
Executing a hot reload with from a sidecar container in the same pod as fluent-bit with
curl -X POST http://localhost:2020/api/v2/reload -d '{}'
gets stuck and if you try to reload again you get{"reload":"in progress","status":-2}
In the fluent-bit configuration we can see this and its stuck like this forever:
Normally when the reload works the lines that come after this should be:
To Reproduce
Create fluent-bit.yaml:
curl -X POST http://localhost:2020/api/v2/reload -d '{}'
multiple times until the reload hangs like I mentioned aboveExpected behavior
The reload should fail/continue but not hang.
Your Environment
Additional context
I am running a sidecar next to my fluent-bit container that dynamically creates a filter with throttling configuration that is per container on the same node as fluent-bit(fluent-bit runs as daemonset in my kubernetes cluster).
Everytime theres a change to the filter configuration the sidecar performs a hot reload with
curl -X POST http://localhost:2020/api/v2/reload -d '{}'
Is there maybe an option to verify when the reload fails and be able to retry it because if I try to run the reload again it just says it's in progress forever.
Also is there a way to monitor or know that it happens? Since health check shows as ok, I only noticed that happened because I stopped receiving logs.
Would be nice if there was a way to know about it and maybe restart the fluent-bit container or perform some sort of forced reload.
The text was updated successfully, but these errors were encountered: