Hot reload issues #1249

Open
chrono2002 opened this issue Jul 19, 2024 · 4 comments

@chrono2002

Describe the issue

We have a CI pipeline that deploys filters, parsers and outputs into several namespaces. Before each deployment, it deletes everything in the namespace.

Starting with version 2.7.0, we see the following in the logs:

[pod/fluent-bit-v4rzh/fluent-bit] level=info time=2024-07-19T14:05:03Z msg="Config file changed, reloading..."
[pod/fluent-bit-v4rzh/fluent-bit] level=info time=2024-07-19T14:05:03Z msg="Config file changed, reloading..."
[pod/fluent-bit-v4rzh/fluent-bit] level=info time=2024-07-19T14:05:03Z msg="Config file changed, reloading..."
[pod/fluent-bit-ctq8k/fluent-bit] level=info time=2024-07-19T14:05:06Z msg="Config file changed, reloading..."
[pod/fluent-bit-ctq8k/fluent-bit] level=info time=2024-07-19T14:05:06Z msg="Config file changed, reloading..."
[pod/fluent-bit-ctq8k/fluent-bit] level=info time=2024-07-19T14:05:06Z msg="Config file changed, reloading..."
[pod/fluent-bit-p9ftx/fluent-bit] level=info time=2024-07-19T14:05:17Z msg="Config file changed, reloading..."
[pod/fluent-bit-p9ftx/fluent-bit] level=info time=2024-07-19T14:05:17Z msg="Config file changed, reloading..."
[pod/fluent-bit-p9ftx/fluent-bit] level=info time=2024-07-19T14:05:17Z msg="Config file changed, reloading..."

It looks like fluent-bit reloads on every object deletion. When parsers are deleted before filters, the reload gets stuck and fluent-bit crashes.
It then restarts normally.

[pod/fluent-bit-xp5lp/fluent-bit] [2024/07/19 13:57:39] [ info] [input:emitter:prod-playtest-ppp-rewrite] initializing
[pod/fluent-bit-xp5lp/fluent-bit] [2024/07/19 13:57:39] [ info] [input:emitter:prod-playtest-ppp-rewrite] storage_strategy='filesystem' (memory + filesystem)
[pod/fluent-bit-xp5lp/fluent-bit] [2024/07/19 13:57:39] [ info] [input:emitter:prod-dev04-rewrite] initializing
[pod/fluent-bit-xp5lp/fluent-bit] [2024/07/19 13:57:39] [ info] [input:emitter:prod-dev04-rewrite] storage_strategy='filesystem' (memory + filesystem)
[pod/fluent-bit-xp5lp/fluent-bit] [2024/07/19 13:57:39] [ info] [input:emitter:prod-qa16-rewrite] initializing
[pod/fluent-bit-xp5lp/fluent-bit] [2024/07/19 13:57:39] [ info] [input:emitter:prod-qa16-rewrite] storage_strategy='filesystem' (memory + filesystem)
[pod/fluent-bit-xp5lp/fluent-bit] [2024/07/19 13:57:39] [ info] [input:emitter:prod-qa18-rewrite] initializing
[pod/fluent-bit-xp5lp/fluent-bit] [2024/07/19 13:57:39] [ info] [input:emitter:prod-qa18-rewrite] storage_strategy='filesystem' (memory + filesystem)
[pod/fluent-bit-xp5lp/fluent-bit] [2024/07/19 13:57:39] [ info] [input:emitter:prod-qa10-rewrite] initializing
[pod/fluent-bit-xp5lp/fluent-bit] [2024/07/19 13:57:39] [ info] [input:emitter:prod-qa10-rewrite] storage_strategy='filesystem' (memory + filesystem)
[pod/fluent-bit-xp5lp/fluent-bit] [2024/07/19 13:57:39] [ info] [input:emitter:prod-dev03-rewrite] initializing
[pod/fluent-bit-xp5lp/fluent-bit] [2024/07/19 13:57:39] [ info] [input:emitter:prod-dev03-rewrite] storage_strategy='filesystem' (memory + filesystem)
[pod/fluent-bit-xp5lp/fluent-bit] [2024/07/19 13:57:39] [ info] [input:emitter:prod-qa19-rewrite] initializing
[pod/fluent-bit-xp5lp/fluent-bit] [2024/07/19 13:57:39] [ info] [input:emitter:prod-qa19-rewrite] storage_strategy='filesystem' (memory + filesystem)
[pod/fluent-bit-xp5lp/fluent-bit] [2024/07/19 13:57:39] [ info] [input:emitter:prod-qa20-rewrite] initializing
[pod/fluent-bit-xp5lp/fluent-bit] [2024/07/19 13:57:39] [ info] [input:emitter:prod-qa20-rewrite] storage_strategy='filesystem' (memory + filesystem)
[pod/fluent-bit-xp5lp/fluent-bit] [2024/07/19 13:57:39] [ info] [input:emitter:prod-qa17-rewrite] initializing
[pod/fluent-bit-xp5lp/fluent-bit] [2024/07/19 13:57:39] [ info] [input:emitter:prod-qa17-rewrite] storage_strategy='filesystem' (memory + filesystem)
[pod/fluent-bit-xp5lp/fluent-bit] [2024/07/19 13:57:39] [ info] [input:emitter:prod-qa11-rewrite] initializing
[pod/fluent-bit-xp5lp/fluent-bit] [2024/07/19 13:57:39] [ info] [input:emitter:prod-qa11-rewrite] storage_strategy='filesystem' (memory + filesystem)
[pod/fluent-bit-xp5lp/fluent-bit] [2024/07/19 13:57:39] [ info] [input:emitter:prod-qc-rewrite] initializing
[pod/fluent-bit-xp5lp/fluent-bit] [2024/07/19 13:57:39] [ info] [input:emitter:prod-qc-rewrite] storage_strategy='filesystem' (memory + filesystem)
[pod/fluent-bit-xp5lp/fluent-bit] [2024/07/19 13:57:39] [ info] [input:emitter:prod-mainline-qa-rewrite] initializing
[pod/fluent-bit-xp5lp/fluent-bit] [2024/07/19 13:57:39] [ info] [input:emitter:prod-mainline-qa-rewrite] storage_strategy='filesystem' (memory + filesystem)
[pod/fluent-bit-xp5lp/fluent-bit] [2024/07/19 13:57:39] [error] [filter:parser:parser.323] requested parser 'cw-meta-meta-server-json-message-time-field-60d52537bbd89f341cbf30ffd3c7677d' not found
[pod/fluent-bit-xp5lp/fluent-bit] [2024/07/19 13:57:39] [error] [filter:parser:parser.323] Invalid 'parser'
[pod/fluent-bit-xp5lp/fluent-bit] [2024/07/19 13:57:39] [error] Failed initialize filter parser.323
[pod/fluent-bit-xp5lp/fluent-bit] [2024/07/19 13:57:39] [error] [engine] filter initialization failed
[pod/fluent-bit-xp5lp/fluent-bit] [2024/07/19 13:57:40] [ info] [input] pausing tail.0
[pod/fluent-bit-xp5lp/fluent-bit] [2024/07/19 13:57:40] [ info] [input] pausing storage_backlog.1
[pod/fluent-bit-xp5lp/fluent-bit] [2024/07/19 13:57:40] [ info] [input] pausing prod-playtest-re_emitter
[pod/fluent-bit-xp5lp/fluent-bit] [2024/07/19 13:57:40] [ info] [input] pausing prod-qa09-re_emitter
[pod/fluent-bit-xp5lp/fluent-bit] [2024/07/19 13:57:40] [ info] [input] pausing prod-xxx01-re_emitter
[pod/fluent-bit-xp5lp/fluent-bit] [2024/07/19 13:57:40] [ info] [input] pausing prod-gd01-re_emitter
[pod/fluent-bit-xp5lp/fluent-bit] [2024/07/19 13:57:40] [ info] [input] pausing prod-qa03-re_emitter
[pod/fluent-bit-xp5lp/fluent-bit] [2024/07/19 13:57:40] [ info] [input] pausing prod-playtest-ppp-re_emitter
[pod/fluent-bit-xp5lp/fluent-bit] [2024/07/19 13:57:40] [ info] [input] pausing prod-dev04-re_emitter
[pod/fluent-bit-xp5lp/fluent-bit] [2024/07/19 13:57:40] [ info] [input] pausing prod-qa16-re_emitter
[pod/fluent-bit-xp5lp/fluent-bit] [2024/07/19 13:57:40] [ info] [input] pausing prod-qa18-re_emitter
[pod/fluent-bit-xp5lp/fluent-bit] [2024/07/19 13:57:40] [ info] [input] pausing prod-qa10-re_emitter
[pod/fluent-bit-xp5lp/fluent-bit] [2024/07/19 13:57:40] [ info] [input] pausing prod-dev03-re_emitter
[pod/fluent-bit-xp5lp/fluent-bit] [2024/07/19 13:57:40] [ info] [input] pausing prod-qa19-re_emitter
[pod/fluent-bit-xp5lp/fluent-bit] [2024/07/19 13:57:40] [ info] [input] pausing prod-qa20-re_emitter
[pod/fluent-bit-xp5lp/fluent-bit] [2024/07/19 13:57:40] [ info] [input] pausing prod-qa17-re_emitter
[pod/fluent-bit-xp5lp/fluent-bit] [2024/07/19 13:57:40] [ info] [input] pausing prod-qa11-re_emitter
[pod/fluent-bit-xp5lp/fluent-bit] [2024/07/19 13:57:40] [ info] [input] pausing prod-qc-re_emitter
[pod/fluent-bit-xp5lp/fluent-bit] [2024/07/19 13:57:40] [ info] [input] pausing prod-mainline-qa-re_emitter
[pod/fluent-bit-xp5lp/fluent-bit] [2024/07/19 13:57:40] [ info] [input] pausing prod-qa15-re_emitter
[pod/fluent-bit-xp5lp/fluent-bit] [2024/07/19 13:57:40] [ info] [input] pausing prod-qa04-re_emitter
[pod/fluent-bit-xp5lp/fluent-bit] [2024/07/19 13:57:40] [ info] [input] pausing prod-qa02-re_emitter
[pod/fluent-bit-xp5lp/fluent-bit] [2024/07/19 13:57:40] [ info] [input] pausing prod-mainline-re_emitter
[pod/fluent-bit-xp5lp/fluent-bit] [2024/07/19 13:57:40] [ info] [input] pausing prod-xxx02-re_emitter
[pod/fluent-bit-xp5lp/fluent-bit] [2024/07/19 13:57:40] [ info] [input] pausing prod-qa01-re_emitter
[pod/fluent-bit-xp5lp/fluent-bit] [2024/07/19 13:57:40] [ info] [input] pausing prod-ld01-re_emitter
[pod/fluent-bit-xp5lp/fluent-bit] [2024/07/19 13:57:40] [ info] [input] pausing prod-dev02-re_emitter
[pod/fluent-bit-xp5lp/fluent-bit] [2024/07/19 13:57:40] [ info] [input] pausing prod-qa07-re_emitter
[pod/fluent-bit-xp5lp/fluent-bit] [2024/07/19 13:57:40] [ info] [input] pausing prod-qa12-re_emitter
[pod/fluent-bit-xp5lp/fluent-bit] [2024/07/19 13:57:40] [ info] [input] pausing prod-consoles-re_emitter
[pod/fluent-bit-xp5lp/fluent-bit] [2024/07/19 13:57:40] [ info] [input] pausing prod-dev01-re_emitter
[pod/fluent-bit-xp5lp/fluent-bit] [2024/07/19 13:57:40] [ info] [input] pausing prod-qa14-re_emitter
[pod/fluent-bit-xp5lp/fluent-bit] [2024/07/19 13:57:40] [ info] [input] pausing prod-qa05-re_emitter
[pod/fluent-bit-xp5lp/fluent-bit] [2024/07/19 13:57:40] [ info] [input] pausing prod-xxx03-re_emitter
[pod/fluent-bit-xp5lp/fluent-bit] [2024/07/19 13:57:40] [ info] [input] pausing prod-qa06-re_emitter
[pod/fluent-bit-xp5lp/fluent-bit] [2024/07/19 13:57:40] [ info] [input] pausing prod-qa13-re_emitter
[pod/fluent-bit-xp5lp/fluent-bit] [2024/07/19 13:57:40] [ info] [input] pausing prod-qa08-re_emitter
[pod/fluent-bit-xp5lp/fluent-bit] [2024/07/19 13:57:40] [ info] [input] pausing prod-playtest-rewrite
[pod/fluent-bit-xp5lp/fluent-bit] [2024/07/19 13:57:40] [ info] [input] pausing prod-qa09-rewrite
[pod/fluent-bit-xp5lp/fluent-bit] [2024/07/19 13:57:40] [ info] [input] pausing prod-xxx01-rewrite
[pod/fluent-bit-xp5lp/fluent-bit] [2024/07/19 13:57:40] [ info] [input] pausing prod-gd01-rewrite
[pod/fluent-bit-xp5lp/fluent-bit] [2024/07/19 13:57:40] [ info] [input] pausing prod-qa03-rewrite
[pod/fluent-bit-xp5lp/fluent-bit] [2024/07/19 13:57:40] [ info] [input] pausing prod-playtest-ppp-rewrite
[pod/fluent-bit-xp5lp/fluent-bit] [2024/07/19 13:57:40] [ info] [input] pausing prod-dev04-rewrite
[pod/fluent-bit-xp5lp/fluent-bit] [2024/07/19 13:57:40] [ info] [input] pausing prod-qa16-rewrite
[pod/fluent-bit-xp5lp/fluent-bit] [2024/07/19 13:57:40] [ info] [input] pausing prod-qa18-rewrite
[pod/fluent-bit-xp5lp/fluent-bit] [2024/07/19 13:57:40] [ info] [input] pausing prod-qa10-rewrite
[pod/fluent-bit-xp5lp/fluent-bit] [2024/07/19 13:57:40] [ info] [input] pausing prod-dev03-rewrite
[pod/fluent-bit-xp5lp/fluent-bit] [2024/07/19 13:57:40] [ info] [input] pausing prod-qa19-rewrite
[pod/fluent-bit-xp5lp/fluent-bit] [2024/07/19 13:57:40] [ info] [input] pausing prod-qa20-rewrite
[pod/fluent-bit-xp5lp/fluent-bit] [2024/07/19 13:57:40] [ info] [input] pausing prod-qa17-rewrite
[pod/fluent-bit-xp5lp/fluent-bit] [2024/07/19 13:57:40] [ info] [input] pausing prod-qa11-rewrite
[pod/fluent-bit-xp5lp/fluent-bit] [2024/07/19 13:57:40] [ info] [input] pausing prod-qc-rewrite
[pod/fluent-bit-xp5lp/fluent-bit] [2024/07/19 13:57:40] [ info] [input] pausing prod-mainline-qa-rewrite
[pod/fluent-bit-xp5lp/fluent-bit] [2024/07/19 13:57:40] [error] [reload] loaded configuration contains error(s). Reloading is aborted
[pod/fluent-bit-xp5lp/fluent-bit] reloading is aborted and exit
[pod/fluent-bit-xp5lp/fluent-bit] level=error time=2024-07-19T13:57:40Z msg="Failure during the run time of fluent-bit" error="failed to run fluent-bit: exit status 255"

To Reproduce

  • Create several namespaces
  • Create several parsers and filters in every namespace
  • Delete a namespace, then redeploy it

Expected behavior

  • Reload only once, when all operations in the namespace have finished
  • Check the config before reloading, and delay the reload if the check fails
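
The "check config before reloading" idea can be illustrated with a minimal sketch. This is not how fluent-operator or fluent-bit works today. The function names `missing_parsers` and `should_reload` are hypothetical, and the sketch assumes the caller has already collected the defined parser names and the parser names that filters refer to.

```python
def missing_parsers(defined_parsers, filter_parser_refs):
    """Return parser names that filters reference but that are not defined."""
    defined = set(defined_parsers)
    # A set comprehension removes duplicate references before sorting.
    return sorted({ref for ref in filter_parser_refs if ref not in defined})


def should_reload(defined_parsers, filter_parser_refs):
    """Hypothetical gate: allow a reload only when no reference is dangling."""
    return not missing_parsers(defined_parsers, filter_parser_refs)
```

With a gate like this, the state in the log above (a filter that still points at a deleted parser) would postpone the reload instead of aborting it.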

Your Environment

- Fluent Operator version: >2.7.0
- Container Runtime: containerd
- Operating system: ubuntu
- Kernel version:

How did you install fluent operator?

helm

Additional context

No response

@cw-Guo
Collaborator

cw-Guo commented Jul 20, 2024

> reload only once when all operations in ns finishes

How does fluent-operator know when your operations are done?

In my opinion, you can control the create/delete order in your CI system, and that should resolve this problem.
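
One possible reading of "control the create/delete order" is sketched below. It assumes the CI applies resources one at a time and can sort them by kind. The `DEPENDENCY_RANK` table and both function names are hypothetical. The sketch only orders the operations and does not stop the operator from reloading after each change.

```python
# Hypothetical ranking: parsers have no dependencies, filters may reference parsers.
DEPENDENCY_RANK = {"Parser": 0, "Filter": 1, "Output": 1}


def creation_order(resources):
    """Create parsers first so that filters never reference a missing parser."""
    return sorted(resources, key=lambda r: DEPENDENCY_RANK.get(r["kind"], 1))


def deletion_order(resources):
    """Delete in the opposite direction: filters and outputs first, parsers last."""
    return sorted(
        resources,
        key=lambda r: DEPENDENCY_RANK.get(r["kind"], 1),
        reverse=True,
    )
```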

@chrono2002
Author

> > reload only once when all operations in ns finishes
>
> how does fluent-operator know when your operations are done?
>
> in my opinion, you can control the create/delete orders in your CI system and this problem will be resolved.

How exactly are you suggesting we control it?
We have a Helm chart that simply installs parsers, filters and outputs.
We tried placing the parsers section before the filters, and the filters section before the parsers, with no luck.

@Cajga
Contributor

Cajga commented Sep 10, 2024

@cw-Guo we use GitOps to deploy, and when we deploy a larger application, many fluent-operator CRs get created, which seems to trigger many reloads on the fluent-bit pods.

This causes trouble for us, as fluent-bit starts hanging from time to time (#1332).

It seems fluent-bit has some issues with hot reload: fluent/fluent-bit#9354

While these are most probably fluent-bit bugs, being a bit more "kind" with the reload requests might help.

How about a solution where, instead of reloading immediately on every CR change, fluent-operator "collects" the changes for some configurable period (like 1 minute) and triggers a single reload if any change happened during that period?
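
A minimal sketch of that "collect, then reload once" idea follows. It is not existing fluent-operator code. The class name `ReloadBatcher` and its methods are hypothetical, and the caller is assumed to pass in the current time (for example from `time.monotonic()`) and to perform the actual reload.

```python
class ReloadBatcher:
    """Collect change notifications and allow one reload per window."""

    def __init__(self, window_seconds=60.0):
        self.window_seconds = window_seconds
        self._first_change_at = None  # time of the first change not yet reloaded

    def notify_change(self, now):
        """Record a CR change; only the first pending change starts the window."""
        if self._first_change_at is None:
            self._first_change_at = now

    def due(self, now):
        """Return True exactly once after the window has elapsed with changes pending."""
        if self._first_change_at is None:
            return False
        if now - self._first_change_at < self.window_seconds:
            return False
        self._first_change_at = None  # reset so later changes open a new window
        return True
```

A watcher loop would call `notify_change` for each CR event and poll `due` periodically, reloading only when it returns `True`, so a burst of events within the window results in one reload.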

ping @markusthoemmes

@markusthoemmes
Collaborator

I'm not really active in this project right now, but I did eventually solve this internally. Essentially, I created a script that gets the current reload count (GET "http://0.0.0.0:2020/api/v2/reload") and then runs a hot reload. Afterwards it gets the reload count again. If it is the same as before, the script retries the reload. The need for that was supposed to be fixed by fluent/fluent-bit#8457 though, so now we should be able to handle the return value of the reload and retry on error.
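
The internal script is not shown in this thread, so the following is only a sketch of the approach as described. It assumes that the GET response is JSON with a `hot_reload_count` field and that a reload is triggered by a POST with an empty JSON body to the same URL. Both should be checked against the Fluent Bit version in use. The function names, the retry count and the wait time are hypothetical.

```python
import json
import time
import urllib.request

RELOAD_URL = "http://0.0.0.0:2020/api/v2/reload"  # URL quoted in the comment above


def get_reload_count(url=RELOAD_URL):
    """Read the reload counter; assumes a JSON body with 'hot_reload_count'."""
    with urllib.request.urlopen(url, timeout=5) as resp:
        return json.load(resp)["hot_reload_count"]


def trigger_reload(url=RELOAD_URL):
    """Request a hot reload; assumes the endpoint accepts an empty JSON POST."""
    req = urllib.request.Request(
        url,
        data=b"{}",
        method="POST",
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req, timeout=5) as resp:
        resp.read()


def reload_until_counted(get_count=get_reload_count, reload=trigger_reload,
                         attempts=3, wait_seconds=1.0, sleep=time.sleep):
    """Retry the reload until the counter changes; return False if it never does."""
    for _ in range(attempts):
        before = get_count()
        reload()
        sleep(wait_seconds)  # give fluent-bit time to finish reloading
        if get_count() != before:
            return True
    return False
```

The count and reload functions are passed in as parameters so the retry logic can be exercised without a running fluent-bit instance.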
