Logs missing during heavy log volume #4693
Comments
Consistently seeing the following errors in the logs (changed the wait time to 60s):

2024-11-04 15:27:48 +0000 [info]: #0 [in_tail_container_logs] detected rotation of /var/log/containers/logger-deployment-57cc6745c7-4c4fb_default_logger-6712c2913db370d75ab57ea84fadb27351e7fc6841ee0005f313ca2df38e44a2.log; waiting 60.0 seconds
2024-11-04 15:27:48 +0000 [warn]: #0 [in_tail_container_logs] Skip update_watcher because watcher has been already updated by other inotify event path="/var/log/containers/logger-deployment-57cc6745c7-4c4fb_default_logger-6712c2913db370d75ab57ea84fadb27351e7fc6841ee0005f313ca2df38e44a2.log" inode=100695028 inode_in_pos_file=0
2024-11-04 15:27:48 +0000 [info]: #0 [in_tail_container_logs] detected rotation of /var/log/containers/logger-deployment-57cc6745c7-4c4fb_default_logger-6712c2913db370d75ab57ea84fadb27351e7fc6841ee0005f313ca2df38e44a2.log; waiting 60.0 seconds
2024-11-04 15:27:48 +0000 [info]: #0 [in_tail_container_logs] following tail of /var/log/containers/logger-deployment-57cc6745c7-4c4fb_default_logger-6712c2913db370d75ab57ea84fadb27351e7fc6841ee0005f313ca2df38e44a2.log

Contents of containers.log.pos file:
Thanks for this report.
Thanks. I've tried different combinations of settings since opening this issue, e.g. using a file buffer, increasing the chunk size, and increasing the memory/CPU allocated to the fluentd daemonset. None of them seems to have had an impact on Fluentd's ability to tail the logs; it's as if it's losing track of the files it's supposed to tail. I have the notebook I've been using to find gaps in the sequence. Let me know if you want me to post it here.
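For concreteness, the buffer-related settings described in this comment would look roughly like the following in a CloudWatch match block; the plugin parameters are from fluent-plugin-cloudwatch-logs, and the specific names and values are illustrative assumptions rather than the reporter's exact configuration:

```
<match **>
  @type cloudwatch_logs
  log_group_name load-test
  log_stream_name_key stream_name
  auto_create_stream true
  <buffer>
    @type file                      # file buffer instead of the default memory buffer
    path /var/log/fluentd-buffers/cloudwatch.buffer
    chunk_limit_size 8MB            # larger chunks, as described above
    flush_interval 5s
  </buffer>
</match>
```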
Describe the bug
Under heavy log volume, e.g. >10k log entries per second, fluentd consistently drops logs. This may be related to log rotation on Kubernetes. When I ran a load test, I saw the following entries in the fluentd logs:
When I added follow_inodes=true and rotate_wait=0 to the container configuration, the errors went away, but large chunks of logs were still missing and the following entries appeared in the fluentd logs.

I am running the latest version of the fluentd Kubernetes daemonset for CloudWatch, fluent/fluentd-kubernetes-daemonset:v1.17.1-debian-cloudwatch-1.2. During the test, both memory and CPU utilization for fluentd remained fairly low.
To Reproduce
Run multiple replicas of the following program:
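A minimal Python stand-in for such a program, emitting JSON log lines with a monotonically increasing sequence number; the field names and target rate are assumptions, not the original test code:

```python
import json
import sys
import time

# Emit JSON log lines at roughly RATE_PER_SEC entries per second, each
# carrying a monotonically increasing "seq" field so gaps can be found later.
RATE_PER_SEC = 10_000

seq = 0
while True:
    start = time.monotonic()
    for _ in range(RATE_PER_SEC):
        sys.stdout.write(json.dumps({"seq": seq, "ts": time.time(), "msg": "load-test"}) + "\n")
        seq += 1
    sys.stdout.flush()
    # Sleep off whatever is left of the current second.
    remaining = 1.0 - (time.monotonic() - start)
    if remaining > 0:
        time.sleep(remaining)
```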
Here's the deployment for the test application:
Here's the container.conf file for fluentd:
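For reference, an in_tail source exercising the settings discussed in this report would look roughly like this; the paths, tag, and parser are assumptions rather than the exact container.conf:

```
<source>
  @type tail
  @id in_tail_container_logs
  path /var/log/containers/*.log
  pos_file /var/log/containers.log.pos
  tag kubernetes.*
  read_from_head true
  follow_inodes true    # track files by inode across rotations
  rotate_wait 0         # stop watching a rotated file immediately
  <parse>
    @type json
    time_format %Y-%m-%dT%H:%M:%S.%NZ
  </parse>
</source>
```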
Expected behavior
The test application assigns a sequence number to each log entry. I have a Python notebook that flattens the JSON log output, sorts the logs by sequence number, and then finds gaps in the sequence; this is how I know that fluentd is dropping logs. If everything is working as it should, there should be no log loss.
I ran the same tests with Fluent Bit and experienced no log loss.
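The gap check described above amounts to something like this sketch; the input path and the "seq" field name are assumptions:

```python
import json

def find_gaps(path):
    """Load JSON log lines, sort by sequence number, and return (lo, hi)
    pairs where one or more sequence numbers are missing in between."""
    seqs = []
    with open(path) as f:
        for line in f:
            seqs.append(json.loads(line)["seq"])
    seqs.sort()
    return [(a, b) for a, b in zip(seqs, seqs[1:]) if b - a > 1]

for lo, hi in find_gaps("exported-logs.jsonl"):
    print(f"missing {hi - lo - 1} entries between seq {lo} and seq {hi}")
```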
Your Environment
Your Configuration
Your Error Log
Additional context
No response