
Fluentd crashes when watching namespace metadata #239

Closed
julienlefur opened this issue Jun 26, 2020 · 3 comments
@julienlefur

The watch connection to the API server seems to be closed regularly by one of the following: kubeclient, the http gem, or the API server itself.

When no modifications are made to the namespaces, namespace_watch_retry_count increases steadily because of this error:
2020-06-26 11:59:04 +0000 [info]: #0 [filter_kube_metadata] Exception encountered parsing namespace watch event. The connection might have been closed. Sleeping for 128 seconds and resetting the namespace watcher.error reading from socket: Could not parse data

When the max is reached, Fluentd crashes and restarts.

```ruby
@stats.bump(:namespace_watch_failures)
if Thread.current[:namespace_watch_retry_count] < @watch_retry_max_times
  # Instead of raising exceptions and crashing Fluentd, swallow
  # the exception and reset the watcher.
  log.info(
    "Exception encountered parsing namespace watch event. " \
    "The connection might have been closed. Sleeping for " \
    "#{Thread.current[:namespace_watch_retry_backoff_interval]} " \
    "seconds and resetting the namespace watcher.", e)
  sleep(Thread.current[:namespace_watch_retry_backoff_interval])
  Thread.current[:namespace_watch_retry_count] += 1
  Thread.current[:namespace_watch_retry_backoff_interval] *= @watch_retry_exponential_backoff_base
  namespace_watcher = nil
```
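The 128-second sleep in the log line above is consistent with this exponential backoff. Assuming an initial interval of 1 second and a backoff base of 2 (plausible defaults, not confirmed from this thread), the sleep doubles on each failure, so the eighth failure sleeps 2^7 = 128 seconds. A minimal sketch of that progression:

```ruby
# Simulate the backoff progression implied by the snippet above: each
# failure multiplies the sleep interval by the backoff base. The initial
# interval of 1s and base of 2 are assumptions, not confirmed defaults.
def backoff_intervals(initial: 1, base: 2, max_retries: 10)
  interval = initial
  intervals = []
  max_retries.times do
    intervals << interval
    interval *= base
  end
  intervals
end

# Under these assumptions, "Sleeping for 128 seconds" corresponds to
# the eighth consecutive failure (2**7 == 128).
backoff_intervals
```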

The only way to reset namespace_watch_retry_count is to make a change to a namespace so that reset_namespace_watch_retry_stats is called. When no namespace is modified, Fluentd crashes after 10 'connection closed' errors.
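To illustrate why an idle watch is fatal, here is a minimal simulation (not the plugin's actual code): the retry counter is only reset on the notice-processing path, so a connection that keeps dropping without ever delivering a notice exhausts the retry budget.

```ruby
# Simulate repeated watch-connection cycles. A "notice" stands in for a
# namespace modification event; processing one resets the retry counter
# (the reset_namespace_watch_retry_stats analogue). Each cycle then ends
# with the connection closing, which bumps the counter.
def watch_survives?(notices_per_cycle, cycles, max_retries: 10)
  retry_count = 0
  cycles.times do
    notices_per_cycle.times { retry_count = 0 } # reset only when a notice arrives
    retry_count += 1                            # connection closed -> failure bump
    return false if retry_count >= max_retries  # Fluent::UnrecoverableError analogue
  end
  true
end
```

With even one notice per cycle the counter never accumulates, but with zero notices (an idle cluster) the tenth closed connection kills the thread.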

Would it be possible to catch the 'normal' connection close errors to avoid this behaviour?
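One hedged sketch of what "catching the normal close" could look like: dispatch on the error class and only bump the retry counter for genuinely unexpected failures. `ConnectionClosed` below is a hypothetical stand-in for the http gem's `HTTP::ConnectionError`; the real plugin would need to rescue the actual classes raised by kubeclient/http.

```ruby
# Hypothetical error class standing in for HTTP::ConnectionError; used
# here only so the sketch is self-contained.
class ConnectionClosed < StandardError; end

# Decide how to react to a watch error: a routine connection close is
# re-established without counting against the retry limit, while any
# other error still bumps the counter and backs off.
def handle_watch_error(error, retry_count)
  case error
  when ConnectionClosed
    [:reconnect, retry_count]     # benign close: reset the watcher, no bump
  else
    [:backoff, retry_count + 1]   # real failure: counts toward the limit
  end
end
```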

It seems to be linked to this issue on Kubeclient: ManageIQ/kubeclient#273

@julienlefur julienlefur changed the title Fluentd crash when no modifications on the watch namespace Fluentd crashes when watching namespace metadata Jun 26, 2020
@jcantrill
Contributor

fixed by #247

@julienlefur
Author

@jcantrill I still see the same behaviour with version 2.5.2.
The connection to the api-server is closed regularly, the "namespace_watch_failures" stat counter is incremented up to 10, and Fluentd crashes.
As a workaround to keep Fluentd from crashing, I run a cronjob that applies a change to a namespace so the counter gets reset.
I'll keep you posted if I can dig deeper and find anything.

@mashail

mashail commented Aug 1, 2020

@jcantrill we upgraded to 2.5.2 and we still get the same issue. Fluentd is crashing every hour:

```
2020-08-01 15:53:44 +0300 [error]: Exception encountered parsing namespace watch event. The connection might have been closed. Retried 10 times yet still failing. Restarting.error reading from socket: Could not parse data
#<Thread:0x000055ea2b3c0d78@/opt/bitnami/fluentd/gems/fluent-plugin-kubernetes_metadata_filter-2.5.2/lib/fluent/plugin/filter_kubernetes_metadata.rb:279 run> terminated with exception (report_on_exception is true):
/opt/bitnami/fluentd/gems/fluent-plugin-kubernetes_metadata_filter-2.5.2/lib/fluent/plugin/kubernetes_metadata_watch_namespaces.rb:70:in `rescue in set_up_namespace_thread': Exception encountered parsing namespace watch event. The connection might have been closed. Retried 10 times yet still failing. Restarting. (Fluent::UnrecoverableError)
	from /opt/bitnami/fluentd/gems/fluent-plugin-kubernetes_metadata_filter-2.5.2/lib/fluent/plugin/kubernetes_metadata_watch_namespaces.rb:39:in `set_up_namespace_thread'
	from /opt/bitnami/fluentd/gems/fluent-plugin-kubernetes_metadata_filter-2.5.2/lib/fluent/plugin/filter_kubernetes_metadata.rb:279:in `block in configure'
/opt/bitnami/fluentd/gems/http-4.4.1/lib/http/response/parser.rb:31:in `add': error reading from socket: Could not parse data (HTTP::ConnectionError)
	from /opt/bitnami/fluentd/gems/http-4.4.1/lib/http/connection.rb:214:in `read_more'
	from /opt/bitnami/fluentd/gems/http-4.4.1/lib/http/connection.rb:92:in `readpartial'
	from /opt/bitnami/fluentd/gems/http-4.4.1/lib/http/response/body.rb:30:in `readpartial'
	from /opt/bitnami/fluentd/gems/http-4.4.1/lib/http/response/body.rb:36:in `each'
	from /opt/bitnami/fluentd/gems/kubeclient-4.8.0/lib/kubeclient/watch_stream.rb:25:in `each'
	from /opt/bitnami/fluentd/gems/fluent-plugin-kubernetes_metadata_filter-2.5.2/lib/fluent/plugin/kubernetes_metadata_watch_namespaces.rb:114:in `process_namespace_watcher_notices'
	from /opt/bitnami/fluentd/gems/fluent-plugin-kubernetes_metadata_filter-2.5.2/lib/fluent/plugin/kubernetes_metadata_watch_namespaces.rb:41:in `set_up_namespace_thread'
	from /opt/bitnami/fluentd/gems/fluent-plugin-kubernetes_metadata_filter-2.5.2/lib/fluent/plugin/filter_kubernetes_metadata.rb:279:in `block in configure'
/opt/bitnami/fluentd/gems/http-4.4.1/lib/http/response/parser.rb:31:in `add': Could not parse data (IOError)
	from /opt/bitnami/fluentd/gems/http-4.4.1/lib/http/connection.rb:214:in `read_more'
	from /opt/bitnami/fluentd/gems/http-4.4.1/lib/http/connection.rb:92:in `readpartial'
	from /opt/bitnami/fluentd/gems/http-4.4.1/lib/http/response/body.rb:30:in `readpartial'
	from /opt/bitnami/fluentd/gems/http-4.4.1/lib/http/response/body.rb:36:in `each'
	from /opt/bitnami/fluentd/gems/kubeclient-4.8.0/lib/kubeclient/watch_stream.rb:25:in `each'
	from /opt/bitnami/fluentd/gems/fluent-plugin-kubernetes_metadata_filter-2.5.2/lib/fluent/plugin/kubernetes_metadata_watch_namespaces.rb:114:in `process_namespace_watcher_notices'
	from /opt/bitnami/fluentd/gems/fluent-plugin-kubernetes_metadata_filter-2.5.2/lib/fluent/plugin/kubernetes_metadata_watch_namespaces.rb:41:in `set_up_namespace_thread'
	from /opt/bitnami/fluentd/gems/fluent-plugin-kubernetes_metadata_filter-2.5.2/lib/fluent/plugin/filter_kubernetes_metadata.rb:279:in `block in configure'
Unexpected error Exception encountered parsing namespace watch event. The connection might have been closed. Retried 10 times yet still failing. Restarting.
  /opt/bitnami/fluentd/gems/fluent-plugin-kubernetes_metadata_filter-2.5.2/lib/fluent/plugin/kubernetes_metadata_watch_namespaces.rb:70:in `rescue in set_up_namespace_thread'
  /opt/bitnami/fluentd/gems/fluent-plugin-kubernetes_metadata_filter-2.5.2/lib/fluent/plugin/kubernetes_metadata_watch_namespaces.rb:39:in `set_up_namespace_thread'
  /opt/bitnami/fluentd/gems/fluent-plugin-kubernetes_metadata_filter-2.5.2/lib/fluent/plugin/filter_kubernetes_metadata.rb:279:in `block in configure'
```

I enabled trace logging for the plugin to try to figure out the issue, but had no luck. I don't want to increase the retry limit, because it will eventually crash anyway; I want to diagnose the root cause and fix it. Can you advise?
