
Logstash hangs up after elasticsearch was down (with kafka input) #9421

Closed
yfoelling opened this issue Apr 24, 2018 · 1 comment
Labels
invalid: For issues that should not have been reported to logstash core

Comments

@yfoelling

  • Version: 6.2.4
  • Operating System: Ubuntu 16.04
  • Config File (if you have sensitive info, please remove it):
input {
  kafka {
    auto_offset_reset  => "earliest"
    bootstrap_servers  => "SERVERS"
    consumer_threads   => 4
    # auto-commit disabled: the plugin commits offsets itself via commitSync
    enable_auto_commit => "false"
    group_id           => "GROUP"
    client_id          => "CLIENT"
    topics_pattern     => "PATTERN"
    codec              => "json"
  }
}
filter {
[...]
}
output {
  elasticsearch {
    index              => "INDEX"
    hosts              => ["SERVER"]
    document_id        => "%{[@metadata][fingerprint]}"
    template           => "/etc/logstash/index_template.json"
    template_name      => "logging"
    template_overwrite => true
  }
}
  • Steps to Reproduce:
  1. take a config like the one above (filters are not required)
  2. start Logstash (with some continuous input)
  3. make ES unreachable somehow
  4. make it reachable again after some time

More Info:
With the Kafka input and the Elasticsearch output, Logstash stops processing if Elasticsearch is unreachable for a while.
The process is still running, but it doesn't process any new data.

Logstash is running in Kubernetes, so I have to restart it manually each time ES was unreachable. IMHO it should either end/stop the service or wait for the cluster to come up again.

Log output while ES is down (for some hours in my case):

2018-04-23T16:58:39.510378721Z [WARN ] 2018-04-23 16:58:39.510 [Ruby-0-Thread-11@[main]>worker1: /usr/share/logstash/logstash-core/lib/logstash/pipeline.rb:385] elasticsearch - UNEXPECTED POOL ERROR {:e=>#}
2018-04-23T16:58:39.510487776Z [ERROR] 2018-04-23 16:58:39.510 [Ruby-0-Thread-11@[main]>worker1: /usr/share/logstash/logstash-core/lib/logstash/pipeline.rb:385] elasticsearch - Attempted to send a bulk request to elasticsearch, but no there are no living connections in the connection pool. Perhaps Elasticsearch is unreachable or down? {:error_message=>"No Available connections", :class=>"LogStash::Outputs::ElasticSearch::HttpClient::Pool::NoConnectionAvailableError", :will_retry_in_seconds=>64}
2018-04-23T16:58:42.951478403Z [WARN ] 2018-04-23 16:58:42.951 [Ruby-0-Thread-8: /usr/share/logstash/vendor/bundle/jruby/2.3.0/gems/logstash-output-elasticsearch-9.1.1-java/lib/logstash/outputs/elasticsearch/http_client/pool.rb:232] elasticsearch - Attempted to resurrect connection to dead ES instance, but got an error. {:url=>"SERVER", :error_type=>LogStash::Outputs::ElasticSearch::HttpClient::Pool::HostUnreachableError, :error=>"Elasticsearch Unreachable: [SERVER][Manticore::SocketException] Connection refused (Connection refused)"}
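
Note: the :will_retry_in_seconds=>64 above matches the elasticsearch output's default retry backoff cap, so the output keeps retrying the failed bulk request indefinitely and the pipeline blocks instead of erroring out. A sketch of the relevant settings, assuming the defaults documented for logstash-output-elasticsearch 9.x (verify against your version):

output {
  elasticsearch {
    # ... settings as above ...
    retry_initial_interval => 2   # documented default: seconds before the first bulk retry
    retry_max_interval     => 64  # documented default backoff cap; matches the 64s in the log above
  }
}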

Log output at the time ES is up again:

2018-04-23T16:59:42.326249582Z Exception in thread "Ruby-0-Thread-23: /usr/share/logstash/vendor/bundle/jruby/2.3.0/gems/logstash-input-kafka-8.0.6/lib/logstash/inputs/kafka.rb:241" Exception in thread "Ruby-0-Thread-22: /usr/share/logstash/vendor/bundle/jruby/2.3.0/gems/logstash-input-kafka-8.0.6/lib/logstash/inputs/kafka.rb:241" org.apache.kafka.clients.consumer.CommitFailedException: Commit cannot be completed since the group has already rebalanced and assigned the partitions to another member. This means that the time between subsequent calls to poll() was longer than the configured max.poll.interval.ms, which typically implies that the poll loop is spending too much time message processing. You can address this either by increasing the session timeout or by reducing the maximum size of batches returned in poll() with max.poll.records.
2018-04-23T16:59:42.326318585Z 	at org.apache.kafka.clients.consumer.internals.ConsumerCoordinator.sendOffsetCommitRequest(org/apache/kafka/clients/consumer/internals/ConsumerCoordinator.java:721)
2018-04-23T16:59:42.326342549Z 	at org.apache.kafka.clients.consumer.internals.ConsumerCoordinator.commitOffsetsSync(org/apache/kafka/clients/consumer/internals/ConsumerCoordinator.java:599)
2018-04-23T16:59:42.326350452Z 	at org.apache.kafka.clients.consumer.KafkaConsumer.commitSync(org/apache/kafka/clients/consumer/KafkaConsumer.java:1203)
2018-04-23T16:59:42.326358068Z 	at java.lang.reflect.Method.invoke(java/lang/reflect/Method.java:498)
2018-04-23T16:59:42.326363546Z 	at org.jruby.javasupport.JavaMethod.invokeDirectWithExceptionHandling(org/jruby/javasupport/JavaMethod.java:438)
2018-04-23T16:59:42.326455383Z 	at org.jruby.javasupport.JavaMethod.invokeDirect(org/jruby/javasupport/JavaMethod.java:302)
2018-04-23T16:59:42.32646587Z 	at RUBY.block in thread_runner(/usr/share/logstash/vendor/bundle/jruby/2.3.0/gems/logstash-input-kafka-8.0.6/lib/logstash/inputs/kafka.rb:269)
2018-04-23T16:59:42.326472091Z 	at org.jruby.RubyProc.call(org/jruby/RubyProc.java:289)
2018-04-23T16:59:42.326477217Z 	at org.jruby.RubyProc.call(org/jruby/RubyProc.java:246)
2018-04-23T16:59:42.326483081Z 	at java.lang.Thread.run(java/lang/Thread.java:748)
[... the identical CommitFailedException and stack trace repeats three more times, once per consumer thread (Ruby-0-Thread-20 through Ruby-0-Thread-23) ...]
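
The Kafka client's message describes what happened: while the output was blocked waiting for Elasticsearch, poll() was not called within max.poll.interval.ms, the group rebalanced, and the manual offset commit (with enable_auto_commit => "false" the plugin calls commitSync itself, the kafka.rb:269 frame above) then threw and killed the consumer threads, while the Logstash process itself stayed up. A possible mitigation, following the exception message's own suggestion, is to shrink the batches and widen the poll interval. This is a sketch only; whether max_poll_interval_ms is exposed depends on the logstash-input-kafka version, so check these option names against the plugin docs before relying on them:

input {
  kafka {
    # ... settings as above ...
    # Fewer records per poll() means less work queued behind a blocked output.
    max_poll_records     => "100"
    # More time allowed between poll() calls before the broker rebalances the group.
    # (Assumption: exposed by this plugin version; verify in the docs.)
    max_poll_interval_ms => "600000"
  }
}

This only widens the window, though: if ES stays down longer than the configured interval, the consumer is still evicted.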
@andrewvc
Contributor

These GitHub issues are only for bugs in Logstash core. This looks to be an issue with the Kafka input, so I'm moving it there and closing it here. You can track it at logstash-plugins/logstash-input-kafka#265

@andrewvc added the "invalid" label (for issues that should not have been reported to logstash core) on Apr 24, 2018