
Add charset conversion feature. #222

Open: gnuhpc wants to merge 1 commit into main

Conversation

@gnuhpc commented Aug 10, 2017

Our scenario is collecting logs encoded in GBK (a Chinese encoding) rather than UTF-8.

We use the Flume taildir source plugin to do the collection. After Avro serialization, the events are sent to Kafka, and then we use logstash-kafka-input to fetch them and send them to Elasticsearch. The data flow is:

Logs (GBK) -> Flume (Avro serialization) -> Kafka -> logstash-kafka-input (Avro codec for deserialization) -> Elasticsearch

The input config is:

input {
        kafka {
                bootstrap_servers => "DPFTMP06:9092"
                max_partition_fetch_bytes => "3145728"
                topics =>["emu-topic"]
                group_id => "logstashc"
                auto_offset_reset => "earliest"
                consumer_threads => 8
                key_deserializer_class => "org.apache.kafka.common.serialization.ByteArrayDeserializer"
                value_deserializer_class => "org.apache.kafka.common.serialization.ByteArrayDeserializer"
                charset => "GBK"
                charset_field => ["message"]
                codec => avro {
                        schema_uri => "/logger/logstash/configM/test.avsc"
                }
        }
}

If the logs were UTF-8, this would not cause any trouble. Unfortunately, the logs are GBK, so Kibana shows a garbled message field whenever it contains Chinese.

With this PR, we can convert the message to the correct encoding after the Avro codec runs in this input plugin.
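
To make the failure mode concrete, here is a minimal standalone Java sketch (not part of this PR; the class and string names are ours, and it assumes the JRE ships the GBK charset, as full JDKs do). It shows why GBK bytes read back as UTF-8 turn into mojibake, and how decoding with the declared charset recovers the text, which is the same idea this PR applies to each configured charset_field:

import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class GbkDemo {
    public static void main(String[] args) {
        // Raw bytes as they would arrive from Kafka: "中文" encoded in GBK.
        byte[] gbkBytes = "中文".getBytes(Charset.forName("GBK"));

        // Decoding with the wrong charset (UTF-8) yields mojibake; this is
        // what ends up in Elasticsearch and Kibana without a conversion step.
        String garbled = new String(gbkBytes, StandardCharsets.UTF_8);

        // Decoding with the declared charset recovers the original text.
        String correct = new String(gbkBytes, Charset.forName("GBK"));

        System.out.println("garbled: " + garbled);
        System.out.println("correct: " + correct);
    }
}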

Please check, thank you!

@cgyim commented Aug 24, 2017

@gnuhpc I recently ran into the same trouble you mentioned, but with GB2312 encoding. My data flow is also Flume -> Kafka -> Logstash, with the Flume taildir source. Could you share your Flume configuration?

@gnuhpc (Author) commented Sep 27, 2017

@cgyim1992 That has nothing to do with this issue. Please add me on WeChat: gnuhpc. Thank you!

@imweijh commented Dec 8, 2017

Specify the GBK charset in the Flume config, not in logstash-input-kafka.
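
For reference, a minimal sketch of that approach (agent, channel, and path names are placeholders; it uses the spooling-directory source, since as far as I know the stock TAILDIR source exposes no charset option, and the sink is omitted):

a1.sources = r1
a1.channels = c1
a1.channels.c1.type = memory
a1.sources.r1.type = spooldir
a1.sources.r1.channels = c1
# Placeholder directory; inputCharset tells the line deserializer to read files as GBK text.
a1.sources.r1.spoolDir = /logger/app/logs
a1.sources.r1.inputCharset = GBK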

@gnuhpc (Author) commented Jan 4, 2018

@imweijh can you tell me how to configure the GBK charset in the Flume config when using Avro serialization? Thank you!
