
Add usage info for dead letter queues #7323

Closed · wants to merge 3 commits

Conversation

dedemorton (Contributor):

First draft of content for DLQ. The index changes add a new top-level entry to the TOC. It looks like this:

[screenshot: the new top-level TOC entry for dead letter queues]

suyograo (Contributor) commented Jun 9, 2017:

@robbavey can you review this?

suyograo requested a review from robbavey · June 9, 2017 15:34
robbavey (Member) left a comment:

Looks good. I have some general questions in here that @suyograo or @acchen97 would be better positioned than I am to answer, as I'm still pretty new to the DLQ...

=== Dead Letter Queues

//REVIEWERS: I had to install logstash-input-dead_letter_queue. Is it not bundled with the alpha2 release?
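
For reference, a manual install uses the standard Logstash plugin manager (the plugin name is the one mentioned in the note above):

bin/logstash-plugin install logstash-input-dead_letter_queue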

Member:

I think it will be installed with alpha3; I don't think it was added to alpha2. @suyograo?

Contributor:

Yeah, it was not bundled in alpha3. @dedemorton, you can get it off the latest 5.5 BC as well; it is included in there.

Contributor Author:

Verified that the plugin is available in alpha3

of the Logstash plugins.

//REVIEWERS: I feel like we have to say something here ^^ but I'm not sure if this is enough info. How will users be able to tell if a specific output supports DLQs? Do we have a plan for when/how we will add DLQ support to plugins that we support?

Member:

Does it make sense to refer the user to the documentation for the plugins? Assume that a plugin doesn't support the DLQ unless the plugin docs state explicit DLQ support. @acchen97 or @suyograo may have a rollout plan for plugins.

Contributor:

Yeah, initially we've targeted the ES output, but I agree we can link back to the plugin docs, which should state explicit support. In this case, we should add docs in the ES output to mention what exactly gets DLQ'd. I think it's OK to have this in both places, but we can just cross-reference.

Contributor Author:

OK, I'll add something to indicate that users can check the plugin docs for DLQ support. For now, I'll mention that ES is the only output supported, and remove that bit when we have support in a few other outputs.


//REVIEWERS: It sounds like there might be some performance implications wrt enabling DLQs. If so, what are they? Should we document the restrictions?

Member:

No, I've created #7404 to track this

Contributor:

I don't think performance should be called out here at all. We can always improve performance, but the important point of the DLQ is the resiliency it brings. User docs should not highlight a performance impact from the DLQ. If the DLQ is super slow, that's a bug we can fix IMO.

Contributor:

but +1 to measuring it, as @robbavey pointed out.


<1> The path to the directory containing the dead letter queue. This is either
the default, `path.data/dead_letter_queue`, or the value specified for
`path.dead_letter_queue` in the `logstash.yml` file.
Member:

I don't know if we need to call out that each pipeline with a plugin that supports DLQ output maintains its own DLQ folder underneath this directory. This folder refers to the 'top-level' DLQ folder.

Or whether this fact would muddy the waters somewhat...

Contributor Author:

I'll add something about this and see how it works. The main point here is to help users understand how they can find out where the files reside.
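
To make the lookup concrete, here is a minimal sketch of the relevant `logstash.yml` settings (the directory values are hypothetical examples; the setting names are the ones referenced in the excerpt above):

path.data: /var/lib/logstash
# by default, DLQ files then live under /var/lib/logstash/dead_letter_queue
path.dead_letter_queue: /var/lib/logstash/dlq
# if set, this overrides the default location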

the dead letter queue on or after June 6, 2017, at 23:40:37.

//REVIEWERS: It's not clear to me what happens when the user configures start_timestamp and commit_offsets is true. If an offset's been committed, will the pipeline start reading at the offset, or go by the timestamp specified in start_timestamp?

Member:

Reading the code, the timestamp overrides the value of the offset.

It's an interesting thought as to whether this is the desired behaviour: restarting Logstash would see the events replayed from the earlier timestamp, even if offsets had been stored for events past the starting timestamp. That might cause some confusion. Thoughts, @suyograo, @acchen97?
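
For concreteness, a hypothetical reader pipeline that sets both options (the path is illustrative; `start_timestamp` and `commit_offsets` are the dead_letter_queue input options under discussion):

input {
  dead_letter_queue {
    path => "/var/lib/logstash/dead_letter_queue"  # hypothetical DLQ location
    commit_offsets => true                         # persist the read position
    start_timestamp => "2017-06-06T23:40:37"       # replay from this time onward
  }
}

With the current behaviour described above, restarting this pipeline would replay from the timestamp rather than resume from any committed offset.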

Contributor:

Yeah, this one is tricky. I think the start_timestamp should honor the committed offset and should restrict the range based on the timestamp after the commit point.

@robbavey you are right that the current behavior will confuse users. We should open an issue.

suyograo (Contributor):
@dedemorton The latest 5.5 should have most of the fixes if you want to re-test this feature.

dedemorton (Contributor Author):
This should be ready for a final review. Does anyone have a better example to use for the DLQ event? Something a bit more real-world-y.

//TODO: Need a better example here.

[source,json]
-------------------------------------------------------------------------------
Contributor Author:

Update with actual input event.

Contributor:

It'd be nice to show the full event payload as it sits in the DLQ, including the reason/error metadata.

Member:

This is an event that I pulled out of the DLQ created by the ES plugin:
{ "@timestamp" => 2017-06-16T20:47:02.510Z, "@metadata" => { "dead_letter_queue" => { "plugin_id" => "095bc531617b5a5fe5f632e85c9a5a03ddb204a5-2", "reason" => "Could not index event to Elasticsearch. status: 404, action: [\"update\", {:_id=>\"1\", :_index=>\"logstash-2017.06.16\", :_type=>\"logs\", :_routing=>nil, :_retry_on_conflict=>1}, #<LogStash::Event:0x31ac3ede>], response: {\"update\"=>{\"_index\"=>\"logstash-2017.06.16\", \"_type\"=>\"logs\", \"_id\"=>\"1\", \"status\"=>404, \"error\"=>{\"type\"=>\"document_missing_exception\", \"reason\"=>\"[logs][1]: document missing\", \"index_uuid\"=>\"FHQaa_FKRcyIB6ef8t9qFQ\", \"shard\"=>\"3\", \"index\"=>\"logstash-2017.06.16\"}}}", "entry_time" => #<Java::OrgLogstash::Timestamp:0x7e5e7c82>, "plugin_type" => "elasticsearch" } }, "@version" => "1", "host" => "dev.local", "message" => "Update" }

Member:

I can clean that up, but does that make sense?

Contributor Author:

@robbavey Yes. Can you clean up the formatting to be more "pretty print" style?

suyograo (Contributor), Jun 22, 2017:

I confirmed there is no field called `original`. I may have misled @robbavey.

The original event gets appended to the Event object at the top-level.

Here is the updated config to deal with the mapping error:

input {
  dead_letter_queue {
    path => "/path.to/data/dead_letter_queue/"
  }
}
filter {
  mutate {
    remove_field => "[geoip][location]"
  }
}
output {
  elasticsearch {}
}

This removes the offending field and indexes other fields that were present in the original.

Contributor:

Here's a before and after event:

In the DLQ, before processing:

{
   "@metadata" => {
    "dead_letter_queue" => {
       "entry_time" => #<Java::OrgLogstash::Timestamp:0x5b5dacd5>,
        "plugin_id" => "fb80f1925088497215b8d037e622dec5819b503e-4",
      "plugin_type" => "elasticsearch",
           "reason" => "Could not index event to Elasticsearch. status: 400, action: [\"index\", {:_id=>nil, :_index=>\"logstash-2017.06.22\", :_type=>\"logs\", :_routing=>nil}, 2017-06-22T01:29:29.804Z Suyogs-MacBook-Pro-2.local {\"geoip\":{\"location\":\"home\"}}], response: {\"index\"=>{\"_index\"=>\"logstash-2017.06.22\", \"_type\"=>\"logs\", \"_id\"=>\"AVzNayPze1iR9yDdI2MD\", \"status\"=>400, \"error\"=>{\"type\"=>\"mapper_parsing_exception\", \"reason\"=>\"failed to parse\", \"caused_by\"=>{\"type\"=>\"illegal_argument_exception\", \"reason\"=>\"illegal latitude value [266.30859375] for geoip.location\"}}}}"
    }
  },
  "@timestamp" => 2017-06-22T01:29:29.804Z,
    "@version" => "1",
       "geoip" => {
    "location" => "home"
  },
        "host" => "Suyogs-MacBook-Pro-2.local",
     "message" => "{\"geoip\":{\"location\":\"home\"}}"
}

After processing:

{
   "@metadata" => {
    "dead_letter_queue" => {
       "entry_time" => #<Java::OrgLogstash::Timestamp:0x59f3b191>,
        "plugin_id" => "fb80f1925088497215b8d037e622dec5819b503e-4",
      "plugin_type" => "elasticsearch",
           "reason" => "Could not index event to Elasticsearch. status: 400, action: [\"index\", {:_id=>nil, :_index=>\"logstash-2017.06.22\", :_type=>\"logs\", :_routing=>nil}, 2017-06-22T01:29:29.804Z Suyogs-MacBook-Pro-2.local {\"geoip\":{\"location\":\"home\"}}], response: {\"index\"=>{\"_index\"=>\"logstash-2017.06.22\", \"_type\"=>\"logs\", \"_id\"=>\"AVzNayPze1iR9yDdI2MD\", \"status\"=>400, \"error\"=>{\"type\"=>\"mapper_parsing_exception\", \"reason\"=>\"failed to parse\", \"caused_by\"=>{\"type\"=>\"illegal_argument_exception\", \"reason\"=>\"illegal latitude value [266.30859375] for geoip.location\"}}}}"
    }
  },
  "@timestamp" => 2017-06-22T01:29:29.804Z,
    "@version" => "1",
       "geoip" => {},
        "host" => "Suyogs-MacBook-Pro-2.local",
     "message" => "{\"geoip\":{\"location\":\"home\"}}"
}

Contributor:

Notice the `geoip` field is empty.

Member:

@suyograo @dedemorton - apologies. I ran the config I pasted here through my local rig and it appeared to DTRT (do the right thing): an entry got added to ES after going through the DLQ.

Contributor:

Yeah, no worries. I think it just ignored the json filter parsing because the `original` field was not found :) All good now.

[[configuring-dlq]]
==== Configuring Logstash to Use Dead Letter Queues

You enable dead letter queues by setting the `dead_letter_queue_enable` option
Contributor:

Maybe say it's off by default?
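
A minimal sketch, assuming the option lives in `logstash.yml` alongside the path settings mentioned earlier (the option name is taken from the excerpt above; it is off by default):

dead_letter_queue_enable: true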


Dead letter queues are stored as files in the local directory of the Logstash
instance. By default, the dead letter queue files are stored in
`path.data/dead_letter_queue`. Each pipeline has a separate queue. For example,
Contributor:

Mentioning each pipeline has a separate queue only makes sense when we have multi-pipeline. Also, mentioning the "main" pipeline may be confusing to users until multi-pipeline is introduced. I'd suggest massaging this a bit.

Contributor Author:

This PR will get merged into master, too, which we use to build 6.0.0-alpha2, so let's keep the description. I can massage it for 5.5 after we've merged the PR.
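
As an illustration of the per-pipeline layout under discussion (directory and file names here are hypothetical; "main" stands in for a pipeline id):

path.data/dead_letter_queue/
  main/        # one subdirectory per pipeline
    1.log      # DLQ segment files
    2.log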

==== Processing Events in the Dead Letter Queue

When you are ready to process events in the dead letter queue, you create a
pipeline that uses the `dead_letter_queue` input plugin to read from the dead
Contributor:

We should link to the DLQ input docs here.

Contributor Author:

Will do once the DLQ input docs are building.

pipeline without interruption, Logstash provides the following data resiliency
features.

* <<persistent-queues>> protect against data loss by storing events in a message
Contributor:

Maybe call this an "internal queue".

acchen97 (Contributor):
Left some comments. @suyograo, one thing that's missing here is how to clean up/purge data from the DLQ. We should address that even if it's a manual process today.

suyograo (Contributor):
@acchen97 we should punt on the manual cleanup instructions for now. I expect the data volume for the DLQ is gonna be low, so this should really not be a big deal. Also, we're already adding a retention feature for 5.6, so I'm a little wary of adding manual delete-these-files instructions. Can we see how it goes and re-evaluate?

dedemorton (Contributor Author):
@suyograo This is ready for a final review. I need to retest the config in the example, but I can update in a separate request, if necessary.

suyograo (Contributor) left a comment:

A couple of minor comments, but LGTM from my side.

information about the plugin that wrote the event, and the timestamp for when
the event entered the dead letter queue.

//REVIEWERS: Decided that it might be more useful to show the event in the section where we show how to process it.
Contributor:

Makes sense


output {
  stdout {
    codec => rubydebug
Contributor:

Can we use `stdout { codec => rubydebug { metadata => true } }` in this example? If you add `metadata => true`, LS will emit the metadata information, which is useful in the DLQ case.
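
That suggestion, expanded into a full output section (taken directly from the comment above):

output {
  stdout {
    codec => rubydebug { metadata => true }  # also prints @metadata, including the dead_letter_queue details
  }
}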

elasticsearch-bot:
DeDe Morton merged this into the following branches!

Branch: master
Commits: 2c4194f, e092f6b, ef8ec6a

elasticsearch-bot pushed a commit that referenced this pull request Jun 26, 2017