Description
Is your feature request related to a problem? Please describe.
I have configured the Dead Letter Queue (DLQ) following the Fluent Bit documentation, where failed logs are written to the rejected directory after retries are exhausted.
When using Fluent Bit’s DLQ, chunks are retried and eventually rejected at the chunk level, not the record level. If a chunk contains multiple records (for example, 10 logs) and only one record fails (for example due to a parsing or mapping error in OpenSearch), the entire chunk is retried. Once retries are exhausted, the whole chunk is moved to the rejected directory, including records that were actually valid and would have been indexed successfully.
In my case, logs are primarily failing due to parsing/mapping errors, as I am using an OpenSearch index template with strict field definitions. A single record that does not conform to the template causes the entire chunk to fail.
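For context, this is the kind of strict template involved (index pattern and field names are illustrative, not our actual template). With `"dynamic": "strict"`, any record carrying a field not listed in `properties` is rejected with a `strict_dynamic_mapping_exception`, and that single rejection fails the whole chunk:

```json
{
  "index_patterns": ["app-logs-*"],
  "template": {
    "mappings": {
      "dynamic": "strict",
      "properties": {
        "timestamp": { "type": "date" },
        "message":   { "type": "text" }
      }
    }
  }
}
```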
This makes it difficult to identify:
- Which specific record(s) caused the failure
- Why the failure occurred
- Which records in the rejected chunk are actually valid vs invalid
As a result, troubleshooting and recovery from the DLQ become unnecessarily complex and error-prone.
Describe the solution you'd like
Fluent Bit should support record-level identification of failures when using the DLQ, ideally by tagging/labeling failed records and allowing them to be routed separately.
Preferred behavior (retag + re-inject into the pipeline):
Add a config option to retag failed records and re-inject them into the pipeline so they can be routed to a different output.
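A sketch of what such an option could look like in classic-mode configuration. The `Retag_Failed_Records` key is hypothetical and does not exist in Fluent Bit today; `[OUTPUT]`, `Name`, `Match`, and `Path` follow existing conventions:

```
[OUTPUT]
    Name                 opensearch
    Match                app.*
    # Hypothetical option: records rejected by the backend would be
    # re-emitted into the pipeline under this tag, instead of the
    # whole chunk being retried and eventually rejected
    Retag_Failed_Records dlq.app

[OUTPUT]
    Name  file
    Match dlq.*
    Path  /var/log/fluent-bit/rejected-records
```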
If retagging/re-injection is not feasible, then at least include per-record failure metadata in the DLQ output (e.g., in the rejected chunk itself or in a sidecar metadata file), so that the failed record(s) and the reason for each failure can be identified quickly from the rejected directory, instead of manually sifting through chunks that contain both valid and invalid records.
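For example, a sidecar metadata file written next to each rejected chunk could look like this (the format, file name, and fields are purely illustrative):

```json
{
  "chunk": "1-1700000000.123456789.flb",
  "failed_records": [
    {
      "offset": 3,
      "tag": "app.logs",
      "error_type": "mapper_parsing_exception",
      "error_reason": "object mapping for [user] tried to parse field [user] as object"
    }
  ]
}
```

With this, an operator could replay the valid offsets of a rejected chunk and route only the listed failures for inspection or repair.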
Describe alternatives you've considered
As a workaround, we chose to send logs to Kafka as the output instead of directly indexing into OpenSearch. We then implemented custom consumer logic to insert records into OpenSearch and handle failed logs at the record level.
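The record-level handling in that consumer comes down to inspecting the per-item status in the OpenSearch `_bulk` response, which reports success or failure for each document individually. A minimal sketch of that partitioning step (the function name and return shape are ours, not from any library):

```python
def split_bulk_response(bulk_response):
    """Partition an OpenSearch _bulk response into succeeded and failed items.

    Each failed entry keeps the error type/reason so the offending record
    can be routed to a dead-letter topic with its cause attached, while
    valid records in the same batch are indexed normally.
    """
    succeeded, failed = [], []
    for position, item in enumerate(bulk_response.get("items", [])):
        # Each item is keyed by the action name ("index", "create", ...).
        _action, result = next(iter(item.items()))
        if result.get("status", 0) < 300:
            succeeded.append(position)
        else:
            error = result.get("error", {})
            failed.append({
                "position": position,
                "status": result.get("status"),
                "type": error.get("type"),
                "reason": error.get("reason"),
            })
    return succeeded, failed
```

In the consumer, the failed positions map back to the batch of records polled from Kafka: only those records go to a dead-letter topic with their error reason attached, and the rest of the batch is acknowledged as indexed.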