@cosmo0920 cosmo0920 commented Jul 28, 2025

This PR adds an interface for a configurable buffer limit on multiline concatenation.
It also makes multiline concatenation more robust: truncation is reported to callers instead of being silently ignored.


Enter [N/A] in the box if an item is not applicable to your change.

Testing
Before we can approve your change, please submit the following in a comment:

  • Example configuration file for the change
  • Debug log output from testing the change
  • Attached Valgrind output that shows no leaks or memory corruption was found

If this is a change to packaging of containers or native binaries then please confirm it works for all targets.

  • Run local packaging test showing all targets (including any new ones) build.
  • Set ok-package-test label to test for all targets (requires maintainer to do).

Documentation

  • Documentation required for this feature

Backporting

  • Backport to latest stable release.

Fluent Bit is licensed under Apache 2.0, by submitting this pull request I understand that this code will be released under the terms of that license.

Summary by CodeRabbit

  • New Features

    • Configurable multiline buffer size limit for concatenated log messages.
    • Support for binary-size strings (KiB/MiB/GiB) when specifying sizes.
    • Per-input and per-filter multiline-truncated metrics; truncated events are annotated and emit warnings.
  • Bug Fixes

    • Multiline append/flush logic now propagates truncation status to prevent silent data loss.
  • Tests

    • Added tests for buffer-limit truncation and binary-size parsing.


coderabbitai bot commented Aug 1, 2025

Walkthrough

Adds a configurable multiline buffer limit and binary-size parser; propagates a FLB_MULTILINE_TRUNCATED status through multiline processing, enforces per-group buffer limits on append, records truncation via metrics/logs (filter and tail), extends multiline parser creation with a params API, and adds tests for truncation and binary-size parsing.

Changes

| Cohort / File(s) | Change Summary |
| --- | --- |
| Config field & default: include/fluent-bit/flb_config.h, src/flb_config.c | Added multiline_buffer_limit config macro and char *multiline_buffer_limit field; registered config entry and initialized it to default string. |
| Multiline core types & defaults: include/fluent-bit/multiline/flb_ml.h | Added buffer-limit default macros, return codes (FLB_MULTILINE_OK, FLB_MULTILINE_PROCESSED, FLB_MULTILINE_TRUNCATED), buffer_limit in flb_ml, and truncated/stream fields in flb_ml_stream_group. |
| Group API & impl: include/fluent-bit/multiline/flb_ml_group.h, src/multiline/flb_ml_group.c | Declared and implemented flb_ml_group_cat() to append with buffer-limit enforcement, possible truncation, and return of OK/TRUNCATED status. |
| Multiline processing & rules: src/multiline/flb_ml.c, src/multiline/flb_ml_rule.c, src/multiline/flb_ml_stream.c | Propagated truncation status through append/processing functions; attempt text parsers without early failure; replaced direct buffer writes with flb_ml_group_cat(); set/clear group truncated flags; annotate flushed events with "multiline_truncated": true; initialize ml->buffer_limit from config. |
| Multiline parser API (params): include/fluent-bit/multiline/flb_ml_parser.h, src/multiline/flb_ml_parser.c | Added flb_ml_parser_params, flb_ml_parser_params_default(), and flb_ml_parser_create_params(); legacy create now wraps to new params API. |
| Filter plugin metrics: plugins/filter_multiline/ml.h, plugins/filter_multiline/ml.c | Added truncated metric ID/counter (FLB_MULTILINE_METRIC_TRUNCATED, cmt_truncated), create/register metric and increment on truncation alongside existing emitted metric handling. |
| Tail plugin metrics & instrumentation: plugins/in_tail/tail_config.h, plugins/in_tail/tail_config.c, plugins/in_tail/tail_file.c | Added per-tail truncated metric constant/counter (FLB_TAIL_METRIC_M_TRUNCATED, cmt_multiline_truncated), register legacy metric, log warning and increment metrics on truncation in process_content. |
| Size parsing utility & tests: include/fluent-bit/flb_utils.h, src/flb_utils.c, tests/internal/utils.c | Added flb_utils_size_to_binary_bytes() (KiB/MiB/GiB parsing with 1024-based multipliers and overflow guards) and unit tests validating binary-size parsing. |
| Tests - multiline truncation: tests/internal/multiline.c | Added buffer_limit_truncation test exercising truncation, adjusted flush callback guard, and updated expected outputs. |
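
To make the "1024-based multipliers and overflow guards" note in the size-parsing row concrete, here is a minimal sketch of that kind of parser. It is not the code from this PR: `parse_binary_size`, its return convention, and the exact set of accepted spellings are assumptions chosen for illustration, and the real flb_utils_size_to_binary_bytes() may differ in signature and behavior.

```c
#include <ctype.h>
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical helper (not the Fluent Bit implementation): parse strings
 * such as "80", "4KiB", "2MiB" or "1GiB" into a byte count using 1024-based
 * multipliers. Returns 0 on success, -1 on malformed input or overflow.
 */
static int parse_binary_size(const char *str, uint64_t *out)
{
    char *end = NULL;
    unsigned long long value;
    uint64_t mult = 1;

    /* Require a leading digit so signs and leading spaces are rejected */
    if (str == NULL || out == NULL || !isdigit((unsigned char) str[0])) {
        return -1;
    }

    errno = 0;
    value = strtoull(str, &end, 10);
    if (errno == ERANGE) {
        return -1;
    }

    /* Tolerate spaces between the number and the unit, e.g. "2 MiB" */
    while (*end == ' ') {
        end++;
    }

    if (*end == '\0') {
        mult = 1;                               /* plain byte count */
    }
    else if (strcmp(end, "KiB") == 0) {
        mult = 1024ULL;
    }
    else if (strcmp(end, "MiB") == 0) {
        mult = 1024ULL * 1024ULL;
    }
    else if (strcmp(end, "GiB") == 0) {
        mult = 1024ULL * 1024ULL * 1024ULL;
    }
    else {
        return -1;                              /* unknown unit suffix */
    }

    /* Overflow guard: value * mult must fit in 64 bits */
    if (value > UINT64_MAX / mult) {
        return -1;
    }

    *out = (uint64_t) value * mult;
    return 0;
}
```

Under these assumptions, `parse_binary_size("2MiB", &v)` stores 2097152 in `v`, while an unknown suffix or a product that does not fit in 64 bits is rejected rather than wrapped.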

Sequence Diagram(s)

sequenceDiagram
    participant Config
    participant Tail as TailInput
    participant ML as MultilineCore
    participant Group as StreamGroup
    participant Filter
    participant Metrics

    Config->>ML: init buffer_limit (string -> bytes)
    Tail->>ML: flb_ml_append_text/append_object(data)
    ML->>Group: flb_ml_group_cat(data,len)
    alt appended fully
        Group-->>ML: FLB_MULTILINE_OK
        ML->>Filter: emit/process event
        Filter->>Metrics: inc emitted metric
    else truncated or partially appended
        Group-->>ML: FLB_MULTILINE_TRUNCATED
        ML->>Tail: log warning
        ML->>Metrics: inc truncated metric
        ML->>Group: mark truncated flag
    end
    ML->>Group: flush -> include "multiline_truncated": true if set
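
The append branch of the diagram can be sketched as a bounded concatenation. This is an illustration only: `struct demo_group`, `demo_group_cat`, and the `DEMO_ML_*` constants are hypothetical stand-ins, and the choice to keep the bytes that still fit (rather than drop the whole chunk) is an assumption based on the "truncated or partially appended" wording, not on the PR's source.

```c
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

#define DEMO_ML_OK        0
#define DEMO_ML_TRUNCATED 2

/* Hypothetical stand-in for a stream group's concatenation buffer */
struct demo_group {
    char *buf;       /* NUL-terminated accumulated content */
    size_t len;      /* bytes currently stored */
    size_t limit;    /* maximum bytes to keep; 0 means unlimited */
    int truncated;   /* set once any append had to drop data */
};

/*
 * Append up to `len` bytes while honoring the limit. Returns DEMO_ML_OK when
 * everything fit, DEMO_ML_TRUNCATED when data was dropped, -1 on allocation
 * failure.
 */
static int demo_group_cat(struct demo_group *g, const char *data, size_t len)
{
    size_t room;
    size_t copy = len;
    int status = DEMO_ML_OK;
    char *tmp;

    if (g->limit > 0) {
        room = (g->len < g->limit) ? g->limit - g->len : 0;
        if (copy > room) {
            copy = room;                 /* keep only what still fits */
            status = DEMO_ML_TRUNCATED;
            g->truncated = 1;
        }
    }

    if (copy > 0) {
        tmp = realloc(g->buf, g->len + copy + 1);
        if (tmp == NULL) {
            return -1;
        }
        memcpy(tmp + g->len, data, copy);
        g->len += copy;
        tmp[g->len] = '\0';
        g->buf = tmp;
    }

    return status;
}
```

A caller following the diagram would check the return value after each append, log a warning and bump the truncated counter on `DEMO_ML_TRUNCATED`, and consult `truncated` at flush time to decide whether to annotate the record.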

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Suggested reviewers

  • leonardo-albertovich
  • edsiper
  • koleini
  • fujimotos

Poem

I nibble bytes beneath the moon,
I guard the buffer, trim by tune.
When lines run long and edges fray,
I tag and count what went astray.
Hooray — tests hop in to play. 🐇


/* Return codes */
#define FLB_MULTILINE_OK 0
#define FLB_MULTILINE_PROCESSED 1 /* Reserved */
#define FLB_MULTILINE_TRUNCATED 2
Contributor Author

FLB_MULTILINE_TRUNCATED needs to be status code 2 because status code 1 would collide with FLB_TRUE.
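
A small sketch of the collision described in this comment, assuming FLB_TRUE evaluates to 1 as the comment implies. The `DEMO_*` macros and `demo_classify` are local stand-ins written for this example, not definitions from the Fluent Bit headers.

```c
#include <string.h>

#define DEMO_FALSE 0
#define DEMO_TRUE  1          /* assumed numeric value of FLB_TRUE */

#define DEMO_ML_OK        0
#define DEMO_ML_PROCESSED 1   /* reserved; same number as DEMO_TRUE */
#define DEMO_ML_TRUNCATED 2   /* cannot be mistaken for a boolean result */

/*
 * Classify a return value the way a caller would if the same int carried
 * both boolean results and multiline status codes.
 */
static const char *demo_classify(int ret)
{
    if (ret == DEMO_ML_TRUNCATED) {
        return "truncated";
    }
    if (ret == DEMO_TRUE) {
        return "true-or-processed";   /* ambiguous by value alone */
    }
    return "ok-or-false";
}
```

With the truncation status at 2, existing callers that treat 1 as "true" keep working, and new callers can test for truncation without ambiguity.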


@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 2

♻️ Duplicate comments (1)
src/multiline/flb_ml_parser.c (1)

124-128: Deferred registry linkage is correct.

Addresses earlier feedback about adding to the list only after successful init.
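
The pattern being praised here, linking an object into a shared list only once it is fully initialized, can be sketched as follows. The registry, struct, and function names are hypothetical and much simpler than the parser code under review. The point is only the ordering: every failure path returns before the list is touched, so there is never a half-built entry to unlink.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical parser object and registry, for illustration only */
struct demo_parser {
    char *name;
    struct demo_parser *next;
};

struct demo_registry {
    struct demo_parser *head;
    size_t count;
};

static struct demo_parser *demo_parser_create(struct demo_registry *reg,
                                              const char *name)
{
    struct demo_parser *p;
    size_t n;

    p = calloc(1, sizeof(*p));
    if (p == NULL) {
        return NULL;
    }

    /* Initialization step that can fail: validate and copy the name */
    if (name == NULL || name[0] == '\0') {
        free(p);
        return NULL;    /* nothing to unlink: p was never registered */
    }
    n = strlen(name) + 1;
    p->name = malloc(n);
    if (p->name == NULL) {
        free(p);
        return NULL;
    }
    memcpy(p->name, name, n);

    /* Only a fully initialized object is linked into the registry */
    p->next = reg->head;
    reg->head = p;
    reg->count++;

    return p;
}
```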

🧹 Nitpick comments (2)
tests/internal/multiline.c (1)

114-115: Updated container_mix expectations — add a brief note for future readers.

"bbccdd-out\n" spans stdout-only concatenation across multiple records; a short comment here will avoid confusion about why dd-out is appended to the earlier stdout chunk while stderr pieces are separated.

src/multiline/flb_ml_parser.c (1)

31-45: Sane defaults helper LGTM.

Small nit: consider making the name member const char * in the params struct to avoid the cast.
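
As a rough illustration of the nit, a params struct whose string members are `const char *` accepts string literals without any cast. The struct below is a hypothetical stand-in: the field names and the default flush value are assumptions for this example and are not taken from flb_ml_parser_params.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical params struct; fields are illustrative only */
struct demo_ml_params {
    const char *name;          /* const: literals can be assigned directly */
    const char *key_content;   /* optional key holding the log text */
    int flush_ms;              /* flush timeout in milliseconds */
    int negate;                /* invert rule matching when non-zero */
};

/* Return a params value with conservative defaults filled in */
static struct demo_ml_params demo_ml_params_default(const char *name)
{
    struct demo_ml_params p;

    p.name = name;
    p.key_content = NULL;
    p.flush_ms = 4000;         /* arbitrary default chosen for this sketch */
    p.negate = 0;

    return p;
}
```

A caller would take the defaults and override only what it needs, for example `params.key_content = "log";`, which compiles cleanly because both sides are `const char *`.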

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

💡 Knowledge Base configuration:

  • MCP integration is disabled by default for public repositories
  • Jira integration is disabled by default for public repositories
  • Linear integration is disabled by default for public repositories

You can enable these sources in your CodeRabbit configuration.

📥 Commits

Reviewing files that changed from the base of the PR and between 3384321 and ecf141d.

📒 Files selected for processing (2)
  • src/multiline/flb_ml_parser.c (3 hunks)
  • tests/internal/multiline.c (4 hunks)
🧰 Additional context used
🧬 Code graph analysis (2)
tests/internal/multiline.c (6)
src/flb_config.c (2)
  • flb_config_init (216-421)
  • flb_config_exit (423-594)
src/flb_parser.c (1)
  • flb_parser_get (1022-1042)
src/multiline/flb_ml.c (3)
  • flb_ml_create (868-920)
  • flb_ml_append_text (664-754)
  • flb_ml_destroy (981-1006)
src/multiline/flb_ml_parser.c (4)
  • flb_ml_parser_params_default (32-44)
  • flb_ml_parser_create_params (47-128)
  • flb_ml_parser_init (130-140)
  • flb_ml_parser_instance_create (260-311)
src/multiline/flb_ml_rule.c (1)
  • flb_ml_rule_create (48-115)
src/multiline/flb_ml_stream.c (1)
  • flb_ml_stream_create (223-276)
src/multiline/flb_ml_parser.c (2)
include/fluent-bit/flb_mem.h (2)
  • flb_calloc (84-96)
  • flb_free (126-128)
src/flb_sds.c (2)
  • flb_sds_create (78-90)
  • flb_sds_destroy (389-399)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (29)
  • GitHub Check: pr-windows-build / call-build-windows-package (Windows 64bit, x64, x64-windows-static, 3.31.6)
  • GitHub Check: pr-windows-build / call-build-windows-package (Windows 32bit, x86, x86-windows-static, 3.31.6)
  • GitHub Check: pr-windows-build / call-build-windows-package (Windows 64bit (Arm64), amd64_arm64, -DCMAKE_SYSTEM_NAME=Windows -DCMA...
  • GitHub Check: run-ubuntu-unit-tests (-DFLB_SANITIZE_MEMORY=On, 3.31.6, clang, clang++)
  • GitHub Check: run-ubuntu-unit-tests (-DFLB_SIMD=Off, 3.31.6, gcc, g++)
  • GitHub Check: run-ubuntu-unit-tests (-DFLB_SIMD=Off, 3.31.6, clang, clang++)
  • GitHub Check: run-ubuntu-unit-tests (-DFLB_SIMD=On, 3.31.6, gcc, g++)
  • GitHub Check: run-ubuntu-unit-tests (-DFLB_ARROW=On, 3.31.6, gcc, g++)
  • GitHub Check: run-ubuntu-unit-tests (-DFLB_SANITIZE_THREAD=On, 3.31.6, clang, clang++)
  • GitHub Check: run-ubuntu-unit-tests (-DFLB_SIMD=On, 3.31.6, clang, clang++)
  • GitHub Check: run-ubuntu-unit-tests (-DFLB_SANITIZE_THREAD=On, 3.31.6, gcc, g++)
  • GitHub Check: run-ubuntu-unit-tests (-DFLB_SANITIZE_MEMORY=On, 3.31.6, gcc, g++)
  • GitHub Check: run-ubuntu-unit-tests (-DFLB_SMALL=On, 3.31.6, gcc, g++)
  • GitHub Check: run-ubuntu-unit-tests (-DSANITIZE_ADDRESS=On, 3.31.6, clang, clang++)
  • GitHub Check: run-ubuntu-unit-tests (-DSANITIZE_UNDEFINED=On, 3.31.6, clang, clang++)
  • GitHub Check: run-ubuntu-unit-tests (-DFLB_SMALL=On, 3.31.6, clang, clang++)
  • GitHub Check: run-ubuntu-unit-tests (-DFLB_JEMALLOC=Off, 3.31.6, gcc, g++)
  • GitHub Check: run-ubuntu-unit-tests (-DSANITIZE_UNDEFINED=On, 3.31.6, gcc, g++)
  • GitHub Check: run-ubuntu-unit-tests (-DFLB_JEMALLOC=On, 3.31.6, clang, clang++)
  • GitHub Check: run-ubuntu-unit-tests (-DFLB_JEMALLOC=Off, 3.31.6, clang, clang++)
  • GitHub Check: run-ubuntu-unit-tests (-DFLB_COVERAGE=On, 3.31.6, gcc, g++)
  • GitHub Check: run-ubuntu-unit-tests (-DSANITIZE_ADDRESS=On, 3.31.6, gcc, g++)
  • GitHub Check: run-ubuntu-unit-tests (-DFLB_JEMALLOC=On, 3.31.6, gcc, g++)
  • GitHub Check: pr-compile-system-libs (-DFLB_PREFER_SYSTEM_LIBS=On, 3.31.6, gcc, g++, ubuntu-22.04, clang-12)
  • GitHub Check: pr-compile-system-libs (-DFLB_PREFER_SYSTEM_LIBS=On, 3.31.6, gcc, g++, ubuntu-24.04, clang-14)
  • GitHub Check: pr-compile-centos-7
  • GitHub Check: pr-compile-system-libs (-DFLB_PREFER_SYSTEM_LIBS=On, 3.31.6, clang, clang++, ubuntu-24.04, clang-14)
  • GitHub Check: pr-compile-system-libs (-DFLB_PREFER_SYSTEM_LIBS=On, 3.31.6, clang, clang++, ubuntu-22.04, clang-12)
  • GitHub Check: PR - fuzzing test
🔇 Additional comments (9)
tests/internal/multiline.c (7)

395-398: Defensive early-return in flush callback is correct.

Prevents NULL deref when tests pass no expected-result context.


1481-1484: OK to set buffer limit via string literal.

This exercises the binary-size parser path; 80 bytes is a good minimal boundary.


1493-1503: Params-based initializer usage looks good.

Using key_content="log" with docker parser context matches the JSON-extraction intent.


1517-1519: Passing NULL to flush callback now safe after earlier guard.

Matches the intent to test return codes only.


1523-1531: Return-code assertions cover both OK and TRUNCATED paths.

Nice, this verifies the new truncation code path end-to-end.


1548-1548: Good addition to TEST_LIST.

Keeps the new path exercised in CI.


1533-1535: Verify that flb_config_exit() invokes flb_ml_exit(config) to clean up the parser registry

Run:

rg -nP 'flb_ml_exit' -C3 src/flb_config.c
src/multiline/flb_ml_parser.c (2)

23-30: Includes are appropriate for the new params API.


198-223: Thin wrapper to params keeps legacy API intact.

Straightforward pass-through without behavior changes.


@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 0

🧹 Nitpick comments (2)
tests/internal/multiline.c (2)

395-398: Move the NULL-guard to the top of flush_callback to avoid noisy prints

Short-circuit before printing/decoding when tests pass NULL to focus only on status. This reduces log noise and cycles.

Apply this diff within the changed hunk to remove the late guard:

-    if (!res) {
-        return 0;
-    }

And add the guard right after the local is set (outside the changed hunk), e.g.:

static int flush_callback(struct flb_ml_parser *parser,
                          struct flb_ml_stream *mst,
                          void *data, char *buf_data, size_t buf_size)
{
    struct expected_result *res = data;
    if (!res) {
        return 0;
    }
    /* ... existing prints and validation ... */
}

1463-1532: Solid truncation status test; tighten a couple of details

  • Remove unused variable to keep warnings clean.
  • Optional: either adjust the comment to reflect that this test concatenates raw text (JSON strings), or switch to object ingestion if you want to assert that the limit applies specifically to key_content="log".
  • Optional: assert the resolved buffer limit to catch config parsing regressions.

Minimal diffs:

  1. Drop the unused declaration.
-    struct flb_parser *p;
  2. Keep the comment accurate (if staying with text ingestion):
-    /*
-     * A realistic Docker log where the content of the "log" field will be
-     * concatenated, and that concatenated buffer is what should be truncated.
-     */
+    /*
+     * We append JSON-formatted text lines; the test exercises truncation on the
+     * concatenated text buffer (not parsing the "log" field here).
+     */
  3. Assert the parsed limit (right after flb_ml_create):
     ml = flb_ml_create(config, "limit-test");
     TEST_CHECK(ml != NULL);
+    TEST_CHECK(ml->buffer_limit == 80);

Option B (if you prefer key_content-aware validation): pack line1/line2 into msgpack maps {"log": ..., "stream": ...} and use flb_ml_append_object() instead of flb_ml_append_text(). I can provide a compact patch if you want to go that route.


📥 Commits

Reviewing files that changed from the base of the PR and between 5278995 and 791d9e4.

📒 Files selected for processing (1)
  • tests/internal/multiline.c (4 hunks)
🧰 Additional context used
🧬 Code graph analysis (1)
tests/internal/multiline.c (5)
src/flb_config.c (2)
  • flb_config_init (216-421)
  • flb_config_exit (423-594)
src/multiline/flb_ml.c (3)
  • flb_ml_create (868-920)
  • flb_ml_append_text (664-754)
  • flb_ml_destroy (981-1006)
src/multiline/flb_ml_parser.c (4)
  • flb_ml_parser_params_default (32-44)
  • flb_ml_parser_create_params (47-129)
  • flb_ml_parser_init (131-141)
  • flb_ml_parser_instance_create (261-312)
src/multiline/flb_ml_rule.c (1)
  • flb_ml_rule_create (48-115)
src/multiline/flb_ml_stream.c (1)
  • flb_ml_stream_create (223-276)
🔇 Additional comments (2)
tests/internal/multiline.c (2)

114-115: Updated expectations for interleaved container streams look correct

The combined stdout record now accumulating "bbcc" with the later "dd-out" and the stderr record as "dd-err" aligns with per-stream multiline state surviving interleaved CRI entries. No action needed.
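
The idea of per-stream state surviving interleaved entries can be sketched with one accumulation buffer per stream name. Everything here is hypothetical (names, fixed buffer size, and the input order used in the usage note). It only shows why a stderr piece arriving between two stdout pieces does not break the stdout concatenation.

```c
#include <stddef.h>
#include <string.h>

#define DEMO_BUF 64

/* Hypothetical per-stream accumulation state */
struct demo_stream_state {
    const char *stream;     /* e.g. "stdout" or "stderr" */
    char buf[DEMO_BUF];     /* NUL-terminated concatenated content */
};

/* Find the state that belongs to a given stream name, or NULL */
static struct demo_stream_state *demo_find(struct demo_stream_state *states,
                                           size_t n, const char *stream)
{
    size_t i;

    for (i = 0; i < n; i++) {
        if (strcmp(states[i].stream, stream) == 0) {
            return &states[i];
        }
    }
    return NULL;
}

/* Append a partial entry to its own stream's buffer; -1 if it cannot fit */
static int demo_append(struct demo_stream_state *states, size_t n,
                       const char *stream, const char *piece)
{
    struct demo_stream_state *st = demo_find(states, n, stream);

    if (st == NULL || strlen(st->buf) + strlen(piece) >= DEMO_BUF) {
        return -1;
    }
    strcat(st->buf, piece);
    return 0;
}
```

Feeding stdout "bb", stdout "cc", stderr "dd-err", then stdout "dd-out" into two such states leaves "bbccdd-out" in the stdout buffer and "dd-err" in the stderr buffer, which mirrors the expectation discussed above.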


1545-1545: Test registration LGTM

New test is properly added to TEST_LIST.
