feat(codecs): Implement chunked GELF decoding #20859
Conversation
Cargo.toml
Outdated
@@ -139,6 +139,7 @@
serde_json = { version = "1.0.120", default-features = false, features = ["raw_value"] }
serde = { version = "1.0.204", default-features = false, features = ["alloc", "derive", "rc"] }
toml = { version = "0.8.14", default-features = false, features = ["display", "parse"] }
vrl = { version = "0.16.1", features = ["arbitrary", "cli", "test", "test_framework"] }
tokio = { version = "1.38.0", default-features = false, features = ["full"] }
Needed to use tokio inside the lib/codecs crate in order to implement GELF decoding timeouts with tokio tasks, so I added the dependency as a workspace one. If there is any problem with this, we can find another solution.
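For reference, the workspace wiring looks roughly like this (the file layout and the inherited feature flags are assumptions based on the diff above, not the exact PR contents):

```toml
# Root Cargo.toml: declare tokio once as a workspace dependency.
[workspace.dependencies]
tokio = { version = "1.38.0", default-features = false, features = ["full"] }

# lib/codecs/Cargo.toml: inherit the workspace version instead of pinning
# a separate one, so the whole workspace agrees on a single tokio.
[dependencies]
tokio = { workspace = true }
```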
[const { Bytes::new() }; GELF_MAX_TOTAL_CHUNKS as usize];
const DEFAULT_TIMEOUT_MILLIS: u64 = 5000;
// TODO: ask what would be an appropriate default value for this
const DEFAULT_PENDING_MESSAGES_LIMIT: usize = 1000;
I don't know what value is appropriate here. Do you have any recommendation?
This limit was enforced so that we have a memory-bounded decoder.
The maximum UDP packet size is 65,536 bytes, so with this limit I think we have roughly 65 MB of memory for pending-message storage.
However, the framing is agnostic to the transport protocol, so other protocols may not have that per-message size limit, and thus this can be "theoretically unbounded" (for example, when reading raw bytes from a file).
Should we also enforce a per-message limit, such as
pub max_length: Option<usize>,
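The back-of-envelope bound can be sketched with quick arithmetic (constants are taken from the discussion; the worst-case line assumes every pending message has buffered 127 of its 128 possible chunks, which the pending-messages limit alone does not prevent):

```rust
// Illustrative memory-bound arithmetic for the pending-message store.
const MAX_UDP_DATAGRAM: usize = 65_535; // max UDP payload, per the discussion
const GELF_MAX_TOTAL_CHUNKS: usize = 128; // GELF limit on chunks per message
const DEFAULT_PENDING_MESSAGES_LIMIT: usize = 1_000;

fn main() {
    // If every pending message holds a single chunk (one datagram each):
    let single_chunk_bound = MAX_UDP_DATAGRAM * DEFAULT_PENDING_MESSAGES_LIMIT;
    // Worst case without a per-message limit: every pending message has
    // buffered 127 of its 128 chunks while waiting for the last one:
    let worst_case_bound =
        MAX_UDP_DATAGRAM * (GELF_MAX_TOTAL_CHUNKS - 1) * DEFAULT_PENDING_MESSAGES_LIMIT;
    println!("single-chunk bound: ~{} MB", single_chunk_bound / 1_000_000);
    println!("worst-case bound:   ~{} GB", worst_case_bound / 1_000_000_000);
}
```

This is the gap the per-message limit discussed below would close.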
I was hoping the reference implementation would serve as prior art here, but looking at https://github.com/Graylog2/graylog2-server/blob/3c7f9df250f7d58d99e9c554d9307dc1eec9fdac/graylog2-server/src/main/java/org/graylog2/inputs/codecs/GelfChunkAggregator.java, it seems they have no pending-message limit, just the 5-second timeout as you have. I think I'd suggest having this as an option for people who do want to bound the memory, but defaulting to unlimited to match the Graylog server behavior.
Addressed this in 85edb00. Feel free to resolve this thread if the change is what you expected
pub timeout_millis: u64,

/// The maximum number of pending incomplete messages. If this limit is reached, the decoder will start
/// dropping chunks of new messages. This limit ensures the memory usage of the decoder's state is bounded.
The bound is on the total number of messages, but there is no per-message memory usage limit. We could theoretically have a single 100 GB message and it would not be limited by this setting.
As stated before, should we include a per-message limit?
I like the idea of having a configurable bound on the number of pending messages.
The chunked encoding is only used for UDP, yes? Shouldn't that provide a de facto bound on size? How can we have a 100 GB message?
The chunked encoding is only used for UDP, yes?

Yes, it is intended to be used only for UDP, and therefore it would be limited by the UDP packet limit of 65 KB.
Nevertheless, as chunked_gelf is a framing method, nothing blocks users from using it with other types of sources, for example with file sources, by explicitly setting framing.method = "chunked_gelf" in the config. Although it really does not make sense to use this framing method outside of UDP socket sources, and no one would use it that way in a real environment... so maybe it is OK to leave this as it is.
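For illustration, opting a source into this framing explicitly might look like the following (option names follow this discussion and may differ in the released config schema):

```yaml
# Hypothetical Vector source using the new framing method explicitly.
sources:
  gelf_in:
    type: socket
    mode: udp
    address: 0.0.0.0:12201
    framing:
      method: chunked_gelf
```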
We could add a max_length option. This would be consistent with the other framers: https://vector.dev/docs/reference/configuration/sources/socket/#framing.newline_delimited.max_length
In chunked_gelf's case, I think we'd want to limit the length of the accumulated chunks in addition to each individual chunk.
Given the nature of GELF messages, as they are just JSON, I don't think it would be fine to just truncate the input: the JSON would most likely be broken and GELF deserialization would fail in nearly all cases after truncating.
Should we instead discard the whole message (including previously stored chunks) if an individual chunk reaches its defined limit or the accumulated-chunks limit is reached? I don't know if it's worth doing this, but I'm open to implementing it if you think it would be worthwhile.
Ah, yes, I think we'd want to discard messages that exceed the limit (in the future we can route them to a "dead letter" output). This is consistent with the other framers (see this example from vector/lib/codecs/src/decoding/framing/character_delimited.rs, lines 123 to 128 at 17466c6):

warn!(
    message = "Discarding frame larger than max_length.",
    buf_len = buf.len(),
    max_length = self.max_length,
    internal_log_rate_limit = true
);
I think it'd be worth it to remove a DOS risk if it isn't too much effort to add.
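A minimal sketch of that discard behavior, assuming a `HashMap`-backed chunk buffer (the function name, state shape, and error type here are illustrative, not the PR's actual code):

```rust
use std::collections::HashMap;

#[derive(Debug, PartialEq)]
enum ChunkError {
    MaxLengthExceeded { message_id: u64 },
}

/// Buffer a chunk for `message_id`, discarding ALL buffered chunks of that
/// message if the accumulated payload would exceed `max_length`. Truncating
/// instead would corrupt the JSON payload, so the whole message is dropped.
fn push_chunk(
    state: &mut HashMap<u64, Vec<u8>>,
    message_id: u64,
    chunk: &[u8],
    max_length: usize,
) -> Result<(), ChunkError> {
    let buffered_len = state.get(&message_id).map_or(0, |b| b.len());
    if buffered_len + chunk.len() > max_length {
        state.remove(&message_id); // free the partial message's memory
        return Err(ChunkError::MaxLengthExceeded { message_id });
    }
    state.entry(message_id).or_default().extend_from_slice(chunk);
    Ok(())
}

fn main() {
    let mut state = HashMap::new();
    push_chunk(&mut state, 7, &[0u8; 10], 16).unwrap();
    let err = push_chunk(&mut state, 7, &[0u8; 10], 16).unwrap_err();
    assert_eq!(err, ChunkError::MaxLengthExceeded { message_id: 7 });
    assert!(!state.contains_key(&7)); // buffered chunks were discarded
    println!("ok");
}
```

Removing the entry on error is what bounds memory: a peer repeatedly sending oversized messages never accumulates state.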
/// This limit takes into account only the message's payload; the GELF header bytes are excluded from the calculation.
/// The message's payload is the concatenation of all the chunks' payloads.
#[serde(skip_serializing_if = "vector_core::serde::is_default")]
pub max_message_length: Option<usize>,
Same as above: should we use None by default, or 65,535 * 128 bytes, which is the max UDP datagram size times the maximum number of chunks?
Both of those defaults would work (theoretically) the same for the UDP socket source.
Let's leave the default as None to match the other framers. Could we call this just max_length, though, also to match the other framers?
done in 8fbfeeb
Nice work on this! I left a few more comments / questions.
chunk_length: usize,
max_chunk_length: usize,
},
#[snafu(display("Message with id {message_id} has exceeded the maximum message length and will be dropped: got {message_length} bytes and the max message length is {max_message_length} bytes. Discarding all buffered chunks of that message"))]
That's a good question. This has me wondering what the effect of returning an error from decode() is. Do you know if it kills the connection? Or does it just emit the error and continue decoding? If it is the latter, then I think returning an error here is appropriate (and we should probably update the other framers to do the same). If it kills the connection, we may want to avoid returning an error unless it is unrecoverable, and simply emit warnings 🤔
Thanks @jorgehermo9 ! I think this looks good to me now. Thank you for all of your work on this! It was a big one.
Are there any other changes you want to make before we merge this?
// This limitation is due to the fact that the GELF format does not specify the length of the
// message, so we have to read all the bytes from the message (datagram)
bytes_decoder: BytesDecoder,
state: Arc<Mutex<HashMap<u64, MessageState>>>,
In UDP there is no concept of multiple streams as in TCP
That's true, but we did discuss in #8518 the idea of creating a separate "stream" by grouping together packets from the same source IP / port.
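If that per-peer grouping were ever added, the decoder's state key could simply be widened to include the source address; a hypothetical sketch (the `ChunkedGelfState` shape is illustrative, the PR itself keys by message id alone):

```rust
use std::collections::HashMap;
use std::net::SocketAddr;

type MessageId = u64;

// One pending-chunk buffer per (source address, message id) pair, so two
// peers that happen to pick the same message id cannot interleave chunks.
struct ChunkedGelfState {
    pending: HashMap<(SocketAddr, MessageId), Vec<u8>>,
}

fn main() {
    let mut state = ChunkedGelfState { pending: HashMap::new() };
    let peer: SocketAddr = "10.0.0.1:12201".parse().unwrap();
    state.pending.insert((peer, 42), b"chunk payload".to_vec());
    assert_eq!(state.pending.len(), 1);
    println!("ok");
}
```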
Definitely no need to do this now, but if you are ever interested, https://github.com/DataDog/lading is the tool we use to generate load for benchmarks. We could add the option to generate GELF data there in order to add a benchmark experiment to Vector.
Thank you very much for your reviews @jszwedko! It has been a hard one, and I appreciate your work on this a lot. Nothing is left on this PR from my side. I'll address the decompression part in another PR 😀
@jorgehermo9 just FYI it looks like the tests fail to compile on Windows here 🤔
Hi! It is so weird that this didn't happen before this PR. In the code, I can see that it is only imported on Linux. It is surprising that it worked before; I did not touch any of those files 😮
I think 38a2af0 fixes it. Could you retrigger that action please?
Thanks! I triggered it. I'll trigger the component features check, too, since I think that
/ci-run-component-features
Mmm, I forgot this doesn't work on forks. I'll run it locally.
Confirmed that this passes. Marking for merge.
Closes #20769. This PR is kind of large (900 lines of code, 600 of which are from generated docs); if you prefer to chat via Discord (and in order to be more agile while merging this), I'm in the Vector community server, username @jorgehermo9.
Implementation is based on Graylog's documentation and Graylog's go-gelf library.
In my local environment some tests are failing. Could you please trigger the CI so I can see whether it is a problem with my environment? If not, I can proceed to fix them.