feat(processors.batch): Add batch processor #15869
Conversation
@srebhan the implementation is a little different from what you suggested. I did not see the added benefit of also specifying the batch size, since any overflow would probably just spill into the next batch, which would then also overflow. Please let me know if it should still be added.
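The idea discussed here, assigning each metric a batch number by cycling a counter, can be sketched as below. This is an illustrative reimplementation, not the PR's actual code; the type and field names (`batchProcessor`, `batchTag`, `batches`) are assumptions.

```go
package main

import "fmt"

// batchProcessor sketches round-robin batch tagging: each metric gets a
// batch number tag, cycling from 0 to batches-1. Names are illustrative.
type batchProcessor struct {
	batchTag string // tag key to set, e.g. "batch"
	batches  uint64 // number of batches to cycle through
	count    uint64 // running metric counter
}

// apply sets the batch tag on one metric's tag map and advances the counter.
func (p *batchProcessor) apply(tags map[string]string) {
	tags[p.batchTag] = fmt.Sprintf("%d", p.count%p.batches)
	p.count++
}

func main() {
	p := &batchProcessor{batchTag: "batch", batches: 3}
	for i := 0; i < 5; i++ {
		tags := map[string]string{}
		p.apply(tags)
		fmt.Println(tags["batch"]) // prints 0, 1, 2, 0, 1
	}
}
```

Note that a fixed batch *size* is absent by design, matching the comment above: with a pure modulo counter there is nothing to overflow.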
Thanks @LarsStegman for the contribution! Just two small comments from my side. Furthermore, should we also add a force_rebatch option, so that the batch tag is only overwritten if it does not already exist? I'm asking because with the current default, Telegraf will run each processor twice, once before and once after aggregators, if any.
Co-authored-by: Sven Rebhan <[email protected]>
Hmmm, interesting. The results after the second pass will indeed be different, because the processor will already have run a pass and the count will have increased. I think it is indeed better to add that feature. It will be more predictable for users.
I made rebatching the default, because it means less computational load. By default the processor will now not check the existing tags.
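The two-pass behavior being discussed can be sketched as follows: with a "skip existing" option enabled, a second processor pass leaves already-tagged metrics untouched, so the counter stays predictable. This is an illustrative sketch; the option and field names are assumptions, not the plugin's actual ones.

```go
package main

import "fmt"

// batcher sketches the "skip metrics that already carry the batch tag"
// behavior discussed above. Field names are illustrative assumptions.
type batcher struct {
	tag          string
	batches      uint64
	skipExisting bool // when true, a later pass leaves existing tags untouched
	count        uint64
}

func (b *batcher) tagMetric(tags map[string]string) {
	if b.skipExisting {
		if _, ok := tags[b.tag]; ok {
			return // already batched on a previous pass
		}
	}
	tags[b.tag] = fmt.Sprintf("%d", b.count%b.batches)
	b.count++
}

func main() {
	b := &batcher{tag: "batch", batches: 2, skipExisting: true}
	tags := map[string]string{}
	b.tagMetric(tags) // first pass assigns batch "0"
	fmt.Println(tags["batch"])
	b.tagMetric(tags) // second pass keeps the original value
	fmt.Println(tags["batch"])
}
```

With `skipExisting` disabled (the default described above), the second pass would re-tag the metric and advance the counter again.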
Two more comments. Regarding the flag, I'm fine either way but slightly tend toward your approach...
This comment was marked as outdated.
@LarsStegman awesome! Maybe just avoid abbreviations in config options? How about naming this just batches?
@srebhan looks like the test runner timed out or something
Thanks @LarsStegman!
Download PR build artifacts for linux_amd64.tar.gz, darwin_arm64.tar.gz, and windows_amd64.zip. 🥳 This pull request decreases the Telegraf binary size by 8.01 % for linux amd64 (new size: 239.8 MB, nightly size: 260.6 MB).
But this doesn't help to parallelize processing. Also: does this help as efficiently as possible? Because with this, all the metrics still go down the same pipeline (and by this: go-channel) and don't split up into multiple batch-processing pipelines. Every metric still has to be sorted out by a metric-/tagpass deciding "no, this metric isn't batch-tagged for me" on every processor defined. I have to say: I don't see this resolving any one of the claimed feature requests. There shouldn't be batching-tags but
I mean, if I batch into, let's say, 2, I have to duplicate my complex config...
@knollet this processor was not meant to increase the efficiency of the processor pipeline, but to increase the output capacity at the end of the pipeline. See #15621 (comment) and down.
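The output-capacity use case described above could look roughly like this in a Telegraf config: the processor tags each metric with a batch number, and two instances of the same output each pick up one batch via tagpass. This is a hedged sketch; the option names (`batch_tag`, `batches`) are assumed from this conversation, so check the merged plugin README for the actual names.

```toml
# Round-robin metrics into two batches (option names assumed, not verified)
[[processors.batch]]
  batch_tag = "batch"
  batches = 2

# Two instances of the same output, each handling one batch in parallel
[[outputs.influxdb_v2]]
  urls = ["http://localhost:8086"]
  [outputs.influxdb_v2.tagpass]
    batch = ["0"]

[[outputs.influxdb_v2]]
  urls = ["http://localhost:8086"]
  [outputs.influxdb_v2.tagpass]
    batch = ["1"]
```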
Yeah, ok. I don't wanna trash-talk your contribution.
Summary
This new processor can distribute metrics across batches by adding a tag indicating which batch a metric is in. This makes it possible to distribute the load of a high number of metrics across multiple instances of the same output plugin.
Checklist
Related issues
resolves #15621
resolves #11707