feat(common.socket): Allow parallel parsing with a pool of workers #15891
Hey! Thanks for being so active and contributing your findings and fixes! We recently added something similar to the kafka consumer plugin that allows overriding the timestamp provided by parsing.
Option two was discussed a while ago, but we never found the time to implement it. Let me know if you are interested in this and I will go into more detail on what needs to be done.
You're welcome, it helps me too, since I now only need to maintain some proprietary plugins. Maintaining a fork that touches a lot of the internals is very time-consuming. And most importantly, I enjoy contributing. I am also glad that my company allows me to. I'm up for working on it. I wasn't really aware of how the internals of Telegraf worked, since I worked mostly on plugins. This weekend of profiling and optimizing really made things a lot clearer. I think I see how it would be implemented on the Regarding moving parsing for
The idea is to add a general
to all inputs by adding a new config option (and parse it) to the inputs. Then you adapt the metric using the Does this make things more clear?
I think we might accept this if there is a compelling argument for why this is needed and what the benefit will be. I guess you already have those numbers, since you are asking... ;-)
Yes, that makes it clear and is about what I expected. I'll try to work on it tomorrow! I'll also work on the async parsing.
Just to be clear, async parsing should be its own PR! Please also double-check that parser calls are thread-safe! I remember that we had this issue a while back with Avro, so I want to urge you to double-check! :-)
Yes, that was my plan as well :) It is easier to get things merged when the changes are small and isolated. I thought the "contract" requires calls to Parse to be thread-safe, right?
I am working on implementing it, but I am a little uncertain how this would work for Service plugins. Those often don't do anything in I was thinking of maybe adding the same feature to the RunningParser, but even then there is no guarantee this is the correct time. The time when the Parse function is called does not need to be the same as the time the original data comes in. For example, I read the UDP buffer and then send the data to a goroutine to be parsed. The goroutine is not guaranteed to start immediately.
@LarsStegman right, service plugins will need to take care of timestamp overriding themselves...
Thanks @LarsStegman for your contribution! A few comments from my side.
I answered some questions and gave some explanations. I have not yet implemented any changes; I will do that tomorrow.
Please move the time-source part to another PR and keep the parallelization in here!
@LarsStegman one more question, because we had issues in the past... Did you test whether shutting down Telegraf still works reliably with parallel workers?
With the first implementation, which just created new goroutines, it did not. I am not 100% sure about the pool implementation. I don't remember seeing issues, but I will test it tomorrow. I can imagine some things breaking, for example because the channel is already closed when parsing finishes. I should probably add a wait group or something similar. Edit: looking at the code again, there shouldn't be any problems, because I already wait until the pool is done. I will test it tomorrow to be sure, though.
@srebhan yeah, no problems. It stops gracefully.
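The shutdown ordering discussed above (stop accepting work, wait for the pool, only then close the output channel) can be sketched roughly as follows. This is a simplified illustration under stated assumptions, not the implementation from this PR: the `pool` type, `parseFunc`, `newPool`, `submit`, `stop`, and `parseAll` are all hypothetical names, and the upper-casing "parser" is a placeholder for a real, thread-safe parser call.

```go
package main

import (
	"fmt"
	"strings"
	"sync"
)

// parseFunc is a hypothetical stand-in for a thread-safe parser call.
type parseFunc func([]byte) (string, error)

// pool runs a fixed number of parsing workers (hypothetical type).
type pool struct {
	jobs    chan []byte
	results chan string
	wg      sync.WaitGroup
}

func newPool(workers int, parse parseFunc) *pool {
	p := &pool{jobs: make(chan []byte), results: make(chan string)}
	p.wg.Add(workers)
	for i := 0; i < workers; i++ {
		go func() {
			defer p.wg.Done()
			// Each worker drains jobs until the channel is closed.
			for data := range p.jobs {
				if s, err := parse(data); err == nil {
					p.results <- s
				}
			}
		}()
	}
	return p
}

// submit hands one payload to the pool; it blocks until a worker is free.
func (p *pool) submit(data []byte) { p.jobs <- data }

// stop closes the job queue, waits for in-flight parsing to finish, and
// only then closes results, so no worker can send on a closed channel.
func (p *pool) stop() {
	close(p.jobs)
	p.wg.Wait()
	close(p.results)
}

// parseAll pushes all inputs through the pool and collects the results.
// The order of the results is not guaranteed.
func parseAll(inputs []string, workers int) []string {
	p := newPool(workers, func(b []byte) (string, error) {
		return strings.ToUpper(string(b)), nil // placeholder "parser"
	})
	var out []string
	done := make(chan struct{})
	go func() {
		defer close(done)
		for s := range p.results {
			out = append(out, s)
		}
	}()
	for _, in := range inputs {
		p.submit([]byte(in))
	}
	p.stop()
	<-done // the collector has seen every result
	return out
}

func main() {
	fmt.Println(len(parseAll([]string{"a", "b", "c"}, 2)))
}
```

Closing `results` only after `wg.Wait()` returns is the step that avoids the "channel is already closed" failure mode mentioned in the comment.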
Thanks for all your effort @LarsStegman!
Summary
This is a proof of concept for the issue I raised in #15884. I feel like there might be a better way to implement this than my way. I was thinking of something like the Options pattern, but I am not sure about this.
I am very open to discussing this! @srebhan do you have time to discuss this? I am not even sure if it is something you're open to adding to Telegraf, but I think it is worth it considering the increased performance we see in our application.
Checklist
Related issues
resolves #15884