feat(serializers.prometheusremotewrite): Log metric conversion errors #15893
With this change, prometheusremotewrite logs the last recorded conversion error in a `Serialize` call, if any occurred. The error may help the user understand why some of the series were dropped during processing. At the same time, logging only the last error should keep the logs from being flooded when many conversion errors occur. See influxdata#15782
@hagen1778 thanks a lot for your contribution! I have two comments in the code. Furthermore, I'm interested to learn why you do not log all errors but instead keep only the last error? What is the reasoning behind this?
plugins/serializers/prometheusremotewrite/prometheusremotewrite_test.go
plugins/serializers/prometheusremotewrite/prometheusremotewrite.go
* rm unnecessary Logger check for nil
* use CaptureLogger instead of custom logger in tests
Thanks for the quick review!
It is expected that a metrics batch can contain errors in each series. So, depending on the batch size and processing, Telegraf could log a lot of errors and confuse the user.
Thanks @hagen1778 for your update! I understand that you've chosen the last error as this simplifies the code. Could you please add this as a comment to the code so that the next reader doesn't wonder again? ;-)
Furthermore, would it be an option to log all errors with the `Trace` log level for people that want to see what is failing? Otherwise they will need to reproduce the scenario very often if there are a number of metrics failing...
* log all parsing errors with `trace` level
@srebhan I've updated the PR with your recommendations. I'd appreciate you taking another look.
Thanks @hagen1778! I will adapt the wording in the README a bit but otherwise the PR looks good.
With this change, prometheusremotewrite will log the last recorded conversion error in `SerializeBatch`.
Summary
If the configured input contains bad data, the user might not be aware of the parsing errors, as Telegraf won't emit any logs or error messages. The error may help the user understand why some of the series were dropped during processing. At the same time, logging only the last error should keep the logs from being flooded when many conversion errors occur.
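The logging strategy discussed in this PR (every conversion error at trace level, only the last one at a visible level) can be sketched roughly as follows. This is a standalone illustration, not the actual serializer code: the `Logger` interface, `memLogger`, `convert`, and `serializeBatch` are hypothetical names made up for the sketch, and the choice of `Errorf` for the summary message is an assumption.

```go
package main

import (
	"errors"
	"fmt"
)

// Logger is a hypothetical minimal interface used only in this sketch;
// it is not Telegraf's logger type.
type Logger interface {
	Tracef(format string, args ...any)
	Errorf(format string, args ...any)
}

// memLogger records messages in memory so the behaviour is easy to inspect.
type memLogger struct {
	traces []string
	errs   []string
}

func (l *memLogger) Tracef(format string, args ...any) {
	l.traces = append(l.traces, fmt.Sprintf(format, args...))
}

func (l *memLogger) Errorf(format string, args ...any) {
	l.errs = append(l.errs, fmt.Sprintf(format, args...))
}

// convert is a stand-in for the per-series conversion; it rejects empty names.
func convert(name string) (string, error) {
	if name == "" {
		return "", errors.New("empty metric name")
	}
	return "series:" + name, nil
}

// serializeBatch converts every item and drops the ones that fail.
// Each failure is logged at trace level; only the last failure is
// reported once at error level, so a bad batch cannot flood the log.
func serializeBatch(log Logger, names []string) []string {
	var out []string
	var lastErr error
	for i, name := range names {
		series, err := convert(name)
		if err != nil {
			log.Tracef("conversion of item %d failed: %v", i, err)
			lastErr = err
			continue
		}
		out = append(out, series)
	}
	if lastErr != nil {
		log.Errorf("some series were dropped, last error: %v", lastErr)
	}
	return out
}
```

Calling `serializeBatch(&memLogger{}, []string{"cpu", "", "mem", ""})` in this sketch keeps the two valid items, records two trace messages, and records a single error message, regardless of how many items in the batch fail.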
Related issues
Resolves #15782