feat(serializers.prometheusremotewrite): Log metric conversion errors #15893

hagen1778 · 2024-09-16T12:59:18Z

With this change, prometheusremotewrite will log the last recorded conversion error in SerializeBatch

Summary

If the configured input contains bad data, the user might not be aware of the parsing errors, because Telegraf does not emit any logs or error messages. The error can help the user understand why some of the series were dropped during processing. At the same time, logging only the last error should keep the logs from being flooded when many conversion errors occur.

Checklist

No AI generated code was used in this PR

Related issues

Resolves #15782

With this change, prometheusremotewrite logs the last recorded conversion error in the `Serialize` call, if there were any errors at all. The error can help the user understand why some of the series were dropped during processing. At the same time, logging only the last error should keep the logs from being flooded when many conversion errors occur. See influxdata#15782

srebhan

@hagen1778 thanks a lot for your contribution! I have two comments in the code. Furthermore, I'm interested to learn why you do not log all errors but instead keep only the last error? What is the reasoning behind this?

plugins/serializers/prometheusremotewrite/prometheusremotewrite_test.go

plugins/serializers/prometheusremotewrite/prometheusremotewrite.go

* rm unnecessary Logger check for nil
* use CaptureLogger instead of custom logger in tests

hagen1778 · 2024-09-17T09:49:55Z

Thanks for the quick review!

> I'm interested to learn why you do not log all errors but instead keep only the last error? What is the reasoning behind this?

A metrics batch can contain an error in every series. So, depending on the batch size and processing, Telegraf could log a lot of errors and confuse the user.
On the other hand, printing one error log per batch still tells the user that something is wrong and gives enough information to act on. This works whether the batch contains a single bad metric or many of them. In both cases, the error message can be used to get insight into the invalid data.
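The trade-off described above can be sketched in a few lines of Go. This is only an illustration under stated assumptions, not the code from this PR: `sample`, `convert`, and `serializeBatch` are hypothetical names, and the rule that only `float64` values convert is made up for the example.

```go
package main

import "fmt"

// sample is a hypothetical stand-in for a metric; it is not a Telegraf type.
type sample struct {
	name  string
	value any
}

// convert is a hypothetical conversion step that rejects non-float values.
func convert(s sample) (float64, error) {
	v, ok := s.value.(float64)
	if !ok {
		return 0, fmt.Errorf("metric %q: unsupported value type %T", s.name, s.value)
	}
	return v, nil
}

// serializeBatch converts every sample, drops the ones that fail, and
// returns the converted values together with the last conversion error.
func serializeBatch(batch []sample) (values []float64, lastErr error) {
	for _, s := range batch {
		v, err := convert(s)
		if err != nil {
			lastErr = err // overwrite: only the most recent error is kept
			continue
		}
		values = append(values, v)
	}
	return values, lastErr
}

func main() {
	batch := []sample{
		{"cpu", 0.5},
		{"host", "web-1"},
		{"mem", 42.0},
		{"up", true},
	}
	values, lastErr := serializeBatch(batch)
	if lastErr != nil {
		// One log line per batch, regardless of how many samples failed.
		fmt.Printf("some series were dropped, %d left; last error: %v\n", len(values), lastErr)
	}
}
```

With two bad samples in the batch, the sketch still prints a single line, which names only the last failing sample.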

srebhan

Thanks @hagen1778 for your update! I understand that you've chosen the last error as this simplifies the code. Could you please add this as a comment to the code so that the next reader doesn't wonder again? ;-)

Furthermore, would it be an option to log all errors with the Trace log level for people that want to see what is failing? Otherwise they will need to reproduce the scenario very often if there are a number of metrics failing...
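A rough sketch of that suggestion, assuming a logger with separate trace and warn levels: every conversion error goes to the trace level, and one summary line with the last error goes to the warn level. The `leveledLogger` interface, the `memoryLogger` type, and `processBatch` are hypothetical and exist only for this illustration; they are not Telegraf's logger API, and the choice of the warn level for the summary is an assumption.

```go
package main

import "fmt"

// leveledLogger is a hypothetical minimal logger interface used only for
// this sketch; it is not Telegraf's logger type.
type leveledLogger interface {
	Tracef(format string, args ...any)
	Warnf(format string, args ...any)
}

// memoryLogger records formatted lines per level so the effect is visible.
type memoryLogger struct {
	trace []string
	warn  []string
}

func (l *memoryLogger) Tracef(format string, args ...any) {
	l.trace = append(l.trace, fmt.Sprintf(format, args...))
}

func (l *memoryLogger) Warnf(format string, args ...any) {
	l.warn = append(l.warn, fmt.Sprintf(format, args...))
}

// processBatch keeps float64 values and drops everything else. Each failure
// is logged at trace level; a single summary is logged at warn level.
func processBatch(log leveledLogger, values []any) int {
	var lastErr error
	kept := 0
	for i, v := range values {
		if _, ok := v.(float64); !ok {
			lastErr = fmt.Errorf("value %d: unsupported type %T", i, v)
			log.Tracef("conversion failed: %v", lastErr)
			continue
		}
		kept++
	}
	if lastErr != nil {
		log.Warnf("some series were dropped, %d left; last error: %v", kept, lastErr)
	}
	return kept
}

func main() {
	log := &memoryLogger{}
	kept := processBatch(log, []any{1.5, "bad", 2.5, false})
	fmt.Println("kept:", kept)
	fmt.Println("trace lines:", len(log.trace))
	fmt.Println("warn lines:", len(log.warn))
}
```

Users who run with the default log level see one line per batch, while those who enable trace logging can see every failing value without having to reproduce the scenario repeatedly.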

* log all parsing errors with `trace` level

hagen1778 · 2024-09-22T06:37:17Z

@srebhan I've updated the PR with your recommendations. I'd appreciate it if you took another look.

srebhan

Thanks @hagen1778! I will adapt the wording in the README a bit but otherwise the PR looks good.

plugins/serializers/prometheusremotewrite/README.md

telegraf-tiger · 2024-10-01T16:13:15Z

Download PR build artifacts for linux_amd64.tar.gz, darwin_arm64.tar.gz, and windows_amd64.zip.
Downloads for additional architectures and packages are available below.

☺️ This pull request doesn't significantly change the Telegraf binary size (less than 1%)

Artifact URLs

| DEB | RPM | TAR GZ | ZIP |
| --- | --- | --- | --- |
| amd64.deb | aarch64.rpm | darwin_amd64.tar.gz | windows_amd64.zip |
| arm64.deb | armel.rpm | darwin_arm64.tar.gz | windows_arm64.zip |
| armel.deb | armv6hl.rpm | freebsd_amd64.tar.gz | windows_i386.zip |
| armhf.deb | i386.rpm | freebsd_armv7.tar.gz | |
| i386.deb | ppc64le.rpm | freebsd_i386.tar.gz | |
| mips.deb | riscv64.rpm | linux_amd64.tar.gz | |
| mipsel.deb | s390x.rpm | linux_arm64.tar.gz | |
| ppc64el.deb | x86_64.rpm | linux_armel.tar.gz | |
| riscv64.deb | | linux_armhf.tar.gz | |
| s390x.deb | | linux_i386.tar.gz | |
| | | linux_mips.tar.gz | |
| | | linux_mipsel.tar.gz | |
| | | linux_ppc64le.tar.gz | |
| | | linux_riscv64.tar.gz | |
| | | linux_s390x.tar.gz | |

telegraf-tiger bot added the `feat` label (Improvement on an existing feature such as adding a new setting/mode to an existing plugin) Sep 16, 2024

make linter happy

1054347

srebhan reviewed Sep 16, 2024

plugins/serializers/prometheusremotewrite/prometheusremotewrite_test.go (outdated, resolved)

plugins/serializers/prometheusremotewrite/prometheusremotewrite.go (outdated, resolved)

srebhan self-assigned this Sep 16, 2024

review fixes

265065b

* rm unnecessary Logger check for nil
* use CaptureLogger instead of custom logger in tests

add logger object to all tests

347f84c

srebhan reviewed Sep 19, 2024

srebhan changed the title ~~feat: log conversion errors in prometheusremotewrite plugin~~ feat(serializers.prometheusremotewrite): Log metric conversion errors Sep 19, 2024

telegraf-tiger bot added the plugin/serializer label Sep 19, 2024

srebhan added the area/prometheus label Sep 19, 2024

hagen1778 added 2 commits September 22, 2024 08:23

* add better comments to explain why only last error is logged

9872ec6

* log all parsing errors with `trace` level

make linter happy

13d5afa

srebhan approved these changes Sep 30, 2024

plugins/serializers/prometheusremotewrite/README.md (outdated, resolved)

Update plugins/serializers/prometheusremotewrite/README.md

dcc4e5c

srebhan added the `ready for final review` label (This pull request has been reviewed and/or tested by multiple users and is ready for a final review.) Sep 30, 2024

srebhan assigned DStrand1 and unassigned srebhan Sep 30, 2024

DStrand1 approved these changes Oct 1, 2024

DStrand1 merged commit 7df29b0 into influxdata:master Oct 1, 2024
26 of 27 checks passed

github-actions bot added this to the v1.33.0 milestone Oct 1, 2024

asaharn pushed a commit to asaharn/telegraf that referenced this pull request Oct 16, 2024

feat(serializers.prometheusremotewrite): Log metric conversion errors (…

03a211e

…influxdata#15893)
