prometheusremotewrite: Untyped values are dropped silently #15782

Closed
hagen1778 opened this issue Aug 27, 2024 · 3 comments · Fixed by #15893
Labels
bug unexpected problem or unintended behavior

Comments

@hagen1778
Contributor

hagen1778 commented Aug 27, 2024

Relevant telegraf.conf

[[inputs.file]]
  files = ['metrics.influx']
  data_format = "influx"
[[outputs.http]]
  url = "http://localhost:8428/api/v1/write"
  data_format = "prometheusremotewrite"
  [outputs.http.headers]
     Authorization = "Bearer redacted"
     Content-Type = "application/x-protobuf"
     Content-Encoding = "snappy"
     X-Prometheus-Remote-Write-Version = "0.1.0"
  [outputs.http.tagpass]
    data_type = ["application"]

Logs from Telegraf

No logs

System info

telegraf-1.31.3

Docker

No response

Steps to reproduce

  1. Create a file metrics.influx with the following content:
measurement,az=us-east-1a,component=ingester,serviceType=application,cluster=foo,environment=test,host=bar,uid=1234567890,instanceType=s-tier,job=server,region=us-east-1 throughput="0" 1724701415000000000
  2. Check that the field value throughput="0" is quoted
  3. Run telegraf with the config given above in this report
  4. Observe the telegraf logs or the logs of the remote database

Expected behavior

  1. Telegraf should notify the user that the field throughput="0" can't be parsed. This can be done via logs or metrics. If logging every occurrence is too verbose, telegraf should at least report errors once per batch.
  2. Telegraf should try to parse a quoted value in case it can be converted to a numeric value.
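
To illustrate the second point, here is a minimal sketch of a conversion that falls back to parsing strings. `toSampleValue` is a hypothetical helper written for this example, not a function from Telegraf or the serializer. It assumes that string fields holding a valid float literal should be accepted and that everything else should produce an error the caller can log.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// toSampleValue is a hypothetical helper (not part of Telegraf). It converts
// a field value to float64 and, unlike a numeric-only conversion, also tries
// to parse string values.
func toSampleValue(v interface{}) (float64, error) {
	switch x := v.(type) {
	case float64:
		return x, nil
	case int64:
		return float64(x), nil
	case uint64:
		return float64(x), nil
	case bool:
		if x {
			return 1, nil
		}
		return 0, nil
	case string:
		// A quoted line-protocol value such as throughput="0" arrives here.
		f, err := strconv.ParseFloat(strings.TrimSpace(x), 64)
		if err != nil {
			return 0, fmt.Errorf("string %q is not numeric: %w", x, err)
		}
		return f, nil
	default:
		return 0, fmt.Errorf("unsupported field type %T", v)
	}
}

func main() {
	for _, v := range []interface{}{"0", "12.5", "fast", int64(3)} {
		f, err := toSampleValue(v)
		if err != nil {
			fmt.Println("skip:", err)
			continue
		}
		fmt.Println("value:", f)
	}
}
```

Returning an error instead of a bare `ok` flag gives the caller something concrete to report, which also covers the first point.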

Actual behavior

Nothing happens. Metrics are collected from the file and silently dropped here

case telegraf.Untyped:
    value, ok := prometheus.SampleValue(field.Value)
    if !ok {
        continue
    }

No logs are printed, and there are no hints about what's happening.
The remote database receives POST requests with an empty body and can't give the user a hint about what's wrong.
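
The effect can be reproduced in isolation. The sketch below is a simplified model of the loop quoted above, not Telegraf's actual code: `numeric` and `collect` are hypothetical names, and `numeric` is assumed to accept only numeric and boolean values, as the `continue` branch suggests.

```go
package main

import "fmt"

// numeric is a simplified stand-in for a numeric-only conversion. It is an
// illustration written for this example, not the serializer's implementation.
func numeric(v interface{}) (float64, bool) {
	switch x := v.(type) {
	case float64:
		return x, true
	case int64:
		return float64(x), true
	case uint64:
		return float64(x), true
	case bool:
		if x {
			return 1, true
		}
		return 0, true
	}
	return 0, false
}

// collect returns only the fields that survive the conversion.
func collect(fields map[string]interface{}) map[string]float64 {
	out := make(map[string]float64)
	for name, v := range fields {
		f, ok := numeric(v)
		if !ok {
			continue // dropped without any log or counter
		}
		out[name] = f
	}
	return out
}

func main() {
	// In line protocol, throughput="0" is a string field, not a number.
	samples := collect(map[string]interface{}{"throughput": "0"})
	fmt.Println(len(samples)) // prints 0: nothing is left to put in the request
}
```

When every field in a batch is a string, the result is empty, which matches the empty POST bodies seen by the remote database.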

Additional info

No response

@hagen1778 hagen1778 added the bug unexpected problem or unintended behavior label Aug 27, 2024
@srebhan
Member

srebhan commented Aug 29, 2024

@hagen1778 this is a limitation of Prometheus and is documented in the README of the serializer:

Note: String fields are ignored and do not produce Prometheus metrics.

We could log those once if that's sufficient for your use-case!?

@srebhan srebhan added the waiting for response waiting for response from contributor label Aug 29, 2024
@hagen1778
Contributor Author

We could log those once if that's sufficient for your use-case!?

Yes, I think something like that would be sufficient.

The problem I faced was a user complaining that the remote destination (VictoriaMetrics) was dropping data ingested from the telegraf client. Further investigation revealed that the remote destination was receiving empty POST requests from telegraf.
I had to check the telegraf code to understand what could have caused it to send empty requests, and this is how I discovered this behavior.

Logging each skipped line could be too verbose. Maybe log a single message per batch, like "n/n lines were dropped in batch", if at least one row was skipped when forming the batch? I can contribute the fix if you agree to this approach.
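
A rough sketch of the per-batch summary idea, under the assumption that the serializer can count skipped rows while it builds a batch. The `row` type and the `serializeBatch` and `dropSummary` functions are hypothetical and exist only for this example.

```go
package main

import (
	"fmt"
	"log"
)

// row is a hypothetical, minimal stand-in for one field of a metric.
type row struct {
	name  string
	value interface{}
}

// serializeBatch keeps float64 values and counts everything else as dropped.
func serializeBatch(rows []row) (kept []float64, dropped int) {
	for _, r := range rows {
		f, ok := r.value.(float64)
		if !ok {
			dropped++
			continue
		}
		kept = append(kept, f)
	}
	return kept, dropped
}

// dropSummary builds one message per batch, or "" if nothing was dropped.
func dropSummary(dropped, total int) string {
	if dropped == 0 {
		return ""
	}
	return fmt.Sprintf("%d/%d rows were dropped in batch: value is not numeric", dropped, total)
}

func main() {
	rows := []row{
		{name: "latency", value: 1.5},
		{name: "throughput", value: "0"},
	}
	kept, dropped := serializeBatch(rows)
	if msg := dropSummary(dropped, len(rows)); msg != "" {
		log.Println("warning:", msg)
	}
	fmt.Println(len(kept))
}
```

This emits at most one log line per batch regardless of how many rows fail, at the cost of not naming the offending fields.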

@telegraf-tiger telegraf-tiger bot removed the waiting for response waiting for response from contributor label Aug 30, 2024
@srebhan
Member

srebhan commented Aug 30, 2024

Please feel free to put up a PR. If you cannot do the message shown above, simply log the metric-name and field and then mark it in a map as "already logged" and skip logging next time.

I think a warning would probably be appropriate.
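
The "already logged" map could look roughly like the sketch below. `onceLogger` and its `warn` method are hypothetical names for this example, not an existing Telegraf type, and the sketch assumes a single goroutine calls it (a mutex would be needed otherwise).

```go
package main

import "log"

// onceLogger is a hypothetical helper that remembers which metric/field
// pairs have already produced a warning.
type onceLogger struct {
	seen map[string]struct{}
}

func newOnceLogger() *onceLogger {
	return &onceLogger{seen: make(map[string]struct{})}
}

// warn logs a warning the first time a metric/field pair is seen and
// reports whether it logged. Not safe for concurrent use.
func (o *onceLogger) warn(metric, field string) bool {
	key := metric + "\x00" + field // separator avoids ambiguous joins
	if _, ok := o.seen[key]; ok {
		return false
	}
	o.seen[key] = struct{}{}
	log.Printf("warning: dropping field %q of metric %q: value is not numeric", field, metric)
	return true
}

func main() {
	l := newOnceLogger()
	l.warn("measurement", "throughput") // logs
	l.warn("measurement", "throughput") // silent: already logged
	l.warn("measurement", "latency")    // logs: different field
}
```

The map grows with the number of distinct metric/field pairs, so a long-running process with high-cardinality names may want to bound or periodically reset it.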

hagen1778 added a commit to hagen1778/telegraf that referenced this issue Sep 16, 2024
With this change, prometheusremotewrite will log the last recorded
conversion error in the `Serialize` call, if there were any.
The error might help the user understand why some of the
series were dropped during processing.
At the same time, logging only the last error should keep the logs from
being polluted if too many conversion errors occur.

See influxdata#15782