Histograms are commonly used for recording latencies. The default values for bucket boundaries are []float64{0, 5, 10, 25, 50, 75, 100, 250, 500, 750, 1000, 2500, 5000, 7500, 10000} (code). This works well for values in milliseconds; however, the Prometheus documentation recommends using seconds rather than milliseconds as the unit. When latency metrics are recorded in seconds with the default buckets, the vast majority of timings land in the 0 to 5 second bucket. This results in inaccurate histogram quantile calculations.
Agreed. This is the reason we have not made the change.
It's also the reason explicitly called out in the specification:
SDKs SHOULD use the default value when boundaries are not explicitly provided, unless they have good reasons to use something different (e.g. for backward compatibility reasons in a stable SDK release).
This does not look like a proposal we plan to accept.
For anyone else finding this, it looks like most built-in exporters support specifying a different default-bucket-set when constructing the exporter. For example, with the prom-exporter:
// import "github.com/prometheus/client_golang/prometheus"
// import otelprom "go.opentelemetry.io/otel/exporters/prometheus"
// import sdkmetric "go.opentelemetry.io/otel/sdk/metric"

// create an otel metric-exporter associated with a prometheus registry
metricExporter, err := otelprom.New(
	otelprom.WithRegisterer(promRegistry),
	// OTEL default buckets assume you're using milliseconds. Substitute defaults
	// appropriate for units of seconds.
	otelprom.WithAggregationSelector(func(ik sdkmetric.InstrumentKind) sdkmetric.Aggregation {
		switch ik {
		case sdkmetric.InstrumentKindHistogram:
			return sdkmetric.AggregationExplicitBucketHistogram{
				Boundaries: prometheus.DefBuckets,
				NoMinMax:   false,
			}
		default:
			return sdkmetric.DefaultAggregationSelector(ik)
		}
	}),
)
// do something with err

// create a meter-provider associated with the exporter
meterProvider := sdkmetric.NewMeterProvider(
	sdkmetric.WithReader(metricExporter),
)
// do something with meterProvider
Problem Statement
Histograms are commonly used for recording latencies. The default values for bucket boundaries are []float64{0, 5, 10, 25, 50, 75, 100, 250, 500, 750, 1000, 2500, 5000, 7500, 10000} (code). This works well for values in milliseconds; however, the Prometheus documentation recommends using seconds rather than milliseconds as the unit. When latency metrics are recorded in seconds with the default buckets, the vast majority of timings land in the 0 to 5 second bucket. This results in inaccurate histogram quantile calculations.
This is very similar to this issue in the .NET repo: open-telemetry/opentelemetry-dotnet#4797
Proposed Solution
opentelemetry-go could use a different set of default buckets when the histogram units are known to be seconds.
This was implemented in the .NET library here: open-telemetry/opentelemetry-dotnet#4820
Alternatives
The current workaround is to use the WithExplicitBucketBoundaries option on all histograms dealing in seconds.
Prior Art
.NET issue: open-telemetry/opentelemetry-dotnet#4797
.NET solution: open-telemetry/opentelemetry-dotnet#4820
Additional Context
This would likely be a breaking change.