
[deltatocumulativeprocessor] Introduce an upper bound for exp histogram buckets #36874

Merged 11 commits into open-telemetry:main on Jan 16, 2025

Conversation

euroelessar
Contributor

Description

This PR introduces a limit on the maximum number of exponential histogram buckets within the deltatocumulativeprocessor. Previously, when merging delta metrics into cumulative metrics, the resulting exponential histograms could grow very large, potentially causing excessive memory usage and processing overhead. By capping the number of buckets at 160 and dynamically downscaling histograms when necessary, this change ensures that the processor remains efficient and stable even when handling large, merged exponential histograms.
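As a rough illustration of the capping idea (a simplified sketch, not the processor's actual code; the function name and index math here are assumptions), each downscale step halves a histogram's bucket indices, so a merged index range can be shrunk until it fits under the cap:

```go
package main

import "fmt"

// downscalesNeeded is an illustrative sketch: it computes how many times a
// histogram covering bucket indices [lo, hi] must be downscaled so that its
// bucket count fits within maxBuckets. Each downscale step halves bucket
// widths, i.e. shifts indices right by one (simplified here to
// non-negative indices).
func downscalesNeeded(lo, hi int32, maxBuckets int) int32 {
	var n int32
	for int(hi-lo)+1 > maxBuckets {
		lo >>= 1
		hi >>= 1
		n++
	}
	return n
}

func main() {
	// A merged histogram spanning indices 0..319 (320 buckets) needs one
	// downscale to fit within the 160-bucket cap: 0..159 after one shift.
	fmt.Println(downscalesNeeded(0, 319, 160)) // 1
}
```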

Link to tracking issue

Fixes #33277

Testing

Added unit tests for edge cases.

Documentation

Updated changelog.

@euroelessar
Contributor Author

@sh0rez Please have a look. It's an alternative take on #34157: it avoids any extra memory allocations and could be easier to follow.
All changes are in ExpHistogram.Add (and in Merge, which is called from it), as it's the only place that can grow the histogram.

Comment on lines 109 to 110
// Downscale if the expected number of buckets after the merge is too large.
if deltaScale := max(getDeltaScale(dp.Positive(), in.Positive()), getDeltaScale(dp.Negative(), in.Negative())); deltaScale > 0 {
Member

If I understand correctly, this downscales using the existing algorithm first, possibly exceeding the cap.
We check for that only after the fact, having already allocated more memory than we want, right?
This sounds like it's still vulnerable to DoS attacks.

Contributor Author

A downscaling operation should always produce no more buckets than the original, right?
In particular, expo.Downscale never allocates (except on the panicking path), so it cannot be a memory-based DoS attack vector.
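The allocation-free point can be sketched like this (an illustration of the idea only, assuming an even bucket offset; this is not the expo.Downscale implementation):

```go
package main

import "fmt"

// downscaleOnce merges adjacent bucket pairs in place. The result has at
// most ceil(len/2) buckets and therefore always fits in the prefix of the
// existing slice, so no new memory is allocated. Real downscaling must also
// handle the bucket offset (an odd offset pairs the first bucket alone);
// that detail is omitted here.
func downscaleOnce(counts []uint64) []uint64 {
	n := (len(counts) + 1) / 2
	for i := 0; i < n; i++ {
		c := counts[2*i]
		if 2*i+1 < len(counts) {
			c += counts[2*i+1]
		}
		counts[i] = c
	}
	return counts[:n]
}

func main() {
	// Five buckets collapse to three: 1+2, 3+4, and the unpaired 5.
	fmt.Println(downscaleOnce([]uint64{1, 2, 3, 4, 5})) // [3 7 5]
}
```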

Member

very true, thanks for the hint ;)

@euroelessar euroelessar requested a review from sh0rez December 20, 2024 17:12
Contributor

github-actions bot commented Jan 4, 2025

This PR was marked stale due to lack of activity. It will be closed in 14 days.

@github-actions github-actions bot added the Stale label Jan 4, 2025
@euroelessar
Contributor Author

@sh0rez hi, would you be able to have another look at the comment above?

@github-actions github-actions bot removed the Stale label Jan 5, 2025
Member

@sh0rez sh0rez left a comment


sorry for the slow review!

// getDeltaScale computes how many times the histograms need to be downscaled to ensure
// the bucket range after their merge fits within maxBuckets.
// This logic assumes that leading and trailing zeros will be removed.
func getDeltaScale(arel, brel pmetric.ExponentialHistogramDataPointBuckets) expo.Scale {
Member

This is great, but I think it better fits the public api of the expo package, maybe like this?

package expo

// Limit returns a target Scale such that, when histograms are downscaled to it,
// the total bucket count after [Merge] never exceeds max
func Limit(max int, scale Scale, arel, brel Buckets) Scale { ... }
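A minimal sketch of how such a Limit could be computed (hypothetical: the real expo.Limit takes pmetric bucket types and handles offsets, while here each bucket set is flattened to its index range [lo, hi], assumed non-negative, and each downscale step is assumed to shift indices right by one):

```go
package main

import "fmt"

type Scale int32

// limitSketch is a simplified, illustrative stand-in for the proposed Limit:
// it finds the largest (finest) scale at which the merged index range of two
// bucket sets still fits within maxBuckets.
func limitSketch(maxBuckets int, scale Scale, aLo, aHi, bLo, bHi int32) Scale {
	lo, hi := min(aLo, bLo), max(aHi, bHi) // merged index range
	for int(hi-lo)+1 > maxBuckets {
		lo >>= 1
		hi >>= 1
		scale-- // a lower scale means coarser buckets
	}
	return scale
}

func main() {
	// Two histograms at scale 0 covering 0..159 and 160..319 merge into a
	// 320-bucket range, so the target scale drops to -1 to respect the cap.
	fmt.Println(limitSketch(160, 0, 0, 159, 160, 319)) // -1
}
```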

Contributor Author

done


Comment on lines 92 to 94
dp: expdp{Scale: 0, PosNeg: generateBins(10, 80, 1), Count: 160},
in: expdp{Scale: 0, PosNeg: generateBins(80+10, 60, 2), Count: 120},
want: expdp{Scale: 0, PosNeg: generateBins(10, 80, 1, 60, 2), Count: 280},
Member

why do you need generateBins and downscaled?
From looking at this test case, it's hard to reason about: I cannot clearly see what either the input or the output looks like, and as such cannot tell whether it's correct.
Is there a way to stick to written-out bins{} literals?

Contributor Author

Made maxBuckets configurable in code (a var instead of a const), updated the test to cap at 8 buckets, and moved to the rawbs method.
It should be easier to read now; please let me know if it needs any further changes!

@sh0rez
Member

sh0rez commented Jan 15, 2025

@open-telemetry/collector-contrib-approvers this is ready to merge

Member

@mx-psi mx-psi left a comment


LGTM and the chosen size aligns with the SDK default 👍

@mx-psi mx-psi merged commit 37c8044 into open-telemetry:main Jan 16, 2025
163 checks passed
@github-actions github-actions bot added this to the next release milestone Jan 16, 2025
Comment on lines +83 to +86
to := max(
expo.Limit(maxBuckets, from, dp.Positive(), in.Positive()),
expo.Limit(maxBuckets, from, dp.Negative(), in.Negative()),
)


@euroelessar @mx-psi Shouldn't this be min instead of max? What I am seeing is that if I have no negative data points, then no scaling happens and the number of positive buckets still explodes.

I think the tests missed this because they all use PosNeg, i.e., use the same positive and negative data points.
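The scale semantics behind this question can be sketched as follows (index math simplified to non-negative indices; this is an illustration, not the processor's code). A lower scale means coarser buckets, so taking the max of the two per-sign target scales keeps the less downscaled one and can leave the other sign over the cap:

```go
package main

import "fmt"

// bucketsAt returns how many buckets the index range [lo, hi] occupies
// after moving from scale `from` down to scale `to` (to <= from); each
// downscale step shifts indices right by one.
func bucketsAt(lo, hi, from, to int32) int {
	for s := from; s > to; s-- {
		lo >>= 1
		hi >>= 1
	}
	return int(hi-lo) + 1
}

func main() {
	// Positive buckets span 0..319 at scale 0 and need target scale -1 to
	// fit a 160-bucket cap; empty negative buckets need no change, so their
	// target stays 0. max(-1, 0) = 0 leaves the positive side at 320
	// buckets, while min(-1, 0) = -1 brings it under the cap.
	fmt.Println(bucketsAt(0, 319, 0, 0))  // 320
	fmt.Println(bucketsAt(0, 319, 0, -1)) // 160
}
```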

Member

@tiit-clarifai can you file an issue for this? It definitely looks suspicious


Done, #37416.

Contributor Author

My bad, #37432 fixes it. Thanks for letting me know about it! I've also added tests to catch it in the future.

chengchuanpeng pushed a commit to chengchuanpeng/opentelemetry-collector-contrib that referenced this pull request Jan 26, 2025
Successfully merging this pull request may close these issues.

deltatocumulative: Number of buckets in exponential histograms should be capped
5 participants