[deltatocumulativeprocessor] Introduce an upper bound for exp histogram buckets #36874
Conversation
// Downscale if an expected number of buckets after the merge is too large.
if deltaScale := max(getDeltaScale(dp.Positive(), in.Positive()), getDeltaScale(dp.Negative(), in.Negative())); deltaScale > 0 {
iiuc, this downscales using the existing algorithm first, possibly exceeding the cap.
we check for that only after the fact, already having allocated more memory than we want, right?
this sounds like it's still vulnerable to DoS attacks
A downscaling operation should always produce no more buckets than the original, right?
In particular, expo.Downscale never performs allocations (except on the panicking path), so it cannot be a DoS vector for memory.
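A minimal sketch of why this holds: downscaling by one merges each pair of adjacent buckets in place, so the output reuses the input slice and never grows. The function below is illustrative only, not the collector's actual expo.Downscale.

```go
package main

import "fmt"

// downscaleInPlace lowers the scale of a bucket slice by delta steps.
// Each step halves the bucket indices, collapsing pairs of adjacent
// buckets into one. The result is written back into the input slice,
// so no new memory is allocated and the bucket count never increases.
func downscaleInPlace(counts []uint64, offset int32, delta int32) ([]uint64, int32) {
	for ; delta > 0; delta-- {
		newOffset := offset >> 1 // arithmetic shift floor-divides, correct for negatives
		n := 0
		for i, c := range counts {
			pos := int(((offset + int32(i)) >> 1) - newOffset)
			if pos == n {
				counts[n] = c // start a new merged bucket
				n++
			} else {
				counts[pos] += c // fold into the previous merged bucket
			}
		}
		counts = counts[:n]
		offset = newOffset
	}
	return counts, offset
}

func main() {
	// Buckets 0..3 at scale s become buckets 0..1 at scale s-1.
	counts, off := downscaleInPlace([]uint64{1, 2, 3, 4}, 0, 1)
	fmt.Println(counts, off) // [3 7] 0
}
```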
very true, thanks for the hint ;)
This PR was marked stale due to lack of activity. It will be closed in 14 days.
@sh0rez hi, would you be able to have another look at the comment above?
sorry for the slow review!
// getDeltaScale computes how many times the histograms need to be downscaled to ensure
// the bucket range after their merge fits within maxBuckets.
// This logic assumes that trailing and leading zeros are going to be removed.
func getDeltaScale(arel, brel pmetric.ExponentialHistogramDataPointBuckets) expo.Scale {
This is great, but I think it better fits the public API of the expo package, maybe like this?
package expo
// Limit returns a target Scale that, when downscaled to,
// keeps the total bucket count after [Merge] from exceeding max
func Limit(max int, scale Scale, arel, brel Buckets) Scale { ... }
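For intuition, the core of such a Limit function can be sketched as follows. The names are hypothetical: where the proposed signature takes Buckets values, this sketch takes the populated index range of the merged histogram directly.

```go
package main

import "fmt"

// limitScale returns the largest scale, no greater than the current
// one, at which the merged populated index range [lo, hi] spans at
// most maxBuckets buckets. Each downscale step halves the indices,
// so the span roughly halves per step.
func limitScale(maxBuckets int, scale, lo, hi int32) int32 {
	for int(hi-lo)+1 > maxBuckets {
		lo >>= 1
		hi >>= 1
		scale--
	}
	return scale
}

func main() {
	// A merged range of 320 buckets at scale 0 needs one downscale
	// step to fit a cap of 160.
	fmt.Println(limitScale(160, 0, 0, 319)) // -1
}
```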
done
dp:   expdp{Scale: 0, PosNeg: generateBins(10, 80, 1), Count: 160},
in:   expdp{Scale: 0, PosNeg: generateBins(80+10, 60, 2), Count: 120},
want: expdp{Scale: 0, PosNeg: generateBins(10, 80, 1, 60, 2), Count: 280},
why do you need generateBins and downscaled?
From looking at this test case, it's hard to reason about: I cannot clearly see what either the input or the output looks like, and so can't tell whether it's correct. Is there a way to stick to written-out bins{} literals?
made maxBuckets configurable in code (var instead of const), updated the test to set it to a max of 8 buckets, and moved to the rawbs method.
should be easier to read now, please let me know if it needs any further changes!
@open-telemetry/collector-contrib-approvers this is ready to merge
LGTM and the chosen size aligns with the SDK default 👍
to := max(
    expo.Limit(maxBuckets, from, dp.Positive(), in.Positive()),
    expo.Limit(maxBuckets, from, dp.Negative(), in.Negative()),
)
@euroelessar @mx-psi Shouldn't this be min instead of max? What I am seeing is that if I have no negative data points, then no scaling happens and the number of positive buckets still explodes.
I think the tests missed this because they all use PosNeg, i.e., use the same positive and negative data points.
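The reasoning here, assuming expo.Limit returns the target Scale to downscale to (where a smaller scale means more aggressive downscaling): both the positive and the negative constraints must hold at once, so the combined target should be the smaller of the two. A hedged, self-contained illustration with made-up values:

```go
package main

import "fmt"

func main() {
	from := int32(0)       // current scale
	posTarget := int32(-2) // positive buckets exceed the cap, need downscaling
	negTarget := from      // no negative buckets, hence no constraint

	// max picks the looser target, so no downscaling happens at all
	// and the positive bucket count can still explode:
	fmt.Println(max(posTarget, negTarget)) // 0

	// min satisfies both constraints simultaneously:
	fmt.Println(min(posTarget, negTarget)) // -2
}
```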
@tiit-clarifai can you file an issue for this? It definitely looks suspicious
Done, #37416.
My bad, #37432 fixes it. Thanks for letting me know about it! I've also added tests to catch it in the future.
Description
This PR introduces a limit on the maximum number of exponential histogram buckets within the deltatocumulativeprocessor. Previously, when merging delta metrics into cumulative metrics, the resulting exponential histograms could grow very large, potentially causing excessive memory usage and processing overhead. By capping the number of buckets at 160 and dynamically downscaling histograms when necessary, this change ensures that the processor remains efficient and stable even when handling large, merged exponential histograms.
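For context on why downscaling trades resolution for size: in OTel exponential histograms the boundary growth factor is base = 2^(2^-scale), so lowering the scale by one squares the base, and each bucket at the new scale covers what two buckets covered before. A small self-contained illustration:

```go
package main

import (
	"fmt"
	"math"
)

// base returns the growth factor between consecutive exponential
// histogram bucket boundaries at a given scale: 2^(2^-scale).
// Decreasing the scale squares the base, which is why one downscale
// step halves the number of buckets needed for the same value range.
func base(scale int) float64 {
	return math.Pow(2, math.Pow(2, float64(-scale)))
}

func main() {
	fmt.Println(base(0))  // 2
	fmt.Println(base(-1)) // 4
}
```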
Link to tracking issue
Fixes #33277
Testing
Added unit tests for edge cases.
Documentation
Updated changelog.