chore(metric_extraction): Optimize labels result #15068
Conversation
if cached, ok := b.resultCache[hash]; ok {
	return cached
}

result := NewLabelsResult(b.buf.String(), hash, stream, structuredMetadata, parsed)

// Now segregate the sorted labels into their categories
var stream, meta, parsed []labels.Label
Could you push it further and re-use those slices from the same labels builder?
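For concreteness, a hedged sketch of what that reuse could look like; streamBuf, metaBuf and parsedBuf are hypothetical fields on the builder, not names from the PR:

// Hypothetical: keep scratch slices on the builder and reset them per call,
// instead of declaring fresh stream/meta/parsed slices on every invocation.
b.streamBuf = b.streamBuf[:0]
b.metaBuf = b.metaBuf[:0]
b.parsedBuf = b.parsedBuf[:0]
// ...the segregation loop would then append into these reused slices.

(The author's follow-up below notes that this approach increased in-use memory in practice.)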
// Check which category this label belongs to
if labelsContain(b.add[ParsedLabel], l.Name) {
	parsed = append(parsed, l)
} else if labelsContain(b.add[StructuredMetadataLabel], l.Name) {
Might be wise to do this test first.
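One hedged reading of that suggestion, assuming a label name appears in at most one of the two sets, so that swapping the branches only changes how many labelsContain lookups the common case pays (the final else branch is assumed, not shown in the hunk above):

// Check structured metadata first, on the assumption that it is the more
// common category, so the typical label is classified with a single lookup.
if labelsContain(b.add[StructuredMetadataLabel], l.Name) {
	meta = append(meta, l)
} else if labelsContain(b.add[ParsedLabel], l.Name) {
	parsed = append(parsed, l)
} else {
	stream = append(stream, l)
}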
LGTM
Some suggestions left.
Do you think we should also cache this call?
func (l labelsResult) Labels() labels.Labels {
return flattenLabels(nil, l.stream, l.structuredMetadata, l.parsed)
}
Seems like we could cache it if it's being run multiple times on the same object, since it's immutable.
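A minimal sketch of that memoization, not the PR's code: since labelsResult is immutable after construction, the flattened slice can be computed once and reused. The once and flattened fields (and the reduced struct) are illustrative; note the pointer receiver, which is required so the sync.Once is not copied.

import (
	"sync"

	"github.com/prometheus/prometheus/model/labels"
)

// labelsResult, reduced to the fields relevant here; flattenLabels is the
// existing helper shown above.
type labelsResult struct {
	stream             labels.Labels
	structuredMetadata labels.Labels
	parsed             labels.Labels

	once      sync.Once
	flattened labels.Labels
}

// Labels flattens the three categories on first call and reuses the result.
func (l *labelsResult) Labels() labels.Labels {
	l.once.Do(func() {
		l.flattened = flattenLabels(nil, l.stream, l.structuredMetadata, l.parsed)
	})
	return l.flattened
}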
Thanks! Will address the comments in a separate PR.
To address the comments here: re-using the slices as suggested seems to increase in-use memory, as opposed to initializing new slices for the parsed, structured metadata, and stream labels. Also, the call
What this PR does / why we need it:
The current implementation of the LabelsResult method, used in critical flows within metrics_generation, pipeline, etc., fetched the UnsortedLabels from the buffer for each category (ParsedLabel, StructuredMetadataLabel, StreamLabel) and sorted them individually. The sorted results are cached in memory. The majority of the resource utilisation here was spent sorting the labels of each category and creating a copy from the buffer.

The new implementation fetches all unsorted labels, sorts them collectively, and caches the result first. Individual categories are segregated after caching.
(Notice that labels.Copy is gone from the newer implementation's memory profile.)
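A minimal sketch of the shape of that flow, assuming an UnsortedLabels helper that returns every label across all categories; the exact ordering of caching versus segregation in the PR may differ, and type conversions are elided:

// 1) Return the cached result for this hash if we have one.
if cached, ok := b.resultCache[hash]; ok {
	return cached
}

// 2) Collect every label unsorted and sort the combined slice once
//    (uses the standard sort package).
all := b.UnsortedLabels(nil) // assumed helper
sort.Slice(all, func(i, j int) bool { return all[i].Name < all[j].Name })

// 3) Segregate the sorted labels into their categories.
var stream, meta, parsed []labels.Label
for _, l := range all {
	switch {
	case labelsContain(b.add[ParsedLabel], l.Name):
		parsed = append(parsed, l)
	case labelsContain(b.add[StructuredMetadataLabel], l.Name):
		meta = append(meta, l)
	default:
		stream = append(stream, l)
	}
}

// 4) Build the result and cache it under the hash.
result := NewLabelsResult(b.buf.String(), hash, stream, meta, parsed)
b.resultCache[hash] = result
return result

One sort over the combined slice replaces the three per-category sorts and the extra labels.Copy from the buffer, which is where the allocation savings visible in the profiles come from.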
Results:
BenchmarkStreamLineSampleExtractor_Process
CPU before:
CPU after:
Memory before:
Memory after:
BenchmarkReadWithStructuredMetadata: creates a memchunk and iterates over it
CPU and memory:
benchstat result:
Overall Summary from the results:
Excluding some variability in the measurements, the new implementation is at least 28% faster than the old one, with a dramatic 89% improvement in memory usage. Each run also used 79.8% fewer allocs/op than the old implementation.
Which issue(s) this PR fixes:
Special notes for your reviewer:
Checklist
- Reviewed the CONTRIBUTING.md guide (required)
- feat PRs are unlikely to be accepted unless a case can be made for the feature actually being a bug fix to existing behavior.
- Changes that require user attention or interaction to upgrade are documented in docs/sources/setup/upgrade/_index.md
- If deprecating or removing a configuration option, update the deprecated-config.yaml and deleted-config.yaml files respectively in the tools/deprecated-config-checker directory. Example PR