
chore(metric_extraction): Optimize labels result #15068

Merged
merged 1 commit into main on Nov 22, 2024

Conversation

@shantanualsi (Contributor) commented on Nov 22, 2024

What this PR does / why we need it:

The current implementation of the LabelsResult method, used in critical flows such as metrics generation and the pipeline, fetched the unsorted labels from the buffer for each category (parsed labels, structured metadata, stream labels) and sorted each category individually. The sorted results are cached in memory. Most of the resource utilisation came from sorting the labels of each category and creating a copy from the buffer.

The new implementation fetches all unsorted labels, sorts them collectively, and caches the result first. The individual categories are segregated after caching.

(Notice that labels.Copy is gone from the memory profile of the new implementation.)
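For context, a minimal, self-contained Go sketch of the "sort once, then segregate" pattern described above; the type and helper names here are illustrative assumptions, not the actual Loki code:

// Sketch only: sorts all labels collectively, then splits the sorted result
// into categories. Names are assumptions, not Loki's internal types.
package main

import (
	"fmt"
	"sort"
)

type label struct{ Name, Value string }

func contains(names map[string]struct{}, name string) bool {
	_, ok := names[name]
	return ok
}

// segregate sorts every label once, collectively, then splits the sorted
// slice into stream, structured-metadata, and parsed categories. Each
// category stays sorted because the input is already sorted.
func segregate(all []label, parsedNames, metaNames map[string]struct{}) (stream, meta, parsed []label) {
	sort.Slice(all, func(i, j int) bool { return all[i].Name < all[j].Name })
	for _, l := range all {
		switch {
		case contains(parsedNames, l.Name):
			parsed = append(parsed, l)
		case contains(metaNames, l.Name):
			meta = append(meta, l)
		default:
			stream = append(stream, l)
		}
	}
	return stream, meta, parsed
}

func main() {
	all := []label{{"level", "info"}, {"app", "loki"}, {"trace_id", "abc"}}
	stream, meta, parsed := segregate(all,
		map[string]struct{}{"level": {}},
		map[string]struct{}{"trace_id": {}})
	fmt.Println(stream, meta, parsed)
}

The key point is that there is a single sort over all labels and no per-category copy; segregation is one linear pass over the already-sorted slice.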

Results:

BenchmarkStreamLineSampleExtractor_Process

[CPU profile before: cpu_before]
[CPU profile after: cpu_aft]
[Memory profile before: mem_before]
[Memory profile after: mem_after]

BenchmarkReadWithStructuredMetadata (creates a memchunk and iterates over it)

[CPU and memory profiles: Screenshot 2024-11-22 at 14 05 34]
[benchstat result: benchstat]

Overall summary of the results:

Allowing for some variability in the measurements, the new implementation is at least 28% faster than the old one, with a dramatic 89% improvement in memory usage. Each run also used 79.8% fewer allocs/op than the old implementation.

Which issue(s) this PR fixes:

Special notes for your reviewer:

Checklist

  • Reviewed the CONTRIBUTING.md guide (required)
  • Documentation added
  • Tests updated
  • Title matches the required conventional commits format, see here
    • Note that Promtail is considered to be feature complete, and future development for logs collection will be in Grafana Alloy. As such, feat PRs are unlikely to be accepted unless a case can be made for the feature actually being a bug fix to existing behavior.
  • Changes that require user attention or interaction to upgrade are documented in docs/sources/setup/upgrade/_index.md
  • If the change is deprecating or removing a configuration option, update the deprecated-config.yaml and deleted-config.yaml files respectively in the tools/deprecated-config-checker directory. Example PR

@shantanualsi shantanualsi marked this pull request as ready for review November 22, 2024 09:29
@shantanualsi shantanualsi requested a review from a team as a code owner November 22, 2024 09:29
if cached, ok := b.resultCache[hash]; ok {
	return cached
}

result := NewLabelsResult(b.buf.String(), hash, stream, structuredMetadata, parsed)
// Now segregate the sorted labels into their categories
var stream, meta, parsed []labels.Label
Contributor review comment:
Should you push it and re-use those slices from the same labels builder?

// Check which category this label belongs to
if labelsContain(b.add[ParsedLabel], l.Name) {
	parsed = append(parsed, l)
} else if labelsContain(b.add[StructuredMetadataLabel], l.Name) {
Contributor review comment:

Might be wise to do this test first.
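If the suggestion is to test the structured-metadata set before the parsed one (one plausible reading of "this test first"), the check order would simply flip; a hypothetical sketch, not the merged code:

// Hypothetical reordering: check structured metadata before parsed labels.
if labelsContain(b.add[StructuredMetadataLabel], l.Name) {
	meta = append(meta, l)
} else if labelsContain(b.add[ParsedLabel], l.Name) {
	parsed = append(parsed, l)
} else {
	stream = append(stream, l)
}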

@cyriltovena (Contributor) left a review:

LGTM

Some suggestions left.

Do you think we should also cache that call:

func (l labelsResult) Labels() labels.Labels {
	return flattenLabels(nil, l.stream, l.structuredMetadata, l.parsed)
}

Seems like we could cache it if it's being run multiple times on the same object, since it's immutable.
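One possible shape for that caching, should it ever return to a hot path: a hedged sketch using sync.Once, where the struct fields and the switch to a pointer receiver are assumptions rather than the actual Loki code:

// Hypothetical sketch: lazily cache the flattened labels, which is safe
// because labelsResult is immutable. Field names are assumptions; flattenLabels
// is the existing helper referenced above.
type labelsResult struct {
	stream, structuredMetadata, parsed labels.Labels

	flattenOnce sync.Once
	flattened   labels.Labels
}

func (l *labelsResult) Labels() labels.Labels {
	l.flattenOnce.Do(func() {
		l.flattened = flattenLabels(nil, l.stream, l.structuredMetadata, l.parsed)
	})
	return l.flattened
}

Note that the current value receiver would copy the sync.Once, so the receiver would have to become a pointer for this to work.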

@shantanualsi (Contributor, Author):
Thanks! Will address the comments in a separate PR.

@shantanualsi shantanualsi merged commit 2ae1ead into main Nov 22, 2024
59 checks passed
@shantanualsi shantanualsi deleted the shantanu/optimize-grouped-labels branch November 22, 2024 12:47
@shantanualsi (Contributor, Author):
To address the comments here: re-using the slices as suggested seems to increase in-use memory compared to initializing fresh slices for the parsed, structured metadata, and stream labels.
main...shantanu/improve-iterator-optimization

Also, the call func (l labelsResult) Labels() is now only used in tests and is no longer in the critical path. We don't need flattenLabels, since all the labels are already stored in the buffer and then sorted.
