
chore(metric_extraction): Optimize labels result #15068

Merged
merged 1 commit into main on Nov 22, 2024

Conversation

@shantanualsi (Contributor) commented on Nov 22, 2024

What this PR does / why we need it:

The current implementation of the LabelsResult method, used in critical flows such as metrics generation and the pipeline, fetched the unsorted labels from the buffer for each category (parsed labels, structured metadata, stream labels) and sorted each category individually. The sorted results are cached in memory. Most of the resource utilisation came from sorting the labels of each category and creating a copy from the buffer.

The new implementation fetches all unsorted labels, sorts them collectively, and caches the result first. The individual categories are segregated after caching.

(Notice that labels.Copy is gone from the memory profile of the new implementation.)
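For context, a minimal, self-contained Go sketch of the "sort once, then segregate" pattern described above; the type and helper names here are illustrative assumptions, not the actual Loki code:

// Sketch only: sorts all labels collectively, then splits the sorted result
// into categories. Names are assumptions, not Loki's internal types.
package main

import (
	"fmt"
	"sort"
)

type label struct{ Name, Value string }

func contains(names map[string]struct{}, name string) bool {
	_, ok := names[name]
	return ok
}

// segregate sorts every label once, collectively, then splits the sorted
// slice into stream, structured-metadata, and parsed categories. Each
// category stays sorted because the input is already sorted.
func segregate(all []label, parsedNames, metaNames map[string]struct{}) (stream, meta, parsed []label) {
	sort.Slice(all, func(i, j int) bool { return all[i].Name < all[j].Name })
	for _, l := range all {
		switch {
		case contains(parsedNames, l.Name):
			parsed = append(parsed, l)
		case contains(metaNames, l.Name):
			meta = append(meta, l)
		default:
			stream = append(stream, l)
		}
	}
	return stream, meta, parsed
}

func main() {
	all := []label{{"level", "info"}, {"app", "loki"}, {"trace_id", "abc"}}
	stream, meta, parsed := segregate(all,
		map[string]struct{}{"level": {}},
		map[string]struct{}{"trace_id": {}})
	fmt.Println(stream, meta, parsed)
}

The key point is that there is a single sort over all labels and no per-category copy; segregation is one linear pass over the already-sorted slice.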

Results:

BenchmarkStreamLineSampleExtractor_Process

[CPU profile before: cpu_before]
[CPU profile after: cpu_aft]
[Memory profile before: mem_before]
[Memory profile after: mem_after]

BenchmarkReadWithStructuredMetadata (creates a memchunk and iterates over it)

[CPU and memory profiles: Screenshot 2024-11-22 at 14 05 34]
[benchstat result: benchstat]

Overall summary of the results:

Allowing for some variability in the measurements, the new implementation is at least 28% faster than the old one, with a dramatic 89% improvement in memory usage. Each run also used 79.8% fewer allocs/op than the old implementation.

Which issue(s) this PR fixes:

Special notes for your reviewer:

Checklist

  • Reviewed the CONTRIBUTING.md guide (required)
  • Documentation added
  • Tests updated
  • Title matches the required conventional commits format, see here
    • Note that Promtail is considered to be feature complete, and future development for logs collection will be in Grafana Alloy. As such, feat PRs are unlikely to be accepted unless a case can be made for the feature actually being a bug fix to existing behavior.
  • Changes that require user attention or interaction to upgrade are documented in docs/sources/setup/upgrade/_index.md
  • If the change is deprecating or removing a configuration option, update the deprecated-config.yaml and deleted-config.yaml files respectively in the tools/deprecated-config-checker directory. Example PR

@shantanualsi shantanualsi marked this pull request as ready for review November 22, 2024 09:29
@shantanualsi shantanualsi requested a review from a team as a code owner November 22, 2024 09:29
if cached, ok := b.resultCache[hash]; ok {
	return cached
}

result := NewLabelsResult(b.buf.String(), hash, stream, structuredMetadata, parsed)
// Now segregate the sorted labels into their categories
var stream, meta, parsed []labels.Label
Contributor review comment:
Should you push it and re-use those slices from the same labels builder?

// Check which category this label belongs to
if labelsContain(b.add[ParsedLabel], l.Name) {
	parsed = append(parsed, l)
} else if labelsContain(b.add[StructuredMetadataLabel], l.Name) {
Contributor review comment:

Might be wise to do this test first.
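If the suggestion is to test the structured-metadata set before the parsed one (one plausible reading of "this test first"), the check order would simply flip; a hypothetical sketch, not the merged code:

// Hypothetical reordering: check structured metadata before parsed labels.
if labelsContain(b.add[StructuredMetadataLabel], l.Name) {
	meta = append(meta, l)
} else if labelsContain(b.add[ParsedLabel], l.Name) {
	parsed = append(parsed, l)
} else {
	stream = append(stream, l)
}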

@cyriltovena (Contributor) left a review:

LGTM

Some suggestions left.

Do you think we should also cache that call:

func (l labelsResult) Labels() labels.Labels {
	return flattenLabels(nil, l.stream, l.structuredMetadata, l.parsed)
}

Seems like we could cache it if it's being run multiple times on the same object, since it's immutable.
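One possible shape for that caching, should it ever return to a hot path: a hedged sketch using sync.Once, where the struct fields and the switch to a pointer receiver are assumptions rather than the actual Loki code:

// Hypothetical sketch: lazily cache the flattened labels, which is safe
// because labelsResult is immutable. Field names are assumptions; flattenLabels
// is the existing helper referenced above.
type labelsResult struct {
	stream, structuredMetadata, parsed labels.Labels

	flattenOnce sync.Once
	flattened   labels.Labels
}

func (l *labelsResult) Labels() labels.Labels {
	l.flattenOnce.Do(func() {
		l.flattened = flattenLabels(nil, l.stream, l.structuredMetadata, l.parsed)
	})
	return l.flattened
}

Note that the current value receiver would copy the sync.Once, so the receiver would have to become a pointer for this to work.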

@shantanualsi (Contributor, Author):
Thanks! Will address the comments in a separate PR.

@shantanualsi shantanualsi merged commit 2ae1ead into main Nov 22, 2024
59 checks passed
@shantanualsi shantanualsi deleted the shantanu/optimize-grouped-labels branch November 22, 2024 12:47
@shantanualsi (Contributor, Author):
To address the comments here: re-using the slices as suggested seems to increase in-use memory compared to initializing fresh slices for the parsed, structured metadata, and stream labels.
main...shantanu/improve-iterator-optimization

Also, the call func (l labelsResult) Labels() is now only used in tests and is no longer in the critical path. We don't need flattenLabels, since all the labels are already stored in the buffer and then sorted.
