
ddtrace/tracer: optimize span tags storage in the hot path #2799

Draft · wants to merge 17 commits into base: main

Conversation

@darccio (Member) commented Jul 26, 2024

What does this PR do?

Implements an automated optimizer that selects the best initial size for the maps used to store metas and metrics (also known as tags). It is based on sketches-go and will be configurable, so customers can choose the percentile used to initialize the Meta map size.
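
To illustrate the idea, here is a minimal sketch of how the sketch-based sizing could work. The tagSizeEstimator helper, the 1% relative accuracy, and the fallback size are hypothetical and not taken from this branch:

```go
package tracer

import (
	"sync"

	"github.com/DataDog/sketches-go/ddsketch"
)

// tagSizeEstimator is a hypothetical helper that records how many meta tags
// finished spans ended up with and suggests an initial map size based on a
// configurable percentile of that distribution.
type tagSizeEstimator struct {
	mu       sync.Mutex
	sketch   *ddsketch.DDSketch
	quantile float64 // e.g. 0.80 to cover 80% of spans without regrowth
}

func newTagSizeEstimator(quantile float64) (*tagSizeEstimator, error) {
	// 1% relative accuracy is plenty for map sizing.
	s, err := ddsketch.NewDefaultDDSketch(0.01)
	if err != nil {
		return nil, err
	}
	return &tagSizeEstimator{sketch: s, quantile: quantile}, nil
}

// record is called when a span finishes, with its final number of meta tags.
func (e *tagSizeEstimator) record(numTags int) {
	e.mu.Lock()
	defer e.mu.Unlock()
	_ = e.sketch.Add(float64(numTags))
}

// suggestedSize returns the initial capacity to use for new Meta maps.
func (e *tagSizeEstimator) suggestedSize() int {
	e.mu.Lock()
	defer e.mu.Unlock()
	v, err := e.sketch.GetValueAtQuantile(e.quantile)
	if err != nil {
		return 8 // arbitrary fallback while the sketch is still empty
	}
	return int(v)
}
```

New spans would then allocate `make(map[string]string, estimator.suggestedSize())` instead of growing from an empty map.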

We are also considering whether there is a benefit in pooling the maps that are released when a span finishes.
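
As a minimal sketch of the pooling idea (the metaPool helper below is hypothetical, not code from this branch), finished spans would clear their map and hand it back to a sync.Pool for reuse:

```go
package tracer

import "sync"

// metaPool is a hypothetical pool of meta maps, shown only to illustrate the
// trade-off being considered.
var metaPool = sync.Pool{
	New: func() any {
		// Default initial size; in practice it could come from the estimator.
		return make(map[string]string, 8)
	},
}

// getMeta obtains a possibly recycled meta map for a new span.
func getMeta() map[string]string {
	return metaPool.Get().(map[string]string)
}

// putMeta clears the map and returns it to the pool when the span finishes.
// Clearing keeps the allocated buckets, so a recycled map retains the capacity
// of its previous owner: fewer regrowths, but potentially more retained memory.
func putMeta(m map[string]string) {
	clear(m) // Go 1.21+
	metaPool.Put(m)
}
```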

Motivation

Data gathered from our intake shows that each service has different usage patterns and requirements for tags. While some services generate spans with a minimal number of tags (around 5 meta tags and no metrics), others generate thousands of meta tags or thousands of metric tags.

In general, our internal data shows a roughly 80/20 proportion between meta tags (strings) and metric tags (float64).

This indicates that services within the same organization may have wildly different needs. This proposal therefore tries to offer a solution that adapts to each service, rather than applying some universal or average size that doesn't fit all of them.

Pros

This approach has multiple advantages:

  • It reduces map regrowth, as maps are initialized close to the typical size observed for the service.
  • It automatically adapts memory usage to each service's workload.

Cons

Some disadvantages are:

  • It will consume more memory than required if newer spans use fewer tags.

Benchmarks

Original: map-based

The following results show the impact that map allocations have on performance, supporting the idea behind this proposal:

BenchmarkSpanSetMeta/baseline-10                100000000               12.19 ns/op              1.000 tags/op         0 B/op          0 allocs/op
BenchmarkSpanSetMeta/random_number_of_tags_(meta_size=8)-10               447457              2372 ns/op                28.00 tags/op       3661 B/op          4 allocs/op
BenchmarkSpanSetMeta/random_number_of_tags_(meta_size=14)-10              748792              1545 ns/op                28.00 tags/op       1586 B/op          1 allocs/op
BenchmarkSpanSetMeta/random_number_of_tags_(meta_size=22)-10              780546              1455 ns/op                28.00 tags/op       1589 B/op          1 allocs/op
BenchmarkSpanSetMeta/random_number_of_tags_(meta_size=31)-10             1000000              1002 ns/op                28.00 tags/op        300 B/op          0 allocs/op
BenchmarkSpanSetMeta/random_number_of_tags_(meta_size=39)-10             1202672              1019 ns/op                28.00 tags/op        300 B/op          0 allocs/op
BenchmarkSpanSetMeta/random_number_of_tags_(meta_size=43)-10             1247433              1025 ns/op                28.00 tags/op        299 B/op          0 allocs/op
BenchmarkSpanSetMeta/random_number_of_tags_(meta_size=54)-10             1315690               893.9 ns/op              28.00 tags/op          0 B/op          0 allocs/op
BenchmarkSpanSetMeta/random_number_of_tags_(meta_size=70)-10             1407741               863.2 ns/op              28.00 tags/op          0 B/op          0 allocs/op

The average number of tags is based on the distribution obtained from our intake. The baseline simply sets a single tag over and over (b.N times) on a map initialized with a size of 5. Allocations are caused by map regrowth, and the average time corresponds to setting between 5 and 70 tags per span, selected randomly by the distribution-based number generator, on each of the b.N spans.
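
For reference, here is a simplified sketch of what such a benchmark could look like. The uniform randomTagCount stand-in replaces the distribution-based generator used in the real benchmark:

```go
package tracer

import (
	"fmt"
	"math/rand"
	"testing"
)

// randomTagCount is a stand-in for the distribution-based generator used in
// the real benchmark; here it simply picks uniformly between 5 and 70.
func randomTagCount() int {
	return 5 + rand.Intn(66)
}

func BenchmarkSpanSetMeta(b *testing.B) {
	// Pre-generate keys so string formatting doesn't dominate the measurement.
	keys := make([]string, 70)
	for i := range keys {
		keys[i] = fmt.Sprintf("key%d", i)
	}
	for _, metaSize := range []int{8, 14, 22, 31, 39, 43, 54, 70} {
		b.Run(fmt.Sprintf("random_number_of_tags_(meta_size=%d)", metaSize), func(b *testing.B) {
			b.ReportAllocs()
			var totalTags int
			for i := 0; i < b.N; i++ {
				meta := make(map[string]string, metaSize)
				n := randomTagCount()
				for j := 0; j < n; j++ {
					meta[keys[j]] = "value"
				}
				totalTags += n
			}
			// Report the average number of tags set per span.
			b.ReportMetric(float64(totalTags)/float64(b.N), "tags/op")
		})
	}
}
```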

The upcoming Go 1.24 shows different results from previous versions because Swiss maps are enabled by default.

BenchmarkSpanSetMeta/baseline-10                53694176                21.88 ns/op              1.000 tags/op         0 B/op          0 allocs/op
BenchmarkSpanSetMeta/random_number_of_tags_(meta_size=8)-10               534418              2026 ns/op                28.00 tags/op       3178 B/op          6 allocs/op
BenchmarkSpanSetMeta/random_number_of_tags_(meta_size=14)-10              740268              1626 ns/op                28.00 tags/op       2280 B/op          2 allocs/op
BenchmarkSpanSetMeta/random_number_of_tags_(meta_size=22)-10             1000000              1133 ns/op                28.00 tags/op       1214 B/op          1 allocs/op
BenchmarkSpanSetMeta/random_number_of_tags_(meta_size=31)-10             1614978               747.3 ns/op              28.00 tags/op         48 B/op          0 allocs/op
BenchmarkSpanSetMeta/random_number_of_tags_(meta_size=39)-10             1574341               731.1 ns/op              28.00 tags/op         48 B/op          0 allocs/op
BenchmarkSpanSetMeta/random_number_of_tags_(meta_size=43)-10             1489554               736.3 ns/op              28.00 tags/op         49 B/op          0 allocs/op
BenchmarkSpanSetMeta/random_number_of_tags_(meta_size=54)-10             1504778               736.9 ns/op              28.00 tags/op         48 B/op          0 allocs/op
BenchmarkSpanSetMeta/random_number_of_tags_(meta_size=70)-10             1441506               935.5 ns/op              28.00 tags/op          0 B/op          0 allocs/op

Other implementations considered

Other approaches have been explored:

  • A linked list of tag holder structs with a tailored pool. It used unsafe. Reference implementation here.
  • Using channels to pool tag holder structs. Not pushed to the branch; it had poor concurrent performance.
  • puzpuzpuz/xsync-based implementation. Results below.
  • jongyunha/shrinkmap-based implementation. Results below.
# github.com/puzpuzpuz/xsync

BenchmarkSpanSetMeta/baseline-10         	12127081	        98.99 ns/op	         1.000 tags/op	      48 B/op	       3 allocs/op
BenchmarkSpanSetMeta/random_number_of_tags_(meta_size=8)-10         	  254242	      4554 ns/op	        28.00 tags/op	    5942 B/op	      92 allocs/op
BenchmarkSpanSetMeta/random_number_of_tags_(meta_size=14)-10        	  288242	      4407 ns/op	        28.00 tags/op	    5942 B/op	      92 allocs/op
BenchmarkSpanSetMeta/random_number_of_tags_(meta_size=22)-10        	  275898	      4385 ns/op	        28.00 tags/op	    5942 B/op	      92 allocs/op
BenchmarkSpanSetMeta/random_number_of_tags_(meta_size=31)-10        	  277908	      4278 ns/op	        28.00 tags/op	    5941 B/op	      92 allocs/op
BenchmarkSpanSetMeta/random_number_of_tags_(meta_size=39)-10        	  282535	      4247 ns/op	        28.00 tags/op	    5944 B/op	      92 allocs/op
BenchmarkSpanSetMeta/random_number_of_tags_(meta_size=43)-10        	  285816	      4537 ns/op	        28.00 tags/op	    5941 B/op	      92 allocs/op
BenchmarkSpanSetMeta/random_number_of_tags_(meta_size=54)-10        	  301698	      4333 ns/op	        28.00 tags/op	    5943 B/op	      92 allocs/op
BenchmarkSpanSetMeta/random_number_of_tags_(meta_size=70)-10        	  299659	      4131 ns/op	        28.00 tags/op	    5943 B/op	      92 allocs/op

# github.com/jongyunha/shrinkmap

BenchmarkSpanSetMeta/baseline-10         	45800142	        26.68 ns/op	         1.000 tags/op	       0 B/op	       0 allocs/op
BenchmarkSpanSetMeta/random_number_of_tags_(meta_size=8)-10         	  228518	      4714 ns/op	        28.00 tags/op	    4354 B/op	      16 allocs/op
BenchmarkSpanSetMeta/random_number_of_tags_(meta_size=14)-10        	  310806	      4780 ns/op	        28.00 tags/op	    4346 B/op	      16 allocs/op
BenchmarkSpanSetMeta/random_number_of_tags_(meta_size=22)-10        	  303002	      4720 ns/op	        28.00 tags/op	    4366 B/op	      16 allocs/op
BenchmarkSpanSetMeta/random_number_of_tags_(meta_size=31)-10        	  310052	      4018 ns/op	        28.00 tags/op	    4390 B/op	      16 allocs/op
BenchmarkSpanSetMeta/random_number_of_tags_(meta_size=39)-10        	  317164	      4015 ns/op	        28.00 tags/op	    4341 B/op	      16 allocs/op
BenchmarkSpanSetMeta/random_number_of_tags_(meta_size=43)-10        	  286071	      4449 ns/op	        28.00 tags/op	    4363 B/op	      16 allocs/op
BenchmarkSpanSetMeta/random_number_of_tags_(meta_size=54)-10        	  290245	      4154 ns/op	        28.00 tags/op	    4312 B/op	      16 allocs/op
BenchmarkSpanSetMeta/random_number_of_tags_(meta_size=70)-10        	  286952	      4350 ns/op	        28.00 tags/op	    4365 B/op	      16 allocs/op

These approaches were discarded because of their performance, their complexity, or their lack of type safety.

Reviewer's Checklist

  • Changed code has unit tests for its functionality at or near 100% coverage.
  • System-Tests covering this feature have been added and enabled with the va.b.c-dev version tag.
  • There is a benchmark for any new code, or changes to existing code.
  • If this interacts with the agent in a new way, a system test has been added.
  • Add an appropriate team label so this PR gets put in the right place for the release notes.
  • Non-trivial go.mod changes, e.g. adding new modules, are reviewed by @DataDog/dd-trace-go-guild.

Unsure? Have a question? Request a review!

Base automatically changed from dario.castane/apmlp-87/handle-ptrs-settag to main August 5, 2024 09:40
@pr-commenter bot commented Aug 6, 2024

Benchmarks

Benchmark execution time: 2024-12-02 16:08:30

Comparing candidate commit 51cf835 in PR branch dario.castane/exp-optimize-settags with baseline commit 697219e in branch main.

Found 0 performance improvements and 8 performance regressions! Performance is the same for 50 metrics; 1 metric is unstable.

scenario:BenchmarkPartialFlushing/Disabled-24

  • 🟥 execution_time [+7.092ms; +10.172ms] or [+2.638%; +3.784%]

scenario:BenchmarkPartialFlushing/Enabled-24

  • 🟥 execution_time [+6.160ms; +9.569ms] or [+2.237%; +3.475%]

scenario:BenchmarkSetTagMetric-24

  • 🟥 execution_time [+9.520ns; +12.040ns] or [+8.033%; +10.160%]

scenario:BenchmarkSetTagString-24

  • 🟥 execution_time [+10.311ns; +13.289ns] or [+9.113%; +11.746%]

scenario:BenchmarkSetTagStringer-24

  • 🟥 execution_time [+10.961ns; +16.159ns] or [+7.834%; +11.549%]

scenario:BenchmarkSingleSpanRetention/no-rules-24

  • 🟥 execution_time [+6.987µs; +10.504µs] or [+2.950%; +4.435%]

scenario:BenchmarkSingleSpanRetention/with-rules/match-all-24

  • 🟥 execution_time [+6.893µs; +10.079µs] or [+2.887%; +4.222%]

scenario:BenchmarkSingleSpanRetention/with-rules/match-half-24

  • 🟥 execution_time [+7.761µs; +11.563µs] or [+3.261%; +4.859%]

This PR is stale because it has been open 20 days with no activity. Remove stale label or comment or this will be closed in 10 days.

@github-actions github-actions bot added the stale Stuck for more than 1 month label Aug 28, 2024
@darccio darccio added do-not-merge/WIP and removed stale Stuck for more than 1 month labels Oct 1, 2024