ddtrace/tracer: runtime metrics v2 (exclude from release notes) #2772

felixge · 2024-07-05T08:39:45Z

What does this PR do?

⚠️ IMPORTANT ⚠️: This is not intended to be used by customers yet. DO NOT ENABLE this yet. We are still evaluating this feature internally and may decide to remove it again.

Implements the DD_RUNTIME_METRICS_V2_ENABLED environment variable, which switches runtime metrics collection to Go's newer runtime/metrics package. This gives us access to metrics that are not available through runtime.ReadMemStats, e.g. scheduler latency.
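
For readers unfamiliar with runtime/metrics, here is a minimal sketch of reading the scheduler latency histogram directly from the standard library. It is not the code added by this PR; the helper name schedLatencyCount is made up for illustration, and only the metric name /sched/latencies:seconds and the runtime/metrics calls come from Go itself.

```go
package main

import (
	"fmt"
	"runtime/metrics"
)

// schedLatencyCount (hypothetical helper) reads the scheduler latency
// histogram and returns how many latency observations the runtime has
// recorded so far. The second return value is false when the running Go
// version does not expose the metric.
func schedLatencyCount() (uint64, bool) {
	sample := []metrics.Sample{{Name: "/sched/latencies:seconds"}}
	metrics.Read(sample)
	if sample[0].Value.Kind() != metrics.KindFloat64Histogram {
		return 0, false
	}
	h := sample[0].Value.Float64Histogram()
	var total uint64
	for _, c := range h.Counts {
		total += c
	}
	return total, true
}

func main() {
	n, ok := schedLatencyCount()
	fmt.Println("supported:", ok, "observations:", n)
}
```

runtime.MemStats has no equivalent field for this data, which is the kind of gap the description refers to.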

Motivation

Reviewer's Checklist

Changed code has unit tests for its functionality at or near 100% coverage.
System-Tests covering this feature have been added and enabled with the va.b.c-dev version tag.
There is a benchmark for any new code, or changes to existing code.
If this interacts with the agent in a new way, a system test has been added.
Add an appropriate team label so this PR gets put in the right place for the release notes.
Non-trivial go.mod changes, e.g. adding new modules, are reviewed by @DataDog/dd-trace-go-guild.

Unsure? Have a question? Request a review!

ddtrace/tracer/telemetry.go

github-actions · 2024-08-23T01:51:34Z

This PR is stale because it has been open 20 days with no activity. Remove stale label or comment or this will be closed in 10 days.

pr-commenter · 2024-08-28T16:32:10Z

Benchmarks

Benchmark execution time: 2024-11-06 17:40:25

Comparing candidate commit b1a595b in PR branch felix.geisendoerfer/PROF-8665-experimental-runtime-v2-metrics with baseline commit c9fc691 in branch main.

Found 1 performance improvements and 0 performance regressions! Performance is the same for 58 metrics, 0 unstable metrics.

scenario:BenchmarkTracerAddSpans-24

🟩 execution_time [-167.076ns; -87.124ns] or [-4.203%; -2.192%]

anatolebeuzon

lgtm! (just need to fix the failing CI tests)

github-actions · 2024-09-18T01:55:11Z

This PR is stale because it has been open 20 days with no activity. Remove stale label or comment or this will be closed in 10 days.

darccio · 2024-10-01T14:48:30Z

@felixge When is this PR planned to be opened for review/merge?

felixge · 2024-10-13T09:14:37Z

I'm planning to get this merged, probably in 1.5 weeks from now (I have time blocked to work on this project every 2 weeks).

this toil shouldn't be needed, will try to refactor this later

mtoffl01

Reviewed this just because I'm interested! I don't have context on this change, so I left a few questions in the comments.

Also: Curious why we are calling this "runtime metrics v2" if the feature collects perf metrics but about a different process than v1 -- runtime metrics "v1" collects runtime metrics about the process the tracer is running in, whereas "v2" collects runtime metrics about dd-trace-go (and the agent?), as I understand it. Based on the name, I would've expected runtime metrics v2 to do the same thing as v1 but maybe with improved accuracy/more data.

mtoffl01 · 2024-11-06T15:11:09Z

internal/apps/apps.go

+	// Enabled runtime metrics v2 by default
+	if v := os.Getenv("DD_RUNTIME_METRICS_V2_ENABLED"); v == "" {
+		os.Setenv("DD_RUNTIME_METRICS_V2_ENABLED", "true")
+	}


Why do this rather than change line 387 in ddtrace/tracer/option.go to c.runtimeMetricsV2 = internal.BoolVal("DD_RUNTIME_METRICS_V2_ENABLED", true)?

This feature is not meant to be enabled for customers yet. We want to add it to dd-trace-go behind an env var so that we can experiment with it in the backend go repos at Datadog. If that goes well, we'll document and advertise this feature and probably turn it on by default.

mtoffl01 · 2024-11-06T15:11:55Z

ddtrace/tracer/tracer.go

@@ -329,6 +331,14 @@ func newTracer(opts ...StartOption) *tracer {
 			t.reportRuntimeMetrics(defaultMetricsReportInterval)
 		}()
 	}
+	if c.runtimeMetricsV2 {
+		l := slog.New(slogHandler{})


Why use slog instead of tracer logger? Do these logs get routed somewhere else that customers don't see / only we see somewhere?

Why use slog instead of tracer logger?

We want to share the new runtime metrics implementation between dd-trace-go and the Datadog Agent in the future. The latter doesn't use dd-trace-go to instrument itself, which is why we put the code in https://github.com/DataDog/go-runtime-metrics-internal. For that repo we decided to use slog as the logging interface, as it's the new standard Go logger. We should consider adopting it in dd-trace-go itself in the future as well, but for now we decided to integrate with our internal logger via an adapter. Let me know if that makes sense.

Do these logs get routed somewhere else that customers don't see / only we see somewhere?

No, the slog logs just get forwarded to dd-trace-go's internal logger and end up in the customer's stdout as usual.

felixge · 2024-11-06T16:49:40Z

Reviewed this just because I'm interested! I don't have context on this change, so I left a few questions in the comments.

Sorry, I was going to write a PR description, forgot that I hadn't done it when pressing "ready for review" 🙈. We have a little squad to work on this (Anatole, Nayef and I), so I was assuming only they would end up reviewing.

Also: Curious why we are calling this "runtime metrics v2" if the feature collects perf metrics but about a different process than v1 -- runtime metrics "v1" collects runtime metrics about the process the tracer is running in, whereas "v2" collects runtime metrics about dd-trace-go (and the agent?), as I understand it. Based on the name, I would've expected runtime metrics v2 to do the same thing as v1 but maybe with improved accuracy/more data.

Runtime metrics v2 serves the same purpose as v1. The only difference is that it uses the "new" runtime/metrics package from Go rather than the old runtime.ReadMemStats interface. This gives us access to some new metrics that were previously unavailable.
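
To make the "same purpose, different interface" point concrete, here is a small sketch that reads the same quantity (the number of heap objects) through both standard library interfaces. The function names are made up for illustration; this is not how either runtime metrics implementation in dd-trace-go is written.

```go
package main

import (
	"fmt"
	"runtime"
	"runtime/metrics"
)

// heapObjectsV1 uses the older runtime.ReadMemStats interface.
func heapObjectsV1() uint64 {
	var ms runtime.MemStats
	runtime.ReadMemStats(&ms)
	return ms.HeapObjects
}

// heapObjectsV2 reads the corresponding value through runtime/metrics.
// It reports false if the running Go version does not expose the metric.
func heapObjectsV2() (uint64, bool) {
	s := []metrics.Sample{{Name: "/gc/heap/objects:objects"}}
	metrics.Read(s)
	if s[0].Value.Kind() != metrics.KindUint64 {
		return 0, false
	}
	return s[0].Value.Uint64(), true
}

func main() {
	v2, ok := heapObjectsV2()
	// The two reads happen at slightly different moments, so the values
	// are expected to be close rather than identical.
	fmt.Println("v1:", heapObjectsV1(), "v2:", v2, "supported:", ok)
}
```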

felixge · 2024-11-06T16:52:33Z

ddtrace/tracer/option.go

@@ -381,6 +384,7 @@ func newConfig(opts ...StartOption) *config {
 	}
 	c.logStartup = internal.BoolEnv("DD_TRACE_STARTUP_LOGS", true)
 	c.runtimeMetrics = internal.BoolVal(getDDorOtelConfig("metrics"), false)
+	c.runtimeMetricsV2 = internal.BoolEnv("DD_RUNTIME_METRICS_V2_ENABLED", false)


NIT from @Gandem: We should consider adding a test for this.

Resolution: Will probably add one in a follow-up PR.

ddtrace/tracer: runtime metrics v2

a7e5c8d

felixge commented Jul 17, 2024

ddtrace/tracer/telemetry.go

felixge changed the title ~~ddtrace/tracer: runtime metrics v2~~ ddtrace/tracer: runtime metrics v2 (exclude from release notes) Jul 17, 2024

nsrip-dd added the no-changelog label Aug 2, 2024

github-actions bot added the stale Stuck for more than 1 month label Aug 23, 2024

felixge removed the stale Stuck for more than 1 month label Aug 28, 2024

Merge remote-tracking branch 'origin/main' into felix.geisendoerfer/P…

0f5b5f6

…ROF-8665-experimental-runtime-v2-metrics

fix: test failure b/c of global log level leak

fdd5e08

anatolebeuzon approved these changes Aug 28, 2024

fix internal apps

1276984

anatolebeuzon reviewed Aug 28, 2024

felixge added 3 commits August 28, 2024 22:31

copyright headers

b070f10

fix one more log level leak

2b9ebd0

one more go.mod

160885f

github-actions bot added the stale Stuck for more than 1 month label Sep 18, 2024

darccio added do-not-merge/WIP and removed stale Stuck for more than 1 month labels Oct 1, 2024

felixge added 4 commits October 23, 2024 15:28

Merge remote-tracking branch 'origin/main' into felix.geisendoerfer/P…

17d94f1

…ROF-8665-experimental-runtime-v2-metrics

fix global state pollution in a better way

535a60a

Merge remote-tracking branch 'origin/main' into felix.geisendoerfer/P…

7fcc908

…ROF-8665-experimental-runtime-v2-metrics

statsdtest: implement new mock methods

42701f7

DataDog deleted a comment from github-actions bot Nov 6, 2024

felixge added 2 commits November 6, 2024 15:31

add startup log

110e409

fix startup logs

a4be8ae

this toil shouldn't be needed, will try to refactor this later

felixge marked this pull request as ready for review November 6, 2024 15:12

felixge requested review from a team as code owners November 6, 2024 15:12

mtoffl01 reviewed Nov 6, 2024

felixge added 2 commits November 6, 2024 17:40

fix: use BoolEnv instead of BoolVal

a5e1eba

Upgrade go modules

f16058f

felixge commented Nov 6, 2024

Gandem approved these changes Nov 6, 2024

tidy go.mod

b1a595b

felixge removed the do-not-merge/WIP label Nov 6, 2024

mtoffl01 approved these changes Nov 6, 2024

felixge enabled auto-merge (squash) November 7, 2024 06:24

Merge branch 'main' into felix.geisendoerfer/PROF-8665-experimental-r…

bae27e7

…untime-v2-metrics

felixge merged commit bebced4 into main Nov 7, 2024
171 checks passed

felixge deleted the felix.geisendoerfer/PROF-8665-experimental-runtime-v2-metrics branch November 7, 2024 06:34
