-
Notifications
You must be signed in to change notification settings - Fork 442
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
ddtrace/tracer: runtime metrics v2 (exclude from release notes) #2772
ddtrace/tracer: runtime metrics v2 (exclude from release notes) #2772
Conversation
This PR is stale because it has been open 20 days with no activity. Remove stale label or comment or this will be closed in 10 days. |
…ROF-8665-experimental-runtime-v2-metrics
BenchmarksBenchmark execution time: 2024-11-06 17:40:25 Comparing candidate commit b1a595b in PR branch Found 1 performance improvements and 0 performance regressions! Performance is the same for 58 metrics, 0 unstable metrics. scenario:BenchmarkTracerAddSpans-24
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
lgtm! (just need to fix the failing CI tests)
This PR is stale because it has been open 20 days with no activity. Remove stale label or comment or this will be closed in 10 days. |
@felixge When it's planned to open the PR for review/merge? |
I'm planning to merge this merged, probably in 1.5 weeks from now (got time blocked to work on this project every 2 weeks). |
this toil shouldn't be needed, will try to refactor this later
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewed this just because I'm interested! I don't have context on this change, so I left a few questions in the comments.
Also: Curious why we are calling this "runtime metrics v2" if the feature collects perf metrics but about a different process than v1 -- runtime metrics "v1" collects runtime metrics about the process the tracer is running in, whereas "v2" collects runtime metrics about dd-trace-go (and the agent?), as I understand it. Based on the name, I would've expected runtime metrics v2 to do the same thing as v1 but maybe with improved accuracy/more data.
// Enabled runtime metrics v2 by default | ||
if v := os.Getenv("DD_RUNTIME_METRICS_V2_ENABLED"); v == "" { | ||
os.Setenv("DD_RUNTIME_METRICS_V2_ENABLED", "true") | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why do this rather than change line 387 in ddtrace/tracer/option.go to c.runtimeMetricsV2 = internal.BoolVal("DD_RUNTIME_METRICS_V2_ENABLED", true)
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This feature is not meant to be enabled for customers yet. We want to add it to dd-trace-go behind an env var so that we can experiment with it in the backend go repos at Datadog. If that goes well, we'll document and advertise this feature and probably turn it on by default.
@@ -329,6 +331,14 @@ func newTracer(opts ...StartOption) *tracer { | |||
t.reportRuntimeMetrics(defaultMetricsReportInterval) | |||
}() | |||
} | |||
if c.runtimeMetricsV2 { | |||
l := slog.New(slogHandler{}) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why use slog instead of tracer logger? Do these logs get routed somewhere else that customers don't see / only we see somewhere?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why use slog instead of tracer logger?
We want to share the new runtime metrics implementation between dd-trace-go and as well as the datadog agent in the future. The latter doesn't use dd-trace-go to instrument itself, that's why we have put the code in https://github.com/DataDog/go-runtime-metrics-internal. For that repo we decided to use slog as the logging interface as it's the new standard Go logger. We should consider adopting it in dd-trace-go itself in the future as well, but for now we decided to integrate with our internal logger via an adapter. Let me know if that makes sense.
Do these logs get routed somewhere else that customers don't see / only we see somewhere?
No, the slow logs just get forwarded to dd-trace-go's internal logger and end up in the customer stdout as usual.
Sorry, I was going to write a PR description, forgot that I hadn't done it when pressing "ready for review" 🙈. We have a little squad to work on this (Anatole, Nayef and I), so I was assuming only they would end up reviewing.
runtime metrics v2 serves the same purpose as v1. The only difference is that it's using the "new" |
@@ -381,6 +384,7 @@ func newConfig(opts ...StartOption) *config { | |||
} | |||
c.logStartup = internal.BoolEnv("DD_TRACE_STARTUP_LOGS", true) | |||
c.runtimeMetrics = internal.BoolVal(getDDorOtelConfig("metrics"), false) | |||
c.runtimeMetricsV2 = internal.BoolEnv("DD_RUNTIME_METRICS_V2_ENABLED", false) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
NIT from @Gandem: We should consider adding a test for this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Resolution: Will probably add one in a follow-up PR.
…untime-v2-metrics
What does this PR do?
Implement DD_RUNTIME_METRICS_V2_ENABLED env variable which allows using the new
runtime/metrics
for runtime metrics. This gives us access to new metrics that are not available inruntime.ReadMemStats
, e.g. scheduler latency.Motivation
Reviewer's Checklist
Unsure? Have a question? Request a review!