-
Notifications
You must be signed in to change notification settings - Fork 477
fix(profiling): always report CPU time regardless of Thread running state #16273
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
fix(profiling): always report CPU time regardless of Thread running state #16273
Conversation
Codeowners resolved as |
0320373 to
75a24f3
Compare
75a24f3 to
659dec1
Compare
|
Performance SLOsComparing candidate kowalski/fix-profiling-always-report-cpu-time-regardless-of-thread-running-state (659dec1) with baseline main (5e94be5) 📈 Performance Regressions (3 suites)📈 iastaspects - 118/118✅ add_aspectTime: ✅ 110.816µs (SLO: <130.000µs 📉 -14.8%) vs baseline: +5.1% Memory: ✅ 42.920MB (SLO: <46.000MB -6.7%) vs baseline: +4.8% ✅ add_inplace_aspectTime: ✅ 106.330µs (SLO: <130.000µs 📉 -18.2%) vs baseline: +1.3% Memory: ✅ 42.959MB (SLO: <46.000MB -6.6%) vs baseline: +4.9% ✅ add_inplace_noaspectTime: ✅ 28.437µs (SLO: <40.000µs 📉 -28.9%) vs baseline: ~same Memory: ✅ 42.880MB (SLO: <46.000MB -6.8%) vs baseline: +4.6% ✅ add_noaspectTime: ✅ 49.886µs (SLO: <70.000µs 📉 -28.7%) vs baseline: +0.9% Memory: ✅ 43.096MB (SLO: <46.000MB -6.3%) vs baseline: +5.3% ✅ bytearray_aspectTime: ✅ 252.213µs (SLO: <400.000µs 📉 -36.9%) vs baseline: -1.3% Memory: ✅ 42.920MB (SLO: <46.000MB -6.7%) vs baseline: +4.9% ✅ bytearray_extend_aspectTime: ✅ 643.543µs (SLO: <800.000µs 📉 -19.6%) vs baseline: +0.1% Memory: ✅ 42.900MB (SLO: <46.000MB -6.7%) vs baseline: +4.9% ✅ bytearray_extend_noaspectTime: ✅ 271.054µs (SLO: <400.000µs 📉 -32.2%) vs baseline: +1.1% Memory: ✅ 42.920MB (SLO: <46.000MB -6.7%) vs baseline: +4.8% ✅ bytearray_noaspectTime: ✅ 140.756µs (SLO: <300.000µs 📉 -53.1%) vs baseline: +0.5% Memory: ✅ 42.959MB (SLO: <46.000MB -6.6%) vs baseline: +4.8% ✅ bytes_aspectTime: ✅ 217.888µs (SLO: <300.000µs 📉 -27.4%) vs baseline: -1.0% Memory: ✅ 42.900MB (SLO: <46.000MB -6.7%) vs baseline: +4.8% ✅ bytes_noaspectTime: ✅ 134.013µs (SLO: <200.000µs 📉 -33.0%) vs baseline: +0.9% Memory: ✅ 42.998MB (SLO: <46.000MB -6.5%) vs baseline: +5.0% ✅ bytesio_aspectTime: ✅ 3.887ms (SLO: <5.000ms 📉 -22.3%) vs baseline: -0.4% Memory: ✅ 42.900MB (SLO: <46.000MB -6.7%) vs baseline: +4.7% ✅ bytesio_noaspectTime: ✅ 314.014µs (SLO: <420.000µs 📉 -25.2%) vs baseline: -0.7% Memory: ✅ 42.920MB (SLO: <46.000MB -6.7%) vs baseline: +4.9% ✅ capitalize_aspectTime: ✅ 89.481µs (SLO: <300.000µs 📉 -70.2%) vs baseline: +0.7% Memory: ✅ 42.998MB (SLO: <46.000MB -6.5%) vs baseline: +5.1% ✅ capitalize_noaspectTime: ✅ 251.378µs (SLO: <300.000µs 📉 -16.2%) vs baseline: -0.2% Memory: ✅ 42.979MB (SLO: <46.000MB -6.6%) vs baseline: +5.1% ✅ casefold_aspectTime: ✅ 92.197µs (SLO: <500.000µs 📉 -81.6%) vs baseline: +4.1% Memory: ✅ 42.979MB (SLO: <46.000MB -6.6%) vs baseline: +5.0% ✅ casefold_noaspectTime: ✅ 307.143µs (SLO: <500.000µs 📉 -38.6%) vs baseline: -2.5% Memory: ✅ 42.920MB (SLO: <46.000MB -6.7%) vs baseline: +4.8% ✅ decode_aspectTime: ✅ 87.481µs (SLO: <100.000µs 📉 -12.5%) vs baseline: ~same Memory: ✅ 43.037MB (SLO: <46.000MB -6.4%) vs baseline: +5.0% ✅ decode_noaspectTime: ✅ 152.033µs (SLO: <210.000µs 📉 -27.6%) vs baseline: +0.8% Memory: ✅ 42.880MB (SLO: <46.000MB -6.8%) vs baseline: +4.8% ✅ encode_aspectTime: ✅ 85.347µs (SLO: <200.000µs 📉 -57.3%) vs baseline: -0.1% Memory: ✅ 42.939MB (SLO: <46.000MB -6.7%) vs baseline: +4.9% ✅ encode_noaspectTime: ✅ 136.744µs (SLO: <200.000µs 📉 -31.6%) vs baseline: -1.0% Memory: ✅ 42.900MB (SLO: <46.000MB -6.7%) vs baseline: +4.8% ✅ format_aspectTime: ✅ 14.658ms (SLO: <19.200ms 📉 -23.7%) vs baseline: -0.1% Memory: ✅ 43.096MB (SLO: <46.000MB -6.3%) vs baseline: +4.9% ✅ format_map_aspectTime: ✅ 16.500ms (SLO: <21.500ms 📉 -23.3%) vs baseline: ~same Memory: ✅ 43.234MB (SLO: <46.000MB -6.0%) vs baseline: +5.3% ✅ format_map_noaspectTime: ✅ 371.834µs (SLO: <500.000µs 📉 -25.6%) vs baseline: -1.2% Memory: ✅ 42.939MB (SLO: <46.000MB -6.7%) vs baseline: +4.8% ✅ format_noaspectTime: ✅ 313.731µs (SLO: <500.000µs 📉 -37.3%) vs baseline: +0.2% Memory: ✅ 42.939MB (SLO: <46.000MB -6.7%) vs baseline: +4.9% ✅ index_aspectTime: ✅ 125.501µs (SLO: <300.000µs 📉 -58.2%) vs baseline: +0.1% Memory: ✅ 42.920MB (SLO: <46.000MB -6.7%) vs baseline: +4.9% ✅ index_noaspectTime: ✅ 40.769µs (SLO: <300.000µs 📉 -86.4%) vs baseline: +1.2% Memory: ✅ 42.900MB (SLO: <46.000MB -6.7%) vs baseline: +4.9% ✅ join_aspectTime: ✅ 215.350µs (SLO: <300.000µs 📉 -28.2%) vs baseline: +0.2% Memory: ✅ 42.920MB (SLO: <46.000MB -6.7%) vs baseline: +4.7% ✅ join_noaspectTime: ✅ 142.610µs (SLO: <300.000µs 📉 -52.5%) vs baseline: +0.8% Memory: ✅ 42.959MB (SLO: <46.000MB -6.6%) vs baseline: +4.9% ✅ ljust_aspectTime: ✅ 509.528µs (SLO: <700.000µs 📉 -27.2%) vs baseline: +0.9% Memory: ✅ 43.057MB (SLO: <46.000MB -6.4%) vs baseline: +5.0% ✅ ljust_noaspectTime: ✅ 263.609µs (SLO: <300.000µs 📉 -12.1%) vs baseline: ~same Memory: ✅ 42.900MB (SLO: <46.000MB -6.7%) vs baseline: +4.7% ✅ lower_aspectTime: ✅ 299.432µs (SLO: <500.000µs 📉 -40.1%) vs baseline: -0.9% Memory: ✅ 42.959MB (SLO: <46.000MB -6.6%) vs baseline: +5.0% ✅ lower_noaspectTime: ✅ 235.010µs (SLO: <300.000µs 📉 -21.7%) vs baseline: +0.5% Memory: ✅ 42.959MB (SLO: <46.000MB -6.6%) vs baseline: +4.9% ✅ lstrip_aspectTime: ✅ 0.327ms (SLO: <3.000ms 📉 -89.1%) vs baseline: 📈 +23.0% Memory: ✅ 42.939MB (SLO: <46.000MB -6.7%) vs baseline: +4.9% ✅ lstrip_noaspectTime: ✅ 0.180ms (SLO: <3.000ms 📉 -94.0%) vs baseline: +0.9% Memory: ✅ 42.920MB (SLO: <46.000MB -6.7%) vs baseline: +4.9% ✅ modulo_aspectTime: ✅ 14.324ms (SLO: <18.750ms 📉 -23.6%) vs baseline: ~same Memory: ✅ 43.116MB (SLO: <46.000MB -6.3%) vs baseline: +5.0% ✅ modulo_aspect_for_bytearray_bytearrayTime: ✅ 15.224ms (SLO: <19.350ms 📉 -21.3%) vs baseline: +2.4% Memory: ✅ 43.155MB (SLO: <46.000MB -6.2%) vs baseline: +5.0% ✅ modulo_aspect_for_bytesTime: ✅ 14.517ms (SLO: <18.900ms 📉 -23.2%) vs baseline: +0.2% Memory: ✅ 43.096MB (SLO: <46.000MB -6.3%) vs baseline: +4.7% ✅ modulo_aspect_for_bytes_bytearrayTime: ✅ 14.705ms (SLO: <19.150ms 📉 -23.2%) vs baseline: ~same Memory: ✅ 43.155MB (SLO: <46.000MB -6.2%) vs baseline: +5.1% ✅ modulo_noaspectTime: ✅ 0.369ms (SLO: <3.000ms 📉 -87.7%) vs baseline: -1.9% Memory: ✅ 42.920MB (SLO: <46.000MB -6.7%) vs baseline: +4.9% ✅ replace_aspectTime: ✅ 18.488ms (SLO: <24.000ms 📉 -23.0%) vs baseline: +0.1% Memory: ✅ 43.077MB (SLO: <46.000MB -6.4%) vs baseline: +4.8% ✅ replace_noaspectTime: ✅ 281.280µs (SLO: <300.000µs -6.2%) vs baseline: ~same Memory: ✅ 42.880MB (SLO: <46.000MB -6.8%) vs baseline: +4.8% ✅ repr_aspectTime: ✅ 309.803µs (SLO: <420.000µs 📉 -26.2%) vs baseline: +0.4% Memory: ✅ 42.920MB (SLO: <46.000MB -6.7%) vs baseline: +5.0% ✅ repr_noaspectTime: ✅ 47.033µs (SLO: <90.000µs 📉 -47.7%) vs baseline: +1.2% Memory: ✅ 42.959MB (SLO: <46.000MB -6.6%) vs baseline: +4.8% ✅ rstrip_aspectTime: ✅ 380.609µs (SLO: <500.000µs 📉 -23.9%) vs baseline: -0.2% Memory: ✅ 42.979MB (SLO: <46.000MB -6.6%) vs baseline: +4.9% ✅ rstrip_noaspectTime: ✅ 184.098µs (SLO: <300.000µs 📉 -38.6%) vs baseline: -0.4% Memory: ✅ 42.861MB (SLO: <46.000MB -6.8%) vs baseline: +4.4% ✅ slice_aspectTime: ✅ 188.033µs (SLO: <300.000µs 📉 -37.3%) vs baseline: -0.4% Memory: ✅ 42.979MB (SLO: <46.000MB -6.6%) vs baseline: +5.0% ✅ slice_noaspectTime: ✅ 53.806µs (SLO: <90.000µs 📉 -40.2%) vs baseline: -0.8% Memory: ✅ 42.979MB (SLO: <46.000MB -6.6%) vs baseline: +5.0% ✅ stringio_aspectTime: ✅ 3.930ms (SLO: <5.000ms 📉 -21.4%) vs baseline: -0.7% Memory: ✅ 42.959MB (SLO: <46.000MB -6.6%) vs baseline: +4.8% ✅ stringio_noaspectTime: ✅ 345.161µs (SLO: <500.000µs 📉 -31.0%) vs baseline: +0.8% Memory: ✅ 42.900MB (SLO: <46.000MB -6.7%) vs baseline: +4.9% ✅ strip_aspectTime: ✅ 270.870µs (SLO: <350.000µs 📉 -22.6%) vs baseline: +0.8% Memory: ✅ 42.920MB (SLO: <46.000MB -6.7%) vs baseline: +4.9% ✅ strip_noaspectTime: ✅ 177.433µs (SLO: <240.000µs 📉 -26.1%) vs baseline: -0.2% Memory: ✅ 42.920MB (SLO: <46.000MB -6.7%) vs baseline: +4.7% ✅ swapcase_aspectTime: ✅ 340.538µs (SLO: <500.000µs 📉 -31.9%) vs baseline: +0.3% Memory: ✅ 42.900MB (SLO: <46.000MB -6.7%) vs baseline: +4.8% ✅ swapcase_noaspectTime: ✅ 273.213µs (SLO: <400.000µs 📉 -31.7%) vs baseline: +0.1% Memory: ✅ 42.979MB (SLO: <46.000MB -6.6%) vs baseline: +5.0% ✅ title_aspectTime: ✅ 330.120µs (SLO: <500.000µs 📉 -34.0%) vs baseline: +0.8% Memory: ✅ 42.939MB (SLO: <46.000MB -6.7%) vs baseline: +5.0% ✅ title_noaspectTime: ✅ 265.485µs (SLO: <400.000µs 📉 -33.6%) vs baseline: +0.9% Memory: ✅ 42.920MB (SLO: <46.000MB -6.7%) vs baseline: +4.7% ✅ translate_aspectTime: ✅ 494.857µs (SLO: <700.000µs 📉 -29.3%) vs baseline: +0.4% Memory: ✅ 42.939MB (SLO: <46.000MB -6.7%) vs baseline: +4.9% ✅ translate_noaspectTime: ✅ 427.962µs (SLO: <500.000µs 📉 -14.4%) vs baseline: -0.7% Memory: ✅ 42.880MB (SLO: <46.000MB -6.8%) vs baseline: +4.6% ✅ upper_aspectTime: ✅ 302.520µs (SLO: <500.000µs 📉 -39.5%) vs baseline: -0.5% Memory: ✅ 42.900MB (SLO: <46.000MB -6.7%) vs baseline: +4.9% ✅ upper_noaspectTime: ✅ 235.287µs (SLO: <400.000µs 📉 -41.2%) vs baseline: +0.7% Memory: ✅ 42.959MB (SLO: <46.000MB -6.6%) vs baseline: +5.0% 📈 iastaspectsospath - 24/24✅ ospathbasename_aspectTime: ✅ 501.086µs (SLO: <700.000µs 📉 -28.4%) vs baseline: 📈 +17.7% Memory: ✅ 42.880MB (SLO: <46.000MB -6.8%) vs baseline: +4.7% ✅ ospathbasename_noaspectTime: ✅ 433.071µs (SLO: <700.000µs 📉 -38.1%) vs baseline: +0.6% Memory: ✅ 42.939MB (SLO: <46.000MB -6.7%) vs baseline: +5.0% ✅ ospathjoin_aspectTime: ✅ 627.807µs (SLO: <700.000µs 📉 -10.3%) vs baseline: ~same Memory: ✅ 42.979MB (SLO: <46.000MB -6.6%) vs baseline: +5.0% ✅ ospathjoin_noaspectTime: ✅ 637.839µs (SLO: <700.000µs -8.9%) vs baseline: ~same Memory: ✅ 42.979MB (SLO: <46.000MB -6.6%) vs baseline: +5.0% ✅ ospathnormcase_aspectTime: ✅ 354.763µs (SLO: <700.000µs 📉 -49.3%) vs baseline: +1.1% Memory: ✅ 42.900MB (SLO: <46.000MB -6.7%) vs baseline: +4.8% ✅ ospathnormcase_noaspectTime: ✅ 360.032µs (SLO: <700.000µs 📉 -48.6%) vs baseline: +0.7% Memory: ✅ 42.998MB (SLO: <46.000MB -6.5%) vs baseline: +5.1% ✅ ospathsplit_aspectTime: ✅ 494.066µs (SLO: <700.000µs 📉 -29.4%) vs baseline: +0.5% Memory: ✅ 42.959MB (SLO: <46.000MB -6.6%) ✅ ospathsplit_noaspectTime: ✅ 503.477µs (SLO: <700.000µs 📉 -28.1%) vs baseline: +0.4% Memory: ✅ 42.939MB (SLO: <46.000MB -6.7%) vs baseline: +4.9% ✅ ospathsplitdrive_aspectTime: ✅ 373.469µs (SLO: <700.000µs 📉 -46.6%) vs baseline: -1.2% Memory: ✅ 42.959MB (SLO: <46.000MB -6.6%) vs baseline: +4.7% ✅ ospathsplitdrive_noaspectTime: ✅ 72.926µs (SLO: <700.000µs 📉 -89.6%) vs baseline: -0.9% Memory: ✅ 42.920MB (SLO: <46.000MB -6.7%) vs baseline: +4.8% ✅ ospathsplitext_aspectTime: ✅ 457.266µs (SLO: <700.000µs 📉 -34.7%) vs baseline: -0.2% Memory: ✅ 42.900MB (SLO: <46.000MB -6.7%) vs baseline: +4.9% ✅ ospathsplitext_noaspectTime: ✅ 465.535µs (SLO: <700.000µs 📉 -33.5%) vs baseline: -0.2% Memory: ✅ 42.979MB (SLO: <46.000MB -6.6%) 📈 telemetryaddmetric - 30/30✅ 1-count-metric-1-timesTime: ✅ 3.375µs (SLO: <20.000µs 📉 -83.1%) vs baseline: 📈 +17.0% Memory: ✅ 35.547MB (SLO: <38.000MB -6.5%) vs baseline: +4.9% ✅ 1-count-metrics-100-timesTime: ✅ 204.960µs (SLO: <220.000µs -6.8%) vs baseline: +0.6% Memory: ✅ 35.527MB (SLO: <38.000MB -6.5%) vs baseline: +5.1% ✅ 1-distribution-metric-1-timesTime: ✅ 3.238µs (SLO: <20.000µs 📉 -83.8%) vs baseline: -0.6% Memory: ✅ 35.566MB (SLO: <38.000MB -6.4%) vs baseline: +5.0% ✅ 1-distribution-metrics-100-timesTime: ✅ 215.244µs (SLO: <230.000µs -6.4%) vs baseline: +1.2% Memory: ✅ 35.448MB (SLO: <38.000MB -6.7%) vs baseline: +4.6% ✅ 1-gauge-metric-1-timesTime: ✅ 2.143µs (SLO: <20.000µs 📉 -89.3%) vs baseline: -0.3% Memory: ✅ 35.488MB (SLO: <38.000MB -6.6%) vs baseline: +4.7% ✅ 1-gauge-metrics-100-timesTime: ✅ 137.049µs (SLO: <150.000µs -8.6%) vs baseline: +0.4% Memory: ✅ 35.488MB (SLO: <38.000MB -6.6%) vs baseline: +4.9% ✅ 1-rate-metric-1-timesTime: ✅ 3.024µs (SLO: <20.000µs 📉 -84.9%) vs baseline: -0.1% Memory: ✅ 35.448MB (SLO: <38.000MB -6.7%) vs baseline: +4.5% ✅ 1-rate-metrics-100-timesTime: ✅ 219.050µs (SLO: <250.000µs 📉 -12.4%) vs baseline: +0.3% Memory: ✅ 35.547MB (SLO: <38.000MB -6.5%) vs baseline: +5.1% ✅ 100-count-metrics-100-timesTime: ✅ 20.521ms (SLO: <22.000ms -6.7%) vs baseline: -0.6% Memory: ✅ 35.527MB (SLO: <38.000MB -6.5%) vs baseline: +4.7% ✅ 100-distribution-metrics-100-timesTime: ✅ 2.247ms (SLO: <2.550ms 📉 -11.9%) vs baseline: +0.5% Memory: ✅ 35.547MB (SLO: <38.000MB -6.5%) vs baseline: +5.0% ✅ 100-gauge-metrics-100-timesTime: ✅ 1.419ms (SLO: <1.550ms -8.5%) vs baseline: ~same Memory: ✅ 35.527MB (SLO: <38.000MB -6.5%) vs baseline: +4.9% ✅ 100-rate-metrics-100-timesTime: ✅ 2.243ms (SLO: <2.550ms 📉 -12.0%) vs baseline: -0.7% Memory: ✅ 35.566MB (SLO: <38.000MB -6.4%) vs baseline: +4.9% ✅ flush-1-metricTime: ✅ 4.524µs (SLO: <20.000µs 📉 -77.4%) vs baseline: +0.3% Memory: ✅ 35.547MB (SLO: <38.000MB -6.5%) vs baseline: +4.9% ✅ flush-100-metricsTime: ✅ 174.534µs (SLO: <250.000µs 📉 -30.2%) vs baseline: -0.8% Memory: ✅ 35.527MB (SLO: <38.000MB -6.5%) vs baseline: +4.7% ✅ flush-1000-metricsTime: ✅ 2.193ms (SLO: <2.500ms 📉 -12.3%) vs baseline: -0.2% Memory: ✅ 36.392MB (SLO: <38.750MB -6.1%) vs baseline: +5.1% 🟡 Near SLO Breach (1 suite)🟡 tracer - 6/6✅ largeTime: ✅ 31.887ms (SLO: <32.950ms -3.2%) vs baseline: +0.9% Memory: ✅ 36.805MB (SLO: <39.250MB -6.2%) vs baseline: +5.0% ✅ mediumTime: ✅ 3.138ms (SLO: <3.200ms 🟡 -1.9%) vs baseline: +0.4% Memory: ✅ 35.173MB (SLO: <38.750MB -9.2%) vs baseline: +4.6% ✅ smallTime: ✅ 364.434µs (SLO: <370.000µs 🟡 -1.5%) vs baseline: +3.3% Memory: ✅ 35.173MB (SLO: <38.750MB -9.2%) vs baseline: +4.6%
|
Description
This PR updates the Python Profiler logic to always report CPU Time for sampled Threads, regardless of whether they were running when we computed their spent CPU time.
Why?
In Incident 48491, we investigated an issue where following a
ddtraceupgrade, one of our Python services suddenly started reporting near zero CPU Time and no Frames at all/empty flame graphs.The investigation showed that this changed if we removed the check on
is_runningand always reported CPU Time for each sampled Thread.Originally, this check was introduced to avoid reporting CPU Time on Stacks that were in fact idle states; however it also means it the Thread runs between two Samples but then does not run when we "check whether it is running", then we are blind to what it did during that time.
On top of that, there is no reliable way to check whether a Thread is running – checking this is by definition racy, and although there are potential ways around it, we could never guarantee that the running state we determined matched the one that the Thread was in when we captured its Stack, meaning we could have discrepancies anyway.
Finally, several other Datadog Profilers have this bias and accept it, so we should probably match that behaviour.
Note the performance difference will probably be negligible, but this change may actually slightly improve our CPU usage as we will now make one less system call.