-
Notifications
You must be signed in to change notification settings - Fork 399
[PROF-13510] Heap profiling for ruby 4.x - Prototype #5201
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Conversation
|
👋 Hey @ivoanjo, please fill "Change log entry" section in the pull request description. If changes need to be present in CHANGELOG.md you can state it this way **Change log entry**
Yes. A brief summary to be placed into the CHANGELOG.md(possible answers Yes/Yep/Yeah) Or you can opt out like that **Change log entry**
None.(possible answers No/Nope/None) Visited at: 2026-01-20 10:00:27 UTC |
BenchmarksBenchmark execution time: 2026-01-22 16:03:25 Comparing candidate commit ce3c746 in PR branch Found 1 performance improvements and 0 performance regressions! Performance is the same for 43 metrics, 2 unstable metrics. scenario:profiling - intern_all 1000 repeated strings
|
c4fdb76 to
49477bf
Compare
|
Reminder that to measure this, we should do ON/OFF. Some of the cost will be in the VM itself. |
|
✅ Tests 🎉 All green!❄️ No new flaky tests detected 🎯 Code Coverage 🔗 Commit SHA: cb3ef5e | Docs | Datadog PR Page | Was this helpful? Give us feedback! |
ivoanjo
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I ran out of time today, here's what I got so far! In particular, I still need to stare at heap_recorder.c for longer, I'm not yet convinced it's correct, I'm seeing some state getting carried across calls that I'm not confident is right.
The current notes are small-ish stuff, other than the extra overhead that's unneeded (and I believe should be easy to fix) + the code exposing the heap recorder directly to the cpu and wall collector that ideally I'd like to avoid too if possible.
| #ifdef DEFERRED_HEAP_ALLOCATION_RECORDING | ||
| static rb_postponed_job_handle_t finalize_heap_allocation_from_postponed_job_handle; | ||
| #endif |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Minor: More as a style thing, in general I've avoided overly-ifdefing things out when they're harmless.
That is, the cost of a leftover extra field on Rubies that don't need it is so small that I usually prefer the advantage of less code and easier to reason due to less ifdef branching. (Same for most spots in this file)
ext/datadog_profiling_native_extension/collectors_cpu_and_wall_time_worker.c
Outdated
Show resolved
Hide resolved
ext/datadog_profiling_native_extension/collectors_cpu_and_wall_time_worker.c
Outdated
Show resolved
Hide resolved
ext/datadog_profiling_native_extension/collectors_cpu_and_wall_time_worker.c
Outdated
Show resolved
Hide resolved
| context "on Ruby 4.0 or newer" do | ||
| let(:testing_version) { "4.0.0" } | ||
|
|
||
| it "initializes StackRecorder without heap sampling support and warns" do | ||
| before do | ||
| settings.profiling.allocation_enabled = true | ||
| allow(logger).to receive(:debug) | ||
| end | ||
|
|
||
| it "initializes StackRecorder with heap sampling support" do | ||
| expect(Datadog::Profiling::StackRecorder).to receive(:new) | ||
| .with(hash_including(heap_samples_enabled: false, heap_size_enabled: false)) | ||
| .with(hash_including(heap_samples_enabled: true, heap_size_enabled: true)) | ||
| .and_call_original | ||
|
|
||
| expect(logger).to receive(:warn).with(/Datadog Ruby heap profiler is currently incompatible with Ruby 4/) | ||
|
|
||
| build_profiler_component |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we do add an extra setting to toggle the new code, this testcase is still worth keeping. On the other hand, if we don't, then this test-case has become redundant: since Ruby 4 is no longer special, the existing tests below already cover this situation. (The test was added exactly because there was an exception for Ruby 4)
| #ifdef DEFERRED_HEAP_ALLOCATION_RECORDING | ||
| // A pending recording is used to defer the object_id call on Ruby 4+ | ||
| // where calling rb_obj_id during on_newobj_event is unsafe. | ||
| typedef struct { | ||
| VALUE object_ref; | ||
| heap_record *heap_record; | ||
| live_object_data object_data; | ||
| } pending_recording; | ||
|
|
||
| #define MAX_PENDING_RECORDINGS 64 | ||
| #endif |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Minor: The same note about not overly if-def'ing things from the cpu and wall collector I believe applies here as well
| VALUE obj = heap_recorder->pending_recordings[i].object_ref; | ||
| if (obj != Qnil) { | ||
| rb_gc_mark(obj); | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The object can't ever be Qnil (or if it is, we have a bug...), since it was set from an allocation and Qnil is not a heap-allocated object. (It's a tagged pointer)
| if (heap_recorder->active_deferred_object != Qnil) { | ||
| rb_gc_mark(heap_recorder->active_deferred_object); | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Minor: This can can indeed trivially be Qnil; but btw it's OK to mark Qnil, so maybe remove the branch anyway? Less code ;)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
feels weird to do so ^^
ivoanjo
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok here's my comments from the full pass :)
I think the key change needed is to make sure we call during_sample_enter to avoid any other parts of the profiler firing in the middle of finalization.
Having said that, especially on the heap profiler I'm not a huge fan of some of the duplication -- that code is ultra-fiddly and so having complex logic duplicated across if-defs I worry makes it easy to forget to update one of the versions.
| static void finalize_heap_allocation_from_postponed_job(DDTRACE_UNUSED void *_unused) { | ||
| cpu_and_wall_time_worker_state *state = active_sampler_instance_state; | ||
|
|
||
| if (state == NULL) return; | ||
|
|
||
| if (!ddtrace_rb_ractor_main_p()) { | ||
| return; | ||
| } | ||
|
|
||
| // Get the heap_recorder from the thread_context_collector | ||
| heap_recorder *recorder = thread_context_collector_get_heap_recorder(state->thread_context_collector_instance); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, on a second pass, there's two things missing here that can create a bit of a sharp edge:
a. We're missing during_sample_enter and during_sample_exit -- these set a flag that allows us to avoid nested operations inside the profiler. E.g. some sharp edges along the line of "the profiler is sampling something else -> it calls some VM api that causes the VM to check for interruptions -> the VM decides now it's a really nice time to flush heap things -> our current state may not be in a consistent sate" (or reverse -- maybe this is the function that started first, and it triggers an allocation, and the flip situation happens)
b. (Minor) We're missing the discrete_dynamic_sampler_before_sample and discrete_dynamic_sampler_after_sample calls to update the dynamic sampling rate mechanism. In practice, this means that work done inside this function isn't accounted as being profiler overhead. TBH what we're doing right now isn't a lot but... yeah maybe at least leave a comment saying "This is not being accounted for the dynamic sampling rate update, and it's ok because the amount of work we do in this case is very small"
Both things happen in e.g. on_newobj_event so doing the same here should be enough.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Edit: Actually I wonder if we can even get recursion where finalize gets called -> goes into the vm -> vm calls finalize again (e.g. if there's an allocation) or something. during_sample_enter/during_sample_exit would also protect against that.
| bool heap_recorder_has_pending_recordings(heap_recorder *heap_recorder) { | ||
| if (heap_recorder == NULL) { | ||
| return false; | ||
| } | ||
| return heap_recorder->pending_recordings_count > 0; | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Minor: Since this is only for debugging, I wonder if we should just return the count instead of a boolean. Same cost for the logic, and a bit easier to debug from the Ruby side.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As this is style, I'll leave it to the end.
| cleanup_heap_record_if_unused(heap_recorder, pending->heap_record); | ||
| object_record_free(heap_recorder, record); | ||
| } | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks like an inlined version of commit_recording, possibly with the difference that the num_tracked_objects has been pre-incremented.
I think since this logic is quite fiddly, it's probably worth trying to unify them -- possibly having a flag that says "I've already incremented the object!"; or just have callers do the increment.
(For instance -- I just noticed that the new code doesn't check for overflow on num_tracked_objects and this is the kind of "oops" that can easily creep in if the logic gets too duplicated)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
tried a slight refactoring, still fiddly
| describe "pending heap recordings cleanup" do | ||
| def has_pending_recordings? | ||
| described_class::Testing.send(:_native_has_pending_heap_recordings?, stack_recorder) | ||
| end | ||
|
|
||
| def track_object_without_finalize(obj) | ||
| described_class::Testing._native_track_object(stack_recorder, obj, sample_rate, obj.class.name) | ||
| Datadog::Profiling::Collectors::Stack::Testing | ||
| ._native_sample(Thread.current, stack_recorder, metric_values, labels, numeric_labels) | ||
| end | ||
|
|
||
| it "clears pending recordings after finalization" do | ||
| skip "Only applies to Ruby 4+ with deferred heap allocation recording" if RUBY_VERSION < "4" | ||
|
|
||
| test_object = Object.new | ||
|
|
||
| track_object_without_finalize(test_object) | ||
|
|
||
| expect(has_pending_recordings?).to be true | ||
|
|
||
| described_class::Testing._native_finalize_pending_heap_recordings(stack_recorder) | ||
|
|
||
| expect(has_pending_recordings?).to be false | ||
| end | ||
|
|
||
| it "clears pending recordings after multiple allocations" do | ||
| skip "Only applies to Ruby 4+ with deferred heap allocation recording" if RUBY_VERSION < "4" | ||
|
|
||
| 3.times do | ||
| test_object = Object.new | ||
| track_object_without_finalize(test_object) | ||
| end | ||
|
|
||
| expect(has_pending_recordings?).to be true | ||
|
|
||
| described_class::Testing._native_finalize_pending_heap_recordings(stack_recorder) | ||
|
|
||
| expect(has_pending_recordings?).to be false | ||
| end |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Minor: I'm not sure these tests add a lot... Specifically, because they don't assert on any results, it's easy for them to pass while not doing the correct thing.
Furthermore, we have existing coverage where we already check if the objects we intend to sample/track do get sampled/track.
So... I'd be inclined to remove these tests (and the pending_heap_recordings? helper maybe as well?).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
things slightly changed. I'll let you refer to comment above (in this file)
e966c43 to
ea9eba7
Compare
|
I took into account major comments. |
ea9eba7 to
4dbdd06
Compare
|
|
||
| expect(has_pending_recordings?).to be true | ||
|
|
||
| described_class::Testing._native_finalize_pending_heap_recordings(stack_recorder) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The test shows that you need these bits which is not ideal (but I still like that the test forces me to do this).
The idea is to delay the time at which we record object ids. Once we are outside of the allocation code path, we can request the object ID.
- Avoid scheduling many postponed jobs This fixes some of the issues we had with accuracy Also I suspect that this has less overhead - Avoid re-entrancy based on Ivo's comments
Although some of this code is dead code on legacy Rubies, always compiling it in means less ifdefs spread throughout and it helps keep the code focused on modern rubies, rather than on legacy ones.
This check is already covered by `heap_recorder->active_recording != NULL` (they're set and unset together).
This makes it easier to use this in tests.
… avoid extra helper
…lector This moves the logic closer to the heap profiler and helps focus the CpuAndWallTimeWorker on what (triggering or not) and not why (doesn't care?).
This API only became available after I rebased.
This reverts commit e153759. (Avoid touching CHANGELOG for nicer diff)
…is needed This will replace the more heavy-handed query in `thread_context_collector_heap_pending_buffer_pressure`.
…filer directly This avoids other parts of the profiler needing to care about this -- they only need to care to run the `after_sample` callback.
…ons directly We no longer need to ask other parts of the code to raise instead :)
This probably needs adjusting for non-4.0 rubies, will do it as a separate pass.
c0e60ce to
f04eb52
Compare
In the future we may end up using the deferred recording for legacy rubies as well, so might as well lay the groundwork.

Overview
The Datadog Ruby heap profiler tracks live heap objects by storing their
object_idwhen they're allocated, then later usingObjectSpace._id2refto check if those objects are still alive. This mechanism is currently incompatible with Ruby 4.x.Key Components
collectors_cpu_and_wall_time_worker.c- Main sampling coordinatorheap_recorder.c- Tracks live heap objects using object IDsAllocation Flow (Before Fix)
Liveness Check Flow
The Ruby 4.x Problem
What Changed
Ruby 4.x changed how
object_idworks internally. The key issue:on_newobj_eventis called during object allocation (object is in "in-between state")rb_obj_id()during this event mutates the object (assigns an ID)Implemented Solution: Deferred Object ID Recording
We defer the
rb_obj_id()call to after the allocation tracepoint completes usingrb_postponed_job_trigger.Allocation Flow (After Fix - Ruby 4+)
Motivation:
Heap profiling in 4.x
Change log entry
Additional Notes:
How to test the change?