[PROF-13510] Heap profiling for ruby 4.x - Prototype #5201

r1viollet · 2026-01-06T17:15:23Z

Overview

The Datadog Ruby heap profiler tracks live heap objects by storing their object_id when they're allocated, then later using ObjectSpace._id2ref to check if those objects are still alive. This mechanism is currently incompatible with Ruby 4.x.

Key Components

collectors_cpu_and_wall_time_worker.c - Main sampling coordinator
heap_recorder.c - Tracks live heap objects using object IDs

Allocation Flow (Before Fix)

on_newobj_event()  →  start_heap_allocation_recording()  →  end_heap_allocation_recording()
                              ↓
                      rb_obj_id(new_obj)  ← PROBLEM HERE

Liveness Check Flow

heap_recorder_update()  →  st_object_record_update()  →  ruby_ref_from_id()
                                                              ↓
                                                      ObjectSpace._id2ref(obj_id)
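To make the liveness-check flow concrete, here is a self-contained model in plain C that does not use the Ruby C API. `fake_id2ref` is a hypothetical stand-in for `ObjectSpace._id2ref`. The record and "object space" types are illustrative only, not the profiler's real `heap_record`/`object_record` structs. The shape of the loop is the point: look up each stored id, and stop tracking the record when the id no longer resolves to a live object.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

// Hypothetical model: a tiny "object space" where each slot is either alive or collected.
#define MODEL_HEAP_SLOTS 8

typedef struct {
  bool alive[MODEL_HEAP_SLOTS]; // alive[id] is true while the object with this id is live
} model_object_space;

// Stand-in for ObjectSpace._id2ref: succeeds only if the id still maps to a live object.
static bool fake_id2ref(const model_object_space *space, uint64_t obj_id, size_t *out_slot) {
  assert(out_slot != NULL);
  if (obj_id >= MODEL_HEAP_SLOTS || !space->alive[obj_id]) return false;
  *out_slot = (size_t) obj_id;
  return true;
}

// Illustrative record: only the id is kept, never a strong reference to the object.
typedef struct {
  uint64_t obj_id;
  bool tracked;
} model_object_record;

// Walks all records and un-tracks those whose object is gone. Returns how many are still live.
static size_t model_heap_recorder_update(const model_object_space *space, model_object_record *records, size_t count) {
  size_t live = 0;
  for (size_t i = 0; i < count; i++) {
    if (!records[i].tracked) continue;
    size_t slot;
    if (fake_id2ref(space, records[i].obj_id, &slot)) {
      live++;
    } else {
      records[i].tracked = false; // object was collected: stop tracking it
    }
  }
  return live;
}
```

This is why the id must be obtained at some point: without it, the recorder would need a strong reference to each object, which would keep that object alive.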

The Ruby 4.x Problem

What Changed

Ruby 4.x changed how object_id works internally. The key issue:

- on_newobj_event is called during object allocation, while the object is still in an "in-between" state.
- Calling rb_obj_id() during this event mutates the object (it assigns an ID).
- This mutation is not safe during the allocation tracepoint in Ruby 4.x.
- Reference: Ruby Issue #21710

Implemented Solution: Deferred Object ID Recording

We defer the rb_obj_id() call until after the allocation tracepoint completes, using rb_postponed_job_trigger.

Allocation Flow (After Fix - Ruby 4+)

on_newobj_event()
    ↓
start_heap_allocation_recording()
    - Store VALUE in heap_recorder->active_deferred_object
    - Store metadata in heap_recorder->active_deferred_object_data
    ↓
end_heap_allocation_recording()
    - Move to pending_recordings[] array
    - Pre-increment heap_record->num_tracked_objects
    ↓
rb_postponed_job_trigger()
    ↓
finalize_heap_allocation_from_postponed_job()  ← Runs outside tracepoint
    - heap_recorder_finalize_pending_recordings()
    - Call rb_obj_id() safely
    - Commit object_record to heap_record
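The deferred flow above can be modeled as a small fixed-size buffer. What follows is a hedged sketch in plain C, not the PR's code. `model_obj_id` stands in for `rb_obj_id()` and `model_trigger_postponed_job` stands in for `rb_postponed_job_trigger()`. All type and function names are hypothetical. The sketch shows the three phases:

- remember the object during the tracepoint;
- move it into a pending array at the end of the sample;
- request ids only in the finalize step, which runs outside the tracepoint.

As one possible batching strategy (an assumption, not necessarily what the PR does), it requests the postponed job only when the buffer goes from empty to non-empty.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

// Hypothetical model of the deferred flow; not the PR's heap_recorder.c.
#define MODEL_MAX_PENDING 64 // mirrors MAX_PENDING_RECORDINGS from the diff

typedef uintptr_t model_value; // stand-in for Ruby's VALUE

typedef struct {
  model_value object_ref; // allocated during the tracepoint; no id requested yet
  size_t weight;          // stand-in for live_object_data
} model_pending;

typedef struct {
  model_value active_object; // set by start, consumed by end
  size_t active_weight;
  bool has_active;
  model_pending pending[MODEL_MAX_PENDING];
  size_t pending_count;
  size_t jobs_requested;  // times we asked for a postponed job
  size_t committed_count; // recordings that received an id
  size_t committed_weight;
  uint64_t last_id;
} model_recorder;

// Stand-in for rb_postponed_job_trigger(): just counts requests.
static void model_trigger_postponed_job(model_recorder *r) { r->jobs_requested++; }

// Stand-in for rb_obj_id(): only ever called from the finalize step.
static uint64_t model_obj_id(model_recorder *r, model_value obj) {
  (void) obj;
  return ++r->last_id;
}

// Phase 1, inside the allocation tracepoint: remember the object, do not ask for its id.
static void model_start_recording(model_recorder *r, model_value obj, size_t weight) {
  r->active_object = obj;
  r->active_weight = weight;
  r->has_active = true;
}

// Phase 2, still inside the tracepoint: move into the pending buffer.
// Returns false (sample dropped) if nothing is active or the buffer is full.
static bool model_end_recording(model_recorder *r) {
  if (!r->has_active) return false;
  r->has_active = false;
  if (r->pending_count == MODEL_MAX_PENDING) return false;
  r->pending[r->pending_count++] = (model_pending) { r->active_object, r->active_weight };
  // Batching: one postponed job when the buffer becomes non-empty, not one per object.
  if (r->pending_count == 1) model_trigger_postponed_job(r);
  return true;
}

// Phase 3, in the postponed job (outside the tracepoint): now it is safe to request ids.
static void model_finalize_pending(model_recorder *r) {
  assert(r->pending_count <= MODEL_MAX_PENDING);
  for (size_t i = 0; i < r->pending_count; i++) {
    uint64_t id = model_obj_id(r, r->pending[i].object_ref);
    assert(id != 0); // ids are only handed out here
    r->committed_count++;
    r->committed_weight += r->pending[i].weight;
  }
  r->pending_count = 0;
}
```

The model leaves out GC marking. In the diff, objects sitting in pending_recordings[] (and active_deferred_object) are passed to rb_gc_mark, so they cannot be collected before their id is requested.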

⚠️ These changes are AI assisted and will require careful review & analysis of performance impacts.

Motivation:

Heap profiling in 4.x

Change log entry

Additional Notes:

How to test the change?

github-actions · 2026-01-06T17:15:34Z

👋 Hey @ivoanjo, please fill in the "Change log entry" section in the pull request description.

If changes need to be present in CHANGELOG.md you can state it this way

**Change log entry**

Yes. A brief summary to be placed into the CHANGELOG.md

(possible answers Yes/Yep/Yeah)

Or you can opt out like that

**Change log entry**

None.

(possible answers No/Nope/None)

pr-commenter · 2026-01-06T17:43:49Z

Benchmarks

Benchmark execution time: 2026-01-22 16:03:25

Comparing candidate commit ce3c746 in PR branch r1viollet/heap-profiling-4.0 with baseline commit e700369 in branch master.

Found 1 performance improvement and 0 performance regressions! Performance is the same for 43 metrics; 2 metrics were unstable.

scenario:profiling - intern_all 1000 repeated strings

🟩 throughput [+1328.144op/s; +1411.655op/s] or [+5.510%; +5.857%]

r1viollet · 2026-01-09T17:00:36Z

Reminder: to measure this, we should compare with the profiler ON vs. OFF, since some of the cost will be in the VM itself.

datadog-official · 2026-01-09T17:42:23Z

✅ Tests

🎉 All green!

❄️ No new flaky tests detected
🧪 All tests passed

🎯 Code Coverage
• Patch Coverage: 90.24%
• Overall Coverage: 95.20% (-0.01%)

This comment will be updated automatically if new data arrives.

🔗 Commit SHA: cb3ef5e

ivoanjo

I ran out of time today, so here's what I have so far! In particular, I still need to stare at heap_recorder.c for longer: I'm not yet convinced it's correct, as I'm seeing some state carried across calls that I'm not confident is right.

The current notes are mostly small-ish stuff. The exceptions are the unneeded extra overhead (which I believe should be easy to fix) and the code exposing the heap recorder directly to the CPU and wall-time collector, which ideally I'd like to avoid too, if possible.

ivoanjo · 2026-01-13T12:31:02Z

ext/datadog_profiling_native_extension/collectors_cpu_and_wall_time_worker.c

+  #ifdef DEFERRED_HEAP_ALLOCATION_RECORDING
+    static rb_postponed_job_handle_t finalize_heap_allocation_from_postponed_job_handle;
+  #endif


Minor: More of a style thing, but in general I've avoided ifdef-ing things out when they're harmless.

That is, the cost of a leftover extra field on Rubies that don't need it is so small that I usually prefer having less code, which is also easier to reason about because there's less ifdef branching. (Same for most spots in this file.)

ext/datadog_profiling_native_extension/collectors_cpu_and_wall_time_worker.c

ext/datadog_profiling_native_extension/extconf.rb

ivoanjo · 2026-01-13T13:05:13Z

spec/datadog/profiling/component_spec.rb

          context "on Ruby 4.0 or newer" do
            let(:testing_version) { "4.0.0" }

-            it "initializes StackRecorder without heap sampling support and warns" do
+            before do
+              settings.profiling.allocation_enabled = true
+              allow(logger).to receive(:debug)
+            end
+
+            it "initializes StackRecorder with heap sampling support" do
              expect(Datadog::Profiling::StackRecorder).to receive(:new)
-                .with(hash_including(heap_samples_enabled: false, heap_size_enabled: false))
+                .with(hash_including(heap_samples_enabled: true, heap_size_enabled: true))
                .and_call_original

-              expect(logger).to receive(:warn).with(/Datadog Ruby heap profiler is currently incompatible with Ruby 4/)
-
              build_profiler_component


If we do add an extra setting to toggle the new code, this test case is still worth keeping. On the other hand, if we don't, then this test case has become redundant: since Ruby 4 is no longer special, the existing tests below already cover this situation. (The test was added exactly because there was an exception for Ruby 4.)

ext/datadog_profiling_native_extension/heap_recorder.h

ivoanjo · 2026-01-13T15:44:04Z

ext/datadog_profiling_native_extension/heap_recorder.c

+#ifdef DEFERRED_HEAP_ALLOCATION_RECORDING
+// A pending recording is used to defer the object_id call on Ruby 4+
+// where calling rb_obj_id during on_newobj_event is unsafe.
+typedef struct {
+  VALUE object_ref;
+  heap_record *heap_record;
+  live_object_data object_data;
+} pending_recording;
+
+#define MAX_PENDING_RECORDINGS 64
+#endif


Minor: The same note about not overly if-def'ing things from the cpu and wall collector I believe applies here as well

ivoanjo · 2026-01-13T15:46:26Z

ext/datadog_profiling_native_extension/heap_recorder.c

+    VALUE obj = heap_recorder->pending_recordings[i].object_ref;
+    if (obj != Qnil) {
+      rb_gc_mark(obj);
+    }


The object can't ever be Qnil (or if it is, we have a bug...), since it was set from an allocation and Qnil is not a heap-allocated object. (It's a tagged pointer)

ivoanjo · 2026-01-13T15:47:53Z

ext/datadog_profiling_native_extension/heap_recorder.c

+  if (heap_recorder->active_deferred_object != Qnil) {
+    rb_gc_mark(heap_recorder->active_deferred_object);
+  }


Minor: This can indeed trivially be Qnil, but btw it's OK to mark Qnil, so maybe remove the branch anyway? Less code ;)

feels weird to do so ^^

ivoanjo

OK, here are my comments from the full pass :)

I think the key change needed is to make sure we call during_sample_enter to avoid any other parts of the profiler firing in the middle of finalization.

Having said that, especially in the heap profiler, I'm not a huge fan of some of the duplication: that code is ultra-fiddly, and I worry that having complex logic duplicated across ifdefs makes it easy to forget to update one of the versions.

ivoanjo · 2026-01-14T12:04:48Z

ext/datadog_profiling_native_extension/collectors_cpu_and_wall_time_worker.c

+static void finalize_heap_allocation_from_postponed_job(DDTRACE_UNUSED void *_unused) {
+  cpu_and_wall_time_worker_state *state = active_sampler_instance_state;
+
+  if (state == NULL) return;
+
+  if (!ddtrace_rb_ractor_main_p()) {
+    return;
+  }
+
+  // Get the heap_recorder from the thread_context_collector
+  heap_recorder *recorder = thread_context_collector_get_heap_recorder(state->thread_context_collector_instance);


Ah, on a second pass, there are two things missing here that can create a bit of a sharp edge:

a. We're missing during_sample_enter and during_sample_exit -- these set a flag that allows us to avoid nested operations inside the profiler. E.g. sharp edges along the lines of "the profiler is sampling something else -> it calls some VM API that causes the VM to check for interrupts -> the VM decides now is a really nice time to flush heap things -> our current state may not be consistent". Or the reverse: maybe this is the function that started first, it triggers an allocation, and the flipped situation happens.

b. (Minor) We're missing the discrete_dynamic_sampler_before_sample and discrete_dynamic_sampler_after_sample calls that update the dynamic sampling rate mechanism. In practice, this means that work done inside this function isn't accounted for as profiler overhead. TBH what we're doing right now isn't a lot, but... yeah, maybe at least leave a comment saying "This is not being accounted for in the dynamic sampling rate update, and it's OK because the amount of work we do in this case is very small".

Both things happen in e.g. on_newobj_event so doing the same here should be enough.

Edit: Actually I wonder if we can even get recursion where finalize gets called -> goes into the vm -> vm calls finalize again (e.g. if there's an allocation) or something. during_sample_enter/during_sample_exit would also protect against that.
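Here is a minimal model of the guard being asked for, in plain C. `model_during_sample_enter` and `model_during_sample_exit` are hypothetical simplifications, not the profiler's actual during_sample_enter/during_sample_exit. A single flag makes any nested entry bail out instead of touching half-updated state, for example finalize -> VM -> allocation -> finalize again.

```c
#include <assert.h>
#include <stdbool.h>

typedef struct {
  bool during_sample;    // true while some profiler operation is in progress
  int finalize_runs;     // finalize passes that actually did work
  int skipped_reentries; // nested entries rejected by the guard
} model_sampler_state;

// Returns false if another profiler operation is already running (caller must bail out).
static bool model_during_sample_enter(model_sampler_state *s) {
  if (s->during_sample) {
    s->skipped_reentries++;
    return false;
  }
  s->during_sample = true;
  return true;
}

static void model_during_sample_exit(model_sampler_state *s) {
  assert(s->during_sample);
  s->during_sample = false;
}

// simulate_nested_call mimics the VM calling back into us while we are finalizing.
static void model_finalize(model_sampler_state *s, bool simulate_nested_call) {
  if (!model_during_sample_enter(s)) return; // nested call: do nothing, state may be inconsistent
  s->finalize_runs++;
  if (simulate_nested_call) model_finalize(s, false); // rejected by the guard
  model_during_sample_exit(s);
}
```

The same flag covers both directions: a sample already in progress blocks finalize, and a finalize already in progress blocks a sample.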

ext/datadog_profiling_native_extension/heap_recorder.c

ivoanjo · 2026-01-14T12:34:38Z

ext/datadog_profiling_native_extension/heap_recorder.c

+bool heap_recorder_has_pending_recordings(heap_recorder *heap_recorder) {
+  if (heap_recorder == NULL) {
+    return false;
+  }
+  return heap_recorder->pending_recordings_count > 0;
+}


Minor: Since this is only for debugging, I wonder if we should just return the count instead of a boolean. Same cost for the logic, and a bit easier to debug from the Ruby side.

As this is style, I'll leave it to the end.
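For illustration, the suggested shape is a getter that returns the count rather than a boolean. Callers, including Ruby-side debugging helpers, can still test `> 0`. The struct and function names here are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

typedef struct {
  size_t pending_recordings_count;
} model_heap_recorder;

// Returns how many recordings still await finalization (0 if there is no recorder).
static size_t model_pending_recordings_count(const model_heap_recorder *recorder) {
  if (recorder == NULL) return 0;
  return recorder->pending_recordings_count;
}
```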

ext/datadog_profiling_native_extension/heap_recorder.c

ivoanjo · 2026-01-14T12:47:52Z

ext/datadog_profiling_native_extension/heap_recorder.c

+      cleanup_heap_record_if_unused(heap_recorder, pending->heap_record);
+      object_record_free(heap_recorder, record);
+    }
+  }


This looks like an inlined version of commit_recording, possibly with the difference that the num_tracked_objects has been pre-incremented.

I think since this logic is quite fiddly, it's probably worth trying to unify them -- possibly having a flag that says "I've already incremented the object!"; or just have callers do the increment.

(For instance -- I just noticed that the new code doesn't check for overflow on num_tracked_objects and this is the kind of "oops" that can easily creep in if the logic gets too duplicated)

tried a slight refactoring, still fiddly
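One way to unify the two paths is sketched below in plain C, under stated assumptions. A single commit function takes a flag saying whether the caller already incremented `num_tracked_objects`, and the overflow check lives in exactly one place. The types, the `model_commit_recording` name and the `already_counted` parameter are all hypothetical; this is not the PR's commit_recording.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef struct {
  uint32_t num_tracked_objects; // live objects currently referencing this heap_record
} model_heap_record;

// Increments the tracked-object counter, refusing to wrap around. Returns false on overflow.
static bool model_track_one_more(model_heap_record *record) {
  if (record->num_tracked_objects == UINT32_MAX) return false;
  record->num_tracked_objects++;
  return true;
}

// Single commit path for both eager and deferred (Ruby 4+) recordings.
// already_counted: true if the caller pre-incremented num_tracked_objects (deferred path).
// On failure, the counter is left as if this object had never been tracked.
static bool model_commit_recording(model_heap_record *record, bool already_counted, bool object_id_ok) {
  if (!already_counted && !model_track_one_more(record)) return false; // overflow: drop sample
  if (!object_id_ok) {
    // Undo the increment (ours or the caller's) so an unused record can be cleaned up.
    assert(record->num_tracked_objects > 0);
    record->num_tracked_objects--;
    return false;
  }
  return true;
}
```

With this shape, the deferred path would still need to do its pre-increment through something like `model_track_one_more`, so that the overflow check is not skipped there either.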

ivoanjo · 2026-01-14T16:02:29Z

spec/datadog/profiling/stack_recorder_spec.rb

+        describe "pending heap recordings cleanup" do
+          def has_pending_recordings?
+            described_class::Testing.send(:_native_has_pending_heap_recordings?, stack_recorder)
+          end
+
+          def track_object_without_finalize(obj)
+            described_class::Testing._native_track_object(stack_recorder, obj, sample_rate, obj.class.name)
+            Datadog::Profiling::Collectors::Stack::Testing
+              ._native_sample(Thread.current, stack_recorder, metric_values, labels, numeric_labels)
+          end
+
+          it "clears pending recordings after finalization" do
+            skip "Only applies to Ruby 4+ with deferred heap allocation recording" if RUBY_VERSION < "4"
+
+            test_object = Object.new
+
+            track_object_without_finalize(test_object)
+
+            expect(has_pending_recordings?).to be true
+
+            described_class::Testing._native_finalize_pending_heap_recordings(stack_recorder)
+
+            expect(has_pending_recordings?).to be false
+          end
+
+          it "clears pending recordings after multiple allocations" do
+            skip "Only applies to Ruby 4+ with deferred heap allocation recording" if RUBY_VERSION < "4"
+
+            3.times do
+              test_object = Object.new
+              track_object_without_finalize(test_object)
+            end
+
+            expect(has_pending_recordings?).to be true
+
+            described_class::Testing._native_finalize_pending_heap_recordings(stack_recorder)
+
+            expect(has_pending_recordings?).to be false
+          end


Minor: I'm not sure these tests add a lot... Specifically, because they don't assert on any results, it's easy for them to pass while not doing the correct thing.

Furthermore, we have existing coverage where we already check whether the objects we intend to sample/track do get sampled/tracked.

So... I'd be inclined to remove these tests (and the pending_heap_recordings? helper maybe as well?).

Things changed slightly; please refer to the comment above (in this file).

r1viollet · 2026-01-16T15:51:01Z

I've addressed the major comments.
Next step: perf tests.
Then we can do style fixups.

r1viollet · 2026-01-19T08:54:21Z

spec/datadog/profiling/stack_recorder_spec.rb

+
+            expect(has_pending_recordings?).to be true
+
+            described_class::Testing._native_finalize_pending_heap_recordings(stack_recorder)


The test shows that you need these bits which is not ideal (but I still like that the test forces me to do this).

ivoanjo · 2026-01-23T11:53:54Z

Quick side-by-side results from Ruby 3.3 vs 4.0 in our Ruby on Rails test app:

github-actions bot added the profiling Involves Datadog profiling label Jan 6, 2026

r1viollet force-pushed the r1viollet/heap-profiling-4.0 branch from c4fdb76 to 49477bf January 6, 2026 17:46

ivoanjo reviewed Jan 13, 2026

ivoanjo reviewed Jan 14, 2026

r1viollet force-pushed the r1viollet/heap-profiling-4.0 branch 2 times, most recently from e966c43 to ea9eba7 January 16, 2026 15:50

github-actions bot added the core Involves Datadog core libraries label Jan 16, 2026

r1viollet force-pushed the r1viollet/heap-profiling-4.0 branch from ea9eba7 to 4dbdd06 January 19, 2026 08:32

r1viollet commented Jan 19, 2026

ivoanjo changed the title ~~Heap profiling for ruby 4.x - Prototype~~ [PROF-13510] Heap profiling for ruby 4.x - Prototype Jan 20, 2026

r1viollet and others added 15 commits January 21, 2026 14:28

Heap profiling for ruby 4.x - Prototype

9564ca6

The idea is to delay the time at which we record object ids. Once we are outside of the allocation code path, we can request the object ID.

heap profiling 4.x - Remove unnecessary check

b5e5175

Heap profiling 4.x: fix compilation warning for ruby <4 versions

b7a82ad

profiling: heap 4.x - Batch postponed jobs & fixes

5668338

- Avoid scheduling many postponed jobs. This fixes some of the issues we had with accuracy. Also, I suspect that this has less overhead.
- Avoid re-entrancy, based on Ivo's comments.

Heap profiling 4.x - Adjust error handling & flag naming

660f7f4

profiling heap profiling for ruby4: add an experimental flag & fixups

34f28d7

Profiling experimental 4.0 support - Add a changelog entry

e153759

Remove a few of the ifdefs to simplify code

ce75c22

Although some of this code is dead code on legacy Rubies, always compiling it in means less ifdefs spread throughout and it helps keep the code focused on modern rubies, rather than on legacy ones.

Minor: Remove check for active_deferred_object

9d5aa45

This check is already covered by `heap_recorder->active_recording != NULL` (they're set and unset together).

Minor: Move error checking closer to return

851e439

Minor: Simplify marking code

a15509f

Make env flag to enable ruby 4 heap profiler internal for now

68ec35c

Tweak testing of heap profiling enabling logic

f412be4

Minor: Simplify new stats by removing ifdefs

1e9c1d6

Fix C to Ruby conversions that should be unsigned

e1c9925

ivoanjo added 12 commits January 21, 2026 14:30

Return heap profiler debugging state as objects, not a string

5c5e519

This makes it easier to use this in tests.

Minor: Expose pending_recordings_count using heap recorder state to avoid extra helper

d538df7

Refactor heap finalization to move decision inside thread context collector

ab7707c

This moves the logic closer to the heap profiler and helps focus the CpuAndWallTimeWorker on what (triggering or not) and not why (doesn't care?).

Allow after_allocation to signal failures using exceptions

d114053

Use new raise_error API from latest master

e15f05e

This API only became available after I rebased.

Revert "Profiling experimental 4.0 support - Add a changelog entry"

1ffbd39

This reverts commit e153759. (Avoid touching CHANGELOG for nicer diff)

Further cleanups for after_allocation calling

b52d7bf

Start piping mechanism for heap recorder to signal when after_sample is needed

503346e

This will replace the more heavy-handed query in `thread_context_collector_heap_pending_buffer_pressure`.

Make requests for flushing pending_recordings be driven by heap profiler directly

84cf5fc

This avoids other parts of the profiler needing to care about this -- they only need to care to run the `after_sample` callback.

Avoid having thread context collector needing to know about heap profiler

a798ea5

Refactor heap_recorder_finalize_pending_recordings to raise exceptions directly

fd6fcfb

We no longer need to ask other parts of the code to raise instead :)

A few more cleanups

f04eb52

This probably needs adjusting for non-4.0 rubies, will do it as a separate pass.

ivoanjo force-pushed the r1viollet/heap-profiling-4.0 branch from c0e60ce to f04eb52 January 21, 2026 17:10

ivoanjo added 8 commits January 22, 2026 10:29

Minor: Clean up TODO

f13d541

Reduce ifdefs, just let legacy rubies compile code too

6fdb2c8

In the future we may end up using the deferred recording for legacy rubies as well, so might as well lay the groundwork.

Small simplifications

698b334

More small simplifications

b881957

Extract out common code in a slightly different way

3a671f6

Remove header reference (not available on all Rubies)

a383a19

Merge branch 'master' into r1viollet/heap-profiling-4.0

ce3c746

Make linter happy

cb3ef5e

