
Conversation

@r1viollet commented Jan 6, 2026

Overview

The Datadog Ruby heap profiler tracks live heap objects by storing their object_id when they're allocated, then later using ObjectSpace._id2ref to check if those objects are still alive. This mechanism is currently incompatible with Ruby 4.x.

Key Components

  1. collectors_cpu_and_wall_time_worker.c - Main sampling coordinator
  2. heap_recorder.c - Tracks live heap objects using object IDs

Allocation Flow (Before Fix)

on_newobj_event()  →  start_heap_allocation_recording()  →  end_heap_allocation_recording()
                              ↓
                      rb_obj_id(new_obj)  ← PROBLEM HERE

Liveness Check Flow

heap_recorder_update()  →  st_object_record_update()  →  ruby_ref_from_id()
                                                              ↓
                                                      ObjectSpace._id2ref(obj_id)
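
For context, a minimal sketch of what ruby_ref_from_id can look like from C, assuming the check goes through ObjectSpace._id2ref (an illustration, not the PR's exact code; real code must also handle the RangeError that _id2ref raises for dead objects, e.g. via rb_protect):

static VALUE ruby_ref_from_id(VALUE obj_id) {
  // ObjectSpace._id2ref returns the live object for a given ID and raises
  // RangeError if that object has been garbage collected, which is exactly
  // the "is it still alive?" signal the heap recorder needs.
  VALUE object_space = rb_const_get(rb_cObject, rb_intern("ObjectSpace"));
  return rb_funcall(object_space, rb_intern("_id2ref"), 1, obj_id);
}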

The Ruby 4.x Problem

What Changed

Ruby 4.x changed how object_id works internally. The key issue:

  • on_newobj_event is called during object allocation, while the object is still in an in-between state
  • Calling rb_obj_id() during this event mutates the object (it lazily assigns an ID)
  • This mutation is not safe during the allocation tracepoint in Ruby 4.x (see the sketch below)
  • Reference: Ruby Issue #21710
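
A minimal sketch of the hazard, assuming the allocation hook is a RUBY_INTERNAL_EVENT_NEWOBJ tracepoint (the tracearg helpers are Ruby's C API; the surrounding profiler code is simplified):

#include <ruby/debug.h>

static void on_newobj_event(VALUE tracepoint_data, void *_unused) {
  rb_trace_arg_t *trace_arg = rb_tracearg_from_tracepoint(tracepoint_data);
  VALUE new_object = rb_tracearg_object(trace_arg);
  // Object IDs are assigned lazily, so the first rb_obj_id() call writes an
  // ID into the object. Here the object is still mid-allocation, and on
  // Ruby 4.x that write is no longer safe (see Ruby Issue #21710).
  VALUE obj_id = rb_obj_id(new_object); // ← PROBLEM on Ruby 4.x
  // ...record obj_id against the allocation's stack trace...
}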

Implemented Solution: Deferred Object ID Recording

We defer the rb_obj_id() call until after the allocation tracepoint completes, using rb_postponed_job_trigger.
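
A minimal sketch of this pattern, using the Ruby 3.3+ preregistered postponed-job API (names here are illustrative; the PR's actual handle is finalize_heap_allocation_from_postponed_job_handle):

#include <ruby/debug.h>

static rb_postponed_job_handle_t finalize_job_handle;

static void finalize_from_postponed_job(void *_unused) {
  // Runs after the allocation tracepoint has returned, at a point where the
  // VM considers it safe to call APIs such as rb_obj_id().
}

static void setup_postponed_job(void) {
  finalize_job_handle = rb_postponed_job_preregister(0, finalize_from_postponed_job, NULL);
}

// From the allocation event handler:
//   rb_postponed_job_trigger(finalize_job_handle);
// Triggering the same preregistered handle multiple times before it runs
// executes it only once, so a burst of allocations doesn't pile up jobs.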

Allocation Flow (After Fix - Ruby 4+)

on_newobj_event()
    ↓
start_heap_allocation_recording()
    - Store VALUE in heap_recorder->active_deferred_object
    - Store metadata in heap_recorder->active_deferred_object_data
    ↓
end_heap_allocation_recording()
    - Move to pending_recordings[] array
    - Pre-increment heap_record->num_tracked_objects
    ↓
rb_postponed_job_trigger()
    ↓
finalize_heap_allocation_from_postponed_job()  ← Runs outside tracepoint
    - heap_recorder_finalize_pending_recordings()
    - Call rb_obj_id() safely
    - Commit object_record to heap_record
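
Putting the finalization step together, a sketch under the names from the flow above (commit_object_record is a hypothetical helper; the real code also deals with buffer limits and re-entrancy):

static void heap_recorder_finalize_pending_recordings(heap_recorder *heap_recorder) {
  for (size_t i = 0; i < heap_recorder->pending_recordings_count; i++) {
    pending_recording *pending = &heap_recorder->pending_recordings[i];
    // Safe now: we're outside the allocation tracepoint, so assigning an
    // object ID no longer mutates an object that's still being set up.
    VALUE obj_id = rb_obj_id(pending->object_ref);
    // Hypothetical helper: build the object_record and attach it to the
    // heap_record whose num_tracked_objects was pre-incremented earlier.
    commit_object_record(heap_recorder, pending->heap_record, obj_id, &pending->object_data);
  }
  heap_recorder->pending_recordings_count = 0;
}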

⚠️ These changes are AI-assisted and will require careful review & analysis of performance impacts.

Motivation:

Heap profiling in 4.x

Change log entry

Additional Notes:

How to test the change?

github-actions bot added the profiling (Involves Datadog profiling) label on Jan 6, 2026
github-actions bot commented Jan 6, 2026

👋 Hey @ivoanjo, please fill in the "Change log entry" section in the pull request description.

If changes need to be present in CHANGELOG.md, you can state it this way:

**Change log entry**

Yes. A brief summary to be placed into the CHANGELOG.md

(possible answers Yes/Yep/Yeah)

Or you can opt out like this:

**Change log entry**

None.

(possible answers No/Nope/None)


pr-commenter bot commented Jan 6, 2026

Benchmarks

Benchmark execution time: 2026-01-22 16:03:25

Comparing candidate commit ce3c746 in PR branch r1viollet/heap-profiling-4.0 with baseline commit e700369 in branch master.

Found 1 performance improvement and 0 performance regressions! Performance is the same for 43 metrics; 2 metrics are unstable.

scenario:profiling - intern_all 1000 repeated strings

  • 🟩 throughput [+1328.144 op/s; +1411.655 op/s] or [+5.510%; +5.857%]

@r1viollet force-pushed the r1viollet/heap-profiling-4.0 branch from c4fdb76 to 49477bf on January 6, 2026 17:46
@r1viollet (Author) commented:

Reminder that to measure this, we should compare runs with the feature ON vs. OFF. Some of the cost will be in the VM itself.

datadog-official bot commented Jan 9, 2026

✅ Tests

🎉 All green!

❄️ No new flaky tests detected
🧪 All tests passed

🎯 Code Coverage
Patch Coverage: 90.24%
Overall Coverage: 95.20% (-0.01%)


@ivoanjo (Member) left a comment

I ran out of time today; here's what I got so far! In particular, I still need to stare at heap_recorder.c for longer -- I'm not yet convinced it's correct, as I'm seeing some state getting carried across calls that I'm not confident is right.

The current notes are small-ish stuff, other than (a) the extra overhead that's unneeded (and I believe should be easy to fix) and (b) the code exposing the heap recorder directly to the cpu and wall collector, which ideally I'd also like to avoid if possible.

Comment on lines 93 to 92
#ifdef DEFERRED_HEAP_ALLOCATION_RECORDING
static rb_postponed_job_handle_t finalize_heap_allocation_from_postponed_job_handle;
#endif
@ivoanjo (Member):

Minor: More of a style thing: in general I've avoided overly ifdef'ing things out when they're harmless.

That is, the cost of a leftover extra field on Rubies that don't need it is so small that I usually prefer the advantages of less code and easier reasoning due to less ifdef branching. (Same for most spots in this file.)

Comment on lines 335 to 348
context "on Ruby 4.0 or newer" do
let(:testing_version) { "4.0.0" }

it "initializes StackRecorder without heap sampling support and warns" do
before do
settings.profiling.allocation_enabled = true
allow(logger).to receive(:debug)
end

it "initializes StackRecorder with heap sampling support" do
expect(Datadog::Profiling::StackRecorder).to receive(:new)
.with(hash_including(heap_samples_enabled: false, heap_size_enabled: false))
.with(hash_including(heap_samples_enabled: true, heap_size_enabled: true))
.and_call_original

expect(logger).to receive(:warn).with(/Datadog Ruby heap profiler is currently incompatible with Ruby 4/)

build_profiler_component
@ivoanjo (Member):

If we do add an extra setting to toggle the new code, this test case is still worth keeping. On the other hand, if we don't, then this test case has become redundant: since Ruby 4 is no longer special, the existing tests below already cover this situation. (The test was added exactly because there was an exception for Ruby 4.)

Comment on lines 82 to 92
#ifdef DEFERRED_HEAP_ALLOCATION_RECORDING
// A pending recording is used to defer the object_id call on Ruby 4+
// where calling rb_obj_id during on_newobj_event is unsafe.
typedef struct {
VALUE object_ref;
heap_record *heap_record;
live_object_data object_data;
} pending_recording;

#define MAX_PENDING_RECORDINGS 64
#endif
@ivoanjo (Member):

Minor: The same note about not overly ifdef'ing things from the cpu and wall collector applies here as well, I believe.

Comment on lines 522 to 553
VALUE obj = heap_recorder->pending_recordings[i].object_ref;
if (obj != Qnil) {
rb_gc_mark(obj);
}
@ivoanjo (Member) commented Jan 13, 2026:

The object can't ever be Qnil (or if it is, we have a bug...), since it was set from an allocation and Qnil is not a heap-allocated object. (It's a tagged pointer)

Comment on lines 529 to 559
if (heap_recorder->active_deferred_object != Qnil) {
rb_gc_mark(heap_recorder->active_deferred_object);
}
@ivoanjo (Member):

Minor: This one can indeed trivially be Qnil; but btw it's OK to mark Qnil, so maybe remove the branch anyway? Less code ;)
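
For reference, rb_gc_mark is a no-op for immediates such as Qnil, so the branchless version being suggested would just be:

// Safe even when active_deferred_object is Qnil: rb_gc_mark ignores
// immediates (tagged pointers) and only marks real heap objects.
rb_gc_mark(heap_recorder->active_deferred_object);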

@r1viollet (Author):

feels weird to do so ^^

@ivoanjo (Member) left a comment

OK, here are my comments from the full pass :)

I think the key change needed is to make sure we call during_sample_enter to avoid any other parts of the profiler firing in the middle of finalization.

Having said that, especially in the heap profiler I'm not a huge fan of some of the duplication -- that code is ultra-fiddly, and I worry that having complex logic duplicated across ifdefs makes it easy to forget to update one of the versions.

Comment on lines 1430 to 1440
static void finalize_heap_allocation_from_postponed_job(DDTRACE_UNUSED void *_unused) {
cpu_and_wall_time_worker_state *state = active_sampler_instance_state;

if (state == NULL) return;

if (!ddtrace_rb_ractor_main_p()) {
return;
}

// Get the heap_recorder from the thread_context_collector
heap_recorder *recorder = thread_context_collector_get_heap_recorder(state->thread_context_collector_instance);
@ivoanjo (Member):

Ah, on a second pass, there are two things missing here that can create a bit of a sharp edge:

a. We're missing during_sample_enter and during_sample_exit -- these set a flag that allows us to avoid nested operations inside the profiler. E.g. there are sharp edges along the lines of: the profiler is sampling something else -> it calls some VM API that causes the VM to check for interruptions -> the VM decides now is a really nice time to flush heap things -> our current state may not be in a consistent state. (Or the reverse -- maybe this is the function that started first, and it triggers an allocation, and the flipped situation happens.)

b. (Minor) We're missing the discrete_dynamic_sampler_before_sample and discrete_dynamic_sampler_after_sample calls to update the dynamic sampling rate mechanism. In practice, this means the work done inside this function isn't accounted for as profiler overhead. TBH what we're doing right now isn't a lot, but... maybe at least leave a comment saying "This is not being accounted for in the dynamic sampling rate update, and that's OK because the amount of work we do here is very small".

Both things happen in e.g. on_newobj_event so doing the same here should be enough.

@ivoanjo (Member):

Edit: Actually I wonder if we can even get recursion where finalize gets called -> goes into the vm -> vm calls finalize again (e.g. if there's an allocation) or something. during_sample_enter/during_sample_exit would also protect against that.
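
For illustration, a sketch of the guard being asked for, mirroring what on_newobj_event reportedly does (during_sample_enter/during_sample_exit and the during_sample flag are taken from the review comments and may not match the actual signatures):

static void finalize_heap_allocation_from_postponed_job(DDTRACE_UNUSED void *_unused) {
  cpu_and_wall_time_worker_state *state = active_sampler_instance_state;
  if (state == NULL || !ddtrace_rb_ractor_main_p()) return;
  // If another profiler operation is in flight (or we recursed into
  // ourselves via the VM), bail out rather than touch inconsistent state.
  if (state->during_sample) return;

  during_sample_enter(state);
  heap_recorder *recorder = thread_context_collector_get_heap_recorder(state->thread_context_collector_instance);
  heap_recorder_finalize_pending_recordings(recorder);
  during_sample_exit(state);
}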

Comment on lines 465 to 489
bool heap_recorder_has_pending_recordings(heap_recorder *heap_recorder) {
if (heap_recorder == NULL) {
return false;
}
return heap_recorder->pending_recordings_count > 0;
}
@ivoanjo (Member):

Minor: Since this is only for debugging, I wonder if we should just return the count instead of a boolean. Same cost for the logic, and a bit easier to debug from the Ruby side.
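
The count-returning variant being suggested could look like this (field and return types assumed):

size_t heap_recorder_pending_recordings_count(heap_recorder *heap_recorder) {
  // Same NULL-safety as the boolean version, but the caller gets the
  // actual count, which is more useful when poking at this from Ruby.
  if (heap_recorder == NULL) return 0;
  return heap_recorder->pending_recordings_count;
}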

@r1viollet (Author):

As this is style, I'll leave it to the end.

cleanup_heap_record_if_unused(heap_recorder, pending->heap_record);
object_record_free(heap_recorder, record);
}
}
@ivoanjo (Member):

This looks like an inlined version of commit_recording, possibly with the difference that the num_tracked_objects has been pre-incremented.

I think since this logic is quite fiddly, it's probably worth trying to unify them -- possibly having a flag that says "I've already incremented the object!"; or just have callers do the increment.

(For instance -- I just noticed that the new code doesn't check for overflow on num_tracked_objects and this is the kind of "oops" that can easily creep in if the logic gets too duplicated)
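
A sketch of the unification being suggested, with a flag for the pre-incremented case (commit_recording's real signature may differ, and the counter width is assumed; the overflow guard is exactly the detail the comment warns about losing):

static void commit_recording(heap_recorder *heap_recorder, heap_record *heap_record, object_record *record, bool count_already_incremented) {
  if (!count_already_incremented) {
    if (heap_record->num_tracked_objects == UINT32_MAX) {
      // Overflow: drop the recording instead of wrapping the counter.
      object_record_free(heap_recorder, record);
      return;
    }
    heap_record->num_tracked_objects++;
  }
  // ...insert record into the live-objects table keyed by obj_id...
}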

@r1viollet (Author):

tried a slight refactoring, still fiddly

Comment on lines 704 to 742
describe "pending heap recordings cleanup" do
def has_pending_recordings?
described_class::Testing.send(:_native_has_pending_heap_recordings?, stack_recorder)
end

def track_object_without_finalize(obj)
described_class::Testing._native_track_object(stack_recorder, obj, sample_rate, obj.class.name)
Datadog::Profiling::Collectors::Stack::Testing
._native_sample(Thread.current, stack_recorder, metric_values, labels, numeric_labels)
end

it "clears pending recordings after finalization" do
skip "Only applies to Ruby 4+ with deferred heap allocation recording" if RUBY_VERSION < "4"

test_object = Object.new

track_object_without_finalize(test_object)

expect(has_pending_recordings?).to be true

described_class::Testing._native_finalize_pending_heap_recordings(stack_recorder)

expect(has_pending_recordings?).to be false
end

it "clears pending recordings after multiple allocations" do
skip "Only applies to Ruby 4+ with deferred heap allocation recording" if RUBY_VERSION < "4"

3.times do
test_object = Object.new
track_object_without_finalize(test_object)
end

expect(has_pending_recordings?).to be true

described_class::Testing._native_finalize_pending_heap_recordings(stack_recorder)

expect(has_pending_recordings?).to be false
end
@ivoanjo (Member):

Minor: I'm not sure these tests add a lot... Specifically, because they don't assert on any results, it's easy for them to pass while not doing the correct thing.

Furthermore, we have existing coverage where we already check that the objects we intend to sample/track do get sampled/tracked.

So... I'd be inclined to remove these tests (and the pending_heap_recordings? helper maybe as well?).

@r1viollet (Author):

Things changed slightly; I'll let you refer to the comment above (in this file).

@r1viollet force-pushed the r1viollet/heap-profiling-4.0 branch 2 times, most recently from e966c43 to ea9eba7 on January 16, 2026 15:50
github-actions bot added the core (Involves Datadog core libraries) label on Jan 16, 2026
@r1viollet (Author) commented:

I took the major comments into account.
Next step: perf tests.
Then we can do style fixups.

@r1viollet force-pushed the r1viollet/heap-profiling-4.0 branch from ea9eba7 to 4dbdd06 on January 19, 2026 08:32

expect(has_pending_recordings?).to be true

described_class::Testing._native_finalize_pending_heap_recordings(stack_recorder)
@r1viollet (Author):

The test shows that you need these bits, which is not ideal (but I still like that the test forces me to do this).

@ivoanjo changed the title from "Heap profiling for ruby 4.x - Prototype" to "[PROF-13510] Heap profiling for ruby 4.x - Prototype" on Jan 20, 2026
r1viollet and others added 15 commits January 21, 2026 14:28
The idea is to delay the time at which we record object ids.
Once we are outside of the allocation code path, we can request the object ID.

- Avoid scheduling many postponed jobs
This fixes some of the issues we had with accuracy
Also I suspect that this has less overhead
- Avoid re-entrancy based on Ivo's comments

Although some of this code is dead code on legacy Rubies, always
compiling it in means less ifdefs spread throughout and it helps
keep the code focused on modern rubies, rather than on legacy ones.

This check is already covered by
`heap_recorder->active_recording != NULL` (they're set and unset
together).

This makes it easier to use this in tests.

…lector

This moves the logic closer to the heap profiler and helps focus the
CpuAndWallTimeWorker on what (triggering or not) and not why (doesn't
care?).

This API only became available after I rebased.

This reverts commit e153759.

(Avoid touching CHANGELOG for nicer diff)

…is needed

This will replace the more heavy-handed query in
`thread_context_collector_heap_pending_buffer_pressure`.

…filer directly

This avoids other parts of the profiler needing to care about this --
they only need to care to run the `after_sample` callback.

…ons directly

We no longer need to ask other parts of the code to raise instead :)

This probably needs adjusting for non-4.0 rubies, will do it as
a separate pass.
@ivoanjo force-pushed the r1viollet/heap-profiling-4.0 branch from c0e60ce to f04eb52 on January 21, 2026 17:10
@ivoanjo (Member) commented Jan 23, 2026

Quick side-by-side results from Ruby 3.3 vs 4.0 in our Ruby on Rails test app:

[image: side-by-side profiler screenshots, Ruby 3.3 vs 4.0]
