Skip to content

Commit

Permalink
fix(profiling): reset all profiling c++ mutexes on fork (#11768)
Browse files Browse the repository at this point in the history
I'm not sure why it took so long to surface this defect, but it turns
out that stack v2 can deadlock applications because not all mutices are
reset.

The repro in #11762 appears to be pretty durable. I need to investigate
it a bit more in order to distill it down into a native stress test we
can use moving forward. In practice, this patch suppresses the noted
behavior in the repro.

## Checklist
- [X] PR author has checked that all the criteria below are met
- The PR description includes an overview of the change
- The PR description articulates the motivation for the change
- The change includes tests OR the PR description describes a testing
strategy
- The PR description notes risks associated with the change, if any
- Newly-added code is easy to change
- The change follows the [library release note
guidelines](https://ddtrace.readthedocs.io/en/stable/releasenotes.html)
- The change includes or references documentation updates if necessary
- Backport labels are set (if
[applicable](https://ddtrace.readthedocs.io/en/latest/contributing.html#backporting))

## Reviewer Checklist
- [x] Reviewer has checked that all the criteria below are met 
- Title is accurate
- All changes are related to the pull request's stated goal
- Avoids breaking
[API](https://ddtrace.readthedocs.io/en/stable/versioning.html#interfaces)
changes
- Testing strategy adequately addresses listed risks
- Newly-added code is easy to change
- Release note makes sense to a user of the library
- If necessary, author has acknowledged and discussed the performance
implications of this PR as reported in the benchmarks PR comment
- Backport labels are set in a manner that is consistent with the
[release branch maintenance
policy](https://ddtrace.readthedocs.io/en/latest/contributing.html#backporting)

---------

Co-authored-by: Taegyun Kim <[email protected]>
  • Loading branch information
sanchda and taegyunkim authored Dec 19, 2024
1 parent 099247e commit d855c4a
Show file tree
Hide file tree
Showing 10 changed files with 25 additions and 8 deletions.
Original file line number Diff line number Diff line change
Expand Up @@ -43,6 +43,8 @@ class UploaderBuilder
static void set_output_filename(std::string_view _output_filename);

static std::variant<Uploader, std::string> build();

static void postfork_child();
};

} // namespace Datadog
Original file line number Diff line number Diff line change
Expand Up @@ -14,9 +14,8 @@ namespace Datadog {
void
Datadog::CodeProvenance::postfork_child()
{
get_instance().mtx.~mutex(); // Destroy the mutex
// NB placement-new to re-init and leak the mutex because doing anything else is UB
new (&get_instance().mtx) std::mutex(); // Recreate the mutex
get_instance().reset(); // Reset the state
}

void
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -24,6 +24,7 @@ ddup_postfork_child()
Datadog::Uploader::postfork_child();
Datadog::SampleManager::postfork_child();
Datadog::CodeProvenance::postfork_child();
Datadog::UploaderBuilder::postfork_child();
}

void
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -203,6 +203,6 @@ Datadog::Profile::collect(const ddog_prof_Sample& sample, int64_t endtime_ns)
void
Datadog::Profile::postfork_child()
{
profile_mtx.unlock();
new (&profile_mtx) std::mutex();
cycle_buffers();
}
Original file line number Diff line number Diff line change
Expand Up @@ -408,7 +408,6 @@ Datadog::Sample::push_absolute_ns(int64_t _timestamp_ns)
return true;
}


bool
Datadog::Sample::push_monotonic_ns(int64_t _monotonic_ns)
{
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -175,5 +175,6 @@ Datadog::Uploader::postfork_parent()
void
Datadog::Uploader::postfork_child()
{
unlock();
// NB placement-new to re-init and leak the mutex because doing anything else is UB
new (&upload_lock) std::mutex();
}
Original file line number Diff line number Diff line change
Expand Up @@ -186,3 +186,10 @@ Datadog::UploaderBuilder::build()

return Datadog::Uploader{ output_filename, ddog_exporter };
}

void
Datadog::UploaderBuilder::postfork_child()
{
// NB placement-new to re-init and leak the mutex because doing anything else is UB
new (&tag_mutex) std::mutex();
}
5 changes: 5 additions & 0 deletions ddtrace/internal/datadog/profiling/stack_v2/src/sampler.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -67,6 +67,11 @@ _stack_v2_atfork_child()
// so we don't even reveal this function to the user
_set_pid(getpid());
ThreadSpanLinks::postfork_child();

// `thread_info_map_lock` and `task_link_map_lock` are global locks held in echion
// NB placement-new to re-init and leak the mutex because doing anything else is UB
new (&thread_info_map_lock) std::mutex;
new (&task_link_map_lock) std::mutex;
}

__attribute__((constructor)) void
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -53,10 +53,8 @@ ThreadSpanLinks::reset()
void
ThreadSpanLinks::postfork_child()
{
// Explicitly destroy and reconstruct the mutex to avoid undefined behavior
get_instance().mtx.~mutex();
// NB placement-new to re-init and leak the mutex because doing anything else is UB
new (&get_instance().mtx) std::mutex();

get_instance().reset();
}

Expand Down
Original file line number Diff line number Diff line change
@@ -0,0 +1,5 @@
---
fixes:
- |
profiling: Fixes a bug where profiling mutexes were not cleared on fork in the child process. This could
cause deadlocks in certain configurations.

0 comments on commit d855c4a

Please sign in to comment.