Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

-musl platforms do not include unwind tables for libc #134592

Open
RomanHargrave opened this issue Dec 21, 2024 · 2 comments
Open

-musl platforms do not include unwind tables for libc #134592

RomanHargrave opened this issue Dec 21, 2024 · 2 comments
Labels
C-bug Category: This is a bug. needs-triage This issue may need triage. Remove it if it has been sufficiently triaged. O-musl Target: The musl libc T-bootstrap Relevant to the bootstrap subteam: Rust's build system (x.py and src/bootstrap) T-compiler Relevant to the compiler team, which will review and decide on the PR/issue.

Comments

@RomanHargrave
Copy link

RomanHargrave commented Dec 21, 2024

In certain cases, it may be desirable to capture a backtrace from a signal handler which includes stack frames prior to signal handling.

Consider this program, which attempts to demonstrate the problem in a concise way - please note that the actual implementation is more complex, in order to deal with signal safety:

use libc::{
    c_int, c_void, getpid, getuid, pid_t, sigaction, sigemptyset, sigset_t, size_t, syscall, uid_t,
    SYS_rt_tgsigqueueinfo, SA_SIGINFO, SIGRTMIN,
};
use std::{
    backtrace::Backtrace,
    mem::MaybeUninit,
    ptr::{addr_of, addr_of_mut, null_mut},
};

// please do not use this definition of siginfo_t in live code
#[repr(C)]
struct siginfo_t {
    si_signo: c_int,
    _si_errno: c_int,
    si_code: c_int,
    si_pid: pid_t,
    si_uid: uid_t,
    si_ptr: *mut c_void,
    _si_pad: [c_int; (128 / size_of::<c_int>()) - 3],
}

unsafe extern "C" fn handle_signal(_signo: c_int, _info: *mut siginfo_t, _ucontext: *const c_void) {
    // neither of these operations are signal safe
    let bt = Backtrace::force_capture();
    dbg!(bt);
}

fn main() {
    let sa_mask = unsafe {
        let mut sa_mask = MaybeUninit::<sigset_t>::uninit();
        sigemptyset(sa_mask.as_mut_ptr());
        sa_mask.assume_init()
    };

    let sa = sigaction {
        sa_sigaction: handle_signal as size_t,
        sa_mask,
        sa_flags: SA_SIGINFO,
        sa_restorer: None,
    };

    // please do not blindly copy this code for use in real world applications, it is meant to be
    // brief and functional, not strictly correct.
    unsafe {
        let _ = sigaction(SIGRTMIN(), addr_of!(sa), null_mut());

        let mut si: siginfo_t = MaybeUninit::zeroed().assume_init();
        si.si_signo = SIGRTMIN();
        si.si_code = -1; // SI_QUEUE
        si.si_pid = getpid();
        si.si_uid = getuid();

        let _ = syscall(
            SYS_rt_tgsigqueueinfo,
            getpid(),
            getpid(),
            SIGRTMIN(),
            addr_of_mut!(si),
        );
    }
}

When built for x86_64-unknown-linux-gnu, it produces the following output:

[src/main.rs:24:5] bt = Backtrace [
    { fn: "sigtest::handle_signal", file: "./src/main.rs", line: 23 },
    { fn: "syscall" },
    { fn: "sigtest::main", file: "./src/main.rs", line: 51 },
    { fn: "core::ops::function::FnOnce::call_once", file: "/rustc/f6e511eec7342f59a25f7c0534f1dbea00d01b14/library/core/src/ops/function.rs", line: 250 },
    { fn: "std::sys::backtrace::__rust_begin_short_backtrace", file: "/rustc/f6e511eec7342f59a25f7c0534f1dbea00d01b14/library/std/src/sys/backtrace.rs", line: 154 },
    { fn: "std::rt::lang_start::{{closure}}", file: "/rustc/f6e511eec7342f59a25f7c0534f1dbea00d01b14/library/std/src/rt.rs", line: 164 },
    { fn: "core::ops::function::impls::<impl core::ops::function::FnOnce<A> for &F>::call_once", file: "/rustc/f6e511eec7342f59a25f7c0534f1dbea00d01b14/library/core/src/ops/function.rs", line: 284 },
    { fn: "std::panicking::try::do_call", file: "/rustc/f6e511eec7342f59a25f7c0534f1dbea00d01b14/library/std/src/panicking.rs", line: 554 },
    { fn: "std::panicking::try", file: "/rustc/f6e511eec7342f59a25f7c0534f1dbea00d01b14/library/std/src/panicking.rs", line: 518 },
    { fn: "std::panic::catch_unwind", file: "/rustc/f6e511eec7342f59a25f7c0534f1dbea00d01b14/library/std/src/panic.rs", line: 345 },
    { fn: "std::rt::lang_start_internal::{{closure}}", file: "/rustc/f6e511eec7342f59a25f7c0534f1dbea00d01b14/library/std/src/rt.rs", line: 143 },
    { fn: "std::panicking::try::do_call", file: "/rustc/f6e511eec7342f59a25f7c0534f1dbea00d01b14/library/std/src/panicking.rs", line: 554 },
    { fn: "std::panicking::try", file: "/rustc/f6e511eec7342f59a25f7c0534f1dbea00d01b14/library/std/src/panicking.rs", line: 518 },
    { fn: "std::panic::catch_unwind", file: "/rustc/f6e511eec7342f59a25f7c0534f1dbea00d01b14/library/std/src/panic.rs", line: 345 },
    { fn: "std::rt::lang_start_internal", file: "/rustc/f6e511eec7342f59a25f7c0534f1dbea00d01b14/library/std/src/rt.rs", line: 143 },
    { fn: "std::rt::lang_start", file: "/rustc/f6e511eec7342f59a25f7c0534f1dbea00d01b14/library/std/src/rt.rs", line: 163 },
    { fn: "main" },
    { fn: "__libc_start_main" },
    { fn: "_start" },
]

When built for x86_64-unknown-linux-musl, this is the output:

[src/main.rs:24:5] bt = Backtrace [
    { fn: "sigtest::handle_signal", file: "./src/main.rs", line: 23 },
]

Observe that at one point the backtrace walks through syscall - while in this instance this is because the same thread issuing the syscall receives the signal, it is conceivable - and quite likely - that any arbitrary thread may be caught making a syscall or otherwise inside of libc if some other thread were to send a signal for such purpose.

This is caused by a combination of two similar but technically distinct problems:

  1. musl's signal trampoline does not have CFI annotations - this can be worked around by writing one's own trampoline and making the rt_sigaction syscall directly.
  2. musl, by default, does not include exception handling information in non-debug builds:
#
# Modern GCC wants to put DWARF tables (used for debugging and
# unwinding) in the loaded part of the program where they are
# unstrippable. These options force them back to debug sections (and
# cause them not to get generated at all if debugging is off).
#
tryflag CFLAGS_AUTO -fno-unwind-tables
tryflag CFLAGS_AUTO -fno-asynchronous-unwind-tables

I have not tracked down the build configuration for the musl platform; however, the second issue should be easily addressable in one of two ways:

  1. If musl is already being built, change the build flags for musl such that it always includes unwind tables
  2. Otherwise, begin building the copy of musl which is to be redistributed such that it may include unwind tables

Unwind tables are not included in musl because - as best I can guess - there is concern over memory utilization in extreme environments , and because it is further assumed that unwinding through libc is an unlikely case as this will most likely occur when unwinding from a signal handler - already uncommon - and only when the application is one that does unwinding which, while common, is still a subset of libc's consumers.

As backtraces are a first-class component of Rust's error handling design, and because this functionality works as intended and expected on -gnu - a Tier 1 platform - it seems reasonable to correct the behavior on -musl, particularly if it is indeed as straightforward as building without two flags. For users who require to exclude as much as is necessary from compiled artifacts, including unwind tables and the unwinder, it is still possible to strip the exception handling information at a later time - strip just needs to be configured to remove .eh_frame or .debug_info.

I can do the work if someone could point me in the direction of build configuration.

@RomanHargrave RomanHargrave added the C-bug Category: This is a bug. label Dec 21, 2024
@rustbot rustbot added the needs-triage This issue may need triage. Remove it if it has been sufficiently triaged. label Dec 21, 2024
@RomanHargrave RomanHargrave changed the title Musl does not include unwind tables -musl platforms do not include unwind tables for libc Dec 21, 2024
@workingjubilee workingjubilee added T-compiler Relevant to the compiler team, which will review and decide on the PR/issue. O-musl Target: The musl libc T-release Relevant to the release subteam, which will review and decide on the PR/issue. T-bootstrap Relevant to the bootstrap subteam: Rust's build system (x.py and src/bootstrap) and removed T-release Relevant to the release subteam, which will review and decide on the PR/issue. labels Dec 21, 2024
@hanna-kruppe
Copy link
Contributor

Capturing a stack trace from code running in a signal handler is not a great motivating example: as the comment in the example code notes, it’s not async-signal-safe, i.e., doing it from a signal handler is unsound. Better use cases do the stack walking externally, e.g., during interactive debugging, as part of a sampling profiler, or when analyzing coredumps. This requires the same changes to the libc (frame pointers or CFI).

@RomanHargrave
Copy link
Author

RomanHargrave commented Dec 21, 2024

Capturing a stack trace from code running in a signal handler is not a great motivating example: as the comment in the example code notes, it’s not async-signal-safe, i.e., doing it from a signal handler is unsound. Better use cases do the stack walking externally, e.g., during interactive debugging, as part of a sampling profiler, or when analyzing coredumps. This requires the same changes to the libc (frame pointers or CFI).

The given example uses the unsound approach because it is concise; however the basic libunwind _Unw_Backtrace is signal safe - by allocating space to store the backtrace in the instigating thread and passing it to the signal handler via siginfo, the backtrace can be captured safely, and then resolved to symbols by the instigating thread. I specifically chose to use this example because it demonstrates a scenario wherein the bundled libunwind has to unwind through libc, and the issue arises from the bundled libunwind being reliant upon exception handling information to do its job. Interactive debuggers are far more capable than libunwind, and seem to have no trouble handling missing CFI, which I would guess is done by analyzing code around the IP to figure out the stack layout. Interactive debuggers or even coredumps are not always an option in our case, and (as you point out) don't change anything about the lack of exception handling information.

Nonetheless, that is all discussion of the real-world applicability of the example. I suppose my point is that it best illustrates the symptoms of the problem while being as simplistic as possible. It may be possible to bring out this behavior with setcontext as well, but I have not looked into it.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
C-bug Category: This is a bug. needs-triage This issue may need triage. Remove it if it has been sufficiently triaged. O-musl Target: The musl libc T-bootstrap Relevant to the bootstrap subteam: Rust's build system (x.py and src/bootstrap) T-compiler Relevant to the compiler team, which will review and decide on the PR/issue.
Projects
None yet
Development

No branches or pull requests

4 participants