Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Assembly for Rc::<str>::default() looks quite inefficient #135784

Open
jhorstmann opened this issue Jan 20, 2025 · 1 comment
Open

Assembly for Rc::<str>::default() looks quite inefficient #135784

jhorstmann opened this issue Jan 20, 2025 · 1 comment
Labels
C-optimization Category: An issue highlighting optimization opportunities or PRs implementing such T-compiler Relevant to the compiler team, which will review and decide on the PR/issue.

Comments

@jhorstmann
Copy link
Contributor

I tried this code:

#[no_mangle]
pub fn rc_str() -> std::rc::Rc<str> {
    std::rc::Rc::<str>::default()
}

I expected this to allocate with a known layout and and handle potential allocation errors.

Instead, before the allocation, the assembly contains not one, but two calls to rc_inner_layout_for_value_layout (indirect via call r15) (compiler explorer):

rc_str:
        push    r15
        push    r14
        push    rbx
        mov     r15, qword ptr [rip + alloc::rc::rc_inner_layout_for_value_layout::h02e271ca20ae9f03@GOTPCREL]
        mov     edi, 1
        xor     esi, esi
        call    r15
        mov     rbx, rax
        mov     r14, rdx
        mov     edi, 1
        xor     esi, esi
        call    r15
        test    rdx, rdx
        je      .LBB0_2
        mov     rcx, qword ptr [rip + __rust_no_alloc_shim_is_unstable@GOTPCREL]
        movzx   ecx, byte ptr [rcx]
        mov     rdi, rdx
        mov     rsi, rax
        call    qword ptr [rip + __rust_alloc@GOTPCREL]
.LBB0_2:
        test    rax, rax
        je      .LBB0_4
        mov     qword ptr [rax], 1
        mov     qword ptr [rax + 8], 1
        xor     edx, edx
        pop     rbx
        pop     r14
        pop     r15
        ret
.LBB0_4:
        mov     rdi, rbx
        mov     rsi, r14
        call    qword ptr [rip + alloc::alloc::handle_alloc_error::ha0642996a160fa99@GOTPCREL]

The two calls seem to correspond to the source code (line 257 and 2076), but I would have expected the compiler to be smart enough to see those should be the same, and even to compute that value at compile time.

Meta

rustc --version --verbose:

rustc 1.86.0-nightly (6067b3631 2025-01-17)
binary: rustc
commit-hash: 6067b36314ab5eb2eb47cecc464545ba58e1ad24
commit-date: 2025-01-17
host: x86_64-unknown-linux-gnu
release: 1.86.0-nightly
LLVM version: 19.1.7
Compiler returned: 0

@jhorstmann jhorstmann added the C-bug Category: This is a bug. label Jan 20, 2025
@rustbot rustbot added the needs-triage This issue may need triage. Remove it if it has been sufficiently triaged. label Jan 20, 2025
@jieyouxu jieyouxu added T-compiler Relevant to the compiler team, which will review and decide on the PR/issue. C-optimization Category: An issue highlighting optimization opportunities or PRs implementing such and removed C-bug Category: This is a bug. needs-triage This issue may need triage. Remove it if it has been sufficiently triaged. labels Jan 21, 2025
@Kijewski
Copy link
Contributor

[u8] has the same layout as str, doesn't it? Then this should be sane:

pub fn rc_str_from_bytes() -> Rc<str> {
    let rc = Rc::<[u8]>::default();
    unsafe { Rc::from_raw(Rc::into_raw(rc) as *const [u8] as *const str) }
}

It produces only one call (for the non-OOM case) instead of two: https://godbolt.org/z/aMGr4eqxT

example::rc_str_from_bytes::hba4560ddf8c19a90:
        push    rax
        mov     rax, qword ptr [rip + __rust_no_alloc_shim_is_unstable@GOTPCREL]
        movzx   eax, byte ptr [rax]
        mov     edi, 16
        mov     esi, 8
        call    qword ptr [rip + __rust_alloc@GOTPCREL]
        test    rax, rax
        je      .LBB1_2
        mov     qword ptr [rax], 1
        mov     qword ptr [rax + 8], 1
        xor     edx, edx
        pop     rcx
        ret
.LBB1_2:
        mov     edi, 8
        mov     esi, 16
        call    qword ptr [rip + alloc::alloc::handle_alloc_error::ha0547c441587f574@GOTPCREL]

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
C-optimization Category: An issue highlighting optimization opportunities or PRs implementing such T-compiler Relevant to the compiler team, which will review and decide on the PR/issue.
Projects
None yet
Development

No branches or pull requests

4 participants