Extra bitwise operations break store merging, resulting in very slow code #156316

@rdoeffinger

Normally, storing a 64-bit value "by hand", byte by byte, into a byte array is recognized and results in a single store (see the code generated for write_good below).
However, doing some integer operations on the value first breaks that optimization, resulting in 17 to 25 instructions instead of 3 to 4.
I originally assumed it was due to the shift merging, but it also happens without the shift.
Checked on both aarch64 and x86_64.
gcc has handled both cases fine since gcc 8.

void write_bad(unsigned char *buffer, unsigned char a, unsigned long long b)
{
    unsigned long long v = (a & 0xf) | (b << 5);
    buffer[0] = v;
    buffer[1] = v >> 8;
    buffer[2] = v >> 16;
    buffer[3] = v >> 24;
    buffer[4] = v >> 32;
    buffer[5] = v >> 40;
    buffer[6] = v >> 48;
    buffer[7] = v >> 56;
}

void write_good(unsigned char *buffer, unsigned long long v)
{
    buffer[0] = v;
    buffer[1] = v >> 8;
    buffer[2] = v >> 16;
    buffer[3] = v >> 24;
    buffer[4] = v >> 32;
    buffer[5] = v >> 40;
    buffer[6] = v >> 48;
    buffer[7] = v >> 56;
}
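
A possible workaround until the pattern is recognized, sketched under assumptions: compute v once and copy it with memcpy, which compilers commonly lower to a single 8-byte store. The names write_bytewise, write_memcpy and stores_match are hypothetical and not part of the report. The two versions only produce the same bytes on a little-endian host with a 64-bit unsigned long long, which should hold for typical x86_64 and aarch64 targets.

```c
#include <assert.h>
#include <string.h>

/* Same store pattern as write_bad in the report, written as a loop:
 * byte i of the buffer receives bits 8*i .. 8*i+7 of v. */
void write_bytewise(unsigned char *buffer, unsigned char a, unsigned long long b)
{
    unsigned long long v = (a & 0xf) | (b << 5);
    for (int i = 0; i < 8; i++)
        buffer[i] = (unsigned char)(v >> (8 * i));
}

/* Hypothetical workaround: copy the whole value at once.
 * ASSUMPTION: little-endian host and 64-bit unsigned long long;
 * on a big-endian host the byte order would differ. */
void write_memcpy(unsigned char *buffer, unsigned char a, unsigned long long b)
{
    unsigned long long v = (a & 0xf) | (b << 5);
    memcpy(buffer, &v, sizeof v);
}

/* Returns 1 if both versions wrote identical bytes, 0 otherwise. */
int stores_match(unsigned char a, unsigned long long b)
{
    unsigned char x[8], y[8];
    write_bytewise(x, a, b);
    write_memcpy(y, a, b);
    return memcmp(x, y, sizeof x) == 0;
}
```

Calling stores_match(0xAB, 0x0123456789abcdefULL) on a little-endian machine is a quick way to confirm the two versions agree before swapping one for the other. This only sidesteps the missed optimization; it does not explain it.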

Generated code according to godbolt.org:

write_bad:
        and     sil, 15
        mov     eax, edx
        shl     eax, 5
        or      al, sil
        mov     byte ptr [rdi], al
        mov     eax, edx
        shr     eax, 3
        mov     byte ptr [rdi + 1], al
        mov     eax, edx
        shr     eax, 11
        mov     byte ptr [rdi + 2], al
        mov     eax, edx
        shr     eax, 19
        mov     byte ptr [rdi + 3], al
        mov     rax, rdx
        shr     rax, 27
        mov     byte ptr [rdi + 4], al
        mov     rax, rdx
        shr     rax, 35
        mov     byte ptr [rdi + 5], al
        mov     rax, rdx
        shr     rax, 43
        mov     byte ptr [rdi + 6], al
        shr     rdx, 51
        mov     byte ptr [rdi + 7], dl
        ret

write_good:
        mov     qword ptr [rdi], rsi
        ret
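
To read the write_bad listing: the shift counts 3, 11, 19, ..., 51 are 8*k - 5, which suggests that each `(b << 5) >> (8*k)` was folded into a single `b >> (8*k - 5)` per byte, with `a & 0xf` only contributing to byte 0. The sketch below checks that this per-byte form matches the source expression. The function names are made up for illustration, and it assumes a 64-bit unsigned long long.

```c
#include <assert.h>

/* Byte k of v = (a & 0xf) | (b << 5), computed as in the C source. */
unsigned char byte_from_source(unsigned char a, unsigned long long b, int k)
{
    unsigned long long v = (a & 0xf) | (b << 5);
    return (unsigned char)(v >> (8 * k));
}

/* Byte k computed the way the listed assembly does it:
 * byte 0 combines the low nibble of a with the low byte of b << 5,
 * bytes 1..7 are a single right shift of b by 8*k - 5. */
unsigned char byte_from_asm(unsigned char a, unsigned long long b, int k)
{
    if (k == 0)
        return (unsigned char)((a & 0xf) | (unsigned char)(b << 5));
    return (unsigned char)(b >> (8 * k - 5));
}

/* Asserts that both forms agree for all eight bytes. */
void check_folded_shifts(unsigned char a, unsigned long long b)
{
    for (int k = 0; k < 8; k++)
        assert(byte_from_source(a, b, k) == byte_from_asm(a, b, k));
}
```

So each byte store appears to get its own independent shift of b, and the eight stores no longer look like slices of one common 64-bit value. That may be why they are not merged, but this is a guess from the output, not something verified in the compiler.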
