-
Notifications
You must be signed in to change notification settings - Fork 628
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
MIPS32 support / alignment issues #3761
Comments
Okay, I did a lot more digging into this today. It turns out that it's not just MIPS32 that's having problems. There are a number of pointer misalignment issues within WAMR that I believe are causing runtime issues. Two examples to note are that tables are not aligned correctly (easily fixable), and the CSP related stuff (involving |
Ref: #2349 |
Possible ref: #2136 |
Hi, thanks for reporting the issue, I tried building iwasm under Ubuntu x86-64 with clang and enabled the sanitizer BTW, I have no mips environment to test the case, could you submit a PR to fix issues found? Thanks. |
For MIPS, a simple case is to just use the Zig toolchain to cross compile
and then run under qemu-mips. This is actually sufficient to trigger the
behaviour in question. You can use a build line like this:
CC="zig cc -target mips-linux-musl" cmake .. -DCMAKE_C_FLAGS="-g -Og -pie
-fPIE" -DCMAKE_BUILD_TYPE=Debug -DWAMR_BUILD_SHARED=0 -DWAMR_BUILD_INTERP=1
-DWAMR_BUILD_FAST_INTERP=0 -DWAMR_BUILD_JIT=0 -DWAMR_BUILD_FAST_JIT=0
-DWAMR_BUILD_TARGET=MIPS
Simply run the resulting iwasm with qemu:
qemu-mips iwasm hello_wasi.wasm
But actually, I can reliably produce assertion failures on x86_64 with the
following:
`CC="zig cc -target x86_64-linux-musl" cmake .. -DCMAKE_C_FLAGS="-g -Og
-pie -fPIE" -DCMAKE_BUILD_TYPE=Debug -DWAMR_BUILD_SHARED=0
-DWAMR_BUILD_INTERP=1 -DWAMR_BUILD_FAST_INTERP=0 -DWAMR_BUILD_JIT=0
-DWAMR_BUILD_FAST_JIT=0 -DWAMR_BUILD_TARGET=X86_64`
This one is a little more salient as it seems to build with assertions for
checking alignment of pointers. In particular, the CSP related code in the
interpreter is suspect:
https://github.com/bytecodealliance/wasm-micro-runtime/blob/fed0fe953ca32894f51179c97d3dcc9175456a37/core/iwasm/interpreter/wasm_interp_classic.c#L6640-L6643
Here, a value on the WASM stack is being treated as a raw pointer to a
WASMBranchBlock which on X86_64 has an alignment requirement of 8.
However, no such guarantee is provided by the stack. True, x86_64 will
handle this just fine, but there's no guarantee the compiler will.
Another example is in how tables are allocated in the runtime:
https://github.com/bytecodealliance/wasm-micro-runtime/blob/fed0fe953ca32894f51179c97d3dcc9175456a37/core/iwasm/interpreter/wasm_runtime.c#L2377-L2378
The definitions are as follows:
https://github.com/bytecodealliance/wasm-micro-runtime/blob/fed0fe953ca32894f51179c97d3dcc9175456a37/core/iwasm/interpreter/wasm_runtime.h#L377C12-L379C38
So, `global_data` is only a uint8_t* , and global_data_size does not appear
to have any guarantees about whether it is a multiple of
`align(WASMTableInstance)` (which must be
aligned to at least 8 on x86_64 due to the presence of pointers in the
type). Therefore `first_table` may not be suitably aligned. This one is
simpler to fix by just aligning it
and ensuring the allocation has enough room for it.
Hope this helps!
…On Tue, Sep 3, 2024 at 11:34 PM Wenyong Huang ***@***.***> wrote:
Okay, I did a lot more digging into this today. It turns out that it's not
just MIPS32 that's having problems. There are a number of pointer
misalignment issues within WAMR that I believe are causing runtime issues.
Two examples to note are that tables are not aligned correctly (easily
fixable), and the CSP related stuff (involving frame->sp_boundary) on the
WASM stack isn't aligned properly. That one seems a little less obvious to
fix. There still may be some endianness issues at play here too but the
pointer misalignment is definitely a higher priority to fix. I noticed that
the CI for the repo largely uses 32-bit x86 and RISC-V. The problems will
be much more apparent on platforms where sizeof(int) != sizeof(void*). On
x86_64, when I built in Debug mode on Clang 18, it seems that CMake enabled
some sanitizers which was enough to trigger the issue.
Hi, thanks for reporting the issue, I tried building iwasm under Ubuntu
x86-64 with clang and enabled the sanitizer ubsan: cmake ..
-DWAMR_BUILD_SANITIZER=ubsan, and then tested some standalone cases, but
no sanitizer error was reported. Could you tell us how to reproduce the
issue?
BTW, I have no mips environment to test the case, could you submit a PR to
fix issues found? Thanks.
—
Reply to this email directly, view it on GitHub
<#3761 (comment)>,
or unsubscribe
<https://github.com/notifications/unsubscribe-auth/AB3VLDEZVGJEEJVUG2OKSDTZUZ5S3AVCNFSM6AAAAABNODZJ2CVHI2DSMVQWIX3LMV43OSLTON2WKQ3PNVWWK3TUHMZDGMRXHA2DKOJTGI>
.
You are receiving this because you authored the thread.Message ID:
***@***.***>
|
Hi, I tried to fix the issue with below patch, could you test whether it works for you: |
Hi, thanks for your time and response. I read the diff and it does go some
way towards solving the problem, but I have concerns about the
UINT64_MAX == UINTPTR_MAX && WASM_CPU_SUPPORTS_UNALIGNED_ADDR_ACCESS`
logic. The problem is that C does not support misaligned pointers at all,
even if the processor does.
It would be simpler and more correct to have one codepath, where all
pointers are suitably aligned for the
given architecture (i.e. remove the WASM_CPU_SUPPORTS_UNALIGNED_ADDR_ACCESS
define entirely).
In the case where things must be unaligned, or where alignment cannot be
guaranteed,
`memcpy` is useful to sidestep these restrictions. If the concern is
performance, the optimizer will fortunately optimize
all of it away into the code that you want anyways (you can use
__builtin_memcpy if you want to be absolutely sure).
For example, suppose a memcpy is used to copy a value into a variable with
higher alignment requirements, i.e. a uint64_t from the WASM stack. On an
architecture
like x86_64, this will optimize into one `mov` instruction no matter what.
On architectures that don't support
unaligned loads, this will break down into two 32-bit moves which is what
would be required for correctness anyway.
…On Thu, Sep 5, 2024 at 2:00 AM Wenyong Huang ***@***.***> wrote:
Hi, I tried to fix the issue with below patch, could you test whether it
works for you:
fix_unaligned_access.zip
<https://github.com/user-attachments/files/16884624/fix_unaligned_access.zip>
—
Reply to this email directly, view it on GitHub
<#3761 (comment)>,
or unsubscribe
<https://github.com/notifications/unsubscribe-auth/AB3VLDHKUGBXTJ22NLR3GNDZU7XQVAVCNFSM6AAAAABNODZJ2CVHI2DSMVQWIX3LMV43OSLTON2WKQ3PNVWWK3TUHMZDGMZQGY3DOMBSGI>
.
You are receiving this because you authored the thread.Message ID:
***@***.***>
|
The WASM_CPU_SUPPORTS_UNALIGNED_ADDR_ACCESS is only enabled in x86-64, x86-32 and aarch64 by default, and it helps reduce the footprint in some situations, e.g., for generate less pre-compiled code in fast interpreter mode. Developer can also select whether to enable it or not, I don't think we had better remove it entirely. In the patch for mips32, you can remove it if you want. |
Unfortunately this is still not correct even on X86_64. For example, on
`-march=generic` for x86_64, it would be legal for the compiler to emit
`movaps` instructions if the type had an alignment requirement of 16 bytes
or higher. This would produce a fault if this condition was not met. Can
you provide some examples or ideally numbers on how much this reduces the
memory footprint of WAMR? I believe that if memory is truly that
constrained on x86-64, x86-32, or aarch64 that a memcpy would be sufficient
to ensure correct operation, even if it would potentially be slower (recall
that in the case where the pointer is correctly aligned, the memcpy would
be elided). The problem is that the standard requires that all pointers be
suitably aligned and this fact can be used by optimization passes, which
will lead to subtle bugs. Fortunately, this is fixable and is not a
fundamental part of the design of WAMR.
…On Fri, Sep 6, 2024 at 2:44 AM Wenyong Huang ***@***.***> wrote:
The WASM_CPU_SUPPORTS_UNALIGNED_ADDR_ACCESS is only enabled in x86-64,
x86-32 and aarch64 by default, and it helps reduce the footprint in some
situations, e.g., for generate less pre-compiled code in fast interpreter
mode. Developer can also select whether to enable it or not, I don't think
we had better remove it entirely. In the patch for mips32, you can remove
it if you want.
—
Reply to this email directly, view it on GitHub
<#3761 (comment)>,
or unsubscribe
<https://github.com/notifications/unsubscribe-auth/AB3VLDARWNSLPDAPVRLBY63ZVFFOLAVCNFSM6AAAAABNODZJ2CVHI2DSMVQWIX3LMV43OSLTON2WKQ3PNVWWK3TUHMZDGMZTGM2TQNBWGA>
.
You are receiving this because you authored the thread.Message ID:
***@***.***>
|
I did a test for the coremark standalone case, disabling the macro while keeping storing the label address in the pre-compiled code (that means the label address will be stored at 8-byte align address and there may be unused paddings before it) will increase the pre-compiled size from 115925 bytes to 134596 bytes, and interpreter has to align to 8-byte aligned address before get the label address, which may impact the performance. Since it increases the code size a lot, currently in 64-bit when the macro is disabled, we store the offset of two labels instead (dest label - base label) and a add operation is added to calculate the label address after reading the offset, which also impacts the performance but gets better footprint. The wasm runtime itself doesn't uses The point is that the hardware supports the unaligned access operations and we can control it and it benefits to the developers, is it better to keep the capability? And runtime has provided the option to disable it. |
Hi Wenyong,
Thanks for your response, indeed a 16% change in footprint is higher than I
expected.
The point is that the hardware supports the unaligned access operations
and we can control it and it benefits the developers, why must we limit the
capability? And runtime has provided the option to disable it.
I suppose my main objection is that you can't really control it because the
compiler is free to interpret the code how it wants, even if you observe it
working now there is no guarantee it will work in all cases and on all
toolchains in the future, and the results can be surprising when the code
doesn't generate how you expect. Personally, I think the risks outweigh
the benefits (since there are now two code paths to support and UB is on by
default for some platforms) but I'll leave it to you to decide what's best
for your project. It is unfortunate that there's no way to signal to clang
or gcc to assume that a struct is not aligned other than by using the
packed attribute, which has its own problems here (can't take the address
of a member).
…On Mon, Sep 9, 2024 at 3:59 AM Wenyong Huang ***@***.***> wrote:
I did a test for the coremark standalone case, disabling the macro while
keeping storing the label address in the pre-compiled code (that means the
label address will be stored at 8-byte align address and there may be
unused paddings before it) will increase the pre-compiled size from 115925
bytes to 134596 bytes, and interpreter has to align to 8-byte aligned
address before get the label address, which may impact the performance.
Since it increases the code size a lot, currently in 64-bit when the macro
is disabled, we store the offset of two labels instead (dest label - base
label) and a add operation is added to calculate the label address after
reading the offset, which also impacts the performance but gets better
footprint.
The wasm runtime itself doesn't uses movaps, we didn't find that the
gcc/clang generates movaps instruction after compiling wamr runtime. And
in the generated AOT/JIT code, the movapu is emitted instead if the
address may be not 16-byte aligned.
The point is that the hardware supports the unaligned access operations
and we can control it and it benefits the developers, why must we limit the
capability? And runtime has provided the option to disable it.
—
Reply to this email directly, view it on GitHub
<#3761 (comment)>,
or unsubscribe
<https://github.com/notifications/unsubscribe-auth/AB3VLDAQLL3L3AF5UTSBO7LZVVIO5AVCNFSM6AAAAABNODZJ2CVHI2DSMVQWIX3LMV43OSLTON2WKQ3PNVWWK3TUHMZDGMZXGM4TQNRSGE>
.
You are receiving this because you authored the thread.Message ID:
***@***.***>
|
Hello WAMR team,
Thanks for maintaining such a cool project. I have a MIPS related problem I was hoping you could help me with.
How well is MIPS32 supported in WAMR? I'm having a hard time getting a simple "Hello world" Rust wasip1 application running on mips32 (big endian). I'm noticing that on x86[-64], arm, and aarch64 everything seems to work fine, but on mips32, I'm getting an "undefined element" exception in both the Classic and Fast interpreter modes:
wasm-micro-runtime/core/iwasm/interpreter/wasm_interp_fast.c
Line 1705 in eab409a
and similarly for the classic interpreter here:
wasm-micro-runtime/core/iwasm/interpreter/wasm_interp_classic.c
Line 2340 in eab409a
What seems to be happening is the "val" variable in both cases is being dereferenced to the value
0x80000000
which is far outside of the table range in the Rust example. It seems this has to do with the handling of thecall_indirect
function, where it looks up from the wasm table where the call target will be.I wonder, is this an endian related issue? I haven't debugged it further to see what
val
should be on a normal execution.Now, Zig's "hello world" application seems to work just fine with target triple
wasm32-wasi-musl
. It seems to only feature onecall_indirect
.In addition to this, mips64 seems to SIGBUS on a misaligned access when running the Rust hello world (although not directly relevant to my use case). It seems I can't attach the .wasm here directly, but it is simple enough to reproduce: Just
cargo init
and then build for Rust usingcargo build --target wasm32-wasip1
.Thank you very much!
The text was updated successfully, but these errors were encountered: