For architectures that select USE_SWITCH (with the exception of ARM64, as it is excluded from validating the spinlock in do_swap()), logging in a spinlock-held context can cause a recurring exception.
The point of that assert is to catch a case where something can reach a cooperative context switch while holding a nested lock. The z_swap() API family takes locks that get released atomically on suspend (i.e. it's a condition variable), but the framework checks that the key passed will act to unmask interrupts entirely. If it won't, that means that the lock passed is an inner lock inside some other spinlock.
And if we context switch, we'll (by definition) break that outer lock and allow unlocked code to run and interrupts to be serviced, which the outer context was promised wouldn't happen. It's a bad bug, thus the assertion.
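To make the nested-lock check concrete, here is a toy model in plain C. It is only a sketch and not Zephyr code: every `toy_*` name and the `irqs_enabled` flag are made up for illustration, and the only thing it assumes is what is described above, namely that a lock key records whether interrupts were unmasked when the lock was taken, and that a swap is only legal with a key that would unmask them entirely.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model only. Nothing here is a real Zephyr API. */

/* Simulated interrupt mask state of a single CPU. */
static bool irqs_enabled = true;

/* Hypothetical lock key: remembers the mask state seen at lock time. */
typedef struct {
    bool irqs_were_enabled;
} toy_key_t;

/* Take a lock: save the current mask state in the key, then mask. */
static toy_key_t toy_lock(void)
{
    toy_key_t key = { .irqs_were_enabled = irqs_enabled };

    irqs_enabled = false;
    return key;
}

/* Release a lock: restore whatever state the key recorded. */
static void toy_unlock(toy_key_t key)
{
    irqs_enabled = key.irqs_were_enabled;
}

/*
 * Model of the check: a context switch is only allowed with a key
 * that would unmask interrupts entirely when released. A key taken
 * while another lock was already held fails this test.
 */
static bool toy_swap_allowed(toy_key_t key)
{
    return key.irqs_were_enabled;
}

/* Walk through the outer/inner case; returns 1 when done. */
static int toy_demo(void)
{
    toy_key_t outer = toy_lock();      /* taken with interrupts unmasked */
    assert(toy_swap_allowed(outer));   /* outermost key: swap is fine */

    toy_key_t inner = toy_lock();      /* taken while outer is held */
    assert(!toy_swap_allowed(inner));  /* nested key: the assert case */

    toy_unlock(inner);
    assert(!irqs_enabled);             /* outer lock still masks */
    toy_unlock(outer);
    assert(irqs_enabled);              /* fully unlocked again */
    return 1;
}
```

In this model, swapping with `inner` is the case the assertion is meant to catch: releasing it would not unmask anything, so suspending there would silently break the promise made to whoever holds `outer`.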
This was a routine goof inside the kernel when doing the original SMP work. I'm at a loss for how logging would trip over it though. Can you work up a call tree that shows the problem?
Also: are all the extra delayed threads and the while loop in printf_fn() needed to show the bug? I'm a little confused as to how they're involved.
Oh! I just saw that there's a partial stack trace in the report. It's showing a k_sleep() being invoked from main, presumably from inside LOG_INF() somewhere. Yeah, that's illegal. You can't sleep inside a lock, for obvious reasons. Why does logging need to sleep?
Describe the bug
For architectures that select USE_SWITCH (with the exception of ARM64, as it is excluded from validating the spinlock in do_swap()), logging in a spinlock-held context can cause a recurring exception.
The error happens when:
- CONFIG_LOG=y,
- CONFIG_SPIN_VALIDATE=y,
- CONFIG_ASSERT=y, and
- msg_alloc() -> mpsc_pbuf_alloc() taking this branch.
The error is reproducible on:
- sparc (qemu_leon3)
- riscv (qemu_riscv64)
- x86_64 64BIT-only (qemu_x86_64)
(xtensa should be affected, but qemu_xtensa doesn't launch on my M1 work laptop ¯\_(ツ)_/¯)
To Reproduce
We discovered this after studying an error in our application. To repro on upstream, apply the following diff:
Then run one of the following:
west build -b qemu_leon3 -p auto -t run zephyr/samples/hello_world
west build -b qemu_riscv64 -p auto -t run zephyr/samples/hello_world
west build -b qemu_x86_64 -p auto -t run zephyr/samples/hello_world
Expected behavior
The logging subsystem should work properly under stress and should not cause recurring exceptions.
Impact
Depending on the configuration, devices can hit an exception when there is a flood of log messages.
Logs and console output
Appended above
Environment (please complete the following information):
main branch (v4.0.0-2696-g31ebd6036aa0) built with Zephyr SDK 0.16.8