Bug Report: RISC-V Load Access Fault Exception in NuttX Kernel #15177

Open
1 task done
GuoyuYin opened this issue Dec 13, 2024 · 0 comments
Labels
Arch: risc-v Issues related to the RISC-V (32-bit or 64-bit) architecture Area: Kernel Kernel issues OS: Linux Issues related to Linux (building system, etc) Type: Bug Something isn't working

GuoyuYin commented Dec 13, 2024

Expected Behavior

The program should execute without any memory access violations or crashes. Specifically, the strnlen function should correctly calculate the length of the string provided as input.

Actual Behavior

Instead of executing normally, the program crashes due to a load access fault when trying to read a byte from what seems to be an invalid or inaccessible memory location.

Description

riscv_exception: BUG: EXCEPTION: Load access fault. MCAUSE: 0000000000000005, EPC: 0000000080006302, MTVAL: ffffffffffffffff

While running a fuzz test on the NuttX kernel using Syzkaller, an unexpected exception occurred. The system reported a Load access fault exception with the following details:
MCAUSE: 0x5 (Indicates a load access fault)
EPC: 0x80006302 (Program counter at the time of exception)
MTVAL: 0xffffffffffffffff (Value reported for the exception; for a load access fault this is normally the faulting address, which here is clearly invalid)
Querying the nut-img file reveals that the exception happened during the execution of the strnlen function, at offset +0x24 from its start (0x80006302 - 0x800062de). The faulting instruction is a load byte unsigned (lbu) from the address held in register s0, and that address appears to be the cause of the fault.
Note: __sanitizer_cov_trace_pc is our instrumentation function.

00000000800062de <strnlen>:
    800062de:    7179                    add    sp,sp,-48
    800062e0:    f022                    sd    s0,32(sp)
    800062e2:    ec26                    sd    s1,24(sp)
    800062e4:    e84a                    sd    s2,16(sp)
    800062e6:    f406                    sd    ra,40(sp)
    800062e8:    e44e                    sd    s3,8(sp)
    800062ea:    892a                    mv    s2,a0
    800062ec:    84ae                    mv    s1,a1
    800062ee:    842a                    mv    s0,a0
    800062f0:    12f000ef              jal    80006c1e <__sanitizer_cov_trace_pc>
    800062f4:    85a6                    mv    a1,s1
    800062f6:    4501                    li    a0,0
    800062f8:    389000ef              jal    80006e80 <__sanitizer_cov_trace_const_cmp8>
    800062fc:    c899                    beqz    s1,80006312 <strnlen+0x34>
    800062fe:    121000ef              jal    80006c1e <__sanitizer_cov_trace_pc>
    80006302:    00044983              lbu    s3,0(s0)
    80006306:    4501                    li    a0,0
    80006308:    85ce                    mv    a1,s3
    8000630a:    2db000ef              jal    80006de4 <__sanitizer_cov_trace_const_cmp1>
    8000630e:    00099d63              bnez    s3,80006328 <strnlen+0x4a>
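
As a cross-check on where the fault sits inside strnlen, the offset can be computed by subtracting the symbol address in the listing from EPC. The sketch below uses a hypothetical helper name, offset_in_function, and the addresses copied from the log and the disassembly. The result is 0x24; the strnlen+0x4a label in the listing is the target of the bnez branch, not the faulting instruction.

```c
#include <stdint.h>

/* Hypothetical helper (not part of NuttX): byte offset of a program
 * counter inside a function, given the function's start address taken
 * from the disassembly listing.
 */
static uint64_t offset_in_function(uint64_t pc, uint64_t func_start)
{
  return pc - func_start;
}
```

Calling offset_in_function(0x80006302, 0x800062de) with the values from this report yields 0x24, which is the lbu s3,0(s0) instruction.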

Debug Logs

The debug logs show that the system was executing various system calls before encountering the exception. Upon reaching the strnlen function, it attempted to load a byte from the address stored in s0. The register dump reports FP (s0 under the RISC-V ABI) as ffffffffffffffff, which matches MTVAL and is not a valid address, resulting in the load access fault.

syz_setenv(0x9, 0xfffffff2, 0x4)

riscv_exception: BUG: EXCEPTION: Load access fault. MCAUSE: 0000000000000005, EPC: 0000000080006302, MTVAL: ffffffffffffffff
_assert: Current Version: NuttX  12.5.1 fc993539aa-dirty Aug 29 2024 05:46:59 risc-v
up_dump_register: EPC: 0000000080006302
up_dump_register: A4: 0000000000000053 A5: 00000000800503f4 A6: 000000000000011c A7: 00000000800433b0
up_dump_register: T4: 0000000000000000 T5: 0000000000000000 T6: 0000000000000000
up_dump_register: S4: 0000000000000000 S5: ffffffffffffffff S6: ffffffffffffffff S7: 0000000000000000
up_dump_register: SP: 00000000800521f0 FP: ffffffffffffffff TP: 0000000000000000 RA: 0000000080006302
dump_stack:   base: 0x80051970
dump_stack:     sp: 0x800521f0
stack_dump: 0x800521f0: 00000000800522e8 0000000000000000 000000000000000c 00000000800522e8 0000000000000000 000000008000484a 0000000000000001 0000000080007af8
stack_dump: 0x80052270: 0000000000000000 0000000000000000 0000000080035ed0 0000000000000050 000000008005041c 0000000000000000 0000000080052450 0000000080052428
stack_dump: 0x800522f0: 0000000080012f18 0000000080012ea6 0000000080005d9e 00000000800515f0 0000000000000001 0000000000000004 0000000080034b90 0000000080012696
stack_dump: 0x80052370: 0000000080052408 00000000800503f4 000000000000011c 00000000800433b0 000000008005041c 0000000000000000 0000000080052450 0000000080052428
stack_dump: 0x800523f0: 0000000a00002088 0000000000000000 0000000000000000 0000000020000000 0000000000000000 0000000000000000 0000000000000000 0000000000000000
stack_dump: 0x80052470: 0000000000000000 0000000000000000 0000000000000000 0000000000000000 0000000080051300 000000008000d564 0000000000000000 0000000000000001
stack_dump: 0x800524f0: 0000000080051950 000000008000a114 0000000000000000 0000000000000000 0000000000000000 0000000000000000 0000000000000000 0000000000000000
dump_tasks:    PID GROUP PRI POLICY   TYPE    NPX STATE   EVENT      SIGMASK          STACKBASE  STACKSIZE      USED   FILLED    COMMAND
dump_task:       0     0   0 FIFO     Kthread - Ready              0000000000000000 0x80050510      2032      1168    57.4%    Idle_Task

VM DIAGNOSIS:
14:44:17  Registers:
Failed reading regs: dial tcp 127.0.0.1:24777: connect: connection refused
Failed reading regs: dial tcp 127.0.0.1:24777: connect: connection refused

Before the exception occurs, the system is executing a series of system calls, including but not limited to syz_sem_getvalue, syz_mq_close, syz_vfork, syz_mq_send, syz_mq_timedsend, syz_setenv, and syz_timer_gettime. The last reported call number is #call_num = 18446744073709551615, which is 2^64 - 1 (0xffffffffffffffff), the maximum value of an unsigned 64-bit integer and the bit pattern of -1. This suggests an overflow or the use of invalid parameters.
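
The value 18446744073709551615 is exactly what -1 becomes when converted to an unsigned 64-bit integer. One possible reading, stated here as an assumption rather than something the log confirms, is that the number is a -1 marker printed as unsigned rather than the result of an arithmetic overflow. A minimal sketch of the check (the helper name is hypothetical):

```c
#include <stdint.h>

/* Hypothetical helper: returns nonzero when a 64-bit call number is the
 * all-ones pattern, i.e. what -1 becomes once converted to uint64_t.
 */
static int is_minus_one_pattern(uint64_t call_num)
{
  return call_num == (uint64_t)-1;
}
```

Passing the call number from the log to this helper returns nonzero, while an ordinary call index such as 18 does not.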

Based on the provided assembly code snippet for strnlen, the instruction at 0x80006302 is lbu s3,0(s0), which loads an unsigned byte into register s3 from the address pointed to by s0. Given that MTVAL contains all ones (0xffffffffffffffff), the address being accessed most likely lies outside any accessible memory region. Note that a missing page-table mapping would normally be reported as a load page fault (MCAUSE 13) rather than a load access fault (MCAUSE 5).
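
To make the MCAUSE reading explicit, the load-related synchronous exception codes from the RISC-V privileged specification can be decoded as in the sketch below. The function and macro names are hypothetical and are not NuttX identifiers; only the numeric codes come from the specification.

```c
#include <stdint.h>

/* Load-related synchronous exception codes from the RISC-V privileged
 * specification. The macro names are illustrative only.
 */
#define CAUSE_LOAD_MISALIGNED 4u
#define CAUSE_LOAD_ACCESS     5u
#define CAUSE_LOAD_PAGE_FAULT 13u

/* Hypothetical helper: map an RV64 mcause value to a short description. */
static const char *describe_mcause(uint64_t mcause)
{
  /* On RV64 the most significant bit marks an interrupt. */
  if (mcause >> 63)
    {
      return "interrupt";
    }

  switch (mcause)
    {
      case CAUSE_LOAD_MISALIGNED:
        return "Load address misaligned";
      case CAUSE_LOAD_ACCESS:
        return "Load access fault";
      case CAUSE_LOAD_PAGE_FAULT:
        return "Load page fault";
      default:
        return "other exception";
    }
}
```

For the value in this report, describe_mcause(0x5) returns "Load access fault", matching the text printed by riscv_exception.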

Steps to Reproduce

To reproduce this issue, one can use Syzkaller to execute system calls against the NuttX kernel. The specific sequence leading up to the crash includes calls such as syz_sem_getvalue, syz_mq_close, syz_vfork, and others, culminating in the problematic call to strnlen.
The corresponding syscall specific implementation code is as follows:

//lab: H spec
#include "nuttx/sched.h"
// #include "sched/sched.h"
#include <sys/types.h>
#include <sched.h>
#include <time.h>
#include <signal.h>
#include <errno.h>
#include <sys/time.h>
#include "nuttx/irq.h"
#include "mqueue.h"
#include <fcntl.h>
#include <sys/stat.h>
#include <semaphore.h>
// #include <netinet/in.h>
// #include <arpa/inet.h>

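#include <arpa/inet.h>
#include <netinet/in.h>
#include <netinet/tcp.h>
#include <stdlib.h> // declares setenv()

// Pseudo-syscall wrapper: the fuzzer-supplied integers are cast directly to
// pointers and handed to setenv() without any validation.
static long syz_setenv(volatile long name_ptr, volatile long value_ptr, volatile long overwrite)
{
    const char *name = (const char *)name_ptr;
    const char *value = (const char *)value_ptr;
    return (long)setenv(name, value, (int)overwrite);
}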
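
Because the wrapper casts raw integers such as 0x9 and 0xfffffff2 to pointers, one way to keep obviously bad addresses out of setenv is to reject anything outside the region the fuzzer uses for argument data. The sketch below is only an illustration: FUZZ_DATA_BASE, FUZZ_DATA_SIZE and fuzz_addr_ok are hypothetical names, and the bounds are placeholder values, not taken from NuttX or Syzkaller.

```c
#include <stdint.h>

/* ASSUMPTION: placeholder bounds for the region that holds fuzzer
 * argument data. Replace with the real layout of the target.
 */
#define FUZZ_DATA_BASE ((uintptr_t)0x20000000)
#define FUZZ_DATA_SIZE ((uintptr_t)0x01000000)

/* Hypothetical helper: nonzero if addr lies inside the data region.
 * Only the start address is checked; a string that begins inside the
 * region could still run past its end.
 */
static int fuzz_addr_ok(uintptr_t addr)
{
  return addr >= FUZZ_DATA_BASE && addr - FUZZ_DATA_BASE < FUZZ_DATA_SIZE;
}
```

With such a check in place, syz_setenv could return an error instead of calling setenv when either name_ptr or value_ptr fails the test. Both arguments from the failing call (0x9 and 0xfffffff2) would be rejected under these placeholder bounds.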

Suggested Fix

A potential fix could involve ensuring that all pointers passed to functions like strnlen are properly validated before use. Additionally, implementing checks within the function itself to ensure that it does not attempt to access memory outside of allocated buffers might help prevent similar issues in the future.
The strnlen function is in libs/libc/machine/arch_libc.c with the following code:

#ifdef CONFIG_LIBC_ARCH_STRNLEN
size_t strnlen(FAR const char *s, size_t maxlen)
{
  size_t ret = ARCH_LIBCFUN(strnlen)(s, maxlen);
#  ifdef CONFIG_MM_KASAN
#    ifndef CONFIG_MM_KASAN_DISABLE_READS_CHECK
  __asan_loadN((FAR void *)s, ret);
#    endif
#  endif

  return ret;
}
#endif

The ASan check in the current code assumes that ret is the valid length value returned from strnlen. However, if the incoming string s does not end in \0 or exceeds maxlen, then ret may not accurately reflect the actual string length. In addition, calls to __asan_loadN may cause unnecessary performance overhead or false positives. Therefore, it is recommended that ASan checking be enabled only when it is really needed, and only for memory regions that are known to be valid.

#ifdef CONFIG_MM_KASAN
# ifndef CONFIG_MM_KASAN_DISABLE_READS_CHECK
if (ret > maxlen) {
    ret = maxlen; // Ensure we do not report more than maxlen bytes read.
}
__asan_loadN((FAR void *)s, ret);
# endif
#endif

Although strnlen itself is a relatively simple function, care needs to be taken to handle boundary cases such as a NULL pointer. Passing NULL should be considered a programming error in practice, but at the kernel level proper error handling can prevent system crashes or other serious consequences.

if (!s) {
    return 0;
}
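
Put together, a NULL-tolerant bounded length function could look like the sketch below. The name strnlen_nullsafe is hypothetical and this is not the NuttX implementation; it only illustrates the guard in context.

```c
#include <stddef.h>

/* Hypothetical wrapper (not part of NuttX): bounded string length that
 * treats a NULL pointer as an empty string instead of dereferencing it.
 */
static size_t strnlen_nullsafe(const char *s, size_t maxlen)
{
  size_t len = 0;

  if (s == NULL)
    {
      return 0;
    }

  /* Stop at the terminator or after maxlen bytes, whichever comes first. */
  while (len < maxlen && s[len] != '\0')
    {
      len++;
    }

  return len;
}
```

A NULL check alone would not have caught the fault in this report, since the register dump shows the pointer as 0xffffffffffffffff rather than NULL. It only covers the NULL boundary case discussed above.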

On which OS does this issue occur?

[OS: Linux]

What is the version of your OS?

Ubuntu 20.04

NuttX Version

2ff2b82

Issue Architecture

[Arch: risc-v]

Issue Area

[Area: Kernel]

Verification

  • I have verified before submitting the report.
@GuoyuYin GuoyuYin added the Type: Bug Something isn't working label Dec 13, 2024
@github-actions github-actions bot added Arch: risc-v Issues related to the RISC-V (32-bit or 64-bit) architecture Area: Kernel Kernel issues OS: Linux Issues related to Linux (building system, etc) labels Dec 13, 2024