CPU lockup on high load #92

Open
felixonmars opened this issue Dec 23, 2023 · 2 comments

felixonmars commented Dec 23, 2023

I am getting another crash on a 2-way server running a 6.1.61 kernel that I compiled myself from commit db74e75. The server was under high load (~500) at the time.

[44855.988161] INFO: task iou-sqp-533883:533890 blocked for more than 123 seconds.
[44855.995583]       Not tainted 6.1.61-1.1-sophgo-multi-08348-gdb74e759247f #1
[44856.002700] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[44856.010596] task:iou-sqp-533883  state:D stack:0     pid:533890 ppid:533873 flags:0x00000100
[44856.019108] Call Trace:
[44856.021589] [<ffffffff809f5c36>] __schedule+0x34e/0x10b0
[44856.026967] [<ffffffff809f69e6>] schedule+0x4e/0xd0
[44856.031898] [<ffffffff8001f8c0>] do_exit+0xe6/0x922
[44856.036836] [<ffffffff80470426>] io_sq_thread_unpark+0x0/0x52
[44856.042643] [<ffffffff80003f18>] ret_from_exception+0x0/0x16
[44856.050530] INFO: task node:533891 blocked for more than 123 seconds.
[44856.058874]       Not tainted 6.1.61-1.1-sophgo-multi-08348-gdb74e759247f #1
[44856.067643] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[44856.077570] task:node            state:D stack:0     pid:533891 ppid:533873 flags:0x00000100
[44856.088033] Call Trace:
[44856.092551] [<ffffffff809f5c36>] __schedule+0x34e/0x10b0
[44856.099618] [<ffffffff809f69e6>] schedule+0x4e/0xd0
[44856.106508] [<ffffffff8001f8c0>] do_exit+0xe6/0x922
[44856.112896] [<ffffffff800202ac>] do_group_exit+0x34/0x84
[44856.120305] [<ffffffff8002e6ec>] get_signal+0x950/0x97c
[44856.127454] [<ffffffff80005b04>] do_work_pending+0x11a/0x514
[44856.135004] [<ffffffff80003f9e>] resume_userspace_slow+0xc/0xe
[44856.144859] INFO: task iou-sqp-533891:533897 blocked for more than 123 seconds.
[44856.158078]       Not tainted 6.1.61-1.1-sophgo-multi-08348-gdb74e759247f #1
[44856.171042] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[44856.184319] task:iou-sqp-533891  state:D stack:0     pid:533897 ppid:533873 flags:0x00000100
[44856.198653] Call Trace:
[44856.206482] [<ffffffff809f5c36>] __schedule+0x34e/0x10b0
[44856.217475] [<ffffffff809f69e6>] schedule+0x4e/0xd0
[44856.227726] [<ffffffff8001f8c0>] do_exit+0xe6/0x922
[44856.237960] [<ffffffff80470426>] io_sq_thread_unpark+0x0/0x52
[44856.248980] [<ffffffff80003f18>] ret_from_exception+0x0/0x16
[44856.259576] INFO: task node:533898 blocked for more than 123 seconds.
[44856.271175]       Not tainted 6.1.61-1.1-sophgo-multi-08348-gdb74e759247f #1
[44856.283175] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[44856.296057] task:node            state:D stack:0     pid:533898 ppid:533873 flags:0x00000100
[44856.309608] Call Trace:
[44856.317249] [<ffffffff809f5c36>] __schedule+0x34e/0x10b0
[44856.327718] [<ffffffff809f69e6>] schedule+0x4e/0xd0
[44856.337731] [<ffffffff8001f8c0>] do_exit+0xe6/0x922
[44856.347760] [<ffffffff800202ac>] do_group_exit+0x34/0x84
[44856.357977] [<ffffffff8002e6ec>] get_signal+0x950/0x97c
[44856.367954] [<ffffffff80005b04>] do_work_pending+0x11a/0x514
[44856.378642] [<ffffffff80003f9e>] resume_userspace_slow+0xc/0xe
[44856.389253] INFO: task node:533900 blocked for more than 124 seconds.
[44856.400484]       Not tainted 6.1.61-1.1-sophgo-multi-08348-gdb74e759247f #1
[44856.412222] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[44856.424653] task:node            state:D stack:0     pid:533900 ppid:533873 flags:0x00000100
[44856.438093] Call Trace:
[44856.445409] [<ffffffff809f5c36>] __schedule+0x34e/0x10b0
[44856.455309] [<ffffffff809f69e6>] schedule+0x4e/0xd0
[44856.464535] [<ffffffff8001f8c0>] do_exit+0xe6/0x922
[44856.473732] [<ffffffff800202ac>] do_group_exit+0x34/0x84
[44856.483670] [<ffffffff8002e6ec>] get_signal+0x950/0x97c
[44856.493605] [<ffffffff80005b04>] do_work_pending+0x11a/0x514
[44856.503825] [<ffffffff80003f9e>] resume_userspace_slow+0xc/0xe
[44856.514267] INFO: task node:533901 blocked for more than 124 seconds.
[44856.525292]       Not tainted 6.1.61-1.1-sophgo-multi-08348-gdb74e759247f #1
[44856.537294] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[44856.550119] task:node            state:D stack:0     pid:533901 ppid:533873 flags:0x00000100
[44856.563940] Call Trace:
[44856.571408] [<ffffffff809f5c36>] __schedule+0x34e/0x10b0
[44856.582214] [<ffffffff809f69e6>] schedule+0x4e/0xd0
[44856.592102] [<ffffffff8001f8c0>] do_exit+0xe6/0x922
[44856.601798] [<ffffffff800202ac>] do_group_exit+0x34/0x84
[44856.612306] [<ffffffff8002e6ec>] get_signal+0x950/0x97c
[44856.622423] [<ffffffff80005b04>] do_work_pending+0x11a/0x514
[44856.633127] [<ffffffff80003f9e>] resume_userspace_slow+0xc/0xe
[44856.648942] INFO: task node:533902 blocked for more than 124 seconds.
[44856.660497]       Not tainted 6.1.61-1.1-sophgo-multi-08348-gdb74e759247f #1
[44856.672258] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[44856.685118] task:node            state:D stack:0     pid:533902 ppid:533873 flags:0x00000100
[44856.698754] Call Trace:
[44856.706235] [<ffffffff809f5c36>] __schedule+0x34e/0x10b0
[44856.716664] [<ffffffff809f69e6>] schedule+0x4e/0xd0
[44856.726356] [<ffffffff8001f8c0>] do_exit+0xe6/0x922
[44856.736209] [<ffffffff800202ac>] do_group_exit+0x34/0x84
[44856.746764] [<ffffffff8002e6ec>] get_signal+0x950/0x97c
[44856.756868] [<ffffffff80005b04>] do_work_pending+0x11a/0x514
[44856.767551] [<ffffffff80003f9e>] resume_userspace_slow+0xc/0xe
[44856.778183] INFO: task iou-sqp-533883:534374 blocked for more than 124 seconds.
[44856.790332]       Not tainted 6.1.61-1.1-sophgo-multi-08348-gdb74e759247f #1
[44856.802514] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[44856.815526] task:iou-sqp-533883  state:D stack:0     pid:534374 ppid:533873 flags:0x00000100
[44856.829018] Call Trace:
[44856.836263] [<ffffffff809f5c36>] __schedule+0x34e/0x10b0
[44856.846499] [<ffffffff809f69e6>] schedule+0x4e/0xd0
[44856.856378] [<ffffffff8001f8c0>] do_exit+0xe6/0x922
[44856.866038] [<ffffffff80470426>] io_sq_thread_unpark+0x0/0x52
[44856.876731] [<ffffffff80003f18>] ret_from_exception+0x0/0x16
[44856.887184] INFO: task node:534399 blocked for more than 124 seconds.
[44856.898487]       Not tainted 6.1.61-1.1-sophgo-multi-08348-gdb74e759247f #1
[44856.910490] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[44856.923698] task:node            state:D stack:0     pid:534399 ppid:533873 flags:0x00000100
[44856.937177] Call Trace:
[44856.944825] [<ffffffff809f5c36>] __schedule+0x34e/0x10b0
[44856.955026] [<ffffffff809f69e6>] schedule+0x4e/0xd0
[44856.964958] [<ffffffff8001f8c0>] do_exit+0xe6/0x922
[44856.974861] [<ffffffff800202ac>] do_group_exit+0x34/0x84
[44856.985292] [<ffffffff8002e6ec>] get_signal+0x950/0x97c
[44856.995399] [<ffffffff80005b04>] do_work_pending+0x11a/0x514
[44857.005638] [<ffffffff80003f9e>] resume_userspace_slow+0xc/0xe
[44857.016827] INFO: task iou-wrk-534374:534651 blocked for more than 124 seconds.
[44857.029491]       Not tainted 6.1.61-1.1-sophgo-multi-08348-gdb74e759247f #1
[44857.041451] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[44857.054436] task:iou-wrk-534374  state:D stack:0     pid:534651 ppid:533873 flags:0x00000100
[44857.067871] Call Trace:
[44857.075487] [<ffffffff809f5c36>] __schedule+0x34e/0x10b0
[44857.086155] [<ffffffff809f69e6>] schedule+0x4e/0xd0
[44857.096097] [<ffffffff8001f8c0>] do_exit+0xe6/0x922
[44857.105750] [<ffffffff804799aa>] io_wqe_worker+0x316/0x360
[44857.116339] [<ffffffff80003f18>] ret_from_exception+0x0/0x16
[45308.473875] watchdog: BUG: soft lockup - CPU#9 stuck for 22s! [migration/9:72]
[45308.482500] Modules linked in: sctp ip6_udp_tunnel udp_tunnel joydev tun cfg80211 rfkill xt_MASQUERADE iptable_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 libcrc32c xt_TCPMSS xt_tcpudp iptable_filter vfat fat ixgbe ofpart mdio_devres ipmi_si sophgo_spifmc of_mdio spi_nor fixed_phy ipmi_devintf fwnode_mdio igb libphy 8250_dw ipmi_msghandler mtd gpio_dwapb mousedev switchtec mdio uio_pdrv_genirq uio tcp_bbr sch_fq fuse dm_mod loop nfnetlink bpf_preload ip_tables x_tables ext4 crc32c_generic crc16 mbcache jbd2 mmc_block usbhid sdhci_sophgo ast sdhci_pltfm sdhci drm_vram_helper nvme spi_dw_mmio drm_ttm_helper mmc_core nvme_core gpio_keys spi_dw xhci_pci ttm xhci_pci_renesas nvme_common
[45308.503921] watchdog: BUG: soft lockup - CPU#41 stuck for 22s! [migration/41:266]
[45308.553830] CPU: 9 PID: 72 Comm: migration/9 Not tainted 6.1.61-1.1-sophgo-multi-08348-gdb74e759247f #1 d71360b18e23f727e028cd5c570bc8e9bdca43d0
[45308.555467] Modules linked in:
[45308.558366] Hardware name: Sophgo Mango (DT)
[45308.558372] Stopper: multi_cpu_stop+0x0/0x172 <- migrate_swap+0xbe/0x158
[45308.565850]  sctp
[45308.567536] epc : rcu_momentary_dyntick_idle+0x3a/0x80
[45308.568959]  ip6_udp_tunnel
[45308.570425]  ra : multi_cpu_stop+0xb8/0x172
[45308.571994]  udp_tunnel
[45308.573618] epc : ffffffff8009ab24 ra : ffffffff800ef2cc sp : ffffffc80a96bd90
[45308.575680]  joydev
[45308.577162]  gp : ffffffff81bac808 tp : ffffffdffeb61f80 t0 : 0000000000000080
[45308.578526]  tun
[45308.580220]  t1 : 0000000001806000 t2 : 0000000000000000 s0 : ffffffc80a96be10
[45308.581513]  cfg80211
[45308.583208]  s1 : ffffffc84017b9f0 a0 : ffffffff80e0dc60 a1 : 0000000000000002
[45308.584760]  rfkill
[45308.586244]  a2 : ffffffc84017ba18 a3 : ffffffff81c089c8 a4 : 000000004c6f3bd4
[45308.587851]  xt_MASQUERADE
[45308.589301]  a5 : fffffff5db0d3cc0 a6 : 0000000000000001 a7 : 0000000000000000
[45308.589306]  s2 : ffffffc84017ba14 s3 : ffffffffffffffff s4 : ffffffff80e0dc60
[45308.589310]  s5 : 0000000000000001 s6 : 0000000000000000 s7 : 0000000000000002
[45308.589314]  s8 : 0000000000000004 s9 : 0000000000000001 s10: 0000000000000003
[45308.589318]  s11: 0000000000000001 t3 : 0000000000000000 t4 : 000000000000b60e
[45308.591471]  iptable_nat
[45308.593270]  t5 : 000000fc00000000 t6 : 0000000000000001
[45308.595003]  nf_nat
[45308.596607] status: 0000000200000120 badaddr: 0000000000000000 cause: 8000000000000001
[45308.598327]  nf_conntrack
[45308.599910] [<ffffffff8009ab24>] rcu_momentary_dyntick_idle+0x3a/0x80
[45308.601492]  nf_defrag_ipv6
[45308.602925] [<ffffffff800eee3a>] cpu_stopper_thread+0xfc/0x182
[45308.604631]  nf_defrag_ipv4
[45308.606231] [<ffffffff80044e76>] smpboot_thread_fn+0xe6/0x11a
[45308.607673]  libcrc32c
[45308.609194] [<ffffffff80040aee>] kthread+0xbe/0xd4
[45308.610837]  xt_TCPMSS
[45308.612224] [<ffffffff80003f18>] ret_from_exception+0x0/0x16
[45308.613836]  xt_tcpudp iptable_filter vfat fat ixgbe ofpart mdio_devres ipmi_si sophgo_spifmc of_mdio spi_nor fixed_phy ipmi_devintf fwnode_mdio igb libphy 8250_dw ipmi_msghandler mtd gpio_dwapb mousedev switchtec mdio uio_pdrv_genirq uio tcp_bbr sch_fq fuse dm_mod loop nfnetlink bpf_preload ip_tables x_tables ext4 crc32c_generic crc16 mbcache jbd2 mmc_block usbhid sdhci_sophgo ast sdhci_pltfm sdhci drm_vram_helper nvme spi_dw_mmio drm_ttm_helper mmc_core nvme_core gpio_keys spi_dw xhci_pci ttm xhci_pci_renesas nvme_common
[45308.865495] CPU: 41 PID: 266 Comm: migration/41 Tainted: G             L     6.1.61-1.1-sophgo-multi-08348-gdb74e759247f #1 d71360b18e23f727e028cd5c570bc8e9bdca43d0
[45308.883578] Hardware name: Sophgo Mango (DT)
[45308.889560] Stopper: multi_cpu_stop+0x0/0x172 <- migrate_swap+0xbe/0x158
[45308.897843] epc : rcu_momentary_dyntick_idle+0x3a/0x80
[45308.904524]  ra : multi_cpu_stop+0xb8/0x172
[45308.910231] epc : ffffffff8009ab24 ra : ffffffff800ef2cc sp : ffffffc80af7bd90
[45308.919232]  gp : ffffffff81bac808 tp : ffffffeffdf23f00 t0 : 0000000000000080
[45308.927724]  t1 : 0000000001806000 t2 : 0000000000000001 s0 : ffffffc80af7be10
[45308.936569]  s1 : ffffffc83f8f39f0 a0 : ffffffff80e0db30 a1 : 0000000000000002
[45308.945430]  a2 : ffffffc83f8f3a18 a3 : ffffffff81c089c8 a4 : 000000001de9706c
[45308.954400]  a5 : fffffff5db473cc0 a6 : 0000000000000001 a7 : 0000000000000000
[45308.963118]  s2 : ffffffc83f8f3a14 s3 : ffffffffffffffff s4 : ffffffff80e0db30
[45308.971816]  s5 : 0000000000000001 s6 : 0000000000000000 s7 : 0000000000000002
[45308.980622]  s8 : 0000000000000004 s9 : 0000000000000001 s10: 0000000000000003
[45308.989427]  s11: 0000000000000001 t3 : 0000000000000000 t4 : 0000000000000007
[45308.998182]  t5 : 0000000000000005 t6 : 000000000000ffff
[45309.005003] status: 0000000200000120 badaddr: 0000000000000000 cause: 8000000000000001
[45309.014689] [<ffffffff8009ab24>] rcu_momentary_dyntick_idle+0x3a/0x80
[45309.022893] [<ffffffff800eee3a>] cpu_stopper_thread+0xfc/0x182
[45309.030131] [<ffffffff80044e76>] smpboot_thread_fn+0xe6/0x11a
[45309.037489] [<ffffffff80040aee>] kthread+0xbe/0xd4
[45309.043803] [<ffffffff80003f18>] ret_from_exception+0x0/0x16
[45316.540733] watchdog: BUG: soft lockup - CPU#71 stuck for 22s! [migration/71:447]
[45316.552469] Modules linked in: sctp ip6_udp_tunnel udp_tunnel joydev tun cfg80211 rfkill xt_MASQUERADE iptable_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 libcrc32c xt_TCPMSS xt_tcpudp iptable_filter vfat fat ixgbe ofpart mdio_devres ipmi_si sophgo_spifmc of_mdio spi_nor fixed_phy ipmi_devintf fwnode_mdio igb libphy 8250_dw ipmi_msghandler mtd gpio_dwapb mousedev switchtec mdio uio_pdrv_genirq uio tcp_bbr sch_fq fuse dm_mod loop nfnetlink bpf_preload ip_tables x_tables ext4 crc32c_generic crc16 mbcache jbd2 mmc_block usbhid sdhci_sophgo ast sdhci_pltfm sdhci drm_vram_helper nvme spi_dw_mmio drm_ttm_helper mmc_core nvme_core gpio_keys spi_dw xhci_pci ttm xhci_pci_renesas nvme_common
[45316.557529] watchdog: BUG: soft lockup - CPU#80 stuck for 23s! [migration/80:502]
[45316.560695] watchdog: BUG: soft lockup - CPU#83 stuck for 22s! [migration/83:520]
[45316.560735] Modules linked in: sctp ip6_udp_tunnel udp_tunnel joydev tun cfg80211 rfkill xt_MASQUERADE iptable_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 libcrc32c xt_TCPMSS xt_tcpudp iptable_filter vfat fat ixgbe ofpart mdio_devres ipmi_si sophgo_spifmc of_mdio spi_nor fixed_phy ipmi_devintf fwnode_mdio igb libphy 8250_dw ipmi_msghandler mtd gpio_dwapb mousedev switchtec mdio uio_pdrv_genirq uio tcp_bbr sch_fq fuse dm_mod loop nfnetlink bpf_preload ip_tables x_tables ext4 crc32c_generic crc16 mbcache jbd2 mmc_block usbhid sdhci_sophgo ast sdhci_pltfm sdhci drm_vram_helper nvme spi_dw_mmio drm_ttm_helper mmc_core nvme_core gpio_keys spi_dw xhci_pci ttm xhci_pci_renesas nvme_common
[45316.561113] CPU: 83 PID: 520 Comm: migration/83 Tainted: G             L     6.1.61-1.1-sophgo-multi-08348-gdb74e759247f #1 d71360b18e23f727e028cd5c570bc8e9bdca43d0
[45316.561143] Hardware name: Sophgo Mango (DT)
[45316.561151] Stopper: multi_cpu_stop+0x0/0x172 <- migrate_swap+0xbe/0x158
[45316.561210] epc : rcu_momentary_dyntick_idle+0x3a/0x80
[45316.561226]  ra : multi_cpu_stop+0xb8/0x172
[45316.561236] epc : ffffffff8009ab24 ra : ffffffff800ef2cc sp : ffffffc80b76bd90
[45316.561246]  gp : ffffffff81bac808 tp : ffffffd8ff795e80 t0 : 0000000000000080
[45316.561253]  t1 : 0000000001806000 t2 : 0000000000000000 s0 : ffffffc80b76be10
[45316.561261]  s1 : ffffffc8352fb9f0 a0 : ffffffff80e0da90 a1 : 0000000000000002
[45316.561267]  a2 : ffffffc8352fba18 a3 : ffffffff81c089c8 a4 : ffffffffb61fbc6c
[45316.561276]  a5 : fffffff5db935cc0 a6 : 0000000000000001 a7 : 0000000000000000
[45316.561283]  s2 : ffffffc8352fba14 s3 : ffffffffffffffff s4 : ffffffff80e0da90
[45316.561289]  s5 : 0000000000000001 s6 : 0000000000000000 s7 : 0000000000000002
[45316.561294]  s8 : 0000000000000004 s9 : 0000000000000001 s10: 0000000000000003
[45316.561300]  s11: 0000000000000001 t3 : 0000000000000000 t4 : 0000000000000005
[45316.561307]  t5 : 0000000000000002 t6 : 000000000000010a
[45316.561312] status: 0000000200000120 badaddr: 0000000000000000 cause: 8000000000000001
[45316.561321] [<ffffffff8009ab24>] rcu_momentary_dyntick_idle+0x3a/0x80
[45316.561334] [<ffffffff800eee3a>] cpu_stopper_thread+0xfc/0x182
[45316.561344] [<ffffffff80044e76>] smpboot_thread_fn+0xe6/0x11a
[45316.561359] [<ffffffff80040aee>] kthread+0xbe/0xd4
[45316.561368] [<ffffffff80003f18>] ret_from_exception+0x0/0x16
[45316.572736] CPU: 71 PID: 447 Comm: migration/71 Tainted: G             L     6.1.61-1.1-sophgo-multi-08348-gdb74e759247f #1 d71360b18e23f727e028cd5c570bc8e9bdca43d0
[45316.577483] Modules linked in:
[45316.581746] Hardware name: Sophgo Mango (DT)
[45316.603226]  sctp
[45316.611874] Stopper: multi_cpu_stop+0x0/0x172 <- migrate_swap+0xbe/0x158
[45316.616515]  ip6_udp_tunnel
[45316.621297] epc : rcu_momentary_dyntick_idle+0x3a/0x80
[45316.625749]  udp_tunnel
[45316.630454]  ra : multi_cpu_stop+0xb8/0x172
[45316.634929]  joydev
[45316.639520] epc : ffffffff8009ab24 ra : ffffffff800ef2cc sp : ffffffc80b523d90
[45316.644036]  tun
[45316.647471] watchdog: BUG: soft lockup - CPU#125 stuck for 23s! [migration/125:775]
[45316.647580] Modules linked in: sctp ip6_udp_tunnel udp_tunnel joydev tun cfg80211 rfkill xt_MASQUERADE iptable_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 libcrc32c xt_TCPMSS xt_tcpudp iptable_filter vfat fat ixgbe ofpart mdio_devres ipmi_si sophgo_spifmc of_mdio spi_nor fixed_phy ipmi_devintf fwnode_mdio igb libphy 8250_dw ipmi_msghandler mtd gpio_dwapb mousedev switchtec mdio uio_pdrv_genirq uio tcp_bbr sch_fq fuse dm_mod loop nfnetlink bpf_preload ip_tables x_tables ext4 crc32c_generic crc16 mbcache jbd2 mmc_block usbhid sdhci_sophgo ast sdhci_pltfm sdhci drm_vram_helper nvme spi_dw_mmio drm_ttm_helper mmc_core nvme_core gpio_keys spi_dw xhci_pci ttm xhci_pci_renesas nvme_common
[45316.648203] CPU: 125 PID: 775 Comm: migration/125 Tainted: G             L     6.1.61-1.1-sophgo-multi-08348-gdb74e759247f #1 d71360b18e23f727e028cd5c570bc8e9bdca43d0
[45316.648254] Hardware name: Sophgo Mango (DT)
[45316.648266] Stopper: multi_cpu_stop+0x0/0x172 <- migrate_swap+0xbe/0x158
[45316.648351] epc : rcu_momentary_dyntick_idle+0x3a/0x80
[45316.648380]  ra : multi_cpu_stop+0xb8/0x172
[45316.648398] epc : ffffffff8009ab24 ra : ffffffff800ef2cc sp : ffffffc80bf63d90
[45316.648414]  gp : ffffffff81bac808 tp : ffffffd8ffabbf00 t0 : 0000000000000080
[45316.648427]  t1 : 0000000001806000 t2 : 0000000000000002 s0 : ffffffc80bf63e10
[45316.648441]  s1 : ffffffc838d2b9f0 a0 : ffffffff80e0d9b0 a1 : 0000000000000002
[45316.648454]  a2 : ffffffc838d2ba48 a3 : ffffffff81c089c8 a4 : ffffffffb1acaa24
[45316.648461]  a5 : fffffff5dbdf7cc0 a6 : 0000000000000001 a7 : 0000000000000000
[45316.648472]  s2 : ffffffc838d2ba14 s3 : ffffffffffffffff s4 : ffffffff80e0d9b0
[45316.648483]  s5 : 0000000000000001 s6 : 0000000000000000 s7 : 0000000000000002
[45316.648491]  s8 : 0000000000000004 s9 : 0000000000000001 s10: 0000000000000003
[45316.648498]  s11: 0000000000000001 t3 : 0000000000000000 t4 : 000000000000b564
[45316.648510]  t5 : fffffff1d632780c t6 : ffffffc838d2bdb8
[45316.648519] status: 0000000200000120 badaddr: 0000000000000000 cause: 8000000000000001
[45316.648535] [<ffffffff8009ab24>] rcu_momentary_dyntick_idle+0x3a/0x80
[45316.648558] [<ffffffff800eee3a>] cpu_stopper_thread+0xfc/0x182
[45316.648568] [<ffffffff80044e76>] smpboot_thread_fn+0xe6/0x11a
[45316.648585]  gp : ffffffff81bac808 tp : ffffffd8ff693f00 t0 : 0000000000000080
[45316.648598] [<ffffffff80040aee>] kthread+0xbe/0xd4
[45316.648627] [<ffffffff80003f18>] ret_from_exception+0x0/0x16
[45316.652992]  cfg80211
[45316.657396]  t1 : 0000000001806000 t2 : 0000000000000001 s0 : ffffffc80b523e10
[45316.661723]  rfkill
[45316.666339]  s1 : ffffffc83e91b9f0 a0 : ffffffff80e0d9e0 a1 : 0000000000000002
[45316.670669]  xt_MASQUERADE
[45316.675093]  a2 : ffffffc83e91ba18 a3 : ffffffff81c089c8 a4 : ffffffffb8305fc4
[45316.679261]  iptable_nat
[45316.683518]  a5 : fffffff5db7d9cc0 a6 : 0000000000000001 a7 : 0000000000000000
[45316.687708]  nf_nat
[45316.691822]  s2 : ffffffc83e91ba14 s3 : ffffffffffffffff s4 : ffffffff80e0d9e0
[45316.696041]  nf_conntrack
[45316.699993]  s5 : 0000000000000001 s6 : 0000000000000000 s7 : 0000000000000002
[45316.703891]  nf_defrag_ipv6
[45316.711012]  s8 : 0000000000000004 s9 : 0000000000000001 s10: 0000000000000003
[45316.714915]  nf_defrag_ipv4
[45316.718724]  s11: 0000000000000001 t3 : 0000000000000000 t4 : 000000000000b240
[45316.722654]  libcrc32c
[45316.726589]  t5 : 0000038e00000000 t6 : 0000000000000001
[45316.730439]  xt_TCPMSS
[45316.734514] status: 0000000200000120 badaddr: 0000000000000000 cause: 8000000000000001
[45316.738378]  xt_tcpudp
[45316.742344] [<ffffffff8009ab24>] rcu_momentary_dyntick_idle+0x3a/0x80
[45316.746064]  iptable_filter
[45316.749879] [<ffffffff800eee3a>] cpu_stopper_thread+0xfc/0x182
[45316.753627]  vfat
[45316.757549] [<ffffffff80044e76>] smpboot_thread_fn+0xe6/0x11a
[45316.775352]  fat
[45316.782228] [<ffffffff80040aee>] kthread+0xbe/0xd4
[45316.785847]  ixgbe
[45316.789547] [<ffffffff80003f18>] ret_from_exception+0x0/0x16
[45316.793204]  ofpart mdio_devres ipmi_si sophgo_spifmc of_mdio spi_nor fixed_phy ipmi_devintf fwnode_mdio igb libphy 8250_dw ipmi_msghandler mtd gpio_dwapb mousedev switchtec mdio uio_pdrv_genirq uio tcp_bbr sch_fq fuse dm_mod loop nfnetlink bpf_preload ip_tables x_tables ext4 crc32c_generic crc16 mbcache jbd2 mmc_block usbhid sdhci_sophgo ast sdhci_pltfm sdhci drm_vram_helper nvme spi_dw_mmio drm_ttm_helper mmc_core nvme_core gpio_keys spi_dw xhci_pci ttm xhci_pci_renesas nvme_common
[45317.691773] CPU: 80 PID: 502 Comm: migration/80 Tainted: G             L     6.1.61-1.1-sophgo-multi-08348-gdb74e759247f #1 d71360b18e23f727e028cd5c570bc8e9bdca43d0
[45317.712214] Hardware name: Sophgo Mango (DT)
[45317.719354] Stopper: multi_cpu_stop+0x0/0x172 <- migrate_swap+0xbe/0x158
[45317.729201] epc : rcu_momentary_dyntick_idle+0x3a/0x80
[45317.737345]  ra : multi_cpu_stop+0xb8/0x172
[45317.744281] epc : ffffffff8009ab24 ra : ffffffff800ef2cc sp : ffffffc80b6dbd90
[45317.754171]  gp : ffffffff81bac808 tp : ffffffd8ff76bf00 t0 : 0000000000000080
[45317.764055]  t1 : 0000000001806000 t2 : 0000002abb8226f9 s0 : ffffffc80b6dbe10
[45317.774179]  s1 : ffffffc838feb9f0 a0 : ffffffff80e0d990 a1 : 0000000000000002
[45317.784340]  a2 : ffffffc838feba48 a3 : ffffffff81c089c8 a4 : 000000000b6031d4
[45317.794436]  a5 : fffffff5db8decc0 a6 : 0000000000000001 a7 : 0000000000000000
[45317.804493]  s2 : ffffffc838feba14 s3 : ffffffffffffffff s4 : ffffffff80e0d990
[45317.814555]  s5 : 0000000000000001 s6 : 0000000000000000 s7 : 0000000000000002
[45317.824637]  s8 : 0000000000000004 s9 : 0000000000000001 s10: 0000000000000003
[45317.834724]  s11: 0000000000000001 t3 : 0000000000000000 t4 : 000000000000b424
[45317.844934]  t5 : 000001a200000000 t6 : ffffffc5fe6f8000
[45317.852945] status: 0000000200000120 badaddr: 0000000000000000 cause: 8000000000000001
[45317.863767] [<ffffffff8009ab24>] rcu_momentary_dyntick_idle+0x3a/0x80
[45317.873092] [<ffffffff800eee3a>] cpu_stopper_thread+0xfc/0x182
[45317.881867] [<ffffffff80044e76>] smpboot_thread_fn+0xe6/0x11a
[45317.890527] [<ffffffff80040aee>] kthread+0xbe/0xd4
[45317.898340] [<ffffffff80003f18>] ret_from_exception+0x0/0x16
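
For triage, the log above can be condensed into a per-task and per-CPU summary. The following is a minimal sketch, not part of the original report: the function name `summarize` and both regular expressions are assumptions written against the message formats visible in this log, and they may need adjusting for other kernel versions.

```python
import re
import sys
from collections import Counter

# Patterns written against the message formats seen in the log above.
HUNG_RE = re.compile(
    r"INFO: task (\S+):(\d+) blocked for more than (\d+) seconds"
)
LOCKUP_RE = re.compile(
    r"soft lockup - CPU#(\d+) stuck for (\d+)s! \[([^\]:]+):(\d+)\]"
)


def summarize(log_text):
    """Return (hung-task counts by task name, {cpu: longest stuck time in s})."""
    hung = Counter()
    lockups = {}
    for line in log_text.splitlines():
        m = HUNG_RE.search(line)
        if m:
            hung[m.group(1)] += 1
            continue
        m = LOCKUP_RE.search(line)
        if m:
            cpu, secs = int(m.group(1)), int(m.group(2))
            # Keep the longest reported stall for each CPU.
            lockups[cpu] = max(secs, lockups.get(cpu, 0))
    return hung, lockups


if __name__ == "__main__":
    # Reads a saved dmesg/console log from standard input.
    hung, lockups = summarize(sys.stdin.read())
    for name, count in sorted(hung.items()):
        print(f"hung task {name}: {count} report(s)")
    for cpu, secs in sorted(lockups.items()):
        print(f"CPU#{cpu}: stuck for up to {secs}s")
```

Saved as a script, it can be fed a captured console log on standard input to list which tasks were reported as hung and which CPUs hit the soft-lockup watchdog.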
@felixonmars
Author

Another server crash, this time with the Debian (RevyOS) 6.1.61 kernel; I am not sure whether it is the same issue:

[209173.109588] INFO: task node:896412 blocked for more than 124 seconds.
[209173.116301]       Not tainted 6.1.61-pisces #2023.12.19.12.48+c60b48221
[209173.123094] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[209173.131065] task:node            state:D stack:0     pid:896412 ppid:890296 flags:0x00000100
[209173.139659] Call Trace:
[209173.142239] [<ffffffff80a6e16a>] __schedule+0x29c/0x89e
[209173.147634] [<ffffffff80a6e7b8>] schedule+0x4c/0xce
[209173.152644] [<ffffffff80a6ebba>] schedule_preempt_disabled+0x18/0x20
[209173.159136] [<ffffffff80a6fb82>] __mutex_lock.constprop.0+0x336/0x6b8
[209173.165734] [<ffffffff80a70000>] __mutex_lock_slowpath+0x1a/0x22
[209173.171878] [<ffffffff80a7004a>] mutex_lock+0x42/0x4c
[209173.177061] [<ffffffff800d1a98>] proc_cgroup_show+0x5c/0x3ba
[209173.182866] [<ffffffff80306f7a>] proc_single_show+0x4e/0x9e
[209173.188594] [<ffffffff802a836a>] seq_read_iter+0x158/0x362
[209173.194216] [<ffffffff802a8608>] seq_read+0x94/0xc0
[209173.199231] [<ffffffff802797e2>] vfs_read+0xaa/0x238
[209173.204325] [<ffffffff8027a362>] ksys_read+0x6e/0xe4
[209173.209434] [<ffffffff8027a3f2>] sys_read+0x1a/0x22
[209173.214446] [<ffffffff80003cc8>] ret_from_syscall+0x0/0x2
[209173.229174] INFO: task node:896551 blocked for more than 124 seconds.
[209173.235882]       Not tainted 6.1.61-pisces #2023.12.19.12.48+c60b48221
[209173.242680] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[209173.250655] task:node            state:D stack:0     pid:896551 ppid:889336 flags:0x00000100
[209173.259257] Call Trace:
[209173.261829] [<ffffffff80a6e16a>] __schedule+0x29c/0x89e
[209173.267216] [<ffffffff80a6e7b8>] schedule+0x4c/0xce
[209173.272227] [<ffffffff80a6ebba>] schedule_preempt_disabled+0x18/0x20
[209173.278716] [<ffffffff80a6fb82>] __mutex_lock.constprop.0+0x336/0x6b8
[209173.285303] [<ffffffff80a70000>] __mutex_lock_slowpath+0x1a/0x22
[209173.291479] [<ffffffff80a7004a>] mutex_lock+0x42/0x4c
[209173.296665] [<ffffffff800d1a98>] proc_cgroup_show+0x5c/0x3ba
[209173.302464] [<ffffffff80306f7a>] proc_single_show+0x4e/0x9e
[209173.308205] [<ffffffff802a836a>] seq_read_iter+0x158/0x362
[209173.313822] [<ffffffff802a8608>] seq_read+0x94/0xc0
[209173.318828] [<ffffffff802797e2>] vfs_read+0xaa/0x238
[209173.323937] [<ffffffff8027a362>] ksys_read+0x6e/0xe4
[209173.329031] [<ffffffff8027a3f2>] sys_read+0x1a/0x22
[209173.334039] [<ffffffff80003cc8>] ret_from_syscall+0x0/0x2
[209173.348811] INFO: task node:896742 blocked for more than 124 seconds.
[209173.355521]       Not tainted 6.1.61-pisces #2023.12.19.12.48+c60b48221
[209173.362294] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[209173.370269] task:node            state:D stack:0     pid:896742 ppid:890159 flags:0x00000100
[209173.378867] Call Trace:
[209173.381442] [<ffffffff80a6e16a>] __schedule+0x29c/0x89e
[209173.389371] [<ffffffff80a6e7b8>] schedule+0x4c/0xce
[209173.396810] [<ffffffff80a6ebba>] schedule_preempt_disabled+0x18/0x20
[209173.405501] [<ffffffff80a6fb82>] __mutex_lock.constprop.0+0x336/0x6b8
[209173.414293] [<ffffffff80a70000>] __mutex_lock_slowpath+0x1a/0x22
[209173.422578] [<ffffffff80a7004a>] mutex_lock+0x42/0x4c
[209173.429832] [<ffffffff800d1a98>] proc_cgroup_show+0x5c/0x3ba
[209173.437697] [<ffffffff80306f7a>] proc_single_show+0x4e/0x9e
[209173.445493] [<ffffffff802a836a>] seq_read_iter+0x158/0x362
[209173.453113] [<ffffffff802a8608>] seq_read+0x94/0xc0
[209173.460196] [<ffffffff802797e2>] vfs_read+0xaa/0x238
[209173.467368] [<ffffffff8027a362>] ksys_read+0x6e/0xe4
[209173.474484] [<ffffffff8027a3f2>] sys_read+0x1a/0x22
[209173.481473] [<ffffffff80003cc8>] ret_from_syscall+0x0/0x2
sbi_trap_error: hart93: illegal instruction handler failed (error -2)
sbi_trap_error: hart93: mcause=0x0000000000000002 mtval=0x0000000000000000
sbi_trap_error: hart93: mepc=0x000000000015b1fa mstatus=0x0000000a00001820
sbi_trap_error: hart93: ra=0x000000000015b1f8 sp=0x00000000000e1f18
sbi_trap_error: hart93: gp=0xffffffff81a44ec8 tp=0xffffffe2b2a78000
sbi_trap_error: hart93: s0=0x00000000000e1f38 s1=0xffffffe81d005400
sbi_trap_error: hart93: a0=0x0013000000130000 a1=0xffffffffb51fb51f
sbi_trap_error: hart93: a2=0x000000000000b51f a3=0x000000000000b51f
sbi_trap_error: hart93: a4=0x000000000000b520 a5=0x0013000000130000
sbi_trap_error: hart93: a6=0x000000000000b51f a7=0x0000000000000080
sbi_trap_error: hart93: s2=0xfffffff65fa5f180 s3=0x0000000000016b20
sbi_trap_error: hart93: s4=0x0000000000000009 s5=0x0000000000000009
sbi_trap_error: hart93: s6=0x0000000200000022 s7=0xfffffff65fa5f8ff
sbi_trap_error: hart93: s8=0xffffffff81acb488 s9=0x000000000000005d
sbi_trap_error: hart93: s10=0xffffffff81a6ac98 s11=0xffffffe04f79bc08
sbi_trap_error: hart93: t0=0x0000000a00000820 t1=0x0000000000000001
sbi_trap_error: hart93: t2=0xffffffff8100bf28 t3=0x00000000ff00ff00
sbi_trap_error: hart93: t4=0xffffffd8ffa8fd38 t5=0x0000000000000002
sbi_trap_error: hart93: t6=0x02b11ee2af76cd48
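
The `cause:` values in the soft-lockup register dumps and the `mcause` value in the `sbi_trap_error` output can be decoded with the standard RISC-V cause encoding (top bit set means interrupt, the remaining bits are the code). This is a minimal sketch, not part of the original report: the helper name `decode_cause` is hypothetical, the tables cover only the standard codes from the privileged specification, and a 64-bit register width is assumed.

```python
XLEN = 64  # assumption: 64-bit harts, as the register dumps suggest
INTERRUPT_BIT = 1 << (XLEN - 1)

# Standard cause codes from the RISC-V privileged specification.
INTERRUPTS = {
    1: "supervisor software interrupt",
    3: "machine software interrupt",
    5: "supervisor timer interrupt",
    7: "machine timer interrupt",
    9: "supervisor external interrupt",
    11: "machine external interrupt",
}
EXCEPTIONS = {
    0: "instruction address misaligned",
    1: "instruction access fault",
    2: "illegal instruction",
    3: "breakpoint",
    4: "load address misaligned",
    5: "load access fault",
    6: "store/AMO address misaligned",
    7: "store/AMO access fault",
    8: "environment call from U-mode",
    9: "environment call from S-mode",
    11: "environment call from M-mode",
    12: "instruction page fault",
    13: "load page fault",
    15: "store/AMO page fault",
}


def decode_cause(value):
    """Split a scause/mcause value into (kind, code, description)."""
    is_interrupt = bool(value & INTERRUPT_BIT)
    code = value & (INTERRUPT_BIT - 1)
    table = INTERRUPTS if is_interrupt else EXCEPTIONS
    kind = "interrupt" if is_interrupt else "exception"
    return kind, code, table.get(code, "unknown")


if __name__ == "__main__":
    # Values copied from the logs above.
    print(decode_cause(0x8000000000000001))  # cause: in the soft-lockup dumps
    print(decode_cause(0x0000000000000002))  # mcause in sbi_trap_error
```

Under this decoding, the soft-lockup dumps were taken in a supervisor software interrupt, and the `sbi_trap_error` on hart93 is an illegal-instruction exception, which matches the "illegal instruction handler failed" message.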

@felixonmars
Author

Another crash with the first self-compiled kernel, on the same host:

[53111.356078] nvme nvme0: Abort status: 0x0                        
[53111.370408] nvme nvme0: I/O 309 (Write) QID 1 timeout, aborting  
[53111.374461] nvme nvme0: Abort status: 0x0                        
[53111.388915] nvme nvme0: I/O 310 (Write) QID 1 timeout, aborting  
[53111.392554] nvme nvme0: Abort status: 0x0
[53111.406580] nvme nvme0: I/O 311 (Write) QID 1 timeout, aborting  
[53111.410602] nvme nvme0: Abort status: 0x0
[53111.424884] nvme nvme0: I/O 312 (Write) QID 1 timeout, aborting  
[53111.428874] nvme nvme0: Abort status: 0x0
[53111.443731] nvme nvme0: Abort status: 0x0
[55284.493268] rcu: INFO: rcu_sched detected stalls on CPUs/tasks:
[55284.503385] rcu:     41-...0: (1 GPs behind) idle=7b24/1/0x4000000000000000 softirq=3305695/3305728 fqs=1979
[55312.513142] watchdog: BUG: soft lockup - CPU#49 stuck for 22s! [migration/49:314]
[55312.525061] Modules linked in: tun cfg80211 rfkill xt_MASQUERADE iptable_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 libcrc32c xt_TCPMSS xt_tcpudp iptable_filter vfat fat ixgbe mdio_devres ofpart of_mdio ipmi_si fixed_phy sophgo_spifmc fwnode_mdio spi_nor ipmi_devintf libphy igb 8250_dw gpio_dwapb ipmi_msghandler mtd mdio switchtec mousedev uio_pdrv_genirq uio tcp_bbr sch_fq fuse loop dm_mod nfnetlink bpf_preload ip_tables x_tables ext4 crc32c_generic crc16 mbcache jbd2 mmc_block sdhci_sophgo ast sdhci_pltfm usbhid nvme drm_vram_helper sdhci spi_dw_mmio drm_ttm_helper gpio_keys mmc_core spi_dw nvme_core ttm xhci_pci nvme_common xhci_pci_renesas
[55312.607363] CPU: 49 PID: 314 Comm: migration/49 Not tainted 6.1.61-1.1-sophgo-multi-08348-gdb74e759247f #1 d71360b18e23f727e028cd5c570bc8e9bdca43d0
[55312.628591] Hardware name: Sophgo Mango (DT)
[55312.636471] Stopper: multi_cpu_stop+0x0/0x172 <- migrate_swap+0xbe/0x158
[55312.647469] epc : rcu_momentary_dyntick_idle+0x3a/0x80
[55312.656635]  ra : multi_cpu_stop+0xb8/0x172
[55312.664391] epc : ffffffff8009ab24 ra : ffffffff800ef2cc sp : ffffffc80b0fbd90
[55312.675607]  gp : ffffffff81bac808 tp : ffffffe7fed63f00 t0 : 0000000000000080
[55312.686709]  t1 : 0000000001806000 t2 : 0000000000000001 s0 : ffffffc80b0fbe10
[55312.697704]  s1 : ffffffc8390eb970 a0 : ffffffff80e0da88 a1 : 0000000000000002
[55312.708524]  a2 : ffffffc8390eb9c8 a3 : ffffffff81c089c8 a4 : ffffffffe72fec6c
[55312.719750]  a5 : fffffff5db55bcc0 a6 : 0000000000000001 a7 : 0000000000000000
[55312.730866]  s2 : ffffffc8390eb994 s3 : ffffffffffffffff s4 : ffffffff80e0da88
[55312.741914]  s5 : 0000000000000001 s6 : 0000000000000000 s7 : 0000000000000002
[55312.752799]  s8 : 0000000000000004 s9 : 0000000000000001 s10: 0000000000000003
[55312.763432]  s11: 0000000000000001 t3 : 0000000000000000 t4 : 0000000000000000
[55312.774591]  t5 : 0000000000000034 t6 : 000000000000ffff
[55312.783551] status: 0000000200000120 badaddr: 0000000000000000 cause: 8000000000000001
[55312.794943] [<ffffffff8009ab24>] rcu_momentary_dyntick_idle+0x3a/0x80
[55312.804845] [<ffffffff800eee3a>] cpu_stopper_thread+0xfc/0x182
[55312.813935] [<ffffffff80044e76>] smpboot_thread_fn+0xe6/0x11a
[55312.823287] [<ffffffff80040aee>] kthread+0xbe/0xd4
[55312.831636] [<ffffffff80003f18>] ret_from_exception+0x0/0x16
[55340.510039] watchdog: BUG: soft lockup - CPU#49 stuck for 48s! [migration/49:314]
[55340.521060] Modules linked in: tun cfg80211 rfkill xt_MASQUERADE iptable_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 libcrc32c xt_TCPMSS xt_tcpudp iptable_filter vfat fat ixgbe mdio_devres ofpart of_mdio ipmi_si fixed_phy sophgo_spifmc fwnode_mdio spi_nor ipmi_devintf libphy igb 8250_dw gpio_dwapb ipmi_msghandler mtd mdio switchtec mousedev uio_pdrv_genirq uio tcp_bbr sch_fq fuse loop dm_mod nfnetlink bpf_preload ip_tables x_tables ext4 crc32c_generic crc16 mbcache jbd2 mmc_block sdhci_sophgo ast sdhci_pltfm usbhid nvme drm_vram_helper sdhci spi_dw_mmio drm_ttm_helper gpio_keys mmc_core spi_dw nvme_core ttm xhci_pci nvme_common xhci_pci_renesas
[55340.599220] CPU: 49 PID: 314 Comm: migration/49 Tainted: G             L     6.1.61-1.1-sophgo-multi-08348-gdb74e759247f #1 d71360b18e23f727e028cd5c570bc8e9bdca43d0
[55340.619981] Hardware name: Sophgo Mango (DT)
[55340.627532] Stopper: multi_cpu_stop+0x0/0x172 <- migrate_swap+0xbe/0x158
[55340.637316] epc : rcu_momentary_dyntick_idle+0x3a/0x80
[55340.645582]  ra : multi_cpu_stop+0xb8/0x172
[55340.652699] epc : ffffffff8009ab24 ra : ffffffff800ef2cc sp : ffffffc80b0fbd90
[55340.663138]  gp : ffffffff81bac808 tp : ffffffe7fed63f00 t0 : 0000000000000080
[55340.673447]  t1 : 0000000001806000 t2 : 0000000000000001 s0 : ffffffc80b0fbe10
[55340.683806]  s1 : ffffffc8390eb970 a0 : ffffffff80e0da88 a1 : 0000000000000002
[55340.694164]  a2 : ffffffc8390eb9c8 a3 : ffffffff81c089c8 a4 : ffffffffc2766c74
[55340.704272]  a5 : fffffff5db55bcc0 a6 : 0000000000000001 a7 : 0000000000000000
[55340.714400]  s2 : ffffffc8390eb994 s3 : ffffffffffffffff s4 : ffffffff80e0da88
[55340.724650]  s5 : 0000000000000001 s6 : 0000000000000000 s7 : 0000000000000002
[55340.734871]  s8 : 0000000000000004 s9 : 0000000000000001 s10: 0000000000000003
[55340.745127]  s11: 0000000000000001 t3 : 0000000000000000 t4 : 0000000000000000
[55340.755163]  t5 : 0000000000000034 t6 : 000000000000ffff
[55340.763195] status: 0000000200000120 badaddr: 0000000000000000 cause: 8000000000000001
[55340.774063] [<ffffffff8009ab24>] rcu_momentary_dyntick_idle+0x3a/0x80
[55340.783460] [<ffffffff800eee3a>] cpu_stopper_thread+0xfc/0x182
[55340.792051] [<ffffffff80044e76>] smpboot_thread_fn+0xe6/0x11a
[55340.800608] [<ffffffff80040aee>] kthread+0xbe/0xd4
[55340.808312] [<ffffffff80003f18>] ret_from_exception+0x0/0x16
