Segmentation fault on arm64 for 0.19.7 #5205

Open
5 tasks done
ravermeister opened this issue Nov 17, 2024 · 24 comments
Labels
bug Something isn't working

Comments

@ravermeister

ravermeister commented Nov 17, 2024

Requirements

  • Is this a bug report? For questions or discussions use https://lemmy.ml/c/lemmy_support
  • Did you check to see if this issue already exists?
  • Is this only a single bug? Do not put multiple bugs in one issue.
  • Do you agree to follow the rules in our Code of Conduct?
  • Is this a backend issue? Use the lemmy-ui repo for UI / frontend issues.

Summary

When I start Lemmy with docker compose, the container exits immediately.
When I start lemmy_server manually inside the container, a segmentation fault occurs.

Steps to Reproduce

Adjust the configuration files as explained in the install instructions.

Then:

~>docker compose run -it --entrypoint /bin/bash lemmy

✔ Container 62fd3fd0a34a_lemmy-pictrs-1    Running 0.0s
✔ Container e743dab5f880_lemmy-postgres-1  Running 0.0s

~>lemmy_server

Segmentation fault

Technical Details

cat /proc/version

Linux version 6.6.51-v8-16k+ (dom@buildbot) (aarch64-linux-gnu-gcc (Ubuntu 11.4.0-1ubuntu1~22.04) 11.4.0, GNU ld (GNU Binutils for Ubuntu) 2.38) #1797 SMP PREEMPT Wed Sep 18 18:16:38 BST 2024

cat /sys/firmware/devicetree/base/model

Raspberry Pi 5 Model B Rev 1.0

Version

BE 0.19.7

Lemmy Instance URL

No response

@ravermeister ravermeister added the bug Something isn't working label Nov 17, 2024
@Nutomic
Member

Nutomic commented Nov 18, 2024

Our ARM builds are compiled for aarch64. It seems that architecture is supported on your device, but you need to install a different OS variant.

https://raspberrypi.stackexchange.com/questions/111011/aarch64-or-armv8-with-arch-on-a-raspberry-pi-4

@ravermeister
Author

Hmm, but isn't my architecture already aarch64?
According to:

cat /proc/version
Linux version 6.6.51-v8-16k+ (dom@buildbot) (aarch64-linux-gnu-gcc (Ubuntu 11.4.0-1ubuntu1~22.04) 11.4.0, GNU ld (GNU Binutils for Ubuntu) 2.38) #1797 SMP PREEMPT Wed Sep 18 18:16:38 BST 2024

It is the official raspbian image by the way...

@Nutomic
Member

Nutomic commented Nov 18, 2024

Then maybe something went wrong with the build. Can you test if 0.19.0 for example is working?

@dessalines
Member

Also, what is the docker version?

@ravermeister
Author

ravermeister commented Nov 18, 2024

Sure, my Docker version is:

docker -v
Docker version 27.3.1, build ce12230

docker compose version
Docker Compose version v2.29.7

The same segmentation fault also occurs with versions 0.19.0 and 0.19.0-alpha.12.

@dessalines
Member

We don't have ARM machines, so this would be pretty difficult to debug and test. Even our arm64 builds are done via a different image, here: https://github.com/raskyld/lemmy-cross-toolchains. We'd need someone with experience building images for ARM to figure out what's going on.

cc @raskyld

@Nothing4You
Collaborator

on my arm vms the 0.19.7 image is working fine, so the issue does not appear to be the image itself

uname -a
Linux f3heh9 6.1.0-20-arm64 #1 SMP Debian 6.1.85-1 (2024-04-11) aarch64 GNU/Linux
docker -v
Docker version 26.1.1, build 4cf5afa

@ravermeister
Author

ravermeister commented Nov 19, 2024

I'm running the stock (updated) Raspbian image.
The head of raspinfo is:

Raspberry Pi 5 Model B Rev 1.0
PRETTY_NAME="Debian GNU/Linux 12 (bookworm)"
NAME="Debian GNU/Linux"
VERSION_ID="12"
VERSION="12 (bookworm)"                                  
Raspberry Pi reference 2023-12-05
Generated using pi-gen, https://github.com/RPi-Distro/pi-gen, 70cd6f2a1e34d07f5cba7047aea5b92457372e05, stage4

Linux viruspi 6.6.51-v8-16k+ #1797 SMP PREEMPT Wed Sep 18 18:16:38 BST 2024 aarch64 GNU/Linux
Revision        : d04170
Serial          : 38bc7a27177aa590
Model           : Raspberry Pi 5 Model B Rev 1.0
Throttled flag  : throttled=0x0

@Nothing4You
Collaborator

Just to confirm, could you try just launching a shell in the container standalone?

docker run --rm -it --entrypoint /bin/bash dessalines/lemmy:0.19.7

I'd expect this to also segfault if it's a general architecture issue.

@ravermeister
Author

I did this, and I can successfully launch a shell; only lemmy_server fails. See Steps to Reproduce in #5205 (comment).

@Nothing4You
Collaborator

is there a possibility that you're trying to run an image for another platform somehow?

what do you get from this?

docker image inspect -f '{{ .RepoTags }} - {{ .RepoDigests }} - {{ .Architecture }}' dessalines/lemmy:0.19.7

what do you get when you run arch in the shell inside the container?
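
The two checks above can be combined into one comparison. This is only a minimal sketch: `normalize_arch` is a hypothetical helper, and the image tag is the one used in this thread. The mapping is needed because Docker reports `arm64` while `arch` and `uname -m` print `aarch64` for the same architecture.

```shell
# Map `uname -m` / `arch` names to the names Docker reports.
normalize_arch() {
  case "$1" in
    aarch64|arm64) echo arm64 ;;
    x86_64|amd64)  echo amd64 ;;
    *)             echo "$1" ;;
  esac
}

host_arch=$(normalize_arch "$(uname -m)")
echo "host architecture: $host_arch"

# Only query Docker when it is installed and the image is present locally.
image=dessalines/lemmy:0.19.7
if command -v docker >/dev/null 2>&1 &&
   image_arch=$(docker image inspect -f '{{ .Architecture }}' "$image" 2>/dev/null); then
  if [ "$image_arch" = "$host_arch" ]; then
    echo "image architecture matches: $image_arch"
  else
    echo "mismatch: image is $image_arch, host is $host_arch"
  fi
fi
```

A match here only rules out a wrong-platform image; it says nothing about whether the binary inside the image can actually be loaded.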

@ravermeister
Author

ravermeister commented Nov 20, 2024

I don't think so:

docker image inspect -f '{{ .RepoTags }} - {{ .RepoDigests }} - {{ .Architecture }}' dessalines/lemmy:0.19.7

[dessalines/lemmy:0.19.7] - [dessalines/lemmy@sha256:62282b9d6ee0a8692f2283b24dd3f4bbbf987c0c2f560fc2617829975c536118] - arm64
lemmy@lemmy:/$ arch
aarch64

It seems something is wrong with lemmy_server itself.
In case it helps, this is the output of strace lemmy_server:

strace lemmy_server
execve("/usr/local/bin/lemmy_server", ["lemmy_server"], 0x7fffd755b090 /* 8 vars */) = -1 EINVAL (Invalid argument)
+++ killed by SIGSEGV +++
Segmentation fault
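
One check that nobody in the thread runs: the kernel version string above ends in 16k, which on Raspberry Pi kernels denotes a 16 KiB page size. The following is only a data-gathering sketch, under the unconfirmed assumption that page size could matter for the failing execve. It prints the kernel page size and the alignment of the binary's LOAD segments so the two can be compared. The binary path is taken from the strace output; everything else is illustrative.

```shell
# Kernel page size in bytes (16384 would mean 16 KiB pages).
page_size=$(getconf PAGESIZE)
echo "kernel page size: $page_size"

# Path taken from the strace output above; skipped when the file or readelf is missing.
bin=/usr/local/bin/lemmy_server
if command -v readelf >/dev/null 2>&1 && [ -r "$bin" ]; then
  # The last column of each LOAD program header is its alignment.
  readelf -lW "$bin" | awk '$1 == "LOAD" { print "LOAD alignment:", $NF }'
fi
```

If the reported alignment turned out to be smaller than the page size, that would be worth posting in the issue. The thread itself draws no such conclusion.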

@raskyld
Contributor

raskyld commented Nov 20, 2024

I would recommend building the project with the debug profile first, so that you have access to debug symbols.
Either use:

  • an arm64 machine to compile lemmy_server,
  • the Raspberry Pi itself; it will be slow, but at least you can be sure it uses the right libraries for your hardware,
  • cross-compiling from amd64 to arm64, but let me tell you that would be a pain in the A.

Then, once you have the debug binary compiled, make sure you have gdb (or better, rust-gdb) installed and run rust-gdb target/debug/lemmy_server. That will open an interactive debugging session. Type run and gdb should catch the SIGSEGV; you can then use bt to get a backtrace and post it here to help us understand which line of lemmy_server causes this behaviour on your hardware.
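
The steps above can also be run non-interactively. A minimal sketch, assuming the build was done with a plain `cargo build` in the root of the checkout (debug is Cargo's default profile). `debug_backtrace` is a hypothetical helper name; `--batch`, `-ex` and `--args` are standard gdb options that rust-gdb forwards to gdb.

```shell
# Run the debug binary under rust-gdb and print a backtrace if it crashes.
debug_backtrace() {
  bin=${1:-target/debug/lemmy_server}
  if [ ! -x "$bin" ]; then
    echo "no debug binary at $bin; run 'cargo build' first" >&2
    return 1
  fi
  # --batch exits once the commands finish; each -ex runs one gdb command.
  rust-gdb --batch -ex run -ex bt --args "$bin"
}
```

Calling `debug_backtrace` from the checkout root runs the binary, and the output of `bt` is what to paste into the issue.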

Good luck!

@Nothing4You
Collaborator

could this be related to specific CPU features maybe?

for reference, the arm64 virtual machines at hetzner look like this:

processor	: 0
BogoMIPS	: 50.00
Features	: fp asimd evtstrm aes pmull sha1 sha2 crc32 atomics fphp asimdhp cpuid asimdrdm lrcpc dcpop asimddp ssbs
CPU implementer	: 0x41
CPU architecture: 8
CPU variant	: 0x3
CPU part	: 0xd0c
CPU revision	: 1

my m1 mac looks like this:

processor	: 0
BogoMIPS	: 48.00
Features	: fp asimd evtstrm aes pmull sha1 sha2 crc32 atomics fphp asimdhp cpuid asimdrdm jscvt fcma lrcpc dcpop sha3 asimddp sha512 asimdfhm dit uscat ilrcpc flagm ssbs sb paca pacg dcpodp flagm2 frint
CPU implementer	: 0x61
CPU architecture: 8
CPU variant	: 0x0
CPU part	: 0x000
CPU revision	: 0

The features common in both of these are

  • aes
  • asimd
  • asimddp
  • asimdhp
  • asimdrdm
  • atomics
  • cpuid
  • crc32
  • dcpop
  • evtstrm
  • fp
  • fphp
  • lrcpc
  • pmull
  • sha1
  • sha2
  • ssbs
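
The list above can be reproduced mechanically. A minimal sketch: both flag strings are copied from the two cpuinfo excerpts in this comment, and `common_flags` is a hypothetical helper name.

```shell
# Flags copied from the two /proc/cpuinfo excerpts above.
hetzner="fp asimd evtstrm aes pmull sha1 sha2 crc32 atomics fphp asimdhp cpuid asimdrdm lrcpc dcpop asimddp ssbs"
m1="fp asimd evtstrm aes pmull sha1 sha2 crc32 atomics fphp asimdhp cpuid asimdrdm jscvt fcma lrcpc dcpop sha3 asimddp sha512 asimdfhm dit uscat ilrcpc flagm ssbs sb paca pacg dcpodp flagm2 frint"

# Print every flag from the first list that also appears in the second.
# Padding with spaces makes the match exact (flagm does not match flagm2).
common_flags() {
  for flag in $1; do
    case " $2 " in
      *" $flag "*) echo "$flag" ;;
    esac
  done | sort
}

common_flags "$hetzner" "$m1"
```

On a live system the same strings can be taken from the Features line of /proc/cpuinfo instead of being pasted in.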

@Nothing4You
Collaborator

based on the cpuinfo of a pi 5 b i found on a random blog, the only cpu feature present on my machines but not on the pi seems to be SSBS (Speculative Store Bypass Safe), which seems to be related to mitigating speculative execution vulnerabilities. to me, that doesn't sound like it would be related.

@raskyld
Contributor

raskyld commented Nov 20, 2024

Alternatively, I see that QEMU supports cortex-a76, so you could try running lemmy_server there to see if you can reproduce it: https://www.qemu.org/docs/master/system/arm/virt.html

@Nothing4You
Collaborator

tried emulating cortex-a76 on my mac with qemu (UTM), also works just fine:

lemmy@0fb5faa31c1f:/$ lscpu
Architecture:             aarch64
  CPU op-mode(s):         32-bit, 64-bit
  Byte Order:             Little Endian
CPU(s):                   4
  On-line CPU(s) list:    0-3
Vendor ID:                ARM
  Model name:             Cortex-A76
    Model:                1
    Thread(s) per core:   1
    Core(s) per cluster:  4
    Socket(s):            -
    Cluster(s):           1
    Stepping:             r4p1
    BogoMIPS:             125.00
    Flags:                fp asimd evtstrm aes pmull sha1 sha2 crc32 atomics fphp asimdhp cpuid asimdrdm lrcpc dcpop asimddp
NUMA:
  NUMA node(s):           1
  NUMA node0 CPU(s):      0-3
Vulnerabilities:
  Gather data sampling:   Not affected
  Itlb multihit:          Not affected
  L1tf:                   Not affected
  Mds:                    Not affected
  Meltdown:               Not affected
  Mmio stale data:        Not affected
  Reg file data sampling: Not affected
  Retbleed:               Not affected
  Spec rstack overflow:   Not affected
  Spec store bypass:      Mitigation; Speculative Store Bypass disabled via prctl
  Spectre v1:             Mitigation; __user pointer sanitization
  Spectre v2:             Mitigation; CSV2, BHB
  Srbds:                  Not affected
  Tsx async abort:        Not affected
lemmy@0fb5faa31c1f:/$ lemmy_server
thread 'main' panicked at crates/utils/src/settings/mod.rs:22:22:
Failed to load settings file, see documentation (https://join-lemmy.org/docs/en/administration/configuration.html).: LemmyError { message: Unknown("No such file or directory (os error 2)"), inner: No such file or directory (os error 2), context: SpanTrace [] }
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace
lemmy@0fb5faa31c1f:/$ LEMMY_CONFIG_LOCATION=/lemmy.hjson lemmy_server
Lemmy v0.19.7
2024-11-21T02:01:10.141698Z  INFO actix_server::builder: starting 4 workers
2024-11-21T02:01:10.142630Z  INFO actix_server::server: Tokio runtime found; starting in existing Tokio runtime
Error: LemmyError { message: Unknown("Error connecting to database"), inner: Error connecting to database

Caused by:
    could not translate host name "postgres" to address: Name or service not known
    , context: SpanTrace [] }
lemmy@0fb5faa31c1f:/$

@ravermeister
Author

ravermeister commented Nov 21, 2024

I would compile it myself on the Raspberry Pi 5 B, but I'm not very familiar with Rust. If someone could point me to a good step-by-step guide on how to compile Lemmy, I'll share the results here.

In case it helps, here is my /proc/cpuinfo:

processor       : 0
BogoMIPS        : 108.00
Features        : fp asimd evtstrm aes pmull sha1 sha2 crc32 atomics fphp asimdhp cpuid asimdrdm lrcpc dcpop asimddp
CPU implementer : 0x41
CPU architecture: 8
CPU variant     : 0x4
CPU part        : 0xd0b
CPU revision    : 1

processor       : 1
BogoMIPS        : 108.00
Features        : fp asimd evtstrm aes pmull sha1 sha2 crc32 atomics fphp asimdhp cpuid asimdrdm lrcpc dcpop asimddp
CPU implementer : 0x41
CPU architecture: 8
CPU variant     : 0x4
CPU part        : 0xd0b
CPU revision    : 1

processor       : 2
BogoMIPS        : 108.00
Features        : fp asimd evtstrm aes pmull sha1 sha2 crc32 atomics fphp asimdhp cpuid asimdrdm lrcpc dcpop asimddp
CPU implementer : 0x41
CPU architecture: 8
CPU variant     : 0x4
CPU part        : 0xd0b
CPU revision    : 1

processor       : 3
BogoMIPS        : 108.00
Features        : fp asimd evtstrm aes pmull sha1 sha2 crc32 atomics fphp asimdhp cpuid asimdrdm lrcpc dcpop asimddp
CPU implementer : 0x41
CPU architecture: 8
CPU variant     : 0x4
CPU part        : 0xd0b
CPU revision    : 1

Revision        : d04170
Serial          : 38bc7a27177aa590
Model           : Raspberry Pi 5 Model B Rev 1.0
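
These flags can be compared against the Hetzner VM flags posted earlier in the thread, where the image works. A minimal sketch: both lists are copied from the thread, and `missing_flags` is a hypothetical helper name.

```shell
# Flags copied from the /proc/cpuinfo excerpts in this thread.
hetzner="fp asimd evtstrm aes pmull sha1 sha2 crc32 atomics fphp asimdhp cpuid asimdrdm lrcpc dcpop asimddp ssbs"
pi5="fp asimd evtstrm aes pmull sha1 sha2 crc32 atomics fphp asimdhp cpuid asimdrdm lrcpc dcpop asimddp"

# Print the flags from the first list that are absent from the second.
missing_flags() {
  for flag in $1; do
    case " $2 " in
      *" $flag "*) ;;
      *) echo "$flag" ;;
    esac
  done | sort
}

echo "on the Hetzner VM but not on the Pi 5:"
missing_flags "$hetzner" "$pi5"
echo "on the Pi 5 but not on the Hetzner VM:"
missing_flags "$pi5" "$hetzner"
```

The only difference is ssbs, which is missing on the Pi. The Pi's list is also identical to the flags of the emulated Cortex-A76 shown earlier, where lemmy_server starts normally.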

@Nutomic
Member

Nutomic commented Nov 21, 2024

You can find build instructions in the docs (ignore the parts about pnpm and lemmy-ui). Basically install Rust dependencies, git clone --recursive and cargo build.

@ravermeister
Author

ravermeister commented Nov 21, 2024

Hmm cargo check doesn't run:

cargo -V
cargo 1.65.0
cargo check
warning: /lemmy/crates/api_crud/Cargo.toml: unused manifest key: lints   
warning: /lemmy/Cargo.toml: unused manifest key: lints
warning: /lemmy/Cargo.toml: unused manifest key: workspace.lints
warning: /lemmy/crates/utils/Cargo.toml: unused manifest key: lints
warning: /lemmy/crates/db_views_actor/Cargo.toml: unused manifest key: lints
warning: /lemmy/crates/db_schema/Cargo.toml: unused manifest key: lints
warning: /lemmy/crates/apub/Cargo.toml: unused manifest key: lints
warning: /lemmy/crates/db_perf/Cargo.toml: unused manifest key: lints
warning: /lemmy/crates/db_views/Cargo.toml: unused manifest key: lints
warning: /lemmy/crates/routes/Cargo.toml: unused manifest key: lints
warning: /lemmy/crates/db_views_moderator/Cargo.toml: unused manifest key: lints
warning: /lemmy/crates/federate/Cargo.toml: unused manifest key: lints
warning: /lemmy/crates/api_common/Cargo.toml: unused manifest key: lints
warning: /lemmy/crates/api/Cargo.toml: unused manifest key: lints
error: failed to parse lock file at: /lemmy/Cargo.lock

Caused by:
  lock file version `4` was found, but this version of Cargo does not understand this lock file, perhaps Cargo needs to be updated?

I'll check with Debian 13 to see if the rust version is newer....
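
The error above comes from the Cargo version rather than from Lemmy. A minimal sketch of a pre-flight check follows. The threshold of 1.78.0 as the first Cargo release able to read lock file version 4 is stated from memory and should be treated as an assumption; `version_ge` is a hypothetical helper that relies on `sort -V` (available in GNU coreutils).

```shell
# True when version $1 is at least version $2 (relies on `sort -V`).
version_ge() {
  [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# Assumption: first Cargo release that understands Cargo.lock version 4.
required=1.78.0

if command -v cargo >/dev/null 2>&1; then
  installed=$(cargo -V | awk '{ print $2 }')
  if version_ge "$installed" "$required"; then
    echo "cargo $installed should be able to read Cargo.lock v4"
  else
    echo "cargo $installed is too old; need at least $required"
  fi
fi
```

Installing a current toolchain with rustup is the usual alternative to switching to a newer distribution release.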

@ravermeister
Author

ravermeister commented Nov 21, 2024

Debian 13 seems to work, compile in progress...
But seems to take time...

It seems that compiling heats up my passively cooled Raspberry Pi too much: my system froze during the build. I'm on a train at the moment, so I'll try to compile again when I'm back at a keyboard.

@ravermeister
Author

ravermeister commented Dec 8, 2024

> I would recommend building the project with the debug profile first, so that you have access to debug symbols. Either use:
>
>   • an arm64 machine to compile lemmy_server,
>   • the Raspberry Pi itself; it will be slow, but at least you can be sure it uses the right libraries for your hardware,
>   • cross-compiling from amd64 to arm64, but let me tell you that would be a pain in the A.
>
> Then, once you have the debug binary compiled, make sure you have gdb (or better, rust-gdb) installed and run rust-gdb target/debug/lemmy_server. That will open an interactive debugging session. Type run and gdb should catch the SIGSEGV; you can then use bt to get a backtrace and post it here to help us understand which line of lemmy_server causes this behaviour on your hardware.
>
> Good luck!

I was able to build lemmy_server on my machine with cargo -j 2 build

With this build no segmentation fault occurs; I could successfully launch it:

root@1a8fa0ccd57d:/lemmy/target/debug# ./lemmy_server -h
A link aggregator for the fediverse

Usage: lemmy_server [OPTIONS]

Options:
      --disable-scheduled-tasks
          Don't run scheduled tasks [env: LEMMY_DISABLE_SCHEDULED_TASKS=]
      --disable-http-server
          Disables the HTTP server [env: LEMMY_DISABLE_HTTP_SERVER=]
      --disable-activity-sending
          Disable sending outgoing ActivityPub messages [env: LEMMY_DISABLE_ACTIVITY_SENDING=]
      --federate-process-index <FEDERATE_PROCESS_INDEX>
          The index of this outgoing federation process [env: LEMMY_FEDERATE_PROCESS_INDEX=] [default: 1]
      --federate-process-count <FEDERATE_PROCESS_COUNT>
          How many outgoing federation processes you are starting in total [env: LEMMY_FEDERATE_PROCESS_COUNT=] [default: 1]
  -h, --help
          Print help (see more with '--help')
  -V, --version
          Print version

In case it helps, here is the compiled lemmy_server binary https://cloud.rimkus.it/s/mqwiAJSGKpY5N5E

@raskyld
Contributor

raskyld commented Dec 9, 2024

Then it is very likely that a shared library on your system is incompatible with the one used to build the official ARM image. Compiling directly on the hardware ensures that the produced binary targets the version you have installed.

@ravermeister
Author

I see. Meanwhile I tried 0.19.8-beta.0, and no segmentation fault occurs there either.
