Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Force alignment of allocated memory to prevent segmentation faults on ARM systems #224

Open
wants to merge 2 commits into
base: master
Choose a base branch
from

Conversation

gutschke
Copy link

@gutschke gutschke commented Jan 21, 2024

I fixed a minor nuisances that I encountered with the agent. Feel free to either pull outright, or edit and apply your own fix. This is a one-liner.

I noticed when running the agent on my Raspberry Pi (which has a 32bit userspace and 64bit kernel for some random reason), it would always result in a segmentation fault. Turns out, dynamically allocated memory wasn't aligned properly. This isn't an issue on x86 systems, which supports unaligned memory accesses but proved to be a problem on ARM. My fix should address the observed issue, and it can't make things any worse as it only ever increases alignment, but it's a bit simplistic. A proper fix would need a lot more architecture-specific code. In practice, I am not sure that's actually needed, though. Most commonly used CPU architectures should be happy with aligning to a multiple of the natural pointer size.

…ced crashes on ARM based devices which have stricter alignment requirements than x86. Ideally, this code should be rewritten to pad the sizes of all "structs" and to consider all of the different architecture-specific alignment requirements. But in practice, I suspect that rounding up allocation sizes to a multiple of pointer sizes is likely good enough. This is a pragmatic one-line change that shouldn't break existing code and should prevent any of the crashes that I have observed.
@si458
Copy link
Collaborator

si458 commented Jan 24, 2024

i think this has been fixed with this PR? #187

@gutschke
Copy link
Author

I believe this is just another manifestation of the same underlying bug. There are bound to be lots of places where structures are accessed directly, instead of always making a copy first.

The work-around proposed in the discussion for PR #187 of disabling the 64bit kernel might very well be functional. But that's a bit heavy-handed. While it isn't very common to have a 32bit userspace with a 64bit kernel, it can easily happen when upgrading an older distribution that was previously only available in 32bit, as many distributions have started switching to a single kernel for both their 32bit and 64bit releases. I suspect that this affects a lot of Raspberry Pi users, as RPi distributions went through this transition only a few years ago.

In the bug that I am fixing here, I believe the crash happened a few lines later (around line 1111) when accessing the newly allocated ILibMemory_Header. Every single time we dereference any of the members of that struct, an unaligned access occurs. And some of those might very well be hidden inside of macros. It will be quite labor-intensive to track down each of the accesses that needs to be fixed. It is much easier to simply ensure that "extra" data is allocated with proper alignment in the first place.

Incidentally, there is a related issue that I didn't fix. If you try to install MeshAgent on a 32bit OS with a 64bit kernel, the automatic detection of the architecture doesn't work correctly and the installation script will try to download and install a 64bit binary which subsequently fails to run with a rather confusing error message of "file not found". The root cause of course is not the missing binary, but the missing run-time loader. I simply overrode the architecture manually and proceeded with the installation. But that's an issue that some users might have trouble with. Do you want me to file a bug?

@si458
Copy link
Collaborator

si458 commented Jan 24, 2024

No, that bug has already been fixed with again #187,
The issue is because the binary file isn't compatible with 64bit on 64bit arm so the raspberry pi's freak out and cause a file not found or a another error which js what u get.
We just need to build new binary and then release them, it's all in the works...

Example meshcentral issues
Ylianst/MeshCentral#5108
Ylianst/MeshCentral#5284

@si458 si458 closed this Jan 24, 2024
@gutschke
Copy link
Author

gutschke commented Jan 24, 2024

Are you talking about commit 0ba325a? Or are there additional commits that I didn't see.

While that commit seems reasonable to me, I don't think that fixes the same problem. It just looks very similar. Take a look at the change that I suggested in f72893f. I think this is another problem.

When ILibMemory_Init() is called with a primarySize that isn't a multiple of the widest alignment (e.g. 4 or 8 depending on platform), then it places an extra ILibMemory_Header immediate after the memory that was in the primary allocation, and that header is now in unaligned memory. As soon as the code tries to fill out the structure, it'll crash (around line 1111).

You are correct though and the fix was incomplete. I am still not super familiar with the code and am mostly stabbing at it blindly, but I think I found the missing places that also need to enforce correct alignment now. I updated my branch, but I am not sure how to tell GitHub to update the pull request with 8e7fb7b.

@si458 si458 reopened this Jan 24, 2024
@si458
Copy link
Collaborator

si458 commented Jan 24, 2024

OK I've reopened the PR for you which shows 2 commits.
I'm sadly not familiar either with C.
But I do have a pi 4! So I will get it setup 2moz with 64/64 and do some built testing and report back!
It might be we need both your pr and the other pr to fix the arm64 builds!

@gutschke
Copy link
Author

gutschke commented Jan 24, 2024

The bug only triggers if the caller provides a primarySize that doesn't meet the alignment requirements. And that obviously depends on what they are trying to store in that allocation. I didn't bother walking the stack trace up to see who happens to make such an unfortunate allocation. But I wouldn't be surprised if it only happens for 32 bit builds running on 64 bit hardware.

So, if you have trouble reproducing the problem, that's something you might need to try. If in doubt, print the primarySize to the console. If you see any values that aren't divisible by 4 (for the 32 bit binary), that's suspicious.

I love to help and provide patches for the bugs that I encounter myself and that I can figure out how to fix. Unfortunately, I can't really commit to doing any more than that. On the other had, for the primary developers (which I understand would mean both you and Ylianst) it would likely make a lot of sense to find a way to emulate the majority of the official target platforms. Raspberry Pi can be emulated in QEmu and an emulated environment makes it much easier to quickly switch userspace, kernels, and/or hardware. It also can easily do snapshots/rollbacks which I find invaluable for this type of work. If I had to do this, I'd try to get a Proxmox VE cluster up and running with most of the target platforms. It's a nice front-end to juggling a lot of different machines.

Having said that, I have a bunch of Raspberry Pi devices in production use here, and I can run some tests on them. I have both older 32-bit only hardware, and newer 64-bit capable hardware. And I use both 32 and 64 bit userspace, including at least one device with 32 bit userspace and 64 bit kernel. So, if there is something you want me to test, please speak up.

@si458
Copy link
Collaborator

si458 commented Jan 25, 2024

ive just done a complete FRESH install of raspbian bookworm 64bit on both my pi 3b+ and my pi4,
and its now installing perfectly fine, no update of agent or install scripts, i even checked the config.txt and arm_64bit=1 was set by default?
are you 100% using the latest meshcentral version 1.1.19 ?
only issue is you MUST switch from wayland to x11 with raspi-config to get the Desktop tab to work/show

@gutschke
Copy link
Author

gutschke commented Jan 25, 2024

I have 32bit only devices, 64bit devices, and 32/64bit devices. The problem only shows with the latter. You need a 32bit userspace running a kernel that supports both 32 and 64bit ARM at the same time.

I don't know how easy it is to trigger this situation with a fresh install of Ubuntu, but it seems common when upgrading an older 32bit installation of the OS that runs on hardware which is 64bit capable. And that's not unusual, because, for a while, Ubuntu suggested for users to install a 32bit OS on this hardware, and they then later changed the kernel from a 32bit build to a universal 32/64 bit version.

So anybody who installed their Raspberry Pi a few years ago and only every upgraded instead of doing a fresh install will fit this. You might be able to replicate by intentionally installing a 32bit version of Ubuntu on your hardware.

@si458
Copy link
Collaborator

si458 commented Jan 25, 2024

@gutschke right got you! i will do more testing.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants