Another month and here we are with a new release! We also celebrated our seven-year anniversary late last month, but enough about that boring
stuff, let's talk about what we improved!
### Remap procfs cmdline using PR_SET_MM_MAP
This has been a thorn in our side for a while. Whenever an application read its cmdline, FEX had to rewrite the file contents to remove the
FEXInterpreter argument. It turns out the kernel has long had an interface for remapping this file; we just weren't using it. Now, instead of mangling the data, we use the correct kernel interface. This means that things like Mesa application profiles and KDE
Plasma see the correct application name in all instances.
Big shoutout to the external contributors who implemented this for us!
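
To make the problem concrete, here is a minimal sketch of how a consumer might parse the raw bytes of `/proc/<pid>/cmdline`, where each argument is terminated by a NUL byte. `SplitCmdline` is a hypothetical helper for illustration, not code from FEX, Mesa, or KDE. A consumer that treats the first entry as the application name would see `FEXInterpreter` there unless the contents are rewritten or remapped.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper: splits the raw contents of /proc/<pid>/cmdline into
// individual arguments. The arguments are stored back to back, each one
// terminated by a NUL byte.
std::vector<std::string> SplitCmdline(const std::string& raw) {
  std::vector<std::string> args;
  std::size_t pos = 0;
  while (pos < raw.size()) {
    std::size_t end = raw.find('\0', pos);
    if (end == std::string::npos) {
      end = raw.size();  // tolerate a missing final terminator
    }
    args.emplace_back(raw, pos, end - pos);
    pos = end + 1;
  }
  return args;
}
```

With the remapped cmdline, the first entry returned by a parser like this is the guest application itself rather than the emulator.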
### Implement support for JIT codebuffer guard page based restart
This one takes a bit of explaining. When writing our AArch64 code emitter, we made the decision not to do range
checks for how much memory remains in our JIT code buffer. We instead used a heuristic to estimate how much space is required, which usually
worked. The problem with heuristics, of course, is that they can fail, and our "fallback" case was to crash. This was a known problem that we would need
to resolve at some point, and this month we finally got around to it. Since we started using larger "multiblock" JIT blocks, we had become
more likely to hit this crash, usually in x87-heavy code because its JIT translation is heavy.
Now when the heuristic fails, our code emitter will try writing to our guard page, and we will catch the SIGSEGV and restart the JIT with a larger
code buffer. This fixes these edge-case crashes and makes our JIT more robust in the process.
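
The general technique can be sketched as follows. This is a simplified, single-threaded illustration and not FEX's implementation: all names (`EmitWithRestart`, `OnSegv`, the globals) are hypothetical, the "code" is just a run of placeholder bytes, and the restart is done with `sigsetjmp`/`siglongjmp`, which is an assumption of this sketch. The buffer is followed by a `PROT_NONE` guard page, writes are done without bounds checks, and a fault inside the guard page restarts the emit with a buffer twice the size.

```cpp
#include <setjmp.h>
#include <signal.h>
#include <sys/mman.h>
#include <unistd.h>

#include <cstddef>
#include <cstdint>

namespace {
sigjmp_buf g_restart_point;                     // where the fault handler jumps back to
std::uint8_t* volatile g_guard_page = nullptr;  // start of the active guard page
std::size_t g_page_size = 0;

void OnSegv(int, siginfo_t* info, void*) {
  auto* addr = static_cast<std::uint8_t*>(info->si_addr);
  std::uint8_t* guard = g_guard_page;
  if (guard != nullptr && addr >= guard && addr < guard + g_page_size) {
    siglongjmp(g_restart_point, 1);  // the emitter ran off the end of the buffer
  }
  signal(SIGSEGV, SIG_DFL);  // unrelated fault: crash normally when we return
}
}  // namespace

// Writes `bytes` placeholder bytes with no bounds checks, doubling the buffer
// each time the guard page is hit. Returns the buffer size that finally fit,
// or 0 if the allocation failed.
std::size_t EmitWithRestart(std::size_t bytes) {
  g_page_size = static_cast<std::size_t>(sysconf(_SC_PAGESIZE));

  struct sigaction action {};
  struct sigaction previous {};
  action.sa_sigaction = OnSegv;
  action.sa_flags = SA_SIGINFO;
  sigemptyset(&action.sa_mask);
  sigaction(SIGSEGV, &action, &previous);

  volatile std::size_t next_size = g_page_size;  // survives siglongjmp
  for (;;) {
    const std::size_t size = next_size;
    void* mem = mmap(nullptr, size + g_page_size, PROT_READ | PROT_WRITE,
                     MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (mem == MAP_FAILED) {
      sigaction(SIGSEGV, &previous, nullptr);
      return 0;
    }
    auto* base = static_cast<std::uint8_t*>(mem);
    mprotect(base + size, g_page_size, PROT_NONE);  // guard page after the buffer
    g_guard_page = base + size;

    if (sigsetjmp(g_restart_point, 1) == 0) {
      // Sequential, unchecked writes, standing in for emitted instructions.
      volatile std::uint8_t* out = base;
      for (std::size_t i = 0; i < bytes; ++i) {
        out[i] = 0x90;
      }
      g_guard_page = nullptr;
      munmap(mem, size + g_page_size);
      sigaction(SIGSEGV, &previous, nullptr);
      return size;
    }

    // Restart path: the guard page was hit, so retry with a larger buffer.
    g_guard_page = nullptr;
    munmap(mem, size + g_page_size);
    next_size = size * 2;
  }
}
```

The appeal of this design is that the common path pays nothing for bounds checks; the cost is only paid in the rare case where the size estimate was too small.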
### Initial code caching features landing
There's an absolute ton of work going into this and it's not ready for users yet, but it would be remiss not to call out all the effort
on this front.
This month we landed initial support for "code maps" and offline "code cache" generation. There is not yet any way for a user to actually utilize these
code maps and caches, but these are the required steps to get us to the transparent code caching that we are expecting to have. Watch out over the
coming months as we finish fleshing out this feature and get it fully wired up.
### Fixes APICID count
This is a bit of a weird feature that we had accidentally missed. When reading CPUID, processes get what is called an APIC ID, which is essentially
just a core index. Some applications use this ID to determine how many unique CPU cores are available on the system. We were
accidentally always returning zero, which caused some applications to think the system had only one CPU. With this fixed, the FPGA software that
this was detected in now spawns the correct number of worker threads for the cores in the system. This of course improves its synthesis time
dramatically, since it scales well with the number of cores in the system.
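
As a rough illustration of why a constant zero breaks this kind of detection, the sketch below counts distinct APIC IDs the way such an application might. It assumes the application reads CPUID leaf 1 on each logical CPU, where the initial APIC ID sits in bits 31:24 of EBX; the helper names and the idea of passing in pre-sampled EBX values are hypothetical, and the FPGA software in question may do this differently.

```cpp
#include <cstddef>
#include <cstdint>
#include <set>
#include <vector>

// CPUID leaf 1 reports the initial APIC ID in bits 31:24 of EBX.
constexpr std::uint32_t InitialApicId(std::uint32_t leaf1_ebx) {
  return (leaf1_ebx >> 24) & 0xFFu;
}

// Hypothetical helper: given the leaf 1 EBX value sampled on each logical
// CPU, counts how many distinct APIC IDs were observed.
std::size_t CountCoresByApicId(const std::vector<std::uint32_t>& ebx_per_cpu) {
  std::set<std::uint32_t> ids;
  for (std::uint32_t ebx : ebx_per_cpu) {
    ids.insert(InitialApicId(ebx));
  }
  return ids.size();
}
```

If every CPU reports an APIC ID of zero, the set collapses to a single entry and the application concludes there is one core, however many CPUs it sampled.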
### Disable io_uring syscalls
Our good friends over at felix86 alerted us to an issue around io_uring causing infinite loops in node.js and libuv. Upon further
investigation we determined that there is an ABI break in io_uring between x86 and Arm64 that we previously didn't know about. This comes down to how
the user submission queues in io_uring can embed epoll_event structures, which have different layouts between the architectures.
Because we can't safely rewrite the queue data to handle this layout difference, we have determined the only course of action is to disable the
syscalls. Luckily, most games don't rely on this syscall interface, and applications generally have a legacy fallback for when it is unsupported. In that
vein, node.js now works again.
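
The layout difference comes from how the Linux headers declare `struct epoll_event`: it is packed on x86-64 and naturally aligned on AArch64. The sketch below mirrors both layouts with hypothetical stand-in types (the names are ours, not the kernel's or FEX's) so the mismatch can be checked at compile time.

```cpp
#include <cstddef>
#include <cstdint>

// Stand-in for the x86-64 layout: the struct is packed, so the 64-bit data
// field follows the 32-bit events field directly.
struct __attribute__((packed)) EpollEventX86_64 {
  std::uint32_t events;
  std::uint64_t data;
};

// Stand-in for the AArch64 layout: the 64-bit data field is naturally
// aligned, which leaves 4 bytes of padding after events.
struct EpollEventAArch64 {
  std::uint32_t events;
  alignas(8) std::uint64_t data;
};

static_assert(sizeof(EpollEventX86_64) == 12, "packed layout is 12 bytes");
static_assert(offsetof(EpollEventX86_64, data) == 4, "data follows events");
static_assert(sizeof(EpollEventAArch64) == 16, "aligned layout is 16 bytes");
static_assert(offsetof(EpollEventAArch64, data) == 8, "data is 8-byte aligned");
```

A guest that writes the 12-byte form into memory that the host kernel reads as the 16-byte form will have its `data` field misread, which is the kind of mismatch described above.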
### FEAT_LRCPC2 performance errata
This month we found out that a large number of Cortex and Neoverse CPU cores have an erratum that only affects the instructions added in FEAT_LRCPC2.
We have disabled this extension on the affected CPU cores, which can give a reasonable performance improvement in games that were bound by TSO emulation.
### JIT and emulation bug fixes
As usual, there were a bunch of bug fixes in both our JIT and Linux syscall emulation this month, but this report is already running long, so if
you're interested, take a peek at our pull requests to find out more.
### Raw Changes
FEX Release FEX-2512
- Async
- CMake
  - Disable libstdc++'s debug mode when compiling thunkgen (53db3ad)
- CPUID
- CodeCache
- CodeEmitter
  - Removes a few spurious asserts (de1d37e)
- Config
- Docs
  - fix typo (73a32ff)
- Externals
- FEX
- FEXConfig
  - Fix string list handling (fbefd78)
- FEXCore
  - Fixes JITGuardPage calculation in a threaded environment (faf74ee)
  - Remove usage of "remote atomic" xor (3c9f6c8)
  - CodeCache
    - Move spin-loop over to a WFE loop (e2f4065)
  - Common
    - Adds the ability to override HostFeatures registers by config. (06c2319)
  - Config
    - Expose GetConv members (5ee190a)
  - Win32
    - Move WritePriorityMutex away from SRWLock (22c3cd5)
- FEXInterpreter
  - Fixes crash with code maps (379dc40)
- FEXServer
  - Add support for a wait_fd (d74b5c4)
- Github
  - Add a steamrt4 builder (427b235)
- HostFeatures
- IR
  - Replace hand-written operators with three-way comparison (f290d2f)
- JIT
- Linux
  - Disable io_uring (3dd591e)
- LinuxSyscalls
- LookupCache
  - Fix use of invalidated iterators (3969d0a)
- OpcodeDispatcher
  - Simplify convoluted logic for computing call offsets (39fb266)
- SVE256
  - Fixes AVX scalar round with insert (8bb3398)
- Scripts
- SteamRT4
  - Adds support for building the steam depot (bd7215d)
- SyscallsSMCTracking
  - Workaround assert in ELF mapping (bd13c02)
- Utils
  - WritePriorityMutex
    - Support being forkable (a12b892)
- Misc
  - Remap /proc/pid/cmdline with PR_SET_MM_MAP (c460cf0)
  - Set current code block in x87 pass (90c8fcf)
  - Reflect application changes to argv[0] in /proc/self/cmdline (9c72113)
  - Support detecting unsquashfs>4.7.0 decompressors (6fd471e)
  - Minor fixes (3d69029)
  - Support (downstream) kernel-side unaligned atomic handling (The rebase sequel) (d3bf87f)
  - Implement address size modifier handling in CMPSOp and SCASOp (5205ae4)
  - Introduce two-pass code invalidation model (40c2db4)
  - Fix movzx instruction syntax (aba0c57)
  - Refactoring of storing code in x87 opt. stack pass (cf37617)
  - Align code with clang-format (0be8485)
  - Use gradual memory growth (1e3c642)
  - Update xxhash to v0.8.3 (747ea0a)
- docs
  - Update ProgrammingConcerns (53b2245)

