FEX Release FEX-2512

@Sonicadvance1 Sonicadvance1 released this 06 Dec 01:15
ba1b474

Read the blog post at FEX-Emu's Site!

Another month, another release! We also celebrated our seven-year anniversary late last month, but enough about that boring
stuff, let's talk about what we improved!

### Remap procfs cmdline using PR_SET_MM_MAP

This has been a thorn in our side for a while. Whenever an application read its cmdline, FEX had to rewrite the file contents to remove the
FEXInterpreter argument. It turns out the kernel has long had an interface for remapping this file; we just weren't using it. Now, instead of
mangling the data ourselves, we use the correct kernel interface. This means that things like Mesa application profiles and KDE
Plasma see the correct application name in all instances.
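
To give a feel for the kind of mangling the old approach needed, here is a minimal sketch of stripping an interpreter argument out of raw cmdline data, which is a sequence of NUL-terminated strings. The function name and its parameters are hypothetical and this is not FEX's actual code; the new path avoids this entirely by asking the kernel to remap the argument range through `prctl(PR_SET_MM, PR_SET_MM_MAP, ...)`.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <string_view>

// Hypothetical helper: drops the first argument of raw /proc/<pid>/cmdline
// data when its basename matches the interpreter name. Arguments in that
// file are separated by NUL bytes, so the data is handled as a byte range.
std::string StripInterpreterArg(std::string_view Cmdline, std::string_view InterpreterName) {
  assert(!InterpreterName.empty());

  const std::size_t FirstNul = Cmdline.find('\0');
  if (FirstNul == std::string_view::npos) {
    // No terminator at all: nothing we can safely strip.
    return std::string(Cmdline);
  }

  // Compare only the basename so "/usr/bin/FEXInterpreter" also matches.
  std::string_view FirstArg = Cmdline.substr(0, FirstNul);
  const std::size_t Slash = FirstArg.rfind('/');
  if (Slash != std::string_view::npos) {
    FirstArg.remove_prefix(Slash + 1);
  }
  if (FirstArg != InterpreterName) {
    return std::string(Cmdline);
  }

  // Skip the interpreter argument and its NUL terminator.
  return std::string(Cmdline.substr(FirstNul + 1));
}
```

Every reader of the file had to go through a rewrite like this, which is why letting the kernel present the right argument range is the more robust option.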

KDE Plasma before
KDE Plasma after

Big shoutout to the external contributors that implemented this for us!

### Implement support for JIT codebuffer guard page based restart

This one takes a bit of explaining, both what it is and why it is necessary. When writing our AArch64 code emitter, we made the decision not to
do range checks on how much memory remains in our JIT code buffer. We instead used a heuristic to determine how much space is required, which
usually worked. The problem with heuristics, of course, is that they can fail, and our "fallback" case was to crash. This was a known problem that
we would need to resolve at some point, and this month we finally got around to it. Because we now use larger "multiblock" JIT blocks, we had
become more likely to hit this crash, usually in x87-heavy code, whose JIT translation is heavy.

Now when the heuristic fails, our code emitter ends up writing to our guard page; we catch the resulting SIGSEGV and restart the JIT with a
larger code buffer. This fixes these edge-case crashes and makes our JIT more robust in the process.
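
The general technique can be sketched in a few lines of Linux-only C++. This is a minimal illustration under stated assumptions, not FEX's implementation: `EmitWithRestart`, `GuardHandler` and the globals are hypothetical names, the "emitter" is just a byte loop without range checks, and the growth policy (doubling) is an arbitrary choice for the example.

```cpp
#include <cassert>
#include <csetjmp>
#include <csignal>
#include <cstddef>
#include <cstdint>
#include <sys/mman.h>
#include <unistd.h>

namespace {
sigjmp_buf RestartPoint;               // where the "JIT" restarts from
volatile std::uintptr_t GuardBegin = 0; // start of the current guard page
volatile std::size_t GuardSize = 0;

void GuardHandler(int, siginfo_t* Info, void*) {
  const auto Addr = reinterpret_cast<std::uintptr_t>(Info->si_addr);
  if (Addr >= GuardBegin && Addr < GuardBegin + GuardSize) {
    siglongjmp(RestartPoint, 1); // fault in our guard page: restart emission
  }
  signal(SIGSEGV, SIG_DFL); // unrelated fault: re-fault with default action
}
} // namespace

// Writes Count bytes into a code buffer without any range checks and
// returns the buffer size that finally fit (excluding the guard page).
std::size_t EmitWithRestart(std::size_t Count) {
  const std::size_t Page = static_cast<std::size_t>(sysconf(_SC_PAGESIZE));

  struct sigaction SA {}, Old {};
  SA.sa_sigaction = GuardHandler;
  SA.sa_flags = SA_SIGINFO;
  sigemptyset(&SA.sa_mask);
  sigaction(SIGSEGV, &SA, &Old);

  // Locals modified between sigsetjmp and siglongjmp must be volatile.
  volatile std::size_t BufferSize = Page;
  std::uint8_t* volatile Buffer = nullptr;

  if (sigsetjmp(RestartPoint, 1) != 0) {
    // We overran the buffer: drop it and retry with twice the space.
    munmap(Buffer, BufferSize + Page);
    BufferSize = BufferSize * 2;
  }

  void* Map = mmap(nullptr, BufferSize + Page, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
  assert(Map != MAP_FAILED);
  Buffer = static_cast<std::uint8_t*>(Map);
  mprotect(Buffer + BufferSize, Page, PROT_NONE); // trailing guard page
  GuardBegin = reinterpret_cast<std::uintptr_t>(Buffer + BufferSize);
  GuardSize = Page;

  for (std::size_t i = 0; i < Count; ++i) {
    Buffer[i] = 0x90; // unchecked "emit"; faults once it reaches the guard
  }

  const std::size_t Result = BufferSize;
  munmap(Buffer, BufferSize + Page);
  sigaction(SIGSEGV, &Old, nullptr);
  return Result;
}
```

The appeal of this design is that the common path pays nothing for bounds checking; the cost is only incurred in the rare case where the size estimate was too small.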

### Initial code caching features landing

There's an absolute ton of work going into this and it's not ready for users yet, but it would be remiss not to call out all the effort
on this front.
This month we landed initial support for "code maps" and offline "code cache" generation. There is not yet any way for a user to actually utilize
these code maps and caches, but these are required steps toward the transparent code caching that we are expecting to have. Watch for the
coming months as we finish fleshing out this feature and get it fully wired up.

### Fixes APICID count

This is a bit of a weird feature that we had accidentally missed. When reading CPUID, processes get what is called an APIC ID, which is essentially
just a core index. Some applications use this ID to determine how many unique CPU cores are available on the system. We were
accidentally always returning zero, which caused some applications to think the system had only one CPU. With this fixed, the FPGA software in
which this was detected now spawns the correct number of worker threads for the cores in the system. This of course improves its synthesis time
dramatically, since it scales well with the number of cores in the system.
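
A rough sketch of how an application might derive a core count this way makes the failure mode obvious. The function and the `ReadApicId` callback are hypothetical stand-ins (a real application would pin itself to each CPU and execute CPUID); this is not taken from the affected software.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <set>

// Hypothetical core-count probe: visit every logical CPU, read the APIC ID
// reported there, and count how many distinct IDs were seen.
std::size_t CountCoresByApicId(unsigned NumCpus,
                               const std::function<std::uint32_t(unsigned)>& ReadApicId) {
  assert(ReadApicId); // the callback stands in for "pin to CPU, run CPUID"

  std::set<std::uint32_t> UniqueIds;
  for (unsigned Cpu = 0; Cpu < NumCpus; ++Cpu) {
    UniqueIds.insert(ReadApicId(Cpu));
  }
  return UniqueIds.size();
}
```

If every CPU reports an APIC ID of zero, the set collapses to a single entry and the application sizes its thread pool for one core, no matter how many cores the machine really has.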

### Disable io_uring syscalls

Our good friends over at felix86 alerted us to an issue with io_uring causing infinite loops in node.js and libuv. Upon further
investigation we determined that there is an ABI break in io_uring between x86 and Arm64 that we previously didn't know about. This comes down to
how the user submission queues in io_uring can embed epoll_event structures, which have different layouts on the two architectures.
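
The layout difference can be made concrete with two mirror structs. In the Linux UAPI headers `epoll_event` is declared packed on x86-64 only, so the 64-bit data field sits at offset 4 there and at offset 8 on AArch64. The struct and function names below are hypothetical, and the `alignas(8)` is only there so the sketch yields the AArch64 layout regardless of the machine it is compiled on.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Layout an x86-64 guest expects: packed, 12 bytes, data at offset 4.
struct __attribute__((packed)) GuestEpollEvent {
  std::uint32_t events;
  std::uint64_t data;
};

// Layout on an AArch64 host: natural alignment, 16 bytes, data at offset 8.
struct HostEpollEvent {
  std::uint32_t events;
  alignas(8) std::uint64_t data;
};

static_assert(sizeof(GuestEpollEvent) == 12 && offsetof(GuestEpollEvent, data) == 4);
static_assert(sizeof(HostEpollEvent) == 16 && offsetof(HostEpollEvent, data) == 8);

// Field-by-field conversion, the kind of fixup that works when the emulator
// gets to copy the structs itself on each syscall.
void CopyEventsToHost(const GuestEpollEvent* Guest, HostEpollEvent* Host, std::size_t Count) {
  assert(Guest != nullptr && Host != nullptr);
  for (std::size_t i = 0; i < Count; ++i) {
    Host[i].events = Guest[i].events;
    Host[i].data = Guest[i].data; // by-value read handles the unaligned field
  }
}
```

A copy like this is straightforward when the data passes through a syscall argument, but io_uring queues are memory shared directly between the application and the kernel, so there is no point at which such a conversion could be slotted in.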

Because we can't safely rewrite the queue data to handle this layout difference, we have determined that the only course of action is to disable
the syscalls. Luckily, most games don't rely on this syscall interface, or applications have a legacy fallback for when it is unsupported. In that
vein, node.js now works again.

### FEAT_LRCPC2 performance errata

This month we found out that a large number of Cortex and Neoverse CPU cores have an erratum that only affects the instructions added in FEAT_LRCPC2.
We have disabled this extension on the affected CPU cores, which can give a reasonable performance improvement in games that were bound by TSO emulation.

### JIT and emulation bug fixes

As usual, there were a bunch of bug fixes in both our JIT and Linux syscall emulation this month, but this month's report is already running long,
so if you're interested, take a peek at our pull requests to find out more.

Raw Changes

  • Async

    • Adapt precondition checks when receiving FDs (3b83bdd)
    • Move file descriptor checks from read_some() to read() (da0668c)
  • CMake

    • Disable libstdc++'s debug mode when compiling thunkgen (53db3ad)
  • CPUID

    • Fixes regression from #5033 (a27c4b3)
    • Fixes APICID for processor count calculation. (0427d48)
  • CodeCache

    • Implement offline compiler for cache generation (d0e47f9)
    • Introduce code maps (32f1dcd)
  • CodeEmitter

    • Removes a few spurious asserts (de1d37e)
  • Config

    • Document the new FEX_APP_CACHE_LOCATION option (a251e61)
    • Refactor value getter interface (9e9f2cc)
  • Docs

  • Externals

    • Update catch2 from 3.5.3 to 3.11.0 (f4e3e4a)
    • Update fmt to 12.1.0 (2febb52)
  • FEX

    • Moves FEX thunk callback function generation to the frontend (aaef344)

    • Implements support for JIT CodeBuffer guard page restart (b383691)

    • VDSO

  • FEXConfig

    • Fix string list handling (fbefd78)
  • FEXCore

    • Fixes JITGuardPage calculation in a threaded environment (faf74ee)

    • Remove usage of "remote atomic" xor (3c9f6c8)

    • CodeCache

      • Move spin-loop over to a WFE loop (e2f4065)
    • Common

      • Adds the ability to override HostFeatures registers by config. (06c2319)
    • Config

    • Win32

      • Move WritePriorityMutex away from SRWLock (22c3cd5)
  • FEXInterpreter

    • Fixes crash with code maps (379dc40)
  • FEXServer

    • Add support for a wait_fd (d74b5c4)
  • Github

  • HostFeatures

    • Extend LRCPC2 errata to more CPUs (92d5ba5)
    • Disable SupportsTSOImm9 for some CPUs (57e23b2)
  • IR

    • Replace hand-written operators with three-way comparison (f290d2f)
  • JIT

    • Add support for serializing relocations (6af9057)
    • Prepare FEX relocations for code caching (e4fa399)
    • Fix indirect delinker branch distance (8214ffc)
    • Handle long ADR/ADRP (5d908d9)
  • Linux

  • LinuxSyscalls

    • Fix null pointer dereference in LookupExecutableFileSection (b34df33)
    • Fixes alloca use-after-free in RecvMMsg (4afbdd9)
    • Fix incorrectly inferred base address observed in glxtest (0b52e1c)
  • LookupCache

    • Fix use of invalidated iterators (3969d0a)
  • OpcodeDispatcher

    • Simplify convoluted logic for computing call offsets (39fb266)
  • SVE256

    • Fixes AVX scalar round with insert (8bb3398)
  • Scripts

  • SteamRT4

    • Adds support for building the steam depot (bd7215d)
  • SyscallsSMCTracking

    • Workaround assert in ELF mapping (bd13c02)
  • Utils

    • WritePriorityMutex
  • Misc

    • Remap /proc/pid/cmdline with PR_SET_MM_MAP (c460cf0)
    • Set current code block in x87 pass (90c8fcf)
    • Reflect application changes to argv[0] in /proc/self/cmdline (9c72113)
    • Support detecting unsquashfs>4.7.0 decompressors (6fd471e)
    • Minor fixes (3d69029)
    • Support (downstream) kernel-side unaligned atomic handling (The rebase sequel) (d3bf87f)
    • Implement address size modifier handling in CMPSOp and SCASOp (5205ae4)
    • Introduce two-pass code invalidation model (40c2db4)
    • Fix movzx instruction syntax (aba0c57)
    • Refactoring of storing code in x87 opt. stack pass (cf37617)
    • Align code with clang-format (0be8485)
    • Use gradual memory growth (1e3c642)
    • Update xxhash to v0.8.3 (747ea0a)
  • docs

    • Update ProgrammingConcerns (53b2245)