
SIGABRT crashes when creating many engines/modules due to GLOBAL_CODE registry address reuse #12511

@zacharywhitley

Description


Summary

When a single process creates and destroys many wasmtime::Engine and wasmtime::Module instances, the assert!() statements guarding the GLOBAL_CODE registry trigger a SIGABRT after roughly 350-400 iterations. The underlying cause is the OS reusing a virtual address range for new code memory before all Arc<CodeMemory> references to the range's previous occupant have been released.

Environment

  • Wasmtime version: 41.0.1 (also reproduced on recent main)
  • Platform: macOS Darwin 24.5.0, Linux
  • Configuration: signals_based_traps(false) (required for JVM integration)

Reproduction

use wasmtime::{Config, Engine, Module};

fn main() {
    let wat = "(module (func (export \"test\") (result i32) i32.const 42))";

    for i in 0..500 {
        let mut config = Config::new();
        config.signals_based_traps(false);
        let engine = Engine::new(&config).unwrap();
        let _module = Module::new(&engine, wat).unwrap();
        // Engine and module dropped here

        if i % 100 == 0 {
            eprintln!("Iteration {}", i);
        }
    }
    println!("Completed all iterations");
}

Expected: All 500 iterations complete successfully.
Actual: Process aborts with SIGABRT around iteration 350-400.

Root Cause Analysis

The Problem

In crates/wasmtime/src/runtime/module/registry.rs:

pub fn register_code(image: &Arc<CodeMemory>, address: Range<usize>) {
    // ...
    let prev = global_code().write().insert(end, (start, image.clone()));
    assert!(prev.is_none());  // ABORTS if duplicate key
}

pub fn unregister_code(address: Range<usize>) {
    // ...
    let code = global_code().write().remove(&end);
    assert!(code.is_some());  // ABORTS if key not found
}

Why This Happens

  1. Engine A allocates code memory at virtual address range [0x1000, 0x2000)
  2. register_code registers this with key 0x1FFF (end - 1)
  3. Engine A is "dropped" but Arc<CodeMemory> references may still exist (from Module, Store, etc.)
  4. Engine B is created; the OS reuses virtual address [0x1000, 0x2000) for its code
  5. register_code tries to insert key 0x1FFF again
  6. assert!(prev.is_none()) fails → SIGABRT

The reverse ordering can also occur (both sequences are replayed in the sketch after this list):

  1. Old Arc<CodeMemory> finally deallocates, calling unregister_code
  2. But the new engine already re-registered at that address
  3. unregister_code removes the new engine's registration
  4. Later, new engine's drop calls unregister_code
  5. assert!(code.is_some()) fails → SIGABRT
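
Both interleavings can be replayed with a plain BTreeMap standing in for GLOBAL_CODE (a minimal standalone sketch; the addresses and &str labels are illustrative):

use std::collections::BTreeMap;

fn main() {
    // Stand-in for GLOBAL_CODE: keyed by the inclusive end of each code
    // range; the &str values stand in for (start, Arc<CodeMemory>).
    let mut global_code: BTreeMap<usize, &str> = BTreeMap::new();

    // Engine A registers [0x1000, 0x2000): key = 0x2000 - 1 = 0x1fff.
    assert!(global_code.insert(0x1fff, "engine A").is_none()); // fresh key, ok

    // Engine A is dropped, but a surviving Arc<CodeMemory> means its
    // unregister_code has not run yet. The OS reuses the range for engine B:
    let prev = global_code.insert(0x1fff, "engine B");
    println!("register collision, prev = {prev:?}"); // wasmtime: assert!(prev.is_none()) -> SIGABRT

    // Reverse ordering: the stale Arc finally drops and removes the key,
    // taking the live entry with it...
    let code = global_code.remove(&0x1fff);
    println!("stale unregister removed {code:?}");
    // ...so the new engine's own unregister later finds nothing.
    let code = global_code.remove(&0x1fff);
    println!("second unregister found {code:?}"); // wasmtime: assert!(code.is_some()) -> SIGABRT
}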

Why ~350-400 Iterations?

This threshold is roughly the point at which virtual address reuse becomes likely, given:

  • macOS/Linux mmap allocation patterns, which readily hand back recently freed ranges
  • Arc<CodeMemory> references that outlive their Engine (held by Module, Store, etc.), so unregistration lags behind engine teardown
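
The precondition, allocator/mmap reuse of a just-freed range, is easy to observe in isolation. A minimal sketch (heap allocation stands in for wasmtime's code-memory mappings, and reuse timing is platform-dependent, so this is only illustrative):

fn main() {
    // Allocate, record the address, free, allocate again: allocators
    // frequently hand back the same virtual range immediately.
    let first = {
        let buf = vec![0u8; 4096];
        buf.as_ptr() as usize
    }; // buf dropped here; the range returns to the allocator

    let buf = vec![0u8; 4096];
    let second = buf.as_ptr() as usize;
    println!("first = {first:#x}, second = {second:#x}, reused = {}", first == second);
}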

Proposed Fix

Make register_code and unregister_code idempotent by tracking registered addresses:

use std::collections::BTreeSet;
use std::sync::OnceLock;

// Set of currently registered end addresses, consulted to make
// register_code/unregister_code idempotent. RwLock is the same lock type
// registry.rs already uses for the GLOBAL_CODE map.
fn registered_addresses() -> &'static RwLock<BTreeSet<usize>> {
    static REGISTERED: OnceLock<RwLock<BTreeSet<usize>>> = OnceLock::new();
    REGISTERED.get_or_init(Default::default)
}

pub fn register_code(image: &Arc<CodeMemory>, address: Range<usize>) {
    if address.is_empty() {
        return;
    }
    let start = address.start;
    let end = address.end - 1;

    // Check if already registered - make operation idempotent
    {
        let mut tracked = registered_addresses().write();
        if tracked.contains(&end) {
            return; // Already registered, skip
        }
        tracked.insert(end);
    }

    global_code().write().insert(end, (start, image.clone()));
}

pub fn unregister_code(address: Range<usize>) {
    if address.is_empty() {
        return;
    }
    let end = address.end - 1;

    // Check if registered - make operation idempotent
    {
        let mut tracked = registered_addresses().write();
        if !tracked.contains(&end) {
            return; // Not registered, skip
        }
        tracked.remove(&end);
    }

    global_code().write().remove(&end);
}
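
With the guard in place, the interleaving from "Why This Happens" becomes a pair of skipped operations instead of two aborts. A standalone replay of that sequence (the BTreeSet plays the role of registered_addresses()):

use std::collections::{BTreeMap, BTreeSet};

fn main() {
    let mut global_code: BTreeMap<usize, &str> = BTreeMap::new();
    let mut tracked: BTreeSet<usize> = BTreeSet::new();
    let end = 0x1fff;

    // Engine A registers.
    tracked.insert(end);
    global_code.insert(end, "engine A");

    // Engine B registers the reused range: the key is already tracked,
    // so register_code returns early instead of tripping the assert.
    if tracked.contains(&end) {
        println!("register skipped: {end:#x} already tracked");
    }

    // Engine A's stale Arc unregisters: the key is tracked, so the
    // entry is removed exactly once.
    if tracked.remove(&end) {
        global_code.remove(&end);
    }

    // Engine B's unregister: the key is gone, so this is a no-op
    // rather than a failed assert!(code.is_some()).
    if !tracked.contains(&end) {
        println!("unregister skipped: {end:#x} not tracked");
    }
}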

Why This Fix is Safe

  1. No functionality change: The registry still correctly tracks all live code regions
  2. Minimal overhead: BTreeSet lookup is O(log n), same as the existing BTreeMap
  3. Thread-safe: Uses the same RwLock pattern as the existing code
  4. Backward compatible: No API changes

Impact

This issue affects:

  • JVM integrations (Java, Kotlin, Scala) where signals_based_traps(false) is required
  • Long-running servers that dynamically load/unload WASM modules
  • Test suites with many engine/module creation tests
  • Any application creating 350+ engines/modules in a single process

Workarounds

  1. Engine reuse: Share a singleton engine across the application (mitigates but doesn't eliminate the issue; see the sketch below)
  2. Process isolation: Run tests in separate processes (inconvenient for CI)

Neither workaround fully solves the issue for libraries that can't control caller behavior.
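
For workaround 1, a minimal sketch of the singleton pattern (shared_engine is a hypothetical helper, not part of the wasmtime API):

use std::sync::OnceLock;
use wasmtime::{Config, Engine, Module};

// Hypothetical helper: one process-wide Engine, created on first use.
// Engine handles are cheap to share across threads, so a singleton
// avoids per-iteration engine create/destroy churn.
fn shared_engine() -> &'static Engine {
    static ENGINE: OnceLock<Engine> = OnceLock::new();
    ENGINE.get_or_init(|| {
        let mut config = Config::new();
        config.signals_based_traps(false); // required for JVM integration
        Engine::new(&config).expect("failed to create engine")
    })
}

fn main() {
    let wat = "(module (func (export \"test\") (result i32) i32.const 42))";
    // All modules share the one engine; per the workaround above this
    // mitigates the problem without eliminating it.
    let _module = Module::new(shared_engine(), wat).unwrap();
}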

Related

  • Wasmtime's trap handling architecture in docs/contributing-architecture.md
  • ModuleRegistry and GLOBAL_MODULES for similar patterns

Additional Context

We discovered this issue while building Java bindings for wasmtime (wasmtime4j). Our test suite has ~860 tests, many of which create engines and modules. The suite consistently crashed around test 350-400 before we implemented this fix.

We've been running with this fix in production and all tests now pass reliably.
