SIGABRT crashes when creating many engines/modules due to GLOBAL_CODE registry address reuse
Summary
When creating and destroying many wasmtime::Engine and wasmtime::Module instances in a single process, the GLOBAL_CODE registry's assert!() statements cause SIGABRT crashes after approximately 350-400 iterations. This is caused by virtual address reuse before Arc references are fully released.
Environment
- Wasmtime version: 41.0.1 (also confirmed in recent main)
- Platform: macOS Darwin 24.5.0, Linux
- Configuration: signals_based_traps(false) (required for JVM integration)
Reproduction
use wasmtime::{Config, Engine, Module};

fn main() {
    let wat = "(module (func (export \"test\") (result i32) i32.const 42))";
    for i in 0..500 {
        let mut config = Config::new();
        config.signals_based_traps(false);
        let engine = Engine::new(&config).unwrap();
        let _module = Module::new(&engine, wat).unwrap();
        // Engine and module dropped here
        if i % 100 == 0 {
            eprintln!("Iteration {}", i);
        }
    }
    println!("Completed all iterations");
}

Expected: All 500 iterations complete successfully.
Actual: Process aborts with SIGABRT around iteration 350-400.
Root Cause Analysis
The Problem
In crates/wasmtime/src/runtime/module/registry.rs:
pub fn register_code(image: &Arc<CodeMemory>, address: Range<usize>) {
    // ...
    let prev = global_code().write().insert(end, (start, image.clone()));
    assert!(prev.is_none()); // ABORTS if duplicate key
}

pub fn unregister_code(address: Range<usize>) {
    // ...
    let code = global_code().write().remove(&end);
    assert!(code.is_some()); // ABORTS if key not found
}

Why This Happens
- Engine A allocates code memory at virtual address range [0x1000, 0x2000); register_code registers it with key 0x1FFF (end - 1)
- Engine A is "dropped", but Arc<CodeMemory> references may still exist (from Module, Store, etc.)
- Engine B is created; the OS reuses virtual address range [0x1000, 0x2000) for its code
- register_code tries to insert key 0x1FFF again; assert!(prev.is_none()) fails → SIGABRT
The reverse can also happen:
- The old Arc<CodeMemory> finally deallocates, calling unregister_code
- But the new engine has already re-registered at that address, so unregister_code removes the new engine's registration
- Later, the new engine's drop calls unregister_code; assert!(code.is_some()) fails → SIGABRT
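To make the interleaving concrete, here is a standalone simulation (plain std BTreeMap and helper names of our own, not Wasmtime's actual registry code) of the forward ordering; the reverse ordering trips the assert in unregister_code in the same way:

use std::collections::BTreeMap;
use std::ops::Range;

// Simulated registry keyed by `end - 1`, mirroring how GLOBAL_CODE is keyed.
fn register(map: &mut BTreeMap<usize, usize>, r: Range<usize>) {
    let prev = map.insert(r.end - 1, r.start);
    // In wasmtime this is assert!(prev.is_none()), which aborts the process.
    assert!(prev.is_none(), "address range already registered");
}

fn main() {
    let mut registry = BTreeMap::new();

    // Engine A registers [0x1000, 0x2000). The Engine is later dropped, but a
    // lingering Arc<CodeMemory> means unregister_code has not run yet.
    register(&mut registry, 0x1000..0x2000);

    // Engine B is created and the OS hands back the same virtual range, so the
    // second insert finds the stale key and the assert fires.
    register(&mut registry, 0x1000..0x2000);
}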
Why ~350-400 Iterations?
This threshold corresponds to when virtual address reuse becomes statistically likely given:
- macOS/Linux mmap allocation patterns
- Accumulated Arc<CodeMemory> references that outlive their Engine (held by Module, Store, etc.), so the corresponding unregister_code calls lag behind engine teardown
Proposed Fix
Make register_code and unregister_code idempotent by tracking registered addresses:
fn registered_addresses() -> &'static RwLock<BTreeSet<usize>> {
    static REGISTERED: OnceLock<RwLock<BTreeSet<usize>>> = OnceLock::new();
    REGISTERED.get_or_init(Default::default)
}

pub fn register_code(image: &Arc<CodeMemory>, address: Range<usize>) {
    if address.is_empty() {
        return;
    }
    let start = address.start;
    let end = address.end - 1;
    // Check if already registered - make operation idempotent
    {
        let mut tracked = registered_addresses().write();
        if tracked.contains(&end) {
            return; // Already registered, skip
        }
        tracked.insert(end);
    }
    global_code().write().insert(end, (start, image.clone()));
}

pub fn unregister_code(address: Range<usize>) {
    if address.is_empty() {
        return;
    }
    let end = address.end - 1;
    // Check if registered - make operation idempotent
    {
        let mut tracked = registered_addresses().write();
        if !tracked.contains(&end) {
            return; // Not registered, skip
        }
        tracked.remove(&end);
    }
    global_code().write().remove(&end);
}

Why This Fix is Safe
- No functionality change: The registry still correctly tracks all live code regions
- Minimal overhead: BTreeSet lookup is O(log n), same as the existing BTreeMap
- Thread-safe: Uses the same RwLock pattern as the existing code
- Backward compatible: No API changes
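As a sanity check of the idempotency claim, here is a standalone model of the guarded registry (plain std types and hypothetical names, not the wasmtime-internal RwLock or CodeMemory) showing that duplicate register/unregister calls become no-ops instead of aborting:

use std::collections::{BTreeMap, BTreeSet};

// Standalone model: `tracked` plays the role of registered_addresses(),
// `code` plays the role of global_code().
struct Registry {
    tracked: BTreeSet<usize>,
    code: BTreeMap<usize, usize>, // end -> start
}

impl Registry {
    fn register(&mut self, start: usize, end: usize) {
        // Idempotent: a second registration of the same end key is ignored.
        if !self.tracked.insert(end) {
            return;
        }
        self.code.insert(end, start);
    }

    fn unregister(&mut self, end: usize) {
        // Idempotent: unregistering an unknown key is ignored.
        if !self.tracked.remove(&end) {
            return;
        }
        self.code.remove(&end);
    }
}

fn main() {
    let mut r = Registry { tracked: BTreeSet::new(), code: BTreeMap::new() };
    r.register(0x1000, 0x1FFF);
    r.register(0x1000, 0x1FFF); // duplicate: no abort, existing entry kept
    r.unregister(0x1FFF);
    r.unregister(0x1FFF); // already removed: no abort
    assert!(r.code.is_empty());
    println!("no aborts");
}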
Impact
This issue affects:
- JVM integrations (Java, Kotlin, Scala) where signals_based_traps(false) is required
- Long-running servers that dynamically load/unload WASM modules
- Test suites with many engine/module creation tests
- Any application creating 350+ engines/modules in a single process
Workarounds
- Engine reuse: Share a singleton engine across the application (mitigates but doesn't eliminate); see the sketch below
- Process isolation: Run tests in separate processes (inconvenient for CI)
Neither workaround fully solves the issue for libraries that can't control caller behavior.
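For the engine-reuse workaround, here is a minimal sketch (the shared_engine helper is ours, not an API of wasmtime or wasmtime4j) of sharing one Engine process-wide:

use std::sync::OnceLock;
use wasmtime::{Config, Engine, Module};

// Process-wide Engine: reusing it avoids churning engines on every iteration.
fn shared_engine() -> &'static Engine {
    static ENGINE: OnceLock<Engine> = OnceLock::new();
    ENGINE.get_or_init(|| {
        let mut config = Config::new();
        config.signals_based_traps(false);
        Engine::new(&config).expect("engine creation")
    })
}

fn main() {
    let wat = "(module (func (export \"test\") (result i32) i32.const 42))";
    for _ in 0..500 {
        // Modules still register/unregister their own code memory as they come
        // and go, which is why this only mitigates the problem.
        let _module = Module::new(shared_engine(), wat).unwrap();
    }
    println!("Completed all iterations");
}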
Related
- Wasmtime's trap handling architecture in docs/contributing-architecture.md
- ModuleRegistry and GLOBAL_MODULES for similar patterns
Additional Context
We discovered this issue while building Java bindings for wasmtime (wasmtime4j). Our test suite has ~860 tests, many creating engines and modules. The suite consistently crashed around test 350-400 before we implemented this fix.
We've been running with this fix in production and all tests now pass reliably.