Problem
ModelManager.importModel() and the model loading pipeline read entire files into memory via file.arrayBuffer(). For 2-8 GB LLM models, this spikes JS heap memory and can cause OOM crashes on constrained devices.
Current State
- `importModel()` calls `new Uint8Array(await file.arrayBuffer())` — the full file lands in memory
- `ModelLoadContext.data: Uint8Array` forces loaders to receive the full file
- Double-buffering: JS heap copy + WASM linear memory copy
Proposed Solution
- Add a streaming interface to `ModelLoadContext`: `dataStream?: ReadableStream<Uint8Array>`
- Update `importModel()` to use `file.stream()` and pipe chunks to storage
- When `LocalFileStorage` is active, avoid the copy entirely by passing the `File` handle
- Update backend loaders to support chunked writes to their WASM FS
Impact
- High for users downloading large LLMs (2+ GB)
- Medium complexity — requires interface changes across core + backends
From PR #370 review comments (greptile + coderabbit).