Support streaming model import to avoid OOM for large (multi-GB) files #372

@sanchitmonga22

Description

Problem

ModelManager.importModel() and the model loading pipeline read entire files into memory via file.arrayBuffer(). For 2-8 GB LLM models, this spikes the JS heap and can cause OOM crashes on memory-constrained devices.

Current State

  • importModel() calls new Uint8Array(await file.arrayBuffer()) — full file in memory
  • ModelLoadContext.data: Uint8Array forces loaders to receive the full file
  • Double-buffering: one full copy on the JS heap plus a second copy in WASM linear memory
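
For reference, the buffered path described above looks roughly like this (a sketch based on the names in this issue; the surrounding ModelManager internals are assumed, not copied from the codebase):

```typescript
// Sketch of the current buffered import path. `file.arrayBuffer()`
// materializes the ENTIRE file on the JS heap at once; for a 4 GB
// model that is a ~4 GB allocation, before the backend makes a
// second copy into WASM linear memory.
interface ModelLoadContext {
  data: Uint8Array; // forces loaders to receive the whole file
}

async function importModelBuffered(file: Blob): Promise<ModelLoadContext> {
  // Full file resident on the JS heap from this point on.
  const data = new Uint8Array(await file.arrayBuffer());
  return { data };
}
```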

Proposed Solution

  1. Add streaming interface to ModelLoadContext: dataStream?: ReadableStream<Uint8Array>
  2. Update importModel() to use file.stream() and pipe chunks to storage
  3. When LocalFileStorage is active, avoid copy entirely by passing the File handle
  4. Update backend loaders to support chunked writes to their WASM FS
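
Steps 1, 2, and 4 above could be sketched as follows. `Blob.stream()` is a standard web API; everything else here (`ChunkSink`, `importModelStreaming`, the extended `ModelLoadContext`) is a hypothetical shape for illustration, not the project's actual interface:

```typescript
interface ModelLoadContext {
  data?: Uint8Array;                       // legacy buffered path (step 1 keeps it optional)
  dataStream?: ReadableStream<Uint8Array>; // proposed streaming path
}

// Hypothetical sink; a backend loader would implement this with
// chunked writes into its WASM FS (step 4).
interface ChunkSink {
  write(chunk: Uint8Array): Promise<void>;
}

async function importModelStreaming(file: Blob, sink: ChunkSink): Promise<number> {
  // file.stream() yields the file incrementally, so only one chunk
  // (typically tens of KB to a few MB) is resident at a time.
  const reader = file.stream().getReader();
  let written = 0;
  for (;;) {
    const { done, value } = await reader.read();
    if (done) break;
    await sink.write(value); // awaiting the sink applies backpressure
    written += value.byteLength;
  }
  return written;
}
```

Because each `sink.write()` is awaited before the next chunk is read, peak heap usage stays bounded by the chunk size regardless of file size, which is the point of the proposal.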

Impact

  • High for users downloading large LLMs (2+ GB)
  • Medium complexity — requires interface changes across core + backends

From PR #370 review comments (greptile + coderabbit).
