## Description

Loading a simple quantized GGUF model with CUDA fails with:

```
Error: DriverError(CUDA_ERROR_OUT_OF_MEMORY, "out of memory")
```

Full log: log.txt
## Environment

When running the sample below, less than 1 GiB of VRAM was occupied.
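For anyone reproducing the measurement: a minimal sketch for reading VRAM usage from Rust, assuming the nvml-wrapper crate (not a dependency of this project; `nvidia-smi` reports the same figures):

```rust
// Sketch: query GPU 0 memory usage via NVML. The nvml-wrapper crate is an
// assumption for illustration, not part of the project in this report.
use nvml_wrapper::Nvml;

fn main() -> Result<(), nvml_wrapper::error::NvmlError> {
    let nvml = Nvml::init()?;
    let device = nvml.device_by_index(0)?;
    let mem = device.memory_info()?; // free / total / used, in bytes
    println!(
        "GPU 0: {} MiB used / {} MiB total",
        mem.used / (1024 * 1024),
        mem.total / (1024 * 1024)
    );
    Ok(())
}
```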
## Sample code

```rust
let model = GgufModelBuilder::new(
    "bartowski/Meta-Llama-3.1-8B-Instruct-GGUF",
    vec!["Meta-Llama-3.1-8B-Instruct-Q4_K_M.gguf"],
)
.build()
.await?;
```
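For a self-contained reproduction, the snippet can be wrapped as below. This is a sketch assuming the mistralrs builder API plus tokio and anyhow; the exact dependency set is only in the attached Cargo.toml.txt:

```rust
// Minimal reproduction harness (sketch; the mistralrs, tokio, and anyhow
// dependencies are assumptions, see the attached Cargo.toml.txt for the
// versions actually used in this report).
use anyhow::Result;
use mistralrs::GgufModelBuilder;

#[tokio::main]
async fn main() -> Result<()> {
    // Per the report, the error is raised during build(), i.e. while
    // loading the model, before any inference request is sent.
    let _model = GgufModelBuilder::new(
        "bartowski/Meta-Llama-3.1-8B-Instruct-GGUF",
        vec!["Meta-Llama-3.1-8B-Instruct-Q4_K_M.gguf"],
    )
    .build()
    .await?;
    Ok(())
}
```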
## Versions used

Cargo.toml.txt