OS
Linux
GPU Library
CUDA 12.x
Python version
3.11
Describe the bug
When using Pixtral 12B VLM at 8.0bpw (quantized by turboderp: https://huggingface.co/turboderp/pixtral-12b-exl2/tree/8.0bpw), the main process slowly leaks memory under a sustained stream of vision requests.
At one point it had leaked roughly 48 GiB of memory before I stopped it manually.
Reproduction steps
I run my image captioning script with many parallel requests (I tried 32): https://github.com/NeoChen1024/scripts/blob/master/llm-image-captioning.py
Memory usage then grows steadily, and the decoding stage slows down: text-generation throughput (t/s) and average GPU utilization both decline over time.
Enabling or disabling uvloop and the CUDA malloc backend makes no difference.
Expected behavior
Memory usage stays bounded and generation speed remains constant.
Logs
No response
Additional context
No response
Acknowledgements
I have looked for similar issues before submitting this one.
I have read the disclaimer, and this issue is related to a code bug. If I have a question, I will use the Discord server.
I understand that the developers have lives and my issue will be answered when possible.
I understand the developers of this program are human, and I will ask my questions politely.