
CUDA memory increasing and process freeze [Performance] #22872

Open
kkluonaitis opened this issue Nov 18, 2024 · 0 comments
Labels
ep:CUDA (issues related to the CUDA execution provider), performance (issues related to performance regressions)

Comments

kkluonaitis commented Nov 18, 2024

Describe the issue

In production I run a long-t5 model for data processing and tried onnxruntime-gpu 1.19.0. I run 3 processes on the same instance, sharing GPU resources, but all processes effectively freeze after a gradual increase in GPU memory usage. In nvidia-smi I could see the processes holding some GPU memory (not all of it), but the application logs just stopped. Rolling back onnxruntime to 1.18.0 works fine, and current dependencies do not allow upgrading to 1.20.0. I know that sharing a GPU between processes may not be best practice, but it is cost-efficient and worked until now.

Any ideas what could be eating up the memory?
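One commonly suggested mitigation for multi-process GPU sharing is capping each process's CUDA EP memory arena via session provider options. The sketch below is a hypothetical configuration, not the reporter's actual code: the 4 GiB cap and model path are illustrative assumptions.

```python
# Hypothetical sketch: capping the CUDA execution provider's memory arena
# so co-located processes cannot each grow toward the full GPU.
GIB = 1024 ** 3

cuda_options = {
    "device_id": 0,
    "gpu_mem_limit": 4 * GIB,                     # assumed cap; tune per process
    "arena_extend_strategy": "kSameAsRequested",  # grow by exact request, not powers of two
}
providers = [("CUDAExecutionProvider", cuda_options), "CPUExecutionProvider"]

# With onnxruntime-gpu installed, the list is passed to the session, e.g.:
# import onnxruntime as ort
# sess = ort.InferenceSession("mlong-t5-tglobal-large.onnx", providers=providers)
print(providers[0][1]["gpu_mem_limit"] // GIB)  # -> 4
```

Whether this helps depends on whether the growth comes from the arena itself or from a leak introduced between 1.18.0 and 1.19.0.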

To reproduce

The model I use:
https://huggingface.co/agemagician/mlong-t5-tglobal-large

Urgency

No response

Platform

Linux

OS Version

Amazon Linux AMI 2.0.20230606 x86_64

ONNX Runtime Installation

Built from Source

ONNX Runtime Version or Commit ID

1.19.0

ONNX Runtime API

Python

Architecture

X64

Execution Provider

CUDA

Execution Provider Library Version

CUDA 11.8

Model File

No response

Is this a quantized model?

No

@kkluonaitis kkluonaitis added the performance issues related to performance regressions label Nov 18, 2024
@github-actions github-actions bot added the ep:CUDA issues related to the CUDA execution provider label Nov 18, 2024