Inference is very slow when a 13B model is loaded across two GPUs on a single machine:
1. As the prompt gets longer, inference time grows noticeably.
2. One of the two GPUs shows very low utilization while the other sits at 100%.

+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 535.146.02             Driver Version: 535.146.02   CUDA Version: 12.2     |
|-----------------------------------------+----------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |         Memory-Usage | GPU-Util  Compute M. |
|                                         |                      |               MIG M. |
|=========================================+======================+======================|
|   0  NVIDIA GeForce RTX 4090        Off | 00000000:01:00.0 Off |                  Off |
|  0%   48C    P2              72W / 450W |  18853MiB / 24564MiB |      3%      Default |
|                                         |                      |                  N/A |
+-----------------------------------------+----------------------+----------------------+
|   1  NVIDIA GeForce RTX 4090        Off | 00000000:03:00.0 Off |                  Off |
|  0%   49C    P2              87W / 450W |  16495MiB / 24564MiB |    100%      Default |
|                                         |                      |                  N/A |
+-----------------------------------------+----------------------+----------------------+
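@LanseerWang Hello. Plain multi-GPU loading is much slower at inference than loading on a single GPU. Consider using the vLLM library to speed things up: vLLM supports multi-GPU tensor-parallel deployment and will be much faster.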
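For reference, here is a minimal sketch of what tensor-parallel inference with vLLM might look like on a two-GPU machine. The model path `path/to/13b-model` is a placeholder, and the helper names `build_engine_kwargs` and `generate`, the memory fraction, and the sampling values are assumptions for illustration, not settings taken from this project. Check them against your vLLM version and model.

```python
"""Sketch: two-GPU tensor-parallel inference with vLLM (assumed setup)."""


def build_engine_kwargs(model_path, num_gpus=2):
    """Collect the engine arguments in one place (hypothetical helper)."""
    return {
        "model": model_path,               # local path or hub ID of the model
        "tensor_parallel_size": num_gpus,  # shard each layer across the GPUs
        "gpu_memory_utilization": 0.90,    # assumed fraction of VRAM vLLM may use
        "trust_remote_code": True,         # assumption: model ships custom code
    }


def generate(prompts, model_path, num_gpus=2):
    """Run a batch of prompts and return the generated texts."""
    # Imported lazily so the helper above works without vLLM installed.
    from vllm import LLM, SamplingParams

    llm = LLM(**build_engine_kwargs(model_path, num_gpus))
    params = SamplingParams(temperature=0.7, top_p=0.9, max_tokens=256)
    outputs = llm.generate(prompts, params)
    # Each RequestOutput holds one or more completions; take the first.
    return [out.outputs[0].text for out in outputs]


if __name__ == "__main__":
    # "path/to/13b-model" is a placeholder, not a real path.
    for text in generate(["Hello, please introduce yourself."], "path/to/13b-model"):
        print(text)
```

With `tensor_parallel_size=2`, both cards work on every layer at the same time, so utilization should be roughly balanced instead of one GPU idling while the other runs at 100%.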