Slower than expected GPU inference in deployment/libtorch example #273

Comments
Hi @mattpopovich, it seems that PyTorch 1.9 requires two warm-up runs on the GPU, so the first two measured times should be ignored. Could you test it again, or upgrade your PyTorch to 1.10.1? (See pytorch/pytorch#58801 for more details.) The TensorRT C++ part is still under development: we have implemented the core of the model converting, and several pieces remain to be implemented.
All contributions are welcome here!
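To illustrate what "ignoring the first two calculation times" looks like, here is a minimal LibTorch sketch. It is not the deployment/libtorch example itself: the model path, the 640x640 shape, and the single-tensor input are placeholders (the real yolort TorchScript model may expect a list of image tensors rather than one batched tensor), so adapt it to the example code.

```cpp
#include <torch/script.h>
#include <torch/torch.h>

#include <chrono>
#include <iostream>

int main() {
  // Load the TorchScript model directly onto the GPU (placeholder path).
  torch::Device device(torch::kCUDA);
  torch::jit::Module module = torch::jit::load("yolort.torchscript.pt", device);
  module.eval();

  torch::NoGradGuard no_grad;
  auto input = torch::rand({1, 3, 640, 640}, device);

  // Warm-up: the first forward passes include one-time setup work
  // (CUDA context creation, cuDNN algorithm selection, JIT optimization),
  // so their times are not representative. Run them untimed.
  for (int i = 0; i < 2; ++i) {
    module.forward({input});
  }
  torch::cuda::synchronize();

  // Time a steady-state forward pass, synchronizing before stopping the
  // clock so the asynchronous GPU work has actually finished.
  auto start = std::chrono::steady_clock::now();
  module.forward({input});
  torch::cuda::synchronize();
  auto stop = std::chrono::steady_clock::now();

  std::cout << "Inference time: "
            << std::chrono::duration<double, std::milli>(stop - start).count()
            << " ms" << std::endl;
  return 0;
}
```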
Great find! I ran way too many tests on my machine (below) with PyTorch, TorchVision, and OpenCV built from source. (Originally I was seeing slow inference no matter how many times I "warmed up" the model, but I have since been unable to reproduce that.) It looks like three warm-ups are necessary for all recent versions of PyTorch; a sketch of the per-iteration timing loop is included after the configuration dumps below.

CUDA 11.4.3, PyTorch 1.10.1, TorchVision 0.11.2, OpenCV 4.5.5:
# python3 -m torch.utils.collect_env
Collecting environment information...
PyTorch version: 1.10.0a0+git302ee7b
Is debug build: False
CUDA used to build PyTorch: 11.4
ROCM used to build PyTorch: N/A
OS: Ubuntu 20.04.3 LTS (x86_64)
GCC version: (Ubuntu 9.3.0-17ubuntu1~20.04) 9.3.0
Clang version: Could not collect
CMake version: version 3.21.1
Libc version: glibc-2.31
Python version: 3.8.10 (default, Nov 26 2021, 20:14:08) [GCC 9.3.0] (64-bit runtime)
Python platform: Linux-5.4.0-89-generic-x86_64-with-glibc2.29
Is CUDA available: True
CUDA runtime version: 11.4.152
GPU models and configuration:
GPU 0: Tesla V100-SXM2-32GB
GPU 1: Tesla V100-SXM2-32GB
GPU 2: Tesla V100-SXM2-32GB
GPU 3: Tesla V100-SXM2-32GB
Nvidia driver version: 470.74
cuDNN version: Probably one of the following:
/usr/lib/x86_64-linux-gnu/libcudnn.so.8.1.0
/usr/lib/x86_64-linux-gnu/libcudnn_adv_infer.so.8.1.0
/usr/lib/x86_64-linux-gnu/libcudnn_adv_train.so.8.1.0
/usr/lib/x86_64-linux-gnu/libcudnn_cnn_infer.so.8.1.0
/usr/lib/x86_64-linux-gnu/libcudnn_cnn_train.so.8.1.0
/usr/lib/x86_64-linux-gnu/libcudnn_ops_infer.so.8.1.0
/usr/lib/x86_64-linux-gnu/libcudnn_ops_train.so.8.1.0
HIP runtime version: N/A
MIOpen runtime version: N/A
Versions of relevant libraries:
[pip3] mypy-extensions==0.4.3
[pip3] numpy==1.22.0
[pip3] pytorch-lightning==1.5.8
[pip3] torch==1.10.0a0+git302ee7b
[pip3] torchmetrics==0.6.2
[pip3] torchvision==0.11.0a0+e7ec7e2
[conda] Could not collect
CUDA 11.4.2, PyTorch 1.10.1, TorchVision 0.11.2, OpenCV 4.5.5:
# python3 -m torch.utils.collect_env
Collecting environment information...
PyTorch version: 1.10.0a0+git302ee7b
Is debug build: False
CUDA used to build PyTorch: 11.4
ROCM used to build PyTorch: N/A
OS: Ubuntu 20.04.3 LTS (x86_64)
GCC version: (Ubuntu 9.3.0-17ubuntu1~20.04) 9.3.0
Clang version: Could not collect
CMake version: version 3.21.1
Libc version: glibc-2.31
Python version: 3.8.10 (default, Sep 28 2021, 16:10:42) [GCC 9.3.0] (64-bit runtime)
Python platform: Linux-5.4.0-89-generic-x86_64-with-glibc2.29
Is CUDA available: True
CUDA runtime version: 11.4.120
GPU models and configuration:
GPU 0: Tesla V100-SXM2-32GB
GPU 1: Tesla V100-SXM2-32GB
GPU 2: Tesla V100-SXM2-32GB
GPU 3: Tesla V100-SXM2-32GB
Nvidia driver version: 470.74
cuDNN version: Probably one of the following:
/usr/lib/x86_64-linux-gnu/libcudnn.so.8.1.0
/usr/lib/x86_64-linux-gnu/libcudnn_adv_infer.so.8.1.0
/usr/lib/x86_64-linux-gnu/libcudnn_adv_train.so.8.1.0
/usr/lib/x86_64-linux-gnu/libcudnn_cnn_infer.so.8.1.0
/usr/lib/x86_64-linux-gnu/libcudnn_cnn_train.so.8.1.0
/usr/lib/x86_64-linux-gnu/libcudnn_ops_infer.so.8.1.0
/usr/lib/x86_64-linux-gnu/libcudnn_ops_train.so.8.1.0
HIP runtime version: N/A
MIOpen runtime version: N/A
Versions of relevant libraries:
[pip3] mypy-extensions==0.4.3
[pip3] numpy==1.21.4
[pip3] pytorch-lightning==1.5.8
[pip3] torch==1.10.0a0+git302ee7b
[pip3] torchmetrics==0.6.2
[pip3] torchvision==0.11.0a0+e7ec7e2
[conda] Could not collect
CUDA 11.4.2, PyTorch 1.10.0, TorchVision 0.11.1, OpenCV 4.5.4:
# python3 -m torch.utils.collect_env
Collecting environment information...
PyTorch version: 1.10.0a0+git36449ea
Is debug build: False
CUDA used to build PyTorch: 11.4
ROCM used to build PyTorch: N/A
OS: Ubuntu 20.04.3 LTS (x86_64)
GCC version: (Ubuntu 9.3.0-17ubuntu1~20.04) 9.3.0
Clang version: Could not collect
CMake version: version 3.21.1
Libc version: glibc-2.31
Python version: 3.8.10 (default, Sep 28 2021, 16:10:42) [GCC 9.3.0] (64-bit runtime)
Python platform: Linux-5.4.0-89-generic-x86_64-with-glibc2.29
Is CUDA available: True
CUDA runtime version: 11.4.120
GPU models and configuration:
GPU 0: Tesla V100-SXM2-32GB
GPU 1: Tesla V100-SXM2-32GB
GPU 2: Tesla V100-SXM2-32GB
GPU 3: Tesla V100-SXM2-32GB
Nvidia driver version: 470.74
cuDNN version: Probably one of the following:
/usr/lib/x86_64-linux-gnu/libcudnn.so.8.1.0
/usr/lib/x86_64-linux-gnu/libcudnn_adv_infer.so.8.1.0
/usr/lib/x86_64-linux-gnu/libcudnn_adv_train.so.8.1.0
/usr/lib/x86_64-linux-gnu/libcudnn_cnn_infer.so.8.1.0
/usr/lib/x86_64-linux-gnu/libcudnn_cnn_train.so.8.1.0
/usr/lib/x86_64-linux-gnu/libcudnn_ops_infer.so.8.1.0
/usr/lib/x86_64-linux-gnu/libcudnn_ops_train.so.8.1.0
HIP runtime version: N/A
MIOpen runtime version: N/A
Versions of relevant libraries:
[pip3] numpy==1.21.4
[pip3] pytorch-lightning==1.5.4
[pip3] torch==1.10.0a0+git36449ea
[pip3] torchmetrics==0.6.0
[pip3] torchvision==0.11.0a0+fa347eb
[conda] Could not collect
CUDA 11.4.1, PyTorch 1.10.1, TorchVision 0.11.2, OpenCV 4.5.5:
# python3 -m torch.utils.collect_env
Collecting environment information...
PyTorch version: 1.10.0a0+git302ee7b
Is debug build: False
CUDA used to build PyTorch: 11.4
ROCM used to build PyTorch: N/A
OS: Ubuntu 20.04.3 LTS (x86_64)
GCC version: (Ubuntu 9.3.0-17ubuntu1~20.04) 9.3.0
Clang version: Could not collect
CMake version: version 3.21.1
Libc version: glibc-2.31
Python version: 3.8.10 (default, Sep 28 2021, 16:10:42) [GCC 9.3.0] (64-bit runtime)
Python platform: Linux-5.4.0-89-generic-x86_64-with-glibc2.29
Is CUDA available: True
CUDA runtime version: 11.4.120
GPU models and configuration:
GPU 0: Tesla V100-SXM2-32GB
GPU 1: Tesla V100-SXM2-32GB
GPU 2: Tesla V100-SXM2-32GB
GPU 3: Tesla V100-SXM2-32GB
Nvidia driver version: 470.74
cuDNN version: Probably one of the following:
/usr/lib/x86_64-linux-gnu/libcudnn.so.8.1.0
/usr/lib/x86_64-linux-gnu/libcudnn_adv_infer.so.8.1.0
/usr/lib/x86_64-linux-gnu/libcudnn_adv_train.so.8.1.0
/usr/lib/x86_64-linux-gnu/libcudnn_cnn_infer.so.8.1.0
/usr/lib/x86_64-linux-gnu/libcudnn_cnn_train.so.8.1.0
/usr/lib/x86_64-linux-gnu/libcudnn_ops_infer.so.8.1.0
/usr/lib/x86_64-linux-gnu/libcudnn_ops_train.so.8.1.0
HIP runtime version: N/A
MIOpen runtime version: N/A
Versions of relevant libraries:
[pip3] mypy-extensions==0.4.3
[pip3] numpy==1.21.4
[pip3] pytorch-lightning==1.5.8
[pip3] torch==1.10.0a0+git302ee7b
[pip3] torchmetrics==0.6.2
[pip3] torchvision==0.11.0a0+e7ec7e2
[conda] Could not collect
CUDA 11.4.1, PyTorch 1.10.0 commit 3fd9dcf, TorchVision 0.11.1, OpenCV 4.5.4:
# python3 -m torch.utils.collect_env
Collecting environment information...
PyTorch version: 1.10.0a0+git3fd9dcf
Is debug build: False
CUDA used to build PyTorch: 11.4
ROCM used to build PyTorch: N/A
OS: Ubuntu 20.04.3 LTS (x86_64)
GCC version: (Ubuntu 9.3.0-17ubuntu1~20.04) 9.3.0
Clang version: Could not collect
CMake version: version 3.21.1
Libc version: glibc-2.31
Python version: 3.8.10 (default, Sep 28 2021, 16:10:42) [GCC 9.3.0] (64-bit runtime)
Python platform: Linux-5.4.0-89-generic-x86_64-with-glibc2.29
Is CUDA available: True
CUDA runtime version: 11.4.120
GPU models and configuration:
GPU 0: Tesla V100-SXM2-32GB
GPU 1: Tesla V100-SXM2-32GB
GPU 2: Tesla V100-SXM2-32GB
GPU 3: Tesla V100-SXM2-32GB
Nvidia driver version: 470.74
cuDNN version: Probably one of the following:
/usr/lib/x86_64-linux-gnu/libcudnn.so.8.1.0
/usr/lib/x86_64-linux-gnu/libcudnn_adv_infer.so.8.1.0
/usr/lib/x86_64-linux-gnu/libcudnn_adv_train.so.8.1.0
/usr/lib/x86_64-linux-gnu/libcudnn_cnn_infer.so.8.1.0
/usr/lib/x86_64-linux-gnu/libcudnn_cnn_train.so.8.1.0
/usr/lib/x86_64-linux-gnu/libcudnn_ops_infer.so.8.1.0
/usr/lib/x86_64-linux-gnu/libcudnn_ops_train.so.8.1.0
HIP runtime version: N/A
MIOpen runtime version: N/A
Versions of relevant libraries:
[pip3] numpy==1.21.4
[pip3] pytorch-lightning==1.5.4
[pip3] torch==1.10.0a0+git3fd9dcf
[pip3] torchmetrics==0.6.1
[pip3] torchvision==0.11.0a0+fa347eb
[conda] Could not collect
CUDA 11.4.0, PyTorch 1.10.0, TorchVision 0.11.1, OpenCV 4.5.4:
# python3 -m torch.utils.collect_env
Collecting environment information...
PyTorch version: 1.10.0a0+git36449ea
Is debug build: False
CUDA used to build PyTorch: 11.4
ROCM used to build PyTorch: N/A
OS: Ubuntu 20.04.3 LTS (x86_64)
GCC version: (Ubuntu 9.3.0-17ubuntu1~20.04) 9.3.0
Clang version: Could not collect
CMake version: version 3.21.1
Libc version: glibc-2.31
Python version: 3.8.10 (default, Sep 28 2021, 16:10:42) [GCC 9.3.0] (64-bit runtime)
Python platform: Linux-5.4.0-89-generic-x86_64-with-glibc2.29
Is CUDA available: True
CUDA runtime version: 11.4.120
GPU models and configuration:
GPU 0: Tesla V100-SXM2-32GB
GPU 1: Tesla V100-SXM2-32GB
GPU 2: Tesla V100-SXM2-32GB
GPU 3: Tesla V100-SXM2-32GB
Nvidia driver version: 470.74
cuDNN version: Probably one of the following:
/usr/lib/x86_64-linux-gnu/libcudnn.so.8.1.0
/usr/lib/x86_64-linux-gnu/libcudnn_adv_infer.so.8.1.0
/usr/lib/x86_64-linux-gnu/libcudnn_adv_train.so.8.1.0
/usr/lib/x86_64-linux-gnu/libcudnn_cnn_infer.so.8.1.0
/usr/lib/x86_64-linux-gnu/libcudnn_cnn_train.so.8.1.0
/usr/lib/x86_64-linux-gnu/libcudnn_ops_infer.so.8.1.0
/usr/lib/x86_64-linux-gnu/libcudnn_ops_train.so.8.1.0
HIP runtime version: N/A
MIOpen runtime version: N/A
Versions of relevant libraries:
[pip3] numpy==1.21.4
[pip3] pytorch-lightning==1.5.4
[pip3] torch==1.10.0a0+git36449ea
[pip3] torchmetrics==0.6.0
[pip3] torchvision==0.11.0a0+fa347eb
[conda] Could not collect
CUDA 11.3.1, PyTorch 1.9.1, TorchVision 0.10.1, OpenCV 4.5.4:
python3 -m torch.utils.collect_env
Collecting environment information...
PyTorch version: 1.9.0a0+gitdfbd030
Is debug build: False
CUDA used to build PyTorch: 11.3
ROCM used to build PyTorch: N/A
OS: Ubuntu 20.04.3 LTS (x86_64)
GCC version: (Ubuntu 9.3.0-17ubuntu1~20.04) 9.3.0
Clang version: Could not collect
CMake version: version 3.21.1
Libc version: glibc-2.31
Python version: 3.8 (64-bit runtime)
Python platform: Linux-5.4.0-89-generic-x86_64-with-glibc2.29
Is CUDA available: True
CUDA runtime version: 11.3.109
GPU models and configuration:
GPU 0: Tesla V100-SXM2-32GB
GPU 1: Tesla V100-SXM2-32GB
GPU 2: Tesla V100-SXM2-32GB
GPU 3: Tesla V100-SXM2-32GB
Nvidia driver version: 470.74
cuDNN version: Probably one of the following:
/usr/lib/x86_64-linux-gnu/libcudnn.so.8.1.0
/usr/lib/x86_64-linux-gnu/libcudnn_adv_infer.so.8.1.0
/usr/lib/x86_64-linux-gnu/libcudnn_adv_train.so.8.1.0
/usr/lib/x86_64-linux-gnu/libcudnn_cnn_infer.so.8.1.0
/usr/lib/x86_64-linux-gnu/libcudnn_cnn_train.so.8.1.0
/usr/lib/x86_64-linux-gnu/libcudnn_ops_infer.so.8.1.0
/usr/lib/x86_64-linux-gnu/libcudnn_ops_train.so.8.1.0
HIP runtime version: N/A
MIOpen runtime version: N/A
Versions of relevant libraries:
[pip3] numpy==1.21.4
[pip3] pytorch-lightning==1.5.4
[pip3] torch==1.9.0a0+gitdfbd030
[pip3] torchmetrics==0.6.0
[pip3] torchvision==0.10.0a0+ca1a620
[conda] Could not collect
CUDA 11.2.0, PyTorch 1.9.0, TorchVision 0.10.0, OpenCV 4.5.2:
python3 -m torch.utils.collect_env
Collecting environment information...
PyTorch version: 1.9.0a0+gitd69c22d
Is debug build: False
CUDA used to build PyTorch: 11.2
ROCM used to build PyTorch: N/A
OS: Ubuntu 20.04.1 LTS (x86_64)
GCC version: (Ubuntu 9.3.0-17ubuntu1~20.04) 9.3.0
Clang version: Could not collect
CMake version: version 3.21.1
Libc version: glibc-2.31
Python version: 3.8 (64-bit runtime)
Python platform: Linux-5.4.0-89-generic-x86_64-with-glibc2.29
Is CUDA available: True
CUDA runtime version: 11.2.67
GPU models and configuration:
GPU 0: Tesla V100-SXM2-32GB
GPU 1: Tesla V100-SXM2-32GB
GPU 2: Tesla V100-SXM2-32GB
GPU 3: Tesla V100-SXM2-32GB
Nvidia driver version: 470.74
cuDNN version: Probably one of the following:
/usr/lib/x86_64-linux-gnu/libcudnn.so.8.1.0
/usr/lib/x86_64-linux-gnu/libcudnn_adv_infer.so.8.1.0
/usr/lib/x86_64-linux-gnu/libcudnn_adv_train.so.8.1.0
/usr/lib/x86_64-linux-gnu/libcudnn_cnn_infer.so.8.1.0
/usr/lib/x86_64-linux-gnu/libcudnn_cnn_train.so.8.1.0
/usr/lib/x86_64-linux-gnu/libcudnn_ops_infer.so.8.1.0
/usr/lib/x86_64-linux-gnu/libcudnn_ops_train.so.8.1.0
HIP runtime version: N/A
MIOpen runtime version: N/A
Versions of relevant libraries:
[pip3] numpy==1.21.4
[pip3] pytorch-lightning==1.5.2
[pip3] torch==1.9.0a0+gitd69c22d
[pip3] torchmetrics==0.6.0
[pip3] torchvision==0.10.0a0+300a8a4
[conda] Could not collect
Different PC, with everything pre-built (not running in Docker): CUDA 11.5, PyTorch 1.10.0, TorchVision 0.11.0, OpenCV 4.5.3:
$ python3 -m torch.utils.collect_env
Collecting environment information...
PyTorch version: 1.10.0a0+git36449ea
Is debug build: False
CUDA used to build PyTorch: 11.5
ROCM used to build PyTorch: N/A
OS: Ubuntu 18.04.6 LTS (x86_64)
GCC version: (Ubuntu 9.4.0-1ubuntu1~18.04) 9.4.0
Clang version: Could not collect
CMake version: version 3.21.3
Libc version: glibc-2.25
Python version: 3.6.9 (default, Dec 8 2021, 21:08:43) [GCC 8.4.0] (64-bit runtime)
Python platform: Linux-5.4.0-1065-azure-x86_64-with-Ubuntu-18.04-bionic
Is CUDA available: True
CUDA runtime version: 11.5.119
GPU models and configuration: GPU 0: Tesla V100-PCIE-16GB
Nvidia driver version: 495.29.05
cuDNN version: Probably one of the following:
/usr/lib/x86_64-linux-gnu/libcudnn.so.8.3.1
/usr/lib/x86_64-linux-gnu/libcudnn_adv_infer.so.8.3.1
/usr/lib/x86_64-linux-gnu/libcudnn_adv_train.so.8.3.1
/usr/lib/x86_64-linux-gnu/libcudnn_cnn_infer.so.8.3.1
/usr/lib/x86_64-linux-gnu/libcudnn_cnn_train.so.8.3.1
/usr/lib/x86_64-linux-gnu/libcudnn_ops_infer.so.8.3.1
/usr/lib/x86_64-linux-gnu/libcudnn_ops_train.so.8.3.1
HIP runtime version: N/A
MIOpen runtime version: N/A
Versions of relevant libraries:
[pip3] numpy==1.19.5
[pip3] torch==1.10.0a0+git36449ea
[pip3] torchvision==0.11.0a0+cdacbe0
[conda] Could not collect
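For anyone reproducing this, a simplified sketch of a per-iteration timing loop is below. It is not the exact deployment/libtorch example code: the model path and the single 640x640 tensor input are placeholders (the real yolort model may expect a list of image tensors). Timing every forward pass separately, with a synchronize before stopping the clock, is what makes it possible to count how many warm-up passes a given build needs.

```cpp
#include <torch/script.h>
#include <torch/torch.h>

#include <chrono>
#include <iostream>

int main() {
  torch::Device device(torch::kCUDA);
  torch::jit::Module module = torch::jit::load("yolort.torchscript.pt", device);
  module.eval();

  torch::NoGradGuard no_grad;
  auto input = torch::rand({1, 3, 640, 640}, device);

  // Time each iteration individually: the first few are expected to be slow
  // (one-time setup), and the later ones show the steady-state latency.
  for (int i = 0; i < 10; ++i) {
    torch::cuda::synchronize();
    auto start = std::chrono::steady_clock::now();
    module.forward({input});
    torch::cuda::synchronize();
    auto stop = std::chrono::steady_clock::now();
    std::cout << "iteration " << i << ": "
              << std::chrono::duration<double, std::milli>(stop - start).count()
              << " ms" << std::endl;
  }
  return 0;
}
```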
Hi @mattpopovich, thanks for the detailed experimental data you provided here! I believe this phenomenon can be explained, so I'm closing this ticket, but let us know if you have further questions.
🐛 Describe the bug
I created some yolov5-rt-stack TorchScript models by following the script here. I then followed the README instructions to build the LibTorch C++ code. Everything works as expected, except that inference on the GPU is much slower (about 7x) than on the CPU.
Can you confirm these results, or am I doing something wrong? I believe that previously (July-August 2021) I was seeing inference times in the 8-10 ms range.
v4.0:
v6.0:
Thanks again for all your help thus far. I'm going to look into deployment/tensorrt next to see what inference times I can get there.