
Slower than expected GPU inference in deployment/libtorch example #273

Closed
mattpopovich opened this issue Jan 12, 2022 · 3 comments
Labels: question (Further information is requested)

@mattpopovich (Contributor)

🐛 Describe the bug

I created some yolov5-rt-stack TorchScript models by following the script here. I then followed the README instructions to build the LibTorch C++ example. Everything works as expected, except that inference on the GPU is much slower (roughly 7x) than on the CPU.

Can you confirm these results, or am I doing something wrong? I believe that previously (in the July-August 2021 timeframe) I was seeing inference times in the 8-10 ms range.

v4.0:


root@pc:yolov5-rt-stack/deployment/libtorch/build# ./yolort_torch --input_source ../../../bus.jpg --checkpoint ../../../yolov5s-v4.0-RT-v0.5.2-YOLOv5.torchscript.pt --labelmap ../../../coco.names
Set CPU mode
Loading model
Model loaded
Run once on empty image
[W TensorImpl.h:1153] Warning: Named tensors and all their associated APIs are an experimental feature and subject to change. Please do not use them for anything important until they are released as stable. (function operator())
Pre-process takes : 18 ms
Inference takes : 106 ms
Detected labels:  0
 0
 0
 5
 0
[ CPULongType{5} ]
Detected boxes:  669.2656  391.3025  809.8663  885.2344
  54.0635  397.8318  235.9531  901.3731
 222.8834  406.8119  341.5572  854.7792
  18.6320  232.9767  810.9739  760.1169
   0.4640  502.0519   88.5140  887.0480
[ CPUFloatType{5,4} ]
Detected scores:  0.8901
 0.8733
 0.8537
 0.7234
 0.3769
[ CPUFloatType{5} ]
root@pc:yolov5-rt-stack/deployment/libtorch/build# ./yolort_torch --input_source ../../../bus.jpg --checkpoint ../../../yolov5s-v4.0-RT-v0.5.2-YOLOv5.torchscript.pt --labelmap ../../../coco.names --gpu 
Set GPU mode
Loading model
Model loaded
Run once on empty image
[W TensorImpl.h:1153] Warning: Named tensors and all their associated APIs are an experimental feature and subject to change. Please do not use them for anything important until they are released as stable. (function operator())
Pre-process takes : 21 ms
Inference takes : 748 ms
Detected labels:  0
 0
 0
 5
 0
[ CUDALongType{5} ]
Detected boxes:  669.2656  391.3025  809.8663  885.2344
  54.0635  397.8318  235.9531  901.3730
 222.8834  406.8120  341.5572  854.7791
  18.6320  232.9767  810.9739  760.1170
   0.4640  502.0522   88.5139  887.0480
[ CUDAFloatType{5,4} ]
Detected scores:  0.8901
 0.8733
 0.8537
 0.7234
 0.3769
[ CUDAFloatType{5} ]

v6.0:


root@pc:yolov5-rt-stack/deployment/libtorch/build# ./yolort_torch --input_source ../../../bus.jpg --checkpoint ../../../yolov5s-v6.0-RT-v0.5.2-YOLOv5.torchscript.pt --labelmap ../../../coco.names
Set CPU mode
Loading model
Model loaded
Run once on empty image
[W TensorImpl.h:1153] Warning: Named tensors and all their associated APIs are an experimental feature and subject to change. Please do not use them for anything important until they are released as stable. (function operator())
Pre-process takes : 15 ms
Inference takes : 95 ms
Detected labels:  0
 0
 0
 5
 0
[ CPULongType{5} ]
Detected boxes:  224.5497  402.5811  342.7194  862.6057
  51.8626  398.3438  245.3290  906.3114
 679.8232  385.5574  809.3773  883.1394
   0.1952  201.8805  812.9611  786.3345
   0.0480  558.7347   75.8148  871.5754
[ CPUFloatType{5,4} ]
Detected scores:  0.8959
 0.8846
 0.8579
 0.5181
 0.3932
[ CPUFloatType{5} ]
root@pc:yolov5-rt-stack/deployment/libtorch/build# ./yolort_torch --input_source ../../../bus.jpg --checkpoint ../../../yolov5s-v6.0-RT-v0.5.2-YOLOv5.torchscript.pt --labelmap ../../../coco.names --gpu 
Set GPU mode
Loading model
Model loaded
Run once on empty image
[W TensorImpl.h:1153] Warning: Named tensors and all their associated APIs are an experimental feature and subject to change. Please do not use them for anything important until they are released as stable. (function operator())
Pre-process takes : 28 ms
Inference takes : 746 ms
Detected labels:  0
 0
 0
 5
 0
[ CUDALongType{5} ]
Detected boxes:  224.5497  402.5810  342.7194  862.6058
  51.8626  398.3439  245.3289  906.3113
 679.8232  385.5574  809.3773  883.1393
   0.1954  201.8804  812.9608  786.3347
   0.0480  558.7346   75.8148  871.5754
[ CUDAFloatType{5,4} ]
Detected scores:  0.8959
 0.8846
 0.8579
 0.5181
 0.3932
[ CUDAFloatType{5} ]

Thanks again for all your help thus far. I'm going to look into deployment/tensorrt next to see what inference times I can get there.

Versions


# python3 -m torch.utils.collect_env
Collecting environment information...
PyTorch version: 1.9.0a0+gitd69c22d
Is debug build: False
CUDA used to build PyTorch: 11.2
ROCM used to build PyTorch: N/A

OS: Ubuntu 20.04.2 LTS (x86_64)
GCC version: (Ubuntu 9.3.0-17ubuntu1~20.04) 9.3.0
Clang version: Could not collect
CMake version: version 3.21.1
Libc version: glibc-2.31

Python version: 3.8 (64-bit runtime)
Python platform: Linux-5.4.0-92-generic-x86_64-with-glibc2.29
Is CUDA available: True
CUDA runtime version: 11.2.152
GPU models and configuration: 
GPU 0: GeForce GTX 1080
GPU 1: GeForce GTX 1080
GPU 2: GeForce GTX 1080

Nvidia driver version: 460.91.03
cuDNN version: Probably one of the following:
/usr/lib/x86_64-linux-gnu/libcudnn.so.8.1.0
/usr/lib/x86_64-linux-gnu/libcudnn_adv_infer.so.8.1.0
/usr/lib/x86_64-linux-gnu/libcudnn_adv_train.so.8.1.0
/usr/lib/x86_64-linux-gnu/libcudnn_cnn_infer.so.8.1.0
/usr/lib/x86_64-linux-gnu/libcudnn_cnn_train.so.8.1.0
/usr/lib/x86_64-linux-gnu/libcudnn_ops_infer.so.8.1.0
/usr/lib/x86_64-linux-gnu/libcudnn_ops_train.so.8.1.0
HIP runtime version: N/A
MIOpen runtime version: N/A

Versions of relevant libraries:
[pip3] numpy==1.21.4
[pip3] pytorch-lightning==1.5.8
[pip3] torch==1.9.0a0+gitd69c22d
[pip3] torchmetrics==0.6.2
[pip3] torchvision==0.10.0a0+300a8a4
[conda] Could not collect

@zhiqwang (Owner)

zhiqwang commented Jan 12, 2022

Hi @mattpopovich ,

It seems that PyTorch 1.9 requires two warm-ups on the GPU, so we need to ignore the first two timing measurements. Could you test it again, or upgrade your PyTorch to 1.10.1? (See pytorch/pytorch#58801 for more details.)

The TensorRT C++ example is still under development. We have implemented the core parts of the model conversion, but several pieces still need work:

  1. We use the YOLO.load_from_yolov5() strategy for TensorRT, so we need to implement the pre-processing in the C++ example; the existing version is a bit rough.
  2. We use a static-shape mechanism when converting the model to a TensorRT engine; we need to add dynamic-shape support, which is very important for practical applications. See Support dynamic shape mechanism for TensorRT #266.

And all contributions are welcome here!

@zhiqwang zhiqwang added the question Further information is requested label Jan 12, 2022
@mattpopovich (Contributor, Author)

mattpopovich commented Jan 15, 2022

Great find! I ran many tests on my machine (below) with PyTorch, TorchVision, and OpenCV built from source. (Originally I was seeing slow inference no matter how many times I "warmed up" the model, but I have since been unable to reproduce that.)

It looks like 3 warm-ups are necessary for all recent versions of PyTorch:


CUDA 11.4.3, PyTorch 1.10.1, TorchVision 0.11.2, OpenCV 4.5.5:


# python3 -m torch.utils.collect_env
Collecting environment information...
PyTorch version: 1.10.0a0+git302ee7b
Is debug build: False
CUDA used to build PyTorch: 11.4
ROCM used to build PyTorch: N/A

OS: Ubuntu 20.04.3 LTS (x86_64)
GCC version: (Ubuntu 9.3.0-17ubuntu1~20.04) 9.3.0
Clang version: Could not collect
CMake version: version 3.21.1
Libc version: glibc-2.31

Python version: 3.8.10 (default, Nov 26 2021, 20:14:08)  [GCC 9.3.0] (64-bit runtime)
Python platform: Linux-5.4.0-89-generic-x86_64-with-glibc2.29
Is CUDA available: True
CUDA runtime version: 11.4.152
GPU models and configuration: 
GPU 0: Tesla V100-SXM2-32GB
GPU 1: Tesla V100-SXM2-32GB
GPU 2: Tesla V100-SXM2-32GB
GPU 3: Tesla V100-SXM2-32GB

Nvidia driver version: 470.74
cuDNN version: Probably one of the following:
/usr/lib/x86_64-linux-gnu/libcudnn.so.8.1.0
/usr/lib/x86_64-linux-gnu/libcudnn_adv_infer.so.8.1.0
/usr/lib/x86_64-linux-gnu/libcudnn_adv_train.so.8.1.0
/usr/lib/x86_64-linux-gnu/libcudnn_cnn_infer.so.8.1.0
/usr/lib/x86_64-linux-gnu/libcudnn_cnn_train.so.8.1.0
/usr/lib/x86_64-linux-gnu/libcudnn_ops_infer.so.8.1.0
/usr/lib/x86_64-linux-gnu/libcudnn_ops_train.so.8.1.0
HIP runtime version: N/A
MIOpen runtime version: N/A

Versions of relevant libraries:
[pip3] mypy-extensions==0.4.3
[pip3] numpy==1.22.0
[pip3] pytorch-lightning==1.5.8
[pip3] torch==1.10.0a0+git302ee7b
[pip3] torchmetrics==0.6.2
[pip3] torchvision==0.11.0a0+e7ec7e2
[conda] Could not collect

  • 1 warm-up of the model, the next inference takes: 1278ms
  • 2 warm-ups of the model, the next inference takes: 1044ms
  • 3 warm-ups of the model, the next inference takes: 11ms
  • 4 warm-ups of the model, the next inference takes: 10ms
  • 6 warm-ups of the model, the next inference takes: 10ms

CUDA 11.4.2, PyTorch 1.10.1, TorchVision 0.11.2, OpenCV 4.5.5:


# python3 -m torch.utils.collect_env
Collecting environment information...
PyTorch version: 1.10.0a0+git302ee7b
Is debug build: False
CUDA used to build PyTorch: 11.4
ROCM used to build PyTorch: N/A

OS: Ubuntu 20.04.3 LTS (x86_64)
GCC version: (Ubuntu 9.3.0-17ubuntu1~20.04) 9.3.0
Clang version: Could not collect
CMake version: version 3.21.1
Libc version: glibc-2.31

Python version: 3.8.10 (default, Sep 28 2021, 16:10:42)  [GCC 9.3.0] (64-bit runtime)
Python platform: Linux-5.4.0-89-generic-x86_64-with-glibc2.29
Is CUDA available: True
CUDA runtime version: 11.4.120
GPU models and configuration: 
GPU 0: Tesla V100-SXM2-32GB
GPU 1: Tesla V100-SXM2-32GB
GPU 2: Tesla V100-SXM2-32GB
GPU 3: Tesla V100-SXM2-32GB

Nvidia driver version: 470.74
cuDNN version: Probably one of the following:
/usr/lib/x86_64-linux-gnu/libcudnn.so.8.1.0
/usr/lib/x86_64-linux-gnu/libcudnn_adv_infer.so.8.1.0
/usr/lib/x86_64-linux-gnu/libcudnn_adv_train.so.8.1.0
/usr/lib/x86_64-linux-gnu/libcudnn_cnn_infer.so.8.1.0
/usr/lib/x86_64-linux-gnu/libcudnn_cnn_train.so.8.1.0
/usr/lib/x86_64-linux-gnu/libcudnn_ops_infer.so.8.1.0
/usr/lib/x86_64-linux-gnu/libcudnn_ops_train.so.8.1.0
HIP runtime version: N/A
MIOpen runtime version: N/A

Versions of relevant libraries:
[pip3] mypy-extensions==0.4.3
[pip3] numpy==1.21.4
[pip3] pytorch-lightning==1.5.8
[pip3] torch==1.10.0a0+git302ee7b
[pip3] torchmetrics==0.6.2
[pip3] torchvision==0.11.0a0+e7ec7e2
[conda] Could not collect

  • 1 warm-up of the model, the next inference takes: 1260ms
  • 2 warm-ups of the model, the next inference takes: 1075ms
  • 3 warm-ups of the model, the next inference takes: 11ms
  • 4 warm-ups of the model, the next inference takes: 15ms
  • 6 warm-ups of the model, the next inference takes: 11ms

CUDA 11.4.2, PyTorch 1.10.0, TorchVision 0.11.1, OpenCV 4.5.4:


# python3 -m torch.utils.collect_env
Collecting environment information...
PyTorch version: 1.10.0a0+git36449ea
Is debug build: False
CUDA used to build PyTorch: 11.4
ROCM used to build PyTorch: N/A

OS: Ubuntu 20.04.3 LTS (x86_64)
GCC version: (Ubuntu 9.3.0-17ubuntu1~20.04) 9.3.0
Clang version: Could not collect
CMake version: version 3.21.1
Libc version: glibc-2.31

Python version: 3.8.10 (default, Sep 28 2021, 16:10:42)  [GCC 9.3.0] (64-bit runtime)
Python platform: Linux-5.4.0-89-generic-x86_64-with-glibc2.29
Is CUDA available: True
CUDA runtime version: 11.4.120
GPU models and configuration: 
GPU 0: Tesla V100-SXM2-32GB
GPU 1: Tesla V100-SXM2-32GB
GPU 2: Tesla V100-SXM2-32GB
GPU 3: Tesla V100-SXM2-32GB

Nvidia driver version: 470.74
cuDNN version: Probably one of the following:
/usr/lib/x86_64-linux-gnu/libcudnn.so.8.1.0
/usr/lib/x86_64-linux-gnu/libcudnn_adv_infer.so.8.1.0
/usr/lib/x86_64-linux-gnu/libcudnn_adv_train.so.8.1.0
/usr/lib/x86_64-linux-gnu/libcudnn_cnn_infer.so.8.1.0
/usr/lib/x86_64-linux-gnu/libcudnn_cnn_train.so.8.1.0
/usr/lib/x86_64-linux-gnu/libcudnn_ops_infer.so.8.1.0
/usr/lib/x86_64-linux-gnu/libcudnn_ops_train.so.8.1.0
HIP runtime version: N/A
MIOpen runtime version: N/A

Versions of relevant libraries:
[pip3] numpy==1.21.4
[pip3] pytorch-lightning==1.5.4
[pip3] torch==1.10.0a0+git36449ea
[pip3] torchmetrics==0.6.0
[pip3] torchvision==0.11.0a0+fa347eb
[conda] Could not collect

  • 1 warm-up of the model, the next inference takes: 1282ms
  • 2 warm-ups of the model, the next inference takes: 1040ms
  • 3 warm-ups of the model, the next inference takes: 9ms
  • 4 warm-ups of the model, the next inference takes: 15ms
  • 6 warm-ups of the model, the next inference takes: 10ms

CUDA 11.4.1, PyTorch 1.10.1, TorchVision 0.11.2, OpenCV 4.5.5:


# python3 -m torch.utils.collect_env
Collecting environment information...
PyTorch version: 1.10.0a0+git302ee7b
Is debug build: False
CUDA used to build PyTorch: 11.4
ROCM used to build PyTorch: N/A

OS: Ubuntu 20.04.3 LTS (x86_64)
GCC version: (Ubuntu 9.3.0-17ubuntu1~20.04) 9.3.0
Clang version: Could not collect
CMake version: version 3.21.1
Libc version: glibc-2.31

Python version: 3.8.10 (default, Sep 28 2021, 16:10:42)  [GCC 9.3.0] (64-bit runtime)
Python platform: Linux-5.4.0-89-generic-x86_64-with-glibc2.29
Is CUDA available: True
CUDA runtime version: 11.4.120
GPU models and configuration: 
GPU 0: Tesla V100-SXM2-32GB
GPU 1: Tesla V100-SXM2-32GB
GPU 2: Tesla V100-SXM2-32GB
GPU 3: Tesla V100-SXM2-32GB

Nvidia driver version: 470.74
cuDNN version: Probably one of the following:
/usr/lib/x86_64-linux-gnu/libcudnn.so.8.1.0
/usr/lib/x86_64-linux-gnu/libcudnn_adv_infer.so.8.1.0
/usr/lib/x86_64-linux-gnu/libcudnn_adv_train.so.8.1.0
/usr/lib/x86_64-linux-gnu/libcudnn_cnn_infer.so.8.1.0
/usr/lib/x86_64-linux-gnu/libcudnn_cnn_train.so.8.1.0
/usr/lib/x86_64-linux-gnu/libcudnn_ops_infer.so.8.1.0
/usr/lib/x86_64-linux-gnu/libcudnn_ops_train.so.8.1.0
HIP runtime version: N/A
MIOpen runtime version: N/A

Versions of relevant libraries:
[pip3] mypy-extensions==0.4.3
[pip3] numpy==1.21.4
[pip3] pytorch-lightning==1.5.8
[pip3] torch==1.10.0a0+git302ee7b
[pip3] torchmetrics==0.6.2
[pip3] torchvision==0.11.0a0+e7ec7e2
[conda] Could not collect

  • 1 warm-up of the model, the next inference takes: 1232ms
  • 2 warm-ups of the model, the next inference takes: 1087ms
  • 3 warm-ups of the model, the next inference takes: 11ms
  • 4 warm-ups of the model, the next inference takes: 9ms
  • 6 warm-ups of the model, the next inference takes: 15ms

CUDA 11.4.1, PyTorch 1.10.0 commit 3fd9dcf, TorchVision 0.11.1, OpenCV 4.5.4:


# python3 -m torch.utils.collect_env
Collecting environment information...
PyTorch version: 1.10.0a0+git3fd9dcf
Is debug build: False
CUDA used to build PyTorch: 11.4
ROCM used to build PyTorch: N/A

OS: Ubuntu 20.04.3 LTS (x86_64)
GCC version: (Ubuntu 9.3.0-17ubuntu1~20.04) 9.3.0
Clang version: Could not collect
CMake version: version 3.21.1
Libc version: glibc-2.31

Python version: 3.8.10 (default, Sep 28 2021, 16:10:42)  [GCC 9.3.0] (64-bit runtime)
Python platform: Linux-5.4.0-89-generic-x86_64-with-glibc2.29
Is CUDA available: True
CUDA runtime version: 11.4.120
GPU models and configuration: 
GPU 0: Tesla V100-SXM2-32GB
GPU 1: Tesla V100-SXM2-32GB
GPU 2: Tesla V100-SXM2-32GB
GPU 3: Tesla V100-SXM2-32GB

Nvidia driver version: 470.74
cuDNN version: Probably one of the following:
/usr/lib/x86_64-linux-gnu/libcudnn.so.8.1.0
/usr/lib/x86_64-linux-gnu/libcudnn_adv_infer.so.8.1.0
/usr/lib/x86_64-linux-gnu/libcudnn_adv_train.so.8.1.0
/usr/lib/x86_64-linux-gnu/libcudnn_cnn_infer.so.8.1.0
/usr/lib/x86_64-linux-gnu/libcudnn_cnn_train.so.8.1.0
/usr/lib/x86_64-linux-gnu/libcudnn_ops_infer.so.8.1.0
/usr/lib/x86_64-linux-gnu/libcudnn_ops_train.so.8.1.0
HIP runtime version: N/A
MIOpen runtime version: N/A

Versions of relevant libraries:
[pip3] numpy==1.21.4
[pip3] pytorch-lightning==1.5.4
[pip3] torch==1.10.0a0+git3fd9dcf
[pip3] torchmetrics==0.6.1
[pip3] torchvision==0.11.0a0+fa347eb
[conda] Could not collect

  • 1 warm-up of the model, the next inference takes: 1153ms
  • 2 warm-ups of the model, the next inference takes: 779ms
  • 3 warm-ups of the model, the next inference takes: 7ms
  • 4 warm-ups of the model, the next inference takes: 8ms

CUDA 11.4.0, PyTorch 1.10.0, TorchVision 0.11.1, OpenCV 4.5.4:


# python3 -m torch.utils.collect_env
Collecting environment information...
PyTorch version: 1.10.0a0+git36449ea
Is debug build: False
CUDA used to build PyTorch: 11.4
ROCM used to build PyTorch: N/A

OS: Ubuntu 20.04.3 LTS (x86_64)
GCC version: (Ubuntu 9.3.0-17ubuntu1~20.04) 9.3.0
Clang version: Could not collect
CMake version: version 3.21.1
Libc version: glibc-2.31

Python version: 3.8.10 (default, Sep 28 2021, 16:10:42)  [GCC 9.3.0] (64-bit runtime)
Python platform: Linux-5.4.0-89-generic-x86_64-with-glibc2.29
Is CUDA available: True
CUDA runtime version: 11.4.120
GPU models and configuration: 
GPU 0: Tesla V100-SXM2-32GB
GPU 1: Tesla V100-SXM2-32GB
GPU 2: Tesla V100-SXM2-32GB
GPU 3: Tesla V100-SXM2-32GB

Nvidia driver version: 470.74
cuDNN version: Probably one of the following:
/usr/lib/x86_64-linux-gnu/libcudnn.so.8.1.0
/usr/lib/x86_64-linux-gnu/libcudnn_adv_infer.so.8.1.0
/usr/lib/x86_64-linux-gnu/libcudnn_adv_train.so.8.1.0
/usr/lib/x86_64-linux-gnu/libcudnn_cnn_infer.so.8.1.0
/usr/lib/x86_64-linux-gnu/libcudnn_cnn_train.so.8.1.0
/usr/lib/x86_64-linux-gnu/libcudnn_ops_infer.so.8.1.0
/usr/lib/x86_64-linux-gnu/libcudnn_ops_train.so.8.1.0
HIP runtime version: N/A
MIOpen runtime version: N/A

Versions of relevant libraries:
[pip3] numpy==1.21.4
[pip3] pytorch-lightning==1.5.4
[pip3] torch==1.10.0a0+git36449ea
[pip3] torchmetrics==0.6.0
[pip3] torchvision==0.11.0a0+fa347eb
[conda] Could not collect

  • 1 warm-up of the model, the next inference takes: 1290ms
  • 2 warm-ups of the model, the next inference takes: 1091ms
  • 3 warm-ups of the model, the next inference takes: 10ms
  • 4 warm-ups of the model, the next inference takes: 8ms
  • 6 warm-ups of the model, the next inference takes: 11ms

CUDA 11.3.1, PyTorch 1.9.1, TorchVision 0.10.1, OpenCV 4.5.4:


python3 -m torch.utils.collect_env
Collecting environment information...
PyTorch version: 1.9.0a0+gitdfbd030
Is debug build: False
CUDA used to build PyTorch: 11.3
ROCM used to build PyTorch: N/A

OS: Ubuntu 20.04.3 LTS (x86_64)
GCC version: (Ubuntu 9.3.0-17ubuntu1~20.04) 9.3.0
Clang version: Could not collect
CMake version: version 3.21.1
Libc version: glibc-2.31

Python version: 3.8 (64-bit runtime)
Python platform: Linux-5.4.0-89-generic-x86_64-with-glibc2.29
Is CUDA available: True
CUDA runtime version: 11.3.109
GPU models and configuration: 
GPU 0: Tesla V100-SXM2-32GB
GPU 1: Tesla V100-SXM2-32GB
GPU 2: Tesla V100-SXM2-32GB
GPU 3: Tesla V100-SXM2-32GB

Nvidia driver version: 470.74
cuDNN version: Probably one of the following:
/usr/lib/x86_64-linux-gnu/libcudnn.so.8.1.0
/usr/lib/x86_64-linux-gnu/libcudnn_adv_infer.so.8.1.0
/usr/lib/x86_64-linux-gnu/libcudnn_adv_train.so.8.1.0
/usr/lib/x86_64-linux-gnu/libcudnn_cnn_infer.so.8.1.0
/usr/lib/x86_64-linux-gnu/libcudnn_cnn_train.so.8.1.0
/usr/lib/x86_64-linux-gnu/libcudnn_ops_infer.so.8.1.0
/usr/lib/x86_64-linux-gnu/libcudnn_ops_train.so.8.1.0
HIP runtime version: N/A
MIOpen runtime version: N/A

Versions of relevant libraries:
[pip3] numpy==1.21.4
[pip3] pytorch-lightning==1.5.4
[pip3] torch==1.9.0a0+gitdfbd030
[pip3] torchmetrics==0.6.0
[pip3] torchvision==0.10.0a0+ca1a620
[conda] Could not collect

  • 1 warm-up of the model, the next inference takes: 1167ms
  • 2 warm-ups of the model, the next inference takes: 894ms
  • 3 warm-ups of the model, the next inference takes: 7ms
  • 4 warm-ups of the model, the next inference takes: 8ms
  • 6 warm-ups of the model, the next inference takes: 7ms

CUDA 11.2.0, PyTorch 1.9.0, TorchVision 0.10.0, OpenCV 4.5.2:


python3 -m torch.utils.collect_env
Collecting environment information...
PyTorch version: 1.9.0a0+gitd69c22d
Is debug build: False
CUDA used to build PyTorch: 11.2
ROCM used to build PyTorch: N/A

OS: Ubuntu 20.04.1 LTS (x86_64)
GCC version: (Ubuntu 9.3.0-17ubuntu1~20.04) 9.3.0
Clang version: Could not collect
CMake version: version 3.21.1
Libc version: glibc-2.31

Python version: 3.8 (64-bit runtime)
Python platform: Linux-5.4.0-89-generic-x86_64-with-glibc2.29
Is CUDA available: True
CUDA runtime version: 11.2.67
GPU models and configuration: 
GPU 0: Tesla V100-SXM2-32GB
GPU 1: Tesla V100-SXM2-32GB
GPU 2: Tesla V100-SXM2-32GB
GPU 3: Tesla V100-SXM2-32GB

Nvidia driver version: 470.74
cuDNN version: Probably one of the following:
/usr/lib/x86_64-linux-gnu/libcudnn.so.8.1.0
/usr/lib/x86_64-linux-gnu/libcudnn_adv_infer.so.8.1.0
/usr/lib/x86_64-linux-gnu/libcudnn_adv_train.so.8.1.0
/usr/lib/x86_64-linux-gnu/libcudnn_cnn_infer.so.8.1.0
/usr/lib/x86_64-linux-gnu/libcudnn_cnn_train.so.8.1.0
/usr/lib/x86_64-linux-gnu/libcudnn_ops_infer.so.8.1.0
/usr/lib/x86_64-linux-gnu/libcudnn_ops_train.so.8.1.0
HIP runtime version: N/A
MIOpen runtime version: N/A

Versions of relevant libraries:
[pip3] numpy==1.21.4
[pip3] pytorch-lightning==1.5.2
[pip3] torch==1.9.0a0+gitd69c22d
[pip3] torchmetrics==0.6.0
[pip3] torchvision==0.10.0a0+300a8a4
[conda] Could not collect

  • 1 warm-up of the model, the next inference takes: 1188ms
  • 2 warm-ups of the model, the next inference takes: 899ms
  • 3 warm-ups of the model, the next inference takes: 10ms
  • 4 warm-ups of the model, the next inference takes: 15ms
  • 6 warm-ups of the model, the next inference takes: 12ms

Different PC with everything pre-built, not running in Docker:

CUDA 11.5, PyTorch 1.10.0, TorchVision 0.11.0, OpenCV 4.5.3:


$ python3 -m torch.utils.collect_env 
Collecting environment information...
PyTorch version: 1.10.0a0+git36449ea
Is debug build: False
CUDA used to build PyTorch: 11.5
ROCM used to build PyTorch: N/A

OS: Ubuntu 18.04.6 LTS (x86_64)
GCC version: (Ubuntu 9.4.0-1ubuntu1~18.04) 9.4.0
Clang version: Could not collect
CMake version: version 3.21.3
Libc version: glibc-2.25

Python version: 3.6.9 (default, Dec  8 2021, 21:08:43)  [GCC 8.4.0] (64-bit runtime)
Python platform: Linux-5.4.0-1065-azure-x86_64-with-Ubuntu-18.04-bionic
Is CUDA available: True
CUDA runtime version: 11.5.119
GPU models and configuration: GPU 0: Tesla V100-PCIE-16GB
Nvidia driver version: 495.29.05
cuDNN version: Probably one of the following:
/usr/lib/x86_64-linux-gnu/libcudnn.so.8.3.1
/usr/lib/x86_64-linux-gnu/libcudnn_adv_infer.so.8.3.1
/usr/lib/x86_64-linux-gnu/libcudnn_adv_train.so.8.3.1
/usr/lib/x86_64-linux-gnu/libcudnn_cnn_infer.so.8.3.1
/usr/lib/x86_64-linux-gnu/libcudnn_cnn_train.so.8.3.1
/usr/lib/x86_64-linux-gnu/libcudnn_ops_infer.so.8.3.1
/usr/lib/x86_64-linux-gnu/libcudnn_ops_train.so.8.3.1
HIP runtime version: N/A
MIOpen runtime version: N/A

Versions of relevant libraries:
[pip3] numpy==1.19.5
[pip3] torch==1.10.0a0+git36449ea
[pip3] torchvision==0.11.0a0+cdacbe0
[conda] Could not collect

  • 1 warm-up of the model, the next inference takes: 1601ms
  • 2 warm-ups of the model, the next inference takes: 1239ms
  • 3 warm-ups of the model, the next inference takes: 13ms
  • 4 warm-ups of the model, the next inference takes: 12ms
  • 6 warm-ups of the model, the next inference takes: 13ms

@zhiqwang (Owner)

Hi @mattpopovich , thanks for the detailed experimental data! I believe this phenomenon is now explained, so I'm closing this ticket, but let us know if you have further questions.
