
Compiling on K80, executing on P100 #233

Closed
barakhi opened this issue Nov 4, 2019 · 8 comments
barakhi commented Nov 4, 2019

Hi,
I have installed and compiled detectron2 on K80 GPU under a conda environment.
Training on K80 GPU works fine, on several machines.
While training on a P100 GPU I got the following error:

File "/net/mraid11/export/data/Projects/detectron2/detectron2_repo/detectron2/layers/roi_align.py", line 95, in forward
input, rois, self.output_size, self.spatial_scale, self.sampling_ratio, self.aligned
File "/net/mraid11/export/data/Projects/detectron2/detectron2_repo/detectron2/layers/roi_align.py", line 20, in forward
input, roi, spatial_scale, output_size[0], output_size[1], sampling_ratio, aligned
RuntimeError: CUDA error: no kernel image is available for execution on the device (ROIAlign_forward_cuda at /net/mraid11/export/data/Projects/detectron2/detectron2_repo/detectron2/layers/csrc/ROIAlign/ROIAlign_cuda.cu:361)
frame #0: c10::Error::Error(c10::SourceLocation, std::string const&) + 0x47 (0x2af4cfc99687 in /home/data/Software/Anaconda3_2018/envs/detectron2/lib/python3.7/site-packages/torch/lib/libc10.so)
frame #1: detectron2::ROIAlign_forward_cuda(at::Tensor const&, at::Tensor const&, float, int, int, int, bool) + 0x9f7 (0x2af4eab00151 in /net/mraid11/export/data/Projects/detectron2/detectron2_repo/detectron2/_C.cpython-37m-x86_64-linux-gnu.so)
frame #2: detectron2::ROIAlign_forward(at::Tensor const&, at::Tensor const&, float, int, int, int, bool) + 0x9c (0x2af4eaacb1bc in /net/mraid11/export/data/Projects/detectron2/detectron2_repo/detectron2/_C.cpython-37m-x86_64-linux-gnu.so)
frame #3: + 0x5a00f (0x2af4eaadc00f in /net/mraid11/export/data/Projects/detectron2/detectron2_repo/detectron2/_C.cpython-37m-x86_64-linux-gnu.so)
frame #4: + 0x541ef (0x2af4eaad61ef in /net/mraid11/export/data/Projects/detectron2/detectron2_repo/detectron2/_C.cpython-37m-x86_64-linux-gnu.so)

frame #9: THPFunction_apply(_object*, _object*) + 0x8d6 (0x2af4a1ff9e96 in /home/data/Software/Anaconda3_2018/envs/detectron2/lib/python3.7/site-packages/torch/lib/libtorch_python.so)

To Reproduce

Install detectron2 on a K80 GPU, then run it on a P100 GPU.

My installation process:
conda create --name detectron2
conda activate detectron2

conda install ipython
pip install ninja yacs cython matplotlib tqdm opencv-python

conda install pytorch torchvision cudatoolkit=9.2 -c pytorch

cd ~/data/Projects/detectron2/
pip install git+https://github.com/facebookresearch/fvcore.git
pip install 'git+https://github.com/cocodataset/cocoapi.git#subdirectory=PythonAPI'

git clone https://github.com/facebookresearch/detectron2 detectron2_repo
pip install -e detectron2_repo

cd detectron2_repo/datasets/
mkdir -p coco
ln -s /home/data/Datasets/coco/annotations coco/annotations
ln -s /home/data/Datasets/coco/train2017 coco/train2017
ln -s /home/data/Datasets/coco/val2017 coco/val2017
ln -s /home/data/Datasets/coco/test2017 coco/test2017

python tools/train_net.py --config-file configs/COCO-InstanceSegmentation/mask_rcnn_R_50_FPN_1x.yaml SOLVER.IMS_PER_BATCH 2 SOLVER.BASE_LR 0.0025


Environment


sys.platform: linux
Python: 3.7.4 (default, Aug 13 2019, 20:35:49) [GCC 7.3.0]
Numpy: 1.17.2
Detectron2 Compiler: GCC 4.9
Detectron2 CUDA Compiler: 9.2
DETECTRON2_ENV_MODULE:
PyTorch: 1.3.0
PyTorch Debug Build: False
torchvision: 0.4.1a0+d94043a
CUDA available: True
GPU 0,1,2,3: Tesla P100-PCIE-16GB
CUDA_HOME: /usr/local/cuda-9.2
NVCC: Cuda compilation tools, release 9.2, V9.2.88
Pillow: 6.2.0
cv2: 4.1.1


PyTorch built with:

  • GCC 7.3
  • Intel(R) Math Kernel Library Version 2019.0.4 Product Build 20190411 for Intel(R) 64 architecture applications
  • Intel(R) MKL-DNN v0.20.5 (Git Hash 0125f28c61c1f822fd48570b4c1066f96fcb9b2e)
  • OpenMP 201511 (a.k.a. OpenMP 4.5)
  • NNPACK is enabled
  • CUDA Runtime 9.2
  • NVCC architecture flags: -gencode;arch=compute_35,code=sm_35;-gencode;arch=compute_50,code=sm_50;-gencode;arch=compute_60,code=sm_60;-gencode;arch=compute_61,code=sm_61;-gencode;arch=compute_70,code=sm_70;-gencode;arch=compute_50,code=compute_50
  • CuDNN 7.6.3
  • Magma 2.5.1
  • Build settings: BLAS=MKL, BUILD_NAMEDTENSOR=OFF, BUILD_TYPE=Release, CXX_FLAGS= -Wno-deprecated -fvisibility-inlines-hidden -fopenmp -DUSE_FBGEMM -DUSE_QNNPACK -DUSE_PYTORCH_QNNPACK -O2 -fPIC -Wno-narrowing -Wall -Wextra -Wno-missing-field-initializers -Wno-type-limits -Wno-array-bounds -Wno-unknown-pragmas -Wno-sign-compare -Wno-unused-parameter -Wno-unused-variable -Wno-unused-function -Wno-unused-result -Wno-strict-overflow -Wno-strict-aliasing -Wno-error=deprecated-declarations -Wno-stringop-overflow -Wno-error=pedantic -Wno-error=redundant-decls -Wno-error=old-style-cast -fdiagnostics-color=always -faligned-new -Wno-unused-but-set-variable -Wno-maybe-uninitialized -fno-math-errno -fno-trapping-math -Wno-stringop-overflow, DISABLE_NUMA=1, PERF_WITH_AVX=1, PERF_WITH_AVX2=1, PERF_WITH_AVX512=1, USE_CUDA=True, USE_EXCEPTION_PTR=1, USE_GFLAGS=OFF, USE_GLOG=OFF, USE_MKL=ON, USE_MKLDNN=ON, USE_MPI=OFF, USE_NCCL=ON, USE_NNPACK=ON, USE_OPENMP=ON, USE_STATIC_DISPATCH=OFF,
ppwwyyxx (Contributor) commented Nov 4, 2019

You cannot compile code for one GPU architecture and run it on a different GPU architecture.
If you don't know how to compile code for a GPU architecture different from the one you're compiling on, just avoid doing this.
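The failure above can be sketched with a toy check (the architecture values are illustrative: the K80 is compute capability 3.7, the P100 is 6.0). An extension compiled only for the K80's SM contains no machine code for the P100, and without embedded PTX the driver cannot JIT-compile a fallback, so kernel launches fail with exactly this error:

```shell
#!/bin/sh
# Toy model of the "no kernel image" failure. ARCH_LIST stands for the SM
# versions the extension was (hypothetically) compiled for; TARGET is the
# compute capability of the GPU at run time.
ARCH_LIST="3.7"   # built on a K80 only
TARGET="6.0"      # running on a P100

case ";$ARCH_LIST;" in
  *";$TARGET;"*) RESULT="kernel image available" ;;
  *)             RESULT="no kernel image is available for execution on the device" ;;
esac
echo "$RESULT"
```

Of course the real driver compares SASS/PTX embedded in the binary, not strings; this only illustrates why an arch missing from the build list fails at launch time rather than at import time.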

barakhi (Author) commented Nov 4, 2019

I don't think I understand. I'm working on a cluster with several GPU machines; some have K80 GPUs, others P100s.
Until now I have had no problem compiling code (including detectron1, maskrcnn-benchmark, and many other examples) on a K80 GPU and running it on any of the other machines (including P100, V100, etc.) with the same CUDA/cuDNN versions. Is that not possible with detectron2?

ppwwyyxx (Contributor) commented Nov 4, 2019

It should work if you set the TORCH_CUDA_ARCH_LIST environment variable to include codes for both architectures before building. If it doesn't, I don't think that's a detectron2 issue.
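For example, a minimal sketch (the version strings are the compute capabilities of the K80 and P100; the rebuild commands are shown as comments because they assume a checkout at `detectron2_repo`):

```shell
#!/bin/sh
# Build the CUDA extension for both the K80 (3.7) and the P100 (6.0).
# This must be set BEFORE compiling; then rebuild from a clean tree, e.g.:
#   rm -rf detectron2_repo/build
#   pip install -e detectron2_repo
export TORCH_CUDA_ARCH_LIST="3.7;6.0"
echo "$TORCH_CUDA_ARCH_LIST"
```

Entries like `6.0+PTX` can additionally embed PTX, which lets still-newer GPUs JIT-compile the kernels at load time.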

ppwwyyxx mentioned this issue Nov 5, 2019
qianwangn commented:

According to the log, I already built with sm_70, which the V100 needs. So why can't I run on a V100?

ppwwyyxx (Contributor) commented Nov 5, 2019

Could you show which log you are referring to?

qianwangn commented:
arch=compute_70,code=sm_70;-gencode

PyTorch built with:

  • GCC 7.3
  • Intel(R) Math Kernel Library Version 2018.0.3 Product Build 20180406 for Intel(R) 64 architecture applications
  • Intel(R) MKL-DNN v0.20.5 (Git Hash 0125f28c61c1f822fd48570b4c1066f96fcb9b2e)
  • OpenMP 201511 (a.k.a. OpenMP 4.5)
  • NNPACK is enabled
  • CUDA Runtime 10.0
  • NVCC architecture flags: -gencode;arch=compute_35,code=sm_35;-gencode;arch=compute_50,code=sm_50;-gencode;arch=compute_60,code=sm_60;-gencode;arch=compute_61,code=sm_61;-gencode;arch=compute_70,code=sm_70;-gencode;arch=compute_75,code=sm_75;-gencode;arch=compute_50,code=compute_50
  • CuDNN 7.6.3
  • Magma 2.5.1
  • Build settings: BLAS=MKL, BUILD_NAMEDTENSOR=OFF, BUILD_TYPE=Release, CXX_FLAGS= -Wno-deprecated -fvisibility-inlines-hidden -fopenmp -DUSE_FBGEMM -DUSE_QNNPACK -DUSE_PYTORCH_QNNPACK -O2 -fPIC -Wno-narrowing -Wall -Wextra -Wno-missing-field-initializers -Wno-type-limits -Wno-array-bounds -Wno-unknown-pragmas -Wno-sign-compare -Wno-unused-parameter -Wno-unused-variable -Wno-unused-function -Wno-unused-result -Wno-strict-overflow -Wno-strict-aliasing -Wno-error=deprecated-declarations -Wno-stringop-overflow -Wno-error=pedantic -Wno-error=redundant-decls -Wno-error=old-style-cast -fdiagnostics-color=always -faligned-new -Wno-unused-but-set-variable -Wno-maybe-uninitialized -fno-math-errno -fno-trapping-math -Wno-stringop-overflow, DISABLE_NUMA=1, PERF_WITH_AVX=1, PERF_WITH_AVX2=1, PERF_WITH_AVX512=1, USE_CUDA=True, USE_EXCEPTION_PTR=1, USE_GFLAGS=OFF, USE_GLOG=OFF, USE_MKL=ON, USE_MKLDNN=ON, USE_MPI=OFF, USE_NCCL=ON, USE_NNPACK=ON, USE_OPENMP=ON, USE_STATIC_DISPATCH=OFF,

ppwwyyxx (Contributor) commented Nov 5, 2019

As the log says, this is what PyTorch was built with. That may not be what detectron2 was built with.
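One way to read such a log is to pull out the `code=sm_*` entries from the NVCC flag string; the string below is copied from the PyTorch log above, and the parsing is just a sketch (to inspect what the detectron2 extension itself contains, running `cuobjdump --list-elf` on the compiled `_C*.so` would be the more direct check):

```shell
#!/bin/sh
# Extract the SM codes from an NVCC -gencode flag string. This string is
# PyTorch's (abridged, from the log above); detectron2's own build may differ.
FLAGS="-gencode;arch=compute_35,code=sm_35;-gencode;arch=compute_60,code=sm_60;-gencode;arch=compute_70,code=sm_70"
SMS=$(echo "$FLAGS" | tr ';' '\n' | grep -o 'code=sm_[0-9]*' | sort -u)
echo "$SMS"
```

Each `sm_NN` entry is native machine code for that architecture only; a GPU whose compute capability is absent from this list (and not covered by an embedded `compute_NN` PTX entry) will hit the "no kernel image" error.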

qianwangn commented Nov 6, 2019

Sorry, my fault.

This is solved by setting
export TORCH_CUDA_ARCH_LIST="3.5;5.0;6.0;6.1;7.0;7.5"
before running python setup.py build develop.

github-actions bot locked as resolved and limited conversation to collaborators Jan 11, 2021