
Compiling on K80, executing on P100 #233

Closed
barakhi opened this issue Nov 4, 2019 · 8 comments
barakhi commented Nov 4, 2019

Hi,
I have installed and compiled detectron2 on K80 GPU under a conda environment.
Training on K80 GPU works fine, on several machines.
While training on a P100 GPU I got the following error:

File "/net/mraid11/export/data/Projects/detectron2/detectron2_repo/detectron2/layers/roi_align.py", line 95, in forward
input, rois, self.output_size, self.spatial_scale, self.sampling_ratio, self.aligned
File "/net/mraid11/export/data/Projects/detectron2/detectron2_repo/detectron2/layers/roi_align.py", line 20, in forward
input, roi, spatial_scale, output_size[0], output_size[1], sampling_ratio, aligned
RuntimeError: CUDA error: no kernel image is available for execution on the device (ROIAlign_forward_cuda at /net/mraid11/export/data/Projects/detectron2/detectron2_repo/detectron2/layers/csrc/ROIAlign/ROIAlign_cuda.cu:361)
frame #0: c10::Error::Error(c10::SourceLocation, std::string const&) + 0x47 (0x2af4cfc99687 in /home/data/Software/Anaconda3_2018/envs/detectron2/lib/python3.7/site-packages/torch/lib/libc10.so)
frame #1: detectron2::ROIAlign_forward_cuda(at::Tensor const&, at::Tensor const&, float, int, int, int, bool) + 0x9f7 (0x2af4eab00151 in /net/mraid11/export/data/Projects/detectron2/detectron2_repo/detectron2/_C.cpython-37m-x86_64-linux-gnu.so)
frame #2: detectron2::ROIAlign_forward(at::Tensor const&, at::Tensor const&, float, int, int, int, bool) + 0x9c (0x2af4eaacb1bc in /net/mraid11/export/data/Projects/detectron2/detectron2_repo/detectron2/_C.cpython-37m-x86_64-linux-gnu.so)
frame #3: + 0x5a00f (0x2af4eaadc00f in /net/mraid11/export/data/Projects/detectron2/detectron2_repo/detectron2/_C.cpython-37m-x86_64-linux-gnu.so)
frame #4: + 0x541ef (0x2af4eaad61ef in /net/mraid11/export/data/Projects/detectron2/detectron2_repo/detectron2/_C.cpython-37m-x86_64-linux-gnu.so)

frame #9: THPFunction_apply(_object*, _object*) + 0x8d6 (0x2af4a1ff9e96 in /home/data/Software/Anaconda3_2018/envs/detectron2/lib/python3.7/site-packages/torch/lib/libtorch_python.so)

To Reproduce

Install detectron2 on a K80 GPU, then run it on a P100 GPU.

My installation process:
conda create --name detectron2
conda activate detectron2

conda install ipython
pip install ninja yacs cython matplotlib tqdm opencv-python

conda install pytorch torchvision cudatoolkit=9.2 -c pytorch

cd ~/data/Projects/detectron2/
pip install git+https://github.com/facebookresearch/fvcore.git
pip install 'git+https://github.com/cocodataset/cocoapi.git#subdirectory=PythonAPI'

git clone https://github.com/facebookresearch/detectron2 detectron2_repo
pip install -e detectron2_repo

cd detectron2_repo/datasets/
mkdir -p coco
ln -s /home/data/Datasets/coco/annotations coco/annotations
ln -s /home/data/Datasets/coco/train2017 coco/train2017
ln -s /home/data/Datasets/coco/val2017 coco/val2017
ln -s /home/data/Datasets/coco/test2017 coco/test2017

python tools/train_net.py --config-file configs/COCO-InstanceSegmentation/mask_rcnn_R_50_FPN_1x.yaml SOLVER.IMS_PER_BATCH 2 SOLVER.BASE_LR 0.0025


Environment


sys.platform: linux
Python: 3.7.4 (default, Aug 13 2019, 20:35:49) [GCC 7.3.0]
Numpy: 1.17.2
Detectron2 Compiler: GCC 4.9
Detectron2 CUDA Compiler: 9.2
DETECTRON2_ENV_MODULE:
PyTorch: 1.3.0
PyTorch Debug Build: False
torchvision: 0.4.1a0+d94043a
CUDA available: True
GPU 0,1,2,3: Tesla P100-PCIE-16GB
CUDA_HOME: /usr/local/cuda-9.2
NVCC: Cuda compilation tools, release 9.2, V9.2.88
Pillow: 6.2.0
cv2: 4.1.1


PyTorch built with:

  • GCC 7.3
  • Intel(R) Math Kernel Library Version 2019.0.4 Product Build 20190411 for Intel(R) 64 architecture applications
  • Intel(R) MKL-DNN v0.20.5 (Git Hash 0125f28c61c1f822fd48570b4c1066f96fcb9b2e)
  • OpenMP 201511 (a.k.a. OpenMP 4.5)
  • NNPACK is enabled
  • CUDA Runtime 9.2
  • NVCC architecture flags: -gencode;arch=compute_35,code=sm_35;-gencode;arch=compute_50,code=sm_50;-gencode;arch=compute_60,code=sm_60;-gencode;arch=compute_61,code=sm_61;-gencode;arch=compute_70,code=sm_70;-gencode;arch=compute_50,code=compute_50
  • CuDNN 7.6.3
  • Magma 2.5.1
  • Build settings: BLAS=MKL, BUILD_NAMEDTENSOR=OFF, BUILD_TYPE=Release, CXX_FLAGS= -Wno-deprecated -fvisibility-inlines-hidden -fopenmp -DUSE_FBGEMM -DUSE_QNNPACK -DUSE_PYTORCH_QNNPACK -O2 -fPIC -Wno-narrowing -Wall -Wextra -Wno-missing-field-initializers -Wno-type-limits -Wno-array-bounds -Wno-unknown-pragmas -Wno-sign-compare -Wno-unused-parameter -Wno-unused-variable -Wno-unused-function -Wno-unused-result -Wno-strict-overflow -Wno-strict-aliasing -Wno-error=deprecated-declarations -Wno-stringop-overflow -Wno-error=pedantic -Wno-error=redundant-decls -Wno-error=old-style-cast -fdiagnostics-color=always -faligned-new -Wno-unused-but-set-variable -Wno-maybe-uninitialized -fno-math-errno -fno-trapping-math -Wno-stringop-overflow, DISABLE_NUMA=1, PERF_WITH_AVX=1, PERF_WITH_AVX2=1, PERF_WITH_AVX512=1, USE_CUDA=True, USE_EXCEPTION_PTR=1, USE_GFLAGS=OFF, USE_GLOG=OFF, USE_MKL=ON, USE_MKLDNN=ON, USE_MPI=OFF, USE_NCCL=ON, USE_NNPACK=ON, USE_OPENMP=ON, USE_STATIC_DISPATCH=OFF,
ppwwyyxx (Contributor) commented Nov 4, 2019

You cannot compile code for one GPU architecture and run it on a different GPU architecture.
If you don't know how to compile code for a GPU architecture different from the one you're compiling on, just avoid doing this.
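The failure above can be sketched with a toy check (the architecture values are illustrative: the K80 is compute capability 3.7, the P100 is 6.0). An extension compiled only for the K80's SM contains no machine code for the P100, and without embedded PTX the driver cannot JIT-compile a fallback, so kernel launches fail with exactly this error:

```shell
#!/bin/sh
# Toy model of the "no kernel image" failure. ARCH_LIST stands for the SM
# versions the extension was (hypothetically) compiled for; TARGET is the
# compute capability of the GPU at run time.
ARCH_LIST="3.7"   # built on a K80 only
TARGET="6.0"      # running on a P100

case ";$ARCH_LIST;" in
  *";$TARGET;"*) RESULT="kernel image available" ;;
  *)             RESULT="no kernel image is available for execution on the device" ;;
esac
echo "$RESULT"
```

Of course the real driver compares SASS/PTX embedded in the binary, not strings; this only illustrates why an arch missing from the build list fails at launch time rather than at import time.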

barakhi (Author) commented Nov 4, 2019

I don't think I understand. I'm working on a cluster with several GPU machines; some have K80 GPUs, others P100s.
Until now I have had no problem compiling code (including detectron1, maskrcnn-benchmark, and many other examples) on a K80 GPU and running it on any of the other machines (including P100, V100, etc.) with the same CUDA/cuDNN versions. Is that not possible with detectron2?

ppwwyyxx (Contributor) commented Nov 4, 2019

It should work if you set the TORCH_CUDA_ARCH_LIST environment variable to include codes for both architectures before building. If it doesn't, I don't think that's a detectron2 issue.
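For example, a minimal sketch (the version strings are the compute capabilities of the K80 and P100; the rebuild commands are shown as comments because they assume a checkout at `detectron2_repo`):

```shell
#!/bin/sh
# Build the CUDA extension for both the K80 (3.7) and the P100 (6.0).
# This must be set BEFORE compiling; then rebuild from a clean tree, e.g.:
#   rm -rf detectron2_repo/build
#   pip install -e detectron2_repo
export TORCH_CUDA_ARCH_LIST="3.7;6.0"
echo "$TORCH_CUDA_ARCH_LIST"
```

Entries like `6.0+PTX` can additionally embed PTX, which lets still-newer GPUs JIT-compile the kernels at load time.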

ppwwyyxx mentioned this issue Nov 5, 2019
qianwangn commented:

According to the log, I already built with sm_70, which the V100 needs. So why can't I run on a V100?

ppwwyyxx (Contributor) commented Nov 5, 2019

Could you show which log you are referring to?

qianwangn commented:
arch=compute_70,code=sm_70;-gencode

PyTorch built with:

  • GCC 7.3
  • Intel(R) Math Kernel Library Version 2018.0.3 Product Build 20180406 for Intel(R) 64 architecture applications
  • Intel(R) MKL-DNN v0.20.5 (Git Hash 0125f28c61c1f822fd48570b4c1066f96fcb9b2e)
  • OpenMP 201511 (a.k.a. OpenMP 4.5)
  • NNPACK is enabled
  • CUDA Runtime 10.0
  • NVCC architecture flags: -gencode;arch=compute_35,code=sm_35;-gencode;arch=compute_50,code=sm_50;-gencode;arch=compute_60,code=sm_60;-gencode;arch=compute_61,code=sm_61;-gencode;arch=compute_70,code=sm_70;-gencode;arch=compute_75,code=sm_75;-gencode;arch=compute_50,code=compute_50
  • CuDNN 7.6.3
  • Magma 2.5.1
  • Build settings: BLAS=MKL, BUILD_NAMEDTENSOR=OFF, BUILD_TYPE=Release, CXX_FLAGS= -Wno-deprecated -fvisibility-inlines-hidden -fopenmp -DUSE_FBGEMM -DUSE_QNNPACK -DUSE_PYTORCH_QNNPACK -O2 -fPIC -Wno-narrowing -Wall -Wextra -Wno-missing-field-initializers -Wno-type-limits -Wno-array-bounds -Wno-unknown-pragmas -Wno-sign-compare -Wno-unused-parameter -Wno-unused-variable -Wno-unused-function -Wno-unused-result -Wno-strict-overflow -Wno-strict-aliasing -Wno-error=deprecated-declarations -Wno-stringop-overflow -Wno-error=pedantic -Wno-error=redundant-decls -Wno-error=old-style-cast -fdiagnostics-color=always -faligned-new -Wno-unused-but-set-variable -Wno-maybe-uninitialized -fno-math-errno -fno-trapping-math -Wno-stringop-overflow, DISABLE_NUMA=1, PERF_WITH_AVX=1, PERF_WITH_AVX2=1, PERF_WITH_AVX512=1, USE_CUDA=True, USE_EXCEPTION_PTR=1, USE_GFLAGS=OFF, USE_GLOG=OFF, USE_MKL=ON, USE_MKLDNN=ON, USE_MPI=OFF, USE_NCCL=ON, USE_NNPACK=ON, USE_OPENMP=ON, USE_STATIC_DISPATCH=OFF,

ppwwyyxx (Contributor) commented Nov 5, 2019

As the log says, this is what PyTorch was built with. That may not be what detectron2 was built with.
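One way to read such a log is to pull out the `code=sm_*` entries from the NVCC flag string; the string below is copied from the PyTorch log above, and the parsing is just a sketch (to inspect what the detectron2 extension itself contains, running `cuobjdump --list-elf` on the compiled `_C*.so` would be the more direct check):

```shell
#!/bin/sh
# Extract the SM codes from an NVCC -gencode flag string. This string is
# PyTorch's (abridged, from the log above); detectron2's own build may differ.
FLAGS="-gencode;arch=compute_35,code=sm_35;-gencode;arch=compute_60,code=sm_60;-gencode;arch=compute_70,code=sm_70"
SMS=$(echo "$FLAGS" | tr ';' '\n' | grep -o 'code=sm_[0-9]*' | sort -u)
echo "$SMS"
```

Each `sm_NN` entry is native machine code for that architecture only; a GPU whose compute capability is absent from this list (and not covered by an embedded `compute_NN` PTX entry) will hit the "no kernel image" error.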

qianwangn commented Nov 6, 2019

Sorry, my fault.

This is solved by setting
export TORCH_CUDA_ARCH_LIST="3.5;5.0;6.0;6.1;7.0;7.5"
before running python setup.py build develop.

github-actions bot locked as resolved and limited conversation to collaborators Jan 11, 2021