Error when setting --gpu-id 3 to train VoteNet on ScanNet #2471

jumptiger66 · 2023-04-25T13:00:35Z

Thanks for your error report and we appreciate it a lot.

Checklist

I have searched related issues but cannot get the expected help.
The bug has not been fixed in the latest version.

Describe the bug
I am training VoteNet (ICCV 2019) on ScanNet/SUN RGB-D. On the default GPU 0 the code runs without any problems. However, when I switch to a different GPU (1, 2, or 3; I am only training on a single GPU anyway), I get a device-related error.

Reproduction

What command or script did you run?

python tools/train.py '/home/sunhao/code2023/mmdetection3d/configs/votenet/votenet_16x8_sunrgbd-3d-10class.py' --gpu-id 3

Did you make any modifications on the code or config? Did you understand what you have modified?
No modifications.
What dataset did you use?
ScanNet/SunRGBD

Environment

Please run python mmdet3d/utils/collect_env.py to collect necessary environment information and paste it here.
sys.platform: linux
Python: 3.7.16 (default, Jan 17 2023, 22:20:44) [GCC 11.2.0]
CUDA available: True
GPU 0,1,2,3: NVIDIA GeForce RTX 3090
CUDA_HOME: /usr/local/cuda-11.5
NVCC: Cuda compilation tools, release 11.5, V11.5.50
GCC: gcc (Ubuntu 7.5.0-3ubuntu1~18.04) 7.5.0
PyTorch: 1.12.1
PyTorch compiling details: PyTorch built with:

GCC 9.3
C++ Version: 201402
Intel(R) oneAPI Math Kernel Library Version 2021.4-Product Build 20210904 for Intel(R) 64 architecture applications
Intel(R) MKL-DNN v2.6.0 (Git Hash 52b5f107dd9cf10910aaa19cb47f3abf9b349815)
OpenMP 201511 (a.k.a. OpenMP 4.5)
LAPACK is enabled (usually provided by MKL)
NNPACK is enabled
CPU capability usage: AVX2
CUDA Runtime 11.3
NVCC architecture flags: -gencode;arch=compute_37,code=sm_37;-gencode;arch=compute_50,code=sm_50;-gencode;arch=compute_60,code=sm_60;-gencode;arch=compute_61,code=sm_61;-gencode;arch=compute_70,code=sm_70;-gencode;arch=compute_75,code=sm_75;-gencode;arch=compute_80,code=sm_80;-gencode;arch=compute_86,code=sm_86;-gencode;arch=compute_37,code=compute_37
CuDNN 8.3.2 (built against CUDA 11.5)
Magma 2.5.2
Build settings: BLAS_INFO=mkl, BUILD_TYPE=Release, CUDA_VERSION=11.3, CUDNN_VERSION=8.3.2, CXX_COMPILER=/opt/rh/devtoolset-9/root/usr/bin/c++, CXX_FLAGS= -fabi-version=11 -Wno-deprecated -fvisibility-inlines-hidden -DUSE_PTHREADPOOL -fopenmp -DNDEBUG -DUSE_KINETO -DUSE_FBGEMM -DUSE_QNNPACK -DUSE_PYTORCH_QNNPACK -DUSE_XNNPACK -DSYMBOLICATE_MOBILE_DEBUG_HANDLE -DEDGE_PROFILER_USE_KINETO -O2 -fPIC -Wno-narrowing -Wall -Wextra -Werror=return-type -Wno-missing-field-initializers -Wno-type-limits -Wno-array-bounds -Wno-unknown-pragmas -Wno-unused-parameter -Wno-unused-function -Wno-unused-result -Wno-unused-local-typedefs -Wno-strict-overflow -Wno-strict-aliasing -Wno-error=deprecated-declarations -Wno-stringop-overflow -Wno-psabi -Wno-error=pedantic -Wno-error=redundant-decls -Wno-error=old-style-cast -fdiagnostics-color=always -faligned-new -Wno-unused-but-set-variable -Wno-maybe-uninitialized -fno-math-errno -fno-trapping-math -Werror=format -Werror=cast-function-type -Wno-stringop-overflow, LAPACK_INFO=mkl, PERF_WITH_AVX=1, PERF_WITH_AVX2=1, PERF_WITH_AVX512=1, TORCH_VERSION=1.12.1, USE_CUDA=ON, USE_CUDNN=ON, USE_EXCEPTION_PTR=1, USE_GFLAGS=OFF, USE_GLOG=OFF, USE_MKL=ON, USE_MKLDNN=OFF, USE_MPI=OFF, USE_NCCL=ON, USE_NNPACK=ON, USE_OPENMP=ON, USE_ROCM=OFF,

TorchVision: 0.13.1
OpenCV: 4.7.0
MMCV: 1.6.0
MMCV Compiler: GCC 9.3
MMCV CUDA Compiler: 11.3
MMDetection: 2.24.1
MMSegmentation: 0.24.1
MMDetection3D: 1.0.0rc3+120a93d
spconv2.0: False

You may add additional information that may be helpful for locating the problem, such as
- How you installed PyTorch [e.g., pip, conda, source]
  Conda
- Other environment variables that may be related (such as $PATH, $LD_LIBRARY_PATH, $PYTHONPATH, etc.)

Error traceback
Traceback (most recent call last):
File "/home/sunhao/code2023/mmdetection3d/tools/train.py", line 284, in <module>
main()
File "/home/sunhao/code2023/mmdetection3d/tools/train.py", line 280, in main
meta=meta)
File "/home/sunhao/code2023/mmdetection3d/mmdet3d/apis/train.py", line 351, in train_model
meta=meta)
File "/home/sunhao/code2023/mmdetection3d/mmdet3d/apis/train.py", line 319, in train_detector
runner.run(data_loaders, cfg.workflow)
File "/home/sunhao/anaconda3/envs/tr3d/lib/python3.7/site-packages/mmcv/runner/epoch_based_runner.py", line 136, in run
epoch_runner(data_loaders[i], **kwargs)
File "/home/sunhao/anaconda3/envs/tr3d/lib/python3.7/site-packages/mmcv/runner/epoch_based_runner.py", line 53, in train
self.run_iter(data_batch, train_mode=True, **kwargs)
File "/home/sunhao/anaconda3/envs/tr3d/lib/python3.7/site-packages/mmcv/runner/epoch_based_runner.py", line 32, in run_iter
**kwargs)
File "/home/sunhao/anaconda3/envs/tr3d/lib/python3.7/site-packages/mmcv/parallel/data_parallel.py", line 77, in train_step
return self.module.train_step(*inputs[0], **kwargs[0])
File "/home/sunhao/anaconda3/envs/tr3d/lib/python3.7/site-packages/mmdet/models/detectors/base.py", line 248, in train_step
losses = self(**data)
File "/home/sunhao/anaconda3/envs/tr3d/lib/python3.7/site-packages/torch/nn/modules/module.py", line 1130, in _call_impl
return forward_call(*input, **kwargs)
File "/home/sunhao/anaconda3/envs/tr3d/lib/python3.7/site-packages/mmcv/runner/fp16_utils.py", line 116, in new_func
return old_func(*args, **kwargs)
File "/home/sunhao/code2023/mmdetection3d/mmdet3d/models/detectors/base.py", line 60, in forward
return self.forward_train(**kwargs)
File "/home/sunhao/code2023/mmdetection3d/mmdet3d/models/detectors/votenet.py", line 59, in forward_train
x = self.extract_feat(points_cat)
File "/home/sunhao/code2023/mmdetection3d/mmdet3d/models/detectors/single_stage.py", line 61, in extract_feat
x = self.backbone(points)
File "/home/sunhao/anaconda3/envs/tr3d/lib/python3.7/site-packages/torch/nn/modules/module.py", line 1130, in _call_impl
return forward_call(*input, **kwargs)
File "/home/sunhao/anaconda3/envs/tr3d/lib/python3.7/site-packages/mmcv/runner/fp16_utils.py", line 116, in new_func
return old_func(*args, **kwargs)
File "/home/sunhao/code2023/mmdetection3d/mmdet3d/models/backbones/pointnet2_sa_ssg.py", line 119, in forward
sa_xyz[i], sa_features[i])
File "/home/sunhao/anaconda3/envs/tr3d/lib/python3.7/site-packages/torch/nn/modules/module.py", line 1130, in _call_impl
return forward_call(*input, **kwargs)
File "/home/sunhao/code2023/mmdetection3d/mmdet3d/ops/pointnet_modules/point_sa_module.py", line 200, in forward
target_xyz)
File "/home/sunhao/code2023/mmdetection3d/mmdet3d/ops/pointnet_modules/point_sa_module.py", line 138, in _sample_points
indices = self.points_sampler(points_xyz, features)
File "/home/sunhao/anaconda3/envs/tr3d/lib/python3.7/site-packages/torch/nn/modules/module.py", line 1130, in _call_impl
return forward_call(*input, **kwargs)
File "/home/sunhao/anaconda3/envs/tr3d/lib/python3.7/site-packages/mmcv/runner/fp16_utils.py", line 205, in new_func
return old_func(*args, **kwargs)
File "/home/sunhao/anaconda3/envs/tr3d/lib/python3.7/site-packages/mmcv/ops/points_sampler.py", line 127, in forward
npoint)
File "/home/sunhao/anaconda3/envs/tr3d/lib/python3.7/site-packages/torch/nn/modules/module.py", line 1130, in _call_impl
return forward_call(*input, **kwargs)
File "/home/sunhao/anaconda3/envs/tr3d/lib/python3.7/site-packages/mmcv/ops/points_sampler.py", line 144, in forward
fps_idx = furthest_point_sample(points.contiguous(), npoint)
File "/home/sunhao/anaconda3/envs/tr3d/lib/python3.7/site-packages/mmcv/ops/furthest_point_sample.py", line 39, in forward
m=num_points,
RuntimeError: furthest_point_sampling_forward_impl: at param 1, inconsistent device: cuda:0 vs cuda:3

Exception raised from Dispatch at /tmp/mmcv/mmcv/ops/csrc/common/pytorch_device_registry.hpp:116 (most recent call first):
frame #0: c10::Error::Error(c10::SourceLocation, std::string) + 0x57 (0x7f06af200497 in /home/sunhao/anaconda3/envs/tr3d/lib/python3.7/site-packages/torch/lib/libc10.so)
frame #1: c10::detail::torchCheckFail(char const*, char const*, unsigned int, std::string const&) + 0x64 (0x7f06af1d7c94 in /home/sunhao/anaconda3/envs/tr3d/lib/python3.7/site-packages/torch/lib/libc10.so)
frame #2: auto Dispatch<DeviceRegistry<void ()(at::Tensor, at::Tensor, at::Tensor, int, int, int), &(furthest_point_sampling_forward_impl(at::Tensor, at::Tensor, at::Tensor, int, int, int))>, at::Tensor&, at::Tensor&, at::Tensor&, int&, int&, int&>(DeviceRegistry<void ()(at::Tensor, at::Tensor, at::Tensor, int, int, int), &(furthest_point_sampling_forward_impl(at::Tensor, at::Tensor, at::Tensor, int, int, int))> const&, char const*, at::Tensor&, at::Tensor&, at::Tensor&, int&, int&, int&) + 0x385 (0x7f05ef8c13b5 in /home/sunhao/anaconda3/envs/tr3d/lib/python3.7/site-packages/mmcv/_ext.cpython-37m-x86_64-linux-gnu.so)
frame #3: furthest_point_sampling_forward_impl(at::Tensor, at::Tensor, at::Tensor, int, int, int) + 0x62 (0x7f05ef8c0c32 in /home/sunhao/anaconda3/envs/tr3d/lib/python3.7/site-packages/mmcv/_ext.cpython-37m-x86_64-linux-gnu.so)
frame #4: furthest_point_sampling_forward(at::Tensor, at::Tensor, at::Tensor, int, int, int) + 0x69 (0x7f05ef8c0cd9 in /home/sunhao/anaconda3/envs/tr3d/lib/python3.7/site-packages/mmcv/_ext.cpython-37m-x86_64-linux-gnu.so)
frame #5: <unknown function> + 0x2c6e84 (0x7f05ef91ce84 in /home/sunhao/anaconda3/envs/tr3d/lib/python3.7/site-packages/mmcv/_ext.cpython-37m-x86_64-linux-gnu.so)
frame #6: <unknown function> + 0x2b4ba1 (0x7f05ef90aba1 in /home/sunhao/anaconda3/envs/tr3d/lib/python3.7/site-packages/mmcv/_ext.cpython-37m-x86_64-linux-gnu.so)
frame #7: _PyMethodDef_RawFastCallKeywords + 0x301 (0x4af061 in /home/sunhao/anaconda3/envs/tr3d/bin/python3.7)
frame #8: /home/sunhao/anaconda3/envs/tr3d/bin/python3.7() [0x4ae9f0]
frame #9: _PyEval_EvalFrameDefault + 0x15d6 (0x4a82b6 in /home/sunhao/anaconda3/envs/tr3d/bin/python3.7)
frame #10: _PyFunction_FastCallDict + 0x116 (0x4c0d96 in /home/sunhao/anaconda3/envs/tr3d/bin/python3.7)
frame #11: THPFunction_apply(_object*, _object*) + 0x5d6 (0x7f06ef136c96 in /home/sunhao/anaconda3/envs/tr3d/lib/python3.7/site-packages/torch/lib/libtorch_python.so)
frame #12: _PyMethodDef_RawFastCallKeywords + 0x1fb (0x4aef5b in /home/sunhao/anaconda3/envs/tr3d/bin/python3.7)
frame #13: /home/sunhao/anaconda3/envs/tr3d/bin/python3.7() [0x4ae9f0]
frame #14: _PyEval_EvalFrameDefault + 0x971 (0x4a7651 in /home/sunhao/anaconda3/envs/tr3d/bin/python3.7)
frame #15: _PyFunction_FastCallDict + 0x116 (0x4c0d96 in /home/sunhao/anaconda3/envs/tr3d/bin/python3.7)
frame #16: /home/sunhao/anaconda3/envs/tr3d/bin/python3.7() [0x4c9a80]
frame #17: PyObject_Call + 0x60 (0x4c7170 in /home/sunhao/anaconda3/envs/tr3d/bin/python3.7)
frame #18: _PyEval_EvalFrameDefault + 0x1ea4 (0x4a8b84 in /home/sunhao/anaconda3/envs/tr3d/bin/python3.7)
frame #19: _PyEval_EvalCodeWithName + 0x201 (0x4a5a81 in /home/sunhao/anaconda3/envs/tr3d/bin/python3.7)
frame #20: _PyFunction_FastCallDict + 0x2d7 (0x4c0f57 in /home/sunhao/anaconda3/envs/tr3d/bin/python3.7)
frame #21: _PyObject_Call_Prepend + 0x6e (0x4c6abe in /home/sunhao/anaconda3/envs/tr3d/bin/python3.7)
frame #22: /home/sunhao/anaconda3/envs/tr3d/bin/python3.7() [0x578b67]
frame #23: _PyObject_FastCallKeywords + 0x430 (0x4b77a0 in /home/sunhao/anaconda3/envs/tr3d/bin/python3.7)
frame #24: /home/sunhao/anaconda3/envs/tr3d/bin/python3.7() [0x4aea19]
frame #25: _PyEval_EvalFrameDefault + 0x971 (0x4a7651 in /home/sunhao/anaconda3/envs/tr3d/bin/python3.7)
frame #26: _PyFunction_FastCallDict + 0x116 (0x4c0d96 in /home/sunhao/anaconda3/envs/tr3d/bin/python3.7)
frame #27: _PyEval_EvalFrameDefault + 0x1ea4 (0x4a8b84 in /home/sunhao/anaconda3/envs/tr3d/bin/python3.7)
frame #28: _PyEval_EvalCodeWithName + 0x201 (0x4a5a81 in /home/sunhao/anaconda3/envs/tr3d/bin/python3.7)
frame #29: _PyFunction_FastCallDict + 0x2d7 (0x4c0f57 in /home/sunhao/anaconda3/envs/tr3d/bin/python3.7)
frame #30: /home/sunhao/anaconda3/envs/tr3d/bin/python3.7() [0x4c9a80]
frame #31: PyObject_Call + 0x60 (0x4c7170 in /home/sunhao/anaconda3/envs/tr3d/bin/python3.7)
frame #32: _PyEval_EvalFrameDefault + 0x1ea4 (0x4a8b84 in /home/sunhao/anaconda3/envs/tr3d/bin/python3.7)
frame #33: _PyEval_EvalCodeWithName + 0x201 (0x4a5a81 in /home/sunhao/anaconda3/envs/tr3d/bin/python3.7)
frame #34: _PyFunction_FastCallDict + 0x2d7 (0x4c0f57 in /home/sunhao/anaconda3/envs/tr3d/bin/python3.7)
frame #35: _PyObject_Call_Prepend + 0x6e (0x4c6abe in /home/sunhao/anaconda3/envs/tr3d/bin/python3.7)
frame #36: /home/sunhao/anaconda3/envs/tr3d/bin/python3.7() [0x578b67]
frame #37: _PyObject_FastCallKeywords + 0x430 (0x4b77a0 in /home/sunhao/anaconda3/envs/tr3d/bin/python3.7)
frame #38: /home/sunhao/anaconda3/envs/tr3d/bin/python3.7() [0x4aea19]
frame #39: _PyEval_EvalFrameDefault + 0x468a (0x4ab36a in /home/sunhao/anaconda3/envs/tr3d/bin/python3.7)
frame #40: _PyFunction_FastCallKeywords + 0x106 (0x4b9d16 in /home/sunhao/anaconda3/envs/tr3d/bin/python3.7)
frame #41: /home/sunhao/anaconda3/envs/tr3d/bin/python3.7() [0x4ae8df]
frame #42: _PyEval_EvalFrameDefault + 0x468a (0x4ab36a in /home/sunhao/anaconda3/envs/tr3d/bin/python3.7)
frame #43: _PyEval_EvalCodeWithName + 0x201 (0x4a5a81 in /home/sunhao/anaconda3/envs/tr3d/bin/python3.7)
frame #44: _PyFunction_FastCallDict + 0x2d7 (0x4c0f57 in /home/sunhao/anaconda3/envs/tr3d/bin/python3.7)
frame #45: /home/sunhao/anaconda3/envs/tr3d/bin/python3.7() [0x4c9a80]
frame #46: PyObject_Call + 0x60 (0x4c7170 in /home/sunhao/anaconda3/envs/tr3d/bin/python3.7)
frame #47: _PyEval_EvalFrameDefault + 0x1ea4 (0x4a8b84 in /home/sunhao/anaconda3/envs/tr3d/bin/python3.7)
frame #48: _PyEval_EvalCodeWithName + 0x201 (0x4a5a81 in /home/sunhao/anaconda3/envs/tr3d/bin/python3.7)
frame #49: _PyFunction_FastCallDict + 0x2d7 (0x4c0f57 in /home/sunhao/anaconda3/envs/tr3d/bin/python3.7)
frame #50: _PyObject_Call_Prepend + 0x6e (0x4c6abe in /home/sunhao/anaconda3/envs/tr3d/bin/python3.7)
frame #51: /home/sunhao/anaconda3/envs/tr3d/bin/python3.7() [0x578b67]
frame #52: _PyObject_FastCallKeywords + 0x430 (0x4b77a0 in /home/sunhao/anaconda3/envs/tr3d/bin/python3.7)
frame #53: /home/sunhao/anaconda3/envs/tr3d/bin/python3.7() [0x4aea19]
frame #54: _PyEval_EvalFrameDefault + 0x971 (0x4a7651 in /home/sunhao/anaconda3/envs/tr3d/bin/python3.7)
frame #55: _PyFunction_FastCallDict + 0x116 (0x4c0d96 in /home/sunhao/anaconda3/envs/tr3d/bin/python3.7)
frame #56: _PyEval_EvalFrameDefault + 0x1ea4 (0x4a8b84 in /home/sunhao/anaconda3/envs/tr3d/bin/python3.7)
frame #57: _PyEval_EvalCodeWithName + 0x201 (0x4a5a81 in /home/sunhao/anaconda3/envs/tr3d/bin/python3.7)
frame #58: _PyFunction_FastCallDict + 0x2d7 (0x4c0f57 in /home/sunhao/anaconda3/envs/tr3d/bin/python3.7)
frame #59: /home/sunhao/anaconda3/envs/tr3d/bin/python3.7() [0x4c9a80]
frame #60: PyObject_Call + 0x60 (0x4c7170 in /home/sunhao/anaconda3/envs/tr3d/bin/python3.7)
frame #61: _PyEval_EvalFrameDefault + 0x1ea4 (0x4a8b84 in /home/sunhao/anaconda3/envs/tr3d/bin/python3.7)
frame #62: _PyEval_EvalCodeWithName + 0x201 (0x4a5a81 in /home/sunhao/anaconda3/envs/tr3d/bin/python3.7)
frame #63: _PyFunction_FastCallDict + 0x2d7 (0x4c0f57 in /home/sunhao/anaconda3/envs/tr3d/bin/python3.7)
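
The final RuntimeError says that the furthest point sampling op received tensors on two different devices, cuda:0 and cuda:3. A workaround often used for this kind of mismatch, which has not been verified in this thread, is to expose only the target GPU through the CUDA_VISIBLE_DEVICES environment variable. The process then sees that GPU as device 0, the configuration reported to work above. A minimal sketch, where build_single_gpu_launch is a hypothetical helper name and the relative paths assume the mmdetection3d repository root as the working directory:

```python
import os
import sys


def build_single_gpu_launch(config_path, physical_gpu, base_env=None):
    """Return (command, env) that pins a training run to one physical GPU.

    Only `physical_gpu` is made visible to the child process, so inside that
    process it is enumerated as device 0 and --gpu-id can stay at 0.
    This helper is illustrative and not part of MMDetection3D.
    """
    # Copy the environment so the caller's environment is not modified.
    env = dict(os.environ if base_env is None else base_env)
    env["CUDA_VISIBLE_DEVICES"] = str(physical_gpu)
    command = [sys.executable, "tools/train.py", config_path, "--gpu-id", "0"]
    return command, env


if __name__ == "__main__":
    cmd, env = build_single_gpu_launch(
        "configs/votenet/votenet_16x8_sunrgbd-3d-10class.py", physical_gpu=3
    )
    # To actually start training: subprocess.run(cmd, env=env, check=True)
    print("CUDA_VISIBLE_DEVICES=" + env["CUDA_VISIBLE_DEVICES"], " ".join(cmd))
```

The variable has to be set before the training process initialises CUDA, which is why the sketch builds an environment for a child process instead of changing it after startup.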


jumptiger66 · 2023-04-27T06:44:14Z

I switched to the latest mmdetection3d version (1.1.0), and the bug is resolved.
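
For anyone checking whether an existing environment predates that release, here is a small sketch that compares a version string such as the one in the environment report against 1.1.0. The helper names release_tuple and at_least_fixed_release are hypothetical. Only the leading numeric release is compared, so pre-release tags and local build suffixes are ignored. Whether 1.1.0 pre-releases also contain the fix is not stated in this thread.

```python
import re

# Release that the reporter says no longer shows the error.
FIXED_RELEASE = (1, 1, 0)


def release_tuple(version):
    """Extract the leading numeric release, e.g. '1.0.0rc3+120a93d' -> (1, 0, 0)."""
    match = re.match(r"(\d+)\.(\d+)\.(\d+)", version)
    if match is None:
        raise ValueError(f"unrecognised version string: {version!r}")
    return tuple(int(part) for part in match.groups())


def at_least_fixed_release(version):
    """True if the numeric release is 1.1.0 or newer (suffixes are ignored)."""
    return release_tuple(version) >= FIXED_RELEASE


if __name__ == "__main__":
    # Version string taken from the environment report in this issue.
    print(at_least_fixed_release("1.0.0rc3+120a93d"))
```

With the version from the report above, the check returns False, which matches the environment in which the error was observed.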

JingweiZhang12 assigned Xiangxu-0103 Apr 26, 2023

jumptiger66 closed this as completed Apr 27, 2023
