Setting --gpu-id 3 when training VoteNet on ScanNet raises a device error #2471

Closed
jumptiger66 opened this issue Apr 25, 2023 · 1 comment

@jumptiger66

Thanks for your error report and we appreciate it a lot.

Checklist

  1. I have searched related issues but cannot get the expected help.
  2. The bug has not been fixed in the latest version.

Describe the bug
I want to train VoteNet (ICCV 2019) on ScanNet/SUN RGB-D. Training runs without any problems on the default GPU 0. However, when I select a different GPU (1, 2, or 3; I train on a single GPU in every case), training fails with a device-mismatch error (see the traceback below).

Reproduction

  1. What command or script did you run?
    python tools/train.py '/home/sunhao/code2023/mmdetection3d/configs/votenet/votenet_16x8_sunrgbd-3d-10class.py' --gpu-id 3
  2. Did you make any modifications on the code or config? Did you understand what you have modified?
    No modifications.

  3. What dataset did you use?
    ScanNet/SunRGBD

Environment

  1. Please run python mmdet3d/utils/collect_env.py to collect necessary environment information and paste it here.
    sys.platform: linux
    Python: 3.7.16 (default, Jan 17 2023, 22:20:44) [GCC 11.2.0]
    CUDA available: True
    GPU 0,1,2,3: NVIDIA GeForce RTX 3090
    CUDA_HOME: /usr/local/cuda-11.5
    NVCC: Cuda compilation tools, release 11.5, V11.5.50
    GCC: gcc (Ubuntu 7.5.0-3ubuntu1~18.04) 7.5.0
    PyTorch: 1.12.1
    PyTorch compiling details: PyTorch built with:
  • GCC 9.3
  • C++ Version: 201402
  • Intel(R) oneAPI Math Kernel Library Version 2021.4-Product Build 20210904 for Intel(R) 64 architecture applications
  • Intel(R) MKL-DNN v2.6.0 (Git Hash 52b5f107dd9cf10910aaa19cb47f3abf9b349815)
  • OpenMP 201511 (a.k.a. OpenMP 4.5)
  • LAPACK is enabled (usually provided by MKL)
  • NNPACK is enabled
  • CPU capability usage: AVX2
  • CUDA Runtime 11.3
  • NVCC architecture flags: -gencode;arch=compute_37,code=sm_37;-gencode;arch=compute_50,code=sm_50;-gencode;arch=compute_60,code=sm_60;-gencode;arch=compute_61,code=sm_61;-gencode;arch=compute_70,code=sm_70;-gencode;arch=compute_75,code=sm_75;-gencode;arch=compute_80,code=sm_80;-gencode;arch=compute_86,code=sm_86;-gencode;arch=compute_37,code=compute_37
  • CuDNN 8.3.2 (built against CUDA 11.5)
  • Magma 2.5.2
  • Build settings: BLAS_INFO=mkl, BUILD_TYPE=Release, CUDA_VERSION=11.3, CUDNN_VERSION=8.3.2, CXX_COMPILER=/opt/rh/devtoolset-9/root/usr/bin/c++, CXX_FLAGS= -fabi-version=11 -Wno-deprecated -fvisibility-inlines-hidden -DUSE_PTHREADPOOL -fopenmp -DNDEBUG -DUSE_KINETO -DUSE_FBGEMM -DUSE_QNNPACK -DUSE_PYTORCH_QNNPACK -DUSE_XNNPACK -DSYMBOLICATE_MOBILE_DEBUG_HANDLE -DEDGE_PROFILER_USE_KINETO -O2 -fPIC -Wno-narrowing -Wall -Wextra -Werror=return-type -Wno-missing-field-initializers -Wno-type-limits -Wno-array-bounds -Wno-unknown-pragmas -Wno-unused-parameter -Wno-unused-function -Wno-unused-result -Wno-unused-local-typedefs -Wno-strict-overflow -Wno-strict-aliasing -Wno-error=deprecated-declarations -Wno-stringop-overflow -Wno-psabi -Wno-error=pedantic -Wno-error=redundant-decls -Wno-error=old-style-cast -fdiagnostics-color=always -faligned-new -Wno-unused-but-set-variable -Wno-maybe-uninitialized -fno-math-errno -fno-trapping-math -Werror=format -Werror=cast-function-type -Wno-stringop-overflow, LAPACK_INFO=mkl, PERF_WITH_AVX=1, PERF_WITH_AVX2=1, PERF_WITH_AVX512=1, TORCH_VERSION=1.12.1, USE_CUDA=ON, USE_CUDNN=ON, USE_EXCEPTION_PTR=1, USE_GFLAGS=OFF, USE_GLOG=OFF, USE_MKL=ON, USE_MKLDNN=OFF, USE_MPI=OFF, USE_NCCL=ON, USE_NNPACK=ON, USE_OPENMP=ON, USE_ROCM=OFF,

TorchVision: 0.13.1
OpenCV: 4.7.0
MMCV: 1.6.0
MMCV Compiler: GCC 9.3
MMCV CUDA Compiler: 11.3
MMDetection: 2.24.1
MMSegmentation: 0.24.1
MMDetection3D: 1.0.0rc3+120a93d
spconv2.0: False

  2. You may add additional information that may be helpful for locating the problem, such as
    • How you installed PyTorch [e.g., pip, conda, source]
      Conda
    • Other environment variables that may be related (such as $PATH, $LD_LIBRARY_PATH, $PYTHONPATH, etc.)

Error traceback
Traceback (most recent call last):
File "/home/sunhao/code2023/mmdetection3d/tools/train.py", line 284, in <module>
main()
File "/home/sunhao/code2023/mmdetection3d/tools/train.py", line 280, in main
meta=meta)
File "/home/sunhao/code2023/mmdetection3d/mmdet3d/apis/train.py", line 351, in train_model
meta=meta)
File "/home/sunhao/code2023/mmdetection3d/mmdet3d/apis/train.py", line 319, in train_detector
runner.run(data_loaders, cfg.workflow)
File "/home/sunhao/anaconda3/envs/tr3d/lib/python3.7/site-packages/mmcv/runner/epoch_based_runner.py", line 136, in run
epoch_runner(data_loaders[i], **kwargs)
File "/home/sunhao/anaconda3/envs/tr3d/lib/python3.7/site-packages/mmcv/runner/epoch_based_runner.py", line 53, in train
self.run_iter(data_batch, train_mode=True, **kwargs)
File "/home/sunhao/anaconda3/envs/tr3d/lib/python3.7/site-packages/mmcv/runner/epoch_based_runner.py", line 32, in run_iter
**kwargs)
File "/home/sunhao/anaconda3/envs/tr3d/lib/python3.7/site-packages/mmcv/parallel/data_parallel.py", line 77, in train_step
return self.module.train_step(*inputs[0], **kwargs[0])
File "/home/sunhao/anaconda3/envs/tr3d/lib/python3.7/site-packages/mmdet/models/detectors/base.py", line 248, in train_step
losses = self(**data)
File "/home/sunhao/anaconda3/envs/tr3d/lib/python3.7/site-packages/torch/nn/modules/module.py", line 1130, in _call_impl
return forward_call(*input, **kwargs)
File "/home/sunhao/anaconda3/envs/tr3d/lib/python3.7/site-packages/mmcv/runner/fp16_utils.py", line 116, in new_func
return old_func(*args, **kwargs)
File "/home/sunhao/code2023/mmdetection3d/mmdet3d/models/detectors/base.py", line 60, in forward
return self.forward_train(**kwargs)
File "/home/sunhao/code2023/mmdetection3d/mmdet3d/models/detectors/votenet.py", line 59, in forward_train
x = self.extract_feat(points_cat)
File "/home/sunhao/code2023/mmdetection3d/mmdet3d/models/detectors/single_stage.py", line 61, in extract_feat
x = self.backbone(points)
File "/home/sunhao/anaconda3/envs/tr3d/lib/python3.7/site-packages/torch/nn/modules/module.py", line 1130, in _call_impl
return forward_call(*input, **kwargs)
File "/home/sunhao/anaconda3/envs/tr3d/lib/python3.7/site-packages/mmcv/runner/fp16_utils.py", line 116, in new_func
return old_func(*args, **kwargs)
File "/home/sunhao/code2023/mmdetection3d/mmdet3d/models/backbones/pointnet2_sa_ssg.py", line 119, in forward
sa_xyz[i], sa_features[i])
File "/home/sunhao/anaconda3/envs/tr3d/lib/python3.7/site-packages/torch/nn/modules/module.py", line 1130, in _call_impl
return forward_call(*input, **kwargs)
File "/home/sunhao/code2023/mmdetection3d/mmdet3d/ops/pointnet_modules/point_sa_module.py", line 200, in forward
target_xyz)
File "/home/sunhao/code2023/mmdetection3d/mmdet3d/ops/pointnet_modules/point_sa_module.py", line 138, in _sample_points
indices = self.points_sampler(points_xyz, features)
File "/home/sunhao/anaconda3/envs/tr3d/lib/python3.7/site-packages/torch/nn/modules/module.py", line 1130, in _call_impl
return forward_call(*input, **kwargs)
File "/home/sunhao/anaconda3/envs/tr3d/lib/python3.7/site-packages/mmcv/runner/fp16_utils.py", line 205, in new_func
return old_func(*args, **kwargs)
File "/home/sunhao/anaconda3/envs/tr3d/lib/python3.7/site-packages/mmcv/ops/points_sampler.py", line 127, in forward
npoint)
File "/home/sunhao/anaconda3/envs/tr3d/lib/python3.7/site-packages/torch/nn/modules/module.py", line 1130, in _call_impl
return forward_call(*input, **kwargs)
File "/home/sunhao/anaconda3/envs/tr3d/lib/python3.7/site-packages/mmcv/ops/points_sampler.py", line 144, in forward
fps_idx = furthest_point_sample(points.contiguous(), npoint)
File "/home/sunhao/anaconda3/envs/tr3d/lib/python3.7/site-packages/mmcv/ops/furthest_point_sample.py", line 39, in forward
m=num_points,
RuntimeError: furthest_point_sampling_forward_impl: at param 1, inconsistent device: cuda:0 vs cuda:3

Exception raised from Dispatch at /tmp/mmcv/mmcv/ops/csrc/common/pytorch_device_registry.hpp:116 (most recent call first):
frame #0: c10::Error::Error(c10::SourceLocation, std::string) + 0x57 (0x7f06af200497 in /home/sunhao/anaconda3/envs/tr3d/lib/python3.7/site-packages/torch/lib/libc10.so)
frame #1: c10::detail::torchCheckFail(char const*, char const*, unsigned int, std::string const&) + 0x64 (0x7f06af1d7c94 in /home/sunhao/anaconda3/envs/tr3d/lib/python3.7/site-packages/torch/lib/libc10.so)
frame #2: auto Dispatch<DeviceRegistry<void ()(at::Tensor, at::Tensor, at::Tensor, int, int, int), &(furthest_point_sampling_forward_impl(at::Tensor, at::Tensor, at::Tensor, int, int, int))>, at::Tensor&, at::Tensor&, at::Tensor&, int&, int&, int&>(DeviceRegistry<void ()(at::Tensor, at::Tensor, at::Tensor, int, int, int), &(furthest_point_sampling_forward_impl(at::Tensor, at::Tensor, at::Tensor, int, int, int))> const&, char const*, at::Tensor&, at::Tensor&, at::Tensor&, int&, int&, int&) + 0x385 (0x7f05ef8c13b5 in /home/sunhao/anaconda3/envs/tr3d/lib/python3.7/site-packages/mmcv/_ext.cpython-37m-x86_64-linux-gnu.so)
frame #3: furthest_point_sampling_forward_impl(at::Tensor, at::Tensor, at::Tensor, int, int, int) + 0x62 (0x7f05ef8c0c32 in /home/sunhao/anaconda3/envs/tr3d/lib/python3.7/site-packages/mmcv/_ext.cpython-37m-x86_64-linux-gnu.so)
frame #4: furthest_point_sampling_forward(at::Tensor, at::Tensor, at::Tensor, int, int, int) + 0x69 (0x7f05ef8c0cd9 in /home/sunhao/anaconda3/envs/tr3d/lib/python3.7/site-packages/mmcv/_ext.cpython-37m-x86_64-linux-gnu.so)
frame #5: <unknown function> + 0x2c6e84 (0x7f05ef91ce84 in /home/sunhao/anaconda3/envs/tr3d/lib/python3.7/site-packages/mmcv/_ext.cpython-37m-x86_64-linux-gnu.so)
frame #6: <unknown function> + 0x2b4ba1 (0x7f05ef90aba1 in /home/sunhao/anaconda3/envs/tr3d/lib/python3.7/site-packages/mmcv/_ext.cpython-37m-x86_64-linux-gnu.so)
frame #7: _PyMethodDef_RawFastCallKeywords + 0x301 (0x4af061 in /home/sunhao/anaconda3/envs/tr3d/bin/python3.7)
frame #8: /home/sunhao/anaconda3/envs/tr3d/bin/python3.7() [0x4ae9f0]
frame #9: _PyEval_EvalFrameDefault + 0x15d6 (0x4a82b6 in /home/sunhao/anaconda3/envs/tr3d/bin/python3.7)
frame #10: _PyFunction_FastCallDict + 0x116 (0x4c0d96 in /home/sunhao/anaconda3/envs/tr3d/bin/python3.7)
frame #11: THPFunction_apply(_object*, _object*) + 0x5d6 (0x7f06ef136c96 in /home/sunhao/anaconda3/envs/tr3d/lib/python3.7/site-packages/torch/lib/libtorch_python.so)
frame #12: _PyMethodDef_RawFastCallKeywords + 0x1fb (0x4aef5b in /home/sunhao/anaconda3/envs/tr3d/bin/python3.7)
frame #13: /home/sunhao/anaconda3/envs/tr3d/bin/python3.7() [0x4ae9f0]
frame #14: _PyEval_EvalFrameDefault + 0x971 (0x4a7651 in /home/sunhao/anaconda3/envs/tr3d/bin/python3.7)
frame #15: _PyFunction_FastCallDict + 0x116 (0x4c0d96 in /home/sunhao/anaconda3/envs/tr3d/bin/python3.7)
frame #16: /home/sunhao/anaconda3/envs/tr3d/bin/python3.7() [0x4c9a80]
frame #17: PyObject_Call + 0x60 (0x4c7170 in /home/sunhao/anaconda3/envs/tr3d/bin/python3.7)
frame #18: _PyEval_EvalFrameDefault + 0x1ea4 (0x4a8b84 in /home/sunhao/anaconda3/envs/tr3d/bin/python3.7)
frame #19: _PyEval_EvalCodeWithName + 0x201 (0x4a5a81 in /home/sunhao/anaconda3/envs/tr3d/bin/python3.7)
frame #20: _PyFunction_FastCallDict + 0x2d7 (0x4c0f57 in /home/sunhao/anaconda3/envs/tr3d/bin/python3.7)
frame #21: _PyObject_Call_Prepend + 0x6e (0x4c6abe in /home/sunhao/anaconda3/envs/tr3d/bin/python3.7)
frame #22: /home/sunhao/anaconda3/envs/tr3d/bin/python3.7() [0x578b67]
frame #23: _PyObject_FastCallKeywords + 0x430 (0x4b77a0 in /home/sunhao/anaconda3/envs/tr3d/bin/python3.7)
frame #24: /home/sunhao/anaconda3/envs/tr3d/bin/python3.7() [0x4aea19]
frame #25: _PyEval_EvalFrameDefault + 0x971 (0x4a7651 in /home/sunhao/anaconda3/envs/tr3d/bin/python3.7)
frame #26: _PyFunction_FastCallDict + 0x116 (0x4c0d96 in /home/sunhao/anaconda3/envs/tr3d/bin/python3.7)
frame #27: _PyEval_EvalFrameDefault + 0x1ea4 (0x4a8b84 in /home/sunhao/anaconda3/envs/tr3d/bin/python3.7)
frame #28: _PyEval_EvalCodeWithName + 0x201 (0x4a5a81 in /home/sunhao/anaconda3/envs/tr3d/bin/python3.7)
frame #29: _PyFunction_FastCallDict + 0x2d7 (0x4c0f57 in /home/sunhao/anaconda3/envs/tr3d/bin/python3.7)
frame #30: /home/sunhao/anaconda3/envs/tr3d/bin/python3.7() [0x4c9a80]
frame #31: PyObject_Call + 0x60 (0x4c7170 in /home/sunhao/anaconda3/envs/tr3d/bin/python3.7)
frame #32: _PyEval_EvalFrameDefault + 0x1ea4 (0x4a8b84 in /home/sunhao/anaconda3/envs/tr3d/bin/python3.7)
frame #33: _PyEval_EvalCodeWithName + 0x201 (0x4a5a81 in /home/sunhao/anaconda3/envs/tr3d/bin/python3.7)
frame #34: _PyFunction_FastCallDict + 0x2d7 (0x4c0f57 in /home/sunhao/anaconda3/envs/tr3d/bin/python3.7)
frame #35: _PyObject_Call_Prepend + 0x6e (0x4c6abe in /home/sunhao/anaconda3/envs/tr3d/bin/python3.7)
frame #36: /home/sunhao/anaconda3/envs/tr3d/bin/python3.7() [0x578b67]
frame #37: _PyObject_FastCallKeywords + 0x430 (0x4b77a0 in /home/sunhao/anaconda3/envs/tr3d/bin/python3.7)
frame #38: /home/sunhao/anaconda3/envs/tr3d/bin/python3.7() [0x4aea19]
frame #39: _PyEval_EvalFrameDefault + 0x468a (0x4ab36a in /home/sunhao/anaconda3/envs/tr3d/bin/python3.7)
frame #40: _PyFunction_FastCallKeywords + 0x106 (0x4b9d16 in /home/sunhao/anaconda3/envs/tr3d/bin/python3.7)
frame #41: /home/sunhao/anaconda3/envs/tr3d/bin/python3.7() [0x4ae8df]
frame #42: _PyEval_EvalFrameDefault + 0x468a (0x4ab36a in /home/sunhao/anaconda3/envs/tr3d/bin/python3.7)
frame #43: _PyEval_EvalCodeWithName + 0x201 (0x4a5a81 in /home/sunhao/anaconda3/envs/tr3d/bin/python3.7)
frame #44: _PyFunction_FastCallDict + 0x2d7 (0x4c0f57 in /home/sunhao/anaconda3/envs/tr3d/bin/python3.7)
frame #45: /home/sunhao/anaconda3/envs/tr3d/bin/python3.7() [0x4c9a80]
frame #46: PyObject_Call + 0x60 (0x4c7170 in /home/sunhao/anaconda3/envs/tr3d/bin/python3.7)
frame #47: _PyEval_EvalFrameDefault + 0x1ea4 (0x4a8b84 in /home/sunhao/anaconda3/envs/tr3d/bin/python3.7)
frame #48: _PyEval_EvalCodeWithName + 0x201 (0x4a5a81 in /home/sunhao/anaconda3/envs/tr3d/bin/python3.7)
frame #49: _PyFunction_FastCallDict + 0x2d7 (0x4c0f57 in /home/sunhao/anaconda3/envs/tr3d/bin/python3.7)
frame #50: _PyObject_Call_Prepend + 0x6e (0x4c6abe in /home/sunhao/anaconda3/envs/tr3d/bin/python3.7)
frame #51: /home/sunhao/anaconda3/envs/tr3d/bin/python3.7() [0x578b67]
frame #52: _PyObject_FastCallKeywords + 0x430 (0x4b77a0 in /home/sunhao/anaconda3/envs/tr3d/bin/python3.7)
frame #53: /home/sunhao/anaconda3/envs/tr3d/bin/python3.7() [0x4aea19]
frame #54: _PyEval_EvalFrameDefault + 0x971 (0x4a7651 in /home/sunhao/anaconda3/envs/tr3d/bin/python3.7)
frame #55: _PyFunction_FastCallDict + 0x116 (0x4c0d96 in /home/sunhao/anaconda3/envs/tr3d/bin/python3.7)
frame #56: _PyEval_EvalFrameDefault + 0x1ea4 (0x4a8b84 in /home/sunhao/anaconda3/envs/tr3d/bin/python3.7)
frame #57: _PyEval_EvalCodeWithName + 0x201 (0x4a5a81 in /home/sunhao/anaconda3/envs/tr3d/bin/python3.7)
frame #58: _PyFunction_FastCallDict + 0x2d7 (0x4c0f57 in /home/sunhao/anaconda3/envs/tr3d/bin/python3.7)
frame #59: /home/sunhao/anaconda3/envs/tr3d/bin/python3.7() [0x4c9a80]
frame #60: PyObject_Call + 0x60 (0x4c7170 in /home/sunhao/anaconda3/envs/tr3d/bin/python3.7)
frame #61: _PyEval_EvalFrameDefault + 0x1ea4 (0x4a8b84 in /home/sunhao/anaconda3/envs/tr3d/bin/python3.7)
frame #62: _PyEval_EvalCodeWithName + 0x201 (0x4a5a81 in /home/sunhao/anaconda3/envs/tr3d/bin/python3.7)
frame #63: _PyFunction_FastCallDict + 0x2d7 (0x4c0f57 in /home/sunhao/anaconda3/envs/tr3d/bin/python3.7)
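
The final RuntimeError says that the tensors passed to furthest_point_sampling_forward_impl do not all live on the same GPU: one is on cuda:0 (the default device) and another on cuda:3 (presumably the device selected with --gpu-id 3). The sketch below is a simplified, pure-Python model of that kind of consistency check, with devices represented as plain strings. `first_inconsistent_device` is a hypothetical helper written for illustration, not MMCV's actual implementation, and the example device list is an assumed scenario.

```python
from typing import Optional, Sequence, Tuple


def first_inconsistent_device(devices: Sequence[str]) -> Optional[Tuple[int, str]]:
    """Return (index, device) of the first argument whose device differs
    from the device of the first argument, or None if all devices agree."""
    if not devices:
        return None
    reference = devices[0]
    for index, device in enumerate(devices):
        if device != reference:
            return index, device
    return None


# Assumed scenario: the input points were moved to cuda:3, while helper
# buffers were allocated on the process's default device, cuda:0.
mismatch = first_inconsistent_device(["cuda:3", "cuda:0", "cuda:0"])
print(mismatch)
```

In this model, a non-None result corresponds to the "at param N, inconsistent device" failure: the operator refuses to run because its arguments are spread across GPUs.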


Bug fix

@jumptiger66

I switched to the latest MMDetection3D version (1.1.0), and the bug is resolved.
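
For setups that cannot upgrade, a common general workaround for single-GPU jobs is to expose only the desired physical GPU through the CUDA_VISIBLE_DEVICES environment variable, so that it appears inside the process as device 0 and the default --gpu-id can be kept. This was not tested in this thread. The sketch below only assembles the environment and the command; `single_gpu_env` and `build_train_command` are hypothetical helper names, and the relative config path is an assumption based on the command in the report.

```python
import os
from typing import Dict, List, Mapping, Optional


def single_gpu_env(physical_gpu: int,
                   base_env: Optional[Mapping[str, str]] = None) -> Dict[str, str]:
    """Copy an environment and restrict CUDA to one physical GPU.

    Inside the child process, the selected GPU is then visible as cuda:0.
    """
    env = dict(os.environ if base_env is None else base_env)
    env["CUDA_VISIBLE_DEVICES"] = str(physical_gpu)
    return env


def build_train_command(config_path: str) -> List[str]:
    """Build the training command without --gpu-id (the default GPU 0 is used)."""
    return ["python", "tools/train.py", config_path]


# Assumed example values; adjust the GPU index and config path as needed.
env = single_gpu_env(3, base_env={})
cmd = build_train_command("configs/votenet/votenet_16x8_sunrgbd-3d-10class.py")
print(env["CUDA_VISIBLE_DEVICES"], " ".join(cmd))

# To actually launch training from the repository root, something like:
#   import subprocess
#   subprocess.run(cmd, env=single_gpu_env(3), check=True)
```

The same effect can be had from a shell by prefixing the training command with CUDA_VISIBLE_DEVICES=3.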
