CUDA error when the depth value is large. #47

@jarvishou829

I recorded a data sequence myself and ran the code on it. After processing about 800 frames, the following error appears. It seems that the size of map_states["voxel_vertex_idx"] and map_states["voxel_center_xyz"] exceeds num_embeddings, which is set to 20000 in the config file. When I set num_embeddings to 40000, the error appears again after 1400+ frames. How can I solve this correctly? I find that when the depth values are large, map_states["voxel_vertex_idx"] and map_states["voxel_center_xyz"] tend to grow large as well. When I scale the depth values to 0.5 of their original values, the error no longer appears, but the rendered result is not good.

../aten/src/ATen/native/cuda/Indexing.cu:975: indexSelectLargeIndex: block: [164,0,0], thread: [123,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
../aten/src/ATen/native/cuda/Indexing.cu:975: indexSelectLargeIndex: block: [164,0,0], thread: [124,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
../aten/src/ATen/native/cuda/Indexing.cu:975: indexSelectLargeIndex: block: [164,0,0], thread: [125,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
../aten/src/ATen/native/cuda/Indexing.cu:975: indexSelectLargeIndex: block: [164,0,0], thread: [126,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
../aten/src/ATen/native/cuda/Indexing.cu:975: indexSelectLargeIndex: block: [164,0,0], thread: [127,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
Process Process-2:
Traceback (most recent call last):
  File "/home/user/miniconda3/envs/voxfusion/lib/python3.8/multiprocessing/process.py", line 315, in _bootstrap
    self.run()
  File "/home/user/miniconda3/envs/voxfusion/lib/python3.8/multiprocessing/process.py", line 108, in run
    self._target(*self._args, **self._kwargs)
  File "/home/user/nerf_ws/ori/voxfusion/src/mapping.py", line 128, in spin
    self.do_mapping(share_data, tracked_frame, writer=writer)
  File "/home/user/nerf_ws/ori/voxfusion/src/mapping.py", line 182, in do_mapping
    bundle_adjust_frames(
  File "/home/user/nerf_ws/ori/voxfusion/src/utils/renderer.py", line 496, in bundle_adjust_frames
    final_outputs = render_rays(
  File "/home/user/nerf_ws/ori/voxfusion/src/utils/renderer.py", line 288, in render_rays
    chunk_inputs = get_features(chunk_samples, map_states, voxel_size)
  File "/home/user/miniconda3/envs/voxfusion/lib/python3.8/site-packages/torch/autograd/grad_mode.py", line 27, in decorate_context
    return func(*args, **kwargs)
  File "/home/user/nerf_ws/ori/voxfusion/src/utils/renderer.py", line 96, in get_features
    point_feats = F.embedding(F.embedding(
  File "/home/user/miniconda3/envs/voxfusion/lib/python3.8/site-packages/torch/nn/functional.py", line 2199, in embedding
    return torch.embedding(weight, input, padding_idx, scale_grad_by_freq, sparse)
RuntimeError: CUDA error: device-side assert triggered
terminate called after throwing an instance of 'c10::CUDAError'
  what():  CUDA error: device-side assert triggered
Exception raised from operator() at ../c10/cuda/CUDACachingAllocator.cpp:1808 (most recent call first):
frame #0: c10::Error::Error(c10::SourceLocation, std::string) + 0x3e (0x7f29bcada20e in /home/user/miniconda3/envs/voxfusion/lib/python3.8/site-packages/torch/lib/libc10.so)
frame #1: <unknown function> + 0x2759b (0x7f29bcb5559b in /home/user/miniconda3/envs/voxfusion/lib/python3.8/site-packages/torch/lib/libc10_cuda.so)
frame #2: <unknown function> + 0x27621 (0x7f29bcb55621 in /home/user/miniconda3/envs/voxfusion/lib/python3.8/site-packages/torch/lib/libc10_cuda.so)
frame #3: <unknown function> + 0x608180 (0x7f29afd28180 in /home/user/miniconda3/envs/voxfusion/lib/python3.8/site-packages/torch/lib/libtorch_python.so)
frame #4: <unknown function> + 0x4669f8 (0x7f29afb869f8 in /home/user/miniconda3/envs/voxfusion/lib/python3.8/site-packages/torch/lib/libtorch_python.so)
frame #5: c10::TensorImpl::release_resources() + 0x175 (0x7f29bcac17a5 in /home/user/miniconda3/envs/voxfusion/lib/python3.8/site-packages/torch/lib/libc10.so)
frame #6: <unknown function> + 0x3628c5 (0x7f29afa828c5 in /home/user/miniconda3/envs/voxfusion/lib/python3.8/site-packages/torch/lib/libtorch_python.so)
frame #7: <unknown function> + 0x67ca08 (0x7f29afd9ca08 in /home/user/miniconda3/envs/voxfusion/lib/python3.8/site-packages/torch/lib/libtorch_python.so)
frame #8: THPVariable_subclass_dealloc(_object*) + 0x2d5 (0x7f29afd9cdd5 in /home/user/miniconda3/envs/voxfusion/lib/python3.8/site-packages/torch/lib/libtorch_python.so)
frame #9: <unknown function> + 0x114b78 (0x55f151578b78 in /home/user/miniconda3/envs/voxfusion/bin/python)
frame #10: <unknown function> + 0x13b248 (0x55f15159f248 in /home/user/miniconda3/envs/voxfusion/bin/python)
frame #11: <unknown function> + 0x121e38 (0x55f151585e38 in /home/user/miniconda3/envs/voxfusion/bin/python)
frame #12: <unknown function> + 0x1330d8 (0x55f1515970d8 in /home/user/miniconda3/envs/voxfusion/bin/python)
frame #13: <unknown function> + 0x1330c1 (0x55f1515970c1 in /home/user/miniconda3/envs/voxfusion/bin/python)
frame #14: <unknown function> + 0x1330c1 (0x55f1515970c1 in /home/user/miniconda3/envs/voxfusion/bin/python)
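
To illustrate why larger depth values could push the voxel count past num_embeddings, here is a minimal sketch that is not taken from the voxfusion code. It back-projects a synthetic flat wall through a hypothetical pinhole camera (the intrinsics, image size, and the helper name count_voxels are assumptions made up for this example) and counts how many distinct voxels the wall touches. Under these assumptions, doubling the depth with a fixed voxel_size quadruples the number of voxels, and halving the depth has the same effect on the count as doubling voxel_size.

```python
import math


def count_voxels(depth_m, voxel_size, width=64, height=48, fx=64.0, fy=64.0):
    """Count distinct voxels hit by a flat wall seen at a constant depth.

    Hypothetical pinhole camera; all parameters are illustrative only.
    """
    cx, cy = width / 2.0, height / 2.0
    voxels = set()
    for v in range(height):
        for u in range(width):
            # Back-project pixel (u, v) at the given depth into camera space.
            x = (u - cx) / fx * depth_m
            y = (v - cy) / fy * depth_m
            z = depth_m
            # Quantize the 3D point to integer voxel coordinates.
            voxels.add((
                math.floor(x / voxel_size),
                math.floor(y / voxel_size),
                math.floor(z / voxel_size),
            ))
    return len(voxels)


near = count_voxels(depth_m=2.0, voxel_size=0.25)
far = count_voxels(depth_m=4.0, voxel_size=0.25)
far_coarse = count_voxels(depth_m=4.0, voxel_size=0.5)

print(near, far, far_coarse)  # 48 192 48
```

If the real map behaves similarly, a scene recorded at larger depths would need either a larger num_embeddings or a larger voxel_size to stay within the embedding table, and a cap that works for 800 frames may still be exceeded as more of the scene is observed.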
