Unexpected segmentation fault encountered in worker. #3

@YuChengHsieh

Thank you for the nice work. During training I get "Unexpected segmentation fault encountered in worker", and the crash persists even with num_workers set to 0. The error messages are listed below. I am using an NVIDIA Tesla V100. Do you have any idea what causes this? Thanks.

batch_size: 8
train_set_size: 403
LR 0.000100
Epoch:   0%|                                                                                                       | 0/75 [00:00<?, ?ep/sERROR: Unexpected segmentation fault encountered in worker.                                                         | 0/51 [00:00<?, ?it/s]
ERROR: Unexpected segmentation fault encountered in worker.
ERROR: Unexpected segmentation fault encountered in worker.
ERROR: Unexpected segmentation fault encountered in worker.
ERROR: Unexpected segmentation fault encountered in worker.
ERROR: Unexpected segmentation fault encountered in worker.
Train ep1:   0%|                                                                                                   | 0/51 [00:00<?, ?it/s]
Epoch:   0%|                                                                                                       | 0/75 [00:01<?, ?ep/s]
Traceback (most recent call last):
  File "/home/u111061517/.local/lib/python3.7/site-packages/torch/utils/data/dataloader.py", line 761, in _try_get_data
    data = self._data_queue.get(timeout=timeout)
  File "/home/u111061517/miniconda3/envs/dmhnet/lib/python3.7/queue.py", line 179, in get
    self.not_empty.wait(remaining)
  File "/home/u111061517/miniconda3/envs/dmhnet/lib/python3.7/threading.py", line 300, in wait
ERROR: Unexpected segmentation fault encountered in worker.
ERROR: Unexpected segmentation fault encountered in worker.
    gotit = waiter.acquire(True, timeout)
  File "/home/u111061517/.local/lib/python3.7/site-packages/torch/utils/data/_utils/signal_handling.py", line 66, in handler
    _error_if_any_worker_fails()
RuntimeError: DataLoader worker (pid 270398) is killed by signal: Segmentation fault. 

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "train.py", line 321, in <module>
    input = next(iterator_train)
  File "/home/u111061517/.local/lib/python3.7/site-packages/torch/utils/data/dataloader.py", line 345, in __next__
    data = self._next_data()
  File "/home/u111061517/.local/lib/python3.7/site-packages/torch/utils/data/dataloader.py", line 841, in _next_data
    idx, data = self._get_data()
  File "/home/u111061517/.local/lib/python3.7/site-packages/torch/utils/data/dataloader.py", line 798, in _get_data
    success, data = self._try_get_data()
  File "/home/u111061517/.local/lib/python3.7/site-packages/torch/utils/data/dataloader.py", line 774, in _try_get_data
    raise RuntimeError('DataLoader worker (pid(s) {}) exited unexpectedly'.format(pids_str))
RuntimeError: DataLoader worker (pid(s) 270395, 270398, 270405, 270406, 270407) exited unexpectedly
Segmentation fault (core dumped)

The following error message occurred when I set num_workers to 0:

num_workers: 8
batch_size: 8
train_set_size: 403
LR 0.000100
Epoch:   0%|               | 0/75 [00:00<?, ?ep/sSegmentation fault (core dumped)     
| 0/51 [00:00<?, ?it/s]
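
Because the crash also reproduces with num_workers set to 0, it happens in the main process and not only in the DataLoader workers, so it may help to get a Python-level traceback for it. The following is a minimal diagnostic sketch, not a fix. It assumes the standard-library faulthandler module is enabled near the top of train.py. The log path segfault_trace.log and the helper probe_dataset are hypothetical names introduced here for illustration.

```python
import faulthandler

# Keep the file object alive for the whole process: faulthandler writes to
# its file descriptor when a fatal signal such as SIGSEGV arrives.
_trace_file = open("segfault_trace.log", "w")  # hypothetical log path
faulthandler.enable(file=_trace_file, all_threads=True)


def probe_dataset(dataset, limit=None):
    """Load samples one at a time in the main process, without a DataLoader.

    The index is printed and flushed before each load, so if the process
    segfaults the last printed index identifies the sample being loaded.
    Returns the number of samples that were loaded.
    """
    n = len(dataset) if limit is None else min(limit, len(dataset))
    for i in range(n):
        print(f"loading sample {i}", flush=True)
        dataset[i]  # triggers the dataset's __getitem__
    return n
```

Calling probe_dataset on the training dataset object before the DataLoader is built (the variable name is not shown in the traceback above, so it has to be looked up in train.py) would show whether loading a single sample is enough to crash. If it is, segfault_trace.log should contain the Python stack at the moment of the fault.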
