Thank you for the nice work. However, I get an "Unexpected segmentation fault encountered in worker" error, even with num_workers set to 0. The error messages are listed below. I am using an NVIDIA Tesla V100. Do you have any idea what might cause this? Thanks.
batch_size: 8
train_set_size: 403
LR 0.000100
Epoch: 0%| | 0/75 [00:00<?, ?ep/sERROR: Unexpected segmentation fault encountered in worker. | 0/51 [00:00<?, ?it/s]
ERROR: Unexpected segmentation fault encountered in worker.
ERROR: Unexpected segmentation fault encountered in worker.
ERROR: Unexpected segmentation fault encountered in worker.
ERROR: Unexpected segmentation fault encountered in worker.
ERROR: Unexpected segmentation fault encountered in worker.
Train ep1: 0%| | 0/51 [00:00<?, ?it/s]
Epoch: 0%| | 0/75 [00:01<?, ?ep/s]
Traceback (most recent call last):
File "/home/u111061517/.local/lib/python3.7/site-packages/torch/utils/data/dataloader.py", line 761, in _try_get_data
data = self._data_queue.get(timeout=timeout)
File "/home/u111061517/miniconda3/envs/dmhnet/lib/python3.7/queue.py", line 179, in get
self.not_empty.wait(remaining)
File "/home/u111061517/miniconda3/envs/dmhnet/lib/python3.7/threading.py", line 300, in wait
ERROR: Unexpected segmentation fault encountered in worker.
ERROR: Unexpected segmentation fault encountered in worker.
gotit = waiter.acquire(True, timeout)
File "/home/u111061517/.local/lib/python3.7/site-packages/torch/utils/data/_utils/signal_handling.py", line 66, in handler
_error_if_any_worker_fails()
RuntimeError: DataLoader worker (pid 270398) is killed by signal: Segmentation fault.
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "train.py", line 321, in <module>
input = next(iterator_train)
File "/home/u111061517/.local/lib/python3.7/site-packages/torch/utils/data/dataloader.py", line 345, in __next__
data = self._next_data()
File "/home/u111061517/.local/lib/python3.7/site-packages/torch/utils/data/dataloader.py", line 841, in _next_data
idx, data = self._get_data()
File "/home/u111061517/.local/lib/python3.7/site-packages/torch/utils/data/dataloader.py", line 798, in _get_data
success, data = self._try_get_data()
File "/home/u111061517/.local/lib/python3.7/site-packages/torch/utils/data/dataloader.py", line 774, in _try_get_data
raise RuntimeError('DataLoader worker (pid(s) {}) exited unexpectedly'.format(pids_str))
RuntimeError: DataLoader worker (pid(s) 270395, 270398, 270405, 270406, 270407) exited unexpectedly
Segmentation fault (core dumped)
The following error message appeared when I set num_workers to 0:
num_workers: 8
batch_size: 8
train_set_size: 403
LR 0.000100
Epoch: 0%| | 0/75 [00:00<?, ?ep/sSegmentation fault (core dumped)
| 0/51 [00:00<?, ?it/s]
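One way to narrow this down is to load samples one at a time in the main process with Python's `faulthandler` enabled, so that a segfault prints the Python stack that triggered it. This is only a minimal sketch: `find_first_failing_index` is a hypothetical helper, not part of this repository, and it assumes the training dataset is a map-style dataset supporting `len()` and integer indexing.

```python
import faulthandler


def find_first_failing_index(dataset, limit=None):
    """Load samples sequentially in the main process.

    Returns the index of the first sample that raises a Python exception,
    or None if every checked sample loads. A hard crash (segfault) cannot
    be caught here, but faulthandler prints the Python traceback for it.
    """
    try:
        # Dump the Python traceback to stderr on SIGSEGV and similar signals.
        faulthandler.enable()
    except (AttributeError, ValueError, OSError, RuntimeError):
        # stderr may lack a file descriptor (e.g. in some notebooks); continue anyway.
        pass

    n = len(dataset) if limit is None else min(limit, len(dataset))
    for i in range(n):
        try:
            dataset[i]  # triggers __getitem__, i.e. file reading and preprocessing
        except Exception as exc:
            print(f"sample {i} raised {type(exc).__name__}: {exc}")
            return i
    return None
```

Calling it on the training dataset object before building the `DataLoader` would show whether a specific sample or a native library used during loading is responsible. Running the script with `python -X faulthandler train.py` enables the same traceback dump without code changes.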