
Does knowledge distillation support multi-gpu? #56

Open
wingvortex opened this issue Feb 8, 2022 · 2 comments


@wingvortex

Hi, thanks for sharing your work.
When I tried to train knowledge distillation with multiple GPUs:
python3 -m torch.distributed.run --nproc_per_node $N_GPU distillation.py ...
I got the error:
torch.distributed.elastic.multiprocessing.errors.ChildFailedError:
distillation.py FAILED
Failures:
<NO_OTHER_FAILURES>

@JeiKeiLim
Contributor

KD currently does not support multi-GPU training.
We adopted the KD method from "End-to-End Semi-Supervised Object Detection with Soft Teacher", and generating the teacher's features (I would say they serve as guide features for the student) is a heavy operation.

The bottom line is that KD already uses two GPUs: one for the student and one for the teacher.
You can see this at https://github.com/j-marple-dev/AYolov2/blob/main/distillation.py#L66
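
For reference, here is a minimal sketch of what such a two-GPU student/teacher split can look like in PyTorch. The modules, tensor shapes, and loss below are illustrative assumptions for clarity, not the repository's actual code:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Illustrative placeholder models; the real student/teacher are YOLO-style networks.
student_device = torch.device("cuda:0")
teacher_device = torch.device("cuda:1")

student = nn.Sequential(nn.Conv2d(3, 16, 3, padding=1), nn.ReLU()).to(student_device)
teacher = nn.Sequential(nn.Conv2d(3, 16, 3, padding=1), nn.ReLU()).to(teacher_device)
teacher.eval()  # the teacher only produces guide features and is never updated

optimizer = torch.optim.SGD(student.parameters(), lr=0.01)
images = torch.randn(4, 3, 64, 64)  # dummy batch

with torch.no_grad():
    # The heavy teacher forward pass runs on its own GPU.
    guide_features = teacher(images.to(teacher_device))

student_features = student(images.to(student_device))
# Move the guide features to the student's GPU to compute the distillation loss.
loss = F.mse_loss(student_features, guide_features.to(student_device))
loss.backward()
optimizer.step()
```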

FYI, we have not yet seen a clear benefit from using KD.

@wingvortex
Author

I had already noticed that the student and the teacher each take a GPU, and that the teacher uses quite a lot of GPU memory. Thanks for the extra information. Do you mean the performance gain is limited when applying the semi-supervised object detection approach? In my case, the ratio of labeled to unlabeled data is 1:2.
