
Does knowledge distillation support multi-gpu? #56

Open
wingvortex opened this issue Feb 8, 2022 · 2 comments


@wingvortex

Hi, thanks for sharing your work.
When I tried to train knowledge distillation with multiple GPUs:
python3 -m torch.distributed.run --nproc_per_node $N_GPU distillation.py ...
I got the error:
torch.distributed.elastic.multiprocessing.errors.ChildFailedError:
distillation.py FAILED
Failures:
<NO_OTHER_FAILURES>

@JeiKeiLim
Contributor

KD currently does not support multi-GPU training.
We adopted the KD method from "End-to-End Semi-Supervised Object Detection with Soft Teacher", and generating the teacher's features (I would say they serve as guide features for the student) is a heavy operation.

The bottom line is that KD already uses two GPUs: one for the student and one for the teacher.
You can see this at https://github.com/j-marple-dev/AYolov2/blob/main/distillation.py#L66
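
For reference, here is a minimal sketch of what such a two-GPU student/teacher split can look like in PyTorch. The modules, tensor shapes, and loss below are illustrative assumptions for clarity, not the repository's actual code:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Illustrative placeholder models; the real student/teacher are YOLO-style networks.
student_device = torch.device("cuda:0")
teacher_device = torch.device("cuda:1")

student = nn.Sequential(nn.Conv2d(3, 16, 3, padding=1), nn.ReLU()).to(student_device)
teacher = nn.Sequential(nn.Conv2d(3, 16, 3, padding=1), nn.ReLU()).to(teacher_device)
teacher.eval()  # the teacher only produces guide features and is never updated

optimizer = torch.optim.SGD(student.parameters(), lr=0.01)
images = torch.randn(4, 3, 64, 64)  # dummy batch

with torch.no_grad():
    # The heavy teacher forward pass runs on its own GPU.
    guide_features = teacher(images.to(teacher_device))

student_features = student(images.to(student_device))
# Move the guide features to the student's GPU to compute the distillation loss.
loss = F.mse_loss(student_features, guide_features.to(student_device))
loss.backward()
optimizer.step()
```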

FYI, we have not yet seen a clear benefit from using KD.

@wingvortex
Author

I had already noticed that the student and the teacher each take a GPU, and that the teacher uses quite a lot of GPU memory. Thanks for the extra information. Do you mean the performance gain is limited when applying the semi-supervised object detection approach? In my case, the ratio of labeled to unlabeled data is 1:2.
