Nasty ResNet50 issue #17

Open
huangzy2333 opened this issue Sep 5, 2024 · 1 comment

Comments

@huangzy2333

Hi there, many thanks for your excellent work and code!

When I followed your code to train a nasty ResNet50 on the CIFAR-100 dataset, training failed from the very beginning: the loss increased and became NaN within the first several batches. Following your instructions, I first trained an original ResNet50 and then trained the nasty teacher with the commands below, without modifying any params.json file.

python .\train_scratch.py --save_path .\experiments\CIFAR100\baseline\resnet50 --gpu_id 0

python .\train_nasty.py --save_path .\experiments\CIFAR100\kd_nasty_resnet50\nasty_resnet50 --gpu_id 0

And here are the training logs for these two models.

baseline_resnet50_training.log

nasty-resnet50-training.log

I checked the hyperparameters in the nasty ResNet50 JSON file, and they align with the paper. It also seems strange that the other nasty networks trained on CIFAR-100 (ResNet18, ResNeXt29) do not show this issue.

Have you encountered this kind of problem, and do you have any suggestions? Would it also be possible to share your training log for the nasty ResNet50 on CIFAR-100? Thank you so much for your time in advance; I look forward to your reply!

@HowieMa
Collaborator

HowieMa commented Sep 7, 2024

Hi, unfortunately I have graduated and can no longer access the original training logs for ResNet50 + CIFAR-100. I previously posted a similar log for ResNeXt-29 + CIFAR-100 (#14), which you can use as a reference. Besides, one potential solution is to reduce the weight of the nasty loss during training, which may help prevent the collapse.
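
To make the suggestion concrete, here is a minimal, framework-free sketch of how such a weight could enter the objective. It assumes the nasty loss has the form "cross-entropy minus a weighted, temperature-scaled KL term" between the normally trained network and the nasty network. The function names, the `weight` and `temperature` parameters, and the direction of the KL term are all illustrative assumptions; they are not taken from train_nasty.py or params.json, so please check them against the actual code before changing anything.

```python
import math


def log_softmax(logits, temperature=1.0):
    """Numerically stable log-softmax of a list of logits."""
    scaled = [z / temperature for z in logits]
    peak = max(scaled)
    log_norm = peak + math.log(sum(math.exp(z - peak) for z in scaled))
    return [z - log_norm for z in scaled]


def cross_entropy(logits, label):
    """Cross-entropy of one sample against its integer class label."""
    return -log_softmax(logits)[label]


def kl_divergence(log_p, log_q):
    """KL(p || q) computed from two lists of log-probabilities."""
    return sum(math.exp(lp) * (lp - lq) for lp, lq in zip(log_p, log_q))


def nasty_loss(nasty_logits, normal_logits, label, weight, temperature):
    """Hypothetical per-sample objective: CE minus weighted KL.

    Assumption: the KL term compares the softened outputs of the normal
    network and the nasty network. Because it is subtracted, a larger
    `weight` pushes the two output distributions apart more aggressively.
    """
    ce = cross_entropy(nasty_logits, label)
    kl = kl_divergence(
        log_softmax(normal_logits, temperature),
        log_softmax(nasty_logits, temperature),
    )
    return ce - weight * temperature ** 2 * kl


if __name__ == "__main__":
    # Made-up logits for a 3-class toy example, only to show the effect
    # of the weight on the total loss.
    nasty = [2.0, 0.5, -1.0]
    normal = [0.1, 1.5, 0.3]
    for w in (0.0, 0.005, 0.01, 0.02):
        loss = nasty_loss(nasty, normal, label=0, weight=w, temperature=4.0)
        # A non-finite loss is a signal to stop and lower the weight.
        print(w, loss, math.isfinite(loss))
```

With `weight=0.0` the objective reduces to plain cross-entropy, and the subtracted KL term has no upper bound, which is one plausible (unverified) reason a large weight could destabilize training on a deeper network. Trying a few smaller values and watching whether the loss stays finite over the first batches is a cheap first experiment.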
