Hi there, many thanks for your excellent work and code!

When I followed your code to train a nasty ResNet50 on CIFAR-100, training fails from the very beginning: the loss keeps increasing and becomes NaN within the first several batches. Following your instructions, I first trained an original ResNet50 and then trained the nasty teacher with the commands below, without modifying any params.json file.

python .\train_scratch.py --save_path .\experiments\CIFAR100\baseline\resnet50 --gpu_id 0
python .\train_nasty.py --save_path .\experiments\CIFAR100\kd_nasty_resnet50\nasty_resnet50 --gpu_id 0

Here are the training logs for the two models:
baseline_resnet50_training.log
nasty-resnet50-training.log

I checked the hyper-parameters in the nasty ResNet50 json file and they do match the paper. It also seems strange that the other nasty networks trained on CIFAR-100 (ResNet18, ResNeXt-29) do not show this issue.

Have you encountered this kind of problem, and would you have any suggestions? Also, would it be possible to share your training log of nasty ResNet50 on CIFAR-100? Thank you so much for your time in advance, and I look forward to your reply!
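For anyone hitting the same divergence, a small guard like the one below (purely illustrative, not part of the original training scripts) can be dropped into the training loop so the run stops as soon as the loss stops being finite, which makes it easier to inspect the offending batch and the logit range before everything turns to NaN.

```python
# Illustrative debugging guard -- not part of the original repo's scripts.
import torch

def assert_finite(loss, logits, step):
    # Stop as soon as the loss or the teacher logits stop being finite, so the
    # offending batch can be inspected instead of silently producing NaNs.
    if not (torch.isfinite(loss) and torch.isfinite(logits).all()):
        raise RuntimeError(
            f"step {step}: loss={loss.item()}, "
            f"logit range=[{logits.min().item():.2f}, {logits.max().item():.2f}]"
        )
```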
Hi, unfortunately, I have graduated and no longer have access to the original training logs for ResNet50 + CIFAR-100. I posted a similar log for ResNeXt-29 + CIFAR-100 before (#14), and you can take it as a reference. Besides, one potential solution is to reduce the weight of the nasty loss during training, which may help with the collapse.
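To make that suggestion concrete, here is a minimal sketch of a self-undermining KD loss with the adversarial weight exposed as `omega`. It is not the repo's actual train_nasty.py implementation; the names `omega` and `tau` and their defaults are placeholders for whatever the corresponding entries in params.json are called. Reducing `omega` weakens the term that pushes the nasty teacher away from the pre-trained adversarial network, which is the term most likely to blow up in the first batches.

```python
# Illustrative sketch only -- not the repo's actual implementation. The
# hyper-parameter names (omega, tau) and defaults are assumptions; map them to
# the corresponding keys in your params.json.
import torch
import torch.nn.functional as F

def nasty_teacher_loss(teacher_logits, adversarial_logits, targets,
                       omega=0.01, tau=4.0):
    # Cross-entropy on the true labels keeps the nasty teacher accurate.
    ce = F.cross_entropy(teacher_logits, targets)
    # KL divergence between the softened outputs of the (frozen) adversarial
    # network and the nasty teacher; this term is maximized (note the minus
    # sign below), which is what makes the teacher hard to distill from.
    kl = F.kl_div(
        F.log_softmax(teacher_logits / tau, dim=1),
        F.softmax(adversarial_logits.detach() / tau, dim=1),
        reduction="batchmean",
    ) * (tau ** 2)
    # A smaller omega weakens the self-undermining term, which can keep the
    # total loss from running away during the first batches.
    return ce - omega * kl
```

Since the divergence reportedly happens within the first few batches, halving the weight relative to the value in params.json, or warming it up from a small value over the first epoch, would be a reasonable first thing to try.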