Commit 89341fa

Update README.md
1 parent 8a3d682 commit 89341fa

File tree

1 file changed: +1 line, -1 line

README.md

Lines changed: 1 addition & 1 deletion
@@ -8,7 +8,7 @@ This is the official pytorch implementation of our paper:
<img width=70% src="https://user-images.githubusercontent.com/3794909/92182326-6f78c400-ee19-11ea-80e4-2d6e4d73ce82.png"/>
</div>

-In this paper, we introduce a simple yet effective approach that can boost the vanilla ResNet-50 to 80%+ Top-1 accuracy on ImageNet without any tricks. Generally, ourmethod is based on the recently proposed [MEAL](https://arxiv.org/abs/1812.02425), i.e.,ensemble knowledge distillation via discriminators. We further simplify it through 1) adopting the similarity loss anddiscriminator only on the final outputs and 2) using the av-erage of softmax probabilities from all teacher ensemblesas the stronger supervision for distillation. One crucial perspective of our method is that the one-hot/hard label shouldnot be used in the distillation process. We show that such asimple framework can achieve state-of-the-art results with-out involving any commonly-used tricks, such as 1) archi-tecture modification; 2) outside training data beyond Im-ageNet; 3) autoaug/randaug; 4) cosine learning rate; 5) mixup/cutmix training; 6) label smoothing; etc.
+In this paper, we introduce a simple yet effective approach that can boost the vanilla ResNet-50 to 80%+ Top-1 accuracy on ImageNet without any tricks. Generally, our method is based on the recently proposed [MEAL](https://arxiv.org/abs/1812.02425), i.e., ensemble knowledge distillation via discriminators. We further simplify it through 1) adopting the similarity loss and discriminator only on the final outputs and 2) using the average of softmax probabilities from all teacher ensembles as the stronger supervision for distillation. One crucial perspective of our method is that the one-hot/hard label should not be used in the distillation process. We show that such a simple framework can achieve state-of-the-art results without involving any commonly-used tricks, such as 1) architecture modification; 2) outside training data beyond ImageNet; 3) autoaug/randaug; 4) cosine learning rate; 5) mixup/cutmix training; 6) label smoothing; etc.


## Citation
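The paragraph added in this commit describes distillation against the average of the teachers' softmax probabilities, with no one-hot/hard labels involved. Below is a minimal illustrative sketch of that soft-label distillation loss; it is not code from this repository, and names such as `distillation_loss` and the student/teacher handles in the usage comment are hypothetical.

```python
# Minimal sketch of soft-label ensemble distillation (illustrative only):
# the student is supervised by the averaged teacher softmax, not hard labels.
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits_list):
    """KL divergence between the student's prediction and the average of the
    teachers' softmax probabilities (one logits tensor per teacher)."""
    with torch.no_grad():
        # Average the softmax probabilities from all teacher ensembles.
        soft_target = torch.stack(
            [F.softmax(t, dim=1) for t in teacher_logits_list]
        ).mean(dim=0)
    # Soft-label term; no ground-truth one-hot labels are used.
    return F.kl_div(F.log_softmax(student_logits, dim=1),
                    soft_target, reduction="batchmean")

# Example usage (hypothetical models):
# loss = distillation_loss(student(images), [t(images) for t in teachers])
```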
