In this paper, we introduce a simple yet effective approach that can boost the vanilla ResNet-50 to 80%+ Top-1 accuracy on ImageNet without any tricks. Generally, our method is based on the recently proposed [MEAL](https://arxiv.org/abs/1812.02425), i.e., ensemble knowledge distillation via discriminators. We further simplify it through 1) adopting the similarity loss and discriminator only on the final outputs and 2) using the average of softmax probabilities from all teacher ensembles as the stronger supervision for distillation. One crucial perspective of our method is that the one-hot/hard label should not be used in the distillation process. We show that such a simple framework can achieve state-of-the-art results without involving any commonly-used tricks, such as 1) architecture modification; 2) outside training data beyond ImageNet; 3) autoaug/randaug; 4) cosine learning rate; 5) mixup/cutmix training; 6) label smoothing; etc.
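To make the soft-label supervision concrete, below is a minimal PyTorch-style sketch of a distillation loss that matches the description above: the student is trained against the average of the teachers' softmax probabilities (no one-hot/hard labels). Function and argument names are illustrative, the `temperature` parameter is an added assumption, and the discriminator term on the final outputs used in MEAL is omitted here.

```python
import torch
import torch.nn.functional as F

def soft_distillation_loss(student_logits, teacher_logits_list, temperature=1.0):
    """KL divergence between the student's prediction and the averaged
    softmax probabilities of all teachers (soft labels only)."""
    # Average the softmax probabilities over all teacher models.
    teacher_probs = torch.stack(
        [F.softmax(t / temperature, dim=1) for t in teacher_logits_list]
    ).mean(dim=0)
    # Student log-probabilities for the KL term.
    student_log_probs = F.log_softmax(student_logits / temperature, dim=1)
    # KL divergence with batchmean reduction, scaled by T^2 as is conventional.
    return F.kl_div(student_log_probs, teacher_probs,
                    reduction="batchmean") * temperature ** 2
```

In practice this loss would replace the usual cross-entropy against one-hot labels when training the student, with the teacher logits computed under `torch.no_grad()`.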