This is a PyTorch implementation of the RRD paper:
@misc{giakoumoglou2024relational,
    title={Relational Representation Distillation},
    author={Nikolaos Giakoumoglou and Tania Stathaki},
    year={2024},
    eprint={2407.12073},
    archivePrefix={arXiv},
    primaryClass={cs.CV},
    url={https://arxiv.org/abs/2407.12073},
}
The code supports both CIFAR-100 and ImageNet classification. Please refer to CIFAR-100 and ImageNet for dataset-specific training and evaluation details.
Knowledge distillation transfers knowledge from large, high-capacity teacher models to more compact student networks. The standard approach minimizes the Kullback–Leibler (KL) divergence between the probabilistic outputs of the teacher and student, effectively aligning predictions but neglecting the structural relationships encoded within the teacher’s internal representations. Recent advances have adopted contrastive learning objectives to address this limitation; however, such instance-discrimination–based methods inevitably induce a “class collision problem”, in which semantically related samples are inappropriately pushed apart despite belonging to similar classes. To overcome this, we propose Relational Representation Distillation (RRD), which preserves the relative relationships among instances rather than enforcing absolute separation. Our method introduces separate temperature parameters for the teacher and student distributions, with a sharper teacher distribution (lower temperature) serving as the target that the student's smoother distribution is trained to match.
Figure 1. Visualization of the information bottleneck effect: the teacher produces a sharper similarity distribution than the student.
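To make the idea concrete, below is a minimal sketch of a relational distillation loss in this spirit, using in-batch cosine similarities. The function name, temperature values, and the use of the current batch (rather than, e.g., a bank of anchor representations) as the relation set are illustrative assumptions, not the repository's exact implementation.

```python
import torch
import torch.nn.functional as F


def relational_distillation_loss(z_s, z_t, tau_s=0.1, tau_t=0.02):
    """Match the student's in-batch similarity distribution to the teacher's.

    z_s: (B, D_s) student features, z_t: (B, D_t) teacher features.
    tau_t < tau_s makes the teacher's distribution sharper than the student's.
    """
    B = z_s.size(0)
    z_s = F.normalize(z_s, dim=1)
    z_t = F.normalize(z_t, dim=1)

    # Pairwise cosine similarities within the batch, dropping self-similarity
    # so each row only encodes how one sample relates to the others.
    off_diag = ~torch.eye(B, dtype=torch.bool, device=z_s.device)
    sim_s = (z_s @ z_s.t())[off_diag].view(B, B - 1)
    sim_t = (z_t @ z_t.t())[off_diag].view(B, B - 1)

    # Sharper teacher target (low temperature) vs. smoother student estimate.
    p_t = F.softmax(sim_t / tau_t, dim=1)
    log_p_s = F.log_softmax(sim_s / tau_s, dim=1)

    # KL divergence pulls the student's relational structure toward the teacher's.
    return F.kl_div(log_p_s, p_t, reduction="batchmean")


# Hypothetical usage with random features of different dimensionalities:
student_feats = torch.randn(64, 128)
teacher_feats = torch.randn(64, 256)
loss = relational_distillation_loss(student_feats, teacher_feats)
```

Because the loss compares similarity distributions rather than individual instance pairs, semantically close samples are never explicitly pushed apart; they simply receive high probability mass in both the teacher's and the student's rows.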
This project is released under the CC BY-NC 4.0 license. See LICENSE for details.
