Combined-KD

This repository contains the code for the Combined-KD long paper, "How to Select One Among All? An Extensive Empirical Study Towards the Robustness of Knowledge Distillation in Natural Language Understanding", which was accepted to Findings of EMNLP 2021. We propose Combined-KD (ComKD), which takes advantage of data augmentation and progressive training. Results show that ComKD not only achieves a new state of the art (SOTA) on the GLUE benchmark, but is also more robust than other KD methods under out-of-distribution (OOD) evaluation and adversarial attacks. Paper link: https://arxiv.org/abs/2109.05696v1

Requirements and Installation

  1. Create the conda environment:

    conda env create -f environment.yml
    
  2. Place the pretrained models in the following directories (they can be downloaded from Hugging Face, e.g., https://huggingface.co/distilbert-base-uncased):
     - ./bert_models/distilbert-base
     - ./bert_models/distilroberta-base
     - ./bert_models/uncased_L-6_H-768_A-12
     - ./bert_models/uncased_L-4_H-256_A-4

  3. Place the fine-tuned teacher models in the following directories:
     - ./ckpts/teachers_roberta_large
     - ./ckpts/teachers_bert_base

  4. Download the GLUE benchmark (https://gluebenchmark.com/) into the following directory:
     - ./glue_data
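Before training, it can help to confirm that the directories listed in the setup steps are in place. The following is a minimal sketch, not part of the repository: the helper name `find_missing_dirs` and the constant `EXPECTED_DIRS` are hypothetical, and the check only verifies that each directory exists, not that its contents are valid.

```python
from pathlib import Path

# Directories named in the setup steps (relative to the repository root).
EXPECTED_DIRS = [
    "bert_models/distilbert-base",
    "bert_models/distilroberta-base",
    "bert_models/uncased_L-6_H-768_A-12",
    "bert_models/uncased_L-4_H-256_A-4",
    "ckpts/teachers_roberta_large",
    "ckpts/teachers_bert_base",
    "glue_data",
]


def find_missing_dirs(root="."):
    """Return the expected directories that do not exist under `root`.

    Hypothetical helper: it checks existence only, not directory contents.
    """
    base = Path(root)
    return [d for d in EXPECTED_DIRS if not (base / d).is_dir()]


if __name__ == "__main__":
    missing = find_missing_dirs()
    if missing:
        print("Missing directories:")
        for d in missing:
            print("  ./" + d)
    else:
        print("All expected directories are present.")
```

Run from the repository root, the script prints any directory from the list that still needs to be created and populated.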

Getting Started

Train the models:

bash run_combinekd_mnli_bert.sh       # for the BERT model
bash run_combinekd_mnli_roberta.sh    # for the RoBERTa model
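If you prefer to launch training from Python (for example, from a job scheduler), the two shell scripts can be wrapped as sketched below. This is an illustrative sketch under stated assumptions: the names `TRAIN_SCRIPTS`, `build_command`, and `train` are hypothetical, and the scripts are assumed to sit in the repository root and to take no arguments, as in the commands above.

```python
import subprocess
from pathlib import Path

# Script names taken from the training commands; the dictionary keys are
# hypothetical labels chosen for this sketch.
TRAIN_SCRIPTS = {
    "bert": "run_combinekd_mnli_bert.sh",
    "roberta": "run_combinekd_mnli_roberta.sh",
}


def build_command(model):
    """Return the argv list that launches training for `model`."""
    if model not in TRAIN_SCRIPTS:
        raise ValueError(
            f"unknown model {model!r}; expected one of {sorted(TRAIN_SCRIPTS)}"
        )
    return ["bash", TRAIN_SCRIPTS[model]]


def train(model, repo_root="."):
    """Run the training script for `model` from `repo_root`.

    Raises FileNotFoundError if the script is missing and
    subprocess.CalledProcessError if it exits with a non-zero status.
    """
    command = build_command(model)
    script = Path(repo_root) / command[1]
    if not script.is_file():
        raise FileNotFoundError(f"training script not found: {script}")
    subprocess.run(command, cwd=repo_root, check=True)
```

For example, `train("roberta")` is equivalent to running `bash run_combinekd_mnli_roberta.sh` from the repository root.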

License

This project is released under the Apache 2.0 license.

Citation

@misc{li2021select,
      title={How to Select One Among All? An Extensive Empirical Study Towards the Robustness of Knowledge Distillation in Natural Language Understanding}, 
      author={Tianda Li and Ahmad Rashid and Aref Jafari and Pranav Sharma and Ali Ghodsi and Mehdi Rezagholizadeh},
      year={2021},
      eprint={2109.05696},
      archivePrefix={arXiv},
      primaryClass={cs.CL}
}