中文说明 | English

This example demonstrates knowledge distillation on the CMRC 2018 task, using the DRCD dataset for data augmentation.

  • run_cmrc2018_train.sh : trains a teacher model (roberta-wwm-base) on CMRC 2018.
  • run_cmrc2018_distill_T3.sh : distills the teacher to T3 with the CMRC 2018 and DRCD datasets.
  • run_cmrc2018_distill_T4tiny.sh : distills the teacher to T4tiny with the CMRC 2018 and DRCD datasets.
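
A typical run order follows from the descriptions above: train the teacher first, then run one of the distillation scripts. A minimal sketch, assuming the scripts are invoked directly with bash after the variables below are set:

```bash
# Sketch of the assumed run order; set the variables described below first.
bash run_cmrc2018_train.sh            # 1. train the teacher (roberta-wwm-base) on CMRC 2018
bash run_cmrc2018_distill_T3.sh       # 2a. distill the teacher into T3, or
bash run_cmrc2018_distill_T4tiny.sh   # 2b. distill the teacher into T4tiny
```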

Set the following variables in the shell scripts before running:

  • BERT_DIR : the directory where RoBERTa-wwm-base is stored, including vocab.txt, pytorch_model.bin and bert_config.json
  • OUTPUT_ROOT_DIR : the directory that stores logs and trained model weights
  • DATA_ROOT_DIR : the directory containing the CMRC 2018 and DRCD datasets:
    • ${DATA_ROOT_DIR}/cmrc2018/squad-style-data/cmrc2018_train.json
    • ${DATA_ROOT_DIR}/cmrc2018/squad-style-data/cmrc2018_dev.json
    • ${DATA_ROOT_DIR}/drcd/DRCD_training.json
  • trained_teacher_model : the trained teacher weights file; it has to be specified when running run_cmrc2018_distill_T3.sh or run_cmrc2018_distill_T4tiny.sh (see the example below).
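
For example, the variable settings inside the scripts might look like the following; all paths are placeholders, so substitute your own:

```bash
# Example settings (placeholder paths — adjust to your environment)
BERT_DIR=/path/to/chinese-roberta-wwm-base        # contains vocab.txt, pytorch_model.bin, bert_config.json
OUTPUT_ROOT_DIR=/path/to/outputs                  # logs and trained weights are written here
DATA_ROOT_DIR=/path/to/datasets                   # contains cmrc2018/ and drcd/ laid out as above
trained_teacher_model=/path/to/teacher_weights.pt # only needed by the two distillation scripts
```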