
Commit b23edd6

update FAQ
1 parent 2827a40 commit b23edd6

2 files changed: +8 −2 lines changed

README.md (+5 −1)
@@ -312,7 +312,11 @@ For more details, see the explanations in [API documentation](API.md)
 
 ## FAQ
 
-TBA
+**Q**: How to initialize the student model?
+
+**A**: The student model can be randomly initialized (i.e., with no prior knowledge) or initialized from pre-trained weights.
+For example, when distilling a BERT-base model to a 3-layer BERT, you could initialize the student model with [RBT3](https://github.com/ymcui/Chinese-BERT-wwm) (for Chinese tasks) or the first three layers of BERT (for English tasks) to avoid the cold-start problem.
+We recommend that users initialize the student model with pre-trained weights whenever possible, to take full advantage of large-scale pre-training.
 
 ## Citation
 
README_ZH.md (+3 −1)
@@ -308,7 +308,9 @@ Distillers carry out the actual distillation process. The following distillers are currently implemented:
 
 ## FAQ
 
-TBA
+**Q**: How should the student model be initialized?
+
+**A**: Knowledge distillation is essentially a process of "the teacher teaching the student". The student model can be randomly initialized (i.e., containing no prior knowledge at all), or it can be initialized from pre-trained weights. For example, when distilling a BERT-base model into a 3-layer BERT, you can first load the [RBT3](https://github.com/ymcui/Chinese-BERT-wwm) weights (for Chinese tasks) or the first three layers of BERT (for English tasks) and then perform the distillation, avoiding the "cold-start" problem. We recommend that users use a pre-trained student model whenever possible, to take full advantage of large-scale pre-training.
 
 ## Citation
