With the same DPO data, training configuration, and hyperparameters, the model replies normally after training with the safe-rlhf framework, but produces repetitive output after training with llama-factory #6458
Comments
Reproduction command?
```bash
#!/bin/bash
export CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7
PATH_ORI=${0%/*}
WORKDIR=/LLaMA-Factory
MASTER_PORT=12344
if [ ${WORLD_SIZE} -gt 1 ]; then
  : # multi-node branch; content lost from the original post
else
  set -o pipefail
  $submit $WORKDIR/src/train.py
fi
```
Are you using a base model or an instruct model?
I'm using the Qwen instruct model. It looks like `--template qwen` should guarantee that the training data input is consistent with the chat format used when the instruct model was fine-tuned.
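One way to sanity-check that assumption is to render the same conversation with the tokenizer's built-in chat template and compare it with the prompt the training framework actually builds. A minimal sketch, assuming a Qwen2 instruct checkpoint (the model name and message are illustrative, not from this issue):

```python
# Compare the tokenizer's built-in chat template against the training-side
# template. Model name and messages are placeholders.
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen2-7B-Instruct")
messages = [{"role": "user", "content": "Hello"}]

prompt = tokenizer.apply_chat_template(
    messages, tokenize=False, add_generation_prompt=True
)
print(prompt)  # should match what --template qwen renders for the same turn
```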
Have you compared the loss and logp curves between the two frameworks?
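If the run went through the HF Trainer (as LLaMA-Factory's does), the curves can be read back from the saved trainer_state.json. A minimal sketch; the checkpoint path and the metric keys ("loss", "logps/chosen") are assumptions that depend on what the trainer logged:

```python
# Plot a metric from the log history saved in trainer_state.json.
import json
import matplotlib.pyplot as plt

def curve(path, key):
    with open(path) as f:
        logs = json.load(f)["log_history"]
    pts = [(e["step"], e[key]) for e in logs if key in e]
    return zip(*pts) if pts else ([], [])

for key in ["loss", "logps/chosen"]:
    steps, vals = curve("saves/qwen-dpo/trainer_state.json", key)
    plt.figure()
    plt.plot(steps, vals)
    plt.xlabel("step")
    plt.ylabel(key)
    plt.savefig(key.replace("/", "_") + ".png")
```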
@Xuanwu-Gong Could you share your inference script/command? Did you run inference with llamafactory, and is the eos token set correctly?
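A quick way to check the eos setting on the saved checkpoint (the path below is a placeholder); a wrong or missing eos token is a common cause of a model that never stops and keeps repeating itself:

```python
# Inspect the eos token carried by the exported DPO checkpoint.
from transformers import AutoTokenizer, GenerationConfig

ckpt = "saves/qwen-dpo"  # placeholder path to the trained checkpoint
tokenizer = AutoTokenizer.from_pretrained(ckpt)
print(tokenizer.eos_token, tokenizer.eos_token_id)

gen_cfg = GenerationConfig.from_pretrained(ckpt)
print(gen_cfg.eos_token_id)  # for Qwen chat models this should include <|im_end|>
```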
Reminder
System Info
llamafactory version: 0.9.1.dev0
Reproduction
These are the parameter configurations used for DPO fine-tuning under the two frameworks.
I also ran token-length statistics over the data and can confirm that no sample exceeds the cutoff length; a sketch of such a check follows.
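A minimal sketch of that length check, assuming the DPO pairs are stored as JSON lines with prompt/chosen/rejected fields (file name, field names, and cutoff value are placeholders, not from the original issue):

```python
# Verify that no prompt+response pair exceeds the training cutoff length.
import json
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen2-7B-Instruct")
cutoff_len = 2048  # placeholder; use the cutoff_len from the training config

max_len = 0
with open("dpo_data.jsonl") as f:
    for line in f:
        ex = json.loads(line)
        for resp in (ex["chosen"], ex["rejected"]):
            n = len(tokenizer(ex["prompt"] + resp)["input_ids"])
            max_len = max(max_len, n)
print("longest sample:", max_len, "tokens; cutoff:", cutoff_len)
```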
Expected behavior
No response
Others
No response