With the same DPO data, training configuration, and hyperparameters, the model replies normally after training with the safe-rlhf framework, but produces repetitive output after training with llama-factory #6458
Comments
Reproduction command?
```bash
#!/bin/bash
export CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7
PATH_ORI=${0%/*}
WORKDIR=/LLaMA-Factory
MASTER_PORT=12344
if [ ${WORLD_SIZE} -gt 1 ]; then
  : # multi-node branch; content lost from the original post
else
  set -o pipefail
  $submit $WORKDIR/src/train.py
fi
```
Are you using a base model or an instruct model?
I'm using the Qwen instruct model. It looks like `--template qwen` should guarantee that the training data input is consistent with the chat format used when the instruct model was fine-tuned.
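One way to sanity-check that assumption is to render the same conversation with the tokenizer's built-in chat template and compare it with the prompt the training framework actually builds. A minimal sketch, assuming a Qwen2 instruct checkpoint (the model name and message are illustrative, not from this issue):

```python
# Compare the tokenizer's built-in chat template against the training-side
# template. Model name and messages are placeholders.
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen2-7B-Instruct")
messages = [{"role": "user", "content": "Hello"}]

prompt = tokenizer.apply_chat_template(
    messages, tokenize=False, add_generation_prompt=True
)
print(prompt)  # should match what --template qwen renders for the same turn
```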
Have you compared the loss and logp curves between the two frameworks?
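If the run went through the HF Trainer (as LLaMA-Factory's does), the curves can be read back from the saved trainer_state.json. A minimal sketch; the checkpoint path and the metric keys ("loss", "logps/chosen") are assumptions that depend on what the trainer logged:

```python
# Plot a metric from the log history saved in trainer_state.json.
import json
import matplotlib.pyplot as plt

def curve(path, key):
    with open(path) as f:
        logs = json.load(f)["log_history"]
    pts = [(e["step"], e[key]) for e in logs if key in e]
    return zip(*pts) if pts else ([], [])

for key in ["loss", "logps/chosen"]:
    steps, vals = curve("saves/qwen-dpo/trainer_state.json", key)
    plt.figure()
    plt.plot(steps, vals)
    plt.xlabel("step")
    plt.ylabel(key)
    plt.savefig(key.replace("/", "_") + ".png")
```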
@Xuanwu-Gong Could you share your inference script/command? Did you run inference with llamafactory, and is the eos token set correctly?
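A quick way to check the eos setting on the saved checkpoint (the path below is a placeholder); a wrong or missing eos token is a common cause of a model that never stops and keeps repeating itself:

```python
# Inspect the eos token carried by the exported DPO checkpoint.
from transformers import AutoTokenizer, GenerationConfig

ckpt = "saves/qwen-dpo"  # placeholder path to the trained checkpoint
tokenizer = AutoTokenizer.from_pretrained(ckpt)
print(tokenizer.eos_token, tokenizer.eos_token_id)

gen_cfg = GenerationConfig.from_pretrained(ckpt)
print(gen_cfg.eos_token_id)  # for Qwen chat models this should include <|im_end|>
```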
Reminder
System Info
llamafactory version: 0.9.1.dev0
Reproduction
These are the parameter configurations used for DPO fine-tuning under the two frameworks.
I also ran token-length statistics over the data and can confirm that no sample exceeds the cutoff length; a sketch of such a check follows.
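A minimal sketch of that length check, assuming the DPO pairs are stored as JSON lines with prompt/chosen/rejected fields (file name, field names, and cutoff value are placeholders, not from the original issue):

```python
# Verify that no prompt+response pair exceeds the training cutoff length.
import json
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen2-7B-Instruct")
cutoff_len = 2048  # placeholder; use the cutoff_len from the training config

max_len = 0
with open("dpo_data.jsonl") as f:
    for line in f:
        ex = json.loads(line)
        for resp in (ex["chosen"], ex["rejected"]):
            n = len(tokenizer(ex["prompt"] + resp)["input_ids"])
            max_len = max(max_len, n)
print("longest sample:", max_len, "tokens; cutoff:", cutoff_len)
```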
Expected behavior
No response
Others
No response