Open
Labels
bug: Something isn't working
Description
System Info
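After training with train_math.py, the resulting adapter_model.safetensors file is only 40 bytes, and its content is {"metadata":{"format":"pt"}}. It seems the adapter did not take part in training? Is this RL training process correct?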
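A quick way to confirm that the file holds no tensors at all is to read its header directly. This is a minimal sketch, not part of train_math.py: it assumes the standard safetensors layout (an 8-byte little-endian header length followed by a JSON header, with `__metadata__` as the reserved metadata key), and the function name `list_safetensors_entries` is made up for illustration.

```python
import json
import struct


def list_safetensors_entries(path):
    """Return {tensor_name: (dtype, shape)} read from a safetensors header.

    Hypothetical helper: it only parses the header and never loads tensor data.
    """
    with open(path, "rb") as f:
        # First 8 bytes: little-endian unsigned 64-bit length of the JSON header.
        (header_len,) = struct.unpack("<Q", f.read(8))
        # The header may be padded with trailing spaces; json.loads ignores them.
        header = json.loads(f.read(header_len).decode("utf-8"))
    # "__metadata__" is the reserved free-form metadata key; other keys are tensors.
    return {
        name: (info["dtype"], info["shape"])
        for name, info in header.items()
        if name != "__metadata__"
    }


if __name__ == "__main__":
    import sys

    # Usage: python check_adapter.py adapter_model.safetensors
    entries = list_safetensors_entries(sys.argv[1])
    print(f"{len(entries)} tensor(s) found")
    for name, (dtype, shape) in entries.items():
        print(name, dtype, shape)
```

An empty result means no weights were written when the file was saved. By itself that does not show whether the adapter parameters were updated during training, only that none of them reached the file.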
Who can help?
Information
- The official example scripts
- My own modified scripts
Tasks
- An officially supported task in the codebase (such as scripts/, ...)
- My own task or dataset (give details below)
Reproduction
python -u train_math.py --dataset_path "/mnt/workspace/yly/longcot/openr/train/mat/envs/math/data/math_500.jsonl" --model_name_or_path "/mnt/workspace/yly/model/Qwen2.5-Math-7B-Instruct" --prm_model_name_or_path "/mnt/workspace/yly/model/math-shepherd-mistral-7b-prm" --algorithm_name "TPPO" --num_mini_batch 4 --ppo_epoch 1
Expected behavior
Get a normal adapter file.