RL training produces an empty adapter file #89

@linyaoyang

Description

System Info

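I trained with train_math.py, and the resulting adapter_model.safetensors file is only 40 bytes. Its entire content is {"metadata":{"format":"pt"}}, so it looks like the adapter never took part in training. Is the RL training process working correctly?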
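For anyone who wants to check their own output, here is a minimal sketch that lists the tensors stored in a safetensors file by reading only its header. It assumes the documented safetensors layout (an 8-byte little-endian header length followed by a JSON header) and uses only the standard library. The function name `list_safetensors_entries` is my own, not something from this repository. Note that the format stores file-level metadata under the `__metadata__` key; the underscores in the content I pasted above were probably lost in rendering. A 32-byte header with only that key plus the 8-byte length prefix gives exactly 40 bytes, which means the file holds no tensors at all.

```python
import json
import struct


def list_safetensors_entries(path):
    """Return the tensor names recorded in a safetensors file header.

    Assumes the standard layout: an unsigned 64-bit little-endian integer
    giving the header size, then that many bytes of JSON.
    """
    with open(path, "rb") as f:
        (header_len,) = struct.unpack("<Q", f.read(8))
        header = json.loads(f.read(header_len))
    # "__metadata__" is file-level metadata, not a tensor entry.
    return [name for name in header if name != "__metadata__"]


# Example (path is illustrative):
# print(list_safetensors_entries("adapter_model.safetensors"))
```

An empty list confirms that no adapter weights were written. A normally saved adapter should list a number of named tensors instead.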

Who can help?

@morning9393

Information

  • The official example scripts
  • My own modified scripts

Tasks

  • An officially supported task in the codebase (such as scrips/, ...)
  • My own task or dataset (give details below)

Reproduction

python -u train_math.py --dataset_path "/mnt/workspace/yly/longcot/openr/train/mat/envs/math/data/math_500.jsonl" --model_name_or_path "/mnt/workspace/yly/model/Qwen2.5-Math-7B-Instruct" --prm_model_name_or_path "/mnt/workspace/yly/model/math-shepherd-mistral-7b-prm" --algorithm_name "TPPO" --num_mini_batch 4 --ppo_epoch 1

Expected behavior

A normal, non-empty adapter file.

Labels

bug (Something isn't working)