The adapter file produced by RL training is empty #89

### System Info

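I trained with train_math.py, and the resulting adapter_model.safetensors file is only 40 bytes. Its entire content is `{"__metadata__":{"format":"pt"}}`, so no tensors were saved. Does this mean the adapter did not take part in training? Is the RL training process working correctly?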
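For anyone who wants to confirm the same symptom, here is a minimal sketch that lists the tensors stored in a safetensors file using only the standard library. It relies on the published safetensors layout (an 8-byte little-endian header length followed by a JSON header). The helper names `parse_header`, `read_header`, and `tensor_names` are my own, not part of this repository or of any library. A file with no tensors comes out at exactly 40 bytes: 8 bytes for the length prefix plus the 32-byte JSON header shown above.

```python
import json
import struct
import sys


def parse_header(data: bytes) -> dict:
    """Parse the JSON header from the leading bytes of a safetensors file."""
    # The first 8 bytes hold the header length as an unsigned little-endian 64-bit int.
    (header_len,) = struct.unpack("<Q", data[:8])
    return json.loads(data[8:8 + header_len].decode("utf-8"))


def read_header(path: str) -> dict:
    """Read only the header of a safetensors file from disk (hypothetical helper)."""
    with open(path, "rb") as f:
        prefix = f.read(8)
        (header_len,) = struct.unpack("<Q", prefix)
        return parse_header(prefix + f.read(header_len))


def tensor_names(header: dict) -> list:
    """Return the stored tensor names; "__metadata__" is not a tensor entry."""
    return sorted(key for key in header if key != "__metadata__")


if __name__ == "__main__" and len(sys.argv) > 1:
    names = tensor_names(read_header(sys.argv[1]))
    print(f"{len(names)} tensor(s) found")
    for name in names:
        print(name)
```

Running it as `python check_adapter.py adapter_model.safetensors` (the script name is arbitrary) should report zero tensors for the file described here, which would mean no adapter weights were written at save time.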

### Who can help?

@morning9393 

### Information

- [X] The official example scripts
- [ ] My own modified scripts

### Tasks

- [X] An officially supported task in the codebase (such as scripts/, ...)
- [ ] My own task or dataset (give details below)

### Reproduction

python -u train_math.py  --dataset_path "/mnt/workspace/yly/longcot/openr/train/mat/envs/math/data/math_500.jsonl" --model_name_or_path "/mnt/workspace/yly/model/Qwen2.5-Math-7B-Instruct"  --prm_model_name_or_path "/mnt/workspace/yly/model/math-shepherd-mistral-7b-prm" --algorithm_name "TPPO" --num_mini_batch 4 --ppo_epoch 1

### Expected behavior

A normal adapter file that contains the trained weights.
