
Process hangs when using the given metrics to evaluate in PRM multi-GPU training #83

@great-luao

Description


System Info

I'm on a Linux platform and my Python environment matches requirements.txt.
During multi-GPU training of the PRM, the process always hangs at some point in the evaluation loop. After debugging, I found that the hang is caused by the provided "preprocess_logits_for_metrics" function in prm/code/finetune_qwen.py. Strangely, evaluation works fine in single-GPU training.
After some further tests, I found that the hang only occurs when the two GPUs are processing data samples with different numbers of steps: as shown below, the final logits have different lengths after we take out the step tags.
[screenshot: the logits returned by the two ranks differ in length after step-tag extraction]

For now, the bug can be worked around by adding padding code in Transformers' Trainer and a few small tricks, but I wonder whether it could be fixed permanently, or by setting some parameters.
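A more permanent fix is to make `preprocess_logits_for_metrics` itself return a fixed-shape tensor, so the cross-rank gather in Trainer's evaluation loop never sees mismatched lengths. The sketch below is a minimal illustration of that idea, not the actual function from finetune_qwen.py: `STEP_TAG_ID`, `MAX_STEPS`, and `PAD_VALUE` are assumed names and values.

```python
import torch

# Assumptions (not from finetune_qwen.py):
STEP_TAG_ID = 7    # token id of the step tag; in practice taken from the tokenizer
MAX_STEPS = 16     # upper bound on reasoning steps per sample
PAD_VALUE = -100   # sentinel that compute_metrics should filter out

def preprocess_logits_for_metrics(logits, labels):
    """Return a fixed-shape (batch, MAX_STEPS) tensor on every rank.

    Each row holds the model's score at the step-tag positions, padded
    with PAD_VALUE, so ranks with different step counts still produce
    tensors of identical shape and the distributed gather cannot hang.
    """
    scores = logits.softmax(dim=-1)  # (batch, seq_len, vocab)
    padded = torch.full((logits.size(0), MAX_STEPS), float(PAD_VALUE))
    for i in range(logits.size(0)):
        mask = labels[i] == STEP_TAG_ID             # step-tag positions
        step_scores = scores[i, mask, STEP_TAG_ID]  # one score per step
        n = min(step_scores.numel(), MAX_STEPS)
        padded[i, :n] = step_scores[:n]
    return padded
```

With this, the paired `compute_metrics` would simply drop entries equal to `PAD_VALUE` before computing accuracy, so the padding never affects the reported numbers.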

[screenshots: padding code added in Transformers' Trainer as a workaround]

Who can help?

No response

Information

  • The official example scripts
  • My own modified scripts

Tasks

  • An officially supported task in the codebase (such as scripts/, ...)
  • My own task or dataset (give details below)

Reproduction

Simply run finetune_qwen.py with a multi-GPU setting.

Expected behavior

The evaluation loop should complete without hanging in the multi-GPU setting.

Metadata

Labels: bug (Something isn't working)