Description
System Info
The environment matches the project's default setup, and the GPU is an NVIDIA A40.
Who can help?
No response
Information
- The official example scripts
- My own modified scripts
Tasks
- An officially supported task in the codebase (such as scripts/, ...)
- My own task or dataset (give details below)
Reproduction
From v4.47 onwards, when a model cache is to be returned, generate will return a Cache instance instead by default (as opposed to the legacy tuple of tuples format). If you want to keep returning the legacy format, please set return_legacy_cache=True.
Traceback (most recent call last):
  File "/home/tank/o1_train/openr/train/mat/scripts/train_math.py", line 107, in <module>
    main(sys.argv[1:])
  File "/home/tank/o1_train/openr/train/mat/scripts/train_math.py", line 99, in main
    runner.run()
  File "/home/tank/o1_train/openr/train/mat/scripts/../../mat/runner/shared/math_runner.py", line 76, in run
    rewards = self.prm.get_reward(obs, actions)
  File "/home/tank/miniconda3/envs/open_reasoner/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 116, in decorate_context
    return func(*args, **kwargs)
  File "/home/tank/o1_train/openr/train/mat/scripts/../../mat/models/ms_prm.py", line 40, in get_reward
    last_step_score = step_score[-1]
IndexError: index -1 is out of bounds for dimension 0 with size 0
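The error says that `step_score` is empty by the time `[-1]` is applied. Below is a minimal sketch of how that can happen, written in plain Python so it runs without a GPU. It assumes that the step scores are picked out at positions holding a step-tag token; the real logic in ms_prm.py is not shown in this report, so the function `last_step_score`, the parameter `step_tag_id`, and the sample ids are all hypothetical. If that assumption holds, an empty result might point to tokenization or truncation rather than to the GPU model.

```python
def last_step_score(token_ids, scores, step_tag_id):
    """Return the score at the last step-tag position.

    Hypothetical stand-in for the selection that produces `step_score`;
    it is not the code from ms_prm.py.
    """
    # Keep only the scores whose token is the step tag.
    step_scores = [s for t, s in zip(token_ids, scores) if t == step_tag_id]
    if not step_scores:
        # An empty selection is what makes `step_score[-1]` fail.
        raise ValueError(
            f"no step tag (id={step_tag_id}) among {len(token_ids)} tokens; "
            "check the tokenizer, the step separator, and truncation"
        )
    return step_scores[-1]


# Made-up ids: 9 plays the role of the step tag.
with_tag = last_step_score([1, 9, 2, 9], [0.1, 0.7, 0.2, 0.4], step_tag_id=9)

try:
    last_step_score([1, 2, 3], [0.1, 0.2, 0.3], step_tag_id=9)
    missing_tag_message = None
except ValueError as exc:
    missing_tag_message = str(exc)
```

Raising a descriptive error here does not skip the sample; it only reports which input produced no step tag, which may help narrow down the cause.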
Expected behavior
How should I fix this? Should I switch to an A100 GPU or something? I can't simply check the length of the array and skip the sample when it is empty.