Commit 178faf7

update example
1 parent 52ceaa7 commit 178faf7

2 files changed: +3 -3 lines changed

docs/reasoning_examples.md

Lines changed: 1 addition & 1 deletion
@@ -71,7 +71,7 @@ User: Using the numbers [79, 17, 60], create an equation that equals 36. You can
 We use the following command to RL-tune a `Qwen/Qwen2.5-3B` base model and observe R1-zero-like training curves within 3 hours training on 8 GPUs.
 
 ```
-python examples/r1_zero_math.py \
+python examples/r1_zero_countdown.py \
     --critic_type grpo \
     --gpus 8 \
     --vllm_gpu_ratio 0.7 \
Lines changed: 2 additions & 2 deletions
@@ -21,8 +21,8 @@
 
 
 class ZeroMathActor(PPOActor):
-    def __init__(self, ipc_server, vllm_args, args: PPOArgs) -> None:
-        super().__init__(ipc_server, vllm_args, args)
+    def init(self, actor_id):
+        super().init(actor_id)
         if args.oracle == "countdown":
             self.oracle = CountdownOracle()
         else:
