Fix stop tokens in PPO #2304
Conversation
🔗 Helpful Links 🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/torchtune/2304
Note: Links to docs will display an error until the docs builds have been completed.
✅ No Failures as of commit 0001f22 with merge base afd5bc4.
This comment was automatically generated by Dr. CI and updates every 15 minutes.
Good catch! In hindsight, I believe all of our tokenizers should have `stop_tokens`, so we may not even need this check.
My intuition is that it's better to keep the guardrails, especially for the sake of future development (and hackability, i.e. how I came across this bug). But ultimately it's not a huge deal; it would only become relevant in edge cases (e.g. someone messing with custom tokenizers).
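To illustrate the edge case being discussed: the guardrail only matters when a tokenizer doesn't expose `stop_tokens` at all, e.g. a home-grown one used during experimentation. The class below is a hypothetical example, not a torchtune tokenizer.

```python
# Hypothetical minimal tokenizer with no `stop_tokens` attribute.
class MyCustomTokenizer:
    def encode(self, text: str) -> list[int]:
        # Toy encoding; a real tokenizer would map text to vocabulary ids.
        return [ord(c) for c in text]


tokenizer = MyCustomTokenizer()

# The guardrail under discussion: only use tokenizer.stop_tokens if it exists,
# otherwise fall back to a default (None here, purely for illustration).
stop_tokens = tokenizer.stop_tokens if hasattr(tokenizer, "stop_tokens") else None
```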
> My intuition is that it's better to keep the guardrails, especially for the sake of future development (and hackability, i.e. how I came across this bug). But ultimately it's not a huge deal; it would only become relevant in edge cases (e.g. someone messing with custom tokenizers).
If you don't think it over-complicates/confuses things, I'm happy : )
Feel free to merge when CI is green.
@RedTachyon you might need to rebase against main so the CI works after #2322.
Done, hopefully it works now.
@RedTachyon as an aside, have you been using this recipe? If so, any feedback you might have to improve it would be greatly appreciated. We're in the early stages of adding RLHF support to torchtune. I know one of the next major steps would be adding a distributed recipe, and perhaps jumping on the hype train of GRPO.
It so happens that I came across this problem while developing a distributed GRPO recipe based on torchtune. It's mostly working now, but still in a fairly research-y state (that is, the training loop runs and the reward goes up a bit, but not quite as much as I would like). I'd need to balance our research needs with the cleanliness required for an OSS project, but I should be able to share it to be upstreamed if that'd be helpful (I imagine that FAIR/Meta, of all organizations, shouldn't have a problem with that).
That would be awesome. I'm currently working on the efficient inference side of distributed online RL algorithms. If you'd be up for it, I'd love to see what you've got up and running. If you'd like to chat a bit more, I'm also around on the torchtune Discord : )
Quick fix -- I saw that in the PPO recipe, when determining stop tokens to use for generation, the branch where they're not listed in the config fails due to a simple bug: it was `hasattr(tokenizer.stop_tokens)` but should be `hasattr(tokenizer, "stop_tokens")`. I'm guessing it never came up in any tests since the default config lists the tokens explicitly, so this branch is never hit.
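For concreteness, here is a minimal sketch of the selection logic the fix touches. The helper name and the config variable are illustrative assumptions, not the recipe's exact code; only the `hasattr` correction mirrors the actual change.

```python
from typing import List, Optional


def resolve_stop_tokens(cfg_stop_token_ids: Optional[List[int]], tokenizer) -> Optional[List[int]]:
    """Pick stop tokens for generation (illustrative sketch, not the recipe's exact code)."""
    if cfg_stop_token_ids is not None:
        # Default configs list stop tokens explicitly, so tests only ever hit this branch.
        return cfg_stop_token_ids
    # Buggy version: `hasattr(tokenizer.stop_tokens)` -- hasattr() requires two
    # arguments (object, attribute name), so this call raises a TypeError, and it
    # would raise an AttributeError first if the tokenizer lacks `stop_tokens`.
    # Fixed version:
    if hasattr(tokenizer, "stop_tokens"):
        return tokenizer.stop_tokens
    return None
```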
Context
What is the purpose of this PR?

Changelog
What are the changes made in this PR?

Test plan
Please make sure to do each of the following if applicable to your PR. If you're unsure about any one of these just ask and we will happily help. We also have a contributing page for some guidance on contributing.
- install pre-commit hooks (`pre-commit install`)
- run unit tests (`pytest tests`)
- run recipe/integration tests (`pytest tests -m integration_test`)

UX
If your function changed a public API, please add a dummy example of what the user experience will look like when calling it.
Here is a docstring example and a tutorial example.