Fix stop tokens in PPO #2304

Merged: 2 commits merged into pytorch:main on Jan 31, 2025

Conversation

RedTachyon
Contributor

Quick fix -- I saw that in the PPO recipe, when determining the stop tokens to use for generation, the branch where they're not listed in the config fails due to a simple bug: the check was hasattr(tokenizer.stop_tokens) when it should be hasattr(tokenizer, "stop_tokens").

I'm guessing it never came up in any tests since the default config lists the tokens explicitly, so this branch is never hit.
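For illustration, here's a minimal, self-contained sketch of the broken vs. fixed check (the DummyTokenizer class and the token ids are made up for the example, not the recipe's actual code):

```python
class DummyTokenizer:
    stop_tokens = [2, 32000]  # hypothetical stop token ids

tokenizer = DummyTokenizer()
stop_token_ids = None  # i.e. stop tokens were not specified in the config

# Broken: hasattr requires two arguments (object, name), so calling it with a
# single argument raises TypeError and the fallback branch can never run:
#   if stop_token_ids is None and hasattr(tokenizer.stop_tokens):
#       stop_token_ids = tokenizer.stop_tokens

# Fixed: look the attribute up by name.
if stop_token_ids is None and hasattr(tokenizer, "stop_tokens"):
    stop_token_ids = tokenizer.stop_tokens

print(stop_token_ids)  # [2, 32000]
```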

Context

What is the purpose of this PR? Is it to

  • add a new feature
  • fix a bug
  • update tests and/or documentation
  • other (please add here)

Changelog

What are the changes made in this PR?

  • Fixed reading stop tokens from the tokenizer, if not specified in the config

Test plan

Please make sure to do each of the following if applicable to your PR. If you're unsure about any one of these just ask and we will happily help. We also have a contributing page for some guidance on contributing.

  • run pre-commit hooks and linters (make sure you've first installed via pre-commit install)
  • add unit tests for any new functionality
  • update docstrings for any new or updated methods or classes
  • run unit tests via pytest tests
  • run recipe tests via pytest tests -m integration_test
  • manually run any new or modified recipes with sufficient proof of correctness
  • include relevant commands and any other artifacts in this summary (pastes of loss curves, eval results, etc.)

UX

If your function changed a public API, please add a dummy example of what the user experience will look like when calling it.
Here is a docstring example and a tutorial example.

  • I did not change any public API
  • I have added an example to docs or docstrings


pytorch-bot commented Jan 28, 2025

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/torchtune/2304

Note: Links to docs will display an error until the docs builds have been completed.

✅ No Failures

As of commit 0001f22 with merge base afd5bc4:
💚 Looks good so far! There are no failures yet. 💚

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@facebook-github-bot added the "CLA Signed" label (this label is managed by the Facebook bot; authors need to sign the CLA before a PR can be reviewed) on Jan 28, 2025
@SalmanMohammadi
Collaborator

Good catch!

In hindsight, I believe all of our tokenizers should have stop_tokens defined, so we may be able to remove this check entirely. cc @RdoubleA to be sure.

@RedTachyon
Contributor Author

My intuition is that it's better to keep the guardrails, especially for the sake of future development (and hackability, i.e. how I came across this bug). But ultimately it's not a huge deal, it would only become relevant in edge cases (e.g. someone messing with custom tokenizers)

@SalmanMohammadi (Collaborator) left a comment

> My intuition is that it's better to keep the guardrails, especially for the sake of future development (and hackability, i.e. how I came across this bug). But ultimately it's not a huge deal, it would only become relevant in edge cases (e.g. someone messing with custom tokenizers)

If you don't think it over-complicates/confuses things, I'm happy : )

Feel free to merge when CI is green.

@SalmanMohammadi
Collaborator

@RedTachyon you might need to rebase against main so the CI works after #2322

@RedTachyon
Contributor Author

Done, hopefully it works now

@SalmanMohammadi
Collaborator

@RedTachyon as an aside, have you been using this recipe? If so, any feedback you might have to improve it would be greatly appreciated. We're in the early stages of adding RLHF support to torchtune. I know one of the next major steps would be adding a distributed recipe, and perhaps jumping on the hype train of GRPO.

@RedTachyon
Contributor Author

It so happens that I came across this problem when developing a distributed GRPO based on torchtune. It's mostly working now, but still in a fairly research-y state (that is, the training loop runs, reward goes up a bit, but not quite as much as I would like)

I'd need to balance our research needs with the cleanliness required for an OSS project, but I should be able to share it to be upstreamed if that'd be helpful (I imagine that FAIR/Meta, of all organizations, shouldn't have a problem with that)

@SalmanMohammadi
Collaborator

> It so happens that I came across this problem when developing a distributed GRPO based on torchtune. It's mostly working now, but still in a fairly research-y state (that is, the training loop runs, reward goes up a bit, but not quite as much as I would like)
>
> I'd need to balance our research needs with the cleanliness required for an OSS project, but I should be able to share it to be upstreamed if that'd be helpful (I imagine that FAIR/Meta, of all organizations, shouldn't have a problem with that)

That would be awesome. I'm currently working on the efficient-inference side of distributed online RL algorithms. If you'd be up for it, I'd love to see what you've got up and running.

If you'd like to chat a bit more I'm also around on the torchtune Discord : )

@SalmanMohammadi merged commit 6487029 into pytorch:main on Jan 31, 2025
17 checks passed