
Does AutoModelForCausalLMWithValueHead get abandoned in PPOv2Trainer? #2188

Open
2 of 4 tasks
Sino-Huang opened this issue Oct 6, 2024 · 0 comments
Labels
🐛 bug Something isn't working


Sino-Huang commented Oct 6, 2024

System Info

I saw that in the PPOv2 example, the policy model is created directly with AutoModelForCausalLM.from_pretrained.
I want to know whether it is interchangeable with AutoModelForCausalLMWithValueHead.from_pretrained.

Also, I found that using AutoModelForCausalLMWithValueHead as the policy in PPOv2 makes PPO training faster than using AutoModelForCausalLM. I wonder why this happens.

Information

  • The official example scripts
  • My own modified scripts

Tasks

  • An officially supported task in the examples folder
  • My own task or dataset (give details below)

Reproduction

Change the creation of the policy model from AutoModelForCausalLM.from_pretrained to AutoModelForCausalLMWithValueHead.from_pretrained.

Expected behavior

I would like to know what causes the difference in training speed.

Sino-Huang added the 🐛 bug label on Oct 6, 2024