I saw that in the PPOv2 example, the policy model is created directly with AutoModelForCausalLM.from_pretrained.
I want to know whether it is interchangeable with AutoModelForCausalLMWithValueHead.from_pretrained.
I also found that using AutoModelForCausalLMWithValueHead in PPOv2 gives faster PPO training than using AutoModelForCausalLM, and I wonder why that happens.
Information
The official example scripts
My own modified scripts
Tasks
An officially supported task in the examples folder
My own task or dataset (give details below)
Reproduction
Change the creation of the policy model from AutoModelForCausalLM.from_pretrained to AutoModelForCausalLMWithValueHead.from_pretrained.
Expected behavior
I would like to know what causes the difference in performance.