I saw that in the PPOv2 example, the policy model is created directly with AutoModelForCausalLM.from_pretrained.
I want to know whether it is interchangeable with AutoModelForCausalLMWithValueHead.from_pretrained.
I also found that using AutoModelForCausalLMWithValueHead in PPOv2 gives faster PPO training than using AutoModelForCausalLM, and I wonder why that happens.
Information
The official example scripts
My own modified scripts
Tasks
An officially supported task in the examples folder
My own task or dataset (give details below)
Reproduction
Change the creation of the policy model from AutoModelForCausalLM.from_pretrained to AutoModelForCausalLMWithValueHead.from_pretrained.
Expected behavior
I would like to know what causes the difference in performance.