
PeftModelForCausalLM.generate ignores prompt tuning parameters unless use_cache=False #2123

Open
2 of 4 tasks

mattlgarber opened this issue Oct 2, 2024 · 1 comment
mattlgarber commented Oct 2, 2024

System Info

Python 3.9.18

peft==0.13.0
transformers==4.45.1

Who can help?

No response

Information

  • The official example scripts
  • My own modified scripts

Tasks

  • An officially supported task in the examples folder
  • My own task or dataset (give details below)

Reproduction

When using prompt tuning, the generate method of the PEFT model produces the same output as the base model unless use_cache=False:

from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PromptTuningConfig, get_peft_model


model_id = 'meta-llama/Meta-Llama-3-8B-Instruct'
base_model = AutoModelForCausalLM.from_pretrained(model_id, device_map='auto')
tokenizer = AutoTokenizer.from_pretrained(model_id)
peft_config = PromptTuningConfig(
    task_type='CAUSAL_LM',
    prompt_tuning_init='TEXT',
    prompt_tuning_init_text='What color is a swan?',
    tokenizer_name_or_path=model_id,
    num_virtual_tokens=9,
)
peft_model = get_peft_model(base_model, peft_config)

input_ids = tokenizer('It is', return_tensors='pt').input_ids.to('cuda')

# Base model output for reference.
print(tokenizer.batch_decode(peft_model.base_model.generate(input_ids=input_ids, do_sample=False, max_new_tokens=5)))
# > ['<|begin_of_text|>It is a well-known fact that']

# The PEFT model unexpectedly produces the same output as the base model.
print(tokenizer.batch_decode(peft_model.generate(input_ids=input_ids, do_sample=False, max_new_tokens=5, use_cache=True)))
# > ['<|begin_of_text|>It is a well-known fact that']

# Setting use_cache=False leads to the expected output.
print(tokenizer.batch_decode(peft_model.generate(input_ids=input_ids, do_sample=False, max_new_tokens=5, use_cache=False)))
# > ['<|begin_of_text|>It is a white bird, but']

I didn't observe this unexpected behavior when using Write/palmyra-small (which uses GPT2LMHeadModel), so it doesn't seem to affect all architectures. I haven't tested other prompt learning methods, but I suspect they might be affected as well.
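
A possible workaround on the released versions (rather than passing use_cache=False on every call) might be to disable KV caching by default on the model's generation config. This is only a sketch building on the reproducer above, assuming the problem is tied to caching as the use_cache=False result suggests; it sidesteps the bug rather than fixing it:

# Workaround sketch: make cache-free generation the default so the virtual
# prompt tokens are taken into account on every forward pass.
peft_model.generation_config.use_cache = False

print(tokenizer.batch_decode(peft_model.generate(input_ids=input_ids, do_sample=False, max_new_tokens=5)))
# Expected: ['<|begin_of_text|>It is a white bird, but']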

Expected behavior

PEFT models should respect prompt tuning parameters during generation regardless of the use_cache setting.
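
A minimal check of that expectation, building on the reproducer above (hypothetical test; greedy decoding keeps the comparison deterministic):

# Cached and cache-free generation should decode to the same text once
# prompt tuning is applied correctly in both paths.
out_cached = peft_model.generate(input_ids=input_ids, do_sample=False, max_new_tokens=5, use_cache=True)
out_uncached = peft_model.generate(input_ids=input_ids, do_sample=False, max_new_tokens=5, use_cache=False)
assert tokenizer.batch_decode(out_cached) == tokenizer.batch_decode(out_uncached)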

@BenjaminBossan
Member

Thanks for reporting the problem and providing a reproducer. When I tried it, however, I got different results:

['<|begin_of_text|>It is a well-known fact that']
['<|begin_of_text|>It is a white bird, but']
['<|begin_of_text|>It is a white bird, but']

(showing only the printed outputs; a couple of log lines were removed)

I'm on the main branch of peft and of transformers. Could you check if installing these packages from source fixes the issue for you too?
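
For reference, installing both libraries from their main branches would look something like the following (standard pip-from-git commands; no specific commits pinned):

pip install git+https://github.com/huggingface/peft.git
pip install git+https://github.com/huggingface/transformers.git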
