
PeftModelForCausalLM.generate ignores prompt tuning parameters unless use_cache=False #2123

Open
2 of 4 tasks

mattlgarber opened this issue Oct 2, 2024 · 1 comment
mattlgarber commented Oct 2, 2024

System Info

Python 3.9.18

peft==0.13.0
transformers==4.45.1

Who can help?

No response

Information

  • The official example scripts
  • My own modified scripts

Tasks

  • An officially supported task in the examples folder
  • My own task or dataset (give details below)

Reproduction

When using prompt tuning, the generate method of the PEFT model produces the same output as the base model unless use_cache=False:

from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PromptTuningConfig, get_peft_model


model_id = 'meta-llama/Meta-Llama-3-8B-Instruct'
base_model = AutoModelForCausalLM.from_pretrained(model_id, device_map='auto')
tokenizer = AutoTokenizer.from_pretrained(model_id)
peft_config = PromptTuningConfig(
    task_type='CAUSAL_LM',
    prompt_tuning_init='TEXT',
    prompt_tuning_init_text='What color is a swan?',
    tokenizer_name_or_path=model_id,
    num_virtual_tokens=9,
)
peft_model = get_peft_model(base_model, peft_config)

input_ids = tokenizer('It is', return_tensors='pt').input_ids.to('cuda')

# Base model output for reference.
print(tokenizer.batch_decode(peft_model.base_model.generate(input_ids=input_ids, do_sample=False, max_new_tokens=5)))
# > ['<|begin_of_text|>It is a well-known fact that']

# The PEFT model unexpectedly produces the same output as the base model.
print(tokenizer.batch_decode(peft_model.generate(input_ids=input_ids, do_sample=False, max_new_tokens=5, use_cache=True)))
# > ['<|begin_of_text|>It is a well-known fact that']

# Setting use_cache=False leads to the expected output.
print(tokenizer.batch_decode(peft_model.generate(input_ids=input_ids, do_sample=False, max_new_tokens=5, use_cache=False)))
# > ['<|begin_of_text|>It is a white bird, but']

I didn't observe this unexpected behavior when using Write/palmyra-small (which uses GPT2LMHeadModel), so it doesn't seem to affect all architectures. I haven't tested other prompt learning methods, but I suspect they might be affected as well.
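
A possible workaround on the released versions (rather than passing use_cache=False on every call) might be to disable KV caching by default on the model's generation config. This is only a sketch building on the reproducer above, assuming the problem is tied to caching as the use_cache=False result suggests; it sidesteps the bug rather than fixing it:

# Workaround sketch: make cache-free generation the default so the virtual
# prompt tokens are taken into account on every forward pass.
peft_model.generation_config.use_cache = False

print(tokenizer.batch_decode(peft_model.generate(input_ids=input_ids, do_sample=False, max_new_tokens=5)))
# Expected: ['<|begin_of_text|>It is a white bird, but']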

Expected behavior

PEFT models should respect prompt tuning parameters during generation regardless of the use_cache setting.
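
A minimal check of that expectation, building on the reproducer above (hypothetical test; greedy decoding keeps the comparison deterministic):

# Cached and cache-free generation should decode to the same text once
# prompt tuning is applied correctly in both paths.
out_cached = peft_model.generate(input_ids=input_ids, do_sample=False, max_new_tokens=5, use_cache=True)
out_uncached = peft_model.generate(input_ids=input_ids, do_sample=False, max_new_tokens=5, use_cache=False)
assert tokenizer.batch_decode(out_cached) == tokenizer.batch_decode(out_uncached)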

@BenjaminBossan
Member

Thanks for reporting the problem and providing a reproducer. When I tried it, however, I got different results:

['<|begin_of_text|>It is a well-known fact that']
['<|begin_of_text|>It is a white bird, but']
['<|begin_of_text|>It is a white bird, but']

(showing only the printed outputs; a couple of log lines were removed)

I'm on the main branch of peft and of transformers. Could you check if installing these packages from source fixes the issue for you too?
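
For reference, installing both libraries from their main branches would look something like the following (standard pip-from-git commands; no specific commits pinned):

pip install git+https://github.com/huggingface/peft.git
pip install git+https://github.com/huggingface/transformers.git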
