Deprecation: Transformers will no longer support past_key_values to be tuples #1962

Open
BenjaminBossan opened this issue Jul 26, 2024 · 15 comments

@BenjaminBossan
Member

As reported by @ArthurZucker:

Quick question, I am seeing this in peft:

if uses_cache and (model_kwargs["past_key_values"] is not None):
where there is a reliance on get_seq_length(), which we are deprecating; in addition, we will no longer convert the cache to a tuple object automatically in two releases

This will presumably affect all prompt learning methods and thus needs to be fixed soon.

For Llama, I identified the following tests, which would result in past_key_values being tuples and can serve as a starting point to work on this issue:

tests/test_decoder_models.py::PeftDecoderModelTester::test_inference_safetensors_70_test_trl_internal_testing_tiny_random_LlamaForCausalLM_prefix_tuning
tests/test_decoder_models.py::PeftDecoderModelTester::test_passing_input_embeds_works_70_test_trl_internal_testing_tiny_random_LlamaForCausalLM_prefix_tuning
tests/test_decoder_models.py::PeftDecoderModelTester::test_training_prompt_learning_tasks_72_test_trl_internal_testing_tiny_random_LlamaForCausalLM_prefix_tuning

(note that more tests would fail; this is just a selection)

I haven't really worked on the prompt learning methods in PEFT and know little about the inner workings of the transformers cache, so any support would be welcome.

@piyush-llm

piyush-llm commented Jul 30, 2024

@BenjaminBossan @ArthurZucker Can you provide more information regarding the deprecation of get_seq_length()? I have some familiarity with prompt learning methods and the transformers KV cache.

Current implementation of cache utilities:
https://github.com/huggingface/transformers/blob/f0bc49e7f61f74f055c47ad40e6010f57eed0b0b/src/transformers/cache_utils.py#L60

Can you point me to the correct branch?

@gante
Member

gante commented Aug 1, 2024

Hi @piyush-llm 👋

Alongside a Cache, the prepare_inputs_for_generation method also returns cache_position on models that have received the update. This new tensor contains the indices of the cache entries that will be updated, so its last value can be used as the sequence length :)
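
A minimal sketch of that idea (illustrative only, not PEFT code; it assumes model_kwargs is what prepare_inputs_for_generation returned, and the DynamicCache fallback is a placeholder for models that have not received the update):

from transformers import DynamicCache

def past_seq_length(model_kwargs: dict) -> int:
    # Sketch: derive the past sequence length from cache_position instead of
    # relying on Cache.get_seq_length().
    cache_position = model_kwargs.get("cache_position")
    if cache_position is not None:
        # cache_position holds the indices that will be written this step; its
        # first value equals the number of tokens already in the cache (for
        # single-token decoding steps, first and last value coincide).
        return int(cache_position[0])
    # Fallback for models that still rely on the cache object itself.
    past = model_kwargs.get("past_key_values")
    return past.get_seq_length() if isinstance(past, DynamicCache) else 0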

@BenjaminBossan
Member Author

Thanks for the details. Also check out #869 for discussion and code snippets.

@YOUSEFNANIS

Is it ok to just convert past_key_values from a tuple to a list?

@BenjaminBossan
Member Author

Is it ok to just convert past_key_values from a tuple to a list?

Unfortunately not, it would be expected to be a Cache object.
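
For illustration, a legacy tuple-of-tuples can be wrapped into such an object with DynamicCache.from_legacy_cache (toy example; the tensor shapes below are placeholders):

import torch
from transformers import DynamicCache

# Toy legacy cache: one layer with a (key, value) pair of shape
# (batch, num_heads, seq_len, head_dim).
legacy_cache = ((torch.zeros(1, 2, 5, 8), torch.zeros(1, 2, 5, 8)),)
cache = DynamicCache.from_legacy_cache(legacy_cache)  # now a Cache object
print(cache.get_seq_length())         # 5
print(type(cache.to_legacy_cache()))  # tuple, for code that still expects tuples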

@BenjaminBossan
Member Author

An update from the internal discussion: So Cache shouldn't be used at all during training. At the same time, the role of past_key_values will be migrated to mean "cache". This probably means that going forward, we cannot use past_key_values at all for the purpose of injecting "virtual tokens" (or rather, embeddings) as we do right now for prompt learning.

I had hoped that this could be avoided, but it smells like this whole approach needs a rewrite. One idea would be to calculate the virtual embeddings as we do right now (PeftModel.get_prompt) but, instead of handing them off to past_key_values, inject them via a pre-forward hook (or, if that's not feasible, monkey patch forward 🙈); a rough sketch of that idea follows at the end of this comment.

For generating, we'll probably still need a separate code path, since we do want to use caching for generation.
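
To make the hook idea a bit more concrete, here is a very rough sketch (not an actual implementation and not necessarily how this will be solved in PEFT; it assumes the prompts come back as plain embeddings that can be prepended to inputs_embeds, which holds for prompt tuning but not for prefix tuning's per-layer key/values, and it ignores attention mask and position adjustments):

import torch

def make_prompt_injection_hook(peft_model):
    # Sketch only: prepend the virtual prompt embeddings to inputs_embeds via a
    # forward pre-hook instead of passing them as past_key_values.
    def hook(module, args, kwargs):
        inputs_embeds = kwargs.get("inputs_embeds")
        if inputs_embeds is None:
            return args, kwargs
        batch_size = inputs_embeds.shape[0]
        prompts = peft_model.get_prompt(batch_size)  # (batch, num_virtual_tokens, hidden)
        kwargs["inputs_embeds"] = torch.cat(
            [prompts.to(inputs_embeds.dtype), inputs_embeds], dim=1
        )
        return args, kwargs
    return hook

# Hypothetical usage, assuming base_model and peft_model are already set up:
# handle = base_model.register_forward_pre_hook(
#     make_prompt_injection_hook(peft_model), with_kwargs=True
# )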

github-actions bot

This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.

@BenjaminBossan
Member Author

not stale

@JerryLife

I have encountered the same issue in prefix tuning. If completely addressing this issue takes time, is there any alternative way to get around it?

@BenjaminBossan
Member Author

@JerryLife Could you please try if this suggestion fixes it for you:

#869 (comment)

Please let me know whether it works; the more data I have, the better I can plan the fix.

@JerryLife

JerryLife commented Sep 13, 2024

Thank you for your prompt help! It works on my side. Here are more details.
Modules: peft==0.12.0, transformers==4.44.2
Model: MistralForCausalLM

My workaround

  1. Define a custom get_prompt function
from typing import Optional
import torch
from peft import PeftType
from peft.utils import TRANSFORMERS_MODELS_TO_PREFIX_TUNING_POSTPROCESS_MAPPING
from transformers import DynamicCache

def custom_get_prompt(self, batch_size: int, task_ids: Optional[torch.Tensor] = None) -> torch.Tensor:
    """
    Returns the virtual prompts to use for Peft. Only applicable when using a prompt learning method.
    """
    peft_config = self.active_peft_config
    prompt_encoder = self.prompt_encoder[self.active_adapter]
    prompt_tokens = (
        self.prompt_tokens[self.active_adapter]
        .unsqueeze(0)
        .expand(batch_size, -1)
        .to(prompt_encoder.embedding.weight.device)
    )
    if peft_config.peft_type == PeftType.PREFIX_TUNING:
        prompt_tokens = prompt_tokens[:, : peft_config.num_virtual_tokens]
        if peft_config.inference_mode:
            past_key_values = prompt_encoder.embedding.weight.repeat(batch_size, 1, 1)
        else:
            past_key_values = prompt_encoder(prompt_tokens)
        if self.base_model_torch_dtype is not None:
            past_key_values = past_key_values.to(self.base_model_torch_dtype)
        past_key_values = past_key_values.view(
            batch_size,
            peft_config.num_virtual_tokens,
            peft_config.num_layers * 2,
            peft_config.num_attention_heads,
            peft_config.token_dim // peft_config.num_attention_heads,
        )
        if peft_config.num_transformer_submodules == 2:
            past_key_values = torch.cat([past_key_values, past_key_values], dim=2)
        past_key_values = past_key_values.permute([2, 0, 3, 1, 4]).split(
            peft_config.num_transformer_submodules * 2
        )
        if TRANSFORMERS_MODELS_TO_PREFIX_TUNING_POSTPROCESS_MAPPING.get(self.config.model_type, None) is not None:
            post_process_fn = TRANSFORMERS_MODELS_TO_PREFIX_TUNING_POSTPROCESS_MAPPING[self.config.model_type]
            past_key_values = post_process_fn(past_key_values)

        ############################## Workaround for PEFT 0.13 ##############################
        past_key_values = DynamicCache.from_legacy_cache(past_key_values)
        ######################################################################################

        return past_key_values
    else:
        if peft_config.peft_type == PeftType.MULTITASK_PROMPT_TUNING:
            prompts = prompt_encoder(prompt_tokens, task_ids)
        else:
            if peft_config.inference_mode:
                prompts = prompt_encoder.embedding.weight.repeat(batch_size, 1, 1)
            else:
                prompts = prompt_encoder(prompt_tokens)
        return prompts
  2. Monkey patch get_prompt on PeftModel using gorilla
import gorilla
from peft import PeftModel

# Patch PeftModel.get_prompt so that all instances use custom_get_prompt.
patch = gorilla.Patch(PeftModel, 'get_prompt', custom_get_prompt, settings=gorilla.Settings(allow_hit=True))
gorilla.apply(patch)
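
(For reference, if gorilla is not available, the same override can be applied by assigning the function directly on the class; a sketch, assuming custom_get_prompt from above is in scope:)

import peft

# Plain monkey patch: every PeftModel instance will now use custom_get_prompt.
peft.PeftModel.get_prompt = custom_get_prompt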

Then, the training works on my side. Thank you again for your assistance!

@BenjaminBossan
Member Author

Thanks a lot @JerryLife this really helps us move forward.

Modules: peft==0.13

If you could give me access to PEFT version 0.13, that would make my life a lot easier ;-)

@JerryLife

JerryLife commented Sep 14, 2024

@BenjaminBossan I am sorry for the typo. I have corrected it to peft==0.12.0, and I am looking forward to 0.13 in the near future ;). Thank you for your efforts!


github-actions bot commented Oct 8, 2024

This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.

@BenjaminBossan
Member Author

not stale

BenjaminBossan added a commit that referenced this issue Oct 24, 2024
See #869, #1962

Fix several issues caused by changes to cache in transformers. In
particular, past_key_values for prefix tuning is now converted to a
transformers Cache instance.

---------

Co-authored-by: Raushan Turganbay <[email protected]>