[WIP] Add LoRA multihead attention module #1324

BenjaminBossan · 2024-01-05T13:00:56Z

First stab at adding LoRA support for nn.MultiheadAttention. See #761.

Todos:

- ~~For now, only works with _qkv_same_embed_dim=True -- make it work with False too.~~ _qkv_same_embed_dim=False is out of scope for this PR and can be added in a later PR if needed.
- Show that it works in a real world test: See user feedback on the issue.
- Unit tests
- ~~Docs~~ Apart from docstrings, I don't think anything else needs to be added

Update: I now also included the out_proj to apply LoRA to.

This is a simple test that I ran successfully with the PR in its current state:

import open_clip
import requests
import torch
from torch import nn
from peft import LoraConfig, get_peft_model
from PIL import Image
from peft.tuners.lora.layer import MultiheadAttention as PeftMha

model, preprocess = open_clip.create_model_from_pretrained('hf-hub:laion/CLIP-ViT-g-14-laion2B-s12B-b42K')
tokenizer = open_clip.get_tokenizer('hf-hub:laion/CLIP-ViT-g-14-laion2B-s12B-b42K')
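# target the nn.MultiheadAttention modules, which open_clip names "attn"
config = LoraConfig(target_modules=["attn"])
peft_model = get_peft_model(model, config)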
opt = torch.optim.SGD(peft_model.parameters(), 0.1)
print(len([m for m in peft_model.modules() if isinstance(m, PeftMha)]))  # 64 PEFT MHA layers
peft_model.print_trainable_parameters()  # trainable params: 2,588,672 || all params: 1,055,873,793 || trainable%: 0.24516869508096598

# text encoder
text = tokenizer(["a diagram", "a dog", "a cat"])
text_features = peft_model.encode_text(text)
loss = text_features.sum()
loss.backward()
opt.step()

# image encoder
url = 'http://images.cocodataset.org/val2017/000000039769.jpg'
image = Image.open(requests.get(url, stream=True).raw)
image = preprocess(image).unsqueeze(0)
image_features = model.encode_image(image)
image_features.sum().backward()
opt.step()

HuggingFaceDocBuilderDev · 2024-01-05T13:04:34Z

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

younesbelkada

Nice work! I left a few preliminary comments. I think we can go for the _restore_weights approach for now, as I don't see any other alternative.

younesbelkada · 2024-01-09T05:59:19Z

src/peft/tuners/lora/layer.py

+ lora_alpha: int = 1,
+ lora_dropout: float = 0.0,
+ fan_in_fan_out: bool = False, # Set this to True if the layer to replace stores weight like (fan_in, fan_out)
+ is_target_conv_1d_layer: bool = False,


Suggested change

is_target_conv_1d_layer: bool = False,

I don't think this is used?

younesbelkada · 2024-01-09T05:59:28Z

src/peft/tuners/lora/layer.py

+
+ self._active_adapter = adapter_name
+ self.update_layer(adapter_name, r, lora_alpha, lora_dropout, init_lora_weights, use_rslora)
+ self.is_target_conv_1d_layer = is_target_conv_1d_layer


Suggested change

self.is_target_conv_1d_layer = is_target_conv_1d_layer

We can also just hard-code it to False

younesbelkada · 2024-01-09T06:02:00Z

src/peft/tuners/lora/layer.py

+ self._restore_weights()
+ return super().state_dict(*args, **kwargs)
+
+ def named_modules(self, *args, **kwargs):


Do we also need to override the modules() method?

Not needed, as modules calls named_modules under the hood. I added a comment to that effect.
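The claim that modules goes through named_modules can be checked with a small sketch. TrackingModule below is a hypothetical toy class, not part of PEFT or of this PR; it only counts how often its own named_modules is invoked.

```python
from torch import nn


class TrackingModule(nn.Module):
    """Hypothetical toy module that counts calls to its named_modules."""

    def __init__(self):
        super().__init__()
        self.linear = nn.Linear(2, 2)
        self.named_modules_calls = 0

    def named_modules(self, *args, **kwargs):
        # Any side effect placed here (such as restoring weights) also runs
        # whenever modules() is called on this instance.
        self.named_modules_calls += 1
        return super().named_modules(*args, **kwargs)


tracker = TrackingModule()
found = list(tracker.modules())  # no explicit call to named_modules here
```

After the call to modules(), the counter is 1, so overriding named_modules alone should be enough for logic that has to run before either method walks the module tree.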

younesbelkada · 2024-01-09T06:04:55Z

src/peft/tuners/lora/model.py

@@ -193,11 +193,6 @@ def _replace_module(self, parent, child_name, new_module, child):
 if hasattr(child, "base_layer"):
 child = child.base_layer

- if not hasattr(new_module, "base_layer"):


Why has this been removed?

Sorry, forgot to put this into the description of the PR.

These lines have been obsolete for some time now. They only apply when we unload the model (otherwise, the if does not match). Remember that when we made the base_layer switch, we ensured that when unloading, we simply return the base_layer; there is no more need to create a new layer (say, a new nn.Linear when using lora.Linear) and replace the new layer's weight by the parent layer's weight. The base_layer already has the original weight. Therefore, these lines are unnecessary.

I removed them now because they were annoying with MultiheadAttention, because that layer has no weight attribute, so this line would fail.
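To illustrate the two points above, here is a minimal sketch. ToyWrapper and unload are hypothetical stand-ins for a LoRA layer and the unloading logic, not the actual PEFT code; only the nn.MultiheadAttention attributes are real.

```python
from torch import nn


class ToyWrapper(nn.Module):
    """Hypothetical stand-in for a LoRA layer following the base_layer pattern."""

    def __init__(self, base_layer):
        super().__init__()
        self.base_layer = base_layer  # the original, unmodified layer


def unload(module):
    # With the base_layer pattern, unloading just hands back the wrapped layer;
    # no new layer is created and no weight is copied.
    return module.base_layer if hasattr(module, "base_layer") else module


mha = nn.MultiheadAttention(embed_dim=8, num_heads=2)
wrapped = ToyWrapper(mha)
restored = unload(wrapped)

# nn.MultiheadAttention exposes in_proj_weight and out_proj.weight,
# but no plain `weight` attribute, so code that reads `child.weight` fails.
has_weight = hasattr(mha, "weight")
has_in_proj_weight = hasattr(mha, "in_proj_weight")
has_out_proj_weight = hasattr(mha.out_proj, "weight")
```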

pacman100

Thank you Benjamin for adding support for torch MHA layer in LoRA, interesting way to use merge, forward and unmerge logic!

BenjaminBossan · 2024-01-10T10:37:41Z

@younesbelkada Have I addressed all your concerns?

I pinged the user who wanted to test it on their case. When it comes to docs, I didn't really find a place where we list all supported layers, so no update needed really.

Before, LoRA was applied only to the in_proj. Now it is also applied to the out_proj.

Unfortunately, there is no easy way to just apply a normal lora.Linear to the out_proj by targeting it with target_modules. If that worked, it would be much nicer to do that, so that users can decide for themselves if they want to apply LoRA to the out_proj or not. The reason why it doesn't work is twofold:

1. We cannot really control the order in which LoRA is applied, so when the LoRA adapter is injected to out_proj, the whole MHA layer may already be wrapped by lora.MultiheadAttention.
2. Even if we successfully applied a normal lora.Linear to the out_proj, it would not work correctly. This is because the forward method of out_proj is not used at all by nn.MultiheadAttention. Instead, it just passes the weight and bias to F.multi_head_attention_forward. Therefore, we must ensure that the weights are merged and unmerged correctly, same as for in_proj, and we cannot do that if we use a normal lora.Linear.

Note that the test test_merge_layers for MHA fails. This is most likely because of an existing bug in how merging is implemented, see PR huggingface#1355. Once that is merged, the test should pass.
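The point about out_proj's forward method being bypassed can be sketched as follows. This is a simplified illustration, not the code from this PR: the LoRA factors lora_A and lora_B, the rank and the scaling are made-up values, and the merge is done on weight.data without gradient flow to the LoRA factors, which the real implementation has to handle differently.

```python
import torch
from torch import nn

torch.manual_seed(0)
embed_dim, rank, scaling = 8, 2, 0.5  # illustrative values only
mha = nn.MultiheadAttention(embed_dim, num_heads=2, batch_first=True)
x = torch.rand(2, 5, embed_dim)

# 1) A forward hook on out_proj never fires, because nn.MultiheadAttention
#    reads out_proj.weight and out_proj.bias instead of calling out_proj(...).
hook_calls = []
handle = mha.out_proj.register_forward_hook(
    lambda module, args, output: hook_calls.append(1)
)
base_out, _ = mha(x, x, x)
handle.remove()

# 2) Merge -> forward -> unmerge: the only way to influence the output is to
#    change the weight itself before the forward call and undo it afterwards.
lora_A = torch.randn(rank, embed_dim)
lora_B = torch.randn(embed_dim, rank)
delta = (lora_B @ lora_A) * scaling

mha.out_proj.weight.data += delta   # merge
merged_out, _ = mha(x, x, x)
mha.out_proj.weight.data -= delta   # unmerge
restored_out, _ = mha(x, x, x)
```

The hook list stays empty, the merged output differs from the base output, and after unmerging the output matches the base output again up to floating point error.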

BenjaminBossan · 2024-01-12T16:21:54Z

Note: The test test_merge_layers for MHA fails. This is most likely because of an existing bug in how merging is implemented, see PR #1355. Once that is merged, the test should pass.

ambroser53 · 2024-01-23T15:53:00Z

Just want to bump a bunch of the issues I've mentioned in #761, but specifically the problem with requires_grad, reproducible in this repo

bghira · 2024-02-26T14:58:32Z

just wanted to bump this one because it's really the only way for tuning CLIP models after they are released.

BenjaminBossan · 2024-02-26T15:45:54Z

@bghira Do you happen to have a use case where you could test if this PR works and is working well enough speed-wise? I think the implementation could be ready to be merged but ideally we'd have someone with a real use case give it a try.

bghira · 2024-02-26T16:57:24Z

i do and i may be able to test it. stupid question but is the code example above complete? i dont see the hinge loss function

BenjaminBossan · 2024-02-26T17:13:09Z

stupid question but is the code example above complete? i dont see the hinge loss function

You mean the code right at the top? No, it's not complete at all, just a quick test to show that MHA is applied and the backward pass does not fail. This is not proper nor complete training code.

damian0815 · 2024-07-26T10:41:06Z

it's only happening after calling .forward() on the model (restoring the state dict before that works fine). moreover if i put a breakpoint on the line where the failing restore happens and execute set(model.state_dict().keys()).symmetric_difference(restore_state_dict.keys()) in the debugger, the result is an empty set().

BenjaminBossan · 2024-07-26T10:51:09Z

definitely useful, yes.

That's good to hear. Hopefully this PR can be merged some day so that we can have MHA support in PEFT proper, it's just that multihead attention is implemented in a way that makes applying LoRA very difficult and requires some hacks. To wit:

However, the restoring fails with

I think this is related to this part:

https://github.com/huggingface/peft/pull/1324/files#diff-24a141c266b7b714ae8fcc470f31bc283f7b0f5a671bbf6d5f092741fc374104R1290-R1294

Could you check if calling _restore_weights manually would solve the error?

The code would be something along these lines:

for module in model.modules():
    if isinstance(module, peft.tuners.lora.MultiheadAttention):
        module._restore_weights()

damian0815 · 2024-07-26T12:23:50Z

yes, that solved it - thanks (but i had to use peft.tuners.lora.layer.MultiheadAttention for the fully qualified module class)

BenjaminBossan · 2024-07-26T14:44:05Z

Great, thanks for confirming @damian0815, and sorry for the wrong path.

I tried to create a unit test based on the description you provided, I think I could reproduce your error. Could you quickly check if the test captures your situation?

@pytest.mark.xfail(strict=True)
def test_mha_load_init_model_first():
    # this test fails as it currently requires a workaround to pass, see test below
    # https://github.com/huggingface/peft/pull/1324#issuecomment-2252473980
    inputs = torch.rand(10, 10, 10)
    model = ModelMha()
    config = LoraConfig(target_modules=["mha"], init_lora_weights=False)
    model = get_peft_model(model, config).eval()
    restore_state_dict = {k: v.detach().cpu() for k, v in model.state_dict().items()}

    del model

    model = ModelMha()
    # inferencing with PEFT model first is necessary to trigger the error in load_state_dict
    model = get_peft_model(model, config)
    model(inputs)
    model.load_state_dict(restore_state_dict)


def test_mha_load_init_model_first_with_workaround():
    import peft

    inputs = torch.rand(10, 10, 10)
    model = ModelMha()
    config = LoraConfig(target_modules=["mha"], init_lora_weights=False)
    model = get_peft_model(model, config).eval()
    with torch.inference_mode():
        output_before = model(inputs)
        restore_state_dict = {k: v.detach().cpu() for k, v in model.state_dict().items()}

    del model

    model = ModelMha()
    model = get_peft_model(model, config)
    model(inputs)

    # workaround, see test above
    for module in model.modules():
        if isinstance(module, peft.tuners.lora.layer.MultiheadAttention):
            module._restore_weights()

    model.load_state_dict(restore_state_dict)
    with torch.inference_mode():
        output_after = model(inputs)

    assert torch.allclose(output_before, output_after)

Unfortunately, I could not find a way to hook into load_state_dict to automatically call _restore_weights, since load_state_dict is not recursive, so the PEFT MultiheadAttention is never directly invoked :( I hope this is enough of an edge case that I can ignore it for now.
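The reason a hook via an overridden load_state_dict does not help can be shown with a small sketch. CountingLinear is a hypothetical toy layer that merely counts calls to its own load_state_dict; it is not part of PEFT.

```python
from torch import nn


class CountingLinear(nn.Linear):
    """Hypothetical child layer that counts calls to its own load_state_dict."""

    load_calls = 0

    def load_state_dict(self, *args, **kwargs):
        type(self).load_calls += 1
        return super().load_state_dict(*args, **kwargs)


parent = nn.Sequential(CountingLinear(4, 4))
state = {k: v.clone() for k, v in parent.state_dict().items()}

# Loading through the parent fills the child's parameters, but the child's
# load_state_dict override is never invoked along the way.
result = parent.load_state_dict(state)
calls_via_parent = CountingLinear.load_calls

# Only a direct call on the child reaches the override.
parent[0].load_state_dict(parent[0].state_dict())
calls_direct = CountingLinear.load_calls
```

Since the override on the child is skipped when loading through the top-level model, the manual loop over model.modules() that calls _restore_weights remains the practical workaround.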

damian0815 · 2024-07-26T14:53:01Z

looks about right - we're not deleting/reloading the model in-between though, simply messing with the weights (doing a blend with the base model -- which is in fact disabled when LoRA training is active, but the save/restore logic runs anyway) and then restoring the weights by loading the restore_state_dict in place.

github-actions · 2024-08-19T15:03:46Z

This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.

There was a bug in BOFT that made it impossible in some circumstances to load more than one adapter (creating more than 1 adapter was possible though). This was because a code path that adjusts boft_n_butterfly_factor was only visited when creating a fresh adapter, but not when updating with the 2nd adapter. This was fixed by moving this code path from the BOFT layer's __init__ method to update_layer. A test for loading multiple adapters was added. Since this was a gap in our test suite, this test will be applied to all appropriate PEFT methods, not only BOFT, but the other methods are all passing without needing further changes. For good measure, I also added BOFT to the test suite that checks multiple active adapters. These tests would have also passed without the fix in this PR, since these tests do not load multiple adapters but instead create them, which always worked. Still it's better to have these tests as well.

review-notebook-app · 2024-09-18T13:09:17Z

Check out this pull request on ReviewNB

See visual diffs & provide feedback on Jupyter Notebooks.

Powered by ReviewNB

github-actions · 2024-10-13T15:03:40Z

This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.

mashijie1028 · 2024-10-20T18:07:17Z

@BenjaminBossan
Hi! I found that LoRA does not work for in_proj_weight in attn of open_clip. I was wondering how to fix this.
To be more specific, when I implement LoRA as follows:

lora_config = LoraConfig(
    r=16,
    target_modules=["in_proj_weight"],
    lora_alpha=32,
    lora_dropout=0.05
)

An error occurs as ValueError: Target modules {'in_proj_weight'} not found in the base model. Please check the target modules and try again.
But when I implement for out_proj, LoRA works fine!
Could you please tell me how to set target_modules in LoraConfig to implement LoRA on attn layers? Thanks!

By the way, I installed peft as you mentioned before:

python -m pip install git+https://github.com/BenjaminBossan/peft.git@feat-add-lora-multihead-attention

(I report the same issue here)
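A likely explanation for the error above is that target_modules is matched against module names, while in_proj_weight is a parameter of nn.MultiheadAttention and not a module. The sketch below uses a hypothetical ToyEncoder (the attribute name attn is chosen to mirror open_clip, which is an assumption here) to show the difference.

```python
from torch import nn


class ToyEncoder(nn.Module):
    """Hypothetical stand-in for a block that contains an MHA layer."""

    def __init__(self):
        super().__init__()
        self.attn = nn.MultiheadAttention(embed_dim=8, num_heads=2)
        self.mlp = nn.Linear(8, 8)


model = ToyEncoder()
module_names = {name for name, _ in model.named_modules()}
param_names = {name for name, _ in model.named_parameters()}

# in_proj_weight only exists as a parameter name, so nothing can match it
# as a target module; out_proj is a real submodule and therefore matches.
in_proj_is_param = "attn.in_proj_weight" in param_names
in_proj_is_module = any(name.endswith("in_proj_weight") for name in module_names)
out_proj_is_module = "attn.out_proj" in module_names
mha_is_module = "attn" in module_names
```

With the branch from this PR, the name to target would be that of the nn.MultiheadAttention module itself, in the same way the test in this thread uses target_modules=["mha"]. Note that, as explained earlier in the thread, a plain lora.Linear on out_proj can be created but is not expected to work correctly, because out_proj's forward method is never called.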

[WIP] Add LoRA multihead attention module

49fab86

For now, only works with _qkv_same_embed_dim=True.

BenjaminBossan mentioned this pull request Jan 5, 2024

fine-tuning OpenClip with Hugingface's PEFT (such as LoRA) #761

Open

BenjaminBossan added 5 commits January 5, 2024 14:08

Make style

d8e9589

Remove commented code

0e188a3

Remove assignment of weight to new module

b409d81

This is no longer necessary when unloading the model because the base_layer is already the original layer. This is just a leftover from before we adopted the base_layer pattern.

Make state_dict and named_parameters work

173062c

There was a bug because the removal of the parameter resulted in it no longer appearing in the state_dict and named_parameters. This commit fixes this bug. The bug also exists in the referenced lora-torch library.
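The effect described in this commit message can be reproduced in isolation. The following is a sketch of the general PyTorch behavior, not of the fix itself: once a registered parameter is deleted and replaced by a plain tensor attribute, it disappears from state_dict and named_parameters until it is registered again.

```python
from torch import nn

mha = nn.MultiheadAttention(embed_dim=8, num_heads=2)
original = mha.in_proj_weight
keys_before = set(mha.state_dict())

# Swap the registered parameter for a plain tensor attribute, roughly what
# happens when a merged weight temporarily takes the parameter's place.
del mha.in_proj_weight
mha.in_proj_weight = original.detach().clone()
keys_while_removed = set(mha.state_dict())
params_while_removed = {name for name, _ in mha.named_parameters()}

# Re-register the parameter; the plain attribute has to be removed first,
# otherwise register_parameter refuses to overwrite it.
del mha.in_proj_weight
mha.register_parameter("in_proj_weight", original)
keys_after = set(mha.state_dict())
```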

Extend test coverage a bit

1e007f5

younesbelkada reviewed Jan 9, 2024

BenjaminBossan added 4 commits January 9, 2024 11:49

Clean ups after reviewer feedback:

557c4a1

- Some clarifying comments
- Remove fan_in_fan_out

Also:

- Raise proper error instead of assert

Reviewer feedback: removed another unnecessary arg

add1f51

Make style

e44e030

Merge branch 'main' into feat-add-lora-multihead-attention

8d62579

pacman100 approved these changes Jan 9, 2024

BenjaminBossan added 3 commits February 7, 2024 15:41

Merge branch 'main' into feat-add-lora-multihead-attention

9dc4a4d

Fix bug with incorrectly set gradient

c3fb2ce

Fix failing tests

17d407b

BenjaminBossan added 3 commits February 26, 2024 16:24

Merge branch 'main' into feat-add-lora-multihead-attention

4cbf6e9

Move to pytest style asserts

e0cae11

Fix safe merging code

52c8d9b

BenjaminBossan added 2 commits March 11, 2024 11:48

Merge branch 'main' into feat-add-lora-multihead-attention

977c84b

No need to set bias for MHA anymore, see huggingface#1530

96d376d

BenjaminBossan added 3 commits July 26, 2024 16:47

Better way of param initialization

fb18886

Add tests for broken loading and workaround

4ff2ec3

make style

d1f6ab2

BenjaminBossan and others added 14 commits September 3, 2024 16:54

Merge branch 'main' into feat-add-lora-multihead-attention

65363be

Fix wrong merge conflict resolution in test

7ba2e68

Ensure that base weights have requires_grad False

6ef04b0

Merge branch 'main' into feat-add-lora-multihead-attention

07c7240

Remove xpass-ing test

cc3ac3d

There was a situation where loading the state dict would fail and require a workaround. For this, there was an xfail-ing test with strict=True. This test no longer fails, so the marker has been removed, as well as the test with the workaround.

Merge branch 'main' into feat-add-lora-multihead-attention

03c466f

MAINT: Give stale bot permissions for PRs too (huggingface#2064)

e558caa

ENH BOFT don't save boft_P buffer (huggingface#2050)

38f4a98

The buffer does not need to be part of the checkpoint. By making it non-persistent, the file size can be greatly reduced.

FIX Command line args in PiSSA preprocess (huggingface#2053)

7e5c61d

Fix bug in parsing command line arguments in the PiSSA preprocess.py script from the PiSSA example.

MNT Update deprecated evaluation_strategy (huggingface#1664)

183bf52

In docs and examples, use eval_strategy instead of evaluation_strategy, which is deprecated.

ENH Multi adapters in same batch: modules_to_save (huggingface#1990)

b970607

Extend the functionality of having different adapters in the same batch to also work with `modules_to_save`.

TST Skip some quantization tests on XPU (huggingface#2074)

79e2b38

Eetq/hqq/aqlm don't support XPU yet.

Improve test coverage for initialization of MHA

61e6934

BenjaminBossan mentioned this pull request Sep 23, 2024

Prompt-Tuning for text-to-image diffusion models #2085

Open

Merge branch 'main' into feat-add-lora-multihead-attention

ced2f15

BenjaminBossan added 2 commits October 21, 2024 15:42

Fix bug with unloading multihead attention layer

4c31bbc

Fix bug in unloading

1dbb9a5

Params need to be re-registered to appear in state dict.
