
[WIP] Add LoRA multihead attention module #1324

Open · wants to merge 47 commits into base: main

Commits on Jan 5, 2024

  1. [WIP] Add LoRA multihead attention module

    For now, only works with _qkv_same_embed_dim=True (a sketch of this check follows after this list).
    BenjaminBossan committed Jan 5, 2024
    Commit SHA: 49fab86
  2. Make style

    BenjaminBossan committed Jan 5, 2024
    Commit SHA: d8e9589
  3. Remove commented code

    BenjaminBossan committed Jan 5, 2024
    Commit SHA: 0e188a3
  4. Remove assignment of weight to new module

    This is no longer necessary when unloading the model because the
    base_layer is already the original layer. This is just a leftover
    from before we adopted the base_layer pattern.
    BenjaminBossan committed Jan 5, 2024
    Commit SHA: b409d81
  5. Make state_dict and named_parameters work

    Removing the original parameter meant that it no longer appeared in the
    state_dict and named_parameters. This commit fixes that bug (a minimal
    sketch of the underlying issue follows after this list).

    The bug also exists in the referenced lora-torch library.
    BenjaminBossan committed Jan 5, 2024
    Commit SHA: 173062c
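On commit 1's restriction: `_qkv_same_embed_dim` is the private attribute that nn.MultiheadAttention sets to True when query, key, and value share the same embedding dimension, which is the only case where it uses a single packed in_proj_weight. A minimal sketch of how a wrapper might enforce this, assuming the attribute check is all that is needed (the function name and error message are made up):

```python
import torch.nn as nn

def ensure_supported_mha(mha: nn.MultiheadAttention) -> None:
    # When the q/k/v embedding dims differ, nn.MultiheadAttention uses separate
    # q/k/v projection weights instead of the packed in_proj_weight, which the
    # LoRA layer does not handle yet.
    if not mha._qkv_same_embed_dim:
        raise ValueError(
            "LoRA for nn.MultiheadAttention currently requires "
            "_qkv_same_embed_dim=True (equal embed dims for q, k and v)."
        )
```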
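Regarding commit 5, the cause of the bug is generic PyTorch behavior: deleting a parameter attribute also unregisters it from the module, so it silently disappears from state_dict() and named_parameters() until it is registered again. A minimal sketch of the failure mode, not the PR's actual fix:

```python
import torch.nn as nn

lin = nn.Linear(4, 4)
assert "weight" in lin.state_dict()

# Deleting the attribute unregisters the parameter from the module ...
weight = lin.weight
del lin.weight
assert "weight" not in lin.state_dict()

# ... so it has to be registered again to reappear in state_dict()
# and named_parameters().
lin.register_parameter("weight", nn.Parameter(weight.detach()))
assert "weight" in lin.state_dict()
```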

Commits on Jan 8, 2024

  1. Commit SHA: 1e007f5

Commits on Jan 9, 2024

  1. Clean ups after reviewer feedback:

    - Some clarifying comments
    - Remove fan_in_fan_out
    
    Also:
    
    - Raise proper error instead of assert
    BenjaminBossan committed Jan 9, 2024
    Commit SHA: 557c4a1
  2. Commit SHA: add1f51
  3. Make style

    BenjaminBossan committed Jan 9, 2024
    Commit SHA: e44e030
  4. Commit SHA: 8d62579

Commits on Jan 12, 2024

  1. Apply LoRA also to the out_proj of MHA

    Before, LoRA was applied only to the in_proj. Now it is also applied to
    the out_proj.
    
    Unfortunately, there is no easy way to just apply a normal lora.Linear
    to the out_proj by targeting it with target_modules. If that worked, it
    would be much nicer to do that, so that users can decide for themselves
    if they want to apply LoRA to the out_proj or not.
    
    The reason why it doesn't work is twofold:
    
    1. We cannot really control the order in which LoRA is applied, so when
       the LoRA adapter is injected to out_proj, the whole MHA layer may
       already be wrapped by lora.MultiheadAttention.
    2. Even if we successfully applied a normal lora.Linear to the out_proj,
       it would not work correctly. This is because the forward method of
       out_proj is not used at all by nn.MultiheadAttention. Instead, it
       just passes the weight and bias to F.multi_head_attention_forward.
       Therefore, we must ensure that the weights are merged and unmerged
       correctly, same as for in_proj, and we cannot do that if we use a
       normal lora.Linear.
    
    Note that the test test_merge_layers for MHA fails. This is most likely
    because of an existing bug in how merging is implemented, see PR huggingface#1355.
    Once that is merged, the test should pass.
    BenjaminBossan committed Jan 12, 2024
    Commit SHA: c5d8a6b
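The key constraint in the commit above is that nn.MultiheadAttention.forward passes self.out_proj.weight and self.out_proj.bias straight to F.multi_head_attention_forward, so a wrapped out_proj.forward is never called. The only way to make LoRA effective there is to merge the low-rank delta into the weight before the base forward runs and to unmerge it afterwards. A rough sketch of that idea, using the usual lora_A/lora_B/scaling naming rather than this PR's exact code:

```python
import torch
import torch.nn as nn

def forward_with_merged_out_proj(
    mha: nn.MultiheadAttention,
    lora_A: torch.Tensor,   # shape (r, embed_dim)
    lora_B: torch.Tensor,   # shape (embed_dim, r)
    scaling: float,
    *args,
    **kwargs,
):
    # out_proj.forward is bypassed by F.multi_head_attention_forward, so the
    # LoRA delta must live in the weight itself for the duration of the call.
    delta = (lora_B @ lora_A) * scaling
    mha.out_proj.weight.data += delta
    try:
        return mha(*args, **kwargs)
    finally:
        # Unmerge so the base weights are unchanged after the forward pass.
        mha.out_proj.weight.data -= delta
```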

Commits on Feb 7, 2024

  1. Commit SHA: 9dc4a4d
  2. Commit SHA: c3fb2ce
  3. Fix failing tests

    BenjaminBossan committed Feb 7, 2024
    Commit SHA: 17d407b

Commits on Feb 26, 2024

  1. Commit SHA: 4cbf6e9
  2. Commit SHA: e0cae11
  3. Fix safe merging code

    BenjaminBossan committed Feb 26, 2024
    Commit SHA: 52c8d9b

Commits on Mar 11, 2024

  1. Commit SHA: 977c84b
  2. Commit SHA: 96d376d

Commits on Mar 26, 2024

  1. Commit SHA: 0c17476
  2. Fix style

    BenjaminBossan committed Mar 26, 2024
    Commit SHA: 4b8db0c

Commits on May 21, 2024

  1. Commit SHA: 7e91712

Commits on Jul 25, 2024

  1. Commit SHA: e12070b
  2. Remove duplicate merge

    BenjaminBossan committed Jul 25, 2024
    Commit SHA: 7b6c7cb
  3. Raise error for multi adapter batch inference

    Not trivial to implement for MHA (a sketch of these guard errors follows after this list).
    BenjaminBossan committed Jul 25, 2024
    Commit SHA: e6ab8ed
  4. Raise error for DoRA + MHA

    Probably not hard to implement
    BenjaminBossan committed Jul 25, 2024
    Commit SHA: 8ec6c3c
  5. Commit SHA: f6ba465
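Commits 3 and 4 above add guard rails rather than functionality. Conceptually they amount to explicit errors on the unsupported code paths; a hypothetical sketch of such checks (the real ones live inside the PEFT LoRA layer code):

```python
def check_unsupported_mha_features(use_dora: bool, adapter_names=None) -> None:
    # Mixed adapter batches pass a per-sample list of adapter names, which is
    # not trivial to support for MultiheadAttention, so reject it explicitly.
    if adapter_names is not None:
        raise ValueError(
            "Mixed adapter batches are not supported for lora.MultiheadAttention."
        )
    # DoRA is likewise not wired up for this layer type yet.
    if use_dora:
        raise ValueError("DoRA is not supported for lora.MultiheadAttention.")
```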

Commits on Jul 26, 2024

  1. Commit SHA: fb18886
  2. Commit SHA: 4ff2ec3
  3. make style

    BenjaminBossan committed Jul 26, 2024
    Commit SHA: d1f6ab2

Commits on Sep 3, 2024

  1. Commit SHA: 65363be

Commits on Sep 4, 2024

  1. Commit SHA: 7ba2e68
  2. Commit SHA: 6ef04b0
  3. Commit SHA: 07c7240
  4. Remove xpass-ing test

    There was a situation where loading the state dict would fail and require
    a workaround. For this, there was an xfail-ing test with strict=True.
    This test no longer fails, so the marker has been removed, as well as
    the test with the workaround.
    BenjaminBossan committed Sep 4, 2024
    Commit SHA: cc3ac3d
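For context on commit 4: the state dict loading test is now expected to pass as-is, so neither the xfail marker with strict=True nor the workaround variant is needed anymore. A hypothetical sketch of the remaining test shape (the model builder is a stand-in, not the PR's test code):

```python
import torch.nn as nn

def make_model() -> nn.Module:
    # Hypothetical stand-in for building the model under test.
    return nn.Linear(8, 8)

def test_load_state_dict_strict():
    # Previously decorated with @pytest.mark.xfail(strict=True); the marker
    # was removed once strict loading stopped failing.
    model_a, model_b = make_model(), make_model()
    # With strict=True, any missing or unexpected key raises a RuntimeError.
    model_b.load_state_dict(model_a.state_dict(), strict=True)
```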

Commits on Sep 12, 2024

  1. Commit SHA: 03c466f

Commits on Sep 18, 2024

  1. Commit SHA: e558caa
  2. ENH BOFT don't save boft_P buffer (huggingface#2050)

    The buffer does not need to be part of the checkpoint; by making it
    non-persistent, the file size can be greatly reduced (see the
    non-persistent buffer sketch after this list).
    sywangyi authored and BenjaminBossan committed Sep 18, 2024
    Commit SHA: 38f4a98
  3. FIX Command line args in PiSSA preprocess (huggingface#2053)

    Fix a bug in parsing command line arguments in the preprocess.py script
    from the PiSSA example.
    keakon authored and BenjaminBossan committed Sep 18, 2024
    Commit SHA: 7e5c61d
  4. MNT Update deprecated evaluation_strategy (huggingface#1664)

    In docs and examples, use eval_strategy instead of evaluation_strategy,
    which is deprecated (see the TrainingArguments sketch after this list).
    muellerzr authored and BenjaminBossan committed Sep 18, 2024
    Commit SHA: 183bf52
  5. ENH Multi adapters in same batch: modules_to_save (huggingface#1990)

    Extend the functionality of having different adapters in the same batch to also
    work with `modules_to_save` (a usage sketch follows after this list).
    saeid93 authored and BenjaminBossan committed Sep 18, 2024
    Commit SHA: b970607
  6. FIX Bug that prevents BOFT from loading 2 adapters (huggingface#2068)

    There was a bug in BOFT that made it impossible in some circumstances to
    load more than one adapter (creating more than 1 adapter was possible
    though). This was because a code path that adjusts
    boft_n_butterfly_factor was only visited when creating a fresh adapter,
    but not when updating with the second adapter. This was fixed by moving
    this code path from the BOFT layer's __init__ method to update_layer
    (the failing load workflow is sketched after this list).
    
    A test for loading multiple adapters was added. Since this was a gap in
    our test suite, this test will be applied to all appropriate PEFT
    methods, not only BOFT, but the other methods are all passing without
    needing further changes.
    
    For good measure, I also added BOFT to the test suite that checks
    multiple active adapters. These tests would have also passed without the
    fix in this PR, since these tests do not load multiple adapters but
    instead create them, which always worked. Still it's better to have
    these tests as well.
    BenjaminBossan committed Sep 18, 2024
    Commit SHA: 732e8e7
  7. TST Skip some quantization tests on XPU (huggingface#2074)

    Eetq/hqq/aqlm don't support XPU yet.
    faaany authored and BenjaminBossan committed Sep 18, 2024
    Commit SHA: 79e2b38
  8. Commit SHA: 61e6934
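The mechanism behind commit 2 ("ENH BOFT don't save boft_P buffer") is PyTorch's non-persistent buffers: a buffer registered with persistent=False still moves with the module (e.g. on .to(device)) but is excluded from state_dict(), so it never ends up in the checkpoint. A generic illustration, not the BOFT layer itself:

```python
import torch
import torch.nn as nn

class LayerWithDerivedBuffer(nn.Module):
    def __init__(self, dim: int = 4):
        super().__init__()
        # The tensor can be recomputed deterministically, so it does not need
        # to be saved; persistent=False keeps it out of state_dict().
        self.register_buffer("perm", torch.eye(dim), persistent=False)

layer = LayerWithDerivedBuffer()
assert "perm" not in layer.state_dict()
```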
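Commit 4 tracks the rename in transformers.TrainingArguments, where evaluation_strategy was deprecated in favor of eval_strategy in recent transformers versions. The updated usage looks like:

```python
from transformers import TrainingArguments

args = TrainingArguments(
    output_dir="outputs",
    eval_strategy="epoch",  # formerly evaluation_strategy="epoch"
    per_device_train_batch_size=8,
)
```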
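Commit 5 extends PEFT's mixed-batch inference, where adapter_names selects an adapter per sample, so that it also covers modules_to_save. Roughly how the feature is used (the adapter names and the pre-built peft_model are assumptions here, not this PR's code):

```python
import torch

def mixed_adapter_forward(peft_model, inputs: dict[str, torch.Tensor]):
    # peft_model: a PeftModel with LoRA adapters "adapter_a" and "adapter_b"
    # already loaded (hypothetical names); inputs: a tokenized batch of 3 samples.
    return peft_model(
        **inputs,
        # One adapter name per sample; "__base__" routes a sample through
        # the unmodified base model.
        adapter_names=["adapter_a", "adapter_b", "__base__"],
    )
```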
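The scenario fixed in commit 6 is the load path rather than the create path: loading a second adapter goes through update_layer, which previously skipped the boft_n_butterfly_factor adjustment. In PEFT terms the failing workflow looked roughly like this (the adapter paths are placeholders and base_model is assumed to exist):

```python
from peft import PeftModel
from transformers import PreTrainedModel

def load_two_boft_adapters(base_model: PreTrainedModel) -> PeftModel:
    # The two directories are placeholders for previously trained BOFT adapters.
    peft_model = PeftModel.from_pretrained(base_model, "path/to/boft_adapter_1")
    # Loading a second adapter used to hit the boft_n_butterfly_factor bug,
    # because the adjustment only ran when a fresh adapter was created.
    peft_model.load_adapter("path/to/boft_adapter_2", adapter_name="second")
    return peft_model
```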

Commits on Oct 14, 2024

  1. Commit SHA: ced2f15

Commits on Oct 21, 2024

  1. Commit SHA: 4c31bbc

Commits on Oct 22, 2024

  1. Fix bug in unloading

    Params need to be re-registered to appear in state dict.
    BenjaminBossan committed Oct 22, 2024
    Commit SHA: 1dbb9a5