Don't use _get_clones #2270

Open
ebsmothers opened this issue Jan 16, 2025 · 8 comments
Labels
best practice: Things we should be doing but aren't
community help wanted: We would love the community's help completing this issue

Comments

@ebsmothers
Contributor

We've used the _get_clones utility a lot for cleaner instantiation of our TransformerDecoder from a single layer. However, we shouldn't use it in cases that require random initialization and don't subsequently override the params with a state dict load (i.e. what we do for LoRA when we're not resuming from an intermediate checkpoint). The following script demonstrates why: cloned modules share their initial values, so with _get_clones (and no later weight load to override the init) every layer ends up identical.

import torch
from torchtune.modules.transformer import _get_clones
from torchtune.modules.peft import LoRALinear

def main():
    # Constructing each layer separately gives every layer its own random init
    loras_loop = []
    for _ in range(4):
        loras_loop.append(LoRALinear(in_dim=16, out_dim=8, rank=4, alpha=1.0))

    # _get_clones copies a single, already-initialized layer
    loras_cloned = _get_clones(
        LoRALinear(in_dim=16, out_dim=8, rank=4, alpha=1.0), 4
    )

    loop_max_diff = torch.max(torch.abs(loras_loop[0].lora_a.weight - loras_loop[3].lora_a.weight))
    cloned_max_diff = torch.max(torch.abs(loras_cloned[0].lora_a.weight - loras_cloned[3].lora_a.weight))

    print(f"Max diff in for loop: {loop_max_diff}")
    print(f"Max diff with _get_clones: {cloned_max_diff}")

if __name__ == "__main__":
    main()


...

Max diff in for loop: 0.4612632691860199
Max diff with _get_clones: 0.0
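
For context, the identical values come from the fact that _get_clones deep-copies one module that has already been randomly initialized. A minimal sketch of that pattern (mirroring PyTorch's own helper; not necessarily torchtune's exact implementation):

import copy
from torch import nn

def _get_clones_sketch(module: nn.Module, n: int) -> nn.ModuleList:
    # Every clone is a deep copy of the same already-initialized module,
    # so all n layers start from identical parameter values.
    return nn.ModuleList([copy.deepcopy(module) for _ in range(n)])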
ebsmothers added the best practice label Jan 16, 2025
@RdoubleA
Contributor

@ebsmothers what would be the recommended alternative here? Could we just call reset_parameters or similar on all the layers in _get_clones?
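
Something like the following, just as a sketch (it assumes every relevant submodule actually exposes a reset_parameters()-style hook, which isn't guaranteed for our modules):

import copy
from torch import nn

def _get_clones_with_reinit(module: nn.Module, n: int) -> nn.ModuleList:
    clones = nn.ModuleList([copy.deepcopy(module) for _ in range(n)])
    for clone in clones:
        # Re-randomize every submodule that knows how to reset itself,
        # so the clones no longer share identical initial values.
        for m in clone.modules():
            if hasattr(m, "reset_parameters"):
                m.reset_parameters()
    return clones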

@Ankur-singh
Contributor

Dumb question: why can't we use a for loop? Does it have something to do with performance?

@RdoubleA
Contributor

The layer is already instantiated outside of TransformerDecoder / VisionTransformer, or wherever _get_clones is used, so we can't use a for loop to instantiate multiple layers unless we change the __init__ signatures to accept a ModuleList instead.

@ebsmothers
Contributor Author

@Ankur-singh not a dumb question at all. Actually it has nothing to do with performance and more to do with convenience. The idea of _get_clones is that you don't need a for loop: you can just pass a single layer plus a num_layers param and have the stack built for you. But if it messes with initialization, then it's probably better to always write the for loop.

@RdoubleA actually we do support passing a ModuleList now (at least for TransformerDecoder). We even have a bunch of models built this way already, see e.g. Llama 3.1. So really we just need to migrate the other builders onto this. (Personally I wouldn't recommend the reset_parameters approach, since it just introduces some extra API coupling.)
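
To make that concrete, the migration pattern is roughly the following (sketched here with the LoRALinear toy example from above rather than a real model builder; the exact TransformerDecoder arguments are omitted):

from torch import nn
from torchtune.modules.peft import LoRALinear

# Each layer is constructed inside the loop, so each one gets its own random init.
# The resulting ModuleList is then handed to the model, e.g. TransformerDecoder(layers=layers, ...).
layers = nn.ModuleList(
    LoRALinear(in_dim=16, out_dim=8, rank=4, alpha=1.0) for _ in range(4)
)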

joecummings changed the title to Don't use _get_clones Jan 16, 2025
RdoubleA added the community help wanted label Jan 16, 2025
@Ankur-singh
Contributor

The ability to pass a ModuleList is quite neat. We should update VisionTransformer.__init__ as well.

@ebsmothers after migrating other builders, will support for layers: nn.Module be dropped? If we plan to keep it around, then updating _get_clones to use a for-loop might be a good idea, as it would better accommodate the ModuleList format. This change could also improve flexibility and readability when managing layers in the model.

@ebsmothers
Contributor Author

@Ankur-singh agreed about updating VisionTransformer too. Nitpicking a bit but really we should probably drop support for n_layers (since ModuleList is still nn.Module). I don't think we can really keep _get_clones in its current format either, since the for loop approach requires that each Module be created inside the function rather than passed as an argument. (Though lmk if I'm misunderstanding your point)

@Ankur-singh
Contributor

the for loop approach requires that each Module be created inside the function rather than passed as an argument.

That's a very good point. For now, we can update _get_clones to use a for loop. Once all the models are updated to use the new builder API, we can drop support for n_layers.
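
To sketch what the for-loop version would have to look like (hypothetical, not an agreed-on API): since each module must be created inside the helper, it would take a factory/callable rather than a pre-built layer, something like:

from typing import Callable
from torch import nn

def _build_layers(layer_factory: Callable[[], nn.Module], n: int) -> nn.ModuleList:
    # Each factory call constructs (and randomly initializes) a fresh layer,
    # instead of deep-copying one layer n times.
    return nn.ModuleList(layer_factory() for _ in range(n))

# e.g. _build_layers(lambda: LoRALinear(in_dim=16, out_dim=8, rank=4, alpha=1.0), 4)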

Does that sound good? I can submit a PR updating all the model builders.

@ebsmothers
Contributor Author

Yes that sounds great. Thanks @Ankur-singh, looking forward to the PR!

Ankur-singh mentioned this issue Jan 19, 2025