Don't use _get_clones #2270

We've used the _get_clones utility a lot for cleaner instantiation of our TransformerDecoder from a single layer. However, we shouldn't use it for cases that require random initialization and don't subsequently override their params with a state dict load (i.e. what we do for LoRA when we're not resuming from an intermediate checkpoint). The following script demonstrates why: the initialized values for cloned modules will be the same across layers, so if we use _get_clones (and don't subsequently load in a weight to override the init values), all our layers have identical values.
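As a minimal illustration of the failure mode (a sketch, not the original script: nn.Linear stands in for a real transformer layer, and copy.deepcopy mimics what a _get_clones-style helper does):

```python
# Clone-style construction (deepcopy of one already-initialized layer)
# vs. constructing each layer independently in a loop.
import copy

import torch
import torch.nn as nn

torch.manual_seed(0)

layer = nn.Linear(8, 8)  # stand-in for a transformer layer

# _get_clones-style: every "layer" is a deepcopy of the same initialized module.
cloned = nn.ModuleList([copy.deepcopy(layer) for _ in range(4)])

# Loop-style: each layer runs its own random init.
fresh = nn.ModuleList([nn.Linear(8, 8) for _ in range(4)])

print(torch.equal(cloned[0].weight, cloned[1].weight))  # True: identical params across layers
print(torch.equal(fresh[0].weight, fresh[1].weight))    # False: independent inits
```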
Comments
@ebsmothers what would be the recommended alternative here? Could we just call […]

Dumb question: why can't we use […]

The layer is already instantiated outside of […]

@Ankur-singh not a dumb question at all. Actually it has nothing to do with performance but more to do with convenience. The idea of […]

@RdoubleA actually we do support it now (at least for TransformerDecoder). We even have a bunch of models being built in this way already, see e.g. Llama 3.1. So really we just need to migrate other builders onto this. (Personally I wouldn't recommend the […]
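For illustration, a rough sketch of the loop-based construction pattern being described (the exact torchtune builder and TransformerDecoder signatures may differ; nn.TransformerEncoderLayer is only a placeholder layer here):

```python
# Sketch: build each decoder layer from scratch in a loop and pass the resulting
# nn.ModuleList to the model, instead of deep-copying one instantiated layer.
import torch.nn as nn

def build_layers(num_layers: int, layer_factory) -> nn.ModuleList:
    # Each call to layer_factory() re-runs random init, so layers start independent.
    return nn.ModuleList([layer_factory() for _ in range(num_layers)])

# Placeholder layer type; a real builder would construct the project's own layer class.
layers = build_layers(
    num_layers=4,
    layer_factory=lambda: nn.TransformerEncoderLayer(d_model=64, nhead=4, batch_first=True),
)
```

Because each layer is constructed fresh, its random init is independent of the others, which is exactly what deepcopy-based cloning loses.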
Ability to pass […]

@ebsmothers after migrating other builders, will support for […]

@Ankur-singh agreed about updating […]

That's a very good point. For now, we can update […] Does that sound good? I can submit a PR updating all the model builders.

Yes, that sounds great. Thanks @Ankur-singh, looking forward to the PR!