(Part 2) feat: allow for tp_size attr for tplizing the model #37054

SunMarc merged 5 commits into huggingface:main
Conversation
Hi 👋, thank you for opening this pull request! The pull request is converted to draft by default. The CI will be paused while the PR is in draft mode. When it is ready for review, please click the "Ready for review" button.
SunMarc left a comment
Thanks, left a couple of comments
```
tp_size (`str`, *optional*):
    A torch tensor parallel degree. If not provided would default to world size.
```
Not needed for this specific PR. I don't know if we want to add this option yet cc @ArthurZucker
We can have it in a separate PR as well; however, it's needed to support TP + FSDP/DDP.

> I don't know if we want to add this option yet

Sure, @ArthurZucker let me know your thoughts.
@SunMarc Would appreciate a review here; I've been looking at enabling TP + FSDP and this is exactly what I used myself.
cc @ArthurZucker
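For illustration, the default-to-world-size behavior described in the docstring could be sketched as follows. This is a hypothetical helper, not the PR's actual code: the `resolve_tp_size` name is invented, and the `WORLD_SIZE` environment variable stands in for `torch.distributed.get_world_size()` so the sketch runs without a distributed setup.

```python
import os


def resolve_tp_size(tp_size=None):
    # Hypothetical helper: when tp_size is not given, fall back to the
    # distributed world size. WORLD_SIZE is read here as a stand-in for
    # torch.distributed.get_world_size(), which requires an initialized
    # process group.
    if tp_size is not None:
        return tp_size
    return int(os.environ.get("WORLD_SIZE", 1))
```

With this default, passing `tp_size=2` on an 8-GPU run would shard the model over only 2 ranks, leaving the remaining parallelism available for FSDP/DDP replication.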
Please fix the conflicts and I will merge this PR!
Force-pushed from a33e9ef to b7abb2a
@SunMarc Fixed the conflicts, and the failing test seems to be unrelated. Thanks
@SunMarc Looks like even the recently merged commit is failing for this test case, so it's totally unrelated to this PR.
```python
import torch

from transformers import AutoModelForCausalLM

m2 = AutoModelForCausalLM.from_pretrained("TinyLlama/TinyLlama-1.1B-Chat-v1.0", tp_plan=None)
m = AutoModelForCausalLM.from_pretrained("TinyLlama/TinyLlama-1.1B-Chat-v1.0", tp_plan="auto")

ft = m.lm_head.weight.full_tensor().to("cpu")
assert torch.equal(ft, m2.lm_head.weight.to("cpu"))
```
let's add this in the tensor_parallel test file instead of having this here. Please also add a description of what you are trying to do
Apologies, this file was not intended for this PR, so I have removed it. Thanks.
```python
generation_config = kwargs.pop("generation_config", None)
gguf_file = kwargs.pop("gguf_file", None)
tp_plan = kwargs.pop("tp_plan", None)
tp_size = kwargs.pop("tp_size", None)
```
let's raise an error if tp_size was set but not tp_plan
@SunMarc Addressed this comment, thank you.
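The requested check (error out when `tp_size` is passed without `tp_plan`) could look roughly like this minimal sketch. The `validate_tp_kwargs` function is hypothetical and not the code merged in the PR; in `from_pretrained` the check would operate on the popped kwargs shown above.

```python
def validate_tp_kwargs(tp_plan=None, tp_size=None):
    # Hypothetical sketch of the review suggestion: tp_size only makes
    # sense together with a tensor-parallel plan, so reject a lone tp_size.
    if tp_size is not None and tp_plan is None:
        raise ValueError("tp_plan has to be set when tp_size is passed.")
    return tp_plan, tp_size
```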
Force-pushed from 307fc4e to 33af129

Force-pushed from ccf1889 to 43bb071
Signed-off-by: Mehant Kammakomati <mehant.kammakomati2@ibm.com>
@SunMarc Rebased the branch; are we waiting on anything?
Waiting for the tests to pass ;) I will merge it as soon as the CI is green!
The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.
…face#37054)

* feat: custom tp_size, new transformers tp interface
* fix: review cmt - error when tp_plan not set for tp_size
* fix: nit in docs

Signed-off-by: Mehant Kammakomati <mehant.kammakomati2@ibm.com>
Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com>
Co-authored-by: Matej Sirovatka <54212263+S1ro1@users.noreply.github.com>
What does this PR do?
Discussed at huggingface/accelerate#3457
- Adds `tp_size` to allow for TP sharding apart from world size.
- Makes `tp_size` an attribute of the model, initialized only after TP sharding completes, which can serve as an indicator for accelerate that the model has undergone TP sharding (discussed with @SunMarc).
- Removes `tp_size` from train arguments, since from now on TP training is performed only if the model has already undergone TP sharding.

Before submitting
- Did you read the contributor guideline, Pull Request section?
- Was this discussed/approved via a Github issue or the forum? Please add a link to it if that's the case.
- Did you make sure to update the documentation with your changes? Here are the documentation guidelines, and here are tips on formatting docstrings.
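The indicator behavior described in the summary can be sketched as follows. The `DummyModel` class and `is_tp_sharded` helper are purely illustrative stand-ins, not transformers or accelerate internals; the point is only that a downstream library can branch on the presence of the `tp_size` attribute the PR sets after sharding.

```python
class DummyModel:
    """Illustrative stand-in for a transformers model instance."""
    pass


def is_tp_sharded(model):
    # Per the PR description, model.tp_size is initialized only after TP
    # sharding completes, so its presence can signal a sharded model to
    # downstream code (e.g. accelerate deciding how to wrap for FSDP/DDP).
    return getattr(model, "tp_size", None) is not None


m = DummyModel()
print(is_tp_sharded(m))  # → False, model was loaded without TP sharding
m.tp_size = 4            # what loading with a TP plan would set
print(is_tp_sharded(m))  # → True, TP sharding has been applied
```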
Who can review?
Anyone in the community is free to review the PR once the tests have passed. Feel free to tag
members/contributors who may be interested in your PR.