Create a small tutorial on how to accelerate HF Llama models with Transformer-Engine #615
Conversation
Signed-off-by: Sudhakar Singh <[email protected]>
Please add a link to the tutorial to https://github.com/NVIDIA/TransformerEngine/blob/main/docs/index.rst
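For reference, Sphinx documentation indexes link pages through a `toctree` directive; the snippet below is only a sketch of what such an entry might look like (the caption text and the exact document path are assumptions, not taken from the repository's actual index.rst):

```rst
.. toctree::
   :caption: Examples and Tutorials

   examples/te_llama/tutorial_accelerate_hf_llama_with_te
```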
docs/examples/te_llama/tutorial_accelerate_hf_llama_with_te.ipynb
😎
From my side LGTM, thank you!
This PR adds:

- `te_llama.py`, a file which has the following code:
  - `TELlamaDecoderLayer`: a wrapper over TE's `TransformerLayer` which replaces HF's `LlamaDecoderLayer`.
  - `TELlamaForCausalLM`: creates the language model with `TELlamaDecoderLayer` instead of `LlamaDecoderLayer`.
  - `from_pretrained_local`: loads HF Llama 2 checkpoint weights (which are meant for `LlamaDecoderLayer`) into `TELlamaDecoderLayer` (ultimately TE's `TransformerLayer`).
- `tutorial_accelerate_hf_llama_with_te.ipynb`, which showcases how to use the new `TELlamaDecoderLayer` and also shows some basic benchmarks on H100 GPUs.
- `utils.py`, a file which contains the data loading and model setup code needed to run the tutorial notebook seamlessly.