
Create a small tutorial on how to accelerate HF Llama models with Transformer-Engine #615

Merged

Conversation

@sudhakarsingh27 (Collaborator) commented Jan 19, 2024

  1. Add a te_llama.py file which contains the following:
    • TELlamaDecoderLayer: a wrapper over TE's TransformerLayer which replaces HF's LlamaDecoderLayer.
    • TELlamaForCausalLM: creates the language model with TELlamaDecoderLayer instead of LlamaDecoderLayer.
    • from_pretrained_local: loads HF Llama 2 checkpoint weights (which are meant for LlamaDecoderLayer) into TELlamaDecoderLayer (ultimately TE's TransformerLayer).
  2. Add a tutorial as a Jupyter notebook, tutorial_accelerate_hf_llama_with_te.ipynb, that showcases how to use the new TELlamaDecoderLayer and shows some basic benchmarks on H100 GPUs.
  3. Add a utils.py file which contains the data-loading and model-setup code needed to run the tutorial notebook seamlessly.

@ptrendx ptrendx added the 1.4.0 label Jan 30, 2024
Signed-off-by: Sudhakar Singh <[email protected]>
@sudhakarsingh27 sudhakarsingh27 force-pushed the llama_accelerate_tutorial branch from 86117f1 to 133e33d Compare February 16, 2024 12:50
@ptrendx (Member) commented Feb 16, 2024

Please add a link to the tutorial to https://github.com/NVIDIA/TransformerEngine/blob/main/docs/index.rst

@sudhakarsingh27 sudhakarsingh27 force-pushed the llama_accelerate_tutorial branch from 7e21190 to 1af8871 Compare February 20, 2024 22:23
@sudhakarsingh27 sudhakarsingh27 force-pushed the llama_accelerate_tutorial branch from 40e46e0 to ad4241d Compare February 26, 2024 21:02
@sudhakarsingh27 sudhakarsingh27 marked this pull request as ready for review February 26, 2024 21:05
Collaborator commented: 😎

Collaborator (Author) commented: 😎

@timmoon10 timmoon10 self-requested a review March 1, 2024 02:18
@ptrendx (Member) left a review comment:

From my side LGTM, thank you!

@sudhakarsingh27 sudhakarsingh27 merged commit 0bd84ed into NVIDIA:main Mar 1, 2024
9 checks passed
ptrendx pushed a commit that referenced this pull request Mar 1, 2024
3 participants