Create a small tutorial on how to accelerate HF Llama models with Transformer-Engine #615
Conversation
Signed-off-by: Sudhakar Singh <[email protected]>
Please add a link to the tutorial to https://github.com/NVIDIA/TransformerEngine/blob/main/docs/index.rst
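For reference, Sphinx documentation indexes link pages through a `toctree` directive; the snippet below is only a sketch of what such an entry might look like (the caption text and the exact document path are assumptions, not taken from the repository's actual index.rst):

```rst
.. toctree::
   :caption: Examples and Tutorials

   examples/te_llama/tutorial_accelerate_hf_llama_with_te
```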
docs/examples/te_llama/tutorial_accelerate_hf_llama_with_te.ipynb
😎
From my side LGTM, thank you!
This PR adds:

- `te_llama.py`, a file which has the following code:
  - `TELlamaDecoderLayer`: a wrapper over TE's `TransformerLayer` which replaces HF's `LlamaDecoderLayer`.
  - `TELlamaForCausalLM`: creates the language model with `TELlamaDecoderLayer` instead of `LlamaDecoderLayer`.
  - `from_pretrained_local`: loads HF Llama 2 checkpoint weights (which are meant for `LlamaDecoderLayer`) into `TELlamaDecoderLayer` (ultimately TE's `TransformerLayer`).
- `tutorial_accelerate_hf_llama_with_te.ipynb`, which showcases how to use the new `TELlamaDecoderLayer` and also shows some basic benchmarks on H100 GPUs.
- `utils.py`, a file which contains the data loading and model setup code needed to run the tutorial notebook seamlessly.