
Commit ea9dd5f

aws-bowencc authored and hannanjgaws committed
1 parent 18b448f commit ea9dd5f

File tree

2 files changed (+2 -2 lines changed)


README.md (+1 -1)

@@ -419,7 +419,7 @@ for running HuggingFace `facebook/opt-13b` autoregressive sampling on a trn1.2xl
 - [OPT](https://huggingface.co/docs/transformers/model_doc/opt)
 - [GPT-Neox [Experimental]](https://huggingface.co/docs/transformers/model_doc/gpt_neox)
 - [Bloom [Experimental]](https://huggingface.co/docs/transformers/model_doc/bloom)
-- [LLaMA [Experimental]](https://huggingface.co/docs/transformers/main/model_doc/llama)
+- [LLaMA [Prototype]](https://huggingface.co/docs/transformers/main/model_doc/llama)
 
 # Upcoming features

releasenotes.md (+1 -1)

@@ -6,7 +6,7 @@ Date: 2023-07-03
 
 - [Experimental] Added support for GPT-NeoX models.
 - [Experimental] Added support for BLOOM models.
-- [Experimental] Added support for LLaMA models.
+- [Prototype] Added support for LLaMA models.
 - Added support for more flexible tensor-parallel configurations to GPT2, OPT, and BLOOM. Previously, there were two constraints on `tp_degree`: 1) the number of attention heads must be evenly divisible by `tp_degree`; 2) `tp_degree` must satisfy the runtime topology constraint for collective communication (i.e., Allreduce). For details on supported topologies, see [Tensor-parallelism support](README.md#tensor-parallelism-support) and https://awsdocs-neuron.readthedocs-hosted.com/en/latest/general/arch/neuron-features/collective-communication.html. Constraint 1) is now removed by using 1-axis padding.
 - Added multi-query / multi-group attention support for GPT2.
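The tensor-parallel note above says the head-divisibility constraint was removed via 1-axis padding. Below is a minimal sketch of that idea, assuming the head axis is zero-padded up to the next multiple of `tp_degree`; the helper name, numpy usage, and padding strategy are illustrative assumptions, not the transformers-neuronx implementation.

```python
# Illustrative sketch only: pad the attention-head axis so an arbitrary
# tp_degree evenly divides the head count. Names and padding strategy are
# assumptions, not the library's actual code.
import math

import numpy as np


def pad_heads_for_tp(proj_weight: np.ndarray, num_heads: int, tp_degree: int) -> np.ndarray:
    """Zero-pad a [hidden, num_heads * head_dim] projection so heads % tp_degree == 0."""
    head_dim = proj_weight.shape[1] // num_heads
    padded_heads = math.ceil(num_heads / tp_degree) * tp_degree
    extra_cols = (padded_heads - num_heads) * head_dim
    # Padded heads are all-zero, so (assuming the matching output projection is
    # also zero-padded) they contribute nothing to the attention output.
    return np.pad(proj_weight, ((0, 0), (0, extra_cols)))


# Example: GPT-2 small has 12 heads of dim 64. With tp_degree=8, heads are
# padded 12 -> 16 so each of the 8 ranks owns exactly 2 heads.
w = np.zeros((768, 12 * 64), dtype=np.float32)
assert pad_heads_for_tp(w, num_heads=12, tp_degree=8).shape == (768, 16 * 64)
```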
