[JAX] Refine MHA API and add DPA API #653
Conversation
/te-ci jax
/te-ci jax
@denera, thanks for the review. Right, we currently have a lot of duplicated code for the three different qkv layouts; I think I can reorganize the code in transformer_engine/jax/csrc/modules.cpp to make it cleaner. However, I'm afraid we can't easily reduce the code in transformer_engine/jax/cpp_extensions.py, since the core difference between the FusedAttn classes and the ScaledSoftmax classes is that the number of arguments is not the same. You can see that ScaledMaskSoftmax has a different number of inputs, so it has to override the entire
@zlsh80826 I think that's fair. I agree that it's likely too much to do in this PR. I wanted to bring it up as a question just to have it on our radar. Otherwise I think the rest of it looks good. Thanks!!
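For context, here is a rough, hypothetical sketch of the argument-count mismatch discussed above. The class and method names are illustrative only and are not the actual classes in transformer_engine/jax/cpp_extensions.py; the sketch only shows why a shared base method cannot easily factor out the per-layout logic when the operand lists differ.

```python
# Hypothetical sketch: primitives whose abstract rules take different numbers of
# operands cannot share one abstract() implementation; each subclass must
# re-implement it for its own operand list.
from abc import ABC, abstractmethod


class BasePrimitive(ABC):
    @staticmethod
    @abstractmethod
    def abstract(*args):
        """Shape/dtype inference; the operand arity differs per subclass."""


class PackedQKVAttnPrimitive(BasePrimitive):
    @staticmethod
    def abstract(qkv, bias, mask, seed):  # four operands for the packed-qkv layout
        ...


class SeparateKVAttnPrimitive(BasePrimitive):
    @staticmethod
    def abstract(q, kv, mask, seed):  # a different operand list, so the whole
        ...                           # method is re-implemented rather than shared
```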
/te-ci
/te-ci
/te-ci
LGTM
/te-ci
JET CI verified.
Merging this PR. Thanks @zlsh80826!
This PR does the following changes:

1. Refine the MHA API to align with the TransformerLayer API (a hedged usage sketch of the renamed arguments follows this list):
   a. Replace `num_heads` with `num_attention_heads`.
   b. Replace `dropout_rate` with `attention_dropout`.
   c. Replace `output_layernorm` with `input_layernorm`, where this parameter is used to apply the layernorm to the inputs.
   d. Replace `apply_residual_connection_post_layernorm` with `return_layernorm_output`.
   e. Replace `fuse_qkv` with `fused_qkv_params`.
   f. The old parameters are marked as deprecated and will be removed in the future.
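Below is a minimal sketch of the renamed arguments. It assumes the module lives at `transformer_engine.jax.flax.MultiHeadAttention`, and the extra constructor arguments (`head_dim`), tensor shapes, and the call signature are illustrative assumptions, not taken from this PR.

```python
# Hedged sketch of the renamed MHA arguments; only the new argument names come
# from this PR, everything else is assumed for illustration.
import jax
import jax.numpy as jnp
from transformer_engine.jax.flax import MultiHeadAttention  # assumed module path

mha = MultiHeadAttention(
    num_attention_heads=16,         # was: num_heads
    head_dim=64,                    # assumed extra argument
    attention_dropout=0.1,          # was: dropout_rate
    input_layernorm=True,           # was: output_layernorm
    return_layernorm_output=False,  # was: apply_residual_connection_post_layernorm
    fused_qkv_params=True,          # was: fuse_qkv
)

x = jnp.zeros((2, 128, 1024), dtype=jnp.bfloat16)  # [batch, seqlen, hidden]; assumed shape
params = mha.init(jax.random.PRNGKey(0), x, x)     # assumed __call__ signature (q input, kv input)
out = mha.apply(params, x, x)
```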
2. Add the DotProductAttention module API.
   Since JAX doesn't have the view/stride concept, all tensors are always in contiguous memory format. To provide a unified API for `qkvpacked`, `kvpacked`, and `qkv separate` (with 1, 2, and 3 input tensors respectively), this module accepts a `qkv_layout` argument to parse the `query`, `key`, and `value` tensors for the different input combinations. See the sketch after this list.
3. Support the `BSHD_BSHD_BSHD` fused attention custom call.
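Below is a minimal sketch of the new DotProductAttention module with an explicit `qkv_layout`. The module path, constructor arguments, call signature, and tensor shapes are assumptions for illustration; only the `qkv_layout` argument and the `BSHD_BSHD_BSHD` layout name come from this PR, and the exact layout spelling accepted by the module is itself an assumption.

```python
# Hedged sketch: three separate q/k/v tensors map to the BSHD_BSHD_BSHD layout
# added by this PR; packed variants would pass fewer input tensors instead.
import jax
import jax.numpy as jnp
from transformer_engine.jax.flax import DotProductAttention  # assumed module path

b, s, h, d = 2, 128, 16, 64
q = jnp.zeros((b, s, h, d), dtype=jnp.bfloat16)
k = jnp.zeros((b, s, h, d), dtype=jnp.bfloat16)
v = jnp.zeros((b, s, h, d), dtype=jnp.bfloat16)

dpa = DotProductAttention(
    num_attention_heads=h,        # assumed argument name, mirroring the MHA rename
    head_dim=d,                   # assumed argument
    attention_dropout=0.0,        # assumed argument
    qkv_layout='BSHD_BSHD_BSHD',  # layout from this PR; exact spelling assumed
)
params = dpa.init(jax.random.PRNGKey(0), q, k, v)  # assumed __call__ signature
out = dpa.apply(params, q, k, v)
```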