Translate the repetition pattern to expand and reshape #3645
Conversation
!test --diff
!test --diff
!test --diff
auto tv2 = cat({tv0, tv0}, 1);

fusion.addOutput(tv1);
fusion.addOutput(tv2);
Out of curiosity, when we say there's nothing to allow the output IDs to be mapped, is it saying that the IDs of `tv1` and `tv2` are disconnected? Since the iter domains of `tv1` and `tv2` are produced by the same IDs with constants, wouldn't IdModel still be able to identify them as an exact mapping?
First of all, the comment at line 911 is wrong. Both cat ops use the same ID.
Yes, `tv1` and `tv2` are disconnected. No property that we use to build IdModel would detect that the two IDs of the tensors have the same extent. That doesn't mean we can't extend IdModel to detect it; it's just not how it's implemented at this moment. Unless there's a motivating case, I don't think we need to worry too much about it.
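For reference, here is a minimal sketch of the kind of fusion being discussed. It is a reconstruction rather than the actual test: a square input and the usual test helpers (`makeConcreteTensor`, `FusionGuard`) are assumed so that the two concatenated extents happen to be equal.

```cpp
// Sketch only: two cat ops consuming the same input along different dims.
Fusion fusion;
FusionGuard fg(&fusion);

auto tv0 = makeConcreteTensor({4, 4}); // assumed shape
fusion.addInput(tv0);

auto tv1 = cat({tv0, tv0}, 0); // [8, 4]
auto tv2 = cat({tv0, tv0}, 1); // [4, 8]

fusion.addOutput(tv1);
fusion.addOutput(tv2);

// tv1's dim-0 ID and tv2's dim-1 ID both end up with extent 8, but equal
// extents are not a property IdModel uses when building its graphs, so the
// two IDs land in different exact groups.
```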
void inspect() {
  const auto exprs = fusion_->exprs();

  for (auto pad : ir_utils::filterByType<PadOp>(exprs)) {
QQ: what's the reason we choose to filter on `PadOp` here, instead of looking for `CatOp` and iterating through all its inputs? It's just a nitpicking question; I think when we iterate through each `PadOp`, we could end up repeatedly removing entries from and adding them back into `repeat_info_map_`.
I think I was initially also thinking about supporting concatenation without `CatOp`. I think I have seen a sequence of ops like a `PadOp` followed by an addition, maybe in some backward kernels. I dropped that since I didn't find that pattern for repetitions.
4c1abda
!test
LGTM
!build
Adds `repeat` as an alias op as well as the `RepeatOp` IR node. The `repeat` op has almost the same semantics as the PyTorch repeat. The main motivation is to fix #3682, which is due to #3645, which introduced a preseg pass that detects and translates a repeat pattern to broadcast, expand and reshape.

The issue in #3682 arises because the translation-based method does not work when a broadcast ID is repeated. I originally just used `TensorDomain::flatten` (https://github.com/NVIDIA/Fuser/blob/main/csrc/ir/nodes.cpp#L3674-L3740), which just merges broadcast IDs. However, for reshape, it should not merge but squeeze them. Merging broadcast IDs triggered an assertion of the transpose scheduler as seen in #3682. `TensorDomain::flatten` needs to be fixed (#3691), but that's a separate issue.

To fix #3682, since repeating broadcast IDs cannot be translated to the broadcast-expand-reshape pattern anyway, I added the new `RepeatOp` node. I initially thought it could be just a `LoadStoreOp` but decided to have a different IR node since, unlike the usual LoadStore case, some of the broadcast IDs of a producer become concrete IDs in the corresponding consumer logical domain. I did actually try using `LoadStoreOp`, but some of the preseg passes complained about the mismatched broadcast pattern.

Repeating non-broadcast IDs is still done with the broadcast-expand-reshape pattern. Only repetition of broadcast IDs gets represented using the `RepeatOp` node.

Fixes #3682
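As an illustration (not taken from the PR itself), here is a minimal sketch assuming the C++ ops API, the usual test helpers (`makeSymbolicTensor`, `FusionGuard`), and a `repeat(TensorView*, std::vector<int64_t>)` signature mirroring PyTorch's `Tensor.repeat`; the exact signature is an assumption here.

```cpp
// Sketch only: repeating a broadcast ID with the assumed repeat alias op.
Fusion fusion;
FusionGuard fg(&fusion);

auto tv0 = makeSymbolicTensor(2); // [i0, i1]
fusion.addInput(tv0);

// Insert a broadcast ID: [i0, i1] -> [i0, b, i1]
auto tv1 = broadcast(tv0, {false, true, false});

// Repeat the broadcast ID twice: [i0, b, i1] -> [i0, 2, i1].
// Since the repeated ID is a broadcast ID, this case is represented with the
// RepeatOp node rather than the broadcast-expand-reshape translation.
auto tv2 = repeat(tv1, {1, 2, 1}); // assumed signature
fusion.addOutput(tv2);
```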
In RoPE, this repeat pattern shows up commonly:
https://github.com/huggingface/transformers/blob/main/src/transformers/models/llama/modeling_llama.py#L136
This pattern shows up as `PadOp` followed by `CatOp` in nvFuser. This preseg pass translates the pattern to expand and reshape ops. For example, given a pattern like:
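(The original code blocks from the PR description are not reproduced here; the following is an illustrative reconstruction using the C++ ops API, with assumed shapes and `t0` standing in for the input `TensorView*`.)

```cpp
// Illustrative reconstruction: t0 is a [4, 8] tensor repeated twice along
// dim 1, which nvFuser represents as PadOp followed by CatOp.
auto t1 = cat({t0, t0}, /*dim=*/1); // [4, 16]
```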
It will be translated to:
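(Again an illustrative reconstruction with assumed shapes; `IrBuilder::create<Val>` is used here to build the expanded sizes.)

```cpp
// 1) Insert a broadcast ID outside the repeated ID: [4, 8] -> [4, 1, 8]
auto t2 = broadcast(t0, {false, true, false});
// 2) Expand the broadcast ID by the repetition factor: [4, 1, 8] -> [4, 2, 8]
auto t3 = expand(t2, {IrBuilder::create<Val>(4L),
                      IrBuilder::create<Val>(2L),
                      IrBuilder::create<Val>(8L)});
// 3) Merge the expanded ID into the repeated ID via reshape: [4, 2, 8] -> [4, 16]
auto t4 = reshape(t3, {4, 2, 8}, {4, 16});
```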
And all uses of `t1` will be replaced by `t4`. While the pattern can be handled by the resize scheduler, it's currently limited to segments with pointwise ops only, and its scheduling heuristics are not tuned yet. In particular, I experimentally observed a significant perf gain with the Mistral backward function.