General resize support in propagateResize #3457

jjsjann123 · 2024-11-21T18:06:05Z

Previously, vectorization analysis only supported resize in PadOp with positive extents:
1. A general resize operation, or a negative extent, excluded the resized iter domain from participating in vectorized data movement.
2. Sliced inputs did not get vectorized loads.

This is a series of stacked PRs that adds support for general resize in vectorization analysis and enables vectorized loads on sliced inputs.

Order of PRs:
1. Adding general support for resize op in propagateResize during projection; adding support for negative resize extent in propagation.
2. Adding alignment check on stride for resize-introduced non-contiguity, where a contiguous dimension becomes non-contiguous due to resize on its immediate inner dimension. See: Adding alignment check to handle stride check for resize-introduced non-contiguous access #3528
3. Enable vectorized load on slice, refactoring slice vectorize manual test to use automatic scheduler instead. See: Enabling vectorization load for Slice #3529

jjsjann123 · 2024-11-21T18:08:43Z

!test

jjsjann123 · 2024-11-26T07:04:10Z

!test

jjsjann123 · 2024-11-26T19:10:47Z

!test

jjsjann123 · 2024-11-26T23:47:43Z

!test

jjsjann123 · 2024-11-27T00:08:58Z

!build

jjsjann123 · 2024-11-27T00:32:11Z

!test

jjsjann123 · 2024-11-27T01:28:10Z

csrc/scheduler/vectorize_helper.cpp

+      // (return factor as-is), but we take gcd(abs(factor), extent) when
+      // slicing on left. This is a conservative analysis of the offset for data
+      // accessing. A better approach needs to consider the actual start pointer
+      // address and handle it in alignment analysis in runtime info.


The handling of the negative extent here is a conservative offset check: we do not consider the actual offset of the tensor. We would miss opportunities where the slice returns an offset that is valid for vectorization, in cases where neither the tensor pointer offset nor the extent here is valid for vectorization on its own.

I wonder why the right-side expansion factor is only considered when it's positive. Can't we ignore it no matter if it's positive or negative?

Offline comment from @naoyam: what about stacked resizes? Is it safe to ignore the negative right-side resize extent?

Updated as suggested.
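To make the `gcd(abs(factor), extent)` rule from the code comment concrete, here is a minimal sketch using plain integers instead of nvFuser `Val*` expressions. The helper name `conservativeLeftSliceFactor` is hypothetical and only models the left-slice case. It is not the code in `vectorize_helper.cpp`.

```cpp
#include <cassert>
#include <cstdint>
#include <cstdlib>
#include <numeric>

// Hypothetical helper (not nvFuser code): when a slice removes
// `-left_expand` elements on the left, the data starts at that offset.
// A vector width is only guaranteed safe if it divides both the offset
// and the extent being projected, so take their gcd.
int64_t conservativeLeftSliceFactor(
    int64_t projected_extent,
    int64_t left_expand) {
  assert(left_expand < 0); // this sketch only models slicing on the left
  return std::gcd(std::abs(left_expand), projected_extent);
}
```

With a projected extent of 16 and a left expand of -4, the sketch yields 4: the start offset of 4 elements rules out any wider aligned access. With a left expand of -3 it yields 1, even if the runtime pointer happened to be aligned, which is the missed opportunity described above.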

jjsjann123 · 2024-11-27T01:30:22Z

csrc/scheduler/vectorize_helper.cpp

  }
  return mappers;
 }

+std::unordered_map<TensorView*, TensorResizeAlignmentInfo>
+mapResizeAlignmentToInputs(TensorView* ref) {


I went with a different approach from the one we discussed last week. Since the second pass that retrieves the resize is somewhat separate from the existing analysis, I think it's a good opportunity to do the right thing and use IdGraph to detect resize.

jjsjann123 · 2024-11-27T01:31:30Z

csrc/scheduler/vectorize_helper.cpp

+
+      for (const auto& [expr_g, dir] : fwd_path) {
+        Expr* expr = expr_g->front();
+        if (expr->isA<Resize>()) {


I'm also keeping the analysis conservative here: any resize encountered is assumed to have broken the collapsing of its outer iter domain, so that outer domain is added to the stride check for alignment.
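The effect of that stride check can be illustrated with plain integers. The sketch below is an assumption-laden illustration, not the analysis in this PR: `maxVectorWidth` is a hypothetical helper, and it assumes `max_width` is a power of two.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper (not nvFuser code): after the inner dimension is
// resized, each row of the outer dimension still starts `outer_stride`
// elements apart. A vector width is only safe if it divides both the
// resized inner extent and that outer stride.
int64_t maxVectorWidth(
    int64_t inner_extent,
    int64_t outer_stride,
    int64_t max_width) {
  assert(max_width >= 1); // assumed to be a power of two
  int64_t width = max_width;
  while (width > 1 &&
         (inner_extent % width != 0 || outer_stride % width != 0)) {
    width /= 2;
  }
  return width;
}
```

For example, slicing an inner dimension from 10 elements down to 8 leaves the outer stride at 10. The extent alone would allow a width of 4, but the second row starts at element 10, so the sketch falls back to a width of 2.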

jjsjann123 · 2024-11-27T03:48:38Z

!test

jjsjann123 · 2024-11-27T17:42:20Z

CI is green. The failing test doesn't look related. 🥳

jjsjann123 · 2024-12-02T16:59:25Z

!test --diff

jjsjann123 · 2024-12-02T18:14:01Z

!test --diff

naoyam · 2024-12-04T03:21:04Z

Adding general support for resize op in propagateResize during projection. (no longer specialized support for resize in PadOp);

Adding alignment check to address extra stride check on tensor, where a contiguous dimension becomes non-contiguous due to resize operation;

Refactoring slice vectorize manual test to use automatic pw-scheduler and the vectorization analysis.

I'm having a hard time understanding which change corresponds to what. Does it make sense to divide this PR into smaller PRs?

jjsjann123 · 2024-12-04T03:24:43Z

A fair request. I'll break it up.

naoyam · 2024-12-04T03:29:18Z

🙇

jjsjann123 · 2024-12-05T11:05:16Z

@naoyam I broke it into 3 PRs. The PR description has links to the other two.

naoyam

Do we have tests for the new capability?

csrc/scheduler/vectorize_helper.cpp

naoyam · 2024-12-06T01:39:56Z

csrc/scheduler/vectorize_helper.cpp

+      // (return factor as-is), but we take gcd(abs(factor), extent) when
+      // slicing on left. This is a conservative analysis of the offset for data
+      // accessing. A better approach needs to consider the actual start pointer
+      // address and handle it in alignment analysis in runtime info.


I wonder why the right-side expansion factor is only considered when it's positive. Can't we ignore it no matter if it's positive or negative?

jjsjann123 · 2024-12-07T23:45:19Z

Do we have tests for the new capability?

The only test change is that we removed a test that used to check that vectorization is disabled when padding with a negative extent.
I don't have many tests here, since this update by itself isn't safe. We need the second PR, which adds a check on the stride of the dimension to the left of the resize.

For testing the entire stack of PRs, I reused the manual slice tests and switched them to use the vectorization analysis, to check that we are making the right calls. Those are in the last of the stacked PRs. https://github.com/NVIDIA/Fuser/pull/3529/files#diff-98e2e8f5cb3849584331987b87d7cc114cb88cfddddd2f739ca8b452d92f4ece

naoyam · 2024-12-17T16:05:49Z

IIUC, this PR changes the analysis of the vectorization factor. Since it's a major part of the analysis, I think it should come with unit tests. The alignment analysis should also have its own tests.

While adding these fine-grained tests may be boring and time-consuming, this is one of the most important and challenging analyses, so I believe it's really important to have good coverage.

csrc/scheduler/vectorize_helper.cpp

naoyam · 2024-12-17T16:46:37Z

csrc/scheduler/vectorize_helper.cpp

+      std::vector<Val*> expands = {
+          resize_op->leftExpand(), resize_op->rightExpand()};
+      if (!p2c) {
+        // reverse pad extent for c2p propagation
+        std::for_each(expands.begin(), expands.end(), [](Val*& val) {
+          val = SimplifyingIrBuilder::negExpr(val);
+        });


Note the lambda signature, Val*&.
This one modifies the elements in place...

I'll update the comment to note that.

Still unclear to me. I don't see any usage of expands after this part. What am I missing?

No, you're not missing anything. I forgot to apply the expands to projected_extent.

NOTE to myself: add a unit test to verify this.

So, this wasn't necessary after all?

I just figured out an easier way to do this. 🤕

I was just trying to get the destination size, and somehow I was obsessed with computing it from the source. I forgot that I can just use id_to->extent() instead.
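The point about the lambda signature can be shown in isolation. This is a minimal sketch with plain integers standing in for `Val*`; the function names are hypothetical and exist only for illustration.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

// Lambda takes its argument by reference, so the assignment writes back
// into the vector element, as with the Val*& signature in the diff.
std::vector<int64_t> negateInPlace(std::vector<int64_t> expands) {
  std::for_each(expands.begin(), expands.end(), [](int64_t& v) { v = -v; });
  return expands;
}

// Lambda takes its argument by value, so only a local copy is negated
// and the vector is returned unchanged.
std::vector<int64_t> negateByValue(std::vector<int64_t> expands) {
  std::for_each(expands.begin(), expands.end(), [](int64_t v) {
    v = -v;
    (void)v; // silence unused-value warnings
  });
  return expands;
}
```

Either way, the modified vector only matters if something reads it afterwards, which is the gap the reviewer spotted in the thread above.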

jjsjann123 · 2024-12-17T20:03:33Z

sounds fair. Let me look into that.

jjsjann123 · 2024-12-18T19:57:12Z

tests/cpp/test_vectorization_analysis.cpp

+
+} // namespace
+
+using VectorizationAnalysisTest = NVFuserTest;


@naoyam unit test added as suggested. Thanks a lot for calling that out.

jjsjann123 · 2024-12-18T23:06:35Z

!test --diff-bench

naoyam · 2024-12-19T03:00:13Z

csrc/scheduler/vectorize_helper.cpp

+      projected_extent = comp(projected_extent, resize_op->leftExpand());
+      projected_extent = comp(projected_extent, resize_op->rightExpand());
+
+      // cap extent by the destination


Can you give an example where this matters?

I pointed it out in the test.

Thanks. I guess that is the only case where taking the min would matter, right? Could you add a comment explaining why this is necessary?

jjsjann123 · 2024-12-19T03:12:02Z

tests/cpp/test_vectorization_analysis.cpp

+  auto inner_neg_large =
+      pad(tv0, {IrBuilder::create<Val>(-8L), IrBuilder::create<Val>(-8L)});
+  // output id with extent 0 cannot be vectorized
+  expection_list.emplace_back(std::make_pair(inner_neg_large, 0));


@naoyam This would be the example: the case where the resize eliminates the destination extent entirely.
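A small sketch of why the cap matters, using plain integers. The body of `comp` is not shown in this thread, so the sketch assumes it behaves like a gcd against the absolute expand amount. That assumption, the helper name `projectThroughResize`, and the source extent of 16 in the usage note are all hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <cstdlib>
#include <numeric>

// Hypothetical model (not nvFuser code) of projecting an extent through
// a resize and then capping it by the destination extent.
int64_t projectThroughResize(
    int64_t projected_extent,
    int64_t left_expand,
    int64_t right_expand) {
  // Destination extent of the resized iter domain.
  const int64_t dest_extent = projected_extent + left_expand + right_expand;
  assert(dest_extent >= 0);
  // Assumed stand-in for `comp`; gcd(x, 0) == x, so a zero expand is a no-op.
  int64_t factor = std::gcd(projected_extent, std::abs(left_expand));
  factor = std::gcd(factor, std::abs(right_expand));
  // Cap by the destination: an empty output cannot be vectorized.
  return std::min(factor, dest_extent);
}
```

Under these assumptions, a source extent of 16 padded by -8 on both sides gives a gcd of 8 but a destination extent of 0, so only the cap brings the result down to 0. In every non-empty case the gcd already divides the destination extent, which is consistent with the reviewer's guess that this is the only case where the min matters.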

naoyam

LGTM. Thanks for adding the tests.

jjsjann123 · 2024-12-19T04:54:28Z

!test --diff-bench

Previously, vectorization analysis only supported resize in `PadOp` with positive extents:
1. A general resize operation, or a negative extent, excluded the resized iter domain from participating in vectorized data movement.
2. Sliced inputs did not get vectorized loads.

This is a series of stacked PRs that adds support for general resize in vectorization analysis and enables vectorized loads on sliced inputs.

Order of PRs:
1. Adding general support for `resize` op in `propagateResize` during projection; adding support for negative resize extent in propagation. #3457
[With the updated, more restrictive analysis in 1, the second PR is only optional at this point.] This PR is dropped:
~2. Adding alignment check on stride for resize-introduced non-contiguity, where a contiguous dimension becomes non-contiguous due to resize on its immediate inner dimension. #3528~
**_3. Enable vectorized load on slice, refactoring slice vectorize manual test to use automatic scheduler instead._**

jjsjann123 commented Nov 27, 2024

jjsjann123 marked this pull request as ready for review November 27, 2024 01:31

jjsjann123 changed the title ~~[WIP][DO NOT REVIEW] Enable slice in vectorization analysis~~ Enable general resize in vectorization analysis Nov 27, 2024

jjsjann123 requested a review from naoyam November 27, 2024 01:32

jjsjann123 marked this pull request as draft December 4, 2024 03:24

jjsjann123 changed the title ~~Enable general resize in vectorization analysis~~ General resize support in propagateResize Dec 5, 2024

This was referenced Dec 5, 2024

Adding alignment check to handle stride check for resize-introduced non-contiguous access #3528

Closed

Enabling vectorization load for Slice #3529

Merged

jjsjann123 marked this pull request as ready for review December 5, 2024 11:03

naoyam reviewed Dec 6, 2024

jjsjann123 force-pushed the jjsjann123/slice_vec_analysis branch from 47eabff to 7761ed1 Compare December 7, 2024 23:35

jjsjann123 requested a review from naoyam December 9, 2024 18:11

naoyam reviewed Dec 17, 2024

naoyam reviewed Dec 17, 2024

jjsjann123 force-pushed the jjsjann123/slice_vec_analysis branch from 7761ed1 to f7023d7 Compare December 18, 2024 19:56

jjsjann123 commented Dec 18, 2024

jjsjann123 requested a review from naoyam December 18, 2024 19:58

naoyam reviewed Dec 19, 2024

jjsjann123 commented Dec 19, 2024

Adding general support for resize op in propagateResize during projection. (no longer specialized support for resize in PadOp); b070eac

naoyam approved these changes Dec 19, 2024

jjsjann123 force-pushed the jjsjann123/slice_vec_analysis branch from f7023d7 to b070eac Compare December 19, 2024 04:53

jjsjann123 merged commit 56ad61c into main Dec 19, 2024
58 checks passed

jjsjann123 deleted the jjsjann123/slice_vec_analysis branch December 19, 2024 17:02

