Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

General resize support in propagateResize #3457

Merged
merged 1 commit into from
Dec 19, 2024

Conversation

jjsjann123
Copy link
Collaborator

@jjsjann123 jjsjann123 commented Nov 21, 2024

Previously, vectorization analysis can only support resize in PadOp with positive extents: 1. General resize operation or negative extent would exclude the resized iter domain to participate in vectorized data movement; 2. Sliced inputs wouldn't have vectorized load.

This is a series of stacked PRs to adds support in vectorization analysis for general resize and it allows vectorized load on sliced inputs as well.

Order of PRs:
1. Adding general support for resize op in propagateResize during projection; Adding support for negative resize extent in propagation.

  1. Adding alignment check on stride for resize-introduced non-contiguity, where a contiguous dimension becomes non-contiguous due to resize on its immediate inner dimension. Adding alignment check to handle stride check for resize-introduced non-contiguous access #3528

  2. Enable vectorized load on slice, refactoring slice vectorize manual test to use automatic scheduler instead. Enabling vectorization load for Slice #3529

@jjsjann123
Copy link
Collaborator Author

!test

3 similar comments
@jjsjann123
Copy link
Collaborator Author

!test

@jjsjann123
Copy link
Collaborator Author

!test

@jjsjann123
Copy link
Collaborator Author

!test

@jjsjann123
Copy link
Collaborator Author

!build

@jjsjann123
Copy link
Collaborator Author

!test

// (return factor as-is), but we take gcd(abs(factor), extent) when
// slicing on left. This is a conservative analysis of the offset for data
// accessing. A better approach needs to consider the actual start pointer
// address and handle it in alignment analysis in runtime info.
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The handling of the negative extent here is our conservative handling of offset check. We do not consider the actual offset of the tensor. We would miss out on opportunities where slice could have returned an offset valid for vectorization, when neither the tensor ptr offset nor the extent here are legit for vectorization.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder why the right-side expansion factor is only considered when it's positive. Can't we ignore it no matter if it's positive or negative?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

offline comment from @naoyam , how about stacked resize, is it safe to ignore the negative right-side resize extent.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Update as suggested.

}
return mappers;
}

std::unordered_map<TensorView*, TensorResizeAlignmentInfo>
mapResizeAlignmentToInputs(TensorView* ref) {
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I went off a different approach from our conversation last week. Since the second pass to retrieve the resize is somewhat separate from the existing analysis, I think it's a good opportunity to do the right thing and use IdGraph to detect resize.


for (const auto& [expr_g, dir] : fwd_path) {
Expr* expr = expr_g->front();
if (expr->isA<Resize>()) {
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm also keeping the analysis conservative here, any resize encountered would assume that it broke the collapsing of its outer iter domain, hence adding it to the stride check for alignment.

@jjsjann123 jjsjann123 marked this pull request as ready for review November 27, 2024 01:31
@jjsjann123 jjsjann123 changed the title [WIP][DO NOT REVIEW] Enable slice in vectorization analysis Enable general resize in vectorization analysis Nov 27, 2024
@jjsjann123 jjsjann123 requested a review from naoyam November 27, 2024 01:32
@jjsjann123
Copy link
Collaborator Author

!test

@jjsjann123
Copy link
Collaborator Author

CI is green. The failing test doesn't look like related. 🥳

@jjsjann123
Copy link
Collaborator Author

!test --diff

1 similar comment
@jjsjann123
Copy link
Collaborator Author

!test --diff

@naoyam
Copy link
Collaborator

naoyam commented Dec 4, 2024

  • Adding general support for resize op in propagateResize during projection. (no longer specialized support for resize in PadOp);
  • Adding alignment check to address extra stride check on tensor, where a contiguous dimension becomes non-contiguous due to resize operation;
  • Refactoring slice vectorize manual test to us automatic pw-scheduler and the vectorization analysis.

I'm having a hard time to understand what change corresponds to what. Does it make sense to divide this PR to smaller PRs?

@jjsjann123
Copy link
Collaborator Author

  • Adding general support for resize op in propagateResize during projection. (no longer specialized support for resize in PadOp);
  • Adding alignment check to address extra stride check on tensor, where a contiguous dimension becomes non-contiguous due to resize operation;
  • Refactoring slice vectorize manual test to us automatic pw-scheduler and the vectorization analysis.

I'm having a hard time to understand what change corresponds to what. Does it make sense to divide this PR to smaller PRs?

A fair request. I'll break it up.

@jjsjann123 jjsjann123 marked this pull request as draft December 4, 2024 03:24
@naoyam
Copy link
Collaborator

naoyam commented Dec 4, 2024

🙇

@jjsjann123 jjsjann123 changed the title Enable general resize in vectorization analysis General resize support in propagateResize Dec 5, 2024
@jjsjann123 jjsjann123 marked this pull request as ready for review December 5, 2024 11:03
@jjsjann123
Copy link
Collaborator Author

@naoyam break it into 3 PRs. PR description has link to the other two.

Copy link
Collaborator

@naoyam naoyam left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we have tests for the new capability?

csrc/scheduler/vectorize_helper.cpp Outdated Show resolved Hide resolved
// (return factor as-is), but we take gcd(abs(factor), extent) when
// slicing on left. This is a conservative analysis of the offset for data
// accessing. A better approach needs to consider the actual start pointer
// address and handle it in alignment analysis in runtime info.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder why the right-side expansion factor is only considered when it's positive. Can't we ignore it no matter if it's positive or negative?

@jjsjann123 jjsjann123 force-pushed the jjsjann123/slice_vec_analysis branch from 47eabff to 7761ed1 Compare December 7, 2024 23:35
@jjsjann123
Copy link
Collaborator Author

jjsjann123 commented Dec 7, 2024

Do we have tests for the new capability?

The only change is that we removed a test, which used to check that we disable vectorization when pad with negative extent.
I don't really have many tests here, since this update itself isn't safe. We need the second PR that adds check on stride of the dimension that's on the left of the resize.

In terms of testing for the entire stacked PRs, I was re-using the manual slice tests and swtiched them to use the vectorization analysis to check if we are making the right calls. Those are in the last one of the stacked PR. https://github.com/NVIDIA/Fuser/pull/3529/files#diff-98e2e8f5cb3849584331987b87d7cc114cb88cfddddd2f739ca8b452d92f4ece

@jjsjann123 jjsjann123 requested a review from naoyam December 9, 2024 18:11
@naoyam
Copy link
Collaborator

naoyam commented Dec 17, 2024

Do we have tests for the new capability?

The only change is that we removed a test, which used to check that we disable vectorization when pad with negative extent. I don't really have many tests here, since this update itself isn't safe. We need the second PR that adds check on stride of the dimension that's on the left of the resize.

In terms of testing for the entire stacked PRs, I was re-using the manual slice tests and swtiched them to use the vectorization analysis to check if we are making the right calls. Those are in the last one of the stacked PR. https://github.com/NVIDIA/Fuser/pull/3529/files#diff-98e2e8f5cb3849584331987b87d7cc114cb88cfddddd2f739ca8b452d92f4ece

IIUC, this PR changes the analysis of the vectorization factor. Since it's a major part of the analysis, I think it should come with unit tests. The alignment analysis should also have its own tests.

While adding these fine-grained tests may be boring and time-consuming, this is one of the most important and challenging analysis, so I believe it's really important to have a good coverage.

Comment on lines 389 to 395
std::vector<Val*> expands = {
resize_op->leftExpand(), resize_op->rightExpand()};
if (!p2c) {
// reverse pad extent for c2p propagation
std::for_each(expands.begin(), expands.end(), [](Val*& val) {
val = SimplifyingIrBuilder::negExpr(val);
});
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unused?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note the lambda signature, Val*&
This one is modifying in place...

I'll update the comment to note that.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Still unclear to me. I don't see any usage of expands after this part. What am I missing?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

no you didn't, I missed to apply the expands onto projected_extent.

NOTE for myself, adding a unit test to verify this.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So, this wasn't necessary after all?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I just figured out an easier way to do this. 🤕

I'm just trying to get the destination size and somehow I was obsessed with computing it from the source. Forgot that I can just do id_to->extent() instead.

@jjsjann123
Copy link
Collaborator Author

Do we have tests for the new capability?

The only change is that we removed a test, which used to check that we disable vectorization when pad with negative extent. I don't really have many tests here, since this update itself isn't safe. We need the second PR that adds check on stride of the dimension that's on the left of the resize.
In terms of testing for the entire stacked PRs, I was re-using the manual slice tests and swtiched them to use the vectorization analysis to check if we are making the right calls. Those are in the last one of the stacked PR. https://github.com/NVIDIA/Fuser/pull/3529/files#diff-98e2e8f5cb3849584331987b87d7cc114cb88cfddddd2f739ca8b452d92f4ece

IIUC, this PR changes the analysis of the vectorization factor. Since it's a major part of the analysis, I think it should come with unit tests. The alignment analysis should also have its own tests.

While adding these fine-grained tests may be boring and time-consuming, this is one of the most important and challenging analysis, so I believe it's really important to have a good coverage.

sounds fair. Let me look into that.

@jjsjann123 jjsjann123 force-pushed the jjsjann123/slice_vec_analysis branch from 7761ed1 to f7023d7 Compare December 18, 2024 19:56

} // namespace

using VectorizationAnalysisTest = NVFuserTest;
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@naoyam unit test added as suggested. Thanks a lot for calling that out.

@jjsjann123 jjsjann123 requested a review from naoyam December 18, 2024 19:58
@jjsjann123
Copy link
Collaborator Author

!test --diff-bench

projected_extent = comp(projected_extent, resize_op->leftExpand());
projected_extent = comp(projected_extent, resize_op->rightExpand());

// cap extent by the destination
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you give an example where this matters?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I pointed it out in the test

Copy link
Collaborator

@naoyam naoyam Dec 19, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks. I guess that is the only case where taking the min would matter, right? Could you add a comment why this is necessary?

auto inner_neg_large =
pad(tv0, {IrBuilder::create<Val>(-8L), IrBuilder::create<Val>(-8L)});
// output id with extent 0 cannot be vectorized
expection_list.emplace_back(std::make_pair(inner_neg_large, 0));
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@naoyam This would be the example.

if we resize and eliminate the destination extent

…tion. (no longer specialized support for resize in PadOp);
Copy link
Collaborator

@naoyam naoyam left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. Thanks for adding the tests.

@jjsjann123 jjsjann123 force-pushed the jjsjann123/slice_vec_analysis branch from f7023d7 to b070eac Compare December 19, 2024 04:53
@jjsjann123
Copy link
Collaborator Author

!test --diff-bench

@jjsjann123 jjsjann123 merged commit 56ad61c into main Dec 19, 2024
58 checks passed
@jjsjann123 jjsjann123 deleted the jjsjann123/slice_vec_analysis branch December 19, 2024 17:02
jjsjann123 added a commit that referenced this pull request Dec 23, 2024
Previously, vectorization analysis can only support resize in `PadOp`
with positive extents: 1. General resize operation or negative extent
would exclude the resized iter domain to participate in vectorized data
movement; 2. Sliced inputs wouldn't have vectorized load.

This is a series of stacked PRs to adds support in vectorization
analysis for general resize and it allows vectorized load on sliced
inputs as well.

Order of PRs:
1. Adding general support for `resize` op in `propagateResize` during
projection; Adding support for negative resize extent in propagation.
#3457

[with updated more restrictive analysis on 1, the second PR is only
optional at this point.] This PR is dropped
~2. Adding alignment check on stride for resize-introduced
non-contiguity, where a contiguous dimension becomes non-contiguous due
to resize on its immediate inner dimension. #3528~

**_3. Enable vectorized load on slice, refactoring slice vectorize
manual test to use automatic scheduler instead._**
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants