-
Notifications
You must be signed in to change notification settings - Fork 53
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
General resize support in propagateResize #3457
Conversation
!test |
3 similar comments
!test |
!test |
!test |
!build |
!test |
csrc/scheduler/vectorize_helper.cpp
Outdated
// (return factor as-is), but we take gcd(abs(factor), extent) when | ||
// slicing on left. This is a conservative analysis of the offset for data | ||
// accessing. A better approach needs to consider the actual start pointer | ||
// address and handle it in alignment analysis in runtime info. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The handling of the negative extent here is our conservative handling of offset check. We do not consider the actual offset of the tensor. We would miss out on opportunities where slice could have returned an offset valid for vectorization, when neither the tensor ptr offset nor the extent here are legit for vectorization.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wonder why the right-side expansion factor is only considered when it's positive. Can't we ignore it no matter if it's positive or negative?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
offline comment from @naoyam , how about stacked resize, is it safe to ignore the negative right-side resize extent.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Update as suggested.
csrc/scheduler/vectorize_helper.cpp
Outdated
} | ||
return mappers; | ||
} | ||
|
||
std::unordered_map<TensorView*, TensorResizeAlignmentInfo> | ||
mapResizeAlignmentToInputs(TensorView* ref) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I went off a different approach from our conversation last week. Since the second pass to retrieve the resize is somewhat separate from the existing analysis, I think it's a good opportunity to do the right thing and use IdGraph to detect resize.
csrc/scheduler/vectorize_helper.cpp
Outdated
|
||
for (const auto& [expr_g, dir] : fwd_path) { | ||
Expr* expr = expr_g->front(); | ||
if (expr->isA<Resize>()) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm also keeping the analysis conservative here, any resize encountered would assume that it broke the collapsing of its outer iter domain, hence adding it to the stride check for alignment.
!test |
CI is green. The failing test doesn't look like related. 🥳 |
!test --diff |
1 similar comment
!test --diff |
I'm having a hard time to understand what change corresponds to what. Does it make sense to divide this PR to smaller PRs? |
A fair request. I'll break it up. |
🙇 |
@naoyam break it into 3 PRs. PR description has link to the other two. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we have tests for the new capability?
csrc/scheduler/vectorize_helper.cpp
Outdated
// (return factor as-is), but we take gcd(abs(factor), extent) when | ||
// slicing on left. This is a conservative analysis of the offset for data | ||
// accessing. A better approach needs to consider the actual start pointer | ||
// address and handle it in alignment analysis in runtime info. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wonder why the right-side expansion factor is only considered when it's positive. Can't we ignore it no matter if it's positive or negative?
47eabff
to
7761ed1
Compare
The only change is that we removed a test, which used to check that we disable vectorization when pad with negative extent. In terms of testing for the entire stacked PRs, I was re-using the manual slice tests and swtiched them to use the vectorization analysis to check if we are making the right calls. Those are in the last one of the stacked PR. https://github.com/NVIDIA/Fuser/pull/3529/files#diff-98e2e8f5cb3849584331987b87d7cc114cb88cfddddd2f739ca8b452d92f4ece |
IIUC, this PR changes the analysis of the vectorization factor. Since it's a major part of the analysis, I think it should come with unit tests. The alignment analysis should also have its own tests. While adding these fine-grained tests may be boring and time-consuming, this is one of the most important and challenging analysis, so I believe it's really important to have a good coverage. |
csrc/scheduler/vectorize_helper.cpp
Outdated
std::vector<Val*> expands = { | ||
resize_op->leftExpand(), resize_op->rightExpand()}; | ||
if (!p2c) { | ||
// reverse pad extent for c2p propagation | ||
std::for_each(expands.begin(), expands.end(), [](Val*& val) { | ||
val = SimplifyingIrBuilder::negExpr(val); | ||
}); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Unused?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Note the lambda signature, Val*&
This one is modifying in place...
I'll update the comment to note that.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Still unclear to me. I don't see any usage of expands
after this part. What am I missing?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
no you didn't, I missed to apply the expands onto projected_extent.
NOTE for myself, adding a unit test to verify this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So, this wasn't necessary after all?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I just figured out an easier way to do this. 🤕
I'm just trying to get the destination size and somehow I was obsessed with computing it from the source. Forgot that I can just do id_to->extent()
instead.
sounds fair. Let me look into that. |
7761ed1
to
f7023d7
Compare
|
||
} // namespace | ||
|
||
using VectorizationAnalysisTest = NVFuserTest; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@naoyam unit test added as suggested. Thanks a lot for calling that out.
!test --diff-bench |
csrc/scheduler/vectorize_helper.cpp
Outdated
projected_extent = comp(projected_extent, resize_op->leftExpand()); | ||
projected_extent = comp(projected_extent, resize_op->rightExpand()); | ||
|
||
// cap extent by the destination |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you give an example where this matters?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I pointed it out in the test
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks. I guess that is the only case where taking the min would matter, right? Could you add a comment why this is necessary?
auto inner_neg_large = | ||
pad(tv0, {IrBuilder::create<Val>(-8L), IrBuilder::create<Val>(-8L)}); | ||
// output id with extent 0 cannot be vectorized | ||
expection_list.emplace_back(std::make_pair(inner_neg_large, 0)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@naoyam This would be the example.
if we resize and eliminate the destination extent
…tion. (no longer specialized support for resize in PadOp);
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM. Thanks for adding the tests.
f7023d7
to
b070eac
Compare
!test --diff-bench |
Previously, vectorization analysis can only support resize in `PadOp` with positive extents: 1. General resize operation or negative extent would exclude the resized iter domain to participate in vectorized data movement; 2. Sliced inputs wouldn't have vectorized load. This is a series of stacked PRs to adds support in vectorization analysis for general resize and it allows vectorized load on sliced inputs as well. Order of PRs: 1. Adding general support for `resize` op in `propagateResize` during projection; Adding support for negative resize extent in propagation. #3457 [with updated more restrictive analysis on 1, the second PR is only optional at this point.] This PR is dropped ~2. Adding alignment check on stride for resize-introduced non-contiguity, where a contiguous dimension becomes non-contiguous due to resize on its immediate inner dimension. #3528~ **_3. Enable vectorized load on slice, refactoring slice vectorize manual test to use automatic scheduler instead._**
Previously, vectorization analysis can only support resize in
PadOp
with positive extents: 1. General resize operation or negative extent would exclude the resized iter domain to participate in vectorized data movement; 2. Sliced inputs wouldn't have vectorized load.This is a series of stacked PRs to adds support in vectorization analysis for general resize and it allows vectorized load on sliced inputs as well.
Order of PRs:
1. Adding general support for
resize
op inpropagateResize
during projection; Adding support for negative resize extent in propagation.Adding alignment check on stride for resize-introduced non-contiguity, where a contiguous dimension becomes non-contiguous due to resize on its immediate inner dimension. Adding alignment check to handle stride check for resize-introduced non-contiguous access #3528
Enable vectorized load on slice, refactoring slice vectorize manual test to use automatic scheduler instead. Enabling vectorization load for Slice #3529