-
Notifications
You must be signed in to change notification settings - Fork 13k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[mlir] computeSliceParameters: Fix offset when m(0) != 0 #122492
[mlir] computeSliceParameters: Fix offset when m(0) != 0 #122492
Conversation
like `affine_map<(d0) -> (d0 + 3)` in ``` %generic = linalg.generic {indexing_maps = [affine_map<(d0) -> (d0 + 3)>, affine_map<(d0) -> (d0)>], iterator_types = ["parallel"]} ins(%arg0: tensor<9xf32>) outs(%empty : tensor<6xf32>) { ^bb0(%in : f32, %out: f32): linalg.yield %in : f32 } -> tensor<6xf32> ``` tiling currently computes the wrong slice offsets. When tiling above example with a size of 3, it would compute ``` scf.for %i = ... %slice = tensor.extract_slice %arg0[%i + 3] [6] [1] linalg.generic {indexing_maps = [affine_map<(d0) -> (d0 + 3)>, affine_map<(d0) -> (d0)>], iterator_types = ["parallel"]} ins(%slice: tensor<6xf32>) ``` and thus apply the `+3` twice (once in the extract slice and a second time in the linalg.generic). This PR fixes this to yield an offset of `tensor.extract_slice %arg0[%i] [6] [1]` instead.
@llvm/pr-subscribers-mlir @llvm/pr-subscribers-mlir-linalg Author: Matthias Gehre (mgehre-amd) ChangesFor affine maps where
tiling currently computes the wrong slice offsets. When tiling above example with a size of 3, it would compute
and thus apply the This PR fixes this to yield an offset of Full diff: https://github.com/llvm/llvm-project/pull/122492.diff 2 Files Affected:
diff --git a/mlir/lib/Dialect/Linalg/Utils/Utils.cpp b/mlir/lib/Dialect/Linalg/Utils/Utils.cpp
index 38e427af1c4846..2ebeccb1505048 100644
--- a/mlir/lib/Dialect/Linalg/Utils/Utils.cpp
+++ b/mlir/lib/Dialect/Linalg/Utils/Utils.cpp
@@ -596,8 +596,17 @@ computeSliceParameters(OpBuilder &builder, Location loc, Value valueToTile,
auto m = map.getSubMap({r});
LLVM_DEBUG(llvm::dbgs() << "computeSliceParameters: submap: " << m << "\n");
IRRewriter rewriter(builder);
- OpFoldResult offset = makeComposedFoldedAffineApply(rewriter, loc, m, lbs);
+ // The offset of the slice is map(lbs) - map(0).
+ SmallVector<Attribute> zeros(lbs.size(), rewriter.getIndexAttr(0));
+ SmallVector<Attribute> mAtZero;
+ auto res = m.constantFold(zeros, mAtZero);
+ assert(succeeded(res));
+ auto atZeroInt = getConstantIntValue(mAtZero[0]);
+ assert(atZeroInt);
+ OpFoldResult offset = makeComposedFoldedAffineApply(
+ rewriter, loc, m.getResult(0) - *atZeroInt, lbs);
sliceParams.offsets.push_back(offset);
+
OpFoldResult closedIntSize =
makeComposedFoldedAffineApply(rewriter, loc, m, subShapeSizes);
// Resulting size needs to be made half open interval again.
diff --git a/mlir/test/Dialect/Linalg/tile-offset.mlir b/mlir/test/Dialect/Linalg/tile-offset.mlir
new file mode 100644
index 00000000000000..41763dd688c1e9
--- /dev/null
+++ b/mlir/test/Dialect/Linalg/tile-offset.mlir
@@ -0,0 +1,25 @@
+// RUN: mlir-opt %s -transform-interpreter -split-input-file | FileCheck %s
+
+// CHECK-LABEL: func @tile_offset
+// CHECK-SAME: %[[ARG0:[a-zA-Z0-9_]+]]:
+func.func @tile_offset(%arg0 : tensor<9xf32>) -> tensor<6xf32> {
+ %empty = tensor.empty() : tensor<6xf32>
+ // CHECK: scf.for %[[ITER:[a-zA-Z0-9_]+]] =
+ // CHECK: tensor.extract_slice %[[ARG0]][%[[ITER]]] [6] [1]
+ %generic = linalg.generic
+ {indexing_maps = [affine_map<(d0) -> (d0 + 3)>,
+ affine_map<(d0) -> (d0)>],
+ iterator_types = ["parallel"]} ins(%arg0: tensor<9xf32>) outs(%empty : tensor<6xf32>) {
+ ^bb0(%in : f32, %out: f32):
+ linalg.yield %in : f32
+ } -> tensor<6xf32>
+ return %generic : tensor<6xf32>
+}
+
+module attributes {transform.with_named_sequence} {
+ transform.named_sequence @__transform_main(%arg1: !transform.any_op {transform.readonly}) {
+ %0 = transform.structured.match ops{["linalg.generic"]} in %arg1 : (!transform.any_op) -> !transform.any_op
+ %1, %loop = transform.structured.tile_using_for %0 tile_sizes [3] : (!transform.any_op) -> (!transform.any_op, !transform.any_op)
+ transform.yield
+ }
+}
|
auto res = m.constantFold(zeros, mAtZero); | ||
assert(succeeded(res)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the res
is not used anyway... So in non-debug builds this will show up as warning/error (if warnings are converted to error). What is the constantFold
for?
Could you also add a message to the assert.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The m.constantFold
is evaluating the affine map m
with all inputs being zero.
The constantFold
returns failure()
when the affine map has symbols, but afaik linalg guarantees that the affine map has no symbols, thus the evaluation of m
into a constant is always possible. (Thus the assert; I added a message to it).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I also added [[maybe_unused]]
to res
to avoid errors in non-debug configs.
Friendly ping :-) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks! Took me a while to work through if this is the easiest way of doing this. I cant think of anything better.
For affine maps where
m(0) != 0
,like
affine_map<(d0) -> (d0 + 3)
intiling currently computes the wrong slice offsets. When tiling above example with a size of 3, it would compute
and thus apply the
+3
twice (once in the extract slice and a second time in the linalg.generic).This PR fixes this to yield an offset of
tensor.extract_slice %arg0[%i] [6] [1]
instead.