Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[mlir] computeSliceParameters: Fix offset when m(0) != 0 #122492

Merged

Conversation

mgehre-amd
Copy link
Contributor

For affine maps where m(0) != 0,
like affine_map<(d0) -> (d0 + 3) in

  %generic = linalg.generic
    {indexing_maps = [affine_map<(d0) -> (d0 + 3)>,
                      affine_map<(d0) -> (d0)>],
     iterator_types = ["parallel"]} ins(%arg0: tensor<9xf32>) outs(%empty : tensor<6xf32>) {
    ^bb0(%in : f32, %out: f32):
      linalg.yield %in : f32
    } -> tensor<6xf32>

tiling currently computes the wrong slice offsets. When tiling above example with a size of 3, it would compute

scf.for %i = ...
  %slice = tensor.extract_slice %arg0[%i + 3] [6] [1]
  linalg.generic
    {indexing_maps = [affine_map<(d0) -> (d0 + 3)>,
                      affine_map<(d0) -> (d0)>],
     iterator_types = ["parallel"]} ins(%slice: tensor<6xf32>)

and thus apply the +3 twice (once in the extract slice and a second time in the linalg.generic).

This PR fixes this to yield an offset of
tensor.extract_slice %arg0[%i] [6] [1] instead.

like `affine_map<(d0) -> (d0 + 3)` in
```
  %generic = linalg.generic
    {indexing_maps = [affine_map<(d0) -> (d0 + 3)>,
                      affine_map<(d0) -> (d0)>],
     iterator_types = ["parallel"]} ins(%arg0: tensor<9xf32>) outs(%empty : tensor<6xf32>) {
    ^bb0(%in : f32, %out: f32):
      linalg.yield %in : f32
    } -> tensor<6xf32>
```
tiling currently computes the wrong slice offsets. When tiling above example with a size of 3, it would compute
```
scf.for %i = ...
  %slice = tensor.extract_slice %arg0[%i + 3] [6] [1]
  linalg.generic
    {indexing_maps = [affine_map<(d0) -> (d0 + 3)>,
                      affine_map<(d0) -> (d0)>],
     iterator_types = ["parallel"]} ins(%slice: tensor<6xf32>)
```
and thus apply the `+3` twice (once in the extract slice and a second time in the linalg.generic).

This PR fixes this to yield an offset of
`tensor.extract_slice %arg0[%i] [6] [1]` instead.
@llvmbot
Copy link
Member

llvmbot commented Jan 10, 2025

@llvm/pr-subscribers-mlir

@llvm/pr-subscribers-mlir-linalg

Author: Matthias Gehre (mgehre-amd)

Changes

For affine maps where m(0) != 0,
like affine_map&lt;(d0) -&gt; (d0 + 3) in

  %generic = linalg.generic
    {indexing_maps = [affine_map&lt;(d0) -&gt; (d0 + 3)&gt;,
                      affine_map&lt;(d0) -&gt; (d0)&gt;],
     iterator_types = ["parallel"]} ins(%arg0: tensor&lt;9xf32&gt;) outs(%empty : tensor&lt;6xf32&gt;) {
    ^bb0(%in : f32, %out: f32):
      linalg.yield %in : f32
    } -&gt; tensor&lt;6xf32&gt;

tiling currently computes the wrong slice offsets. When tiling above example with a size of 3, it would compute

scf.for %i = ...
  %slice = tensor.extract_slice %arg0[%i + 3] [6] [1]
  linalg.generic
    {indexing_maps = [affine_map&lt;(d0) -&gt; (d0 + 3)&gt;,
                      affine_map&lt;(d0) -&gt; (d0)&gt;],
     iterator_types = ["parallel"]} ins(%slice: tensor&lt;6xf32&gt;)

and thus apply the +3 twice (once in the extract slice and a second time in the linalg.generic).

This PR fixes this to yield an offset of
tensor.extract_slice %arg0[%i] [6] [1] instead.


Full diff: https://github.com/llvm/llvm-project/pull/122492.diff

2 Files Affected:

  • (modified) mlir/lib/Dialect/Linalg/Utils/Utils.cpp (+10-1)
  • (added) mlir/test/Dialect/Linalg/tile-offset.mlir (+25)
diff --git a/mlir/lib/Dialect/Linalg/Utils/Utils.cpp b/mlir/lib/Dialect/Linalg/Utils/Utils.cpp
index 38e427af1c4846..2ebeccb1505048 100644
--- a/mlir/lib/Dialect/Linalg/Utils/Utils.cpp
+++ b/mlir/lib/Dialect/Linalg/Utils/Utils.cpp
@@ -596,8 +596,17 @@ computeSliceParameters(OpBuilder &builder, Location loc, Value valueToTile,
     auto m = map.getSubMap({r});
     LLVM_DEBUG(llvm::dbgs() << "computeSliceParameters: submap: " << m << "\n");
     IRRewriter rewriter(builder);
-    OpFoldResult offset = makeComposedFoldedAffineApply(rewriter, loc, m, lbs);
+    // The offset of the slice is map(lbs) - map(0).
+    SmallVector<Attribute> zeros(lbs.size(), rewriter.getIndexAttr(0));
+    SmallVector<Attribute> mAtZero;
+    auto res = m.constantFold(zeros, mAtZero);
+    assert(succeeded(res));
+    auto atZeroInt = getConstantIntValue(mAtZero[0]);
+    assert(atZeroInt);
+    OpFoldResult offset = makeComposedFoldedAffineApply(
+        rewriter, loc, m.getResult(0) - *atZeroInt, lbs);
     sliceParams.offsets.push_back(offset);
+
     OpFoldResult closedIntSize =
         makeComposedFoldedAffineApply(rewriter, loc, m, subShapeSizes);
     // Resulting size needs to be made half open interval again.
diff --git a/mlir/test/Dialect/Linalg/tile-offset.mlir b/mlir/test/Dialect/Linalg/tile-offset.mlir
new file mode 100644
index 00000000000000..41763dd688c1e9
--- /dev/null
+++ b/mlir/test/Dialect/Linalg/tile-offset.mlir
@@ -0,0 +1,25 @@
+// RUN: mlir-opt %s -transform-interpreter -split-input-file | FileCheck %s
+
+// CHECK-LABEL: func @tile_offset
+//  CHECK-SAME:   %[[ARG0:[a-zA-Z0-9_]+]]:
+func.func @tile_offset(%arg0 : tensor<9xf32>) -> tensor<6xf32> {
+  %empty = tensor.empty() : tensor<6xf32>
+  // CHECK: scf.for %[[ITER:[a-zA-Z0-9_]+]] =
+  // CHECK: tensor.extract_slice %[[ARG0]][%[[ITER]]] [6] [1]
+  %generic = linalg.generic
+    {indexing_maps = [affine_map<(d0) -> (d0 + 3)>,
+                      affine_map<(d0) -> (d0)>],
+     iterator_types = ["parallel"]} ins(%arg0: tensor<9xf32>) outs(%empty : tensor<6xf32>) {
+    ^bb0(%in : f32, %out: f32):
+      linalg.yield %in : f32
+    } -> tensor<6xf32>
+  return %generic : tensor<6xf32>
+}
+
+module attributes {transform.with_named_sequence} {
+  transform.named_sequence @__transform_main(%arg1: !transform.any_op {transform.readonly}) {
+    %0 = transform.structured.match ops{["linalg.generic"]} in %arg1 : (!transform.any_op) -> !transform.any_op
+    %1, %loop = transform.structured.tile_using_for %0 tile_sizes [3] : (!transform.any_op) -> (!transform.any_op, !transform.any_op)
+    transform.yield
+  }
+}

Comment on lines 602 to 603
auto res = m.constantFold(zeros, mAtZero);
assert(succeeded(res));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the res is not used anyway... So in non-debug builds this will show up as warning/error (if warnings are converted to error). What is the constantFold for?

Could you also add a message to the assert.

Copy link
Contributor Author

@mgehre-amd mgehre-amd Jan 13, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The m.constantFold is evaluating the affine map m with all inputs being zero.
The constantFold returns failure() when the affine map has symbols, but afaik linalg guarantees that the affine map has no symbols, thus the evaluation of m into a constant is always possible. (Thus the assert; I added a message to it).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I also added [[maybe_unused]] to res to avoid errors in non-debug configs.

@mgehre-amd
Copy link
Contributor Author

Friendly ping :-)

Copy link
Contributor

@MaheshRavishankar MaheshRavishankar left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks! Took me a while to work through if this is the easiest way of doing this. I cant think of anything better.

@mgehre-amd mgehre-amd merged commit 67b9d3f into llvm:main Jan 21, 2025
8 checks passed
@mgehre-amd mgehre-amd deleted the matthias.fix_tiling_slice_offset_upstream branch January 21, 2025 09:33
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants