Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[flang][OpenMP] Rewrite standalone loop (without bind) directives to simd #122632

Merged
merged 1 commit into from
Jan 21, 2025

Conversation

ergawy
Copy link
Member

@ergawy ergawy commented Jan 12, 2025

Extends conversion support for loop directives. This PR handles standalone loop constructs that do not have a bind clause attached by rewriting them to equivalent simd constructs. The reasoning behind that decision is documented in the rewrite function itself.

@llvmbot llvmbot added flang Flang issues not falling into any other category flang:fir-hlfir flang:openmp labels Jan 12, 2025
@llvmbot
Copy link
Member

llvmbot commented Jan 12, 2025

@llvm/pr-subscribers-flang-openmp

@llvm/pr-subscribers-flang-fir-hlfir

Author: Kareem Ergawy (ergawy)

Changes

Extends conversion support for loop directives. This PR handles standalone loop constructs by rewriting them to equivalent simd constructs. The reasoning behind that decision is documented in the rewrite function itself.


Full diff: https://github.com/llvm/llvm-project/pull/122632.diff

3 Files Affected:

  • (modified) flang/lib/Optimizer/OpenMP/GenericLoopConversion.cpp (+78-6)
  • (modified) flang/test/Lower/OpenMP/loop-directive.f90 (+42-3)
  • (modified) flang/test/Transforms/generic-loop-rewriting-todo.mlir (-13)
diff --git a/flang/lib/Optimizer/OpenMP/GenericLoopConversion.cpp b/flang/lib/Optimizer/OpenMP/GenericLoopConversion.cpp
index c3c1f3b2848b82..6b1b689aad9c23 100644
--- a/flang/lib/Optimizer/OpenMP/GenericLoopConversion.cpp
+++ b/flang/lib/Optimizer/OpenMP/GenericLoopConversion.cpp
@@ -30,19 +30,37 @@ class GenericLoopConversionPattern
     : public mlir::OpConversionPattern<mlir::omp::LoopOp> {
 public:
   enum class GenericLoopCombinedInfo {
-    None,
+    Standalone,
     TargetTeamsLoop,
     TargetParallelLoop
   };
 
   using mlir::OpConversionPattern<mlir::omp::LoopOp>::OpConversionPattern;
 
+  explicit GenericLoopConversionPattern(mlir::MLIRContext *ctx)
+      : mlir::OpConversionPattern<mlir::omp::LoopOp>{ctx} {
+    this->setHasBoundedRewriteRecursion(true);
+  }
+
   mlir::LogicalResult
   matchAndRewrite(mlir::omp::LoopOp loopOp, OpAdaptor adaptor,
                   mlir::ConversionPatternRewriter &rewriter) const override {
     assert(mlir::succeeded(checkLoopConversionSupportStatus(loopOp)));
 
-    rewriteToDistributeParallelDo(loopOp, rewriter);
+    GenericLoopCombinedInfo combinedInfo = findGenericLoopCombineInfo(loopOp);
+
+    switch (combinedInfo) {
+    case GenericLoopCombinedInfo::Standalone:
+      rewriteToSimdLoop(loopOp, rewriter);
+      break;
+    case GenericLoopCombinedInfo::TargetParallelLoop:
+      assert(false);
+      break;
+    case GenericLoopCombinedInfo::TargetTeamsLoop:
+      rewriteToDistributeParallelDo(loopOp, rewriter);
+      break;
+    }
+
     rewriter.eraseOp(loopOp);
     return mlir::success();
   }
@@ -52,9 +70,8 @@ class GenericLoopConversionPattern
     GenericLoopCombinedInfo combinedInfo = findGenericLoopCombineInfo(loopOp);
 
     switch (combinedInfo) {
-    case GenericLoopCombinedInfo::None:
-      return loopOp.emitError(
-          "not yet implemented: Standalone `omp loop` directive");
+    case GenericLoopCombinedInfo::Standalone:
+      break;
     case GenericLoopCombinedInfo::TargetParallelLoop:
       return loopOp.emitError(
           "not yet implemented: Combined `omp target parallel loop` directive");
@@ -86,7 +103,7 @@ class GenericLoopConversionPattern
   static GenericLoopCombinedInfo
   findGenericLoopCombineInfo(mlir::omp::LoopOp loopOp) {
     mlir::Operation *parentOp = loopOp->getParentOp();
-    GenericLoopCombinedInfo result = GenericLoopCombinedInfo::None;
+    GenericLoopCombinedInfo result = GenericLoopCombinedInfo::Standalone;
 
     if (auto teamsOp = mlir::dyn_cast_if_present<mlir::omp::TeamsOp>(parentOp))
       if (mlir::isa_and_present<mlir::omp::TargetOp>(teamsOp->getParentOp()))
@@ -100,6 +117,61 @@ class GenericLoopConversionPattern
     return result;
   }
 
+  /// Rewrites standalone `loop` directives to equivalent `simd` constructs. The reasoning behind this decision is that according to the spec (version 5.2, section 11.7.1):
+  ///
+  /// "If the bind clause is not specified on a construct for which it may be
+  /// specified and the construct is closely nested inside a teams or parallel
+  /// construct, the effect is as if binding is teams or parallel. If none of
+  /// those conditions hold, the binding region is not defined."
+  ///
+  /// which means that standalone `loop` directives have undefined binding
+  /// region. Moreover, the spec says (in the next paragraph):
+  ///
+  /// "The specified binding region determines the binding thread set.
+  /// Specifically, if the binding region is a teams region, then the binding
+  /// thread set is the set of initial threads that are executing that region
+  /// while if the binding region is a parallel region, then the binding thread
+  /// set is the team of threads that are executing that region. If the binding
+  /// region is not defined, then the binding thread set is the encountering
+  /// thread."
+  ///
+  /// which means that the binding thread set for a standalone `loop` directive
+  /// is only the encountering thread.
+  ///
+  /// Since the encountering thread is the binding thread (set) for a
+  /// standalone `loop` directive, the best we can do in such case is to "simd"
+  /// the directive.
+  void
+  rewriteToSimdLoop(mlir::omp::LoopOp loopOp,
+                          mlir::ConversionPatternRewriter &rewriter) const {
+    loopOp.emitWarning("Detected standalone OpenMP `loop` directive, the "
+                       "associated loop will be rewritten to `simd`.");
+    mlir::omp::SimdOperands simdClauseOps;
+    simdClauseOps.privateVars = loopOp.getPrivateVars();
+
+    auto privateSyms = loopOp.getPrivateSyms();
+    if (privateSyms)
+      simdClauseOps.privateSyms.assign(privateSyms->begin(),
+                                       privateSyms->end());
+
+    Fortran::common::openmp::EntryBlockArgs simdArgs;
+    simdArgs.priv.vars = simdClauseOps.privateVars;
+
+    auto simdOp =
+        rewriter.create<mlir::omp::SimdOp>(loopOp.getLoc(), simdClauseOps);
+    mlir::Block *simdBlock =
+        genEntryBlock(rewriter, simdArgs, simdOp.getRegion());
+
+    mlir::IRMapping mapper;
+    mlir::Block &loopBlock = *loopOp.getRegion().begin();
+
+    for (auto [loopOpArg, simdopArg] :
+         llvm::zip_equal(loopBlock.getArguments(), simdBlock->getArguments()))
+      mapper.map(loopOpArg, simdopArg);
+
+    rewriter.clone(*loopOp.begin(), mapper);
+  }
+
   void rewriteToDistributeParallelDo(
       mlir::omp::LoopOp loopOp,
       mlir::ConversionPatternRewriter &rewriter) const {
diff --git a/flang/test/Lower/OpenMP/loop-directive.f90 b/flang/test/Lower/OpenMP/loop-directive.f90
index 4b4d640e449eeb..9fa0de3bfe171a 100644
--- a/flang/test/Lower/OpenMP/loop-directive.f90
+++ b/flang/test/Lower/OpenMP/loop-directive.f90
@@ -11,7 +11,7 @@
 subroutine test_no_clauses()
   integer :: i, j, dummy = 1
 
-  ! CHECK: omp.loop private(@[[I_PRIV]] %{{.*}}#0 -> %[[ARG:.*]] : !fir.ref<i32>) {
+  ! CHECK: omp.simd private(@[[I_PRIV]] %{{.*}}#0 -> %[[ARG:.*]] : !fir.ref<i32>) {
   ! CHECK-NEXT:   omp.loop_nest (%[[IV:.*]]) : i32 = (%{{.*}}) to (%{{.*}}) {{.*}} {
   ! CHECK:          %[[ARG_DECL:.*]]:2 = hlfir.declare %[[ARG]]
   ! CHECK:          fir.store %[[IV]] to %[[ARG_DECL]]#1 : !fir.ref<i32>
@@ -27,7 +27,7 @@ subroutine test_no_clauses()
 ! CHECK-LABEL: func.func @_QPtest_collapse
 subroutine test_collapse()
   integer :: i, j, dummy = 1
-  ! CHECK: omp.loop private(@{{.*}} %{{.*}}#0 -> %{{.*}}, @{{.*}} %{{.*}}#0 -> %{{.*}} : {{.*}}) {
+  ! CHECK: omp.simd private(@{{.*}} %{{.*}}#0 -> %{{.*}}, @{{.*}} %{{.*}}#0 -> %{{.*}} : {{.*}}) {
   ! CHECK-NEXT:   omp.loop_nest (%{{.*}}, %{{.*}}) : i32 {{.*}} {
   ! CHECK:        }
   ! CHECK: }
@@ -43,7 +43,7 @@ subroutine test_collapse()
 ! CHECK-LABEL: func.func @_QPtest_private
 subroutine test_private()
   integer :: i, dummy = 1
-  ! CHECK: omp.loop private(@[[DUMMY_PRIV]] %{{.*}}#0 -> %[[DUMMY_ARG:.*]], @{{.*}} %{{.*}}#0 -> %{{.*}} : {{.*}}) {
+  ! CHECK: omp.simd private(@[[DUMMY_PRIV]] %{{.*}}#0 -> %[[DUMMY_ARG:.*]], @{{.*}} %{{.*}}#0 -> %{{.*}} : {{.*}}) {
   ! CHECK-NEXT:   omp.loop_nest (%{{.*}}) : i32 = (%{{.*}}) to (%{{.*}}) {{.*}} {
   ! CHECK:          %[[DUMMY_DECL:.*]]:2 = hlfir.declare %[[DUMMY_ARG]] {uniq_name = "_QFtest_privateEdummy"}
   ! CHECK:          %{{.*}} = fir.load %[[DUMMY_DECL]]#0
@@ -100,3 +100,42 @@ subroutine test_bind()
   end do
   !$omp end loop
 end subroutine
+
+! CHECK-LABEL: func.func @_QPtest_nested_directives
+subroutine test_nested_directives
+  implicit none
+  integer, parameter :: N = 100000
+  integer a(N), b(N), c(N)
+  integer j,i, num, flag;
+  num = N
+
+  ! CHECK: omp.teams {
+
+  ! Verify the first `loop` directive was combined with `target teams` into 
+  ! `target teams distribute parallel do`.
+  ! CHECK:   omp.parallel {{.*}} {
+  ! CHECK:     omp.distribute {
+  ! CHECK:       omp.wsloop {
+  ! CHECK:         omp.loop_nest {{.*}} {
+
+  ! Very the second `loop` directive was rewritten to `simd`.
+  ! CHECK:           omp.simd {{.*}} {
+  ! CHECK:             omp.loop_nest {{.*}} {
+  ! CHECK:             }
+  ! CHECK:           }
+
+  ! CHECK:         }
+  ! CHECK:       } {omp.composite}
+  ! CHECK:     } {omp.composite}
+  ! CHECK:   } {omp.composite}
+  ! CHECK: }
+  !$omp target teams map(to: a,b) map(from: c)
+  !$omp loop
+  do j=1,1000
+    !$omp loop
+    do i=1,N
+      c(i) = a(i) * b(i)
+    end do
+  end do
+  !$omp end target teams
+end subroutine
diff --git a/flang/test/Transforms/generic-loop-rewriting-todo.mlir b/flang/test/Transforms/generic-loop-rewriting-todo.mlir
index 9ea6bf001b6685..becd6b8dcb5cb4 100644
--- a/flang/test/Transforms/generic-loop-rewriting-todo.mlir
+++ b/flang/test/Transforms/generic-loop-rewriting-todo.mlir
@@ -1,18 +1,5 @@
 // RUN: fir-opt --omp-generic-loop-conversion -verify-diagnostics %s
 
-func.func @_QPtarget_loop() {
-  %c0 = arith.constant 0 : i32
-  %c10 = arith.constant 10 : i32
-  %c1 = arith.constant 1 : i32
-  // expected-error@below {{not yet implemented: Standalone `omp loop` directive}}
-  omp.loop {
-    omp.loop_nest (%arg3) : i32 = (%c0) to (%c10) inclusive step (%c1) {
-      omp.yield
-    }
-  }
-  return
-}
-
 func.func @_QPtarget_parallel_loop() {
   omp.target {
     omp.parallel {

@ergawy ergawy force-pushed the convert_standalone_loop_directives branch 2 times, most recently from d1436cc to 5344c60 Compare January 12, 2025 07:52
Copy link
Contributor

@mjklemm mjklemm left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@ergawy ergawy changed the title [flang][OpenMP] Rewrite standalone loop directives to simd [flang][OpenMP] Rewrite standalone loop (without bind) directives to simd Jan 13, 2025
@ergawy ergawy force-pushed the convert_standalone_loop_directives branch from 5344c60 to 52076c7 Compare January 13, 2025 08:12
@kiranchandramohan
Copy link
Contributor

This change (if it follows the standard) should be an OpenMP dialect pass.

@ergawy
Copy link
Member Author

ergawy commented Jan 13, 2025

This change (if it follows the standard) should be an OpenMP dialect pass.

You mean to add it under mlir/lib/Dialect/OpenMP/Transforms? There is no Transforms directory at the moment but can add one (makes more sense to have it there). If so, do you mind doing that in a follow up PR?

@kiranchandramohan
Copy link
Contributor

This change (if it follows the standard) should be an OpenMP dialect pass.

You mean to add it under mlir/lib/Dialect/OpenMP/Transforms? There is no Transforms directory at the moment but can add one (makes more sense to have it there). If so, do you mind doing that in a follow up PR?

Yes, it is fine in a follow up PR.

Copy link
Member

@skatrak skatrak left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, additions in this patch LGTM. I just have a general question about the need for having this pass be aware of target at all.

@ergawy ergawy force-pushed the convert_standalone_loop_directives branch from 52076c7 to d7341ec Compare January 13, 2025 19:21
@ergawy
Copy link
Member Author

ergawy commented Jan 20, 2025

Ping! Please take a look and let me know if there are any objections to merging this PR.

Copy link
Member

@skatrak skatrak left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, thanks!

Extends conversion support for `loop` directives. This PR handles
standalone `loop` constructs by rewriting them to equivalent `simd`
constructs. The reasoning behind that decision is documented in the
rewrite function itself.
@ergawy ergawy force-pushed the convert_standalone_loop_directives branch from d7341ec to 41a9cf5 Compare January 21, 2025 13:14
@ergawy ergawy merged commit 29f7392 into llvm:main Jan 21, 2025
8 checks passed
ergawy added a commit that referenced this pull request Jan 27, 2025
…#122674)

Extends rewriting of `loop` directives by supporting `bind` clause for
standalone directives. This follows both the spec and the current state
of clang as follows:
* No `bind` or `bind(thread)`: the `loop` is rewritten to `simd`.
* `bind(parallel)`: the `loop` is rewritten to `do`.
* `bind(teams)`: the `loop` is rewritten to `distribute`.

This is a follow-up PR for
#122632, only the latest commit
in this PR is relevant to the PR.
github-actions bot pushed a commit to arm/arm-toolchain that referenced this pull request Jan 27, 2025
…one `loop`s (#122674)

Extends rewriting of `loop` directives by supporting `bind` clause for
standalone directives. This follows both the spec and the current state
of clang as follows:
* No `bind` or `bind(thread)`: the `loop` is rewritten to `simd`.
* `bind(parallel)`: the `loop` is rewritten to `do`.
* `bind(teams)`: the `loop` is rewritten to `distribute`.

This is a follow-up PR for
llvm/llvm-project#122632, only the latest commit
in this PR is relevant to the PR.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
flang:fir-hlfir flang:openmp flang Flang issues not falling into any other category
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants