[flang][OpenMP] Rewrite standalone loop (without bind) directives to simd #122632
Conversation
@llvm/pr-subscribers-flang-openmp @llvm/pr-subscribers-flang-fir-hlfir
Author: Kareem Ergawy (ergawy)
Changes: Extends conversion support for `loop` directives.
Full diff: https://github.com/llvm/llvm-project/pull/122632.diff (3 files affected)
diff --git a/flang/lib/Optimizer/OpenMP/GenericLoopConversion.cpp b/flang/lib/Optimizer/OpenMP/GenericLoopConversion.cpp
index c3c1f3b2848b82..6b1b689aad9c23 100644
--- a/flang/lib/Optimizer/OpenMP/GenericLoopConversion.cpp
+++ b/flang/lib/Optimizer/OpenMP/GenericLoopConversion.cpp
@@ -30,19 +30,37 @@ class GenericLoopConversionPattern
: public mlir::OpConversionPattern<mlir::omp::LoopOp> {
public:
enum class GenericLoopCombinedInfo {
- None,
+ Standalone,
TargetTeamsLoop,
TargetParallelLoop
};
using mlir::OpConversionPattern<mlir::omp::LoopOp>::OpConversionPattern;
+ explicit GenericLoopConversionPattern(mlir::MLIRContext *ctx)
+ : mlir::OpConversionPattern<mlir::omp::LoopOp>{ctx} {
+ this->setHasBoundedRewriteRecursion(true);
+ }
+
mlir::LogicalResult
matchAndRewrite(mlir::omp::LoopOp loopOp, OpAdaptor adaptor,
mlir::ConversionPatternRewriter &rewriter) const override {
assert(mlir::succeeded(checkLoopConversionSupportStatus(loopOp)));
- rewriteToDistributeParallelDo(loopOp, rewriter);
+ GenericLoopCombinedInfo combinedInfo = findGenericLoopCombineInfo(loopOp);
+
+ switch (combinedInfo) {
+ case GenericLoopCombinedInfo::Standalone:
+ rewriteToSimdLoop(loopOp, rewriter);
+ break;
+ case GenericLoopCombinedInfo::TargetParallelLoop:
+ assert(false);
+ break;
+ case GenericLoopCombinedInfo::TargetTeamsLoop:
+ rewriteToDistributeParallelDo(loopOp, rewriter);
+ break;
+ }
+
rewriter.eraseOp(loopOp);
return mlir::success();
}
@@ -52,9 +70,8 @@ class GenericLoopConversionPattern
GenericLoopCombinedInfo combinedInfo = findGenericLoopCombineInfo(loopOp);
switch (combinedInfo) {
- case GenericLoopCombinedInfo::None:
- return loopOp.emitError(
- "not yet implemented: Standalone `omp loop` directive");
+ case GenericLoopCombinedInfo::Standalone:
+ break;
case GenericLoopCombinedInfo::TargetParallelLoop:
return loopOp.emitError(
"not yet implemented: Combined `omp target parallel loop` directive");
@@ -86,7 +103,7 @@ class GenericLoopConversionPattern
static GenericLoopCombinedInfo
findGenericLoopCombineInfo(mlir::omp::LoopOp loopOp) {
mlir::Operation *parentOp = loopOp->getParentOp();
- GenericLoopCombinedInfo result = GenericLoopCombinedInfo::None;
+ GenericLoopCombinedInfo result = GenericLoopCombinedInfo::Standalone;
if (auto teamsOp = mlir::dyn_cast_if_present<mlir::omp::TeamsOp>(parentOp))
if (mlir::isa_and_present<mlir::omp::TargetOp>(teamsOp->getParentOp()))
@@ -100,6 +117,61 @@ class GenericLoopConversionPattern
return result;
}
+ /// Rewrites standalone `loop` directives to equivalent `simd` constructs.
+ /// The reasoning behind this decision is that according to the spec
+ /// (version 5.2, section 11.7.1):
+ ///
+ /// "If the bind clause is not specified on a construct for which it may be
+ /// specified and the construct is closely nested inside a teams or parallel
+ /// construct, the effect is as if binding is teams or parallel. If none of
+ /// those conditions hold, the binding region is not defined."
+ ///
+ /// which means that standalone `loop` directives have an undefined binding
+ /// region. Moreover, the spec says (in the next paragraph):
+ ///
+ /// "The specified binding region determines the binding thread set.
+ /// Specifically, if the binding region is a teams region, then the binding
+ /// thread set is the set of initial threads that are executing that region
+ /// while if the binding region is a parallel region, then the binding thread
+ /// set is the team of threads that are executing that region. If the binding
+ /// region is not defined, then the binding thread set is the encountering
+ /// thread."
+ ///
+ /// which means that the binding thread set for a standalone `loop` directive
+ /// is only the encountering thread.
+ ///
+ /// Since the encountering thread is the binding thread (set) for a
+ /// standalone `loop` directive, the best we can do in such case is to "simd"
+ /// the directive.
+ void
+ rewriteToSimdLoop(mlir::omp::LoopOp loopOp,
+ mlir::ConversionPatternRewriter &rewriter) const {
+ loopOp.emitWarning("Detected standalone OpenMP `loop` directive, the "
+ "associated loop will be rewritten to `simd`.");
+ mlir::omp::SimdOperands simdClauseOps;
+ simdClauseOps.privateVars = loopOp.getPrivateVars();
+
+ auto privateSyms = loopOp.getPrivateSyms();
+ if (privateSyms)
+ simdClauseOps.privateSyms.assign(privateSyms->begin(),
+ privateSyms->end());
+
+ Fortran::common::openmp::EntryBlockArgs simdArgs;
+ simdArgs.priv.vars = simdClauseOps.privateVars;
+
+ auto simdOp =
+ rewriter.create<mlir::omp::SimdOp>(loopOp.getLoc(), simdClauseOps);
+ mlir::Block *simdBlock =
+ genEntryBlock(rewriter, simdArgs, simdOp.getRegion());
+
+ mlir::IRMapping mapper;
+ mlir::Block &loopBlock = *loopOp.getRegion().begin();
+
+ for (auto [loopOpArg, simdopArg] :
+ llvm::zip_equal(loopBlock.getArguments(), simdBlock->getArguments()))
+ mapper.map(loopOpArg, simdopArg);
+
+ rewriter.clone(*loopOp.begin(), mapper);
+ }
+
void rewriteToDistributeParallelDo(
mlir::omp::LoopOp loopOp,
mlir::ConversionPatternRewriter &rewriter) const {
diff --git a/flang/test/Lower/OpenMP/loop-directive.f90 b/flang/test/Lower/OpenMP/loop-directive.f90
index 4b4d640e449eeb..9fa0de3bfe171a 100644
--- a/flang/test/Lower/OpenMP/loop-directive.f90
+++ b/flang/test/Lower/OpenMP/loop-directive.f90
@@ -11,7 +11,7 @@
subroutine test_no_clauses()
integer :: i, j, dummy = 1
- ! CHECK: omp.loop private(@[[I_PRIV]] %{{.*}}#0 -> %[[ARG:.*]] : !fir.ref<i32>) {
+ ! CHECK: omp.simd private(@[[I_PRIV]] %{{.*}}#0 -> %[[ARG:.*]] : !fir.ref<i32>) {
! CHECK-NEXT: omp.loop_nest (%[[IV:.*]]) : i32 = (%{{.*}}) to (%{{.*}}) {{.*}} {
! CHECK: %[[ARG_DECL:.*]]:2 = hlfir.declare %[[ARG]]
! CHECK: fir.store %[[IV]] to %[[ARG_DECL]]#1 : !fir.ref<i32>
@@ -27,7 +27,7 @@ subroutine test_no_clauses()
! CHECK-LABEL: func.func @_QPtest_collapse
subroutine test_collapse()
integer :: i, j, dummy = 1
- ! CHECK: omp.loop private(@{{.*}} %{{.*}}#0 -> %{{.*}}, @{{.*}} %{{.*}}#0 -> %{{.*}} : {{.*}}) {
+ ! CHECK: omp.simd private(@{{.*}} %{{.*}}#0 -> %{{.*}}, @{{.*}} %{{.*}}#0 -> %{{.*}} : {{.*}}) {
! CHECK-NEXT: omp.loop_nest (%{{.*}}, %{{.*}}) : i32 {{.*}} {
! CHECK: }
! CHECK: }
@@ -43,7 +43,7 @@ subroutine test_collapse()
! CHECK-LABEL: func.func @_QPtest_private
subroutine test_private()
integer :: i, dummy = 1
- ! CHECK: omp.loop private(@[[DUMMY_PRIV]] %{{.*}}#0 -> %[[DUMMY_ARG:.*]], @{{.*}} %{{.*}}#0 -> %{{.*}} : {{.*}}) {
+ ! CHECK: omp.simd private(@[[DUMMY_PRIV]] %{{.*}}#0 -> %[[DUMMY_ARG:.*]], @{{.*}} %{{.*}}#0 -> %{{.*}} : {{.*}}) {
! CHECK-NEXT: omp.loop_nest (%{{.*}}) : i32 = (%{{.*}}) to (%{{.*}}) {{.*}} {
! CHECK: %[[DUMMY_DECL:.*]]:2 = hlfir.declare %[[DUMMY_ARG]] {uniq_name = "_QFtest_privateEdummy"}
! CHECK: %{{.*}} = fir.load %[[DUMMY_DECL]]#0
@@ -100,3 +100,42 @@ subroutine test_bind()
end do
!$omp end loop
end subroutine
+
+! CHECK-LABEL: func.func @_QPtest_nested_directives
+subroutine test_nested_directives
+ implicit none
+ integer, parameter :: N = 100000
+ integer a(N), b(N), c(N)
+ integer j,i, num, flag;
+ num = N
+
+ ! CHECK: omp.teams {
+
+ ! Verify the first `loop` directive was combined with `target teams` into
+ ! `target teams distribute parallel do`.
+ ! CHECK: omp.parallel {{.*}} {
+ ! CHECK: omp.distribute {
+ ! CHECK: omp.wsloop {
+ ! CHECK: omp.loop_nest {{.*}} {
+
+ ! Verify the second `loop` directive was rewritten to `simd`.
+ ! CHECK: omp.simd {{.*}} {
+ ! CHECK: omp.loop_nest {{.*}} {
+ ! CHECK: }
+ ! CHECK: }
+
+ ! CHECK: }
+ ! CHECK: } {omp.composite}
+ ! CHECK: } {omp.composite}
+ ! CHECK: } {omp.composite}
+ ! CHECK: }
+ !$omp target teams map(to: a,b) map(from: c)
+ !$omp loop
+ do j=1,1000
+ !$omp loop
+ do i=1,N
+ c(i) = a(i) * b(i)
+ end do
+ end do
+ !$omp end target teams
+end subroutine
diff --git a/flang/test/Transforms/generic-loop-rewriting-todo.mlir b/flang/test/Transforms/generic-loop-rewriting-todo.mlir
index 9ea6bf001b6685..becd6b8dcb5cb4 100644
--- a/flang/test/Transforms/generic-loop-rewriting-todo.mlir
+++ b/flang/test/Transforms/generic-loop-rewriting-todo.mlir
@@ -1,18 +1,5 @@
// RUN: fir-opt --omp-generic-loop-conversion -verify-diagnostics %s
-func.func @_QPtarget_loop() {
- %c0 = arith.constant 0 : i32
- %c10 = arith.constant 10 : i32
- %c1 = arith.constant 1 : i32
- // expected-error@below {{not yet implemented: Standalone `omp loop` directive}}
- omp.loop {
- omp.loop_nest (%arg3) : i32 = (%c0) to (%c10) inclusive step (%c1) {
- omp.yield
- }
- }
- return
-}
-
func.func @_QPtarget_parallel_loop() {
omp.target {
omp.parallel {
d1436cc to 5344c60
LGTM
5344c60 to 52076c7
This change (if it follows the standard) should be an OpenMP dialect pass.
You mean to add it under
Yes, it is fine in a follow-up PR.
Thanks, additions in this patch LGTM. I just have a general question about the need for having this pass be aware of `target` at all.
52076c7 to d7341ec
Ping! Please take a look and let me know if there are any objections to merging this PR.
LGTM, thanks!
Extends conversion support for `loop` directives. This PR handles standalone `loop` constructs by rewriting them to equivalent `simd` constructs. The reasoning behind that decision is documented in the rewrite function itself.
d7341ec to 41a9cf5
…#122674) Extends rewriting of `loop` directives by supporting the `bind` clause for standalone directives. This follows both the spec and the current state of clang as follows:
* No `bind` or `bind(thread)`: the `loop` is rewritten to `simd`.
* `bind(parallel)`: the `loop` is rewritten to `do`.
* `bind(teams)`: the `loop` is rewritten to `distribute`.
This is a follow-up PR for #122632; only the latest commit in this PR is relevant.
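The bind-clause mapping described in the follow-up PR can be summarized as a small lookup table. The sketch below is illustrative only; the helper name `rewrite_target_for` is hypothetical and this is not the actual flang implementation, which performs the rewrite on MLIR operations:

```python
# Illustrative sketch of the bind-clause mapping from follow-up PR
# #122674; the helper name is hypothetical, not flang's actual code.

def rewrite_target_for(bind_clause):
    """Map a standalone `omp loop`'s bind clause to the directive it
    is rewritten to. `bind_clause` is None when no clause is given."""
    mapping = {
        None: "simd",           # no bind clause: the loop binds to the
        "thread": "simd",       # encountering thread, so use `simd`
        "parallel": "do",       # binds to the enclosing parallel team
        "teams": "distribute",  # binds to the enclosing teams region
    }
    return mapping[bind_clause]

print(rewrite_target_for(None))        # simd
print(rewrite_target_for("parallel"))  # do
```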
Extends conversion support for `loop` directives. This PR handles standalone `loop` constructs that do not have a `bind` clause attached by rewriting them to equivalent `simd` constructs. The reasoning behind that decision is documented in the rewrite function itself.
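As a rough before/after sketch of the rewrite, based on the region shapes in this PR's test files (SSA value names and bounds are illustrative), a standalone `loop` such as:

```mlir
omp.loop {
  omp.loop_nest (%iv) : i32 = (%lb) to (%ub) inclusive step (%step) {
    omp.yield
  }
}
```

is replaced by an `omp.simd` wrapping a clone of the same loop nest:

```mlir
omp.simd {
  omp.loop_nest (%iv) : i32 = (%lb) to (%ub) inclusive step (%step) {
    omp.yield
  }
}
```

Private variables and their entry-block arguments are carried over to the new `simd` op, as shown in `rewriteToSimdLoop` in the diff above.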