Adding support for scheduling the epilogue computation when smem_epilogue parameter is true #3581

protonu · 2024-12-12T18:55:48Z

Refactors some scheduler code and adds partial support for smem_epilogue.
TODO: add support for smem_epilogue when the output of the mma op is not cast down to half precision.

rdspring1

Here is my transformation helper. It can be used in scheduleMmaResult. Just trying to avoid some merge conflicts.

void HopperMultipleMatmulScheduler::transformLikeMmaOutput(
    TensorView* tv,
    bool is_mma_result) {
  // TODO Add constraints

  auto apply_k_dim_offset = [is_mma_result](int64_t idx) constexpr {
    return (is_mma_result) ? idx - 1 : idx;
  };

  // Original: [..., Mo, No, Mi, Ni]
  tv->split(apply_k_dim_offset(-2), getM(params_->mma_macro));
  tv->split(apply_k_dim_offset(-1), getN(params_->mma_macro));
  // After Split: [..., Mo, No, Mio, Mii, Nio, Nii]
  tv->reorder({{apply_k_dim_offset(-3), apply_k_dim_offset(-2)}});
  // After Reorder: [..., Mo, No, Mio, Nio, Mii, Nii]
  tv->merge(apply_k_dim_offset(-4));
  // After Merge: [..., Mo, No, Mio * Nio, Mii, Nii]
  tv->axis(apply_k_dim_offset(-3))->parallelize(ParallelType::TIDy);
  // After Parallelize: [..., Mo, No, Mio * Nio (TIDy), Mii, Nii]
}
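
To make the axis bookkeeping in the helper above easier to check, here is a minimal standalone sketch that replays the same split / reorder / merge sequence on a plain list of axis labels. Everything in it (`Axes`, `wrap`, `split`, `moveAxis`, `merge`, `transformLike`, `join`) is hypothetical and only mimics the negative-index conventions; none of it is nvFuser API. It also assumes that the `is_mma_result` offset accounts for exactly one trailing axis, labeled `K` here.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Toy stand-in for a loop domain: an ordered list of axis labels.
// Hypothetical helpers, not nvFuser APIs.
using Axes = std::vector<std::string>;

// Convert a possibly negative axis index into a non-negative position.
std::ptrdiff_t wrap(const Axes& a, int64_t idx) {
  const int64_t n = static_cast<int64_t>(a.size());
  const int64_t pos = idx < 0 ? idx + n : idx;
  assert(pos >= 0 && pos < n);
  return static_cast<std::ptrdiff_t>(pos);
}

// Split axis X into an outer Xo followed by an inner Xi.
void split(Axes& a, int64_t idx) {
  const std::ptrdiff_t p = wrap(a, idx);
  const std::string name = a[p];
  a[p] = name + "o";
  a.insert(a.begin() + p + 1, name + "i");
}

// Move a single axis from position `from` to position `to`,
// keeping all other axes in their relative order.
void moveAxis(Axes& a, int64_t from, int64_t to) {
  const std::ptrdiff_t f = wrap(a, from);
  const std::ptrdiff_t t = wrap(a, to);
  const std::string name = a[f];
  a.erase(a.begin() + f);
  a.insert(a.begin() + t, name);
}

// Merge the axis at idx with the axis immediately after it.
void merge(Axes& a, int64_t idx) {
  const std::ptrdiff_t p = wrap(a, idx);
  a[p] = a[p] + "*" + a[p + 1];
  a.erase(a.begin() + p + 1);
}

// Replays the transformation sequence on labels only (no parallelization).
Axes transformLike(Axes a, bool is_mma_result) {
  auto off = [is_mma_result](int64_t idx) {
    return is_mma_result ? idx - 1 : idx;
  };
  split(a, off(-2));              // Mi -> Mio, Mii
  split(a, off(-1));              // Ni -> Nio, Nii
  moveAxis(a, off(-3), off(-2));  // swap Mii past Nio
  merge(a, off(-4));              // Mio * Nio
  return a;
}

// Format the labels as "[a, b, c]".
std::string join(const Axes& a) {
  std::string out = "[";
  for (std::size_t i = 0; i < a.size(); ++i) {
    out += (i ? ", " : "") + a[i];
  }
  return out + "]";
}
```

In this model, `join(transformLike({"Mo", "No", "Mi", "Ni"}, false))` gives `[Mo, No, Mio*Nio, Mii, Nii]`, and starting from `{"Mo", "No", "Mi", "Ni", "K"}` with `is_mma_result = true` gives the same layout with `K` left in last position.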

protonu · 2024-12-13T23:32:31Z

!build

protonu · 2024-12-16T16:28:15Z

!test

jacobhinkle · 2024-12-16T16:51:37Z

csrc/scheduler/hopper_multi_matmul.cpp

+      blockTileTensors({d});
+      parallelizeBlocks({d});
+      transformLikeMmaOutput(d, /*is_mma_result=*/false);

-      // Apply mma common transformation
-      for (auto tv : tvs_to_schedule) {
-        transformLikeMmaOutput(tv, /*is_mma_result=*/false);
-      }
+      scheduler_utils::BoundedDirectionalTransformPropagator::backward(
+          d,
+          -1,
+          {dc},
+          scheduler_utils::BoundedDirectionalTransformPropagator::Options()
+              .propagateParallelType()
+              .propagateToBoundary());

-      // Schedule register cache; Output from epilogue
-      {
-        auto s = mma_utils::MmaSwizzler::scheduleMmaOutputAllocation(
-            dc->getLoopDomain());
-        dc->setLoopDomain(s.as<IterDomain*>());
-        dc->setAllocationDomain(s.as<IterDomain*>(), true);
-      }
+      auto s = mma_utils::MmaSwizzler::scheduleMmaOutputAllocation(
+          dc->getLoopDomain());
+      dc->setLoopDomain(s.as<IterDomain*>());
+      dc->setAllocationDomain(s.as<IterDomain*>(), true);


Why propagate from d to dc then schedule dc directly like this instead of just directly scheduling dc? What does the propagation provide us? Could use this in a code comment.

Please take a look at the comment.

jacobhinkle

LGTM. Hopefully bypassing or manipulating stmatrix for float outputs will be straightforward.

rdspring1

Are there any bank conflicts in the epilogue fusions?

I'd run ncu and report anything interesting.

ncu -k regex:nvfuser --metrics 'regex:l1tex__data_bank_conflicts_pipe_lsu_mem_shared.*sum$' ./build/test_matmul --gtest_filter='*HopperMatmulScheduler*'

protonu · 2024-12-17T18:39:58Z

> Are there any bank conflicts in the epilogue fusions?
>
> I'd run ncu and report anything interesting.
> ncu -k regex:nvfuser --metrics 'regex:l1tex__data_bank_conflicts_pipe_lsu_mem_shared.*sum$' ./build/test_matmul --gtest_filter='*HopperMatmulScheduler*'

For the test HopperMatmulSchedulerTest*FusedMultiplySumBiasNeg* there are no bank conflicts reported.
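
For readers unfamiliar with what the metric above counts, here is a rough sketch of the usual simplified shared-memory bank model. It assumes 32 banks of 4 bytes each and a 32-thread warp where thread `t` reads one 4-byte word at byte offset `t * stride_bytes` (non-negative stride). The function `conflictDegree` and the constants are hypothetical illustration only; this is not how ncu computes its counters and it says nothing about the actual epilogue schedule in this PR.

```cpp
#include <algorithm>
#include <array>
#include <cstddef>
#include <cstdint>
#include <set>

// Assumed simplified model: 32 banks, 4 bytes per bank, 32 threads per warp.
constexpr int kBanks = 32;
constexpr int kBankBytes = 4;
constexpr int kWarpSize = 32;

// Returns the largest number of distinct 4-byte words that land in the same
// bank for one warp-wide access with the given byte stride between threads.
// 1 means conflict-free; N means an N-way conflict in this model.
// Threads that read the same word are counted once (broadcast).
int conflictDegree(int64_t stride_bytes) {
  std::array<std::set<int64_t>, kBanks> words_per_bank;
  for (int t = 0; t < kWarpSize; ++t) {
    const int64_t word = (t * stride_bytes) / kBankBytes;
    words_per_bank[word % kBanks].insert(word);
  }
  std::size_t worst = 0;
  for (const auto& words : words_per_bank) {
    worst = std::max(worst, words.size());
  }
  return static_cast<int>(worst);
}
```

In this model a 4-byte stride is conflict-free, a 128-byte stride (one full row of 32 words) sends every thread to bank 0, and padding that row by one word (132 bytes) removes the conflict again. Swizzled layouts pursue the same goal without spending extra shared memory on padding.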

jacobhinkle · 2024-12-18T17:23:58Z

!build

protonu · 2024-12-18T17:31:19Z

!test

protonu requested review from jacobhinkle and rdspring1 December 12, 2024 18:55

rdspring1 reviewed Dec 12, 2024

protonu force-pushed the pbasu_mma_epilogue_hopper_no_smem_epilogue branch from 94f39a1 to be5f18b Compare December 13, 2024 18:41

protonu force-pushed the pbasu_mma_epilogue_hopper_smem_epilogue branch from b9df449 to aa86b8b Compare December 13, 2024 20:59

protonu requested a review from rdspring1 December 13, 2024 20:59

protonu marked this pull request as ready for review December 13, 2024 21:00

protonu force-pushed the pbasu_mma_epilogue_hopper_no_smem_epilogue branch from 895df0f to 09cb407 Compare December 14, 2024 01:42

protonu force-pushed the pbasu_mma_epilogue_hopper_smem_epilogue branch 2 times, most recently from c0cfa4e to af126a1 Compare December 14, 2024 03:55

Base automatically changed from pbasu_mma_epilogue_hopper_no_smem_epilogue to main December 14, 2024 14:46

protonu added 5 commits December 16, 2024 07:55

adding a new unit test for mma+bias and propating schedules

4dca55b

rebase and address reviewer comments

9787e91

removing header

4ae9a67

adding a new unit test for mma+bias and propating schedules

0c28599

rebase and address reviewer comments

e0d5fd9

protonu force-pushed the pbasu_mma_epilogue_hopper_smem_epilogue branch from af126a1 to 8066a54 Compare December 16, 2024 16:04

adding support for smem_epilogue

2c19191

protonu force-pushed the pbasu_mma_epilogue_hopper_smem_epilogue branch from 8066a54 to 2c19191 Compare December 16, 2024 16:27

Merge branch 'main' into pbasu_mma_epilogue_hopper_smem_epilogue

16eaf31

jacobhinkle reviewed Dec 16, 2024

adding a comment

223e391

protonu requested a review from jacobhinkle December 16, 2024 17:28

jacobhinkle reviewed Dec 16, 2024

protonu requested a review from jacobhinkle December 16, 2024 18:00

protonu added 2 commits December 16, 2024 10:49

address reviewer comments

1615cb8

Merge branch 'main' into pbasu_mma_epilogue_hopper_smem_epilogue

ad7f17a

jacobhinkle reviewed Dec 16, 2024

address reviewer comments

1cbcab3

protonu requested a review from jacobhinkle December 16, 2024 19:13

jacobhinkle approved these changes Dec 16, 2024

rdspring1 reviewed Dec 16, 2024

protonu added 2 commits December 17, 2024 13:05

Merge branch 'main' into pbasu_mma_epilogue_hopper_smem_epilogue

5945437

addressing reviewer comments

d781e95

protonu added 2 commits December 17, 2024 14:06

Merge branch 'main' into pbasu_mma_epilogue_hopper_smem_epilogue

1429022

Merge branch 'main' into pbasu_mma_epilogue_hopper_smem_epilogue

e2ce5bc

Merge branch 'main' into pbasu_mma_epilogue_hopper_smem_epilogue

e755049

protonu merged commit c31b919 into main Dec 18, 2024
23 of 24 checks passed

protonu deleted the pbasu_mma_epilogue_hopper_smem_epilogue branch December 18, 2024 19:02
