Accept Hopper matmuls and update default heuristic #3579
base: main
Conversation
This enables Hopper matmuls in our automatic scheduler by translating them without introducing new broadcasts. Specifically:

1. Update `mma_utils::MatmulPattern::translateToMmaOp` to optionally avoid intermediates by using an `MmaOp::AxisMapping`. Enable this option when the target arch is not Ampere or Turing.
3. Unguard some tests in `test_translate_mma.cpp`.

This does not update the default heuristic or change the `canSchedule` checks. See #3579 for that follow-up PR.

---------

Co-authored-by: Ryan Spring <[email protected]>
Co-authored-by: Naoya Maruyama <[email protected]>
Co-authored-by: Jingyue Wu <[email protected]>
Co-authored-by: nsarka <[email protected]>
Co-authored-by: Protonu <[email protected]>
Co-authored-by: samnordmann <[email protected]>
Must have been a broken merge
I'm still skipping the ones with batch dimensions on A, since those currently hit an error. I'll investigate later, but we only need 2D A for now.
```cpp
  axis_mapping.a_axes.push_back(d);
}
axis_mapping.a_axes.reserve(out_dim);
for (size_t d : c10::irange(out_dim - 2)) {
```
I think this was just due to a busted merge.
```cpp
macro_encode.n = 256;
while (macro_encode.n >= 8) {
  if (n_extent % macro_encode.n != 0) {
    macro_encode.n /= 2;
```
Currently this only chooses powers of two. For small problems I think we could choose one of the other sizes. For example, if `n_extent == 72`, then we should probably use that size.
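To illustrate the suggestion above: here is a hypothetical sketch (not the PR's actual heuristic) that, instead of starting at 256 and halving, walks down through every multiple of 8 and takes the largest one that evenly divides `n_extent`. Hopper wgmma macros support N in multiples of 8 up to 256, so a problem with `n_extent == 72` could use `n = 72` directly rather than falling back to 8. `pickMacroN` is an invented name for this sketch.

```cpp
#include <cassert>
#include <cstdint>

// Sketch: pick the largest supported macro N (multiple of 8, at most 256)
// that evenly divides the problem's N extent, instead of only powers of two.
int64_t pickMacroN(int64_t n_extent) {
  for (int64_t n = 256; n >= 8; n -= 8) {
    if (n_extent % n == 0) {
      return n;
    }
  }
  // No supported size divides n_extent evenly; fall back to the smallest.
  return 8;
}
```

With this approach `pickMacroN(72)` yields 72, while the power-of-two loop above would step 256 → 128 → 64 → ... down to 8, since no power of two ≥ 8 divides 72.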
```cpp
const auto tryIncreaseM = [&]() {
  if (ratiosValid(m_ratio + 1, n_ratio)) {
    m_ratio++;
```
Should these also be powers of two? Currently this will choose sizes like 192. We should fix this for both matmul and linear, and for both the `avoid_intermediates_` and non-`avoid_intermediates_` paths.
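As a sketch of the power-of-two alternative being asked about here (not the PR's code): growing the ratio by doubling instead of incrementing keeps every resulting size at `base * 2^k` (e.g. 64, 128, 256) and never produces values like 192. `growRatioPow2` and the `valid` callback are stand-ins for `tryIncreaseM` and the heuristic's `ratiosValid` check.

```cpp
#include <cassert>
#include <cstdint>
#include <functional>

// Sketch: grow a tile ratio by doubling rather than incrementing, so the
// final ratio (and hence the tile size) is always a power of two.
// `valid` stands in for the real ratiosValid() check.
int64_t growRatioPow2(int64_t ratio, const std::function<bool(int64_t)>& valid) {
  while (valid(ratio * 2)) {
    ratio *= 2;  // double instead of ratio++ to stay a power of two
  }
  return ratio;
}
```

For example, with a 64-wide base tile and a validity limit of 6, incrementing would stop at a ratio of 6 (size 384, passing through 192), while doubling stops at 4 (size 256).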
The dtype for stmatrix should never have been constrained to only Half. The only constraint we have is that the dtype is 16 bits wide. This PR is needed for us to actually use stmatrix in bfloat16 matmuls.
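A hedged sketch of the check this change describes; the enum, `dataTypeSize`, and `stMatrixSupportsDtype` below are stand-ins for nvFuser's real `DataType` machinery, not its actual API. The point is that the guard should accept any 2-byte dtype (Half or BFloat16) rather than comparing against Half alone.

```cpp
#include <cassert>
#include <cstdint>

// Stand-in for nvFuser's DataType (illustrative only).
enum class DataType { Half, BFloat16, Float, Double };

// Stand-in element-size helper.
int64_t dataTypeSize(DataType dt) {
  switch (dt) {
    case DataType::Half:
    case DataType::BFloat16:
      return 2;
    case DataType::Float:
      return 4;
    case DataType::Double:
      return 8;
  }
  return -1;
}

// Accept any dtype whose elements are 16 bits wide, instead of dt == Half.
bool stMatrixSupportsDtype(DataType dt) {
  return dataTypeSize(dt) == 2;
}
```

Under this check, bfloat16 matmuls qualify for stmatrix exactly like half-precision ones, while 32- and 64-bit dtypes are still rejected.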