Test horizontal matmul fusion in Llama2FFN test #3610

Merged: 4 commits into main from horizontal_matmul_fusion on Dec 19, 2024

Conversation

@jacobhinkle (Collaborator) commented on Dec 18, 2024:

This removes some barriers to horizontal fusion and updates the test which is currently Ampere-only.

Note that most of the horizontal fusion code hasn't been exercised much, so we might continue hitting small snags as we start using it more. My intention with this PR is to exercise that code automatically by updating the test. Likewise, we will need changes to the canSchedule checks and default heuristics to ensure sane behavior when doing horizontal fusions, so there will likely be more PRs of this flavor soon.
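For readers unfamiliar with the term, "horizontal" fusion here means computing multiple matmuls that share an operand in a single kernel. In a Llama2-style FFN the gate and up projections both consume the same input activations, so a fused kernel can load that shared operand once. Below is a minimal plain-C++ sketch of the idea; it is illustrative only (names like w_gate and w_up are hypothetical), not nvFuser code.

```cpp
#include <vector>

// Naive "horizontally fused" pair of matmuls: gate = x @ w_gate and
// up = x @ w_up share the operand x, so each element of x is read once
// and feeds both accumulators.
void fusedFfnMatmuls(
    const std::vector<float>& x,      // [m, k], row-major
    const std::vector<float>& w_gate, // [k, n], row-major
    const std::vector<float>& w_up,   // [k, n], row-major
    std::vector<float>& gate,         // [m, n]
    std::vector<float>& up,           // [m, n]
    int m, int n, int k) {
  for (int i = 0; i < m; ++i) {
    for (int j = 0; j < n; ++j) {
      float acc_gate = 0.0f;
      float acc_up = 0.0f;
      for (int p = 0; p < k; ++p) {
        const float a = x[i * k + p]; // shared A operand, loaded once
        acc_gate += a * w_gate[p * n + j];
        acc_up += a * w_up[p * n + j];
      }
      gate[i * n + j] = acc_gate;
      up[i * n + j] = acc_up;
    }
  }
}
```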

@jacobhinkle (Collaborator, Author) commented:

!test

@@ -750,14 +760,21 @@ std::unique_ptr<MatmulParams> getMatmulHeuristics(
problem_shape[(size_t)MatmulDimRole::Batch],
inner_dims,
tensor_roles);
// TODO: more sophisticated handling of multiple matmuls when using plugin
mparams->tile_sizes.cta_tile.m /= patterns.size();
@jacobhinkle (Collaborator, Author) commented:

Ideally the heuristic would understand that we might have multiple matmuls being computed in a single main loop, and that the operands for that main loop can be loaded simultaneously. For example, if the A operand is used in two matmuls then we will have 3 operands loaded instead of 2, meaning we should make the CTA tile at most 2/3 as large as it would be for a single matmul. The change here is more conservative for now, to ensure we don't run out of smem.
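As an illustration of that operand-counting idea, here is a standalone sketch, assuming a hypothetical MatmulPattern record that only carries operand identities; this is plain C++, not nvFuser's actual heuristics code. With no sharing between two matmuls it reduces to the division by patterns.size() applied in this PR.

```cpp
#include <algorithm>
#include <cstdint>
#include <set>
#include <vector>

// Hypothetical stand-in for a matmul pattern; only operand identity matters.
struct MatmulPattern {
  const void* a; // identity of the A operand tensor
  const void* b; // identity of the B operand tensor
};

// Count distinct operand tensors across all fused matmuls. Two matmuls
// sharing A contribute 3 unique operands rather than 4.
int64_t countUniqueOperands(const std::vector<MatmulPattern>& patterns) {
  std::set<const void*> operands;
  for (const MatmulPattern& p : patterns) {
    operands.insert(p.a);
    operands.insert(p.b);
  }
  return static_cast<int64_t>(operands.size());
}

// A single matmul keeps 2 operand tiles in shared memory, so scale the tile
// by 2 / unique_operands to hold the total operand footprint roughly
// constant: 2/3 when A is shared between two matmuls, 1/2 when two matmuls
// share nothing (matching the patterns.size() division).
int64_t scaledCtaTileM(
    int64_t single_matmul_tile_m,
    const std::vector<MatmulPattern>& patterns) {
  const int64_t unique = countUniqueOperands(patterns);
  return single_matmul_tile_m * 2 / std::max<int64_t>(unique, 2);
}
```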

@jacobhinkle marked this pull request as ready for review on December 19, 2024 at 15:38.
@jacobhinkle (Collaborator, Author) commented:

!build

@rdspring1 (Collaborator) left a comment:

LGTM to my naive eyes.

Comment: We have many options for matmuls, like FuseMultipleMatmuls and FuseMatmul. Is that because nvfuser accepts matmuls but routes them to aten by default? I wonder if online tuning would be useful to pick between aten and nvfuser.

csrc/scheduler/matmul_utils.cpp (review thread, resolved)
@jacobhinkle (Collaborator, Author) replied:

> Comment: We have many options for matmuls, like FuseMultipleMatmuls and FuseMatmul. Is that because nvfuser accepts matmuls but routes them to aten by default? I wonder if online tuning would be useful to pick between aten and nvfuser.

No, I don't think it should be a scheduler decision. The intention is for us to trust our matmul fusion enough to enable both of these by default and convert them to DisableOptions. It is true that we might have more matmul options related to fusion, though, so maybe we could make multiple_matmuls an argument to the fuse_matmul option, so we could use it like NVFUSER_ENABLE=fuse_matmul(multiple_matmuls) instead.

@jacobhinkle (Collaborator, Author) commented:

!build

@jacobhinkle merged commit 962f002 into main on Dec 19, 2024. 17 checks passed.
@jacobhinkle deleted the horizontal_matmul_fusion branch on December 19, 2024 at 19:59.