Test horizontal matmul fusion in Llama2FFN test #3610
This removes some barriers to horizontal fusion and updates the test which is currently Ampere-only.
!test
csrc/scheduler/matmul_utils.cpp
Outdated
@@ -750,14 +760,21 @@ std::unique_ptr<MatmulParams> getMatmulHeuristics(
problem_shape[(size_t)MatmulDimRole::Batch],
inner_dims,
tensor_roles);
// TODO: more sophisticated handling of multiple matmuls when using plugin
mparams->tile_sizes.cta_tile.m /= patterns.size();
Ideally the heuristic would understand that we might have multiple matmuls being computed in a single main loop, and that the operands for that main loop can be loaded simultaneously. For example, if the A operand is used in two matmuls, then we will have 3 operands loaded instead of 4, meaning we should make the CTA tile at most 2/3 as large as it would be for a single matmul. Dividing by the number of matmul patterns is more conservative for now, to ensure we don't run out of smem.
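To make the arithmetic above concrete, here is a minimal sketch comparing the two scaling rules. Both helper names are hypothetical and are not part of nvFuser. The sketch assumes that every operand tile takes the same amount of shared memory and that the footprint scales linearly with the tile size, which is a simplification of what a real heuristic would need to model.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper (not an nvFuser API): the conservative rule from the
// diff, which divides the single-matmul tile size by the number of fused
// matmul patterns.
constexpr int64_t conservativeTileSize(
    int64_t single_matmul_tile,
    int64_t num_patterns) {
  assert(num_patterns > 0);
  return single_matmul_tile / num_patterns;
}

// Hypothetical helper (not an nvFuser API): an operand-aware bound. A single
// matmul loads 2 operands per main loop iteration, so with
// num_distinct_operands operands loaded together, the tile may be at most
// 2 / num_distinct_operands as large. Assumes equally sized operand tiles.
constexpr int64_t operandAwareTileSize(
    int64_t single_matmul_tile,
    int64_t num_distinct_operands) {
  assert(num_distinct_operands >= 2);
  return single_matmul_tile * 2 / num_distinct_operands;
}
```

For two matmuls that share the A operand, there are 3 distinct operands, so a single-matmul tile size of 192 would become 128 under the operand-aware bound and 96 under the conservative rule.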
!build
LGTM to my naive eyes.
Comment: We have many options for matmuls like FuseMultipleMatmuls and FuseMatmul. Is it because nvfuser accepts matmuls, but they are routed to aten by default? I wonder if online tuning would be useful to pick between aten and nvfuser.
Co-authored-by: Ryan Spring <[email protected]>
No it should not be a scheduler decision I think. The intention is for us to trust our matmul fusion enough to enable both of these by default and convert them to |
!build
This removes some barriers to horizontal fusion and updates the test which is currently Ampere-only.
Note that most of the horizontal fusion code hasn't been exercised much, so we might keep hitting small snags as we start using it more. My intention with this PR is to test it automatically by modifying the test. Likewise, we will need changes to the canSchedule checks and default heuristics to ensure sane behavior when doing horizontal fusions, so there will likely be more PRs of this flavor soon.