Allow overwrite flashinfer use_tensorcore #2169

Merged: merrymercy merged 2 commits into main on Nov 25, 2024

Conversation

merrymercy (Contributor)

No description provided.
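
The PR carries no description, but the title points to adding a manual override for the FlashInfer decode use_tensor_cores setting. Below is a minimal sketch of how such an override could be wired in; the environment variable name SGLANG_FLASHINFER_USE_TENSOR_CORE and the helper function are illustrative assumptions, not necessarily what the PR actually implements.

import os

def resolve_decode_use_tensor_cores(heuristic_default: bool) -> bool:
    # Hypothetical override hook: an explicitly set environment variable
    # (name assumed for illustration) takes precedence over the heuristic
    # default derived from the model config.
    override = os.environ.get("SGLANG_FLASHINFER_USE_TENSOR_CORE")
    if override is not None:
        return override.lower() in ("1", "true")
    return heuristic_default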

else:
    self.decode_use_tensor_cores = False
if not _grouped_size_compiled_for_decode_kernels(

Member:

May we remove this _grouped_size_compiled_for_decode_kernels check? I think it is no longer needed in FlashInfer v0.2. cc @yzh119

Collaborator:

Yes, we can use a heuristic instead:

  1. For fp16, set use_tensor_cores=True when gqa_group_size > 4.
  2. For fp8, always enable use_tensor_cores=True.
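
A minimal sketch of the heuristic above, assuming the torch float8 dtypes for the fp8 case; the function name and signature are illustrative, not the actual SGLang/FlashInfer API.

import torch

def should_use_tensor_cores(kv_cache_dtype: torch.dtype,
                            num_attention_heads: int,
                            num_kv_heads: int) -> bool:
    # fp8 KV cache: tensor-core decode kernels always win.
    if kv_cache_dtype in (torch.float8_e4m3fn, torch.float8_e5m2):
        return True
    # fp16/bf16: only worthwhile once the GQA group size
    # (query heads per KV head) exceeds 4.
    gqa_group_size = num_attention_heads // num_kv_heads
    return gqa_group_size > 4

For example, Llama-3-8B has 32 query heads and 8 KV heads (gqa_group_size = 4), so fp16 decode would keep use_tensor_cores=False, while an fp8 KV cache would enable it.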

Member:

I agree.

merrymercy merged commit 8e1adb8 into main on Nov 25, 2024
1 of 13 checks passed
merrymercy deleted the pr-fix-flashinfer branch on November 25, 2024 at 04:58
3 participants