[TEST] flashinfer version upgrade to v0.2.0 #2054
base: main
Conversation
```python
class WrapperDispatch(Enum):
    SLIDING_WINDOW = auto()
    CROSS_ATTENTION = auto()
```
```python
def _grouped_size_compiled_for_decode_kernels(
```
With FlashInfer v0.2, Tensor Core will be enabled by default for decoding, so this is no longer necessary.
I was looking at this issue: flashinfer-ai/flashinfer#549. I can try running without this function to ensure that it works as intended.
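For context, here is a hedged sketch of the gate in question and the v0.2-style decode wrapper that makes it unnecessary; the heuristic body, workspace size, and layout below are illustrative assumptions, not this PR's exact code:

```python
import torch
import flashinfer

# Assumed pre-v0.2 heuristic: only a few GQA group sizes had compiled
# decode kernels, so other shapes fell back to the prefill kernel path.
def _grouped_size_compiled_for_decode_kernels(num_qo_heads: int, num_kv_heads: int) -> bool:
    return (num_qo_heads // num_kv_heads) in (1, 2, 4, 8)

# With FlashInfer v0.2, the decode wrapper can run on Tensor Cores itself,
# so the group-size gate above can be dropped.
workspace = torch.empty(128 * 1024 * 1024, dtype=torch.uint8, device="cuda")
decode_wrapper = flashinfer.BatchDecodeWithPagedKVCacheWrapper(
    workspace, "NHD", use_tensor_cores=True
)
```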
```diff
@@ -7,6 +7,8 @@
 Each backend supports two operators: extend (i.e. prefill with cached prefix) and decode.
 """
+
+#
```
Please delete this.
```diff
@@ -612,6 +619,8 @@ def call_begin_forward(
         self.num_qo_heads,
         self.num_kv_heads,
         self.head_dim,
+        q_data_type=self.q_data_type,
```
QQ Why do you specify the data type here?
I am trying to sync with @yzh119 on this; I was seeing a dtype mismatch in the BatchPrefillWithRaggedKVCacheWrapper plan function otherwise.
I've fixed it on the hopper branch. cc @yzh119, I'll land it soon.
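For reference, a minimal sketch of pinning the query dtype in the plan call; the shapes, indptr values, and dtype below are illustrative assumptions:

```python
import torch
import flashinfer

workspace = torch.empty(128 * 1024 * 1024, dtype=torch.uint8, device="cuda")
wrapper = flashinfer.BatchPrefillWithRaggedKVCacheWrapper(workspace, "NHD")

# Two requests with 4 and 5 tokens each (illustrative values).
qo_indptr = torch.tensor([0, 4, 9], dtype=torch.int32, device="cuda")
kv_indptr = torch.tensor([0, 4, 9], dtype=torch.int32, device="cuda")

# Passing q_data_type keeps the planned kernels consistent with the
# runtime query tensors; otherwise plan() may assume a default dtype
# and the mismatch only surfaces at run time.
wrapper.plan(
    qo_indptr,
    kv_indptr,
    num_qo_heads=16,
    num_kv_heads=16,
    head_dim=128,
    q_data_type=torch.float16,
)
```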
scripts/ci_install_dependency.sh (outdated)
```diff
@@ -5,4 +5,8 @@ Install the dependency in CI.
 pip install --upgrade pip
 pip install -e "python[all]"
 pip install transformers==4.45.2 sentence_transformers accelerate peft
-pip install flashinfer -i https://flashinfer.ai/whl/cu121/torch2.4/ --force-reinstall
+git clone https://github.com/flashinfer-ai/flashinfer.git --recursive
+git reset --hard 32d9510d67187f1f3a379cce81302cdd15a557d2 # Revert to before PR https://github.com/flashinfer-ai/flashinfer/pull/609 merged
```
We can use this for testing, not for release.
Yes, this makes sense. This PR mainly tests FlashInfer built from source for #2016, which is currently failing all tests; we wanted to find out where the issue stems from.
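For completeness, a from-source install along these lines would also need to enter the clone before resetting, then build the package; the cd, submodule, and build steps below are assumptions about the elided rest of the script:

```bash
git clone https://github.com/flashinfer-ai/flashinfer.git --recursive
cd flashinfer
# Pin to the commit before flashinfer-ai/flashinfer#609 merged.
git reset --hard 32d9510d67187f1f3a379cce81302cdd15a557d2
git submodule update --init --recursive
# Assumed build step: at this point the Python project lived under python/.
cd python
pip install -e .
```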
Force-pushed the branch: 180a79b → e58161e, a0938ab → fbb8844, fbb8844 → 3815058.
@james-p-xu If you want to use the latest version of FlashInfer, you can refer to https://github.com/flashinfer-ai/flashinfer-nightly/releases
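For example, one could install a nightly build by grabbing a wheel asset from that releases page; the tag and wheel name below are placeholders, not real artifact names:

```bash
# Placeholders: substitute a real tag and a wheel matching your
# Python/CUDA/torch versions from the releases page.
pip install https://github.com/flashinfer-ai/flashinfer-nightly/releases/download/<TAG>/<WHEEL>.whl
```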
Force-pushed the branch: 7b04976 → b93675b.
…) with QKV dtypes
Force-pushed the branch: da5ccdb → 6655727.
Motivation
Modifications
Checklist