Fixing the issue of CCL support during the decoding phase of Disaggregated Serving #776
Contributor: @ochougul Can you please review this?
quic-rishinr approved these changes on Feb 13, 2026
…gated Serving and also adding the CCL support during Prefilling process Signed-off-by: Vahid Janfaza <vjanfaza@qti.qualcomm.com>
…gated Serving Signed-off-by: Vahid Janfaza <vjanfaza@qti.qualcomm.com>
In this PR, we address a compilation error that occurs when CCL is enabled while generating the decode QPC for the gpt-oss model in Disaggregated Serving, for example with the following command:
python3 -m qaic_disagg \
  --prefill-port 9802 \
  --decode-port 9902 \
  --port 8002 \
  --decode-device-group 16,17,18,19 \
  --prefill-device-group 20,21,22,23 \
  --model openai/gpt-oss-20b \
  --prefill-max-num-seqs 1 \
  --decode-max-num-seqs 1 \
  --prefill-max-seq-len-to-capture 128 \
  --max-model-len 4096 \
  --prefill-override-qaic-config "split_retained_state_io:True mxfp6_matmul:True enable_chunking:True" \
  --decode-override-qaic-config "mxfp6_matmul:True retain_full_kv:True ccl_enabled=True comp_ctx_lengths_decode=1024,2048,4096" \
  -vvv \
  --dtype bfloat16 \
  --kv-cache-dtype mxint8 \
  --kv-handOff-port 5068 \
  --tool-call-parser openai \
  --enable-auto-tool-choice \
  --enable-log-outputs
This command activates CCL during decoding, but compilation then fails with the error "Error message: No input that uniquely identifies specialization". The error comes from recent changes to the modeling_gpt_oss.py script that were made to support disaggregated serving for gpt-oss; those changes are incompatible with the CCL feature.
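To make the failure mode more concrete, here is a minimal Python sketch of how this kind of error can arise. It assumes, hypothetically, that each value in `comp_ctx_lengths_decode` becomes its own decode specialization and that the compiler picks a specialization by matching the concrete shapes of the graph inputs. The input names, dimension names, sizes, and the `comp_ctx_lengths` input are illustrative only; they are not taken from `modeling_gpt_oss.py` or from the QAIC compiler.

```python
from itertools import combinations

# Hypothetical graph inputs: each maps to a tuple of dimensions, where a str is a
# symbolic dimension resolved per specialization and an int is a fixed size.
WITH_CCL_INPUTS = {
    "input_ids": ("batch_size", "seq_len"),
    "position_ids": ("batch_size", "seq_len"),
    "comp_ctx_lengths": ("comp_ctx_lengths",),  # carries the CCL value in its shape
    "past_key.0": ("batch_size", 8, "ctx_len", 64),
}
# Same graph after a (hypothetical) change drops the CCL-dependent input.
WITHOUT_CCL_INPUTS = {k: v for k, v in WITH_CCL_INPUTS.items() if k != "comp_ctx_lengths"}


def build_decode_specializations(ctx_len, comp_ctx_lengths, batch_size=1):
    """One decode specialization per compute-context length (illustrative layout)."""
    return [
        {"batch_size": batch_size, "seq_len": 1, "ctx_len": ctx_len, "comp_ctx_lengths": ccl}
        for ccl in comp_ctx_lengths
    ]


def input_signature(spec, input_dims):
    """Concrete shape of every graph input under one specialization."""
    return tuple(
        (name, tuple(spec[d] if isinstance(d, str) else d for d in dims))
        for name, dims in sorted(input_dims.items())
    )


def check_unique(specs, input_dims):
    """Raise if two specializations produce identical input shapes (mimics the reported error)."""
    sigs = [input_signature(s, input_dims) for s in specs]
    for i, j in combinations(range(len(sigs)), 2):
        if sigs[i] == sigs[j]:
            raise ValueError(
                f"No input that uniquely identifies specialization: {specs[i]} vs {specs[j]}"
            )
    return True


if __name__ == "__main__":
    specs = build_decode_specializations(ctx_len=4096, comp_ctx_lengths=[1024, 2048, 4096])
    print(check_unique(specs, WITH_CCL_INPUTS))  # True: the CCL input shape differs per spec
    try:
        check_unique(specs, WITHOUT_CCL_INPUTS)
    except ValueError as exc:
        print(exc)  # every decode spec now has the same input shapes
```

In this sketch, the three decode specializations differ only in their compute-context length. If no graph input has a shape that depends on that value, the specializations cannot be told apart from their inputs, which matches the symptom described above.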