
Onboarding Qwen3VL Dense #780

Draft
qcdipankar wants to merge 11 commits into quic:main from qcdipankar:qwen3_vl

Conversation

@qcdipankar (Contributor)

Adding Qwen3VL Support to QEff
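
A hedged usage sketch of what the new support might look like from the user side (the checkpoint name and arguments below are illustrative assumptions, not taken from this PR):

```python
# Hypothetical example: loading a Qwen3VL checkpoint through QEff's
# image-text-to-text auto class. The model ID and options are assumptions.
from QEfficient import QEFFAutoModelForImageTextToText

model = QEFFAutoModelForImageTextToText.from_pretrained(
    "Qwen/Qwen3-VL-8B-Instruct",  # hypothetical model ID
    kv_offload=True,              # KV offload is a common QEff option for VLMs
)
model.compile(num_cores=16)  # compile for the target device
```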

 requires-python = ">=3.8,<3.11"
 dependencies = [
-    "transformers==4.55.0",
+    "transformers==4.57.0",
Contributor

@quic-rishinr / @quic-hemagnih : can we trigger TA?

Contributor

Yes, we should raise it and start the run of all the models with 4.57 in parallel; it typically takes one week.

attn_weights = torch.where(
    attention_mask, torch.tensor(MIN_MASKED_ATTENTION_VALUE, dtype=torch.float32), attn_weights
)

attn_weights = nn.functional.softmax(attn_weights, dim=-1, dtype=torch.float32).to(query.dtype)
Contributor

Can you set this to the dtype passed from from_pretrained()?
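
A minimal sketch of one way to honor that request, assuming the dtype requested via from_pretrained(..., torch_dtype=...) is available on the module's config (the helper name and the config attribute are assumptions, not this PR's code):

```python
import torch
import torch.nn as nn

def masked_softmax(attn_weights, attention_mask, query, config, min_masked_value):
    # Assumption: config.torch_dtype carries the dtype requested at load time;
    # fall back to float32 when none was given.
    softmax_dtype = getattr(config, "torch_dtype", None) or torch.float32
    # Mask out positions exactly as in the quoted hunk; min_masked_value stands
    # in for MIN_MASKED_ATTENTION_VALUE.
    attn_weights = torch.where(
        attention_mask, torch.tensor(min_masked_value, dtype=torch.float32), attn_weights
    )
    return nn.functional.softmax(attn_weights, dim=-1, dtype=softmax_dtype).to(query.dtype)
```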

@quic-hemagnih (Contributor) left a comment

I am still reviewing the modelling file.


messages = [messages] * batch_size

inputs = processor.apply_chat_template(
Contributor

I think we can combine the code from lines 62 to 77 and lines 122 to 140 in one place. The idea is to avoid code repetition; one possible shape is sketched below.
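
A hedged sketch of the suggested factoring, assuming both call sites replicate one conversation across the batch and tokenize it the same way (the helper name and keyword arguments are illustrative, not taken from the PR):

```python
def build_batched_inputs(processor, messages, batch_size):
    """Replicate a single conversation across the batch and tokenize it."""
    batched_messages = [messages] * batch_size
    return processor.apply_chat_template(
        batched_messages,
        add_generation_prompt=True,
        tokenize=True,
        return_dict=True,
        return_tensors="pt",
    )
```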

Contributor Author

This we can discuss.


qcdipankar and others added 7 commits on February 16, 2026 at 13:12
Signed-off-by: Dipankar Sarkar <dipankar@qti.qualcomm.com>
Signed-off-by: Dipankar Sarkar <dipankar@qti.qualcomm.com>
Signed-off-by: Dipankar Sarkar <dipankar@qti.qualcomm.com>
Signed-off-by: Dipankar Sarkar <dipankar@qti.qualcomm.com>
Signed-off-by: Dipankar Sarkar <dipankar@qti.qualcomm.com>
Signed-off-by: Dhiraj Kumar Sah <dhirajku@qti.qualcomm.com>
Signed-off-by: Dipankar Sarkar <dipankar@qti.qualcomm.com>
Signed-off-by: Dipankar Sarkar <dipankar@qti.qualcomm.com>
Signed-off-by: Dipankar Sarkar <dipankar@qti.qualcomm.com>
Signed-off-by: Dipankar Sarkar <dipankar@qti.qualcomm.com>
Signed-off-by: Dipankar Sarkar <dipankar@qti.qualcomm.com>
@qcdipankar marked this pull request as draft on February 19, 2026 at 09:16
QEffMptForCausalLM,
QEffPhi3ForCausalLM,
QEffQwen2ForCausalLM,
QEffQwen_2_5_vl_DecoderWrapper,
Contributor

Could you add QEffQwen3VLDecoderWrapper here under SamplerTransform? The on-device sampling is generic, so it can support new VLMs. Thank you.

If not, we can also raise it in a new patch. @quic-sanising
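
A hedged sketch of the requested registration, assuming SamplerTransform tracks its supported decoder classes in a class-level set as the quoted import list suggests (the attribute name and shape are assumptions; the class names come from the quoted hunk):

```python
class SamplerTransform:
    # Assumption: supported decoders are tracked in a class-level set.
    _module_mapping = {
        QEffMptForCausalLM,
        QEffPhi3ForCausalLM,
        QEffQwen2ForCausalLM,
        QEffQwen_2_5_vl_DecoderWrapper,
        QEffQwen3VLDecoderWrapper,  # new: enables on-device sampling for Qwen3VL
    }
```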

