Support PTQ for Qwen3 Omni #647

@tommyip

Detailed description of the requested feature

Qwen3 Omni is a relatively new omni model that takes text, audio, and image inputs and produces text and audio outputs. It would be nice to be able to quantize it with modelopt to NVFP4 or AWQ.

The model consists of a few submodules:

  • Vision encoder (Same as Qwen3 VL Vision Encoder)
  • Audio encoder (Stack of conv2d + stack of encoder-decoders)
  • Thinker (Qwen3 30B A3B)
  • Audio output: Talker (Qwen3 MoE scaled down with some minor variations)
  • Audio output: Multi-token prediction (Qwen3 MoE scaled down with some minor variations)
  • Audio output: Code2Wav (ConvNet)
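
To illustrate how the submodules above might be split up for PTQ, here is a minimal sketch in plain Python (it does not call modelopt). The module-name patterns and the choice of which parts to quantize first are assumptions for illustration only; they are not taken from the actual model code or from modelopt.

```python
from fnmatch import fnmatch

# Hypothetical module-name patterns, one per submodule listed above.
# The real names in the checkpoint may differ.
SUBMODULES = {
    "vision_encoder": "thinker.visual.*",
    "audio_encoder": "thinker.audio_tower.*",
    "thinker": "thinker.model.*",
    "talker": "talker.model.*",
    "mtp": "talker.code_predictor.*",
    "code2wav": "code2wav.*",
}

# Assumption: start with the Qwen3 MoE style parts and leave the
# encoders and the ConvNet vocoder in their original precision.
QUANTIZE = {"thinker", "talker", "mtp"}


def should_quantize(module_name: str) -> bool:
    """Return True if the module belongs to a submodule selected for PTQ."""
    for key, pattern in SUBMODULES.items():
        if fnmatch(module_name, pattern):
            return key in QUANTIZE
    # Unknown modules are left untouched.
    return False
```

A filter like this could then be used to decide which layers get quantizers enabled, whichever quantization format (NVFP4 or AWQ) ends up being applied.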

Describe alternatives you've considered

Qwen3 Omni is supposed to be supported by GPTQModel, but it has a VRAM leak issue.

Target hardware/use case

Blackwell
