Description
Detailed description of the requested feature
Qwen3 Omni is a relatively new omni model that accepts text, audio, and image inputs and produces text and audio outputs. It would be nice to be able to quantize it with modelopt to NVFP4 or AWQ.
The model consists of a few submodules:
- Vision encoder (Same as Qwen3 VL Vision Encoder)
- Audio encoder (Stack of conv2d + stack of encoder-decoders)
- Thinker (Qwen3 30B A3B)
- Audio output: Talker (Qwen3 MoE scaled down with some minor variations)
- Audio output: Multi-token prediction (Qwen3 MoE scaled down with some minor variations)
- Audio output: Code2Wav (ConvNet)
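To make the scope concrete, here is a rough sketch of how a per-submodule quantization plan could be written down as wildcard patterns over module names. Everything in it is an assumption for illustration: the module-name prefixes (`thinker.visual`, `talker.code_predictor`, etc.) are hypothetical and may not match the actual checkpoint, and the choice to quantize only the MoE/transformer parts while leaving the encoders and the ConvNet in higher precision is just one plausible starting point, not something modelopt prescribes.

```python
from fnmatch import fnmatch

# Hypothetical module-name prefixes for the submodules listed above.
# The real attribute names in the checkpoint may differ.
SUBMODULES = {
    "thinker.visual": "Vision encoder",
    "thinker.audio_tower": "Audio encoder",
    "thinker.model": "Thinker (Qwen3 30B A3B)",
    "talker.model": "Talker",
    "talker.code_predictor": "Multi-token prediction",
    "code2wav": "Code2Wav",
}

# Assumption: only the MoE/transformer parts are quantized at first;
# the encoders and the ConvNet vocoder stay in higher precision.
QUANTIZE_PATTERNS = [
    "thinker.model*",
    "talker.model*",
    "talker.code_predictor*",
]


def should_quantize(name, patterns=QUANTIZE_PATTERNS):
    """Return True if the module name matches any quantize pattern."""
    return any(fnmatch(name, pattern) for pattern in patterns)


def build_plan(module_names):
    """Map each module name to a quantize / keep decision."""
    return {name: should_quantize(name) for name in module_names}


if __name__ == "__main__":
    for name, quantize in build_plan(SUBMODULES).items():
        action = "quantize" if quantize else "keep as-is"
        print(f"{name:24s} {SUBMODULES[name]:28s} -> {action}")
```

Because the patterns match by prefix, nested layers such as `thinker.model.layers.0.mlp` inherit the decision of their parent submodule, which would make it easy to widen or narrow the plan while debugging accuracy.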
Describe alternatives you've considered
Qwen3 Omni is supposed to be supported by GPTQModel, but it has a VRAM leak issue.
Target hardware/use case
Blackwell