Description
🐛 Describe the bug
Hello.
I decided to test the example from OpenVINO Notebooks (https://github.com/openvinotoolkit/openvino_notebooks/blob/latest/notebooks/qwen3-tts/qwen3-tts.ipynb), and errors appeared during the conversion.
Before running it, I had to change the original code from the notebook to the version shown in the Minimal Reproducible Example below.
Are these errors critical for the operation of the model converted to INT8?
Unfortunately, the converted model generates only about 8 seconds of audio and cuts off part of the input text.
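For context, the TracerWarnings in the log below are the usual torch.jit.trace behavior: a Python-level bool/int/float derived from a tensor is frozen as a constant in the trace, so only the branch taken for the example input is recorded. A minimal standalone illustration of the effect (my own sketch, not taken from the Qwen3-TTS sources):

import torch

def relu_like(x):
    # `x.sum() > 0` is converted to a Python bool here, so torch.jit.trace
    # records only the branch taken for the example input and warns about it.
    if x.sum() > 0:
        return x
    return torch.zeros_like(x)

traced = torch.jit.trace(relu_like, torch.ones(3))   # emits a TracerWarning
print(traced(torch.ones(3)))    # follows the traced ("positive") branch
print(traced(-torch.ones(3)))   # still follows the traced branch: prints -1s, not zeros

Whether such warnings matter depends on whether the frozen condition can actually change between calls with the shapes the converted model sees.

Conversion log: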
⌛ Qwen/Qwen3-TTS-12Hz-1.7B-Base conversion started. Be patient, it may take some time.
⌛ Load Original model
Fetching 4 files: 100% 4/4 [00:00<00:00, 261.21it/s]
✅ Cleaned up config.json (removed model_type from speaker_encoder_config)
✅ Original model successfully loaded
⌛ Convert talker embedding model
✅ Talker embedding model successfully converted
⌛ Convert talker text embedding model
✅ Talker text embedding model successfully converted
⌛ Convert talker text_projection model
✅ Talker text_projection model successfully converted
⌛ Convert Talker Language model
✅ Talker language model successfully converted
⌛ Weights compression with int8_sym mode started
INFO:nncf:Statistics of the bitwidth distribution:
+---------------------------+-----------------------------+----------------------------------------+
| Weight compression mode | % all parameters (layers) | % ratio-defining parameters (layers) |
+===========================+=============================+========================================+
| int8_sym, per-channel | 100% (197 / 197) | 100% (197 / 197) |
+---------------------------+-----------------------------+----------------------------------------+
Applying Weight Compression ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 100% • 0:00:15 • 0:00:00
✅ Weights compression finished
✅ Talker model conversion finished. You can find results in Qwen3-TTS-Base-1.7B-OV
⌛ Convert talker code predictor embedding model
✅ Talker Code Predictor Embedding model successfully converted
⌛ Convert Talker Code Predictor model
✅ Talker Code Predictor model successfully converted
⌛ Weights compression with int8_sym mode started
INFO:nncf:Statistics of the bitwidth distribution:
+---------------------------+-----------------------------+----------------------------------------+
| Weight compression mode | % all parameters (layers) | % ratio-defining parameters (layers) |
+===========================+=============================+========================================+
| int8_sym, per-channel | 100% (51 / 51) | 100% (51 / 51) |
+---------------------------+-----------------------------+----------------------------------------+
Applying Weight Compression ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 100% • 0:00:03 • 0:00:00
✅ Weights compression finished
✅ Talker Code Predictor model conversion finished. You can find results in Qwen3-TTS-Base-1.7B-OV
⌛ Convert Speaker Encoder model
c:\llm\openvino_notebooks\notebooks\qwen3-tts\Qwen3-TTS\qwen_tts\core\models\modeling_qwen3_tts.py:203: TracerWarning: Using len to get tensor shape might cause the trace to be incorrect. Recommended usage would be tensor.shape[0]. Passing a tensor of different shape might lead to errors or silently give incorrect results.
len(length), max_len
c:\llm\openvino_notebooks\notebooks\qwen3-tts\Qwen3-TTS\qwen_tts\core\models\modeling_qwen3_tts.py:206: TracerWarning: torch.as_tensor results are registered as constants in the trace. You can safely ignore this warning if you use this function to create tensors out of constant variables that would be the same every time you call this function. In any other case, this might cause the trace to be incorrect.
mask = torch.as_tensor(mask, dtype=dtype, device=device)
✅ Speaker Encoder model successfully converted
✓ Found speech tokenizer at C:\Users\uuk\.cache\huggingface\hub\models--Qwen--Qwen3-TTS-12Hz-1.7B-Base\snapshots\fd4b254389122332181a7c3db7f27e918eec64e3\speech_tokenizer
⌛ Speech tokenizer conversion started. Be patient, it may take some time.
⌛ Load Speech tokenizer model
✅ Speech tokenizer model successfully loaded
⌛ Convert speech tokenizer encoder
C:\Users\uuk\miniconda3\envs\openvino_notebooks\Lib\site-packages\transformers\models\mimi\modeling_mimi.py:1552: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
if channels < 1 or channels > 2:
C:\Users\uuk\miniconda3\envs\openvino_notebooks\Lib\site-packages\transformers\masking_utils.py:738: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
if batch_size != position_ids.shape[0]:
✅ Speech tokenizer encoder successfully converted
⌛ Convert speech tokenizer decoder
c:\llm\openvino_notebooks\notebooks\qwen3-tts\Qwen3-TTS\qwen_tts\core\tokenizer_12hz\modeling_qwen3_tts_tokenizer_v2.py:888: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
while start_index < codes.shape[-1]:
c:\llm\openvino_notebooks\notebooks\qwen3-tts\Qwen3-TTS\qwen_tts\core\tokenizer_12hz\modeling_qwen3_tts_tokenizer_v2.py:889: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
end_index = min(start_index + chunk_size, codes.shape[-1])
c:\llm\openvino_notebooks\notebooks\qwen3-tts\Qwen3-TTS\qwen_tts\core\tokenizer_12hz\modeling_qwen3_tts_tokenizer_v2.py:869: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
if codes.shape[1] != self.config.num_quantizers:
c:\llm\openvino_notebooks\notebooks\qwen3-tts\Qwen3-TTS\qwen_tts\core\tokenizer_12hz\modeling_qwen3_tts_tokenizer_v2.py:722: TracerWarning: Iterating over a tensor might cause the trace to be incorrect. Passing a tensor of different shape won't change the number of iterations executed (and might lead to errors or silently give incorrect results).
for idx, layer_codes in enumerate(codes):
c:\llm\openvino_notebooks\notebooks\qwen3-tts\Qwen3-TTS\qwen_tts\core\tokenizer_12hz\modeling_qwen3_tts_tokenizer_v2.py:818: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
if codes.shape[1] > self.n_q_semantic:
c:\llm\openvino_notebooks\notebooks\qwen3-tts\Qwen3-TTS\qwen_tts\core\tokenizer_12hz\modeling_qwen3_tts_tokenizer_v2.py:186: TracerWarning: Converting a tensor to a Python float might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
ideal_length = (math.ceil(n_frames) - 1) * self.stride + (self.kernel_size - self.padding)
✅ Speech tokenizer decoder successfully converted
✅ Speech tokenizer conversion finished. You can find results in Qwen3-TTS-Base-1.7B-OV\speech_tokenizer
Environment
uv pip list
Using Python 3.12.12 environment at: C:\Users\uuk\miniconda3\envs\openvino_notebooks
Package Version Editable project location
about-time 4.2.1
accelerate 1.12.0
aiofiles 24.1.0
alive-progress 3.3.0
anyio 4.12.1
argon2-cffi 25.1.0
argon2-cffi-bindings 25.1.0
arrow 1.4.0
asttokens 3.0.1
async-lru 2.1.0
attrs 25.4.0
audioread 3.1.0
autograd 1.8.0
babel 2.18.0
beautifulsoup4 4.14.3
bleach 6.3.0
brotli 1.2.0
certifi 2026.1.4
cffi 2.0.0
charset-normalizer 3.4.4
cma 4.4.2
colorama 0.4.6
comm 0.2.3
contourpy 1.3.3
cycler 0.12.1
debugpy 1.8.20
decorator 5.2.1
defusedxml 0.7.1
deprecated 1.3.1
einops 0.8.2
executing 2.2.1
fastjsonschema 2.21.2
ffmpy 1.0.0
filelock 3.24.2
flatbuffers 25.12.19
fonttools 4.61.1
fqdn 1.5.1
fsspec 2026.2.0
gradio 6.5.1
gradio-client 2.0.3
graphemeu 0.7.2
groovy 0.1.2
h11 0.16.0
hf-xet 1.2.0
httpcore 1.0.9
httpx 0.28.1
huggingface-hub 0.36.2
idna 3.11
ipykernel 6.31.0
ipython 9.10.0
ipython-pygments-lexers 1.1.1
ipywidgets 8.1.8
isoduration 20.11.0
jedi 0.19.2
jinja2 3.1.6
joblib 1.5.3
json5 0.13.0
jsonpointer 3.0.0
jsonschema 4.26.0
jsonschema-specifications 2025.9.1
jupyter-client 8.8.0
jupyter-core 5.9.1
jupyter-events 0.12.0
jupyter-lsp 2.3.0
jupyter-server 2.17.0
jupyter-server-terminals 0.5.4
jupyterlab 4.5.4
jupyterlab-pygments 0.3.0
jupyterlab-server 2.28.0
jupyterlab-widgets 3.0.16
kiwisolver 1.4.9
lark 1.3.1
lazy-loader 0.4
librosa 0.11.0
llvmlite 0.46.0
markdown-it-py 4.0.0
markupsafe 3.0.3
matplotlib 3.10.8
matplotlib-inline 0.2.1
mdurl 0.1.2
mistune 3.2.0
moocore 0.2.0
mpmath 1.3.0
natsort 8.4.0
nbclient 0.10.4
nbconvert 7.17.0
nbformat 5.10.4
nest-asyncio 1.6.0
networkx 3.4.2
ninja 1.13.0
nncf 2.19.0
notebook-shim 0.2.4
numba 0.63.1
numpy 2.2.6
onnxruntime 1.24.1
openvino 2025.4.1
openvino-telemetry 2025.2.0
orjson 3.11.7
packaging 26.0
pandas 2.3.3
pandocfilters 1.5.1
parso 0.8.6
pillow 12.1.1
pip 26.0.1
platformdirs 4.9.2
pooch 1.9.0
prometheus-client 0.24.1
prompt-toolkit 3.0.52
psutil 7.2.2
pure-eval 0.2.3
pycparser 3.0
pydot 3.0.4
pydub 0.25.1
pygments 2.19.2
pymoo 0.6.1.6
pyparsing 3.3.2
python-dateutil 2.9.0.post0
python-json-logger 4.0.0
python-multipart 0.0.22
pytz 2025.2
pywinpty 3.0.3
pyyaml 6.0.3
pyzmq 27.1.0
qwen-tts 0.0.4 C:\llm\openvino_notebooks\notebooks\qwen3-tts\Qwen3-TTS
referencing 0.37.0
regex 2026.1.15
requests 2.32.5
rfc3339-validator 0.1.4
rfc3986-validator 0.1.1
rfc3987-syntax 1.1.0
rich 14.3.2
rpds-py 0.30.0
safehttpx 0.1.7
safetensors 0.7.0
scikit-learn 1.8.0
scipy 1.17.0
semantic-version 2.10.0
send2trash 2.1.0
setuptools 82.0.0
shellingham 1.5.4
six 1.17.0
soundfile 0.13.1
soupsieve 2.8.3
sox 1.5.0
soxr 1.0.0
stack-data 0.6.3
sympy 1.14.0
tabulate 0.9.0
terminado 0.18.1
threadpoolctl 3.6.0
tinycss2 1.4.0
tokenizers 0.22.2
tomlkit 0.13.3
torch 2.8.0+cpu
torchaudio 2.8.0+cpu
tornado 6.5.4
tqdm 4.67.3
traitlets 5.14.3
transformers 4.57.3
typer 0.23.1
typer-slim 0.23.1
typing-extensions 4.15.0
tzdata 2025.3
uri-template 1.3.0
urllib3 2.6.3
wcwidth 0.6.0
webcolors 25.10.0
webencodings 0.5.1
websocket-client 1.9.0
wheel 0.46.3
widgetsnbextension 4.0.15
wrapt 2.1.1
Minimal Reproducible Example
The code used to convert the model to Qwen3-TTS-Base-1.7B-OV:
import nncf
from pathlib import Path

model_name = model_selector.value
model_id = model_options[model_name]
ov_model_dir = Path(f"{model_name}-OV")

convert_qwen3_tts_model(
    model_id=model_id,
    output_dir=ov_model_dir,
    quantization_config={
        # Use the Enum instead of a string
        "mode": nncf.CompressWeightsMode.INT8_SYM,
    },
)
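
For reference, the same INT8 symmetric compression can also be applied directly with NNCF to an already-converted ov.Model; a minimal sketch, independent of the notebook's convert_qwen3_tts_model helper (whose internals I have not checked), with an illustrative file name:

import openvino as ov
import nncf

core = ov.Core()
# The .xml file name here is hypothetical; point it at the actual converted talker model.
model = core.read_model("Qwen3-TTS-Base-1.7B-OV/openvino_language_model.xml")

# Symmetric per-channel INT8 weight compression, matching the mode reported in the log above.
compressed = nncf.compress_weights(model, mode=nncf.CompressWeightsMode.INT8_SYM)
ov.save_model(compressed, "Qwen3-TTS-Base-1.7B-OV/openvino_language_model_int8.xml")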
Are you going to submit a PR?
- Yes, I'd like to help by submitting a PR!