
Qwen3-TTS on Mac M4

Local text-to-speech using Qwen3-TTS with Apple Silicon GPU acceleration.

Requirements

  • macOS with Apple Silicon (M1/M2/M3/M4)
  • Python 3.12 (via pyenv)
  • ~4GB disk space for models
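If you are unsure whether your PyTorch build can use the Apple Silicon GPU, a quick sanity check (standard PyTorch API, not specific to this repo):

import torch

print(torch.backends.mps.is_available())  # True if the MPS device can be used right now
print(torch.backends.mps.is_built())      # True if this PyTorch build includes MPS support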

Quick Start

Activate the environment

cd /Users/keith/Projects/Qwen-TTS
source qwen3-tts-env/bin/activate

Voice TTS Web UI (Recommended)

Train your voice once, then generate unlimited speech.

Start the Web UI

source qwen3-tts-env/bin/activate
python voice_tts_app.py

Open http://localhost:7860 in your browser.

How to Use

Tab 1 - Train Voice (one time):

  1. Upload or record your voice (5-15 seconds; a quick pre-flight check is sketched after this list)
  2. Enter the transcript of what you said (improves quality)
  3. Give your voice profile a name (e.g., "my_voice")
  4. Click "Train & Save Voice Profile"

Tab 2 - Generate Speech (unlimited):

  1. Select your saved voice profile
  2. Enter any text you want spoken
  3. Click "Generate Speech"
  4. Download the .wav file

Voice Profiles

Profiles are saved in voice_profiles/ and persist between sessions.
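If you want to see what a saved profile contains, the .pkl files can be opened with the standard library. This is a minimal sketch, assuming plain pickle serialization; the exact structure is defined by create_voice_profile.py, and the profile name is a placeholder:

import pickle

with open("voice_profiles/my_voice.pkl", "rb") as f:
    profile = pickle.load(f)

print(type(profile))
if isinstance(profile, dict):
    print(sorted(profile.keys()))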

Stop the Web UI

pkill -f "voice_tts_app.py"

Alternative: One-Shot Voice Clone

For quick one-time voice cloning without saving a profile:

python voice_clone_app.py

Open http://localhost:7860 - upload audio, enter text, generate speech. Note that both web apps default to port 7860, so stop one before starting the other.


Command Line Usage

Basic Voice Cloning

The Base model uses voice cloning - it requires a reference audio file to clone the voice characteristics.

import torch
from qwen_tts import Qwen3TTSModel
import soundfile as sf

# Load model on the Apple Silicon GPU (MPS backend)
model = Qwen3TTSModel.from_pretrained(
    "./models/Qwen3-TTS-12Hz-1.7B-Base",
    device_map="mps",     # run on the Apple Silicon GPU
    dtype=torch.float32,  # float64 is unsupported on MPS; float32 is the safe default
)

# Generate speech (x_vector_only_mode uses speaker embedding from reference)
wavs, sr = model.generate_voice_clone(
    text="Hello, this is a test of Qwen text to speech.",
    language="english",
    ref_audio="reference.wav",  # Your reference audio file
    x_vector_only_mode=True,
    do_sample=True,
    temperature=0.8,
)

# Save output
sf.write("output.wav", wavs[0], sr)

With Reference Text (ICL Mode)

For better voice cloning, provide a transcript of the reference audio:

wavs, sr = model.generate_voice_clone(
    text="Text you want to synthesize.",
    language="english",
    ref_audio="reference.wav",
    ref_text="Transcript of what is said in reference.wav",
    x_vector_only_mode=False,  # Enables ICL mode
    do_sample=True,
    temperature=0.8,
)
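ICL (in-context learning) mode conditions generation on the reference audio and its transcript together, which typically tracks the reference voice more closely than the speaker-embedding-only mode; the trade-off is that the transcript must accurately match the recording.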

Supported Languages

  • auto - Auto-detect
  • english, chinese, french, german, italian
  • japanese, korean, portuguese, russian, spanish
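With the API from the examples above, switching languages is just a different language argument; the text should be written in the target language. A minimal sketch, reusing the model and reference.wav from "Basic Voice Cloning" (the sample sentences are placeholders):

import soundfile as sf

samples = {
    "english": "Hello from Qwen3-TTS.",
    "french": "Bonjour de la part de Qwen3-TTS.",
    "german": "Hallo von Qwen3-TTS.",
}

for lang, text in samples.items():
    wavs, sr = model.generate_voice_clone(
        text=text,
        language=lang,
        ref_audio="reference.wav",
        x_vector_only_mode=True,
        do_sample=True,
        temperature=0.8,
    )
    sf.write(f"output_{lang}.wav", wavs[0], sr)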

Create Voice Profile (CLI)

# With transcript (better quality)
python create_voice_profile.py my_recording.wav --text "What I said in the recording"

# Without transcript
python create_voice_profile.py my_recording.wav --x-vector-only

# Custom output path
python create_voice_profile.py my_recording.wav -t "transcript" -o voice_profiles/custom_name.pkl

Run the Test Script

python test_tts.py
afplay output.wav  # Play the result

ComfyUI Web Interface

Start ComfyUI

source qwen3-tts-env/bin/activate
python ComfyUI/main.py --listen 0.0.0.0

Open http://localhost:8188 in your browser.

Using Qwen3-TTS Nodes

  1. Right-click canvas → Add Node → Search for "Qwen3"
  2. Add Qwen3TTSModelLoader node
  3. Add Qwen3TTSGenerate node
  4. Connect them and configure:
    • Model path: ./models/Qwen3-TTS-12Hz-1.7B-Base
    • Text: Your text to synthesize
    • Language: english (or other supported language)
    • Reference audio: Upload or connect an audio file

Stop ComfyUI

pkill -f "ComfyUI/main.py"

Project Structure

Qwen-TTS/
├── qwen3-tts-env/          # Python virtual environment
├── models/
│   ├── Qwen3-TTS-12Hz-1.7B-Base/      # Main TTS model
│   └── Qwen3-TTS-Tokenizer-12Hz/      # Speech tokenizer
├── voice_profiles/         # Saved voice profiles (.pkl files)
├── ComfyUI/
│   └── custom_nodes/
│       └── ComfyUI-Qwen3-TTS/         # TTS nodes for ComfyUI
├── voice_tts_app.py        # Main web UI: train + generate (recommended)
├── voice_clone_app.py      # One-shot voice cloning web UI
├── create_voice_profile.py # CLI tool to create voice profiles
├── tts_app.py              # Simple TTS using saved profiles
├── test_tts.py             # Command line test script
└── output.wav              # Generated audio output

Tips

  • Reference audio quality matters - Use clear, noise-free recordings for best results
  • MPS acceleration - The model runs on Apple Silicon GPU automatically
  • Temperature - Lower (0.6-0.8) for more consistent output, higher (0.9-1.0) for variation; see the sweep sketched after this list
  • flash-attn warning - Safe to ignore; it's CUDA-only and doesn't affect Mac
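To hear the effect of the temperature setting directly, you can sweep it with the same API used above. A minimal sketch, assuming the model from "Basic Voice Cloning" is already loaded:

import soundfile as sf

for temp in (0.6, 0.8, 1.0):
    wavs, sr = model.generate_voice_clone(
        text="The same sentence, rendered at three different temperatures.",
        language="english",
        ref_audio="reference.wav",
        x_vector_only_mode=True,
        do_sample=True,
        temperature=temp,
    )
    sf.write(f"output_temp_{temp}.wav", wavs[0], sr)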

Troubleshooting

"SoX could not be found"

brew install sox

Model loading errors

Ensure models are downloaded:

huggingface-cli download Qwen/Qwen3-TTS-12Hz-1.7B-Base --local-dir ./models/Qwen3-TTS-12Hz-1.7B-Base
huggingface-cli download Qwen/Qwen3-TTS-Tokenizer-12Hz --local-dir ./models/Qwen3-TTS-Tokenizer-12Hz
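The same downloads can also be scripted from Python via huggingface_hub (the package that provides huggingface-cli):

from huggingface_hub import snapshot_download

for repo in ("Qwen/Qwen3-TTS-12Hz-1.7B-Base", "Qwen/Qwen3-TTS-Tokenizer-12Hz"):
    # Mirror each repo into ./models/<repo-name>, matching the CLI commands above
    snapshot_download(repo_id=repo, local_dir=f"./models/{repo.split('/')[1]}")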

ComfyUI custom node not showing

Restart ComfyUI - custom nodes are only discovered at startup, so a restart is required after installing them.

