Tara is a fully local, zero-cost voice assistant that combines the power of Orpheus TTS, LiveKit, and local LLMs to create incredibly natural and expressive speech. This project eliminates the need for cloud-based API services by integrating:
*(Demo video: tara.mp4)*
- Orpheus TTS for human-like speech with natural intonation and emotion
- LiveKit for real-time voice communication
- Local Whisper for accurate speech-to-text conversion
- Ollama for running local large language models
The result is a voice assistant with remarkably natural speech capabilities, including emotional expressions like laughs, sighs, and gasps - all running completely on your local machine.
- 🎯 100% Local - No API costs or cloud dependencies
- 🗣️ Expressive Speech - Natural intonation, rhythm, and emotional expressions
- 🎭 Emotion Tags - Simple text-based tags to control emotion and expression
- 🎙️ Real-time Conversation - Fluid interaction through LiveKit
- 🧠 Local LLM Integration - Uses Ollama to run powerful models locally
- 👂 Advanced Speech Recognition - Fast local transcription with Whisper
Before running Tara, you'll need:
- Python 3.9+
- Orpheus TTS FastAPI Server installed and running
- Ollama installed with the Llama 3.2 model (a pull command is sketched below)
- Faster Whisper or a compatible STT server such as Speaches AI
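If you don't already have the model, one way to satisfy the Ollama prerequisite is to pull it directly (a minimal sketch; swap in whichever model tag you actually run):

```bash
# Pull the Llama 3.2 model into a local Ollama instance
ollama pull llama3.2

# Sanity-check that it responds
ollama run llama3.2 "Reply with one short sentence."
```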
1. First, set up the Orpheus TTS server:

   ```bash
   # Clone and set up the Orpheus FastAPI server
   git clone https://github.com/Lex-au/Orpheus-FastAPI
   cd Orpheus-FastAPI
   # Follow the setup instructions from the repository
   ```
2. Clone this repository:

   ```bash
   git clone https://github.com/dwain-barnes/tara-orpheus-livekit
   cd tara-orpheus-livekit
   ```
3. Install dependencies:

   ```bash
   pip install -r requirements.txt
   ```
4. Create a `.env.local` file with your configuration:

   ```bash
   # Your LiveKit configuration
   LIVEKIT_URL=your_livekit_url
   LIVEKIT_API_KEY=your_api_key
   LIVEKIT_API_SECRET=your_api_secret
   ```
5. Make sure the Orpheus TTS server is running (default: http://localhost:5005); a quick check is sketched below.
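   One way to verify the server (an assumption: it exposes the OpenAI-compatible `/v1/audio/speech` route that the modified `tts.py` targets, and `orpheus` is a placeholder model name):

   ```bash
   # Request a short clip from the Orpheus server and save it to disk
   curl http://localhost:5005/v1/audio/speech \
     -H "Content-Type: application/json" \
     -d '{"model": "orpheus", "voice": "tara", "input": "Hello from Tara!"}' \
     --output check.wav
   ```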
6. Make sure Ollama is running with the `llama3.2` model loaded; see the check below.
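   A minimal check against Ollama's OpenAI-compatible endpoint (the same one `tara.py` uses):

   ```bash
   # Ask the local Ollama server for a one-off chat completion
   curl http://localhost:11434/v1/chat/completions \
     -H "Content-Type: application/json" \
     -d '{"model": "llama3.2:latest", "messages": [{"role": "user", "content": "ping"}]}'
   ```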
7. Replace the default OpenAI `tts.py` in the LiveKit plugin with the `tts.py` from this repo (a sketch follows).
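   A sketch of one way to do the swap (the plugin path is discovered at run time; verify it before overwriting anything):

   ```bash
   # Locate the installed livekit openai plugin, then copy this repo's tts.py over it
   PLUGIN_DIR="$(python -c 'import os, livekit.plugins.openai as p; print(os.path.dirname(p.__file__))')"
   cp tts.py "$PLUGIN_DIR/tts.py"
   ```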
8. Run the voice assistant:

   ```bash
   python tara.py
   ```
9. Connect to the LiveKit room and start interacting with Tara.
The system consists of several integrated components:
- Speech-to-Text (STT): Uses Faster Whisper for local transcription
- Language Model: Connects to a local Ollama instance running Llama 3.2
- Text-to-Speech (TTS): Modified OpenAI TTS module that connects to Orpheus TTS
- Voice Pipeline: Handles the flow between components via LiveKit
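Put together, audio flows one way through the pipeline:

```
mic audio → VAD → STT (Faster Whisper) → LLM (Ollama / Llama 3.2) → TTS (Orpheus) → LiveKit room
```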
Tara uses special text tags to express emotions in speech:
- `<giggle>`, `<laugh>`, `<chuckle>` for humor
- `<sigh>`, `<groan>` for disappointment or frustration
- `<gasp>`, `<cough>`, `<sniffle>`, `<yawn>` for other human-like expressions
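For example, a model response with tags embedded inline (a made-up line; Orpheus renders each tag as the corresponding sound):

```
That's a good one! <laugh> Okay, <sigh> back to work.
```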
You can modify `tara.py` to:
- Change the voice by editing the `voice` parameter in the TTS setup
- Modify the personality by editing the system prompt
- Adjust the LLM model by changing the Ollama model name
- Configure different endpoints for any of the services
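For instance, two hypothetical tweaks (the model and voice names here are illustrative, not defaults from this repo):

```python
# Swap the Ollama model: any tag you have pulled locally works
llm_plugin = openai.LLM(
    base_url="http://localhost:11434/v1",
    api_key="ollama",  # placeholder; Ollama ignores the key
    model="mistral:latest",
)

# Pick a different Orpheus voice (assumes "leah" ships with your Orpheus setup)
tts_plugin = TTS.create_orpheus_client(
    voice="leah",
    base_url="http://localhost:5005/v1",
)
```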
The main workflow in `tara.py` (imports shown for context; they assume the stock livekit-agents layout plus the `tts.py` swap from step 7):

```python
from livekit.agents.pipeline import VoicePipelineAgent
from livekit.plugins import openai, turn_detector
from livekit.plugins.openai import TTS  # the modified tts.py from this repo

# 1) Speech-to-Text with Faster Whisper
stt_plugin = openai.STT.with_faster_whisper(model="Systran/faster-distil-whisper-large-v3")
# 2) Language Model from Ollama
llm_plugin = openai.LLM(
    base_url="http://localhost:11434/v1",
    api_key="ollama",  # placeholder; Ollama's endpoint ignores the key, but one is required
    model="llama3.2:latest",
)
# 3) Text-to-Speech using Orpheus TTS
tts_plugin = TTS.create_orpheus_client(
    voice="tara",
    base_url="http://localhost:5005/v1",
)
# 4) Create a VoicePipelineAgent
agent = VoicePipelineAgent(
    vad=ctx.proc.userdata["vad"],
    stt=stt_plugin,
    llm=llm_plugin,
    tts=tts_plugin,
    chat_ctx=initial_ctx,
    turn_detector=turn_detector.EOUModel(),
)
```

This project is licensed under the MIT License - see the LICENSE file for details.
- Orpheus TTS for the incredible speech synthesis
- LiveKit for the real-time communication platform
- Lex-au for the Orpheus FastAPI implementation