Swara (స్వర - "voice/sound/tone" in Telugu) is an intelligent voice dictation tool for Linux with context awareness and AI processing.
- Two Modes:
  - Write Mode: Fast, accurate transcription
  - Command Mode: AI-powered text transformation and generation
- Context Awareness: Automatically detects and uses selected text
- Privacy-Focused: Local Whisper.cpp transcription, optional cloud AI
- Smart Text Injection: Multiple strategies with automatic fallback
- Wayland Native: Built for modern Linux desktops (Hyprland)

- OS: Linux with Wayland (tested on Hyprland)
- Python: 3.10+
- System packages:
  sudo pacman -S ydotool wl-clipboard libnotify python python-pip git make gcc
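Before installing, it can help to confirm that the packages above actually put their executables on PATH. The following is a minimal sketch, not part of Swara itself: the `missing_tools` helper and the `REQUIRED_TOOLS` list are hypothetical names, and the list assumes that `wl-copy` and `wl-paste` come from wl-clipboard and `notify-send` from libnotify.

```python
import shutil

# Executables assumed to be provided by the packages listed above:
# wl-copy / wl-paste ship with wl-clipboard, notify-send with libnotify.
REQUIRED_TOOLS = ["ydotool", "wl-copy", "wl-paste", "notify-send", "git", "make", "gcc"]


def missing_tools(tools=REQUIRED_TOOLS, which=shutil.which):
    """Return the names in `tools` that cannot be found on PATH."""
    return [name for name in tools if which(name) is None]


if __name__ == "__main__":
    missing = missing_tools()
    if missing:
        print("Missing executables:", ", ".join(missing))
    else:
        print("All required executables were found on PATH.")
```

The `which` parameter is only there so the lookup can be swapped out in tests.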
- Clone the repository:
  cd ~/Projects
  git clone <your-repo-url> swara
  cd swara
- Run the installation script:
  bash scripts/install.sh
- Configure the Gemini API:
  nano .env  # Add: GEMINI_API_KEY=your_key_here
  Get your key from: https://makersuite.google.com/app/apikey
- Set up keybindings:
  bash scripts/setup-keybindings.sh
  hyprctl reload
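The README does not say how Swara reads the `.env` file, so the following is only a sketch of one way a `GEMINI_API_KEY=...` line could be picked up using the standard library. The function names `parse_env_text` and `load_gemini_key` are hypothetical, and preferring an already-exported environment variable over the file is an assumption.

```python
import os


def parse_env_text(text):
    """Parse KEY=value lines; blank lines and '#' comments are skipped."""
    values = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop optional surrounding quotes around the value.
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def load_gemini_key(env_path=".env", environ=os.environ):
    """Prefer the process environment, then fall back to the .env file."""
    if environ.get("GEMINI_API_KEY"):
        return environ["GEMINI_API_KEY"]
    try:
        with open(env_path, encoding="utf-8") as handle:
            return parse_env_text(handle.read()).get("GEMINI_API_KEY")
    except FileNotFoundError:
        return None  # no key configured; Command Mode would be unavailable
```

Returning `None` instead of raising lets a caller keep Write Mode working when no key is configured.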
Write Mode (SUPER+ALT+D):
- Press keybinding → Speak (5 seconds) → Auto-stop
- Text appears in your active application
- Great for: Quick notes, messages, dictation
Command Mode (SUPER+ALT+C):
- Select text → Press keybinding → Give command
- Examples:
  - "Make this more professional"
  - "Fix grammar"
  - "Summarize this"
  - "Write a reply based on this"
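To make the Command Mode idea concrete, here is a rough sketch of how a spoken command and the selected text might be combined into a single prompt. The prompt wording and the `build_command_prompt` name are illustrative assumptions; the README does not describe the prompt Swara actually sends.

```python
def build_command_prompt(command, selected_text=""):
    """Combine the spoken command with any captured selection."""
    command = command.strip()
    if not command:
        raise ValueError("the spoken command is empty")
    if not selected_text.strip():
        # Nothing selected: treat the command as a generation request.
        return f"Instruction: {command}"
    return (
        f"Instruction: {command}\n\n"
        "Apply the instruction to the text below and return only the result.\n\n"
        f"Text:\n{selected_text.strip()}"
    )
```

With a selection, a command such as "Fix grammar" becomes a transformation request; without one, the same function falls back to plain generation.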
- Configuration: All configurable settings
- Spike Tests: Manual tests for validation
Write Mode Flow:
Record → Transcribe → Inject
Command Mode Flow:
Capture Context → Record → Transcribe → Gemini AI → Inject
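The two flows above can be sketched as plain function composition. This is an illustration under stated assumptions rather than Swara's code: every stage (`record`, `transcribe`, `transform`, `inject`, `capture_context`) is a hypothetical callable passed in by the caller, which keeps the ordering visible and lets each stage be stubbed.

```python
def run_write_mode(record, transcribe, inject):
    """Record -> Transcribe -> Inject."""
    audio = record()
    text = transcribe(audio)
    inject(text)
    return text


def run_command_mode(capture_context, record, transcribe, transform, inject):
    """Capture Context -> Record -> Transcribe -> AI -> Inject."""
    context = capture_context()  # grab the selection before recording starts
    audio = record()
    command = transcribe(audio)
    result = transform(command, context)
    inject(result)
    return result


if __name__ == "__main__":
    # Stub stages so the flow can be exercised without a microphone or API key.
    typed = []
    run_command_mode(
        capture_context=lambda: "helo world",
        record=lambda: b"\x00\x01",
        transcribe=lambda audio: "fix spelling",
        transform=lambda command, context: f"[{command}] {context}",
        inject=typed.append,
    )
    print(typed)
```

Capturing the context first matters in this sketch because the selection could change while the user is speaking.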
Core Technologies:
- Speech-to-Text: Whisper.cpp (local, private)
- AI: Google Gemini 2.0 (Command Mode only)
- Text Injection: ydotool (Wayland-native)
- Context Capture: wl-clipboard
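Text injection is described as using multiple strategies with automatic fallback. A minimal sketch of that pattern follows. The strategy functions and their order are assumptions: `type_with_ydotool` shells out to `ydotool type`, and `copy_with_wl_copy` only places the text on the clipboard via `wl-copy`, leaving the paste to the user. Swara's real strategies may differ.

```python
import subprocess


def type_with_ydotool(text):
    """Type the text into the focused window (needs ydotoold running)."""
    subprocess.run(["ydotool", "type", text], check=True)


def copy_with_wl_copy(text):
    """Fallback: put the text on the Wayland clipboard for manual pasting."""
    subprocess.run(["wl-copy"], input=text, text=True, check=True)


def inject_with_fallback(text, strategies):
    """Try each (name, callable) pair in order; return the name that worked."""
    errors = []
    for name, strategy in strategies:
        try:
            strategy(text)
            return name
        except Exception as exc:  # a missing binary or non-zero exit lands here
            errors.append(f"{name}: {exc}")
    raise RuntimeError("all injection strategies failed: " + "; ".join(errors))


# Hypothetical default order: direct typing first, clipboard second.
DEFAULT_STRATEGIES = [
    ("ydotool", type_with_ydotool),
    ("clipboard", copy_with_wl_copy),
]
```

Calling `inject_with_fallback("hello", DEFAULT_STRATEGIES)` would return the name of the first strategy that succeeded, which is useful for logging.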
Run spike tests to validate setup:
source venv/bin/activate
# Test context capture
python3 tests/spike/test_selection.py
# Test text injection
python3 tests/spike/test_ydotool.py

"Permission denied" when typing:
# Add user to input group and reboot
sudo usermod -aG input $USER

"ydotool not found":
sudo pacman -S ydotool

"Whisper model not found":
cd ~/whisper.cpp
bash ./models/download-ggml-model.sh base

"No text appears after dictation":
# Make sure ydotoold is running
systemctl --user start ydotoold
systemctl --user enable ydotoold

For more troubleshooting, check the logs:
tail -f logs/swara.log

Edit config/default.yaml to customize:
# Audio settings
audio:
  recording_duration: 5  # Recording duration in seconds
  max_duration: 30
  beep_on_start: true
# Punctuation (disabled by default for speed)
punctuation:
  enabled: false
# AI settings
gemini:
  temperature: 0.3
  model: "gemini-2.0-flash-exp"
# Output settings
output:
  typing_delay: 0.01
  injection_strategy: "auto"

- Write Mode: 100% local processing (audio never leaves your machine)
- Command Mode: Sends transcribed text + context to Gemini API
- Logs: Stored locally at logs/swara.log
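The privacy split between the two modes can be expressed as a small function that states what, if anything, would leave the machine. This is a sketch of the policy as described above, not Swara's code: `outbound_payload` and the dictionary keys are hypothetical names, and it assumes audio is transcribed locally in both modes.

```python
def outbound_payload(mode, transcript, context=""):
    """Return what would be sent to the Gemini API, or None if nothing is."""
    if mode == "write":
        return None  # transcription and injection stay on the local machine
    if mode == "command":
        # Only text is sent: the transcribed command and the selected context.
        return {"transcript": transcript, "context": context}
    raise ValueError(f"unknown mode: {mode!r}")
```

Keeping this decision in one place would make it straightforward to audit what a given mode transmits.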
MIT License - see LICENSE file
- Whisper.cpp - Fast local transcription
- DeepMultilingualPunctuation - Punctuation restoration
- Google Gemini - AI processing
- ydotool - Wayland input simulation
- Custom GTK4 status window
- Voice Activity Detection (VAD)
- Multiple Whisper model support
- Local AI option (llama.cpp)
- Vim/Neovim plugin
- Multi-language support
For issues and questions, please open an issue on GitHub.
Made for the Linux voice dictation community