Swara (స్వర) - Intelligent Voice Dictation for Linux

Swara (స్వర, "voice/sound/tone" in Telugu) is a voice dictation tool for Linux. It transcribes speech locally and can use the text you have selected as context for AI-powered rewriting and generation.

Features

- Two modes:
  - Write Mode: fast, accurate transcription
  - Command Mode: AI-powered text transformation and generation
- Context Awareness: automatically detects and uses selected text
- Privacy-Focused: local Whisper.cpp transcription; cloud AI is optional and used only in Command Mode
- Smart Text Injection: multiple injection strategies with automatic fallback
- Wayland Native: built for modern Wayland desktops (tested on Hyprland)
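"Automatic fallback" can be pictured as an ordered chain: try each injection strategy and stop at the first one that succeeds. The sketch below is a minimal illustration, not Swara's actual code. The strategy names and the `inject_with_fallback` helper are hypothetical. Real strategies would call tools such as ydotool or wl-clipboard.

```python
from typing import Callable, Sequence

# A strategy takes the text to inject and returns True on success.
# (Hypothetical type; real strategies would shell out to ydotool, wl-copy, etc.)
Strategy = Callable[[str], bool]


def inject_with_fallback(text: str, strategies: Sequence[tuple[str, Strategy]]) -> str:
    """Try each named strategy in order; return the name of the first that succeeds.

    A strategy that raises counts as a failure, so one broken tool
    (for example, a missing binary) does not stop the chain.
    """
    errors: list[str] = []
    for name, strategy in strategies:
        try:
            if strategy(text):
                return name
            errors.append(f"{name}: returned False")
        except Exception as exc:  # broad on purpose: any failure moves on to the next strategy
            errors.append(f"{name}: {exc}")
    raise RuntimeError("all injection strategies failed: " + "; ".join(errors))


if __name__ == "__main__":
    def type_directly(text: str) -> bool:
        # Stand-in for a ydotool-based strategy that is unavailable here.
        raise FileNotFoundError("ydotool not found")

    def paste_from_clipboard(text: str) -> bool:
        # Stand-in for a clipboard-paste strategy that succeeds.
        return True

    used = inject_with_fallback(
        "hello world",
        [("type", type_directly), ("clipboard", paste_from_clipboard)],
    )
    print(used)  # clipboard
```

Collecting the per-strategy errors means that when every strategy fails, the final message says why each one failed. That is more useful in `logs/swara.log` than only the last error.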

Quick Start

Prerequisites

OS: Linux with Wayland (tested on Hyprland)
Python: 3.10+

System packages (Arch Linux, installed with pacman):

```
sudo pacman -S ydotool wl-clipboard libnotify python python-pip git make gcc
```

Installation

Clone the repository:

```
cd ~/Projects
git clone <your-repo-url> swara
cd swara
```

Run installation script:
```
bash scripts/install.sh
```
Configure the Gemini API key (needed only for Command Mode):
```
nano .env
# Add: GEMINI_API_KEY=your_key_here
```
Get your key from: https://makersuite.google.com/app/apikey
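To check that the key is picked up, here is a minimal standard-library sketch of reading `KEY=value` lines from `.env`. This is an illustrative assumption: Swara may use a library such as python-dotenv instead, and `load_env_file` is a hypothetical helper.

```python
from __future__ import annotations

import os
from pathlib import Path


def load_env_file(path: str | os.PathLike = ".env") -> dict[str, str]:
    """Parse simple KEY=value lines, skipping blank lines and # comments.

    Hypothetical helper: handles only the basic format used in this README
    (no multi-line values, no variable expansion).
    """
    values: dict[str, str] = {}
    env_path = Path(path)
    if not env_path.exists():
        return values
    for raw in env_path.read_text(encoding="utf-8").splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")  # split on the first '=' only
        # Strip optional surrounding quotes: GEMINI_API_KEY="abc" -> abc
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


if __name__ == "__main__":
    env = load_env_file(".env")
    # A real environment variable takes precedence over the file.
    key = os.environ.get("GEMINI_API_KEY") or env.get("GEMINI_API_KEY")
    print("GEMINI_API_KEY set" if key else "GEMINI_API_KEY missing")
```

The script only reports whether the key is present. It never prints the key itself, so its output is safe to paste into an issue.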

Set up keybindings:

```
bash scripts/setup-keybindings.sh
hyprctl reload
```

Usage

Write Mode (SUPER+ALT+D):

Press the keybinding → speak → recording stops automatically after 5 seconds (configurable via audio.recording_duration)
The transcribed text is typed into the active application
Great for: quick notes, messages, dictation
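Write Mode records for a fixed duration and then hands the audio to whisper.cpp. The whisper.cpp command-line example expects 16-bit, 16 kHz mono WAV input. The sketch below shows only the arithmetic and the file format: it writes a silent placeholder clip of the configured length with Python's `wave` module. `write_silent_clip` and `frames_for` are hypothetical names. How Swara actually captures microphone audio is not covered here.

```python
import os
import tempfile
import wave

SAMPLE_RATE = 16_000  # whisper.cpp expects 16 kHz audio
CHANNELS = 1          # mono
SAMPLE_WIDTH = 2      # bytes per sample (16-bit PCM)


def frames_for(duration_s: float, sample_rate: int = SAMPLE_RATE) -> int:
    """Number of audio frames in a recording of the given length."""
    return int(round(duration_s * sample_rate))


def write_silent_clip(path: str, duration_s: float) -> int:
    """Write a silent 16 kHz mono 16-bit WAV file and return its frame count.

    Hypothetical helper: a real recorder would write microphone samples
    instead of zeros.
    """
    n_frames = frames_for(duration_s)
    with wave.open(path, "wb") as wav:
        wav.setnchannels(CHANNELS)
        wav.setsampwidth(SAMPLE_WIDTH)
        wav.setframerate(SAMPLE_RATE)
        wav.writeframes(b"\x00" * (n_frames * CHANNELS * SAMPLE_WIDTH))
    return n_frames


if __name__ == "__main__":
    out = os.path.join(tempfile.gettempdir(), "swara-example.wav")
    # 5 s at 16 kHz -> 80,000 frames, i.e. 160,000 bytes of PCM data
    print(write_silent_clip(out, 5))  # 80000
```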

Command Mode (SUPER+ALT+C):

Select text → press the keybinding → speak your command
Example commands:
- "Make this more professional"
- "Fix grammar"
- "Summarize this"
- "Write a reply based on this"
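In Command Mode, the selected text (the context) and the spoken command are combined before being sent to Gemini. The exact prompt Swara uses is not documented here. The template below is a hypothetical sketch of one reasonable way to combine the two. It also covers the case where nothing is selected and the command is a pure generation request.

```python
from __future__ import annotations


def build_command_prompt(command: str, context: str | None) -> str:
    """Combine a spoken command with optional selected text into one prompt.

    Hypothetical template; the real prompt wording in Swara may differ.
    """
    command = command.strip()
    if not command:
        raise ValueError("empty command")
    instructions = (
        "Follow the user's instruction. "
        "Return only the resulting text, with no explanations."
    )
    if context and context.strip():
        # Delimit the selection so the model can tell it apart from the instruction.
        return (
            f"{instructions}\n\n"
            f"Instruction: {command}\n\n"
            f"Selected text:\n<<<\n{context.strip()}\n>>>"
        )
    # No selection: treat the command as a generation request.
    return f"{instructions}\n\nInstruction: {command}"


if __name__ == "__main__":
    print(build_command_prompt("Fix grammar", "their going to the store tomorow"))
```

Asking for "only the resulting text" matters here. Whatever the model returns is typed straight into the active application.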

Documentation

Configuration: all configurable settings (config/default.yaml)
Spike Tests: manual tests for validating your setup (tests/spike/)

Architecture

Write Mode Flow:
Record → Transcribe → Inject

Command Mode Flow:
Capture Context → Record → Transcribe → Gemini AI → Inject
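The two flows share most stages. Command Mode adds context capture before recording and a Gemini call before injection. The sketch below composes them with every stage passed in as a function, so each stage can be stubbed and tested. `Stages`, `run_write_mode` and `run_command_mode` are hypothetical names, not Swara's API.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Callable


@dataclass
class Stages:
    """Hypothetical bundle of pipeline stages; each would wrap a real tool."""
    capture_context: Callable[[], str | None]     # e.g. read the selection via wl-clipboard
    record: Callable[[], bytes]                   # fixed-duration audio capture
    transcribe: Callable[[bytes], str]            # local whisper.cpp
    transform: Callable[[str, str | None], str]   # Gemini call (Command Mode only)
    inject: Callable[[str], None]                 # e.g. ydotool typing


def run_write_mode(s: Stages) -> str:
    # Record -> Transcribe -> Inject
    text = s.transcribe(s.record()).strip()
    if text:
        s.inject(text)
    return text


def run_command_mode(s: Stages) -> str:
    # Capture Context -> Record -> Transcribe -> Gemini AI -> Inject
    context = s.capture_context()  # captured first, matching the flow above
    command = s.transcribe(s.record()).strip()
    if not command:
        return ""  # nothing was said: skip the AI call and inject nothing
    result = s.transform(command, context)
    s.inject(result)
    return result


if __name__ == "__main__":
    typed: list[str] = []
    stages = Stages(
        capture_context=lambda: "teh cat sat",
        record=lambda: b"fake-audio",
        transcribe=lambda audio: " fix spelling ",
        transform=lambda cmd, ctx: (ctx or "").replace("teh", "the"),
        inject=typed.append,
    )
    print(run_command_mode(stages))  # the cat sat
```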

Core Technologies:

Speech-to-Text: Whisper.cpp (local, private)
AI: Google Gemini 2.0 (Command Mode only)
Text Injection: ydotool (Wayland-native)
Context Capture: wl-clipboard
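On Wayland, both the context tool and the injection tool are plain CLIs, so `subprocess` can drive them. The sketch below makes two assumptions. First, it assumes the selection is read from the primary selection with `wl-paste --primary --no-newline`; whether Swara reads the primary selection or the regular clipboard is not documented here. Second, it assumes text is typed with `ydotool type`. The helper names are hypothetical.

```python
from __future__ import annotations

import shutil
import subprocess


def selection_command() -> list[str]:
    # --primary reads the currently highlighted text (primary selection);
    # --no-newline stops wl-paste from appending a trailing newline.
    return ["wl-paste", "--primary", "--no-newline"]


def type_command(text: str) -> list[str]:
    # ydotool talks to the ydotoold daemon, which must already be running.
    return ["ydotool", "type", text]


def capture_selection(timeout: float = 2.0) -> str | None:
    """Return the selected text, or None if it cannot be read."""
    if shutil.which("wl-paste") is None:
        return None
    try:
        result = subprocess.run(
            selection_command(), capture_output=True, text=True, timeout=timeout
        )
    except subprocess.TimeoutExpired:
        return None
    if result.returncode != 0:
        return None  # non-zero exit, e.g. nothing selected or no Wayland session
    return result.stdout or None


def type_text(text: str) -> None:
    """Type text into the focused window via ydotool (raises on failure)."""
    subprocess.run(type_command(text), check=True)
```

Passing an argument list instead of a shell string means dictated text containing quotes or `$` is never interpreted by a shell.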

Testing

Run the spike tests to validate your setup:

```
source venv/bin/activate

# Test context capture
python3 tests/spike/test_selection.py

# Test text injection
python3 tests/spike/test_ydotool.py
```
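Alongside the spike tests, it can help to confirm that the required command-line tools are on `PATH`. The sketch below uses `shutil.which`. The tool list mirrors the pacman packages above: ydotool's `ydotool`/`ydotoold`, wl-clipboard's `wl-copy`/`wl-paste`, and libnotify's `notify-send`. `missing_tools` is a hypothetical helper.

```python
from __future__ import annotations

import shutil
from typing import Callable, Iterable

# Executables provided by the system packages listed under Prerequisites.
REQUIRED_TOOLS = ("ydotool", "ydotoold", "wl-copy", "wl-paste", "notify-send")


def missing_tools(
    tools: Iterable[str] = REQUIRED_TOOLS,
    which: Callable[[str], str | None] = shutil.which,
) -> list[str]:
    """Return the tools that cannot be found on PATH (an empty list means all good)."""
    return [tool for tool in tools if which(tool) is None]


if __name__ == "__main__":
    missing = missing_tools()
    if missing:
        print("Missing:", ", ".join(missing))
    else:
        print("All required tools found")
```

The `which` parameter exists only so the check can be tested without touching the real `PATH`.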

Troubleshooting

Common Issues

"Permission denied" when typing:

```
# Add your user to the input group, then reboot (or log out and back in)
sudo usermod -aG input $USER
```
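Group membership is assigned when you log in. That is why the new group does not take effect until you log out and back in, or reboot. A small check with Python's standard `grp` and `os` modules shows whether the current session already has the group. `session_has_group` is a hypothetical helper name.

```python
import grp
import os


def session_has_group(name: str) -> bool:
    """True if the current process runs with the named group (primary or supplementary)."""
    try:
        gid = grp.getgrnam(name).gr_gid
    except KeyError:
        return False  # no such group on this system
    return gid == os.getgid() or gid in os.getgroups()


if __name__ == "__main__":
    if session_has_group("input"):
        print("input group active in this session")
    else:
        print("input group not active yet: log out and back in after usermod")
```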

"ydotool not found":

```
sudo pacman -S ydotool
```

"Whisper model not found":

```
cd ~/whisper.cpp
bash ./models/download-ggml-model.sh base
```

"No text appears after dictation":

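```
# Make sure the ydotoold daemon is running and starts with your session
systemctl --user start ydotoold
systemctl --user enable ydotoold
# If the unit is not found, check which name your package uses:
# systemctl --user list-unit-files | grep ydotool
```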

For more troubleshooting, check the logs:

```
tail -f logs/swara.log
```

Configuration

Edit config/default.yaml to customize:

```
# Audio settings
audio:
  recording_duration: 5  # Recording duration in seconds
  max_duration: 30
  beep_on_start: true

# Punctuation (disabled by default for speed)
punctuation:
  enabled: false

# AI settings
gemini:
  temperature: 0.3
  model: "gemini-2.0-flash-exp"

# Output settings
output:
  typing_delay: 0.01
  injection_strategy: "auto"
```
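Because `config/default.yaml` is nested, user overrides are most naturally applied with a recursive merge. The sketch below works on plain dictionaries, which is what a YAML loader such as PyYAML's `yaml.safe_load` would return, so it runs without extra dependencies. `merge_config` and the `recording_duration <= max_duration` check are illustrative assumptions, not documented Swara behavior.

```python
from __future__ import annotations

import copy
from typing import Any

# Mirrors the defaults shown above (values copied from config/default.yaml).
DEFAULTS: dict[str, Any] = {
    "audio": {"recording_duration": 5, "max_duration": 30, "beep_on_start": True},
    "punctuation": {"enabled": False},
    "gemini": {"temperature": 0.3, "model": "gemini-2.0-flash-exp"},
    "output": {"typing_delay": 0.01, "injection_strategy": "auto"},
}


def merge_config(base: dict[str, Any], override: dict[str, Any]) -> dict[str, Any]:
    """Recursively merge override into a copy of base (override wins)."""
    merged = copy.deepcopy(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = merge_config(merged[key], value)
        else:
            merged[key] = copy.deepcopy(value)
    return merged


def validate(cfg: dict[str, Any]) -> None:
    """Hypothetical sanity check: the fixed recording must fit within max_duration."""
    audio = cfg["audio"]
    if not 0 < audio["recording_duration"] <= audio["max_duration"]:
        raise ValueError("audio.recording_duration must be in (0, max_duration]")


if __name__ == "__main__":
    cfg = merge_config(DEFAULTS, {"audio": {"recording_duration": 8}})
    validate(cfg)
    print(cfg["audio"])
    # {'recording_duration': 8, 'max_duration': 30, 'beep_on_start': True}
```

Merging into a deep copy leaves `DEFAULTS` untouched. Several overrides can therefore be tried in one process without the changes leaking into each other.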

Privacy

Write Mode: 100% local processing (audio never leaves your machine)
Command Mode: sends the transcribed command and the captured context (selected text) to the Gemini API; the audio itself is still transcribed locally
Logs: Stored locally at logs/swara.log

License

MIT License - see LICENSE file

Acknowledgments

Whisper.cpp - Fast local transcription
DeepMultilingualPunctuation - Punctuation restoration
Google Gemini - AI processing
ydotool - Wayland input simulation

Roadmap

Support

For issues and questions, please open an issue on GitHub.

Made for the Linux voice dictation community
