This is a web-based application that provides near real-time audio transcription and translation using browser microphone input. Leveraging multiple AI providers, it captures live speech, transcribes it, and translates the text into other languages as users speak.
- Browser Microphone Integration
  - Near real-time speech capture directly in the web browser
  - Intelligent phrase detection and transcription
- Multi-Provider Support
  - Transcription Providers: Configurable (default: OpenAI)
  - Translation Providers: OpenAI, Groq, Ollama
- Multiple Audio Format Support
  - Supported formats: flac, m4a, mp3, mp4, mpeg, mpga, oga, ogg, wav, webm
- Language Translation
  - Supports multiple languages, including English, Spanish, French, German, Italian, Japanese, Korean, and Chinese
- `app.py`: Main FastAPI application
- `ai_interface/`: AI service implementations
  - `abstract_services.py`: Base service interface
  - `openai_services.py`: OpenAI service implementation
  - `ollama_services.py`: Ollama service implementation
  - `groq_services.py`: Groq service implementation
- `static/`: Web assets
- `templates/`: HTML templates
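The `ai_interface/` package points to a provider-agnostic design: each provider module implements the base interface defined in `abstract_services.py`. As a rough sketch only, assuming async transcription and translation methods (class and method names here are hypothetical, not the project's actual API):

```python
# Hypothetical sketch of a provider-agnostic service interface.
# The real abstract_services.py may use different names and signatures.
from abc import ABC, abstractmethod


class AbstractTranscriptionService(ABC):
    """What a transcription provider (e.g. OpenAI, Groq) would implement."""

    @abstractmethod
    async def transcribe(self, audio_bytes: bytes, source_language: str = "auto") -> str:
        """Return the transcribed text for the given audio."""


class AbstractTranslationService(ABC):
    """What a translation provider (OpenAI, Groq, Ollama) would implement."""

    @abstractmethod
    async def translate(self, text: str, target_language: str = "en") -> str:
        """Return the text translated into the target language."""
```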
- Python 3.8+
- pip (Python package manager)
- Modern web browser with microphone support
- Clone the repository
- Create a virtual environment
  `python3 -m venv venv`
  `source venv/bin/activate`
- Install dependencies
  `pip install -r requirements.txt`
- Copy `.env.template` to `.env` and configure your API keys
  `cp .env.template .env`
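The keys required in `.env` depend on which providers you enable. As an illustration only (these variable names are assumptions; check `.env.template` for the actual keys):

```
# Illustrative values; the real key names are defined in .env.template
OPENAI_API_KEY=sk-...
GROQ_API_KEY=gsk_...
OLLAMA_BASE_URL=http://localhost:11434
```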
This project can be deployed using Docker. The Docker image for the application is available at ghcr.io/jayrinaldime/open-subtitles:latest.
We provide sample Docker Compose files for different configurations:
- Docker Compose for Local Deployment (Using Ollama):
  `docker-compose -f docker-compose-local-ollama.yml up`
- Docker Compose for OpenAI Deployment:
  `docker-compose -f docker-compose-openai.yml up`
- Docker Compose for Groq Deployment:
  `docker-compose -f docker-compose-groq.yml up`
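If you need to adapt one of the sample files, a minimal Compose service for the published image looks roughly like the sketch below; only the image name and port are taken from this README, and the `env_file` usage is an assumption, so prefer the provided sample files:

```yaml
# Minimal sketch of a Compose service for the published image.
services:
  open-subtitles:
    image: ghcr.io/jayrinaldime/open-subtitles:latest
    ports:
      - "8000:8000"
    env_file:
      - .env   # provider API keys and settings (assumed; see .env.template)
```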
- Without Docker:
  `python app.py`
  Or using Uvicorn:
  `uvicorn app:app --host 0.0.0.0 --port 8000`
- With Docker:
  `docker run -p 8000:8000 ghcr.io/jayrinaldime/open-subtitles:latest`
- Open the web application in a browser
- Grant microphone permissions
- Speak naturally - the application will:
  - Detect when you complete a phrase
  - Transcribe your speech in near real-time
  - Translate the transcribed text into your desired language
- `GET /`: Main application interface
- `POST /transcribe`: Upload and transcribe audio files
  - Parameters:
    - `audio`: Audio file
    - `audio_level`: Audio level
    - `max_audio_level`: Maximum audio level
    - `source_language`: Source language (default: auto)
    - `target_language`: Target language for translation (default: en)
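For illustration, `/transcribe` can be exercised from a short script along these lines (field names follow the parameter list above; the JSON response shape is an assumption, so inspect the actual reply):

```python
# Sketch of calling POST /transcribe with the requests library.
# Field names mirror the parameter list above; the response is assumed to be JSON.
import requests

with open("sample.wav", "rb") as f:
    resp = requests.post(
        "http://localhost:8000/transcribe",
        files={"audio": ("sample.wav", f, "audio/wav")},
        data={
            "audio_level": 0.8,
            "max_audio_level": 12,
            "source_language": "auto",
            "target_language": "en",
        },
    )

resp.raise_for_status()
print(resp.json())
```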
- Silence Threshold:
  - Description: The minimum average audio level required to detect speech. Lower values make the system more sensitive to quieter sounds.
  - Range: 0.0 to 1.0
  - Default Value: 0.6
- Silence Duration (ms):
  - Description: The duration of silence (in milliseconds) required to stop recording and process the audio. Longer values allow for more natural pauses in speech.
  - Range: 100ms to 2000ms
  - Default Value: 300ms
- Max Audio Level Threshold:
  - Description: The peak audio level a recording must exceed to be processed. If the detected maximum audio level exceeds this threshold, the audio is sent to the server for transcription; recordings that stay below it are not processed. (How the three audio settings combine is sketched after this settings list.)
  - Range: 1 to 128
  - Default Value: 10
- Transcript View Layout:
  - Description: Choose between a detailed or compact view of the transcribed text. The detailed view shows both the original and translated text, while the compact view shows only the translated text.
  - Options:
    - Detailed
    - Compact
  - Default Value: Compact
- Debug Mode:
  - Description: Enable debug mode to show additional information about audio levels and processing.
  - Default Value: Disabled
- Source Language:
  - Description: The language of the incoming audio. Set to "auto" for automatic language detection.
  - Options:
    - Auto-detect
    - English (en)
    - Spanish (es)
    - French (fr)
    - German (de)
    - Italian (it)
    - Japanese (ja)
    - Korean (ko)
    - Chinese (zh)
  - Default Value: Auto-detect
- Target Language:
  - Description: The language into which the transcribed text will be translated.
  - Options:
    - English (en)
    - Spanish (es)
    - French (fr)
    - German (de)
    - Italian (it)
    - Japanese (ja)
    - Korean (ko)
    - Chinese (zh)
  - Default Value: English (en)
- Enable Translation:
  - Description: Enable or disable translation of the transcribed text.
  - Default Value: Enabled
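The three audio settings above (Silence Threshold, Silence Duration, and Max Audio Level Threshold) together determine when a captured phrase is uploaded for transcription. The Python sketch below only mirrors their descriptions; the actual checks run in the browser's JavaScript, so the function and parameter names are illustrative:

```python
# Illustrative decision logic mirroring the three audio settings described above.
# The real implementation runs client-side in the browser.

def is_speech(avg_level: float, silence_threshold: float = 0.6) -> bool:
    """A frame counts as speech when its average level reaches the Silence Threshold."""
    return avg_level >= silence_threshold

def phrase_complete(silent_ms: int, silence_duration_ms: int = 300) -> bool:
    """Recording stops once silence has lasted at least the Silence Duration."""
    return silent_ms >= silence_duration_ms

def should_transcribe(max_level: float, max_level_threshold: float = 10.0) -> bool:
    """The chunk is uploaded only if its peak level exceeds the Max Audio Level Threshold."""
    return max_level > max_level_threshold
```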
- FastAPI
- Python
- Browser Microphone API
- OpenAI API
- Ollama
- Groq API
- Whisper (for local speech to text)
  - Uses the Docker image https://hub.docker.com/r/onerahmet/openai-whisper-asr-webservice
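That Whisper ASR webservice image is typically started with a plain `docker run`; the port and `ASR_MODEL` variable below follow that project's documented defaults, so verify them against its own README before relying on them:

```
docker run -d -p 9000:9000 -e ASR_MODEL=base onerahmet/openai-whisper-asr-webservice:latest
```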
Current version: 0.0.6
See the LICENSE file for details.
- Fork the repository
- Create a feature branch
- Commit your changes
- Push to the branch
- Create a Pull Request
For issues or inquiries, please open a GitHub issue.