Description
In the current proof of concept, the user selects the transcription model manually: either one of the default Whisper models or a KB-Whisper variant.
To streamline the user experience, we propose an automated default setting where the system selects the most suitable model based on language detection.
⸻
Expected Behavior
• If the user does not select a specific model, the system should automatically detect the language of the input audio.
• Based on the detected language, the system selects the optimal model:
• If the language is Swedish → Use KB-Whisper Large
• If another language is detected → Use Whisper Large (OpenAI)
• If the language is unknown or confidence is low → Fall back to Whisper Large
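
The decision rule above could be sketched as a small pure function. This is only an illustration: the model identifiers, the function name, and the 0.9 default threshold are assumptions, not part of the existing code base.

```python
from typing import Optional

# Hypothetical identifiers; the issue names the models but not how they are loaded.
KB_WHISPER_LARGE = "kb-whisper-large"
WHISPER_LARGE = "whisper-large"


def select_model(
    detected_language: Optional[str],
    confidence: float,
    user_choice: Optional[str] = None,
    threshold: float = 0.9,  # assumed cut-off for "high confidence"
) -> str:
    """Return the name of the model to use for one audio file."""
    # An explicit user selection always wins; auto-detection is only the default.
    if user_choice is not None:
        return user_choice
    # Swedish detected with sufficient confidence -> KB-Whisper Large.
    if detected_language == "sv" and confidence > threshold:
        return KB_WHISPER_LARGE
    # Another language, unknown language, or low confidence -> Whisper Large.
    return WHISPER_LARGE
```

For example, `select_model("sv", 0.97)` would pick KB-Whisper Large, while `select_model("sv", 0.55)` or `select_model(None, 0.0)` would fall back to Whisper Large.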
⸻
Technical Guidelines
- Perform fast language detection on the first 30 seconds of the audio
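• Whisper has built-in language detection: the result dict returned by transcribe includes a language field, and detect_language returns per-language probabilities that can serve as the confidence score.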
• Alternatively, tools like langdetect or fastText can be considered. Note that these identify the language of text, not audio, so they would need a draft transcript first.
- Select the model based on the detected language
• If sv (Swedish) is detected with high confidence (e.g., >90%), load KB-Whisper Large.
• Otherwise, default to Whisper Large (OpenAI).
- Log the model selection decision
• Store this information in a sidecar file for traceability and to allow analysis/improvement of the fallback logic over time.
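
A minimal sketch of the detection and logging steps, assuming the openai-whisper package is used for detection. Using the small "base" model as the detector, the helper names, the sidecar file name, and the JSON fields are all assumptions made for illustration.

```python
import json
from datetime import datetime, timezone
from pathlib import Path


def detect_language(audio_path: str, detector_size: str = "base"):
    """Detect the language of the first 30 seconds using openai-whisper."""
    import whisper  # imported lazily so the logging helper works without it

    # Assumption: a small model is used only for detection to keep it fast.
    model = whisper.load_model(detector_size)
    audio = whisper.load_audio(audio_path)
    audio = whisper.pad_or_trim(audio)  # pads or cuts the audio to 30 seconds
    mel = whisper.log_mel_spectrogram(audio).to(model.device)
    _, probs = model.detect_language(mel)  # dict: language code -> probability
    language = max(probs, key=probs.get)
    return language, float(probs[language])


def write_sidecar(audio_path, language, confidence, model_name, threshold=0.9):
    """Write the selection decision to a JSON file next to the audio file."""
    record = {
        "audio_file": str(audio_path),
        "detected_language": language,
        "confidence": confidence,
        "threshold": threshold,
        "selected_model": model_name,
        "timestamp": datetime.now(timezone.utc).isoformat(),
    }
    # Hypothetical naming scheme: <audio file>.model_selection.json
    sidecar = Path(f"{audio_path}.model_selection.json")
    sidecar.write_text(json.dumps(record, indent=2), encoding="utf-8")
    return sidecar
```

Keeping the threshold in the sidecar alongside the confidence makes it possible to re-evaluate past decisions later if the cut-off is tuned.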