Feature Request: Automatic Model Selection Based on Language Detection #3

@erklu

Description

In the current proof of concept, the user selects the transcription model manually: either one of the default Whisper models or a KB-Whisper variant.
To streamline the user experience, we propose an automated default setting where the system selects the most suitable model based on language detection.

Expected Behavior
• If the user does not select a specific model, the system should automatically detect the language of the input audio.
• Based on the detected language, the system selects the optimal model:
    • If the language is Swedish → Use KB-Whisper Large
    • If another language is detected → Use Whisper Large (OpenAI)
    • If the language is unknown or confidence is low → Fall back to Whisper Large
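
The decision rules above can be sketched as a small pure function. This is only an illustration, not the project's implementation: the model identifiers, the function name select_model, and the default threshold of 0.90 (borrowed from the ">90%" example in the guidelines) are assumptions.

```python
from typing import Optional

# Hypothetical identifiers: the issue names the models but not how they are loaded.
KB_WHISPER_LARGE = "kb-whisper-large"
WHISPER_LARGE = "whisper-large"


def select_model(
    detected_language: Optional[str],
    confidence: float,
    user_choice: Optional[str] = None,
    threshold: float = 0.90,  # assumed default, taken from the ">90%" example
) -> str:
    """Return the model to use, following the expected behavior in this issue."""
    # An explicit user selection always wins; auto-selection is only the default.
    if user_choice:
        return user_choice
    # Swedish detected with high confidence: use KB-Whisper Large.
    if detected_language == "sv" and confidence > threshold:
        return KB_WHISPER_LARGE
    # Any other language, unknown language, or low confidence: fall back.
    return WHISPER_LARGE
```

Keeping the rule free of audio and model-loading code makes the fallback logic easy to unit test and to adjust later, for example by changing the threshold.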

Technical Guidelines

  1. Perform fast language detection on the first 30 seconds of the audio
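    • Whisper reports the detected language in the language field of the result dict; its detect_language method additionally returns per-language probabilities, which the confidence check in step 2 needs.
    • Alternatively, lighter-weight detectors such as langdetect or fastText can be considered. Note that these operate on text, so they would need a short preliminary transcript first.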
  2. Select the model based on the detected language
    • If sv (Swedish) is detected with high confidence (e.g., >90%), load KB-Whisper Large.
    • Otherwise, default to Whisper Large (OpenAI).
  3. Log the model selection decision
    • Store this information in a sidecar file for traceability and to allow analysis/improvement of the fallback logic over time.
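
A minimal sketch of steps 1 and 3, assuming the openai-whisper Python package is used for detection. The helper names (detect_language, write_sidecar), the choice of the small "base" model as detector, the sidecar file name, and the JSON fields are all assumptions made for illustration; the issue does not specify a format.

```python
import json
from datetime import datetime, timezone
from pathlib import Path


def detect_language(audio_path: str, detector_name: str = "base"):
    """Return (language_code, probability) for the first 30 seconds of audio."""
    # Imported lazily so the logging helper below works without openai-whisper.
    import whisper

    model = whisper.load_model(detector_name)  # small model assumed to keep detection fast
    audio = whisper.pad_or_trim(whisper.load_audio(audio_path))  # pad or trim to 30 s
    mel = whisper.log_mel_spectrogram(audio, n_mels=model.dims.n_mels).to(model.device)
    _, probs = model.detect_language(mel)  # dict mapping language code to probability
    language = max(probs, key=probs.get)
    return language, float(probs[language])


def write_sidecar(audio_path, language, confidence, model_name, threshold=0.90):
    """Write the selection decision next to the audio file and return the sidecar path."""
    source = Path(audio_path)
    record = {
        "audio_file": source.name,
        "detected_language": language,
        "confidence": confidence,
        "threshold": threshold,
        "selected_model": model_name,
        "timestamp": datetime.now(timezone.utc).isoformat(),
    }
    # Hypothetical naming scheme: clip.wav -> clip.model_selection.json
    sidecar = source.with_name(source.stem + ".model_selection.json")
    sidecar.write_text(json.dumps(record, indent=2), encoding="utf-8")
    return sidecar
```

Recording the threshold alongside the confidence lets later analysis replay the decision under a different cutoff without re-running detection.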