
Conversation

@shervinemami

This is a fairly crude implementation, including various hard-coded settings for Whisper's "base.en" English model, and probably only currently works on Linux & OS X since it has a hardcoded tmpfile path. But it's good enough to begin playing with.

@mallorbc

Glad you found my code helpful and added it here. The Whisper model takes either a wav file or an array (I'm not sure of the exact format).

However, I could not get the array input working in a timely manner, so I decided to just write to the filesystem. By using io.BytesIO it should be possible to handle it all in memory.
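A minimal sketch of the in-memory approach suggested above, using only the standard library. The function name `wav_bytes_in_memory` and the 16 kHz mono 16-bit format are illustrative assumptions, not taken from the PR:

```python
import io
import wave

def wav_bytes_in_memory(audio_data: bytes, sample_rate: int = 16000) -> io.BytesIO:
    """Wrap raw 16-bit mono PCM samples in a WAV container, entirely in memory."""
    buf = io.BytesIO()
    with wave.open(buf, "wb") as wf:
        wf.setnchannels(1)        # mono
        wf.setsampwidth(2)        # 16-bit samples
        wf.setframerate(sample_rate)
        wf.writeframes(audio_data)
    buf.seek(0)                   # rewind so a consumer can read from the start
    return buf

# The buffer behaves like an open file, so it can be handed to anything
# that accepts a file-like object, avoiding the temp-file round trip.
pcm = b"\x00\x01" * 1600          # 0.1 s of dummy audio at 16 kHz
buf = wav_bytes_in_memory(pcm)
with wave.open(buf, "rb") as wf:
    assert wf.getnframes() == 1600
```

Whether this helps in practice depends on whether the model-loading code accepts a file-like object rather than a path.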

shervin.emami added 2 commits October 25, 2022 19:21

@LexiconCode left a comment


Grab the parent directory and then point to whisper_server.py with an absolute path:

import os
import subprocess
import sys

# Build an absolute path to whisper_server.py relative to this file.
pardir = os.path.abspath(os.path.join(__file__, os.pardir))
whisper_server = os.path.abspath(os.path.join(pardir, "whisper_server.py"))
subprocess.Popen([sys.executable, whisper_server])

sys.executable is needed to locate the Python executable on Windows, since the system PATH cannot be relied on, especially with multiple Python installations. I'm curious whether this works just as well on Ubuntu.

@LexiconCode

LexiconCode commented Nov 5, 2022

OS-agnostic temp path for whisper_server.py:

import os
import tempfile

# Keep a reference to the TemporaryDirectory object itself (not just its
# .name string), otherwise it is garbage-collected and the directory is
# deleted while still in use, and cleanup() cannot be called later.
temp_dir = tempfile.TemporaryDirectory()
audio_filename = os.path.join(temp_dir.name, "whisper.wav")

temp_dir.cleanup()  # place near the end of the `die()` function

@shervinemami
Author

Thanks @LexiconCode for these two portability improvements, I've uploaded them now :-)

@daanzu
Owner

daanzu commented Nov 6, 2022

@shervinemami I actually don't think these changes are necessary to support using Whisper in KaldiAG. Since the alternative_dictation config parameter naturally supports taking a callable, I think all of the work can (and should) be placed in your user code: specifically your dragonfly loader. But perhaps I've missed something, so feel free to correct me or ask any questions!

I am adding a somewhat-related note here from gitter: You will likely find alternative dictation to work better for dictation utterances that don't include any "command parts". The problem is that, for the example you posted, KaldiAG tries its best to "cut out" the part where you preface the utterance by speaking "whisper", and only pass the rest of the audio to Whisper, but doing that is quite difficult and inexact. You might want to try something like having a command that enables a pure dictation rule ("<dictation>") for only the next utterance. This is what I have usually migrated to, although for different reasons (it allows me to collect a better corpus of audio training examples to further improve my speech model).
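A dependency-free sketch of the one-shot pattern described above: a command utterance arms a flag so the *next* utterance is treated as pure dictation, after which the flag clears itself. The class name, the trigger phrase "whisper mode", and the return strings are all illustrative assumptions; a real implementation would enable and disable a dragonfly grammar instead of returning strings:

```python
class OneShotDictation:
    """Treat exactly one utterance as pure dictation after an arming command."""

    def __init__(self) -> None:
        self.armed = False

    def handle(self, utterance: str) -> str:
        if self.armed:
            self.armed = False          # disarm after a single utterance
            return f"dictate: {utterance}"
        if utterance == "whisper mode":
            self.armed = True           # arm pure dictation for the next utterance
            return "armed"
        return f"command: {utterance}"

d = OneShotDictation()
assert d.handle("whisper mode") == "armed"
assert d.handle("hello world") == "dictate: hello world"
assert d.handle("hello world") == "command: hello world"   # back to commands
```

The benefit over mixed command-plus-dictation utterances is that the audio handed to the alternative recognizer is a clean, whole utterance with nothing to cut out.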
