WhisperWriter is a small speech-to-text app that uses OpenAI's Whisper model to auto-transcribe recordings from a user's microphone.
Once started, the script runs in the background and waits for a keyboard shortcut to be pressed (ctrl+alt+space by default, but this can be changed in the Configuration Options). When the shortcut is pressed, the app starts recording from your microphone. It will continue recording until you stop speaking or there is a long enough pause in your speech. While it is recording, a small status window is displayed that shows the current stage of the transcription process. Once the transcription is complete, the transcribed text will be automatically written to the active window.
The transcription can either be done locally through the Whisper Python package or through a request to OpenAI's API. By default, the app will use the API, but you can change this in the Configuration Options. If you choose to use the API, you will need to provide your OpenAI API key in a .env file. If you choose to transcribe using a local model, you will need to install the command-line tool ffmpeg and potentially Rust as well.
Fun fact: Almost the entirety of this project was pair-programmed with ChatGPT-4 and GitHub Copilot using VS Code. Practically every line, including most of this README, was written by AI. After the initial prototype was finished, WhisperWriter was used to write a lot of the prompts as well!
Before you can run this app, you'll need to have the following software installed:
- Git: https://git-scm.com/downloads
- Python 3.11: https://www.python.org/downloads/
  - The Whisper Python package is only compatible with Python versions >=3.7.
If you are running a local model, you will also need to install the command-line tool ffmpeg and add it to your PATH:
# on Ubuntu or Debian
sudo apt update && sudo apt install ffmpeg
# on Arch Linux
sudo pacman -S ffmpeg
# on macOS using Homebrew (https://brew.sh/)
brew install ffmpeg
# on Windows using Chocolatey (https://chocolatey.org/)
choco install ffmpeg
# on Windows using Scoop (https://scoop.sh/)
scoop install ffmpeg
If you are running into issues, you may need to install Rust. See Whisper Setup.
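Before switching to a local model, it can help to confirm that ffmpeg is actually reachable from your PATH. The following is a minimal sketch using only the Python standard library; the helper name find_ffmpeg is hypothetical and is not part of WhisperWriter.

```python
import shutil
from typing import Optional


def find_ffmpeg(executable: str = "ffmpeg") -> Optional[str]:
    """Return the full path of the executable if it is on PATH, else None."""
    return shutil.which(executable)


if __name__ == "__main__":
    path = find_ffmpeg()
    if path is None:
        print("ffmpeg was not found on PATH; install it before using a local model.")
    else:
        print(f"ffmpeg found at {path}")
```

If the script reports that ffmpeg is missing right after installation, opening a new terminal usually picks up the updated PATH.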
To set up and run the project, follow these steps:
git clone https://github.com/savbell/whisper-writer
cd whisper-writer
python -m venv venv
# For Linux and macOS:
source venv/bin/activate
# For Windows:
venv\Scripts\activate
pip install -r requirements.txt
To switch between running Whisper locally and using the OpenAI API, you need to modify the src\config.json file:
- If you prefer using the OpenAI API, set "use_api" to true. You will also need to set up your OpenAI API key in the next step.
- If you prefer using a local Whisper model, set "use_api" to false. You may also want to change the device that the model uses; see the Model Options. Make sure you followed the prerequisite steps and installed ffmpeg and Rust if necessary.
{
"use_api": true, // Change this value to false to run Whisper locally
...
}
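If you would rather flip this setting from a script than edit the file by hand, the sketch below shows one way to do it with the standard json module. The function name set_use_api is hypothetical, and the sketch assumes the file is strict JSON: the // comment in the snippet above is illustrative only and would not parse.

```python
import json
from pathlib import Path


def set_use_api(config_path: Path, use_api: bool) -> dict:
    """Set the "use_api" flag in a JSON config file and return the updated config."""
    config = json.loads(config_path.read_text(encoding="utf-8"))
    config["use_api"] = use_api
    # Write the whole config back, keeping every other option untouched.
    config_path.write_text(json.dumps(config, indent=4) + "\n", encoding="utf-8")
    return config
```

For example, set_use_api(Path("src/config.json"), False) would switch to the local model, assuming you run it from the repository root.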
Copy the ".env.example" file to a new file named ".env":
# For Linux and macOS
cp .env.example .env
# For Windows
copy .env.example .env
Open the ".env" file and add in your OpenAI API key:
OPENAI_API_KEY=<your_openai_key_here>
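A quick way to check that the key was saved, and that the placeholder was actually replaced, is sketched below. It is a simplified KEY=VALUE parser written for illustration; the helper names read_env_file and has_api_key are hypothetical, and the app itself may load the file differently.

```python
from pathlib import Path
from typing import Dict


def read_env_file(path: Path) -> Dict[str, str]:
    """Parse simple KEY=VALUE lines, skipping blank lines and # comments."""
    values: Dict[str, str] = {}
    for line in path.read_text(encoding="utf-8").splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop optional surrounding quotes around the value.
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def has_api_key(path: Path) -> bool:
    """Return True if OPENAI_API_KEY is set to something other than the <...> placeholder."""
    key = read_env_file(path).get("OPENAI_API_KEY", "")
    return bool(key) and not key.startswith("<")
```

Calling has_api_key(Path(".env")) from the repository root should return True once a real key is in place.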
Finally, run the app:
python run.py
WhisperWriter uses a configuration file to customize its behaviour. To set up the configuration, modify the src\config.json file:
{
"use_api": true,
"api_options": {
"model": "whisper-1",
"language": null,
"temperature": 0.0,
"initial_prompt": null
},
"local_model_options": {
"model": "base",
"device": null,
"language": null,
"temperature": 0.0,
"initial_prompt": null,
"condition_on_previous_text": true,
"verbose": false
},
"activation_key": "ctrl+alt+space",
"silence_duration": 900,
"writing_key_press_delay": 0.005,
"remove_trailing_period": false,
"add_trailing_space": true,
"remove_capitalization": false,
"print_to_terminal": true
}
- use_api: Set to true to use the OpenAI API for transcription. Set to false to use a local Whisper model. (Default: true)
- api_options: Contains options for the OpenAI API. See the API reference for more details.
  - model: The model to use for transcription. Currently only whisper-1 is available. (Default: "whisper-1")
  - language: The language code for the transcription in ISO-639-1 format. (Default: null)
  - temperature: Controls the randomness of the transcription output. Lower values (e.g., 0.0) make the output more focused and deterministic. (Default: 0.0)
  - initial_prompt: A string used as an initial prompt to condition the transcription. Set to null for no initial prompt. (Default: null)
- local_model_options: Contains options for the local Whisper model. See the function definition for more details.
  - model: The model to use for transcription. See available models and languages. (Default: "base")
  - device: The device to run the local Whisper model on. Options include cuda for NVIDIA GPUs, cpu for CPU-only processing, or null to let the system automatically choose the best available device. (Default: null)
  - language: The language code for the transcription in ISO-639-1 format. (Default: null)
  - temperature: Controls the randomness of the transcription output. Lower values (e.g., 0.0) make the output more focused and deterministic. (Default: 0.0)
  - initial_prompt: A string used as an initial prompt to condition the transcription. Set to null for no initial prompt. (Default: null)
  - condition_on_previous_text: Set to true to use the previously transcribed text as a prompt for the next transcription request. (Default: true)
  - verbose: Set to true for more detailed transcription output. (Default: false)
- activation_key: The keyboard shortcut to activate the recording and transcribing process. (Default: "ctrl+alt+space")
- silence_duration: The duration in milliseconds to wait for silence before stopping the recording. (Default: 900)
- writing_key_press_delay: The delay in seconds between each key press when writing the transcribed text. (Default: 0.005)
- remove_trailing_period: Set to true to remove the trailing period from the transcribed text. (Default: false)
- add_trailing_space: Set to true to add a trailing space to the transcribed text. (Default: true)
- remove_capitalization: Set to true to convert the transcribed text to lowercase. (Default: false)
- print_to_terminal: Set to true to print the script status and transcribed text to the terminal. (Default: true)
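To make the effect of the three text options concrete, here is a rough sketch of how remove_trailing_period, add_trailing_space, and remove_capitalization might be applied to a transcription before it is typed out. The function postprocess is hypothetical and only mirrors the descriptions above; the order of the steps and the initial whitespace trimming are assumptions, not the app's confirmed behaviour.

```python
def postprocess(
    text: str,
    remove_trailing_period: bool = False,
    add_trailing_space: bool = True,
    remove_capitalization: bool = False,
) -> str:
    """Apply the text options to a transcription (defaults match the config above)."""
    # Assumption: surrounding whitespace is trimmed before the options are applied.
    text = text.strip()
    if remove_trailing_period and text.endswith("."):
        text = text[:-1]
    if remove_capitalization:
        text = text.lower()
    if add_trailing_space:
        text += " "
    return text


if __name__ == "__main__":
    # repr() makes the trailing space visible.
    print(repr(postprocess("Hello world.")))
    print(repr(postprocess("Hello world.", remove_trailing_period=True, remove_capitalization=True)))
```

With the default settings only a trailing space is added, which lets you dictate several sentences in a row without them running together.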
If any of the configuration options are invalid or not provided, the program will use the default values.
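The fallback behaviour can be pictured as merging the user's config over a table of defaults. The sketch below is one possible way to do this and is not taken from the WhisperWriter source: the names DEFAULTS and with_defaults are hypothetical, only a subset of the options is listed, and "invalid" is assumed to mean "has a different type than the default".

```python
from typing import Any, Dict

# A subset of the documented defaults, for illustration only.
DEFAULTS: Dict[str, Any] = {
    "use_api": True,
    "local_model_options": {"model": "base", "device": None},
    "activation_key": "ctrl+alt+space",
    "silence_duration": 900,
    "print_to_terminal": True,
}


def _same_kind(value: Any, default: Any) -> bool:
    """Return True if value has the same basic type as default."""
    if isinstance(default, bool):
        return isinstance(value, bool)
    if isinstance(default, (int, float)):
        # bool is a subclass of int, so exclude it explicitly.
        return isinstance(value, (int, float)) and not isinstance(value, bool)
    return isinstance(value, type(default))


def with_defaults(user_config: Dict[str, Any], defaults: Dict[str, Any] = DEFAULTS) -> Dict[str, Any]:
    """Merge user_config over defaults, falling back when a value is missing or mistyped."""
    merged: Dict[str, Any] = {}
    for key, default in defaults.items():
        value = user_config.get(key, default)
        if isinstance(default, dict):
            # Recurse into nested option groups such as local_model_options.
            merged[key] = with_defaults(value if isinstance(value, dict) else {}, default)
        elif default is not None and not _same_kind(value, default):
            merged[key] = default
        else:
            # Options whose default is null accept any value in this sketch.
            merged[key] = value
    return merged
```

In this sketch, keys that do not appear in the defaults table are dropped from the result.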
We use Semantic Versioning for this project. For the available versions, see the tags on this repository.
The version format is MAJOR.MINOR.PATCH, where:
- MAJOR versions indicate potentially incompatible changes,
- MINOR versions indicate the addition of functionality in a backwards-compatible manner, and
- PATCH versions indicate backwards-compatible bug fixes.
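As an illustration of how the three parts of a version are compared, the sketch below parses a version string into a tuple and checks whether an upgrade crosses a MAJOR boundary. The names Version, parse_version, and is_breaking_upgrade are hypothetical, the optional leading "v" is an assumption about tag naming, and pre-release suffixes such as -beta are not handled.

```python
from typing import NamedTuple


class Version(NamedTuple):
    major: int
    minor: int
    patch: int


def parse_version(tag: str) -> Version:
    """Parse 'MAJOR.MINOR.PATCH', with an optional leading 'v', into a Version."""
    if tag.startswith("v"):
        tag = tag[1:]
    major, minor, patch = tag.split(".")
    return Version(int(major), int(minor), int(patch))


def is_breaking_upgrade(current: str, target: str) -> bool:
    """Return True if moving from current to target increases the MAJOR version."""
    return parse_version(target).major > parse_version(current).major
```

Because Version is a tuple, ordinary comparison works numerically part by part, so 1.2.10 sorts after 1.2.9 rather than before it as a plain string comparison would suggest.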
For detailed changes, please check the CHANGELOG.md file in this repository.
As of version 1.0.0, the following issues are known:
- Numba Deprecation Warning: When running the Whisper model locally, a numba deprecation warning is displayed. This is an issue with the Whisper Python package and will be fixed in a future release. The warning can be safely ignored.
- FP16 Not Supported on CPU Warning: A warning may show if you are running the local model on your CPU rather than a GPU using CUDA. This can be safely ignored.
Please note that this is not an exhaustive list and new issues can emerge over time. You can see all reported issues and their current status in our Issue Tracker. If you encounter a problem not listed here, please open a new issue with a detailed description and reproduction steps, if possible.
This project is licensed under the GNU General Public License. See the LICENSE file for details.