This project is focused on voice communication with ChatGPT.
The user speaks into the computer's microphone, and the application transcribes the audio recording into text using the OpenAI Whisper service. Then, if necessary, this text is translated into English (this functionality is not yet implemented). Afterwards, the OpenAI Chat API is called, using the 'gpt-3.5-turbo-0301' model. The response is then converted back into audio using the built-in speech synthesis support in the Microsoft Windows operating system.
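The "ask ChatGPT, then speak the answer" step could be sketched roughly like this. This is not the project's actual code; it is a minimal illustration of the public OpenAI chat completions endpoint and the Windows speech synthesizer (System.Speech NuGet package) that the project builds on:

```csharp
// Minimal sketch of the "ask ChatGPT, then speak the answer" step.
// NOT the project's implementation; assumes the System.Speech NuGet package (Windows only).
using System.Net.Http.Headers;
using System.Speech.Synthesis;
using System.Text;
using System.Text.Json;

var apiKey = Environment.GetEnvironmentVariable("OPENAI_API_KEY");

using var http = new HttpClient();
http.DefaultRequestHeaders.Authorization = new AuthenticationHeaderValue("Bearer", apiKey);

// Single-turn chat request against the model used by the project.
var request = new
{
    model = "gpt-3.5-turbo-0301",
    messages = new[] { new { role = "user", content = "Hello, who are you?" } }
};

var response = await http.PostAsync(
    "https://api.openai.com/v1/chat/completions",
    new StringContent(JsonSerializer.Serialize(request), Encoding.UTF8, "application/json"));
response.EnsureSuccessStatusCode();

// The assistant's reply is at choices[0].message.content.
using var json = JsonDocument.Parse(await response.Content.ReadAsStringAsync());
var reply = json.RootElement.GetProperty("choices")[0]
    .GetProperty("message").GetProperty("content").GetString() ?? "";

// Speak the reply with the built-in Windows speech synthesizer.
using var synth = new SpeechSynthesizer();
synth.SetOutputToDefaultAudioDevice();
synth.Speak(reply);
```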
(Note: it is recommended to have Windows speech support installed for the language you are communicating in; otherwise, the audio output will not work. To check whether the support is installed, go to Settings -> Time & Language -> Speech -> Installed voice packages. You can also use the list-voices command to list all available voices, like this: voice2gpt list-voices)
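Under the hood, listing voices comes down to the System.Speech API. A minimal sketch (not the project's actual implementation) of roughly what list-voices reports:

```csharp
// Roughly what `voice2gpt list-voices` shows: the voices installed in Windows.
// Illustrative only; requires the System.Speech NuGet package.
using System.Speech.Synthesis;

using var synth = new SpeechSynthesizer();
foreach (var voice in synth.GetInstalledVoices())
{
    var info = voice.VoiceInfo;
    Console.WriteLine($"{info.Name} - {info.Culture} ({info.Gender})");
}
```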
🛑 DISCLAIMER: This project was created in my spare time over two evenings, just for fun. Do not ask me to make urgent fixes or add new features. I may consider it, but I cannot promise anything. My time is limited. I have a job, a family, a dog, a garden, etc.
Before you begin, ensure you have met the following requirements:
- You have installed .NET SDK 7.0 or higher.
  To install the .NET SDK, follow these steps:
  - Download the .NET SDK installer from https://dotnet.microsoft.com/en-us/download/dotnet/7.0
  - Run the installer and follow the instructions.
  - Verify the installation by running the following command:
    dotnet --version
- Windows 10 or higher
  - This application uses the Microsoft Speech Platform, which is only available on Windows 10 or higher. In the future, I plan to add implementations for Azure Cognitive Services and the ElevenLabs text-to-speech service.
To install and run this project, follow these steps:
- Clone this repository to your local machine.
- Navigate to the solution folder 'src'.
- Install the required dependencies by running the following command:
  dotnet restore
- Build the application by running the following command:
  dotnet build
The application is configured using the appsettings.json file. The following parameters can be configured:
{
  "OpenAIServiceOptions": {
    "ApiKey": "",
    "OrgKey": ""
  },
  "DeepLTranslatorOptions": {
    "ApiKey": ""
  }
}
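These sections can be read with the standard Microsoft.Extensions.Configuration packages. A minimal sketch, assuming option classes that simply mirror the JSON keys (the project's real types may be named or wired up differently):

```csharp
// Sketch of binding appsettings.json to option classes; illustrative, not the project's code.
// Assumes the Microsoft.Extensions.Configuration.Json and .Binder packages.
using Microsoft.Extensions.Configuration;

var configuration = new ConfigurationBuilder()
    .SetBasePath(AppContext.BaseDirectory)
    .AddJsonFile("appsettings.json", optional: false)
    .Build();

var openAi = configuration.GetSection("OpenAIServiceOptions").Get<OpenAIServiceOptions>();
var deepL = configuration.GetSection("DeepLTranslatorOptions").Get<DeepLTranslatorOptions>();

Console.WriteLine($"OpenAI key configured: {!string.IsNullOrEmpty(openAi?.ApiKey)}");
Console.WriteLine($"DeepL key configured: {!string.IsNullOrEmpty(deepL?.ApiKey)}");

// Option classes mirroring the JSON sections above.
public sealed class OpenAIServiceOptions
{
    public string ApiKey { get; set; } = "";
    public string OrgKey { get; set; } = "";
}

public sealed class DeepLTranslatorOptions
{
    public string ApiKey { get; set; } = "";
}
```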
The OpenAI API key (OpenAIServiceOptions:ApiKey) and organization key (OpenAIServiceOptions:OrgKey) can be obtained by registering at https://platform.openai.com/ and creating a new API key. The organization key is not required, but it is recommended to use it to avoid rate limiting. You can find the organization key at https://platform.openai.com/account/org-settings.
The DeepL API key (DeepLTranslatorOptions:ApiKey) can be obtained by registering at https://www.deepl.com/pro and creating a new API key. This key is not required at the moment, because the translation functionality is not yet implemented.
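For reference only, since the translation step is not implemented yet: a sketch of what a call to the public DeepL REST API with this key could look like (the free-tier endpoint is shown; paid accounts use api.deepl.com instead):

```csharp
// Illustrative DeepL translation call; this functionality does not exist in the project yet.
using System.Text.Json;

var deepLKey = Environment.GetEnvironmentVariable("DEEPL_API_KEY");

using var http = new HttpClient();
http.DefaultRequestHeaders.Add("Authorization", $"DeepL-Auth-Key {deepLKey}");

// Translate a Slovak sentence to English.
var form = new FormUrlEncodedContent(new Dictionary<string, string>
{
    ["text"] = "Ahoj, ako sa máš?",
    ["target_lang"] = "EN"
});

var response = await http.PostAsync("https://api-free.deepl.com/v2/translate", form);
response.EnsureSuccessStatusCode();

using var json = JsonDocument.Parse(await response.Content.ReadAsStringAsync());
Console.WriteLine(json.RootElement.GetProperty("translations")[0].GetProperty("text").GetString());
```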
To use this project as a dotnet global tool, follow these steps:
- Navigate to the solution folder 'src'.
- Install the tool by running the following commands:
  dotnet pack ..\src\Voice2Gpt -c Release -o nupkg
  dotnet tool install --global --add-source ..\src\Voice2Gpt\nupkg Voice2Gpt.App.CLI
- Run the application for the first time with the following command:
  voice2gpt --help
To update the tool to a newer version, follow these steps:
- Navigate to the solution folder 'src'.
- Update the tool by running the following commands:
  dotnet pack ..\src\Voice2Gpt -c Release -o nupkg
  dotnet tool update --global --add-source ..\src\Voice2Gpt\nupkg Voice2Gpt.App.CLI
- Verify the update by running the following command:
  voice2gpt --help
This project is designed as a console application. It accepts the following parameters as input:
- chat: Starts the chat with the chatbot
  - Configuration parameters:
    - -l or --log: Enables logging (default: false)
    - -il or --input-language: Input language you will speak in (default: en)
    - -d or --device: Microphone device number (default: 0)
- list-devices: Lists all available microphone devices (see the sketch after this list)
  - Configuration parameters:
    - -l or --log: Enables logging (default: false)
- list-voices: Lists all available voices
  - Configuration parameters:
    - -l or --log: Enables logging (default: false)
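For illustration, the list-devices command can be imagined as a loop over NAudio's capture devices; the index printed is the value you would pass to -d/--device. This is a sketch, not the project's actual code:

```csharp
// Roughly what `voice2gpt list-devices` shows: capture devices visible to NAudio.
// Illustrative only; requires the NAudio package.
using NAudio.Wave;

for (int deviceNumber = 0; deviceNumber < WaveInEvent.DeviceCount; deviceNumber++)
{
    var capabilities = WaveInEvent.GetCapabilities(deviceNumber);
    Console.WriteLine($"{deviceNumber}: {capabilities.ProductName} ({capabilities.Channels} channels)");
}
```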
The following command will transcribe speech from device 1 (the default is 0) with input language 'sk':
voice2gpt -d 1 -il sk
The following command will transcribe speech from the default device 0 with input language 'sk':
voice2gpt -il sk
The following command will transcribe speech from the default device 0 with the default input language 'en' and enable logging:
voice2gpt -l
Run the following to view all available options:
voice2gpt --help
The following components are used in this project:
- OpenAI ChatGPT - for the main chat functionality
- OpenAI Whisper - for speech transcription
- DeepL Translator - for text translation
- Microsoft Speech Platform - for speech synthesis
- NAudio package - for audio recording and playback
This project is released under the MIT License. See LICENSE for further details.