This project implements a German-speaking AI assistant using Retrieval-Augmented Generation (RAG) with LangChain, Groq API, and Chroma for vector storage.
- Features
- Prerequisites
- Installation
- MacOS Users: Installing PyAudio
- Downloading Ollama Embedding Model
- Usage
- Adding New Skills
- Roadmap
- Contributing
- License
- Uses the Groq API with the llama-3.1-70b-versatile model
- Implements LangChain for data management and processing
- Utilizes Chroma for chat history storage
- Employs nomic-embed-text:latest from Ollama for embeddings
- Implements text chunking for efficient processing
- Allows user interaction through text or speech input
- Manages configuration data using dotenv
- Python 3.8+ (I use 3.11)
- Groq API access
- Chroma
- Ollama running locally
- Clone the repository:
git clone https://github.com/kabelklaus/AI-Voice-Assistant.git
cd AI-Voice-Assistant
- Install the required packages:
pip install -r requirements.txt
- Rename .env_example to .env and fill in your API keys and other configuration details:
mv .env_example .env
- Edit the .env file with your specific credentials and settings.
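As a minimal sketch of how the settings from .env could be checked at startup, the snippet below reports required variables that are unset or blank. It assumes the file has already been loaded into the process environment (for example by dotenv). The variable name GROQ_API_KEY and the helper missing_settings are assumptions for illustration; use the keys that actually appear in .env_example.

```python
import os

# Hypothetical variable names; replace them with the keys listed in .env_example.
REQUIRED_VARS = ("GROQ_API_KEY",)


def missing_settings(env=None, required=REQUIRED_VARS):
    """Return the required variable names that are unset or blank."""
    env = os.environ if env is None else env
    return [name for name in required if not str(env.get(name, "")).strip()]


if __name__ == "__main__":
    missing = missing_settings()
    if missing:
        print("Missing settings in .env:", ", ".join(missing))
    else:
        print("All required settings are present.")
```

Running a check like this before main.py starts makes a forgotten key show up as a clear message rather than as an authentication error later on.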
If you're using a Mac and need to install PyAudio, follow these steps:
- Install Xcode from the App Store and restart your computer.
- Run the following commands in sequence:
xcode-select --install
brew remove portaudio
brew install portaudio
pip3 install pyaudio
Note: Homebrew and some package builds only need the Xcode command line tools, so instead of installing the full Xcode app you can also just run:
xcode-select --install
This project uses the nomic-embed-text:latest model from Ollama for embeddings. To download this model:
- Ensure Ollama is installed on your system. If not, follow the installation instructions at Ollama's official website.
- Open a terminal or command prompt.
- Run the following command to download the model:
ollama pull nomic-embed-text:latest
This command will download and install the latest version of the nomic-embed-text model.
- Wait for the download to complete. The model size is approximately 670MB, so it may take a few minutes depending on your internet connection.
Once the model is downloaded, the project will use it through Ollama to generate embeddings.
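To confirm that the model is available before starting the assistant, one option is to ask the local Ollama server which models it has pulled. The sketch below assumes Ollama listens on its default address http://localhost:11434 and that its /api/tags endpoint returns JSON with a "models" list whose entries carry a "name" field. The function names has_model and model_is_installed are hypothetical and not part of this project.

```python
import json
import urllib.request

OLLAMA_TAGS_URL = "http://localhost:11434/api/tags"  # assumed default address


def has_model(tags_payload, model="nomic-embed-text:latest"):
    """Check a parsed /api/tags response for the given model name."""
    names = [entry.get("name", "") for entry in tags_payload.get("models", [])]
    return model in names


def model_is_installed(model="nomic-embed-text:latest", url=OLLAMA_TAGS_URL):
    """Ask the local Ollama server whether the model has been pulled."""
    try:
        with urllib.request.urlopen(url, timeout=5) as response:
            payload = json.load(response)
    except (OSError, ValueError):
        # Server not reachable or reply not valid JSON: treat as not installed.
        return False
    return has_model(payload, model)


if __name__ == "__main__":
    if model_is_installed():
        print("nomic-embed-text:latest is available.")
    else:
        print("Model not found. Is Ollama running, and was the model pulled?")
```

Keeping the parsing in has_model separate from the network call makes the check easy to test without a running server.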
- Ensure Ollama is running locally for embeddings.
- Run the main script:
python main.py
- Follow the prompts to interact with the AI assistant. You can choose between text or speech input.
To extend the assistant's capabilities, you can add new skills:
- Create a new Python file in the skills directory (e.g., new_skill.py).
- Implement the skill's functionality.
- Import and integrate the new skill in main.py.
- Add the skill to the response_prompt.py file to make the LLM aware of it. Include an example of how to use the skill to help the LLM better understand and utilize it.
Example addition to response_prompt.py:
If the user asks about the weather, respond with:
FUNCTION_CALL: get_weather(location)
For example:
- If the user asks "Wie ist das Wetter in Berlin?", respond with:
FUNCTION_CALL: get_weather("Berlin")
This will ensure that the LLM knows how to use the new skill and can incorporate it into its responses.
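On the Python side, the integration step could look roughly like the following sketch, which detects a FUNCTION_CALL line in the LLM output and dispatches it to a registered skill. Only the FUNCTION_CALL format is taken from the example above. The SKILLS registry, the placeholder get_weather, and handle_response are hypothetical names for illustration and are not necessarily how main.py is organized.

```python
import ast
import re

# Matches lines such as: FUNCTION_CALL: get_weather("Berlin")
CALL_PATTERN = re.compile(r"FUNCTION_CALL:\s*(\w+)\((.*)\)\s*$", re.MULTILINE)


def get_weather(location):
    """Placeholder skill; a real one would query a weather service."""
    return f"Weather lookup requested for {location}"


# Hypothetical registry mapping skill names to callables.
SKILLS = {"get_weather": get_weather}


def handle_response(text, skills=SKILLS):
    """Run the first FUNCTION_CALL found in the LLM output, else return the text."""
    match = CALL_PATTERN.search(text)
    if match is None:
        return text
    name, raw_args = match.groups()
    if name not in skills:
        return f"Unknown skill: {name}"
    try:
        # literal_eval only accepts literals, so arbitrary code is never executed.
        args = ast.literal_eval(f"({raw_args},)") if raw_args.strip() else ()
    except (ValueError, SyntaxError):
        return f"Could not parse arguments for {name}"
    return skills[name](*args)


if __name__ == "__main__":
    print(handle_response('FUNCTION_CALL: get_weather("Berlin")'))
```

Parsing the arguments with ast.literal_eval rather than eval is a deliberate choice here: the LLM output is untrusted text, so only plain literals such as strings and numbers are accepted.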
- We are considering switching from AstraDB to ChromaDB for vector storage to improve privacy.
- More skills will be added to enhance the assistant's capabilities.
- Regular database cleaning
  - Implement a cleaning strategy while preserving entries with "info_type" in metadata
  - Cleaning methods to consider:
    - Time-based cleaning
    - Redundancy removal
    - Context cleaning
    - Frequency-based cleaning
    - Size limitation
Our database cleaning strategy will focus on maintaining relevant information while optimizing storage and performance. Here's a breakdown of our approach:
- Time-based Cleaning:
  - Remove older entries not marked as user information (no "info_type" in metadata)
  - E.g., delete entries older than 30 days that aren't saved user information
- Redundancy Removal:
  - Identify very similar entries (high similarity in embeddings)
  - Keep only the most recent entry
  - Exclude entries with "info_type" in metadata from this process
- Context Cleaning:
  - Remove context information from past conversations that's no longer relevant
  - Always retain the most important or frequently accessed information
- Frequency-based Cleaning:
  - Remove entries that are rarely or never retrieved, except those with an "info_type"
- Size Limitation:
  - Implement a maximum number of entries or maximum database size
  - When the limit is reached, remove the oldest entries (excluding "info_type" entries)
This strategy will help us maintain a clean, efficient database while preserving crucial user information.
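Since this cleaning is still on the roadmap, the following is only a sketch of how two of the methods (time-based cleaning and size limitation) might work. It operates on plain Python dictionaries and does not use the Chroma API. The entry layout with "id", "timestamp", and "metadata" keys, the function names, and the max_entries parameter are assumptions for illustration; the 30-day threshold comes from the example above.

```python
from datetime import datetime, timedelta


def is_protected(entry):
    """Entries with "info_type" in their metadata are never cleaned."""
    return "info_type" in entry.get("metadata", {})


def time_based_cleaning(entries, now, max_age_days=30):
    """Drop unprotected entries older than max_age_days."""
    cutoff = now - timedelta(days=max_age_days)
    return [e for e in entries if is_protected(e) or e["timestamp"] >= cutoff]


def size_limit_cleaning(entries, max_entries):
    """Remove the oldest unprotected entries while the store exceeds max_entries.

    Protected entries are kept even if that leaves the store above the limit.
    """
    excess = len(entries) - max_entries
    if excess <= 0:
        return list(entries)
    removable = sorted(
        (e for e in entries if not is_protected(e)),
        key=lambda e: e["timestamp"],
    )
    drop_ids = {e["id"] for e in removable[:excess]}
    return [e for e in entries if e["id"] not in drop_ids]


if __name__ == "__main__":
    now = datetime.now()
    sample = [
        {"id": "old-chat", "timestamp": now - timedelta(days=45), "metadata": {}},
        {"id": "user-name", "timestamp": now - timedelta(days=45),
         "metadata": {"info_type": "name"}},
        {"id": "recent-chat", "timestamp": now - timedelta(days=1), "metadata": {}},
    ]
    kept = time_based_cleaning(sample, now)
    print([e["id"] for e in kept])
```

Both functions return new lists instead of deleting in place, so the result can be reviewed before any entries are actually removed from the database.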
Contributions are welcome! Please feel free to submit a Pull Request.
This project is licensed under the MIT License - see the LICENSE file for details.