Utility to automatically maintain embeddings of local files for usage with vector search and LLMs.

mossbanay/vectrekker

VecTrekker

Overview

VecTrekker is a small utility that walks a directory of files and syncs them to a vector database such as Pinecone. One use is indexing your notes so that an LLM chain can search them.

VecTrekker currently tokenizes with cl100k_base and embeds with OpenAI's text-embedding-ada-002 model.
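
The tokenizer matters because text-embedding-ada-002 accepts at most 8191 tokens per input, so a long file has to be split before it can be embedded. The sketch below shows one way to split a token sequence into windows that fit. It is an illustration only: the function name `chunk_tokens` and the `overlap` parameter are hypothetical, and this README does not say how VecTrekker itself splits files. In practice the tokens would come from something like `tiktoken.get_encoding("cl100k_base").encode(text)`.

```python
from typing import List, Sequence

# Documented per-input token limit of text-embedding-ada-002.
MAX_INPUT_TOKENS = 8191


def chunk_tokens(
    tokens: Sequence[int],
    max_tokens: int = MAX_INPUT_TOKENS,
    overlap: int = 0,
) -> List[List[int]]:
    """Split a token sequence into windows of at most max_tokens tokens.

    Hypothetical helper, not part of VecTrekker. Consecutive windows share
    `overlap` tokens so that text cut at a boundary appears in both.
    """
    if max_tokens <= 0:
        raise ValueError("max_tokens must be positive")
    if not 0 <= overlap < max_tokens:
        raise ValueError("overlap must be in [0, max_tokens)")

    step = max_tokens - overlap
    chunks: List[List[int]] = []
    for start in range(0, len(tokens), step):
        chunks.append(list(tokens[start:start + max_tokens]))
        # Stop once a window has reached the end of the sequence.
        if start + max_tokens >= len(tokens):
            break
    return chunks
```

Each chunk can then be decoded back to text and sent to the embedding model as a separate input.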

Quick-start guide

pip install vectrekker
vectrekker --dry-run

Edit ~/.vectrekker/config.toml (created automatically on first run) to add your Pinecone and OpenAI credentials.

Scheduling VecTrekker

It's suggested that you set up a cron job so that VecTrekker periodically rescans your directories and updates any files that are out of date. First install VecTrekker into its own virtual environment:

mkdir -p ~/.vectrekker
python3.10 -m venv ~/.vectrekker/.venv
~/.vectrekker/.venv/bin/pip install vectrekker

Then add an entry like the following with crontab -e. This example scans every two hours and appends both standard output and errors to the log:

0 */2 * * * date >> ~/.vectrekker/vectrekker.log && ~/.vectrekker/.venv/bin/vectrekker >> ~/.vectrekker/vectrekker.log 2>&1
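
A periodic scan needs some way to decide which files are out of date. The sketch below shows one common approach: hash every file and compare against the digests saved by the previous scan. It is a minimal illustration under stated assumptions, not VecTrekker's actual implementation. The names `file_digest` and `find_stale_files` and the JSON manifest file are hypothetical.

```python
import hashlib
import json
from pathlib import Path
from typing import Dict, List


def file_digest(path: Path) -> str:
    """Return the SHA-256 hex digest of a file's contents."""
    return hashlib.sha256(path.read_bytes()).hexdigest()


def find_stale_files(root: Path, manifest_path: Path) -> List[Path]:
    """Return files under root that are new or changed since the last scan.

    The manifest is a hypothetical JSON file mapping relative paths to
    digests. It is rewritten at the end of every scan.
    """
    previous: Dict[str, str] = {}
    if manifest_path.exists():
        previous = json.loads(manifest_path.read_text())

    current: Dict[str, str] = {}
    stale: List[Path] = []
    for path in sorted(root.rglob("*")):
        # Skip directories and the manifest itself if it lives under root.
        if not path.is_file() or path == manifest_path:
            continue
        key = str(path.relative_to(root))
        current[key] = file_digest(path)
        if previous.get(key) != current[key]:
            stale.append(path)

    manifest_path.write_text(json.dumps(current, indent=2))
    return stale
```

Only the returned files would need to be embedded again. A complete sync would also have to remove vectors for files that were deleted since the last scan, which this sketch does not handle.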

Vector database support

These are the currently supported vector databases.

Database | Support
Pinecone | Yes
