feat: Add support for VoyageAI embeddings API #1442

Open. Firbydude wants to merge 2 commits into base: develop

@Firbydude Firbydude commented Dec 25, 2024

Risks

A bug or behavior change in embedding provider selection could cause an unexpected switch of provider. Embeddings created before the switch would then be incompatible with new ones.

I removed the isOllama flag from the config. It appeared to be used only where the provider was already known to be Ollama, so it was redundant. This is technically a change in behavior: using Ollama with a URL override will no longer strip the trailing v1/.
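
For anyone who relied on the old stripping, here is a minimal sketch of doing it by hand before setting the override. The helper name `stripTrailingV1` and the regex are hypothetical and not taken from this PR or from the removed code; they only illustrate the behavior described above.

```typescript
// Hypothetical helper: removes a trailing "/v1" or "/v1/" segment from a URL override.
// This is an illustration of the behavior described in the PR, not the removed implementation.
function stripTrailingV1(url: string): string {
  // Match "/v1" optionally followed by "/" only at the very end of the string.
  return url.replace(/\/v1\/?$/, "");
}

// URLs without a trailing v1 segment pass through unchanged.
const normalized = stripTrailingV1("http://localhost:11434/v1/");
```

With this change merged, an override that ends in v1/ would be passed through as-is, so adjusting the configured URL is likely the simpler fix.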

Background

What does this PR do?

Added support for environment variables:

  • USE_VOYAGEAI_EMBEDDING
  • VOYAGEAI_API_KEY
  • VOYAGEAI_EMBEDDING_DIMENSIONS
  • VOYAGEAI_EMBEDDING_MODEL

Configuration follows existing patterns. Values for dimensions and model can be found in the VoyageAI API documentation.
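
As a rough illustration of how these variables could drive provider selection, here is a minimal sketch. It is not the code in embedding.ts: the function and type names, the precedence order between providers, and the fallback model and dimension values are all assumptions made for the example.

```typescript
// Hypothetical sketch; names and precedence are illustrative, not taken from embedding.ts.
type Env = Record<string, string | undefined>;

type EmbeddingProvider = "OpenAI" | "Ollama" | "GaiaNet" | "VoyageAI" | "BGE";

interface EmbeddingSelection {
  provider: EmbeddingProvider;
  model?: string;
  dimensions?: number;
}

// Only a case-insensitive "true" enables a flag; "FALSE", "" and undefined disable it.
function isEnabled(value: string | undefined): boolean {
  return value?.trim().toLowerCase() === "true";
}

function selectEmbeddingProvider(env: Env): EmbeddingSelection {
  // Assumed precedence: the first enabled flag wins.
  if (isEnabled(env.USE_OPENAI_EMBEDDING)) return { provider: "OpenAI" };
  if (isEnabled(env.USE_OLLAMA_EMBEDDING)) return { provider: "Ollama" };
  if (isEnabled(env.USE_GAIANET_EMBEDDING)) return { provider: "GaiaNet" };
  if (isEnabled(env.USE_VOYAGEAI_EMBEDDING)) {
    if (!env.VOYAGEAI_API_KEY) {
      throw new Error("VOYAGEAI_API_KEY must be set when USE_VOYAGEAI_EMBEDDING is true");
    }
    // Placeholder defaults (assumption); verify against the VoyageAI API documentation.
    const model = env.VOYAGEAI_EMBEDDING_MODEL ?? "voyage-3-lite";
    const dimensions = Number.parseInt(env.VOYAGEAI_EMBEDDING_DIMENSIONS ?? "512", 10);
    if (Number.isNaN(dimensions)) {
      throw new Error("VOYAGEAI_EMBEDDING_DIMENSIONS must be an integer");
    }
    return { provider: "VoyageAI", model, dimensions };
  }
  // No remote provider enabled: fall back to the local BGE model.
  return { provider: "BGE" };
}
```

Taking the environment as a parameter rather than reading `process.env` directly keeps the selection logic easy to unit test.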

Some minor clean-up of the embedding.ts file.

Added unit tests around embedding configuration.

What kind of change is this?

Feature

Why are we doing this? Any context or related work?

Anthropic does not offer an embeddings API of its own, but recommends Voyage. Voyage supports various model sizes and domains.

Documentation changes needed?

  • Updated .env.example to include new settings.
  • Removed the unused function getEmbeddingType. Some API docs still reference it. Is there a doc generation step?

Testing

Detailed testing steps

Using Voyage AI

USE_VOYAGEAI_EMBEDDING=true
VOYAGEAI_API_KEY=<redacted>

Logs:

 ፧ DEBUG
   Getting remote embedding using provider: 
   VoyageAI 
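
For reference, a remote call to Voyage could look roughly like the sketch below. It reflects my understanding of Voyage's public REST embeddings endpoint (bearer auth, `input` and `model` in the JSON body, vectors under `data[].embedding`) and is not the request code from this PR; the function names are hypothetical.

```typescript
// Sketch only: request and response shapes as I understand Voyage's public REST API.
interface VoyageRequest {
  url: string;
  init: { method: "POST"; headers: Record<string, string>; body: string };
}

// Builds the arguments for fetch(url, init) without performing any network I/O.
function buildVoyageEmbeddingRequest(apiKey: string, model: string, input: string): VoyageRequest {
  return {
    url: "https://api.voyageai.com/v1/embeddings",
    init: {
      method: "POST",
      headers: {
        "Content-Type": "application/json",
        Authorization: `Bearer ${apiKey}`,
      },
      body: JSON.stringify({ input: [input], model }),
    },
  };
}

// Extracts the first embedding vector from a parsed response body.
function parseVoyageEmbedding(body: unknown): number[] {
  const data = (body as { data?: { embedding?: number[] }[] } | null)?.data;
  const embedding = data?.[0]?.embedding;
  if (!Array.isArray(embedding)) {
    throw new Error("Unexpected embeddings response shape");
  }
  return embedding;
}
```

A caller would pass the built `url` and `init` to `fetch`, parse the JSON response, and hand it to `parseVoyageEmbedding`. Keeping the builder and parser free of I/O means both can be covered by unit tests without a real API key.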

Using Local Model

USE_OPENAI_EMBEDDING=FALSE
USE_OLLAMA_EMBEDDING=FALSE
USE_GAIANET_EMBEDDING=FALSE
USE_VOYAGEAI_EMBEDDING=FALSE

Logs:

 ፧ DEBUG
   Preprocessing text: 
   {"input":"hey charl! updated your embeddings! what's my real name?","length":56} 

 ፧ DEBUG
   Knowledge query: 
   {"original":"hey charl! updated your embeddings! what's my real name?","processed":"hey charl updated your embeddings whats my real name?","length":53} 

 ፧ DEBUG
   Embedding request: 
   {"modelProvider":"anthropic","useOpenAI":"FALSE","input":"hey charl updated your embeddings whats my real na...","inputType":"string","inputLength":53,"isString":true,"isEmpty":false} 

 ["፧ DEBUG - Inside getLocalEmbedding function"] 

 ["፧ Initializing BGE embedding model..."] 

 ፧ DEBUG
   Generating embedding for input: 
   {"inputLength":53,"inputPreview":"hey charl updated your embeddings whats my real name?..."} 

 ፧ DEBUG
   Raw embedding from BGE: 

Discord username

firbydude

@Firbydude Firbydude changed the base branch from main to develop December 25, 2024 03:41
@shakkernerd shakkernerd changed the title Add support for VoyageAI embeddings API feat: Add support for VoyageAI embeddings API Dec 26, 2024