-
I'm concerned about the quality implications of a "two step" approach. Vision-language models like Jina CLIP are specifically trained to create embeddings in a shared semantic space where images and text align directly. The two-step process (image --> text description --> text embedding) introduces a significant information bottleneck: the text description loses visual details that CLIP would capture directly from pixels, and a text-only embedding model was never trained on vision-text alignment. This will result in notably lower quality embeddings compared to local CLIP.

A multimodal embedding model is really what's necessary here for this to work well. Ollama doesn't currently support it (see the open feature request here), so it's unlikely that we'd want to implement anything until multimodal embedding models are at least supported there. Additionally, we just opened an official feature request in the queue here: #21228
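To make the bottleneck concrete, here is a minimal sketch contrasting the two paths. It assumes the sentence-transformers CLIP checkpoint `clip-ViT-B-32` and a local `snapshot.jpg`; the description string stands in for hypothetical genai output. Using CLIP's own text encoder for the description actually understates the problem, since the proposal would run the description through a separate text-only embedding model on top of the lossy captioning step.

```python
# Minimal sketch: direct multimodal embedding vs. embedding a text description.
# Assumes sentence-transformers with the clip-ViT-B-32 checkpoint and a local
# snapshot.jpg; the description below stands in for hypothetical genai output.
from PIL import Image
from sentence_transformers import SentenceTransformer, util

clip = SentenceTransformer("clip-ViT-B-32")  # images and text share one embedding space

image = Image.open("snapshot.jpg")
query = "a person carrying a package to the front door"

# Direct path: embed the pixels and the search query in the same space.
image_emb = clip.encode(image)
query_emb = clip.encode(query)
direct_score = util.cos_sim(image_emb, query_emb)

# Two-step path: embed only a text description of the image. Visual details the
# description omits (clothing color, the package itself, lighting) are lost for good.
description = "a person is standing outside a house"  # hypothetical genai output
desc_emb = clip.encode(description)
two_step_score = util.cos_sim(desc_emb, query_emb)

print(float(direct_score), float(two_step_score))
```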
-
Thanks for creating the discussion. It's worth pointing out that 0.17 is already frozen for new features, so this would go into 0.18, which has many months of work ahead of it; we will most likely approach this slowly as we consider the best approach.
-
I'd like to add support for remote embedding providers. One option that seems relatively simple to add is using the genai providers to do embeddings. The main issue with this approach is that none of the existing providers support multi-modal embedding APIs/models. Another mechanism is to support clip-as-service, which uses the API maintained by Jina and seems to support the same models Frigate already uses.
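As a rough, hypothetical sketch of the abstraction (none of these class names, endpoint paths, or payload shapes exist in Frigate or clip-as-service today), something like this would let the rest of the code stay agnostic about which remote backend is configured:

```python
# Hypothetical remote embedding provider abstraction; class names, endpoint paths,
# and payload shapes are illustrative placeholders, not an existing API.
import base64
from abc import ABC, abstractmethod

import requests


class RemoteEmbeddingProvider(ABC):
    """Surface Frigate could code against regardless of which backend is configured."""

    @abstractmethod
    def embed_text(self, texts: list[str]) -> list[list[float]]: ...

    @abstractmethod
    def embed_images(self, images: list[bytes]) -> list[list[float]]: ...


class HttpClipProvider(RemoteEmbeddingProvider):
    """Talks to a CLIP-style service that embeds both modalities into one space."""

    def __init__(self, base_url: str, timeout: float = 30.0):
        self.base_url = base_url.rstrip("/")
        self.timeout = timeout

    def embed_text(self, texts: list[str]) -> list[list[float]]:
        # The route and JSON shape are placeholders for whatever the chosen
        # service actually exposes.
        resp = requests.post(
            f"{self.base_url}/embed/text", json={"inputs": texts}, timeout=self.timeout
        )
        resp.raise_for_status()
        return resp.json()["embeddings"]

    def embed_images(self, images: list[bytes]) -> list[list[float]]:
        payload = {"inputs": [base64.b64encode(i).decode() for i in images]}
        resp = requests.post(
            f"{self.base_url}/embed/image", json=payload, timeout=self.timeout
        )
        resp.raise_for_status()
        return resp.json()["embeddings"]
```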
One way to support embeddings with genai (with the currently supported API schemas/models) is to use the image models to produce a text description of each image, and then embed that text instead of the image. This will likely work with some tweaking of the prompts to make the descriptions useful for this purpose.
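A sketch of that two-step flow, assuming an Ollama instance with a vision model (e.g. llava) for descriptions and a text embedding model (e.g. nomic-embed-text). The model names, prompt, and address are examples only, and older Ollama versions expose `/api/embeddings` with a `prompt` field instead of `/api/embed`.

```python
# Two-step flow sketch: describe the image with a vision model, then embed the
# description with a text embedding model. Model names, prompt, and address are
# examples; endpoint shapes follow Ollama's documented API but may vary by version.
import base64

import requests

OLLAMA_URL = "http://ollama:11434"  # example address


def describe_image(image_bytes: bytes) -> str:
    """Step 1: ask a vision-capable genai model for a search-oriented description."""
    resp = requests.post(
        f"{OLLAMA_URL}/api/generate",
        json={
            "model": "llava",
            "prompt": "Describe the people, vehicles, and objects in this image "
                      "in one detailed sentence suitable for search.",
            "images": [base64.b64encode(image_bytes).decode()],
            "stream": False,
        },
        timeout=120,
    )
    resp.raise_for_status()
    return resp.json()["response"]


def embed_text(text: str) -> list[float]:
    """Step 2: embed the description with a text-only embedding model."""
    resp = requests.post(
        f"{OLLAMA_URL}/api/embed",
        json={"model": "nomic-embed-text", "input": text},
        timeout=60,
    )
    resp.raise_for_status()
    return resp.json()["embeddings"][0]


with open("snapshot.jpg", "rb") as f:
    vector = embed_text(describe_image(f.read()))
```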
Another thing to consider is whether we need support for multiple genai providers, especially as we start leveraging features that may be present in some providers and not in others.
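One hypothetical way to express that is a per-provider capability table, so features are only enabled where the configured provider supports them. The entries below are illustrative, not a definitive survey of what each provider offers.

```python
# Hypothetical per-provider capability flags; names and entries are illustrative.
from dataclasses import dataclass


@dataclass(frozen=True)
class ProviderCapabilities:
    text_embeddings: bool = False
    image_embeddings: bool = False   # true multimodal embedding support
    vision_description: bool = False  # can caption images for the two-step path


CAPABILITIES = {
    "ollama": ProviderCapabilities(text_embeddings=True, vision_description=True),
    "clip_service": ProviderCapabilities(text_embeddings=True, image_embeddings=True),
}


def supports_direct_image_search(provider: str) -> bool:
    return CAPABILITIES.get(provider, ProviderCapabilities()).image_embeddings
```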
As the feature set evolves, we may also need to detect prompt/model changes that indicate a reindex is required. For example, if Ollama adds support for direct multi-modal embeddings, Frigate should add support in a way that lets users opt in to the new feature or keep the old mechanism to avoid forcing a reindex.
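One possible approach, sketched below, is to persist a fingerprint of every setting that affects the stored vectors and compare it on startup; the key names and storage path are hypothetical.

```python
# Sketch: detect that a reindex is needed by persisting a fingerprint of the
# settings that affect embeddings. Key names and the storage path are hypothetical.
import hashlib
import json
from pathlib import Path

FINGERPRINT_FILE = Path("/config/.embeddings_fingerprint")  # example location


def embedding_fingerprint(provider: str, model: str, prompt: str, direct_multimodal: bool) -> str:
    """Hash every setting whose change would make existing vectors incomparable."""
    payload = json.dumps(
        {
            "provider": provider,
            "model": model,
            "prompt": prompt,
            "direct_multimodal": direct_multimodal,
        },
        sort_keys=True,
    )
    return hashlib.sha256(payload.encode()).hexdigest()


def reindex_required(current: str) -> bool:
    """True when the stored fingerprint is missing or differs from the current one."""
    if not FINGERPRINT_FILE.exists():
        return True
    return FINGERPRINT_FILE.read_text().strip() != current


def save_fingerprint(current: str) -> None:
    FINGERPRINT_FILE.write_text(current)
```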
I guess another point to discuss is whether this feature is necessary at all. My opinion is that as models grow larger and require more specialized hardware, it's very useful to have a dedicated system for LLMs and other large AI models, with the GPUs and other hardware needed to support them, set up once and leveraged by other systems via an API. Additionally, since most embedding use cases are a non-critical function of a security system, it also makes sense to segregate this processing to maintain stability of the core system as load goes up.