-
Notifications
You must be signed in to change notification settings - Fork 81
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Vector Similarity Index #104
base: main
Are you sure you want to change the base?
Conversation
This is probably at the point where it is ready for review. I've successfully used it to store and index both image and text vectors so it's already usable if you're willing to do the search part via the redis client (until we add VSS querying support to this package). I don't know why the node 14 build started playing up part way through - all the exact same dependencies worked on other branches AFAICT, but I needed to install esbuild directly. |
@@ -56,6 +56,12 @@ export class JsonSchemaBuilder<TEntity extends Entity> extends SchemaBuilder<TEn | |||
...this.buildWeight(fieldDef), | |||
...this.buildIndexed(fieldDef), | |||
] | |||
case 'binary': | |||
if (fieldDef.vector) | |||
console.warn(`You have marked the ${fieldDef.type} field '${field}' as vector but RediSearch doesn't support the VECTOR argument on a field for JSON. Ignored.`); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm pretty sure it is supported
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've re-enabled it, but it doesn't work when I tried it (against the latest redis-stack-server
image on docker hub, and with a numeric array for the field). I based the "not supported" on the comment on Redis Insight:
Maybe it's soon-to-be-released as there are JSON + Vector commits for RediSearch ...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I haven't been able to find any definitive docs on what format the JSON vector is expecting - maybe it isn't the encoded bytes but instead the array of floats? If so, the binary field should probably become vector instead so the same value can be set regardless of it being HASH or JSON, or the lib should convert it to a typed array of floats.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Had a few more attempts, but so far haven't been able to get vector similarity working with JSON. Tried array of bytes, array of floats, all result in index failures (using latest redis-stack-server image 7.0.0-RC6)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@CaptainCodeman VSS doesn't work with JSON yet. This is a feature we are working on! Bear with us
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks!, that's what I thought / deduced - any discussion about the format to use is maybe moot for now then :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@CaptainCodeman, we have basic JSON support on the edge
build of RediSearch. This is the master
branch build. You can see a simple test in https://github.com/RediSearch/RediSearch/pull/2704/files in test_vecsim.py
(or in test_vecsim.py
if you pull RediSearch master
branch)
I want to make sure this PR will be future-proof with our support for VSS on JSON
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks, I'll wait for it to be available in one of the docker images before picking this up again, but I think it will make sense for the field type to be a typed array (Float32Array
) rather than a buffer.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks like this may be out now, so I'll revisit this PR ...
88119b0
to
f1e2163
Compare
Codecov Report
Additional details and impacted files@@ Coverage Diff @@
## main #104 +/- ##
=======================================
Coverage ? 98.65%
=======================================
Files ? 35
Lines ? 3047
Branches ? 600
=======================================
Hits ? 3006
Misses ? 41
Partials ? 0 Help us with your feedback. Take ten seconds to tell us how you rate us. Have a feature suggestion? Share it here. ☔ View full report at Codecov. |
Finally working to index both HASH and JSON data structures with vector similarity. The tests can use some refactoring to re-use the common code, but it's usable to build the index and store the data correctly. |
Based on example in RedisInsight-v2
Redis client changes encoding when passing a JS object to write Writing fields separately correctly encodes binary data See: redis/node-redis#2139
From redis/node-redis#2139 (comment) (doesn't appear to be fixed, even in 4.1.1 release)
Store binary field as Base64 string in JSON (need to check on best practice) Add unit test to verify vector search operation
Why suddenly failing?
This reverts commit 89cf8bd.
Try adding esbuild as direct dependency
Remove encding fix
Enable vector indexing
Regen package-lock.json
try without esbuild
017b7a3
to
ff0800a
Compare
+1 for merging this (I'm not a committer/reviewer, just would like to see it added) |
No worries. It will be merged. Probably early next year. Like February. Just a matter or timing. |
I notice vector fields now support FLOAT64 as well as FLOAT32 for the type parameter so that should probably be accounted for. Ideally it would determine that from the model, whether it had a |
+1 for merging, although I do think support for FLOAT64 is important. |
+1 for merging |
Another vote for merging (actually I've voted twice...) |
Has this been abandoned? This is a very useful feature for redis-om with the popularity of LLM and Embedding documents. |
Kinda. It would need some effort to rebase onto the latest version and adapt to wherever the lib has diverged since this was submitted, but I don't personally have time to devote to doing that right now. I've mostly just been using the node redis client directly or else my own decorator-based mapper / index builder that I've been experimenting with. |
The field type / index building part of supporting Vector Similarity
Part of #85
This is just the index-building and support for saving / loading objects with binary Buffers. The search part is still remaining to be done although it is possible to use with the
redis
client directly in case anyone wants to experiment with it.Also adds support for storing binary data, even if it isn't a vector indexed (e,.g. a profile object with an avatar image, product with photo etc...)