Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Vector Similarity Index #104

Open
wants to merge 28 commits into
base: main
Choose a base branch
from
Open

Conversation

CaptainCodeman
Copy link
Contributor

@CaptainCodeman CaptainCodeman commented Jul 1, 2022

The field type / index building part of supporting Vector Similarity

Part of #85

This is just the index-building and support for saving / loading objects with binary Buffers. The search part is still remaining to be done although it is possible to use with the redis client directly in case anyone wants to experiment with it.

Also adds support for storing binary data, even if it isn't a vector indexed (e,.g. a profile object with an avatar image, product with photo etc...)

@CaptainCodeman
Copy link
Contributor Author

This is probably at the point where it is ready for review.

I've successfully used it to store and index both image and text vectors so it's already usable if you're willing to do the search part via the redis client (until we add VSS querying support to this package).

I don't know why the node 14 build started playing up part way through - all the exact same dependencies worked on other branches AFAICT, but I needed to install esbuild directly.

@CaptainCodeman
Copy link
Contributor Author

Here's some examples of the image matching, the top left image in each set is the source that was searched for and the rest are the matches (out of about 1,500 images - a mix of paintings, photos and digital art). Perfect for a "you might also like ..." product page:

1
Screen Shot 2022-07-06 at 10 42 38 AM

2
Screen Shot 2022-07-06 at 10 42 47 AM

3
Screen Shot 2022-07-06 at 10 42 01 AM

4
Screen Shot 2022-07-06 at 10 42 56 AM

5
Screen Shot 2022-07-06 at 10 43 49 AM

6
Screen Shot 2022-07-06 at 10 43 41 AM

7
Screen Shot 2022-07-06 at 10 42 47 AM

8
Screen Shot 2022-07-06 at 10 43 01 AM

9
Screen Shot 2022-07-06 at 10 43 28 AM

10
Screen Shot 2022-07-06 at 10 43 15 AM

11
Screen Shot 2022-07-06 at 10 43 53 AM

@chayim chayim requested a review from leibale July 7, 2022 06:15
README.md Outdated Show resolved Hide resolved
lib/client.ts Outdated Show resolved Hide resolved
lib/entity/fields/entity-binary-field.ts Outdated Show resolved Hide resolved
@@ -56,6 +56,12 @@ export class JsonSchemaBuilder<TEntity extends Entity> extends SchemaBuilder<TEn
...this.buildWeight(fieldDef),
...this.buildIndexed(fieldDef),
]
case 'binary':
if (fieldDef.vector)
console.warn(`You have marked the ${fieldDef.type} field '${field}' as vector but RediSearch doesn't support the VECTOR argument on a field for JSON. Ignored.`);
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm pretty sure it is supported

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've re-enabled it, but it doesn't work when I tried it (against the latest redis-stack-server image on docker hub, and with a numeric array for the field). I based the "not supported" on the comment on Redis Insight:

Screen Shot

Screen Shot 1

Maybe it's soon-to-be-released as there are JSON + Vector commits for RediSearch ...

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I haven't been able to find any definitive docs on what format the JSON vector is expecting - maybe it isn't the encoded bytes but instead the array of floats? If so, the binary field should probably become vector instead so the same value can be set regardless of it being HASH or JSON, or the lib should convert it to a typed array of floats.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Had a few more attempts, but so far haven't been able to get vector similarity working with JSON. Tried array of bytes, array of floats, all result in index failures (using latest redis-stack-server image 7.0.0-RC6)

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@CaptainCodeman VSS doesn't work with JSON yet. This is a feature we are working on! Bear with us

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks!, that's what I thought / deduced - any discussion about the format to use is maybe moot for now then :)

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@CaptainCodeman, we have basic JSON support on the edge build of RediSearch. This is the master branch build. You can see a simple test in https://github.com/RediSearch/RediSearch/pull/2704/files in test_vecsim.py (or in test_vecsim.py if you pull RediSearch master branch)
I want to make sure this PR will be future-proof with our support for VSS on JSON

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, I'll wait for it to be available in one of the docker images before picking this up again, but I think it will make sense for the field type to be a typed array (Float32Array) rather than a buffer.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks like this may be out now, so I'll revisit this PR ...

package.json Outdated Show resolved Hide resolved
@CaptainCodeman CaptainCodeman force-pushed the 85-vector branch 2 times, most recently from 88119b0 to f1e2163 Compare December 13, 2022 18:48
@codecov-commenter
Copy link

codecov-commenter commented Dec 13, 2022

Codecov Report

❗ No coverage uploaded for pull request base (main@3c3d8e6). Click here to learn what that means.
Patch has no changes to coverable lines.

Additional details and impacted files
@@           Coverage Diff           @@
##             main     #104   +/-   ##
=======================================
  Coverage        ?   98.65%           
=======================================
  Files           ?       35           
  Lines           ?     3047           
  Branches        ?      600           
=======================================
  Hits            ?     3006           
  Misses          ?       41           
  Partials        ?        0           

Help us with your feedback. Take ten seconds to tell us how you rate us. Have a feature suggestion? Share it here.

☔ View full report at Codecov.
📢 Do you have feedback about the report comment? Let us know in this issue.

@CaptainCodeman
Copy link
Contributor Author

Finally working to index both HASH and JSON data structures with vector similarity.

The tests can use some refactoring to re-use the common code, but it's usable to build the index and store the data correctly.

Based on example in RedisInsight-v2
Redis client changes encoding when passing a JS object to write
Writing fields separately correctly encodes binary data

See: redis/node-redis#2139
From redis/node-redis#2139 (comment)
(doesn't appear to be fixed, even in 4.1.1 release)
Store binary field as Base64 string in JSON (need to check on best practice)
Add unit test to verify vector search operation
Why suddenly failing?
Try adding esbuild as direct dependency
@simon-two
Copy link

+1 for merging this (I'm not a committer/reviewer, just would like to see it added)

@guyroyse
Copy link
Contributor

No worries. It will be merged. Probably early next year. Like February. Just a matter or timing.

@CaptainCodeman
Copy link
Contributor Author

I notice vector fields now support FLOAT64 as well as FLOAT32 for the type parameter so that should probably be accounted for.

Ideally it would determine that from the model, whether it had a Float32Array or Float64Array property (instead of Buffer) but that would need reflect-metadata and be more suited to a TypeScript decorator mapping approach which I might have a play around with.

@guyroyse guyroyse mentioned this pull request Jan 3, 2023
@Shaun-Regenbaum
Copy link

+1 for merging, although I do think support for FLOAT64 is important.

@anaibol
Copy link

anaibol commented Feb 19, 2023

+1 for merging

@guyroyse guyroyse mentioned this pull request Mar 21, 2023
@simon-two
Copy link

Another vote for merging (actually I've voted twice...)

@mauriciocirelli
Copy link

mauriciocirelli commented Apr 16, 2024

Has this been abandoned? This is a very useful feature for redis-om with the popularity of LLM and Embedding documents.

@CaptainCodeman
Copy link
Contributor Author

Has this been abandoned? This is a very useful feature for redis-om with the popularity of LLM and Embedding documents.

Kinda. It would need some effort to rebase onto the latest version and adapt to wherever the lib has diverged since this was submitted, but I don't personally have time to devote to doing that right now.

I've mostly just been using the node redis client directly or else my own decorator-based mapper / index builder that I've been experimenting with.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

10 participants