Vector Similarity Index #104

Open
wants to merge 28 commits into
base: main
653da6c
Vector Similarity field definition & builder
CaptainCodeman Jun 30, 2022
a7daafc
Create vector search integration test
CaptainCodeman Jun 30, 2022
0d1ab37
Allow setting Buffer field for hash data
CaptainCodeman Jun 30, 2022
3991489
Change how hash data is written to prevent Buffer encoding
CaptainCodeman Jun 30, 2022
48636ef
Nicer solution to Buffer encoding issue
CaptainCodeman Jun 30, 2022
11945ba
Vector index is only supported for HASH structures
CaptainCodeman Jul 1, 2022
b2d61d9
Add note to revert hSet tweak when fix is released
CaptainCodeman Jul 1, 2022
2629cec
Change `vector` field to `binary`, keep vector as the index option
CaptainCodeman Jul 1, 2022
143ef47
Refactor test to cleanup in case of failure
CaptainCodeman Jul 1, 2022
f340570
Describe binary \ vector in readme
CaptainCodeman Jul 2, 2022
00f08d4
Kick CI build
CaptainCodeman Jul 2, 2022
9bdbe30
Try without node 14
CaptainCodeman Jul 2, 2022
bc52b64
Make vector result check less strict
CaptainCodeman Jul 2, 2022
7b80089
Revert "Try without node 14"
CaptainCodeman Jul 2, 2022
17ea346
Return buffers from Redis for HASH structures
CaptainCodeman Jul 4, 2022
0e7c340
Handle string or Buffer passed to fromRedisHash
CaptainCodeman Jul 4, 2022
ce04c79
Test for entity binary field encoding / decoding
CaptainCodeman Jul 4, 2022
c13f2b2
Need to return buffers from search as well
CaptainCodeman Jul 4, 2022
35037a6
Fix node 14 CI (?)
CaptainCodeman Jul 6, 2022
ecd5864
Oxford comma
CaptainCodeman Jul 8, 2022
60f6c6e
Update to node-redis 4.2.0
CaptainCodeman Jul 8, 2022
1156558
Store binary as numeric array for Json data structure
CaptainCodeman Jul 8, 2022
6505e05
Remove direct esbuild dependency
CaptainCodeman Jul 8, 2022
57ec10d
Re-add esbuild dependency
CaptainCodeman Jul 8, 2022
14f3dcd
update dependencies
CaptainCodeman Dec 13, 2022
eaa3763
configure coverage for latest vitest
CaptainCodeman Dec 13, 2022
07a9861
update node-redis
CaptainCodeman Dec 13, 2022
ff0800a
working HASH + JSON vector search
CaptainCodeman Dec 13, 2022
10 changes: 3 additions & 7 deletions .github/workflows/ci.yml
Expand Up @@ -26,17 +26,13 @@ jobs:
--health-retries 5
steps:
- name: Checkout
uses: actions/checkout@v2
uses: actions/checkout@v3

- name: Use Node.js ${{ matrix.node-version }}
uses: actions/setup-node@v2.3.0
uses: actions/setup-node@v3
with:
node-version: ${{ matrix.node-version }}
- name: Cache dependencies
uses: c-hive/gha-npm-cache@v1

- name: Update npm
run: npm install --global npm
cache: 'npm'

- name: Install packages
run: npm ci
27 changes: 16 additions & 11 deletions README.md
Expand Up @@ -222,7 +222,7 @@ const studioSchema = new Schema(Studio, {
})
```

When you create a `Schema`, it modifies the entity you handed it, adding getters and setters for the properties you define. The type those getters and setters accept and return are defined with the type parameter above. Valid values are: `string`, `number`, `boolean`, `string[]`, `date`, `point`, or `text`.
When you create a `Schema`, it modifies the entity you handed it, adding getters and setters for the properties you define. The type those getters and setters accept and return is defined with the type parameter above. Valid values are: `string`, `number`, `boolean`, `string[]`, `date`, `point`, `text`, or `binary`.

The first three do exactly what you think—they define a property that is a [String](https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/String), a [Number](https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/Number), or a [Boolean](https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/Boolean). `string[]` does what you'd think as well, specifically defining an [Array](https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/Array) of Strings.

Expand All @@ -236,17 +236,20 @@ const point = { longitude: 12.34, latitude: 56.78 }

A `text` field is a lot like a `string`. If you're just reading and writing objects, they are identical. But if you want to *search* on them, they are very, very different. I'll cover that in detail when I talk about [using RediSearch](#-using-redisearch) but the tl;dr is that `string` fields can only be matched on their whole value—no partial matches—and are best for keys while `text` fields have full-text search enabled on them and are optimized for human-readable text.

Additional field options can be set depending on the field type. These correspond to the [Field Options](https://redis.io/commands/ft.create/#field-options) available when creating a RediSearch full-text index. Other than the `separator` option, these only affect how content is indexed and searched.
A `binary` field holds a binary blob of data as a `Buffer` object. For Hash data structures it is stored as a binary field in Redis; for JSON data structures it is serialized to a numeric array. A `binary` field can be indexed as a [Vector Similarity](https://redis.io/docs/stack/search/reference/vectors/) field.

| schema type | RediSearch type | `indexed` | `sortable` | `normalized` | `stemming` | `phonetic` | `weight` | `separator` | `caseSensitive` |
| -------------- | :-------------: | :-------: | :--------: | :----------: | :--------: | :--------: | :------: | :---------: | :-------------: |
| `string` | TAG | yes | HASH Only | HASH Only | - | - | - | yes | yes |
| `number` | NUMERIC | yes | yes | - | - | - | - | - | - |
| `boolean` | TAG | yes | HASH Only | - | - | - | - | - | - |
| `string[]` | TAG | yes | HASH Only | HASH Only | - | - | - | yes | yes |
| `date` | NUMERIC | yes | yes | - | | - | - | - | - |
| `point` | GEO | yes | - | - | | - | - | - | - |
| `text` | TEXT | yes | yes | yes | yes | yes | yes | - | - |
Additional field options can be set depending on the field type. These correspond to the [Field Options](https://redis.io/commands/ft.create/#field-options) available when creating a RediSearch full-text index. Other than the `separator` option, these only affect how content is indexed and searched.

| schema type | RediSearch type | `indexed` | `sortable` | `normalized` | `stemming` | `phonetic` | `weight` | `separator` | `caseSensitive` | `vector` |
| -------------- | :-------------: | :-------: | :--------: | :----------: | :--------: | :--------: | :------: | :---------: | :-------------: | :------: |
| `string` | TAG | yes | HASH Only | HASH Only | - | - | - | yes | yes | - |
| `number` | NUMERIC | yes | yes | - | - | - | - | - | - | - |
| `boolean` | TAG | yes | HASH Only | - | - | - | - | - | - | - |
| `string[]` | TAG | yes | HASH Only | HASH Only | - | - | - | yes | yes | - |
| `date` | NUMERIC | yes | yes | - | - | - | - | - | - | - |
| `point` | GEO | yes | - | - | - | - | - | - | - | - |
| `text` | TEXT | yes | yes | yes | yes | yes | yes | - | - | - |
| `binary` | VECTOR | yes | - | - | - | - | - | - | - | yes |

* `indexed`: true | false, whether this field is indexed by RediSearch (default true)
* `sortable`: true | false, whether to create an additional index to optimize sorting (default false)
Expand All @@ -256,6 +259,7 @@ Additional field options can be set depending on the field type. These correspon
* `weight`: number, the importance weighting to use when ranking results (default 1)
* `separator`: string, the character to delimit multiple tags (default '|')
* `caseSensitive`: true | false, whether original letter casing is kept for search (default false)
* `vector`: object containing [Vector Similarity](https://redis.io/docs/stack/search/reference/vectors/) configuration

Example showing additional options:

Expand All @@ -269,6 +273,7 @@ const commentSchema = new Schema(Comment, {
approved: { type: 'boolean', indexed: false },
iphash: { type: 'string', caseSensitive: true },
notes: { type: 'string', indexed: false },
image: { type: 'binary', vector: { algorithm: 'FLAT', dim: 512, distance_metric: 'COSINE' } },
})
```
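
As an illustration of what could be assigned to a `binary` field such as `image` above, here is a minimal sketch that packs a list of numbers into FLOAT32 bytes. The helper name `toFloat32Bytes` and the sample embedding are assumptions made for this example and are not part of the PR. For a `FLOAT32` vector, the byte length should equal `dim` times 4.

```typescript
// Hypothetical helper, not part of the library: packs numbers into FLOAT32 bytes.
function toFloat32Bytes(values: number[]): Uint8Array {
  const floats = Float32Array.from(values);
  // View the same memory as raw bytes (4 bytes per FLOAT32 element).
  return new Uint8Array(floats.buffer, floats.byteOffset, floats.byteLength);
}

const DIM = 512; // assumed to match `dim` in the schema's `vector` options
const embedding = Array.from({ length: DIM }, (_, i) => i / DIM); // made-up sample data
const bytes = toFloat32Bytes(embedding);
```

In Node.js the result can be wrapped as a `Buffer` without copying via `Buffer.from(bytes.buffer, bytes.byteOffset, bytes.byteLength)` before it is set on the entity.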

8 changes: 4 additions & 4 deletions lib/client.ts
@@ -1,4 +1,4 @@
import { createClient } from 'redis';
import { createClient, commandOptions } from 'redis';
import { Repository } from './repository';
import { JsonRepository, HashRepository } from './repository';
import { Entity } from './entity/entity';
Expand All @@ -11,7 +11,7 @@ export type RedisConnection = ReturnType<typeof createClient>;
* Alias for a JavaScript object used by HSET.
* @internal
*/
export type RedisHashData = { [key: string]: string };
export type RedisHashData = { [key: string]: string | Buffer };

/**
* Alias for any old JavaScript object used by JSON.SET.
Expand Down Expand Up @@ -174,7 +174,7 @@ export class Client {

if (keysOnly) command.push('RETURN', '0');

return this.redis.sendCommand<any[]>(command);
return this.redis.sendCommand<any[]>(command, commandOptions({ returnBuffers: true }));
}

/** @internal */
Expand Down Expand Up @@ -204,7 +204,7 @@ export class Client {
/** @internal */
async hgetall(key: string): Promise<RedisHashData> {
this.validateRedisOpen();
return this.redis.hGetAll(key);
return this.redis.hGetAll(commandOptions({ returnBuffers: true }), key);
}

/** @internal */
2 changes: 1 addition & 1 deletion lib/entity/entity-value.ts
Expand Up @@ -3,4 +3,4 @@ import { Point } from "./point";
/**
* Valid types for properties on an {@link Entity}.
*/
export type EntityValue = string | number | boolean | Point | Date | any[] | null;
export type EntityValue = string | number | boolean | Point | Date | any[] | Buffer | null;
4 changes: 3 additions & 1 deletion lib/entity/entity.ts
Expand Up @@ -8,6 +8,7 @@ import {
EntityStringArrayField,
EntityStringField,
EntityTextField,
EntityBinaryField,
EntityFieldConstructor,
} from "./fields";
import { Schema } from "../schema/schema";
Expand All @@ -21,7 +22,8 @@ const ENTITY_FIELD_CONSTRUCTORS: Record<SchemaFieldType, EntityFieldConstructor>
'text': EntityTextField,
'date': EntityDateField,
'point': EntityPointField,
'string[]': EntityStringArrayField
'string[]': EntityStringArrayField,
'binary': EntityBinaryField,
}

/**
45 changes: 45 additions & 0 deletions lib/entity/fields/entity-binary-field.ts
@@ -0,0 +1,45 @@
import { EntityField } from "./entity-field";
import { RedisHashData, RedisJsonData } from "../../client";
import { EntityValue } from "../entity-value";

export class EntityBinaryField extends EntityField {
toRedisJson(): RedisJsonData {
const data: RedisJsonData = {};
if (this.value !== null) {
const bytes = this.valueAsBuffer
const arr = new Float32Array(bytes.buffer, bytes.byteOffset, bytes.length / Float32Array.BYTES_PER_ELEMENT)
data[this.name] = [...arr]
}
return data;
}

fromRedisJson(value: any) {
if (!this.isBuffer(value)) {
throw Error(`Non-binary value of '${value}' read from Redis for binary field.`)
}
this.value = value
}

toRedisHash(): RedisHashData {
const data: RedisHashData = {};
if (this.value !== null) data[this.name] = this.valueAsBuffer
return data;
}

fromRedisHash(value: string | Buffer) {
if (!this.isBuffer(value)) {
throw Error(`Non-binary value of '${value}' read from Redis for binary field.`)
}
this.value = value as Buffer
}

protected validateValue(value: EntityValue) {
super.validateValue(value);
if (value !== null && !this.isBuffer(value))
throw Error(`Expected value with type of 'binary' but received '${value}'.`);
}

private get valueAsBuffer(): Buffer {
return this.value as Buffer
}
}
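
The conversion in `toRedisJson` views the buffer's memory directly as a `Float32Array`. That constructor requires the byte offset to be a multiple of 4, which an arbitrary view (for example one produced by `subarray`) does not guarantee. The following is a minimal sketch of a copy-based round trip between FLOAT32 bytes and a numeric array. The helper names `bytesToNumbers` and `numbersToBytes` are hypothetical and not taken from this PR. They accept `Uint8Array`, so a Node.js `Buffer` (a `Uint8Array` subclass) can be passed as well.

```typescript
// Hypothetical helpers for illustration; the names are not taken from this PR.

// Decode FLOAT32 bytes into a plain number[] (the shape stored for JSON).
function bytesToNumbers(bytes: Uint8Array): number[] {
  if (bytes.byteLength % Float32Array.BYTES_PER_ELEMENT !== 0) {
    throw new Error(`Byte length ${bytes.byteLength} is not a multiple of 4.`);
  }
  // Copy into a fresh, zero-offset ArrayBuffer so the Float32Array view is
  // always 4-byte aligned, even when `bytes` is an unaligned view.
  const aligned = new Uint8Array(bytes);
  return Array.from(new Float32Array(aligned.buffer));
}

// Encode a number[] back into FLOAT32 bytes.
function numbersToBytes(values: number[]): Uint8Array {
  return new Uint8Array(Float32Array.from(values).buffer);
}

// Build a deliberately unaligned view (byteOffset 1) to exercise the copy.
const backing = new Uint8Array(9);
backing.set(numbersToBytes([1.5, -2]), 1);
const decoded = bytesToNumbers(backing.subarray(1));
```

The copy costs one extra allocation per conversion; the trade-off is that decoding no longer depends on how the incoming bytes happen to be laid out in memory.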
7 changes: 4 additions & 3 deletions lib/entity/fields/entity-boolean-field.ts
Expand Up @@ -9,10 +9,11 @@ export class EntityBooleanField extends EntityField {
return data;
};

fromRedisHash(value: string) {
if (value === '0') {
fromRedisHash(value: string | Buffer) {
const str = value.toString()
if (str === '0') {
this.value = false;
} else if (value === '1') {
} else if (str === '1') {
this.value = true;
} else {
throw Error(`Non-boolean value of '${value}' read from Redis for boolean field.`);
4 changes: 2 additions & 2 deletions lib/entity/fields/entity-date-field.ts
Expand Up @@ -20,8 +20,8 @@ export class EntityDateField extends EntityField {
return data;
}

fromRedisHash(value: string) {
const parsed = Number.parseFloat(value);
fromRedisHash(value: string | Buffer) {
const parsed = Number.parseFloat(value.toString());
if (Number.isNaN(parsed)) throw Error(`Non-numeric value of '${value}' read from Redis for date field.`);
const date = new Date();
date.setTime(parsed * 1000);
8 changes: 6 additions & 2 deletions lib/entity/fields/entity-field.ts
Expand Up @@ -44,8 +44,8 @@ export abstract class EntityField {
return data;
}

fromRedisHash(value: string) {
this.value = value;
fromRedisHash(value: string | Buffer) {
this.value = value.toString();
}

protected validateValue(value: EntityValue) {
Expand All @@ -67,4 +67,8 @@ export abstract class EntityField {
protected isBoolean(value: EntityValue) {
return typeof value === 'boolean';
}

protected isBuffer(value: EntityValue) {
return value instanceof Buffer;
}
}
4 changes: 2 additions & 2 deletions lib/entity/fields/entity-number-field.ts
Expand Up @@ -2,8 +2,8 @@ import { EntityField } from "./entity-field";
import { EntityValue } from "../entity-value";

export class EntityNumberField extends EntityField {
fromRedisHash(value: string) {
const number = Number.parseFloat(value);
fromRedisHash(value: string | Buffer) {
const number = Number.parseFloat(value.toString());
if (Number.isNaN(number)) throw Error(`Non-numeric value of '${value}' read from Redis for number field.`);
this.value = number;
}
7 changes: 4 additions & 3 deletions lib/entity/fields/entity-point-field.ts
Expand Up @@ -29,9 +29,10 @@ export class EntityPointField extends EntityField {
return data;
};

fromRedisHash(value: string) {
if (value.match(IS_COORD_PAIR)) {
const [longitude, latitude] = value.split(',').map(Number.parseFloat);
fromRedisHash(value: string | Buffer) {
const str = value.toString()
if (str.match(IS_COORD_PAIR)) {
const [longitude, latitude] = str.split(',').map(Number.parseFloat);
this.value = { longitude, latitude };
} else {
throw Error(`Non-point value of '${value}' read from Redis for point field.`);
4 changes: 2 additions & 2 deletions lib/entity/fields/entity-string-array-field.ts
Expand Up @@ -10,8 +10,8 @@ export class EntityStringArrayField extends EntityField {
return data;
}

fromRedisHash(value: string) {
this.value = value.split(this.separator);
fromRedisHash(value: string | Buffer) {
this.value = value.toString().split(this.separator);
}

protected validateValue(value: EntityValue) {
1 change: 1 addition & 0 deletions lib/entity/fields/index.ts
@@ -1,3 +1,4 @@
export * from './entity-binary-field'
export * from './entity-boolean-field'
export * from './entity-date-field'
export * from './entity-field-constructor'
5 changes: 5 additions & 0 deletions lib/schema/builders/hash-schema-builder.ts
Expand Up @@ -52,6 +52,11 @@ export class HashSchemaBuilder<TEntity extends Entity> extends SchemaBuilder<TEn
...this.buildWeight(fieldDef),
...this.buildIndexed(fieldDef),
]
case 'binary':
return [
fieldAlias,
...this.buildVector(fieldDef),
]
};
}
}
5 changes: 5 additions & 0 deletions lib/schema/builders/json-schema-builder.ts
Expand Up @@ -56,6 +56,11 @@ export class JsonSchemaBuilder<TEntity extends Entity> extends SchemaBuilder<TEn
...this.buildWeight(fieldDef),
...this.buildIndexed(fieldDef),
]
case 'binary':
return [
...fieldInfo,
...this.buildVector(fieldDef),
]
};
}
}
45 changes: 45 additions & 0 deletions lib/schema/builders/schema-builder.ts
Expand Up @@ -8,6 +8,7 @@ import {
SortableFieldDefinition,
NormalizedFieldDefinition,
WeightFieldDefinition,
BinaryFieldDefinition,
} from "../definition";
import { Schema } from "../schema";

Expand Down Expand Up @@ -60,4 +61,48 @@ export abstract class SchemaBuilder<TEntity extends Entity> {
protected buildWeight(field: WeightFieldDefinition) {
return field.weight ? ['WEIGHT', field.weight.toString()] : []
}

protected buildVector(field: BinaryFieldDefinition) {
// assume that indexed: false takes precedence
if (!(field.indexed ?? this.schema.indexedDefault) || !field.vector) {
return ['NOINDEX']
}

const results = [
'TYPE', field.vector.vector_type ?? 'FLOAT32',
'DIM', field.vector.dim.toString(),
'DISTANCE_METRIC', field.vector.distance_metric,
]

if (field.vector.initial_cap) {
results.push('INITIAL_CAP', field.vector.initial_cap.toString())
}

switch (field.vector.algorithm) {
case 'FLAT':
if (field.vector.block_size) {
results.push('BLOCK_SIZE', field.vector.block_size.toString())
}
break

case 'HNSW':
if (field.vector.m) {
results.push('M', field.vector.m.toString())
}
if (field.vector.ef_construction) {
results.push('EF_CONSTRUCTION', field.vector.ef_construction.toString())
}
if (field.vector.ef_runtime) {
results.push('EF_RUNTIME', field.vector.ef_runtime.toString())
}
break
}

return [
'VECTOR',
field.vector.algorithm,
results.length.toString(),
...results,
]
}
}
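
To make the output of `buildVector` concrete, here is a standalone sketch that mirrors its logic outside the class and shows the argument list it would produce for the README example. The `VectorOptions` type and the `vectorArgs` function name are assumptions for this example; the real field definition types live elsewhere in the PR and may differ.

```typescript
// Hypothetical standalone type approximating the `vector` options used above.
type VectorOptions = {
  algorithm: 'FLAT' | 'HNSW';
  dim: number;
  distance_metric: 'L2' | 'IP' | 'COSINE';
  vector_type?: 'FLOAT32';
  initial_cap?: number;
  block_size?: number;
  m?: number;
  ef_construction?: number;
  ef_runtime?: number;
};

// Mirrors buildVector for an indexed field: attribute pairs first, then a
// header of VECTOR, the algorithm, and the number of attribute arguments.
function vectorArgs(v: VectorOptions): string[] {
  const attrs: string[] = [
    'TYPE', v.vector_type ?? 'FLOAT32',
    'DIM', v.dim.toString(),
    'DISTANCE_METRIC', v.distance_metric,
  ];
  if (v.initial_cap) attrs.push('INITIAL_CAP', v.initial_cap.toString());
  if (v.algorithm === 'FLAT') {
    if (v.block_size) attrs.push('BLOCK_SIZE', v.block_size.toString());
  } else {
    if (v.m) attrs.push('M', v.m.toString());
    if (v.ef_construction) attrs.push('EF_CONSTRUCTION', v.ef_construction.toString());
    if (v.ef_runtime) attrs.push('EF_RUNTIME', v.ef_runtime.toString());
  }
  return ['VECTOR', v.algorithm, attrs.length.toString(), ...attrs];
}

const flatArgs = vectorArgs({ algorithm: 'FLAT', dim: 512, distance_metric: 'COSINE' });
// flatArgs.join(' ') === 'VECTOR FLAT 6 TYPE FLOAT32 DIM 512 DISTANCE_METRIC COSINE'
```

Note that the count after the algorithm name is the number of individual arguments that follow (names and values together), not the number of name/value pairs.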