Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

repository.createIndex is not safe for concurrent use #254

Open
golopot opened this issue Nov 6, 2024 · 3 comments
Open

repository.createIndex is not safe for concurrent use #254

golopot opened this issue Nov 6, 2024 · 3 comments
Assignees

Comments

@golopot
Copy link

golopot commented Nov 6, 2024

When repository.createIndex is called multiple times concurrently, it might throw an exception [ErrorReply: Index already exists]. This happens when test runner run app init code concurrently.

Another problem is that the thrown exception should be an Error.

Reproduction:

import { createClient } from "redis";
import { Schema } from "redis-om";
import { Repository } from "redis-om";

async function main() {
  const redis = createClient();
  await redis.connect();
  {
    const schema = new Schema("s", {
      a: { type: "string" },
    });
    const repo = new Repository(schema, redis);
    await repo.createIndex();
  }
  {
    const schema = new Schema("s", {
      a: { type: "string" },
      b: { type: "string" },
    });
    const repo = new Repository(schema, redis);
    await Promise.all([
      repo.createIndex(),
      repo.createIndex(),
      repo.createIndex(),
      repo.createIndex(),
      repo.createIndex(),
    ]);
  }

  return redis.quit();
}

main().catch((err) => {
  console.error("err at main", err);
});
@guyroyse
Copy link
Contributor

This makes sense. Redis itself creates the indices asynchronously. If you call FT.CREATE in rapid succession from Node Redis, or perhaps even from redis-cli itself, I would expect the same error.

I can code around this, but it will require me to ask Redis before each index creation if the index creation is already in progress or not. So, I'll need to think about how to do it in a way that doesn't negatively impact performance.

@guyroyse guyroyse self-assigned this Nov 12, 2024
@golopot
Copy link
Author

golopot commented Nov 12, 2024

I think it can be solved by rewriting createIndex as a Lua script. When running concurrently, a latter call will early return at the compare hash step.

/**
* Creates an index in Redis for use by the {@link Repository#search} method.
* Does not create a new index if the index hasn't changed. Requires that
* RediSearch and RedisJSON are installed on your instance of Redis.
*/
async createIndex() {
const currentIndexHash = await this.client.get(this.#schema.indexHashName)
const incomingIndexHash = this.#schema.indexHash
if (currentIndexHash !== incomingIndexHash) {
await this.dropIndex()
const {
indexName, indexHashName, dataStructure,
schemaName: prefix, useStopWords, stopWords
} = this.#schema
const schema = buildRediSearchSchema(this.#schema)
const options: CreateOptions = {
ON: dataStructure,
PREFIX: `${prefix}:`
}
if (useStopWords === 'OFF') {
options.STOPWORDS = []
} else if (useStopWords === 'CUSTOM') {
options.STOPWORDS = stopWords
}
await Promise.all([
this.client.createIndex(indexName, schema, options),
this.client.set(indexHashName, incomingIndexHash)
])
}
}

@guyroyse
Copy link
Contributor

Scripts can get messy when we start dealing with clustered environments. Here's what I'm thinking:

  1. Write out the new hash value as soon as I know it has changed. Or maybe just always write it out using GETSET.
  2. If there is a mismatch, check to see if the index is still indexing using FT.INFO and reading the indexing property that it returns. If it is indexing, then another bit of code has already triggered to index recreation and we can skip it.
  3. Otherwise, drop and recreate.

The flaw here is that if two bits of code are changing the same index with different schemas, then Redis could get into a weird state where the hash and the index don't match. This should be exceedingly rare and would be easy to fix by deleting the hash in Redis and calling createIndex again. It might even be worthwhile to add a flag to createIndex that allows you to force this behavior.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants