Replace removeByPrefix Lua script with Scan #79

Merged
merged 1 commit into FoundatioFx:master on Jan 3, 2024

Conversation

JulienDeroff
Contributor

Resolves issue #78.

Description of the change

Replaces the Lua script that removes keys based on a prefix with multiple calls.
Bulk calls are made to remove the detected keys in chunks of 2500.

Note: 2500 was chosen empirically after several rounds of testing. The value can be adjusted.

SCAN is used instead of KEYS to avoid blocking operations.
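
For orientation, the scan-and-delete pattern looks roughly like the sketch below. This is illustrative only (it uses StackExchange.Redis directly and invented names such as RemoveByPrefixSketch), not the PR's actual change, which goes through the cache client's own helpers (ScanKeysAsync / RemoveAllAsync, shown in the review comments below):

using System.Collections.Generic;
using System.Threading.Tasks;
using StackExchange.Redis;

public static class RemoveByPrefixSketch
{
    // IServer.Keys pages through the keyspace with SCAN under the hood,
    // so the server is never blocked the way a single KEYS call would be.
    public static async Task<long> RemoveByPrefixAsync(IConnectionMultiplexer muxer, string prefix, int chunkSize = 2500)
    {
        var server = muxer.GetServer(muxer.GetEndPoints()[0]);
        var db = muxer.GetDatabase();

        long total = 0;
        var chunk = new List<RedisKey>(chunkSize);

        foreach (var key in server.Keys(pattern: prefix + "*", pageSize: chunkSize))
        {
            chunk.Add(key);
            if (chunk.Count == chunkSize)
            {
                total += await db.KeyDeleteAsync(chunk.ToArray());
                chunk.Clear();
            }
        }

        if (chunk.Count > 0)
            total += await db.KeyDeleteAsync(chunk.ToArray());

        return total;
    }
}

Deleting in fixed-size chunks keeps each delete call bounded, so no single command has to carry an arbitrarily large argument list.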

Motivation

According to the official documentation:
https://redis.io/commands/eval/

Important: to ensure the correct execution of scripts, both in standalone and clustered deployments, all names of keys that a script accesses must be explicitly provided as input key arguments. The script should only access keys whose names are given as input arguments. Scripts should never access keys with programmatically-generated names or based on the contents of data structures stored in the database.

After some testing, the issue occurs when trying to remove between 7500 and 8000 entries by prefix, which is consistent with Lua's unpack() limit of roughly 8000 elements and the "too many results to unpack" error quoted below.

@CLAassistant commented Nov 6, 2023

CLA assistant check
All committers have signed the CLA.

Comment on lines +98 to +101
while (keys.Length != 0 || index < chunkSize) {
    total += await RemoveAllAsync(keys).AnyContext();
    index += chunkSize;
    (cursor, keys) = await ScanKeysAsync(regex, cursor, chunkSize).AnyContext();
Member

Wouldn't this fail if there were more than 2500 batches?

Contributor Author

I'm not sure why it would fail if there were more than 2500 batches.

I've tested with 10,000,000 entries and it worked fine.

(Except that I had to split the parameters passed to SetAllAsync into batches of 1,000,000, as the current implementation does not support bigger enumerables - the ~1M limit is a Redis limit.)
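
As a hypothetical illustration of that test setup (SeedHelper and the key names are invented here; it only assumes Foundatio's ICacheClient.SetAllAsync accepts a dictionary of values), the entries can be written in batches:

using System.Collections.Generic;
using System.Threading.Tasks;
using Foundatio.Caching;

public static class SeedHelper
{
    // Write the entries in batches because a single Redis command cannot carry
    // millions of arguments; ~1M values per SetAllAsync call stays under the limit.
    public static async Task SeedAsync(ICacheClient cache, int total = 10_000_000, int batchSize = 1_000_000)
    {
        var batch = new Dictionary<string, int>(batchSize);
        for (int i = 0; i < total; i++)
        {
            batch["prefix:" + i] = i;
            if (batch.Count == batchSize)
            {
                await cache.SetAllAsync(batch);
                batch.Clear();
            }
        }

        if (batch.Count > 0)
            await cache.SetAllAsync(batch);
    }
}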

Member

I was thinking about the logic of keys.Length != 0 || index < chunkSize; it didn't seem like the second part was correct.
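
For comparison, a cursor-driven loop that stops when SCAN reports cursor 0 would not depend on index at all. A sketch using the PR's own helper names (not the code that was merged):

do
{
    (cursor, keys) = await ScanKeysAsync(regex, cursor, chunkSize).AnyContext();
    if (keys.Length > 0)
        total += await RemoveAllAsync(keys).AnyContext();
} while (cursor != 0);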

private async Task<(int, string[])> ScanKeysAsync(string prefix, int index, int chunkSize)
{
    var result = await Database.ExecuteAsync("scan", index, "match", prefix, "count", chunkSize).AnyContext();
    var uniqueValue = result.ToDictionary().FirstOrDefault();
Member

What is the shape of the result here? If it's two elements, can we speed this up by selecting the array element?

Contributor Author

Good idea.
I've made the change: I'm casting directly to the expected types.
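
For illustration, the direct-cast version might look roughly like this (a sketch, not necessarily the exact code that landed). SCAN always replies with a two-element array - the next cursor followed by the current page of keys - so both parts can be cast straight to the expected types:

private async Task<(int, string[])> ScanKeysAsync(string prefix, int index, int chunkSize)
{
    var result = await Database.ExecuteAsync("scan", index, "match", prefix, "count", chunkSize).AnyContext();

    // SCAN replies with [next-cursor, keys]; cast the parts directly instead of
    // materializing a dictionary first.
    var parts = (RedisResult[])result;
    return ((int)parts[0], (string[])parts[1]);
}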

@niemyjski
Member

The error message was never logged either: {"ERR Error running script (call to f_b8b6b7b2f77bc4399234cb5cbcf1f8cf083db8c5): @user_script:2: user_script:2: too many results to unpack"}

@niemyjski merged commit ec1d2ed into FoundatioFx:master on Jan 3, 2024
2 checks passed
@niemyjski
Member

Thanks for the PR!
