Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Replace dict with new hashtable: sorted set datatype #1427

Merged
merged 6 commits into from
Jan 8, 2025

Conversation

SoftlyRaining
Copy link
Contributor

@SoftlyRaining SoftlyRaining commented Dec 11, 2024

This PR replaces dict with hashtable in the ZSET datatype. Instead of mapping key to score as dict did, the hashtable maps key to a node in the skiplist, which contains the score. This takes advantage of hashtable performance improvements and saves 15 bytes per set item - 24 bytes overhead before, 9 bytes after.

Closes #1096

@SoftlyRaining SoftlyRaining marked this pull request as draft December 11, 2024 17:24
@SoftlyRaining SoftlyRaining force-pushed the zset-datatype branch 2 times, most recently from d2d854d to 6a8ee44 Compare December 17, 2024 22:49
@zuiderkwast
Copy link
Contributor

This PR is ready for review! Needed changes have been merged and I've rebased

Great! You could mark it as not draft then. :) We use the top comment and PR title for the final commit message when it gets merged, so you could update those to concisely describe the change (i.e. like a commit message).

It's late in my time zone so I'll look tomorrow.

@SoftlyRaining SoftlyRaining marked this pull request as ready for review December 18, 2024 00:36
@SoftlyRaining SoftlyRaining changed the title [draft] replace dict with hashtable: ZSET datatype replace dict with hashtable: ZSET datatype Dec 18, 2024
@enjoy-binbin enjoy-binbin added release-notes This issue should get a line item in the release notes run-extra-tests Run extra tests on this PR (Runs all tests from daily except valgrind and RESP) labels Dec 18, 2024
Copy link

codecov bot commented Dec 18, 2024

Codecov Report

Attention: Patch coverage is 92.55663% with 23 lines in your changes missing coverage. Please review.

Project coverage is 70.75%. Comparing base (b3b4bdc) to head (8316c90).
Report is 11 commits behind head on unstable.

Files with missing lines Patch % Lines
src/module.c 0.00% 10 Missing ⚠️
src/defrag.c 84.37% 5 Missing ⚠️
src/object.c 37.50% 5 Missing ⚠️
src/db.c 89.47% 2 Missing ⚠️
src/debug.c 92.30% 1 Missing ⚠️
Additional details and impacted files
@@             Coverage Diff              @@
##           unstable    #1427      +/-   ##
============================================
- Coverage     70.83%   70.75%   -0.09%     
============================================
  Files           120      120              
  Lines         64911    64959      +48     
============================================
- Hits          45982    45962      -20     
- Misses        18929    18997      +68     
Files with missing lines Coverage Δ
src/aof.c 80.23% <100.00%> (+0.11%) ⬆️
src/evict.c 98.47% <100.00%> (-0.38%) ⬇️
src/geo.c 93.58% <100.00%> (+0.02%) ⬆️
src/rdb.c 76.75% <100.00%> (+0.67%) ⬆️
src/server.c 87.61% <100.00%> (+0.14%) ⬆️
src/server.h 100.00% <ø> (ø)
src/sort.c 94.82% <100.00%> (-0.34%) ⬇️
src/t_zset.c 96.80% <100.00%> (+1.13%) ⬆️
src/debug.c 52.12% <92.30%> (+0.13%) ⬆️
src/db.c 89.48% <89.47%> (-0.07%) ⬇️
... and 3 more

... and 37 files with indirect coverage changes

Copy link
Contributor

@zuiderkwast zuiderkwast left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks awesome. Just a few nits.

src/db.c Outdated Show resolved Hide resolved
src/t_zset.c Outdated Show resolved Hide resolved
src/t_zset.c Outdated Show resolved Hide resolved
src/t_zset.c Outdated Show resolved Hide resolved
src/defrag.c Show resolved Hide resolved
src/server.c Outdated Show resolved Hide resolved
src/t_zset.c Outdated Show resolved Hide resolved
Copy link
Member

@ranshid ranshid left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

great work @SoftlyRaining !

Some minor comments and I feel somewhat uncomfortable about the way we implemented the union.

src/db.c Outdated Show resolved Hide resolved
src/defrag.c Outdated Show resolved Hide resolved
src/t_zset.c Outdated Show resolved Hide resolved
@zuiderkwast zuiderkwast changed the title replace dict with hashtable: ZSET datatype Replace dict with new hashtable: sorted set datatype Dec 21, 2024
src/t_zset.c Outdated Show resolved Hide resolved
src/t_zset.c Outdated Show resolved Hide resolved
Copy link
Contributor

@zuiderkwast zuiderkwast left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just fix the naming converion (compareNodeScoreEle -> zslCompareNodeScoreEle, etc.). Then I think it's good to merge.

I don't really get why you want to delete zslGetRank. If it's an optimization beyond the replacement of dict, then it can as well be a follow up, right? As you want.

src/defrag.c Outdated Show resolved Hide resolved
@SoftlyRaining
Copy link
Contributor Author

I poked around and found that all uses of zslGetRank could more efficiently use zslGetRankByNode, and it became dead code. I already made the revision at any rate but you're right, it might've been a separate small cleanup PR 😅

@ranshid
Copy link
Member

ranshid commented Jan 5, 2025

I poked around and found that all uses of zslGetRank could more efficiently use zslGetRankByNode, and it became dead code. I already made the revision at any rate but you're right, it might've been a separate small cleanup PR 😅

@SoftlyRaining My only concern is a potential small degradation in performance of zcount and zlexcount. can we just make sure to verify we have no impact on these operations?

@SoftlyRaining
Copy link
Contributor Author

I poked around and found that all uses of zslGetRank could more efficiently use zslGetRankByNode, and it became dead code. I already made the revision at any rate but you're right, it might've been a separate small cleanup PR 😅

@SoftlyRaining My only concern is a potential small degradation in performance of zcount and zlexcount. can we just make sure to verify we have no impact on these operations?

I will remove that aspect of the change so it can be more thoroughly investigated and benchmarked as a separate PR. I'd prefer to avoid delaying the core hashtable work. :)

Copy link
Contributor

@zuiderkwast zuiderkwast left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@ranshid The zslGetRank changes have been removed, so I guess it's safe to merge. WDYT?

Copy link
Member

@ranshid ranshid left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Small comment which I think we could skip handling for now and maybe only extend the comment.
I did not rescan the entire change so LGTM

Comment on lines +306 to +310
if ((node->backward == NULL || node->backward->score < newscore) &&
(node->level[0].forward == NULL || node->level[0].forward->score > newscore)) {
node->score = newscore;
return NULL;
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Small mark: it is still possible that a node position change will NOT take place after this check. For example in case we update the score to something that exactly matches the score of the prev or next node.
The check can be extended to check also equality of the score (but will also need to compare the key order).
I guess this is fine for now, but maybe extend the comment above to explain that?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right, the edge cases. Now, we remove and re-insert the node at the same position in this case, which is OK. It's not introduced in this PR anyway. We can convert the comment to a follow-up issue.

We could use zslCompareNodes here, but then we'd need to set the new score before we compare and then revert it if the check fails. Or use a stack-allocated temporary node for comparing, just to be able to use zslCompareNodes.

Suggested change
if ((node->backward == NULL || node->backward->score < newscore) &&
(node->level[0].forward == NULL || node->level[0].forward->score > newscore)) {
node->score = newscore;
return NULL;
}
double oldscore = node->score;
node->score = newscore;
if ((node->backward == NULL || zslCompareNodes(node->backward, node) <= 0) &&
zslCompareNodes(node->level[0].forward, node) >= 0) {
return NULL;
} else {
/* Restore score to restore skiplist order. */
node->score = oldscore;
}

src/t_zset.c Outdated Show resolved Hide resolved
about edge cases: allowing score to be equal to pref or next node and also compare ele in these cases.

Signed-off-by: Viktor Söderqvist <[email protected]>
@zuiderkwast zuiderkwast merged commit ab627d6 into valkey-io:unstable Jan 8, 2025
57 of 59 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
release-notes This issue should get a line item in the release notes run-extra-tests Run extra tests on this PR (Runs all tests from daily except valgrind and RESP)
Projects
Status: Done
Development

Successfully merging this pull request may close these issues.

Use new hashtable for sorted sets
4 participants