Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Replace dict with new hashtable: hash datatype #1502

Open
wants to merge 2 commits into
base: unstable
Choose a base branch
from

Conversation

SoftlyRaining
Copy link
Contributor

@SoftlyRaining SoftlyRaining commented Jan 2, 2025

This PR replaces dict with the new hashtable data structure in the HASH datatype. There is a new struct for hashtable items which contains a pointer to value sds string and the embedded key sds string. These values were previously stored in dictEntry. This structure is kept opaque so we can easily add small value embedding or other optimizations in the future.

closes #1095

Copy link

codecov bot commented Jan 2, 2025

Codecov Report

Attention: Patch coverage is 91.40625% with 22 lines in your changes missing coverage. Please review.

Project coverage is 70.91%. Comparing base (9e02049) to head (bc7d761).
Report is 5 commits behind head on unstable.

Files with missing lines Patch % Lines
src/defrag.c 73.91% 6 Missing ⚠️
src/module.c 0.00% 6 Missing ⚠️
src/rdb.c 83.33% 5 Missing ⚠️
src/t_hash.c 97.97% 3 Missing ⚠️
src/db.c 92.30% 1 Missing ⚠️
src/debug.c 50.00% 1 Missing ⚠️
Additional details and impacted files
@@             Coverage Diff              @@
##           unstable    #1502      +/-   ##
============================================
- Coverage     70.93%   70.91%   -0.03%     
============================================
  Files           120      120              
  Lines         64981    64965      -16     
============================================
- Hits          46095    46068      -27     
- Misses        18886    18897      +11     
Files with missing lines Coverage Δ
src/aof.c 80.23% <100.00%> (ø)
src/lazyfree.c 86.11% <100.00%> (ø)
src/object.c 82.04% <100.00%> (-0.11%) ⬇️
src/server.c 87.60% <100.00%> (+0.02%) ⬆️
src/server.h 100.00% <ø> (ø)
src/db.c 89.56% <92.30%> (+0.07%) ⬆️
src/debug.c 52.40% <50.00%> (+0.27%) ⬆️
src/t_hash.c 96.14% <97.97%> (+<0.01%) ⬆️
src/rdb.c 76.90% <83.33%> (-0.04%) ⬇️
src/defrag.c 91.92% <73.91%> (+1.95%) ⬆️
... and 1 more

... and 13 files with indirect coverage changes

@SoftlyRaining SoftlyRaining changed the title [draft] Replace dict with new hashtable: sorted set datatype [draft] Replace dict with new hashtable: hash datatype Jan 2, 2025
@zuiderkwast zuiderkwast self-requested a review January 2, 2025 09:45
Copy link
Contributor

@zuiderkwast zuiderkwast left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I skimmed through it. I looks strait-forward. The only material thing to focus on seems to be the hashTypeEntry abstraction.

It would be good to have a clear abstraction and separation, even within t_hash.c, so we don't have too many direct accesses into the hashTypeEntry internals. It will help us later when we want to add things like TTL and embedded value.

src/server.h Outdated Show resolved Hide resolved
src/defrag.c Outdated Show resolved Hide resolved
@SoftlyRaining SoftlyRaining force-pushed the hash-datatype branch 2 times, most recently from 61772d3 to d357204 Compare January 4, 2025 00:22
@SoftlyRaining SoftlyRaining marked this pull request as ready for review January 8, 2025 21:34
@SoftlyRaining SoftlyRaining changed the title [draft] Replace dict with new hashtable: hash datatype Replace dict with new hashtable: hash datatype Jan 8, 2025
@zuiderkwast zuiderkwast added release-notes This issue should get a line item in the release notes run-extra-tests Run extra tests on this PR (Runs all tests from daily except valgrind and RESP) labels Jan 9, 2025
Copy link
Contributor

@zuiderkwast zuiderkwast left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Awesome! Just a few nits.

Introducing the sds_const type can preferably be a follow-up. It might trigger a long discussion.

Comment on lines +39 to +40
/* takes ownership of value, does not take ownership of field */
hashTypeEntry *hashTypeCreateEntry(const sds field, sds value) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  • const sds is meaningless. Is it time to introduce the const_sds typedef in sds.h now?

  • Take ownership of field but not value seems asymmetrical, but anything else is suboptimal, right? That's what we do for keys too IIRC. It's OK.

  • Shall we add a comment for the entry API, like the one below for the hash type API?

/*-----------------------------------------------------------------------------
 * Hash entry API
 *----------------------------------------------------------------------------*/

Comment on lines +606 to +607
hashTypeEntry *hash_entry = entry;
sds sds_val = hashTypeEntryGetValue(hash_entry);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The entry can be cast implicitly. You did that for sds sds_field = hashTypeEntryGetField(entry); a few lines above.

Suggested change
hashTypeEntry *hash_entry = entry;
sds sds_val = hashTypeEntryGetValue(hash_entry);
sds sds_val = hashTypeEntryGetValue(entry);

static void scanLaterSet(robj *ob, unsigned long *cursor) {
if (ob->type != OBJ_SET || ob->encoding != OBJ_ENCODING_HASHTABLE) return;
hashtable *ht = ob->ptr;
*cursor = hashtableScanDefrag(ht, *cursor, activeDefragSdsHashtableCallback, NULL, activeDefragAlloc, HASHTABLE_SCAN_EMIT_REF);
}

static void activeDefragHashTypeEntry(void *privdata, void *element_ref) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please add a comment saying it's used as a scan callback.

Comment on lines +51 to +53
sds hashTypeEntryGetField(const hashTypeEntry *entry) {
const unsigned char *field = entry->field_data + entry->field_offset;
return (sds)field;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This (and the GetValue) should probably return const_sds if we add that type.

return size;
}

hashTypeEntry *hashTypeEntryDefrag(hashTypeEntry *entry, void *(*defragfn)(void *), sds (*sdsdefragfn)(sds)) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please add a short comment above each of the functions, at least the non-trivial ones like this one.

sdsfree(dictGetVal(existing));
dictSetVal(ht, existing, v);
/* exists: replace value */
hashTypeEntryReplaceValue(existing, v);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think hashTypeEntryReplaceValue should return the entry, so we don't have to change the signature in the future. With embedded value, this function may need to reallocate the entry.

For now, we can just assert that it returns existing.

Suggested change
hashTypeEntryReplaceValue(existing, v);
serverAssert(hashTypeEntryReplaceValue(existing, v) == existing);

if (withvalues) value = hashTypeCurrentObjectNewSds(&hi, OBJ_HASH_VALUE);
ret = dictAdd(d, key, value);
ret = dictAdd(d, field, value);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Follow-up: Let's replace this sdsReplyDictType with a hashtable.

@@ -1069,25 +1116,25 @@ void hrandfieldWithCountCommand(client *c, long l, int withvalues) {
else {
/* Hashtable encoding (generic implementation) */
unsigned long added = 0;
listpackEntry key, value;
listpackEntry field, value;
dict *d = dictCreate(&hashDictType);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Follow-up: Let's replace this dict with a hashtable. Then we can delete that dict type.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
release-notes This issue should get a line item in the release notes run-extra-tests Run extra tests on this PR (Runs all tests from daily except valgrind and RESP)
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Use new hashtable for hashes
2 participants