You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
The hybrid sorting algorithm in bench_sort.cpp has some issues leading to an incorrect performance improvement of 3x (#173).
Issue 1
The 4 characters are inserted into the upper 32-bits of the order variable in the wrong order. So str[3] is the most significant byte of the order, leading to incorrect initial sorting order.
// What if we take up-to 4 first characters and the indexfor (size_t i = 0; i != strings.size(); ++i)
std::memcpy((char *)&order[i] + offset_in_word, strings[order[i]].c_str(),
std::min<std::size_t>(strings[order[i]].size(), 4ul));
Issue 2
Currently offset_in_word is set to 0, which means that the order array which holds the 32-bit string order indexes, gets overridden by the first 4 chars of the string, then subsequently cleared. So the final sort does nothing as all values in order are zero.
Although setting offset_in_word to 4 works for hybrid_sort_cpp it breaks hybrid_stable_sort_cpp.
Fixes
After temporarily fixing the above issues it becomes about 1.1x slower than sz_sort. In which the bottle neck seems to be the final sort.
I managed to improve this by calling std::sort on each of the partially sorted chunks instead of the all strings, which makes it ~2.25x faster than sz_sort.
Fixed version (with `offset_in_word=4`)
staticidx_thybrid_sort_cpp_v2(strings_tconst &strings, sz_u64_t *order) {
// What if we take up-to 4 first characters and the indexfor (size_t i = 0; i != strings.size(); ++i) {
size_tindex = order[i];
for (size_t j = 0; j < std::min<std::size_t>(strings[(sz_size_t)index].size(), 4ul); ++j) {
std::memcpy((char *)&order[i] + offset_in_word + 3 - j, strings[(sz_size_t)index].c_str() + j, 1ul);
}
}
std::sort(order, order + strings.size(), [&](sz_u64_t i, sz_u64_t j) {
char *i_bytes = (char *)&i;
char *j_bytes = (char *)&j;
return *(uint32_t *)(i_bytes + offset_in_word) < *(uint32_t *)(j_bytes + offset_in_word);
});
constauto extractBytes = [](sz_u64_t v) -> uint32_t {
char *bytes = (char *)&v;
return *(uint32_t *)(bytes + offset_in_word);
};
if (strings.size() >= 2) {
size_t prevIndex = 0;
uint64_t prevBytes = extractBytes(order[0]);
for (size_t i = 1; i < strings.size(); ++i) {
//uint32_t bytes = extractBytes(order[i]);
if (bytes != prevBytes) {
std::sort(order + prevIndex, order + i, [&](sz_u64_t i, sz_u64_t j) {
// Assumes: offset_in_word==4sz_size_t i_index = i & 0xFFFF'FFFF;
sz_size_t j_index = j & 0xFFFF'FFFF;
return strings[i_index] < strings[j_index];
});
prevIndex = i;
prevBytes = bytes;
}
}
std::sort(order + prevIndex, order + strings.size(), [&](sz_u64_t i, sz_u64_t j) {
sz_size_t i_index = i & 0xFFFF'FFFF;
sz_size_t j_index = j & 0xFFFF'FFFF;
return strings[i_index] < strings[j_index];
});
}
for (size_t i = 0; i != strings.size(); ++i) std::memset((char *)&order[i] + offset_in_word, 0, 4ul);
return strings.size();
}
I did not create a PR, because these fixes break the stable sort.
Steps to reproduce
See Above.
Expected behavior
See Above.
StringZilla version
3.11.3
Operating System
Windows 10
Hardware architecture
x86
Which interface are you using?
Other bindings
Contact Details
No response
Are you open to being tagged as a contributor?
I am open to being mentioned in the project .git history as a contributor
Is there an existing issue for this?
I have searched the existing issues
Code of Conduct
I agree to follow this project's Code of Conduct
The text was updated successfully, but these errors were encountered:
@ashvardanian I have found the issue, permute_base was being passed into the hybrid stable sort bench_permute call instead of permute_new so the expect_same call would most likely always fail if the input had duplicate strings.
Describe the bug
The hybrid sorting algorithm in bench_sort.cpp has some issues leading to an incorrect performance improvement of 3x (#173).
Issue 1
The 4 characters are inserted into the upper
32-bits
of theorder
variable in the wrong order. Sostr[3]
is the most significant byte of theorder
, leading to incorrect initial sorting order.Issue 2
Currently
offset_in_word
is set to0
, which means that theorder
array which holds the 32-bit string order indexes, gets overridden by the first 4 chars of the string, then subsequently cleared. So the final sort does nothing as all values inorder
are zero.Although setting
offset_in_word
to4
works forhybrid_sort_cpp
it breakshybrid_stable_sort_cpp
.Fixes
After temporarily fixing the above issues it becomes about 1.1x slower than
sz_sort
. In which the bottle neck seems to be the final sort.I managed to improve this by calling
std::sort
on each of the partially sorted chunks instead of the all strings, which makes it ~2.25x faster thansz_sort
.Fixed version (with `offset_in_word=4`)
I did not create a PR, because these fixes break the stable sort.
Steps to reproduce
See Above.
Expected behavior
See Above.
StringZilla version
3.11.3
Operating System
Windows 10
Hardware architecture
x86
Which interface are you using?
Other bindings
Contact Details
No response
Are you open to being tagged as a contributor?
.git
history as a contributorIs there an existing issue for this?
Code of Conduct
The text was updated successfully, but these errors were encountered: