-
Notifications
You must be signed in to change notification settings - Fork 25
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Implement Yaroslavskiy-Bentley-Bloch Quicksort. #80
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I added a few comments but haven't reviewed the PR in detail. This approach is an improvement over the current implementation. Does this approach work well if all the elements of the input are the same?
src/sort.rs
Outdated
/// } | ||
/// ``` | ||
fn partition_mut(&mut self, pivot_index: usize) -> usize | ||
fn partition_mut(&mut self) -> (usize, usize) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should keep the existing partition_mut
method, because it's a public method which is useful for users. The dual-pivot partitioning can be an internal-only function just used by _get_many_from_sorted_mut_unchecked
, or if you think it would be useful to users for some reason, we could make it public, but as a separate method.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh right, no problem. In case we make it public as well, I would suggest to separate dual-pivot sampling from partitioning into two methods. For now, I would keep them private to easily change their interface if needed.
src/sort.rs
Outdated
i += 1; | ||
let lowermost = 0; | ||
let uppermost = self.len() - 1; | ||
if self.len() < 42 { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Where does 42
come from?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
47 is used as recursion cutoff by the JDK 7 dual-pivot implementation. It doesn't really apply here, because we just use it to stop sampling small arrays. I chose the next smaller integer multiple of 7 since we divide by 7. So we have slightly better spaced sample elements for small arrays. The reason to test for small arrays at all is that the sampling doesn't work for arrays smaller than 7 because seventh
becomes 0 and sample indexes are not unique anymore.
src/sort.rs
Outdated
let lowermost = 0; | ||
let uppermost = self.len() - 1; | ||
if self.len() < 42 { | ||
// Sort outermost elements and use them as pivots. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I suspect that using the outermost elements as pivots won't work well if the array is already sorted. I suppose it's not too big an issue since the array is known to be small in this branch, but it's still not ideal.
A couple of other ideas for a deterministic strategy would be:
-
Use pivot indices at 1/3 and 2/3 (or 1/4 and 3/4) of the length of the array. This would handle sorted inputs better.
-
Use randomly-selected pivots, but use a RNG with a fixed seed instead of
thread_rng
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, not ideal for sorted input. I'm happy with both your ideas. I was also thinking about it and currently am in favor of this:
The sampling has quite a high overhead for small arrays. Instead of sorting the sample with insertion sort, why not sort the whole small array itself with insertion sort, then the original motivation of the constant ~47 holds again as it becomes a recursion cutoff. I first was against this, since the method is called partition_mut()
and should not do a full sort but thinking about it, a full sort of small arrays is totally fine as it is the natural edge case of sorting the sample when sample length and array length coincide. We just have to return valid but artificial pivot indexes, the ones trivial to compute are (lowermost, uppermost)
.
let lowermost = 0;
let uppermost = self.len() - 1;
// Recursion cutoff at an integer multiple of 7.
if self.len() < 42 {
// Sort array instead of sample.
for mut index in 1..self.len() {
while index > 0 && self[index - 1] > self[index] {
self.swap(index - 1, index);
index -= 1;
}
}
return (lowermost, uppermost);
}
// Continue with sampling and quick sort.
It seems to be faster than the current PR for my two data sets. And if we want to make the dual pivot partitioning public, we might probably split the code into separate methods: then recursion cutoff moves into the get_
methods which invoke separate pivot sampling and partitioning methods. But functionality-wise above code at the top of partition_mut()
is equivalent and requires less modifications to the rest of the code.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The recursion cutoff is not perfect if it happens within partition_mut()
, recursion still continues for the middle partition. I think it's best to cleanly separate code into appropriate methods.
src/sort.rs
Outdated
self.swap(i, j); | ||
i += 1; | ||
j -= 1; | ||
// Use 1st and 3rd element of sorted sample as skewed pivots. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This seems like a good strategy. It's possible to design a pathological input for this strategy, but it would look pretty weird and seems unlikely to occur accidentally. Is there a reference for this strategy, or did you come up with it?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A symmetric strategy is used by the JDK 7 implementation, they use the 2nd and 4th of five. This master thesis suggests at page 183 in the second paragraph starting with "Finally, the case k = 5", where k is the sample length, that the 1st and 3rd of five is a better choice. But in the end it depends on the input distribution.
I will test also for same elements and refactor this PR as soon as I have time. |
Since we keep the single-pivot partitioning method, I've implemented a variant of Sesquickselect. It suggests the optimal pivot selection from fixed size samples wrt to the relative sought rank It works well with equal element arrays and with sorted and reversely sorted arrays. Tested up to I would suggest to make The sampling does not have to be equally spaced, could also be randomized. I have no favorite yet. |
I used adaptive pivot sampling for the bulk version as well but only for branches with a single index remaining. I dunno how the benches are configured but would like to add larger input arrays. |
* Add test with large array of equal floats. * Enable optimization for test profile to reduce execution time.
Fixes #86 by letting the stack grow on heap whenever necessary. Using dual pivoting should already be superior compared with just splitting |
This dynamic stack grow on heap should in general be used for every recursive implementation whose depth is depending on input data. But we should still try to avoid worst case complexities. I found that the bulk version's single-pivoting branch is the problem. I will try to split Simply removing that branch, reduces runtime of the test for large arrays of equal elements from 700 ms to 50 ms. |
* Delegate single-index invocation to non-bulk implementation. * Non-bulk implementation skips recursion if `index == pivot`.
Reusing non-bulk version for single-index invocation of bulk version does the job even better and reduces code complexity. I've increased the array length by factor 10 in the test and execution time was unchanged. |
* Move single-index delegation into recursion allowing single-pivoting in bulk mode. * Avoid worst case complexity in non-bulk and bulk mode by using dual-pivot if both sample choices of single-pivot are equal. * Fix first sample index from being skipped.
Ready for review/merge. |
Closing in favor of #92. |
This is a dual pivot 3-way quick sort which is less likely to run into worst cases which can end in stack overflows for large arrays.
The previous single pivot 2-way quick sort used by the
FreedmanDiaconis
strategy overflows its stack with this NPY array.