Implement Yaroslavskiy-Bentley-Bloch Quicksort. #80

n3vu0r · 2021-06-21T20:00:20Z

This is a dual-pivot, 3-way quicksort, which is less likely to run into the worst cases that can end in stack overflows for large arrays.

The previous single-pivot, 2-way quicksort used by the FreedmanDiaconis strategy overflows its stack with this NPY array.
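For readers unfamiliar with the scheme, here is a minimal sketch of a Yaroslavskiy-style dual-pivot partition on a plain slice. It is not the PR's `partition_mut`: the function name is hypothetical, and it simply uses the outermost elements as the pivots p <= q (the PR samples them instead). It splits the slice into three regions, `< p`, `p..=q` and `>= q`, with the pivots placed between them.

```rust
/// Yaroslavskiy-style dual-pivot partition (illustrative sketch).
/// Uses a[0] and a[len - 1] as pivots p <= q and rearranges `a` into
/// [ < p | p | p <= x <= q | q | >= q ]. Returns the final pivot positions.
fn dual_pivot_partition<T: PartialOrd>(a: &mut [T]) -> (usize, usize) {
    assert!(a.len() >= 2, "need at least two elements for two pivots");
    let right = a.len() - 1;
    if a[0] > a[right] {
        a.swap(0, right);
    }
    // The pivots stay at a[0] (p) and a[right] (q) until the final swaps.
    let (mut l, mut g, mut k) = (1, right - 1, 1);
    while k <= g {
        if a[k] < a[0] {
            // Element belongs to the left region.
            a.swap(k, l);
            l += 1;
        } else if a[k] >= a[right] {
            // Skip elements on the right that already belong to the >= q region.
            while a[g] > a[right] && k < g {
                g -= 1;
            }
            a.swap(k, g);
            g -= 1;
            // The element swapped in from the right may belong to the left region.
            if a[k] < a[0] {
                a.swap(k, l);
                l += 1;
            }
        }
        k += 1;
    }
    // Move both pivots into their final positions.
    l -= 1;
    g += 1;
    a.swap(0, l);
    a.swap(right, g);
    (l, g)
}
```

A caller would then continue (recursively or in a loop) on `a[..l]`, `a[l + 1..g]` and `a[g + 1..]`.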

jturner314

I added a few comments but haven't reviewed the PR in detail. This approach is an improvement over the current implementation. Does this approach work well if all the elements of the input are the same?

jturner314 · 2021-06-24T18:55:09Z

src/sort.rs

    /// }
    /// ```
-    fn partition_mut(&mut self, pivot_index: usize) -> usize
+    fn partition_mut(&mut self) -> (usize, usize)


We should keep the existing partition_mut method, because it's a public method that is useful for users. The dual-pivot partitioning can be an internal-only function used only by _get_many_from_sorted_mut_unchecked; if you think it would be useful to users for some reason, we could make it public, but as a separate method.

Oh right, no problem. If we make it public as well, I would suggest separating dual-pivot sampling from partitioning into two methods. For now, I would keep them private so we can easily change their interface if needed.

jturner314 · 2021-06-24T18:59:54Z

src/sort.rs

-                i += 1;
+        let lowermost = 0;
+        let uppermost = self.len() - 1;
+        if self.len() < 42 {


Where does 42 come from?

47 is used as the recursion cutoff by the JDK 7 dual-pivot implementation. It doesn't really apply here, because we only use it to stop sampling small arrays. I chose the next smaller integer multiple of 7, since we divide by 7, so the sample elements are slightly better spaced for small arrays. The reason to test for small arrays at all is that the sampling doesn't work for arrays with fewer than 7 elements: `seventh` becomes 0 and the sample indexes are no longer unique.
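To make the "`seventh` becomes 0" point concrete, here is a small sketch. It assumes five sample positions centered on the middle of the slice and spaced `len / 7` apart, in the style of the JDK 7 implementation; the PR's exact layout is not shown here, and both function names are hypothetical.

```rust
/// Five sample positions centered on the middle, spaced `len / 7` apart
/// (assumed JDK 7-style layout).
fn sample_indices(len: usize) -> [usize; 5] {
    let seventh = len / 7;
    let mid = len / 2;
    [
        mid - 2 * seventh,
        mid - seventh,
        mid,
        mid + seventh,
        mid + 2 * seventh,
    ]
}

/// The positions are produced in non-decreasing order, so they are all
/// distinct exactly when they are strictly increasing.
fn indices_are_unique(idx: &[usize]) -> bool {
    idx.windows(2).all(|w| w[0] < w[1])
}
```

For `len = 42` this yields `[9, 15, 21, 27, 33]`, while for `len = 6` the spacing is 0 and all five positions collapse onto index 3.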

jturner314 · 2021-06-24T19:04:02Z

src/sort.rs

+        let lowermost = 0;
+        let uppermost = self.len() - 1;
+        if self.len() < 42 {
+            // Sort outermost elements and use them as pivots.


I suspect that using the outermost elements as pivots won't work well if the array is already sorted. I suppose it's not too big an issue since the array is known to be small in this branch, but it's still not ideal.

A couple of other ideas for a deterministic strategy would be:

Use pivot indices at 1/3 and 2/3 (or 1/4 and 3/4) of the length of the array. This would handle sorted inputs better.

Use randomly-selected pivots, but use a RNG with a fixed seed instead of thread_rng.
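Both ideas can be sketched in a few lines. The names are hypothetical, and the tiny xorshift64 generator is a stand-in, used only to keep the example dependency-free, for a seeded generator from the `rand` crate; it is not what the crate uses.

```rust
/// Idea 1: deterministic pivot positions at 1/3 and 2/3 of the length.
fn tertile_pivot_indices(len: usize) -> (usize, usize) {
    (len / 3, 2 * len / 3)
}

/// Idea 2: a fixed-seed PRNG (Marsaglia's xorshift64). The same nonzero
/// seed gives the same pivot positions on every run.
struct XorShift64(u64);

impl XorShift64 {
    fn next_u64(&mut self) -> u64 {
        let mut x = self.0;
        x ^= x << 13;
        x ^= x >> 7;
        x ^= x << 17;
        self.0 = x;
        x
    }

    /// Position in 0..len (modulo reduction is slightly biased; fine for pivots).
    fn index(&mut self, len: usize) -> usize {
        (self.next_u64() % len as u64) as usize
    }
}

/// Two distinct random pivot positions, in increasing order. Requires len >= 2.
fn random_pivot_indices(len: usize, rng: &mut XorShift64) -> (usize, usize) {
    let a = rng.index(len);
    // Draw from len - 1 slots and skip over `a` so the two positions differ.
    let mut b = rng.index(len - 1);
    if b >= a {
        b += 1;
    }
    (a.min(b), a.max(b))
}
```

With the tertile variant, a sorted slice yields pivots near its 1/3 and 2/3 quantiles instead of its minimum and maximum.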

Yes, it's not ideal for sorted input. I'm happy with both of your ideas. I've also been thinking about it and am currently in favor of this:

The sampling has quite a high overhead for small arrays. Instead of sorting the sample with insertion sort, why not sort the whole small array with insertion sort? Then the original motivation for the constant ~47 holds again, since it becomes a recursion cutoff. At first I was against this, since the method is called partition_mut() and should not do a full sort. But thinking about it, a full sort of a small array is totally fine: it is the natural edge case of sorting the sample when the sample length and the array length coincide. We just have to return valid but artificial pivot indexes, and the ones that are trivial to compute are (lowermost, uppermost).

    let lowermost = 0;
    let uppermost = self.len() - 1;
    // Recursion cutoff at an integer multiple of 7.
    if self.len() < 42 {
        // Sort array instead of sample.
        for mut index in 1..self.len() {
            while index > 0 && self[index - 1] > self[index] {
                self.swap(index - 1, index);
                index -= 1;
            }
        }
        return (lowermost, uppermost);
    }
    // Continue with sampling and quick sort.

It seems to be faster than the current PR on my two data sets. If we want to make the dual-pivot partitioning public, we should probably split the code into separate methods: the recursion cutoff would then move into the get_ methods, which would invoke separate pivot-sampling and partitioning methods. Functionally, though, the code above at the top of partition_mut() is equivalent and requires fewer modifications to the rest of the code.

The recursion cutoff is not perfect if it happens within partition_mut(): the artificial pivots (lowermost, uppermost) leave a non-empty middle range between them, so recursion still continues for the middle partition. I think it's best to cleanly separate the code into appropriate methods.
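A minimal sketch of that separation, with hypothetical names: the caller checks the cutoff and sorts a small range itself, so partitioning never has to return artificial pivots and no middle range is left over to recurse into. The cutoff of 42 is taken from the discussion above; the insertion sort is the same as in the snippet above, just written as a free function over a slice.

```rust
/// Recursion cutoff discussed above (an integer multiple of 7).
const CUTOFF: usize = 42;

/// Plain insertion sort on a slice.
fn insertion_sort<T: PartialOrd>(a: &mut [T]) {
    for mut index in 1..a.len() {
        while index > 0 && a[index - 1] > a[index] {
            a.swap(index - 1, index);
            index -= 1;
        }
    }
}

/// Driver-level cutoff: returns true if the range was finished here.
/// Only when it returns false would the caller sample pivots and partition.
fn finish_small_range<T: PartialOrd>(a: &mut [T]) -> bool {
    if a.len() < CUTOFF {
        insertion_sort(a);
        true
    } else {
        false
    }
}
```

A `get_`-style method would call `finish_small_range` at the top of each step and otherwise fall through to separate sampling and partitioning methods.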

jturner314 · 2021-06-24T19:13:34Z

src/sort.rs

-                self.swap(i, j);
-                i += 1;
-                j -= 1;
+            // Use 1st and 3rd element of sorted sample as skewed pivots.


This seems like a good strategy. It's possible to design a pathological input for this strategy, but it would look pretty weird and seems unlikely to occur accidentally. Is there a reference for this strategy, or did you come up with it?

The JDK 7 implementation uses a symmetric strategy: the 2nd and 4th of five. This master thesis suggests, on page 183 in the second paragraph starting with "Finally, the case k = 5" (where k is the sample length), that the 1st and 3rd of five is a better choice. In the end, though, it depends on the input distribution.
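The two choices differ only in which positions of the sorted sample of five are taken. A small sketch (hypothetical function; it uses the standard library's `sort_by` for brevity where the PR uses an insertion sort, and it panics on NaN):

```rust
/// Pick two pivots from a sample of five after sorting it.
/// skewed = true:  1st and 3rd (positions 0 and 2), as in this PR.
/// skewed = false: 2nd and 4th (positions 1 and 3), as in JDK 7.
fn pivots_from_sample<T: PartialOrd + Copy>(mut sample: [T; 5], skewed: bool) -> (T, T) {
    sample.sort_by(|x, y| x.partial_cmp(y).expect("sample must not contain NaN"));
    if skewed {
        (sample[0], sample[2])
    } else {
        (sample[1], sample[3])
    }
}
```

For the sample `[50, 10, 40, 20, 30]` the skewed choice gives `(10, 30)` and the symmetric one `(20, 40)`.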

n3vu0r · 2021-06-25T10:05:49Z

I will also test with equal elements and refactor this PR as soon as I have time.

n3vu0r · 2021-06-28T20:08:15Z

Since we're keeping the single-pivot partitioning method, I've implemented a variant of Sesquickselect. It suggests the optimal pivot selection from fixed-size samples with respect to the relative sought rank (index / length), and it switches from dual-pivot to single-pivot partitioning for extreme sought ranks (page 17, figure 3). The benches show a speedup with adaptive pivot sampling and with smaller recursion cutoff thresholds, at least on my machine. For the bulk version, I kept the skewed pivots recommended for Quicksort, on the assumption that multiple indexes shift the characteristics from Quickselect towards Quicksort (and there is no single sought rank).

It works well with arrays of equal elements and with sorted and reverse-sorted arrays. Tested with up to 1_000_000 elements.
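The shape of that decision can be sketched as follows. Everything here is illustrative: the names are hypothetical, and both the `EXTREME` threshold and the rule for which sample positions become pivots are placeholders, not the optimal values from the Sesquickselect analysis (page 17, figure 3), which are not reproduced here.

```rust
/// Partitioning strategy for one selection step; values are positions
/// in the sorted sample.
#[derive(Debug, PartialEq)]
enum Strategy {
    Single(usize),
    Dual(usize, usize),
}

/// PLACEHOLDER threshold for "extreme" relative ranks (not from the paper).
const EXTREME: f64 = 0.1;

/// Choose a strategy from the relative sought rank alpha = index / len,
/// given a sorted sample of k >= 3 elements.
fn choose_strategy(index: usize, len: usize, k: usize) -> Strategy {
    let alpha = index as f64 / len as f64;
    // Sample position closest to the sought rank.
    let near = ((alpha * (k - 1) as f64).round() as usize).min(k - 1);
    if alpha < EXTREME || alpha > 1.0 - EXTREME {
        // Extreme rank: a single pivot near the target is enough.
        Strategy::Single(near)
    } else {
        // Otherwise: two pivots around the target rank (placeholder rule).
        let lo = near.saturating_sub(1);
        let hi = (lo + 2).min(k - 1);
        Strategy::Dual(lo, hi)
    }
}
```

With a sample of five, a sought rank at the very start or end of the array yields a single pivot, while a median query yields the 2nd and 4th sample elements as pivots under this placeholder rule.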

I would suggest making sample_mut() generic over the sample size via const generics. Is an MSRV of 1.51 fine?

The sampling does not have to be equally spaced; it could also be randomized. I don't have a favorite yet.
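A sketch of what a sample-size-generic helper could look like with const generics (stable since Rust 1.51). The function name and the equal-width-bucket spacing are assumptions for illustration; this is not the PR's `sample_mut()`.

```rust
/// N equally spaced sample positions in 0..len: the center of each of N
/// equal-width buckets. Positions are distinct whenever len >= N.
fn spaced_sample_indices<const N: usize>(len: usize) -> [usize; N] {
    let mut idx = [0; N];
    for (i, slot) in idx.iter_mut().enumerate() {
        *slot = (2 * i + 1) * len / (2 * N);
    }
    idx
}
```

The sample size is then chosen at the call site, e.g. `spaced_sample_indices::<5>(len)` or `spaced_sample_indices::<3>(len)`, without any heap allocation.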

n3vu0r · 2021-06-30T14:21:56Z

I used adaptive pivot sampling for the bulk version as well, but only for branches with a single index remaining. I don't know how the benches are configured, but I would like to add larger input arrays.

* Add test with large array of equal floats.
* Enable optimization for test profile to reduce execution time.

n3vu0r · 2023-04-01T11:20:16Z

Fixes #86 by letting the stack grow on the heap whenever necessary. Using dual pivots should already be superior to just splitting the >= pivot part into == pivot and > pivot, or am I missing something?
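For comparison, the alternative mentioned in the question, splitting off an `== pivot` block, can be sketched as a single-pivot 3-way partition (Dijkstra's "Dutch national flag" scheme). This is a swapped-in illustration, not code from the PR, and the names are hypothetical. The selection loop iterates instead of recursing and stops as soon as the sought index lands in the `==` block, so an all-equal input finishes after a single pass.

```rust
/// Single-pivot 3-way partition: rearranges `a` into
/// [ < pivot | == pivot | > pivot ] and returns the half-open range of the
/// `== pivot` block. Requires a non-empty slice.
fn partition3<T: PartialOrd + Clone>(a: &mut [T], pivot_index: usize) -> (usize, usize) {
    let pivot = a[pivot_index].clone();
    let (mut lt, mut i, mut gt) = (0, 0, a.len());
    while i < gt {
        if a[i] < pivot {
            a.swap(lt, i);
            lt += 1;
            i += 1;
        } else if a[i] > pivot {
            gt -= 1;
            a.swap(i, gt);
        } else {
            i += 1;
        }
    }
    (lt, gt)
}

/// Iterative quickselect: afterwards a[index] holds the element that would
/// be there if `a` were sorted.
fn select_nth<T: PartialOrd + Clone>(a: &mut [T], index: usize) {
    assert!(index < a.len());
    let (mut lo, mut hi) = (0, a.len());
    while hi - lo > 1 {
        // Middle element as pivot: simple, not adversary-proof.
        let mid = lo + (hi - lo) / 2;
        let (s, e) = partition3(&mut a[lo..hi], mid - lo);
        let (eq_start, eq_end) = (lo + s, lo + e);
        if index < eq_start {
            hi = eq_start;
        } else if index >= eq_end {
            lo = eq_end;
        } else {
            return; // index lies in the == pivot block: done
        }
    }
}
```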

n3vu0r · 2023-04-02T09:31:56Z

Growing the stack dynamically on the heap should, in general, be used for every recursive implementation whose depth depends on the input data. But we should still try to avoid worst-case complexities. I found that the bulk version's single-pivot branch is the problem. I will try to split the >= pivot part here too, as is done in the non-bulk version. If this doesn't work out, the single-pivot branch of the bulk version should be removed.

Simply removing that branch reduces the runtime of the test for large arrays of equal elements from 700 ms to 50 ms.

* Delegate single-index invocation to non-bulk implementation.
* Non-bulk implementation skips recursion if `index == pivot`.

n3vu0r · 2023-04-02T10:12:13Z

Reusing the non-bulk version for single-index invocations of the bulk version does the job even better and reduces code complexity. I increased the array length in the test by a factor of 10, and the execution time was unchanged.
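The delegation can be sketched as a recursive bulk selection that hands any single remaining index to a single-index routine. Here the standard library's `select_nth_unstable_by` (stable since Rust 1.49) stands in for the crate's non-bulk implementation; the function names are hypothetical.

```rust
use std::cmp::Ordering;

/// Comparison for PartialOrd types; panics on incomparable values (e.g. NaN).
fn cmp_partial<T: PartialOrd>(x: &T, y: &T) -> Ordering {
    x.partial_cmp(y).expect("values must be comparable")
}

/// Bulk selection: afterwards a[i] holds the i-th smallest element for every
/// i in `indexes`, which must be sorted and unique.
fn select_many<T: PartialOrd>(a: &mut [T], indexes: &[usize]) {
    match indexes.len() {
        0 => {}
        // Single index left: delegate to the single-index routine.
        1 => {
            a.select_nth_unstable_by(indexes[0], cmp_partial);
        }
        n => {
            // Fix the middle requested rank, then split the problem around it.
            let m = n / 2;
            let mid = indexes[m];
            let (left, _, right) = a.select_nth_unstable_by(mid, cmp_partial);
            select_many(left, &indexes[..m]);
            // Ranks to the right of `mid` are shifted inside the right slice.
            let shifted: Vec<usize> = indexes[m + 1..].iter().map(|&i| i - mid - 1).collect();
            select_many(right, &shifted);
        }
    }
}
```

Recursion depth grows with the number of requested indexes (roughly its logarithm), not with the input data.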

* Move single-index delegation into recursion allowing single-pivoting in bulk mode.
* Avoid worst case complexity in non-bulk and bulk mode by using dual-pivot if both sample choices of single-pivot are equal.
* Fix first sample index from being skipped.

n3vu0r · 2023-04-03T08:20:46Z

Ready for review/merge.

n3vu0r · 2023-04-07T18:08:45Z

Closing in favor of #92.

Implement Yaroslavskiy-Bentley-Bloch Quicksort.

6be4cfd

n3vu0r force-pushed the dual-pivot branch from 764e33c to 6be4cfd on June 22, 2021 12:47

Implement skewed pivot sampling.

e6b7b02

jturner314 reviewed Jun 24, 2021

Adapt pivot sampling to relative sought rank.

0919c2b

n3vu0r added 2 commits June 30, 2021 16:03

Make sampling an implementation detail.

fc13897

Adapt pivot sampling for bulk version.

2b1b502

jturner314 mentioned this pull request Mar 5, 2022

quantile_mut: fatal runtime error: stack overflow #86

Open

n3vu0r added 2 commits August 30, 2022 11:14

Merge branch 'master' into dual-pivot

3e06315

Recursively grow stack on heap whenever necessary.

78d334b

Also protect non-bulk version from stack overflow.

23782e7

Avoid worst case complexity for equal elements.

359dc0c

n3vu0r added 2 commits April 2, 2023 14:01

Make sampling public and test it.

05b87b4

n3vu0r force-pushed the dual-pivot branch from 05c6956 to 05b87b4 on April 3, 2023 06:47

n3vu0r added 3 commits April 3, 2023 08:54

Add test with large (rev) sorted array of floats.

34952ce

Add tests and enlarge arrays.

4dc45a4

Bump MSRV to 1.56 for array_map/rust-version.

0821516

n3vu0r closed this Apr 7, 2023
