
Parallelize the Packer #11

Closed · wants to merge 3 commits

Conversation

@CryZe commented Feb 6, 2018

Also get rid of a few allocations.
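For context, the diff swaps std iterators for rayon's parallel iterators in the overlap checks and set extension. A minimal sketch of that kind of conversion, using an illustrative Sphere type rather than the crate's own:

use rayon::prelude::*;

#[derive(Clone)]
struct Sphere {
    center: [f64; 3],
    radius: f64,
}

impl Sphere {
    fn overlaps(&self, other: &Sphere) -> bool {
        let d2: f64 = self
            .center
            .iter()
            .zip(&other.center)
            .map(|(a, b)| (a - b) * (a - b))
            .sum();
        d2 < (self.radius + other.radius).powi(2)
    }
}

// Sequential form: short-circuits on the first overlap it finds.
fn overlaps_any(placed: &[Sphere], candidate: &Sphere) -> bool {
    placed.iter().any(|s| s.overlaps(candidate))
}

// Parallel form: the same predicate, split across rayon's thread pool.
fn overlaps_any_par(placed: &[Sphere], candidate: &Sphere) -> bool {
    placed.par_iter().any(|s| s.overlaps(candidate))
}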

@Libbum (Owner) commented Feb 6, 2018

Excellent! Give me a day or so to go over things and do a few checks, but at a cursory glance this looks good. Thanks for your contribution!

(#9)

@Libbum self-requested a review February 6, 2018 16:48
@codecov-io commented Feb 6, 2018

Codecov Report

Merging #11 into master will increase coverage by 0.22%.
The diff coverage is 100%.


@@            Coverage Diff             @@
##           master      #11      +/-   ##
==========================================
+ Coverage   93.23%   93.45%   +0.22%     
==========================================
  Files           5        5              
  Lines         266      275       +9     
==========================================
+ Hits          248      257       +9     
  Misses         18       18
Impacted Files   Coverage Δ
src/lib.rs       98.48% <100%> (+0.07%) ⬆️

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data

Powered by Codecov. Last update c74b3a3...d7551f5.

@Libbum (Owner) commented Feb 6, 2018

I ran into this problem when I did some quick rayon testing myself; it doesn't look too hot with the current implementation...

I've taken the show_in_emerald example and dropped the kiss3d portion, so it just calculates the emerald packing and then the volume fraction. Here are some quick benches:

current master:

cargo run --release --example emerald  15.88s user 0.06s system 99% cpu 15.954 total
cargo run --release --example emerald  15.15s user 0.05s system 99% cpu 15.211 total
cargo run --release --example emerald  15.19s user 0.04s system 99% cpu 15.242 total

parallel branch:

cargo run --release --example emerald  1113.72s user 167.66s system 2591% cpu 49.440 total
cargo run --release --example emerald  1107.87s user 167.37s system 2731% cpu 46.694 total
cargo run --release --example emerald  1091.36s user 176.13s system 2628% cpu 48.222 total

I'm using a 14-core 2690v4, so I expect some overhead cost - especially on a 15 second single-core run - but at the same time I thought rayon was supposed to choose whether or not to parallelise over multiple workers depending on the situation.
Your implementation looks nice, so I'm running some longer-form benchmarks and I'll investigate this a little deeper tomorrow to see if I can locate the bottleneck.
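One common mitigation, sketched here purely as an illustration and not as anything settled in this thread, is to keep small sets on the sequential path and only hand work to rayon once there is enough of it to amortise the thread-pool coordination cost:

use rayon::prelude::*;

// Size below which the sequential path is used; this threshold is a guess
// and would need tuning against the emerald benchmark.
const PAR_THRESHOLD: usize = 4096;

fn any_match<T, F>(items: &[T], pred: F) -> bool
where
    T: Sync,
    F: Fn(&T) -> bool + Sync,
{
    if items.len() < PAR_THRESHOLD {
        items.iter().any(|v| pred(v))
    } else {
        items.par_iter().any(|v| pred(v))
    }
}

With something like this, the set_v check from the diff would read any_match(&set_v, |v| v.overlaps(&s_4_positive)). rayon's with_min_len on indexed parallel iterators is another way to limit splitting, though when the per-element predicate is this cheap the sequential path can simply win outright.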

@Libbum (Owner) commented Feb 7, 2018

The same benchmark with sphere sizes of 0.1 to 0.3:

master:

cargo run --release --example emerald  186.99s user 0.06s system 99% cpu 3:07.06 total
cargo run --release --example emerald  185.42s user 0.04s system 99% cpu 3:05.48 total                                                             
cargo run --release --example emerald  187.25s user 0.05s system 99% cpu 3:07.45 total

parallel:

cargo run --release --example emerald  9870.50s user 1521.66s system 2681% cpu 7:04.85 total
cargo run --release --example emerald  9824.32s user 1510.13s system 2731% cpu 6:54.93 total
cargo run --release --example emerald  9762.66s user 1575.09s system 2636% cpu 7:09.96 total

So it's not a spool up/down issue, which means I can keep testing on the quicker, 15 second benchmark...

src/lib.rs Outdated
@@ -432,14 +442,16 @@ fn identify_f<C: Container>(
)?;

// Make sure the spheres are bounded by the containing geometry and do not overlap any spheres in V
if container.contains(&s_4_positive) && !set_v.iter().any(|v| v.overlaps(&s_4_positive)) {
f.push(s_4_positive);
if container.contains(&s_4_positive) && !set_v.par_iter().any(|v| v.overlaps(&s_4_positive))
@Libbum (Owner) commented:

Dropping this par_iter and the one directly below gives me 18 seconds on the first benchmark and 2 mins 47 secs on the second. So not that much improvement yet...

@Libbum (Owner) commented Feb 7, 2018

By dropping the two par_iters in the set_v overlap checks, we're left with the par_extend call to set_v, which for the short runtime gives us a slight decrease in speed and for the longer runtime only a 1.1x improvement. I'd say this comes down to the cloning that's happening, so unfortunately I don't think the problem is going to be as straightforward as this implementation.

This is certainly not my strong suit, so if you have other suggestions or ideas I'm very happy to consider them.
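For reference, roughly the shape of the pattern being discussed, with placeholder names (dst, src) rather than the crate's own; the clone at the end is the cost suspected above:

use rayon::prelude::*;

#[derive(Clone)]
struct Sphere {
    center: [f64; 3],
    radius: f64,
}

// Filter one sphere set in parallel and clone every surviving sphere into
// the destination set.
fn par_extend_filtered(
    dst: &mut Vec<Sphere>,
    src: &[Sphere],
    keep: impl Fn(&Sphere) -> bool + Sync,
) {
    dst.par_extend(
        src.par_iter()
            .filter(|&s| keep(s)) // predicate runs on the thread pool
            .cloned(),            // each kept sphere is copied into dst
    );
}

If the sphere type really is a small plain struct, each clone is only a copy, so it may be worth profiling whether the parallel filter's coordination, rather than the cloning itself, is what limits the speed-up here.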

&& nalgebra::distance(&curr_sphere.center, &s_dash.center)
<= curr_sphere.radius + s_dash.radius + 2. * new_radius
})
.cloned(),
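The bound in that filter follows from the triangle inequality: a new sphere of radius new_radius placed tangent to curr_sphere has its centre at distance curr_sphere.radius + new_radius from curr_sphere's centre, and it can only overlap an s_dash whose centre lies within new_radius + s_dash.radius of its own, so any s_dash further than curr_sphere.radius + s_dash.radius + 2 * new_radius from curr_sphere can be skipped (and never cloned). A sketch of that check in isolation, with an illustrative Sphere type rather than the crate's own:

use nalgebra::Point3;

struct Sphere {
    center: Point3<f32>,
    radius: f32,
}

// True if a sphere of radius `new_radius` tangent to `curr_sphere` could
// possibly touch `s_dash`; anything further away can be dropped up front.
fn could_interfere(curr_sphere: &Sphere, s_dash: &Sphere, new_radius: f32) -> bool {
    nalgebra::distance(&curr_sphere.center, &s_dash.center)
        <= curr_sphere.radius + s_dash.radius + 2. * new_radius
}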
@Libbum (Owner) commented:

Well that totally makes sense. :) I'm out of town today, so will run those quick benches tomorrow when I get home and see how they compare.

@Libbum (Owner) commented:

So I'm not getting any timing difference with this change at all. Do you have any benches of your own that I can take a look at? Perhaps it's only my machine?
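A minimal harness both branches could run to produce comparable numbers, assuming a pack-then-measure structure like the stripped-down emerald example; run_emerald_packing is a placeholder here, not the crate's API:

use std::time::Instant;

// Hypothetical stand-in for the emerald packing plus volume-fraction
// calculation from the stripped-down example.
fn run_emerald_packing() {}

fn main() {
    let start = Instant::now();
    run_emerald_packing();
    println!("packing took {:.2?}", start.elapsed());
}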
