
Parallelize the Packer #11

Closed · wants to merge 3 commits

Conversation

@CryZe commented Feb 6, 2018

Also get rid of a few allocations.
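For context, the diff swaps std iterators for rayon's parallel iterators in the overlap checks and set extension. A minimal sketch of that kind of conversion, using an illustrative Sphere type rather than the crate's own:

use rayon::prelude::*;

#[derive(Clone)]
struct Sphere {
    center: [f64; 3],
    radius: f64,
}

impl Sphere {
    fn overlaps(&self, other: &Sphere) -> bool {
        let d2: f64 = self
            .center
            .iter()
            .zip(&other.center)
            .map(|(a, b)| (a - b) * (a - b))
            .sum();
        d2 < (self.radius + other.radius).powi(2)
    }
}

// Sequential form: short-circuits on the first overlap it finds.
fn overlaps_any(placed: &[Sphere], candidate: &Sphere) -> bool {
    placed.iter().any(|s| s.overlaps(candidate))
}

// Parallel form: the same predicate, split across rayon's thread pool.
fn overlaps_any_par(placed: &[Sphere], candidate: &Sphere) -> bool {
    placed.par_iter().any(|s| s.overlaps(candidate))
}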

@Libbum (Owner) commented Feb 6, 2018

Excellent! Give me a day or so to go over things and do a few checks, but at a cursory glance this looks good. Thanks for your contribution!

(#9)

@Libbum self-requested a review February 6, 2018 16:48
@codecov-io commented Feb 6, 2018

Codecov Report

Merging #11 into master will increase coverage by 0.22%.
The diff coverage is 100%.


@@            Coverage Diff             @@
##           master      #11      +/-   ##
==========================================
+ Coverage   93.23%   93.45%   +0.22%     
==========================================
  Files           5        5              
  Lines         266      275       +9     
==========================================
+ Hits          248      257       +9     
  Misses         18       18
Impacted Files   Coverage Δ
src/lib.rs       98.48% <100%> (+0.07%) ⬆️

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data

Powered by Codecov. Last update c74b3a3...d7551f5.

@Libbum (Owner) commented Feb 6, 2018

I ran into this problem when I did some quick rayon testing myself; it doesn't look too hot with the current implementation...

I've taken the show_in_emerald example and dropped the kiss3d portion, so it just calculates the emerald packing and then the volume fraction. Here are some quick benches:

current master:

cargo run --release --example emerald  15.88s user 0.06s system 99% cpu 15.954 total
cargo run --release --example emerald  15.15s user 0.05s system 99% cpu 15.211 total
cargo run --release --example emerald  15.19s user 0.04s system 99% cpu 15.242 total

parallel branch:

cargo run --release --example emerald  1113.72s user 167.66s system 2591% cpu 49.440 total
cargo run --release --example emerald  1107.87s user 167.37s system 2731% cpu 46.694 total
cargo run --release --example emerald  1091.36s user 176.13s system 2628% cpu 48.222 total

I'm using a 14-core 2690v4, so I expect some overhead cost - especially on a 15 second single-core run - but at the same time I thought rayon was supposed to choose whether or not to parallelise over multiple workers depending on the situation.
Your implementation looks nice, so I'm running some longer-form benchmarks and I'll investigate this a little deeper tomorrow to see if I can locate the bottleneck.
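One common mitigation, sketched here purely as an illustration and not as anything settled in this thread, is to keep small sets on the sequential path and only hand work to rayon once there is enough of it to amortise the thread-pool coordination cost:

use rayon::prelude::*;

// Size below which the sequential path is used; this threshold is a guess
// and would need tuning against the emerald benchmark.
const PAR_THRESHOLD: usize = 4096;

fn any_match<T, F>(items: &[T], pred: F) -> bool
where
    T: Sync,
    F: Fn(&T) -> bool + Sync,
{
    if items.len() < PAR_THRESHOLD {
        items.iter().any(|v| pred(v))
    } else {
        items.par_iter().any(|v| pred(v))
    }
}

With something like this, the set_v check from the diff would read any_match(&set_v, |v| v.overlaps(&s_4_positive)). rayon's with_min_len on indexed parallel iterators is another way to limit splitting, though when the per-element predicate is this cheap the sequential path can simply win outright.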

@Libbum (Owner) commented Feb 7, 2018

The same benchmark with sphere sizes of 0.1 to 0.3:

master:

cargo run --release --example emerald  186.99s user 0.06s system 99% cpu 3:07.06 total
cargo run --release --example emerald  185.42s user 0.04s system 99% cpu 3:05.48 total                                                             
cargo run --release --example emerald  187.25s user 0.05s system 99% cpu 3:07.45 total

parallel:

cargo run --release --example emerald  9870.50s user 1521.66s system 2681% cpu 7:04.85 total
cargo run --release --example emerald  9824.32s user 1510.13s system 2731% cpu 6:54.93 total
cargo run --release --example emerald  9762.66s user 1575.09s system 2636% cpu 7:09.96 total

So it's not a spool up/down issue, which means I can keep testing on the quicker, 15 second benchmark...

src/lib.rs Outdated
@@ -432,14 +442,16 @@ fn identify_f<C: Container>(
)?;

// Make sure the spheres are bounded by the containing geometry and do not overlap any spheres in V
if container.contains(&s_4_positive) && !set_v.iter().any(|v| v.overlaps(&s_4_positive)) {
f.push(s_4_positive);
if container.contains(&s_4_positive) && !set_v.par_iter().any(|v| v.overlaps(&s_4_positive))
@Libbum (Owner) commented:

Dropping this par_iter and the one directly below gives me 18 seconds on the first benchmark and 2 mins 47 secs on the second. So not that much improvement yet...

@Libbum (Owner) commented Feb 7, 2018

By dropping the two par_iters in the set_v overlap checks, we're left with the par_extend call to set_v, which for the short runtime gives us a slight decrease in speed and for the longer runtime only a 1.1x improvement. I'd say this comes down to the cloning that's happening, so unfortunately I don't think the problem is going to be as straightforward as this implementation.

This is certainly not my strong suit, so if you have other suggestions or ideas I'm very happy to consider them.
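For reference, roughly the shape of the pattern being discussed, with placeholder names (dst, src) rather than the crate's own; the clone at the end is the cost suspected above:

use rayon::prelude::*;

#[derive(Clone)]
struct Sphere {
    center: [f64; 3],
    radius: f64,
}

// Filter one sphere set in parallel and clone every surviving sphere into
// the destination set.
fn par_extend_filtered(
    dst: &mut Vec<Sphere>,
    src: &[Sphere],
    keep: impl Fn(&Sphere) -> bool + Sync,
) {
    dst.par_extend(
        src.par_iter()
            .filter(|&s| keep(s)) // predicate runs on the thread pool
            .cloned(),            // each kept sphere is copied into dst
    );
}

If the sphere type really is a small plain struct, each clone is only a copy, so it may be worth profiling whether the parallel filter's coordination, rather than the cloning itself, is what limits the speed-up here.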

&& nalgebra::distance(&curr_sphere.center, &s_dash.center)
<= curr_sphere.radius + s_dash.radius + 2. * new_radius
})
.cloned(),
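The bound in that filter follows from the triangle inequality: a new sphere of radius new_radius placed tangent to curr_sphere has its centre at distance curr_sphere.radius + new_radius from curr_sphere's centre, and it can only overlap an s_dash whose centre lies within new_radius + s_dash.radius of its own, so any s_dash further than curr_sphere.radius + s_dash.radius + 2 * new_radius from curr_sphere can be skipped (and never cloned). A sketch of that check in isolation, with an illustrative Sphere type rather than the crate's own:

use nalgebra::Point3;

struct Sphere {
    center: Point3<f32>,
    radius: f32,
}

// True if a sphere of radius `new_radius` tangent to `curr_sphere` could
// possibly touch `s_dash`; anything further away can be dropped up front.
fn could_interfere(curr_sphere: &Sphere, s_dash: &Sphere, new_radius: f32) -> bool {
    nalgebra::distance(&curr_sphere.center, &s_dash.center)
        <= curr_sphere.radius + s_dash.radius + 2. * new_radius
}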
@Libbum (Owner) commented:

Well that totally makes sense. :) I'm out of town today, so will run those quick benches tomorrow when I get home and see how they compare.

@Libbum (Owner) commented:

So I'm not getting any timing difference with this change at all. Do you have any benches of your own that I can take a look at? Perhaps it's only my machine?
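A minimal harness both branches could run to produce comparable numbers, assuming a pack-then-measure structure like the stripped-down emerald example; run_emerald_packing is a placeholder here, not the crate's API:

use std::time::Instant;

// Hypothetical stand-in for the emerald packing plus volume-fraction
// calculation from the stripped-down example.
fn run_emerald_packing() {}

fn main() {
    let start = Instant::now();
    run_emerald_packing();
    println!("packing took {:.2?}", start.elapsed());
}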
