Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

reference and parallelization #10

Open
chenming-wu opened this issue Jan 16, 2016 · 12 comments
Open

reference and parallelization #10

chenming-wu opened this issue Jan 16, 2016 · 12 comments
Assignees

Comments

@chenming-wu
Copy link

Hi gabyx. Thanks for offering your work. It is valuable for me.
Can you offer the reference paper of your implementation? I think it would be helpful to understand your code by my own. Question 2 is - do you think ApproxMVBB is easy to parallelization by means of OpenMP/MPI etc. ? In my experiment, the speed using ApproxMVBB almostly equals to the speed of rotating calipers method with parallelization.

@gabyx
Copy link
Owner

gabyx commented Jan 16, 2016

Sure :-):

Here are the Papers used for the implementation:
They are quite mathematical, especially the second one.
So you 're probably better off by reading the paper a bit while looking at the source code.

@Article{malandain2002,
Author = {Gr'egoire Malandain and Jean-Daniel Boissonnat},
Journal = {International Journal of Computational Geometry & Applications},
Month = {December},
Number = {6},
Pages = {489 - 510},
Timestamp = {2015.09.02},
Title = {Computing the Diameter of a Point Set},
Volume = {12},
Year = {2002}}

and

@inproceedings{barequet2001,
Author = {Gill Barequet and Sariel Har-peled},
Booktitle = {In Proc. 10th ACM-SIAM Sympos. Discrete Algorithms},
Pages = {38--91},
Timestamp = {2015.09.02},
Title = {Efficiently Approximating the Minimum-Volume Bounding Box of a Point Set in Three Dimensions},
Year = {2001}}

What parallelization concerns:
I was not yet thinking about that, but the thing is there are several parts which could benefit from that.
I dont know how to parallelize the graham scan or the rotating calipers method, but you already have one as you mentioned?

But I think one straight forward way of some parallelization would be to run the exhaustive grid search (7) in parallel, that would be easy so each thread (OpenMP) could do: one projection into a direction, 2d convex hull, minimal area rectangle and construct the bounding box.
and then after each iteration, synchronize (mutex on the volume value and oobb storage) to store the best result. and continuing till max iteration is reached.

with MPI: hmm, hm... probably in the same fashion (for a start), there is some communication involved, either communicating point indices (if every process has the point cloud) or communication whole point data.

Another thing there are lots of 2d projections of the 3d points and -> that means if we can reduce run time of https://github.com/gabyx/ApproxMVBB/blob/master/include/ApproxMVBB/ProjectedPointSet.hpp
that would certainly be good :-).
Eigen3 already uses SSE instructions, so thats good. And the projections for the Exhaustive Grid Search in step 7 are only done on fraction of the initial point cloud : min/max points (approximating the convex hull) obtained from the gridded point cloud as described in step 6. Roughly 400-1000 points or even less...

There is another interesting paper which is claimed to be even faster and would be cool if that could be implemented in this library as well:

@Article{chang2011,
Acmid = {2019641},
Address = {New York, NY, USA},
Articleno = {122},
Author = {Chang, Chia-Tche and Gorissen, Bastien and Melchior, Samuel},
Doi = {10.1145/2019627.2019641},
Issn = {0730-0301},
Issue_Date = {October 2011},
Journal = {ACM Trans. Graph.},
Keywords = {Computational geometry, bounding box, manifolds, optimization},
Month = oct,
Number = {5},
Numpages = {16},
Pages = {122:1--122:16},
Publisher = {ACM},
Timestamp = {2015.09.03},
Title = {Fast Oriented Bounding Box Optimization on the Rotation Group {$SO(3,\mathbb{R})$}},
Url = {http://doi.acm.org/10.1145/2019627.2019641},
Volume = {30},
Year = {2011},
Bdsk-Url-1 = {http://doi.acm.org/10.1145/2019627.2019641},
Bdsk-Url-2 = {http://dx.doi.org/10.1145/2019627.2019641}}

It is based on optimization with the Nelder-Mead simplex (dont know if that is straight forward to parallelize, its a genetic opt. aglorithm.

@gabyx
Copy link
Owner

gabyx commented Jan 16, 2016

Rotating Calipers in 3d? or in 2d?

@chenming-wu
Copy link
Author

Hi gabyx! Thanks for your detailed reply :)

I think the algorithm I parallelized is exhaustive grid search you mentioned. Rotating Calipers in 2D is only used at computing 2D minimal area rectangle.

All of the three papers above is interesting. I will devour them and then come back to you. Thank you.

@gabyx
Copy link
Owner

gabyx commented Feb 4, 2016

Hi, I made considerable improvements to the library, fixed some small bugs,
conceptually not alot has changed, functions are still the same.

also I added unit tests. They should hopefully turn out to be correct. if they dont, then its numerically inaccuracy: visualizing the tests if the fail is always the first thing to do =)

regards

@gabyx
Copy link
Owner

gabyx commented Feb 26, 2017

The library has now some openmp support, one place where openmp is commented out yet still makes some problems (more investigations needed), but so far its a huge improvement for speed.

@gabyx gabyx self-assigned this Feb 26, 2017
@sheikware
Copy link

Curious under what circumstances you got the huge speed improvement in the openmp parallelization (around the grid search). Unfortunately my testing is done in MSVC and WSL (not native linux), so I may be missing some of the benefits (side note: I had to re-write the parallelization for MSVC because it is stuck on openmp 2.0 and doesn't support user-defined reductions... I can make a PR for it if you want us poor souls that have to use MSVC)

Grid search: 5
Point Samples: 10000

MSVC Single threaded: ~20000 msec
MSVC Openmp (critical section version): ~2000 msec

WSL Single threaded: ~9000 msec
WSL Openmp (reduction version): ~19000 msec
WSL Openmp (critical section version that I used on MSVC): ~21000 msec

@gabyx
Copy link
Owner

gabyx commented Dec 17, 2019

Strange Benchmark: Did you use full optimization and release?
A PR is welcome :-)
I prefere to have some sort of switch depending on the OpenMP version (is there a macro?)

@sheikware
Copy link

For WSL it was -03 -DNDEBUG (for MSVC it is /O2 /Ob2 /D "NDEBUG").

Code being timed is below where the input cloud has 100,000 points (generated randomly from a uniform distribution in the range of -100.0 to 100.0).

ApproxMVBB::OOBB approx_obb = ApproxMVBB::approximateMVBB(cloud.pts(), 
		0.001, /*epsilon*/
		10000, /*point samples*/
		5, /*grid size*/
		0, /*diameter optizmiation loops*/
		5 /*grid search optimization loops*/
		);
approx_obb.expandToMinExtentRelative(0.1);
... = approx_obb.m_q_KI.matrix();
... = approx_obb.m_minPoint;
... = approx_obb.m_maxPoint;

I'm not aware of any OpenMP MACRO that can be used... I just check if WIN32 and use the OpenMP 2.0 style critical section.

I will push the PR... but by supporting OpenMP 2.0 it makes the code much uglier compared to user-defined reductions :)

@gabyx
Copy link
Owner

gabyx commented Jan 23, 2020

see #38

@gabyx
Copy link
Owner

gabyx commented Jan 23, 2020

Could you test this?

@gabyx
Copy link
Owner

gabyx commented Jan 23, 2020

Strange is, that its so slow with OpenMp 4 in WSL? its twice as slow lol

@gabyx
Copy link
Owner

gabyx commented Jan 23, 2020

Honestly I never did a relevant profiling... Needs further investigation. Maybe profile the code ?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

3 participants