reference and parallelization #10

chenming-wu · 2016-01-16T21:22:35Z

Hi gabyx. Thanks for offering your work. It is valuable for me.
Can you offer the reference paper of your implementation? I think it would be helpful to understand your code by my own. Question 2 is - do you think ApproxMVBB is easy to parallelization by means of OpenMP/MPI etc. ? In my experiment, the speed using ApproxMVBB almostly equals to the speed of rotating calipers method with parallelization.

gabyx · 2016-01-16T22:22:03Z

Sure :-):

Here are the Papers used for the implementation:
They are quite mathematical, especially the second one.
So you 're probably better off by reading the paper a bit while looking at the source code.

@Article{malandain2002,
Author = {Gr'egoire Malandain and Jean-Daniel Boissonnat},
Journal = {International Journal of Computational Geometry & Applications},
Month = {December},
Number = {6},
Pages = {489 - 510},
Timestamp = {2015.09.02},
Title = {Computing the Diameter of a Point Set},
Volume = {12},
Year = {2002}}

and

@inproceedings{barequet2001,
Author = {Gill Barequet and Sariel Har-peled},
Booktitle = {In Proc. 10th ACM-SIAM Sympos. Discrete Algorithms},
Pages = {38--91},
Timestamp = {2015.09.02},
Title = {Efficiently Approximating the Minimum-Volume Bounding Box of a Point Set in Three Dimensions},
Year = {2001}}

What parallelization concerns:
I was not yet thinking about that, but the thing is there are several parts which could benefit from that.
I dont know how to parallelize the graham scan or the rotating calipers method, but you already have one as you mentioned?

But I think one straight forward way of some parallelization would be to run the exhaustive grid search (7) in parallel, that would be easy so each thread (OpenMP) could do: one projection into a direction, 2d convex hull, minimal area rectangle and construct the bounding box.
and then after each iteration, synchronize (mutex on the volume value and oobb storage) to store the best result. and continuing till max iteration is reached.

with MPI: hmm, hm... probably in the same fashion (for a start), there is some communication involved, either communicating point indices (if every process has the point cloud) or communication whole point data.

Another thing there are lots of 2d projections of the 3d points and -> that means if we can reduce run time of https://github.com/gabyx/ApproxMVBB/blob/master/include/ApproxMVBB/ProjectedPointSet.hpp
that would certainly be good :-).
Eigen3 already uses SSE instructions, so thats good. And the projections for the Exhaustive Grid Search in step 7 are only done on fraction of the initial point cloud : min/max points (approximating the convex hull) obtained from the gridded point cloud as described in step 6. Roughly 400-1000 points or even less...

There is another interesting paper which is claimed to be even faster and would be cool if that could be implemented in this library as well:

@Article{chang2011,
Acmid = {2019641},
Address = {New York, NY, USA},
Articleno = {122},
Author = {Chang, Chia-Tche and Gorissen, Bastien and Melchior, Samuel},
Doi = {10.1145/2019627.2019641},
Issn = {0730-0301},
Issue_Date = {October 2011},
Journal = {ACM Trans. Graph.},
Keywords = {Computational geometry, bounding box, manifolds, optimization},
Month = oct,
Number = {5},
Numpages = {16},
Pages = {122:1--122:16},
Publisher = {ACM},
Timestamp = {2015.09.03},
Title = {Fast Oriented Bounding Box Optimization on the Rotation Group {$SO(3,\mathbb{R})$}},
Url = {http://doi.acm.org/10.1145/2019627.2019641},
Volume = {30},
Year = {2011},
Bdsk-Url-1 = {http://doi.acm.org/10.1145/2019627.2019641},
Bdsk-Url-2 = {http://dx.doi.org/10.1145/2019627.2019641}}

It is based on optimization with the Nelder-Mead simplex (dont know if that is straight forward to parallelize, its a genetic opt. aglorithm.

gabyx · 2016-01-16T22:23:49Z

Rotating Calipers in 3d? or in 2d?

chenming-wu · 2016-01-20T02:30:40Z

Hi gabyx! Thanks for your detailed reply :)

I think the algorithm I parallelized is exhaustive grid search you mentioned. Rotating Calipers in 2D is only used at computing 2D minimal area rectangle.

All of the three papers above is interesting. I will devour them and then come back to you. Thank you.

gabyx · 2016-02-04T17:43:24Z

Hi, I made considerable improvements to the library, fixed some small bugs,
conceptually not alot has changed, functions are still the same.

also I added unit tests. They should hopefully turn out to be correct. if they dont, then its numerically inaccuracy: visualizing the tests if the fail is always the first thing to do =)

regards

gabyx · 2017-02-26T14:10:13Z

The library has now some openmp support, one place where openmp is commented out yet still makes some problems (more investigations needed), but so far its a huge improvement for speed.

sheikware · 2019-12-13T16:27:42Z

Curious under what circumstances you got the huge speed improvement in the openmp parallelization (around the grid search). Unfortunately my testing is done in MSVC and WSL (not native linux), so I may be missing some of the benefits (side note: I had to re-write the parallelization for MSVC because it is stuck on openmp 2.0 and doesn't support user-defined reductions... I can make a PR for it if you want us poor souls that have to use MSVC)

Grid search: 5
Point Samples: 10000

MSVC Single threaded: ~20000 msec
MSVC Openmp (critical section version): ~2000 msec

WSL Single threaded: ~9000 msec
WSL Openmp (reduction version): ~19000 msec
WSL Openmp (critical section version that I used on MSVC): ~21000 msec

gabyx · 2019-12-17T20:20:46Z

Strange Benchmark: Did you use full optimization and release?
A PR is welcome :-)
I prefere to have some sort of switch depending on the OpenMP version (is there a macro?)

sheikware · 2019-12-17T22:00:46Z

For WSL it was -03 -DNDEBUG (for MSVC it is /O2 /Ob2 /D "NDEBUG").

Code being timed is below where the input cloud has 100,000 points (generated randomly from a uniform distribution in the range of -100.0 to 100.0).

ApproxMVBB::OOBB approx_obb = ApproxMVBB::approximateMVBB(cloud.pts(), 
		0.001, /*epsilon*/
		10000, /*point samples*/
		5, /*grid size*/
		0, /*diameter optizmiation loops*/
		5 /*grid search optimization loops*/
		);
approx_obb.expandToMinExtentRelative(0.1);
... = approx_obb.m_q_KI.matrix();
... = approx_obb.m_minPoint;
... = approx_obb.m_maxPoint;

I'm not aware of any OpenMP MACRO that can be used... I just check if WIN32 and use the OpenMP 2.0 style critical section.

I will push the PR... but by supporting OpenMP 2.0 it makes the code much uglier compared to user-defined reductions :)

gabyx · 2020-01-23T18:15:02Z

see #38

gabyx · 2020-01-23T18:15:22Z

Could you test this?

gabyx · 2020-01-23T18:16:23Z

Strange is, that its so slow with OpenMp 4 in WSL? its twice as slow lol

gabyx · 2020-01-23T18:17:49Z

Honestly I never did a relevant profiling... Needs further investigation. Maybe profile the code ?

gabyx added the enhancement label Feb 4, 2016

gabyx self-assigned this Feb 26, 2017

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

reference and parallelization #10

reference and parallelization #10

chenming-wu commented Jan 16, 2016

gabyx commented Jan 16, 2016

gabyx commented Jan 16, 2016

chenming-wu commented Jan 20, 2016

gabyx commented Feb 4, 2016

gabyx commented Feb 26, 2017

sheikware commented Dec 13, 2019

gabyx commented Dec 17, 2019

sheikware commented Dec 17, 2019

gabyx commented Jan 23, 2020

gabyx commented Jan 23, 2020

gabyx commented Jan 23, 2020

gabyx commented Jan 23, 2020

reference and parallelization #10

reference and parallelization #10

Comments

chenming-wu commented Jan 16, 2016

gabyx commented Jan 16, 2016

gabyx commented Jan 16, 2016

chenming-wu commented Jan 20, 2016

gabyx commented Feb 4, 2016

gabyx commented Feb 26, 2017

sheikware commented Dec 13, 2019

gabyx commented Dec 17, 2019

sheikware commented Dec 17, 2019

gabyx commented Jan 23, 2020

gabyx commented Jan 23, 2020

gabyx commented Jan 23, 2020

gabyx commented Jan 23, 2020