-
Notifications
You must be signed in to change notification settings - Fork 216
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
polygon/ring self-intersection correct #868
Comments
Hi and thanks for this proposal! I think that a function removing self-intersections should be a part of the library. AFAIK @barendgehrels was working on this and the result was In Boost.Geometry we divide algorithms into two parts, coordinate-system agnostic and specific/dependent. The former we call algorithms and the latter strategies. So strategies define some coordinate-system specific building blocks like calculating a side of a point wrt a line or performing an intersection of two segments, etc. Then the coordinate-system agnostic algorithm uses these building blocks. I'm not sure if naming it |
We tried (from the tilemaker project), but we never got this extension to compile. You can see on the github I generate the examples from @barendgehrels: TBH i do not understand @barendgehrels approach, so I can not judge on how this compares. He had issues on certain polygons, and this is a generalized approach. Only one approach would make sense. There is also a discussion going on about this method on JTS: GEOS also have done an extensive comparison on polygon repair methods: I think in your definition (correct me if I am wrong), this is an algorithm, because it uses the following coordinate specific operations:
At the moment it is a full-correct implementation, also correcting orientation, closing ring, removing invalid points. But i don't suggest integrating this, i think the self-intersecting removal is the only thing needed for boost geometry. I think also a function like this should be part of boost geometry, many projects can benefit from this. |
Hi Wouter, thank you for approaching us! It looks promising. I will look into it in more detail. I didn't look at our What about the "BEER-WARE" license 😃 can it be changed into Boost Software License (your name can stay there, of course)? |
yes, of course. But you owe me a biertje, ok ? :D It seems the topic became very relevant recently. Projects working on this and some recent scientific papers. |
Sure, I owe you a biertje! 😃 Will you check patents? Indeed I've troubles myself compiling dissolve too, we changed quite some strategies internally, and the extensions are sometimes not updated accordingly - there is no automatic CI for them. This makes it a bit harder to compare the algorithms. |
Quick search show, there are patents, related to self-intersecting polygons, but some are chinese: If you can compare the results, i would be interested to hear about your findings. |
I did a comparison in terms of results and of performance. In short I think we can use this new algorithm. I like the two variants (correct and correct_odd_even). I agree with @awulkiew that the namings are not convenient, but we can fix that later. I did not (yet) repair This is my sample program, it's straightforward and not sophisticated
It can be compiled, for example, like this The results are (noted as dissolved, correct, correct o/e, first area/validity of results, then times in milliseconds):
svgs (generated in The performance depends on the input. This is more often the case. Sometimes dissolve is faster, sometimes correct is faster. The For the two random generated polygons, The In summary: no version is just better, we can investigate more, and maybe even publish both algorithms. The |
I can include more pictures on request, but you can also generate them yourselves. And it would actually be useful to get more realistic random examples (for example with generated stars on random overlapping places and fix those, I will probably adapt the |
Yes, odd-even looks the most correct in this case. This document describes also a lot of test cases which really show the differences in how the union/overlapping regions can be handled: Different approaches to union/intersect will result in different outputs. It is difficult to say what is correct and what is not. Also, regarding the union, i now perform the "union" myself: Ideally, i would just like to use a normal union. But this does slow down the approach severely. |
maybe it is not that strange to provide a "decompose into simple polygons" functions. And implement correct based on this, but also allow the user to just get hold of the decomposed polygon him/herself. |
the non-zero winding is calculated based on the generated polygon. So you only have an outer, with an inner inside. But I guess the winding should be based on the input polygon. I am not sure how to do this easily, do i have to perform a raytracing for this, or is there some other way ? https://github.com/kleunen/boost_geometry_correct/blob/main/correct.hpp#L290-L301 I guess maybe some sweep over the generated polygons should be performed |
The is also the possibility to always combine an outer as an outer. In case of the random generated polygon, this would mean, the complete outer perimeter of the polygon is traced and the inside is completely filled. This is fast, because you can just union the generated polygons. No need to detect winding/odd-even. |
One way would be to provide a "complete" solution, several ways of calculating this resulting valid polygon. Then the user would be able to choose the one that is the most suitable for a specific application. Another way would be the practical approach. Instead of considering a totally random polygon which could probably only be the result of an error in an algorithm or some garbage from memory we could think about the reasons why we may be forced to deal with self-intersections in practice. Why a polygon may be self-intersecting? Is it a result of some faulty algorithm? Is it a human-error while creating areas in OSM? Can this polygon be arbitrary or would there be e.g. several classes of self-intersecting polygons? What the library users will the most typically deal with? Etc. For instance, is there somewhere a collection of self-intersecting polygons that can be found in OSM right now? From the top of my head I can think of two classes of self-intersectiong polygons which I think are the most common (I may be wrong).
The first case is simple, parts of the exterior ring has to be converted into interior rings. The second one is more ambiguous. I'd argue that what should probably happen would be to preserve all interiors and exteriors based on winding which means that the small triangle on the upper right would be removed and the small triangle on the left would be preserved and merged with the rest of the interior. So the result would be a polygon with a hole. It'd argue that it'd be sufficient for an algorithm to cover these plus maybe several other classes of self-intersections found in practice (maybe you can think about more or get them from the OSM). If I had to choose between several algorithms successfully dealing with these I'd probably choose the one of the least complexity or the fastest for my specific use case. |
@awulkiew On one hand, yes, you are right. In OSM data indeed many of these errors are quite simple. I think they are caused by GPS devices that are used to take the measurement, indeed sometimes the user enters them correctly. And some data is retrieved from satellite images which are converted to polygons. The errors you describe are indeed very common. And also sometimes multipolygons are drawn like one polygon: Spikes are also very common, that is why I added also a spike removal threshold. Spikes are usually two completely overlapping segments. They result in a ABA type sub polygon after intersection removal, and they have no significant area. Indeed performance is important. In the OSM data there are actually many self-intersecting polygons. Many of them, actually, render just fine. Because there javascript libraries can handle many cases of self-intersecting polygons. Some, however, are not rendered ok. And this means you will have a large region covered with some rogue polygons. The approach I implemented now in the default correct, handles these cases you mention correctly. If the sub polygon is oriented clockwise, it will become an outer. If it is CCW, it will become an inner. If you want to fully correctly handle non-zero winding, you may need to implement raytracing. This may affect performance very negatively. It seems the odd-even approach, is fully correct according to how odd-even should behave. If I benchmark this, on actual geo data, the polygon that @barendgehrels describes, both the odd-even and non-zero winding approach finish in 1600ms on my mobile cpu / slow / embedded linux device. Probably because there are self-intersection in the outer in this data, but actually no overlapping inners or whatever. So it actually suprises me that in the benchmark of barend there is a difference in performance. |
I think it is possible to calculate the NZW rule without too much negative performance impact. It only applies to overlapping polygons. And because all self-intersection is removed, the polygons are not intersecting anymore. So you can put them in an rtree, ask for overlapping polygons, and basicly calculate the position (inside or out) of this polygon based on the orientation of the rings around. I guess this can also speed up odd-even, because this needs to be calculated only for regions which are overlapping. Other regions you can just union together. |
I will have a look to see how this combining can be improved |
This means that for my second example the small triangle should be preserved because this might be a user drawing multipolygon like that. In general what OSM users will do is going to depend on how the polygon is displayed in the app they're using for editing and how the tiles are rendered on the official website. So basically the expected self-intersections will be those which Potlatch and Mappnik (and maybe JOSM) show well. But that's only one use-case. Btw, my images were generated with inkscape. So this program displays the self-intersecting polygons the same way that we're describing here too. |
Yes, this little area in the corner. It really depends on the situation if you want to keep it. Sometimes you have these small areas near the corners of building, and in that case, you want to remove it. Sometimes, this area is a whole part of the building, and you want to keep it. So you can play a little with this "spike" removal, and remove these small non-significant areas. And hopefully, you will get it right in most cases. Yes, you are right. In the OSM case, you want to follow the behaviour of the editor. |
This comment has been minimized.
This comment has been minimized.
I optimized now the non-zero winding approach, but #869 is now preventing me from further optimizing the odd-even approach. @barendgehrels can you maybe run a benchmark again on your machine ? |
Clipper defines two additional polygon fill types:
I like this positive and negative, but I would define them differently:
Positive can be generated very quickly, it does not need to calculate the winding. This might be useful when performance is a concern. Negative i don't see a practical purpose, but if you have positive, why not have negative ? |
Well done! I can confirm that it is faster now, sometimes much faster, and that the
|
The odd even approach can have the same performance as the other approach. But for this, the difference should be fixed. |
I'm looking at the code. It might be possible to change I don't know if it is faster, but it uses the more standard way. Anyway, it's just a question, I'm just curious to the difference. It is not really necessary to change it. I added some statements and the current method looks fast anyway. |
Yes I noticed the new issue. Will look at it. |
Yes it sounds like this function can be used. The other things is that i basicly crafted my own union which checks intersection before performing the union. This is much faster than the standard union. But i dont understand why. https://github.com/kleunen/boost_geometry_correct/blob/main/correct.hpp#L27-L49 It adds a polygon to an existing multipolygon. |
I tried this, but I think this algorithm misses the cases when two segments have multiple intersection points (they are collinear). Is that possible ? It only returns 1 turn in that case, while 2 turns should be found. |
It can return 2 turns, probably you have to change the policy. There is a policy to get back all turns (that is not the default policy). But you might get some duplicates in that case (where a segment arrives at another segment, and then starts from it) |
Ok. This is no an issue I think i figured out the policy |
I added generation of the reference cases (the polygons only for now), from this document: My NZW approach generates the same output as JTS - Buffer(0) approach Only difference is the handling of the inners. GEOS uses sym_difference to combine outers with the inners. So inners might become outers again. I use difference now, inners always stay holes, even after "repair". I am not sure what would be the "correct" behaviour. You can see in this case, part of the hole is outside the polygon. This does not become a separate polygon (yellow block is an inner): |
There are many ways to handle this and it depends on the situation if it is correct, or not. Therefore I like the different rules / possibilities you implemented, such that users have some choice. In this particular case the GEOS looks better - but just because of the look. If the right polygon was indeed intended as "negative" then it makes sense to not include it. |
You did! 👍 |
Then possibly we should make this behaviour configurable. So you have the filling behaviour, the combining of outer/inners and the combining of polygons within a multipolygon. We can make some sensible default configurations, but allow the user to define other behaviours as well. |
That sounds awesome. |
But can you comment on, why this union is so much faster than the normal union: Ideally I would use a divide/conquer loop like this: Because if the combine function is configurable to be union/sym_difference: This is easier. Is it because less memory allocations are performed, because the union is "in-place" ? |
I don't know it at this moment... It is a normal union, by the way. So what would be the alternative? We can only combine two polygons with a union, like you do. We can't combine multiple polygons. |
I don't understand all your code yet. By the way, the sorting lambda recalculates the area (probably) multiple times, that might be optimized. |
yes, that is something to optimize. well, the alternative for the union, if you perform a union of a multipolygon, with a polygon. And the multipolygon has, say, 10 polygons in there, but only 1 intersects. You only have to update 1 polygon within the multipolygon. But now, i think, all 10 are copied to the output. But ideally, you want to "move" them to the output, if the input multipolygon is not relevant anymore. Not this: union_(mp, p, output) but this: union_(std::move(mp), std::move(p), output) |
I think I understand. But our code doesn't work like that. It leaves all input untouched, even if you don't need it anymore. So yes, then you have to implement a hand crafted version like you did. |
I made the combine/difference configurable, so the sym_difference can be used now in odd/even. I think the default output is now the same as GEOS (for odd/even). I don't see that much difference in performance, only for this large polygon (WKT) i import from file. edit found the performance regression |
It seems there is still an issue with multiple intersections, if multiple segments intersect at the same point. One paper, does not mention this issue at all. The other paper states you can select any outgoing edge and it does not matter. This appears to be true, to some extent. It works, but you may end up at a different ring, and the generated ring may contain a region at the start which overlaps. This can be handled, but it is not so nice. Both papers test with random generated polygons. The probability of a multiple intersection is very low in this case, probably that is why it works in these papers. The test set from GEOS contains "hand" drawn polygons with multiple intersections. |
I updated the approach to handle multiple intersections correctly. Possibly the approach can be made more robust, but this requires quite a different approach. So this needs quite some rework and testing and not sure if it really is better. I think the implementation is very solid now and configurable. But would be good to perform more testing. |
Sounds good! I hope to have a PR of the difference-issue you found next week. Are you still dependent on it? |
yes, this is required for the odd/even filling. |
With the current head, the self_turns call results in the following error:
Is this related to #870 ? Ah yes, i now defined:
|
Would this make it possible to have a buffer around a linestring without dissolving self intersections? Say I have a line like this, BG 1.85 gives me back a multipolygon with one inner and one outer polygon. In my case I'd need one single polygon (or a ring). Here is an example |
I have developed a polygon correct approach (remove self-intersection, correct order, correct inners), based on boost geometry:
https://github.com/kleunen/boost_geometry_correct
The approach for calculating and removing self-intersection is vastly simpler and more efficient than existing libraries.
It was developed for tilemaker:
https://github.com/systemed/tilemaker
But it might be useful for including in boost geometry as well
The text was updated successfully, but these errors were encountered: