-
Notifications
You must be signed in to change notification settings - Fork 63
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
added parallel safe keyword to function definitions #23
Open
nyurik
wants to merge
6
commits into
mapbox:master
Choose a base branch
from
openmaptiles:master
base: master
Could not load branches
Branch not found: {{ refName }}
Loading
Could not load tags
Nothing to show
Loading
Are you sure you want to change the base?
Some commits from the old base branch may be removed from the timeline,
and old review comments may become outdated.
Conversation
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Added parallel safe keyword to function definitions Since Postgres 9.6 the `PARALLEL SAFE` modifiers for function definition enables the execution of queries using one or more workers in parallel (more details [here](https://www.postgresql.org/docs/current/parallel-safety.html)). This straight forward PR adds this modifier to this repo functions to avoid them potentially blocking parallel execution when present. I've done tests on all functions to ensure they don't block parallel executions with the following procedure: * Created a sample database on a `postgres:12` Docker container with Postgis 3 installed * Created a sample table `points` with 1M random points and a random `kpi` integer column * Imported the layer `ne_10m_admin_1_states_provinces` from Natural Earth to have also a multipolygon table * Relaxed the conditions for parallel execution: ``` [local] postgres@bench=# set parallel_setup_cost = 10; SET Time: 0.322 ms [local] postgres@bench=# set parallel_tuple_cost = 0.001; SET ``` * Run sample queries with the keyword enabled until the parallel execution was activated * Removed the keyword and run the same query for comparison. The tests below shared are just meant to showcase that `PARALLEL SAFE` can help the faster execution of queries and is not an indicator of the improvement introduced since there are many factors that can affect that. It's worth noting that this makes these functions only available for [Postgres 9.6](https://www.postgresql.org/docs/9.6/sql-createfunction.html) onwards so maybe we could consider having both versions in separated folders. Let me know if that would make sense to modify the PR that way. <details><summary>Query examples</summary> For each query, the execution time in milliseconds with and without the `PARALLEL SAFE` keyword is shared. ### MakeArc explain analyze select makearc(geom_lag, geom, geom_lead) from ( select lag(geom) over (partition by kpi % 3 order by kpi) geom_lag, geom, lead(geom) over (partition by kpi % 3 order by kpi) geom_lead from points ) a union all select makearc(geom_lag, geom, geom_lead) from ( select lag(geom) over (partition by kpi % 4 order by kpi) geom_lag, geom, lead(geom) over (partition by kpi % 4 order by kpi) geom_lead from points ) a; - With parallel: 1148 - Without parallel: 19395 ### MercBuffer explain analyze verbose select count(*) from points where st_area(mercbuffer(geom,kpi)) > kpi^10; - With parallel: 8942 - Without parallel: 25351 ### MercDWithin explain analyze verbose select count(*) from points where MercDWithin( st_transform(geom,3857), st_transform( ST_SetSRID( ST_MakePoint( -71.1043443253471,42.3150676015829) ,4326) ,3857) ,100000); - With parallel: 1222 - Without parallel: 2885 ### MercLength explain analyze verbose select count(*) from points where MercLength( ST_MakeLine( st_transform(geom,3857), st_transform( ST_SetSRID(ST_MakePoint(-71,42),4326) ,3857) ) ) > kpi^10; - With parallel: 2043 - Without parallel: 4978 ### OrientedEnvelope explain analyze select sum(r) from ( select sum(st_area(orientedenvelope(geom))) as r from points group by kpi % 50000 union all select sum(st_area(orientedenvelope(geom))) as r from points group by kpi % 70000 ) a; - With parallel: 8727 - Without parallel: 22693 ### Sieve explain analyze select sum(st_area(sieve(geom,0.1))) from countries where iso_a2 like 'ES' union all select sum(st_area(sieve(geom,0.1))) from countries where iso_a2 like 'IT'; - With parallel: 8368 - Without parallel: 12244 ### SmartShrink explain analyze select sum(st_area(smartshrink(geom,0.1))) from countries where ogc_fid % 7 = 0 UNION ALL select sum(st_area(smartshrink(geom,0.1))) from countries where ogc_fid % 5 = 0; - With parallel: 15129 - Without parallel: 21061 ### TileBBox explain analyze select count(*) from countries where st_centroid(geom) && tilebbox(6,28,26) union select count(*) from points where st_centroid(geom) && tilebbox(6,28,27); - With parallel: 470 - Without parallel: 879 ### ToPoint explain analyze select count(*) from countries where ToPoint(geom) && tilebbox(6,28,26) union select count(*) from points where ToPoint(geom) && tilebbox(6,28,27); - With parallel: 3857 - Without: 4948 </details>
I realized that I skipped `CleanInt` and `CleanNumeric` on my [last PR](#1) 😞. These two functions implemented the same approach as the [`as_numeric`](https://github.com/openmaptiles/openmaptiles/blob/master/layers/building/building.sql#L4-L14) on OpenMapTiles repo, with the only difference being that these return `NULL` instead of `-1`. The approach of using an exception avoids running them in parallel. Hence, I decided to bring the regular expression approach to these functions I proposed on openmaptiles/openmaptiles#728. Here there is already a testing mechanism, so I added more checks to ensure more cases are covered. I also improved a bit the tests to allow compare rounded results against a given number of decimals, instead of using `floor`. Finally, I updated the Travis configuration, since the current one was not working properly. This took more than expected (in fact ended up almost using the same configuration used at [CartoDB/cartodb-postgresql](https://github.com/CartoDB/cartodb-postgresql/blob/master/.travis.yml), but the outcome is good in my opinion, since we now can test all the functions in PostGIS 2.5 on Postgres 9.6, 10, 11, and 12 and PostGIS 3 on 12. You can see the build execution in my [Travis account](https://travis-ci.org/jsanz/postgis-vt-util). I also found another issue, the function `OrientedEnvelope`, equivalent to PostGIS 3 [`ST_OrientedEnvelope`](https://postgis.net/docs/ST_OrientedEnvelope.html) but implemented in SQL, yields different results in PostGIS 2.5 and 3. I haven't put time debugging the algorithm, we may want to open a new issue for this, but not sure about how much is worth, being PostGIS 3 the current stable release. If this PR is merged, we can then make OpenMapTiles `as_numeric` to be just a wrapper to return a `-1`, or if it makes more sense, just replace the function to use `CleanNumeric` I actually prefer a function like this to return a `null` value instead of a value that is a valid input. * Replaced CleanInt/CleanNumeric by parallel safe versions, improved tests * fixes in test and travis config * fix postgis version test for orientedenvelope check * Update .travis.yml Co-Authored-By: Yuri Astrakhan <[email protected]> * added new line at the end of the file Co-authored-by: Yuri Astrakhan <[email protected]>
Current implementation crashes if the value is a single '-' or '+' char, or in other edge cases like 'e2'. The code has been rewritten from openmaptiles/openmaptiles#785 (comment) suggestion based on @RhodiumToad suggested code. This change was fully tested locally using `npm test` P.S. Admins, please enable [travis-ci build](https://travis-ci.org/github/openmaptiles/postgis-vt-util) for this repo. cc @jsanz @sfkeller
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
Patch researched and created by @jsanz. It has been merged to OpenMapTiles,
and should probably be merged to the origin as well. Notes from @jsanz:
Since Postgres 9.6 the
PARALLEL SAFE
modifiers for function definition enables the execution of queries using one or more workers in parallel (more details here).This straight forward PR adds this modifier to this repo functions to avoid them potentially blocking parallel execution when present.
I've done tests on all functions to ensure they don't block parallel executions with the following procedure:
postgres:12
Docker container with Postgis 3 installedpoints
with 1M random points and a randomkpi
integer columnne_10m_admin_1_states_provinces
from Natural Earth to have also a multipolygon tableThe tests below shared are just meant to showcase that
PARALLEL SAFE
can help the faster execution of queries and is not an indicator of the improvement introduced since there are many factors that can affect that.It's worth noting that this makes these functions only available for Postgres 9.6 onwards so maybe we could consider having both versions in separated folders. Let me know if that would make sense to modify the PR that way.
Query examples
For each query, the execution time in milliseconds with and without the
PARALLEL SAFE
keyword is shared.MakeArc
MercBuffer
MercDWithin
MercLength
OrientedEnvelope
Sieve
SmartShrink
TileBBox
ToPoint