Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

added parallel safe keyword to function definitions #23

Open
wants to merge 6 commits into
base: master
Choose a base branch
from

Conversation

nyurik
Copy link

@nyurik nyurik commented Dec 6, 2019

Patch researched and created by @jsanz. It has been merged to OpenMapTiles,
and should probably be merged to the origin as well. Notes from @jsanz:

Since Postgres 9.6 the PARALLEL SAFE modifiers for function definition enables the execution of queries using one or more workers in parallel (more details here).

This straight forward PR adds this modifier to this repo functions to avoid them potentially blocking parallel execution when present.

I've done tests on all functions to ensure they don't block parallel executions with the following procedure:

  • Created a sample database on a postgres:12 Docker container with Postgis 3 installed
  • Created a sample table points with 1M random points and a random kpi integer column
  • Imported the layer ne_10m_admin_1_states_provinces from Natural Earth to have also a multipolygon table
  • Relaxed the conditions for parallel execution:
[local] postgres@bench=#  set parallel_setup_cost = 10;
SET
Time: 0.322 ms
[local] postgres@bench=# set parallel_tuple_cost = 0.001;
SET
  • Run sample queries with the keyword enabled until the parallel execution was activated
  • Removed the keyword and run the same query for comparison.

The tests below shared are just meant to showcase that PARALLEL SAFE can help the faster execution of queries and is not an indicator of the improvement introduced since there are many factors that can affect that.

It's worth noting that this makes these functions only available for Postgres 9.6 onwards so maybe we could consider having both versions in separated folders. Let me know if that would make sense to modify the PR that way.

Query examples

For each query, the execution time in milliseconds with and without the PARALLEL SAFE keyword is shared.

MakeArc

explain analyze 
select makearc(geom_lag, geom, geom_lead)
from (
        select lag(geom) over (partition by kpi % 3 order by kpi) geom_lag,
                geom,
                lead(geom) over (partition by kpi % 3 order by kpi) geom_lead
        from points
) a 
union all
select makearc(geom_lag, geom, geom_lead)
from (
        select lag(geom) over (partition by kpi % 4 order by kpi) geom_lag,
                geom,
                lead(geom) over (partition by kpi % 4 order by kpi) geom_lead
        from points
) a;
  • With parallel: 1148
  • Without parallel: 19395

MercBuffer

explain analyze verbose 
select count(*) 
from points 
where st_area(mercbuffer(geom,kpi)) > kpi^10;
  • With parallel: 8942
  • Without parallel: 25351

MercDWithin

explain analyze verbose 
select count(*) 
from points 
where
MercDWithin(
        st_transform(geom,3857),
        st_transform(
                ST_SetSRID(
                        ST_MakePoint( -71.1043443253471,42.3150676015829)
                        ,4326)
                ,3857)
        ,100000);
  • With parallel: 1222
  • Without parallel: 2885

MercLength

explain analyze verbose 
select count(*) 
from points 
where
MercLength(
        ST_MakeLine(
                st_transform(geom,3857),
                st_transform(
                        ST_SetSRID(ST_MakePoint(-71,42),4326)
                        ,3857)
        )
) > kpi^10;
  • With parallel: 2043
  • Without parallel: 4978

OrientedEnvelope

explain analyze  
select sum(r) 
from
(
        select sum(st_area(orientedenvelope(geom))) as r
        from points 
        group by kpi % 50000
        union all
        select sum(st_area(orientedenvelope(geom))) as r
        from points 
        group by kpi % 70000
) a;
  • With parallel: 8727
  • Without parallel: 22693

Sieve

explain analyze
select sum(st_area(sieve(geom,0.1)))
from countries
where iso_a2 like 'ES'
union all
select sum(st_area(sieve(geom,0.1)))
from countries
where iso_a2 like 'IT';
  • With parallel: 8368
  • Without parallel: 12244

SmartShrink

explain analyze
select sum(st_area(smartshrink(geom,0.1)))
from countries where ogc_fid % 7 = 0
UNION ALL
select sum(st_area(smartshrink(geom,0.1)))
from countries where ogc_fid % 5 = 0;
  • With parallel: 15129
  • Without parallel: 21061

TileBBox

explain analyze
select count(*)
from countries
where st_centroid(geom) && tilebbox(6,28,26)
union
select count(*)
from points
where st_centroid(geom) && tilebbox(6,28,27);
  • With parallel: 470
  • Without parallel: 879

ToPoint

explain analyze
select count(*)
from countries
where ToPoint(geom) && tilebbox(6,28,26)
union
select count(*)
from points
where ToPoint(geom) && tilebbox(6,28,27);
  • With parallel: 3857
  • Without: 4948

jsanz and others added 6 commits November 29, 2019 15:46
Added parallel safe keyword to function definitions

Since Postgres 9.6 the `PARALLEL SAFE` modifiers for function definition enables the execution of queries using one or more workers in parallel (more details [here](https://www.postgresql.org/docs/current/parallel-safety.html)).

This straight forward PR adds this modifier to this repo functions to avoid them potentially blocking parallel execution when present.

I've done tests on all functions to ensure they don't block parallel executions with the following procedure:

* Created a sample database on a `postgres:12` Docker container with Postgis 3 installed
* Created a sample table `points` with 1M random points and a random `kpi` integer column
* Imported the layer `ne_10m_admin_1_states_provinces` from Natural Earth to have also a multipolygon table
* Relaxed the conditions for parallel execution:
```
[local] postgres@bench=#  set parallel_setup_cost = 10;
SET
Time: 0.322 ms
[local] postgres@bench=# set parallel_tuple_cost = 0.001;
SET
```

* Run sample queries with the keyword enabled until the parallel execution was activated
* Removed the keyword and run the same query for comparison.

The tests below shared are just meant to showcase that `PARALLEL SAFE` can help the faster execution of queries and is not an indicator of the improvement introduced since there are many factors that can affect that.

It's worth noting that this makes these functions only available for [Postgres 9.6](https://www.postgresql.org/docs/9.6/sql-createfunction.html) onwards so maybe we could consider having both versions in separated folders. Let me know if that would make sense to modify the PR that way.

<details><summary>Query examples</summary>

For each query, the execution time in milliseconds with and without the `PARALLEL SAFE` keyword is shared.

### MakeArc

    explain analyze 
    select makearc(geom_lag, geom, geom_lead)
    from (
            select lag(geom) over (partition by kpi % 3 order by kpi) geom_lag,
                    geom,
                    lead(geom) over (partition by kpi % 3 order by kpi) geom_lead
            from points
    ) a 
    union all
    select makearc(geom_lag, geom, geom_lead)
    from (
            select lag(geom) over (partition by kpi % 4 order by kpi) geom_lag,
                    geom,
                    lead(geom) over (partition by kpi % 4 order by kpi) geom_lead
            from points
    ) a;

- With parallel: 1148
- Without parallel: 19395

### MercBuffer

    explain analyze verbose 
    select count(*) 
    from points 
    where st_area(mercbuffer(geom,kpi)) > kpi^10;

- With parallel: 8942
- Without parallel: 25351

### MercDWithin

    explain analyze verbose 
    select count(*) 
    from points 
    where
    MercDWithin(
            st_transform(geom,3857),
            st_transform(
                    ST_SetSRID(
                            ST_MakePoint( -71.1043443253471,42.3150676015829)
                            ,4326)
                    ,3857)
            ,100000);

- With parallel: 1222
- Without parallel: 2885

### MercLength

    explain analyze verbose 
    select count(*) 
    from points 
    where
    MercLength(
            ST_MakeLine(
                    st_transform(geom,3857),
                    st_transform(
                            ST_SetSRID(ST_MakePoint(-71,42),4326)
                            ,3857)
            )
    ) > kpi^10;

- With parallel: 2043
- Without parallel: 4978

### OrientedEnvelope

    explain analyze  
    select sum(r) 
    from
    (
            select sum(st_area(orientedenvelope(geom))) as r
            from points 
            group by kpi % 50000
            union all
            select sum(st_area(orientedenvelope(geom))) as r
            from points 
            group by kpi % 70000
    ) a;

- With parallel: 8727
- Without parallel: 22693

### Sieve

    explain analyze
    select sum(st_area(sieve(geom,0.1)))
    from countries
    where iso_a2 like 'ES'
    union all
    select sum(st_area(sieve(geom,0.1)))
    from countries
    where iso_a2 like 'IT';

- With parallel: 8368
- Without parallel: 12244

### SmartShrink

    explain analyze
    select sum(st_area(smartshrink(geom,0.1)))
    from countries where ogc_fid % 7 = 0
    UNION ALL
    select sum(st_area(smartshrink(geom,0.1)))
    from countries where ogc_fid % 5 = 0;

- With parallel: 15129
- Without parallel: 21061

### TileBBox

    explain analyze
    select count(*)
    from countries
    where st_centroid(geom) && tilebbox(6,28,26)
    union
    select count(*)
    from points
    where st_centroid(geom) && tilebbox(6,28,27);

- With parallel: 470
- Without parallel: 879

### ToPoint

    explain analyze
    select count(*)
    from countries
    where ToPoint(geom) && tilebbox(6,28,26)
    union
    select count(*)
    from points
    where ToPoint(geom) && tilebbox(6,28,27);

- With parallel: 3857
- Without: 4948

</details>
I realized that I skipped `CleanInt` and `CleanNumeric` on my [last PR](#1) 😞. These two functions implemented the same approach as the [`as_numeric`](https://github.com/openmaptiles/openmaptiles/blob/master/layers/building/building.sql#L4-L14) on OpenMapTiles repo, with the only difference being that these return `NULL` instead of `-1`. The approach of using an exception avoids running them in parallel.

Hence, I decided to bring the regular expression approach to these functions I proposed on openmaptiles/openmaptiles#728. Here there is already a testing mechanism, so I added more checks to ensure more cases are covered. I also improved a bit the tests to allow compare rounded results against a given number of decimals, instead of using `floor`.

Finally, I updated the Travis configuration, since the current one was not working properly. This took more than expected (in fact ended up almost using the same configuration used at [CartoDB/cartodb-postgresql](https://github.com/CartoDB/cartodb-postgresql/blob/master/.travis.yml), but the outcome is good in my opinion, since we now can test all the functions in PostGIS 2.5 on Postgres 9.6, 10, 11, and 12 and PostGIS 3 on 12. You can see the build execution in my [Travis account](https://travis-ci.org/jsanz/postgis-vt-util).

I also found another issue, the function `OrientedEnvelope`, equivalent to  PostGIS 3 [`ST_OrientedEnvelope`](https://postgis.net/docs/ST_OrientedEnvelope.html) but implemented in SQL, yields different results in PostGIS 2.5 and 3. I haven't put time debugging the algorithm, we may want to open a new issue for this, but not sure about how much is worth, being PostGIS 3 the current stable release.

If this PR is merged, we can then make OpenMapTiles `as_numeric` to be just a wrapper to return a `-1`, or if it makes more sense, just replace the function to use `CleanNumeric` I actually prefer a function like this to return a `null` value instead of a value that is a valid input.


* Replaced CleanInt/CleanNumeric by parallel safe versions, improved tests
* fixes in test and travis config
* fix postgis version test for orientedenvelope check
* Update .travis.yml
Co-Authored-By: Yuri Astrakhan <[email protected]>
* added new line at the end of the file
Co-authored-by: Yuri Astrakhan <[email protected]>
Current implementation crashes if the value is a single '-' or '+' char, or in other edge cases like 'e2'.
The code has been rewritten from openmaptiles/openmaptiles#785 (comment) suggestion based on @RhodiumToad suggested code.

This change was fully tested locally using `npm test`

P.S. Admins, please enable [travis-ci build](https://travis-ci.org/github/openmaptiles/postgis-vt-util) for this repo.

cc @jsanz @sfkeller
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants