Available optimizations

Effective caching

Since the beginning, pg_pathman aims to improve performance of queries involving partitioned tables with significant amount of partitions (i.e. 1K+). Although PostgreSQL 10 features built-in support for partitioning, it still uses an exhaustive search in order to select suitable partitions, which gets slower and slower as the total amount increases.

Our per-relation cache (partition dispatch cache) stores an ordered array of structs containing:

Partition's Oid (a unique identifier of this partition);
Left & right bounds (RANGE partitioning) OR hash value (HASH partitioning);

This allows us to perform a binary search (tables partitioned by RANGE), and use hash value modulo amount of partitions as an index of the corresponding partition (tables partitioned by HASH). Furthermore, we strive to remove redundant WHERE clauses (for instance, we don't have to recheck key < 20 for rows of partition covering [0, 10)).

You can examine cache stats using the following query:

select * from pathman_cache_stats;
         context          |  size  | used  | entries
--------------------------+--------+-------+---------
 maintenance              |      0 |     0 |       0
 partition dispatch cache |   9216 |  8496 |       2 -- N partitioned tables
 partition parents cache  | 122880 | 70912 |      10 -- cached pairs (child, parent)
 partition bounds cache   | 122880 | 71936 |      10 -- cached bounds (L + R)
(4 rows)

Custom Nodes

pg_pathman provides a couple of custom plan nodes which aim to reduce execution time, namely:

RuntimeAppend (overrides Append plan node)
RuntimeMergeAppend (overrides MergeAppend plan node)
PartitionFilter (drop-in replacement for INSERT triggers)

PartitionFilter acts as a proxy node for INSERT's child scan, which means it can redirect output tuples to the corresponding partition:

EXPLAIN (COSTS OFF)
INSERT INTO partitioned_table
SELECT generate_series(1, 10), random();
               QUERY PLAN
-----------------------------------------
 Insert on partitioned_table
   ->  Custom Scan (PartitionFilter)
         ->  Subquery Scan on "*SELECT*"
               ->  Result
(4 rows)

RuntimeAppend and RuntimeMergeAppend have much in common: they come in handy in a case when WHERE condition takes form of:

VARIABLE OP PARAM

This kind of expressions can no longer be optimized at planning time since the parameter's value is not known until the execution stage takes place. The problem can be solved by embedding the WHERE condition analysis routine into the original Append's code, thus making it pick only required scans out of a whole bunch of planned partition scans. This effectively boils down to creation of a custom node capable of performing such a check.

Examples

Here's a few examples of queries with improved plans:

create table simple (id int not null);
insert into simple values (10), (20);

create table abc (id int not null);
select create_range_partitions('abc', 'id', 1, 10, 10);


/* pg_pathman */
explain (analyze, costs off) select * from abc where id < 21;
                               QUERY PLAN
------------------------------------------------------------------------
 Append (actual time=0.034..38.046 rows=190000 loops=1)
   ->  Seq Scan on abc_1 (actual time=0.032..13.249 rows=90000 loops=1)
   ->  Seq Scan on abc_2 (actual time=0.006..9.746 rows=100000 loops=1)
 Planning time: 0.379 ms
 Execution time: 45.777 ms
(5 rows)

/* standard inheritance */
explain (analyze, costs off) select * from abc where id < 21;
                               QUERY PLAN
-------------------------------------------------------------------------
 Append (actual time=0.017..51.047 rows=190001 loops=1)
   ->  Seq Scan on abc (actual time=0.016..0.018 rows=1 loops=1)
         Filter: (id < 21)
   ->  Seq Scan on abc_1 (actual time=0.016..20.356 rows=90000 loops=1)
         Filter: (id < 21)
   ->  Seq Scan on abc_2 (actual time=0.011..17.618 rows=100000 loops=1)
         Filter: (id < 21)
 Planning time: 0.847 ms
 Execution time: 57.893 ms
(9 rows)


/* pg_pathman */
explain (analyze, costs off) update simple set id = 2
from abc where simple.id = abc.id;
                                       QUERY PLAN
----------------------------------------------------------------------------------------
 Update on simple (actual time=54.708..54.708 rows=0 loops=1)
   ->  Nested Loop (actual time=0.098..42.023 rows=20000 loops=1)
         ->  Seq Scan on simple (actual time=0.017..0.019 rows=2 loops=1)
         ->  Custom Scan (RuntimeAppend) (actual time=0.022..19.123 rows=10000 loops=2)
               Prune by: (simple.id = abc.id)
               ->  Seq Scan on abc_1 (actual time=0.031..19.920 rows=10000 loops=1)
                     Filter: (simple.id = id)
                     Rows Removed by Filter: 90000
               ->  Seq Scan on abc_2 (actual time=0.012..16.387 rows=10000 loops=1)
                     Filter: (simple.id = id)
                     Rows Removed by Filter: 90000
 Planning time: 3.450 ms
 Execution time: 54.858 ms
(13 rows)

/* standard inheritance */
explain (analyze, costs off) update simple set id = 2
from abc where simple.id = abc.id;
                                      QUERY PLAN
--------------------------------------------------------------------------------------
 Update on simple (actual time=399.068..399.068 rows=0 loops=1)
   ->  Hash Join (actual time=0.096..385.563 rows=20000 loops=1)
         Hash Cond: (abc.id = simple.id)
         ->  Append (actual time=0.034..281.405 rows=1000000 loops=1)
               ->  Seq Scan on abc (actual time=0.006..0.006 rows=0 loops=1)
               ->  Seq Scan on abc_1 (actual time=0.025..25.897 rows=100000 loops=1)
               ->  Seq Scan on abc_2 (actual time=0.008..21.762 rows=100000 loops=1)
               ->  Seq Scan on abc_3 (actual time=0.008..21.162 rows=100000 loops=1)
               ->  Seq Scan on abc_4 (actual time=0.006..21.217 rows=100000 loops=1)
               ->  Seq Scan on abc_5 (actual time=0.006..21.300 rows=100000 loops=1)
               ->  Seq Scan on abc_6 (actual time=0.006..21.507 rows=100000 loops=1)
               ->  Seq Scan on abc_7 (actual time=0.007..21.285 rows=100000 loops=1)
               ->  Seq Scan on abc_8 (actual time=0.006..21.182 rows=100000 loops=1)
               ->  Seq Scan on abc_9 (actual time=0.008..21.253 rows=100000 loops=1)
               ->  Seq Scan on abc_10 (actual time=0.006..21.666 rows=100000 loops=1)
         ->  Hash (actual time=0.019..0.019 rows=2 loops=1)
               Buckets: 1024  Batches: 1  Memory Usage: 9kB
               ->  Seq Scan on simple (actual time=0.006..0.009 rows=2 loops=1)
 Planning time: 1.405 ms
 Execution time: 399.338 ms
(20 rows)


/* pg_pathman */
explain (analyze, costs off) delete from abc where id = 1;
                             QUERY PLAN
---------------------------------------------------------------------
 Delete on abc_1 (actual time=22.751..22.751 rows=0 loops=1)
   ->  Seq Scan on abc_1 (actual time=22.749..22.749 rows=0 loops=1)
         Filter: (id = 1)
         Rows Removed by Filter: 90000
 Planning time: 0.144 ms
 Execution time: 22.786 ms
(6 rows)

/* standard inheritance */
explain (analyze, costs off) delete from abc where id = 1;
                               QUERY PLAN                               
------------------------------------------------------------------------
 Delete on abc (actual time=36.546..36.546 rows=0 loops=1)
   Delete on abc
   Delete on abc_1
   ->  Seq Scan on abc (actual time=0.006..0.006 rows=0 loops=1)
         Filter: (id = 1)
   ->  Seq Scan on abc_1 (actual time=0.092..21.588 rows=10000 loops=1)
         Filter: (id = 1)
         Rows Removed by Filter: 90000
 Planning time: 1.603 ms
 Execution time: 36.615 ms
(10 rows)

Caveats

It's hard to support different releases of PostgreSQL (9.5, 9.6, 10 etc), but we try to do our best. However, there might be some problems with planning that are inherent to the specific release of PostgreSQL, since we use different version-specific pieces of code to make pg_pathman work. For example, we had to disable some optimizations on PostgreSQL 9.5 in:

SELECT ... FROM partitioned_table sub-queries under UPDATE & DELETE statements;
SELECT ... FROM partitioned_table FOR SHARE/UPDATE/etc.

/*
 * SELECT sub-query under UPDATE
 */

/* PostgreSQL 9.5, chose all partitions and parent */
explain (costs off)
update simple set id = 2 where id in (select * from abc
                                      where id = simple.id and id < 20);
                           QUERY PLAN
----------------------------------------------------------------
 Update on simple
   ->  Seq Scan on simple
         Filter: (SubPlan 1)
         SubPlan 1
           ->  Append
                 ->  Seq Scan on abc
                       Filter: ((id < 20) AND (id = simple.id))
                 ->  Seq Scan on abc_1
                       Filter: ((id < 20) AND (id = simple.id))
                 ->  Seq Scan on abc_2
                       Filter: ((id < 20) AND (id = simple.id))
(11 rows)

/* PostgreSQL 9.6+, chose only relevant partitions */
explain (costs off)
update simple set id = 2 where id in (select * from abc
                                      where id = simple.id and id < 20);
                             QUERY PLAN
--------------------------------------------------------------------
 Update on simple
   ->  Seq Scan on simple
         Filter: (SubPlan 1)
         SubPlan 1
           ->  Custom Scan (RuntimeAppend)
                 Prune by: ((abc.id < 20) AND (abc.id = simple.id))
                 ->  Seq Scan on abc_1 abc
                       Filter: (id = simple.id)
                 ->  Seq Scan on abc_2 abc
                       Filter: ((id < 20) AND (id = simple.id))
(10 rows)

/*
 * SELECT ... FOR UPDATE
 */

/* PostgreSQL 9.5, couldn't exclude parent */
EXPLAIN (COSTS OFF)
SELECT * FROM rowmarks.first ORDER BY id FOR UPDATE;
              QUERY PLAN               
---------------------------------------
 LockRows
   ->  Sort
         Sort Key: first.id
         ->  Append
               ->  Seq Scan on first
               ->  Seq Scan on first_0
               ->  Seq Scan on first_1
               ->  Seq Scan on first_2
(8 rows)

/* PostgreSQL 9.6+, better planning time (hidden) */
EXPLAIN (COSTS OFF)
SELECT * FROM rowmarks.first ORDER BY id FOR UPDATE;
              QUERY PLAN               
---------------------------------------
 LockRows
   ->  Sort
         Sort Key: first_0.id
         ->  Append
               ->  Seq Scan on first_0
               ->  Seq Scan on first_1
               ->  Seq Scan on first_2
(7 rows)

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Available optimizations

Effective caching

Custom Nodes

Examples

Caveats

Clone this wiki locally