PostgreSQL 15 preview - Allow parallel DISTINCT - two-phase parallel computation

Author

digoal

Date

2021-08-23

Tags

PostgreSQL, parallel DISTINCT


Background

https://git.postgresql.org/gitweb/?p=postgresql.git;a=commit;h=22c4e88ebff408acd52e212543a77158bde59e69

This is implemented by introducing a two-phase DISTINCT. Phase 1 is
performed on parallel workers, rows are made distinct there either by
hashing or by sort/unique. The results from the parallel workers are
combined and the final distinct phase is performed serially to get rid of
any duplicate rows that appear due to combining rows for each of the
parallel workers.
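
To make the two phases in the commit message concrete, here is a minimal Python sketch. It is only a simulation of the idea, not PostgreSQL's executor code: the function names (`distinct_hash`, `distinct_sort`, `two_phase_distinct`) and the round-robin split of rows across workers are assumptions made for illustration.

```python
from itertools import groupby


def distinct_hash(rows):
    """Phase 1, option A: hash-based de-duplication (keeps first-seen order)."""
    return list(dict.fromkeys(rows))


def distinct_sort(rows):
    """Phase 1, option B: sort, then drop adjacent duplicates (sort/unique)."""
    return [key for key, _ in groupby(sorted(rows))]


def two_phase_distinct(rows, n_workers=3, partial=distinct_hash):
    # Hypothetical split: a round-robin slice stands in for each worker's
    # share of a parallel scan.
    chunks = [rows[i::n_workers] for i in range(n_workers)]

    # Phase 1: every worker makes its own rows distinct, independently.
    partials = [partial(chunk) for chunk in chunks]

    # Gather: concatenating worker output can reintroduce duplicates,
    # because the same value may have been seen by several workers.
    combined = [row for part in partials for row in part]

    # Phase 2: a final, serial distinct over the combined rows.
    return sorted(set(combined)), combined


rows = [3, 1, 3, 2, 1, 2, 3, 1]
result, combined = two_phase_distinct(rows)
print("after phase 1 + gather:", combined)
print("after phase 2:", result)
```

In this sketch the gathered rows still contain repeats, since two simulated workers each saw the values 3 and 2. That is the reason the final serial phase cannot be skipped. Phase 1 still pays off because it shrinks the number of rows that have to pass through the single-process step.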

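This brings a map-reduce approach to DISTINCT, which lets it run in parallel. It would be even more powerful if the mechanism were exposed as a public API, so that users could define their own operations.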

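"Ways to do manual multi-stage aggregation in Greenplum - connecting directly to segments (PGOPTIONS='-c gp_session_role=utility'), gp_dist_random('gp_id'), or multi-stage aggregation with prefunc"

"PostgreSQL 11 preview - multi-stage parallel aggregation for array_agg, string_agg"

"HybridDB PostgreSQL: the techniques and principles that make sort, group, DISTINCT aggregation, and JOIN resilient to data skew - multi-stage aggregation"

"PostgreSQL 10 custom parallel aggregate functions: principles and practice (including an example of using array_agg to merge multiple arrays into a single one-dimensional array)"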
