PostgreSQL 通过行迁移无需额外空间回收垃圾膨胀磁盘空间

作者

digoal

日期

2021-08-31

背景

膨胀后常规的垃圾回收方法: pg_repack, vacuum full

但是这两种方法都需要额外的存储空间, 将数据重组后再替换老的数据文件.

如果你的环境已经拮据到无法提供额外的磁盘空间来存放整理后的数据, 那么可以考虑本文的方法, 通过行迁移的方法来进行垃圾回收.

原理:
普通的vacuum只能truncate数据文件末尾的空block, 所以我们可以将末尾的tuple移动到前面, 从而从磁盘回收末尾的block.

为什么只能truncate数据文件末尾的空block?
因为非末尾的block被清掉之后寻址会发生变化, 例如第二个数据块回收掉, 那么2号数据块后面的数据块的编号都需要减1, 而索引的ctid指向的是原来的编号, 因此会导致索引不准确. 当然, 我们可以增加1个文件存储真空块(已回收的中间blockid, 寻址时通过这个数据再进行block定位), 但是会增加寻址的复杂度, 性能可能下降.

好了, 下面说说如何通过末尾block的行迁移来实现无需额外空间的磁盘空间回收.

postgres=# create table t (id int primary key, info text, crt_time timestamp);  
CREATE TABLE  
postgres=# insert into t select generate_series(1,10000000), random()::text, now();  
INSERT 0 10000000

清楚前面的数据, 产生一大块空间浪费

postgres=# delete from t where id<9000000;  
DELETE 8999999

postgres=# vacuum t;  
VACUUM

查看索引和表, 都是1000万条记录时的占用, 没有回收

postgres=# \di+  
                                             List of relations  
 Schema |     Name      | Type  |  Owner   |  Table   | Persistence | Access method |  Size  | Description   
--------+---------------+-------+----------+----------+-------------+---------------+--------+-------------  
 public | comments_pkey | index | postgres | comments | permanent   | btree         | 16 kB  |   
 public | posts_pkey    | index | postgres | posts    | permanent   | btree         | 16 kB  |   
 public | t_pkey        | index | postgres | t        | permanent   | btree         | 214 MB |   
(3 rows)  
  
postgres=# \dt+ t  
                                   List of relations  
 Schema | Name | Type  |  Owner   | Persistence | Access method |  Size  | Description   
--------+------+-------+----------+-------------+---------------+--------+-------------  
 public | t    | table | postgres | permanent   | heap          | 579 MB |   
(1 row)

行迁移会产生DML, 所以建议先删掉索引或PK约束再进行迁移, 反正做完还是需要重建索引来回收垃圾.

drop index t_pkey;  
or  
alter table t drop constraint t_pkey;

接下来看清楚:

删除末尾记录, 返回value, 又插回原始表, 这样就完成了行迁移的动作.

postgres=# with a as (delete from t where ctid = any(array(select ctid from t order by substring(ctid::text,'(\d+),')::int desc limit 1000000)) returning *) insert into t select * from a;  
INSERT 0 1000000  
  
postgres=# vacuum (truncate) t;  
VACUUM  
  
postgres=# \dt+ t  
                                  List of relations  
 Schema | Name | Type  |  Owner   | Persistence | Access method | Size  | Description   
--------+------+-------+----------+-------------+---------------+-------+-------------  
 public | t    | table | postgres | permanent   | heap          | 66 MB |   
(1 row)

postgres=# select count(*) from t;  
  count    
---------  
 1000001  
(1 row)

alter table t add constraint t_pkey primary key (id);  
  
postgres=# \di+  
                                            List of relations  
 Schema |     Name      | Type  |  Owner   |  Table   | Persistence | Access method | Size  | Description   
--------+---------------+-------+----------+----------+-------------+---------------+-------+-------------  
 public | t_pkey        | index | postgres | t        | permanent   | btree         | 21 MB |

PostgreSQL 许愿链接

您的愿望将传达给PG kernel hacker、数据库厂商等, 帮助提高数据库产品质量和功能, 说不定下一个PG版本就有您提出的功能点. 针对非常好的提议，奖励限量版PG文化衫、纪念品、贴纸、PG热门书籍等，奖品丰富，快来许愿。开不开森.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Files

20210831_01.md

20210831_01.md

PostgreSQL 通过行迁移无需额外空间回收垃圾膨胀磁盘空间

作者

日期

标签

背景

PostgreSQL 许愿链接

9.9元购买3个月阿里云RDS PostgreSQL实例

PostgreSQL 解决方案集合

德哥 / digoal's github - 公益是一辈子的事.

PolarDB 学习图谱: 训练营、培训认证、在线互动实验、解决方案、生态合作、写心得拿奖品

购买PolarDB云服务折扣活动进行中, 55元起

About 德哥

Files

20210831_01.md

Latest commit

History

20210831_01.md

File metadata and controls

PostgreSQL 通过行迁移 无需额外空间 回收垃圾膨胀磁盘空间

作者

日期

标签

背景

PostgreSQL 许愿链接

9.9元购买3个月阿里云RDS PostgreSQL实例

PostgreSQL 解决方案集合

德哥 / digoal's github - 公益是一辈子的事.

PolarDB 学习图谱: 训练营、培训认证、在线互动实验、解决方案、生态合作、写心得拿奖品

购买PolarDB云服务折扣活动进行中, 55元起

About 德哥

PostgreSQL 通过行迁移无需额外空间回收垃圾膨胀磁盘空间