
What's new in Citus 10.1, the PostgreSQL sharding extension

Author

digoal

Date

2021-08-01

Tags

PostgreSQL , citus , sharding


Background

Citus 10.1 was released on July 16, and on July 30 the Citus team published a fairly detailed blog post explaining the 10.1 features.

https://www.citusdata.com/blog/2021/07/30/citus-10-1-extension-to-postgres-whats-new/

What's new in Citus 10.1?
In this post, let's explore what Citus 10.1 fixes and improves:

  • Shard rebalancer optimizations (performance & UX)
  • Reduce memory usage for multi-row INSERTs (performance)
  • Adaptive executor enhancements (performance & UX)
  • Better performance for the citus_shards view (performance)
  • Inline VALUES in CTEs as subqueries (performance)
  • UX improvements using Citus with Postgres partitions (UX)
  • Protection from accidental dropping/truncating of shards (UX)
  • Add shard_count parameter to create_distributed_table (UX)

Or, if you want the itemized list of all the improvements in 10.1, then take a peek at the Changelog for Citus 10.1.0 over in our GitHub repo.

Shard rebalancer optimizations

In 10.1, the shard rebalancer—which you know we open sourced in Citus 10, hurray—is now optimized for scenarios with many tables and/or lots of Postgres partitions. The rebalancer is also smarter about checking that the target node can actually receive new shards before moving them. Plus it works better with high-traffic read workloads running concurrently with the shard moves.

I know that Jelte Fennema plans on writing a deeper dive into some of the shard rebalancer improvements in Citus 10.1… so I’ll leave it to Jelte to go deeper in an upcoming post. Suffice it to say, there were lots of shard rebalancer improvements in 10.1, including:

  • Reduces memory usage while rebalancing shards
  • Enables citus.defer_drop_after_shard_move by default
  • Ensures free disk space before moving a shard
  • Makes sure that target node in shard moves is eligible for shard move
  • Optimizes partitioned disk size calculation for shard rebalancer
  • Implements improvement_threshold at shard rebalancer moves
  • Improves orphaned shard cleanup logic
  • Continues to remove shards after failure in DropMarkedShards
  • Makes sure connection is closed after each shard move
  • Fetches shard size on the fly for the rebalance monitor
  • Fixes two race conditions in the get_rebalance_progress
  • Fixes problems with concurrent calls of DropMarkedShards
  • Fixes a bug that allowed moving of shards belonging to a reference table
  • Fixes the relation size bug during rebalancing
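As a rough sketch of how these pieces fit together in practice, the queries below use the functions and settings named in the list above (rebalance_table_shards, get_rebalance_progress, citus.defer_drop_after_shard_move) against an illustrative cluster; this is not a prescribed procedure:

-- check how shard data is currently spread across the worker nodes
SELECT nodename, pg_size_pretty(sum(shard_size)) AS total_size
FROM citus_shards
GROUP BY nodename;

-- 10.1 enables deferred shard drops after a move by default
SHOW citus.defer_drop_after_shard_move;

-- kick off a rebalance; 10.1 also verifies free disk space on the target node
SELECT rebalance_table_shards();

-- from another session, watch the move progress
SELECT * FROM get_rebalance_progress();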

Reduce memory usage for multi-row INSERTs

When you want to ingest data into Postgres, you have 3 different ways to do so:

  • Single-row insert—you just insert row by row, repeating the INSERT command
  • Postgres COPY utility—this is the fastest way to ingest data into Postgres (and Citus) and we typically recommend using COPY wherever possible
  • Multi-row INSERT—some applications and ORMs generate INSERTs as multi-row INSERTs, which is just a single INSERT statement with multiple values. It’s pretty common to do multi-row INSERTs.

In 10.1, Citus now uses less memory for multi-row INSERTs. Specifically, peak memory usage is reduced by up to 50%. Hence it is now faster to load data into Citus with multi-row INSERTs. Yay.
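To make the three ingestion shapes above concrete, here is a minimal, purely illustrative sketch (the events table and its columns are invented for the example):

-- hypothetical target table, just for illustration
CREATE TABLE events (user_id bigint, category int, payload text);

-- 1) single-row INSERTs: one statement per row
INSERT INTO events VALUES (10, 3, 'a');
INSERT INTO events VALUES (11, 3, 'b');

-- 2) COPY: the fastest way to bulk-load data into Postgres and Citus
\copy events FROM 'events.csv' WITH (FORMAT csv)

-- 3) multi-row INSERT: one statement with multiple value lists
--    (the path whose peak memory usage drops by up to 50% in Citus 10.1)
INSERT INTO events VALUES (10, 3, 'a'), (11, 3, 'b'), (12, 4, 'c');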

Adaptive executor enhancements

If you use Citus to scale out Postgres and you’ve ever encountered these harmless—but annoying—log messages on your worker nodes, you’ll be glad to hear that we’ve updated the Citus adaptive executor in 10.1 so that these messages won’t fill up your logs anymore.

2020-11-16 21:09:09.800 CET [16633] LOG:  could not accept SSL connection: Connection reset by peer  
2020-11-16 21:09:09.872 CET [16657] LOG:  could not accept SSL connection: Undefined error: 0  
2020-11-16 21:09:09.894 CET [16667] LOG:  could not accept SSL connection: Connection reset by peer  

It turns out that when we rewrote the Citus adaptive executor, one side-effect of those changes was that we would terminate no-longer-needed connections, even if they were still in the process of being established. You see, the Citus adaptive executor has a slow-start algorithm and establishes connections every 10ms (by default). As soon as Citus determined that you no longer needed the connections, the Citus executor would terminate them—triggering these noisy log messages.

So in 10.1, we improved the adaptive executor's decision-making process so that these noisy log messages no longer happen.

Of course, if you use PgBouncer with Citus to manage connection pooling, you’re probably thinking “huh”? With PgBouncer, you would never have seen these log messages—so you’re unaffected by this 10.1 improvement.

And there’s a performance gain too. The improved decision-making—where we factor in the connection-establishment times into our decision logic—gives you significant improvement for workloads where the concurrency is high (>30 concurrent queries) AND all queries hit multiple shards AND individual query execution times are < 10ms. This scenario is common if the queries are doing index scans that are not on the distribution column.

The chart below shows the performance gains—as measured by TPS, aka transactions per second—for this high-concurrency scenario. (We followed these steps to simulate high-concurrency workloads and measure this performance.)

| Concurrent connections on the Citus coordinator | Citus 10.0 TPS | Citus 10.1 TPS | Citus 10.1 is |
|---|---|---|---|
| 32 | 69.34 | 463.62 | 6.7X faster |
| 128 | 157.62 | 691.67 | 4.4X faster |
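If you want to poke at the connection slow-start behavior mentioned above, the relevant knob is the citus.executor_slow_start_interval setting (a documented Citus GUC; its 10 ms default matches the interval mentioned above). A small session-level experiment might look like this:

-- how long the adaptive executor waits before opening more connections per worker
SHOW citus.executor_slow_start_interval;

-- can be tuned per session when experimenting (value in milliseconds)
SET citus.executor_slow_start_interval = 50;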

Better performance for the citus_shards view

When there are lots of shards in a Citus cluster, the citus_shards view could sometimes take longer than you'd like to respond to queries. Now it is much faster.

If you find yourself wanting to see the sizes of individual shards, you're probably familiar with the citus_shards view.

For example, if your application is a SaaS application that is multi-tenant, you may want to see if a shard is larger than others, to understand if some of your tenants are much bigger than others:

-- for each distributed table in the cluster  
-- find the min,max and avg. shard sizes  
SELECT TABLE_NAME,  
       pg_size_pretty(min(shard_size)) AS min_shard_size,  
       pg_size_pretty(max(shard_size)) AS max_shard_size,  
       pg_size_pretty(avg(shard_size)) AS avg_size_shard  
FROM citus_shards  
GROUP BY TABLE_NAME  
ORDER BY TABLE_NAME;  
  
 table_name | min_shard_size | max_shard_size | avg_size_shard  
------------+----------------+----------------+---------------  
tenant_data | 11 MB          | 357 MB         | 52 MB  
(1 row)  

This citus_shards view is also a useful way to see whether your data is well balanced. If you observe any imbalance in the total data sizes across the nodes, you can kick off the Citus shard rebalancer to redistribute the data based on disk size with rebalance_strategy := 'by_disk_size'.

SELECT nodename,  
       nodeport,  
       pg_size_pretty(sum(shard_size))  
FROM citus_shards  
WHERE citus_table_type = 'distributed'  
GROUP BY nodename, nodeport;  
  
                    nodename                    | nodeport | pg_size_pretty  
------------------------------------------------+----------+---------------  
private-w1.columnar.postgres.database.azure.com |     5432 | 521 MB  
private-w0.columnar.postgres.database.azure.com |     5432 | 175 MB  
(2 rows)  
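If the per-node totals above look skewed, the corresponding rebalancer call (using the by_disk_size strategy mentioned above) looks like this:

-- redistribute shards so that disk usage evens out across the worker nodes
SELECT rebalance_table_shards(rebalance_strategy := 'by_disk_size');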

Inline VALUES in CTEs as subqueries

The VALUES clause inside a CTE is a handy construct that several of you probably rely on. Before 10.1, Citus always materialized such CTEs. With Citus 10.1, such CTEs are now inlined by default… hence "pushed down" to the worker nodes. (You'll hear us talk about pushdown queries a lot in Citus.)

Why should you care? This improvement is useful for two reasons:

  • Network round trips: Citus now avoids one extra network round-trip in this scenario.
  • Query performance: With some larger data sizes, more efficient query plans on the shards are chosen, hence the query execution times are decreased when the VALUES clause is pushed instead of materialized.

So, if you write a SQL query like this, using the VALUES clause:

WITH tmpUserEvents (user_id, category) AS (VALUES (10, 3), (11, 3))
SELECT  count(*)
FROM    events_table
WHERE   (user_id, category) IN (SELECT user_id, category FROM tmpUserEvents);

Then, under the covers, as of 10.1, Citus now will inline the CTE as follows (we simplified this query a bit for the sake of the example):

SELECT  count(*)  
FROM    events_table  
WHERE   (user_id, category) IN ((VALUES (10, 3), (11, 3)));  

As with all CTEs, the default behavior can be overridden by the MATERIALIZED keyword to force the planner to materialize the results and use pull-push execution.
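For completeness, forcing the pre-10.1 behavior uses the standard Postgres 12+ CTE syntax, applied to the same example query:

-- MATERIALIZED forces the CTE to be computed first and its result
-- pushed to the workers (pull-push execution) instead of being inlined
WITH tmpUserEvents (user_id, category) AS MATERIALIZED (VALUES (10, 3), (11, 3))
SELECT count(*)
FROM   events_table
WHERE  (user_id, category) IN (SELECT user_id, category FROM tmpUserEvents);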

UX improvements using Citus with Postgres partitions

By their very nature, partitioned tables in Postgres tend to have really long names, such as some_descriptive_name_2020_01_01. When Citus is involved, the names become even longer since Citus automatically adds shard identifiers to the table names: some_descriptive_name_2020_01_01_102008.

However, PostgreSQL only supports table names up to 63 chars, and truncates the rest:

CREATE TABLE taaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaable_name_with_65_chars(key int);  
NOTICE:  identifier "taaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaable_name_with_65_chars" will be truncated to "taaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaable_name_with_65_cha"  
CREATE TABLE  

The good news: in Citus 10.1 we have removed the length limits around partition names. Citus had several edge cases around the naming of distributed tables and indexes with really long names; Citus 10.1 fixes all of these issues, so users can pick table names as long as they wish.

So the issues that are now fixed:

  • Cannot rename table to have a name >56 characters in length (Issue #4211)
  • alter_table_set_access_method fails in case of long names (Issue #4665)
  • Distributed deadlock when creating index on a partitioned table (Issue #4412)
  • Self deadlock if the index name is long when the placements are on the coordinator (Issue #4238)
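To make the scenario concrete, here is a sketch with deliberately long (invented) table and partition names of the kind that Citus 10.1 now handles when it appends shard identifiers:

-- a partitioned table with long, descriptive names (illustrative only)
CREATE TABLE device_measurements_high_resolution_production (
    device_id   bigint,
    measured_at timestamptz,
    value       double precision
) PARTITION BY RANGE (measured_at);

CREATE TABLE device_measurements_high_resolution_production_2020_01_01
    PARTITION OF device_measurements_high_resolution_production
    FOR VALUES FROM ('2020-01-01') TO ('2020-01-02');

-- Citus appends shard ids (e.g. _102008) to these names internally;
-- as of 10.1, names this long no longer break distribution or index creation
SELECT create_distributed_table(
    'device_measurements_high_resolution_production', 'device_id');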

Protection from accidentally dropping & truncating shards

If you’re using Citus to shard Postgres on a single node and you connect to the Citus coordinator with psql and then use \d to show relations, Citus is smart enough to not show the shards in the \d output.

But if you use a different interactive tool and you’ve sharded Citus on a single node, you might be able to see the shards on the coordinator—and you might find that confusing. Especially if you’re new to Citus, and especially if you’re not expecting to see both the shards and the distributed tables on the same node. This happened to a Citus user recently… who then, thinking these shards did not belong, tried to drop the shards. And as you know, dropping shards unintentionally—just like accidentally dropping tables—can create some problems. (That’s an understatement.)

This fix in Citus 10.1 will prevent you from accidentally dropping (or truncating) Citus shards on the coordinator.

Add shard_count parameter to create_distributed_table

Prior to 10.1, you already had a GUC you could use to change the shard count of your tables: citus.shard_count, a setting that applies to Citus tables. But it was a bit awkward to set a GUC just to change the shard count, so we've made things easier.

As of Citus 10.1, you can set the (optional) shard_count parameter when using the create_distributed_table function to distribute your table—whether you are sharding across a single Citus node or across a distributed cluster. The new shard_count parameter is simple to play with and makes it easier to test different shard counts when experimenting with Citus. The default citus.shard_count is 32 if you neither change the GUC nor set the parameter.

If you’ve already read Ozan’s “Citus Tips for Postgres” post about how to alter the distribution key and shard count, you know that shard_count was already a parameter for alter_distributed_table. And with 10.1, shard_count is now a parameter for create_distributed_table, too.
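Concretely, the two calls look like this (the table and column names are placeholders):

-- pick the shard count up front when distributing a table
SELECT create_distributed_table('events', 'user_id', shard_count := 64);

-- or change it later on an already-distributed table
SELECT alter_distributed_table('events', shard_count := 128);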

Many backlog improvements to make your life easier

Whether fixing bugs, optimizing performance, or adding powerful new features like columnar storage, you could say that everything our Citus team does is part of our mission to bring you Postgres at any scale.

In this post, I covered the highlights of the improvements that Citus 10.1 brings to you. But there are more: if you want the itemized list of all the improvements, please check out the Changelog for Citus 10.1.0 over in our GitHub repo.

  • Getting Started If you’re looking to get started with Citus to try things out, the open source Citus docs are pretty popular and include tutorials for time series workloads, multi-tenant SaaS apps, and customer-facing analytics dashboards. We’ve also put together a bunch of our favorite resources in the Citus getting started page. And if you’re looking for the download instructions, they’re just a click away.

  • Stay in touch: If you’re already a Citus user—hopefully a happy Citus user (?)—be sure to join our Citus slack if you ever have questions—and sign up for the Citus monthly newsletter.

  • Watch one of our newest demos: Umur Cubukcu recently published this awesome video demo about the Hyperscale (Citus) superpowers for scaling Postgres. Just 5 minutes, it’s quite nicely done.


digoal's wechat