Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Implement CTAS for columnstore #75

Closed
wants to merge 6 commits into from

Conversation

nbiscaro
Copy link
Contributor

@nbiscaro nbiscaro commented Jan 7, 2025

Implements #26

Adding support for CREATE TABLE AS SELECT for columnstore tables.

This enhancement allows users to seamlessly execute CTAS (CREATE TABLE AS SELECT) queries between columnstore and any existing heap or columnstore tables. For example:

CREATE TABLE columnstore_table USING columnstore AS SELECT * FROM heap_or_columnstore_table;

@@ -57,8 +58,8 @@ DuckdbHandleDDL(Node *parsetree) {
if (IsA(parsetree, CreateTableAsStmt)) {
auto stmt = castNode(CreateTableAsStmt, parsetree);
char *access_method = stmt->into->accessMethod ? stmt->into->accessMethod : default_table_access_method;
if (strcmp(access_method, "duckdb") != 0) {
/* not a duckdb table, so don't mess with the query */
if (strcmp(access_method, "duckdb") != 0 && strcmp(access_method, "columnstore") != 0) {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is necessary to allow stmt->into->skipData = true;. We implement our CTAS logic after prev_process_utility_hook which creates our table but "skips" inserting any data as a result. This is equivalent to running CREATE TABLE AS ... WITH NO_DATA manually, although it feels a little more natural to use existing codepaths.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I’d prefer not to mix code into DuckdbHandleDDL(). Instead, copy the following code

ctas_skip_data = stmt->into->skipData;
stmt->into->skipData = true;

to wherever it's needed

DuckdbHandleDDL() is specifically for DuckDB temp tables, which we don’t enable at all. Ideally, we should comment out the call to DuckdbHandleDDL() in DuckdbUtilityHook_Cpp()

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Given this comment I don't think we need to set ctas_skip_data. Otherwise, sounds good.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But then how do you handle CREATE TABLE AS ... WITH NO DATA?

@@ -57,8 +58,8 @@ DuckdbHandleDDL(Node *parsetree) {
if (IsA(parsetree, CreateTableAsStmt)) {
auto stmt = castNode(CreateTableAsStmt, parsetree);
char *access_method = stmt->into->accessMethod ? stmt->into->accessMethod : default_table_access_method;
if (strcmp(access_method, "duckdb") != 0) {
/* not a duckdb table, so don't mess with the query */
if (strcmp(access_method, "duckdb") != 0 && strcmp(access_method, "columnstore") != 0) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I’d prefer not to mix code into DuckdbHandleDDL(). Instead, copy the following code

ctas_skip_data = stmt->into->skipData;
stmt->into->skipData = true;

to wherever it's needed

DuckdbHandleDDL() is specifically for DuckDB temp tables, which we don’t enable at all. Ideally, we should comment out the call to DuckdbHandleDDL() in DuckdbUtilityHook_Cpp()


CREATE TABLE t USING columnstore AS SELECT * FROM c;
SELECT * FROM t;
DROP TABLE t;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we also add a test for CTAS from columnstore to Postgres heap table? It seems that it's also working duckdb/pg_duckdb#520

Comment on lines +61 to +62
// if (strcmp(access_method, "duckdb") != 0 && strcmp(access_method, "columnstore") != 0) {
// /* not a duckdb or columnstore table, so don't mess with the query */
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should revert the previous changes here

@@ -0,0 +1,26 @@
LOAD 'pg_mooncake';
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is no longer needed

}

auto *stmt = (CreateTableAsStmt *)parsetree;
if (!stmt->into->accessMethod) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems that when stmt->into->accessMethod is unset, PG will use default table access method, and we need to make sure it's not columnstore

prev_process_utility_hook(pstmt, query_string, read_only_tree, context, params, query_env, dest, qc);

if (auto *ctas_stmt = get_columnstore_ctas_stmt()) {
auto *ctas_query = (Query *)ctas_stmt->query;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It’s preferable to query DuckDB directly rather than using the PostgreSQL interface, as the latter internally translates and issues queries to DuckDB

elog(ERROR, "Failed to connect to SPI for INSERT INTO execution.");
}

int ret = SPI_execute(insert_query.c_str(), false, 0);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it possible that CREATE TABLE AS ... WITH NO_DATA succeeds, but INSERT ... SELECT fails? We need to ensure that these two queries are performed atomically

dpxcc pushed a commit that referenced this pull request Jan 9, 2025
Use `CREATE TABLE AS ... WITH NO DATA` followed by `INSERT ... SELECT`
@dpxcc
Copy link
Contributor

dpxcc commented Jan 9, 2025

@nbiscaro
I've cleaned up your implementation: 55f5667
It addressed all comments and I manually verified that CREATE TABLE AS ... WITH NO DATA and INSERT ... SELECT are atomic
Let me know if you have any comments, otherwise I'm planning to merge it later today so that we can release v0.1 this Friday

@dpxcc
Copy link
Contributor

dpxcc commented Jan 9, 2025

pushed 55f5667

@dpxcc dpxcc closed this Jan 9, 2025
@nbiscaro
Copy link
Contributor Author

nbiscaro commented Jan 9, 2025

Thanks for cleaning this up. Looks great!

dpxcc pushed a commit that referenced this pull request Jan 10, 2025
Use `CREATE TABLE AS ... WITH NO DATA` followed by `INSERT ... SELECT`
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants