Implement CTAS for columnstore #75
src/pgduckdb/pgduckdb_ddl.cpp
Outdated
@@ -57,8 +58,8 @@ DuckdbHandleDDL(Node *parsetree) {
 if (IsA(parsetree, CreateTableAsStmt)) {
 auto stmt = castNode(CreateTableAsStmt, parsetree);
 char *access_method = stmt->into->accessMethod ? stmt->into->accessMethod : default_table_access_method;
-if (strcmp(access_method, "duckdb") != 0) {
-/* not a duckdb table, so don't mess with the query */
+if (strcmp(access_method, "duckdb") != 0 && strcmp(access_method, "columnstore") != 0) {
This is necessary to allow `stmt->into->skipData = true;`. We implement our CTAS logic after `prev_process_utility_hook`, which creates our table but "skips" inserting any data as a result. This is equivalent to running `CREATE TABLE AS ... WITH NO DATA` manually, although it feels a little more natural to use existing codepaths.
I’d prefer not to mix code into `DuckdbHandleDDL()`. Instead, copy the following code to wherever it's needed:
ctas_skip_data = stmt->into->skipData;
stmt->into->skipData = true;
`DuckdbHandleDDL()` is specifically for DuckDB temp tables, which we don’t enable at all. Ideally, we should comment out the call to `DuckdbHandleDDL()` in `DuckdbUtilityHook_Cpp()`.
Given this comment I don't think we need to set `ctas_skip_data`. Otherwise, sounds good.
But then how do you handle `CREATE TABLE AS ... WITH NO DATA`?
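One way to read the exchange above, sketched under assumptions: remember whether the user asked for `WITH NO DATA` before forcing `skipData`, then run the follow-up `INSERT ... SELECT` only when they did not. The structs below (`IntoClauseSketch`, `CtasStmtSketch`) are simplified stand-ins rather than the real PostgreSQL node definitions, and `PrepareCtas` and `ShouldInsertAfterCreate` are hypothetical helper names, not functions from this PR.

```cpp
#include <cassert>

// Simplified stand-ins for the PostgreSQL parse nodes (assumption: only the
// field relevant to this discussion is modeled).
struct IntoClauseSketch {
    bool skipData = false; // true when the user wrote WITH NO DATA
};

struct CtasStmtSketch {
    IntoClauseSketch *into = nullptr;
};

// Hypothetical helper: force the table to be created empty and return the
// user's original choice so it can be honored afterwards.
inline bool PrepareCtas(CtasStmtSketch &stmt) {
    assert(stmt.into != nullptr);
    bool user_skip_data = stmt.into->skipData;
    stmt.into->skipData = true;
    return user_skip_data;
}

// Hypothetical helper: the INSERT ... SELECT step runs only when the user did
// not ask for an empty table.
inline bool ShouldInsertAfterCreate(bool user_skip_data) {
    return !user_skip_data;
}
```

With this shape, a plain CTAS creates the empty table and then inserts, while a `WITH NO DATA` CTAS stops after table creation.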
CREATE TABLE t USING columnstore AS SELECT * FROM c;
SELECT * FROM t;
DROP TABLE t;
Can we also add a test for CTAS from a columnstore table to a Postgres heap table? It seems that this also works: duckdb/pg_duckdb#520
// if (strcmp(access_method, "duckdb") != 0 && strcmp(access_method, "columnstore") != 0) {
// /* not a duckdb or columnstore table, so don't mess with the query */
We should revert the previous changes here.
@@ -0,0 +1,26 @@
LOAD 'pg_mooncake';
This is no longer needed.
}

auto *stmt = (CreateTableAsStmt *)parsetree;
if (!stmt->into->accessMethod) {
It seems that when `stmt->into->accessMethod` is unset, PG will use the default table access method, and we need to make sure it's not columnstore.
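A minimal sketch of one way to cover that case, assuming the check resolves the effective access method first, as the `DuckdbHandleDDL()` snippet quoted earlier in this review does. `EffectiveAccessMethod` and `IsColumnstoreCtas` are hypothetical helper names; the parameters stand in for `stmt->into->accessMethod` and `default_table_access_method`.

```cpp
#include <cassert>
#include <cstring>

// Hypothetical helper: resolve the access method a CTAS will actually use.
// `explicit_method` mirrors stmt->into->accessMethod (null when no USING
// clause was given); `default_method` mirrors default_table_access_method.
inline const char *EffectiveAccessMethod(const char *explicit_method, const char *default_method) {
    assert(default_method != nullptr);
    return explicit_method ? explicit_method : default_method;
}

// Hypothetical helper: true when the resulting table would be a columnstore
// table, whether that was requested explicitly or inherited from the default.
inline bool IsColumnstoreCtas(const char *explicit_method, const char *default_method) {
    return std::strcmp(EffectiveAccessMethod(explicit_method, default_method), "columnstore") == 0;
}
```

Resolving the default before comparing means a CTAS without a `USING` clause is still recognized when the default access method happens to be `columnstore`.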
prev_process_utility_hook(pstmt, query_string, read_only_tree, context, params, query_env, dest, qc);

if (auto *ctas_stmt = get_columnstore_ctas_stmt()) {
auto *ctas_query = (Query *)ctas_stmt->query;
It’s preferable to query DuckDB directly rather than going through the PostgreSQL interface, since the latter internally translates the query and issues it to DuckDB anyway.
elog(ERROR, "Failed to connect to SPI for INSERT INTO execution.");
}

int ret = SPI_execute(insert_query.c_str(), false, 0);
Is it possible that `CREATE TABLE AS ... WITH NO DATA` succeeds, but `INSERT ... SELECT` fails? We need to ensure that these two queries are performed atomically.
Use `CREATE TABLE AS ... WITH NO DATA` followed by `INSERT ... SELECT`
@nbiscaro
pushed 55f5667
Thanks for cleaning this up. Looks great!
Implements #26
Adds support for `CREATE TABLE AS SELECT` (CTAS) for columnstore tables. This enhancement allows users to run CTAS queries between columnstore tables and any existing heap or columnstore tables. For example: