Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support for set returning functions that don't exist in DuckDB #483

Open
2 tasks
Leo-XM-Zeng opened this issue Dec 6, 2024 · 4 comments
Open
2 tasks

Support for set returning functions that don't exist in DuckDB #483

Leo-XM-Zeng opened this issue Dec 6, 2024 · 4 comments
Milestone

Comments

@Leo-XM-Zeng
Copy link
Contributor

Leo-XM-Zeng commented Dec 6, 2024

What happens?

When "Set Returning Function" as a FORM clause.
How can we distinguish between DuckDB execution and PG execution? I can't think of a solution yet.

test=# explain SELECT * FROM generate_series(1, 3); -- ok
                         QUERY PLAN                         
------------------------------------------------------------
 Custom Scan (DuckDBScan)  (cost=0.00..0.00 rows=0 width=0)
   DuckDB Execution Plan: 
 
 ┌───────────────────────────┐
 │      GENERATE_SERIES      │
 │    ────────────────────   │
 │         Function:         │
 │      GENERATE_SERIES      │
 │                           │
 │          ~2 Rows          │
 └───────────────────────────┘
 
 
(13 rows)

test=# SELECT * FROM generate_series(1, 3);
 generate_series 
-----------------
               1
               2
               3
(3 rows)

test=# SELECT * FROM duckdb.cache_info();
2024-12-06 23:58:25.193 CST [689272] WARNING:  01000: (PGDuckDB/CreatePlan) Prepared query returned an error: 'Catalog Error: Table Function with name cache_info does not exist!
        Did you mean "main.checkpoint"?
        LINE 1: ...e_file_size, cache_file_timestamp FROM duckdb.cache_info() cache_info(remote_p...
                                                          ^
2024-12-06 23:58:25.193 CST [689272] LOCATION:  CreatePlan, pgduckdb_planner.cpp:64
WARNING:  (PGDuckDB/CreatePlan) Prepared query returned an error: 'Catalog Error: Table Function with name cache_info does not exist!
Did you mean "main.checkpoint"?
LINE 1: ...e_file_size, cache_file_timestamp FROM duckdb.cache_info() cache_info(remote_p...
                                                  ^
 remote_path | cache_key | cache_file_size | cache_file_timestamp 
-------------+-----------+-----------------+----------------------
(0 rows)

test=# \df duckdb.cache_info
                             List of functions
 Schema |    Name    |    Result data type     | Argument data types | Type 
--------+------------+-------------------------+---------------------+------
 duckdb | cache_info | SETOF duckdb.cache_info |                     | func
(1 row)

test=# \df generate_series
                                                              List of functions
   Schema   |      Name       |         Result data type          |                        Argument data types                         | Type 
------------+-----------------+-----------------------------------+--------------------------------------------------------------------+------
 pg_catalog | generate_series | SETOF bigint                      | bigint, bigint                                                     | func
 pg_catalog | generate_series | SETOF bigint                      | bigint, bigint, bigint                                             | func
 pg_catalog | generate_series | SETOF integer                     | integer, integer                                                   | func
 pg_catalog | generate_series | SETOF integer                     | integer, integer, integer                                          | func
 pg_catalog | generate_series | SETOF numeric                     | numeric, numeric                                                   | func
 pg_catalog | generate_series | SETOF numeric                     | numeric, numeric, numeric                                          | func
 pg_catalog | generate_series | SETOF timestamp without time zone | timestamp without time zone, timestamp without time zone, interval | func
 pg_catalog | generate_series | SETOF timestamp with time zone    | timestamp with time zone, timestamp with time zone, interval       | func
 pg_catalog | generate_series | SETOF timestamp with time zone    | timestamp with time zone, timestamp with time zone, interval, text | func
(9 rows)

To Reproduce

SELECT * FROM duckdb.cache_info(); -- error
SELECT * FROM generate_series(1, 3); -- ok

OS:

centos8

pg_duckdb Version (if built from source use commit hash):

main

Postgres Version (if built from source use commit hash):

16.6

Hardware:

No response

Full Name:

Man Zeng

Affiliation:

Halo

What is the latest build you tested with? If possible, we recommend testing with the latest nightly build.

I have not tested with any build

Did you include all relevant data sets for reproducing the issue?

No - Other reason (please specify in the issue body)

Did you include all code required to reproduce the issue?

  • Yes, I have

Did you include all relevant configuration (e.g., CPU architecture, Linux distribution) to reproduce the issue?

  • Yes, I have
@mkaruza
Copy link
Collaborator

mkaruza commented Dec 9, 2024

HI, thanks for report. This is not a bug.
Postgres custom function duckdb.cache_info() doesn't exists in duckdb context so execution will fallback to postgres.

Currently we don't have support for custom postgres function that could be used inside duckdb execution (but this is one of features that we would like to have in future).

For example, consider this example that produce same issue

postgres=# CREATE FUNCTION dummy_func() RETURNS VOID AS $$
postgres$# BEGIN                       
postgres$# RETURN;
postgres$# END;
postgres$# $$ LANGUAGE plpgsql;
CREATE FUNCTION
postgres=# SELECT * FROM dummy_func();
 dummy_func 
------------
 
(1 row)

postgres=# SET duckdb.force_execution TO true;
SET
postgres=# SELECT * FROM dummy_func();
WARNING:  (PGDuckDB/CreatePlan) Prepared query returned an error: 'Catalog Error: Table Function with name dummy_func does not exist!
Did you mean "summary"?
LINE 1: SELECT dummy_func FROM dummy_func() dummy_func(dummy_func)
                               ^
 dummy_func 
------------
 
(1 row)

postgres=# 

To address your question - how execution can be distinguish between pg or duckdb, for now, EXPLAIN can give you indication where it will be executed

postgres=# EXPLAIN SELECT * FROM dummy_func();
WARNING:  (PGDuckDB/CreatePlan) Prepared query returned an error: 'Catalog Error: Table Function with name dummy_func does not exist!
Did you mean "summary"?
LINE 1: EXPLAIN SELECT dummy_func FROM dummy_func() dummy_func(dummy_func)
                                       ^
                          QUERY PLAN                           
---------------------------------------------------------------
 Function Scan on dummy_func  (cost=0.25..0.26 rows=1 width=4)
(1 row)

postgres=# 

@mkaruza mkaruza closed this as completed Dec 9, 2024
@Leo-XM-Zeng
Copy link
Contributor Author

嗨,感谢您的举报。这不是一个错误。 Postgres 自定义函数在 duckdb 上下文中不存在,因此执行将回退到 postgres。duckdb.cache_info()

目前,我们不支持可以在 duckdb 执行中使用的自定义 postgres 函数(但这是我们将来希望拥有的功能之一)。

例如,考虑这个生成相同的issue

postgres=# CREATE FUNCTION dummy_func() RETURNS VOID AS $$
postgres$# BEGIN                       
postgres$# RETURN;
postgres$# END;
postgres$# $$ LANGUAGE plpgsql;
CREATE FUNCTION
postgres=# SELECT * FROM dummy_func();
 dummy_func 
------------
 
(1 row)

postgres=# SET duckdb.force_execution TO true;
SET
postgres=# SELECT * FROM dummy_func();
WARNING:  (PGDuckDB/CreatePlan) Prepared query returned an error: 'Catalog Error: Table Function with name dummy_func does not exist!
Did you mean "summary"?
LINE 1: SELECT dummy_func FROM dummy_func() dummy_func(dummy_func)
                               ^
 dummy_func 
------------
 
(1 row)

postgres=# 

为了回答您的问题 - 目前如何区分 pg 或 duckdb 之间的执行可以为您提供指示它将在何处执行EXPLAIN

postgres=# EXPLAIN SELECT * FROM dummy_func();
WARNING:  (PGDuckDB/CreatePlan) Prepared query returned an error: 'Catalog Error: Table Function with name dummy_func does not exist!
Did you mean "summary"?
LINE 1: EXPLAIN SELECT dummy_func FROM dummy_func() dummy_func(dummy_func)
                                       ^
                          QUERY PLAN                           
---------------------------------------------------------------
 Function Scan on dummy_func  (cost=0.25..0.26 rows=1 width=4)
(1 row)

postgres=# 

Yes, I know what you mean. I was trying to get rid of the warning.
Since functions like generate_series have both pg and duckdb, duckdb can execute them.
But a function like duckdb.cache_info is private to pg, so it cannot be executed in duckdb to report a warning and then hand it over to pg.
This makes it difficult to tell when functions like these should be handled by duckdb. I want to ask you to see if there is a good way。

@Leo-XM-Zeng
Copy link
Contributor Author

Or when the function is in the from clause, we don't send it to duckdb for execution, Is that OK with you?

@JelteF JelteF changed the title When will the "Set Returning Function" be delivered to duckdb Support for set returning functions that don't exist in DuckDB Dec 11, 2024
@JelteF JelteF added this to the Long term milestone Dec 11, 2024
@JelteF
Copy link
Collaborator

JelteF commented Dec 11, 2024

I think this is something that we'll eventually want to support. But I think it should be built on top of the logic that we're implementing in #477. In #477 we push reading from postgres tables back down to postgres from duckdb. We should in theory be able to do the same for functions afaict. If the function exists in both postgres and DuckDB there's a choice we need to make though, if we push it down to PG or we use the DuckDB implementation.

I'll re-open this issue to track this long term feature. But I don't expect we will start on this soon. First we need #477 merged before someone can even start on this.

@JelteF JelteF reopened this Dec 11, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants