Fix approx_count_distinct for queries without a FROM #524

Merged
5 commits merged on Jan 10, 2025

JelteF (Collaborator) commented Jan 10, 2025

I noticed two issues with the new approx_count_distinct implementation:

  1. It could not be used in a query without a FROM clause.
  2. It was not detected correctly as duckdb-only unless
    duckdb.force_execution = true was set (or some other mechanism was used).

This PR fixes both of those issues.

Related to #499

*
* If there's no rtable, we're only selecting constants. From a performance
* perspective there's not really a point in using DuckDB. If we remove
* this heck many common queries that are used to inspect postgres will
Collaborator

Suggested change
- * this heck many common queries that are used to inspect postgres will
+ * this hack many common queries that are used to inspect postgres will

*/
- if (!query->rtable) {
+ if (!query->rtable && !throw_error) {
Collaborator

This looks odd: if !query->rtable, we should stop the function here (otherwise ContainsCatalogTable and subsequent calls would fail pretty badly, no?).
Shouldn't it be something like:

if (!query->rtable) {
   if (throw_error) {
      elog(...)
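
The early-return shape suggested above can be sketched in isolation. This is only an illustration under stated assumptions: `MiniQuery` and `HasRangeTable` are hypothetical stand-ins (not pg_duckdb or PostgreSQL names), and a C++ exception with a made-up message stands in for the elided `elog(...)` call.

```cpp
#include <stdexcept>

// Hypothetical stand-in for a parsed query; only the field relevant here.
struct MiniQuery {
    const void *rtable; // nullptr means the query has no range table (no FROM)
};

// Returns false as soon as the range table is missing, so no later check
// ever runs for such a query. With throw_error set, the same condition is
// reported as an error instead of a plain "false".
static bool HasRangeTable(const MiniQuery &query, bool throw_error) {
    if (!query.rtable) {
        if (throw_error) {
            // Hypothetical message; the original snippet elides the elog arguments.
            throw std::runtime_error("query has no FROM clause");
        }
        return false;
    }
    return true;
}
```

Calling `HasRangeTable` with a null `rtable` and `throw_error = false` returns `false`; with `throw_error = true` it throws, and in both cases nothing after the check is executed.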

* still executing this in DuckDB.
*/
static bool
ContainsFromClause(Query *query, bool throw_error = false) {
Collaborator

Do we really want to keep throw_error?

Collaborator Author

Nope, copy-paste mistake. Fixed now.

@@ -197,7 +216,7 @@ DuckdbPlannerHook_Cpp(Query *parse, const char *query_string, int cursor_options
IsAllowedStatement(parse, true);

return DuckdbPlanNode(parse, query_string, cursor_options, bound_params, true);
- } else if (duckdb_force_execution && IsAllowedStatement(parse)) {
+ } else if (duckdb_force_execution && IsAllowedStatement(parse) && ContainsFromClause(parse)) {
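
The effect of this hunk on planner selection can be sketched as a small decision function. This is a simplified model, not the actual hook: `QueryInfo`, `Planner`, and `ChoosePlanner` are hypothetical names, the condition guarding the first branch is not visible in the hunk and is assumed here to be "the query uses something duckdb-only", and the error raised by `IsAllowedStatement(parse, true)` in that branch is left out.

```cpp
// Hypothetical summary of what the planner hook learns about a query.
struct QueryInfo {
    bool needs_duckdb;    // assumed: uses a duckdb-only feature such as approx_count_distinct
    bool allowed;         // stands in for IsAllowedStatement(parse)
    bool has_from_clause; // stands in for ContainsFromClause(parse)
};

enum class Planner { DuckDB, Postgres };

static Planner ChoosePlanner(const QueryInfo &q, bool force_execution) {
    // Queries that require DuckDB go there even without a FROM clause.
    if (q.needs_duckdb) {
        return Planner::DuckDB;
    }
    // With force_execution, constant-only queries (no FROM) stay in Postgres.
    if (force_execution && q.allowed && q.has_from_clause) {
        return Planner::DuckDB;
    }
    return Planner::Postgres;
}
```

Under this model, a constant-only query stays in Postgres even with `force_execution` enabled, while a query that needs DuckDB is routed there with or without a FROM clause.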
Collaborator

You want to move ContainsFromClause(parse) before IsAllowedStatement(parse), no? Otherwise we could hit the same issue of calling ContainsCatalogTable with a nullptr.

Collaborator Author: JelteF, Jan 10, 2025

It's fine to call ContainsCatalogTable and ContainsPartitionedTable with a nullptr. They both call foreach_node on that pointer, and the nullptr is a valid (and I think the only valid) representation of an empty List.
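
To illustrate why iterating a null list is harmless, here is a minimal sketch that mimics the convention with a toy type. `MiniList`, `MINI_NIL`, `MiniListLength`, and `ContainsValue` are hypothetical stand-ins and not PostgreSQL's real `List`, `NIL`, or `foreach_node`; the only property carried over is that the empty list is the null pointer and that the loop checks for it before dereferencing.

```cpp
// Toy stand-in for a PostgreSQL-style list; NOT the real List definition.
struct MiniList {
    int length;
    const int *elements;
};

// The empty list is represented by the null pointer, as with NIL.
static MiniList *const MINI_NIL = nullptr;

// Length helper that tolerates the null (empty) list.
static int MiniListLength(const MiniList *list) {
    return list ? list->length : 0;
}

// Hypothetical "contains" check: the loop bound comes from the null-safe
// length, so a null list yields zero iterations and no dereference.
static bool ContainsValue(const MiniList *list, int needle) {
    for (int i = 0; i < MiniListLength(list); i++) {
        if (list->elements[i] == needle) {
            return true;
        }
    }
    return false;
}
```

With this shape, `ContainsValue(MINI_NIL, 42)` simply returns `false`, which mirrors the behaviour described above for an empty range table.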

JelteF enabled auto-merge (squash) January 10, 2025 14:44
JelteF merged commit 74c55e8 into main Jan 10, 2025
5 checks passed
JelteF deleted the fix-approx-count-distinct-without-rtable branch January 10, 2025 14:48
ritwizsinha pushed a commit to ritwizsinha/pg_duckdb that referenced this pull request Jan 11, 2025