Implement GroupColumn Decimal128Array #13505

Open
Tracked by #12680
alamb opened this issue Nov 20, 2024 · 3 comments
alamb commented Nov 20, 2024

Is your feature request related to a problem or challenge?

In #12269 @jayzhan211 made significant improvements to how group values are stored in multi-column aggregations.

Specifically for queries like

SELECT ... FROM ... GROUP BY col1, ... colN

The improvement relies on implementing specialized versions of GroupColumn for the types of col1 through colN.

We have now implemented the primitive types and Strings/StringViews, but not all types.

This means queries like

SELECT ... FROM ... GROUP BY int_col, decimal_col

will fall back to the slower (but general) GroupValuesRows:

/// representation.
pub struct GroupValuesRows {
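
To make the fallback concrete, here is a minimal sketch of the kind of check involved. Everything in it (`DataType` as a local enum, `Strategy`, `has_specialized_column`, `choose_strategy`, the `decimal_supported` flag) is a hypothetical stand-in written for illustration, not DataFusion's actual code: the column-wise path is only taken when every group column has a specialized implementation.

```rust
/// Hypothetical, simplified stand-in for Arrow's `DataType`; only the
/// variants needed for this illustration are listed.
#[allow(dead_code)]
#[derive(Debug, Clone, PartialEq)]
enum DataType {
    Int32,
    Int64,
    Utf8,
    Utf8View,
    /// (precision, scale)
    Decimal128(u8, i8),
}

/// Which group-value storage a multi-column GROUP BY would end up with.
#[derive(Debug, PartialEq)]
enum Strategy {
    /// One specialized builder per group column.
    ColumnWise,
    /// General row-format fallback.
    Rows,
}

/// Hypothetical check: is there a specialized per-column builder for `dt`?
/// `decimal_supported` models "before" (false) and "after" (true) this issue.
fn has_specialized_column(dt: &DataType, decimal_supported: bool) -> bool {
    match dt {
        DataType::Int32 | DataType::Int64 | DataType::Utf8 | DataType::Utf8View => true,
        DataType::Decimal128(_, _) => decimal_supported,
    }
}

/// A single unsupported column forces the whole GROUP BY onto the row path.
fn choose_strategy(schema: &[DataType], decimal_supported: bool) -> Strategy {
    if schema
        .iter()
        .all(|dt| has_specialized_column(dt, decimal_supported))
    {
        Strategy::ColumnWise
    } else {
        Strategy::Rows
    }
}
```

With a schema of `[Int32, Decimal128(10, 3)]`, this sketch picks `Rows` while decimals are unsupported and `ColumnWise` once they are.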

Describe the solution you'd like

Implement GroupColumn for Decimal128 types.

You can see how to do this here:

macro_rules! downcast_helper {
    ($t:ty, $d:ident) => {
        return Ok(Box::new(GroupValuesPrimitive::<$t>::new($d.clone())))
    };
}
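
As a rough illustration of what a per-column builder for decimals has to do, here is a self-contained sketch. It assumes a pared-down trait (`SimpleGroupColumn`) and models the input column as a slice of optional unscaled `i128` values instead of a real Arrow `Decimal128Array`; the trait, the struct, and the `build` return type are all hypothetical and much simpler than the real GroupColumn.

```rust
/// Hypothetical, pared-down analogue of a per-column group-value builder.
/// The input column is modeled as `&[Option<i128>]` (unscaled values, `None`
/// for null) rather than a real Arrow array.
trait SimpleGroupColumn {
    /// Is the stored group value at `lhs_row` equal to `input[rhs_row]`?
    fn equal_to(&self, lhs_row: usize, input: &[Option<i128>], rhs_row: usize) -> bool;
    /// Append `input[row]` as a new group value.
    fn append_val(&mut self, input: &[Option<i128>], row: usize);
    /// Number of group values stored so far.
    fn len(&self) -> usize;
}

/// Stores group values for one decimal column, plus the type parameters
/// needed to describe the output column.
struct Decimal128GroupColumn {
    precision: u8,
    scale: i8,
    group_values: Vec<Option<i128>>,
}

impl Decimal128GroupColumn {
    fn new(precision: u8, scale: i8) -> Self {
        Self {
            precision,
            scale,
            group_values: Vec::new(),
        }
    }

    /// Hand back the stored values together with precision and scale.
    fn build(self) -> (u8, i8, Vec<Option<i128>>) {
        (self.precision, self.scale, self.group_values)
    }
}

impl SimpleGroupColumn for Decimal128GroupColumn {
    fn equal_to(&self, lhs_row: usize, input: &[Option<i128>], rhs_row: usize) -> bool {
        // `Option` equality treats two nulls as equal, which matches the
        // GROUP BY rule that all nulls land in the same group.
        self.group_values[lhs_row] == input[rhs_row]
    }

    fn append_val(&mut self, input: &[Option<i128>], row: usize) {
        self.group_values.push(input[row]);
    }

    fn len(&self) -> usize {
        self.group_values.len()
    }
}
```

The comparison and append logic is the same as for any other fixed-width primitive; the only decimal-specific part in this sketch is carrying `precision` and `scale` through to `build`.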

@jonathanc-n also made a really nice PR here

Then make sure there are tests for each of those types in queries that group on multiple columns.

Describe alternatives you've considered

No response

Additional context

Here is an example for how this was done for Strings: #12809

alamb commented Nov 20, 2024

BTW we can verify that this is working as expected after merging

Then you can do

cd datafusion-cli
RUST_LOG=debug cargo run -- -c "create or replace table foo(x decimal(10,3), y int) as values (10.0, 100), (21.2, 200), (33.0, 300); select count(*) from foo group by x, y";

You should not see any lines about Creating GroupValuesRows. Here is what is printed out on main:

[2024-11-20T22:08:58Z DEBUG datafusion_physical_plan::aggregates::group_values::row] Creating GroupValuesRows for schema: Field { name: "x", data_type: Decimal128(10, 3), nullable: true, dict_id: 0, dict_is_ordered: false, metadata: {} }, Field { name: "y", data_type: Int32, nullable: true, dict_id: 0, dict_is_ordered: false, metadata: {} }

@jonathanc-n

take

jonathanc-n commented Nov 21, 2024

@alamb For this PR, will Decimal128 need its own custom column implementation instead of instantiate_primitive!, similar to how byte, byteview, stringview, etc. are handled? I am thinking it might, due to the type's parameters (presumably precision and scale).
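
For context on why the parameters could matter, here is a small sketch, assuming the parameters in question are the precision and scale seen in `Decimal128(10, 3)` above. A Decimal128 value is stored as an unscaled `i128`, so the raw integer alone does not say what number it represents. The helper name `format_decimal128` is made up for this illustration, and it only handles non-negative scales up to 38.

```rust
/// Hypothetical helper: render an unscaled Decimal128 value as text.
/// Assumes `scale` is in 0..=38 (larger powers of ten overflow `u128`).
fn format_decimal128(unscaled: i128, scale: u32) -> String {
    if scale == 0 {
        return unscaled.to_string();
    }
    let sign = if unscaled < 0 { "-" } else { "" };
    let abs = unscaled.unsigned_abs();
    let factor = 10u128.pow(scale);
    // Integer part, then the fractional part zero-padded to `scale` digits.
    format!(
        "{sign}{}.{:0width$}",
        abs / factor,
        abs % factor,
        width = scale as usize
    )
}
```

Under this reading, the value `21.2` in a `decimal(10,3)` column is stored as `21200`; the same integer with scale 2 would mean `212.00`. Grouping itself only needs to compare the raw integers within one column, but the output array has to be rebuilt with the original precision and scale.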
