Refactor signatures for lpad, rpad, left, and right #13420
You can run
locally to reproduce the test failures.
@@ -1864,10 +1864,10 @@ query TT
EXPLAIN SELECT letter, letter = LEFT(letter2, 1) FROM simple_string;
----
logical_plan
-01)Projection: simple_string.letter, simple_string.letter = left(simple_string.letter2, Int64(1))
+01)Projection: simple_string.letter, simple_string.letter = left(CAST(simple_string.letter2 AS Utf8View), Int64(1))
I think we should avoid casting when the source and target types share the same logical type, e.g. Utf8 -> Utf8View. 🤔
I would agree. If the signature allows a string and receives a Utf8, it should accept it as is, unless it needs to be coerced to a common type for some other reason. The less casting the better, imho.
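The rule proposed above can be sketched as follows. This is a minimal model, not DataFusion's implementation: `PhysicalType`, `LogicalType`, `logical_type`, and `needs_cast` are hypothetical names invented for illustration, standing in for the real Arrow and DataFusion types.

```rust
/// Toy stand-in for a few Arrow physical types (hypothetical, not `arrow::datatypes::DataType`).
#[derive(Debug, Clone, Copy, PartialEq)]
enum PhysicalType {
    Utf8,
    LargeUtf8,
    Utf8View,
    Int64,
}

/// Toy logical type: every string encoding collapses to `String`.
#[derive(Debug, Clone, Copy, PartialEq)]
enum LogicalType {
    String,
    Integer,
}

/// Maps a physical encoding to its logical type.
fn logical_type(t: PhysicalType) -> LogicalType {
    match t {
        PhysicalType::Utf8 | PhysicalType::LargeUtf8 | PhysicalType::Utf8View => {
            LogicalType::String
        }
        PhysicalType::Int64 => LogicalType::Integer,
    }
}

/// A cast is inserted only when the logical types differ;
/// Utf8 -> Utf8View is accepted as is.
fn needs_cast(actual: PhysicalType, expected: PhysicalType) -> bool {
    logical_type(actual) != logical_type(expected)
}
```

Under this rule, `needs_cast(PhysicalType::Utf8, PhysicalType::Utf8View)` is false, so the plan would keep `left(simple_string.letter2, Int64(1))` without the CAST, while an Int64 argument passed where a string is expected would still be cast.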
100%. Casts can often require non-trivial computation during a query. In this particular case, casting letter2 to a Utf8View means copying at least an additional 16 bytes for each row (each view is 128 bits).
Thanks for the suggestion. I implemented skipping the cast when the logical types are the same. However, it fails in some cases where Dictionary is in the signature. In those cases the logical type is the same, but the native type is Dictionary, which causes a type mismatch. The failure can be reproduced with:
cargo test --test sqllogictests -- jctest
Any suggestions?
We might need to implement the kernel for the dictionary type.
I see. Can you elaborate on how to implement the kernel for the dictionary type? I can give it a try.
Check the fn invoke function for each function. For example, lpad:
fn invoke(&self, args: &[ColumnarValue]) -> Result<ColumnarValue> {
    match args[0].data_type() {
        Utf8 | Utf8View => make_scalar_function(lpad::<i32>, vec![])(args),
        LargeUtf8 => make_scalar_function(lpad::<i64>, vec![])(args),
        other => exec_err!("Unsupported data type {other:?} for function lpad"),
    }
}
It supports Utf8/Utf8View/LargeUtf8, but not Dictionary. You can rewrite it like this:
fn invoke(&self, args: &[ColumnarValue]) -> Result<ColumnarValue> {
    invoke_inner(args, &args[0].data_type())
}

fn invoke_inner(args: &[ColumnarValue], data_type: &DataType) -> Result<ColumnarValue> {
    match data_type {
        // Unwrap the dictionary and dispatch on its value type.
        DataType::Dictionary(_, v) => invoke_inner(args, v.as_ref()),
        Utf8 | Utf8View => make_scalar_function(lpad::<i32>, vec![])(args),
        LargeUtf8 => make_scalar_function(lpad::<i64>, vec![])(args),
        other => exec_err!("Unsupported data type {other:?} for function lpad"),
    }
}
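To see the recursion in isolation, here is a self-contained sketch that needs neither arrow nor datafusion. `ToyType` and `offset_width` are hypothetical names made up for this example. The function returns the name of the offset type the real code would pick, instead of calling `make_scalar_function`.

```rust
/// Toy model of a data type tree (hypothetical, not `arrow::datatypes::DataType`).
#[derive(Debug, Clone, PartialEq)]
enum ToyType {
    Utf8,
    LargeUtf8,
    Utf8View,
    Int32,
    /// Dictionary(key type, value type)
    Dictionary(Box<ToyType>, Box<ToyType>),
}

/// Picks the offset width by recursing through any Dictionary wrapper
/// until a plain string type (or an unsupported type) is reached.
fn offset_width(data_type: &ToyType) -> Result<&'static str, String> {
    match data_type {
        // Ignore the key type and dispatch on the value type.
        ToyType::Dictionary(_, value) => offset_width(value),
        ToyType::Utf8 | ToyType::Utf8View => Ok("i32"),
        ToyType::LargeUtf8 => Ok("i64"),
        other => Err(format!(
            "Unsupported data type {other:?} for function lpad"
        )),
    }
}
```

For example, `Dictionary(Int32, LargeUtf8)` resolves to "i64", and a dictionary whose values are not strings still ends in the error branch. Note that this only models the type dispatch: the kernel that is selected still has to be able to read dictionary-encoded input.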
Thanks for the pointer! I added dictionary support for the left function. Could you take a look and tell me whether that is a good design?
Hey @jayzhan211, I have added dictionary support for those four functions and for repeat, because skipping the cast for equal logical types makes some related tests fail. Please let me know if you have any thoughts on the current implementation.
I also have a question: when I extract the values from a Dictionary to get the actual strings, the NULL values are dropped, which causes some tests to fail. Is there a helper method I can use to get NULL-preserving values from the Dictionary?
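For reference, the NULL-preserving behaviour being asked about can be sketched with plain vectors. This is only a model of the idea, assuming a dictionary is a list of optional keys indexing into a list of optional values; `decode_dictionary` is a hypothetical helper, not an arrow or DataFusion API.

```rust
/// Expands dictionary-encoded data into one optional string per row.
/// A row is NULL if its key is NULL, if the value it points at is NULL,
/// or (in this sketch) if the key is out of range.
fn decode_dictionary(keys: &[Option<usize>], values: &[Option<&str>]) -> Vec<Option<String>> {
    keys.iter()
        .map(|key| {
            // Look up by key per row instead of iterating the values,
            // so NULL rows keep their position in the output.
            key.and_then(|k| values.get(k).copied().flatten())
                .map(str::to_string)
        })
        .collect()
}
```

The point of the sketch is that the output has exactly one entry per key, so row positions and NULLs line up with the input column. Iterating over the dictionary's values alone would lose both.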
Which issue does this PR close?
Refactor signatures for lpad, rpad, left, and right. They share very similar signatures.
Closes some tasks in #13301.
What changes are included in this PR?
Signature changes.
Are these changes tested?
Existing tests.
Are there any user-facing changes?
No