-
Notifications
You must be signed in to change notification settings - Fork 1.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Refactor signatures for lpad, rpad, left, and right #13420
Draft
jiashenC
wants to merge
6
commits into
apache:main
Choose a base branch
from
jiashenC:refactor_str_manipulation_func
base: main
Could not load branches
Branch not found: {{ refName }}
Loading
Could not load tags
Nothing to show
Loading
Are you sure you want to change the base?
Some commits from the old base branch may be removed from the timeline,
and old review comments may become outdated.
+187
−58
Draft
Changes from 2 commits
Commits
Show all changes
6 commits
Select commit
Hold shift + click to select a range
91fae89
update
jiashenC 89988d8
update
jiashenC 6723ad0
no casting if the logical types are the same
jiashenC 30f4ac9
support dict type in left
jiashenC c66cee6
remove print
jiashenC cf1048a
support dict type for four functions
jiashenC File filter
Filter by extension
Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
There are no files selected for viewing
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm thinking maybe we should avoid casting type if they have the same Logical type like utf8 -> utf8view. 🤔
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would agree. If the signature allows for a string and receives a Utf8 it should accept it as is unless it needs to be coerced to a common type for some other reason. The less casting the better imho
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
100% -- casts can often require trivial computation during query -- in this particular case casting
letter2
to a Utf8View means it will copy at least an additional 128 bytes for each row (each view is 128 bytes)There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the suggestion. I implement not to cast if their logical types are the same. However, it is failing in some cases where
Dictionary
is in the signature. In those cases, the logical type is the same, but the native type isDictionary
causing type mismatch. The test can be reproducedAny suggestions?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We might need to implement the kernel for dictionary type.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Check the
fn invoke
function for each function.For example
lpad
It support utf8/utf8view/largeutf8, but not dictionary. You can rewrite it like this
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the pointer! I added the support for
left
function. Can you give a read of whether that is a good design?There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hey @jayzhan211, I have added dictionary support for those four functions and
repeat
because the logical type casting skipping fails some relevant tests. Please let me know if you have any thoughts on the current implementation.I also have a question when getting values from
Dictionary
to get the actual string, it dropsNULL
values, which causes some tests to fail. Is there any helper method I can use to getNULL
preserving values from theDictionary
?There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
datafusion/datafusion/functions-nested/src/string.rs
Line 394 in bfabd48
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It seems that iterating values in Dict is quite tricky