-
Notifications
You must be signed in to change notification settings - Fork 1.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix ExprSchema extraction of metadata for Cast expressions. #13305
Conversation
note: This is a draft PR only, since I'm not sure what is the correct behavior. See the PR description, and please advise. 🙏🏼 |
I'm not sure why metadata is allowed to change for cast 🤔 |
@sydduckworth Do you remember why #[test]
fn test_expr_metadata() {
let mut meta = HashMap::new();
meta.insert("bar".to_string(), "buzz".to_string());
let expr = col("foo");
let schema = MockExprSchema::new()
.with_data_type(DataType::Int32)
.with_metadata(meta.clone());
// col and alias should be metadata-preserving
assert_eq!(meta, expr.metadata(&schema).unwrap());
assert_eq!(meta, expr.clone().alias("bar").metadata(&schema).unwrap());
// cast should drop input metadata since the type has changed
assert_eq!(
HashMap::new(),
expr.clone()
.cast_to(&DataType::Int64, &schema)
.unwrap()
.metadata(&schema)
.unwrap()
);
let schema = DFSchema::from_unqualified_fields(
vec![Field::new("foo", DataType::Int32, true).with_metadata(meta.clone())]
.into(),
HashMap::new(),
)
.unwrap();
// verify to_field method populates metadata
assert_eq!(&meta, expr.to_field(&schema).unwrap().1.metadata());
} |
Maybe it is time to consider changing the check for output schema = input schema to ignore metadata -- I know we have been holding off on that for a long time, but maybe fundamentally the check is not feasible to maintain 🤔 |
For anyone else following along, the original PR that added this test and related code was |
When I wrote my PR, I had to make a decision about how to handle metadata propagation across expressions. The only one I wasn't sure about was Long story short: there's no reason for that assertion other than my arbitrary decision when implementing the feature. To me it seems reasonable to change the behavior of |
….a. cast preserves field metadata)
Thanks for your reply, I think we can skip the assertion given the change is little. We can consider to ignore the metadata if we meet any case that blocking us to matches the plan checking |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍
…3305) * test(12733): reproducers for schema bugs * fix(12733): properly extract field metadata from Cast expr * test(12733): update metadata preservation test, for new contract (a.k.a. cast preserves field metadata)
…3305) * test(12733): reproducers for schema bugs * fix(12733): properly extract field metadata from Cast expr * test(12733): update metadata preservation test, for new contract (a.k.a. cast preserves field metadata)
…3305) * test(12733): reproducers for schema bugs * fix(12733): properly extract field metadata from Cast expr * test(12733): update metadata preservation test, for new contract (a.k.a. cast preserves field metadata)
…3305) * test(12733): reproducers for schema bugs * fix(12733): properly extract field metadata from Cast expr * test(12733): update metadata preservation test, for new contract (a.k.a. cast preserves field metadata)
Which issue does this PR close?
Part of #12733
Rationale for this change
The field metadata was not being extracted from the ExprSchema for cast expressions. When cast was used as a projection on an aggregate, it would be missing the field metadata and fail the schema-misalignment check (see background epic).
The result was that the logical plan schema did not have the field metadata, whereas the physical plan schema did:
However, I'm not 💯 that this is the right fix. Since it triggers another test to fail -- becausePer the comment below, we agreed to have the cast expr retain the field metadata.cast should drop input metadata since the type has changed
. What is the proper behavior here? Instead, should we be removing the field metadata from the physical schema?What changes are included in this PR?
commit 1 = test case with reproducer. Demonstrates exactly which scenario fails (vs is ok).
commit 2 = fix, and update test cases
Are these changes tested?
yes
Are there any user-facing changes?
no