Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix: Ignore names of technical inner fields (of List and Map types) when comparing datatypes for logical equivalence #13522

Open
wants to merge 1 commit into
base: main
Choose a base branch
from

Conversation

Blizzara
Copy link
Contributor

@Blizzara Blizzara commented Nov 21, 2024

Which issue does this PR close?

Closes #13437

Rationale for this change

Explained more in the issue, but in short: In Substrait consumer we check schemas of the input dataset and the Substrait input relation using logically_equivalent_names_and_types(..). This then calls datatype_is_logically_equal(..) on all fields, which can fail if the technical inner fields of a list or map have differing names. That happens to be the case when reading lists from parquet, as the parquet reader uses "element" as the name vs DF (incl. the substrait consumer) mostly using "item".

I think this makes sense, since arguably the names here shouldn't matter, and since Arrow doesn't mandate any specific names for these fields, we should ignore them.

What changes are included in this PR?

Ignore technical inner fields' names when comparing data types for logical equivalence.

Arguably we should ignore these names in all equivalence testing. That's a bigger change and might be hard to even enumerate all the places to check, so I only did the minimal thing I need here, but if it'd be preferred, I can try to expand to other cases as well - at least datatype_is_semantically_equal.

Are these changes tested?

Added unit test

Are there any user-facing changes?

No

…hen comparing datatypes for logical equivalence
@github-actions github-actions bot added the common Related to common crate label Nov 21, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
common Related to common crate
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Substrait consumer's schema check can fail for list (probs also map) columns
1 participant