fix: Ignore names of technical inner fields (of List and Map types) when comparing datatypes for logical equivalence #13522
+151
−3
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
Which issue does this PR close?
Closes #13437
Rationale for this change
Explained more in the issue, but in short: In Substrait consumer we check schemas of the input dataset and the Substrait input relation using
logically_equivalent_names_and_types(..)
. This then callsdatatype_is_logically_equal(..)
on all fields, which can fail if the technical inner fields of a list or map have differing names. That happens to be the case when reading lists from parquet, as the parquet reader uses "element" as the name vs DF (incl. the substrait consumer) mostly using "item".I think this makes sense, since arguably the names here shouldn't matter, and since Arrow doesn't mandate any specific names for these fields, we should ignore them.
What changes are included in this PR?
Ignore technical inner fields' names when comparing data types for logical equivalence.
Arguably we should ignore these names in all equivalence testing. That's a bigger change and might be hard to even enumerate all the places to check, so I only did the minimal thing I need here, but if it'd be preferred, I can try to expand to other cases as well - at least
datatype_is_semantically_equal
.Are these changes tested?
Added unit test
Are there any user-facing changes?
No