fix: Ignore names of technical inner fields (of List and Map types) when comparing datatypes for logical equivalence #13522

Blizzara · 2024-11-21T21:55:01Z

Which issue does this PR close?

Rationale for this change

Explained more in the issue, but in short: In Substrait consumer we check schemas of the input dataset and the Substrait input relation using logically_equivalent_names_and_types(..). This then calls datatype_is_logically_equal(..) on all fields, which can fail if the technical inner fields of a list or map have differing names. That happens to be the case when reading lists from parquet, as the parquet reader uses "element" as the name vs DF (incl. the substrait consumer) mostly using "item".

I think this makes sense, since arguably the names here shouldn't matter, and since Arrow doesn't mandate any specific names for these fields, we should ignore them.

What changes are included in this PR?

Ignore technical inner fields' names when comparing data types for logical equivalence.

Arguably we should ignore these names in all equivalence testing. That's a bigger change and might be hard to even enumerate all the places to check, so I only did the minimal thing I need here, but if it'd be preferred, I can try to expand to other cases as well - at least datatype_is_semantically_equal.

Are these changes tested?

Added unit test

Are there any user-facing changes?

No

…hen comparing datatypes for logical equivalence

fix: Ignore names of technical inner fields (of List and Map types) w…

f4ba8e3

…hen comparing datatypes for logical equivalence

github-actions bot added the common Related to common crate label Nov 21, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

fix: Ignore names of technical inner fields (of List and Map types) when comparing datatypes for logical equivalence #13522

fix: Ignore names of technical inner fields (of List and Map types) when comparing datatypes for logical equivalence #13522

Blizzara commented Nov 21, 2024 •

edited

Loading

fix: Ignore names of technical inner fields (of List and Map types) when comparing datatypes for logical equivalence #13522

Are you sure you want to change the base?

fix: Ignore names of technical inner fields (of List and Map types) when comparing datatypes for logical equivalence #13522

Conversation

Blizzara commented Nov 21, 2024 • edited Loading

Which issue does this PR close?

Rationale for this change

What changes are included in this PR?

Are these changes tested?

Are there any user-facing changes?

Blizzara commented Nov 21, 2024 •

edited

Loading