fix(expr-common): Coerce to Decimal(20, 0) when combining UInt64 with signed integers #14223

Open
wants to merge 5 commits into
base: main

nuno-faria
Contributor

Previously, combining UInt64 with any signed integer produced Int64, which cannot represent UInt64 values above 2^63 - 1, so information could be lost. Now, combining UInt64 with a signed integer results in Decimal(20, 0), which can encode every 64-bit integer value, signed or unsigned. Thanks @jonahgao for the pointers.
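
To make the information loss concrete, here is a minimal sketch in plain Rust, independent of DataFusion. It uses Rust's wrapping `as` cast as a stand-in for forcing a UInt64 value into Int64, and counts decimal digits to show why a precision of 20 is enough for both ranges. The helper names `cast_to_i64` and `decimal_digits` are hypothetical and exist only for this illustration.

```rust
/// Number of decimal digits needed to write `v` (hypothetical helper).
fn decimal_digits(mut v: u128) -> u32 {
    let mut digits = 1;
    while v >= 10 {
        v /= 10;
        digits += 1;
    }
    digits
}

/// Reinterpret a u64 as i64 using Rust's wrapping `as` cast.
/// This only illustrates the overflow; it is not DataFusion's cast logic.
fn cast_to_i64(v: u64) -> i64 {
    v as i64
}

fn main() {
    // u64::MAX does not fit in i64: the wrapping cast changes the value.
    println!("{} as i64 = {}", u64::MAX, cast_to_i64(u64::MAX));
    // A checked conversion reports the overflow instead of wrapping.
    println!("try_from fails: {}", i64::try_from(u64::MAX).is_err());
    // Digit counts of the largest magnitudes on each side.
    println!("digits(u64::MAX)   = {}", decimal_digits(u64::MAX as u128));
    println!("digits(|i64::MIN|) = {}", decimal_digits(i64::MIN.unsigned_abs() as u128));
}
```

Both magnitudes need at most 20 decimal digits, which is consistent with the choice of Decimal(20, 0) as the common type.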

The function bitwise_coercion remains the same, since it's probably not a good idea to introduce decimals when performing bitwise operations. In this case, it converts (UInt64 | _) to UInt64.

Which issue does this PR close?

Closes #14208.

What changes are included in this PR?

  • Updated binary_numeric_coercion in expr-common/type_coercion/binary.rs.
  • Added new tests to expr-common/type_coercion/binary.rs.
  • Updated existing sqllogic tests to use the new coercion.

Are these changes tested?

Yes.

Are there any user-facing changes?

No.

@github-actions github-actions bot added the sqllogictest SQL Logic Tests (.slt) label Jan 21, 2025
// accommodates all values of both types. Note that to avoid information
// loss when combining UInt64 with signed integers we use Decimal128(20, 0).
(Decimal128(20, 0), _)
| (_, Decimal128(20, 0))
Member

This doesn't look correct. For example, combining Decimal128(20, 0) with Decimal128(30, 0) should not result in Decimal128(20, 0).

Contributor Author

I think when both types are decimal they are handled above before this match, when calling the decimal_coercion function.

Member

I rechecked it, and decimal_coercion already covers them in

fn coerce_numeric_type_to_decimal(numeric_type: &DataType) -> Option<DataType> {
    use arrow::datatypes::DataType::*;
    // This conversion rule is from spark
    // https://github.com/apache/spark/blob/1c81ad20296d34f137238dadd67cc6ae405944eb/sql/catalyst/src/main/scala/org/apache/spark/sql/types/DecimalType.scala#L127
    match numeric_type {
        Int8 => Some(Decimal128(3, 0)),
        Int16 => Some(Decimal128(5, 0)),
        Int32 => Some(Decimal128(10, 0)),
        Int64 => Some(Decimal128(20, 0)),
        Float32 => Some(Decimal128(14, 7)),
        Float64 => Some(Decimal128(30, 15)),
        _ => None,
    }
}

It doesn't handle unsigned integer types yet, but we can add them there, maybe in a follow-up PR.

Member

So I think we shouldn't combine decimal with integer types here, because decimal_coercion already handles that case.

Contributor Author

I tried that initially but then it would not handle UInt64 and Decimal128(20, 0):

Cannot infer common argument type for comparison operation UInt64 = Decimal128(20, 0)

So maybe it would be best to add new arms to the coerce_numeric_type_to_decimal to include unsigned integers as well?

    match numeric_type {
        Int8 => Some(Decimal128(3, 0)),
        Int16 => Some(Decimal128(5, 0)),
        Int32 => Some(Decimal128(10, 0)),
        Int64 => Some(Decimal128(20, 0)),
        Float32 => Some(Decimal128(14, 7)),
        Float64 => Some(Decimal128(30, 15)),
        _ => None,
    }
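
A rough sketch of what those extra arms could look like. The `DataType` enum below is a simplified local stand-in for `arrow::datatypes::DataType` so the example compiles on its own, and the unsigned precisions are an assumption: each one is the number of decimal digits in the type's maximum value (255, 65535, 4294967295, 18446744073709551615).

```rust
/// Simplified stand-in for arrow's `DataType`; only the variants used here.
#[allow(dead_code)]
#[derive(Debug, Clone, PartialEq)]
enum DataType {
    Int8,
    Int16,
    Int32,
    Int64,
    UInt8,
    UInt16,
    UInt32,
    UInt64,
    Float32,
    Float64,
    Utf8,
    Decimal128(u8, i8),
}

fn coerce_numeric_type_to_decimal(numeric_type: &DataType) -> Option<DataType> {
    use DataType::*;
    match numeric_type {
        // Existing arms, as quoted in the comment above.
        Int8 => Some(Decimal128(3, 0)),
        Int16 => Some(Decimal128(5, 0)),
        Int32 => Some(Decimal128(10, 0)),
        Int64 => Some(Decimal128(20, 0)),
        // Assumed new arms: precision = digit count of the type's max value.
        UInt8 => Some(Decimal128(3, 0)),
        UInt16 => Some(Decimal128(5, 0)),
        UInt32 => Some(Decimal128(10, 0)),
        UInt64 => Some(Decimal128(20, 0)),
        Float32 => Some(Decimal128(14, 7)),
        Float64 => Some(Decimal128(30, 15)),
        // Non-numeric types have no decimal equivalent.
        _ => None,
    }
}

fn main() {
    println!("{:?}", coerce_numeric_type_to_decimal(&DataType::UInt64));
    println!("{:?}", coerce_numeric_type_to_decimal(&DataType::Utf8));
}
```

With an arm like this in place, a comparison such as UInt64 = Decimal128(20, 0) would have a decimal type to coerce the UInt64 side to, instead of failing to infer a common type.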

Member

So maybe it would be best to add new arms to the coerce_numeric_type_to_decimal to include unsigned integers as well?

I think so.

Contributor Author

Done.

@jonahgao
Member

Perhaps we should add a sqllogictest test for #14208.

@nuno-faria
Contributor Author

Perhaps we should add a sqllogictest test for #14208.

Done.

| (UInt32, Int8)
| (Int8, UInt32)
| (UInt32, Int16)
| (Int16, UInt32)
| (UInt32, Int32)
| (Int32, UInt32) => Some(Int64),
(UInt64, _) | (_, UInt64) => Some(UInt64),
Member

@jonahgao jonahgao Jan 22, 2025

We need to keep this arm because it also coerces UInt64 combined with UInt8/16/32 to UInt64. Without it, the following query returns an incorrect result on this PR.

DataFusion CLI v44.0.0
> select arrow_typeof(column1) from values(arrow_cast(1, 'UInt8')), (arrow_cast(1, 'UInt64'));
+-----------------------+
| arrow_typeof(column1) |
+-----------------------+
| UInt8                 |
| UInt8                 |
+-----------------------+
2 row(s) fetched.
Elapsed 0.007 seconds.

> select arrow_typeof(column1) from values(arrow_cast(1, 'UInt8')), (arrow_cast(1000, 'UInt64'));
Arrow error: Cast error: Can't cast value 1000 to type UInt8
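
The intended behaviour can be sketched with a toy model. This is not the actual `binary_numeric_coercion` code: the `Ty` enum and the `coerce_with_uint64` function are hypothetical, cover only integer pairs that involve UInt64, and return `None` for everything else.

```rust
/// Toy type enum for illustration; not arrow's `DataType`.
#[derive(Debug, Clone, Copy, PartialEq)]
enum Ty {
    Int8,
    Int16,
    Int32,
    Int64,
    UInt8,
    UInt16,
    UInt32,
    UInt64,
    Decimal128(u8, i8),
}

/// Common type for a pair of integer types where at least one is UInt64.
fn coerce_with_uint64(lhs: Ty, rhs: Ty) -> Option<Ty> {
    use Ty::*;
    match (lhs, rhs) {
        // UInt64 with a signed integer: widen to a decimal that holds both ranges.
        (UInt64, Int8 | Int16 | Int32 | Int64) | (Int8 | Int16 | Int32 | Int64, UInt64) => {
            Some(Decimal128(20, 0))
        }
        // UInt64 with another unsigned integer: UInt64 already holds every value.
        (UInt64, UInt8 | UInt16 | UInt32 | UInt64) | (UInt8 | UInt16 | UInt32, UInt64) => {
            Some(UInt64)
        }
        // Pairs outside this sketch's scope.
        _ => None,
    }
}

fn main() {
    // The case from the query above: UInt8 with UInt64 should give UInt64, not UInt8.
    println!("{:?}", coerce_with_uint64(Ty::UInt8, Ty::UInt64));
    println!("{:?}", coerce_with_uint64(Ty::UInt64, Ty::Int32));
}
```

The point of the second arm is that dropping the unsigned case lets the narrower type win, which is what makes the cast of 1000 to UInt8 fail in the query above.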

Contributor Author

Done. I also added a new test to check that.

Member

@jonahgao jonahgao left a comment

LGTM, thank you @nuno-faria


Successfully merging this pull request may close these issues.

SQL INSERT casts to Int64 when handling UInt64 values
2 participants