Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Numeric coercion failure for expressions passed to UDF #4864

Closed
jonmmease opened this issue Jan 9, 2023 · 2 comments
Closed

Numeric coercion failure for expressions passed to UDF #4864

jonmmease opened this issue Jan 9, 2023 · 2 comments
Labels
bug Something isn't working

Comments

@jonmmease
Copy link
Contributor

Describe the bug
I'm seeing some unintuitive behavior around type coercion for UDFs that input integers.

  1. float values are passed through to the UDF without error and without coercion to integer type.
  2. When the argument is an expression that adds a float and int, the physical planner raises an arrow that "The type of Float64 + Int64 of binary physical should be same"

To Reproduce
Here is a self-contained example that reproduces these two issues

#[cfg(test)]
mod tests_types {
    use std::sync::Arc;
    use datafusion::arrow::array::{ArrayRef, Float64Array, StringArray, TimestampMillisecondArray};
    use datafusion::arrow::datatypes::{DataType, Field, Schema, SchemaRef, TimeUnit};
    use datafusion::arrow::record_batch::RecordBatch;
    use datafusion::arrow::util::pretty::pretty_format_batches;
    use datafusion::datasource::MemTable;
    use datafusion::logical_expr::{ColumnarValue, ReturnTypeFunction, ScalarFunctionImplementation, ScalarUDF, Signature, Volatility};
    use datafusion::prelude::SessionContext;

    #[tokio::test]
    async fn test() {
        // Create context and register table
        let ctx = SessionContext::new();

        // Register custom UDF
        ctx.register_udf(make_int_udf());

        // Perform query 1.
        // The UDF is called with Float64 arguments rather than raise an error 
        // or coerce float to integer
        let res = ctx.sql(r#"
SELECT int_udf(1.0)
    "#).await.unwrap().collect().await.unwrap();

        let formatted = pretty_format_batches(res.as_slice()).unwrap();
        println!("{}", formatted);

        // Perform query 2
        // An error is raised: "The type of Float64 + Int64 of binary physical should be same"
        let res = ctx.sql(r#"
SELECT int_udf(1.0 + 0)
    "#).await.unwrap().collect().await.unwrap();
    }

    pub fn make_int_udf() -> ScalarUDF {
        let datetime_components: ScalarFunctionImplementation =
            Arc::new(move |args: &[ColumnarValue]| {
                return Ok(args[0].clone())
            });

        let return_type: ReturnTypeFunction =
            Arc::new(move |_| Ok(Arc::new(DataType::Int64)));

        let signature = Signature::exact(
            vec![
                DataType::Int64, // month
            ],
            Volatility::Immutable,
        );
        ScalarUDF::new(
            "int_udf",
            &signature,
            &return_type,
            &datetime_components,
        )
    }
}

Output

+---------------------+
| int_udf(Float64(1)) |
+---------------------+
| 1                   |
+---------------------+

called `Result::unwrap()` on an `Err` value: Internal("The type of Float64 + Int64 of binary physical should be same")
thread 'tests_types::test' panicked at 'called `Result::unwrap()` on an `Err` value: Internal("The type of Float64 + Int64 of binary physical should be same")', src/lib.rs:270:40
stack backtrace:

Expected behavior
For query 1, I don't know if the intension is to coerce the float to an int, but I would expect either:

  • An error stating that a Float64 cannot be Coerced to an Int64
  • The UDF to be called with an Int64 array

For query 2, I would expect either of the outcomes above, but not an error in the physical planner.

Additional context
A similar error message was reported by @andygrove in The physical planner error message

Maybe related to #4615?

@jonmmease jonmmease added the bug Something isn't working label Jan 9, 2023
@Jefffrey
Copy link
Contributor

Jefffrey commented Jan 1, 2024

Running with modified version of example on latest main:

use datafusion::error::Result;
use datafusion::prelude::*;
use datafusion::{arrow::datatypes::DataType, logical_expr::Volatility};
use datafusion_expr::{
    ColumnarValue, ReturnTypeFunction, ScalarFunctionImplementation, ScalarUDF, Signature,
};
use std::sync::Arc;

pub fn make_int_udf() -> ScalarUDF {
    let datetime_components: ScalarFunctionImplementation =
        Arc::new(move |args: &[ColumnarValue]| Ok(args[0].clone()));

    let return_type: ReturnTypeFunction =
        Arc::new(move |_| Ok(Arc::new(DataType::Int64)));

    let signature = Signature::exact(vec![DataType::Int64], Volatility::Immutable);
    ScalarUDF::new("int_udf", &signature, &return_type, &datetime_components)
}

#[tokio::main]
async fn main() -> Result<()> {
    let ctx = SessionContext::new();
    ctx.register_udf(make_int_udf());

    // Perform query 1.
    dbg!(ctx.sql("SELECT int_udf(1.0)").await?.show().await);

    // Perform query 2
    dbg!(ctx.sql("SELECT int_udf(1.0 + 0)").await?.show().await);

    Ok(())
}

Output:

[datafusion-examples/examples/simple_udf.rs:44] ctx.sql("SELECT int_udf(1.0)").await?.show().await = Err(
    Context(
        "type_coercion",
        Plan(
            "Coercion from [Float64] to the signature Exact([Int64]) failed.",
        ),
    ),
)
[datafusion-examples/examples/simple_udf.rs:47] ctx.sql("SELECT int_udf(1.0 + 0)").await?.show().await = Err(
    Context(
        "type_coercion",
        Plan(
            "Coercion from [Float64] to the signature Exact([Int64]) failed.",
        ),
    ),
)

Can see both now have consistent behaviour of failing during type coercion.

Probably can close this issue as resolved if this is satisfactory?

cc @alamb @jonmmease

@alamb
Copy link
Contributor

alamb commented Jan 1, 2024

Thank you for helping clean up these tickets @Jefffrey -- much appreciated

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
Development

No branches or pull requests

3 participants