Describe the bug
I'm seeing some unintuitive behavior around type coercion for UDFs that take integer arguments:
Float values are passed through to the UDF without error and without coercion to integer type.
When the argument is an expression that adds a float and an int, the physical planner raises an error: "The type of Float64 + Int64 of binary physical should be same"
To Reproduce
Here is a self-contained example that reproduces these two issues
#[cfg(test)]
mod tests_types {
    use std::sync::Arc;
    use datafusion::arrow::array::{ArrayRef, Float64Array, StringArray, TimestampMillisecondArray};
    use datafusion::arrow::datatypes::{DataType, Field, Schema, SchemaRef, TimeUnit};
    use datafusion::arrow::record_batch::RecordBatch;
    use datafusion::arrow::util::pretty::pretty_format_batches;
    use datafusion::datasource::MemTable;
    use datafusion::logical_expr::{ColumnarValue, ReturnTypeFunction, ScalarFunctionImplementation, ScalarUDF, Signature, Volatility};
    use datafusion::prelude::SessionContext;

    #[tokio::test]
    async fn test() {
        // Create context and register table
        let ctx = SessionContext::new();

        // Register custom UDF
        ctx.register_udf(make_int_udf());

        // Perform query 1.
        // The UDF is called with Float64 arguments rather than raise an error
        // or coerce float to integer
        let res = ctx.sql(r#"SELECT int_udf(1.0) "#).await.unwrap().collect().await.unwrap();
        let formatted = pretty_format_batches(res.as_slice()).unwrap();
        println!("{}", formatted);

        // Perform query 2
        // An error is raised: "The type of Float64 + Int64 of binary physical should be same"
        let res = ctx.sql(r#"SELECT int_udf(1.0 + 0) "#).await.unwrap().collect().await.unwrap();
    }

    pub fn make_int_udf() -> ScalarUDF {
        let datetime_components: ScalarFunctionImplementation =
            Arc::new(move |args: &[ColumnarValue]| {
                return Ok(args[0].clone())
            });
        let return_type: ReturnTypeFunction =
            Arc::new(move |_| Ok(Arc::new(DataType::Int64)));
        let signature = Signature::exact(
            vec![
                DataType::Int64, // month
            ],
            Volatility::Immutable,
        );
        ScalarUDF::new(
            "int_udf",
            &signature,
            &return_type,
            &datetime_components,
        )
    }
}
Output
+---------------------+
| int_udf(Float64(1)) |
+---------------------+
| 1                   |
+---------------------+
called `Result::unwrap()` on an `Err` value: Internal("The type of Float64 + Int64 of binary physical should be same")
thread 'tests_types::test' panicked at 'called `Result::unwrap()` on an `Err` value: Internal("The type of Float64 + Int64 of binary physical should be same")', src/lib.rs:270:40
stack backtrace:
Expected behavior
For query 1, I don't know if the intention is to coerce the float to an int, but I would expect either:
An error stating that a Float64 cannot be coerced to an Int64
The UDF to be called with an Int64 array
For query 2, I would expect either of the outcomes above, but not an error in the physical planner.
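To make the two acceptable outcomes concrete, here is a minimal standalone sketch of what I mean. It is a toy model, not DataFusion code: `ToyType`, `Policy`, `binary_add_type` and `plan_udf_arg` are hypothetical names invented for illustration, and the real planner may be structured quite differently. The point is only that the decision is made at planning time, so the UDF either never runs or always receives the declared type.

```rust
// Toy model only: every name here is hypothetical and is NOT part of DataFusion.

/// The two data types involved in the report.
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum ToyType {
    Int64,
    Float64,
}

/// The two behaviors that would both be acceptable for a mismatched argument.
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum Policy {
    /// Reject any argument whose type differs from the declared one.
    Strict,
    /// Plan a cast to the declared type when the types differ.
    Coerce,
}

/// Type of `lhs + rhs` after numeric widening: Float64 wins over Int64.
pub fn binary_add_type(lhs: ToyType, rhs: ToyType) -> ToyType {
    if lhs == ToyType::Float64 || rhs == ToyType::Float64 {
        ToyType::Float64
    } else {
        ToyType::Int64
    }
}

/// Decide at planning time which type the UDF will actually receive,
/// or fail with a clear coercion error.
pub fn plan_udf_arg(
    actual: ToyType,
    declared: ToyType,
    policy: Policy,
) -> Result<ToyType, String> {
    if actual == declared {
        return Ok(declared);
    }
    match policy {
        Policy::Strict => Err(format!(
            "{:?} cannot be coerced to {:?}",
            actual, declared
        )),
        Policy::Coerce => Ok(declared),
    }
}
```

Under this model, query 1 (`int_udf(1.0)`) and query 2 (`int_udf(1.0 + 0)`) behave identically: the binary expression is first resolved to `Float64`, and the same policy is then applied to the UDF argument.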
Additional context
A similar error message was reported by @andygrove in The physical planner error message
Maybe related to #4615?