Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Compute ScalarFunction properties including return_type and nullable on creation #13825

Open
jayzhan211 opened this issue Dec 18, 2024 · 3 comments
Assignees
Labels
enhancement New feature or request

Comments

@jayzhan211
Copy link
Contributor

jayzhan211 commented Dec 18, 2024

Is your feature request related to a problem or challenge?

Continue on the discussion from #13756 (comment)

We create scalar function with the defined function udf and inputs args and keep them in Expr. We then coerce the inputs in optimizer step with TypeCoercionRewriter and compute the nullability when we create physical expr create_physical_expr.

Both return_type and nullability are computed based on the inputs which are defined by ScalarUDFImpl in logical layer. As long as we have udf and inputs: Vec<Expr> we can decide whether it has a valid types and what implicit coercion should we do, and we can define the return_type and nullablity based on those information.

I think such properties can be decided as early as possible (when the function is created in logical layer).

Once we have return_type, I think we can avoid call of data_types_with_scalar_udf again in ExprSchemable::get_type, coerce_arguments_for_signature_with_scalar_udf and create_physical_expr

Describe the solution you'd like

Compute the return_type and nullable as early as possible.

#[derive(Clone, PartialEq, Eq, PartialOrd, Hash, Debug)]
pub struct ScalarFunction {
    /// The function
    pub func: Arc<crate::ScalarUDF>,
    /// List of expressions to feed to the functions as arguments
    pub args: Vec<Expr>,
    
    // New fields for ScalarFunction
    pub return_type: Vec<DataType>,
    pub nullable: bool,
}

ScalarFunction::new_udf is used to create function, need to double check if there is other way to create scalar function. We need to make sure they all go to the same computation.

Ideally we have something like

ScalarFunction::new(udf: Arc<crate::ScalarUDF>, args: Vec<Expr>) -> Result<()> {
  1. Check udf's signature against args including length check, type check, implicit coercion
  2. Call fun.return_type() to determine the return type (or even return Field that includes the nullability)
  
  return error if anything goes wrong
}

Describe alternatives you've considered

No response

Additional context

We create physical expr with logical input and schema which doesn't seem correct too.

/// Create a physical expression for the UDF.
pub fn create_physical_expr(
    fun: &ScalarUDF,
    input_phy_exprs: &[Arc<dyn PhysicalExpr>],
    input_schema: &Schema,
    args: &[Expr],
    input_dfschema: &DFSchema,
) -> Result<Arc<dyn PhysicalExpr>> {
    let input_expr_types = input_phy_exprs
        .iter()
        .map(|e| e.data_type(input_schema))
        .collect::<Result<Vec<_>>>()?;

    // verify that input data types is consistent with function's `TypeSignature`
    data_types_with_scalar_udf(&input_expr_types, fun)?;

    // Since we have arg_types, we dont need args and schema.
    let return_type =
        fun.return_type_from_exprs(args, input_dfschema, &input_expr_types)?;

    Ok(Arc::new(
        ScalarFunctionExpr::new(
            fun.name(),
            Arc::new(fun.clone()),
            input_phy_exprs.to_vec(),
            return_type,
        )
        .with_nullable(fun.is_nullable(args, input_dfschema)),
    ))
}
@jayzhan211 jayzhan211 added the enhancement New feature or request label Dec 18, 2024
@findepi
Copy link
Member

findepi commented Dec 18, 2024

Compute the return_type and nullable as early as possible.

Correct, but not only that.
We should clearly separate function calls where coercions need to be applied from those where they shouldn't. (see also @alamb 's #13756 (comment) )

@jayzhan211
Copy link
Contributor Author

from those where they shouldn't

I think the example in the comment requires coercion. ExprSchemable::get_type is asking for the return type of the function. To compute the real return type, coerced types are required.

@jayzhan211
Copy link
Contributor Author

ScalarFunction::new(udf: Arc<crate::ScalarUDF>, args: Vec<Expr>). We might also need schema, need to take a look whether we have valid schema at this point

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
Development

No branches or pull requests

2 participants