Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: Extend the simple UDAF interface with function-level variables #12067

Open
wants to merge 2 commits into
base: main
Choose a base branch
from

Conversation

liujiayi771
Copy link
Contributor

@liujiayi771 liujiayi771 commented Jan 12, 2025

Some aggregate functions implemented based on the simple UDAF interface need to obtain information such as step, argTypes, and resultType in the AccumulatorType struct. This PR adds the capability for user-defined variables to the simple UDAF interface, supporting the passing of step, argTypes, and resultType to AccumulatorType struct when registering the SimpleAggregateAdapter.

  1. The UDAF author add function-level variables as data members in the UDAF class (outside of its AccumulatorType struct).
  2. The UDAF class has an initialize() method that receives the aggregation step, the types, and assigns values to the data members in the UDAF class.
  3. The AccumulatorType struct has a data member that is a pointer to the UDAF class. This would allow member methods inside AccumulatorType to access data members in the UDAF class. This UDAF-pointer is set in SimpleAggregateAdapter::initializeNewGroupsInternal().

@facebook-github-bot facebook-github-bot added the CLA Signed This label is managed by the Facebook bot. Authors need to sign the CLA before a PR can be reviewed. label Jan 12, 2025
Copy link

netlify bot commented Jan 12, 2025

Deploy Preview for meta-velox canceled.

Name Link
🔨 Latest commit fe4c0b7
🔍 Latest deploy log https://app.netlify.com/sites/meta-velox/deploys/678b1eb5e3013d0008edc69e

@liujiayi771
Copy link
Contributor Author

liujiayi771 commented Jan 12, 2025

@kagamiori Some time ago, we attempted to add function-level states to the simple UDAF interface in #9167 and we have discussed some details in #8711, but @mbasmanova suggested that we implement it in a manner similar to the simple scalar functions interface. You had previously provided a specific implementation plan, and this PR is implemented according to the plan you proposed at that time. However, I did not add support for constantInput, as we currently only need to use rawInputType and resultType. Adding constantInput would require introducing new variables in the AggregateFunctionFactory, and I believe we can expand it when we actually need it.

We currently need this capability to implement the decimal average aggregate function in Spark. Could you help to take a look. cc @rui-mo.

@liujiayi771 liujiayi771 force-pushed the func-level-variables branch 3 times, most recently from 4ad7ba7 to 94af18b Compare January 13, 2025 05:46
@kagamiori
Copy link
Contributor

kagamiori commented Jan 16, 2025

@kagamiori Some time ago, we attempted to add function-level states to the simple UDAF interface in #9167 and we have discussed some details in #8711, but @mbasmanova suggested that we implement it in a manner similar to the simple scalar functions interface. You had previously provided a specific implementation plan, and this PR is implemented according to the plan you proposed at that time. However, I did not add support for constantInput, as we currently only need to use rawInputType and resultType. Adding constantInput would require introducing new variables in the AggregateFunctionFactory, and I believe we can expand it when we actually need it.

We currently need this capability to implement the decimal average aggregate function in Spark. Could you help to take a look. cc @rui-mo.

Sure, let me take a look.

This was the second prototype after getting @mbasmanova's suggestions: #9775. It looks like this PR has the same design 👍.

Copy link
Contributor

@kagamiori kagamiori left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @liujiayi771, this looks mostly good to me except a few nits. Thank you for extending the simple UDAF interface to improve its usability!

TypePtr resultType)
: Aggregate(std::move(resultType)), fn_{std::make_unique<FUNC>()} {
if constexpr (support_initialize_) {
FUNC::initialize(fn_.get(), step, argTypes, resultType_);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: Just fn_->initialize(step, argTypes, resultType_);

@@ -103,6 +110,7 @@ class SimpleAggregateAdapter : public Aggregate {
// These functions are called on groups of both non-null and null
// accumulators. These functions also return a bool indicating whether the
// current group should be a NULL in the result vector.
std::unique_ptr<FUNC> fn_;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: Move the member variable to the bottom of the class.

using IntermediateType = Row<int64_t, double>;
using OutputType = double;

TypePtr inputType_;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: Add a comment above to say explicitly that these are the function level variable we're going to test, and explain that they are set during the creation of the aggregation function and checked in addInput().

Comment on lines 511 to 515
resultType,
name,
valueType);
valueType,
step,
argTypes);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: reorder the arguments to be name, step, argType, resultType, valueType to be more consistent with the lambda's parameter list.

Comment on lines 46 to 50
const TypePtr& inputType,
const TypePtr& sumType,
const TypePtr& resultType) {
const TypePtr& resultType,
core::AggregationNode::Step step,
const std::vector<TypePtr>& argTypes) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: Reorder the parameter list to be step, argTypes, resultType, inputType, sumType.

Comment on lines 164 to 165
TypePtr inputType;
TypePtr resultType;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: inputType -> inputType_, resultType -> resultType_

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
CLA Signed This label is managed by the Facebook bot. Authors need to sign the CLA before a PR can be reviewed.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants