Add exp-weighted functions to DuckDB #5139

mauropagano · 2022-10-30T15:15:51Z

Exponentially weighted functions are used frequently in the finance world but they are tricky to implement in SQL.
In the pandas they are ewm().<<function>> (sum, mean, stdev, var) and they are implemented in cython

>>> import pandas as pd
>>> df = pd.DataFrame(dict(a=[1,2,3,4]))
>>> df.assign(s=df.a.ewm(alpha=0.5).sum(), m=df.a.ewm(alpha=0.5).mean())
   a      s         m
0  1  1.000  1.000000
1  2  2.500  1.666667
2  3  4.250  2.428571
3  4  6.125  3.266667

The logic to implement them is not too complicated but it requires carrying state across rows.
A user could implement them using with recursive but that's slow or using a UDAF but that isn't a thing in DuckDB AFAIK.

I asked on Discord if people felt they would be useful to have them in DuckDB and Alex seemed supportive (not sure why tagging him doesn't work), filing this issue as per @hawkfish suggestion.

Is that something you feel should go into DuckDB core (asking here as per Contributing Guidelines)? My biased sense is yes, not many others are offering these functionalities and it seems kind of low-hanging fruit.
Do you feel this is within the level of complexity an outsider can implement? I'll for sure need some help, I only contributed #4259

If yes, I'd volunteer to do the work to add sum_w and mean_w (name tbd), starting with exposing only alpha parameter.
Most other parameters are just ways to implement the same thing, adjust should probably be True most of the times anyway and ignore_na is something a user can deal with using filter conditions.
Also std and var could be in another PR.

PS: I didn't use the standard template as this is not a bug, happy to switch

The text was updated successfully, but these errors were encountered:

hawkfish · 2022-11-03T22:19:41Z

The pandas documentation is here.

This is a right royal pain to do in straight SQL - you have to compute the weights using RANK() and then use them twice in another window - with the same OVER clause - which we won't optimise right now (dependencies between window functions are not supported) so we will resort.

To implement this as a window function using segment trees, you would need the RANK value, which would require passing in the partition mask and somehow hooking that into the aggregation infrastructure. Either that or provide a separate holistic window function that computes it on the fly (not even sure that would work).

I think this will have to wait until we get a chance to refactor the segment trees to be on a function-by-function basis, with a common default for most. This would also facilitate the new merge sort trees approach that Adrian presented at SIGMOD 2022.

mauropagano · 2022-11-03T23:27:47Z

Thanks for taking a look at it Richard!

I'm not sure I follow how the double RANK() solves it though.
For adjust=True the exponent of (1 - alpha) for any row needs to change (increase, lowering the weight) the more you walk down the rows in the window, I'm struggling to see how the double pass enables that.

My naive idea was to have a dedicated aggregate function that carries state across rows.
When used as straight aggregation it would the final row value and when used as window it would report the so-far computed value.
Once is a loop the scaling becomes trivial (e.g. for sum value(x) = (value(x-1) * (1-alpha)) + x).
Same approach already works with with recursive but that's quite slow.

github-actions · 2023-07-29T00:30:38Z

This issue is stale because it has been open 90 days with no activity. Remove stale label or comment or this will be closed in 30 days.

mauropagano · 2023-07-31T14:44:33Z

Unless you guys feel there is a good reason to close this, I'd keep it open.

There is currently still no way to do this efficiently in duckdb, UDAF might be an option when it comes available and assuming performance is good there.

Just to provide a bit more justification to keep this open, the same functionalities exists / has been requested in polars, spark, cudf, dask

github-actions · 2023-10-31T00:30:56Z

This issue is stale because it has been open 90 days with no activity. Remove stale label or comment or this will be closed in 30 days.

mauropagano · 2023-10-31T01:03:31Z

I can't remove the tag myself so I guess I'll comment

jakkes · 2024-01-08T10:10:11Z

Definitely waiting for this feature..!

github-actions · 2024-04-08T00:27:58Z

This issue is stale because it has been open 90 days with no activity. Remove stale label or comment or this will be closed in 30 days.

jakkes · 2024-04-08T05:42:09Z

Bump

hawkfish · 2024-04-08T13:40:14Z

This is a feature, not a bug, so please convert this to a discussion if you want it kept alive. Thx.

mauropagano · 2024-06-18T18:11:09Z

I filed this one, how do I convert it to a discussion?

hawkfish · 2024-06-18T18:17:57Z

I filed this one, how do I convert it to a discussion?

I'm not sure! I did find this document but maybe you need to do it?

mauropagano · 2024-06-18T18:23:52Z

I can't seem to find the button either (it's not on the right side like the doc suggests, probably I don't have the right privs), is it ok if I keep this one open for now or anybody with the right privilege can move it to discussion if needed?

Maxxen added the enhancement label Oct 31, 2022

github-actions bot added the stale label Jul 29, 2023

github-actions bot removed the stale label Aug 1, 2023

github-actions bot added the stale label Oct 31, 2023

github-actions bot removed the stale label Nov 1, 2023

szarnyasg removed the enhancement label Nov 29, 2023

github-actions bot added the stale label Apr 8, 2024

carlopi converted this issue into discussion #12591 Jun 18, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

This issue was moved to a discussion.

Add exp-weighted functions to DuckDB #5139

Add exp-weighted functions to DuckDB #5139

mauropagano commented Oct 30, 2022

hawkfish commented Nov 3, 2022 •

edited

Loading

mauropagano commented Nov 3, 2022

github-actions bot commented Jul 29, 2023

mauropagano commented Jul 31, 2023

github-actions bot commented Oct 31, 2023

mauropagano commented Oct 31, 2023

jakkes commented Jan 8, 2024

github-actions bot commented Apr 8, 2024

jakkes commented Apr 8, 2024

hawkfish commented Apr 8, 2024

mauropagano commented Jun 18, 2024

hawkfish commented Jun 18, 2024

mauropagano commented Jun 18, 2024

This issue was moved to a discussion.

This issue was moved to a discussion.

Add exp-weighted functions to DuckDB #5139

Add exp-weighted functions to DuckDB #5139

Comments

mauropagano commented Oct 30, 2022

hawkfish commented Nov 3, 2022 • edited Loading

mauropagano commented Nov 3, 2022

github-actions bot commented Jul 29, 2023

mauropagano commented Jul 31, 2023

github-actions bot commented Oct 31, 2023

mauropagano commented Oct 31, 2023

jakkes commented Jan 8, 2024

github-actions bot commented Apr 8, 2024

jakkes commented Apr 8, 2024

hawkfish commented Apr 8, 2024

mauropagano commented Jun 18, 2024

hawkfish commented Jun 18, 2024

mauropagano commented Jun 18, 2024

This issue was moved to a discussion.

hawkfish commented Nov 3, 2022 •

edited

Loading