Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[FEA] need official EWM function #1263

Open
1 of 5 tasks
yidong72 opened this issue Mar 22, 2019 · 23 comments · Fixed by #9027
Open
1 of 5 tasks

[FEA] need official EWM function #1263

yidong72 opened this issue Mar 22, 2019 · 23 comments · Fixed by #9027
Assignees
Labels
feature request New feature or request libcudf Affects libcudf (C++/CUDA) code. Python Affects Python cuDF API.

Comments

@yidong72
Copy link

yidong72 commented Mar 22, 2019

Is your feature request related to a problem? Please describe.
EWM is a very popular method used in time series analysis, especially in the domain of FSI. cuIndicator is using EWM a lot to compute the technical indicators. It is good to have official support in the cuDF.

Describe the solution you'd like
DataFrame.ewm(com=None, span=None, halflife=None, alpha=None, min_periods=0, adjust=True, ignore_na=False, axis=0). The same interface as the Pandas EWM function

Describe alternatives you've considered
cuIndicator has the implementation that is based on rolling window methods. cuIndicator EWM.

Additional context
EWM can be implemented by prefix-sum method if we weight the past carefully. I have the example implementation for it.

  • mean
  • standard deviation
  • variance
  • covariance
  • correlation
@yidong72 yidong72 added Needs Triage Need team to review and classify feature request New feature or request labels Mar 22, 2019
@kkraus14 kkraus14 added Python Affects Python cuDF API. libcudf Affects libcudf (C++/CUDA) code. and removed Needs Triage Need team to review and classify labels Mar 27, 2019
@harrism
Copy link
Member

harrism commented Jul 4, 2019

@yidong72 @randerzander @beckernick What are the specific aggregations needed to implement this on top of the new rolling window functionality?

@rouniuyizu
Copy link

Hello, any plan in merging this or other implementation/s of EWM function? Or any temp fix that I could use for now?
Appreciated.

@harrism
Copy link
Member

harrism commented Jul 2, 2020

@kkraus14 @yidong72 @beckernick need help understanding what is needed from libcudf.

@kkraus14
Copy link
Collaborator

kkraus14 commented Jul 2, 2020

We haven't scoped this function as of yet from the cuDF Python side so we can't guide libcudf as of yet. I don't think this is currently a high priority for us.

@yidong72
Copy link
Author

yidong72 commented Jul 2, 2020

@harrism. The implementation I have is just adding a weight term to the time series items in the rolling window fun. So it should be straight forward implementation on top of rolling window fun

@yidong72
Copy link
Author

We have seen some folks at FSI who are interested in the official EWM function. Check this issue we got from gQuant project. I can fix it in a hacky way but it is nice to have official support from cudf.

NVIDIA/fsi-samples#100

Please increase the priority of this issue.

@beckernick
Copy link
Member

beckernick commented Jul 14, 2021

In pandas, EWM provides various exponential weighted functions including mean, variance, standard deviation, and more. I'm going to update the issue to include a task-list of the various functions.

Exponential weighted mean is the canonical usage, which makes it a good starting point for the next release.

@beckernick beckernick added this to the Time Series Analysis milestone Jul 14, 2021
@brandon-b-miller brandon-b-miller self-assigned this Jul 19, 2021
@brandon-b-miller
Copy link
Contributor

I've scoped this out and there's a couple of design caveats I would like to discuss before proceeding with an implementation.

TL;DR: I am not sure how to do this in a way that is actually performant.

This function in pandas behaves more like a single large window over the entire data than a rolling window function like what is normally envisioned. That is to say that by default, each element of the result is the weighted average of all of those that come before it in the sequence.

The formula for a single result is quite clear from the pandas documentation:

Screen Shot 2021-07-28 at 11 32 12 AM

There's really two straightforward ways of computing this sequence and neither of them seem to really help us very much.

  1. Compute all the ys in parallel (each processing element is responsible for a single y). This doesn't help us because the work is very uneven, with the worst off thread (the last one) still needing to compute coefficients for the entire sequence and then reduce the result into a weighted average.
  2. Generate the results sequentially, keeping track of the numerator and denominator of the equation, and recognizing that each time we advance we need only multiply the numerator by a factor of (1 - α), and multiply the denominator by the same (1 - α) and add 1 to get the next result. This is indeed how pandas makes the calculation tractable.

In the case where we really are using this within a window function, this problem goes away, as long as the window size is small relative to the data size (each thread applies the above sequential algorithm for its window). We could thus implement this on top of rolling technically, but we can't just wrap that functionality with window=len(data) otherwise the performance will be abysmal.

It seems like what is needed here is a truly parallel algorithm that properly balances the work each computing element is doing across the moving average calculation.

@harrism
Copy link
Member

harrism commented Jul 29, 2021

This can be computed efficiently in parallel using two scans (one for the numerator, one for the denominator) and a binop (divide).

@brandon-b-miller
Copy link
Contributor

Unless I am misunderstanding that works for getting one of the datapoints we need (any single one) but not the entire sequence. Each element of the result is the result of dividing two things, but those things are the sums of sequences and those sequences are different for each element in question. Consider the first few denominators d_i and let beta = (1 - alpha):

d_0 = 1
d_1 = 1 + beta
d_2 = 1 + beta + beta ** 2

In general d_n = sum(k = 0:n [beta ** n]) which is a finite series for which I do not think there is a closed form solution in terms of simple arithmetic operations. That said, one can see that

d_1 = beta * (1) + 1 = (beta * d_0) + 1
d_2 = beta * (1 + beta) + 1 = (beta * d_1) + 1

Meaning each successive term is related to the last by

d_this = (beta * d_previous) + 1

Which makes for an efficient serial algorithm for computing these terms without having to actually sum over an entirely new set of numbers. Unfortunately this doesn't seem to help us towards a thrust implementation because if we were trying to do an inclusive scan, we'd have this as our binary_op:

def f(d_previous, d_this):
    return (beta * d_previous) + 1
beta = 0.1
f(1, f(2,3))
# 1.11
f(f(1,2), 3)
# 1.1

I believe this breaks the associativity needed for an inclusive scan.

@cwharris
Copy link
Contributor

Is this the naive implementation, or is this totally wrong?

std::vector<double> input(...);
std::vector<double> output(...);
double alpha = 0.5;
auto running_sum = 0;

for (auto i = 0; i < input.size(); i++) {
  running_sum = input[i] + running_sum * (1 - alpha);
  output[i] = running_sum;
}

@harrism
Copy link
Member

harrism commented Jul 29, 2021

Solving recurrence equations is in Guy Bleloch's classic paper "Prefix Sums and their applications". http://www.cs.cmu.edu/~blelloch/papers/Ble93.pdf (Section 1.4)

The trick is to maintain the intermediates as pairs, rather than as individual values. Let b = 1-alpha. For the numerator, the input is a list of pairs [(b, x_0), (b, x_1), (b, x_2), ...]. The numerator operator (in Python), is:

def f(pair_i, pair_j):
    return (pair_i[0]*pair_j[0], pair_i[1]*pair_j[0] + pair_j[1])

Test input demonstrates associativity:

>>> beta=0.1
>>> a=(beta, 1)
>>> b=(beta, 2)
>>> c=(beta, 3)
>>> f(a, f(b, c))
(0.0010000000000000002, 3.21)
>>> f(f(a, b), c)
(0.0010000000000000002, 3.21)

To get the numerator out of the scan, after performing the inclusive scan, just extract all the second elements of the pairs.

Intuitively, we are propagating the product of the 1-alphas separately from the summation. This is a really simple recurrence, for which Blelloch gives a comprehensive proof and requirements on the operator. He also proves that higher order recurrences can be reduced and implemented with scans the same way, with the appropriate operator.

This paper is required reading, IMO. You will see scans everywhere once you start seeing them. :)

(Note, implementation with Thrust is pretty simple -- just use a zip iterator with a constant iterator (1-alpha) and the input iterator and use a lambda that returns the modified pair as in the Python f.)

@vyasr
Copy link
Contributor

vyasr commented Jul 30, 2021

I stumbled on this paper this morning while Googling "Prefix sums recursion relations" after a few of us met to discuss this problem yesterday. It's so elegant how separating the current power of the prefactor makes the recursion operator associative! Thanks for pointing us in the right direction.

@brandon-b-miller
Copy link
Contributor

thanks @harrism this works perfectly using thrust in my experiments. It's a little hard to for me to tell if this really belongs as a rolling aggregation, should that still be the plan or is there a more appropriate place for this to live inside of libcudf?

@harrism
Copy link
Member

harrism commented Aug 3, 2021

My pleasure. I don't know the answer to your question. Is it different from a rolling aggregation in some way? Does it have finite window extents, or does every element depend on all preceding elements over the entire series? CC @jrhemstad

@brandon-b-miller
Copy link
Contributor

The particular pandas API is the version where every element depends on all the previous ones.

pandas does support a windowed version of this via different API. But I am not sure our version, were we ever to support it, would need to actually parallelize within the windows - at least for small window sizes relative to the data the normal recurrence relation might perform fine on its own within the windows.

@harrism
Copy link
Member

harrism commented Aug 4, 2021

For the one where every element depends on all previous ones, it may be best to add this as an operator to our existing scan API.

The windowed version sounds like rolling. Or could be done as an operator to the segmented scan API.

@Haidow
Copy link

Haidow commented Mar 2, 2023

Dose it i supported in 23.02?

@brandon-b-miller
Copy link
Contributor

Hi @Haidow ,
We've been focusing on some other priorities for a while since we last looked at this, but there's been some more interest recently, so I'll start a conversation about finishing this up in the next few releases.

@felixmccuaig
Copy link

@brandon-b-miller Yes plz!

@cpowr
Copy link

cpowr commented Nov 23, 2023

Thanks @brandon-b-miller, any update on this?

@vyasr
Copy link
Contributor

vyasr commented Jun 24, 2024

#9027 adds the mean aggregation. We'll follow up in additional PRs with the other functions.

@harrism
Copy link
Member

harrism commented Jun 24, 2024

Congrats on finally merging this @brandon-b-miller! Scan FTW!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
feature request New feature or request libcudf Affects libcudf (C++/CUDA) code. Python Affects Python cuDF API.
Projects
Status: In Progress
Development

Successfully merging a pull request may close this issue.