[FEA] Ewm function for finite series #100

Open
jumana51 opened this issue Aug 12, 2020 · 3 comments

jumana51 commented Aug 12, 2020

Is your feature request related to a problem? Please describe.
The pandas EWM function accepts adjust as a parameter. Financial time series are finite, so adjust=False should be used. The discrepancy between gQuant and pandas with adjust=False shows up only in the early data points; later values agree.

For example, for n = 200 rounded to 2 decimal places:
using gQuant:
200 107.26
201 107.28
202 107.30
203 107.31
204 107.30
...
28123 158.06
28124 158.07
28125 158.07
28126 158.07

using pandas adjust=False:
200 107.07
201 107.09
202 107.11
203 107.12
204 107.11
...
28123 158.06
28124 158.07
28125 158.07
28126 158.07
28127 158.07

Describe the solution you'd like
Allow the gQuant version of EWM to accept the adjust parameter and change the calculation to match the formula used by pandas. I am not sure whether the recursive calculation for adjust=False is amenable to the GPU.
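To make the two formulas concrete, here is a minimal CPU sketch of what pandas computes for each setting, checked against `Series.ewm(...).mean()`. The helper names `ewm_recursive` and `ewm_adjusted` are made up for illustration and are not part of gQuant or cudf. The sketch assumes that n = 200 in the example above means `span=200`, and it assumes the input has no missing values. The prices are synthetic.

```python
import numpy as np
import pandas as pd


def ewm_recursive(values, span):
    """adjust=False: y[0] = x[0]; y[t] = (1 - alpha) * y[t-1] + alpha * x[t]."""
    alpha = 2.0 / (span + 1.0)
    out = np.empty(len(values), dtype=np.float64)
    if len(values) == 0:
        return out
    out[0] = values[0]
    for t in range(1, len(values)):
        out[t] = (1.0 - alpha) * out[t - 1] + alpha * values[t]
    return out


def ewm_adjusted(values, span):
    """adjust=True: weighted mean with weights (1 - alpha)**i over past points."""
    alpha = 2.0 / (span + 1.0)
    out = np.empty(len(values), dtype=np.float64)
    num = 0.0  # running sum of (1 - alpha)**i * x[t - i]
    den = 0.0  # running sum of (1 - alpha)**i
    for t, x in enumerate(values):
        num = (1.0 - alpha) * num + x
        den = (1.0 - alpha) * den + 1.0
        out[t] = num / den
    return out


# Synthetic random-walk prices, only for demonstration.
rng = np.random.default_rng(0)
prices = pd.Series(100.0 + rng.standard_normal(2000).cumsum())
span = 200  # assumption: "n = 200" in the issue refers to span

pandas_adjusted = prices.ewm(span=span, adjust=True).mean()
pandas_recursive = prices.ewm(span=span, adjust=False).mean()

manual_adjusted = ewm_adjusted(prices.to_numpy(), span)
manual_recursive = ewm_recursive(prices.to_numpy(), span)

# Largest gap between the two settings over the whole series.
max_gap = float(np.max(np.abs(manual_adjusted - manual_recursive)))
print("max |adjust=True - adjust=False|:", max_gap)
```

Both settings return the first observation unchanged at index 0. They differ afterwards because adjust=False gives the first observation a weight of (1 - alpha)**t instead of alpha * (1 - alpha)**t.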

Describe alternatives you've considered
Convert the cudf series to pandas, use the pandas EWM function and convert back to cudf.
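A rough sketch of that workaround, assuming a cudf Series that fits in host memory. The helper name `ewm_via_pandas` is hypothetical. It relies on `cudf.Series.to_pandas()` and `cudf.Series.from_pandas()` for the round trip and falls back to plain pandas when given a pandas Series, so it can be tried without a GPU.

```python
import pandas as pd


def ewm_via_pandas(series, span, adjust=False):
    """Compute the EWM mean on the host and return the same kind of Series.

    `series` may be a cudf.Series (copied to host memory first) or a
    pandas.Series (used as-is). This helper is illustrative only.
    """
    on_gpu = hasattr(series, "to_pandas")  # cudf objects expose to_pandas()
    host = series.to_pandas() if on_gpu else series
    result = host.ewm(span=span, adjust=adjust).mean()
    if on_gpu:
        import cudf  # imported lazily so the sketch also runs without a GPU
        return cudf.Series.from_pandas(result)
    return result


# CPU-only usage; with cudf installed, pass a cudf.Series instead.
example = pd.Series([1.0, 2.0, 3.0, 4.0])
smoothed = ewm_via_pandas(example, span=3)
print(smoothed.tolist())
```

The cost of this approach is two device-to-host copies per call, which is why a native adjust flag would be preferable for long series.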

Additional context
References:
https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.ewm.html
https://pandas.pydata.org/pandas-docs/stable/user_guide/computation.html#stats-moments-exponentially-weighted

@yidong72
Collaborator

@jumana51, I have an issue open for official EWM function support; see rapidsai/cudf#1263.

I will look into this issue and add an adjust flag.

@jumana51
Author

@yidong72 Thanks.

A general question: many statistics functions were deprecated in pandas and moved to statsmodels. I think it was a good strategy to keep data wrangling in pandas separate from statistical calculations. Should cudf / gQuant keep the same separation? Just my 2 cents.

BTW, I wish I had discovered this library a bit earlier as I ended up writing my own functions while porting from pandas to cudf :|

@yidong72
Collaborator

gQuant organizes the workflow into Task Nodes, where you can implement different statistical calculations. That is how we keep things loosely coupled.

We are currently working on a major gQuant release. You can review some of the tutorials in PR #89.
See this example:
https://github.com/yidong72/gQuant/blob/branch-client/notebooks/01_tutorial.ipynb

Hopefully, gQuant is useful to you.
