Interest in Implementing Cross Validation? #8

Open
M-Harrington opened this issue Jan 30, 2019 · 7 comments

Comments

@M-Harrington

Hi, is there any interest in implementing cross validation for estimating missing values in the package? I have a problem where time steps are missing at random throughout my time series, and because of the number of time series I have to estimate the NAs for, it made the most sense to perform parameter selection automatically through cross validation.

Anyway, long story short, I've written methods for K-fold and Monte Carlo cross validation that work with STLplus, and I've also written a function to perform a grid search, not so different from the one in scikit-learn in Python. If you're interested I can paste the code here, or if it's not really appropriate for the package itself I can submit my code as an answer to a self-made question on Stack Exchange.

Best,
Matt

PS: the function hasn't been fully optimized or user-proofed yet, but I figured I'd do that only if other people would actually use it.
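
For readers following along, here is a minimal base-R sketch of the two hold-out schemes mentioned above. It is not the code from this thread. The function names (`score_holdout`, `kfold_cv`, `mc_cv`, `linear_impute`) are hypothetical, and linear interpolation stands in for the STL fit so that the example runs without extra packages.

```r
# Hide some observed points, impute them, and score against the hidden truth.
score_holdout <- function(y, holdout, impute_fun, ...) {
  masked <- y
  masked[holdout] <- NA
  fitted <- impute_fun(masked, ...)
  sqrt(mean((fitted[holdout] - y[holdout])^2, na.rm = TRUE))  # RMSE
}

# K-fold: every observed point is held out exactly once.
kfold_cv <- function(y, impute_fun, k = 5, ...) {
  observed <- which(!is.na(y))
  shuffled <- observed[sample.int(length(observed))]
  folds <- split(shuffled, rep_len(seq_len(k), length(shuffled)))
  scores <- vapply(folds, function(idx) score_holdout(y, idx, impute_fun, ...),
                   numeric(1))
  mean(scores)
}

# Monte Carlo: repeatedly hold out a random fraction of the observed points.
mc_cv <- function(y, impute_fun, n_reps = 20, frac = 0.1, ...) {
  observed <- which(!is.na(y))
  n_hold <- max(1L, floor(frac * length(observed)))
  scores <- vapply(seq_len(n_reps), function(i) {
    idx <- observed[sample.int(length(observed), n_hold)]
    score_holdout(y, idx, impute_fun, ...)
  }, numeric(1))
  mean(scores)
}

# Stand-in imputer (an assumption for this sketch): linear interpolation.
# With stlplus installed, a function that fits stlplus(y, ...) and returns
# seasonal(fit) + trend(fit) could be passed instead (not run here).
linear_impute <- function(y) {
  t <- seq_along(y)
  ok <- !is.na(y)
  approx(t[ok], y[ok], xout = t, rule = 2)$y
}

# Example: a monthly-looking series with a few genuinely missing values.
set.seed(1)
y <- sin(2 * pi * (1:120) / 12) + (1:120) / 50
y[c(5, 40, 77)] <- NA
kfold_score <- kfold_cv(y, linear_impute, k = 5)
mc_score <- mc_cv(y, linear_impute, n_reps = 10, frac = 0.1)
```

Both functions return an average RMSE over the held-out points, so a lower score suggests a parameter set that reconstructs hidden values better. Only points that were actually observed are ever held out, since the true value is needed for scoring.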

@M-Harrington
Author

I implemented a solution and host it on my GitHub under the name STLinterp: https://github.com/m-harrington/stlinterp

@hafen
Owner

hafen commented Nov 23, 2019

This is great! Thanks for sharing. I think it would be great as part of the package. Before it goes in, a few additional things should be put in place, such as validating the grid argument. If you are interested in polishing it up as an exported function of the package, I'd happily accept a PR and add you as a contributor.

@M-Harrington
Author

Hi @hafen, glad you got back to me! I'm more than happy to clean up the function and provide a bit more functionality. I'll make some first passes at cleaning it up, but any guidance after that would be great, because I haven't really done any proper R development work and I'm bound to make mistakes.

Thanks, and I'll let you know when I've done some of those first-order corrections!

@M-Harrington
Author

Hi @hafen, I added the option to return either the best parameter set of the grid or the entire grid with its scores. I also tweaked the Monte Carlo method to be a little easier to use. The biggest change was wrapping STLplus in a tryCatch, because I noticed some parameter sets could be a bit finicky. Let me know what you think and whether you have any other requested changes!
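
A rough sketch of the pattern described above (a grid search that survives failing parameter sets and returns either the best row or the whole scored grid) might look like this in base R. It is not the STLinterp implementation. `grid_search`, `loess_holdout_score`, and the `return_all` argument are hypothetical names, and `loess` stands in for the STL fit so that the example runs without extra packages.

```r
# Score every row of a parameter grid; a failing fit yields NA, not an error.
grid_search <- function(y, grid, score_fun, return_all = FALSE) {
  stopifnot(is.data.frame(grid), nrow(grid) > 0)  # basic grid validation
  grid$score <- vapply(seq_len(nrow(grid)), function(i) {
    params <- as.list(grid[i, , drop = FALSE])
    tryCatch(
      do.call(score_fun, c(list(y), params)),
      error = function(e) NA_real_
    )
  }, numeric(1))
  if (return_all) return(grid)
  if (all(is.na(grid$score))) stop("every parameter set failed")
  grid[which.min(grid$score), , drop = FALSE]
}

# Stand-in scorer (an assumption for this sketch): fit loess on the points
# that are not held out and return the RMSE on a fixed hold-out set.
loess_holdout_score <- function(y, span, degree) {
  t <- seq_along(y)
  observed <- which(!is.na(y))
  holdout <- observed[seq(3, length(observed), by = 5)]
  train <- data.frame(t = t[-holdout], y = y[-holdout])
  fit <- loess(y ~ t, data = train, span = span, degree = degree)
  pred <- predict(fit, newdata = data.frame(t = t[holdout]))
  sqrt(mean((pred - y[holdout])^2, na.rm = TRUE))
}

# Example data and grid.
y <- sin(2 * pi * (1:120) / 12) + (1:120) / 50
y[c(5, 40, 77)] <- NA
grid <- expand.grid(span = c(0.2, 0.5, 0.8), degree = c(1, 2))

best <- grid_search(y, grid, loess_holdout_score)
all_scores <- grid_search(y, grid, loess_holdout_score, return_all = TRUE)
```

Returning `NA` for a failed fit keeps one bad parameter combination from aborting the whole search, and `which.min` skips those rows when picking the best set.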

@M-Harrington
Author

M-Harrington commented Oct 19, 2020

@hafen the function is pretty much ready, and I can make the pull request whenever you like. I've also written a brief tutorial on my website explaining how to use it and how to use STLplus to estimate missing values.

https://www.mattrharrington.com/post/fill-in-missing-cyclical-data-using-seasonal-trend-loess-and-cross-validation

Also do you have any advice for preparing documentation?

@hafen
Owner

hafen commented Oct 19, 2020

@M-Harrington great! To be ready to drop in, can you document the functions in your script as described here: https://r-pkgs.org/man.html. Specifically, section 10.4 should be useful. Basically, if you can give each function a description, document its parameters, and provide examples where necessary, that would be great.
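
As an illustration of the layout that chapter describes (roxygen2 comments placed directly above the function), a documented helper might look like the sketch below. The function `holdout_rmse` is hypothetical and exists only to show a description, `@param` entries, a `@return` entry, and an example.

```r
#' Root mean squared error on held-out points
#'
#' Compares imputed values with the true values at the held-out positions.
#' (Hypothetical helper, shown only to illustrate the comment layout.)
#'
#' @param truth Numeric vector of observed values.
#' @param imputed Numeric vector of imputed values, same length as truth.
#' @param holdout Integer vector of positions hidden before imputing.
#'
#' @return A single non-negative number; NA values are ignored.
#'
#' @examples
#' holdout_rmse(c(1, 2, 3, 4), c(1, 2.5, 3, 3), holdout = c(2, 4))
#'
#' @export
holdout_rmse <- function(truth, imputed, holdout) {
  stopifnot(length(truth) == length(imputed))
  sqrt(mean((imputed[holdout] - truth[holdout])^2, na.rm = TRUE))
}

example_rmse <- holdout_rmse(c(1, 2, 3, 4), c(1, 2.5, 3, 3), holdout = c(2, 4))
```

Internal helpers that are not meant to be called by users can carry the same comments without the `@export` tag.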

@M-Harrington
Author

@hafen OK, I've added the documentation to the best of my ability for the main function, and everything looks mostly in order on my end. I wasn't sure what to do with the subroutines that aren't meant to be called directly, so I've left them mostly undocumented, but I'm happy to change that if you'd like. You're welcome to check out the changes in STLinterp.R on my GitHub, or you can just add me as a collaborator and I'll start the pull request.

I did have one quick question, just to make sure before I submit this: how is the fc component factored into the prediction? Previously I was assuming that everything was captured in the seasonal and trend components, but should I be adding or averaging the fc component as well? That is, instead of reconstruction <- seasonal(stlobj) + trend(stlobj), should it be something like reconstruction <- seasonal(stlobj) + trend(stlobj) + fc(stlobj)?
