paid work- optimise NGBoost #298

Open
Geethen opened this issue Sep 28, 2022 · 7 comments

Comments

@Geethen

Geethen commented Sep 28, 2022

The company I work for is looking for someone to reduce the time NGBoost takes to train and run inference.
We are willing to pay someone to refactor the algorithm (rates are negotiable). All work done will be made openly available. Depending on the speedups achieved, a research paper could stem from your work (if that is something you are interested in).

Deliverable: A version of NGBoost that runs at speeds close to LightGBM's (or faster) by leveraging Numba, C++, GPU acceleration, or any other relevant acceleration approach (perhaps incorporating LightGBM techniques).

Link to the company website:

If you are interested and have the time available, please email me to discuss options, timelines, and payment: [email protected]

@StatMixedML

@Geethen You might be interested in LightGBMLSS, which is an extension of LightGBM to probabilistic forecasting.

@alejandroschuler
Collaborator

Also, if all you want are prediction intervals and you don't need the full conditional density, you should use conformal inference instead: https://cdsamii.github.io/cds-demos/conformal/conformal-tutorial.html.
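
To make the suggestion concrete, here is a minimal sketch of split conformal prediction in plain Python. It is an illustration under stated assumptions, not code from the linked tutorial: `predict` is a hypothetical stand-in for any fitted point regressor, the data are synthetic, and `conformal_halfwidth` is a name made up for this example.

```python
import math
import random

def conformal_halfwidth(abs_residuals, alpha=0.05):
    """Half-width of a split-conformal interval from calibration residuals.

    Uses the finite-sample rank ceil((n + 1) * (1 - alpha)) among the
    sorted absolute residuals of a held-out calibration set.
    """
    scores = sorted(abs_residuals)
    n = len(scores)
    k = math.ceil((n + 1) * (1 - alpha))
    if k > n:
        # Too few calibration points to certify this coverage level.
        return math.inf
    return scores[k - 1]

def predict(x):
    # Hypothetical point predictor; in practice this is any fitted model.
    return 2.0 * x

rng = random.Random(0)

def sample(n):
    # Synthetic data (assumption): y = 2x + standard normal noise.
    xs = [rng.uniform(0, 10) for _ in range(n)]
    return [(x, 2.0 * x + rng.gauss(0, 1)) for x in xs]

calibration = sample(500)  # held out from model fitting
test = sample(2000)

q = conformal_halfwidth([abs(y - predict(x)) for x, y in calibration])

# The interval for a new x is [predict(x) - q, predict(x) + q].
covered = sum(predict(x) - q <= y <= predict(x) + q for x, y in test)
coverage = covered / len(test)
print(f"half-width={q:.3f}, empirical coverage={coverage:.3f}")
```

The interval has the same width at every `x`, which is the price of not modelling the conditional density; the coverage guarantee is marginal over `x` and relies only on exchangeability of the calibration and test points.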

@Geethen
Author

Geethen commented Oct 6, 2022

@StatMixedML Thank you very much, this seems very useful. I will give it a test drive.

@alejandroschuler I was interested in NGBoost because it has been significantly outperforming LightGBM, XGBoost, etc., on the regression problems I am working on. Nevertheless, thank you for your suggestion.

@astrogilda

> Also, if all you want are prediction intervals and you don't need the full conditional density you should use conformal inference instead: https://cdsamii.github.io/cds-demos/conformal/conformal-tutorial.html.

Hey @alejandroschuler ! What do you mean by the full conditional density?

@alejandroschuler
Collaborator

alejandroschuler commented Oct 6, 2022

> @StatMixedML Thank you very much- this seems very useful. I will give it a test drive.
>
> @alejandroschuler - I was interested in NGBoost because it has been significantly outperforming LightGBM, XGBoost, etc, on the regression problems I am working on. Nevertheless, thank you for your suggestion.

That seems suspicious... If you are just doing standard regression point prediction of the response (i.e. you don't care about the distribution of $Y|X$), then NGBoost should pretty much never outperform other boosting algorithms. Match them, sure. Beat them now and again by a small margin, maybe. But significantly outperform... pretty much never.

@alejandroschuler
Collaborator

alejandroschuler commented Oct 6, 2022

> > Also, if all you want are prediction intervals and you don't need the full conditional density you should use conformal inference instead: https://cdsamii.github.io/cds-demos/conformal/conformal-tutorial.html.
>
> Hey @alejandroschuler ! What do you mean by the full conditional density?

The data are assumed to be draws from some unknown distribution $(X_i, Y_i) \overset{IID}{\sim} \mathcal D$. Therefore at each value of the covariates $X=x$ you can define the conditional distribution of $Y|X=x$. This is a (continuous) random variable, so it has a density as a function of $y$. NGBoost estimates this conditional density $p_{Y|X}(y \mid x)$. From a trained model, you can evaluate it with `np.exp(ngb.pred_dist(x).logpdf(y))` (note that `pred_dist` returns the predicted distribution, whereas `predict` returns only point predictions).

The conditional density fully describes the conditional distribution, so you can build a prediction interval from it (i.e. for each $x$ of interest, find a region $\mathcal Y$ of values $y$ such that $P\{Y \in \mathcal Y \mid X=x\} = 0.95$).
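
As a minimal sketch of that construction, assume the predicted conditional distribution at some $x$ is Normal (NGBoost's default for regression) with location and scale parameters already in hand. The values `mu=2.0` and `sigma=0.5` are made up for illustration, and `central_interval` is a hypothetical helper, not part of NGBoost; only the standard library is used.

```python
from statistics import NormalDist

def central_interval(mu, sigma, coverage=0.95):
    """Equal-tailed region [lo, hi] with P(lo <= Y <= hi | X=x) = coverage,
    assuming Y | X=x is Normal(mu, sigma)."""
    dist = NormalDist(mu, sigma)
    alpha = 1.0 - coverage
    return dist.inv_cdf(alpha / 2), dist.inv_cdf(1 - alpha / 2)

# Hypothetical predicted parameters for one x of interest.
mu, sigma = 2.0, 0.5
cond = NormalDist(mu, sigma)

# The conditional density evaluated at a candidate response value y.
density_at_mean = cond.pdf(mu)

# A 95% prediction interval read off the same distribution.
lo, hi = central_interval(mu, sigma, coverage=0.95)
mass = cond.cdf(hi) - cond.cdf(lo)
print(f"p(y=mu | x)={density_at_mean:.4f}, interval=({lo:.3f}, {hi:.3f}), mass={mass:.3f}")
```

The equal-tailed interval is only one choice of region; any set with conditional probability 0.95 satisfies the definition, and for a symmetric unimodal density the equal-tailed one is also the shortest.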

@valeman

valeman commented Jan 6, 2023

Conformal Predictive Distributions estimate the full conditional density. Tutorials and more materials are available here: https://github.com/valeman/awesome-conformal-prediction

https://proceedings.mlr.press/v91/vovk18a.html

https://www.youtube.com/watch?v=FUi5jklGvvo&t=3s

The crepes library in Python, based on Conformal Predictive Distributions, builds the complete CDF for each test object while providing guarantees that the density is well calibrated and valid, including being located in the right place: https://github.com/henrikbostrom/crepes
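
For intuition only, here is a simplified sketch of a split conformal predictive distribution in plain Python. It is not the crepes API: `split_cpd_cdf` is a hypothetical name, the calibration residuals are toy numbers, and the formula is the split-conformal form with signed residuals $y_i - \hat y(x_i)$ as the conformity score, with `tau` as the tie-breaking value in $[0, 1]$.

```python
def split_cpd_cdf(y, y_hat, cal_residuals, tau=0.5):
    """Predictive CDF value at y for a test point with point prediction y_hat.

    cal_residuals are signed residuals (y_i - y_hat_i) on a calibration set.
    Returns (#{r_i < y - y_hat} + tau * #{r_i == y - y_hat} + tau) / (n + 1).
    """
    score = y - y_hat
    n_less = sum(r < score for r in cal_residuals)
    n_equal = sum(r == score for r in cal_residuals)
    return (n_less + tau * n_equal + tau) / (len(cal_residuals) + 1)

# Toy calibration residuals (assumption, for illustration only).
cal_residuals = [-1.0, 0.0, 1.0]
y_hat = 5.0  # hypothetical point prediction for the test object

grid = [3.0, 4.5, 5.0, 5.5, 7.0]
cdf_values = [split_cpd_cdf(y, y_hat, cal_residuals) for y in grid]
print(list(zip(grid, cdf_values)))
```

The resulting function of `y` is a non-decreasing step function shifted to sit around the point prediction, which is how the point predictor determines where the distribution is located.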


5 participants