Automation refactor with Jax #217
-
Would moving to Jax mean the end of native Windows support? Not that that should be a deal breaker, but Jax currently only works on Windows via WSL, so I'm curious what that would mean.
-
The "simple" implementation is extremely satisfying, but I think in most cases we'd want a "fast" implementation anyway (I'd be curious what the performance differences are, in any case). And Jax's lack of Windows compatibility is a bummer, but I'm not sure whether that should be a deal breaker. I'm a fan of the refactor of internal vs. user-facing parameters. I'm not very familiar with Scipy's basin hopping algorithm; what's the advantage of this optimizer over what we currently have, i.e. calling ...? Lastly, I totally agree on the general-purpose sampling algorithm; it would be a neat nice-to-have!
-
I think this is generally a good idea, and I think JAX makes sense. I like that it's a simple transition from standard NumPy, and if this allows for easier development of the additional distributions, which is currently pretty high-touch for the maintainers, and brings more autonomy to the PR process, I think that would be a big win. @alejandroschuler have you done any tests to see the impact on run time from this change?
-
Alright, I think I've settled on a more-or-less stable configuration for things. The main difference is that "score implementations" (now "manifolds") are now subclasses of the distribution object, i.e. they extend both it and the parent score. Previously there was a dynamic mix-in that subclassed the score implementation class with the distribution class, but this redesign simplifies things. Again, you can compare the fast normal implementation with the equivalent simple version. Doing this also allowed me to do all of the automated "method building" in the score parent class (i.e. ...).

The graph shows that as long as ... You'll also notice that the scores and derivatives are now split between ...

The ability to automatically support censored outcome data is very nice, but it does complicate things a bit. However, developers are completely free to ignore this. If they simply name their manifold score method ...
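To make the new class relationship concrete, here's a minimal sketch of how I read the design; the class names (`LogScore`, `Normal`, `NormalLogScore`) and the `score` signature are placeholders, not the actual code in the branch.

```python
from jax.scipy.stats import norm

class LogScore:
    """Parent score class: home of the automated "method building"
    (e.g. deriving gradients and approximations via jax)."""

class Normal:
    """User-facing distribution, parametrized by loc and scale."""
    def __init__(self, loc, scale):
        self.loc, self.scale = loc, scale

class NormalLogScore(Normal, LogScore):
    """A "manifold": a subclass of the distribution that also extends the
    parent score, replacing the old dynamic mix-in built at runtime."""
    def score(self, Y):
        return -norm.logpdf(Y, self.loc, self.scale)
```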
-
I think jax would be a great way to go. I had a random thought earlier which might be a workaround to avoid extra dependencies and also potentially allow sympy and jax to work without having to choose one. I imagine other people have already thought about all of this; I'm saying it just in case! When adding these extra features, just make jax an optional dependency. If you run, say, ...
I think it would be quite a nice solution, as it keeps the main bulk of ngboost functionality without requiring a larger dependency list. Poetry also has support for it.
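On the code side, one way to wire that up (a sketch only; the function names and signatures here are hypothetical, not ngboost's API) is to guard the import and fall back to hand-written methods when jax isn't installed:

```python
# Hypothetical sketch: treat jax as an optional extra at import time.
try:
    import jax
    HAS_JAX = True
except ImportError:
    HAS_JAX = False

def d_log_score(dist, params, Y):
    """Use autodiff when jax is available, otherwise fall back to a
    hand-written derivative on the distribution class (hypothetical names)."""
    if HAS_JAX:
        return jax.grad(dist.log_score)(params, Y)
    if hasattr(dist, "d_log_score"):
        return dist.d_log_score(params, Y)
    raise ImportError(
        "jax is required to auto-derive gradients; install the optional "
        "extra or implement d_log_score by hand."
    )
```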
-
I've been working on refactoring the distributions backend to use jax. The advantage of this is that adding new distributions to ngboost becomes much, much easier at the cost of a little bit of speed. The speed can always be regained by adding the necessary methods, but if the developer does not know how to implement these, their class will still work.
The idea is illustrated by comparing two implementations of the normal distribution: a "simple" implementation and a "fast" implementation. In the "simple" implementation, there is no score implementation and the distribution class has only 3 methods. ngboost uses jax and the provided `cdf` and `sample` methods from the distribution to automatically derive/approximate everything that it needs to use the distribution with the log score. In the "fast" implementation, I've added a distribution-specific implementation of the log score and some other methods to the distribution that increase speed and numerical stability.
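Roughly, a "simple" distribution might look something like the sketch below (hypothetical class and attribute names; the real interface on the branch may differ), where `cdf` and `sample` are all the developer supplies and jax derives the rest:

```python
import jax
import jax.numpy as jnp
from jax.scipy.special import ndtr  # standard normal CDF

class SimpleNormal:
    """Hypothetical "simple" normal: just enough surface area for ngboost
    to derive/approximate the log score and its gradients via jax."""
    parameter_names = ["loc", "scale"]

    def __init__(self, loc, scale):
        self.loc, self.scale = loc, scale

    def cdf(self, Y):
        return ndtr((Y - self.loc) / self.scale)

    def sample(self, key, n):
        return self.loc + self.scale * jax.random.normal(key, (n,))
```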
I've also simplified (from the user's perspective) the way that ngboost differentiates between internal parameters and user-facing parameters (e.g. log-sigma vs. sigma in the normal). The transformations from an interval to the reals and their inverses are now automatic and hidden from the user. The goal is that users can implement methods that work on the user-facing parametrization and ngboost will automatically compose these with the transformation to get methods that work with the internal parametrization (this is not yet implemented for the `d_score` method, which requires the chain rule as well as composition). The cost of this is some potential confusion when implementing fast distribution-specific scores, as one must now work with and know what the internal parametrization is. But users who are implementing scores should be savvy enough to understand this without too much difficulty, whereas users who just want to try a new distribution and get it to work are often confused by the notion, so hiding it should be, on the whole, a good thing. I've tried to make the distinction in parametrization clear in the code with some naming conventions (i.e. methods and variables starting with `_` work with or represent the internal parametrization).
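To illustrate the composition (a sketch of the idea only, not the actual transformation machinery on the branch), suppose the internal parameter for sigma is its log; a method written in terms of the user-facing `scale` can be composed with the inverse transform, and jax's autodiff then provides the gradient in the internal parametrization:

```python
import jax.numpy as jnp
from jax import grad

def _to_user(_params):
    """Inverse transform: internal (unconstrained) params -> user-facing params."""
    loc, _log_scale = _params
    return loc, jnp.exp(_log_scale)

def user_log_likelihood(params, Y):
    """Written by the user in the natural (loc, scale) parametrization."""
    loc, scale = params
    return jnp.sum(
        -0.5 * ((Y - loc) / scale) ** 2 - jnp.log(scale) - 0.5 * jnp.log(2 * jnp.pi)
    )

def _internal_log_likelihood(_params, Y):
    """What ngboost could build automatically: user method composed with
    the inverse transform."""
    return user_log_likelihood(_to_user(_params), Y)

# Gradient in the internal parametrization, chain rule handled by autodiff:
_d_score = grad(_internal_log_likelihood)
```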
I've also provided an automatic `_fit_marginal` method that uses gradient descent and basin hopping. The only issue with it at the moment is that the gradient can be numerically weird if the initial guess is off, and I don't have a good distribution-agnostic way of setting what that guess should be.
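For reference, the shape of such a fit might be roughly as follows (a sketch under assumed signatures; `neg_log_likelihood`, `n_params`, and the zero initial guess are placeholders, not the code in the branch):

```python
import numpy as np
import jax.numpy as jnp
from jax import grad, jit
from scipy.optimize import basinhopping

def _fit_marginal(neg_log_likelihood, Y, n_params, x0=None):
    """Fit internal parameters to the marginal distribution of Y using
    scipy's basin hopping with jax-supplied gradients for the local steps."""
    nll = jit(lambda p: neg_log_likelihood(p, Y))
    d_nll = jit(grad(lambda p: neg_log_likelihood(p, Y)))
    if x0 is None:
        x0 = np.zeros(n_params)  # distribution-agnostic guess that can misbehave
    result = basinhopping(
        lambda p: float(nll(jnp.asarray(p))),
        x0,
        minimizer_kwargs={"jac": lambda p: np.asarray(d_nll(jnp.asarray(p)))},
    )
    return result.x
```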
It would be excellent if we could implement a fast-enough general-purpose sampling algorithm (inverse transform sampling, maybe?) that could effectively "write" the `sample` method from the `cdf` or `pdf` methods.
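For what it's worth, a generic inverse-transform sampler could be built by inverting the `cdf` with a fixed number of bisection steps. This is just a sketch; the bounds and iteration count are arbitrary placeholders and would need care for heavy-tailed or bounded distributions:

```python
import jax
import jax.numpy as jnp

def make_sample_from_cdf(cdf, lower=-1e6, upper=1e6, n_iter=60):
    """Build a sample(key, shape) function from a vectorized cdf by solving
    cdf(y) = u for uniform u with bisection (placeholder bounds)."""
    def sample(key, shape):
        u = jax.random.uniform(key, shape)
        lo = jnp.full(shape, lower)
        hi = jnp.full(shape, upper)

        def body(_, bounds):
            lo, hi = bounds
            mid = 0.5 * (lo + hi)
            go_right = cdf(mid) < u  # the root lies to the right of mid
            return jnp.where(go_right, mid, lo), jnp.where(go_right, hi, mid)

        lo, hi = jax.lax.fori_loop(0, n_iter, body, (lo, hi))
        return 0.5 * (lo + hi)
    return sample
```

Something like `sample = make_sample_from_cdf(dist.cdf)` could then stand in for a hand-written `sample` method when speed isn't critical.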
Again, the purpose of all of this is to lower the implementation barrier for new distributions and to make the overall system flexible enough that users can add as many or as few methods as they want to get the desired speed/complexity trade-off.

@avati @tonyduan @ryan-wolbeck lmk what you think!