Replies: 2 comments 3 replies
-
Seems reasonable and should be very easy to implement. @kmedved wanna give it a shot? :P
3 replies
-
This has been released with v0.3.12
0 replies
-
This is sort of a wishlist thought, but it may be convenient for ngboost to adopt the early stopping behavior from scikit-learn's `HistGradientBoostingRegressor`, as opposed to the current behavior, which tracks LightGBM/XGBoost/CatBoost. This would be extremely convenient for hyperparameter tuning.

**Current behavior.** To summarize, LightGBM/XGBoost/CatBoost allow the user to pass a validation set into the `.fit()` call, which, when paired with `early_stopping_rounds`, allows the user to tune the number of boosting rounds efficiently. ngboost currently has the same behavior.
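A minimal sketch of that current workflow (the `X_val`/`Y_val`/`early_stopping_rounds` keyword names follow the ngboost README; check your installed version):

```python
# Current ngboost workflow: the user splits off a validation set by hand and
# passes it to .fit() together with early_stopping_rounds.
from ngboost import NGBRegressor
from sklearn.datasets import load_diabetes
from sklearn.model_selection import train_test_split

X, y = load_diabetes(return_X_y=True)
X_train, X_val, y_train, y_val = train_test_split(X, y, test_size=0.2, random_state=0)

ngb = NGBRegressor(n_estimators=500)
ngb.fit(X_train, y_train, X_val=X_val, Y_val=y_val,
        early_stopping_rounds=50)  # stop if val loss hasn't improved in 50 rounds
```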
**Alternate behavior.** scikit-learn recently rolled out `HistGradientBoostingRegressor`, which is a similar boosting algorithm but has slightly different behavior for early stopping. Rather than asking the user to pass a validation set, `HistGradientBoostingRegressor` creates its own validation set from the X/y data passed into the `.fit()` call, based on the `validation_fraction` parameter, allowing the user to do early stopping with a simple `.fit(X, y, sample_weight)` call.
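For comparison, a sketch of that scikit-learn behavior (the experimental import is only needed on scikit-learn < 1.0, and `early_stopping=True` requires >= 0.23; on earlier versions early stopping is driven by `n_iter_no_change` alone):

```python
# scikit-learn's approach: no explicit validation set. The estimator holds out
# validation_fraction of the training data internally and monitors it.
from sklearn.experimental import enable_hist_gradient_boosting  # noqa: sklearn < 1.0 only
from sklearn.ensemble import HistGradientBoostingRegressor
from sklearn.datasets import load_diabetes

X, y = load_diabetes(return_X_y=True)

est = HistGradientBoostingRegressor(
    max_iter=1000,            # upper bound on boosting iterations
    early_stopping=True,      # opt in explicitly (sklearn >= 0.23)
    validation_fraction=0.1,  # 10% of the data passed to .fit() is held out
    n_iter_no_change=10,      # patience, in iterations
)
est.fit(X, y)                 # early stopping happens inside a plain fit call
print(est.n_iter_)            # iterations actually used
```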
**Why.** This functionality makes it possible to use early stopping with `RandomizedSearchCV`/`GridSearchCV`/`cross_val_score`. Presently, you can pass an ngboost estimator object to those searchers/scorers, but there's no way to specify a validation set for the `early_stopping_rounds` parameter, making it impractical to use these methods for hyperparameter searches with ngboost. You can sort of get around this by passing a `fit_params` parameter that specifies `early_stopping_rounds` and a validation set, but as far as I can tell, this results in using the same validation set for every cross-validation fold, which is less than ideal by itself.

This is more than an abstract API convenience. The core advantage of letting users use early stopping without passing a validation set to `RandomizedSearchCV`/`GridSearchCV`/`cross_val_score` is that those tools make it trivial to run a multiprocess hyperparameter search via the built-in `n_jobs` parameter. Given the single-core nature of ngboost, this would yield a proportional increase in search speed (e.g., with 8 cores, you can search 8x faster).
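A hedged sketch of what that would enable. The `validation_fraction`/`early_stopping_rounds` constructor arguments on `NGBRegressor` are the behavior being requested here, not the API at the time of writing (per the reply above, something along these lines shipped in v0.3.12); the rest is standard scikit-learn:

```python
# Hypothetical: if early stopping were self-contained in the estimator, every
# CV fold would get its own internal validation split, and the search could
# fan out one single-core ngboost fit per core via n_jobs.
from ngboost import NGBRegressor
from sklearn.datasets import load_diabetes
from sklearn.model_selection import RandomizedSearchCV

X, y = load_diabetes(return_X_y=True)

search = RandomizedSearchCV(
    NGBRegressor(
        n_estimators=2000,         # generous upper bound; early stopping trims it
        validation_fraction=0.1,   # requested: internal hold-out, as in sklearn
        early_stopping_rounds=50,  # requested: patience, in boosting rounds
    ),
    param_distributions={
        "learning_rate": [0.01, 0.05, 0.1],
        "minibatch_frac": [0.5, 1.0],
    },
    n_iter=6,
    n_jobs=-1,       # parallelize across all cores
    random_state=0,
)
search.fit(X, y)     # no fit_params, no shared validation set across folds
```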
Here is some background discussion on this issue at scikit-learn, and at LightGBM, discussing the differences in the API and the pros/cons of each approach. And here's a discussion of the issue with using `GridSearchCV` with the current behavior in XGBoost.