Replies: 2 comments 3 replies
-
Seems reasonable and should be very easy to implement. @kmedved wanna give it a shot? :P
3 replies
-
This has been released with v0.3.12
0 replies
-
This is sort of a wishlist thought, but it may be convenient for ngboost to adopt the early stopping behavior from scikit-learn's `HistGradientBoostingRegressor`, as opposed to the current behavior, which tracks LightGBM/XGBoost/CatBoost. This would be extremely convenient for hyperparameter tuning.

**Current behavior.** To summarize, LightGBM/XGBoost/CatBoost allow the user to pass a validation set into the `.fit()` call, which, when paired with `early_stopping_rounds`, allows the user to tune the number of boosting rounds efficiently. ngboost currently has the same behavior.
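A minimal sketch of that current workflow (the `X_val`/`Y_val`/`early_stopping_rounds` keyword names follow the ngboost README; check your installed version):

```python
# Current ngboost workflow: the user splits off a validation set by hand and
# passes it to .fit() together with early_stopping_rounds.
from ngboost import NGBRegressor
from sklearn.datasets import load_diabetes
from sklearn.model_selection import train_test_split

X, y = load_diabetes(return_X_y=True)
X_train, X_val, y_train, y_val = train_test_split(X, y, test_size=0.2, random_state=0)

ngb = NGBRegressor(n_estimators=500)
ngb.fit(X_train, y_train, X_val=X_val, Y_val=y_val,
        early_stopping_rounds=50)  # stop if val loss hasn't improved in 50 rounds
```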
**Alternate behavior.** scikit-learn recently rolled out `HistGradientBoostingRegressor`, which is a similar boosting algorithm but has slightly different behavior for early stopping. Rather than asking the user to pass a validation set, `HistGradientBoostingRegressor` creates its own validation set from the X/y data passed into the `.fit()` call, based on the `validation_fraction` parameter, allowing the user to do early stopping with a simple `.fit(X, y, sample_weight)` call.
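For comparison, a sketch of that scikit-learn behavior (the experimental import is only needed on scikit-learn < 1.0, and `early_stopping=True` requires >= 0.23; on earlier versions early stopping is driven by `n_iter_no_change` alone):

```python
# scikit-learn's approach: no explicit validation set. The estimator holds out
# validation_fraction of the training data internally and monitors it.
from sklearn.experimental import enable_hist_gradient_boosting  # noqa: sklearn < 1.0 only
from sklearn.ensemble import HistGradientBoostingRegressor
from sklearn.datasets import load_diabetes

X, y = load_diabetes(return_X_y=True)

est = HistGradientBoostingRegressor(
    max_iter=1000,            # upper bound on boosting iterations
    early_stopping=True,      # opt in explicitly (sklearn >= 0.23)
    validation_fraction=0.1,  # 10% of the data passed to .fit() is held out
    n_iter_no_change=10,      # patience, in iterations
)
est.fit(X, y)                 # early stopping happens inside a plain fit call
print(est.n_iter_)            # iterations actually used
```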
**Why.** This functionality makes it possible to use early stopping with `RandomizedSearchCV`/`GridSearchCV`/`cross_val_score`. Presently, you can pass an ngboost estimator object to those searchers/scorers, but there's no way to specify a validation set for the `early_stopping_rounds` parameter, making it impractical to use these methods for hyperparameter searches with ngboost. You can sort of get around this by passing a `fit_params` parameter that specifies `early_stopping_rounds` and a validation set, but as far as I can tell, this results in using the same validation set for every cross-validation fold, which is less than ideal by itself.

This is more than an abstract API convenience. The core advantage of letting users use early stopping without passing a validation set to `RandomizedSearchCV`/`GridSearchCV`/`cross_val_score` is that those tools make it trivial to run a multiprocess hyperparameter search via the built-in `n_jobs` parameter. Given the single-core nature of ngboost, this would yield a proportional increase in search speed (e.g., with 8 cores, you can search 8x faster).
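A hedged sketch of what that would enable. The `validation_fraction`/`early_stopping_rounds` constructor arguments on `NGBRegressor` are the behavior being requested here, not the API at the time of writing (per the reply above, something along these lines shipped in v0.3.12); the rest is standard scikit-learn:

```python
# Hypothetical: if early stopping were self-contained in the estimator, every
# CV fold would get its own internal validation split, and the search could
# fan out one single-core ngboost fit per core via n_jobs.
from ngboost import NGBRegressor
from sklearn.datasets import load_diabetes
from sklearn.model_selection import RandomizedSearchCV

X, y = load_diabetes(return_X_y=True)

search = RandomizedSearchCV(
    NGBRegressor(
        n_estimators=2000,         # generous upper bound; early stopping trims it
        validation_fraction=0.1,   # requested: internal hold-out, as in sklearn
        early_stopping_rounds=50,  # requested: patience, in boosting rounds
    ),
    param_distributions={
        "learning_rate": [0.01, 0.05, 0.1],
        "minibatch_frac": [0.5, 1.0],
    },
    n_iter=6,
    n_jobs=-1,       # parallelize across all cores
    random_state=0,
)
search.fit(X, y)     # no fit_params, no shared validation set across folds
```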
Here is some background discussion on this issue at scikit-learn, and at LightGBM, discussing the differences in the API and the pros/cons of each approach. And here's a discussion of the issue with using `GridSearchCV` with the current behavior in XGBoost.