-
The package will now warn when parallel processing has been enabled with foreach but not with future. See
?parallelism
to learn more about transitioning your code to future (#878, #866). -
The package will now log a backtrace for errors and warnings that occur during tuning. When a tuning process encounters issues, see the new
trace
column in thecollect_notes(.Last.tune.result)
output to find precisely where the error occurred (#873). -
When automatic grids are used,
dials::grid_space_filling()
is now used (instead ofdials::grid_latin_hypercube()
). Overall, the new function produces optimized designs (not depending on random numbers). When using Bayesian models, we will use a Latin Hypercube since we produce 5,000 candidates, which is too slow to do with pre-optimized designs.
-
Addressed issue in
int_pctl()
where the function would error when parallelized usingmakePSOCKcluster()
(#885). -
Addressed issue where tuning functions would raise the error
object 'iteration' not found
withplan(multisession)
and the control optionparallel_over = "everything"
(#888).
-
tune now fully supports models in the "censored regression" mode. These models can be fit, tuned, and evaluated like the regression and classification modes. tidymodels.org has more information and tutorials on how to work with survival analysis models.
-
Introduced support for parallel processing using the future framework. The tune package previously supported parallelism with foreach, and users can use either framework for now. In a future release, tune will begin the deprecation cycle for parallelism with foreach, so we encourage users to begin migrating their code now. See the Parallel Processing section in the "Optimizations" article to learn more (#866).
-
Added a
type
argument tocollect_metrics()
to indicate the desired output format. The default,type = "long"
, returns output as before, whiletype = "wide"
pivots the output such that each metric has its own column (#839). -
Added a new function,
compute_metrics()
, that allows for computing new metrics after evaluating against resamples. The arguments and output formats are closely related to those fromcollect_metrics()
, but this function requires that the input be generated with the control optionsave_pred = TRUE
and additionally takes ametrics
argument with a metric set for new metrics to compute. This allows for computing new performance metrics without requiring users to re-fit and re-predict from each model (#663). -
A method for rsample's
int_pctl()
function that will compute percentile confidence intervals on performance metrics for objects produced byfit_resamples()
,tune_*()
, andlast_fit()
. -
The Brier score is now part of the default metric set for classification models.
-
last_fit()
will now error when supplied a fitted workflow (#678). -
Fixes bug where
.notes
entries were sorted in the wrong order in tuning results for resampling schemes with IDs that aren't already in alphabetical order (#728). -
Fixes bug where
.config
entries in the.extracts
column intune_bayes()
output didn't align with the entries they ought to in the.metrics
and.predictions
columns (#715). -
Metrics from apparent resamples are no longer included when estimating performance with
estimate_tune_results()
(and thus withcollect_metrics(..., summarize = TRUE)
andcompute_metrics(..., summarize = TRUE)
, #714). -
Handles edge cases for
tune_bayes()
'iter
argument more soundly. Foriter = 0
, the output oftune_bayes()
should matchtune_grid()
, andtune_bayes()
will now error wheniter < 0
.tune_bayes()
will now alter the state of RNG slightly differently, resulting in changed Bayesian optimization search output (#720). -
augment()
methods totune_results
,resample_results
, andlast_fit
objects now always return tibbles (#759).
-
Improved error message when needed packages aren't installed (#727).
-
augment()
methods totune_results
,resample_results
, andlast_fit
objects now always returns tibbles (#759). -
Improves documentation related to the hyperparameters associated with extracted objects that are generated from submodels. See the "Extracting with submodels" section of
?collect_extracts
to learn more. -
eval_time
andeval_time_target
attribute was added to tune objects. There are also.get_tune_eval_times()
and.get_tune_eval_time_target()
functions. -
collect_predictions()
now reorders the columns so that all prediction columns come first (#798). -
augment()
methods totune_results
,resample_results
, andlast_fit
objects now return prediction results in the first columns (#761). -
autoplot()
will now meaningfully error if only 1 grid point is present, rather than producing a plot (#775). -
Added notes on case weight usage to several functions (#805).
-
For iterative optimization routines,
autoplot()
will use integer breaks whentype = "performance"
ortype = "parameters"
.
-
Several functions gained an
eval_time
argument for the evaluation time of dynamic metrics for censored regression. The placement of the argument breaks passing-by-position for one or more other arguments toautoplot.tune_results()
and the developer-focusedcheck_initial()
(#857). -
Ellipses (...) are now used consistently in the package to require optional arguments to be named. For functions that previously had ellipses at the end of the function signature, they have been moved to follow the last argument without a default value: this applies to
augment.tune_results()
,collect_predictions.tune_results()
,collect_metrics.tune_results()
,select_best.tune_results()
,show_best.tune_results()
, and the developer-focusedestimate_tune_results()
,load_pkgs()
, andencode_set()
. Several other functions that previously did not have ellipses in their signatures gained them: this applies toconf_mat_resampled()
and the developer-focusedcheck_workflow()
. Optional arguments previously passed by position will now error informatively prompting them to be named. These changes don't apply in cases when the ellipses are currently in use to forward arguments to other functions (#863).
-
last_fit()
now works with the 3-way validation split objects fromrsample::initial_validation_split()
.last_fit()
andfit_best()
now have a new argumentadd_validation_set
to include or exclude the validation set in the dataset used to fit the model (#701). -
Disambiguates the
verbose
andverbose_iter
control options to better align with documented functionality. The former controls logging for general progress updates, while the latter only does so for the Bayesian search process. (#682)
- Fixed a bug introduced in tune 1.1.0 in
collect_()
functions where the.iter
column was dropped.
tune 1.1.0 introduces a number of new features and bug fixes, accompanied by various optimizations that substantially decrease the total evaluation time to tune hyperparameters in the tidymodels.
-
Introduced a new function
fit_best()
that provides a shorthand interface to fit a final model after parameter tuning. (#586) -
Refined machinery for logging issues during tuning. Rather than printing out warnings and errors as they appear, the package will now only print unique tuning issues, updating a dynamic summary message that maintains counts of each unique issue. This feature is only enabled for tuning sequentially and can be manually toggled with the
verbose
option. (#588) -
Introduced
collect_extracts()
, a function for collecting extracted objects from tuning results. The format of results closely mirrorscollect_notes()
, where the extracted objects are contained in a list-column alongside the resample ID and workflow.config
. (#579)
-
Fixed bug in
select_by_pct_loss()
where the model with the greatest loss within the limit was returned rather than the most simple model whose loss was within the limit. (#543) -
Fixed bug in
tune_bayes()
where.Last.tune.result
would return intermediate tuning results. (#613) -
Extended
show_best()
,select_best()
,select_by_one_std_error()
,select_by_pct_loss()
to accommodate metrics with a target value of zero (notably,yardstick::mpe()
andyardstick::msd()
). (#243)
-
Implemented various optimizations in tune's backend that substantially decrease the total evaluation time to tune hyperparameters with the tidymodels. (#634, #635, #636, #637, #640, #641, #642, #648, #649, #653, #656, #657)
-
Allowed users to supply list-columns in
grid
arguments. This change allows for manually specifying grid values that must be contained in list-columns, like functions or lists. (#625) -
Clarified error messages in
select_by_*
functions. Error messages now only note entries in...
that are likely candidates for failure toarrange()
, and those error messages are no longer duplicated for each entry in...
. -
Improved condition handling for errors that occur during extraction from workflows. While messages and warnings were appropriately handled, errors occurring due to misspecified
extract()
functions being supplied tocontrol_*()
functions were silently caught. As with warnings, errors are now surfaced both during execution and atprint()
(#575). -
Moved forward with the deprecation of
parameters()
methods forworkflow
s,model_spec
s, andrecipes
. Each of these methods will now warn on every usage and will be defunct in a later release of the package. (#650) -
Various bug fixes and improvements to documentation.
-
last_fit()
,fit_resamples()
,tune_grid()
, andtune_bayes()
do not automatically error if the wrong type ofcontrol
object is passed. If the passed control object is not a superset of the one that is needed, the function will still error. As an example, passingcontrol_grid()
totune_bayes()
will fail but passingcontrol_bayes()
totune_grid()
will not. (#449) -
The
collect_metrics()
method for racing objects was removed (and is now in the finetune package). -
Improved prompts related to parameter tuning. When tuning parameters are supplied that are not compatible with the given engine,
tune_*()
functions will now error. (#549) -
The
control_bayes()
got a new argumentverbose_iter
that is used to control the verbosity of the Bayesian calculations. This change means that theverbose
argument is being passed totune_grid()
to control its verbosity. -
The
control_last_fit()
function gained an argumentallow_par
that defaults toFALSE
. This change addresses failures afterlast_fit()
using modeling engines that require native serialization, and we anticipate little to no increase in time-to-fit resulting from this change. (#539, tidymodels/bonsai#52) -
show_notes()
does a better jobs of... showing notes. (#558)
-
show_notes()
is a new function that can better help understand warnings and errors. -
Logging that occurs using the tuning and resampling functions now show multi-line error messages and warnings in multiple lines.
-
When
fit_resamples()
,last_fit()
,tune_grid()
, ortune_bayes()
complete without error (even if models fail), the results are also available via.Last.tune.result
. -
last_fit()
now accepts acontrol
argument to allow users to control aspects of the last fitting process viacontrol_last_fit()
(#399). -
Case weights are enabled for models that can use them.
-
Some internal functions were exported for use by other packages.
-
A check was added to
fit_resamples()
andlast_fit()
to give a more informative error message when a preprocessor or model have parameters marked for tuning. -
outcome_names()
works correctly when recipe has NA roles. (#518)
-
The
.notes
column now contains information on the type of note (error or warning), the location where it occurred, and the note. Printing a tune result has different output describing the notes. -
collect_notes()
can be used to gather any notes to a tibble. (#363) -
Parallel processing with PSOCK clusters is now more efficient, due to carefully avoiding sending extraneous information to each worker (#384, #396).
-
The engine arguments for xgboost
alpha
,lambda
, andscale_pos_weight
are now tunable. -
When the Bayesian optimization data contain missing values, these are removed before fitting the GP model. If all metrics are missing, no GP is fit and the current results are returned. (#432)
-
Moved
tune()
from tune to hardhat (#442). -
The
parameters()
methods forrecipe
,model_spec
, andworkflow
objects have been soft-deprecated in favor ofextract_parameter_set_dials()
methods (#428).
-
When using
load_pkgs()
, packages that use random numbers on start-up do not affect the state of the RNG. We also added more control of the RNGkind to make it consistent with the user's previous value (#389). -
New
extract_*()
functions have been added that supersede many of the the existingpull_*()
functions. This is part of a larger move across the tidymodels packages towards a family of genericextract_*()
functions. Manypull_*()
functions have been soft-deprecated, and will eventually be removed. (#378)
-
Fixed a bug where the resampled confusion matrix is transposed when
conf_mat_resamped(tidy = FALSE)
(#372) -
False positive warnings no longer occur when using the
doFuture
package for parallel processing (#377)
-
Fixed an issue in
finalize_recipe()
which failed during tuning of recipe steps that contain multipletune()
parameters in an single step. -
Changed
conf_mat_resampled()
to return the same type of object asyardstick::conf_mat()
whentidy = FALSE
(#370). -
The automatic parameter machinery for
sample_size
with the C5.0 engine was changes to usedials::sample_prop()
.
-
The
rsample::pretty()
methods were extended totune_results
objects. -
Added
pillar
methods for formattingtune
objects in list columns. -
A method for
.get_fingerprint()
was added. This helps determine iftune
objects used the same resamples.
-
collect_predictions()
was made generic. -
The default tuning parameter for the SVM polynomial degree was switched from
dials::degree()
todials::prod_degree()
since it must be an integer.
-
last_fit()
andworkflows::fit()
will now give identical results for the same workflow when the underlying model uses random number generation (#300). -
Fixed an issue where recipe tuning parameters could be randomly matched to the tuning grid incorrectly (#316).
-
last_fit()
no longer accidentally adjusts the random seed (#264). -
Fixed two bugs in the acquisition function calculations.
-
New
parallel_over
control argument to adjust the parallel processing method that tune uses. -
The
.config
column that appears in the returned tibble from tuning and fitting resamples has changed slightly. It is now always of the form"Preprocessor<i>_Model<j>"
. -
predict()
can now be called on the workflow returned fromlast_fit()
(#294, #295, #296). -
tune now supports setting the
event_level
option from yardstick through the control objects (i.e.control_grid(event_level = "second")
) (#240, #249). -
tune now supports workflows created with the new
workflows::add_variables()
preprocessor. -
Better control the random number streams in parallel for
tune_grid()
andfit_resamples()
(#11) -
Allow
...
to pass options fromtune_bayes()
toGPfit::GP_fit()
. -
Additional checks are done for the initial grid that is given to
tune_bayes()
. If the initial grid is small relative to the number of model terms, a warning is issued. If the grid is a single point, an error occurs. (#269) -
Formatting of some messages created by
tune_bayes()
now respect the width and wrap lines using the newmessage_wrap()
function. -
tune functions (
tune_grid()
,tune_bayes()
, etc) will now error if a model specification or model workflow are given as the first argument (the soft deprecation period is over). -
An
augment()
method was added for objects generated bytune_*()
,fit_resamples()
, andlast_fit()
.
-
autoplot.tune_results()
now requires objects made by version 0.1.0 or higher of tune. -
tune
objects no longer keep therset
class that they have from theresamples
argument.
-
autoplot.tune_results()
now produces a different plot when the tuning grid is a regular grid (i.e. factorial or nearly factorial in nature). If there are 5+ parameters, the standard plot is produced. Non-regular grids are plotted in the same way (although see next bullet point). See?autoplot.tune_results
for more information. -
autoplot.tune_results()
now transforms the parameter values for the plot. For example, if thepenalty
parameter was used for a regularized regression, the points are plotted on the log-10 scale (its default transformation). For non-regular grids, the facet labels show the transformation type (e.g."penalty (log-10)"
or"cost (log-2)"
). For regular grid, the x-axis is scaled usingscale_x_continuous()
. -
Finally,
autoplot.tune_results()
now shows the parameter labels in a plot. For example, if a k-nearest neighbors model was used withneighbors = tune()
, the parameter will be labeled as"# Nearest Neighbors"
. When an ID was used, such asneighbors = tune("K")
, this is used to identify the parameter. -
In other plotting news,
coord_obs_pred()
has been included for regression models. When plotting the observed and predicted values from a model, this forces the x- and y-axis to be the same range and uses an aspect ratio of 1. -
The outcome names are saved in an attribute called
outcomes
to objects with classtune_results
. Also, several accessor functions (named `.get_tune_*()) were added to more easily access such attributes. -
conf_mat_resampled()
computes the average confusion matrix across resampling statistics for a single model. -
show_best()
, and theselect_*()
functions will now use the first metric in the metric set if no metric is supplied. -
filter_parameters()
can trim the.metrics
column of unwanted results (as well as columns.predictions
and.extracts
) fromtune_*
objects. -
In concert with
dials
> 0.0.7, tuning engine-specific arguments is possible. Many known engine-specific tuning parameters and handled automatically. -
If a grid is given, parameters do not need to be finalized to be used in the
tune_*()
functions. -
Added a
save_workflow
argument tocontrol_*
functions that will result in the workflow object used to carry out tuning/fitting (regardless of whether a formula or recipe was given as input to the function) to be appended to the resultingtune_results
object in aworkflow
attribute. The new.get_tune_workflow()
function can be used to access the workflow. -
Many of the output columns in a
tune_results
object have an additional column called.config
. This is meant to be a unique, qualitative value that used for sorting and merging. These values also correspond to the messages in the logging produced whenverbose = TRUE
.
- The arguments to the main tuning/fitting functions (
tune_grid()
,tune_bayes()
, etc) have been reordered to better align with parsnip'sfit()
. The first argument to all these functions is now a model specification or model workflow. The previous versions are soft-deprecated as of 0.1.0 and will be deprecated as of 0.1.2.
-
Added more packages to be fully loaded in the workers when run in parallel using
doParallel
(#157), (#159), and (#160) -
collect_predictions()
gains two new arguments.parameters
allows for pre-filtering of the hold-out predictions by tuning parameters values. If you are only interested in one sub-model, this makes things much faster. The other option issummarize
and is used when the resampling method has training set rows that are predicted in multiple holdout sets. -
select_best()
,select_by_one_std_err()
, andselect_by_pct_loss()
no longer have a redundantmaximize
argument (#176). Each metric set in yardstick now has a direction (maximize vs. minimize) built in.
tune_bayes()
no longer errors with a recipe, which has tuning parameters, in combination with a parameter set, where the defaults contain unknown values (#168).
-
CRAN release.
-
Changed license to MIT
-
The
...
arguments oftune_grid()
andtune_bayes()
have been moved forward to force optional arguments to be named. -
New
fit_resamples()
for fitting a set of resamples that don't require any tuning. -
Changed
summarise.tune_results()
back toestimate.tune_results()
- Added a
NEWS.md
file to track changes to the package.