Releases: ModelOriented/hstats
CRAN release 1.2.1
Usability
`ranger()` survival models now also work out-of-the-box without passing a tailored prediction function. Use the new argument `survival = "chf"` in `hstats()`, `ice()`, and `partial_dep()` to distinguish cumulative hazards (default) and survival probabilities ("prob") per time point.
Other changes
- Fixed wrong ORCID of Michael.
CRAN release 1.2.0
My new home
- My brand new home: https://github.com/ModelOriented/hstats
Other changes
- Factor-valued predictions are no longer possible.
- Consequently, also removed "classification_error" loss.
CRAN release 1.1.2
ICE plots
- The ICE plot of a multioutput model without a BY variable now uses facets (instead of color). Use `swap_dim = TRUE` for the old behavior.
API
- {mlr3}: Non-probabilistic classification now works.
- {mlr3}: For probabilistic classification, you now have to pass `predict_type = "prob"`.
CRAN release 1.1.1
CRAN release 1.1.0
Enhancements
- {hstats} now also works for factor predictions. The levels are represented by one-hot-encoded columns (PR#101).
- The plot method of a two-dimensional PDP has received the option `d2_geom = "line"`. Instead of a heatmap of the two features, one of the features is moved to color grouping. Combined with `swap_dim = TRUE`, you can swap the role of the two `v` variables without recalculating anything. The idea was proposed by Roel Verbelen in issue #91, see also issue #94.
Bug fixes
- Using `BY` and `w` via column names would fail for tibbles. This problem was described in #92 by Roel Verbelen. Thx!
Other changes
CRAN release 1.0.0
Major changes
- Quantile approximation: `hstats()` now has the option `approx = FALSE`. Set to `TRUE` to replace values of dense numeric columns by `grid_size = 50` quantile midpoints. This will bring a massive speed-up for one-way calculations. Use this option when one-way calculations are slow, or when you want to increase `n_max`.
- `hstats()`: `n_max` has been increased from 300 to 500 rows. This will make estimates of H-statistics more stable at the price of longer run time. Reduce to 300 for the old behaviour.
- `hstats()`: Three-way interactions are no longer calculated by default. Set `threeway_m` to 5 for the old behaviour.
- Revised plots: The colors and color palettes have changed and can now also be controlled via global options. For instance, to change the fill color of all bars, set `options(hstats.fill = new value)`. Value labels are clearer, and there are more options. Varying color/fill scales now use viridis (inferno). This can be modified on the fly or via `options(hstats.viridis_args = list(...))`.
- "hstats_matrix" object: All statistics functions, e.g., `h2_pairwise()` or `perm_importance()`, now return a "hstats_matrix". The values are stored in `$M` and can be plotted via `plot()`. Other methods include: `dimnames()`, `rownames()`, `colnames()`, `dim()`, `nrow()`, `ncol()`, `head()`, `tail()`, and subsetting like a normal matrix. This allows, e.g., to select and plot only one column of the results.
- `perm_importance()`: The `perms` argument has been changed to `m_rep`.
- `print()` and `summary()` methods have been revised.
- The arguments `w` (case weights) and `y` (response) can now also be passed as column names.
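The quantile approximation described in the first bullet can be sketched in base R. This is only an illustration under stated assumptions: the helper `approx_by_midpoints()` is hypothetical and not part of {hstats}, and the package's internal implementation may differ in details such as tie handling.

```r
# Hypothetical helper (not part of {hstats}): replace each value of a numeric
# vector by the midpoint of the quantile bin it falls into.
approx_by_midpoints <- function(x, grid_size = 50L) {
  probs <- seq(0, 1, length.out = grid_size + 1L)
  # Bin borders; unique() guards against repeated quantiles caused by ties
  breaks <- unique(stats::quantile(x, probs = probs, names = FALSE, na.rm = TRUE))
  if (length(breaks) < 2L) {
    return(x)  # constant vector: nothing to approximate
  }
  # Midpoints between consecutive bin borders
  mids <- (breaks[-1L] + breaks[-length(breaks)]) / 2
  # findInterval() maps each value to its bin; the maximum goes into the last bin
  mids[findInterval(x, breaks, rightmost.closed = TRUE)]
}

set.seed(1)
x <- rnorm(1000)
x_approx <- approx_by_midpoints(x, grid_size = 50L)

length(unique(x))         # number of distinct values before the approximation
length(unique(x_approx))  # at most grid_size distinct values afterwards
```

Fewer distinct values mean fewer distinct grid points to evaluate, which is where a speed-up for one-way calculations would come from.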
Minor changes
- Statistics: The argument `top_m` has been moved to the `plot()` method.
- Statistics: The clipping threshold `eps` of squared numerator statistics has been reduced from `1e-8` to `1e-10`. It is now handled in `hstats()` instead of the statistic functions.
- H-squared: The $H^2$ statistic stored in a "hstats" object is now a matrix with one row (it was a vector).
- `pd_importance()`: The "hstats" object now contains pre-calculated PD-based importance values in `$pd_importance`.
- `summary.hstats()` now returns an object of class "hstats_summary" instead of "summary_hstats".
- `average_loss()` is more flexible regarding the group `BY` argument. It can also be a variable name. Non-discrete `BY` variables are now automatically binned. As in `partial_dep()`, binning is controlled by the `by_size = 4` argument.
- `average_loss()` also returns a "hstats_matrix" object with `print()` and `plot()` methods. The values can be extracted via `$M`.
- The default `v` of `hstats()` and `perm_importance()` is now `NULL`. Internally, it is set to `colnames(X)` (minus the column names of `w` and `y` if passed as name).
- Missing grid values: `partial_dep()` and `ice()` have received a `na.rm` argument that controls if missing values are dropped during grid creation. The default `TRUE` is compatible with earlier releases.
- Missing values in `hstats()`: Discrete variables with missings would cause `rowsum()` to launch repeated warnings. This case is now caught.
- The position of some function arguments has changed.
- `perm_importance()`: The default of `verbose` is `TRUE` again.
CRAN release 0.3.0
This is intended to be the last version before 1.0.0.
Visible changes
- Grid of `ice()` and `partial_dep()`: So far, the default grid strategy "uniform" used `pretty()` to generate the evaluation points. To provide more predictable grid sizes, and to be more in line with other implementations of partial dependence and ICE, we now use `seq()` to create the uniform grid.
- `h2_pairwise()` and `h2_threeway()` will now also include 0 values. Use `zero = FALSE` to drop them, see below. The padding with 0 is done at no computational cost, and will affect only up to `pairwise_m` and `threeway_m` features.
- `hstats()`: The default number of features considered for three-way interactions has been changed from `threeway_m = pairwise_m` to the more cautious `threeway_m = min(pairwise_m, 5L)`. Furthermore, `threeway_m` is capped at `pairwise_m`.
- The `print()` method of `summary.hstats()` is less verbose.
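The difference between the two grid strategies in the first bullet can be illustrated with base R alone. The helper `uniform_grid()` below is a hypothetical stand-in, not the package's internal function; it only shows why `seq()` gives a predictable grid size while `pretty()` does not.

```r
# Hypothetical helper (not part of {hstats}): uniform grid over the range of x
uniform_grid <- function(x, grid_size) {
  r <- range(x, na.rm = TRUE)
  seq(r[1L], r[2L], length.out = grid_size)
}

x <- c(0.13, 2.4, 7.91, 12.05)

# pretty() picks "nice" break points, so the number of points depends on the data
length(pretty(x, n = 10))

# seq() with length.out always returns exactly the requested number of points
length(uniform_grid(x, grid_size = 10L))
```

Note that the `seq()` grid starts and ends exactly at the observed minimum and maximum, whereas `pretty()` may extend slightly beyond the data range.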
Improvements
- `h2_overall()`, `h2_pairwise()`, `h2_threeway()`, `plot.hstats()`, and `summary.hstats()` have received an argument `zero = TRUE`. Set to `FALSE` to drop statistics having value 0.
- `perm_importance()` and `average_loss()` will now recycle a univariate response when combined with multivariate predictions. This is useful, e.g., when the prediction function represents the predictions of multiple models that should be evaluated against a common response.
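The recycling of a univariate response against multivariate predictions can be pictured with a small base R sketch. The data and the column names `model_a` and `model_b` are made up for illustration, and the code does not call {hstats}; it only shows the idea of scoring several prediction columns against one common response.

```r
# Common univariate response
y <- c(1, 2, 3, 4)

# Predictions of two (hypothetical) models, one column per model
pred <- cbind(
  model_a = c(1.1, 1.9, 3.2, 3.8),
  model_b = c(0.5, 2.5, 2.0, 5.0)
)

# y is recycled down each column, giving per-row squared errors per model
per_row <- (y - pred)^2

# Average squared error per model
avg <- colMeans(per_row)
avg
```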
Bug fixes
- All progress bars were initialized 1 step too late.
- `perm_importance()` and `average_loss()` would fail for "mlogloss" in case the response `y` was univariate and non-factor/non-character.
Other changes
- All available H-statistics are now calculated within `hstats()` and attached to the resulting object. Each statistic is stored as a list with numerator and denominator matrices/vectors. The functions `h2()`, `h2_overall()`, `h2_pairwise()`, `h2_threeway()`, `print.hstats()`, `summary.hstats()`, and `plot.hstats()` will use these without having to recalculate the required numerators and denominators. The results, however, are unchanged.
CRAN release 0.2.0
New major features
- `average_loss()`: This new function calculates the average loss of a model for a given dataset, optionally grouped by a discrete vector. It supports the most important loss functions (squared error, Poisson deviance, Gamma deviance, Log loss, multivariate Log loss, absolute error, classification error), and allows for case weights. Custom losses can be passed as vector/matrix valued functions of signature `f(obs, pred)`. Note that such a custom function needs to return per-row losses, not their average.
- `perm_importance()`: H-statistics are often calculated for important features only. To support this workflow, we have added permutation importance regarding the most important loss functions. Multivariate losses can be studied individually or collapsed over dimensions. The importance of feature groups can be studied as well. Note that the API of `perm_importance()` is different from the experimental `pd_importance()`, which is calculated from a "hstats" object.
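The two ideas above, a custom per-row loss of signature `f(obs, pred)` and permutation importance, can be sketched in base R. The helpers `avg_loss()` and `perm_imp()` and the argument `n_rep` are hypothetical and written for illustration only; they are not the {hstats} implementation and ignore features such as grouping or multivariate losses.

```r
# Custom loss with signature f(obs, pred): returns one loss per row,
# not the average
abs_error <- function(obs, pred) abs(obs - pred)

# Hypothetical stand-in for the averaging step, with optional case weights
avg_loss <- function(obs, pred, loss, w = NULL) {
  l <- loss(obs, pred)
  if (is.null(w)) mean(l) else stats::weighted.mean(l, w)
}

# Hypothetical permutation importance: average increase in loss after
# shuffling column v, over n_rep repetitions
perm_imp <- function(fit, X, y, v, loss, n_rep = 4L) {
  base <- avg_loss(y, predict(fit, X), loss)
  shuffled <- vapply(seq_len(n_rep), function(i) {
    X_perm <- X
    X_perm[[v]] <- sample(X_perm[[v]])
    avg_loss(y, predict(fit, X_perm), loss)
  }, numeric(1))
  mean(shuffled) - base
}

set.seed(1)
fit <- lm(Sepal.Length ~ Petal.Length + Sepal.Width, data = iris)
imp <- vapply(
  c("Petal.Length", "Sepal.Width"),
  function(v) perm_imp(fit, iris, iris$Sepal.Length, v, abs_error),
  numeric(1)
)
imp
```

Returning per-row losses rather than their average is what makes case weights and grouping possible, since the aggregation can then be done after the loss is evaluated.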
Major changes in defaults
- `hstats()` now uses the default feature vector `v = colnames(X)`, simplifying the API in most cases. The typical call is now `hstats(object, X = Feature data)`.
- `h2_overall()`, `h2_pairwise()`, `h2_threeway()`, `pd_importance()` by default do not plot results anymore. Set `plot = TRUE` to do so.
Minor changes
- `summary.hstats()` now returns an object of class "summary_hstats" with its own `print()` method. This way, one can use `su <- summary()` without printing to the console.
- The output of `summary.hstats()` is printed slightly more compactly.
- `plot.hstats()` has received a `rotate_x = FALSE` argument for rotating x labels by 45 degrees.
- `plot.hstats()` and `summary.hstats()` have received explicit arguments `normalize`, `squared`, `sort`, `eps` instead of passing them via `...`.
- `plot.hstats()` now passes `...` to `geom_bar()`.
- Slight speed-up of `hstats()` in the one-dimensional case.
Bug fixes
- Probabilistic {mlr3} classifiers did not work out-of-the-box. This has been fixed.
CRAN release 0.1.0
This is the initial CRAN release of the package.