- More compact README.
- Updated function description.
This release is intended to be the last before stable version 1.0.0.
Passing a background dataset `bg_X` is now optional. If the explanation data `X` is sufficiently large (>= 50 rows), `bg_X` is derived as a random sample of `bg_n = 200` rows from `X`. If `X` has fewer than `bg_n` rows, then simply `bg_X = X`. If `X` has too few rows (< 50), you will have to pass an explicit `bg_X`.
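The fallback rule can be sketched as follows. This is a hypothetical re-implementation for illustration only; `make_bg_X` is not a function of the package:

```r
# Hypothetical helper mirroring the documented fallback rule (not package code):
# - explicit bg_X passed: use it as-is
# - X has >= 50 rows: sample bg_n rows from X (or take all of X if nrow(X) <= bg_n)
# - X has < 50 rows: an explicit bg_X is required
make_bg_X <- function(X, bg_X = NULL, bg_n = 200L) {
  if (!is.null(bg_X)) {
    return(bg_X)
  }
  if (nrow(X) < 50L) {
    stop("X has too few rows (< 50); please pass an explicit bg_X.")
  }
  if (nrow(X) <= bg_n) {
    return(X)  # bg_X = X
  }
  X[sample(nrow(X), bg_n), , drop = FALSE]  # random sample of bg_n rows
}
```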
- `ranger()` survival models now also work out-of-the-box without passing a tailored prediction function. Use the new argument `survival = "chf"` in `kernelshap()` and `permshap()` to distinguish between cumulative hazards (default) and survival probabilities per time point.
- The resulting objects of `kernelshap()` and `permshap()` now contain `bg_X` and `bg_w` used to calculate the SHAP values.
- Factor-valued predictions are not supported anymore.
- Fix CRAN note about unavailable link to `gam::gam()`.
- Added dependency on {MASS} for calculating the Moore-Penrose generalized matrix inverse.
New additive explainer `additive_shap()` that works for models fitted via `lm()`, `glm()`, `mgcv::gam()`, `mgcv::bam()`, `gam::gam()`, `survival::coxph()`, or `survival::survreg()`. The explainer uses `predict(..., type = "terms")`, a beautiful trick used in `fastshap::explain.lm()`. The results are identical to those returned by `kernelshap()` and `permshap()`, but exponentially faster. Thanks to David Watson for the great idea discussed in #130.
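The "terms" trick can be sketched with a plain `lm()` fit. This is only an illustration of the mechanism, not the actual `additive_shap()` code: `predict(..., type = "terms")` returns one centered contribution per model term, and these contributions plus the `"constant"` attribute add up to the prediction.

```r
# Illustration of the "terms" trick (not the package's internal code):
fit <- lm(Sepal.Length ~ Sepal.Width + Petal.Length, data = iris)

# One column per model term, centered over the training data
S <- predict(fit, newdata = head(iris), type = "terms")

# Per-row term contributions plus the "constant" attribute
# reproduce the ordinary predictions
baseline <- attr(S, "constant")
all.equal(unname(rowSums(S) + baseline), unname(predict(fit, head(iris))))
```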
- `permshap()` now returns an object of class "kernelshap" to reduce the number of redundant methods.
- To distinguish which algorithm has generated the "kernelshap" object, the outputs of `kernelshap()`, `permshap()` (and `additive_shap()`) got an element "algorithm".
- `is.permshap()` has been removed.
- {mlr3}: Non-probabilistic classification now works.
- {mlr3}: For probabilistic classification, you now have to pass `predict_type = "prob"`.
- The README has received an {mlr3} and {caret} example.
- Significant speed-up for pure data.frames, i.e., no data.tables or tibbles.
- Some small performance improvements, e.g., for factor predictions and univariate predictions.
- Slight speed-up of `permshap()` by caching calculations for the two special permutations of all 0 and all 1. Consequently, the `m_exact` component in the output is reduced by 2.
- Rewrote many examples in the README.
- Added reference to Erik Strumbelj and Igor Kononenko (2014).
- Added `permshap()` to calculate exact permutation SHAP values. The function currently works for up to 14 features.
- Factor-valued predictions are now supported. Each level is represented by its dummy variable.
- Slight speed-up.
- Integer valued case weights are now turned into doubles to avoid integer overflow.
- Multi-output case: column names of predictions are now used as list names of the resulting `S` and `SE` lists.
- {mlr3} probabilistic classification would not work out-of-the-box. This has been fixed (with a corresponding example in the README) in #100.
- The progress bar was initialized at 1 instead of 0. This is fixed.
- Added explanation of sampling Kernel SHAP to help file.
- In internal calculations, use explicit `feature_names` as dimnames (#96).
- Fixed problem in Latex math for MacOS.
- Improved help files and README
- New contributor: Przemyslaw Biecek - welcome on board!
- My new cozy home: https://github.com/ModelOriented/kernelshap
- Webpage created with "pkgdown"
- Introduced Github workflows
- More unit tests
- Removed the `ks_extract()` function. It was designed to extract objects like the matrix `S` of SHAP values from the resulting "kernelshap" object `x`. We feel that the standard extraction options (`x$S`, `x[["S"]]`, or `getElement(x, "S")`) are sufficient.
- Added a $(n \times K)$ matrix of predictions to the output, where $n$ is the number of rows in the explainer data `X`, and $K$ is the dimension of a single prediction (usually 1).
- Setting `verbose = FALSE` now does not suppress the warning on too large background data anymore. Use `suppressWarnings()` instead.
- New logo
- Better package description
- Better README
- Removed dependency "doRNG". This might have an impact on the seeding in parallel mode.
- Removed dependency "MASS"
- Rewritten README and examples to better show the role of the background data.
- When `bg_X` contained more columns than `X`, inflexible prediction functions could fail when being applied to `bg_X`.
- New argument `feature_names` allows specifying the features for which SHAP values are calculated. The default equals `colnames(X)`. This should be changed only in situations where `X` (the dataset to be explained) contains non-feature columns.
- The background dataset can now consist of a single row only. This is useful in situations with a natural "off" value, such as for image data, or for models that can naturally deal with missing values.
Thanks to David Watson, exact calculations are now also possible for $p > 5$ features.
A word of caution: Exact calculations mean to create all $2^p - 2$ on-off vectors $z$ and to evaluate the model on a correspondingly large dataset, so they are only feasible up to a moderate number of features.
The iterative Kernel SHAP sampling algorithm of Covert and Lee (2021) [1] works by randomly sampling $m$ on-off vectors $z$ so that their sum follows the Kernel SHAP weight distribution (normalized to the range $\{1, \dots, p-1\}$).
A drawback of this strategy is that many (at least 75%) of the sampled $z$ vectors will have $\sum z \in \{1, p-1\}$, producing many duplicates.
- Step 1 (exact part): There are $2p$ different on-off vectors $z$ with $\sum z \in \{1, p-1\}$, covering a large proportion of the Kernel SHAP distribution. The degree 1 hybrid will list those vectors and use them according to their weights in the upcoming calculations. Depending on $p$, we can also go a step further to a degree 2 hybrid by adding all $p(p-1)$ vectors with $\sum z \in \{2, p-2\}$ to the process, etc. The necessary predictions are obtained along with other calculations similar to those in [1].
- Step 2 (sampling part): The remaining weight is filled by sampling vectors $z$ according to Kernel SHAP weights renormalized to the values not yet covered by Step 1. Together with the results from Step 1 (correctly weighted), this forms a complete iteration as in Covert and Lee (2021). The difference is that most mass is covered by exact calculations. Afterwards, the algorithm iterates until convergence. The output of Step 1 is reused in every iteration, leading to an extremely efficient strategy.
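For intuition, the share of the Kernel SHAP weight distribution handled exactly by a hybrid of a given degree can be computed from the distribution of $\sum z$, which is proportional to $1/(s(p-s))$ for coalition size $s$. This is an illustrative calculation only, not package code:

```r
# Probability that a sampled on-off vector z has sum(z) == s (1 <= s <= p-1)
# under the Kernel SHAP weights, with all choose(p, s) vectors of size s combined
size_probs <- function(p) {
  s <- 1:(p - 1)
  w <- 1 / (s * (p - s))
  w / sum(w)
}

# Share of the distribution covered exactly by a degree d hybrid,
# i.e., by all vectors with sum(z) in {1, ..., d} or {p-d, ..., p-1}
covered <- function(p, d) {
  pr <- size_probs(p)
  sum(pr[unique(c(1:d, (p - d):(p - 1)))])
}

covered(10, 1)  # mass handled exactly by the degree 1 hybrid
covered(10, 2)  # the degree 2 hybrid covers even more
```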
The default behaviour of `kernelshap()` is as follows:

- $p \le 8$: Exact Kernel SHAP (with respect to the background data)
- $9 \le p \le 16$: Degree 2 hybrid
- $p > 16$: Degree 1 hybrid
- $p = 1$: Exact Shapley values
It is also possible to use a pure sampling strategy, see Section "User visible changes" below. While this is usually not advisable compared to a hybrid approach, the options of `kernelshap()` allow studying different properties of Kernel SHAP and doing empirical research on the topic.
Kernel SHAP in the Python implementation "shap" uses a quite similar hybrid strategy, but without iterating. The new logic in the R package thus combines the efficiency of the Python implementation with the convergence monitoring of [1].
[1] Ian Covert and Su-In Lee. Improving KernelSHAP: Practical Shapley Value Estimation Using Linear Regression. Proceedings of The 24th International Conference on Artificial Intelligence and Statistics, PMLR 130:3457-3465, 2021.
- The default value of `m` is reduced from $8p$ to $2p$, except when `hybrid_degree = 0` (pure sampling).
- The default value of `exact` is now `TRUE` for $p \le 8$ instead of $p \le 5$.
- A new argument `hybrid_degree` is introduced to control the exact part of the hybrid algorithm. The default is 2 for $4 \le p \le 16$ and degree 1 otherwise. Set to 0 to force a pure sampling strategy (not recommended, but useful to demonstrate the superiority of hybrid approaches).
- The default value of `tol` was reduced from 0.01 to 0.005.
- The default of `max_iter` was reduced from 250 to 100.
- The order of some of the arguments behind the first four has been changed.
- Paired sampling no longer duplicates `m`.
- Thanks to Mathias Ambuehl, the random sampling of z vectors is now fully vectorized.
- The output of `print()` is now slimmer.
- A new `summary()` function shows more info.
- The resulting object now contains `m_exact` (the number of on-off vectors used for the exact part), `prop_exact` (the proportion of mass treated in an exact fashion), the `exact` flag, and `txt` (the info message shown when starting the algorithm).
- Predictions of `mgcv::gam()` would cause an error in `check_pred()` (they are 1D arrays).
- Fixed small mistakes in the examples of the README (mlr3 and mgcv).
The interface of `kernelshap()` has been revised. Instead of specifying a prediction function, it now suffices to pass the fitted model object. The default `pred_fun` is now `stats::predict`, which works in most cases. Some other cases are caught via the model class ("ranger" and mlr3 "Learner"). The `pred_fun` can be overwritten by a function of the form `function(object, X, ...)`. Additional arguments to the prediction function are passed via `...` of `kernelshap()`.
Some examples:
- Logistic regression (logit scale): `kernelshap(fit, X, bg_X)`
- Logistic regression (probabilities): `kernelshap(fit, X, bg_X, type = "response")`
- Linear regression with logarithmic response, but evaluated on the original scale; here, the default predict function needs to be overwritten: `kernelshap(fit, X, bg_X, pred_fun = function(m, X) exp(predict(m, X)))`
- `kernelshap()` has received a more intuitive interface, see the breaking change above.
- The package now supports multidimensional predictions. Hurray!
- Thanks to David Watson, parallel computing is now supported. The user needs to set up the parallel backend before calling `kernelshap()`, e.g., using the "doFuture" package, and then set `parallel = TRUE`. Especially on Windows, sometimes not all global variables or packages are loaded in the parallel instances. These can be specified by `parallel_args`, a list of arguments passed to `foreach()`.
- Even without parallel computing, `kernelshap()` has become much faster.
- For $2 \le p \le 5$ features, the algorithm now returns exact Kernel SHAP values with respect to the given background data. (For $p = 1$, exact Shapley values are returned.)
- Direct handling of "tidymodels" models.
- Besides `matrix`, `data.frame`s, and `tibble`s, the package now also accepts `data.table`s (if the prediction function can deal with them).
- `kernelshap()` is less picky regarding the output structure of `pred_fun()`.
- `kernelshap()` is less picky about the column structure of the background data `bg_X`. It should simply contain the columns of `X` (but can have more, or in a different order). The old behaviour was to throw an error if `colnames(X) != colnames(bg_X)`.
- The default `m = "auto"` has been changed from `trunc(20 * sqrt(p))` to `max(trunc(20 * sqrt(p)), 5 * p)`. This will have an effect for cases where the number of features $p > 16$, and implies more robust results for large $p$.
- There were too many "ks_*()" functions to extract elements of a "kernelshap" object. They are now all deprecated and replaced by `ks_extract(, what = "S")`.
- Added "MASS", "doRNG", and "foreach" to the dependencies.
- Depending on $m$ and $p$, the matrix inversion required in the constrained least-squares solution could fail. It is now replaced by `MASS::ginv()`, the Moore-Penrose pseudoinverse based on `svd()`.
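The difference can be seen on a rank-deficient system. This is a small standalone illustration of `MASS::ginv()`, not the package's internal solver:

```r
library(MASS)

A <- matrix(c(1, 2, 2, 4), nrow = 2)  # singular matrix: solve(A) fails
G <- ginv(A)                          # Moore-Penrose pseudoinverse via svd()

# The defining Moore-Penrose property A G A = A holds even though A
# has no ordinary inverse:
all.equal(A %*% G %*% A, A)
```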
- David Watson
This is the initial release.