CRAN release 0.2.0
kernelshap 0.2.0
Breaking change
The interface of kernelshap() has been revised. Instead of specifying a prediction function, it now suffices to pass the fitted model object. The default pred_fun is now stats::predict, which works in most cases. Some other cases are caught via the model class ("ranger" and mlr3 "Learner"). The pred_fun can be overwritten by a function of the form function(object, X, ...). Additional arguments to the prediction function are passed via the ... argument of kernelshap().
Some examples:
- Logistic regression (logit scale):
kernelshap(fit, X, bg_X)
- Logistic regression (probabilities):
kernelshap(fit, X, bg_X, type = "response")
- Linear regression with logarithmic response, but evaluated on original scale: Here, the default predict function needs to be overwritten:
kernelshap(fit, X, bg_X, pred_fun = function(m, X) exp(predict(m, X)))
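As a minimal sketch of what a prediction function of the form function(object, X, ...) looks like, the following base R snippet fits a linear model with logarithmic response and wraps predict() so that predictions come back on the original scale. The toy data and the name pred_orig_scale are hypothetical and only serve as illustration; kernelshap() itself is not called here.

```r
# Toy data (hypothetical, for illustration only)
set.seed(1)
df <- data.frame(x1 = rnorm(100), x2 = rnorm(100))
df$y <- exp(1 + 0.5 * df$x1 - 0.3 * df$x2 + rnorm(100, sd = 0.1))

# Linear regression with logarithmic response
fit <- lm(log(y) ~ x1 + x2, data = df)

# Custom prediction function of the form function(object, X, ...):
# back-transforms the log-scale predictions to the original scale
pred_orig_scale <- function(object, X, ...) {
  exp(stats::predict(object, X, ...))
}

X <- df[1:5, c("x1", "x2")]
p_log <- stats::predict(fit, X)     # default: predictions on the log scale
p_orig <- pred_orig_scale(fit, X)   # custom: predictions on the original scale
```

Following the examples above, such a function would presumably be supplied as pred_fun = pred_orig_scale.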
Major improvements
- kernelshap() has received a more intuitive interface, see breaking change above.
- The package now supports multidimensional predictions. Hurray!
- Thanks to David Watson, parallel computing is now supported. The user needs to set up the parallel backend before calling kernelshap(), e.g., using the "doFuture" package, and then set parallel = TRUE. Especially on Windows, sometimes not all global variables or packages are loaded in the parallel instances. These can be specified by parallel_args, a list of arguments passed to foreach().
- Even without parallel computing, kernelshap() has become much faster.
- For $2 \le p \le 5$ features, the algorithm now returns exact Kernel SHAP values with respect to the given background data. (For $p = 1$, exact Shapley values are returned.)
- Direct handling of "tidymodels" models.
User visible changes
- Besides matrix, data.frames, and tibbles, the package now also accepts data.tables (if the prediction function can deal with them).
- kernelshap() is less picky regarding the output structure of pred_fun().
- kernelshap() is less picky about the column structure of the background data bg_X. It should simply contain the columns of X (but can have more columns or a different column order). The old behaviour was to throw an error if colnames(X) != colnames(bg_X).
- The default m = "auto" has been changed from trunc(20 * sqrt(p)) to max(trunc(20 * sqrt(p)), 5 * p). This has an effect when the number of features $p > 16$ and implies more robust results for large $p$.
- There were too many "ks_*()" functions to extract elements of a "kernelshap" object. They are now all deprecated and replaced by ks_extract(, what = "S").
- Added "MASS", "doRNG", and "foreach" to dependencies.
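To see why the new default for m = "auto" only matters for $p > 16$, note that $5p > 20\sqrt{p}$ holds exactly when $\sqrt{p} > 4$. The short base R sketch below compares the two formulas quoted above for a few values of p; the helper names m_old() and m_new() are hypothetical and not part of the package.

```r
# Old and new defaults for m = "auto", as stated in the notes above.
# m_old() and m_new() are hypothetical helper names used for illustration.
m_old <- function(p) trunc(20 * sqrt(p))
m_new <- function(p) max(trunc(20 * sqrt(p)), 5 * p)

# Compare both formulas for a few feature counts
p <- c(4, 16, 17, 25, 100)
m_table <- data.frame(
  p = p,
  old = sapply(p, m_old),
  new = sapply(p, m_new)
)
print(m_table)
```

Up to p = 16 both columns agree; from p = 17 onwards the 5 * p term takes over.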
Bug fixes
- Depending on $m$ and $p$, the matrix inversion required in the constrained least-squares solution could fail. It is now replaced by MASS::ginv(), the Moore-Penrose pseudoinverse using svd().
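The following sketch illustrates the general idea behind this fix on a small singular matrix; it is not the package's internal code, and the matrix A is a made-up example. An ordinary inverse via solve() fails, while MASS::ginv() still returns a pseudoinverse that satisfies the defining Moore-Penrose property.

```r
library(MASS)  # provides ginv()

# A singular 2x2 matrix (second column is twice the first); made-up example
A <- matrix(c(1, 2, 2, 4), nrow = 2)

# solve() signals an error because A has no ordinary inverse
solve_failed <- inherits(try(solve(A), silent = TRUE), "try-error")

# ginv() computes the Moore-Penrose pseudoinverse via svd()
A_pinv <- ginv(A)

# Defining property of the pseudoinverse: A %*% A_pinv %*% A equals A
property_holds <- isTRUE(all.equal(A %*% A_pinv %*% A, A))
```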
New contributor
- David Watson