Merge pull request #21 from mayer79/cran_v2
prepare CRAN version 0.2.0
mayer79 authored Sep 5, 2022
2 parents 89701cc + fe37624 commit e4870fb
Showing 11 changed files with 187 additions and 144 deletions.
1 change: 1 addition & 0 deletions .Rbuildignore
@@ -8,3 +8,4 @@
^\.Rproj\.user$
^compare_with_python.R$
^Z_exact.R$
^CRAN-SUBMISSION$
3 changes: 3 additions & 0 deletions CRAN-SUBMISSION
@@ -0,0 +1,3 @@
Version: 0.2.0
Date: 2022-09-05 12:31:15 UTC
SHA: e28ed70f22cbeb7f234bc554c8eda7153d538e41
8 changes: 4 additions & 4 deletions DESCRIPTION
@@ -1,6 +1,6 @@
Package: kernelshap
Title: Kernel SHAP
Version: 0.1.900
Version: 0.2.0
Authors@R: c(
person("Michael", "Mayer", , "[email protected]", role = c("aut", "cre")),
person("David", "Watson", , "[email protected]", role = "ctb")
@@ -13,9 +13,9 @@ Description: Multidimensional version of the iterative Kernel SHAP
provides numeric predictions of dimension one or higher. Examples
include linear regression, logistic regression (logit or probability
scale), other generalized linear models, generalized additive models,
and neural networks. The package plays well together with
meta-learning packages like 'caret' or 'mlr3'. Visualizations can be
done using the R package 'shapviz'.
and neural networks. The package plays well together with
meta-learning packages like 'tidymodels', 'caret' or 'mlr3'.
Visualizations can be done using the R package 'shapviz'.
License: GPL (>= 2)
Depends:
R (>= 3.2.0)
32 changes: 10 additions & 22 deletions NEWS.md
@@ -1,39 +1,27 @@
# kernelshap 0.1.900 DEVEL
# kernelshap 0.2.0

## Breaking change

The interface of `kernelshap()` has been revised. Instead of specifying a prediction function, it suffices now to pass the fitted model object. The default `pred_fun` is now `stats::predict`, which works in most cases. Some other cases are caught via model class ("ranger" and mlr3 "Learner"). The `pred_fun` can be overwritten by a function of the form `function(object, X, ...)`.
The interface of `kernelshap()` has been revised. Instead of specifying a prediction function, it suffices now to pass the fitted model object. The default `pred_fun` is now `stats::predict`, which works in most cases. Some other cases are caught via model class ("ranger" and mlr3 "Learner"). The `pred_fun` can be overwritten by a function of the form `function(object, X, ...)`. Additional arguments to the prediction function are passed via `...` of `kernelshap()`.

Example: Logistic regression with predictions on logit scale
Some examples:

```
kernelshap(fit, X, bg_X)
```

Example: Logistic regression with predictions on probability scale

```
kernelshap(fit, X, bg_X, type = "response")
```

Example: Log-linear regression to be evaluated on original scale.
Here, the default predict function needs to be overwritten:

```
kernelshap(fit, X, bg_X, pred_fun = function(m, X) exp(predict(m, X)))
```
- Logistic regression (logit scale): `kernelshap(fit, X, bg_X)`
- Logistic regression (probabilities): `kernelshap(fit, X, bg_X, type = "response")`
- Linear regression with logarithmic response, but evaluated on original scale: Here, the default predict function needs to be overwritten: `kernelshap(fit, X, bg_X, pred_fun = function(m, X) exp(predict(m, X)))`
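
The forwarding of `...` to the prediction function can be illustrated without the package itself. The following is a minimal base R sketch, not the package's implementation: `call_pred()` is a hypothetical stand-in for the way `kernelshap()` is described to call `pred_fun(object, X, ...)`, and the data are simulated.

```r
# Simulated data for a logistic regression (illustrative only)
set.seed(1)
df <- data.frame(x = rnorm(50))
df$y <- rbinom(50, size = 1, prob = stats::plogis(df$x))
fit <- stats::glm(y ~ x, data = df, family = stats::binomial())

# Hypothetical helper (NOT part of kernelshap): forwards extra
# arguments to a prediction function of the form function(object, X, ...)
call_pred <- function(object, X, pred_fun = stats::predict, ...) {
  pred_fun(object, X, ...)
}

X <- df[1:3, "x", drop = FALSE]

logit <- call_pred(fit, X)                    # default: scale of the linear predictor
prob <- call_pred(fit, X, type = "response")  # 'type' reaches stats::predict via ...

# A custom prediction function, as in the log-linear example above
fit_log <- stats::lm(log(Sepal.Length) ~ Sepal.Width, data = iris)
exp_pred <- function(m, X) exp(stats::predict(m, X))
orig_scale <- call_pred(fit_log, iris[1:3, ], pred_fun = exp_pred)
```

Applying `stats::plogis()` to `logit` reproduces `prob`, which is the difference between the first two `kernelshap()` calls listed above.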

## Major improvements

- `kernelshap()` has received a more intuitive interface, see breaking change above.
- The package now supports multidimensional predictions. Hurray!
- Thanks to David Watson, parallel computing is now supported. The user needs to set up the parallel backend before calling `kernelshap()`, i.e., using the "doFuture" package, and then set `parallel = TRUE`. Especially on Windows, sometimes not all global variables or packages are loaded in the parallel instances. These can be specified by `parallel_args`, a list of arguments passed to `foreach()`.
- Thanks to David Watson, parallel computing is now supported. The user needs to set up the parallel backend before calling `kernelshap()`, e.g., using the "doFuture" package, and then set `parallel = TRUE`. Especially on Windows, sometimes not all global variables or packages are loaded in the parallel instances. These can be specified by `parallel_args`, a list of arguments passed to `foreach()`.
- Even without parallel computing, `kernelshap()` has become much faster.
- For $2 \le p \le 5$ features, the algorithm now returns exact Kernel SHAP values with respect to the given background data. (For $p = 1$, exact *Shapley values* are returned.)
- Besides `matrix`, `data.frame`s, and `tibble`s, the package now also accepts `data.table`s (if the prediction function can deal with them).
- Direct handling of "tidymodels" models.

## User visible changes

- Besides `matrix`, `data.frame`s, and `tibble`s, the package now also accepts `data.table`s (if the prediction function can deal with them).
- `kernelshap()` is less picky regarding the output structure of `pred_fun()`.
- `kernelshap()` is less picky about the column structure of the background data `bg_X`. It should simply contain the columns of `X` (but can have more or in different order). The old behaviour was to launch an error if `colnames(X) != colnames(bg_X)`.
- The default `m = "auto"` has been changed from `trunc(20 * sqrt(p))` to `max(trunc(20 * sqrt(p)), 5 * p)`. This will have an effect for cases where the number of features $p > 16$. The change will imply more robust results for large p.
@@ -46,7 +34,7 @@ kernelshap(fit, X, bg_X, pred_fun = function(m, X) exp(predict(m, X)))

## New contributor

- David Watson is now contributor of the package.
- David Watson

# kernelshap 0.1.0

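The NEWS.md entry on the revised default for `m = "auto"` states that the change matters only for $p > 16$ features. A small base R check of that claim, using only the two formulas quoted in the entry (the helper names `m_old` and `m_new` are made up for this sketch):

```r
# Old and new defaults for m = "auto", as quoted in NEWS.md
m_old <- function(p) trunc(20 * sqrt(p))
m_new <- function(p) max(trunc(20 * sqrt(p)), 5 * p)

# Compare both rules for a few feature counts
p <- c(4, 16, 17, 25, 100)
m_table <- data.frame(
  p = p,
  old = vapply(p, m_old, numeric(1)),
  new = vapply(p, m_new, numeric(1))
)
m_table
```

Both rules agree up to $p = 16$, where `20 * sqrt(16)` and `5 * 16` both equal 80. For $p = 25$ the old rule gives 100 and the new one 125.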
12 changes: 6 additions & 6 deletions R/kernelshap.R
@@ -52,10 +52,10 @@
#' on the same scale.
#' @param max_iter If the stopping criterion (see \code{tol}) is not reached after
#' \code{max_iter} iterations, the algorithm stops.
#' @param parallel If \code{TRUE}, use parallel \code{foreach} to loop over rows
#' to be explained. Must register backend beforehand, e.g. via \code{doFuture},
#' @param parallel If \code{TRUE}, use parallel \code{foreach::foreach()} to loop over rows
#' to be explained. Must register backend beforehand, e.g. via "doFuture" package,
#' see Readme for an example. Parallelization automatically disables the progress bar.
#' @param parallel_args A named list of arguments passed to \code{foreach()}, see
#' @param parallel_args A named list of arguments passed to \code{foreach::foreach()}, see
#' \code{?foreach::foreach}. Ideally, this is \code{NULL} (default). Only relevant
#' if \code{parallel = TRUE}. Example on Windows: if \code{object} is a generalized
#' additive model fitted with package "mgcv", then one might need to set
@@ -81,7 +81,7 @@
#' @examples
#' # Linear regression
#' fit <- stats::lm(Sepal.Length ~ ., data = iris)
#' s <- kernelshap(fit, iris[1:2, -1], bg_X = iris[, -1])
#' s <- kernelshap(fit, iris[1:2, -1], bg_X = iris)
#' s
#'
#' # Multivariate model
@@ -106,11 +106,11 @@
#' )
#'
#' # On scale of linear predictor
#' s <- kernelshap(fit, iris[1:2], bg_X = iris[1:2])
#' s <- kernelshap(fit, iris[1:2], bg_X = iris)
#' s
#'
#' # On scale of response (probability)
#' s <- kernelshap(fit, iris[1:2], bg_X = iris[1:2], type = "response")
#' s <- kernelshap(fit, iris[1:2], bg_X = iris, type = "response")
#' s
#'
kernelshap <- function(object, ...){
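The revised examples pass the full `iris` data as `bg_X`, relying on the NEWS.md statement that `bg_X` only needs to contain the columns of `X`, possibly with extra columns or in a different order. One way such an alignment could work is sketched below. `align_bg()` is a hypothetical helper written for this illustration, not a function of the package, and the package may handle this differently.

```r
# Hypothetical helper (NOT part of kernelshap): reduce the background
# data to the columns of X, in the order of X
align_bg <- function(X, bg_X) {
  missing_cols <- setdiff(colnames(X), colnames(bg_X))
  if (length(missing_cols) > 0L) {
    stop("bg_X lacks columns: ", paste(missing_cols, collapse = ", "))
  }
  bg_X[, colnames(X), drop = FALSE]
}

X <- iris[1:2, -1]          # rows to explain, without the response
bg <- align_bg(X, iris)     # the full data contains one extra column

# Column order of bg_X does not matter either
bg_shuffled <- align_bg(X, iris[, rev(colnames(iris))])
```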
8 changes: 7 additions & 1 deletion R/utils.R
@@ -157,6 +157,12 @@ reorganize_list <- function(alist, nms) {

# Checks and reshapes predictions to (n x K) matrix
check_pred <- function(x, n) {
if (!is.vector(x) && !is.matrix(x) && !is.data.frame(x)) {
stop("Predictions must be a vector, matrix, or data.frame")
}
if (is.data.frame(x)) {
x <- as.matrix(x)
}
if (!is.numeric(x)) {
stop("Predictions must be numeric")
}
@@ -166,7 +172,7 @@ check_pred <- function(x, n) {
if (length(x) == n) {
return(matrix(x, nrow = n))
}
stop("Predictions must be a length n vector or a matrix with n rows.")
stop("Predictions must be a length n vector or a matrix/data.frame with n rows.")
}

# Informative warning if background data is small or large
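The effect of the `check_pred()` change can be tried out with a standalone sketch. The diff hides the lines between the numeric check and the vector branch, so the matrix branch below is an assumption and is marked as such. The function is named `check_pred_sketch()` to make clear it is not the package's code.

```r
# Standalone sketch of the revised prediction check
check_pred_sketch <- function(x, n) {
  if (!is.vector(x) && !is.matrix(x) && !is.data.frame(x)) {
    stop("Predictions must be a vector, matrix, or data.frame")
  }
  if (is.data.frame(x)) {
    x <- as.matrix(x)
  }
  if (!is.numeric(x)) {
    stop("Predictions must be numeric")
  }
  # ASSUMPTION: this branch is not visible in the diff
  if (is.matrix(x) && nrow(x) == n) {
    return(x)
  }
  if (length(x) == n) {
    return(matrix(x, nrow = n))
  }
  stop("Predictions must be a length n vector or a matrix/data.frame with n rows.")
}

# data.frame predictions are now converted to a matrix
from_df <- check_pred_sketch(data.frame(a = 1:3, b = 4:6), n = 3)

# A plain vector of length n becomes an (n x 1) matrix
from_vec <- check_pred_sketch(c(0.1, 0.2, 0.3), n = 3)

# Non-numeric predictions are rejected
res <- try(check_pred_sketch(c("a", "b"), n = 2), silent = TRUE)
```

Converting data.frames before the numeric check is what lets prediction functions that return a data.frame or tibble pass through.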