Skip to content

Commit

Permalink
Merge pull request #118 from ModelOriented/cran-candidate
Browse files Browse the repository at this point in the history
CRAN release
  • Loading branch information
mayer79 authored Jul 12, 2024
2 parents 01f9e35 + ac27f3e commit 91848c7
Show file tree
Hide file tree
Showing 6 changed files with 54 additions and 49 deletions.
1 change: 1 addition & 0 deletions .Rbuildignore
Original file line number Diff line number Diff line change
Expand Up @@ -10,3 +10,4 @@
^test.R$
^backlog$
^CRAN-SUBMISSION$
^pkgdown$
6 changes: 3 additions & 3 deletions CRAN-SUBMISSION
Original file line number Diff line number Diff line change
@@ -1,3 +1,3 @@
Version: 1.1.2
Date: 2024-02-03 15:31:07 UTC
SHA: 9eebdab2e583bf97d51dff9e7783df2b87751097
Version: 1.2.0
Date: 2024-07-12 12:10:00 UTC
SHA: daf4cee64500abb8d78f92d8b1e8f1e588a59884
4 changes: 2 additions & 2 deletions DESCRIPTION
Original file line number Diff line number Diff line change
Expand Up @@ -30,5 +30,5 @@ Imports:
Suggests:
testthat (>= 3.0.0)
Config/testthat/edition: 3
URL: https://github.com/ModelOriented/hstats, https://mayer79.github.io/hstats
BugReports: https://github.com/ModelOriented/hstats/issues
URL: https://github.com/ModelOriented/hstats/, https://modeloriented.github.io/hstats/
BugReports: https://github.com/ModelOriented/hstats/issues/
26 changes: 11 additions & 15 deletions NEWS.md
Original file line number Diff line number Diff line change
@@ -1,18 +1,14 @@
# hstats 1.2.0

## New home
## My new home

- My brand new home: https://github.com/ModelOriented/hstats

## Major changes
## Other changes

- Factor-valued predictions are no longer possible.
- Consequently, also removed "classification_error" loss.

## Minor changes

- Code simplifications.

# hstats 1.1.2

## ICE plots
Expand All @@ -28,29 +24,29 @@

## Performance improvements

- For pure data.frames (no tibbles, data.tables etc.), most functions are significantly faster ([#110](https://github.com/mayer79/hstats/pull/110)).
- Slight speed-up of permutation importance for non-matrix `X` ([#109](https://github.com/mayer79/hstats/pull/109)).
- For pure data.frames (no tibbles, data.tables etc.), most functions are significantly faster ([#110](https://github.com/ModelOriented/hstats/pull/110)).
- Slight speed-up of permutation importance for non-matrix `X` ([#109](https://github.com/ModelOriented/hstats/pull/109)).

## Other changes

- In multivariate cases, it was possible that normalized H-statistics could equal `0/0 (= NaN)`. Such values are now replaced by 0 ([#107](https://github.com/mayer79/hstats/issues/107)).
- Removed an unnecessary special case when calculating column means ([#106](https://github.com/mayer79/hstats/pull/106)).
- In multivariate cases, it was possible that normalized H-statistics could equal `0/0 (= NaN)`. Such values are now replaced by 0 ([#107](https://github.com/ModelOriented/hstats/issues/107)).
- Removed an unnecessary special case when calculating column means ([#106](https://github.com/ModelOriented/hstats/pull/106)).

# hstats 1.1.0

## Enhancements

- {hstats} now also works for factor predictions. The levels are represented by one-hot-encoded columns ([PR#101](https://github.com/mayer79/hstats/pull/101)).
- The plot method of a two-dimensional PDP has recieved the option `d2_geom = "line"`. Instead of a heatmap of the two features, one of the features is moved to color grouping. Combined with `swap_dim = TRUE`, you can swap the role of the two `v` variables without recalculating anything. The idea was proposed by [Roel Verbelen](https://github.com/RoelVerbelen) in [issue #91](https://github.com/mayer79/hstats/issues/91), see also [issue #94](https://github.com/mayer79/hstats/issues/94).
- {hstats} now also works for factor predictions. The levels are represented by one-hot-encoded columns ([PR#101](https://github.com/ModelOriented/hstats/pull/101)).
- The plot method of a two-dimensional PDP has recieved the option `d2_geom = "line"`. Instead of a heatmap of the two features, one of the features is moved to color grouping. Combined with `swap_dim = TRUE`, you can swap the role of the two `v` variables without recalculating anything. The idea was proposed by [Roel Verbelen](https://github.com/RoelVerbelen) in [issue #91](https://github.com/ModelOriented/hstats/issues/91), see also [issue #94](https://github.com/ModelOriented/hstats/issues/94).

## Bug fixes

- Using `BY` and `w` via column names would fail for tibbles. This problem was described in [#92](https://github.com/mayer79/hstats/issues/92) by [Roel Verbelen](https://github.com/RoelVerbelen). Thx!
- Using `BY` and `w` via column names would fail for tibbles. This problem was described in [#92](https://github.com/ModelOriented/hstats/issues/92) by [Roel Verbelen](https://github.com/RoelVerbelen). Thx!

## Other changes

- Much faster one-hot-encoding, thanks to Mathias Ambühl ([PR#101](https://github.com/mayer79/hstats/pull/101)).
- Most functions are slightly faster ([PR#101](https://github.com/mayer79/hstats/pull/101)).
- Much faster one-hot-encoding, thanks to Mathias Ambühl ([PR#101](https://github.com/ModelOriented/hstats/pull/101)).
- Most functions are slightly faster ([PR#101](https://github.com/ModelOriented/hstats/pull/101)).
- Add unit tests to compare against {iml}.
- Made all examples "tibble" and "data.table" friendly.
- Revised input checks in loss functions (relevant for `perm_importance()` and `average_loss()`).
Expand Down
42 changes: 27 additions & 15 deletions README.md
Original file line number Diff line number Diff line change
Expand Up @@ -152,7 +152,8 @@ Let's study different plots to understand *how* the strong interaction between d
They all reveal a substantial interaction between the two variables in the sense that the age effect gets weaker the closer to the ocean. Note that numeric `BY` features are automatically binned into quartile groups.

```r
plot(partial_dep(fit, v = "age", X = X_train, BY = "log_ocean"), show_points = FALSE)
partial_dep(fit, v = "age", X = X_train, BY = "log_ocean") |>
plot(show_points = FALSE)
```

![](man/figures/pdp_ocean_age.svg)
Expand All @@ -167,8 +168,8 @@ plot(pd, d2_geom = "line", show_points = FALSE)
![](man/figures/pdp_2d_line.svg)

```r
ic <- ice(fit, v = "age", X = X_train, BY = "log_ocean")
plot(ic, center = TRUE)
ice(fit, v = "age", X = X_train, BY = "log_ocean") |>
plot(center = TRUE)
```

![](man/figures/ice.svg)
Expand All @@ -178,11 +179,14 @@ plot(ic, center = TRUE)
In the spirit of [1], and related to [4], we can extract from the "hstats" objects a partial dependence based variable importance measure. It measures not only the main effect strength (see [4]), but also all its interaction effects. It is rather experimental, so use it with care (details in the section "Background"):

```r
plot(pd_importance(s))
pd_importance(s) |>
plot()

# Compared with four times repeated permutation importance regarding MSE
set.seed(10)
plot(perm_importance(fit, X = X_valid, y = y_valid))

perm_importance(fit, X = X_valid, y = y_valid) |>
plot()
```

![](man/figures/importance.svg)
Expand All @@ -209,14 +213,17 @@ s <- hstats(ex)
s # 0.054
plot(s)

# Strongest relative interaction
plot(ice(ex, v = "Sepal.Width", BY = "Petal.Width"), center = TRUE)
plot(partial_dep(ex, v = "Sepal.Width", BY = "Petal.Width"), show_points = FALSE)
plot(partial_dep(ex, v = c("Sepal.Width", "Petal.Width"), grid_size = 200))
# Strongest relative interaction (different visualizations)
ice(ex, v = "Sepal.Width", BY = "Petal.Width") |>
plot(center = TRUE)

partial_dep(ex, v = "Sepal.Width", BY = "Petal.Width") |>
plot(show_points = FALSE)

perm_importance(ex)
partial_dep(ex, v = c("Sepal.Width", "Petal.Width"), grid_size = 200) |>
plot()

# Permutation importance
perm_importance(ex)
# Petal.Length Petal.Width Sepal.Width Species
# 0.59836442 0.11625137 0.07966910 0.03982554
```
Expand Down Expand Up @@ -298,7 +305,7 @@ fit <- lgb.train(
average_loss(fit, X = X_valid, y = y_valid, loss = "mlogloss")

perm_importance(fit, X = X_valid, y = y_valid, loss = "mlogloss", m_rep = 100)
# Permutation importance regarding mlogloss

# Petal.Length Petal.Width Sepal.Width Sepal.Length
# 2.624241332 1.011168660 0.082477177 0.009757393

Expand Down Expand Up @@ -396,7 +403,9 @@ fit <- iris_wf |>

s <- hstats(fit, X = iris[, -1])
s # 0 -> no interactions
plot(partial_dep(fit, v = "Petal.Width", X = iris))

partial_dep(fit, v = "Petal.Width", X = iris) |>
plot()

imp <- perm_importance(fit, X = iris, y = "Sepal.Length")
imp
Expand Down Expand Up @@ -425,8 +434,11 @@ fit <- train(

h2(hstats(fit, X = iris[, -1])) # 0

plot(ice(fit, v = "Petal.Width", X = iris), center = TRUE)
plot(perm_importance(fit, X = iris, y = "Sepal.Length"))
ice(fit, v = "Petal.Width", X = iris) |>
plot(center = TRUE)

perm_importance(fit, X = iris, y = "Sepal.Length") |>
plot()
```

### mlr3
Expand Down
24 changes: 10 additions & 14 deletions cran-comments.md
Original file line number Diff line number Diff line change
@@ -1,23 +1,19 @@
# hstats 1.1.2
# Re-submission: hstats 1.2.0

Hello CRAN team
Moving the github repo has left some old links in the NEWS file. This is fixed here.

This is a small release with two convenient API improvements.
# Original message

## Local checks: 0 errors, 0 warnings, 0 notes
Hello CRAN

## Rhub: 3 NOTES (sounding harmless)
This release mainly updates the new repository ("ModelOriented" of TU Warcaw instead of my personal one), and adds Prof Biecek as co-author.

- checking HTML version of manual ... NOTE
Skipping checking math rendering: package 'V8' unavailable
- checking for non-standard things in the check directory ... NOTE
Found the following files/directories:
''NULL''
- checking for detritus in the temp directory ... NOTE
Found the following files/directories:
'lastMiKTeXException'
## Local checks

0 errors, 0 warnings, 0 notes

## Winbuilder

Status: OK
Status: 1 NOTE
R Under development (unstable) (2024-07-11 r86890 ucrt)

0 comments on commit 91848c7

Please sign in to comment.