Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feature_importance for multiinput models with data as a list of arrays #142

Open
wants to merge 8 commits into
base: master
Choose a base branch
from

Conversation

jmaspons
Copy link
Contributor

Datasets can be 2d or 3d arrays

@hbaniecki
Copy link
Member

hbaniecki commented Mar 20, 2022

Hi @jmaspons, thanks for this contribution. I will try to review it next week.

  1. Could you provide some code examples (use case) for this functionality? Ideally, it would also be later incorporated into the package tests.
  2. Can you tell me how it relates to Support 3d arrays as input data for feature_importance #141?

@hbaniecki hbaniecki added the feature 💡 New feature or request label Mar 20, 2022
@jmaspons
Copy link
Contributor Author

Hello,

You can find a test script with dummy data at https://gist.github.com/jmaspons/0199ef922571bafe5eaac1a056963a83 (it requires keras, abind, DALEX and data.table packages). The patch implements feature_importance for models with more than one input datasets as 2D and 3D arrays. It can be useful for time series data (3D to a RNN) with some static variables (2D). DALEX::explainer doesn't support this kind of data input, so no changes to feature_importance.explainer

#141 implements feature_importance for a single input model. I should add some changes to that PR for the variable_groups and the autogenerated variables following this patch which I tested much more cases.

The feature_importance.default and feature_importance.multiinput

In order to add tests, do you think it's acceptable to add all the dependencies or skip some by saving some data in the package?

@hbaniecki
Copy link
Member

For starters, we should use underscore notation for function parameters instead of camelCase, e.g. perm_dim.

In order to add tests, do you think it's acceptable to add all the dependencies or skip some by saving some data in the package?

All such dependencies should be added to suggests; we wouldn't want more dependencies in imports. It would be nice to have some tests; they can run on generated data.

jmaspons added a commit to jmaspons/MLTools that referenced this pull request Mar 22, 2022
Static variables can be also categorical
Requires ModelOriented/ingredients#142
jmaspons added a commit to jmaspons/MLTools that referenced this pull request Apr 29, 2022
Implements support for 3D arrays in a list of inputs and ... to predict function
Waiting for ModelOriented/ingredients#142 and ModelOriented/ingredients#143
jmaspons added a commit to jmaspons/MLTools that referenced this pull request Apr 29, 2022
Implements support for 3D arrays in a list of inputs and ... to predict function
Waiting for ModelOriented/ingredients#142 and ModelOriented/ingredients#143
@pbiecek
Copy link
Member

pbiecek commented Jan 13, 2023

let's not have data.table and keras as dependencies

@jmaspons
Copy link
Contributor Author

let's not have data.table and keras as dependencies

I'll find alternatives implementations for the data.table part. For the keras tests, is it ok to make it conditional and add keras in the suggested packages section?

@pbiecek
Copy link
Member

pbiecek commented Mar 15, 2023

we are trying to have DALEX as light as possible and keras is quite heavy package
so maybe a valid solution would be to move this function to DALEXtra?
(it has some heavy dependencies)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
feature 💡 New feature or request
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants