Variable roles in a tidymodels recipe and workflow: are they respected by rSAFE? #10
I believe it is a matter of how DALEX treats the datasets in the explainer. Could you please prepare a reproducible example and share your session info?
I attached a rendered HTML file and the Rmd file with my analysis, with the session info at the bottom. Is it OK to just ignore the variables in the output that did not take part in the modelling, and do the data transformation with the existing variables as they are? My session info:
Thank you. By a reproducible example, I meant a simple toy example that is fast to run. This .Rmd takes a long time to compute, and when I decreased the number of trees in xgboost to speed the script up, I got an error:
Anyway, if you pass the data frame with all columns (…). Variable filtering should perhaps be a feature in a future version of SAFE. For now, I would suggest filtering out variables before feeding data into the explainer.
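A minimal sketch of that workaround, on a small made-up dataset: the time feature is derived and `datetime` and `atemp` are dropped *before* the workflow is fitted, so the model and the explainer only ever see the modelled variables. The `toy_bikes` data, the `hour` feature and the plain `lm` workflow (used instead of xgboost to keep it fast) are hypothetical stand-ins, not the setup from the attached workflow.

```r
library(dplyr)
library(parsnip)
library(workflows)
library(DALEXtra)  # explain_tidymodels()
library(rSAFE)     # safe_extraction()

set.seed(1)
n <- 200

# Hypothetical toy stand-in for the Kaggle bike data
toy_bikes <- tibble(
  datetime = as.POSIXct("2012-01-01", tz = "UTC") + 3600 * seq_len(n),
  temp     = runif(n, 0, 35),
  atemp    = temp + rnorm(n),  # strongly correlated with temp
  count    = rpois(n, 100)
)

# Filter before modelling: derive the time feature outside the recipe,
# then drop datetime and atemp so neither reaches the model or the explainer
toy_model_data <- toy_bikes %>%
  mutate(hour = as.integer(format(datetime, "%H"))) %>%
  select(-datetime, -atemp)

toy_fit <- workflow() %>%
  add_formula(count ~ .) %>%
  add_model(linear_reg() %>% set_engine("lm")) %>%
  fit(data = toy_model_data)

# The explainer now only contains the variables the model actually uses
explainer <- explain_tidymodels(
  toy_fit,
  data = toy_model_data %>% select(-count),
  y = toy_model_data$count,
  verbose = FALSE
)
safe_extractor <- safe_extraction(explainer)
```

The trade-off is that this feature engineering now lives outside the recipe, so any new data has to go through the same `mutate()`/`select()` before prediction.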
Example (I am playing with the bicycle demand data from Kaggle): the recipe will create time features out of the datetime index, and then datetime will not take part in the modelling.
I also removed the "atemp" variable altogether (temp and atemp were strongly correlated), so it does not take part in the modelling either.
Next, I run the explainer:
```r
library(DALEXtra)  # explain_tidymodels()
library(rSAFE)     # safe_extraction()
explainer <- explain_tidymodels(bike_final_fit, data = bike_all %>% select(-count), y = bike_all$count)
safe_extractor <- safe_extraction(explainer)
```
The SAFE extractor seems to ignore the fact that datetime and atemp are not used in the modelling process, and proposes:
How do I tell rSAFE that these two variables (one is the time index, the other is removed in the bake step) do not take part in the model?
I am attaching my quick-and-dirty workflow:
timeseries_modelling_xgboost_short.zip
@agosiewska