
missing decision types #28

Open
pecto2020 opened this issue Aug 24, 2023 · 3 comments

Comments

pecto2020 commented Aug 24, 2023

I was trying to create a unified LightGBM model. I fit the model using the tidymodels framework.
Unfortunately, I got this error: `Error in ifelse(decision_type %in% c(">=", ">"), ret.second(split_index), : Unknown decision_type`. My understanding is that the problem lies in `decision_type`. Checking the model, I noticed that there are thousands of missing values in the decision type column... Any idea why the decision types are missing and how to solve the issue?

krzyzinskim (Collaborator)

Missing values are expected in this column as they occur for every leaf node, so it is unlikely that this is the cause.

However, I wasn't able to reproduce this error using the tidymodels framework. Please note that an object of class `lgb.Booster` must be provided to the `lightgbm.unify()` function (it can be extracted with the `extract_fit_engine()` function). If this is not the solution, please provide a reproducible example of the error.

cgoo4 commented Oct 1, 2024

I get this error too and have been able to reproduce it with a toy example.

If the step_dummy() line is uncommented, then it works.

LightGBM does, however, support categorical data natively, without the need to create dummy variables. This introduces the decision type `==`, where a split tests whether a categorical variable equals a specific value. This can be seen in the object `lgb_trees`, whose `decision_type` column shows the decision type used at each split after fitting the model, e.g. for the variable `Neighborhood`.

```r
library(bonsai)
library(treeshap)
library(tidymodels)
library(shapviz)
library(jsonlite)

set.seed(123)
split <- initial_split(ames, prop = 0.8)
train <- training(split)
test <- testing(split)

recipe <- recipe(train) |> 
  update_role(Sale_Price, new_role = "outcome") |> 
  update_role(-has_role("outcome"), new_role = "predictor") |> 
  # step_dummy(all_nominal_predictors()) |> 
  step_zv(all_predictors()) 

spec <- 
  boost_tree(trees = 100, tree_depth = 6) |> 
  set_engine("lightgbm") |> 
  set_mode("regression")

fit <- workflow() |> 
  add_recipe(recipe) |> 
  add_model(spec) |> 
  fit(data = train)

lgb_trees <- lightgbm::lgb.model.dt.tree(extract_fit_engine(fit))

data <- recipe |>
  prep() |> 
  bake(train |> slice_sample(n = 100), has_role("predictor"))

x <- recipe |>
  prep() |>
  bake(test, has_role("predictor"))

shap <- extract_fit_engine(fit) |> 
  unify(data, type = "numeric") 
#> Error in ifelse(decision_type %in% c(">=", ">"), ret.second(split_index), : Unknown decision_type
```

Created on 2024-10-01 with reprex v2.1.1
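The diagnosis above can be checked directly on a tree table such as `lgb_trees`. This is a minimal sketch, assuming (based only on the error message) that `unify()` handles the comparison operators `<=`, `<`, `>=` and `>`. `find_unsupported_splits()` is a hypothetical helper, not part of treeshap or lightgbm, and the toy table stands in for real `lgb.model.dt.tree()` output. Leaf rows, whose `decision_type` is `NA` as explained earlier in the thread, are skipped.

```r
# Hypothetical helper (not part of treeshap or lightgbm): list the split
# features whose decision_type is outside the set of operators that unify()
# is ASSUMED to handle. The `supported` default is inferred from the
# "Unknown decision_type" error message, not from the treeshap source.
find_unsupported_splits <- function(trees, supported = c("<=", "<", ">=", ">")) {
  # lgb.model.dt.tree() returns a data.table; convert so that plain
  # data.frame indexing rules apply below.
  trees <- as.data.frame(trees)
  # Leaf rows have decision_type = NA, which is expected, so drop them.
  split_rows <- trees[!is.na(trees$decision_type), , drop = FALSE]
  # Keep only the splits whose operator is not in the supported set.
  bad <- split_rows[!(split_rows$decision_type %in% supported), , drop = FALSE]
  # Report each (feature, operator) pair once.
  unique(bad[, c("split_feature", "decision_type"), drop = FALSE])
}

# Toy tree table (hypothetical values): one numeric split, one leaf row and
# two categorical "==" splits on the same feature.
toy_trees <- data.frame(
  split_feature = c("Gr_Liv_Area", "Neighborhood", NA, "Neighborhood"),
  decision_type = c("<=", "==", NA, "=="),
  stringsAsFactors = FALSE
)
find_unsupported_splits(toy_trees)
```

Applied to `lgb_trees` from the reprex, the result would be expected to list the categorical predictors that received `==` splits; an empty result would suggest the error has a different cause.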

ck37 commented Nov 21, 2024

I also ran into this. Converting the factors to indicator variables resolved it.
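For data prepared outside a recipe, the same workaround can be sketched in base R. This assumes the predictors are in a plain data frame, the categorical columns contain no missing values, and every factor has at least two levels. `factors_to_indicators()` is a hypothetical helper, and `model.matrix()` treatment coding is used as a stand-in for the default encoding of `step_dummy()`.

```r
# Hypothetical helper: replace factor/character columns with 0/1 indicator
# columns via base R's model.matrix(). Treatment coding is used, so each
# factor's first level is the reference and gets no column. Numeric columns
# are kept unchanged. ASSUMES every factor has at least two levels
# (model.matrix() errors otherwise).
factors_to_indicators <- function(df) {
  is_cat <- vapply(df, function(col) is.factor(col) || is.character(col), logical(1))
  if (!any(is_cat)) return(df)
  # model.matrix() silently drops rows containing NA, which would misalign
  # the indicators with the numeric columns, so refuse missing values.
  if (anyNA(df[is_cat])) stop("categorical columns must not contain NA")
  # Design matrix for the categorical columns only, intercept column removed.
  mm <- model.matrix(~ ., data = df[is_cat])[, -1, drop = FALSE]
  cbind(df[!is_cat], as.data.frame(mm))
}

# Toy predictors (hypothetical values).
toy <- data.frame(area = c(1200, 1500, 900, 1100),
                  Neighborhood = factor(c("a", "b", "c", "a")))
factors_to_indicators(toy)
```

In the tidymodels reprex, uncommenting `step_dummy(all_nominal_predictors())` achieves the same thing, with the advantage that the prepped recipe reapplies the encoding learned on the training data to new data, which this sketch does not do.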

4 participants