
Implementing an Ordinal Loss Function for R #10888

Open
xFoxFelix opened this issue Oct 13, 2024 · 0 comments

I am trying to model an ordinal variable with an XGBoost model in R. The target has 30 classes with a natural order, but that order should not be treated as a numeric scale. The problem can be handled as plain multi-class classification, but that does not capture the ordinal nature of the target.

I found some issues and solutions for implementing an ordinal objective function in Python, but I couldn't find anything similar for R. For example, I found the "OrdinalGPT" project for LightGBM in Python. I attempted to implement an ordinal objective function based on the ideas from the paper "A simple log-based loss function for ordinal text classification", which uses a cross-entropy loss and incorporates the order of the classes through a distance matrix.
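
To make the weighting idea concrete, here is a small illustration (my own sketch, not code from the paper) of the inverse-distance weight matrix for four ordered classes; the objective below builds the same matrix for all 30 classes:

# Illustration only: inverse ordinal-distance weights for 4 ordered classes.
# Row i holds the weights applied when the true class is i: the correct class
# gets weight 1 and the weight decays with ordinal distance.
n_ord <- 4
distMat <- as.matrix(dist(seq(1, n_ord), diag = TRUE, upper = TRUE)) + 1
round(1 / distMat, 2)
#      1    2    3    4
# 1 1.00 0.50 0.33 0.25
# 2 0.50 1.00 0.50 0.33
# 3 0.33 0.50 1.00 0.50
# 4 0.25 0.33 0.50 1.00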

Here is the R code I tried for the custom objective function:

library(stats)    # dist()
library(xgboost)  # getinfo(), xgb.train()

softmax <- function(x) {
  exp_x <- exp(x - max(x))
  exp_x / sum(exp_x)
}

ordinal_loss <- function(preds, dtrain) {
  labels <- as.integer(getinfo(dtrain, "label"))
  n_obs <- length(labels)
  n_ord <- max(labels) + 1  # number of ordinal categories (labels are 0-based)

  # Reshape preds into an n_obs x n_ord matrix (each observation's class scores are contiguous)
  preds_matrix <- matrix(preds, nrow = n_obs, ncol = n_ord, byrow = TRUE)

  distMat <- as.matrix(dist(seq(1, n_ord), diag = TRUE, upper = TRUE)) + 1  # +1 so the correct class has distance 1 (weight 1 after inversion)
  distMat <- 1 / distMat
  
  distances <- matrix(ncol = ncol(preds_matrix), nrow = nrow(preds_matrix))
  
  for (i in seq_along(labels)) {
    distances[i, ] <- as.numeric(distMat[labels[i] + 1, ])   
  }
  
  # probs for every cat: prediction matrix * distances
  probs <- preds_matrix * distances
  # softmax
  probs <- t(apply(probs, 1, softmax))   
  probs <- pmax(probs, 1e-8)  # no log(0)
  
  # neg log likelihood
  nll <- -log(probs[cbind(seq_along(labels), labels + 1)])
  
  # gradient
  grad <- matrix(0, nrow = n_obs, ncol = n_ord)
  for (i in 1:n_obs) {
    grad[i, labels[i] + 1] <- -1 / probs[i, labels[i] + 1]
    grad[i, -c(labels[i] + 1)] <- 1 / (1 - probs[i, -c(labels[i] + 1)])
  }
  
  # hessian
  hess <- matrix(0, nrow = n_obs, ncol = n_ord)
  for (i in 1:n_obs) {
    hess[i, labels[i] + 1] <- 1 / (probs[i, labels[i] + 1]^2)
    hess[i, -c(labels[i] + 1)] <- 1 / ((1 - probs[i, -c(labels[i] + 1)])^2)
  }
  
  return(list(grad = as.vector(t(grad)), hess = as.vector(t(hess))))
}

params <- list(
  objective = ordinal_loss,
  num_class = num_classes
)

model <- xgboost::xgb.train(
  data = data_train, 
  params = params, 
  max.depth = 8, 
  nthread = 2, 
  nrounds = 10, 
  eta = 0.1,
  verbose = 1, 
  early_stopping_rounds = 5, 
  watchlist = list(train = data_train),
  maximize = FALSE
)

The function runs without errors, but at prediction time I get values smaller than zero (so they cannot be probabilities), and the model's accuracy is poor.
Do you have any idea why this objective function is not working as expected, or can you suggest a better way to implement an ordinal loss function in R?
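
For comparison, my understanding (which may be wrong) is that XGBoost expects the gradient and hessian with respect to the raw scores rather than the probabilities. For plain softmax cross-entropy without any ordinal weighting that would look roughly like the sketch below (the function name is just mine, and it reuses the softmax helper from above); I am not sure how to combine this correctly with the distance weighting:

# Minimal sketch, not tested on my data: second-order terms of plain softmax
# cross-entropy taken with respect to the raw scores (no ordinal weighting).
softprob_grad_hess <- function(preds, dtrain) {
  labels <- as.integer(getinfo(dtrain, "label"))
  n_obs <- length(labels)
  n_ord <- max(labels) + 1
  preds_matrix <- matrix(preds, nrow = n_obs, ncol = n_ord, byrow = TRUE)
  probs <- t(apply(preds_matrix, 1, softmax))
  onehot <- matrix(0, nrow = n_obs, ncol = n_ord)
  onehot[cbind(seq_len(n_obs), labels + 1)] <- 1
  grad <- probs - onehot       # dL/dscore = p - y
  hess <- probs * (1 - probs)  # diagonal approximation of the Hessian
  list(grad = as.vector(t(grad)), hess = as.vector(t(hess)))
}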
