
Implementing an Ordinal Loss Function for R #10888

Open
xFoxFelix opened this issue Oct 13, 2024 · 0 comments

I am trying to model an ordinal variable with an XGBoost model in R. The target has 30 classes with a natural order, but that order should not be treated as a numeric scale. The problem can be handled as plain multi-class classification, but that does not capture the ordinal nature of the target.

I found some issues and solutions for implementing an ordinal objective function in Python, but I couldn't find anything similar for R. For example, I found the "OrdinalGPT" project for LightGBM in Python. I attempted to implement an ordinal objective function based on the ideas from the paper "A simple log-based loss function for ordinal text classification", which uses a cross-entropy loss and incorporates the order of the classes through a distance matrix.
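
To make the weighting idea concrete, here is a small illustration (my own sketch, not code from the paper) of the inverse-distance weight matrix for four ordered classes; the objective below builds the same matrix for all 30 classes:

# Illustration only: inverse ordinal-distance weights for 4 ordered classes.
# Row i holds the weights applied when the true class is i: the correct class
# gets weight 1 and the weight decays with ordinal distance.
n_ord <- 4
distMat <- as.matrix(dist(seq(1, n_ord), diag = TRUE, upper = TRUE)) + 1
round(1 / distMat, 2)
#      1    2    3    4
# 1 1.00 0.50 0.33 0.25
# 2 0.50 1.00 0.50 0.33
# 3 0.33 0.50 1.00 0.50
# 4 0.25 0.33 0.50 1.00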

Here is the R code I tried for the custom objective function:

library(stats)    # dist()
library(xgboost)  # getinfo(), xgb.train()

softmax <- function(x) {
  exp_x <- exp(x - max(x))
  exp_x / sum(exp_x)
}

ordinal_loss <- function(preds, dtrain) {
  labels <- as.integer(getinfo(dtrain, "label"))
  n_obs <- length(labels)
  n_ord <- max(labels) + 1  # number of ordinal categories (labels are 0-based)

  # Reshape preds into an n_obs x n_ord matrix (each observation's class scores are contiguous)
  preds_matrix <- matrix(preds, nrow = n_obs, ncol = n_ord, byrow = TRUE)

  distMat <- as.matrix(dist(seq(1, n_ord), diag = TRUE, upper = TRUE)) + 1  # +1 so the correct class has distance 1 (weight 1 after inversion)
  distMat <- 1 / distMat
  
  distances <- matrix(ncol = ncol(preds_matrix), nrow = nrow(preds_matrix))
  
  for (i in seq_along(labels)) {
    distances[i, ] <- as.numeric(distMat[labels[i] + 1, ])   
  }
  
  # probs for every cat: prediction matrix * distances
  probs <- preds_matrix * distances
  # softmax
  probs <- t(apply(probs, 1, softmax))   
  probs <- pmax(probs, 1e-8)  # no log(0)
  
  # neg log likelihood
  nll <- -log(probs[cbind(seq_along(labels), labels + 1)])
  
  # gradient
  grad <- matrix(0, nrow = n_obs, ncol = n_ord)
  for (i in 1:n_obs) {
    grad[i, labels[i] + 1] <- -1 / probs[i, labels[i] + 1]
    grad[i, -c(labels[i] + 1)] <- 1 / (1 - probs[i, -c(labels[i] + 1)])
  }
  
  # hessian
  hess <- matrix(0, nrow = n_obs, ncol = n_ord)
  for (i in 1:n_obs) {
    hess[i, labels[i] + 1] <- 1 / (probs[i, labels[i] + 1]^2)
    hess[i, -c(labels[i] + 1)] <- 1 / ((1 - probs[i, -c(labels[i] + 1)])^2)
  }
  
  return(list(grad = as.vector(t(grad)), hess = as.vector(t(hess))))
}

params <- list(
  objective = ordinal_loss,
  num_class = num_classes
)

model <- xgboost::xgb.train(
  data = data_train, 
  params = params, 
  max.depth = 8, 
  nthread = 2, 
  nrounds = 10, 
  eta = 0.1,
  verbose = 1, 
  early_stopping_rounds = 5, 
  watchlist = list(train = data_train),
  maximize = FALSE
)

The function runs without errors, but at prediction time I get values smaller than zero (so they cannot be probabilities), and the model's accuracy is poor.
Do you have any idea why this objective function is not working as expected, or can you suggest a better way to implement an ordinal loss function in R?
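
For comparison, my understanding (which may be wrong) is that XGBoost expects the gradient and hessian with respect to the raw scores rather than the probabilities. For plain softmax cross-entropy without any ordinal weighting that would look roughly like the sketch below (the function name is just mine, and it reuses the softmax helper from above); I am not sure how to combine this correctly with the distance weighting:

# Minimal sketch, not tested on my data: second-order terms of plain softmax
# cross-entropy taken with respect to the raw scores (no ordinal weighting).
softprob_grad_hess <- function(preds, dtrain) {
  labels <- as.integer(getinfo(dtrain, "label"))
  n_obs <- length(labels)
  n_ord <- max(labels) + 1
  preds_matrix <- matrix(preds, nrow = n_obs, ncol = n_ord, byrow = TRUE)
  probs <- t(apply(preds_matrix, 1, softmax))
  onehot <- matrix(0, nrow = n_obs, ncol = n_ord)
  onehot[cbind(seq_len(n_obs), labels + 1)] <- 1
  grad <- probs - onehot       # dL/dscore = p - y
  hess <- probs * (1 - probs)  # diagonal approximation of the Hessian
  list(grad = as.vector(t(grad)), hess = as.vector(t(hess)))
}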
