Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix support for nnet:multinom and mclogit::mblogit multinomial models #469

Closed
tomwenseleers opened this issue Aug 25, 2022 · 14 comments
Closed

Comments

@tomwenseleers
Copy link

tomwenseleers commented Aug 25, 2022

So as you will recall, #404, there was still a problem with the nnet::multinom predict method not supporting type="link", and I think there is also a problem with the nnet::mblogit predict method with type="link" as that one now drops the first level of the response, whereas for marginaleffects we would like all levels to come out.

A solution to solve the nnet:multinom predict method lacking a type="link" predict method could be to create a wrapper method that calls the original method. For this we would also need to create a wrapper multinom function in the marginaleffects package, so that the correct package namespace is found. A sketch implementation I think would be (I also give the correct link & inverse link function for multinomial below, assuming eta and mu are predictions on the link and response scale, respectively, and have the different response levels in different columns - these were currently incorrect for both multinom & mblogit) :

multinom <- function(...)
{
    nnet::multinom(...)
}

# this is the link function for multinomial
# = generalized logit
inverse_softMax <- function(mu) {
  log_mu <- log(mu)
  return(sweep(log_mu, 1, STATS=rowMeans(log_mu), FUN="-")) # we let the log(odds) sum to zero - these predictions are referred to as type="latent" in the emmeans package
}

# this is the inverse link function for multinomial (here assuming eta is a matrix with different outcome levels as columns)
# = inverse generalized logit
softMax <- function(eta){
  exp_eta <- exp(eta)
  return(sweep(exp_eta, 1, STATS=rowSums(exp_eta), FUN="/"))
}

predict.multinom <- function(object, newdata, type=c("probs", "response", "latent", "link")) # probs=response, latent=link (latent is how it is referred to in the emmeans package)
{
    type <- match.arg(type)
    if (type == "probs"|type == "response")
        return(nnet:::predict.multinom(object, newdata, type="probs"))

    mu <- nnet:::predict.multinom(object, newdata, type="probs")
    return(inverse_softMax(mu))

}
get_predict.multinom <- function(model,
                                 newdata = insight::get_data(model),
                                 vcov = FALSE,
                                 conf_level = 0.95,
                                 type = "probs",
                                 ...) {
  
  type <- sanitize_type(model, type) # type_dictionary still needs to be updated as well
  
  pred <- predict(model,
                         newdata = newdata,
                         type = type,
                         ...)
  
  # atomic vector means there is only one row in `newdata`
  if (isTRUE(checkmate::check_atomic_vector(pred))) {
    pred <- matrix(pred, nrow = 1, dimnames = list(NULL, names(pred)))
  }
  
  # matrix with outcome levels as columns
  out <- data.frame(
    group = rep(colnames(pred), each = nrow(pred)),
    predicted = c(pred))
  
  # usually when `newdata` is supplied by `comparisons`
  if ("rowid" %in% colnames(newdata)) {
    out$rowid <- rep(newdata$rowid, times = ncol(pred))
  } else {
    out$rowid <- rep(seq_len(nrow(pred)), times = ncol(pred))
  }
  
  return(out)
}

type_dictionary referred to in sanitize_type will still have to be modifies though to reflect that type can be "response" (or equivalently "probs") or "link" (or equivalently "latent").

For predict.mblogit we also have the problem that predict(fit_mblogit, type="link") drops the first category (it normalizes the logits so that that level is zero, but doesn't explicitly return it). In the context of marginaleffects we would like all response levels to come out, and have the logits normalized so that they sum to one (as with type="latent" in the emmeans package). To change this behaviour we would also need to change the predict behaviour of that function, perhaps using something like

mblogit <- function(...)
{
    mclogit::mblogit(...)
}

predict.mblogit <- function(object, newdata, type=c("probs", "response", "latent", "link")) # probs=response, latent=link (latent is how it is referred to in the emmeans package)
{
    type <- match.arg(type)
    if (type == "probs"|type == "response")
        return(mclogit:::predict.mblogit(object, newdata, type="response"))

    mu <- mclogit:::predict.mblogit(object, newdata, type="probs")
    return(inverse_softMax(mu))

}

The variance-covariance function returned by mblogit might also need checking - Russ Lenth alerted me that that one comes out in a non-standard order (in column-major order as opposed to row-major order for the variance-covariance matrix of nnet::multinom fits) & in emmeans the vcov matrix of mblogit objects is returned in a permuted form, see https://github.com/rvlenth/emmeans/blob/master/R/multinom-support.R:

    # we have to arrange the vcov elements in row-major order
    if(missing(vcov.))
        vcov. = vcov(object)
    perm = matrix(seq_along(as.numeric(object$coefmat)), 
                  ncol = ncol(object$coefmat))
    perm = as.numeric(t(perm))
    vcov. = vcov.[perm, perm]

Would it be possible to reopen this issue so that these things could be resolved, and multinomial support in marginaleffects could be made fully functional?

Originally posted by @tomwenseleers in #404 (comment)

@tomwenseleers tomwenseleers changed the title Fix support for nnet:multinom and 'mclogit::mblogit' multinomial models Fix support for nnet:multinom and mclogit::mblogit multinomial models Aug 25, 2022
@vincentarelbundock
Copy link
Owner

vincentarelbundock commented Aug 25, 2022

Thanks a lot for the investigation, I really appreciate it!

type_dictionary

Easy. Just add a new line to the R/type_dictionary.R file.

Wrapper functions

In my view, exporting functions that overwrite those of other packages is a bad idea, because R does not have a good way of handling precedence. In practice, namespace conflicts are resolved based on the order of package loads, and this can be a nightmare for users. I will want to avoid overriding multinom and predict.multinom at almost any cost.

In any case, I'm not sure we need to do that at all. The whole point of the package design is that we already have a get_predict.multinom() method that acts as wrapper around predict.multinom(). Internally, we never call predict(), and we always call get_predict(), so it doesn't even matter what the underlying predict() does; all that matters is what our own wrapper returns. So you can probably insert all your predict() code inside get_predict() and things should work.

latent

I should invest more time to really understand this, but just to clarify, you are proposing adding a new possible value to the type argument, which would return a different and useful quantity of interest with all the levels. If so, that seems fine by me, as long as we can validate the results against some external software.

@tomwenseleers
Copy link
Author

tomwenseleers commented Aug 25, 2022

Hi Vincent,
Below I provide some reproducible tests of nnet::multinom and mclogit::mblogit together with emmeans and emtrends output (which more or less matches between both fits, save for some small rounding differences) and some preliminary steps to modify the get_predict.multinom() function. The get_predict.multinom() function is working as I intend (with predictions on link or "latent" scale as it is called in emmeans now giving predictions for all outcome levels, like in emmeans, with results that match those of emmeans), just the predictions() function is still not working 100% - maybe you can spot where it's going wrong there... But you're probably best placed to make the required modifications anyway, as I don't feel experienced enough in these sort of things to do push requests myself... Below I also illustrate that the vcov() matrix comes out in a different order (row major order in nnet::multinom, which is standard, but column major order in nnet::mblogit). Anyway, I think we should be 90% there now with this...

library(MASS)
library(marginaleffects)
library(nnet)
library(mclogit)
library(memisc)
library(mclogit)
library(insight)
library(dplyr)
library(tidyr)
library(emmeans)
library(splines)

# 1. Dataset: covid variant frequencies ("variant") through time ("collection_date_num") ####
# "count" = actual count per week
dat = read.csv("https://www.dropbox.com/s/u27cn44p5srievq/dat.csv?dl=1")
# simplify example to just 3 categories:
dat = dat[dat$variant %in% c("Alpha", "Beta", "Other"),] 
dat$variant = factor(dat$variant, levels=c("Other", "Beta", "Alpha"))

# 2. fixed get_predict methods for nnet::multinom and mclogit::mblogit ####

# this is the link function for multinomial
# = generalized logit = inverse softmax (you could also call it glogit)
# here assuming mu is a matrix with predictions on the response = probability scale
# for ALL outcome levels as columns (including for the reference level)
inverse_softMax <- function(mu) {
  log_mu <- log(mu)
  # we let the log(odds) sum to zero
  # these predictions are referred to as type="latent" in the emmeans package
  return(sweep(log_mu, 1, STATS=rowMeans(log_mu), FUN="-")) 
}

# this is the inverse link function for multinomial 
# = inverse generalized logit = softmax (you could also call it inv_glogit)
# here assuming eta is a matrix with predictions on the latent scale
# for ALL outcome levels as columns (including for the reference level)
softMax <- function(eta){
  exp_eta <- exp(eta)
  return(sweep(exp_eta, 1, STATS=rowSums(exp_eta), FUN="/"))
}

# these allow prediction with either type="response" or type="latent"
# and for both prediction types return predictions for all
# outcome levels, including the reference level
# (as is done in the emmeans package) - I don't think anything else
# makes sense
# default I think should be type="latent"

pr_multinom <- function(model, newdata, type=c("latent", "link", "probs", "response")) # probs=response, latent=link (latent is how it is referred to in the emmeans package)
{
  type <- match.arg(type)
  if ("mblogit" %in% class(model)) { resptype="response" } else { resptype="probs" }
  if (type == "probs"|type == "response") return(predict(model, newdata, type=resptype))
  
  mu <- predict(model, newdata, type=resptype)
  return(inverse_softMax(mu)) # when type=="latent" | type=="link"
  
}

get_predict.multinom <- function(model,
                                 newdata = insight::get_data(model),
                                 vcov = FALSE,
                                 conf_level = 0.95,
                                 type = "latent",
                                 ...) {
  
  # type <- sanitize_type(model, type) # type_dictionary still needs to be updated as well, just unmarked this to test
  
  pred <- pr_multinom(model,
                  newdata = newdata,
                  type = type,
                  ...)
  
  # atomic vector means there is only one row in `newdata`
  if (isTRUE(checkmate::check_atomic_vector(pred))) {
    pred <- matrix(pred, nrow = 1, dimnames = list(NULL, names(pred)))
  }
  
  # matrix with outcome levels as columns
  out <- data.frame(
    group = rep(colnames(pred), each = nrow(pred)),
    predicted = c(pred))
  
  # usually when `newdata` is supplied by `comparisons`
  if ("rowid" %in% colnames(newdata)) {
    out$rowid <- rep(newdata$rowid, times = ncol(pred))
  } else {
    out$rowid <- rep(seq_len(nrow(pred)), times = ncol(pred))
  }
  
  return(out)
}

get_predict.mblogit <- function(model,
                                newdata = insight::get_data(model),
                                vcov = FALSE,
                                conf_level = 0.95,
                                type = "latent",
                                ...) {
  
  if (!isTRUE(checkmate::check_flag(vcov, null.ok = TRUE)) &&
      isTRUE(list(...)$calling_function == "predictions")) {
    stop("The `vcov` argument is not supported for this model class.", call. = FALSE)
  }
  
  out <- suppressMessages(
    get_predict.multinom(model = model,
                         newdata = newdata,
                         type = type,
                         ...))
  return(out)
}


# 2. mclogit::mblogit multinomial fit ####
fit_mblogit = mblogit(formula=variant ~ ns(collection_date_num, df=2),
                      weights=count,
                      data=dat,
                      from.table=FALSE, dispersion=FALSE,
                      control=mclogit.control(maxit=100))

# test of mclogit::mblogit model prediction for row 160 in dataset
# prediction on response scale
mclogit:::predict.mblogit(fit_mblogit, type="response")[160,]
#      Other       Beta      Alpha 
# 0.86616413 0.01788666 0.11594921 
 
# prediction on link scale (implicitly here reference level Other=0, but dropped, better maybe to add explicit column Other=0 here?)
mclogit:::predict.mblogit(fit_mblogit, type="link")[160,]
#      Beta     Alpha 
# -3.880019 -2.010922 

# standard errors on link scale
mclogit:::predict.mblogit(fit_mblogit, type="link", se.fit=TRUE)$se.fit[160,]
#       Beta      Alpha 
# 0.15458046 0.06828786

# emmeans result (response/prob scale, with SEs and 95% CLs on response scale)
df1=as.data.frame(emmeans(fit_mblogit, ~ collection_date_num, by="variant", 
        at=list(collection_date_num=dat[160,'collection_date_num']),
        mode="prob", level=0.95))
df1
#   collection_date_num variant       prob          SE  df  asymp.LCL  asymp.UCL
# 1               18764   Other 0.86616413 0.007357713 Inf 0.85174327 0.88058498
# 2               18764    Beta 0.01788666 0.002713279 Inf 0.01256873 0.02320459
# 3               18764   Alpha 0.11594921 0.006993861 Inf 0.10224150 0.12965693

# emmeans result (latent scale, with SEs and 95% CLs on latent scale)
df2=as.data.frame(emmeans(fit_mblogit, ~ collection_date_num, by="variant", 
                      at=list(collection_date_num=dat[160,'collection_date_num']),
                      mode="latent", level=0.95))
df2
#   collection_date_num variant      emmean         SE  df  asymp.LCL   asymp.UCL
# 1               18764   Other  1.96364709 0.05718955 Inf  1.8515576  2.07573655
# 2               18764    Beta -1.91637203 0.10460975 Inf -2.1214034 -1.71134070
# 3               18764   Alpha -0.04727506 0.06732437 Inf -0.1792284  0.08467828

# predictions modified get_predict function for row 160: correct
# NOTE: having these predictions come out in long format as opposed to
# with different outcome levels in different columns is nonstandard though
# (nice for user plotting, but maybe not what you would like to happen internally -
# this could be the reason that transform_post = function (eta) softMax(eta) below
# in predictions() now doesn't work, because the eta that is passed is in long rather
# than in the expected wide format I believe)
# on response scale
preds_mblogit_response = get_predict.multinom(fit_mblogit, type="response")
preds_mblogit_response[preds_mblogit_response$rowid==160,]
#     group  predicted rowid
# 160 Other 0.86616413   160
# 487  Beta 0.01788666   160
# 814 Alpha 0.11594921   160

# on latent scale
preds_mblogit_latent = get_predict.multinom(fit_mblogit, type="latent")
preds_mblogit_latent[preds_mblogit_latent$rowid==160,]
#     group   predicted rowid
# 160 Other  1.96364709   160
# 487  Beta -1.91637203   160
# 814 Alpha -0.04727506   160


# the above emmeans output with mode="prob" I would like to obtain 
# using marginaleffects call:
predictions(fit_mblogit, 
            newdata = datagrid(collection_date_num=dat[160,'collection_date_num']),
            type="link", 
            transform_post = function (eta) softMax(eta) ) # NOTE: insight::link_inverse(fit_mblogit) NOT CORRECT
# with type="latent": Error: The `type` argument for models of class `mblogit` must be an element of: link, response
# presumably R/type_dictionary.R still needs to be modified
# with type="link": Error in exp(eta) : non-numeric argument to mathematical function
# (something going wrong with transform_post)

# leaving out transform_post :
# problem is that predictions leave out reference group, which should
# come out as zero & should then be centered to get to latent scale to get
# values for all outcome levels,
# SEs match
# mclogit:::predict.mblogit(fit_mblogit, type="link", se.fit=TRUE)$se.fit[160,]
# though, which also leaves out reference group
# (but undesired in context of marginaleffects)
predictions(fit_mblogit, 
            newdata = datagrid(collection_date_num=dat[160,'collection_date_num']),
            type="link")
#   rowid type group predicted  std.error statistic       p.value  conf.low conf.high
# 1     1 link  Beta -3.880019 0.15458046 -25.10032 4.933308e-139 -4.182991 -3.577047
# 2     1 link Alpha -2.010922 0.06828786 -29.44773 1.346264e-190 -2.144764 -1.877080
# count collection_date_num
# 1 73.10398               18764
# 2 73.10398               18764


# NOTE: could be handy to allow users to register extra types as now it's
# a bit tricky to make a model in another package being supported
# by marginaleffects without modifying the code of marginaleffects


# PS: the predictions on the latent scale are predictions on the link scale but
# with first reference category explicity added & then centered, so that
# the log(odds) sum to zero (see the vignette of emmeans), i.e.
cbind(0, mclogit:::predict.mblogit(fit_mblogit, type="link"))[160,]-mean(cbind(0, mclogit:::predict.mblogit(fit_mblogit, type="link"))[160,])
#                   Beta       Alpha 
# 1.96364709 -1.91637203 -0.04727506 

# for marginaleffects & multinomial models I think type="latent" is
# the only type that actually makes sense
# in any case, one would like to work with predictions & SEs & CLs for which
# all outcome levels appear, including the reference level

# example of emtrends (on latent scale, here this describes growth rate advantage of each variant)
# here also at timepoint of observation 160
as.data.frame(confint(pairs(emtrends(fit_mblogit, ~ variant|collection_date_num,
                       var = "collection_date_num",  
                       mode = "latent",
                       at = list(collection_date_num = 
                                   dat[160,'collection_date_num'])),
              reverse=TRUE), level=0.9))
#        contrast collection_date_num    estimate          SE  df   asymp.LCL
# 1  Beta - Other               18764 -0.01606104 0.005195378 Inf -0.02672347
# 2 Alpha - Other               18764 -0.03421657 0.002180285 Inf -0.03869116
# 3  Alpha - Beta               18764 -0.01815554 0.005577478 Inf -0.02960216
#      asymp.UCL
# 1 -0.005398602
# 2 -0.029741991
# 3 -0.006708918

# this emtrends result for mode = "latent" I would like to be able to match 
# using marginaleffects
# I think using https://vincentarelbundock.github.io/marginaleffects/articles/hypothesis.html#pairwise-contrasts-difference-in-differences
# but not sure of syntax??


# 3. nnet::multinom multinomial fit ####
set.seed(1)
fit_multinom = nnet::multinom(variant ~ ns(collection_date_num, df=2), 
                          weights=count, data=dat)

# test of nnet::multinom model prediction for row 160 in dataset
# prediction on response / probs scale
nnet:::predict.multinom(fit_multinom, type="probs")[160,]
#      Other       Beta      Alpha 
# 0.86628084 0.01791583 0.11580333 

# predictions on link or latent scale or calculation of SEs & CLs
# not supported by nnet by they are by emmeans:

# emmeans result (response/prob scale, with SEs and 95% CLs on response scale)
as.data.frame(emmeans(fit_multinom, ~ collection_date_num, by="variant", 
                      at=list(collection_date_num=dat[160,'collection_date_num']),
                      mode="prob", level=0.95))
#   collection_date_num variant       prob          SE df   lower.CL  upper.CL
# 1               18764   Other 0.86628084 0.007354173  6 0.84828582 0.8842759
# 2               18764    Beta 0.01791583 0.002712467  6 0.01127867 0.0245530
# 3               18764   Alpha 0.11580333 0.006990414  6 0.09869840 0.1329083

# emmeans result (latent scale, with SEs and 95% CLs on latent scale)
as.data.frame(emmeans(fit_multinom, ~ collection_date_num, by="variant", 
                      at=list(collection_date_num=dat[160,'collection_date_num']),
                      mode="latent", level=0.95))
#   collection_date_num variant      emmean         SE df  lower.CL   upper.CL
# 1               18764   Other  1.96361336 0.05710552  6  1.823881  2.1033455
# 2               18764    Beta -1.91491087 0.10442145  6 -2.170421 -1.6594008
# 3               18764   Alpha -0.04870248 0.06727033  6 -0.213307  0.1159021

# predictions modified get_predict function for row 160: correct
# on response scale
preds_multinom_response = get_predict.multinom(fit_multinom, type="response")
preds_multinom_response[preds_multinom_response$rowid==160,]
#     group  predicted rowid
# 160 Other 0.86628084   160
# 487  Beta 0.01791583   160
# 814 Alpha 0.11580333   160

# on latent scale
preds_multi_latent = get_predict.multinom(fit_multinom, type="latent")
preds_multi_latent[preds_multi_latent$rowid==160,]
#     group   predicted rowid
# 160 Other  1.96361336   160
# 487  Beta -1.91491087   160
# 814 Alpha -0.04870248   160

# the above emmeans output with mode="prob" I would like to obtain 
# using marginaleffects call:
predictions(fit_multinom, 
            newdata = datagrid(collection_date_num=dat[160,'collection_date_num']),
            type="link", 
            transform_post = function (eta) softMax(eta) ) # NOTE: insight::link_inverse(fit_multinom) NOT CORRECT
# with type="latent" or "link": Error: The `type` argument for models of class `mblogit` must be an element of: link, response
# presumably R/type_dictionary.R still needs to be modified

# leaving out transform_post & using type="probs":
# SE matches emmeans result: 
# as.data.frame(emmeans(fit_multinom, ~ collection_date_num, by="variant", 
# at=list(collection_date_num=dat[160,'collection_date_num']),
# mode="prob", level=0.95))
# but in contrast to emmeans results leaves out confidence intervals
predictions(fit_multinom, 
            newdata = datagrid(collection_date_num=dat[160,'collection_date_num']),
            type="probs")
#   rowid  type group  predicted   std.error    count collection_date_num
# 1     1 probs Other 0.86628084 0.007356969 73.10398               18764
# 2     1 probs  Beta 0.01791583 0.002713518 73.10398               18764
# 3     1 probs Alpha 0.11580333 0.006993107 73.10398               18764


# example of emtrends (on latent scale, here this describes growth rate advantage of each variant)
# here also at timepoint of observation 160
as.data.frame(confint(pairs(emtrends(fit_multinom, ~ variant|collection_date_num,
                                     var = "collection_date_num",  
                                     mode = "latent",
                                     at = list(collection_date_num = 
                                                 dat[160,'collection_date_num'])),
                            reverse=TRUE), level=0.9))
# contrast collection_date_num    estimate          SE df    lower.CL
# 1  Beta - Other               18764 -0.01596339 0.005179619  6 -0.02899609
# 2 Alpha - Other               18764 -0.03425342 0.002181425  6 -0.03974221
# 3  Alpha - Beta               18764 -0.01829002 0.005563444  6 -0.03228847
# upper.CL
# 1 -0.002930702
# 2 -0.028764628
# 3 -0.004291572

# this emtrends result for mode = "latent" I would like to be able to match 
# using marginaleffects
# I think using https://vincentarelbundock.github.io/marginaleffects/articles/hypothesis.html#pairwise-contrasts-difference-in-differences
# but not sure of syntax??

# FINAL NOTE: ILLUSTRATION THAT vcov MATRIX COMES OUT IN DIFFERENT ORDER
# IN nnet::multinom AND mclogit::mblogit
colnames(vcov(fit_multinom)) # (in row major order, this is standard)
# [1] "Beta:(Intercept)"                       "Beta:ns(collection_date_num, df = 2)1" 
# [3] "Beta:ns(collection_date_num, df = 2)2"  "Alpha:(Intercept)"                     
# [5] "Alpha:ns(collection_date_num, df = 2)1" "Alpha:ns(collection_date_num, df = 2)2"

colnames(vcov(fit_mblogit)) # (in column major order, this is nonstandard)
# [1] "Beta~(Intercept)"                       "Alpha~(Intercept)"                     
# [3] "Beta~ns(collection_date_num, df = 2)1"  "Alpha~ns(collection_date_num, df = 2)1"
# [5] "Beta~ns(collection_date_num, df = 2)2"  "Alpha~ns(collection_date_num, df = 2)2"

# emmeans deals with this here:
# https://github.com/rvlenth/emmeans/blob/master/R/multinom-support.R
# lines 112-118:
# # we have to arrange the vcov elements in row-major order
#     if(missing(vcov.))
# vcov. = vcov(object)
# perm = matrix(seq_along(as.numeric(object$coefmat)), 
#               ncol = ncol(object$coefmat))
# perm = as.numeric(t(perm))
# vcov. = vcov.[perm, perm]

@vincentarelbundock
Copy link
Owner

Thanks a lot for this!

It's the start of semester for me, so things are especially busy, but I'll take a serious look at this as soon as I find time.

Just to be clear on copyright issues, is any of this code lifted or "strongly inspired" by emmeans? The license is permissive, so it's not a problem, but I still need to take steps if that's the case.

@tomwenseleers
Copy link
Author

tomwenseleers commented Aug 25, 2022

Hi Vincent! No worries - I know the feeling! :-) And no, nothing lifted from emmeans, was just using it to check results here.
It appears emmeans is getting predictions on that latent scale in a somewhat different way - by doing something complicated with the model matrix (probably accomplishing the same centering of the logits, but done in a slightly different way - I have to dust off my linear algebra to remember how Kronecker products work), see lines 48 to 51 in https://github.com/rvlenth/emmeans/blob/master/R/multinom-support.R :

    X = model.matrix(trms, m, contrasts.arg = object$contrasts)
    # recenter for latent predictions
    pat = (rbind(0, diag(k + 1, k)) - 1) / (k + 1)
    X = kronecker(pat, X)

I have to ask Russ Lenth how that works exactly... But I get the same result as with emmeans, so I presume what I did is right...

Only that last bit I mentioned was in emmeans like that, lines 112-118 in https://github.com/rvlenth/emmeans/blob/master/R/multinom-support.R :

# # we have to arrange the vcov elements in row-major order
#     if(missing(vcov.))
# vcov. = vcov(object)
# perm = matrix(seq_along(as.numeric(object$coefmat)), 
#               ncol = ncol(object$coefmat))
# perm = as.numeric(t(perm))
# vcov. = vcov.[perm, perm]

@vincentarelbundock
Copy link
Owner

mblogit now support type="latent". This was easy to implement.

I re-read all this, and I'm not sure what the feature request is for nnet::multinom. The built-in function does not allow predictions on the link scale at all, so I can't support it here.

@tomwenseleers
Copy link
Author

tomwenseleers commented Aug 26, 2022

Ha thanks! For nnet::multinom - if you are looking for a predict function that behaves like that of predict.mblogit with type="link" (that normalizes the log(odds) so that the reference level comes out at zero, which is then dropped) you can get it liked this:

mu = nnet:::predict.multinom(fit_multinom, type="probs")

inverse_softMax_tolink <- function(mu) {
  log_mu <- log(mu)
  # we normalize log(odds) so that first outcome level comes out as zero & then drop that level as in predict.mblogit with type="link"
  return(sweep(log_mu, 1, STATS=log_mu[,1], FUN="-")[,-1]) 
}

inverse_softMax_tolink(mu)[1:3,]
#        Beta     Alpha
# 1 -22.25909 -21.13826
# 2 -21.65267 -20.44572
# 3 -21.04682 -19.75393

# this matches (up to rounding/fitting differences)
mclogit:::predict.mblogit(fit_mblogit, type="link")[1:3,]
#        Beta     Alpha
# 1 -22.31522 -21.13413
# 2 -21.70682 -20.44180
# 3 -21.09900 -19.75022

When using type="latent" and then using softMax(eta) as transform_post was not working for me because I had assumed predictions would come out in wide format with predictions for the different outcome levels on the latent scale in different columns, when it appears it's passed in long format... But a similar function but working on a dataframe in long format should work... (even if it's probably most efficient if one could work directly on a matrix in wide format)

@vincentarelbundock
Copy link
Owner

Got it. And what's the scientific case for these?

Naively, my intuition is they the standard errors don't get the nice properties because we first estimated them on the response scale. And then we get something that is less natural to interpret than probabilities...

@tomwenseleers
Copy link
Author

tomwenseleers commented Aug 26, 2022

Well I would like my emmeans to be calculated on the link / latent scale (where standard errors and confidence intervals would be symmetrical) and then have these backtransformed to the response / probability scale. Which is why I thought I would have to use type="latent" in combination with a softmax transform_post. I thought that's how it's done by default in emmeans if I understand correctly also if I ask mode="prob" above in the emmeans call, but maybe I am misunderstanding... In my Covid example above, which is about modelling SARS-CoV2 lineage frequencies through time, I would like to calculate emmeans / get predictions on the probability/response scale to give me the % of each variant through time. But I would also like to calculate pairwise contrasts in emtrends on the latent scale among variants, which then give you the difference in growth rate per day. Hope that makes sense? (I originally used that method in https://www.science.org/doi/10.1126/science.abg3055. - originally using the emmeans package, but I had hoped marginaleffects would be faster...)

EDIT: tried out your mblogit modifications with type = "latent" below - looks good, so many thanks for that - some very slight differences with the emmeans output, but probably just rounding errors or a difference in the assumed df's (they are taken as Infinity in emmeans for mblogit models). Only thing I wasn't sure of is if I should use type="response" or type="latent" following by a softmax backtransformation of the predicted values and confidence intervals by passing an appropriate transform_post function. Any thoughts? Also managed to get those pairwise contrasts in marginal trends on the latent scale. Just didn't know there is how to get emtrends contrasts like that for multiple timepoints (without putting the call in a loop)...

# marginaleffects predictions with type="response" :
predictions(fit_mblogit, 
            newdata = datagrid(collection_date_num=dat[160,'collection_date_num']),
            type="response" 
             ) 
# transform_post = function (eta) softMax(eta), NOTE: insight::link_inverse(fit_multinom) NOT CORRECT
# rowid     type group  predicted   std.error  statistic      p.value   conf.low
# 1     1 response Other 0.86616413 0.007360510 117.677182 0.000000e+00 0.85173779
# 2     1 response  Beta 0.01788666 0.002714336   6.589701 4.407117e-11 0.01256666
# 3     1 response Alpha 0.11594921 0.006996554  16.572332 1.104583e-61 0.10223622
# conf.high    count collection_date_num variant
# 1 0.88059046 73.10398               18764   Alpha
# 2 0.02320666 73.10398               18764   Alpha
# 3 0.12966221 73.10398               18764   Alpha

# this closely matches output of emmeans call (slight differences maybe
# due to difference in assumed df?, taken as Inf in emmeans)
as.data.frame(emmeans(fit_mblogit, ~ collection_date_num, by="variant", 
                      at=list(collection_date_num=dat[160,'collection_date_num']),
                      mode="prob", level=0.95))
#   collection_date_num variant       prob          SE  df  asymp.LCL  asymp.UCL
# 1               18764   Other 0.86616413 0.007357713 Inf 0.85174327 0.88058498
# 2               18764    Beta 0.01788666 0.002713279 Inf 0.01256873 0.02320459
# 3               18764   Alpha 0.11594921 0.006993861 Inf 0.10224150 0.12965693

# marginaleffects predictions with type="latent" :
predictions(fit_mblogit, 
            newdata = datagrid(collection_date_num=dat[160,'collection_date_num']),
            type="latent" 
)
# rowid   type group   predicted  std.error   statistic       p.value   conf.low
# 1     1 latent Other  1.96364709 0.05718955  34.3357663 2.297228e-258  1.8515576
# 2     1 latent  Beta -1.91637203 0.10460975 -18.3192493  5.810889e-75 -2.1214034
# 3     1 latent Alpha -0.04727506 0.06732437  -0.7021983  4.825555e-01 -0.1792284
# conf.high    count collection_date_num variant
# 1  2.07573655 73.10398               18764   Alpha
# 2 -1.71134070 73.10398               18764   Alpha
# 3  0.08467828 73.10398               18764   Alpha

# lower conf interval on backtransformed response scale would be
softmax = function (x) exp(x)/sum(exp(x))
softmax(c(1.8515576, -2.1214034, -0.1792284))
# 0.86952744 0.01636245 0.11411011

# appropriate transform_post function would have to be passed 
# to apply this to output in long format?

# emmeans output with mode="latent" 
as.data.frame(emmeans(fit_mblogit, ~ collection_date_num, by="variant", 
at=list(collection_date_num=dat[160,'collection_date_num']),
mode="latent", level=0.95))
#   collection_date_num variant      emmean         SE  df  asymp.LCL   asymp.UCL
# 1               18764   Other  1.96364709 0.05718955 Inf  1.8515576  2.07573655
# 2               18764    Beta -1.91637203 0.10460975 Inf -2.1214034 -1.71134070
# 3               18764   Alpha -0.04727506 0.06732437 Inf -0.1792284  0.08467828


# pairwise contrasts in marginal trends on the latent link scale
# (this would measure pairwise differences in growth rate among
# SARS-CoV2 variants) calculated using emtrends :
as.data.frame(confint(pairs(emtrends(fit_mblogit, ~ variant|collection_date_num,
                                     var = "collection_date_num",  
                                     mode = "latent",
                                     at = list(collection_date_num = 
                                                 dat[160,'collection_date_num'])),
                            reverse=TRUE), level=0.9))
#        contrast collection_date_num    estimate          SE  df   asymp.LCL
# 1  Beta - Other               18764 -0.01606104 0.005195378 Inf -0.02672347
# 2 Alpha - Other               18764 -0.03421657 0.002180285 Inf -0.03869116
# 3  Alpha - Beta               18764 -0.01815554 0.005577478 Inf -0.02960216
# asymp.UCL
# 1 -0.005398602
# 2 -0.029741991
# 3 -0.006708918

# the same emtrends on latent scale calculated using marginaleffects
# pairwise contrasts in marginal trends = 
# growth rate differences between variants :
comparisons(
  fit_mblogit,
  newdata = datagrid(collection_date_num = 
                       c(dat[160,'collection_date_num'])),
  variables = "collection_date_num",
  type = "latent",
  hypothesis = "pairwise")
#     type          term  comparison   std.error  statistic      p.value
# 1 latent  Beta - Other -0.01583897 0.005161144  -3.068888 2.148571e-03
# 2 latent Alpha - Other -0.03392950 0.002167369 -15.654696 3.085834e-55
# 3 latent  Alpha - Beta -0.01809053 0.005541114  -3.264782 1.095484e-03
# conf.low    conf.high
# 1 -0.02595463 -0.005723316
# 2 -0.03817747 -0.029681536
# 3 -0.02895091 -0.007230145

# PS wasn't sure how to do this for multiple timepoints / collection_date_num
# without putting this in a loop - is that possible?
# If I pass two timepoints I get also all contrasts between the
# different timepoints, whereas I would just like to get the
# marginal trend contrasts at the given timepoints between the different variants

@vincentarelbundock
Copy link
Owner

vincentarelbundock commented Aug 27, 2022

emmeans is very fast and I don't think we're going to beat it in that respect.

For your last use case, a loop might be the only way to achieve this currently, but it is quite easy:

library(nnet)
library(marginaleffects)
mod <- multinom(factor(gear) ~ mpg + hp, data = mtcars, trace = FALSE)
do.call("rbind", lapply(20:30, function(i)
    transform(
        comparisons(
        mod,
        newdata = datagrid(mpg = i, model = mod),
        hypothesis = "pairwise",
        type = "probs",
        variables = "mpg"),
    mpg = i)))

SoftMax is a special function because it needs to normalize by response group. The transform_post argument applies exactly the same function to all elements of a vector; it is not aware of the response groups. However, the nice thing about marginaleffects is that you can do post-transform the results very easily with normal R operations.

library(mclogit)
library(dplyr)
mod <- mblogit(factor(gear) ~ mpg, data = mtcars, trace = FALSE)
sm <- function(x) exp(x) / sum(exp(x)) 
predictions(mod, type = "link") |>
    group_by(rowid) |>
    mutate_at(c("predicted", "conf.low", "conf.high"), sm) |>
    dplyr::filter(rowid == 1)

Which is equivalent to:

p1 <- predict(mod, type = "link")
softMax(p1) |> head(1)

@vincentarelbundock
Copy link
Owner

vincentarelbundock commented Aug 27, 2022

What you are proposing for get_predict.multinom() with type="link" is that we should compute predictions on the response scale, and then apply the invese_softMax_tolink() function to convert to link scale. And then you ask if we should use transform_post to change it back to response? Personally, this is the kind of thing that would scare me when the function is not one-to-one (i.e., inverse_softMax_tolink(softMax(x)) does not return the original x).

I'll keep thinking about this, but at the moment my intuition is that marginaleffects should not be implementing new prediction types for various models. This requires a lot of model-specific statistical knowledge, and it could end up being a very deep rabbit hole.

There is a lot of stuff in this thread, but to recap:

  • type="latent" for mblogit: already implemented (this violates my rule above so I probably wouldn't do this again, but it was easy enough.)
  • type="link" for multinom models: not going to implement now
  • pairwise comparisons between group levels for different time points: I gave code above with lapply and do.call
  • apply softMax via transform_post: impossible without a major change in the code infrastructure, but I showed an easy dplyr solution above.

Am I missing anything?

@tomwenseleers
Copy link
Author

tomwenseleers commented Aug 27, 2022

Hi Vincent, Many thanks for all your help! Only thing I still don't fully understand for mblogit models is if I should use type="response" or rather type="latent" followed by a softmax backtransformation of the predicted values and confidence intervals? Any thoughts on this? These two approaches don't give the same result, but wondering what is the most commonly accepted approach (taking into account that confidence intervals are symmetrical on the link scale, but not on the response scale)?

Re supporting type="latent" or type="link" for nnet::multinom models: inverse_softMax(softMax(x)) does return x - that was why I was inclined to drop type="link" altogether, and treat "link" and "latent" as equivalent for multinomial models, as type="latent" is really the only link scale that makes sense in the context of marginal effects, as that is the one that includes all outcome levels, like type="response" (another workaround could be to include an explicit zero column in the output of predictions with type="link" - in that case inverse_softMax(softMax(x)) would also return x). I also had the feeling that my code above for nnet::multinom models was practically working already... I think multinomial models are also the rare case where one really would like to allow for a special prediction type in the context of marginal effects - emmeans e.g. did exactly that... If nnet was in active development & had a github somewhere I could fix type="link", but I never got a reply from Brian Ripley, so the chance of him implementing this will be next to zero I fear... The reason I wanted to shift from emmeans to marginaleffects was also that I am often most interested in mere model predictions or adjusted model predictions with confidence intervals for my observed datapoints, which marginaleffects can do, but emmeans not, as far as I know. For multinomial models, mblogit would be one option, but nnet::multinom would be nice to have fully supported too, as mblogit now is still a little fragile (often giving fitting errors), while nnet::multinom is very robust.

Regarding transforms_post & that softMax transform & predictions now always coming out in long format: that was also the reason I raised this in easystats/insight#603 - intuitively, I would think that for multivariate models (multinomial really is multivariate logistic) it would be most logical if predictions came out in wide format, even if for plotting in ggplot2 long format is of course more convenient. Maybe you could think about some argument to_long=TRUE or FALSE when outputting predictions, or else make transform_post aware of rowid if it is passed along, and allow two function arguments (I thought in fact it was now passing the whole dataframe with all columns to backtransform, including rowid, wasn't it)?

Ha, and don't know if anything had to be done for the vcov method of mblogit::multinom or nnet::multinom objects (they come out in a different order) - or is that already taken care of?

@vincentarelbundock
Copy link
Owner

Only thing I still don't fully understand for mblogit models is if I should use type="response" or rather type="latent" followed by a softmax backtransformation of the predicted values and confidence intervals? Any thoughts on this?

Not sure, sorry. (Trying to build tools and avoid becoming a statistical consultant. 😉)

The var-cov order is not a problem.

Can you try type="latent" for a nnet::multinom model using latest dev version from Github? Please make sure you restart R completely after installing the package. Does it work as expected and replicate emmeans?

@vincentarelbundock
Copy link
Owner

Not sure this was clear in my previous post, but I did implement latent in multinom models in the latest dev version.

FYI, I do not plan to include a wide output argument. Imagine a case where we have 3 outcome levels, with fitted value, standard error, CI. Then, we need all these columns:

predicted_level1
std.error_level1
conf.low_level1
conf.high_level1
predicted_level2
std.error_level2
conf.low_level2
conf.high_level2
predicted_level3
std.error_level3
conf.low_level3
conf.high_level3

The names of the columns include two pieces of information: estimate type and level, which violates the "tidy" principle that most modern R packages work with. The long format also works with split-apply-combine paradigms (dplyr or data.table) immediately. The long format also works immediately with ggplot2. Finally it is trivial to convert long to wide with a single pivot call. Since users can get the wide format with a single (simple!) line of code, I don't see much value in adding a new argument.

@vincentarelbundock
Copy link
Owner

Closing now to clean up the repo, but feel free to comment or open a new issue if things are not working as expected or if you have other feature requests.

Tristan-Siegfried pushed a commit to Tristan-Siegfried/marginaleffects that referenced this issue Sep 21, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants