Aitchison and Robust Aitchison distance? #433
A quick look at the paper indicates that the index can be calculated as a Euclidean distance after "CLR transformation". In vegan's design, this potentially means implementing the CLR transformation in […]
Yes, Aitchison distance equals the Euclidean distance between CLR-transformed data. Yes, the CLR transformation is also useful as such, outside of distance calculations. Despite its limitations (requirement of a pseudocount etc.), it is frequently used, at least in microbial community analysis, to remove compositionality bias in relative data and enhance statistical comparisons between samples. Other log-ratio transformations (ALR, ILR, phILR) are also sometimes used for this. But there are already several R packages that provide CLR and other log-ratio transformations; one of them is compositions. If you would prefer to avoid new dependencies, we could first check whether this happens to be included in any of the existing dependencies. But these transformations are simple and should be relatively straightforward to implement directly in vegan, if necessary. We can consider submitting the code as a PR if this suggestion finds support.
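To make the equivalence concrete, here is a minimal Python sketch (an illustration only, not vegan's R code). The helper names `clr`, `euclidean` and `aitchison` are hypothetical. CLR takes the log of each part minus the log of the sample's geometric mean, and the Aitchison distance is then the plain Euclidean distance between the CLR vectors. The sketch assumes strictly positive values, which is exactly why the pseudocount question comes up.

```python
import math

def clr(x):
    """Centred log-ratio transform of one strictly positive composition."""
    if any(v <= 0 for v in x):
        raise ValueError("CLR needs strictly positive values; add a pseudocount first")
    logs = [math.log(v) for v in x]
    mean_log = sum(logs) / len(logs)  # log of the geometric mean
    return [l - mean_log for l in logs]

def euclidean(a, b):
    """Ordinary Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((p - q) ** 2 for p, q in zip(a, b)))

def aitchison(x, y):
    """Aitchison distance: Euclidean distance between CLR-transformed samples."""
    return euclidean(clr(x), clr(y))

# Counts and proportions of the same sample give the same CLR values,
# so the distance does not depend on sequencing depth / total sum.
a = [10, 20, 70]
b = [30, 30, 40]
print(aitchison(a, b))
print(aitchison([v / 100 for v in a], b))
```

Because CLR removes the sample total, the two printed distances agree (up to floating-point rounding), which is the scale invariance that makes the transformation attractive for relative data.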
A comment about adding a dependency: compositions pulls in a huge number of chained dependencies (i.e. it depends on packages that depend on packages that depend on packages that ... break). The transformations should be added independently or via a more lightweight dependency. The greatest complications in these transformations seem to be handling log(0) and 0/0.
I'm pretty sure people doing this in community ecology just solve the log(x) for x = 0 problem with the usual continuity correction and run everything with (the equivalent of) log1p(x). I'm also sure Cajo has written about this whole issue back in the day at least; I was under the impression that the closed compositional nature of the data largely becomes irrelevant once you are talking about more than 10s or 100s of taxa? I would support this stuff being in vegan; doing Aitchison's log-ratio (contrast?) PCA is something I have cooked together by hand on a number of occasions when I needed to replicate work done previously or elsewhere, and it is an analysis that Canoco handles trivially too, which is where most ecologists will likely have encountered or performed it. I don't think we need to depend on compositions; as you say, the dependencies would be undesirable.
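The continuity-correction approach can be sketched as follows, assuming a single constant pseudocount added to every value before CLR (the helper names `clr_pseudocount` and `aitchison_pseudocount` are hypothetical, and this is not how vegan implements it). With a pseudocount of 1, the log step is exactly log1p(x).

```python
import math

def clr_pseudocount(x, pseudocount=1.0):
    """CLR after adding a constant pseudocount to every value (zeros allowed).
    With pseudocount=1 the log step equals log1p(x)."""
    logs = [math.log(v + pseudocount) for v in x]
    mean_log = sum(logs) / len(logs)  # log of the geometric mean of the shifted values
    return [l - mean_log for l in logs]

def aitchison_pseudocount(x, y, pseudocount=1.0):
    """Euclidean distance between pseudocount-CLR vectors."""
    cx = clr_pseudocount(x, pseudocount)
    cy = clr_pseudocount(y, pseudocount)
    return math.sqrt(sum((p - q) ** 2 for p, q in zip(cx, cy)))

counts_a = [0, 5, 12, 83]
counts_b = [3, 0, 40, 57]
for c in (1.0, 0.5, 0.01):
    print(c, aitchison_pseudocount(counts_a, counts_b, c))
```

The loop shows the main caveat of this design choice: the distance changes with the pseudocount, because the zeros are mapped to log(pseudocount), so the choice of constant is itself an analysis decision.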
The typical way to deal with zeroes is indeed […]. The effect of compositionality is mitigated in higher dimensions but not removed; and at least in microbial communities we frequently also use higher taxonomic levels (e.g. Phylum), and then the number of unique groups can be rather low. We have already implemented […]. Shall we make a PR that adds: […]
I strongly support this request. CODA methods are becoming more and more common and required in microbial ecology research, so we would greatly benefit from them being natively supported in vegan :)
We will be happy to help and look forward to admin comments before creating a possible PR.
Any opportunities for a PR, or shall we look for alternative solutions in the meantime? A vegan implementation would likely have a wide user base, considering how popular this transformation has lately become in microbial ecology.
A PR would be very welcome!
I have prepared the […] (I do not have permissions to open new branches in the vegandevs/vegan repository, so it has to be one of the available ones).
Make the PR against master. Even if you are going to pursue the task, this looks like a self-contained PR that can be merged independently. I had a quick look at the code, and it looked OK to me. I'll have a second and deeper look before the merge, but I don't expect any complications. Nice work!
Here we go!
@antagomir: I would like to change the distance names in […]
I can add this, but I will first wait to see if there are more comments on the names. The suggested ones are OK with me.
Quick question (sorry if I missed it): for the "robust Aitchison", do you simply compute the Euclidean distance on the rclr-transformed abundances after "putting back" the 0s (as you cannot compute distances on a matrix containing NAs)? Or do you combine rclr with matrix completion as in Martino et al. (2019)?
Thanks for pointing this out, @johannesbjork. This implementation uses the simple replacement. I will check whether we should clarify the documentation or add the matrix completion imputation step.
I am not sure it makes sense to add the 0s back to compute the "robust Aitchison".
I agree that the imputed version has clear advantages, and I aim to find time to add it as soon as possible. Thanks for drawing our attention to this.
This issue is closed, except for the matrix completion step, which is now added as a new issue #570.
Aitchison distance and its robust version have become frequent choices in the analysis of microbial communities, where compositional data are ubiquitous. Being able to access these through a dedicated existing package would be a better solution than creating new implementations, and would allow seamless linking with other packages that rely on vegan::vegdist.

Aitchison distance needs a pseudocount in many applications. An alternative, the "robust Aitchison distance", has been proposed in the literature; the difference is that the CLR transformation is done only on the non-zero values. This has gained some attention recently; see e.g. Martino et al. (2019) and the links there to more original references on robust CLR / robust Aitchison.
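A minimal Python sketch of the robust variant, under the assumption that "robust" means the geometric mean is taken over the non-zero parts only, and that zero entries are then set back to 0 before the Euclidean step (the simple replacement discussed in the thread). The names `rclr` and `robust_aitchison` are hypothetical, and the matrix-completion step of Martino et al. (2019) is deliberately not included.

```python
import math

def rclr(x):
    """Robust CLR: log-ratio to the geometric mean of the non-zero parts only.
    Zero entries are left as 0.0 (simple replacement, no matrix completion)."""
    nonzero_logs = [math.log(v) for v in x if v > 0]
    if not nonzero_logs:
        raise ValueError("sample has no non-zero values")
    mean_log = sum(nonzero_logs) / len(nonzero_logs)
    return [math.log(v) - mean_log if v > 0 else 0.0 for v in x]

def robust_aitchison(x, y):
    """Euclidean distance between rclr vectors (zeros 'put back' as 0.0)."""
    rx, ry = rclr(x), rclr(y)
    return math.sqrt(sum((p - q) ** 2 for p, q in zip(rx, ry)))

a = [0, 5, 12, 83]
b = [3, 0, 40, 57]
print(rclr(a))
print(robust_aitchison(a, b))
```

When a sample has no zeros, `rclr` reduces to the ordinary CLR, so the robust distance only differs from the Aitchison distance where zeros occur. Writing 0.0 back into the vector is what makes the Euclidean step possible, and it is exactly the choice questioned later in the thread.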
Would you consider adding Aitchison and robust Aitchison distance as new options in vegan::vegdist?