Which selection.method to use for FindVariableFeatures on ALRA-imputed data #25

Open
Rohit-Satyam opened this issue Jul 13, 2023 · 2 comments

Comments

Rohit-Satyam commented Jul 13, 2023

Hi @linqiaozhi @JunZhao1990 @rcannood @inoue0426

I was following this issue where @ChristophH mentions that

Results should not be very different from using the original "count" data. Generally, using "data" slot should work with "vst" method as long as the loess fit can capture the mean-variance relationship.

Also, @linqiaozhi suggests

For example, "The VST selection method uses count data and does not use the ALRA imputed data; please use mean.var.plot instead, if you would like to find the variable genes based on the imputed data."
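
For reference, here is a rough sketch of the kind of statistic mean.var.plot ranks genes by, as I understand the method. It is written in plain Python rather than R so that it runs without Seurat. It assumes log1p-normalized input, computes a log variance-to-mean ratio per gene on the non-log scale, and z-scores that ratio within equal-width bins of mean expression. The function name `mvp_scaled_dispersion`, the toy counts, and the two-bin setting are illustrative assumptions, not Seurat's code or defaults.

```python
import math
import statistics


def mvp_scaled_dispersion(log_norm, num_bins=2):
    """Return (scaled dispersion per gene, genes per bin).

    log_norm: dict mapping gene -> list of log1p-normalized values per cell.
    Hypothetical helper for illustration only.
    """
    mean_expr, dispersion = {}, {}
    for gene, values in log_norm.items():
        raw = [math.expm1(v) for v in values]  # undo the log1p transform
        mu = statistics.fmean(raw)
        var = statistics.variance(raw)  # sample variance (n - 1)
        if mu > 0 and var > 0:
            mean_expr[gene] = math.log1p(mu)
            dispersion[gene] = math.log(var / mu)  # log variance-to-mean ratio

    # Assign genes to equal-width bins of mean expression.
    lo, hi = min(mean_expr.values()), max(mean_expr.values())
    width = (hi - lo) / num_bins
    bins = {}
    for gene, m in mean_expr.items():
        idx = 0 if width == 0 else min(int((m - lo) / width), num_bins - 1)
        bins.setdefault(idx, []).append(gene)

    # z-score the dispersion within each bin.
    scaled = {}
    for genes in bins.values():
        vals = [dispersion[g] for g in genes]
        center = statistics.fmean(vals)
        spread = statistics.stdev(vals) if len(vals) > 1 else 0.0
        for g in genes:
            # Assumption of this sketch: undefined z-scores are set to 0.
            scaled[g] = (dispersion[g] - center) / spread if spread > 0 else 0.0
    return scaled, bins


# Toy counts (made up): three low-expression genes, two high-expression genes.
toy_counts = {
    "a": [1, 0, 1, 2, 1, 0, 2, 1],
    "b": [0, 3, 0, 0, 2, 0, 3, 0],
    "c": [2, 1, 2, 3, 1, 2, 3, 2],
    "d": [20, 25, 15, 22, 18, 24, 16, 20],
    "e": [0, 60, 5, 40, 0, 30, 10, 15],
}
toy_log_norm = {g: [math.log1p(x) for x in v] for g, v in toy_counts.items()}
mvp_scores, mvp_bins = mvp_scaled_dispersion(toy_log_norm)
```

Because the dispersion is z-scored within each bin, the scaled values in every bin average to zero by construction.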

So I decided to check whether this mean-variance relationship is captured better by Seurat's vst or mean.var.plot method. Unlike the MCA (Malaria Cell Atlas) data, which I wish to use as a reference and did not impute, some cells in my samples (t1, n1) show some deviation from the linear relationship. Is this slight deviation expected?

I also observe that the standardized variance for the imputed data is centered at 1, unlike MCA, where it is centered at zero. Will this be a problem when I integrate these samples with MCA? I am trying to resolve the problem of the JackStraw plot showing all PCs as significant, which I discuss in another issue, and I thought the nature of the imputed data or the feature selection method might be influencing this.

[Image attachment: mvp_deidentified]

@Rohit-Satyam

Hi @ChristophH. Do you have thoughts on this?

@ChristophH

I'd stick to the raw counts (not imputed) and use the vst method. If there is no mean-variance relationship, the data is violating some basic assumptions the method is based on, so proceed with care.
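
To make the mean-variance point concrete, here is a minimal sketch of a vst-style standardized variance on raw counts, in plain Python so it runs without Seurat. It is not Seurat's implementation: a straight-line least-squares fit in log10 space stands in for the loess fit, and `fit_line`, `standardized_variance`, and the toy counts are hypothetical names and values chosen for illustration.

```python
import math
import statistics


def fit_line(xs, ys):
    """Least-squares fit y = a + b * x (a simple stand-in for a loess fit)."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx
    return my - b * mx, b


def standardized_variance(counts):
    """Return (standardized variance per gene, fitted slope).

    counts: dict mapping gene -> list of raw counts per cell.
    """
    n_cells = len(next(iter(counts.values())))
    means = {g: statistics.fmean(v) for g, v in counts.items()}
    variances = {g: statistics.variance(v) for g, v in counts.items()}
    # Only genes with non-zero mean and variance can be placed in log space.
    usable = [g for g in counts if means[g] > 0 and variances[g] > 0]
    a, b = fit_line(
        [math.log10(means[g]) for g in usable],
        [math.log10(variances[g]) for g in usable],
    )
    clip = math.sqrt(n_cells)  # upper bound on standardized values
    scores = {}
    for g in usable:
        # Standard deviation expected from the fitted mean-variance trend.
        expected_sd = math.sqrt(10 ** (a + b * math.log10(means[g])))
        z = [min((x - means[g]) / expected_sd, clip) for x in counts[g]]
        scores[g] = sum(v * v for v in z) / (n_cells - 1)
    return scores, b


# Toy raw counts (made up): g5 is deliberately overdispersed.
raw_counts = {
    "g1": [1, 0, 1, 2, 1, 0, 2, 1],
    "g2": [4, 3, 5, 4, 6, 2, 4, 4],
    "g3": [10, 8, 12, 9, 11, 13, 7, 10],
    "g4": [20, 25, 15, 22, 18, 24, 16, 20],
    "g5": [0, 0, 30, 0, 0, 2, 0, 0],
}
vst_scores, vst_slope = standardized_variance(raw_counts)
top_gene = max(vst_scores, key=vst_scores.get)
```

In this sketch, a gene whose variance matches the fitted trend scores about 1 when no value is clipped, and genes above the trend score higher. A fitted slope near zero would be one hint that there is no mean-variance relationship for the method to model.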
