Choose markers to use in FlowSOM clustering #78

Open
hrj21 opened this issue Aug 16, 2024 · 1 comment
Labels
enhancement New feature or request

Comments

hrj21 commented Aug 16, 2024

Description of feature

Hello!

Thanks again for the excellent package; I've been using it lately with great success (and fun!). My feature request is for the ability to choose a subset of var_names to cluster the observations on. This is available in the FlowSOM R and Python packages, and is a common and useful tool to control clustering.

There are a few use cases for this:

The first is when we can partition our antigens into those that define clear cell lineages (e.g. CD3, CD14, CD11b) and those that describe the functional state of cells (e.g. cytokines, metabolic markers). Restricting clustering to the lineage markers sometimes gives better resolution between populations, whose activation state can then be studied using the functional markers.

Second, we have performed studies where the question was "can marker set A independently identify the same cells as marker set B?" (specifically, whether metabolic antigens alone can identify leucocyte populations). In this case, being able to select the antigens for a particular clustering model was central to the experiment.

Finally, sometimes we simply have a dud marker, either because the antigen wasn't expressed or because the antibody didn't work, and it only adds noise.
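For the second use case, one simple way to check whether clusters from marker set A pick out the same cells as clusters from marker set B is to cross-tabulate the two label vectors. This is a minimal sketch that assumes you already have one cluster label per cell from each clustering; it scores agreement with cluster purity, which is only one of several possible measures (the adjusted Rand index is another), and the labels are toy data.

```python
import numpy as np


def crosstab(labels_a, labels_b):
    """Count cells in each (cluster from marker set A, cluster from set B) pair."""
    # np.unique(..., return_inverse=True) maps arbitrary labels to 0..k-1
    cats_a, idx_a = np.unique(np.asarray(labels_a), return_inverse=True)
    cats_b, idx_b = np.unique(np.asarray(labels_b), return_inverse=True)
    table = np.zeros((cats_a.size, cats_b.size), dtype=int)
    # np.add.at accumulates correctly even when the same (a, b) pair repeats
    np.add.at(table, (idx_a, idx_b), 1)
    return cats_a, cats_b, table


def purity(table):
    """Fraction of cells that fall in the dominant column cluster of their row cluster."""
    return table.max(axis=1).sum() / table.sum()


# Toy labels for five cells: set A uses named clusters, set B uses integer ids.
cats_a, cats_b, table = crosstab(["T", "T", "B", "B", "M"], [0, 0, 1, 1, 1])
print(table)
print(purity(table), purity(table.T))
```

Purity is asymmetric: here every set-A cluster falls inside a single set-B cluster (purity 1.0), but set B merges the "B" and "M" cells, so purity in the other direction drops to 0.8. Checking both directions guards against one marker set simply producing fewer, coarser clusters.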

If there's a convenient way to do this already, please forgive me!

Best wishes
Hefin

hrj21 added the enhancement (New feature or request) label on Aug 16, 2024
mbuttner (Collaborator) commented Sep 3, 2024

Hi @hrj21

Thank you for the praise!
I am grateful for your detailed feature description and examples; they are very helpful for understanding your request. I usually subset var_names as follows, although this approach certainly blows up memory use when working with large objects:

  • Create a temporary copy of the object with the subset of features you are interested in
  • Run the FlowSOM clustering
  • Save the clustering result to the original object
  • Delete the temporary copy
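The four steps above can be sketched as follows. This is a minimal illustration, assuming the marker intensities are a plain NumPy array `X` (cells × markers) with a matching `var_names` list and a dict `obs` for per-cell results, standing in for an AnnData object. `stand_in_cluster` is a hypothetical placeholder, not FlowSOM; you would replace it with your actual FlowSOM call. The data are toy values.

```python
import numpy as np


def stand_in_cluster(X_sub):
    """Placeholder for FlowSOM (NOT FlowSOM): label each cell by the index of
    its highest-expressed marker within the subset."""
    return X_sub.argmax(axis=1)


def cluster_on_subset(X, var_names, markers, obs, key_added="cluster_subset"):
    """Copy -> cluster -> store -> delete, on a cells x markers matrix."""
    # 1. Temporary copy with only the chosen markers (fancy indexing copies data).
    cols = [var_names.index(m) for m in markers]
    X_tmp = X[:, cols]
    # 2. Run the clustering on the subset only.
    labels = stand_in_cluster(X_tmp)
    # 3. Save the per-cell result on the original object's annotations.
    obs[key_added] = labels
    # 4. Delete the temporary copy to release its memory.
    del X_tmp
    return obs


var_names = ["CD3", "CD14", "IFNg"]
X = np.array([[5.0, 0.0, 9.0],   # cell 1: CD3-high, but IFNg is higher
              [0.0, 4.0, 8.0]])  # cell 2: CD14-high, but IFNg is higher
obs = cluster_on_subset(X, var_names, ["CD3", "CD14"], obs={})
print(obs["cluster_subset"])     # lineage markers only
print(stand_in_cluster(X))       # all markers: IFNg dominates both cells
```

With AnnData itself, step 1 would typically be `adata[:, markers].copy()`, and step 3 would assign the resulting `.obs` column back to `adata.obs`, since the cell index is unchanged by subsetting the markers.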

However, I can set some time aside to implement subsetting functionality similar to the use_highly_variable_genes parameter in various scanpy functions. For context, when we compute a PCA on single-cell RNA-seq data, we can either use all 10,000+ genes or select a subset of informative genes whose variability exceeds the expected noise in the data. We usually don't need that for flow or mass cytometry data, but I imagine we can create a similar implementation here. Which features were used will be recorded in the .var part of the AnnData object, so you will be able to track which subset of features you used.
I should mention that I have not checked whether the FlowSOM package already has this functionality.
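One way such an option could record its marker choice is sketched below. All names here (`cluster_on_markers`, the `markers` argument, the `used_for_clustering` column) are hypothetical and not an existing pytometry or scanpy API. `var` is a dict of per-marker columns standing in for the AnnData `.var` DataFrame, and the clustering function is again a placeholder rather than FlowSOM.

```python
import numpy as np


def argmax_stand_in(X_sub):
    """Placeholder for FlowSOM (NOT FlowSOM): label cells by their top marker."""
    return X_sub.argmax(axis=1)


def cluster_on_markers(X, var, markers=None, key="used_for_clustering",
                       cluster_fn=argmax_stand_in):
    """Hypothetical API: cluster on `markers` and record the choice in `var`.

    `var` is a dict of per-marker columns standing in for AnnData's .var table;
    `markers`, `key` and this function are illustrative names only.
    """
    names = np.asarray(var["names"])
    if markers is None:
        mask = np.ones(names.size, dtype=bool)  # default: every marker
    else:
        unknown = set(markers) - set(names.tolist())
        if unknown:
            raise KeyError(f"markers not found in var: {sorted(unknown)}")
        mask = np.isin(names, list(markers))
    # One boolean per marker, like scanpy's "highly_variable" column in .var.
    var[key] = mask
    # Boolean indexing materialises only the selected columns, not the whole object.
    return cluster_fn(X[:, mask])


var = {"names": ["CD3", "CD14", "IFNg"]}
X = np.array([[5.0, 0.0, 9.0],
              [0.0, 4.0, 8.0]])
labels = cluster_on_markers(X, var, markers=["CD3", "CD14"])
used = np.asarray(var["names"])[var["used_for_clustering"]]
print(labels, used)
```

Because the flag is stored per marker, the subset used for a given clustering can be recovered later by indexing the marker names with that column, much as `adata.var["highly_variable"]` records which genes scanpy selected.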

Best,
Maren
