Choose markers to use in FlowSOM clustering #78

hrj21 · 2024-08-16T10:43:51Z

Description of feature

Hello!

Thanks again for the excellent package; I've been using it lately with great success (and fun!). My feature request is for the ability to choose a subset of var_names to cluster the observations on. This is available in the FlowSOM R and Python packages, and is a common and useful tool to control clustering.

There are a few use cases for this:

The first is when we can partition our antigens into those that define clear lineages of cells (e.g. CD3, CD14, CD11b), and those that describe the functional state of cells (e.g. cytokines, metabolic markers). Restricting clustering to only the lineage markers sometimes gives better resolution between populations, whose activation state can then be studied using the functional markers.

Secondly, we have performed studies where the question was "can marker set A be used to independently identify the same cells as identified by marker set B" (specifically, whether metabolic antigens alone can be used to identify leucocyte populations). In this case, being able to select antigens for a particular clustering model was central to the experiment.

And finally, sometimes we might just have a dud marker that either wasn't expressed or the antibody didn't work, and it simply adds noise.

If there's a convenient way to do this already, please forgive me!

Best wishes
Hefin

mbuttner · 2024-09-03T19:13:12Z

Hi @hrj21

Thank you for the praise!
I am grateful for the detailed feature request and the examples; they make it much easier to understand what you need. I usually approach subsetting var_names as follows, although this certainly inflates memory usage when working with large objects:

1. Create a temporary copy of the object with the subset of features you are interested in
2. Run the FlowSOM clustering
3. Save the clustering result to the original object
4. Delete the temporary copy

However, I can put some time aside to implement a subsetting functionality similar to the use_highly_variable parameter in some scanpy functions. For context, when we compute a PCA on single-cell RNA-seq data, we can either use all 10,000+ genes or select a subset of informative genes whose variability exceeds the expected noise of the data. We usually don't need that for flow or mass cytometry data, but I imagine that we can create a similar implementation here. Which features were used would be recorded in the .var part of the AnnData object, so you should be able to track which subset of features went into each clustering.
I do have to mention that I have not yet checked whether the FlowSOM package already has this functionality.
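
To make the bookkeeping idea concrete, here is a minimal sketch of recording the selected features as a boolean column and using it to pick matrix columns. It is not the package's API: a plain pandas DataFrame stands in for `adata.var`, a NumPy array stands in for `adata.X`, and `mark_and_select` and the `used_for_clustering` column name are hypothetical.

```python
import numpy as np
import pandas as pd


def mark_and_select(X, var, markers, key="used_for_clustering"):
    """Record which features are used in `var[key]` and return those columns of X.

    Hypothetical helper: `var` plays the role of `adata.var`, `X` of `adata.X`.
    """
    missing = [m for m in markers if m not in var.index]
    if missing:
        raise KeyError(f"markers not found in var index: {missing}")
    # Boolean flag per feature, so the choice stays attached to the feature table.
    var[key] = var.index.isin(markers)
    # Columns come back in the order of `var`, not in the order of `markers`.
    return X[:, var[key].to_numpy()]


# Toy matrix: 3 cells x 4 markers.
X = np.arange(12, dtype=float).reshape(3, 4)
var = pd.DataFrame(index=["CD3", "CD14", "IFNg", "TNF"])

X_lineage = mark_and_select(X, var, ["CD3", "CD14"])
```

Keeping the flag in the feature table means a later reader of the object can see which markers drove a given clustering. A dud marker would simply be left out of `markers` and flagged `False`.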

Best,
Maren

hrj21 added the enhancement New feature or request label Aug 16, 2024
