Celltypist symbol col assignment fix #222

ECM893 · 2025-08-27T16:19:15Z

PR Notes

I do NOT intend for this PR to be merged as is.

Problem 1: When using the .h5ad from the scrnaseq pipeline, you must set symbol_col to a column that is categorical. A categorical column is incompatible with an anndata object's var_names once the function that makes the index unique is called. The first fix is simply to cast the pd.Series to a list when assigning it to adata.var_names.

Problem 2: This one is a little dicier. When trying to bootstrap scanvi from celltypist, the name of the column that holds the celltypist labels is non-obvious and hard to know beforehand: the labels are expected in the 'label_col' column, which is empty. I still believe this can be changed elsewhere, or maybe my approach is just wrong. Please correct me; I'm not an expert with scrnaseq.
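To make the idea concrete, here is a rough sketch of the kind of hack I mean, using pandas only. The function `seed_labels` and the column names `celltypist_prediction` and `label` are hypothetical placeholders chosen for illustration; they are not the names the pipeline actually uses.

```python
import pandas as pd

def seed_labels(obs: pd.DataFrame, source_col: str, label_col: str) -> pd.DataFrame:
    """Copy predicted labels into label_col when label_col is missing or empty.

    Hypothetical helper: both column names are supplied by the caller.
    """
    out = obs.copy()
    if label_col not in out or out[label_col].isna().all():
        # No usable labels yet, so fall back to the predicted ones.
        out[label_col] = out[source_col].astype(str)
    return out

# Stand-in for adata.obs with an empty label column.
obs = pd.DataFrame(
    {
        "celltypist_prediction": ["T cell", "B cell"],  # assumed column name
        "label": [None, None],                          # assumed column name
    }
)
seeded = seed_labels(obs, "celltypist_prediction", "label")
```

An existing, non-empty label column is left untouched, so the fallback only applies when there is nothing to overwrite.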

Problem 3: Similar to Problem 1, adata.var_names becomes categorical again. There should be a nicer place to change this upstream, but I can't find it right now.
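Until the upstream cause is found, a small guard could be applied wherever var_names might have turned categorical. This is only a sketch using pandas; `plain_string_index` is a hypothetical helper name, not something that exists in the pipeline.

```python
import pandas as pd

def plain_string_index(index: pd.Index) -> pd.Index:
    """Return an object-dtype Index of strings if the input is categorical."""
    if isinstance(index.dtype, pd.CategoricalDtype):
        # Rebuild from plain Python strings so no categorical dtype survives.
        return pd.Index([str(v) for v in index], dtype=object)
    return index

# Stand-in for a var_names index that has become categorical.
var_names = pd.CategoricalIndex(["ACTB", "TP53", "ACTB"])
cleaned = plain_string_index(var_names)
```

In a module this would presumably be used as `adata.var_names = plain_string_index(adata.var_names)` before any step that renames genes.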

Additional: I couldn't get the debug profile to behave nicely; maybe something is wrong?

PR checklist

nf-core/scrnaseq saves the scanvi-compatible symbols in the column "gene_symbols" (the index holds the Ensembl IDs). To be compatible with nf-core/scdownstream, you must define a "symbol_col" column in the sample sheet and set "gene_symbols" as its value for each sample. However, the .h5ad from nf-core/scrnaseq stores this column as categorical. Line 50 of celltypist.py simply replaces the var_names (the index, dtype object) with the categorical column. Then, in line 60, celltypist.annotate attempts to make the var_names unique. This errors because the categorical values cannot be restructured. The column must be converted to a different type; the simple fix is to call .to_list() for the reassignment.
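A minimal reproduction of the failure and the fix, using pandas only. `make_unique` is a hypothetical stand-in for whatever routine renames duplicate var_names in place; it is not the actual anndata or celltypist code, and the gene symbols are made up for illustration.

```python
import pandas as pd

# Stand-in for adata.var["gene_symbols"]: categorical, with one duplicate.
symbols = pd.Series(["TP53", "ACTB", "ACTB"], dtype="category")

def make_unique(values):
    """Hypothetical uniquifier: suffix repeated names in place with -1, -2, ..."""
    seen = {}
    for i, name in enumerate(values):
        count = seen.get(name, 0)
        seen[name] = count + 1
        if count:
            values[i] = f"{name}-{count}"
    return values

# Writing "ACTB-1" into categorical values fails: it is not an existing category.
failed = False
try:
    make_unique(symbols.values.copy())
except (TypeError, ValueError):
    failed = True

# Casting to a plain list first removes the categorical restriction.
fixed = make_unique(symbols.to_list())
```

The same reasoning applies to the reassignment in celltypist.py: passing a list means the resulting index is no longer bound to a fixed set of categories.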


ECM893 added 3 commits August 24, 2025 14:55

Hack to fix celltypist to scanvi cell label column issue.

678438d

change to liana to handle the case where var_names become categorical…

2411346

…. It's possible this comes from an error somewhere else and should be fixed there..
