Something problem with batch QC #360

Open
jiangli2941 opened this issue Dec 6, 2024 · 9 comments

Comments

@jiangli2941

I imported six files for a merged analysis; in order, their shapes are as follows:
ms_data: {'0': (31381, 23559), '1': (24041, 23059), '2': (26034, 23518), '3': (28167, 23087), '4': (29321, 23074), '5': (29299, 23810)}
num_slice: 6
names: ['0', '1', '2', '3', '4', '5']
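To put the workload in numbers, the per-slice shapes can be summed. This is a minimal sketch in plain Python that assumes each tuple is (cells, genes), as in a StereoExpData shape. The dictionary only copies the values listed above.

```python
# Per-slice shapes copied from the ms_data summary above.
# Assumption: each tuple is (n_cells, n_genes).
shapes = {
    '0': (31381, 23559), '1': (24041, 23059), '2': (26034, 23518),
    '3': (28167, 23087), '4': (29321, 23074), '5': (29299, 23810),
}

# Total cells across all slices, and the range of gene counts per slice.
total_cells = sum(n_cells for n_cells, _ in shapes.values())
gene_counts = [n_genes for _, n_genes in shapes.values()]

print(f"slices: {len(shapes)}")
print(f"total cells: {total_cells}")
print(f"genes per slice: {min(gene_counts)}-{max(gene_counts)}")
```

Under that assumption, the merged object holds about 168,000 cells with roughly 23,000 genes per slice. That volume may account for much of the BatchQC runtime.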

I followed the tutorial step by step, but the analysis ran for a long time and then appeared to hang at this step:
ms_data.tl.batch_qc(scope=slice_generator[:], mode='integrate', cluster_res_key='leiden', report_path='./batch_qc', res_key='batch_qc')
Output:
[2024-12-05 22:47:24][Stereo][3971300][MainThread][131909463319616][ms_pipeline][113][INFO]: register algorithm batch_qc to <class 'stereo.core.stereo_exp_data.StereoExpData'>-131907745263232
[2024-12-05 23:03:08][Stereo][3971300][MainThread][131909463319616][classifier][144][INFO]: Model Training Finished!
[2024-12-05 23:03:08][Stereo][3971300][MainThread][131909463319616][classifier][145][INFO]: Trained checkpoint file has been saved to ./batch_qc

Because the analysis runs on a cloud server, I waited overnight but did not get the expected output. My questions: 1. Is there a way to simplify the output? 2. Is there a way to speed up the run?
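One general way to shorten a diagnostic run is to try it first on a random subset of cells from each slice. The sketch below only chooses which cell indices to keep. `subsample_indices` is a hypothetical helper. Whether the subset can be fed back into Stereopy, for example by slicing each slice before calling batch_qc, is an assumption and is not confirmed in this thread.

```python
import random

def subsample_indices(n_cells_per_slice, fraction, seed=0):
    """Return sorted random cell indices to keep for each slice.

    Hypothetical helper: picks max(1, round(n * fraction)) indices per slice
    without replacement. A fixed seed makes the choice reproducible.
    """
    if not 0 < fraction <= 1:
        raise ValueError("fraction must be in (0, 1]")
    rng = random.Random(seed)
    keep = {}
    for name, n_cells in n_cells_per_slice.items():
        k = max(1, round(n_cells * fraction))
        keep[name] = sorted(rng.sample(range(n_cells), k))
    return keep

# Cell counts per slice, taken from the ms_data summary (assumed to be cells).
cells = {'0': 31381, '1': 24041, '2': 26034, '3': 28167, '4': 29321, '5': 29299}
keep = subsample_indices(cells, fraction=0.1)
print({name: len(idx) for name, idx in keep.items()})
```

Running BatchQC on such a subset first would show whether the step completes at all before you commit to the full data set.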

@jiangli2941
Author

Another key question: how do you manually annotate cell subpopulations? The tutorial only covers automatic annotation with SingleR, which mainly distinguishes the immune cells in PBMCs. How can other parenchymal cells, such as kidney CD-PC, PODO, and EC, be annotated manually from cluster markers?

@tanliwei-coder
Collaborator

tanliwei-coder commented Dec 9, 2024

I think that if you have a reference data set for parenchymal cells, SingleR can also be used to annotate them automatically.
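The core idea of SingleR is to correlate each query cell (or cluster) with labeled reference profiles and keep the best-matching label. This is a heavily simplified sketch in plain Python that assumes a shared, ordered gene list. Real SingleR also restricts the comparison to marker genes and runs a fine-tuning step. The code below is not Stereopy's implementation.

```python
def rank(values):
    """Average ranks (1-based); tied values share the mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def spearman(x, y):
    """Spearman correlation: the Pearson correlation of the ranks."""
    rx, ry = rank(x), rank(y)
    n = len(x)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    vx = sum((a - mx) ** 2 for a in rx)
    vy = sum((b - my) ** 2 for b in ry)
    if vx == 0 or vy == 0:
        return 0.0
    return cov / (vx * vy) ** 0.5

def assign_label(query, reference):
    """Pick the reference label whose profile correlates best with the query.

    query: expression values over a shared, ordered gene list
    reference: {label: expression values over the same genes}
    """
    scores = {label: spearman(query, profile) for label, profile in reference.items()}
    return max(scores, key=scores.get), scores

# Made-up profiles over four genes, for illustration only.
reference = {
    "PODO": [9.0, 8.0, 0.5, 0.1],
    "CD-PC": [0.2, 0.1, 7.0, 9.0],
}
label, scores = assign_label([5.0, 6.0, 1.0, 0.0], reference)
print(label, {k: round(v, 2) for k, v in scores.items()})
```

The quality of the result depends on the reference: the labels can only be as specific as the cell types that the reference contains.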

@tanliwei-coder
Collaborator

In the same directory as the notebook in which you ran BatchQC, there is a subdirectory called batch_qc. It contains an HTML file called BatchQC_reprot_raw.html, which can be opened directly in the notebook.
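To check what BatchQC actually wrote, you can group the files in the report directory by suffix. This is a minimal sketch using only the standard library. The helper name `list_report_files` is hypothetical, and the directory name comes from the report_path='./batch_qc' used above. The code globs for any .html file, so it does not depend on the exact report filename.

```python
from pathlib import Path

def list_report_files(report_dir="./batch_qc"):
    """Group the files under a BatchQC output directory by suffix.

    Returns a dict such as {'.html': [...], '.bgi': [...]}. If there is no
    '.html' entry, the HTML report was not written (or was written elsewhere).
    """
    root = Path(report_dir)
    if not root.is_dir():
        return {}
    by_suffix = {}
    for path in sorted(root.rglob("*")):
        if path.is_file():
            by_suffix.setdefault(path.suffix.lower(), []).append(path)
    return by_suffix

files = list_report_files("./batch_qc")
html_reports = files.get(".html", [])
print("HTML reports:", [str(p) for p in html_reports] or "none found")
```

If a report is found, one way to view it inside Jupyter is `IPython.display.IFrame(src=path, width=..., height=...)`.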

@jiangli2941
Author

But I could only find a batchQC file with a .bgi suffix. Does this mean the output failed?

@jiangli2941
Author

jiangli2941 commented Dec 10, 2024

Thank you for the guidance! Could I ask in more detail how to build a custom SingleR reference? In R with the Seurat package, I usually pick some cell markers and draw a dot plot. How is this done in Stereopy?
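A marker dot plot like Seurat's DotPlot encodes two numbers per cluster and gene: the fraction of cells that express the gene (dot size) and the average expression (color). This sketch computes those two quantities in plain Python from a toy per-cell table. The data and the helper name `dotplot_stats` are made up for illustration. Seurat's DotPlot also scales the averages by default, and that step is omitted here.

```python
from collections import defaultdict

def dotplot_stats(expr, clusters, genes):
    """Compute the two quantities a marker dot plot displays.

    expr: list of per-cell dicts {gene: expression value}
    clusters: list of cluster labels, one per cell
    Returns {(cluster, gene): (fraction_expressing, mean_expression)}.
    """
    cells_by_cluster = defaultdict(list)
    for cell, cluster in zip(expr, clusters):
        cells_by_cluster[cluster].append(cell)

    stats = {}
    for cluster, cells in cells_by_cluster.items():
        for gene in genes:
            values = [cell.get(gene, 0.0) for cell in cells]
            frac = sum(v > 0 for v in values) / len(values)
            mean = sum(values) / len(values)
            stats[(cluster, gene)] = (frac, mean)
    return stats

# Toy data (made up): two clusters, a podocyte marker and a principal-cell marker.
expr = [
    {"Nphs1": 3.0}, {"Nphs1": 2.0, "Aqp2": 0.5},
    {"Aqp2": 4.0}, {"Aqp2": 2.0},
]
clusters = ["1", "1", "2", "2"]
stats = dotplot_stats(expr, clusters, ["Nphs1", "Aqp2"])
for key, (frac, mean) in sorted(stats.items()):
    print(key, f"pct={frac:.0%}", f"mean={mean:.2f}")
```

If the data are exported to AnnData, scanpy's `sc.pl.dotplot(adata, var_names, groupby)` draws this kind of plot directly.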

By the way, I could not download the SingleR reference data in h5ad format. What format is this data in? Looking forward to your reply.

@jiangli2941
Author

I eventually tried SingleR annotation with MouseRNAseqData. As I expected, it performed very badly on kidney cells. I then converted the data to rda format and annotated the cells in RStudio with the scCATCH package. scCATCH is a cluster-based annotation toolkit for single-cell data that covers everything from identifying cluster marker genes to annotating clusters based on evidence scoring. It gave reasonably good annotation results.

So the question is: how can I build my own SingleR reference gene set for matching?

I tried three methods:

  1. Export the cell_table from the scCATCH package and save it as an h5ad file. This failed because the other information that SingleR requires was missing.

  2. Replace MouseRNAseqData in SingleR with the gene and cell information from cell_table. The replacement failed because the number of rows did not match.

  3. Export the Stereopy standard file to h5ad and annotate it with scanpy. The exported file seems to lack some necessary information, and the annotation still fails.

In short, I really hope the authors can help with annotation. This has been one of the main pain points since we chose your company's service.

@tanliwei-coder
Collaborator

tanliwei-coder commented Dec 27, 2024

My guess is that your data is so large that BatchQC takes much longer. I don't have your data, so I cannot judge this reliably.

@tanliwei-coder
Collaborator

tanliwei-coder commented Dec 27, 2024

h5ad is the file format used to save AnnData. The reference only needs to be an h5ad file whose obs contains a column with the cell type. When you run SingleR, set the parameter ref_use_col to the name of that obs column.
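The stated requirement is small: the reference needs one obs column that gives a cell-type label for each cell, and ref_use_col takes the name of that column. The sketch below checks that condition on a plain dict that stands in for the obs table. This is an assumption: with anndata, the table would be `adata.obs` and the file would be written with `AnnData.write_h5ad`. The helper name `check_reference_obs` and the column name `celltype` are hypothetical.

```python
from collections import Counter

def check_reference_obs(obs, ref_use_col):
    """Validate a reference's cell-type column and count cells per type.

    obs: dict mapping column name -> list of per-cell values
         (a stand-in for an AnnData obs table).
    Raises ValueError if the column is missing or any label is empty.
    """
    if ref_use_col not in obs:
        raise ValueError(f"obs has no column {ref_use_col!r}; found {sorted(obs)}")
    labels = obs[ref_use_col]
    missing = sum(1 for label in labels if label is None or str(label).strip() == "")
    if missing:
        raise ValueError(f"{missing} cells have no label in {ref_use_col!r}")
    return Counter(labels)

# Toy obs table; the labels are illustrative only.
obs = {"celltype": ["PODO", "PODO", "CD-PC", "EC"], "batch": ["a", "a", "b", "b"]}
print(check_reference_obs(obs, "celltype"))
```

Any labeled single-cell data set that covers the kidney cell types of interest could serve as such a reference, provided it meets this condition.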

@tanliwei-coder
Collaborator

tanliwei-coder commented Dec 27, 2024

Another option is to cluster the data you want to annotate with one of Stereopy's clustering methods, inspect the clustering result, and then annotate the clusters manually with data.tl.annotation.
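Here is a minimal sketch of the manual route. It assumes you already have mean expression per cluster. Each cluster is scored against a hand-curated marker list and gets the best-scoring cell type. The marker lists, the helper name `annotate_clusters`, and the 'unknown' fallback are hypothetical. The output is a cluster-to-label mapping that you would review by eye before passing it to data.tl.annotation. Check the exact parameters of that method in the Stereopy API reference.

```python
def annotate_clusters(cluster_means, markers, min_score=0.0):
    """Assign each cluster the cell type whose markers it expresses most.

    cluster_means: {cluster_id: {gene: mean expression in that cluster}}
    markers: {cell_type: [marker genes]} (each list must be non-empty)
    Score = mean expression of a type's markers in the cluster. Clusters
    whose best score does not exceed min_score are labeled 'unknown'.
    """
    annotation = {}
    for cluster, means in cluster_means.items():
        scores = {
            cell_type: sum(means.get(g, 0.0) for g in genes) / len(genes)
            for cell_type, genes in markers.items()
        }
        best = max(scores, key=scores.get)
        annotation[cluster] = best if scores[best] > min_score else "unknown"
    return annotation

# Hypothetical marker lists; curate these from the literature for real data.
markers = {
    "PODO": ["Nphs1", "Nphs2", "Podxl"],
    "CD-PC": ["Aqp2", "Hsd11b2"],
    "EC": ["Pecam1", "Emcn"],
}
# Made-up per-cluster means, e.g. computed per leiden cluster.
cluster_means = {
    "0": {"Nphs1": 2.1, "Nphs2": 1.8, "Podxl": 2.5},
    "1": {"Aqp2": 3.0, "Hsd11b2": 1.2},
    "2": {"Pecam1": 1.5, "Emcn": 2.0, "Aqp2": 0.1},
    "3": {},
}
print(annotate_clusters(cluster_means, markers, min_score=0.5))
```

A threshold such as min_score leaves weak clusters unlabeled rather than forcing a cell type onto them.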

Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants