ValueError: Length mismatch: Expected axis has 0 elements, new values have 7 elements #3

Open
kayla-xu opened this issue Nov 11, 2023 · 15 comments

kayla-xu commented Nov 11, 2023

I tried to run my own data on several different servers, and every run reported the same error:

image

I ran the sample data provided by the author and got the results successfully. Below is the command I used for my own data. What could the problem be? Thank you!

CamoTSS --gtf ./gencode.vM25.primary_assembly.annotation.sorted.gtf --refFasta ./GRCm38.primary_assembly.genome.fa --bam ./filtered_CamoTSS.bam -c ./seurat.cell.barcode.txt -o ./ --mode TC+CTSS

@kayla-xu
Author

These are my GTF, BAM, and cell barcode files, respectively. Thanks again!
image
image
image

@houruiyan
Collaborator

Hi kayla,

I am so sorry for the late response. It seems that no TSS clusters were fetched, but I do not know the reason yet. It may be related to your input files.

Could you upload your data to OneDrive and then email me at [email protected]? I can check it for you!

Best regards,
Ruiyan

@HjolliEin

I've run into the same problem as reported here. Has this been solved?

@houruiyan
Collaborator

Hello, could you provide a subset of your dataset? I can check it for you.

Ruiyan

@houruiyan
Collaborator

@HjolliEin @kayla-xu

@houruiyan
Collaborator

Hi, this error occurs because no PASS reads were obtained in the first step.
Our package was developed for Cellranger output and also applies some filtering steps (for details, please refer to our manual).
So please make sure that your BAM file is suitable.
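
To illustrate where the message itself comes from: pandas raises this ValueError when column names are assigned to a DataFrame that has no columns. The sketch below is only a minimal reproduction, assuming that an empty table is what results when no reads pass the filtering. The seven column names are hypothetical and are not taken from the CamoTSS source.

```python
import pandas as pd

# Hypothetical column names; the names used inside CamoTSS may differ.
HYPOTHETICAL_COLUMNS = [
    "chrom", "start", "end", "strand", "gene_id", "cell_id", "count",
]


def label_columns(df):
    """Assign seven column names, as a downstream step might do."""
    df = df.copy()
    df.columns = HYPOTHETICAL_COLUMNS
    return df


# An empty DataFrame stands in for "no PASS reads were collected".
empty = pd.DataFrame()

try:
    label_columns(empty)
    message = None
except ValueError as err:
    message = str(err)

print(message)
```

If this is what happens, the length mismatch is a symptom rather than the cause, so the input files of the first step are the place to look.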

@ShweiSTAT

@houruiyan Hi Ruiyan, I ran into a similar length mismatch problem, even though I followed the steps in the tutorial to filter the possorted_genome_bam.bam produced directly by cellranger with --chemistry SC5P-PE.

The error message is as below:
error_message

Thank you so much for your help!

@houruiyan
Collaborator

Hi Shiwei,

Could you send me some of your test data?

Best regards,
Ruiyan

@ShweiSTAT

Hi Ruiyan, thank you for your reply. Sorry, I cannot share the data due to privacy regulations. But I can share how I processed the .fastq files to generate the BAM file.

The fastq files were generated with the 10x Genomics 5' single-cell RNA-seq assay. This is the command I used to process the .fastq files:

Screenshot 2024-03-06 at 9 42 30 AM

Then I followed these steps to filter the possorted_genome_bam.bam files (the file is renamed to sorted_filtered_-5P.bam for distinction, since I have multiple samples)

Screenshot 2024-03-06 at 9 43 47 AM.

Thank you so much for your time!

@houruiyan
Collaborator

Hi Shiwei,

Could you subsample the BAM file down to a few hundred reads and send me that test BAM file?

Ruiyan

@ShweiSTAT

Hi Ruiyan, thank you for your reply. Sorry, I can't share any of the data due to university protocols. If we have any shareable data in the future, I will share it with you.

Thank you again!

Shiwei

@houruiyan
Collaborator

OK, then you can debug it yourself. I helped one user check this problem. He used a BAM file from STARsolo, which is not compatible with CamoTSS, and he also hit this error while getting the PASS reads. So you can check the step that produces the PASS reads.

Best regards,
Ruiyan

@houruiyan
Collaborator

Hi everyone who is seeing this error,

Make sure that you use the same reference genome and annotation file for the alignment and for CamoTSS. Most of you run cellranger for the alignment, which means you first download a reference package, for example refdata-gex-GRCh38-2020-A. This parent folder contains the fasta and genes subfolders. When you run CamoTSS, use genes/genes.gtf as the reference annotation and fasta/genome.fa as the reference sequence file.

Ruiyan
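
As a rough sketch of the advice above, the helper below derives both reference paths from a single cellranger reference folder and assembles the CamoTSS command from them, so the GTF and FASTA cannot come from different sources. The function names are hypothetical. The folder layout is the one described in this comment, and the command-line options are only the ones already shown earlier in this thread.

```python
from pathlib import Path


def reference_paths(refdata_dir):
    """Return (gtf, fasta) inside a cellranger reference folder."""
    ref = Path(refdata_dir)
    return ref / "genes" / "genes.gtf", ref / "fasta" / "genome.fa"


def missing_reference_files(refdata_dir):
    """List the expected reference files that do not exist on disk."""
    return [str(p) for p in reference_paths(refdata_dir) if not p.is_file()]


def camotss_command(refdata_dir, bam, barcodes, out_dir, mode="TC+CTSS"):
    """Build the argument list for CamoTSS using the cellranger reference."""
    gtf, fasta = reference_paths(refdata_dir)
    return [
        "CamoTSS",
        "--gtf", str(gtf),
        "--refFasta", str(fasta),
        "--bam", str(bam),
        "-c", str(barcodes),
        "-o", str(out_dir),
        "--mode", mode,
    ]


if __name__ == "__main__":
    # Placeholder file names; replace them with your own paths.
    cmd = camotss_command(
        "refdata-gex-GRCh38-2020-A", "filtered.bam", "barcodes.txt", "out"
    )
    print(" ".join(cmd))
```

Calling `missing_reference_files` before launching the run gives an early warning if the reference folder is incomplete or the path is wrong.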

@houruiyan
Collaborator

Hi Ruiyan,

I hope this message finds you well.

I'm Xiao Song from Northwestern University. Recently, I've been using CamoTSS to analyze our 5' library 10x scRNA-seq data. Unfortunately, I've encountered an error message: "ValueError: Length mismatch: Expected axis has 0 elements, new values have 7 elements". It seems that another user had a similar issue, as seen in the GitHub Issues section. Before running CamoTSS, I used Cellranger to generate the BAM files and completed the necessary filtering and merging steps as instructed in the manual. I've attached the first 100 reads of my filtered BAM file to this email. I would greatly appreciate your assistance in debugging this issue.

Thank you very much for your time and support!


Hi Xiao,

Thank you for your email and your test data.

However, it seems that this BAM file lacks the header.

Maybe you can subsample from your full data like this:
samtools view -@ 20 -h -b --subsample 0.0001 --subsample-seed 666 PBMC_866Aligned.sortedByCoord.out.bam > PBMC_866_test.bam

In addition, could you tell me which versions of the reference FASTA and GTF files you used, and also send the cell barcode file you used for this step?
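
Before sending a test file, one way to confirm that the header survived the subsampling is to read it back. The sketch below uses only the Python standard library and relies on the BAM layout from the SAM specification: a BGZF (gzip-compatible) stream that starts with the magic bytes `BAM\1`, followed by a 32-bit header length and the header text. The function names are hypothetical, and a file that is plain SAM text rather than BAM will fail at the gzip step.

```python
import gzip
import struct


def read_bam_header_text(path):
    """Return the SAM header text stored at the start of a BAM file."""
    with gzip.open(path, "rb") as handle:
        magic = handle.read(4)
        if magic != b"BAM\x01":
            raise ValueError("not a BAM file (missing BAM magic bytes)")
        # l_text: little-endian 32-bit length of the header text.
        (l_text,) = struct.unpack("<i", handle.read(4))
        raw = handle.read(l_text)
    # The header text may be NUL-padded, so strip trailing NUL bytes.
    return raw.rstrip(b"\x00").decode("utf-8", errors="replace")


def has_sequence_dictionary(header_text):
    """True if the header text contains at least one @SQ line."""
    return any(line.startswith("@SQ") for line in header_text.splitlines())
```

For example, `has_sequence_dictionary(read_bam_header_text("PBMC_866_test.bam"))` returning `False` would suggest the header was dropped, for instance by omitting `-h` in `samtools view`.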


Hi Ruiyan,

Thank you for the prompt response.

The GTF and FASTA files were both downloaded from the GENCODE website, release 44.
https://www.gencodegenes.org/human/release_44.html
gtfFile: gencode.v44.primary_assembly.annotation.gtf
fastaFile: GRCh38.primary_assembly.genome.fa

Attached are a test BAM file generated with your suggested command and the top 100 rows of the cell barcode file.

Let me know if you need more information.

Thank you again for your help!


Hi Xiao,

Could you check the ref_gene or ref_TSS file in your ref_file folder?

Does it look like this? That is, is the gene_id in the format XXXX.num?

Pasted graphic-1.png
Pasted graphic-2.png

Are you sure that the reference GTF and FASTA files you used are the same ones that were used to produce the BAM file?

I found that the gene_id in your BAM file looks like XXXXX, without the .num suffix.

You told me that you used cellranger to produce the BAM file. So you should also use the same FASTA and GTF files that cellranger used.

I think you probably downloaded a reference before running cellranger. It may look like this: refdata-gex-GRCh38-2020-A
In this folder, the structure looks like this:
Pasted graphic-3.png
In the fasta folder you will find genome.fa, and in the genes folder you will find genes.gtf.

You should pass these two files to CamoTSS rather than the downloaded release 44 GTF and FASTA files.

If you have any other questions, feel free to contact me.
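
The gene_id mismatch described in this message can be checked quickly. The sketch below extracts gene_id values from GTF lines and reports what fraction of them carry a version suffix (the XXXX.num form). The function names and the two sample lines are illustrative assumptions, not output from CamoTSS or from anyone's data.

```python
import re

GENE_ID_RE = re.compile(r'gene_id "([^"]+)"')
VERSION_SUFFIX_RE = re.compile(r"\.\d+$")


def gene_ids_from_gtf_lines(lines):
    """Collect the distinct gene_id values found in GTF lines."""
    ids = set()
    for line in lines:
        if line.startswith("#"):
            continue  # skip GTF header comments
        match = GENE_ID_RE.search(line)
        if match:
            ids.add(match.group(1))
    return ids


def versioned_fraction(gene_ids):
    """Fraction of gene IDs that end in a '.<number>' version suffix."""
    ids = list(gene_ids)
    if not ids:
        return 0.0
    return sum(bool(VERSION_SUFFIX_RE.search(g)) for g in ids) / len(ids)


# Illustrative lines only: one with a versioned ID, one without.
with_version = ['chr1\tHAVANA\tgene\t1\t100\t.\t+\t.\tgene_id "ENSG00000000001.5";']
without_version = ['chr1\tHAVANA\tgene\t1\t100\t.\t+\t.\tgene_id "ENSG00000000001";']

print(versioned_fraction(gene_ids_from_gtf_lines(with_version)))
print(versioned_fraction(gene_ids_from_gtf_lines(without_version)))
```

If the annotation passed to CamoTSS gives a fraction near 1 while the gene IDs in the BAM file have no suffix, the two were most likely built from different references.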

Hi Ruiyan,

Thank you for your reply.

Yes, you are right. I used the wrong GTF and FASTA files. I have switched to the reference downloaded from the 10x website, and now it runs without error. Thank you for helping me fix the problem. I don't mind you putting our conversation on GitHub.

@dongwei1220

I've run into the same problem as reported here. How can I resolve it? Thank you!
image

These two pickle files in the count folder contain no data.
image

The ref_gene and ref_TSS files are as follows:
image
image
