Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Prune_transcripts.py error #12

Open
marcasriv opened this issue Sep 30, 2019 · 1 comment
Open

Prune_transcripts.py error #12

marcasriv opened this issue Sep 30, 2019 · 1 comment

Comments

@marcasriv
Copy link

Hi Alex,

Thanks for making the ORF-RATE pipeline public and accessible.

When running prune_transcripts.py with the following parameters in Python 2.7.15:
prune_transcripts.py --inbed annot.bed --minlen 27 --maxlen 32 genome.fa test.bam -p 10 --verbose

I am getting the following error :

[2019-09-30 11:59:51] Reading transcriptome and genome
[2019-09-30 12:00:05] Parsing sequence and count information

Traceback (most recent call last):
File "prune_transcripts.py", line 179, in
% (len(tid_summary), lowreads_dropped, len(tid_summary)-lowreads_dropped, opts.minreads, opts.peakfrac))

ValueError: All 46526 transcripts dropped due to too few reads (46526) or too many reads coming from one position (0). Consider decreasing MINREADS (currently 64) or increasing PEAKFR(currently 0.200000), or check validity of input BAM file.

I have done as suggested in the ValueError and decreased MINREADS and increased PEAKFR, but I keep getting the same error no matter what parameters I use. I've checked the input BAM file and it seems okay too. I am quite sure that the problem isn't low coverage. When doing samtools flagstat of the BAM file I get:

203414522 + 0 in total (QC-passed reads + QC-failed reads)
100572976 + 0 secondary
0 + 0 supplementary
0 + 0 duplicates
203414522 + 0 mapped (100.00% : N/A)
0 + 0 paired in sequencing
0 + 0 read1
0 + 0 read2
0 + 0 properly paired (N/A : N/A)
0 + 0 with itself and mate mapped
0 + 0 singletons (N/A : N/A)
0 + 0 with mate mapped to a different chr
0 + 0 with mate mapped to a different chr (mapQ>=5)

Please find attached a sample of the structure of my input files:
genome.fa.txt
annot.bed.txt

Thank you so much,

Marina

@alexfields
Copy link
Owner

Apologies for the issues.

Is it possible that your input BAM file is aligned to a different reference than your BED file? For example, is the alignment in transcriptome coordinates? ORF-RATER expects genome coordinates.

Could you send a few lines from samtools view test.bam?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants