
SampComp error #199

Open
rania-o opened this issue Mar 25, 2022 · 5 comments

rania-o commented Mar 25, 2022

Hello,

I'm using Nanocompore to compare a modified sample with an IVT sample.
I've already run the nanopolish eventalign collapse step and got the following in the log file (this is for the IVT sample; the modified sample gives similar results):

2022-03-25T10:41:46.337153+0100 WARNING - MainProcess | Running Eventalign_collapse
2022-03-25T10:41:46.337736+0100 INFO - MainProcess | Checking and initialising Eventalign_collapse
2022-03-25T10:41:46.339649+0100 INFO - MainProcess | Starting data processing
2022-03-25T10:54:48.308272+0100 INFO - Process-6 | Output reads written:21561


Written Reads:21561 Kmers:6887018

When I grep for valid kmers in the collapsed output file, I get:
6078205 valid kmers / 6887018 kmers.
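A quick calculation turns these totals into an overall invalid-kmer fraction. It comes out at roughly 11.7%, which is already above the 0.1 default for --max_invalid_kmers_freq mentioned later in this thread. Assuming the invalid kmers are spread fairly evenly across reads (the per-read distribution is not shown here), a typical read would therefore exceed that threshold.

```python
# Totals reported above for the IVT sample.
total_kmers = 6_887_018
valid_kmers = 6_078_205
reads_written = 21_561

# Fraction of kmers that are not "valid", pooled over all reads.
invalid_fraction = (total_kmers - valid_kmers) / total_kmers
# Average number of kmers per read, to put per-read fractions in context.
kmers_per_read = total_kmers / reads_written

print(f"invalid kmer fraction: {invalid_fraction:.4f}")  # 0.1174
print(f"mean kmers per read: {kmers_per_read:.1f}")      # 319.4
```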

Next, I ran SampComp (even though I don't have any replicates):

nanocompore sampcomp --file_list1 psi0_transcrit_oligo_collapsed_reads.eventalign/out_eventalign_collapse.tsv  --file_list2 psi2_transcrit_oligo_collapsed_reads.eventalign/out_eventalign_collapse.tsv --fasta ../transcript_oligo.fasta  --outpath ./samp_comp_results

2022-03-25T14:32:16.222012+0100 WARNING - MainProcess | Running SampComp
2022-03-25T14:32:16.222857+0100 INFO - MainProcess | Checking and initialising SampComp
2022-03-25T14:32:16.226479+0100 INFO - MainProcess | Only 1 replicate found for condition Condition1
2022-03-25T14:32:16.226733+0100 INFO - MainProcess | This is not recommended. The statistics will be calculated with the logit method
2022-03-25T14:32:16.227296+0100 INFO - MainProcess | Only 1 replicate found for condition Condition2
2022-03-25T14:32:16.227704+0100 INFO - MainProcess | This is not recommended. The statistics will be calculated with the logit method
2022-03-25T14:32:16.230122+0100 INFO - MainProcess | Reading eventalign index files
2022-03-25T14:32:18.253073+0100 INFO - MainProcess | 	References found in index: 1
2022-03-25T14:32:18.253414+0100 INFO - MainProcess | Filtering out references with low coverage
2022-03-25T14:32:18.254686+0100 INFO - MainProcess | 	References remaining after reference coverage filtering: 0
2022-03-25T14:32:18.255010+0100 INFO - MainProcess | Starting data processing
2022-03-25T14:32:18.301037+0100 INFO - Process-3 | All Done. Transcripts processed: 0
2022-03-25T14:32:18.309365+0100 INFO - MainProcess | Loading SampCompDB
2022-03-25T14:32:18.317105+0100 INFO - MainProcess | The result database is empty
2022-03-25T14:32:18.318381+0100 INFO - MainProcess | Saving results

So I ran it again with --min_coverage set to 0:

nanocompore sampcomp --file_list1 psi0_transcrit_oligo_collapsed_reads.eventalign/out_eventalign_collapse.tsv  --file_list2 psi2_transcrit_oligo_collapsed_reads.eventalign/out_eventalign_collapse.tsv --fasta ../transcript_oligo.fasta  --outpath ./samp_comp_results_2 --min_coverage 0

Condition:Condition1 Sample:Condition1_1 	High fraction of invalid kmers: 21,555	valid reads: 6
Condition:Condition2 Sample:Condition2_1 	High fraction of invalid kmers: 20,243	valid reads: 2

But there are almost 6 million valid kmers; isn't that enough?
Or does it mean that my data is not suitable for Nanocompore? (I used other tools to detect modifications, and they worked well.)
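The two log lines above appear to count reads rather than kmers: 21,555 + 6 = 21,561, exactly the number of reads written by Eventalign_collapse. That suggests the invalid-kmer check is applied per read, so a large pooled count of valid kmers does not help if most individual reads cross the threshold. The sketch below is a simplified model of such a per-read filter, assuming a read is dropped when its invalid fraction exceeds max_invalid_kmers_freq; `passing_reads` and the read counts are hypothetical, not Nanocompore's code or data.

```python
from typing import Dict, List, Tuple

# Hypothetical per-read counts: read_id -> (valid_kmers, total_kmers).
ReadCounts = Dict[str, Tuple[int, int]]


def passing_reads(reads: ReadCounts, max_invalid_kmers_freq: float = 0.1) -> List[str]:
    """Return IDs of reads whose invalid-kmer fraction does not exceed the threshold.

    Simplified illustration of a per-read filter (assumed behaviour).
    """
    kept = []
    for read_id, (valid, total) in reads.items():
        if total == 0:
            continue  # a read with no kmers cannot be evaluated
        invalid_frac = (total - valid) / total
        if invalid_frac <= max_invalid_kmers_freq:
            kept.append(read_id)
    return kept


# Illustrative reads of about the average length seen above.
reads = {
    "read_a": (280, 320),  # 12.5% invalid
    "read_b": (285, 320),  # ~10.9% invalid
    "read_c": (300, 320),  # 6.25% invalid
}
print(passing_reads(reads))        # ['read_c']
print(passing_reads(reads, 0.15))  # ['read_a', 'read_b', 'read_c']
```

Even though every read here is close to 90% valid, two of the three are discarded at the 0.1 threshold.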

This is the error message I got:

2022-03-25T14:59:18.933552+0100 WARNING - MainProcess | Running SampComp
2022-03-25T14:59:18.934119+0100 INFO - MainProcess | Checking and initialising SampComp
2022-03-25T14:59:18.937440+0100 INFO - MainProcess | Only 1 replicate found for condition Condition1
2022-03-25T14:59:18.937670+0100 INFO - MainProcess | This is not recommended. The statistics will be calculated with the logit method
2022-03-25T14:59:18.938098+0100 INFO - MainProcess | Only 1 replicate found for condition Condition2
2022-03-25T14:59:18.938339+0100 INFO - MainProcess | This is not recommended. The statistics will be calculated with the logit method
2022-03-25T14:59:18.940320+0100 INFO - MainProcess | Reading eventalign index files
2022-03-25T14:59:20.513673+0100 INFO - MainProcess | 	References found in index: 1
2022-03-25T14:59:20.514114+0100 INFO - MainProcess | Filtering out references with low coverage
2022-03-25T14:59:20.515235+0100 INFO - MainProcess | 	References remaining after reference coverage filtering: 1
2022-03-25T14:59:20.515533+0100 INFO - MainProcess | Starting data processing
2022-03-25T14:59:20.637782+0100 ERROR - Process-2 | Error doing GMM test on reference dystro-oligo
2022-03-25T14:59:20.638123+0100 ERROR - Process-2 | Error in Worker
nanocompore.common.NanocomporeError: Error doing GMM test on reference dystro-oligo
ValueError: Expected 2D array, got 1D array instead:
array=[].
Reshape your data either using array.reshape(-1, 1) if your data has a single feature or array.reshape(1, -1) if it contains a single sample.
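The ValueError text is scikit-learn's input-validation message, raised here because the GMM test received an empty array (`array=[]`). Presumably, with --min_coverage 0 and only 6 and 2 reads left after filtering, some position reaches the test with no values in one sample. Following the hint and calling `reshape(-1, 1)` would not help, since there would still be zero samples to fit. The sketch below shows the kind of coverage guard that a non-zero minimum coverage provides; `enough_reads` and the intensity values are hypothetical.

```python
from typing import Dict, List


def enough_reads(values_by_sample: Dict[str, List[float]], min_reads: int) -> bool:
    """Hypothetical guard: True only if every sample has at least min_reads values."""
    return all(len(values) >= min_reads for values in values_by_sample.values())


# Illustrative position where Condition2 has no usable reads left after filtering.
position = {"Condition1": [81.2, 79.5, 80.1], "Condition2": []}

if enough_reads(position, min_reads=1):
    print("fit the model")
else:
    print("skip position: not enough reads")  # this branch runs for the data above
```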

I hope this is clear; I look forward to your help.
Thank you.

JannesSP commented Oct 11, 2022

I get the same error. Did you find a solution?
I guess you need at least two replicates per condition?

rania-o commented Oct 11, 2022

No, I didn't. I just used other tools.

lmulroney (Collaborator) commented

Hi rania-o and JannesSP,
I apologise for the lack of activity here last year. How long is your reference sequence? If it is close to 100 nt, you may need to lower the minimum reference length. You may also want to look at the --max_invalid_kmers_freq option and set it higher than its default of 0.1.
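One way to choose a higher --max_invalid_kmers_freq is to see how many reads would survive at several candidate values. The sketch below assumes you have already extracted a per-read invalid-kmer fraction from the collapsed file (that extraction is not shown), and uses made-up fractions; `survivors_by_threshold` is a hypothetical helper, not part of Nanocompore.

```python
from bisect import bisect_right
from typing import Dict, List


def survivors_by_threshold(invalid_fracs: List[float], thresholds: List[float]) -> Dict[float, int]:
    """Count reads whose invalid-kmer fraction is <= each candidate threshold."""
    ordered = sorted(invalid_fracs)
    # bisect_right gives the number of values <= t in the sorted list.
    return {t: bisect_right(ordered, t) for t in thresholds}


# Illustrative per-read invalid fractions, not real data.
fracs = [0.06, 0.09, 0.11, 0.12, 0.13, 0.18, 0.25]
print(survivors_by_threshold(fracs, [0.1, 0.15, 0.2]))
# {0.1: 2, 0.15: 5, 0.2: 6}
```

Raising the threshold keeps more reads, at the cost of admitting reads with more positions that could not be cleanly assigned, so it is worth picking the smallest value that gives adequate coverage.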

I know you've likely moved on from using nanocompore, but if you try these settings and it works for you, let me know.

Thanks,
Logan

keenhl commented Mar 11, 2024

@rania-o What other tools have you tried?

rania-o commented Mar 16, 2024

@keenhl Drummer, Epinano, Eligos, Xpore ...


4 participants