End to end workflow stops after PhaGCN #49

Open
bernt-matthias opened this issue Dec 19, 2024 · 3 comments

@bernt-matthias

I have a run that exits with exit code 0 (indicating success):

PhaBOX2 is running with: 1 threads!
Running program: PhaMer (virus identification)
[1/7] filtering the length of contigs...
[2/7] calling genes with prodigal...
[3/7] running all-against-all alignment...
[4/7] converting sequences to sentences for language model...
[5/7] Predicting the viruses...
100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 14/14 [00:01<00:00, 11.22it/s]
[6/7] summarizing the results...
[7/7] writing the results...
Run time: 258.42 seconds

PhaMer finished! please check the results in output/final_prediction
Running program: PhaGCN (taxonomy classification)
[1/8] reusing existing filtered contigs...
PhaGCN finished! please check the results in output/final_prediction/phagcn_prediction.tsv

I guess it's because output/filtered_contigs.fa is empty and one of the exit() calls here or here is called.

I am wondering whether the workflow should continue, i.e. whether the exits should be return statements, or whether there should be an exit(1). I ran the PhaTYP step separately, which gave me some results, so continuing the workflow might be of interest.

There seem to be more exit() calls in the code. Might these better be exit(1)?
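To make the difference between the three options concrete, here is a minimal sketch. It is not PhaBOX2 code: `run_step` and `run_workflow` are hypothetical names, and the step list is only an illustration. It shows that a bare `sys.exit()` (like a bare `exit()`) ends the process with status 0, that `sys.exit(1)` reports a failure, and that returning from a step lets the remaining steps decide for themselves.

```python
import subprocess
import sys


def exit_status(snippet: str) -> int:
    """Run a Python snippet in a child interpreter and return its exit status."""
    return subprocess.run([sys.executable, "-c", snippet]).returncode


# A bare sys.exit() reports success (status 0) to the caller ...
status_plain = exit_status("import sys; sys.exit()")
# ... while sys.exit(1) signals a failure that shells and workflow managers can detect.
status_error = exit_status("import sys; sys.exit(1)")


def run_step(name: str, contigs: list[str]) -> bool:
    """Hypothetical pipeline step: return False instead of exiting when there is no input."""
    if not contigs:
        print(f"{name}: no contigs to process, skipping")
        return False
    print(f"{name}: processing {len(contigs)} contigs")
    return True


def run_workflow(contigs: list[str]) -> list[str]:
    """Run every step in order and return the names of the steps that did work."""
    completed = []
    for name in ("PhaGCN", "PhaTYP"):
        if run_step(name, contigs):
            completed.append(name)
    return completed


if __name__ == "__main__":
    print("bare exit status:", status_plain)
    print("exit(1) status:", status_error)
    print("steps completed on empty input:", run_workflow([]))
```

With the return-based variant the process itself still ends with status 0, so a clear log message is needed if the empty input should be visible to the user.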

@KennthShang
Owner

In the "end-to-end" design, if PhaMer judges all the sequences to be non-viral, then in theory they should not be passed to any of the other tools. This is because the downstream methods do not have any "negative control" and may give unreliable predictions. It is like an ML/DL model that usually fails on an out-of-distribution problem but still assigns an in-distribution label to the input.
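The out-of-distribution point can be illustrated with a toy sketch. This is not the PhaGCN model: the label names and scores are made up, and a plain softmax classifier is assumed. Because the probabilities always sum to 1 and there is no "not a virus" class, some known label always wins, whatever the input is.

```python
import math


def softmax(logits: list[float]) -> list[float]:
    """Convert raw scores to probabilities that sum to 1."""
    peak = max(logits)  # subtract the maximum for numerical stability
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical label set: the classifier has no "not a virus" class.
LABELS = ["family_A", "family_B", "family_C"]


def predict(logits: list[float]) -> tuple[str, float]:
    """Return the highest-scoring label and its probability."""
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    return LABELS[best], probs[best]


# Made-up, nearly flat scores, as one might see for an input unlike the training data.
label, confidence = predict([0.2, 0.1, 0.3])

if __name__ == "__main__":
    # A known label is still assigned even though no class fits well.
    print(label, round(confidence, 3))
```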

However, if all the sequences have been verified as viruses in experiments, or identified as viruses by other methods, I suppose users should skip the PhaMer part.

Or perhaps I should provide a version in which users can choose whether to run PhaMer? Looking forward to your advice.

Best,
Jiayu

@bernt-matthias
Author

Thanks for the explanation. I guess my main problem was that, for a new user, it is hard to tell from the log output that there is a problem.

Maybe just add log output that tells the user that PhaMer detected no viruses.

An option to skip PhaMer could also be a good idea.

@KennthShang
Owner

Thanks for the suggestions. I added a new log message to show when no viruses are detected.

Also, a new option, --skip, has been added. Users can decide whether they would like to skip PhaMer.
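For readers who want to picture the two changes together, here is a rough sketch of how such a log message and skip switch could fit into a workflow. Only the flag name `--skip` comes from this thread. Everything else is an assumption for illustration: the flag is modelled as a boolean switch (the real option may take a value), and `identify_viruses` and `end_to_end` are hypothetical stand-ins, not PhaBOX2 functions.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build a toy command-line parser with a skip switch."""
    parser = argparse.ArgumentParser(description="toy end-to-end workflow")
    # Assumption: modelled as a boolean switch; the real option may take a value.
    parser.add_argument("--skip", action="store_true",
                        help="skip the virus identification step")
    return parser


def identify_viruses(contigs: dict[str, str]) -> dict[str, str]:
    """Stand-in for the identification step: keep contigs whose name starts with 'virus'."""
    return {name: seq for name, seq in contigs.items() if name.startswith("virus")}


def end_to_end(contigs: dict[str, str], skip: bool) -> list[str]:
    """Return the log messages the toy workflow would print."""
    messages = []
    if skip:
        messages.append("Skipping virus identification; all contigs are passed on.")
        selected = contigs
    else:
        selected = identify_viruses(contigs)
        if not selected:
            # Tell the user explicitly why nothing else runs.
            messages.append("No viruses detected; downstream steps will not run.")
            return messages
    messages.append(f"Running downstream steps on {len(selected)} contigs.")
    return messages


if __name__ == "__main__":
    args = build_parser().parse_args()
    for line in end_to_end({"bact_1": "ACGT"}, args.skip):
        print(line)
```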

Wish you a nice holiday.

Best,
Jiayu
