Cherry Edge Table not Seeing Host Nodes #41

Open
erfanshekarriz opened this issue Nov 11, 2024 · 9 comments

@erfanshekarriz

Hello,

I wanted to ask about the "cherry_edge.csv" file output. Specifically, how can we identify whether a node is a prokaryotic host node or a viral node? I don't seem to see any specific indications.

I'm trying to plot a figure showing the multimodal interactions between the viruses and hosts, and so far I can only identify edges between viruses.

Thanks!

@KennthShang
Owner

Hi there,

Currently, all the nodes in the network (cherry_edge.tsv and cherry_node.tsv) are viruses. No host is included. All the information about the predicted host is in cherry_prediction.tsv. So it is not a multimodal graph. If you are looking for all the meta-information about the database viruses, it can be found in phabox_db_v2/RefVirus.csv

We are also glad to provide the function you need to plot such multimodal interactions. Could you give more details about your needs, such as the output format? Do you mean you want a network file that includes the predictions in cherry_prediction.tsv?

Best,
Jiayu

@erfanshekarriz
Author

erfanshekarriz commented Nov 11, 2024

Hi Jiayu,

Thanks a lot for your quick response!

It would be great to output a network file (e.g. GraphML, or separate edge and node files) that either:

  1. Represents the multimodal graph used to produce CHERRY's final output, cherry_prediction.tsv, or
  2. Represents the simple virus-virus and host-virus relationships. This would essentially be the cherry_edge.tsv and cherry_node.tsv files, but with the host connections (CRISPR and local-alignment-based) included.
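
For illustration, a GraphML file with typed nodes (the first format mentioned above) can be written with the Python standard library alone. This is only a minimal sketch: the function name `to_graphml`, the node names, and the `type` attribute are hypothetical and are not part of PhaBOX or CHERRY.

```python
import xml.etree.ElementTree as ET


def to_graphml(nodes, edges):
    """Serialize (name, type) nodes and (source, target) edges to GraphML text."""
    root = ET.Element("graphml", xmlns="http://graphml.graphdrawing.org/xmlns")
    # Declare one string attribute on nodes so a viewer can colour by node type.
    ET.SubElement(
        root,
        "key",
        {"id": "type", "for": "node", "attr.name": "type", "attr.type": "string"},
    )
    graph = ET.SubElement(root, "graph", edgedefault="undirected")
    for name, node_type in nodes:
        node = ET.SubElement(graph, "node", id=name)
        ET.SubElement(node, "data", key="type").text = node_type
    for source, target in edges:
        ET.SubElement(graph, "edge", source=source, target=target)
    return ET.tostring(root, encoding="unicode")


# Hypothetical toy graph: two viruses and one host node.
graph_nodes = [("phage_A", "Query"), ("phage_B", "Ref"), ("Escherichia", "Host")]
graph_edges = [("phage_A", "phage_B"), ("phage_A", "Escherichia")]
graphml_text = to_graphml(graph_nodes, graph_edges)
```

The resulting text can be saved with a .graphml extension and should load in tools that read GraphML, such as Cytoscape or Gephi.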

Either of these files would be very helpful in inferring biological reasoning for why these relationships are important. I've included an example that I manually inferred with a random subset of my graph below.

Without Host:
[Image: Screenshot 2024-11-11 at 3 22 00 PM]

With Host:
[Image: Screenshot 2024-11-11 at 3 15 45 PM]

Hope that helps

Best,
Erfan

@KennthShang
Owner

Hi Erfan,

I uploaded a script that may help. You can find it in the latest GitHub folder: multimodal.py

The usage is listed below:

--node
    cherry node file (tsv) || cherry_network_nodes.tsv

--edge
    cherry edge file (tsv) || cherry_network_edges.tsv

--prediction
    cherry prediction file (tsv) || cherry_prediction.tsv

--outpth
    output path || refined_out/

This will add edges from each virus to its predicted host. Please also note that there are three columns in cherry_network_nodes.tsv:

Accession: Name of the nodes in the network
Host: The host name of the node
TYPE: 
    Ref: reference virus sequence (node) in the database
    Query: input viral sequence (node) provided by the users
    Host: a prokaryotic node

Please try it first and let me know whether it fits your needs. Also, feel free to let me know if you want further columns/information in these outputs.
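
The idea described above (adding a host node and a virus-to-host edge for each prediction) can be sketched roughly as follows. This is not the code of multimodal.py. The Accession, Host, and TYPE columns come from the description above, while the edge columns `Source` and `Target`, the prediction columns, and the `-` placeholder for "no prediction" are assumptions made for illustration.

```python
import csv
import io

# Hypothetical in-memory stand-ins for the three TSV files. Only the node
# columns (Accession, Host, TYPE) are taken from the thread; the rest is assumed.
NODES_TSV = "Accession\tHost\tTYPE\nref_1\tEscherichia\tRef\nquery_1\t-\tQuery\n"
EDGES_TSV = "Source\tTarget\nref_1\tquery_1\n"
PRED_TSV = "Accession\tHost\nquery_1\tEscherichia\n"


def read_tsv(text):
    """Parse tab-separated text into a list of dicts keyed by the header row."""
    return list(csv.DictReader(io.StringIO(text), delimiter="\t"))


def add_host_nodes(nodes, edges, predictions):
    """Return copies of nodes and edges extended with host nodes and virus-host edges."""
    nodes = [dict(n) for n in nodes]
    edges = [dict(e) for e in edges]
    known = {n["Accession"] for n in nodes}
    seen_edges = {(e["Source"], e["Target"]) for e in edges}
    for row in predictions:
        virus, host = row["Accession"], row["Host"]
        if not host or host == "-":
            continue  # assumed placeholder: no host predicted for this virus
        if host not in known:
            nodes.append({"Accession": host, "Host": host, "TYPE": "Host"})
            known.add(host)
        if (virus, host) not in seen_edges:
            edges.append({"Source": virus, "Target": host})
            seen_edges.add((virus, host))
    return nodes, edges


nodes, edges = add_host_nodes(
    read_tsv(NODES_TSV), read_tsv(EDGES_TSV), read_tsv(PRED_TSV)
)
```

In this toy input, one host node (Escherichia, TYPE Host) and one virus-host edge (query_1 to Escherichia) are added to the original network.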

Best,
Jiayu

@KennthShang
Owner

Hi @erfanshekarriz ,

I am wondering whether the script works for you.

If so, I will add it to the main program so that you do not need to run this script each time to get the network.

Best,
Jiayu

@erfanshekarriz
Author

Hi Jiayu,

Thanks a lot for your prompt response. I am using the older version of PhaBox, and my output files are in .csv format, so I can't run the code.

I've been trying to update to PhaBox2 using conda but it's consistently giving me errors across all three servers I use. I will make a new issue about the conda installation first and I can test out this script after I've updated it!

Best,

Erfan

@KennthShang
Owner

I see. Please make sure you followed the latest guidelines in the wiki.

You can also try the "Installing phabox2 in primitive ways" instructions. I am glad to look into your issues if PhaBOX cannot be installed from Bioconda.

Best,
Jiayu

@erfanshekarriz
Author

Hi Jiayu,

The code seems to run fine on one dataset I've tried. I will try it out on another larger dataset just to make sure and I'll get back to you soon.

Best,
Erfan

@erfanshekarriz
Author

erfanshekarriz commented Nov 20, 2024

Hi Jiayu!

I've tested it out on the larger dataset I presented last time. It seems that the graphs don't look the same in that the viral nodes connecting to different hosts are no longer connected.

[Image: New CHERRY graph]

I'm unsure whether it's because of algorithmic changes in CHERRY leading to slightly different results or the multimodal.py code.

I was expecting the multimodal graph to have multiple edges between the same pairs of viruses, but all edges seem to be unique.
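
One quick way to check whether an edge list contains multiple edges between the same pair of nodes is to count edges per unordered pair. This is a minimal sketch: the function name `parallel_edge_counts` and the toy edges are hypothetical, and the real edge file would first need to be read into (source, target) tuples.

```python
from collections import Counter


def parallel_edge_counts(edges):
    """Count edges per unordered node pair, so (a, b) and (b, a) are merged."""
    return Counter(tuple(sorted(pair)) for pair in edges)


# Hypothetical toy edge list with one repeated virus-virus pair.
toy_edges = [
    ("phage_A", "phage_B"),
    ("phage_B", "phage_A"),
    ("phage_A", "Escherichia"),
]
counts = parallel_edge_counts(toy_edges)
# Keep only the pairs that are connected by more than one edge.
multi_edges = {pair: n for pair, n in counts.items() if n > 1}
```

If `multi_edges` comes back empty for the real edge file, every pair of nodes is connected by at most one edge.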

Let me know if I'm missing something here!

Best,
Erfan

@KennthShang
Copy link
Owner

Hi Erfan,

There are two possible reasons for this:

  1. The graph used in CHERRY has been updated according to the latest ICTV taxonomy, so the virus-virus edges are much purer than before. According to the RefSeq viral database, viruses that belong to the same genus usually share the same host (this is even more common for bacteriophages). Purer edges can therefore produce a pure subgraph at the genus level and, in turn, a pure host label.
  2. The latest version of CHERRY predicts hosts in three steps: (1) MAG CRISPRs, (2) database CRISPRs, and (3) AAI-based prediction. If your data happens to contain many viruses that can only be predicted by the AAI-based step, and they also have pure edges at the genus level, then the network will look like this.
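
To see whether the second reason applies to a dataset, one could tally which step produced each prediction. The sketch below assumes the prediction table has a column recording the step; the column name `Method` and its values are hypothetical and may not match the actual cherry_prediction.tsv.

```python
from collections import Counter

# Hypothetical prediction rows; "Method" and its labels are assumed, not confirmed.
predictions = [
    {"Accession": "query_1", "Host": "Escherichia", "Method": "AAI-based"},
    {"Accession": "query_2", "Host": "Escherichia", "Method": "AAI-based"},
    {"Accession": "query_3", "Host": "Salmonella", "Method": "AAI-based"},
    {"Accession": "query_4", "Host": "Vibrio", "Method": "Database CRISPR"},
]


def method_fractions(rows):
    """Return the fraction of predictions made by each method."""
    counts = Counter(row["Method"] for row in rows)
    total = sum(counts.values())
    return {method: n / total for method, n in counts.items()}


fractions = method_fractions(predictions)
```

A large AAI-based fraction would be consistent with the explanation above, although it would not by itself rule out the first reason.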

Best,
Jiayu
