Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[BUG] cycle_basis not returning the correct number of cycles #219

Open
shuettel7 opened this issue Feb 2, 2023 · 9 comments
Open

[BUG] cycle_basis not returning the correct number of cycles #219

shuettel7 opened this issue Feb 2, 2023 · 9 comments
Labels
bug Something isn't working

Comments

@shuettel7
Copy link

Description of bug
The cycle_basis without root parameter has a problem with a graph in the shape of the following molecule: https://pubchem.ncbi.nlm.nih.gov/compound/149096
With root, the result is inconsistent depending on the root: Mostly 3 and sometimes the correct 4 cycles smaller than 10, see below.

How to reproduce
The BiochemicalAlgorithms.jl provides a core for loading 3D Conformers from the PubChem .json format.
Download the 3D Conformer via "Download" from: https://pubchem.ncbi.nlm.nih.gov/compound/149096#section=3D-Conformer
Add the BiochemicalAlgorithms.jl package from https://github.com/hildebrandtlab/BiochemicalAlgorithms.jl (the develop branch is sufficient for this purpose). Then in Console load the molecule:
mol_149096 = load_pubchem_json("your/path/here.json")[1]

Then build the graph with the following function:
function build_graph(mol::Molecule)
mol_graph = SimpleGraph(nrow(mol.atoms))
for i = (1:nrow(mol.bonds))
add_edge!(mol_graph, mol.bonds.a1[i], mol.bonds.a2[i])
end
return mol_graph
end

After building the graph of the molecule and naming it mol_graph_CID_149096:
filter(x -> lastindex(x) < 10, cycle_basis(mol_graph_CID_149096))

Expected behavior
filter(x -> lastindex(x) < 10, cycle_basis(mol_graph_CID_149096))
Should return 4 cycles smaller than 10:
[7, 20, 25, 23, 18, 13]
[14, 12, 19, 21, 18, 13]
[9, 15, 8, 16, 10, 6]
[7, 11, 17, 2, 14, 13]

Actual behavior
Only returns 3 cycles smaller than 10:
[13, 14, 12, 19, 21, 18]
[13, 7, 11, 17, 2, 14]
[9, 15, 8, 16, 10, 6]

Code demonstrating bug
see "How to reproduce" section above

Version information
Graphs.jl v1.7.4

Additional context
Add any other context about the problem here.

@shuettel7 shuettel7 added the bug Something isn't working label Feb 2, 2023
@shuettel7 shuettel7 changed the title [BUG] [BUG] cycle_basis not returning the correct number of cycles Feb 2, 2023
@gdalle gdalle added this to the v1.9 milestone Jun 28, 2023
@henrik-wolf
Copy link
Contributor

henrik-wolf commented Aug 12, 2023

From what I can tell this is correct behaviour, as the cycle_basis returns a fundamental set of cycles. Maybe you are after the fundamental set with minimum weight Cycle_basis#Minimum_weight?

Currently graphs does not have support for it, but there seems to be an algorithm for finding them here: Horton, 1987 (at least for some types of graphs. I think something like this would be nice to have in graphs though.)

For completeness sake, here is what I drew to understand the problem:
Screenshot 2023-08-12 at 09 54 31

The cycles I get from graphs are marked in orange. When I draw the green spanning tree from node 1, the red edges are the ones which close the fundamental cycles in the problematic region.

@antonhinneck
Copy link

antonhinneck commented Aug 22, 2023

I have been working with cycle bases for applications in the energy domain and stumbled upon this issue.
As I am reliant on Graphs.jl functioning properly, I tried to reproduce the bug:

Setup:

Code:

cd(@__DIR__)
using BiochemicalAlgorithms
using DataFrames
mol_149096 = load_pubchem_json("Conformer3D_COMPOUND_CID_149096.json")
fieldnames(typeof(mol_149096))

function build_graph(mol)
    mol_graph = SimpleGraph(nrow(mol._atoms))
    for i = (1:nrow(mol._bonds))
    add_edge!(mol_graph, mol._bonds.a1[i], mol._bonds.a2[i])
    end
    return mol_graph
end

g = build_graph(mol_149096)
cycle_basis(g)

Result:

4-element Vector{Vector{Int64}}:
 [14, 15, 13, 20, 22, 19]
 [14, 8, 12, 18, 3, 15]
 [21, 8, 12, 18, 3, 15, 13, 20, 22, 19, 24, 26]
 [10, 16, 9, 17, 11, 7]

The result supports the findings of @SuperGrobi.
Its correctness can be confirmed based on @SuperGrobi's plot.
The behaviour is as expected.
The cycle basis

  • contains 4 independent cycles that
  • are not necessarily minimum-weight.

For a minimum-weight cycle basis check out:

[1] Kavitha, Telikepalli, et al. “An O(m^2n) Algorithm for Minimum Cycle Basis of Graphs.” http://link.springer.com/article/10.1007/s00453-007-9064-z

[2] de Pina, J. 1995. Applications of shortest path methods. Ph.D. thesis, University of Amsterdam, Netherlands

Such algorithms are implemented in networkX.


I recently found a Graphs.SimpleGraph-compatible Julia implementation of a minimum-weight cycle basis algorithm here:
https://github.com/mojaie/MolecularGraph.jl/blob/master/src/graph/cycle.jl

I could open a PR to include a function min_cycle_basis in Graphs.jl, including unit tests against networkX.

@shuettel7
Copy link
Author

Thank you for your input! You're probably correct that it is not MEANT to find the smallest cycles. I will check out the literature you posted.

@henrik-wolf
Copy link
Contributor

henrik-wolf commented Aug 24, 2023

@antonhinneck thanks for checking. I think having minimum_cycle_basis in Graphs would indeed be nice. It will definitely help to get rid of the confusion @shuettel7 (and myself as well) had regarding the functionality of cycle_basis. Maybe get in contact with the MolecularGraph.jl maintainers about migrating/merging their implementation into graphs? There is some danger in having these functions implemented multiple times across the whole (extended) ecosystem.

@etiennedeg
Copy link
Member

I think this is indeed a bug, on the graph SuperGrobi plotted, any cycle basis should contains 4 cycles. No matter what, the size of the basis should not depend on the root vertex.

@henrik-wolf
Copy link
Contributor

The cycles in the original post seem to be the result of calling filter(x -> lastindex(x) < 10, cycle_basis(mol_graph_CID_149096)) so I think it does return all the cycles needed, but sometimes, one of them is longer than 9 elements.

@etiennedeg
Copy link
Member

etiennedeg commented Aug 25, 2023

Oh, correct, I read to quickly, so the issue is indeed the lack of a minimum_cycle_basis

Edit: The problem we face here is more exactly finding a cycle basis with a shortest maximal cycle

@gdalle gdalle removed this from the v1.9 milestone Sep 14, 2023
@mojaie
Copy link

mojaie commented Nov 12, 2023

Thank you for mentioning me in this issue, and I apologize for my very late reply.
I'm open to add this functionality to Graphs.jl and can make PR, but one concern is the feature seems to be very specific to the chemoinformatics tasks. You are working on compatibility with NetworkX, so in that sense I would of course welcome it.
As MolecularGraph.minimum_cycle_basis does not consider edge weight (all edges should be weight=1), we may need improvement for more general purposes.

@antonhinneck
Copy link

antonhinneck commented Nov 13, 2023

I also noticed that something has to be done regarding edge weights. Based on them the shortest-path method has to be changed.

  1. If no weights are provided or all weights are equal, bfs can be backtracked as is implemented.
  2. If weights differ between edges and are positive Dijkstra may be used (which is already implemented in Graphs.jl, i think, and could be integrated).
  3. If weights differ between edges and are not necessarily positive, Bellman-Ford? (which I also saw in Graphs.jl)

Point 1. definitely also covers problems in power systems and would be a great start.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
No open projects
Status: Todo
Development

No branches or pull requests

6 participants