Implement canonicalization of components / subgraphs #47
Comments
Is the handling of graphs with disconnected parts still a problem with the bliss algorithm?
Otherwise, I have some very simple code to detect the individual fragments and assign a "fragment number" as a node attribute. First, create a global "graph attribute" to keep the number of fragments (this is, by the way, also how to store other attributes that are not attached to particular nodes or edges):
But of course, this can only be done after the graph has been created. Then, the following code will detect the fragments and update the global graph attribute:
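The snippets from this comment did not survive, so here is a minimal sketch of what such a fragment detection could look like with plain networkx. It is not the original code: the attribute names `n_fragments` and `fragment` and the helper `label_fragments` are assumptions chosen for illustration.

```python
import networkx as nx


def label_fragments(graph: nx.Graph) -> int:
    """Write a 1-based fragment number to every node and the count to the graph."""
    n_fragments = 0
    # connected_components yields one set of nodes per disconnected part.
    for n_fragments, component in enumerate(nx.connected_components(graph), start=1):
        for node in component:
            graph.nodes[node]["fragment"] = n_fragments  # hypothetical node attribute
    graph.graph["n_fragments"] = n_fragments  # hypothetical graph-level attribute
    return n_fragments


# Two disconnected parts: a three-node chain and a single edge.
g = nx.Graph()
g.add_edges_from([(0, 1), (1, 2), (3, 4)])
count = label_fragments(g)
fragments = nx.get_node_attributes(g, "fragment")
```

The numbering here simply follows the order in which networkx reports the components, so it depends on the node order of the input graph.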
Still, together with a "ring membership", the "fragment number" might serve as an additional invariant. I think it really is an invariant, since it is tied only to the topology of the graph, but the numbering might depend on the order in which the fragments first appear in the molfile. Consider NaCl as one of the simplest two-fragment systems, assuming the two atoms are ionic and there is no edge between them. Then NaCl vs. ClNa could give different fragment numbers depending on the input order. So ring and fragment numbers would also need to be canonicalized somehow, ideally before the node index numbers, which is a kind of chicken-and-egg problem, right?
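The input-order dependence can be reproduced with a small sketch. It assumes atoms are stored as integer nodes with a hypothetical `element` attribute and that fragments are numbered in the order networkx reports them; neither is confirmed as the project's actual representation.

```python
import networkx as nx


def first_fragment_element(elements):
    """Build an edgeless graph in input order and return the element of fragment 1."""
    g = nx.Graph()
    for index, symbol in enumerate(elements):
        g.add_node(index, element=symbol)  # "element" is an assumed attribute name
    # Each ion is its own component; take the first one networkx reports.
    first_component = next(nx.connected_components(g))
    (node,) = first_component
    return g.nodes[node]["element"]


nacl = first_fragment_element(["Na", "Cl"])
clna = first_fragment_element(["Cl", "Na"])
```

With naive numbering, fragment 1 is Na for the first input and Cl for the second, although both describe the same compound.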
Do we actually need the fragment detection? Only for fragment-based charges, right?
I think a meaningful way to number fragments would be to base it on the lowest (canonical) node index appearing in each fragment. Examples:
Canonical vs. non-canonical TUCAN strings:
import networkx as nx
from tucan.io import graph_from_tucan
m1 = graph_from_tucan("C2H8/(1-10)(2-10)(3-10)(4-10)(5-9)(6-9)(7-9)(8-9)/(8:mass=2)(9:mass=13)")
m2 = graph_from_tucan("C2H8/(1-9)(2-9)(3-9)(4-9)(5-10)(6-10)(7-10)(8-10)/(8:mass=2)(10:mass=13)")
# key=min orders the fragments by their lowest node index; without a key,
# sorted() compares sets by the subset relation, which does not order disjoint sets.
print(sorted(nx.connected_components(m1), key=min)) # [{0, 1, 2, 3, 9}, {4, 5, 6, 7, 8}] ... 13C is in fragment 2
print(sorted(nx.connected_components(m2), key=min)) # [{0, 1, 2, 3, 8}, {4, 5, 6, 7, 9}] ... 13C is still in fragment 2
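The numbering rule proposed above (order fragments by their lowest node index) could be wrapped in a small helper. This is only a sketch: `number_fragments_by_lowest_index` is a hypothetical name, and the integer node labels stand in for canonical indices that would come from the canonicalization step.

```python
import networkx as nx


def number_fragments_by_lowest_index(graph: nx.Graph) -> dict:
    """Return {node: fragment number}, fragments ordered by their lowest node label."""
    # key=min makes the ordering explicit and independent of node insertion order.
    components = sorted(nx.connected_components(graph), key=min)
    return {
        node: number
        for number, component in enumerate(components, start=1)
        for node in component
    }


# Edges are deliberately added so that the fragment containing node 0 is seen last.
g = nx.Graph()
g.add_edges_from([(4, 8), (5, 8), (0, 9), (1, 9)])
numbering = number_fragments_by_lowest_index(g)
```

Because the order depends only on the node labels, two graphs with identical canonical labels get identical fragment numbers regardless of how the nodes and edges were inserted.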
Here's an example where fragment indices have to change: Now, imagine the carbon with node index 10 is charged instead of having an isotope mass, and we introduce fragment-based charges in the TUCAN string. Then, in order to get the same compound from the parser, the TUCAN strings have to look as follows: