Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Implement canonicalization of components / subgraphs #47

Open
JanCBrammer opened this issue Mar 18, 2022 · 6 comments
Open

Implement canonicalization of components / subgraphs #47

JanCBrammer opened this issue Mar 18, 2022 · 6 comments

Comments

@JanCBrammer
Copy link
Collaborator

No description provided.

@JanCBrammer JanCBrammer self-assigned this Mar 21, 2022
@schatzsc
Copy link
Collaborator

Is the handling of graphs with disconnected parts still a problem with the bliss algorithm?

@schatzsc
Copy link
Collaborator

schatzsc commented Jun 23, 2022

Otherwise, I have a very simple code to detect the individual fragments and assign a "fragment number" as a node attribute.

First, create a global "graph attribute" to keep the number of fragments (this is btw also the way to store other attributes that are not attached to particular nodes or edges):

global_parameters.append(['number_of_fragments',0])

But of course can only be done after creating the graph with graph = nx.Graph()

Then, the following code will detect the fragments, update the global number of fragmentsattribute and assign each node afragment` attribute:

def detect_fragments(graph):
number_of_fragments = 1
for fragment in sorted(nx.connected_components(graph), key=len, reverse=True):
for atom in fragment:
attrs = {atom: {"fragment": number_of_fragments}}
nx.set_node_attributes(graph, attrs) (end of inner for loop)
number_of_fragments += 1 (end of outer for loop)
graph.graph["number_of_fragments"] = number_of_fragments-1
return graph

@schatzsc
Copy link
Collaborator

If detect_fragment() is run after bliss the fragment numbers should also be canonical, right?

Still, together with a "ring membership", the "fragment number" might serve as an additional invariant. I think it's really an invariant since it is only tied to the topology of the graph, but the numbering might depend on the way the fragments appear first in the molfile.

Consider NaCl as one of the most simple two-fragment systems, assuming the two atoms are ionic and there is no edge between them. Than NaCl vs. ClNa would possibly give different fragment numbers depending on the input order. So ring and fragment numbers somehow would also need to be canonicalized, best before the node index numbers, which is kind of a "hen-egg problem", right?

@schatzsc
Copy link
Collaborator

schatzsc commented Jul 4, 2022

Do we need the fragment detection actually? Only for fragment-based charges, right?

@flange-ipb
Copy link
Collaborator

I think, a meaningful way for fragment numbering would be based on the lowest (canonical) node index appearing in each fragment.

Examples:

  • For NaCl mentioned above the sodium ion has the index 1 and the chloride ion has the index 2. Then fragment 1 consists of the Na ion and fragment 2 has the Cl ion.
  • For Zeise's salt (with potassium cation) the canonical TUCAN string is C2H4Cl3KPt/(1-5)(2-5)(3-6)(4-6)(5-6)(5-11)(6-11)(7-11)(8-11)(9-11) where the hydrogens take the indices 1 to 4 and the potassium has the index 10. Now, fragment 1 would be C2H4Cl3Pt (lowest node index is 1) and fragment 2 would be K (lowest node index is 10).

Canonical vs. non-canonical TUCAN strings:
The parser can processes non-canonical TUCANs correctly, but the node indices in the graph returned by graph_from_tucan are not in canonical order. If the fragment index would be part of the TUCAN string (e.g. via fragment charges), then it should refer to the non-canonical node indices.
I'm still trying to elaborate an example where the fragment indices have to change for one and the same compound. Not sure if that's even possible.
My play code:

    import networkx as nx
    from tucan.io import graph_from_tucan

    m1 = graph_from_tucan("C2H8/(1-10)(2-10)(3-10)(4-10)(5-9)(6-9)(7-9)(8-9)/(8:mass=2)(9:mass=13)")
    m2 = graph_from_tucan("C2H8/(1-9)(2-9)(3-9)(4-9)(5-10)(6-10)(7-10)(8-10)/(8:mass=2)(10:mass=13)")
    print(sorted(nx.connected_components(m1)))  # [{0, 1, 2, 3, 9}, {4, 5, 6, 7, 8}] ... 13C is in fragment 2
    print(sorted(nx.connected_components(m2)))  # [{0, 1, 2, 3, 8}, {4, 5, 6, 7, 9}] ... 13C is still in fragment 2

@flange-ipb
Copy link
Collaborator

Here's an example where fragment indices have to change:
(1) C2H8/(1-9)(2-9)(3-9)(4-9)(5-10)(6-10)(7-10)(8-10)/(10:mass=13) -> [{0, 1, 2, 3, 8}, {4, 5, 6, 7, 9}] (13C is in fragment 2)
(2) C2H8/(1-10)(2-10)(3-10)(4-10)(5-9)(6-9)(7-9)(8-9)/(10:mass=13) -> [{0, 1, 2, 3, 9}, {4, 5, 6, 7, 8}] (13C is in fragment 1)

Now, imagine the carbon with node index 10 is charged instead of having an isotope mass and we introduce fragment-based charges in the TUCAN string. Then, in order to get the same compound from the parser, the TUCAN strings have to look the following way:
(1) C2H8/(1-9)(2-9)(3-9)(4-9)(5-10)(6-10)(7-10)(8-10)/(F2:chg=1)
(2) C2H8/(1-10)(2-10)(3-10)(4-10)(5-9)(6-9)(7-9)(8-9)/(F1:chg=1)

@JanCBrammer JanCBrammer removed their assignment Nov 14, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants