Implement canonicalization of components / subgraphs #47

JanCBrammer · 2022-03-18T09:00:57Z

No description provided.

schatzsc · 2022-06-23T22:47:34Z

Is the handling of graphs with disconnected parts still a problem with the bliss algorithm?

schatzsc · 2022-06-23T22:53:07Z

Otherwise, I have a very simple code to detect the individual fragments and assign a "fragment number" as a node attribute.

First, create a global "graph attribute" to keep the number of fragments (this is btw also the way to store other attributes that are not attached to particular nodes or edges):

global_parameters.append(['number_of_fragments',0])

But of course can only be done after creating the graph with graph = nx.Graph()

Then, the following code will detect the fragments, update the global number of fragmentsattribute and assign each node afragment` attribute:

def detect_fragments(graph):
number_of_fragments = 1
for fragment in sorted(nx.connected_components(graph), key=len, reverse=True):
for atom in fragment:
attrs = {atom: {"fragment": number_of_fragments}}
nx.set_node_attributes(graph, attrs) (end of inner for loop)
number_of_fragments += 1 (end of outer for loop)
graph.graph["number_of_fragments"] = number_of_fragments-1
return graph

schatzsc · 2022-06-23T23:05:56Z

If detect_fragment() is run after bliss the fragment numbers should also be canonical, right?

Still, together with a "ring membership", the "fragment number" might serve as an additional invariant. I think it's really an invariant since it is only tied to the topology of the graph, but the numbering might depend on the way the fragments appear first in the molfile.

Consider NaCl as one of the most simple two-fragment systems, assuming the two atoms are ionic and there is no edge between them. Than NaCl vs. ClNa would possibly give different fragment numbers depending on the input order. So ring and fragment numbers somehow would also need to be canonicalized, best before the node index numbers, which is kind of a "hen-egg problem", right?

schatzsc · 2022-07-04T20:41:43Z

Do we need the fragment detection actually? Only for fragment-based charges, right?

flange-ipb · 2022-11-11T09:45:19Z

I think, a meaningful way for fragment numbering would be based on the lowest (canonical) node index appearing in each fragment.

Examples:

For NaCl mentioned above the sodium ion has the index 1 and the chloride ion has the index 2. Then fragment 1 consists of the Na ion and fragment 2 has the Cl ion.
For Zeise's salt (with potassium cation) the canonical TUCAN string is C2H4Cl3KPt/(1-5)(2-5)(3-6)(4-6)(5-6)(5-11)(6-11)(7-11)(8-11)(9-11) where the hydrogens take the indices 1 to 4 and the potassium has the index 10. Now, fragment 1 would be C2H4Cl3Pt (lowest node index is 1) and fragment 2 would be K (lowest node index is 10).

Canonical vs. non-canonical TUCAN strings:
The parser can processes non-canonical TUCANs correctly, but the node indices in the graph returned by graph_from_tucan are not in canonical order. If the fragment index would be part of the TUCAN string (e.g. via fragment charges), then it should refer to the non-canonical node indices.
I'm still trying to elaborate an example where the fragment indices have to change for one and the same compound. Not sure if that's even possible.
My play code:

    import networkx as nx
    from tucan.io import graph_from_tucan

    m1 = graph_from_tucan("C2H8/(1-10)(2-10)(3-10)(4-10)(5-9)(6-9)(7-9)(8-9)/(8:mass=2)(9:mass=13)")
    m2 = graph_from_tucan("C2H8/(1-9)(2-9)(3-9)(4-9)(5-10)(6-10)(7-10)(8-10)/(8:mass=2)(10:mass=13)")
    print(sorted(nx.connected_components(m1)))  # [{0, 1, 2, 3, 9}, {4, 5, 6, 7, 8}] ... 13C is in fragment 2
    print(sorted(nx.connected_components(m2)))  # [{0, 1, 2, 3, 8}, {4, 5, 6, 7, 9}] ... 13C is still in fragment 2

flange-ipb · 2022-11-11T10:10:44Z

Here's an example where fragment indices have to change:
(1) C2H8/(1-9)(2-9)(3-9)(4-9)(5-10)(6-10)(7-10)(8-10)/(10:mass=13) -> [{0, 1, 2, 3, 8}, {4, 5, 6, 7, 9}] (¹³C is in fragment 2)
(2) C2H8/(1-10)(2-10)(3-10)(4-10)(5-9)(6-9)(7-9)(8-9)/(10:mass=13) -> [{0, 1, 2, 3, 9}, {4, 5, 6, 7, 8}] (¹³C is in fragment 1)

Now, imagine the carbon with node index 10 is charged instead of having an isotope mass and we introduce fragment-based charges in the TUCAN string. Then, in order to get the same compound from the parser, the TUCAN strings have to look the following way:
(1) C2H8/(1-9)(2-9)(3-9)(4-9)(5-10)(6-10)(7-10)(8-10)/(F2:chg=1)
(2) C2H8/(1-10)(2-10)(3-10)(4-10)(5-9)(6-9)(7-9)(8-9)/(F1:chg=1)

JanCBrammer self-assigned this Mar 21, 2022

JanCBrammer mentioned this issue Sep 23, 2022

Parser for TUCAN strings #74

Merged

JanCBrammer removed their assignment Nov 14, 2022

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Implement canonicalization of components / subgraphs #47

Implement canonicalization of components / subgraphs #47

JanCBrammer commented Mar 18, 2022

schatzsc commented Jun 23, 2022

schatzsc commented Jun 23, 2022 •

edited

Loading

schatzsc commented Jun 23, 2022

schatzsc commented Jul 4, 2022

flange-ipb commented Nov 11, 2022

flange-ipb commented Nov 11, 2022

Implement canonicalization of components / subgraphs #47

Implement canonicalization of components / subgraphs #47

Comments

JanCBrammer commented Mar 18, 2022

schatzsc commented Jun 23, 2022

schatzsc commented Jun 23, 2022 • edited Loading

schatzsc commented Jun 23, 2022

schatzsc commented Jul 4, 2022

flange-ipb commented Nov 11, 2022

flange-ipb commented Nov 11, 2022

schatzsc commented Jun 23, 2022 •

edited

Loading