Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Web-based demo version #67

Open
JanCBrammer opened this issue Jul 27, 2022 · 24 comments
Open

Web-based demo version #67

JanCBrammer opened this issue Jul 27, 2022 · 24 comments
Assignees
Labels

Comments

@JanCBrammer
Copy link
Collaborator

Having a website (e.g., SPA) for demo purposes would increase accessibility a lot. Currently, the accessibility of TUCAN is too limited, since the package needs to be installed in a local Python environment.

@flange-ipb
Copy link
Collaborator

flange-ipb commented Jul 28, 2022

My investigations so far:
TUCAN cannot be used via the PyScript/Pyodide ecosystem at the moment. Pyodide requires architecture-independent ("pure") wheels, which are not present for the igraph dependency. TUCAN itself can be built as pure wheel. The ecosystem is still heavily under development, so I expect future improvements.

@rapodaca has written several blog articles and GH-repos about porting the InChI C-library to JavaScript and WebAssembly:

Thus, I'll try to convert TUCAN to C as a next step (maybe with Cython) ...

@rapodaca
Copy link

rapodaca commented Jul 28, 2022

A possibly less suitable option would be to run TUCAN on a server and expose the interface through a browser.

See also Running InChI Anywhere with WebAssembly for an overview of efforts around InChI and Wasm.

@schatzsc
Copy link
Collaborator

Frank - what is still missing for the web demo? Checked the last version with Ketcher as editor and it looked pretty much complete to me ...

@flange-ipb
Copy link
Collaborator

The feedback from our project team was quite positive, so I just moved the demo website to https://tucan-nest.github.io by renaming the repository.

This covers the following use-cases:

  • The user draws a chemical structure and generates a TUCAN string.
  • The user inputs a Molfile into a text field (by copy/paste or by drag&drop) and generates a TUCAN string.

Other use-cases that will be added as soon as TUCAN supports it:

  • The user inputs a TUCAN string and draws a chemical structure from it.
  • The user inputs a TUCAN string and generates a Molfile from it.
  • The user should receive a warning if the input was a non-canonical TUCAN string.

@schatzsc
Copy link
Collaborator

So can we promote this in the public now or you would like to wait for some more time for feedback?

@flange-ipb
Copy link
Collaborator

Yes, it can be promoted.

@flange-ipb
Copy link
Collaborator

flange-ipb commented Sep 21, 2022

A few words on the technical background:
The application uses Pyodide as Python runtime environment in the browser. The key improvement of Pyodide a few weeks ago was that it can load Python wheels built specifically for the WebAssembly (WASM) platform and not just pure wheels.

igraph wheel:
igraph is the only dependency of TUCAN that includes C code and needs to be ported to the WASM platform. This was done with great support from igraph's developers. At the moment, the build of igraph-0.9.11-cp310-cp310-emscripten_3_1_14_wasm32.whl isn't automated and there are no package repositories that accept WASM wheels, so I deposited the file in the repo.

TUCAN wheel:
As mentioned earlier, the TUCAN library can be built as pure wheel. tucan-0.1.0-py2.py3-none-any.whl is also deposited in the the repo.

Glue code:
Most of the logic (initialization, DOM-manipulation, events) is implemented with JavaScript and Python is only invoked when converting Molfile to TUCAN. The molfile_to_tucan function is returned as function pointer in the last line of tucanweb.py and becomes accessible as proxy object molfile_to_tucan in JavaScript. This pattern should be extensible by returning a tuple of function pointers.

Chemical structure editor:
We decided to switch from OpenChemLib JS to Ketcher2 for better usability.
Using Ketcher poses a few challenges:

  • Ketcher is a React component and I had to assemble and deposit a simple React app in order to include it in this non-React application. I didn't find any universal build (aka UMD) released by the Ketcher developers.
  • Ketcher's operations on V3000 Molfiles are very slow and CPU-intensive as compared to handling V2000 Molfiles. (I opened an issue in the Ketcher repo.)

Implicit hydrogens:
Like most structure editors Ketcher does not offer any functionality to add implicit hydrogens in the exported Molfiles. At the moment this is done using one of OpenChemLib's functionalities.

@schatzsc
Copy link
Collaborator

BTW, just the repository or also the demo website?

@flange-ipb
Copy link
Collaborator

BTW, just the repository or also the demo website?

Do you mean what should be promoted? I think just the URL https://tucan-nest.github.io. People interested in the website's source code will be clever enough to find the repository because GitHub pages always follows the same pattern.

@schatzsc
Copy link
Collaborator

Ok, it was not clear to me that the running demo is also included there. Then I will start to promote the link that you included above on social media later today.

@JanCBrammer
Copy link
Collaborator Author

@flange-ipb, can we close this issue?

@flange-ipb
Copy link
Collaborator

Not yet. The features related to TUCAN-to-Molfile conversion are still missing and we may need some discussions on the web design.
graph_to_molfile is the last piece from the TUCAN library required for this and I'll PR it in a few minutes.

@schatzsc
Copy link
Collaborator

Just found another problem - in some cases the coordinate generation results in very small (< 10e-7) values which makes little sense. Therefore, I would like to suggest to set coordinate values to zero if initial values is below a certain threshold, best < 0.0001 since most experimental and theoretical methods cannot provide more accurate values.

tc

@flange-ipb
Copy link
Collaborator

Thanks for pointing this out. Indeed, I incorrectly use the format .6g to write floating-point numbers in the Molfile writer, which switches to between fixed-point and scientific notation depending on the magnitude.
Using .6f gives:

M  V30 1 H 0.866027 -0.500000 0.000000 0
M  V30 2 H -0.000000 1.000000 0.000000 0
M  V30 3 H -0.866027 -0.500000 0.000000 0
M  V30 4 O -0.000000 -0.000001 0.000000 0

It adds trailing zeros (you can also see this in Molfiles exported by ChemDraw). The "negative zero" is a bit flawed.

@schatzsc
Copy link
Collaborator

I think that does not matter that much.

@rapodaca
Copy link

rapodaca commented Oct 12, 2022

The spec is silent on the issue of number format for coordinates. It doesn't follow the "Fortran format" convention for decimal values in the header. Still, no example in any documentation I've founded uses exponential notation. So the V3K library I'm working on Trey, will report an error if given exponential notation. Other tools may do the same.

@schatzsc
Copy link
Collaborator

schatzsc commented Oct 13, 2022

I'm not that familiar with the code that Frank Lange (flange-ipb) has contributed but the calc_coordinates are derived in cases where no coordinates are initially available, as in TUCAN-to-molfile conversion. This is due to the fact that the TUCAN string by default does not include any coordinates, only reflects the topology, not exact 2D/3D arrangement of nodes/atoms (although one might define coordinates as "custom node attributes" but that's not part of the main implementation).

Might come from ketcher editor?

Still, even if it is just for more pleasant reading by humans, I'd prefer not to have the exponential notation. Normal high-quality low-temperature X-ray diffraction gives coordinates to three relevant digits behind the comma. DFT will provide more digits but since they cannot be calibrated against experimental data of that accuracy are more or less useless.

The .6f notation that Frank has now switched to is already total overkill for experimentalists. I hope the leading zero does not cause any problems. Python will interpret -0 as int and -0.0 as float with abs(-0.0) converting to 0.0

coords

@flange-ipb
Copy link
Collaborator

Might come from ketcher editor?

No, in case calc_coordinates is used, all coordinates are overwritten by those generated by the Kamada-Kawai layout algorithm.

It's interesting to see how others implemented coordinate serialization:

  • rdkit and CDK use a fixed-point output format and handle coordinates as IEEE-754 doubles.
  • OpenBabel seems to output doubles with .6g.

@schatzsc
Copy link
Collaborator

Thank you very much for checking how rdkit and CDK do this.

Could you please also link to the part of the TUCAN code where the Kamada-Kawai coordinates are collected and then passed on to molfile generation?

In visualization.py there is def _draw_networkx_graph_3d(m, highlight, labels, fig, col): which uses coords = nx.kamada_kawai_layout(m, dim=3) to set the coordinates but I don't find the molfile writer!?!

@flange-ipb
Copy link
Collaborator

That's not yet merged. See my pull request.

@schatzsc
Copy link
Collaborator

That's too many changes - I better leave it to @JanCBrammer to approve and merge ;-)

But now I got it:

coords = nx.kamada_kawai_layout(graph, dim=2)

Really just directly coming from the NetworkX Kamada-Kawai routine.

In case somebody is interested in the original paper, it is here:

https://dx.doi.org/10.1016/0020-0190(89)90102-6

T. Kamada, S. Kawai, An algorithm for drawing general undirected graphs, Information Processing Letters, 1989, 31, 7-15

@schatzsc
Copy link
Collaborator

schatzsc commented Nov 3, 2022

In the web demo, when using Convert TUCAN to structure or Convert TUCAN to molfile the following input generates an error This is not a canonical TUCAN string. which is a correct statement:

C6H14N2/(22-20)(21-19)(20-19)(19-18)(18-17)(17-16)(16-15)(15-20)(1-22)(2-22)(3-21)(4-21)(5-20)(6-21)(7-18)(8-18)(9-17)(10-17)(11-16)(12-16)(13-15)(14-15)

Still, it would be possible to convert this to a graph and then generate a canonical TUCAN from it for further processing. Is it intended behaviour here to terminate with an error message or shouldn't there at least also be a function Convert TUCAN to canonical TUCAN? The only requirement for that is that the node labels are not out of range.

@flange-ipb
Copy link
Collaborator

Good point!
I already calculate the canonical TUCAN for comparison with the TUCAN in the input field, so it could be shown in the yellow warning box:

Screenshot

@schatzsc
Copy link
Collaborator

schatzsc commented Nov 4, 2022

Since this functionality is already around - do you think it would be useful to have a separate "generic TUCAN to canonical TUCAN" option, where you can type in any TUCAN and then get the canonical one out? Such a string-to-string conversion is currently missing ...

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

4 participants