Generalisation of the idea #13
Comments
Oh, and of course you can have relationships with external documents or parts of external documents, as in HTML. Loading these in would be an option for the layout engine and done on-demand. |
We'd want potentially 1-many, many-1 and many-many relationships. This should be straightforward. |
I really like the idea of embracing the graph structure, it makes a lot of sense and we can get a lot of flexibility by annotating edges in the graph. This is probably the ideal intermediary format in terms of possible future functionality. But, I don't think we should encode the graph in HTML. Sure there are My preferred way to do this would be to store the graph in a suitable format that is lightweight and has no other strings attached. JSON can work for it, or maybe there's a data format that is more suited to graph data. Then we can convert input LaTeX/whatever into graph structure, and consume the graph structure in the front-end app.
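A minimal sketch of what such a JSON graph could look like, written in TypeScript. All type and field names here (`PaperNode`, `PaperEdge`, `preview`, `danglingEndpoints`) are assumptions made for illustration, not an agreed format. Because edges are stored as a flat list, one-to-many, many-to-one and many-to-many relationships need no special handling.

```typescript
// Hypothetical shapes for the intermediary format; none of these names are agreed.
type NodeKind = "paragraph" | "figure";

interface PaperNode {
  id: string;
  kind: NodeKind;
  content: string;
}

interface PaperEdge {
  from: string;
  to: string;
  // Optional annotation on the edge, e.g. how the target should be previewed.
  preview?: string;
}

interface PaperGraph {
  nodes: PaperNode[];
  edges: PaperEdge[];
}

// Returns the ids used by edges that do not match any node in the graph.
function danglingEndpoints(graph: PaperGraph): string[] {
  const known: Record<string, boolean> = {};
  for (const node of graph.nodes) {
    known[node.id] = true;
  }
  const missing: string[] = [];
  for (const edge of graph.edges) {
    for (const id of [edge.from, edge.to]) {
      if (!known[id] && missing.indexOf(id) === -1) {
        missing.push(id);
      }
    }
  }
  return missing;
}

// Placeholder content; only the ids mirror the ones used in this thread.
const exampleGraph: PaperGraph = {
  nodes: [
    { id: "results1", kind: "paragraph", content: "Results paragraph" },
    { id: "fig1", kind: "figure", content: "Figure 1" },
    { id: "methods1", kind: "paragraph", content: "Methods paragraph" },
  ],
  edges: [
    { from: "results1", to: "fig1", preview: "fullpreview" },
    { from: "results1", to: "methods1" },
  ],
};

// Plain data survives a JSON round trip, so any front end can consume it.
const serialized = JSON.stringify(exampleGraph);
const restored = JSON.parse(serialized) as PaperGraph;
```

A converter would produce `exampleGraph`-style data, and each front end would start from `JSON.parse`.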
I think the general paradigm of The hard part is to convert LaTeX/other formats into this graph structure.
In theory people could simply highlight/annotate a PDF to produce the graph structure. So users could take existing papers, and convert them to the graph structure. It could even be a crowd-sourced activity, maybe? |
In other words, HTML is good for modeling graphs when each document is a node in the graph and |
I agree that HTML isn't precisely designed to do what we're thinking about, but it can be done I think without too much cruft (example below) and has one outstanding advantage. This is a tech that already exists, with millions of documents and converters already written. Friction will be very low if we build on HTML, but much higher if we invent our own tech. Here's an example of how it could look (although this could obviously be much improved):
The idea of the above is that you initially only see the paragraph results1, but that both fig1 and methods1 will be related items that behave in slightly different ways. fig1 will automatically be fully shown (fullpreview class) whereas methods1 doesn't specify, so if there is plenty of space it could be a full preview, but if space on the screen is limited it could be collapsed to a short preview or even just an icon. Reverse links can be inferred, e.g. when you're reading methods1 you can see the relationships to results1 and figure1 too. This to me looks like low overhead, no? And can be generalised to do whatever we want and - crucially - already works even if you don't have the layout engine! |
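A rough sketch of how reverse links might be inferred, in TypeScript. It assumes the declared relationships have already been read out of the document as simple `from`/`to` pairs; the `Relation` type, the `relatedItems` function and the two declared pairs are hypothetical and only illustrate the idea.

```typescript
// A relationship as declared in the document (hypothetical shape).
interface Relation {
  from: string;
  to: string;
}

// Builds a lookup from each item id to every item it is related to,
// following links in both directions.
function relatedItems(relations: Relation[]): Record<string, string[]> {
  const related: Record<string, string[]> = {};
  const link = (a: string, b: string): void => {
    const list = related[a] || [];
    if (list.indexOf(b) === -1) {
      list.push(b);
    }
    related[a] = list;
  };
  for (const relation of relations) {
    link(relation.from, relation.to); // forward link, as written in the document
    link(relation.to, relation.from); // inferred reverse link
  }
  return related;
}

// Assumed declarations: results1 points at fig1 and methods1.
const declared: Relation[] = [
  { from: "results1", to: "fig1" },
  { from: "results1", to: "methods1" },
];
const lookup = relatedItems(declared);
```

With these assumed declarations, `lookup["methods1"]` contains `results1` even though only the forward link was written down.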
We would still need to build tech to add data-attributes and classes to the HTML document, right? Presumably that would involve constructing an in-memory graph representation of the paper and then using that graph representation to annotate the various HTML elements? If that's the case, maybe we could be clever about this and split the system up into distinct parts: an algorithm to ingest an HTML paper (or other formats) and parse it into a graph structure stored in JSON, and then use that JSON graph format as input to a) an algorithm to mark up the HTML doc with classes etc. like you suggest, b) a React app which can use it to construct a more dynamic and scalable version of the app (React can't take HTML as input; it needs JSON data), and c) a mobile app, for example. That way, we can explore various approaches based on a common data format (JSON), instead of tying ourselves to HTML-only.
We don't have to start from the ground up. We can use Pandoc to convert to JATS/similar, and then parse that to generate a graph structure in JSON. I don't think it's that much work to parse JATS into a graph representation, and we'd probably need to do something similar for the HTML annotation tech anyway, right? I also don't mind doing the heavy lifting on that part; this is my bread and butter after all.
It might be low overhead in the short term, but once the project scales I think we'll regret not using a portable data format for representing the paper. If we decide at some stage that we want to switch to a React app or build a mobile app, it will be much harder to do so because we can't transfer any of the HTML-specific tech over. The paradigm of parsing an input document into a suitable intermediary data format, and then passing that data format into various front-end applications is powerful and flexible.
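To make option a) concrete, here is a small TypeScript sketch that takes a graph and emits annotated HTML as a string. The `data-related` attribute, the `MarkupGraph` types and the function names are assumptions for illustration; annotating an existing HTML file in place would additionally need an HTML parser, which is left out here.

```typescript
// Hypothetical graph shapes, kept minimal for this sketch.
interface MarkupNode {
  id: string;
  tag: "p" | "figure";
  text: string;
}

interface MarkupEdge {
  from: string;
  to: string;
}

interface MarkupGraph {
  nodes: MarkupNode[];
  edges: MarkupEdge[];
}

// Escapes the characters that are unsafe in HTML text and attribute values.
function escapeHtml(text: string): string {
  return text
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;");
}

// Emits one element per node; outgoing edges become a space-separated
// list of target ids in an assumed `data-related` attribute.
function renderAnnotatedHtml(graph: MarkupGraph): string {
  const lines: string[] = [];
  for (const node of graph.nodes) {
    const targets: string[] = [];
    for (const edge of graph.edges) {
      if (edge.from === node.id) {
        targets.push(edge.to);
      }
    }
    const related =
      targets.length > 0 ? ` data-related="${escapeHtml(targets.join(" "))}"` : "";
    lines.push(
      `<${node.tag} id="${escapeHtml(node.id)}"${related}>${escapeHtml(node.text)}</${node.tag}>`
    );
  }
  return lines.join("\n");
}

// Placeholder content; only the ids mirror the ones used in this thread.
const markupExample: MarkupGraph = {
  nodes: [
    { id: "results1", tag: "p", text: "Results paragraph" },
    { id: "fig1", tag: "figure", text: "Figure 1" },
    { id: "methods1", tag: "p", text: "Methods paragraph" },
  ],
  edges: [
    { from: "results1", to: "fig1" },
    { from: "results1", to: "methods1" },
  ],
};
const annotated = renderAnnotatedHtml(markupExample);
```

The same `MarkupGraph` data could be handed to a React or mobile front end unchanged, which is the point of keeping the graph separate from the markup.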
One of the things I'm worried about is that we can't easily change the fundamental layout of the paper, because it's encoded directly in the HTML structure. If we wanted to offer multiple views of a paper then we would have to chop-and-change the HTML itself, right? That sounds painful. |
I'm not opposed to trying the HTML-markup approach, but I'd just like it if we could create a JSON-format graph representation of the paper as an intermediary step and use that as input. That would at least give us flexibility to adapt our approach as we learn. It's also worth mentioning that I haven't built anything with vanilla JS + HTML in a loooong time. I can move much faster when building in React, so I'd be hamstringing myself by going back to old-school JavaScript. |
My thinking here is that I'd like it as easy to use and adapt as possible. Ideally, someone could add support to an existing HTML document simply by adding a |
If the idea is to use a The simplest way to think about React is that it's a function that outputs HTML. It can't modify an existing chunk of HTML. We could load a React app the same way (via a Am I right in thinking that your HTML approach would be (at a high level) a function that takes |
Yes, purely to make it low friction to use! I feel like with every intermediate step we add, we lose people. |
I think the most general form of the idea simplifies to something like this. I'd like a nonlinear document format based on a graph representation where nodes are individually viewable items (paragraph of text, figure) and links are relationships between them. The idea is to support multiple different ways of viewing this document, from straight up traditional, printed, linear view, right up to an interactive layout where related items slide into view automatically as you read.
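As a sketch of the two ends of that range, the TypeScript below derives a traditional linear view and a focus view with related items from the same graph. The `order` field, the type names and both function names are assumptions for illustration only.

```typescript
// Hypothetical shapes: `order` is the item's position in the linear reading.
interface ViewNode {
  id: string;
  order: number;
}

interface ViewLink {
  from: string;
  to: string;
}

interface ViewGraph {
  nodes: ViewNode[];
  links: ViewLink[];
}

// Traditional view: every item once, in reading order.
function linearView(graph: ViewGraph): string[] {
  return graph.nodes
    .slice() // copy so the input array is not reordered
    .sort((a, b) => a.order - b.order)
    .map((node) => node.id);
}

// Interactive view: the item being read first, followed by every item
// linked to it in either direction.
function focusView(graph: ViewGraph, focusId: string): string[] {
  const shown: string[] = [focusId];
  for (const link of graph.links) {
    let other: string | null = null;
    if (link.from === focusId) {
      other = link.to;
    } else if (link.to === focusId) {
      other = link.from;
    }
    if (other !== null && shown.indexOf(other) === -1) {
      shown.push(other);
    }
  }
  return shown;
}

// Assumed example: the ordering and links are made up for illustration.
const viewExample: ViewGraph = {
  nodes: [
    { id: "results1", order: 2 },
    { id: "fig1", order: 3 },
    { id: "methods1", order: 1 },
  ],
  links: [
    { from: "results1", to: "fig1" },
    { from: "results1", to: "methods1" },
  ],
};
const printed = linearView(viewExample);
const whileReadingFigure = focusView(viewExample, "fig1");
```

Both views read the same data, so adding another way of viewing the document would mean adding another function rather than changing the document.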
HTML already has the right structure of elements and links, so why not just keep this. Our job then becomes coming up with a layout engine for HTML that follows some particular rule about defining the relationships between parts. Parts could be divs or even as fine grained as spans.
Some basic features here could be:
The way I'd see it working is we just take an arbitrary HTML document as input, maybe using some set of data attributes defining the roles and relationships between elements, and write JavaScript that gives different views of this document. Or alternatively (or additionally) we could have a separate document file with metadata about the relationships, allowing you to leave the original HTML file unchanged (but I think this would be less good).
With that general framework in place, we could have a secondary set of technologies to convert different document types into this format, and possibly even integrate with various editors to make it easy to write these relationships. We could imagine a server that automates this, converting from PDF, Word or LaTeX.
My feeling is that the layout engine makes sense to do entirely client side if possible, meaning you could download an offline view of a document amongst other things, and the converters could have open source tools but might be usefully integrated onto a server that lets you just upload a document.
Thoughts?
How does that sound, and how would that fit into the React framework you're proposing @synek ?