Equations are a coming-soon feature, so it might be a bit too early to discuss, but I would like to understand how equations can be represented in the Docling Document Model.
Basically there are two cases:
Display Equations
Inline Equations
Display equations seem to be just TextItems, with either the sanitized representation containing the TeX formula or some extension adding a "tex" attribute, like this:
{
"self_ref": "#/texts/47",
"parent": {
"cref": "#/body"
},
"children": [],
"label": "formula",
"prov": [
…
],
"orig": "Attention( Q,K,V ) = softmax( QK T \u221a d k ) V (1)",
"text": "Attention( Q,K,V ) = softmax( QK T \u221a d k ) V (1)",
"tex": "$$\\mathrm{Attention}(Q, K, V) = \\mathrm{softmax}(\\frac{QK^T}{\\sqrt{d_k}})V$$"
},
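To make the proposal concrete, here is a minimal sketch of how a consumer might read such an item. The "tex" field is the hypothetical extension suggested above, not an existing Docling attribute, and `formula_source` is an illustrative helper name; the item is modeled as a plain dict rather than a real Docling class.

```python
from typing import Any


def formula_source(item: dict[str, Any]) -> str:
    """Return the TeX source if the hypothetical "tex" field is set,
    otherwise fall back to the sanitized "text" field."""
    if item.get("label") != "formula":
        raise ValueError("not a formula item")
    return item.get("tex") or item["text"]


# Plain-dict stand-in for the display equation shown above ("prov" omitted).
display_equation = {
    "self_ref": "#/texts/47",
    "label": "formula",
    "text": "Attention( Q,K,V ) = softmax( QK T \u221a d k ) V (1)",
    # Assumed extension field, not part of the current model.
    "tex": r"$$\mathrm{Attention}(Q, K, V) = \mathrm{softmax}(\frac{QK^T}{\sqrt{d_k}})V$$",
}

print(formula_source(display_equation))
```

With this shape, exporters that do not know about "tex" would keep working on "text", which is one argument for an additive attribute over overloading the existing field.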
Modeling inline equations might require the hierarchy supported by the Docling Document: a parent paragraph TextItem whose children are a mixture of plain text items and inline equations. For example, the following TeX paragraph might be represented as such a tree:
Where the projections are parameter matrices $W^Q_i \in \mathbb{R}^{\dmodel \times d_k}$, $W^K_i \in \mathbb{R}^{\dmodel \times d_k}$, $W^V_i \in \mathbb{R}^{\dmodel \times d_v}$ and $W^O \in \mathbb{R}^{hd_v \times \dmodel}$.
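As a rough illustration of what such a tree could look like, the sketch below splits a paragraph on `$...$` delimiters into alternating text and formula children. Everything here is an assumption for discussion: the `split_inline_math` helper, the "inline" flag, and the nested-dict layout are hypothetical (the actual model links children by reference rather than nesting them), and no provenance is produced.

```python
import re

# Non-greedy match of a single $...$ span; the capture group makes
# re.split keep the formula bodies at the odd indices of the result.
INLINE_MATH = re.compile(r"\$(.+?)\$")


def split_inline_math(paragraph: str, parent_ref: str = "#/texts/48") -> dict:
    """Build a hypothetical paragraph node whose children alternate
    between plain text and inline formulas."""
    children = []
    for index, part in enumerate(INLINE_MATH.split(paragraph)):
        if not part:
            continue  # skip empty text between adjacent formulas
        label = "formula" if index % 2 == 1 else "text"
        # "inline" is an assumed attribute, not part of the current model.
        children.append({"label": label, "text": part, "inline": True})
    return {
        "self_ref": parent_ref,  # made-up reference for illustration
        "label": "paragraph",
        "text": paragraph,
        "children": children,
    }


node = split_inline_math(r"matrices $W^Q_i$, $W^K_i$ and $W^O$.")
for child in node["children"]:
    print(child["label"], repr(child["text"]))
```

Keeping the full source in the parent's "text" field means consumers that ignore children still get a usable paragraph.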
Questions:
How will inline equations be implemented in Docling (and will they be implemented at all)? For example, a tree of relationships within a paragraph makes more sense if there is provenance information for inline equations and the other child text nodes; otherwise, markdown like the Nougat output seems more convenient in the paragraph.text field.
Is using tree structures for modeling inline equations within a paragraph consistent with the original design?
And if so, how should the model be extended when needed, for instance to define which text elements require a line break (regular paragraph, display equation) and which do not (inline equation, text between inline equations)?
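One possible answer to the line-break question, sketched under assumptions: the break could be derived from tree position instead of a new attribute, so that top-level items are block-level and children of a paragraph are inline. The `render` and `_inline` functions and the dict layout are hypothetical and only illustrate the rule.

```python
def _inline(child: dict) -> str:
    """Inline children never introduce a line break."""
    if child["label"] == "formula":
        return f"${child['text']}$"
    return child["text"]


def render(items: list[dict]) -> str:
    """Render top-level items as blocks separated by blank lines."""
    blocks = []
    for item in items:
        children = item.get("children", [])
        if children:
            # Paragraph with inline children: concatenate without breaks.
            blocks.append("".join(_inline(c) for c in children))
        elif item["label"] == "formula":
            # Top-level formula: treated as a display equation.
            blocks.append(f"$${item['text']}$$")
        else:
            blocks.append(item["text"])
    return "\n\n".join(blocks)


document = [
    {
        "label": "paragraph",
        "children": [
            {"label": "text", "text": "Scaled by "},
            {"label": "formula", "text": r"\sqrt{d_k}"},
            {"label": "text", "text": "."},
        ],
    },
    {"label": "formula", "text": "a = b"},
]
print(render(document))
```

If position in the tree is not enough (for example, a display equation nested inside a paragraph), an explicit attribute on the item would still be needed.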
Thank you!