write_html: support <sub> & <sup> tags inside <table> #860

Open
Tolker-KU opened this issue Jul 21, 2023 · 10 comments
Labels
enhancement hacktoberfest html multi_cell research needed too complicated to implement without careful study of official specifications up-for-grabs

Comments

@Tolker-KU

Hi,

Thanks for all the great work going into this project!

I wonder if you have considered supporting subscript/superscript in cell/multicell when styling text with markdown?

Github supports this in their markdown implementation using the HTML tags <sub>/<sup>. I imagine fpdf2 could do something similar.

If you think this is a good idea, I would be happy to take a crack at it. It seems that the machinery for this feature already is in place.

@Lucas-C
Member

Lucas-C commented Jul 23, 2023

Hi @Tolker-KU!

Thank you for your nice words 😊

I think this was implemented by @gmischler in #520:
https://pyfpdf.github.io/fpdf2/TextStyling.html#subscript-superscript-and-fractional-numbers

I think it should work for multi_cell(), but we currently only have unit tests for .write(),
so extra unit tests covering multi_cell() would be welcome!

@gmischler
Collaborator

I think this was implemented by @gmischler in #520:

#520 implements the general ability to render subscript and superscript text, as well as the <sub> and <sup> tags for write_html().
However, the feature is not currently supported by our version of markdown.

The reason for the latter was that I couldn't find a standard on which characters to use as markup.
The most popular markdown variant commonmark doesn't support them either, for reasons that aren't entirely clear.
But then, since our own markdown variant is rather weird anyway (fundamentally incompatible with any others), we could theoretically choose whatever we want... I've seen ^x^ and ~x~ suggested most often. In our case it would probably make sense to double them, like ^^x^^ and ~~x~~, to match the style of the existing tags.

I'm not very comfortable with borrowing tags from HTML. Why not just use HTML in the first place then?
Github accepting <sub> and <sup> HTML tags has little to do with markdown. It simply passes those through to the browser unchanged, just as it does with <b>, <i>, etc.

And while we're on the topic: Adding a conforming commonmark implementation (possibly in parallel) should probably be the long term goal.

@Tolker-KU
Author

Tolker-KU commented Jul 24, 2023

Thanks for getting back to me so quickly.

I'm looking for a feature to render subscripts and superscript within cells. As far as I can figure out this is not quite achievable with .write_html. Or am I wrong here?

What do you think about adding the ^^ and ~~ tags to the markdown syntax, so one can do .cell(txt="H~~2~~O") -> H₂O or .cell(text="E=MC^^2^^") -> E=MC²?
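
To make the proposal concrete, here is a minimal sketch of how such markers could be split into fragments before rendering. It is plain Python, not fpdf2 code: `split_vpos` is a hypothetical helper, the `^^`/`~~` markers are only the ones suggested in this thread, and the `"LINE"`, `"SUP"` and `"SUB"` labels are borrowed from the values fpdf2 documents for `char_vpos`.

```python
import re
from typing import List, Tuple

# Hypothetical markers from this thread: ^^x^^ (superscript) and ~~x~~ (subscript).
_MARKUP = re.compile(r"\^\^(.+?)\^\^|~~(.+?)~~")


def split_vpos(text: str) -> List[Tuple[str, str]]:
    """Split text into (fragment, vpos) pairs, vpos being "LINE", "SUP" or "SUB"."""
    fragments = []
    pos = 0
    for match in _MARKUP.finditer(text):
        if match.start() > pos:
            # Plain text located before the marker
            fragments.append((text[pos:match.start()], "LINE"))
        if match.group(1) is not None:
            fragments.append((match.group(1), "SUP"))
        else:
            fragments.append((match.group(2), "SUB"))
        pos = match.end()
    if pos < len(text):
        # Plain text located after the last marker
        fragments.append((text[pos:], "LINE"))
    return fragments


print(split_vpos("H~~2~~O"))    # [('H', 'LINE'), ('2', 'SUB'), ('O', 'LINE')]
print(split_vpos("E=MC^^2^^"))  # [('E=MC', 'LINE'), ('2', 'SUP')]
```

Each fragment would still need to be rendered with the matching vertical position, which is the part that cell() and multi_cell() cannot do today.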

@Lucas-C
Member

Lucas-C commented Jul 25, 2023

I'm looking for a feature to render subscripts and superscript within cells. As far as I can figure out this is not quite achievable with .write_html. Or am I wrong here?

No, you are right.
fpdf2 currently does not support <sub> & <sup> tags inside <table>:

from fpdf import FPDF

pdf = FPDF()
pdf.set_font("Helvetica")
pdf.add_page()
pdf.write_html(
    """<table border="1"><thead><tr>
        <th width="33%">Name</th>
        <th width="66%">Formula</th>
    </tr></thead><tbody><tr>
        <td>Lucas-C</td><td>E = MC<sup>2</sup></td>
    </tr></tbody></table>""")
pdf.output("issue_860.pdf")

I agree that it would be nice if fpdf2 supported this usage! 😊
I would welcome a PR that implements this in HTML2FPDF: https://github.com/PyFPDF/fpdf2/blob/master/fpdf/html.py#L195
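
As a rough illustration of the first step such a PR would need, the sketch below tracks `<sub>`/`<sup>` while collecting the text of table cells. It is a stand-alone example built on the standard library's `html.parser`, not the actual HTML2FPDF code: `CellFragmentParser` and its attributes are hypothetical names, and the `"LINE"`/`"SUB"`/`"SUP"` labels are only used here to tag fragments.

```python
from html.parser import HTMLParser


class CellFragmentParser(HTMLParser):
    """Collect the content of each <td>/<th> as a list of (text, vpos) fragments."""

    VPOS_BY_TAG = {"sub": "SUB", "sup": "SUP"}

    def __init__(self):
        super().__init__()
        self.cells = []      # one list of fragments per table cell
        self._cell = None    # fragments of the cell being parsed, if any
        self._vpos = "LINE"  # vertical position applied to incoming text

    def handle_starttag(self, tag, attrs):
        if tag in ("td", "th"):
            self._cell = []
        elif tag in self.VPOS_BY_TAG and self._cell is not None:
            self._vpos = self.VPOS_BY_TAG[tag]

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            self.cells.append(self._cell)
            self._cell = None
        elif tag in self.VPOS_BY_TAG:
            self._vpos = "LINE"

    def handle_data(self, data):
        if self._cell is not None and data:
            self._cell.append((data, self._vpos))


parser = CellFragmentParser()
parser.feed("<table><tr><td>Lucas-C</td><td>E = MC<sup>2</sup></td></tr></table>")
print(parser.cells)
# [[('Lucas-C', 'LINE')], [('E = MC', 'LINE'), ('2', 'SUP')]]
```

The open question is the rendering side: each cell would have to be drawn from a list of fragments rather than from a single string.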


I also fully agree with you @gmischler on this:

And while we're on the topic: Adding a conforming commonmark implementation (possibly in parallel) should probably be the long term goal.

Ideally, we could support combining fpdf2 with https://github.com/executablebooks/markdown-it-py
But then, would the translation chain be Markdown -> HTML, and then use FPDF.write_html()?
This is not ideal, as our HTML2PDF converter is very limited: https://pyfpdf.github.io/fpdf2/HTML.html

So I'm not really sure of the path forward regarding Markdown support...

@Tolker-KU
Author

Tolker-KU commented Jul 26, 2023

Ideally, we could support combining fpdf2 with https://github.com/executablebooks/markdown-it-py But then, would the translation chain be Markdown -> HTML, and then use FPDF.write_html()? This is not ideal, as our HTML2PDF converter is very limited: https://pyfpdf.github.io/fpdf2/HTML.html

So I'm not really sure of the path forward regarding Markdown support...

I think markdown-it-py parses markup to tokens before rendering to HTML. Maybe fpdf2 can render the tokens directly to PDF instead of using HTML as an intermediate step.

https://markdown-it-py.readthedocs.io/en/latest/using.html#the-token-stream
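
A minimal sketch of the idea, assuming the inline tokens expose a `type` and a `content` and that emphasis is signalled by `strong_open`/`strong_close` and `em_open`/`em_close` tokens. To stay self-contained it uses a stand-in `Token` class instead of importing markdown-it-py, and `tokens_to_fragments` is a hypothetical helper, not an existing fpdf2 function.

```python
from typing import Iterable, List, NamedTuple, Tuple


class Token(NamedTuple):
    """Stand-in for an inline token: only `type` and `content` are modelled."""
    type: str
    content: str = ""


def tokens_to_fragments(tokens: Iterable[Token]) -> List[Tuple[str, str]]:
    """Turn inline tokens into (text, style) pairs, style being "", "B", "I" or "BI"."""
    bold = italic = False
    fragments = []
    for token in tokens:
        if token.type == "strong_open":
            bold = True
        elif token.type == "strong_close":
            bold = False
        elif token.type == "em_open":
            italic = True
        elif token.type == "em_close":
            italic = False
        elif token.type == "text" and token.content:
            style = ("B" if bold else "") + ("I" if italic else "")
            fragments.append((token.content, style))
    return fragments


# Hand-written stream for: Hello **bold** and *italic*
stream = [
    Token("text", "Hello "),
    Token("strong_open"), Token("text", "bold"), Token("strong_close"),
    Token("text", " and "),
    Token("em_open"), Token("text", "italic"), Token("em_close"),
]
print(tokens_to_fragments(stream))
# [('Hello ', ''), ('bold', 'B'), (' and ', ''), ('italic', 'I')]
```

Each (text, style) pair could then be handed to set_font(style=...) followed by write(), without going through HTML.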

@Lucas-C
Member

Lucas-C commented Jul 26, 2023

I think markdown-it-py parses markup to tokens before rendering to HTML. Maybe fpdf2 can render the tokens directly to PDF instead of using HTML as an intermediate step.

Sure, we could do that!
But then we will basically have to maintain a new "Markdown2PDF" class 😅

I'm not opposed to this, if someone is willing to contribute / initiate such a converter in this project,
and if it is mostly compatible with / does not break too many existing behaviours of fpdf2.

@Lucas-C Lucas-C added research needed too complicated to implement without careful study of official specifications up-for-grabs hacktoberfest markdown and removed pending-answer labels Jul 26, 2023
@Tolker-KU
Author

Tolker-KU commented Jul 26, 2023

I'm looking for a feature to render subscripts and superscript within cells. As far as I can figure out this is not quite achievable with .write_html. Or am I wrong here?

No, you are right. fpdf2 currently does not support <sub> & <sup> tags inside <table>:

from fpdf import FPDF

pdf = FPDF()
pdf.set_font("Helvetica")
pdf.add_page()
pdf.write_html(
    """<table border="1"><thead><tr>
        <th width="33%">Name</th>
        <th width="66%">Formula</th>
    </tr></thead><tbody><tr>
        <td>Lucas-C</td><td>E = MC<sup>2</sup></td>
    </tr></tbody></table>""")
pdf.output("issue_860.pdf")

I've been looking into how to solve this. It seems that cells in tables rendered from HTML call FPDF.multi_cell().
https://github.com/PyFPDF/fpdf2/blob/54d2eb0266bd3b1ccbf4dc384ea46c9b0d6b718d/fpdf/table.py#L278-L293
As far as I can see, FPDF.multi_cell() has no ability to render text with mixed vpos. One idea is to expose something like _render_styled_text_line() on FPDF that takes a TextLine which supports text fragments with different styling. Could that be a way forward?
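
To show what "fragments with different styling" means for a single line, here is a small data-structure sketch. It does not use fpdf2's real TextLine or Fragment classes: `Fragment`, `VPOS` and `layout` are hypothetical names, and the scale and lift factors are illustrative assumptions expressed as fractions of the base font size.

```python
from dataclasses import dataclass
from typing import List, Tuple

# Assumed (scale, lift) factors per vertical position, relative to the base font size
VPOS = {"LINE": (1.0, 0.0), "SUP": (0.7, 0.4), "SUB": (0.7, -0.15)}


@dataclass
class Fragment:
    text: str
    vpos: str = "LINE"


def layout(fragments: List[Fragment], font_size_pt: float) -> List[Tuple[str, float, float]]:
    """Return (text, font size, baseline lift) for every fragment of one line."""
    placed = []
    for frag in fragments:
        scale, lift = VPOS[frag.vpos]
        placed.append((frag.text, font_size_pt * scale, font_size_pt * lift))
    return placed


line = [Fragment("E = MC"), Fragment("2", "SUP")]
for text, size, lift in layout(line, font_size_pt=10):
    print(f"{text!r}: size={size:.1f}pt, lift={lift:.1f}pt")
```

A renderer that accepts such a list could draw each fragment at its own size and baseline offset, which is what a single multi_cell() string cannot express.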

@gmischler
Collaborator

As far as I can see, FPDF.multi_cell() has no ability to render text with mixed vpos. One idea is to expose something like _render_styled_text_line() on FPDF that takes a TextLine which supports text fragments with different styling. Could that be a way forward?

As you have correctly recognized, this is a fundamental limitation of multi_cell().
For formatting changes within a paragraph, there is the alternative write(), but that currently has the disadvantage that it can only create left-aligned text.

Fixing this cleanly requires some architectural changes to fpdf2. I have outlined a possible solution in #339, and have been working on-and-off on an actual implementation. I hope I'll find time again soon so I can actually show some more progress here.

Theoretically, write_html() could also get more low-level access to the fpdf.py internals as you suggest, but I think a more general high-level approach to text formatting is better in the long run. Several similar issues have been raised over the last year, which all correctly pointed at the same set of current limitations. I'm sorry to say that the necessary groundwork for a true and general solution will take a bit more time.

@Lucas-C
Member

Lucas-C commented Aug 2, 2023

By the way, I think that this other, older issue is related: #151

@Lucas-C Lucas-C changed the title from "Support for subscript/superscript in cell/multicell using markdown" to "write_html: support <sub> & <sup> tags inside <table>" May 24, 2024
@Lucas-C
Member

Lucas-C commented May 24, 2024

Regarding the initial question about Markdown, combining fpdf2 with mistletoe can be a good alternative approach: https://py-pdf.github.io/fpdf2/CombineWithMistletoeoToUseMarkdown.html

I renamed this issue to "write_html: support <sub> & <sup> tags inside <table>" in order to clarify what the current feature request is 🙂
For clarity, here is the minimal code snippet that we are looking to support:

from fpdf import FPDF

pdf = FPDF()
pdf.set_font("Helvetica")
pdf.add_page()
pdf.write_html(
    """<table border="1"><thead><tr>
        <th width="33%">Name</th>
        <th width="66%">Formula</th>
    </tr></thead><tbody><tr>
        <td>Lucas-C</td><td>E = MC<sup>2</sup></td>
    </tr></tbody></table>""")
pdf.output("issue_860.pdf")

Since PR #897 by @gmischler, HTML2FPDF has a better architecture and now uses .text_columns() & paragraphs to render text. This should make the feature easier to implement.
