<br/> not properly converted to markdown when inside <pre></pre> tags #2

Open
olegme opened this issue Mar 26, 2024 · 6 comments

@olegme

olegme commented Mar 26, 2024

Me again ;-)

After I got it running, I stumbled upon a few quirks, mainly related to handling things inside <pre></pre> tags.

The first thing I could isolate is the handling of <br/> inside <pre></pre> tags.

It is translated to a <br></br> sequence, which might be right per the HTML spec, but the proper Markdown should be something like <space><space><newline>, meaning two spaces followed by a newline.

Can you by any chance take a look or at least give some pointers as to where to start if I would like to try to fix it myself?

Thank you

@smarinier
Owner

smarinier commented Mar 26, 2024

Hi again ;)

My notes were very basic, so I didn't run into much trouble. If you could attach a sample here, along with the Markdown you think it should generate, I can have a look.

Or, if you feel like it, you can look at the League converter code. You'll see that the HTML conversion done for Enex and HTML documents (EnexConverter and HtlmConverter in the code) would need to provide a dedicated League Environment whose getConverterByTag reacts to br (a state flag saying we are inside pre may be necessary) and returns a "Converter" that replaces it with space, space, newline.
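The idea above (dispatch on the tag, keep a flag for "inside pre", emit two spaces plus a newline for br) can be sketched in a few lines. This is only an illustration in Python using the standard library parser; it is not the League API, and the names PreAwareConverter and convert are hypothetical.

```python
import re
from html.parser import HTMLParser


class PreAwareConverter(HTMLParser):
    """Toy converter (hypothetical): handles only <pre> and <br>."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.out = []
        self.pre_depth = 0  # the "we are inside <pre>" state

    def handle_starttag(self, tag, attrs):
        if tag == "pre":
            self.pre_depth += 1
        elif tag == "br":
            # Two spaces + newline: the output suggested in this thread.
            # <br/> also lands here (self-closing tags call this handler).
            self.out.append("  \n")

    def handle_endtag(self, tag):
        if tag == "pre" and self.pre_depth > 0:
            self.pre_depth -= 1

    def handle_data(self, data):
        if self.pre_depth:
            # Inside <pre>, whitespace is significant: keep text verbatim.
            self.out.append(data)
        else:
            # Outside <pre>, collapse whitespace runs as HTML rendering does.
            self.out.append(re.sub(r"\s+", " ", data))


def convert(html_text):
    parser = PreAwareConverter()
    parser.feed(html_text)
    parser.close()  # flush any buffered trailing text
    return "".join(parser.out)


print(repr(convert("<pre><br/>mkdir -p /mnt/disk/var/www</pre>")))
```

In the real library the equivalent would presumably be a custom converter registered for br in the environment; the state flag is what lets such a converter treat text inside pre differently from text outside it.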

@olegme
Author

olegme commented Mar 26, 2024

I dug deeper into the code and it seems the League package is the culprit. Line #175 in their HtmlConverter.php file deliberately leaves everything between <pre></pre> tags unconverted. And there is an issue on their GitHub for exactly my case - [https://github.com/thephpleague/html-to-markdown/issues/245]. No reaction from the author so far.

In terms of the HTML standard, this is a correct approach, and I think it is actually Evernote that doesn't export it properly. To be honest, I wouldn't know how to fix it, but if you want to experiment, I have attached my export file here.
Templates.enex.gz

When you uncompress and open it, the very first note, titled "OpenWrt Wordpress", has something like this at the very beginning: <pre><br/>mkdir -p /mnt/disk/var/www</pre>. This converts to <br></br>mkdir -p /mnt/disk/var/www, but according to the Markdown specification it should be two spaces and then a line break.

There are a few more issues I ran into with more complicated notes, but I would rather open separate issues as soon as I have time for further testing.

Thank you

@smarinier
Owner

Hi @olegme, I just had a look at it. I'm not so sure this is the right HTML behaviour. In fact, all HTML tags must be escaped, whatever tag they are placed in. Any HTML code must be escaped (with something like &gt; and &lt;).
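To illustrate the escaping point: a tag that should appear as literal text has to be written with entities, while an unescaped <br/> is a real element even inside pre. A minimal sketch with Python's standard html module (the variable names are just for illustration):

```python
import html

# A tag meant to be shown as text must be entity-escaped in the HTML source.
literal = html.escape("<br/>")
print(literal)  # &lt;br/&gt;

# Decoding the entities gives back the original characters.
restored = html.unescape(literal)
print(restored)  # <br/>
```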

So I tried simply commenting out the three lines in HtmlConverter.php (around line 175), and the result seems much better to me.

If you try this, you'll see that the files from your sample become more readable. If this is OK for you, I can subclass the converter and/or try to propose the change to the library (with an option, I guess).

The commented HtmlConverter.php is attached here (but I'm sure you can do it yourself).

HtmlConverter.php.zip

Please send me your feedback on this.

@olegme
Author

olegme commented Apr 10, 2024

Hi @smarinier,

thank you for the feedback. I'll test the attached class and let you know.

Regards

@smarinier
Owner

Hi @olegme,

Since my previous message, I have worked on it. I've handled your needs and the samples given in the issue from the PHP library. I subclassed some objects from this library, as the changes it needs are substantial (in my opinion). As a major point, a comment there says "keep HTML code in <pre>": this is against HTML rules.
I'll finish the new implementation soon and will invite you to test it ;)

@smarinier
Owner

Hi @olegme
It's been quite a long time, but we moved house over the last few months and that was a huge job (before, during and after).

At last, I've pushed what I described previously. All my tests are based on your Templates.enex. Thanks for it.

Please let me know if the conversion looks OK to you now.
