Add mention links extension #171

Open
niblo wants to merge 9 commits into master

Conversation

niblo
Contributor

@niblo niblo commented Oct 8, 2021

This adds support for mention links.

@codecov

codecov bot commented Oct 8, 2021

Codecov Report

Merging #171 (3f355de) into master (7f05330) will increase coverage by 0.04%.
The diff coverage is 100.00%.

@@            Coverage Diff             @@
##           master     #171      +/-   ##
==========================================
+ Coverage   94.33%   94.37%   +0.04%     
==========================================
  Files           3        3              
  Lines        3088     3112      +24     
==========================================
+ Hits         2913     2937      +24     
  Misses        175      175              
Impacted Files     Coverage Δ
src/md4c-html.c    95.31% <100.00%> (+0.11%) ⬆️
src/md4c.c         94.27% <100.00%> (+0.03%) ⬆️

Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update 7f05330...3f355de.

@niblo
Contributor Author

niblo commented Oct 13, 2021

I'm getting a Segmentation fault: 11 when integrating this in an application.

@niblo
Contributor Author

niblo commented Dec 9, 2021

I'm no longer getting a segmentation fault. It was something downstream.

@mity
Owner

mity commented Jan 6, 2022

Hello, sorry for the delayed answer.

I'll take a look in the upcoming days. This looks like something bigger, so I don't want to rush it.

@mity
Owner

mity commented Jan 6, 2022

Just tried it. Seems to crash more or less with any input containing @ now (when the extension is enabled).

Here for the input @x:

(gdb) bt
#0  0x0040d0f9 in md_process_inlines (ctx=0x64faa8, lines=0x671510, n_lines=1) at ../src/md4c.c:4318
#1  0x0040e2a8 in md_process_normal_block_contents (ctx=0x64faa8, lines=0x671510, n_lines=1) at ../src/md4c.c:4639
#2  0x0040ea42 in md_process_leaf_block (ctx=0x64faa8, block=0x671508) at ../src/md4c.c:4818
#3  0x0040ed86 in md_process_all_blocks (ctx=0x64faa8) at ../src/md4c.c:4900
#4  0x0041246e in md_process_doc (ctx=0x64faa8) at ../src/md4c.c:6333
#5  0x0041262d in md_parse (
    text=0x672dc8 "@x­ş\rđ­ş\rđ­ş\rđ­ş\rđ­ş\rđ­ş\rđ­ş\rđ­ş\rđ­ş\rđ­ş\rđ­ş\rđ­ş\rđ­ş\rđ­ş\rđ­ş\rđ­ş\rđ­ş\rđ­ş\rđ­ş\rđ­ş\rđ­ş\rđ­ş\rđ­ş\rđ­ş\rđ­ş\rđ­ş\rđ­ş\rđ­ş\rđ­ş\rđ­ş\rđ­ş\rđ­ş\rđ­ş\rđ­ş\rđ­ş\rđ­ş\rđ­ş\rđ­ş\rđ­ş\rđ­ş\rđ­ş\rđ­ş\rđ­ş\rđ­ş\rđ­ş\rđ­ş\rđ­ş\rđ­ş\rđ­ş\rđ­ş"..., size=2, parser=0x64fce4, userdata=0x64fd08) at ../src/md4c.c:6401
#6  0x00403cef in md_html (
    input=0x672dc8 "@x­ş\rđ­ş\rđ­ş\rđ­ş\rđ­ş\rđ­ş\rđ­ş\rđ­ş\rđ­ş\rđ­ş\rđ­ş\rđ­ş\rđ­ş\rđ­ş\rđ­ş\rđ­ş\rđ­ş\rđ­ş\rđ­ş\rđ­ş\rđ­ş\rđ­ş\rđ­ş\rđ­ş\rđ­ş\rđ­ş\rđ­ş\rđ­ş\rđ­ş\rđ­ş\rđ­ş\rđ­ş\rđ­ş\rđ­ş\rđ­ş\rđ­ş\rđ­ş\rđ­ş\rđ­ş\rđ­ş\rđ­ş\rđ­ş\rđ­ş\rđ­ş\rđ­ş\rđ­ş\rđ­ş\rđ­ş\rđ­ş\rđ­ş"..., input_size=2, process_output=0x401e59 <process_output>, userdata=0x64fe60,
    parser_flags=32768, renderer_flags=5) at ../src/md4c-html.c:580
#7  0x00401f87 in process_file (in=0x760c4660 <msvcrt!_iob+96>, out=0x760c4620 <msvcrt!_iob+32>)
    at ../md2html/md2html.c:144
#8  0x004026fe in main (argc=3, argv=0x672c78) at ../md2html/md2html.c:380

Owner

@mity mity left a comment

FYI, this is not a full review; take it as just a few cents on what I noticed while quickly skimming through the code. I likely won't dig deeper until you fix the crash, so that I can try it out more thoroughly.

render_mention_link(MD_HTML* r, const MD_SPAN_MENTION_DETAIL* det)
{
RENDER_VERBATIM(r, "<x-mention data-target=\"");
render_entity(r, det->text, det->size, render_html_escaped);
Owner

Should likely call render_verbatim() instead. No entity translation should take place here, I guess.
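
To illustrate the concern, here is a minimal standalone sketch (not code from this PR or from MD4C) of writing a username into a double-quoted attribute with plain HTML escaping and no entity translation. The function name `escape_attr_verbatim` and its buffer-based interface are assumptions made for this example only.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper, not part of MD4C: copies `size` bytes of `text` into
 * `out`, escaping only the characters that are unsafe inside a double-quoted
 * HTML attribute. No entity translation is attempted, so an input "&amp;"
 * comes out as "&amp;amp;". Returns the number of bytes written (excluding
 * the terminating NUL), or 0 if `out` is too small. */
static size_t escape_attr_verbatim(const char* text, size_t size,
                                   char* out, size_t out_cap)
{
    size_t n = 0;

    assert(text != NULL && out != NULL);
    if(out_cap == 0)
        return 0;

    for(size_t i = 0; i < size; i++) {
        char single[2] = { text[i], '\0' };
        const char* rep;
        size_t len;

        switch(text[i]) {
            case '&':  rep = "&amp;";  break;
            case '<':  rep = "&lt;";   break;
            case '>':  rep = "&gt;";   break;
            case '"':  rep = "&quot;"; break;
            default:   rep = single;   break;
        }

        len = strlen(rep);
        if(n + len + 1 > out_cap)   /* keep room for the terminating NUL */
            return 0;
        memcpy(out + n, rep, len);
        n += len;
    }

    out[n] = '\0';
    return n;
}
```

With this sketch, the input `a&b` is written as `a&amp;b`, and nothing in the username is ever interpreted as an entity.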

PUSH_MARK('@', off, index, MD_MARK_RESOLVED);
off = index;
}
else if(line->beg + 1 <= off && ISALNUM(off-1) &&
Owner

This condition should now likely check explicitly whether MD_FLAG_PERMISSIVEEMAILAUTOLINKS is enabled. That was not needed previously because @ was added into the mark_char_map[] in md_build_mark_char_map() if and only if that extension was enabled.

Comment on lines +4320 to +4330
MD_SPAN_MENTION_DETAIL det;
if (CH(mark->beg) == '@')
{
    det.text = (char *) ctx->text + mark->beg + 1;
    det.size = mark->end - mark->beg - 1;
    MD_ENTER_SPAN(MD_SPAN_MENTION, &det);
    MD_TEXT(text_type, STR(mark->beg), mark->end - mark->beg);
    MD_LEAVE_SPAN(MD_SPAN_MENTION, &det);
    break;
}

Owner

@mity mity Jan 6, 2022

How do you know at this point whether it's a mention link or a permissive e-mail auto-link? Both extensions may be enabled at the same time.
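
One possible way to make the distinction explicit, sketched here as standalone C under stated assumptions: classify each `@` once, when it is first seen, from the enabled flags and the neighbouring characters, and carry that result forward. The flag values, the `AtKind` enum and `classify_at()` are hypothetical and are not MD4C identifiers.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* Stand-in flag values for this sketch only; the real ones live in md4c.h. */
#define FLAG_EMAIL_AUTOLINKS  0x0008u
#define FLAG_MENTION_LINKS    0x8000u

typedef enum {
    AT_PLAIN_TEXT,
    AT_EMAIL_CANDIDATE,
    AT_MENTION_CANDIDATE
} AtKind;

/* Decides what the '@' at line[off] could be. An e-mail candidate needs an
 * alphanumeric char on both sides; a mention candidate needs a line start or
 * whitespace before it and an alphanumeric char after it. The two cases are
 * mutually exclusive, so both extensions can be enabled together. */
static AtKind classify_at(const char* line, size_t len, size_t off,
                          unsigned flags)
{
    int next_alnum, prev_alnum, at_boundary;

    assert(line != NULL && off < len && line[off] == '@');

    next_alnum  = (off + 1 < len) && isalnum((unsigned char) line[off + 1]);
    prev_alnum  = (off > 0) && isalnum((unsigned char) line[off - 1]);
    at_boundary = (off == 0) || isspace((unsigned char) line[off - 1]);

    if((flags & FLAG_EMAIL_AUTOLINKS) && prev_alnum && next_alnum)
        return AT_EMAIL_CANDIDATE;
    if((flags & FLAG_MENTION_LINKS) && at_boundary && next_alnum)
        return AT_MENTION_CANDIDATE;
    return AT_PLAIN_TEXT;
}
```

In a real parser the result would presumably have to be stored with the mark (for example in its flags), so that the later processing stage does not need to guess from `CH(mark->beg) == '@'` alone.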

MD_SPAN_U
MD_SPAN_U,

MD_SPAN_MENTION
Owner

I would prefer renaming it to MD_SPAN_MENTIONLINK for the sake of consistency, e.g. with MD_SPAN_WIKILINK etc.

Comment on lines +304 to +305
unsigned char size;
MD_CHAR* text;
Owner

I understand it's supposed to contain only verbatim text (the username) with no nested formatting, yet I wonder whether using MD_ATTRIBUTE target (initialized to a single substring of type MD_TEXT_NORMAL) would not be better, also for the sake of consistency with the other detail structures.
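
A rough sketch of what that could look like, using local stand-in types so that the example compiles on its own. `Attr` is modelled loosely on MD_ATTRIBUTE (a text pointer, a size, and parallel arrays of substring types and offsets); the exact layout should be checked against md4c.h, and `MentionDetail` and `mention_detail_init()` are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>

/* Stand-ins for the MD4C types, defined here only so the sketch is
 * self-contained. They are assumptions, not the real declarations. */
typedef enum { TEXT_NORMAL = 0 } TextType;

typedef struct {
    const char* text;               /* not NUL-terminated; use size */
    unsigned size;
    const TextType* substr_types;   /* one entry per substring */
    const unsigned* substr_offsets; /* one more entry than substr_types */
} Attr;

typedef struct {
    Attr target;                    /* the mentioned username */
} MentionDetail;

static const TextType single_normal_type[1] = { TEXT_NORMAL };

/* Fills `det` for a mention mark spanning [mark_beg, mark_end) in `doc`,
 * where doc[mark_beg] is the '@'. The caller owns `offsets`, which must
 * stay alive for as long as `det` is used. */
static void mention_detail_init(MentionDetail* det, unsigned offsets[2],
                                const char* doc,
                                unsigned mark_beg, unsigned mark_end)
{
    assert(det != NULL && doc != NULL && mark_end > mark_beg);

    offsets[0] = 0;
    offsets[1] = mark_end - mark_beg - 1;   /* username length, without '@' */

    det->target.text = doc + mark_beg + 1;
    det->target.size = offsets[1];
    det->target.substr_types = single_normal_type;
    det->target.substr_offsets = offsets;
}
```

A side effect of this shape is that the length is no longer stored in an `unsigned char`, which on common platforms cannot represent values above 255.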

@@ -316,6 +324,7 @@ typedef struct MD_SPAN_WIKILINK {
#define MD_FLAG_LATEXMATHSPANS 0x1000 /* Enable $ and $$ containing LaTeX equations. */
#define MD_FLAG_WIKILINKS 0x2000 /* Enable wiki links extension. */
#define MD_FLAG_UNDERLINE 0x4000 /* Enable underline extension (and disables '_' for normal emphasis). */
#define MD_FLAG_MENTIONS 0x8000 /* Enable mention links extension. */
Owner

Similarly please rename to MD_FLAG_MENTIONLINKS.

Owner

@mity mity left a comment

Noticed one more thing.

/* A potential permissive e-mail autolink. */
if(ch == _T('@')) {
if(line->beg + 1 <= off && ISALNUM(off-1) &&
if( (ctx->parser.flags & MD_FLAG_MENTIONS) && (line->beg == off || (CH(off-1) == _T(' '))) )
Owner

Why only a space? I think any whitespace char would be ok here.

Maybe even some (but likely not all) punctuation chars could validly precede or follow, respectively. I think things like the following can be common, e.g. in any collaborative document writing.

# The Most Important Secret (draft)

The Answer to the Ultimate Question of Life, the Universe, and Everything is 41.
(@zaphod_beeblebrox: Please verify whether it shouldn't be 42)

Or

@alice, @bob and @charlie are sending some packets to @daniel.
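
The two examples above can be checked against a small standalone scanner. This is only a sketch of one possible boundary rule: the set of punctuation allowed before `@`, the set of username characters, and the function names are all assumptions for illustration, not part of the PR or of MD4C.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical rule: whitespace or a few opening punctuation chars may
 * directly precede the '@' of a mention. */
static int may_precede_mention(char ch)
{
    return isspace((unsigned char) ch) ||
           (ch != '\0' && strchr("([{\"'", ch) != NULL);
}

/* Hypothetical username alphabet: ASCII letters, digits, '_' and '-'. */
static int is_username_char(char ch)
{
    return isalnum((unsigned char) ch) || ch == '_' || ch == '-';
}

/* Finds the next mention at or after *pos. On success returns 1, stores the
 * offset of '@' in *beg and the offset one past the username in *end, and
 * advances *pos to *end. Returns 0 when no further mention exists. */
static int next_mention(const char* text, size_t len, size_t* pos,
                        size_t* beg, size_t* end)
{
    assert(text != NULL && pos != NULL && beg != NULL && end != NULL);

    for(size_t i = *pos; i < len; i++) {
        size_t e;

        if(text[i] != '@')
            continue;
        if(i > 0 && !may_precede_mention(text[i - 1]))
            continue;               /* e.g. the '@' inside bob@example.com */

        e = i + 1;
        while(e < len && is_username_char(text[e]))
            e++;
        if(e == i + 1)
            continue;               /* a bare '@' with no username */

        /* The username must be followed by the end of the text,
         * whitespace, or punctuation. */
        if(e < len && !isspace((unsigned char) text[e]) &&
           !ispunct((unsigned char) text[e]))
            continue;

        *beg = i;
        *end = e;
        *pos = e;
        return 1;
    }

    *pos = len;
    return 0;
}
```

Under these assumed rules the scanner finds four mentions in the second example and one in the first, and it skips the `@` in a plain e-mail address such as `bob@example.com`.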

if( (ctx->parser.flags & MD_FLAG_MENTIONS) && (line->beg == off || (CH(off-1) == _T(' '))) )
{
OFF index = off + 1;
if (index == line->end || CH(index) == ' ') {
Owner

Ditto.

@niblo
Contributor Author

niblo commented Jan 10, 2022

Rebased on the latest master, I'm not getting any errors when running the tests. This is on OS X. I see you're running on Windows. Do any of the checks (Travis CI, codecov, ...) run the test suite?

@mity
Owner

mity commented Jan 10, 2022

Rebased on the latest master, I'm not getting any errors when running the tests. This is on OS X. I see you're running on Windows.

Still crashing on Windows.

But I can see that if I move the block of lines 4315 - 4318 after the following mention-related if, the crash goes away. That opener/closer arithmetic likely makes no sense in the mention case (it has no real closer, right?).

I have quite an old gcc version on Windows, so I can imagine that newer compiler versions reorder that code as an optimization, because the mention branch does not depend on those variables, and thus effectively hide the bug.

Do any of the checks (Travis CI, codecov, ...) run the test suite?

At this time, only Travis (Linux) runs the tests. I'm not sure how easy it would be to enable them on AppVeyor (Windows), as that requires Python, but I hope it should be possible.

@mity
Owner

mity commented Jan 10, 2022

Update: On Linux the problem is there too, only it does not manifest as a crash for some reason. When running under valgrind with mentions enabled, it reports invalid reads from unallocated memory.

@niblo
Contributor Author

niblo commented Jan 10, 2022

But I can see that if I move the block of lines 4315 - 4318 after the following mention-related if, the crash goes away. That opener/closer arithmetic likely makes no sense in the mention case (it has no real closer, right?).

Which lines exactly?

@mity
Owner

mity commented Jan 10, 2022

Which lines exactly?

These 4 lines make no sense to me in the case of input made of only mentions. That pointer arithmetic leads to the problem. Some logic must be added so that this code works for links and auto-links but is not executed for mentions, which seem to have no closer (or alternatively you need to change the logic so that some virtual empty closer marks are created even for the mentions).

https://github.com/niblo/md4c/blob/3f355de536520791c844c2fa95041b897ae7bc44/src/md4c.c#L4315-L4318

@niblo
Contributor Author

niblo commented Jan 11, 2022

I'm having some issues compiling on Windows. How do you do it?

@mity
Owner

mity commented Jan 11, 2022

I'm having some issues compiling on Windows. How do you do it?

I'm generally using mingw-w64, or more specifically a very old MSYS (not MSYS2) environment with a gcc toolchain built from https://github.com/niXman/mingw-builds/, and I build with CMake+Ninja. Unfortunately, that can be a hard-to-replicate environment.

That said, you should be able to build with MSYS2 (can be downloaded from https://mingw-w64.org) or directly with any recent version of MS Visual Studio (afaik, it now supports building CMake-based projects directly). Or you can generate an MSVC solution manually with CMake, as appveyor.yml does for the purposes of CI.

However, note that I don't think a cosmetic change that prevents the most obvious crashes is a good enough approach here. The piece of code I pointed to shows that this PR implements mentions in a fundamentally different way from the other permissive links.

For the permissive e-mail auto-links we add an extra dummy mark for every potentially valid @ (when handling @ in md_collect_marks()), and in case we later decide to resolve it as an actual link, we turn that dummy mark into a virtual (zero-length) closer mark. That means all the later link processing works quite independently of whether it was a full Markdown link or a permissive auto-link, because they are encoded similarly in MD_CTX::marks[].

I don't think it's a good idea to depart from this approach for mention links. It could lead to difficult-to-solve collisions between such radically different approaches in all phases of the processing. So please take a look at whether mentions could work the same way: ideally, you would then need to distinguish only at the last moment which constants to use when calling the callback functions for one or the other. I think it should be possible.
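
The dummy-mark scheme described above can be modelled in a few lines of standalone C. This is a deliberately simplified sketch: the `Mark` struct, the flag constants and both functions are hypothetical stand-ins, and the real MD_CTX::marks[] bookkeeping in md4c.c carries considerably more state.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified stand-ins for the mark flags (assumptions for this sketch). */
enum { MARK_OPENER = 1, MARK_CLOSER = 2, MARK_RESOLVED = 4 };

typedef struct {
    size_t beg;      /* offset of the first char covered by the mark */
    size_t end;      /* offset one past the last covered char */
    char ch;         /* '@' for a real mark, 'D' for a dummy reserve */
    unsigned flags;
} Mark;

/* Collection phase: for an '@' at `off`, push a potential opener and,
 * right after it, a zero-length dummy mark reserved for a later closer.
 * Returns the new number of marks. */
static size_t collect_at(Mark* marks, size_t n, size_t cap, size_t off)
{
    assert(marks != NULL && n + 2 <= cap);

    marks[n]     = (Mark){ off, off + 1, '@', MARK_OPENER };
    marks[n + 1] = (Mark){ off, off, 'D', 0 };
    return n + 2;
}

/* Resolution phase: once the '@' is accepted as a link that ends at
 * `link_end`, the reserved dummy becomes a virtual zero-length closer, so
 * later stages always find an opener/closer pair. */
static void resolve_at(Mark* marks, size_t opener_index, size_t link_end)
{
    Mark* opener = &marks[opener_index];
    Mark* closer = &marks[opener_index + 1];

    assert(opener->ch == '@' && closer->ch == 'D');
    assert(link_end > opener->beg);

    opener->flags |= MARK_RESOLVED;

    closer->ch = '@';
    closer->beg = link_end;
    closer->end = link_end;
    closer->flags = MARK_CLOSER | MARK_RESOLVED;
}
```

If mentions were encoded this way as well, the code that looks up a closer for each resolved opener would have a valid mark to land on, and the choice between a mention and an e-mail auto-link could be deferred to the point where the callbacks are invoked.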

@ec1oud
Contributor

ec1oud commented Jan 13, 2022

x-mention-data-target is an html tag already in use?

@mity
Owner

mity commented Jan 13, 2022

x-mention-data-target is an html tag already in use?

No. Only in this PR. And even when/if it gets merged you would see it only when enabled with an extension flag.
