Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

remove pointless SVPV mortal alloc in S_strip_spaces()/prototype parser #22655

Open
wants to merge 1 commit into
base: blead
Choose a base branch
from

Conversation

bulk88
Copy link
Contributor

@bulk88 bulk88 commented Oct 10, 2024

See commit message. Since I minimally know about the p5 prototype syntax grammar, is there an actual max_len to that string, and it is STILL parseable/not fatal later on? Other words, can string limit be FIXED, and that temp SVPV statement removed forever, adding a hard error, and using tiny stack buf only?

10MB, 100MB, and 1GB long prototype strings are welcome. Discussing prototype max_len verges on a Perl GOLF or anti-GOLF or fuzzing/sec exploit category.

@iabyn
Copy link
Contributor

iabyn commented Oct 14, 2024

I don't really understand the motivation behind this PR. It seems to add a lot of complexity: not only in the static function, but in all its callers. And just to save creating a mortal SV string a couple of times per sub-with-prototype compilation? Am I missing something obvious?

@bulk88
Copy link
Contributor Author

bulk88 commented Oct 17, 2024

I don't really understand the motivation behind this PR. It seems to add a lot of complexity: not only in the static function, but in all its callers. And just to save creating a mortal SV string a couple of times per sub-with-prototype compilation? Am I missing something obvious?

The mortal SV and malloc are being created and freed, even if the string contents are "clean" and the string contents will not change. That is a bug.

Perl prototypes are never more than ~10 chars long I think. It is senseless to, for EACH prototype parsed, to use malloc buffer+SV Body+SV head+free all 3 allocs, to parse the prototype, that 99% of the time, the string contents, will not change. A reasonably tiny, 16/32/64/128 C stack char array buffer is much better and faster. I left in the SVPV alloc code since the old code did it, but IMHO the SVPV alloc code should be removed and replaced by a bounds check and throw fatal parsing error.

If there are no comments, I will slim down the patch, and just use a C auto 32 byte char array or fallback to SVPV mortal, everywhere. Therefore less new C code by ASCII text .c file char length.

Ideally the SVPV mortal fallback code should be removed, but IDK what is the longest perl prototype that is not a grammar syntax error, and is not a security exploit/buffer overflow?

@mauke
Copy link
Contributor

mauke commented Oct 17, 2024

what is the longest perl prototype that is not a grammar syntax error, and is not a security exploit/buffer overflow?

Probably Web::Simple match specification or similar:
https://metacpan.org/pod/Web::Simple#Web%3A%3ASimple-match-specifications

Those can be given as subroutine prototypes:
https://metacpan.org/pod/Web::Simple#:~:text=An%20alternative%20to%20using%20string,the%20sub%20prototype

@bulk88
Copy link
Contributor Author

bulk88 commented Oct 17, 2024

Those can be given as subroutine prototypes: https://metacpan.org/pod/Web::Simple#:~:text=An%20alternative%20to%20using%20string,the%20sub%20prototype

Examples in the POD seem to max out at 32 chars, 80 chars is line limit in Terminal. ~120 chars is line limit in a desktop URL box before being cut off.

so 32/64 chars is average? extreme but some production code does it is 128? Perl syntax/grammer valid is unlimited length until OOM from malloc?

@iabyn
Copy link
Contributor

iabyn commented Oct 18, 2024 via email

Valid, parseable, and sane prototypes, are tiny in char len and often fit
on 1 hand, most extreme cases in production would be 80 chars, because
terminal length. Original commit referred to mitigating sloppy XS code
with random white space. Majority of XS code will have perfect clean
prototype strings.

So not SV Alloc, PV Alloc, MEXTENDPUSH, memcpy(), and alot more
free()s in scope _dec(), for clean strings. Even for dirty but parsable
prototypes, they will be tiny in bytes. Therefore use a tiny stack buffer
for dirty semi-hot path to remove overhead. Fuzzing, junk, abuse, can
OOM die in newSV()/malloc() if needed, same as in prior version of the
code.

Use newSV(len) and POK_off, SV head is private to us, and a waste to
bookkeep SVPV details. SAVEFREEPV() was not used, because previous code
did mortal, and not SAVEFREEPV(), so keep using mortal. This can be
changed if someone has rational to do it.

64 bytes was picked since its power of 2, and below 80, but not 16 or 32
bytes which is too small and C compiler would probably be leaving padding
at the end of the array anyways. Going to 80 or 100, was discarded, to
give breathing room on C stack to C compilers so on x86/x64 C compilers
can emit U8 arithmitic with U8 constants to access various C autos, and
the CC doesn't downgrade to U32 constants to access various local C auto.
Worst case production Perl code for prototype length, is a URL argument
template system on CPAN, and even urls, are unlikely to be longer than a
desktop's URL bar or 80 char terminals. So 64 byte was picked.

The struct provides type matching and length safety between
S_strip_spaces() and its callers.
@bulk88
Copy link
Contributor Author

bulk88 commented Oct 18, 2024

New patch, alot less lines of text. More basic. Still gets all goals accomplished.

@iabyn
Copy link
Contributor

iabyn commented Oct 21, 2024 via email

@bulk88
Copy link
Contributor Author

bulk88 commented Oct 23, 2024

optimisation isn't required in such non-hot code;

If I can reach it with a C debugger on every process startup its hot code.

that any any optimisation in any case should not need to make every caller of S_strip_spaces() be more complex;

You are complaining about adding only 1 extra line/1 C auto var decl for each caller?

and why not just make S_strip_spaces() return the original unmodified string if it doesn't have white space, and malloc otherwise?

Leak risk or double free or free on non malloc ptr hazard. Comparing the retval ptr to your own ptr, yeah it can be done, then storing the result of that bool comparison. Or having a char * initialized to NULL, then cleanup: fail: pass: labels at the bottom of the func, to free all those ptrs, and hope no exceptions thrown in between, I dont like that.. If retval ptr must be compared to input ptr, you might as well ret NULL if the func is supposed to return "no output ptr alloced, use your input ptr unmodified" state. The save stack/mortal design right now is less leak risk, and less new LOC in all the callers to track and free the ptr vs malloc.

@iabyn
Copy link
Contributor

iabyn commented Oct 25, 2024 via email

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants