optimize/cleanup generate_uudmap.c #22651

Open
wants to merge 4 commits into
base: blead


bulk88 (Contributor) commented Oct 10, 2024

See the patch descriptions. fputc() is very unwise CPU-wise. The original intent was to put all 3 output .h files in stack array buffers, but it was impossible to sanely get mg_data.h under 4096 bytes, so I stopped after eliminating the worst FD I/O calls (putch()); outputting ~80-100 byte lines is better. Fully "minifying" mg_data.h got it to ~3700 bytes for me, but that is unreadable. After this patch mg_data.h is 4.3KB. The 4096 byte boundary is artificially picked, since rumors on Google say some POSIX OSes require root/rlimit/cgroups/faux-sudo/writing to "/proc" and magic compiler flags if you want > 8KB C stacks. Therefore, don't crash "so early"; defer the crash risk until much later, to Configure or Makefile.SH or perl.bin crashing, not generate_uudmap.bin.

Another benefit is stripping the pointless ", 0, 0, 0}"s, which is less work for the CC.
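
A minimal sketch of why the trailing zeros are safe to drop: in C, members left out of a brace initializer are initialized as if they had static storage duration, i.e. to zero. The struct `demo_entry`, the two tables and `demo_tables_match()` are hypothetical names for illustration only; the real generated struct has a different layout.

```c
#include <stddef.h>

/* Hypothetical stand-in for an element of a generated table;
 * the real struct in the generated header differs. */
struct demo_entry {
    unsigned char a, b, c, d;
};

/* Every member spelled out, including the trailing zeros. */
static const struct demo_entry demo_verbose[3] = {
    { 1, 0, 0, 0 },
    { 2, 3, 0, 0 },
    { 0, 0, 0, 0 }
};

/* Same values with the trailing zeros omitted; the omitted
 * members are zero initialized by the language rules. */
static const struct demo_entry demo_terse[3] = {
    { 1 },
    { 2, 3 },
    { 0 }
};

/* Returns 1 when both tables hold identical member values, else 0. */
int demo_tables_match(void)
{
    size_t i;
    for (i = 0; i < 3; i++) {
        if (demo_verbose[i].a != demo_terse[i].a ||
            demo_verbose[i].b != demo_terse[i].b ||
            demo_verbose[i].c != demo_terse[i].c ||
            demo_verbose[i].d != demo_terse[i].d)
            return 0;
    }
    return 1;
}
```

The terse form is fewer tokens for the CC to parse and produces the same object data.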

This is branched out into this pull, since the major problems in sv_inline.h are another topic. Not sure if I will actually need a pre-perl generated .h to fix that or not, but I need the provisions at minimum.

-this commit is probably for a future change requiring
 a generated .h involving sv_inline.h

-use stack vars for output, machine code for &stack_var is smaller than
 &global_var. &global_var has to, best case, have a U16 or typically a U32 in
 it. struct mg_data_t mg_data[256] is 8*2*256=4096 long. That is 1 full
 mem page. So that makes the bin's .data section 1 page longer (even though
 it's all zeros, so it isn't represented on disk in normal OSes).
 Also the CC can dedup b/c scope vars mg_data PL_uudmap and PL_bitcount if
 it wants to (it does on MSVC 2022).
-cache progname. Ptr never changes. Don't pass it as an arg since that is
 3 x read->read->write->read cycles vs 3 x read.
-const mg_data_raw, it never changes
-generate_uudmap.o didn't depend on generate_uudmap.c previously, although
 the code in generate_uudmap.c almost never gets changed b/c there is no
 need, do the I/O anyways, it's 1 disk stat() call at interp build time
-this commit is probably for a future change requiring
 a generated .h involving sv_inline.h

-strip white space 4 spaces to 2 spaces, nobody looks at these files but
 the CC has to parse them. Also prev ", \n" was emitted, fix that.
 This change is an attempt to get mg_data.h under 4096 bytes, which in
 a future change, will be a stack buffer, and ~4096 is the Win32 hidden
 _chkstk()/alloca() threshold inserted by all Win32 CCs. Note,
 _chkstk()/alloca() is a function call that just does
 "(void)*(char*)&stack_var;" every 4096 bytes. Note, this is irrelevant
 for portability, since this change doesn't call alloca() or use a C99
 VLA. Not that important, but it improves the chances the Win32 CC strips
 the alloca() fn body and the push->push->call->ret ops in the caller.

bitcount.h 1,070 -> 1,024
mg_data.h 5,993 -> 5,392
uudmap.h 1,089 -> 1,043
-this commit is probably for a future change requiring
 a generated .h involving sv_inline.h

-chg size_t (~64b) to uint (~32b). On all X64 cpus, this prevents the REX
 prefix machine code byte from being emitted. ARM/X86 probably no change
 in CC output. We aren't generating >2GB disk files.
-collapse fprintf()/etc libc calls, each one requires a "costly" user mode
 lock acquire/release in all libcs. fputc() might've been a CPP macro a few
 decades ago, but then threads came. "costly" is vs no locks.
-"memcpy(d,s,123)" pattern is extremely likely to be inlined to a single
 CPU op
-the "p" "p2" "p" "p2" "memcpy()" pattern optimizes out some
 machine code read/write category ops, in case memcpy() is a real fn call
 and not inlined, and var "p" is C stack memory vs a nonvol register.
 Plus in case on XYZ CPU arch, constant operand integer "123" is a separate
 op of its own, CC doesn't need to output "mov reg, 123" before memcpy()
 and again output "mov reg, 123" after that fn call. I'm staring at
 ARM32 Thumb with its 2 byte long OPs.
-strip trailing "0, 0, 0," from the struct initializers, C guarantees the
 omitted members are zero initialized. Also collapse them at a higher density
 than format_char_block(). Max "0, " vs "255, ". Collapse "\n"s out of "0,"s
 in format_mg_data() in the same style that format_char_block() already does.
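
A rough sketch of the "build the line in a stack buffer with constant-length memcpy(), then make one libc call" idea from the list above. It is not the code in this patch: `APPEND_LIT`, `demo_build_line()` and `demo_emit_line()` are hypothetical names, and the output format is made up for the example.

```c
#include <stdio.h>
#include <string.h>

/* Append a string literal at p and advance p. sizeof(lit) - 1 is a
 * compile-time constant, so the memcpy() is easy for the CC to inline. */
#define APPEND_LIT(p, lit) \
    ((void)memcpy((p), (lit), sizeof(lit) - 1), (p) += sizeof(lit) - 1)

/* Hypothetical helper: writes "  { <value> },\n" into buf (no NUL
 * terminator is kept) and returns the number of bytes written.
 * buf must have room for at least 19 bytes. */
size_t demo_build_line(char *buf, unsigned value)
{
    char *p = buf;
    APPEND_LIT(p, "  { ");
    p += sprintf(p, "%u", value);   /* the only formatted conversion */
    APPEND_LIT(p, " },\n");
    return (size_t)(p - buf);
}

/* One fwrite(), so one stream lock acquire/release per line instead of
 * one per character or per fragment. Returns 0 on success, -1 on error. */
int demo_emit_line(FILE *out, unsigned value)
{
    char line[32];
    size_t len = demo_build_line(line, value);
    return fwrite(line, 1, len, out) == len ? 0 : -1;
}
```

Usage would be something like `demo_emit_line(stdout, 255);`, which writes the whole line with a single stdio call.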
-this commit is probably for a future change requiring
 a generated .h involving sv_inline.h

-buffer format_char_block data in stack memory until about a full line is
 reached, avoids expensive lock and unlocking inside all libcs (os threads)
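
A minimal sketch of the line buffering described above, assuming a comma separated decimal format. `demo_format_block()` and `DEMO_LINE_MAX` are hypothetical names, and this is not the actual format_char_block() body, only the same technique: collect bytes in a stack array and call fwrite() roughly once per output line.

```c
#include <stdio.h>
#include <string.h>

#define DEMO_LINE_MAX 80   /* assumed target line width */

/* Writes count bytes from data as decimal numbers separated by commas.
 * Text is collected in a stack buffer and flushed with one fwrite()
 * per line. Returns the number of fwrite() calls made, or -1 on error. */
int demo_format_block(FILE *out, const unsigned char *data, size_t count)
{
    char line[DEMO_LINE_MAX + 8];  /* slack for the last item, ',' and '\n' */
    size_t used = 0;
    size_t i;
    int writes = 0;

    for (i = 0; i < count; i++) {
        /* at most 3 digits plus a temporary NUL from sprintf() */
        used += (size_t)sprintf(line + used, "%u", (unsigned)data[i]);
        if (i + 1 < count)
            line[used++] = ',';
        /* flush when the line is nearly full or the input is done */
        if (used >= DEMO_LINE_MAX - 4 || i + 1 == count) {
            line[used++] = '\n';
            if (fwrite(line, 1, used, out) != used)
                return -1;
            writes++;
            used = 0;
        }
    }
    return writes;
}
```

With a 256 entry table this makes a few fwrite() calls instead of several hundred fputc()/fprintf() calls, each of which would lock and unlock the stream in a threaded libc.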