fix CHostPerl's class design & remove CHostPerl malloc-ed fnc vtbls bloat #22708

Open: wants to merge 2 commits into base: blead

bulk88 (Contributor) commented Oct 27, 2024

See commit message. This patch changes a lot of code (through regexps), but it is a safer option for "step 1" than forklifting/trashcanning iperlsys.h's design. Both before and after this commit, iperlsys.h requires the obj ptr and the vtable ptr to "be the same", even though that design is incompatible with perlhost.h and perllib.c. There, some of the obj ptrs (classes) must have back-refs (a C++ cast back) to their single CPerlHost container, not containers, CONTAINER, while other classes have multiple back-refs and can't cast back, or have no ONE SINGLE CPerlHost owner/backref/container.

The 3 perl_*() functions whose prototypes changed are not public API. There are no hits on CPAN:

https://grep.metacpan.org/search?q=perl_clone_using%5C%28%7Cperl_alloc_using%5C%28%7Cperl_alloc_override%5C%28&qd=&qft=&qifl=


  • This set of changes does not require a perldelta entry.


Note: I use "static link" below in a non-standard way. I am using it to refer
to statically importing function/data symbols from another DLL through the
PE import table, the opposite of "LoadLibrary()"/"GetProcAddress()"
linking. I am NOT using "static link" in its typical sense of fully including
a copy of a library at compile time through a .a/.lib/.o/.obj file.

Since commit

Revision: af2f850 10/19/2015 5:47:16 PM
const vtables in win32/perlhost.h

the vtables have been stored in read-only memory. Since then there have been
no bug tickets or complaints from any users wanting, or depending on, this
runtime instrumentation system.

All Win32 perl builds are statically DLL-linked to a specific MSVCRT (LIBC)
when the interp's C code is compiled, no matter the name of the CRT DLL:
msvcrt.dll, msvcr120.dll, ucrtbase.dll, etc. Runtime swapping of libperl's
MSVCRT DLL by an embedder, to his favorite CRT DLL, has never
been supported, and wouldn't even work: perlhost.h's hooking isn't
perfect, Win32 Perl often calls "win32_*()" functions by accident or
explicitly, and those static-link calls go straight into the hard-coded CRT. Plus,
the prototypes of the CRT's POSIX-ish functions have changed over the years.

What is time_t? What is stat_t? Their definitions have changed while the function symbol names stayed the same.

The original commit for all this complexity was from the 5.0/5.6 era, when
it was assumed that perl 5 maint/stable releases would be abandoned by P5P
in favor of Perl 6. All this complexity consisted of provisions and APIs
to fix, upgrade and improve Win32 Perl on Microsoft's/ActiveState's
rapid release schedule, without any dependency on
P5P devs/pumpking/P5P policy about releasing a new perl5 .tar.gz.

0f4eea8 6/19/1998 6:59:50 AM
commit title "applied patch, along with many changes:"

"The features of this are:
1. All OS dependant code is in the Perl Host and not the Perl Core.
   (At least this is the holy grail goal of this work)
2. The Perl Host (see perl.h for description) can provide a new and
   improved interface to OS functionality if required.
3. Developers can easily hook into the OS calls for instrumentation
   or diagnostic purposes."

None of these provisions and APIs have ever been used. CPerlHost:: never
became a separate DLL. Perl >= 5.12 has a "rapid release" policy.
ActiveState dropped sponsorship of, and product interest in, Win32 Perl many years
ago, and Strawberry Perl took over the market. CPerlHost:: is far too
over-engineered for perl's ithreads/pseudofork, which only require
"1 OS process", 2 %ENVs and 2 CWDs. Most of the
CPerlHost::* methods are jump stubs to "win32_*();" anyway, and the
hooking is redundant runtime overhead, but that is for another commit.

This commit is about removing the pointless malloc() and memcpy() of the
plain C to C++ "thunk funcs" vtables, from the RO const master copy in
perl5**.dll to each "my_perl" instance at runtime.

On x64, copying the vtables to malloc memory wasted the following amounts
of memory. These are the actual sizes passed to malloc() by
CPerlHost::/perl; malloc()'s secret internal headers are not included in these
numbers.

perlMem, 0x38
perlMemShared, 0x38
perlMemParse, 0x38
perlEnv, 0x70
perlStdIO, 0x138
perlLIO, 0xE0
perlDir, 0x58
perlSock, 0x160
perlProc, 0x108
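
As a quick check of the list above, the sketch below sums the sizes and, assuming 8-byte function pointers on x64, expresses each copy as a count of pointer slots (perlMem's 0x38 comes out to 7, matching the 7 extern "C" thunk ptrs of the VMem layout). The struct, array and function names are illustrative only.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Per-interpreter malloc() request sizes of the copied vtables on x64,
   taken from the list above (malloc's own headers excluded). */
struct vtbl_copy {
    const char *name;
    size_t bytes;
};

static const struct vtbl_copy vtbl_copies[] = {
    {"perlMem", 0x38}, {"perlMemShared", 0x38}, {"perlMemParse", 0x38},
    {"perlEnv", 0x70}, {"perlStdIO", 0x138},    {"perlLIO", 0xE0},
    {"perlDir", 0x58}, {"perlSock", 0x160},     {"perlProc", 0x108},
};

#define N_VTBL_COPIES (sizeof vtbl_copies / sizeof vtbl_copies[0])
#define X64_SLOT 8u /* assumed size of one function pointer on x64 */

/* Sum of the bytes requested from malloc() per interpreter. */
static size_t total_vtbl_bytes(void)
{
    size_t total = 0;
    for (size_t i = 0; i < N_VTBL_COPIES; i++) {
        /* every copy should be a whole number of pointer slots */
        assert(vtbl_copies[i].bytes % X64_SLOT == 0);
        total += vtbl_copies[i].bytes;
    }
    return total;
}

/* Print each copy as bytes and as a count of x64 pointer slots. */
static void print_vtbl_slots(void)
{
    for (size_t i = 0; i < N_VTBL_COPIES; i++)
        printf("%-14s %4zu bytes = %2zu slots\n", vtbl_copies[i].name,
               vtbl_copies[i].bytes, vtbl_copies[i].bytes / X64_SLOT);
    printf("%-14s %4zu bytes\n", "total", total_vtbl_bytes());
}
```

With these numbers, total_vtbl_bytes() returns 1520 (0x5F0): roughly 1.5 KB of RW heap per interpreter, before malloc() header overhead.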

The old design of malloc-ed vtables seems to have come from the
original devs not knowing, or poorly understanding, how MS COM
objects (C++ objs used from plain C) and MSVC ISO C++ objects (almost the same ABI) are
laid out in memory. The original devs realized that if they used a ptr to a
global vtable struct, they couldn't "cast" from a child class like
VDir:: or VMem:: back to the CPerlHost:: obj, which is a design
requirement here.

But they wanted to pass around child class ptrs like VMem::* instead of one
CPerlHost:: obj ptr, and those VMem:: ptrs must be visible in 'extern "C"'
land to plain C perl, since my_perl keeps 9 of these C++ obj *s as separate
ptrs in the "plain C" my_perl struct. So instead they made malloced copies
of the vtables and put those copies in the CPerlHost:: obj, so that from a
child class ptr they can C++ cast to the base class CPerlHost:: obj if
needed.
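
A simplified plain-C model of the old approach described above. All names (struct IMemOld, struct HostOld, HOST_FROM_MEMCOPY, host_old_new) are hypothetical, and a real host would forward to a VMem:: object instead of calling malloc() directly. It shows why the copy was made: because each host holds its own RW copy, the address of that copy uniquely identifies the host, and offsetof() leads back to it.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical plain-C vtable; as in the old design, the "object"
   argument passed to each thunk is the address of the vtable copy. */
struct IMemOld {
    void *(*pMalloc)(struct IMemOld *self, size_t n);
    void  (*pFree)(struct IMemOld *self, void *p);
};

/* Hypothetical host that embeds a writable per-instance vtable copy. */
struct HostOld {
    size_t allocs;           /* stands in for other host state */
    struct IMemOld mem_copy; /* RW copy of the RO master vtable */
};

/* container_of-style cast: address of the copy -> owning host. */
#define HOST_FROM_MEMCOPY(p) \
    ((struct HostOld *)((char *)(p) - offsetof(struct HostOld, mem_copy)))

static void *thunk_malloc(struct IMemOld *self, size_t n)
{
    void *p = malloc(n);
    if (p)
        HOST_FROM_MEMCOPY(self)->allocs++; /* reach the host's state */
    return p;
}

static void thunk_free(struct IMemOld *self, void *p)
{
    if (p)
        HOST_FROM_MEMCOPY(self)->allocs--;
    free(p);
}

/* Read-only master, like the const vtables in perlhost.h. */
static const struct IMemOld mem_master = { thunk_malloc, thunk_free };

static struct HostOld *host_old_new(void)
{
    struct HostOld *h = malloc(sizeof *h);
    if (!h)
        return NULL;
    h->allocs = 0;
    /* The step this commit removes: RO master -> RW per-host copy. */
    memcpy(&h->mem_copy, &mem_master, sizeof mem_master);
    assert(HOST_FROM_MEMCOPY(&h->mem_copy) == h);
    return h;
}
```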

This is just wrong. Almost universally, vtables are stored in const
RO memory. Monkey-patching at runtime is a Perl lang thing, and rare
to nonexistent in C/C++ land.

The ptr to the "plain C to C++ func thunk vtable" CAN NOT be stored
per VDir::* or per VMem::* ptr. You can't store it, per C++ tradition,
as the 1st member/field of a VDir::/VMem:: object.

The reason is that VDir::/VMem:: objects can be refcounted, and multiple
CPerlHost:: ptrs can hold refs to one VMem:: ptr. So there is no way to
map an arbitrary VMem:: ptr back to a single CPerlHost:: ptr.

Main examples are VMem:: "MemShared" and VMem:: "MemParse".

Also, the C->C++ thunk funcs must pick between 3 VMem:: obj ptrs,
"Mem", "MemShared" and "MemParse", which are stored at different
offsets in CPerlHost::*, yet all 3 VMem:: derived "classes"
must have the same plain-C vtable layout with 7 extern "C" func thunk ptrs.
Given my minimal C++ knowledge, and not wanting to add even more C++
classes to iperlsys.h, perlhost.h and perllib.c (new classes that
may or may not inline away), don't fix this with more C++ classes.

So fix all of this by having each CPerlHost:: obj store a ptr to the RO
vtable instead of a huge RW inlined copy of the vtable.
To keep all previous design requirements, just use
"&cperlhost_obj->vmem_whatever_vtable" as the plain-C representation
of a VMem::* ptr, instead of
"&cperlhost_obj->IPerlWhateverMem.pMalloc".

The 1 extra pointer de-ref CPU machine op in each perl core and perl XS
caller, executed in the "iperlsys.h" family of macros, is I think
irrelevant compared to the savings of having RO vtables. The machine code
length on x86/x64 is the same in each caller, old vs new.
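
The RO-vtable layout described above (each host storing only vtable pointers, with "&cperlhost_obj->vmem_whatever_vtable" as the plain-C handle) can be sketched in plain C. This is a minimal sketch under assumptions: struct Host, struct VMem, struct IMem, IMemHandle, DEFINE_MEM_THUNKS and host_init are hypothetical stand-ins, not the real perlhost.h/iperlsys.h declarations. Because the handle is the address of a host field rather than of the VMem object, a pool shared by two hosts still gets a distinct handle per host, and since the 3 slots sit at different offsets, each slot gets its own RO vtable with an identical layout.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical plain-C handle: the ADDRESS of the host field that holds
   the vtable pointer (what my_perl would keep instead of a VMem ptr). */
struct IMem;
typedef const struct IMem *const *IMemHandle;

/* One layout shared by all three memory slots. */
struct IMem {
    void *(*pMalloc)(IMemHandle self, size_t n);
    void  (*pFree)(IMemHandle self, void *p);
};

/* Hypothetical stand-in for a VMem:: pool that may be shared by hosts. */
struct VMem {
    size_t live; /* outstanding allocations, for the demo */
};

struct Host {
    struct VMem *mem, *mem_shared, *mem_parse;
    const struct IMem *mem_vtbl; /* each points at an RO vtable */
    const struct IMem *mem_shared_vtbl;
    const struct IMem *mem_parse_vtbl;
};

/* Each slot gets thunks that know which host field they were called via. */
#define DEFINE_MEM_THUNKS(slot)                                         \
    static struct Host *host_from_##slot(IMemHandle s)                  \
    {                                                                   \
        return (struct Host *)((char *)s -                              \
                               offsetof(struct Host, slot##_vtbl));     \
    }                                                                   \
    static void *slot##_malloc(IMemHandle s, size_t n)                  \
    {                                                                   \
        struct VMem *vm = host_from_##slot(s)->slot;                    \
        void *p = malloc(n);                                            \
        if (p)                                                          \
            vm->live++;                                                 \
        return p;                                                       \
    }                                                                   \
    static void slot##_free(IMemHandle s, void *p)                      \
    {                                                                   \
        struct VMem *vm = host_from_##slot(s)->slot;                    \
        if (p)                                                          \
            vm->live--;                                                 \
        free(p);                                                        \
    }                                                                   \
    static const struct IMem slot##_vtbl_ro = { slot##_malloc, slot##_free };

DEFINE_MEM_THUNKS(mem)
DEFINE_MEM_THUNKS(mem_shared)
DEFINE_MEM_THUNKS(mem_parse)

/* No memcpy(): the host just stores pointers to the RO vtables. */
static void host_init(struct Host *h, struct VMem *mem,
                      struct VMem *shared, struct VMem *parse)
{
    h->mem = mem;
    h->mem_shared = shared;
    h->mem_parse = parse;
    h->mem_vtbl = &mem_vtbl_ro;
    h->mem_shared_vtbl = &mem_shared_vtbl_ro;
    h->mem_parse_vtbl = &mem_parse_vtbl_ro;
    assert(host_from_mem(&h->mem_vtbl) == h);
}
```

Calling through a handle then looks like `(*h)->pMalloc(h, n)`: one load for the handle, one for the RO vtable.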

This extra ptr deref to reach the vtable can be removed, and I will
probably do it in a future commit. It is not done here, for bisect/small-patch
reasons.

The "iperlsys.h" family of macros includes, for example, the macro
"PerlEnv_getenv(str);".

A specific example, for the macro PerlMem_free() in Perl_safesysfree():

old before this commit
----
mov     rax, [rax+0CE8h]
mov     rcx, rax
call    qword ptr [rax+10h]
-----

new after this commit
-----
mov     rcx, [rax+0CE8h]
mov     rax, [rcx]
call    qword ptr [rax+10h]
----

"mov rcx, rax" is "0x48 0x8B 0xC8" compared to
"mov rax, [rcx]" which is "0x48 0x8B 0x01".

No extra machine code "bloat" in any callers. The 1 extra memory read
is irrelevant if we are about to call malloc() or any of these other
WinOS kernel32.dll syscalls. iperlsys.h/perlhost.h does NOT hook anything
super perf-critical such as "memcmp()" or "memcpy()".
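
The old and new call sequences for PerlMem_free() shown above can be modelled with two hypothetical macros in plain C (OLD_MEM_FREE and NEW_MEM_FREE are not the real iperlsys.h definitions). The three-slot struct IMemV assumes, as the [rax+10h] offset suggests, that pFree is the third function pointer; the only change in the new form is the extra load through the host field, which corresponds to "mov rax, [rcx]".

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical layout: with 8-byte pointers pFree lands at offset
   0x10, matching "call qword ptr [rax+10h]". */
struct IMemV {
    void *(*pMalloc)(void *self, size_t n);
    void *(*pRealloc)(void *self, void *p, size_t n);
    void  (*pFree)(void *self, void *p);
};

/* Old: the my_perl field points straight at the (RW copy of the)
   vtable, and that same pointer is passed as the object argument. */
#define OLD_MEM_FREE(field, ptr) ((field)->pFree((field), (ptr)))

/* New: the my_perl field points at the host member that holds the
   vtable pointer; one extra load fetches the RO vtable. */
#define NEW_MEM_FREE(field, ptr) ((*(field))->pFree((field), (ptr)))

static int frees; /* counts thunk calls for the demo */

static void demo_free(void *self, void *p)
{
    (void)self;
    (void)p;
    frees++;
}

static const struct IMemV demo_vtbl = { NULL, NULL, demo_free };
```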
bulk88 (Contributor, Author) commented Oct 28, 2024

Repushed with C++ fixes:

-Perl's primary malloc pool (per interp, never shared between ithreads) doesn't
 need CS mutexes, the refcounting/multiple my_perl owners infrastructure,
 etc. Inline the IPerlMem/VPerLMem class/struct directly into the CPerlHost
 class. Fewer ptr derefs at runtime. Saves memory, because there is no malloc header.
 It also removes the 0x24 ??? bytes of the CS/mutex struct on x86-32.
-Use the retval of libc's memset(); it saves a non-vol reg push/pop/save cycle.
 ZeroMemory() has a void retval. The lack of a Calloc() API in VMem.h is for
 another time.
-"virtual int Chdir(const char *dirname);": remove the virtual tag. It is an
 unused ptr indirection. Also, the secret C++ vtable ptr in CPerlHost
 objs is now gone.
-Inline the VDir obj into CPerlHost; VDir *s are not shared between interps.
-Sort the machine-type integer members of the CPerlHost class by size to remove
 alignment holes.
-Speed up win32_checkTLS(). win32_checkTLS() is probably redundant
 outside -DDEBUGGING nowadays; it was added in commit
 outside -DDEBUGGING nowadays, it was added in commit

222c300  1/13/2002 10:37:48 AM
Win32 fixes:
 - vmem.h hack to handle free-by-wrong-thread after eval "".

I will still leave it in for now, and just optimize it instead.
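
Two items in the list above (sorting members by size, and reusing memset()'s return value) can be illustrated in portable C. The struct members and zalloc() are hypothetical, not CPerlHost's real fields or a VMem API, and the exact sizes depend on the ABI: on typical x64 ABIs struct unsorted is 40 bytes and struct sorted is 24. memset() returns its first argument, so zalloc() can return that value directly instead of keeping p live across the call.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical members in declaration order: each 1-byte field that
   precedes an 8-byte one forces padding on typical x64 ABIs. */
struct unsorted {
    char  flag_a;
    void *ptr_a;
    char  flag_b;
    void *ptr_b;
    short count;
};

/* Same members sorted by size, largest first: padding only at the end. */
struct sorted {
    void *ptr_a;
    void *ptr_b;
    short count;
    char  flag_a;
    char  flag_b;
};

static_assert(sizeof(struct sorted) <= sizeof(struct unsorted),
              "sorting by size should not add padding");

/* Zeroing allocation that returns memset()'s own return value. */
static void *zalloc(size_t n)
{
    void *p = malloc(n);
    return p ? memset(p, 0, n) : NULL;
}
```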

I benchmarked 10000x calls to Perl_get_context() in a loop, with the
retval ignored: 126 us (microsec). 10000x calls to
GetCurrentThreadId(): 34 us.
Leont (Contributor) commented Nov 1, 2024

Is this still a WIP? Until the last push I wasn't quite clear where this was going.

bulk88 (Contributor, Author) commented Nov 1, 2024

The ultimate goal is to use https://learn.microsoft.com/en-us/windows/win32/api/heapapi/nf-heapapi-heapcreate and https://learn.microsoft.com/en-us/windows/win32/api/heapapi/nf-heapapi-heapalloc, and to deactivate/archive/#if 0 all this 1990s code written for Win3.1, which didn't have HeapCreate. HeapCreate was an NT kernel day-1 function, later backported to Win95.

With https://learn.microsoft.com/en-us/windows/win32/api/heapapi/nf-heapapi-heapdestroy, fixing your memory leaks (jk jk) is as easy as pushing a toilet handle. This should've been done from day 1 of the WinOS perl port.
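
HeapDestroy() releases everything that was allocated from a private heap in one call. The sketch below is NOT that Win32 API: it is a portable, hypothetical stand-in (heap_create, heap_alloc and heap_destroy are made-up names, C11 for max_align_t) that shows the same ownership idea, where every block is linked into its heap so that destroying the heap frees all of them, leaked or not.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Header placed before each block; the max_align_t member keeps the
   user data that follows it suitably aligned. */
union block_hdr {
    union block_hdr *next;
    max_align_t align;
};

/* Hypothetical private heap: it owns every block allocated from it. */
struct private_heap {
    union block_hdr *blocks;
    size_t count;
};

static struct private_heap *heap_create(void)
{
    struct private_heap *h = malloc(sizeof *h);
    if (h) {
        h->blocks = NULL;
        h->count = 0;
    }
    return h;
}

static void *heap_alloc(struct private_heap *h, size_t n)
{
    assert(h != NULL);
    if (n > SIZE_MAX - sizeof(union block_hdr))
        return NULL; /* size would overflow */
    union block_hdr *b = malloc(sizeof *b + n);
    if (!b)
        return NULL;
    b->next = h->blocks; /* link into the owning heap */
    h->blocks = b;
    h->count++;
    return b + 1; /* user data follows the header */
}

/* A single call releases every block, including "leaked" ones. */
static void heap_destroy(struct private_heap *h)
{
    union block_hdr *b = h->blocks;
    while (b) {
        union block_hdr *next = b->next;
        free(b);
        b = next;
    }
    free(h);
}
```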

bulk88 (Contributor, Author) commented Nov 2, 2024

>	ntdll.dll!RtlpLowFragHeapAllocFromContext()	Unknown
 	ntdll.dll!RtlAllocateHeap()	Unknown
 	AcXtrnal.dll!NS_FaultTolerantHeap::APIHook_RtlAllocateHeap(void *,unsigned long,unsigned __int64)	Unknown
 	ucrtbase.dll!_malloc_base()	Unknown
 	perl541.dll!VMem::Malloc(unsigned __int64 size) Line 169	C++
 	perl541.dll!Perl_safesysmalloc(unsigned __int64 size) Line 176	C
 	perl541.dll!S_share_hek_flags(interpreter * my_perl, const char * str, unsigned __int64 len, unsigned long hash, int flags) Line 3421	C
 	perl541.dll!Perl_hv_common(interpreter * my_perl, hv * hv, sv * keysv, const char * key, unsigned __int64 klen, int flags, int action, sv * val, unsigned long hash) Line 983	C
 	perl541.dll!Perl_hv_common_key_len(interpreter * my_perl, hv * hv, const char * key, long klen_i32, const int action, sv * val, const unsigned long hash) Line 482	C
 	perl541.dll!Perl_gv_fetchpvn_flags(interpreter * my_perl, const char * nambeg, unsigned __int64 full_len, long flags, const svtype sv_type) Line 2618	C
 	perl541.dll!Perl_newXS_len_flags(interpreter * my_perl, const char * name, unsigned __int64 len, void(*)(interpreter *, cv *) subaddr, const char * const filename, const char * const proto, sv * * const_svp, unsigned long flags) Line 11645	C
 	perl541.dll!Perl_newXS_flags(interpreter * my_perl, const char * name, void(*)(interpreter *, cv *) subaddr, const char * const filename, const char * const proto, unsigned long flags) Line 11549	C
 	[Inline Frame] perl541.dll!Perl_boot_core_UNIVERSAL(interpreter *) Line 1418	C
 	perl541.dll!S_parse_body(interpreter * my_perl, char * * env, void(*)(interpreter *) xsinit) Line 2593	C
 	perl541.dll!perl_parse(interpreter * my_perl, void(*)(interpreter *) xsinit, int argc, char * * argv, char * * env) Line 1932	C
 	perl541.dll!RunPerl(int argc, char * * argv, char * * env) Line 196	C++
 	[Inline Frame] perl.exe!invoke_main() Line 78	C++
 	perl.exe!__scrt_common_main_seh() Line 288	C++
 	kernel32.dll!BaseThreadInitThunk()	Unknown
 	ntdll.dll!RtlUserThreadStart()	Unknown

This is really sloppy design: 3 layers/call frames of pointless indirection, even after LTO/-O1 MSVC. On the good side, the C++ methods inline into their extern "C" plain-C thunk funcs, and Perl_safesysmalloc could become a macro. Ignore AcXtrnal.dll!; it's specific to the VC debugger and isn't present in processes without a debugger attached.

ucrtbase.dll!_malloc_base() can be removed too; we don't need the https://learn.microsoft.com/en-us/cpp/c-runtime-library/reference/callnewh?view=msvc-170 C++ event dispatch.

I also plan to inline the interp struct into CPerlHost.
