Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

blib.pm dont shell out to "cmd.exe" on Win32+Win32.pm #22683

Open
wants to merge 2 commits into
base: blead
Choose a base branch
from

Conversation

bulk88
Copy link
Contributor

@bulk88 bulk88 commented Oct 20, 2024

-document Internals::getcwd() enough, with scary warnings, for future
core devs, or CPAN devs. A permanent reason for Internals::getcwd() to
exist on Win32 full perl was found. See code comments.
-optimize a bit the built-in perl core cwd() XSUBs, use TARG, and group
stack manipulation together for C compiler variable liveness reasons aka
less variables to save in non-vol regs or on C stack around function
calls.


  • This set of changes requires a perldelta entry, and it is included.

@tonycoz
Copy link
Contributor

tonycoz commented Oct 21, 2024

It looks like you need to perl regen.pl.

I suspect it would be better to move getcwd() from Internals:: to builtin:: and calling that from blib. Internals was added pre-builtin, and the implementation of getcwd_sv() looks reasonably universal (if not too efficient on non-POSIX systems.)

Adding (or calling) PerlEnv_get_childdir() from getcwd_sv() strikes me as a more universal solution - making getcwd_sv() more usable (or at least more efficient) on Windows, though the "always malloc" interface of win32_get_childdir() makes me sad.

@tonycoz
Copy link
Contributor

tonycoz commented Oct 21, 2024

Actually, I don't see why you're getting that regen error here, and on the other PRs.

I tried testing #22682 which has a similar regen failure, and couldn't reproduce it locally.

Karl's PR was submitted around the same time and didn't fail.

@haarg
Copy link
Contributor

haarg commented Oct 21, 2024

I would be strongly in support of adding a builtin::cwd function of some form.

Cwd is a loadable module, dual life, and part of File::Spec. This combines together to be horrible for anything manipulating @INC that might care about the current directory. Loading Cwd locks you in to whatever version of File::Spec exists in the current path. But you might have been wanting to use a ./local install of File::Spec. Cwd::cwd also still shells out to pwd, which would be trivial to fix if Cwd wasn't written in such a tortured way.

@richardleach
Copy link
Contributor

Actually, I don't see why you're getting that regen error here, and on the other PRs.

I tried testing #22682 which has a similar regen failure, and couldn't reproduce it locally.

Karl's PR was submitted around the same time and didn't fail.

These commits came in while blead was briefly broken during the point release yesterday. I'll re-run the checks on them.

@richardleach
Copy link
Contributor

Actually, I don't see why you're getting that regen error here, and on the other PRs.
I tried testing #22682 which has a similar regen failure, and couldn't reproduce it locally.
Karl's PR was submitted around the same time and didn't fail.

These commits came in while blead was briefly broken during the point release yesterday. I'll re-run the checks on them.

Ok. Made no difference. The PRs must be based on top of blead when it was broken (Makefile.SH)and a fresh rebase should fix.

@bulk88
Copy link
Contributor Author

bulk88 commented Oct 22, 2024

I suspect it would be better to move getcwd() from Internals:: to builtin:: and calling that from blib. Internals was added pre-builtin, and the implementation of getcwd_sv() looks reasonably universal (if not too efficient on non-POSIX systems.)

I will do that since Internals:: package is legacy and builtin:: is modern way to do it.

Adding (or calling) PerlEnv_get_childdir() from getcwd_sv() strikes me as a more universal solution - making getcwd_sv() more usable (or at least more efficient) on Windows, though the "always malloc" interface of win32_get_childdir() makes me sad.

Getting rid of that malloc is in my cross hairs, I'll just add user supplied buffer variant to Perlhost.h. Note, IMHO perlhost.h and vdir.h and friends need a 1 way trip to Recycle Bin (75% of LOCs, rest are truly needed for psudofork). B/c even though perlhost.h is using Newx()/Safefree() right now in its internal impl. I (me) think would break API conventions set up in perlhost.h, for perlhost.h, to transfer ownership of pointers for later use in sv_usepvn(), which is the max efficient API design/way to do it.

Since perlhost.h seems to be opaque what is perlhost.h's own internal memory allocator, actually is. Is it global C++ new()? per my_perl Newx? is it malloc? is it PerlMemShared_malloc? is it PerlMem_malloc? is it PerlMemParse_malloc?

I will just make a user supplied buffer API for perlhost, caller first passed a good enough sized, c stk char array, aka 256+1 or 4096, if overflow of 256, (like almost never on WinOS unless you are building Chrome browser or nodejs), caller only gets length needed and overflow fail error, and must malloc an extreme length buffer, and retry again. Basically copying MS's GetCurrentDirectory() API.

Final note, Rev 1 of this commit , i wrote the POD as NEVER USE THIS, if it is moved to builtin::, keep my original scary text about never use this, or what level of "support" will P5P offer? will the call ever be removed in the future? is it experimental? is runtime probing for the typeglob API contract required forever, in case P5P removes it in the future with no notice?

Obviously since builtin::cwd is a built in, nobody can ever get bug fixes for builtin::cwd in the future once shipped in Perl stable. But, since its just a random XSUB, Cwd.pm in theory can knock it out (builtin::cwd glob) in its BOOT:/BEGIN{}/use/require, right?

Does P5P turn on optree inlining for builtin::cwd? or that has gone too far, unmaintainable, not ultra perf critical, and builtin::cwd is not supposed to be used by CPAN/public on a daily use, so optree inling, dont do that? yes no? this can be done later on, not in this pull req, I wont do that in this PR,

As I wrote in Rev 1, other than an exotic .t situation, it has no legit usage by CPAN/public. 2nd unsanctioned/official use by CPAN/public, is a perl.elf.exe executed bare, without access to its /lib, which is supposed to be possible as part of Pure Perl API contract with users. I will quietly pretend the -E bug doesn't exist [nervous whistle emoji].

@haarg
Copy link
Contributor

haarg commented Oct 22, 2024

builtin is not the place for "internal" or "do not use" functions.

@bulk88
Copy link
Contributor Author

bulk88 commented Oct 22, 2024

builtin is not the place for "internal" or "do not use" functions.

So put in builtin::* and document as "fully supported" or document as highly discouraged unless you are writing a very specific .t and Cwd.pm isnt loadable???

Note, we support the XS backend as public api already, https://perldoc.perl.org/perlapi#getcwd_sv and that C func is not listed as experimental and does not site in "private api" perlintern.pod. I changed my mind in this paragraph. I say full feature/full support builtin::cwd as part part of builtin:: family. And no dangers or warning sentences to discourage use. Only docs and sentences that encourage use.

Any bug filed against PP builtin::cwd if the bug ticket unfairly rejected by P5P, in 5 mins that bug ticket will come back the almost verbatim text, and be filed as a bug against int Perl_getcwd_sv(interpreter *my_perl, sv *sv) which is public XS API.

If builtin::cwd is added as a new feature, does anyone see any cons or negatives, if major chunks of CPAN/many CPAN authors abandon, Cwd.pm in favor of the built in? I don't see any cons, since Cwd.xs only does

void
getcwd(...)
ALIAS:
    fastcwd=1
PPCODE:
{
    dXSTARG;
    /* fastcwd takes zero parameters:  */
    if (ix == 1 && items != 0)
	croak_xs_usage(cv,  "");
    getcwd_sv(TARG);
    XSprePUSH; PUSHTARG;
    SvTAINTED_on(TARG);
}

and is calling the P5P core impl anyway.

.. . and //// path cleaning is only thing I can think of that Cwd.pm may do on some platforms, but do any OSes ever allow setting such CWDs anymore? without the OS/shell.elf.exe auto cleaning the path/env var itself?

Overall Im happy some/a yes votes appeared in this ticket.

@tonycoz
Copy link
Contributor

tonycoz commented Oct 22, 2024

I think add it to builtin as experimental.

Cwd itself calls getcwd_sv() (or a copy of it on really old perls) on POSIX-likes, so I think we can assume it's fairly reliable. (The getcwd() system call can validly fail on POSIX-likes.)

You mention Cwd calls getcwd_sv(), but it doesn't do that on Win32, where it ends up calling Win32::GetCwd(), or on VMS.

As to perlhost, we've had a fairly recent ticket where someone was asking for the ability to intercept system calls similar to what perlhost provides, though perlhost doesn't do this 100%.

@bulk88
Copy link
Contributor Author

bulk88 commented Oct 23, 2024

I think add it to builtin as experimental.

Cwd itself calls getcwd_sv() (or a copy of it on really old perls) on POSIX-likes, so I think we can assume it's fairly reliable. (The getcwd() system call can validly fail on POSIX-likes.)

You mention Cwd calls getcwd_sv(), but it doesn't do that on Win32, where it ends up calling Win32::GetCwd(), or on VMS.

Is it a bug perl core Perl_getcwd_sv() calls ucrtbase.dll getcwd instead of PerlEnv_get_childdir()? psuedofork/ithreads per my_perl CWD is broken if you call libc global getcwd.

Also I dislike the ucrt implementation of getcwd, since its a thin wrapper around dgetcwd which is implemented with GetFullPathName(":.") not GetCurrentDirectory(). GetCurrentDirectory() internals are alot more simple, refcnt-spin lock aqu in PEB->memcpy->refcnt spinlock rel. GetFullPathName(":.") has to "start a grammar parser" with HeapAlloc/HeapFree. Neither do kernel I/O for ":." according to ProcMon, but grammer parsing, vs no parsing, no parsing is the winner.

As to perlhost, we've had a fairly recent ticket where someone was asking for the ability to intercept system calls similar to what perlhost provides, though perlhost doesn't do this 100%.

MS Detours, https://github.com/DynamoRIO/dynamorio There are other tools.

Somewhere on my todo list is sending perlhost.h/iperlsys.h to Marine bootcamp for weight loss and cutting 75% of the LOCs. I assume MS/ActiveState wanted a perlhost.dll separate maintained from perl56.dll and perl58.dllsince back then, there was a bad idea policy of no scheduled maint or stable P5P releases and years of time between stable releases.perlhost.dll` never came into existence, AS sponsorship is gone. I removed the virtual C++ decl tag ~8 years ago, nobody complained. I RO consted the extern C vtables 8 years ago, nobody complained. Some layers of abstraction can be removed now, without affecting ithreads or psuedfork or public API or perl intern API.

Rant and conspiracy theory on perlhost.h. Skip rest of the post if you want.

Staring at the C++ elephant, I came to realize, its real purpose was not for ithreads and psuedo fork, ActiveState and Microsoft has a different API design requirement. perlhost.h goes far beyond what is needed for psuedo-fork.

Someone at MS or a MS VAR company, has a business requirement to run Win32 Perl on Unix. Yes, Win32 Perl on Unix, not Perl on Unix.

Here is its real reason https://en.wikipedia.org/wiki/Internet_Explorer_for_UNIX

Design requirement was libperl as a Internet Explorer ActiveX plugin, on Internet Explorer for Unix, and Win32 Perl for Unix, loads Win32::GUI on Unix and draws a GUI. This is NOT WINE. There is no file called kernel32.so in the process.

-document Internals::getcwd() enough, with scary warnings, for future
 core devs, or CPAN devs.  A permanent reason for Internals::getcwd() to
 exist on Win32 full perl was found. See code comments.
-optimize a bit the built-in perl core cwd() XSUBs, use TARG, and group
 stack manipulation together for C compiler variable liveness reasons aka
 less variables to save in non-vol regs or on C stack around function
 calls.
@bulk88 bulk88 force-pushed the blib_pm_no_cmd.exe_add_proper_getcwd branch from 18599f6 to 284512f Compare October 23, 2024 18:42
@bulk88
Copy link
Contributor Author

bulk88 commented Oct 23, 2024

Updated branch pushed, I did find a bug, Cwd.pm does slash conversion to unix, Win32::GetCwd and Perl_getcwd_sv() and Internals::/builtin:: and msvcrt ucrtbase do not. Do nothing or what changes?

Does ``Perl_getcwd_sv() get/need to usePerlEnv_get_childdir()` aka new `PerlEnv_get_childdir_tbuf()` api? I didn't change that so far. docs are missing on `PerlEnv_get_childdir_tbuf()` since this branch is still WIP and up for comments.

Add builtin::getcwd

Perl#3 gcc syntax fixes

-win32_get_childdir() skip the strlen() because GetCurrentDirectoryA()
 gives it to us, handle 32KB paths if encountered.
 It is an infinite retry loop since its been reported, in multithreading,
 GCD()/CWD can change and get longer on our OS thread's between
 overflow Perl#1 and correct-size attempt Perl#2 because CWD val is a race cond.
 So all overflow conditions must trigger realloc. If the whole C stack
 is used up with alloca() and infinite retry. A SEGV is good.
 Win API/UNICODE_STRING struct is hard coded to USHORT/SHORT. If
 GetCurrentDirectoryA() returns above 65KB a SEGV is good. If
 GetCurrentDirectoryA()'s impl is doing {return buflen+1;} which is an API
 violation, OS is damaged, a SEGV is good. The race retry will never
 realistically trigger more than 1x ever, 2x rounds through retry loop
 might happen after a few centuries semi-guess.

-CPerlHost::GetChildDir(void) is TODO now that
 m_pvDir->GetCurrentDirectoryA/W have correct retvals.

-perllib.c silence warnings, return debug code accidentally removed in

Merge WinCE and Win32 directories -- Initial patch
7bd379e 4/27/2006 7:30:00 PM
@bulk88 bulk88 force-pushed the blib_pm_no_cmd.exe_add_proper_getcwd branch from 284512f to 59087c3 Compare October 23, 2024 21:04
@tonycoz
Copy link
Contributor

tonycoz commented Oct 23, 2024

I think add it to builtin as experimental.
Cwd itself calls getcwd_sv() (or a copy of it on really old perls) on POSIX-likes, so I think we can assume it's fairly reliable. (The getcwd() system call can validly fail on POSIX-likes.)
You mention Cwd calls getcwd_sv(), but it doesn't do that on Win32, where it ends up calling Win32::GetCwd(), or on VMS.

Is it a bug perl core Perl_getcwd_sv() calls ucrtbase.dll getcwd instead of PerlEnv_get_childdir()? psuedofork/ithreads per my_perl CWD is broken if you call libc global getcwd.

Probably since perlhost maintains per pseudo-process current directories.

Also I dislike the ucrt implementation of getcwd, since its a thin wrapper around dgetcwd which is implemented with GetFullPathName(":.") not GetCurrentDirectory(). GetCurrentDirectory() internals are alot more simple, refcnt-spin lock aqu in PEB->memcpy->refcnt spinlock rel. GetFullPathName(":.") has to "start a grammar parser" with HeapAlloc/HeapFree. Neither do kernel I/O for ":." according to ProcMon, but grammer parsing, vs no parsing, no parsing is the winner.

I'll admit I thought Win32 didn't have getcwd(), but we do use it.

As to perlhost, we've had a fairly recent ticket where someone was asking for the ability to intercept system calls similar to what perlhost provides, though perlhost doesn't do this 100%.

MS Detours, https://github.com/DynamoRIO/dynamorio There are other tools.

I've used DrMemory on Windows, which is based on this.

The user in this case wanted to use this on Linux, ActiveState still provide perl for Linux, but I don't know if they build that with -DPERL_IMPLICIT_SYS (and provide the missing bits.)

Somewhere on my todo list is sending perlhost.h/iperlsys.h to Marine bootcamp for weight loss and cutting 75% of the LOCs. I assume MS/ActiveState wanted a perlhost.dll separate maintained from perl56.dll and perl58.dllsince back then, there was a bad idea policy of no scheduled maint or stable P5P releases and years of time between stable releases.perlhost.dll` never came into existence, AS sponsorship is gone. I removed the virtual C++ decl tag ~8 years ago, nobody complained. I RO consted the extern C vtables 8 years ago, nobody complained. Some layers of abstraction can be removed now, without affecting ithreads or psuedfork or public API or perl intern API.

Rant and conspiracy theory on perlhost.h. Skip rest of the post if you want.

Staring at the C++ elephant, I came to realize, its real purpose was not for ithreads and psuedo fork, ActiveState and Microsoft has a different API design requirement. perlhost.h goes far beyond what is needed for psuedo-fork.

Someone at MS or a MS VAR company, has a business requirement to run Win32 Perl on Unix. Yes, Win32 Perl on Unix, not Perl on Unix.

Here is its real reason https://en.wikipedia.org/wiki/Internet_Explorer_for_UNIX

Design requirement was libperl as a Internet Explorer ActiveX plugin, on Internet Explorer for Unix, and Win32 Perl for Unix, loads Win32::GUI on Unix and draws a GUI. This is NOT WINE. There is no file called kernel32.so in the process.

They had an IIS plugin for non-ASP use too, IIRC, which I expected used the perlhost stuff.

@bulk88
Copy link
Contributor Author

bulk88 commented Oct 25, 2024

Is it a bug perl core Perl_getcwd_sv() calls ucrtbase.dll getcwd instead of PerlEnv_get_childdir()? psuedofork/ithreads per my_perl CWD is broken if you call libc global getcwd.

Probably since perlhost maintains per pseudo-process current directories.

So fix it and hunt down all non-ithread compatible getcwd calls for win32 ? its easy for me

Tony Cook, what is your opinion on Perl_getcwd_sv() the C function right now return MS slash paths and not unix slash on Win32? does that need to be fixed? is it a bug? will it break all of CPAN? or nobody cares so fix it?

@tonycoz
Copy link
Contributor

tonycoz commented Oct 29, 2024

So fix it and hunt down all non-ithread compatible getcwd calls for win32 ? its easy for me

Yes please.

Tony Cook, what is your opinion on Perl_getcwd_sv() the C function right now return MS slash paths and not unix slash on Win32? does that need to be fixed? is it a bug? will it break all of CPAN? or nobody cares so fix it?

Anyone who is currently directly calling getcwd_sv() is currently getting \ separated paths, and that's the native format. I think it's reasonable to continue to return that for getcwd_sv().

I think it's reasonable for builtin::getcwd() to also return the native separators.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants