-
Notifications
You must be signed in to change notification settings - Fork 11
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
partly wrong rounding when using printf with "%f" format #6
Comments
@jghub I have been researching this issue for several days now. At present, it seems that if the float is less than zero and has a succeeding zero or zeroes after the fraction separator rounding fails unless enough additional precision is given.
As a workaround, you can apply precision to a float via
As an example of supplying printf the float data as compared to its expansion:
I believe the problem is even more complex than I have described at present. I am now creating some scripts to help provide more insights into this rounding with floats and precision issue. |
@hyenias: thanks for this information. I agree that the situation is rather confusing. it would be good if you can manage to get a better grasp of what is going on here :). |
Interesting:
off by one error? |
why/how could that be an off-by-one error? I am too slow... I mean in your example every number in the semi-open interval [0.0150, 0.0250) should round to 0.02, but actually the numbers in the open interval (0.0195, 0.0295) are "rounded" to 0.02 ... so the range is shifted by 0.045 in this example and, additionally, the lower boundary is not included (open rather than semi-open interval). |
I mean that while 0.019 are incorrect if we supply another digit things start to get right. Maybe just "n+1" digit factor is wrong somewhere. As @hyenias noticed all is fine with 1.019, so looks likes only the first non-zero digit does not get rounded up or down properly. |
I am still working on some testing script(s) to help identify the extent of the problem as well as provide workaround(s). In particular, I thought I had a good working ksh rounding function but when I tried it out on a 32bit system instead of my normal 64bit it failed. Here are some of my current observations:
More to come later... |
I now have my test script running correctly on an ARMv7 box with an adjustment in my formula. The 32bitbox numbers supplied previously were run using text as input. Here are the corrected results. Corrected:
Next up for me is to see if my test script holds up on 64bit CPU using 32bit OS. |
Hey @hyenias, since ksh-community is not going anywhere fast (see discussion in #11), I'd like to invite you to submit a pull request to the 93u+m branch at https://github.com/ksh93/ksh when you've figured out a fix. Floating point math is well over my head, so your contribution and expertise would be an asset. |
I was able to get my test rounding script to work on my macOS, various Linux, and FreeBSD versions back in late April. Interestingly, ksh93's floating point behavior differed the most on my 32-bit FreeBSD box with its floating point responses being more inline with what I expected. Up until this issue, I did not remember the floating point math material from long ago. During my research, I found 0.30000000000000004.com as the most enlightening.
@McDutchie Thank you for the invite and I will be pleased to submit a pull request to your ksh93 branch. When all of my other ksh93 installations produced different results than the above along with the continuing kickoff efforts for ksh-community, I started learning other things. With all of your ksh93 branch activity, I am energized to continue working on this issue. |
Well, I have stared at the numbers long enough to determine the 3 following problems occurring with ksh's rounding using printf and float precision expansion.
|
I will now work on some code fixes for the above. Primary culprit is src/lib/libast/sfio/sfcvt.c. |
Update: I have made successful progress on problem 1 and 2 above. Need to validate against the other float formats, |
https://members.loria.fr/PZimmermann/papers/accuracy.pdf - interesting as well ... |
I have submitted corrections for sfcvt.c that fixes the original problem identified which is my 2nd bullet above about subnormal numbers. Still working on my items 1 and 3. I have also found another problem with rounding to whole numbers via a precision of 0. |
Backport the ksh2020 fix for timezone name determination Partial fix for ksh-community#6.
"UTC" is the modern name for what used to be "GMT", but ksh still preferred GMT. On systems configured to use the UTC time zone, this caused a 'printf %T' regression test failure in tests/builtins.sh as the external 'data' utility will prefer UTC these days. src/lib/libast/tm/tmdata.c: - Reorder the name alternatives for UTC/GMT so that UTC is the first preference. src/cmd/ksh93/tests/builtins.sh: - Report expected and actual values on 'printf %T' failure. Related: ksh-community#6
I didn't trust this back in e3d7bf1 (which disabled it for interactive shells) and I trust it less now. In af6a32d/6b380572, this was also disabled for virtual subshells as it caused program flow corruption there. Now, on macOS 10.14.6, a crash occurs when repeatedly running a command with this optimisation: $ ksh -c 'for((i=0;i<100;i++));do print -n "$i ";(sleep 1&);done' 0 1 2 3 4 5 6 7 Illegal instruction Oddly enough it seems that I can only reproduce this crash on macOS -- not on Linux, OpenBSD, or Solaris. It could be a macOS bug, particularly given the odd message in the stack trace below. I've had enough, though. Out it comes. Things now work fine, the reproducer is fixed on macOS, and it didn't optimise much anyway. The double-fork issue discussed in e3d7bf1 remains. ________ For future reference, here's an lldb debugger session with a stack trace. It crashes on calling calloc() (via sh_calloc(), via sh_newof()) in jobsave_create(). This is not an invalid pointer problem as we're allocating new memory, so it does look like an OS bug. The "BUG IN CLIENT OF LIBPLATFORM" message is interesting. $ lldb -- arch/*/bin/ksh -c 'for((i=0;i<100;i++));do print -n "$i ";(sleep 1&);done' (lldb) target create "arch/darwin.i386-64/bin/ksh" Current executable set to 'arch/darwin.i386-64/bin/ksh' (x86_64). (lldb) settings set -- target.run-args "-c" "for((i=0;i<100;i++));do print -n \"$i \";(sleep 1&);done" (lldb) run error: shell expansion failed (reason: lldb-argdumper exited with error 2). consider launching with 'process launch'. (lldb) process launch Process 35038 launched: '/usr/local/src/ksh93/ksh/arch/darwin.i386-64/bin/ksh' (x86_64) 0 1 2 3 4 5 6 7 8 9 Process 35038 stopped * thread ksh-community#1, queue = 'com.apple.main-thread', stop reason = EXC_BAD_INSTRUCTION (code=EXC_I386_INVOP, subcode=0x0) frame #0: 0x00007fff70deb1c2 libsystem_platform.dylib`_os_unfair_lock_recursive_abort + 23 libsystem_platform.dylib`_os_unfair_lock_recursive_abort: -> 0x7fff70deb1c2 <+23>: ud2 libsystem_platform.dylib`_os_unfair_lock_unowned_abort: 0x7fff70deb1c4 <+0>: movl %edi, %eax 0x7fff70deb1c6 <+2>: leaq 0x1a8a(%rip), %rcx ; "BUG IN CLIENT OF LIBPLATFORM: Unlock of an os_unfair_lock not owned by current thread" 0x7fff70deb1cd <+9>: movq %rcx, 0x361cb16c(%rip) ; gCRAnnotations + 8 Target 0: (ksh) stopped. (lldb) bt * thread ksh-community#1, queue = 'com.apple.main-thread', stop reason = EXC_BAD_INSTRUCTION (code=EXC_I386_INVOP, subcode=0x0) * frame #0: 0x00007fff70deb1c2 libsystem_platform.dylib`_os_unfair_lock_recursive_abort + 23 frame ksh-community#1: 0x00007fff70de7c9a libsystem_platform.dylib`_os_unfair_lock_lock_slow + 239 frame ksh-community#2: 0x00007fff70daa3bd libsystem_malloc.dylib`tiny_malloc_should_clear + 188 frame ksh-community#3: 0x00007fff70daa20f libsystem_malloc.dylib`szone_malloc_should_clear + 66 frame ksh-community#4: 0x00007fff70dab444 libsystem_malloc.dylib`malloc_zone_calloc + 99 frame ksh-community#5: 0x00007fff70dab3c4 libsystem_malloc.dylib`calloc + 30 frame ksh-community#6: 0x000000010003fa5d ksh`sh_calloc(nmemb=1, size=16) at init.c:264:13 frame ksh-community#7: 0x000000010004f8a6 ksh`jobsave_create(pid=35055) at jobs.c:272:8 frame ksh-community#8: 0x000000010004ed42 ksh`job_reap(sig=20) at jobs.c:363:9 frame ksh-community#9: 0x000000010004ff6f ksh`job_waitsafe(sig=20) at jobs.c:511:3 frame ksh-community#10: 0x00007fff70de9b5d libsystem_platform.dylib`_sigtramp + 29 frame ksh-community#11: 0x00007fff70d39ac4 libsystem_kernel.dylib`__fork + 12 frame ksh-community#12: 0x00007fff70c57d80 libsystem_c.dylib`fork + 17 frame ksh-community#13: 0x000000010009590d ksh`sh_exec(t=0x0000000101005d30, flags=4) at xec.c:1883:16 frame ksh-community#14: 0x0000000100096013 ksh`sh_exec(t=0x0000000101005d30, flags=4) at xec.c:2019:4 frame ksh-community#15: 0x0000000100096c4f ksh`sh_exec(t=0x0000000101005a40, flags=5) at xec.c:2213:9 frame ksh-community#16: 0x0000000100096013 ksh`sh_exec(t=0x0000000101005a40, flags=5) at xec.c:2019:4 frame ksh-community#17: 0x000000010001c23f ksh`exfile(iop=0x0000000100405750, fno=-1) at main.c:603:4 frame ksh-community#18: 0x000000010001b23c ksh`sh_main(ac=3, av=0x00007ffeefbff4f0, userinit=0x0000000000000000) at main.c:365:2 frame ksh-community#19: 0x0000000100000776 ksh`main(argc=3, argv=0x00007ffeefbff4f0) at pmain.c:45:9 frame ksh-community#20: 0x00007fff70bfe3d5 libdyld.dylib`start + 1
The ASan crash in basic.sh when sourcing multiple files is caused by a bug that is similar to the crash fixed in 59a5672. This is the trace for the regression test crash (note that in order to see the trace, the 2>/dev/null redirect must be disabled): ==1899388==ERROR: AddressSanitizer: heap-use-after-free on address 0x6150000005b0 at pc 0x55a5e3f9432a bp 0x7ffeb91ea110 sp 0x7ffeb91ea100 WRITE of size 8 at 0x6150000005b0 thread T0 #0 0x55a5e3f94329 in funct /home/johno/GitRepos/KornShell/ksh/src/cmd/ksh93/sh/parse.c:967 ksh-community#1 0x55a5e3f96f77 in item /home/johno/GitRepos/KornShell/ksh/src/cmd/ksh93/sh/parse.c:1349 ksh-community#2 0x55a5e3f90c9f in term /home/johno/GitRepos/KornShell/ksh/src/cmd/ksh93/sh/parse.c:642 ksh-community#3 0x55a5e3f90ac1 in list /home/johno/GitRepos/KornShell/ksh/src/cmd/ksh93/sh/parse.c:613 ksh-community#4 0x55a5e3f90845 in sh_cmd /home/johno/GitRepos/KornShell/ksh/src/cmd/ksh93/sh/parse.c:561 ksh-community#5 0x55a5e3f909e0 in sh_cmd /home/johno/GitRepos/KornShell/ksh/src/cmd/ksh93/sh/parse.c:586 ksh-community#6 0x55a5e3f8fd5e in sh_parse /home/johno/GitRepos/KornShell/ksh/src/cmd/ksh93/sh/parse.c:438 ksh-community#7 0x55a5e3fc43c1 in sh_eval /home/johno/GitRepos/KornShell/ksh/src/cmd/ksh93/sh/xec.c:635 ksh-community#8 0x55a5e4012172 in b_dot_cmd /home/johno/GitRepos/KornShell/ksh/src/cmd/ksh93/bltins/misc.c:318 ksh-community#9 0x55a5e3fca3cb in sh_exec /home/johno/GitRepos/KornShell/ksh/src/cmd/ksh93/sh/xec.c:1254 ksh-community#10 0x55a5e3fd01d4 in sh_exec /home/johno/GitRepos/KornShell/ksh/src/cmd/ksh93/sh/xec.c:1932 ksh-community#11 0x55a5e3fc4544 in sh_eval /home/johno/GitRepos/KornShell/ksh/src/cmd/ksh93/sh/xec.c:651 ksh-community#12 0x55a5e4012172 in b_dot_cmd /home/johno/GitRepos/KornShell/ksh/src/cmd/ksh93/bltins/misc.c:318 ksh-community#13 0x55a5e3fca3cb in sh_exec /home/johno/GitRepos/KornShell/ksh/src/cmd/ksh93/sh/xec.c:1254 ksh-community#14 0x55a5e3ecc1cd in exfile /home/johno/GitRepos/KornShell/ksh/src/cmd/ksh93/sh/main.c:604 ksh-community#15 0x55a5e3ec9e7f in sh_main /home/johno/GitRepos/KornShell/ksh/src/cmd/ksh93/sh/main.c:369 ksh-community#16 0x55a5e3ec801d in main /home/johno/GitRepos/KornShell/ksh/src/cmd/ksh93/sh/pmain.c:41 ksh-community#17 0x7f637b4db2cf (/usr/lib/libc.so.6+0x232cf) ksh-community#18 0x7f637b4db389 in __libc_start_main (/usr/lib/libc.so.6+0x23389) ksh-community#19 0x55a5e3ec7f24 in _start ../sysdeps/x86_64/start.S:115 Code in question: https://github.com/ksh93/ksh/blob/8d57369b0cb39074437dd82924b604155e30e1e0/src/cmd/ksh93/sh/parse.c#L963-L968 To avoid any more similar crashes, all of the fixes introduced in 7e317c5 that set slp->slptr to null have been improved with the fix in 59a5672.
When ksh executes a script without a #! path (note that the AT&T team had a real disliking for #! paths), ksh forks and goes through a quick reinitialisation procedure. This is much faster than invoking a fully new shell but should have the same effect if it all works well. Unfortunately it's not worked all that well so far. Even after recent improvements (see referenced commits) I've been finding corner case problems. FYI, running a script without #! basically goes like this: * in path_spawn(), execve() fails with ENOEXEC because the file is not a binary executable and does not start with #! * this triggers 'case ENOEXEC:' which: * forks ksh * calls exscript() * exscript() cleans up & calls siglongjmp(*sh.jmplist,SH_JMPSCRIPT) * SH_JMPSCRIPT is the highest longjmp value, so *all* the previous sigsetjmp/sh_pushcontext calls are unwinded in reverse order, triggering all sorts of cleanup, state restoration, removal of local scopes, etc. * eventually, this lands us at the top sigsetjmp in sh_main() * sh_main() calls sh_reinit(), then resumes as if the shell had just been started This commit makes the following interrelated changes for the correct functioning of this procedure: 1. exscript() now exports the environment into a dedicated Stk_t buffer and sets environ[] to that. 2. Instead of re-using existing variables, sh_reinit() deletes everything and reinits all name-value trees from scratch, then re-imports the environment from environ[]. 3. Variable values that were imported from the environment are no longer treated specially with an NV_IMPORT attribute and the np->nvenv pointer to their value in environ[], fixing at least one crash.[*1] Details of the changes follow: src/cmd/ksh93/sh/path.c: - exscript(): Generate a new environ[] by activating a dedicated AST stack that will not be overwritten before calling sh_envgen(). This will allow sh_reinit() to delete all variables and then reimport the environment. The exporting must be done here, before siglongjmp, otherwise locally scoped exported variables won't be included (siglongjmp with SH_JMPSCRIPT triggers cleanup of all scopes). src/cmd/ksh93/sh/init.c: - sh_reinit(): Largely rewritten as follows. - Reset shell options first. This has the beneficial side effect of unsetting SH_RESTRICTED which interferes with unsetting certain variables, like PATH. - Remove workarounds for FPATH, SHLVL and tilde expansion disciplines; these will not be needed now. - Properly unset and delete all functions and built-ins. Since we now unset a function before deleting it, this should now free up their memory. (See nvdisc.c below for a change allowing removal of special built-ins.) - Properly unset all variables (which includes any associated discipline functions). Incorporate here the needed logic from sh_envnolocal() in name.c; most of it is unneeded (that function was previously used to cleanup local variables but has not been used for that for decades). So sh_envnolocal() is now unused. - Delete variables in a separate pass after unsetting variables and unsetting and deleting functions; this avoids use-after- free problems as well as possible "no parent" problems with namespace variables (e.g., .rc.status in our new kshrc.sh). - After all that, close and free up all function, alias, tracked alias, type and variable trees. - Free the contiguous built-in node space and the Init_t init context (with all the special variable discipline pointers). - Call nv_init (previously only called from sh_init) to reinitialise all of the above name-value stuff from scratch. It's the only way to be sure. - Re-import the environment as stored by exscript() above. - env_init(): - Per item 3 above and footnote 1 below, no longer set NV_IMPORT attribute and no longer point np->nvenv to the item in environ. - POSIX says, for 'environ': "Any application that directly modifies the pointers to which the environ variable points has undefined behavior."[*2] Yet, env_init() is indeed juggling the environ[] pointers to deal with variables that cannot be imported because their names are invalid (because they still need to be saved to be passed on to child processes). Replace the current approach with one where those env vars get allocated on the heap, pointed to by sh.save_env and counted by sh.save_env_n (renamed from sh.nenv). This only needs to be done once as ksh cannot use or change these variables. src/cmd/ksh93/sh/name.c: - sh_envgen(): Update to match env_init() change above. - pushnam() (called by sh_envgen()): Remove NV_IMPORT attribute check as per above and never get the value from the nvenv pointer -- simply always use nv_getval(). As of this change, the NV_IMPORT attribute is unused. The next commit will remove it and do related cleanups. - staknam(): is only called if value!=NULL, so remove that 'if'. - sh_envnolocal(): Removed. src/cmd/ksh93/sh/nvdisc.c: - assign(): Remove a check for the SH_INIT state bit that avoids freeing functions during sh_reinit(). This works fine now. - sh_addbuiltin(): Allow sh_reinit() to delete special builtins by checking for the SH_INIT state bit before throwing an error. src/cmd/ksh93/sh/nvtree.c: - outval(): Add a workaround for a use-after-free, introduced by the changes above, that occurred in the types.sh tests for #!-less scripts (types.sh:675-722). The use-after-free occurred here (abridged ASan trace follows; line numbers are current as of this commit): ==30849==ERROR: AddressSanitizer: heap-use-after-free [...] #0 in dttree dttree.c:393 ksh-community#1 in sh_reinit init.c:1637 ksh-community#2 in sh_main main.c:136 [...] The pointer was freed in the same loop via nv_delete() in outval: #0 in wrap_free+0x98 (libclang_rt.asan_osx_dynamic.dylib:[...]) ksh-community#1 in nv_delete name.c:1318 ksh-community#2 in outval nvtree.c:731 ksh-community#3 in genvalue nvtree.c:905 ksh-community#4 in walk_tree nvtree.c:1042 ksh-community#5 in put_tree nvtree.c:1108 ksh-community#6 in nv_putv nvdisc.c:144 ksh-community#7 in _nv_unset name.c:2437 ksh-community#8 in sh_reinit init.c:1645 ksh-community#9 in sh_main main.c:136 [...] So, what happened was that the nv_delete() call on name.c:1318 (eventually resulting from the _nv_unset call on init.c:1645) freed the node pointed to by np, so that the next loop iteration crashed on line 1637 as the dtnext() macro now gets a freed np. Now, why on earth should _nv_unset() *ever* indirectly call nv_delete()? That's a question for another day; I suspect it may be a bug, or it may be needed for compound variables for some reason. For now, I'm adding a workaround: simply avoid calling nv_delete() if the SH_INIT state bit is on, indicating sh_reinit() is in the call stack. This allows the variables unset loop in sh_reinit() to continue without crashing. sh_reinit() handles deletion later anyway. src/cmd/ksh93/sh/main.c: - sh_main(): remove zeroing of sh.fun_depth and sh.dot_depth; these are known to be 0, coming from either sh_init() or sh_reinit(). ________ [*1] This NV_IMPORT/nvenv usage is a redundant holdout from ancient ksh code; the imported value is easily available as a normal shell variable value via nv_getval(). Plus, the nvenv pointer is overloaded with too many other purposes: so far I've discovered it's used for pointers to subarrays of arrays (multidimentional arrays), compound variables, builtins, and other things. This mess caused at least one crash in set_instance() (xec.c) due to incorrectly using that nvenv pointer. The current kshrc script triggers this. Reproducer: $ export PS1 $ bin/package use «0»26:…/src/ksh93/ksh[dev] $ typeset +x PS1 ...and crash. That is now fixed. [*2] https://pubs.opengroup.org/onlinepubs/9699919799/functions/environ.html
The referenced commit left one test unexecuted because it crashes. Minimal reproducer: typeset -a arr=((a b c) 1) got=$( typeset -a arr=( ( ((a b c)1))) ) The crash occurs when the array is redefined in a subshell. Here are abridged ASan stack traces for the crash, for the use after free, and for when it was freed: ================================================================= ==73147==ERROR: AddressSanitizer: heap-use-after-free [snippage] READ of size 8 at 0x000107403eb0 thread T0 #0 0x104fded40 in nv_search nvdisc.c:1007 ksh-community#1 0x104fbeb1c in nv_create name.c:860 ksh-community#2 0x104fb8b9c in nv_open name.c:1440 ksh-community#3 0x104fb1edc in nv_setlist name.c:309 ksh-community#4 0x104fb4a30 in nv_setlist name.c:475 ksh-community#5 0x105055b58 in sh_exec xec.c:1079 ksh-community#6 0x105045cd4 in sh_subshell subshell.c:654 ksh-community#7 0x104f92c1c in comsubst macro.c:2266 [snippage] 0x000107403eb0 is located 0 bytes inside of 80-byte region [snippage] freed by thread T0 here: #0 0x105c5ade4 in wrap_free+0x98 (libclang_rt.asan_osx_dynamic.dylib:arm64e+0x3ede4) ksh-community#1 0x105261da0 in dtclose dtclose.c:52 ksh-community#2 0x104f178cc in array_putval array.c:671 ksh-community#3 0x104fd7f4c in nv_putv nvdisc.c:144 ksh-community#4 0x104fbc5f0 in _nv_unset name.c:2435 ksh-community#5 0x104fb3250 in nv_setlist name.c:364 ksh-community#6 0x105055b58 in sh_exec xec.c:1079 ksh-community#7 0x105045cd4 in sh_subshell subshell.c:654 ksh-community#8 0x104f92c1c in comsubst macro.c:2266 [snippage] So the crash is caused because array_putval (array.c:671) calls dtclose, freeing ap->table, which is then reused after a recursive nv_setlist call via nv_open() -> nv_create() -> nv_search(). This only happens whwn we're in a virtual subshell. src/cmd/ksh93/sh/array.c: - array_putval(): When redefining an array in a virtual subshell, do not free the old ap->table; it will be needed by the parent shell environment.
I didn't trust this back in e3d7bf1 (which disabled it for interactive shells) and I trust it less now. In af6a32d/6b380572, this was also disabled for virtual subshells as it caused program flow corruption there. Now, on macOS 10.14.6, a crash occurs when repeatedly running a command with this optimisation: $ ksh -c 'for((i=0;i<100;i++));do print -n "$i ";(sleep 1&);done' 0 1 2 3 4 5 6 7 Illegal instruction Oddly enough it seems that I can only reproduce this crash on macOS -- not on Linux, OpenBSD, or Solaris. It could be a macOS bug, particularly given the odd message in the stack trace below. I've had enough, though. Out it comes. Things now work fine, the reproducer is fixed on macOS, and it didn't optimise much anyway. The double-fork issue discussed in e3d7bf1 remains. ________ For future reference, here's an lldb debugger session with a stack trace. It crashes on calling calloc() (via sh_calloc(), via sh_newof()) in jobsave_create(). This is not an invalid pointer problem as we're allocating new memory, so it does look like an OS bug. The "BUG IN CLIENT OF LIBPLATFORM" message is interesting. $ lldb -- arch/*/bin/ksh -c 'for((i=0;i<100;i++));do print -n "$i ";(sleep 1&);done' (lldb) target create "arch/darwin.i386-64/bin/ksh" Current executable set to 'arch/darwin.i386-64/bin/ksh' (x86_64). (lldb) settings set -- target.run-args "-c" "for((i=0;i<100;i++));do print -n \"$i \";(sleep 1&);done" (lldb) run error: shell expansion failed (reason: lldb-argdumper exited with error 2). consider launching with 'process launch'. (lldb) process launch Process 35038 launched: '/usr/local/src/ksh93/ksh/arch/darwin.i386-64/bin/ksh' (x86_64) 0 1 2 3 4 5 6 7 8 9 Process 35038 stopped * thread ksh-community#1, queue = 'com.apple.main-thread', stop reason = EXC_BAD_INSTRUCTION (code=EXC_I386_INVOP, subcode=0x0) frame #0: 0x00007fff70deb1c2 libsystem_platform.dylib`_os_unfair_lock_recursive_abort + 23 libsystem_platform.dylib`_os_unfair_lock_recursive_abort: -> 0x7fff70deb1c2 <+23>: ud2 libsystem_platform.dylib`_os_unfair_lock_unowned_abort: 0x7fff70deb1c4 <+0>: movl %edi, %eax 0x7fff70deb1c6 <+2>: leaq 0x1a8a(%rip), %rcx ; "BUG IN CLIENT OF LIBPLATFORM: Unlock of an os_unfair_lock not owned by current thread" 0x7fff70deb1cd <+9>: movq %rcx, 0x361cb16c(%rip) ; gCRAnnotations + 8 Target 0: (ksh) stopped. (lldb) bt * thread ksh-community#1, queue = 'com.apple.main-thread', stop reason = EXC_BAD_INSTRUCTION (code=EXC_I386_INVOP, subcode=0x0) * frame #0: 0x00007fff70deb1c2 libsystem_platform.dylib`_os_unfair_lock_recursive_abort + 23 frame ksh-community#1: 0x00007fff70de7c9a libsystem_platform.dylib`_os_unfair_lock_lock_slow + 239 frame ksh-community#2: 0x00007fff70daa3bd libsystem_malloc.dylib`tiny_malloc_should_clear + 188 frame ksh-community#3: 0x00007fff70daa20f libsystem_malloc.dylib`szone_malloc_should_clear + 66 frame ksh-community#4: 0x00007fff70dab444 libsystem_malloc.dylib`malloc_zone_calloc + 99 frame ksh-community#5: 0x00007fff70dab3c4 libsystem_malloc.dylib`calloc + 30 frame ksh-community#6: 0x000000010003fa5d ksh`sh_calloc(nmemb=1, size=16) at init.c:264:13 frame ksh-community#7: 0x000000010004f8a6 ksh`jobsave_create(pid=35055) at jobs.c:272:8 frame ksh-community#8: 0x000000010004ed42 ksh`job_reap(sig=20) at jobs.c:363:9 frame ksh-community#9: 0x000000010004ff6f ksh`job_waitsafe(sig=20) at jobs.c:511:3 frame ksh-community#10: 0x00007fff70de9b5d libsystem_platform.dylib`_sigtramp + 29 frame ksh-community#11: 0x00007fff70d39ac4 libsystem_kernel.dylib`__fork + 12 frame ksh-community#12: 0x00007fff70c57d80 libsystem_c.dylib`fork + 17 frame ksh-community#13: 0x000000010009590d ksh`sh_exec(t=0x0000000101005d30, flags=4) at xec.c:1883:16 frame ksh-community#14: 0x0000000100096013 ksh`sh_exec(t=0x0000000101005d30, flags=4) at xec.c:2019:4 frame ksh-community#15: 0x0000000100096c4f ksh`sh_exec(t=0x0000000101005a40, flags=5) at xec.c:2213:9 frame ksh-community#16: 0x0000000100096013 ksh`sh_exec(t=0x0000000101005a40, flags=5) at xec.c:2019:4 frame ksh-community#17: 0x000000010001c23f ksh`exfile(iop=0x0000000100405750, fno=-1) at main.c:603:4 frame ksh-community#18: 0x000000010001b23c ksh`sh_main(ac=3, av=0x00007ffeefbff4f0, userinit=0x0000000000000000) at main.c:365:2 frame ksh-community#19: 0x0000000100000776 ksh`main(argc=3, argv=0x00007ffeefbff4f0) at pmain.c:45:9 frame ksh-community#20: 0x00007fff70bfe3d5 libdyld.dylib`start + 1
The ASan crash in basic.sh when sourcing multiple files is caused by a bug that is similar to the crash fixed in f24040e. This is the trace for the regression test crash (note that in order to see the trace, the 2>/dev/null redirect must be disabled): ==1899388==ERROR: AddressSanitizer: heap-use-after-free on address 0x6150000005b0 at pc 0x55a5e3f9432a bp 0x7ffeb91ea110 sp 0x7ffeb91ea100 WRITE of size 8 at 0x6150000005b0 thread T0 #0 0x55a5e3f94329 in funct /home/johno/GitRepos/KornShell/ksh/src/cmd/ksh93/sh/parse.c:967 ksh-community#1 0x55a5e3f96f77 in item /home/johno/GitRepos/KornShell/ksh/src/cmd/ksh93/sh/parse.c:1349 ksh-community#2 0x55a5e3f90c9f in term /home/johno/GitRepos/KornShell/ksh/src/cmd/ksh93/sh/parse.c:642 ksh-community#3 0x55a5e3f90ac1 in list /home/johno/GitRepos/KornShell/ksh/src/cmd/ksh93/sh/parse.c:613 ksh-community#4 0x55a5e3f90845 in sh_cmd /home/johno/GitRepos/KornShell/ksh/src/cmd/ksh93/sh/parse.c:561 ksh-community#5 0x55a5e3f909e0 in sh_cmd /home/johno/GitRepos/KornShell/ksh/src/cmd/ksh93/sh/parse.c:586 ksh-community#6 0x55a5e3f8fd5e in sh_parse /home/johno/GitRepos/KornShell/ksh/src/cmd/ksh93/sh/parse.c:438 ksh-community#7 0x55a5e3fc43c1 in sh_eval /home/johno/GitRepos/KornShell/ksh/src/cmd/ksh93/sh/xec.c:635 ksh-community#8 0x55a5e4012172 in b_dot_cmd /home/johno/GitRepos/KornShell/ksh/src/cmd/ksh93/bltins/misc.c:318 ksh-community#9 0x55a5e3fca3cb in sh_exec /home/johno/GitRepos/KornShell/ksh/src/cmd/ksh93/sh/xec.c:1254 ksh-community#10 0x55a5e3fd01d4 in sh_exec /home/johno/GitRepos/KornShell/ksh/src/cmd/ksh93/sh/xec.c:1932 ksh-community#11 0x55a5e3fc4544 in sh_eval /home/johno/GitRepos/KornShell/ksh/src/cmd/ksh93/sh/xec.c:651 ksh-community#12 0x55a5e4012172 in b_dot_cmd /home/johno/GitRepos/KornShell/ksh/src/cmd/ksh93/bltins/misc.c:318 ksh-community#13 0x55a5e3fca3cb in sh_exec /home/johno/GitRepos/KornShell/ksh/src/cmd/ksh93/sh/xec.c:1254 ksh-community#14 0x55a5e3ecc1cd in exfile /home/johno/GitRepos/KornShell/ksh/src/cmd/ksh93/sh/main.c:604 ksh-community#15 0x55a5e3ec9e7f in sh_main /home/johno/GitRepos/KornShell/ksh/src/cmd/ksh93/sh/main.c:369 ksh-community#16 0x55a5e3ec801d in main /home/johno/GitRepos/KornShell/ksh/src/cmd/ksh93/sh/pmain.c:41 ksh-community#17 0x7f637b4db2cf (/usr/lib/libc.so.6+0x232cf) ksh-community#18 0x7f637b4db389 in __libc_start_main (/usr/lib/libc.so.6+0x23389) ksh-community#19 0x55a5e3ec7f24 in _start ../sysdeps/x86_64/start.S:115 Code in question: https://github.com/ksh93/ksh/blob/8d57369b0cb39074437dd82924b604155e30e1e0/src/cmd/ksh93/sh/parse.c#L963-L968 To avoid any more similar crashes, all of the fixes introduced in 69d37d5 that set slp->slptr to null have been improved with the fix in f24040e.
When ksh executes a script without a #! path (note that the AT&T team had a real disliking for #! paths), ksh forks and goes through a quick reinitialisation procedure. This is much faster than invoking a fully new shell but should have the same effect if it all works well. Unfortunately it's not worked all that well so far. Even after recent improvements (see referenced commits) I've been finding corner case problems. FYI, running a script without #! basically goes like this: * in path_spawn(), execve() fails with ENOEXEC because the file is not a binary executable and does not start with #! * this triggers 'case ENOEXEC:' which: * forks ksh * calls exscript() * exscript() cleans up & calls siglongjmp(*sh.jmplist,SH_JMPSCRIPT) * SH_JMPSCRIPT is the highest longjmp value, so *all* the previous sigsetjmp/sh_pushcontext calls are unwinded in reverse order, triggering all sorts of cleanup, state restoration, removal of local scopes, etc. * eventually, this lands us at the top sigsetjmp in sh_main() * sh_main() calls sh_reinit(), then resumes as if the shell had just been started This commit makes the following interrelated changes for the correct functioning of this procedure: 1. exscript() now exports the environment into a dedicated Stk_t buffer and sets environ[] to that. 2. Instead of re-using existing variables, sh_reinit() deletes everything and reinits all name-value trees from scratch, then re-imports the environment from environ[]. 3. Variable values that were imported from the environment are no longer treated specially with an NV_IMPORT attribute and the np->nvenv pointer to their value in environ[], fixing at least one crash.[*1] Details of the changes follow: src/cmd/ksh93/sh/path.c: - exscript(): Generate a new environ[] by activating a dedicated AST stack that will not be overwritten before calling sh_envgen(). This will allow sh_reinit() to delete all variables and then reimport the environment. The exporting must be done here, before siglongjmp, otherwise locally scoped exported variables won't be included (siglongjmp with SH_JMPSCRIPT triggers cleanup of all scopes). src/cmd/ksh93/sh/init.c: - sh_reinit(): Largely rewritten as follows. - Reset shell options first. This has the beneficial side effect of unsetting SH_RESTRICTED which interferes with unsetting certain variables, like PATH. - Remove workarounds for FPATH, SHLVL and tilde expansion disciplines; these will not be needed now. - Properly unset and delete all functions and built-ins. Since we now unset a function before deleting it, this should now free up their memory. (See nvdisc.c below for a change allowing removal of special built-ins.) - Properly unset all variables (which includes any associated discipline functions). Incorporate here the needed logic from sh_envnolocal() in name.c; most of it is unneeded (that function was previously used to cleanup local variables but has not been used for that for decades). So sh_envnolocal() is now unused. - Delete variables in a separate pass after unsetting variables and unsetting and deleting functions; this avoids use-after- free problems as well as possible "no parent" problems with namespace variables (e.g., .rc.status in our new kshrc.sh). - After all that, close and free up all function, alias, tracked alias, type and variable trees. - Free the contiguous built-in node space and the Init_t init context (with all the special variable discipline pointers). - Call nv_init (previously only called from sh_init) to reinitialise all of the above name-value stuff from scratch. It's the only way to be sure. - Re-import the environment as stored by exscript() above. - env_init(): - Per item 3 above and footnote 1 below, no longer set NV_IMPORT attribute and no longer point np->nvenv to the item in environ. - POSIX says, for 'environ': "Any application that directly modifies the pointers to which the environ variable points has undefined behavior."[*2] Yet, env_init() is indeed juggling the environ[] pointers to deal with variables that cannot be imported because their names are invalid (because they still need to be saved to be passed on to child processes). Replace the current approach with one where those env vars get allocated on the heap, pointed to by sh.save_env and counted by sh.save_env_n (renamed from sh.nenv). This only needs to be done once as ksh cannot use or change these variables. src/cmd/ksh93/sh/name.c: - sh_envgen(): Update to match env_init() change above. - pushnam() (called by sh_envgen()): Remove NV_IMPORT attribute check as per above and never get the value from the nvenv pointer -- simply always use nv_getval(). As of this change, the NV_IMPORT attribute is unused. The next commit will remove it and do related cleanups. - staknam(): is only called if value!=NULL, so remove that 'if'. - sh_envnolocal(): Removed. src/cmd/ksh93/sh/nvdisc.c: - assign(): Remove a check for the SH_INIT state bit that avoids freeing functions during sh_reinit(). This works fine now. - sh_addbuiltin(): Allow sh_reinit() to delete special builtins by checking for the SH_INIT state bit before throwing an error. src/cmd/ksh93/sh/nvtree.c: - outval(): Add a workaround for a use-after-free, introduced by the changes above, that occurred in the types.sh tests for #!-less scripts (types.sh:675-722). The use-after-free occurred here (abridged ASan trace follows; line numbers are current as of this commit): ==30849==ERROR: AddressSanitizer: heap-use-after-free [...] #0 in dttree dttree.c:393 ksh-community#1 in sh_reinit init.c:1637 ksh-community#2 in sh_main main.c:136 [...] The pointer was freed in the same loop via nv_delete() in outval: #0 in wrap_free+0x98 (libclang_rt.asan_osx_dynamic.dylib:[...]) ksh-community#1 in nv_delete name.c:1318 ksh-community#2 in outval nvtree.c:731 ksh-community#3 in genvalue nvtree.c:905 ksh-community#4 in walk_tree nvtree.c:1042 ksh-community#5 in put_tree nvtree.c:1108 ksh-community#6 in nv_putv nvdisc.c:144 ksh-community#7 in _nv_unset name.c:2437 ksh-community#8 in sh_reinit init.c:1645 ksh-community#9 in sh_main main.c:136 [...] So, what happened was that the nv_delete() call on name.c:1318 (eventually resulting from the _nv_unset call on init.c:1645) freed the node pointed to by np, so that the next loop iteration crashed on line 1637 as the dtnext() macro now gets a freed np. Now, why on earth should _nv_unset() *ever* indirectly call nv_delete()? That's a question for another day; I suspect it may be a bug, or it may be needed for compound variables for some reason. For now, I'm adding a workaround: simply avoid calling nv_delete() if the SH_INIT state bit is on, indicating sh_reinit() is in the call stack. This allows the variables unset loop in sh_reinit() to continue without crashing. sh_reinit() handles deletion later anyway. src/cmd/ksh93/sh/main.c: - sh_main(): remove zeroing of sh.fun_depth and sh.dot_depth; these are known to be 0, coming from either sh_init() or sh_reinit(). ________ [*1] This NV_IMPORT/nvenv usage is a redundant holdout from ancient ksh code; the imported value is easily available as a normal shell variable value via nv_getval(). Plus, the nvenv pointer is overloaded with too many other purposes: so far I've discovered it's used for pointers to subarrays of arrays (multidimentional arrays), compound variables, builtins, and other things. This mess caused at least one crash in set_instance() (xec.c) due to incorrectly using that nvenv pointer. The current kshrc script triggers this. Reproducer: $ export PS1 $ bin/package use «0»26:…/src/ksh93/ksh[dev] $ typeset +x PS1 ...and crash. That is now fixed. [*2] https://pubs.opengroup.org/onlinepubs/9699919799/functions/environ.html
The referenced commit left one test unexecuted because it crashes. Minimal reproducer: typeset -a arr=((a b c) 1) got=$( typeset -a arr=( ( ((a b c)1))) ) The crash occurs when the array is redefined in a subshell. Here are abridged ASan stack traces for the crash, for the use after free, and for when it was freed: ================================================================= ==73147==ERROR: AddressSanitizer: heap-use-after-free [snippage] READ of size 8 at 0x000107403eb0 thread T0 #0 0x104fded40 in nv_search nvdisc.c:1007 ksh-community#1 0x104fbeb1c in nv_create name.c:860 ksh-community#2 0x104fb8b9c in nv_open name.c:1440 ksh-community#3 0x104fb1edc in nv_setlist name.c:309 ksh-community#4 0x104fb4a30 in nv_setlist name.c:475 ksh-community#5 0x105055b58 in sh_exec xec.c:1079 ksh-community#6 0x105045cd4 in sh_subshell subshell.c:654 ksh-community#7 0x104f92c1c in comsubst macro.c:2266 [snippage] 0x000107403eb0 is located 0 bytes inside of 80-byte region [snippage] freed by thread T0 here: #0 0x105c5ade4 in wrap_free+0x98 (libclang_rt.asan_osx_dynamic.dylib:arm64e+0x3ede4) ksh-community#1 0x105261da0 in dtclose dtclose.c:52 ksh-community#2 0x104f178cc in array_putval array.c:671 ksh-community#3 0x104fd7f4c in nv_putv nvdisc.c:144 ksh-community#4 0x104fbc5f0 in _nv_unset name.c:2435 ksh-community#5 0x104fb3250 in nv_setlist name.c:364 ksh-community#6 0x105055b58 in sh_exec xec.c:1079 ksh-community#7 0x105045cd4 in sh_subshell subshell.c:654 ksh-community#8 0x104f92c1c in comsubst macro.c:2266 [snippage] So the crash is caused because array_putval (array.c:671) calls dtclose, freeing ap->table, which is then reused after a recursive nv_setlist call via nv_open() -> nv_create() -> nv_search(). This only happens whwn we're in a virtual subshell. src/cmd/ksh93/sh/array.c: - array_putval(): When redefining an array in a virtual subshell, do not free the old ap->table; it will be needed by the parent shell environment.
I accidentally noted the following behaviour:
so
%f
format seems partly broken...The text was updated successfully, but these errors were encountered: