utf8n_to_uvchr_msgs: Widen declaration to U16. #22663

khwilliamson · 2024-10-13T19:18:56Z

Testing on an EBCDIC box showed that this was too narrow.

This set of changes does not require a perldelta entry.

Testing on an EBCDIC box showed that this was too narrow.

bulk88 · 2024-10-14T22:23:16Z

Could you add an #ifdef specific for EBCDIC builds between U8 and U16 math, or a "max_IV_fnd_in_array" in /mktables or /unicore or /regen.pl, whatever .pl generates array, EXTCONST U8 PL_strict_utf8_dfa_tab[] = {}; ?

U8 math is often faster than U16 math on many CPUs archs, or all U16 math is silently upgraded to U32 math by CCs since U16 isn't a native type on many CPUs or most integer and bitwise ops are missing in U16 variants, and the CC always zero/sign extend to U32, then cast back down to U16 before storing to C stack/malloc.

If this (#ifdef U8 U16) is an easy fix do it, if it requires a page/screenful or pages/many screenfuls of code to implement, forget and stick to your current patch as written.

khwilliamson · 2024-10-17T22:37:08Z

The goal is to have as few EBCDIC compilation dependencies as possible. Excluding machine-generated files, there are now fewer than 60 in the core, more than a third of them are in toke.c. I continue to try to find ways to remove them when I encounter them. So I'm sceptical of adding new ones. But I would if I thought it would improve ASCII platform performance. But I don't believe it would do so here.

First, the declaration doesn't ask for a 16bit variable; it asks for a fast variable which is at least 16 bits wide. I looked at gotbot and I don't see extra instructions generated as a result of this not being 8bits. I tried this program

#include <stdint.h>
int_fast8_t a;
unsigned char b[1000];
unsigned char c = b[a];

on several compilers. Here is what it generated with MSVC

signed char a DB 01H DUP (?) ; a
ALIGN 4
unsigned char * b DB 03e8H DUP (?) ; b
unsigned char c DB 01H DUP (?) ; c
void dynamic initializer for 'c''(void) PROC ; dynamic initializer for 'c'', COMDAT
movsx rax, BYTE PTR signed char a ; a
lea rcx, OFFSET FLAT:unsigned char * b ; b
movzx eax, BYTE PTR [rcx+rax]
mov BYTE PTR unsigned char c, al ; c
ret 0
void `dynamic initializer for 'c''(void) ENDP

Changing the declaration for a to int_fast16_t yields

int a DD 01H DUP (?) ; a
unsigned char * b DB 03e8H DUP (?) ; b
unsigned char c DB 01H DUP (?) ; c
void dynamic initializer for 'c''(void) PROC ; dynamic initializer for 'c'', COMDAT
movsxd rax, DWORD PTR int a ; a
lea rcx, OFFSET FLAT:unsigned char * b ; b
movzx eax, BYTE PTR [rcx+rax]
mov BYTE PTR unsigned char c, al ; c
ret 0
void `dynamic initializer for 'c''(void) ENDP

bulk88 · 2024-10-18T04:11:59Z

Ignore my 1st comment. "PL_strict_utf8_dfa_tab" is already decl as U16 in my perl, and contains numbers like 322 in that array. I was wrong.

edit: i was looking at wrong src code, its a U8 in my perl binary. Yeah MSVC will always upgrade U16 math to U32 while its in a register. Its only when its written back to memory, if ever, it has to do a explicity truncate of high 2 bytes.,

bulk88 · 2024-10-18T04:32:21Z

Okay, lets do this scientifically with god bolt since im typing too fast and not thinking enough.

test done with godbolt "x86 msvc v19.latest"

Char_t math is 0x19 bytes long. Short_t math is 0x1a bytes long. Char_t math wins. I16/U16 is half-native to x86/x64. I8/U8 and I32/U32 are native to x86/x64. I64/U64 are half-native to x64!!!!!!

All U16 math requires 0x66 operand override prefix to go back to 16-bit computing.

All U64 math on x64 AMD/Intel requires 0x44 or 0x48 operand prefix override. AMD made a stupid decision back in 1998-1999, that existing 32 bit .o files, will execute WITHOUT CHANGES in a 64 bit process. AMD only made the call/return operators, native accept I64/U64 operands. Everything else stayed binary compatible with old 32 bit x86 code. AMD never thought anyone would use 64 bit ints for counters, or indexes. AMD thought people only wanted 64 bit pointers, not 64 bit arithmitic. So basically U32 math is always better/smaller than U64 math on AMD x64.

USING SHORT MATH

/FAc /O2

#include <stdio.h>

void f1(char * pch) {
    short s1 = *pch;
    s1 = s1*2;
    printf("%u", s1);
}

unsigned __int64 `__local_stdio_printf_options'::`2'::_OptionsStorage DQ 01H DUP (?) ; `__local_stdio_printf_options'::`2'::_OptionsStorage
`string' DB '%u', 00H                    ; `string'

_pch$ = 8                                         ; size = 4
void f1(char *) PROC                                  ; f1, COMDAT
  00000 8b 44 24 04        mov   eax, DWORD PTR _pch$[esp-4]
  00004 66 0f be 00        movsx         ax, BYTE PTR [eax]
  00008 66 03 c0   add         ax, ax
  0000b 98                 cwde
  0000c 50                 push    eax
  0000d 68 00 00 00 00     push    OFFSET `string'
  00012 e8 00 00 00 00     call    _printf
  00017 83 c4 08   add         esp, 8
  0001a c3                 ret     0
void f1(char *) ENDP                                  ; f1

NOW WITH CHAR MATH

/FAc /O2

#include <stdio.h>

void f1(char * pch) {
    char s1 = *pch;
    s1 = s1*2;
    printf("%u", s1);
}

unsigned __int64 `__local_stdio_printf_options'::`2'::_OptionsStorage DQ 01H DUP (?) ; `__local_stdio_printf_options'::`2'::_OptionsStorage
`string' DB '%u', 00H                    ; `string'

_pch$ = 8                                         ; size = 4
void f1(char *) PROC                                  ; f1, COMDAT
  00000 8b 44 24 04        mov   eax, DWORD PTR _pch$[esp-4]
  00004 8a 00        mov         al, BYTE PTR [eax]
  00006 02 c0        add         al, al
  00008 0f be c0   movsx       eax, al
  0000b 50                 push    eax
  0000c 68 00 00 00 00     push    OFFSET `string'
  00011 e8 00 00 00 00     call    _printf
  00016 83 c4 08   add         esp, 8
  00019 c3                 ret     0
void f1(char *) ENDP                                  ; f1

khwilliamson · 2024-10-21T22:18:51Z

We do not have enough people working on this project to enable us the luxury of sweating details.

I disagree that the savings of a single byte constitute a "win". It is instead a "wash".

And, again, 16 bit arithmetic is not relevant to this issue. The compiler has been directed to choose a suitably fast method that is larger than a character. Whether that is 32 or 64 (not likely to be anything else) is up to the compiler's discretion.

khwilliamson · 2024-10-31T18:19:38Z

I compiled utf8.c with and without this patch and, the patch actually used less space than without. The total size with all the inlined occurrences decreased a miniscule 70 bytes out of 500K

utf8n_to_uvchr_msgs: Widen declaration to U16.

5483f9e

Testing on an EBCDIC box showed that this was too narrow.

khwilliamson merged commit 4acc9fb into Perl:blead Oct 31, 2024
34 checks passed

khwilliamson deleted the fast16 branch October 31, 2024 18:19

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

utf8n_to_uvchr_msgs: Widen declaration to U16. #22663

utf8n_to_uvchr_msgs: Widen declaration to U16. #22663

khwilliamson commented Oct 13, 2024

bulk88 commented Oct 14, 2024

khwilliamson commented Oct 17, 2024

bulk88 commented Oct 18, 2024 •

edited

Loading

bulk88 commented Oct 18, 2024 •

edited

Loading

khwilliamson commented Oct 21, 2024

khwilliamson commented Oct 31, 2024

utf8n_to_uvchr_msgs: Widen declaration to U16. #22663

utf8n_to_uvchr_msgs: Widen declaration to U16. #22663

Conversation

khwilliamson commented Oct 13, 2024

bulk88 commented Oct 14, 2024

khwilliamson commented Oct 17, 2024

bulk88 commented Oct 18, 2024 • edited Loading

bulk88 commented Oct 18, 2024 • edited Loading

khwilliamson commented Oct 21, 2024

khwilliamson commented Oct 31, 2024

bulk88 commented Oct 18, 2024 •

edited

Loading

bulk88 commented Oct 18, 2024 •

edited

Loading