Is there SIMD support? #411

ntysdd · 2022-08-08T16:30:25Z

For example, use SIMD intrinsics explicity, or use long long to process 8 bytes together?

skvadrik · 2022-08-12T14:21:22Z

Not yet. I had some ideas about it, but it's not in the works yet.

ntysdd · 2022-08-15T00:18:55Z

Thanks, that's good to know.

skvadrik · 2022-09-18T07:39:12Z

A note about alignment: since we cannot guarantee input data alignment, it would be impossible to use multi-byte reads (not without explicit guarantee from the user that the input is aligned). However, it should still be possible to combine multiple bytes into a 2/4/8-byte value and do the switch on this combined value rather than do 2/4/8 consequent switch statements. This optimization is not straightforward, as it the underlying DFA may not have many linear segments that can be combined this way (due to the grammar, or due to the possible end-of-input after each byte). This needs some study and experiments.

pmetzger · 2022-09-22T14:37:14Z

I wonder if explicitly open coding the SIMD compiler intrinsics beats trying to emit constructs that the compiler can SIMD itself.

skvadrik · 2022-09-22T21:02:49Z

I don't think compiler can perform such optimization, as they require a bit of high-level insight.

Consider this simple regular grammar:

bool lex(const char* YYCURSOR) {
    const char* YYMARKER;
/*!re2c
    re2c:yyfill:enable = 0;
    re2c:define:YYCTYPE = char;

    "abcd" { return true; }
    *      { return false; }
*/
}

It has just two rules: either a string "abcd" or anything else. It would be easy for a human to read four bytes (as four 1-byte reads with arbitrary alignment) then combine into a single 4-byte value and compare it against the numeric value of abcd. Of course, it would be necessary to ensure that there are 4 byte in the buffer, so this optimization would only work with padding-based end-of-input checks.

Currently re2c generates the following "branchy" code (re2c 1.re -is -o 1.c, where -s is for compactness only and does not affect the reasoning):

bool lex(const char* YYCURSOR) {
    const char* YYMARKER;
{
	char yych;
	yych = *YYCURSOR;
	if (yych == 'a') goto yy2;
	++YYCURSOR;
yy1:
	{ return false; }
yy2:
	yych = *(YYMARKER = ++YYCURSOR);
	if (yych != 'b') goto yy1;
	yych = *++YYCURSOR;
	if (yych == 'c') goto yy4;
yy3:
	YYCURSOR = YYMARKER;
	goto yy1;
yy4:
	yych = *++YYCURSOR;
	if (yych != 'd') goto yy3;
	++YYCURSOR;
	{ return true; }
}
}

Which is compiled to very similar "branchy" assembly (g++ -O2 -c 1.c -o 1.o && objdump -d 1.o):

0000000000000000 <_Z3lexPKc>:
   0:	31 c0                	xor    %eax,%eax
   2:	80 3f 61             	cmpb   $0x61,(%rdi)
   5:	74 09                	je     10 <_Z3lexPKc+0x10>
   7:	c3                    	ret
   8:	0f 1f 84 00 00 00 00 	nopl   0x0(%rax,%rax,1)
   f:	00
  10:	80 7f 01 62          	cmpb   $0x62,0x1(%rdi)
  14:	75 f1                	jne    7 <_Z3lexPKc+0x7>
  16:	80 7f 02 63          	cmpb   $0x63,0x2(%rdi)
  1a:	75 eb                	jne    7 <_Z3lexPKc+0x7>
  1c:	80 7f 03 64          	cmpb   $0x64,0x3(%rdi)
  20:	0f 94 c0             	sete   %al
  23:	c3                        	ret

Both GCC and Clang with -O2 generate almost identical code. And they cannot reorder the branches with the reads: this kind of optimization is too unsafe for a compiler to perform (at least in my limited understanding of C++ compilers).

pmetzger · 2022-09-28T14:30:13Z

Precisely. This is why I suggest that using the compiler intrinsics is probably the correct path. clang, gcc, etc. support mostly the same set.

skvadrik · 2022-09-30T06:16:03Z

@pmetzger What intrinsics specifically do you mean? I don't see how an intrinsic can restructure the program and squash the four check-and-branch pieces in the example into one.

pmetzger · 2022-10-01T00:56:34Z

You can play games like the one you're proposing with the use of intrinsics. They're gross and have limited portabilty, but the end user could specify whether they wanted the use of intrinsics or not.

Intrinsics also let you call SIMD instructions directly from generated C code. gcc and clang both support a wide variety of intrinsics. Here, for example, is some explanation of the vector extensions both support.

https://gcc.gnu.org/onlinedocs/gcc-12.2.0/gcc/Vector-Extensions.html

https://clang.llvm.org/docs/LanguageExtensions.html#vectors-and-extended-vectors

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Is there SIMD support? #411

Is there SIMD support? #411

ntysdd commented Aug 8, 2022

skvadrik commented Aug 12, 2022

ntysdd commented Aug 15, 2022

skvadrik commented Sep 18, 2022

pmetzger commented Sep 22, 2022

skvadrik commented Sep 22, 2022

pmetzger commented Sep 28, 2022

skvadrik commented Sep 30, 2022

pmetzger commented Oct 1, 2022

Is there SIMD support? #411

Is there SIMD support? #411

Comments

ntysdd commented Aug 8, 2022

skvadrik commented Aug 12, 2022

ntysdd commented Aug 15, 2022

skvadrik commented Sep 18, 2022

pmetzger commented Sep 22, 2022

skvadrik commented Sep 22, 2022

pmetzger commented Sep 28, 2022

skvadrik commented Sep 30, 2022

pmetzger commented Oct 1, 2022