Special JIT support for FFI #14491

dstogov · 2024-06-06T21:46:01Z

No description provided.

dstogov · 2024-06-06T22:16:03Z

@iluuu1994 @nielsdos I would appreciate it if you could take a quick look at this when you have time. If this is interesting to you, please share your ideas.

This is still a very early PoC. It aims to generate optimized native code instead of generic calls to FFI callbacks. There are many unresolved questions:

- JIT should generate FFI type guards (check that the FFI\CData type is the same as during trace recording and compilation)
  - most FFI types are not persistent. I didn't find an efficient way to implement FFI\CData type guards yet.
- FFI array bounds checks are not implemented
- It would be great to store pointers to FFI\CData in CPU registers and "unbox" the temporary FFI\CData objects
- Guards, bounds checks and CData pointer loads should be moved out of the loops
- Access to FFI structure and union fields is not implemented
- Access to FFI variables
- Native code to call FFI functions
- Native wrappers for FFI callbacks
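
To picture the first and third items, here is a rough sketch in C of the checks a compiled trace would need in front of the raw element load. It is not code from this PR: `ctype_t`, `cdata_t` and `load_int_guarded` are hypothetical, heavily simplified stand-ins for the FFI internals, and a `false` return stands for a side exit back to the interpreter.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified stand-ins for the FFI internals. */
typedef struct {
    size_t length;        /* number of elements in the array type */
} ctype_t;

typedef struct {
    const ctype_t *type;  /* type observed at run time */
    int32_t *ptr;         /* start of the C array */
} cdata_t;

/*
 * Fast path: stores the element in *out and returns true.
 * Returns false where compiled code would take a side exit
 * (type guard or bounds check failed).
 */
static bool load_int_guarded(const cdata_t *x, const ctype_t *recorded,
                             int64_t i, int32_t *out)
{
    if (x->type != recorded) {                      /* type guard */
        return false;
    }
    if (i < 0 || (uint64_t)i >= x->type->length) {  /* bounds check */
        return false;
    }
    *out = x->ptr[i];                               /* the actual load */
    return true;
}
```

The pointer comparison used as the type guard here is the part that the non-persistent types call into question: it is only sound if a type's address uniquely identifies it for as long as the compiled code lives.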

Even in the current state this makes access to FFI Arrays more than 20 times faster. See the following example:

<?php
function ary3($n) {
  for ($i=0; $i<$n; $i++) {
    $X[$i] = $i + 1;
    $Y[$i] = 0;
  }
  for ($k=0; $k<1000; $k++) {
    for ($i=$n-1; $i>=0; $i--) {
      $Y[$i] += $X[$i];
    }
  }
  $last = $n-1;
  print "$Y[0] $Y[$last]\n";
}

function ary3_ffi($n) {
  $X = FFI::new("int[$n]");
  $Y = FFI::new("int[$n]");
  for ($i=0; $i<$n; $i++) {
    $X[$i] = $i + 1;
    $Y[$i] = 0;
  }
  for ($k=0; $k<1000; $k++) {
    for ($i=$n-1; $i>=0; $i--) {
      $Y[$i] += $X[$i];
    }
  }
  $last = $n-1;
  print "$Y[0] $Y[$last]\n";
}

/*****/

function gethrtime()
{
  $hrtime = hrtime();
  return (($hrtime[0]*1000000000 + $hrtime[1]) / 1000000000);
}

function start_test()
{
  ob_start();
  return gethrtime();
}

function end_test($start, $name)
{
  global $total;
  $end = gethrtime();
  ob_end_clean();
  $total += $end-$start;
  $num = number_format($end-$start,3);
  $pad = str_repeat(" ", 24-strlen($name)-strlen($num));

  echo $name.$pad.$num."\n";
  ob_start();
  return gethrtime();
}

function total()
{
  global $total;
  $pad = str_repeat("-", 24);
  echo $pad."\n";
  $num = number_format($total,3);
  $pad = str_repeat(" ", 24-strlen("Total")-strlen($num));
  echo "Total".$pad.$num."\n";
}

$t0 = $t = start_test();
ary3(200000);
$t = end_test($t, "ary3(200000)");
ary3_ffi(200000);
$t = end_test($t, "ary3_ffi(200000)");

bwoebi · 2024-06-07T11:08:10Z

I absolutely love the idea of JITting specific functions (like FFI here). I hope it will also allow JITting some function calls away completely in the future.

I just think that the JIT should expose an API for JITting specific functions, rather than the other way round, where extensions expose their internals to the JIT and the handling has to be hardcoded in the JIT. That should scale better as more extensions find something JIT-worthy.
I.e. the code doing the JITting of the FFI functions and operator overloads should live in ext/ffi.
I'm okay with not doing that right away, but I feel like JIT should become separate from opcache and have a proper public API eventually...

nielsdos · 2024-06-07T15:58:46Z

I like the idea. Extensions in PHP are often wrappers around C libraries, and adding JIT specializations for FFI opens the door to creating extension-like functionality within PHP with reasonable overhead.

I think that LuaJIT does something similar with their FFI, but it's been a long time since I looked at that. Perhaps there are ideas there that we could use here too. I'm not sure.

I agree with Bob's comment, but it also seems like a lot more effort and difficulty (as he already pointed out).

> most FFI types are not persistent. I didn't find an efficient way to implement FFI\CData type guards yet.

If I understand right, the problem is the following: in normal cases you'd compare the FFI type pointer in the guard, but because the types are not persistent, the pointers aren't a unique way of identifying the type (e.g. a type allocated later may reuse the same memory address). Furthermore, the recorded type pointer cannot always be safely dereferenced, because the type could have been freed.
Maybe this could be solved by giving each FFI type a unique ID that is never reused, and then compare against that ID in the guard. The ID could be created by a simple counter. I'm not sure.
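
A minimal sketch of the never-reused ID idea in C, under stated assumptions: `ctype_t`, `ctype_init` and `type_guard` are hypothetical names, not the real `zend_ffi_type` API, and the counter is a plain per-process variable.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical simplified type descriptor; not the real zend_ffi_type. */
typedef struct {
    uint32_t id;    /* assigned once at creation, never reused */
    uint32_t size;  /* size of the described C type in bytes */
} ctype_t;

static uint32_t next_type_id = 1;  /* 0 is reserved for "no ID" */

/* Every newly created type takes the next counter value. */
static void ctype_init(ctype_t *t, uint32_t size)
{
    t->id = next_type_id++;
    t->size = size;
}

/*
 * Guard: compare the live type's ID with the ID recorded at
 * compile time (a constant baked into the generated code).
 */
static bool type_guard(const ctype_t *t, uint32_t recorded_id)
{
    return t->id == recorded_id;
}
```

With this scheme a type that is later allocated at the same address as a freed one carries a different ID, so the guard fails instead of passing by accident. A structurally identical type created in a later request would also receive a fresh ID, which is the stability question raised in the replies.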

> Guards, bounds checks and CData pointers loads should be moved out of the loops

For guards and bounds checks, I suppose this could be solved in a general way if IR itself had range analysis or value set analysis (e.g. as part of SCCP). That would not only benefit FFI but also PHP itself. I see an open PR for SCCP so maybe this "issue" goes away in the future anyway.

dstogov · 2024-06-10T09:58:45Z

> Maybe this could be solved by giving each FFI type a unique ID that is never reused, and then compare against that ID in the guard. The ID could be created by a simple counter. I'm not sure.

LuaJIT uses this approach, but we will have to serialize IDs across several workers and probably keep the types forever

arnaud-lb · 2024-06-10T18:00:38Z

I also like the idea. This would reduce the amount of C code in use-cases such as Niels mentioned, which is a good thing.

>> Maybe this could be solved by giving each FFI type a unique ID that is never reused, and then compare against that ID in the guard. The ID could be created by a simple counter. I'm not sure.

> LuaJIT uses this approach, but we will have to serialize IDs across several workers and probably keep the types forever

At a minimum this requires a mapping from type structures to IDs, so that IDs are stable across workers and subsequent requests?

The size of the associated storage may be manageable if IDs are only used by JIT and are only allocated when a type is JITed, because then the mapping has the same lifetime as the JIT buffer, and also grows at the same time as the JIT buffer.

>> Guards, bounds checks and CData pointers loads should be moved out of the loops

> For guards and bounds checks, I suppose this could be solved in a general way if IR itself had range analysis or value set analysis (e.g. as part of SCCP). That would not only benefit FFI but also PHP itself. I see an open PR for SCCP so maybe this "issue" goes away in the future anyway.

Agreed. I was looking at range analysis earlier this year, and will continue working on this topic (range analysis) soon (unless someone else does it first - I don't want to block progress), so I will check if this can have an impact here.

dstogov · 2024-06-10T18:32:54Z

> At a minimum this requires a mapping from type structures to IDs, so that IDs are stable across workers and subsequent requests?

yes.

> The size of the associated storage may be manageable if IDs are only used by JIT and are only allocated when a type is JITed, because then the mapping has the same lifetime as the JIT buffer, and also grows at the same time as the JIT buffer.

I'm not sure if we can "persist" some CType during JIT-ing, because we will need to update all CData objects of this type.

>>> Guards, bounds checks and CData pointers loads should be moved out of the loops

>> For guards and bounds checks, I suppose this could be solved in a general way if IR itself had range analysis or value set analysis (e.g. as part of SCCP). That would not only benefit FFI but also PHP itself. I see an open PR for SCCP so maybe this "issue" goes away in the future anyway.

> Agreed. I was looking at range analysis earlier this year, and will continue working on this topic (range analysis) soon (unless someone else does it first - I don't want to block progress), so I will check if this can have an impact here.

LuaJIT achieves good code through loop peeling. It emits the loop body twice and removes all redundant code from the second copy using folding rules (common subexpression elimination, load forwarding, guard elimination, etc.).
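
As a hand-written illustration of what loop peeling buys for the inner loop of the benchmark (`$Y[$i] += $X[$i]`), here is a sketch in C. It is not output of LuaJIT or of this PR: `cdata_t`, `add_arrays_peeled` and the numeric type ID are hypothetical, and a `false` return stands for a side exit.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical simplified CData: type tag, element count, data pointer. */
typedef struct {
    uint32_t type_id;
    size_t length;
    int32_t *ptr;
} cdata_t;

/* Computes y[i] += x[i] for i = n-1 down to 0. */
static bool add_arrays_peeled(cdata_t *y, const cdata_t *x,
                              uint32_t int_array_id, size_t n)
{
    if (n == 0) {
        return true;
    }

    /* Peeled first iteration (i = n-1): every guard is executed once. */
    if (y->type_id != int_array_id || x->type_id != int_array_id) {
        return false;                       /* type guards */
    }
    if (n > y->length || n > x->length) {
        return false;                       /* bounds check for the largest index */
    }
    int32_t *yp = y->ptr;                   /* pointer loads kept in locals */
    const int32_t *xp = x->ptr;
    yp[n - 1] += xp[n - 1];

    /*
     * Second copy of the body: the type guards and pointer loads are
     * redundant and dropped. Dropping the bounds check also relies on
     * the index only decreasing from n-1, which is a range argument
     * rather than plain redundancy.
     */
    for (size_t i = n - 1; i-- > 0; ) {
        yp[i] += xp[i];
    }
    return true;
}
```

The point of the transformation is that the checks run once per loop entry instead of once per element, and the data pointers stay in registers for the whole loop.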

arnaud-lb · 2024-06-11T10:42:59Z

> I'm not sure if we can "persist" some CType during JIT-ing, because we will need to update all CData objects of this type.

Indeed. I was thinking about something like this:

get_id(ctype):
    if ctype.id:
        return ctype.id
    if mapping[ctype]:
        return ctype.id := mapping[ctype]
    return ctype.id := mapping[ctype] := next_id()

This handles future instances, but this doesn't account for other existing instances in the same request, or existing instances of other workers that will get to execute the JITed code.
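
The pseudocode above could look roughly like this in C. This is only a sketch under simplifying assumptions: a type's structure is reduced to a hypothetical `(kind, length)` pair, the mapping is a small fixed-size table searched linearly, and nothing here is shared between workers or protected by a lock.

```c
#include <stddef.h>
#include <stdint.h>

enum { MAX_TYPES = 64 };

/* Hypothetical structural description of a C type (real FFI types are richer). */
typedef struct {
    uint32_t kind;    /* e.g. 1 = array of int32 */
    uint32_t length;  /* element count */
    uint32_t id;      /* cached ID, 0 = not assigned yet */
} ctype_t;

typedef struct {
    uint32_t kind;
    uint32_t length;
    uint32_t id;
} map_entry_t;

static map_entry_t mapping[MAX_TYPES];  /* structure -> ID */
static size_t mapping_used = 0;
static uint32_t next_id = 1;

/* Returns the ID for this type's structure, or 0 if the table is full. */
static uint32_t get_id(ctype_t *t)
{
    if (t->id != 0) {
        return t->id;                   /* already cached on this instance */
    }
    for (size_t i = 0; i < mapping_used; i++) {
        if (mapping[i].kind == t->kind && mapping[i].length == t->length) {
            return t->id = mapping[i].id;   /* known structure */
        }
    }
    if (mapping_used == MAX_TYPES) {
        return 0;
    }
    mapping[mapping_used].kind = t->kind;
    mapping[mapping_used].length = t->length;
    mapping[mapping_used].id = next_id;
    mapping_used++;
    return t->id = next_id++;           /* new structure, new ID */
}
```

Two separately allocated instances with the same structure end up with the same ID, but only once `get_id` has been called on each of them, which is the gap described above for instances that already exist elsewhere.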

Maybe we can have a special exit that fetches the id? This is starting to get complicated though.

chopins · 2024-06-24T08:43:53Z

At present, the low performance of FFI\CData calculations and other operations is caused by conversion to PHP types and magic calls. If the value is simply assigned to CData, its performance is not inferior. So these problems can be avoided with careful coding. Another option is to avoid frequent type conversions through operator overloading. I don't think it's a good idea to get a little bit of acceleration through JIT, and it would make the FFI API ugly.

dstogov · 2024-06-24T08:54:05Z

> At present, the low performance of FFI\CData calculations and other operations is caused by conversion to PHP types and magic calls. If the value is simply assigned to CData, its performance is not inferior.

Right. This is what the JIT is going to do.

> I don't think it's a good idea to get a little bit of acceleration through JIT, and it would make the FFI API ugly

The current PoC shows a speedup of more than 20 times (see the example at the top).
This PR doesn't change the PHP ext/ffi API at all.

chopins · 2024-06-24T09:51:30Z

Isn't it better to use the do_operation class handler, similar to GMP?
As discussed above, the FFI type needs to be clarified, so it is necessary to require access to the FFI type through an instance. I don't recommend accessing the FFI API through an instance.
FFI is not enabled by default, so the JIT may not be available.

dstogov · 2024-06-24T10:10:07Z

> Isn't it better to use the do_operation class handler, similar to GMP?

I don't understand what you are proposing.
See the following PHP code:

$x = FFI::new("int[42]");
$y = $x[$i];

This PR translates the last line into 3 machine instructions:

movq 0x60(%r14), %rcx       ; load Z_OBJ_P() from $x zval
movq 0x40(%rcx), %rcx       ; load CData->ptr (start of the array)
movl (%rcx, %rax, 4), %ecx  ; load element of the array (%rax contains value of $i)

How can you make this better with do_operation?
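
For readers less used to assembly, the three loads above correspond roughly to the following C. The structs are hypothetical, stripped-down stand-ins (the real zval and CData layouts have more fields, which is where the 0x60 and 0x40 offsets come from), and `fetch_int_elem` is an invented name.

```c
#include <stdint.h>

/* Hypothetical, stripped-down stand-in for an FFI CData object. */
typedef struct {
    void *type;  /* stand-in for the CType pointer */
    void *ptr;   /* start of the C data */
} cdata_t;

/* Hypothetical stand-in for a zval that holds an object. */
typedef struct {
    cdata_t *obj;  /* what Z_OBJ_P() would return */
} zval_t;

/* $y = $x[$i] for an int array, with no guard and no bounds check. */
static int32_t fetch_int_elem(const zval_t *x, int64_t i)
{
    const cdata_t *cdata = x->obj;                      /* movq 0x60(%r14), %rcx */
    const int32_t *base = (const int32_t *)cdata->ptr;  /* movq 0x40(%rcx), %rcx */
    return base[i];                                     /* movl (%rcx, %rax, 4), %ecx */
}
```

No type guard or bounds check appears in this sequence, which matches the open items listed in the PR description.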

chopins · 2024-06-25T03:33:43Z

I want to optimize the zend_ffi_cdata_do_operation() function, but there is no good way to match the CData array.

I'm not against the JIT improving performance, but I'm against making the API worse for the sake of performance. It is still necessary to make sure that PHP code can remain elegant and concise.

dstogov · 2024-06-25T05:58:32Z

> I'm not against the JIT improving performance, but I'm against making the API worse for the sake of performance. It is still necessary to make sure that PHP code can remain elegant and concise.

What kind of API changes do you mean? This PR doesn't change anything visible to PHP programmers.

chopins · 2024-06-25T06:33:52Z

Is the following PR related to the FFI JIT?
4acf008#commitcomment-143451098

dstogov · 2024-06-25T06:58:32Z

> Is the following PR related to the FFI JIT?
> 4acf008#commitcomment-143451098

Not at all. I don't like it, and I think your last RFC may be a good solution.

github-actions bot added Extension: ffi Extension: opcache labels Jun 6, 2024

dstogov force-pushed the jit_ffi branch from 39774e1 to 6f9f0e1 Compare June 10, 2024 11:21

github-actions bot added the Category: Engine label Jun 10, 2024

dstogov force-pushed the jit_ffi branch from 6f9f0e1 to dd19210 Compare June 14, 2024 08:45

arnaud-lb reviewed Jun 14, 2024

dstogov force-pushed the jit_ffi branch from 6b021e4 to b569fca Compare June 17, 2024 12:50

dstogov referenced this pull request Jun 24, 2024

Deprecate calling FFI::cast(), FFI::new(), and FFI::type() statically

4acf008

dstogov force-pushed the jit_ffi branch from 45f1f50 to e4f49d7 Compare June 24, 2024 09:03

dstogov force-pushed the jit_ffi branch 3 times, most recently from a40ccea to a75bec6 Compare July 3, 2024 22:17

dstogov force-pushed the jit_ffi branch 2 times, most recently from 34f7504 to b6b4f34 Compare July 26, 2024 10:22

github-actions bot added the Category: Build System label Jul 26, 2024

dstogov added 29 commits November 12, 2024 13:42

Fix FFI caching

7d74961

JIT for FFI::addr()

5c4d35d

Passing values to FFI functions won't change their types

889f742

Eliminate some FFI related type guards

c1e1c7d

Fix build without FFI

b9ca7fe

Fix incorrect type info

2b38120

Fix arguments cleanup

680093c

JIR for FFI::string()

6239554

JIT for FFI::typeof() and FFI:isNull()

fb12e79

ws

05fe582

Improve FFI type compatibility checks

340f025

Fix FFI scope reference counting during FFI call

e8be30b

Improve type inference for ZEND_FETCH_OBJ/DIM_FUNC_ARG

2e1b530

Add IS_OBJECT type assertions

8efd450

cleanup

af29d2e

cleanup

14a2983

JIT for FFI::memcpy(), FFI::memcmp() and FFI::memset()

a88ec2a

Add test for FETCH_DIM_W/VAR

26071ee

JIT for FFI::sizeof() and FFI::alignof()

0f89693

JIT for FETCH_DIM_W/VAR with FFI::CData op1

989c156

Record FFI::CTypes

0beee15

Add test for FFI::type()

388b9e2

JIT for FFI::type()

7dc7aea

JIT for FFI::new() with single argument

53bf1b1

JIT for FFI::new() with 2 and 3 arguments

6232403

JIT for FFI::free()

cdd2493

JIT for FFI::cast()

1cd5f13

Eliminate useless guards

976608c

Properly restore FFI_G(persistent)

8dedec5

dstogov force-pushed the jit_ffi branch from 20fe30d to 8dedec5 Compare November 12, 2024 10:42
