Special JIT support for FFI #14491
@iluuu1994 @nielsdos I would appreciate it if you could take a quick look at this when you have time. If this interests you, please share your ideas. This is still a very early PoC. It aims to generate optimized native code instead of generic calls to FFI callbacks. There are still a lot of unsolved questions.
Even in its current state, this makes access to FFI arrays more than 20 times faster. See the following example:

<?php
// Baseline: the same loops on ordinary PHP arrays.
function ary3($n) {
    for ($i=0; $i<$n; $i++) {
        $X[$i] = $i + 1;
        $Y[$i] = 0;
    }
    for ($k=0; $k<1000; $k++) {
        for ($i=$n-1; $i>=0; $i--) {
            $Y[$i] += $X[$i];
        }
    }
    $last = $n-1;
    print "$Y[0] $Y[$last]\n";
}

// Same algorithm on FFI-allocated C arrays of int.
function ary3_ffi($n) {
    $X = FFI::new("int[$n]");
    $Y = FFI::new("int[$n]");
    for ($i=0; $i<$n; $i++) {
        $X[$i] = $i + 1;
        $Y[$i] = 0;
    }
    for ($k=0; $k<1000; $k++) {
        for ($i=$n-1; $i>=0; $i--) {
            $Y[$i] += $X[$i];
        }
    }
    $last = $n-1;
    print "$Y[0] $Y[$last]\n";
}

/*****/

// Current monotonic time in seconds, as a float.
function gethrtime()
{
    $hrtime = hrtime();
    return (($hrtime[0]*1000000000 + $hrtime[1]) / 1000000000);
}

function start_test()
{
    ob_start();
    return gethrtime();
}

// Print the elapsed time for $name and restart the timer.
function end_test($start, $name)
{
    global $total;
    $end = gethrtime();
    ob_end_clean();
    $total += $end-$start;
    $num = number_format($end-$start,3);
    $pad = str_repeat(" ", 24-strlen($name)-strlen($num));
    echo $name.$pad.$num."\n";
    ob_start();
    return gethrtime();
}

function total()
{
    global $total;
    $pad = str_repeat("-", 24);
    echo $pad."\n";
    $num = number_format($total,3);
    $pad = str_repeat(" ", 24-strlen("Total")-strlen($num));
    echo "Total".$pad.$num."\n";
}

$t0 = $t = start_test();
ary3(200000);
$t = end_test($t, "ary3(200000)");
ary3_ffi(200000);
$t = end_test($t, "ary3_ffi(200000)");
I absolutely love the idea of JITting specific functions (like FFI here). I hope it will also allow JITting some function calls away completely in the future. I just think that the JIT should expose an API for JITting specific functions, rather than the other way round, where extensions expose their internals to the JIT and the handling then has to be hardcoded in the JIT. That should scale better as more extensions find something JIT-worthy.
I like the idea. Extensions in PHP are often wrappers around C libraries, and by adding support for JIT specializations for FFI, it opens the door for creating extension-like functionality within PHP with reasonable overhead. I think that LuaJIT does something similar with their FFI, but it's been a long time since I looked at that. Perhaps there are ideas there that we could use here too. I'm not sure. I agree with Bob's comment, but it also seems like a lot more effort and difficulty (as he already pointed out).
If I understand correctly, the problem is the following: in normal cases you'd compare the FFI type pointer in the guard, but because the types are not persistent, the pointers aren't a unique way of identifying a type (e.g. a type allocated later may reuse the same memory address). Furthermore, the type pointer cannot always be safely dereferenced, because the type could have been freed.
For guards and bounds checks, I suppose this could be solved in a general way if IR itself had range analysis or value set analysis (e.g. as part of SCCP). That would not only benefit FFI but also PHP itself. I see an open PR for SCCP, so maybe this "issue" goes away in the future anyway.
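To make the range-analysis idea above concrete, here is a minimal sketch of how a known index range can prove a bounds check redundant. `IntRange`, `countdownIndexRange()` and `boundsCheckIsRedundant()` are hypothetical names made up for this illustration; they are not part of IR or php-src, and a real analysis would derive the range from the loop's IR rather than from a hand-written helper.

```php
<?php
// Hypothetical closed integer interval [min, max]; not an IR or php-src type.
final class IntRange
{
    public function __construct(public int $min, public int $max) {}
}

// Range of $i inside `for ($i = $n - 1; $i >= 0; $i--)`:
// the loop condition gives the lower bound, the initial value the upper bound.
function countdownIndexRange(int $n): IntRange
{
    return new IntRange(0, $n - 1);
}

// The check `0 <= $i < $length` can be dropped when the whole range fits.
function boundsCheckIsRedundant(IntRange $index, int $length): bool
{
    return $index->min >= 0 && $index->max < $length;
}
```

For the `ary3_ffi()` loop in the PR description, the index range for `$n = 200000` fits inside `int[200000]`, so under these assumptions the per-element bounds check could be removed.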
LuaJIT uses this approach (type IDs), but we would have to serialize the IDs across several workers and probably keep the types forever.
I also like the idea. This would reduce the amount of C code in use-cases such as Niels mentioned, which is a good thing.
At a minimum this requires a mapping from type structures to IDs, so that IDs are stable across workers and subsequent requests? The size of the associated storage may be manageable if IDs are only used by JIT and are only allocated when a type is JITed, because then the mapping has the same lifetime as the JIT buffer, and also grows at the same time as the JIT buffer.
Agreed. I was looking at range analysis earlier this year, and will continue working on this topic (range analysis) soon (unless someone else does it first - I don't want to block progress), so I will check if this can have an impact here.
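The difference between guarding on a type's address and guarding on a stable ID can be sketched with a toy model. Everything here is an assumption made for illustration: `ToyTypeHeap` and `TypeIdRegistry` do not exist in php-src or ext/ffi, the "addresses" are plain integers, and keying the registry by the declaration string is only one possible way to recognize equal types.

```php
<?php
// Toy allocator that reuses freed addresses, as a real allocator may do.
final class ToyTypeHeap
{
    /** @var array<int, string> address => type declaration */
    private array $live = [];
    /** @var int[] freed addresses, reused before fresh ones */
    private array $freeList = [];
    private int $next = 0x1000;

    public function alloc(string $decl): int
    {
        $addr = array_pop($this->freeList) ?? $this->next++;
        $this->live[$addr] = $decl;
        return $addr;
    }

    public function free(int $addr): void
    {
        unset($this->live[$addr]);
        $this->freeList[] = $addr;
    }
}

// Hypothetical mapping from a type declaration to a stable ID.
final class TypeIdRegistry
{
    /** @var array<string, int> declaration => stable ID */
    private array $ids = [];

    // IDs are allocated lazily, the first time a type is requested.
    public function idFor(string $decl): int
    {
        return $this->ids[$decl] ??= count($this->ids) + 1;
    }
}
```

In this model, allocating `int[4]`, freeing it, and then allocating `double[4]` yields the same address, so an address-based guard would accept the wrong type, while `idFor()` keeps the two declarations apart. The sketch does not address the harder points raised in the thread: sharing IDs across workers and the lifetime of the mapping.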
yes.
I'm not sure if we can "persist" some CType during JIT-ing, because we will need to update all CData objects of this type.
LuaJIT achieves good code through loop peeling. It emits the loop body twice and removes all redundant code from the second copy using folding rules (common subexpression elimination, load forwarding, guard elimination, etc).
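As a rough illustration of the loop peeling described above, the sketch below writes the transformation out by hand in PHP. It is not LuaJIT's or PHP JIT's actual output: `guard()` is a hypothetical stand-in for a JIT guard, the `is_array()` conditions stand in for loop-invariant type checks, and `$guards` only counts how often a guard executes.

```php
<?php
// Hypothetical stand-in for a JIT guard; a failure would be a side exit.
function guard(bool $condition, int &$guards): void
{
    $guards++;
    if (!$condition) {
        throw new RuntimeException('guard failed (side exit)');
    }
}

// Naive form: the loop-invariant guards run on every iteration.
function addNaive(array &$y, array $x, int &$guards): void
{
    for ($i = count($x) - 1; $i >= 0; $i--) {
        guard(is_array($x), $guards);
        guard(is_array($y), $guards);
        $y[$i] += $x[$i];
    }
}

// Peeled form: the first copy of the body keeps all guards; the second
// copy is the loop proper, with the invariant guards removed.
function addPeeled(array &$y, array $x, int &$guards): void
{
    $i = count($x) - 1;
    if ($i < 0) {
        return;
    }
    guard(is_array($x), $guards);
    guard(is_array($y), $guards);
    $y[$i] += $x[$i];
    for ($i--; $i >= 0; $i--) {
        $y[$i] += $x[$i];
    }
}
```

Both functions compute the same result; for a three-element input the naive form executes six guards and the peeled form two, independent of the array length.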
Indeed. I was thinking about something like this:
This handles future instances, but this doesn't account for other existing instances in the same request, or existing instances of other workers that will get to execute the JITed code. Maybe we can have a special exit that fetches the id? This is starting to get complicated though.
At present, the low performance of FFI\CData calculations and other operations is caused by conversion to PHP types and magic calls. If a value is simply assigned to CData, performance is not a problem, so these issues can be avoided with careful coding. Another option is to avoid frequent type conversions through operator overloading. I don't think it is a good idea to gain a little speed through JIT if it makes the FFI API ugly.
Right. This is what JIT is going to do.
I don't understand what you are proposing.

    $x = FFI::new("int[42]");
    $y = $x[$i];

This PR translates the last line into 3 machine instructions:

    movq 0x60(%r14), %rcx       ; load Z_OBJ_P() from $x zval
    movq 0x40(%rcx), %rcx       ; load CData->ptr (start of the array)
    movl (%rcx, %rax, 4), %ecx  ; load element of the array (%rax contains value of $i)

How can you make this better with
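The address computation in the last of the three instructions above (base plus index times 4) can be mimicked in plain PHP. This is only a toy model: a binary string stands in for the memory behind `CData->ptr`, and `loadInt32()` is a hypothetical helper, not something the PR adds.

```php
<?php
// Toy model: $memory stands in for raw memory, $base for CData->ptr.
function loadInt32(string $memory, int $base, int $index): int
{
    // Same addressing as `movl (%rcx, %rax, 4), %ecx`:
    // read 4 bytes at base + index * sizeof(int32_t).
    return unpack('l', $memory, $base + $index * 4)[1];
}

// Four 32-bit signed integers laid out back to back, like `int[4]` in C.
$memory = pack('l*', 10, 20, 30, -1);
$third = loadInt32($memory, 0, 2);
```

The point of the sketch is that an element read needs nothing beyond one multiply-add and one load once the type is known, which is what the JITed code exploits.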
I want to optimize the I'm not against JIT improving performance, but I'm against making the API worse for the sake of performance. It is still necessary to make sure that elegant and concise PHP code can be written.
What kind of API changes do you mean? This PR doesn't change anything visible to PHP programmers.
Is the following PR related to FFI JIT?
Not at all. I don't like it, and I think your last RFC may be a good solution.