-
Notifications
You must be signed in to change notification settings - Fork 65
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add tests for precompilation support #1549
base: main
Are you sure you want to change the base?
Conversation
Benchmark Results
Benchmark PlotsA plot of the benchmark results have been uploaded as an artifact to the workflow run for this PR. |
@vchuravy just curious -- if we use the inlineabi does it crash? [since then it becomes an llvmcall -- tho ofc all the constants need to be remapped] |
Does past this simple test |
Okay awesome, we should fix the generated stuff to be relocatable, but this at least unblocks the next part. Which I presume is needing to rewrite all the constant int to ptr's to be some relocatable address? How does Julia do that normally atm |
Also relatedly, can we use the inlineabi stuff on a temporary function to force Enzyme precompilation to compile all the interal utilities used by Enzyme [as otherwise presently the first use of an autodiff is quite slow compiling the inside of Enzyme] |
Yes, but sadly it crashes:
The biggest issue is
There are two stages here.
The former is "just" normal Julia datastructures and as long as we don't cache pointers or constanst out of thin air, we should be fine. The biggest offender here is Once we can handle that, the question arises for output of Enzyme and that is a bigger and more dificult problem. |
So the big problem with worldage [which iw ould love to swap off of], is that you can't use a methodinstance in a val. This means that our generated code will be much more costly otherwise. Could we somehow make a methodinstance as legal within a val? |
autodiff(Reverse, mul, Active, Active(1.0), Active(2.0)) | ||
autodiff(Forward, mul, Duplicated, Duplicated(1.0, 1.0), Const(2.0)) | ||
autodiff(ReverseMode{false,InlineABI,false}(), mul, Active, Active(1.0), Active(2.0)) | ||
# Non-Inline mode uses `@generated` functions and poisons the caller |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
inline mode also uses generated functions, but it uses an llvmcall not a call to custom jit
No we can't pass an MI to a generated function. What I think the system should look like is something like this:
Julia can find the MI of a primal signature in about [12ns-80ns https://github.com/JuliaGPU/GPUCompiler.jl/pull/553#issuecomment-1989281301], and IDDict lookup adds another 12ns and a linear search is often under 4ns. This is very similar to the cost of Julia dynamic dispatch, with the only optimization missing that Julia https://github.com/JuliaLang/julia/blob/696d9c3e691a613af9067c9a5d47558881354a55/src/gf.c#L3202 https://github.com/JuliaLang/julia/blob/696d9c3e691a613af9067c9a5d47558881354a55/src/gf.c#L3295 |
Is it possible to hook into Julia’s AI itself to be able to resolve the definition here. Or perhaps maybe something something opaque closure. Given that we get issues opened for “100x slowdown” when the gradient function has a dynamic dispatch I’m worried of the impact on downstream folks |
I don't particularly care about the performance argument here, for me this is first and foremost a correctness issue. Once we have a correct implementation we can evaluate performance (instead of speculating) and see if there is anything we can do. Enzyme world-age handling is incorrect, it treats a range as a singular point and it's a runtime value, not a codegen constant one. Long-term proper compiler plugin support in Julia may fix that, like the idea I sketched out in #1443. Inside the abstract interpreter we can correctly compute these properties. But that work stalled with me focusing on my thesis, so it is unlikely to land for 1.12 |
Maybe we can start by adding another "ABI" here which doesn't use the generated function so we can more easily test this out and A/B test things? |
This not part ABI call. This is before to lookup the thunk that we need to call, IIRC. I would appreciate if you could take a stab at this. The code has grown quite complex over the years and I don't have the bandwidth right now to untangle it. |
Yeah I'll take a stab at it late next week. And ABI is just a placeholder type in the mode, so my thought is to overload autodiff with it to change going into the thunk [and thus not call the generated one]. That way we have both [and at any point we can change DeaultABI to point to whatever we want] |
This currently crashes (Julia 1.11-beta2) with:
This is due to the
@generated function thunk
that illegally capturesWorld
and returns a constant inferredThunk(fptr)
wherefptr
is only valid within the current session.