# Add example with CUDA.jl #2

Could you please add an example with a simple CUDA kernel in the Julia introduction notebook?
I left it in this comment.
I am a bit confused by that code snippet. What is the quantity […]? Also, why is the issue closed? The code is not in the introduction notebook.
Hi @renatobellotti! I'm reopening the issue, as the example indeed remains to be added to the notebook.
The kernel […]. The adjoint code then computes all the sensitivities […]. As an example, below is a full computational graph that ends up computing […]:

[figure elided: computational graph]

You can refer to Baydin et al. (2018) for more on automatic differentiation.
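For reference, a minimal statement of the adjoint computation, added here as a sketch: for a kernel $z = f(x)$, seeding the output with a cotangent $\bar{z}$ yields the vector-Jacobian product

```math
\bar{x} = \left( \frac{\partial z}{\partial x} \right)^{\top} \bar{z},
```

so one reverse pass with the basis seed $\bar{z} = e_i$ recovers the $i$-th row of the Jacobian.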
The best place to add this would be https://github.com/EnzymeAD/Enzyme.jl/tree/main/examples. Contributions welcome!
Thanks for the explanations, @luciano-drozda! I think I now understand what is going on. The following YouTube video helped me create a simple example: https://www.youtube.com/watch?v=gFfePK44ICk

```julia
using Enzyme
function example!(x, y, z)
z[1] = 2 * x[1] + y[1]
z[2] = 3 * x[2] + y[2]
z[3] = 4 * x[3] + y[3]
z[4] = 5 * x[4] + y[4]
return nothing
end
x = [1., 2, 3, 4]
y = [5., 6, 7, 8]
z = zeros(4)
for i ∈ 1:4
r = zeros(4)
r[i] = 1.
dz_dx = zeros(length(x));
dx = Duplicated(x, dz_dx)
dy = Const(y)
dz = Duplicated(z, r)
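    # NOTE (added): this is the Enzyme API of the time; current releases spell
    # this call autodiff(Reverse, example!, Const, dx, dy, dz).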
autodiff(example!, Const, dx, dy, dz)
    # With seed r = e_i on the output, the shadow copy of x receives the
    # derivatives dz_i/dx, i.e. the i-th row of the Jacobian.
@show dx.dval;
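    # Hand-computed check (added; not in the original post): for i == 1 the
    # expected shadow is dx.dval == [2.0, 0.0, 0.0, 0.0], i.e. row 1 of the
    # diagonal Jacobian diag(2, 3, 4, 5).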
# The values contain the actual result of the computations.
# @show dz
end
```

Next we convert this code to a kernel, as described in the video at about 26:00:

```julia
using KernelAbstractions
@kernel function example_kernel(x, y, z)
i = @index(Global)
if (i == 1)
z[i] = 2 * x[i] + y[i]
elseif (i == 2)
z[i] = 3 * x[i] + y[i]
elseif (i == 3)
z[i] = 4 * x[i] + y[i]
elseif (i == 4)
z[i] = 5 * x[i] + y[i]
end
nothing
end
x = [1., 2, 3, 4]
y = [5., 6, 7, 8]
z = zeros(4)
cpu_kernel = example_kernel(CPU(), 4)
event = cpu_kernel(x, y, z, ndrange=4)
wait(event)
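# Hand-computed check (added): z should now be [7.0, 12.0, 19.0, 28.0].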
@show z;
```

And we use Enzyme (via KernelGradients) to create the autodiff kernel and launch it:

```julia
using KernelGradients
using CUDAKernels
using CUDA
for i ∈ 1:4
x = cu([1., 2, 3, 4])
y = cu([5., 6, 7, 8])
z = cu(zeros(4))
dz_dx = similar(x)
fill!(dz_dx, 0)
dz_dy = similar(y)
fill!(dz_dy, 0)
r = zeros(4)
r[i] = 1.
r = cu(r)
dx = Duplicated(x, dz_dx)
dy = Duplicated(y, dz_dy)
dz = Duplicated(z, r)
# Code adapted from: https://www.youtube.com/watch?v=gFfePK44ICk (~26:00)
gpu_kernel_autodiff = autodiff(example_kernel(CUDADevice()))
event = gpu_kernel_autodiff(dx, dy, dz, ndrange=4)
wait(event)
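    # Hand-computed check (added): iteration i yields row i of diag(2, 3, 4, 5);
    # e.g. i == 2 gives dx.dval == [0.0, 3.0, 0.0, 0.0].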
@show dx.dval;
end
```

Feel free to use the example in the docs. Perhaps it helps somebody!
Now I have a question: how do I combine this with array programming? I know that I can do a simple reduction on the GPU using […].
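(A minimal sketch of such a reduction, assuming the elided call was a plain `reduce` on a `CuArray`:)

```julia
using CUDA

z = cu([7.0, 12.0, 19.0, 28.0])
loss = reduce(+, z)  # runs as a GPU reduction; returns 66.0
```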
You can use ChainRules.jl to define a rule for your custom kernel using Enzyme, and then use Zygote to differentiate through the array operations.
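A minimal sketch of that pattern, using a plain CPU loop in place of the GPU kernel launch and the current `autodiff(Reverse, ...)` Enzyme call form; `my_kernel!` and `apply_kernel` are illustrative names, not from this thread:

```julia
using ChainRulesCore
using Enzyme

# Toy in-place "kernel": z .= 2 .* x (stands in for a GPU kernel launch).
function my_kernel!(z, x)
    @inbounds for i in eachindex(x)
        z[i] = 2 * x[i]
    end
    return nothing
end

# Out-of-place wrapper that Zygote sees.
apply_kernel(x) = (z = similar(x); my_kernel!(z, x); z)

function ChainRulesCore.rrule(::typeof(apply_kernel), x)
    z = apply_kernel(x)
    function apply_kernel_pullback(z̄)
        dz = similar(z); dz .= z̄  # mutable copy of the cotangent (Enzyme consumes the seed)
        dx = zero(x)              # Enzyme accumulates x̄ = Jᵀ z̄ in here
        autodiff(Reverse, my_kernel!, Const,
                 Duplicated(copy(z), dz),
                 Duplicated(copy(x), dx))
        return NoTangent(), dx
    end
    return z, apply_kernel_pullback
end

# Usage:
#   using Zygote
#   Zygote.gradient(x -> sum(apply_kernel(x)), [1.0, 2.0])  # → ([2.0, 2.0],)
```

Zygote then differentiates the surrounding array code as usual and hands the kernel's cotangent to Enzyme through this rule.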
Thanks for your answer. I still run into problems even with my toy example:

```julia
using ChainRulesCore
using CUDA
using CUDAKernels
using Enzyme
using KernelAbstractions
using KernelGradients
using Zygote
@kernel function example_kernel(x, y, z)
i = @index(Global)
if (i == 1)
z[i] = 2 * x[i] + y[i]
elseif (i == 2)
z[i] = 3 * x[i] + y[i]
elseif (i == 3)
z[i] = 4 * x[i] + y[i]
elseif (i == 4)
z[i] = 5 * x[i] + y[i]
end
nothing
end
function calculate_z(x, y; Δx=cu(ones(4)))
z = cu(zeros(4))
dz_dx = similar(x)
fill!(dz_dx, 0)
r = Δx
dx = Duplicated(x, dz_dx)
dy = Const(y)
dz = Duplicated(z, r)
gpu_kernel_autodiff = autodiff(example_kernel(CUDADevice()))
event = gpu_kernel_autodiff(dx, dy, dz, ndrange=4)
wait(event)
return dz.val, dx.dval
end
function ChainRulesCore.frule((_, Δx, Δy), ::typeof(calculate_z), x, y)
return calculate_z(x, y, Δx=Δx)
end
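# NOTE (added): Zygote consumes reverse-mode rules (rrules) from ChainRules;
# it never calls this frule, so Zygote.gradient below still tries to
# differentiate calculate_z itself.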
function calculate_loss(x, y)
z, dz_dx = calculate_z(x, y)
loss = reduce(+, z)
return loss
end
x = cu([1., 2, 3, 4])
y = cu([5., 6, 7, 8])
# This works:
# calculate_loss(x, y)
# This does not work:
Zygote.gradient(calculate_loss, x, y)
```

The error message (elided here) refers to try/catch statements. However, I am not using any. I assume this is somehow related to the way in which I'm calling the kernel? But ChainRules shouldn't even try to go inside that function, because I've defined a forward rule. Might there be a problem with the optional argument?
You should be defining an `rrule`; Zygote only picks up reverse-mode rules.
I replaced the frule with an rrule and get the same error:

```julia
function ChainRulesCore.rrule(::typeof(calculate_z), x, y, z)
z, ∂z_∂x = calculate_z(x, y)
function calculate_z_pullback(z̄)
f̄ = NoTangent()
x̄ = Tangent(∂z_∂x)
ȳ = NoTangent()
z̄ = Tangent(cu([1., 1, 1, 1]))
return f̄, x̄, ȳ, z̄
end
return z, calculate_z_pullback
end
```
You need to define it over […].
Thanks for the hint, I adjusted my previous post accordingly. The error persists.
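Before the linked example below, a hedged sketch of one plausible fix: `Tangent` needs its primal type as a parameter, and for plain arrays the natural cotangent is just another array, so the pullback can re-seed the Enzyme kernel with the incoming cotangent directly. `calculate_z_only` is an illustrative helper, not from the thread:

```julia
# Hypothetical wrapper returning only z, so the rule's primal is a plain array.
calculate_z_only(x, y) = first(calculate_z(x, y))

function ChainRulesCore.rrule(::typeof(calculate_z_only), x, y)
    z, _ = calculate_z(x, y)
    function calculate_z_only_pullback(z̄)
        # Re-run the Enzyme kernel with the incoming cotangent as the output
        # seed; the returned shadow of x is then x̄ = Jᵀ z̄.
        _, x̄ = calculate_z(x, y; Δx=cu(collect(z̄)))  # materialize z̄ as a mutable CuArray
        return NoTangent(), x̄, NoTangent()
    end
    return z, calculate_z_only_pullback
end
```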
I wrote a complete example: JuliaDiff/ChainRules.jl#665 (comment). Feel free to take it for the docs!