Flux integration + Neural Kernel Network #78
Hi @willtebbutt, the examples work well, thanks ;)
Not all, I'm still extending it. The next main feature is to make the GP in GPFlux work as a normal neural network layer (I'm currently working on it), which I think may support e.g. Deep Gaussian Processes (Neil Lawrence) and Variational Gaussian Processes (David Blei) (I hope I understand the ideas of these papers correctly).
For these two functionalities, I think you have provided enough interfaces, which is good. For Flux integration, I personally like the second implementation you provided:

```julia
dσ², dg = Zygote.gradient(
    function(σ², g)
        # Manually transform the data
        gx = ColVecs(g(x.X))
        # Construct a GP and compute the marginal likelihood using the transformed data
        f = σ² * GP(eq(), GPC())
        fx = f(gx, 0.1)
        return logpdf(fx, y)
    end,
    σ², g,
)
```

I think it's clearer and more straightforward; this is also the way used in GPFlux.
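For completeness, here is a rough end-to-end sketch of that second pattern. Everything outside the gradient call is made up for illustration (the toy data, the two-layer network, the 0.1 noise level); the Stheno calls simply mirror the snippet above.

```julia
# Sketch only: jointly differentiate a Flux feature extractor `g` and a kernel
# variance `σ²` through Stheno's exact log marginal likelihood.
using Stheno, Flux, Zygote

X = randn(2, 50)                      # toy inputs: 2 features × 50 points (illustrative)
y = randn(50)                         # toy targets (illustrative)
x = ColVecs(X)                        # Stheno's "columns are data points" wrapper

g  = Chain(Dense(2, 10, relu), Dense(10, 2))   # feature extractor (illustrative)
σ² = 1.0                                        # kernel variance

dσ², dg = Zygote.gradient(
    function(σ², g)
        gx = ColVecs(g(x.X))          # push the inputs through the network
        f  = σ² * GP(eq(), GPC())     # scaled EQ-kernel GP, as in the snippet above
        fx = f(gx, 0.1)               # finite-dimensional marginal with noise 0.1
        return logpdf(fx, y)          # exact log marginal likelihood
    end,
    σ², g,
)
# `dg` is a nested named tuple matching the structure of `g`, so the gradients
# can be used in any Flux-style update of the network's weights.
```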
Could you elaborate a little on what this will look like?
I actually don't think that the linear combination of kernels thing is the right way to go about implementing the Neural Kernel Network.
Good to know -- this approach just works out of the box, so there's literally no need to provide explicit integration in Flux to make this work -- I just need to implement worked examples :)
I agree on that; in fact:

```julia
struct NeuralKernelNetwork{T<:Tuple} <: AbstractKernel
    layers::T
    NeuralKernelNetwork{T}(ls...) where {T} = new{T}(ls)
end
```

Maybe I didn't make myself clear: here I mean that most composite kernels can be viewed as special cases of an NKN (a linear combination of kernels can also be viewed as an NKN that only has a linear layer, where the weights of the linear layer are the same as the coefficients in front of the kernels). For example:

```julia
const ProductCompositeKernel = NeuralKernelNetwork{Tuple{Primitive, typeof(allProduct)}}
const AddCompositeKernel = NeuralKernelNetwork{Tuple{Primitive, typeof(allSum)}}
```
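To make that correspondence concrete, here is a tiny standalone sketch (the kernel closures and the names `primitive_kernels`, `weights`, `nkn_eval` are hypothetical placeholders, not part of Stheno or of the proposed type): an NKN consisting of a single non-negative linear layer evaluates to exactly the corresponding linear combination of its primitive kernels.

```julia
# Sketch: an "NKN with only a linear layer" evaluated at a pair of inputs is
# just a weighted sum of primitive kernel evaluations.
primitive_kernels = [(x, y) -> exp(-sum(abs2, x .- y) / 2),    # EQ-like kernel
                     (x, y) -> exp(-sqrt(sum(abs2, x .- y)))]  # Exponential-like kernel
weights = [0.7, 0.3]  # non-negative, so the result is still a valid kernel

# "Linear layer only" NKN evaluation.
nkn_eval(x, y) = sum(w * k(x, y) for (w, k) in zip(weights, primitive_kernels))

# Direct linear combination of kernels, for comparison.
lincomb_eval(x, y) = weights[1] * primitive_kernels[1](x, y) +
                     weights[2] * primitive_kernels[2](x, y)

x1, x2 = randn(3), randn(3)
nkn_eval(x1, x2) ≈ lincomb_eval(x1, x2)  # true
```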
This idea comes from the fact that a GP is equivalent to a neural network layer with infinitely many neurons (some constraints are needed on the weights of this layer); this was pointed out by Neal in 1994, and a proof can be found here in Section 2. Give me some time, I will try to provide you with an example this week :)
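For reference, the result being alluded to can be stated roughly as follows (a sketch in standard notation, not taken from the linked proof): a one-hidden-layer network with i.i.d. parameters converges to a GP as the width grows, provided the output weights are scaled with the width, which is the constraint mentioned above.

```latex
% One-hidden-layer network with H hidden units and i.i.d. parameters,
% v_j ~ N(0, \sigma_v^2 / H),  b ~ N(0, \sigma_b^2),  u_j i.i.d.:
f(x) = b + \sum_{j=1}^{H} v_j \, h(x; u_j)

% As H \to \infty, the central limit theorem gives
f \sim \mathcal{GP}(0, k), \qquad
k(x, x') = \sigma_b^2 + \sigma_v^2 \, \mathbb{E}_{u}\bigl[ h(x; u) \, h(x'; u) \bigr].
```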
Good points. I wonder whether there are any performance implications associated with this though... hmmm.
Ah, I see -- this line of work is slightly different from the Deep GP or variational GP stuff, so I'll be interested to see what you come up with :) It's not clear to me how this will play with the usual Flux way of doing things, in particular how it plays with distributions over functions rather than the deterministic objects that Flux works with.
Hi @willtebbutt, I just finished some initial work on the Gaussian process layer we discussed last week; the implementation can be found in this notebook on this branch. I strongly suggest you run this notebook yourself.
Ah, I see. Looks like a nice API to aim for -- it definitely needs some way to perform inference though, as you're currently just sampling from the prior each time the function is evaluated. As regards testing with finite differencing -- you just have to be really careful to use exactly the same seed each time you evaluate the function. In particular, you should deterministically set the seed inside the function. I've added a Stheno.jl version of this proposal here for reference -- the point being that Stheno.jl can do all of this stuff with minimal modification.
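A minimal sketch of the seeding trick described above (nothing Stheno-specific: the toy objective, its dimension, and the step size are made up; only Julia's Random standard library is assumed):

```julia
# Sketch: make a stochastic objective deterministic in its argument so that
# finite differences are meaningful. The objective itself is illustrative.
using Random

function objective(θ)
    Random.seed!(1234)                 # fix the seed *inside* the function
    ε = randn(length(θ))               # the same draws are used on every call
    return sum(abs2, θ .* ε)
end

θ = randn(5)
objective(θ) == objective(θ)           # true: repeated evaluation is reproducible

# Central finite difference in the first coordinate, now well defined:
h = 1e-6
e1 = [1.0, 0.0, 0.0, 0.0, 0.0]
fd1 = (objective(θ .+ h .* e1) - objective(θ .- h .* e1)) / (2h)
```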
Excellent, it's great to know that. Based on our previous discussion, I think we both agree that this Flux integration should include:
AFAIK, these functionalities aren't available in the Julia community. In Python, GPyTorch supports the first one, and TensorFlow recently added support for the last one in its extension (Pyro may support both). I'd like to have these APIs integrated in Stheno.
Excellent. Let's tackle point number 1 first then, as I think it's likely the most straightforward, in the sense that there's no real integration to be done; we just need example code and good documentation. Do you agree / what kinds of resources do you think would be helpful to address this?
I think reproducing some of the experiments from the deep kernel learning paper is a good starting point; GPyTorch also uses it as a demo. I will be working on this over the next few days.
Sounds good. Probably best to start with a small toy dataset where exact inference is tractable. You could also do something with pseudo-points quite straightforwardly -- see Stheno's pseudo-point functionality.
Hi @willtebbutt, I have implemented a simple step-function fitting example here; it uses a feedforward neural network plus a GP with an ARD kernel. I also wrote a binary classification example that uses …; I noticed …
I have run into a problem when trying to make Stheno's outputs have element type Float32:

```julia
using Stheno

X = rand(Float32, 4, 10);
y = rand(Float32, 10);
l = rand(Float32, 4);
σ² = rand(Float32);

kernel = σ² * stretch(EQ(), 1.0f0 ./ l);
K_matrix = pw(kernel, ColVecs(X)) |> eltype  # Float32

gp = GP(0.0f0, kernel, GPC());
noisy_prior = gp(ColVecs(X), 0.01f0);

rand(noisy_prior) |> eltype       # Float64 !!!
logpdf(noisy_prior, y) |> eltype  # Float64 !!!
```

Even though I convert all the input parameters to Float32, the results of `rand` and `logpdf` still come back as Float64.
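A generic way to narrow such a promotion down (a debugging sketch reusing the variables above; `@code_warntype` and the `eltype` checks are standard Julia tooling, and the remark about a hard-coded constant is a guess about a common culprit, not a confirmed diagnosis of Stheno):

```julia
# Look for Float64 appearing in the inferred types: that is where the widening
# happens. A hard-coded Float64 constant (e.g. log(2π) in a normal log-density,
# or a Float64 jitter) is a common culprit in cases like this -- this is a
# guess, not a confirmed diagnosis.
using InteractiveUtils  # for @code_warntype outside the REPL

@code_warntype logpdf(noisy_prior, y)
@code_warntype rand(noisy_prior)

# Sanity check that everything under user control really is Float32:
eltype(X), eltype(y), eltype(l), typeof(σ²)
```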
The example looks really good, @HamletWantToCode. I will take a proper look once we're satisfied that the type-stability issues have been resolved. I'm creating a new …
Right, I would say that the first item has been more-or-less completed. Are you up for getting on with adding the Neural-Kernel-Network kernel, @HamletWantToCode?
Yeah, here are my ideas about the implementation of the Neural-Kernel-Network kernel:
This is an example implementation of the parameter redistribution function; hope I have made it clear:

```julia
# Writes a flat parameter vector `xs` back into the model's parameter arrays,
# in the order given by `params(model)` (Flux's parameter collection).
function dispatch!(model, xs::AbstractVector)
    loc = 1
    for p in params(model)
        lp = length(p)
        x = reshape(xs[loc:loc+lp-1], size(p))
        copyto!(p, x)
        loc += lp
    end
    model
end
```

What do you think? @willtebbutt
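A small usage sketch for `dispatch!` (hypothetical: the model, the flattening step, and the fake update below are made up for illustration; `params` is Flux's parameter collection, and newer Flux versions also provide `Flux.destructure` for a similar flatten/rebuild round trip):

```julia
# Usage sketch for dispatch! above; the model and the "external update" are
# purely illustrative.
using Flux

model = Chain(Dense(4, 3, relu), Dense(3, 1))

# Flatten the current parameters into one long vector (the inverse operation).
xs = vcat([vec(copy(p)) for p in params(model)]...)

# Pretend an external optimiser produced new values, then write them back:
xs .+= 0.01 .* randn(length(xs))
dispatch!(model, xs)
```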
I think this plan sounds very reasonable.
Maybe open a PR with your plan from above and we can discuss further on that?
Again, didn't mean to close...
I also considered using it, but there may be some performance-related problems; I'll write something first and then we can discuss it explicitly.
I'll open a PR once I finish it :)
Great job with the Neural Kernel Network implementation @HamletWantToCode. I noticed that none of the activation-function layers are currently implemented. Would you be up for adding the basics quickly before we move on to the third item on the list?
The reasons I didn't include activation functions in the PR are:
Personally, I'd like to include the activation functions once more of them have been found and proven to be useful.
Fair enough. I'm happy not to worry about them for the time being -- as you say, we can always add them later if the need arises.
Relates to this issue.
A MWE can be found in the examples directory on this branch. It contains a basic demo of composing a Flux model with a Stheno model (in the literal sense of composition). The examples are all packaged nicely in a project, so it should be straightforward to get everything up and running.
The questions now are:
@HamletWantToCode what do you think?