AbstractModel type & neural kernel network #94

Closed

Conversation

@HamletWantToCode
Contributor

HamletWantToCode commented Mar 6, 2020

Overview

This PR adds a neural kernel network and some additional functionality; it is an enhanced version of PR #92. While adding the neural kernel network, this PR is intended to preserve Stheno's current APIs, and it also gives the 2nd & 3rd points we discussed here a try.

A quick summary of the new functionality:

  1. abstract_model.jl:
    • AbstractModel: base type for all parametric types
    • parameters: function used to extract the parameters of a model into a 1-D array
    • dispatch!: function used to redistribute parameters back to the model
    • extract_gradient: function used to extract gradients from the NamedTuple returned by Zygote.jl
  2. neural_network/basic.jl:
    • LinearLayer: linear transformation with positive weights
    • ProductLayer: element-wise kernel multiplication
    • Chain: chains linear and product layers together to build a sum-product network
  3. gp/neural_kernel_network.jl:
    • Primitive: kernel container
    • Neural_Kernel_Network: the neural kernel network type

Usage

using Stheno
using Zygote

logl = log(0.4)
logσ² = log(1.3)

## kernel without constraints on its parameters
# k = σ² * stretch(Matern52(), 1 / l)
## kernel with constraints on its parameters
k = scale(stretch(Matern52(), logl, x->exp(-x)), logσ², x->exp(x))

gp = GP(k, GPC())

# extract parameters from GP model
ps = parameters(gp)

# assign new parameters back to GP model
new_ps = ps .+ randn(length(ps))
dispatch!(gp, new_ps)

# generate synthetic data from the GP
const x = randn(100)
σ²_n = 0.05                 # observation noise variance
gp_x = gp(x, σ²_n)          # finite-dimensional projection of the GP at x
const y = rand(gp_x)

# compute gradients w.r.t. the GP's parameters and flatten them
gs, = gradient(f->logpdf(f(x, σ²_n), y), gp)
grads = extract_gradient(gp, gs)
@assert length(grads) == length(ps)

An example of the neural kernel network can be found here.
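For the record, here is a rough, hypothetical sketch of how the pieces listed above might fit together; the constructor signatures (number of kernels, layer widths) are illustrative guesses rather than the PR's actual code, so please refer to the linked example for the real API.

using Stheno

# Primitive wraps the base kernels that the network combines (signature assumed here)
primitives = Primitive(EQ(), Matern52(), RQ(1.0))

# sum-product network: LinearLayer mixes kernels with positive weights,
# ProductLayer multiplies them element-wise (layer sizes are made up)
nn = Chain(LinearLayer(3, 4), ProductLayer(2), LinearLayer(2, 1))

# the resulting kernel is then used like any other Stheno kernel
nkn = Neural_Kernel_Network(primitives, nn)
gp = GP(nkn, GPC())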

To be discussed

  1. Necessity of adding AbstractModel: adding this type introduces a tree structure for our models, which makes it easy to extract and redistribute the parameters. The parameters & dispatch! functions are built on top of it, and together they let us use both Flux's stochastic optimizers and Optim.jl's traditional optimizers (see the sketch after this list). When dealing with realistic problems our models may contain a large number of parameters, and manually writing unpack and repack functions each time quickly becomes annoying.
  2. For the neural kernel network, use Flux's NN vs. develop our own: since the types of layers that can be used in a neural kernel network are limited, I suggest maintaining a minimal neural network module specifically for the neural kernel network feature.
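To make the first point concrete, here is a rough sketch of how parameters / dispatch! / extract_gradient could be hooked up to Optim.jl's BFGS (continuing the Usage example above; nlml and nlml_grad! are illustrative names, not part of this PR):

using Optim

function nlml(θ)
    dispatch!(gp, θ)                        # push the flat parameter vector back into the model
    return -logpdf(gp(x, σ²_n), y)          # negative log marginal likelihood
end

function nlml_grad!(g, θ)
    dispatch!(gp, θ)
    gs, = gradient(f -> -logpdf(f(x, σ²_n), y), gp)
    g .= extract_gradient(gp, gs)           # flatten the NamedTuple gradient
end

result = optimize(nlml, nlml_grad!, parameters(gp), BFGS())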

All tests pass on my laptop (macOS, Julia 1.3.0).

@codecov

codecov bot commented Mar 7, 2020

Codecov Report

Merging #94 into master will decrease coverage by 6.27%.
The diff coverage is 59.61%.


@@            Coverage Diff             @@
##           master      #94      +/-   ##
==========================================
- Coverage   88.55%   82.27%   -6.28%     
==========================================
  Files          24       27       +3     
  Lines         690      801     +111     
==========================================
+ Hits          611      659      +48     
- Misses         79      142      +63
Impacted Files                      Coverage Δ
src/composite/compose.jl            61.53% <ø> (ø) ⬆️
src/Stheno.jl                       100% <ø> (ø) ⬆️
src/composite/composite_gp.jl       92.59% <0%> (-7.41%) ⬇️
src/util/parameter_handler.jl       0% <0%> (ø)
src/gp/gp.jl                        88.23% <0%> (-11.77%) ⬇️
src/gp/mean.jl                      85.71% <60%> (-14.29%) ⬇️
src/gp/neural_kernel_network.jl     61.9% <61.9%> (ø)
src/neural_network/basic.jl         76.47% <76.47%> (ø)
src/gp/kernel.jl                    92.89% <84.21%> (-6.44%) ⬇️
... and 2 more

Continue to review full report at Codecov.

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 0c2299b...38c5786. Read the comment docs.

@HamletWantToCode
Contributor Author

Hi @willtebbutt, just wondering what you think of this PR. If we are comfortable with this design, I'll move on.

@willtebbutt
Member

willtebbutt commented Mar 10, 2020

Hi @HamletWantToCode. Just getting around to looking at this now after being busy last week.

Firstly, thanks for putting so much work into this contribution.

Before I dive into reviewing specifics, I wonder if you could explain why Flux's @functor isn't sufficient for all of the parameter handling stuff that you've added?

See e.g. here (the Layer Helpers bit) and here.

@HamletWantToCode
Contributor Author

Hi @willtebbutt, thanks for the reply; I hope I'm not disrupting you.

Here is what I think:

Since Flux is built for neural networks, its @functor:

  1. Requires the trainable parameters inside a struct to be of type AbstractVecOrMat, while currently many kernel parameters in Stheno are scalars (e.g. PerEQ's l & RQ's α). Therefore, using @functor would require us to change some of Stheno's current implementation.

  2. Doesn't flatten parameters into an AbstractVector; instead, it extracts parameters into a Zygote.IdSet, so one still needs to write additional code to do the flattening and assemble the gradients. This inconvenience stands out when we use optimizers from Optim.jl, since they require all parameters to be packed into a 1-D array. Typically in Gaussian process modelling the dataset isn't that large, so we can use the BFGS method from Optim.jl, which usually has better convergence properties than, e.g., the ADAM optimizer.

The newly added parameters function solves these problems:

  1. It allows the parameters inside a struct to be scalars, vectors or matrices, so we don't need to change Stheno's internal implementation. We only need to implement the child & get_iparam interface for Stheno's current GPs and kernels, which in my opinion won't have any side effects. (Note: in this PR the type of the kernel parameters is temporarily changed to AbstractVecOrMat; this has nothing to do with the parameters method, it is only due to the current requirements of the dispatch! method.)

  2. It extracts all parameters into a 1-D array, which allows us to use Optim.jl seamlessly, and Flux's optimizers also work in this case (see the sketch below).
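A rough illustration of the difference (not code from this PR; Dense is just a stand-in Flux layer, and gp is the GP from the Usage example above):

using Flux, Stheno

layer = Dense(2, 3)
ps_flux = Flux.params(layer)   # Params: a collection of separate arrays (weight and bias)

# with this PR's interface everything is flattened into one 1-D array instead
θ = parameters(gp)             # a single flat vector, ready for Optim.jl
dispatch!(gp, θ .+ 0.1)        # ...and can be written back in one call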

@willtebbutt
Member

No problem at all. I just took longer than I would have liked to take a look at the work, and didn't want to give you the impression I'm not enthusiastic about what you're doing (I'm very excited about it!)

Your explanation seems very reasonable to me, and I think that I agree with the overall principle.

A couple of initial comments:

  • I think it would be better to avoid adding a new AbstractModel type entirely. We could have a system that just requires a model to implement an interface that we specify, rather than imposing the constraint that a model has to subtype a particular abstract type. This means that code that was written without this kind of parameter handling in mind can be extended to handle it without having to be re-written. Am I missing anything that means we really need a new type here?
  • Furthermore, could we modify the default behaviour slightly? In particular, it would be really nice if types that don't have any children or parameters don't have to have anything implemented. So instead of the fallback implementations of child and iparams erroring, could they return empty tuples?

@HamletWantToCode
Contributor Author

I totally agree with your idea.

avoid adding a new AbstractModel type entirely

Thanks for pointing this out. I reread my code, and AbstractModel does seem to be redundant; parameters, dispatch! & extract_gradient still work after getting rid of it. (I will verify this later.)

instead of the fallback implementations of child and iparams erroring, could they return empty tuples?

Yes, they currently just return empty tuples; you can run the following code to check:

eq = EQ()              # the EQ kernel doesn't have any parameters
Stheno.child(eq)       # returns (), an empty tuple
Stheno.get_iparam(eq)  # returns a 0-element Array{Union{},1}

@willtebbutt
Member

Thanks for pointing this out. I reread my code, and AbstractModel does seem to be redundant; parameters, dispatch! & extract_gradient still work after getting rid of it. (I will verify this later.)

Great!

Yes, they currently just return empty tuples; you can run the following code to check:

Ah, yes, I totally believe that's a thing. I would have preferred not to have to implement anything for EQ at all though. In the current implementation, EQ implements

get_iparam(::EQ) = Union{}[]
child(::EQ) = ()

It shouldn't be necessary to write this code at all!
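Something like the following generic fallbacks (just a sketch) would make that unnecessary, so parameter-free types like EQ need no methods at all:

get_iparam(::Any) = Union{}[]   # default: no trainable parameters
child(::Any) = ()               # default: no children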

@HamletWantToCode
Contributor Author

Hi @willtebbutt:

I think it would be better to avoid adding a new AbstractModel type entirely.

The AbstractModel type has been removed.

I would have preferred not to have to implement anything for EQ at all though.

Now you don't need to implement the child and get_iparam methods for EQ-like kernels.

I also added tests for the NKN, as we discussed in PR #95.

@willtebbutt
Member

@HamletWantToCode what are your plans for this PR now?

@HamletWantToCode
Contributor Author

As in our previous discussion, I still think it's better for Stheno to have a parameters function, and I'd like to continue modifying these APIs until they can be merged into master.

parameters should work for all parametric types in Stheno. It can currently handle everything inside the gp folder, and I was considering making it also work for the things in the composite folder. I need some time to figure out how composite GPs work in Stheno; also, I saw this here:

combine GPPP and pseudo-point approximations to do interesting things. This is a WIP -- it doesn't work properly yet.

I am also working on my thesis these days, and would like to use Stheno's composite GPs in my work :)

@willtebbutt
Member

Sorry for the slow response -- I've been trying to figure out how I want things to proceed.

On the one hand, I agree that Flux's API isn't completely ideal for Stheno.jl, in particular the need to always make parameters heap-allocated is very frustrating.

On the other I'm very reluctant to roll our own interface that's specific to this package because one of the main selling points of the package is its interoperability with other packages in the ecosystem. While I'm sure something could be devised that could be made to work with other packages, it's unclear how straightforwardly this would apply in general. Moreover, KernelFunctions.jl is being designed to play nicely with the Flux API.

On the whole, I think it's best that we go with the Flux API because, from a performance perspective, it's not too bad.

That being said, there's nothing to stop us creating a couple of types that help us with transforming input variables, but that themselves play nicely with Flux. We could open a separate issue to discuss that.

I am also working on my thesis these days, and would like to use Stheno's composite GPs in my work :)

Fantastic! Do you specifically need the pseudo-point stuff to work?

@HamletWantToCode
Contributor Author

Hi @willtebbutt, it took me some time to digest your comment ;)

I totally agree with your point about emphasizing the "interoperability with other packages in the ecosystem", and I think it would also be good to determine which packages should be included in that ecosystem; this will help us figure out how to modify the current API and make it clear what functionality should be added in the future.

Also, I have some questions.

I agree that Flux's API isn't completely ideal for Stheno.jl, in particular the need to always make parameters heap-allocated is very frustrating.

Could you explain how this works? Someone told me that @code_warntype can be used to determine whether the parameters in a struct are heap-allocated or not (when the output of @code_warntype contains information highlighted in red, it means those variables are heap-allocated). I used it on Flux's layers and params functions, and found that most of them are stack-allocated.

Moreover, KernelFunctions.jl is being designed to play nicely with the Flux API.

I played with KernelFunctions a little bit; it's good to see a package focusing on kernels. However, it seems that currently KernelFunctions and Stheno can't play with each other (though you've introduced BaseKernel): kernels in KernelFunctions.jl don't accept ColVecs, and Stheno's GP doesn't accept those kernels.

there's nothing to stop us creating a couple of types that help us with transforming input variables, but that themselves play nicely with Flux

While Stheno and KernelFunctions have nice APIs for input transformations (e.g. Stretched, Scaled), I think it's also important for us to pay attention to kernel hyperparameters, such as the scaling factor and the length scale, whose values more or less need to be constrained during optimization. Therefore, besides types performing input transformations, we may also need types that impose constraints on hyperparameters.
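For example, something along these lines (Positive and value are hypothetical names, just to illustrate the idea): store the unconstrained value and expose the constrained one through a bijection.

# hypothetical constraint type, not part of Stheno or KernelFunctions
struct Positive{T<:Real}
    unconstrained::T                      # optimised freely over the reals
end
value(p::Positive) = exp(p.unconstrained) # constrained value is always > 0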

Do you specifically need the pseudo-point stuff to work?

Since my dataset isn't that large (2K~5K points), I think it's OK to use an exact GP. I may use additive GP models and multi-output GP models, and I would like Stheno to work in these cases.
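For reference, the kind of composite model I have in mind (a sketch based on my understanding of Stheno's composite GP machinery):

using Stheno

gpc = GPC()
f1 = GP(EQ(), gpc)
f2 = GP(stretch(Matern52(), 0.5), gpc)
f3 = f1 + f2          # additive GP built from two component GPs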

@HamletWantToCode
Contributor Author

HamletWantToCode commented Apr 10, 2020

Let's close this PR for now; we can always come back to it when we need to.

@willtebbutt
Member

Could you explain how this works? Someone told me that @code_warntype can be used to determine whether the parameters in a struct are heap-allocated or not (when the output of @code_warntype contains information highlighted in red, it means those variables are heap-allocated). I used it on Flux's layers and params functions, and found that most of them are stack-allocated.

This isn't correct. @code_warntype shows you what types the compiler has inferred, and highlights when the compiler has not been able to infer a concrete type for a particular variable. That is, it can help you identify type instabilities, but it doesn't tell you anything more about stack vs heap. In our case I was just referring to the fact that you have to make all scalar values into length-1 vectors to make them work with Flux's parameter handling API. This is okay from a performance perspective because it tends not to be a bottleneck. It is, however, really ugly.
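To illustrate (a toy sketch; MyKernel is made up): to make a scalar hyperparameter visible to Flux's parameter handling, it ends up stored as a length-1 vector.

using Flux

# the length scale is stored as a 1-element vector purely so that
# Flux.params can collect it, even though it is conceptually a scalar
struct MyKernel{T}
    logl::Vector{T}   # [log(l)] rather than a plain scalar log(l)
end
Flux.@functor MyKernel

k = MyKernel([log(0.4)])
Flux.params(k)        # collects the length-1 vector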

I played with KernelFunctions a little bit; it's good to see a package focusing on kernels. However, it seems that currently KernelFunctions and Stheno can't play with each other (though you've introduced BaseKernel): kernels in KernelFunctions.jl don't accept ColVecs, and Stheno's GP doesn't accept those kernels.

Yeah, this is currently the case. I've been holding off on switching Stheno over to KernelFunctions until it's at a level of maturity that is equivalent to Stheno, and we've settled on an API for KernelFunctions. I suspect a couple of things will wind up being named slightly differently to Stheno's conventions, but I'm anticipating it being broadly similar eventually.
