Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

DeepOnet Multiple output #9

Closed
KirillZubov opened this issue Jun 26, 2024 · 10 comments · Fixed by #15
Closed

DeepOnet Multiple output #9

KirillZubov opened this issue Jun 26, 2024 · 10 comments · Fixed by #15
Assignees

Comments

@KirillZubov
Copy link
Member

u = ones(Float32, 10, 10, 5)
v = ones(Float32, 1, 10, 5)
deeponet = DeepONet(; branch = (10, 10, 10), trunk = (1, 10, 10))
ps, st = Lux.setup(Random.default_rng(), deeponet)

y, st_ = deeponet((u, v), ps, st)
@test size(y) == (10, 5)

Now the output only for one feature solution with dropping one dim (1, 10, 5) to (10, 5). It is not dim for multiple output.
It need additional dim (output_dim, 10, 5)
My suggestion and easy way is add linear layer as option, how it has done here src/neural_operators.jl SciML/NeuralPDE.jl@30a5134#diff-3623c72624d6f36cc808e510588bfe6ed4872162fdce6e12b69408856c038c5d

deeponet = DeepONet(; branch = (10, 10, 10), trunk = (1, 10, 10), liner =(10, output_dim))
@avik-pal
Copy link
Member

With linear do you expect linear(y) where y is the output of the dot product, aka dropdims(sum(...)) or you want it applied on sum(...) without collapsing the first dim?

@KirillZubov
Copy link
Member Author

We need sum only for embedding dims - inner feature representation, which is the dot product branch and trunk for DeepONet.
If we use the linear at last layer, we don't need dropping dim with embedding because linear transformation do it.

@avik-pal

embbeding_size = 10

branch = Lux.Chain(
    Lux.Dense(3, 20, Lux.tanh_fast),
    Lux.Dense(20, 20, Lux.tanh_fast),
    Lux.Dense(20, embbeding_size))

a = rand(3, 50,40, 1)
θ, st = Lux.setup(Random.default_rng(), branch)
b_out, st = branch(a, θ, st)

trunk = Lux.Chain(
    Lux.Dense(2, 20, Lux.tanh_fast),
    Lux.Dense(20, 20, Lux.tanh_fast),
    Lux.Dense(20, embbeding_size, Lux.tanh_fast))

a = rand(2, 1, 1, 30)
θ, st = Lux.setup(Random.default_rng(), trunk)
t_out, st = branch(a, θ, st)

out_ = b_out .* t_out;

# julia > size(out_)
# (embbeding_size, 50, 40, 30)

# here we sum embbeding inner representation 
out = sum(out_ , dims = 1)
# julia > size(out)
# (1, 50, 40, 30)


output_size = 3
linear = Lux.Dense(10, output_size)
θ, st = Lux.setup(Random.default_rng(), linear)
l_out, st = linear(out_, θ, st)

#julia > size(l_out)
#(output_size, 50, 40, 30)

ps Option for the layer at the end can be used for many other purposes (for example Fourier feature embeddings an many other) not only for the multi output that I described here as issue. So in generally I think it will be good thing for interface.

@ayushinav
Copy link
Contributor

Hi @KirillZubov,
I'm not sure if the above implementation of DeepONet is completely correct. The branch net takes the input $u$ at different locations, let's say $\vec{x}$, so $u(\vec{x})$ will be defined at multiple location points. Let's assume for now that we have only single output, i.e., $u(\vec{x})$ is a scalar.

In case of multiple outputs, $U = u_1, u_2, u_3,..., u_O$, we can learn the approximation for each of the functions independently from each other, much like what happens in NeuralPDE.jl where cases of multiple functions are learned independently. We pass a vector of chains instead of one.

The input of the branch net is then a vector. For a batch size of nb and m sampled points, the input would be m x nb. For $x \in \R^d$, the input to trunk net would be d x N x nb if we have N points to evaluate the function, the points at which we put sensors.

The output of branch net would then be embedding_size x nb and that of trunk net would be embedding_size x N x nb. We'd then want to do a dot product along the first axis or can approach the same using a matmul operation. The DeepONet would then output the function evaluations at the sensor points, that is, the output would be of the size embedding_size x nb.

This is how I understand the DeepONets work. If this is correct, the out or out_ in the above code are not correct.

@KirillZubov
Copy link
Member Author

KirillZubov commented Jun 28, 2024

I've quite fast written this script above just only for show how to use linear layer. My point here is not about how to calculate the vector inside. It is about support multiple output for DeepOnet. If you think that isn't need here - ok.

Yes we can use batch of NNs for each output independently but also can use one NN for all output. Here example https://github.com/SciML/NeuralPDE.jl/blob/a57b966df8e172f7e0eebf9ec0c2d94b6eaf596e/test/NNODE_tests.jl#L137-L153

DeepOnet is not about what size vectors. It about how to learn operator from sensor and locations information with branch-trunk architecture, other is flexible. It would be better to strive to make an implementation that could work with the widest possible range of extensions and not be strictly limited.
https://scholar.google.com/scholar?hl=en&as_sdt=0%2C5&q=deeponet&btnG=

@KirillZubov
Copy link
Member Author

I can implement this feature myself by the way.

@ayushinav
Copy link
Contributor

ayushinav commented Jun 28, 2024

I see. The linear layer wasn't there in the vanilla DeepONet so I wasn't much aware about it. Vector sizes were just a quick way to check but I got your point.

embbeding_size = 10
batch_size = 8

branch = Lux.Chain(
    Lux.Dense(3, 20, Lux.tanh_fast),
    Lux.Dense(20, 20, Lux.tanh_fast),
    Lux.Dense(20, embbeding_size))

trunk = Lux.Chain(
    Lux.Dense(6, 20, Lux.tanh_fast),
    Lux.Dense(20, 20, Lux.tanh_fast),
    Lux.Dense(20, embbeding_size, Lux.tanh_fast))

u_dims = 4 # 4-dimensional function points
u = rand(3, u_dims, batch_size) # rand(3, 50,40, 1)
θ, st = Lux.setup(Random.default_rng(), branch)
b_out, st = branch(u, θ, st)

size(b_out)
# (10, 4, 8)
# embedding_size x input_dims x batch_size

N = 16 # sensor points
x = rand(6, N, batch_size) # 6-dimensional points
θ, st = Lux.setup(Random.default_rng(), trunk)
t_out, st = branch(x, θ, st)

size(t_out)
# (10, 16, 8)
# embedding_size x sensor_dims x batch_size

out_ = batched_mul(permutedims(b_out, [2,1,3]), t_out)
size(out_)
# (4, 16, 8)
# input_dims x sensor_dims x batch_size

We can probably use the linear layer after this, but if it's not given by the user, we can just do dropdims along the dimensions 1 or 2 if the sizes are 1.

@KirillZubov
Copy link
Member Author

Why do you want use permutedims(b_out, [2,1,3]) it is limited to 3dim output only. It is already fine implement by @avik-pal with https://github.com/LuxDL/LuxNeuralOperators.jl/blob/44c39bb6d39443fb4f1a80bcbd8a28e643b63397/src/deeponet.jl#L130
But all another is fine, additional last layer(better is call it like additional cause it can be any layer not only linear ) as optional. I already implemented it in my tiny DeepONet implementation.
SciML/NeuralPDE.jl@30a5134#diff-3623c72624d6f36cc808e510588bfe6ed4872162fdce6e12b69408856c038c5d

@ayushinav
Copy link
Contributor

Hi,
Maybe I'm missing something out but I don't completely agree with the current implementation (though my above implementation wasn't correct either).
For an eg. case, the following errors

u = rand(Float32, 64, 2, 5) # m x u_dims x nb
y = rand(Float32, 6, 40, 5) # ndims x N x nb


branch = Lux.Chain(
    Lux.Dense(64, 32, Lux.tanh_fast),
    Lux.Dense(32, 32, Lux.tanh_fast),
    Lux.Dense(32, embbeding_size))

trunk = Lux.Chain(
    Lux.Dense(6, 20, Lux.tanh_fast),
    Lux.Dense(20, 20, Lux.tanh_fast),
    Lux.Dense(20, embbeding_size, Lux.tanh_fast))

model = DeepONet(branch, trunk)
ps, st = Lux.setup(Random.default_rng(), model)

first(model((u, y), ps, st))

This works when the broadcasting dimensions are same, which as we see here, won't always be true.

I think I finally got the high dimensional case. We have a function u we want to approximate that outputs a tensor of shape u_size given an input tensor y of shape y_size.

Let's have m sensor points and N points where we want to evaluate it. For a batch size of nb, we have

### branch
u                      =>  b
[m x u_size... x nb]   =>  [p x u_size x nb]

### trunk
y                      =>  t
[N x y_size... x nb]   =>  [N x p x nb]

We leave it to the user to determine how to transform y to a vector of length p through some network. The final output does not depend on the size of y.

t_ = permutedims(t, [2, 1, 3]) # p x N x nb
t_dot = reshape(t_, size(t_,1), ones(eltype(u_size), u_size), size(t_)[2:end])
# p x (1,1,1,...) x N x nb

b_dot = reshape(b, size(b,1), u_size, 1, size(b)[end])
# p x u_size x N x nb

dropdims(sum(t_dot .* p_dot; dims = 1), 1)
# u_size x N x nb

There is also a restriction on the dimensions of the outputs from the trunk, unless it implies that the user should take care of the dimensions of the trunk. While it is reasonable, because we can have any representation of the latent state where we do the dot product, but then the reshape at
https://github.com/LuxDL/LuxNeuralOperators.jl/blob/main/src/deeponet.jl#L129
should not be required, or one of the argchecks at
https://github.com/LuxDL/LuxNeuralOperators.jl/blob/main/src/deeponet.jl#L124
should not have been there. Also, maybe me but this way did not seem much intuitive to me as well.

@KirillZubov
Copy link
Member Author

ok, got it

@avik-pal
Copy link
Member

avik-pal commented Jul 3, 2024

u = rand(Float32, 64, 2, 5) # m x u_dims x nb
y = rand(Float32, 6, 40, 5) # ndims x N x nb

To solve this do it on a case by case basis:

  • If broadcasting is possible then use broadcast and reduce, that should be more performant than doing a permutedims (it forces a copy which is expensive and doubles the memory footprint)

  • If not then permutedims and batched_mul.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
3 participants