
Layer Normalization #4

Open
Aktsvigun opened this issue Jul 8, 2021 · 3 comments

@Aktsvigun

Hi,
thanks for a great implementation!

I wanted to clarify one thing that doesn't match the code given in the paper itself. In your code you pre-normalize the inputs, i.e. they are passed through LayerNorm before the FFT. The code presented in the paper, however, is:

class FNetEncoderBlock(nn.Module):
    fourier_layer: FourierTransformLayer
    ff_layer: FeedForwardLayer

    @nn.compact
    def __call__(self, x, deterministic):
        mixing_output = self.fourier_layer(x)
        x = nn.LayerNorm(1e-12, name="mixing_layer_norm")(x + mixing_output)
        feed_forward_output = self.ff_layer(x, deterministic)
        return nn.LayerNorm(
            1e-12, name="output_layer_norm")(x + feed_forward_output)

which to my eye applies normalization in the opposite order: LayerNorm comes after the residual sum around each sublayer (post-norm), not before the sublayer (pre-norm).
Am I mistaken, or is it indeed a bug?
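
For reference, here is a minimal sketch of the two orderings side by side. This is my own illustration, not code from either source; the epsilon value and the generic mixing/ff submodules are assumptions:

import flax.linen as nn

class PostNormBlock(nn.Module):
    # Paper-style ordering: LayerNorm applied AFTER each residual sum.
    mixing: nn.Module
    ff: nn.Module

    @nn.compact
    def __call__(self, x, deterministic):
        # sublayer first, then residual add, then LayerNorm
        x = nn.LayerNorm(epsilon=1e-12)(x + self.mixing(x))
        return nn.LayerNorm(epsilon=1e-12)(x + self.ff(x, deterministic))

class PreNormBlock(nn.Module):
    # The ordering described above: LayerNorm applied BEFORE each sublayer.
    mixing: nn.Module
    ff: nn.Module

    @nn.compact
    def __call__(self, x, deterministic):
        # LayerNorm first, then sublayer, then residual add
        x = x + self.mixing(nn.LayerNorm(epsilon=1e-12)(x))
        return x + self.ff(nn.LayerNorm(epsilon=1e-12)(x), deterministic)

Both variants appear in the Transformer literature, but they are not interchangeable: with post-norm the residual stream itself is normalized, which matters in particular when loading checkpoints trained with the other ordering.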

@Aktsvigun
Author

In case the code above renders poorly, here is the image (listing A.5 in the paper):
[Screenshot: code listing A.5 from the paper]

@Aktsvigun
Author

A similar question concerns dropout in the FeedForward layer. You apply it twice, while in the paper it is applied only once, at the end:
[Screenshot: the paper's FeedForwardLayer code]
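
For illustration, a minimal sketch of a feed-forward sublayer with dropout applied only on the output, as described in the paper. This is my own sketch; the d_ff field, layer names, and GELU activation are assumptions, not the paper's exact listing:

import flax.linen as nn

class FeedForwardLayer(nn.Module):
    d_ff: int            # width of the intermediate projection (assumed name)
    dropout_rate: float

    @nn.compact
    def __call__(self, x, deterministic):
        d_model = x.shape[-1]
        h = nn.Dense(self.d_ff, name="intermediate")(x)
        h = nn.gelu(h)
        h = nn.Dense(d_model, name="output")(h)
        # Dropout exactly once, on the sublayer output.
        return nn.Dropout(rate=self.dropout_rate)(h, deterministic=deterministic)

Adding a second dropout after the intermediate Dense changes the effective regularization, so activations would differ statistically from the published checkpoints.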

@erksch

erksch commented Jul 25, 2021

@Aktsvigun you can check out our repo https://github.com/erksch/fnet-pytorch. We reimplemented the architecture precisely enough that we can even use the official checkpoints (converted from Jax to PyTorch).
