Hi, thanks for your repo: it helps a lot!
In the paper, the weight matrix is shared between the two embedding layers and the pre-softmax linear transformation:
"In our model, we share the same weight matrix between the two embedding layers and the pre-softmax
linear transformation, similar to [30]. " (Page 5, Chapter 3.4 Embeddings and Softmax)
Would it be correct to modify the following rows in transformer_model.py to something like this:
rows 32-33 -> self.src_embedding = self.trg_embedding = Embedding(src_vocab_size, model_dimension)
row 50 -> self.decoder_generator = DecoderGenerator(self.src_embedding.embeddings_table.weight)
row 221 -> def __init__(self, shared_embedding_weights):
row 224 -> self.linear = nn.Linear(shared_embedding_weights.size()[1], shared_embedding_weights.size()[0], bias=False)
del self.linear.weight
self.shared_embedding_weights = shared_embedding_weights
row 232 -> self.linear.weight = self.shared_embedding_weights
row 233 -> return self.log_softmax(self.linear(trg_representations_batch) * math.sqrt(self.shared_embedding_weights.size()[1]))
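For reference, here is a minimal, self-contained sketch of the weight-tying idea in PyTorch. The names (TiedEmbedding, TiedGenerator, shared_vocab_size) are placeholders, not the repo's actual classes, and in this sketch the sqrt(d_model) scaling is applied inside the embedding forward pass, following the quoted section of the paper, rather than after the pre-softmax projection. Sharing src_embedding and trg_embedding like this assumes the source and target sides use a joint vocabulary.

import math
import torch
import torch.nn as nn

class TiedEmbedding(nn.Module):
    # Token embedding scaled by sqrt(d_model), as described in Section 3.4 of the paper.
    def __init__(self, vocab_size, model_dimension):
        super().__init__()
        self.embeddings_table = nn.Embedding(vocab_size, model_dimension)
        self.model_dimension = model_dimension

    def forward(self, token_ids):
        return self.embeddings_table(token_ids) * math.sqrt(self.model_dimension)

class TiedGenerator(nn.Module):
    # Pre-softmax projection that reuses the embedding matrix as its weight
    # (nn.Linear stores weight as (out_features, in_features), which matches
    # the (vocab_size, model_dimension) shape of the embedding table).
    def __init__(self, shared_embedding_weights):
        super().__init__()
        vocab_size, model_dimension = shared_embedding_weights.shape
        self.linear = nn.Linear(model_dimension, vocab_size, bias=False)
        self.linear.weight = shared_embedding_weights  # tie: reuse the same nn.Parameter
        self.log_softmax = nn.LogSoftmax(dim=-1)

    def forward(self, trg_representations_batch):
        return self.log_softmax(self.linear(trg_representations_batch))

# Usage sketch: one embedding serves both src and trg (joint vocabulary assumed),
# and the generator reuses its weight matrix, so one parameter backs all three uses.
shared_vocab_size, model_dimension = 10000, 512
embedding = TiedEmbedding(shared_vocab_size, model_dimension)
generator = TiedGenerator(embedding.embeddings_table.weight)
assert generator.linear.weight is embedding.embeddings_table.weight

token_ids = torch.randint(0, shared_vocab_size, (2, 7))   # (batch, seq_len)
log_probs = generator(embedding(token_ids))                # (batch, seq_len, vocab_size)

Because all three uses point at the same nn.Parameter, gradients from both embedding lookups and the pre-softmax projection accumulate into one tensor, and the optimizer sees it only once.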