Hi, thanks for your repo: it helps a lot!
In the paper, the weight matrix is shared between the two embedding layers and the pre-softmax linear transformation:
"In our model, we share the same weight matrix between the two embedding layers and the pre-softmax
linear transformation, similar to [30]. " (Page 5, Chapter 3.4 Embeddings and Softmax)
Would it be correct to modify the following rows in transformer_model.py to something like this:
rows 32-33 -> self.src_embedding = self.trg_embedding = Embedding(src_vocab_size, model_dimension)
row 50 -> self.decoder_generator = DecoderGenerator(self.src_embedding.embeddings_table.weight)
row 221 -> def __init__(self, shared_embedding_weights):
row 224 -> self.linear = nn.Linear(shared_embedding_weights.size()[1], shared_embedding_weights.size()[0], bias=False)
del self.linear.weight
self.shared_embedding_weights = shared_embedding_weights
row 232 -> self.linear.weight = self.shared_embedding_weights
row 233 -> return self.log_softmax(self.linear(trg_representations_batch) * math.sqrt(self.shared_embedding_weights.size()[1]))
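For reference, here is a minimal, self-contained sketch of the weight-tying idea in PyTorch. The names (TiedEmbedding, TiedGenerator, shared_vocab_size) are placeholders, not the repo's actual classes, and in this sketch the sqrt(d_model) scaling is applied inside the embedding forward pass, following the quoted section of the paper, rather than after the pre-softmax projection. Sharing src_embedding and trg_embedding like this assumes the source and target sides use a joint vocabulary.

import math
import torch
import torch.nn as nn

class TiedEmbedding(nn.Module):
    # Token embedding scaled by sqrt(d_model), as described in Section 3.4 of the paper.
    def __init__(self, vocab_size, model_dimension):
        super().__init__()
        self.embeddings_table = nn.Embedding(vocab_size, model_dimension)
        self.model_dimension = model_dimension

    def forward(self, token_ids):
        return self.embeddings_table(token_ids) * math.sqrt(self.model_dimension)

class TiedGenerator(nn.Module):
    # Pre-softmax projection that reuses the embedding matrix as its weight
    # (nn.Linear stores weight as (out_features, in_features), which matches
    # the (vocab_size, model_dimension) shape of the embedding table).
    def __init__(self, shared_embedding_weights):
        super().__init__()
        vocab_size, model_dimension = shared_embedding_weights.shape
        self.linear = nn.Linear(model_dimension, vocab_size, bias=False)
        self.linear.weight = shared_embedding_weights  # tie: reuse the same nn.Parameter
        self.log_softmax = nn.LogSoftmax(dim=-1)

    def forward(self, trg_representations_batch):
        return self.log_softmax(self.linear(trg_representations_batch))

# Usage sketch: one embedding serves both src and trg (joint vocabulary assumed),
# and the generator reuses its weight matrix, so one parameter backs all three uses.
shared_vocab_size, model_dimension = 10000, 512
embedding = TiedEmbedding(shared_vocab_size, model_dimension)
generator = TiedGenerator(embedding.embeddings_table.weight)
assert generator.linear.weight is embedding.embeddings_table.weight

token_ids = torch.randint(0, shared_vocab_size, (2, 7))   # (batch, seq_len)
log_probs = generator(embedding(token_ids))                # (batch, seq_len, vocab_size)

Because all three uses point at the same nn.Parameter, gradients from both embedding lookups and the pre-softmax projection accumulate into one tensor, and the optimizer sees it only once.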