why attend over the <end> token? #3

homelifes · 2020-06-28T06:37:50Z

Hi @sgrvinod,
In the xe train function, the model is called like this:

predicted_sequences = model(source_sequences, target_sequences, source_sequence_lengths, target_sequence_lengths) # (N, max_target_sequence_pad_length_this_batch, vocab_size)

Here target_sequence_lengths still counts the <end> token, so the multi-head attention will also attend over the <end> token.

I think the lengths passed to the model should be target_sequence_lengths - 1:
predicted_sequences = model(source_sequences, target_sequences, source_sequence_lengths, target_sequence_lengths - 1) # (N, max_target_sequence_pad_length_this_batch, vocab_size)
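To make the difference concrete, here is a minimal pure-Python sketch of which query positions can see the <end> key under each choice of length. The helper `attention_mask` is hypothetical and not taken from the repository. It assumes the decoder self-attention masks keys at positions beyond the given length and, optionally, applies a causal (no-lookahead) mask. Whether the actual model combines the two masks this way is an assumption on my part.

```python
# Hypothetical helper, not from the repository: returns a boolean matrix
# where mask[i][j] is True if query position i may attend to key position j.
def attention_mask(length, pad_length, causal=True):
    mask = []
    for i in range(pad_length):
        row = []
        for j in range(pad_length):
            allowed = j < length              # key j lies inside the given length
            if causal:
                allowed = allowed and j <= i  # no attending to future positions
            row.append(allowed)
        mask.append(row)
    return mask


# Toy padded target sequence (illustrative only)
tokens = ["<start>", "a", "b", "<end>", "<pad>", "<pad>"]
length = 4  # counts <start> and <end>, like target_sequence_lengths above
end_pos = tokens.index("<end>")

with_end = attention_mask(length, len(tokens))
without_end = attention_mask(length - 1, len(tokens))

# Query positions that are allowed to attend to the <end> key
queries_seeing_end = [i for i, row in enumerate(with_end) if row[end_pos]]
queries_seeing_end_minus_one = [i for i, row in enumerate(without_end) if row[end_pos]]

print(queries_seeing_end)            # [3, 4, 5]
print(queries_seeing_end_minus_one)  # []
```

Under these assumptions, passing the full length lets only the positions at or after <end> attend to it, while passing length - 1 hides the <end> key from every query. With `causal=False` every query position would see <end> when the full length is used, so the effect depends on whether a causal mask is applied.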

Please clarify
