
Are EOS tokens masked during pre-training? If so, how does FIM mode know how to connect to the text_after? #163

@zhzhangcc

Description

As far as we can tell from the fine-tuning scripts and from other issues, because PAD = EOS in this model and all PAD tokens are masked out of the loss, the model is never trained to predict the EOS token.
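For context, here is a minimal sketch (PyTorch, with placeholder token ids) of the kind of label masking we understand the fine-tuning scripts to be doing. Because PAD and EOS share the same id, masking out padding also removes every EOS position from the loss:

```python
import torch

# Placeholder ids for illustration; the key point is that pad_token_id == eos_token_id.
PAD_ID = EOS_ID = 2
IGNORE_INDEX = -100  # positions with this label are ignored by the loss

def build_labels(input_ids: torch.Tensor) -> torch.Tensor:
    """Copy input_ids into labels and mask padding so it contributes no loss."""
    labels = input_ids.clone()
    labels[labels == PAD_ID] = IGNORE_INDEX  # since PAD == EOS, the EOS target is masked too
    return labels

# A right-padded sequence that ends with EOS followed by padding.
seq = torch.tensor([[101, 7, 42, EOS_ID, PAD_ID, PAD_ID]])
print(build_labels(seq))
# tensor([[ 101,    7,   42, -100, -100, -100]])
# The position that should teach the model "emit EOS here" carries no gradient.
```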

If the base model is pretrained in this fashion, then in FIM mode the model should have trouble connecting the generated middle to the text_after. Indeed, in our experiments the model ends up in an infinite loop in the majority of cases, outputting the same line of code repeatedly.
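To make the setup concrete, here is a rough sketch of a prefix-suffix-middle FIM prompt; the sentinel strings below are placeholders for illustration and may not match the model's actual special tokens:

```python
# Hypothetical FIM sentinels; substitute the model's real special tokens.
FIM_PREFIX, FIM_SUFFIX, FIM_MIDDLE = "<fim_prefix>", "<fim_suffix>", "<fim_middle>"

def build_fim_prompt(text_before: str, text_after: str) -> str:
    """PSM-style layout: the model generates the middle and should emit EOS
    once its output joins up cleanly with text_after."""
    return f"{FIM_PREFIX}{text_before}{FIM_SUFFIX}{text_after}{FIM_MIDDLE}"

prompt = build_fim_prompt(
    text_before="def add(a, b):\n    ",
    text_after="\n\nprint(add(1, 2))",
)
# Expected completion: "return a + b" followed by EOS. If EOS was never a
# training target, nothing pushes the model to stop, which would explain the looping.
```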

However, in some cases the model does correctly stop by predicting an EOS token.

I'm wondering: how can both these behaviors be explained?

  • How is the masking handled during pretraining for FIM examples? Is the first EOS token unmasked so that the model can learn to predict EOS? (A sketch of what that could look like follows this list.)
  • If the answer to the above is no, then how is the model still able to correctly output EOS in some cases when prompted with FIM examples?
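To illustrate what the first bullet is asking about, here is a hypothetical variant of the label masking that keeps the first EOS of each (right-padded) sequence as a live target; this is only a sketch of the idea, not the repository's actual code:

```python
import torch

PAD_ID = EOS_ID = 2   # placeholder ids; PAD == EOS in this setup
IGNORE_INDEX = -100

def build_labels_keep_first_eos(input_ids: torch.Tensor) -> torch.Tensor:
    """Mask padding, but restore the first EOS in each row as a training target."""
    labels = input_ids.clone()
    for row in range(labels.size(0)):
        # With right padding, the first EOS occurrence is the genuine end-of-sequence token.
        eos_positions = (labels[row] == EOS_ID).nonzero(as_tuple=True)[0]
        labels[row][labels[row] == PAD_ID] = IGNORE_INDEX
        if len(eos_positions) > 0:
            labels[row, eos_positions[0]] = EOS_ID
    return labels

seq = torch.tensor([[101, 7, 42, EOS_ID, PAD_ID, PAD_ID]])
print(build_labels_keep_first_eos(seq))
# tensor([[ 101,    7,   42,    2, -100, -100]])  # EOS at position 3 is kept as a target
```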
