
Are EOS tokens masked during pre-training? If so, how does FIM mode know how to connect to the text_after? #163

zhzhangcc opened this issue on May 29, 2024
As far as we can tell from the fine-tuning scripts and from other issues, because PAD = EOS in this model and all PAD tokens are masked out of the loss, the model is never trained to predict the EOS token.
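
To make the masking we mean concrete, here is a minimal sketch (not this repo's exact fine-tuning code; the tokenizer name and lengths are placeholders). Because pad_token_id == eos_token_id, setting every PAD position's label to -100 also hides the genuine EOS at the end of each example, so the loss never covers the EOS prediction:

```python
import torch
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")  # placeholder tokenizer
tokenizer.pad_token = tokenizer.eos_token          # PAD = EOS, as in this model

# Append EOS so the example has an explicit end-of-sequence target.
text = "def add(a, b):\n    return a + b" + tokenizer.eos_token
batch = tokenizer([text], padding="max_length", max_length=24, return_tensors="pt")

labels = batch["input_ids"].clone()
# Standard masking: ignore every PAD position in the loss ...
labels[batch["input_ids"] == tokenizer.pad_token_id] = -100
# ... but because pad_token_id == eos_token_id, the genuine EOS that should
# terminate the completion is masked out too, so the model never receives a
# gradient for predicting EOS.
```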

If the base model is pretrained in this fashion, then for FIM the model should have trouble connecting the generated middle to the text_after. Indeed, in our experiments the model ends up in an infinite loop in the majority of cases, repeatedly outputting the same line of code.

However, the model does sometimes stop correctly by emitting an EOS token.

I'm wondering how both of these behaviors can be explained:

  • How is the masking handled during pretraining for FIM examples? Is the first EOS token left unmasked so that the model can learn to predict EOS? (A sketch of the two masking variants follows this list.)
  • If the answer to the above is no, how is the model still able to correctly output EOS in some cases when prompted with FIM examples?
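
Here is a hypothetical sketch of the two masking variants the first question asks about; the StarCoder-style FIM sentinel layout and the helper below are assumptions for illustration, not this repo's actual pretraining code:

```python
# Assumed FIM token layout for one training example:
#   <fim_prefix> P... <fim_suffix> S... <fim_middle> M... <eos> <pad> <pad> ...
import torch

def mask_labels(input_ids: torch.Tensor, pad_id: int, keep_first_eos: bool) -> torch.Tensor:
    """Return labels with PAD positions set to -100.

    Because pad_id == eos_id, masking every PAD position also hides the EOS
    that terminates the middle span. With keep_first_eos=True, the first EOS
    in each row is left unmasked so the model gets a gradient for stopping.
    """
    labels = input_ids.clone()
    mask = input_ids == pad_id                  # True at every EOS/PAD position
    if keep_first_eos:
        first = mask.float().argmax(dim=1)      # index of the first EOS per row
        rows = torch.arange(input_ids.size(0))
        mask[rows, first] = False               # keep its real label in the loss
    labels[mask] = -100
    return labels

# Variant 1: every EOS/PAD position is masked; the model never learns to emit
# EOS at the end of the middle.
# labels_all_masked = mask_labels(batch_ids, eos_id, keep_first_eos=False)
# Variant 2: the first EOS stays in the loss, so FIM completions can learn to
# stop once the middle connects to the suffix.
# labels_keep_eos = mask_labels(batch_ids, eos_id, keep_first_eos=True)
```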