As far as we can tell from the fine-tuning scripts, and from other issues, because PAD = EOS in this model and all PAD tokens are masked out of the loss, the model is never trained to predict the EOS token.
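For concreteness, the masking pattern we mean looks roughly like this (a minimal sketch, not the actual training code; it assumes the usual Hugging Face convention of `-100` as the ignore index for the cross-entropy loss):

```python
import torch

# Because pad_token_id == eos_token_id, masking out padding also masks out the
# terminating EOS token, so EOS never contributes to the loss.
pad_token_id = eos_token_id = 2  # placeholder id; PAD and EOS share one id

input_ids = torch.tensor([[5, 17, 42, 2, 2, 2]])  # sequence ends with EOS, then padding
labels = input_ids.clone()
labels[labels == pad_token_id] = -100  # masks the padding *and* the terminating EOS
```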
If the base model is pretrained in this fashion, then for FIM this means the model should have trouble connecting the middle generation to the `text_after`, and indeed, in the majority of cases in our experiments, the model eventually ends up in an infinite loop, outputting the same line of code repeatedly.
However, sometimes the model does correctly stop by emitting an EOS token.
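For reference, we build the FIM prompt along these lines (a sketch; the `<fim_prefix>`/`<fim_suffix>`/`<fim_middle>` sentinel names are an assumption about this model's tokenizer and may not match its actual special tokens):

```python
# PSM-style FIM prompt: the model is expected to generate the middle and then
# emit EOS, so the completion can be stitched back in front of text_after.
def build_fim_prompt(text_before: str, text_after: str) -> str:
    return f"<fim_prefix>{text_before}<fim_suffix>{text_after}<fim_middle>"
```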
I'm wondering: how can both these behaviors be explained?
How is the masking handled during pretraining for FIM examples? Is the first EOS token unmasked so that the model can learn to predict EOS?
If the answer to the above is no, then how is the model still able to correctly output EOS in some cases when prompted with FIM examples?
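For what it's worth, the fix we would naively expect on the training side is to keep the first EOS of each sequence in the loss, roughly like this (again a sketch under the `-100` ignore-index convention, not a claim about the actual pretraining code):

```python
import torch

def mask_padding_keep_first_eos(input_ids: torch.Tensor, pad_eos_id: int) -> torch.Tensor:
    """Mask PAD (== EOS) positions out of the loss, but restore the first EOS of
    each row so the model still gets a training signal to stop generating."""
    labels = input_ids.clone()
    labels[labels == pad_eos_id] = -100
    for row in range(input_ids.size(0)):
        eos_positions = (input_ids[row] == pad_eos_id).nonzero(as_tuple=True)[0]
        if eos_positions.numel() > 0:
            labels[row, eos_positions[0]] = pad_eos_id  # restore the first EOS
    return labels
```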