
Are EOS tokens masked during pre-training? If so, how does FIM mode know how to connect to the text_after? #163

@zhzhangcc

Description

As far as we can tell from the fine-tuning scripts and from other issues, because PAD = EOS in this model and all PAD tokens are masked out of the loss, the model is never trained to predict the EOS token.
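For context, here is a minimal sketch (PyTorch, with placeholder token ids) of the kind of label masking we understand the fine-tuning scripts to be doing. Because PAD and EOS share the same id, masking out padding also removes every EOS position from the loss:

```python
import torch

# Placeholder ids for illustration; the key point is that pad_token_id == eos_token_id.
PAD_ID = EOS_ID = 2
IGNORE_INDEX = -100  # positions with this label are ignored by the loss

def build_labels(input_ids: torch.Tensor) -> torch.Tensor:
    """Copy input_ids into labels and mask padding so it contributes no loss."""
    labels = input_ids.clone()
    labels[labels == PAD_ID] = IGNORE_INDEX  # since PAD == EOS, the EOS target is masked too
    return labels

# A right-padded sequence that ends with EOS followed by padding.
seq = torch.tensor([[101, 7, 42, EOS_ID, PAD_ID, PAD_ID]])
print(build_labels(seq))
# tensor([[ 101,    7,   42, -100, -100, -100]])
# The position that should teach the model "emit EOS here" carries no gradient.
```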

If the base model is pretrained in this fashion, then in FIM mode the model should have trouble connecting the generated middle to the text_after. Indeed, in our experiments the model ends up in an infinite loop in the majority of cases, outputting the same line of code repeatedly.
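To make the setup concrete, here is a rough sketch of a prefix-suffix-middle FIM prompt; the sentinel strings below are placeholders for illustration and may not match the model's actual special tokens:

```python
# Hypothetical FIM sentinels; substitute the model's real special tokens.
FIM_PREFIX, FIM_SUFFIX, FIM_MIDDLE = "<fim_prefix>", "<fim_suffix>", "<fim_middle>"

def build_fim_prompt(text_before: str, text_after: str) -> str:
    """PSM-style layout: the model generates the middle and should emit EOS
    once its output joins up cleanly with text_after."""
    return f"{FIM_PREFIX}{text_before}{FIM_SUFFIX}{text_after}{FIM_MIDDLE}"

prompt = build_fim_prompt(
    text_before="def add(a, b):\n    ",
    text_after="\n\nprint(add(1, 2))",
)
# Expected completion: "return a + b" followed by EOS. If EOS was never a
# training target, nothing pushes the model to stop, which would explain the looping.
```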

However, in some cases the model does correctly stop by predicting an EOS token.

I'm wondering: how can both these behaviors be explained?

  • How is the masking handled during pretraining for FIM examples? Is the first EOS token unmasked so that the model can learn to predict EOS? (A sketch of what that could look like follows this list.)
  • If the answer to the above is no, then how is the model still able to correctly output EOS in some cases when prompted with FIM examples?
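To illustrate what the first bullet is asking about, here is a hypothetical variant of the label masking that keeps the first EOS of each (right-padded) sequence as a live target; this is only a sketch of the idea, not the repository's actual code:

```python
import torch

PAD_ID = EOS_ID = 2   # placeholder ids; PAD == EOS in this setup
IGNORE_INDEX = -100

def build_labels_keep_first_eos(input_ids: torch.Tensor) -> torch.Tensor:
    """Mask padding, but restore the first EOS in each row as a training target."""
    labels = input_ids.clone()
    for row in range(labels.size(0)):
        # With right padding, the first EOS occurrence is the genuine end-of-sequence token.
        eos_positions = (labels[row] == EOS_ID).nonzero(as_tuple=True)[0]
        labels[row][labels[row] == PAD_ID] = IGNORE_INDEX
        if len(eos_positions) > 0:
            labels[row, eos_positions[0]] = EOS_ID
    return labels

seq = torch.tensor([[101, 7, 42, EOS_ID, PAD_ID, PAD_ID]])
print(build_labels_keep_first_eos(seq))
# tensor([[ 101,    7,   42,    2, -100, -100]])  # EOS at position 3 is kept as a target
```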
