Training problem #15

@DrYangLiu

Description

@ConnorJL Thanks for the great work.

Unfortunately, I found that my training on OpenWebTextCorpus is too slow, even for the 117M model. With a batch size of 64, the cross-entropy loss decreases rapidly for the first 10k steps, then plateaus around 3.0. Is this a known phenomenon, or is it a dataset problem? I also noticed that the loss in `model_fns` is not shifted. It should be `loss_batch = tf.nn.sparse_softmax_cross_entropy_with_logits(logits=output["logits"][:, :-1], labels=features[:, 1:])`, am I right?
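
For reference, a minimal sketch of the shifted next-token loss being proposed: the prediction at position t is scored against the token at position t+1, so logits and labels are offset by one. The tensor shapes (`[batch, seq_len, vocab_size]` logits, `[batch, seq_len]` token ids) and the final `reduce_mean` are assumptions for illustration, not taken from the repo's `model_fns`:

```python
import tensorflow as tf

def shifted_lm_loss(logits, features):
    """Shifted language-model cross-entropy.

    logits:   [batch, seq_len, vocab_size] float tensor of model outputs
    features: [batch, seq_len] int tensor of input token ids
    """
    per_token_loss = tf.nn.sparse_softmax_cross_entropy_with_logits(
        logits=logits[:, :-1],    # drop the prediction for the last position
        labels=features[:, 1:])   # drop the first token (it has no predecessor)
    return tf.reduce_mean(per_token_loss)
```

Without the shift, each position is trained to predict its own input token rather than the next one, which makes the objective trivially easy and would explain the loss dropping fast and then stalling.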
