Training problem #15

@DrYangLiu

Description

@ConnorJL Thanks for the great work.

Unfortunately, I found that my training on OpenWebTextCorpus is too slow, even for the 117M model. With a batch size of 64, the cross-entropy loss decreases rapidly for the first 10k steps, then plateaus around 3.0. Is this a known phenomenon, or is it a dataset problem? I also noticed that the loss in `model_fns` is not shifted. It should be `loss_batch = tf.nn.sparse_softmax_cross_entropy_with_logits(logits=output["logits"][:, :-1], labels=features[:, 1:])`, am I right?
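
For reference, a minimal sketch of the shifted next-token loss being proposed: the prediction at position t is scored against the token at position t+1, so logits and labels are offset by one. The tensor shapes (`[batch, seq_len, vocab_size]` logits, `[batch, seq_len]` token ids) and the final `reduce_mean` are assumptions for illustration, not taken from the repo's `model_fns`:

```python
import tensorflow as tf

def shifted_lm_loss(logits, features):
    """Shifted language-model cross-entropy.

    logits:   [batch, seq_len, vocab_size] float tensor of model outputs
    features: [batch, seq_len] int tensor of input token ids
    """
    per_token_loss = tf.nn.sparse_softmax_cross_entropy_with_logits(
        logits=logits[:, :-1],    # drop the prediction for the last position
        labels=features[:, 1:])   # drop the first token (it has no predecessor)
    return tf.reduce_mean(per_token_loss)
```

Without the shift, each position is trained to predict its own input token rather than the next one, which makes the objective trivially easy and would explain the loss dropping fast and then stalling.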
