This repository has been archived by the owner on Feb 12, 2022. It is now read-only.

Training loop cleaning and commenting for better understanding and further improvements #90

Status: Open, 47 commits to be merged into base: master


ngarneau

Hey guys,

I want to add some specific behaviors (like conditional language modeling) to the AWD-LSTM model, and to do so I needed to fully understand both the training loop and the model.

To this end, I rearranged a few parts of the code and simplified the training loop with callbacks provided by an open-source library that makes this straightforward.

I also plan to add detailed logging using a dedicated library (I had Sacred in mind, but perhaps you would suggest another?), as suggested by the AllenNLP team (see this). This is mainly for proper debugging and reproducibility.

If you think anything should be done differently, please let me know and I will change it accordingly.

Here is a list of the callbacks implemented:

  • Initialize the hidden state on_epoch_begin.
  • Repackage the hidden state on_batch_begin.
  • Adaptive LR on_batch_begin depending on the sequence length.
  • Evaluation callback, used once training has switched to ASGD, that clones the parameters on_epoch_begin/end. I did not fully understand this one and would appreciate an explanation of why it is needed. This callback may not be working as expected at the moment.
  • Switch to ASGD using the non-monotonic trigger on_epoch_begin.
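
To make the callback idea concrete, here is a minimal sketch in plain Python of two of the hooks listed above: the sequence-length LR scaling and the non-monotonic ASGD trigger. It is not the code from this PR or from the callback library; the class names, the `state` dictionary, and `run_hooks` are all hypothetical and only illustrate the shape of the approach.

```python
class Callback:
    """Hypothetical base class: every hook is a no-op by default."""

    def on_epoch_begin(self, state):
        pass

    def on_batch_begin(self, state):
        pass


class SeqLenLR(Callback):
    """Scale the learning rate by seq_len / bptt for the current batch."""

    def __init__(self, base_lr, bptt):
        self.base_lr = base_lr
        self.bptt = bptt

    def on_batch_begin(self, state):
        # Shorter-than-usual sequences get a proportionally smaller step.
        state["lr"] = self.base_lr * state["seq_len"] / self.bptt


class NonMonoTrigger(Callback):
    """Switch from SGD to ASGD once the validation loss stops improving."""

    def __init__(self, nonmono):
        self.nonmono = nonmono

    def on_epoch_begin(self, state):
        losses = state["val_losses"]
        if state["optimizer"] != "sgd" or len(losses) <= self.nonmono:
            return
        # Compare the latest loss with the best loss among all entries
        # except the last `nonmono` ones.
        if losses[-1] > min(losses[:-self.nonmono]):
            state["optimizer"] = "asgd"


def run_hooks(callbacks, hook, state):
    """Call the named hook on every callback, in order."""
    for callback in callbacks:
        getattr(callback, hook)(state)
```

With this layout the training loop only calls `run_hooks(callbacks, "on_batch_begin", state)` and `run_hooks(callbacks, "on_epoch_begin", state)` at the matching points, and each behavior stays in its own small class.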

I also made a couple of changes within the model:

  • Moved the hidden state carrying and handling within the model.
  • Moved the criterion within the model, as well as the computation of the activation regularization and the temporal activation regularization for the final loss.
  • Added the accuracy of the model for informational purposes.
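
As a rough illustration of how the two regularization terms enter the final loss, here is a dependency-free sketch that uses nested lists shaped `[time][features]` in place of tensors. The function names are hypothetical, and the default weights `alpha=2.0` and `beta=1.0` are assumptions for the example, not values taken from this PR.

```python
def mean_square(rows):
    """Mean of the squared entries of a [time][features] nested list."""
    values = [v * v for row in rows for v in row]
    return sum(values) / len(values)


def regularized_loss(raw_loss, dropped_h, raw_h, alpha=2.0, beta=1.0):
    """Add activation and temporal activation regularization to raw_loss.

    dropped_h: last-layer outputs after dropout, shaped [time][features].
    raw_h:     last-layer outputs before dropout, same shape.
    """
    # Activation regularization: penalize large activations.
    ar = alpha * mean_square(dropped_h)

    # Temporal activation regularization: penalize large changes
    # between consecutive time steps. Needs at least two steps.
    if len(raw_h) < 2:
        tar = 0.0
    else:
        diffs = [
            [nxt - prev for prev, nxt in zip(prev_row, next_row)]
            for prev_row, next_row in zip(raw_h, raw_h[1:])
        ]
        tar = beta * mean_square(diffs)

    return raw_loss + ar + tar
```

Keeping this computation inside the model means the training loop only ever sees a single scalar loss to backpropagate.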

Feel free to integrate these changes into your codebase. I think they make the training loop much easier to understand and give a cleaner base for further development.

Many thanks for the implementation!

@salesforce-cla

Thanks for the contribution! Before we can merge this, we need @ngarneau to sign the Salesforce.com Contributor License Agreement.
