
continue training #14

Open
geovedi opened this issue Apr 27, 2017 · 2 comments

Comments

@geovedi

geovedi commented Apr 27, 2017

I'm planning to train on a large dataset that might be too big for my GPU card, and I'm thinking of splitting it into several chunks. Is there any way to continue training? The train command refuses to start training on an existing model.

@odashi
Owner

odashi commented May 4, 2017

Hello, thank you for the comment!

The train command currently supports only a single run, so some additional code is needed to resume a training process.
Okay, I'll develop this option soon.

Fortunately, the model directory already contains enough information to resume the training process:

  • the two latest parameter files
  • config.ini
  • previous evaluation scores in training.log (or separate log files, 18b5e32)

If you'd like to implement it yourself, train.cc and decode.cc may help you develop the code.
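To make the list above concrete, here is a minimal sketch (in Python for illustration; the tool itself is C++) of how a resume step could locate that state inside a model directory. The parameter file naming scheme (`params.<epoch>`), the `[Train]` section, and the log format are hypothetical; only `config.ini` and `training.log` are named in the thread.

```python
import configparser
import os
import tempfile

def find_resume_state(model_dir):
    """Return (config, latest parameter file, last log line) from a model dir."""
    config = configparser.ConfigParser()
    config.read(os.path.join(model_dir, "config.ini"))

    # Pick the newest parameter file by its embedded epoch number,
    # assuming hypothetical names like "params.<epoch>".
    params = [f for f in os.listdir(model_dir) if f.startswith("params.")]
    latest = max(params, key=lambda f: int(f.split(".")[1])) if params else None

    # The last line of training.log would hold the most recent evaluation score.
    log_path = os.path.join(model_dir, "training.log")
    last_line = None
    if os.path.exists(log_path):
        with open(log_path) as fp:
            lines = fp.read().splitlines()
            last_line = lines[-1] if lines else None
    return config, latest, last_line

# Usage with a throwaway model directory:
with tempfile.TemporaryDirectory() as d:
    with open(os.path.join(d, "config.ini"), "w") as fp:
        fp.write("[Train]\nn_epochs = 10\n")
    for epoch in (3, 4):
        open(os.path.join(d, "params.%d" % epoch), "w").close()
    with open(os.path.join(d, "training.log"), "w") as fp:
        fp.write("epoch 3: BLEU 20.1\nepoch 4: BLEU 21.5\n")
    cfg, latest, last = find_resume_state(d)
    print(latest, last)  # params.4 epoch 4: BLEU 21.5
```

A real implementation would then rebuild the model from the config, load the latest parameters, and continue the epoch counter from the log.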

@odashi
Owner

odashi commented May 4, 2017

Oops, I found some problems with re-loading parameters.
The current train command saves the structure of the translation model and its parameters, but doesn't preserve the link between the trainer and the parameters, which is required for model updates.
This comes from a policy difference between this tool and the DyNet backend, and it might take me somewhat longer to fix this issue...
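A generic illustration of why this link matters (plain Python, not DyNet code): optimizers with per-parameter state, such as momentum SGD, keep a velocity buffer inside the trainer. If a checkpoint saves only the parameters and the trainer is recreated from scratch on reload, the resumed run diverges from a run that never stopped. The numbers below are a made-up toy, not the tool's actual behavior.

```python
def momentum_step(param, velocity, grad, lr=0.1, mu=0.9):
    """One momentum-SGD update; the velocity lives in the trainer, not the model."""
    velocity = mu * velocity - lr * grad
    return param + velocity, velocity

# Continuous training: two steps with the same gradient.
p, v = 1.0, 0.0
p, v = momentum_step(p, v, grad=1.0)        # step 1
p_cont, v = momentum_step(p, v, grad=1.0)   # step 2 reuses the velocity

# "Resume" after saving only the parameter: the velocity resets to zero.
p, v = 1.0, 0.0
p, v = momentum_step(p, v, grad=1.0)        # step 1, then checkpoint
p_resume, _ = momentum_step(p, 0.0, grad=1.0)  # trainer state lost on reload

print(p_cont, p_resume)  # the two runs disagree after the reload
```

So serializing the model alone is not enough; the trainer's internal state and its association with each parameter must also be saved or rebuilt.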
