I'm planning to train on a large dataset that might be too big for my GPU, so I'm thinking of splitting it into several chunks. Is there any way to continue training? The train command refuses to start training on an existing model.
The train command currently supports only a single run, and resuming the training process needs some additional code.
Okay, I'll develop this option soon.
Fortunately, the model directory has enough information to resume the training process:
- the two latest parameter files
- config.ini
- previous evaluation scores in training.log (or in separate log files, see 18b5e32).
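As a rough illustration of the list above, here is a minimal sketch that checks whether a model directory contains those ingredients before a resume is attempted. It is not part of the tool: the function name `check_resumable` and the `*.params` file pattern are assumptions made for the example, since the actual parameter file names are not given in this thread.

```python
from pathlib import Path


def check_resumable(model_dir, param_glob="*.params"):
    """Report which resume ingredients are present in model_dir.

    param_glob is a hypothetical pattern; adjust it to the real
    parameter file names written by the train command.
    """
    root = Path(model_dir)
    # Sort by modification time so the newest files come last.
    params = sorted(root.glob(param_glob), key=lambda p: p.stat().st_mtime)
    return {
        "config": (root / "config.ini").is_file(),
        "log": (root / "training.log").is_file(),
        # Keep only the two most recently written parameter files.
        "latest_params": [p.name for p in params[-2:]],
    }
```

A resume option could refuse to start unless `config` is true and `latest_params` is non-empty, and treat a missing log as a warning rather than an error.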
If you'd like to write it yourself, train.cc and decode.cc may help in developing the code.
Oops, I found some problems with reloading parameters.
The current train command saves the structure of the translation model and its parameters, but doesn't keep the link between the trainer and the parameters, which is required for updating the model.
This comes from a difference in policy between this tool and the DyNet backend, and I may need some more time to fix this issue...
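The missing trainer/parameter link can be pictured with a toy example. This is only an analogy under stated assumptions: `Params`, `Trainer`, and `reload` are hypothetical stand-ins written for this sketch, not DyNet classes, and the real issue in the tool may differ in detail.

```python
class Params:
    """Toy parameter container (a stand-in, not a DyNet class)."""

    def __init__(self, values):
        self.values = list(values)


class Trainer:
    """Toy SGD trainer that keeps a reference to the Params it updates."""

    def __init__(self, params, lr=0.5):
        self.params = params
        self.lr = lr

    def update(self, grads):
        # Plain SGD step applied to whichever Params object is linked.
        self.params.values = [
            v - self.lr * g for v, g in zip(self.params.values, grads)
        ]


def reload(saved_values):
    """Simulate loading saved values into a fresh container."""
    return Params(saved_values)


original = Params([1.0, 2.0])
trainer = Trainer(original)
reloaded = reload(original.values)  # new object; trainer still points at `original`

trainer.update([1.0, 1.0])
# The reloaded parameters did not move: the trainer is not linked to them.
stale = reloaded.values == [1.0, 2.0]

# Rebinding a trainer to the reloaded parameters restores the link.
trainer = Trainer(reloaded)
trainer.update([1.0, 1.0])
```

In this toy setting, restoring the values alone is not enough; the trainer also has to be constructed against, or re-linked to, the reloaded parameter object before updates take effect.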