Issue with saving training onto model checkpoint #84
Comments
Fixed it. I migrated from Colab to a Google Cloud VM instance after running out of compute units and realizing I had $300 in credits to spend. I went all out and got myself a nice VM with blazing training speeds. Saving started working once I realized that the minimum step count required before saving was far higher than my total number of training steps. It works now, but I feel my dataset is far too small for the model to learn anything. When I run it with demo.py I get a mismatched-shape error: the message says the parameter shape is [1, 20, 2] and the flat indices being accessed are [0, 20], which is out of bounds. I'm not sure whether this comes from training on too little data or from something else. No one else seems interested in this project but me, so it's really hard to keep advancing.
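The save problem described above (a minimum save step larger than the total number of training steps) can be checked before launching a long run. The following is only a minimal sketch: the parameter names `num_training_steps`, `min_steps_to_checkpoint` and `checkpoint_every`, and the rule that combines them, are assumptions made for illustration and may not match how the project's training loop actually decides when to save.

```python
def checkpoint_steps(num_training_steps, min_steps_to_checkpoint, checkpoint_every):
    """Return the steps at which a checkpoint would be written.

    Assumed (hypothetical) rule: a checkpoint is written every
    `checkpoint_every` steps, but only once at least
    `min_steps_to_checkpoint` steps have been completed.
    """
    return [
        step
        for step in range(1, num_training_steps + 1)
        if step >= min_steps_to_checkpoint and step % checkpoint_every == 0
    ]


def will_save_anything(num_training_steps, min_steps_to_checkpoint, checkpoint_every):
    """True if at least one checkpoint would be written during the run."""
    return bool(
        checkpoint_steps(num_training_steps, min_steps_to_checkpoint, checkpoint_every)
    )


if __name__ == "__main__":
    # A short run with a high minimum save step never writes a checkpoint.
    print(will_save_anything(1000, 5000, 100))
    # Lowering the minimum save step produces checkpoints within the run.
    print(checkpoint_steps(1000, 200, 400))
```

If `will_save_anything` returns `False` for the values in your configuration, training can finish without ever writing a checkpoint, which would leave nothing to restore afterwards.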
Hey, I'm not able to run the project since the links in the README.md are down. What do I do?
I don't understand. What links?
I'm sure this question will come off as excessively elementary, but how did you use your own data? Did you just use some data from the IAM On-Line Handwriting Database? Or did you devise a method for creating a file that looks like this: https://fki.tic.heia-fr.ch/static/iamondb/strokesz.xml but with your own writing sample? Thank you!
I'm getting the same error, but I'm running it locally in Docker. I got to about 2200 steps and then it stopped. Do you think I just need more data?
After training the model for hours on my own data, it seems to break because it can't save the training into a file that doesn't exist. I was using Google Colab for training.
restoring model from checkpoints/model-800
INFO:tensorflow:Restoring parameters from checkpoints/model-800
Restoring parameters from checkpoints/model-800
2024-05-23 11:58:55.097859: W tensorflow/core/framework/op_kernel.cc:1202] OP_REQUIRES failed at save_restore_tensor.cc:170 : Not found: Unsuccessful TensorSliceReader constructor: Failed to find any matching files for checkpoints/model-800
Traceback (most recent call last):
  File "/usr/local/envs/py364/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1361, in _do_call
    return fn(*args)
  File "/usr/local/envs/py364/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1340, in _run_fn
    target_list, status, run_metadata)
  File "/usr/local/envs/py364/lib/python3.6/site-packages/tensorflow/python/framework/errors_impl.py", line 516, in __exit__
    c_api.TF_GetCode(self.status.status))
tensorflow.python.framework.errors_impl.NotFoundError: Unsuccessful TensorSliceReader constructor: Failed to find any matching files for checkpoints/model-800
[[Node: save/RestoreV2 = RestoreV2[dtypes=[DT_INT32, DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, ..., DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT], _device="/job:localhost/replica:0/task:0/device:CPU:0"](_arg_save/Const_0_0, save/RestoreV2/tensor_names, save/RestoreV2/shape_and_slices)]]
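The `NotFoundError` above means that no files matching the prefix `checkpoints/model-800` were found at restore time. TensorFlow's V2 checkpoint format stores a checkpoint as a `<prefix>.index` file plus one or more `<prefix>.data-*` shard files, so it can help to check for those files before restoring. The sketch below uses only the standard library; the helper names `checkpoint_exists` and `list_checkpoint_steps`, and the assumption that checkpoints are named `model-<step>`, are illustrative and not part of the project.

```python
import glob
import os


def checkpoint_exists(prefix):
    """True if both the index file and at least one data shard exist for `prefix`."""
    has_index = os.path.isfile(prefix + ".index")
    has_data = bool(glob.glob(glob.escape(prefix) + ".data-*"))
    return has_index and has_data


def list_checkpoint_steps(checkpoint_dir, name="model"):
    """Return the sorted step numbers that have an index file in `checkpoint_dir`.

    Assumes (hypothetically) that checkpoints are named `<name>-<step>`.
    """
    pattern = os.path.join(glob.escape(checkpoint_dir), name + "-*.index")
    steps = []
    for path in glob.glob(pattern):
        # Strip the "<name>-" prefix and the ".index" suffix to get the step.
        stem = os.path.basename(path)[len(name) + 1 : -len(".index")]
        if stem.isdigit():
            steps.append(int(stem))
    return sorted(steps)


if __name__ == "__main__":
    prefix = os.path.join("checkpoints", "model-800")
    if not checkpoint_exists(prefix):
        print("No checkpoint files found for", prefix)
        print("Steps available on disk:", list_checkpoint_steps("checkpoints"))
```

An empty list from `list_checkpoint_steps` suggests that no checkpoint was ever written to that directory, in which case the step named in the restore call does not correspond to a saved file.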