Issue with saving training onto model checkpoint #84

ImNotOssy · 2024-05-23T17:34:19Z

After training the model for hours on my own data, it seems to break because it can't save the training into a file that doesn't exist. I was using Google Colab for training.

restoring model from checkpoints/model-800
INFO:tensorflow:Restoring parameters from checkpoints/model-800
Restoring parameters from checkpoints/model-800
2024-05-23 11:58:55.097859: W tensorflow/core/framework/op_kernel.cc:1202] OP_REQUIRES failed at save_restore_tensor.cc:170 : Not found: Unsuccessful TensorSliceReader constructor: Failed to find any matching files for checkpoints/model-800
Traceback (most recent call last):
File "/usr/local/envs/py364/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1361, in _do_call
return fn(*args)
File "/usr/local/envs/py364/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1340, in _run_fn
target_list, status, run_metadata)
File "/usr/local/envs/py364/lib/python3.6/site-packages/tensorflow/python/framework/errors_impl.py", line 516, in exit
c_api.TF_GetCode(self.status.status))
tensorflow.python.framework.errors_impl.NotFoundError: Unsuccessful TensorSliceReader constructor: Failed to find any matching files for checkpoints/model-800
[[Node: save/RestoreV2 = RestoreV2[dtypes=[DT_INT32, DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, ..., DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT], _device="/job:localhost/replica:0/task:0/device:CPU:0"](_arg_save/Const_0_0, save/RestoreV2/tensor_names, save/RestoreV2/shape_and_slices)]]
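
The `NotFoundError` above means that no files on disk match the prefix `checkpoints/model-800`. Below is a minimal sketch of a check to run before restoring. It assumes the usual TensorFlow 1.x layout, in which a checkpoint prefix is written as several files (for example `model-800.index`, `model-800.data-00000-of-00001`, `model-800.meta`). The helper names `checkpoint_files` and `describe_checkpoint` are hypothetical and are not part of the project:

```python
import glob
import os


def checkpoint_files(prefix):
    """Return the files on disk that belong to a checkpoint prefix.

    Hypothetical helper: a TF 1.x checkpoint such as 'checkpoints/model-800'
    is a prefix rather than a single file, so we look for 'prefix.*'.
    """
    return sorted(glob.glob(glob.escape(prefix) + ".*"))


def describe_checkpoint(prefix):
    """Explain in one line why a restore from `prefix` would or would not work."""
    files = checkpoint_files(prefix)
    if files:
        return "found {} file(s) for {}".format(len(files), prefix)
    directory = os.path.dirname(prefix) or "."
    if not os.path.isdir(directory):
        return "directory {} does not exist".format(directory)
    return "no files match {} (nothing appears to have been saved at that step)".format(prefix)


if __name__ == "__main__":
    print(describe_checkpoint("checkpoints/model-800"))
```

If this reports that nothing matches, the restore step is asking for a checkpoint that training never wrote, which points at the save settings rather than at Colab itself.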

ImNotOssy · 2024-05-27T21:26:56Z

Fixed it. I migrated from Colab to a Google Cloud VM instance after running out of compute units and realizing I had $300 in credits to spend. I went all out and got myself a nice VM with blazing training speeds. I was able to get it to work after realizing the minimum step to save was way higher than my total number of training steps. It works now, but I feel as if my dataset is far too small for the model to learn anything. I tried to run it using demo.py, but I get a mismatched-shape error: the message indicates that the parameter shape is [1, 20, 2] and the flat indices being accessed are [0, 20], which is out of bounds. I'm not sure whether this is caused by too little training data or by something else. No one else seems interested in this project but me, so it's really hard to keep advancing.
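
The fix described above comes down to arithmetic: if the first step at which the trainer is allowed to save comes later than the last step it will ever run, no checkpoint is written and a later restore has nothing to load. Here is a rough sketch of that check. The parameter names `num_training_steps`, `min_steps_to_checkpoint`, and `checkpoint_every` are assumptions for illustration and may not match the project's actual settings, and the real trainer may apply further conditions before saving:

```python
def steps_that_can_save(num_training_steps, min_steps_to_checkpoint, checkpoint_every):
    """List the steps at which a checkpoint could be written.

    Assumed rule (hypothetical): save on every `checkpoint_every`-th step,
    but only once `min_steps_to_checkpoint` has been reached.
    """
    return [
        step
        for step in range(checkpoint_every, num_training_steps + 1, checkpoint_every)
        if step >= min_steps_to_checkpoint
    ]


if __name__ == "__main__":
    # Illustrative numbers only: a short run with a high save threshold.
    print(steps_that_can_save(800, 2000, 100))   # []
    print(steps_that_can_save(2500, 2000, 100))  # [2000, 2100, 2200, 2300, 2400, 2500]
```

An empty list means the run ends before the first permitted save, so either the threshold needs to come down or the run needs to be longer.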

monickverma · 2024-06-08T22:04:31Z

Hey, I'm not able to run the project since the links in the READ.md are down. What do I do?

ImNotOssy · 2024-06-08T22:16:37Z

I don't understand. What links?

letsgocodego · 2024-08-13T20:07:43Z

I'm sure this question will come off as excessively elementary, but, how did you use your own data? Did you just use some data from the IAM On-Line Handwriting Database? Did you devise a method that enables you to create a file that looks like this: https://fki.tic.heia-fr.ch/static/iamondb/strokesz.xml but with your own writing sample?

Thank you!

ImNotOssy · 2024-08-13T20:44:29Z

I used this and it worked great:
https://github.com/acmattson3/handwriting-data

npulsipher4 · 2024-09-22T22:20:25Z

> Fixed it. I migrated from Colab to a Google Cloud VM instance after running out of compute units and realizing I had $300 in credits to spend. I went all out and got myself a nice VM with blazing training speeds. I was able to get it to work after realizing the minimum step to save was way higher than my total number of training steps. It works now, but I feel as if my dataset is far too small for the model to learn anything. I tried to run it using demo.py, but I get a mismatched-shape error: the message indicates that the parameter shape is [1, 20, 2] and the flat indices being accessed are [0, 20], which is out of bounds. I'm not sure whether this is caused by too little training data or by something else. No one else seems interested in this project but me, so it's really hard to keep advancing.

I'm getting the same error, but I'm running it locally in Docker. I got to about 2200 steps and then it stopped. Do you think I just need more data?
