Model results #27
Thanks for sharing the results and your fork repo. I assume these examples are custom handwriting, not from the IAM/CVL datasets. It does seem from the results that the model performs worse in these in-the-wild scenarios. Can you share a zip file of the style examples used above, so that I can test it on my machine and confirm whether anything is missing?
Of course! Here's a zip of 30 example 32x192 pixel word PNGs in style1 and style2. Really appreciate the help, btw! I've got the start of a web app where you take a picture of a page of your writing, and it then uses OCR to cut and scale it all into these PNG files ready to feed into the model. My plan is to export the model to ONNX and use onnxruntime-web to do the generation in the browser itself... if I can get some cool results locally first! 😁
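(To sketch the export step I have in mind - this is just the mechanics of `torch.onnx.export` with a stand-in module; the real generator's inputs and shapes in this repo are assumptions on my part.)

```python
import torch
import torch.nn as nn

# Stand-in for the trained generator; the real module and its input
# signature (style images + query text) would come from this repo.
class DummyGenerator(nn.Module):
    def forward(self, style_images):
        # placeholder computation so the graph has something to trace
        return style_images.mean(dim=1, keepdim=True)

model = DummyGenerator().eval()
dummy_style = torch.randn(1, 15, 32, 192)   # placeholder: a batch of 32x192 grayscale style words

torch.onnx.export(
    model,
    (dummy_style,),
    "hwt_generator.onnx",
    input_names=["style_images"],
    output_names=["generated"],
    opset_version=13,
    dynamic_axes={"style_images": {0: "batch"}},
)
# The resulting .onnx file can then be loaded in the browser with onnxruntime-web.
```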
So I've been playing about with it a bunch more, and what I've found suggests that the model is very sensitive to the exact resolution/scaling/thresholding of the original dataset, and doesn't handle anything that is upscaled/downscaled differently from exactly how the source data was prepared. Do you reckon it's worth training it further with a bunch of slightly differently rotated/scaled data, or do you reckon there's something else going wrong for me entirely? 😁
Sorry for the late reply. I suppose you are correct, but I am unsure whether training with differently rotated/scaled data would be beneficial. Also, doing so might make the training unstable. I haven't been able to test your examples yet - it's been a busy week. I will give them a try over the weekend.
I tried the model with your style1 samples, and the results I got don't look bad. You can have a look at how I preprocessed the style examples in data/dataset.py (line 45, commit f79913e) in Handwriting-Transformers.
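Roughly, the idea there is a tight "minimum area" crop around the ink (this is only a sketch of that idea, not the exact code at that line; the background threshold and padding are assumptions):

```python
import numpy as np
from PIL import Image

def crop_to_ink(path, pad=2, background_threshold=200):
    """Tightly crop a word image to the bounding box of the ink,
    leaving a small padding margin around it."""
    img = np.array(Image.open(path).convert("L"))      # grayscale
    ys, xs = np.where(img < background_threshold)      # pixels darker than the page
    if len(xs) == 0:                                   # blank image: return unchanged
        return Image.fromarray(img)
    top = max(int(ys.min()) - pad, 0)
    bottom = min(int(ys.max()) + pad + 1, img.shape[0])
    left = max(int(xs.min()) - pad, 0)
    right = min(int(xs.max()) + pad + 1, img.shape[1])
    return Image.fromarray(img[top:bottom, left:right])
```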
Also, I added a notebook file demo_custom_handwriting.ipynb. There you just need to input the cropped word images of the custom handwriting; the images do NOT need to be scaled, resized, or padded. I tried to find out why your results are poor. I think the preprocessing function, especially the minimum area cropping, is different in your case. I also found out that
Thanks so much for this! I think adding that helped. It's still quite hit and miss with other samples and seems sensitive to how the image is preprocessed, but I think that's to be expected. I'm experimenting with the IAM vs CVL models and different preprocessing of the images to find what gives the optimal results - preliminarily, it seems some thresholding improves things greatly, but going for a full binary black-or-white thresholding makes things worse again, and leaving in all the noise of the white page background is worst. Can share some images if you'd be interested!
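(To illustrate the kind of thresholding I mean - a soft cutoff that flattens the page background to white but keeps the grayscale strokes, rather than full binarisation; the cutoff value here is just a guess:)

```python
import numpy as np
from PIL import Image

def suppress_background(path, white_cutoff=180):
    """Push near-white page noise to pure white while keeping the
    grayscale pen strokes, rather than doing a full black/white binarisation."""
    img = np.array(Image.open(path).convert("L"))
    img[img > white_cutoff] = 255   # flatten paper texture / lighting noise
    return Image.fromarray(img)
```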
Nice! During training, we maintain a fixed receptive field of 16 for each character. So, to get optimal results, try to resize the style images to [16*len(string) x 32]. For example, a 4-character word 'door' should have dimensions [64x32]. This can reduce the domain gap further.
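A minimal sketch of that resizing rule (the interpolation choice here is just an assumption):

```python
from PIL import Image

def resize_style_word(path, word, char_width=16, height=32):
    """Resize a cropped word image to [16 * len(word)] x 32 so each
    character spans the fixed receptive field used during training."""
    img = Image.open(path).convert("L")
    target_width = char_width * len(word)   # e.g. 'door' -> 64
    return img.resize((target_width, height), Image.BILINEAR)

# resize_style_word("door.png", "door")  # -> 64x32 image
```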
Here is my zip file, thank you! |
Hi!
I've been playing around with this model locally following the instructions in the README, and my results don't seem to be nearly as good as yours. I'm following your instructions here in #11 (comment) and then running prepare.py in my fork.
For instance, even with different style prompts, the model seems to generate very similar results for me.
Real on left, Generated on right
IAM style 1
IAM style 2
And secondly, the CVL and IAM models give very different results from each other, but quite consistent results within each model across different styles.
CVL style 1
CVL style 2
Is there something stupid I'm missing, or do I need to train it with these writers in the dataset to get better results? Does the Google Drive contain the fully trained models that were used to generate the results in the paper?
Very cool project though - congrats!!