Model results #27
Thanks for sharing the results and your fork repo. I assume these examples are custom handwriting, not from the IAM/CVL datasets. It does seem from the results that the model performs worse in these in-the-wild scenarios. Can you share a zip file of the style examples used above, so that I can test it on my machine and confirm whether anything is missing?
Of course! Here's a zip of 30 example 32x192 pixel word PNGs in style1 and style2. Really appreciate the help, btw! I've got the start of a web app where you take a picture of a page of your writing, and it then uses OCR to cut and scale it all into these PNG files ready to feed into the model. My plan is to export the model to ONNX and use onnxruntime-web to do the generation in the browser itself... if I can get some cool results locally first! 😁
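(To sketch the export step I have in mind - this is just the mechanics of `torch.onnx.export` with a stand-in module; the real generator's inputs and shapes in this repo are assumptions on my part.)

```python
import torch
import torch.nn as nn

# Stand-in for the trained generator; the real module and its input
# signature (style images + query text) would come from this repo.
class DummyGenerator(nn.Module):
    def forward(self, style_images):
        # placeholder computation so the graph has something to trace
        return style_images.mean(dim=1, keepdim=True)

model = DummyGenerator().eval()
dummy_style = torch.randn(1, 15, 32, 192)   # placeholder: a batch of 32x192 grayscale style words

torch.onnx.export(
    model,
    (dummy_style,),
    "hwt_generator.onnx",
    input_names=["style_images"],
    output_names=["generated"],
    opset_version=13,
    dynamic_axes={"style_images": {0: "batch"}},
)
# The resulting .onnx file can then be loaded in the browser with onnxruntime-web.
```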
So I've been playing about with it a bunch more, and what I've found suggests that the model is very sensitive to the exact resolution/scaling/thresholding of the original dataset, and doesn't handle anything that is upscaled/downscaled differently from exactly how the source data was prepared. Do you reckon it's worth training it further with a bunch of slightly differently rotated/scaled data, or do you reckon there's something else going wrong for me entirely? 😁
Sorry for the late reply. I suppose you are correct, but I am unsure whether training with differently rotated/scaled data would be beneficial. Also, doing so might make the training unstable. I haven't been able to test your examples yet - it's been a busy week. I will give them a try over the weekend.
I tried the model with your style1 samples, and the results I got don't look bad. You can have a look at how I preprocessed the style examples in data/dataset.py (line 45, commit f79913e) in Handwriting-Transformers.
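Roughly, the idea there is a tight "minimum area" crop around the ink (this is only a sketch of that idea, not the exact code at that line; the background threshold and padding are assumptions):

```python
import numpy as np
from PIL import Image

def crop_to_ink(path, pad=2, background_threshold=200):
    """Tightly crop a word image to the bounding box of the ink,
    leaving a small padding margin around it."""
    img = np.array(Image.open(path).convert("L"))      # grayscale
    ys, xs = np.where(img < background_threshold)      # pixels darker than the page
    if len(xs) == 0:                                   # blank image: return unchanged
        return Image.fromarray(img)
    top = max(int(ys.min()) - pad, 0)
    bottom = min(int(ys.max()) + pad + 1, img.shape[0])
    left = max(int(xs.min()) - pad, 0)
    right = min(int(xs.max()) + pad + 1, img.shape[1])
    return Image.fromarray(img[top:bottom, left:right])
```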
Also, I added a notebook file demo_custom_handwriting.ipynb. There you just need to input the cropped word images of the custom handwriting; the images do NOT need to be scaled, resized, or padded. I tried to find out why your results are poor. I think the preprocessing function, especially the minimum area cropping, is different in your case. I also found out that
Thanks so much for this! I think adding that helped. It's still quite hit and miss with other samples and seems sensitive to how the image is preprocessed, but I think that's to be expected. I'm experimenting with the IAM vs CVL models and different preprocessing of the images to find what gives the optimal results - preliminarily, it seems some thresholding improves things greatly, but going for a full binary black-or-white thresholding makes things worse again, and leaving in all the noise of the white page background is worst. Can share some images if you'd be interested!
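(To illustrate the kind of thresholding I mean - a soft cutoff that flattens the page background to white but keeps the grayscale strokes, rather than full binarisation; the cutoff value here is just a guess:)

```python
import numpy as np
from PIL import Image

def suppress_background(path, white_cutoff=180):
    """Push near-white page noise to pure white while keeping the
    grayscale pen strokes, rather than doing a full black/white binarisation."""
    img = np.array(Image.open(path).convert("L"))
    img[img > white_cutoff] = 255   # flatten paper texture / lighting noise
    return Image.fromarray(img)
```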
Nice! During training, we maintain a fixed receptive field of 16 for each character. So, to get optimal results, try to resize the style images to [16*len(string) x 32]. For example, a 4-character word 'door' should have dimensions [64x32]. This can reduce the domain gap further.
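A minimal sketch of that resizing rule (the interpolation choice here is just an assumption):

```python
from PIL import Image

def resize_style_word(path, word, char_width=16, height=32):
    """Resize a cropped word image to [16 * len(word)] x 32 so each
    character spans the fixed receptive field used during training."""
    img = Image.open(path).convert("L")
    target_width = char_width * len(word)   # e.g. 'door' -> 64
    return img.resize((target_width, height), Image.BILINEAR)

# resize_style_word("door.png", "door")  # -> 64x32 image
```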
Here is my zip file, thank you! |
Hi!
I've been playing around with this model locally following the instructions in the README, and my results don't seem to be nearly as good as yours. I'm following your instructions here in #11 (comment) and then running prepare.py in my fork.
For instance, even with different style prompts, the model seems to generate very similar results for me.
Real on left, Generated on right
IAM style 1
IAM style 2
And secondly, the CVL and IAM models give very different results from each other, but quite consistent results within each model across different styles.
CVL style 1
CVL style 2
Is there something stupid I'm missing, or do I need to train it with these writers in the dataset to get better results? Does the Google Drive contain the fully trained models that were used to generate the results in the paper?
Very cool project though - congrats!!