-
One thing I noticed that I don't know what it means is the …. I don't believe that the optimizer should be one of the first things to swap out; I could be wrong, but the sentiment in ML seems to be that Adam or AdamW are the way to go. Instead, you could play around with the …; I also don't understand the range of …. The metrics have a high variance, so what value for …? Lastly, you can try training for more iterations with the best hyperparameters. I've trained one model for 300k steps, and it still improved somewhat in the last half.
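For example, swapping in AdamW in PyTorch is a one-liner. A minimal sketch; the model and the lr/weight_decay values below are placeholders, not tuned recommendations:

```python
# A minimal sketch, assuming a PyTorch model; the lr and weight_decay
# values are common starting points, not tuned recommendations.
import torch

model = torch.nn.Linear(512, 512)  # stand-in for your actual model
optimizer = torch.optim.AdamW(model.parameters(), lr=3e-4, weight_decay=0.01)
```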
-
If you have enough GPU resources, you can use a hyperparameter optimization algorithm such as random search or Bayesian model-based optimization. There are Python packages for this, such as hyperopt; more info here.
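For example, a minimal sketch of such a search with hyperopt's TPE (Bayesian) algorithm; `train_and_eval` is a hypothetical stand-in for your training run, and the search space is illustrative:

```python
# A minimal hyperopt sketch. train_and_eval() is hypothetical: it should
# train a model with the given params and return the validation BLEU.
from hyperopt import Trials, fmin, hp, tpe

space = {
    "lr": hp.loguniform("lr", -12, -6),             # roughly 6e-6 to 2.5e-3
    "dropout": hp.uniform("dropout", 0.0, 0.5),
    "optimizer": hp.choice("optimizer", ["adam", "adamw"]),
}

def objective(params):
    val_bleu = train_and_eval(**params)  # hypothetical training routine
    return -val_bleu                     # hyperopt minimizes, so negate

trials = Trials()
best = fmin(objective, space, algo=tpe.suggest, max_evals=50, trials=trials)
print(best)
```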
-
Not much; I am trying to construct a new model, only about 0.89 BLEU-4 for now, and still searching for better hyperparameters.
Yuhang Tao wrote on Sunday, May 29, 2022 at 22:38:
… No, how about you?
Best wishes,
Yuhang Tao
On Sun, May 29, 2022 at 5:46 PM, rainyl wrote:
> So, have you achieved higher scores?
-
This discussion is starting to feel like an open-ended question with no definitive answer.
Recap
Here is some related information I have found: two issues about BLEU, #27 (comment) and #44 (comment), and one discussion about the LR schedule and optimizer.
Experiment
I have added 0.35 million Wikipedia samples to the training set, bringing the total to 0.5 million. After that, I created a wandb sweep.
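The sweep config looked roughly like the following sketch; the parameter names, ranges, project name, and the `train()` stub are illustrative assumptions, not my exact settings:

```python
# A rough sketch of a wandb sweep config; names and ranges are
# illustrative assumptions, not the exact settings from my sweep.
import wandb

sweep_config = {
    "method": "bayes",  # wandb also supports "random" and "grid"
    "metric": {"name": "val_bleu", "goal": "maximize"},
    "parameters": {
        "lr": {"distribution": "log_uniform_values", "min": 1e-5, "max": 1e-3},
        "lr_schedule": {"values": ["constant", "cosine", "step"]},
        "optimizer": {"values": ["adam", "adamw"]},
        "num_encoder_layers": {"values": [4, 6, 8]},
        "num_decoder_layers": {"values": [4, 6, 8]},
    },
}

def train():
    run = wandb.init()
    cfg = run.config
    # ... build the model from cfg (e.g. cfg.num_encoder_layers), train it,
    # then report the metric the sweep optimizes:
    wandb.log({"val_bleu": 0.0})  # replace with your real validation BLEU

sweep_id = wandb.sweep(sweep_config, project="my-project")  # hypothetical project
wandb.agent(sweep_id, function=train, count=30)
```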
[wandb sweep panels: date & BLEU; hyperparameter influence; fully connected layer style; token accuracy; val_bleu]
I have added more data and tried different hyperparameters, including the LR schedule and the optimizer, and even the number of encoder and decoder layers (see the "fully connected layer style" panel above for more detail), but nothing gets better. What can I do to achieve higher performance?
Five days ago, I reckoned that more data and hyperparameter searching would work, but now the results are frustrating.