What if reducing the batch size? #82

Open
ywen666 opened this issue Mar 31, 2022 · 9 comments
Labels
bug Something isn't working question Further information is requested

Comments

@ywen666

ywen666 commented Mar 31, 2022

Hi,

Thanks for releasing this amazing code repo!
The paper mentions a batch size of 2k, which leads to 410 gradient accumulation steps and makes fine-tuning quite slow. Have the authors tried reducing the batch size, e.g. to 128 as suggested in the T5 paper? Does that degrade performance a lot?

Thanks!

@tscholak
Collaborator

tscholak commented Apr 1, 2022

Hi @ywen666,
Thanks! You can try a batch size of around 32; that should work as well.
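For context, a minimal sketch of the arithmetic behind the effective batch size under the usual Hugging Face Trainer convention (the concrete per-device numbers below are illustrative, not taken from the repo's configs):

```python
# Effective batch size under the Hugging Face Trainer convention:
#   per_device_train_batch_size * gradient_accumulation_steps * number of GPUs.
# The concrete values below are illustrative only.

def effective_batch_size(per_device: int, grad_accum: int, n_gpus: int) -> int:
    return per_device * grad_accum * n_gpus

# Paper-style setting: ~2k examples per update, hence many accumulation steps.
print(effective_batch_size(per_device=5, grad_accum=410, n_gpus=1))  # 2050
# Smaller setting in the spirit of the suggestion above: ~32 examples per update.
print(effective_batch_size(per_device=8, grad_accum=4, n_gpus=1))    # 32
```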

@ywen666
Author

ywen666 commented Apr 1, 2022

Hi, thanks for the suggestion! I also share the question raised in the other issue about slow evaluation.

Running python seq2seq/run_seq2seq.py configs/eval.json takes 9 hours to complete on a 4-GPU (RTX 5000) node, at 200s per iteration, which seems too slow.

If I disable PICARD and run python seq2seq/run_seq2seq.py configs/nopicard_eval.json, it is pretty fast, finishing in 8 minutes at 1.6s per iteration.

What is the best way to check which part is bottlenecking the inference speed?
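(As a sketch of one way to locate the bottleneck: profile a single evaluation pass with Python's built-in cProfile. The `trainer` name below is an assumption standing in for the Seq2SeqTrainer instance built in seq2seq/run_seq2seq.py.)

```python
# Sketch: profile one evaluation pass and list the call paths that dominate runtime.
# `trainer` is assumed to be the Seq2SeqTrainer instance constructed in run_seq2seq.py.
import cProfile
import pstats

cProfile.runctx("trainer.evaluate()", globals(), locals(), "eval_profile.out")
stats = pstats.Stats("eval_profile.out")
stats.sort_stats("cumulative").print_stats(20)  # top 20 call paths by cumulative time
```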

@tscholak
Collaborator

tscholak commented Apr 1, 2022

There were some recent changes to the parser that may have caused a performance regression. I suspect that this is the cause of the slowdown. When I have time, I'll look into it.

@tscholak
Collaborator

tscholak commented Apr 1, 2022

You could help me out by telling me which input-output pairs take the longest to generate.

@tscholak tscholak added the question Further information is requested label Apr 1, 2022
@ywen666
Author

ywen666 commented Apr 2, 2022

I am looking into this but it will take some time for me to figure it out.

@ywen666
Author

ywen666 commented Apr 5, 2022

Hi, I am trying to time the generation of each example. I found the generate method wrapper for the SpiderModel in https://github.com/ElementAI/picard/blob/main/seq2seq/utils/picard_model_wrapper.py

However, I couldn't find the code that uses this generate method. I checked the trainer and SpiderTrainer, and it seems that evaluate never calls generate, nor does the evaluation_loop inside it, which comes from the Hugging Face Trainer.

Could you give me a pointer to where the generate wrapper is used in the repo?

@ywen666
Author

ywen666 commented Apr 7, 2022

Oh, generate is invoked in the Seq2SeqTrainer's prediction_step method.
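A minimal sketch of per-batch timing via that hook, assuming the standard transformers Seq2SeqTrainer.prediction_step signature (the subclass name is illustrative, not something from the repo):

```python
import time

from transformers import Seq2SeqTrainer


class TimedSeq2SeqTrainer(Seq2SeqTrainer):
    """Illustrative subclass that logs how long each prediction step takes."""

    def prediction_step(self, model, inputs, prediction_loss_only, ignore_keys=None):
        start = time.perf_counter()
        # Seq2SeqTrainer.prediction_step calls model.generate() internally when
        # predict_with_generate=True, so this timing covers constrained decoding.
        result = super().prediction_step(
            model, inputs, prediction_loss_only, ignore_keys=ignore_keys
        )
        elapsed = time.perf_counter() - start
        batch_size = inputs["input_ids"].shape[0]
        print(f"prediction_step: {elapsed:.2f}s for a batch of {batch_size}")
        return result
```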

@takacsg84

Hi!
I evaluated on the spider_realistic dataset and, as @tscholak asked, logged the calculation time for each question. Here is the full list: https://docs.google.com/spreadsheets/d/1NGui5DPQU5SChHzXXzfYYNjP6HknbM-VcXXI77dbGNk/edit?usp=sharing
And here is the question that was the slowest:

500: What is the msot common country for singer? (0 days 00:21:56.996459)

@tscholak
Collaborator

Thanks so much, this information will help me with the root cause analysis for the speed regression!

@tscholak tscholak added the bug Something isn't working label Jul 31, 2022