Throughput is not improved with the increase of batch size #316

Open
Irr-free opened this issue Aug 13, 2024 · 0 comments
@Irr-free

I'm using llama3 on a single NVIDIA V100 GPU (32 GiB memory). When I increase the batch size from 1 to 8, the inference throughput does not increase; it decreases. However, when I set the batch size to 16, throughput does increase.
The max_seq_len in my config is 512, and I have not modified any other code in the model.
Is this related to the CUDA core or Tensor Core configuration of the NVIDIA GPU?
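One way to sanity-check numbers like these is to measure tokens/second with a small timing harness. Below is a minimal sketch; `run_batch` and `dummy_model` are hypothetical stand-ins for one forward pass of the actual llama3 inference call, not part of any real API. Note that for real GPU timing you must also synchronize the device (e.g. `torch.cuda.synchronize()` in PyTorch) before reading the clock, since kernel launches are asynchronous.

```python
import time

def measure_throughput(run_batch, batch_size, seq_len, n_iters=10):
    """Return throughput in tokens/sec for a given batch size.

    run_batch(batch_size, seq_len) is a hypothetical stand-in for one
    forward pass of the model; substitute your own inference call.
    """
    run_batch(batch_size, seq_len)  # warm-up pass (avoids one-time init cost)
    start = time.perf_counter()
    for _ in range(n_iters):
        run_batch(batch_size, seq_len)
        # On a real GPU, synchronize here before stopping the timer.
    elapsed = time.perf_counter() - start
    tokens = n_iters * batch_size * seq_len
    return tokens / elapsed

# Dummy workload with a fixed per-call cost, independent of batch size.
# A real model's per-call cost grows with batch size once the GPU saturates,
# which is what makes the throughput curve non-monotonic in practice.
def dummy_model(batch_size, seq_len):
    time.sleep(0.001)

for bs in (1, 8, 16):
    tput = measure_throughput(dummy_model, bs, seq_len=512)
    print(f"batch_size={bs:2d}  throughput={tput:,.0f} tokens/s")
```

With the dummy workload the throughput scales linearly with batch size, because the per-call cost is fixed; the interesting question for the real model is where that linearity breaks down (kernel tiling, Tensor Core utilization, memory bandwidth), which a sweep like this makes visible.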
