Token based dynamic batching behaviour #151
-
Hey! I couldn't find docs on how token based dynamic batching works. If I'm not mistaken, all sequences in the current batch get embedded in the same forward pass. I understand how standard dynamic batching could be applied at the request level, but how is it applied at the token level?
Replies: 1 comment
-
In token based dynamic batching, the number of requests in a batch varies from batch to batch. What we care about is the number of tokens. Classic dynamic batching was originally developed for Computer Vision workflows.
Token based dynamic batching means that we use the number of tokens per request to control how many requests will be added to a batch.
Classic implementations of dynamic batching only consider requests as a whole and batch requests together until a maximum number of requests per batch is reached. This can lead to underutilization of the hardware.
Let's say you have a queue with 16 requests of different sizes in tokens:
Classic dynamic batching with a maximum batch size of 4 will always take the first 4 requests, even though they could be very small (1 token each, for example).
Token based dynamic batching will continue adding requests to the batch until a maximum number of toke…
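
To make the difference between the two strategies concrete, here is a minimal Python sketch. It is not the library's actual implementation: the function names `classic_batch` and `token_batch`, the token counts in the queue, and the budget of 512 tokens are all made up for illustration. It also assumes the budget is checked against the plain sum of token counts and that every single request fits within the budget.

```python
from collections import deque
from typing import Deque, List


def classic_batch(queue: Deque[int], max_batch_size: int) -> List[int]:
    """Pop up to max_batch_size requests, ignoring their token counts."""
    batch: List[int] = []
    while queue and len(batch) < max_batch_size:
        batch.append(queue.popleft())
    return batch


def token_batch(queue: Deque[int], max_batch_tokens: int) -> List[int]:
    """Pop requests while the running token total stays within the budget.

    Assumption: each request alone fits within max_batch_tokens; otherwise
    this sketch returns an empty batch for an oversized head-of-queue request.
    """
    batch: List[int] = []
    total = 0
    while queue and total + queue[0] <= max_batch_tokens:
        total += queue[0]
        batch.append(queue.popleft())
    return batch


# Hypothetical queue of 16 requests; each number is a request's token count.
request_tokens = [1, 1, 1, 1, 2, 3, 5, 8, 16, 32, 64, 128, 256, 4, 2, 1]

classic = classic_batch(deque(request_tokens), max_batch_size=4)
print(len(classic), sum(classic))  # 4 requests, 4 tokens

by_tokens = token_batch(deque(request_tokens), max_batch_tokens=512)
print(len(by_tokens), sum(by_tokens))  # 12 requests, 262 tokens
```

With these made-up numbers, the classic batcher sends a forward pass containing only 4 tokens, while the token based batcher keeps adding requests until the next one (256 tokens) would exceed the budget.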