Best way to estimate tokens BEFORE sending to model providers #5110
Unanswered
printharsh asked this question in Help
Replies: 2 comments 1 reply
-
Did you figure out a way? I asked ChatGPT whether the Vercel AI SDK had a tokenizer function; it gave me the following, but it's clearly a made-up hallucination :)
0 replies
-
Facing a similar issue. I don't believe the AI SDK has a token-counting function for different models. Currently using tiktoken to estimate input tokens for Claude models (not sure what else to do).
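For reference, a minimal sketch of that estimate using the js-tiktoken package. Note that cl100k_base is an OpenAI encoding and Claude uses a different tokenizer, so this is only an approximation; a common workaround is to add a safety margin on top:

```ts
import { getEncoding } from "js-tiktoken";

// cl100k_base is an OpenAI encoding; Claude's tokenizer differs, so treat
// the result as a rough estimate rather than an exact count.
const enc = getEncoding("cl100k_base");

export function estimateTokens(text: string): number {
  return enc.encode(text).length;
}

// Example: apply a 20% safety margin before comparing against a context limit.
const padded = Math.ceil(estimateTokens("Hello, Claude!") * 1.2);
console.log(`~${padded} tokens with margin`);
```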
1 reply
-
I'm trying to figure out the best way to estimate tokens before sending a request to the model providers. Assuming we already have a representative text -> token counter, it would be great if we could determine the exact messages that are being sent to the model.
For example, if you use streamText with tools, a system prompt, and a CoreMessage array, what is the exact payload sent to, say, Anthropic? I've looked around in the ai-sdk but could not find an easy way to get the exact messages sent to the models. There are provider-specific functions that are not exported by the ai-sdk, so we would have to write provider-specific code to recover the exact messages.
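One way to at least inspect the request the SDK assembles is language-model middleware: transformParams runs on the exact params object handed to the provider implementation. That is still the SDK's provider-agnostic format, not Anthropic's final wire format, but it does include the resolved prompt and tool definitions. A sketch, assuming AI SDK v4 where this is exported as wrapLanguageModel (earlier versions used experimental_wrapLanguageModel):

```ts
import { streamText, wrapLanguageModel } from "ai";
import { anthropic } from "@ai-sdk/anthropic";

// Wrap the model so every call logs the params the SDK is about to hand
// to the provider implementation (resolved prompt, tools, settings).
const inspected = wrapLanguageModel({
  model: anthropic("claude-3-5-sonnet-latest"),
  middleware: {
    transformParams: async ({ params }) => {
      console.log(JSON.stringify(params, null, 2));
      return params; // pass through unchanged
    },
  },
});

const { textStream } = streamText({
  model: inspected,
  system: "You are terse.",
  messages: [{ role: "user", content: "Hi" }],
});

for await (const chunk of textStream) process.stdout.write(chunk);
```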
Currently we're using the CoreMessage type and, for each content type, mapping our best guess at what's sent to the model to estimate the tokens (same with tools: the tool dictionary plus the zod schema, stringified, to estimate tokens).
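For what it's worth, here is a sketch of that kind of best-guess mapping over CoreMessage (AI SDK v4 types assumed), with tools counted as name + description + the zod schema serialized via zod-to-json-schema, which is roughly what providers receive. The per-message overhead constant is a guess, not a provider-documented number; countTokens is any text -> token counter like the tiktoken one above:

```ts
import type { CoreMessage } from "ai";
import type { z } from "zod";
import { zodToJsonSchema } from "zod-to-json-schema";

type Counter = (text: string) => number;

// Walk a CoreMessage array and count the text-bearing parts.
export function estimateMessageTokens(
  messages: CoreMessage[],
  countTokens: Counter,
): number {
  let total = 0;
  for (const message of messages) {
    if (typeof message.content === "string") {
      total += countTokens(message.content);
    } else {
      for (const part of message.content) {
        if (part.type === "text") total += countTokens(part.text);
        else if (part.type === "tool-call")
          total += countTokens(JSON.stringify(part.args));
        else if (part.type === "tool-result")
          total += countTokens(JSON.stringify(part.result));
        // image/file parts are billed differently per provider; skipped here
      }
    }
    total += 4; // guessed per-message overhead for role markers / separators
  }
  return total;
}

// Tools: name + description + the zod parameters schema serialized as
// JSON Schema, which approximates the tool definition the provider sees.
export function estimateToolTokens(
  tools: Record<string, { description?: string; parameters: z.ZodTypeAny }>,
  countTokens: Counter,
): number {
  let total = 0;
  for (const [name, t] of Object.entries(tools)) {
    total += countTokens(name);
    if (t.description) total += countTokens(t.description);
    total += countTokens(JSON.stringify(zodToJsonSchema(t.parameters)));
  }
  return total;
}
```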
Would be happy to contribute if there's a better API to expose.