Best way to estimate tokens BEFORE sending to model providers #5110
Unanswered
printharsh asked this question in Help
Replies: 2 comments 1 reply
-
Did you figure out a way? I asked ChatGPT whether the Vercel AI SDK had a tokenizer function; it gave me the following, but it's clearly a made-up hallucination :)
0 replies
-
Facing a similar issue. I don't believe the AI SDK has a token-counting function for different models. Currently using tiktoken to estimate input tokens for Claude models (not sure what else to do).
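For reference, a minimal sketch of that estimate using the js-tiktoken package. Note that cl100k_base is an OpenAI encoding and Claude uses a different tokenizer, so this is only an approximation; a common workaround is to add a safety margin on top:

```ts
import { getEncoding } from "js-tiktoken";

// cl100k_base is an OpenAI encoding; Claude's tokenizer differs, so treat
// the result as a rough estimate rather than an exact count.
const enc = getEncoding("cl100k_base");

export function estimateTokens(text: string): number {
  return enc.encode(text).length;
}

// Example: apply a 20% safety margin before comparing against a context limit.
const padded = Math.ceil(estimateTokens("Hello, Claude!") * 1.2);
console.log(`~${padded} tokens with margin`);
```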
1 reply
-
I'm trying to figure out the best way to estimate tokens before sending a request to the model providers. Assuming we already have a representative text -> token counter, it would be great if we could determine the exact messages that are being sent to the model.
For example, if you use streamText with tools, a system prompt, and a CoreMessage array, what is the exact payload sent to, say, Anthropic? I've looked around in the ai-sdk but could not find an easy way to get the exact messages sent to the models. There are provider-specific functions that are not exported by the ai-sdk, so we would have to write provider-specific code to recover the exact messages.
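One way to at least inspect the request the SDK assembles is language-model middleware: transformParams runs on the exact params object handed to the provider implementation. That is still the SDK's provider-agnostic format, not Anthropic's final wire format, but it does include the resolved prompt and tool definitions. A sketch, assuming AI SDK v4 where this is exported as wrapLanguageModel (earlier versions used experimental_wrapLanguageModel):

```ts
import { streamText, wrapLanguageModel } from "ai";
import { anthropic } from "@ai-sdk/anthropic";

// Wrap the model so every call logs the params the SDK is about to hand
// to the provider implementation (resolved prompt, tools, settings).
const inspected = wrapLanguageModel({
  model: anthropic("claude-3-5-sonnet-latest"),
  middleware: {
    transformParams: async ({ params }) => {
      console.log(JSON.stringify(params, null, 2));
      return params; // pass through unchanged
    },
  },
});

const { textStream } = streamText({
  model: inspected,
  system: "You are terse.",
  messages: [{ role: "user", content: "Hi" }],
});

for await (const chunk of textStream) process.stdout.write(chunk);
```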
Currently we're using the CoreMessage type and, for each content type, mapping our best guess at what's sent to the model to estimate the tokens (same with tools: the tool dictionary plus the zod schema, stringified, to estimate tokens).
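For what it's worth, here is a sketch of that kind of best-guess mapping over CoreMessage (AI SDK v4 types assumed), with tools counted as name + description + the zod schema serialized via zod-to-json-schema, which is roughly what providers receive. The per-message overhead constant is a guess, not a provider-documented number; countTokens is any text -> token counter like the tiktoken one above:

```ts
import type { CoreMessage } from "ai";
import type { z } from "zod";
import { zodToJsonSchema } from "zod-to-json-schema";

type Counter = (text: string) => number;

// Walk a CoreMessage array and count the text-bearing parts.
export function estimateMessageTokens(
  messages: CoreMessage[],
  countTokens: Counter,
): number {
  let total = 0;
  for (const message of messages) {
    if (typeof message.content === "string") {
      total += countTokens(message.content);
    } else {
      for (const part of message.content) {
        if (part.type === "text") total += countTokens(part.text);
        else if (part.type === "tool-call")
          total += countTokens(JSON.stringify(part.args));
        else if (part.type === "tool-result")
          total += countTokens(JSON.stringify(part.result));
        // image/file parts are billed differently per provider; skipped here
      }
    }
    total += 4; // guessed per-message overhead for role markers / separators
  }
  return total;
}

// Tools: name + description + the zod parameters schema serialized as
// JSON Schema, which approximates the tool definition the provider sees.
export function estimateToolTokens(
  tools: Record<string, { description?: string; parameters: z.ZodTypeAny }>,
  countTokens: Counter,
): number {
  let total = 0;
  for (const [name, t] of Object.entries(tools)) {
    total += countTokens(name);
    if (t.description) total += countTokens(t.description);
    total += countTokens(JSON.stringify(zodToJsonSchema(t.parameters)));
  }
  return total;
}
```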
Would be happy to contribute if there's a better API to expose.