@nazq (Contributor) commented Dec 19, 2025

Summary

  • Adds StreamChunk::Usage(Usage) variant to expose token usage data during streaming
  • Parses usage from Anthropic message_delta events
  • Parses usage from OpenAI final chunks (when stream_options.include_usage is set)
  • Usage chunk is emitted immediately before Done for predictable consumer patterns
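
For orientation, a rough sketch of where the new variant sits on StreamChunk; apart from Usage itself, the other variants and field types shown here are assumptions inferred from the usage example below, not the crate's exact definitions.

// Illustrative only: variant shapes other than Usage are assumed.
pub enum StreamChunk {
    /// Incremental text output.
    Text(String),
    /// Token usage for the whole stream, emitted immediately before Done.
    Usage(Usage),
    /// End of stream; the field type here is assumed.
    Done { stop_reason: Option<String> },
    // ...other existing variants (tool calls, etc.) elided...
}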

Motivation

When using chat_stream_with_tools, there was no way to get the token usage data (input_tokens, output_tokens, cache_read_tokens, etc.) that is available from the non-streaming ChatResponse::usage(). Both Anthropic and OpenAI include usage in their final streaming events, but that data was not being surfaced to the stream consumer.

Changes

  • src/chat/mod.rs: Added StreamChunk::Usage(Usage) variant
  • src/backends/anthropic.rs: Added usage field to AnthropicStreamResponse, created convert_anthropic_usage() helper, updated parser to emit Usage before Done
  • src/providers/openai_compatible.rs: Added usage field to OpenAIToolStreamChunk, updated parser to emit Usage (handles both inline and separate chunk cases)
  • tests/test_backends.rs: Added Usage case to match statement in integration tests

API Behavior

Anthropic:

  • message_delta events include a cumulative usage field
  • Usage is emitted as StreamChunk::Usage immediately before StreamChunk::Done
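
A minimal sketch of the Anthropic path, assuming serde deserialization of the event payload; any field, type, or impl below that the PR text does not name explicitly (numeric types, the exact shape of Usage, its Default impl) is an assumption.

use serde::Deserialize;

// Assumed subset of the stream event; the real AnthropicStreamResponse in
// src/backends/anthropic.rs carries more fields.
#[derive(Deserialize)]
struct AnthropicStreamResponse {
    #[serde(rename = "type")]
    event_type: String,
    usage: Option<AnthropicUsage>, // populated on message_delta events
}

#[derive(Deserialize)]
struct AnthropicUsage {
    input_tokens: Option<u64>,
    output_tokens: Option<u64>,
    cache_read_input_tokens: Option<u64>,
}

// Hypothetical shape of the helper named in the Changes list: map Anthropic's
// field names onto the crate's provider-agnostic Usage type.
fn convert_anthropic_usage(u: &AnthropicUsage) -> Usage {
    Usage {
        prompt_tokens: u.input_tokens.unwrap_or(0),
        completion_tokens: u.output_tokens.unwrap_or(0),
        // cache_read_input_tokens would feed prompt_tokens_details.cached_tokens
        ..Usage::default()
    }
}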

OpenAI:

  • The final chunk contains usage when stream_options.include_usage is true (already configured)
  • Handles both cases: usage in the same chunk as finish_reason, or in a separate chunk with empty choices
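
A similar sketch for the OpenAI-compatible path; the struct fields and the handler below are assumptions where the PR text is not explicit, and the crate is assumed to already set stream_options.include_usage on streaming requests.

use serde::Deserialize;

// Assumed subset of the chunk shape; the real OpenAIToolStreamChunk in
// src/providers/openai_compatible.rs carries more fields.
#[derive(Deserialize)]
struct OpenAIToolStreamChunk {
    choices: Vec<serde_json::Value>,
    usage: Option<Usage>, // only present when stream_options.include_usage is set
}

// Simplified handling: usage can arrive either on the chunk that carries
// finish_reason or on a trailing chunk whose choices array is empty.
fn handle_chunk(chunk: OpenAIToolStreamChunk, out: &mut Vec<StreamChunk>) {
    if let Some(usage) = chunk.usage {
        // Emit usage ahead of Done, matching the Anthropic behaviour.
        out.push(StreamChunk::Usage(usage));
    }
    if chunk.choices.is_empty() {
        return; // usage-only trailer chunk, nothing else to parse
    }
    // ...delta / finish_reason handling elided...
}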

Usage Example

while let Some(chunk) = stream.next().await {
    match chunk? {
        StreamChunk::Text(t) => print!("{}", t),
        StreamChunk::Usage(usage) => {
            println!("Tokens: {} in, {} out", usage.prompt_tokens, usage.completion_tokens);
            // Cache hits are reported via prompt_tokens_details when available.
            if let Some(details) = usage.prompt_tokens_details {
                if let Some(cached) = details.cached_tokens {
                    println!("Cache hits: {}", cached);
                }
            }
        }
        // Usage arrives immediately before Done, so it is safe to break here.
        StreamChunk::Done { .. } => break,
        _ => {}
    }
}

Test Plan

  • Build passes
  • Clippy passes (no warnings)
  • 30 unit tests pass (including 5 new usage tests)
  • Integration tests pass
  • New tests cover Anthropic usage parsing with cache tokens
  • New tests cover OpenAI usage parsing with prompt_tokens_details
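
For context, a hedged sketch of what one of the new usage unit tests might look like; the parse_anthropic_event entry point and the JSON shape are illustrative, not the actual test code.

#[test]
fn anthropic_message_delta_emits_usage_before_done() {
    // Illustrative payload; parse_anthropic_event is a hypothetical entry point.
    let event = r#"{"type":"message_delta",
        "delta":{"stop_reason":"end_turn"},
        "usage":{"input_tokens":10,"output_tokens":42,"cache_read_input_tokens":5}}"#;

    let chunks = parse_anthropic_event(event).expect("event should parse");
    assert!(matches!(chunks[0], StreamChunk::Usage(_)));
    assert!(matches!(chunks[1], StreamChunk::Done { .. }));
}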

Copilot AI left a comment

Pull request overview

This PR adds streaming token usage support by introducing a new StreamChunk::Usage(Usage) variant. This enables consumers to receive token usage data (including cache hits/misses) during streaming operations, which was previously only available in non-streaming responses.

Key Changes:

  • Added StreamChunk::Usage(Usage) variant to expose token usage during streaming
  • Implemented usage parsing from both Anthropic message_delta events and OpenAI final chunks
  • Usage is consistently emitted immediately before the Done chunk

Reviewed changes

Copilot reviewed 4 out of 4 changed files in this pull request and generated no comments.

  • src/chat/mod.rs: Added new StreamChunk::Usage(Usage) enum variant for streaming token usage
  • src/backends/anthropic.rs: Added usage field to AnthropicStreamResponse, implemented convert_anthropic_usage() helper, updated parser to return Vec<StreamChunk> and emit usage before Done
  • src/providers/openai_compatible.rs: Added usage field to OpenAIToolStreamChunk, implemented logic to handle both inline and separate chunk usage patterns
  • tests/test_backends.rs: Updated integration test to handle new Usage variant in match statement


Commit message:

Emits Usage chunk from both Anthropic and OpenAI streaming responses:
- Anthropic: extracts usage from message_delta event
- OpenAI: extracts usage from final chunk (requires stream_options.include_usage)
- Usage is emitted immediately before Done chunk
- Includes cache token support via prompt_tokens_details
