
feat: add turn-level prompt caching + PostHog metrics + live API tests#10801

Open
sestinj wants to merge 7 commits into main from nate/prompt-caching

Conversation


@sestinj sestinj commented Feb 25, 2026

Summary

  • Turn-level caching: The systemAndTools caching strategy now calls addCacheControlToLastTwoUserMessages() to cache conversation turns, not just the system+tools prefix. This matches what the VS Code extension path already does.
  • PostHog cache metrics: recordStreamTelemetry now reports prompt_cache_metrics events with cache_read_tokens, cache_write_tokens, total_prompt_tokens, cache_hit_rate, and tool_count.
  • Live API tests: 13 tests across 2 files validate caching end-to-end against the real Anthropic API (guarded by ANTHROPIC_API_KEY env var, skipped in CI).
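The first bullet can be made concrete with a hedged sketch. This is not the PR's exact code, just an illustration of what a helper like `addCacheControlToLastTwoUserMessages()` plausibly does: add an ephemeral `cache_control` breakpoint to the last content block of the last two user messages, without mutating the caller's input (the message and block shapes here are assumptions).

```typescript
type ContentBlock = {
  type: "text";
  text: string;
  cache_control?: { type: "ephemeral" };
};
type Message = { role: "user" | "assistant"; content: string | ContentBlock[] };

function addCacheControlToLastTwoUserMessages(messages: Message[]): Message[] {
  // Indices of the last two user messages.
  const userIdxs = messages
    .map((m, i) => (m.role === "user" ? i : -1))
    .filter((i) => i !== -1)
    .slice(-2);

  return messages.map((m, i) => {
    // String content is skipped in this sketch; one of the cubic review
    // comments in this PR suggests converting it to block form instead.
    if (!userIdxs.includes(i) || typeof m.content === "string") return m;
    const blocks = m.content;
    // Clone the last block rather than mutating it in place.
    const content = blocks.map((b, j) =>
      j === blocks.length - 1
        ? { ...b, cache_control: { type: "ephemeral" as const } }
        : b,
    );
    return { ...m, content };
  });
}
```

Marking the most recent turns keeps the growing conversation prefix cacheable across requests, which is why hit rates stay high as turns accumulate.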

Impact of turn-level caching (validated by temporarily reverting the fix)

| Scenario | Without fix | With fix | Improvement |
| --- | --- | --- | --- |
| Basic 3-turn conversation | 93% | 99.4% | +6% |
| Tool use follow-up | 81.9% | 99.5% | +18% |
| Parallel tool calls | 83.5% | 99.4% | +16% |
| Long conversation (8 turns) | 77.5% | 94.0% | +17% |
| Large tool result (~200 lines) | 62.3% | 99.4% | +37% |

Without the fix, cache hit rates degrade as conversations grow because only the static system+tools prefix is cached. With turn-level caching, the last two user messages are also cached, keeping rates at 94-99%.
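The hit rates above can be read as a ratio of cached to total prompt tokens. A minimal sketch, assuming the `cache_hit_rate` reported to PostHog is derived from the usage fields named in the summary (the exact formula in the PR is not shown here):

```typescript
// Assumed usage shape, mirroring the fields listed in the PostHog bullet.
interface CacheUsage {
  cache_read_tokens: number;
  cache_write_tokens: number;
  total_prompt_tokens: number;
}

// Hypothetical derivation: fraction of prompt tokens served from cache.
function cacheHitRate(u: CacheUsage): number {
  if (u.total_prompt_tokens === 0) return 0;
  return u.cache_read_tokens / u.total_prompt_tokens;
}
```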

Test plan

  • All 25 existing AnthropicCachingStrategies.test.ts unit tests pass (+ 2 new)
  • 3 live API tests in anthropic-caching.live.test.ts pass
  • 10 battle test scenarios in anthropic-caching-scenarios.live.test.ts pass
  • Validated fix impact by temporarily reverting and comparing hit rates
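The env-var guard described in the test plan can be sketched as follows; the helper name is hypothetical, but the idea matches the PR: live suites run only when a key is configured and are skipped otherwise (e.g. in CI).

```typescript
// Hypothetical helper: decide whether live Anthropic tests should run.
function shouldRunLiveTests(env: Record<string, string | undefined>): boolean {
  return Boolean(env.ANTHROPIC_API_KEY);
}

// In a *.live.test.ts file this condition could gate the whole suite, e.g.:
// describe.skipIf(!shouldRunLiveTests(process.env))("anthropic caching (live)", () => { /* ... */ });
```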

🤖 Generated with Claude Code


Continue Tasks: ❌ 7 failed


Summary by cubic

Adds turn-level prompt caching to the Anthropic systemAndTools strategy, records prompt cache metrics to PostHog, and adds live Anthropic API tests to keep cache hit rates high in multi-turn chats. Also makes the caching transform non-mutating, fixes minor test/telemetry issues, and ensures live API tests are excluded from CI.

  • New Features

    • Turn-level caching: caches the last two user messages (plus system + tools).
    • PostHog telemetry: records prompt_cache_metrics (cache_read/write_tokens, total_prompt_tokens, cache_hit_rate, tool_count).
    • Live API tests: end-to-end against Anthropic (guarded by ANTHROPIC_API_KEY, skipped in CI) covering tool use, parallel calls, long chats, large tool results, and cache invalidation.
  • Bug Fixes

    • Non-mutating transform: clone message content blocks before adding cache_control.
    • Telemetry: use void posthogService.capture to avoid unhandled async.
    • Tests: fix duplicate assistant messages in Scenario 3; add unit test asserting no input mutation; update adapter tests to expect cache_control on user content; add casts for Anthropic cache_read_tokens fields.
    • CI/Vitest: exclude *.live.test.ts and preserve default excludes via configDefaults.exclude.
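The telemetry fix in the bullets above (`void posthogService.capture`) is the standard pattern for intentionally fire-and-forget async calls. A minimal sketch with an in-memory capture, assuming the event and property names listed in the summary (the real service sends to PostHog):

```typescript
// In-memory stand-in for the PostHog service, for illustration only.
const captured: Array<{ event: string; props: Record<string, unknown> }> = [];

async function capture(event: string, props: Record<string, unknown>): Promise<void> {
  captured.push({ event, props });
}

// `void` marks the promise as deliberately unawaited, silencing
// no-floating-promises lint errors without blocking the stream path.
void capture("prompt_cache_metrics", {
  cache_read_tokens: 120,
  cache_write_tokens: 0,
  total_prompt_tokens: 130,
  cache_hit_rate: 120 / 130,
  tool_count: 3,
});
```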

Written for commit a754a33. Summary will update on new commits.

sestinj and others added 2 commits February 24, 2026 17:28
…he metrics + live API test

- Add addCacheControlToLastTwoUserMessages call in systemAndToolsStrategy
  so the CLI path caches conversation turns (matching VS Code extension behavior)
- Report prompt_cache_metrics to PostHog with cache_read/write tokens and hit rate
- Add live API integration test validating cache writes on turn 1 and 99%+ hit
  rates on subsequent turns (guarded by ANTHROPIC_API_KEY env var)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
7 distinct scenarios covering tool use round-trips, parallel tool calls,
long 8-turn conversations, large tool results (~200 lines), cache
invalidation on system message changes, identical request replays, and
multi-step agentic workflows with chained tool calls.

All scenarios validate cache hit rates >90% where expected and proper
cache misses when the system message changes.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@sestinj sestinj requested a review from a team as a code owner February 25, 2026 01:42
@sestinj sestinj requested review from Patrick-Erichsen and removed request for a team February 25, 2026 01:42
@dosubot dosubot bot added the size:XXL This PR changes 1000+ lines, ignoring generated files. label Feb 25, 2026

@cubic-dev-ai cubic-dev-ai bot left a comment


5 issues found across 5 files

Prompt for AI agents (all issues)

Check if these issues are valid — if so, understand the root cause of each and fix them. If appropriate, use sub-agents to investigate and fix each issue separately.


<file name="extensions/cli/src/stream/streamChatResponse.helpers.ts">

<violation number="1" location="extensions/cli/src/stream/streamChatResponse.helpers.ts:391">
P2: Missing `await` or `void` for an asynchronous function call.</violation>
</file>

<file name="packages/openai-adapters/src/apis/AnthropicCachingStrategies.test.ts">

<violation number="1" location="packages/openai-adapters/src/apis/AnthropicCachingStrategies.test.ts:213">
P1: The turn-level caching strategy allows the total number of `cache_control` breakpoints to exceed Anthropic's maximum limit of 4. If system and tools use 3-4 breakpoints, adding 2 more in user messages will exceed the limit. `addCacheControlToLastTwoUserMessages` must respect the `availableCacheMessages` counter.</violation>

<violation number="2" location="packages/openai-adapters/src/apis/AnthropicCachingStrategies.test.ts:242">
P2: `systemAndTools` now mutates the original `body.messages` array in place. Since `result` is only a shallow copy, `addCacheControlToLastTwoUserMessages(result.messages)` modifies the original input object. The `messages` array should be mapped or deep-cloned before modification.</violation>

<violation number="3" location="packages/openai-adapters/src/apis/AnthropicCachingStrategies.test.ts:292">
P2: Turn-level caching silently fails for string content messages. Instead of explicitly skipping them, the underlying implementation should convert string messages to an array format (with the `cache_control` block) so they can benefit from caching.</violation>
</file>

<file name="packages/openai-adapters/src/test/anthropic-caching-scenarios.live.test.ts">

<violation number="1" location="packages/openai-adapters/src/test/anthropic-caching-scenarios.live.test.ts:940">
P2: Duplicate assistant messages are pushed in the simulated conversation loop, resulting in two consecutive assistant messages per turn.</violation>
</file>
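Violation 1 above is about a breakpoint budget: Anthropic allows at most 4 `cache_control` breakpoints per request, so turn-level caching should only consume whatever budget remains after system and tools. A hedged sketch of that bookkeeping (names are hypothetical, not the PR's `availableCacheMessages` implementation):

```typescript
// Anthropic's documented per-request limit on cache_control breakpoints.
const MAX_CACHE_BREAKPOINTS = 4;

// How many breakpoints remain for user messages after the system prompt
// and tool definitions have claimed theirs.
function remainingCacheBudget(usedBySystemAndTools: number): number {
  return Math.max(0, MAX_CACHE_BREAKPOINTS - usedBySystemAndTools);
}
```

With this budget in hand, the turn-level step would cap itself at `min(2, remainingCacheBudget(...))` user messages instead of always marking two.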

Reply with feedback, questions, or to request a fix. Tag @cubic-dev-ai to re-run a review.

sestinj and others added 3 commits February 24, 2026 17:55
The OpenAI PromptTokensDetails type doesn't include cache_read_tokens
(it's an Anthropic extension). Add 'as any' casts to fix TypeScript
build errors in the battle test file.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…essages

The systemAndTools caching strategy now adds cache_control to the last
two user messages via addCacheControlToLastTwoUserMessages(). Update
existing test expectations to include the cache_control field.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Clone message content blocks before mutating for cache_control
  (prevents mutation of original input body)
- Add `void` prefix to async posthogService.capture call
- Fix duplicate assistant messages in Scenario 3 test loop
- Add unit test verifying original body is not mutated

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

@cubic-dev-ai cubic-dev-ai bot left a comment


1 issue found across 4 files (changes from recent commits).



<file name="packages/openai-adapters/src/test/anthropic-caching-scenarios.live.test.ts">

<violation number="1" location="packages/openai-adapters/src/test/anthropic-caching-scenarios.live.test.ts:956">
P2: Restoring the `exchanges[i + 1]` fallback prevents dead code and improves test resilience.</violation>
</file>


Live test files (*.live.test.ts) make real Anthropic API calls and are
intended for manual validation only. Exclude them from the default
vitest configuration to prevent flaky CI failures.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

@cubic-dev-ai cubic-dev-ai bot left a comment


1 issue found across 1 file (changes from recent commits).



<file name="packages/openai-adapters/vitest.config.ts">

<violation number="1" location="packages/openai-adapters/vitest.config.ts:9">
P2: Overriding Vitest's `exclude` drops standard default ignores (like `**/dist/**`). Import `configDefaults` from `vitest/config` and spread `configDefaults.exclude` instead.</violation>
</file>


Preserve Vitest's default excludes (dist, node_modules, etc.) when
adding the live test exclusion pattern.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
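The vitest change described in this commit can be sketched as below; the exclude pattern comes from the PR discussion, while the surrounding config shape is a minimal assumption about `packages/openai-adapters/vitest.config.ts`:

```typescript
import { configDefaults, defineConfig } from "vitest/config";

export default defineConfig({
  test: {
    // Spread configDefaults.exclude so standard ignores (node_modules,
    // dist, etc.) are preserved, then add the live-test pattern on top.
    exclude: [...configDefaults.exclude, "**/*.live.test.ts"],
  },
});
```

Spreading `configDefaults.exclude` is the fix cubic requested: setting `exclude` outright replaces vitest's defaults instead of extending them.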
