feat: add turn-level prompt caching + PostHog metrics + live API tests #10801
Open
…he metrics + live API test

- Add addCacheControlToLastTwoUserMessages call in systemAndToolsStrategy so the CLI path caches conversation turns (matching VS Code extension behavior)
- Report prompt_cache_metrics to PostHog with cache_read/write tokens and hit rate
- Add live API integration test validating cache writes on turn 1 and 99%+ hit rates on subsequent turns (guarded by ANTHROPIC_API_KEY env var)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
7 distinct scenarios covering tool use round-trips, parallel tool calls, long 8-turn conversations, large tool results (~200 lines), cache invalidation on system message changes, identical request replays, and multi-step agentic workflows with chained tool calls. All scenarios validate cache hit rates >90% where expected and proper cache misses when the system message changes. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
5 issues found across 5 files
Prompt for AI agents (all issues)
Check if these issues are valid — if so, understand the root cause of each and fix them. If appropriate, use sub-agents to investigate and fix each issue separately.
<file name="extensions/cli/src/stream/streamChatResponse.helpers.ts">
<violation number="1" location="extensions/cli/src/stream/streamChatResponse.helpers.ts:391">
P2: Missing `await` or `void` for an asynchronous function call.</violation>
</file>
<file name="packages/openai-adapters/src/apis/AnthropicCachingStrategies.test.ts">
<violation number="1" location="packages/openai-adapters/src/apis/AnthropicCachingStrategies.test.ts:213">
P1: The turn-level caching strategy allows the total number of `cache_control` breakpoints to exceed Anthropic's maximum limit of 4. If system and tools use 3-4 breakpoints, adding 2 more in user messages will exceed the limit. `addCacheControlToLastTwoUserMessages` must respect the `availableCacheMessages` counter.</violation>
<violation number="2" location="packages/openai-adapters/src/apis/AnthropicCachingStrategies.test.ts:242">
P2: `systemAndTools` now mutates the original `body.messages` array in place. Since `result` is only a shallow copy, `addCacheControlToLastTwoUserMessages(result.messages)` modifies the original input object. The `messages` array should be mapped or deep-cloned before modification.</violation>
<violation number="3" location="packages/openai-adapters/src/apis/AnthropicCachingStrategies.test.ts:292">
P2: Turn-level caching silently fails for string content messages. Instead of explicitly skipping them, the underlying implementation should convert string messages to an array format (with the `cache_control` block) so they can benefit from caching.</violation>
</file>
<file name="packages/openai-adapters/src/test/anthropic-caching-scenarios.live.test.ts">
<violation number="1" location="packages/openai-adapters/src/test/anthropic-caching-scenarios.live.test.ts:940">
P2: Duplicate assistant messages are pushed in the simulated conversation loop, resulting in two consecutive assistant messages per turn.</violation>
</file>
Reply with feedback, questions, or to request a fix. Tag @cubic-dev-ai to re-run a review.
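The three findings on the caching strategy (breakpoint limit, in-place mutation, string content) can be addressed together. The following is a minimal sketch, not the code in this PR: the `Message` and `TextBlock` shapes are simplified stand-ins, and `tagRecentUserMessages` and its `breakpointsAlreadyUsed` parameter are hypothetical names. It assumes the caller knows how many `cache_control` breakpoints the system and tools sections already consumed.

```typescript
// Hypothetical, simplified shapes; the real adapter types are richer.
type CacheControl = { type: "ephemeral" };
type TextBlock = { type: "text"; text: string; cache_control?: CacheControl };
type Message = { role: "user" | "assistant"; content: string | TextBlock[] };

// Limit of 4 cache_control breakpoints per request, as cited in the review.
const MAX_BREAKPOINTS = 4;

// Returns a new array; the input messages and their blocks are never modified.
function tagRecentUserMessages(
  messages: readonly Message[],
  breakpointsAlreadyUsed: number,
  wanted = 2,
): Message[] {
  // Never tag more messages than the remaining breakpoint budget allows.
  let budget = Math.min(
    wanted,
    Math.max(0, MAX_BREAKPOINTS - breakpointsAlreadyUsed),
  );
  const out: Message[] = messages.map((m) => ({ ...m }));

  // Walk backwards so the most recent user messages are tagged first.
  for (let i = out.length - 1; i >= 0 && budget > 0; i--) {
    const msg = out[i];
    if (msg.role !== "user") continue;

    // Convert string content to block form so it can carry cache_control,
    // and clone existing blocks before touching them.
    const blocks: TextBlock[] =
      typeof msg.content === "string"
        ? [{ type: "text", text: msg.content }]
        : msg.content.map((b) => ({ ...b }));
    if (blocks.length === 0) continue;

    const last = blocks.length - 1;
    blocks[last] = { ...blocks[last], cache_control: { type: "ephemeral" } };
    out[i] = { ...msg, content: blocks };
    budget--;
  }
  return out;
}
```

With three breakpoints already used by system and tools, only the most recent user message is tagged; with four used, none are.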
The OpenAI PromptTokensDetails type doesn't include cache_read_tokens (it's an Anthropic extension). Add 'as any' casts to fix TypeScript build errors in the battle test file. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…essages The systemAndTools caching strategy now adds cache_control to the last two user messages via addCacheControlToLastTwoUserMessages(). Update existing test expectations to include the cache_control field. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Clone message content blocks before mutating for cache_control (prevents mutation of original input body)
- Add `void` prefix to async posthogService.capture call
- Fix duplicate assistant messages in Scenario 3 test loop
- Add unit test verifying original body is not mutated

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
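For readers unfamiliar with the `void` fix, here is a small sketch of the fire-and-forget pattern. `TelemetryClient` and `reportCacheMetrics` are hypothetical stand-ins rather than the actual `posthogService` interface, and the trailing `.catch` is an optional addition that the commit does not mention.

```typescript
// Hypothetical stand-in for a telemetry client with an async capture method.
interface TelemetryClient {
  capture(event: string, props: Record<string, unknown>): Promise<void>;
}

function reportCacheMetrics(
  client: TelemetryClient,
  props: Record<string, unknown>,
): void {
  // `void` marks the promise as intentionally not awaited, which satisfies
  // floating-promise lint rules. The catch (an assumption, not in the PR)
  // keeps a telemetry failure from becoming an unhandled rejection.
  void client.capture("prompt_cache_metrics", props).catch(() => {});
}
```

The response stream is never blocked on telemetry, and a failed capture is dropped silently.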
1 issue found across 4 files (changes from recent commits).
Prompt for AI agents (all issues)
Check if these issues are valid — if so, understand the root cause of each and fix them. If appropriate, use sub-agents to investigate and fix each issue separately.
<file name="packages/openai-adapters/src/test/anthropic-caching-scenarios.live.test.ts">
<violation number="1" location="packages/openai-adapters/src/test/anthropic-caching-scenarios.live.test.ts:956">
P2: Restoring the `exchanges[i + 1]` fallback prevents dead code and improves test resilience.</violation>
</file>
Live test files (*.live.test.ts) make real Anthropic API calls and are intended for manual validation only. Exclude them from the default vitest configuration to prevent flaky CI failures. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
1 issue found across 1 file (changes from recent commits).
Prompt for AI agents (all issues)
Check if these issues are valid — if so, understand the root cause of each and fix them. If appropriate, use sub-agents to investigate and fix each issue separately.
<file name="packages/openai-adapters/vitest.config.ts">
<violation number="1" location="packages/openai-adapters/vitest.config.ts:9">
P2: Overriding Vitest's `exclude` drops standard default ignores (like `**/dist/**`). Import `configDefaults` from `vitest/config` and spread `configDefaults.exclude` instead.</violation>
</file>
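The suggested fix could look roughly like the config fragment below. `configDefaults` and `defineConfig` are exported by `vitest/config`; the glob `**/*.live.test.ts` is an assumption based on the `*.live.test.ts` naming used in this PR, and the real `vitest.config.ts` may contain other options.

```typescript
import { configDefaults, defineConfig } from "vitest/config";

export default defineConfig({
  test: {
    // Keep Vitest's built-in ignores and add the live-test pattern on top,
    // so manual-only tests that hit the real API are skipped by default.
    exclude: [...configDefaults.exclude, "**/*.live.test.ts"],
  },
});
```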
Preserve Vitest's default excludes (dist, node_modules, etc.) when adding the live test exclusion pattern. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Summary
- `systemAndTools` caching strategy now calls `addCacheControlToLastTwoUserMessages()` to cache conversation turns, not just the system+tools prefix. This matches what the VS Code extension path already does.
- `recordStreamTelemetry` now reports `prompt_cache_metrics` events with `cache_read_tokens`, `cache_write_tokens`, `total_prompt_tokens`, `cache_hit_rate`, and `tool_count`.
- Live API tests (guarded by the `ANTHROPIC_API_KEY` env var, skipped in CI).

Impact of turn-level caching (validated by temporarily reverting the fix)
Without the fix, cache hit rates degrade as conversations grow because only the static system+tools prefix is cached. With turn-level caching, the last two user messages are also cached, keeping rates at 94-99%.
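As a rough illustration of how a hit rate like this might be derived, the sketch below builds a `prompt_cache_metrics` payload using the field names from the summary. The formula (cached reads divided by total prompt tokens) is an assumption; this page does not show how the PR computes `cache_hit_rate`. `buildPromptCacheMetrics` and the `UsageCounts` shape are hypothetical.

```typescript
// Field names follow the PR summary; the payload type itself is illustrative.
interface PromptCacheMetrics {
  cache_read_tokens: number;
  cache_write_tokens: number;
  total_prompt_tokens: number;
  cache_hit_rate: number;
  tool_count: number;
}

// Hypothetical input shape for token counts taken from one response.
interface UsageCounts {
  cacheReadTokens: number;
  cacheWriteTokens: number;
  totalPromptTokens: number;
}

function buildPromptCacheMetrics(
  usage: UsageCounts,
  toolCount: number,
): PromptCacheMetrics {
  // Assumed definition: fraction of prompt tokens served from the cache.
  // Guard against division by zero for empty prompts.
  const hitRate =
    usage.totalPromptTokens > 0
      ? usage.cacheReadTokens / usage.totalPromptTokens
      : 0;
  return {
    cache_read_tokens: usage.cacheReadTokens,
    cache_write_tokens: usage.cacheWriteTokens,
    total_prompt_tokens: usage.totalPromptTokens,
    cache_hit_rate: hitRate,
    tool_count: toolCount,
  };
}
```

Under this definition, a first turn that only writes to the cache reports a hit rate of 0, and later turns approach 1 as more of the prompt is read back from the cache.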
Test plan
- `AnthropicCachingStrategies.test.ts` unit tests pass (+ 2 new)
- `anthropic-caching.live.test.ts` pass
- `anthropic-caching-scenarios.live.test.ts` pass

🤖 Generated with Claude Code
Continue Tasks: ❌ 7 failed
Summary by cubic
Adds turn-level prompt caching to the Anthropic systemAndTools strategy, records prompt cache metrics to PostHog, and adds live Anthropic API tests to keep cache hit rates high in multi-turn chats. Also makes the caching transform non‑mutating, fixes minor test/telemetry issues, and ensures live API tests are excluded from CI.
New Features
Bug Fixes
Written for commit a754a33. Summary will update on new commits.