feat: R2 snapshot upload and D1 prompt truncation#49
feat: R2 snapshot upload and D1 prompt truncation#49Nsttt wants to merge 6 commits intofeat/mintingfrom
Conversation
- Add R2 bucket bindings to wrangler.toml for all environments (local, dev, staging, production) - Create snapshots API handlers (upload/download/list/delete) with R2 storage - Extend WorkerClient with snapshot management methods - Integrate R2 upload into minting process with --upload-to-r2 CLI flag - Add prompt truncation (500KB limit) in benchmark execution to prevent D1 failures on long conversations - Update types and exports for snapshot metadata 🤖 Generated with Claude Code Co-Authored-By: Claude <noreply@anthropic.com>
The benchmarkLogger.completeRun() call was not awaited, causing runs to be submitted asynchronously while the minting process started. This resulted in empty runs when querying the database. 🤖 Generated with Claude Code Co-Authored-By: Claude <noreply@anthropic.com>
Adds retry with 1s delay (up to 3 attempts) when loading benchmark batch from Worker API to handle D1 eventual consistency issues. 🤖 Generated with Claude Code Co-Authored-By: Claude <noreply@anthropic.com>
Update submit.ts fallback to remove all specialist fields (enabled, name, version) when database columns don't exist, not just enabled. The proper fix is to run migration 0004 which adds these columns. 🤖 Generated with Claude Code Co-Authored-By: Claude <noreply@anthropic.com>
There was a problem hiding this comment.
Pull request overview
This PR implements Cloudflare R2 integration for cloud snapshot storage and fixes D1 database failures caused by oversized prompts in long conversations. The changes introduce snapshot upload capabilities via CLI and API, while addressing storage limits through intelligent prompt truncation.
- R2 Integration: New API endpoints and client methods enable uploading, downloading, listing, and deleting specialist snapshots in R2 cloud storage, accessible via
--upload-to-r2CLI flag - D1 Prompt Fix: Implements 500KB truncation limit preserving first 200KB and last 50KB with clear markers to prevent database insert failures
- Cache Removal: Eliminates local benchmark cache system in favor of API-based approach with improved batch statistics calculation
Reviewed changes
Copilot reviewed 54 out of 57 changed files in this pull request and generated 6 comments.
Show a summary per file
| File | Description |
|---|---|
| templates/enriched/0.0.*/nextjs-specialist.enriched.001.json5 | New enriched template versions with documentation metadata (contains hardcoded absolute paths) |
| snapshots/nextjs-specialist/0.0.1/*.json | Updated snapshot files with modified paths and timestamps |
| packages/worker-client/src/types.ts | Added snapshot-related TypeScript interfaces for R2 operations |
| packages/worker-client/src/client.ts | Implemented R2 snapshot methods (upload, download, list, delete) |
| packages/worker-client/src/logger.ts | Added computeBatchStats method for dynamic statistics calculation |
| packages/specialist-mint/src/mint.ts | Integrated R2 upload with error handling and metadata generation |
| packages/specialist-mint/src/cli.ts | Added --upload-to-r2 and --api-key CLI flags |
| packages/specialist-mint/src/types.ts | Extended types with evaluation details and R2 upload results |
| packages/specialist-mint/src/benchmark-loader.ts | Removed cache logic, improved API retry mechanism, added evaluation mapping |
| packages/harness/src/execution/benchmark.ts | Implemented 500KB prompt truncation with preserved boundaries |
| packages/harness/src/lib/benchmark-cache.ts | Deleted - cache functionality removed |
| packages/harness/src/interactive/benchmark.ts | Removed cache calls, switched to computed batch statistics |
| packages/logger/src/logger.ts | Removed benchmark cache logger |
| apps/worker/worker-configuration.d.ts | Added R2 SNAPSHOTS bucket binding |
| @@ -0,0 +1,815 @@ | |||
| { | |||
| $schema: '/Users/shanewalker/Developer/Zephyr/ze-benchmarks/packages/specialist-mint/src/schemas/template.schema.jsonc', | |||
There was a problem hiding this comment.
The $schema field contains an absolute path specific to a developer's machine. This pattern appears in multiple enriched template files (0.0.17, 0.0.16, 0.0.15, 0.0.14, 0.0.13) and should be a relative path or schema identifier for portability.
| "snapshot_path": "/Users/nsttt/conductor/workspaces/ze-benchmarks/salvador/snapshots/nextjs-specialist/0.0.1/snapshot-001.json5", | ||
| "template": { | ||
| "name": "nextjs-specialist", | ||
| "version": "0.0.1", | ||
| "path": "/Users/nitinkanchi/github/ze-benchmarks/templates/nextjs-specialist-template.jsonc", | ||
| "path": "/Users/nsttt/conductor/workspaces/ze-benchmarks/salvador/templates/nextjs-specialist-template.jsonc", | ||
| "is_enriched": false | ||
| }, | ||
| "benchmarks": { | ||
| "included": false, | ||
| "batch_id": "batch-1765353132204-18kcrn2eh" | ||
| "included": false | ||
| }, | ||
| "output": { | ||
| "directory": "/Users/nitinkanchi/github/ze-benchmarks/snapshots/nextjs-specialist/0.0.1", | ||
| "snapshot_file": "/Users/nitinkanchi/github/ze-benchmarks/snapshots/nextjs-specialist/0.0.1/snapshot-001.json5", | ||
| "metadata_file": "/Users/nitinkanchi/github/ze-benchmarks/snapshots/nextjs-specialist/0.0.1/snapshot-001.meta.json" | ||
| "directory": "/Users/nsttt/conductor/workspaces/ze-benchmarks/salvador/snapshots/nextjs-specialist/0.0.1", | ||
| "snapshot_file": "/Users/nsttt/conductor/workspaces/ze-benchmarks/salvador/snapshots/nextjs-specialist/0.0.1/snapshot-001.json5", | ||
| "metadata_file": "/Users/nsttt/conductor/workspaces/ze-benchmarks/salvador/snapshots/nextjs-specialist/0.0.1/snapshot-001.meta.json" |
There was a problem hiding this comment.
Snapshot metadata files contain absolute paths specific to different developer machines (e.g., /Users/nsttt/conductor/workspaces/... and /Users/nitinkanchi/github/...). These paths should be relative or omitted from version control to ensure portability and avoid exposing local directory structures.
| if (fullPromptSent.length > MAX_PROMPT_SIZE) { | ||
| // Keep first 200KB and last 50KB, with truncation marker in between | ||
| const keepStart = 200 * 1024; | ||
| const keepEnd = 50 * 1024; | ||
| const truncationMarker = `\n\n[... TRUNCATED ${((fullPromptSent.length - keepStart - keepEnd) / 1024).toFixed(0)}KB - original size: ${(fullPromptSent.length / 1024).toFixed(0)}KB ...]\n\n`; | ||
|
|
||
| promptSent = fullPromptSent.substring(0, keepStart) + | ||
| truncationMarker + | ||
| fullPromptSent.substring(fullPromptSent.length - keepEnd); |
There was a problem hiding this comment.
The prompt truncation implementation could potentially corrupt JSON structures if the truncation point falls within a JSON object or array. Consider implementing JSON-aware truncation or adding validation to ensure the truncated content remains valid JSON when parsed.
| // Step 8: Upload to R2 (optional) | ||
| if (options?.uploadToR2) { | ||
| log.debug(chalk.blue('\n☁️ Step 8: Uploading to R2...')); | ||
|
|
||
| try { | ||
| const client = new WorkerClient({ | ||
| workerUrl: options.workerUrl || process.env.ZE_BENCHMARKS_WORKER_URL, | ||
| apiKey: options.apiKey || process.env.ZE_BENCHMARKS_API_KEY | ||
| }); | ||
|
|
||
| // Check worker connectivity | ||
| const isHealthy = await client.healthCheck(); | ||
| if (!isHealthy) { | ||
| log.warn('⚠️ Worker API is not accessible, skipping R2 upload'); | ||
| result.r2 = { | ||
| uploaded: false, | ||
| error: 'Worker API not accessible' | ||
| }; | ||
| } else { | ||
| // Extract specialist name without scope for R2 key | ||
| const nameWithoutScope = template.name.includes('/') | ||
| ? template.name.split('/')[1]! | ||
| : template.name; | ||
|
|
||
| // Prepare R2 metadata | ||
| const r2Metadata = { | ||
| specialistName: nameWithoutScope, | ||
| specialistVersion: template.version, | ||
| snapshotId, | ||
| createdAt: new Date().toISOString(), | ||
| batchId: options.batchId, | ||
| templateVersion: template.version, | ||
| isEnriched, | ||
| runCount: benchmarkRuns?.length || 0, | ||
| avgScore: snapshot.benchmarks.aggregated_scores?.avg_score | ||
| }; | ||
|
|
||
| // Upload to R2 | ||
| const uploadResult = await client.uploadSnapshot(snapshot, r2Metadata); | ||
|
|
||
| log.debug(chalk.green(` ✓ Uploaded to R2: ${uploadResult.key}`)); | ||
| log.debug(chalk.gray(` R2 URL: ${uploadResult.url}`)); | ||
|
|
||
| result.r2 = { | ||
| uploaded: true, | ||
| key: uploadResult.key, | ||
| metadataKey: uploadResult.metadataKey, | ||
| url: uploadResult.url | ||
| }; | ||
| } | ||
| } catch (error) { | ||
| const errorMessage = error instanceof Error ? error.message : String(error); | ||
| log.warn(`⚠️ Failed to upload to R2: ${errorMessage}`); | ||
| result.r2 = { | ||
| uploaded: false, | ||
| error: errorMessage | ||
| }; | ||
| } | ||
| } |
There was a problem hiding this comment.
The R2 snapshot upload functionality lacks test coverage. Consider adding integration tests to verify upload success/failure handling, metadata generation, and error recovery scenarios.
| @@ -0,0 +1,841 @@ | |||
| { | |||
| $schema: '/Users/shanewalker/Developer/Zephyr/ze-benchmarks/packages/specialist-mint/src/schemas/template.schema.jsonc', | |||
There was a problem hiding this comment.
The $schema field contains an absolute path specific to a developer's machine (/Users/shanewalker/Developer/Zephyr/ze-benchmarks/...). This should be a relative path or a schema identifier to ensure portability across different development environments and systems.
| @@ -0,0 +1,828 @@ | |||
| { | |||
| $schema: '/Users/shanewalker/Developer/Zephyr/ze-benchmarks/packages/specialist-mint/src/schemas/template.schema.jsonc', | |||
There was a problem hiding this comment.
The $schema field contains an absolute path specific to a developer's machine (/Users/shanewalker/Developer/Zephyr/ze-benchmarks/...). This should be a relative path or a schema identifier to ensure portability across different development environments and systems.
Summary
Implements Cloudflare R2 integration for snapshot storage and fixes D1 conversation length issues:
--upload-to-r2flagImplementation Details
R2 Integration:
apps/worker/src/api/snapshots.tswith full CRUD operationsD1 Prompt Fix:
Testing
Ready for integration testing:
🤖 Generated with Claude Code