Skip to content

feat: R2 snapshot upload and D1 prompt truncation#49

Open
Nsttt wants to merge 6 commits intofeat/mintingfrom
Nsttt/cloudflare-minting-fix
Open

feat: R2 snapshot upload and D1 prompt truncation#49
Nsttt wants to merge 6 commits intofeat/mintingfrom
Nsttt/cloudflare-minting-fix

Conversation

@Nsttt
Copy link
Member

@Nsttt Nsttt commented Dec 10, 2025

Summary

Implements Cloudflare R2 integration for snapshot storage and fixes D1 conversation length issues:

  • Added R2 bucket bindings to wrangler.toml for all environments
  • Created snapshot API handlers for upload/download/list/delete operations
  • Extended minting process with optional R2 upload via --upload-to-r2 flag
  • Implemented prompt truncation (500KB) to prevent D1 failures on long conversations

Implementation Details

R2 Integration:

  • New apps/worker/src/api/snapshots.ts with full CRUD operations
  • WorkerClient methods for snapshot management
  • CLI flag support for automatic cloud upload

D1 Prompt Fix:

  • Truncates prompts exceeding 500KB limit
  • Preserves first 200KB and last 50KB with truncation marker
  • Logs warnings when truncation occurs

Testing

Ready for integration testing:

  • Verify R2 buckets exist in all environments
  • Test snapshot upload with minting CLI
  • Validate prompt truncation in long-running benchmarks

🤖 Generated with Claude Code

Nsttt and others added 6 commits December 10, 2025 18:45
- Add R2 bucket bindings to wrangler.toml for all environments (local, dev, staging, production)
- Create snapshots API handlers (upload/download/list/delete) with R2 storage
- Extend WorkerClient with snapshot management methods
- Integrate R2 upload into minting process with --upload-to-r2 CLI flag
- Add prompt truncation (500KB limit) in benchmark execution to prevent D1 failures on long conversations
- Update types and exports for snapshot metadata

🤖 Generated with Claude Code

Co-Authored-By: Claude <noreply@anthropic.com>
The benchmarkLogger.completeRun() call was not awaited, causing runs
to be submitted asynchronously while the minting process started.
This resulted in empty runs when querying the database.

🤖 Generated with Claude Code

Co-Authored-By: Claude <noreply@anthropic.com>
Adds retry with 1s delay (up to 3 attempts) when loading benchmark
batch from Worker API to handle D1 eventual consistency issues.

🤖 Generated with Claude Code

Co-Authored-By: Claude <noreply@anthropic.com>
Update submit.ts fallback to remove all specialist fields (enabled,
name, version) when database columns don't exist, not just enabled.

The proper fix is to run migration 0004 which adds these columns.

🤖 Generated with Claude Code

Co-Authored-By: Claude <noreply@anthropic.com>
@Nsttt Nsttt marked this pull request as ready for review December 16, 2025 10:09
Copilot AI review requested due to automatic review settings December 16, 2025 10:09
Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR implements Cloudflare R2 integration for cloud snapshot storage and fixes D1 database failures caused by oversized prompts in long conversations. The changes introduce snapshot upload capabilities via CLI and API, while addressing storage limits through intelligent prompt truncation.

  • R2 Integration: New API endpoints and client methods enable uploading, downloading, listing, and deleting specialist snapshots in R2 cloud storage, accessible via --upload-to-r2 CLI flag
  • D1 Prompt Fix: Implements 500KB truncation limit preserving first 200KB and last 50KB with clear markers to prevent database insert failures
  • Cache Removal: Eliminates local benchmark cache system in favor of API-based approach with improved batch statistics calculation

Reviewed changes

Copilot reviewed 54 out of 57 changed files in this pull request and generated 6 comments.

Show a summary per file
File Description
templates/enriched/0.0.*/nextjs-specialist.enriched.001.json5 New enriched template versions with documentation metadata (contains hardcoded absolute paths)
snapshots/nextjs-specialist/0.0.1/*.json Updated snapshot files with modified paths and timestamps
packages/worker-client/src/types.ts Added snapshot-related TypeScript interfaces for R2 operations
packages/worker-client/src/client.ts Implemented R2 snapshot methods (upload, download, list, delete)
packages/worker-client/src/logger.ts Added computeBatchStats method for dynamic statistics calculation
packages/specialist-mint/src/mint.ts Integrated R2 upload with error handling and metadata generation
packages/specialist-mint/src/cli.ts Added --upload-to-r2 and --api-key CLI flags
packages/specialist-mint/src/types.ts Extended types with evaluation details and R2 upload results
packages/specialist-mint/src/benchmark-loader.ts Removed cache logic, improved API retry mechanism, added evaluation mapping
packages/harness/src/execution/benchmark.ts Implemented 500KB prompt truncation with preserved boundaries
packages/harness/src/lib/benchmark-cache.ts Deleted - cache functionality removed
packages/harness/src/interactive/benchmark.ts Removed cache calls, switched to computed batch statistics
packages/logger/src/logger.ts Removed benchmark cache logger
apps/worker/worker-configuration.d.ts Added R2 SNAPSHOTS bucket binding

@@ -0,0 +1,815 @@
{
$schema: '/Users/shanewalker/Developer/Zephyr/ze-benchmarks/packages/specialist-mint/src/schemas/template.schema.jsonc',
Copy link

Copilot AI Dec 16, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The $schema field contains an absolute path specific to a developer's machine. This pattern appears in multiple enriched template files (0.0.17, 0.0.16, 0.0.15, 0.0.14, 0.0.13) and should be a relative path or schema identifier for portability.

Copilot uses AI. Check for mistakes.
Comment on lines +3 to +16
"snapshot_path": "/Users/nsttt/conductor/workspaces/ze-benchmarks/salvador/snapshots/nextjs-specialist/0.0.1/snapshot-001.json5",
"template": {
"name": "nextjs-specialist",
"version": "0.0.1",
"path": "/Users/nitinkanchi/github/ze-benchmarks/templates/nextjs-specialist-template.jsonc",
"path": "/Users/nsttt/conductor/workspaces/ze-benchmarks/salvador/templates/nextjs-specialist-template.jsonc",
"is_enriched": false
},
"benchmarks": {
"included": false,
"batch_id": "batch-1765353132204-18kcrn2eh"
"included": false
},
"output": {
"directory": "/Users/nitinkanchi/github/ze-benchmarks/snapshots/nextjs-specialist/0.0.1",
"snapshot_file": "/Users/nitinkanchi/github/ze-benchmarks/snapshots/nextjs-specialist/0.0.1/snapshot-001.json5",
"metadata_file": "/Users/nitinkanchi/github/ze-benchmarks/snapshots/nextjs-specialist/0.0.1/snapshot-001.meta.json"
"directory": "/Users/nsttt/conductor/workspaces/ze-benchmarks/salvador/snapshots/nextjs-specialist/0.0.1",
"snapshot_file": "/Users/nsttt/conductor/workspaces/ze-benchmarks/salvador/snapshots/nextjs-specialist/0.0.1/snapshot-001.json5",
"metadata_file": "/Users/nsttt/conductor/workspaces/ze-benchmarks/salvador/snapshots/nextjs-specialist/0.0.1/snapshot-001.meta.json"
Copy link

Copilot AI Dec 16, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Snapshot metadata files contain absolute paths specific to different developer machines (e.g., /Users/nsttt/conductor/workspaces/... and /Users/nitinkanchi/github/...). These paths should be relative or omitted from version control to ensure portability and avoid exposing local directory structures.

Copilot uses AI. Check for mistakes.
Comment on lines +494 to +502
if (fullPromptSent.length > MAX_PROMPT_SIZE) {
// Keep first 200KB and last 50KB, with truncation marker in between
const keepStart = 200 * 1024;
const keepEnd = 50 * 1024;
const truncationMarker = `\n\n[... TRUNCATED ${((fullPromptSent.length - keepStart - keepEnd) / 1024).toFixed(0)}KB - original size: ${(fullPromptSent.length / 1024).toFixed(0)}KB ...]\n\n`;

promptSent = fullPromptSent.substring(0, keepStart) +
truncationMarker +
fullPromptSent.substring(fullPromptSent.length - keepEnd);
Copy link

Copilot AI Dec 16, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The prompt truncation implementation could potentially corrupt JSON structures if the truncation point falls within a JSON object or array. Consider implementing JSON-aware truncation or adding validation to ensure the truncated content remains valid JSON when parsed.

Copilot uses AI. Check for mistakes.
Comment on lines +165 to +223
// Step 8: Upload to R2 (optional)
if (options?.uploadToR2) {
log.debug(chalk.blue('\n☁️ Step 8: Uploading to R2...'));

try {
const client = new WorkerClient({
workerUrl: options.workerUrl || process.env.ZE_BENCHMARKS_WORKER_URL,
apiKey: options.apiKey || process.env.ZE_BENCHMARKS_API_KEY
});

// Check worker connectivity
const isHealthy = await client.healthCheck();
if (!isHealthy) {
log.warn('⚠️ Worker API is not accessible, skipping R2 upload');
result.r2 = {
uploaded: false,
error: 'Worker API not accessible'
};
} else {
// Extract specialist name without scope for R2 key
const nameWithoutScope = template.name.includes('/')
? template.name.split('/')[1]!
: template.name;

// Prepare R2 metadata
const r2Metadata = {
specialistName: nameWithoutScope,
specialistVersion: template.version,
snapshotId,
createdAt: new Date().toISOString(),
batchId: options.batchId,
templateVersion: template.version,
isEnriched,
runCount: benchmarkRuns?.length || 0,
avgScore: snapshot.benchmarks.aggregated_scores?.avg_score
};

// Upload to R2
const uploadResult = await client.uploadSnapshot(snapshot, r2Metadata);

log.debug(chalk.green(` ✓ Uploaded to R2: ${uploadResult.key}`));
log.debug(chalk.gray(` R2 URL: ${uploadResult.url}`));

result.r2 = {
uploaded: true,
key: uploadResult.key,
metadataKey: uploadResult.metadataKey,
url: uploadResult.url
};
}
} catch (error) {
const errorMessage = error instanceof Error ? error.message : String(error);
log.warn(`⚠️ Failed to upload to R2: ${errorMessage}`);
result.r2 = {
uploaded: false,
error: errorMessage
};
}
}
Copy link

Copilot AI Dec 16, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The R2 snapshot upload functionality lacks test coverage. Consider adding integration tests to verify upload success/failure handling, metadata generation, and error recovery scenarios.

Copilot uses AI. Check for mistakes.
@@ -0,0 +1,841 @@
{
$schema: '/Users/shanewalker/Developer/Zephyr/ze-benchmarks/packages/specialist-mint/src/schemas/template.schema.jsonc',
Copy link

Copilot AI Dec 16, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The $schema field contains an absolute path specific to a developer's machine (/Users/shanewalker/Developer/Zephyr/ze-benchmarks/...). This should be a relative path or a schema identifier to ensure portability across different development environments and systems.

Copilot uses AI. Check for mistakes.
@@ -0,0 +1,828 @@
{
$schema: '/Users/shanewalker/Developer/Zephyr/ze-benchmarks/packages/specialist-mint/src/schemas/template.schema.jsonc',
Copy link

Copilot AI Dec 16, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The $schema field contains an absolute path specific to a developer's machine (/Users/shanewalker/Developer/Zephyr/ze-benchmarks/...). This should be a relative path or a schema identifier to ensure portability across different development environments and systems.

Copilot uses AI. Check for mistakes.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants

Comments