-
Notifications
You must be signed in to change notification settings - Fork 30
Open
Labels
enhancementNew feature or requestNew feature or request
Description
Problem
Most users have exactly one API key. We need a simple test matrix to ensure basic functionality works for each major provider.
From Dec 13 Benchmarking Meeting:
"What it generated was not JavaScript I asked for, but an HTML page containing script tags with JavaScript embedded... I would like to see a simple matrix, a very simple prompts, probably just three different prompts, four different popular languages"
Proposed Matrix
| Provider | Language | Prompt Type |
|---|---|---|
| OpenAI | Python | Simple function |
| OpenAI | JavaScript | Simple function |
| Python | Simple function | |
| Anthropic | Python | Simple function |
Test Criteria
- Generation completes without errors
- Output is syntactically valid (parseable)
- Output is monolingual (not HTML with embedded JS when JS requested)
- Cost < $0.10 per test
Implementation
- Add to
tests/regression.shor newtests/quality_bar.sh - Run manually before releases (too expensive for every PR)
- Could be integrated into benchmarks/ directory structure
Key Insight from Meeting
"The biggest part of new user and our biggest fraction of our user base is people who have exactly one key... we should try to optimize for the simplest prompts when people are just starting out that they all basically work rather than disappoint right on the first step."
Metadata
Metadata
Assignees
Labels
enhancementNew feature or requestNew feature or request