
Single-Key Quality Bar Test Matrix #194

@gltanaka


Problem

Most users have exactly one API key. We need a simple test matrix to verify that basic code generation works with each major provider when only that provider's key is available.

From the Dec 13 benchmarking meeting:

"What it generated was not JavaScript I asked for, but an HTML page containing script tags with JavaScript embedded... I would like to see a simple matrix, a very simple prompts, probably just three different prompts, four different popular languages"

Proposed Matrix

Provider    Language    Prompt Type
---------   ----------  ---------------
OpenAI      Python      Simple function
OpenAI      JavaScript  Simple function
Google      Python      Simple function
Anthropic   Python      Simple function
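Encoding the matrix as data makes the single-key scenario easy to reproduce: a run only exercises the rows whose provider key is present. This is a sketch under assumptions; `MatrixCell`, `runnable_cells`, and the environment-variable names in `KEY_ENV_VARS` are hypothetical and should be matched to whatever the tool actually reads.

```python
import os
from dataclasses import dataclass
from typing import List, Mapping, Optional


@dataclass(frozen=True)
class MatrixCell:
    provider: str
    language: str
    prompt_type: str


# The four rows of the proposed matrix.
MATRIX = [
    MatrixCell("OpenAI", "Python", "Simple function"),
    MatrixCell("OpenAI", "JavaScript", "Simple function"),
    MatrixCell("Google", "Python", "Simple function"),
    MatrixCell("Anthropic", "Python", "Simple function"),
]

# ASSUMPTION: hypothetical env var per provider; adjust to what the tool reads.
KEY_ENV_VARS = {
    "OpenAI": "OPENAI_API_KEY",
    "Google": "GOOGLE_API_KEY",
    "Anthropic": "ANTHROPIC_API_KEY",
}


def runnable_cells(env: Optional[Mapping[str, str]] = None) -> List[MatrixCell]:
    """Return only the cells whose provider key is set (non-empty)."""
    env = os.environ if env is None else env
    return [cell for cell in MATRIX if env.get(KEY_ENV_VARS[cell.provider])]
```

For example, a user with only an OpenAI key would get the two OpenAI rows, while a Google-only or Anthropic-only user would get one row each.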

Test Criteria

  • Generation completes without errors
  • Output is syntactically valid (parseable)
  • Output is monolingual (e.g., not an HTML page with embedded JavaScript when plain JavaScript was requested)
  • Cost < $0.10 per test
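The criteria above can be checked mechanically. A minimal sketch, assuming the generator's raw text is available as a string that may be wrapped in a single Markdown code fence, and that the per-test cost is computed elsewhere and passed in. `check_output`, `strip_code_fence`, and the HTML marker list are hypothetical names and heuristics, not part of any existing harness; the JavaScript check shells out to `node --check` and reports `None` (unchecked) when Node.js is not installed.

```python
import ast
import re
import shutil
import subprocess
import tempfile
from pathlib import Path
from typing import Optional

# Heuristic markers suggesting the model wrapped code in an HTML page.
HTML_MARKERS = re.compile(r"<!doctype\s+html|<html\b|<body\b|<script\b", re.IGNORECASE)
FENCE = re.compile(r"^\s*```[\w+-]*\n(.*?)\n```\s*$", re.DOTALL)


def strip_code_fence(text: str) -> str:
    """Remove one surrounding ```lang ... ``` fence if present."""
    match = FENCE.match(text)
    return match.group(1) if match else text


def is_valid_python(code: str) -> bool:
    try:
        ast.parse(code)
    except (SyntaxError, ValueError):  # ValueError: e.g. null bytes in source
        return False
    return True


def is_valid_javascript(code: str) -> Optional[bool]:
    node = shutil.which("node")
    if node is None:
        return None  # cannot check syntax without Node.js
    with tempfile.TemporaryDirectory() as tmp:
        path = Path(tmp) / "candidate.js"
        path.write_text(code)
        # `node --check` parses the file without executing it.
        result = subprocess.run([node, "--check", str(path)], capture_output=True)
    return result.returncode == 0


def check_output(language: str, raw_output: str, cost_usd: float,
                 max_cost_usd: float = 0.10) -> dict:
    code = strip_code_fence(raw_output)
    if language == "python":
        syntax_ok = is_valid_python(code)
    elif language == "javascript":
        syntax_ok = is_valid_javascript(code)
    else:
        raise ValueError(f"unsupported language: {language}")
    monolingual = HTML_MARKERS.search(code) is None
    cost_ok = cost_usd < max_cost_usd
    return {
        "syntax_ok": syntax_ok,
        "monolingual": monolingual,
        "cost_ok": cost_ok,
        # An unchecked syntax result (None) does not count as a pass.
        "passed": syntax_ok is True and monolingual and cost_ok,
    }
```

The HTML check is deliberately crude: it can flag a Python string literal that happens to contain `<script`, so a failure is a prompt for manual review rather than proof of a bad generation.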

Implementation

  • Add to tests/regression.sh or to a new tests/quality_bar.sh
  • Run manually before releases (too expensive for every PR)
  • Could instead be integrated into the benchmarks/ directory structure
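Because the suite is meant to run manually before releases, one option is an explicit opt-in guard so it never runs by accident in CI. This is a hedged sketch: the `RUN_QUALITY_BAR` variable, `run_quality_bar`, and the `run_case` callback (which would call the provider, apply the test criteria, and return whether the case passed and what it cost) are all hypothetical.

```python
import os
from typing import Callable, List, Mapping, Optional, Sequence, Tuple

# (provider, language, prompt_type), e.g. ("OpenAI", "Python", "Simple function")
Case = Tuple[str, str, str]
# Hypothetical callback: performs one generation and returns (passed, cost_usd).
RunCase = Callable[[Case], Tuple[bool, float]]


def run_quality_bar(cases: Sequence[Case], run_case: RunCase,
                    env: Optional[Mapping[str, str]] = None) -> int:
    """Run every case once; return a process exit code (0 = all passed or skipped)."""
    env = os.environ if env is None else env
    if env.get("RUN_QUALITY_BAR") != "1":
        # Opt-in guard: the suite spends real API money, so skip unless requested.
        print("quality bar skipped; set RUN_QUALITY_BAR=1 to run it")
        return 0

    failures: List[Tuple[Case, str]] = []
    total_cost = 0.0
    for case in cases:
        try:
            passed, cost = run_case(case)
        except Exception as exc:
            # "Generation completes without errors": any exception is a failure.
            failures.append((case, f"error: {exc}"))
            continue
        total_cost += cost
        if not passed:
            failures.append((case, "criteria not met"))

    for case, reason in failures:
        print(f"FAIL {' / '.join(case)}: {reason}")
    print(f"{len(cases) - len(failures)}/{len(cases)} passed, total cost ${total_cost:.2f}")
    return 1 if failures else 0
```

A wrapper such as tests/quality_bar.sh could call this from a small entry point and pass the return value to `sys.exit`, so a failing case fails the release check.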

Key Insight from Meeting

"The biggest part of new user and our biggest fraction of our user base is people who have exactly one key... we should try to optimize for the simplest prompts when people are just starting out that they all basically work rather than disappoint right on the first step."

Metadata

Assignees: No one assigned
Labels: enhancement (New feature or request)
Type: No type
Projects: No projects
Milestone: No milestone
Relationships: None yet
Development: No branches or pull requests
