Open
Description
Problem
Benchmark experiments (like Jihye's 135k lines) need a home for collaboration within the main repo.
From Dec 13 Benchmarking Meeting:
"We probably want to... put it in PDD... we probably want that in the main repo."
Proposed Structure
pdd/
└── benchmarks/
    ├── humaneval/              # Jihye's benchmark work
    ├── auto-regen/             # PDD self-regeneration benchmark
    ├── single-key-matrix/      # Simple quality bar tests
    ├── experiments/
    │   └── [contributor-name]/ # Individual experiments
    ├── data/
    │   └── benchmark_runs/     # Historical results (Git LFS)
    └── README.md
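One way to bootstrap this layout is a small script. The sketch below is only illustrative: the function name `scaffold_benchmarks`, the `.gitkeep` placeholder files, the stub README text, and the literal `contributor-name` directory (standing in for the `[contributor-name]` placeholder) are assumptions, not part of the proposal. It also assumes `benchmarks/` sits directly under the repository root.

```python
from pathlib import Path

# Directory names come from the proposed structure above.
# "contributor-name" stands in for the [contributor-name] placeholder.
BENCHMARK_DIRS = [
    "humaneval",
    "auto-regen",
    "single-key-matrix",
    "experiments/contributor-name",
    "data/benchmark_runs",
]


def scaffold_benchmarks(repo_root: Path) -> list[Path]:
    """Create the benchmarks/ tree under repo_root and return its directories."""
    base = repo_root / "benchmarks"
    created = []
    for rel in BENCHMARK_DIRS:
        path = base / rel
        path.mkdir(parents=True, exist_ok=True)
        # Git does not track empty directories, so drop in a placeholder file.
        (path / ".gitkeep").touch()
        created.append(path)
    readme = base / "README.md"
    if not readme.exists():
        # Stub only; the real README should explain how to run the benchmarks.
        readme.write_text("# Benchmarks\n", encoding="utf-8")
    return created
```

Calling `scaffold_benchmarks(Path("."))` from the repository root would create the tree. Running it again is harmless because existing directories and an existing README are left alone.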
Requirements
- Git LFS enabled for large data files in benchmarks/data/
- Clear README explaining how to run benchmarks
- Separate from core pdd/ source code
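The Git LFS requirement amounts to a tracking rule in `.gitattributes`. As a rough sketch, the helper below appends the same rule that `git lfs track "benchmarks/data/**"` would write. The helper name `ensure_lfs_rule` and the `benchmarks/data/**` pattern are assumptions for illustration; the pattern presumes `benchmarks/` sits at the repository root and would need adjusting otherwise.

```python
from pathlib import Path

# Rule format used by Git LFS in .gitattributes; the path pattern is an assumption.
LFS_RULE = "benchmarks/data/** filter=lfs diff=lfs merge=lfs -text"


def ensure_lfs_rule(repo_root: Path, rule: str = LFS_RULE) -> bool:
    """Append rule to .gitattributes if missing; return True if the file changed."""
    attrs = repo_root / ".gitattributes"
    existing = attrs.read_text(encoding="utf-8") if attrs.exists() else ""
    if rule in existing.splitlines():
        return False  # Rule already present, nothing to do.
    # Start on a fresh line if the existing file lacks a trailing newline.
    prefix = "" if existing == "" or existing.endswith("\n") else "\n"
    with attrs.open("a", encoding="utf-8") as fh:
        fh.write(f"{prefix}{rule}\n")
    return True
```

This only edits `.gitattributes`; contributors would still need Git LFS installed locally (`git lfs install`) for the rule to take effect.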
Benefits
- Centralized location for benchmark experiments
- Historical data preserved for time-series analysis
- Contributors can share work in their own subdirectories
- Enables CI/CD integration for quality bar testing
Related
- #152 (Metrics for regeneration) - auto-regen benchmark would live here
- The quality bar test matrix issue - single-key-matrix would live here