@Dhravya Dhravya commented Dec 29, 2025

Implements configurable parallelism for benchmark phases to improve performance and throughput when running evaluations.

Changes

Core Features

  • Add ParallelExecutor utility for concurrent phase execution with graceful stop support
  • Implement per-phase parallelism configuration via CLI flags
  • Add atomic checkpoint saves with temp file + rename pattern to prevent corruption
  • Add flush() method to ensure all checkpoint writes complete before run completion

Phase Updates

  • Refactor answer, evaluate, indexing, ingest, and search phases to use ParallelExecutor
  • Replace sequential loops with parallel execution using configurable concurrency
  • Maintain per-question error handling and progress tracking
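The bounded-concurrency loop described above could look roughly like the sketch below. This is not the PR's actual ParallelExecutor. The function name `executeWithLimit`, the `Outcome` type, and the option names are assumptions made for illustration; only the idea of a `shouldStop()` check comes from the description.

```typescript
// Hypothetical sketch of a concurrency-limited executor with graceful stop.
// Names and signatures here are assumptions, not the PR's real API.
type Outcome<R> =
  | { status: "ok"; value: R }
  | { status: "error"; error: unknown }
  | { status: "skipped" }

interface ExecuteOptions {
  concurrency: number
  shouldStop?: () => boolean
}

async function executeWithLimit<T, R>(
  items: T[],
  task: (item: T, index: number) => Promise<R>,
  options: ExecuteOptions,
): Promise<Outcome<R>[]> {
  // Fall back to sequential execution for invalid values.
  const limit = Number.isFinite(options.concurrency)
    ? Math.max(1, Math.floor(options.concurrency))
    : 1
  // Items never picked up (for example after a stop request) stay "skipped".
  const results: Outcome<R>[] = items.map((): Outcome<R> => ({ status: "skipped" }))
  let next = 0

  async function worker(): Promise<void> {
    while (next < items.length) {
      // Graceful stop: finish in-flight work, but do not start new items.
      if (options.shouldStop?.()) return
      const index = next++
      try {
        results[index] = { status: "ok", value: await task(items[index], index) }
      } catch (error) {
        // Per-item error handling: one failure does not abort the others.
        results[index] = { status: "error", error }
      }
    }
  }

  const workerCount = Math.min(limit, items.length)
  await Promise.all(Array.from({ length: workerCount }, () => worker()))
  return results
}
```

Results keep the input order, so per-question progress and errors can still be reported by index.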

CLI Enhancements

  • Add --parallelism flag for default concurrency across all phases
  • Add phase-specific flags: --parallelism-{ingest,indexing,search,answer,evaluate}
  • Persist parallelism settings in the checkpoint and respect them on resume

Type System

  • Add ParallelismConfig type for phase-specific concurrency settings
  • Add resolveParallelism() helper to determine effective parallelism with fallbacks
  • Extend RunCheckpoint to store parallelism configuration
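A minimal sketch of how the fallback resolution might work. The field names match the phases used elsewhere in this PR, but the function signature and the exact fallback order (run phase, run default, provider phase, provider default, then 1) are assumptions for illustration.

```typescript
// Hypothetical sketch: the signature and fallback order are assumptions.
type Phase = "ingest" | "indexing" | "search" | "answer" | "evaluate"

interface ParallelismConfig {
  default?: number
  ingest?: number
  indexing?: number
  search?: number
  answer?: number
  evaluate?: number
}

function resolveParallelism(
  phase: Phase,
  runConfig?: ParallelismConfig,
  providerDefaults?: ParallelismConfig,
): number {
  // Most specific setting first, least specific last.
  const candidates = [
    runConfig?.[phase],
    runConfig?.default,
    providerDefaults?.[phase],
    providerDefaults?.default,
  ]
  for (const candidate of candidates) {
    // Ignore unset or invalid values (zero, negatives, fractions, NaN).
    if (candidate !== undefined && Number.isInteger(candidate) && candidate >= 1) {
      return candidate
    }
  }
  return 1 // sequential when nothing is configured
}
```

For example, `resolveParallelism("search", { default: 4 }, { search: 20 })` returns 4 under this ordering, because the run-level default wins over the provider's phase-specific value.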

Improvements

  • Thread-safe checkpoint saving prevents race conditions during parallel writes
  • Graceful shutdown support with shouldStop() checks in parallel execution
  • Progress logging maintains visibility during concurrent operations
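The temp file + rename pattern and flush() mentioned above can be sketched as follows. This is an illustration under assumptions, not the PR's code: the class name `CheckpointWriter`, the `save()` method, and the temp file naming are hypothetical. Writes are serialized through a promise chain, since Node.js runs this code on a single thread and the races come from interleaved async writes.

```typescript
import * as fs from "node:fs/promises"
import * as path from "node:path"

// Hypothetical sketch: names other than flush() are assumptions.
class CheckpointWriter {
  private readonly filePath: string
  private queue: Promise<void> = Promise.resolve()
  private seq = 0

  constructor(filePath: string) {
    this.filePath = filePath
  }

  // Queue a save; queued writes run one at a time, in call order.
  save(checkpoint: unknown): Promise<void> {
    // Serialize immediately so later mutations do not leak into this snapshot.
    const payload = JSON.stringify(checkpoint, null, 2)
    const run = this.queue.then(() => this.writeAtomically(payload))
    // Keep the chain usable after a failed write; the caller still sees the error via `run`.
    this.queue = run.catch(() => undefined)
    return run
  }

  // Resolves once every save queued so far has finished.
  flush(): Promise<void> {
    return this.queue
  }

  private async writeAtomically(payload: string): Promise<void> {
    await fs.mkdir(path.dirname(this.filePath), { recursive: true })
    // The temp file lives next to the target so the rename stays on one filesystem.
    const tmpPath = `${this.filePath}.${process.pid}.${this.seq++}.tmp`
    await fs.writeFile(tmpPath, payload, "utf8")
    // Readers see either the old file or the new one, never a partial write.
    await fs.rename(tmpPath, this.filePath)
  }
}
```

The rename is atomic on POSIX filesystems when source and target are on the same volume; behaviour on other platforms may differ.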

Testing

  • Tested with various parallelism configurations
  • Verified checkpoint integrity under concurrent writes
  • Confirmed graceful stop functionality works with parallel execution

export interface Provider {
  name: string
  prompts?: ProviderPrompts
  defaultParallelism?: ParallelismConfig
Contributor

@Dhravya code looks great!
"parallelism" feels vague and is also technically imprecise.
What we are doing is concurrency: single-threaded with async operations.

Popular libraries such as fastq and p-limit also use the term "concurrency".
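To illustrate the distinction, here is a small sketch (the function name `interleavingDemo` is made up for this example). Two async tasks take turns on a single thread, switching only at `await` points, which is concurrency rather than parallelism.

```typescript
// Illustrative only: shows single-threaded interleaving at await points.
async function interleavingDemo(): Promise<string[]> {
  const log: string[] = []

  const task = async (name: string): Promise<void> => {
    log.push(`${name}:start`)
    await Promise.resolve() // yield point: the other task gets to run here
    log.push(`${name}:end`)
  }

  // Both tasks are in flight at once, but only one runs at any instant.
  await Promise.all([task("a"), task("b")])
  return log
}

// interleavingDemo() resolves to ["a:start", "b:start", "a:end", "b:end"]
```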

Contributor

The ingest and indexing concurrency settings could be merged: if a provider can handle x ingest bandwidth, it can also handle x indexing requests. [This would minimise user friction.]

✅ Search should be separate, as providers can usually handle higher search bandwidth.

✅ Answer and evaluate should be separate [as we could customise the judge and answering models].

Comment on lines +466 to +548

<div className="mt-6 pt-4 border-t border-[#333333]">
  <button
    type="button"
    onClick={() => setShowPerformanceSettings(!showPerformanceSettings)}
    className="flex items-center gap-2 text-sm font-medium text-text-primary mb-3 hover:text-accent transition-colors"
  >
    <svg className={`w-4 h-4 transition-transform ${showPerformanceSettings ? "rotate-90" : ""}`} fill="none" viewBox="0 0 24 24" stroke="currentColor">
      <path strokeLinecap="round" strokeLinejoin="round" strokeWidth={2} d="M9 5l7 7-7 7" />
    </svg>
    Performance Settings
  </button>

  {showPerformanceSettings && (
    <div className="ml-6 space-y-3 p-4 bg-[#1a1a1a] border border-[#333333] rounded">
      <p className="text-xs text-text-muted">
        Configure parallelism for this run. Leave empty to use source run settings or provider defaults.
      </p>

      <div className="grid grid-cols-2 gap-4">
        <div>
          <label className="block text-sm font-medium text-text-primary mb-2">
            Default Parallelism
          </label>
          <input
            type="number"
            className="w-full px-3 py-2 text-sm bg-[#222222] border border-[#444444] rounded text-text-primary focus:outline-none focus:border-accent"
            value={form.parallelism.default ?? ""}
            onChange={(e) => setForm({
              ...form,
              parallelism: { ...form.parallelism, default: e.target.value ? parseInt(e.target.value, 10) : undefined }
            })}
            placeholder="1 (sequential)"
            min="1"
          />
          <p className="text-xs text-text-muted mt-1">Applies to all phases unless overridden</p>
        </div>

        <div className="flex items-end">
          <button
            type="button"
            onClick={() => setShowPerPhaseSettings(!showPerPhaseSettings)}
            className="text-sm text-accent hover:text-accent/80 transition-colors mb-2"
          >
            {showPerPhaseSettings ? "Hide" : "Show"} per-phase settings
          </button>
        </div>
      </div>

      {showPerPhaseSettings && (
        <div className="grid grid-cols-3 gap-3 pt-2 border-t border-[#333333]">
          {(["ingest", "indexing", "search", "answer", "evaluate"] as const).map(phase => (
            <div key={phase}>
              <label className="block text-xs font-medium text-text-secondary mb-1 capitalize">
                {phase}
              </label>
              <input
                type="number"
                className="w-full px-2 py-1.5 text-sm bg-[#222222] border border-[#444444] rounded text-text-primary focus:outline-none focus:border-accent"
                value={form.parallelism[phase] ?? ""}
                onChange={(e) => setForm({
                  ...form,
                  parallelism: { ...form.parallelism, [phase]: e.target.value ? parseInt(e.target.value, 10) : undefined }
                })}
                placeholder="—"
                min="1"
              />
            </div>
          ))}
        </div>
      )}

      <div className="flex items-start gap-2 p-3 bg-blue-500/5 border border-blue-500/20 rounded">
        <svg className="w-4 h-4 text-blue-400 mt-0.5 flex-shrink-0" fill="none" viewBox="0 0 24 24" stroke="currentColor">
          <path strokeLinecap="round" strokeLinejoin="round" strokeWidth={2} d="M13 16h-1v-4h-1m1-4h.01M21 12a9 9 0 11-18 0 9 9 0 0118 0z" />
        </svg>
        <div className="text-xs text-blue-200">
          <strong>Recommendations:</strong> Ingest/Indexing: 50-200, Search: 20-50, Answer/Evaluate: 10-20
        </div>
      </div>
    </div>
  )}
</div>
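Both number inputs in this hunk parse their value inline. One possible way to share that logic and reject invalid values is a small helper like the sketch below; the name `parseOptionalPositiveInt` is hypothetical and not part of this PR.

```typescript
// Hypothetical helper: turns a number input's raw string into a valid
// concurrency value, or undefined when the field is empty or invalid.
function parseOptionalPositiveInt(raw: string): number | undefined {
  const trimmed = raw.trim()
  if (trimmed === "") return undefined
  const value = Number.parseInt(trimmed, 10)
  // Rejects NaN (non-numeric input) as well as zero and negative values.
  return Number.isInteger(value) && value >= 1 ? value : undefined
}
```

An `onChange` handler could then set the field to `parseOptionalPositiveInt(e.target.value)`, so an empty or invalid input falls back to the source run settings or provider defaults.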
Contributor

@Prasanna721 Prasanna721 Dec 29, 2025

I don't know if this is useful in the UI: once the provider sets it in the code for the first time, they are unlikely to ever change it again.

If it is,

the UI can be simpler: a toggle button to enable parallelism, an icon to edit the value inline, and advanced settings that open as a dropdown on click (hiding the default number and icon).

[toggle button] Concurrent requests: 0 <pencil icon> [TAB space] [TAB space] [Advanced settings]
[small description text]
on dropdown
Ingest: 0 <pencil icon>
Search: 0 <pencil icon>
Answer: 0 <pencil icon>
Evaluate: 0 <pencil icon>

This follows the two-click rule for adding a feature.

Comment on lines +349 to +376

{run.parallelism && (run.parallelism.default !== undefined ||
  run.parallelism.ingest !== undefined ||
  run.parallelism.indexing !== undefined ||
  run.parallelism.search !== undefined ||
  run.parallelism.answer !== undefined ||
  run.parallelism.evaluate !== undefined) && (
  <div className="p-4 bg-[#1a1a1a] border border-[#333333] rounded">
    <h3 className="text-sm font-semibold text-text-primary mb-3">Performance Configuration</h3>
    <div className="grid grid-cols-6 gap-3 text-xs">
      {run.parallelism.default !== undefined && (
        <div>
          <span className="text-text-muted">Default:</span>
          <span className="ml-2 text-text-primary font-medium">{run.parallelism.default}</span>
        </div>
      )}
      {(["ingest", "indexing", "search", "answer", "evaluate"] as const).map(phase => (
        run.parallelism?.[phase] !== undefined && (
          <div key={phase}>
            <span className="text-text-muted capitalize">{phase}:</span>
            <span className="ml-2 text-text-primary font-medium">{run.parallelism[phase]}</span>
          </div>
        )
      ))}
    </div>
  </div>
)}

Contributor

We probably don't need to show it on the runId page.

Member Author

Yes, I also think that the UI itself doesn't look good xD

3 participants