Proposal - AI API Evaluation Framework -MVP #992

Aman071106 · 2025-12-28T15:34:55Z

PR Description

This PR addresses Issue #618 by introducing a Functional Prototype (MVP) for the AI API Evaluation Framework.

It establishes the foundational UI and logic to allow users to benchmark AI models (e.g., GPT-4 vs Gemini). The implementation includes a dedicated Evaluations page, a comprehensive creation wizard, and a results visualization engine.

Architecture Choice: Pure Dart Implementation

We intentionally chose a Pure Dart implementation to preserve the project's single-binary portability and architectural consistency.
Reasons:

No External Dependencies: Users do not need to install Python or manage virtual environments. The app remains "Download and Run".
Network Native: Dart's async/await model and Future APIs are well suited to non-blocking I/O, so API evaluations for multiple models can run concurrently.
Consistency: The entire codebase remains in a single language, simplifying maintenance and contribution.
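
To illustrate the concurrency point, here is a minimal sketch in plain Dart. It is not code from this PR: `mockEvaluate`, `evaluateAll`, the model names, and the seed are hypothetical, and the mock simply waits for a short random delay and returns a random score in place of a real API call.

```dart
import 'dart:async';
import 'dart:math';

/// Hypothetical stand-in for a real API call: waits for a short random
/// delay, then returns a random score in the range [0, 1).
Future<double> mockEvaluate(String model, Random rng) async {
  final delay = Duration(milliseconds: 50 + rng.nextInt(100));
  await Future<void>.delayed(delay);
  return rng.nextDouble();
}

/// Starts one mock evaluation per model and waits for all of them together,
/// so the total time is roughly that of the slowest call, not the sum.
Future<Map<String, double>> evaluateAll(
  List<String> models, {
  int seed = 42,
}) async {
  final rng = Random(seed);
  final scores = await Future.wait(models.map((m) => mockEvaluate(m, rng)));
  // Future.wait preserves input order, so scores line up with models.
  return Map.fromIterables(models, scores);
}

Future<void> main() async {
  final results = await evaluateAll(['model-a', 'model-b']);
  results.forEach(
    (model, score) => print('$model: ${score.toStringAsFixed(2)}'),
  );
}
```

In a real implementation, `mockEvaluate` would presumably be replaced by an HTTP request to the provider's API, while the `Future.wait` fan-out could stay the same.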

Data Flow

Input: The user configures a task (Task Type, Scoring, Batch Size) via the create dialog on the EvaluationsPage.
State Management: EvaluationsNotifier (Riverpod) manages the queue of evaluation tasks.
Execution (currently mocked; real API calls planned):
- Current: Simulates network delays and random scores.
- Future: Will utilize Dart's http package to call the respective AI APIs (OpenAI, Gemini, etc.) directly and compute metrics (e.g., Levenshtein distance, Exact Match) in-memory.
Visualization: The UI listens to state changes and renders the results in the Dual-View Results Card (table and charts).
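
The in-memory metrics mentioned for the future execution step could look roughly like the sketch below. This is an assumption about how they might be written, not the PR's code: the function names `exactMatch`, `levenshtein`, and `similarity` are hypothetical, whitespace trimming in Exact Match is an illustrative choice, and strings are compared by UTF-16 code unit.

```dart
import 'dart:math';

/// Exact Match: 1.0 when the trimmed strings are identical, otherwise 0.0.
double exactMatch(String predicted, String expected) =>
    predicted.trim() == expected.trim() ? 1.0 : 0.0;

/// Levenshtein distance using the two-row dynamic-programming formulation.
/// Compares UTF-16 code units, which is adequate for a sketch.
int levenshtein(String a, String b) {
  if (a.isEmpty) return b.length;
  if (b.isEmpty) return a.length;
  // prev[j] holds the distance between a[0..i-1) and b[0..j).
  var prev = List<int>.generate(b.length + 1, (j) => j);
  for (var i = 1; i <= a.length; i++) {
    final curr = List<int>.filled(b.length + 1, 0);
    curr[0] = i;
    for (var j = 1; j <= b.length; j++) {
      final cost = a.codeUnitAt(i - 1) == b.codeUnitAt(j - 1) ? 0 : 1;
      curr[j] = min(
        min(curr[j - 1] + 1, prev[j] + 1), // insertion, deletion
        prev[j - 1] + cost, // substitution or match
      );
    }
    prev = curr;
  }
  return prev[b.length];
}

/// Similarity in [0, 1]: 1.0 for identical strings, lower as edits increase.
double similarity(String a, String b) {
  final longest = max(a.length, b.length);
  if (longest == 0) return 1.0;
  return 1.0 - levenshtein(a, b) / longest;
}

void main() {
  print(levenshtein('kitten', 'sitting')); // 3
  print(exactMatch(' Paris ', 'Paris')); // 1.0
}
```

Normalizing the edit distance by the longer string's length keeps scores comparable across answers of different lengths, which may be convenient when charting results per model.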

Changes

lib/models/evaluation_model.dart: Added data models for EvaluationModel and EvaluationResult.
lib/providers/evaluation_providers.dart: Implemented a StateNotifier for managing the evaluation lifecycle.
lib/screens/evaluations/evaluations_page.dart:
- Implemented the List View for evaluations.
- Added the Create Dialog with "Batch Size", "Task Type", and "Scoring" configurations.
- Integrated a Dual-View Results Card (Table + Charts).
lib/screens/dashboard.dart: Added "Evaluations" to the navigation rail.

Related Issues

AI API Eval Framework #618

Screenshots

[Screenshots: Evaluations Tab, Create Dialog, Results Table, Results Charts]

Checklist

I have gone through the contributing guide
I have updated my branch and synced it with the project main branch before making this PR
I am using the latest Flutter stable branch
I have verified the fix manually (UI flows and Mock execution)

Added/updated tests?

No, and this is why: This is a UI/UX Prototype. Unit tests will be added when the real http execution logic is implemented in the next phase.

OS on which you have developed and tested the feature?

Windows

Proposal - AI API Evaluation Framework -MVP

d984d3a

Aman071106 mentioned this pull request Dec 28, 2025

AI API Eval Framework #618

Open
