
@Aman071106
Contributor

PR Description

This PR addresses Issue #618 by introducing a Functional Prototype (MVP) for the AI API Evaluation Framework.

It establishes the foundational UI and logic for benchmarking AI models against each other (e.g., GPT-4 vs. Gemini). The implementation includes a dedicated Evaluations page, a creation dialog for configuring evaluation tasks, and a results view that presents scores as both a table and charts.

Architecture Choice: Pure Dart Implementation

We intentionally chose a pure Dart implementation to preserve the project's single-binary portability and architectural consistency.
Reasons:

  • No External Dependencies: Users do not need to install Python or manage virtual environments. The app remains "Download and Run".
  • Network Native: Dart's async/await and Future-based I/O make it well suited to issuing API requests to multiple models concurrently during an evaluation.
  • Consistency: The entire codebase remains in a single language, simplifying maintenance and contribution.
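To illustrate the concurrency point, here is a minimal sketch of fanning one prompt out to several models at once. It assumes a mocked call that mirrors the current prototype's behavior (simulated network delay, random score); `MockModelResult`, `callModelMock`, and `evaluateAll` are hypothetical names and are not taken from the PR's `EvaluationModel`/`EvaluationResult` code. A real version would replace the mock body with an HTTP request made through `package:http`.

```dart
import 'dart:math';

/// Hypothetical result of one model call; field names are illustrative only
/// and do not reflect the PR's actual EvaluationResult model.
class MockModelResult {
  final String model;
  final String output;
  final double score;
  final Duration latency;

  const MockModelResult(this.model, this.output, this.score, this.latency);
}

final Random _rng = Random();

/// Hypothetical stand-in for a real API call: waits a random 200 to 799 ms
/// and returns a random score in [0, 1), like the prototype's mock execution.
Future<MockModelResult> callModelMock(String model, String prompt) async {
  final latency = Duration(milliseconds: 200 + _rng.nextInt(600));
  await Future<void>.delayed(latency);
  return MockModelResult(
      model, 'mock answer to "$prompt"', _rng.nextDouble(), latency);
}

/// Sends the same prompt to every model concurrently. Future.wait starts all
/// calls before awaiting any of them, so the total wait is roughly the slowest
/// single call rather than the sum of all calls.
Future<List<MockModelResult>> evaluateAll(List<String> models, String prompt) {
  return Future.wait(models.map((m) => callModelMock(m, prompt)));
}
```

For example, `evaluateAll(['gpt-4', 'gemini-pro'], prompt)` completes in about the time of the slowest mock call, and `Future.wait` returns the results in the same order as the input model list, which keeps the mapping from model to score straightforward when rendering a results table.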

Data Flow

  1. Input: The user configures a task (Task Type, Scoring, Batch Size) in the Create dialog on the EvaluationsPage.
  2. State Management: EvaluationsNotifier (Riverpod) manages the queue of evaluation tasks.
  3. Execution (currently mocked; real API calls planned):
    • Current: Simulates network delays and random scores.
    • Future: Will utilize Dart's http package to call the respective AI APIs (OpenAI, Gemini, etc.) directly and compute metrics (e.g., Levenshtein distance, Exact Match) in-memory.
  4. Visualization: The UI listens to state changes and renders the results.
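The metrics named in the Execution step (Levenshtein distance, Exact Match) are simple enough to compute in memory in pure Dart. The sketch below is one possible implementation, not the PR's code: the function names are hypothetical, the Exact Match normalization (trim + lower-case) is an assumption, and the similarity score assumes Levenshtein distance is scaled to [0, 1] by the longer string's length.

```dart
import 'dart:math';

/// Exact Match: 1.0 if prediction and reference are equal after trimming
/// whitespace and lower-casing (an assumed normalization), otherwise 0.0.
double exactMatch(String prediction, String reference) =>
    prediction.trim().toLowerCase() == reference.trim().toLowerCase()
        ? 1.0
        : 0.0;

/// Levenshtein edit distance (insertion, deletion, substitution each cost 1),
/// computed over UTF-16 code units with two rolling rows, so memory is
/// O(length of b) instead of a full matrix.
int levenshtein(String a, String b) {
  if (a.isEmpty) return b.length;
  if (b.isEmpty) return a.length;
  var prev = List<int>.generate(b.length + 1, (j) => j);
  var curr = List<int>.filled(b.length + 1, 0);
  for (var i = 1; i <= a.length; i++) {
    curr[0] = i;
    for (var j = 1; j <= b.length; j++) {
      final cost = a.codeUnitAt(i - 1) == b.codeUnitAt(j - 1) ? 0 : 1;
      curr[j] = min(min(prev[j] + 1, curr[j - 1] + 1), prev[j - 1] + cost);
    }
    // Swap rows: the row just filled becomes the previous row.
    final tmp = prev;
    prev = curr;
    curr = tmp;
  }
  return prev[b.length];
}

/// Similarity in [0, 1]: 1 - distance / max(length). Two empty strings are
/// treated as identical.
double levenshteinSimilarity(String a, String b) {
  final maxLen = max(a.length, b.length);
  if (maxLen == 0) return 1.0;
  return 1.0 - levenshtein(a, b) / maxLen;
}
```

Returning a normalized similarity rather than the raw distance keeps both metrics on the same 0 to 1 scale, which would let the existing table and chart views plot them side by side.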

Changes

  • lib/models/evaluation_model.dart: Added data models for EvaluationModel and EvaluationResult.
  • lib/providers/evaluation_providers.dart: Implemented EvaluationsNotifier, a StateNotifier that manages the evaluation lifecycle.
  • lib/screens/evaluations/evaluations_page.dart:
    • Implemented the List View for evaluations.
    • Added the Create Dialog with "Batch Size", "Task Type", and "Scoring" configurations.
    • Integrated a Dual-View Results Card (Table + Charts).
  • lib/screens/dashboard.dart: Added "Evaluations" to the navigation rail.

Related Issues

  • Issue #618

Screenshots

  • Evaluations Tab: Screenshot 2025-12-28 210051
  • Create Dialog: Screenshot 2025-12-28 210101
  • Results Table: Screenshot 2025-12-28 210024
  • Results Charts: Screenshot 2025-12-28 210036

Checklist

  • I have gone through the contributing guide
  • I have updated my branch and synced it with the project main branch before making this PR
  • I am using the latest Flutter stable branch
  • I have verified the fix manually (UI flows and Mock execution)

Added/updated tests?

  • No, and this is why: this is a UI/UX prototype. Unit tests will be added when the real HTTP execution logic is implemented in the next phase.

OS on which you have developed and tested the feature?

  • Windows
