Implement CLI Tool for AMP Evaluation SDK #273

@nadheesh

Problem Statement

Running the amp-evaluation SDK locally currently requires writing Python scripts and managing configuration programmatically, which creates friction for rapid iteration and testing. Developers need a streamlined command-line interface to quickly evaluate agent traces and run experiments without boilerplate code.

Motivation

A dedicated CLI tool will:

  • Reduce iteration time: Test evaluators instantly without writing Python scripts
  • Improve developer experience: Simple commands for common evaluation workflows
  • Enable CI/CD integration: Easy to incorporate into automated testing pipelines
  • Lower barrier to entry: Make evaluation accessible to users who are not Python experts

Use Cases

1. Evaluate Exported Traces

Run evaluators against locally exported OTEL trace files (JSON format):

# Run all registered evaluators
amp-eval trace evaluate my_trace.json

# Run specific evaluators
amp-eval trace evaluate my_trace.json --evaluators latency,answer_relevancy

# Specify output format
amp-eval trace evaluate my_trace.json --output results.json --format json

# Use custom evaluator config
amp-eval trace evaluate my_trace.json --config evaluators.yaml
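
A minimal sketch of how the comma-separated --evaluators value could be resolved, assuming the SDK exposes some collection of registered evaluator names. The function select_evaluators and the KNOWN set are hypothetical and only illustrate the "run all by default, otherwise validate the requested names" behaviour described above; they are not the SDK's actual API.

```python
from typing import Iterable, List, Optional


def select_evaluators(arg: Optional[str], registered: Iterable[str]) -> List[str]:
    """Turn the raw --evaluators value into a validated list of names.

    `registered` stands in for whatever registry the SDK provides (assumption).
    """
    known = set(registered)
    if arg is None:
        # No flag given: run every registered evaluator, in a stable order.
        return sorted(known)
    names = [name.strip() for name in arg.split(",") if name.strip()]
    unknown = [name for name in names if name not in known]
    if unknown:
        # Fail early with a message that lists the valid choices.
        raise ValueError(
            f"unknown evaluator(s): {', '.join(unknown)}; "
            f"available: {', '.join(sorted(known))}"
        )
    return names


# Hypothetical evaluator names taken from the examples in this issue.
KNOWN = {"latency", "answer_relevancy", "hallucination", "toxicity"}
print(select_evaluators("latency,answer_relevancy", KNOWN))
```

Rejecting unknown names before any trace is loaded keeps the error close to the user's typo, which fits the "helpful error messages" criterion.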

2. Run Dataset Experiments

Execute experiment runs against local datasets:

# Run experiment with dataset
amp-eval experiment run my_dataset.csv --evaluators hallucination,toxicity

# Specify agent invoker
amp-eval experiment run my_dataset.csv --agent my_agent.py:invoke_agent

# Save detailed results
amp-eval experiment run my_dataset.csv --output-dir ./results --save-traces

# Resume failed experiment
amp-eval experiment run my_dataset.csv --resume ./results/experiment_123
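
The --agent value above uses a file.py:function form. One possible way to resolve it is sketched below using the standard importlib machinery; load_invoker is a hypothetical helper name, and the exact format the CLI will accept is an assumption based on the example command.

```python
import importlib.util
from pathlib import Path
from typing import Callable


def load_invoker(spec: str) -> Callable:
    """Resolve a '<file.py>:<function>' string to a callable (hypothetical helper)."""
    # Split on the last colon so paths that contain a colon still work.
    path_str, sep, func_name = spec.rpartition(":")
    if not sep or not path_str or not func_name:
        raise ValueError(f"expected '<file.py>:<function>', got {spec!r}")

    path = Path(path_str)
    if not path.is_file():
        raise FileNotFoundError(f"agent file not found: {path}")

    # Load the file as a module without requiring it to be on sys.path.
    module_spec = importlib.util.spec_from_file_location(path.stem, path)
    if module_spec is None or module_spec.loader is None:
        raise ImportError(f"cannot load a module from {path}")
    module = importlib.util.module_from_spec(module_spec)
    module_spec.loader.exec_module(module)

    func = getattr(module, func_name, None)
    if not callable(func):
        raise AttributeError(f"{func_name!r} is not a callable in {path}")
    return func
```

Note that loading the file executes its top-level code, so the CLI would be running user-supplied Python with the caller's permissions.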

3. Evaluator Management

List and inspect available evaluators:

# List all registered evaluators
amp-eval evaluators list

# Show evaluator details
amp-eval evaluators info answer_relevancy

# Validate custom evaluator
amp-eval evaluators validate my_evaluator.py

4. Configuration Management

Manage evaluation configurations:

# Initialize config template
amp-eval init --template experiment

# Validate configuration
amp-eval config validate evaluators.yaml

# Show current config
amp-eval config show

Proposed Commands

Command Structure

amp-eval <resource> <action> [arguments] [options]
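
The resource/action layout maps naturally onto nested subparsers. The sketch below uses argparse from the standard library purely as an illustration; the issue does not prescribe a CLI framework, and only a few of the proposed commands are wired up. Flag names follow the examples in this issue, while the help texts and the "console" default are assumptions.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Two-level parser: amp-eval <resource> <action> [arguments] [options]."""
    parser = argparse.ArgumentParser(prog="amp-eval")
    resources = parser.add_subparsers(dest="resource", required=True)

    # Resource: trace
    trace = resources.add_parser("trace", help="Work with exported trace files")
    trace_actions = trace.add_subparsers(dest="action", required=True)
    evaluate = trace_actions.add_parser("evaluate", help="Evaluate a single trace file")
    evaluate.add_argument("file", help="Path to an exported OTEL trace (JSON)")
    evaluate.add_argument("--evaluators", default=None, help="Comma-separated names")
    evaluate.add_argument("--output", default=None, help="Write results to this file")
    evaluate.add_argument("--format", choices=["console", "json", "csv"], default="console")
    evaluate.add_argument("--config", default=None, help="Evaluator config file")

    # Resource: evaluators
    evaluators = resources.add_parser("evaluators", help="Inspect evaluators")
    evaluator_actions = evaluators.add_subparsers(dest="action", required=True)
    evaluator_actions.add_parser("list", help="List available evaluators")
    info = evaluator_actions.add_parser("info", help="Show evaluator details")
    info.add_argument("name")

    return parser


# Parse an explicit argument list instead of sys.argv so the sketch is easy to try.
args = build_parser().parse_args(
    ["trace", "evaluate", "my_trace.json", "--evaluators", "latency,answer_relevancy"]
)
print(args.resource, args.action, args.file, args.evaluators, args.format)
```

Commands such as init and version do not have an action part, so they would be registered as top-level subcommands without a nested subparser.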

Command Reference

Command                              Description
amp-eval trace evaluate <file>       Evaluate single trace file
amp-eval trace batch <dir>           Evaluate multiple traces in directory
amp-eval experiment run <dataset>    Run experiment against dataset
amp-eval evaluators list             List available evaluators
amp-eval evaluators info <name>      Show evaluator details
amp-eval init                        Initialize configuration template
amp-eval config validate <file>      Validate config file
amp-eval version                     Show CLI and SDK versions

Acceptance Criteria

  • Users can evaluate a trace file with a single command
  • Users can run experiments against datasets without writing code
  • CLI provides helpful error messages and validation
  • Configuration can be specified via files or flags
  • Output formats include console, JSON, and CSV
  • Documentation includes usage examples for all commands
  • CLI is packaged and installable via pip
  • Exit codes are appropriate for CI/CD integration (for example, zero on success and non-zero on failure)
  • Tests cover core CLI functionality
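
To make the output-format and exit-code criteria concrete, here is a rough sketch. The result shape (a list of dicts with evaluator, score, and passed keys) is an assumption made for illustration, not the SDK's actual data model, and the pass/fail exit-code mapping is one possible convention.

```python
import csv
import io
import json
from typing import Dict, List

# Assumed result columns; the real SDK may expose different fields.
FIELDS = ["evaluator", "score", "passed"]


def render(results: List[Dict], fmt: str) -> str:
    """Render evaluation results as console text, JSON, or CSV."""
    if fmt == "json":
        return json.dumps(results, indent=2)
    if fmt == "csv":
        buf = io.StringIO()
        writer = csv.DictWriter(buf, fieldnames=FIELDS, lineterminator="\n")
        writer.writeheader()
        writer.writerows(results)
        return buf.getvalue()
    if fmt == "console":
        lines = []
        for row in results:
            status = "PASS" if row["passed"] else "FAIL"
            lines.append(f"{status}  {row['evaluator']}  score={row['score']}")
        return "\n".join(lines)
    raise ValueError(f"unsupported format: {fmt!r}")


def exit_code(results: List[Dict]) -> int:
    """Return 0 when every evaluator passed, 1 otherwise (assumed convention)."""
    return 0 if all(row["passed"] for row in results) else 1


# Made-up values, only to exercise the functions.
sample = [
    {"evaluator": "latency", "score": 0.9, "passed": True},
    {"evaluator": "toxicity", "score": 0.2, "passed": False},
]
print(render(sample, "console"))
print("exit code:", exit_code(sample))
```

A CI job can then gate on the process exit status alone, while the JSON or CSV output is kept as a build artifact for later inspection.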
