Problem Statement
Currently, running the amp-evaluation SDK locally requires writing Python scripts and managing configuration programmatically, which creates friction for rapid iteration and testing. Developers need a streamlined command-line interface to quickly evaluate agent traces and run experiments without boilerplate code.
Motivation
A dedicated CLI tool will:
- Reduce iteration time: Test evaluators instantly without writing Python scripts
- Improve developer experience: Simple commands for common evaluation workflows
- Enable CI/CD integration: Easy to incorporate into automated testing pipelines
- Lower barrier to entry: Make evaluation accessible to non-Python experts
Use Cases
1. Evaluate Exported Traces
Run evaluators against locally exported OTEL trace files (JSON format):

```bash
# Run all registered evaluators
amp-eval trace evaluate my_trace.json
# Run specific evaluators
amp-eval trace evaluate my_trace.json --evaluators latency,answer_relevancy
# Specify output format
amp-eval trace evaluate my_trace.json --output results.json --format json
# Use custom evaluator config
amp-eval trace evaluate my_trace.json --config evaluators.yaml
```
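For quick local testing it can help to fabricate a trace file by hand. The sketch below assumes the exported file follows the standard OTLP JSON layout (`resourceSpans` → `scopeSpans` → `spans`); the exact schema the CLI would accept is not defined in this proposal, so the field names and values are illustrative only.

```python
# Hypothetical helper: writes a minimal OTLP-style JSON trace to evaluate with
# `amp-eval trace evaluate my_trace.json`. The resourceSpans/scopeSpans/spans
# layout is an assumption based on the standard OTEL JSON export, not a
# confirmed amp-eval requirement.
import json

trace = {
    "resourceSpans": [{
        "resource": {"attributes": [
            {"key": "service.name", "value": {"stringValue": "my-agent"}},
        ]},
        "scopeSpans": [{
            "scope": {"name": "agent"},
            "spans": [{
                "traceId": "5b8aa5a2d2c872e8321cf37308d69df2",
                "spanId": "051581bf3cb55c13",
                "name": "agent.invoke",
                "startTimeUnixNano": "1700000000000000000",
                "endTimeUnixNano": "1700000001500000000",
                "attributes": [
                    {"key": "llm.response",
                     "value": {"stringValue": "Paris is the capital of France."}},
                ],
            }],
        }],
    }],
}

with open("my_trace.json", "w") as f:
    json.dump(trace, f, indent=2)
```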
2. Run Dataset Experiments
Execute experiment runs against local datasets:

```bash
# Run experiment with dataset
amp-eval experiment run my_dataset.csv --evaluators hallucination,toxicity
# Specify agent invoker
amp-eval experiment run my_dataset.csv --agent my_agent.py:invoke_agent
# Save detailed results
amp-eval experiment run my_dataset.csv --output-dir ./results --save-traces
# Resume failed experiment
amp-eval experiment run my_dataset.csv --resume ./results/experiment_123
```
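The `--agent my_agent.py:invoke_agent` flag implies a `module:function` entry point that the CLI loads and calls once per dataset row. The proposal does not define that contract, so the signature below is an assumption: a function that receives one record and returns the agent's answer.

```python
# my_agent.py -- hypothetical invoker referenced by `--agent my_agent.py:invoke_agent`.
# The row-in / answer-out signature is an assumption for illustration; the real
# contract would be defined by the CLI implementation.
from typing import Any, Dict


def invoke_agent(row: Dict[str, Any]) -> str:
    """Call the agent for one dataset row and return its answer."""
    question = row["question"]  # assumed dataset column name
    # ... invoke the real agent / LLM here ...
    return f"stub answer for: {question}"
```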
3. Evaluator Management
List and inspect available evaluators:

```bash
# List all registered evaluators
amp-eval evaluators list
# Show evaluator details
amp-eval evaluators info answer_relevancy
# Validate custom evaluator
amp-eval evaluators validate my_evaluator.py
```
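`amp-eval evaluators validate my_evaluator.py` suggests the CLI can load a user-supplied evaluator module and check that it exposes the expected interface. The SDK's evaluator contract is not spelled out here, so the sketch below uses a generic score-plus-reason return shape purely as an illustration of what such a file might contain.

```python
# my_evaluator.py -- hypothetical custom evaluator. The function name and the
# score/reason return shape are assumptions; the real interface would come from
# the amp-evaluation SDK.
from typing import Optional


def evaluate(output: str, expected: Optional[str] = None) -> dict:
    """Score one agent output between 0.0 and 1.0."""
    if not output.strip():
        return {"score": 0.0, "reason": "empty output"}
    if expected is not None and expected.lower() in output.lower():
        return {"score": 1.0, "reason": "expected answer found in output"}
    return {"score": 0.5, "reason": "no reference answer to compare against"}
```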
4. Configuration Management
Manage evaluation configurations:

```bash
# Initialize config template
amp-eval init --template experiment
# Validate configuration
amp-eval config validate evaluators.yaml
# Show current config
amp-eval config show
```

Proposed Commands
Command Structure

```
amp-eval <resource> <action> [arguments] [options]
```
Command Reference
| Command | Description |
|---|---|
| `amp-eval trace evaluate <file>` | Evaluate single trace file |
| `amp-eval trace batch <dir>` | Evaluate multiple traces in directory |
| `amp-eval experiment run <dataset>` | Run experiment against dataset |
| `amp-eval evaluators list` | List available evaluators |
| `amp-eval evaluators info <name>` | Show evaluator details |
| `amp-eval init` | Initialize configuration template |
| `amp-eval config validate <file>` | Validate config file |
| `amp-eval version` | Show CLI and SDK versions |
Acceptance Criteria
- Users can evaluate a trace file with a single command
- Users can run experiments against datasets without writing code
- CLI provides helpful error messages and validation
- Configuration can be specified via files or flags
- Output formats include console, JSON, and CSV
- Documentation includes usage examples for all commands
- CLI is packaged and installable via pip
- Exit codes are appropriate for CI/CD integration
- Tests cover core CLI functionality
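To make the exit-code and testing criteria concrete, a CI smoke test could shell out to the CLI and assert on return codes. This sketch assumes the conventional 0-on-success / non-zero-on-error behavior proposed above; the fixture path is a placeholder.

```python
# test_cli_smoke.py -- hedged sketch of a pytest-style smoke test for CI.
# Assumes exit code 0 on success and non-zero on errors; paths are placeholders.
import subprocess


def test_evaluate_valid_trace_exits_zero():
    result = subprocess.run(
        ["amp-eval", "trace", "evaluate", "tests/fixtures/my_trace.json"],
        capture_output=True, text=True,
    )
    assert result.returncode == 0, result.stderr


def test_missing_trace_file_exits_nonzero():
    result = subprocess.run(
        ["amp-eval", "trace", "evaluate", "does_not_exist.json"],
        capture_output=True, text=True,
    )
    assert result.returncode != 0
```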