A set of samples showcasing how to develop reliable and observable AI agents for production environments.

Azure-Samples/eval-driven-agents

Eval-Driven Agents: From Uncertainty to Reliability 🚀

Reducing uncertainty when changing AI apps or agents is key to their widespread adoption. Over the past decade, test-driven development (TDD) paved the way for building robust, maintainable software. In the next era, evaluation-driven development (Eval-Driven, or EDD) will play a pivotal role in ensuring that compound AI systems are reliable, observable, and maintainable in production.

This repository, eval-driven-agents, provides a series of samples and best practices to help developers and organizations confidently evolve their AI solutions. By integrating evaluation-driven methodologies—such as continuous evaluation, tracing, telemetry, and observability—teams can iterate rapidly, maintain high quality, and make data-driven improvements.

What’s Inside? 🌱

  • Incremental Complexity:
    Discover samples starting with basic function-calling agents with tracing, progressing towards comprehensive, fully instrumented systems.

  • Observability & Tracing:
    Gain visibility into model decisions, tool usage, system behaviors, costs, latency metrics, and other key performance indicators to diagnose issues quickly and refine AI performance.

  • Evaluation-Driven Workflows:
    Learn how to continuously evaluate changes through experimentation, measure their impact via automated CI/CD pipelines with GitHub Actions, and ensure that every update is a step toward greater reliability.
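
To make the three ideas above concrete, here is a minimal, standard-library-only sketch of a traced tool call plus an evaluation gate. It is not taken from the samples in this repository: the `Tracer` class, the `get_weather` tool, the keyword-matching `agent` stand-in, the eval cases, and the 0.9 threshold are all hypothetical names and values chosen for illustration. A real setup would likely use a tracing library and a model-backed agent instead.

```python
import time
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class Span:
    """One recorded tool invocation."""
    name: str
    latency_ms: float
    ok: bool


@dataclass
class Tracer:
    """Hypothetical in-memory tracer; collects one Span per traced call."""
    spans: list = field(default_factory=list)

    def traced(self, fn: Callable) -> Callable:
        def wrapper(*args, **kwargs):
            start = time.perf_counter()
            ok = False
            try:
                result = fn(*args, **kwargs)
                ok = True
                return result
            finally:
                # Record latency and success even if the tool raised.
                elapsed_ms = (time.perf_counter() - start) * 1000
                self.spans.append(Span(fn.__name__, elapsed_ms, ok))
        return wrapper


tracer = Tracer()


@tracer.traced
def get_weather(city: str) -> str:
    # Hypothetical tool backed by a fixed lookup table.
    return {"Paris": "rainy", "Cairo": "sunny"}.get(city, "unknown")


def agent(question: str) -> str:
    # Stand-in for a model: decides on a tool call by keyword matching.
    for city in ("Paris", "Cairo"):
        if city in question:
            return f"{city}: {get_weather(city)}"
    return "no tool call"


# Hypothetical eval set: (input, substring expected in the answer).
EVAL_CASES = [
    ("Weather in Paris?", "rainy"),
    ("Is Cairo hot today?", "sunny"),
    ("Tell me a joke", "no tool call"),
]


def run_evals(threshold: float = 0.9) -> tuple:
    """Return (pass_rate, gate_passed) for the eval set."""
    passed = sum(expected in agent(q) for q, expected in EVAL_CASES)
    rate = passed / len(EVAL_CASES)
    return rate, rate >= threshold
```

In a CI job such as a GitHub Actions step, the boolean returned by `run_evals()` could be converted into a nonzero exit code so that a change that lowers the pass rate fails the pipeline, while `tracer.spans` provides the per-tool latency data for diagnosis.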

Structure 📂

  • <subfolder>: Each folder highlights a specific capability or pattern (e.g., tracing, evaluations, experimentation, scenario testing), building on the fundamental concepts of Eval-Driven development.

As you explore these samples, you’ll see how Eval-Driven development transforms the way we approach building, testing, and deploying AI agents—ultimately driving more robust solutions and confident decision-making.
