This is an agentic, graph-based framework for evaluating the outputs of Large Language Models (LLMs) using hallucination detection techniques.
Built using LangGraph, the project orchestrates a flow of evaluation agents that process prompts, generate responses, assess hallucination risk, and log results in a configurable, scalable pipeline.
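Below is a minimal sketch of how such a pipeline can be wired together with LangGraph. The node names (`input_handler`, `model_runner`, `judge`) and the `EvalState` fields are illustrative assumptions, not the project's exact definitions.

```python
from typing import TypedDict

from langgraph.graph import StateGraph, START, END


class EvalState(TypedDict, total=False):
    """Shared state passed between agents (fields are assumptions)."""
    prompt: str
    ground_truth: str
    response: str
    verdict: str


def input_handler(state: EvalState) -> EvalState:
    # Normalize/prepare the incoming prompt before the model call.
    return {"prompt": state["prompt"].strip()}


def model_runner(state: EvalState) -> EvalState:
    # Placeholder: call the target model (e.g. GPT-4o) here.
    return {"response": f"<model answer to: {state['prompt']}>"}


def judge(state: EvalState) -> EvalState:
    # Placeholder: LLM-as-a-Judge hallucination check goes here.
    return {"verdict": "supported"}


builder = StateGraph(EvalState)
builder.add_node("input_handler", input_handler)
builder.add_node("model_runner", model_runner)
builder.add_node("judge", judge)
builder.add_edge(START, "input_handler")
builder.add_edge("input_handler", "model_runner")
builder.add_edge("model_runner", "judge")
builder.add_edge("judge", END)
graph = builder.compile()

result = graph.invoke({"prompt": "What is asyncio?", "ground_truth": "..."})
```

Each node returns a partial state update that LangGraph merges into the shared `EvalState`.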
- 🧱 Agent-based modular design using LangGraph
- ✅ Multiple hallucination detection methods (LLM-as-a-Judge first, more coming)
- ⚙️ Configurable via `config.yaml` (see the sketch after this list)
- 🧪 Load test prompts and ground truths from JSON files
- 📊 Tracks tokens, latency, model version, and evaluation scores
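As a hedged illustration of the config-driven setup, the snippet below loads `config.yaml` and pulls out a few plausible settings. The key names (`model`, `judge_model`, `prompt_file`) are assumptions and may not match the project's actual schema.

```python
import yaml  # PyYAML


def load_config(path: str = "config.yaml") -> dict:
    """Read the pipeline configuration from a YAML file."""
    with open(path, "r", encoding="utf-8") as fh:
        return yaml.safe_load(fh)


config = load_config()
# Hypothetical keys; the real config.yaml may differ.
target_model = config.get("model", "gpt-4o")
judge_model = config.get("judge_model", "gpt-4o")
prompt_file = config.get("prompt_file", "prompts.json")
```

Test prompts and their ground truths are kept in JSON files shaped like the example below.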
```json
{
  "prompt": "What is the difference between synchronous and asynchronous programming in Python?",
  "ground_truth": "...",
  "model": "gpt-4o",
  "metadata": {
    "use_case": "technical explanation",
    "ground_truth_type": "text"
  }
}
```
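A sketch of how a file of such records might be loaded and pushed through the compiled graph, assuming the field names shown above and a `graph` object like the one sketched earlier (the file name is hypothetical):

```python
import json


def load_test_cases(path: str) -> list[dict]:
    """Read test prompts and ground truths from a JSON file.

    Assumes the file holds a JSON array of records shaped like the example above.
    """
    with open(path, "r", encoding="utf-8") as fh:
        return json.load(fh)


for case in load_test_cases("prompts.json"):
    state = {
        "prompt": case["prompt"],
        "ground_truth": case["ground_truth"],
    }
    result = graph.invoke(state)
    print(case["metadata"]["use_case"], "->", result.get("verdict"))
```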
- LangGraph setup with typed shared state (`EvalState`)
- Input handler agent to prepare prompt state
- Model runner agent to call OpenAI (GPT-4o)
- Config-driven architecture via `config.yaml`
- Hallucination detection via LLM-as-a-Judge (OpenAI); a minimal sketch appears at the end of this section
- JSON prompt ingestion for modular test cases
- Add hallucination detection via embedding similarity (see the sketch after this list)
- Telemetry logging agent (latency, token usage, verdicts)
- Add Vectara hallucination evaluation model (Hugging Face)
- Evaluation metrics agent (tone, coherence, completeness scoring)
- CLI or batch runner for test prompt files
- Result logger (to JSONL or CSV)
- Dashboard or Streamlit front-end for visualizing evaluations
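The LLM-as-a-Judge check listed under the completed work above can be sketched as follows. The judge instructions and verdict labels are assumptions, not the project's exact wording; the snippet uses the official `openai` Python SDK.

```python
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

JUDGE_INSTRUCTIONS = (
    "You are a strict fact checker. Given a ground truth and a model "
    "response, answer with exactly one word: SUPPORTED if the response "
    "is consistent with the ground truth, HALLUCINATED otherwise."
)


def judge_response(response: str, ground_truth: str, model: str = "gpt-4o") -> str:
    """Ask an LLM judge whether the response contradicts the ground truth."""
    completion = client.chat.completions.create(
        model=model,
        messages=[
            {"role": "system", "content": JUDGE_INSTRUCTIONS},
            {"role": "user", "content": f"Ground truth:\n{ground_truth}\n\nResponse:\n{response}"},
        ],
        temperature=0,
    )
    return completion.choices[0].message.content.strip()
```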
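For the planned embedding-similarity detector, one possible shape is shown below. This is an assumption-level sketch: it uses OpenAI's `text-embedding-3-small` model, plain cosine similarity, and an arbitrary threshold.

```python
import math

from openai import OpenAI

client = OpenAI()


def embed(text: str) -> list[float]:
    """Embed a piece of text with an OpenAI embedding model."""
    result = client.embeddings.create(model="text-embedding-3-small", input=text)
    return result.data[0].embedding


def cosine_similarity(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm


def similarity_verdict(response: str, ground_truth: str, threshold: float = 0.8) -> str:
    """Flag a response whose embedding drifts far from the ground truth.

    The 0.8 threshold is an illustrative default and should be tuned empirically.
    """
    score = cosine_similarity(embed(response), embed(ground_truth))
    return "supported" if score >= threshold else "possible hallucination"
```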