examples: add RAG failure diagnostics flow example (#20795) #20818

onestardao wants to merge 10 commits into PrefectHQ:main
Conversation
Add a new example in examples/rag_failure_diagnostics.py that shows how to:

- instrument each stage of a simple RAG pipeline with Prefect tasks and logs
- surface signals that map incidents to common failure patterns (retrieval hallucination, retriever coverage issues, etc.)

This is a docs-only change that addresses the request in PrefectHQ#20795.
…iling comma for query param).
- Align examples/rag_failure_diagnostics.py with pre-commit (ruff) import rules
- Let CI pass on the RAG failure diagnostics flow example
Thanks for the earlier guidance.
desertaxle left a comment:
I left some comments, but overall, this example doesn't seem very interesting. I'm not sure how users would apply this to real-world use cases, and it doesn't showcase any Prefect features that help solve common RAG pipeline challenges. It's possible that RAG failure diagnostics and workflow orchestration are orthogonal, and an example for this isn't useful, but let us know if you have ideas to make this example more interesting.
examples/rag_failure_diagnostics.py (outdated):

    from __future__ import annotations

    from dataclasses import dataclass
    from typing import Dict, List, Tuple
We prefer to use the built-in dict, list, and tuple for typing.
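The reviewer's preference can be sketched as follows. The dataclass and its fields are hypothetical, chosen only to contrast the deprecated `typing.Dict`/`List`/`Tuple` aliases with the built-in generics that are valid in annotations since Python 3.9:

```python
from dataclasses import dataclass


@dataclass
class RetrievalResult:
    # Built-in generics instead of typing.List[str], typing.Dict[str, float],
    # typing.Tuple[int, int]. Field names are illustrative, not from the PR.
    retrieved_ids: list[str]
    metrics: dict[str, float]
    span: tuple[int, int]


result = RetrievalResult(
    retrieved_ids=["faq-1", "faq-3"],
    metrics={"coverage": 0.5},
    span=(0, 2),
)
print(result.metrics["coverage"])  # -> 0.5
```

With `from __future__ import annotations` (already at the top of the example file), this style also works on older 3.x interpreters, since annotations are not evaluated at runtime.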
examples/rag_failure_diagnostics.py (outdated):

    logger.info("Diagnostics summary:")
    logger.info("  retrieved_ids = %s", retrieved_ids)
    logger.info(
        "  retrieval_coverage = %.2f",
        retrieval_metrics.get("coverage", 0.0),
    )
    logger.info("  missing_keywords_in_answer = %s", missing_keywords)
    logger.info("  answer_contains_forbidden = %s", answer_contains_forbidden)

    # Map observations to higher level failure patterns.
    logger.info("Possible failure patterns to investigate:")
Could this use a Prefect artifact instead of a wall of log messages?
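One way to act on this suggestion, sketched with a hypothetical helper (`build_diagnostics_markdown` and its parameters are not from the PR): render the per-query summary as a markdown string, then hand it to `create_markdown_artifact` from `prefect.artifacts` inside the flow. The sketch below builds only the string, so it runs without Prefect installed:

```python
def build_diagnostics_markdown(
    retrieved_ids: list[str],
    coverage: float,
    missing_keywords: list[str],
    patterns: list[str],
) -> str:
    """Render one query's diagnostics as a markdown summary.

    In the flow, this string would be passed to
    prefect.artifacts.create_markdown_artifact(
        key="rag-failure-diagnostics", markdown=...,
    ) so it appears next to the flow run in the Prefect UI.
    """
    lines = [
        "# RAG failure diagnostics",
        f"- retrieval_coverage: {coverage:.2f}",
        f"- retrieved_ids: {', '.join(retrieved_ids) or 'none'}",
        f"- missing_keywords_in_answer: {', '.join(missing_keywords) or 'none'}",
        "",
        "## Failure patterns to investigate",
    ]
    lines.extend(f"- {p}" for p in patterns or ["none identified"])
    return "\n".join(lines)


md = build_diagnostics_markdown(
    retrieved_ids=["faq-2"],
    coverage=0.33,
    missing_keywords=["refund"],
    patterns=["retriever coverage gap"],
)
print(md)
```

Keeping the rendering in a pure function keeps it unit-testable, while the artifact call stays a one-liner inside the flow.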
- Update the RAG failure diagnostics example to emit a `rag-failure-diagnostics` markdown artifact instead of only relying on logs
- Keep logs focused on step-level events while the artifact summarizes a single query (coverage, retrieved_ids, missing keywords, and probable failure patterns)
- Switch typing in the example to use built-in generics (list, dict, tuple)
- Still a docs-only change that adds a self-contained RAG diagnostics flow addressing the request in PrefectHQ#20795
Thanks a lot for the review and suggestions. I've updated the example to emit the markdown artifact and use built-in generics for typing.
The artifact shows up in the Prefect UI next to the flow run, so users can inspect the summary there instead of scanning logs. Happy to tweak the artifact content or naming if you prefer something else.
Overview
This PR introduces a self-contained example for users running RAG or LLM pipelines with Prefect.
The example constructs a tiny FAQ knowledge base, applies naive chunking, uses a toy keyword-based retriever, and simulates an incorrect model answer. The flow logs diagnostics such as retrieval coverage, retrieved chunk IDs, missing or forbidden keywords, and prints possible failure patterns to investigate.
The goal is to give users a minimal, inspectable template for debugging RAG flows using Prefect’s task boundaries and logging.
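The retriever and coverage signal described above can be sketched in plain Python (function names and the FAQ data are illustrative, not the PR's actual code; the real example wraps these steps in Prefect tasks so each stage gets its own logs and state):

```python
def retrieve(query: str, chunks: dict[str, str], top_k: int = 2) -> list[str]:
    """Toy keyword retriever: rank chunks by shared words with the query."""
    words = set(query.lower().split())
    ranked = sorted(
        chunks,
        key=lambda cid: len(words & set(chunks[cid].lower().split())),
        reverse=True,
    )
    return ranked[:top_k]


def coverage(query: str, chunks: dict[str, str], retrieved: list[str]) -> float:
    """Fraction of query words that appear in the retrieved chunks."""
    words = set(query.lower().split())
    retrieved_words = set(
        " ".join(chunks[cid].lower() for cid in retrieved).split()
    )
    return len(words & retrieved_words) / len(words) if words else 0.0


faq = {
    "faq-1": "refunds are processed within 5 business days",
    "faq-2": "shipping takes 3 to 7 days",
}
ids = retrieve("how long do refunds take", faq)
print(ids, coverage("how long do refunds take", faq, ids))
# Low coverage here flags a retriever coverage gap worth investigating.
```

A low coverage score for a query is exactly the kind of signal the flow logs, mapping one observation to a "retriever coverage issue" pattern.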