examples: add RAG failure diagnostics flow example (#20795) by onestardao · Pull Request #20818 · PrefectHQ/prefect

onestardao · 2026-02-24T05:37:27Z

Add a new example in examples/rag_failure_diagnostics.py that shows how to:

- instrument each stage of a simple RAG pipeline with Prefect tasks and logs
- surface signals that map incidents to common failure patterns (retrieval hallucination, retriever coverage issues, recall gaps, etc.)

This is a docs-only change that addresses the request in #20795.

Overview

This PR introduces a self-contained example for users running RAG or LLM pipelines with Prefect.

The example constructs a tiny FAQ knowledge base, applies naive chunking, uses a toy keyword-based retriever, and simulates an incorrect model answer. The flow logs diagnostics such as retrieval coverage, retrieved chunk IDs, missing or forbidden keywords, and prints possible failure patterns to investigate.

The goal is to give users a minimal, inspectable template for debugging RAG flows using Prefect’s task boundaries and logging.
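
As a rough illustration of the stages described in this overview, the chunking, keyword retrieval, and coverage signal might look like the sketch below. This is not the code from examples/rag_failure_diagnostics.py: the FAQ text and the names `tokenize`, `chunk_text`, `keyword_retrieve`, and `retrieval_coverage` are all hypothetical, and the functions are left as plain Python so the sketch runs without Prefect. In a flow, each one would presumably be wrapped in a task.

```python
from __future__ import annotations

import re

# Hypothetical FAQ text; the PR's actual knowledge base is not shown here.
FAQ = (
    "Refunds are available within 30 days of purchase. "
    "Shipping takes five business days. "
    "Support is reachable by email."
)


def tokenize(text: str) -> set[str]:
    """Lowercase the text and return its set of alphanumeric words."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def chunk_text(text: str) -> list[str]:
    """Naive chunking: one chunk per sentence."""
    return [s.strip() for s in text.split(".") if s.strip()]


def keyword_retrieve(query: str, chunks: list[str], top_k: int = 1) -> list[int]:
    """Rank chunks by keyword overlap with the query; return the top_k chunk IDs."""
    query_words = tokenize(query)
    scored = [(len(query_words & tokenize(c)), i) for i, c in enumerate(chunks)]
    # Highest overlap first; ties are broken by the lower chunk ID.
    ranked = sorted(scored, key=lambda pair: (-pair[0], pair[1]))
    return [i for score, i in ranked[:top_k] if score > 0]


def retrieval_coverage(expected: list[str], chunks: list[str], ids: list[int]) -> float:
    """Fraction of expected keywords that appear in the retrieved chunks."""
    if not expected:
        return 1.0
    retrieved_words: set[str] = set()
    for i in ids:
        retrieved_words |= tokenize(chunks[i])
    hits = sum(1 for kw in expected if kw.lower() in retrieved_words)
    return hits / len(expected)


chunks = chunk_text(FAQ)
ids = keyword_retrieve("How long do refunds take?", chunks)
coverage = retrieval_coverage(["refunds", "30", "days"], chunks, ids)
```

A low `coverage` value with a non-empty `ids` list is the kind of signal the flow could log before the answer is even generated.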

Checklist

- This pull request references the related issue by including: closes "Proposal: RAG flow failure analysis tutorial using WFGY 16-problem ProblemMap" #20795
- This pull request adds no new functionality and is a docs-only change, so no unit tests are required
- No docs files are removed and no redirect settings are needed in mint.json
- This pull request adds an example script; docstrings are included within the file where appropriate

onestardao · 2026-02-24T08:30:57Z

Thanks for the earlier guidance.
I’ve aligned the example with the existing Prefect examples and all checks are passing now.
Happy to adjust anything further if needed.

desertaxle

I left some comments, but overall, this example doesn't seem very interesting. I'm not sure how users would apply this to real-world use cases, and it doesn't showcase any Prefect features that help solve common RAG pipeline challenges. It's possible that RAG failure diagnostics and workflow orchestration are orthogonal, and an example for this isn't useful, but let us know if you have ideas to make this example more interesting.

desertaxle · 2026-02-25T14:46:33Z

examples/rag_failure_diagnostics.py

+from __future__ import annotations
+
+from dataclasses import dataclass
+from typing import Dict, List, Tuple


We prefer to use the built-in dict, list, and tuple for typing.
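
For illustration, the requested change amounts to replacing `typing.Dict`, `typing.List`, and `typing.Tuple` with the built-in generics available since Python 3.9 (PEP 585). The function below is a hypothetical stand-in, not taken from the PR:

```python
from __future__ import annotations


# Before (typing aliases):
#   List[str], Dict[str, float], Tuple[List[str], Dict[str, float]]
# After (built-in generics), with no import from typing needed:
def summarize_retrieval(
    retrieved_ids: list[str], expected_ids: list[str]
) -> tuple[list[str], dict[str, float]]:
    """Return the expected IDs that were missed, plus a coverage metric."""
    missing = [i for i in expected_ids if i not in retrieved_ids]
    coverage = 1.0 - len(missing) / len(expected_ids) if expected_ids else 1.0
    return missing, {"coverage": coverage}
```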

desertaxle · 2026-02-25T14:49:09Z

examples/rag_failure_diagnostics.py

+    logger.info("Diagnostics summary:")
+    logger.info("  retrieved_ids = %s", retrieved_ids)
+    logger.info(
+        "  retrieval_coverage = %.2f",
+        retrieval_metrics.get("coverage", 0.0),
+    )
+    logger.info("  missing_keywords_in_answer = %s", missing_keywords)
+    logger.info("  answer_contains_forbidden = %s", answer_contains_forbidden)
+
+    # Map observations to higher level failure patterns.
+    logger.info("Possible failure patterns to investigate:")


Could this use a Prefect artifact instead of a wall of log messages?

…ectHQ#20795)

- Update the RAG failure diagnostics example to emit a `rag-failure-diagnostics` markdown artifact instead of only relying on logs
- Keep logs focused on step-level events while the artifact summarizes a single query (coverage, retrieved_ids, missing keywords, and probable failure patterns)
- Switch typing in the example to use built-in generics (list, dict, tuple)
- Still a docs-only change that adds a self-contained RAG diagnostics flow addressing the request in PrefectHQ#20795

onestardao · 2026-02-26T07:48:25Z

Thanks a lot for the review and suggestions.

I’ve updated the example to:

- switch the typing to the built-in generics (list[str], dict[str, float], etc.), and
- emit a small rag-failure-diagnostics markdown artifact that summarizes a single run (query, retrieved_ids, retrieval coverage, missing keywords, and the inferred failure patterns).

The artifact shows up in the Prefect UI next to the flow run, so users can inspect the
RAG failure signals at a glance instead of scrolling through a long wall of log lines.
The example is still self-contained and docs-only, and the pattern is meant to be a
minimal template that teams can adapt to their own internal failure mode checklist.

Happy to tweak the artifact content or naming if you prefer something else.
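
A minimal sketch of what such an artifact could look like, assuming Prefect's `create_markdown_artifact` from `prefect.artifacts`. The helper names, the 0.5 coverage threshold, and the way each signal is mapped to a failure pattern are assumptions made for illustration, not the PR's actual logic. Prefect is imported lazily so the two pure helpers can be run and tested without it.

```python
from __future__ import annotations


def infer_failure_patterns(
    coverage: float, missing_keywords: list[str], contains_forbidden: bool
) -> list[str]:
    """Map raw observations to coarse failure patterns (mapping is an assumption)."""
    patterns: list[str] = []
    if coverage < 0.5:  # hypothetical threshold
        patterns.append("retriever coverage issue")
    if missing_keywords:
        patterns.append("recall gap")
    if contains_forbidden:
        patterns.append("retrieval hallucination")
    return patterns


def build_diagnostics_markdown(
    query: str,
    retrieved_ids: list[str],
    coverage: float,
    missing_keywords: list[str],
    patterns: list[str],
) -> str:
    """Render the summary for a single query as a small markdown document."""
    lines = [
        "# RAG failure diagnostics",
        "",
        f"- query: {query}",
        f"- retrieved_ids: {', '.join(retrieved_ids) or 'none'}",
        f"- retrieval_coverage: {coverage:.2f}",
        f"- missing_keywords_in_answer: {', '.join(missing_keywords) or 'none'}",
        "",
        "## Possible failure patterns to investigate",
        "",
    ]
    lines += [f"- {p}" for p in patterns] or ["- none detected"]
    return "\n".join(lines)


def publish_diagnostics(markdown: str) -> None:
    """Publish the summary as a Prefect markdown artifact (requires Prefect)."""
    from prefect.artifacts import create_markdown_artifact

    create_markdown_artifact(
        key="rag-failure-diagnostics",
        markdown=markdown,
        description="Diagnostics summary for a single RAG query",
    )
```

Calling `publish_diagnostics` from inside a task or flow would attach the artifact to that run, while the step-level logs stay short.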

onestardao requested review from chrisguidry, cicdw, desertaxle and zzstoatzz as code owners February 24, 2026 05:37

github-actions bot added the docs label Feb 24, 2026

onestardao added 5 commits February 24, 2026 14:04

Fix formatting in RAG diagnostics example

d6ef271

Fix formatting in rag_failure_diagnostics_flow example (avg_len + tra…

fa06249

…iling comma for query param).

Fix import order in RAG diagnostics example (ProblemMap No.X)

78779a7

- Align examples/rag_failure_diagnostics.py with pre-commit (ruff) import rules
- Let CI pass on the RAG failure diagnostics flow example

examples: align RAG failure diagnostics example with Prefect style

a1ea209

examples: sort imports in RAG failure diagnostics example

cdf6edf

desertaxle requested changes Feb 25, 2026

onestardao added 4 commits February 26, 2026 15:05

examples: add a self-contained RAG diagnostics flow example

c52c141

examples: add a simple RAG diagnostics and routing flow example (Pref…

55040e1

…ectHQ#20795)

Merge branch 'PrefectHQ:main' into main

b0d6050

onestardao wants to merge 10 commits into PrefectHQ:main from onestardao:main

2 participants
