docs: caching in ragas (#1779)
jjmachan authored Dec 24, 2024
1 parent e8f9232 commit 9403320
Showing 7 changed files with 392 additions and 5 deletions.
9 changes: 5 additions & 4 deletions Makefile
@@ -36,10 +36,11 @@ test-e2e: ## Run end2end tests
run-ci: format lint type test ## Running all CI checks

# Docs
rewrite-docs: ## Use GPT4 to rewrite the documentation
@echo "Rewriting the documentation in directory $(DIR)..."
$(Q)python $(GIT_ROOT)/docs/alphred.py --directory $(DIR)
docsite: ## Build and serve documentation
build-docsite: ## Convert notebooks to markdown and build the documentation site
@echo "convert ipynb notebooks to md files"
$(Q)python $(GIT_ROOT)/docs/ipynb_to_md.py
$(Q)mkdocs build
serve-docsite: ## Build and serve documentation
$(Q)mkdocs serve --dirtyreload

# Benchmarks
100 changes: 100 additions & 0 deletions docs/howtos/customizations/_caching.md
@@ -0,0 +1,100 @@
# Caching in Ragas

You can use caching to speed up your evaluations and testset generation by avoiding redundant computations. Ragas uses exact-match caching: responses from the LLM and Embedding models are cached and reused whenever an identical call is made again.
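
"Exact match" means the cache key is derived from the complete input to the model, so only identical calls are served from the cache. Conceptually, the keying works like the illustrative sketch below (this is not Ragas' actual key derivation, just the idea):

```python
import hashlib
import json


def exact_match_key(prompt: str, model: str, temperature: float) -> str:
    # Hash the full call signature; any change to the input yields a different key,
    # so only byte-for-byte identical calls hit the cache.
    payload = json.dumps(
        {"prompt": prompt, "model": model, "temperature": temperature},
        sort_keys=True,
    )
    return hashlib.sha256(payload.encode()).hexdigest()
```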

You can use the [DiskCacheBackend][ragas.cache.DiskCacheBackend], which uses a local disk cache to store the cached responses. You can also implement your own custom cacher by implementing the [CacheInterface][ragas.cache.CacheInterface].
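
For illustration, here is a minimal in-memory cacher. It's a sketch that assumes [CacheInterface][ragas.cache.CacheInterface] requires `get`, `set`, and `has_key` methods; check the [CacheInterface][ragas.cache.CacheInterface] reference for the exact signatures.

```python
from typing import Any

from ragas.cache import CacheInterface


class InMemoryCacheBackend(CacheInterface):
    """A minimal cacher that keeps responses in a plain dict (lost on process exit)."""

    def __init__(self):
        self._store: dict[str, Any] = {}

    def get(self, key: str) -> Any:
        return self._store.get(key)

    def set(self, key: str, value: Any) -> None:
        self._store[key] = value

    def has_key(self, key: str) -> bool:
        return key in self._store
```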


## Using DiskCacheBackend

Let's see how you can use [DiskCacheBackend][ragas.cache.DiskCacheBackend] with LLM and Embedding models.



```python
from ragas.cache import DiskCacheBackend

cacher = DiskCacheBackend()

# check if the cache is empty and clear it
print(len(cacher.cache))
cacher.cache.clear()
print(len(cacher.cache))
```




    DiskCacheBackend(cache_dir=.cache)
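
By default the cache is stored in a local `.cache` directory, as the repr above shows. To keep it somewhere else, pass a different `cache_dir` when constructing the backend:

```python
# store the cache in a custom directory instead of the default .cache
cacher = DiskCacheBackend(cache_dir=".my_ragas_cache")
```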



Create an LLM and Embedding model with the cacher. Here, I'm using `ChatOpenAI` from [langchain-openai](https://github.com/langchain-ai/langchain-openai) as an example.



```python
from langchain_openai import ChatOpenAI
from ragas.llms import LangchainLLMWrapper

cached_llm = LangchainLLMWrapper(ChatOpenAI(model="gpt-4o"), cache=cacher)
```
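
Embeddings can be cached the same way. As a sketch, assuming the embeddings wrapper accepts the same `cache` argument as the LLM wrapper:

```python
from langchain_openai import OpenAIEmbeddings
from ragas.embeddings import LangchainEmbeddingsWrapper

# reuse the same cacher so LLM and embedding responses share one cache directory
cached_embeddings = LangchainEmbeddingsWrapper(OpenAIEmbeddings(), cache=cacher)
```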


```python
# if you want to see the cache in action, set the logging level to debug
import logging
from ragas.utils import set_logging_level

set_logging_level("ragas.cache", logging.DEBUG)
```

Now let's run a simple evaluation.


```python
from ragas import evaluate
from ragas import EvaluationDataset

from ragas.metrics import FactualCorrectness, AspectCritic
from datasets import load_dataset

# Define Answer Correctness with AspectCritic
answer_correctness = AspectCritic(
name="answer_correctness",
definition="Is the answer correct? Does it match the reference answer?",
llm=cached_llm,
)

metrics = [answer_correctness, FactualCorrectness(llm=cached_llm)]

# load the dataset
dataset = load_dataset(
"explodinggradients/amnesty_qa", "english_v3", trust_remote_code=True
)
eval_dataset = EvaluationDataset.from_hf_dataset(dataset["eval"])

# evaluate the dataset
results = evaluate(
dataset=eval_dataset,
metrics=metrics,
)

results
```

This took almost 2 minutes to run on our local machine. Now let's run it again to see the cache in action.


```python
results = evaluate(
dataset=eval_dataset,
metrics=metrics,
)

results
```

This time it runs almost instantaneously.

You can also use this with testset generation by replacing the `generator_llm` with a cached version of it, as sketched below. Refer to the [testset generation](../../getstarted/rag_testset_generation.md) section for more details.
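
As a rough sketch, assuming the `TestsetGenerator` API from the getting-started guide (`docs` here is a stand-in for your own loaded documents):

```python
from ragas.testset import TestsetGenerator

# pass the cached wrappers so repeated generation runs reuse prior LLM/embedding calls
generator = TestsetGenerator(llm=cached_llm, embedding_model=cached_embeddings)
testset = generator.generate_with_langchain_docs(docs, testset_size=10)
```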
173 changes: 173 additions & 0 deletions docs/howtos/customizations/caching.ipynb
@@ -0,0 +1,173 @@
{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Caching in Ragas\n",
"\n",
"You can use caching to speed up your evaluations and testset generation by avoiding redundant computations. We use Exact Match Caching to cache the responses from the LLM and Embedding models.\n",
"\n",
"You can use the [DiskCacheBackend][ragas.cache.DiskCacheBackend] which uses a local disk cache to store the cached responses. You can also implement your own custom cacher by implementing the [CacheInterface][ragas.cache.CacheInterface].\n",
"\n",
"\n",
"## Using DefaultCacher\n",
"\n",
"Let's see how you can use the [DiskCacheBackend][ragas.cache.DiskCacheBackend] LLM and Embedding models.\n"
]
},
{
"cell_type": "code",
"execution_count": 1,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"DiskCacheBackend(cache_dir=.cache)"
]
},
"execution_count": 1,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"from ragas.cache import DiskCacheBackend\n",
"\n",
"cacher = DiskCacheBackend()\n",
"\n",
"# check if the cache is empty and clear it\n",
"print(len(cacher.cache))\n",
"cacher.cache.clear()\n",
"print(len(cacher.cache))"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Create an LLM and Embedding model with the cacher, here I'm using the `ChatOpenAI` from [langchain-openai](https://github.com/langchain-ai/langchain-openai) as an example.\n"
]
},
{
"cell_type": "code",
"execution_count": 2,
"metadata": {},
"outputs": [],
"source": [
"from langchain_openai import ChatOpenAI\n",
"from ragas.llms import LangchainLLMWrapper\n",
"\n",
"cached_llm = LangchainLLMWrapper(ChatOpenAI(model=\"gpt-4o\"), cache=cacher)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# if you want to see the cache in action, set the logging level to debug\n",
"import logging\n",
"from ragas.utils import set_logging_level\n",
"\n",
"set_logging_level(\"ragas.cache\", logging.DEBUG)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Now let's run a simple evaluation."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from ragas import evaluate\n",
"from ragas import EvaluationDataset\n",
"\n",
"from ragas.metrics import FactualCorrectness, AspectCritic\n",
"from datasets import load_dataset\n",
"\n",
"# Define Answer Correctness with AspectCritic\n",
"answer_correctness = AspectCritic(\n",
" name=\"answer_correctness\",\n",
" definition=\"Is the answer correct? Does it match the reference answer?\",\n",
" llm=cached_llm,\n",
")\n",
"\n",
"metrics = [answer_correctness, FactualCorrectness(llm=cached_llm)]\n",
"\n",
"# load the dataset\n",
"dataset = load_dataset(\n",
" \"explodinggradients/amnesty_qa\", \"english_v3\", trust_remote_code=True\n",
")\n",
"eval_dataset = EvaluationDataset.from_hf_dataset(dataset[\"eval\"])\n",
"\n",
"# evaluate the dataset\n",
"results = evaluate(\n",
" dataset=eval_dataset,\n",
" metrics=metrics,\n",
")\n",
"\n",
"results"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"This took almost 2mins to run in our local machine. Now let's run it again to see the cache in action."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"results = evaluate(\n",
" dataset=eval_dataset,\n",
" metrics=metrics,\n",
")\n",
"\n",
"results"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Runs almost instantaneously.\n",
"\n",
"You can also use this with testset generation also by replacing the `generator_llm` with a cached version of it. Refer to the [testset generation](../../getstarted/rag_testset_generation.md) section for more details."
]
}
],
"metadata": {
"kernelspec": {
"display_name": ".venv",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.10.15"
}
},
"nbformat": 4,
"nbformat_minor": 2
}
3 changes: 3 additions & 0 deletions docs/references/cache.md
@@ -0,0 +1,3 @@
::: ragas.cache
options:
members_order: "source"
4 changes: 4 additions & 0 deletions mkdocs.yml
@@ -77,6 +77,7 @@ nav:
- General:
- Customise models: howtos/customizations/customize_models.md
- Run Config: howtos/customizations/_run_config.md
- Caching: howtos/customizations/_caching.md
- Metrics:
- Modify Prompts: howtos/customizations/metrics/_modifying-prompts-metrics.md
- Adapt Metrics to Languages: howtos/customizations/metrics/_metrics_language_adaptation.md
@@ -88,6 +89,7 @@
- Persona Generation: howtos/customizations/testgenerator/_persona_generator.md
- Custom Single-hop Query: howtos/customizations/testgenerator/_testgen-custom-single-hop.md
- Custom Multi-hop Query: howtos/customizations/testgenerator/_testgen-customisation.md

- Applications:
- howtos/applications/index.md
- Metrics:
@@ -107,6 +109,7 @@
- Embeddings: references/embeddings.md
- RunConfig: references/run_config.md
- Executor: references/executor.md
- Cache: references/cache.md
- Evaluation:
- Schemas: references/evaluation_schema.md
- Metrics: references/metrics.md
@@ -237,3 +240,4 @@ extra_javascript:
- _static/js/header_border.js
- https://unpkg.com/mathjax@3/es5/tex-mml-chtml.js
- _static/js/toggle.js
- https://cdn.octolane.com/tag.js?pk=c7c9b2b863bf7eaf4e2a # octolane for analytics