Commit

feat(generator): extend construction to any langchain LLM and Embeddings (#670)

## **User description**
The current version of `with_openai` hardcodes an instantiation of
`langchain_openai.chat_models.ChatOpenAI`. This restricts
`TestsetGenerator` to OpenAI chat models and makes it incompatible with
completion models, Azure OpenAI models, and open-source models.

This PR extends `TestsetGenerator` to any `BaseLanguageModel` and
`Embeddings` from langchain for versatility, addressing #230, #342,
#635, and #636.
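
As a rough illustration of the design change (not the actual ragas code), the sketch below shows a constructor that accepts any model objects instead of instantiating one itself. `SketchGenerator`, `TextModel`, `Embedder`, `EchoModel`, and `LengthEmbedder` are hypothetical stand-ins invented for this example; only the `from_langchain` name and its three arguments come from this PR.

```python
from dataclasses import dataclass
from typing import List, Protocol


class TextModel(Protocol):
    """Hypothetical stand-in for a langchain language model."""

    def invoke(self, prompt: str) -> str: ...


class Embedder(Protocol):
    """Hypothetical stand-in for a langchain embeddings object."""

    def embed_query(self, text: str) -> List[float]: ...


@dataclass
class SketchGenerator:
    """Toy generator that only stores the models it is given."""

    generator_llm: TextModel
    critic_llm: TextModel
    embeddings: Embedder

    @classmethod
    def from_langchain(cls, generator_llm, critic_llm, embeddings):
        # Nothing is instantiated here: the caller decides which models to use.
        return cls(generator_llm, critic_llm, embeddings)


class EchoModel:
    """Fake model used only to make the sketch runnable."""

    def invoke(self, prompt: str) -> str:
        return f"echo: {prompt}"


class LengthEmbedder:
    """Fake embedder that maps a text to its length."""

    def embed_query(self, text: str) -> List[float]:
        return [float(len(text))]


gen = SketchGenerator.from_langchain(EchoModel(), EchoModel(), LengthEmbedder())
```

Because the constructor depends only on the interface, swapping in an Azure or open-source model would not require touching the generator itself.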

Lastly, I've removed all occurrences of mutable default arguments
(an antipattern; see the Python Guide's notes on
[mutable default arguments](https://docs.python-guide.org/writing/gotchas/#mutable-default-arguments)).
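
For readers unfamiliar with the gotcha, here is a minimal, generic sketch of why mutable defaults are a problem and the usual `None`-sentinel fix. The function names `append_shared` and `append_fresh` are made up for this example and do not appear in the ragas code base.

```python
from typing import List, Optional


def append_shared(item: str, bucket: List[str] = []) -> List[str]:
    # The default list is created once, at function definition time,
    # so every call that omits `bucket` mutates the same object.
    bucket.append(item)
    return bucket


def append_fresh(item: str, bucket: Optional[List[str]] = None) -> List[str]:
    # Using None as a sentinel gives each call its own new list.
    if bucket is None:
        bucket = []
    bucket.append(item)
    return bucket


first = append_shared("a")
second = append_shared("b")
print(second)  # ['a', 'b'] because state leaked from the first call

print(append_fresh("a"))  # ['a']
print(append_fresh("b"))  # ['b']
```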

---------

Co-authored-by: Shahules786 <[email protected]>
Co-authored-by: jjmachan <[email protected]>
3 people authored Mar 12, 2024
1 parent 7c4e7e6 commit 746d723
Showing 13 changed files with 365 additions and 69 deletions.
2 changes: 1 addition & 1 deletion .gitignore
@@ -162,7 +162,7 @@ cython_debug/
# be found at https://github.com/github/gitignore/blob/main/Global/JetBrains.gitignore
# and can be added to the global gitignore or merged into this file. For a more nuclear
# option (not recommended) you can uncomment the following to ignore the entire idea folder.
#.idea/
.idea/

# Ragas specific
experiments/
11 changes: 6 additions & 5 deletions docs/alfred.py
@@ -1,14 +1,15 @@
from __future__ import annotations

import os
from collections import namedtuple
import argparse
import asyncio
from tqdm.asyncio import tqdm
import os
import typing as t
from langchain_openai.chat_models import ChatOpenAI
from langchain_core.language_models.chat_models import BaseChatModel
from collections import namedtuple

from langchain.prompts import ChatPromptTemplate
from langchain_core.language_models.chat_models import BaseChatModel
from langchain_openai.chat_models import ChatOpenAI
from tqdm.asyncio import tqdm

File = namedtuple("File", "name content")

11 changes: 10 additions & 1 deletion docs/concepts/testset_generation.md
@@ -60,11 +60,20 @@ Checkout [llama-index](https://gpt-index.readthedocs.io/en/stable/core_modules/d
:caption: Customising test data distribution
from ragas.testset.generator import TestsetGenerator
from ragas.testset.evolutions import simple, reasoning, multi_context
from langchain_openai import ChatOpenAI, OpenAIEmbeddings
# documents = load your documents
# generator with openai models
generator = TestsetGenerator.with_openai()
generator_llm = ChatOpenAI(model="gpt-3.5-turbo-16k")
critic_llm = ChatOpenAI(model="gpt-4")
embeddings = OpenAIEmbeddings()
generator = TestsetGenerator.from_langchain(
generator_llm,
critic_llm,
embeddings
)
# Change resulting question type distribution
distributions = {
11 changes: 10 additions & 1 deletion docs/getstarted/testset_generation.md
@@ -41,9 +41,18 @@ Now, we'll import and use Ragas' `TestsetGenerator` to quickly generate a synthe
:caption: Create 10 samples using default configuration
from ragas.testset.generator import TestsetGenerator
from ragas.testset.evolutions import simple, reasoning, multi_context
from langchain_openai import ChatOpenAI, OpenAIEmbeddings
# generator with openai models
generator = TestsetGenerator.with_openai()
generator_llm = ChatOpenAI(model="gpt-3.5-turbo-16k")
critic_llm = ChatOpenAI(model="gpt-4")
embeddings = OpenAIEmbeddings()
generator = TestsetGenerator.from_langchain(
generator_llm,
critic_llm,
embeddings
)
# generate testset
testset = generator.generate_with_langchain_docs(documents, test_size=10, distributions={simple: 0.5, reasoning: 0.25, multi_context: 0.25})
12 changes: 11 additions & 1 deletion docs/howtos/applications/compare_embeddings.md
@@ -29,14 +29,24 @@ For this tutorial notebook, I am using papers from Semantic Scholar that is rela
:caption: load documents using llama-hub and create test data
from llama_index import download_loader
from ragas.testset.evolutions import simple, reasoning, multi_context
from langchain_openai import ChatOpenAI, OpenAIEmbeddings
SemanticScholarReader = download_loader("SemanticScholarReader")
loader = SemanticScholarReader()
query_space = "large language models"
documents = loader.load_data(query=query_space, limit=100)
# generator with openai models
generator = TestsetGenerator.with_openai()
generator_llm = ChatOpenAI(model="gpt-3.5-turbo-16k")
critic_llm = ChatOpenAI(model="gpt-4")
embeddings = OpenAIEmbeddings()
generator = TestsetGenerator.from_langchain(
generator_llm,
critic_llm,
embeddings
)
distributions = {
simple: 0.5,
11 changes: 10 additions & 1 deletion docs/howtos/applications/compare_llms.md
@@ -35,6 +35,7 @@ from llama_index import download_loader, SimpleDirectoryReader
from ragas.testset import TestsetGenerator
from ragas.testset.generator import TestsetGenerator
from ragas.testset.evolutions import simple, reasoning, multi_context
from langchain_openai import ChatOpenAI, OpenAIEmbeddings
os.environ['OPENAI_API_KEY'] = 'Your OPEN AI key'
Expand All @@ -43,7 +44,15 @@ reader = SimpleDirectoryReader("./arxiv-papers/",num_files_limit=30)
documents = reader.load_data()
# generator with openai models
generator = TestsetGenerator.with_openai()
generator_llm = ChatOpenAI(model="gpt-3.5-turbo-16k")
critic_llm = ChatOpenAI(model="gpt-4")
embeddings = OpenAIEmbeddings()
generator = TestsetGenerator.from_langchain(
generator_llm,
critic_llm,
embeddings
)
distributions = {
simple: 0.5,
11 changes: 10 additions & 1 deletion docs/howtos/applications/use_prompt_adaptation.md
@@ -125,9 +125,18 @@ Now we can import all the required evolutions and adapt it using `generator.adap
from ragas.testset.generator import TestsetGenerator
from ragas.testset.evolutions import simple, reasoning, multi_context,conditional
from langchain_openai import ChatOpenAI, OpenAIEmbeddings
# generator with openai models
generator = TestsetGenerator.with_openai()
generator_llm = ChatOpenAI(model="gpt-3.5-turbo-16k")
critic_llm = ChatOpenAI(model="gpt-4")
embeddings = OpenAIEmbeddings()
generator = TestsetGenerator.from_langchain(
generator_llm,
critic_llm,
embeddings
)
# adapt to language
language = "hindi"
77 changes: 75 additions & 2 deletions docs/howtos/customisations/azure-openai.ipynb
@@ -7,7 +7,11 @@
"source": [
"# Using Azure OpenAI\n",
"\n",
"This tutorial will show you how to use Azure OpenAI endpoints instead of OpenAI endpoints."
"This tutorial will show you how to use Azure OpenAI endpoints instead of OpenAI endpoints.\n",
"\n",
"\n",
"- [Evaluation](#load-sample-dataset)\n",
"- [Test set generation](#test-set-generation)"
]
},
{
@@ -416,6 +420,75 @@
"\n",
"if you have any suggestion/feedbacks/things your not happy about, please do share it in the [issue section](https://github.com/explodinggradients/ragas/issues). We love hearing from you 😁"
]
},
{
"cell_type": "markdown",
"id": "3cee41e9",
"metadata": {},
"source": [
"### Test set generation\n",
"\n",
"Here you will learn how to generate a test set from your dataset using the Azure OpenAI endpoints."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "aa9ff398",
"metadata": {},
"outputs": [],
"source": [
"! git clone https://huggingface.co/datasets/explodinggradients/2023-llm-papers"
]
},
{
"cell_type": "code",
"execution_count": 2,
"id": "d935a561",
"metadata": {},
"outputs": [],
"source": [
"from langchain.document_loaders import DirectoryLoader\n",
"from ragas.testset.generator import TestsetGenerator\n",
"from ragas.testset.evolutions import simple, reasoning, multi_context\n",
"\n",
"\n",
"loader = DirectoryLoader(\"./2023-llm-papers/\", use_multithreading=True, silent_errors=True,sample_size=1)\n",
"documents = loader.load()\n",
"\n",
"for document in documents:\n",
" document.metadata['filename'] = document.metadata['source']"
]
},
{
"cell_type": "markdown",
"id": "c8f735a7",
"metadata": {},
"source": [
"Use the `azure_model` and `azure_embedding` that we initialized in above section to generate the test set"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "04abc4b1",
"metadata": {},
"outputs": [],
"source": [
"generator = TestsetGenerator.from_langchain(generator_llm=azure_model,critic_llm=azure_model,embeddings=azure_embeddings)\n",
"\n",
"testset = generator.generate_with_langchain_docs(documents, test_size=10, \n",
" raise_exceptions=False, with_debugging_logs=False,\n",
" distributions={simple: 0.5, reasoning: 0.25, multi_context: 0.25}) "
]
},
{
"cell_type": "markdown",
"id": "d2f5a7f7",
"metadata": {},
"source": [
"testset.to_pandas()"
]
}
],
"metadata": {
@@ -434,7 +507,7 @@
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.10.12"
"version": "3.10.8"
}
},
"nbformat": 4,