Skip to content

Commit dbfd61c

Browse files
authored
docs: fix usage of eval dataset (#1514)
1 parent 583220d commit dbfd61c

File tree

2 files changed

+28
-5
lines changed

2 files changed

+28
-5
lines changed

docs/concepts/components/eval_dataset.md

+27-4
Original file line numberDiff line numberDiff line change
@@ -2,15 +2,21 @@
22

33
An evaluation dataset is a homogeneous collection of [data samples](eval_sample.md) designed to assess the performance and capabilities of an AI application. In Ragas, evaluation datasets are represented using the `EvaluationDataset` class, which provides a structured way to organize and manage data samples for evaluation purposes.
44

5-
## Structure of an Evaluation Dataset
5+
- [Overview](#overview)
6+
- [Creating an Evaluation Dataset from SingleTurnSamples](#creating-an-evaluation-dataset-from-singleturnsamples)
7+
- [Loading an Evaluation Dataset from Hugging Face Datasets](#loading-an-evaluation-dataset-from-hugging-face-datasets)
8+
9+
## Overview
10+
11+
### Structure of an Evaluation Dataset
612

713
An evaluation dataset consists of:
814

915
- **Samples**: A collection of [SingleTurnSample](eval_sample.md#singleturnsample) or [MultiTurnSample](eval_sample.md#multiturnsample) instances. Each sample represents a unique interaction or scenario.
1016
- **Consistency**: All samples within the dataset should be of the same type (either all single-turn or all multi-turn samples) to maintain consistency in evaluation.
1117

1218

13-
## Guidelines for Curating an Effective Evaluation Dataset
19+
### Guidelines for Curating an Effective Evaluation Dataset
1420

1521
- **Define Clear Objectives**: Identify the specific aspects of the AI application that you want to evaluate and the scenarios you want to test. Collect data samples that reflect these objectives.
1622

@@ -19,7 +25,7 @@ An evaluation dataset consists of:
1925
- **Quality and Size**: Aim for a dataset that is large enough to provide meaningful insights but not so large that it becomes unwieldy. Ensure that the data is of high quality and accurately reflects the real-world scenarios you want to evaluate.
2026

2127

22-
### Example
28+
## Creating an Evaluation Dataset from SingleTurnSamples
2329

2430
In this example, we’ll demonstrate how to create an EvaluationDataset using multiple `SingleTurnSample` instances. We’ll walk through the process step by step, including creating individual samples, assembling them into a dataset, and performing basic operations on the dataset.
2531

@@ -68,4 +74,21 @@ Create an EvaluationDataset by passing a list of SingleTurnSample instances.
6874
dataset = EvaluationDataset(samples=[sample1, sample2, sample3])
6975
```
7076

71-
[EvaluationDataset API Reference]()
77+
## Loading an Evaluation Dataset from Hugging Face Datasets
78+
79+
In practice, you may want to load an evaluation dataset from an existing dataset source, such as the Hugging Face Datasets library. The following example demonstrates how to load an evaluation dataset from a Hugging Face dataset and convert it into an EvaluationDataset instance.
80+
81+
Ensure that the dataset contains the necessary fields for evaluation, such as user inputs, retrieved contexts, responses, and references.
82+
83+
```python
84+
from datasets import load_dataset
85+
dataset = load_dataset("explodinggradients/amnesty_qa","english_v3")
86+
```
87+
88+
Load the dataset into a Ragas EvaluationDataset object.
89+
90+
```python
91+
from ragas import EvaluationDataset
92+
93+
eval_dataset = EvaluationDataset.from_hf_dataset(dataset["eval"])
94+
```

docs/getstarted/rag_evaluation.md

+1-1
Original file line numberDiff line numberDiff line change
@@ -11,7 +11,7 @@ from datasets import load_dataset
1111
dataset = load_dataset("explodinggradients/amnesty_qa","english_v3")
1212
```
1313

14-
Converting data to ragas [evaluation dataset](../concepts/components/eval_dataset.md)
14+
Load the dataset into Ragas EvaluationDataset object.
1515

1616
```python
1717
from ragas import EvaluationDataset

0 commit comments

Comments
 (0)