docs/concepts/components/eval_dataset.md
+27 -4
@@ -2,15 +2,21 @@

 An evaluation dataset is a homogeneous collection of [data samples](eval_sample.md) designed to assess the performance and capabilities of an AI application. In Ragas, evaluation datasets are represented using the `EvaluationDataset` class, which provides a structured way to organize and manage data samples for evaluation purposes.

-## Structure of an Evaluation Dataset
+- [Overview](#overview)
+- [Creating an Evaluation Dataset from SingleTurnSamples](#creating-an-evaluation-dataset-from-singleturnsamples)
+- [Loading an Evaluation Dataset from Hugging Face Datasets](#loading-an-evaluation-dataset-from-hugging-face-datasets)
+
+## Overview
+
+### Structure of an Evaluation Dataset

 An evaluation dataset consists of:

 - **Samples**: A collection of [SingleTurnSample](eval_sample.md#singleturnsample) or [MultiTurnSample](eval_sample.md#multiturnsample) instances. Each sample represents a unique interaction or scenario.
 - **Consistency**: All samples within the dataset should be of the same type (either all single-turn or all multi-turn samples) to maintain consistency in evaluation.


-## Guidelines for Curating an Effective Evaluation Dataset
+### Guidelines for Curating an Effective Evaluation Dataset

 - **Define Clear Objectives**: Identify the specific aspects of the AI application that you want to evaluate and the scenarios you want to test. Collect data samples that reflect these objectives.

@@ -19,7 +25,7 @@ An evaluation dataset consists of:
 - **Quality and Size**: Aim for a dataset that is large enough to provide meaningful insights but not so large that it becomes unwieldy. Ensure that the data is of high quality and accurately reflects the real-world scenarios you want to evaluate.


-### Example
+## Creating an Evaluation Dataset from SingleTurnSamples

 In this example, we’ll demonstrate how to create an EvaluationDataset using multiple `SingleTurnSample` instances. We’ll walk through the process step by step, including creating individual samples, assembling them into a dataset, and performing basic operations on the dataset.

@@ -68,4 +74,21 @@ Create an EvaluationDataset by passing a list of SingleTurnSample instances.
## Loading an Evaluation Dataset from Hugging Face Datasets
+
+In practice, you may want to load an evaluation dataset from an existing dataset source, such as the Hugging Face Datasets library. The following example demonstrates how to load an evaluation dataset from a Hugging Face dataset and convert it into an EvaluationDataset instance.
+
+Ensure that the dataset contains the necessary fields for evaluation, such as user inputs, retrieved contexts, responses, and references.
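
Note on the added "necessary fields" sentence: the check it asks for could be sketched roughly as below. This is a plain-Python illustration, not part of the patch and not a Ragas API. The helper `missing_fields`, the constant `REQUIRED_FIELDS`, and the column names `user_input`, `retrieved_contexts`, `response`, and `reference` are assumptions chosen to mirror the fields the sentence lists; adjust them to whatever the source dataset actually uses.

```python
# Hypothetical column names mirroring "user inputs, retrieved contexts,
# responses, and references"; these are assumptions, not confirmed by the diff.
REQUIRED_FIELDS = ("user_input", "retrieved_contexts", "response", "reference")


def missing_fields(rows, required=REQUIRED_FIELDS):
    """Map row index -> required fields that are absent or None in that row."""
    problems = {}
    for index, row in enumerate(rows):
        absent = [name for name in required if row.get(name) is None]
        if absent:
            problems[index] = absent
    return problems


# Two illustrative rows: the first is complete, the second lacks two fields.
rows = [
    {
        "user_input": "What is the capital of France?",
        "retrieved_contexts": ["Paris is the capital of France."],
        "response": "Paris",
        "reference": "Paris",
    },
    {
        "user_input": "Who wrote Hamlet?",
        "response": "William Shakespeare",
    },
]

print(missing_fields(rows))  # {1: ['retrieved_contexts', 'reference']}
```

Running a check like this on the rows before conversion reports missing columns per row, so gaps show up before the metrics run.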