This repository provides a set of benchmarks, as well as the code to send a benchmark's queries and evaluate the returned results.
- `benchmarks` contains the code to query targets and evaluate results.
- `config` contains the data sets, query templates, targets, and benchmark definitions necessary to run a benchmark. See `config/README.md` for details about targets and benchmarks.
Running a benchmark is a two-step process:
- Execute the queries of a benchmark and store the scored results.
- Evaluate the scored results against the set of ground-truth relevant results.
Installation of the `benchmarks` package provides access to the functions and command-line interface necessary to run benchmarks. The command-line interface is the easiest way to run a benchmark; a sketch of a typical invocation sequence follows the list of commands below.
- `benchmarks_fetch`
  - Fetches (un)scored results given the name of a benchmark (specified in `config/benchmarks.json`), a target (specified in `config/targets.json`), and a directory to store results.
  - By default, `benchmarks_fetch` fetches scored results using 5 concurrent requests. Run `benchmarks_fetch --help` for more details.
- `benchmarks_score`
  - Scores results given the name of a benchmark (specified in `config/benchmarks.json`), a target (specified in `config/targets.json`), a directory containing unscored results, and a directory to store scored results.
  - By default, `benchmarks_score` uses 5 concurrent requests. Run `benchmarks_score --help` for more details.
- `benchmarks_eval`
  - Evaluates a set of scored results given the name of a benchmark (specified in `config/benchmarks.json`) and a directory containing scored results.
  - By default, the evaluation considers the top 20 results of each query, and plots are not generated. Run `benchmarks_eval --help` for more details.
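As a rough illustration, the sketch below strings the commands together for the two-step process described above. The exact argument syntax is an assumption based on the descriptions of each command, not taken from the actual CLI; run each command with `--help` for the real usage.

```sh
# Step 1: fetch scored results for a benchmark from a target (the default
# behavior of benchmarks_fetch). Benchmark and target names come from
# config/benchmarks.json and config/targets.json; the argument order shown
# here is assumed.
benchmarks_fetch benchmark_name target_name results_dir

# Step 2: evaluate the scored results (top 20 results per query by default).
benchmarks_eval benchmark_name results_dir
```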
The CLI functionality is also available by importing functions from the `benchmarks` package.
```python
from benchmarks.request import fetch_results, score_results
from benchmarks.eval import evaluate_results

# Fetch unscored results
fetch_results('benchmark_name', 'target_name', 'unscored_results_dir', scored=False)

# Score unscored results
score_results('unscored_results_dir', 'target_name', 'results_dir')

# Evaluate scored results
evaluate_results('benchmark_name', 'results_dir OR results_dict')
```
See the documentation of each function for more information.
Install the repository as an editable package using `pip`.

```sh
pip install -e .
```
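After installation, a quick way to confirm that the command-line entry points are available is to print a command's help text.

```sh
# Print usage information; benchmarks_score --help and benchmarks_eval --help
# work the same way.
benchmarks_fetch --help
```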