This repository provides a set of benchmarks, as well as the code to send a benchmark's queries and evaluate the returned results.
- `benchmarks` contains the code to query targets and evaluate results.
- `config` contains the data sets, query templates, targets, and benchmark definitions necessary to run a benchmark. See `config/README.md` for details about targets and benchmarks.
Running a benchmark is a two-step process:
- Execute the queries of a benchmark and store the scored results.
- Evaluate the scored results against the set of ground-truth relevant results.
Installation of the `benchmarks` package provides access to the functions and command-line interface necessary to run benchmarks. The command-line interface is the easiest way to run a benchmark; a sketch of a typical invocation sequence follows the list of commands below.
- `benchmarks_fetch`
  - Fetches (un)scored results given the name of a benchmark (specified in `config/benchmarks.json`), a target (specified in `config/targets.json`), and a directory to store results.
  - By default, `benchmarks_fetch` fetches scored results using 5 concurrent requests. Run `benchmarks_fetch --help` for more details.
- `benchmarks_score`
  - Scores results given the name of a benchmark (specified in `config/benchmarks.json`), a target (specified in `config/targets.json`), a directory containing unscored results, and a directory to store scored results.
  - By default, `benchmarks_score` uses 5 concurrent requests. Run `benchmarks_score --help` for more details.
- `benchmarks_eval`
  - Evaluates a set of scored results given the name of a benchmark (specified in `config/benchmarks.json`) and a directory containing scored results.
  - By default, the evaluation considers the top 20 results of each query, and plots are not generated. Run `benchmarks_eval --help` for more details.
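As a rough illustration, the sketch below strings the commands together for the two-step process described above. The exact argument syntax is an assumption based on the descriptions of each command, not taken from the actual CLI; run each command with `--help` for the real usage.

```sh
# Step 1: fetch scored results for a benchmark from a target (the default
# behavior of benchmarks_fetch). Benchmark and target names come from
# config/benchmarks.json and config/targets.json; the argument order shown
# here is assumed.
benchmarks_fetch benchmark_name target_name results_dir

# Step 2: evaluate the scored results (top 20 results per query by default).
benchmarks_eval benchmark_name results_dir
```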
The CLI functionality is also available by importing functions from the `benchmarks` package.
```python
from benchmarks.request import fetch_results, score_results
from benchmarks.eval import evaluate_results

# Fetch unscored results
fetch_results('benchmark_name', 'target_name', 'unscored_results_dir', scored=False)

# Score unscored results
score_results('unscored_results_dir', 'target_name', 'results_dir')

# Evaluate scored results
evaluate_results('benchmark_name', 'results_dir OR results_dict')
```
See the documentation of each function for more information.
Install the repository as an editable package using `pip`.

```sh
pip install -e .
```
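After installation, a quick way to confirm that the command-line entry points are available is to print a command's help text.

```sh
# Print usage information; benchmarks_score --help and benchmarks_eval --help
# work the same way.
benchmarks_fetch --help
```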