Home

🌤️ Lighteval

Lighteval is your all-in-one toolkit for evaluating LLMs across multiple backends, whether that is transformers, tgi, vllm, or nanotron. Dive deep into your model's performance by saving and exploring detailed, sample-by-sample results to debug and see how your models stack up.

Customization at your fingertips: create new tasks and metrics tailored to your needs, or browse all our existing tasks and metrics.

Seamlessly experiment, benchmark, and store your results on the Hugging Face Hub, S3, or locally.

🌤️ Lighteval
Getting Started
- Installation
- Quicktour
Guides
API Reference
- Available Metrics
- Available Tasks
