The official evaluation suite and dynamic data release for MixEval.
benchmark evaluation benchmarking-suite evaluation-framework benchmarking-framework foundation-models large-language-models large-language-model llm-inference llm-evaluation large-multimodal-models llm-evaluation-framework benchmark-mixture mixeval
Updated Nov 10, 2024 - Python