
Streamlining lm-eval Architecture
#3083 · baberabb opened on Jun 23, 2025
Datasets with loading scripts - no longer supported
#3171 · baberabb opened on Jul 21, 2025


[Bug] `fewshot_config.doc_to_text` is ignored in v0.4.9.2, breaking MMLU-Pro CoT fewshot examples

#3457 · ye7shu opened on Dec 10, 2025

[feat] absence-bench

#3456 · jannalulu opened on Dec 9, 2025

Performance decreasing when passing from 0 to few shots?

#3452 · MikeCorv opened on Dec 8, 2025

mbpp extract_code filter failed to correctly filter out the generated code.

#3447 · konpoku opened on Dec 3, 2025

RULER task crashes with API models (local-chat-completions): TypeError: unhashable type: 'dict'

#3441 · mramendi opened on Dec 2, 2025

No tasks specified, or no tasks found.

#3434 · mao1333 opened on Nov 27, 2025

Get model SHA is always failed when pass existing model to HFLM

#3431 · wogns3623 opened on Nov 26, 2025

`add_bos_token` is not correctly set for Gemma3

#3430 · IKACE opened on Nov 26, 2025

Bad Performance of llama-2-70b in Humaneval Benchmark

#3427 · DarkenStar opened on Nov 25, 2025

Cannot load custom task config file via its path

#3424 · SkyR0ver opened on Nov 24, 2025

Requirements Are Unsatisfiable - Installing From Source

#3423 · uygarkurt opened on Nov 23, 2025

Optional mode to skip samples exceeding max_length

#3421 · nevertmr opened on Nov 22, 2025