Release BigCodeBench v0.2.3.post1

Latest

terryyz released this 01 Feb 04:21

dcff46f

What's Changed

Fix Docker image and its dependencies
Support more models with reasoning effort
Optional chat prefilling
E2B, Gradio, and local code execution

Evaluated LLMs (173 models)

o3-mini
DeepSeek R1

Full Changelog: v0.2.1.post7...v0.2.3.post1
