
evaluation code pls #1

Open
NonvolatileMemory opened this issue Nov 23, 2024 · 2 comments

Comments

NonvolatileMemory commented Nov 23, 2024

Hi,

Thanks for your insightful work. Could you share the evaluation code for all the tasks in your paper?


VJHack commented Jan 17, 2025

@Tomorrowdawn I'm looking for the eval code as well so that we can test the implementation of top-nsigma sampling in llama.cpp here to demonstrate its effectiveness. Thank you!
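For reference, a minimal sketch of the top-nσ filtering rule this issue is about (not the authors' evaluation code and not the llama.cpp implementation): keep only tokens whose logit lies within n standard deviations of the maximum logit, then sample from the renormalized softmax over the survivors. The function name, NumPy usage, and toy logits below are illustrative assumptions.

```python
import numpy as np

def top_nsigma_sample(logits, n=1.0, temperature=1.0, rng=None):
    """Sample a token id from `logits` using a top-nσ filter (illustrative sketch)."""
    rng = np.random.default_rng() if rng is None else rng
    logits = np.asarray(logits, dtype=np.float64) / temperature

    # Keep only tokens whose logit is within n standard deviations of the max logit.
    threshold = logits.max() - n * logits.std()
    mask = logits >= threshold

    # Renormalized softmax over the surviving tokens (numerically stable).
    kept = logits[mask] - logits[mask].max()
    probs = np.exp(kept) / np.exp(kept).sum()

    return int(rng.choice(np.flatnonzero(mask), p=probs))

# Toy example: an 8-token vocabulary.
logits = [2.0, 1.9, 0.5, -1.0, -3.0, -3.2, -4.0, -5.0]
print(top_nsigma_sample(logits, n=1.0))
```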

Tomorrowdawn (Owner) commented

Thank you for your attention. After several months of development, the paper we are currently writing is quite different from the rough version on arXiv, and we have fixed some bugs (such as the unexpectedly low metrics on AQuA). The reasoning-related code can now be found in the dev branch. The latest results are shown below, which should be in line with the expected level for LLaMA-3.

[Image: latest evaluation results]

Due to network restrictions, there are many temporary local workarounds. Let me know if you hit any compatibility problems, or you can lightly adapt the code as described in the README. Environment setup is hell :(
