
evaluation code pls #1

Open
NonvolatileMemory opened this issue Nov 23, 2024 · 2 comments

Comments

NonvolatileMemory commented Nov 23, 2024

Hi,

Thanks for your insightful work. Could you share the evaluation code for all the tasks in your paper?


VJHack commented Jan 17, 2025

@Tomorrowdawn I'm looking for the eval code as well so that we can test the implementation of top-nsigma sampling in llama.cpp here to demonstrate its effectiveness. Thank you!
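For reference, a minimal sketch of the top-nσ filtering rule this issue is about (not the authors' evaluation code and not the llama.cpp implementation): keep only tokens whose logit lies within n standard deviations of the maximum logit, then sample from the renormalized softmax over the survivors. The function name, NumPy usage, and toy logits below are illustrative assumptions.

```python
import numpy as np

def top_nsigma_sample(logits, n=1.0, temperature=1.0, rng=None):
    """Sample a token id from `logits` using a top-nσ filter (illustrative sketch)."""
    rng = np.random.default_rng() if rng is None else rng
    logits = np.asarray(logits, dtype=np.float64) / temperature

    # Keep only tokens whose logit is within n standard deviations of the max logit.
    threshold = logits.max() - n * logits.std()
    mask = logits >= threshold

    # Renormalized softmax over the surviving tokens (numerically stable).
    kept = logits[mask] - logits[mask].max()
    probs = np.exp(kept) / np.exp(kept).sum()

    return int(rng.choice(np.flatnonzero(mask), p=probs))

# Toy example: an 8-token vocabulary.
logits = [2.0, 1.9, 0.5, -1.0, -3.0, -3.2, -4.0, -5.0]
print(top_nsigma_sample(logits, n=1.0))
```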

Tomorrowdawn (Owner) commented

Thank you for your attention. After several months of development, the paper we are currently writing is quite different from the rough version on arXiv, and we have fixed some bugs (such as the unexpectedly low metrics on AQuA). The reasoning-related code can now be found in the dev branch. The latest results are shown below, which should be in line with the expected level for LLaMA-3.

[Image: latest evaluation results]

Due to network restrictions, there are many temporary local workarounds. Let me know if you hit any compatibility problems, or you can lightly adapt the code as described in the README. Environment setup is hell :(
