@Tomorrowdawn I'm looking for the eval code as well so that we can test the implementation of top-nsigma sampling in llama.cpp here to demonstrate its effectiveness. Thank you!
Thank you for your attention. After several months of development, the paper we are currently writing is quite different from the rough version on arXiv, and we have fixed some bugs (such as the unexpectedly low metrics on AQuA). The reasoning-related code can now be found in the dev branch. The latest results are shown below and should be in line with the standard performance of LLaMA-3.
Due to network restrictions, the code contains many temporary workarounds specific to my local setup. Let me know if you run into any compatibility problems, or you can slightly adapt the code as described in the README. Environment setup is hell :(
Hi,
Thanks for your insightful work. Could you share the evaluation code for all the tasks in your paper?