Hi, thanks for the great open-source model! I am trying to reproduce the MATH benchmark, but I currently only reach 50.9% (averaged over 10 runs) instead of the 51.9% reported by the official Llama evaluation. Is a difference of this size expected, or am I doing something wrong? In particular, it would be great to know the exact prompts and templates used to evaluate MATH; the sketch below shows what I am currently doing.
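For reference, here is a minimal sketch of the 0-shot CoT prompt and answer extraction I am currently using. The instruction wording and the `\boxed{}` extraction regex are my own guesses, not taken from any official Llama eval script, so please correct anything that differs from your setup:

```python
# Minimal sketch of my current MATH evaluation prompt and answer extraction.
# The instruction text and the \boxed{} regex are my own assumptions, not the
# official Llama evaluation template.
import re

def build_zero_shot_cot_prompt(problem: str) -> str:
    """Format a single MATH problem as a 0-shot chain-of-thought prompt."""
    return (
        "Solve the following math problem. Reason step by step, and put "
        "your final answer in \\boxed{}.\n\n"
        f"Problem: {problem}\n\nSolution:"
    )

def extract_boxed_answer(completion: str) -> str | None:
    """Return the contents of the last \\boxed{...} in the model output.

    Note: this simple regex does not handle nested braces inside \\boxed{}.
    """
    matches = re.findall(r"\\boxed\{([^{}]*)\}", completion)
    return matches[-1] if matches else None

if __name__ == "__main__":
    print(build_zero_shot_cot_prompt("What is 2 + 2?"))
    print(extract_boxed_answer(r"Step 1: add. The answer is \boxed{4}."))
```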
I also notice a slight inconsistency between the Llama docs: https://github.com/meta-llama/llama3/blob/main/eval_details.md says "4-shot", while the table at https://ai.meta.com/blog/meta-llama-3-1/ says "0-shot CoT". Which setting do the reported numbers use, 4-shot or 0-shot?