Language capabilities #247
Comments
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. If you believe this issue is still relevant, please leave a comment to keep it open. Thank you for your contributions!
Bla.
Bla again
Congrats on the amazing model release! I haven't been able to find which languages you actually consider "supported" for the model: the paper only mentions that you were "expanding multilingual coverage beyond English and Chinese", without further data. If you don't have a concrete answer to that question, could you say which languages the model was instruction-tuned on? If you can share the per-language sampling/mixture rates in the data distribution, that would also be very helpful.
I also noticed that the paper says you used the non-English part from the MMMLU dataset. In the paper, the HF repo is referenced, which does not contain an English part (I'm assuming you mean that the original English MMLU wasn't included), but which does contain a Chinese part.
I'm assuming that you did not include English in this evaluation set so as not to skew the multilingual results, because English is one of the model's two strongest languages. Shouldn't Chinese have been excluded as well, to avoid the same skew?
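To illustrate the kind of skew I mean, here is a minimal sketch of a macro-average over per-language scores with an optional exclusion set. The helper name `macro_average`, the language keys, and all numbers are made-up placeholders for illustration; they are not results from the paper and not the actual MMMLU locale codes.

```python
def macro_average(scores, exclude=()):
    """Mean of per-language scores, skipping any language in `exclude`."""
    kept = [value for lang, value in scores.items() if lang not in exclude]
    if not kept:
        raise ValueError("no languages left after exclusion")
    return sum(kept) / len(kept)


# Hypothetical placeholder accuracies, NOT real results for any model.
# "zh" stands in for a strong language; "lang_a"/"lang_b" for weaker ones.
scores = {"zh": 0.80, "lang_a": 0.60, "lang_b": 0.40}

with_zh = macro_average(scores)
without_zh = macro_average(scores, exclude={"zh"})
print(f"with zh: {with_zh:.2f}, without zh: {without_zh:.2f}")
```

With one strong language in the pool, the average sits above what the remaining languages achieve on their own, which is why I would expect Chinese to be treated the same way as English here.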
Thank you and good luck continuing the great work! :)