[BUG: https://mistral.ai/news/mistral-large-2407/] Are there relevant papers, and what metrics are used to evaluate each dataset? For example, is the evaluation metric for MultiPL-E pass@1? #235
Labels: bug (Something isn't working)
Python -VV
Pip Freeze
Reproduction Steps
Are there relevant papers, and what metrics are used to evaluate each dataset? For example, is the evaluation metric for MultiPL-E pass@1?
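For context on what "pass@1" usually means: in code-generation benchmarks, pass@k is commonly computed with the unbiased estimator from the Codex/HumanEval paper (Chen et al., 2021). For each problem you generate n samples, count the c samples that pass the unit tests, and estimate pass@k = 1 - C(n-c, k) / C(n, k), then average over problems. The sketch below assumes that estimator. It does not confirm which metric, sample count, or temperature Mistral used for MultiPL-E, and the function names and sample data are hypothetical.

```python
from math import comb


def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimate for a single problem (assumes k <= n).

    n: number of samples generated for the problem
    c: number of those samples that pass all unit tests
    k: the k in pass@k
    """
    if n - c < k:
        # Every size-k subset of the samples contains at least one passing sample.
        return 1.0
    # Probability that a random size-k subset contains no passing sample.
    return 1.0 - comb(n - c, k) / comb(n, k)


def mean_pass_at_k(results: list[tuple[int, int]], k: int) -> float:
    """Average pass@k over problems; `results` holds (n, c) pairs (hypothetical format)."""
    return sum(pass_at_k(n, c, k) for n, c in results) / len(results)


# Hypothetical per-problem results: (samples generated, samples passing).
results = [(20, 5), (20, 0), (20, 20)]
print(mean_pass_at_k(results, k=1))
```

For k = 1 the estimator reduces to c / n, the per-problem fraction of passing samples, so pass@1 is effectively the average success rate of a single sampled completion.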
Expected Behavior
Additional Context
No response
Suggested Solutions
No response