Add accuracy_sample_count #2414
MLCommons CLA bot All contributors have signed the MLCommons CLA ✍️ ✅ |
865c33b to 3f2f719
@pgmpablo157321 @tanvi-mlcommons @mrmhodak please help review this PR. The accuracy sample count is a new setting we are adding so that the accuracy and performance test datasets can be sized separately. Can you review and suggest what else is needed for this feature?
@pgmpablo157321: Please take a look to see if you agree with this. |
@nvzhihanj Can you please confirm if this PR has been tested for a full performance/accuracy run of retinanet where the dataset size is different from the performance_sample_count? |
fa9056a to edc6938
@arjunsuresh the test failures above seem to not be related to the PR: https://github.com/mlcommons/inference/actions/runs/20966840528/job/60259481963?pr=2414 Can you please check? |
@arjunsuresh |
9c3e1b1 to e06d6d4
…loader (mlcommons#2358)
* Remove Rclone instructions from README.md
* Remove Rclone download instructions from README.md
* Tweak README.md
* Switch from Rclone to R2 Downloader in README.md
* Switch from Rclone to R2 Downloader in README.md
* Switch from Rclone to R2 Downloader in README.md
* Switch Rclone for R2 Downloader in README.md
* Switch Rclone for R2 Downloader in README.md
* Use r2 downloader for gpt j model download (mlcommons#2365)
* Provide r2 download commands for mixtral model and datasets (mlcommons#2364)
* Replace MLCFlow RClone command for criteo dataset with R2 (mlcommons#2363)
* Deprecate MLCFlow rclone download command with r2 (mlcommons#2362)
* Add instruction to download DeepSeek model through MLCflow (mlcommons#2361)
* [Automated Commit] Format Codebase
* Trigger cla-check
* [Automated Commit] Format Codebase
* Update build_wheels.yml
* [Automated Commit] Format Codebase
* Add dtypes to README.md
---------
Co-authored-by: ANANDHU S <[email protected]>
Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
Co-authored-by: Arjun Suresh <[email protected]>
Co-authored-by: Pablo Gonzalez <[email protected]>
Co-authored-by: Pablo Gonzalez <[email protected]>
e06d6d4 to 609b787
0124ac1 to 2700bc6
pgmpablo157321
left a comment
@v-shobhit LGTM, but can we add these changes to the modularized submission checker as well? I have them in this branch, but I can't add them to Shobhit's repository:
https://github.com/mlcommons/inference/tree/acc_sample_count
@pgmpablo157321 is it commit f81d32a? I will cherry-pick it.
In the future, benchmarks (like gpt-oss) may have separate perf and accuracy datasets.
This PR adds a separate config field, accuracy_sample_count, to set the number of samples in the accuracy eval dataset, separate from the existing performance_sample_count, which will be used for the size of the perf eval dataset.
This new field defaults to performance_sample_count for backwards compatibility.
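
The fallback described above can be sketched roughly as follows. This is only an illustration of the defaulting rule, not the code in this PR: the `SampleCounts` class, its `resolved_accuracy_sample_count` method, and the sample counts used are hypothetical and do not come from LoadGen or the submission checker.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class SampleCounts:
    """Hypothetical container; not the actual LoadGen settings class."""

    performance_sample_count: int
    # None stands for "field not set in the config" (an assumption of this sketch).
    accuracy_sample_count: Optional[int] = None

    def resolved_accuracy_sample_count(self) -> int:
        # Fall back to performance_sample_count when the new field is unset,
        # so configs written before the field existed behave as before.
        if self.accuracy_sample_count is None:
            return self.performance_sample_count
        return self.accuracy_sample_count


# A config that only sets the existing field.
legacy = SampleCounts(performance_sample_count=64)
# A config that sizes the accuracy dataset separately (illustrative numbers).
split = SampleCounts(performance_sample_count=64, accuracy_sample_count=1000)

print(legacy.resolved_accuracy_sample_count())
print(split.resolved_accuracy_sample_count())
```

With this shape, a benchmark that does not set accuracy_sample_count keeps a single dataset size for both modes, while one with separate datasets sets both fields explicitly.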