Add accuracy_sample_count #2414

v-shobhit · 2025-12-17T13:15:23Z

In the future, benchmarks (like gpt-oss) may have separate performance and accuracy datasets.

This PR adds a separate config field, accuracy_sample_count, which sets the number of samples in the accuracy evaluation dataset. It is independent of the existing performance_sample_count, which will be used for the size of the performance evaluation dataset.

This new field defaults to performance_sample_count for backwards compatibility.
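
As a rough illustration of the fallback described above, here is a minimal sketch in Python. It is not the loadgen implementation: `SampleCounts` and `resolve_accuracy_sample_count` are hypothetical names, and using `None` to model "field not set" is an assumption. The cap at the dataset size mirrors the commit titled "cap count to QSL->TotalSampleCount()". A later commit in this PR is titled "use total_sample_count for default of accuracy_sample_count", so the merged default may differ from what is sketched here.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class SampleCounts:
    # Hypothetical container for illustration; not the loadgen settings type.
    performance_sample_count: int
    # None models "field not set" (an assumption made for this sketch).
    accuracy_sample_count: Optional[int] = None


def resolve_accuracy_sample_count(cfg: SampleCounts, total_sample_count: int) -> int:
    """Return the number of samples to use for the accuracy run."""
    requested = cfg.accuracy_sample_count
    if requested is None:
        # Backwards-compatible fallback stated in the PR description.
        requested = cfg.performance_sample_count
    # Never ask for more samples than the dataset holds.
    return min(requested, total_sample_count)


if __name__ == "__main__":
    print(resolve_accuracy_sample_count(SampleCounts(64), 1000))
    print(resolve_accuracy_sample_count(SampleCounts(64, 500), 1000))
    print(resolve_accuracy_sample_count(SampleCounts(64, 5000), 1000))
```

With these inputs, an unset field falls back to the performance count, an explicit value is used as given, and an oversized value is clamped to the dataset size.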

github-actions · 2025-12-17T13:15:32Z

MLCommons CLA bot All contributors have signed the MLCommons CLA ✍️ ✅

nvzhihanj · 2025-12-17T16:43:01Z

@pgmpablo157321 @tanvi-mlcommons @mrmhodak please help review this PR. The accuracy sample count is a new field we are adding to separate the accuracy and performance test datasets. Can you review it and suggest what else is needed for this feature?

mrmhodak · 2025-12-17T17:47:54Z

@pgmpablo157321: Please take a look to see if you agree with this.

arjunsuresh · 2026-01-06T22:06:31Z

@nvzhihanj Can you please confirm if this PR has been tested for a full performance/accuracy run of retinanet where the dataset size is different from the performance_sample_count?

v-shobhit · 2026-01-13T17:51:17Z

@arjunsuresh the test failures above do not seem to be related to the PR: https://github.com/mlcommons/inference/actions/runs/20966840528/job/60259481963?pr=2414

Can you please check?

loadgen/loadgen.cc

v-shobhit · 2026-01-14T21:54:57Z

@arjunsuresh
Checked with retinanet:

Running per image evaluation...
Evaluate annotation type *bbox*
DONE (t=133.42s).
Accumulating evaluation results...
DONE (t=28.34s).
 Average Precision  (AP) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.37582
 Average Precision  (AP) @[ IoU=0.50      | area=   all | maxDets=100 ] = 0.52478
 Average Precision  (AP) @[ IoU=0.75      | area=   all | maxDets=100 ] = 0.40635
 Average Precision  (AP) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.02461
 Average Precision  (AP) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.12698
 Average Precision  (AP) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.41543
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=  1 ] = 0.41975
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets= 10 ] = 0.59758
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.62703
 Average Recall     (AR) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.08161
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.34103
 Average Recall     (AR) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.67732
TestScenario.Server qps=134.19, mean=0.6929, time=184.674, acc=41.030%, mAP=37.582%, queries=24781, tiles=50.0:0.7219,80.0:0.8067,90.0:0.8279,95.0:0.8393,99.0:0.8690,99.9:1.4932

…loader (mlcommons#2358)

* Remove Rclone instructions from README.md
* Remove Rclone download instructions from README.md
* Tweak README.md
* Switch from Rclone to R2 Downloader in README.md
* Switch from Rclone to R2 Downloader in README.md
* Switch from Rclone to R2 Downloader in README.md
* Switch Rclone for R2 Downloader in README.md
* Switch Rclone for R2 Downloader in README.md
* Use r2 downloader for gpt j model download (mlcommons#2365)
* Provide r2 download commands for mixtral model and datasets (mlcommons#2364)
* Replace MLCFlow RClone command for criteo dataset with R2 (mlcommons#2363)
* Deprecate MLCFlow rclone download command with r2 (mlcommons#2362)
* Add instruction to download DeepSeek model through MLCflow (mlcommons#2361)
* [Automated Commit] Format Codebase
* Trigger cla-check
* [Automated Commit] Format Codebase
* Update build_wheels.yml
* [Automated Commit] Format Codebase
* Add dtypes to README.md

---------

Co-authored-by: ANANDHU S <[email protected]>
Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
Co-authored-by: Arjun Suresh <[email protected]>
Co-authored-by: Pablo Gonzalez <[email protected]>
Co-authored-by: Pablo Gonzalez <[email protected]>

pgmpablo157321

@v-shobhit LGTM, but can we add these changes to the modularized submission checker as well? I have them in this branch, but I can't add them to Shobhit's repository:
https://github.com/mlcommons/inference/tree/acc_sample_count

v-shobhit · 2026-01-16T19:08:33Z

> @v-shobhit LGTM, but can we add this changes to the modularized submission checker as well. I have them in this branch, but I can't add them to shobbit's repository
> https://github.com/mlcommons/inference/tree/acc_sample_count

@pgmpablo157321 is it commit f81d32a?

I will cherry-pick it.

v-shobhit requested a review from a team as a code owner December 17, 2025 13:15

v-shobhit force-pushed the shobhitv/acc_sample_count branch from 865c33b to 3f2f719 December 17, 2025 13:37

mrmhodak previously approved these changes Jan 6, 2026

arjunsuresh dismissed mrmhodak’s stale review via a5fb1dc January 6, 2026 22:07

v-shobhit force-pushed the shobhitv/acc_sample_count branch from fa9056a to edc6938 January 8, 2026 18:32

nvzhihanj approved these changes Jan 12, 2026

viraatc approved these changes Jan 13, 2026

nv-alicheng reviewed Jan 14, 2026

loadgen/loadgen.cc (resolved)

v-shobhit force-pushed the shobhitv/acc_sample_count branch from 9c3e1b1 to e06d6d4 January 14, 2026 22:42

v-shobhit and others added 15 commits January 15, 2026 21:27

add accuracy_sample_count

6a7223a

cap count to QSL->TotalSampleCount()

4d0a52f

[Automated Commit] Format Codebase

f662905

empty commit to re-trigger test

209a403

[Automated Commit] Format Codebase

dfd030d

rm newline

6697f17

rm test05 lines

792b92d

add accuracy_sample_count to submission_checker

7e5f63c

[Automated Commit] Format Codebase

3d961cc

fix check

8745041

[Automated Commit] Format Codebase

c995c49

empty commit to trigger test

7633827

add gpt-oss-120b loadgen settings to mlperf.conf

d0ef037

empty commit to trigger test

e174ec2

pgmpablo157321 and others added 2 commits January 15, 2026 21:27

Update seeds for inference v6.0 (mlcommons#2437)

8a7eea8

use total_sample_count for default of accuracy_sample_count

609b787

v-shobhit force-pushed the shobhitv/acc_sample_count branch from e06d6d4 to 609b787 January 15, 2026 21:28

revert changes to .github

2700bc6

v-shobhit force-pushed the shobhitv/acc_sample_count branch from 0124ac1 to 2700bc6 January 15, 2026 22:25

arjunsuresh previously approved these changes Jan 16, 2026

pgmpablo157321 reviewed Jan 16, 2026

Add accuracy_sample_count to modularized submission checker

3d9c629

v-shobhit dismissed arjunsuresh’s stale review via 3d9c629 January 16, 2026 19:09

github-actions bot and others added 3 commits January 16, 2026 19:10

[Automated Commit] Format Codebase

1e48c77

empty commit to trigger test

4eeacde

Merge branch 'master' into shobhitv/acc_sample_count

f47ac4c

pgmpablo157321 approved these changes Jan 20, 2026

pgmpablo157321 merged commit a8d4d78 into mlcommons:master Jan 20, 2026
36 checks passed

github-actions bot locked and limited conversation to collaborators Jan 20, 2026
