fix accuracy test errors #5348

Closed

wat3rBro
Contributor

Summary:
Some accuracy tests started failing between Jun 11 and Jun 17:

  • ❌ mask_rcnn_R_50_FPN_inference_acc_test
  • ✅ keypoint_rcnn_R_50_FPN_inference_acc_test
  • ✅ fast_rcnn_R_50_FPN_inference_acc_test
  • ❌ panoptic_fpn_R_50_inference_acc_test
  • ✅ retinanet_R_50_FPN_inference_acc_test
  • ❌ rpn_R_50_FPN_inference_acc_test
  • ✅ semantic_R_50_FPN_inference_acc_test
  • ❌ cascade_mask_rcnn_R_50_FPN_inference_acc_test

V1: update the yaml to reflect the new scores.

Differential Revision: D61301698

@facebook-github-bot added the CLA Signed label Aug 14, 2024
@facebook-github-bot
Contributor

This pull request was exported from Phabricator. Differential Revision: D61301698

@wat3rBro
Contributor Author

@ppwwyyxx do you know of any recent changes that could have led to differences in the accuracy metrics?

@ppwwyyxx
Contributor

When did this start to happen?

It could also be a change of CUDA version / precision. For example, are these now running on Ampere cards with TF32 enabled?
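To illustrate why TF32 can shift accuracy numbers, here is a minimal sketch that emulates TF32 input precision in pure Python. TF32 keeps float32's 8 exponent bits but only 10 mantissa bits. The helper name `truncate_to_tf32` is hypothetical, and it truncates rather than rounds, which is a simplification; the rounding actually applied by the hardware may differ.

```python
import struct


def truncate_to_tf32(x: float) -> float:
    """Emulate TF32 input precision by zeroing the low 13 mantissa bits of a float32.

    Assumption: truncation (round toward zero) stands in for whatever
    rounding the GPU applies; this is an illustration, not a bit-exact model.
    """
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    bits &= 0xFFFFE000  # keep sign, 8 exponent bits, and the top 10 mantissa bits
    return struct.unpack("<f", struct.pack("<I", bits))[0]


a = 1.0 + 2.0 ** -12  # exactly representable in float32, but not in TF32
b = 3.0

exact = a * b
approx = truncate_to_tf32(a) * truncate_to_tf32(b)
rel_err = abs(exact - approx) / exact

print(f"exact={exact!r} approx={approx!r} rel_err={rel_err:.2e}")
```

A per-operation relative error of this size is small, but it can accumulate across many convolution layers, which is one plausible way for detection metrics to drift past a tight test tolerance.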

@wat3rBro
Contributor Author

wat3rBro commented Aug 18, 2024

@ppwwyyxx Great call! The last successful run (in June) was on Volta, and the first failed run was on Ampere. Setting torch.backends.cudnn.allow_tf32 to False makes the accuracy tests pass (torch.backends.cuda.matmul.allow_tf32 is False by default). Do you recommend updating the numbers or adding a config to disallow tf32?
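One way to scope the flag change to the accuracy tests, rather than flipping it globally, is a small context manager that restores the previous value on exit. This is a sketch under assumptions: `override_flags` is a hypothetical helper, not part of detectron2 or PyTorch, and a stand-in object replaces `torch.backends.cudnn` so the example runs without PyTorch or a GPU.

```python
import contextlib
from types import SimpleNamespace


@contextlib.contextmanager
def override_flags(target, **flags):
    """Temporarily set attributes on `target`, restoring the old values on exit."""
    saved = {name: getattr(target, name) for name in flags}
    try:
        for name, value in flags.items():
            setattr(target, name, value)
        yield target
    finally:
        for name, value in saved.items():
            setattr(target, name, value)


# Stand-in for torch.backends.cudnn (assumption: used only for illustration).
fake_cudnn = SimpleNamespace(allow_tf32=True)

with override_flags(fake_cudnn, allow_tf32=False):
    inside = fake_cudnn.allow_tf32  # flag is off while the block runs
after = fake_cudnn.allow_tf32       # previous value is restored afterwards

# With PyTorch installed, the same helper could be applied as:
#   import torch
#   with override_flags(torch.backends.cudnn, allow_tf32=False):
#       run_accuracy_test()  # hypothetical test entry point
```

Restoring the flag in a `finally` clause keeps the rest of the process on its default TF32 behavior even if the test raises.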

wat3rBro pushed a commit to wat3rBro/detectron2-1 that referenced this pull request Aug 20, 2024
Summary:
Pull Request resolved: facebookresearch#5348

Some accuracy tests started to fail in between Jun 11 and Jun 17:
- ❌ mask_rcnn_R_50_FPN_inference_acc_test
- ✅ keypoint_rcnn_R_50_FPN_inference_acc_test
- ✅ fast_rcnn_R_50_FPN_inference_acc_test
- ❌ panoptic_fpn_R_50_inference_acc_test
- ✅ retinanet_R_50_FPN_inference_acc_test
- ❌ rpn_R_50_FPN_inference_acc_test
- ✅ semantic_R_50_FPN_inference_acc_test
- ❌ cascade_mask_rcnn_R_50_FPN_inference_acc_test

V1: update the yaml to reflect the new scores.
V5: it turns out that we can match the old scores by disabling tf32.

Differential Revision: D61301698

@facebook-github-bot
Contributor

This pull request has been merged in 5b72c27.
