Add retry to pytest for transient failure #37046

shunping · 2025-12-08T17:17:27Z

This change enables a retry mechanism for pytest to improve the developer experience and the reliability of our test/release pipelines.

Recently, we have observed an increase in flaky Python tests. These failures disrupt development and release validation, often requiring manual and time-consuming test reruns. Some post-commit workflows can take hours to complete, and multiple flaky tests can significantly delay getting a green build.

To address this, this change leverages the pytest-rerunfailures plugin with the following configuration:

--reruns 1: Allows for a single retry of a failed test. We want to keep this a small number so that we do not risk hiding a truly flaky test.
--reruns-delay 5: Introduces a 5-second delay before the retry.

This approach provides a balance between mitigating transient failures (such as GRPC deadline exceeded) and ensuring that consistently flaky tests are still surfaced and addressed.

fixes #37038

gemini-code-assist · 2025-12-08T17:17:45Z

Summary of Changes

Hello @shunping, I'm Gemini Code Assist¹! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request introduces a robust retry mechanism for pytest tests across various Python environments. By integrating the pytest-rerunfailures plugin and configuring it within tox.ini, the system will now automatically re-run failed tests up to three times with a short delay. This enhancement is designed to significantly improve the reliability of CI/CD pipelines by addressing and reducing failures caused by intermittent or flaky tests, ensuring more consistent and accurate test results.

Highlights

New Dependency Added: The pytest-rerunfailures plugin has been added as a new dependency in sdks/python/setup.py. This plugin enables the automatic re-execution of failed pytest tests.
Pytest Retry Mechanism Implemented: The tox.ini configuration has been updated across multiple test environments to utilize the newly added pytest-rerunfailures plugin. Pytest commands will now automatically retry failed tests 3 times with a 5-second delay between retries.
Improved Test Stability: This change aims to enhance the stability of the test suite by mitigating issues caused by transient or flaky test failures, reducing the need for manual re-runs of CI/CD jobs.

Using Gemini Code Assist

The full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips.

Invoking Gemini

You can request assistance from Gemini at any point by creating a comment using either /gemini <command> or @gemini-code-assist <command>. Below is a summary of the supported commands on the current page.

Feature	Command	Description
Code Review	`/gemini review`	Performs a code review for the current pull request in its current state.
Pull Request Summary	`/gemini summary`	Provides a summary of the current pull request in its current state.
Comment	@gemini-code-assist	Responds in comments when explicitly tagged, both in pull request comments and review comments.
Help	`/gemini help`	Displays a list of available commands.

Customization

To customize Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a .gemini/ folder in the base of the repository. Detailed instructions can be found here.

Limitations & Feedback

Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counter productive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving your feedback about your experience with Gemini Code Assist for Github and other Google products, sign up here.

You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension.

Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution. ↩

shunping · 2025-12-08T20:47:43Z

cc'ed @damccorm

damccorm · 2025-12-08T21:04:06Z

sdks/python/tox.ini

 commands =
  python apache_beam/examples/complete/autocomplete_test.py
-  bash {toxinidir}/scripts/run_pytest.sh {envname} "{posargs}"
+  bash {toxinidir}/scripts/run_pytest.sh {envname} "{posargs}" '--reruns 1 --reruns-delay 5'


While I understand the pragmatic nature of this fix, I'm not sure it is the right approach. Doing this has a high likelihood of masking real failures, and generally I think we should be tracking down/fixing flakes. If there are things that innately mean we're going to have this level of flakiness, we're likely passing those on to users as well.

I'm open to more discussion here, but at a minimum I think if we're going to make a change like this it should be surfaced to the dev list, and probably should come with data that describes the problem and the reason we have this kind of flakiness. We could also evaluate alternate approaches there (for example, reducing the number of tests we run to reduce flakiness, only doing this for PR runs, etc...)

My rationale for this approach was based on PR #35915, where pytest retry arguments were introduced:
https://github.com/apache/beam/pull/35915/files#diff-33fb11ecf72212eda83aaf8e36f94816ad447d8d12896dc3b5e5ac3727adbbd1R114,
even though there are no actual effects for those arguments after my investigation.

I misinterpreted the acceptance of that PR as a general agreement to use retries for flaky tests and tried to fix the retry here.

That makes sense - my understanding from that PR is that those were going to be used for Grpc retries somehow, but I may be wrong (I probably shouldn't have merged with the meaningless options though).

Regardless, I'd be more open to this for a specific GHA suite (especially a precommit), but I think it warrants broader discussion/data first

damccorm · 2025-12-08T21:04:30Z

sdks/python/tox.ini

 commands =
  python apache_beam/examples/complete/autocomplete_test.py
-  bash {toxinidir}/scripts/run_pytest.sh {envname} "{posargs}"
+  bash {toxinidir}/scripts/run_pytest.sh {envname} "{posargs}" '--reruns 1 --reruns-delay 5'


Nit: If we're going to do this for every invocation, we should include it as part of the run_pytest.sh script itself

github-actions · 2025-12-08T22:07:18Z

Assigning reviewers:

R: @damccorm for label python.

Note: If you would like to opt out of this review, comment assign to next reviewer.

Available commands:

stop reviewer notifications - opt out of the automated review tooling
remind me after tests pass - tag the comment author after tests pass
waiting on author - shift the attention set back to the author (any comment or push by the author will return the attention set to the reviewers)

The PR bot will only process comments in the main thread (not review comments).

Add retry to pytest.

b077163

github-actions bot added the python label Dec 8, 2025

shunping added 2 commits December 8, 2025 12:32

Correct version for rerunfailures plugin.

7d1e1dd

Modify retry times.

6b61a3f

shunping marked this pull request as ready for review December 8, 2025 20:35

shunping self-assigned this Dec 8, 2025

shunping changed the title ~~Add retry to pytest~~ Add retry to pytest for transient failure Dec 8, 2025

damccorm requested changes Dec 8, 2025

View reviewed changes

github-actions bot added the Next Action: Reviewers label Dec 8, 2025

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Add retry to pytest for transient failure #37046

Add retry to pytest for transient failure #37046

Uh oh!

shunping commented Dec 8, 2025 •

edited

Loading

Uh oh!

gemini-code-assist bot commented Dec 8, 2025

Uh oh!

shunping commented Dec 8, 2025

Uh oh!

damccorm Dec 8, 2025

Uh oh!

shunping Dec 8, 2025 •

edited

Loading

Uh oh!

damccorm Dec 8, 2025

Uh oh!

damccorm Dec 8, 2025

Uh oh!

github-actions bot commented Dec 8, 2025

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Add retry to pytest for transient failure #37046

Are you sure you want to change the base?

Add retry to pytest for transient failure #37046

Uh oh!

Conversation

shunping commented Dec 8, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

gemini-code-assist bot commented Dec 8, 2025

Summary of Changes

Highlights

Footnotes

Uh oh!

shunping commented Dec 8, 2025

Uh oh!

damccorm Dec 8, 2025

Choose a reason for hiding this comment

Uh oh!

shunping Dec 8, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

damccorm Dec 8, 2025

Choose a reason for hiding this comment

Uh oh!

damccorm Dec 8, 2025

Choose a reason for hiding this comment

Uh oh!

github-actions bot commented Dec 8, 2025

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

shunping commented Dec 8, 2025 •

edited

Loading

shunping Dec 8, 2025 •

edited

Loading