Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Deflake etcd tests #13167

Open
serathius opened this issue Jun 30, 2021 · 21 comments · Fixed by #13175
Open

Deflake etcd tests #13167

serathius opened this issue Jun 30, 2021 · 21 comments · Fixed by #13175

Comments

@serathius
Copy link
Member

serathius commented Jun 30, 2021

If we look into tests results since we migrated Github Actions commits on main branch we get:

Where failure/success is based on green check vs red cross under commit message (commits without them means that they were not tested as they were multiple commits in one PR).

Those are all test failures on main branch, so after a PR passed tests and was approved. We can use those failures to calculate chance of any PR failing to pass tests just due to test flaking.

(7 + 14 + 14+ 14 + 15 + 13 + 13 + 19) / (32 + 31 + 33 + 29 + 25 + 22 + 33) = 53%

Having flakyness ratio of over 50% means that average PR needs to be run 2 times, but number of failures in sequences may be much much longer, 3-5 failures in row is not something uncommon. This can be frustrating especially to new contributors, as there is no easy way to retrigger tests (need to do an empty commit amend and push).

Proposal

Etcd community should set on a test flakyness target, measure it and establish a process to fix flaky tests.

For start I would propose to target a 10% failure rate for whole test suite. It should be reachable by fixing only couple of tests as from last runs we got 22% (7 out of last 32). Measuring flakyness could start from something simple, like for example running a script once a week that checks last 100 test results. If the measured flakyness is over our target, we should identify most flaky tests, create issues for them and encourage community to fix them.

For couple of first runs we could depend on executing the scripts manualy, but we should plan to automate them.

TODO:

cc @hexfusion @Rajalakshmi-Girish

@karuppiah7890
Copy link
Contributor

This sounds like a pretty interesting thing and also like a thing that alleviates a lot of pain and improves developer experience !

karuppiah7890 added a commit to karuppiah7890/issues-info that referenced this issue Jul 1, 2021
karuppiah7890 added a commit to karuppiah7890/issues-info that referenced this issue Jul 1, 2021
@karuppiah7890
Copy link
Contributor

@karuppiah7890
Copy link
Contributor

I'm able to get the number of successes and we can get failures too. Given total (for example 100) and any one of those (successes / failures), we get the other value too

@serathius
Copy link
Member Author

Great! Would you be interested in sending PR that adds it to etcd scripts ?

@karuppiah7890
Copy link
Contributor

Sure @serathius ! I was also wondering if I should try out a golang script too, so anyone can run it with just "go run" or similar on any platform. No need to worry about OS, bash shell being available, other tools being available etc. What do you think?

@serathius
Copy link
Member Author

serathius commented Jul 2, 2021

Letting everyone to run it is a good initiative, but on the other hand long term we should just automate it. Most scripts are already written in bash and I don't think there is any need to invest in this script too much. It should be simple enough (2-3 commands) that it could be replaced when needed.

I think it would make sense revisit those improvements when we have established whole process and automated it.

@karuppiah7890
Copy link
Contributor

Makes sense @serathius ! 👍 I'll raise the PR and we can discuss more about the bash script as part of the PR

karuppiah7890 added a commit to karuppiah7890/etcd that referenced this issue Jul 2, 2021
This is to start measuring the test flakyness and see the numbers improving once we improve and deflake flaky tests

Fixes etcd-io#13167
karuppiah7890 added a commit to karuppiah7890/issues-info that referenced this issue Jul 2, 2021
karuppiah7890 added a commit to karuppiah7890/etcd that referenced this issue Jul 3, 2021
This is to start measuring the test flakyness and see the numbers improving once we improve and deflake flaky tests

Fixes etcd-io#13167
karuppiah7890 added a commit to karuppiah7890/etcd that referenced this issue Jul 6, 2021
This is to start measuring the test flakyness and see the numbers improving once we improve and deflake flaky tests

Fixes etcd-io#13167
karuppiah7890 added a commit to karuppiah7890/issues-info that referenced this issue Jul 6, 2021
karuppiah7890 added a commit to karuppiah7890/etcd that referenced this issue Jul 6, 2021
This is to start measuring the test flakyness and see the numbers improving once we improve and deflake flaky tests

Fixes etcd-io#13167
karuppiah7890 added a commit to karuppiah7890/etcd that referenced this issue Jul 8, 2021
This is to start measuring the test flakyness and see the numbers improving once we improve and deflake flaky tests

Fixes etcd-io#13167
karuppiah7890 added a commit to karuppiah7890/etcd that referenced this issue Jul 8, 2021
This is to start measuring the test flakyness and see the numbers improving once we improve and deflake flaky tests

Fixes etcd-io#13167
karuppiah7890 added a commit to karuppiah7890/etcd that referenced this issue Jul 8, 2021
This is to start measuring the test flakyness and see the numbers improving once we improve and deflake flaky tests

Fixes etcd-io#13167
karuppiah7890 added a commit to karuppiah7890/etcd that referenced this issue Jul 13, 2021
This is to start measuring the test flakiness and see the numbers improving once we improve and deflake flaky tests

Fixes etcd-io#13167
karuppiah7890 added a commit to karuppiah7890/etcd that referenced this issue Jul 13, 2021
…ommits with failed status

The workflow runs on a cron schedule on a weekly basis - once every week

Fixes etcd-io#13167
karuppiah7890 added a commit to karuppiah7890/issues-info that referenced this issue Jul 29, 2021
karuppiah7890 added a commit to karuppiah7890/issues-info that referenced this issue Aug 13, 2021
@stale
Copy link

stale bot commented Oct 1, 2021

This issue has been automatically marked as stale because it has not had recent activity. It will be closed after 21 days if no further activity occurs. Thank you for your contributions.

@stale stale bot added the stale label Oct 1, 2021
@karuppiah7890
Copy link
Contributor

commenting to avoid closing of issue

@stale
Copy link

stale bot commented Dec 30, 2021

This issue has been automatically marked as stale because it has not had recent activity. It will be closed after 21 days if no further activity occurs. Thank you for your contributions.

@endocrimes
Copy link
Contributor

I hacked together a tool for finding/tracking/fixing flakes the other day: https://github.com/endocrimes/etcd-test-analyzer

Because it parses all of the test results from every run in a given time period, it makes it relatively easy to modify to ask new questions in place, but definitely isn't a tool that is widely useful in its current form.

image

@serathius serathius reopened this Apr 5, 2022
@serathius
Copy link
Member Author

serathius commented Jun 9, 2022

Status update, running ./scripts/measure-test-flakiness.sh gave me:

Commit status failure percentage is - 23 %

So on last 100 merged commits we got 24 test failures. Excluding 7 coverage failures (not blocking merge) and 2 recent failures due to post merge bug #14101, we get 14% flakiness.

Going down from 50% to 14% is great result!!
Thanks everyone who helped.

@serathius
Copy link
Member Author

serathius commented Jun 9, 2022

Looking into failures from last 100 runs (excluding coverage and known issues) we get failures in:

  • 4 failures in e2e tests of TestDowngradeUpgradeClusterOf3 (example) - @serathius
  • 4 failures in functional tests of BLACKHOLE_PEER_PORT_TX_RX_LEADER (example)
  • 4 failures in functional tests of NO_FAIL_WITH_NO_STRESS_FOR_LIVENESS (example)
  • 2 timeouts in grpcproxy test of TestLeasingReconnectOwnerConsistency (example)
  • 2 failures in grpcproxy test of TestWatchCancelOnServer (example)
  • 1 failure in integration test of TestDropReadUnderNetworkPartition (example) (possible goroutine leak in previous test)
  • 1 failure in integration test of TestBalancerUnderNetworkPartitionTxn
  • 1 failure in integration test of TestAuthority (example)
  • 1 timeout in grpcproxy test of TestLeasingReconnectNonOwnerGet (example)
  • 1 failure in integration tests of TestMaxLearnerInCluster (example)
  • 1 failure in functional tests of DELAY_PEER_PORT_TX_RX_LEADER_UNTIL_TRIGGER_SNAPSHOT (example)

@serathius
Copy link
Member Author

As there are a lot of tests would be great to get some help. Please let me know if you are interested in tackling one of the tests listed.

tjungblu added a commit to tjungblu/etcd that referenced this issue Aug 1, 2022
This should aid in debugging test flakes, especially in tests where the process is restarted very often and thus changes its pid.
Now it's a lot easier to grep for different members, also when different tests fail at the same time.
The test TestDowngradeUpgradeClusterOf3 as mentioned in etcd-io#13167 is a good example for that.

Signed-off-by: Thomas Jungblut <[email protected]>
tjungblu added a commit to tjungblu/etcd that referenced this issue Aug 1, 2022
This should aid in debugging test flakes, especially in tests where the process is restarted very often and thus changes its pid.
Now it's a lot easier to grep for different members, also when different tests fail at the same time.
The test TestDowngradeUpgradeClusterOf3 as mentioned in etcd-io#13167 is a good example for that.

Signed-off-by: Thomas Jungblut <[email protected]>
@stale stale bot added the stale label Sep 21, 2022
@etcd-io etcd-io deleted a comment from stale bot Sep 21, 2022
@serathius
Copy link
Member Author

@chaochn47
Copy link
Member

chaochn47 commented Sep 30, 2022

Thanks for raising this issue. It is really annoying for any contributors to etcd that unrelated tests failed.

I can take one TestDowngradeUpgradeClusterOf3 because I just faced in #14331. It's also a good opportunity to learn how downgrade works as well.

Track this in #14540

@serathius
Copy link
Member Author

serathius commented Mar 18, 2023

I noticed recent increase in flakes (at least in my PRs). From https://github.com/etcd-io/etcd/actions/runs/4394774437/jobs/7696017126 we see 26% of flakiness.

Loved recent initiative by @chaochn47 to use tools developed by @endocrimes in #15501.

It would be great to integrate them into https://github.com/etcd-io/etcd/actions/workflows/measure-test-flakiness.yaml
@chaochn47 would you be interested in this?

@chaochn47
Copy link
Member

Yeah, I can help add to the existing workflow. ETA next Monday

@nitishfy
Copy link
Contributor

Hi, I'd like to work on this!

@serathius
Copy link
Member Author

Thanks @nitishfy for your interest. The issue was created some time ago so not everything is up to date, however high level goals remained relevant. We want to improve our visibility of test flakes so we can fix them more effectively.

For the original plan, we have instrumented etcd e2e tests to export JUnit reports, @endocrimes and @karuppiah7890 implemented some custom scripts that would analyse them. This approach allowed us to start reporting and manually creating issues to fix flakes.

One thing we can do better is to avoid developing our own scripting, etcd community is not very big, so we want to avoid spreading too thin maintaining too many custom tools. With introduction of SIG-etcd we now have a option to benefit from whole ecosystem of tools built by Kubernetes community. We should do that.

One example of such tool is testgrid, it's a test result visualization tool that uses the same JUnit reports to create a grid showing which tests passed and which failed. It makes it really easy to track flakes. For example https://testgrid.k8s.io/sig-etcd-periodics#ci-etcd-e2e-amd64

I think we should work more on integrating with K8s tools, this first requires migrating etcd testing to Prow, K8s CI tool. This work can be tracked in kubernetes/k8s.io#6102.

In the meantime we could improve ensure that all etcd tests generate a Junit report, that can be later used.

Looking at github workflows only in https://github.com/etcd-io/etcd/blob/main/.github/workflows/tests-template.yaml
We set JUNIT_REPORT_DIR and export junit files

- uses: actions/upload-artifact@5d5d22a31266ced268874388b861e4b58bb5c2f3 # v4.3.1
if: always()
with:
name: "${{ matrix.target }}"
path: ./**/junit_*.xml
we should look into adding it to more test scenarios.

ArkaSaha30 pushed a commit to ArkaSaha30/etcd that referenced this issue May 23, 2024
This should aid in debugging test flakes, especially in tests where the process is restarted very often and thus changes its pid.
Now it's a lot easier to grep for different members, also when different tests fail at the same time.
The test TestDowngradeUpgradeClusterOf3 as mentioned in etcd-io#13167 is a good example for that.

Signed-off-by: Thomas Jungblut <[email protected]>
ArkaSaha30 pushed a commit to ArkaSaha30/etcd that referenced this issue May 23, 2024
This should aid in debugging test flakes, especially in tests where the process is restarted very often and thus changes its pid.
Now it's a lot easier to grep for different members, also when different tests fail at the same time.
The test TestDowngradeUpgradeClusterOf3 as mentioned in etcd-io#13167 is a good example for that.

Signed-off-by: Thomas Jungblut <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Development

Successfully merging a pull request may close this issue.

6 participants