scripts: add script to measure percentage of commits with failed status #13175

karuppiah7890 · 2021-07-02T12:19:33Z

This is to start measuring the test flakiness and see the numbers improving once we improve and deflake flaky tests
This PR is a part of the fix for the issue #13167

What this PR does -

It tries to get the last 100 etcd commits and their commit status data from the GitHub GraphQL API and count the number of failures and shows the failure percentage

It uses the following tools

bash
curl
jq

What this PR does not do -

This PR does not introduce any GitHub Actions Workflow to run the script automatically weekly once as mentioned in Deflake etcd tests #13167 . Let me know if we need to do it as part of this PR, I could add a GitHub Action Workflow config Yaml for it with cron schedule - This has been added
This PR does not analyze PRs of the commits with failure status and identify flaky tests

Secrets

This PR introduces a script which needs a GitHub Token to access the GitHub API and get etcd repo commit data

Documentation

This script does not have any documentation as of now. Let me know if I need to add any details in some docs. Like what it does, it's purpose in some README and any other details

Questions

Should we consider commits that don't have any status (multiple commits in a PR)? That is they have null status. Should we count them as part of the last 100 commits? The 100 commits currently have three statuses - success, null, failure. We count failures and then do calculation and say that's the percentage, so it assumes that the ones with null are all successes
Should we add documentation for this script? If yes, where should we add it?
Should we parameterize the number of commits we want to choose from the history? Currently it's hard coded as 100 last commits
Does the script name look okay? Let me know if I need to change it
The script creates a JSON file to dump all the GitHub API data about the repo commits and then does processing on it. Is that okay? Also, I have git ignored that file too
Does the message Commit status failure percentage is 31 % make sense? Or we want to give some other message talking about test flayness directly?
Is the commit message okay?
I put Fixes #13167 in the commit message but this is only part of fixing Deflake etcd tests #13167. I think it might close Deflake etcd tests #13167 with the merge of this PR. If it does, we can reopen the issue, yeah?

karuppiah7890 · 2021-07-02T12:23:24Z

cc @serathius

karuppiah7890 · 2021-07-03T05:25:56Z

There was a failure in one of the GitHub Actions Workflows - https://github.com/etcd-io/etcd/runs/2977019406 for the shellcheck and I fixed it

% 'shellcheck' '-fgcc' 'build' 'test' 'scripts/build-release.sh' 'scripts/codecov_upload.sh' 'scripts/fix.sh' 'scripts/genproto.sh' 'scripts/install-marker.sh' 'scripts/measure-test-flakyness.sh' 'scripts/release_mod.sh' 'scripts/test_lib.sh' 'scripts/update_dep.sh' 'scripts/updatebom.sh' './build.sh' './test.sh'
FAIL: (code:1):
  % 'shellcheck' '-fgcc' 'build' 'test' 'scripts/build-release.sh' 'scripts/codecov_upload.sh' 'scripts/fix.sh' 'scripts/genproto.sh' 'scripts/install-marker.sh' 'scripts/measure-test-flakyness.sh' 'scripts/release_mod.sh' 'scripts/test_lib.sh' 'scripts/update_dep.sh' 'scripts/updatebom.sh' './build.sh' './test.sh'
FAIL: 'run shellcheck -fgcc build test scripts/build-release.sh scripts/codecov_upload.sh scripts/fix.sh scripts/genproto.sh scripts/install-marker.sh scripts/measure-test-flakyness.sh scripts/release_mod.sh scripts/test_lib.sh scripts/update_dep.sh scripts/updatebom.sh ./build.sh ./test.sh' checking failed (!=0 return code)
scripts/measure-test-flakyness.sh:19:26: note: Useless cat. Consider 'cmd < file | ..' or 'cmd file | ..' instead. [SC2002]
FAIL: 'shellcheck' failed at Sat Jul  3 01:55:04 UTC 2021

Error - scripts/measure-test-flakyness.sh:19:26: note: Useless cat. Consider 'cmd < file | ..' or 'cmd file | ..' instead. [SC2002]

I removed the usage of cat now

serathius · 2021-07-05T11:36:14Z

Looks good, but I would reconsider writing to local file. Leaving trash in current directory is a bad practice as it can be easily avoided. Could you try writing a file? Some ideas:

Store in environment variable - response is limited to 100 commits so I don't expect that storing it on disk is required
Create a tmp file using mktemp

karuppiah7890 · 2021-07-05T12:00:57Z

Makes sense 👍 I'll write to a temp file and I'll also remove the git ignore too in that case

karuppiah7890 · 2021-07-06T15:33:53Z

@serathius I made the changes and wrote to a /tmp directory JSON file

Unrelated question - By the way, is trash the word that's normally used in this context of files that one wouldn't want to keep in the repo? I have noticed other folks use it too, like - trash, garbage. I was thinking words like unnecessary or unwanted in this context

karuppiah7890 · 2021-07-06T19:14:30Z

Oops. Another shellcheck issue that I didn't check locally 🙈

https://github.com/etcd-io/etcd/pull/13175/checks?check_run_id=3001649934

Fixed it

serathius

LGTM

We could consider cleaning up the temp directory on bash exit.

trap '{ rm -f -- "$temp_dir"; }' EXIT

karuppiah7890 · 2021-07-08T17:06:35Z

Makes sense @serathius ! I was thinking about that too, since it's a temporary thing, but then later felt that it's a small file and is in /tmp and would get deleted automatically at some point? But sure, I'll try to put that in to ensure that the temp file and directory is deleted. I just tried out trap for the first time! Interesting! :D Takes care of both success and failure cases to ensure that the temp file is removed regardless. Didn't know about it before

karuppiah7890 · 2021-07-08T17:20:23Z

@serathius I made the changes for deleting the temp directory 👍

https://github.com/etcd-io/etcd/pull/13175/files#diff-f2440ca53bfc7e10b83f12f28e7aedbea36e790833d5db80d25459d570cd48fcR14

karuppiah7890 · 2021-07-11T11:32:15Z

@serathius As part of this PR, shall I also create a GitHub Action Workflow config to run the script once a week automatically? Based on a cron schedule. Or should we do that as a separate PR?

lilic

Nice, thanks!

Lets add the github job here so we can solve any possible issues with the script.

This is to start measuring the test flakiness and see the numbers improving once we improve and deflake flaky tests Fixes etcd-io#13167

…ommits with failed status The workflow runs on a cron schedule on a weekly basis - once every week Fixes etcd-io#13167

karuppiah7890 · 2021-07-13T13:24:54Z

I changed the script file's name to fix the typo - flakyness vs flakiness

The script output currently looks like this -

$ ./scripts/measure-test-flakiness.sh 
Commit status failure percentage is - 30 %

I have also added the GitHub Actions Workflow config now. It uses the built in GITHUB_TOKEN from GitHub Actions Workflow. Usually it's injected based on some conditions - I think admins have to enabled tokens on PRs from forks? Something to look out for in case that's something we haven't discussed or needed before

Also, should we add a make target in the Makefile for running this? And use that in the GitHub Action Workflow config?

Looking forward to the review of the workflow config! :) cc @lilic @serathius

karuppiah7890 · 2021-07-13T13:26:49Z

I run the workflow every week once using a cron schedule - https://crontab.guru/#0_0_*_*_0

serathius

LGTM

serathius · 2021-07-27T14:03:57Z

@ptabor @gyuho @hexfusion Can an maintainer take a look?

lilic · 2021-08-05T11:38:09Z

@karuppiah7890 there seems to be a failing test, it looks like a flake only but do you mind just having a quick look. Thanks!

Otherwise agreed lgtm :)

karuppiah7890 · 2021-08-05T11:41:25Z

@lilic makes sense! I'll take a look!

karuppiah7890 · 2021-08-10T15:18:16Z

@lilic The errors don't seem related to this PR. I was checking this - https://github.com/etcd-io/etcd/pull/13175/checks?check_run_id=3191066005 and https://semaphoreci.com/etcd-io/etcd/branches/pull-request-13175/builds/8 . I don't have much context, and it was a lot of logs. I just know that https://github.com/etcd-io/etcd/pull/13175/checks?check_run_id=3191066005 is a Golang Test failure for some test(s) and not related to this PR , same for https://semaphoreci.com/etcd-io/etcd/branches/pull-request-13175/builds/8 . But I'll try to dig more into it to contribute to the tests including flaky tests in my upcoming PRs :)

stale · 2021-12-13T21:34:23Z

This issue has been automatically marked as stale because it has not had recent activity. It will be closed after 21 days if no further activity occurs. Thank you for your contributions.

karuppiah7890 · 2022-01-05T12:01:59Z

cc @lilic @serathius , some help here? Or should I reopen the PR?

serathius · 2022-01-10T13:06:17Z

@ptabor

karuppiah7890 force-pushed the issue-13167-measure-flakyness branch from f1e8f27 to b447aaf Compare July 3, 2021 05:24

karuppiah7890 force-pushed the issue-13167-measure-flakyness branch from b447aaf to 354de1b Compare July 6, 2021 15:28

karuppiah7890 force-pushed the issue-13167-measure-flakyness branch from 354de1b to 2d47ff7 Compare July 6, 2021 19:13

serathius approved these changes Jul 8, 2021

View reviewed changes

karuppiah7890 force-pushed the issue-13167-measure-flakyness branch from 2d47ff7 to d09f928 Compare July 8, 2021 17:19

karuppiah7890 force-pushed the issue-13167-measure-flakyness branch 2 times, most recently from d0855fa to 75a27fb Compare July 8, 2021 17:23

karuppiah7890 requested a review from serathius July 8, 2021 17:25

serathius approved these changes Jul 9, 2021

View reviewed changes

lilic reviewed Jul 13, 2021

View reviewed changes

karuppiah7890 added 2 commits July 13, 2021 18:42

scripts: add script to measure percentage of commits with failed status

2b796a0

This is to start measuring the test flakiness and see the numbers improving once we improve and deflake flaky tests Fixes etcd-io#13167

workflow: add workflow to invoke script that measures percentage of c…

3317716

…ommits with failed status The workflow runs on a cron schedule on a weekly basis - once every week Fixes etcd-io#13167

karuppiah7890 force-pushed the issue-13167-measure-flakyness branch from 75a27fb to 3317716 Compare July 13, 2021 13:21

karuppiah7890 requested review from serathius and lilic July 13, 2021 13:25

serathius approved these changes Jul 14, 2021

View reviewed changes

stale bot added the stale label Dec 13, 2021

stale bot closed this Jan 5, 2022

serathius reopened this Jan 9, 2022

stale bot removed the stale label Jan 9, 2022

serathius assigned ptabor Jan 10, 2022

serathius merged commit 8d8271f into etcd-io:main Apr 5, 2022

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

scripts: add script to measure percentage of commits with failed status #13175

scripts: add script to measure percentage of commits with failed status #13175

karuppiah7890 commented Jul 2, 2021 •

edited

Loading

karuppiah7890 commented Jul 2, 2021

karuppiah7890 commented Jul 3, 2021

serathius commented Jul 5, 2021

karuppiah7890 commented Jul 5, 2021

karuppiah7890 commented Jul 6, 2021

karuppiah7890 commented Jul 6, 2021

serathius left a comment

karuppiah7890 commented Jul 8, 2021 •

edited

Loading

karuppiah7890 commented Jul 8, 2021

karuppiah7890 commented Jul 11, 2021

lilic left a comment

karuppiah7890 commented Jul 13, 2021

karuppiah7890 commented Jul 13, 2021

serathius left a comment

serathius commented Jul 27, 2021

lilic commented Aug 5, 2021

karuppiah7890 commented Aug 5, 2021

karuppiah7890 commented Aug 10, 2021

stale bot commented Dec 13, 2021

karuppiah7890 commented Jan 5, 2022

serathius commented Jan 10, 2022

scripts: add script to measure percentage of commits with failed status #13175

scripts: add script to measure percentage of commits with failed status #13175

Conversation

karuppiah7890 commented Jul 2, 2021 • edited Loading

What this PR does -

What this PR does not do -

Secrets

Documentation

Questions

karuppiah7890 commented Jul 2, 2021

karuppiah7890 commented Jul 3, 2021

serathius commented Jul 5, 2021

karuppiah7890 commented Jul 5, 2021

karuppiah7890 commented Jul 6, 2021

karuppiah7890 commented Jul 6, 2021

serathius left a comment

Choose a reason for hiding this comment

karuppiah7890 commented Jul 8, 2021 • edited Loading

karuppiah7890 commented Jul 8, 2021

karuppiah7890 commented Jul 11, 2021

lilic left a comment

Choose a reason for hiding this comment

karuppiah7890 commented Jul 13, 2021

karuppiah7890 commented Jul 13, 2021

serathius left a comment

Choose a reason for hiding this comment

serathius commented Jul 27, 2021

lilic commented Aug 5, 2021

karuppiah7890 commented Aug 5, 2021

karuppiah7890 commented Aug 10, 2021

stale bot commented Dec 13, 2021

karuppiah7890 commented Jan 5, 2022

serathius commented Jan 10, 2022

karuppiah7890 commented Jul 2, 2021 •

edited

Loading

karuppiah7890 commented Jul 8, 2021 •

edited

Loading