Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Gatekeeper misses failures in matrix jobs #27

Open
allanlewis opened this issue Apr 5, 2022 · 10 comments · May be fixed by #44
Open

Gatekeeper misses failures in matrix jobs #27

allanlewis opened this issue Apr 5, 2022 · 10 comments · May be fixed by #44

Comments

@allanlewis
Copy link
Contributor

I have a matrix job that tests a component in two configurations. When one fails, Merge Gatekeeper passes - the logs don't show my matrix jobs in the list reported for each status.

My matrix job is a Python library and is configured something like this:

jobs:
  test:
    runs-on: ubuntu-18.04
    strategy:
      matrix:
        tox-env: [env1, env2]
      fail-fast: false
    steps:
      - uses: actions/checkout@v2
      - run: tox -e ${{ matrix.tox-env }}
@rytswd
Copy link
Contributor

rytswd commented Apr 11, 2022

HI @allanlewis - thanks for the report! We at Upsider do not use too much of matrix jobs, and this sounds like an oversight indeed. We will review this a bit more closely to see if we can fix this 👍

@highb
Copy link

highb commented Jun 29, 2022

We use matrix jobs extensively where I work (see Test Render jobs in screenshot), and in a super simple test case I am not able to repro this issue. I'm going to test it further in other more complex scenarios to be sure.

image

As an aside: Thank you very much for making this action open source! It is extremely useful!

@allanlewis
Copy link
Contributor Author

allanlewis commented Jun 29, 2022

Perhaps this was fixed in v1.0.2? Based on the date I raised this, I was probably using v1.0.1. Maybe #28?

@highb
Copy link

highb commented Jun 29, 2022

That could be! I'll continue testing this out on my pipelines and provide a report if I see the issue.

@ajschmidt8
Copy link

My team is considering using merge-gatekeeper for matrix jobs. Can anyone confirm the latest status? Is the tool working for matrix jobs?

@rytswd
Copy link
Contributor

rytswd commented Aug 12, 2022

Thanks all for providing insights! 🥰

I just did a quick test on my side, and it seems that the matrix job handling is working correctly given the recent updates. I have created a test PR in a separate repository, where you can see the matrix jobs are listed out correctly.
rytswd/merge-gatekeeper-test#1
image

We will make updates to our documentation and add some example test cases to make it clear in this repository as well.

@rytswd rytswd linked a pull request Oct 29, 2022 that will close this issue
@allanlewis
Copy link
Contributor Author

Is this now closed, based on the most recent comments? I'll try out v1.2.0...

@rytswd
Copy link
Contributor

rytswd commented Nov 28, 2022

@allanlewis Yes, this should be sorted with the latest release (probably fixed from v1.1.1) 👍

@Fitzbert-Fitz
Copy link

Hi,
just want ro report, that I encountered "this" issue today. We have follwoing setup:

CI Workflow

  • Job A defines how the matrix should look like
  • Matrix B is build based on

Merge Gatekeeper is running with a check interval of 5 seconds and using @v1. I assume this should ensure that it is using latest.

Job A and merge gatekeeper where triggered. Job A calculated the matrix but the Matrix was not yet started and therefore did not exist in GitHub at all. Merge gatekeeper passed because it only saw job A which was successful. A second later GitHub builds up the matrix based on the output of job A.

This happens occasionally and I will increase our check interval to mitigate.

Possible Solution:
Merge Gatekeeper should not only check the state of all jobs it knows. It should also check the state of the parent workflow. Only if the Parent workflow state changes to successful it can stop checking for jobs and their states.

@sebastian-fredriksson-bernholtz
Copy link

sebastian-fredriksson-bernholtz commented Nov 26, 2024

We've been seeing this from time to time as well. Impression is that it's very rare, we recently had it happen once, with about 50 runs that caught the failure (after it had been merged in when it was missed), before we fixed the error.

The matrix needs another job though, and it seems like while that other job is running, the matrix jobs are not listed by merge-gatekeeper. In the cases where failures weren't missed all the jobs were listed as pending at the moment that the needed job was listed as completed, while in the case where the failures were missed, the needed job was completed without the other jobs being listed as pending. Not sure if this is a different issue?

Screenshots

Screenshot 2024-11-26 at 14 46 00
Screenshot 2024-11-26 at 14 45 34

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

6 participants